Migrating Jianshu Markdown to Ghost

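Jianshu has served me well. The one thing that bothered me is that the article management page has no search, so once you have written a lot it becomes hard to find older articles. On top of that, Jianshu's images have failed to load for a long time. At first I assumed my proxy was the cause, but after some searching I learned that Jianshu blocks access from Firefox. I don't know the reason, but it settled my decision to leave Jianshu.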

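Jianshu is fairly generous about this: under Settings > Account settings you can download all of your articles as one archive. The result is plain Markdown text, which is easy to migrate.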

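The difficulty is on the Ghost side. Ghost supports imports from several overseas platforms, but nothing for Chinese ones. Ghost does offer its own import and export options, so here I imitate Ghost's export format, put the Jianshu articles into it, and import the result back.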
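The idea can be sketched on a toy export. The dictionary below is an assumption modeled only on the keys that the script at the end of this article reads (`db[0]['data']['posts']`, `users`, `tags`, `posts_tags`); a real Ghost export contains many more tables and fields, and `swap_in_posts` is a hypothetical helper name.

```python
import copy
import json

# Toy stand-in for a Ghost export (assumption: a real export has far more fields).
toy_export = {
    "db": [{
        "data": {
            "users": [{"id": "user-1"}],
            "posts": [{"id": "old-post", "title": "Existing post"}],
            "tags": [],
            "posts_tags": [],
        }
    }]
}

def swap_in_posts(export: dict, new_posts: list[dict]) -> dict:
    """Return a copy of the export whose posts are replaced by new_posts."""
    result = copy.deepcopy(export)
    result["db"][0]["data"]["posts"] = new_posts
    return result

jianshu_posts = [{"id": "new-post", "title": "Imported from Jianshu"}]
import_file = swap_in_posts(toy_export, jianshu_posts)
print(json.dumps(import_file["db"][0]["data"]["posts"], ensure_ascii=False))
```

Working on a deep copy keeps the original export untouched, which is convenient if you want to keep it as a backup.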

Generating the Ghost content

A Python script at the end of this article generates the Ghost import files.

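Disclaimer: the script and the content of this article may cause unforeseen problems. Before using it, make sure you understand what it does and back up your data. I accept no responsibility for any loss. Please credit the source when reposting.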

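First, the script's inputs and outputs:

Input
1. The rar file exported from Jianshu
2. The json file exported from Ghost, used to read configuration information from Ghost
Output
1. A Ghost import file in json format, containing the posts
2. A Ghost import file in zip format, containing the images. The two files must be imported separately.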
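The two outputs are tied together by image paths: the Markdown in the json file has to point at the location the images in the zip file will have inside Ghost. A minimal sketch of that link rewriting follows, using the same regular expression as the script. The sample URL and the helper name `rewrite_image_links` are made up for illustration.

```python
import os
import re

IMAGE_PATTERN = r'!\[(.*?)\]\((.*?)\)'  # ![alt text](image_url)

def rewrite_image_links(markdown: str, ghost_image_dir: str) -> str:
    """Point every Markdown image at ghost_image_dir, keeping only the file name."""
    def _replace(match: re.Match) -> str:
        alt_text, original_url = match.group(1), match.group(2)
        # Drop the query string, then keep the last path segment.
        file_name = os.path.basename(original_url.split('?')[0])
        return f'![{alt_text}]({ghost_image_dir}{file_name})'
    return re.sub(IMAGE_PATTERN, _replace, markdown)

# Hypothetical image URL with a query string, as an image CDN might append.
sample = '![demo](https://upload-images.example.com/upload_images/abc.png?imageMogr2/auto-orient)'
print(rewrite_image_links(sample, '__GHOST_URL__/content/images/2024/11/'))
# prints ![demo](__GHOST_URL__/content/images/2024/11/abc.png)
```

Note that `ghost_image_dir` must end with a slash, because the file name is appended to it directly.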

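Script dependencies

The script calls the system's 7z command to compress and extract archives, so make sure 7z works on your command line before running it.
It uses requests to download the images from Jianshu; install it with pip install requests.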
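If you would like to confirm both dependencies before running anything, a small check along these lines may help. It is only a sketch: `missing_dependencies` is a hypothetical helper, not part of the script, and it only tests that the `7z` command is on the PATH and that the `requests` module can be found.

```python
import importlib.util
import shutil

def missing_dependencies(commands=("7z",), modules=("requests",)) -> list[str]:
    """Return the names of commands and modules that cannot be found."""
    missing = [cmd for cmd in commands if shutil.which(cmd) is None]
    missing += [mod for mod in modules if importlib.util.find_spec(mod) is None]
    return missing

problems = missing_dependencies()
if problems:
    print("Missing: " + ", ".join(problems))
else:
    print("7z and requests are available.")
```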

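Running the script

Find the main function. It has four parameters; change them to your own values and run the script. The generated files are placed in the same directory as the rar file exported from Jianshu. In article titles, special characters appear as "-": Jianshu replaces them at download time for portability, and this is unrelated to this script.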
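The output locations follow directly from the rar path. The sketch below shows the naming with `pathlib`; the script itself uses plain string replacement, `output_paths` is a hypothetical helper, and the `/tmp/user-123.rar` path is a made-up example.

```python
from pathlib import Path

def output_paths(rar_path: str) -> tuple[Path, Path]:
    """Return (json_path, images_zip_path) next to the given .rar file."""
    source = Path(rar_path)
    json_path = source.with_suffix('.json')
    images_zip_path = source.with_name(f'{source.stem}-pictures.zip')
    return json_path, images_zip_path

json_path, zip_path = output_paths('/tmp/user-123.rar')
print(json_path.name, zip_path.name)
# prints user-123.json user-123-pictures.zip
```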

Setting the parameters

Visit my Ghost site to see the result: http://ray.twig.ink


import os
import json
from pathlib import Path
import datetime
import subprocess


def handle_img(post_info, save_path, featured_first_img):
    """Download the images referenced in a post and rewrite their links."""
    md_str = post_info['markdown']
    if 'https://upload-images' not in md_str:
        return md_str

    import re
    import requests
    # Matches Markdown image links of the form ![alt text](image_url)
    pattern = r'!\[(.*?)\]\((.*?)\)'

    now = datetime.datetime.now()
    # Zero-pad the month so the path follows the /content/images/YYYY/MM/ layout
    _rel_path = f'/content/images/{now.year}/{now.month:02d}/'
    ghost_image_path = f'__GHOST_URL__{_rel_path}'
    image_save_path = f'{save_path}{_rel_path}'
    os.makedirs(image_save_path, exist_ok=True)

    # Download the images
    matches = re.findall(pattern, md_str)
    for alt, url in matches:
        img_url = url.split('?')[0]
        img_file_name = img_url.split('/')[-1]
        # image_save_path already ends with '/'
        image_save_url = f'{image_save_path}{img_file_name}'
        print(f'downloading.. {url}')
        response = requests.get(url, timeout=30)
        if response.status_code == 200:
            with open(image_save_url, 'wb') as file:
                file.write(response.content)

        if featured_first_img and post_info['feature_image'] is None:
            # ghost_image_path already ends with '/', so do not add another one
            post_info['feature_image'] = f'{ghost_image_path}{img_file_name}'

    # Rewrite the image links in the original text
    def replace_image_url(match):
        alt_text = match.group(1)
        original_url = match.group(2)
        # Extract the image file name
        image_name = os.path.basename(original_url.split('?')[0])
        # Build the new image link
        new_url = f'{ghost_image_path}{image_name}'
        return f'![{alt_text}]({new_url})'
    res = re.sub(pattern, replace_image_url, md_str)
    return res

def md_to_mobiledoc(markdown, mobiledoc_version):
    mobiledoc = json.dumps({
        'version': mobiledoc_version,
        'markups': [],
        'atoms': [],
        'cards': [['markdown', {'cardName': 'markdown', 'markdown': markdown}]],
        'sections': [[10, 0]]
    }, ensure_ascii=False)
    return mobiledoc

def generate_uuid():
    import uuid
    return str(uuid.uuid4())

def generate_id():
    """Generate a Ghost-style id. It has no effect on import: Ghost generates a new one."""
    custom_id = generate_uuid().replace('-', '')[-24:]
    return custom_id

def read_jianshu(zip_path: str):
    """Read every Markdown file from the Jianshu export."""
    _path = Path(zip_path)

    extract_to = os.path.join(_path.parent, _path.stem)
    unzip_file(zip_path, extract_to)
    posts = []
    tags = {}
    for md_file in find_md_files(extract_to):
        # print(f"Found MD file: {md_file}")
        __path = Path(md_file)
        with open(md_file, 'r', encoding='utf-8') as file:
            tag = __path.parent.name
            if tag not in tags.keys():
                tags[tag] = generate_id()
            tag_id = tags[tag]
            posts.append({
                'id': generate_id(),
                'tag': tag,
                'tag_id': tag_id,
                'title': __path.stem,
                'markdown': file.read(),
                'feature_image': None
            })
    return posts, tags

def unzip_file(zip_path, extract_to):
    """Extract the rar file into the given directory."""
    if not os.path.exists(extract_to):
        os.makedirs(extract_to)
    res = subprocess.run(['7z', 'x', zip_path, f'-o{extract_to}', '-aoa'], capture_output=True, text=True)
    print(res.stdout)

def zip_file(folder_to_compress, compress_to):
    """Compress a folder into an archive."""
    res = subprocess.run(['7z', 'a', compress_to, folder_to_compress], capture_output=True, text=True)
    print(res.stdout)

def find_md_files(directory):
    """Walk the directory recursively and yield every .md file."""
    for root, dirs, files in os.walk(directory):
        for file in files:
            if file.endswith('.md'):
                yield os.path.join(root, file)

def build_ghost(post_infos: list[dict], ghost_config: dict, tags) -> dict:
    """Assemble the posts from the information gathered so far."""
    from datetime import datetime, timezone
    # Format the current time the way the export expects
    current_time = datetime.now(timezone.utc)
    formatted_time = current_time.strftime('%Y-%m-%dT%H:%M:%S.000Z')

    author_id = ghost_config['db'][0]['data']['users'][0]['id']

    _model = {
        'posts_authors': [{
            'id': generate_id(),
            "post_id": post['id'],
            "author_id": author_id,
            "sort_order": 0
        }for post in post_infos],
        'posts': [{
            "id": post['id'],
            "uuid": generate_uuid(),
            "title": post['title'],
            "feature_image": post['feature_image'],
            "mobiledoc": post['mobiledoc'],
            "type": 'post',
            "status": post['post_status'],
            "visibility": "public",
            "email_recipient_filter": "all",
            "created_at": formatted_time,
            "updated_at": formatted_time,
            "published_at": formatted_time,
            "show_title_and_feature_image": 1
        } for post in post_infos],
        'posts_tags': [{
            "id": generate_id(),
            "post_id": post['id'],
            "tag_id": post['tag_id'],
            "sort_order": 0
        } for post in post_infos],
        'tags': [{
            'id': tag_id,
            'name': tag,
            "visibility": "public",
            "created_at": formatted_time,
            "updated_at": formatted_time

        } for tag, tag_id in tags.items()],
    }
    res = ghost_config

    res_post = res['db'][0]['data']
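    # Ghost imports are incremental, so the existing posts do not need to be kept.
    # Note: _model['posts_authors'] is built above but is not written into the result.
    res_post['posts'] = _model['posts']
    res_post['tags'] = _model['tags']
    res_post['posts_tags'] = _model['posts_tags']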
    return res

def get_mobiledoc_version(ghost_config):
    _mobiledoc_str = ghost_config['db'][0]['data']['posts'][0]['mobiledoc']
    _mobiledoc = json.loads(_mobiledoc_str)
    return _mobiledoc['version']

def main():
    # Path to the Jianshu export
    zip_path = '/Users/era/Downloads/user-7914065-1730503948.rar'
    # Ghost export file; data is read from its posts, so make sure the export contains at least one post
    ghost_json_path = '/Users/era/Downloads/tui-ge.ghost.2024-11-02-00-00-48.json'
    # Status of the imported posts: 'draft' or 'published'
    post_status = 'published'
    # Use the first image of each post as its feature image
    first_img_as_feature = True

    post_infos, tags = read_jianshu(zip_path)
    # Read the export explicitly as UTF-8 so the platform default encoding does not matter
    with open(ghost_json_path, encoding='utf-8') as file:
        ghost_config = json.load(file)
    # mobiledoc version
    mobiledoc_version = get_mobiledoc_version(ghost_config)

    for info in post_infos:
        # Rewrite the image links in the Markdown first, then convert to mobiledoc
        md_str = handle_img(info, Path(zip_path).parent, first_img_as_feature)
        info['mobiledoc'] = md_to_mobiledoc(md_str, mobiledoc_version)
        info['post_status'] = post_status
    print('download completed.')

    ghost_res = build_ghost(post_infos, ghost_config, tags)

    # Output file paths
    output_json_path = zip_path.replace('.rar', '.json')
    output_zip_path = zip_path.replace('.rar', '-pictures.zip')
    with open(output_json_path, 'w', encoding='utf-8') as json_file:
        json.dump(ghost_res, json_file, indent=4, ensure_ascii=False)

    zip_file(f'{Path(zip_path).parent}/content', output_zip_path)

    print(f"All done! Data saved to {output_json_path},{output_zip_path}")



if __name__ == "__main__":
    """
        pip install requests
        Make sure the 7z command is available
    """
    main()
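To see what the mobiledoc wrapper produced by md_to_mobiledoc looks like, here is a small round trip. It is a sketch rather than a description of Ghost's internals: `wrap_markdown` is a hypothetical stand-in for md_to_mobiledoc, and the version string '0.3.1' is an assumption, since the script reads the real value from your own Ghost export.

```python
import json

def wrap_markdown(markdown: str, version: str) -> str:
    """Serialize Markdown as a mobiledoc document holding one markdown card."""
    return json.dumps({
        'version': version,
        'markups': [],
        'atoms': [],
        'cards': [['markdown', {'cardName': 'markdown', 'markdown': markdown}]],
        'sections': [[10, 0]],  # a single section that refers to card index 0
    }, ensure_ascii=False)

# '0.3.1' is an assumed version; take the real one from your export.
doc = json.loads(wrap_markdown('# Title\n\nhello', '0.3.1'))
card_name, payload = doc['cards'][0]
print(card_name, repr(payload['markdown']))
```

The whole post body travels as one Markdown card, so the original Markdown text is stored unchanged inside the payload.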

References

JSON structure https://ghost.org/docs/migration/custom/
Importing images https://ghost.org/help/imports/#image-imports
Importing content https://ghost.org/docs/migration/content/

Last edited: 2024.11.03 09:22:03
