Counting Word Frequency in TOEFL Essays with Python

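With AI booming, my curiosity was piqued. After reading a few articles I learned that Python is well suited to AI programming, so I set out to level up my Python skills.

Along the way I found that Python offers a rich set of libraries for data analysis. I also happened to be preparing for the TOEFL writing section for work. So when Python met TOEFL, a happy encounter began.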

Goal

After finishing five TOEFL essays, I wanted to find out which words I use most often, learn synonyms for them to expand my vocabulary, and then swap them freely into my essays.

Approach

Read the files with Python
Count the word frequencies in each essay
Merge the word frequencies of the five essays
Output the 10 most frequent words

Implementation

STEP 1:

Export the essays

I write in Evernote, which can export notes as HTML files. After exporting, I renamed the files so they are easier to read in.

[Image: the renamed files]

STEP 2:

Inspecting the HTML files showed that the essay text all sits inside <body>. A quick search turned up the BeautifulSoup library, which helps with processing HTML files.

So:

from bs4 import BeautifulSoup

def filter_html(html):
    soup = BeautifulSoup(html, 'html.parser')
    # Read only <body> so that the <title> tag (the essay title) does not skew the counts
    text = soup.body.get_text()
    return text
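To illustrate why reading only <body> keeps the title out, here is a minimal sketch that does the same kind of extraction with the standard library's html.parser instead of BeautifulSoup. It is an alternative for illustration, not the code used in this post; the names BodyTextExtractor and body_text and the sample HTML are made up for the example.

```python
from html.parser import HTMLParser


class BodyTextExtractor(HTMLParser):
    """Collect the text nodes that appear inside the <body> element."""

    def __init__(self):
        super().__init__()
        self.in_body = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lowercase
        if tag == "body":
            self.in_body = True

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False

    def handle_data(self, data):
        # Text outside <body>, such as the <title>, is ignored
        if self.in_body:
            self.chunks.append(data)


def body_text(html):
    """Return the concatenated text found inside <body>."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical sample document, not one of the real essays
sample = (
    "<html><head><title>Essay 1</title></head>"
    "<body><p>Parents are the best teachers.</p></body></html>"
)
print(body_text(sample))
```

The title text never reaches the result because it sits in <head>, which is the same reason soup.body.get_text() leaves it out.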

STEP 3:
Next, count how many times each word appears in an essay. This mainly relies on two pieces of the Python standard library: re and collections.Counter.

import collections

def calculate_words_frequency(file):
    # Read the file
    with open(file) as f:
        # Extract the essay text from the HTML
        text = filter_html(f)

    # Lowercase the text and split it into words on whitespace
    line_box = text.strip().lower().split()

    # Strip punctuation from words that are not purely alphabetic
    word_box = []
    for word in line_box:
        if not word.isalpha():
            word = filter_puctuation(word)
        # Skip tokens left empty after stripping
        if word:
            word_box.append(word)

    # Count the words, then drop the common simple words
    return fileter_simple_words(collections.Counter(word_box))

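A word on filter_puctuation(): when I printed the frequency results, I found that because of punctuation many words had a trailing period, comma, or question mark.

To keep punctuation from distorting the counts, I used a simple regular expression to strip it. (I am not great with regex; this was enough for my tests, and there is surely a simpler and more complete way to write it.)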

import re

# Strip one trailing , . ? or " and one leading " from a word
def filter_puctuation(word):
    return re.sub(r'^"|[,.?"]$', '', word)
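One possibly simpler option, sketched here as an alternative rather than as the code behind the results in this post, is to strip every leading and trailing punctuation character with str.strip and string.punctuation. The function name strip_punctuation and the sample words are made up for illustration.

```python
import string


def strip_punctuation(word):
    """Remove any run of ASCII punctuation from both ends of a word."""
    return word.strip(string.punctuation)


# Hypothetical tokens of the kind produced by splitting on whitespace
samples = ['teachers.', '"parents,', 'well-known', "don't", '(students)?!']
cleaned = [strip_punctuation(w) for w in samples]
print(cleaned)
```

Punctuation inside a word, such as the hyphen in well-known or the apostrophe in don't, is left alone because strip only touches the ends. Note that string.punctuation covers ASCII characters only, so curly quotes would need to be added to the strip set separately.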

STEP 4:

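While testing the result set, I found that the top-ranked words were all prepositions, pronouns, conjunctions, and other common words, such as he, and, that. These are not the words I am after, so the common simple words need to be filtered out of the counts. (I typed some in by hand; there is probably a more complete list online.)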

def fileter_simple_words(words):
    # List of words to filter out
    simple_words = ['the', 'a', 'an', 'to', 'is',
                    'am', 'are', 'that', 'which',
                    'i', 'you', 'he', 'she', 'they',
                    'it', 'of', 'for', 'have', 'has',
                    'their', 'my', 'your', 'will', 'all',
                    'but', 'while', 'with', 'only', 'more',
                    'who', 'should', 'there', 'can', 'might',
                    'could', 'may', 'be', 'on', 'at',
                    'after', 'most', 'even', 'and', 'in',
                    'best', 'better', 'as', 'no', 'ever',
                    'me', 'not', 'his', 'her'
                    ]

    # words is a collections.Counter; iterate over a copy of its keys
    # so that entries can be deleted safely during the loop
    for word in list(words):
        if word in simple_words:
            del words[word]

    return words
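As a variation, the filtering can also be written without deleting entries in place. The sketch below builds a new Counter and leaves the original untouched; the name without_stop_words and the tiny STOP_WORDS set are hypothetical and stand in for the longer list above.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word set for illustration only
STOP_WORDS = frozenset({"the", "a", "and", "to", "of"})


def without_stop_words(counts, stop_words=STOP_WORDS):
    """Return a new Counter that omits every word found in stop_words."""
    return Counter({w: n for w, n in counts.items() if w not in stop_words})


counts = Counter("the parents and the teachers talk to the children".split())
filtered = without_stop_words(counts)
print(filtered)
```

Using a set (here a frozenset) for the stop words makes each membership check a hash lookup instead of a scan through a list, which matters little for five essays but scales better.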

STEP 5:
Almost done. After counting the word frequencies for one essay, I need to sum the frequencies across all five. Counter objects can be added together, so:

def multiple_file_frequency(files):
    total_counter = collections.Counter()
    for file in files:
        total_counter += calculate_words_frequency(file)
    return total_counter
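To make the additivity concrete, here is a small sketch with two made-up Counters standing in for two essays. It shows the + operator used above and, as an alternative, the in-place update() method.

```python
from collections import Counter

# Made-up counts standing in for two essays
essay_one = Counter({"parents": 3, "teachers": 2})
essay_two = Counter({"parents": 1, "children": 4})

# + returns a new Counter with the counts of shared words summed
total = essay_one + essay_two
print(total["parents"], total["children"], total["teachers"])

# update() adds counts into an existing Counter in place
running = Counter()
for essay in (essay_one, essay_two):
    running.update(essay)
```

Both approaches give the same totals here. One difference to be aware of is that + keeps only words whose resulting count is positive, while update() keeps every key.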

STEP 6:
With the totals in hand, I want to know which 10 words are the most frequent.

def most_common_words(files, number):
    total_counter = multiple_file_frequency(files)
    return total_counter.most_common(number)
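For reference, most_common(n) returns a list of (word, count) pairs sorted from most to least frequent. The sketch below uses made-up counts to show the shape of the result.

```python
from collections import Counter

# Made-up totals for illustration
counts = Counter({"parents": 4, "children": 4, "teachers": 2, "school": 1})

top_two = counts.most_common(2)
print(top_two)

# Calling it without an argument returns every word, still sorted by count
everything = counts.most_common()
```

Words with equal counts come back in the order they were first encountered, so a tie at the cutoff is broken by insertion order rather than alphabetically.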

STEP 7:
Finally, use Python's visualization tools to turn the result into a bar chart.

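import matplotlib.pyplot as plt
import numpy as np

def draw_figures(figures):
    # figures is a list of (word, count) pairs, e.g. from most_common()
    labels, values = zip(*figures)
    indexes = np.arange(len(labels))
    width = 0.5
    plt.bar(indexes, values, width)
    plt.xticks(indexes, labels)
    plt.show()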

Final results

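All done.

Studying for the TOEFL

Having worked so hard to get the results, I should of course put them to good use.

With the synonym site Thesaurus, I can look up synonyms for a word. Take parents and teachers as examples.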

[Images: thesaurus lookup results]

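Next I will pick some of the synonyms to memorize, building up my vocabulary, and then swap them flexibly into my writing to improve it. Naturally, that should raise my exam score too.

After all, appropriate word choice is one of the scoring criteria for TOEFL writing.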

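Improvements

I spent half a day on this small demo, and there are a few things I would like to look into further.

Update the simple-word list
Read files automatically in batches, with no renaming or manual input
Make the chart clearer and better looking (look into numpy, pandas, matplotlib.pyplot)
Save the results as CSV for later use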
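Two of these items, batch file discovery and CSV output, can be sketched with the standard library alone. This is one possible direction, not something the demo currently does; the function names find_essays and save_counts, the file names, and the counts are all made up for illustration.

```python
import csv
import tempfile
from collections import Counter
from pathlib import Path


def find_essays(folder, pattern="*.html"):
    """Return every file in folder matching pattern, sorted by name."""
    return sorted(Path(folder).glob(pattern))


def save_counts(counts, csv_path):
    """Write (word, count) rows to csv_path, most frequent first."""
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["word", "count"])
        writer.writerows(counts.most_common())


# Demonstration in a throwaway directory with placeholder files
with tempfile.TemporaryDirectory() as tmp:
    for name in ("essay2.html", "essay1.html", "notes.txt"):
        (Path(tmp) / name).write_text("<html></html>", encoding="utf-8")

    names = [p.name for p in find_essays(tmp)]

    out = Path(tmp) / "frequency.csv"
    save_counts(Counter({"parents": 4, "teachers": 2}), out)
    saved = out.read_text(encoding="utf-8")

print(names)
print(saved)
```

With a helper like find_essays, the list of paths could be passed straight to multiple_file_frequency, so the exported files would no longer need to be renamed by hand.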

Reference

GitHub project link
