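See also: Fun with Word Segmentation (2): I Analyzed the Complete Three-Body Trilogy, and This Is the Three-Body I Saw

In Fun with Word Segmentation (2), I used word segmentation to extract the top-n words from the full text of the Three-Body trilogy, together with their frequencies. This post goes one step further and presents those words as a word cloud.

Without further ado, here is the code: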
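To make the data flow concrete before the full script, here is a minimal sketch of the step that turns tagged words into a word-to-frequency dictionary, which is the shape `WordCloud.generate_from_frequencies` takes. The sample pairs and the names `tagged_words` and `top_frequencies` are made up for illustration; the tags follow jieba's convention (`nr` person name, `n` noun, `v` verb, `d` adverb).

```python
from collections import Counter

# Hypothetical (word, part-of-speech) pairs standing in for real jieba output
tagged_words = [
    ('羅輯', 'nr'), ('知道', 'v'), ('羅輯', 'nr'),
    ('世界', 'n'), ('已經', 'd'), ('世界', 'n'), ('羅輯', 'nr'),
]

def top_frequencies(pairs, stop_flags, topn):
    """Drop words whose tag is in stop_flags, then count the topn most frequent."""
    kept = [word for word, flag in pairs if flag not in stop_flags]
    return dict(Counter(kept).most_common(topn))

freq = top_frequencies(tagged_words, stop_flags={'v', 'd'}, topn=2)
print(freq)  # {'羅輯': 3, '世界': 2}
```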

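#!/usr/bin/env python3
# coding: utf-8
# Draw a word cloud of the complete Three-Body trilogy
from collections import Counter

import jieba.posseg as psg
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
# ImageColorGenerator is only needed for the optional recolor step below
from wordcloud import WordCloud, ImageColorGenerator

# Segment the text, tag each word with its part of speech, and cache the result in a file
def cut_and_cache(text):
    # Segmenting the whole text takes a long time, so the result is written to
    # cut_result.txt on the first run and reused afterwards.
    # The cache holds one word per line, in the format:
    # word<TAB>part-of-speech
    words_with_attr = [(x.word, x.flag) for x in psg.cut(text) if len(x.word) >= 2]
    print(len(words_with_attr))
    with open('cut_result.txt', 'w', encoding='utf-8') as f:
        for word, flag in words_with_attr:
            f.write('{0}\t{1}\n'.format(word, flag))
    return words_with_attr

# Read the list of (word, part-of-speech) pairs back from cut_result.txt
def read_cut_result():
    words_with_attr = []
    # Read the file as UTF-8 so the word cloud does not show garbled characters later
    with open('cut_result.txt', 'r', encoding='utf-8') as f:
        for line in f:
            pair = line.split()
            if len(pair) < 2:
                continue
            words_with_attr.append((pair[0], pair[1]))
    return words_with_attr

# Count the topn most frequent words and write them to top<topn>_words.txt
# (for example top300_words.txt), one word per line, in the format:
# word,count
def get_topn_words(words, topn):
    c = Counter(words).most_common(topn)
    top_words_with_freq = {}
    with open('top{0}_words.txt'.format(topn), 'w', encoding='utf-8') as f:
        for word, count in c:
            f.write('{0},{1}\n'.format(word, count))
            top_words_with_freq[word] = count
    return top_words_with_freq

# Given the path of a text file and topn, return its topn keywords and their frequencies
def get_top_words(file_path, topn):
    # Read the text file, then segment and cache it. This only needs to run once;
    # on later runs the next three lines can be commented out.
    with open(file_path, 'r', encoding='utf-8') as f:
        text = f.read()
    words_with_attr = cut_and_cache(text)

    # Read the list of (word, part-of-speech) pairs back from cut_result.txt
    words_with_attr = read_cut_result()

    # Parts of speech to filter out
    stop_attr = ['a','ad','b','c','d','f','df','m','mq','p','r','rr','s','t','u','v','z']

    # Drop the words whose part of speech is in the list above
    words = [x[0] for x in words_with_attr if x[1] not in stop_attr]

    # Get the topn words and save them to a file. top_words_with_freq is a dict
    # that is used to build the word cloud, in the form:
    # {'aa':1002,'bb':879,'cc':456}
    top_words_with_freq = get_topn_words(words=words, topn=topn)

    return top_words_with_freq

# Build a word cloud from a background image, a frequency dict and a font file,
# then save it under the given name
def generate_word_cloud(img_bg_path, top_words_with_freq, font_path, to_save_img_path, background_color='white'):
    # Load the background image as an array (scipy.misc.imread no longer exists in current SciPy)
    img_bg = np.array(Image.open(img_bg_path))

    # Create the word cloud object
    wc = WordCloud(font_path=font_path,  # font file; must contain Chinese glyphs
    background_color=background_color,  # background color of the word cloud, white by default
    max_words=500,  # show at most 500 words
    mask=img_bg,  # background image used as the mask
    max_font_size=50,  # largest font size
    random_state=30,  # seed for the random layout, so repeated runs give the same picture
    width=1000,  # width of the word cloud image (ignored when a mask is set)
    margin=5,  # spacing between words
    height=700)  # height of the word cloud image (ignored when a mask is set)

    # Fill the word cloud from top_words_with_freq
    wc.generate_from_frequencies(top_words_with_freq)

    # Display the word cloud with matplotlib
    plt.imshow(wc)
    plt.axis('off')
    plt.show()

    # If the background image has vivid colors, use the two lines below in place of
    # plt.imshow(wc) above to color the words with hues taken from the image
    #img_bg_colors = ImageColorGenerator(img_bg)
    #plt.imshow(wc.recolor(color_func = img_bg_colors))

    # Save the word cloud to an image file
    wc.to_file(to_save_img_path)

def main():
    # Python 3 strings are Unicode already, so the old reload(sys) /
    # sys.setdefaultencoding('utf-8') workaround is not needed

    # Get the 'word: frequency' dict of the topn words; santi.txt is the full text
    # of the Three-Body trilogy in the current directory
    top_words_with_freq = get_top_words('./santi.txt', 300)

    # Generate the word cloud. bg.jpg is a background image in the current directory,
    # yahei.ttf is the Microsoft YaHei font file in the current directory, and
    # santi_cloud.png is the name of the image to be generated
    generate_word_cloud('./bg.jpg', top_words_with_freq, './yahei.ttf', './santi_cloud.png')

    print('finish')

if __name__ == '__main__':
    main()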
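The script relies on commenting out the segmentation lines by hand after the first run. One alternative, sketched here with only the standard library, is to segment only when the cache file is missing. `load_or_segment` and its `segment` callback are hypothetical names, not part of the script above; in practice the callback would wrap `psg.cut`.

```python
import os

def load_or_segment(text, cache_path, segment):
    """Return (word, flag) pairs, reusing cache_path when it already exists.

    `segment` is any callable that maps text to an iterable of (word, flag)
    pairs; it is only called when the cache file is missing.
    """
    if os.path.exists(cache_path):
        with open(cache_path, 'r', encoding='utf-8') as f:
            rows = (line.rstrip('\n').split('\t') for line in f)
            return [(r[0], r[1]) for r in rows if len(r) == 2]

    # Cache miss: run the slow segmentation once and store the result
    pairs = list(segment(text))
    with open(cache_path, 'w', encoding='utf-8') as f:
        for word, flag in pairs:
            f.write('{0}\t{1}\n'.format(word, flag))
    return pairs
```

With this approach the cache file has to be deleted by hand whenever the source text changes, since the sketch does not check whether the cache is stale.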

In the code above, bg.jpg is the picture below: the silhouette of a leopard, like a hunter lurking in the dark forest:

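Note: the picture used as the word cloud background must have a sharply defined outline, and the color of its subject must contrast strongly with the picture's own background. Only then will the generated word cloud come out clearly. Silhouette pictures usually meet this requirement most easily.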
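Contrast matters because `WordCloud` treats pure white pixels in the mask as off-limits and draws words only on the rest. If a picture's background is light but not pure white, one option is to threshold it first. The sketch below assumes a dark subject on a light background; `binarize_mask` and the threshold of 128 are illustrative choices, not part of the original script.

```python
import numpy as np

def binarize_mask(gray, threshold=128):
    """Turn a grayscale image array into a strict black/white mask.

    Pixels at or above `threshold` become 255 (masked out by WordCloud);
    darker pixels become 0 (available for words).
    """
    gray = np.asarray(gray)
    return np.where(gray >= threshold, 255, 0).astype(np.uint8)

# Tiny synthetic "image": a dark blob on an off-white background
sample = np.array([[240, 235, 250],
                   [230,  20,  35],
                   [245,  15, 228]])
mask = binarize_mask(sample)
print(mask)
```

For a real file, the grayscale array could come from Pillow, for example `np.array(Image.open('./bg.jpg').convert('L'))`, and the result would be passed as the `mask` argument.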

The full text of the trilogy, santi.txt, is easy to find online.

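Note: there is one pitfall to watch for. Before generating the word cloud, santi.txt must be converted to UTF-8 encoding, otherwise you may not get the expected result.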
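Chinese text files found online are sometimes encoded as GBK or GB18030 rather than UTF-8. A minimal conversion sketch follows; the source encoding `gb18030` is an assumption (check your own file), and `convert_to_utf8` is a hypothetical helper name.

```python
def convert_to_utf8(src_path, dst_path, src_encoding='gb18030'):
    """Re-encode a text file as UTF-8 and return the number of characters written.

    Raises UnicodeDecodeError if the file is not valid `src_encoding`,
    which is a useful signal that the guessed encoding was wrong.
    """
    with open(src_path, 'r', encoding=src_encoding) as src:
        text = src.read()
    with open(dst_path, 'w', encoding='utf-8') as dst:
        dst.write(text)
    return len(text)
```

A call such as `convert_to_utf8('santi.txt', 'santi_utf8.txt')` writes a UTF-8 copy, and the new path can then be used as the input text.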

Running the code above produces the following word cloud:

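Finally, you can point the background image and the text file at any other image and text, run the code above, and get your own word cloud right away!

The code is available at: my GitHub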
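One way to avoid editing the script for every new picture or text is to read the paths from the command line. This is only a sketch using `argparse`; the option names are illustrative, and the defaults mirror the values hard-coded in `main()`.

```python
import argparse

def parse_args(argv=None):
    """Parse the input and output paths for one word-cloud run."""
    parser = argparse.ArgumentParser(description='Generate a word cloud from a text file.')
    parser.add_argument('text', help='path to the UTF-8 text file')
    parser.add_argument('--mask', default='./bg.jpg', help='background image used as the mask')
    parser.add_argument('--font', default='./yahei.ttf', help='font file that covers the text')
    parser.add_argument('--out', default='./santi_cloud.png', help='where to save the word cloud')
    parser.add_argument('--topn', type=int, default=300, help='number of words to keep')
    return parser.parse_args(argv)

# Passing a list simulates a command line; argv=None would read sys.argv instead
args = parse_args(['./santi.txt', '--topn', '100'])
print(args.text, args.mask, args.topn)  # ./santi.txt ./bg.jpg 100
```

Inside `main()`, the parsed values would replace the literals, for example `get_top_words(args.text, args.topn)` followed by `generate_word_cloud(args.mask, top_words_with_freq, args.font, args.out)`.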

Fun with Word Segmentation (3): Drawing a Word Cloud of the Complete Three-Body Trilogy
