
Load the texts in nltk.book and answer the following questions:

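- How many words are in text2? How many distinct words?
- Try writing a slice expression that extracts the last two words of text2.
- Find the 2-gram collocations in text5 and count how often each one occurs.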

Downloading and installing nltk and nltk_data

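Installing the nltk package
- On Mac and Unix
  - Run in a terminal: `sudo pip install -U nltk`
  - NumPy is optional; the official install instructions list it as an optional extra.
- On Windows
  - Get the tar package (download link).
  - Extract it, open cmd, change into the extracted folder, and run `python setup.py install`.
- Start Python and run `import nltk`; if no error appears, the installation worked.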
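The `import nltk` check can also be run from a script without triggering an ImportError. A minimal sketch using only the standard library; `is_installed` is a hypothetical helper name, and `importlib.util.find_spec` only locates the package without importing it:

```python
import importlib.util


def is_installed(package: str) -> bool:
    """Return True if `package` can be found by this Python interpreter.

    find_spec locates a top-level module without executing it, so a
    missing package yields None instead of raising ImportError.
    """
    return importlib.util.find_spec(package) is not None


if __name__ == "__main__":
    print("nltk installed:", is_installed("nltk"))
```

If this prints False right after a successful install, `python -m pip install -U nltk` installs into the same interpreter that runs the check.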
Downloading nltk_data
- Method 1: run the following in a Python session:
```
import nltk
nltk.download()
```
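  A download window opens; choose a target directory and download the packages you need.
  
  Drawback: it is very slow. I tried countless times and never got it to finish.
- Method 2: download nltk_data manually and put it in Python's lib directory.
  - I tried the files the teacher provided, but they failed to load, so I looked for another source.
  - Download location: GitHub; the contents of the packages folder are exactly what goes into nltk_data.
  - Put the downloaded nltk_data folder in the Python directory; your user directory works too. The LookupError message lists every directory NLTK searches.
    
    So any one of those directories will do; pick whichever is easiest for you to find.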
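The search list itself lives in `nltk.data.path` (print it after `import nltk`); a directory can also be added with `nltk.data.path.append(...)` or the `NLTK_DATA` environment variable. How one copy is picked from that list can be modelled in a few lines. This is a simplified sketch based on the lookup order visible in the traceback below (unzipped folder first, `.zip` only as a fallback), not NLTK's actual code; `locate` is a hypothetical helper:

```python
import os


def locate(resource, search_paths):
    """Simplified model of how NLTK resolves a resource like 'corpora/webtext'.

    Pass 1: an unzipped folder in any search directory wins.
    Pass 2: otherwise fall back to '<name>.zip' where the folder would be.
    Returns the matching path, or None if nothing is found.
    """
    paths = list(search_paths)
    parts = resource.split("/")
    # Pass 1: plain (unzipped) folders, in search-path order
    for base in paths:
        candidate = os.path.join(base, *parts)
        if os.path.exists(candidate):
            return candidate
    # Pass 2: zip archives, e.g. corpora/webtext.zip
    for base in paths:
        candidate = os.path.join(base, *parts) + ".zip"
        if os.path.exists(candidate):
            return candidate
    return None
```

With NLTK installed, `locate("corpora/webtext", nltk.data.path)` shows which copy this model would pick. Because every directory in the list is checked, it does not matter which one holds nltk_data.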

How I fixed a loading error

Loading nltk.book raised an error. Here is the output:

```
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\yishikeji-05\Anaconda3\lib\site-packages\nltk\book.py", line 35, in <module>
    text6 = Text(webtext.words('grail.txt'),
  File "C:\Users\yishikeji-05\Anaconda3\lib\site-packages\nltk\corpus\util.py", line 99, in __getattr__
    self.__load()
  File "C:\Users\yishikeji-05\Anaconda3\lib\site-packages\nltk\corpus\util.py", line 61, in __load
    root = nltk.data.find('corpora/%s' % self.__name)
  File "C:\Users\yishikeji-05\Anaconda3\lib\site-packages\nltk\data.py", line 628, in find
    return find(modified_name, paths)
  File "C:\Users\yishikeji-05\Anaconda3\lib\site-packages\nltk\data.py", line 614, in find
    return ZipFilePathPointer(p, zipentry)
  File "C:\Users\yishikeji-05\Anaconda3\lib\site-packages\nltk\compat.py", line 561, in _decorator
    return init_func(*args, **kwargs)
  File "C:\Users\yishikeji-05\Anaconda3\lib\site-packages\nltk\data.py", line 469, in __init__
    zipfile = OpenOnDemandZipFile(os.path.abspath(zipfile))
  File "C:\Users\yishikeji-05\Anaconda3\lib\site-packages\nltk\compat.py", line 561, in _decorator
    return init_func(*args, **kwargs)
  File "C:\Users\yishikeji-05\Anaconda3\lib\site-packages\nltk\data.py", line 979, in __init__
    zipfile.ZipFile.__init__(self, filename)
  File "C:\Users\yishikeji-05\Anaconda3\lib\zipfile.py", line 1026, in __init__
    self._RealGetContents()
  File "C:\Users\yishikeji-05\Anaconda3\lib\zipfile.py", line 1093, in _RealGetContents
    raise BadZipFile("File is not a zip file")
zipfile.BadZipFile: File is not a zip file
```

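The error type is BadZipFile: a file that was supposed to be a zip archive is not a valid one. I searched everywhere for a fix and found nothing.
Then I read the traceback more carefully. The first frame inside nltk is `text6 = Text(webtext.words('grail.txt'),` in book.py, and text1 through text5 had already loaded.
So the problem had to be the webtext corpus. I looked in nltk_data and, sure enough, found a webtext.zip archive that did contain grail.txt. I extracted it and tried again: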
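Since the failure came from a zip file Python could not open, other archives in nltk_data can be checked the same way by asking `zipfile.is_zipfile` about each one. A minimal sketch using only the standard library; `find_bad_zips` is a hypothetical helper, and `root` is assumed to be your own nltk_data directory:

```python
import os
import zipfile


def find_bad_zips(root):
    """Return sorted paths of *.zip files under `root` that zipfile rejects.

    is_zipfile returns False when no end-of-central-directory record is
    found, the same condition that makes ZipFile raise
    BadZipFile("File is not a zip file").
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".zip"):
                path = os.path.join(dirpath, name)
                if not zipfile.is_zipfile(path):
                    bad.append(path)
    return sorted(bad)
```

Any path it returns is a candidate for re-downloading, or for extracting by hand as done here.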

```
>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908
>>>
```

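Ha, it magically works now. It is probably not magic: in this NLTK version, `nltk.data.find` first looks for an unzipped corpora/webtext folder and only falls back to webtext.zip when no folder exists (that fallback is the `return find(modified_name, paths)` frame in the traceback). Once the extracted folder is in place, the zip that Python's zipfile module rejected is never opened.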

Now for the homework

nltk and nltk.book are already loaded above, so I'll continue at the same interactive prompt.

How many words are in text2? How many distinct words?
```
>>> len(text2)
141576
>>> len(set(text2))
6833
```
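`len` counts tokens, which include punctuation, and `set` compares strings exactly, so "The" and "the" count as two distinct words. A small sketch on a made-up token list (the helpers `token_stats` and `word_stats` are hypothetical) shows both counts plus a variant that keeps only lower-cased alphabetic words; the NLTK book calls the distinct/total ratio lexical diversity:

```python
def token_stats(tokens):
    """Return (token count, distinct token count, lexical diversity).

    Same idea as len(text2) and len(set(text2)): punctuation counts as a
    token and the comparison is case-sensitive, so 'The' and 'the' differ.
    """
    tokens = list(tokens)
    n_tokens = len(tokens)
    n_types = len(set(tokens))
    diversity = n_types / n_tokens if n_tokens else 0.0
    return n_tokens, n_types, diversity


def word_stats(tokens):
    """Variant that keeps only alphabetic tokens and lower-cases them first."""
    words = [t.lower() for t in tokens if t.isalpha()]
    return token_stats(words)


sample = ["The", "cat", "saw", "the", "dog", ".", "The", "END"]
print(token_stats(sample))  # (8, 7, 0.875)
print(word_stats(sample))   # 7 words, 5 of them distinct
```

Passing text2 directly works the same way, since it supports `len` and iteration. With the numbers above, text2's ratio is 6833 / 141576 ≈ 0.048.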
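Try writing a slice expression that extracts the last two words of text2.

Can we simply treat text2 as a list and take its last two items? It works: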
```
>>> text2[-2:]
['THE', 'END']
```
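Slicing clips out-of-range bounds instead of raising an error, and one edge case is worth knowing when the count comes from a variable: `-0` equals `0`, so `x[-0:]` returns the whole sequence. A small sketch on a made-up list; `last_n` is a hypothetical helper:

```python
def last_n(tokens, n):
    """Return the last n tokens as a list.

    tokens[-n:] is fine for n >= 1, but tokens[-0:] means tokens[0:] and
    returns everything, so n <= 0 is handled explicitly.
    """
    if n <= 0:
        return []
    return list(tokens[-n:])


words = ["Sense", "and", "Sensibility", "THE", "END"]
print(words[-2:])        # ['THE', 'END']
print(words[-0:])        # the whole list, not an empty one
print(words[-10:])       # no IndexError: the slice is clipped to the list
print(last_n(words, 0))  # []
```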

Find the 2-gram collocations in text5 and count how often each one occurs.

The complete script:

```
from collections import OrderedDict

from nltk.book import text2, text5

# text2: number of words (tokens) and number of distinct words
print(len(text2), len(set(text2)))

# text2: the last two words
print(text2[-2:])


# text5: 2-grams and their frequencies
def get_ngrams(tokens, n):
    """Count every run of n consecutive tokens, joined with a space."""
    output = {}
    for i in range(len(tokens) - n + 1):
        ngram = " ".join(tokens[i:i + n])
        output[ngram] = output.get(ngram, 0) + 1
    return output


ngrams = get_ngrams(text5, 2)
print(ngrams)
# Sort by count, most frequent first
ngrams_freq = OrderedDict(sorted(ngrams.items(), key=lambda t: t[1], reverse=True))
print(ngrams_freq)
```

Output:

2-gram results:

[Screenshot: 2-gram counts]

Frequency-sorted results:

[Screenshot: 2-grams sorted by frequency]
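The counting function in the script builds its keys by joining tokens with spaces. A shorter standard-library sketch keys each 2-gram by a tuple and lets `collections.Counter` do the tallying and sorting; `count_ngrams` is a hypothetical name and the token list is made up. With NLTK itself, `nltk.FreqDist(nltk.bigrams(text5)).most_common(20)` gives the same kind of ranking, while `text5.collocations()` ranks word pairs by an association measure and filters out stopwords, so its "collocations" are not simply the most frequent bigrams.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens, keyed by a tuple.

    zip() over n copies of the list, each shifted one more position,
    yields each window exactly once and stops at the shortest copy.
    """
    seq = list(tokens)
    windows = zip(*(seq[i:] for i in range(n)))
    return Counter(windows)


chat = ["lol", "i", "am", "here", "lol", "i", "am", "back"]
bigrams = count_ngrams(chat, 2)
print(bigrams.most_common(3))  # ('lol', 'i') and ('i', 'am') each occur twice
```

Tuple keys keep the two words separate, so nothing is lost if a token itself contains a space, and `most_common` replaces the manual `sorted(...)` call.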


Homework Notes 10_nltk