在NLP任務(wù)中,常需要分析單詞的詞性,借助nltk庫(kù)的pos_tag方法可以較好地實(shí)現(xiàn)。
以下是一個(gè)例子:
import nltk
line = 'i love this world which was beloved by all the people here'
tokens = nltk.word_tokenize(line)
# ['i', 'love', 'this', 'world', 'which', 'was', 'beloved', 'by',
# 'all', 'the', 'people', 'here']
pos_tags = nltk.pos_tag(tokens)
# [('i', 'RB'), ('love', 'VBP'), ('this', 'DT'), ('world', 'NN'), ('which', 'WDT'),
# ('was', 'VBD'), ('beloved', 'VBN'), ('by', 'IN'), ('all', 'PDT'), ('the', 'DT'),
# ('people', 'NNS'), ('here', 'RB')]
for word,pos in pos_tags:
if (pos == 'NN' or pos == 'NNP' or pos == 'NNS' or pos == 'NNPS'):
print word,pos
# world NN
# people NNS
作為nltk的替代,TextBlob庫(kù)能夠更進(jìn)一步進(jìn)行詞組劃分,例如“computer science”會(huì)被當(dāng)做一個(gè)單詞,而非"computer"和"science"
from textblob import TextBlob
txt = """Natural language processing (NLP) is a field of computer science, artificial intelligence, and computational linguistics concerned with the inter
actions between computers and human (natural) languages."""
blob = TextBlob(txt)
print(blob.noun_phrases)
# [u'natural language processing', 'nlp', u'computer science', u'artificial intelligence', u'computational linguistics']
更多例子請(qǐng)參考nltk官方教科書(shū)第五章
其中pos_tag分析出來(lái)的詞性含義按照賓夕法尼亞大學(xué)tag詞性對(duì)照表
tag | 含義 |
---|---|
CC | Coordinating conjunction |
CD | Cardinal number |
DT | Determiner |
EX | Existential there |
FW | Foreign word |
IN | Preposition or subordinating conjunction |
JJ | Adjective |
JJR | Adjective, comparative |
JJS | Adjective, superlative |
LS | List item marker |
MD | Modal |
NN | Noun, singular or mass |
NNS | Noun, plural |
NNP | Proper noun, singular |
NNPS | Proper noun, plural |
PDT | Predeterminer |
POS | Possessive ending |
PRP | Personal pronoun |
PRP$ | Possessive pronoun |
RB | Adverb |
RBR | Adverb, comparative |
RBS | Adverb, superlative |
RP | Particle |
SYM | Symbol |
TO | to |
UH | Interjection |
VB | Verb, base form |
VBD | Verb, past tense |
VBG | Verb, gerund or present participle |
VBN | Verb, past participle |
VBP | Verb, non-3rd person singular present |
VBZ | Verb, 3rd person singular present |
WDT | Wh-determiner |
WP | Wh-pronoun |
WP$ | Possessive wh-pronoun |
WRB | Wh-adverb |