POS Tagging with the Stanford POS Tagger
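from nltk.tag import StanfordPOSTagger
from nltk.tokenize import word_tokenize

# Replace <extracted directory> with the location where the Stanford POS Tagger archive was unzipped.
STANFORD_POS_MODEL_PATH = "<extracted directory>/stanford-postagger-full-2018-02-27/models/english-bidirectional-distsim.tagger"
STANFORD_POS_JAR_PATH = "<extracted directory>/stanford-postagger-full-2018-02-27/stanford-postagger-3.9.1.jar"

# The tagger runs the Stanford JAR, so Java must be installed.
pos_tagger = StanfordPOSTagger(STANFORD_POS_MODEL_PATH, STANFORD_POS_JAR_PATH)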
text = """Facebook CEO Mark Zuckerberg acknowledged a range of mistakes on Wednesday,
including allowing most of its two billion users to have their public profile data scraped by outsiders.
However, even as he took responsibility, he maintained he was the best person to fix the problems he created."""
tokens = word_tokenize(text)
print(tokens)
print()
print(pos_tagger.tag(tokens))
['Facebook', 'CEO', 'Mark', 'Zuckerberg', 'acknowledged', 'a', 'range', 'of', 'mistakes', 'on', 'Wednesday', ',', 'including', 'allowing', 'most', 'of', 'its', 'two', 'billion', 'users', 'to', 'have', 'their', 'public', 'profile', 'data', 'scraped', 'by', 'outsiders', '.', 'However', ',', 'even', 'as', 'he', 'took', 'responsibility', ',', 'he', 'maintained', 'he', 'was', 'the', 'best', 'person', 'to', 'fix', 'the', 'problems', 'he', 'created', '.']
[('Facebook', 'NNP'), ('CEO', 'NNP'), ('Mark', 'NNP'), ('Zuckerberg', 'NNP'), ('acknowledged', 'VBD'), ('a', 'DT'), ('range', 'NN'), ('of', 'IN'), ('mistakes', 'NNS'), ('on', 'IN'), ('Wednesday', 'NNP'), (',', ','), ('including', 'VBG'), ('allowing', 'VBG'), ('most', 'JJS'), ('of', 'IN'), ('its', 'PRP$'), ('two', 'CD'), ('billion', 'CD'), ('users', 'NNS'), ('to', 'TO'), ('have', 'VB'), ('their', 'PRP$'), ('public', 'JJ'), ('profile', 'NN'), ('data', 'NNS'), ('scraped', 'VBN'), ('by', 'IN'), ('outsiders', 'NNS'), ('.', '.'), ('However', 'RB'), (',', ','), ('even', 'RB'), ('as', 'IN'), ('he', 'PRP'), ('took', 'VBD'), ('responsibility', 'NN'), (',', ','), ('he', 'PRP'), ('maintained', 'VBD'), ('he', 'PRP'), ('was', 'VBD'), ('the', 'DT'), ('best', 'JJS'), ('person', 'NN'), ('to', 'TO'), ('fix', 'VB'), ('the', 'DT'), ('problems', 'NNS'), ('he', 'PRP'), ('created', 'VBD'), ('.', '.')]
# Keep only the words whose tag starts with "N" (nouns) or "V" (verbs).
noun_and_verbs = []
for word, tag in pos_tagger.tag(tokens):
    if tag.startswith("V") or tag.startswith("N"):
        noun_and_verbs.append(word)
print(', '.join(noun_and_verbs))
Facebook, CEO, Mark, Zuckerberg, acknowledged, range, mistakes, Wednesday, including, allowing, users, have, profile, data, scraped, outsiders, took, responsibility, maintained, was, person, fix, problems, created
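The filtering step above does not depend on the tagger itself, so it can be tried on any list of (word, tag) pairs without Java or the Stanford model. The following is a minimal sketch, not part of the original post: `filter_by_tag_prefix` is a hypothetical helper name, and the sample list is copied from the start of the tagged output shown earlier.

```python
def filter_by_tag_prefix(tagged_tokens, prefixes=("N", "V")):
    """Return the words whose POS tag starts with any of the given prefixes."""
    # str.startswith accepts a tuple, so one call covers every prefix.
    return [word for word, tag in tagged_tokens if tag.startswith(tuple(prefixes))]


# First few (word, tag) pairs from the tagger output above.
sample = [
    ("Facebook", "NNP"),
    ("CEO", "NNP"),
    ("Mark", "NNP"),
    ("Zuckerberg", "NNP"),
    ("acknowledged", "VBD"),
    ("a", "DT"),
    ("range", "NN"),
    ("of", "IN"),
    ("mistakes", "NNS"),
]

print(", ".join(filter_by_tag_prefix(sample)))
# Facebook, CEO, Mark, Zuckerberg, acknowledged, range, mistakes
```

Passing a different tuple, for example `("V",)`, would keep only the verbs. Matching on the first letter works here because every Penn Treebank noun tag starts with "N" and every verb tag starts with "V".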
novdov.github.io/nlp/2018/04/05/NLP-POS-Tagging-%ED%92%88%EC%82%AC-%ED%83%9C%EA%B9%85/
POS tag abbreviations
www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html
Penn Treebank P.O.S. Tags
Number  Tag   Description
1.      CC    Coordinating conjunction
2.      CD    Cardinal number
3.      DT    Determiner
4.      EX    Existential there
5.      FW    Foreign word
6.      IN    Preposition or subordinating conjunction
7.      JJ    Adjective
8.      JJR   Adjective, comparative
9.      JJS   Adjective, superlative
10.     LS    List item marker
11.     MD    Modal
12.     NN    Noun, singular or mass
13.     NNS   Noun, plural
14.     NNP   Proper noun, singular
15.     NNPS  Proper noun, plural
16.     PDT   Predeterminer
17.     POS   Possessive ending
18.     PRP   Personal pronoun
19.     PRP$  Possessive pronoun
20.     RB    Adverb
21.     RBR   Adverb, comparative
22.     RBS   Adverb, superlative
23.     RP    Particle
24.     SYM   Symbol
25.     TO    to
26.     UH    Interjection
27.     VB    Verb, base form
28.     VBD   Verb, past tense
29.     VBG   Verb, gerund or present participle
30.     VBN   Verb, past participle
31.     VBP   Verb, non-3rd person singular present
32.     VBZ   Verb, 3rd person singular present
33.     WDT   Wh-determiner
34.     WP    Wh-pronoun
35.     WP$   Possessive wh-pronoun
36.     WRB   Wh-adverb
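The tag list above can also be used directly in code to make tagger output easier to read. The following is a small sketch under stated assumptions: `PENN_TAGS` and `describe` are hypothetical names chosen for illustration, and the dictionary holds only a subset of the tags listed above.

```python
# Subset of the Penn Treebank tags listed above; extend as needed.
PENN_TAGS = {
    "DT": "Determiner",
    "IN": "Preposition or subordinating conjunction",
    "JJS": "Adjective, superlative",
    "NN": "Noun, singular or mass",
    "NNP": "Proper noun, singular",
    "NNS": "Noun, plural",
    "VBD": "Verb, past tense",
}


def describe(tagged_tokens, tags=PENN_TAGS):
    """Pair each word with a readable description of its tag."""
    # Tags missing from the dictionary (punctuation, for example) fall back to the raw tag.
    return [(word, tags.get(tag, tag)) for word, tag in tagged_tokens]


for word, description in describe([("Facebook", "NNP"), ("acknowledged", "VBD"), (",", ",")]):
    print(f"{word}: {description}")
```

The fallback to the raw tag is a design choice made for this sketch: the tagger emits punctuation tags such as "," and "." that do not appear in the numbered list, so a strict lookup would raise a KeyError on ordinary sentences.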