반응형

Gensim word vector visualization of various word vectors

web.stanford.edu/class/cs224n/materials/Gensim%20word%20vector%20visualization.html

 

Gensim word vector visualization

For looking at word vectors, I'll use Gensim. We also use it in hw1 for word vectors. Gensim isn't really a deep learning package. It's a package for for word and text similarity modeling, which started with (LDA-style) topic models and grew into SVD and n

web.stanford.edu

import numpy as np

# Get the interactive Tools for Matplotlib
%matplotlib notebook
import matplotlib.pyplot as plt
plt.style.use('ggplot')

from sklearn.decomposition import PCA

from gensim.test.utils import datapath, get_tmpfile
from gensim.models import KeyedVectors
from gensim.scripts.glove2word2vec import glove2word2vec
 
glove_file = datapath('/Users/manning/Corpora/GloVe/glove.6B.100d.txt')
word2vec_glove_file = get_tmpfile("glove.6B.100d.word2vec.txt")
glove2word2vec(glove_file, word2vec_glove_file)
model = KeyedVectors.load_word2vec_format(word2vec_glove_file)
model.most_similar('obama')
model.most_similar('banana')
model.most_similar(negative='banana')
result = model.most_similar(positive=['woman', 'king'], negative=['man'])
print("{}: {:.4f}".format(*result[0]))
def analogy(x1, x2, y1):
    result = model.most_similar(positive=[y1, x2], negative=[x1])
    return result[0][0]
Analogy

analogy('japan', 'japanese', 'australia')
analogy('australia', 'beer', 'france')
analogy('obama', 'clinton', 'reagan')
analogy('tall', 'tallest', 'long')
analogy('good', 'fantastic', 'bad')
print(model.doesnt_match("breakfast cereal dinner lunch".split()))
 
def display_pca_scatterplot(model, words=None, sample=0):
    if words == None:
        if sample > 0:
            words = np.random.choice(list(model.vocab.keys()), sample)
        else:
            words = [ word for word in model.vocab ]
        
    word_vectors = np.array([model[w] for w in words])

    twodim = PCA().fit_transform(word_vectors)[:,:2]
    
    plt.figure(figsize=(6,6))
    plt.scatter(twodim[:,0], twodim[:,1], edgecolors='k', c='r')
    for word, (x,y) in zip(words, twodim):
        plt.text(x+0.05, y+0.05, word)
 
display_pca_scatterplot(model, 
                        ['coffee', 'tea', 'beer', 'wine', 'brandy', 'rum', 'champagne', 'water',
                         'spaghetti', 'borscht', 'hamburger', 'pizza', 'falafel', 'sushi', 'meatballs',
                         'dog', 'horse', 'cat', 'monkey', 'parrot', 'koala', 'lizard',
                         'frog', 'toad', 'monkey', 'ape', 'kangaroo', 'wombat', 'wolf',
                         'france', 'germany', 'hungary', 'luxembourg', 'australia', 'fiji', 'china',
                         'homework', 'assignment', 'problem', 'exam', 'test', 'class',
                         'school', 'college', 'university', 'institute'])
display_pca_scatterplot(model, sample=300)
반응형
반응형

Stanford Pos Tagger를 이용한 POS Tagging

from nltk.tag import StanfordPOSTagger
from nltk.tokenize import word_tokenize

STANFORD_POS_MODEL_PATH = "압축 푼 디렉토리/stanford-postagger-full-2018-02-27/models/english-bidirectional-distsim.tagger"
STANFORD_POS_JAR_PATH = "압축 푼 디렉토리/stanford-postagger-full-2018-02-27/stanford-postagger-3.9.1.jar"

pos_tagger = StanfordPOSTagger(STANFORD_POS_MODEL_PATH, STANFORD_POS_JAR_PATH)

text = """Facebook CEO Mark Zuckerberg acknowledged a range of mistakes on Wednesday, 
including allowing most of its two billion users to have their public profile data scraped by outsiders. 
However, even as he took responsibility, he maintained he was the best person to fix the problems he created."""

tokens = word_tokenize(text)
print(tokens)
print()
print(pos_tagger.tag(tokens))

['Facebook', 'CEO', 'Mark', 'Zuckerberg', 'acknowledged', 'a', 'range', 'of', 'mistakes', 'on', 'Wednesday', ',', 'including', 'allowing', 'most', 'of', 'its', 'two', 'billion', 'users', 'to', 'have', 'their', 'public', 'profile', 'data', 'scraped', 'by', 'outsiders', '.', 'However', ',', 'even', 'as', 'he', 'took', 'responsibility', ',', 'he', 'maintained', 'he', 'was', 'the', 'best', 'person', 'to', 'fix', 'the', 'problems', 'he', 'created', '.']

[('Facebook', 'NNP'), ('CEO', 'NNP'), ('Mark', 'NNP'), ('Zuckerberg', 'NNP'), ('acknowledged', 'VBD'), ('a', 'DT'), ('range', 'NN'), ('of', 'IN'), ('mistakes', 'NNS'), ('on', 'IN'), ('Wednesday', 'NNP'), (',', ','), ('including', 'VBG'), ('allowing', 'VBG'), ('most', 'JJS'), ('of', 'IN'), ('its', 'PRP$'), ('two', 'CD'), ('billion', 'CD'), ('users', 'NNS'), ('to', 'TO'), ('have', 'VB'), ('their', 'PRP$'), ('public', 'JJ'), ('profile', 'NN'), ('data', 'NNS'), ('scraped', 'VBN'), ('by', 'IN'), ('outsiders', 'NNS'), ('.', '.'), ('However', 'RB'), (',', ','), ('even', 'RB'), ('as', 'IN'), ('he', 'PRP'), ('took', 'VBD'), ('responsibility', 'NN'), (',', ','), ('he', 'PRP'), ('maintained', 'VBD'), ('he', 'PRP'), ('was', 'VBD'), ('the', 'DT'), ('best', 'JJS'), ('person', 'NN'), ('to', 'TO'), ('fix', 'VB'), ('the', 'DT'), ('problems', 'NNS'), ('he', 'PRP'), ('created', 'VBD'), ('.', '.')]

noun_and_verbs = []
for token in pos_tagger.tag(tokens):
    if token[1].startswith("V") or token[1].startswith("N"):
        noun_and_verbs.append(token[0])
print(', '.join(noun_and_verbs))

Facebook, CEO, Mark, Zuckerberg, acknowledged, range, mistakes, Wednesday, including, allowing, users, have, profile, data, scraped, outsiders, took, responsibility, maintained, was, person, fix, problems, created

novdov.github.io/nlp/2018/04/05/NLP-POS-Tagging-%ED%92%88%EC%82%AC-%ED%83%9C%EA%B9%85/

 

Stanford Pos Tagger를 이용한 POS Tagging

Stanford Pos Tagger를 이용해 POS tagging 방법을 간단하게 알아봅니다.

novdov.github.io

품사 태깅 약어 정보

www.ling.upenn.edu/courses/Fall_2003/ling001/penn_treebank_pos.html

 

Penn Treebank P.O.S. Tags

31. VBP Verb, non-3rd person singular present

www.ling.upenn.edu

Number

Tag

Description

1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition or subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund or present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd person singular present
32. VBZ Verb, 3rd person singular present
33. WDT Wh-determiner
34. WP Wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB Wh-adverb

반응형
반응형

Remove Elements From A Counter

 

counter에서 요소 삭제하기

Question:

How do you remove an element from a counter?

Answer:

Setting a count to zero does not remove an element from a counter. Use del to remove it entirely.

Source: (example.py)

from collections import Counter
 
c = Counter(x=10, y=7, z=3)
print(c)
 
c['z'] = 0
print(c)
 
del c['z']
print(c)

Output:

$ python example.py
Counter({'x': 10, 'y': 7, 'z': 3})
Counter({'x': 10, 'y': 7, 'z': 0})
Counter({'x': 10, 'y': 7})
반응형
반응형

How to add or increment single item of the Python Counter class

 

How to add or increment single item of the Python Counter class

A set uses .update to add multiple items, and .add to add a single one. Why doesn't collections.Counter work the same way? To increment a single Counter item using Counter.update, you have to add i...

stackoverflow.com

python 에서  Counter 증가시키명서 리스트 적용하기. 

>>> c = collections.Counter(a=23, b=-9)

#You can add a new element and set its value like this:

>>> c['d'] = 8
>>> c
Counter({'a': 23, 'd': 8, 'b': -9})


#Increment:

>>> c['d'] += 1
>>> c
Counter({'a': 23, 'd': 9, 'b': -9} 


#Note though that c['b'] = 0 does not delete:

>>> c['b'] = 0
>>> c
Counter({'a': 23, 'd': 9, 'b': 0})


#To delete use del:

>>> del c['b']
>>> c
Counter({'a': 23, 'd': 9})
Counter is a dict subclass
반응형
반응형

Yes Cyber solution is best.

For beginners

  1. Read file in Read mode.
  2. Iterate lines by readlines() or readline()
  3. Use split(",") method to split line by '
  4. Use float to convert string value to float. OR We can use eval() also.
  5. Use list append() method to append tuple to list.
  6. Use try except to prevent code from break.
My text file is:

-34.968398,-6.487265
-34.969448,-6.488250
-34.967364,-6.492370
-34.965735,-6.582322


Code:

p = "/home/vivek/Desktop/test.txt"
result = []
with open(p, "rb") as fp:
    for i in fp.readlines():
        tmp = i.split(",")
        try:
            result.append((float(tmp[0]), float(tmp[1])))
            #result.append((eval(tmp[0]), eval(tmp[1])))
        except:pass

print result


Output:

$ python test.py 
[(-34.968398, -6.487265), (-34.969448, -6.48825), (-34.967364, -6.49237), (-34.965735, -6.582322)]

Read file as a list of tuples, 파일읽어서 튜플 만들기

 

 

반응형
반응형

Sublime Text 3에서 Python 프로그래밍을 하기 위해 필요한 설정을 다루고 있습니다.

 

webnautes.tistory.com/454

 

Sublime Text 3와 함께 Python 프로그래밍(Windows / Ubuntu)

Sublime Text 3에서 Python 프로그래밍을 하기 위해 필요한 설정을 다루고 있습니다. 1. Python 설치 2. Sublime Text 3 설치 3. Sublime Text 3 기본 사용방법 4. Package Control 5. sublimeREPL 플러그인 설치..

webnautes.tistory.com

 

반응형
반응형

style cloud 설정

 

> pip install stylecloud

 

 

- file_path: 입력할 데이터를 텍스트 문서로 지정합니다.

- text: 입력할 데이터를 딕셔너리 자료형으로 지정합니다.

- font_path: 워드클라우드를 그릴 path를 지정합니다.

- size: 사이즈를 지정, (1024, 512)과 같은 형식으로 입력합니다.

- background_color: 배경색을 지정한다. 색이름을 입력하면 된다. ( 예) white )

- icon_name: 어떤 모양으로 그릴 지 입력합니다. fab fa-twitter(트워터 모양), fas fa-dog(강아지), fas fa-flag(깃발), fas fa-fish(물고기) 등이 있다. 띄어쓰기 앞은 폰트를 의미하고, 뒤에는 모양을 의미한다. 그릴 수 있는 모양은 가지수가 좀 많은데, stylecolud패키지가 설치된 폴더에서 static폴더 밑에 fontawesome.min.css파일을 확인하면 알 수 있다.

- font_path: 폰트를 지정한다.

- output_name: 결과를 파일로 저장한다.

 

Generate Modern Stylish Wordcloud : towardsdatascience.com/generate-modern-stylish-wordcloud-with-stylecloud-9cbb059696d2

 

Generate Modern Stylish Wordcloud with stylecloud

But deep down, all of us have always wished for modern-stylish-beautiful wordclouds. That wish has become true with this new python…

towardsdatascience.com

파이썬 wordcloud를 사용한 한글 명사 시각화 :  liveyourit.tistory.com/58

 

파이썬 wordcloud를 사용한 한글 명사 시각화

파이썬 wordcloud는 중요한 단어나 키워드를 시각화해서 보여주는 시각화 도구이다. wordcloud 자체적으로 빈도수를 계산하는 기능이 있다고 하지만 아무래도 한글의 특성이 있다보니, 나는 한글 명

liveyourit.tistory.com

 

#워드 클라우드, 파이썬에서 이쁘게 그리는 방법은 tariat.tistory.com/854?category=678887

 

워드 클라우드, 파이썬에서 이쁘게 그리는 방법은?!

빅데이터가 많은 사람들에게 관심을 받기 시작할 때 많이 볼 수 있었던 것 중에 하나로 워드 클라우드가 있다. 워드 클라우드는 단어별 빈도수를 기준으로 한 단순한 시각화에 불과하지만, 그림

tariat.tistory.com

 

반응형
반응형

wordcloud시 불용어 지정

>pip install wordcloud


from wordcloud import WordCloud

texts = ['이것 은 예문 입니다', '여러분 의 문장을 넣 으세요']
keywords = {'이것':5, '예문':3, '단어':5, '빈도수':3}

wordcloud = WordCloud()
wordcloud = wordcloud.generate_from_text(texts)
wordcloud = wordcloud.generate_from_frequencies(keywords)

###########################################

from wordcloud import WordCloud
from wordcloud import STOPWORDS

stopwords = {'은', '입니다'}

wordcloud = WordCloud(stopwords=stopwords)
wordcloud = wordcloud.generate_from_text(texts)

keywords.pop('영화')
keywords.pop('관람객')
keywords.pop('너무')
keywords.pop('정말')

from wordcloud import WordCloud

wordcloud = WordCloud(
    width = 800,
    height = 800
)

wordcloud = wordcloud.generate_from_frequencies(keywords)

font_path = '/usr/share/fonts/truetype/nanum/NanumBarunGothic.ttf'

from wordcloud import WordCloud

wordcloud = WordCloud(
    font_path = font_path,
    width = 800,
    height = 800
)
wordcloud = wordcloud.generate_from_frequencies(keywords)



from wordcloud import WordCloud

wordcloud = WordCloud(
    font_path = font_path,
    width = 800,
    height = 800
)
wordcloud = wordcloud.generate_from_frequencies(keywords)



%matplotlib inline
import matplotlib.pyplot as plt

fig = plt.figure(figsize=(10, 10))
plt.imshow(array, interpolation="bilinear")
plt.show()
fig.savefig('wordcloud_without_axisoff.png')
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt 
 
stopwords = set(STOPWORDS) 
stopwords.add('워드클라우드') 
 
wordcloud = WordCloud(font_path='font/NanumGothic.ttf',stopwords=stopwords,background_color='white').generate(text)
 
 

 

반응형

+ Recent posts