Programming/Python

[Python] Instagram image crawling 2022.09.15
[python] ChromeDriver - WebDriver for Chrome 2022.09.14
[Python] https://pythontutor.com/ Learn Python, JavaScript, C, C++, and Java logic visualization 2022.08.10
PyCon Korea 2022 - https://2022.pycon.kr/ PyCon Korea 2022.07.21
[Python] Using Customized KoNLPy: adding a dict to Okt 2022.07.04
[python] Saving a Pandas DataFrame as txt 2022.07.04

[Python] Instagram image crawling

홍반장水_ 2022. 9. 15. 14:18

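This uses an older approach, so it no longer works well.

With a different URL, the data was fetched without problems.

It looks like some kind of protection is applied on the CSS side.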

##  Instagram image crawling
#    
##
import os
import sys
import konlpy
import pandas as pd
import numpy as np
os.environ['JAVA_OPTS'] = '-Xmx4096M'  # JVM heap size flag; it needs the leading dash
    
## Show the current time  #####################################
import time
import datetime
now = datetime.datetime.now()

# Unix timestamp in whole seconds, as a string
timestamp = str(int(time.time()))
print(timestamp)
print(now)
#################################################  


# Check the current working directory
print(os.getcwd())

prePath = "./Project/instagram_cr/"
file_name = prePath + "outputfile0.txt" 

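# Add libraries
from bs4 import BeautifulSoup  # parses the fetched HTML so the wanted data can be extracted
from selenium import webdriver # drives Chrome through ChromeDriver for automation
## If the ChromeDriver version does not match Chrome, the error below occurs. Always check the ChromeDriver version.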
#  selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 94
# Current browser version is 105.0.5195.102 with binary path C:\Program Files\Google\Chrome\Application\chrome.exe
# 
# GoUrl : https://chromedriver.storage.googleapis.com/index.html?path=105.0.5195.52/
##


from urllib.request import urlopen
from urllib.parse  import quote_plus # percent-encodes the string into ASCII
import requests
import shutil
 


testurl_01 = "https://www.instagram.com/explore/tags/"
testurl_02 = input("Please input the word to search for : ")
testurl_03 = testurl_01 + quote_plus(testurl_02)


print(testurl_03)

## options added because of the error below
#  USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning.
## options start
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-logging"])
#browser = webdriver.Chrome(options=options)
## options end 

#driver_01 = webdriver.Chrome()
driver_01 = webdriver.Chrome(options=options)
driver_01.get(testurl_03)

# Give the page time to render before reading its source
time.sleep(5)

html_01 = driver_01.page_source
#print(html_01)

Source_01 = BeautifulSoup(html_01,"html.parser")
#Source_01 = BeautifulSoup(html_01,"lxml")
#Source_01 = BeautifulSoup(html_01)

#print(Source_01)
print(Source_01.prettify())
 
# Save the prettified HTML to a file for inspection
with open(prePath + 'result_list.txt', 'w', encoding='utf-8') as o:
    o.write(Source_01.prettify())




var_list = [1, 3, 5, 7, 9]
for ii in var_list:
    print("----------------------------------------")


Demo_insta = Source_01.select('._a3wf._-kb.segoe') 
print(Demo_insta)

for each_div in Source_01.find_all('div',{'class':'list'}):
    print(each_div)


"""
x_1 = 1

for i in Demo_insta:
    print("https://www.instagram.com/" + i.a['href'])
    #img_01 = i.select_one('_aagt').img['src']
    #print(img_01)
"""

driver_01.quit()  # end the WebDriver session and close the browser
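
The script above stops at printing whatever the selectors matched. As a minimal sketch of the extraction step it is working toward, the following builds the tag URL and pulls image URLs out of an HTML string. The helper names `build_tag_url` and `extract_image_urls` and the sample markup are assumptions made for illustration; Instagram's real markup differs and changes often, so this only shows the BeautifulSoup side of the technique, not a working Instagram scraper.

```python
from urllib.parse import quote_plus

from bs4 import BeautifulSoup


def build_tag_url(tag):
    # Percent-encode the tag so spaces and non-ASCII search words are URL-safe
    return "https://www.instagram.com/explore/tags/" + quote_plus(tag)


def extract_image_urls(html):
    # Collect the src attribute of every <img> tag that actually has one
    soup = BeautifulSoup(html, "html.parser")
    return [img["src"] for img in soup.find_all("img") if img.get("src")]


# Hypothetical markup for illustration only; real pages look different
sample_html = """
<div class="post"><a href="/p/abc/"><img src="https://example.com/a.jpg"></a></div>
<div class="post"><a href="/p/def/"><img src="https://example.com/b.jpg"></a></div>
<div class="post"><img alt="no source"></div>
"""

print(build_tag_url("cat cafe"))
print(extract_image_urls(sample_html))
```

In the script itself, `driver_01.page_source` would take the place of `sample_html`.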

License: Attribution-NonCommercial

[python] ChromeDriver - WebDriver for Chrome

홍반장水_ 2022. 9. 14. 14:25

Check the local Chrome version: chrome://version/

Download ChromeDriver: https://sites.google.com/a/chromium.org/chromedriver/downloads
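
ChromeDriver has to match the major version of the installed Chrome, which is what the SessionNotCreatedException quoted in the crawling post complains about. A small sketch of that comparison follows; the function names are made up for this example, and the 94.x driver version string is a hypothetical stand-in for an outdated driver.

```python
def major_version(version):
    # "105.0.5195.102" -> 105
    return int(version.split(".")[0])


def versions_compatible(chrome_version, driver_version):
    # ChromeDriver and Chrome are expected to share the same major version
    return major_version(chrome_version) == major_version(driver_version)


# Browser version as shown on chrome://version/
chrome = "105.0.5195.102"

print(versions_compatible(chrome, "105.0.5195.52"))  # matching driver
print(versions_compatible(chrome, "94.0.4606.61"))   # hypothetical outdated driver
```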

[Python] https://pythontutor.com/ Learn Python, JavaScript, C, C++, and Java logic visualization

홍반장水_ 2022. 8. 10. 16:52

https://pythontutor.com/

This coding tutor tool helps you learn Python, JavaScript, C, C++, and Java by visualizing code execution. You can use it to debug your homework assignments and as a supplement to online coding tutorials.

Start writing and visualizing code now

Over ten million people in more than 180 countries have used Python Tutor to visualize over 100 million pieces of code. It's the most widely-used program visualization tool for computing education.

Visualizing logic in Python, JavaScript, C, C++, and Java

https://pythontutor.com/visualize.html#mode=edit
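
As an illustration of what is worth stepping through in the visualizer, here is a short snippet (my own example, not taken from the Python Tutor site) in which two names point at the same list while a copy stays separate. Pasting it into the editor should make the shared object visible as a single box with two arrows.

```python
# Two names bound to the same list object
original = [1, 2, 3]
alias = original
alias.append(4)

# list() makes a shallow copy, which is a separate list object
copied = list(original)
copied.append(5)

print(original)
print(alias is original)
print(copied)
```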

PyCon Korea 2022 - https://2022.pycon.kr/ PyCon Korea

홍반장水_ 2022. 7. 21. 14:41

PyCon Korea 2022 - https://2022.pycon.kr/

Online conference, Sat 10/1 to Sun 10/2

Facebook : https://www.facebook.com/pyconkorea

Twitter : https://twitter.com/PyConKR

[Python] Using Customized KoNLPy: adding a dict to Okt

홍반장水_ 2022. 7. 4. 17:02

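Download the package from https://github.com/lovit/customized_konlpy and install it inside your virtual environment.

https://inspiringpeople.github.io/data%20analysis/ckonlpy/ seems to be written on the assumption that the reader already knows virtual environments well.

1. Download it into the target folder.

2. Activate the Python virtual environment with " activate main "

- I created a separate virtual environment named main.

3. Run the setup.py from the downloaded files.

4. Start Python and run the example statements to confirm that it works.

- Checked again on 2022-07-04 and it works fine.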

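This is a customized version of KoNLPy, a Python package for Korean natural language processing.

For words that are known for certain, customized_KoNLPy can tokenize and POS-tag a given eojeol (a space-delimited unit) using those known words, without going through the underlying library. It does this with template-based tokenizing.

Dictionary: {'아이오아이': 'Noun', '는': 'Josa'}
Template: Noun + Josa

With the word list and template above, the eojeol '아이오아이는' is split into [('아이오아이', 'Noun'), ('는', 'Josa')].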
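
The idea of template-based tokenizing can be sketched in plain Python. This is not the customized_KoNLPy implementation and does not use its API; `template_tokenize` is a hypothetical helper, and it assumes each dictionary word has exactly one tag, which is a simplification.

```python
def template_tokenize(eojeol, dictionary, templates):
    """Split eojeol into dictionary words whose tags follow one of the templates.

    Returns a list of (word, tag) pairs, or None if no template fits.
    """
    def match(rest, tags):
        # All tags used up: succeed only if the whole eojeol was consumed too
        if not tags:
            return [] if not rest else None
        # Try every prefix of the remaining text as the word for the next tag
        for end in range(1, len(rest) + 1):
            word = rest[:end]
            if dictionary.get(word) == tags[0]:
                tail = match(rest[end:], tags[1:])
                if tail is not None:
                    return [(word, tags[0])] + tail
        return None

    for template in templates:
        result = match(eojeol, template)
        if result is not None:
            return result
    return None


dictionary = {'아이오아이': 'Noun', '는': 'Josa'}
templates = [('Noun', 'Josa')]

print(template_tokenize('아이오아이는', dictionary, templates))
```

An eojeol that cannot be covered by any template, such as '아이오아이' alone here, yields None; in the real package that is the case handed over to the underlying tagger.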

Install

$ git clone https://github.com/lovit/customized_konlpy.git

$ pip install customized_konlpy

Requires

JPype >= 0.6.1
KoNLPy >= 0.4.4

[python] Saving a Pandas DataFrame as txt

홍반장水_ 2022. 7. 4. 16:22

Use .to_csv.

import pandas as pd



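# Convert a dictionary into a pandas DataFrame
# orient='index' is needed so that the keys are laid out as rows
# (`count` is a dict defined elsewhere; it is not shown in this post)
df=pd.DataFrame.from_dict(count, orient='index')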

df.to_csv('bigKeyword_all.txt')
df.head(100).to_csv('bigKeyword_top100.txt')


# Write the DataFrame contents to CSV
## DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w'
#                    , encoding=None, compression='infer', quoting=None, quotechar='"', line_terminator=None, chunksize=None, date_format=None
#                    , doublequote=True, escapechar=None, decimal='.', errors='strict')
# The output/ directory must already exist; to_csv does not create it
df.to_csv('output/word_ex_note_1.csv', index = False, header=False, encoding='utf-8-sig')
df.head(100).to_csv('output/word_ex_note_100.csv', header=False, encoding='utf-8-sig')
df.to_csv('output/word_ex_note_1.txt', sep = '\t', index = False,header=False, encoding='utf-8-sig')
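
For a version that runs on its own, here is a minimal sketch of the same dict-to-text flow. The `count` values are invented sample data, and the output goes to an in-memory buffer instead of a file so that nothing has to exist on disk.

```python
import io

import pandas as pd

# Hypothetical word counts standing in for the `count` dict used above
count = {'python': 12, 'pandas': 7, 'csv': 3}

# orient='index' turns each dict key into a row label
df = pd.DataFrame.from_dict(count, orient='index')

# Write tab-separated text; a file path such as 'words.txt' works the same way
buffer = io.StringIO()
df.to_csv(buffer, sep='\t', header=False)
text = buffer.getvalue()

print(text)
```

Keeping the index (the default) is what puts the words in the first column; with `index=False` only the counts would be written.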
