
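This is an older approach, so it does not work well.

With a different URL, the data was fetched without problems.

It looks like some kind of CSS protection is in place.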

 

##  Instagram image crawling
#    
##
import os
import sys
import konlpy
import pandas as pd
import numpy as np
os.environ['JAVA_OPTS'] = '-Xmx4096M'  # JVM options need the leading dash
    
## Show the current time  ############################
import time
import datetime
now = datetime.datetime.now()

timeserise = time.time()
timeserise = str(int(timeserise))
print(timeserise)
print(now)
#################################################  


# Check the current working directory
print(os.getcwd())

prePath = "./Project/instagram_cr/"
file_name = prePath + "outputfile0.txt" 

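# Additional libraries
from bs4 import BeautifulSoup  # parse the loaded page and pull out the data we want
from selenium import webdriver # drive browser automation through ChromeDriver
## If the Chrome and chromedriver versions do not match, the error below is raised. Always check the chromedriver version.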
#  selenium.common.exceptions.SessionNotCreatedException: Message: session not created: This version of ChromeDriver only supports Chrome version 94
# Current browser version is 105.0.5195.102 with binary path C:\Program Files\Google\Chrome\Application\chrome.exe
# 
# GoUrl : https://chromedriver.storage.googleapis.com/index.html?path=105.0.5195.52/
##


from urllib.request import urlopen
from urllib.parse  import quote_plus # percent-encodes non-ASCII text into ASCII
import requests
import shutil
 


testurl_01 = "https://www.instagram.com/explore/tags/"
testurl_02 = input("Please input the word to search for : ")
testurl_03 = testurl_01 + quote_plus(testurl_02)


print(testurl_03)

## options added because of the error below
#  USB: usb_device_handle_win.cc:1048 Failed to read descriptor from node connection: A device attached to the system is not functioning.
## options start
options = webdriver.ChromeOptions()
options.add_experimental_option("excludeSwitches", ["enable-logging"])
#browser = webdriver.Chrome(options=options)
## options end 

#driver_01 = webdriver.Chrome()
driver_01 = webdriver.Chrome(options=options)
driver_01.get(testurl_03)

# Give the page time to render before reading its source
time.sleep(5)

html_01 = driver_01.page_source
#print(html_01)

Source_01 = BeautifulSoup(html_01,"html.parser")
#Source_01 = BeautifulSoup(html_01,"lxml")
#Source_01 = BeautifulSoup(html_01)

#print(Source_01)
print(Source_01.prettify())
 
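# Save the prettified HTML for inspection; create the folder if it is missing
os.makedirs(prePath, exist_ok=True)
with open(prePath + 'result_list.txt', 'w', encoding='utf-8') as o:
    o.write(Source_01.prettify())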




var_list = [1, 3, 5, 7, 9]
for ii in var_list:
    print("----------------------------------------")


Demo_insta = Source_01.select('._a3wf._-kb.segoe') 
print(Demo_insta)

for each_div in Source_01.find_all('div',{'class':'list'}):
    print(each_div)


"""
x_1 = 1

for i in Demo_insta:
    print("https://www.instagram.com/" + i.a['href'])
    #img_01 = i.select_one('_aagt').img['src']
    #print(img_01)
"""

# quit() closes every window and also ends the chromedriver session
driver_01.quit()
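
The tag URL in the script above is built by percent-encoding the search word. A minimal sketch of just that step, with no browser involved; `BASE_URL` and `build_tag_url` are names made up here for illustration, and the sample words are assumptions:

```python
from urllib.parse import quote_plus

# Same prefix as testurl_01 in the script above.
BASE_URL = "https://www.instagram.com/explore/tags/"


def build_tag_url(word):
    """Percent-encode the search word as UTF-8 and append it to the tag URL."""
    return BASE_URL + quote_plus(word)


# Hypothetical search words, only for demonstration.
print(build_tag_url("고양이"))
print(build_tag_url("street food"))
```

Note that `quote_plus` turns a space into `+`, which is the query-string convention. If the word can contain spaces and ends up in the URL path, `urllib.parse.quote` (which emits `%20`) may be the safer choice.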

Check your local Chrome version: chrome://version/

ChromeDriver download: https://sites.google.com/a/chromium.org/chromedriver/downloads
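
The SessionNotCreatedException quoted earlier comes from a ChromeDriver built for Chrome 94 meeting a Chrome 105 browser. A rough sketch of that comparison, assuming only the leading (major) number has to agree; `major_version` and `versions_match` are hypothetical helpers, not part of Selenium, and the version strings are typed in by hand from chrome://version/ and the download page:

```python
import re


def major_version(version_text):
    """Return the leading integer of a dotted version such as '105.0.5195.102'."""
    match = re.search(r"(\d+)\.\d+", version_text)
    if match is None:
        raise ValueError(f"no version number found in {version_text!r}")
    return int(match.group(1))


def versions_match(chrome_version, driver_version):
    """True when browser and driver share the same major version."""
    return major_version(chrome_version) == major_version(driver_version)


# Browser and driver versions taken from the notes above.
print(versions_match("105.0.5195.102", "105.0.5195.52"))
# "94.0" is an illustrative stand-in for a driver that only supports Chrome 94.
print(versions_match("105.0.5195.102", "94.0"))
```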

 


https://pythontutor.com/

 

Python Tutor: Learn Python, JavaScript, C, C++, and Java by visualizing code


This coding tutor tool helps you learn Python, JavaScript, C, C++, and Java by visualizing code execution. You can use it to debug your homework assignments and as a supplement to online coding tutorials.

Related services: JavaScript Tutor C Tutor C++ Tutor Java Tutor

Over ten million people in more than 180 countries have used Python Tutor to visualize over 100 million pieces of code. It's the most widely-used program visualization tool for computing education.

 

You can start writing and visualizing code right away.

Logic visualization for Python, JavaScript, C, C++, and Java

https://pythontutor.com/visualize.html#mode=edit

 

Python Tutor code visualizer: Visualize code in Python, JavaScript, C, C++, and Java



PyCon Korea 2022 - https://2022.pycon.kr/

Online conference, Oct 1 (Sat) to Oct 2 (Sun)

Facebook : https://www.facebook.com/pyconkorea

Twitter : https://twitter.com/PyConKR


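Download https://github.com/lovit/customized_konlpy and install it into the virtual environment.

https://inspiringpeople.github.io/data%20analysis/ckonlpy/ seems to be written on the assumption that the reader already knows virtual environments well.

1. Download it into the target folder.

2. Enter the Python virtual environment: "activate main"

   - I created a separate virtual environment named main.

3. Run setup.py from the downloaded files.

4. Start python and run the example statements to confirm that it works.

  - Rechecked on 2022-07-04; it still works.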

 

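A customized version of KoNLPy, the Python package for Korean natural language processing.

For words it knows for certain, customized_KoNLPy tokenizes and POS-tags a given eojeol (a space-delimited unit) using those known words, without going through the underlying library. To do this it performs template-based tokenization.

Dictionary: {'아이오아이': 'Noun', '는': 'Josa'}
Template: Noun + Josa

Given the word list and template above, the eojeol '아이오아이는' is split into [('아이오아이', 'Noun'), ('는', 'Josa')].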
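
The idea of template-based tokenization can be sketched in plain Python. This is not customized_konlpy's actual code or API; `template_tokenize` is a hypothetical helper that assumes each word has exactly one tag and tries the longest matching prefix first:

```python
def template_tokenize(eojeol, dictionary, templates):
    """Split eojeol into known words whose tags follow one of the templates.

    Returns a list of (word, tag) pairs, or None if no template fits.
    """

    def search(rest, template):
        # Template used up: succeed only if the whole eojeol was consumed.
        if not template:
            return [] if not rest else None
        tag = template[0]
        # Try the longest candidate prefix first.
        for end in range(len(rest), 0, -1):
            word = rest[:end]
            if dictionary.get(word) == tag:
                tail = search(rest[end:], template[1:])
                if tail is not None:
                    return [(word, tag)] + tail
        return None

    for template in templates:
        result = search(eojeol, template)
        if result is not None:
            return result
    return None


# Dictionary and template taken from the example above.
dictionary = {'아이오아이': 'Noun', '는': 'Josa'}
templates = [('Noun', 'Josa')]
print(template_tokenize('아이오아이는', dictionary, templates))
```

An eojeol that fits no template returns None here; the real package presumably hands such cases to the underlying KoNLPy tagger, which this sketch leaves out.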

Install

$ git clone https://github.com/lovit/customized_konlpy.git

$ pip install customized_konlpy

Requires

  • JPype >= 0.6.1
  • KoNLPy >= 0.4.4

 


[python] Saving a Pandas DataFrame to a txt file

Just use .to_csv.

import pandas as pd



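# Turn a dictionary into a pandas DataFrame
# (`count` is assumed to be a dict that already exists, e.g. word -> frequency)
# orient='index' is needed so the keys are laid out as rows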
df=pd.DataFrame.from_dict(count, orient='index')

df.to_csv('bigKeyword_all.txt')
df.head(100).to_csv('bigKeyword_top100.txt')


# Write the DataFrame contents to CSV
## DataFrame.to_csv(path_or_buf=None, sep=',', na_rep='', float_format=None, columns=None, header=True, index=True, index_label=None, mode='w'
#                    , encoding=None, compression='infer', quoting=None, quotechar='"', line_terminator=None, chunksize=None, date_format=None
#                    , doublequote=True, escapechar=None, decimal='.', errors='strict')
# line_terminator expects a string such as '\n' (renamed to lineterminator in pandas 1.5+), so it is left at its default here.
# The output/ folder must already exist.
df.to_csv('output/word_ex_note_1.csv', index=False, header=False, encoding='utf-8-sig')
df.head(100).to_csv('output/word_ex_note_100.csv', header=False, encoding='utf-8-sig')
df.to_csv('output/word_ex_note_1.txt', sep='\t', index=False, header=False, encoding='utf-8-sig')
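
A self-contained sketch of the same idea, writing to an in-memory buffer so that it runs without an output folder. The `count` values and the `freq` column name are made up for illustration; they stand in for whatever dict the snippet above was fed:

```python
import io

import pandas as pd

# Hypothetical word counts standing in for the `count` dict used above.
count = {"python": 12, "pandas": 7, "csv": 3}

# orient="index" puts each key on its own row; "freq" is an assumed column name.
df = pd.DataFrame.from_dict(count, orient="index", columns=["freq"])
df = df.sort_values("freq", ascending=False)

# to_csv accepts any file-like object, so a StringIO works in place of a path.
buffer = io.StringIO()
df.to_csv(buffer, sep="\t", header=False)
text = buffer.getvalue()
print(text)
```

Keeping the index (the default `index=True`) matters here because the words live in the index; with `index=False` only the counts would be written.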

 

 


PyScript -  Run Python in Your HTML

 

https://pyscript.net/

 


Examples: https://pyscript.net/examples

PyScript

What is PyScript

Summary

PyScript is a Pythonic alternative to Scratch, JSFiddle, and other "easy to use" programming frameworks, with the goal of making the web a friendly, hackable place where anyone can author interesting and interactive applications.

To get started see the getting started tutorial.

For examples see the pyscript folder.

Longer Version

PyScript is a meta project that aims to combine multiple open technologies into a framework that allows users to create sophisticated browser applications with Python. It integrates seamlessly with the way the DOM works in the browser and allows users to add Python logic in a way that feels natural both to web and Python developers.

Try PyScript

To try PyScript, import the appropriate pyscript files to your html page with:

<link rel="stylesheet" href="https://pyscript.net/alpha/pyscript.css" />
<script defer src="https://pyscript.net/alpha/pyscript.js"></script>

You can then use PyScript components in your html page. PyScript currently implements the following elements:

  • <py-script>: can be used to define python code that is executable within the web page. The element itself is not rendered to the page and is only used to add logic
  • <py-repl>: creates a REPL component that is rendered to the page as a code editor and allows users to write executable code

Check out the pyscriptjs/examples folder for more examples on how to use it, all you need to do is open them in Chrome.

How to Contribute

Read the contributing guide to learn about our development process, reporting bugs and improvements, creating issues and asking questions.

Resources

https://github.com/pyscript/pyscript

 


Anaconda Engineering Blog (engineering.anaconda.com): post by Fabio Pliger, Sat 30 April 2022

 


300 Python Problems for Beginners

https://wikidocs.net/book/922

Python lectures for beginners: working through the 300 basic problems together

https://www.youtube.com/watch?v=SiK4iYt_7-s&list=PLNPt2ycoheHqhS_OP4XA8nWycWQWnQtki&index=1

 
