
2.3 Web Scraping - Finding the Data You Want (3)

Previous post

Getting the stock code

The i['href'] and i['onclick'] values we extracted in the previous post both contain the stock code. Specifically, the digits at the end of i['href'] are that stock's code. Because i['href'] is a string, we can pull the code out with slicing. Fortunately, every stock code is exactly six digits long, so taking the last six characters (i['href'][-6:]) gives us the code.
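The slicing step can be tried on its own, without any network request. This sketch uses three href values taken from the sample output in this post and assumes, as stated above, that every code is exactly six digits.

```python
# href values taken from the sample output in this post
hrefs = [
    "/item/main.naver?code=002420",
    "/item/main.naver?code=018700",
    "/item/main.naver?code=035620",
]

# A negative slice counts from the end: [-6:] keeps the last six characters
codes = [href[-6:] for href in hrefs]
print(codes)  # ['002420', '018700', '035620']
```

The codes stay strings, so leading zeros such as the ones in 002420 are preserved.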

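from bs4 import BeautifulSoup as bs
import requests

url = 'https://finance.naver.com/sise'
response = requests.get(url)

if response.status_code == 200:
    html = response.text
    bsObj = bs(html, 'html.parser')
    # First <div> whose class is box_type_l
    test = bsObj.select_one('div.box_type_l')
    # Every <a> that is a direct child of a <td> inside that <div>
    test_2 = test.select('td > a')

    for i in test_2:
        print("href:", i['href'])
        # The stock code is the last six characters of href
        print("code:", i['href'][-6:])
        print("text:", i.get_text())
else:
    # Report the failure instead of silently doing nothing
    print("Request failed:", response.status_code)
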
>>>
href: /item/main.naver?code=002420
code: 002420
text: 세기상사
href: /item/main.naver?code=018700
code: 018700
text: 바른손
href: /item/main.naver?code=035620
code: 035620
text: 바른손이앤에이
(remaining output omitted)
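Slicing with [-6:] only works while code= is the last parameter in the href and the code stays six characters long. If that ever changes, a more defensive option (not the method used in this post) is to parse the query string with the standard library's urllib.parse. The helper name extract_code and the second href with &page=1 are hypothetical, for illustration only.

```python
from urllib.parse import urlparse, parse_qs


def extract_code(href):
    """Return the value of the code= query parameter, or None if it is absent.

    extract_code is a hypothetical helper, not part of any library.
    """
    query = urlparse(href).query          # e.g. 'code=002420'
    values = parse_qs(query).get("code")  # e.g. ['002420'], or None
    return values[0] if values else None


print(extract_code("/item/main.naver?code=002420"))         # 002420
print(extract_code("/item/main.naver?code=002420&page=1"))  # 002420 (hypothetical href)
print(extract_code("/item/main.naver"))                     # None
```

For the second href, the slicing approach would return 'page=1' instead of the code, while the query-string parser still finds code=002420.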

Completing the address

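The characters after href: in the output above are an address, but if you enter that address as-is, the browser reports that the site cannot be found. That is because it is not a complete address. Let's open the actual page and check its address.

As the screenshot above shows, the string after href corresponds only to the part highlighted in red. So we have to put finance.naver.com in front of it to reach the page. Let's add the domain in the code.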

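from bs4 import BeautifulSoup as bs
import requests

url = 'https://finance.naver.com/sise'
response = requests.get(url)

if response.status_code == 200:
    html = response.text
    bsObj = bs(html, 'html.parser')
    # First <div> whose class is box_type_l
    test = bsObj.select_one('div.box_type_l')
    # Every <a> that is a direct child of a <td> inside that <div>
    test_2 = test.select('td > a')

    for i in test_2:
        # Prepend the domain so the relative path becomes a complete address
        print("href:", "finance.naver.com" + i['href'])
        # The stock code is the last six characters of href
        print("code:", i['href'][-6:])
        print("text:", i.get_text())
else:
    # Report the failure instead of silently doing nothing
    print("Request failed:", response.status_code)
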
>>>
href: finance.naver.com/item/main.naver?code=002420
code: 002420
text: 세기상사
href: finance.naver.com/item/main.naver?code=018700
code: 018700
text: 바른손
href: finance.naver.com/item/main.naver?code=035620
code: 035620
text: 바른손이앤에이

If you copy the href value exactly as printed and paste it into the address bar, the page opens correctly.
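Prepending finance.naver.com is enough for a browser's address bar, but requests.get rejects a URL without a scheme (it raises requests.exceptions.MissingSchema). If the completed address is going to be requested from code, one option is the standard library's urllib.parse.urljoin, which resolves the relative href against the page URL and keeps https://. This is a sketch using two href values from the sample output above, not the approach used in the post.

```python
from urllib.parse import urljoin

# Page the links were scraped from (same value as `url` in the code above)
base_url = "https://finance.naver.com/sise"

# Relative href values from the sample output above
hrefs = [
    "/item/main.naver?code=002420",
    "/item/main.naver?code=018700",
]

for href in hrefs:
    # An href starting with "/" replaces the path of base_url
    # but keeps its scheme (https) and host (finance.naver.com)
    full_url = urljoin(base_url, href)
    print(full_url)
# https://finance.naver.com/item/main.naver?code=002420
# https://finance.naver.com/item/main.naver?code=018700
```

The resulting full_url values can be passed directly to requests.get, whereas "finance.naver.com" + href cannot.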


License: Attribution, NonCommercial, NoDerivatives
