XML Data Analysis

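XML is a text-based markup format that represents data by wrapping it in tags. Because an element can contain child elements, which can in turn contain children of their own, an XML document has a hierarchical (tree) structure. Each item is written in the form <element attribute="value">data</element>. Element names are not predefined: whoever writes the document chooses them.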
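To make the hierarchy concrete, here is a minimal sketch that walks a small XML document and prints each element indented by its nesting depth. The sample document, the `code` attribute, and the `describe` helper are made up for illustration, and the sketch uses the standard library's xml.etree.ElementTree rather than the BeautifulSoup approach used in the rest of this post.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this illustration.
SAMPLE = """
<forecast>
  <location code="108">
    <city>Seoul</city>
    <wf>Cloudy</wf>
  </location>
  <location code="159">
    <city>Busan</city>
    <wf>Sunny</wf>
  </location>
</forecast>
"""


def describe(element, depth=0):
    """Return one line per element, indented by its nesting depth."""
    text = (element.text or "").strip()
    lines = ["  " * depth + f"{element.tag} {element.attrib} {text!r}"]
    for child in element:  # iterating an element yields its direct children
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(SAMPLE)
tree_lines = describe(root)
print("\n".join(tree_lines))
```

Each level of indentation in the output corresponds to one level of nesting in the document: forecast contains location elements, and each location contains city and wf.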

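from bs4 import BeautifulSoup
import urllib.request as req
import os.path

# Mid-term forecast RSS feed served from kma.go.kr
url = "http://www.kma.go.kr/weather/forecast/mid-term-rss3.jsp?stnId=108"
savename = "forecast.xml"

# Download the feed only if there is no local copy yet
if not os.path.exists(savename):
    req.urlretrieve(url, savename)

# Read the saved XML and parse it
with open(savename, "r", encoding="utf-8") as f:
    xml = f.read()
soup = BeautifulSoup(xml, "html.parser")

# Group city names by forecast text (the first <wf> inside each <location>)
info = {}
for location in soup.find_all("location"):
    name = location.find("city").string
    weather = location.find("wf").string
    if weather not in info:
        info[weather] = []
    info[weather].append(name)

# Print each forecast followed by the cities that share it
for weather, cities in info.items():
    print(weather, cities)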

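The code downloads the XML file with urllib's request module and parses it with BeautifulSoup. Calling find() with an element name returns the first matching element, and its .string attribute gives the text inside it. find_all() returns every matching element, which is how the loop visits each location.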
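The lookups described above can be tried without downloading anything. The following is a minimal sketch that assumes bs4 is installed and uses a hand-written sample string in place of the real feed; the city names, forecasts, and the `temperature` element name are invented for illustration.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded feed.
SAMPLE = """<forecast>
  <location><city>Seoul</city><wf>Cloudy</wf></location>
  <location><city>Busan</city><wf>Sunny</wf></location>
  <location><city>Incheon</city><wf>Cloudy</wf></location>
</forecast>"""

soup = BeautifulSoup(SAMPLE, "html.parser")

# find() returns only the first matching element.
first_city = str(soup.find("city").string)

# find_all() returns every matching element, in document order.
all_cities = [str(c.string) for c in soup.find_all("city")]

# find() returns None when no element has that name,
# so calling .string on the result would raise AttributeError.
missing = soup.find("temperature")

# Same grouping as in the post, written with dict.setdefault.
by_weather = {}
for loc in soup.find_all("location"):
    weather = str(loc.find("wf").string)
    by_weather.setdefault(weather, []).append(str(loc.find("city").string))

print(first_city)
print(all_cities)
print(missing)
print(by_weather)
```

Checking for None before reading .string is worth doing on real feeds, where an element may be absent from some entries.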

후니의 컴퓨터
