홍은기
PYTHON LEARNING FOR
NATURAL LANGUAGE
PROCESSING (2ND)
1. Learning Sequence
2. Lists and Functions
3. Loops
4. Processing Raw Text with NLTK
CONTENTS
• 1. Python Syntax
• 2. Strings and Console Output
• 3. Conditionals and Control Flow
• 4. Functions
• 5. Lists & Dictionaries
• 6. Student Becomes the Teacher(test)
• 7. Lists and Functions
• 8. Loops
• 9. Exam Statistics(test)
• 10. Advanced Topic in Python
• 11. Introduction to Classes
• 12. File Input and Output
LEARNING SEQUENCE
(WWW.CODECADEMY.COM)
LISTS AND FUNCTIONS
LOOPS
PROCESSING RAW TEXT
WITH NLTK
(http://www.nltk.org/book/)
After extracting the text from an HTML document on the
web, I tried to extract keywords from the text with
NLTK.
EXAMPLES
'Multicultural Korean Society' That Turns Away Migrant Children
(http://www.huffingtonpost.kr/kyongwhan-ahn/story_b_6927970.html?utm_hp_ref=korea)
[('', 65), ('(', 9), (')', 9), (' 한다 ', 6), ("'", 6), (' 있다 ', 5),
(' 아동 ', 5), (' 큰 ', 5), (' 모든 ', 5), (' 일 ', 5), (' 국제 ', 4),
(' 대한민국 ', 4), (' 나라 ', 4), (' 땅 ', 4), (' 국제사회 ', 4),
(' 인권 ', 4), (' 의원 ', 3), (' 세계 ', 3), (' 여의 ', 3), (' 수 ', 3),
(' 안 ', 3), (' 강한 ', 3), (' 불문 ', 2), (' 이주 ', 2), (' 법무부 ', 2)]
1. HTML TO RAW TEXT
# -*- coding: utf-8 -*-
from urllib import request
from bs4 import BeautifulSoup
from nltk import word_tokenize, FreqDist

url = 'http://www.huffingtonpost.kr/kyongwhan-ahn/story_b_6927970.html?utm_hp_ref=korea'
html = request.urlopen(url).read().decode('utf8')
raw = BeautifulSoup(html, 'html.parser').get_text()
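The snippet above needs third-party packages (bs4, nltk). As a rough standard-library-only sketch of the same HTML-to-text step (Beautiful Soup is far more robust on real-world markup), `html.parser` can collect the text nodes while skipping script and style blocks:

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, skipping script/style."""
    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style> elements

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)

def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    return "".join(parser.parts)

text = html_to_text("<p>Hello <b>world</b></p><script>var x;</script>")
print(text)  # Hello world
```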
2. RAW TEXT TO LIST
raw = raw[30123:32364]  # slice out the article body (offsets found manually for this page)
print (type(raw))
-> <class 'str'>
tokens = word_tokenize(raw)
print (type(tokens))
-> <class 'list'>
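word_tokenize is NLTK's tokenizer; as a minimal illustration of what this raw-text-to-list step produces, a crude regex tokenizer (an assumption for illustration, not NLTK's actual algorithm) splits runs of word characters and single punctuation marks:

```python
import re

def simple_tokenize(text):
    # Match runs of word characters (including Hangul) or any single
    # non-word, non-space character -- a rough stand-in for word_tokenize.
    return re.findall(r"\w+|[^\w\s]", text)

tokens = simple_tokenize("이주 아동 외면하는 '다문화 한국사회'")
print(type(tokens))  # <class 'list'>
```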
3. LIST TO VOCABULARIES
words = Trial.NounExtractor(tokens)  # Trial.NounExtractor: custom helper, not an NLTK API
3. LIST TO VOCABULARIES
token = ['철수는', '동생에게', '자전거를', '빌려주었다']
words = Trial.NounExtractor(token)
-> words = ['철수', '동생', '자전거', '빌려주었다']
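Trial.NounExtractor appears to be a custom helper rather than an NLTK API. Its behavior on this slide, stripping case particles such as 는/에게/를 from each token, can be mimicked with a crude suffix-stripping sketch (illustration only; real Korean noun extraction needs a proper morphological analyzer):

```python
# Hypothetical stand-in for the slide's Trial.NounExtractor:
# strip one common Korean case particle from the end of each token.
PARTICLES = ("에게", "는", "를", "은", "을", "가", "이")

def noun_extractor(tokens):
    out = []
    for t in tokens:
        for p in PARTICLES:
            if t.endswith(p) and len(t) > len(p):
                t = t[:-len(p)]
                break
        out.append(t)
    return out

token = ['철수는', '동생에게', '자전거를', '빌려주었다']
words = noun_extractor(token)
print(words)  # ['철수', '동생', '자전거', '빌려주었다']
```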
4. FREQUENCY DISTRIBUTION
fdist = FreqDist(words)
print (fdist.most_common(25))
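FreqDist comes from NLTK; the same counting and most_common behavior is available in the standard library's collections.Counter, which makes this step easy to try without NLTK installed:

```python
from collections import Counter

# Toy word list standing in for the extracted vocabulary.
words = ['아동', '인권', '아동', '국제', '아동', '인권']

fdist = Counter(words)       # counts occurrences, like nltk.FreqDist
print(fdist.most_common(2))  # [('아동', 3), ('인권', 2)]
```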
EXAMPLES
Lotte World, Which Dragged Me Off
(http://www.huffingtonpost.kr/seungjoon-ahn/story_b_6928016.html?utm_hp_ref=korea)
[('', 63), (' 그 ', 19), (' 것 ', 12), (' 우리 ', 10), ('!', 8), (' 없 ', 8),
(' 놀이기구 ', 8), (' 직원 ', 8), (' 수 ', 8), (' 시각장애인 ', 8), (' 안 ', 6),
(' 내 ', 5), (' 있었다 ', 5), (' 않 ', 5), (' 매뉴얼 ', 5), (' 근거 ', 5), (' 사람 ', 5),
(' 롯데월드 ', 5), (' 다른 ', 5), (' 있던 ', 4), (' 한 ', 4), (' 장애인 ', 4),
(' 설명 ', 4), (' 때 ', 4), (' 상황 ', 4)]
POS TAGGED
Thank_VB You_PRP !_.
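The word_TAG format above is how NLTK renders tagged tokens. Given (word, tag) pairs, e.g. as returned by nltk.pos_tag, the same string can be produced directly (the pairs are hard-coded here, since tagging itself requires NLTK's trained models):

```python
# (word, tag) pairs as nltk.pos_tag would return them for "Thank You !"
tagged = [("Thank", "VB"), ("You", "PRP"), ("!", ".")]

# Join each pair into NLTK's word_TAG display format.
line = " ".join(f"{word}_{tag}" for word, tag in tagged)
print(line)  # Thank_VB You_PRP !_.
```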