Daemin PARK
Korea Press Foundation
Toward a News Data Science
Research Plans
Toward a News Data Science
Research Histories
Toward a News Data Science
Improving Analytics
Designing Systems
Creating Ecosystems
Multi-Level Semantic Network Analysis of News
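Level of Analysis | Network Type | Node | Edge | Analysis | Algorithm
Named Entities | 1 mode | Person, Organization | Cooccurrence in articles | Disputant | Degree centrality
Named Entities | 1 mode | Person, Organization | Cooccurrence in articles | Relevance | Tie strength
Named Entities | 1 mode | Topic | Cooccurrence in quotes | Depth of discussion | Degree centrality
Named Entities | 1 mode | Topic | Cooccurrence in quotes | Relevance | Tie strength
Named Entities | 2 mode | Person-Topic, Organization-Topic | Cooccurrence in quotes | Specialists/Generalists, Main issues/Peripheral issues | 2-mode degree centrality
Sentences | 1 mode | Quotes | Cooccurrence in articles + Identical sources + Similarity | Agenda network | Clustering
Sentences | 1 mode | Quotes | Cooccurrence in articles + Identical sources + Similarity | Semantic distance | Manhattan distance
Sentences | 1 mode | Quotes | Cooccurrence in articles + Identical sources + Similarity | Semantic path | Path
Sentences | 1 mode | Quotes | Cooccurrence in articles + Identical sources + Similarity | Main theme | Degree centrality
Sentences | 1 mode | Quotes | Cooccurrence in articles + Identical sources + Similarity | Summary | Diameter
Sentences | 1 mode | Quotes | Cooccurrence in articles + Identical sources + Similarity | Particularization | Clique
Media | 1 mode | Media | Similarity | Uniqueness | Normalized sum of reversed similarity
Media | 1 mode | Media | Similarity | Synchronization | Ratio of duplication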
Park, D.M., Baek, Y.M., & Kim, S.H. (2015). News big data analysis system. Seoul, Korea: Korea Press Foundation.
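The 2-mode row of the table above (Person-Topic network, 2-mode degree centrality) can be illustrated with a minimal sketch. It assumes that each quote has already been reduced to a (person, topic) pair and that degree is counted as the number of distinct partners on the other side of the network. The function name `two_mode_degree`, the sample data, and the reading of "high person degree = generalist" are illustrative assumptions, not the authors' implementation.

```python
from collections import defaultdict


def two_mode_degree(quotes):
    """Compute degree on both sides of a person-topic (2-mode) network.

    quotes: iterable of (person, topic) pairs, one pair per quote.
    Returns (person_degree, topic_degree), where each degree is the
    number of distinct nodes of the other type a node is linked to.
    """
    person_topics = defaultdict(set)
    topic_persons = defaultdict(set)
    for person, topic in quotes:
        person_topics[person].add(topic)
        topic_persons[topic].add(person)
    person_degree = {p: len(ts) for p, ts in person_topics.items()}
    topic_degree = {t: len(ps) for t, ps in topic_persons.items()}
    return person_degree, topic_degree


# Hypothetical sample data: three sources quoted on three topics.
quotes = [
    ("Source A", "economy"), ("Source A", "security"), ("Source A", "education"),
    ("Source B", "security"), ("Source C", "security"), ("Source C", "security"),
]
person_degree, topic_degree = two_mode_degree(quotes)

# Assumed reading: a person linked to many topics looks like a generalist,
# a topic linked to many persons looks like a main issue.
for person, degree in sorted(person_degree.items()):
    print(person, degree)
for topic, degree in sorted(topic_degree.items()):
    print(topic, degree)
```

Repeated quotes by the same person on the same topic are collapsed here; a weighted variant would count them instead.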
News Big Data Analysis System
News Big Data Ecosystem
Park, D.M., Kim, S.H., & Yang, J.A. (2014). Strategies for smart news media platform innovation. Seoul, Korea: Korea Press Foundation.
Big Data Analysis System
- Text Mining
- Computer Vision
- Semantic Net Analysis
Data Driven Services
- Expert System
- News Startups
Content Provider
- Media
- User
- Experts
Social Media
- Advanced Search Engine
- CMS
- SNS, chatbot
- Ads
Diagram labels: open API, open source, content, open data, revenue share, archive, unstructured data
Research Plans
Research Histories
Toward a News Data Science
News Source Network Analysis
Park, D.M. (2013). News source network analysis as big data analytics of news articles. Korean Journal of Journalism and Communication Studies. 57(6). 233-261.
2 persons, 1 article
2 persons, 2 articles
4 persons, 1 article
4 persons, 2 articles
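The four cases on this slide can be reproduced with a small sketch of a 1-mode source network. It assumes that two persons are tied when they appear in the same article and that tie strength is the number of articles they share. The function names and the placeholder person labels are hypothetical, and the slide's original figure may encode the cases differently.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(articles):
    """Build a weighted 1-mode network from per-article person lists.

    articles: list of lists of person names found in each article.
    Returns a Counter mapping (person_a, person_b) to tie strength,
    i.e. the number of articles in which both persons appear.
    """
    ties = Counter()
    for persons in articles:
        # Sort so each undirected pair has one canonical key.
        for a, b in combinations(sorted(set(persons)), 2):
            ties[(a, b)] += 1
    return ties


def degree_centrality(ties):
    """Normalized degree: distinct neighbors divided by (n - 1)."""
    neighbors = {}
    for a, b in ties:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    n = len(neighbors)
    if n < 2:
        return {}
    return {p: len(nb) / (n - 1) for p, nb in neighbors.items()}


# The four slide cases, with placeholder names P1..P4.
two_one = cooccurrence_network([["P1", "P2"]])
two_two = cooccurrence_network([["P1", "P2"], ["P2", "P1"]])
four_one = cooccurrence_network([["P1", "P2", "P3", "P4"]])
four_two = cooccurrence_network([["P1", "P2", "P3", "P4"]] * 2)

print(two_one, two_two)
print(len(four_one), set(four_two.values()))
print(degree_centrality(four_one))
```

Under these assumptions, adding persons to an article adds ties, while repeating the same persons across articles strengthens existing ties.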
Distribution of Semantic Network
Park, D.M., Kim, G.N., & On, B.W. (2016). Understanding the network fundamentals of the news sources associated with a specific topic. Information Sciences. 327. 32-52.
1.6±0.2
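The slide does not say what the value 1.6±0.2 measures. Assuming it is the exponent of a power-law degree distribution (a "degree exponent" is listed among the tools later in the deck), the sketch below shows one common way to estimate such an exponent from a list of node degrees. It uses the approximate maximum-likelihood estimator for discrete power laws, which is a swapped-in textbook technique and not necessarily the method used in the cited paper. The function name, the `k_min` parameter, and the sample degrees are hypothetical.

```python
import math


def degree_exponent(degrees, k_min=1):
    """Estimate a power-law exponent from node degrees.

    Uses the approximation alpha = 1 + n / sum(ln(k / (k_min - 0.5)))
    over all degrees k >= k_min, with standard error (alpha - 1) / sqrt(n).
    Returns (alpha, standard_error).
    """
    tail = [k for k in degrees if k >= k_min]
    n = len(tail)
    if n == 0:
        raise ValueError("no degrees at or above k_min")
    log_sum = sum(math.log(k / (k_min - 0.5)) for k in tail)
    alpha = 1.0 + n / log_sum
    stderr = (alpha - 1.0) / math.sqrt(n)
    return alpha, stderr


# Hypothetical degree sequence of a small source network.
degrees = [1, 1, 1, 1, 2, 2, 3, 5, 8, 13]
alpha, stderr = degree_exponent(degrees)
print(f"degree exponent: {alpha:.2f} +/- {stderr:.2f}")
```

The approximation is known to be less accurate for small `k_min`, so a real analysis would also need to choose `k_min` and test the power-law fit.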
Fat Tailed, Micro-small World
Park, D.M., Kim, G.N., & On, B.W. (2016). Understanding the network fundamentals of the news sources associated with a specific topic. Information Sciences. 327. 32-52.
Important Sources
Barack Obama
Jay Carney
Ban Kimoon
John Kerry
Victoria Nuland
Kim Hyunwook
Susan Rice
…
Crawling | Advanced NLP | Customized SNA | Discourse Analysis
Text Mining with NLP & SNA
- tokenization
- stemming
- stopword elimination
- part-of-speech tagging
- indexing
- sentence boundary recognition
- URL tagging
- co-occurrence analysis
- partial parsing
- named entity recognition
- coreference resolution
- word sense disambiguation
- classification
- clustering
- visualization
- data cleansing
- time series content analysis
- governmentality studies
- projector
- file name standardizer
- edge list converter
- degree centrality
- periodic analysis
- degree exponent
- rank
- quote rank
- description
- fragmentation
Park, D.M. (2016). Natural language processing of news articles: A case of ‘NewsSource beta’. Korean Communication Theory. 12(1). 4-52.
- crawler
- data aggregation
BigKinds
Semantic Net Analyzer
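A few of the steps listed on this slide (sentence boundary recognition, tokenization, stopword elimination, co-occurrence analysis) can be chained in a minimal sketch. This is an English-only illustration built on regular expressions. The stopword list, function names, and sample text are assumptions, and Korean news text would likely need a morphological analyzer in place of the simple tokenizer used here.

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "on", "is", "was"}


def split_sentences(text):
    """Naive sentence boundary recognition on ., ! and ?."""
    parts = re.split(r"(?<=[.!?])\s+", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence):
    """Lowercase the sentence and keep alphanumeric runs as tokens."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def remove_stopwords(tokens):
    return [t for t in tokens if t not in STOPWORDS]


def cooccurrence(text):
    """Count how often two terms appear in the same sentence."""
    counts = Counter()
    for sentence in split_sentences(text):
        terms = sorted(set(remove_stopwords(tokenize(sentence))))
        counts.update(combinations(terms, 2))
    return counts


text = ("The minister praised the budget. "
        "The opposition criticized the budget and the minister.")
pairs = cooccurrence(text)
for pair, count in pairs.most_common(3):
    print(pair, count)
```

The resulting pair counts form an edge list with weights, which is the kind of input a network analysis step would consume.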
Content Analysis: <News Big Data Analytics & Insights>
Visualization of Millions of News
Park, D.M. (2016). Automated time series content analysis with news big data analytics: Analyzing sources and quotes in one million news articles for 26 years. Korean Journal of Journalism and Communication Studies. 60(5). 353-407.
Automated Time Series Content Analysis
Park, D.M. (2016). Automated time series content analysis with news big data analytics: Analyzing sources and quotes in one million news articles for 26 years. Korean Journal of Journalism and Communication Studies. 60(5). 353-407.
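As a rough illustration of what an automated time series content analysis of sources and quotes might compute, the sketch below counts quotes per source type per year and derives each type's yearly share. It assumes quotes have already been extracted and labeled as (year, source) records. The function names, source labels, and sample records are hypothetical and do not come from the cited study.

```python
from collections import Counter, defaultdict


def quotes_per_year(records):
    """Group quote counts by source and year.

    records: iterable of (year, source) pairs, one per extracted quote.
    Returns a dict mapping source -> Counter of year -> quote count.
    """
    series = defaultdict(Counter)
    for year, source in records:
        series[source][year] += 1
    return series


def yearly_share(series, source, years):
    """Share of all quotes in each year that come from one source."""
    totals = Counter()
    for counts in series.values():
        totals.update(counts)
    return {
        year: (series[source][year] / totals[year] if totals[year] else 0.0)
        for year in years
    }


# Hypothetical labeled quotes.
records = [
    (1990, "President"), (1990, "President"), (1990, "Citizen"),
    (1991, "Citizen"),
]
series = quotes_per_year(records)
share = yearly_share(series, "President", [1990, 1991, 1992])
for year, value in share.items():
    print(year, round(value, 2))
```

Using shares rather than raw counts keeps years with different article volumes comparable, which matters when the archive grows over time.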
Toward a News Data Science
Research Plans
Research Histories
Available Data
Data Sources | Language | Period | No. of Media | No. of Articles | Topics
KINDS | Korean | 1 Jan. 1990 - 30 Jun. 2014 | 66 | About 30 million | All
BIGKINDS | Korean | 1 Jan. 1990 - 31 Aug. 2016 | 44 | About 30 million | All
Naver, Daum | Korean | 1 Jun. 2016 - 30 Jun. 2016 | 200 | About 6 million | All
UPI | English | 4 Jan. 2010 - 16 Jul. 2013 | 1 | About 0.15 million | All
LexisNexis | English | 1 Jan. 1999 - 31 Dec. 2013 | 10* | About 73 thousand | North Korea

Type of Named Entities | Subtype | No. of Entities
Person | Korean | 116,787
Person | Foreigner | 6,438
Organization | | 489,023
Organization | | 148,405
Rank | | 1,035
* NYT, FT, WP, the Daily Yomiuri (Tokyo), the Nikkei Weekly (Japan), South China Morning Post, The Business Times, The Straits Times, Korea Herald, Korea Times
Current Research Projects
No. | Themes | Collaboration | Progress | Journal
1 | Debating chatbot?: Sentence-level news search engine | Prof. B.W. Suh (SNU) | Prototyping complete | SCI
2 | Is user-centrism a journalistic value?: Social media design based on news big data | Prof. J.S. Lee (SNU) | UI design complete | SCI
3 | Financialization of KPOP | Prof. G.T. Lee (George Mason Uni.) | English draft in progress | SSCI
4 | Political change and journalists’ use of news sources | Prof. Y.M. Baek (Yonsei Uni.) | English draft in progress | SSCI
5 | Politicization of Hallyu | Prof. S.K. Hong (SNU) | Data analysis in progress | SSCI
6 | Time series content analysis on ‘public opinion’, ‘people's voice’, and ‘people's livelihood’ | Dr. S.H. Kim (KPF) | Data analysis in progress | SSCI
7 | Prediction of stock prices (KOSPI) | Prof. W.S. Lee (Dongseo Uni.), Dr. Y.S. Park (Bank of Korea) | Data analysis in progress | KCI
8 | Prediction of North Korea’s provocation | Prof. Y.H. Kim (Sungkyunkwan Uni.) | Data analysis in progress | SSCI
9 | Time series content analysis on ‘social media’ | Prof. E.J. Lee (SNU) | Data crawling complete | SSCI
Integration of Heterogeneous Data for Expert Systems
- Multimedia: texts, audios, videos, interactive units
- Multilevel: words, sentences, articles, media, systems
- Multilingual: Korean, English, Japanese, Chinese, …
- Multisource: news, reports, journals, literatures, behaviors, sensors …
Advanced Methodology
Opinion Dynamics | Bayesian Statistics | Machine Learning
Facebook was not originally created to be a company. It was built to accomplish a social mission: to make the world more open and connected.
Mark Zuckerberg’s Letter to Investors: ‘The Hacker Way’
Be open, build social value.
Q & A

