EXPLORING ARTICLE NETWORKS
ON WIKIPEDIA WITH NODEXL
PRESENTATION DESCRIPTION
• With 4.8 million articles in its English version, Wikipedia, the crowd-sourced online
encyclopedia, is regularly among the ten most-visited sites online. For many, it is the go-to
source for a first read on a topic. The free and open-source Network Overview, Discovery
and Exploration for Excel (NodeXL), an add-on to Microsoft Excel, enables the
capture of “article networks” from Wikipedia. Data visualizations based on such content
network analysis support the development of research leads; some understanding of public
conceptualizations of related concepts, peoples, events, and phenomena; the profiling of
Wikipedia editors (both humans and bots); and other research insights. This presentation
showcases this affordance of NodeXL and offers some ideas for practical applications of this
channel of research and knowing.
OVERVIEW
• Wikipedia ethos and practices
• Wikipedia
• The many Wikipedias; the English Wikipedia
• The Wikimedia Foundation
• MediaWiki and basic functionalities
• Basic article network analysis
• NodeXL and basic functionalities; automation
OVERVIEW (CONT.)
• HTTP page networks on Wikipedia:
• article networks
• human author / editor networks
• robot networks
• Live demos
• Other (future) networks from Wikipedia
WIKIPEDIA ETHOS AND PRACTICES
• Objective, fact-based, and research-focused
• Full research citations
• Opinions isolated on Talk pages
• Open
• Open-access
• Open-source; content released under a free license (CC BY-SA) rather than into the public domain
• Crowd-sourced knowledge co-creation; curated public data
• Crowd-funded 501(c)(3); transparent finances ($58.5 million goal for FY 2015)
• Editing via email-verified accounts or Internet Protocol (IP) address capture
WIKIPEDIA
THE MANY WIKIPEDIAS
• 288 Wikipedias (with 277 active)
• By share of all articles: English (13.9%), Swedish (5.6%), Dutch (5.2%), German (5.25%), French (4.6%), Waray-Waray (3.6%), Russian (3.5%), Cebuano (3.4%), Italian (3.4%), Spanish (3.4%), and Other (48.2%)
• (“List of Wikipedias” on Wikipedia)
THE ENGLISH WIKIPEDIA
• Founded on Jan. 15, 2001
• 4.8 million articles
• 25 million user accounts
• 1,347 administrators (“English Wikipedia” on Wikipedia)
THE WIKIMEDIA FOUNDATION
• Objective: to encourage “the growth, development and distribution of free,
multilingual, educational content,” and to provide “the full content of these
wiki-based projects to the public free of charge”
• A range of projects: Wikipedia, Wikibooks, Wikiversity, Wikimedia
Commons, Wiktionary, Wikiquote, Wikivoyage, Wikidata, Wikinews,
Wikisource, Wikispecies, and MediaWiki (Wikimedia Foundation)
MEDIAWIKI AND BASIC FUNCTIONALITIES
• “wiki wiki”: “quick” or “fast” in Hawaiian
• Ward Cunningham developed the first wiki software (WikiWikiWeb), begun in 1994, to
enable online collaboration with version history and rollback capabilities
• MediaWiki was first released in 2002 and is now maintained by the Wikimedia Foundation (itself founded in 2003)
• Magnus Manske and Lee Daniel Crocker were the initial developers of this tool, written in PHP
(MediaWiki)
A WIKIMEDIA ARTICLE INTERFACE
A VIEW OF THE REVISION HISTORY
BASIC ARTICLE NETWORK ANALYSIS
• Basics of network graphs: nodes-links, entities-relationships, vertices-edges;
undirected or directed graphs (digraphs); networks and meta-networks;
subgraphs, clusters, and motifs; network centrality
• Direct ties represented in ego neighborhoods (with a maximum geodesic
distance or graph diameter of 2); also 1.5 degree ties for transitivity (with a
maximum geodesic distance or graph diameter of 3) and 2 degree ties to
include networks of the respective “alters” (with much larger maximum
geodesic distances possible)
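The ego-neighborhood levels described above can be sketched in a few lines of Python. This is only an illustration, not NodeXL's own implementation: the `links` data and page names are invented, and "1.5 degree" is assumed here to mean the ego's alters plus the ties among the ego and those alters, which may differ from how a given importer defines it.

```python
# Hypothetical outlink data: page -> pages it links to (illustrative only).
links = {
    "Seed": ["A", "B", "C"],
    "A": ["B", "D"],
    "B": ["Seed"],
    "C": [],
    "D": ["E"],
}

def ego_network(links, ego, level):
    """Return the set of directed edges in an ego network.

    level 1.0: ego -> alter edges only
    level 1.5: also ties among the ego and its alters (assumed definition)
    level 2.0: also every outlink of each alter (alters of alters)
    """
    alters = set(links.get(ego, []))
    edges = {(ego, alter) for alter in alters}
    if level >= 1.5:
        members = alters | {ego}
        for alter in alters:
            for target in links.get(alter, []):
                if level >= 2.0 or target in members:
                    edges.add((alter, target))
    return edges

def vertices(edges):
    """Collect every vertex that appears in at least one edge."""
    return {v for edge in edges for v in edge}

for level in (1.0, 1.5, 2.0):
    network = ego_network(links, "Seed", level)
    print(level, len(vertices(network)), "vertices,", len(network), "edges")
```

Even in this toy data the edge count grows at each level, which hints at why the 2-degree extraction shown later in the deck is so much larger than the 1-degree one.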
BASIC ARTICLE NETWORK ANALYSIS (CONT.)
• Entities may be individuals or groups, contents, and other elements
• Relatedness: Article networks created based on in-links and outlinks; node
“degree”
• Other types of relatedness are possible such as based on word co-occurrences, title
relatedness (same synset or “synonym set”), shared categories, and others
• Relations are conceptualized as enabling paths
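As a small illustration of node "degree" based on in-links and outlinks, the sketch below counts both for each page in a directed edge list. The edges are made up for the example and are not extracted Wikipedia data.

```python
from collections import Counter

# Hypothetical directed edges (source page, linked page); illustrative only.
edges = [
    ("Wiki", "MediaWiki"),
    ("Wiki", "Ward Cunningham"),
    ("MediaWiki", "Wiki"),
    ("MediaWiki", "PHP"),
]

def degree_table(edges):
    """Map each page to a tuple (in-degree, out-degree)."""
    out_degree = Counter(src for src, _ in edges)
    in_degree = Counter(dst for _, dst in edges)
    pages = set(out_degree) | set(in_degree)
    # Counter returns 0 for pages that never appear on one side.
    return {page: (in_degree[page], out_degree[page]) for page in pages}

for page, (indeg, outdeg) in sorted(degree_table(edges).items()):
    print(f"{page}: in={indeg}, out={outdeg}")
```

A page with high in-degree is one that many other pages point to; a page with high out-degree points to many others.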
NODEXL AND BASIC FUNCTIONALITIES;
AUTOMATION
• A free and open-source add-on to Microsoft Excel available on the Microsoft
CodePlex platform
• Enables…
• Graph visualization (with datasets from UCINET, GraphML, and other types)
• Data extraction from a number of social media platform APIs; refreshed runs based on
the same parameters (macros)
• Large number of tools of graph analysis
• A number of layout algorithms and selections to represent the data visually
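Since GraphML is listed among the dataset types, here is a minimal sketch of writing an edge list as GraphML using only the Python standard library. The function name and sample edges are hypothetical, and whether a particular NodeXL version expects additional attributes on import is not covered here.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def edges_to_graphml(edges, directed=True):
    """Serialize (source, target) pairs as a minimal GraphML document."""
    ET.register_namespace("", GRAPHML_NS)
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(
        root,
        f"{{{GRAPHML_NS}}}graph",
        id="G",
        edgedefault="directed" if directed else "undirected",
    )
    # Declare each vertex once, then one <edge> element per pair.
    for name in sorted({v for edge in edges for v in edge}):
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", id=name)
    for source, target in edges:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}edge", source=source, target=target)
    return ET.tostring(root, encoding="unicode")

# Hypothetical sample edges, for illustration only.
print(edges_to_graphml([("Wiki", "MediaWiki"), ("MediaWiki", "PHP")]))
```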
HTTP PAGE NETWORKS ON WIKIPEDIA
(IN THIS CASE)
• HTTP page links within Wikipedia, not connecting out to the Surface Web
• Directed graph built from the outlinks of the target Wikipedia page
• May include article page networks, human page networks, robot page networks, and
others
• Networks seeded by one target title or name (as long as the string appears as a
page in Wikipedia)
• No need for an application programming interface (API) on the MediaWiki platform
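To make the idea of an outlink network seeded by one page concrete, the sketch below pulls internal links out of a page's HTML without any API. How the NodeXL importer actually collects links is not stated in the slides; this example assumes internal links use hrefs beginning with `/wiki/`, and the class name, function name, and sample HTML are all hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

class WikiLinkParser(HTMLParser):
    """Collect titles of internal links (assumed to start with /wiki/)."""

    def __init__(self):
        super().__init__()
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href") or ""
        if not href.startswith("/wiki/"):
            return  # external or non-article link: stay inside the wiki
        path = href.split("#")[0]  # drop in-page anchors
        title = unquote(path[len("/wiki/"):])
        if title and title not in self.titles:
            self.titles.append(title)

def outlink_edges(seed_title, html):
    """Return directed (seed, target) edges for one page's outlinks."""
    parser = WikiLinkParser()
    parser.feed(html)
    return [(seed_title, title) for title in parser.titles]

# Invented HTML fragment standing in for a downloaded article page.
sample_html = """
<p><a href="/wiki/Wiki">wiki</a> software written in
<a href="/wiki/PHP">PHP</a>; see <a href="/wiki/Wiki#History">history</a>
and <a href="https://example.org/">an external site</a>.</p>
"""
print(outlink_edges("MediaWiki", sample_html))
```

The same function applies to a user page or a bot page, since the seed is just a page title.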
MEDIAWIKI ARTICLE NETWORK ON WIKIPEDIA (1 DEG., 237 VERTICES, 237 EDGES)
MEDIAWIKI ARTICLE NETWORK ON WIKIPEDIA (1.5 DEG., 12,368 VERTICES AND 17,686 UNIQUE EDGES)
MEDIAWIKI ARTICLE NETWORK ON WIKIPEDIA (2 DEG., 923,006 VERTICES)
In the first run, the software threw an “out of memory” exception and crashed.
A second run was conducted on a different machine with more processing capability,
and the screenshots are from that data extraction. The resulting data included some
edge pairs (over half a dozen) in which one of the vertices was missing.
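Edge pairs with a missing vertex can be set aside before analysis. The following is a minimal sketch with made-up rows; the function name and the row format (a pair of strings, possibly empty or `None`) are assumptions, not the format the extraction tool produces.

```python
def split_incomplete_edges(rows):
    """Separate edge rows with both endpoints from rows missing one."""
    complete, incomplete = [], []
    for source, target in rows:
        source = (source or "").strip()
        target = (target or "").strip()
        if source and target:
            complete.append((source, target))
        else:
            incomplete.append((source, target))
    return complete, incomplete

# Hypothetical extracted rows; two of them lack an endpoint.
rows = [
    ("MediaWiki", "PHP"),
    ("MediaWiki", ""),
    (None, "Wiki"),
    (" Wiki ", "MediaWiki"),
]
complete, incomplete = split_incomplete_edges(rows)
print(len(complete), "usable edges;", len(incomplete), "set aside for review")
```

Keeping the incomplete rows in their own list, rather than discarding them, leaves a record of what the extraction failed to capture.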
EXAMPLE: ARTICLE NETWORK
• Which individuals are related to a topic? Which events? Years? Topics? Which of
these may be useful leads to learn more about the basic seed topic?
• Based on a real-world individual, what is he or she known for? Who are the
people this person is connected with?
• Based on a technology, when did it originate? Who originated it? What
were its precursor inventions? What inventions were linked to the particular
technology?
EXAMPLE: ARTICLE NETWORK (CONT.)
• Based on collected lists, who is on a target list, and for what?
• Based on a particular topic, are there gaps in the information based on
“missing” article links?
• Based on a particular phenomenon, event, phrase, or individual, in a foreign
context and foreign language, what may be learned?
WIKI ARTICLE NETWORK ON WIKIPEDIA (1 DEG., 162 VERTICES)
WEB_LOG_ANALYSIS_SOFTWARE ARTICLE NETWORK ON WIKIPEDIA (1 DEG., 13 VERTICES)
EXAMPLE: HUMAN (AUTHOR / EDITOR) USER NETWORK
• Based on the human user’s network on Wikipedia, what articles does he or she
tend to edit? Taken as a whole, what does this network suggest about the person behind
the edits?
• (This requires that a user page exist, though.)
USER:LWEDEKIND NETWORK ON WIKIPEDIA (1 DEG., 9 VERTICES)
USER:THIS_LOUSY_T-SHIRT ARTICLE NETWORK ON WIKIPEDIA (1 DEG., 30 VERTICES)
EXAMPLE: ROBOT NETWORK
• Based on the approved robot user’s network, what are the interests of the
maker of the robot? What other accounts is the robot connected to?
USER:OGREBOT NETWORK ON WIKIPEDIA (1 DEG., 5 VERTICES)
USER:EMAUSBOT NETWORK ON WIKIPEDIA (1 DEG., 2 VERTICES)
ADDITIONAL APPROACHES
• Chaining from one target account to related others
• Cross-comparing information on the Wikipedia site with the extracted
networks
• Connecting the Wikipedia information with related sites on the Surface Web /
World Wide Web (WWW) and Internet
OTHER (FUTURE) NETWORKS FROM WIKIPEDIA
• The third-party importer for NodeXL has placeholders for user-content (two-mode)
network extractions and the mapping of co-editing networks, but those
functions are apparently not currently enabled
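For readers unfamiliar with the terms, a two-mode network links users to the content they edit, and a co-editing network can be derived from it by connecting users who edited the same article. The sketch below shows that projection on invented data; it is not the importer's method, and the user names, article pairs, and function name are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical two-mode data: (user, article edited). Illustrative only.
edits = [
    ("UserA", "Wiki"),
    ("UserB", "Wiki"),
    ("UserA", "MediaWiki"),
    ("UserC", "MediaWiki"),
    ("UserB", "MediaWiki"),
]

def co_editing_network(edits):
    """Project user-article pairs onto weighted user-user edges.

    The weight of an edge is the number of articles both users edited.
    """
    editors_by_article = defaultdict(set)
    for user, article in edits:
        editors_by_article[article].add(user)
    weights = defaultdict(int)
    for editors in editors_by_article.values():
        # Sorting keeps each undirected pair in one canonical order.
        for u, v in combinations(sorted(editors), 2):
            weights[(u, v)] += 1
    return dict(weights)

for (u, v), weight in sorted(co_editing_network(edits).items()):
    print(f"{u} - {v}: {weight} shared article(s)")
```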
DISCUSSIONS
• Questions?
• Ideas for research?