Twenty Thousand Leagues Above the Book:
An Interactive Visual Analytics Approach to Literature
[Figure 2, pipeline diagram. Labels: source texts from Project Gutenberg; k slices of n words (e.g. n=1,000), numbered 1, 2, 3, ..., k; configure: lexicon of characters (can also be a list of other entities such as places or phrases) or automatic/semi-automatic methods (to detect and match entities in text, topic detection, sentiment analysis, part-of-speech tagging); matched information per slice (e.g. Abel Magwitch; Joe Gargery, Mrs Joe Gargery, Pip); analyze: directed network, dynamic network visualization, network visualisations, statistics and other measures.]
In recent years, data-driven analysis has emerged as a growing methodology within literary
studies. These distant reading practices harness available technology to open new avenues for how
we understand literary texts. Whereas traditional literary scholarship is generally grounded in the
interpretation of the specific language of a text or body of texts, macroanalytic approaches present
new ways of seeing texts, both individually and in the aggregate. Here we introduce a novel tool
for collaborative interaction with literature that transcends the boundaries of traditional close
reading and data-driven distant reading. Our approach constructs temporally ordered networks of
information occurrence, which we configured to match characters as the unit of information. This
creates a unique view of the narrative structures within novels and opens a variety of possibilities
for visual as well as information-theoretic analysis.
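The construction of a temporally ordered network from character occurrences can be sketched as follows. This is a minimal illustration and not the prototype's code: the function name `build_cascade` and the linking rule (each slice is connected to the most recent earlier slice in which the same character was matched) are assumptions made for this example, and the sample slices are invented.

```python
from typing import Dict, List, Set, Tuple


def build_cascade(slices: List[Set[str]]) -> List[Tuple[int, int, str]]:
    """Return directed edges (earlier_slice, later_slice, character).

    Assumed rule: a slice is linked to the most recent earlier slice
    in which the same character was matched.
    """
    last_seen: Dict[str, int] = {}  # character -> index of latest slice containing it
    edges: List[Tuple[int, int, str]] = []
    for index, characters in enumerate(slices):
        # Sort names so the edge order is deterministic.
        for name in sorted(characters):
            if name in last_seen:
                edges.append((last_seen[name], index, name))
            last_seen[name] = index
    return edges


# Hypothetical matched characters for four consecutive slices.
slices = [
    {"Pip", "Abel Magwitch"},
    {"Pip", "Joe Gargery", "Mrs Joe Gargery"},
    {"Joe Gargery"},
    {"Abel Magwitch", "Pip"},
]
edges = build_cascade(slices)
for source, target, name in edges:
    print(f"slice {source} -> slice {target} via {name}")
```

Because edges always point from an earlier slice to a later one, the resulting network is directed and preserves the order of the narrative.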
Figure 2: Overview of the data processing pipeline of our tool prototype. The user can choose between a supervised approach (which requires a lexicon of the information to match)
or an unsupervised approach (using state-of-the-art natural language processing).
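The supervised path of the pipeline (slicing a text into chunks of n words and matching a lexicon against each slice) might look roughly like the sketch below. The helper names `slice_words` and `match_lexicon`, the sample sentence, and the longest-name-first matching rule are assumptions for illustration; the prototype may resolve overlapping names differently.

```python
import re
from typing import List, Set


def slice_words(text: str, n: int) -> List[str]:
    """Split text into consecutive slices of at most n whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(0, len(words), n)]


def match_lexicon(slice_text: str, lexicon: List[str]) -> Set[str]:
    """Return the lexicon entries that occur in one slice.

    Assumption: longer names are matched first and removed from the text,
    so "Mrs Joe Gargery" is not also counted as "Joe Gargery".
    """
    matched: Set[str] = set()
    remaining = slice_text
    for name in sorted(lexicon, key=len, reverse=True):
        pattern = r"\b" + re.escape(name) + r"\b"
        if re.search(pattern, remaining):
            matched.add(name)
            remaining = re.sub(pattern, " ", remaining)
    return matched


lexicon = ["Abel Magwitch", "Joe Gargery", "Mrs Joe Gargery", "Pip"]
# Invented sample text; a real run would use a full novel and e.g. n=1000.
text = ("Pip met Abel Magwitch on the marshes. "
        "Later Mrs Joe Gargery scolded Pip at home.")
text_slices = slice_words(text, n=7)
matches = [match_lexicon(s, lexicon) for s in text_slices]
for number, found in enumerate(matches, start=1):
    print(number, sorted(found))
```

One limitation of this sketch is that a name split across a slice boundary is not matched in either slice.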
Markus Luczak-Roesch (@mluczak), Adam Grener, Emma Fenton
As part of our initial user studies, humanities scholars
collaboratively produced a set of annotated
graphs based on the visualisations that were provided to them
(see Figures 3 and 4 for two examples). These annotated graphs
show in great detail what kind of insights humanities scholars are
looking for. Annotating groups of nodes to mark up larger episodes
or sequences in the novel was a common task they performed. The
interactions between the derived networks and the original texts
flowed both ways: the scholars began with moments of known
narrative significance and moved to the network to understand that
significance, but they also used elements of the network that
appeared significant to identify places to begin textual analysis.
These observations on self-designed annotations are evidence of
an emergent methodological practice, as the
humanities scholars were given a novel tool and then developed
their own analytical process around it. Our user studies also
demonstrate the potential of our tool to cultivate
collaborative interpretive practices for readers.
Project page: https://vuw-sim-stia.github.io/computational-literary-science/
Source code repository: https://github.com/vuw-sim-stia/lit-cascades
Figure 3: Manual annotation of the Bleak
House network that identifies moments of
convergence for important characters.
Figure 4: Manual annotations of the David
Copperfield network that identify moments of
narrative significance.
Figure 1: Deployment of our
prototype on a 49'' multi-touch
screen. Individuals and groups can
convene in front of this setup and use
the tool while working with one of
the analysed novels.
DOI of the paper describing this work:
https://doi.org/10.1145/3148330.3154507
Demo: https://goo.gl/yFoZ6U
