Context and collections, and the British Library
Ben O’Steen, British Library Labs
@benosteen
British Library Labs - Overview Talk 2017
The British Library
Inside the British Library
Space for 1200 readers, around 400,000 visitors per year
Uses low oxygen and robots
Reading room and delivery to London
Document Supply and Storage at Boston Spa
Stockton-on-Tees
Public Lending Right: authors’ right to payment each time their books are borrowed from public libraries.
St Pancras, London, UK
Many books are stored four storeys below the building
Legal Deposit Library – Reference only
Living Knowledge Vision (2015 – 2023)
Custodianship, Research, Business, Culture, Learning, International
Document: http://goo.gl/h41wW7 Speech: https://goo.gl/Py9uHK
Roly Keating (Chief Executive Officer of the British Library)
To make our intellectual heritage accessible to everyone,
for research, inspiration and enjoyment and be the most open,
creative and innovative institution of its kind by 2023.
Collections – not just books!
> 180* million items
> 0.8* million serial titles
> 8* million stamps
> 14* million books
> 3* million sound recordings
> 4* million maps
> 1.6* million musical scores
> 0.3* million manuscripts
> 60* million patents
King’s Library
*Estimates
Wider…not just Researchers
Researchers
https://goo.gl/WutNyi
Artists
http://goo.gl/nNKhQ2
Librarians
Curators
https://goo.gl/9NWZUW
Software Developers
https://goo.gl/7QQ5Tf
Archivists
https://goo.gl/x7b4tg
Educators
https://goo.gl/qh01Mi
Digital research methods
• Visualisations
• Application Programming Interfaces (APIs) for datasets, e.g. metadata, images
• Annotation
• Location-based searching and geo-tagging
• Crowdsourcing
• Human computation
How did we do this?
• Competitions: tell us your ideas of what to do with our digital content
• Awards: show us what you have already done with our digital content in the research, artistic, commercial, and learning and teaching categories
• Projects: talk to us about working on collaborative projects
Getting to the heart of it
British Library Labs works with researchers on their specific problems and tries to assess how widely each problem is felt. With their help, we talk to communities of researchers and try to pinpoint what they need, as opposed to what they think they need to ask us for.
Researchers often ask for all the content we have.
What does that mean for digitised items in practice?
Taking a peek at our Open Data
A digitised book…
002819694
OCR XML generated by ABBYY FineReader
Could Labs provide other ways to understand this book?
Text generated by optical character recognition (OCR)
Scanned page
Image on Flickr Commons
https://goo.gl/AC43vs
Tagging, Tagging, Tagging…
Iterative crowdsourcing?
(The term is borrowed from Mia Ridge.)
1. Crowdsource broad facts and subcollections of related items emerge.
2. No 'one-size-fits-all': Subcollections allow for more focussed curation.
GOTO 1
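The loop above can be sketched in a few lines of Python. This is a hypothetical illustration, not a Labs tool: it assumes each item carries a set of volunteer-supplied tags, and lets any tag shared by at least two items become a subcollection. The second round then asks more focused questions inside one subcollection only, which is the "GOTO 1" step. Item IDs and tags are invented.

```python
from collections import defaultdict

def emerge_subcollections(tags_by_item, min_size=2):
    """Group items under each crowd-supplied tag.

    tags_by_item maps a (hypothetical) item ID to the set of tags volunteers
    gave it. Only tags shared by at least min_size items become subcollections.
    """
    groups = defaultdict(set)
    for item_id, tags in tags_by_item.items():
        for tag in tags:
            groups[tag].add(item_id)
    return {tag: items for tag, items in groups.items() if len(items) >= min_size}

# Round 1: broad facts about every image.
round1 = {
    "img1": {"map"}, "img2": {"map"}, "img3": {"portrait"},
    "img4": {"portrait", "map"}, "img5": {"decoration"},
}
subs = emerge_subcollections(round1)
# subs == {"map": {"img1", "img2", "img4"}, "portrait": {"img3", "img4"}}

# GOTO 1: more focused questions, asked only within the "map" subcollection.
round2 = {"img1": {"county map"}, "img2": {"county map"}, "img4": {"city plan"}}
focused = emerge_subcollections(round2)
print(focused)
```

Requiring a minimum size is one simple way to stop one-off tags from becoming subcollections; a real project would likely also weigh how many volunteers agreed on each tag.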
SherlockNet: Competition Winner 2016
Karen Wang, Luda Zhao and Brian Do
Using Convolutional Neural Networks to Automatically Tag and Caption the British Library Flickr Commons 1 million Image Collection
• 12 categories
• >20 million tags added
• >100,000 captions
bit.ly/sherlocknet
Pooled the surrounding OCR text on the page from similar images.
Used Microsoft COCO (photographs) and the British Museum Prints and Drawings collection as training sets.
(Example panels: Tags, Captions)
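SherlockNet's actual pipeline is documented at bit.ly/sherlocknet. As a hedged illustration of the idea of pooling OCR text from similar images, here is a minimal sketch that assumes each image already has a feature vector (for example from a CNN) plus the OCR words surrounding it on its page. It finds the nearest neighbours by cosine similarity and suggests the most frequent neighbouring words as tags. The vectors, texts and stop-word list are made-up assumptions.

```python
import math
from collections import Counter

STOP_WORDS = {"the", "of", "and", "a", "in", "to"}  # illustrative, not exhaustive

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def pooled_tags(query_vec, corpus, k=2, n_tags=3):
    """Suggest tags for an image by pooling the OCR text around its k most similar images.

    corpus: list of (feature_vector, surrounding_ocr_text) pairs. The feature
    vectors are assumed to come from some image model; here they are toy values.
    """
    neighbours = sorted(corpus, key=lambda item: cosine(query_vec, item[0]), reverse=True)[:k]
    counts = Counter()
    for _, text in neighbours:
        counts.update(w for w in text.lower().split() if w.isalpha() and w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n_tags)]

corpus = [
    ([0.9, 0.1, 0.0], "map of the county of kent"),
    ([0.8, 0.2, 0.1], "a new map of kent and sussex"),
    ([0.0, 0.1, 0.9], "portrait of the author"),
]
tags = pooled_tags([0.85, 0.15, 0.05], corpus)
print(tags)  # ['map', 'kent', 'county']
```

Pooling across several neighbours means a word that appears only beside one image carries less weight than a word that recurs around many similar images.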
Artistic / Creative Works
http://goo.gl/dM8ieA
Mario Klingemann (2015)
David Normal (2014 and 2015)
Kris Hoffman (2016)
https://goo.gl/QilqqT
Jiayi Chong (2016), Ling Low (2016)
https://www.youtube.com/watch?v=bcOP1E5bRE0
https://www.facebook.com/RealmlandStory/
Paul Rand Pierce (2016)
“A Hat on the Ground Spells Trouble”
“Tragic Looking Women”
“44 Men who Look 44” (notice the direction the faces are looking)
Imaginary Cities – BL Labs Project 16-17
Michael Takeo Magruder
https://goo.gl/4ARwTy
An artistic exploration seeking to create provocative fictional cityscapes for the Information Age from the British Library’s digital collection of historic urban maps
Mario Klingemann 2016
https://www.youtube.com/watch?v=xgnxnmqnR7Y
Google Arts and Culture Lab – Experiments with Machine Learning
https://artsexperiments.withgoogle.com/
Mario Klingemann
http://www.robertelliottsmith.com/?p=530
MIT Moral Machine survey:
http://moralmachine.mit.edu/
Presentation shapes perception
Creative Uses
• David Normal installation at Burning Man Festival
• “Moments” by Joe Bell
• Colouring-in Pages for Children
David Normal
http://www.davidnormal.com/
Burning Man Festival
David Normal created light boxes around the Burning Man, using images from the British Library’s Flickr collection
“Crossroads of Curiosity”
(20 June to November 2015)
But how can anyone find anything useful?
John Cooper, https://www.flickr.com/photos/atomicshed/2436324958 CC BY-NC-ND 2.0
Infancy of understanding
Large-scale analysis of text is evolving but still young.
An exasperating situation in which algorithmic ‘black boxes’ are used to draw conclusions.
http://www.scottbot.net/HIAL/?p=41271
“Black Boxes”: a misnomer
It is legitimate and useful to use code that you could not write yourself.
It is not legitimate simply to believe the ‘label’ on the side of the box.
E.g. “sentiment analysis” is often nothing of the sort.
Quoting Scott Weingart: (emphasis mine)
● Do sentiment analysis algorithms agree with one another enough to be considered valid?
● Do sentiment analysis results agree with humans performing the same task enough to be considered valid?
● Is Jockers’ instantiation of aggregate sentiment analysis validly measuring anything besides random fluctuations?
● Is aggregate sentiment analysis, by human or machine, a valid method for revealing plot arcs?
● If aggregate sentiment analysis finds common but distinct patterns and they don’t seem to map onto plot arcs, can they still be valid measurements of anything at all?
● Can a subjective concept, whether measured by people or machines, actually be considered invalid or valid?
(again from http://www.scottbot.net/HIAL/?p=41271)
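Weingart's first question, whether sentiment algorithms agree with one another, can be checked rather than assumed. A minimal sketch, assuming two toy lexicon-based scorers whose word lists are invented for illustration (they are not taken from any real sentiment tool): score each sentence with both, then measure how often they assign the same polarity.

```python
def lexicon_score(text, positive, negative):
    """Positive-word count minus negative-word count: the simplest lexicon scorer."""
    words = text.lower().replace(",", " ").replace(".", " ").split()
    return sum(w in positive for w in words) - sum(w in negative for w in words)

def polarity(score):
    """Map a score to 1 (positive), 0 (neutral) or -1 (negative)."""
    return (score > 0) - (score < 0)

# Two hypothetical lexicons that differ only on the word "fortune".
LEXICON_A = ({"happy", "good", "fortune"}, {"sad", "poor", "death"})
LEXICON_B = ({"happy", "good"}, {"sad", "poor", "death", "fortune"})

sentences = [
    "A happy and good day.",
    "Her fortune was lost.",
    "The poor widow wept.",
    "He left a fortune at his death.",
]

def agreement(texts, lex1, lex2):
    """Fraction of texts on which two scorers assign the same polarity."""
    same = sum(
        polarity(lexicon_score(t, *lex1)) == polarity(lexicon_score(t, *lex2))
        for t in texts
    )
    return same / len(texts)

print(agreement(sentences, LEXICON_A, LEXICON_B))  # 0.5
```

A one-word difference between lexicons is enough to halve agreement on this tiny sample, which is the kind of fragility the label on the box does not reveal.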
* (2012) https://ariddell.org/where-are-the-novels.html
Digitisation
Often through partnerships with commercial and other organisations
Bias in digitisation
http://goo.gl/bR9UJL
Sample Generator
Open Licensed Digital Content?
Around 10%* available online (still working through)
Breakdown by collection*:
• Manuscripts 59%
• Books 9%
• Maps and Views 7%
• Newspapers 3%
• Archives and Records 3%
• Paintings, Prints and Drawings 2%
*Based on digitisation projects
Largest proportion of funding: Public / Private Partnership
15%* openly licensed
85%* available onsite
*Estimates
Accessing digital collections onsite
OPEN £
• Have to be ‘onsite’
• Need to be security cleared for some collections
– Hence the ‘Researcher in Residence’ model
• Permission required (depending on the ‘story’ of the collection)
• Content on various media formats
• 20% re-use of material for non-commercial research for some collections
• We are learning ‘pathways’ so that providing onsite access becomes ‘everyday’ in the future
Typical pattern of research for Labs
• Finding invisible things in ‘messy’ historical data
• Unearthing / unlocking hidden histories and data to stimulate new research
• Celebrating hidden histories / data creatively through events, art and performance
Finding things in messy OCR text
Mrs Folly
• Clean up some text manually
• Get human ‘ground truth’
• Write code to find things in it reliably and automatically
• Try the code on messy content
• Tweak if necessary
• Draw a digital ‘lasso’ around the content
• Humans sift through the results
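The steps above can be sketched in Python. This is an illustration, not the Labs code: a small hand-checked sample serves as ground truth, a regular expression tolerant of a few assumed OCR confusions (long s or 'f' for 's', '0' for 'o', 'I' or '1' for 'l') acts as the digital lasso, and precision and recall on the sample show whether the pattern needs tweaking before it is run on the messy content. The OCR lines and the confusion list are made up.

```python
import re

# Hypothetical pattern for "Mrs Folly" that tolerates a few OCR confusions:
# long s (ſ) or 'f' for 's', '0' for 'o', and 'I' or '1' for 'l'.
FOLLY = re.compile(r"\bMr[s\u017ff]\.?\s+F[o0][lI1][lI1]y\b")

def evaluate(pattern, lines, truth):
    """Compare pattern hits with hand-labelled ground truth (a set of line indices)."""
    hits = {i for i, line in enumerate(lines) if pattern.search(line)}
    true_pos = len(hits & truth)
    precision = true_pos / len(hits) if hits else 0.0
    recall = true_pos / len(truth) if truth else 0.0
    return hits, precision, recall

sample = [
    "Mrs Folly will perform this evening",  # clean text
    "Mrf. FoIly at the Theatre Royal",      # OCR noise the pattern allows for
    "Mr Foley, draper, of Cheapside",       # a different person
    "Mrs. F0lly's benefit night",           # OCR noise the pattern allows for
    "Mrs Tolly sang two songs",             # 'F' misread as 'T': not yet covered
]
ground_truth = {0, 1, 3, 4}  # lines a human has confirmed mention Mrs Folly

hits, precision, recall = evaluate(FOLLY, sample, ground_truth)
print(sorted(hits), precision, recall)  # [0, 1, 3] 1.0 0.75
```

Recall below 1.0 is the signal to tweak, for example by allowing 'T' for 'F'; each loosening tends to lower precision, which is why the final human sift through the lassoed results remains part of the process.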
Code: Machine Learning / Reading
• Analogies to how humans read / learn
• Machines acquire ‘knowledge’ / data and use that knowledge / data to make sense of it / identify patterns
• Labs does this on a case-by-case basis, so methods vary
• Needs computational AND human effort
• The legalities of this process are being ‘ironed out’ with publishers
• Often a misunderstood area…
• Computers look for ‘patterns’ or the ‘essence’ of something
Katrina Navickas (2015)
Political Meetings Mapper
http://politicalmeetingsmapper.co.uk
https://goo.gl/Qq78Oa
Labs Symposium 2015
https://goo.gl/BSA3be
Interview 2015
The Chartist Newspaper
http://goo.gl/vOLSnH
Chartist Monster Meeting
Chartists Walking Tour and Re-enactment, London
Working with Newspaper Collections Using Jupyter Notebooks
Virtual Infrastructure for OCR text
• OCR text scraped from digitised newspapers and held in the cloud
• Jupyter notebook: write Python code and see the results in the browser
http://jupyter.org
Access available for researchers ‘in residence’
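A notebook session on this kind of infrastructure might begin with a cell like the one below. It is a sketch under assumptions: it supposes the OCR text sits in plain-text files, one per newspaper issue, named by date (a hypothetical layout; the real infrastructure may differ), and counts how many issues per year mention a search term. To stay self-contained it first writes two tiny sample files to a temporary directory.

```python
import re
import tempfile
from collections import Counter
from pathlib import Path

def mentions_per_year(folder, term):
    """Count issues per year whose OCR text mentions `term` (case-insensitive).

    Assumes hypothetical file names such as 1847-04-14.txt (year first).
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    counts = Counter()
    for path in sorted(Path(folder).glob("*.txt")):
        year = path.name[:4]
        text = path.read_text(encoding="utf-8", errors="replace")
        if pattern.search(text):
            counts[year] += 1
    return counts

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "1847-04-14.txt").write_text("Frederick Douglass, the emancipated slave", encoding="utf-8")
    Path(tmp, "1851-02-05.txt").write_text("Fugitive slaves lecture at Aberdeen", encoding="utf-8")
    result = mentions_per_year(tmp, "Douglass")
    print(result)  # Counter({'1847': 1})
```

Counting issues rather than raw occurrences keeps one heavily reported event from dominating a year; either choice is reasonable depending on the research question.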
Black Abolitionists in the UK
Researcher: Hannah-Rose Murray
Black Abolitionist Performances & their Presence in Britain (2016) – Hannah-Rose Murray
Aberdeen Journal, 5 February 1851: “Fugitive Slaves”
Aberdeen Journal, 14 April 1847: “Frederick Douglass, The Emancipated Slave”
Frederick Douglass, Ellen Craft, Josiah Henson, Ida B. Wells
A performance by Joe Williams & Martelle Edinborough
http://frederickdouglassinbritain.com/
Use Overproof for OCR correction?
Re-OCR with ABBYY FineReader?
https://www.abbyy.com/en-gb/
http://overproof.projectcomputing.com/
We surveyed a set portion of the collection for the words we were interested in, and for words 1 and 2 edits ‘distant’ from these (Levenshtein distance).
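Levenshtein distance counts the single-character insertions, deletions and substitutions needed to turn one word into another. A minimal sketch of the survey step, assuming a plain list of OCR tokens (invented here): the standard dynamic-programming distance, then a tally of tokens within distance 2 of a target word.

```python
from collections import Counter

def levenshtein(a, b):
    """Edit distance via the classic dynamic-programming recurrence, keeping two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # deletion
                curr[j - 1] + 1,           # insertion
                prev[j - 1] + (ca != cb),  # substitution (free if characters match)
            ))
        prev = curr
    return prev[-1]

def variants(target, tokens, max_distance=2):
    """Tally tokens within max_distance edits of target, grouped by distance."""
    found = {d: Counter() for d in range(max_distance + 1)}
    for tok in tokens:
        d = levenshtein(target.lower(), tok.lower())
        if d <= max_distance:
            found[d][tok] += 1
    return found

tokens = "slave flave slavery sIave shave save brave slave".split()
found = variants("slave", tokens)
print(found)
```

On this sample, distance 1 catches OCR errors such as "flave" and "sIave" but also real words like "shave" and "save", and distance 2 adds "brave". Widening the net raises recall at the cost of noise, which is why the hits still need a human or a classifier to sort them.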
Naive Bayes classifier:
Classifiers allowed us to prioritise relevant articles without having to read them first:
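The slide does not say which implementation was used. As a hedged illustration of the technique, here is a from-scratch multinomial Naive Bayes with add-one (Laplace) smoothing: a few hand-labelled articles train it, and unread articles are then ranked by the log-odds of being relevant, so the most promising ones are read first. The training snippets and the labels "relevant" and "other" are invented.

```python
import math
from collections import Counter

def tokenize(text):
    """Lower-case, whitespace-split, keep purely alphabetic tokens."""
    return [w for w in text.lower().split() if w.isalpha()]

class NaiveBayes:
    """Multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs, labels):
        self.word_counts = {c: Counter() for c in set(labels)}
        self.doc_counts = Counter(labels)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def log_prob(self, doc, c):
        """Log prior plus summed smoothed log likelihoods of doc's words under class c."""
        total = sum(self.word_counts[c].values())
        lp = math.log(self.doc_counts[c] / sum(self.doc_counts.values()))
        for w in tokenize(doc):
            lp += math.log((self.word_counts[c][w] + 1) / (total + len(self.vocab)))
        return lp

    def relevance(self, doc, positive="relevant", negative="other"):
        """Log-odds that doc is in the positive class; higher means read it sooner."""
        return self.log_prob(doc, positive) - self.log_prob(doc, negative)

train = [
    ("lecture by a fugitive slave at the hall", "relevant"),
    ("the emancipated slave addressed a large meeting", "relevant"),
    ("price of corn and wheat at market", "other"),
    ("ship arrived from liverpool with cargo", "other"),
]
model = NaiveBayes().fit([t for t, _ in train], [l for _, l in train])

unread = ["wheat prices fell at market", "meeting to hear a fugitive slave"]
ranked = sorted(unread, key=model.relevance, reverse=True)
print(ranked[0])  # meeting to hear a fugitive slave
```

Ranking by log-odds rather than a hard yes/no lets researchers read down the list until the hits dry up, which suits the goal of prioritising rather than fully automating.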
Data-mining verse in 18th-century newspapers
BL Labs Project 16-17, Jennifer Batt
https://goo.gl/5Akthd
Slides courtesy Jennifer Batt
Jennifer Batt @ the BL on World Poetry Day
What thoj' among ourrelves, with too much Heat, or t
W: fweutimes.wongle, wvhen we Ihould debate, W –
(A confequential Ill which Freedom drawvs, fl t
A bad Efficf, but from a noble Caufe) t
We can with univeifal Zcal advance, to
To cutb the faithlefs Arrogancccof V rance. hi
Dublin Journal, 10-14 September 1745
Slides courtesy Jennifer Batt
Verse: 81% of lines begin with an initial capital
Prose: 52% of lines begin with an initial capital
Westminster Journal, 3 March 1745
Slides courtesy Jennifer Batt
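The figures above suggest a simple layout feature: the share of lines in an article that begin with a capital letter. A minimal sketch, assuming articles arrive as OCR text with line breaks preserved. The verse sample reuses the Dublin Journal OCR lines quoted earlier; the prose sample is invented, and the 0.7 threshold is an arbitrary illustration between the two reported averages, not a value from the project.

```python
def starts_upper(line):
    """True if the first alphabetic character of the line is uppercase."""
    first_alpha = next((ch for ch in line if ch.isalpha()), "")
    return first_alpha.isupper()

def initial_capital_ratio(text):
    """Fraction of non-empty lines whose first letter is a capital."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        return 0.0
    return sum(starts_upper(line) for line in lines) / len(lines)

def looks_like_verse(text, threshold=0.7):  # threshold is an assumption
    return initial_capital_ratio(text) >= threshold

verse = """What thoj' among ourrelves, with too much Heat, or t
W: fweutimes.wongle, wvhen we Ihould debate, W –
(A confequential Ill which Freedom drawvs, fl t
A bad Efficf, but from a noble Caufe) t
We can with univeifal Zcal advance, to
To cutb the faithlefs Arrogancccof V rance. hi"""

prose = """The ship arrived yesterday from Lisbon
with a cargo of wine and fruit, and
is expected to sail again on Monday
for Bristol."""

print(initial_capital_ratio(verse), initial_capital_ratio(prose))  # 1.0 0.25
```

Because the feature only looks at the first letter of each line, it survives the heavy OCR noise in the verse sample; on its own it is a weak signal and would more plausibly serve as one input to a classifier than as a rule.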
http://varianceexplained.org/r/kmeans-free-lunch/
In Summary:
- Context about how a digitised image came to be, and why it was scanned, is both crucial to understand and sometimes crucial to hide.
- In other words, opening up large collections brings its own issues.
- Presentation shapes perception.
- Too much trust is placed in black-box algorithms, like search engines or social feed suggestions.
- So little of our history is online that there is a natural bias. The gaps are being filled in with less credible sources.
- Something might still have happened even if you cannot google it, and vice versa!