The Trouble with TDM*
…
(*Text & Data Mining)
Matthew Lambert
bl.uk
Introductions
Matthew Lambert
• Head of Copyright Policy & Assurance
The British Library
• National Library
• 150-200M items
• Legal deposit library
The project – Living with Machines
• Living with Machines is a research project between The British Library, The Alan Turing Institute, and the Universities of Cambridge, East Anglia, Exeter and London (QMUL)
• Funded by the UK Research and Innovation (UKRI) Strategic Priority Fund
• Focussing on the period 1780-1918, the project uses computational analysis tools to examine how technology affected society in the UK
British Library’s Contribution
• The British Library holds a vast collection of content, including newspapers and maps documenting changes during the period under study.
• We provided over 100 newspaper titles and digitised more than 600,000 pages for the project.
One quick caveat…
This talk covers only the copyright aspect; a project of this size raises many other difficulties as well.
Difficulties
Legal:
• The TDM exception was not drafted with research partnerships in mind
• No other exceptions are useful here
• We need to be able to make sample datasets available after the research
Practical:
• Assessing the material at scale
• Tracking down rights holders
• Managing content while enabling analysis
The law – CDPA 1988 – s29A
UK copyright legislation does contain an exception covering text and data mining.
• It is fantastic that we have the exception; however, it was not written with partnerships in mind.
Let’s take a closer look…
The law – CDPA 1988 – s29A
• Great! The British Library holds physical copies of the items and so has lawful access to them
• The project is non-commercial and will provide acknowledgement
• Content is not being sold or ‘let for hire’
The law – CDPA 1988 – s29A
So while the British Library can make the copies, we cannot give them to anyone else…
…where does this leave our partners who are doing the research?
The law – CDPA 1988 – s29A
• So, on the face of it, the exception isn’t all that helpful...
• Still, all is not lost
• It does allow us to digitise the material
• It also allows the BL to use the content (more on that later)
• We will just need to use other approaches to fill the gaps
• Assessing content
• Licensing material
• Managing risk
• In addition to the initial legal difficulties, research of this type is expected to publish datasets so that its results can be checked
Practical Difficulties
• The project wishes to use a vast amount of content: hundreds of thousands of pages
• Some of the content will be out of copyright and so free to use, but which?
• Often the content was published by organisations that no longer exist
• Is the content anonymous? By-lines are a relatively recent feature of newspaper articles
Practical Difficulties
• If we decide to go down a licensing route, who do we speak to?
• Given all the unknowns, there will inevitably be risks, and these need to be managed.
• How do we get the content to our partners?
Approaches – Practical Difficulties
• As each article and image in a newspaper is a distinct copyright work, each potentially needs to be assessed individually, which is very resource-intensive
• Assessing content at such scale is very difficult, so we had to take a high-level approach
• Content was examined in detail to determine safe dates to help with assessment
• Spot checks were carried out to find out whether and when by-lines became common in newspapers
• The project’s risk appetite was determined and processes were built to manage the risks
Approaches – Legal Difficulties
• Once copyright status was determined, we needed to work out how to deal with the in-copyright material
• The British Library can’t give the content to our partners and still remain within the exception
• But what if the partners temporarily joined the British Library? Then they would have lawful access, and their work would fall under the exception…
• Digitised content is held on a dedicated, secure server controlled by the BL; researchers have access only to the content they need
Approaches – Legal Difficulties
• When content is no longer needed, it is deleted from the secure server
• Out-of-copyright datasets will be made available so that research results can be tested
• Researchers were given guidelines on publishing their results, to ensure copyright was not infringed
The Future
• After Brexit, the EU Digital Single Market (DSM) Directive will not change UK legislation (so its Articles 3 and 4 are no help)
• A recent UK IPO consultation on copyright and AI specifically covered TDM
• For UK research to remain globally competitive, we need this exception to be widened