TEXT AND DATA MINING IN
PUBLIC RESEARCH
Rob Johnson – 13/12/2016
1. Why does TDM matter?
2. Why isn’t it used more widely in public research?
3. How do we change this?
Study aims
Assess economic impact of TDM
on public research in France via:
• Case studies (France, UK, Europe)
• Analysis of the relevance of a
copyright exception for TDM
http://adbu.fr/etude-tdm/
6-fold return
€6 contribution to EU economy for each €1 directly generated by research universities (source: Biggar Economics)
20% per annum
Estimated rate of return to public investment in science and innovation (source: Frontier Economics)
€16 billion
Value of R&D performed within French universities and public research bodies (source: Eurostat)
2.4 million
Scientific articles per annum
Zero
Number of researchers who can keep up
2.5 quintillion bytes
Data produced each day
What is TDM?
Any automated analytical technique aiming to analyse text and data in digital form in order to generate information such as patterns, trends and correlations.
Source: European Commission, Proposal for a Directive of the European Parliament and of the Council on copyright in the Digital Single Market
BASE CAMP
Where are we now, and how did we get here?
What is the problem?
…countries, in which academic researchers must acquire the express consent of rights holders to conduct lawful data mining, exhibit a significantly lower share of data mining research output relative to total research output
Handke, Guibault and Vallbé, Is Europe Falling Behind in Data Mining? (2015)
What is the result?
The European ecosystem for engaging in text and data mining remains highly problematic… The end result: Europe is being leapfrogged by rising interest in other regions, notably Asia.
Filippov, S. & Hofheinz, P., Text and Data Mining for Research and Innovation (2016)
Legislative options
2014 – 2017?
1. Industry self-regulation
Mandatory exceptions to copyright:
2. Non-commercial research only
3. Commercial research, beneficiaries restricted
4. Commercial research purpose, beneficiaries unrestricted
Loi pour une République Numérique (Loi LEMAIRE), 28 September 2016: 1.5?
Using a TDM exception
Restriction France
No lawful access
Not scientific literature
-
Not public research
Commercial purpose
Conservation not by designated body
1. ACHIEVING LEGAL CLARITY
Copyright exception (Base Camp)
Camp 1: Legal clarity
EC Directive
Camp 2: Access to content
Camp 3: Technical infrastructure
Camp 4: Skills and support
Summit: Researchers embrace TDM
The exception has made a massive difference...
Petr Knoth, Open University, UK
…the definition of commercial and non-commercial research is creating uncertainty
Petr Knoth, Open University, UK
EC Proposed Directive
• Consistent with the existing EU copyright legal framework
• Could help resolve uncertainty over commercial partnerships
• Currently out for consultation
Source: http://www.comodinicachia.com/timeline.html
What needs to happen?
• Communicate legal provisions for TDM with certainty and clarity
• Clarify the exception’s scope where public researchers collaborate with commercial partners
• Monitor the interaction of the copyright exception with digital rights management (DRM), licensing and other relevant legal regimes
Any questions?
2. SECURING ACCESS
I scaled down my TDM research, and had to exclude two publishers… I couldn’t do what I set out to do
Chris Hartgerink, Tilburg University, Netherlands
I had to ask too many publishers for the right to download … it takes a lot of time and … the publishers’ servers frequently block us.
Mathieu Andro, INRA, France
What is the problem with access?
• Technical protection measures (TPMs)
• Crawler traps
• Restricted access to application programming interfaces (APIs)
What needs to happen?
• Incorporate TDM clauses into model licence agreements
• Educate researchers on their rights
• Maintain dialogue with publishers
• Improve access through better infrastructure…
3. INFRASTRUCTURE & TOOLS
…Every time you have a new project or data source… you hit issues about how the documents are structured, oddities of formatting, and so on.
Mark Greenwood, GATE, UK
The TDM Landscape (figure; source: OpenMinTED)
What needs to happen?
• Invest in TDM infrastructure
• Make TDM accessible to non-specialists
• Streamline access
• Open standards and harmonised data formats
4. SKILLS & SUPPORT
…We have algorithms to answer questions, but we do not have algorithms to ask questions
François Rioult, GREYC Laboratory, Université de Caen, France
What is the role of the librarian?
Photo: REUTERS
The library needs to be able to say: ‘If you’ve got a question about TDM, come to us’
Danny Kingsley, Head of Scholarly Communications, University of Cambridge, UK
Library support for TDM
• Advocacy
• Copyright advice
• Access to legal expertise
• Skills development and training
• Advice on data sources and tools
5. EMBRACING TDM
Why?
"Because it's there"
There are so many obstructions in the way of doing this research, and doing it well. It is just too hard and so people do other things
Ross Mounce, University of Cambridge, UK
What needs to happen?
• Endorsement by senior research leaders
• Funding and incentives linked to TDM
• Alignment with moves to open science
1. Why does TDM matter?
2. Why isn’t it used more widely in public research?
3. How do we change this?
Why does TDM matter?
• Public research is valuable
• TDM makes research more efficient
• TDM is worth investing in
1. Why does TDM matter?
2. Why isn’t it used more widely in public research?
3. How do we change this?
Copyright exception (Base Camp)
Camp 1: Legal clarity
EC Directive
Camp 2: Access to content
Camp 3: Technical infrastructure
Camp 4: Skills and support
Summit: Researchers embrace TDM
1. Why does TDM matter?
2. Why isn’t it used more widely in public research?
3. How do we change this?
Making TDM a reality
Libraries
• Monitor researchers’ experience
• Develop case studies and guidance
• Involve the national library
• Invest in TDM support
• Incorporate TDM clauses into licence agreements
Making TDM a reality
Legislators
• Provide certainty
• Enable public/private partnerships
• Monitor interaction with other legislation (e.g. DRM)
Institutions/research leaders
• Endorse TDM
• Invest in library services
• Explore knowledge exchange opportunities
Research funders
• Invest in infrastructure
• Forum to improve access
• Link TDM to Open Science
Publishers & providers
• Cloud services for TDM
• Streamline access
• Open, harmonised standards
Thank you
Rob Johnson
rob.johnson@research-consulting.com
www.research-consulting.com
Full report available at: http://adbu.fr/etude-tdm/

Text and data mining in UK and France (ADBU - 13 Dec 16)


Editor's Notes

  • #5: France: €6.4 billion R&D in government sector, €10 billion in HE. UK: €3 billion in government, €9 billion
  • #7: A number of studies indicate that TDM can increase the efficiency of research: increase coverage of literature reviews; cut down manual work; automate information retrieval; accelerate drug discovery
  • #12: Note: conservation requirements could be a positive in terms of reproducibility
  • #23: A many-to-many problem
  • #31: Edmund Hillary and Tenzing Norgay
  • #33: Advocacy for the benefits of TDM at all levels of the organisation; copyright advice on using the TDM exception; access to legal expertise; skills development (indexing and metadata curation) and access to technical training (coding and high performance computing); advice on data sources and tools