Copyright & Fair Use for Digital Projects
Text Data Mining & Publishing
UC Berkeley Library
Rachael Samberg, J.D., MLIS
Stacy Reardon, MA, MLS
What you can do, not what you can’t
Scholars are turning content into data
But scholars (and the academic staff who support them) face questions about rights
The Basics of TDM
“Text mining is the use of automated tools, techniques or technology to process large volumes of digital content that is often not well structured - to identify and select relevant information; to extract information from the content, to identify relationships within / between / across documents and incidents or events for meta-analysis.”
- from Text & Data Mining - A Librarian Overview by Ann Okerson (2013)
TDM Literacies
● Contracts
● Privacy
● Copyright
● Ethics & Policy
● Other Statutes/Use Cases
Copyright
Exclusive rights to original expression for limited periods of time
Exclusive Rights
▪Reproduction
▪Derivative works
▪Distribution
▪Public performance
▪Public display
Public Domain
● War and Peace, Tolstoy, English translation (1899)
● CDC report
Facts & Ideas
Nicholas Mazza, “Poetry therapy: Toward a research agenda for the 1990s,” The Arts in Psychotherapy, Volume 20, Issue 1, 1993, pp. 51-59
Content vs. Data about the content
TDM researchers can use copyrighted content!
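As a rough illustration of the difference between content and data about the content, the Python sketch below turns a text into word n-gram counts, the kind of frequency data an n-gram viewer charts. The function name `ngram_counts` and its simple lowercase tokenizer are hypothetical choices for this example, not a tool described in the presentation.

```python
import re
from collections import Counter


def ngram_counts(text: str, n: int = 2) -> Counter:
    """Return word n-gram frequencies for `text`: data about the content."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # Hypothetical, deliberately simple tokenizer: lowercase runs of letters/apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Slide an n-word window across the tokens; zip stops at the shortest slice.
    windows = zip(*(tokens[i:] for i in range(n)))
    return Counter(" ".join(window) for window in windows)


if __name__ == "__main__":
    sample = "The cat saw the cat."
    # The result is a table of phrase frequencies rather than the running text.
    print(ngram_counts(sample, n=2).most_common())
```

Whether sharing derived data like this is fair still turns on the factors and cases discussed in this deck; the sketch only shows what "data about the content" can look like in practice.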
Fair Use
17 U.S.C. § 107
“The fair use of a copyrighted work…for purposes such as criticism, comment, news reporting, teaching…, scholarship, or research, is not an infringement of copyright.”
Four-Factor Balancing Test
1. Purpose & character of use: “transformativeness” often dominates
2. Nature of copyrighted work: whether it is a factual/scholarly work
3. Amount and substantiality: size & importance of the portion used
4. Effect on potential market: whether the use supplants the market
Authors Guild v. HathiTrust, 755 F.3d 87 (2d Cir. 2014)
The textual analysis that the digital library enabled was transformative under factor one, and the use was fair overall
Authors Guild v. Google, 804 F.3d 202 (2d Cir. 2015)
Creating a full-text searchable database with “snippet view” and an “ngram viewer” [search strings] was fair use
A.V. v. iParadigms, 562 F.3d 630 (4th Cir. 2009)
Plagiarism-detection software that replicated content to detect similarities was fair use
From research to publishing
Fox News v. TVEyes, 883 F.3d 169 (2d Cir. 2018)
Basic functionality and archiving features were fair use, but making 10-minute clips available was not
Takeaways
● Likely fair to digitize content to conduct text data mining (with security precautions)
● May not be fair to republish large portions of content
● May not be fair to circulate the digitized texts/corpus
● Fair use is decided case by case
Contracts
Database Agreements
Challenges:
- Terms
- Visibility
Archives Agreement
“I understand that permission to publish, or otherwise publicly use, materials . . . must be [granted by library]. I understand further that the University makes no representation that it is the owner of the copyright... and that permission to publish must also be obtained from the owner of the copyright.”
Website Terms
“If you intend to quote extensive amounts of text, use other original content, or reproduce images from this site, please contact us for permission.”
California Digital Library’s Model Database Language
Authorized Users may use the Licensed Materials to perform and engage in text and/or data mining activities for academic research, scholarship, and other educational purposes... and may utilize and share the results of text and/or data mining in their scholarly work and make the results available for use by others, so long as the purpose is not to create a product for use by third parties that would substitute for the Licensed Materials.
CDL Model License: Preserving Fair Use
Notwithstanding the foregoing, nothing in this agreement shall otherwise restrict uses of the material that would be fair use pursuant to 17 U.S.C. § 107 et seq.
Takeaways
● Agreements may restrict uses that would otherwise be fair
● Familiarize yourself with the agreement(s), ask for help, and evaluate risk
● Alternatives:
○ Check whether the site has an API
○ Negotiate with content providers / ask permission
Other Statutes/Use Cases
Other Issues
- Computer Fraud & Abuse Act
- Digital Rights Management (DRM) & Digital Millennium Copyright Act
Privacy
Rights of Privacy
● © protects copyright holders' property rights
● Privacy protects people who are subjects of works
● Federal (FERPA, HIPAA) vs. state law
● Limits on state-law privacy rights
○ Rights expire at death
○ Newsworthiness and permission are defenses
Ethics & Policy
- Indigenous knowledge
- Cultural heritage materials
- Endangered species protection
Exercise: http://ucblib.link/rw
Text Data Mining Guide (Library): guides.lib.berkeley.edu/text-mining
TDM Access Help: tdm-access@berkeley.edu
