The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
pti.iu.edu/sc14
@hathitrust #SC14
Beth Plale | @bplale 
Director Data to Insight Center | Indiana University 
Robert H. McDonald | @mcdonald 
Deputy Director Data to Insight Center | Indiana University
Outline 
• What is the HTRC? 
• Non-Consumptive Research Paradigm 
• Current Architecture 
• Future Architecture 
• Advanced Collaborative Support (RFP) 
• HTRC Science on a Sphere 
• HTRC @ Events
HathiTrust Digital Library
• HathiTrust is a partnership of 90+ academic & research institutions, offering a collection of millions of digitized titles.
• http://hathitrust.org
– IU is a founding member of the HathiTrust along with University of Michigan, University of California, and the University of Virginia
HathiTrust Research Center Mission
• Public research arm of HathiTrust
• Goal: enable researchers world-wide to accomplish tera-scale text data-mining and analysis
– Develop cutting-edge software tools for processing and analyzing text
– Develop cyberinfrastructure to enable HPC access to the HathiTrust Digital Library
• Established: July 2011
• Collaborative center: Indiana University & University of Illinois
HTRC Timeline
• Phase I: development, 01 Jul 2011 – 31 Mar 2013
– HTRC software and services release v1.0: https://github.com/htrc
• Phase II: outreach, 01 Apr 2013 – 30 June 2014
– 2nd HTRC UnCamp, Sep ’13
• Phase III: operations, 01 July 2014 – present
HTRC Current Users
Projected Use 2019 (chart): Digital Humanities (60), Education (60), Informatics (60), Observers (20)
194 existing user accounts
Lots of user accounts; good starting point.
Improve:
• Increase amount of real work being accomplished, as measured by usage on HTRC’s compute resources Quarry and Big Red II at IU
• Develop educational uses
• Develop informatics uses
• Decrease number of observers to 10%
Project 200 users at any one time, of which 90% are doing relevant education/scholarship
Non-Consumptive Research Paradigm
• No action or set of actions on the part of users, either acting alone or in cooperation with other users, over the duration of one or multiple sessions, can result in sufficient information being gathered from a collection of copyrighted works to reassemble pages from the collection.
• The definition disallows collusion between users and accumulation of material over time. It differentiates a human researcher from a proxy, which is not a user. Users are human beings.
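One way to read this definition operationally is as a cumulative disclosure budget that is shared by all users and never resets between sessions. The sketch below is only an illustration under assumptions that are not from the slides: the class name `DisclosureLedger`, the per-page token counts, and the 10% threshold are all hypothetical, and it does not describe how HTRC actually enforces the paradigm.

```python
from collections import defaultdict


class DisclosureLedger:
    """Toy ledger of which token positions of each page have ever been released.

    Hypothetical illustration only. The ledger is global (not per user), so
    colluding users share one budget, and it is never reset, so material
    cannot be accumulated over multiple sessions.
    """

    def __init__(self, page_lengths, max_fraction=0.10):
        # page_lengths: {page_id: number of tokens on that page} (assumed known)
        self.page_lengths = dict(page_lengths)
        self.max_fraction = max_fraction  # hypothetical threshold
        self.released = defaultdict(set)  # page_id -> positions released so far

    def request(self, user, page_id, positions):
        """Release the positions only if the page stays within the budget."""
        # `user` is accepted but deliberately ignored in the decision,
        # because the budget is shared across all users.
        combined = self.released[page_id] | set(positions)
        if len(combined) / self.page_lengths[page_id] > self.max_fraction:
            return False  # would reveal too much of the page overall
        self.released[page_id] = combined
        return True


ledger = DisclosureLedger({"vol1:p7": 100})
print(ledger.request("alice", "vol1:p7", range(0, 6)))   # within budget
print(ledger.request("bob", "vol1:p7", range(6, 12)))    # combined total exceeds budget
print(ledger.request("alice", "vol1:p7", range(0, 6)))   # already released positions
```

In this toy model the second request is refused even though it comes from a different user, which is the point of keeping a single shared ledger rather than one per account.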
HTRC (diagram)
Request → Complexity hiding interface → HTRC (all the complexity)
Returned: Spatial plots, Statistical plots, Tabular info
HTRC Version 2.0
HTRC Goals
• Provide a persistent and sustainable structure to enable original and cutting-edge research.
– Leverage data storage and computational infrastructure at Indiana & Illinois
– Stimulate community development of new functionality and tools
– Use tools to enable discoveries that would not be possible without the HTRC
• Enable scholars to fully utilize content of HathiTrust Library while preventing intellectual property misuse within U.S. copyright law.
– Provision secure computational and data environment for scholars to perform research using HathiTrust Digital Library.
HTRC Organization 2014-18
• HTRC Executive Mgmt
• Administrative Support
• Core Development
• Advanced Research
• Advanced Collaborative Support
• Scholarly Commons
HTRC Data Capsule 
HTRC Data Capsule@IU Team
• Beth Plale (PI) 
• Jiaan Zeng 
• Guangchen Ruan 
HTRC Data Capsule@Michigan Team 
• Atul Prakash (PI) 
• Alexander Crowell 
Jiaan Zeng, Guangchen Ruan, Alexander Crowell, Atul Prakash, and Beth Plale. 2014. Cloud computing data capsules for non-consumptive use of texts. In Proceedings of the 5th ACM Workshop on Scientific Cloud Computing (ScienceCloud '14). ACM, New York, NY, USA, 9-16. DOI=10.1145/2608029.2608031 http://doi.acm.org/10.1145/2608029.2608031
Special Thanks to 
• Samitha Liyanage 
• Milinda Pathirage 
• Zong Peng 
• Earlence Fernandes 
• Ajit Aluri
HTRC Data Capsule (architecture diagram)
Tiers: Web front end, Web service, Backend
Components: User Authentication, Web UI, Web Services, Hypervisor Scripts, Database, Firewall, Audit, Image Store, Volume Store
Hosts: Host-1 … Host-N, each running VM-1 … VM-k
HTRC Data Capsule Workflow
Data Capsule Screenshots
Maintenance Mode 
Secure Mode
HTRC Science on a Sphere #SC14
1. Texts published per country
2. HathiTrust Member Institutions
3. HT Google analytics
HTRC Advanced Collaborative Support
• ACS will be offered on a rolling basis over the next four years, 2014-18
• 1st RFP Call Deadline is Jan 8, 2015, 5:00pm Eastern
– RFP: http://www.hathitrust.org/htrc/acs-rfp
• For more info on the Advanced Collaborative Support, please contact: htrc.acs.awards@gmail.com
HTRC@Events
• DHCS 2014, Oct 22, 2014, Evanston, IL
• SC14 – IU Booth, Nov 17-19, 2014, New Orleans, LA
• CLIR/CNI Workshop on Expanded Access to Collections, Dec. 7, 2014, Washington, DC
• HTRC UnCamp 2015 – March 30-31, 2015, Ann Arbor, MI
Thank You 
HTRC IU Team 
• Beth Plale (PI) 
• Robert H. McDonald 
• Miao Chen 
• Guangchen Ruan 
• Zong Peng 
• Milinda Pathirage 
• Samitha Liyanage 
• Leena Unnikrishnan 
• Nicholae Cline 
HTRC UIUC Team 
• J. Stephen Downie (PI) 
• Beth Namachchivaya 
• Megan Senseney 
• Sayan Bhattacharyya 
• Colleen Fallaw 
• Loretta Auvil 
• Boris Capitanu 
• Harriet Green
More Information on HTRC
• For details: http://www.hathitrust.org/htrc/faq
• General contact info
– J. Stephen Downie, Co-Director HTRC, jdownie@Illinois.edu
– Beth Plale, Co-Director HTRC, plale@indiana.edu
• Requests for capability, interest
– Miao Chen, Asst. Director for Outreach HTRC, miaochen@indiana.edu
The HathiTrust Research Center: Big Data Analytics in a Secure Data Framework
For more on HTRC: http://www.hathitrust.org/htrc
For these slides go to:


Editor's Notes

  • #9: HTRC hides the complexity of analytics. In this sense, it is like Google search, a simple interface that hides the complexity of searching billions of pages. An HTRC interaction returns spatial relationships among words (and their frequencies), statistical plots, or tabular information.
  • #10: Shifting the complexity hiding interface to the right, we open up the cloud to see what’s inside. HTRC at its simplest has 1) algorithms, drawn from SEASR and from other analysis tool suites including Mahout and MapReduce; 2) the HT corpus (and subsets of the corpus that users either hold personally as part of a workset or that are publicly available); and 3) other data sets that are used. HTRC brokers the bringing together of these pieces so that computation can take place on a resource like Big Red II (or XSEDE). Note that there is an arrow from the compute engine to the complexity hiding interface. This is because researcher interaction with the texts isn’t an automated workflow; it requires levels of interaction with the computation as it is running.
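The brokering role described in note #10 can be pictured as pairing an algorithm, a workset, and a compute resource into a job. The following is a minimal sketch under stated assumptions, not HTRC's actual API: the names `Workset`, `Job`, `Broker`, and `submit`, the volume IDs, and the algorithm name `word_count` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Workset:
    """A named subset of the corpus, identified by (made-up) volume IDs."""
    name: str
    volume_ids: tuple
    public: bool = False


@dataclass
class Job:
    algorithm: str    # name of an analysis tool
    workset: Workset  # the corpus subset to run on
    resource: str     # compute resource, e.g. "Big Red II"
    extra_datasets: list = field(default_factory=list)
    status: str = "queued"


class Broker:
    """Hypothetical broker that brings algorithm, data, and compute together."""

    def __init__(self, algorithms, resources):
        self.algorithms = set(algorithms)
        self.resources = set(resources)
        self.jobs = []

    def submit(self, algorithm, workset, resource, extra_datasets=()):
        # Reject requests that name an algorithm or resource the broker lacks.
        if algorithm not in self.algorithms:
            raise ValueError(f"unknown algorithm: {algorithm}")
        if resource not in self.resources:
            raise ValueError(f"unknown resource: {resource}")
        job = Job(algorithm, workset, resource, list(extra_datasets))
        self.jobs.append(job)
        return job


broker = Broker(algorithms={"word_count"}, resources={"Big Red II"})
ws = Workset("my-novels", ("vol-001", "vol-002"))
job = broker.submit("word_count", ws, "Big Red II")
print(job.status, job.workset.name, job.resource)
```

The mutable `status` field stands in for the interaction the note mentions: a researcher would check on and steer a running computation rather than fire off a fully automated workflow.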
  • #17: 1.) Texts published per country: Data were from the Gender metadata work, used because it has volume authors and country-of-publication information. The records total 60K volumes, with some country fields missing. 2.) HathiTrust Member Institutions: Maps geolocations of the UnCamp 2013 participants; the text band shows UnCamp ’15 and 13M books available for non-consumptive use soon. 3.) HT Google analytics: Shows HT webpage use over time, aggregated by quarter. A drop around summer 2013 could possibly be caused by summer break.
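The quarterly aggregation mentioned for the analytics view in note #17 can be sketched as follows. This is an illustrative example only: the function names and the sample numbers are made up and are not HathiTrust data or code.

```python
from collections import defaultdict
from datetime import date


def quarter_key(d):
    """Return (year, quarter) for a date, with quarters numbered 1-4."""
    return (d.year, (d.month - 1) // 3 + 1)


def aggregate_by_quarter(daily_counts):
    """Sum per-day counts into per-quarter totals.

    daily_counts: iterable of (date, count) pairs.
    """
    totals = defaultdict(int)
    for day, count in daily_counts:
        totals[quarter_key(day)] += count
    return dict(sorted(totals.items()))


# Made-up numbers, not HathiTrust data.
sample = [
    (date(2013, 3, 31), 120),
    (date(2013, 4, 1), 100),
    (date(2013, 6, 30), 90),
    (date(2013, 7, 1), 40),
]
print(aggregate_by_quarter(sample))
# {(2013, 1): 120, (2013, 2): 190, (2013, 3): 40}
```

Summing over a whole quarter smooths out day-to-day noise, which makes a seasonal dip such as a summer break easier to see.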