Big Data and You
Preparing Current & Future Information Specialists

Sands Fish
Data Scientist / MIT Libraries
@sandsfish
sands@mit.edu
Sands Fish - Knowing in the Age of Networked Knowledge
Knowing in the Age of
Networked Knowledge
Nothing is static.
Everything is connected.
Knowledge representation
is now complex
Scholarly Primitives
- Discovering
- Annotating
- Comparing
- Referring
- Sampling
- Illustrating
- Representing
John Unsworth, 2000.
http://people.brandeis.edu/~unsworth/Kings.5-00/primitives.html
Complex Knowledge
Objects
- Have multiple representations & ways of being consumed
- Can be a link in a chain, a node in a graph, or part of an
ecosystem of knowledge.

- Allow different perspectives and different ways of asking questions of them.

(none of these are true of physical books)
Complex Knowledge
Objects
Data Examples:
• JSON, XML, etc. esp. from a URL that allows it to be updated
• Visualizations, sonifications, etc. (mind-maps, interactives)
• Geospatial data, layered, constrained by area
• APIs
• Linked Data, integrated with many other resources
Complex Knowledge
Objects
Tool / Platform Examples:
• Integrated Data Platforms
• Courseware
• MOOCs
• Interactive Visualizations
• Commons-based peer production (wikis, reviews, software, etc.)
• Tweets
• Data analysis tools
• Data Enclaves (limited access processing endpoints)
Methods of Exploration
In this diverse ecosystem, there is no one way of exploring a
topic.
- Manual Browsing

- Automated Spidering (e.g. Berkman / Media Cloud)
- Collection / Trawling (e.g. Browser Plugins)
- Conventional Big Data (e.g. Hadoop, Map/Reduce)
- Using Linked Data to branch out through related concepts
- Algorithmic Data Processing (e.g. Topic Modeling)
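
As one way to picture "using Linked Data to branch out through related concepts", the sketch below walks a small in-memory graph breadth-first and records how many links away each concept is. The graph contents, the `LINKS` name, and the `branch_out` function are all hypothetical and chosen only for illustration; a real Linked Data source would be queried over the network rather than held in a dictionary.

```python
from collections import deque

# Hypothetical toy graph: each concept maps to the concepts it links to.
# The entries are illustrative only, not taken from any real dataset.
LINKS = {
    "Archie Bunker": ["Singing", "Television"],
    "Singing": ["Human Voice"],
    "Television": ["Broadcasting"],
    "Human Voice": [],
    "Broadcasting": [],
}


def branch_out(start, max_hops):
    """Return {concept: hops} for every concept reachable within max_hops links."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        concept = queue.popleft()
        if seen[concept] == max_hops:
            continue  # do not expand past the hop limit
        for neighbour in LINKS.get(concept, []):
            if neighbour not in seen:
                seen[neighbour] = seen[concept] + 1
                queue.append(neighbour)
    return seen


reachable = branch_out("Archie Bunker", max_hops=2)
print(reachable)
```

The hop limit is the interesting design choice here: in a highly connected graph, a wander without one quickly reaches concepts that have little to do with the starting topic.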
Problems of Completeness
- When do you know that you have enough information?

- What kind of compromises are made when information is
more massive than anyone can consume?
Problems of Integration
When data comes from many different silos, in many
different structures and formats, how do you bring all of this
knowledge together?

- One solution is RDF, which provides a common, generic
data model. Collaborative ontology development
can allow communities to work together.
- Open standards.
- Build tools and services that provide easy access to the
underlying data.
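
The idea behind a common, generic data model can be sketched with plain subject-predicate-object triples. The example below is a toy illustration using Python tuples, not a real RDF library, and it skips URIs, namespaces, and ontologies entirely. The two "silos", their field names, and the helper functions `record_to_triples` and `describe` are assumptions made up for this sketch.

```python
# Two hypothetical silos describing related things in different shapes.
catalog_record = {"id": "item/42", "title": "Networked Knowledge", "author": "person/7"}
repository_rows = [
    ("item/42", "subject", "Linked Data"),
    ("person/7", "name", "A. Author"),
]


def record_to_triples(record):
    """Flatten a dictionary record into (subject, predicate, object) triples."""
    subject = record["id"]
    return {(subject, key, value) for key, value in record.items() if key != "id"}


# Once both sources are triples, integration is just a set union.
triples = record_to_triples(catalog_record) | set(repository_rows)


def describe(subject):
    """Collect every predicate/object pair known about one subject."""
    return {p: o for s, p, o in triples if s == subject}


item = describe("item/42")
author = describe(item["author"])
print(item)
print(author)
```

Because every statement has the same three-part shape, a fact from the catalog and a fact from the repository can be followed in one step, here from the item to its author's name.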
How To Get A Grip
- Keep abreast of developments at the W3C and other standards
bodies.
- Don’t focus too much on single technologies. They will
shift quickly.
- Learn at least one data visualization technology.
- Remember to frame questions of data in more than one
way.
- Ask your own questions of the data. Understand
it from the user's point of view.

Editor's Notes

  • #6: We have crossed technological thresholds in the past, where the scale of operational parameters exceeded human conceptualization. 10-60 rpm.
  • #7: 1000-6000 rpm
  • #8: 40,000 rpm
  • #9: We are crossing a similar threshold now with big data. Mathematical complexity in 3 dimensions is conceivable.
  • #10: Anything beyond that is basically impossible to visualize or conceptualize, even though Euclidean geometry scales and continues to work, for example in the Vector Space Model, which represents text as high-dimensional vectors for data mining.
  • #11: The scale of network complexity has crossed this threshold as well. (2007)
  • #12: Linked Open Data Cloud, 2010. Has scaled enormously since.
  • #13: This all leaves us in an environment where the knowledge we need to educate ourselves about a given topic is not static, or found in a stack of books, but live, connected, and spanning the borders of conventional containers.
  • #14: We now have to contend with new forms of knowledge, new ways of discovering it, and new challenges to making use of it. Amazon remotely deleted copies of George Orwell's 1984 from users’ Kindles. We are no longer in a world where knowledge is a simple, static asset. If authors can change their writing, the government can change their data.
  • #15: Unsworth’s Scholarly Primitives are worth considering when planning for high-level functionality in a world where we rely on abstractions, in the form of tools and algorithms, to represent things in lower dimensionality. What are the tasks we need to provide access to on top of great scale?
  • #16: Knowledge is being represented in vastly more complex structures than it used to be. Previously, books, tables, encyclopedias, and simple web pages were the standard. Now we have a heterogeneous mix of knowledge representation. These go beyond the page or document. They have the following properties.
  • #17: They range from the concrete to the abstract.
  • #19: Enigma.io, connecting disparate but linkable gov’t data sets.
  • #20: WikiData as a complex knowledge object / ecosystem.
  • #21: Tweet Metadata. http://stackoverflow.com/questions/16600099/how-to-output-json-data-correctly-using-php
  • #22: Mind maps (even in planning for this talk) are complex knowledge objects.
  • #23: Browsing History Data; Civic Media; Catherine D’Ignazio is doing work to learn where you pay attention and where you don’t. Browsing trackers are not independent, but networked. http://skyeome.net/wordpress/?paged=2
  • #25: Linked Data Wanderer: Archie Bunker -> Singing -> List of Sovereign States -> Republican Party -> Immigrants to Cuba -> Human Voice -> Homeward Bound (the album)
  • #26: Influence sub-graph on top of live DBpedia data. ( @sandsfish – http://dbpeople.herokuapp.com )
  • #27: Highly linked. Influence sub-graph on top of live DBpedia data. ( @sandsfish – http://dbpeople.herokuapp.com )
  • #28: Geo-parsing of place mentions in MIT Open Access collection. http://dspace.mit.edu
  • #29: Geo-parsing of place mentions in MIT Open Access collection. http://dspace.mit.edu
  • #30: For an Open Access collection to be useful to a wide population, it needs to be exposed for mining. I’m building an API to allow anyone to mine this information, instead of being limited to the representation of it on a web page or in a PDF. @sandsfish
  • #33: If you primarily work with raw data, or are a librarian, learn even the most basic methods of visualization. This will give you a vocabulary with which to interact with data owners or patrons, and provide a better way of conceptualizing what questions are possible with data. Shifting your perspective from geographic to temporal, or from a single answer to a range of possibilities will help expand the knowledge you can acquire about a topic. This is one of the benefits of knowledge being complex.
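
Note #10 mentions the Vector Space Model, where each text becomes a vector with one dimension per distinct word. A minimal sketch of that idea, assuming simple whitespace tokenization and raw word counts (real systems usually add weighting such as TF-IDF), might look like the following. The sample sentences and the function names `vectorize` and `cosine` are invented for illustration.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn text into a sparse word-count vector (one dimension per word)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample documents.
linked = vectorize("linked data connects library data")
open_data = vectorize("library data should be open data")
engines = vectorize("engines spin at high speed")

print(cosine(linked, open_data))  # shares "data" and "library"
print(cosine(linked, engines))    # shares no words
```

The same arithmetic works whether the vocabulary has ten words or ten million, which is the point of the note: the geometry keeps working long after the space stops being something a person can picture.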