Big Data = Bigger Metadata
O’Reilly Strata Conference
February 29 2012
Pivot/Skate, etc…
   Founded 2003
    Poor man’s GIS
    Panamap

   Refounded 2006
    Neighborhood boundaries
    Mass transit data


   Refocused 2009
    SaaS for mapping + on-demand data
Achtung!

     NoSQL is no panacea
     Big Data isn’t about data
     Big Data isn’t new
     Big Data doesn’t present a Boolean quandary
     With power comes responsibility
      AWS bills
      Lady Gaga tweets
      Innumeracy (correlation v causation)
Big v Important

  Big                         Important
        Heterogeneous            Well-defined schema
        Raw                      High value (not free)
        Distributed              Test-driven
        Streaming/real time      Relational
        Search for meaning       Historical
        Time-sensitive           Enterprise-focused
        Philosophical
Data Exhaust


     Analytics                  Probes




                 Social Media            Gov 2.0
Platforms




 Commoditization of compute and storage
A Brief History of Metadata




       Callimachus            Library of Alexandria, Egypt
A Brief History of Metadata

                              “Pinakes” (lists)
                                  Title
                                  Category
                                  Author
                                  Author birthplace
                                  Father
                                  Word count




       Callimachus
A Brief History of Metadata
A Brief History of Metadata
A Brief History of Metadata




Card catalog room,
Library of Congress c. 1920
A Brief History of Metadata

 Dewey Decimal System goes electronic in 1967
Out with the Old, in with the New




Archiving card catalogs
after digitization
Why Can’t We Be Together?


      Metadata              Data
Exponential Growth in Data


         Unprecedented rate of data creation, 1995-today
Data




       Pinakes      Catalog      Taxonomy     Database
       300 BC       1595 AD      1876         1970
Oh, How I’ve Missed You


The reunification of metadata
and the artifact
Together At Last
GIS Data is Unevolved




Enter the Data Curator


Part social scientist, part librarian,
part statistician, part RDBMS wiz
DIKW Model
    Data
        Fact, Signal, Symbol
    Information
        Structural v Functional
        Symbolic v Subjective
    Knowledge
        Processed
        Procedural
        Propositional
Popularity (Google Trends)
Words to Live By




                   dx/dt
Thank you!
ian@urbanmapping.com
@urbanmapping




                        R.I.P.
                       Schema

Big Data = Bigger Metadata

Editor's Notes

  • #3: Some background on Urban Mapping. It wasn’t a straightforward path, but it’s very relevant. We started close to 10 years ago with a printed map that reveals different layers of thematic imagery (streets, subways, neighborhoods) depending on the viewing angle. We all know what happened to print, so I shifted the business to a new medium: in 2006 or so we collected much of the same data, but now using a spatial database as opposed to regular old vector/Adobe Illustrator. The writing was on the wall for licensing content to local web publishers, so we shifted again. This time we moved upstream: we continue to develop our own data, but have greatly expanded that effort to include commercial data and deliver it through our own mapping service. We do this for customers in various market segments, like Tableau Software, where we perform a few geo-services like hosting the base map and overlaying data.
  • #4: I can be a bit of a curmudgeon and I hope a cautionary point of view has a place. Let’s talk about what Big Data is not. I’ll talk later about what it is. First thing to note is that Big Data isn’t really about data at all. But I am. It’s about tools and processes to manage and exploit info-nuggets. There’s nothing revolutionary about saying this, but I wanted to make it explicit. Second, Big Data isn’t especially new: Wall St and Walmart have been processing data and deriving value from it for decades, but they don’t talk about it. Why? Because they make money doing so and don’t need to alert the competition. Anybody hear of Teradata? Whenever companies want to talk about what they are doing, it’s usually a red flag for me, meaning the technology, industry or something else hasn’t sufficiently evolved. But I’m also not saying Big Data is a rehash of enterprise software. More on that later. Finally, Big Data has democratized access to powerful tools at little cost. This doesn’t necessarily mean everybody knows how to use these tools. There can be some blowback, such as high credit card bills, analysis without direction or objective, and lack of knowledge about basic statistics.
  • #6: There’s been exponential growth in data and it comes from any number of places. Some are shown here: mobile devices as probes, with vast capabilities to record all kinds of environmental variables; open government; social media; and a desire for analytics, which has been rebranded as business intelligence.
  • #7: Processing and storage costs drop like rocks—enterprise software has been offering big solutions for decades to banking and others, but with incredibly low barriers to entry virtually anybody can participate.
  • #8: Kal-i-um-akus was a noted poet in the Library of Alexandria in the 3rd century BC.
  • #9: He created pin-a-keez, or Lists, a way of organizing works in the library. He embarked on the effort to organize 120k scrolls by title, author, birthplace, father, education, summary of contents and other info. This was the first effort to systematically create a bibliographic system, a direct link to metadata two millennia later.
  • #10: In 1595, Johan van der Does published Nomenclator, the first instance of a printed catalog of library holdings. It represented a significant advancement over the Kal-i-um-akus lists, but it took close to two millennia to get here.
  • #11: The modern cataloging system: the Dewey Decimal System, created in 1876. Its father was Melvil Dewey. The Dewey Decimal System attempted to organize all knowledge into ten main classes. Each class is further subdivided into ten divisions, and each division into ten sections, giving ten main classes, 100 divisions and 1000 sections. It allows for infinite hierarchy, and is numerical and faceted (linking content from different areas). Other systems followed: Universal Decimal Classification, Library of Congress, etc.
  • #12: This photo is from the Card Division at the Library of Congress in the 1920s. The amount of physical metadata is astounding: millions of library cards with metadata.
  • #13: The next major advancement was in the late 1960s. Early attempts at electronic indexing focused on a taxonomy of keywords and related information. This was efficient for reporting on what the system contained, but it also kept the long-running divorce between artifact and metadata. The Online Computer Library Center (OCLC) was created as a nonprofit to further access to library resources across institutions and decrease costs. The OCLC acquired the Dewey Decimal System and, as any standards body does, sought to perpetuate its existence over the decades. Then the internet happened.
  • #14: That meant out with the old, in with the new. This photo is library cards going into storage. Not sure why they’d even be archived after the transition to databases was made, but that’s for another time.
  • #15: So this is the situation. Beginning in the late 60s, electronically stored metadata began to grow. The library cards (at left) went away, but the bifurcation was complete: total separation of the thing from the description of the thing. And it sort of made sense: IT was in its infancy, so storage and processing costs were high. Publishers also exerted a great deal of control over how they permitted libraries to index and make available works.
  • #16: To put the last 2000 years in perspective, Kal-i-um-akus created the first crude schema, leaving a place for metadata to be stored. The Nomenclator gave us the first bibliographic catalog, printed and bound, produced annually. The Dewey Decimal System was born in 1876 and was the basis of an extensive metadata system for published works. Then the internet happened. In the top right you see the corner of a cloud. That’s my way of representing what happens next. The volume of data produced grows exponentially, overtaking 2000-plus years of history in no time.
  • #17: So how about the bifurcation/divorce I mentioned? The web brought the artifact and metadata together again
  • #18: Google Books. Sure, we have the Dewey Decimal type stuff along with ISBN, retail price, etc., but we also threw in the whole damn book: full-text search. Amazon does it too.
  • #19: In my industry, the state of metadata is horrendous. We’re stuck in the green screen days. Proprietary data formats and slow-moving vendors don’t help. While I’m the first person to admit GIS needs to get off its ass and change, radically, there’s also something the real-time streaming web can learn from us.
  • #20: We hear about the rise of the curator: part social scientist, part librarian, part RDBMS wiz and part statistician. This is increasingly important across all industries. When dealing with a torrent of data, domain experts will be required to help make sense of it.
  • #21: The Knowledge Hierarchy, as it is sometimes known, has been used to represent relationships between the stuff that turns into something meaningful. You could look at this as going from a letter to a sentence to a paragraph, or an ingredient to a recipe to a meal, or something else. The details don’t matter here, but I think about the fundamental building block of data. One geocoded tweet has little or no value on its own. Contrast that with per capita income for this ZIP code. By amassing enough geocoded tweets, it’s clear we can get to something meaningful, but I don’t know how many tweets that is. I do know that per capita income can directly inform my marketing plans for selling a new shampoo.
  • #22: With that, here’s some more wet blanket for everybody. Using Google Trends, I looked at a number of terms that might indicate the old-fashioned RDBMS, SQL way of life, and most seem to follow the blue line, which represents the term ‘metadata.’ Big Data, coincidentally, first appears a few months before the first Strata conference in 2011. ‘Curation’ has a longer life but doesn’t show the surge of Big Data, and everybody’s favorite, ‘data scientist,’ doesn’t register as much more than a rounding error. I’m not using Google Trends to fully substantiate my argument, but I do hope you take a dose of skepticism before fully embracing ‘this.’
  • #23: In closing, I’d like to leave you with an emergent cliché. It’s also my measure of how geeky an audience I have: one person’s metadata is another person’s data.
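
The ten-by-ten-by-ten structure described in note #11 can be sketched in a few lines of Python. This is only an illustration of the arithmetic (10 main classes, 100 divisions, 1000 sections), not anything shown in the talk. The function name `dewey_levels` is hypothetical, and the sketch assumes only the three-digit part of a Dewey number, ignoring anything after the decimal point.

```python
def dewey_levels(number: int) -> tuple[str, str, str]:
    """Split a three-digit Dewey number into (class, division, section)."""
    if not 0 <= number <= 999:
        raise ValueError("expected a number from 0 to 999")
    main_class = number // 100 * 100  # hundreds digit picks the main class
    division = number // 10 * 10      # hundreds and tens digits pick the division
    return f"{main_class:03d}", f"{division:03d}", f"{number:03d}"


# Count the distinct values at each level of the hierarchy.
levels = [dewey_levels(n) for n in range(1000)]
class_count = len({c for c, _, _ in levels})
division_count = len({d for _, d, _ in levels})
section_count = len({s for _, _, s in levels})
print(class_count, division_count, section_count)  # 10 100 1000
```

For example, `dewey_levels(516)` returns `("500", "510", "516")`: each level is a coarser description of the same item, which is one way to read the closing line that one person’s metadata is another person’s data.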