Metadata, Extrametadata & Crowdknowing
      Fostering 'Big Open Data' in government
            through Open Collaboration
             Ontolog - “Big Open Data” session 2
                        May 17, 2012




          Joel Natividad, co-founder
                     @jqnatividad
CROWDKNOWING




                     Human-powered,
                  Machine-accelerated,
        Collective Knowledge Systems
0. Huge Open Data
1. Extract Metadata
2. Derive ExtraMetadata
   (Semantics + Statistics + Algorithm + Crowd)
3. Do Federated Queries on both the
   Metadata AND the Data

Crowdknowing
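The last step, querying the metadata and the data together, can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the NYCFacets implementation: the catalog, the dataset IDs, the column names, and the `federated_query` helper are all hypothetical, and in-memory dicts stand in for remote endpoints. The metadata query decides which datasets to visit; the row-level filter is then pushed down to each of them.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]

# Hypothetical metadata catalog: one record per dataset.
CATALOG: List[Dict[str, Any]] = [
    {"id": "abcd-1234", "name": "Street Trees", "columns": ["tree_id", "species", "zipcode"]},
    {"id": "efgh-5678", "name": "Restaurant Inspections", "columns": ["camis", "grade", "zipcode"]},
    {"id": "ijkl-9012", "name": "Budget Lines", "columns": ["agency", "amount"]},
]

# Hypothetical row stores, standing in for each dataset's remote endpoint.
DATA: Dict[str, List[Row]] = {
    "abcd-1234": [
        {"tree_id": 1, "species": "oak", "zipcode": "10001"},
        {"tree_id": 2, "species": "elm", "zipcode": "10002"},
    ],
    "efgh-5678": [{"camis": 9, "grade": "A", "zipcode": "10001"}],
    "ijkl-9012": [{"agency": "DOT", "amount": 100}],
}


def federated_query(column: str, predicate: Callable[[Row], bool]) -> Dict[str, List[Row]]:
    """Query the metadata first, then the data of every dataset it selects."""
    # 1. Metadata query: which datasets expose the requested column?
    selected = [ds["id"] for ds in CATALOG if column in ds["columns"]]
    # 2. Data query: apply the row filter to each selected dataset only.
    return {ds_id: [row for row in DATA[ds_id] if predicate(row)] for ds_id in selected}


hits = federated_query("zipcode", lambda row: row.get("zipcode") == "10001")
```

Here the budget dataset is never touched, because the metadata query already ruled it out.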
Crowdknowing
     Human-powered, Machine-accelerated,
        Collective Knowledge Systems

Human-powered:
   Curation, Comments, Feedback, Bug Reports,
   Likes, Shares, Profile, Votes, Subscribes, Tagging,
   etc. etc. etc.

Machine-accelerated:
   Ontology, Inferencing, Semantic Mapping, Query Federation,
   Statistics, Pattern Recognition, Multivariate Analysis & Forecasting,
   Automated linking, Feeds, Notifications,
   etc. etc. etc.
a Semantic Data Dictionary




Semantic Steroids
• Searchable
  • Faceted Search
  • Drilldown
• Interlinked
  • Semantic Browsing
• Queryable
  • Query Results Formats

~3.5M facts
~950 datasets/views
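Faceted search with drilldown can be illustrated with a small sketch. The dataset records, the facet names (`category`, `type`), and the helper functions are hypothetical; the slides do not say which facets NYCFacets exposes. Each drilldown narrows the result set, and facet counts are recomputed over what remains, so a user can see how many datasets each next click would return.

```python
from collections import Counter
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical dataset records with two facets: category and type.
DATASETS: List[Record] = [
    {"name": "Street Trees", "category": "Environment", "type": "table"},
    {"name": "Tree Map", "category": "Environment", "type": "map"},
    {"name": "Inspections", "category": "Health", "type": "table"},
    {"name": "Clinics", "category": "Health", "type": "map"},
    {"name": "Budget", "category": "Finance", "type": "table"},
]


def search(records: List[Record], term: str) -> List[Record]:
    """Case-insensitive keyword match on the dataset name."""
    return [r for r in records if term.lower() in r["name"].lower()]


def drilldown(records: List[Record], filters: Dict[str, Any]) -> List[Record]:
    """Keep only records that match every selected facet value."""
    return [r for r in records if all(r.get(k) == v for k, v in filters.items())]


def facet_counts(records: List[Record], facets: List[str]) -> Dict[str, Counter]:
    """Count records per value of each facet, over the current result set."""
    return {f: Counter(r[f] for r in records) for f in facets}


results = search(DATASETS, "tree")
counts = facet_counts(results, ["category", "type"])
narrowed = drilldown(results, {"type": "map"})
```

Searching for `tree` leaves two datasets, one `table` and one `map`; drilling down on `type = map` leaves only `Tree Map`.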
NYCFacets Spider v0.5
• Crawls the NYC Open Data Catalog every weekend
• RESTful API
• Extracts metadata & derives extrametadata
• Pumps the data into NYCFacets
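A weekend crawl of a paginated catalog might look roughly like the sketch below. It assumes a `fetch_page(page, size)` callable that returns a list of dataset records, each with an `id` field, and an empty list past the last page; those names and that paging contract are illustrative, not the spider's actual API. Upserting by `id` means each weekly re-crawl refreshes stale records instead of duplicating them.

```python
from typing import Any, Callable, Dict, List

Record = Dict[str, Any]
FetchPage = Callable[[int, int], List[Record]]


def crawl(fetch_page: FetchPage, store: Dict[str, Record], page_size: int = 100) -> int:
    """Walk a paginated catalog until an empty page, upserting records by id.

    Returns how many records this crawl saw.
    """
    seen = 0
    page = 0
    while True:
        records = fetch_page(page, page_size)
        if not records:  # assumed contract: an empty page means we are done
            break
        for record in records:
            store[record["id"]] = record  # overwrite any copy from last week
            seen += 1
        page += 1
    return seen


# Usage with a fake in-memory catalog instead of a live API.
FAKE_CATALOG = [{"id": f"ds-{i}", "name": f"Dataset {i}"} for i in range(5)]


def fake_fetch(page: int, size: int) -> List[Record]:
    start = page * size
    return FAKE_CATALOG[start:start + size]


store: Dict[str, Record] = {}
first = crawl(fake_fetch, store, page_size=2)
FAKE_CATALOG[0] = {"id": "ds-0", "name": "Dataset 0 (renamed)"}
second = crawl(fake_fetch, store, page_size=2)
```

The second crawl still leaves five records in the store, with `ds-0` carrying its new name.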
Metadata

Top Level Metadata
   •   Name/ID
   •   Category
   •   Dataset Type
   •   Attribution
   •   Owner ID, etc.

Detail Metadata
   •   Column Names
   •   Datatype
   •   Width, etc.
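Pulling the top-level and detail metadata out of a single catalog record could look like this. The input field names (`viewType`, `owner.id`, `columns[].dataTypeName`, and so on) are an assumption modeled loosely on a Socrata-style view description; the slide lists the metadata fields but not the source format, so treat the mapping and the sample record as hypothetical. Missing fields come back as `None` rather than raising.

```python
import json
from typing import Any, Dict, List

# Hypothetical sample record; field names are assumed, not taken from the spider.
SAMPLE = """{
  "id": "abcd-1234", "name": "Street Trees", "category": "Environment",
  "viewType": "tabular", "attribution": "Parks Department",
  "owner": {"id": "u123-4567"},
  "columns": [
    {"name": "tree_id", "dataTypeName": "number", "width": 100},
    {"name": "species", "dataTypeName": "text", "width": 200}
  ]
}"""


def extract_metadata(view: Dict[str, Any]) -> Dict[str, Any]:
    """Split one catalog record into top-level and per-column metadata."""
    top_level = {
        "id": view.get("id"),
        "name": view.get("name"),
        "category": view.get("category"),
        "dataset_type": view.get("viewType"),
        "attribution": view.get("attribution"),
        "owner_id": (view.get("owner") or {}).get("id"),
    }
    detail: List[Dict[str, Any]] = [
        {"name": c.get("name"), "datatype": c.get("dataTypeName"), "width": c.get("width")}
        for c in view.get("columns", [])
    ]
    return {"top_level": top_level, "detail": detail}


meta = extract_metadata(json.loads(SAMPLE))
```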
ExtraMetadata?
• Derived using “Semantics, Statistics, Algorithm & the Crowd”
• “Supercharacterize” each dataset by sampling not just the schema,
  but the underlying data as well
• Score each dataset - Pediacities Rank
• Virtuous Feedback Loop around the Data
  (micro-conversations/contributions)
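Sampling lets a dataset be characterized without reading every row. The slides do not say which sampling method NYCFacets uses; the sketch below swaps in reservoir sampling (Algorithm R), a standard way to draw a uniform random sample of `k` rows from a stream of unknown length in a single pass with O(k) memory. The function name and the `seed` parameter are illustrative.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(rows: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Algorithm R: uniform sample of up to k rows from a stream, one pass."""
    rng = random.Random(seed)
    sample: List[T] = []
    for i, row in enumerate(rows):
        if i < k:
            sample.append(row)  # fill the reservoir with the first k rows
        else:
            j = rng.randint(0, i)  # inclusive, so row i is kept with probability k/(i+1)
            if j < k:
                sample[j] = row
    return sample


sample = reservoir_sample(range(1000), 10, seed=42)
```

Passing a fixed `seed` makes a crawl's sample reproducible, which helps when comparing extrametadata between weekly runs.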
ExtraMetadata

Top Level ExtraMetadata
  •   Number of Rows
  •   Pediacities Rank
      •   Freshness Score
      •   Sparseness Score
      •   Social Score
      •   Views Score
      •   Download Score
      •   Rating Score

Detail ExtraMetadata
  •   Top Values
  •   Descriptive statistics
      •   Nulls/Non-nulls
      •   Smallest Value
      •   Largest Value
      •   “Uniqueness”
  •   Simple Visualization
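The detail ExtraMetadata for one column could be computed from a sample of its values roughly as follows. This is a sketch, not the NYCFacets code: it treats `None` and empty strings as nulls, assumes each column holds values of a single comparable type (mixing numbers and text would make `min`/`max` raise), and defines “uniqueness” as distinct non-null values divided by non-null values, which is an assumption since the slide does not define it.

```python
import statistics
from collections import Counter
from typing import Any, Dict, List, Optional


def profile_column(values: List[Optional[Any]], top_n: int = 3) -> Dict[str, Any]:
    """Detail ExtraMetadata for one column, computed from sampled values."""
    non_null = [v for v in values if v is not None and v != ""]
    profile: Dict[str, Any] = {
        "nulls": len(values) - len(non_null),
        "non_nulls": len(non_null),
        "top_values": Counter(non_null).most_common(top_n),
        # Assumed definition: distinct non-null values / non-null values.
        "uniqueness": len(set(non_null)) / len(non_null) if non_null else 0.0,
    }
    if non_null:
        # Assumes one comparable type per column; mixed str/int would raise.
        profile["smallest"] = min(non_null)
        profile["largest"] = max(non_null)
    numbers = [v for v in non_null if isinstance(v, (int, float)) and not isinstance(v, bool)]
    if numbers:
        profile["mean"] = statistics.mean(numbers)
        profile["median"] = statistics.median(numbers)
        profile["stdev"] = statistics.stdev(numbers) if len(numbers) > 1 else 0.0
    return profile


p = profile_column([3, 1, None, 3, 5, ""])
```

For `[3, 1, None, 3, 5, ""]` this reports 2 nulls, 4 non-nulls, `3` as the top value (seen twice), a smallest value of 1, a largest value of 5, and a uniqueness of 0.75.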
“Crowd”

Microconversations/contributions
  •   Overall Rating
  •   Comments (comment rating)
  •   Bug Reports (data quality)
  •   Likes/Shares
  •   Downloads
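Crowd signals like these, together with freshness and sparseness, feed the Pediacities Rank components listed on the ExtraMetadata slide. The slides name the component scores but not how they are computed or combined, so every formula and weight below is a hypothetical placeholder: freshness decays with a 30-day half-life, sparseness is the share of non-empty cells, counts are log-scaled against a cap so a few very popular datasets do not dominate, and the components are blended with fixed weights that sum to 1.

```python
import math
from dataclasses import dataclass
from typing import Dict


@dataclass
class DatasetSignals:
    days_since_update: float
    null_fraction: float  # share of empty cells, 0..1
    likes_shares: int
    comments: int
    views: int
    downloads: int
    avg_rating: float  # 0..5


# Hypothetical weights; the slides do not say how the scores are combined.
WEIGHTS: Dict[str, float] = {
    "freshness": 0.25, "sparseness": 0.15, "social": 0.15,
    "views": 0.15, "downloads": 0.15, "rating": 0.15,
}


def log_scaled(count: int, cap: int = 10_000) -> float:
    """Map a non-negative count onto 0..1 with diminishing returns."""
    return min(math.log1p(count) / math.log1p(cap), 1.0)


def component_scores(s: DatasetSignals, half_life_days: float = 30.0) -> Dict[str, float]:
    return {
        "freshness": 0.5 ** (s.days_since_update / half_life_days),
        "sparseness": 1.0 - s.null_fraction,
        "social": log_scaled(s.likes_shares + s.comments),
        "views": log_scaled(s.views),
        "downloads": log_scaled(s.downloads),
        "rating": s.avg_rating / 5.0,
    }


def rank_score(s: DatasetSignals) -> float:
    """Weighted blend of the component scores, in 0..1."""
    return sum(WEIGHTS[name] * value for name, value in component_scores(s).items())


stale = DatasetSignals(365, 0.5, 0, 0, 10, 10, 3.0)
fresh = DatasetSignals(1, 0.5, 0, 0, 10, 10, 3.0)
```

A dataset updated 30 days ago gets a freshness score of exactly 0.5, and a brand-new, fully populated, heavily used five-star dataset scores 1.0; with everything else equal, `fresh` outranks `stale`.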
Crowdknowing
     Human-powered, Machine-accelerated,
        Collective Knowledge Systems

Human-powered:
   Curation, Comments, Feedback, Bug Reports,
   Likes, Shares, Profile, Votes, Subscribes, Tagging,
   etc. etc. etc.

Machine-accelerated:
   Ontology, Inferencing, Semantic Mapping, Query Federation,
   Statistics, Pattern Recognition, Multivariate Analysis & Forecasting,
   Automated linking, Feeds, Notifications,
   etc. etc. etc.
• More Datasources!
• Not just Metadata!
• Federated Queries!
• SPARQL endpoint
• Bugzilla Integration
• Collaborative Ontology Modeling
• Feeds
• Microcontributions
• Gamification
• In time for NYCBigApps 4.0
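Once a SPARQL endpoint is available, clients could query it over plain HTTP following the SPARQL 1.1 Protocol (query via GET, with a percent-encoded `query` parameter and an `Accept` header asking for JSON results). The endpoint URL below is a placeholder, and the DCAT/Dublin Core vocabulary in the query is an assumption about how datasets might be modeled; the sketch only builds the request and leaves sending it to `urllib.request.urlopen`.

```python
from urllib.parse import parse_qs, quote, urlencode, urlparse
from urllib.request import Request

# Placeholder endpoint; the slides list a SPARQL endpoint as planned, not its URL.
ENDPOINT = "https://example.org/sparql"

# Assumed vocabulary: W3C DCAT for datasets, Dublin Core terms for titles.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
}
LIMIT 10
"""


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (but do not send) a SPARQL 1.1 Protocol query-via-GET request."""
    url = f"{endpoint}?{urlencode({'query': query}, quote_via=quote)}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


req = sparql_get_request(ENDPOINT, QUERY)
# Round-trip check: decoding the URL yields exactly the query text that was sent.
sent_query = parse_qs(urlparse(req.full_url).query)["query"][0]
```

Using `quote` instead of the default `quote_plus` encodes spaces as `%20` rather than `+`, which avoids relying on the server to treat `+` as a space.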
We need your help & feedback

        A Smart Data Exchange for All Data NYC

                  Find out more at
          http://nyc.pediacities.com/facets

@jqnatividad @samimirzabaig @pediacities @ontodia
CREDITS

• Flickr User Weston Price, Paleo-Caveman-Omnivore-LowCarb-Meat-Diet-Info
  (http://www.flickr.com/photos/paleo-atkins-meat-diet-info/with/6718805047/)
• Flickr User Gao Yi (http://www.flickr.com/photos/gaoyi/178514677/)

