How Linked Data Can Speed Information Discovery
Alex Meadows, CSpring
Bubba Puryear, Syngenta
Agenda
• Linked Data Overview
• Case Study: Linked Data At Syngenta
• Q&A
Business:
“We’re not sure what we want, but can’t we have it all?”
-or-
“Here are our requirements. When can we have this completed?”
BI Team:
“We don’t know your data; it’s going to take us some time.”
-or-
“We have so many other projects, we’re not sure when we can get to this request.”
New source: weeks to months
Existing source: days to weeks
How Linked Data Can Speed Information Discovery
What is Linked Data?
• Term coined in 2006 by Tim Berners-Lee
• Provides a vocabulary for every data set
• Vocabularies can be combined
• Highly structured, in triple format
Vocabulary: Classes
Vocabulary: Properties
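A minimal sketch of how a vocabulary's class hierarchy and property declarations fit together, assuming a tiny made-up beer vocabulary. The class names, the property names `beer:abv` and `beer:hopProfile`, and the dictionary layout are hypothetical and are not taken from the vocabulary shown on the slides. The sketch only illustrates that a property declared on a parent class also applies to its children.

```python
# Hypothetical class hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "beer:PaleAle": "beer:Ale",
    "beer:Ale": "beer:BeerType",
    "beer:Lager": "beer:BeerType",
}

# Hypothetical property declarations: property -> (domain class, range type).
PROPERTIES = {
    "rdfs:label": ("beer:BeerType", "xsd:string"),
    "beer:abv": ("beer:BeerType", "xsd:float"),
    "beer:hopProfile": ("beer:Ale", "xsd:string"),
}


def ancestors(cls):
    """Return cls followed by its superclasses, nearest first."""
    chain = [cls]
    while chain[-1] in SUBCLASS_OF:
        chain.append(SUBCLASS_OF[chain[-1]])
    return chain


def properties_of(cls):
    """Collect properties whose domain is cls or any of its superclasses."""
    lineage = set(ancestors(cls))
    return sorted(p for p, (domain, _) in PROPERTIES.items() if domain in lineage)


if __name__ == "__main__":
    print(ancestors("beer:PaleAle"))
    print(properties_of("beer:PaleAle"))
```

Here `beer:PaleAle` picks up the properties declared on `beer:Ale` and `beer:BeerType`, while `beer:Lager` only picks up those declared on `beer:BeerType`.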
Triples
Pale Ale (Beer)
Mark (Person)
Mt. Carmel Brewing Co. (Brewer)
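As a rough illustration of the triple format, each statement can be held as a (subject, predicate, object) tuple. The prefixed names below (`beer:PaleAle`, `rdf:type` and so on) are assumed spellings for this sketch; the slide itself only pairs Pale Ale with Beer, Mark with Person, and Mt. Carmel Brewing Co. with Brewer.

```python
# Hypothetical prefixed names; each triple says "this thing is an instance of that class".
TRIPLES = [
    ("beer:PaleAle", "rdf:type", "beer:Beer"),
    ("beer:Mark", "rdf:type", "beer:Person"),
    ("beer:MtCarmelBrewingCo", "rdf:type", "beer:Brewer"),
]


def match(triples, s=None, p=None, o=None):
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    for triple in triples:
        if s is not None and triple[0] != s:
            continue
        if p is not None and triple[1] != p:
            continue
        if o is not None and triple[2] != o:
            continue
        yield triple


if __name__ == "__main__":
    for subject, _, cls in match(TRIPLES, p="rdf:type"):
        print(f"{subject} is a {cls}")
```

Every query against linked data ultimately reduces to patterns like this, with some positions fixed and others left open.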
Triples: RDF/XML
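The RDF/XML on this slide was an image and is not reproduced here. As a hedged sketch of the idea, the snippet below reads a small, made-up RDF/XML document with Python's standard XML parser and recovers the triples it encodes. The `http://example.org/beer#` namespace and the document content are assumptions, and the reader only handles flat `rdf:Description` blocks, not the full RDF/XML syntax.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
BEER_NS = "http://example.org/beer#"  # assumed namespace for this sketch

# Made-up sample document.
RDF_XML = f"""<rdf:RDF xmlns:rdf="{RDF_NS}" xmlns:beer="{BEER_NS}">
  <rdf:Description rdf:about="{BEER_NS}MtCarmel">
    <beer:hasName>Mt. Carmel Brewing Company</beer:hasName>
    <beer:brews rdf:resource="{BEER_NS}PaleAle"/>
  </rdf:Description>
</rdf:RDF>"""


def triples_from_rdfxml(text):
    """Read flat rdf:Description blocks; nested or typed nodes are not handled."""
    root = ET.fromstring(text)
    triples = []
    for node in root.findall(f"{{{RDF_NS}}}Description"):
        subject = node.attrib[f"{{{RDF_NS}}}about"]
        for prop in node:
            # ElementTree writes tags as "{namespace}local"; join them into one IRI.
            predicate = prop.tag.replace("{", "").replace("}", "")
            obj = prop.attrib.get(f"{{{RDF_NS}}}resource")
            if obj is None:
                obj = prop.text or ""  # literal value
            triples.append((subject, predicate, obj))
    return triples


if __name__ == "__main__":
    for triple in triples_from_rdfxml(RDF_XML):
        print(triple)
```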
Option 1: Virtualization
New source: hours to weeks
Existing source: hours to days
Ontop
• Mapping layer between SQL and SPARQL
• Integrates with many tools (Protégé, Sesame, etc.)
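To make the virtualization idea concrete, here is a minimal sketch in which a triple pattern is translated into SQL and answered from the relational source at query time, so no triples are ever stored. This is not Ontop's API or mapping syntax; the `MAPPING` dictionary, the table and column names, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, subject column, object column).
MAPPING = {
    "beer:hasName": ("brewery", "id", "name"),
    "beer:owner_of": ("ownership", "person_id", "brewery_id"),
}


def pattern_to_sql(predicate):
    """Translate the triple pattern (?s, predicate, ?o) into a SQL query."""
    table, s_col, o_col = MAPPING[predicate]
    return f"SELECT {s_col}, {o_col} FROM {table}"


def virtual_triples(conn, predicate):
    """Answer a triple pattern directly from the relational source."""
    rows = conn.execute(pattern_to_sql(predicate)).fetchall()
    return [(str(s), predicate, str(o)) for s, o in rows]


def demo_connection():
    """Build a small in-memory source database with made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE brewery (id TEXT PRIMARY KEY, name TEXT);
        CREATE TABLE ownership (person_id TEXT, brewery_id TEXT);
        INSERT INTO brewery VALUES ('brewery/1', 'Mt. Carmel Brewing Company');
        INSERT INTO ownership VALUES ('person/1', 'brewery/1');
        """
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    print(pattern_to_sql("beer:hasName"))
    print(virtual_triples(conn, "beer:hasName"))
```

Because every pattern becomes a live query, the source database carries the full query load, which is the trade-off of this option.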
Option 2: Lift and Format
New source: days to weeks
Existing source: hours to days
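A minimal sketch of the lift-and-format approach, assuming a single made-up `brewery` table: rows are extracted from the source and written out as N-Triples lines that a triple store could then load. The `http://example.org/` IRIs, the table layout, and the function names are assumptions for illustration, not part of the original deck.

```python
import sqlite3

BASE = "http://example.org/"         # assumed base IRI for subjects
VOCAB = "http://example.org/vocab/"  # assumed vocabulary namespace


def escape_literal(value):
    """Escape characters that are special inside an N-Triples string literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def demo_source():
    """Build a small in-memory source database with one made-up row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE brewery (id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute("INSERT INTO brewery VALUES (1, 'Mt. Carmel Brewing Company')")
    return conn


def lift(conn):
    """Extract brewery rows and format each one as an N-Triples line."""
    lines = []
    for brewery_id, name in conn.execute("SELECT id, name FROM brewery ORDER BY id"):
        subject = f"<{BASE}brewery/{brewery_id}>"
        lines.append(f'{subject} <{VOCAB}hasName> "{escape_literal(name)}" .')
    return lines


if __name__ == "__main__":
    print("\n".join(lift(demo_source())))
```

Unlike the virtualized option, the output is a physical copy of the data, so later queries never touch the source system.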
SPARQL
PREFIX beer: <http://my.beer.vocab/1.0/>
SELECT ?breweryName
WHERE {
  ?brewery beer:hasName ?breweryName .
  ?person beer:owner_of ?brewery .
  ?person beer:first_name "Mark" .
}
PREFIX beer: <http://my.beer.vocab/1.0/>
SELECT ?beertype
WHERE {
  ?beer beer:isOfType ?beertype .
  ?person beer:brews ?beer .
  ?person beer:first_name "Mark" .
}
<beer:isOfType rdf:resource="beer:PaleAle"/>
<beer:isOfType rdf:resource="beer:Lager"/>
<beer:hasName>Mt. Carmel Brewing Company</beer:hasName>
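To show what a SPARQL engine does with the WHERE clauses above, here is a small sketch of basic graph pattern matching in plain Python: each pattern is matched against every triple, and variable bindings are carried from one pattern to the next so that shared variables act as joins. The sample triples are made up to be consistent with the fragments on the slide, and the helper names are hypothetical. A real deployment would use a SPARQL engine rather than this loop.

```python
# Made-up sample data, consistent with the fragments shown on the slide.
DATA = [
    ("brewery1", "beer:hasName", "Mt. Carmel Brewing Company"),
    ("brewery2", "beer:hasName", "Another Brewery"),
    ("mark", "beer:first_name", "Mark"),
    ("jane", "beer:first_name", "Jane"),
    ("mark", "beer:owner_of", "brewery1"),
    ("jane", "beer:owner_of", "brewery2"),
    ("mark", "beer:brews", "beerA"),
    ("mark", "beer:brews", "beerB"),
    ("beerA", "beer:isOfType", "beer:PaleAle"),
    ("beerB", "beer:isOfType", "beer:Lager"),
]

# The two WHERE clauses, written as lists of triple patterns.
QUERY_1 = [
    ("?brewery", "beer:hasName", "?breweryName"),
    ("?person", "beer:owner_of", "?brewery"),
    ("?person", "beer:first_name", "Mark"),
]
QUERY_2 = [
    ("?beer", "beer:isOfType", "?beertype"),
    ("?person", "beer:brews", "?beer"),
    ("?person", "beer:first_name", "Mark"),
]


def is_var(term):
    """SPARQL-style variables start with '?'."""
    return term.startswith("?")


def unify(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None on conflict."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if result.setdefault(p_term, t_term) != t_term:
                return None  # variable already bound to a different value
        elif p_term != t_term:
            return None  # constant term does not match
    return result


def solve(patterns, triples):
    """Return every variable binding that satisfies all patterns."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = unify(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


if __name__ == "__main__":
    print([b["?breweryName"] for b in solve(QUERY_1, DATA)])
    print(sorted(b["?beertype"] for b in solve(QUERY_2, DATA)))
```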
Case Study: Linked Data At Syngenta
Syngenta
Syngenta is a leading agriculture company helping to improve global food security by enabling millions of farmers to make better use of available resources.
We have two primary lines of business: Seeds and Agricultural Chemicals.
We have a huge commitment to internal R&D and that is where our linked data initiatives are.
Linked Data at Syngenta
• Concept Store: enables Syngenta applications to consume and publish linked data controlled vocabulary (reference terms and relationships)
• ENVision Tool: enables trial placements and weightings that best represent target markets
• MINT Data: makes genetic identity and inventory data available for discovery, analysis and R&D-driven proofs of concept
What we accomplished
• In a 3-day hackathon we:
   • Mapped about 60% of MINT’s model from 2 databases to RDF
   • Built a virtualized RDF triple store
   • Created a data-discovery / browsing user interface
MINT Data
[Architecture diagram; component labels in extraction order]
• MINT Browser
• Repository Configuration (Identity, Material)
• MINT Ontology (Identity, Material)
• RDBMS-RDF Mapper
• RDF Repository Broker
• Open-Sesame
• MINT Material RDBMS
• JDBC
• R2RML Mapping (Material)
• Semantic Wiki
• SPARQL
• Ontology & Mapping Designer
• Ontologist
• RDBMS-RDF Mapper
• MINT Identity RDBMS
• JDBC
• R2RML Mapping (Identity)
MINT Class Model
• The MINT ontology was created within Protégé as shown here
MINT Virtualization Mapping
MINT Virtualization Mapping
Next Steps
• Moving from the virtualized layer to an actual physical triple store implementation
• Partnering with our benefits tracking team to get accurate metrics on MINT adoption and value
• Linking to additional data sources to provide dashboard KPIs and analytics for our R&D seeds pipeline
THANK YOU!
About Alex…
• Principal Consultant, CSpring
• https://www.linkedin.com/in/alexmeadows
• Twitter, GitHub as OpenDataAlex
• Alex has spent the last ten years working in various industries to help businesses unlock the information hidden in their data sets. He specializes in open source business intelligence solutions from data warehousing to dashboards, analytics, and beyond. His latest area of research has been linked data (stored in triple stores). Alex has a Master’s in Business Intelligence from Saint Joseph’s University in Pennsylvania and a Bachelor’s in Business Administration from Chowan University in North Carolina.
About Bubba…
• Team Leader, R&D IS, Syngenta
• https://www.linkedin.com/in/bubbapuryear
• I’ve held roles as a software engineer, architect and manager across multiple industries. For the last 13 years I’ve worked in the life sciences industry supporting Research & Development. I’m currently the program architect / technical lead for a standardization program within Syngenta bringing Track & Trace compliance to R&D’s material operations. Many of Syngenta’s R&D product decisions for our Seeds line of business are founded on data associated with plant material identity. I have a Bachelor’s degree in Computer Science from Rose-Hulman Institute of Technology.


Editor's Notes

  • #4: Whether you’re from the business or from the business intelligence side of your company, you’re likely familiar with both of these scenarios. In the first scenario, the business wants access to new data, or to data that’s available but that they don’t have rights to. Because it’s new, they may or may not know everything that’s there, or how to access the particular data they require, so their BI team has to do a lot of analysis before they can even commit to providing data in a format that may or may not work. In the second scenario, the business knows exactly what they want, but because all the work goes through one central team it may take a while to get to it, and by then the requirements have changed or the work is no longer needed. Both scenarios have the same problem: it takes time to process data through a data warehouse and into a format that may work for the business’s needs. The other question to ask here is whether the data warehouse is fit for what the business is trying to do. Why does it take so long to get data from source to warehouse front-end tools?
  • #5: This is the typical process used to take data from source to mart. There may be more or fewer steps involved in your business, but generally speaking there’s a staging process that takes the data in and pipes it into the data warehouse, from which it is cleansed/conformed into specialized data marts. All of this processing is typically done using data integration/ETL. If it’s a new data source, regardless of the type of architecture for your warehouse, you’re looking at weeks to months of turnaround time. If it’s a modification to an existing source, turnaround time goes down to days to weeks. It’s time consuming because Business Intelligence has to analyze the requirements of any given project and ensure that it meets the overall business requirements. In addition, each ETL step has to process, clean and prune data accordingly.
  • #6: So here’s the real question. What if I told you there’s another way to do data exploration? One that can have a faster initial turnaround time, provide a way for business areas to agree and disagree on data interpretation, and provide better business requirements for your business intelligence team? That’s what we’re here to talk about today: Linked Data!
  • #7: The concept of linked data has been around for a while, with Tim Berners-Lee coining the term back in 2006. Unlike traditional relational databases, Linked Data explicitly describes the relationships and metadata around the data in ‘vocabularies’. These vocabularies can be combined where they overlap to provide richer data than the source systems alone. This data is stored in what is known as a ‘triple’ format. We’ll get to both the triple format and what vocabularies look like in a few slides.
  • #8: Let’s start by diving into what makes up a vocabulary. There are three primary components to any vocabulary: the class hierarchy, how the classes relate to each other, and the properties of said classes. For example, here’s an open source vocabulary for beer. Classes make up the objects that are being described in any given data set. Properties of a parent are inherited by the children. In this case, if we look at BeerType, we can see that there are a number of properties that are associated with all beers. Also note that there are things listed that we don’t explicitly define in this vocabulary, for instance rdfs:label and xsd:float. Those are examples of things defined in other vocabularies that we are applying in this beer vocabulary and that help provide validation and rules for the values. Also note that each object is given a unique URL (IRI). This ensures that every object in the vocabulary is unique. Vocabulary source: http://webprotege.stanford.edu/#Edit:projectId=2c3e9b0f-b7ce-49ca-a621-acb66883607a
  • #9: Vocabularies also have properties that provide descriptions of the classes. Properties can be attached to any classes/relationships (domain and range). Just like with relational databases, properties can be given types that ensure the format is correct.
  • #10: Material for this example from: http://www.thevergemagazine.org/career-qa-beer-brewer/
  • #11: Here’s that same data (plus some more) in RDF/XML format.
  • #12: Now back to our architecture from earlier. We aren’t replacing the data warehouse with Linked Data, far from it! There are a few options for implementation, with the first one being virtualization. In this case, there is a mapping between the source systems and the linked data layer that translates linked data queries (typically written in SPARQL) into SQL or other formats to pull data from the sources as if the data were physically stored in a triple store. The benefit here is that there is fast turnaround and your reporting and analysis tools should be able to run SPARQL queries. The downside is that the source systems may get overwhelmed by the types of queries being executed, due to the translation/interpretation done by the virtualization platform.
  • #14: Our other alternative, which requires a bit more engineering, is to use data integration (ETL, etc.) to load the data from the source systems into a triple store. The benefit here is that the data is stored as literal triples, which minimizes impact on the source systems and provides an independent platform for data discovery. The con, of course, is that a bit of work is needed to get sources into the linked data store, but it’s fewer steps than would be required with the data warehouse/specialized data marts. Of the two options, this one is more robust for long-term data discovery and proofs of concept.