More Meaning. Better Results.
Building the Inform Semantic Publishing Ecosystem:
from Author to Audience
Marc Hadfield
VP, Research & Development
marc@inform.com
Marc Hadfield
• Semantic Technology, Computer Science
• Inform Technologies (Head of R&D)
‣ Semantic Technologies applied to Content Analysis & Distribution
• Alitora Systems (Co-Founder / CTO)
‣ Life Science Semantic Technology, Research, Big Data Analytics, Semantic HPC
‣ Life Science Natural Language Processing
• Columbia Genome Center
‣ NLP applied to Life Science Research Articles
• LCconnect (CTO)
‣ Letter-of-Credit Exchange
Semantics in Publishing…
• Ongoing Theme at ISWC 2010…
‣ NY Times
‣ Facebook (OpenGraph)
‣ Elsevier
‣ BBC
What is Inform?
• Inform is a content enrichment solution designed to increase consumer
engagement, page views and revenue.
• We provide a hosted Semantic Web Service for content publishers that:
1. Reads your article before you publish it
2. Turns main topics and entities (people, places, companies, organizations) into links
3. Provides feeds of related web content when you publish it
• New Direction: Optimizing Content Distribution via Direct Channels
• Web users are moving away from destination web sites but still want those sites' content.
• Companies utilizing Inform include:
Connecting your content
[Diagram: content sources connected by Inform]
• Audio, Video & Blogs from the Web
• Articles from the Web
• Content from Inform
• Your Affiliates’ Content
• Your Content
• Licensed Content

[Example: entities extracted from an article]
Entity                      Type      Score
Google Street View          Topic     0.90
Google                      Company   1.00
Ireland                     Place     0.70
Norway                      Place     0.70
South Africa                Place     0.70
Sweden                      Place     0.70
Brian McClendon             Person    0.80
Mountain View, California   Place     0.60
Wi-Fi                       Topic     0.50
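
The scored entities above hint at how step 2 of the service (turning main topics and entities into links) might work. What follows is only a minimal sketch: the 0.7 score threshold, the `example.com` URL pattern, and all function names are assumptions made for illustration, not details from the slides.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str     # e.g. "Topic", "Company", "Place", "Person"
    score: float  # score as listed on the slide; its exact meaning is not stated


# A few rows from the slide's example.
ENTITIES = [
    Entity("Google Street View", "Topic", 0.90),
    Entity("Google", "Company", 1.00),
    Entity("Ireland", "Place", 0.70),
    Entity("Wi-Fi", "Topic", 0.50),
]


def topic_url(entity: Entity) -> str:
    # Hypothetical URL scheme, used only for illustration.
    slug = re.sub(r"[^a-z0-9]+", "-", entity.name.lower()).strip("-")
    return f"https://example.com/topics/{slug}"


def link_entities(text: str, entities: list[Entity], threshold: float = 0.7) -> str:
    """Link the first mention of every entity scoring at or above the threshold."""
    chosen = {e.name: e for e in entities if e.score >= threshold}
    if not chosen:
        return text
    # Longest names first, so "Google Street View" wins over "Google".
    names = sorted(chosen, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    seen: set[str] = set()

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name in seen:
            return name  # later mentions stay plain text
        seen.add(name)
        return f'<a href="{topic_url(chosen[name])}">{name}</a>'

    return pattern.sub(replace, text)
```

Matching all names in a single pass avoids linking "Google" a second time inside the anchor already created for "Google Street View".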
Related Content Widgets
Inform Topic Pages, Micro Sites
My Job: Building the Semantic Platform…
• “Silo”-ed Semantic Technology → Semantic Web
‣ Aligned with Wikipedia, Leverage Linked Data for Mash-Ups
‣ RDFa, SKOS, Semantic SEO
• Semantic / NLP Engine
‣ Improve Features, Quality
• Semantic Data Infrastructure
‣ Scalable Infrastructure
• Semantic Data Analysis
‣ Algorithms (Topology of Graphs), Inference
‣ “PageRank” on semantic data
• Personalization, Usage Analysis
• Micro Sites
‣ Clusters of Topics, Generating Rich Content Experience
• Distributing to Social Platforms
‣ e.g. Facebook
Inform: Author to Audience
Leverage Inform Taxonomy
Inside the Semantic System Architecture
Author →
‣ Content Creation Services
‣ Semantic Data Repository
‣ Semantic Data Analysis
‣ Content Selection Algorithms
‣ Webservices
‣ Content Distribution Services
→ Audience
Content Creation
• Article Creation Tool (ACT)
‣ Author Tools
‣ Embed in CMS, Tumblr / WordPress Plugin
• Publisher Portal
‣ Editorial Tool
‣ Content Feeds
• Web Crawl
• Summarizer
‣ Create smart “blurbs” to advertise article
• Linked Data
‣ Freebase, Wikipedia, DBpedia, et cetera.
ACT Tool
ACT Tool, Tumblr, WordPress
Publisher Portal
Summarizer
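
The slides do not say how the Summarizer produces its "blurbs". As a stand-in, here is a frequency-based extractive sketch: it scores each sentence by the average frequency of its content words and keeps the best ones. The stopword list, the naive sentence splitter, and the function names are all assumptions for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "that", "it", "with", "as"}


def split_sentences(text: str) -> list[str]:
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def blurb(text: str, max_sentences: int = 1) -> str:
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        tokens = content_words(sentence)
        # Average rather than sum, so long sentences are not favoured automatically.
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)
```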
Semantic Data Repository
• Data Master / Data Node
‣ Federated Semantic Data Managers
‣ SPARQL Triplestore (scalable cluster)
‣ Semantic Search
‣ Search Indexes (Semi-Structured and Full-Text Search)
‣ Lucene/Siren (Sindice)
‣ Facets, Frequency Counts
‣ Cache (In-Memory)
‣ Blob Store (Voldemort)
‣ Listener to Activity (Flume)
‣ User Activity (clicks)
‣ Content Activity (content updates)
‣ Near Real-Time Trends, Analysis
‣ Compute Algorithms (Stored Procedures in Groovy)
‣ Long Term Content Archive (offline)
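
To make the triplestore and "Facets, Frequency Counts" items concrete, here is a small in-memory sketch of triple pattern matching and facet counting. It is a toy stand-in rather than a SPARQL engine, and the class, predicate names, and sample data are assumptions for illustration.

```python
from collections import Counter
from typing import Optional

Triple = tuple[str, str, str]  # (subject, predicate, object)


class TinyTripleStore:
    """In-memory stand-in for a triplestore; supports wildcard pattern matching only."""

    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        self.triples.add((s, p, o))

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )

    def facet_counts(self, p: str) -> Counter:
        """Count how often each object value occurs for predicate p."""
        return Counter(obj for _, pred, obj in self.triples if pred == p)


store = TinyTripleStore()
store.add("article:1", "mentions", "Google")
store.add("article:1", "mentions", "Ireland")
store.add("article:2", "mentions", "Google")
store.add("Google", "type", "Company")
```

With this sample data, `store.match(p="mentions", o="Google")` lists the articles that mention Google, and `store.facet_counts("mentions")` gives the per-entity mention counts a faceted search would display.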
Semantic Data Analysis
• Natural Language Processing
‣ Rules & Machine Learning, Training
‣ 500K articles per day, 4,000 unique sites
‣ Text Extraction, Section/Sentence Extraction
‣ Tokenization, Part-of-Speech, Noun/Verb Phrases
‣ Entity Extraction, Entity Normalization
‣ Topic Extraction, Summarization, Clustering
• User Activity
‣ User Model (Personalization)
• Semantic Inference
‣ F-Logic, Multi-Domain
‣ Linked Data Mash-Ups
• Semantic Graph Topology
‣ Entity / Property Importance Metrics, Ranking, “PageRank”
‣ Which triples in LinkedData are interesting?
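
The quoted "PageRank" above suggests a link-analysis style importance score over the entity graph. The slides give no formula, so the sketch below uses standard PageRank power iteration as an assumption; the damping factor, iteration count, and sample graph are illustrative.

```python
def pagerank(graph: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Standard PageRank by power iteration over a directed graph."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing edges is spread evenly over all nodes.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        new_rank = {node: (1.0 - damping) / n + damping * dangling / n for node in nodes}
        for source, targets in graph.items():
            if targets:
                share = damping * rank[source] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical entity graph: an edge points from an entity to one it refers to.
ENTITY_GRAPH = {
    "Google Street View": ["Google"],
    "Brian McClendon": ["Google"],
    "Google": ["Mountain View, California"],
    "Mountain View, California": [],
}
ranks = pagerank(ENTITY_GRAPH)
```

In this toy graph "Google" outranks "Google Street View" because two entities point to it. The same idea could be applied per triple or per property to address the question of which Linked Data triples are interesting.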
Content Selection Algorithms
• Model of User, Personalization
‣ Social Networks provide Context
• Semantic Analysis of Content
• Algorithms
‣ Maximize Relevancy / Relatedness (Meets Editorial Criteria)
‣ Maximize Click-Through
‣ Cute Kitten vs. Engagement Issue
‣ Maximize Monetization
Goal: Content Exchange
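
One way to read the three objectives above is as an editorial relevance floor followed by ranking on expected value, so that a highly clickable but unrelated item (the "cute kitten") cannot crowd out related content. This sketch assumes that reading. The scoring formula, thresholds, and sample numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    title: str
    relevance: float          # 0..1, semantic relatedness to the current article
    click_rate: float         # 0..1, predicted click-through rate
    revenue_per_click: float  # arbitrary currency units


def select_content(candidates: list[Candidate], k: int = 2,
                   min_relevance: float = 0.5,
                   revenue_weight: float = 1.0) -> list[Candidate]:
    """Drop items below the editorial relevance floor, then rank by expected value."""
    eligible = [c for c in candidates if c.relevance >= min_relevance]

    def value(c: Candidate) -> float:
        # Hypothetical objective mixing relatedness, click-through and monetization.
        return c.relevance * c.click_rate * (1.0 + revenue_weight * c.revenue_per_click)

    return sorted(eligible, key=value, reverse=True)[:k]


CANDIDATES = [
    Candidate("Cute kitten video", 0.2, 0.30, 0.1),
    Candidate("Street View privacy ruling", 0.9, 0.05, 0.2),
    Candidate("Google Maps update", 0.7, 0.08, 0.1),
    Candidate("Wi-Fi data collection explained", 0.6, 0.03, 0.5),
]
```

Here the kitten video has the highest click rate but is filtered out before ranking because its relevance is below the floor.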
Webservices
• REST
‣ Outputs RDF / JSON Data
• Natural Language Processing
‣ Article to Semantic MetaData
• Related Content
‣ Inputs: Content, Personalization, Algorithm
‣ Articles
‣ Semantic Mash-Ups
‣ Topics
‣ Entities
• Semantic Query, Site Search
• Storage, Content Repository
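
As an illustration of the "Article to Semantic MetaData" service with JSON output, the sketch below serialises extracted entities into a JSON document. The field names (`article`, `entities`, `name`, `type`, `score`) and the example URL are assumptions; the slides do not document the actual response format.

```python
import json


def build_metadata_response(article_url: str, entities: list[dict]) -> str:
    """Serialise extracted entities as JSON, highest score first."""
    payload = {
        "article": article_url,
        "entities": sorted(entities, key=lambda e: e["score"], reverse=True),
    }
    return json.dumps(payload, indent=2, sort_keys=True)


# Hypothetical extraction result for one article.
SAMPLE_ENTITIES = [
    {"name": "Google Street View", "type": "Topic", "score": 0.90},
    {"name": "Google", "type": "Company", "score": 1.00},
    {"name": "Brian McClendon", "type": "Person", "score": 0.80},
]
response = build_metadata_response("https://example.com/articles/1", SAMPLE_ENTITIES)
```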
Content Distribution Services
• Customer Destinations (Traditional Business)
‣ Deep Integration
• Publisher Widgets
‣ Levels of Lightweight Integration
‣ Example: Related-Content-Widget in JavaScript
• Inform.com
‣ Topic Pages
• Micro Sites
‣ Several Thousand Owned-and-Operated Domains/Sites, Topic Driven
• Social Networks
‣ Facebook
Tools:
• Semantic SEO
‣ RDFa, SKOS
Semantic MetaData, RDFa
http://inspector.sindice.com
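
To show what RDFa metadata with SKOS could look like in page markup, this sketch wraps an entity label in RDFa attributes that mark it as a `skos:Concept`. It assumes RDFa 1.1 syntax (`prefix`, `typeof`, `resource`, `property`) and uses a DBpedia URI as the concept identifier. The markup Inform actually emitted is not shown in the slides.

```python
from html import escape

SKOS_PREFIX = "skos: http://www.w3.org/2004/02/skos/core#"


def rdfa_concept(label: str, uri: str) -> str:
    """Wrap a label in RDFa 1.1 attributes describing it as a skos:Concept."""
    return (
        f'<span prefix="{SKOS_PREFIX}" typeof="skos:Concept" '
        f'resource="{escape(uri, quote=True)}">'
        f'<span property="skos:prefLabel">{escape(label)}</span></span>'
    )


markup = rdfa_concept("Google", "http://dbpedia.org/resource/Google")
```

An RDFa-aware parser reading `markup` should extract two statements: the resource is a `skos:Concept`, and its preferred label is "Google".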
Facebook App
Using Facebook OpenGraph
Relevancy Algorithm:
Combine:
• Trending / Popular Topics
• Trending / Popular Articles
• Personalization “Liked” Topics
• Personalization “Liked” Articles
• User Profiles (“Users like you…”)
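
The slide says only that these five signals are combined. A weighted sum is one simple way to do that, sketched below. The linear form, the weights, and the signal names are assumptions for illustration.

```python
# Hypothetical weights for the five signals listed above; they sum to 1.0.
DEFAULT_WEIGHTS = {
    "trending_topic": 0.20,
    "trending_article": 0.20,
    "liked_topic": 0.25,
    "liked_article": 0.20,
    "similar_users": 0.15,
}


def relevancy(signals: dict[str, float],
              weights: dict[str, float] = DEFAULT_WEIGHTS) -> float:
    """Weighted sum of signal values in 0..1; missing signals count as 0."""
    return sum(weight * signals.get(name, 0.0) for name, weight in weights.items())


def rank_articles(articles: dict[str, dict[str, float]],
                  weights: dict[str, float] = DEFAULT_WEIGHTS) -> list[str]:
    """Return article ids ordered from most to least relevant for one user."""
    return sorted(articles, key=lambda a: relevancy(articles[a], weights), reverse=True)


ARTICLES = {
    "trending-only": {"trending_topic": 1.0, "trending_article": 1.0},
    "personal-match": {"liked_topic": 1.0, "liked_article": 1.0, "similar_users": 1.0},
}
```

With these weights a strong personal match outranks an article that is merely trending. Shifting weight toward the trending signals would reverse that order.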
Facebook “Liked” Topics
Facebook Article Stream
Inform: Author to Audience via Semantics
Thanks for your attention!
Questions?
Contact Information:
Marc Hadfield
marc@inform.com
