Calais @ PAWS (Palo Alto Semantic Web Meetup), Sep 4, 2008
Calais?
ClearForest
  • Founded in 1998 by text analytics pioneers
  • A software organization that enables intelligent information
  • Enterprise and government customers
  • Led the market in establishing unstructured text as a key corporate asset
  • Acquired by Reuters in June 2007
  • Offices: Boston, Israel
The Text Problem
  • People consume text
  • Most of it isn’t semantically enabled
  • Most of it won’t be semantically enabled
  • Why: latency, cost and short shelf-life
Calais’ Piece of the Puzzle
  • A semantic metadata generation service that extracts entities, facts and events from unstructured text
  • Two new capabilities: topics and relevance
  • Available for commercial or non-commercial use, up to 40,000 requests per day
  • Input: unstructured documents (text / HTML / XML)
  • Named entities: people, companies, geographies, albums, authors, etc.
  • Facts: position, alliance, education, political affiliation, etc.
  • Events: management change, IPO, labor action, sporting, entertainment, etc.
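The split between entities, facts and events can be made concrete with a small data model. The sketch below is a hypothetical illustration, not the Calais response schema. The class names `Entity`, `Relation` and `TaggedDocument` and the role names `person` and `company` are assumptions. Only the type labels, such as "Company" and "ManagementChange", come from the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical containers for the three kinds of metadata named on the slide.
# Type labels such as "Company" or "ManagementChange" come from the slides;
# the class layout itself is illustrative, not the Calais response schema.


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "Person", "Company", "Geography", "Album", "Author"
    name: str


@dataclass
class Relation:
    """A fact (Position, Alliance, ...) or an event (ManagementChange, IPO, ...)."""
    kind: str
    is_event: bool
    roles: Dict[str, Entity] = field(default_factory=dict)  # role name -> entity


@dataclass
class TaggedDocument:
    text: str
    entities: List[Entity] = field(default_factory=list)
    relations: List[Relation] = field(default_factory=list)

    def facts(self) -> List[Relation]:
        return [r for r in self.relations if not r.is_event]

    def events(self) -> List[Relation]:
        return [r for r in self.relations if r.is_event]


# Usage: the management-change event from the Reuters press release in this deck.
reuters = Entity("Company", "Reuters")
campbell = Entity("Person", "Gerry Campbell")
doc = TaggedDocument(
    text="Reuters has created a new strategic group and appointed Gerry Campbell",
    entities=[reuters, campbell],
    relations=[Relation("ManagementChange", True, {"person": campbell, "company": reuters})],
)
print([r.kind for r in doc.events()])  # ['ManagementChange']
```

Facts and events share one `Relation` type here because both are typed relationships between entities. The `is_event` flag is one possible way to keep them apart.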
Reuters Announced the Acquisition of ClearForest

New York - April 30, 2007: Reuters, the global information company, has entered into an agreement to acquire all of the outstanding shares of ClearForest Ltd., a privately held provider of Text Analytics solutions, whose tagging platform and analytical products allow clients to derive precise business information from huge amounts of textual content. ClearForest has received sufficient shareholder approval to complete the transaction, which is expected to close in approximately 30 days, subject to customary closing conditions. The financial terms were not disclosed. Reuters plans to retain and continue to work with the existing management team and their highly skilled workforces in the US and Israel. It also plans to continue to support existing products and customers. Reuters believes that search will be a pivotal element to the future of how financial information is sourced and consumed. As part of its drive into this space, Reuters has created a new strategic group and appointed Gerry Campbell, who will oversee the integration of ClearForest and drive this innovation.

Extracted metadata (excerpt):

    <Topic>M&amp;A</Topic>
    <Acquisition offset="494" length="130">
      <Company_Acquirer>Reuters</Company_Acquirer>
      <Company_Acquired>ClearForest Ltd.</Company_Acquired>
      <Status>Planned</Status>
    </Acquisition>
    <Company>Reuters</Company>
    <Company>ClearForest Ltd.</Company>
    <Product>Text Analytic Solution</Product>
    <Company>ClearForest Ltd.</Company>
    <Company>Reuters</Company>
    <Country>United States</Country>
    <Country>Israel</Country>
    <Company>Reuters</Company>
    <Person>Gerry Campbell</Person>
    <ManagementChange offset="2789" length="92">
      <Person>Gerry Campbell</Person>
      <Company>Reuters</Company>
      <Action>Enters</Action>
    </ManagementChange>
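Tags like these can be read with any standard XML parser. The sketch below uses Python's `xml.etree.ElementTree` on a trimmed copy of the tags above. Because the excerpt has several top-level elements, it is wrapped in a hypothetical `<Output>` root element. The helper name `summarize` and the shape of its result are also assumptions.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the extracted tags, wrapped in a hypothetical <Output> root
# so that it forms a single well-formed XML document.
CALAIS_XML = """<Output>
  <Topic>M&amp;A</Topic>
  <Acquisition offset="494" length="130">
    <Company_Acquirer>Reuters</Company_Acquirer>
    <Company_Acquired>ClearForest Ltd.</Company_Acquired>
    <Status>Planned</Status>
  </Acquisition>
  <Company>Reuters</Company>
  <Company>ClearForest Ltd.</Company>
  <Company>Reuters</Company>
  <Country>United States</Country>
  <Country>Israel</Country>
  <Person>Gerry Campbell</Person>
  <ManagementChange offset="2789" length="92">
    <Person>Gerry Campbell</Person>
    <Company>Reuters</Company>
    <Action>Enters</Action>
  </ManagementChange>
</Output>"""


def summarize(xml_text: str) -> dict:
    root = ET.fromstring(xml_text)
    # findall("Company") only looks at direct children of the root, so the
    # <Company> nested inside <ManagementChange> is not counted here.
    companies = list(dict.fromkeys(c.text for c in root.findall("Company")))
    acquisition = root.find("Acquisition")
    return {
        "topic": root.findtext("Topic"),  # entity-decoded: "M&A"
        "companies": companies,  # de-duplicated, first-mention order
        "countries": [c.text for c in root.findall("Country")],
        "acquisition": {
            "acquirer": acquisition.findtext("Company_Acquirer"),
            "acquired": acquisition.findtext("Company_Acquired"),
            "status": acquisition.findtext("Status"),
            # offset/length locate the supporting text span in the document
            "span": (int(acquisition.get("offset")), int(acquisition.get("length"))),
        },
    }


print(summarize(CALAIS_XML)["acquisition"]["acquirer"])  # Reuters
```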
What’s Behind an Event: An Example

Digital Marketing Services,Inc. (DMS), the leading provider of online marketing research and a division of America Online Inc. (AOL), today announced an alliance with Netcentives Inc. (Nasdaq: NCNT)

Extracted instances:
  • Company = Digital Marketing Services, Inc.
  • Company = Netcentives Inc.
  • Status = announced
  • DateString = today
  • Date = 2000-01-31
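The extracted instances combine three steps: matching the event pattern, normalizing the company names, and resolving the relative "today" to a calendar date. The sketch below is a toy illustration of those steps using one hand-written regular expression. It is not Calais's extraction method. The pattern, the helper names, and the assumed publication date of 2000-01-31 are all hypothetical. The date was chosen only because it makes "today" resolve to the Date shown above.

```python
import re
from datetime import date, timedelta

# Hypothetical, hand-written pattern for "<Company> ... <when> <status> an alliance
# with <Company>"; it illustrates the idea only and is not how Calais matches events.
ALLIANCE = re.compile(
    r"(?P<a>[A-Z][\w ,.]*?Inc\.).*?\b(?P<when>today|yesterday)\s+"
    r"(?P<status>announced|formed)\s+an alliance with\s+(?P<b>[A-Z][\w ,.]*?Inc\.)"
)
RELATIVE_DAYS = {"today": 0, "yesterday": -1}


def normalize(name: str) -> str:
    # Normalize comma spacing, e.g. "Services,Inc." -> "Services, Inc."
    return re.sub(r",\s*", ", ", name)


def extract_alliance(text: str, published: date):
    match = ALLIANCE.search(text)
    if match is None:
        return None
    when = match["when"]
    return {
        "Company": [normalize(match["a"]), normalize(match["b"])],
        "Status": match["status"],
        "DateString": when,
        # Resolve the relative date string against the (assumed) publication date.
        "Date": (published + timedelta(days=RELATIVE_DAYS[when])).isoformat(),
    }


SENTENCE = ("Digital Marketing Services,Inc. (DMS), the leading provider of online "
            "marketing research and a division of America Online Inc. (AOL), today "
            "announced an alliance with Netcentives Inc. (Nasdaq: NCNT)")
print(extract_alliance(SENTENCE, published=date(2000, 1, 31)))
```

A real extractor relies on the linguistic layers of the NLP stack, not one regex per event. The sketch only shows how surface text maps to typed slots.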
Live Example
  • Viewer demo
  • Gnosis demo
Extending Calais’ Reach
More than just a web service: a growing collection of tools and applications to make it valuable in the real world.
  • Browser extensions: Gnosis
  • Content management tools: WordPress, Drupal
  • UIMA
  • Development tools & libraries: PHP, Ruby, Java, .NET
  • Applications: TopBraid, RSS Tagger, Powerhouse, LinkedFacts, Wirecatch, FeedShaver, and more…
How Calais Is Being Used Today
  • Gist: automatically aggregates multiple news sources and slots the stories into topics, etc.
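Once each article carries topic tags, slotting stories from many feeds into topics reduces to grouping by tag. The sketch below assumes the articles were already tagged, for example by a Calais call. The records, field names and the helper `slot_by_topic` are hypothetical and are not taken from Gist.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical input: articles from several feeds, each already tagged with topics.
# Sources, titles and field names are placeholders for illustration.
ARTICLES = [
    {"source": "feed-a", "title": "story-1", "topics": ["M&A", "Technology"]},
    {"source": "feed-b", "title": "story-2", "topics": ["Business"]},
    {"source": "feed-c", "title": "story-3", "topics": ["Technology"]},
]


def slot_by_topic(articles: List[dict]) -> Dict[str, List[str]]:
    """Group articles from all sources under every topic they were tagged with."""
    buckets: Dict[str, List[str]] = defaultdict(list)
    for article in articles:
        for topic in article["topics"]:
            buckets[topic].append(f'{article["title"]} ({article["source"]})')
    return dict(buckets)


for topic, items in sorted(slot_by_topic(ARTICLES).items()):
    print(topic, items)
```

An article tagged with several topics appears in each of the matching buckets, so one story can surface under both "M&A" and "Technology".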
The Stack
[Diagram: the ClearForest Tags Platform with the ClearForest Extraction Modules and the ClearForest Categorizer]
  • Input: external content, live feeds, enterprise content
  • Connectors: File Based Connector, Programmatic API (SOAP web service), RDBMS Connector, Web Crawlers (Agents)
  • Output: Rich XML, Live Feed
  • Console
  • Tooling: Modeler, Developer, Cat Manager
Detailed Stack
[Diagram: detailed view of the ClearForest Tags Platform]
  • Inputs: files, web agents, enterprise systems, external feeds
  • APIs: File Based API, Programmatic API (SOAP web service), RDBMS based API, Tags API
  • Processing: Document Conversion and Normalization, Language ID, Categorizer, Semantic Tagging, Headline Generation
  • Resources: Extraction Modules, Classifier, Language Classifier, Templates
  • Tooling (ClearForest Studio): ClearForest Developer/Modeler, Categorization Manager, Languages Configuration, Key Concepts Configuration
  • Control: Control API, Control DB, Configuration & Monitoring Console, Farm Manager
  • Output: Rich XML
Platform Highlights
  • Single run-time platform for all technologies
  • Modular architecture: additional functional plug-ins can be added anywhere
  • Web services interfaces; SOA ready
  • Java based
  • Programmatic API to all components
  • Farming support for scalability
  • Best practices and standards (XML, Unicode, architectural patterns, design patterns, …)
Technology
[Diagram: document flow through the platform]
  • Input APIs (Document Provider): File API, Programmatic API (SOAP web service), RDBMS based API, Web, Custom
  • Web Document Injector (flight plan) feeding the queues (IO bound and CPU bound)
  • Document Conversion & Normalization: PDF, XML and Doc converters
  • Language identification
  • Document Tagging (Doc Runner): categorization, information extraction, other (headline generation)
  • Tags Pipeline: profile and listeners
  • Writers: KB Writer, DB Writer, XML Writer, producing Rich XML, the ANS collection and the DB
  • Control: Control Console, Control API
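The platform slides describe a modular run-time in which documents pass through conversion, language identification, tagging and writer stages, and in which plug-ins can be added anywhere. The sketch below shows that plug-in idea in Python. The platform itself is Java based, and this is not its API. Every stage is a crude, hypothetical stand-in, and the names `Pipeline`, `add(..., before=...)` and `xml_writer` are assumptions. IO-bound/CPU-bound queues and farming are not modeled.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple
from xml.sax.saxutils import escape

Doc = Dict[str, object]        # hypothetical document record
Stage = Callable[[Doc], Doc]   # a plug-in: takes a document, returns an enriched copy


class Pipeline:
    """Ordered chain of plug-in stages; a new stage can be inserted before any other."""

    def __init__(self) -> None:
        self.stages: List[Tuple[str, Stage]] = []

    def add(self, name: str, stage: Stage, before: Optional[str] = None) -> None:
        if before is None:
            self.stages.append((name, stage))
        else:
            position = [n for n, _ in self.stages].index(before)
            self.stages.insert(position, (name, stage))

    def run(self, doc: Doc) -> Doc:
        for _, stage in self.stages:
            doc = stage(doc)
        return doc


def convert(doc: Doc) -> Doc:
    # Conversion & normalization stand-in: strip markup, collapse whitespace.
    text = str(doc["raw"])
    if doc.get("format") in ("xml", "html"):
        text = re.sub(r"<[^>]+>", " ", text)
    return {**doc, "text": " ".join(text.split())}


def identify_language(doc: Doc) -> Doc:
    # Crude language-ID stand-in: look for a few common English function words.
    words = set(str(doc["text"]).lower().split())
    return {**doc, "language": "en" if words & {"the", "of", "and", "to"} else "unknown"}


def categorize(doc: Doc) -> Doc:
    # Keyword stand-in for the categorizer.
    found = re.search(r"\bacquir", str(doc["text"]), re.IGNORECASE)
    return {**doc, "topics": ["M&A"] if found else []}


def xml_writer(doc: Doc) -> Doc:
    # Writer stand-in: serialize the tags (escaping "&" etc.) to an XML string.
    topics = "".join(f"<Topic>{escape(t)}</Topic>" for t in doc.get("topics", []))
    language = doc.get("language", "unknown")
    return {**doc, "xml": f'<Document language="{language}">{topics}</Document>'}


pipeline = Pipeline()
pipeline.add("convert", convert)
pipeline.add("categorize", categorize)
pipeline.add("xml_writer", xml_writer)
# Plug language identification in ahead of the categorizer without touching other stages.
pipeline.add("language_id", identify_language, before="categorize")

result = pipeline.run({"format": "html", "raw": "<p>Reuters agreed to acquire ClearForest.</p>"})
print(result["xml"])  # <Document language="en"><Topic>M&amp;A</Topic></Document>
```

Each stage returns a new record rather than mutating its input. This keeps the stages independent, so they can be reordered or swapped without side effects leaking between them.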
The NLP Stack (highest layer first)
  • Events & Facts
  • Entities: candidates, resolution, normalization
  • Basic NLP: noun groups, verb groups, numbers, phrases, abbreviations
  • Metadata analysis: title, date, body, paragraph
  • Sentence marking
  • Morphological analyzer and POS tagging (per word): stem, tense, aspect, singular/plural, gender, prefix/suffix separation
  • Tokenization
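The lowest layers of the stack feed the upper ones. Tokens are grouped into sentences, and sentences yield entity candidates for later resolution and normalization. The sketch below is a toy Python version of three of those layers: tokenization, sentence marking and entity candidates. The tiny abbreviation and function-word lists and the "run of capitalized tokens" heuristic are assumptions made for illustration. POS tagging and morphology are omitted.

```python
import re
from typing import List

ABBREVIATIONS = {"Inc.", "Ltd.", "Corp.", "Co.", "Mr.", "Dr."}  # tiny illustrative list
SENTENCE_END = {".", "!", "?"}
FUNCTION_WORDS = {"The", "A", "An", "It"}  # often capitalized only because sentence-initial


def tokenize(text: str) -> List[str]:
    """Words, numbers and punctuation; a trailing period stays only on known abbreviations."""
    tokens: List[str] = []
    for match in re.finditer(r"\w+\.?|[^\w\s]", text):
        token = match.group()
        if len(token) > 1 and token.endswith(".") and token not in ABBREVIATIONS:
            tokens.extend([token[:-1], "."])  # split "shareholders." into word + period
        else:
            tokens.append(token)
    return tokens


def mark_sentences(tokens: List[str]) -> List[List[str]]:
    """Close a sentence at each standalone end-of-sentence punctuation token."""
    sentences: List[List[str]] = []
    current: List[str] = []
    for token in tokens:
        current.append(token)
        if token in SENTENCE_END:
            sentences.append(current)
            current = []
    if current:
        sentences.append(current)
    return sentences


def entity_candidates(sentence: List[str]) -> List[str]:
    """Maximal runs of capitalized tokens (e.g. "ClearForest Ltd.") as raw candidates."""
    candidates: List[str] = []
    run: List[str] = []
    for token in sentence + [""]:  # empty sentinel flushes the final run
        if token[:1].isupper() and token not in FUNCTION_WORDS:
            run.append(token)
        else:
            if run:
                candidates.append(" ".join(run))
            run = []
    return candidates


TEXT = ("Reuters will acquire ClearForest Ltd. from its shareholders. "
        "The deal is expected to close in 30 days.")
for sentence in mark_sentences(tokenize(TEXT)):
    print(entity_candidates(sentence))
```

Keeping "Ltd." as a single token prevents a false sentence break after the company name. The cost is that a sentence genuinely ending in an abbreviation is not split, which is one reason real systems treat sentence marking as its own layer.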
Calais, Semantics and the Semantic Web: Issues and Opportunities
  • Ontologies: how do we make this a community effort?
  • Dereferenceable URIs & endpoints: engineering; population (basic data, links, proprietary data sources)
  • Functions? Code?
What’s in the Pipeline?
2008
  • The basics of dereferenceable URIs
  • Disambiguation: company & geography
  • Hooks
2009 (this is a fuzzy list)
  • Person disambiguation (social networks?)
  • Other disambiguation
  • Continued population of endpoints
  • Calais as hub
  • Exposure of the IDE
  • User-managed lexicons
  • Lots and lots of hooks
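Company and geography disambiguation means mapping each mention to exactly one canonical resource that has a stable, dereferenceable URI. The sketch below shows the idea with an alias table plus context clues. The namespace `http://example.org/resource/`, the knowledge-base entries and the overlap-count scoring rule are all hypothetical and are not Calais's data or method. Serving RDF when one of these URIs is fetched is not shown.

```python
import re
from typing import Dict, Optional, Set

BASE = "http://example.org/resource/"  # hypothetical namespace, not the Calais one

# Hypothetical mini knowledge base: canonical id -> surface forms and context clues.
KB: Dict[str, Dict[str, Set[str]]] = {
    "company/reuters": {"aliases": {"reuters", "reuters group"}, "clues": set()},
    "geo/paris_france": {"aliases": {"paris"}, "clues": {"france", "french", "seine"}},
    "geo/paris_texas": {"aliases": {"paris"}, "clues": {"texas", "tx"}},
}


def disambiguate(mention: str, context: str) -> Optional[str]:
    """Map a mention to one canonical URI, using surrounding words to break ties."""
    name = mention.casefold()
    words = set(re.findall(r"\w+", context.casefold()))
    candidates = [cid for cid, entry in KB.items() if name in entry["aliases"]]
    if not candidates:
        return None
    # Highest clue overlap wins; max() keeps the first candidate on ties,
    # so dictionary order acts as the default reading.
    best = max(candidates, key=lambda cid: len(KB[cid]["clues"] & words))
    return BASE + best


print(disambiguate("Paris", "The rodeo in Paris, Texas drew crowds"))
# http://example.org/resource/geo/paris_texas
```

Because every surface form resolves to the same URI, "Reuters" and "Reuters Group" become one node that other data sources can link to. That is the role the slides sketch for Calais as a hub.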
www.opencalais.com
  • Gallery: code and application examples
  • Forums
  • Documentation
Calais @ the Palo Alto Semantic Web Meetup
Editor's Notes

  • #2: First draft, with beautiful work by Sagit. Note that ALL text is editable.