Automation and Text Mining
Ben O'Steen, 25 May 2010
Hi
Ben O'Steen (Software Engineer, Oxford University Libraries) Freelance Enthusiast
“Text-mining” (and related techniques): processing natural language to gain direct and contextual information, often with a means to quantify this information's accuracy.
Automation (in this context): making decisions, providing additional options and increasing the amount of information understood by a system without the need for human effort.
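The definition of text-mining stresses being able to quantify how accurate the extracted information is. A minimal sketch of that idea follows, assuming a hypothetical extractor that pulls four-digit years out of free text and a small hand-labelled "gold" set. It scores the output with precision (how much of what was extracted is right) and recall (how much of what should have been found was found). The regex, sample sentence and labels are illustrative assumptions, not material from the talk.

```python
import re

# Hypothetical extractor: pull four-digit years (1900-2099) out of free text.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_years(text):
    """Return the set of year-like tokens found in text."""
    return set(YEAR.findall(text))


def precision_recall(predicted, gold):
    """Precision: share of predictions that are correct.
    Recall: share of gold items that were found."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Illustrative sentence and hand-labelled answers (assumptions, not data from the talk).
text = ("The report was published in 2006, revised in 2007 and cites 1984 figures; "
        "it is filed as document 2010.")
gold = {"2006", "2007", "1984"}

p, r = precision_recall(extract_years(text), gold)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.75 recall=1.00
```

The document number "2010" is a false positive, so recall is perfect but precision drops; putting a number on that trade-off is what lets an automated system decide whether its output can be trusted without a human checking it.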
“BI Search and Text Analytics”[1] says: “structured data in first place at 47%, trailed by unstructured (31%) and semi-structured data (22%).” That leaves 53% (31% + 22%) of the data unstructured or semi-structured! And this was amongst data management professionals!
[1] “BI Search and Text Analytics”, 2006, http://tdwi.org/research/2007/07/tdwi-best-practices-reports.aspx
Natural Language Processing
Taxing; it is a difficult process, often requiring multiple analyses and lots of compute power for reasonable response times.
Developing. Every year, new and better solutions are found.
Multi-disciplinary. The skills required by the team are broad: machine learning, linguistics, statistics, logic and so on.
Natural Language Processing
“Natural Language Processing”: 982,000 occurrences (Google)
+“Multidisciplinary”: 25,800
+“Multi-disciplinary”: 10,400
Roughly 3.7% of the time (36,200 of 982,000 pages), 'NLP' occurs in a page with the term 'multidisciplinary'.
Crude, but interesting result.
“Text-mining”: 557,000 occurrences (Google)
(I'll come back shortly to why it matters that NLP is heavily multidisciplinary.)
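The share of "Natural Language Processing" pages that also mention "multidisciplinary" can be recomputed directly from the three Google hit counts on the slide. This is a minimal sketch, assuming the two spellings never appear on the same page (so their counts can simply be added); the 2010 counts are taken as given and will not be reproducible against today's index.

```python
# Google hit counts as quoted on the slide (2010).
nlp_pages = 982_000
multidisciplinary = 25_800   # "Natural Language Processing" + "Multidisciplinary"
multi_disciplinary = 10_400  # "Natural Language Processing" + "Multi-disciplinary"

# Assumption: no page uses both spellings, so the two counts do not overlap.
overlap = multidisciplinary + multi_disciplinary
share = overlap / nlp_pages
print(f"{overlap} of {nlp_pages} pages -> {share:.1%}")
```

With the slide's figures this prints `36200 of 982000 pages -> 3.7%`. Hit counts from a web search engine are rough estimates, which is why the result is best read as crude.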
'Real-world' NLP
Machine learning; 'MoreLikeThis':
Amazon
Search engines (from Google's to Apache Solr)
Google AdWords: advertising/marketing
And just about any business that trades solely over the web.
Predictive analytics (game theory):
Business/stock-market predictive models
Customer 'churn'
Credit ratings
Search and indexing:
Bing: “Information Overload” campaign
TrueKnowledge: NLP + semantic knowledge base
'Smart' results: Not WYTIWYG (What You Typed is What You Get) but WYMIWYG (What You Meant is What You Get)
Specific information added to results, formatted based on the meaning of your search query.
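'MoreLikeThis'-style features rank items by how similar their text is to the item a user is currently looking at. Below is a minimal sketch of one common content-based way to do that: TF-IDF term weighting plus cosine similarity, in plain Python. It is a generic illustration, not the method used by Amazon or by Apache Solr's MoreLikeThis handler; the toy catalogue and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lower-case word tokens; a real system would also stem and drop stop words.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: weight} dict per document, weighting tf * log(N / df)."""
    tokenized = [Counter(tokenize(d)) for d in docs]
    n = len(docs)
    df = Counter(term for counts in tokenized for term in counts)
    return [
        {t: tf * math.log(n / df[t]) for t, tf in counts.items()}
        for counts in tokenized
    ]


def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def more_like_this(index, docs, k=2):
    """Return indices of the k documents most similar to docs[index]."""
    vectors = tfidf_vectors(docs)
    target = vectors[index]
    scored = [(cosine(target, v), i) for i, v in enumerate(vectors) if i != index]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]


# Hypothetical catalogue of item descriptions.
catalogue = [
    "introduction to natural language processing with python",
    "natural language processing and text mining in practice",
    "a cookbook of italian pasta recipes",
    "statistical text mining for business intelligence",
]
print(more_like_this(1, catalogue))  # [0, 3]
```

The IDF factor down-weights words that appear in many descriptions, so shared rare terms drive the match; the pasta cookbook shares no terms with the NLP title and is ranked last.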

Text-mining and Automation