Applied Enterprise Semantic Mining
Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)
PASS SQL Saturday #220 Atlanta GA
May 18, 2013
Networking
Interactive
About MarkTab
Training and Consulting with
http://marktab.com
Data Mining Resources and Blog at
http://marktab.net
Twitter @marktabnet
Quick Look
My Semantic Search
Interactive
Name three things you want from enterprise text mining
Introduction
SQL Server 2012 has new Programmability Enhancements
Statistical Semantic Search
File Tables
Full-Text Search Improvements
These combined technologies make SQL Server 2012 a strong contender in text mining
Outline
Why Microsoft is competitive for data mining
Definitions: what is text mining?
History: how Microsoft’s semantic search was born
What is inside semantic search
Logical model
Demos
Performance
Microsoft Resources
Why Microsoft is Competitive for Data Mining
Based on 2012 and 2013 Surveys
Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb (February 5, 2013)
Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb (January 31, 2013)
KDNuggets 2012
http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
Definitions
What is text mining?
Definition
Data mining is the automated or semi-automated process of discovering patterns in data
Text mining is the automated or semi-automated process of discovering patterns in textual data
Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery
Purposes
Phrase: Goal
“Data Mining” and “Text Mining”: Inform actionable decisions
“Machine Learning”: Determine best-performing algorithm
MarkTab Decision Cycle
Analysis (science)
Synthesis (art)
GO
Science needs science fiction -- MarkTab
History
How Microsoft’s semantic search came to be
History
July 2008
Microsoft purchases Powerset for US$100 Million
Google Dismisses Semantic Search
http://venturebeat.com/2008/06/26/microsoft-to-buy-semantic-search-engine-powerset-for-100m-plus/
http://www.forbes.com/2008/07/01/powerset-msft-search-tech-intel-cx_ag_0701powerset.html
History
March 2009
Google announces “snippets” as relevant to search
The media picks this story up as “semantic search”
http://googleblog.blogspot.com/2009/03/two-new-improvements-to-google-results.html#!/2009/03/two-new-improvements-to-google-results.html
History
February 2012
Google announces Knowledge Graph, an explicit application of semantic search
http://mashable.com/2012/02/13/google-knowledge-graph-change-search/
History
April 2012
Microsoft purchases 800+ patents from AOL for US$1 Billion
Among them are semantic search and metadata querying patents that are older than Google
http://www.theregister.co.uk/2012/04/09/aol_microsoft_patent_deal/
What is inside Semantic Search
Text Mining introduced for SQL Server 2012
Future: Most data is Text
Two Research Types
• Quantitative research = data mining
• Qualitative research = text mining
The future is combining both
Statistical Semantic Search
Comprises some aspects of text mining
Identifies statistically relevant key phrases
Based on these phrases, can identify (by score) similar documents
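The two ideas on this slide, scored key phrases and scored document similarity, can be sketched in a few lines of Python. This is only an illustration and not SQL Server's algorithm: the slides do not describe the statistical model, so the sketch swaps in plain TF-IDF weighting over single words and cosine similarity. All function names and the sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one {term: TF-IDF weight} dict per document."""
    token_lists = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: the number of documents each term appears in.
    df = Counter(term for tokens in token_lists for term in set(tokens))
    vectors = []
    for tokens in token_lists:
        tf = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return vectors


def key_phrases(vector, top_n=3):
    """Highest-weighted terms of one document, best first (ties by name)."""
    return sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))[:top_n]


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_documents(vectors, source_index):
    """Rank every other document by its similarity score to the source."""
    scores = [
        (i, cosine(vectors[source_index], v))
        for i, v in enumerate(vectors)
        if i != source_index
    ]
    return sorted(scores, key=lambda pair: -pair[1])


# Hypothetical three-document corpus.
docs = [
    "sql server full text search indexes documents",
    "semantic search finds similar documents in sql server",
    "gardening tips for growing tomatoes in summer",
]
vectors = tfidf_vectors(docs)
print(key_phrases(vectors[0]))
print(similar_documents(vectors, 0))
```

Terms that appear in few documents get the highest weights, which is why they surface as key phrases, and documents that share weighted terms receive a higher similarity score.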
FileTables
Built on existing SQL Server FILESTREAM technology
Files and documents
Stored in special tables in SQL Server
Accessed as if they were stored in the file system
Full-Text Search Enhancements
Property search: search on tagged properties (such as author or title)
Customizable NEAR: find words or phrases close to one another
New Word Breakers and Stemmers (for many languages)
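As a rough picture of what a customizable NEAR search checks, the Python sketch below tests whether two terms occur within a chosen distance of each other, optionally in a fixed order. It is an illustration only, not SQL Server's implementation. The function name is hypothetical, and counting distance as the number of words between the two terms is an assumption.

```python
import re


def near(text, first, second, max_distance, ordered=False):
    """Return True if `first` and `second` occur with at most
    `max_distance` words between them (assumed distance definition).
    With ordered=True, `first` must come before `second`."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    for i in first_positions:
        for j in second_positions:
            if i == j:
                continue
            if ordered and j < i:
                continue
            # Number of words strictly between the two terms.
            if abs(j - i) - 1 <= max_distance:
                return True
    return False


# Hypothetical sample sentence.
sentence = "Semantic search in SQL Server identifies key phrases in documents"
print(near(sentence, "search", "phrases", 5))
print(near(sentence, "phrases", "search", 5, ordered=True))
```

Making both the distance and the ordering parameters is the point of the "customizable" form: the caller decides how close is close enough and whether word order matters.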
Logical Model
How semantic search works
From Documents to Output (iFilter Required)
Documents: Varchar, NVarchar, Office, PDF
iFilters
Full-Text Keyword Index “FTI”
Semantic Key Phrase Index – Tag Index “TI”
Semantic Document Similarity Index “DSI”
Semantic Database
Rowset Output with Scores
Languages Currently Supported
Traditional Chinese
German
English
French
Italian
Brazilian Portuguese
Russian
Swedish
Simplified Chinese
British English
Portuguese
Chinese (Hong Kong SAR, PRC)
Spanish
Chinese (Singapore)
Chinese (Macau SAR)
Phases of Semantic Indexing
Full-Text Keyword Index “FTI”
Semantic Key Phrase Index – Tag Index “TI”
Semantic Document Similarity Index “DSI”
http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
Interactive Demo
SQL Server Management Studio
Semantic Search and SQL Server Data Mining
SQL Server Data Tools: data mining plus text mining
Performance
The Million-Dollar Edge
Integrated Full Text Search (iFTS)
Improved Performance and Scale:
Scale-up to 350M documents for storage and search
iFTS query performance 7-10 times faster than in SQL Server 2008
Worst-case iFTS query response times less than 3 sec for corpus
Similar to or better than the main database search competitors
(2012, Michael Rys, Microsoft)
Linear Scale of FTI/TI/DSI
First known linearly scaling end-to-end Search and Semantic product in the industry
Time in Seconds vs. Number of Documents
(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
Text Mining References
Video
http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
http://www.microsoftpdc.com/2009/SVR32
Semantic Search (Books Online) – explains the demo
http://msdn.microsoft.com/en-us/library/gg492075.aspx
Paper
http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
Microsoft Resources
Links
Software
SQL Server 2012 Enterprise
(includes database engine, Analysis Services, SSMS and SSDT)
http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
Microsoft Office 2013 Professional
http://office.microsoft.com/en-us/try
Organizations
Professional Association for SQL Server http://www.sqlpass.org
Atlanta MDF http://www.atlantamdf.com/
Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/
PASS Business Analytics Conference http://www.passbaconference.com
Microsoft TechEd North America http://northamerica.msteched.com/
Interactive
Takeaways
Connect
Data Mining Resources and Blog http://marktab.net
Data Mining Training and Consulting (especially Microsoft and SAS) http://marktab.com
Conclusion
SQL Server 2012 provides data mining and semantic search
The core technology allows document similarity matching
The results can be combined with SQL Server Data Mining (such as Association Analysis)
