Secrets of Enterprise Data Mining
Mark Tabladillo, Ph.D. (MVP, MCAD .NET, MCITP, MCT)
PASS SQL Saturday #177 Mountain View, CA
February 23, 2013
Networking
Interactive
About MarkTab
Training and Consulting with http://marktab.com
Data Mining Resources and Blog at http://marktab.net
Ph.D. – Industrial Engineering, Georgia Tech
Training and consulting internationally across many industries – SAS and Microsoft
Contributed to peer-reviewed research and legislation
Mentoring doctoral dissertations at the accredited University of Phoenix
Presenter
Interactive
Name (up to) three things you want from enterprise data mining
Secret: Excel data mining
Excel add-in for SQL Server data mining
Secret: More than just SQL Server
Microsoft continues to add machine learning technology
Microsoft Offers
Bing
  Maps
Xbox Kinect
  Hacker Magnet
SQL Server 2012
  Analysis Services (Multidimensional and Data Mining)
  Integration Services
  Semantic Search
  Hadoop Partnership
Excel Projects from Microsoft Research
Definitions
What is data mining?
Definition
Data mining is the automated or semi-automated process of discovering patterns in data.
Machine learning is the development and optimization of algorithms for automated or semi-automated pattern discovery.
Purposes
    Phrase               Goal
    “Data Mining”        Inform actionable decisions
    “Machine Learning”   Determine best performing algorithm
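The "determine best performing algorithm" goal can be made concrete with a small comparison. The sketch below is illustrative only: the two candidate models, the toy rows, and all function names are assumptions made for this example and do not come from the presentation or from any Microsoft tool.

```python
def majority_baseline(train):
    """Candidate 1: always predict the most common training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def threshold_rule(train):
    """Candidate 2: predict True when x is at or above the best training cut point."""
    best_cut, best_correct = None, -1
    for cut, _ in sorted(train):
        correct = sum((x >= cut) == label for x, label in train)
        if correct > best_correct:
            best_cut, best_correct = cut, correct
    return lambda x: x >= best_cut


def accuracy(model, rows):
    """Fraction of rows whose label the model predicts correctly."""
    return sum(model(x) == label for x, label in rows) / len(rows)


# Hypothetical toy data: (value, label) pairs, invented for illustration.
train = [(1, False), (2, False), (3, False), (6, True), (7, True), (8, False)]
holdout = [(2, False), (4, False), (6, True), (9, True)]

candidates = {
    "majority_baseline": majority_baseline(train),
    "threshold_rule": threshold_rule(train),
}
scores = {name: accuracy(model, holdout) for name, model in candidates.items()}
best_name = max(scores, key=scores.get)
print(scores, best_name)
```

Picking `best_name` is the machine learning goal in the table above; deciding what to do with the winning model's predictions is the data mining goal.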
Secret: Give artists art
Data mining is part of a complete decision cycle
MarkTab Decision Cycle
[Diagram: decision cycle labeled GO, with Synthesis (art) on one side and Analysis (science) on the other]
Science needs science fiction -- MarkTab
XKCD: Shopping Teams
Secret: Microsoft is an analytics competitor
Industry Comparisons 2012-2013
Gartner 2013 Magic Quadrant for Business Intelligence and Analytics Platforms
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DZLPEH&ct=130207&st=sb – February 5, 2013
Gartner 2013 Magic Quadrant for Data Warehouse Database Management Systems
Retrieved from http://www.gartner.com/technology/reprints.do?id=1-1DU2VD4&ct=130131&st=sb – January 31, 2013
KDNuggets 2012
http://marktab.net/datamining/2012/06/15/excel-number-commercial-tool-analytics-data-mining-big-data/
SQL Server 2012
Business Intelligence and Business Analytics
New Platform options: managed services
Options: Platform (Self Managed), Infrastructure (as a Service), Platform (as a Service), Software (as a Service)
Layers shown for every option, top to bottom:
  Applications
  Data
  Runtime
  Middleware
  Database
  O/S
  Virtualization
  Servers
  Storage
  Networking
[Diagram: a "Managed Services" bracket appears in each of the three "as a Service" columns]
SQL Release timelines

SQL Server releases
  1989    SQL Server 1.0 (OS/2)
  1991    SQL Server 1.1 (OS/2)
  1993    SQL Server 4.21 (NT)
  1995    SQL Server 6.0
  1996    SQL Server 6.5
  1998    SQL Server 7.0: Dynamic Locking, Auto-Tuning, Full-text search, Replication, Analysis Services
  2000    SQL Server 2000: Reporting Services
  2005    SQL Server 2005: Unicode Support, Native XML, SQLCLR, Service Broker, Integration Services
  2008    SQL Server 2008: Sparse Columns, Spatial Types, FILESTREAM
  2010    SQL Server 2008 R2: Data-tier Apps, StreamInsight, PowerPivot, Master Data Services
  2012    SQL Server 2012: AlwaysOn, Columnstore, FileTable, Semantic Search, Power View

SQL Azure releases
  Feb 10  SQL Azure RTW
  Feb 10  SQL Azure SU1 RTW: Alter Edition
  Apr 10  SQL Azure SU2 RTW: MARS
  Jun 10  SQL Azure SU3 RTW: 50 GB Db, Spatial Type, HierarchyId Type
  Jul 10  DataSync CTP1
  Aug 10  SQL Azure SU4 RTW: Database Copy, Web Admin
  Nov 10  DataMarket RTW, SQL Azure Reporting CTP1
  Dec 10  SQL Azure SU6 RTW, DataSync CTP2
  Feb 11  SQL Azure Reporting CTP2, DataSync CTP2 Update
  Apr 11  SQL Azure SU V.Next: Multiple Servers, Server Mgmt API, JDBC, DAC Upgrade
  Aug 11  New Portal Experience, Sparse Columns, SQL Azure Reporting CTP3, SQL Azure DataSync CTP3, DAC Import/Export Service, Denali TSQL
Secret: Many already have Microsoft analytics
Business Intelligence and Business Analytics are included with most SQL Server licenses
Data platform: SQL Server 2012
Database Services
  SQL Server*
  SQL Azure*
  Replication
  SQL Azure Data Sync*
  Full Text & Semantic Search*
Data Integration Services
  Integration Services*
  Master Data Services*
  Data Quality Services*
  StreamInsight*
  Project “Austin”*
Analytical Services
  Analysis Services*
  Data Mining
  PowerPivot*
Reporting Services
  Reporting Services*
  SQL Azure Reporting*
  Report Builder
  Power View*

* New / improved in SQL Server 2012
SQL Server 2012 Editions
Retrieved from http://www.microsoft.com/en-us/sqlserver/editions.aspx -- February 2013
Secret: Microsoft offers three enterprise tools
All three tools support scaled solutions
What Enterprise Tools support Microsoft Data Mining?
[Diagram: Data Mining supported by SSMS, SSIS, and PowerShell]
[Figures: four charts of a variable over the values 0 through 7, with rows labeled Discretized, Discretized, Continuous, and Discrete]
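The charts contrast discrete, continuous, and discretized versions of a variable over the range 0 through 7. A minimal sketch of one way to discretize a continuous variable follows, assuming equal-width buckets. The function name, the bucket count of three, and the sample values are assumptions for illustration; the discretization methods built into SQL Server Analysis Services are not reproduced here.

```python
def discretize_equal_width(values, bucket_count):
    """Map each continuous value to a bucket index in [0, bucket_count - 1]."""
    if bucket_count < 1:
        raise ValueError("bucket_count must be at least 1")
    low, high = min(values), max(values)
    if low == high:
        # All values are identical, so a single bucket holds everything.
        return [0 for _ in values]
    width = (high - low) / bucket_count
    buckets = []
    for value in values:
        index = int((value - low) / width)
        # The maximum value would land one past the last bucket; clamp it.
        buckets.append(min(index, bucket_count - 1))
    return buckets


# Hypothetical continuous readings between 0 and 7.
continuous = [0.0, 1.5, 2.2, 3.9, 4.4, 5.1, 6.8, 7.0]
bucket_ids = discretize_equal_width(continuous, 3)
print(bucket_ids)
```

After this step the variable takes only the values 0, 1, or 2, so an algorithm that expects discrete inputs can consume it.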
Data Mining Capacities
   SQL Server 2008 R2 Analysis Services Object                      Maximum sizes/numbers
   Maximum data mining models per structure                         2^31-1 = 2,147,483,647
   Maximum data mining structures per solution                      2^31-1 = 2,147,483,647
   Maximum data mining structures per Analysis Services database    2^31-1 = 2,147,483,647
   Maximum data mining attributes (variables) per structure         2^31-1 = 2,147,483,647

Reference:
http://www.marktab.net/datamining/index.php/2010/08/01/sql-server-data-mining-capacities-2008-r2/
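The value 2^31-1 is the largest number a signed 32-bit integer can hold, which presumably explains why the same limit recurs in every row. The short check below uses Python's standard `struct` module to confirm the arithmetic; the helper name `fits_in_int32` is an assumption for this example.

```python
import struct

MAX_INT32 = 2**31 - 1


def fits_in_int32(number):
    """Return True if number can be stored as a signed 32-bit integer."""
    try:
        struct.pack("<i", number)  # "<i" is a little-endian signed 32-bit int
        return True
    except struct.error:
        return False


print(f"{MAX_INT32:,}")
print(fits_in_int32(MAX_INT32), fits_in_int32(MAX_INT32 + 1))
```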
Semantic Search
Text Mining
Future: Most data is Text
Two Research Types
• Quantitative research = data mining
• Qualitative research = text mining
The future is combining both
Full-Text Search Enhancements
Property search: search on tagged properties (such as author or title)
Customizable NEAR: find words or phrases close to one another
New Word Breakers and Stemmers (for many languages)
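The idea behind a proximity (NEAR) search can be sketched in a few lines. This is a toy illustration, not SQL Server's implementation: the tokenizer, the function names, and the definition of distance as the number of tokens between the two terms are all assumptions made for this example.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def near(text, first, second, max_distance):
    """Return True if both terms occur with at most max_distance tokens between them."""
    tokens = tokenize(text)
    first_positions = [i for i, t in enumerate(tokens) if t == first.lower()]
    second_positions = [i for i, t in enumerate(tokens) if t == second.lower()]
    return any(
        i != j and abs(i - j) - 1 <= max_distance
        for i in first_positions
        for j in second_positions
    )


sentence = (
    "Data mining is the automated or semi-automated process "
    "of discovering patterns in data"
)
print(near(sentence, "patterns", "data", 1))
print(near(sentence, "mining", "patterns", 3))
```

Making the distance a parameter is what "customizable" refers to here: the same query can be tightened or loosened without changing the search terms.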
[Diagram (iFilter required): Documents, iFilters, Full-Text Keyword Index “FTI”, Semantic Database, Semantic Key Phrase Index – Tag Index “TI”, Semantic Document Similarity Index “DSI”]
Languages Currently Supported
Traditional Chinese   Simplified Chinese
German                British English
English               Portuguese
French                Chinese (Hong Kong SAR, PRC)
Italian               Spanish
Brazilian             Chinese (Singapore)
Russian               Chinese (Macau SAR)
Swedish
Phases of Semantic Indexing
Full Text Keyword Index “FTI”
Semantic Key Phrase Index – Tag Index “TI”
Semantic Document Similarity Index “DSI”
http://msdn.microsoft.com/en-us/library/gg492085.aspx#SemanticIndexing
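The three indexes can be pictured with a toy pipeline: a keyword index, then per-document key terms, then document-to-document similarity. The sketch below substitutes plain TF-IDF weights and cosine similarity for whatever statistical models SQL Server actually uses, so it shows the shape of the idea only. The sample documents and every function name are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())


def build_keyword_index(docs):
    """Stand-in for the keyword index: term -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index.setdefault(term, set()).add(doc_id)
    return index


def tfidf_vectors(docs, keyword_index):
    """Weight each term by its count times log(total docs / docs containing it)."""
    total = len(docs)
    vectors = {}
    for doc_id, text in docs.items():
        counts = Counter(tokenize(text))
        vectors[doc_id] = {
            term: count * math.log(total / len(keyword_index[term]))
            for term, count in counts.items()
        }
    return vectors


def top_tags(vector, k):
    """Stand-in for the tag index: the k highest-weighted terms of one document."""
    ranked = sorted(vector.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]


def cosine(a, b):
    """Stand-in for the similarity index: cosine of two weighted term vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical three-document corpus.
docs = {
    "a": "data mining finds patterns in data",
    "b": "text mining finds patterns in documents",
    "c": "reporting services renders reports",
}
keyword_index = build_keyword_index(docs)
vectors = tfidf_vectors(docs, keyword_index)
print(top_tags(vectors["a"], 1))
print(cosine(vectors["a"], vectors["b"]), cosine(vectors["a"], vectors["c"]))
```

Each stage consumes the output of the one before it, which mirrors the ordering of the phases listed above.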
Secret: Semantic Search scales linearly
Performance
Integrated Full Text Search (iFTS)
Improved Performance and Scale:
  Scale-up to 350M documents for storage and search
  iFTS query performance 7-10 times faster than in SQL Server 2008
  Worst-case iFTS query response times less than 3 sec for corpus
  Similar or better than main database search competitors
(2012, Michael Rys, Microsoft)
Linear Scale of FTI/TI/DSI
First known linearly scaling end-to-end Search and Semantic product in the industry
[Chart: Time in Seconds vs. Number of Documents]
(2011 – K. Mukerjee, T. Porter, S. Gherman – Microsoft)
Text Mining References
Video
  http://channel9.msdn.com/Shows/DataBound/DataBound-Episode-2-Semantic-Search
  http://www.microsoftpdc.com/2009/SVR32
Semantic Search (Books Online) – explains the demo
  http://msdn.microsoft.com/en-us/library/gg492075.aspx
Paper
  http://users.cis.fiu.edu/~lzhen001/activities/KDD2011Program/docs/p213.pdf
Microsoft Resources
Links
Software
SQL Server 2012 Enterprise
(includes database engine, Analysis Services, SSMS and SSDT)
 http://www.microsoft.com/sqlserver/en/us/get-sql-server/try-it.aspx
Microsoft Office 2013 Professional
 http://office.microsoft.com/en-us/try
Organizations
 Professional Association for SQL Server http://www.sqlpass.org
   Atlanta MDF http://www.atlantamdf.com/
   Atlanta Microsoft BI Users Group http://www.meetup.com/Atlanta-Microsoft-Business-Intelligence-Users/
PASS Business Analytics Conference http://www.passbaconference.com
Microsoft TechEd North America http://northamerica.msteched.com/
Interactive
Takeaways
Conclusion: Seven Secrets
Excel data mining
More than just SQL Server
Success involves everyone
Microsoft is an analytics competitor
Many already have Microsoft analytics
Microsoft offers three enterprise tools
Semantic search scales linearly
Connect
Data Mining Resources and blog http://marktab.net
Data Mining Training and Consulting (especially Microsoft and SAS)
http://marktab.com
Abstract
If you have a SQL Server license (Standard or higher) then you already have the ability
to start data mining. In this new presentation, you will see how to scale up data
mining from the free Excel 2013 add-in to production use. Aimed at beginning to
intermediate data miners, this presentation will show how mining models move from
development to production. We will use SQL Server 2012 tools including SSMS, SSIS,
and SSDT.
