Machine Learning 101
Advanced Analytics and Data Science
CCG
Analytics Solutions & Services
DATA
MANAGEMENT
Data & analytics consultants with a passion for helping clients
overcome business challenges & increase performance by
leveraging modern analytic solutions.
BUSINESS
ANALYTICS
DATA
STRATEGY
VOICES OF OUR CUSTOMERS
“CCG brought the expertise and the vision to help us execute, to provide
visibility to the data in a manner that we can use it faster.”
- Gary Gray, Business Solutions Executive, Corsicana Mattress Company
“The people we talked to know us. CCG wasn’t trying to fit us into a boilerplate
template but prescribe a tailored solution. Their RapidRoadmap was the basis of
our BI Strategy for the next two years.”
- Kevin Davis, Sr. Director of BI, Kforce
“Many times with CCG, we come to the table with questions or ideas and within a
couple of days or weeks the team comes back with above and beyond what we
actually asked for. They care.”
- Chris Fitzpatrick, Vice President of Business Analytics & Strategy, vineyard vines
“I'm amazed at the talent at CCG, not just the skillset - they're really good people.
We've already referred them once and will do so again!”
- CIO, Ruth’s Chris Hospitality Group
Objectives
By the end of this workshop, you should be able to:
Describe what Machine Learning is and how it fits into the analytic landscape
Understand the difference between traditional and “advanced” analytics
Describe what a statistical model is
Understand a machine learning approach to statistical modeling
Explain the conceptual methodology behind the Machine Learning areas of classification and clustering
Describe some of the most common tools for implementing data science and Machine Learning
AGENDA
Why should anyone care about
machine learning?
What is Machine Learning?
How does Machine Learning
work?
Ok but how does it really work?
How can an organization use
Machine Learning?
The concepts in Machine Learning are not new.
How has Machine Learning Evolved?
https://www.quantinsti.com/blog/machine-learning-basics
Even though the concepts are decades old,
machine learning has only become feasible at scale in recent years.
Why Machine Learning Now?
Flood of data and decreasing costs of storage
Increasing computational power
Increased attention from researchers
Growth of open source technologies
Support from industries
Machine Learning has tons of useful applications you already encounter or
hear about every day.
Analyzing
Images
Understanding
Language
Forming &
Executing Strategy
Personalized
Recommendations
Autonomous
Decisions
Predicting
Asset Values
How is Machine Learning used?
Machine Learning isn’t just applicable to high tech.
There are suitable use cases present in most business sectors.
Where is Machine Learning used?
Healthcare
• Claims Fraud
• Real-time mortality risk
for ICU patients
• Response Adapted
Radiotherapy
• Predicting patient
medication adherence
• Translational/precision
medicine
Finance
• Foreclosure/credit risk
• Risk analysis
• Fraud detection
• Demand forecasting
• Anti Money Laundering
• Algorithmic trading
Energy
• Resource allocation
• Load forecasting
• Grid optimization
• Robotics
• Anomaly detection
• Image recognition
• Predictive maintenance
Retail
• Single view of customer
• Customer service analysis
• Inventory planning
• Social media analysis
• Lead scoring
• Marketing campaign
evaluation
Machine learning sits at the intersection of statistics and computer science to
help businesses make decisions.
Why Machine Learning Now?
Computational
Power
Statistics
Predictive & Prescriptive Decision Support
Faster
More Accurate
More Powerful
Self-Improving
Always-On
Machine Learning is a technique that can be used in the data
science process to achieve several possible outputs.
What is Machine Learning?
Data Science
A broad process for generating insights
that may involve data ingestion from
one or many sources (including external
data, streaming data, or big data), data
processing and cleansing, model
generation using statistical or machine
learning approaches, model selection,
model deployment and maintenance,
and visualization of data.
Advanced Analytics
Apply data science to predictive (what
will happen?) or prescriptive (what
should we do?) business use cases.
Artificial Intelligence /
Cognitive Computing
Apply data science to approximate
human intuition and decision making
(e.g. strategy, creativity, planning) or
human sensory functions (e.g.
computer vision, natural language
understanding, etc.)
Statistics
A branch of math for generating
descriptions of or inferences about a
population, often based on samples
of the population. Inferences may
take the form of “models,” which
are equations that approximate the
data’s inherent relationships.
Machine Learning
Combines computer science with
math concepts to generate models
by rapidly iterating on large
datasets.
Other Analytics Disciplines
(Data Engineering, Visualization)
[Figure: Disciplines feed the Process, which produces Outputs: Automation / Robotics / Intelligent Devices; Actions; Strategy / Operations]
Advanced Analytics (“AA”) enable predictive and prescriptive uses of data by
applying sophisticated math and statistics to automate parts of the analysis.
What is Advanced Analytics?
Traditional analytics focuses on
understanding and explaining the
data that has been collected.
AA focuses on generating new
data in the form of predictions or
decisions, and going the extra step
to automate decision-making
when possible.
Advanced Analytics deal with making “best guesses” faster, better, and
more consistently than relying on human SMEs.
Traditional Analytics
Provide insights on existing data using:
• Raw data points
• Summaries of data
• Calculations across existing data fields
• KPIs
The data reported are historical or current facts.
Generally requires the application of basic
mathematics or arithmetic.
Advanced Analytics
Generate new data, including:
• Predicted future values
• Best guesses of missing values
• Suggested next steps
• Categorizations
The data generated are “best guesses” and
contain some uncertainty.
Requires the application of advanced
mathematics, statistics and computing principles.
Traditional vs. Advanced Analytics
A model is a repeatable, data-driven approach to making a best guess.
It does this by formalizing mathematical relationships between data in the form of either:
– Rules (e.g. predict applicants will default on a loan if Credit Score < 700 and Debt to Income Ratio > 30%)
– Or an equation (e.g. predict Home Price = 100*Square Footage + 2*Average Income in the Area)
NOTE that this is not the same as a DATA model. These are different things:
Machine Learning works by using “algorithms” to generate “models.”
How does Machine Learning work?
Data Model Statistical Model
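As a minimal sketch, the two example models above (a rule and an equation) can be written as small Python functions. The function and parameter names are illustrative assumptions; the thresholds and coefficients are the ones quoted in the examples.

```python
def predict_default(credit_score, debt_to_income):
    """Rule-based model: predict a default when both conditions hold."""
    return credit_score < 700 and debt_to_income > 0.30


def predict_home_price(square_footage, average_area_income):
    """Equation-based model: a weighted sum of the two inputs."""
    return 100 * square_footage + 2 * average_area_income


print(predict_default(650, 0.35))        # low score and high debt-to-income
print(predict_home_price(2000, 50000))   # 100*2000 + 2*50000
```

Either form is repeatable: the same inputs always give the same best guess.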
In the past we’ve told computers how to use data to answer our
questions.
Data
Prior month sales: $4MM
2 months prior: $3MM
3 months prior: $2MM
Program / Model
This month sales =
(prior month +
2 months prior +
3 months prior)
/ 3
Answer
This month’s sales = $3MM?
What’s a model?
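The hand-written program on this slide is a simple three-month average. A minimal sketch, assuming sales are given in $MM and using a hypothetical function name:

```python
def average_of_prior_months(prior_sales):
    """Human-defined program: this month's guess is the mean of prior months."""
    return sum(prior_sales) / len(prior_sales)


# Prior month, 2 months prior, 3 months prior (in $MM), as on the slide.
print(average_of_prior_months([4, 3, 2]))
```

Here a person chose the formula; the data only supplies the inputs.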
Answer
Last month’s sales: $2MM
Data
Prior month sales: $4MM
2 months prior: $3MM
3 months prior: $1MM
But we’ve found that if we give the machine historic facts, we can let it find
the right program / model to plug in for future answers.
Answer
Last month’s sales: $2MM
Data
Prior month sales: $4MM
2 months prior: $3MM
3 months prior: $2MM
Program / Model
This month’s sales =
1/8 * Prior month +
1/3 * 2 months prior +
1/4 * 3 months prior
What’s a model?
Answer
Last month’s sales: $2MM
Data
Prior month sales: $4MM
2 months prior: $3MM
3 months prior: $1MM
Once we have our machine-defined program, we can use it with new data to
make better predictions.
Answer
Last month’s sales: $2MM
Data
Prior month sales: $4MM
2 months prior: $3MM
3 months prior: $2MM
Program / Model
This month’s sales =
1/8 * Prior month +
1/3 * 2 months prior +
1/4 * 3 months prior
New Data
Prior month sales: $8MM
2 months prior: $6MM
3 months prior: $8MM
Answer
This month’s sales = $5MM
What’s a model?
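To check the arithmetic on this slide, here is a small sketch that applies the machine-defined weights (1/8, 1/3, 1/4) first to the historic data and then to the new data. The function and constant names are assumptions; exact fractions are used so the results are not blurred by floating-point rounding.

```python
from fractions import Fraction

# Weights from the slide, ordered: prior month, 2 months prior, 3 months prior.
WEIGHTS = (Fraction(1, 8), Fraction(1, 3), Fraction(1, 4))


def predict_sales(prior_months, weights=WEIGHTS):
    """Weighted sum of the three prior months' sales (in $MM)."""
    return sum(w * x for w, x in zip(weights, prior_months))


print(predict_sales((4, 3, 2)))  # historic data, known answer was $2MM
print(predict_sales((8, 6, 8)))  # new data
```

With the historic inputs the model reproduces the known answer of $2MM, and with the new inputs it gives $5MM, matching the slide.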
A defined set of steps for solving a problem
Often involves repeating steps
May or may not have an ending condition
– The problem is solved to our satisfaction
• For example – stop when the last 4 iterations have been 95% accurate or better
– The problem hasn’t been solved but we don’t seem to be getting any closer to solving it
• For example – stop if the last 10 iterations have not seen any improvement in accuracy
– The process has run for a long time
• For example – stop after the program has run for 12 hours, regardless of whether progress is still being made
The word algorithm gets used a lot, but it isn’t always defined.
What is an algorithm?
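The three ending conditions above can be sketched as a single check that a training loop calls after every iteration. The function name, argument names, and return labels are hypothetical; the default values (4 iterations at 95%, 10 iterations without improvement, 12 hours) come from the examples on this slide.

```python
def stop_reason(accuracies, elapsed_hours, target=0.95, target_window=4,
                patience=10, max_hours=12):
    """Return why the loop should stop, or None if it should keep going."""
    # Solved to our satisfaction: the last few iterations all hit the target.
    recent = accuracies[-target_window:]
    if len(accuracies) >= target_window and all(a >= target for a in recent):
        return "satisfied"
    # Not getting closer: the last `patience` iterations set no new best.
    if len(accuracies) > patience:
        if max(accuracies[-patience:]) <= max(accuracies[:-patience]):
            return "stalled"
    # The process has simply run for too long.
    if elapsed_hours >= max_hours:
        return "timeout"
    return None


print(stop_reason([0.90, 0.96, 0.97, 0.95, 0.99], elapsed_hours=1))
print(stop_reason([0.50] * 11, elapsed_hours=1))
print(stop_reason([0.50, 0.60], elapsed_hours=12))
```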
Collect the data and randomly create initial decision rules.
Design a method for measurably evaluating how good or bad your hypothesis is.
Update your hypothesis in a way that marginally improves the performance of your decision rules.
Continue this process until either you are satisfied with the results, or your hypothesis
can’t improve anymore with the data available.
Almost all machine learning algorithms follow the same general pattern.
Create a
hypothesis
Evaluate the
hypothesis
Adjust the
hypothesis
Repeat until
convergence
What is an algorithm?
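As a rough illustration of the create / evaluate / adjust / repeat pattern, the sketch below learns a single credit-score cutoff by nudging it up or down and keeping a change only when accuracy improves. The loan records, starting cutoff, and step size are made-up assumptions, and real algorithms adjust their hypotheses in more sophisticated ways.

```python
# Hypothetical history: (credit score, did the applicant repay?)
LOANS = [(520, False), (560, False), (600, False), (640, True), (650, False),
         (690, True), (710, True), (740, True), (780, True), (800, True)]


def accuracy(threshold, loans):
    """Evaluate: share of loans where 'score >= threshold' matches 'repaid'."""
    correct = sum((score >= threshold) == repaid for score, repaid in loans)
    return correct / len(loans)


def fit_threshold(loans, start=500, step=50, max_iterations=100):
    threshold = start                    # create a hypothesis
    best = accuracy(threshold, loans)    # evaluate it
    for _ in range(max_iterations):
        # Adjust: try a small move in each direction and keep the better one.
        candidates = [threshold - step, threshold + step]
        cand_acc, cand = max((accuracy(c, loans), c) for c in candidates)
        if cand_acc <= best:             # converged: no move improves accuracy
            break
        threshold, best = cand, cand_acc
    return threshold, best


print(fit_threshold(LOANS))
```

On this toy data the loop moves from a cutoff of 500 (60% accurate) to 600 (80% accurate) and then stops, because neither neighbouring cutoff does better.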
There are two main families of algorithms to choose from.
Supervised Learning
We know the “right answers” for some of the scenarios.
– We may have history we can look back on
– We may be hoping to replicate human decision making
Unsupervised Learning
There aren’t necessarily “right answers,” we just want to
get a better understanding of our data.
Supervised or Unsupervised?
Predict our profits next quarter. Supervised
Identify the number written on a check. Supervised
Group our customers into segments. Unsupervised
Predict a user’s rating for a given product. Supervised
Find the most important variables in a dataset. Unsupervised
Identify credit card transactions that are out of the ordinary. Unsupervised
Now let’s walk through two of the most popular machine learning
approaches and discuss how the algorithms are applied.
How does an algorithm really work for businesses?
Classification Clustering
Use classification when you want to guess a non-numeric value, like a
yes/no answer. We will take a decision tree approach.
Everyone will repay their loan.
Create a
hypothesis
20 outstanding loans
Calculate accuracy as the % of predictions that are correct based on your current set of rules.
Evaluate the
hypothesis
20 outstanding loans
12 repaid, 8 defaulted
Accuracy = 12/20 = 60%
Find the next branch by looking for the data split that would have the biggest impact on the purity of
each node. There are several ways to do this mathematically (Gini Index, Information Gain, Chi-
Square).
Adjust the
hypothesis
Candidate first splits of the 20 outstanding loans:
CreditScore > 700 | CreditScore < 700: 59% weighted
Income > 60k | Income < 60k: 60% weighted
DTI > 40% | DTI < 40%: 75% weighted
Leaf accuracies as listed on the slide: 80%, 73%, 70%, 50%, 71%, 53%
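The split search can be sketched as follows: score every candidate split by its weighted accuracy and keep the best one. This is a simplified illustration on ten made-up loans (income in $ thousands), not the 20 loans from the slide. It uses weighted accuracy as the purity measure; Gini Index or Information Gain would slot into the same place.

```python
from collections import Counter

# Hypothetical loan records, invented for illustration only.
LOANS = [
    {"credit_score": 720, "income": 80, "dti": 0.20, "repaid": True},
    {"credit_score": 680, "income": 55, "dti": 0.25, "repaid": True},
    {"credit_score": 750, "income": 45, "dti": 0.30, "repaid": True},
    {"credit_score": 640, "income": 90, "dti": 0.35, "repaid": True},
    {"credit_score": 710, "income": 65, "dti": 0.38, "repaid": True},
    {"credit_score": 690, "income": 70, "dti": 0.22, "repaid": True},
    {"credit_score": 730, "income": 50, "dti": 0.45, "repaid": False},
    {"credit_score": 660, "income": 75, "dti": 0.50, "repaid": False},
    {"credit_score": 620, "income": 40, "dti": 0.55, "repaid": False},
    {"credit_score": 700, "income": 58, "dti": 0.42, "repaid": True},
]


def majority_accuracy(labels):
    """Accuracy of predicting the most common label for every row in a node."""
    if not labels:
        return 0.0
    return Counter(labels).most_common(1)[0][1] / len(labels)


def weighted_accuracy(rows, feature, threshold):
    """Split rows at the threshold and weight each node's accuracy by its size."""
    left = [r["repaid"] for r in rows if r[feature] < threshold]
    right = [r["repaid"] for r in rows if r[feature] >= threshold]
    total = len(left) * majority_accuracy(left) + len(right) * majority_accuracy(right)
    return total / len(rows)


def best_split(rows, candidates):
    """Pick the (feature, threshold) pair with the highest weighted accuracy."""
    return max(candidates, key=lambda c: weighted_accuracy(rows, *c))


CANDIDATES = [("credit_score", 700), ("income", 60), ("dti", 0.40)]
for feature, threshold in CANDIDATES:
    print(feature, weighted_accuracy(LOANS, feature, threshold))
print("best:", best_split(LOANS, CANDIDATES))
```

On this toy data the "everyone repays" baseline is 70% accurate, and only the debt-to-income split improves on it, so it becomes the first branch.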
Repeat the process for each of your new “leaf” nodes. Stop when you reach an acceptable level of
accuracy, or when your accuracy begins getting worse with independent data.
Repeat until
convergence
20 outstanding loans
Level 1 split: DTI > 40% | DTI < 40%
Level 2 splits: CreditScore > 700 | CreditScore < 700 | Income > $60k | Income < $60k
Leaf accuracy: 100% | 50% | 100% | 100%
80% weighted
Classification is used for lots of problems that copy human intuition. Think
about how you classify information to identify these images!
These use cases are obviously
more complex than our
simple decision tree, but with
more advanced approaches
like convolutional neural
networks these pictures can
definitely be classified by a
machine.
Use clustering when there’s no “correct” classification, but you still want to
assign individuals to groups. This algorithm is called k-means clustering.
Imagine Marketing has
asked you to split these
customers into 3 groups.
How would you do it?
I can segment my customers by assigning them to 3 groups. We’ll set down 3 random “anchors” and
assign each customer to its closest anchor.
Create a
hypothesis
Find the distance between each customer and the center of each group. Take note of which
customers are actually closest to a different center than the one they’re assigned to.
Evaluate the
hypothesis
Reassign each customer to the group corresponding to the center they’re closest to, and move the
anchors to the middle of their new group.
Adjust the
hypothesis
Repeat until
convergence
Keep moving the anchors and re-assigning customers until the anchors stop moving.
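The k-means steps above can be sketched in plain Python. The customer coordinates are invented for illustration, and picking the starting anchors from the customers themselves is an assumption (the slide only says the anchors are random).

```python
import math
import random

# Hypothetical customers as (annual spend in $k, store visits per year).
CUSTOMERS = [(1, 2), (2, 1), (2, 3), (10, 11), (11, 10), (12, 12),
             (20, 2), (21, 3), (22, 1)]


def closest(point, anchors):
    """Index of the anchor nearest to the point (Euclidean distance)."""
    return min(range(len(anchors)), key=lambda i: math.dist(point, anchors[i]))


def k_means(points, k=3, seed=0, max_iterations=100):
    rng = random.Random(seed)
    anchors = rng.sample(points, k)      # create a hypothesis: random anchors
    assignments = []
    for _ in range(max_iterations):
        # Evaluate and reassign: each customer joins its closest anchor.
        assignments = [closest(p, anchors) for p in points]
        # Adjust: move each anchor to the middle of its group.
        new_anchors = []
        for i in range(k):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_anchors.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_anchors.append(anchors[i])   # keep an anchor with no members
        if new_anchors == anchors:       # converged: the anchors stopped moving
            break
        anchors = new_anchors
    return anchors, assignments


anchors, assignments = k_means(CUSTOMERS)
print(anchors)
print(assignments)
```

Different random starting anchors can settle on different groupings, so in practice the procedure is often run several times and the tightest result kept.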
This is just the tip of the iceberg. There are several
algorithms available for various types of problems.
Delivering analytics with Machine Learning requires alignment
across people, process, technology, and data.
Engaging with Machine Learning
Image inspired by Microsoft
People
Process Technology
Data
Guide
Support
Enable
Data scientists combine broad skills to integrate data, build
models, and drive business value.
Let’s look at the Microsoft Team Data Science Process to see how
data scientists spend their time.
The outputs of the process can be used in traditional analytics,
analyzed directly, or fed into automated decision-making.
Traditional Analytics
• Store and access data.
• Filter and aggregate it.
• Visualize it.
• Show it to the business so they can take action.
Machine Learning
• Filter and aggregate it.
• Create a model.
• Generate new data (predictions, etc.).
• The new data can be stored with the rest of the data for use in analytics.
• Or it can be visualized directly to gain insights.
• Or it can automate decisions or actions, allowing better processes to run faster and 24/7.
The sources of data for use in data science can be broad.
Data Warehouses
• Curated & governed data
• Big data
• Cloud or on-prem
Data Lakes
• Unstructured & semi-structured data
• Streaming data
• Partially curated
Externally Procured Data
• May be purchased from 3rd party providers
• May be scraped from the web
• May require designing research experiments
Data scientists typically have the programming and data integration skills to
use data from anywhere it can be found.
The Microsoft technology stack provides a holistic
solution to your Machine Learning needs.
We can work with your business to deliver custom predictive and
prescriptive analytics across the lifecycle.
What can CCG do?
Use Case Definition
• Develop a backlog of
predictive and
prescriptive use cases
• Refine and prioritize use
cases by value
• Develop a predictive
roadmap
Model Development
• Aggregate data from
across internal and
external data sources
• Develop and test
multiple models to find
the best approach to
making predictions
Model Maintenance
• Monitor and maintain
statistical models to
sustain predictive power
• Develop a model telemetry
dashboard
• Test model design changes
to improve predictive
power
Model Governance & Processes
• Assess existing Data Science capabilities
• Develop standards and processes to help guide data science output
• Build a Data Science Center of Excellence
Model Deployment
• Customize and deploy
pre-existing models from
Azure Cognitive Services
• Deploy custom model as
an API or batch job, or
support deployment in
existing systems
Rapid Insight Prototype Offering
Model as a Service Subscription Offering
CCG’s Rapid Insight Solution
Actionable Backlog
– Of use cases ripe for predictive
analytics to transform your
business
Detailed Readouts
– The materials we leave behind
will include extensive analysis
of our methodology, findings,
and recommendations
Ownership of the Model
– Just because the project ends
doesn’t mean the model stops
working. Unlike other managed
service providers, what we
produce on your behalf is yours
to keep
Identify Use Cases
– By holding a workshop with
process SMEs to identify
opportunities to supercharge the
business
Summarize the Findings
– So you can understand the
model’s outputs and begin
taking action on what we’ve
learned
Develop a Prototype Model
– To generate forecasts,
classifications, or exploratory
analysis for one of your use
cases using an industry-standard
tool like Azure Machine Learning
Studio or Databricks
Week 1 Weeks 2-5 Week 6
Fully Operational Production Model
– Available at all times, in production
– Batch & API integrations
Model Supervision
– Model is monitored for ongoing usability
– Performance dashboard
– Guaranteed accuracy SLAs
Model Retraining & Support
– Scheduled & triggered model re-tuning or re-training
– Add new data features over time
Model as a Service Solution
Set up model as
a web service
Visualize model
performance in a
dashboard
Maintain and
enhance model
THANK YOU!
What questions do you have?
Microsoft offers pre-built APIs through Cognitive Services that
can expedite the deployment of AI capabilities.
VISUAL DRAG-AND-DROP
Azure Machine Learning Studio
What is Azure Databricks?
A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure
Best of Databricks
• Designed in collaboration with the founders of Apache Spark
• One-click set up; streamlined workflows
• Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts.
Best of Microsoft
• Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage)
• Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
Azure Databricks key audiences & benefits
Unified analytics platform
Data scientist
• Integrated workspace
• Easy data exploration
• Collaborative experience
• Interactive dashboards
• Faster insights (best of Spark & serverless; Databricks managed Spark)
Data engineer
• Improved ETL performance (zero management clusters, serverless)
• Easy to schedule jobs
• Automated workflows
• Enhanced monitoring & troubleshooting (automated alerts & easy access to logs)
CDO, VP of analytics
• Zero Management Spark
• Cluster democratization (serverless)
• Fast, collaborative analytics platform accelerating time to market
• No dev-ops required
• Enterprise grade security (encryption, end-to-end auditing, role-based control, compliance)
Provided by Microsoft and Databricks under NDA

Ml in a day v 1.1

  • 1. Machine Learning 101 Advanced Analytics and DataScience
  • 2. CCG Analytics Solutions & Services DATA MANAGEMENT Data & analytics consultants with a passion for helping clients overcome business challenges & increase performance by leveraging modern analytic solutions. BUSINESS ANALYTICS DATA STRATEGY
  • 3. VOICES OF OUR CUSTOMERS “CCGto brought the expertise and the vision of to help us execute, to provide visibility to the data in a manner that we can use it faster.” - Gary Gray, Business Solutions Executive, Corsicana Mattress Company “The people we talked to know us. CCGwasn’t trying to fit us into a boilerplate template but prescribe a tailored solution. Their RapidRoadmap was the basis of our BI Strategy for the next two years.” - Kevin Davis, Sr. Director of BI, Kforce “Many times with CCG, we come to the table with questions or ideas and within a couple of days or weeks the team comes back with above and beyond what we actually asked for. They care.” - Chris Fitzpatrick, Vice President of Business Analytics & Strategy, vineyard vines “"I'mamazed at the talent at CCG, not just the skillset - they're really good people. We've already referred them once and will do so again!” - CIO, Ruth’s Chris Hospitality Group
  • 4. Objectives By the end of this workshop, you should be able to: Describe what Machine Learning is and how it fits in to the analytic landscape Understand the difference between traditional and “advanced” analytics Describe what a statistical model is Understand a machine learning approach to statistical modeling The conceptual methodology behind the Machine Learning areas of classification, and clustering Describe some the most common tools for implementing data science and Machine Learning
  • 5. AGENDA Why should anyone care about machine learning? What is Machine Learning? How does Machine Learning work? Ok but how does it really work? How can an organization use Machine Learning?
  • 6. The concepts in Machine Learning are not new. How has Machine Learning Evolved? https://www.quantinsti.com/blog/machine-learning-basics nother human.
  • 7. Even though the concepts are decades old, machine learning has only become feasible at scale in recent years. Why Machine Learning Now? Flood of data and decreasing costs of storage Increasing computational power Increased attention from researchers Growth of open source technologies Support from industries
  • 8. Machine Learning has tons of useful applications you already encounter or hear about every day. Analyzing Images Understanding Language Forming & Executing Strategy Personalized Recommendations Autonomous Decisions Predicting Asset Values How is Machine Learning used?
  • 9. Machine Learning isn’t just applicable to high tech. There are suitable use cases present in most business sectors. Where is Machine Learning used? Healthcare • Claims Fraud • Real-time mortality risk for ICU patients • Response Adapted Radiotherapy • Predicting patient medication adherence • Translational/precision medicine Finance • Foreclosure/credit risk • Risk analysis • Fraud detection • Demand forecasting • Anti Money Laundering • Algorithmic trading Energy • Resource allocation • Load forecasting • Grid optimization • Robotics • Anomaly detection • Image recognition • Predictive maintenance Retail • Single view of customer • Customer service analysis • Inventory planning • Social media analysis • Lead scoring • Marketing campaign evaluation
  • 10. Machine learning sits at the intersection of statistics and computer science to help businesses make decisions. Why Machine Learning Now? Computational Power Statistics Predictive & Prescriptive Decision Support Faster More Accurate More Powerful Self-Improving Always-On
  • 12. Machine Learning is a technique that can be used in the data science process to achieve several possible outputs. What is Machine Learning? Data Science: A broad process for generating insights that may involve data ingestion from one or many sources (including external data, streaming data, or big data), data processing and cleansing, model generation using statistical or machine learning approaches, model selection, model deployment and maintenance, and visualization of data. Advanced Analytics: Apply data science to predictive (what will happen?) or prescriptive (what should we do?) business use cases. Artificial Intelligence / Cognitive Computing: Apply data science to approximate human intuition and decision making (e.g. strategy, creativity, planning) or human sensory functions (e.g. computer vision, natural language understanding, etc.) Statistics: A branch of math for generating descriptions of or inferences about a population, often based on samples of the population. Inferences may take the form of “models,” which are equations that approximate the data’s inherent relationships. Machine Learning: Combines computer science with math concepts to generate models by rapidly iterating on large datasets. Other Analytics Disciplines (Data Engineering, Visualization) Disciplines Process Outputs Automation / Robotics / Intelligent Devices Actions Strategy / Operations
  • 13. Advanced Analytics (“AA”) enable predictive and prescriptive uses of data by applying sophisticated math and statistics to automate parts of the analysis. What is Advanced Analytics? Traditional analytics focuses on understanding and explaining the data that has been collected. AA focuses on generating new data in the form of predictions or decisions, and going the extra step to automate decision-making when possible.
  • 14. Advanced Analytics deal with making “best guesses” faster, better, and more consistently than relying on human SMEs. Traditional vs. Advanced Analytics Traditional Analytics: Provide insights on existing data using: • Raw data points • Summaries of data • Calculations across existing data fields • KPIs The data reported are historical or current facts. Generally requires the application of basic mathematics or arithmetic. Advanced Analytics: Generate new data, including: • Predicted future values • Best guesses of missing values • Suggested next steps • Categorizations The data generated are “best guesses” and contain some uncertainty. Requires the application of advanced mathematics, statistics and computing principles.
  • 16. Machine Learning works by using “algorithms” to generate “models.” How does Machine Learning work? A model is a repeatable, data-driven approach to making a best guess. It does this by formalizing mathematical relationships between data in the form of either: – Rules (e.g. predict applicants will default on a loan if Credit Score < 700 and Debt to Income Ratio > 30%) – Or an equation (e.g. predict Home Price = 100*Square Footage + 2*Average Income in the Area) NOTE that this is not the same as a DATA model. These are different things: Data Model Statistical Model
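The two forms of model named on this slide can be written as small functions. This is only an illustrative sketch: the function and parameter names are assumptions, and the thresholds and coefficients are simply the example values from the slide, not fitted results.

```python
def predicts_default(credit_score, debt_to_income_ratio):
    """Rule-style model: the slide's example loan-default rule."""
    # Debt-to-income is expressed as a fraction, so 30% is 0.30.
    return credit_score < 700 and debt_to_income_ratio > 0.30


def predicted_home_price(square_footage, average_area_income):
    """Equation-style model: the slide's example home-price formula."""
    return 100 * square_footage + 2 * average_area_income


# Hypothetical inputs, chosen only to exercise both models.
risky_applicant = predicts_default(credit_score=650, debt_to_income_ratio=0.35)
safe_applicant = predicts_default(credit_score=720, debt_to_income_ratio=0.35)
price = predicted_home_price(square_footage=2000, average_area_income=50000)
```

Either way, the model is a repeatable procedure: the same inputs always produce the same best guess.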
  • 17. In the past we’ve told computers how to use data to answer our questions. What’s a model? Data: Prior month sales: $4MM 2 months prior: $3MM 3 months prior: $2MM Program / Model: This month sales = (prior month + 2 months prior + 3 months prior) / 3 Answer: This month’s sales = $3MM?
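The hand-written program on this slide is a three-month average. A minimal sketch, with a function name that is an assumption and sales figures in $MM taken from the slide:

```python
def average_of_last_three(prior_month, two_months_prior, three_months_prior):
    """Human-defined model: predict this month's sales as a simple average."""
    return (prior_month + two_months_prior + three_months_prior) / 3


# Sales in $MM, as given on the slide.
this_month_guess = average_of_last_three(4, 3, 2)
```

Here a person chose the formula; the machine only does the arithmetic.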
  • 18. But we’ve found that if we give the machine historic facts, we can let it find the right program/model to plug in for future answers. What’s a model? Answer: Last month’s sales: $2MM Data: Prior month sales: $4MM 2 months prior: $3MM 3 months prior: $1MM Answer: Last month’s sales: $2MM Data: Prior month sales: $4MM 2 months prior: $3MM 3 months prior: $2MM Program / Model: This month’s sales = 1/8 * Prior month + 1/3 * 2 months prior + 1/4 * 3 months prior
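One way to picture the machine "finding the program" is to hand it historic (data, answer) pairs and let it solve for the weights. The sketch below is a simplification under stated assumptions: the first history row comes from the slide, the other two rows are hypothetical and were chosen to be consistent with the slide's formula, and the exact solver stands in for the approximate fitting (for example least squares) that real, noisy data would need.

```python
from fractions import Fraction


def solve_weights(rows, answers):
    """Solve rows * weights = answers exactly with Gauss-Jordan elimination."""
    n = len(rows)
    # Augmented matrix [rows | answers], kept as exact fractions.
    aug = [[Fraction(v) for v in row] + [Fraction(a)] for row, a in zip(rows, answers)]
    for col in range(n):
        # Pick a row with a non-zero entry in this column (fails if none exists).
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        # Clear this column from every other row.
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] for i in range(n)]


# Columns: prior month, 2 months prior, 3 months prior (all in $MM).
history = [(4, 3, 2), (8, 3, 4), (16, 6, 4)]  # rows 2 and 3 are hypothetical
answers = [2, 3, 5]                            # observed sales for each row
weights = solve_weights(history, answers)
```

With this history the recovered weights are 1/8, 1/3 and 1/4, matching the program shown on the slide.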
  • 19. Once we have our machine-defined program, we can use it with new data to make better predictions. What’s a model? Answer: Last month’s sales: $2MM Data: Prior month sales: $4MM 2 months prior: $3MM 3 months prior: $1MM Answer: Last month’s sales: $2MM Data: Prior month sales: $4MM 2 months prior: $3MM 3 months prior: $2MM Program / Model: This month’s sales = 1/8 * Prior month + 1/3 * 2 months prior + 1/4 * 3 months prior New Data: Prior month sales: $8MM 2 months prior: $6MM 3 months prior: $8MM Answer: This month’s sales = $5MM
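Applying the machine-defined program to the new data is plain arithmetic. A small sketch, with names that are assumptions and exact fractions used so the thirds do not introduce rounding:

```python
from fractions import Fraction

# Weights from the slide's program: prior month, 2 months prior, 3 months prior.
WEIGHTS = (Fraction(1, 8), Fraction(1, 3), Fraction(1, 4))


def predict_sales(recent_sales, weights=WEIGHTS):
    """Weighted sum of the three most recent months (in $MM)."""
    return sum(w * s for w, s in zip(weights, recent_sales))


new_data = (8, 6, 8)  # $MM, as given on the slide
prediction = predict_sales(new_data)  # 8/8 + 6/3 + 8/4
```

The result is 1 + 2 + 2 = 5, the $5MM answer on the slide.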
  • 20. The word algorithm gets used a lot, but it isn’t always defined. What is an algorithm? A defined set of steps for solving a problem. Often involves repeating steps. May or may not have an ending condition: – The problem is solved to our satisfaction • For example – stop when the last 4 iterations have been 95% accurate or better – The problem hasn’t been solved but we don’t seem to be getting any closer to solving it • For example – stop if the last 10 iterations have not seen any improvement in accuracy – The process has run for a long time • For example – stop after the program has run for 12 hours, regardless of whether progress is still being made
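The three ending conditions can be checked inside a single loop. This is a rough sketch rather than a standard API: the function name, its parameters, and the accuracy sequences are all hypothetical, with defaults mirroring the slide's examples (4 iterations at 95%, 10 iterations without improvement, 12 hours).

```python
import time


def run_with_stopping_rules(accuracies, target=0.95, window=4,
                            patience=10, max_seconds=12 * 60 * 60):
    """Consume one accuracy score per iteration; return (reason, iterations)."""
    start = time.monotonic()
    history = []
    best = float("-inf")
    since_improvement = 0
    for accuracy in accuracies:
        history.append(accuracy)
        if accuracy > best:
            best, since_improvement = accuracy, 0
        else:
            since_improvement += 1
        # 1. Solved to our satisfaction: the last `window` scores all hit the target.
        if len(history) >= window and all(a >= target for a in history[-window:]):
            return "satisfied", len(history)
        # 2. Not getting any closer: no new best score for `patience` iterations.
        if since_improvement >= patience:
            return "stalled", len(history)
        # 3. The process has run for a long time.
        if time.monotonic() - start >= max_seconds:
            return "timed out", len(history)
    return "ran out of iterations", len(history)


satisfied = run_with_stopping_rules([0.50, 0.90, 0.95, 0.96, 0.97, 0.99])
stalled = run_with_stopping_rules([0.60] + [0.50] * 10)
timed_out = run_with_stopping_rules([0.10], max_seconds=0)
```

In practice the accuracy scores would come from evaluating a model after each training step instead of from a fixed list.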
  • 21. Almost all machine learning algorithms follow the same general pattern. What is an algorithm? Create a hypothesis: Collect the data and randomly create initial decision rules. Evaluate the hypothesis: Design a method for measurably evaluating how good or bad your hypothesis is. Adjust the hypothesis: Update your hypothesis in a way that marginally improves the performance of your decision rules. Repeat until convergence: Continue this process until either you are satisfied with the results or your hypothesis can’t improve anymore with the data available.
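The create / evaluate / adjust / repeat pattern can be sketched with a deliberately tiny problem: finding the slope in y ≈ slope * x. Everything here is an assumption made for illustration (the function name, the data, the nudge-and-halve update rule), and the starting guess is fixed rather than random so the run is reproducible.

```python
def fit_slope(xs, ys, start=0.0, step=1.0, tolerance=1e-6):
    """Fit y ~ slope * x by nudging the slope while the error keeps dropping."""

    def error(slope):
        # Evaluation method: mean squared error of the current hypothesis.
        return sum((slope * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

    slope = start                # Create a hypothesis.
    best_error = error(slope)    # Evaluate the hypothesis.
    while step > tolerance:      # Repeat until convergence.
        improved = False
        for candidate in (slope + step, slope - step):  # Adjust the hypothesis.
            candidate_error = error(candidate)
            if candidate_error < best_error:
                slope, best_error = candidate, candidate_error
                improved = True
                break
        if not improved:
            step /= 2            # No nudge helped, so try smaller adjustments.
    return slope, best_error


# Hypothetical data that follows y = 2x exactly.
slope, final_error = fit_slope([1, 2, 3, 4], [2, 4, 6, 8])
```

Production algorithms use smarter update rules, such as gradient descent, but the loop has the same four parts.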
  • 23. There are two main families of algorithms to choose from. Supervised Learning: We know the “right answers” for some of the scenarios. – We may have history we can look back on – We may be hoping to replicate human decision making Unsupervised Learning: There aren’t necessarily “right answers,” we just want to get a better understanding of our data.
  • 24. Supervised or Unsupervised? Predict our profits next quarter: Supervised. Identify the number written on a check: Supervised. Group our customers into segments: Unsupervised. Predict a user’s rating for a given product: Supervised. Find the most important variables in a dataset: Unsupervised. Identify credit card transactions that are out of the ordinary: Unsupervised.
  • 25. Now let’s walk through two of the most popular machine learning approaches and discuss how the algorithms are applied. How does an algorithm really work for businesses? Classification Clustering
  • 26. Use classification when you want to guess a non-numeric value, like a yes/no answer. We will take a decision tree approach. Create a hypothesis: Everyone will repay their loan. 20 outstanding loans
  • 27. Use classification when you want to guess a non-numeric value, like a yes/no answer. We will take a decision tree approach. Evaluate the hypothesis: Calculate accuracy as the % of predictions that are correct based on your current set of rules. 20 outstanding loans: 12 repaid, 8 defaulted. Accuracy = 12/20 = 60%
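The accuracy calculation for the starting hypothesis ("everyone will repay") can be reproduced directly from the counts on the slide. The variable names are assumptions; `True` stands for a repaid loan.

```python
# 20 outstanding loans from the slide: 12 repaid (True), 8 defaulted (False).
outcomes = [True] * 12 + [False] * 8

# Starting hypothesis: everyone will repay their loan.
predictions = [True] * len(outcomes)

correct = sum(p == o for p, o in zip(predictions, outcomes))
accuracy = correct / len(outcomes)  # 12 correct out of 20
```

Every later adjustment to the tree is judged against this 60% baseline.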
  • 28. Use classification when you want to guess a non-numeric value, like a yes/no answer. We will take a decision tree approach. Adjust the hypothesis: Find the next branch by looking for the data split that would have the biggest impact on the purity of each node. There are several ways to do this mathematically (Gini Index, Information Gain, Chi-Square). Candidate splits of the 20 outstanding loans: Credit Score > 700 / Credit Score < 700; Income > 60k / Income < 60k; DTI > 40% / DTI < 40%. Node accuracies shown: 80%, 73%, 70%, 50%, 71%, 53%. Weighted accuracies shown: 59%, 60%, 75%.
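To make "purity" concrete, the sketch below scores the three candidate splits with the Gini index, one of the measures the slide lists. The eight loans are hypothetical and are not the 20 loans from the slide; the thresholds (700, 60k, 40%) are the slide's. A lower weighted Gini means purer child nodes.

```python
# Hypothetical loans: (credit_score, income, dti, repaid).
LOANS = [
    (720, 80000, 0.20, True),
    (680, 50000, 0.25, True),
    (710, 45000, 0.30, True),
    (650, 70000, 0.35, True),
    (730, 65000, 0.45, False),
    (690, 40000, 0.50, False),
    (640, 55000, 0.55, False),
    (705, 90000, 0.42, True),
]
FEATURES = {"credit_score": 0, "income": 1, "dti": 2}
CANDIDATE_SPLITS = [("credit_score", 700), ("income", 60000), ("dti", 0.40)]


def gini(labels):
    """Gini impurity of a node: 0.0 when every loan has the same outcome."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def weighted_gini(loans, feature, threshold):
    """Impurity of the two child nodes, weighted by how many loans land in each."""
    column = FEATURES[feature]
    above = [loan[3] for loan in loans if loan[column] > threshold]
    below = [loan[3] for loan in loans if loan[column] <= threshold]
    total = len(loans)
    return len(above) / total * gini(above) + len(below) / total * gini(below)


scores = {(f, t): weighted_gini(LOANS, f, t) for f, t in CANDIDATE_SPLITS}
best_split = min(scores, key=scores.get)
```

On this made-up data the DTI split gives the purest children, so it would become the first branch.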
  • 29. Use classification when you want to guess a non-numeric value, like a yes/no answer. We will take a decision tree approach. Repeat until convergence: Repeat the process for each of your new “leaf” nodes. Stop when you reach an acceptable level of accuracy, or when your accuracy begins getting worse with independent data. Tree shown: 20 outstanding loans, split on DTI > 40% / DTI < 40%, then on Credit Score > 700 / Credit Score < 700 and Income > $60k / Income < $60k. Leaf accuracies shown: 100%, 50%, 100%, 100%. 80% weighted
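Repeating the split search on each new leaf is naturally recursive. The sketch below is a simplified illustration: the loans are hypothetical, Gini impurity is assumed as the purity measure, and a fixed `max_depth` stands in for the slide's stopping rules (acceptable accuracy, or accuracy getting worse on independent data), which would need a held-out dataset.

```python
# Hypothetical loans: (credit_score, income, dti, repaid).
LOANS = [
    (720, 80000, 0.20, True), (680, 50000, 0.25, True),
    (710, 45000, 0.30, True), (650, 70000, 0.35, True),
    (730, 65000, 0.45, False), (690, 40000, 0.50, False),
    (640, 55000, 0.55, False), (705, 90000, 0.42, True),
]
FEATURES = {"credit_score": 0, "income": 1, "dti": 2}
CANDIDATE_SPLITS = [("credit_score", 700), ("income", 60000), ("dti", 0.40)]


def gini(labels):
    """Gini impurity of a node: 0.0 when every loan has the same outcome."""
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)


def build_tree(loans, max_depth):
    """Split the node on its purest candidate split, then recurse into each child."""
    labels = [loan[3] for loan in loans]
    majority = 2 * sum(labels) >= len(labels)  # ties count as "repaid"
    if max_depth == 0 or gini(labels) == 0.0:
        return {"predict": majority}
    best = None
    for feature, threshold in CANDIDATE_SPLITS:
        column = FEATURES[feature]
        above = [loan for loan in loans if loan[column] > threshold]
        below = [loan for loan in loans if loan[column] <= threshold]
        if not above or not below:
            continue  # this split does not separate anything
        score = (len(above) * gini([l[3] for l in above])
                 + len(below) * gini([l[3] for l in below])) / len(loans)
        if best is None or score < best[0]:
            best = (score, feature, threshold, above, below)
    if best is None or best[0] >= gini(labels):
        return {"predict": majority}  # no split improves purity
    _, feature, threshold, above, below = best
    return {"feature": feature, "threshold": threshold,
            "above": build_tree(above, max_depth - 1),
            "below": build_tree(below, max_depth - 1)}


def predict(tree, loan):
    """Walk from the root to a leaf and return that leaf's prediction."""
    while "predict" not in tree:
        value = loan[FEATURES[tree["feature"]]]
        tree = tree["above"] if value > tree["threshold"] else tree["below"]
    return tree["predict"]


tree = build_tree(LOANS, max_depth=2)
accuracy = sum(predict(tree, loan) == loan[3] for loan in LOANS) / len(LOANS)
```

This accuracy is measured on the same loans used to build the tree, so it is optimistic; checking against independent data is what reveals when extra branches stop helping.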
  • 30. Classification is used for lots of problems that copy human intuition. Think about how you classify information to identify these images! These use cases are obviously more complex than our simple decision tree, but with more advanced approaches like convolutional neural networks these pictures can definitely be classified by a machine.
  • 31. Use clustering when there’s no “correct” classification, but you still want to assign individuals to groups. This algorithm is called k-means clustering. Imagine Marketing has asked you to split these customers into 3 groups. How would you do it?
  • 32. Use clustering when there’s no “correct” classification, but you still want to assign individuals to groups. This algorithm is called k-means clustering. Create a hypothesis: I can segment my customers by assigning them to 3 groups. We’ll set down 3 random “anchors” and assign each customer to its closest anchor.
  • 33. Use clustering when there’s no “correct” classification, but you still want to assign individuals to groups. This algorithm is called k-means clustering. Evaluate the hypothesis: Find the distance between each customer and the center of each group. Take note of which customers are actually closest to a different center than the one they’re assigned to.
  • 34. Use clustering when there’s no “correct” classification, but you still want to assign individuals to groups. This algorithm is called k-means clustering. Adjust the hypothesis: Reassign each customer to the group corresponding to the center they’re closest to, and move the anchors to the middle of their new group.
  • 35. Use clustering when there’s no “correct” classification, but you still want to assign individuals to groups. This algorithm is called k-means clustering. Repeat until convergence: Keep moving the anchors and re-assigning customers until the anchors stop moving.
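The four k-means steps above fit in a short loop. This is a minimal sketch under stated assumptions: the customers are hypothetical 2-D points, and the starting anchors are fixed instead of random so the result is reproducible. It is not a replacement for a library implementation.

```python
import math


def k_means(points, anchors, max_iterations=100):
    """Assign points to their closest anchor and move anchors until they settle."""
    anchors = list(anchors)
    for _ in range(max_iterations):
        # Assign each customer to the closest anchor.
        assignments = [
            min(range(len(anchors)), key=lambda i: math.dist(point, anchors[i]))
            for point in points
        ]
        # Move each anchor to the middle (mean) of its group.
        new_anchors = []
        for i, anchor in enumerate(anchors):
            members = [p for p, a in zip(points, assignments) if a == i]
            if members:
                new_anchors.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_anchors.append(anchor)  # an empty group keeps its anchor
        # Converged: the anchors stopped moving.
        if new_anchors == anchors:
            break
        anchors = new_anchors
    return assignments, anchors


# Hypothetical customers described by two numeric attributes.
customers = [(1, 1), (1, 2), (2, 1),
             (8, 8), (9, 8), (8, 9),
             (1, 8), (2, 9), (1, 9)]
starting_anchors = [(0, 0), (10, 10), (0, 10)]  # fixed here; random on the slide
groups, centers = k_means(customers, starting_anchors)
```

With random starting anchors the final groups can differ between runs, which is why libraries typically run the algorithm several times and keep the best result.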
  • 36. This is just the tip of the iceberg. There are several algorithms available for various types of problems.
  • 38. Delivering analytics with Machine Learning requires alignment across people, process, technology, and data. Engaging with Machine Learning Image inspired by Microsoft People Process Technology Data Guide Support Enable
  • 39. Data scientists combine broad skills to integrate data, build models, and drive business value.
  • 40. Let’s look at the Microsoft Team Data Science Process to see how data scientists spend their time.
  • 41. The outputs of the process can be used in traditional analytics, analyzed directly, or fed into automated decision-making. Traditional Analytics: Store and access data. Filter and aggregate it. Visualize it. Show it to the business so they can take action. Machine Learning: Filter and aggregate it. Create a model. Generate new data (predictions, etc.). The new data can be stored with the rest of the data for use in analytics. Or it can be visualized directly to gain insights. Or it can automate decisions or actions, allowing better processes to run faster and 24/7.
  • 42. The sources of data for use in data science can be broad. Data Warehouses • Curated & Governed data • Big data • Cloud or on-prem Data Lakes • Unstructured & Semi-structured data • Streaming data • Partially curated Externally Procured Data • May be purchased from 3rd party providers • May be scraped from the web • May require designing research experiments Data scientists typically have the programming and data integration skills to use data from anywhere it can be found.
  • 43. The Microsoft technology stack provides a holistic solution to your Machine Learning needs.
  • 44. We can work with your business to deliver custom predictive and prescriptive analytics across the lifecycle. What can CCG do? Use Case Definition • Develop a backlog of predictive and prescriptive use cases • Refine and prioritize use cases by value • Develop a predictive roadmap Model Development • Aggregate data from across internal and external data sources • Develop and test multiple models to find the best approach to making predictions Model Maintenance • Monitor and maintain statistical models to sustain predictive power • Develop a model telemetry dashboard • Test model design changes to improve predictive power Model Governance & Processes • Assess existing Data Science capabilities • Develop standards and processes to help guide data science output • Build a Data Science Center of Excellence Model Deployment • Customize and deploy pre-existing models from Azure Cognitive Services • Deploy custom model as an API or batch job, or support deployment in existing systems Rapid Insight Prototype Offering Model as a Service Subscription Offering
  • 45. CCG’s Rapid Insight Solution Actionable Backlog – Of use cases ripe for predictive analytics to transform your business Detailed Readouts – The materials we leave behind will include extensive analysis of our methodology, findings, and recommendations Ownership of the Model – Just because the project ends doesn’t mean the model stops working. Unlike other managed service providers, what we produce on your behalf is yours to keep Identify Use Cases – By holding a workshop with process SMEs to identify opportunities to supercharge the business Summarize the Findings – So you can understand the model’s outputs and begin taking action on what we’ve learned Develop a Prototype Model – To generate forecasts, classifications, or exploratory analysis for one of your use cases using an industry-standard tool like Azure Machine Learning Studio or Databricks Week 1 Weeks 2-5 Week 6
  • 46. Fully Operational Production Model – Available at all times, in production – Batch & API integrations Model Supervision – Model is monitored for ongoing usability – Performance dashboard – Guaranteed accuracy SLAs Model Retraining & Support – Scheduled & triggered model re-tuning or re-training – Add new data features over time Model as a Service Solution Set up model as a web service Visualize model performance in a dashboard Maintain and enhance model
  • 48. Microsoft offers pre-built APIs through Cognitive Services that can expedite the deployment of AI capabilities.
  • 49. VISUAL DRAG-AND-DROP Azure Machine Learning Studio
  • 50. What is Azure Databricks? A fast, easy and collaborative Apache® Spark™ based analytics platform optimized for Azure Best of Databricks Best of Microsoft Designed in collaboration with the founders of Apache Spark One-click set up; streamlined workflows Interactive workspace that enables collaboration between data scientists, data engineers, and business analysts. Native integration with Azure services (Power BI, SQL DW, Cosmos DB, Blob Storage) Enterprise-grade Azure security (Active Directory integration, compliance, enterprise-grade SLAs)
  • 51. Azure Databricks key audiences & benefits Unified analytics platform Integrated workspace Easy data exploration Collaborative experience Interactive dashboards Faster insights • Best of Spark & serverless • Databricks managed Spark Improved ETL performance • Zero management clusters, serverless Easy to schedule jobs Automated workflows Enhanced monitoring & troubleshooting • Automated alerts & easy access to logs Zero Management Spark Cluster democratization (serverless) Fast, collaborative analytics platform accelerating time to market No dev-ops required Enterprise grade security • Encryption • End-to-end auditing • Role-based control • Compliance Data scientist Data engineer CDO, VP of analytics Provided by Microsoft and Databricks under NDA