RUNNING INTELLIGENT
APPLICATIONS INSIDE A
DATABASE: DEEP LEARNING
WITH PYTHON STORED
PROCEDURES IN SQL
@ODSC
Dr. Miguel Fierro
@miguelgfierro
https://miguelgfierro.com
AI WHERE THE DATA IS
FORECASTING IN SQLSERVER
CANCER DETECTION IN SQLSERVER
source: http://www.pwc.com/gx/en/issues/analytics/assets/pwc-ai-analysis-sizing-the-prize-report.pdf
$15.7 trillion by 2030 ~ 14% GDP
Productivity gains ($6.6T)
Automation
Increased demand ($9.1T)
Augmentation Higher quality products
AI is the Biggest Business Opportunity
More and more data
source: https://xkcd.com/1838/
90% of the data was created in the last 2 years
Estimated 40x growth by 2020
+info: https://miguelgfierro.com/blog/2017/deep-learning-for-entrepreneurs/
Traditional Python vs SQL Python
Don’t move huge amounts of data
Don’t move critical data
Traditional Python vs SQL Python
Azure Relational Database Platform
Azure Cloud in 38 regions
Azure Analytics, ML, Cognitive Services, Bots, Power BI
Azure Compute & Storage
Database Service Platform
Secure: High Availability, Audit, Backup/Restore
Flexible: On-demand scaling, Resource governance
Intelligence: Advisor, Tuning, Monitoring
SQL Server, MySQL & PostgreSQL
SQL Server 2017 Features
+info: https://www.microsoft.com/en-us/sql-server/sql-server-2017-editions
Management
Platforms Windows, Linux & Docker
Max size 524 PB
Stretch database
Manage hybrid scenarios with on-premises and cloud data
Programmability JSON & Graph support
Security
Dynamic Data Masking Protects sensitive data
Row-level security Access control of rows based on user privileges
Performance
In-memory performance Memory optimized tables
Adaptive query processing Performance improvement of batch queries
Analytics
Advanced Analytics Python & R integration
Parallel Advanced Analytics Python & R integration with GPU processes
SQL Server 2017 Platforms: Linux
+info: https://blogs.technet.microsoft.com/dataplatforminsider/2016/12/16/sql-server-on-linux-how-introduction/
SQLPAL (SQL Platform Abstraction Layer) allows some Windows libraries to run on Linux
SQLPAL interacts with the Linux host through Application Binary Interface (ABI) calls
The performance in Windows and Linux is basically the same
SQL Server 2017 Programmability
Temporal tables
JSON support
Graph data support
PolyBase to interact with Hadoop
Python SQL for Model Development
Python SQL for Model Operationalization
Database Stored Procedures
Functions stored inside the database
Have input and output parameters
Are stored in the database data dictionary
Example:
CREATE PROCEDURE <procedure name>
AS
BEGIN
    <SQL statement>
END;
GO
System Stored Procedures
+info: https://docs.microsoft.com/en-us/sql/relational-databases/system-stored-procedures/system-stored-procedures-transact-sql
Geo-replication SP
Maintenance Plan SP
Policy Management SP
Replication SP
Distributed Query Management SP
Database Engine SP
Execute External Script Stored Procedure
EXECUTE sp_execute_external_script
  @language = N'language'
, @script = N' <code here> '
, @input_data_1 = N' SELECT *'
WITH RESULT SETS ((<var_name> char(20) NOT NULL));
EXECUTE sp_execute_external_script
  @language = N'R'
, @script = N'
mytextvariable <- c("hello", " ", "world");
OutputDataSet <- as.data.frame(mytextvariable);'
, @input_data_1 = N'SELECT 1 as Temp1'
WITH RESULT SETS (([Col1] char(20) NOT NULL));
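Because the external script travels inside a T-SQL string literal, any single quote in the script has to be doubled. The following is a minimal Python sketch of a helper that renders such a call as text. The helper name `build_external_script_call` and its parameters are hypothetical, invented for this illustration; it only builds a string and does not talk to a server.

```python
def build_external_script_call(language, script, input_query, result_columns):
    """Render an EXECUTE sp_execute_external_script statement as text.

    Hypothetical helper for illustration only: it formats a string and
    never connects to SQL Server.
    """
    def n_literal(text):
        # T-SQL escapes a single quote inside a string literal by doubling it.
        return "N'" + text.replace("'", "''") + "'"

    # result_columns is a list of (column name, SQL type) pairs.
    columns = ", ".join(
        "[{}] {}".format(name, sql_type) for name, sql_type in result_columns
    )
    return "\n".join([
        "EXECUTE sp_execute_external_script",
        "  @language = " + n_literal(language),
        ", @script = " + n_literal(script),
        ", @input_data_1 = " + n_literal(input_query),
        "WITH RESULT SETS ((" + columns + "));",
    ])


r_script = 'OutputDataSet <- as.data.frame(c("hello", " ", "world"));'
statement = build_external_script_call(
    "R", r_script, "SELECT 1 AS Temp1", [("Col1", "char(20) NOT NULL")]
)
print(statement)
```

Using double quotes inside the R or Python script, as the slide does, avoids the escaping problem altogether.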
revoscalepy and RevoScaleR
+info revoscalepy: https://docs.microsoft.com/en-us/machine-learning-server/python-reference/revoscalepy/revoscalepy-package
+info RevoScaleR: https://docs.microsoft.com/en-us/machine-learning-server/r-reference/revoscaler/revoscaler
RxLocalSeq RxInSqlServer RxSpark
3 compute contexts for Python and R
revoscalepy functions
Category: Description
Compute context: Getters and setters of the compute context
Data source: Data source objects for ODBC, XDF, SQL table, SQL query
ETL: Data input/output and transformation
Analytics: Linear regression, logistic regression, random forest, boosted decision trees
Jobs: Manage and schedule jobs, monitoring
Serialization: Serialization of models and data objects
Utility: Manage utilities and status functions
AI WHERE THE DATA IS
FORECASTING IN SQLSERVER
CANCER DETECTION IN SQLSERVER
Ski rental prediction with revoscalepy
source: https://microsoft.github.io/sql-ml-tutorials/python/rentalprediction/
EXEC sp_configure 'external scripts enabled', 1;
RECONFIGURE WITH OVERRIDE;
SQL
USE master;
GO
RESTORE DATABASE TutorialDB
FROM DISK = 'C:\MSSQL\Backup\TutorialDB.bak'
WITH
MOVE 'TutorialDB' TO 'C:\MSSQL\DATA\TutorialDB.mdf'
,MOVE 'TutorialDB_log' TO 'C:\MSSQL\DATA\TutorialDB.ldf';
GO
SQL
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from revoscalepy import RxComputeContext, RxInSqlServer, RxSqlServerData
from revoscalepy import rx_import
# Connection string to connect to the SQL Server named instance
conn_str = ('Driver=SQL Server;Server=MYSQLSERVER;'
            'Database=TutorialDB;'
            'Trusted_Connection=True;')
# column_info (the column types of dbo.rental_data) is not shown on this slide
data_source = RxSqlServerData(table="dbo.rental_data",
                              connection_string=conn_str,
                              column_info=column_info)
computeContext = RxInSqlServer(
    connection_string=conn_str,
    num_tasks=1,
    auto_cleanup=False
)
Python
# import data source and convert to pandas dataframe
df = pd.DataFrame(rx_import(input_data = data_source))
print("Data frame:", df)
Python
Rows Processed: 453
Data frame: Day Holiday Month RentalCount Snow WeekDay Year
0 20 1 1 445 2 2 2014
1 13 2 2 40 2 5 2014
2 10 2 3 456 2 1 2013
3 31 2 3 38 2 2 2014
4 24 2 4 23 2 5 2014
5 11 2 2 42 2 4 2015
6 28 2 4 310 2 1 2013
...
[453 rows x 7 columns]
Results
# Store the variable we'll be predicting on.
target = "RentalCount"
# Generate the training set. Set random_state to be able to replicate results.
train = df.sample(frac=0.8, random_state=1)
# Select anything not in the training set and put it in the testing set.
test = df.loc[~df.index.isin(train.index)]
# Initialize the model class.
lin_model = LinearRegression()
# Fit the model to the training data.
# (columns, the list of feature column names, is not defined on this slide.)
lin_model.fit(train[columns], train[target])
Python
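The split above can be sketched without pandas to make the bookkeeping explicit: draw a seeded random sample of 80% of the row ids, then put every remaining id in the test set. This is an illustrative stand-in using only the standard library. The function `split_rows` is hypothetical, and Python's `random` module will not pick the same rows as `df.sample`; only the set sizes are comparable.

```python
import random


def split_rows(row_ids, frac, seed):
    """Return (train_ids, test_ids): a seeded random sample and the remainder."""
    rng = random.Random(seed)                 # fixed seed makes the split repeatable
    n_train = round(frac * len(row_ids))      # number of rows in the training set
    train_ids = rng.sample(row_ids, n_train)  # sample without replacement
    chosen = set(train_ids)
    test_ids = [i for i in row_ids if i not in chosen]
    return train_ids, test_ids


row_ids = list(range(453))  # 453 rows, as reported when the data was imported
train_ids, test_ids = split_rows(row_ids, frac=0.8, seed=1)
print(len(train_ids), len(test_ids))  # prints "362 91"
```

With 453 rows, an 80% sample gives 362 training rows and leaves 91 test rows, which agrees with the 91 values printed in the predictions on the results slide.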
# Generate our predictions for the test set.
lin_predictions = lin_model.predict(test[columns])
print("Predictions:", lin_predictions)
# Compute error between our test predictions and the actual values.
lin_mse = mean_squared_error(test[target], lin_predictions)
print("Computed error:", lin_mse)
Python
Predictions: [ 40. 38. 240. 39. 514. 48. 297. 25. 507. 24.
30. 54. 40. 26. 30. 34. 42. 390. 336. 37. 22. 35.
55. 350. 252. 370. 499. 48. 37. 494. 46. 25. 312. 390.
35. 35. 421. 39. 176. 21. 33. 452. 34. 28. 37. 260.
49. 577. 312. 24. 24. 390. 34. 64. 26. 32. 33. 358.
348. 25. 35. 48. 39. 44. 58. 24. 350. 651. 38. 468.
26. 42. 310. 709. 155. 26. 648. 617. 26. 846. 729. 44.
432. 25. 39. 28. 325. 46. 36. 50. 63.]
Computed error: 3.59831533436e-26
Results
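For reference, the computed error is the mean of the squared differences between actual and predicted values. The sketch below reimplements that calculation in plain Python with made-up numbers. The function name `mse` and the sample values are illustrative and not taken from the tutorial.

```python
def mse(y_true, y_pred):
    """Average of the squared differences between actual and predicted values."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


actual = [40.0, 38.0, 240.0, 39.0]  # made-up rental counts
close = [42.0, 35.0, 250.0, 39.0]   # predictions with ordinary-sized errors
exact = list(actual)                # predictions that reproduce the target

print(mse(actual, close))  # (4 + 9 + 100 + 0) / 4 = 28.25
print(mse(actual, exact))  # 0.0
```

An error on the order of 1e-26, as in the results above, means the predictions match the actual values to floating-point precision. That is unusual for real data, so it may be worth checking whether the target column ended up among the feature `columns`; the slides do not show how `columns` is built, so this is only a possibility.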
Ski rental prediction with SQL stored procedures
USE TutorialDB;
DROP TABLE IF EXISTS rental_py_models;
GO
CREATE TABLE rental_py_models (
model_name VARCHAR(30) NOT NULL DEFAULT('default model') PRIMARY KEY,
model VARBINARY(MAX) NOT NULL);
GO
SQL
DROP TABLE IF EXISTS py_rental_predictions;
GO
CREATE TABLE py_rental_predictions(
[RentalCount_Predicted] [int] NULL,
[RentalCount_Actual] [int] NULL,
[Month] [int] NULL,
[Day] [int] NULL,
[WeekDay] [int] NULL,
[Snow] [int] NULL,
[Holiday] [int] NULL,
[Year] [int] NULL);
GO
SQL
-- Train model
CREATE PROCEDURE generate_rental_py_model (@trained_model varbinary(max) OUTPUT)
AS
BEGIN
EXECUTE sp_execute_external_script
  @language = N'Python'
, @script = N'
from sklearn.linear_model import LinearRegression
import pickle
df = rental_train_data
# Assumption: the slide does not define target or columns. The target name
# comes from the earlier Python slide; using every other column as a
# feature is a guess.
target = "RentalCount"
columns = [c for c in df.columns if c != target]
lin_model = LinearRegression()
lin_model.fit(df[columns], df[target])
# Serialize the fitted model so it can be returned as varbinary(max)
trained_model = pickle.dumps(lin_model)'
, @input_data_1 = N'select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from dbo.rental_data where Year < 2015'
, @input_data_1_name = N'rental_train_data'
, @params = N'@trained_model varbinary(max) OUTPUT'
, @trained_model = @trained_model OUTPUT;
END;
GO
SQL
--Execute model training
DECLARE @model VARBINARY(MAX);
EXEC generate_rental_py_model @model OUTPUT;
INSERT INTO rental_py_models (model_name, model) VALUES('linear_model',
@model);
SQL
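The pattern here is to serialize the trained model to bytes, store the bytes in a binary column, and deserialize them again at scoring time. The round trip can be tried locally with the standard library alone. In this sketch an in-memory SQLite database stands in for SQL Server, a BLOB column stands in for VARBINARY(MAX), and the "model" is a hypothetical dictionary of two coefficients rather than a scikit-learn object.

```python
import pickle
import sqlite3


def predict(params, xs):
    """Toy linear model: y = slope * x + intercept."""
    return [params["slope"] * x + params["intercept"] for x in xs]


# In-memory SQLite database standing in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE rental_py_models ("
    "model_name TEXT PRIMARY KEY, model BLOB NOT NULL)"
)

# Training side: serialize the model to bytes and insert it under a name.
params = {"slope": 2.0, "intercept": 1.0}  # made-up coefficients
conn.execute(
    "INSERT INTO rental_py_models (model_name, model) VALUES (?, ?)",
    ("linear_model", pickle.dumps(params)),
)

# Scoring side: look the model up by name and rebuild the Python object.
(stored,) = conn.execute(
    "SELECT model FROM rental_py_models WHERE model_name = ?",
    ("linear_model",),
).fetchone()
restored = pickle.loads(stored)
print(predict(restored, [0.0, 1.0, 2.0]))  # [1.0, 3.0, 5.0]
```

Since `pickle.loads` can execute arbitrary code, the table holding the models should only be writable by trusted users.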
DROP PROCEDURE IF EXISTS py_predict_rentalcount;
GO
CREATE PROCEDURE py_predict_rentalcount (@model varchar(100))
AS
BEGIN
DECLARE @py_model varbinary(max) = (select model from rental_py_models where model_name = @model);
EXEC sp_execute_external_script
  @language = N'Python'
, @script = N'
import pickle
import pandas as pd
rental_model = pickle.loads(py_model)
df = rental_score_data
# [… python code here …]
lin_predictions = rental_model.predict(df[columns])
predictions_df = pd.DataFrame(lin_predictions)
OutputDataSet = pd.concat([predictions_df, df["RentalCount"],
    df["Month"], df["Day"], df["WeekDay"], df["Snow"], df["Holiday"],
    df["Year"]], axis=1)
'
, @input_data_1 = N'Select "RentalCount", "Year", "Month", "Day", "WeekDay", "Snow", "Holiday" from rental_data where Year = 2015'
, @input_data_1_name = N'rental_score_data'
, @params = N'@py_model varbinary(max)'
, @py_model = @py_model
WITH RESULT SETS (("RentalCount_Predicted" float, "RentalCount" float,
"Month" float, "Day" float, "WeekDay" float, "Snow" float, "Holiday" float,
"Year" float));
END;
GO
SQL
-- Execute the prediction and store the results in the predictions table
INSERT INTO py_rental_predictions
EXEC py_predict_rentalcount 'linear_model';
SELECT * FROM py_rental_predictions;
SQL
AI WHERE THE DATA IS
FORECASTING IN SQLSERVER
CANCER DETECTION IN SQLSERVER
Two General Kinds of Neural Networks
Convolutional Neural Networks (CNN)
Recurrent Neural Networks (RNN)
low-level features, medium-level features, high-level features
Interesting paper about representations: https://arxiv.org/abs/1411.1792
Multiple Levels of Representation
Lung Cancer Competition
$1 million in prizes!
Determine whether a patient has cancer or not
Data: CT scans of the lung
1595 patients with a diagnosis
200-500 scans per patient
Images of 512x512px
Transfer Learning
Weight transfer: ImageNet dataset to lung cancer dataset
Standard Training
Layers: input, 5 hidden, output
Forward and backward propagation through all layers
Transference option 1: freeze n-1 layers
Frozen layers: every layer except the last
Transference option 2: freeze initial layers
Frozen layers: the initial layers; forward and backward propagation through the rest
Transference option 3: fine tuning
Forward and backward propagation through all layers
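The three transference options differ only in which layers the training step is allowed to update. A toy Python sketch of that idea follows, with one scalar weight per layer and made-up gradients. The names `trainable_flags` and `sgd_step` are hypothetical and no deep learning framework is involved; in a real framework the same effect is achieved by marking layers as frozen or non-trainable.

```python
def trainable_flags(n_layers, option, n_frozen=0):
    """Return one bool per layer: True if training updates that layer."""
    if option == 1:  # freeze n-1 layers: only the last layer is trained
        return [False] * (n_layers - 1) + [True]
    if option == 2:  # freeze the first n_frozen layers, train the rest
        return [False] * n_frozen + [True] * (n_layers - n_frozen)
    if option == 3:  # fine tuning: every layer is trained
        return [True] * n_layers
    raise ValueError("option must be 1, 2 or 3")


def sgd_step(weights, grads, flags, lr=0.1):
    """Apply one gradient step, leaving frozen layers untouched."""
    return [w - lr * g if train else w
            for w, g, train in zip(weights, grads, flags)]


weights = [1.0] * 6  # one toy scalar weight per layer
grads = [0.5] * 6    # made-up gradients
for option in (1, 2, 3):
    flags = trainable_flags(6, option, n_frozen=3)
    print(option, sgd_step(weights, grads, flags))
```

Freezing more layers means fewer parameters to update, which is typically cheaper and less prone to overfitting on a small dataset, at the cost of adapting less to the new task.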
Pretrained ResNet 152
Input image: 3x224x224
ImageNet ResNet N layers (penultimate layer, last layer)
Output: cat
source: https://github.com/Azure/sql_python_deep_learning
Solution: CNN Featurizer
source: https://github.com/Azure/sql_python_deep_learning
k batch of images (3x224x224) = 1 patient
ResNet N-1 layers, output taken from the penultimate layer
CNTK (53 min)
Solution: Boosted Tree Classifier
k batch of images (3x224x224) = 1 patient
ResNet N-1 layers, penultimate layer features: CNTK (53 min)
Features to boosted tree: LightGBM (2 min)
Output: no cancer
source: https://github.com/Azure/sql_python_deep_learning
(Extra slide): 2nd place in the competition
source: https://github.com/juliandewit/kaggle_ndsb2017
Deep Learning in SQL Server: Training
sp.dbo.GenerateFeatures
CNTK with GPUs
sp.dbo.TrainLungCancerModel
LightGBM
Populate tables
Deep Learning in SQL Server: Operationalization
sp.dbo.PredictLungCancer Web App
Demo
Solution in SQL Server 2017
Web App
THANK YOU
@ODSC
Dr. Miguel Fierro
@miguelgfierro
https://miguelgfierro.com
