GIORGIO ALFREDO SPEDICATO, PHD FCAS FSA CSPA
UNISACT 2018
Machine Learning and Actuarial Science
Supervised learning
Intro
▪There is an explicit target that the algorithm aims to predict
▪The dependent variable could be, for example:
▪ Continuous: e.g. loss cost, income, …
▪ Integer: e.g. # of claims, # of purchased covers, # of successes in n trials.
▪ Binary: fraud, retention and conversion probability, …
▪ Multinomial: a-priori categories.
▪Classical multivariate regression is the most common algorithm
Linear models, GLM & GAM
GLMs were the first predictive models widely used in the insurance industry and are currently the gold standard for personal lines pricing:
$g(\mu_i) = x_i^T \beta = f(x_i^T) + \text{offset}$
Possible link functions $g$:
• Logit, $\log(p/(1-p))$, to model probabilities;
• Log, $\log(\mu)$, for Poisson or Gamma regression (# of claims and severity modeling);
• Identity, $\mu$, for linear Gaussian regression.
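As a minimal sketch of how the link function and the offset interact, the snippet below computes the mean of a log-link model by hand. The coefficients, the design vector and the exposure are hypothetical values chosen for illustration; they do not come from the slides.

```python
import math

def logit(p):
    """Logit link: maps a probability to the linear-predictor scale."""
    return math.log(p / (1 - p))

def inv_logit(eta):
    """Inverse of the logit link."""
    return 1 / (1 + math.exp(-eta))

def glm_mean(x, beta, offset=0.0, inverse_link=math.exp):
    """Return mu = g^{-1}(x' beta + offset); the default is the log link."""
    eta = sum(xj * bj for xj, bj in zip(x, beta)) + offset
    return inverse_link(eta)

# Hypothetical claim-frequency model: intercept plus a young-driver flag.
beta = [-2.0, 0.5]
x = [1.0, 1.0]      # intercept term, young driver = 1
exposure = 0.5      # half a policy-year (assumed)

# With a log link, log(exposure) enters as an offset, so the expected
# number of claims scales proportionally with exposure.
expected_claims = glm_mean(x, beta, offset=math.log(exposure))
```

Using `inverse_link=inv_logit` instead would turn the same linear predictor into a probability, as in the logit bullet above.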
Linear models, GLM & GAM
▪$f(x_i^T)$ can be any linear component, including:
▪ Additive terms;
▪ Binned continuous variables;
▪ Splines / polynomials;
▪GAMs use smooth functions as the additive terms.
▪All actuarial pricing software (like Emblem, SAS) implements GLMs and their extensions (splines, GAM, …)
Linear models, GLM & GAM
▪Pros:
▪ GLMs are market-wide standards (the baseline against which competing models are compared);
▪ Easy to fit;
▪ Interpretability.
▪Cons:
▪ Strong non-linearities are difficult to handle;
▪ Interactions need to be defined explicitly;
▪ Collinearity needs to be addressed as the # of predictors increases.
Elasticnet
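▪GLMs are usually fit by maximizing the log-likelihood: $\max \text{LogLik}$
▪The ElasticNet approach extends GLMs by optimizing $\max \left( \text{LogLik} - \text{Penalty} \right)$, where $\text{Penalty} = \lambda \left( \alpha \|\beta\|_1 + (1 - \alpha) \|\beta\|_2^2 \right)$, being:
▪ $\|\beta\|_1 = \sum_{k=1}^{p} |\beta_k|$ the Lasso component (it helps to drop non-significant predictors);
▪ $\|\beta\|_2^2 = \sum_{k=1}^{p} \beta_k^2$ the Ridge component (it shrinks coefficients, which stabilizes estimates when predictors are correlated);
▪ $\alpha$ the relative weight of Lasso versus Ridge, and $\lambda$ the overall penalty weight.
▪Elasticnet combines the interpretability of GLMs with higher robustness.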
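The penalty above can be evaluated directly. The following is a small sketch, not the glmnet implementation: the function names are illustrative, and real packages may scale the Ridge term differently and usually leave the intercept unpenalized.

```python
def elastic_net_penalty(beta, lam, alpha):
    """lambda * (alpha * sum|beta_k| + (1 - alpha) * sum beta_k^2)."""
    lasso = sum(abs(b) for b in beta)   # L1 component
    ridge = sum(b * b for b in beta)    # squared L2 component
    return lam * (alpha * lasso + (1 - alpha) * ridge)

def penalized_objective(loglik, beta, lam, alpha):
    """Quantity to maximize: log-likelihood minus the penalty."""
    return loglik - elastic_net_penalty(beta, lam, alpha)

beta = [1.0, -2.0]   # hypothetical coefficients, intercept excluded
pure_lasso = elastic_net_penalty(beta, lam=0.5, alpha=1.0)
pure_ridge = elastic_net_penalty(beta, lam=0.5, alpha=0.0)
mixed = elastic_net_penalty(beta, lam=0.5, alpha=0.5)
```

Setting `alpha=1` recovers the pure Lasso penalty and `alpha=0` the pure Ridge penalty; `lam=0` gives back the unpenalized GLM objective.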
Elasticnet
Elasticnet is implemented in R (glmnet package), in SAS (PROC GLMSELECT), and in H2O.
Elasticnet output has the same form as GLM output.
Classification and Regression Trees
▪A CART creates hierarchical partitions of the input data set as a function of the predictors.
▪At every split, the predictor and the cut-off are chosen to maximize the reduction in the prediction loss of the outcome.
▪CARTs can handle continuous and categorical predictors, optimizing F-test or $\chi^2$ statistics when defining the splits.
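To make the split search concrete, here is a minimal sketch for a single continuous predictor. It scores candidate cut-offs by the sum of squared errors rather than by the F-test or $\chi^2$ statistics mentioned above, which is a simplifying assumption; the ages and claim costs are made-up data.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def best_split(x, y):
    """Return (cut, loss) for the cut-off on x that minimizes total SSE.

    cut is None when no split improves on the unsplit node.
    """
    pairs = sorted(zip(x, y))
    best_cut, best_loss = None, sse(y)
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical predictor values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [yy for _, yy in pairs[:i]]
        right = [yy for _, yy in pairs[i:]]
        loss = sse(left) + sse(right)
        if loss < best_loss:
            best_cut, best_loss = cut, loss
    return best_cut, best_loss

# Hypothetical data: driver age versus average claim cost.
ages = [18, 22, 25, 40, 45, 60]
claim_cost = [900.0, 850.0, 880.0, 300.0, 320.0, 310.0]
cut, loss = best_split(ages, claim_cost)
```

A full tree applies the same search recursively to the left and right partitions, and over all predictors, until a stopping rule such as maximum depth is reached.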
Classification and Regression Trees
▪ Software:
▪ SAS STAT: PROC HPSPLIT
▪ R packages: rpart, C50, party, partykit
▪ Python libraries: sklearn (DecisionTreeClassifier)
▪ SPSS: CHAID
Classification and Regression Trees
▪Pros:
▪Easy to explain to a non-technical audience
▪Make it easy to understand predictor importance and interactions
▪Give insight into variable importance (useful as a first analysis)
▪Cons:
▪Sensitive to outliers («pruning»)
▪Predictions are constant within intervals, so they can perform worse than other approaches
Random Forest & Bagging
▪ An extension of classification and regression trees following the «bagging» approach. Can handle continuous and categorical outcomes
▪ «bagging» means:
▪ Creating many independent samples;
▪ Fitting «simple» models on each of them (a «forest» overall)
▪ The prediction for an observation is the average of the individual trees' predictions
▪ The most important parameters are:
▪ «mtries»: number of predictors (out of p) sampled as split candidates; heuristically $\sqrt{p}$ for classification problems, $p/3$ for regression ones
▪ Max depth, min rows: maximum depth of a single tree and minimum number of observations in a leaf
▪ Ntrees: number of trees (independent samples) to be fit (usually >= 50)
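The bagging steps listed above can be sketched in plain Python. This is an illustration only: it uses one-split trees («stumps») on a single predictor, so there is no «mtries» column sampling, and the data and function names are hypothetical.

```python
import random

def fit_stump(x, y):
    """One-split regression tree: returns (cut, left_mean, right_mean)."""
    overall = sum(y) / len(y)
    best = (None, overall, overall)
    best_loss = sum((v - overall) ** 2 for v in y)
    pairs = sorted(zip(x, y))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        loss = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if loss < best_loss:
            best_loss = loss
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    return best

def predict_stump(stump, xi):
    cut, left_mean, right_mean = stump
    if cut is None:
        return left_mean
    return left_mean if xi <= cut else right_mean

def fit_bagged_stumps(x, y, ntrees=50, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(x)
    forest = []
    for _ in range(ntrees):
        idx = [rng.randrange(n) for _ in range(n)]
        forest.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return forest

def predict_forest(forest, xi):
    """The forest prediction is the average of the individual trees."""
    return sum(predict_stump(s, xi) for s in forest) / len(forest)

# Hypothetical data: driver age versus average claim cost.
ages = [18, 22, 25, 40, 45, 60]
claim_cost = [900.0, 850.0, 880.0, 300.0, 320.0, 310.0]
forest = fit_bagged_stumps(ages, claim_cost, ntrees=50, seed=0)
young, old = predict_forest(forest, 20), predict_forest(forest, 50)
```

Because each tree sees a different bootstrap sample, the averaged prediction is smoother than that of any single stump.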
Random Forest – Grid search
▪As for most ML models, no closed form is available to define the optimal parameters
▪A grid search is needed: various parameter configurations are tried to find the one that maximizes predictive performance.
▪Possible approaches are:
▪ Cartesian grid: all possible combinations of parameters (suffers from the «curse of dimensionality»);
▪ Random grid search: a sample of the Cartesian grid;
▪ Bayesian optimization / genetic algorithms: an initial grid search is performed, then parameters are changed in a direction that tends to increase predictive performance.
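The first two approaches can be sketched as follows. The parameter values and the scoring function are placeholders: in practice the score would be a cross-validated performance measure of the fitted model, not the toy formula assumed here.

```python
import itertools
import random

# Hypothetical hyperparameter grid for a random forest.
grid = {"ntrees": [50, 100, 200], "max_depth": [3, 5, 10], "mtries": [2, 4]}

def cartesian_grid(grid):
    """Every combination of parameter values (size grows multiplicatively)."""
    keys = list(grid)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(grid[k] for k in keys))]

def random_grid(grid, n_configs, seed=0):
    """A random sample, without replacement, of the Cartesian grid."""
    rng = random.Random(seed)
    return rng.sample(cartesian_grid(grid), n_configs)

def search(configs, score):
    """Return the configuration with the highest score."""
    return max(configs, key=score)

def toy_score(cfg):
    """Placeholder for cross-validated performance (an assumption)."""
    return (-abs(cfg["max_depth"] - 5)
            - abs(cfg["ntrees"] - 100) / 100
            - abs(cfg["mtries"] - 4))

all_configs = cartesian_grid(grid)
best = search(all_configs, toy_score)
sampled = random_grid(grid, n_configs=6)
best_sampled = search(sampled, toy_score)
```

The Cartesian grid here has 3 x 3 x 2 = 18 configurations; the random search evaluates only 6 of them, trading a possibly worse optimum for a lower computing cost.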
Random Forest – Grid search
▪Pros:
▪ Generally, it offers good fits
▪ Little sensitivity to outliers
▪ Easily scalable
▪Cons:
▪ Opaque (only variable importance is available)
▪ Difficult to use for fitting rates (offset)
Boosted Models
▪Strong predictive performance, frequently used in Kaggle competitions.
▪They extend CART, with enhancements to avoid overfitting and to increase predictive performance;
▪They can be used for:
▪ Classification problems;
▪ Regression: «base margin» allows the use of initial estimates (e.g. boosting existing models) or the handling of offsets;
▪ Ranking and survival modeling
Boosted Models
▪The boosting algorithm is a recursive error-correction model: $F_t(x) = F_{t-1}(x) + \eta \cdot h_t(x)$, where:
▪ $h_t(x)$ is a simple model that predicts the prediction errors of step $t-1$;
▪ $\eta$ is a shrinkage factor that increases model robustness.
▪The gradient descent algorithm is used to minimize a chosen loss function.
▪The number of iterations is determined by:
▪ A fixed number;
▪ A moving average approach (early stopping when the moving average of the validation error stops improving).
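The recursion can be illustrated with a minimal sketch for squared-error loss, where the negative gradient is simply the residual. The weak learner is a one-split tree on a single predictor and the number of rounds is fixed; both are simplifying assumptions, and the data are invented.

```python
def fit_stump(x, r):
    """One-split regression tree on residuals r: (cut, left_mean, right_mean)."""
    overall = sum(r) / len(r)
    best = (None, overall, overall)
    best_loss = sum((v - overall) ** 2 for v in r)
    pairs = sorted(zip(x, r))
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        loss = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
        if loss < best_loss:
            best_loss = loss
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, lm, rm)
    return best

def predict_stump(stump, xi):
    cut, left_mean, right_mean = stump
    if cut is None:
        return left_mean
    return left_mean if xi <= cut else right_mean

def boost(x, y, eta=0.1, n_rounds=100):
    """F_t(x) = F_{t-1}(x) + eta * h_t(x), with h_t fit to current residuals."""
    f0 = sum(y) / len(y)               # F_0: constant model
    pred = [f0] * len(y)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + eta * predict_stump(stump, xi) for pi, xi in zip(pred, x)]
    return f0, stumps

def predict_boosted(f0, stumps, xi, eta=0.1):
    return f0 + eta * sum(predict_stump(s, xi) for s in stumps)

def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

x = [1, 2, 3, 4, 5, 6]
y = [1.0, 1.2, 0.9, 3.0, 3.2, 2.9]
f0, stumps = boost(x, y, eta=0.1, n_rounds=100)
mse_const = mse([f0] * len(y), y)
mse_boost = mse([predict_boosted(f0, stumps, xi) for xi in x], y)
```

A smaller `eta` needs more rounds to reach the same training error, which is the robustness trade-off the shrinkage factor controls.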
Boosted Models
Typical boosted model parameters are:
• $\eta$ (shrinkage) and n (number of iterations / sub-trees);
• Fraction of observations sampled at each iteration;
• Fraction of predictors available at each step;
• Max depth of each tree;
• Other regularization parameters
A grid search approach is needed to tune the hyperparameter configuration.
Boosted Models
▪GBM was the first widely used gradient boosting algorithm;
▪XGBoost is currently the gold standard for boosted trees. It extends GBM thanks to:
▪ Parallelization;
▪ Regularization;
▪ Checkpointing: a new model starts from the results of a previous one.
▪LightGBM is a promising, very recent evolution of XGBoost from Microsoft Research.
Boosted Models
▪R (libraries gbm, xgboost, lightgbm) and Python (libraries scikit-learn, xgboost) offer the core packages to fit boosted models.
▪Boosted models are also available in SAS (Enterprise Miner) and Matlab (Statistics and Machine Learning Toolbox)
▪The H2O suite implements both GBM and XGBoost, allowing easy parallelization (also across computing clusters) and a GPU extension.
Stacked Ensemble
▪Different algorithms are combined by a «superlearner» to obtain an even more robust prediction;
▪The algorithm is:
▪ Estimate L separate algorithms on the N observations using k-fold cross-validation;
▪ Combine the L predictions with a «superlearner» fit that finds the L best weights;
▪ The final prediction is the weighted average of the L models' individual predictions
▪Pros & Cons:
▪ Pros: generally increases predictive performance by combining the strengths of the L models;
▪ Cons: higher computing time, lower interpretability
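The stacking steps can be sketched for L = 2 base models. Everything specific here is an assumption made for illustration: the two base learners (a constant model and a simple regression line), the convex weights searched on a grid, and the toy data. Real superlearners typically fit the weights with a constrained regression.

```python
def fit_mean(x, y):
    """Base model 1: predicts the training mean everywhere."""
    m = sum(y) / len(y)
    return lambda xi: m

def fit_line(x, y):
    """Base model 2: simple least-squares regression line."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx if sxx else 0.0
    return lambda xi: my + slope * (xi - mx)

def out_of_fold(fit, x, y, k):
    """k-fold cross-validation: each point is predicted by a model not trained on it."""
    preds = [None] * len(x)
    for fold in range(k):
        train = [i for i in range(len(x)) if i % k != fold]
        held_out = [i for i in range(len(x)) if i % k == fold]
        model = fit([x[i] for i in train], [y[i] for i in train])
        for i in held_out:
            preds[i] = model(x[i])
    return preds

def mse(pred, y):
    return sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)

def blend(w, p1, p2):
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]

def best_weight(p1, p2, y, steps=100):
    """Superlearner step: weight on model 1 minimizing out-of-fold MSE."""
    candidates = [j / steps for j in range(steps + 1)]
    return min(candidates, key=lambda w: mse(blend(w, p1, p2), y))

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 6.2, 6.8, 8.1, 9.0, 9.9]
p_mean = out_of_fold(fit_mean, x, y, k=5)
p_line = out_of_fold(fit_line, x, y, k=5)
w = best_weight(p_mean, p_line, y)
stacked_mse = mse(blend(w, p_mean, p_line), y)
```

Fitting the weights on out-of-fold predictions, rather than on in-sample ones, keeps the superlearner from simply rewarding the most overfitted base model.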
Stacked Ensemble
▪ Available in H2O (stackedEnsemble)
▪ Easy to generalize
Deep Learning
▪Multi-layer neural networks
▪Very effective for:
▪Image recognition
▪Natural Language Processing
▪Multivariate time series analysis
▪Unsupervised learning
▪Requirements:
▪Huge amounts of data
▪Computing power (GPU computing is often necessary)
Deep Learning
▪A deep neural network consists of several layers of neurons, each:
▪ Receiving inputs from the previous layer
▪ Properly weighting the inputs
▪ Returning an output $\varphi_i\left(\sum_{j=1}^{q} w_j x_j + b_i\right)$
▪The increase in computing power and the introduction of methodologies (e.g. Dropout) that reduce overfitting contributed to the renewed attention to neural networks, which are currently the state of the art of ML and Artificial Intelligence.
▪The most relevant drawbacks are:
▪ Lack of interpretability;
▪ Difficulty in defining the best architecture configuration
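The neuron output formula can be written out as a short forward pass. This is a sketch with hand-picked weights and biases (hypothetical, not trained), using ReLU for the hidden layer and a sigmoid for the output as assumed activation choices.

```python
import math

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def neuron(inputs, weights, bias, activation):
    """phi(sum_j w_j * x_j + b): weighted inputs, plus bias, then activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

def dense_layer(inputs, weight_rows, biases, activation):
    """One layer: every neuron receives all outputs of the previous layer."""
    return [neuron(inputs, w, b, activation)
            for w, b in zip(weight_rows, biases)]

# Hypothetical network: 2 inputs -> 2 hidden neurons -> 1 output.
inputs = [1.0, 2.0]
hidden = dense_layer(inputs, [[0.5, 0.25], [-1.0, 0.5]], [0.0, 0.5], relu)
output = dense_layer(hidden, [[1.0, -2.0]], [0.0], sigmoid)
```

Training consists of adjusting the weights and biases to minimize a loss function; only the prediction step is shown here.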
Deep Learning: types of neural networks
▪Multi-layer perceptron (MLP): it consists of an input layer, an output layer, and one or more hidden layers. Used for regression and classification;
▪Convolutional neural networks (CNN): convolution layers extract spatial features. Useful for image recognition and natural language processing;
▪Recurrent neural networks (RNN): they capture memory effects, which are useful in sequence analysis (translation, NLP, time series analysis).
word2vec
Natural Language Processing applications.
A word's occurrence depends on the frequency of its neighbors.
A weight vector is associated with each word (dimension $n$ between 150 and 300).
Each word belongs to an $\mathbb{R}^n$ space, which means that:
1. Semantic similarity between words can be computed
2. Word algebra can be performed: “king” - “man” + “woman” gives a word vector close to that of “queen”.
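Both properties can be demonstrated on toy vectors. The 4-dimensional embeddings below are invented for illustration (real word2vec vectors are learned from text and have far more dimensions); similarity is measured with the cosine, a common choice.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical embeddings; dimensions loosely read as
# (royalty, masculine, feminine, fruit).
vectors = {
    "king":  [0.9, 0.9, 0.1, 0.0],
    "queen": [0.9, 0.1, 0.9, 0.0],
    "man":   [0.1, 0.9, 0.1, 0.0],
    "woman": [0.1, 0.1, 0.9, 0.0],
    "apple": [0.0, 0.0, 0.0, 1.0],
}

def analogy(a, b, c, vectors):
    """Word closest to vec(a) - vec(b) + vec(c), excluding the three inputs."""
    target = [x - y + z
              for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = [w for w in vectors if w not in (a, b, c)]
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

closest = analogy("king", "man", "woman", vectors)
sim_king_queen = cosine(vectors["king"], vectors["queen"])
sim_king_apple = cosine(vectors["king"], vectors["apple"])
```

With these toy vectors, “king” - “man” + “woman” lands on the “queen” vector, and “king” is more similar to “queen” than to “apple”.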
