Machine Learning for Stock Selection
Robert J. Yan, Charles X. Ling
University of Western Ontario, Canada
{jyan, cling}@csd.uwo.ca

Outline: Introduction; The stock selection task; The Prototype Ranking method; Experimental results; Conclusions

Introduction
Objective: use machine learning to select a small number of “good” stocks to form a portfolio.
Research questions: how to learn from a noisy dataset, and how to learn from an imbalanced dataset.
Our solution: Prototype Ranking, a specially designed machine learning method.

Stock Selection Task
Given information available before week t, predict the performance of each stock in week t. A ranking function is learned from the training set and used to rank the test data; the n highest-ranked stocks are bought and the n lowest-ranked stocks are sold short.
Each example has the following attributes:
Stock ID
Predictor 1: return of week t-1
Predictor 2: return of week t-2
Predictor 3: volume ratio of week t-2 to week t-1 (t-2/t-1)
Goal: return of week t

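A minimal Python sketch of the task setup, assuming weekly return and volume series have already been aggregated from the daily data (the slides do not say how). `WeeklyExample`, `make_example`, and `select_long_short` are hypothetical names; any model that produces one score per stock can feed the selection step.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WeeklyExample:
    """One example as laid out on the slide; the field names are hypothetical."""
    stock_id: str
    ret_t1: float                  # Predictor 1: return of week t-1
    ret_t2: float                  # Predictor 2: return of week t-2
    vol_ratio: float               # Predictor 3: volume of week t-2 divided by volume of week t-1
    ret_t: Optional[float] = None  # Goal: return of week t (unknown when predicting)


def make_example(stock_id, returns, volumes, t):
    """Build the example for week t from weekly return/volume lists indexed by week (t >= 2)."""
    return WeeklyExample(
        stock_id=stock_id,
        ret_t1=returns[t - 1],
        ret_t2=returns[t - 2],
        vol_ratio=volumes[t - 2] / volumes[t - 1],
        ret_t=returns[t] if t < len(returns) else None,
    )


def select_long_short(scores, n):
    """Given {stock_id: predicted score}, return (n highest to buy, n lowest to short-sell)."""
    if n < 1 or 2 * n > len(scores):
        raise ValueError("need 1 <= n and 2n <= number of stocks")
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:n], ranked[len(ranked) - n:]
```

For example, with scores `{"A": 0.8, "B": -0.2, "C": 0.1, "D": -0.9, "E": 0.5}` and `n = 2`, the sketch buys `A` and `E` and short-sells `B` and `D`.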
Prototype Ranking
Prototype Ranking (PR) is a machine learning method designed specifically for noisy and imbalanced stock data. The PR system works in two steps:
Step 1. Find good “prototypes” in the training data.
Step 2. Use k-NN on the prototypes to rank the test data.

Step 1: Finding Prototypes
Prototypes are representative points. The goal is to discover the underlying density (clusters) of the training samples by distributing prototypes across the sample space; this also reduces the data size.
[Figure: samples, prototypes, and a prototype neighborhood]

Analysis: competitive learning for the stock selection task
Pros: it is noise-tolerant; it supports on-line updates, which is practical for huge datasets; it smoothly models the distribution of the training samples.
Cons: searching for the nearest prototype is computationally expensive; it performs poorly on prediction tasks, because it was designed for tasks such as clustering and feature mapping, whereas stock selection is a prediction task; it performs poorly when modeling imbalanced datasets.

Finding Prototypes Using Competitive Learning
General competitive learning:
Step 1: Randomly initialize a set of prototypes.
Step 2: For each training sample, search for the nearest prototype.
Step 3: Adjust the prototypes.
Step 4: Output the prototypes.
The hidden density of the training data is reflected in the prototypes.

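The four steps above correspond to classic winner-take-all competitive learning. The sketch below is a generic textbook version, not the authors' implementation; the learning rate `lr`, the number of epochs, and initializing prototypes at randomly chosen samples are assumptions.

```python
import math
import random


def nearest(prototypes, x):
    """Index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], x))


def competitive_learning(samples, n_prototypes, lr=0.1, epochs=10, seed=0):
    """Generic winner-take-all competitive learning (hypothetical parameters)."""
    rng = random.Random(seed)
    # Step 1: initialize prototypes at randomly chosen training samples
    prototypes = [list(p) for p in rng.sample(samples, n_prototypes)]
    for _ in range(epochs):
        for x in rng.sample(samples, len(samples)):  # shuffled pass over the data
            # Step 2: find the winning (nearest) prototype
            w = nearest(prototypes, x)
            # Step 3: move only the winner a fraction lr of the way toward the sample
            prototypes[w] = [p + lr * (xi - p) for p, xi in zip(prototypes[w], x)]
    # Step 4: output the prototypes
    return prototypes
```

Because each update is a convex combination of the winner and a sample (for 0 < `lr` <= 1), prototypes always stay inside the region spanned by the training data, and denser regions attract more updates.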
Modifications for Stock Data
In step 1: the initial prototypes are organized in a tree structure, which enables fast nearest-prototype search.
In step 2: prototypes are searched in the predictor space, which gives a better learning effect for the prediction task.
In step 3: prototypes are adjusted in the goal-attribute space, which gives a better learning effect on the imbalanced stock data.
In step 4: the prototype tree is pruned. Child prototypes are pruned if they are similar to their parent, and the leaf prototypes are combined to form the final prototypes.

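One possible reading of the step 2 and step 3 modifications, sketched in Python: the winning prototype is chosen by distance in predictor space only, and the update moves the winner's goal value (the week-t return) toward the sample's goal. The tree-structured initialization and the step 4 pruning are omitted, and `GoalPrototype`, `lr`, and `epochs` are hypothetical.

```python
import math


class GoalPrototype:
    """Hypothetical prototype: a position in predictor space plus a learned goal value."""

    def __init__(self, position, goal=0.0):
        self.position = tuple(position)
        self.goal = goal


def nearest_prototype(prototypes, x):
    # Modified step 2: distance uses the predictor attributes only; the goal is ignored
    return min(prototypes, key=lambda p: math.dist(p.position, x))


def train_goal_prototypes(prototypes, data, lr=0.1, epochs=5):
    """data: list of (predictor_vector, goal) pairs, e.g. goal = return of week t."""
    for _ in range(epochs):
        for x, y in data:
            winner = nearest_prototype(prototypes, x)
            # Modified step 3: adjust the winner in goal space; its position is left
            # unchanged in this sketch (an assumption)
            winner.goal += lr * (y - winner.goal)
    return prototypes
```

In this reading, each prototype ends up holding a smoothed average of the returns of the training samples that fall closest to it in predictor space.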
Step 2: Predicting Test Data
The prediction for a test example is the weighted average of the goal values of its k nearest prototypes. The model is updated online as new data arrive.

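A sketch of the prediction step, assuming inverse-distance weights (the slide says only "weighted average") and a simple goal-value update for the online step. Prototypes are hypothetical `(position, goal)` pairs; `predict`, `online_update`, and `rank_stocks` are illustrative names.

```python
import math


def predict(prototypes, x, k=3, eps=1e-9):
    """Weighted average of the goal values of the k nearest prototypes.

    prototypes: list of (position, goal) pairs. Inverse-distance weights are an assumption.
    """
    nearest = sorted(prototypes, key=lambda p: math.dist(p[0], x))[:k]
    weights = [1.0 / (math.dist(pos, x) + eps) for pos, _ in nearest]
    return sum(w * g for w, (_, g) in zip(weights, nearest)) / sum(weights)


def online_update(prototypes, x, y, lr=0.1):
    """Once the true return y is known, nudge the nearest prototype's goal toward it."""
    i = min(range(len(prototypes)), key=lambda j: math.dist(prototypes[j][0], x))
    pos, goal = prototypes[i]
    prototypes[i] = (pos, goal + lr * (y - goal))


def rank_stocks(prototypes, test_points, k=3):
    """test_points: {stock_id: predictor_vector}; returns stock ids, highest score first."""
    scores = {sid: predict(prototypes, x, k) for sid, x in test_points.items()}
    return sorted(scores, key=scores.get, reverse=True)
```

The ranked list can then be cut into the n stocks to buy (top) and the n stocks to short-sell (bottom).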
Data
CRSP daily stock database: the 300 NYSE and AMEX stocks with the largest market capitalization, from 1962 to 2004.

Testing PR
Experiment 1: does a larger portfolio give a lower average return and lower risk (diversification)?
Experiment 2: is PR better than Cooper’s method?

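To make "average return" and "risk" concrete, the sketch below measures a long-short portfolio of one size n over many weeks. The equal weighting and the definition of the weekly portfolio return (mean return of the long leg minus mean return of the short leg) are assumptions; the slides do not specify how portfolio returns are computed. `long_short_return` and `evaluate` are hypothetical names.

```python
import statistics


def long_short_return(ranked_ids, realized, n):
    """Weekly return of an equal-weighted portfolio: long the top n, short the bottom n.

    ranked_ids: stock ids ordered from highest to lowest predicted score.
    realized:   {stock_id: actual return of week t}.
    """
    longs = ranked_ids[:n]
    shorts = ranked_ids[len(ranked_ids) - n:]
    long_leg = statistics.mean(realized[s] for s in longs)
    short_leg = statistics.mean(realized[s] for s in shorts)
    return long_leg - short_leg  # a short position gains when its stock falls


def evaluate(weekly_rankings, weekly_realized, n):
    """Average weekly return and risk (sample std) of the portfolio of size n."""
    rets = [long_short_return(r, real, n)
            for r, real in zip(weekly_rankings, weekly_realized)]
    return statistics.mean(rets), statistics.stdev(rets)
```

Calling `evaluate` with several values of n on the same weekly rankings is one way to set up a portfolio-size comparison like Experiment 1.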
Results of Experiment 1
[Charts: average return (1978-2004); risk (std) (1978-2004)]

Experiment 2: Comparison to Cooper’s Method
Cooper’s method (CP): a traditional non-ML method for stock selection…
PR and CP are compared on 10-stock portfolios.

Results of Experiment 2
Measures: average return (Ret.) and the Sharpe ratio (SR), a risk-adjusted return: SR = Ret. / Std.

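The Sharpe ratio as defined on the slide, in Python. It follows SR = Ret. / Std. literally, with no risk-free rate subtracted; using the sample standard deviation is an assumption, and the function names are hypothetical.

```python
import statistics


def sharpe_ratio(returns):
    """SR = Ret. / Std. over a series of periodic portfolio returns.

    Follows the slide's definition (no risk-free rate is subtracted). Using the
    sample standard deviation (statistics.stdev) is an assumption.
    """
    return statistics.mean(returns) / statistics.stdev(returns)


def sharpe_from_summary(avg_return, std):
    """The same ratio computed from a reported average return and std (same units)."""
    return avg_return / std
```

For example, `round(sharpe_from_summary(1.69, 3.30), 2)` returns 0.51, the value reported for the 10-stock PR portfolio over 1978-1993.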
Results: Portfolio Performance
Portfolio   Measure            1978-1993        1994-2004
                               PR      CP       PR      CP
10-stock    Ave. Return (%)    1.69    0.89     1.37    0.81
            STD (%)            3.30    2.80     6.20    5.10
            Sharpe Ratio       0.51    0.32     0.22    0.16
20-stock    Ave. Return (%)    1.35    0.80     1.32    0.81
            STD (%)            2.60    2.10     5.10    4.30
            Sharpe Ratio       0.52    0.38     0.26    0.19
30-stock    Ave. Return (%)    1.14    0.67     1.16    0.77
            STD (%)            2.20    1.80     4.60    3.50
            Sharpe Ratio       0.52    0.37     0.27    0.22

Conclusions
PR combines modified competitive learning with k-NN to handle noisy and imbalanced stock data.
PR performs well in stock selection: larger portfolios give lower return and lower risk.
PR outperforms the non-ML method CP.
Future work: use PR to invest and make money!

KDD