KDD

Machine Learning for Stock Selection
Robert J. Yan, Charles X. Ling
University of Western Ontario, Canada
{jyan, cling}@csd.uwo.ca

Outline
- Introduction
- The stock selection task
- The Prototype Ranking method
- Experimental results
- Conclusions

Introduction
- Objective: use machine learning to select a small number of “good” stocks to form a portfolio
- Research questions:
  - Learning from a noisy dataset
  - Learning from an imbalanced dataset
- Our solution: Prototype Ranking (PR), a specially designed machine learning method

Stock Selection Task
- Given information prior to week t, predict the performance of each stock in week t.
- Learn a ranking function from the training set and use it to rank the test data.
- Select the n highest-ranked stocks to buy and the n lowest-ranked stocks to short-sell.

Attributes of each training example:
- Stock ID
- Predictor 1: return of week t-1
- Predictor 2: return of week t-2
- Predictor 3: volume ratio of week t-2 / week t-1
- Goal: return of week t
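The selection rule above can be sketched as follows. This is a minimal illustration, assuming the model has already produced a predicted week-t score for every stock; the function name `select_long_short` and the dict-of-scores input are hypothetical, not taken from the slides.

```python
from typing import Dict, List, Tuple


def select_long_short(scores: Dict[str, float], n: int) -> Tuple[List[str], List[str]]:
    """Hypothetical helper: rank stocks by predicted week-t score,
    buy the n highest and short-sell the n lowest."""
    if n < 1 or 2 * n > len(scores):
        raise ValueError("need n >= 1 and at least 2n stocks for disjoint long/short legs")
    # Sort stock IDs from highest to lowest predicted return (stable for ties).
    ranked = sorted(scores, key=lambda sid: scores[sid], reverse=True)
    longs = ranked[:n]    # top n: buy
    shorts = ranked[-n:]  # bottom n: short-sell
    return longs, shorts


if __name__ == "__main__":
    predicted = {"A": 0.02, "B": -0.01, "C": 0.005, "D": -0.03, "E": 0.01}
    print(select_long_short(predicted, 2))  # (['A', 'E'], ['B', 'D'])
```

Requiring 2n stocks keeps the long and short legs disjoint; the slides use n = 10, 20 and 30.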

Prototype Ranking
- Prototype Ranking (PR): a special machine learning method for noisy and imbalanced stock data
- The PR system:
  - Step 1. Find good “prototypes” in the training data
  - Step 2. Use k-NN on the prototypes to rank the test data

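Step 1: Finding Prototypes
- Prototypes: representative points in the sample space
- Goal: discover the underlying density/clusters of the training samples by distributing prototypes in the sample space
- Reduces the data size: the model keeps the prototypes instead of all training samples
[Figure: training samples, prototypes, and each prototype's neighborhood]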

Analysis: Competitive Learning for the Stock Selection Task
- Pros:
  - Noise-tolerant
  - On-line update: practical for huge datasets
  - Smoothly models the distribution of the training samples
- Cons:
  - Searching for the nearest prototype is slow
  - Poor performance on prediction tasks: it was designed for tasks such as clustering and feature mapping, whereas stock selection is a prediction task
  - Poor performance when modeling imbalanced datasets

Finding Prototypes Using Competitive Learning
General competitive learning:
- Step 1: Randomly initialize a set of prototypes
- Step 2: Search for the nearest prototype (the winner) for each sample
- Step 3: Adjust the prototypes
- Step 4: Output the prototypes
The hidden density of the training data is reflected in the prototypes.
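The four generic steps can be sketched as plain winner-take-all competitive learning (online vector quantization). This is a minimal sketch, not PR itself: initializing prototypes at randomly chosen samples, Euclidean distance, and the linearly decaying learning rate are assumptions, and `competitive_learning` is a hypothetical name.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def nearest(prototypes: Sequence[Vector], x: Sequence[float]) -> int:
    """Step 2: index of the prototype closest to x (Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda i: math.dist(prototypes[i], x))


def competitive_learning(samples: Sequence[Vector], n_prototypes: int,
                         epochs: int = 20, lr: float = 0.1, seed: int = 0) -> List[Vector]:
    rng = random.Random(seed)
    # Step 1: initialize prototypes at randomly chosen training samples (assumption).
    prototypes = [list(s) for s in rng.sample(list(samples), n_prototypes)]
    for epoch in range(epochs):
        rate = lr * (1 - epoch / epochs)  # linearly decaying step size (assumption)
        for x in rng.sample(list(samples), len(samples)):  # one shuffled pass
            w = nearest(prototypes, x)  # Step 2: find the winner
            # Step 3: move only the winner a fraction of the way toward the sample.
            prototypes[w] = [p + rate * (xi - p) for p, xi in zip(prototypes[w], x)]
    return prototypes  # Step 4: output the prototypes
```

On two well-separated clusters the two prototypes settle near the cluster centers, which is the sense in which the hidden density is "reflected in the prototypes".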

Modifications for Stock Data
- Step 1: Initial prototypes are organized in a tree structure, for fast nearest-prototype search
- Step 2: Search for prototypes in the predictor space, for a better learning effect on the prediction task
- Step 3: Adjust prototypes in the goal attribute space, for a better learning effect on the imbalanced stock data
- Step 4: Prune the prototype tree:
  - Prune child prototypes if they are similar to their parent
  - Combine the leaf prototypes to form the final prototypes
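The slides do not give PR's exact update rule, so the following is only a sketch of how steps 2 and 3 might be separated. It assumes each prototype stores a predictor vector and a goal value, that the winner is chosen by distance over the predictors alone, and that the winner's goal value is moved toward the observed week-t return while its predictors move as in ordinary competitive learning. The `Prototype` class, `update` function and both learning rates are hypothetical; the tree structure and pruning are not modeled.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Prototype:
    predictors: List[float]  # e.g. [return t-1, return t-2, volume ratio t-2/t-1]
    goal: float              # estimated week-t return near this prototype


def update(prototypes: List[Prototype], x: Sequence[float], y: float,
           lr_x: float = 0.05, lr_y: float = 0.1) -> int:
    """One hypothetical online step; returns the index of the winner."""
    # Step 2 (modified): search using the predictor attributes only; goal is ignored.
    w = min(range(len(prototypes)),
            key=lambda i: math.dist(prototypes[i].predictors, x))
    p = prototypes[w]
    # Standard competitive-learning move in predictor space.
    p.predictors = [pi + lr_x * (xi - pi) for pi, xi in zip(p.predictors, x)]
    # Step 3 (modified, assumed form): adjust the winner's goal toward the observed return.
    p.goal += lr_y * (y - p.goal)
    return w
```

Because the goal never enters the distance, a prototype's learned return cannot pull samples toward it; it only summarizes the returns of the samples it wins.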

Step 2: Predicting Test Data
- Predict each test sample's goal as the weighted average of the goals of its k nearest prototypes
- Update the model online with new data
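A weighted k-NN prediction over prototypes can be sketched as below. The slides say only "weighted average"; inverse-distance weighting, the `eps` guard and the tuple representation `(predictors, goal)` are assumptions, and `knn_predict` is a hypothetical name. The predicted values can then be used to rank the test stocks.

```python
import math
from typing import Sequence, Tuple

PrototypeT = Tuple[Sequence[float], float]  # (predictor vector, goal value)


def knn_predict(prototypes: Sequence[PrototypeT], x: Sequence[float],
                k: int = 3, eps: float = 1e-9) -> float:
    """Predict the goal (week-t return) of x as a distance-weighted average
    of the goal values of its k nearest prototypes."""
    # Keep the k prototypes closest to x in predictor space.
    nearest = sorted(prototypes, key=lambda p: math.dist(p[0], x))[:k]
    # Inverse-distance weights (assumption); eps avoids division by zero.
    weights = [1.0 / (math.dist(pred, x) + eps) for pred, _ in nearest]
    return sum(w * goal for w, (_, goal) in zip(weights, nearest)) / sum(weights)
```

For example, with prototypes at (0, 0) and (1, 0) holding goals 0.01 and 0.03, a test point at (0.5, 0) with k = 2 is equidistant from both and gets the plain average 0.02.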

Data
- CRSP daily stock database
- 300 NYSE and AMEX stocks with the largest market capitalization
- From 1962 to 2004

Testing PR
- Experiment 1: effect of portfolio size; a larger portfolio should give a lower average return and lower risk (diversification)
- Experiment 2: is PR better than Cooper’s method?

Results of Experiment 1
[Figures: average return (1978-2004); risk (std) (1978-2004)]

Experiment 2: Comparison to Cooper’s Method
- Cooper’s method (CP): a traditional non-ML method for stock selection…
- Compare PR and CP on 10-stock portfolios

Results of Experiment 2
Measures:
- Average return (Ret.)
- Sharpe ratio (SR), a risk-adjusted return: SR = Ret. / Std.
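The Sharpe ratio as defined on this slide (no risk-free rate subtracted) can be computed from a series of portfolio returns as below. Using the sample standard deviation (`statistics.stdev`) rather than the population one is an assumption; the slides do not say which was used.

```python
import statistics
from typing import Sequence


def sharpe_ratio(returns: Sequence[float]) -> float:
    """SR = average return / standard deviation, as on the slide
    (no risk-free rate; sample standard deviation assumed)."""
    if len(returns) < 2:
        raise ValueError("need at least two returns to estimate a standard deviation")
    return statistics.mean(returns) / statistics.stdev(returns)
```

Applied to the summary values for the 10-stock PR portfolio in 1978-1993, 1.69 / 3.30 ≈ 0.51, matching the reported Sharpe ratio.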

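Results: Portfolio Performance (PR vs. CP)

| Portfolio | Measure | PR (1978-1993) | CP (1978-1993) | PR (1994-2004) | CP (1994-2004) |
|---|---|---|---|---|---|
| 10-stock | Ave. Return (%) | 1.69 | 0.89 | 1.37 | 0.81 |
| 10-stock | STD (%) | 3.30 | 2.80 | 6.20 | 5.10 |
| 10-stock | Sharpe Ratio | 0.51 | 0.32 | 0.22 | 0.16 |
| 20-stock | Ave. Return (%) | 1.35 | 0.80 | 1.32 | 0.81 |
| 20-stock | STD (%) | 2.60 | 2.10 | 5.10 | 4.30 |
| 20-stock | Sharpe Ratio | 0.52 | 0.38 | 0.26 | 0.19 |
| 30-stock | Ave. Return (%) | 1.14 | 0.67 | 1.16 | 0.77 |
| 30-stock | STD (%) | 2.20 | 1.80 | 4.60 | 3.50 |
| 30-stock | Sharpe Ratio | 0.52 | 0.37 | 0.27 | 0.22 |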

Conclusions
- PR: modified competitive learning plus k-NN for noisy and imbalanced stock data
- PR does well in stock selection:
  - Larger portfolio: lower return, lower risk
  - PR outperforms the non-ML method CP
- Future work: use it to invest and make money!
