Asset Price Prediction with Machine Learning
Which variables matter for predicting S1?
When assessing this task, it is important to remember that this is time-series data. As such, and particularly with stock-related data, multicollinearity will most likely be an issue. This presents major problems for regression analysis, since multicollinearity inflates the variance of the estimated coefficients and makes them unreliable. In addition, we must recognize that including all of the variables in a model would lead to overfitting in sample and, subsequently, poor predictive performance on out-of-sample data. With these problems in mind, we remedy them with Principal Component Analysis.
Principal Component Analysis (PCA) is a statistical method used to reduce the dimensionality of a data set. Simply stated, we transform the data into new variables called principal components and eliminate the components that explain negligible amounts of the variance exhibited within the data set. The benefit of this technique is that we preserve the bulk of the variance of the data set while being able to perform visual and exploratory analysis much more easily than before the transformation. When forming the matrix of data on which we perform PCA, we remove S1, since it is the response variable, and retain columns S2 through S10.
After running principal component analysis on the first 50 rows of S2 through S10, we see the following:
Each row index represents a principal component number, and each value represents the percentage of the total variability that the component explains. In this experiment, our threshold for retaining a principal component is 1%. We notice that only the first 5 principal components meet this threshold, so we remove the remaining components. Translating this elimination back to the original data, we choose to keep columns S2 through S6 and eliminate the rest from our training data.
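A minimal sketch of this selection step, assuming the series live in a pandas DataFrame `df` with columns S1 through S10 (the names and layout are illustrative, not taken from the original data):

```python
import numpy as np
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# First 50 rows of the predictors only (S1 is the response and is excluded).
X = df.loc[:49, "S2":"S10"]
X_std = StandardScaler().fit_transform(X)  # PCA is sensitive to scale

pca = PCA().fit(X_std)
explained = pca.explained_variance_ratio_ * 100  # percent variance per component

keep = explained >= 1.0  # the 1% retention threshold used above
print(np.round(explained, 2))
print("Components retained:", keep.sum())
```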
		
Does S1 go up or down cumulatively (on an open-to-close basis) over this period?
S1 represents the daily open-to-close change of a stock. We find that S1 increases cumulatively over this 50-day period by 5.92 points. When observing the cumulative change in S1 over the first 50 days, we see the following:
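A minimal sketch of the cumulative calculation behind this figure, using the same hypothetical DataFrame `df` as above:

```python
# Running total of the daily open-to-close changes in S1.
cumulative = df.loc[:49, "S1"].cumsum()
print(f"Cumulative change over 50 days: {cumulative.iloc[-1]:.2f} points")
```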
What Techniques Did You Use? Why?
We began our experiment with principal component analysis and, from this technique, determined our explanatory variables to be S2 through S6. As stated previously, the benefit of this technique is that we preserve the variance of the data set while transforming it in a manner that lets us understand the contribution of each principal component to the total variance in the data. After the training data for the explanatory variables has been determined, we cross-validate by randomly sampling rows of the explanatory variables, together with the corresponding response values, from within the range of the training set. By doing this, we not only prevent overfitting but are also able to test our model on "new" data, which gives us a more realistic perspective on how it would perform on out-of-sample data.
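A hedged sketch of this random hold-out step; the use of scikit-learn's train_test_split (rather than hand-rolled index sampling) and the 20% split size are assumptions:

```python
from sklearn.model_selection import train_test_split

# Explanatory variables S2..S6 and response S1 from the 50-row training window.
X = df.loc[:49, "S2":"S6"].values
y = df.loc[:49, "S1"].values

# Randomly sampled hold-out rows play the role of "new" data.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```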
		
Models Used to Predict S1
For this experiment, the following five models were chosen for evaluation. The scikit-learn module was used for several of the implementations, while one model was constructed manually in stepwise fashion. The models used are as follows:
		
a.    Ridge Regression - a method for analyzing multiple-regression data that suffers from multicollinearity (linear or near-linear relationships between explanatory variables). This regression deliberately introduces a small amount of bias, in exchange for which the standard errors are reduced, making the estimates more reliable than those of traditional regression methods. [scikit-learn]
		
b.    Support Vector Regression - a regression that uses kernels (functions that operate in feature space without explicitly computing the coordinates of the data, instead computing inner products between pairs of data points) to optimize the bounds of the regression. [scikit-learn]
c.    Kernel Ridge Regression - ridge regression in which the linear function is learned in the space induced by the chosen kernel. [scikit-learn]
d.    Neural Network using Ridge Regression - a system of "neurons" into which the data is fed, each carrying weights that are updated on every iteration of the algorithm. Ridge regression is used as the function within the neurons. [implemented manually]
		
e.    Stochastic Gradient Descent - finds a local minimum of a function by repeatedly stepping in the negative direction of the gradient (the direction of steepest descent). [scikit-learn]
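A hedged sketch of how the scikit-learn models above might be instantiated; the hyperparameter values are illustrative defaults, not the settings used in the original experiment, and the manually implemented ridge-based neural network is omitted:

```python
from sklearn.linear_model import Ridge, SGDRegressor
from sklearn.svm import SVR
from sklearn.kernel_ridge import KernelRidge

models = {
    "Ridge Regression": Ridge(alpha=1.0),
    "Support Vector Regression": SVR(kernel="rbf", epsilon=0.1),
    "Kernel Ridge Regression": KernelRidge(kernel="rbf", alpha=1.0),
    "Stochastic Gradient Descent": SGDRegressor(max_iter=1000),
}
```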
For this experiment, we chose to run each of these algorithms for 100 trials. The reasoning behind this is to obtain a more reliable approximation of the following summary statistics for the sum of squared residuals (a sketch of the trial loop follows the list):
• Maximum
• Minimum
• Mean
• Standard deviation
• Range
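A minimal sketch of the 100-trial loop, assuming the `models` dictionary and the X, y arrays defined in the earlier sketches; each trial redraws the random hold-out split, refits the model, and records its sum of squared residuals (SSR):

```python
import numpy as np
from sklearn.model_selection import train_test_split

def run_trials(model, X, y, n_trials=100):
    """Collect summary statistics of the hold-out SSR over repeated trials."""
    ssrs = []
    for _ in range(n_trials):
        X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)
        pred = model.fit(X_tr, y_tr).predict(X_te)
        ssrs.append(np.sum((y_te - pred) ** 2))
    ssrs = np.array(ssrs)
    return {"max": ssrs.max(), "min": ssrs.min(), "mean": ssrs.mean(),
            "std": ssrs.std(), "range": ssrs.max() - ssrs.min()}
```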
We shall also compute the R-squared value, which tells us how much of the variability in y can be explained by x. However, this value does not change from iteration to iteration, since only the orientation of the data is changing. Our objective is to choose the model with the lowest sum of squared residuals while also maximizing the R-squared value. Upon completion of the iterations, we observe the following:
Determining the Model to Choose and Why
We find that, in general, the Support Vector Regression performs best with respect to our objectives. Of all the models evaluated, it has the highest R-squared value, the lowest standard deviation of the sum of squared residuals, and the lowest maximum sum of squared residuals. While it has neither the lowest range nor the lowest minimum of the observed sums of squares, its differences from the best-performing models on these statistics are minimal.
The strong performance of the support vector regression is due in part to its epsilon-insensitive loss function, which ignores errors within a certain distance of the true value of a data point. Using this function, we achieve a global minimum while still retaining generalization within the bounds of the hyperplane or set of hyperplanes (the bounds within which we observe the given data, defined by the kernel). The model is robust and can handle both linear and nonlinear regression, making it a suitable choice for the task at hand. Be that as it may, our model is not perfect, and we must understand its limits, particularly within the context of financial data.
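In scikit-learn, the epsilon-insensitive tube is exposed through SVR's epsilon parameter. A sketch using the hold-out split from the earlier example; the parameter values are illustrative, not those used in the experiment:

```python
from sklearn.svm import SVR

# Residuals smaller than epsilon incur no penalty in the loss.
svr = SVR(kernel="rbf", epsilon=0.1, C=1.0)
svr.fit(X_train, y_train)
print("R-squared on hold-out:", svr.score(X_test, y_test))
```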
		
How Much Confidence Do You Have in Your Model? Why and When Would It Fail?
As stated previously, financial data presents many problems that must be accounted for. When examining the volatility of S1 in our training data set, we observe the following:
where
Y-axis: F-M, Tu-Th, and Total represent Fridays and Mondays, Tuesdays through Thursdays, and all days, respectively.
X-axis: Vol, #Days, %Days, and SSRs represent volatility, number of days, percentage of days, and the sum of squared residuals for a particular iteration.
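Before interpreting these figures, here is a sketch of how the day-bucket volatilities might be computed, assuming a hypothetical `weekday` column (0 = Monday through 4 = Friday) alongside S1 in `df`:

```python
# Group the daily changes into the F-M and Tu-Th day buckets.
fri_mon = df[df["weekday"].isin([4, 0])]["S1"]
tue_thu = df[df["weekday"].isin([1, 2, 3])]["S1"]

print(f"F-M volatility:   {fri_mon.std():.4f}")
print(f"Tu-Th volatility: {tue_thu.std():.4f}")
```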
It is worth noting that F-M comprises 10 two-day pairs and Tu-Th 10 three-day pairs. We can see that there is more variability on Fridays and Mondays than on Tuesdays through Thursdays. Cumulatively, however, the most inaccurate predictions in this observation come from the Tuesday-through-Thursday period. Below, we observe the actual S1 in red and our predicted S1 in green for the training period (not the cross-validated data):
The algorithm does perform well with respect to its predictive ability; however, the technique still has shortcomings. The main one is that the model generally overestimates returns slightly, with moderate variability in the residuals. This is most likely due to the kernel we selected, which is nonlinear. Different kernels produce different hyperplanes, and therefore different predictions. In general, we would like to keep our models more generalized for out-of-sample prediction, but support vector regression is noted for often requiring careful kernel selection to achieve better predictive results. Determining which kernel to choose would require a significant amount of time, and whatever we gain in more accurate in-sample predictions we trade away in the general accuracy of the model, particularly on out-of-sample data.
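One hedged way to automate that kernel search is a cross-validated grid search; the parameter grid below is illustrative, and nothing in the original experiment implies these exact values:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVR

# Exhaustively score each kernel/epsilon combination with 5-fold CV.
grid = GridSearchCV(
    SVR(),
    {"kernel": ["linear", "rbf", "poly"], "epsilon": [0.05, 0.1, 0.2]},
    cv=5,
    scoring="neg_mean_squared_error",
)
grid.fit(X_train, y_train)
print("Best kernel settings:", grid.best_params_)
```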
As for when this model would operate best, that would likely be when the systemic factors within the market stay the same, so that the chosen kernel remains appropriate across the entire data set. In periods such as 2008, this model would likely be less useful than in periods of relative stability, when the market is trading "sideways." In conclusion, support vector regression on our reduced data set (via principal component analysis) is the best model for our regression, but fine-tuning appropriate to the situation, such as the choice of kernel, is still required. So long as the model is used in periods in which systemic factors are constant, its predictive power is significantly enhanced, and it is therefore recommendable as a component of a decision-making process.
