Validation of a short-term parametric trading model with genetic optimization and walk forward analysis in Python

In this article we will go through a complete validation of a parametric short-term trading model for futures.

The validation, carried out on a basket of futures, will allow us to define a concrete methodology for eventually trading the model.

You will find step-by-step guidance on how to implement the validation in Python, with code snippets and hopefully useful references.

I have been using Python 3 and Jupyter notebooks to carry out the analysis.

THE DATA

The first thing to do is to collect the data. I have used 13 years of hourly OHLC data for futures markets, back-adjusted by difference.

The historical data are organized into a dictionary: what I usually do is save the data into a specific folder and then create a function that outputs the data as a pandas DataFrame.
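As a minimal sketch of what such a loader could look like (the folder layout, file names and column names here are my own assumptions, not the ones used in the article):

import os
import pandas as pd

DATA_FOLDER = 'data'  # hypothetical folder containing one CSV file per instrument, e.g. DAX.csv

def load_data(symbols):
    # returns a dictionary {symbol: OHLC DataFrame indexed by timestamp}
    df = {}
    for symbol in symbols:
        path = os.path.join(DATA_FOLDER, symbol + '.csv')
        data = pd.read_csv(path, parse_dates=['datetime'], index_col='datetime')
        df[symbol] = data[['open', 'high', 'low', 'close']].sort_index()
    return df

symbols = ['DAX', 'GOLD']
df = load_data(symbols)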


MODEL AND CALIBRATION

The trading model under discussion is a parametric model, which means that the strategy is a function of the dataset and of the n parameters of the model.

The model will be defined as a class as described in this code snippet:

class model():
    def __init__(self, df, symbol, par_1, par_2, ..., par_n):
        self.df = df
        self.symbol = symbol
        self.par_1 = par_1
        ...
        self.par_n = par_n

        # a series of operations that calculate the P&L and Sharpe of
        # your model corresponding to this specific set of parameters
        pars = (par_1, ..., par_n)
        self.pl = pl(df, pars)
        self.sharpe = sharpe(df, pars)

The set of values these n parameters can take forms a parameter space of a certain dimensionality; specifically, here all the parameters are integer numbers bounded from above and below:

Par1 ∈ [LB1, HB1]

Par2 ∈ [LB2, HB2]

...

ParN ∈ [LBN, HBN]

where LB and HB stand for lower bound and higher bound.

To calibrate the model over the parameter space, we need to specify a fitness function that will be maximized during the calibration.

I will be using the Sharpe Ratio.
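The article does not spell out how pl and sharpe are computed inside the model class; purely as an illustration, a Sharpe-style fitness computed from an hourly P&L series could look like the sketch below (the annualization factor of 6 trading hours × 252 days is my own assumption; adjust it to the actual session length):

import numpy as np

HOURS_PER_YEAR = 6 * 252  # assumed number of tradeable hours per year

def sharpe_from_pl(pl):
    # pl: pandas Series of hourly P&L produced by the strategy
    if pl.std() == 0:
        return 0.0
    return np.sqrt(HOURS_PER_YEAR) * pl.mean() / pl.std()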

The parameter space is extremely large: the number of possible parameter combinations is about 10^6. To keep the calculation time affordable, and to mitigate the risk of overfitting, genetic optimization is used.

I use an open-source library called pygad:

https://pygad.readthedocs.io/en/latest/

In order to calibrate a model with a genetic algorithm, a certain number of inputs need to be specified. I will not go through a detailed discussion on how to select those inputs, but whoever wants to dig further can easily find references on the net.

I will just mention that I am using these rather standard parameters to define the GA:

  • number of generations = 15
  • number of solutions to be selected as parents = 10
  • crossover probability = 0.95
  • mutation probability = 0.05

With the help of pygad, we can define a function that, given the parameter space and the data, will return the best solution.

It is common practice in the context of genetic optimization to call the parameters GENES.

To represent the gene space I am using another dictionary, with keys given by the instruments under consideration and the corresponding values given by their parameter spaces, for instance:

generanges['DAX'] = [[LB1, ..., HB1], [LB2, ..., HB2], ..., [LBN, ..., HBN]]
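For example, a hypothetical gene space for an instrument with three integer parameters could be built as follows (the actual bounds used in the article are not disclosed, these numbers are purely illustrative):

generanges = {}
# each entry is the full list of admissible integer values for one parameter
generanges['DAX'] = [list(range(5, 51)),   # par_1 in [5, 50]
                     list(range(1, 25)),   # par_2 in [1, 24]
                     list(range(2, 13))]   # par_3 in [2, 12]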

(If you copy and paste the code into your Jupyter notebook, the indentation will be fixed automatically.)

import pygad

def run_ga_calibration(model, df, symbol, generanges,
                       num_generations=15, num_parents_mating=10, sol_per_pop=100,
                       crossover_probability=0.95, mutation_probability=0.05):

    # fitness: Sharpe ratio of the model evaluated with the candidate parameters
    def fitness_func(solution, solution_idx):
        fitness = model(df, symbol, *solution).sharpe
        return fitness

    ga_instance = pygad.GA(num_generations=num_generations,
                           num_parents_mating=num_parents_mating,
                           fitness_func=fitness_func,
                           sol_per_pop=sol_per_pop,
                           num_genes=len(generanges[symbol]),
                           gene_type=int,
                           gene_space=generanges[symbol],
                           crossover_probability=crossover_probability,
                           mutation_probability=mutation_probability,
                           save_solutions=True,
                           save_best_solutions=True,
                           suppress_warnings=True)

    ga_instance.run()
    solution, solution_fitness, solution_idx = ga_instance.best_solution()

    return solution, solution_fitness
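A call to this function could then look like the line below (assuming the data dictionary df and the gene space generanges defined above):

best_solution, best_fitness = run_ga_calibration(model, df['DAX'], 'DAX', generanges)
print(best_solution, best_fitness)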

Running it, for instance, on the whole dataset for DAX will produce something like:

[Image: parameters and fitness value of the best solution for DAX]

THE VALIDATION ENVIRONMENT - WALK FORWARD

To assess the robustness of the model, walk forward analysis will be used.

In particular I will perform two tests, differing in the way the train and test sets are defined:

  • Anchored walk forward

[Image: anchored walk forward scheme]

  • Unanchored or rolling walk forward

[Image: unanchored (rolling) walk forward scheme]

There are pros and cons in both cases:

The anchored walk forward has the benefit of using a larger amount of data for calibration compared to the unanchored one, but it suffers from a lack of replicability in live trading, since the size of the train set grows over time.

The unanchored walk forward is closer to what will happen in reality, but each round it trains on a smaller dataset than the one that would be used live, increasing the risk of overfitting at every round.

I believe it is useful to perform both tests, but qualitatively I tend to give more significance to the anchored test. The important point to understand here is that in both cases it will be possible to recreate a complete out-of-sample performance of the model by connecting all the test sets together.

The code below will help in constructing the train and test sets, given the historical dataframe, the date at which you want to start the out-of-sample evaluation, the length of the test sets and the typology of the walk forward:

from datetime import datetime
import pandas as pd

def create_IS_OS(df, startyear, n_months_OS, typology):
    df = df.copy()
    df_IS_OS = []
    firstendoftrain = datetime(startyear, 1, 1)
    if typology == 'anchored':
        i = 0
        while firstendoftrain + pd.DateOffset(months=i*n_months_OS) < df.index[-1]:
            endoftrain = firstendoftrain + pd.DateOffset(months=i*n_months_OS)
            OS = df[(df.index >= endoftrain) & (df.index <= endoftrain + pd.DateOffset(months=n_months_OS))]
            IS = df[df.index < endoftrain]
            df_IS_OS.append([IS, OS])
            i = i + 1
    if typology == 'unanchored':
        firststartoftrain = df.index[0]
        i = 0
        while firstendoftrain + pd.DateOffset(months=i*n_months_OS) < df.index[-1]:
            endoftrain = firstendoftrain + pd.DateOffset(months=i*n_months_OS)
            startoftrain = firststartoftrain + pd.DateOffset(months=i*n_months_OS)
            OS = df[(df.index >= endoftrain) & (df.index <= endoftrain + pd.DateOffset(months=n_months_OS))]
            IS = df[(df.index >= startoftrain) & (df.index < endoftrain)]
            df_IS_OS.append([IS, OS])
            i = i + 1
    return df_IS_OS
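For example, with the out-of-sample period starting in 2015 and six-month test sets (the settings used later in the article), the train/test splits for all instruments could be built like this, shown here for the anchored case:

df_IS_OS = {symbol: create_IS_OS(df[symbol], 2015, 6, 'anchored')
            for symbol in symbols}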

PERFORMING THE TEST

Equipped with these resources we are now ready to perform the walk forward test with the function below:

def generate_wf(df_IS_OS, symbol, generanges):
    summary = []
    for IS, OS in df_IS_OS[symbol]:
        IS_start = IS.index[0]
        IS_end = IS.index[-1]
        OS_start = OS.index[0]
        OS_end = OS.index[-1]
        # calibrate on the train set only
        solution, solution_fitness = run_ga_calibration(model, IS, symbol, generanges)

        print("Symbol {n}".format(n=symbol))
        print("IS starts: {IS_start}".format(IS_start=IS_start))
        print("IS ends: {IS_end}".format(IS_end=IS_end))
        print("OS starts: {OS_start}".format(OS_start=OS_start))
        print("OS ends: {OS_end}".format(OS_end=OS_end))
        print("Parameters of the best solution: {solution}".format(solution=solution))
        print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))

        # evaluate the calibrated parameters on the test set
        metric_os = model(OS, symbol, *solution).sharpe
        metric_is = solution_fitness
        summary.append([symbol, IS_start, IS_end, OS_start, OS_end, metric_is, metric_os, solution])

    SUMMARY = pd.DataFrame(summary)
    SUMMARY.columns = ['symbol', 'IS_start', 'IS_end', 'OS_start', 'OS_end', 'Sharpe_IS', 'Sharpe_OS', 'best_solution']
    return SUMMARY
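Running the walk forward on all instruments could then look like this, giving a dictionary of summary DataFrames keyed by instrument, which is how SUMMARY is used in the snippets below:

SUMMARY = {symbol: generate_wf(df_IS_OS, symbol, generanges)
           for symbol in symbols}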

Assuming we set the start date of the first test set in 2015 and six months as the length of each out-of-sample period, we will get results in this format:

[Image: walk forward summary table with IS/OS dates, Sharpe ratios and best solutions]

A little bit of manipulation of the results will enable us to visualize the complete out-of-sample performance:

# in-sample P&L of each calibrated model, for every walk forward round
pl_IS = {symbol: [model(df_IS_OS[symbol][i][0], symbol, *sol).pl
                  for (i, sol) in enumerate(SUMMARY[symbol].best_solution)]
         for symbol in symbols}

# weighting the pl contribution in order to have the same vol in each train set for every instrument
w = {symbol: [1 / a.std() for a in pl_IS[symbol]]
     for symbol in symbols}

# out-of-sample P&L of each calibrated model, for every walk forward round
pl_OS = {symbol: [model(df_IS_OS[symbol][i][1], symbol, *sol).pl
                  for (i, sol) in enumerate(SUMMARY[symbol].best_solution)]
         for symbol in symbols}

pl_OS_weighted = {symbol: [pl_OS[symbol][i] * w[symbol][i]
                           for i in range(len(pl_OS[symbol]))]
                  for symbol in symbols}

# connect the out-of-sample segments of each instrument along time...
pl_OS_concat = {symbol: pd.concat(pl_OS[symbol]) for symbol in symbols}
pl_OS_weighted_concat = {symbol: pd.concat(pl_OS_weighted[symbol]) for symbol in symbols}

# ...and put all instruments side by side
os_pl = pd.concat(pl_OS_concat, axis=1).fillna(0)
os_pl_weighted = pd.concat(pl_OS_weighted_concat, axis=1).fillna(0)

First, let's show an example of the in-sample / out-of-sample results for a specific instrument:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(30, 20))
for i in range(14):
    fig.add_subplot(5, 3, i + 1)
    pl_is = pl_IS['GOLD'][i]
    connect = pl_IS['GOLD'][i].sum()  # offset so the test curve starts where the train curve ends
    pl_os = pl_OS['GOLD'][i]
    plt.plot(pl_is.cumsum(), color='green', label='train_set')
    plt.plot(pl_os.cumsum() + connect, color='blue', label='test_set')
    plt.title('Walk forward')
    plt.legend()
plt.show()
[Image: train (green) and test (blue) cumulative P&L for each walk forward round, GOLD]

Then let's connect all the blue segments for each instrument:

[Image: connected out-of-sample cumulative P&L per instrument]

And finally, the plot of the overall performance obtained by summing up all the instruments:

[Image: aggregate out-of-sample cumulative P&L across all instruments]
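As a minimal sketch, these last two plots could be produced from the os_pl_weighted frame built above:

import matplotlib.pyplot as plt

# per-instrument out-of-sample equity curves
os_pl_weighted.cumsum().plot(figsize=(15, 8), title='Out-of-sample P&L per instrument')

# aggregate performance across all instruments
os_pl_weighted.sum(axis=1).cumsum().plot(figsize=(15, 8), title='Aggregate out-of-sample P&L')
plt.show()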

ADDITIONAL TESTS

To further strengthen the validation, one idea is to record the out-of-sample Sharpe ratio under different assumptions: anchored/unanchored, different start dates of the out-of-sample period, different test set lengths, weighting the P&L contributions or not weighting them:

[Images: out-of-sample Sharpe ratios under the different walk forward assumptions]
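One possible way to sketch such a scan, re-using the functions defined above; the grid of settings is my own illustrative choice, build_os_pl is a hypothetical helper wrapping the P&L manipulation shown earlier, and sharpe_from_pl is the Sharpe sketch from the calibration section:

results = []
for typology in ('anchored', 'unanchored'):
    for startyear in (2014, 2015, 2016):
        for n_months_OS in (3, 6, 12):
            splits = {s: create_IS_OS(df[s], startyear, n_months_OS, typology) for s in symbols}
            summaries = {s: generate_wf(splits, s, generanges) for s in symbols}
            for weighted in (False, True):
                # build_os_pl (hypothetical) should return the combined out-of-sample P&L DataFrame
                os_frame = build_os_pl(splits, summaries, weighted)
                results.append([typology, startyear, n_months_OS, weighted,
                                sharpe_from_pl(os_frame.sum(axis=1))])

results = pd.DataFrame(results, columns=['typology', 'start_year', 'OS_months', 'weighted', 'Sharpe_OS'])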

THANK YOU!








 
