Validation of a short-term parametric trading model with genetic optimization and walk forward analysis in Python

In this article we will go through a complete validation of a parametric short-term trading model for futures.

The validation, carried out on a basket of futures, will allow us to define a concrete methodology for eventually trading the model.

You will find step-by-step guidance on how to implement the validation in Python, with code snippets and hopefully useful references.

I have been using Python 3 and Jupyter notebooks to carry out the analysis.

THE DATA

The first thing to do is to collect the data. I have used 13 years of hourly OHLC data for futures markets, back-adjusted by difference.

The historical data are organized into a dictionary: what I usually do is save the data into a specific folder and then create a function that outputs the data as a pandas DataFrame.
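As a minimal sketch of what such a loader could look like (the folder layout, file names and column names here are my own assumptions, not the ones used in the article):

import os
import pandas as pd

DATA_FOLDER = 'data'  # hypothetical folder containing one CSV file per instrument, e.g. DAX.csv

def load_data(symbols):
    # returns a dictionary {symbol: OHLC DataFrame indexed by timestamp}
    df = {}
    for symbol in symbols:
        path = os.path.join(DATA_FOLDER, symbol + '.csv')
        data = pd.read_csv(path, parse_dates=['datetime'], index_col='datetime')
        df[symbol] = data[['open', 'high', 'low', 'close']].sort_index()
    return df

symbols = ['DAX', 'GOLD']
df = load_data(symbols)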


MODEL AND CALIBRATION

The trading model under discussion is a parametric model, which means that the strategy is a function of the dataset and of the n parameters of the model.

The model will be defined as a class as described in this code snippet:

class model():
    def __init__(self, df, symbol, par_1, par_2, ..., par_n):
        self.df = df
        self.symbol = symbol
        self.par_1 = par_1
        ...
        self.par_n = par_n

        # a series of operations that calculate the P&L and Sharpe of
        # your model corresponding to this specific set of parameters
        pars = (par_1, ..., par_n)
        self.pl = pl(df, pars)
        self.sharpe = sharpe(df, pars)

The set of values these n parameters can take forms a parameter space of a certain dimensionality; specifically, here all the parameters are integer numbers bounded from above and below:

Par1 ∈ [LB1, HB1]

Par2 ∈ [LB2, HB2]

...

ParN ∈ [LBN, HBN]

where LB and HB stand for lower bound and higher bound.

To calibrate the model over the parameter space, we need to specify a fitness function that will be maximized during the calibration.

I will be using the Sharpe Ratio.
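The article does not spell out how pl and sharpe are computed inside the model class; purely as an illustration, a Sharpe-style fitness computed from an hourly P&L series could look like the sketch below (the annualization factor of 6 trading hours × 252 days is my own assumption; adjust it to the actual session length):

import numpy as np

HOURS_PER_YEAR = 6 * 252  # assumed number of tradeable hours per year

def sharpe_from_pl(pl):
    # pl: pandas Series of hourly P&L produced by the strategy
    if pl.std() == 0:
        return 0.0
    return np.sqrt(HOURS_PER_YEAR) * pl.mean() / pl.std()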

The parameter space is extremely large: the number of possible parameter combinations is about 10^6. To keep the calculation time affordable, and to mitigate the risk of overfitting, genetic optimization is used.

I use an open-source library called pygad:

https://pygad.readthedocs.io/en/latest/

In order to calibrate a model with a genetic algorithm, a certain number of inputs need to be specified. I will not go through a detailed discussion on how to select those inputs, but whoever wants to dig further can easily find references on the net.

I will just mention that I am using these rather standard parameters to define the GA:

  • number of generations = 15
  • number of solutions to be selected as parents = 10
  • crossover probability = 0.95
  • mutation probability = 0.05

With the help of pygad, we can define a function that, given the parameter space and the data, will return the best solution.

It is common practice in the context of genetic optimization to call the parameters GENES.

To represent the gene space I am using another dictionary, with keys given by the instruments under consideration and the corresponding values given by their parameter spaces, for instance:

generanges['DAX'] = [[LB1, ..., HB1], [LB2, ..., HB2], ..., [LBN, ..., HBN]]
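For example, a hypothetical gene space for an instrument with three integer parameters could be built as follows (the actual bounds used in the article are not disclosed, these numbers are purely illustrative):

generanges = {}
# each entry is the full list of admissible integer values for one parameter
generanges['DAX'] = [list(range(5, 51)),   # par_1 in [5, 50]
                     list(range(1, 25)),   # par_2 in [1, 24]
                     list(range(2, 13))]   # par_3 in [2, 12]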

(If you copy and paste the code into your Jupyter notebook, the indentation will be fixed automatically.)

import pygad

def run_ga_calibration(model, df, symbol, generanges,
                       num_generations=15, num_parents_mating=10, sol_per_pop=100,
                       crossover_probability=0.95, mutation_probability=0.05):

    # fitness: Sharpe ratio of the model evaluated with the candidate parameters
    def fitness_func(solution, solution_idx):
        fitness = model(df, symbol, *solution).sharpe
        return fitness

    ga_instance = pygad.GA(num_generations=num_generations,
                           num_parents_mating=num_parents_mating,
                           fitness_func=fitness_func,
                           sol_per_pop=sol_per_pop,
                           num_genes=len(generanges[symbol]),
                           gene_type=int,
                           gene_space=generanges[symbol],
                           crossover_probability=crossover_probability,
                           mutation_probability=mutation_probability,
                           save_solutions=True,
                           save_best_solutions=True,
                           suppress_warnings=True)

    ga_instance.run()
    solution, solution_fitness, solution_idx = ga_instance.best_solution()

    return solution, solution_fitness
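A call to this function could then look like the line below (assuming the data dictionary df and the gene space generanges defined above):

best_solution, best_fitness = run_ga_calibration(model, df['DAX'], 'DAX', generanges)
print(best_solution, best_fitness)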

Running it, for instance, on the whole dataset for DAX will produce something like:

[Image: parameters and fitness value of the best solution for DAX]

THE VALIDATION ENVIRONMENT - WALK FORWARD

To assess the robustness of the model, walk forward analysis will be used.

In particular I will perform two tests, differing in the way the train and test sets are defined:

  • Anchored walk forward

[Image: anchored walk forward scheme]

  • Unanchored or rolling walk forward

[Image: unanchored (rolling) walk forward scheme]

There are pros and cons in both cases:

The anchored walk forward has the benefit of using a larger amount of data for calibration compared to the unanchored one, but it suffers from a lack of replicability in live trading, since the size of the train set grows over time.

The unanchored walk forward is closer to what will happen in reality, but each round it trains on a smaller dataset than the one that would be used live, increasing the risk of overfitting at every round.

I believe it is useful to perform both tests, but qualitatively I tend to give more significance to the anchored test. The important point to understand here is that in both cases it will be possible to recreate a complete out-of-sample performance of the model by connecting all the test sets together.

The code below will help in constructing the train and test sets, given the historical dataframe, the date at which you want to start the out-of-sample evaluation, the length of the test sets and the typology of the walk forward:

from datetime import datetime
import pandas as pd

def create_IS_OS(df, startyear, n_months_OS, typology):
    df = df.copy()
    df_IS_OS = []
    firstendoftrain = datetime(startyear, 1, 1)
    if typology == 'anchored':
        i = 0
        while firstendoftrain + pd.DateOffset(months=i*n_months_OS) < df.index[-1]:
            endoftrain = firstendoftrain + pd.DateOffset(months=i*n_months_OS)
            OS = df[(df.index >= endoftrain) & (df.index <= endoftrain + pd.DateOffset(months=n_months_OS))]
            IS = df[df.index < endoftrain]
            df_IS_OS.append([IS, OS])
            i = i + 1
    if typology == 'unanchored':
        firststartoftrain = df.index[0]
        i = 0
        while firstendoftrain + pd.DateOffset(months=i*n_months_OS) < df.index[-1]:
            endoftrain = firstendoftrain + pd.DateOffset(months=i*n_months_OS)
            startoftrain = firststartoftrain + pd.DateOffset(months=i*n_months_OS)
            OS = df[(df.index >= endoftrain) & (df.index <= endoftrain + pd.DateOffset(months=n_months_OS))]
            IS = df[(df.index >= startoftrain) & (df.index < endoftrain)]
            df_IS_OS.append([IS, OS])
            i = i + 1
    return df_IS_OS
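For example, with the out-of-sample period starting in 2015 and six-month test sets (the settings used later in the article), the train/test splits for all instruments could be built like this, shown here for the anchored case:

df_IS_OS = {symbol: create_IS_OS(df[symbol], 2015, 6, 'anchored')
            for symbol in symbols}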

PERFORMING THE TEST

Equipped with these resources we are now ready to perform the walk forward test with the function below:

def generate_wf(df_IS_OS, symbol, generanges):
    summary = []
    for IS, OS in df_IS_OS[symbol]:
        IS_start = IS.index[0]
        IS_end = IS.index[-1]
        OS_start = OS.index[0]
        OS_end = OS.index[-1]
        # calibrate on the train set only
        solution, solution_fitness = run_ga_calibration(model, IS, symbol, generanges)

        print("Symbol {n}".format(n=symbol))
        print("IS starts: {IS_start}".format(IS_start=IS_start))
        print("IS ends: {IS_end}".format(IS_end=IS_end))
        print("OS starts: {OS_start}".format(OS_start=OS_start))
        print("OS ends: {OS_end}".format(OS_end=OS_end))
        print("Parameters of the best solution: {solution}".format(solution=solution))
        print("Fitness value of the best solution = {solution_fitness}".format(solution_fitness=solution_fitness))

        # evaluate the calibrated parameters on the test set
        metric_os = model(OS, symbol, *solution).sharpe
        metric_is = solution_fitness
        summary.append([symbol, IS_start, IS_end, OS_start, OS_end, metric_is, metric_os, solution])

    SUMMARY = pd.DataFrame(summary)
    SUMMARY.columns = ['symbol', 'IS_start', 'IS_end', 'OS_start', 'OS_end', 'Sharpe_IS', 'Sharpe_OS', 'best_solution']
    return SUMMARY
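Running the walk forward on all instruments could then look like this, giving a dictionary of summary DataFrames keyed by instrument, which is how SUMMARY is used in the snippets below:

SUMMARY = {symbol: generate_wf(df_IS_OS, symbol, generanges)
           for symbol in symbols}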

Assuming we set the start date of the first test set in 2015 and six months as the length of each out-of-sample period, we will get results in this format:

[Image: walk forward summary table with IS/OS dates, Sharpe ratios and best solutions]

A little bit of manipulation of the results will enable us to visualize the complete out-of-sample performance:

# in-sample P&L of each calibrated model, for every walk forward round
pl_IS = {symbol: [model(df_IS_OS[symbol][i][0], symbol, *sol).pl
                  for (i, sol) in enumerate(SUMMARY[symbol].best_solution)]
         for symbol in symbols}

# weighting the pl contribution in order to have the same vol in each train set for every instrument
w = {symbol: [1 / a.std() for a in pl_IS[symbol]]
     for symbol in symbols}

# out-of-sample P&L of each calibrated model, for every walk forward round
pl_OS = {symbol: [model(df_IS_OS[symbol][i][1], symbol, *sol).pl
                  for (i, sol) in enumerate(SUMMARY[symbol].best_solution)]
         for symbol in symbols}

pl_OS_weighted = {symbol: [pl_OS[symbol][i] * w[symbol][i]
                           for i in range(len(pl_OS[symbol]))]
                  for symbol in symbols}

# connect the out-of-sample segments of each instrument along time...
pl_OS_concat = {symbol: pd.concat(pl_OS[symbol]) for symbol in symbols}
pl_OS_weighted_concat = {symbol: pd.concat(pl_OS_weighted[symbol]) for symbol in symbols}

# ...and put all instruments side by side
os_pl = pd.concat(pl_OS_concat, axis=1).fillna(0)
os_pl_weighted = pd.concat(pl_OS_weighted_concat, axis=1).fillna(0)

First, let's show an example of the in-sample / out-of-sample results for a specific instrument:

import matplotlib.pyplot as plt

fig = plt.figure(figsize=(30, 20))
for i in range(14):
    fig.add_subplot(5, 3, i + 1)
    pl_is = pl_IS['GOLD'][i]
    connect = pl_IS['GOLD'][i].sum()  # offset so the test curve starts where the train curve ends
    pl_os = pl_OS['GOLD'][i]
    plt.plot(pl_is.cumsum(), color='green', label='train_set')
    plt.plot(pl_os.cumsum() + connect, color='blue', label='test_set')
    plt.title('Walk forward')
    plt.legend()
plt.show()
[Image: train (green) and test (blue) cumulative P&L for each walk forward round, GOLD]

Then let's connect all the blue segments for each instrument:

[Image: connected out-of-sample cumulative P&L per instrument]

And finally, the plot of the overall performance obtained by summing up all the instruments:

[Image: aggregate out-of-sample cumulative P&L across all instruments]
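As a minimal sketch, these last two plots could be produced from the os_pl_weighted frame built above:

import matplotlib.pyplot as plt

# per-instrument out-of-sample equity curves
os_pl_weighted.cumsum().plot(figsize=(15, 8), title='Out-of-sample P&L per instrument')

# aggregate performance across all instruments
os_pl_weighted.sum(axis=1).cumsum().plot(figsize=(15, 8), title='Aggregate out-of-sample P&L')
plt.show()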

ADDITIONAL TESTS

To further strengthen the validation, one idea is to record the out-of-sample Sharpe ratio under different assumptions: anchored/unanchored, different start dates of the out-of-sample period, different test set lengths, weighting the P&L contributions or not weighting them:

[Images: out-of-sample Sharpe ratios under the different walk forward assumptions]
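One possible way to sketch such a scan, re-using the functions defined above; the grid of settings is my own illustrative choice, build_os_pl is a hypothetical helper wrapping the P&L manipulation shown earlier, and sharpe_from_pl is the Sharpe sketch from the calibration section:

results = []
for typology in ('anchored', 'unanchored'):
    for startyear in (2014, 2015, 2016):
        for n_months_OS in (3, 6, 12):
            splits = {s: create_IS_OS(df[s], startyear, n_months_OS, typology) for s in symbols}
            summaries = {s: generate_wf(splits, s, generanges) for s in symbols}
            for weighted in (False, True):
                # build_os_pl (hypothetical) should return the combined out-of-sample P&L DataFrame
                os_frame = build_os_pl(splits, summaries, weighted)
                results.append([typology, startyear, n_months_OS, weighted,
                                sharpe_from_pl(os_frame.sum(axis=1))])

results = pd.DataFrame(results, columns=['typology', 'start_year', 'OS_months', 'weighted', 'Sharpe_OS'])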

THANK YOU!








 
