Branch Prediction Contest: Implementation of the Piecewise Linear Prediction Algorithm

Prosunjit Biswas
Department of Computer Science
University of Texas at San Antonio

Abstract

Branch predictor accuracy is essential for exploiting instruction-level parallelism (ILP) and thus for improving the performance of today's microprocessors, especially superscalar processors. Among branch predictors, neural predictors such as the Scaled Neural Analog Predictor (SNAP) and the Piecewise Linear Branch Predictor outperform other state-of-the-art predictors. In this final project for the Computer Architecture course (CS-5513), I studied various neural predictors and implemented the Piecewise Linear Branch Predictor following the algorithm given in a research paper by Dr. Daniel A. Jimenez. The hardware budget is restricted for this project, and I implemented the predictor within a predefined hardware budget of 64K of memory. I am also competing in the branch prediction contest.

Keywords: Piecewise Linear, Neural Network, Branch Prediction.

I. INTRODUCTION

Neural branch predictors are the most accurate predictors in the literature, but they were long impractical because of the high latency of prediction. This latency comes from the complex computation that must be carried out to determine the excitation of an artificial neuron [3].

Piecewise Linear Branch Prediction [1] improved both accuracy and latency over previous neural predictors. This predictor works by developing a set of linear functions, one for each program path to the branch to be predicted, that separate predicted taken from predicted not taken.

In that paper, Daniel A. Jimenez proposed two versions of the prediction algorithm: i) the Idealized Piecewise Linear Branch Predictor and ii) a Practical Piecewise Linear Branch Predictor. In this project, I have focused on the idealized predictor.

II. RELATED WORK

Perceptron prediction was one of the first attempts to apply neural networks to branch prediction. This predictor improved the misprediction rate on a composite trace of SPEC2000 benchmarks by 14.7% [2]. Unfortunately, it was impractical because of its high latency.

Fast Path-Based Neural Branch Prediction [4] is another attempt; it combines path and pattern history to overcome the limitations of earlier neural predictors. It improved accuracy over previous neural predictors and achieved significantly lower latency. This predictor improved the IPC of an aggressively clocked microarchitecture by 16% over the earlier perceptron predictor.

The Scaled Neural Analog Predictor, or SNAP, is another recently proposed neural branch predictor. It uses the concept of piecewise linear branch prediction and relies on a mixed analog/digital implementation. This predictor decreases latency and power consumption relative to other available neural predictors [5]. Fig. 1 shows the comparative performance of notable branch prediction approaches on a set of SPEC CPU 2000 and 2006 integer benchmarks.

Fig. 1. Performance of different branch predictors over SPEC CPU 2000 and 2006 integer benchmarks (courtesy of "An Optimized Scaled Neural Branch Predictor" by Daniel A. Jimenez)

III. THE ALGORITHM

The branch predictor algorithm has two major parts: i) the prediction algorithm and ii) the train/update algorithm. Before going into the implementation of these two algorithms, we will discuss the state and variables they use. The three-dimensional array W is the data structure that stores the weights of the branches; it is used in both the prediction and the update algorithm.
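As a minimal sketch, the state shared by the prediction and update routines might be declared as below. GA holds the addresses of recent branches and GHR holds their taken / not-taken outcomes. The sizes follow the W[64][16][63] configuration listed in the budget calculation; the names PredictorState, kBranches, kPaths, and kHistory are hypothetical. The report calls shift_update_GA and shift_update_GHR without listing them, so the bodies shown here are assumptions: each shifts the history by one position and places the newest value in slot 0, and the address is reduced modulo the second dimension of W so that it stays a valid index.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sizes, taken from the W[64][16][63] row of the budget table.
constexpr std::size_t kBranches = 64;  // first dimension: hashed branch address
constexpr std::size_t kPaths    = 16;  // second dimension: path-address slots
constexpr std::size_t kHistory  = 63;  // third dimension: history length H

struct PredictorState {
    // One signed byte per weight, as in the budget table.
    std::array<std::array<std::array<std::int8_t, kHistory>, kPaths>, kBranches> W{};
    std::array<std::uint8_t, kHistory> GA{};  // path history: recent branch addresses
    std::array<bool, kHistory> GHR{};         // global history: recent outcomes

    // Assumed behavior: move every entry one slot toward the end and store the
    // newest branch address in slot 0, reduced to a valid second index of W.
    void shift_update_GA(unsigned address) {
        for (std::size_t i = kHistory - 1; i > 0; --i) GA[i] = GA[i - 1];
        GA[0] = static_cast<std::uint8_t>(address % kPaths);
    }

    // Assumed behavior: the same one-slot shift for the branch outcomes.
    void shift_update_GHR(bool taken) {
        for (std::size_t i = kHistory - 1; i > 0; --i) GHR[i] = GHR[i - 1];
        GHR[0] = taken;
    }
};
```

With this layout W occupies 64 * 16 * 63 = 64,512 bytes, which matches the largest entry in the budget calculation.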
Fig. 2: The array W with its corresponding indices

The branch address, which indexes the first dimension, is generally taken as the last 8 to 10 bits of the instruction address. For each branch being predicted, the algorithm keeps a history of the branches that precede it on the dynamic path leading to it. The second dimension, indexed by GA[i], keeps track of this per-branch dynamic path history. The third dimension, shown as GHR[i], corresponds to the position of the address GA[i] in the global branch history register, GHR.

Some of the important variables of the algorithm are listed here for clarity.

GA: An array of addresses. This array keeps the path history associated with each branch address. As a new branch is executed, its address is shifted into the first position of the array.

GHR: An array of Boolean true/false values. This array keeps track of the taken / not-taken status of the branches.

H: Length of the history register.

output: An integer value computed by the predictor algorithm to predict the current branch.

Table I: The prediction algorithm

    branch_update *predict(branch_info &b) {
        bi = b;  // remember the branch for update()
        if (b.br_flags & BR_CONDITIONAL) {
            // 6-bit index: address bits 4-7 form the high part, bits 2-3 the low part
            address = (((b.address >> 4) & 0x0F) << 2) | ((b.address >> 2) & 0x03);
            output = W[address][0][0];  // start from the bias weight
            for (int i = 0; i < H; i++) {
                // add the weight if the i-th history outcome was taken, subtract otherwise
                if (GHR[i])
                    output += W[address][GA[i]][i];
                else
                    output -= W[address][GA[i]][i];
            }
            u.direction_prediction(output >= 0);  // predict taken when output is non-negative
        } else {
            u.direction_prediction(false);
        }
        u.target_prediction(0);
        return &u;
    }

Table II: The update/train algorithm

    void update(branch_update *u, bool taken, unsigned int target) {
        if (bi.br_flags & BR_CONDITIONAL) {
            // train on a misprediction or when the output magnitude is below theta
            if (abs(output) < theta || ((output >= 0) != taken)) {
                // bias weight: saturating increment or decrement
                if (taken) {
                    if (W[address][0][0] < SAT_VAL)
                        W[address][0][0]++;
                } else {
                    if (W[address][0][0] > -SAT_VAL)
                        W[address][0][0]--;
                }
                // history weights: strengthen agreement with the outcome, weaken disagreement
                for (int i = 0; i < H - 1; i++) {
                    if (GHR[i] == taken) {
                        if (W[address][GA[i]][i] < SAT_VAL)
                            W[address][GA[i]][i]++;
                    } else {
                        if (W[address][GA[i]][i] > -SAT_VAL + 1)
                            W[address][GA[i]][i]--;
                    }
                }
            }
            // record this branch in the path and outcome histories
            shift_update_GA(address);
            shift_update_GHR(taken);
        }
    }

IV. TUNING PERFORMANCE

Besides the algorithm itself, the MPKI (misses per kilo-instruction) rate depends on the sizes of the dimensions of the array W. I measured MPKI for several dimension choices of W. Table I shows the results of the experiment.

Table I: MPKI rate of the piecewise linear algorithm with a limited budget of 64K

    W[i][GA[i]][GHR[i]]    MPKI
    W[64][16][64]          3.982
    W[128][16][32]         4.217
    W[64][8][128]          4.292
    W[32][16][128]         5.807
    W[64][64][16]          4.826

The table shows that the predictor performs best when the dimensions indexed by i, GA[i], and GHR[i] have 64, 16, and 64 entries, respectively.

V. TWEAKING INSTRUCTION ADDRESS

I have found that, rather than taking the last bits of the address directly, discarding the 2 least significant bits of the address and then taking bits 3-8 makes the predictor more accurate. This decreases aliasing and thus improves the prediction rate slightly.
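A minimal sketch of this index computation, assuming a 6-bit index (64 entries in the first dimension of W). The helper names branch_index and branch_index_two_part are hypothetical. The first function drops the two least significant bits and keeps the next six; the second writes out the two-part expression used in the prediction algorithm, which yields the same value.

```cpp
#include <cstdint>

// Hypothetical helper: drop the 2 least significant bits of the branch
// address, then keep the next 6 bits as the index into W's first dimension.
constexpr unsigned branch_index(std::uint32_t pc) {
    return (pc >> 2) & 0x3Fu;
}

// The two-part form from the prediction algorithm, written out for comparison:
// bits 4-7 become the high four index bits, bits 2-3 the low two.
constexpr unsigned branch_index_two_part(std::uint32_t pc) {
    return (((pc >> 4) & 0x0Fu) << 2) | ((pc >> 2) & 0x03u);
}

static_assert(branch_index(0x1000u) == 0, "bits 2-7 all clear give entry 0");
static_assert(branch_index(0x1004u) == 1, "address + 4 maps to entry 1");
static_assert(branch_index(0x10FCu) == 63, "bits 2-7 all set give the last entry");
```

Addresses that differ only in their two lowest bits share an entry, while addresses 4 bytes apart land in adjacent entries.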
Fig. 3: Tweaking the branch address for a performance speed-up.

VI. RESULT

The misprediction rate of each benchmark under the piecewise linear algorithm is shown in Fig. 4. Fig. 5 compares the different prediction algorithms (piecewise linear, perceptron, and gshare) across the given benchmarks.

[Figure: bar chart, vertical axis from 0 to 14; benchmarks: 164.gzip, 175.vpr, 176.gcc, 181.mcf, 186.crafty, 197.parser, 201.compress, 202.jess, 205.raytrace, 209.db, 213.javac, 222.mpegaudio, 227.mtrt, 228.jack, 252.eon, 253.perlbmk, 254.gap, 255.vortex, 256.bzip2, 300.twolf]

Fig. 4: Misprediction rate of different benchmarks using the piecewise linear prediction algorithm

Fig. 5: Comparison of prediction algorithms against different benchmarks on the given 64K budget.

VII. 64K BUDGET CALCULATION

I have limited the implementation of the piecewise linear prediction algorithm to 64K + 256 bytes of memory. The algorithm performs better as I increase the memory limit. Table II shows the calculation of the 64K + 256 byte budget.

Table II: 64K (65,536 byte) memory budget limit calculation

    Data structure / array / variable          Memory calculation
    W[64][16][63], 1 byte per entry            64,512 bytes
    Constants (SIZE, H, SAT_VAL, theta, N)     5 * 1 byte (each value < 128)
    GA[63] * 6 bits / 8                        48 bytes
    GHR[63] * 1 bit / 8                        8 bytes
    Variables (address, output) * 4 bytes      8 bytes
    Total:                                     64,581 bytes

VIII. CONCLUSION

In this individual course final project, I have implemented the piecewise linear branch prediction algorithm. In my implementation, I achieved an MPKI of 3.988 at best. I think it is also possible to enhance the performance of this algorithm with better implementation tricks. I have also compared the performance of the piecewise linear prediction algorithm with the perceptron and gshare algorithms. With the same memory limit, piecewise linear prediction performs significantly better than the other two.

REFERENCES

[1] D. A. Jimenez, "Piecewise linear branch prediction," in Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), June 2005.

[2] D. A. Jimenez and C. Lin, "Dynamic branch prediction with perceptrons," in Proceedings of the Seventh International Symposium on High Performance Computer Architecture, January 2001.

[3] A. Lakshminarayanan and S. Shriraghavan, "Neural Branch Prediction," available at http://webspace.ulbsibiu.ro/lucian.vintan/html/neuralpredictors.pdf

[4] D. A. Jimenez, "Fast Path-Based Neural Branch Prediction," Proc. 36th Ann. Int'l Symp. Microarchitecture, pp. 243-252, Dec. 2003.

[5] D. A. Jimenez, "An optimized scaled neural branch predictor," 2011 IEEE 29th International Conference on Computer Design (ICCD), pp. 113-118, Oct. 2011.

More Related Content

PDF
Graph Based Clustering
PDF
Parallel Algorithms K – means Clustering
PDF
Principal component analysis and matrix factorizations for learning (part 1) ...
PPT
Intro to MATLAB and K-mean algorithm
PDF
Parallel-kmeans
PPT
Clustering: Large Databases in data mining
PPTX
Clustering on database systems rkm
PPT
Enhance The K Means Algorithm On Spatial Dataset
Graph Based Clustering
Parallel Algorithms K – means Clustering
Principal component analysis and matrix factorizations for learning (part 1) ...
Intro to MATLAB and K-mean algorithm
Parallel-kmeans
Clustering: Large Databases in data mining
Clustering on database systems rkm
Enhance The K Means Algorithm On Spatial Dataset

What's hot (13)

PPTX
Clustering techniques
PDF
FINITE DIFFERENCE MODELLING FOR HEAT TRANSFER PROBLEMS
PDF
Analysing and combining partial problem solutions for properly informed heuri...
PDF
Clustering: A Survey
PDF
Parallel kmeans clustering in Erlang
PPTX
"FingerPrint Recognition Using Principle Component Analysis(PCA)”
PDF
Alternating direction-implicit-finite-difference-method-for-transient-2 d-hea...
PPT
K mean-clustering
PPTX
Numerical methods for 2 d heat transfer
PDF
On selection of periodic kernels parameters in time series prediction
PPT
Data miningpresentation
PPT
Cluster analysis using k-means method in R
PDF
Data scientist training in bangalore
Clustering techniques
FINITE DIFFERENCE MODELLING FOR HEAT TRANSFER PROBLEMS
Analysing and combining partial problem solutions for properly informed heuri...
Clustering: A Survey
Parallel kmeans clustering in Erlang
"FingerPrint Recognition Using Principle Component Analysis(PCA)”
Alternating direction-implicit-finite-difference-method-for-transient-2 d-hea...
K mean-clustering
Numerical methods for 2 d heat transfer
On selection of periodic kernels parameters in time series prediction
Data miningpresentation
Cluster analysis using k-means method in R
Data scientist training in bangalore
Ad

Viewers also liked (6)

PDF
Cyber Security Exam 2
TXT
Recitation
TXT
Recitation
PDF
Transcription Factor DNA Binding Prediction
DOCX
Attribute Based Encryption
Cyber Security Exam 2
Recitation
Recitation
Transcription Factor DNA Binding Prediction
Attribute Based Encryption
Ad

Similar to Branch prediction contest_report (20)

PPT
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...
PDF
Hybrid branch prediction for pipelined MIPS processor
PPT
Branch prediction
PPT
[2009 11-09] branch prediction
PPT
Lect09 adv-branch-prediction
PPT
Like 2014214
PPTX
improved register value pattern generation for branch prediction
PDF
FPGA configuration of an alloyed correlated branch predictor used with RISC p...
PPTX
Dynamic Branch Prediction - 2 Bit Predicition
PPTX
Dynamic Branch Prediction - 1 Bit Predicition
PPTX
Conditional branches
PDF
Performance and predictability (1)
PDF
Performance and Predictability - Richard Warburton
PPTX
Control hazards MIPS pipeline.pptx
PDF
Branch prediction
PDF
Performance and predictability
DOCX
Evaluation of Branch Predictors
PDF
Meltdown & Spectre
PDF
Instruction level parallelism using ppm branch prediction
PPT
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...
Lec5 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Branch Pred...
Hybrid branch prediction for pipelined MIPS processor
Branch prediction
[2009 11-09] branch prediction
Lect09 adv-branch-prediction
Like 2014214
improved register value pattern generation for branch prediction
FPGA configuration of an alloyed correlated branch predictor used with RISC p...
Dynamic Branch Prediction - 2 Bit Predicition
Dynamic Branch Prediction - 1 Bit Predicition
Conditional branches
Performance and predictability (1)
Performance and Predictability - Richard Warburton
Control hazards MIPS pipeline.pptx
Branch prediction
Performance and predictability
Evaluation of Branch Predictors
Meltdown & Spectre
Instruction level parallelism using ppm branch prediction
Lec6 Computer Architecture by Hsien-Hsin Sean Lee Georgia Tech -- Instruction...

More from UT, San Antonio (20)

PPTX
digital certificate - types and formats
PPTX
Saml metadata
PPTX
Static Analysis with Sonarlint
PPTX
Shellshock- from bug towards vulnerability
PPT
Abac17 prosun-slides
PPT
Abac17 prosun-slides
PDF
Big Data Processing: Performance Gain Through In-Memory Computation
PDF
Enumerated authorization policy ABAC (EP-ABAC) model
PDF
Where is my Privacy presentation slideshow (one page only)
PDF
Three month course
PDF
One month-syllabus
PPT
Zerovm backgroud
PPTX
Security_of_openstack_keystone
PDF
Research seminar group_1_prosunjit
PPT
Final Project Transciption Factor DNA binding Prediction
PPT
Transcription Factor DNA Binding Prediction
PPTX
Secure webbrowsing 1
PPT
On the incoherencies in web browser access control
PPT
Cultural conflict
PPTX
Pair programming
digital certificate - types and formats
Saml metadata
Static Analysis with Sonarlint
Shellshock- from bug towards vulnerability
Abac17 prosun-slides
Abac17 prosun-slides
Big Data Processing: Performance Gain Through In-Memory Computation
Enumerated authorization policy ABAC (EP-ABAC) model
Where is my Privacy presentation slideshow (one page only)
Three month course
One month-syllabus
Zerovm backgroud
Security_of_openstack_keystone
Research seminar group_1_prosunjit
Final Project Transciption Factor DNA binding Prediction
Transcription Factor DNA Binding Prediction
Secure webbrowsing 1
On the incoherencies in web browser access control
Cultural conflict
Pair programming

Branch prediction contest_report

Branch Prediction Contest: Implementation of Piecewise Linear Prediction Algorithm

Prosunjit Biswas
Department of Computer Science, University of Texas at San Antonio

Abstract

Branch predictor accuracy is very important for harnessing the parallelism available in ILP and thus improving the performance of today's microprocessors, especially superscalar processors. Among branch predictors, neural branch predictors, including the Scaled Neural Analog Predictor (SNAP) and the Piecewise Linear Branch Predictor, outperform other state-of-the-art predictors. In this final project for the Computer Architecture course (CS-5513), I have studied various neural predictors and implemented the Piecewise Linear Branch Predictor following the algorithm given in a research paper by Dr. Daniel A. Jimenez. The hardware budget is restricted for this project, and I have implemented the predictor within a predefined hardware budget of 64K of memory. I am also competing in the branch prediction contest.

Keywords: Piecewise Linear, Neural Network, Branch Prediction.

I. INTRODUCTION

Neural branch predictors are the most accurate predictors in the literature, but they were impractical because of the high latency of prediction. This latency comes from the complex computation that must be carried out to determine the excitation of an artificial neuron [3]. Piecewise Linear Branch Prediction [1] improved both accuracy and latency over previous neural predictors. This predictor works by developing a set of linear functions, one for each program path to the branch to be predicted, that separate predicted taken from predicted not taken. In the paper "Piecewise Linear Branch Prediction", Daniel A. Jimenez proposed two versions of the prediction algorithm: i) the Idealized Piecewise Linear Branch Predictor and ii) a Practical Piecewise Linear Branch Predictor. In this project, I have focused on the idealized predictor.

II. RELATED WORKS

Perceptron prediction is one of the first attempts in branch prediction history to perform branch prediction with a neural network. This predictor improved the misprediction rate on a composite trace of SPEC2000 benchmarks by 14.7% [2]. Unfortunately, it was impractical because of its high latency.

Fast Path-Based Neural Branch Prediction [4] is another attempt; it combines path and pattern history to overcome the limitations of earlier neural predictors. It improved accuracy over previous neural predictors and achieved significantly lower latency. This predictor improved the IPC of an aggressively clocked microarchitecture by 16% over the former perceptron predictor.

The Scaled Neural Analog Predictor (SNAP) is another recently proposed neural branch predictor that uses the concept of piecewise-linear branch prediction and relies on a mixed analog/digital implementation. This predictor decreases latency and power consumption relative to other available neural predictors [5]. Fig. 1 shows the comparative performance of noted branch prediction approaches on a set of SPEC CPU 2000 and 2006 integer benchmarks.

Fig. 1. Performance of different branch predictors over SPEC CPU 2000 and 2006 integer benchmarks (courtesy of "An Optimized Scaled Neural Branch Predictor" by Daniel A. Jimenez).

III. THE ALGORITHM

The branch predictor algorithm has two major parts: i) the prediction algorithm and ii) the train/update algorithm. Before going into the implementation of these two algorithms, we discuss the state and variables they use. The three-dimensional array W is the data structure that stores the weights of the branches; it is used in both the prediction and the update algorithm.
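As an illustration of the three-dimensional weight array W described in Section III, the following is a minimal C++ sketch, not the report's actual code. The names WeightTable, kAddrEntries, kPathEntries and kHistoryLen are hypothetical; the dimensions 64 x 16 x 63 and the one-byte weights are taken from the report's budget table, and the flat row-major layout is an assumption made only for this example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical dimensions, taken from the W[64][16][63] row of the budget table.
constexpr std::size_t kAddrEntries = 64;  // first index: hashed address of the branch being predicted
constexpr std::size_t kPathEntries = 16;  // second index: hashed address of an earlier branch on the path
constexpr std::size_t kHistoryLen  = 63;  // third index: position of that earlier branch in the history

struct WeightTable {
    // One signed byte per weight, zero-initialised, stored as one flat row-major array.
    std::array<std::int8_t, kAddrEntries * kPathEntries * kHistoryLen> w{};

    // Returns a reference to the weight W[addr][path][pos].
    std::int8_t& at(std::size_t addr, std::size_t path, std::size_t pos) {
        assert(addr < kAddrEntries && path < kPathEntries && pos < kHistoryLen);
        return w[(addr * kPathEntries + path) * kHistoryLen + pos];
    }

    // Storage used by the weights alone, in bytes.
    static constexpr std::size_t bytes() {
        return kAddrEntries * kPathEntries * kHistoryLen * sizeof(std::int8_t);
    }
};
```

With these assumed dimensions, WeightTable::bytes() evaluates to 64,512, the figure the report lists for W in its memory budget.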
Fig. 2: The array W with its corresponding indices.

The branch address is generally taken as the last 8 to 10 bits of the instruction address. For each predicted branch, the algorithm keeps a history of all other branches that precede this branch in the dynamic path taken to the branch. The second dimension, indexed by the variable GA, keeps track of this per-branch dynamic path history. The third dimension, shown as GHR[i], keeps track of the position of the address GA[i] in the global branch history register, GHR.

Some of the important variables of the algorithm are given here for clarity.

GA: An array of addresses. This array keeps the path history associated with each branch address. As a new branch is executed, the address of the branch is shifted into the first position of the array.

GHR: An array of Boolean true/false values. This array keeps track of the taken / not-taken status of the branches.

H: Length of the history register.

Output: An integer value generated by the prediction algorithm to predict the current branch.

Table I: The prediction algorithm.

    branch_update *predict (branch_info & b) {
        bi = b;
        if (b.br_flags & BR_CONDITIONAL) {
            // Table index: bits 4..7 and bits 2..3 of the branch address.
            address = (((b.address >> 4) & 0x0F) << 2) | ((b.address >> 2) & 0x03);
            // Start from the bias weight, then add or subtract one weight
            // per history position depending on the recorded outcome.
            output = W[address][0][0];
            for (int i = 0; i < H; i++) {
                if (GHR[i] == true)
                    output += W[address][GA[i]][i];
                else if (GHR[i] == false)
                    output -= W[address][GA[i]][i];
            }
            u.direction_prediction(output >= 0);
        } else {
            u.direction_prediction(false);
        }
        u.target_prediction(0);
        return &u;
    }

Table II: The update/train algorithm.

    void update (branch_update *u, bool taken, unsigned int target) {
        if (bi.br_flags & BR_CONDITIONAL) {
            // Train only on a misprediction or when the output is below the threshold.
            if (abs(output) < theta || ((output >= 0) != taken)) {
                // Bias weight, saturating at +/- SAT_VAL.
                if (taken == true) {
                    if (W[address][0][0] < SAT_VAL)
                        W[address][0][0]++;
                } else {
                    if (W[address][0][0] > (-1) * SAT_VAL)
                        W[address][0][0]--;
                }
                // Path weights: strengthen agreement with the outcome, weaken disagreement.
                for (int i = 0; i < H-1; i++) {
                    if (GHR[i] == taken) {
                        if (W[address][GA[i]][i] < SAT_VAL)
                            W[address][GA[i]][i]++;
                    } else {
                        if (W[address][GA[i]][i] > (-1) * SAT_VAL+1)
                            W[address][GA[i]][i]--;
                    }
                }
            }
            shift_update_GA(address);
            shift_update_GHR(taken);
        }
    }

IV. TUNING PERFORMANCE

Besides the algorithm itself, the MPKI (mispredictions per kilo-instruction) rate depends on the sizes of the dimensions of the array W. I have measured MPKI for various dimensions of W. Table I shows the result of the experiment.

Table I: MPKI rate of the Piecewise Linear Algorithm with a limited budget of 64K

    W[i][GA[i]][GHR[i]]    MPKI
    W[64][16][64]          3.982
    W[128][16][32]         4.217
    W[64][8][128]          4.292
    W[32][16][128]         5.807
    W[64][64][16]          4.826

The table shows that the predictor performs best when i, GA[i], and GHR[i] have 64, 16, and 64 entries, respectively.

V. TWEAKING INSTRUCTION ADDRESS

I have found that, rather than taking the last bits of the address, discarding the 2 least significant bits of the address and then taking bits 3-8 makes the predictor more accurate. This decreases aliasing and thus improves the prediction rate slightly.
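The address tweak of Section V can be sketched in isolation as follows. The function names table_index, table_index_simple and naive_index are hypothetical and chosen only for this example. The first function mirrors the index expression in the report's prediction listing, the second is an equivalent single shift-and-mask, and the third is the "last bits" scheme the report moved away from.

```cpp
#include <cassert>
#include <cstdint>

// Six-bit table index built from bits 2..7 of the branch address,
// mirroring the expression in the report's predict() listing.
inline unsigned table_index(std::uint32_t branch_address) {
    unsigned high = (branch_address >> 4) & 0x0F;  // bits 4..7 of the address
    unsigned low  = (branch_address >> 2) & 0x03;  // bits 2..3 of the address
    return (high << 2) | low;                      // value in the range 0..63
}

// The same index written as a single shift and mask.
inline unsigned table_index_simple(std::uint32_t branch_address) {
    return (branch_address >> 2) & 0x3F;
}

// The untweaked alternative: simply keep the six lowest address bits.
inline unsigned naive_index(std::uint32_t branch_address) {
    return branch_address & 0x3F;
}
```

If branch instructions are aligned to 4 bytes (an assumption about the traces, not something the report states), the two lowest address bits are always zero. In that case naive_index can reach only 16 of the 64 table entries, while table_index can reach all 64, which is one plausible reading of the reduced aliasing the report observes.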
Fig. 3: Tweaking the branch address for performance speed-up.

VI. RESULT

The misprediction rate of the benchmarks under the piecewise linear algorithm is shown in Fig. 4. Fig. 5 shows a comparison of different prediction algorithms (piecewise linear, perceptron and gshare) on the given benchmarks.

Fig. 4: Misprediction rate of different benchmarks using the piecewise linear prediction algorithm. (Bar chart; vertical axis from 0 to 14; benchmarks shown: 253.perlbmk, 222.mpegaudio, 300.twolf, 205.raytrace, 255.vortex, 227.mtrt, 256.bzip2, 164.gzip, 181.mcf, 197.parser, 201.compress, 209.db, 186.crafty, 176.gcc, 213.javac, 175.vpr, 202.jess, 252.eon, 254.gap, 228.jack.)

Fig. 5: Comparison of prediction algorithms on different benchmarks with the given 64K budget.

VII. 64K BUDGET CALCULATION

I have limited the implementation of the piecewise linear prediction algorithm to 64K + 256 bytes of memory. The algorithm performs better as I increase the memory limit. Table II shows the calculation of the 64K + 256 byte budget.

Table II: 64 K (65,536 byte) memory budget limit calculation

    Data structure / array / variable          Memory calculation
    W[64][16][63], each entry 1 byte long      64,512 byte
    Constants (SIZE, H, SAT_VAL, theta, N)     5 * 1 byte (each value < 128)
    GA[63] * 6 bits / 8                        48 byte
    GHR[63] * 1 bit / 8                        8 byte
    Variables (address, output) * 4 byte       8 byte
    Total                                      64,581 byte

VIII. CONCLUSION

In this individual course final project, I have implemented the piecewise linear branch prediction algorithm. My implementation achieved an MPKI of 3.988 at best. I think it is also possible to improve the performance of this algorithm with better implementation tricks. I have also compared the performance of the piecewise linear prediction algorithm with the perceptron and gshare algorithms. With the same memory limit, piecewise linear prediction performs significantly better than the other two.

REFERENCES

[1] D. A. Jimenez, "Piecewise linear branch prediction," in Proceedings of the 32nd Annual International Symposium on Computer Architecture (ISCA-32), June 2005.

[2] D. A. Jimenez and C. Lin, "Dynamic branch prediction with perceptrons," in Proceedings of the Seventh International Symposium on High Performance Computer Architecture, January 2001.

[3] A. Lakshminarayanan and S. Shriraghavan, "Neural Branch Prediction," available at http://webspace.ulbsibiu.ro/lucian.vintan/html/neuralpredictors.pdf

[4] D. A. Jimenez, "Fast Path-Based Neural Branch Prediction," Proc. 36th Ann. Int'l Symp. Microarchitecture, pp. 243-252, Dec. 2003.

[5] D. A. Jimenez, "An optimized scaled neural branch predictor," Computer Design (ICCD), 2011 IEEE 29th International Conference, pp. 113-118, Oct. 2011.
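The memory arithmetic of Section VII can be checked with a short compile-time sketch. The constant names below are hypothetical; the sizes are the ones listed in the budget table, and rounding bit counts up to whole bytes is an assumption that matches the 48-byte and 8-byte entries given there.

```cpp
#include <cassert>
#include <cstddef>

// Rounds a bit count up to whole bytes.
constexpr std::size_t bits_to_bytes(std::size_t bits) { return (bits + 7) / 8; }

constexpr std::size_t kWeightBytes   = 64 * 16 * 63 * 1;       // W[64][16][63], 1 byte per weight
constexpr std::size_t kConstantBytes = 5 * 1;                  // SIZE, H, SAT_VAL, theta, N
constexpr std::size_t kGABytes       = bits_to_bytes(63 * 6);  // 63 path entries of 6 bits each
constexpr std::size_t kGHRBytes      = bits_to_bytes(63 * 1);  // 63 history bits
constexpr std::size_t kVariableBytes = 2 * 4;                  // address and output, 4 bytes each

constexpr std::size_t kTotalBytes =
    kWeightBytes + kConstantBytes + kGABytes + kGHRBytes + kVariableBytes;

constexpr std::size_t kBudgetBytes = 64 * 1024 + 256;          // the 64K + 256 byte limit

static_assert(kTotalBytes == 64581, "total matches the budget table");
static_assert(kTotalBytes <= kBudgetBytes, "predictor state fits in 64K + 256 bytes");
```

Both static assertions hold, so under these assumptions the predictor state stays 1,211 bytes below the 65,792-byte limit.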