Welcome to Redefining Perspectives
November 2012
Capital Markets Risk Management
And Hadoop
Kevin Samborn and
Nitin Agrawal




       © COPYRIGHT 2012 SAPIENT CORPORATION | CONFIDENTIAL   2
Agenda


• Risk Management
• Hadoop
• Monte Carlo VaR Implementation
• Q&A




Risk Management



What is Risk Management?
•   Risk is a tool – the goal is to optimize and understand risk
    o Too much risk is locally and systemically dangerous
    o Too little risk means the firm may be “leaving profit on the table”
•   Portfolio exposure
    o Modern portfolios contain many different types of assets
    o Simple instruments, complex instruments, and derivatives
•   Many types of risk measures
    o Defined scenario-based stress testing
    o Value at Risk (VaR)
    o “Sensitivities”
•   Key is valuation under different scenarios
•   VaR is used in banking regulations, margin calculations and risk
    management
Value at Risk (VaR)
• VaR is a statistical measure of risk – the amount of loss at a given
  confidence level, e.g. a 97.5% chance that the firm will not lose more than
  USD 1 million over the next 5 days
• Computing VaR is a challenging data-sourcing and compute-intensive process
• VaR calculation:
  o Generate statistical scenarios of market behavior
  o Revalue the portfolio for each scenario, compare returns to today’s value
  o Sort results and select the desired percentage return: VALUE AT RISK
• Different VaR techniques:
  o Parametric – analytic approximation
  o Historical – captures real (historical) market dynamics
  o Monte Carlo – many scenarios, depends on statistical distributions
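Whatever the technique, the last two calculation steps above (sort results, pick the desired percentile) reduce to the same arithmetic. A minimal sketch in plain Java (not from the deck; method and parameter names are illustrative):

```java
import java.util.Arrays;

public class VarCalc {
    // Given simulated scenario P&L outcomes (losses negative), return the VaR
    // at the given confidence level, expressed as a positive loss amount.
    static double valueAtRisk(double[] pnl, double confidence) {
        double[] sorted = pnl.clone();
        Arrays.sort(sorted);                             // worst outcomes first
        int idx = (int) Math.floor((1.0 - confidence) * sorted.length);
        return -sorted[idx];                             // loss at the (1-c) percentile
    }

    public static void main(String[] args) {
        // 100 equally likely scenario P&Ls: -50, -49, ..., +49
        double[] pnl = new double[100];
        for (int i = 0; i < 100; i++) pnl[i] = i - 50;
        // 97.5% VaR: the loss not exceeded in 97.5% of scenarios
        System.out.println(valueAtRisk(pnl, 0.975));
    }
}
```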
VaR Graphically




Source: An Introduction To Value at Risk (VAR), Investopedia, May 2010

Complexities
• For modern financial firms, VaR is complex. Calculation requirements:
  o Different types of assets require different valuation models
    •   Risk-based approach
    •   Full revaluation
  o With large numbers of scenarios, many thousands of calculations are required
  o Monte Carlo simulations require significant calibration, depending on large historical
    data
• Many different reporting dimensions
  o VaR is not additive across dimensions (product/asset class, currency)
  o Portfolio – including “what-if” and intraday activity
• Intraday market changes requiring new simulations
• Incremental VaR – how does a single (new) trade contribute to the total?
Backtesting VaR




Hadoop Core


HDFS (storage):
• Data stored with REDUNDANCY on a Distributed File System
• Abstracts H/W FAILURES, delivering a highly available service on COMMODITY H/W
• SCALES from a single node to thousands of nodes
• Data stored WITHOUT A SCHEMA
• Tuned for SEQUENTIAL DATA ACCESS

MapReduce (processing):
• Provides an EASY ABSTRACTION for processing large data sets
• Infrastructure for PARALLEL DATA PROCESSING across huge commodity clusters
• Infrastructure for TASK and LOAD MANAGEMENT
• Framework achieves DATA-PROCESS LOCALITY

Makes two critical assumptions though:
• Data doesn’t need to be updated
• Data doesn’t need to be accessed randomly
A Simple Map Reduce Job
   Problem Statement: From historical price data, create a frequency distribution of the
   1-day percentage change for various stocks
Stock   Date     Open     Close
BP      23-Nov   435.25   435.5
NXT     23-Nov   3598     3620
MKS     23-Nov   378.5    380.7
BP      22-Nov   434.8    433.6
NXT     22-Nov   3579     3603
MKS     22-Nov   377.8    378
BP      21-Nov   430.75   433
NXT     21-Nov   3574     3582
MKS     21-Nov   375      376
BP      20-Nov   430.9    432.25
NXT     20-Nov   3592     3600
MKS     20-Nov   373.7    375.3
BP      19-Nov   422.5    431.6
NXT     19-Nov   3560     3600
MKS     19-Nov   368.5    372.6
BP      16-Nov   423.9    416.6
NXT     16-Nov   3575     3542
MKS     16-Nov   370.3    366.4
BP      15-Nov   422      425.4
NXT     15-Nov   3596     3550
MKS     15-Nov   376.5    370.6

[Diagram: the input table feeds Map 1 … Map M, whose output passes through SORT/SHUFFLE to Reduce 1 … Reduce N, producing counts such as BP|1, 33; BP|2, 64; NXT|81, 2; NXT|-20, 5.]

public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
  SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
  context.write(new Text(sa.getTicker()),
      new IntWritable(sa.getPercentChange()));
}

public void reduce(Text key, Iterable<IntWritable> values, Context context)
    throws IOException, InterruptedException {
  Map<Integer, Long> freqDist = buildFreqDistribution(values);
  Set<Integer> percentChanges = freqDist.keySet();
  for (Integer percentChange : percentChanges) {
    context.write(new Text(key.toString() + "|" + percentChange.toString()),
        new LongWritable(freqDist.get(percentChange)));
  }
}
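For intuition, the same frequency distribution can be computed locally in plain Java. This sketch (helper names are ours, not the deck's) mirrors what the map/reduce pair does, using a few rows of the BP data from the table:

```java
import java.util.HashMap;
import java.util.Map;

public class FreqDist {
    // Percentage change of close over open for one row, rounded to the
    // nearest whole percent (the bucketing the job appears to use).
    static int percentChange(double open, double close) {
        return (int) Math.round((close - open) / open * 100.0);
    }

    // Local equivalent of map + reduce: bucket rows by "TICKER|change"
    // and count occurrences per bucket.
    static Map<String, Long> freqDist(String[] tickers, double[] opens, double[] closes) {
        Map<String, Long> dist = new HashMap<>();
        for (int i = 0; i < tickers.length; i++) {
            String key = tickers[i] + "|" + percentChange(opens[i], closes[i]);
            dist.merge(key, 1L, Long::sum);
        }
        return dist;
    }

    public static void main(String[] args) {
        String[] tickers = {"BP", "BP", "BP"};
        double[] opens  = {435.25, 434.8, 422.5};
        double[] closes = {435.5, 433.6, 431.6};
        System.out.println(freqDist(tickers, opens, closes));
    }
}
```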
Hadoop Ecosystem | How/Where These Fit

[Diagram: Hadoop ecosystem stack — LOAD tools (Sqoop, hiho, Scribe, Flume) feed STORAGE; PROCESSING sits on top of storage; SUPPORT tools (ZooKeeper, HUE) run alongside; a DATA WAREHOUSE and VISUALIZATION TOOLS serve USERS.]
Monte-Carlo VaR Implementation



Monte Carlo VaR
Two steps:

1. SIMULATION – for each instrument (IBM, MSFT, IBM.CO, …), generate simulated
   values V1, V2, V3, … V10,000.
2. AGGREGATION – for each simulation i, combine positions (amount Ai times
   value Vi) into a hierarchy-level value:
   HLV1 = (∑AiVi)1, HLV2 = (∑AiVi)2, … HLV10k = (∑AiVi)10k

Challenges
• Daily trade data could be massive
• Valuations are compute intensive
• VaR is not a simple arithmetic sum across hierarchies
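The simulation step's "random walks" are typically geometric Brownian motion draws. The deck does not show its model, so the drift and volatility below are illustrative assumptions:

```java
import java.util.Random;

public class GbmSim {
    // One-step geometric Brownian motion:
    // S * exp((mu - sigma^2/2)*dt + sigma*sqrt(dt)*z), z ~ N(0,1)
    static double step(double s, double mu, double sigma, double dt, double z) {
        return s * Math.exp((mu - 0.5 * sigma * sigma) * dt + sigma * Math.sqrt(dt) * z);
    }

    // Generate n simulated one-day prices for one underlyer (V1 ... Vn).
    static double[] simulate(double spot, double mu, double sigma, double dt,
                             int n, long seed) {
        Random rng = new Random(seed);
        double[] v = new double[n];
        for (int i = 0; i < n; i++) v[i] = step(spot, mu, sigma, dt, rng.nextGaussian());
        return v;
    }

    public static void main(String[] args) {
        // 10,000 simulated prices for a 191.23 spot, 5% drift, 25% vol, 1 trading day
        double[] v = simulate(191.23, 0.05, 0.25, 1.0 / 252, 10_000, 42L);
        System.out.println(v.length);
    }
}
```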
Simulation Step - MapReduce

[Diagram: the instrument table (IBM, MSFT, IBM.CO, …) feeds the SIMULATION, producing V1, V2, V3, …]

MAP
- Read through portfolio data
- Emit (K,V) as (Underlyer, InstrumentDetails), e.g. (IBM, IBM.CO.DEC14.225)

REDUCE
- For the underlyer, perform 10k random walks in parallel
- For each random walk output, simulate derivative prices
- Emit 10k sets of simulated prices of the stock and associated derivatives, i.e.
  IBM,              [V1, V2, … V10000]
  IBM.CO.DEC14.225, [V1, V2, … V10000]

Job job = new Job(getConf());
job.setJobName("RandomValuationGenerator");
job.setMapperClass(SecurityAttributeMapper.class);
job.setReducerClass(PriceSimulationsReducer.class);
…

public void map(LongWritable key, Text value, Context context)
    throws IOException, InterruptedException {
  SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
  context.write(new Text(sa.getUnderlyer()), sa);
}

// Reducer body (excerpt)
SecurityAttributes stockAttrib = (SecurityAttributes) iter.next();
simPricesStock = getSimPricesForStock(stockAttrib);
writeReducerOutput(stockAttrib, simPricesStock, context);
bsmp = new BlackScholesMertonPricingOption();
while (iter.hasNext()) {
  SecurityAttributes secAttribs = iter.next();
  writeReducerOutput(secAttribs,
      getSimPricesForOptions(simPricesStock, bsmp, secAttribs), context);
}
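BlackScholesMertonPricingOption is referenced in the reducer but not shown. The standard Black-Scholes call price it would need can be sketched as follows (this is our sketch, using the Zelen-Severo polynomial approximation of the normal CDF, not the deck's implementation):

```java
public class Bsm {
    // Standard normal CDF via the Zelen & Severo polynomial approximation
    // (Abramowitz & Stegun 26.2.17, |error| < 7.5e-8).
    static double cdf(double x) {
        double t = 1.0 / (1.0 + 0.2316419 * Math.abs(x));
        double poly = t * (0.319381530 + t * (-0.356563782
                + t * (1.781477937 + t * (-1.821255978 + t * 1.330274429))));
        double tail = Math.exp(-x * x / 2.0) / Math.sqrt(2.0 * Math.PI) * poly;
        return x >= 0 ? 1.0 - tail : tail;
    }

    // Black-Scholes price of a European call:
    // C = S*N(d1) - K*exp(-rT)*N(d2)
    static double call(double s, double k, double r, double sigma, double t) {
        double d1 = (Math.log(s / k) + (r + 0.5 * sigma * sigma) * t)
                / (sigma * Math.sqrt(t));
        double d2 = d1 - sigma * Math.sqrt(t);
        return s * cdf(d1) - k * Math.exp(-r * t) * cdf(d2);
    }

    public static void main(String[] args) {
        // ATM call: S=100, K=100, r=5%, vol=20%, T=1y
        System.out.println(call(100, 100, 0.05, 0.2, 1.0)); // ~10.45
    }
}
```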
Aggregation Step - MapReduce

[Diagram: the aggregation produces HLV1 = (∑AiVi)1, HLV2 = (∑AiVi)2, …]
MAP
- Read through de-normalized portfolio data
- Emit (K,V) as (Hierarchy-level, PositionDetails), e.g.
  US,           [IBM, 225, 191.23]
  US|Tech,      [IBM, 400, 191.23]
  US|Tech|Eric, [IBM, 400, 191.23]

REDUCE
• For the hierarchy level (e.g. US|ERIC), perform ∑AiVi for each simulation and
  get the simulated portfolio values HLVi
• Sort HLVi, find the 1%, 5% and 10% values, and emit position and VaR data
protected void map(LongWritable key, HoldingWritable value, Context context)
    throws java.io.IOException, InterruptedException {
  SecurityAttributes sa = RecordsReadHelper.readAttribs(value.toString());
  Set<String> hierarchyLevels = sa.getHierarchyLevels();
  for (String hierarchyLevel : hierarchyLevels) {
    context.write(new Text(hierarchyLevel), new Text(sa.getPositionDtls()));
  }
}

// Reducer body (excerpt)
Map<String, Double> portfolioPositionData = combineInputForPFPositionData(rows);
Map<String, Double[]> simulatedPrices =
    loadSimulatedPrices(portfolioPositionData.keySet());
for (long i = 0; i < NO_OF_SIMULATIONS; i++) {
  simulatedPFValues.add(getPFSimulatedValue(i, portfolioPositionData, simulatedPrices));
}
Collections.sort(simulatedPFValues);
emitResults(portfolioPositionData, simulatedPFValues);
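Outside Hadoop, the reducer's arithmetic is just ∑AiVi per simulation followed by a percentile pick over the sorted results. A self-contained sketch (names are illustrative, not from the deck):

```java
import java.util.Arrays;

public class Aggregate {
    // amounts[p] = position size A of instrument p; prices[p][i] = simulated
    // value V of instrument p in simulation i. Returns HLV_i = sum_p A_p * V_p,i.
    static double[] hierarchyLevelValues(double[] amounts, double[][] prices) {
        int nSim = prices[0].length;
        double[] hlv = new double[nSim];
        for (int p = 0; p < amounts.length; p++)
            for (int i = 0; i < nSim; i++)
                hlv[i] += amounts[p] * prices[p][i];
        return hlv;
    }

    // Simulated portfolio value at the given tail percentile (e.g. 0.01, 0.05, 0.10).
    static double percentileValue(double[] hlv, double pct) {
        double[] sorted = hlv.clone();
        Arrays.sort(sorted);
        return sorted[(int) Math.floor(pct * sorted.length)];
    }

    public static void main(String[] args) {
        // Two positions across four simulations
        double[] amounts = {2, 10};
        double[][] prices = {{10, 12, 8, 11}, {5, 4, 6, 5}};
        double[] hlv = hierarchyLevelValues(amounts, prices);
        System.out.println(percentileValue(hlv, 0.25));
    }
}
```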
DEMO RUN




Observations
• As expected, the processing time of the Map jobs increased only marginally
  when the input data volume was increased
• The process was I/O-bound in the simulation step's Reduce job because the
  intermediate data emitted was huge
• The data replication factor needs to be chosen carefully
• MapReduce jobs should be designed so that Map/Reduce output is not huge




Questions?


Thank You!


Appendix




Let’s build a Simple Map Reduce Job
    Problem Statement: Across a huge set of documents, we need to find all locations (i.e.
    document, page, line) for all words having more than 10 characters.




[Diagram: documents are stored in blocks across DATA NODE 1 and DATA NODE 2 (STORAGE); each node runs Map tasks against its local blocks (Store → Map).]
