Managing Python at scale
without breaking the bank
Michael (Misha) Tselman
PyData NY 2017
Agenda
• J.P. Morgan and Athena
• Objectives
• Continuous delivery
• Under the hood
• Challenges
• Conclusions
• Q&A
J.P. Morgan
• One of the world’s biggest banks
  • $2.5 trillion in assets
  • $95 billion in revenue
  • Processing $5 trillion in payments every day
  • 230,000+ employees globally
• One of the world’s biggest tech companies
  • 44,000+ employees in Technology
  • $9.5 billion annual investment in technology and innovation
Athena
• Python-based Pricing, Trading, Risk Management, and Analytics
platform with tools for Data Science and Machine Learning
• Thousands of users across multiple business lines
• 1500+ Python developers use and contribute to the platform
• 150,000 Python modules, 35 million lines of Python code
• 500+ open source Python packages
• Rapid development and deployment model that puts developers and
quants at the heart of the business.
Athena Foundation
• Hydra ( globally replicated object database )
• Reactive Athena ( C++/Python reactive dataflow framework )
• Pixie Graph ( directed acyclic dependency graph )
• Athena Application framework based on Qt
• Athena Web ( Tornado, HTML5, WebSockets, JavaScript, WebAssembly )
• Job scheduling ( ~270,000 jobs daily, which kick off ~1M processes )
• Integration with Compute Grid ( tens of thousands of cores + GPUs )
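The slide names Pixie Graph only as a directed acyclic dependency graph and gives no API. As a rough illustration of the general idea, and not of Athena's implementation, the sketch below evaluates a small graph so that each node is computed once, after its dependencies. The node names and the `evaluate` helper are hypothetical; `graphlib` is in the standard library from Python 3.9.

```python
from graphlib import TopologicalSorter

# Hypothetical graph: node name -> (dependency names, function of their values).
NODES = {
    "spot": ((), lambda: 100.0),
    "quantity": ((), lambda: 3),
    "notional": (("spot", "quantity"), lambda spot, quantity: spot * quantity),
    "fee": (("notional",), lambda notional: notional * 0.01),
}


def evaluate(nodes):
    """Compute every node exactly once, after all of its dependencies."""
    sorter = TopologicalSorter({name: deps for name, (deps, _) in nodes.items()})
    values = {}
    # static_order() yields each node only after the nodes it depends on.
    for name in sorter.static_order():
        deps, func = nodes[name]
        values[name] = func(*(values[dep] for dep in deps))
    return values


values = evaluate(NODES)
print(values["notional"])
```

`static_order()` raises `graphlib.CycleError` if the dependencies contain a cycle, which is one simple way to enforce the "acyclic" part.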
Objectives
• Keep end-users and clients happy
• Ensure robustness and stability of our production systems
• Keep developers productive and efficient
• Provide quants and data scientists with the best research tools
• Encourage sharing and global consistency across business lines
Approach
• Conceptually:
  • Continuous delivery:
    • 10,000 – 15,000 production changes every week.
  • Full visibility of the entire code base. Anyone can contribute.
  • Instant global deployment
• Under the hood:
  • Globally replicated object databases for code (and data)
  • Monorepo – Monolithic code base
  • Extensively automated testing
Continuous delivery
Write code & tests → Test → Commit → Ask for a bless → Push → Run
PROS
• Time to market
• User satisfaction
• Developer productivity
CONS
• Fear of change / stability
• High reliance on automation
• Tricky in distributed systems
10,000 - 15,000 modules pushed to production every week
Layering of changes / Effective runtime
Developer’s layer:     B3 C2
Shared staging / UAT:  A2 B2
Production:            A1 B1 C1 D1 E1
Effective runtime:     A2 B3 C2 D1 E1
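One way to read the layering on this slide: for each module, the effective runtime takes the version from the first layer that defines it, searching the developer's layer, then shared staging / UAT, then production. A minimal sketch of that lookup, assuming each layer is a plain mapping from module name to version label (the dictionaries and `resolve` are illustrative, not Athena's API):

```python
from collections import ChainMap

# Hypothetical layers: module name -> version label, taken from the slide.
production = {"A": "A1", "B": "B1", "C": "C1", "D": "D1", "E": "E1"}
staging = {"A": "A2", "B": "B2"}    # shared staging / UAT
developer = {"B": "B3", "C": "C2"}  # one developer's private layer

# ChainMap searches its maps left to right, so the first layer
# that defines a module wins.
effective = ChainMap(developer, staging, production)


def resolve(module: str) -> str:
    """Return the version of `module` visible in the effective runtime."""
    return effective[module]


print({name: resolve(name) for name in sorted(production)})
# → {'A': 'A2', 'B': 'B3', 'C': 'C2', 'D': 'D1', 'E': 'E1'}
```

Because the layers are consulted at lookup time, removing an entry from the developer's layer exposes the staging or production version again without copying anything.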
Alternatives to filesystem based source
Replicas: DB-LDN, DB-NYC, DB-TKO
Key                             Value (stored source)
"lib.foo"                       def hello(): print 'world'
"lib.bar"                       def hello(): print 'pydata'
"lib.bar @ 2017-10-01 12:33"    def hello(): print 'jpmorgan'
"lib.bar @ 2017-09-21 10:16"    "..."
• Use globally replicated database
• Customize the importer
• SourceMarkers - Take advantage of transactions & timestamps
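The three bullets above can be sketched with the standard import machinery: a finder/loader on `sys.meta_path` that reads source from a key-value store and selects the newest version at or before a given timestamp. This is an illustration under stated assumptions, not Athena's importer. `SOURCE_DB`, `DatabaseFinder`, and `import_as_of` are hypothetical names, a plain dict stands in for the replicated database, the timestamps for `lib`, `lib.foo`, and the newest `lib.bar` are invented, and the functions return strings rather than printing so the result is easy to check.

```python
import importlib
import importlib.abc
import importlib.util
import sys

# Hypothetical stand-in for the replicated database:
# (module name, commit timestamp) -> source text.
# "YYYY-MM-DD HH:MM" strings sort chronologically, so string comparison works.
SOURCE_DB = {
    ("lib", "2017-09-01 09:00"): "",  # package marker (assumed)
    ("lib.foo", "2017-09-01 09:00"): "def hello():\n    return 'world'\n",
    ("lib.bar", "2017-10-01 12:33"): "def hello():\n    return 'jpmorgan'\n",
    ("lib.bar", "2017-11-01 09:00"): "def hello():\n    return 'pydata'\n",
}


class DatabaseFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Import modules from `db` as they stood at the timestamp `as_of`."""

    def __init__(self, db, as_of):
        self.db = db
        self.as_of = as_of

    def _lookup(self, fullname):
        # Newest stored version that is not later than the marker.
        stamps = [ts for (name, ts) in self.db
                  if name == fullname and ts <= self.as_of]
        return self.db[(fullname, max(stamps))] if stamps else None

    def find_spec(self, fullname, path=None, target=None):
        if self._lookup(fullname) is None:
            return None  # not ours: let the regular importers try
        is_package = any(name.startswith(fullname + ".") for (name, _) in self.db)
        return importlib.util.spec_from_loader(fullname, self, is_package=is_package)

    def create_module(self, spec):
        return None  # use the default module object

    def exec_module(self, module):
        source = self._lookup(module.__name__)
        exec(compile(source, f"<db:{module.__name__}>", "exec"), module.__dict__)


def import_as_of(name, as_of, db=SOURCE_DB):
    """Import `name` from `db` as of `as_of`, bypassing previously cached copies."""
    root = name.partition(".")[0]
    for cached in [m for m in sys.modules if m == root or m.startswith(root + ".")]:
        del sys.modules[cached]
    finder = DatabaseFinder(db, as_of)
    sys.meta_path.insert(0, finder)
    try:
        return importlib.import_module(name)
    finally:
        sys.meta_path.remove(finder)


print(import_as_of("lib.bar", "2017-10-15 00:00").hello())  # → jpmorgan
print(import_as_of("lib.bar", "2017-12-01 00:00").hello())  # → pydata
```

Pinning the timestamp gives every process the same view of the code base regardless of which replica it reads from, which is presumably what the slide's source markers are for; a real system would also need transactional reads and caching, which this sketch omits.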
Python and Binary Runtime
[Diagram: Python Source (top) above a release train of C++ & 3rd party binaries (bottom): prod old, prod, prod new]
Some Challenges
• Open source package upgrades
  • API changes
  • Change of pickled/stored representation
  • Numerical changes
  • Runtime/binary dependencies
• Limited branching
  • Streamlines production
  • Does not fit some research/experimental workflows
• Full reproducibility requires “freezing” all code including the binary train
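The "change of pickled/stored representation" challenge has a common mitigation: store a schema version next to the payload and upgrade old records step by step on load. The sketch below shows that pattern under stated assumptions. It is not taken from the talk, and the record layout, `dump_record`, `load_record`, and the `currency` field are all hypothetical.

```python
import pickle

CURRENT_VERSION = 2  # hypothetical schema version of the stored record


def dump_record(payload: dict) -> bytes:
    """Store the payload together with the schema version that wrote it."""
    return pickle.dumps({"version": CURRENT_VERSION, "payload": payload})


def _v1_to_v2(payload: dict) -> dict:
    # Assumed change: version 2 added an explicit currency field.
    return {**payload, "currency": payload.get("currency", "USD")}


MIGRATIONS = {1: _v1_to_v2}  # version N -> function producing version N + 1


def load_record(blob: bytes) -> dict:
    """Load a record written by any known version and upgrade it step by step."""
    record = pickle.loads(blob)
    version, payload = record["version"], record["payload"]
    while version < CURRENT_VERSION:
        payload = MIGRATIONS[version](payload)
        version += 1
    return payload


# A blob written before the currency field existed still loads.
old_blob = pickle.dumps({"version": 1, "payload": {"notional": 1_000_000}})
print(load_record(old_blob))
# → {'notional': 1000000, 'currency': 'USD'}
```

Keeping the payload to built-in types also sidesteps the case where an upgraded package renames or moves a class that older pickles refer to.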
Conclusions
• Python’s flexibility makes things easier
• Good integration tests ensure compatibility and consistency
• Modules don’t have to be loaded from a filesystem
• Production stability does not imply slow delivery and deployment
• Open source does not imply free
• Shared platform does not imply shared knowledge
References
• J.P. Morgan
http://www.jpmorgan.com/techcareers
• The motivation for a monolithic codebase
http://cacm.acm.org/magazines/2016/7/204032-why-google-stores-billions-of-lines-of-code-in-a-single-repository/fulltext
Q&A

Editor's Notes

  • #9: Ghost tests.
  • #12: At the bottom, a more traditional release train of binaries and 3rd party packages. prod.new continuously changing until lockdown. Allows for seamless testing of new features against the Python baseline. Globally distributed and instantly available for import in any region for any user. Not just a repo, but a fully deployed codebase at the same time.