Ruby to Scala in 9 Weeks 
Evolving antiquated software with modern tools 
Jake Utley (jutley@uw.edu) 
Sept. 9, 2014
WhitePages 2 
Areas of interest over SDLC 
• Understanding the problem 
• Exploring solution space 
• Speed of prototyping 
• Performance 
• Maintainability 
• Integration 
prototyping → production
Scala and Ruby Similarities 
• Readable and concise 
• Object oriented 
• Highly composable (traits and mixins) 
• Highly adaptable
Background: Location Services 
• An internal service at Whitepages, written in Ruby 
• Rationalizes Foursquare data against Whitepages people data 
• Modified to use data from Facebook, LinkedIn, and Twitter 
• Contained many issues that motivated rewrite 
– Unclear code 
– Sloppy code organization 
– Poor performance 
– Minimal documentation
Migration to Scala 
• Concurrency 
– Ruby EventMachine vs. Scala Futures 
• Type system 
• Performance 
– Throughput 
– Latency 
– Hardware Utilization
CONCURRENCY
Concurrency
• Ruby: No built-in concurrency
– Many implementations are single-threaded
– Cooperative multitasking comes from external libraries
• Scala: Built-in concurrency
– Built on top of the JVM
– Supports standard concurrency models
o Futures, Actors
Ruby Concurrency: EventMachine
• Ruby library for cooperative multitasking
• Uses the reactor pattern
• Dangerous: blocking incorrectly can kill entire app
• Long chains of callbacks
• Location Services author created DeferrableSequence to manage callbacks
Wikipedia: http://bit.ly/1qHMez5
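To make the reactor pattern and its blocking hazard concrete, here is a minimal sketch in Scala. It is not EventMachine's API and not the Location Services code; `Event`, `Reactor`, and `ReactorDemo` are hypothetical names invented for illustration. One loop dispatches queued events to registered callbacks on a single thread, and each step of the work triggers the next by emitting another event.

```scala
import scala.collection.mutable

// A toy single-threaded reactor; all names here are hypothetical.
final case class Event(name: String, payload: String)

final class Reactor {
  private val handlers = mutable.Map.empty[String, Event => Unit]
  private val queue = mutable.Queue.empty[Event]

  // Register the callback that handles events with the given name.
  def on(name: String)(handler: Event => Unit): Unit =
    handlers.update(name, handler)

  // Queue an event; nothing runs until the loop reaches it.
  def emit(event: Event): Unit = queue.enqueue(event)

  // Dispatch events one at a time on the calling thread. A handler that
  // blocks (a sleep, a synchronous I/O call) stalls every event queued behind it.
  def run(): Unit =
    while (queue.nonEmpty) {
      val event = queue.dequeue()
      handlers.get(event.name).foreach(handler => handler(event))
    }
}

object ReactorDemo {
  // Each step finishes by emitting the event that triggers the next one,
  // so the control flow is spread across a chain of callbacks.
  def trace(): List[String] = {
    val log = mutable.ListBuffer.empty[String]
    val reactor = new Reactor
    reactor.on("request") { e =>
      log += s"request ${e.payload}"
      reactor.emit(Event("lookup", e.payload))
    }
    reactor.on("lookup") { e =>
      log += s"lookup ${e.payload}"
      reactor.emit(Event("respond", e.payload))
    }
    reactor.on("respond") { e =>
      log += s"respond ${e.payload}"
    }
    reactor.emit(Event("request", "a"))
    reactor.emit(Event("request", "b"))
    reactor.run()
    log.toList
  }

  def main(args: Array[String]): Unit = trace().foreach(println)
}
```

Because there is only one dispatch thread, the two requests are interleaved step by step rather than run in parallel, and the path of a single request has to be pieced together from three separate callbacks.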
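Scala Concurrency: Futures
import scala.concurrent.Future
import scala.concurrent.ExecutionContext.Implicits.global

// Runs asynchronously on the global execution context
val x = Future {
  1 + 1
}
// map transforms the eventual result without blocking
val y = x map { two =>
  two * 2
}
y foreach println // prints 4 once y completes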
• Well known concurrency model 
• Built into Scala 
• Functional and imperative: 
– Composition (via map, zip, etc.) 
– Callbacks (onComplete, etc.) 
• Blends nicely into other code 
• Can be treated like a 1-element collection
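The two styles named above, composition and callbacks, can be sketched side by side. This is an illustrative example, not code from the service: `FutureComposition`, `price`, and `tax` are hypothetical names, and `Await` appears only so the demo does not exit before the futures finish.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._
import scala.util.{Failure, Success}

object FutureComposition {
  // Two independent computations that may run concurrently (hypothetical values).
  def price: Future[Int] = Future { 40 }
  def tax: Future[Int] = Future { 2 }

  // Composition: zip pairs the two results, map combines them.
  def total: Future[Int] = price.zip(tax).map { case (p, t) => p + t }

  // The same computation as a for-comprehension (sugar for flatMap and map).
  // Both futures are created first so that they can overlap in time.
  def totalFor: Future[Int] = {
    val p = price
    val t = tax
    for (a <- p; b <- t) yield a + b
  }

  def main(args: Array[String]): Unit = {
    val result = total
    // Callback: onComplete exists for side effects and returns Unit.
    result.onComplete {
      case Success(v) => println(s"total = $v")
      case Failure(e) => println(s"failed: ${e.getMessage}")
    }
    // Blocking here is for the demo only; production code would keep composing.
    Await.ready(result, 2.seconds)
    Await.ready(totalFor, 2.seconds)
  }
}
```

The for-comprehension reads like code over a one-element collection, which is the sense in which a Future blends into other Scala code.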
Scala Concurrency: Futures 
• Drawbacks 
– Have to be used in a non-blocking way (without Await)
– Confusing types
o Future[Future[Int]]
o Seq[Future[Int]]
o flatMap and sequence functions help to avoid these types
– Difficult to debug
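A short sketch of how the confusing types arise and how `flatMap` and `Future.sequence` remove them. The lookups `userId` and `score` are hypothetical stand-ins for real asynchronous calls, and `Await` is used only to print results in the demo.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FlattenFutures {
  // Hypothetical asynchronous lookups, stand-ins for real service calls.
  def userId(name: String): Future[Int] = Future { name.length }
  def score(id: Int): Future[Int] = Future { id * 10 }

  // map nests the second future inside the first: Future[Future[Int]].
  def nested(name: String): Future[Future[Int]] = userId(name).map(score)

  // flatMap flattens the result to a plain Future[Int].
  def flat(name: String): Future[Int] = userId(name).flatMap(score)

  // Mapping over a collection yields Seq[Future[Int]];
  // Future.sequence turns it inside out into Future[Seq[Int]].
  def allScores(ids: Seq[Int]): Future[Seq[Int]] =
    Future.sequence(ids.map(score))

  def main(args: Array[String]): Unit = {
    println(Await.result(flat("ruby"), 2.seconds))
    println(Await.result(allScores(Seq(1, 2, 3)), 2.seconds))
  }
}
```

The rule of thumb this illustrates: when the function passed to `map` itself returns a Future, `flatMap` is usually what was meant.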
TYPE SYSTEMS
Type Systems 
• Ruby: Dynamically typed 
• Scala: Statically typed 
• Both have advantages depending on the circumstance
Type Systems: Ruby 
• Dynamic typing 
– Types are checked at runtime 
– More flexibility 
– Easier to express ideas -> faster prototyping 
– More room for errors 
– Less self-documentation, potentially harder for others to understand code
– Defensive coding
Type Systems: Scala 
• Static typing 
– Compiler type checks 
– Many errors are caught at compile time 
– Argument types are always documented 
– Easier for others to maintain code 
– Strict contracts 
– IDEs can tell us the types of any variable 
– Drawback: An IDE becomes a crutch
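As a small illustration of "argument types are always documented" and "strict contracts", consider the sketch below. `Checkin`, `venuesFor`, and `firstVenue` are hypothetical names and do not come from Location Services.

```scala
object TypedContracts {
  // Hypothetical record type; the fields and their types are part of the contract.
  final case class Checkin(userId: Long, venue: String)

  // The signature documents exactly what goes in and what comes out.
  def venuesFor(userId: Long, checkins: Seq[Checkin]): Seq[String] =
    checkins.filter(_.userId == userId).map(_.venue)

  // A possibly missing result is explicit in the type, so callers must handle it.
  def firstVenue(userId: Long, checkins: Seq[Checkin]): Option[String] =
    venuesFor(userId, checkins).headOption

  // The call below is rejected by the compiler (type mismatch: a String is not a Long),
  // so the mistake never reaches production:
  // venuesFor("42", Seq(Checkin(42L, "Cafe")))

  def main(args: Array[String]): Unit = {
    val checkins = Seq(Checkin(42L, "Cafe"), Checkin(7L, "Gym"))
    println(firstVenue(42L, checkins))
    println(firstVenue(1L, checkins))
  }
}
```

In a dynamically typed version the same mistake would surface only at runtime, which is where the defensive coding mentioned for Ruby comes from.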
PERFORMANCE
Performance: Methodology
• Requests from production sent to services for a 5-minute interval
• Request rate increased until majority of requests time out (10 seconds)
• Each service ran on identical hardware
• 30-second warmups
• 3 trials at each request rate
Performance tools: Onslaught 
• Performance testing tool built at Whitepages 
• Reports throughput, p50, p75, p95, p99, p999, mean, and max latencies
• Plan to make Onslaught open-source
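For readers unfamiliar with the latency figures listed above, here is a sketch of how such a summary could be computed from raw samples. This is not Onslaught's implementation, which the slides do not show; `LatencyStats` is a hypothetical name, and the nearest-rank definition of a percentile is an assumption chosen for simplicity.

```scala
object LatencyStats {
  // Nearest-rank percentile: the smallest sample such that at least
  // p percent of all samples are less than or equal to it.
  // Expects the samples to be sorted in ascending order.
  def percentile(sortedMicros: IndexedSeq[Long], p: Double): Long = {
    require(sortedMicros.nonEmpty, "need at least one sample")
    require(p > 0 && p <= 100, "p must be in (0, 100]")
    val rank = math.ceil(p * sortedMicros.size / 100.0).toInt
    sortedMicros(math.max(rank, 1) - 1)
  }

  // Summarize latencies (in microseconds) with the statistics named on the slide.
  def summary(latenciesMicros: Seq[Long]): Map[String, Double] = {
    val sorted = latenciesMicros.sorted.toIndexedSeq
    Map(
      "p50"  -> percentile(sorted, 50).toDouble,
      "p75"  -> percentile(sorted, 75).toDouble,
      "p95"  -> percentile(sorted, 95).toDouble,
      "p99"  -> percentile(sorted, 99).toDouble,
      "p999" -> percentile(sorted, 99.9).toDouble,
      "mean" -> sorted.sum.toDouble / sorted.size,
      "max"  -> sorted.last.toDouble
    )
  }

  def main(args: Array[String]): Unit =
    summary((1L to 100L).toSeq).toSeq.sortBy(_._1).foreach(println)
}
```

High percentiles such as p99 and p999 matter because the mean hides the slow tail: a handful of requests near the 10-second timeout barely move the average but dominate what the slowest users experience.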
Performance: Expectations 
• Non-blocking I/O 
– Higher throughput 
– Better CPU usage (no waits) 
– Higher memory usage 
• JVM optimizations 
– Lower latency
Performance: Throughput
[Figure: two panels, "Throughput (Ruby)" and "Throughput (Scala)". x-axis: Requests/s (Ruby: 25 to 150; Scala: 100 to 600). y-axis: Successful responses/s (0 to 450).]
Performance: Latency
[Figure: two panels, "Latency (Ruby)" and "Latency (Scala)". x-axis: Request rate (Req/s) (Ruby: 25 to 150; Scala: 100 to 600). y-axis: Latency (μs), 0 to 10,000,000. Series: p50, p95, p99.]
Performance: CPU
Performance: Memory
Summary 
Scala or Ruby? 
• Prototyping: 
– Ruby’s dynamic typing allows for fast prototyping 
• Production: 
– Scala’s static typing catches more errors and can make code clearer 
– Scala supports concurrency with standard library, Ruby does not 
– Scala performs better than Ruby in throughput and hardware utilization
Thank you. 
Questions?

Editor's Notes

  • #2: Introduction. Been moving a service from Ruby to Scala. Sharing the most interesting aspects of my experience moving a Ruby service that has outgrown Ruby to Scala. Disclaimer: mostly subjective material, based on experience.
  • #3: Over the course of the SDLC, we’re interested in different things. Different languages allow us to explore different ideas better. Ruby is better for prototyping, Scala is better for production.
  • #4: Not “write-only”
  • #5: When it was just Foursquare, it resembled a prototype. Too much was added on while it was in a bad state. Use Scala for the rewrite, better for production. What makes Scala better for production? Most issues were not due to Ruby.
  • #8: Main implementation: MRI (Matz’s Ruby Interpreter), C-based, single-threaded. Similar issues with other interpreted languages (like Python). Outside library I used: EventMachine. In my case: EventMachine and Futures.
  • #9: Ask who is familiar with the reactor pattern (handler dispatches requests). Because it is single-threaded, blocking the thread kills the app. Long chains of callbacks: hard to trace code. Author created DeferrableSequence to manage callbacks: extra work, non-standard (unfamiliar, error-prone). EventMachine is only ever doing one thing at a time, not true concurrency.
  • #10: Built in: reliable, familiar to other developers. Callbacks for side effects (don’t return something). Composition for functional code.
  • #12: Still need to avoid blocking, though the consequences are not as severe as with EM. Overall: often when prototyping, we’re not as concerned with concurrency, so Ruby works perfectly fine. In production, Scala is more appealing. Other major difference: type system.
  • #13: Other difference: type system
  • #16: Compiler doesn’t get in the way. Dynamically typed code has high potential to become hard-to-maintain code.
  • #17: Compiler is our friend. Code with a longer shelf life.
  • #18: Type systems and concurrency are different, which impacts performance.
  • #19: Dynamic vs. static, interpreted vs. compiled, different concurrency models: expect different performance.
  • #21: Point out Devin and Paul
  • #23: Fairly significant improvement in throughput. Seen higher improvements in other services, so we suspect that this can be improved further.
  • #24: Clarify x-axis.
  • #25: Over period of throughput testing. Graph shows idle CPU, so it can be confusing (upside down). About the same amount of CPU use.
  • #26: Graph shows idle CPU, so it can be confusing (upside down).
  • #27: Graph shows idle CPU, so it can be confusing (upside down). Overall: Scala seems to use resources more heavily, but is much faster.
  • #28: Does anybody not care about performance?