SlideShare a Scribd company logo
Scala collections
Sagie Davidovich
@mesagie
singularityworld.com
linkedin.com/in/sagied
Warm up example:
Fibonacci sequence
val fibs: Stream[Int] = 0 #:: fibs.scanLeft(1)(_ + _)
Key concepts:
• Recursive values
• Streams
• Scan
• Binary place-holder notation
Immutable collections
You’ll know about
• Avoid memory allocation for empty collections
• Optimize for small collections
• Equal-hashCode contract
• Asymptotic behavior of common operations
Nil
List.empty and Nil are singletons.
No new memory is allocated
Option[A]
Immutable Sets – emptySet
emptySet is a singleton too
Immutable Sets – Set1
Optimized for sets of size 1
Immutable Sets – Set2
Optimized for sets of size 2
Immutable Sets – Set4
A HashSet is (finally) instantiated
Immutable Collections
Mutable Collections
One liners
Computing a derivative
def derivative(nums: Iterable[Double]) =
nums.sliding(2)
.map (pair => pair._2 - pair._1)
What can be improved in this solution?
Bonus question: change a few characters to find the
max slope
Counting occurrences (histogram)
"encyclopedia" groupBy identity mapValues (_.size)
Map (
e -> 2, n -> 1, y -> 1, a -> 1, i -> 1,
l -> 1, p -> 1, c -> 2, o -> 1, d -> 1
)
Word n-grams
val range = 1 to 3
val text = "hello sweet world"
val tokenize = (s: String) => s.split(" ")
range flatMap (size => tokenize(text) sliding size)
Result:
Vector(Array(hello), Array(sweet), Array(world), Array(hello,
sweet), Array(sweet, world), Array(hello, sweet, world))
Are all members of a greater than
corresponding members of b
val a = List(2,3,4)
val b = List(1,2,3)
// O(n^2) and not very elegant.
(0 until a.size) forall (i => a(i) > b(i))
// O(n) but creates tuples and a temporary list. Yet, more elegant.
a zip b forall (x=> x._1 > x._2)
// same as above but doesn't create a temporary list (lazy)
a.view zip b forall (x=> x._1 > x._2)
// O(n), without tuple or temporary list creation, and even more elegant.
(a corresponds b)(_ > _)
Strings are collections. How come?
“abc”.max
@inline implicit def augmentString(x: String) =
new StringOps(x)
String <% StringOps <: StringLike <: IndexedSeqOptimized …
Complexity of collection operations
• Linear:
– Unary: O(n):
• Mappers: map, collect
• Reducers: reduce, foldLeft, foldRight
• Others: foreach, filter, indexOf, reverse, find, mkString
– Binary: O(n+ m):
• union, diff, and intersect
Immutable Collections time complexity
head tail apply update prepend append
List C C L L C L
Stream C C L L C L
Vector eC eC eC eC eC eC
Stack C C L L C L
Queue aC aC L L L C
Range C C C - - -
String C L C L L L
Mutable Collections time complexity
head tail apply update prepend append insert
ArrayBuffer C L C C L aC L
ListBuffer C L L L C C L
StringBuilde
r C L C C L aC L
MutableList C L L L C C L
Queue C L L L C C L
ArraySeq C L C C - - -
Stack C L L L C L L
ArrayStack C L C C aC L L
Array C L C C - - -
Bonus question
What’s the complexity
of Range.sum?
Range
Equals-hashCode contract
(a equals b)  (a.hashCode == b.hashCode)
All Scala collection implement the contract
Bad idea: Set[Array[Int]]
Good idea: Set[Vector[Int]]
Bad Idea: Set[ArrayBuffer[Int]]
Bad Idea: Set[collection.mutable._]
Good Idea: Set[collection.immutable._]
More on collections equality
val (a, b) = (1 to 3, List(1, 2, 3))
a == b // true
Q: Wait, how efficient is Range.hashCode?
A: O(n)
override def hashCode = util.hashing.MurmurHash3.seqHash(seq)
Challenge yourself:
Is there a closed (o(1)) formula for a range hashCode?
Java interoperability
Implicit (less boilerplate):
import collection.javaConversions._
javaCollection.filter(…)
Explicit (better control):
Import collection.javaConverters._
javaCollection.asScala.filter(…)
scalaCollection.asJava
The power of type-level programming
graph path-finding in compile time
import scala.language.implicitConversions
// Vertices
case class A(l: List[Char])
case class B(l: List[Char])
case class C(l: List[Char])
case class D(l: List[Char])
case class E(l: List[Char])
// Edges
implicit def ad[A1 <% A](x: A1) = D(x.l :+ 'A')
implicit def bc[B1 <% B](x: B1) = C(x.l :+ 'B')
implicit def ce[C1 <% C](x: C1) = E(x.l :+ 'C')
implicit def ea[E1 <% E](x: E1) = A(x.l :+ 'E')
def pathFrom(end:D) = end
pathFrom(B(Nil)) // res0: D = D(List(B, C, E, A))
Want to go Pro?
• Shapeless (Miles Sabin)
– Polytypic programming & Heterogenous lists
– github.com/milessabin/shapeless
• Scalaxy (Olivier Chafik)
– Macros for boosting performance of collections
– github.com/ochafik/Scalaxy

More Related Content

PDF
High-Performance Haskell
PDF
Why Haskell Matters
PDF
Real World Haskell: Lecture 5
PDF
01. haskell introduction
PDF
Monoids
PPT
Functional programming with_scala
PDF
Introduction to NumPy (PyData SV 2013)
PDF
Zippers
High-Performance Haskell
Why Haskell Matters
Real World Haskell: Lecture 5
01. haskell introduction
Monoids
Functional programming with_scala
Introduction to NumPy (PyData SV 2013)
Zippers

What's hot (19)

PPTX
Arrays basics
PDF
Abstracting over the Monad yielded by a for comprehension and its generators
PDF
Scientific Computing with Python - NumPy | WeiYuan
PDF
Mutable data types in python
PDF
Quicksort - a whistle-stop tour of the algorithm in five languages and four p...
PDF
Faster persistent data structures through hashing
PDF
Numpy
PPTX
Python - Numpy/Pandas/Matplot Machine Learning Libraries
PDF
PPTX
Python Scipy Numpy
PPTX
1.Array and linklst definition
PDF
PDF
Addendum to ‘Monads do not Compose’
PPTX
2- Dimensional Arrays
KEY
Functional ruby
PPTX
Introduction to numpy Session 1
PPTX
Statistics - ArgMax Equation
PDF
Intoduction to numpy
Arrays basics
Abstracting over the Monad yielded by a for comprehension and its generators
Scientific Computing with Python - NumPy | WeiYuan
Mutable data types in python
Quicksort - a whistle-stop tour of the algorithm in five languages and four p...
Faster persistent data structures through hashing
Numpy
Python - Numpy/Pandas/Matplot Machine Learning Libraries
Python Scipy Numpy
1.Array and linklst definition
Addendum to ‘Monads do not Compose’
2- Dimensional Arrays
Functional ruby
Introduction to numpy Session 1
Statistics - ArgMax Equation
Intoduction to numpy
Ad

Viewers also liked (10)

PPTX
Carolina castro segundo parcial_tarea2
PDF
Principios diseño de interacción
PDF
Exposiçao: Kandinsky - Tudo começa num ponto
PDF
Interplay Project. National Indigenous Research and Knowledges Network (NIRAK...
PDF
Caso La Noria
PDF
Algorithms in nature
PDF
Effective code reviews
PPTX
7th Etherum Meetup Vienna: Crypto Token Economy - Price of ETH, Bitcoin 1.0 a...
PPTX
ZAYANN : Parcours pédagogique entre nature et culture
PDF
Scala Parallel Collections
Carolina castro segundo parcial_tarea2
Principios diseño de interacción
Exposiçao: Kandinsky - Tudo começa num ponto
Interplay Project. National Indigenous Research and Knowledges Network (NIRAK...
Caso La Noria
Algorithms in nature
Effective code reviews
7th Etherum Meetup Vienna: Crypto Token Economy - Price of ETH, Bitcoin 1.0 a...
ZAYANN : Parcours pédagogique entre nature et culture
Scala Parallel Collections
Ad

Similar to Scala collections wizardry - Scalapeño (20)

PDF
Scala Collections
PDF
Scala collections
PPT
Scala collection
PDF
Getting Started With Scala
PDF
Scala collections api expressivity and brevity upgrade from java
ODP
Collections In Scala
ODP
Collection advance
PDF
Introduction to parallel and distributed computation with spark
PDF
Collections Framework Beginners Guide 2
PDF
Collections Framework Begineers guide 2
PPTX
Intro to Functional Programming in Scala
PDF
Scala or functional programming from a python developer's perspective
PDF
Scala by Luc Duponcheel
PDF
Demystifying functional programming with Scala
PDF
Scala intro workshop
ODP
Functional programming with Scala
PPTX
Taxonomy of Scala
PDF
Introduction à Scala - Michel Schinz - January 2010
PDF
Scala. Introduction to FP. Monads
Scala Collections
Scala collections
Scala collection
Getting Started With Scala
Scala collections api expressivity and brevity upgrade from java
Collections In Scala
Collection advance
Introduction to parallel and distributed computation with spark
Collections Framework Beginners Guide 2
Collections Framework Begineers guide 2
Intro to Functional Programming in Scala
Scala or functional programming from a python developer's perspective
Scala by Luc Duponcheel
Demystifying functional programming with Scala
Scala intro workshop
Functional programming with Scala
Taxonomy of Scala
Introduction à Scala - Michel Schinz - January 2010
Scala. Introduction to FP. Monads

Recently uploaded (20)

PPTX
Programs and apps: productivity, graphics, security and other tools
PDF
Empathic Computing: Creating Shared Understanding
PPTX
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
PPTX
Cloud computing and distributed systems.
PDF
cuic standard and advanced reporting.pdf
PDF
Encapsulation theory and applications.pdf
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
PPTX
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
PPTX
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
PPTX
Big Data Technologies - Introduction.pptx
PDF
Approach and Philosophy of On baking technology
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
PDF
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Diabetes mellitus diagnosis method based random forest with bat algorithm
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
Programs and apps: productivity, graphics, security and other tools
Empathic Computing: Creating Shared Understanding
VMware vSphere Foundation How to Sell Presentation-Ver1.4-2-14-2024.pptx
Cloud computing and distributed systems.
cuic standard and advanced reporting.pdf
Encapsulation theory and applications.pdf
“AI and Expert System Decision Support & Business Intelligence Systems”
Detection-First SIEM: Rule Types, Dashboards, and Threat-Informed Strategy
Effective Security Operations Center (SOC) A Modern, Strategic, and Threat-In...
Big Data Technologies - Introduction.pptx
Approach and Philosophy of On baking technology
Per capita expenditure prediction using model stacking based on satellite ima...
Optimiser vos workloads AI/ML sur Amazon EC2 et AWS Graviton
Chapter 3 Spatial Domain Image Processing.pdf
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Build a system with the filesystem maintained by OSTree @ COSCUP 2025
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Advanced methodologies resolving dimensionality complications for autism neur...
Diabetes mellitus diagnosis method based random forest with bat algorithm
Reach Out and Touch Someone: Haptics and Empathic Computing

Scala collections wizardry - Scalapeño

  • 2. Warm up example: Fibonacci sequence val fibs: Stream[Int] = 0 #:: fibs.scanLeft(1)(_ + _) Key concepts: • Recursive values • Streams • Scan • Binary place-holder notation
  • 3. Immutable collections You’ll know about • Avoid memory allocation for empty collections • Optimize for small collections • Equal-hashCode contract • Asymptotic behavior of common operations
  • 4. Nil List.empty and Nil are singletons. No new memory is allocated
  • 6. Immutable Sets – emptySet emptySet is a singleton too
  • 7. Immutable Sets – Set1 Optimized for sets of size 1
  • 8. Immutable Sets – Set2 Optimized for sets of size 2
  • 9. Immutable Sets – Set4 A HashSet is (finally) instantiated
  • 13. Computing a derivative def derivative(nums: Iterable[Double]) = nums.sliding(2) .map (pair => pair._2 - pair._1) What can be improved in this solution? Bonus question: change a few characters to find the max slope
  • 14. Counting occurrences (histogram) "encyclopedia" groupBy identity mapValues (_.size) Map ( e -> 2, n -> 1, y -> 1, a -> 1, i -> 1, l -> 1, p -> 1, c -> 2, o -> 1, d -> 1 )
  • 15. Word n-grams val range = 1 to 3 val text = "hello sweet world" val tokenize = (s: String) => s.split(" ") range flatMap (size => tokenize(text) sliding size) Result: Vector(Array(hello), Array(sweet), Array(world), Array(hello, sweet), Array(sweet, world), Array(hello, sweet, world))
  • 16. Are all members of a greater than corresponding members of b val a = List(2,3,4) val b = List(1,2,3) // O(n^2) and not very elegant. (0 until a.size) forall (i => a(i) > b(i)) // O(n) but creates tuples and a temporary list. Yet, more elegant. a zip b forall (x=> x._1 > x._2) // same as above but doesn't create a temporary list (lazy) a.view zip b forall (x=> x._1 > x._2) // O(n), without tuple or temporary list creation, and even more elegant. (a corresponds b)(_ > _)
  • 17. Strings are collections. How come? “abc”.max @inline implicit def augmentString(x: String) = new StringOps(x) String <% StringOps <: StringLike <: IndexedSeqOptimized …
  • 18. Complexity of collection operations • Linear: – Unary: O(n): • Mappers: map, collect • Reducers: reduce, foldLeft, foldRight • Others: foreach, filter, indexOf, reverse, find, mkString – Binary: O(n+ m): • union, diff, and intersect
  • 19. Immutable Collections time complexity head tail apply update prepend append List C C L L C L Stream C C L L C L Vector eC eC eC eC eC eC Stack C C L L C L Queue aC aC L L L C Range C C C - - - String C L C L L L
  • 20. Mutable Collections time complexity head tail apply update prepend append insert ArrayBuffer C L C C L aC L ListBuffer C L L L C C L StringBuilde r C L C C L aC L MutableList C L L L C C L Queue C L L L C C L ArraySeq C L C C - - - Stack C L L L C L L ArrayStack C L C C aC L L Array C L C C - - -
  • 21. Bonus question What’s the complexity of Range.sum?
  • 22. Range
  • 23. Equals-hashCode contract (a equals b)  (a.hashCode == b.hashCode) All Scala collection implement the contract Bad idea: Set[Array[Int]] Good idea: Set[Vector[Int]] Bad Idea: Set[ArrayBuffer[Int]] Bad Idea: Set[collection.mutable._] Good Idea: Set[collection.immutable._]
  • 24. More on collections equality val (a, b) = (1 to 3, List(1, 2, 3)) a == b // true Q: Wait, how efficient is Range.hashCode? A: O(n) override def hashCode = util.hashing.MurmurHash3.seqHash(seq) Challenge yourself: Is there a closed (o(1)) formula for a range hashCode?
  • 25. Java interoperability Implicit (less boilerplate): import collection.javaConversions._ javaCollection.filter(…) Explicit (better control): Import collection.javaConverters._ javaCollection.asScala.filter(…) scalaCollection.asJava
  • 26. The power of type-level programming graph path-finding in compile time import scala.language.implicitConversions // Vertices case class A(l: List[Char]) case class B(l: List[Char]) case class C(l: List[Char]) case class D(l: List[Char]) case class E(l: List[Char]) // Edges implicit def ad[A1 <% A](x: A1) = D(x.l :+ 'A') implicit def bc[B1 <% B](x: B1) = C(x.l :+ 'B') implicit def ce[C1 <% C](x: C1) = E(x.l :+ 'C') implicit def ea[E1 <% E](x: E1) = A(x.l :+ 'E') def pathFrom(end:D) = end pathFrom(B(Nil)) // res0: D = D(List(B, C, E, A))
  • 27. Want to go Pro? • Shapeless (Miles Sabin) – Polytypic programming & Heterogenous lists – github.com/milessabin/shapeless • Scalaxy (Olivier Chafik) – Macros for boosting performance of collections – github.com/ochafik/Scalaxy