Java Collections
The Force Awakens
Darth @RaoulUK
Darth @RichardWarburto
Collections forceawakens
Collection Problems
Java Episode 8 & 9
Persistent & Immutable Collections
HashMaps
Collection bugs
1. Element access (off-by-one errors, ArrayIndexOutOfBoundsException)
2. Concurrent modification
3. Check-then-Act
Scenario 1
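// assumes: import static java.util.Arrays.asList;
List<String> jedis = new ArrayList<>(asList("Luke", "yoda"));
for (String jedi : jedis) {
    if (Character.isLowerCase(jedi.charAt(0))) {
        // Bug: removing through the list while iterating over it;
        // the iterator then throws ConcurrentModificationException
        jedis.remove(jedi);
    }
}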
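One way to avoid the failure in Scenario 1 is to let the collection or its iterator perform the removal. The sketch below is an illustration only; the class and method names (`SafeRemoval`, `withRemoveIf`, `withIterator`) are hypothetical and not from the talk.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

final class SafeRemoval {

    // Java 8+: removeIf lets the list manage its own traversal.
    static List<String> withRemoveIf(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        copy.removeIf(name -> Character.isLowerCase(name.charAt(0)));
        return copy;
    }

    // Before Java 8: remove through the iterator, never through the list.
    static List<String> withIterator(List<String> names) {
        List<String> copy = new ArrayList<>(names);
        for (Iterator<String> it = copy.iterator(); it.hasNext(); ) {
            if (Character.isLowerCase(it.next().charAt(0))) {
                it.remove();
            }
        }
        return copy;
    }

    public static void main(String[] args) {
        List<String> jedis = Arrays.asList("Luke", "yoda");
        System.out.println(withRemoveIf(jedis)); // [Luke]
        System.out.println(withIterator(jedis)); // [Luke]
    }
}
```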
Scenario 2
Map<String, BigDecimal> movieViews = new HashMap<>();
BigDecimal views = movieViews.get(MOVIE);                // Check
if (views != null) {                                     // Check
    movieViews.put(MOVIE, views.add(BigDecimal.ONE));    // Act
}
Check (movieViews.get, views != null), then Act (movieViews.put)
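If the map in Scenario 2 is shared between threads, the check and the act can be folded into a single atomic call. This is a minimal sketch assuming a `ConcurrentHashMap`; the class name `AtomicViews`, its method names and the `MOVIE` value are hypothetical.

```java
import java.math.BigDecimal;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class AtomicViews {
    static final String MOVIE = "The Force Awakens"; // hypothetical key

    // Same intent as the slide: increment only when the movie is already tracked.
    static void incrementIfPresent(ConcurrentMap<String, BigDecimal> views) {
        views.computeIfPresent(MOVIE, (movie, count) -> count.add(BigDecimal.ONE));
    }

    // Put-if-absent variant: start at 1 on the first view, otherwise add 1.
    static void recordView(ConcurrentMap<String, BigDecimal> views) {
        views.merge(MOVIE, BigDecimal.ONE, BigDecimal::add);
    }

    public static void main(String[] args) {
        ConcurrentMap<String, BigDecimal> movieViews = new ConcurrentHashMap<>();
        incrementIfPresent(movieViews);             // no entry yet, nothing happens
        System.out.println(movieViews.get(MOVIE));  // null
        recordView(movieViews);
        recordView(movieViews);
        incrementIfPresent(movieViews);
        System.out.println(movieViews.get(MOVIE));  // 3
    }
}
```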
Reducing scope for bugs
● ~280 bugs in 28 projects including Cassandra, Lucene
● ~80% check-then-act bugs discovered are put-if-absent
● Library designers can help by updating APIs as new idioms emerge
● Different data structures can provide alternatives by restricting reads &
updates to reduce scope for bugs
CHECK-THEN-ACT Misuse of Java Concurrent Collections
http://dig.cs.illinois.edu/papers/checkThenAct.pdf
Java 8 Lazy Collection Initialization
Many allocated HashMaps and ArrayLists are never written to, e.g. when used for the Null Object pattern
Java 8 adds lazy initialization for the default initialization case: the backing array is only allocated on the first write
Typically a 1-2% reduction in memory consumption
http://www.javamagazine.mozaicreader.com/MarApr2016/Twitter#&pageSet=28&page=0
Java 9 API updates
Collection factory methods
● Non-goal to provide persistent immutable collections
● http://openjdk.java.net/jeps/269
java.util.Optional
● ifPresentOrElse(), or(), stream()
● The proposed getWhenPresent() and the deprecation of Optional.get() did not ship; Java 10 added the no-argument orElseThrow() instead
java.util.stream.Stream
● takeWhile, dropWhile
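A short sketch of the Java 9 additions listed above, assuming Java 9 or later. The class name `Java9Updates`, the helper names and the sample values are made up for illustration.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.stream.Collectors;

final class Java9Updates {

    // takeWhile keeps the longest prefix that matches the predicate.
    static List<Integer> prefixBelow(List<Integer> values, int limit) {
        return values.stream().takeWhile(v -> v < limit).collect(Collectors.toList());
    }

    // dropWhile discards that prefix and keeps everything after it.
    static List<Integer> afterPrefixBelow(List<Integer> values, int limit) {
        return values.stream().dropWhile(v -> v < limit).collect(Collectors.toList());
    }

    // or() supplies a fallback Optional when the first one is empty.
    static String nameOrDefault(Optional<String> name) {
        return name.or(() -> Optional.of("Yoda")).orElseThrow(IllegalStateException::new);
    }

    public static void main(String[] args) {
        List<String> jedis = List.of("Luke", "Rey");              // unmodifiable
        Map<String, Integer> episodes = Map.of("Luke", 4, "Rey", 7);
        System.out.println(jedis + " " + episodes.get("Rey"));    // [Luke, Rey] 7

        List<Integer> values = List.of(1, 2, 3, 4, 1);
        System.out.println(prefixBelow(values, 3));               // [1, 2]
        System.out.println(afterPrefixBelow(values, 3));          // [3, 4, 1]

        Optional.<String>empty().ifPresentOrElse(
                System.out::println,
                () -> System.out.println("no value"));            // no value
    }
}
```

The factory-method collections reject modification: calling `jedis.add("Finn")` on the list above throws `UnsupportedOperationException`.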
Categorising Collections
● Mutable: Unsynchronized, Concurrent
● Immutable: Non-Persistent, Persistent
● Unmodifiable View
[Diagram legend: Available in Core Library]
Mutable
● Popular friends include ArrayList, HashMap, TreeSet
● Memory-efficient modification operations
● State can be accidentally modified
● Can be thread-safe, but requires careful design
Unmodifiable
List<String> jedis = new ArrayList<>();
jedis.add("Luke Skywalker");
List<String> cantChangeMe = Collections.unmodifiableList(jedis);
// java.lang.UnsupportedOperationException
//cantChangeMe.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker]
jedis.add("Darth Vader");
System.out.println(cantChangeMe); // [Luke Skywalker, Darth Vader]
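The unmodifiable wrapper above is only a view, so changes to the backing list still show through. A defensive copy gives a snapshot instead. A minimal sketch follows; the names `SnapshotVsView` and `snapshot` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class SnapshotVsView {

    // Copy first, then wrap: nobody else holds a reference to the copy.
    static <T> List<T> snapshot(List<T> source) {
        return Collections.unmodifiableList(new ArrayList<>(source));
    }

    public static void main(String[] args) {
        List<String> jedis = new ArrayList<>();
        jedis.add("Luke Skywalker");

        List<String> view = Collections.unmodifiableList(jedis);
        List<String> frozen = snapshot(jedis);

        jedis.add("Darth Vader");
        System.out.println(view);   // [Luke Skywalker, Darth Vader]
        System.out.println(frozen); // [Luke Skywalker]
    }
}
```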
Immutable & Non-persistent
● No updates
● Flexibility to convert the source into a more efficient representation
● No locking in context of concurrency
● Satisfies co-variant subtyping requirements
● Can be copied with modifications to create a new version (can be
expensive)
Immutable vs. Mutable hierarchy
ListIterable
  ImmutableList: + MutableList<T> toList()
  MutableList (also a java.util.List): + ImmutableList<T> toImmutable()
Eclipse Collections (formerly GS Collections) https://projects.eclipse.org/projects/technology.collections/
Immutable and Persistent
● Changing the source produces a new version of the collection
● The resulting collection shares structure with the source to avoid full copying
on updates
Persistent List (aka Cons)
public final class Cons<T> implements ConsList<T> {
    private final T head;
    private final ConsList<T> tail;

    public Cons(T head, ConsList<T> tail) {
        this.head = head;
        this.tail = tail;
    }

    // Prepends e; the existing list becomes the shared tail of the new one
    @Override
    public ConsList<T> add(T e) {
        return new Cons<>(e, this);
    }
}
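The slide does not show the `ConsList` interface. The sketch below fills it in with hypothetical members (`isEmpty`, `head`, `tail`, plus a `Nil` terminator) to make the structural sharing visible; it is one possible shape, not the authors' code.

```java
import java.util.NoSuchElementException;

final class ConsDemo {

    // Assumed interface; only add(T) appears on the slide.
    interface ConsList<T> {
        ConsList<T> add(T e);   // prepend, returning a new list
        boolean isEmpty();
        T head();
        ConsList<T> tail();
    }

    static final class Nil<T> implements ConsList<T> {
        public ConsList<T> add(T e) { return new Cons<>(e, this); }
        public boolean isEmpty() { return true; }
        public T head() { throw new NoSuchElementException("empty list"); }
        public ConsList<T> tail() { throw new NoSuchElementException("empty list"); }
    }

    static final class Cons<T> implements ConsList<T> {
        private final T head;
        private final ConsList<T> tail;

        Cons(T head, ConsList<T> tail) { this.head = head; this.tail = tail; }

        public ConsList<T> add(T e) { return new Cons<>(e, this); }
        public boolean isEmpty() { return false; }
        public T head() { return head; }
        public ConsList<T> tail() { return tail; }
    }

    public static void main(String[] args) {
        ConsList<String> base = new Nil<String>().add("C").add("B"); // B -> C
        ConsList<String> withA = base.add("A");                      // A -> B -> C
        ConsList<String> withX = base.add("X");                      // X -> B -> C

        // Both new versions reuse the very same B -> C nodes.
        System.out.println(withA.tail() == base); // true
        System.out.println(withX.tail() == base); // true
        System.out.println(base.head());          // B
    }
}
```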
Updating Persistent List
Before: A B C X Y Z
After: A B D
Blue nodes indicate new copies
Purple nodes indicate nodes we wish to update
Concatenating Two Persistent Lists
- Poor locality due to pointer chasing
- Copying of nodes
Before: A B C and X Y Z
After: A B C
Persistent List
● Structural sharing: no need to copy full structure
● Poor locality due to pointer chasing
● Copying becomes more expensive with larger lists
● Poor Random Access and thus Data Decomposition
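The copying cost can be made concrete with a bare-bones singly linked list. In this sketch (the names `PersistentListOps`, `Node`, `set`, `concat` and `drop` are all hypothetical), replacing the element at index i copies the i nodes in front of it plus the replaced node, and concatenation copies every node of the first list, while the remaining nodes are shared.

```java
final class PersistentListOps {

    static final class Node<T> {
        final T value;
        final Node<T> next; // null marks the end of the list
        Node(T value, Node<T> next) { this.value = value; this.next = next; }
    }

    @SafeVarargs
    static <T> Node<T> of(T... values) {
        Node<T> list = null;
        for (int i = values.length - 1; i >= 0; i--) {
            list = new Node<>(values[i], list);
        }
        return list;
    }

    // Copies the nodes up to and including index; shares everything after it.
    static <T> Node<T> set(Node<T> list, int index, T value) {
        if (list == null) throw new IndexOutOfBoundsException();
        if (index == 0) return new Node<>(value, list.next);
        return new Node<>(list.value, set(list.next, index - 1, value));
    }

    // Copies every node of first; shares second as a whole.
    static <T> Node<T> concat(Node<T> first, Node<T> second) {
        if (first == null) return second;
        return new Node<>(first.value, concat(first.next, second));
    }

    // Skips n nodes: O(n) pointer chasing, hence the poor random access.
    static <T> Node<T> drop(Node<T> list, int n) {
        while (n-- > 0) list = list.next;
        return list;
    }

    public static void main(String[] args) {
        Node<String> before = of("A", "B", "C", "X", "Y", "Z");
        Node<String> after = set(before, 2, "D");
        System.out.println(drop(after, 2).value);              // D
        System.out.println(drop(before, 2).value);             // C
        System.out.println(drop(after, 3) == drop(before, 3)); // true

        Node<String> xyz = of("X", "Y", "Z");
        Node<String> joined = concat(of("A", "B", "C"), xyz);
        System.out.println(drop(joined, 3) == xyz);            // true
    }
}
```

The recursive helpers keep the sketch short; a production version would avoid deep recursion on long lists.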
Updating Persistent Binary Tree
[Figure: the tree before and after an update]
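A common way to update a persistent binary tree is path copying: only the nodes between the root and the change are recreated, and every untouched subtree is shared with the old version. The sketch below assumes an unbalanced binary search tree over `int` keys; the class and method names are hypothetical.

```java
final class PersistentTree {

    static final class Node {
        final int key;
        final Node left;
        final Node right;
        Node(int key, Node left, Node right) {
            this.key = key;
            this.left = left;
            this.right = right;
        }
    }

    // Returns the root of a new version; the old version stays valid.
    static Node insert(Node root, int key) {
        if (root == null) return new Node(key, null, null);
        if (key < root.key) return new Node(root.key, insert(root.left, key), root.right);
        if (key > root.key) return new Node(root.key, root.left, insert(root.right, key));
        return root; // key already present, nothing to copy
    }

    static boolean contains(Node root, int key) {
        while (root != null) {
            if (key == root.key) return true;
            root = key < root.key ? root.left : root.right;
        }
        return false;
    }

    public static void main(String[] args) {
        Node v1 = insert(insert(insert(null, 5), 3), 8);
        Node v2 = insert(v1, 9);

        System.out.println(contains(v1, 9));    // false
        System.out.println(contains(v2, 9));    // true
        System.out.println(v2.left == v1.left); // true, the untouched subtree is shared
    }
}
```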
Persistent Array
How do we get the benefits of immutability with the performance of mutable
variants?
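Trie
1. Branch factor not limited to binary
2. Leaf nodes contain actual values
3. Picking the right branch is done by using parts of the key as a lookup
[Figure: a trie whose edges from the root are labelled with key characters (a, b, c, e, f) and whose leaf nodes hold the values]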
Persistent Array (Bitmapped Vector Trie)
[Figure: a trie with 32-way branching. Level 1 (root) has slots 0, 1, ... 31; each Level 2 node has slots 0, 1, ... 31 pointing to the leaf nodes]
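In a bitmapped vector trie the index itself is the key: each group of 5 bits selects one of 32 slots at a level. The sketch below is a deliberately tiny two-level version (capacity 32 x 32 = 1024) with hypothetical names. Real implementations such as Clojure's PersistentVector grow the depth as needed and add further optimisations.

```java
final class TinyPersistentVector {
    private static final int BITS = 5;
    private static final int WIDTH = 1 << BITS; // 32 slots per node
    private static final int MASK = WIDTH - 1;  // lowest 5 bits

    // Root with 32 slots, each pointing to a leaf array of 32 values (or null).
    private final Object[][] root;

    private TinyPersistentVector(Object[][] root) { this.root = root; }

    static TinyPersistentVector empty() {
        return new TinyPersistentVector(new Object[WIDTH][]);
    }

    private static void check(int index) {
        if (index < 0 || index >= WIDTH * WIDTH) throw new IndexOutOfBoundsException();
    }

    Object get(int index) {
        check(index);
        Object[] leaf = root[(index >>> BITS) & MASK]; // upper 5 bits pick the leaf
        return leaf == null ? null : leaf[index & MASK]; // lower 5 bits pick the slot
    }

    // Copies one root array and one leaf (2 x 32 references); all other leaves are shared.
    TinyPersistentVector set(int index, Object value) {
        check(index);
        int slot = (index >>> BITS) & MASK;
        Object[][] newRoot = root.clone();
        Object[] newLeaf = root[slot] == null ? new Object[WIDTH] : root[slot].clone();
        newLeaf[index & MASK] = value;
        newRoot[slot] = newLeaf;
        return new TinyPersistentVector(newRoot);
    }

    boolean sharesLeafWith(TinyPersistentVector other, int index) {
        return root[(index >>> BITS) & MASK] == other.root[(index >>> BITS) & MASK];
    }

    public static void main(String[] args) {
        TinyPersistentVector v1 = empty().set(0, "Luke").set(40, "Leia");
        TinyPersistentVector v2 = v1.set(40, "Han");

        System.out.println(v1.get(40));                // Leia
        System.out.println(v2.get(40));                // Han
        System.out.println(v1.sharesLeafWith(v2, 0));  // true
        System.out.println(v1.sharesLeafWith(v2, 40)); // false
    }
}
```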
Trade-offs
● Large branching factor facilitates iteration but hinders updates
● Small branching factor facilitates updates but hinders traversal
Java Persistent Collections
- Not available as part of Java Core Library
- Existing projects include
- PCollections: https://github.com/hrldcpr/pcollections
- Port of Clojure DS: https://github.com/krukow/clj-ds
- Port of Scala DS: https://github.com/andrewoma/dexx
Memory usage survey
10,000,000 elements, heap < 32GB
int[] : 40MB
Integer[]: 160MB
ArrayList<Integer>: 215MB
PersistentVector<Integer>: 214MB (Clojure-DS)
Vector<Integer>: 206MB (Dexx, port of Scala-DS)
Data collected using Java Object Layout:
http://openjdk.java.net/projects/code-tools/jol/
Primitive specialised collections
● Collections often hold boxed representations of primitive values
● Java 8 introduced IntStream, LongStream, DoubleStream and
primitive specialised functional interfaces
● Other libraries, eg: Agrona, Koloboke and Eclipse-Collections provide
primitive specialised collections today.
● Valhalla investigates primitive specialised generics
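As a small illustration of the primitive specialisations in the JDK, the sketch below computes the same sum with `IntStream` (no boxing) and with `Stream<Integer>` (one `Integer` per element). The class and method names are hypothetical, and no performance figures are implied.

```java
import java.util.stream.IntStream;
import java.util.stream.Stream;

final class PrimitiveStreams {

    // Works on int values directly.
    static int sumPrimitive(int n) {
        return IntStream.rangeClosed(1, n).sum();
    }

    // Boxes every element into an Integer before reducing.
    static int sumBoxed(int n) {
        return Stream.iterate(1, i -> i + 1).limit(n).reduce(0, Integer::sum);
    }

    public static void main(String[] args) {
        System.out.println(sumPrimitive(100)); // 5050
        System.out.println(sumBoxed(100));     // 5050
    }
}
```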
Takeaways
● Immutable collections reduce the scope for bugs
● Always a compromise between programming safety and performance
● Performance of persistent data structures is improving
HashMaps Basics
[Figure: a bucket array in which the keys "Han Solo" (hash = 72309) and "Chewbacca" (hash = 72309) collide]
HashMaps
Chaining: a separate data structure for collision lookups
Probing: store inline and have a probing sequence
Aliases: Palpatine vs Darth Sidious
HashMaps
Chaining: aka Closed Addressing, aka Open Hashing
Probing: aka Open Addressing, aka Closed Hashing
HashMaps
Chaining: Linked List Based, Tree Based
Probing
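To make the "store inline and have a probing sequence" idea concrete, here is a minimal linear-probing sketch. It assumes a fixed capacity with no resizing or removal, and the class `LinearProbingMap` is hypothetical rather than taken from any library.

```java
final class LinearProbingMap {
    private final String[] keys;
    private final int[] values;
    private int size;

    LinearProbingMap(int capacity) {
        if (capacity < 2) throw new IllegalArgumentException("capacity must be >= 2");
        keys = new String[capacity];
        values = new int[capacity];
    }

    // Start at the hashed slot and walk forward until the key or an empty slot is found.
    private int slotFor(String key) {
        int i = (key.hashCode() & 0x7fffffff) % keys.length;
        while (keys[i] != null && !keys[i].equals(key)) {
            i = (i + 1) % keys.length; // probe the neighbouring slot
        }
        return i;
    }

    void put(String key, int value) {
        int i = slotFor(key);
        if (keys[i] == null) {
            // Keep one slot empty so that probing always terminates.
            if (size + 1 == keys.length) throw new IllegalStateException("map is full");
            keys[i] = key;
            size++;
        }
        values[i] = value;
    }

    Integer get(String key) {
        int i = slotFor(key);
        return keys[i] == null ? null : values[i];
    }

    public static void main(String[] args) {
        LinearProbingMap map = new LinearProbingMap(8);
        // "Aa" and "BB" have the same String.hashCode() (2112), so they collide.
        map.put("Aa", 1);
        map.put("BB", 2);
        System.out.println(map.get("Aa"));      // 1
        System.out.println(map.get("BB"));      // 2
        System.out.println(map.get("missing")); // null
    }
}
```

A chaining map would instead keep both colliding entries in a separate list or tree hanging off the same bucket.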
java.util.HashMap
Chaining Based HashMap
Historically maintained a linked list of entries per bucket in the case of a collision
Problem: with high collision rates, HashMap lookup approaches O(N)
java.util.HashMap in Java 8
Starts by using a linked list to store colliding entries.
Trees are used when a bucket holds more than 8 entries (and the table has at least 64 buckets)
Tree-based nodes use about twice the memory
Makes the heavy-collision lookup case O(log(N)) rather than O(N)
Relies on keys being Comparable
https://github.com/RichardWarburton/map-visualiser
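The heavy-collision case can be reproduced with a key type whose `hashCode()` is deliberately constant. The sketch below uses a hypothetical `Key` class that implements `Comparable`, so on Java 8+ the single overloaded bucket can be organised as a tree. It shows only the setup and makes no timing claims.

```java
import java.util.HashMap;
import java.util.Map;

final class CollidingKeys {

    static final class Key implements Comparable<Key> {
        final int id;
        Key(int id) { this.id = id; }

        @Override public int hashCode() { return 42; } // every key collides
        @Override public boolean equals(Object o) {
            return o instanceof Key && ((Key) o).id == id;
        }
        @Override public int compareTo(Key other) {
            return Integer.compare(id, other.id);       // lets tree bins order the keys
        }
    }

    static Map<Key, Integer> build(int n) {
        Map<Key, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new Key(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<Key, Integer> map = build(10_000);
        System.out.println(map.size());             // 10000
        System.out.println(map.get(new Key(9_999))); // 9999
        System.out.println(map.get(new Key(-1)));    // null
    }
}
```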
So which HashMap is best?
Benchmarking is about building a mental
model of the performance tradeoffs
Example Jar-Jar Benchmark
call get() on a single value for a map
of size 1
No model of the different factors that
affect things!
Benchmarking HashMaps
Load Factor
Nonlinear key access
Successful vs Failed get()
Hash Collisions
Comparable vs Incomparable keys
Different Keys and Values
Cost of hashCode()/equals()
[Chart: Tree Optimization - 60% Collisions]
[Chart: Tree Optimization - 10% Collisions]
Probing vs Chaining
Probing Maps usually have lower memory consumption
Small Maps: Probing never has long clusters, can be up to 91% faster.
In large maps with high collision rates, probing scales poorly and can be
significantly slower.
Takeaways
There’s no clear-cut “winner”.
JDK implementations try to minimise the worst case.
Linear probing requires a good hashCode() distribution, so hash maps often “precondition” their hashes.
IdentityHashMap has low memory consumption and is fast; use it when reference-equality (==) key semantics are acceptable.
3rd-party libraries offer probing HashMaps, e.g. Koloboke & Eclipse Collections.
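As an example of hash "preconditioning", java.util.HashMap (Java 8+) XORs the upper 16 bits of each hash code into the lower 16 before masking, so that keys differing only in their high bits do not all land in the same bucket. The sketch below mirrors that step; the class and method names are hypothetical.

```java
final class HashSpreading {

    // Same spreading step as java.util.HashMap in Java 8+: h ^ (h >>> 16).
    static int spread(int h) {
        return h ^ (h >>> 16);
    }

    // Bucket index for a power-of-two table size.
    static int bucket(int hash, int tableSize) {
        return hash & (tableSize - 1);
    }

    public static void main(String[] args) {
        int a = 0x10000; // these two hash codes differ only above bit 15
        int b = 0x20000;

        // Without spreading both fall into bucket 0 of a 16-bucket table.
        System.out.println(bucket(a, 16) + " " + bucket(b, 16));                 // 0 0
        // With spreading the high bits influence the index.
        System.out.println(bucket(spread(a), 16) + " " + bucket(spread(b), 16)); // 1 2
    }
}
```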
Conclusions
Interface Popularity
List 1576210
Set 980763
Map 803171
Queue 62024
Deque 3464
SortedSet 9121
NavigableSet 1735
SortedMap 8677
NavigableMap 1484
Implementation Popularity
ArrayList 225029
LinkedList 26850
ArrayDeque 1086
HashSet 68940
TreeSet 10108
EnumSet 10512
HashMap 137610
TreeMap 7734
WeakHashMap 3473
IdentityHashMap 2443
EnumMap 1904
Evolution can be interesting ...
Java 1.2 Java 10?
Any Questions?
www.pluralsight.com/author/richard-warburton
www.cambridgecoding.com
www.iteratrlearning.com
Further reading
Fast Functional Lists, Hash-Lists, Deques and Variable Length Arrays
https://infoscience.epfl.ch/record/64410/files/techlists.pdf
Smaller Footprint for Java Collections
http://www.lirmm.fr/~ducour/Doc-objets/ECOOP2012/ECOOP/ecoop/356.pdf
Optimizing Hash-Array Mapped Tries for Fast and Lean Immutable JVM Collections
http://michael.steindorfer.name/publications/oopsla15.pdf
RRB-Trees: Efficient Immutable Vectors
https://infoscience.epfl.ch/record/169879/files/RMTrees.pdf
Doug Lea’s Analysis of the HashMap implementation tradeoffs
http://www.mail-archive.com/core-libs-dev@openjdk.java.net/msg02147.html
Java Specialists HashMap article
http://www.javaspecialists.eu/archive/Issue235.html
Sample and Benchmark Code
https://github.com/RichardWarburton/Java-Collections-The-Force-Awakens
Debian code search used for popularity
https://codesearch.debian.net/

