Java Serialization
Facts and Fallacies
© 2013, Roman Elizarov, Devexperts
Serialization
… is the process of translating data structures or object
state into a format that can be stored (for example, in
a file or memory buffer, or transmitted across a network
connection link) and resurrected later in the same or
another computer environment.
-- Wikipedia

Object <-> Bytes <-> Network/Storage
Use-cases
• Transfer data over network between cluster nodes or between
servers and clients
- Serialization is a key technology behind any RPC/RMI

• Serialization is often used for storage
- but then data transfer is still often a part of the picture
Java serialization facility (key facts)
• Appeared in 1997 as part of the JDK 1.1 release
• Uses java.io.Serializable as a marker interface
• Consists of java.io.ObjectInputStream/ObjectOutputStream with
related classes and interfaces
• Has a “Java Object Serialization Specification” that documents
both the API and the object serialization stream protocol
• Is part of any Java platform (even Android has it)
• Somehow has a lot of myths around it
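As a minimal sketch of the facility described above, the following writes an object to a byte array with ObjectOutputStream and reads it back with ObjectInputStream. The Order class and the helper names toBytes/fromBytes are hypothetical and exist only for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class RoundTripDemo {
    // Hypothetical example class. Serializable is a marker: there is nothing to implement.
    static class Order implements Serializable {
        private static final long serialVersionUID = 0;
        final String symbol;
        final int quantity;

        Order(String symbol, int quantity) {
            this.symbol = symbol;
            this.quantity = quantity;
        }
    }

    // Object -> bytes
    static byte[] toBytes(Object obj) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Bytes -> object
    static Object fromBytes(byte[] data) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(data))) {
            return in.readObject();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Order copy = (Order) fromBytes(toBytes(new Order("IBM", 100)));
        System.out.println(copy.symbol + " x " + copy.quantity);
    }
}
```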
It is a joke
… and the only JEP joke (!)
… and it has a big grain of truth:
There are lots of folk myths about serialization
Hazelcast Portable
Google protobuf
Hessian
Protostuff

Kryo

Coherence POF
JBoss serialization

Apache Avro

fast-serialization

Apache Thrift

Hadoop Writable

jserial

quickser
GridGain serialization

msgpack

wobly

Burlap

EclipseLink MOXy

Gson
Json-lib

JAXB
XStream

FLEXJSON

Javolution XML

Jackson
svenson

java.beans.XMLEncoder/XMLDecoder
jsonij
The obvious fact of life

Real hackers write everything from scratch :)
Every mature appserver/framework/cache/eventbus/soa/rpc/db/etc
must have its own serialization that is better than everybody else’s :)
How to choose a serialization lib?
• Cross-language or Java-only
• Binary or text (XML/JSON/other) format
• Custom or automatic object-to-serialized format mapping
• Needs format description/mapping/annotation or not
• Writes object schema (e.g. field names) with each object, once per
request, once per session, never at all, or some mixed approach
• Writes object fields or bean properties
• Serializes object graph with shared objects or not
• Performance
- CPU consumption to serialize/deserialize

- Network consumption (bytes per object)
Java built-in serialization lib?
• Cross-language or Java-only
• Binary or text (XML/JSON/other) format
• Custom or automatic object-to-serialized format mapping
• Needs format description/mapping/annotation or not
• Writes object schema (e.g. field names) with each object, once per
request, once per session, never at all, or some mixed approach
• Writes object fields or bean properties
• Serializes object graph with shared objects or not
• Performance
- CPU consumption to serialize/deserialize

- Network consumption (bytes per object)
Serious serialization lib choice diagram
• Cross-platform data transfer?
- YES: Java to JS?
• YES: Jackson/gson/etc
• NO: do further research
- NO: Use built-in Java Serialization. Do you have problems with it?
• YES: do further research
• NO: nothing more to do
Serialization myth #1
I’ve found this framework called XYZ…
… it has really simple and fast serialization

All you have to do is to implement this interface! Easy!

I’ve tested it! It is really fast!
It outperforms everything else I’ve tried!
/* This is an almost verbatim real-life conversation.
* Names changed.
*/
In fact, this is a corollary to “The obvious fact of life”
• Of course it is the fastest possible way! But this does not solve
the most complex serialization problems
- You can implement java.io.Externalizable if you are up to writing
custom serialization code for each object.

• Maintenance cost and effort are often underestimated
- How are you planning to evolve your system? Add new fields? New
classes?
• What will keep you from forgetting to read/write them?
• Ah… yes, you write a lot of extra tests in addition to writing your
write/read methods to make sure your serialization works
- How about on-the-wire compatibility between old and new code?
• But I deploy all of my system with a new version at once!
• But how do you work with it during development?
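To make the maintenance point concrete, here is a sketch of what the hand-written approach looks like with java.io.Externalizable. The Quote class is hypothetical. Every field has to be written and read by hand, in the same order, and nothing in the language reminds you to update both methods when a field is added.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.Externalizable;
import java.io.IOException;
import java.io.ObjectInput;
import java.io.ObjectInputStream;
import java.io.ObjectOutput;
import java.io.ObjectOutputStream;

class ExternalizableDemo {
    // Hypothetical class used only for illustration.
    public static class Quote implements Externalizable {
        String symbol;
        double price;

        // Externalizable requires a public no-arg constructor for deserialization.
        public Quote() {
        }

        Quote(String symbol, double price) {
            this.symbol = symbol;
            this.price = price;
        }

        @Override
        public void writeExternal(ObjectOutput out) throws IOException {
            out.writeUTF(symbol);
            out.writeDouble(price);
        }

        @Override
        public void readExternal(ObjectInput in) throws IOException {
            // Must mirror writeExternal exactly: same fields, same order, same types.
            symbol = in.readUTF();
            price = in.readDouble();
        }
    }

    static Quote roundTrip(Quote quote) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(quote);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Quote) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Quote copy = roundTrip(new Quote("IBM", 101.5));
        System.out.println(copy.symbol + " @ " + copy.price);
    }
}
```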
Serialization myth #2

java.io.Serializable …
“This interface limits how its implementing classes can
change in the future. By implementing Serializable you
expose your flexible in-memory implementation details as a
rigid binary representation. Simple code changes--like
renaming private fields--are not safe when the changed
class is serializable.”

http://developer.android.com/reference/java/io/Serializable.html
So what about this code evolution?

write -> read -> got [code screenshots not preserved]
serialVersionUID
• serialVersionUID = crazy hash of your class
- If you change your class -> serialVersionUID changes -> incompatible

• But you can set serialVersionUID as a “private static final long”
field in your class to fix it.
• With a fixed serialVersionUID, Java serialization supports:
- Adding/removing fields without any additional hurdles and annotations
- Moving fields around in class definition
- Extracting/inlining super-classes (but without moving serializable fields
up/down class hierarchy).

• It basically means you can evolve your class at will, just as you
would do with a key-value map in JSON or another untyped system
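A small sketch of the difference between a declared and a computed serialVersionUID. The classes Pinned and Unpinned and the helper uidOf are hypothetical names. ObjectStreamClass.lookup returns the descriptor that serialization itself uses, so it reports the declared constant when there is one and the computed hash otherwise.

```java
import java.io.ObjectStreamClass;
import java.io.Serializable;

class SerialVersionDemo {
    // Declares its serialVersionUID, so fields can be added or removed compatibly.
    static class Pinned implements Serializable {
        private static final long serialVersionUID = 0;
        String symbol;
    }

    // No declaration: the value is computed as a hash of the class structure
    // and changes when the class changes.
    static class Unpinned implements Serializable {
        String symbol;
    }

    static long uidOf(Class<?> type) {
        return ObjectStreamClass.lookup(type).getSerialVersionUID();
    }

    public static void main(String[] args) {
        System.out.println("Pinned:   " + uidOf(Pinned.class));
        System.out.println("Unpinned: " + uidOf(Unpinned.class));
    }
}
```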
serialver
• That’s the tool to compute serialVersionUID as some crazy hash
of your class
- You only need it when your class is “in the wild” without explicitly
assigned serialVersionUID and you want to be backwards
compatible with that version of it

• DO NOT USE serialver for freshly created classes
- It is a pseudo-random number that serves no purpose other than making
your gzipped class files and gzipped serial files larger
- Just set serialVersionUID = 0
• It is no worse, and in fact better, than any other value
So what about this code evolution (2)?

write -> read [code screenshots not preserved]

got
symbol – as written
quantity – as written
price – as written
orderType – null

Profit!

It works the other way around, too!
What if I want to make my class incompatible?
• You have two choices
- serialVersionUID++
- Rename class

• So, having serialVersionUID in a modern serialization framework is
an optional feature
- Refactoring tools did not exist back in 1996, so renaming a
class was not an option for them

- They chose a default of “your changes break serialization” and force
you to explicitly declare serialVersionUID if you want “your changes
are backwards compatible”
• I would have preferred the other default, but having to manually
add serialVersionUID is a small price to pay for a serialization
facility that you have out of the box in Java.
Complex changes made compatible
• Java serialization has a bunch of features to support radical
evolution of serializable classes in a backwards-compatible way:
- writeReplace/readResolve
• Lets you migrate your code to a completely new class, but still use
the old class for serialization.
- putFields/writeFields/readFields
• Lets you completely redo the fields of the object, but still be
compatible with the old set of fields for serialization.

It really helps if you are working on, evolving,
improving, and refactoring really big projects
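The putFields/writeFields/readFields route can be sketched as follows. The Money class is hypothetical: assume an older version stored two fields, dollars and cents, and the refactored version keeps a single totalCents value in memory. The serial form is pinned to the old field set through serialPersistentFields, so the bytes on the wire stay the same.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.ObjectStreamField;
import java.io.Serializable;

class LegacyFieldsDemo {
    // Hypothetical class: in memory it holds totalCents, on the wire it is still dollars + cents.
    static class Money implements Serializable {
        private static final long serialVersionUID = 0;

        // The serial form: the field set of the (assumed) old version of this class.
        private static final ObjectStreamField[] serialPersistentFields = {
            new ObjectStreamField("dollars", long.class),
            new ObjectStreamField("cents", int.class)
        };

        private transient long totalCents; // new in-memory representation

        Money(long totalCents) {
            this.totalCents = totalCents;
        }

        private void writeObject(ObjectOutputStream out) throws IOException {
            ObjectOutputStream.PutField fields = out.putFields();
            fields.put("dollars", totalCents / 100);
            fields.put("cents", (int) (totalCents % 100));
            out.writeFields();
        }

        private void readObject(ObjectInputStream in) throws IOException, ClassNotFoundException {
            ObjectInputStream.GetField fields = in.readFields();
            totalCents = fields.get("dollars", 0L) * 100 + fields.get("cents", 0);
        }
    }

    // Serializes a Money value and returns the totalCents of the deserialized copy.
    static long roundTripCents(long totalCents) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Money(totalCents));
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return ((Money) in.readObject()).totalCents;
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripCents(12345));
    }
}
```

writeReplace/readResolve works at a coarser level: instead of remapping fields, the object nominates a different object to be written or returned in its place.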
What about “renaming private fields is unsafe”?
• It is just a matter of the developer’s choice of what actually constitutes
the “serial form” of the class:
- Its private fields
• like in Java serialization
- Its public API with its getter/setter methods
• aka “Bean Serialization”
• many serialization frameworks follow this approach
• good match when you need to expose public API via serialization
- Totally separate code, file, or annotation that defines the serial form
• you have to write it in addition to your class
• more work, but the approach is the most flexible and pays off in
a highly heterogeneous system (many platforms/languages)
http://www.organisationscience.com/styled-6/
Serialization myth #3
In practice
• It really matters when very verbose text formats like XML are used
in combination with not-so-fast serializer implementations
• Most binary formats are fast and compact enough that they do
not constitute a bottleneck in the system
- Data is usually expensive to acquire (compute, fetch from db, etc)
- Otherwise, if the data is already in a cache and all the system does is
retrieve it and send it, cache the serialized data in binary form
(aka “marshalled object”), which makes serialization performance irrelevant
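One way to read the “cache serialized data” advice is sketched below. SerializedCacheDemo and its method names are hypothetical. The cache stores the already-serialized bytes, so a cache hit costs no serialization work at all: the same byte array can be handed straight to the network layer.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class SerializedCacheDemo {
    // Hypothetical cache: key -> serialized form of the value.
    private final Map<String, byte[]> cache = new ConcurrentHashMap<>();

    // Returns the serialized value, loading and serializing it only on a cache miss.
    byte[] serializedFor(String key, Supplier<? extends Serializable> loader) {
        return cache.computeIfAbsent(key, k -> serialize(loader.get()));
    }

    static byte[] serialize(Object value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(value);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        SerializedCacheDemo demo = new SerializedCacheDemo();
        byte[] first = demo.serializedFor("greeting", () -> "hello");
        byte[] second = demo.serializedFor("greeting", () -> "never loaded");
        System.out.println(first == second);
    }
}
```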
Popular myth #4

© Stack Overflow authors, emphasis is mine
A popular way to benchmark serialization

https://github.com/eishay/jvm-serializers/blob/master/tpc/src/serializers/JavaBuiltIn.java
The real fact about serialization performance
• ObjectOutputStream/ObjectInputStream are expensive to create
- >1KB of internal buffers allocated
- It matters if you really do a lot of serialization

• You can reuse OOS/OIS instances if you need to
- It is easy when you write a single stream of values
• Just keep doing writeObject(!)
• That is what it was designed for
• The added benefit is that class descriptors are written just once
- It is slightly tricky if you want to optimize it for one-time writes
• The actual gain is modest and depends on the kinds of objects you
are serializing (their size, your heap/gc load, etc)
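The effect of reusing one stream can be sketched by comparing output sizes. The Tick class and the method names are hypothetical. Writing many objects through one ObjectOutputStream emits the stream header and the class descriptor once, while a fresh stream per object repeats both every time.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

class StreamReuseDemo {
    // Hypothetical small business object.
    static class Tick implements Serializable {
        private static final long serialVersionUID = 0;
        final int price;

        Tick(int price) {
            this.price = price;
        }
    }

    // All objects go through one stream: header and class descriptor are written once.
    static int bytesWithOneStream(int count) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            for (int i = 0; i < count; i++) {
                out.writeObject(new Tick(i));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.size();
    }

    // A new stream per object: header and class descriptor are repeated for each one.
    static int bytesWithStreamPerObject(int count) {
        int total = 0;
        for (int i = 0; i < count; i++) {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(new Tick(i));
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
            total += bytes.size();
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("one stream:        " + bytesWithOneStream(100) + " bytes");
        System.out.println("stream per object: " + bytesWithStreamPerObject(100) + " bytes");
    }
}
```

Note that the single long-lived stream keeps a reference to every object it has written until reset() is called.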
The real fact about serialization performance (2)
• Each ObjectOutputStream writes a class descriptor on the first
encounter
- reset() clears the internal handle table that keeps both written objects (for
object graph serialization) and written class descriptors
• BEWARE: All written objects are retained until you call reset()

• Writing the class descriptor is a significant portion of both CPU time
consumption and resulting space
- That is the price you pay for all the automagic class evolution features
without having to manually define field ids, create mappings, or write
custom code
- The data itself is relatively fast to write and has a straightforward binary
format (4 bytes per int, etc)
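A visible consequence of the handle table is sketched below, using a hypothetical Position class. If the same object is written twice without reset(), the second write is only a back-reference, so a change made in between is not transmitted. Calling reset() forgets the object and it is written in full again.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class ResetDemo {
    // Hypothetical mutable object.
    static class Position implements Serializable {
        private static final long serialVersionUID = 0;
        int quantity;
    }

    // Writes the same object twice, mutating it in between, and returns
    // the two quantities seen by the reader.
    static int[] writeMutateWrite(boolean resetBetween) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                Position position = new Position();
                position.quantity = 1;
                out.writeObject(position);
                if (resetBetween) {
                    out.reset(); // clears written objects AND written class descriptors
                }
                position.quantity = 2;
                out.writeObject(position);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                Position first = (Position) in.readObject();
                Position second = (Position) in.readObject();
                return new int[] {first.quantity, second.quantity};
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        int[] withoutReset = writeMutateWrite(false);
        int[] withReset = writeMutateWrite(true);
        System.out.println("without reset: " + withoutReset[0] + ", " + withoutReset[1]);
        System.out.println("with reset:    " + withReset[0] + ", " + withReset[1]);
    }
}
```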
Beware of benchmarks
• A typical benchmark compares different serialization frameworks
doing tasks of absolutely different complexity
• With the Java serialization facility you necessarily pay for
- No-hassle object-to-serial format mapping
• (no writing of interface descriptions, no code generation)
- Support for a wide range of class evolution scenarios
- Support for object graph writes

- Support for machines with different byte-endianness

• Most significant performance/size improvements are gained by
forfeiting one of these qualities
- Be sure to understand what exactly you forfeit
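“Support for object graph writes” is one of the qualities a faster library may quietly drop. As a sketch with hypothetical Account and Transfer classes: two fields that point at the same object before serialization still point at one shared object afterwards, instead of two independent copies.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

class SharedGraphDemo {
    // Hypothetical classes used only for illustration.
    static class Account implements Serializable {
        private static final long serialVersionUID = 0;
        String id;

        Account(String id) {
            this.id = id;
        }
    }

    static class Transfer implements Serializable {
        private static final long serialVersionUID = 0;
        Account from;
        Account to;

        Transfer(Account from, Account to) {
            this.from = from;
            this.to = to;
        }
    }

    static Transfer roundTrip(Transfer transfer) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(transfer);
            }
            try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return (Transfer) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Account shared = new Account("ACC-1");
        Transfer copy = roundTrip(new Transfer(shared, shared));
        // The shared reference survives: both fields point at one deserialized Account.
        System.out.println(copy.from == copy.to);
    }
}
```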
Real cons of Java serialization
• Just one reset() to rule them all
- No way to “resetObjects()” to clear all message-level object graph info
(referenced objects), but keep session-level information (written class
descriptors) to avoid repetition when keeping a network session to
recipient open
• Does anyone know the ins and outs of the JEP process to improve it?

• Can use some performance improvement :)
- Some obviously unnecessary allocations of internal objects can be
removed, and the code simplified and streamlined
• I’m looking for help in the “contributing to JDK” area
- But do remember that it rarely matters in practice
Summary
• Java serialization is there out of the box
• It handles most Java-to-Java serialization needs well
- While providing a wide range of goodies

• It provides a non-compact but binary representation
- It works especially well if you have collections of repeated business
objects

• It covers a wide range of class evolution scenarios
- Set serialVersionUID = 0 to be happy

• It has decent performance that does not often show on profiler’s
radar in practice
• We use it in Devexperts for all our Java-to-Java non-market-data
46000

305400
Sent orders in one day

Users on-line in one day

We create professional financial software for
Brokerage and Exchanges since 2002
99.99%
Our software index
of trouble-free operation

350
People in our team

365/7/24
We support our products
around-the-clock
Headquarters
197110, 10/1 Barochnaya st.
Saint Petersburg, Russia
+7 812 438 16 26
mail@devexperts.com

www.devexperts.com
Contact me by email: elizarov at devexperts.com
Read my blog: http://elizarov.livejournal.com/
