State Schema Evolution for Apache Flink® Applications
戴资力, Tzu-Li (Gordon) Tai
Apache Flink PMC
Agenda
1. Evolving Stateful Flink Streaming Applications
2. Schema Evolution for Flink Built-in Types
3. Implementing Custom State Serializers
Evolving Stateful Flink Streaming Applications
Anatomy of a Flink stream job upgrade
[Diagram, built up over five slides: User code, Local state backend, Persisted savepoint]
1. User code performs local reads / writes that manipulate state in the local state backend.
2. On savepoint, the state is persisted to a DFS.
3. The application is upgraded.
4. State is restored from the savepoint to the state backends.
5. The upgraded user code continues to access the state.
Schema Evolution for Built-In Types
State registration with built-in serialization
ValueStateDescriptor<MyStateType> desc =
    new ValueStateDescriptor<>(
        "my-value-state",
        MyStateType.class   // type information for state
    );
ValueState<MyStateType> state = getRuntimeContext().getState(desc);
Flink infers information about the type and creates a serializer for it
● Primitive types and primitive arrays: IntSerializer, DoubleSerializer, LongPrimitiveArraySerializer, etc.
● Tuples: TupleSerializer
● POJOs / Scala case classes: PojoSerializer, CaseClassSerializer
● Apache Avro types: AvroSerializer
● Fallback is Kryo: KryoSerializer
Evolving state schema for Apache Avro types
● Can swap between GenericRecord and code-generated SpecificRecords
● Can evolve the schema according to the Avro specifications*
● Cannot change the namespace of generated SpecificRecord classes
*Avro specifications: http://avro.apache.org/docs/1.7.7/spec.html#Schema+Resolution
Status quo of schema evolution support
● Avro types are the only built-in types that support schema evolution (as of Flink 1.7)
● More is planned for 1.8+: POJOs, Scala case classes, Rows (for Flink Tables)
● Avoid using Kryo if you want an evolvable schema for state
Implementing Custom State Serializers
State registration with custom serializers
class MyStateTypeSerializer extends TypeSerializer<MyStateType> { … }
ValueStateDescriptor<MyStateType> desc =
    new ValueStateDescriptor<>(
        "my-value-state",
        new MyStateTypeSerializer()
    );
ValueState<MyStateType> state = getRuntimeContext().getState(desc);
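As a rough illustration of what a custom state serializer has to do, the sketch below reduces the idea to a write/read pair over java.io streams. It deliberately does not extend Flink's TypeSerializer, which has more methods than the slide shows; MiniSerializer, MyStateTypeSerializerSketch and the two fields given to MyStateType are hypothetical names introduced only for this example.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical state type: the slides never show the fields of MyStateType.
final class MyStateType {
    final long count;
    final String label;

    MyStateType(long count, String label) {
        this.count = count;
        this.label = label;
    }
}

// Simplified stand-in for a serializer. This is NOT Flink's TypeSerializer.
interface MiniSerializer<T> {
    void serialize(T value, DataOutputStream out) throws IOException;

    T deserialize(DataInputStream in) throws IOException;
}

final class MyStateTypeSerializerSketch implements MiniSerializer<MyStateType> {
    @Override
    public void serialize(MyStateType value, DataOutputStream out) throws IOException {
        // The order and encoding of these writes define the state's schema.
        out.writeLong(value.count);
        out.writeUTF(value.label);
    }

    @Override
    public MyStateType deserialize(DataInputStream in) throws IOException {
        // Reads must mirror the writes exactly.
        long count = in.readLong();
        String label = in.readUTF();
        return new MyStateType(count, label);
    }

    // Example helper: serialize a value to bytes and read it back.
    static MyStateType roundTrip(MyStateType value) {
        MyStateTypeSerializerSketch serializer = new MyStateTypeSerializerSketch();
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            serializer.serialize(value, out);
            out.flush();
            DataInputStream in =
                new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()));
            return serializer.deserialize(in);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Because the byte layout is fixed by the sequence of write calls, any change to that sequence (a new field, a different encoding) is a schema change in the sense used on the following slides.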
State Schema and Serialization
● The terms "data schema" and "serialization format" are interchangeable here
● Evolving a state's data schema requires evolving the state's serializer
● Depending on the serialization behaviour of the state backend (heap vs. off-heap), state migration may be required
State Serialization for Heap Backends
[Diagram, built up over six slides: User code, Local state backend holding state objects Key1 to Key5, Persisted savepoint]
1. The job registers its state with new SerializerV1(); the heap backend keeps the state of Key1 to Key5 as objects.
2. On savepoint, the objects are serialized by the V1 serializer and persisted as "Key 1 bytes V1" to "Key 5 bytes V1".
3. The upgraded job registers the same state with a new serializer:
ValueStateDescriptor<MyStateType> desc =
    new ValueStateDescriptor<>(
        "my-value-state",
        new SerializerV2()
    );
4. Restoring the V1 bytes into state objects requires the V1 serializer.
5. After restore, the heap backend again holds plain objects for Key1 to Key5.
6. On the next savepoint, the objects are serialized by the V2 serializer and persisted as "Key 1 bytes V2" to "Key 5 bytes V2".
Summary:
● Deserialization happens on restore, serialization on snapshot: lazy serialization, eager deserialization
● By nature, restoring + snapshotting state is already a state migration process
● Requires a written form of the previous serializer in the snapshot
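The heap-backend flow can be sketched in plain Java: deserialize eagerly on restore with the old format, keep objects in memory, and serialize only at the next snapshot with the new format. Everything here is a hypothetical model (the Counter type, the made-up V1/V2 byte layouts, a HashMap standing in for the backend); it is not Flink's backend code.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical state type. V1 stored only `count`; V2 also stores `lastSeen`.
final class Counter {
    final long count;
    final long lastSeen;

    Counter(long count, long lastSeen) {
        this.count = count;
        this.lastSeen = lastSeen;
    }
}

final class HeapBackendSketch {
    // V1 format (assumed for this example): [long count]
    static byte[] writeV1(Counter c) {
        return ByteBuffer.allocate(8).putLong(c.count).array();
    }

    static Counter readV1(byte[] data) {
        // The field that V1 never wrote gets a default value.
        return new Counter(ByteBuffer.wrap(data).getLong(), 0L);
    }

    // V2 format (assumed for this example): [long count][long lastSeen]
    static byte[] writeV2(Counter c) {
        return ByteBuffer.allocate(16).putLong(c.count).putLong(c.lastSeen).array();
    }

    static Counter readV2(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        return new Counter(buf.getLong(), buf.getLong());
    }

    // Restore: eagerly deserialize every entry with the PREVIOUS (V1) format.
    static Map<String, Counter> restore(Map<String, byte[]> savepointV1) {
        Map<String, Counter> heap = new HashMap<>();
        for (Map.Entry<String, byte[]> e : savepointV1.entrySet()) {
            heap.put(e.getKey(), readV1(e.getValue()));
        }
        return heap;
    }

    // Snapshot: objects are serialized only now, with the NEW (V2) format.
    static Map<String, byte[]> snapshot(Map<String, Counter> heap) {
        Map<String, byte[]> savepointV2 = new HashMap<>();
        for (Map.Entry<String, Counter> e : heap.entrySet()) {
            savepointV2.put(e.getKey(), writeV2(e.getValue()));
        }
        return savepointV2;
    }
}
```

No separate migration step appears anywhere: a restore followed by a snapshot already rewrites every entry, which is the point the summary bullets make.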
State Serialization for Out-of-Core Backends
[Diagram, built up over eight slides: User code, Local state backend holding serialized bytes per key, Persisted savepoint]
1. The job registers its state with new SerializerV1(); the backend stores the state of every key as bytes written by V1, and each access from user code goes through the serializer.
2. On savepoint, the bytes are persisted by file transfer, without being re-serialized.
3. The upgraded job registers the same state with a new serializer:
ValueStateDescriptor<MyStateType> desc =
    new ValueStateDescriptor<>(
        "my-value-state",
        new SerializerV2()
    );
4. On restore, the V1 bytes are transferred back into the local state backend as files.
5. State access with the V2 serializer would now read V1 bytes, so migration is required.
6. Migration rewrites the bytes of every key with V2.
7. The next savepoint transfers the V2 bytes as files.
Summary:
● Serialization happens on every state access: eager serialization, lazy deserialization
● After restore, state migration occurs on first access if the schema has changed
● The previous serializer is required if state migration occurs
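A minimal sketch of the out-of-core case, under the assumption that the backend can be modelled as a map from key to raw bytes. The V1 and V2 layouts (a 4-byte int versus an 8-byte long) and all class and method names are made up for this example; a real out-of-core backend is far more involved.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of an out-of-core backend: values live as bytes, never as objects.
final class OutOfCoreBackendSketch {
    private final Map<String, byte[]> store = new HashMap<>();

    // V1 format (assumed): 4-byte int. V2 format (assumed): 8-byte long.
    static byte[] writeV1(long value) {
        return ByteBuffer.allocate(4).putInt((int) value).array();
    }

    static long readV1(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getInt();
    }

    static byte[] writeV2(long value) {
        return ByteBuffer.allocate(8).putLong(value).array();
    }

    static long readV2(byte[] bytes) {
        return ByteBuffer.wrap(bytes).getLong();
    }

    // Restore is a plain byte copy ("file transfer"): nothing is deserialized.
    void restore(Map<String, byte[]> savepoint) {
        store.putAll(savepoint);
    }

    // Every access pays for (de)serialization with the CURRENT (V2) serializer.
    long get(String key) {
        return readV2(store.get(key));
    }

    void put(String key, long value) {
        store.put(key, writeV2(value));
    }

    // Migration: read each entry with the previous serializer, rewrite it with the new one.
    void migrateV1ToV2() {
        for (Map.Entry<String, byte[]> e : store.entrySet()) {
            e.setValue(writeV2(readV1(e.getValue())));
        }
    }

    int storedSize(String key) {
        return store.get(key).length;
    }
}
```

In this model, calling get on restored V1 bytes before migrateV1ToV2 fails with a BufferUnderflowException, because the V2 reader expects 8 bytes and finds 4. That is the "state access with V2 serializer?" problem from the diagram, and it is why the previous serializer has to be available after restore.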
Main abstraction: TypeSerializerSnapshot
interface TypeSerializerSnapshot<T> {
    int getCurrentVersion();
    void writeSnapshot(DataOutputView out) throws IOException;
    void readSnapshot(int readVersion, DataInputView in, ClassLoader userCodeClassLoader) throws IOException;
    TypeSerializer<T> restoreSerializer();
    TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility(TypeSerializer<T> newSerializer);
}
● Represents the written form of a state's serializer, written to snapshots
● Encodes information about the state's written schema + serializer configuration
● Serves as a factory for the previous serializer
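To make the role of getCurrentVersion() and the readVersion argument concrete, here is a stdlib-only sketch of a snapshot whose own written form has changed once. FieldListSnapshotSketch, its two layouts, and the framing helpers (version first, then payload) are assumptions made for this example and do not reproduce Flink's classes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical snapshot that records which fields a serializer wrote, in order.
final class FieldListSnapshotSketch {
    private List<String> fieldNames = new ArrayList<>();

    FieldListSnapshotSketch() {}                       // used when reading

    FieldListSnapshotSketch(List<String> fieldNames) { // used when writing
        this.fieldNames = new ArrayList<>(fieldNames);
    }

    int getCurrentVersion() {
        return 2;
    }

    // Current (version 2) layout: [int n][n UTF strings]
    void writeSnapshot(DataOutputStream out) throws IOException {
        out.writeInt(fieldNames.size());
        for (String name : fieldNames) {
            out.writeUTF(name);
        }
    }

    void readSnapshot(int readVersion, DataInputStream in) throws IOException {
        if (readVersion == 1) {
            // Assumed older layout: one non-empty, comma-separated UTF string.
            fieldNames = new ArrayList<>(Arrays.asList(in.readUTF().split(",")));
        } else if (readVersion == 2) {
            int n = in.readInt();
            fieldNames = new ArrayList<>();
            for (int i = 0; i < n; i++) {
                fieldNames.add(in.readUTF());
            }
        } else {
            throw new IOException("Unknown snapshot version: " + readVersion);
        }
    }

    List<String> fieldNames() {
        return fieldNames;
    }

    // Framing assumed by this sketch: the version is written before the payload.
    static byte[] writeVersioned(FieldListSnapshotSketch snapshot) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(snapshot.getCurrentVersion());
            snapshot.writeSnapshot(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static FieldListSnapshotSketch readVersioned(byte[] data) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
            int readVersion = in.readInt();
            FieldListSnapshotSketch snapshot = new FieldListSnapshotSketch();
            snapshot.readSnapshot(readVersion, in);
            return snapshot;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Builds bytes in the assumed version 1 layout, as an old savepoint would contain.
    static byte[] legacyV1Bytes(String commaSeparated) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeInt(1);
            out.writeUTF(commaSeparated);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The no-argument constructor plus readSnapshot mirrors the pattern on the slides: an empty snapshot object is created first and then filled from the stream, so old savepoints stay readable even after the snapshot's own format moves on.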
State Serialization for Heap Backends, with TypeSerializerSnapshot
[Diagram, built up over three slides: User code, Local state backend, Persisted savepoint]
1. On savepoint, the state objects are serialized by SerializerV1, and the snapshot returned by SerializerV1.snapshotConfiguration() is written next to the state bytes as SerializerV1Snapshot.
2. On restore, with new SerializerV2() registered in the ValueStateDescriptor, SerializerV1Snapshot.restoreSerializer() recreates SerializerV1.
3. The V1 bytes of Key1 to Key5 are deserialized by SerializerV1 into state objects.
State Serialization for Out-of-Core Backends, with TypeSerializerSnapshot
[Diagram, built up over seven slides: User code, Local state backend, Persisted savepoint]
1. After restore, the backend holds V1 bytes and the savepoint carries SerializerV1Snapshot; user code registers new SerializerV2() and wants to access the state.
2. The snapshot of the previous serializer is asked whether the new serializer is compatible:
TypeSerializerSchemaCompatibility<T> compat =
    serializerV1Snapshot.resolveSchemaCompatibility(serializerV2);
if (compat.isCompatibleAfterMigration()) {
    // migrate the state schema
}
3. SerializerV1Snapshot.restoreSerializer() recreates SerializerV1.
4. SerializerV1 reads the bytes of each key into a state object.
5. SerializerV2 writes the state object back to the backend.
6. Once every key has been rewritten, the backend holds only V2 bytes.
Example: Evolvable PojoSerializer [FLINK-10987]
class Employee {
    int age;
    String name;
    Department dep;
    ...
}
[Diagram, built up over four slides: for each field in turn, "write field name", followed by that field's serializer: IntSerializer for age, StringSerializer for name, PojoSerializer for dep]
class PojoSerializer<T> extends TypeSerializer<T> {
    private Field[] fields;
    private TypeSerializer<?>[] fieldSerializers;
    …
    public TypeSerializerSnapshot<T> snapshotConfiguration() {
        return new PojoSerializerSnapshot<>(fields, fieldSerializers);
    }
}
class PojoSerializerSnapshot<T> implements TypeSerializerSnapshot<T> {
    private Field[] fields;
    private TypeSerializer<?>[] fieldSerializers;
    /**
     * Constructor for instantiating the snapshot when reading.
     */
    public PojoSerializerSnapshot() {}
    /**
     * Constructor to create a snapshot for writing.
     */
    public PojoSerializerSnapshot(Field[] fields, TypeSerializer<?>[] fieldSerializers) {
        this.fields = fields;
        this.fieldSerializers = fieldSerializers;
    }
    ...
}
class PojoSerializerSnapshot<T> implements TypeSerializerSnapshot<T> {
    ...
    public TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility(TypeSerializer<T> newSerializer) {
        if (newSerializer instanceof PojoSerializer) {
            Field[] newFields = ((PojoSerializer<T>) newSerializer).getFields();
            if (hasDifferentTypedFields(this.fields, newFields)) {
                return TypeSerializerSchemaCompatibility.incompatible();
            } else if (hasNewFields(this.fields, newFields) || hasRemovedFields(this.fields, newFields)) {
                return TypeSerializerSchemaCompatibility.compatibleAfterMigration();
            }
            return TypeSerializerSchemaCompatibility.compatibleAsIs();
        }
        return TypeSerializerSchemaCompatibility.incompatible();
    }
}
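The helpers hasDifferentTypedFields, hasNewFields and hasRemovedFields are not shown on the slides. One way they could work is sketched below, with each schema simplified to a map from field name to field type instead of a Field[]; the enum and class names are hypothetical, and the three outcomes only mirror incompatible(), compatibleAfterMigration() and compatibleAsIs().

```java
import java.util.Map;

// Hypothetical stand-in for the three outcomes used on the slide.
enum CompatibilitySketch { COMPATIBLE_AS_IS, COMPATIBLE_AFTER_MIGRATION, INCOMPATIBLE }

final class PojoCompatibilitySketch {
    // A field present in both schemas but with a different type cannot be migrated.
    static boolean hasDifferentTypedFields(Map<String, Class<?>> oldFields,
                                           Map<String, Class<?>> newFields) {
        for (Map.Entry<String, Class<?>> e : oldFields.entrySet()) {
            Class<?> newType = newFields.get(e.getKey());
            if (newType != null && !newType.equals(e.getValue())) {
                return true;
            }
        }
        return false;
    }

    // True if the new schema has a field name the old schema did not have.
    static boolean hasNewFields(Map<String, Class<?>> oldFields,
                                Map<String, Class<?>> newFields) {
        return !oldFields.keySet().containsAll(newFields.keySet());
    }

    // True if the old schema had a field name the new schema no longer has.
    static boolean hasRemovedFields(Map<String, Class<?>> oldFields,
                                    Map<String, Class<?>> newFields) {
        return !newFields.keySet().containsAll(oldFields.keySet());
    }

    // Same decision order as resolveSchemaCompatibility on the slide.
    static CompatibilitySketch resolve(Map<String, Class<?>> oldFields,
                                       Map<String, Class<?>> newFields) {
        if (hasDifferentTypedFields(oldFields, newFields)) {
            return CompatibilitySketch.INCOMPATIBLE;
        } else if (hasNewFields(oldFields, newFields) || hasRemovedFields(oldFields, newFields)) {
            return CompatibilitySketch.COMPATIBLE_AFTER_MIGRATION;
        }
        return CompatibilitySketch.COMPATIBLE_AS_IS;
    }
}
```

The type check comes first on purpose: a schema that both adds a field and changes the type of an existing one is reported as incompatible rather than as migratable.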
class PojoSerializerSnapshot<T> implements TypeSerializerSnapshot<T> {
    ...
    public TypeSerializer<T> restoreSerializer() {
        return new PojoSerializer<>(fields, fieldSerializers);
    }
}
Miscellaneous Best Practices
● Avoid classname changes to the serializer snapshot class
● The classname is the entry point to reading a serializer snapshot
● Avoid using anonymous or nested classes for snapshot classes
● Use CompositeSerializerSnapshot to handle nested TypeSerializers
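Why the classname matters can be illustrated with plain reflection. The sketch assumes that restoring a snapshot boils down to "look the class up by its written name and call its no-argument constructor"; SnapshotMarker, StableSnapshotSketch and ClassNameEntryPointSketch are hypothetical names, and none of this is Flink's actual restore code.

```java
// Hypothetical marker for "something that can be written as a serializer snapshot".
interface SnapshotMarker {}

// A top-level, named class with a no-argument constructor keeps a stable, predictable name.
final class StableSnapshotSketch implements SnapshotMarker {
    public StableSnapshotSketch() {}
}

final class ClassNameEntryPointSketch {
    // What this sketch persists for a snapshot: only its class name.
    static String writeClassName(SnapshotMarker snapshot) {
        return snapshot.getClass().getName();
    }

    // On restore: resolve the written name and call the no-argument constructor.
    static SnapshotMarker instantiate(String className) {
        try {
            Class<?> clazz = Class.forName(className);
            return (SnapshotMarker) clazz.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot restore snapshot class " + className, e);
        }
    }

    // An anonymous implementation gets a compiler-assigned name that contains '$'.
    static SnapshotMarker anonymousSnapshot() {
        return new SnapshotMarker() {};
    }
}
```

If the snapshot class is renamed or moved after a savepoint was taken, instantiate fails because the written name no longer resolves. Anonymous classes are risky for a related reason: their names are assigned by the compiler and can shift when unrelated code in the enclosing class changes.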
Conclusion
● Flink 1.7 now supports state schema evolution
● Avro schema evolution is supported; more support is on the radar
● Covered details on implementing custom state serializers with evolvable schema
State schema evolution for Apache Flink Applications

More Related Content

PPTX
Practical learnings from running thousands of Flink jobs
PDF
Building a fully managed stream processing platform on Flink at scale for Lin...
PPTX
Apache Flink in the Cloud-Native Era
PPTX
Deep Dive into Apache Kafka
PDF
Apache Flink internals
PPTX
Evening out the uneven: dealing with skew in Flink
PPTX
Optimizing Apache Spark SQL Joins
PDF
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning
Practical learnings from running thousands of Flink jobs
Building a fully managed stream processing platform on Flink at scale for Lin...
Apache Flink in the Cloud-Native Era
Deep Dive into Apache Kafka
Apache Flink internals
Evening out the uneven: dealing with skew in Flink
Optimizing Apache Spark SQL Joins
From Message to Cluster: A Realworld Introduction to Kafka Capacity Planning

What's hot (20)

PDF
Flink powered stream processing platform at Pinterest
PDF
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
PPTX
Where is my bottleneck? Performance troubleshooting in Flink
PPTX
The top 3 challenges running multi-tenant Flink at scale
PPTX
Autoscaling Flink with Reactive Mode
PPTX
Kafka replication apachecon_2013
PDF
Introduction to Apache Flink - Fast and reliable big data processing
PPTX
Using the New Apache Flink Kubernetes Operator in a Production Deployment
PPTX
Introduction to Kafka Cruise Control
PPTX
The Current State of Table API in 2022
PPTX
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
PDF
Presto on YARNの導入・運用
PPTX
Kafka 101
PPTX
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
PPTX
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
PDF
Disaster Recovery Plans for Apache Kafka
PDF
[2018] MySQL 이중화 진화기
PDF
Building a SIMD Supported Vectorized Native Engine for Spark SQL
PPTX
Apache Kafka at LinkedIn
PDF
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
Flink powered stream processing platform at Pinterest
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Where is my bottleneck? Performance troubleshooting in Flink
The top 3 challenges running multi-tenant Flink at scale
Autoscaling Flink with Reactive Mode
Kafka replication apachecon_2013
Introduction to Apache Flink - Fast and reliable big data processing
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Introduction to Kafka Cruise Control
The Current State of Table API in 2022
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Presto on YARNの導入・運用
Kafka 101
Kafka Tutorial - Introduction to Apache Kafka (Part 1)
Demystifying flink memory allocation and tuning - Roshan Naik, Uber
Disaster Recovery Plans for Apache Kafka
[2018] MySQL 이중화 진화기
Building a SIMD Supported Vectorized Native Engine for Spark SQL
Apache Kafka at LinkedIn
Disaster Recovery with MirrorMaker 2.0 (Ryanne Dolan, Cloudera) Kafka Summit ...
Ad

Recently uploaded (20)

PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PDF
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
PPTX
Digital-Transformation-Roadmap-for-Companies.pptx
PPTX
Big Data Technologies - Introduction.pptx
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
PDF
Encapsulation theory and applications.pdf
PPTX
Understanding_Digital_Forensics_Presentation.pptx
PDF
KodekX | Application Modernization Development
PDF
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
PDF
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
PDF
Encapsulation_ Review paper, used for researhc scholars
PDF
Machine learning based COVID-19 study performance prediction
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Empathic Computing: Creating Shared Understanding
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
DOCX
The AUB Centre for AI in Media Proposal.docx
PPTX
A Presentation on Artificial Intelligence
PDF
Electronic commerce courselecture one. Pdf
PPT
“AI and Expert System Decision Support & Business Intelligence Systems”
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Peak of Data & AI Encore- AI for Metadata and Smarter Workflows
Digital-Transformation-Roadmap-for-Companies.pptx
Big Data Technologies - Introduction.pptx
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Building Integrated photovoltaic BIPV_UPV.pdf
Encapsulation theory and applications.pdf
Understanding_Digital_Forensics_Presentation.pptx
KodekX | Application Modernization Development
TokAI - TikTok AI Agent : The First AI Application That Analyzes 10,000+ Vira...
Blue Purple Modern Animated Computer Science Presentation.pdf.pdf
Encapsulation_ Review paper, used for researhc scholars
Machine learning based COVID-19 study performance prediction
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Empathic Computing: Creating Shared Understanding
Advanced methodologies resolving dimensionality complications for autism neur...
The AUB Centre for AI in Media Proposal.docx
A Presentation on Artificial Intelligence
Electronic commerce courselecture one. Pdf
“AI and Expert System Decision Support & Business Intelligence Systems”
Ad

State schema evolution for Apache Flink Applications

  • 1. State Schema Evolution for Apache Flink® Applications Apache Flink® 流式应用中状态的数据结构定义升级 戴资力, Tzu-Li (Gordon) Tai Apache Flink PMC
  • 2. Agenda 1. Evolving Stateful Flink Streaming Applications 2. Schema Evolution for Flink Built-in Types 3. Implementing Custom State Serializers Flink 有状态流式应用升级的考虑要素 Flink 内建类别的数据结构定义更新 自订状态序列化器的实现
  • 3. Evolving Stateful Flink Streaming Applications Flink 有状态流式应用升级的考虑要素
  • 4. Flink 流式应用升级流程解析 Anatomy of a Flink stream job upgrade local read / writes that manipulate state User code Local state backend Persisted savepoint 本地状态后端 持久保存点 使用者代码
  • 5. Flink 流式应用升级流程解析 Anatomy of a Flink stream job upgrade local read / writes that manipulate state User code Local state backend Persisted savepoint 本地状态后端 持久保存点 使用者代码 persist to DFS on savepoint
  • 6. Flink 流式应用升级流程解析 Anatomy of a Flink stream job upgrade User code Local state backend Persisted savepoint 本地状态后端 持久保存点 使用者代码 upgrade application
  • 7. Flink 流式应用升级流程解析 Anatomy of a Flink stream job upgrade User code Local state backend Persisted savepoint 本地状态后端 持久保存点 使用者代码 Restore state to state backends
  • 8. Flink 流式应用升级流程解析 Anatomy of a Flink stream job upgrade User code Local state backend Persisted savepoint 本地状态后端 持久保存点 使用者代码 continue to access state
  • 9. 字体 Schema Evolution for Built-In Types Flink 内建类别的数据结构定义更新
  • 10. 状态注册时使用内建序列化器 State registration with built-in serialization ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>( “my-value-state”, MyStateType.class ); ValueState<MyStateType> state = getRuntimeContext().getState(desc);
  • 11. 状态注册时使用内建序列化器 State registration with built-in serialization ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>( “my-value-state”, MyStateType.class ); ValueState<MyStateType> state = getRuntimeContext().getState(desc); type information for state 状态类别资讯
  • 12. State registration with built-in serialization ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>( “my-value-state”, MyStateType.class ); ValueState<MyStateType> state = getRuntimeContext().getState(desc); type information for state Flink infers information about the type and creates a serializer for it ● Primitive types: IntSerializer, DoubleSerializer, LongArraySerializer, etc. ● Tuples: TupleSerializer ● POJOs / Scala case classes: PojoSerializer, CaseClassSerializer ● Apache Avro types: AvroSerializer ● Fallback is Kryo: KryoSerializer 状态类别资讯 状态注册时使用内建序列化器
  • 13. 以 Apache Avro 进行状态数据结构定义进化 Evolving state schema for Apache Avro types Can swap between GenericRecord and code generated SpecificRecords Can evolve schema according to Avro specifications* *Avro specifications: http://avro.apache.org/docs/1.7.7/spec.html#Schema+Resolution Cannot change namespace of generated SpecificRecord classes 可依据 Avro 规范* 进化状态的数据结构定义 可交替使用 GenericRecord 与代码生成的 SpecificRecord 类别 不可更动 SpecificRecord 类别的命名空间
  • 14. 内建型别的数据结构定义升级支援度现况 Status quo of schema evolution support More is planned for 1.8+: POJOs, Scala case classes, Rows (for Flink Tables) Avro types are the only built-in types that support schema evolution (as of 1.7) Avoid using Kryo if you want evolvable schema for state 目前仅有 Avro 型别有支援数据结构定义升级 (Flink 1.7 现况) 社群有规划支援 POJOs, Scala case class, Rows 等类别的数据结构定义升级 若希望支援数据结构定义升级,请避免使用 KryoSerializer
  • 15. Implementing Custom State Serializers 自订状态序列化器的实现
  • 16. State registration with custom serializers ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>( “my-value-state”, new MyStateTypeSerializer(); ); class MyStateTypeSerializer extends TypeSerializer<MyStateType> { … } ValueState<MyStateType> state = getRuntimeContext().getState(desc); 状态注册时使用自订序列化器
  • 17. 状态的数据结构定义和序列化 State Schema and Serialization Evolving state’s data schema requires evolving the state’s serializer The terms data schema and serialization format are interchangeable here Depending on serialization behaviour of state backends (heap v.s. off-heap) state migration may be required 在此,「数据结构定义」与「序列化格式」两词可交互替换 欲升级状态的数据结构定义则必须升级状态的序列化器 基于不同状态后端 (内存 / 非内存) 的序列化模式,可能需要进行状态迁移
  • 18. 内存式后端的状态序列化模式 State Serialization for Heap Backends User code Local state backend Persisted savepoint Key1 Key2 Key3 Key4 Key5 ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>( “my-value-state”, new SerializerV1() ); 本地状态后端 持久保存点 使用者代码
  • 19. Serialized by V1 serializer 内存式后端的状态序列化模式 State Serialization for Heap Backends User code Local state backend Persisted savepoint Key1 Key2 Key3 Key4 Key5 ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>( “my-value-state”, new SerializerV1() ); 本地状态后端 持久保存点 使用者代码 Key 1 bytes V1 Key 2 bytes V1 Key 3 bytes V1 Key 4 bytes V1 Key 5 bytes V1
  • 20. 内存式后端的状态序列化模式 State Serialization for Heap Backends User code Local state backend Persisted savepoint 本地状态后端 持久保存点 使用者代码 ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>( “my-value-state”, new SerializerV2() ); Key 1 bytes V1 Key 2 bytes V1 Key 3 bytes V1 Key 4 bytes V1 Key 5 bytes V1
  • 21. 内存式后端的状态序列化模式 State Serialization for Heap Backends User code Local state backend Persisted savepoint 本地状态后端 持久保存点 使用者代码 ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>( “my-value-state”, new SerializerV2() ); Key 1 bytes V1 Key 2 bytes V1 Key 3 bytes V1 Key 4 bytes V1 Key 5 bytes V1 Key1 Key2 Key3 Key4 Key5 Requires V1 serializer for restore
  • 22. 内存式后端的状态序列化模式 State Serialization for Heap Backends User code Local state backend Persisted savepoint 本地状态后端 持久保存点 使用者代码 ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>( “my-value-state”, new SerializerV2() ); Key 1 bytes V1 Key 2 bytes V1 Key 3 bytes V1 Key 4 bytes V1 Key 5 bytes V1 Key1 Key2 Key3 Key4 Key5
  • 23. 内存式后端的状态序列化模式 State Serialization for Heap Backends User code Local state backend Persisted savepoint 本地状态后端 持久保存点 使用者代码 ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>( “my-value-state”, new SerializerV2() ); Key 1 bytes V2 Key 2 bytes V2 Key 3 bytes V2 Key 4 bytes V2 Key 5 bytes V2 Key1 Key2 Key3 Key4 Key5 Serialized by V2 serializer
  • 24. State Serialization for Heap Backends By nature, restoring + snapshotting state is already a state migration process Serialization happens on restore + snapshot: lazy serialization, eager deserialization Requires a written form of the previous serializer in the snapshot 反序列化发生于状态恢復阶段、序列化发生于状态的保存点生成 状态的恢復与保存点生成本质上就是一个状态迁移的过程 需要状态之前的序列化器被写入于保存点中 内存式后端的状态序列化模式
  • 25. State Serialization for Out-of-Core Backends [diagram: user code, local state backend, persisted savepoint]. State is registered with ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>("my-value-state", new SerializerV1()); the local state backend holds the value of every key (Key 1 to Key 5) as bytes in the V1 format.
  • 26. State Serialization for Out-of-Core Backends [diagram]. On savepoint, the V1 bytes of Key 1 to Key 5 reach the persisted savepoint by file transfer.
  • 27. State Serialization for Out-of-Core Backends [diagram]. The upgraded application registers the state with ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>("my-value-state", new SerializerV2()); the persisted savepoint still holds V1 bytes for Key 1 to Key 5.
  • 28. State Serialization for Out-of-Core Backends [diagram]. On restore, the V1 bytes return to the local state backend by file transfer.
  • 29. State Serialization for Out-of-Core Backends [diagram]. The local state backend holds V1 bytes. What happens on state access with the V2 serializer?
  • 30. State Serialization for Out-of-Core Backends [diagram]. State access with the V2 serializer over V1 bytes requires migration!
  • 31. State Serialization for Out-of-Core Backends [diagram]. After migration, the local state backend holds V2 bytes for every key; the persisted savepoint still holds V1 bytes.
  • 32. State Serialization for Out-of-Core Backends [diagram]. On the next savepoint, the V2 bytes reach the persisted savepoint by file transfer.
  • 33. State Serialization for Out-of-Core Backends: After restore, state migration occurs on first access if the schema has changed. Serialization and deserialization happen on every state access: eager serialization, lazy deserialization. The previous serializer is required if state migration occurs.
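For contrast with the heap case, here is a minimal sketch of an out-of-core style backend that keeps only bytes. `ByteCodecSketch`, `IntCodecSketch`, and `OutOfCoreBackendSketch` are hypothetical names, and an in-memory map stands in for the on-disk store; the point is only that every write serializes, every read deserializes, and a snapshot copies bytes untouched.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a state serializer; not Flink's TypeSerializer.
interface ByteCodecSketch<T> {
    void write(T value, DataOutputStream out) throws IOException;
    T read(DataInputStream in) throws IOException;
}

// Example codec for Integer state values.
class IntCodecSketch implements ByteCodecSketch<Integer> {
    public void write(Integer value, DataOutputStream out) throws IOException {
        out.writeInt(value);
    }

    public Integer read(DataInputStream in) throws IOException {
        return in.readInt();
    }
}

// Out-of-core style backend: the store only ever contains serialized bytes.
class OutOfCoreBackendSketch<T> {
    private final Map<String, byte[]> bytesByKey = new HashMap<>();
    private final ByteCodecSketch<T> codec;
    int serializeCalls = 0;
    int deserializeCalls = 0;

    OutOfCoreBackendSketch(ByteCodecSketch<T> codec) {
        this.codec = codec;
    }

    // Eager serialization: every write goes through the codec immediately.
    void put(String key, T value) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            codec.write(value, new DataOutputStream(buffer));
            serializeCalls++;
            bytesByKey.put(key, buffer.toByteArray());
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Lazy deserialization: bytes become an object only when a key is read.
    T get(String key) {
        byte[] bytes = bytesByKey.get(key);
        if (bytes == null) {
            return null;
        }
        try {
            deserializeCalls++;
            return codec.read(new DataInputStream(new ByteArrayInputStream(bytes)));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // A snapshot copies the stored bytes as they are; the codec is not involved.
    Map<String, byte[]> snapshot() {
        return new HashMap<>(bytesByKey);
    }
}
```

Since restore also moves bytes without decoding them, nothing forces old-format bytes through the new serializer, so a schema change has to be handled by an explicit migration step.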
  • 34. Main abstraction: TypeSerializerSnapshot. interface TypeSerializerSnapshot<T> { int getCurrentVersion(); void writeSnapshot(DataOutputView out); void readSnapshot(int readVersion, DataInputView in, ClassLoader userCodeClassloader); TypeSerializer<T> restoreSerializer(); TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility(TypeSerializer<T> newSerializer); }
  • 35. Main abstraction: TypeSerializerSnapshot. Represents the written form of a state's serializer, written to snapshots. Encodes information about the state's written schema and the serializer configuration. Serves as a factory for the previous serializer. interface TypeSerializerSnapshot<T> { int getCurrentVersion(); void writeSnapshot(DataOutputView out); void readSnapshot(int readVersion, DataInputView in, ClassLoader userCodeClassloader); TypeSerializer<T> restoreSerializer(); TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility(TypeSerializer<T> newSerializer); }
  • 36. State Serialization for Heap Backends [diagram: user code, local state backend, persisted savepoint]. State is registered with ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>("my-value-state", new SerializerV1()); the local state backend holds Key1 to Key5 as objects. On savepoint, the values are serialized by SerializerV1 into V1 bytes, and the SerializerV1Snapshot obtained from SerializerV1.snapshotConfiguration() is written with them.
  • 37. State Serialization for Heap Backends [diagram]. The upgraded application registers the state with ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>("my-value-state", new SerializerV2()); the SerializerV1Snapshot in the savepoint recreates the previous serializer: SerializerV1Snapshot.restoreSerializer() returns SerializerV1.
  • 38. State Serialization for Heap Backends [diagram]. The V1 bytes of Key 1 to Key 5 are deserialized by the restored SerializerV1 into objects in the local state backend.
  • 39. State Serialization for Out-of-Core Backends [diagram: user code, local state backend, persisted savepoint]. The state is registered with ValueStateDescriptor<MyStateType> desc = new ValueStateDescriptor<>("my-value-state", new SerializerV2()); the local state backend holds restored V1 bytes, and the savepoint carries the SerializerV1Snapshot. What happens on state access with the V2 serializer?
  • 40. State Serialization for Out-of-Core Backends [diagram]. The snapshot is asked about the new serializer: TypeSerializerSchemaCompatibility<T> compat = serializerV1Snapshot.resolveSchemaCompatibility(serializerV2); if (compat.isCompatibleAfterMigration()) { // migrate the state schema }
  • 41. State Serialization for Out-of-Core Backends [diagram]. SerializerV1Snapshot.restoreSerializer() recreates SerializerV1.
  • 42. State Serialization for Out-of-Core Backends [diagram]. SerializerV1 reads the V1 bytes into a state object.
  • 43. State Serialization for Out-of-Core Backends [diagram]. The state object is handed to SerializerV2.
  • 44. State Serialization for Out-of-Core Backends [diagram]. SerializerV2 writes the state object back to the local state backend.
  • 45. State Serialization for Out-of-Core Backends [diagram]. After migration, the local state backend holds V2 bytes for every key; the persisted savepoint still holds V1 bytes for Key 1 to Key 5.
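The read-with-V1, write-with-V2 loop from the preceding slides can be sketched as below. Everything here is an assumption made for illustration: `EmployeeSketch`, `MigrationSketch`, and both byte layouts (V1 stores only `age`, V2 stores `age` followed by `name`) are hypothetical, and a plain map stands in for the state backend.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Map;

// Hypothetical state type used only in this sketch.
class EmployeeSketch {
    final int age;
    final String name;

    EmployeeSketch(int age, String name) {
        this.age = age;
        this.name = name;
    }
}

class MigrationSketch {
    // Assumed V1 layout: [int age]. The name field did not exist yet.
    static byte[] writeV1(EmployeeSketch e) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            new DataOutputStream(buffer).writeInt(e.age);
            return buffer.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Plays the role of the restored previous serializer.
    static EmployeeSketch readV1(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            return new EmployeeSketch(in.readInt(), ""); // new field gets a default
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Assumed V2 layout: [int age][UTF name].
    static byte[] writeV2(EmployeeSketch e) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeInt(e.age);
            out.writeUTF(e.name);
            return buffer.toByteArray();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    static EmployeeSketch readV2(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            return new EmployeeSketch(in.readInt(), in.readUTF());
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    // Migration: decode each entry with the old format, re-encode with the new one.
    static void migrate(Map<String, byte[]> store) {
        for (Map.Entry<String, byte[]> entry : store.entrySet()) {
            entry.setValue(writeV2(readV1(entry.getValue())));
        }
    }
}
```

After `migrate` runs, every entry can be read with `readV2`, which corresponds to the final diagram where the local backend holds only V2 bytes.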
  • 46. Example: Evolvable PojoSerializer [FLINK-10987]. class Employee { int age; String name; Department dep; ... }
  • 47. Example: Evolvable PojoSerializer [FLINK-10987]. class Employee { int age; String name; Department dep; ... } For field age: write the field name; the field serializer is IntSerializer.
  • 48. Example: Evolvable PojoSerializer [FLINK-10987]. class Employee { int age; String name; Department dep; ... } For field name: write the field name; the field serializer is StringSerializer.
  • 49. Example: Evolvable PojoSerializer [FLINK-10987]. class Employee { int age; String name; Department dep; ... } For field dep: write the field name; the field serializer is a nested PojoSerializer.
  • 50. Example: Evolvable PojoSerializer [FLINK-10987]. class PojoSerializer<T> extends TypeSerializer<T> { private Field[] fields; private TypeSerializer<?>[] fieldSerializers; ... public TypeSerializerSnapshot<T> snapshotConfiguration() { return new PojoSerializerSnapshot<>(fields, fieldSerializers); } } class Employee { int age; String name; Department dep; ... }
  • 51. Example: Evolvable PojoSerializer [FLINK-10987]. class PojoSerializerSnapshot<T> implements TypeSerializerSnapshot<T> { private Field[] fields; private TypeSerializer<?>[] fieldSerializers; /** Constructor for instantiating the snapshot when reading. */ public PojoSerializerSnapshot() {} /** Constructor to create a snapshot for writing. */ public PojoSerializerSnapshot(Field[] fields, TypeSerializer<?>[] fieldSerializers) { this.fields = fields; this.fieldSerializers = fieldSerializers; } ... }
  • 52. Example: Evolvable PojoSerializer [FLINK-10987]. class PojoSerializerSnapshot<T> implements TypeSerializerSnapshot<T> { ... public TypeSerializerSchemaCompatibility<T> resolveSchemaCompatibility(TypeSerializer<T> newSerializer) { if (newSerializer instanceof PojoSerializer) { Field[] newFields = ((PojoSerializer<T>) newSerializer).getFields(); if (hasDifferentTypedFields(this.fields, newFields)) { return TypeSerializerSchemaCompatibility.incompatible(); } else if (hasNewFields(this.fields, newFields) || hasRemovedFields(this.fields, newFields)) { return TypeSerializerSchemaCompatibility.compatibleAfterMigration(); } return TypeSerializerSchemaCompatibility.compatibleAsIs(); } return TypeSerializerSchemaCompatibility.incompatible(); } }
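The slide calls `hasDifferentTypedFields`, `hasNewFields`, and `hasRemovedFields` without showing them. One plausible way to write such helpers, matching fields by name, is sketched below; these bodies are an assumption for illustration and are not taken from Flink's PojoSerializer. `FieldDiffSketch` and the `EmployeeV1` to `EmployeeV3` classes are hypothetical.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper bodies; the slide only shows the calls, not the logic.
class FieldDiffSketch {
    // Index fields by name so old and new layouts can be compared.
    private static Map<String, Class<?>> typesByName(Field[] fields) {
        Map<String, Class<?>> types = new HashMap<>();
        for (Field field : fields) {
            types.put(field.getName(), field.getType());
        }
        return types;
    }

    // True if a field kept its name but changed its declared type.
    static boolean hasDifferentTypedFields(Field[] oldFields, Field[] newFields) {
        Map<String, Class<?>> newTypes = typesByName(newFields);
        for (Field oldField : oldFields) {
            Class<?> newType = newTypes.get(oldField.getName());
            if (newType != null && !newType.equals(oldField.getType())) {
                return true;
            }
        }
        return false;
    }

    // True if the new layout contains a field name the old one lacks.
    static boolean hasNewFields(Field[] oldFields, Field[] newFields) {
        return !typesByName(oldFields).keySet()
                .containsAll(typesByName(newFields).keySet());
    }

    // True if the old layout contains a field name the new one lacks.
    static boolean hasRemovedFields(Field[] oldFields, Field[] newFields) {
        return !typesByName(newFields).keySet()
                .containsAll(typesByName(oldFields).keySet());
    }
}

// Hypothetical versions of a state class used to exercise the helpers.
class EmployeeV1 { int age; String name; }
class EmployeeV2 { int age; String name; String title; }
class EmployeeV3 { long age; String name; }
```

With these definitions, going from `EmployeeV1` to `EmployeeV2` counts as an added field (compatible after migration in the slide's logic), while going from `EmployeeV1` to `EmployeeV3` changes the type of `age` and would be reported as incompatible.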
  • 53. Example: Evolvable PojoSerializer [FLINK-10987]. class PojoSerializerSnapshot<T> implements TypeSerializerSnapshot<T> { ... public TypeSerializer<T> restoreSerializer() { return new PojoSerializer<>(fields, fieldSerializers); } }
  • 54. Miscellaneous Best Practices: Avoid classname changes to the serializer snapshot class; the classname is the entry point to reading a serializer snapshot. Avoid using anonymous or nested classes for snapshot classes. Use CompositeSerializerSnapshot to handle nested TypeSerializers.
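Why the classname matters can be illustrated with a small sketch of how a snapshot might be located on restore. This is an assumed mechanism written with plain JDK streams and reflection, not Flink's actual reader; `SnapshotSketch`, `FieldListSnapshot`, and `SnapshotIoSketch` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical, simplified analogue of a serializer snapshot.
interface SnapshotSketch {
    void write(DataOutputStream out) throws IOException;
    void read(DataInputStream in) throws IOException;
}

// A top-level class with a stable name and a public no-argument constructor.
class FieldListSnapshot implements SnapshotSketch {
    String[] fieldNames = new String[0];

    public FieldListSnapshot() {} // used when the snapshot is read back

    FieldListSnapshot(String[] fieldNames) { // used when the snapshot is written
        this.fieldNames = fieldNames;
    }

    public void write(DataOutputStream out) throws IOException {
        out.writeInt(fieldNames.length);
        for (String name : fieldNames) {
            out.writeUTF(name);
        }
    }

    public void read(DataInputStream in) throws IOException {
        fieldNames = new String[in.readInt()];
        for (int i = 0; i < fieldNames.length; i++) {
            fieldNames[i] = in.readUTF();
        }
    }
}

class SnapshotIoSketch {
    // The class name is written first so a reader knows what to instantiate.
    static byte[] writeWithClassName(SnapshotSketch snapshot) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buffer);
            out.writeUTF(snapshot.getClass().getName());
            snapshot.write(out);
            return buffer.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Restore: look the class up by name, instantiate it empty, let it read itself.
    static SnapshotSketch readByClassName(byte[] bytes) {
        try {
            DataInputStream in = new DataInputStream(new ByteArrayInputStream(bytes));
            String className = in.readUTF();
            SnapshotSketch snapshot = (SnapshotSketch)
                    Class.forName(className).getDeclaredConstructor().newInstance();
            snapshot.read(in);
            return snapshot;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("snapshot class not found or not instantiable", e);
        }
    }
}
```

In a scheme like this, renaming the snapshot class makes `Class.forName` fail for old savepoints. Anonymous classes are risky for the same reason: their compiler-generated names (such as `Outer$1`) can change when the surrounding code is edited.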
  • 56. Flink 1.7 now supports state schema evolution. Avro schema evolution is supported; support for more built-in types is on the radar. Covered details on implementing custom state serializers with evolvable schemas.