Make AI ecosystem more interoperable

Kazuaki Ishizaki
IBM Research - Tokyo

About Me – Kazuaki Ishizaki
▪ Researcher at IBM Research – Tokyo
https://ibm.biz/ishizaki
– Compiler optimization, language runtime, and parallel processing
▪ Apache Spark committer since September 2018 (SQL module)
▪ Working on IBM Java (now OpenJ9) since 1996
– Technical lead for Just-in-time compiler for PowerPC
▪ ACM Distinguished Member
▪ SNS
– @kiszk
– https://www.slideshare.net/ishizaki/

Agenda
▪ Motivation
▪ What is an inhibitor of interoperability?
– Endianness on each machine
▪ What is endian?
▪ What happens in a program?
▪ How to find and fix issues?
▪ How to keep interoperability in the AI ecosystem

Very Impressive Performance Improvement on x86
▪ Improve performance of Spark with Python by over 100x
Source: https://databricks.com/blog/2017/10/30/introducing-vectorized-udfs-for-pyspark.html
Apache Spark uses Apache Arrow, a cross-language development platform for in-memory analytics

I Want to Do This on IBM Z
$ bin/pyspark
...
>>> df.show()

Oh!!!
$ bin/pyspark
...
>>> df.show()
...
java.lang.IllegalStateException: Arrow only runs on LittleEndian systems…
...
>>>

Apache Arrow supported only Little Endian

One Pager for Current AI Ecosystem
▪ Data can be exchanged among little-endian machines (e.g., x86, Arm, PowerLinux, …)
Origin: https://www.dremio.com/webinars/apache-arrow-in-theory-practice/

One Pager for Expected AI Ecosystem
▪ Data can be exchanged among machines of both endians (e.g., x86, Arm, s390x, PowerLinux, …)
Origin: https://www.dremio.com/webinars/apache-arrow-in-theory-practice/

What is Endian?
▪ Byte order of data in memory
– Example: the 32-bit integer value 0x01020304

What is Endian?
▪ Memory layout of 0x01020304
addr            10 11 12 13
Little endian   04 03 02 01   (x86_64, ppc64le, …)
Big endian      01 02 03 04   (s390x)
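
The two layouts can be observed on a real machine by copying a value's bytes out of memory. The following is a minimal sketch, not code from the slides; the helper names bytes_of and is_little_endian are hypothetical and chosen only for illustration.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Copy the in-memory bytes of a 32-bit value, lowest address first.
// (bytes_of is a hypothetical helper name, not from the slides.)
inline std::array<std::uint8_t, 4> bytes_of(std::uint32_t value) {
    std::array<std::uint8_t, 4> bytes{};
    std::memcpy(bytes.data(), &value, sizeof(value));
    return bytes;
}

// True when the lowest address holds the least significant byte.
inline bool is_little_endian() {
    return bytes_of(0x01020304u)[0] == 0x04;
}
```

On a little-endian machine bytes_of(0x01020304u) should hold 04 03 02 01, and on a big-endian machine 01 02 03 04, matching the layouts listed for addresses 10 to 13.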

Why Do Programs Usually Work Well?
▪ Programs work well without special care if
– There is no explicit memory access to a subset or a superset of a datum
– The program is self-contained (no data exchange with other machines)
int32_t a, b;
...
int32_t c = a + b;
int16_t d = static_cast<int16_t>(c) + 1;
int32_t e = static_cast<int32_t>(d);
...
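
The reason such code is safe is that casts, masks, and shifts operate on the value, not on its bytes in memory. A small sketch under that observation follows; the helper names low16 and high16 are hypothetical, not from the slides.

```cpp
#include <cstdint>

// Low 16 bits of a 32-bit value, taken by arithmetic on the value.
// Masking does not depend on how the value is laid out in memory.
inline std::uint16_t low16(std::uint32_t v) {
    return static_cast<std::uint16_t>(v & 0xFFFFu);
}

// High 16 bits of a 32-bit value, taken by shifting the value.
inline std::uint16_t high16(std::uint32_t v) {
    return static_cast<std::uint16_t>(v >> 16);
}
```

For 0x01020304, low16 yields 0x0304 and high16 yields 0x0102 on both little-endian and big-endian machines.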

How to Find Issues on Different Endians?
▪ Find bad smells in source code

Does It Have a Bad Smell?
▪ Get 32-bit data from int8 data by reinterpret_cast
uint8_t *i8p = ...;
i8p[0] = 4; i8p[1] = 3; i8p[2] = 2; i8p[3] = 1;
int32_t i32 = *reinterpret_cast<int32_t *>(i8p);
printf("%08x", i32);

Results Are Different on Different Endian Machines
uint8_t *i8p = ...;
i8p[0] = 4; i8p[1] = 3; i8p[2] = 2; i8p[3] = 1;
Little endian: 01020304
Big endian: 04030201

Why Does the Problem Occur?
▪ Processors with different endians interpret the same memory sequence in different ways
uint8_t *i8p = ...;
i8p[0] = 4; i8p[1] = 3; i8p[2] = 2; i8p[3] = 1;
memory layout (from i8p): 04 03 02 01
Little endian reads 01020304
Big endian reads 04030201

Support Both Endians
▪ Swap data for big endian
uint8_t *i8p = ...;
i8p[0] = 4; i8p[1] = 3; i8p[2] = 2; i8p[3] = 1;
int32_t i32 = *reinterpret_cast<int32_t *>(i8p);
// Assumes the compiler defines __LITTLE_ENDIAN__ on little-endian targets
#if !defined(__LITTLE_ENDIAN__)
i32 = __builtin_bswap32(i32);
#endif
Little endian: 01020304
Big endian: 01020304
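
An alternative to the conditional swap is to assemble the value from individual bytes with shifts, which needs no endian check and no compiler-specific macro. This is a different technique from the one on the slide, shown only as a sketch; the function name load_le32 is hypothetical.

```cpp
#include <cstdint>

// Decode four bytes stored least-significant-byte first into a 32-bit value.
// Shifts operate on values, so the result is the same on any host byte order.
inline std::uint32_t load_le32(const std::uint8_t* p) {
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

With the bytes 04 03 02 01, load_le32 returns 0x01020304 on both little-endian and big-endian machines. It also avoids dereferencing a reinterpret_cast pointer, which can run into alignment and aliasing restrictions.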

Support Both Endians in Java
▪ Swap data for big endian
static final boolean LITTLE_ENDIAN =
ByteOrder.nativeOrder() == ByteOrder.LITTLE_ENDIAN;
int i32 = ... // get the value from a buffer
if (!LITTLE_ENDIAN) {
i32 = Integer.reverseBytes(i32);
}

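Potential Bad Smell and Enhancements
▪ Intra-process (i.e., in-memory)
– Get data from memory as a different data type
[Figure: both machines hold the bytes 04 03 02 01 in memory; the little-endian machine reads 01020304 directly, and the big-endian machine reads 01020304 after a swap]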

Potential Bad Smell and Enhancements
▪ Intra-process (i.e., in-memory)
– Get data from memory as a different data type
▪ Inter-process (i.e., host – client)
– Exchange data with other machines
[Figure: memory layouts 04 03 02 01 and 01 02 03 04 on different machines; every machine obtains 01020304, with a swap where the byte order differs]
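
For the inter-process case, one common approach is to fix a single byte order for exchanged data and to convert at the boundary. The sketch below assumes little endian as the exchange order and is not taken from Apache Arrow; the names encode_le32 and decode_le32 are hypothetical.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Append each value to the output as four bytes, least significant first.
inline std::vector<std::uint8_t> encode_le32(const std::vector<std::uint32_t>& values) {
    std::vector<std::uint8_t> out;
    out.reserve(values.size() * 4);
    for (std::uint32_t v : values) {
        for (int shift = 0; shift < 32; shift += 8) {
            out.push_back(static_cast<std::uint8_t>((v >> shift) & 0xFFu));
        }
    }
    return out;
}

// Rebuild 32-bit values from little-endian bytes.
// Trailing bytes that do not form a full 4-byte group are ignored.
inline std::vector<std::uint32_t> decode_le32(const std::vector<std::uint8_t>& bytes) {
    std::vector<std::uint32_t> values;
    values.reserve(bytes.size() / 4);
    for (std::size_t i = 0; i + 4 <= bytes.size(); i += 4) {
        std::uint32_t v = 0;
        for (int k = 0; k < 4; ++k) {
            v |= static_cast<std::uint32_t>(bytes[i + k]) << (8 * k);
        }
        values.push_back(v);
    }
    return values;
}
```

Because both functions work on values with shifts, a sender and a receiver with different endians should agree on the bytes exchanged and on the values recovered from them.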

Can We Find All Issues?
▪ Find bad smells in source code
New code arrives every day

Automatically Detect Issues
▪ Continuously run test cases on machines with different endians
▪ Enhance code to support both endians if we find issues
[Figure: a loop of "Run test cases" and "Enhance code"]

CI Tools and Instances Help OSS Community
▪ TravisCI
▪ Jenkins
▪ Virtual machine instance

Free Resources of Big Endian for OSS Community
▪ TravisCI
– https://docs.travis-ci.com/user/multi-cpu-architectures/
▪ Jenkins
– https://osuosl.org/services/ibm-z/
▪ Virtual machine instance
– https://developer.ibm.com/components/ibm-linuxone/gettingstarted/

Apache Arrow Supports Both Endians
▪ Intra-process (from Apache Arrow 3.0)
– C and Java bindings
▪ Inter-process (from Apache Arrow 4.0)
– C bindings
CI on big endian runs for every PR update

Takeaway
▪ Machines have different endians
– Little endian and big endian
▪ When do we take care of endians?
– Get a sub-set or super-set of data in memory
– Exchange data with other machines
▪ How to find potential issues and support both endians?
– Find bad smells
– Automatically run test cases
▪ How to keep interoperability in the AI ecosystem?
– Easy and free to run CI on different types of machines
Visit https://www.slideshare.net/ishizaki if you are interested in these slides
