Parsing JSON Really Quickly: Lessons Learned

Daniel Lemire
blog: https://lemire.me
twitter: @lemire
GitHub: https://github.com/lemire/
professor (Computer Science) at Université du Québec (TÉLUQ)
Montreal

How fast can you read a large file?
Are you limited by your disk, or
are you limited by your CPU?

An iMac disk: 2.2 GB/s. Faster SSDs (e.g., 5 GB/s)
are available.

Reading text lines (CPU only)
~0.6 GB/s on 3.4 GHz Skylake in Java
void parseLine(String s) {
  volume += s.length();
}
void readString(StringReader data) {
  BufferedReader bf = new BufferedReader(data);
  bf.lines().forEach(s -> parseLine(s));
}
Source available.
Improved by JDK-8229022

Reading text lines (CPU only)
~1.5 GB/s on 3.4 GHz Skylake
in C++ (GNU GCC 8.3)
size_t sum_line_lengths(char * data, size_t length) {
  std::stringstream is;
  is.rdbuf()->pubsetbuf(data, length);
  std::string line;
  size_t sumofalllinelengths{0};
  while(getline(is, line)) {
    sumofalllinelengths += line.size();
  }
  return sumofalllinelengths;
}
Source available.

JSON
Specified by Douglas Crockford
RFC 7159 by Tim Bray in 2014
Ubiquitous format to exchange data
{"Image": {"Width": 800,"Height": 600,
  "Title": "View from 15th Floor",
  "Thumbnail": {
    "Url": "http://www.example.com/81989943",
    "Height": 125,"Width": 100}
  }
}

"Our backend spends half its time serializing and deserializing json"

JSON parsing
Read all of the content
Check that it is valid JSON
Check Unicode encoding
Parse numbers
Build DOM (document-object-model)
Harder than parsing lines?

Jackson JSON speed (Java)
twitter.json: 0.35 GB/s on 3.4 GHz Skylake
Source code available.
                   speed
Jackson (Java)     0.35 GB/s
readLines C++      1.5 GB/s
disk               2.2 GB/s

RapidJSON speed (C++)
twitter.json: 0.65 GB/s on 3.4 GHz Skylake
                   speed
RapidJSON (C++)    0.65 GB/s
Jackson (Java)     0.35 GB/s
readLines C++      1.5 GB/s
disk               2.2 GB/s

simdjson speed (C++)
twitter.json: 2.4 GB/s on 3.4 GHz Skylake
                   speed
simdjson (C++)     2.4 GB/s
RapidJSON (C++)    0.65 GB/s
Jackson (Java)     0.35 GB/s
readLines C++      1.5 GB/s
disk               2.2 GB/s

2.4 GB/s on a 3.4 GHz (+turbo) processor is
~1.5 cycles per input byte
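
The cycles-per-byte figure comes from dividing the clock frequency by the throughput. A rough check, assuming 1 GB = 10^9 bytes and the nominal 3.4 GHz clock (the turbo frequency actually reached during the benchmark is not stated here):

```latex
\[
\frac{3.4 \times 10^{9}\ \text{cycles/s}}{2.4 \times 10^{9}\ \text{bytes/s}}
\approx 1.4\ \text{cycles/byte}
\]
```

Turbo raises the effective clock above 3.4 GHz, so the same throughput costs somewhat more cycles per byte, which is consistent with the ~1.5 quoted above.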

Trick #1: avoid hard-to-predict branches

Write random numbers to an array.
while (howmany != 0) {
  out[index] = random();
  index += 1;
  howmany--;
}
e.g., ~3 cycles per iteration

Write only odd random numbers:
while (howmany != 0) {
  val = random();
  if (val & 1) { // val is odd  <=== new
    out[index] = val;
    index += 1;
  }
  howmany--;
}

From 3 cycles to 15 cycles per value!

Go branchless!
while (howmany != 0) {
  val = random();
  out[index] = val;
  index += (val bitand 1);
  howmany--;
}
back to under 4 cycles!
Details and code available
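
The two loops can be put side by side as a minimal, self-contained sketch. The generator `next_random` is a hypothetical stand-in for `random()` (a small xorshift, not what the benchmark used), and the function names are illustrative. An optimizing compiler is free to turn one form into the other, so timings should be checked on the generated code.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for random(): a 32-bit xorshift step.
// The state must be non-zero.
static uint32_t next_random(uint32_t &state) {
  state ^= state << 13;
  state ^= state >> 17;
  state ^= state << 5;
  return state;
}

// Branching version: the store and the increment sit behind a test
// whose outcome is hard to predict for random input.
size_t keep_odd_branchy(uint32_t *out, size_t howmany, uint32_t seed) {
  size_t index = 0;
  while (howmany != 0) {
    uint32_t val = next_random(seed);
    if (val & 1) {
      out[index] = val;
      index += 1;
    }
    howmany--;
  }
  return index;
}

// Branchless version: always store, and advance the index by 0 or 1.
// An even value is simply overwritten by the next store.
// `out` needs room for `howmany` values.
size_t keep_odd_branchless(uint32_t *out, size_t howmany, uint32_t seed) {
  size_t index = 0;
  while (howmany != 0) {
    uint32_t val = next_random(seed);
    out[index] = val;
    index += (val & 1);
    howmany--;
  }
  return index;
}
```

Both functions return the number of odd values kept, and the first `index` entries of `out` are identical for the same seed.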

What if I keep running the same benchmark?
(same pseudo-random integers from run-to-run)

Trick #2: Use wide "words"
Don't process byte by byte

When possible, use SIMD
Available on most commodity processors (ARM, x64)
Originally added (Pentium) for multimedia (sound)
Adds wider (128-bit, 256-bit, 512-bit) registers
Adds new fun instructions: do 32 table lookups at once.

ISA                   where                          max. register width
ARM NEON (AArch64)    mobile phones, tablets         128-bit
SSE2 ... SSE4.2       legacy x64 (Intel, AMD)        128-bit
AVX, AVX2             mainstream x64 (Intel, AMD)    256-bit
AVX-512               latest x64 (Intel)             512-bit

"Intrinsic" functions (C, C++, Rust, ...) mapping to specific instructions on specific
instruction sets
Higher level functions (Swift, C++, ...): Java Vector API
Autovectorization ("compiler magic") (Java, C, C++, ...)
Optimized functions (some in Java)
Assembly (e.g., in crypto)

Trick #3: avoid memory/object allocation

In simdjson, the DOM (document-object-model) is stored on one contiguous tape.

Trick #4: measure the performance!
benchmark-driven development

Continuous Integration Performance tests
performance regression is a bug that should be spotted early

Processor frequencies are not constant
Especially on laptops
CPU cycles are different from time
Time can be noisier than CPU cycles

Specific examples

Example 1. UTF-8
ASCII strings use 1 byte per code point
Other code points take multiple bytes (2, 3 or 4)
Only 1.1 M valid UTF-8 code points

Validating UTF-8 with if/else/while
if (byte1 < 0x80) {
  return true; // ASCII
}
if (byte1 < 0xE0) {
  if (byte1 < 0xC2 || byte2 > 0xBF) {
    return false;
  }
} else if (byte1 < 0xF0) {
  // Three-byte form.
  if (byte2 > 0xBF
      || (byte1 == 0xE0 && byte2 < 0xA0)
      || (byte1 == 0xED && 0xA0 <= byte2)
      blablabla
     ) blablabla
} else {
  // Four-byte form.
  .... blabla
}
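
For reference, a complete validator in the same branching style might look like the sketch below. It is not the code that was benchmarked for these slides; the function name and the loop structure are assumptions, and it only aims to show how many data-dependent branches a scalar validator takes per character.

```cpp
#include <cstddef>
#include <cstdint>

// Returns true if data[0..len) is well-formed UTF-8.
// Illustrative scalar validator: every test below is a potential
// branch misprediction on mixed input.
bool validate_utf8_branchy(const uint8_t *data, size_t len) {
  size_t i = 0;
  while (i < len) {
    const uint8_t b1 = data[i];
    if (b1 < 0x80) {  // ASCII, one byte
      i += 1;
      continue;
    }
    if (b1 < 0xC2) return false;  // stray continuation or overlong 2-byte form
    size_t need;                  // number of continuation bytes expected
    if (b1 < 0xE0) {
      need = 1;
    } else if (b1 < 0xF0) {
      need = 2;
    } else if (b1 <= 0xF4) {
      need = 3;
    } else {
      return false;  // 0xF5..0xFF never appear in UTF-8
    }
    if (len - i <= need) return false;  // truncated sequence
    const uint8_t b2 = data[i + 1];
    if ((b2 & 0xC0) != 0x80) return false;       // not a continuation byte
    if (b1 == 0xE0 && b2 < 0xA0) return false;   // overlong 3-byte form
    if (b1 == 0xED && b2 >= 0xA0) return false;  // UTF-16 surrogate range
    if (b1 == 0xF0 && b2 < 0x90) return false;   // overlong 4-byte form
    if (b1 == 0xF4 && b2 >= 0x90) return false;  // above U+10FFFF
    for (size_t k = 2; k <= need; k++) {
      if ((data[i + k] & 0xC0) != 0x80) return false;
    }
    i += need + 1;
  }
  return true;
}
```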

Using SIMD
Load 32-byte registers
Use ~20 instructions
No branch, no branch misprediction

Example: Verify that all byte values are no larger than 244
Saturated subtraction: x - 244 is non-zero if and only if x > 244.
_mm256_subs_epu8(current_bytes, 244);
One instruction, checks 32 bytes at once!
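
To see why the saturated subtraction works as a range check, here is a portable scalar model of what the instruction does in each of its 32 byte lanes. It is only a model: the real check uses `_mm256_subs_epu8` with 244 broadcast to every lane, and the helper names below are illustrative.

```cpp
#include <cstddef>
#include <cstdint>

// Scalar model of one lane of an unsigned saturating subtraction:
// the result is clamped at 0 instead of wrapping around.
static inline uint8_t subs_u8(uint8_t a, uint8_t b) {
  return a > b ? static_cast<uint8_t>(a - b) : static_cast<uint8_t>(0);
}

// Returns true if any of the 32 bytes is larger than 244 (0xF4).
// The OR-reduction stands in for testing the vector result against zero.
bool has_byte_above_244(const uint8_t block[32]) {
  uint8_t acc = 0;
  for (size_t i = 0; i < 32; i++) {
    acc |= subs_u8(block[i], 244);  // non-zero only when block[i] > 244
  }
  return acc != 0;
}
```

With a wrapping subtraction the test would not work, since every byte below 244 would also produce a non-zero result.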

processing random UTF-8    cycles/byte
branching                  11
simdjson                   0.5
20x faster!
Source code available.

Example 2. Classifying characters
comma (0x2c): ,
colon (0x3a): :
brackets (0x5b, 0x5d, 0x7b, 0x7d): [, ], {, }
white-space (0x09, 0x0a, 0x0d, 0x20)
others
Classify 16, 32 or 64 characters at once!

Divide values into two 'nibbles'
0x2c is 2 (high nibble) and c (low nibble)
There are 16 possible low nibbles.
There are 16 possible high nibbles.

ARM NEON and x64 processors have instructions to
look up 16-byte tables in a vectorized manner (16
values at a time): pshufb, tbl

Start with an array of 4-bit values
[1, 1, 0, 2, 0, 5, 10, 15, 7, 8, 13, 9, 0, 13, 5, 1]
Create a lookup table
[200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215]
0 → 200, 1 → 201, 2 → 202
Result:
[201, 201, 200, 202, 200, 205, 210, 215, 207, 208, 213, 209, 200, 213, 205, 201]
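
The lookup above can be modelled in scalar code as follows. This sketch assumes every index is in the range 0..15; the hardware instructions treat other index values specially, which the model ignores by masking. The name `lookup16` is illustrative.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Scalar model of a 16-entry vectorized table lookup:
// each output byte is table[index], for 16 indexes at a time.
std::array<uint8_t, 16> lookup16(const std::array<uint8_t, 16> &table,
                                 const std::array<uint8_t, 16> &indexes) {
  std::array<uint8_t, 16> out{};
  for (size_t i = 0; i < 16; i++) {
    out[i] = table[indexes[i] & 0x0F];  // assumes indexes are 4-bit values
  }
  return out;
}
```

A vector instruction performs the 16 lookups of the loop in one step.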

Find two tables H1 and H2 such that the bitwise AND of the lookups classifies the characters.
H1(low(c)) & H2(high(c))
comma (0x2c): 1
colon (0x3a): 2
brackets (0x5b, 0x5d, 0x7b, 0x7d): 4
most white-space (0x09, 0x0a, 0x0d): 8
white space (0x20): 16
others: 0

const uint8x16_t low_nibble_mask =
  (uint8x16_t){16, 0, 0, 0, 0, 0, 0, 0, 0, 8, 12, 1, 2, 9, 0, 0};
const uint8x16_t high_nibble_mask =
  (uint8x16_t){8, 0, 18, 4, 0, 1, 0, 1, 0, 0, 0, 3, 2, 1, 0, 0};
const uint8x16_t low_nib_and_mask = vmovq_n_u8(0xf);
Five instructions:
uint8x16_t nib_lo = vandq_u8(chunk, low_nib_and_mask);
uint8x16_t nib_hi = vshrq_n_u8(chunk, 4);
uint8x16_t shuf_lo = vqtbl1q_u8(low_nibble_mask, nib_lo);
uint8x16_t shuf_hi = vqtbl1q_u8(high_nibble_mask, nib_hi);
return vandq_u8(shuf_lo, shuf_hi);
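
The same classification can be checked one byte at a time with a scalar sketch that reuses the two 16-entry tables from the NEON code. The function name `classify` is illustrative. Evaluating these particular tables on ASCII input gives brackets 1, comma 2, colon 4, tab/newline/carriage return 8 and space 16, so the bit assignment differs from the list given with H1 and H2, although the idea is the same.

```cpp
#include <cstdint>

// Tables copied from the NEON code: one indexed by the low nibble,
// one by the high nibble.
static const uint8_t low_nibble_table[16] = {16, 0, 0, 0, 0, 0, 0, 0,
                                             0, 8, 12, 1, 2, 9, 0, 0};
static const uint8_t high_nibble_table[16] = {8, 0, 18, 4, 0, 1, 0, 1,
                                              0, 0, 0, 3, 2, 1, 0, 0};

// Scalar model of the five vector instructions: split the byte into
// nibbles, look each one up, and AND the two results.
uint8_t classify(uint8_t c) {
  const uint8_t lo = c & 0x0F;
  const uint8_t hi = c >> 4;
  return low_nibble_table[lo] & high_nibble_table[hi];
}
```

For example, the comma 0x2c has low nibble 0xc (table value 2) and high nibble 2 (table value 18), and 2 & 18 = 2.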

Example 3. Detecting escaped characters
" "
 
" "

Can you tell where the strings start and end?
{ "\\\"Nam[{": [ 116,"\\\\" ...
Without branching?

Escaped characters follow an odd sequence of
backslashes!

Identify backslashes:
{ "\\\"Nam[{": [ 116,"\\\\"
___111________________1111_ :B
Odd and even positions
1_1_1_1_1_1_1_1_1_1_1_1_1_1 :E (constant)
_1_1_1_1_1_1_1_1_1_1_1_1_1_ :O (constant)

Do a bunch of arithmetic and logical operations...
(((B + ((B & ~(B << 1)) & E)) & ~B) & ~E) | (((B + ((B & ~(B << 1)) & O)) & ~B) & E)
Result:
{ "\\\"Nam[{": [ 116,"\\\\" ...
______1____________________
No branch!
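
The expression can be tried on a single 64-bit word with the sketch below. It assumes bit i of each mask corresponds to character i of the input (position 0 is the least significant bit), and it ignores runs of backslashes that straddle two 64-bit words, which a real parser has to carry over. The helper `bitmask_of` and the function names are illustrative.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Builds a bitset with bit i set when text[i] == c (first 64 characters only).
uint64_t bitmask_of(const std::string &text, char c) {
  uint64_t mask = 0;
  for (size_t i = 0; i < text.size() && i < 64; i++) {
    mask |= static_cast<uint64_t>(text[i] == c) << i;
  }
  return mask;
}

// Given the backslash bitset B, returns a bitset marking every position
// that immediately follows an odd-length run of backslashes.
uint64_t follows_odd_backslashes(uint64_t B) {
  const uint64_t E = 0x5555555555555555ULL;  // even positions
  const uint64_t O = 0xAAAAAAAAAAAAAAAAULL;  // odd positions
  const uint64_t starts = B & ~(B << 1);     // first backslash of each run
  // Adding a run's start bit to B makes a carry ripple through the run,
  // leaving a single bit just past its end.
  const uint64_t ends_of_even_starts = (B + (starts & E)) & ~B;
  const uint64_t ends_of_odd_starts = (B + (starts & O)) & ~B;
  // A run has odd length when its start and the position just past its
  // end have different parity.
  return (ends_of_even_starts & ~E) | (ends_of_odd_starts & E);
}
```

On the example string, B has bits 3 to 5 and 22 to 25 set, and the only bit in the result is bit 6: the quote after the three backslashes is escaped, while the quote after the four backslashes is not.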

Remove the escaped quotes, and
the remaining quotes tell you where the strings are!

{ "\\\"Nam[{": [ 116,"\\\\"
__1___1_____1________1____1 :all quotes
______1____________________ :escaped quotes
__1_________1________1____1 :string-delimiter quotes

Find the span of the string
mask = quote xor (quote << 1);
mask = mask xor (mask << 2);
mask = mask xor (mask << 4);
mask = mask xor (mask << 8);
mask = mask xor (mask << 16);
...
__1_________1________1____1 (quotes)
becomes
__1111111111_________11111_ (string region)
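
Written out for a full 64-bit word, the shift-and-xor sequence computes a prefix XOR: bit i of the result is the XOR of bits 0 to i of the input. A minimal sketch, assuming bit 0 is the first character and that no string is open at the start of the word (the function name is illustrative):

```cpp
#include <cstdint>

// Prefix XOR over a 64-bit word. With `quote` holding the unescaped
// quote positions, the result is 1 from each opening quote up to, but
// not including, the matching closing quote.
uint64_t prefix_xor(uint64_t quote) {
  uint64_t mask = quote ^ (quote << 1);
  mask ^= mask << 2;
  mask ^= mask << 4;
  mask ^= mask << 8;
  mask ^= mask << 16;
  mask ^= mask << 32;
  return mask;
}
```

On processors with a carry-less multiplication instruction, the same result can be obtained by multiplying the quote mask by a word of all ones.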

Entire structure of the JSON document can be
identified (as a bitset) without any branch!

Example 4. Decode bit indexes
Given the bitset 1000100010001, we want the location of the 1s (e.g., 0, 4, 8, 12)

while (word != 0) {
  result[i] = trailingzeroes(word);
  word = word & (word - 1);
  i++;
}
If the number of 1s per 64-bit word is hard to predict: lots of mispredictions!

Instead of predicting the number of 1s per 64-bit word, predict whether it is in
{1, 2, 3, 4}
{5, 6, 7, 8}
{9, 10, 11, 12}
Easier!

Reduce the number of mispredictions by doing more work per iteration:
while (word != 0) {
  result[i] = trailingzeroes(word);
  word = word & (word - 1);
  result[i+1] = trailingzeroes(word);
  word = word & (word - 1);
  result[i+2] = trailingzeroes(word);
  word = word & (word - 1);
  result[i+3] = trailingzeroes(word);
  word = word & (word - 1);
  i += 4;
}
Discard bogus indexes by counting the number of 1s in the word directly (e.g.,
bitCount)
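
Both loops can be made runnable as follows. This sketch assumes the GCC/Clang builtins `__builtin_ctzll` and `__builtin_popcountll`; because `__builtin_ctzll(0)` is undefined, the helper `trailing_zeroes` returns 64 for a zero word. The function names are illustrative, and the output buffer must hold 64 entries, since the unrolled loop may write up to three bogus values past the last real index.

```cpp
#include <cstddef>
#include <cstdint>

// Number of trailing zero bits, defined as 64 for a zero input.
// Assumes a GCC- or Clang-compatible compiler.
static inline uint32_t trailing_zeroes(uint64_t w) {
  return w != 0 ? static_cast<uint32_t>(__builtin_ctzll(w)) : 64u;
}

// Simple version: one loop iteration (and one loop branch) per set bit.
size_t decode_simple(uint64_t word, uint32_t *result) {
  size_t i = 0;
  while (word != 0) {
    result[i++] = trailing_zeroes(word);
    word &= word - 1;  // clear the lowest set bit
  }
  return i;
}

// Unrolled version: four extractions per iteration. Entries written
// after the word reaches zero are bogus; the population count says
// how many entries are valid.
size_t decode_unrolled(uint64_t word, uint32_t *result) {
  const size_t count = static_cast<size_t>(__builtin_popcountll(word));
  size_t i = 0;
  while (word != 0) {
    result[i] = trailing_zeroes(word);
    word &= word - 1;
    result[i + 1] = trailing_zeroes(word);
    word &= word - 1;
    result[i + 2] = trailing_zeroes(word);
    word &= word - 1;
    result[i + 3] = trailing_zeroes(word);
    word &= word - 1;
    i += 4;
  }
  return count;
}
```

For the bitset from the example (0x1111), both functions report four indexes: 0, 4, 8 and 12.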

Example 5. Number parsing is expensive
strtod:
90 MB/s
38 cycles per byte
10 branch misses per floating-point number

Check whether we have 8 consecutive digits
bool is_made_of_eight_digits_fast(const char *chars) {
  uint64_t val;
  memcpy(&val, chars, 8);
  return (((val & 0xF0F0F0F0F0F0F0F0) |
           (((val + 0x0606060606060606) & 0xF0F0F0F0F0F0F0F0) >> 4))
          == 0x3333333333333333);
}

Then construct the corresponding integer
Using only three multiplications (instead of 7):
uint32_t parse_eight_digits_unrolled(const char *chars) {
  uint64_t val;
  memcpy(&val, chars, sizeof(uint64_t));
  val = (val & 0x0F0F0F0F0F0F0F0F) * 2561 >> 8;
  val = (val & 0x00FF00FF00FF00FF) * 6553601 >> 16;
  return (val & 0x0000FFFF0000FFFF) * 42949672960001 >> 32;
}
Can do even better with SIMD
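
The two routines can be combined and exercised as in the sketch below, which also spells out where the three multipliers come from. The wrapper `parse_eight_digits` is hypothetical, and the integer construction assumes a little-endian machine, where the first character lands in the lowest byte.

```cpp
#include <cstdint>
#include <cstring>

// Each multiplier has the form k * 2^s + 1: multiplying by it and then
// shifting right by s adds k times a lane to its upper neighbour,
// merging adjacent lanes into one decimal value.
static_assert(2561 == 10 * (1 << 8) + 1, "pairs of digits");
static_assert(6553601 == 100 * (1 << 16) + 1, "pairs of 2-digit values");
static_assert(42949672960001ULL == 10000ULL * (1ULL << 32) + 1,
              "pair of 4-digit values");

bool is_made_of_eight_digits_fast(const char *chars) {
  uint64_t val;
  std::memcpy(&val, chars, 8);
  // Every byte must have high nibble 3, and still have it after adding 6
  // (which pushes 0x3A..0x3F into the 0x40 range).
  return (((val & 0xF0F0F0F0F0F0F0F0) |
           (((val + 0x0606060606060606) & 0xF0F0F0F0F0F0F0F0) >> 4)) ==
          0x3333333333333333);
}

uint32_t parse_eight_digits_unrolled(const char *chars) {
  uint64_t val;
  std::memcpy(&val, chars, sizeof(uint64_t));
  val = (val & 0x0F0F0F0F0F0F0F0F) * 2561 >> 8;      // 4 two-digit values
  val = (val & 0x00FF00FF00FF00FF) * 6553601 >> 16;  // 2 four-digit values
  return static_cast<uint32_t>(
      (val & 0x0000FFFF0000FFFF) * 42949672960001 >> 32);
}

// Hypothetical wrapper: parses exactly eight ASCII digits, or reports
// failure without touching *out. Reads 8 bytes from chars.
bool parse_eight_digits(const char *chars, uint32_t *out) {
  if (!is_made_of_eight_digits_fast(chars)) return false;
  *out = parse_eight_digits_unrolled(chars);
  return true;
}
```

For the input "12345678", the first step produces the byte values 12, 34, 56 and 78, the second the 16-bit values 1234 and 5678, and the last the integer 12345678.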

Runtime dispatch
On first call, pointer checks CPU, and reassigns itself. No language support.
int json_parse_dispatch(...) {
  Architecture best_implementation = find_best_supported_implementation();
  // Selecting the best implementation
  switch (best_implementation) {
  case Architecture::HASWELL:
    json_parse_ptr = &json_parse_implementation<Architecture::HASWELL>;
    break;
  case Architecture::WESTMERE:
    json_parse_ptr = &json_parse_implementation<Architecture::WESTMERE>;
    break;
  default:
    return UNEXPECTED_ERROR;
  }
  return json_parse_ptr(....);
}
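
The self-reassigning pointer can be shown in isolation with a small sketch. Every name here is hypothetical (`parse_generic`, `parse_wide`, `cpu_supports_wide_simd`, `parse_ptr`), the two implementations are placeholders that only return the input length, and the CPU probe is a stub; this is not simdjson's actual dispatch code. A production version would likely also need to make the pointer update safe across threads.

```cpp
#include <cstddef>

using parse_fn = int (*)(const char *buf, size_t len);

// Placeholder implementations: a real library would have one routine
// per instruction set. Here both just return the input length.
int parse_generic(const char *, size_t len) { return static_cast<int>(len); }
int parse_wide(const char *, size_t len) { return static_cast<int>(len); }

// Hypothetical CPU probe; a real one would query the processor's
// feature flags. This stub always picks the generic path.
bool cpu_supports_wide_simd() { return false; }

int parse_dispatch(const char *buf, size_t len);

// The pointer starts out aimed at the dispatcher itself.
parse_fn parse_ptr = &parse_dispatch;

// First call: choose an implementation, overwrite the pointer, then
// forward the call. Later calls skip the dispatcher entirely.
int parse_dispatch(const char *buf, size_t len) {
  parse_ptr = cpu_supports_wide_simd() ? &parse_wide : &parse_generic;
  return parse_ptr(buf, len);
}

// Public entry point: always one indirect call.
int parse(const char *buf, size_t len) { return parse_ptr(buf, len); }
```

After the first call to `parse`, the pointer no longer refers to `parse_dispatch`, so the CPU check is paid only once.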

Where to get it?
GitHub: https://github.com/lemire/simdjson/
Modern C++, single-header (easy integration)
ARM (e.g., iPhone), x64 (going back 10 years)
Apache 2.0 (no hidden patents)
Used by Microsoft FishStore and Yandex ClickHouse
wrappers in Python, PHP, C#, Rust, JavaScript (node), Ruby
ports to Rust, Go and C#

Reference
Geoff Langdale, Daniel Lemire, Parsing Gigabytes of JSON per Second, VLDB
Journal, https://arxiv.org/abs/1902.08318

Credit
Geoff Langdale (algorithmic architect and wizard)
Contributors:
Thomas Navennec, Kai Wolf, Tyler Kennedy, Frank Wessels, George Fotopoulos, Heinz
N. Gies, Emil Gedda, Wojciech Muła, Georgios Floros, Dong Xie, Nan Xiao, Egor
Bogatov, Jinxi Wang, Luiz Fernando Peres, Wouter Bolsterlee, Anish Karandikar, Reini
Urban, Tom Dyson, Ihor Dotsenko, Alexey Milovidov, Chang Liu, Sunny Gleason, John
Keiser, Zach Bjornson, Vitaly Baranov, Juho Lauri, Michael Eisel, Io Daza Dillon, Paul
Dreik, Jérémie Piotte and others