Copyright © 2016 NTT DATA Corporation
December 2, 2016
NTT Data Corporation
Ayumi Ishii
Application of PostgreSQL to large social
infrastructure
PGCONF.ASIA 2016
How to use PostgreSQL in social infrastructure
Positioning of the smart meter management system
[diagram] Smart meters (SM) send readings through aggregation devices to the smart meter management system (★) in the data center. Surrounding systems: wheeling management system, fee calculation for new menus, other power companies, billing processing, member management system, reward points system, switching support system, and the Organization for Cross-regional Coordination of Transmission Operators.
Main processing and mission of the system
main processing: 5 million datasets per 30 min → validate → calculation → save data / save calculated data, within 10 minutes
• 240 million additional tuples per day
• must be saved for 24 months
Mission 1: 5-million-tuple INSERT (twice: raw data and calculated data)
Mission 2: save data for 24 months
Mission 3: large-scale SELECT
Mission
1. Load 10 million datasets within 10 minutes !
2. Must save data for 24 months !
3. Stabilize large scale SELECT performance !
(1) Load 10 million datasets within 10 minutes !
Data model
data: [Device ID] [Date] [Electricity Usage]
ex) Device 1 used 500 at 1:00 on August 1st.
Method 1: UPDATE model — UPDATE the new reading for each device into a daily row
Device ID | Day | 0:00 | 0:30 | 1:00 | 1:30 | …
1         | 8/1 | 100  | 300  | 500  |      |
2         | 8/1 | 200  | 400  |      |      |
Frequent UPDATEs are unfavorable for PostgreSQL in terms of performance.
Data model
Method 1: UPDATE model
Device ID | Day | 0:00 | 0:30 | 1:00 | 1:30 | …
1         | 8/1 | 100  | 300  | 500  |      |
2         | 8/1 | 200  | 400  |      |      |
Method 2: INSERT model — INSERT a new row for each device every 30 minutes
Device ID | Date     | Value
1         | 8/1 0:00 | 100
1         | 8/1 0:30 | 300
1         | 8/1 1:00 | 500
…         | …        | …
○ performance  × data size
Selected based on performance
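The two candidate models can be sketched as DDL (table and column names here are illustrative, not the actual schema):

```sql
-- Method 1: UPDATE model -- one wide row per device per day,
-- updated in place as each half-hour reading arrives.
CREATE TABLE usage_wide (
    device_id integer,
    day       date,
    v0000     integer,  -- reading at 0:00
    v0030     integer,  -- reading at 0:30
    -- ... one column per 30-minute slot, 48 in total
    PRIMARY KEY (device_id, day)
);

-- Method 2: INSERT model -- one narrow row per device per reading,
-- appended every 30 minutes.
CREATE TABLE usage_narrow (
    device_id integer,
    ts        timestamp,
    value     integer,
    PRIMARY KEY (device_id, ts)
);
```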
Performance factors
preliminary research on performance factors:
• number of tuples in one transaction?
• multiplicity?
• parameters?
• data type?
• restrictions?
• index?
• version?
• how to load into a partitioned table?
Performance factors (DB design and performance tuning)
• number of tuples in one transaction: 10,000
• multiplicity: 8
• parameter: wal_buffers = 1GB
• data type: minimum
• restrictions: minimum
• indexes: minimum
• version: 9.4
• direct load into partition child tables
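On 9.4 there is no declarative partitioning, so routing rows through a trigger on the parent table adds per-row overhead; loading directly into the target child table avoids it. A sketch with illustrative names (the loader already knows which day each file belongs to):

```sql
-- Assuming day-partitioned child tables named usage_narrow_YYYYMMDD.
COPY usage_narrow_20160801 (device_id, ts, value)
    FROM '/data/sm_20160801.csv' WITH (FORMAT csv);
```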
Bottleneck Analysis with perf
19.83% postgres postgres [.] XLogInsert ★
6.45% postgres postgres [.] LWLockRelease
4.41% postgres postgres [.] PinBuffer
3.03% postgres postgres [.] LWLockAcquire
WAL is the
bottleneck !
The WAL buffer (in memory) is written out to the WAL file (disk I/O) on commit, or when the buffer is full.
wal_buffers parameter
“The auto-tuning selected by the default setting of -1 should give reasonable results in most cases.”
— PostgreSQL documentation
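The auto-tuned value is capped at the size of one WAL segment (16MB), which was too small for this write rate; the tests in this deck used an explicit 1GB. A sketch of the setting (changing wal_buffers requires a server restart):

```
# postgresql.conf
wal_buffers = 1GB    # default -1 auto-tunes, capped at 16MB
```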
wal_buffers
[chart: impact of wal_buffers (16MB vs 1GB) on load time; INSERT only, excluding SELECT]
PostgreSQL version: 9.3 → 9.4
・WAL performance improved
・JSONB
・GIN performance improved
・CONCURRENTLY option
Version upgrade
• We had originally planned to use 9.3, but changed to 9.4.
[chart: impact of the version upgrade (9.3 vs 9.4) on load time; INSERT only, excluding SELECT]
Result
         | 9.3, 16MB | 9.3, 1GB | 9.4, 1GB
INSERT   | 0:07:57   | 0:06:59  | 0:05:49
others*  | 0:03:29   | 0:03:29  | 0:03:29
* other processes were already tuned
target accomplished!!
(2) Must save data for 24 months !
108TB
Reduce data size by selecting the best data type
• Integer
 Use the smallest data type that covers the required range and precision
Type     | precision          | size
SMALLINT | 4 digits           | 2 bytes
INTEGER  | 9 digits           | 4 bytes
BIGINT   | 18 digits          | 8 bytes
NUMERIC  | up to 1000 digits  | 3, 6, or 8 + ceiling(digits / 4) * 2 bytes
• Boolean
 Use BOOLEAN instead of CHAR(1)
Type    | available data    | size
CHAR(1) | string (length 1) | 5 bytes
BOOLEAN | true or false     | 1 byte
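The fixed-width sizes above can be checked directly with `pg_column_size()`:

```sql
SELECT pg_column_size(1::smallint) AS smallint_bytes,  -- 2
       pg_column_size(1::integer)  AS integer_bytes,   -- 4
       pg_column_size(1::bigint)   AS bigint_bytes,    -- 8
       pg_column_size(true)        AS boolean_bytes;   -- 1
-- CHAR(1) additionally carries a variable-length header on top of the
-- character itself, which is why BOOLEAN is the better flag type.
```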
Reduce the data size by changing column order
• alignment
• PostgreSQL does not store a value across an alignment boundary
byte:  1 2 3 4 | 5 6 7 8
       column_1 (4 bytes) | padding (4 bytes)
       column_2 (8 bytes)
ex)
Column   | Type
column_1 | integer
column_2 | timestamp without time zone
column_3 | integer
column_4 | smallint
column_5 | timestamp without time zone
column_6 | smallint
column_7 | timestamp without time zone
Declared order (72 bytes):
column_1 | padding
column_2
column_3 | column_4 | padding
column_5
column_6 | padding
column_7
Reordered, widest columns first (60 bytes):
column_2
column_5
column_7
column_1 | column_3
column_4 | column_6
 12 bytes saved per tuple → about 2.8GB per day (× 240 million tuples)!
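Applied to the example above, the repacking is just a different column order in the CREATE TABLE (the table name is illustrative):

```sql
-- 8-byte columns first, then 4-byte, then 2-byte: no padding is needed
-- between columns.
CREATE TABLE measurements (
    column_2 timestamp without time zone,
    column_5 timestamp without time zone,
    column_7 timestamp without time zone,
    column_1 integer,
    column_3 integer,
    column_4 smallint,
    column_6 smallint
);
```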
Change data model
num | data                 | select frequency | update frequency | policy                      | model
1   | 1st day – 65th day   | high             | high             | performance is the priority | INSERT
2   | 66th day – 24 months | low              | low              | data size is the priority   | UPDATE
We adopted the INSERT model for performance.
• However, its data size is large, making long-term storage difficult.
→ convert old data to the other model
Change data model
INSERT model:
ID | timestamp | value
1  | 8/1 0:00  | 100
2  | 8/1 0:00  | 100
1  | 8/1 0:30  | 300
2  | 8/1 0:30  | 200
1  | 8/1 1:00  | 500
2  | 8/1 1:00  | 300
…  | …         | …
UPDATE model:
ID | date | 0:00 | 0:30 | 1:00 | … | 22:30 | 23:00 | 23:30
1  | 8/1  | 100  | 300  | 500  | … | 1000  | 1100  | 1200
2  | 8/1  | 100  | 200  | 300  | … | 800   | 900   | 1000
Removing the duplicated key data (ID, timestamp):
• tuples per day: 240 million → 5 million
• size: 22GB → 3GB
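The conversion of one day of old data can be sketched with 9.4's aggregate FILTER clause (names are illustrative; the real table has 48 slot columns):

```sql
-- Fold narrow rows into one wide row per device for 2016-08-01.
INSERT INTO usage_wide (device_id, day, v0000, v0030, v0100 /* ... */)
SELECT device_id,
       ts::date,
       max(value) FILTER (WHERE ts::time = '00:00') AS v0000,
       max(value) FILTER (WHERE ts::time = '00:30') AS v0030,
       max(value) FILTER (WHERE ts::time = '01:00') AS v0100
       -- ... one FILTER aggregate per 30-minute slot
FROM usage_narrow
WHERE ts >= date '2016-08-01' AND ts < date '2016-08-02'
GROUP BY device_id, ts::date;
```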
Result
data size reduced: 108TB (before) → 11TB (after)
(3) Stabilize large scale SELECT performance !
Stabilize the performance of 10 million SELECT statements!
“stable performance” is important
• The problem is performance degradation caused by sudden changes in the execution plan.
• control execution plans: pg_hint_plan
• lock statistics: pg_dbms_stats
→ stable performance
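pg_hint_plan reads a hint written in a comment at the head of the query; for example, forcing an index scan (table and index names are illustrative):

```sql
/*+ IndexScan(u usage_narrow_pkey) */
SELECT value
FROM usage_narrow u
WHERE device_id = 1 AND ts = timestamp '2016-08-01 00:00';
```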
Before using pg_hint_plan & pg_dbms_stats
In most cases the optimizer generates the best execution plan, so fixing the execution plan does not always bring a good result.
• The best execution plan now may not be the best in the future.
However, the risk must be reduced: if the execution plan suddenly changes during operation, performance may drop. Typical risky cases:
• SELECT immediately after a batch, before ANALYZE
• SELECT joining many tables
• …
→ understand the demerits, then use these extensions
pg_dbms_stats
pg_dbms_stats sits between the planner and the statistics: it takes a “locked” snapshot of the original statistics, and the planner generates plans from the locked statistics instead of the live ones.
pg_dbms_stats in this system
• usage data is stored in day partitions; each day table carries locked statistics
• when a new day table is added, COPY the locked statistics to it
• some statistics differ per child table:
  • table OID, table name
  • partition key, date
→ we can reliably get the best plan even without running ANALYZE
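Locking the statistics of a partition might look like the following sketch; the exact function names and signatures should be checked against the pg_dbms_stats manual:

```sql
-- Lock the current statistics of an existing day partition.
SELECT dbms_stats.lock_table_stats('usage_narrow_20160801');
-- For a newly added partition, install statistics copied from an existing
-- child, adjusting the per-table values (OID, table name, partition key,
-- date) before locking.
```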
Replacing the statistics that must change per table:
• create dummy data with the assumed distribution
• ANALYZE the dummy data
Column        | statistic
partition key | Most Common Values
date          | histogram
Ex) “8/1 0:00”, “8/1 0:30”, “8/1 1:00” — 48 patterns per day, uniform distribution
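Building the uniform dummy data is straightforward with `generate_series` (table name and device count are illustrative):

```sql
-- 48 half-hour slots per device, uniformly distributed over one day.
INSERT INTO usage_narrow_dummy (device_id, ts, value)
SELECT d,
       timestamp '2016-08-01 00:00' + s * interval '30 minutes',
       0
FROM generate_series(1, 1000) AS d,
     generate_series(0, 47)   AS s;
ANALYZE usage_narrow_dummy;
```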
1. Load 10 million datasets within 10 minutes !
2. Must save data for 24 months !
3. Stabilize large scale SELECT performance !
Mission
COMPLETE
Conclusion
The 20th anniversary of PostgreSQL: PostgreSQL has finally evolved to the point of being adopted in large-scale social infrastructure.
Both PostgreSQL technical knowledge and business application knowledge are necessary to succeed in difficult, large-scale projects.
Preliminary research and know-how are important to get the most out of PostgreSQL.