The Data Platform Administration
Handling the 100 PB
May 19th, 2022
Yongduck Lee
Cloud Platform Department
Rakuten Group, Inc.
2
About me
Lecture History
- Colloquium Lecturer at KAIST
Program Committee
- BigComp 2017/2019
- EDB 2016
Certification
- Certified Scrum Master (CSM)
- Certified Project Management Professional (PMP #1255421)
… etc.
Lee Yongduck Daniel
Vice Section Manager and Senior Architect in the Data Storage and Processing Section, Rakuten Group, Inc.
Started as a recommendation engine developer; now focuses on researching and verifying new Big Data technologies and on supporting users who want to use Big Data systems.
B.Sc. from Korea University (2001).
Has lived in Japan for 21 years and has worked for organizations and companies such as NHK, NTTD, and Rakuten Group, Inc.
3
CONTENTS
1. Global Internet & Data Explosion
2. Data in Rakuten
3. Data Platform & Big Data Administrators at Rakuten
4. Advantages of Being an Engineer at Rakuten
4
Internet & Globalization
The Internet is the global system of interconnected computer networks that use the Internet protocol suite (TCP/IP) to link devices worldwide. It is a network of networks that consists of private, public, academic, business, and government networks of local to global scope, linked by a broad array of electronic, wireless, and optical networking technologies.
The origins of the Internet date back to research commissioned by the federal government of the United States in the 1960s to build robust, fault-tolerant communication with computer networks.
https://en.wikipedia.org/wiki/Internet#World_Wide_Web
Globalization
Chances
Vast Information
Structure: Unstructured 80% / Structured 20%
Volume: 35.2 ZB in 2020
* From IDC white paper & EMC
5
Internet Users
Internet users are defined as persons who accessed the Internet in the last 12 months from any device,
including mobile phones.
https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users#cite_note-UN_WPP-14
6
Internet Users
https://en.wikipedia.org/wiki/List_of_countries_by_number_of_Internet_users#cite_note-UN_WPP-14
In Japan, 92.3% of the population uses the Internet (population 127,202,192; Internet users 117,400,000) as of 2018.
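The 92.3% share follows directly from the two counts on the slide. A minimal Python check, using only the figures quoted above:

```python
# Figures from the slide (Japan, 2018)
population = 127_202_192
internet_users = 117_400_000

def penetration_rate(users: int, pop: int) -> float:
    """Share of the population using the Internet, as a percentage."""
    return users / pop * 100

rate = penetration_rate(internet_users, population)
print(f"{rate:.1f}%")  # prints 92.3%
```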
7
8
9
The Big Data in Rakuten
Diversity and Synergy
The diversity of services and users, not only in Japan but globally, brings huge potential value and possibilities. This makes Rakuten a very interesting and ideal environment for data scientists and data analysts.
Combining data from various services increases the synergy effect on personalization, clustering, segmentation, etc.
Scale
Services and users generate a large volume of data every day, every month, and every year.
For system infrastructure engineers and data engineers, storing this data and making it easy for data users to utilize is a big challenge.
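The point that combining data from several services strengthens personalization and segmentation can be illustrated with a toy join on a shared user ID. This is a minimal sketch under loud assumptions: the service names, the activity counts, the shared user-ID key, and the two-service segmentation rule are all hypothetical and do not describe Rakuten's actual data model.

```python
from collections import defaultdict

# Hypothetical per-service activity counts keyed by a shared user ID.
service_a = {"u1": 5, "u2": 4, "u3": 2}
service_b = {"u1": 1, "u3": 7, "u4": 3}

def combine(**services: dict[str, int]) -> dict[str, dict[str, int]]:
    """Merge per-service counts into one profile per user (missing service -> 0)."""
    profiles = defaultdict(dict)
    for name, counts in services.items():
        for user, count in counts.items():
            profiles[user][name] = count
    names = list(services)
    return {u: {n: p.get(n, 0) for n in names} for u, p in profiles.items()}

def segment(profile: dict[str, int]) -> str:
    """Toy rule: users active in two or more services form a cross-service segment."""
    active = sum(1 for c in profile.values() if c > 0)
    return "cross-service" if active >= 2 else "single-service"

profiles = combine(service_a=service_a, service_b=service_b)
for user in sorted(profiles):
    print(user, profiles[user], segment(profiles[user]))
```

Neither service alone can tell that u1 and u3 are active in both places; the segment only appears once the data is joined.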
10
Rakuten Hadoop and Kafka
Supporting near-real-time and streaming processing in each region.
Handling around 1.3 million messages/sec in total (10 GB/sec IN/OUT) at peak time on a normal day.
During the 2021 Super Sale, we handled more than 2.5 times as many messages and as much traffic.
Supporting the Data Lake, Data Marts, and Data Analysis for Rakuten services in each region.
Data scientists mine a great deal of value from this big data, contributing to Rakuten services.
Kafka: 800 cores, 20 TB memory, 4,728 topics
Hadoop: 80K cores, 600 TB memory, 160K TB disk
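The slide's throughput figures imply a few rough derived numbers. A back-of-the-envelope sketch, assuming decimal units (1 GB = 10^9 bytes), that the 1.3 million messages/sec and 10 GB/sec describe the same traffic, and treating "more than 2.5 times" as a lower bound; if the 10 GB/sec counts inbound and outbound separately, the per-message size would differ accordingly.

```python
# Figures from the slide (normal-day peak).
MESSAGES_PER_SEC = 1.3e6
BYTES_PER_SEC = 10e9        # 10 GB/sec IN/OUT, assumed to cover the same messages
SUPER_SALE_FACTOR = 2.5     # "more than 2.5 times", used here as a lower bound
HADOOP_DISK_TB = 160_000    # 160K TB disk

avg_message_bytes = BYTES_PER_SEC / MESSAGES_PER_SEC
sale_messages_per_sec = MESSAGES_PER_SEC * SUPER_SALE_FACTOR
sale_bytes_per_sec = BYTES_PER_SEC * SUPER_SALE_FACTOR
hadoop_disk_pb = HADOOP_DISK_TB / 1000  # decimal TB -> PB

print(f"average message size ~ {avg_message_bytes / 1e3:.1f} KB")
print(f"Super Sale peak >= {sale_messages_per_sec / 1e6:.2f} M msg/sec")
print(f"Super Sale traffic >= {sale_bytes_per_sec / 1e9:.0f} GB/sec")
print(f"Hadoop disk = {hadoop_disk_pb:.0f} PB")
```

Roughly 7.7 KB per message on average, and at least 3.25 million messages/sec (25 GB/sec) during the sale, under the assumptions above.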
11
The Challenges of Administration
12
The Big Data in Rakuten
- Platform/Middleware Administrator
- Users
- Project/Product Manager
- Big Data Platform Administrator
- Infra/Server Administrator
- Network Administrator
- Software/System Architect
- Software Developer
13
Administration Use Case (HBase)
A user reported performance issues on HBase, but there were no issues or reports from other users of other Hadoop components.
Confirm how the user gets/puts data on HBase.
• HBase: configuration; architecture and work/data flow; application/GC logs
• Dependency component (*HDFS): read/write performance logs; application/GC logs
• Disk/memory/CPU load
• Kernel log
• Network connection
Date & time matching across these sources
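The date & time matching step amounts to lining up events from several logs around the moment of the reported slowdown. A minimal sketch, assuming every source uses a `YYYY-MM-DD HH:MM:SS message` line format and is already sorted by time; the format, source names, and sample lines are hypothetical, not taken from the actual incident.

```python
from datetime import datetime, timedelta
import heapq

TS_FORMAT = "%Y-%m-%d %H:%M:%S"  # hypothetical log timestamp format

def parse(source: str, lines: list[str]):
    """Yield (timestamp, source, message) tuples from 'YYYY-MM-DD HH:MM:SS message' lines."""
    for line in lines:
        stamp, message = line[:19], line[20:]
        yield datetime.strptime(stamp, TS_FORMAT), source, message

def correlate(incident: datetime, window: timedelta, sources: dict[str, list[str]]):
    """Return events from all sources within +/- window of the incident, in time order."""
    # Each source is assumed sorted, so heapq.merge yields one globally ordered stream.
    merged = heapq.merge(*(parse(name, lines) for name, lines in sources.items()))
    return [e for e in merged if abs(e[0] - incident) <= window]

# Made-up sample lines for illustration only.
logs = {
    "hbase": ["2022-01-01 10:15:02 slow get on region r1",
              "2022-01-01 11:00:00 compaction done"],
    "hdfs": ["2022-01-01 10:14:58 slow read from datanode dn3"],
    "kernel": ["2022-01-01 10:15:01 blk_update_request: I/O error"],
}
events = correlate(datetime(2022, 1, 1, 10, 15), timedelta(seconds=5), logs)
for ts, src, msg in events:
    print(ts, src, msg)
```

Viewed together, a disk error in the kernel log just before slow HDFS and HBase reads points toward the hardware layer rather than HBase itself.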
Data hot spotting
Data or configuration caching
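The slide names data hot spotting but not how it was resolved. One common mitigation in HBase is salting row keys so that sequential keys spread across regions; the sketch below shows that technique as an illustration only, not necessarily what the team did. The bucket count and key format are hypothetical.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical; often chosen to match the number of pre-split regions

def salted_key(row_key: str, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the row key with a stable hash bucket so sequential keys spread out."""
    # md5 instead of hash(): Python's str hash is randomized per process,
    # so it would not give the same salt for the same key across clients.
    digest = hashlib.md5(row_key.encode("utf-8")).digest()
    bucket = digest[0] % buckets
    return f"{bucket:02d}-{row_key}".encode("utf-8")

# Monotonically increasing keys (e.g. timestamp-like) would otherwise hit one region.
keys = [f"20220101{i:06d}" for i in range(1000)]
buckets_used = {salted_key(k)[:2] for k in keys}
print(len(buckets_used), "buckets used")
```

The trade-off: a point get can recompute the salt, but a range scan over the original key order now needs one scan per bucket.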
HDFS: JVM configuration change, increasing the handler count, increasing the scanner interval
HW improvement: master node replacement, reduced RegionServers, moving from HDD to NVMe, dedicated RegionServers
OS configuration: increasing root's nproc and nofile limits on dedicated RegionServers
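Raising the process (nproc) and open-file (nofile) limits only helps if the running service actually picks them up. A minimal check using Python's standard `resource` module (Unix only); it reports the limits of the process it runs in, so run it as the relevant user on a RegionServer host. The minimum thresholds are hypothetical examples, not the values used on the slide's cluster.

```python
import resource

# Hypothetical minimums for illustration; tune to the cluster's actual needs.
MIN_NOFILE = 32768
MIN_NPROC = 32000

def limit_status(limit: int, minimum: int) -> tuple[int, bool]:
    """Return the soft limit and whether it meets the minimum (unlimited counts as OK)."""
    soft, _hard = resource.getrlimit(limit)
    ok = soft == resource.RLIM_INFINITY or soft >= minimum
    return soft, ok

for name, limit, minimum in [
    ("nofile", resource.RLIMIT_NOFILE, MIN_NOFILE),
    ("nproc", resource.RLIMIT_NPROC, MIN_NPROC),
]:
    soft, ok = limit_status(limit, minimum)
    print(f"{name}: soft={soft} {'OK' if ok else 'below ' + str(minimum)}")
```

Persistent changes are usually made in `/etc/security/limits.conf`; this script only verifies the effective values.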
HBase: TCPNoDelay, parallel seeking, master table locality; separate WRITE/short-READ/long-READ queues; DEADLINE scheduler, hedged reads, short-circuit reads
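Several of the HBase-side items map to documented `hbase-site.xml` / HDFS client properties. A minimal sketch that renders them in the usual Hadoop `<configuration><property>` layout; the property names follow the HBase reference guide and Hadoop short-circuit read docs as best understood here (verify against your versions), and every value is an illustrative assumption, since the slide does not give the values used. The DEADLINE scheduler is a Linux block I/O setting rather than an HBase property, so it is not included.

```python
import xml.etree.ElementTree as ET

# Illustrative values only; not the settings used in the slide's case.
SETTINGS = {
    # Split RPC handlers into write, short-read (get) and long-read (scan) queues.
    "hbase.ipc.server.callqueue.handler.factor": "0.1",
    "hbase.ipc.server.callqueue.read.ratio": "0.5",
    "hbase.ipc.server.callqueue.scan.ratio": "0.5",
    # Seek store files in parallel within a store scanner.
    "hbase.storescanner.parallel.seek.enable": "true",
    # Disable Nagle's algorithm on client RPC connections.
    "hbase.ipc.client.tcpnodelay": "true",
    # HDFS hedged reads: start a second read if the first exceeds the threshold.
    "dfs.client.hedged.read.threadpool.size": "20",
    "dfs.client.hedged.read.threshold.millis": "10",
    # HDFS short-circuit local reads (socket path must match the DataNode's).
    "dfs.client.read.shortcircuit": "true",
    "dfs.domain.socket.path": "/var/lib/hadoop-hdfs/dn_socket",
}

def to_hadoop_xml(settings: dict[str, str]) -> str:
    """Render settings in the <configuration><property> layout of *-site.xml files."""
    root = ET.Element("configuration")
    for name, value in settings.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")

print(to_hadoop_xml(SETTINGS))
```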
14
Advantages of Working at Rakuten as a Data Engineer
You can work through all the domains a Big Data Platform involves and gain rich experience as a Big Data Platform administrator. Rakuten has experts with deep knowledge and experience in each technical and management domain.
15
Advantages of Working at Rakuten as a Data Engineer
You can also work with various stakeholders from various service domains, from the perspective of data utilization.
DB
Services
Event
INFRA
…