AI Computing Broker
Fujitsu Limited
© 2024 Fujitsu Limited
AI Computing Broker
A middleware to share GPUs among AI apps.
• Works with a wide range of AI apps based on PyTorch and TensorFlow
• Just install and use; no code changes are needed
[Figure: three AI apps, PyTorch / TensorFlow (TF), and ACB]
Key Features
• Best-in-class GPU utilization efficiency
• Enabling full GPU memory for each job
Feature of AI Computing Broker
Best-in-class GPU utilization efficiency
• "Routine-level" allocation that detects the parts of each job that actually use the GPU and dynamically allocates the GPU accordingly
• Efficient GPU usage reduces the execution time of multiple jobs
[Figure: timelines comparing conventional job-level allocation with ACB routine-level GPU allocation. Each timeline shows the jobs' GPU and CPU phases over time and the resulting GPU utilization (Active / Idle).]
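The difference between job-level and routine-level allocation can be illustrated with a small simulation. This is only a toy model, not ACB's scheduler: it assumes a single GPU, jobs made of alternating CPU and GPU phases, a separate CPU for each job, and first-come-first-served access to the GPU. The function names and the phase durations are hypothetical.

```python
import heapq


def job_level(jobs):
    """Each job holds the GPU for its whole run, so jobs execute back to back."""
    makespan = sum(d for job in jobs for _, d in job)
    gpu_busy = sum(d for job in jobs for kind, d in job if kind == "gpu")
    return makespan, gpu_busy / makespan


def routine_level(jobs):
    """The GPU is held only during GPU phases; CPU phases of different jobs overlap."""
    # Heap entries: (time the phase is ready to start, job index, phase index).
    ready = [(0, j, 0) for j in range(len(jobs))]
    heapq.heapify(ready)
    gpu_free_at = 0
    gpu_busy = 0
    makespan = 0
    while ready:
        t, j, p = heapq.heappop(ready)
        kind, duration = jobs[j][p]
        if kind == "gpu":
            start = max(t, gpu_free_at)  # wait until the GPU is released
            end = start + duration
            gpu_free_at = end
            gpu_busy += duration
        else:
            end = t + duration  # assumption: each job has its own CPU
        makespan = max(makespan, end)
        if p + 1 < len(jobs[j]):
            heapq.heappush(ready, (end, j, p + 1))
    return makespan, gpu_busy / makespan


# Two hypothetical jobs, each alternating 1 time unit of CPU work and 1 of GPU work.
job = [("cpu", 1), ("gpu", 1), ("cpu", 1), ("gpu", 1)]
jobs = [job, job]

print("job-level:    ", job_level(jobs))      # (8, 0.5)
print("routine-level:", routine_level(jobs))  # (5, 0.8)
```

In this example the GPU does the same 4 units of work in both cases, but releasing it during CPU phases shortens the total run from 8 to 5 time units and raises utilization from 50% to 80%.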
Feature of AI Computing Broker
Enabling full GPU memory for each job
Conventional: Spatial-sharing
• GPU memory and computing units are shared among Job A/B/C
• Memory is divided among jobs
• Limited to small AI models
ACB: Temporal-sharing
• Allocate GPU to only one job at a time
• Data of other jobs on GPU is automatically swapped to CPU
• Memory is occupied by each job
• Large AI models can run
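The contrast between spatial and temporal sharing can be sketched with a toy memory model. It is a sketch under stated assumptions, not ACB's implementation: spatial sharing is modeled as an equal split of GPU memory, swapping is modeled as bookkeeping only (no data is moved), and the class name, function name, and the 40 GB / 30 GB figures are hypothetical.

```python
class TemporalSharingGPU:
    """Toy model of a GPU that holds one job's data at a time."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.resident = None   # (job name, GB) currently on the GPU
        self.host_swap = {}    # job name -> GB parked in CPU memory
        self.swap_count = 0    # number of swap-outs performed

    def acquire(self, job, mem_gb):
        """Give the whole GPU to `job`, swapping out whichever job was there."""
        if mem_gb > self.capacity_gb:
            raise MemoryError(f"{job} needs {mem_gb} GB, GPU has {self.capacity_gb} GB")
        if self.resident is not None and self.resident[0] != job:
            prev_job, prev_gb = self.resident
            self.host_swap[prev_job] = prev_gb  # swap the previous job's data to CPU
            self.swap_count += 1
        self.host_swap.pop(job, None)           # swap this job's data back in, if parked
        self.resident = (job, mem_gb)


def fits_spatial(capacity_gb, mem_per_job_gb):
    """Assumption: spatial sharing splits GPU memory equally among the jobs."""
    share = capacity_gb / len(mem_per_job_gb)
    return all(m <= share for m in mem_per_job_gb)


# Three hypothetical jobs of 30 GB each on a hypothetical 40 GB GPU.
needs = {"A": 30, "B": 30, "C": 30}
print("fits with spatial sharing:", fits_spatial(40, list(needs.values())))  # False

gpu = TemporalSharingGPU(capacity_gb=40)
for name in ["A", "B", "C", "A"]:
    gpu.acquire(name, needs[name])
print("resident:", gpu.resident, "swapped to CPU:", sorted(gpu.host_swap))
```

With an equal three-way split each job would get about 13 GB, so none of the 30 GB jobs fits; when the jobs take turns, each one sees the full 40 GB, at the cost of a swap whenever the GPU changes hands.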
Development status
• TRADOM Inc. (currency prediction service): doubled the model training throughput
• Sakura Internet Inc. (data center service): deploying more AI tasks beyond hardware limits
Refer to the press release on Oct. 22.
• Single GPU: small-scale AI tasks (e.g., image recognition)
• Multi GPU: using multiple GPUs in a server; LLM inference, fine-tuning
• Multi-server: using multiple servers; large-scale LLM training
[Figure: the three configurations above shown as stages, with an "Available" marker]
Thank you
For more details:
https://www.fujitsu.com/global/products/computing/servers/supercomputer/topics/sc24/

PCCC24 (24th PC Cluster Symposium): Fujitsu Limited, Theme 2: "AI Computing Broker," a middleware technology that improves GPU computational efficiency in AI processing
