PCCC24 (24th PC Cluster Symposium): Fujitsu Limited, Theme 2: "AI Computing Broker," a middleware technology that improves GPU computational efficiency in AI processing

AI Computing Broker
Fujitsu Limited
© 2024 Fujitsu Limited

AI Computing Broker
A middleware to share GPUs among AI apps.
• works with a wide range of AI apps based on PyTorch and TensorFlow
• just install and use; no code changes are needed
[Figure: three AI apps, PyTorch and TensorFlow (TF), and ACB]
Key Features
• Best-in-class GPU utilization efficiency
• Enabling full GPU memory for each job

Feature of AI Computing Broker
Best-in-class GPU utilization efficiency
• "Routine-level" allocation that detects the actual GPU parts of jobs and dynamically allocates the GPU accordingly
• Efficient GPU usage reduces the execution time of multiple jobs
[Figure: GPU utilization (Active/Idle) over time, comparing conventional job-level allocation with ACB routine-level GPU allocation; each job consists of GPU and CPU parts]
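The difference between job-level and routine-level allocation can be illustrated with a toy scheduling model. This is only a sketch under stated assumptions, not ACB's implementation: the phase lists, durations, the function names `job_level` and `routine_level`, and the first-come-first-served GPU policy are all hypothetical, and CPU parts of different jobs are assumed to run in parallel.

```python
import heapq

def job_level(jobs):
    """Job-level allocation: a job holds the GPU from start to finish,
    so jobs run back to back and the GPU idles during every CPU part."""
    makespan = sum(d for job in jobs for _, d in job)
    gpu_busy = sum(d for job in jobs for res, d in job if res == "gpu")
    return makespan, gpu_busy / makespan

def routine_level(jobs):
    """Routine-level allocation (toy model): the GPU is granted only for
    GPU parts, first come first served; CPU parts overlap freely."""
    gpu_free = 0   # time at which the GPU becomes available again
    gpu_busy = 0   # total time the GPU spends active
    makespan = 0
    # Heap entries: (time the job is ready, job index, next phase index)
    ready = [(0, i, 0) for i in range(len(jobs))]
    heapq.heapify(ready)
    while ready:
        t, i, k = heapq.heappop(ready)
        if k == len(jobs[i]):          # job finished
            makespan = max(makespan, t)
            continue
        res, d = jobs[i][k]
        if res == "gpu":
            start = max(t, gpu_free)   # wait if another job holds the GPU
            gpu_free = start + d
            gpu_busy += d
            t = gpu_free
        else:
            t += d                     # CPU part: no GPU needed
        heapq.heappush(ready, (t, i, k + 1))
    return makespan, gpu_busy / makespan

# Hypothetical workload: two jobs alternating CPU and GPU parts (time units).
jobs = [[("cpu", 1), ("gpu", 1), ("cpu", 1), ("gpu", 1)] for _ in range(2)]
print("job-level    :", job_level(jobs))
print("routine-level:", routine_level(jobs))
```

In this toy workload the GPU does the same amount of work in both cases, but the routine-level schedule finishes sooner because one job's CPU parts overlap with the other job's GPU parts.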

Feature of AI Computing Broker
Enabling full GPU memory for each job
Conventional: Spatial-sharing
• Memory and computing units are shared among Job A/B/C
• Memory is divided among jobs
• Limited to small AI models
ACB: Temporal-sharing
• Allocate GPU to only one job at a time
• Data of other jobs on GPU is automatically swapped to CPU memory
• The whole GPU memory is occupied by one job at a time
• Large AI models can run
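The memory argument can be made concrete with a small bookkeeping model. This is a sketch under assumptions, not ACB's mechanism: the 80 GB capacity, the job sizes, and the names `fits_spatial` and `TemporalSharedGPU` are hypothetical, and "swapping" is modeled only as moving a number between two dictionaries.

```python
def fits_spatial(sizes_gb, capacity_gb):
    """Spatial sharing: memory is split evenly, so each job must fit
    into capacity / number_of_jobs."""
    share = capacity_gb / len(sizes_gb)
    return {job: size <= share for job, size in sizes_gb.items()}

class TemporalSharedGPU:
    """Temporal sharing (toy model): one job is resident at a time;
    the data of the previous job is swapped out to host (CPU) memory."""

    def __init__(self, capacity_gb):
        self.capacity_gb = capacity_gb
        self.resident = None   # job currently holding the GPU
        self.on_gpu_gb = 0     # memory used by the resident job
        self.host_swap = {}    # job -> GB parked in CPU memory

    def run(self, job, size_gb):
        if size_gb > self.capacity_gb:
            raise MemoryError(f"{job} does not fit even with the full GPU")
        if self.resident is not None and self.resident != job:
            # Swap the previous job's data out to CPU memory.
            self.host_swap[self.resident] = self.on_gpu_gb
        self.host_swap.pop(job, None)  # swap this job's data back in, if any
        self.resident, self.on_gpu_gb = job, size_gb

# Hypothetical numbers: an 80 GB GPU and three jobs with large models.
sizes = {"A": 60, "B": 50, "C": 70}
print(fits_spatial(sizes, 80))   # each job gets only about 26.7 GB

gpu = TemporalSharedGPU(80)
for job, size in sizes.items():
    gpu.run(job, size)           # each job gets the full 80 GB in turn
print(gpu.resident, gpu.host_swap)
```

With these numbers none of the three jobs fits into a one-third share, while each fits when it holds the whole GPU in turn; the cost is the swap traffic between GPU and CPU memory, which this model does not account for.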

Development status
• TRADOM Inc. (currency prediction service): doubled the model training throughput
• Sakura Internet Inc. (data center service): deploying more AI tasks beyond hardware limits
Refer to the press release on Oct. 22.
• Single GPU: small-scale AI tasks (e.g., image recognition)
• Multi GPU: using multiple GPUs in a server; LLM inference, fine-tuning
• Multi-server: using multiple servers; large-scale LLM training
[Figure: progression from Single GPU through Multi GPU to Multi-server, with an "Available" marker]

Thank you
For more details:
https://www.fujitsu.com/global/products/computing/servers/supercomputer/topics/sc24/
