The power of Ray in the era of
LLM and multi-modality AI
Current: NVIDIA (DGX Cloud)
‘20 ~ ‘24: Anyscale Head of Ray / OSS
Before: Cloudera, LinkedIn; OSS: Hadoop, Spark, etc.
Ray is a popular OSS project
[Chart: Ray's OSS popularity compared with Apache Spark and MLflow]
AI/ML Infra Meetup | The power of Ray in the era of LLM and multi-modality AI
ChatGPT / GPT-4 Trained on Ray!
Fast iterations at Hyper-scale!
This Talk
Adoption Highlights
What is Ray
How is Ray Used?
Future Outlook
Ray: a short history
2016: Started in UC Berkeley (same lab as Spark / vLLM)
2019: Anyscale founded (company behind Ray)
2020: Ray v1.0 release; Ray Serve released
2022: Ray v2.0 release (KubeRay, Ray Data, etc.)
2023 / 2024: a lot of focus on LLMs
A Typical ML Pipeline
Ray: Holistically addresses AI/LLM challenges
Unified Framework for Scaling AI Workloads
Ray Core
“Operating System” for heterogeneous distributed computing
Ray AI Libraries
Data, Train, Tune, Serve, Reinforcement Learning
Minimalist API
ray.init(): Initialize the Ray context.
@ray.remote: Function or class decorator specifying that the function will be executed as a task, or the class instantiated as an actor, in a different process.
.remote: Postfix to every remote function, remote class declaration, or invocation of a remote class method. Remote operations are asynchronous.
ray.put(): Store an object in the object store and return its ID. This ID can be used to pass the object as an argument to any remote function or method call. This is a synchronous operation.
ray.get(): Return an object or list of objects for the given object ID or list of object IDs. This is a synchronous (i.e., blocking) operation.
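The task/future semantics behind this API can be emulated in plain Python. The sketch below is an illustrative analogy using the standard library's concurrent.futures, not Ray's actual implementation: `remote` plays the role of `@ray.remote` plus `.remote()`, and `get` plays the role of `ray.get()`.

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative stand-ins for Ray's task/future semantics (not Ray itself).
_pool = ThreadPoolExecutor(max_workers=4)

def remote(fn):
    """Decorator: fn.remote(...) submits fn asynchronously and
    immediately returns a future (like Ray's .remote())."""
    fn.remote = lambda *args, **kwargs: _pool.submit(fn, *args, **kwargs)
    return fn

def get(future):
    """Block until the future's result is available (like ray.get())."""
    return future.result()

@remote
def add(a, b):
    return a + b

fut = add.remote(1, 2)   # returns immediately with a future
print(get(fut))          # blocks until the result is ready; prints 3
```

The key point the slides make is visible here: submitting work returns a handle right away, so independent tasks can overlap, and blocking happens only at `get`.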
def read_array(file):
    # read ndarray "a" from "file"
    return a

def add(a, b):
    return np.add(a, b)

a = read_array(file1)
b = read_array(file2)
sum = add(a, b)

Function

class Counter(object):
    def __init__(self):
        self.value = 0

    def inc(self):
        self.value += 1
        return self.value

c = Counter()
c.inc()
c.inc()

Class
@ray.remote
def read_array(file):
    # read ndarray "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

a = read_array(file1)
b = read_array(file2)
sum = add(a, b)

@ray.remote
class Counter(object):
    def __init__(self):
        self.value = 0

    def inc(self):
        self.value += 1
        return self.value

c = Counter()
c.inc()
c.inc()

Function → Task, Class → Actor
@ray.remote
def read_array(file):
    # read ndarray "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

id1 = read_array.remote(file1)
id2 = read_array.remote(file2)
id = add.remote(id1, id2)
sum = ray.get(id)

@ray.remote
class Counter(object):
    def __init__(self):
        self.value = 0

    def inc(self):
        self.value += 1
        return self.value

c = Counter.remote()
id4 = c.inc.remote()
id5 = c.inc.remote()

Function → Task, Class → Actor
@ray.remote
def read_array(file):
    # read ndarray "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

id1 = read_array.remote(file1)
id2 = read_array.remote(file2)
id = add.remote(id1, id2)
sum = ray.get(id)

[Diagram: read_array(file1) is launched on Node 1; the call returns the distributed future (object ID) id1 before read_array() finishes.]

Task API
@ray.remote
def read_array(file):
    # read ndarray "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

id1 = read_array.remote(file1)
id2 = read_array.remote(file2)
id = add.remote(id1, id2)
sum = ray.get(id)

[Diagram: both read_array tasks run in parallel on Node 1 and Node 2, producing futures id1 and id2; the dynamic task graph is built at runtime.]

Task API
@ray.remote
def read_array(file):
    # read ndarray "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

id1 = read_array.remote(file1)
id2 = read_array.remote(file2)
id = add.remote(id1, id2)
sum = ray.get(id)

[Diagram: add(id1, id2) is submitted to Node 3; every task has been submitted, but none has finished yet.]

Task API
@ray.remote
def read_array(file):
    # read ndarray "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

id1 = read_array.remote(file1)
id2 = read_array.remote(file2)
id = add.remote(id1, id2)
sum = ray.get(id)

[Diagram: ray.get(id) blocks until the result is available.]

Task API
@ray.remote
def read_array(file):
    # read ndarray "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

id1 = read_array.remote(file1)
id2 = read_array.remote(file2)
id = add.remote(id1, id2)
sum = ray.get(id)

[Diagram: the full task graph is executed across Nodes 1–3 to compute sum.]

Task API
@ray.remote
def read_array(file):
    # read ndarray "a" from "file"
    return a

@ray.remote
def add(a, b):
    return np.add(a, b)

id1 = read_array.remote(file1)
id2 = read_array.remote(file2)
id = add.remote(id1, id2)
sum = ray.get(id)

@ray.remote(num_cpus=2, num_gpus=1)
class Counter(object):
    def __init__(self):
        self.value = 0

    def inc(self):
        self.value += 1
        return self.value

c = Counter.remote()
id1 = c.inc.remote()
id2 = c.inc.remote()
val = ray.get(id2)

Function → Task, Class → Actor
@ray.remote can specify resource demands, supporting heterogeneous hardware.
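The resource-demand idea can be sketched with a toy placement function (a hypothetical sketch; Ray's real scheduler is far more sophisticated): each task declares how many CPUs/GPUs it needs, and the scheduler places it on a node with enough free capacity.

```python
# Toy resource-aware placement, illustrating how declared demands
# (num_cpus, num_gpus) drive scheduling on heterogeneous nodes.
# Hypothetical sketch, not Ray's actual scheduling algorithm.

nodes = [
    {"name": "cpu-node", "cpus": 8, "gpus": 0},
    {"name": "gpu-node", "cpus": 4, "gpus": 1},
]

def place(task_cpus, task_gpus):
    """Return the first node with enough free CPUs/GPUs and reserve them."""
    for node in nodes:
        if node["cpus"] >= task_cpus and node["gpus"] >= task_gpus:
            node["cpus"] -= task_cpus
            node["gpus"] -= task_gpus
            return node["name"]
    return None  # no capacity: the task would queue until resources free up

print(place(2, 1))  # GPU task lands on "gpu-node"
print(place(2, 0))  # CPU-only task lands on "cpu-node"
```

Declaring demands up front is what lets one cluster mix CPU-only and GPU tasks without the user pinning work to machines by hand.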
The “No B.S.” Slide 🤓
At the end of the day, “RPC is all you need”
[Diagram: a client asks a machine to “Run func(x)” and gets back “Return y”.]
At the end of the day, “RPC is all you need”
[Diagram: a client dispatches work to a pool of workers: “Harry: do this”, “Ron: do that”.]
[Diagram: a client sends queries (“SELECT * …”) to a cluster.]
At the end of the day, “RPC is all you need”
[Diagram: a client declares resource demands: “I need someone to do this (w/ 2 CPUs)”, “I need someone to do that (w/ 1 H100 GPU)”.]
At the end of the day, “RPC is all you need”
“No abstraction; it’s like SSH”
“Do things my way (queries)”
“It’s still Python code”
At the end of the day, “RPC is all you need”
A compelling example
This Talk
Adoption Highlights
What is Ray
How is Ray Used?
Future Outlook
Most Successful Use Cases
Model Training
Unstructured Data (text, image, video)
LLM Inference / Fine-tuning
Reinforcement Learning
Graph Computing
…
Model Training – Pinterest
Model Training – Pinterest
Model Training – Pinterest
Model Training – Uber
Model Training – Uber
Unstructured Data – Benchmark
Cost to process 1M images (batch inference):
● Ray: $3.5
● Leading open-source framework: $7.3
● Leading commercial ML platform: $57

Pipeline on Ray Core: Load (CPU) → Pre-processing (CPU) → Inference (GPU) → Save
● Use the most cost-effective hardware for each stage
● Independently scale every stage
https://www.anyscale.com/blog/offline-batch-inference-comparing-ray-apache-spark-and-sagemaker
Unstructured Data – Benchmark
Pipeline on Ray Core: Load (CPU) → Pre-processing (CPU) → Inference (GPU) → Save
● Use the most cost-effective hardware for each stage
● Independently scale every stage
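The staged design can be sketched in plain Python (an illustrative analogy, not Ray Data's actual API): each stage owns its own worker pool, so its parallelism is tuned independently, e.g. many workers for CPU-bound load/pre-processing and few workers for the scarce GPU inference stage.

```python
from concurrent.futures import ThreadPoolExecutor

# Toy staged pipeline: each stage has its own pool, so stages scale
# independently. The stage functions are hypothetical placeholders.
def load(path):        return f"raw({path})"
def preprocess(raw):   return f"tensor({raw})"
def infer(tensor):     return f"pred({tensor})"

load_pool  = ThreadPoolExecutor(max_workers=8)  # CPU-heavy stage
prep_pool  = ThreadPoolExecutor(max_workers=8)  # CPU-heavy stage
infer_pool = ThreadPoolExecutor(max_workers=2)  # scarce "GPU" stage

def run_pipeline(paths):
    raws = load_pool.map(load, paths)            # stage 1
    tensors = prep_pool.map(preprocess, raws)    # stage 2
    return list(infer_pool.map(infer, tensors))  # stage 3

results = run_pipeline(["img0.jpg", "img1.jpg"])
print(results)
```

In the real system the stages would also run on different hardware (CPU nodes feeding GPU nodes), which is what makes per-stage right-sizing pay off in the cost comparison above.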
Unstructured Data – Niantic
Unstructured Data – Niantic
Unstructured Data – Niantic
This Talk
Adoption Highlights
What is Ray
How is Ray Used?
Future Outlook
Zhe’s Takes on ML Infra
“No abstraction; it’s like SSH”
“Do things my way (queries)”
“It’s still Python code”
Zhe’s Takes on ML Infra
How structured is the problem?
- ML is a much more unstructured problem than data at this point
- It could become structured at some point
- Most people still settle for the SSH + command-line approach (e.g., torchrun on Slurm)
Zhe’s Takes on ML Infra
The rise of multi-modality AI
- Far more data-intensive
- More development / trial-and-error
- It becomes more meaningful to “upgrade your gear” (try Ray)