Rivian Internal
Rivian Autonomy and AI
Building a Scalable Platform for Data and ML Workloads
Aleksandar Cvejic
Data Applications and ML Infrastructure
Autonomy and AI, Rivian
Agenda
- ADAS challenges
- ADAS at Rivian
- Polaris - Data Platform
- Polaris Apps
- Simulations
- ML workloads
ADAS - Trends and Challenges
- Data volume and variety
- Processing and orchestration
- Cost/time to market
ADAS Data Variety
- 11 Cameras
  55 MP of real-time imaging
  State-of-the-art resolution and dynamic range
- 5 Radars
  Multi-modal sensing supported by 5 advanced radars
  300 meters of forward-facing detection range
- 360° Sensing
  Overlapping sensors provide redundancy, robustness, and performance
ADAS Data Volumes
- RPD (Records Per Day)
6b -> 40b -> 2t
- Total Volume
14TB -> 50TB -> 1.6PB
- SLA (data availability)
150h -> 38h -> 4h
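The growth implied by the first and last value in each row can be worked out directly. The sketch below assumes that "b" means billion, "t" means trillion, and that decimal units apply (1 PB = 1000 TB); none of this is stated on the slide. The results appear consistent with the rounded 300x, 115x, and 40x figures shown on the closing slide.

```python
# Growth factors implied by the first and last values on the slide.
# Assumptions: "b" = billion, "t" = trillion, 1 PB = 1000 TB.
records_per_day = (6e9, 2e12)   # 6b -> 2t records per day
total_volume_tb = (14, 1600)    # 14 TB -> 1.6 PB, expressed in TB
sla_hours = (150, 4)            # 150 h -> 4 h until data is available


def growth(first, last):
    """Ratio of the larger to the smaller value: how many times it changed."""
    return max(first, last) / min(first, last)


rpd_growth = growth(*records_per_day)
volume_growth = growth(*total_volume_tb)
sla_speedup = growth(*sla_hours)

print(f"records per day: ~{rpd_growth:.0f}x")
print(f"total volume:    ~{volume_growth:.0f}x")
print(f"SLA speed-up:    ~{sla_speedup:.1f}x")
```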
ADAS Data Loop
- Uploaded & Anonymized
- Labelling in Cloud
- Model Training
- New Software OTA
ADAS - Data pipeline
- Loggers
- Ingestion
- Metadata Tagging
- Curation
- Labeling
- Data Set Management
- Model Training
- Deployment
- Data Collection
- OTA Updates
POLARIS - Autonomy Data Platform
Principles:
- Ease of use and Reusability
- Data Management, Governance and Security
- Data Operations, Automation and Collaboration Platform
- Data Visualization and Analytics
- Support Simulations and ML Workloads
Apps:
- Data Discovery
- Metadata Management Framework
- Labeling, Visualization
- Simdash, Cosmo and Rivx
ADAS Architecture for ML workload
- Model Training: S3, EKS, KubeRay, mlflow
- Simulation and Validation: S3, AWS Batch, Databricks, Gitlab
- Data ingestion and processing: SQS, Argo Workflows, AWS Batch, S3
Simulations Architecture
Bottlenecks:
- TPS
- Startup time
- Large simulations
Optimizations:
- Bundling jobs
- Caching docker images
- Fair-share policies
  - shareIdentifier
  - weightFactor
  - shareDecay
- Compute Capacity reservation
Components: Simdash, Databricks, Amazon S3, AWS Batch, Streamlit, PolarisViz, Gitlab CI, EventBridge
Simulation and validation runs: on-demand, scheduled
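Two of the optimizations on this slide, bundling jobs and fair-share policies, can be sketched briefly. The code below is an illustration under stated assumptions, not Rivian's implementation. The `bundle` helper, the scenario names, and the share identifiers "ci" and "nightly" are hypothetical. The dictionary follows the `fairsharePolicy` shape that AWS Batch accepts when creating a scheduling policy, where the decay setting is named `shareDecaySeconds`. Whether the slide's "Compute Capacity reservation" refers to the policy's `computeReservation` field is also an assumption.

```python
from typing import Dict, List


def bundle(scenarios: List[str], bundle_size: int) -> List[List[str]]:
    """Group scenarios so that one submitted job runs several of them.

    Fewer, larger jobs mean fewer submit calls (the TPS bottleneck) and
    fewer container start-ups.
    """
    if bundle_size < 1:
        raise ValueError("bundle_size must be >= 1")
    return [scenarios[i:i + bundle_size]
            for i in range(0, len(scenarios), bundle_size)]


def fair_share_policy(weights: Dict[str, float],
                      decay_seconds: int,
                      reservation_pct: int) -> dict:
    """Build the fairsharePolicy part of an AWS Batch scheduling policy.

    In AWS Batch a lower weightFactor gives that share identifier a
    larger share of compute.
    """
    return {
        "shareDecaySeconds": decay_seconds,
        "computeReservation": reservation_pct,
        "shareDistribution": [
            {"shareIdentifier": name, "weightFactor": weight}
            for name, weight in weights.items()
        ],
    }


# Hypothetical inputs for illustration only.
scenarios = [f"scenario-{n}" for n in range(10)]
bundles = bundle(scenarios, bundle_size=4)
policy = fair_share_policy({"ci": 0.5, "nightly": 1.0},
                           decay_seconds=3600, reservation_pct=10)

print(len(bundles), "jobs instead of", len(scenarios))
print(policy)
```

With boto3, a dictionary like `policy` would be passed as the `fairsharePolicy` argument of the Batch client's `create_scheduling_policy` call, and each submitted job would name its `shareIdentifier`.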
ML Infrastructure
- AWS CloudWatch, AWS Managed Grafana, AWS Managed Prometheus
- AWS EC2, Amazon S3
- Plugins: EFA, Fluentd, EBS, Prometheus, EFS, GPU operator, S3 Mountpoint
- Job Management: Kueue, RayJob, PyTorch
- RayJobs, Experiment Tracking
- Rivx CLI, Polaris, Cosmo
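The speaker notes for this slide describe jobs running as a RayJob (a KubeRay CRD), queued by Kueue, with GPUs exposed as a Kubernetes resource by the GPU operator. A trimmed manifest that ties these three pieces together might look like the sketch below, built as a plain Python dictionary. The job name, queue name, image, and entrypoint are hypothetical, and a real manifest would also need CPU and memory requests and a Ray version. The `kueue.x-k8s.io/queue-name` label and the `nvidia.com/gpu` resource name are the standard ones used by Kueue and the NVIDIA device plugin.

```python
import json


def rayjob_manifest(name, queue, image, entrypoint, workers, gpus_per_worker):
    """Return a trimmed RayJob manifest as a dict (illustrative, not complete)."""
    worker_container = {
        "name": "ray-worker",
        "image": image,
        # GPUs are requested like any other Kubernetes resource.
        "resources": {"limits": {"nvidia.com/gpu": gpus_per_worker}},
    }
    head_container = {"name": "ray-head", "image": image}
    return {
        "apiVersion": "ray.io/v1",
        "kind": "RayJob",
        "metadata": {
            "name": name,
            # Kueue picks the job up through this label.
            "labels": {"kueue.x-k8s.io/queue-name": queue},
        },
        "spec": {
            "entrypoint": entrypoint,
            # Tear the ephemeral Ray cluster down when the job ends.
            "shutdownAfterJobFinishes": True,
            "rayClusterSpec": {
                "headGroupSpec": {
                    "rayStartParams": {},
                    "template": {"spec": {"containers": [head_container]}},
                },
                "workerGroupSpecs": [{
                    "groupName": "gpu-workers",
                    "replicas": workers,
                    "rayStartParams": {},
                    "template": {"spec": {"containers": [worker_container]}},
                }],
            },
        },
    }


# Hypothetical values for illustration only.
manifest = rayjob_manifest(
    name="demo-train",
    queue="team-queue",
    image="example.registry/train:latest",
    entrypoint="python train.py",
    workers=2,
    gpus_per_worker=4,
)
print(json.dumps(manifest, indent=2))
```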
ML Infrastructure Framework - rivx
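The speaker notes for this slide say that Kueue places a job on one of the flavors that can fulfil its resource requests. The toy model below illustrates only that idea: it picks the first flavor whose free quota covers every requested resource. It is a deliberately simplified sketch, not Kueue's actual admission algorithm, and the flavor names and quotas are made up.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Flavor:
    name: str
    free: Dict[str, int]  # remaining quota per resource, e.g. {"gpu": 8}


def pick_flavor(request: Dict[str, int], flavors: List[Flavor]) -> Optional[str]:
    """Return the first flavor whose free quota covers every requested resource."""
    for flavor in flavors:
        if all(flavor.free.get(res, 0) >= amount
               for res, amount in request.items()):
            return flavor.name
    return None  # nothing fits, so the job would stay queued


# Hypothetical flavors for illustration only.
flavors = [
    Flavor("small-gpu", {"cpu": 32, "gpu": 4}),
    Flavor("large-gpu", {"cpu": 96, "gpu": 8}),
]

fits = pick_flavor({"cpu": 16, "gpu": 8}, flavors)
too_big = pick_flavor({"cpu": 16, "gpu": 16}, flavors)
print(fits)     # large-gpu
print(too_big)  # None
```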
ML Infrastructure Framework
ML Infrastructure Framework - Cosmo
ML Infrastructure Framework - Cosmo
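According to the speaker notes, Cosmo is the entry point for ML workload insights backed by Prometheus metrics. How Cosmo talks to Prometheus is not described in the deck, so the sketch below only shows how any client could build a request against Prometheus's instant-query HTTP endpoint (`/api/v1/query`). The host, metric name, and label are hypothetical, and a managed Prometheus workspace would additionally require signed requests and a workspace-specific URL.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def instant_query_url(base_url: str, promql: str) -> str:
    """Build a URL for Prometheus's instant-query endpoint (/api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


# Hypothetical endpoint, metric, and label; substitute real ones.
promql = 'avg(gpu_utilization{job="demo-train"})'
url = instant_query_url("http://prometheus.example:9090/", promql)

# Round-trip check: decoding the query string gives back the expression.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
print(decoded == promql)
```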
What’s next?
- Stability
- Optimization
- Capabilities
300x
115x
40x

[DSC Europe 24] Aleksandar Cvejic - Rivian Autonomy and AI: Building a Scalable Platform for Data and ML Workloads

Editor's Notes

  • #3: Hi everyone, my name is Aleksandar. I lead the Data Applications and ML Infra team here in Serbia. We are part of the Autonomy and AI organisation at Rivian, and together with our colleagues in the US we manage and maintain the cloud infrastructure and cloud data platform for Rivian’s Advanced Driver Assistance Systems, or ADAS.
  • #4: Today, I want to share our journey in tackling ADAS challenges. I’ll walk you through how we manage data at scale, the role of Polaris—our centralized platform—and how it enables workflows like simulations and machine learning. Along the way, I’ll highlight some specific optimizations we’ve implemented to ensure scalability, efficiency, and innovation. Let’s start by exploring the challenges unique to this domain.
  • #5: In the world of autonomy, data isn’t just critical: it’s massive, diverse, and constantly growing. Let’s look at the key challenges we face in ADAS development. Volume and Variety: Data grows at an unprecedented rate, and the requirements for processing and using that data change continuously. Processing and Orchestration: Managing this data is no small task. It means anonymizing sensitive information, tagging metadata, and preparing the data for machine learning or simulation workflows. Each step needs to be efficient, scalable, and accurate to meet the demands of our iteration cycles. Speed is crucial.
  • #6: Data Variety: Rivian’s vehicles are equipped with 11 cameras capturing 55 MP of real-time imaging at state-of-the-art resolution and dynamic range; 5 advanced radars providing multi-modal sensing and detecting objects up to 300 meters ahead; and 360° sensing with overlapping sensors for redundancy and performance. The result of those sensors is a vast volume of data that grows exponentially.
  • #7: At that scale, each optimization translates into a cost reduction of hundreds of thousands of dollars in cloud infrastructure and in engineering hours.
  • #8: So what does the data loop, or data lifecycle, in ADAS look like? A bird’s-eye view of the data lifecycle in autonomy is fairly straightforward. Data is collected, uploaded, processed, used for model training, and deployed to our vehicles over the air (OTA), and then the cycle starts again.
  • #9: Processing: Before we built our data platform, we had tools and pipelines built for specific use cases by different teams, and different teams often solved the same problems in different ways. So we decided to create a comprehensive, centralised data platform, which we named Polaris.
  • #10: Polaris is the backbone of Rivian’s ADAS data operations, designed to unify, simplify, and scale our workflows. Think of it as the operating system for autonomy data. We had a few guiding principles while we were building it, listed on the slide. By solving these challenges, Polaris empowers teams to focus on innovation rather than infrastructure.
  • #11: If we go back to the data loop I showed a few slides ago, I want to revisit it with a focus on our ML workload workflow. Now, let’s take a closer look at how simulations and machine learning workflows work within Polaris.
  • #12: Simulations are vital to ADAS development, enabling us to generate additional data and validate models under controlled conditions. Data Augmentation: Collecting real-world data is time-consuming and costly. Simulations help us generate millions of miles of synthetic data, accelerating model training. Scenario Testing: Simulations allow us to validate models against specific KPIs, such as obstacle detection or lane-keeping accuracy.
  • #13: We use Fluentd and Prometheus to collect application logs and metrics. We also use the EBS, EFS, and S3 storage plugins, which interact with the corresponding Amazon storage services. The GPU operator exposes the GPUs available on the compute nodes as a Kubernetes resource, so users can choose GPUs when they set up their jobs. Kueue, an open-source plugin from Google, provides gang scheduling, preemption, prioritization, and other kinds of scheduling. We also use the RayJob CRD, which is part of the KubeRay operator and is responsible for managing the lifecycle of a job. PyTorch handles distributed data-parallel processing inside the RayJob. MLflow handles tracking: all the metrics emitted from the job are collected and stored in MLflow for later analysis. We also store checkpoints in MLflow.
  • #14: Rivx is a job scheduling tool for automatically launching jobs in the cloud. Backends. Docker image. To use rivx-train, you need to create a .rivxconfig. Based on the resource requests, Kueue schedules your job on one of the flavors that can fulfil the requirements. Ray clusters are ephemeral, which optimizes usage and removes a single point of failure, but it also makes it harder to debug issues, get logs, track the progress of runs, and so on. So we built an observability and tracking system for users.
  • #15: So we use Prometheus to collect metrics and logs and expose them in Grafana. Because Grafana is AWS-managed and therefore incurs costs, and because not everyone outside the team is well-versed in it, we built our own app, called Cosmo, inside the Polaris platform as the entry point for all ML workload insights.
  • #17: That is all I wanted to show you today regarding the ADAS cloud platform at Rivian.
  • #18: What’s Next? If there is one thing I want you to remember from today, it is the chart I showed you at the beginning. Rivian is expanding its fleet and entering new markets, including Europe, so be ready to see one of these in your neighborhood. With this growth, the scale and complexity of data will continue to grow. It’s difficult to predict exactly what we’ll face a year from now; the only constant in this field is change. Polaris is a scalable, efficient, and user-friendly platform that enables us to stay ahead of the curve. Thank you for your time. I’m happy to take your questions.