Optimizing OCI Dedicated AI Cluster Sizing and Pricing: A Primer

In this article, we will explore the sizing and pricing of dedicated AI clusters, focusing on use cases like hosting and fine-tuning AI models. Dedicated AI clusters come in four unit types, each tailored to specific tasks and requirements:

Types of AI Cluster Units

  1. Large Cohere Unit: Used for hosting or fine-tuning the Cohere Command base model.

  2. Small Cohere Unit: Supports Cohere Command Light models for hosting or fine-tuning.

  3. Embed Cohere Unit: Designed for hosting Cohere Embed models.

  4. Llama 2-70 Unit: Supports hosting Llama 2 models, specifically the Llama 2 70-billion-parameter chat model within the OCI Generative AI Service.

Sizing Requirements for Hosting and Fine-Tuning

Use Case 1: Text Generation

  • Fine-Tuning: Requires two Large Cohere Units for resource-intensive fine-tuning jobs.

  • Hosting: Requires a minimum of one Large Cohere Unit after fine-tuning.

Use Case 2: Summarization

  • Fine-tuning is unsupported, but hosting requires only one Large Cohere Unit.

Use Case 3: Embedding

  • Similar to summarization, fine-tuning is unsupported. Hosting can be done with a single Embed Cohere Unit.

For models like Llama 2, fine-tuning is not currently supported; hosting a Llama 2 model requires only one Llama 2-70 Unit. These sizing requirements are summarized in the sketch below.
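
To keep these requirements in one place, here is a minimal Python sketch that captures them as a lookup table. The unit names and counts are summarized from the lists above for illustration only; they are not drawn from an official OCI API, and the helper function is a hypothetical name:

# Illustrative summary of the sizing figures described above.
# Unit counts reflect this article's examples, not an official OCI API.
CLUSTER_REQUIREMENTS = {
    "text-generation": {
        "fine_tuning": {"unit": "Large Cohere", "units": 2},
        "hosting":     {"unit": "Large Cohere", "units": 1},
    },
    "summarization": {
        "fine_tuning": None,  # fine-tuning not supported
        "hosting":     {"unit": "Large Cohere", "units": 1},
    },
    "embedding": {
        "fine_tuning": None,  # fine-tuning not supported
        "hosting":     {"unit": "Embed Cohere", "units": 1},
    },
    "llama2-70b-chat": {
        "fine_tuning": None,  # fine-tuning not currently supported
        "hosting":     {"unit": "Llama 2-70", "units": 1},
    },
}

def required_units(use_case: str, task: str) -> int:
    """Return the number of units needed, or raise if the task is unsupported."""
    spec = CLUSTER_REQUIREMENTS[use_case][task]
    if spec is None:
        raise ValueError(f"{task} is not supported for {use_case}")
    return spec["units"]

For example, required_units("text-generation", "fine_tuning") returns 2, while asking for summarization fine-tuning raises an error, mirroring the rules above.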

Example: Fine-Tuning and Hosting Workflow

To illustrate, let’s say a user needs to fine-tune and host a Cohere Command model:

  1. Fine-Tuning: Two Large Cohere Units are required for fine-tuning. Since fine-tuning is resource-intensive, each session might run for around five hours.

  2. Hosting: After fine-tuning, hosting the model requires one Large Cohere Unit. Hosting ensures that the model is accessible for inference and production traffic.

In total, the user would need three Large Cohere Units: two for fine-tuning and one for hosting.

Real-Life Application: Pricing a Fine-Tuning and Hosting Scenario

Let’s calculate the cost for a sample scenario; a short code sketch of the same arithmetic follows the breakdown below:

  • Fine-Tuning Cost: Fine-tuning requires two units for five hours per week, totaling 10 unit hours per weekly session (2 units × 5 hours). Over four weeks, this adds up to 40 unit hours (10 unit hours × 4 weeks).

  • Hosting Cost: Hosting requires a minimum commitment of 744 unit hours per month (24 hours/day × 31 days).

Total cost = (Fine-Tuning Unit Hours + Hosting Unit Hours) × Cost Per Unit Hour.

For example:

  • Fine-Tuning: 40 unit hours × [Unit Price].

  • Hosting: 744 unit hours × [Unit Price].

  • Total: 784 unit hours × [Unit Price].
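
Here is a minimal Python sketch of that arithmetic. The UNIT_PRICE value is a placeholder for the [Unit Price] in the text; substitute your contracted rate per unit hour:

# Minimal sketch of the cost arithmetic above.
# UNIT_PRICE is a placeholder for the contracted price per unit hour ([Unit Price] above).
UNIT_PRICE = 1.00  # example placeholder, per unit hour

# Fine-tuning: 2 units x 5 hours per weekly session, over 4 weeks
fine_tuning_unit_hours = 2 * 5 * 4            # 40 unit hours

# Hosting: 1 unit committed 24 hours/day for a 31-day month
hosting_unit_hours = 1 * 24 * 31              # 744 unit hours

total_unit_hours = fine_tuning_unit_hours + hosting_unit_hours   # 784 unit hours
total_cost = total_unit_hours * UNIT_PRICE

print(f"Fine-tuning: {fine_tuning_unit_hours} unit hours")
print(f"Hosting:     {hosting_unit_hours} unit hours")
print(f"Total:       {total_unit_hours} unit hours -> {total_cost:.2f} at the assumed unit price")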

Efficient Utilization of Clusters

Clusters can be optimized for multiple models. For instance:

  • A single fine-tuning cluster can support several models sequentially, eliminating the need for multiple environments.

  • A hosting cluster can handle up to 50 endpoints, supporting various fine-tuned models (see the capacity sketch below).
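
As a rough capacity-planning illustration, the sketch below estimates how many hosting clusters a given number of model endpoints would need, assuming the 50-endpoints-per-cluster limit mentioned above; the helper name is hypothetical:

import math

# Assumed capacity from this article: up to 50 endpoints per hosting cluster.
ENDPOINTS_PER_HOSTING_CLUSTER = 50

def hosting_clusters_needed(num_endpoints: int) -> int:
    """Rough planning helper: hosting clusters required for a given endpoint count."""
    if num_endpoints <= 0:
        return 0
    return math.ceil(num_endpoints / ENDPOINTS_PER_HOSTING_CLUSTER)

# Example: 120 fine-tuned model endpoints would fit in 3 hosting clusters.
print(hosting_clusters_needed(120))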

Conclusion

Understanding the unit requirements and costs associated with fine-tuning and hosting AI models is crucial for optimizing resources. By using dedicated AI clusters effectively, users can manage costs while scaling their AI capabilities.

This guide serves as a roadmap to sizing and pricing AI clusters based on specific use cases. Reach out to learn more!

 
