Optimizing OCI Dedicated AI Cluster Sizing and Pricing: A Primer

In this article, we will explore the sizing and pricing of dedicated AI clusters, focusing on use cases like hosting and fine-tuning AI models. Dedicated AI clusters come in four unit types, each tailored to specific tasks and requirements:

Types of AI Cluster Units

  1. Large Cohere Unit: Used for hosting or fine-tuning the Cohere Command base model.

  2. Small Cohere Unit: Supports Cohere Command Light models for hosting or fine-tuning.

  3. Embed Cohere Unit: Designed for hosting Cohere Embed models.

  4. Llama 2-70 Unit: Supports hosting Llama 2 models, specifically the Llama 2 70-billion-parameter chat model within the OCI Generative AI Service.

Sizing Requirements for Hosting and Fine-Tuning

Use Case 1: Text Generation

  • Fine-Tuning: Requires two Large Cohere Units for resource-intensive fine-tuning jobs.

  • Hosting: Requires a minimum of one Large Cohere Unit after fine-tuning.

Use Case 2: Summarization

  • Fine-tuning is unsupported, but hosting requires only one Large Cohere Unit.

Use Case 3: Embedding

  • Similar to summarization, fine-tuning is unsupported. Hosting can be done with a single Embed Cohere Unit.

For models like Llama 2, fine-tuning is not currently supported; hosting a Llama 2 model requires only one Llama 2-70 Unit. These sizing requirements are summarized in the sketch below.
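
To keep these requirements in one place, here is a minimal Python sketch that captures them as a lookup table. The unit names and counts are summarized from the lists above for illustration only; they are not drawn from an official OCI API, and the helper function is a hypothetical name:

# Illustrative summary of the sizing figures described above.
# Unit counts reflect this article's examples, not an official OCI API.
CLUSTER_REQUIREMENTS = {
    "text-generation": {
        "fine_tuning": {"unit": "Large Cohere", "units": 2},
        "hosting":     {"unit": "Large Cohere", "units": 1},
    },
    "summarization": {
        "fine_tuning": None,  # fine-tuning not supported
        "hosting":     {"unit": "Large Cohere", "units": 1},
    },
    "embedding": {
        "fine_tuning": None,  # fine-tuning not supported
        "hosting":     {"unit": "Embed Cohere", "units": 1},
    },
    "llama2-70b-chat": {
        "fine_tuning": None,  # fine-tuning not currently supported
        "hosting":     {"unit": "Llama 2-70", "units": 1},
    },
}

def required_units(use_case: str, task: str) -> int:
    """Return the number of units needed, or raise if the task is unsupported."""
    spec = CLUSTER_REQUIREMENTS[use_case][task]
    if spec is None:
        raise ValueError(f"{task} is not supported for {use_case}")
    return spec["units"]

For example, required_units("text-generation", "fine_tuning") returns 2, while asking for summarization fine-tuning raises an error, mirroring the rules above.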

Example: Fine-Tuning and Hosting Workflow

To illustrate, let’s say a user needs to fine-tune and host a Cohere Command model:

  1. Fine-Tuning: Two Large Cohere Units are required for fine-tuning. Since fine-tuning is resource-intensive, each session might run for around five hours.

  2. Hosting: After fine-tuning, hosting the model requires one Large Cohere Unit. Hosting ensures that the model is accessible for inference and production traffic.

In total, the user would need three Large Cohere Units: two for fine-tuning and one for hosting.

Real-Life Application: Pricing a Fine-Tuning and Hosting Scenario

Let’s calculate the cost for a sample scenario; a short code sketch of the same arithmetic follows the breakdown below:

  • Fine-Tuning Cost: Fine-tuning requires two units for five hours per week, totaling 10 unit hours per weekly session (2 units × 5 hours). Over four weeks, this adds up to 40 unit hours (10 unit hours × 4 weeks).

  • Hosting Cost: Hosting requires a minimum commitment of 744 unit hours per month (24 hours/day × 31 days).

Total cost = (Fine-Tuning Unit Hours + Hosting Unit Hours) × Cost Per Unit Hour.

For example:

  • Fine-Tuning: 40 unit hours × [Unit Price].

  • Hosting: 744 unit hours × [Unit Price].

  • Total: 784 unit hours × [Unit Price].
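
Here is a minimal Python sketch of that arithmetic. The UNIT_PRICE value is a placeholder for the [Unit Price] in the text; substitute your contracted rate per unit hour:

# Minimal sketch of the cost arithmetic above.
# UNIT_PRICE is a placeholder for the contracted price per unit hour ([Unit Price] above).
UNIT_PRICE = 1.00  # example placeholder, per unit hour

# Fine-tuning: 2 units x 5 hours per weekly session, over 4 weeks
fine_tuning_unit_hours = 2 * 5 * 4            # 40 unit hours

# Hosting: 1 unit committed 24 hours/day for a 31-day month
hosting_unit_hours = 1 * 24 * 31              # 744 unit hours

total_unit_hours = fine_tuning_unit_hours + hosting_unit_hours   # 784 unit hours
total_cost = total_unit_hours * UNIT_PRICE

print(f"Fine-tuning: {fine_tuning_unit_hours} unit hours")
print(f"Hosting:     {hosting_unit_hours} unit hours")
print(f"Total:       {total_unit_hours} unit hours -> {total_cost:.2f} at the assumed unit price")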

Efficient Utilization of Clusters

Clusters can be optimized for multiple models. For instance:

  • A single fine-tuning cluster can support several models sequentially, eliminating the need for multiple environments.

  • A hosting cluster can handle up to 50 endpoints, supporting various fine-tuned models (see the capacity sketch below).
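
As a rough capacity-planning illustration, the sketch below estimates how many hosting clusters a given number of model endpoints would need, assuming the 50-endpoints-per-cluster limit mentioned above; the helper name is hypothetical:

import math

# Assumed capacity from this article: up to 50 endpoints per hosting cluster.
ENDPOINTS_PER_HOSTING_CLUSTER = 50

def hosting_clusters_needed(num_endpoints: int) -> int:
    """Rough planning helper: hosting clusters required for a given endpoint count."""
    if num_endpoints <= 0:
        return 0
    return math.ceil(num_endpoints / ENDPOINTS_PER_HOSTING_CLUSTER)

# Example: 120 fine-tuned model endpoints would fit in 3 hosting clusters.
print(hosting_clusters_needed(120))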

Conclusion

Understanding the unit requirements and costs associated with fine-tuning and hosting AI models is crucial for optimizing resources. By using dedicated AI clusters effectively, users can manage costs while scaling their AI capabilities.

This guide serves as a roadmap to sizing and pricing AI clusters based on specific use cases. Reach out to learn more!

 
