Structuring Your AsyncAPI Definitions: Strategies for Organization and Reuse

As event-driven architectures (EDAs) and asynchronous communication patterns become more prevalent, managing the complexity of defining these interactions is crucial. AsyncAPI has emerged as the industry standard for describing asynchronous APIs, but as specifications grow, keeping them organized, maintainable, and reusable becomes a significant challenge.

This article explores strategies for structuring your AsyncAPI files effectively: leveraging the $ref mechanism, deciding which components to extract, referencing shared and protected components, and using GitOps practices for robust management.

The Core Principle: Reuse with $ref

At the heart of organizing AsyncAPI documents lies the $ref keyword. Borrowed from JSON Schema, $ref allows you to define a piece of information once and reference it from multiple places within your AsyncAPI document or even from other documents entirely. This is fundamental for avoiding duplication and ensuring consistency. You can reference components defined within the same file (e.g., #/components/messages/UserSignedUp) or, more powerfully for organization, components defined in external files or URLs.
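
To make the same-document case concrete, here is a minimal sketch of how a reference such as #/components/messages/UserSignedUp can be followed. It is not the AsyncAPI Parser's implementation: the function name resolve_local_ref and the sample document are illustrative assumptions, and the sketch only walks nested objects (no arrays, no external files).

```python
def resolve_local_ref(document, ref):
    """Follow a same-document reference like '#/components/messages/UserSignedUp'."""
    if not ref.startswith("#/"):
        raise ValueError(f"not a local reference: {ref!r}")
    node = document
    for token in ref[2:].split("/"):
        # JSON Pointer escapes: '~1' stands for '/', '~0' stands for '~'.
        token = token.replace("~1", "/").replace("~0", "~")
        node = node[token]  # KeyError here means the reference is dangling
    return node


# Hypothetical document fragment, written as a Python dict for illustration.
asyncapi_doc = {
    "components": {
        "messages": {
            "UserSignedUp": {"payload": {"type": "object"}},
        }
    },
    "channels": {
        "user/signedup": {
            "messages": {
                "userSignedUp": {"$ref": "#/components/messages/UserSignedUp"}
            }
        }
    },
}

ref = asyncapi_doc["channels"]["user/signedup"]["messages"]["userSignedUp"]["$ref"]
message = resolve_local_ref(asyncapi_doc, ref)
```

The message is defined once under components and the channel only points at it, so a change to the definition is picked up everywhere it is referenced.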

What Should You Extract and Reference?

Not all parts of an AsyncAPI document are equally good candidates for extraction into separate, reusable files.

High-Confidence Candidates for Extraction: These components are often shared across multiple services or applications, making them ideal for defining once and referencing everywhere:

  • Servers: Descriptions of message brokers or server endpoints are often reused across applications connecting to the same infrastructure.
  • Schemas: Payload schemas define the structure of your event data and are frequently shared between producers and consumers.
  • Messages: If the same logical message is used by multiple applications, defining it centrally prevents drift.
  • Channels: If multiple applications interact with the same channel, defining it once ensures consistency.

Potential Candidates (Context-Dependent): Extracting these depends more on your specific architecture and team practices:

  • Bindings: Protocol-specific information might be reusable if you have strong conventions.
  • Traits: Reusable definitions for operations or messages are useful if you identify common patterns.
  • Operations: Publish/subscribe operation definitions might be reusable, but they often represent a single application's interaction.
  • Info: The top-level info block is typically specific to a single AsyncAPI document, so it is rarely worth extracting.
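
As a rough illustration of what extraction means mechanically, the sketch below moves an inline schema out of a document and leaves a $ref in its place. The helper name extract_schema, the sample document, and the target path are assumptions made for this example; writing the extracted schema to its new file is left to the caller.

```python
import copy


def extract_schema(document, name, target_ref):
    """Return (updated_document, extracted_schema) without mutating the input.

    The inline schema at components.schemas[name] is replaced by a $ref
    pointing at target_ref; the caller stores extracted_schema at that location.
    """
    updated = copy.deepcopy(document)
    schemas = updated["components"]["schemas"]
    extracted = schemas[name]
    schemas[name] = {"$ref": target_ref}
    return updated, extracted


# Hypothetical producer document with an inline User schema.
producer_doc = {
    "components": {
        "schemas": {
            "User": {"type": "object", "properties": {"id": {"type": "string"}}}
        }
    }
}

updated_doc, user_schema = extract_schema(
    producer_doc, "User", "./shared/schemas/user.yaml"
)
```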

Referencing Strategies: Local Files vs. Remote URLs

You have two main options when using $ref with external resources:

Local Filesystem Paths: These use relative or absolute paths on the local machine (e.g., $ref: './shared/schemas/user.yaml').

  • They are simple for single-developer projects or monorepos and work offline.
  • However, they can become brittle if file structures change and are harder to manage in distributed teams or complex CI/CD pipelines. They don't easily support sharing definitions across disparate repositories.

Remote URLs: These point to resources hosted online (e.g., $ref: 'https://schemas.mycompany.com/user/v1.yaml').

  • This is the generally recommended approach for teams as it decouples definitions from specific project structures and enables centralized repositories.
  • It integrates better with tooling and CI/CD but requires network access. The main challenge arises when referencing protected resources.
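
The difference between the two options shows up in how a reference string is interpreted. The following sketch, using only Python's standard library, classifies a $ref value and resolves a relative reference against the URL of the document that contains it. The function names are hypothetical, and schemas.mycompany.com is the example host used in this article, not a real endpoint.

```python
from urllib.parse import urljoin, urlsplit


def classify_ref(ref):
    """Classify a $ref value as 'internal', 'remote', or 'local-file'."""
    if ref.startswith("#"):
        return "internal"  # points inside the same document
    if urlsplit(ref).scheme in ("http", "https"):
        return "remote"  # needs network access to resolve
    return "local-file"  # relative or absolute filesystem path


def resolve_against(base_url, ref):
    """Resolve a possibly relative reference against its containing document."""
    return urljoin(base_url, ref)


kinds = [
    classify_ref("#/components/messages/UserSignedUp"),
    classify_ref("./shared/schemas/user.yaml"),
    classify_ref("https://schemas.mycompany.com/user/v1.yaml"),
]
sibling = resolve_against(
    "https://schemas.mycompany.com/user/v1.yaml", "./address.yaml"
)
```

Note that a relative reference inside a remotely hosted file resolves against that file's URL, which is one reason centrally hosted components stay portable across projects.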

Handling Protected Resources via $ref

Referencing protected resources via URLs presents challenges because standard $ref resolution primarily supports publicly accessible URLs or Basic Authentication. Never hardcode credentials in $ref URLs (e.g., https://user:password@...). Here are current workarounds:

  • VPN + Internal Unprotected Access: Host components on an internal server accessible only via VPN, with $ref URLs pointing there. This is simple for internal consumers, but unusable for anyone outside the VPN.
  • Variable Placeholders + Runtime Replacement: Use placeholders like $SCHEMA_REGISTRY_USER in the URL within the AsyncAPI file. Replace these with actual credentials via a script or CI/CD process before parsing. This works but requires pre-processing.
  • Custom Authentication Proxy: Create a proxy service that your $ref URLs point to. The proxy handles authentication with the backend (e.g., schema registry) and forwards the request, hiding credentials from the AsyncAPI file. Requires maintaining the proxy.
  • Custom Spectral Resolver (Recommended Advanced Approach): Since the AsyncAPI Parser uses Spectral for resolving $refs, you can write a custom Spectral resolver. This code intercepts requests to your protected domains, reads credentials securely (e.g., from environment variables), adds appropriate authentication headers, and fetches the resource. This is the most integrated solution.
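
Two of these workarounds can be sketched in a few lines. The code below is not Spectral's resolver API; it only illustrates the logic involved, under stated assumptions: placeholders are uppercase names such as $SCHEMA_REGISTRY_USER (so $ref itself is left alone), the protected host schemas.internal.example and the variable SCHEMA_REGISTRY_TOKEN are hypothetical, and bearer-token authentication is assumed where your registry may use something else.

```python
import re
from urllib.parse import quote, urlsplit
from urllib.request import Request

# Uppercase-only placeholders, so the lowercase '$ref' keyword never matches.
PLACEHOLDER = re.compile(r"\$([A-Z][A-Z0-9_]*)")

# Hypothetical set of hosts that require authentication.
PROTECTED_HOSTS = {"schemas.internal.example"}


def fill_placeholders(text, env):
    """Replace $NAME placeholders with URL-encoded values from env.

    A missing variable raises KeyError, so a misconfigured pipeline fails
    early instead of producing a half-substituted URL.
    """
    return PLACEHOLDER.sub(lambda m: quote(env[m.group(1)], safe=""), text)


def build_request(url, env):
    """Build a request, adding credentials only for protected hosts."""
    request = Request(url)
    if urlsplit(url).hostname in PROTECTED_HOSTS:
        # Assumption: the backend accepts a bearer token.
        request.add_header("Authorization", f"Bearer {env['SCHEMA_REGISTRY_TOKEN']}")
    return request
```

In a CI job, env would typically be os.environ, so the credentials never appear in the AsyncAPI file or in the repository. Restricting the header to a known host list matters: without it, a reference to any third-party URL would leak the token.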

Where and How to Store Reusable Components: Git-Based Strategies

Choosing where to store extracted components and how to manage changes is crucial. Using Git provides a robust foundation. Here are common repository strategies:

  • Single Central Repository: All shared AsyncAPI components reside in one dedicated Git repository, organized by folders. This provides a single source of truth and allows centralized governance via Pull Requests (PRs). However, the review process can become a bottleneck; automating checks and referencing components directly from PR branches during development can mitigate the delays.
  • Domain-Specific Repositories (DDD Alignment): Each domain gets its own repository for its specific components (e.g., order-domain-components), perhaps with a separate shared-components repo for cross-cutting concerns. This distributes ownership but requires clear domain boundaries and potentially tooling for discovery across domains.
  • Producer-Owned Definitions (Generally Discouraged): Placing shared definitions (channels, messages, schemas) directly in the producer's repository creates tight coupling and leads to a hard-to-manage "point-to-point mess". Ownership of shared infrastructure like servers also becomes ambiguous. Avoid this approach.

Referencing Best Practices with Git URLs

How you structure $ref URLs pointing to files in Git is critical for stability:

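  • Reference Specific Commits: The most robust approach uses URLs pointing to a file at a specific commit hash (e.g., .../blob/<commit_hash>/...). Commit hashes are immutable, guaranteeing the reference always points to the exact intended version. Note that on GitHub a blob URL serves an HTML page, so a resolver generally needs the equivalent raw-content URL for the same commit.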
  • Avoid Referencing Tags (Usually): Git tags can be moved or deleted, making them less reliable for immutable references than commit hashes unless you have strict controls.
  • Referencing Branches (Use with Caution): Pointing to a branch name (e.g., .../blob/main/...) means the reference always resolves to the latest version on that branch. Use this sparingly, only if you explicitly accept the risk of breaking changes introduced by updates to the branch.
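
These rules are easy to check automatically. The sketch below tests whether a URL is pinned to a full commit hash, assuming GitHub's raw-content URL shape (https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>); other Git hosts use different layouts, and the acme/shared-components repository is a made-up example. Branch names that contain slashes are not handled.

```python
import re

# Assumed URL shape: https://raw.githubusercontent.com/<owner>/<repo>/<ref>/<path>
RAW_URL = re.compile(
    r"^https://raw\.githubusercontent\.com/"
    r"(?P<owner>[^/]+)/(?P<repo>[^/]+)/(?P<ref>[^/]+)/(?P<path>.+)$"
)
FULL_SHA = re.compile(r"^[0-9a-f]{40}$")


def is_pinned_to_commit(url):
    """True if the <ref> segment looks like a full 40-character commit hash."""
    match = RAW_URL.match(url)
    return bool(match and FULL_SHA.match(match.group("ref")))


# Hypothetical references: one pinned to a commit, one tracking a branch.
pinned = (
    "https://raw.githubusercontent.com/acme/shared-components/"
    + "a" * 40
    + "/schemas/user.yaml"
)
floating = "https://raw.githubusercontent.com/acme/shared-components/main/schemas/user.yaml"

results = (is_pinned_to_commit(pinned), is_pinned_to_commit(floating))
```

A check like this cannot tell a tag from a branch, but it does not need to: anything that is not a full hash is treated as mutable.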

Leveraging GitOps for AsyncAPI Component Management

Using Git repositories for shared AsyncAPI components naturally enables GitOps practices:

  • Version Control & Auditability: Git tracks the full history of all changes.
  • Collaboration & Review: Use standard Pull/Merge Requests for reviews, automated checks, and approvals.
  • Access Control: Utilize Git platform features (like GitHub CODEOWNERS) for permissions.
  • Automation: Trigger CI/CD pipelines for linting, validation, and other checks on changes.

Treating AsyncAPI components as code within a GitOps workflow brings consistency, traceability, and control.
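
As one example of an automated check a pipeline could run on every change, the sketch below walks a parsed document, collects every $ref, and reports those that violate a policy. The function names, the sample document, and the policy itself (only internal references and https URLs are allowed) are assumptions for illustration; a real pipeline would likely combine this with a dedicated linter.

```python
def iter_refs(node, path=""):
    """Yield (location, target) for every $ref found in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield path or "/", value  # location of the object holding the $ref
            else:
                yield from iter_refs(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from iter_refs(item, f"{path}/{index}")


def check_refs(document, is_allowed):
    """Return the (location, target) pairs whose target fails the policy."""
    return [(loc, target) for loc, target in iter_refs(document) if not is_allowed(target)]


# Hypothetical document and policy.
doc = {
    "channels": {
        "orders": {
            "messages": {
                "created": {"$ref": "https://schemas.mycompany.com/order/v1.yaml"}
            }
        }
    },
    "components": {
        "schemas": {"User": {"$ref": "http://insecure.example/user.yaml"}}
    },
}


def policy(target):
    return target.startswith("#") or target.startswith("https://")


violations = check_refs(doc, policy)
```

A CI job can fail the build when the list of violations is non-empty, which turns a written convention into an enforced one.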

A Note on Developer Experience (DX)

Implementing robust repository strategies, GitOps workflows, and handling authenticated $ref resolution requires effort. Some solutions discussed are workarounds for current tooling limitations. The goal is to provide practical ways to unblock developers today, enabling scalable management while the community works on improving the overall DX.

Conclusion

Effectively organizing your AsyncAPI documents is essential for building maintainable, scalable, and consistent event-driven systems. Leverage the $ref mechanism strategically, extract reusable components, and choose appropriate storage methods like Git repositories. Use secure, immutable Git commit URLs for stable references.

Adopting GitOps practices further enhances collaboration, governance, and automation for these components. While challenges remain, particularly around authenticated references and developer experience, these strategies provide a solid foundation for managing your AsyncAPI landscape effectively.

Investing in thoughtful organization upfront pays dividends as your asynchronous architecture grows.

Juan Cruz Viotti

Founder | Fractional CTO | O'Reilly author | JSON Schema TSC | Award-winning University of Oxford alumnus

This is the key, yet the vast majority of users don't do it. What I love about this is that once you decouple your schemas from the API specification wrapper, you suddenly have access to a much wider range of advanced JSON Schema tooling. For example, one of the main things I help people with is, once they extract their schemas, to actually unit test them! (using my JSON Schema CLI: https://github.com/sourcemeta/jsonschema). You would be surprised by how many schemas embedded in API specifications look OK but have lots of subtle issues in them.
