Understanding VMware vSAN Architecture: A Primer

Operating VMware vSAN may seem simple from the vSphere Client UI, but behind the curtain lies a sophisticated, distributed storage system built on resilient architectural principles. Whether you're setting up a nested lab or managing production workloads, understanding how vSAN works internally is crucial, not only for troubleshooting but also for maximizing performance and resiliency.

Why Knowing the Architecture Matters

Just because vSAN is easy to set up doesn't mean you should skip understanding how it functions under the hood. When issues arise, familiarity with its internal mechanics makes root cause analysis significantly faster. Beyond the practical benefits, there’s something deeply satisfying about knowing how it all works.


vSAN Server Roles: Master, Backup, Agent

When a vSAN cluster is formed, it performs a role election to assign three logical roles to participating hosts:

  • Master Node: One per cluster, the Master is responsible for collecting CMMDS (Cluster Membership, Monitoring, and Directory Services) updates from all nodes and distributing them.
  • Backup Node: A standby to the Master, ready to take over should the Master fail.
  • Agent Nodes: All remaining nodes take on the Agent role and may be promoted to Master or Backup as needed.

⚠️ Note: These roles are dynamically managed by vSAN and cannot be manually configured.

In failure scenarios where nodes cannot communicate, the cluster splits into partitions, and each partition elects its own Master (a fully isolated node becomes its own Master). The concept is akin to network partitions in vSphere HA.
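
To make the partition behavior concrete, here is a minimal Python sketch. It is a toy model, not vSAN code: the host names are made up, and the rule of picking roles by sorted host name is an assumption for illustration only, since the real election criteria are internal to vSAN.

```python
from collections import defaultdict


def find_partitions(hosts, links):
    """Group hosts into partitions: sets of hosts that can still reach each other."""
    neighbors = defaultdict(set)
    for a, b in links:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, partitions = set(), []
    for host in hosts:
        if host in seen:
            continue
        # Walk every host reachable from this one over working links.
        stack, group = [host], []
        seen.add(host)
        while stack:
            current = stack.pop()
            group.append(current)
            for peer in neighbors[current]:
                if peer not in seen:
                    seen.add(peer)
                    stack.append(peer)
        partitions.append(sorted(group))
    return partitions


def assign_roles(partition):
    """Give each partition one Master, one Backup (if it has a second host), and Agents.

    Assumption: roles are handed out in sorted-name order purely for illustration.
    """
    roles = {}
    for index, host in enumerate(partition):
        if index == 0:
            roles[host] = "master"
        elif index == 1:
            roles[host] = "backup"
        else:
            roles[host] = "agent"
    return roles


hosts = ["esx01", "esx02", "esx03", "esx04"]
split_links = [("esx01", "esx02"), ("esx03", "esx04")]  # link between esx02 and esx03 is down

for partition in find_partitions(hosts, split_links):
    print(partition, assign_roles(partition))
```

With the link between the two halves down, the model yields two partitions, each with its own Master. With no working links at all, every host ends up as the Master of a one-node partition.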


Core Components of the vSAN Architecture

To demystify how vSAN works, let’s use a house-building analogy to represent each of the primary components:

Key Functional Summaries

CLOM – The Architect

  • Runs on every node (/etc/init.d/clomd)
  • Validates resource availability for object creation
  • Balances workload and handles object compliance
  • Handles proactive and reactive rebalancing across nodes

DOM – The General Contractor

  • Runs in the kernel rather than as a daemon, so it cannot be restarted as a service
  • Receives object layout instructions from CLOM
  • Coordinates with other DOMs to place object components across nodes
  • Splits into roles such as the DOM Client (issues I/O on behalf of the VM) and the DOM Owner (coordinates I/O to an object)

LSOM – The On-site Worker

  • Executes disk-level I/O as per DOM instructions
  • Handles encryption, I/O retries, and device health checks
  • Manages SSD log recovery during node reboots

CMMDS – The Project Manager

  • Maintains cluster state and metadata
  • Governs role elections (Master, Backup, Agent)
  • Publishes inventory and topology data for use by CLOM and DOM
  • Failures in CMMDS communication can impact object availability

OSFS – The Decorator

  • Abstracts vSAN's object-based backend into a namespace-like structure
  • Manages directory-like constructs for VMs in vSAN Datastore
  • Maps object IDs to human-readable VM and file names

SPBM – The Furniture

  • Allows per-VM or per-disk policies (availability, compression, encryption, etc.)
  • Uses VASA (vSphere APIs for Storage Awareness) for compliance checks
  • Directs CLOM on how to place or modify object layouts
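
As a rough illustration of how a policy turns into a placement requirement, the Python sketch below covers only the mirroring (RAID-1) case, where tolerating n failures needs n + 1 replicas and at least 2n + 1 hosts. The class and function names are hypothetical and do not correspond to any SPBM or CLOM API; erasure-coding policies follow different rules and are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MirrorPolicy:
    """Hypothetical stand-in for an SPBM availability rule (mirroring only)."""

    failures_to_tolerate: int

    def replicas(self) -> int:
        # One full copy of the data per tolerated failure, plus the original.
        return self.failures_to_tolerate + 1

    def min_hosts(self) -> int:
        # Replicas plus witnesses must span 2n + 1 hosts to keep a majority.
        return 2 * self.failures_to_tolerate + 1


def can_place(policy: MirrorPolicy, hosts_available: int) -> bool:
    """Toy version of the check CLOM performs before creating an object."""
    return hosts_available >= policy.min_hosts()


policy = MirrorPolicy(failures_to_tolerate=1)
print(policy.replicas(), policy.min_hosts(), can_place(policy, hosts_available=3))
```

In this model a policy that tolerates two failures cannot be satisfied on a four-host cluster, which is the kind of situation where object creation is rejected or the object is reported as non-compliant.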

RDT – The Delivery Truck

  • Transport layer for vSAN traffic between nodes
  • Reacts quickly to link failures via CMMDS-published link status
  • Optimized for large-scale, low-latency transfers


Putting It All Together

Here’s a simplified workflow of how these components collaborate:

  1. Object Creation Request: The user (or vCenter) triggers object creation (e.g., a new VMDK or snapshot).
  2. CLOM Validation: CLOM checks available resources and policy compliance.
  3. DOM Coordination: DOM receives the layout plan from CLOM and determines which components are created locally and which remotely.
  4. LSOM Execution: LSOM creates local components and performs I/O buffering and hardware interactions.
  5. Cross-node Sync: Remote components are created by peer DOM/LSOM instances, coordinated via DOM-to-DOM communication.
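
The steps above can be sketched as a small Python pipeline. This is a toy model under stated assumptions: the function names, the "first hosts with capacity" placement rule, and the string used to represent a component are all invented for illustration and are not how vSAN implements these layers.

```python
def clom_plan(replicas, hosts_with_capacity):
    """Step 2: validate resources and return a layout (one host per replica)."""
    if len(hosts_with_capacity) < replicas:
        raise ValueError("not enough hosts to satisfy the policy")
    # Assumption: simply take the first hosts that have capacity.
    return hosts_with_capacity[:replicas]


def dom_split(layout, local_host):
    """Step 3: decide which components are created locally and which remotely."""
    local = [host for host in layout if host == local_host]
    remote = [host for host in layout if host != local_host]
    return local, remote


def lsom_create(host, object_name):
    """Steps 4 and 5: the LSOM on a given host creates its component."""
    return f"{object_name}@{host}"


def create_object(object_name, replicas, hosts_with_capacity, local_host):
    """Step 1 entry point: run the request through CLOM, DOM, and LSOM in order."""
    layout = clom_plan(replicas, hosts_with_capacity)
    local, remote = dom_split(layout, local_host)
    components = [lsom_create(host, object_name) for host in local]
    # Remote components are created by the peer hosts' own DOM/LSOM instances.
    components += [lsom_create(host, object_name) for host in remote]
    return components


print(create_object("vm1.vmdk", 2, ["esx01", "esx02", "esx03"], local_host="esx02"))
```

The point of the sketch is the ordering: nothing reaches the disk layer until the layout has been validated, and the local host only creates the components that were assigned to it.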


Final Thoughts

vSAN hides most of this complexity through an elegant and user-friendly vSphere interface. Yet, beneath the surface, it's a sophisticated distributed storage platform with dynamic role management, policy-driven object placement, and high-availability logic. Understanding this architecture not only helps in diagnosing and resolving issues faster, but also elevates your ability to design resilient, scalable HCI environments.

Whether you're running a nested lab, architecting a production system, or simply exploring HCI, understanding how vSAN works is both practically empowering and technically gratifying.

 
