“Why Did My Azure Pipeline Fail?!” - A Hidden Gotcha with Key Vault Access in Synapse vs. ADF

One of the common challenges data engineers face when working with Azure Synapse Analytics or Azure Data Factory (ADF) is handling secure access to Azure Key Vault when executing pipelines. Often, a notebook that runs perfectly fine in an interactive development session fails when executed as part of a pipeline. This behavior is confusing at first glance but makes complete sense once we understand how managed identities behave across different services in Azure.

This article aims to explain this subtle yet important concept of identity inheritance and how it affects access to Azure Key Vault within Synapse and ADF pipelines.


The Scenario

Consider a notebook in Azure Synapse that connects to a SQL database. For security best practices, the database credentials are stored in Azure Key Vault. The notebook accesses these secrets during execution and connects to the database. When run manually in Synapse Studio, the notebook works without any issues.

However, the same notebook, when invoked through a Synapse pipeline, fails with a Key Vault "access denied" error. This happens even though the same code reads the secret without any problem when tested on its own.


The Underlying Cause

This issue stems from a misunderstanding about which identity Azure uses during pipeline execution. When a notebook is run interactively in Synapse Studio, Key Vault is accessed with the identity of the signed-in user who started the session, so the call succeeds as long as that user has permission on the vault. A Spark pool has no identity of its own to fall back on.

However, when a notebook is invoked through a Synapse pipeline, the identity used to access resources such as Key Vault is the managed identity the pipeline runs under, which in Synapse is the workspace's managed identity, regardless of which identity worked during interactive testing. A pipeline does not have a separate managed identity of its own. This means that unless the workspace's managed identity is explicitly granted access to Key Vault, the notebook will fail at runtime even if it works fine when tested independently.
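A minimal sketch of how a notebook could make this failure easier to diagnose. The helper `read_secret` and its `get_secret` parameter are hypothetical names introduced here for illustration. In a Synapse notebook the function passed in would typically be `mssparkutils.credentials.getSecret`, and the vault and secret names in the usage comment are placeholders.

```python
from typing import Callable


def read_secret(get_secret: Callable[[str, str], str], vault: str, name: str) -> str:
    """Fetch a secret and, on failure, point at the identity that may need access."""
    try:
        return get_secret(vault, name)
    except Exception as exc:  # the concrete exception type depends on the runtime
        raise PermissionError(
            f"Could not read secret '{name}' from Key Vault '{vault}'. "
            "If this notebook was started by a pipeline, the caller is the managed "
            "identity the pipeline runs under (in Synapse, the workspace's managed "
            "identity), so that identity needs 'Get' permission on secrets."
        ) from exc


# Usage inside a Synapse notebook (placeholder vault and secret names):
#   from notebookutils import mssparkutils
#   password = read_secret(mssparkutils.credentials.getSecret, "my-vault", "sql-password")
```

Passing the getter in as a parameter keeps the helper independent of the Synapse runtime, so it can be exercised locally with a stand-in function. Note that it wraps every failure in the same hint, so a missing secret would produce the same message as a permission problem.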


Behavior in Azure Data Factory

In Azure Data Factory, the pattern is similar but with additional flexibility. When an ADF pipeline references Key Vault secrets directly, for example through a linked service, the data factory's managed identity (the identity the pipeline runs under) must have permissions on the Key Vault. However, if the ADF pipeline triggers an external compute service such as Azure Databricks and the code running there reads the secret itself, then the identity of the compute environment (e.g., Databricks cluster) is used to access Key Vault, not ADF’s.

Therefore, while Synapse and ADF pipelines behave similarly with their own internal resources, ADF pipelines differ when interacting with external services by allowing those services' identities to be used instead.
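The rule described so far can be summarized as a small lookup table. This is only a sketch of the article's reasoning, not an Azure API: the scenario labels and the function name `identity_needing_access` are made up for illustration, and the table assumes that pipelines run under the managed identity of their workspace or factory.

```python
# Which identity needs Key Vault access in each scenario described above.
# Keys are (service, who reads the secret); both labels are illustrative.
REQUIRED_IDENTITY = {
    ("synapse", "pipeline_notebook"): "Synapse workspace managed identity",
    ("adf", "linked_service"): "Data factory managed identity",
    ("adf", "external_compute"): "Identity of the external compute (e.g. Databricks)",
}


def identity_needing_access(service: str, reader: str) -> str:
    """Return the identity that must be granted access to the Key Vault."""
    key = (service.lower(), reader.lower())
    if key not in REQUIRED_IDENTITY:
        raise ValueError(f"Scenario not covered by this sketch: {key}")
    return REQUIRED_IDENTITY[key]


if __name__ == "__main__":
    for service, reader in REQUIRED_IDENTITY:
        print(f"{service:8} {reader:18} -> {identity_needing_access(service, reader)}")
```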


Diagnosing and Resolving the Issue

To resolve this, one must explicitly grant Key Vault access to the managed identity that is responsible for executing the action. In the case of a Synapse pipeline invoking a notebook, that is the Synapse workspace's managed identity, under which the pipeline runs.

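This can be done by navigating to the Azure Key Vault, opening the Access Policies blade, and adding an access policy that grants "Get" and "List" permissions for secrets to the Synapse workspace's managed identity ("Get" is the permission that reading a secret actually requires). If the vault uses the Azure role-based access control permission model instead of access policies, assign that identity a role such as Key Vault Secrets User on the vault. In the case of ADF, the same steps apply to the data factory's managed identity if the pipeline itself needs access. However, if ADF is triggering an external system, ensure that the compute identity (e.g., Databricks workspace) has the required access.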
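The same grant can be scripted with the Azure CLI instead of the portal. The sketch below only builds the two commands as argument lists so they can be reviewed before being run. It assumes the vault uses the access policy permission model and that the Azure CLI is installed and logged in. The function names and the workspace, resource group, vault, and object ID values are placeholders introduced for illustration.

```python
import shlex
from typing import List


def show_workspace_identity_command(workspace: str, resource_group: str) -> List[str]:
    """Build the CLI call that prints the workspace managed identity's object ID."""
    return [
        "az", "synapse", "workspace", "show",
        "--name", workspace,
        "--resource-group", resource_group,
        "--query", "identity.principalId",
        "--output", "tsv",
    ]


def grant_secret_access_command(vault: str, object_id: str) -> List[str]:
    """Build the CLI call that adds a Get/List secrets access policy for an identity."""
    return [
        "az", "keyvault", "set-policy",
        "--name", vault,
        "--object-id", object_id,
        "--secret-permissions", "get", "list",
    ]


if __name__ == "__main__":
    # Placeholder values; replace them with real names before running the commands.
    print(shlex.join(show_workspace_identity_command("my-workspace", "my-rg")))
    print(shlex.join(grant_secret_access_command("my-vault", "<object-id-from-first-command>")))
```

Each list can be passed to `subprocess.run(cmd, check=True)` once the values are filled in. Keeping the commands as lists avoids shell quoting problems.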
