3. Data Map
Automate and manage metadata at scale
Sources: On-prem, Cloud, SaaS Applications, Azure Synapse Analytics, Power BI, Azure SQL, SQL Server
Microsoft Purview (diagram legend: Generally Available, Preview)
• Data Catalog: Enable effortless data discovery
• Data Estate Insights: Assess data estate health
• Data Policy: Govern access to data
• Data Sharing: Share data within and between organizations
Audiences: Data Producers and Consumers, Data Officers
5. Agenda
❖ Recap from last week (High-level Overview)
❖ Custom Classification
❖ Custom Scan Rule
❖ Business Glossary
❖ Managed Attributes
❖ Lineage (ADF)
7. Microsoft Purview Account Creation
A single, centralized place that provides a unified experience for data producers, data consumers, and data and security officers.
The Purview Governance Portal is divided into activity hubs.
These organize the tasks needed for managing data governance activities.
8. Microsoft Purview Account Creation
Enable the optional Event Hubs namespace by selecting the toggle; it is disabled by default. Enable this option if you want to programmatically monitor your Microsoft Purview account using Event Hubs and the Atlas Kafka topic ATLAS_HOOK.
10. Microsoft Purview Governance Portal
A single, centralized place that provides a unified experience for data producers, data consumers, and data and security officers.
The Purview Governance Portal is divided into activity hubs. These organize the tasks needed for managing data governance activities:
• Quick actions, recently accessed items, owned items, search bar, and documentation.
• Create collections, register data sources, and set up scans.
• Manage glossary terms, search glossary terms, manage term templates and custom attributes, and import and export terms using .CSV.
• Get insights on your data.
• Metadata management: classifications, resource sets, data sources, integration runtimes, alerts, security, data factories, and data share connections.
• Share data within and between organizations.
12. Data Catalog Page
All Data Catalog related activities
• Catalog metrics
• Search bar to browse and search the data catalog
• Data Catalog tiles for key activities
• Links for overview, getting started, and Purview documentation
• Recently accessed entities and list of entities owned by the logged-on data user
14. Collections
Organize assets and sources based on your business requirements, and restrict access to them using Purview Collections
Benefits
• A collection is a tool Microsoft Purview uses to group assets, sources, and other artifacts into a hierarchy
• Provides fine-grained access control
• Lets users discover assets by collection
15. Collections – assigning roles
Assign Purview roles at the collection level to users, groups and service principals
Role | Responsibilities
Collection Admin | Adds users to roles on collections
Data Reader | Read-only access to data assets, classifications, classification rules, collections, and glossary terms
Data Curator | Manages assets, configures custom classifications, sets up glossary terms, and views data estate insights
Data Source Admin | Manages data sources and scans, and can run new scans using an existing scan rule
Insight Reader | Read-only access to insights reports
Data Share Contributor | Shares data within an organization and with other organizations
Policy Author (Preview) | Able to view, update, and delete Microsoft Purview policies
Workflow Admin | Allows a user to access the workflow authoring page
16. Collections – restricting access
Lock down certain collections of assets by restricting inherited permissions
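The idea of role inheritance down a collection hierarchy, and of locking a collection by restricting inherited permissions, can be sketched in a few lines of Python. This is a simplified in-memory model for illustration only, not the Purview API: the `Collection` class, its `restrict_inherited` flag, and the collection and principal names are all hypothetical, and the real service may apply additional rules that this sketch ignores.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Collection:
    """Hypothetical in-memory model of a collection node (not the Purview API)."""
    name: str
    parent: Optional["Collection"] = None
    # Role name -> principals assigned directly on this collection.
    assignments: Dict[str, Set[str]] = field(default_factory=dict)
    # When True, role assignments from parent collections are not inherited.
    restrict_inherited: bool = False

    def assign(self, role: str, principal: str) -> None:
        """Assign a role to a user, group, or service principal on this collection."""
        self.assignments.setdefault(role, set()).add(principal)

    def effective_principals(self, role: str) -> Set[str]:
        """Principals holding `role` here, directly or through inheritance."""
        principals = set(self.assignments.get(role, set()))
        if self.parent is not None and not self.restrict_inherited:
            principals |= self.parent.effective_principals(role)
        return principals


# Hypothetical hierarchy: Contoso > Finance > Payroll (Payroll is locked down).
root = Collection("Contoso")
finance = Collection("Finance", parent=root)
payroll = Collection("Payroll", parent=finance, restrict_inherited=True)

root.assign("Data Reader", "all-employees")
finance.assign("Data Curator", "finance-stewards")
payroll.assign("Data Reader", "payroll-team")

print(sorted(finance.effective_principals("Data Reader")))
print(sorted(payroll.effective_principals("Data Reader")))
```

In this sketch, Finance inherits the reader assignment made at the root, while Payroll only sees principals assigned to it directly because inheritance is restricted there.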
17. Microsoft Purview Data Map
Sources
(Automated Scanning, Classification, Open APIs to populate the Data Map)
18. Sources
Map your data to manage an enriched metadata map of operational and transactional data no matter where it lives
Benefits
• Automated scanning of on-prem, multicloud, and SaaS data
• Discover sensitive data stored in Azure, AWS, Google Cloud, and other services
• Better discovery of Azure data sources, Power BI, and SQL: leverage turnkey integrations with Power BI, SQL (on-prem, Azure, MI), and key Azure data services such as Azure Synapse, Cosmos DB, and ADLS
• Manage metadata and scale understanding of data with an automated, fully managed, serverless metadata management capability
• Leverage Apache Atlas open APIs to programmatically publish metadata and lineage from a wide range of open-source data systems
Microsoft Purview Data Map supported data sources and file types - Microsoft Purview | Microsoft Docs
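To make the last bullet more concrete, here is a minimal sketch of preparing metadata for publication through Atlas-style APIs. It only builds and prints a JSON payload; nothing is sent. The payload shape is assumed to follow the Apache Atlas v2 bulk entity convention (an `entities` list whose items carry `typeName` and `attributes`), and the helper names, the `DataSet` type choice, and the qualified name are illustrative assumptions rather than values from the deck.

```python
import json
from typing import Any, Dict, List


def build_entity(type_name: str, qualified_name: str, name: str) -> Dict[str, Any]:
    """Build one Atlas-style entity dictionary (hypothetical helper)."""
    return {
        "typeName": type_name,
        "attributes": {"qualifiedName": qualified_name, "name": name},
    }


def build_bulk_payload(entities: List[Dict[str, Any]]) -> str:
    """Wrap entities in an {"entities": [...]} envelope and serialize to JSON."""
    return json.dumps({"entities": entities}, indent=2)


# Hypothetical asset: the type name and qualified name are placeholders.
payload = build_bulk_payload([
    build_entity("DataSet", "custom://sales/orders", "orders"),
])
print(payload)
```

Posting this payload would additionally require the account's Atlas endpoint and an access token, both of which are left out here on purpose.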
19. Register & Scan (ADLS Gen2)
Key Steps | Roles
Create Storage | Azure Administrator
Grant Microsoft Purview Managed Identity access to ADLS Gen2 Storage | Azure Administrator
Upload data to ADLS Gen2 storage | Azure Administrator
Create new Collection | Collection Administrator
Register a source (ADLS Gen2) | Data Source Administrator
Scan source with the Microsoft Purview Managed Identity | Data Source Administrator
View Assets | Data Reader
20. Register & Scan (Azure SQL DB)
Key Steps | Roles
Create Azure SQL DB | Azure Administrator
Create Key Vault | Azure Administrator
Grant Microsoft Purview access to Key Vault (Access Policies → Add Access Policy → Select Principal → GET, READ) | Azure Administrator
Generate Secret for Azure SQL DB user and password | Azure Administrator
Add Credentials to Microsoft Purview | Microsoft Purview Administrator
Register a source (SQL Database) | Data Source Administrator
Scan source with Azure Key Vault Credentials | Data Source Administrator
View Assets | Data Reader
21. Data Map – Integration runtimes
Three options for scanning sources:
• Microsoft Purview's default integration runtime: this option is useful when connecting to data stores and compute services with publicly accessible endpoints.
• Self-hosted integration runtime (SHIR): this option is particularly useful for VM-based data sources or applications that sit either in a private network (VNET) or in other networks, such as on-premises.
• Managed Virtual Network integration runtime: this new option supports connecting to data stores using a private link service in a private network environment. This ensures that the data scanning process is completely isolated and secure, while also being fully managed.
Integration runtimes are the compute infrastructure used by scanners to provide automated data scanning, lineage, and classification capabilities across different network and multicloud environments, including on-prem environments.
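The three options above can be read as a rough decision rule. The sketch below encodes that reading in Python. It is a simplification for illustration, not an official selection algorithm: the `Source` fields, the order of the checks, and the fallback to a self-hosted runtime are assumptions, and real deployments also need to check which runtimes a given source type supports.

```python
from dataclasses import dataclass
from enum import Enum


class Runtime(Enum):
    DEFAULT = "Microsoft Purview default integration runtime"
    SELF_HOSTED = "Self-hosted integration runtime (SHIR)"
    MANAGED_VNET = "Managed Virtual Network integration runtime"


@dataclass
class Source:
    """Hypothetical description of where a data source sits on the network."""
    name: str
    public_endpoint: bool        # reachable over a publicly accessible endpoint
    on_premises_or_vm: bool      # VM-based or on-premises source in a private network
    supports_private_link: bool  # reachable through a private link service


def choose_runtime(source: Source) -> Runtime:
    """Pick a runtime following the three options described on the slide."""
    if source.public_endpoint:
        return Runtime.DEFAULT
    if source.on_premises_or_vm:
        return Runtime.SELF_HOSTED
    if source.supports_private_link:
        return Runtime.MANAGED_VNET
    # Assumed fallback: anything else in a private network uses a self-hosted runtime.
    return Runtime.SELF_HOSTED


for src in [
    Source("public-blob", True, False, False),
    Source("on-prem-sql", False, True, False),
    Source("private-adls", False, False, True),
]:
    print(src.name, "->", choose_runtime(src).value)
```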
25. Business Glossary
Consistent and curated understanding of business terms and definitions
Benefits
• Understand business context associated with data in the organization
• Bulk import glossary terms from existing data dictionaries easily
• Flexible business term definitions with custom attributes per business domain
• Browse and search your data estate from a business lens
• Establish a hierarchical business glossary
26. Business Glossary
Establish a hierarchical business glossary to achieve the following:
• Define parent-child relationships between terms.
• Create the same term name under different parents to contextualize it according to the organization's needs.
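The two points above, parent-child relationships and reusing a term name under different parents, can be illustrated with a small tree structure. This is a conceptual sketch only: the `Term` class, the " > " path separator, and the example terms and definitions are hypothetical and do not reflect how Purview stores or names glossary terms.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(eq=False)
class Term:
    """Hypothetical glossary term node in a parent-child hierarchy."""
    name: str
    definition: str = ""
    parent: Optional["Term"] = field(default=None, repr=False)
    children: Dict[str, "Term"] = field(default_factory=dict, repr=False)

    def add_child(self, name: str, definition: str = "") -> "Term":
        """Create a child term; names must be unique only within one parent."""
        if name in self.children:
            raise ValueError(f"{name!r} already exists under {self.path()!r}")
        child = Term(name, definition, parent=self)
        self.children[name] = child
        return child

    def path(self) -> str:
        """Full hierarchical name built from the chain of parents."""
        if self.parent is None:
            return self.name
        return f"{self.parent.path()} > {self.name}"


finance = Term("Finance")
sales = Term("Sales")

# The same term name under two parents carries two different meanings.
fin_rev = finance.add_child("Revenue", "Income recognized under accounting rules")
sales_rev = sales.add_child("Revenue", "Total value of booked orders")

print(fin_rev.path())
print(sales_rev.path())
```

Keying children by name within each parent is what allows "Revenue" to exist twice while still rejecting a duplicate under the same parent.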
34. Microsoft Purview Data Estate Insights
Data Estate Insights
(Bird’s eye view of your Data Estate)
35. Data Estate Insights
Bird’s eye view of data landscape
Benefits
• Intended to help users such as Chief Data Officers quickly understand their data estate at large and gain key insights, such as where sensitive data resides
• Asset insights show where all data resides across collections and a range of source types. Glossary insights show changes made to business terms and how much coverage the glossary has over your data map
• Sensitive data insights simplify compliance risk assessment across operational and transactional data sources. Assess risk and derive audit trails of data qualified by sensitivity and business relevance
39. REST API
• Register an application: to invoke the REST API, we must first register an application (i.e., a service principal) that will act as the identity that the Microsoft Purview platform recognizes and is configured to trust.
• Grant the service principal access to Microsoft Purview as a Data Curator.
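Once the application is registered and has been granted a role, a client typically exchanges its credentials for an access token before calling the REST API. The sketch below only constructs that token request using the standard library and does not send it. The token URL pattern and the `resource` value are assumptions based on the Azure AD client-credentials flow and should be checked against current Microsoft documentation; `build_token_request` and the placeholder IDs are hypothetical.

```python
import urllib.parse
import urllib.request


def build_token_request(tenant_id: str, client_id: str, client_secret: str) -> urllib.request.Request:
    """Build (but do not send) a client-credentials token request."""
    # Assumed Azure AD token endpoint pattern; verify before use.
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        # Assumed resource identifier for Microsoft Purview.
        "resource": "https://purview.azure.net",
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Placeholder values; real ones come from the registered application.
request = build_token_request("<tenant-id>", "<client-id>", "<client-secret>")
print(request.get_method(), request.full_url)

# Sending is deliberately left out of this sketch. With real credentials it
# would look roughly like:
#   with urllib.request.urlopen(request) as response:
#       token = json.load(response)["access_token"]
```

The returned token would then be passed as a bearer token in the `Authorization` header of subsequent API calls.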
43. Data Sharing
Share data within and between organizations using Microsoft Purview Data Sharing
Benefits
• Easily share in place with no data duplication
• Near real-time access to shared data
• Centrally manage sharing relationships
• Supports Azure Data Lake Storage (ADLS Gen2) and Blob Storage