Microsoft Purview – Data Catalog (Post 3)

1 - Microsoft Purview: Creating a service principal

Synopsis:

In this post we will create a service principal with a client secret.

Components:

App registration
Adding a secret to the client credentials
Adding the secret to your Azure Key Vault
Create a credential for your secret in Microsoft Purview

2 - Microsoft Purview: App registration

Deployment steps:

Create a new service principal, or use an existing one, in your Azure Active Directory tenant to authenticate to other services.

Azure portal > Azure Active Directory > App registrations > New registration >

New name > Select Accounts in this organizational directory only >

For Redirect URI > Web > enter any URL you want – https://allensmicrosoftpurview.com/auth

Register.

Copy the Application (client) ID value; it will be used for the Microsoft Purview credential.

3 - Microsoft Purview:  Adding a secret to the client credentials

App registrations > select the newly created Purview App > New client secret >
Provide secret description and set the expiration >
Copy the secret’s Value. Copy it now, because the portal does not show it again once you leave the page.

We’ll use this later to create a secret in Azure Key Vault.
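To check that the application (client) ID and the secret work together, you could request a token with the OAuth 2.0 client credentials flow. The sketch below only builds the request with the Python standard library and sends nothing. The tenant ID, client ID, secret, and the scope `https://purview.azure.net/.default` are placeholder assumptions, not values from this post.

```python
import urllib.parse
import urllib.request


def build_token_request(tenant_id, client_id, client_secret,
                        scope="https://purview.azure.net/.default"):
    """Build (but do not send) a client-credentials token request."""
    url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,          # Application (client) ID from the app registration
        "client_secret": client_secret,  # the secret Value copied above
        "scope": scope,                  # assumed scope for Purview
    }).encode("utf-8")
    return urllib.request.Request(
        url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


# Placeholder values for illustration only.
req = build_token_request("my-tenant-id", "my-client-id", "my-secret-value")
print(req.get_method(), req.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; a successful response is JSON containing an `access_token` field.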

4 - Adding the Microsoft Purview secret to your Azure Key Vault

Microsoft Purview needs to use this service principal to authenticate with other services, and it requires the credential to be stored in Azure Key Vault.

Granting Microsoft Purview account access to the Azure Key Vault:
Key vault > Secrets > Manual >

Enter a Name for the secret and make a note of it; you’ll need it to create a credential in Microsoft Purview.

Enter the Value you copied from your service principal’s client secret (created above).

Select Create to complete.
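If you would rather script this step than use the portal, the Key Vault REST API has a "set secret" operation. The sketch below builds that request without sending it. The vault name, secret name, the `api-version` value, and the bearer token (which would need to be issued for Key Vault, for example with the scope `https://vault.azure.net/.default`) are all assumptions for illustration.

```python
import json
import urllib.request


def build_set_secret_request(vault_name, secret_name, secret_value,
                             bearer_token, api_version="7.4"):
    """Build (but do not send) a Key Vault 'set secret' request."""
    url = (f"https://{vault_name}.vault.azure.net"
           f"/secrets/{secret_name}?api-version={api_version}")
    body = json.dumps({"value": secret_value}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="PUT",
        headers={
            "Authorization": f"Bearer {bearer_token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical names; replace with your own vault and secret name.
req = build_set_secret_request("my-vault", "purview-sp-secret",
                               "my-secret-value", "my-access-token")
print(req.get_method(), req.full_url)
```

The secret name used here is the one you will later give to the Microsoft Purview credential.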

5 - Microsoft Purview: Create a credential for your secret

To enable Microsoft Purview to use this service principal to authenticate with other services, you’ll need to follow these three steps.

Connect your Azure Key Vault to Microsoft Purview
Grant your service principal authentication on your source – Follow instructions on each source page to grant appropriate authentication.
Create a new credential in Microsoft Purview – You’ll use the service principal’s application (client) ID and the name of the secret you created in your Azure Key Vault.

Microsoft Purview: Discover and govern Azure SQL Database

Microsoft Purview – SQL (Post 4)

6 - Microsoft Purview: Data Catalog / Data Discovery

Searching a data catalog is a great tool for data discovery if a data consumer knows what they’re looking for, but often users don’t know exactly how their data estate is structured. The Microsoft Purview Data Catalog offers a browse experience that enables users to explore what data is available to them either by collection or through traversing the hierarchy of each data source in the catalog.

Deployment Steps:

Go to Microsoft Purview governance portal home page > Data Catalog > Browse

You can browse by Collection or Source Type.

Browsing by Collection lets you explore the collections for which you have data reader or curator permissions. You will only see collections that you have access to. If you need access to additional collections then see section

Once you have selected a collection, you will get a list of assets in that collection with the facets and filters available in search. As a collection can have thousands of assets, browse uses the Microsoft Purview search relevance engine to sort the most important assets to the top.

Once you find the asset you’re looking for, you can click on it to view more details such as schema, lineage, and a detailed classification list.
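The same kind of lookup can be done outside the portal through the catalog's search query REST API. The sketch below only assembles the URL and JSON body for such a query and sends nothing. The account name, the `api-version` value, and the `collectionId` filter field are assumptions to verify against the current API reference, and a real call would also need a bearer token for the service principal.

```python
import json


def build_search_request(account_name, keywords, collection_id=None,
                         limit=10, api_version="2022-08-01-preview"):
    """Return (url, body) for a catalog search query; nothing is sent."""
    url = (f"https://{account_name}.purview.azure.com"
           f"/catalog/api/search/query?api-version={api_version}")
    body = {"keywords": keywords, "limit": limit}
    if collection_id is not None:
        # Restrict results to one collection (assumed filter field name).
        body["filter"] = {"collectionId": collection_id}
    return url, body


# Hypothetical account and collection names.
url, body = build_search_request("my-purview-account", "customer",
                                 collection_id="finance")
print(url)
print(json.dumps(body, indent=2))
```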

7 - Microsoft Purview: Asset Description customization

If you need to edit the asset, you can click on the Edit button and edit the following entries to align with your business needs. Guidelines can be found here: https://learn.microsoft.com/en-us/azure/purview/catalog-asset-details#asset-description

Asset description, Classifications category, Glossary terms

You can browse by Source Type >

Browse by source type allows data consumers to explore the hierarchies of data sources using an explorer view. Select a source type to see the list of scanned sources.

On the Browse by source types page, tiles are categorized by data source.

To further explore assets in each data source, click on the corresponding source tile.

Certain tiles are groupings of a collection of data sources. For example, the Azure SQL Server tile will display the Azure SQL Server assets that contain Azure SQL Database instances ingested into the catalog.

The next page lists the top-level assets of your chosen data type that were found inside your sources. Pick one of the assets to further explore its contents. For example, after selecting “Azure SQL Database”, you’ll see a list of databases with assets in the data catalog.

Start browsing by selecting the asset on the left panel. Child assets will be listed on the right panel of the page.

To view the details of an asset, select its name or the ellipsis button on the far right.

8 - Microsoft Purview: Asset Certification labels

As a Microsoft Purview data catalog grows in size, it becomes important for data consumers to understand what assets they can trust. Data consumers must know if an asset meets their organization’s quality standards and can be regarded as reliable. Microsoft Purview allows data stewards to manually endorse assets that can be used across an organization or business unit. How do we certify assets so that data consumers can view certification labels?

To certify an asset, you must be a data curator for the collection.

Procedure:
Go to the Governance Portal > Data Catalog > Browse > Collection > select the asset > Edit > Toggle the Certified field to Yes > Save

The asset will now show a “Certified” label next to its name.

Alternatively, you can label assets in bulk: select multiple assets > click View Selected in the bottom-right corner > Bulk edit > and apply classification labels accordingly.

9 - Microsoft Purview: Automatically apply classifications on assets

Classifications can be automatically applied on file and column assets during scanning.

Deployment Steps:
Select the Data Map > Sources > select a source > New scan > create your custom scan with your own settings and classifications > schedule your scan.

https://learn.microsoft.com/en-us/azure/purview/apply-classifications

Searching by Certification labels

You can search the data catalog by certification label to find any asset that is certified.

10 - Microsoft Purview: Data Catalog Managed Attributes

Managed attributes are user-defined attributes that provide a business or organization context to an asset. Managed attributes enable data consumers using the data catalog to gain context on the role an asset plays in a business.

Managed attribute: A set of user-defined attributes that provides a business or organization context to an asset. A managed attribute has a name and a value. For example, “Department” is an attribute name and “Finance” is its value.

Attribute group: A grouping of managed attributes that allows for easier organization and consumption.

Deployment Steps:
Go to the Microsoft Purview governance portal home page > Data Map > Managed Attributes > New > Attribute Group > Attribute Name > Create

Go back to the Microsoft Purview governance portal home page > Data Map > Managed Attributes > New > New Attribute > Attribute Name > Create

Managed attributes have a name, attribute group, data type, and associated asset types. Attribute groups can be created in-line during the managed attribute creation process. Associated asset types are the asset types you can apply the attribute to. For example, if you select “Azure SQL Table” for an attribute, you can apply it to Azure SQL Table assets. Select Create to finish.

In the managed attribute management lifecycle, managed attributes can’t be deleted, only expired. Expired attributes can’t be applied to any assets and are, by default, hidden in the user experience. By default, expired managed attributes aren’t removed from an asset. If an asset has an expired managed attribute applied, it can only be removed, not edited.

Both attribute groups and individual managed attributes can be expired. To mark an attribute group or managed attribute as expired, select Edit > Mark as expired. (Once expired, attribute groups and managed attributes can’t be reactivated.)
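The lifecycle rules above are easier to follow as a small model. The sketch below is a toy illustration in plain Python, not Microsoft Purview's API: the class names, the `azure_sql_table` asset type string, and the error messages are all hypothetical. It encodes three rules: expiry is one-way, an expired attribute cannot be applied or edited, and an expired value that is already on an asset stays there until it is removed.

```python
from dataclasses import dataclass


@dataclass
class ManagedAttribute:
    group: str        # attribute group, e.g. "Business"
    name: str         # attribute name, e.g. "Department"
    data_type: str
    asset_types: set  # asset types the attribute may be applied to
    expired: bool = False

    def expire(self):
        # One-way: there is deliberately no method to reactivate.
        self.expired = True


class Asset:
    def __init__(self, name, asset_type):
        self.name = name
        self.asset_type = asset_type
        self.values = {}  # (group, attribute name) -> value

    def apply(self, attr, value):
        """Apply or edit a value; expired attributes are rejected."""
        if attr.expired:
            raise ValueError("expired attributes can't be applied or edited")
        if self.asset_type not in attr.asset_types:
            raise ValueError("attribute is not associated with this asset type")
        self.values[(attr.group, attr.name)] = value

    def remove(self, attr):
        """Removal is allowed even when the attribute has expired."""
        self.values.pop((attr.group, attr.name), None)


department = ManagedAttribute("Business", "Department", "string",
                              {"azure_sql_table"})
table = Asset("dbo.Customers", "azure_sql_table")
table.apply(department, "Finance")
department.expire()
print(table.values)  # the expired value is still on the asset
```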

Applying the Managed Attribute

  1. Go to the Microsoft Purview governance portal home page > Data Catalog > Browse > Collection > select Asset > Edit > Managed attributes > select Add attribute > Choose the attribute you wish to apply > Attributes are grouped by their attribute group > Choose the value of the attribute > Save

(As you notice, I could not find an asset in my sandbox that I could edit and apply any Managed Attributes to)

11 - Microsoft Purview: Data Catalog lineage

One of the platform features of Microsoft Purview is the ability to show the lineage between datasets created by data processes. Systems like Data Factory, Data Share, and Power BI capture the lineage of data as it moves. Custom lineage reporting is also supported via Atlas hooks and REST API.
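For custom lineage through the REST API, the usual Apache Atlas approach is to create a Process entity whose inputs and outputs point at existing assets. The sketch below builds such a payload and sends nothing. The type names, qualified names, and process name are hypothetical examples, and the endpoint you would post it to (an Atlas v2 entity endpoint on your Purview account) is an assumption to confirm in the API reference.

```python
import json


def asset_ref(type_name, qualified_name):
    """Reference an existing catalog asset by its unique qualified name."""
    return {"typeName": type_name,
            "uniqueAttributes": {"qualifiedName": qualified_name}}


def build_lineage_process(name, qualified_name, inputs, outputs):
    """Build an Atlas 'Process' entity linking input assets to output assets.

    inputs and outputs are lists of (type_name, qualified_name) tuples.
    """
    return {
        "entity": {
            "typeName": "Process",
            "attributes": {
                "name": name,
                "qualifiedName": qualified_name,
                "inputs": [asset_ref(t, q) for t, q in inputs],
                "outputs": [asset_ref(t, q) for t, q in outputs],
            },
        }
    }


# Hypothetical source and target tables.
payload = build_lineage_process(
    "nightly-copy",
    "custom://jobs/nightly-copy",
    inputs=[("azure_sql_table",
             "mssql://src.database.windows.net/sales/dbo/Orders")],
    outputs=[("azure_sql_table",
              "mssql://dwh.database.windows.net/reporting/dbo/Orders")],
)
print(json.dumps(payload, indent=2))
```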

Metadata collected in Microsoft Purview from enterprise data systems is interconnected to show end-to-end data lineage. Data systems that collect lineage into Microsoft

Inside the Governance Portal > Browse > Collection > select an Asset >

You can also add lineage manually, as described at https://learn.microsoft.com/en-us/azure/purview/catalog-lineage-user-guide#manual-lineage

You can then show dataset column lineage by selecting the check box next to each column you want to display in the data lineage.

Process column lineage

You can then show process column lineage by viewing data processes, like copy activities, in the data catalog.

12 - Perform actions on your assets using Power BI

Microsoft Purview makes it easy to work with your data catalog. You can open certain assets in Power BI Desktop from the asset details page. Power BI Desktop integration is supported for the following sources:

  • Azure Blob Storage
  • Azure Cosmos DB
  • Azure Data Lake Storage Gen2
  • Azure Dedicated SQL pool (formerly SQL DW)
  • Azure SQL Database
  • Azure SQL Managed Instance
  • Azure Synapse Analytics
  • Azure Database for MySQL
  • Azure Database for PostgreSQL
  • Oracle DB
  • SQL Server
  • Teradata

Below is a list of actions you can take from an asset details page. The actions available to you vary depending on your permissions and the type of asset you’re looking at. Available actions are generally found on the global actions bar.

If you wish to open the asset in Power BI Desktop, click the link and download the .pbids file, then right-click it and open it with Power BI Desktop.
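A .pbids file is a small JSON document that tells Power BI Desktop which data source to connect to. The sketch below builds one for a SQL Server style connection, following the publicly documented .pbids layout. The server and database names are placeholders, and the file Microsoft Purview generates for your asset may contain different fields.

```python
import json


def build_pbids(server, database, mode="DirectQuery"):
    """Return the content of a .pbids file for a SQL Server connection."""
    return {
        "version": "0.1",
        "connections": [
            {
                "details": {
                    "protocol": "tds",  # protocol used for SQL Server sources
                    "address": {"server": server, "database": database},
                },
                "mode": mode,  # "DirectQuery" or "Import"
            }
        ],
    }


# Placeholder connection details.
pbids = build_pbids("myserver.database.windows.net", "sales")
print(json.dumps(pbids, indent=2))
```

Saving that JSON with a `.pbids` extension and opening it should start Power BI Desktop and prompt for credentials for the listed source.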
