Azure: Read Data from Data Lake Gen2 with Databricks

Azure Databricks has become a leading tool for analyzing big data, built on an Apache Spark environment. With this service you can program in Python, Scala, R, Java, and SQL.


There are currently four options for connecting from Databricks to ADLS Gen2:
  1. Using the ADLS Gen2 storage account access key directly
  2. Using a service principal directly (OAuth 2.0)
  3. Mounting an ADLS Gen2 filesystem to DBFS using a service principal (OAuth 2.0)
  4. Azure Active Directory (AAD) credential passthrough
In this article, I will explain option four: AAD credential passthrough.
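Whichever option you choose, ADLS Gen2 paths use the abfss URI scheme. As a quick illustration (the container, account, and path names below are placeholders), a small helper can build such a URI:

```python
def abfss_uri(container: str, storage_account: str, path: str = "") -> str:
    """Build an ABFSS URI for an ADLS Gen2 filesystem.

    Format: abfss://<container>@<account>.dfs.core.windows.net/<path>
    """
    return f"abfss://{container}@{storage_account}.dfs.core.windows.net/{path}"

# Example with placeholder names:
print(abfss_uri("datalake", "mystorageaccount", "raw/sales.csv"))
# abfss://datalake@mystorageaccount.dfs.core.windows.net/raw/sales.csv
```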

With this approach, we can use VNet injection together with the mount point.

In the storage account's firewall rules, we must include the VNet that belongs to the Databricks service. For details, see the article Create Databricks service.


Cluster Configuration Example



Cluster values for this example:
  1. Python version
  2. Cluster mode
  3. Azure Active Directory credential passthrough enabled for login
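On a standard cluster, this option can be set with a Spark configuration entry in the cluster's Advanced Options (on a high-concurrency cluster, the credential passthrough checkbox does the same):

```
spark.databricks.passthrough.enabled true
```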


Additionally, the Databricks workspace must be on the Premium tier, since credential passthrough requires it.
Let's walk through the steps.

You must have a token to create the mount point.




Now that we have the token, we can create the mount point. Note that we no longer use the secrets generated by the service principal; only the token is used.
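A minimal sketch of the mount, assuming a container named data, a storage account named mystorageaccount, and the mount point /mnt/datalake (all placeholder names). The token provider class is read from the cluster configuration, so no service-principal client ID or secret appears anywhere:

```python
# Credential passthrough: the token provider class supplies the user's AAD token,
# so no service principal client ID or secret is needed.
configs = {
    "fs.azure.account.auth.type": "CustomAccessToken",
    "fs.azure.account.custom.token.provider.class":
        spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName"),
}

# Placeholder container and storage account names; runs inside a Databricks notebook.
dbutils.fs.mount(
    source="abfss://data@mystorageaccount.dfs.core.windows.net/",
    mount_point="/mnt/datalake",
    extra_configs=configs,
)
```

This snippet only runs inside a Databricks notebook, where spark and dbutils are predefined.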




Now, we can read a file from the storage account
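For example, assuming the hypothetical mount point and file path from the sketch above, a notebook cell can read a CSV through the mount; access is authorized with the notebook user's own AAD identity:

```python
# Read a CSV file from the mounted ADLS Gen2 filesystem (placeholder path).
df = spark.read.csv("/mnt/datalake/raw/sales.csv", header=True, inferSchema=True)
display(df)
```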



I hope this helps!

