Azure Databricks
Chitra Singh
Lack of etiquette and manners is a huge turn off.
KnolX Etiquettes
• Punctuality
Join the session 5 minutes prior to the session start time. We start on time and conclude on time!
• Feedback
Make sure to submit constructive feedback for all sessions, as it is very helpful for the presenter.
• Silent Mode
Keep your mobile devices in silent mode; feel free to step out of the session if you need to take an urgent call.
• Avoid Disturbance
Avoid unwanted chit-chat during the session.
1. What is Azure Databricks?
2. Why do we need Azure Databricks?
3. How does Azure Databricks work?
4. Databricks Utilities
5. Integrating Azure Databricks with Azure Blob Storage
What is Databricks
Databricks was founded by the original creators of Apache Spark. It was developed as a web-based platform for working with Apache Spark, and it provides automated cluster management and IPython-style notebooks.
What is Databricks
Azure Databricks is the jointly developed data and AI cloud service from Microsoft and Databricks for data analytics, data science, data engineering, and machine learning.
What is Databricks
Azure Databricks, architecturally, is a cloud service that lets you set up and use a cluster of Azure instances with Apache Spark installed, following a master-worker node model (similar to a local Hadoop/Spark cluster).
[Diagram: remote access to an Azure cluster with Spark]
Databricks Notebooks
• Multi-Language
• Collaborative
• Ideal for Exploration
• Reproducible
• Get to Production Faster
• Enterprise Ready
• Adaptable
What is Databricks
Since Azure Databricks is a cloud-based service, it has several advantages over traditional Spark clusters. Let us look at the benefits of using Azure Databricks:
Optimised Spark Engine: Data processing with auto-scaling and Spark optimized for up to 50x performance gains.
MLflow: Track and share experiments, reproduce runs, and manage models collaboratively from a central repository (a short sketch follows this list).
Machine Learning: Pre-configured environments with frameworks such as PyTorch, TensorFlow, and scikit-learn installed.
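To make the MLflow point above concrete, here is a minimal tracking sketch, assuming a notebook environment where the mlflow package is available (it ships with the Databricks ML runtimes); the run name, parameter, and metric values are hypothetical placeholders.

import mlflow

# Start a run and record a hypothetical parameter and metric
with mlflow.start_run(run_name="example-run"):
    mlflow.log_param("max_depth", 5)        # placeholder hyperparameter
    mlflow.log_metric("accuracy", 0.92)     # placeholder evaluation result
    # mlflow.sklearn.log_model(model, "model")  # optionally log a trained model artifact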
What is Databricks
Choice of Language: Use your preferred language, including Python, Scala, R, Spark SQL, and .NET, whether you use serverless or provisioned compute resources.
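In practice, a single notebook can mix these languages using cell magics. A minimal sketch (the SQL query is a trivial placeholder that needs no tables):

%python
print(spark.version)              # the notebook's default language cell

%sql
SELECT current_date() AS today    -- switch one cell to Spark SQL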
What is Databricks
Collaborative Notebooks: Quickly access and explore data, share new insights, and build models collectively with the language and tools of your choice.
Delta Lake: Bring data reliability and scalability to your existing data lake with an open-source transactional storage layer designed for the full data lifecycle (see the sketch after this list).
Integration with Azure Services: Complete your end-to-end analytics and machine learning solution with deep integration with Azure services such as Azure Data Factory, Azure Data Lake Storage, Azure Machine Learning, and Power BI.
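For the Delta Lake point, a minimal sketch is shown below; it assumes a DataFrame df already loaded in the notebook and uses a hypothetical DBFS path (the delta format is available out of the box on Databricks clusters).

# Write the DataFrame as a Delta table and read it back
df.write.format("delta").mode("overwrite").save("/tmp/demo/events_delta")

events = spark.read.format("delta").load("/tmp/demo/events_delta")
events.show(5)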
What is Databricks
Interactive Workspace: Easy and seamless coordination among data analysts, data scientists, data engineers, and business analysts to ensure smooth collaboration.
Enterprise-Grade Security: The native security provided by Microsoft Azure ensures protection of data within storage services and private workspaces.
Production Ready: Easily run, implement, and monitor your data-oriented jobs and job-related stats.
How does Azure Databricks Work?
Microsoft Azure provides a very simple and easy-to-use interface to implement Databricks.
Databricks Utilities
Databricks Utilities (dbutils) help us perform a variety of powerful tasks, including efficient object storage access, chaining notebooks together, and working with secrets.
In Azure Databricks notebooks, the DBUtils library provides utilities for interacting with various
aspects of the Databricks environment, such as file system operations, database connections, and
cluster configuration.
All DBUtils are available in notebooks for the following languages:
• Python
• Scala
• R
Note: DBUtils are not supported outside notebooks.
Overall, regardless of the notebook language you're using (Python, Scala, or R), you can leverage
the capabilities provided by DBUtils to interact with various aspects of Azure Databricks.
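As a rough illustration, the sketch below shows a few common dbutils calls from a Python notebook. dbutils is available automatically inside notebooks (no import required); the paths, secret scope, and notebook names used here are hypothetical.

# File system utilities: list files under a DBFS path
files = dbutils.fs.ls("/databricks-datasets")
for f in files[:5]:
    print(f.path)

# Secrets utilities: read a value from a pre-created secret scope (hypothetical names)
# access_key = dbutils.secrets.get(scope="demo-scope", key="storage-key")

# Notebook utilities: run another notebook and capture its exit value (hypothetical path)
# result = dbutils.notebook.run("/Shared/child_notebook", 60)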
Integrating Azure Databricks with Azure Blob Storage
Seamless integration with various Azure services:
• Azure Storage: Data storage and retrieval.
• Azure SQL Data Warehouse: Data warehousing and analytics.
• Azure Cosmos DB: NoSQL database for scalable applications.
• Azure Data Lake Storage: Scalable data lake storage.
• Azure Active Directory: Identity and access management.
Microsoft Azure provides a multitude of services. It is often beneficial to combine multiple services to address your use case.
[Diagram: a user writes code in notebooks, which Azure Databricks runs on an Azure cluster with Spark]
Hands On: Integrating Azure Databricks with Azure Blob Storage
Step 1: Set up Azure Databricks
• Log in to the Azure portal (https://portal.azure.com).
• Search for "Databricks" in the search bar.
• Create a new Azure Databricks workspace by providing necessary details like subscription,
resource group, workspace name, and pricing tier.
• Once the workspace is provisioned, navigate to it from the Azure portal.
Step 2: Create a Cluster
• Inside the Azure Databricks workspace, go to the Clusters tab.
• Click on "Create Cluster" and configure the cluster settings such as cluster mode, instance type,
and number of workers.
• Click "Create Cluster" to provision the cluster.
Step 3: Create a Notebook
• Go to the Notebooks tab in the workspace.
• Click on "Create" and choose the language you want to use (Python, Scala, SQL, or R).
• Name your notebook and click "Create."
Step 4: Connect to Azure Blob Storage
In your notebook, use the following code to configure Azure Blob Storage credentials:
# Define storage account credentials
storage_account_name = "your_storage_account_name"
storage_account_access_key = "your_storage_account_access_key"
# Configure Spark to access Azure Blob Storage
spark.conf.set(
    "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
    storage_account_access_key
)
Replace "your_storage_account_name" and "your_storage_account_access_key" with
your actual storage account name and access key.
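Hardcoding an access key in a notebook is generally discouraged; as an alternative sketch, the key can be read from a secret scope (the scope and key names below are hypothetical and must be created beforehand, for example with the Databricks CLI or an Azure Key Vault-backed scope):

# Read the access key from a secret scope instead of hardcoding it
storage_account_name = "your_storage_account_name"
storage_account_access_key = dbutils.secrets.get(scope="blob-scope", key="storage-access-key")

spark.conf.set(
    "fs.azure.account.key." + storage_account_name + ".blob.core.windows.net",
    storage_account_access_key
)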
Step 5: Access Data in Azure Blob Storage
Once connected, you can access data stored in Azure Blob Storage using Spark APIs. For
example:
# Load data from Azure Blob Storage
df = spark.read.csv("wasbs://container@storage_account_name.blob.core.windows.net/path/to/file.csv")
# Display the data
display(df)
Replace "container" and "path/to/file.csv" with your container name and file path.
Step 6: Perform Data Operations
• You can now perform various data operations on the data loaded from Azure Blob Storage using Spark DataFrame APIs (a short sketch follows below).
• Analyze, transform, visualize, or model the data as needed within your notebook.
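As a minimal sketch, assuming the file contains hypothetical "country" and "amount" columns, typical operations could look like this:

from pyspark.sql import functions as F

# Filter, aggregate, and sort the loaded data (column names are hypothetical)
summary = (
    df.filter(F.col("amount") > 0)
      .groupBy("country")
      .agg(F.sum("amount").alias("total_amount"))
      .orderBy(F.desc("total_amount"))
)
display(summary)

# Optionally write the result back to Blob Storage as Parquet
# summary.write.mode("overwrite").parquet(
#     "wasbs://container@storage_account_name.blob.core.windows.net/path/to/output/"
# )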
Step 7: Cleanup (Optional)
• Once you're done with your analysis, you can terminate the cluster to avoid incurring
unnecessary costs.
• Go to the Clusters tab, select your cluster, and click "Terminate."
That's it! You've successfully integrated Azure Databricks with Azure Blob Storage and performed
data operations within a notebook.
Conclusion
• Here we have learned what Azure Databricks is
• Its key features
• And how to integrate it with Azure Blob Storage
• We can explore further and leverage Azure Databricks and Azure Blob Storage for data analytics needs.