How R Developers Can Build and Share Data and AI Applications that Scale with Databricks and RStudio Connect
James Blair, Solutions Engineer, RStudio PBC
Rafi Kurlansik, Sr. Solutions Architect, Databricks
Agenda
Rafi Kurlansik, Databricks
Building Scalable R and Shiny apps with RStudio and Databricks
James Blair, RStudio PBC
Deploying Scalable Shiny apps with RStudio Connect and Databricks
Benchmarking performance of Shiny connections to Spark
How to scale R and Shiny with RStudio and Databricks
How can we open up the data lake to R users?
▪ Typical development patterns
  ▪ Local
  ▪ Cloud or on-prem VM
▪ Challenges with big data
  ▪ Server memory - the app can only hold so much data in memory before R crashes
  ▪ Performance - even on a powerful VM, the app eventually becomes less responsive as data grows past 100 GB
  ▪ Managing big data infrastructure - the app must deliver enough value to justify the effort of building and maintaining it
If only there were a technology with a familiar R API that let our app scale to process hundreds of GBs...
Imagine trying to do that with traditional R development...
Scale R Apps with Databricks and RStudio
▪ Development patterns
  ▪ Hosted RStudio Server (Pro) on a Databricks cluster
  ▪ RStudio with remote Spark access using Databricks Connect
▪ Overcoming challenges with big data
  ▪ Auto-scaling Databricks Spark clusters - dynamically add capacity to accommodate larger data processing tasks
  ▪ Consistently fast performance with Delta Lake and the Databricks Runtime
  ▪ A managed service lets data teams focus on building data products, not maintaining infrastructure
Databricks Spark, RStudio IDE
Hosted RStudio Server Pro on Databricks
RStudio with Databricks Connect
Local RStudio, Remote Spark
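As a minimal sketch of the two development patterns, the helper below returns the arguments that could be passed to sparklyr's `spark_connect()`: `method = "databricks"` alone when R runs on the cluster itself (hosted RStudio Server), plus a `spark_home` taken from Databricks Connect when RStudio runs locally. The helper name `spark_connect_args()` is hypothetical, and detecting a cluster through the `DATABRICKS_RUNTIME_VERSION` environment variable is an assumption.

```r
# Sketch: choose sparklyr::spark_connect() arguments for the two development patterns.
# Assumption: DATABRICKS_RUNTIME_VERSION is set only when R runs on a Databricks cluster.
# spark_connect_args() is a hypothetical helper, not part of sparklyr.
spark_connect_args <- function(runtime = Sys.getenv("DATABRICKS_RUNTIME_VERSION"),
                               spark_home = NULL) {
  if (nzchar(runtime)) {
    # Hosted RStudio Server on the cluster: attach to the cluster's own Spark
    return(list(method = "databricks"))
  }
  # Local RStudio with Databricks Connect: sparklyr needs the databricks-connect Spark home
  if (is.null(spark_home) || !nzchar(spark_home)) {
    stop("spark_home is required when running outside a Databricks cluster")
  }
  list(method = "databricks", spark_home = spark_home)
}

# Usage (requires sparklyr; the second form also requires a configured databricks-connect):
# sc <- do.call(sparklyr::spark_connect, spark_connect_args())
# sc <- do.call(sparklyr::spark_connect, spark_connect_args(
#   spark_home = system("databricks-connect get-spark-home", intern = TRUE)))
```

Keeping the choice in one function means the same analysis code can run in either environment; only the connection arguments differ.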
Sharing Scalable Shiny Apps
The Data Science Process
Shiny and Spark: A cautionary tale
ODBC to the Rescue
- The R + ODBC toolchain is robust and stable
- As performant as a native Spark connection
- Easy to migrate code from sparklyr to ODBC
- Spark still does all of the computation
- Databricks provides an optimized Spark ODBC driver
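A minimal sketch of an ODBC connection from R, assuming the Simba Spark ODBC driver that Databricks distributes and personal access token authentication. The helper `databricks_odbc_args()` and the `DATABRICKS_TOKEN` variable are hypothetical; the list keys are standard Simba Spark driver settings, but check them against the driver's configuration guide for your version.

```r
# Sketch: assemble connection arguments for the Databricks (Simba) Spark ODBC driver.
# Assumptions: the driver is registered as "Simba Spark ODBC Driver", auth uses a
# personal access token, and DATABRICKS_TOKEN is a hypothetical env var holding it.
databricks_odbc_args <- function(host, http_path,
                                 token = Sys.getenv("DATABRICKS_TOKEN"),
                                 driver = "Simba Spark ODBC Driver") {
  if (!nzchar(token)) stop("no access token supplied")
  list(
    Driver          = driver,
    Host            = host,      # workspace hostname
    Port            = 443,
    HTTPPath        = http_path, # cluster HTTP path from its JDBC/ODBC settings
    SSL             = 1,
    ThriftTransport = 2,         # HTTP transport
    AuthMech        = 3,         # user name + password: "token" + access token
    UID             = "token",
    PWD             = token
  )
}

# Usage (requires DBI, odbc, and the installed driver; my_table is hypothetical):
# con <- do.call(DBI::dbConnect, c(list(odbc::odbc()),
#                databricks_odbc_args("<workspace-host>", "<http-path>")))
# DBI::dbGetQuery(con, "SELECT COUNT(*) FROM my_table")  # Spark runs the query
```

When the Shiny app is published to RStudio Connect, the token can be stored as an environment variable on the content item instead of in the code, and the ODBC driver must also be installed on the Connect server.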
ODBC with RStudio Connect
ODBC Performance
Comparing sparklyr against two versions of the Databricks ODBC/JDBC driver
Chart panels: Collecting, Joins
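The benchmark compares how long collecting results and running joins take through each connection. A similar measurement on your own data could use a small timing harness like this one, a sketch that reports the median of several runs to damp one-off noise. It is not the harness used for the talk's results, and `median_elapsed()` is a hypothetical name.

```r
# Sketch: run a zero-argument function several times and return the median
# elapsed (wall-clock) seconds. Generic harness, not the talk's benchmark code.
median_elapsed <- function(f, times = 5L) {
  stopifnot(is.function(f), times >= 1L)
  elapsed <- vapply(seq_len(times), function(i) {
    system.time(f())[["elapsed"]]
  }, numeric(1))
  median(elapsed)
}

# Usage (hypothetical objects: flights_spark is a sparklyr tbl, con an ODBC connection):
# median_elapsed(function() dplyr::collect(flights_spark))
# median_elapsed(function() DBI::dbGetQuery(con, "SELECT * FROM flights"))
```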
Sparklyr to ODBC
Conclusion
Develop at scale
▪ Interactive data analysis with SparkSQL
  ▪ sparklyr
  ▪ ODBC
▪ Other Spark APIs
  ▪ sparklyr
Deploy at scale
▪ Interactive data analysis with SparkSQL
  ▪ Shiny with ODBC
▪ Other Spark APIs
  ▪ ¯\_(ツ)_/¯
  ▪ Deploy models with MLflow?
  ▪ Submit individual commands with the Databricks REST API 1.2?
  ▪ Run sparklyr jobs from RStudio on Databricks with bricksteR?
Stay tuned…
Additional Resources
Documentation
▪ Hosted RStudio on Databricks
▪ Databricks Connect
▪ ODBC
▪ ODBC Configuration
▪ RStudio Connect
▪ sparklyr
Related Repos
▪ blairj09-talks/spark-summit-2020
▪ RafiKurlansik/bricksteR
▪ delta-io/delta
▪ sparklyr/sparklyr
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.