How R Developers Can Build and Share Data and AI Applications that Scale with Databricks and RStudio Connect

James Blair, Solutions Engineer, RStudio PBC
Rafi Kurlansik, Sr. Solutions Architect, Databricks

Agenda
Rafi Kurlansik, Databricks
▪ Building scalable R and Shiny apps with RStudio and Databricks
James Blair, RStudio PBC
▪ Deploying scalable Shiny apps with RStudio Connect and Databricks
▪ Benchmarking performance of Shiny connections to Spark

How to scale R and Shiny with RStudio and Databricks

How can we open up the data lake to R users?
▪ Typical development patterns
  ▪ Local
  ▪ Cloud / On Prem VM
▪ Challenges with big data
  ▪ Server memory - the app can only process so much data in memory before R crashes
  ▪ Performance - even on a powerful VM, the app eventually becomes less responsive as data reaches 100+ GB
  ▪ Managing big data infrastructure - the app's value must be high enough to justify the effort invested
If only there were a technology with a familiar API in R that let our app scale to process 100s of GBs...
Imagine trying to do so with traditional R development...

Scale R Apps with Databricks and RStudio
▪ Development patterns
  ▪ Hosted RStudio Server (Pro) on Databricks Cluster
  ▪ RStudio with remote Spark access using Databricks Connect
▪ Overcoming challenges with big data
  ▪ Auto-scaling Databricks Spark Clusters - dynamically respond to accommodate larger data processing tasks
  ▪ Consistently fast performance with Delta Lake and Databricks Runtime
  ▪ Managed service allows data teams to focus on building data products, not maintaining infrastructure
Databricks Spark, RStudio IDE

Hosted RStudio Server Pro on Databricks

RStudio with Databricks Connect
Local RStudio, Remote Spark

Shiny and Spark: A cautionary tale

ODBC to the Rescue
- The R + ODBC toolchain is robust and stable
- As performant as a native Spark connection
- Easy to migrate code from sparklyr to ODBC
- Spark still does all of the computation
- Databricks provides an optimized Spark ODBC driver
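
The point that "Spark still does all of the computation" can be sketched as follows. This is a minimal, language-neutral illustration rather than the speakers' code: Python's built-in sqlite3 stands in for a Databricks ODBC connection, and the `trips` table, its rows, and the function names are all hypothetical. The pattern is the same over any SQL connection: send the aggregation to the engine and fetch only the small result, instead of collecting every row into the app.

```python
import sqlite3


def make_demo_connection():
    """Build an in-memory database as a stand-in for an ODBC connection.

    Assumption: the `trips` table and its rows are made up for illustration.
    """
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE trips (city TEXT, fare REAL)")
    rows = [("NYC", 10.0), ("NYC", 14.0), ("SF", 20.0), ("SF", 22.0), ("SF", 24.0)]
    con.executemany("INSERT INTO trips VALUES (?, ?)", rows)
    con.commit()
    return con


def mean_fare_collect_first(con):
    """Anti-pattern: pull every row into the app, then aggregate in app memory."""
    rows = con.execute("SELECT city, fare FROM trips").fetchall()
    totals = {}
    for city, fare in rows:
        running_sum, count = totals.get(city, (0.0, 0))
        totals[city] = (running_sum + fare, count + 1)
    means = {city: s / n for city, (s, n) in totals.items()}
    return means, len(rows)  # second value: rows transferred to the app


def mean_fare_pushdown(con):
    """Preferred: the engine aggregates; only one row per city is transferred."""
    rows = con.execute(
        "SELECT city, AVG(fare) FROM trips GROUP BY city"
    ).fetchall()
    return dict(rows), len(rows)


if __name__ == "__main__":
    con = make_demo_connection()
    print(mean_fare_collect_first(con))
    print(mean_fare_pushdown(con))
```

Both functions return the same means, but the second transfers one row per group rather than the whole table. With billions of rows behind a Shiny app, that difference is what keeps the app's memory footprint small.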

ODBC Performance
Comparing sparklyr against two versions of the Databricks ODBC/JDBC Driver
[Figure: benchmark panels "Collecting" and "Joins"]

Conclusion
Develop at scale
▪ Interactive data analysis with SparkSQL
  ▪ sparklyr
  ▪ ODBC
▪ Other Spark APIs
  ▪ sparklyr
Deploy at scale
▪ Interactive data analysis with SparkSQL
  ▪ Shiny with ODBC
▪ Other Spark APIs
  ▪ ¯\_(ツ)_/¯
  ▪ Deploy models with MLflow?
  ▪ Submit individual commands with Databricks REST API 1.2?
  ▪ Run sparklyr jobs from RStudio on Databricks with bricksteR?
Stay tuned…

Additional Resources
Documentation
▪ Hosted RStudio on Databricks
▪ Databricks Connect
▪ ODBC
▪ ODBC Configuration
▪ RStudio Connect
▪ sparklyr
Related Repos
▪ blairj09-talks/spark-summit-2020
▪ RafiKurlansik/bricksteR
▪ delta-io/delta
▪ sparklyr/sparklyr

