Turbocharge your data science
with Python or R
Kelli-Jean Chun
North Bay Python
Nov 4, 2018
Turbocharge your data science
with Python AND R
What the heck is a data scientist?
It depends on the company. Here are a few example roles:
- Data science analysts: aka data analysts or business analysts
- Product data scientists: Partner with product managers &
engineers to focus on product initiatives
- Experimentation data scientists
- Growth/marketing data scientists
Leverage data to gain insight and solve problems
What is R?

| | Python | R |
|---|---|---|
| Indexing starts at | 0 | 1 :) |
| Loops | for i in range(3): print(i) | for (i in 0:2) { print(i) } |
| List/Vector | [0, 1, 2, 3] | c(0, 1, 2, 3) |
| Data Frames | import pandas as pd; pd.DataFrame({'A': [1, 2], 'B': [3, 4]}) | data.frame('A' = c(1,2), 'B' = c(3,4)) |
| When typical people say this, they usually refer to | Type of snake | Letter in the alphabet |

“R is a language and environment for statistical computing and graphics.”
Source: https://www.r-project.org/
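
The Python side of the comparison above can be tried directly. The following is a minimal sketch: the variable names are chosen here for illustration, and the R equivalents in the comments are included only for orientation.

```python
import pandas as pd

# Python lists are indexed from 0; R vectors are indexed from 1.
values = [0, 1, 2, 3]            # R: values <- c(0, 1, 2, 3)
first = values[0]                # R: values[1]

# range(3) yields 0, 1, 2, matching 0:2 in R.
looped = [i for i in range(3)]   # R: for (i in 0:2) { ... }

# A small data frame with two columns.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})  # R: data.frame(A = c(1, 2), B = c(3, 4))
first_row = df.iloc[0]           # R: df[1, ]

print(first, looped, first_row["A"], first_row["B"])
```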
The Great Debate: Python or R
A brief comparison of some Python & R packages used in Data Science

| Use Case | Python | R |
|---|---|---|
| Data frame + manipulation | Pandas + Numpy | Base R + dplyr |
| Plotting | matplotlib, seaborn, bokeh | Base R, ggplot2, highcharter |
| Statistics | statsmodels | Base R |
| ML | scikit-learn | caret + glm + xgboost + ... |
| Deep Learning | TensorFlow | TensorFlow |
| Connecting to the other language | rpy2, pyRserve, RPython | reticulate, PythonInR, rPython, rJython, SnakeCharmR |

So, Python or R?
As a data scientist, I’ll have both!
Predicting whether or not a NYC dog is spayed/neutered
There is a publicly available NYC dataset that has
information on licensed dogs, such as the:
- Dog name
- Gender
- Breed
- Birth month & year
- Coloring
- Borough (e.g. Manhattan, Bronx)
- Zip Code
- Whether or not the dog is a guard or trained dog
- Whether or not spayed/neutered
Using this dataset, let’s build a model to predict
whether or not a NYC dog is spayed/neutered.
https://project.wnyc.org/dogs-of-nyc/
What is my typical data scientific method when
building a model?
- ETLs
- Pre-learning: Explore the data, feature engineering, visualizations
- Learning: Model the data
- Post-learning:
  - Evaluate the model
  - Document and present the final model in a consumable format for product, engineering, and other data scientists
- Deployment: Data science as a service / microservice to call the model in production
Plan of action
Goal: Using the other features (dog name, gender, etc.), predict
whether or not a dog is spayed/neutered
1. Pre-learning: Process the data and explore in R
2. Learning: Develop a predictive model in Python
3. Post-learning: Evaluate the model in Python
Pre-Learning
Exploratory data analysis can be quickly done in R and a summary of the
exploration can be easily shared with RMarkdown.
Similar to Jupyter notebooks:
- Allows for reproducible analysis
- Quickly provide a report & visuals for others
- Organize code chunks
- Embed code in report
As a bonus, R provides fast and easy functions (once you understand some of the
strange syntax) to produce clean visuals.
RMarkdown renders to HTML (or PDF)
Learning & Post-Learning
Python + Sklearn + Pandas + Numpy = 100%
- Sklearn (aka scikit-learn): provides a wide variety of machine learning and
statistical models, and makes it easier to split data into training and
testing sets and to evaluate models.
- Pandas: provides the DataFrame type that makes working with data easier.
- NumPy: provides broadcasting functions that make it easier to work with
arrays (specifically columns in a pandas DataFrame).
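
As a rough illustration of how these three libraries fit together, here is a minimal sketch of the learning and post-learning steps. It is not the talk's actual model: the data is synthetic, the column names (`gender`, `borough`, `age_years`, `spayed_or_neutered`) are hypothetical stand-ins for the processed dog data, and logistic regression is just one convenient choice of classifier.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the processed dog data; column names are hypothetical.
rng = np.random.default_rng(0)
n = 200
dogs = pd.DataFrame({
    "gender": rng.choice(["M", "F"], size=n),
    "borough": rng.choice(["Manhattan", "Bronx", "Brooklyn"], size=n),
    "age_years": rng.integers(0, 15, size=n),
})

# Made-up target: older dogs are more likely to be labeled spayed/neutered.
prob = 1 / (1 + np.exp(-(dogs["age_years"] - 5) / 2))
dogs["spayed_or_neutered"] = (rng.random(n) < prob).astype(int)

# One-hot encode the categorical features so the model gets numeric inputs.
X = pd.get_dummies(dogs.drop(columns="spayed_or_neutered"), dtype=float)
y = dogs["spayed_or_neutered"]

# Learning: hold out 25% of the rows, then fit on the rest.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Post-learning: evaluate on the held-out rows.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"Held-out accuracy: {accuracy:.2f}")
```

With the real dataset, the synthetic frame would be replaced by the data processed in R, and accuracy would likely be only one of several evaluation metrics worth checking.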
How do we connect the two languages?
R in Python with rpy2
Loading the data frame of NYC dogs that was processed in R into Python
can be done with rpy2
import rpy2.robjects as robjects
from rpy2.robjects import pandas2ri

# Read in data from R
pandas2ri.activate()
# readRDS is the R function that reads R's RDS files
readRDS = robjects.r['readRDS']
df = readRDS('data/dogs_proc.RDS')
# Convert the R data frame to a pandas DataFrame
# (ri2py was renamed rpy2py in rpy2 3.x)
df = pandas2ri.ri2py(df)
Python in RMarkdown with reticulate
```{r}
library("reticulate")
```
```{python}
print('Python in R')
for i in range(3):
    print(i)
# execute Jupyter notebooks
import papermill as pm
pm.execute_notebook("example_notebook.ipynb",
"executed_notebook/example_notebook.ipynb")
```
Instead of specifying R code (e.g. with {r}), specify Python with {python}
Thanks!
