Docopt, beautiful command-line options for R
Edwin de Jonge (@edwindjonge), Statistics Netherlands (CBS)
July 2014, UseR!2014
What is docopt?
docopt is a utility R library for parsing command-line options. It is
a port of docopt.py (Python).
How does it work?
You supply a properly formed help description
docopt turns it into a fully functional command-line parser
Why Command-line?
R is used more and more:
Ad hoc, interactive analysis, e.g.
R REPL shell
RStudio
interactive data analysis
Creating R libraries with vi, RStudio etc.
no data analysis
But also for repetitive batch jobs:
Rscript my_script.R arg1 arg2 . . .
R -f my_script.R --args arg1 arg2 . . .
reproducible data processing
So also more and more Command-line!
Rscript example
#!/usr/bin/Rscript
my_model <- glm( data=iris
, Sepal.Width ~ Sepal.Length
)
print(coef(my_model))
Hmm, that script only works for this specific data set.
I Need Arguments and Options!
Command-line parameters
Parsing command-line parameters seems easy, but what about:
Switches? e.g. --debug, --help
Short names and long names? -d, -h vs --debug, --help?
Options with a value? --output=garbage.csv
Arguments e.g. input_file.csv?
Optional arguments?
default values for options?
documenting all options and arguments?
That is a lot of work for just a batch script. . .
Retrieving command-line options
Which libraries are available?
base::commandArgs (very primitive)
library(getopt): (basic)
library(argparse), Python dependency
library(optparse) very nice, Python inspired
These are all fine, but they result in a lot of parsing or setup code
in your script (and that is not what your script is about. . . )
docopt is different.
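To make the "lot of parsing code" concrete, here is a minimal sketch of hand-rolled parsing on top of base::commandArgs. The helper name parse_args_by_hand and the options it knows about (--debug, --output) are made up for illustration; they are not from any library.

```r
# Hypothetical helper: parse "--debug", "--output=<file>" and positional
# arguments by hand, using base R only.
parse_args_by_hand <- function(args = commandArgs(trailingOnly = TRUE)) {
  # Defaults have to be maintained here, separately from any help text
  opts <- list(debug = FALSE, output = "out.csv", files = character(0))
  for (arg in args) {
    if (arg %in% c("-d", "--debug")) {
      opts$debug <- TRUE
    } else if (startsWith(arg, "--output=")) {
      opts$output <- sub("^--output=", "", arg)
    } else if (startsWith(arg, "-")) {
      stop("Unknown option: ", arg)
    } else {
      opts$files <- c(opts$files, arg)
    }
  }
  opts
}

# Simulate: Rscript my_script.R --debug --output=result.csv input_file.csv
opts <- parse_args_by_hand(c("--debug", "--output=result.csv", "input_file.csv"))
str(opts)
```

Even this small sketch has no help text, no short name for --output and no check for required arguments; each of those would need more code.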
What is Docopt?
Originally a Python lib: http://docopt.org
It is a Command Line Interface Specification language:
You specify your help text and the docopt parser takes care of
everything.
The documentation = the specification.
Your script starts with the command-line help
docopt automatically provides a --help or -h switch that shows the help
to users of your script.
It stops with an error when a required option is missing or an unknown
option is given.
Simple example
#!/usr/bin/Rscript
"This is my incredible script
Usage: my_inc_script.R [-v --output=<output>] FILE
" -> doc
library(docopt)
my_opts <- docopt(doc)
That’s all you need to handle your command-line options.
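A quick way to see what docopt hands back is to pass the arguments explicitly instead of reading them from the real command line. The sketch below assumes the docopt package is installed from CRAN and uses the args parameter of docopt() for that; the file names are made up.

```r
library(docopt)  # assumption: installed from CRAN

"This is my incredible script
Usage: my_inc_script.R [-v --output=<output>] FILE
" -> doc

# Simulate: my_inc_script.R -v --output=result.csv input_file.csv
my_opts <- docopt(doc, args = c("-v", "--output=result.csv", "input_file.csv"))

# The result is a named list; options keep their dashes in the names
str(my_opts)
```

Values are then looked up by name, for example my_opts$FILE or my_opts[["--output"]].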
Options
Docopt lets you parse:
Both short and long options
Default values
Descriptions of parameters
Optional parameters: my_script.R [-a -b]
Commands: my_script.R (lm | summary)
Positional arguments
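As an illustration of commands combined with optional switches, the sketch below merges the two usage fragments from the list above into one hypothetical usage line. It assumes the docopt package is installed and again uses the args parameter to simulate a call; data.csv is a made-up file name.

```r
library(docopt)  # assumption: installed from CRAN

"Usage: my_script.R (lm | summary) [-a -b] FILE
" -> doc

# Simulate: my_script.R summary -a data.csv
opts <- docopt(doc, args = c("summary", "-a", "data.csv"))

# Commands and switches come back as logicals, FILE as a string
if (opts[["summary"]]) {
  message("Would summarise ", opts$FILE)
}
str(opts)
```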
Usage patterns
Syntax is defined at http://docopt.org
Start with Usage:
"Usage:
  script.R --option <argument>
  script.R [<optional-argument>]
  script.R --another-option=<with-argument>
  script.R (--either-that-option | <or-this-argument>)
  script.R <repeating-argument> <repeating-argument>...
" -> doc
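A small sketch of two of these patterns in action, an either/or group and an optional argument. The option names --fast and --slow and the file name are invented for the example; it assumes the docopt package is installed and passes the arguments through the args parameter.

```r
library(docopt)  # assumption: installed from CRAN

"Usage:
  script.R (--fast | --slow) [<file>]
" -> doc

# Simulate: script.R --slow input_file.csv
opts <- docopt(doc, args = c("--slow", "input_file.csv"))

# Exactly one of the two switches is TRUE; <file> keeps its angle brackets
str(opts)
```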
Longer example
#!/usr/bin/Rscript
"This is my useful script I use on everything

Usage: my_uf_script.R [options] FILE

Options:
  -b --bogus          This is a bogus switch
  -o --output=OUTPUT  output file [default: out.csv]

Arguments:
  FILE  the input file" -> doc
library(docopt)
my_opts <- docopt(doc)
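To see the [default: out.csv] annotation at work, the sketch below calls the same specification with only the required FILE argument. It assumes the docopt package is installed and simulates the call through the args parameter; in.csv is a made-up file name.

```r
library(docopt)  # assumption: installed from CRAN

"This is my useful script I use on everything

Usage: my_uf_script.R [options] FILE

Options:
  -b --bogus          This is a bogus switch
  -o --output=OUTPUT  output file [default: out.csv]

Arguments:
  FILE  the input file" -> doc

# Simulate: my_uf_script.R in.csv (no options given)
my_opts <- docopt(doc, args = "in.csv")

# --output falls back to the default from the help text, --bogus is FALSE
str(my_opts)
```

The default lives only in the help text, so the documentation and the behaviour cannot drift apart.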
Recall first example
Let's make a CLI for our script
#!/usr/bin/Rscript
my_model <- glm( data=iris
, Sepal.Width ~ Sepal.Length
)
print(coef(my_model))
Preparing. . .
#!/usr/bin/Rscript
main <- function( DATA, response, terms, family){
data <- read.csv(DATA)
f <- as.formula(paste0(response, " ~ ", terms))
my_model <- glm(f, family=family, data=data)
print(coef(my_model))
}
Done!
"Usage: my_script.R --response=<y> --terms=<x>
                   [--family=<family>] DATA

Options:
  -r --response=<y>     Response for glm
  -t --terms=<x>        Terms for glm
  -f --family=<family>  Family [default: gaussian]

Arguments:
  DATA                  Input data frame" -> doc
main <- function( DATA, response, terms, family){...}
opt <- docopt::docopt(doc)
main(opt$DATA, opt[["--response"]], opt[["--terms"]],
opt[["--family"]])
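Putting the pieces together, here is a sketch of a complete run that can be tried without a shell. It assumes the docopt package is installed, writes the built-in iris data to a temporary CSV file to stand in for DATA, and passes the arguments through the args parameter. Unlike the slide version, main returns the coefficients instead of printing them, so they can be inspected.

```r
library(docopt)  # assumption: installed from CRAN

"Usage: my_script.R --response=<y> --terms=<x> [--family=<family>] DATA

Options:
  -r --response=<y>     Response for glm
  -t --terms=<x>        Terms for glm
  -f --family=<family>  Family [default: gaussian]

Arguments:
  DATA                  Input data frame" -> doc

main <- function(DATA, response, terms, family) {
  data <- read.csv(DATA)
  f <- as.formula(paste0(response, " ~ ", terms))
  my_model <- glm(f, family = family, data = data)
  coef(my_model)  # returned rather than printed, for this sketch only
}

# Stand-in input file: the iris data written to a temporary CSV
csv_file <- tempfile(fileext = ".csv")
write.csv(iris, csv_file, row.names = FALSE)

# Simulate: my_script.R --response=Sepal.Width --terms=Sepal.Length <csv_file>
opt <- docopt(doc, args = c("--response=Sepal.Width",
                            "--terms=Sepal.Length",
                            csv_file))
coefs <- main(opt$DATA, opt[["--response"]], opt[["--terms"]],
              opt[["--family"]])
print(coefs)
```

Because --family is not given, the gaussian default from the help text is used.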
Implementation
Docopt is implemented:
using Reference classes (R5) in pure R.
It is a port of the original Python project: http://docopt.org
Available from: CRAN and
https://github.com/edwindj/docopt.R
Very functional, except for:
multiple identical arguments -vvv
repeating arguments (both will be fixed soon)
Questions?
$ my_talk.R --help
Edwin's talk on docopt
Usage: my_talk.R (--questions | --fell-asleep)
Options:
-q --questions Anyone any questions?
-f --fell-asleep Wake up! Next UseR talk!
$ my_talk.R --questions
Questions?
Thanks for listening!