© 2013 Sqor, Inc.
Sqor
Using R For Social Media and Sports Data
Athletes SuccessData
Noah Gift: CTO @ Sqor
What is Sqor?

•  Social Network hyper-focused on enhancing fan/athlete relationships. We only do Sports!: Now
•  Marketplace for athletes to build and market their digital brand: Now
•  Social Analytics and Prediction Engine as a Service: Q1 2015
•  Micro-endorsement platform: Q1 2015
•  Crowdfunding for athletes: Now
•  Game platform: first homegrown game featuring Brett Favre: Now
•  Cross-social-network publishing platform: Facebook, Twitter, embeddable posts: Now
•  Website, Android App, and iOS App:
Key Aspects of Data Pipeline
•  Multiple languages involved: Python, R, Erlang, C#, SQL, and JavaScript.
•  Multiple persistence options: SQL Server (RDS), Riak (NoSQL), CSV files, Mnesia (distributed soft real-time DB).
•  RabbitMQ and Erlang handle messaging and job communication.
•  Easy to debug: daily and nightly scripts, intermediate CSV files, deep storage in a K/V store, and reports that live in RDS.
•  R is used exclusively for machine learning and statistics (although recommendation engine v1 was written in Python; we are going to replace it with R/Erlang code).
Things They Don’t Tell You About Building A Data Pipeline From Scratch (Or you should have paid attention to)
•  Getting the data into the right format and making sure it is accurate is backbreaking work. It truly is horrible.
•  Keeping track of model prediction accuracy over time, both with new data and with new models, is really important.
•  Non-linear regression is non-trivial.
•  Automation and debuggability of every step are very important. Think Unix tools.
•  Expensive, exotic solutions (weird databases, etc.) sometimes aren’t worth it at first…or maybe ever.
•  Making predictions involving real money with limited data is scary and really hard. If you’re not scared about this, you should be.
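One way to keep track of prediction accuracy over time is to log every prediction next to its eventual outcome and summarize per model version and per run. The sketch below is only an illustration in Python (one of the pipeline languages listed earlier), not Sqor's actual code: the column names, the labels, the model names `v1`/`v2`, and the run identifiers `d1`/`d2` are all hypothetical. The log is kept as CSV so each step can be inspected by hand, in the spirit of "Think Unix tools".

```python
import csv
import io
from collections import defaultdict


def accuracy_by_model_and_day(rows):
    """Return {(model, day): fraction of predictions that matched the outcome}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for row in rows:
        key = (row["model"], row["day"])
        totals[key] += 1
        if row["predicted"] == row["actual"]:
            hits[key] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Hypothetical prediction log; every value here is made up for illustration.
LOG = """model,day,predicted,actual
v1,d1,high,high
v1,d1,low,high
v2,d1,high,high
v2,d1,low,low
v1,d2,low,low
v1,d2,high,high
"""

rows = list(csv.DictReader(io.StringIO(LOG)))
report = accuracy_by_model_and_day(rows)

# One tab-separated line per (model, day), easy to diff between nightly runs.
for (model, day), accuracy in sorted(report.items()):
    print(f"{model}\t{day}\t{accuracy:.2f}")
```

Grouping by both model and day separates the two effects the slide mentions: a drop for the same model across days points at new data, while a gap between models on the same day points at the model change.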
Predicting Top Athletic Performers in Social Media
•  Sqor finds influential athletes and collaborates with them using our prediction algorithms
Our Prediction Algorithms Appear To Work
•  Or we got really lucky….
Clustering
•  We use R clustering packages for classification, visualization of patterns and diagnostics for predictions
Clustering
•  We use kNN clustering for NBA and MLB. We plan to expand this further in the near future.
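For readers unfamiliar with kNN, here is a minimal sketch of the idea, written in plain Python rather than R so it runs without extra packages. Strictly speaking, k-nearest neighbors labels a new point by majority vote among its closest labeled points, which is how the sketch treats it. The two features (followers in millions, posts per week), the labels, and every data point are hypothetical and are not taken from Sqor's models.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Label `query` by majority vote among its k nearest training points."""
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training set: (followers_in_millions, posts_per_week) -> label.
TRAIN = [
    ((0.1, 2.0), "low"),
    ((0.2, 3.0), "low"),
    ((0.3, 1.0), "low"),
    ((5.0, 10.0), "high"),
    ((6.0, 12.0), "high"),
    ((7.0, 9.0), "high"),
]

print(knn_predict(TRAIN, (5.5, 11.0)))  # nearest neighbors are all "high"
print(knn_predict(TRAIN, (0.2, 2.5)))   # nearest neighbors are all "low"
```

With real data the features would normally be scaled first, since Euclidean distance otherwise lets the feature with the largest range dominate the vote.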
Erlang/R Bridge
•  Sqor is a heavy user of Erlang
•  We like Erlang because it has unique concurrency abilities and high uptime (and also because I had a lot of bosses who told me I couldn’t use it).
•  ➜ ~ curl -v -X PUT -H 'content-type: application/json' http://127.0.0.1:8080/api/script/foo -d '{"script":"execute <- function (A) { A * 2 }", "docs":"this doubles stuff"}'
•  ➜ ~ curl -v http://127.0.0.1:8080/api/script/foo -X POST -H 'content-type: application/json' -d '[25]'
•  Returns: [50.0]
•  We plan on open sourcing this in the next 2 months: runs scripts, runs jobs, scales R.
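The two curl calls can also be issued from a script. The sketch below builds the same PUT and POST requests with Python's standard library. The URL, the script name `foo`, and the payloads are copied from the curl examples; the helper names `register_request`, `call_request`, and `send` are hypothetical, and `send` assumes the bridge answers with a JSON body, which the slide only shows for the POST call. The requests are built but not sent, so the block runs without a bridge listening.

```python
import json
import urllib.request

# Host, port, and path as they appear in the curl examples.
BASE_URL = "http://127.0.0.1:8080/api/script"


def register_request(name, script, docs):
    """Build the PUT request that stores an R script under `name`."""
    body = json.dumps({"script": script, "docs": docs}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{name}",
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


def call_request(name, args):
    """Build the POST request that runs the stored script with `args`."""
    body = json.dumps(args).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/{name}",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def send(request):
    """Send a request and decode a JSON reply (needs a running bridge)."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


put = register_request("foo", "execute <- function (A) { A * 2 }", "this doubles stuff")
post = call_request("foo", [25])

# Inspect what would go over the wire without contacting any server.
print(put.get_method(), put.full_url)
print(post.get_method(), post.full_url, post.data.decode("utf-8"))
```

Against a running bridge, `send(put)` followed by `send(post)` would correspond to the two curl calls.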

Using R for Social Media and Sports Analytics
