Background © Jim Kaskade: Big Data 
BIG DATA AND DATA SCIENCE 
study materials and online courses by @dspadawan
WHAT IS DATA SCIENCE 
2 Copyright © 2013-2014 by Teradata. All rights reserved. 
THE DATA SCIENCE VENN DIAGRAM 
@dspadawan
DATA SCIENCE DOMAINS 
All links go to Wikipedia. If you are not sure what a term means, follow its link to learn more.
1. Data Science (Fundamentals) 
2. Statistics 
3. Programming languages 
4. Machine Learning / Data Mining 
5. Text Mining / Natural Language Processing 
6. Data Visualization 
7. Big Data (Hadoop, MapReduce, NoSQL) 
8. Data Ingestion 
9. Data Munging or Data Wrangling 
10. Toolbox (Weka, …, Spark, Storm, …, Sqoop, RHIPE, etc.) 
DATA SCIENCE METRO MAP 
BECOMING A DATA SCIENTIST
MASSIVE OPEN ONLINE COURSES (MOOC) 
• Aggregator 
> http://www.mooc-list.com
• Platforms
> https://www.coursera.org
> https://www.edx.org
> https://www.open2study.com
> https://www.udacity.com
> https://www.udemy.com
> http://online.stanford.edu
• Interactive platforms
> http://www.codecademy.com
> https://www.datacamp.com
WANT TO WORK AS A DATA SCIENTIST?
DATA SCIENCE & ANALYTICS 
• Coursera 
> Core Concepts in Data Analysis 
https://www.coursera.org/course/datan
> Introduction to Data Science:
https://www.coursera.org/course/datasci
> Data Science Specialization:
https://www.coursera.org/specialization/jhudatascience/1
– 9 courses + 1 capstone project
– Each course or the capstone takes 4 weeks
– You can take it for free, or pay 49 USD for certification
> Welcome To Process Mining: Data science in Action!
https://www.coursera.org/course/procmin
DATA SCIENCE & ANALYTICS 1 
• edX
> The Analytics Edge
http://www.edx.org/course/mitx/mitx-15-071x-analytics-edge-1416
> Data, Analytics and Learning
http://www.edx.org/course/utarlingtonx/utarlingtonx-link5-10x-data-analytics-2186
• Udacity
$
> Intro to Data Science
https://www.udacity.com/course/ud359
MATH DANCE 
STATISTICS COURSES 
• Coursera 
> Data analysis and statistical inference: 
https://www.coursera.org/course/statistics
> Statistical inference and exploratory data analysis:
https://www.coursera.org/specialization/jhudatascience/1/courses
• edX
> Introduction to Statistics: Descriptive Statistics
http://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-stat2-1x-introduction-1138
> Introduction to Statistics: Probability
http://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-stat2-2x-introduction-1534
> Introduction to Statistics: Inference
http://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-stat2-3x-introduction-1533
STATISTICS COURSES CONT. 2 
• Udacity 
$ 
> Intro to statistics: 
https://www.udacity.com/course/st101
> Exploratory data analysis:
https://www.udacity.com/course/ud651
> Intro to Inferential Statistics:
https://www.udacity.com/course/ud201
• Mathematical monk
> https://www.youtube.com/playlist?list=PL17567A1A3F5DB5E4
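As a small taste of what the introductory statistics courses cover, the sketch below computes descriptive statistics and an approximate 95% confidence interval for a mean using Python's standard `statistics` module. The sample values are invented, and the normal approximation (z = 1.96) is an assumption; for small samples a t critical value would normally be used instead.

```python
import math
import statistics


def describe(sample):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "n": len(sample),
        "mean": statistics.mean(sample),
        "median": statistics.median(sample),
        # Sample standard deviation (n - 1 in the denominator)
        "stdev": statistics.stdev(sample),
    }


def mean_confidence_interval(sample, z=1.96):
    """Approximate confidence interval for the mean (normal approximation)."""
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean - z * standard_error, mean + z * standard_error


# Hypothetical sample data
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(describe(data))
print(mean_confidence_interval(data))
```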
PROGRAMMING LANGUAGES 
• Analysis/Data mining: 
> R language 
> Python 
> SQL 
> (Perl) 
> (Octave) 
• Big Data (Hadoop) 
> Java (!) 
> Python 
• Visualization 
> JavaScript 
R LANGUAGE 
• Basic info and software
> R language:
http://www.r-project.org
> RStudio (IDE):
http://www.rstudio.com
• Courses
> R Programming:
https://www.coursera.org/course/rprog
• Practice
> Interactive courses:
https://www.datacamp.com/courses
> Data mining examples in R:
http://www.rdatamining.com
PYTHON 
• Basic info and software:
> Python language:
https://www.python.org
> PyDev (Python IDE for Eclipse):
http://pydev.org
• Python for Java developers:
> http://www.sthurlow.com/python
• Google's Python Class
> https://developers.google.com/edu/python
• Codecademy Python
> http://www.codecademy.com/tracks/python
OCTAVE 
• Basic info and software:
> http://octave.sourceforge.net
> https://gnu.org/software/octave
> http://en.wikipedia.org/wiki/GNU_Octave
• Coursera:
> Machine learning: https://www.coursera.org/course/ml
Octave is mostly compatible with MATLAB.
MACHINE LEARNING COURSES 
Subfield of computer science and artificial intelligence concerned with learning from data.
• Coursera
> Machine Learning (Stanford):
https://www.coursera.org/course/ml
> Machine Learning (University of Washington):
https://www.coursera.org/course/machlearning
> Practical Machine Learning (Johns Hopkins):
https://www.coursera.org/course/predmachlearn
– part of Data Science Specialization
• Udacity
> Machine Learning (Supervised, Reinforcement, Unsupervised)
https://www.udacity.com/course/ud675
https://www.udacity.com/course/ud820
https://www.udacity.com/course/ud741
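A minimal illustration of "learning from data": fitting a straight line y = w*x + b by ordinary least squares, the first model covered in many introductory machine learning courses. This is a plain Python sketch with invented data points, not code from any of the courses listed.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = w * x + b for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    w = cov / var
    b = mean_y - w * mean_x
    return w, b


def predict(w, b, x):
    """Apply the learned line to a new input."""
    return w * x + b


# Hypothetical training data generated from y = 2x + 1
xs = [1, 2, 3, 4, 5]
ys = [3, 5, 7, 9, 11]
w, b = fit_line(xs, ys)
print(w, b)              # 2.0 1.0
print(predict(w, b, 6))  # 13.0
```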
MACHINE LEARNING VIDEOS 
• Udemy 
> Hilary Mason: An Intro to Machine Learning with Web Data 
https://www.udemy.com/hilary-mason-an-intro-to-machine-learning-with-web-data
> Hilary Mason: Advanced Machine Learning
https://www.udemy.com/hilary-mason-advanced-machine-learning/
• Mathematical monk
> https://www.youtube.com/playlist?list=PLD0F06AA0D2E8FFBA
• Videolectures.net
> http://blog.videolectures.net/100-most-popular-machine-learning-talks-at-videolectures-net/
DATA MINING COURSES 
Process of discovering patterns in large data sets via machine learning or statistics.
• Coursera
> Mining Massive Datasets (Stanford):
https://www.coursera.org/course/mmds
• Udemy
> Matthew Russell on Mining the Social Web
https://www.udemy.com/matthew-russell-on-mining-the-social-web/
> Data Mining
https://www.udemy.com/data-mining
• Web page
> http://www.rdatamining.com
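Pattern discovery in its simplest form is counting how often items occur together and keeping the combinations above a minimum support. The sketch below does this for item pairs, which is the core counting step behind market-basket algorithms such as Apriori; it is a brute-force illustration, not a full Apriori implementation, and the baskets are invented.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Return item pairs that appear together in at least min_support baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted(set(...)) counts each pair once per basket, in a fixed order
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return {pair: c for pair, c in counts.items() if c >= min_support}


# Hypothetical shopping baskets
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["beer", "bread"],
    ["butter", "milk"],
]
result = frequent_pairs(baskets, min_support=2)
print(result)
```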
DATA MINING COURSES & TOOLS 
• Courses: 
> Data Mining with Weka: 
https://weka.waikato.ac.nz/dataminingwithweka/preview
> More Data Mining with Weka:
https://weka.waikato.ac.nz/moredataminingwithweka
• Weka
> Software: http://www.cs.waikato.ac.nz/ml/weka
• KNIME
> Software: https://www.knime.org/downloads/overview
• RapidMiner
> Official site: http://rapidminer.com
> Software: http://sourceforge.net/projects/rapidminer
TEXT MINING 5A 
• R Data Mining (Word Cloud)
> http://www.rdatamining.com/examples/text-mining
[Figure: word cloud titled "Top recurring themes about Big Data"]
• Videolectures.net
> http://videolectures.net/Top/Computer_Science/Text_Mining
• Tool (Word Cloud)
> Wordle.net
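A word cloud is essentially a picture of word frequencies. The sketch below computes those frequencies with Python's standard library; the regular-expression tokenizer and the tiny stop-word list are simplifying assumptions, and real text-mining pipelines (for example with NLTK) do considerably more, such as stemming and richer stop-word handling.

```python
import re
from collections import Counter

# Illustrative stop-word list, not exhaustive
STOP_WORDS = {"a", "and", "is", "of", "the", "to"}


def word_frequencies(text, top_n=5):
    """Lowercase, split into alphabetic tokens, drop stop words, and count."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOP_WORDS)
    return counts.most_common(top_n)


text = "Big data is data that is too big for traditional tools. Big data needs new tools."
top = word_frequencies(text, top_n=3)
print(top)
```

The resulting (word, count) pairs are what a word-cloud tool maps to font sizes.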
NATURAL LANGUAGE PROCESSING COURSES 
Subfield of computer science, artificial intelligence and linguistics.
• Coursera
> Natural Language Processing (Columbia University):
https://www.coursera.org/course/nlangp
> Natural Language Processing (Stanford):
https://www.coursera.org/course/nlp
• Deeper Learning MOOC
> http://dlmooc.deeper-learning.org/
• Wikipedia
> http://en.wikipedia.org/wiki/Natural_language_processing
VISUALIZATION TOOLS 6 
• Tableau
> http://www.tableausoftware.com
> Commercial visualization software
• D3.js
> http://d3js.org
> Data-Driven Documents: JavaScript visualization library
• Graphviz
> http://www.graphviz.org
> Graph visualization tools
• Gephi
> https://gephi.github.io
> Network (graph) visualization platform
TABLEAU 6 
• Training
> http://www.tableausoftware.com/learn/training
> On demand
> Live online sessions scheduled for specific topics
• Download
> Tableau Public: http://www.tableausoftware.com/public
> Tableau Trial: http://www.tableausoftware.com/products/trial
• Certification
> Desktop (Qualified Associate, Certified Professional)
> Server (Qualified Associate, Certified Professional)
> http://www.tableausoftware.com/support/certification
HOW BIG IS BIG ENOUGH?
BIG DATA STUDY 7 
• MOOC 
> http://bigdatauniversity.com
> http://bigdatacourse.appspot.com
• Coursera
> Web Intelligence and Big Data
https://www.coursera.org/course/bigdata
• Udemy
$
> Big Data and Hadoop Essentials
https://www.udemy.com/big-data-and-hadoop-essentials-free-tutorial
• Open2Study
> Big Data for Better Performance
http://www.open2study.com/courses/big-data-for-better-performance
BIG DATA TOOLS 
• Hadoop – Big Data framework
• Hive – Data warehouse infrastructure built on top of Hadoop
• HBase – Non-relational, distributed database
• Pig – High-level programming tool (Pig Latin) for Hadoop
• Storm – Real-time computation system, often used alongside Hadoop
• Solr – Search platform
• Falcon – Data management and processing for Hadoop
• Sqoop – Command-line application for transferring data between relational databases and Hadoop
• Flume – Large-scale log aggregation framework
• Oozie – Workflow scheduler for Hadoop
• Ambari – Simplified management of Hadoop clusters
• Mahout – Machine learning algorithms implemented on Hadoop
• ZooKeeper – Coordination service for distributed applications
• Knox – REST API gateway for interacting with Hadoop clusters
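Hadoop's MapReduce model splits a job into a map phase that emits key/value pairs, a shuffle that groups those pairs by key, and a reduce phase that aggregates each group. The classic word-count example can be simulated on one machine in plain Python, as sketched below. This illustrates the programming model only; it is not Hadoop code, although similar mapper and reducer logic could run on a cluster through Hadoop Streaming.

```python
from collections import defaultdict
from itertools import chain


def mapper(line):
    """Map phase: emit (word, 1) for every word in a line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle phase: group emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce phase: aggregate all values for one key."""
    return key, sum(values)


def map_reduce(lines):
    mapped = chain.from_iterable(mapper(line) for line in lines)
    grouped = shuffle(mapped)
    return dict(reducer(k, v) for k, v in grouped.items())


# Hypothetical input split into three lines
lines = ["big data big ideas", "data beats ideas", "big wins"]
counts = map_reduce(lines)
print(counts)
```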
HADOOP STUDY 
• Hadoop providers 
> http://www.cloudera.com
> http://hortonworks.com
> http://www.mapr.com
> http://www.teradata.com/aster
• Udacity
> Intro to Hadoop and MapReduce
https://www.udacity.com/course/ud617
• Udemy
> Become a Certified Hadoop Developer | Training | Tutorial
https://www.udemy.com/hadoop-tutorial
There are more Hadoop providers: IBM, Pivotal, etc.
NOT ONLY SQL DATABASES 
• MongoDB – JSON document store
> http://www.mongodb.com
> https://university.mongodb.com
• CouchDB – JSON document store
> http://couchdb.apache.org
• Cassandra – High-performance, column-oriented (wide-column) database
> http://cassandra.apache.org
• VoltDB – In-memory database
> http://voltdb.com
• Redis – In-memory key-value store
> http://redis.io
• NuoDB – Distributed SQL database
> http://www.nuodb.com
BIG DATA UNIVERSITY 7 
• Big Data Courses path: 
> Big Data Fundamentals 
> Hadoop Fundamentals 
> Moving Data into Hadoop (Sqoop and Flume tools) 
> Query languages for Hadoop (Hive, Pig and Jaql) 
> SQL Access for Hadoop 
> Using HBase for Real-time Access to your Big Data 
> Accessing Hadoop Data Using Hive 
> Introduction to Pig 
> Controlling Hadoop Jobs using Oozie 
> Hadoop Reporting and Analysis 
> Introduction to MapReduce Programming 
• Courses are provided by IBM 
IT IS EVEN BETTER, DON’T YOU THINK? 
CLOUDERA HADOOP 7 
• Tutorials
> 8 different learning paths
> On demand and free
> Taught jointly with Udacity (paid monthly)
> http://cloudera.com/content/cloudera/en/training/courses.html
> http://cloudera.com/content/cloudera/en/training/library.html
• Sandbox
> http://cloudera.com/content/support/en/downloads/quickstart_vms/cdh-5-1-x1.html
• Certification
> 200 USD per exam
> http://cloudera.com/content/cloudera/en/training/certification.html
HORTONWORKS HADOOP 7 
• Tutorials 
> http://hortonworks.com/tutorials
> 3 paths for
– Developers
– Administrators
– Data Scientists
• Sandbox
> http://hortonworks.com/hdp/downloads
• Certifications
> 200 USD per exam
> http://hortonworks.com/training/certification
MAPR HADOOP 7 
• Tutorials 
> https://www.mapr.com/services/mapr-academy/training-videos
> 3 paths for
– Developers
– Administrators
– Business users
• Sandbox
> https://www.mapr.com/products/mapr-sandbox-hadoop
• Certification
> For administrators only
> You must pass the Hadoop Cluster Administration on MapR course
> https://www.mapr.com/services/mapr-academy/certification
STREAMING – NO BIG DEAL 
STREAMING DATA PROCESSING 
• Storm (https://storm.incubator.apache.org)
> Open source (ASF) real-time processing, complementing Hadoop
> Twitter project
• Spark (https://spark.apache.org)
> Open source (ASF) in-memory cluster computing, Hadoop-compatible
> Apache project
• S4 (http://incubator.apache.org/s4)
> Open source (ASF) processing of stream data
> Yahoo project
• Samza (http://samza.incubator.apache.org)
> Open source processing of messaging data
> LinkedIn project
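Stream processors such as the ones listed handle records as they arrive instead of as a stored batch. A common building block is a sliding-window aggregate; the sketch below counts events per key over the last N seconds using only the standard library. The (timestamp, key) event format, the window length and the class name are assumptions made for illustration; this is not the API of Storm, Spark, S4 or Samza.

```python
from collections import Counter, deque


class SlidingWindowCounter:
    """Count events per key over the most recent `window` seconds (hypothetical helper)."""

    def __init__(self, window):
        self.window = window
        self.events = deque()  # (timestamp, key) in arrival order
        self.counts = Counter()

    def add(self, timestamp, key):
        self.events.append((timestamp, key))
        self.counts[key] += 1
        self._evict(timestamp)

    def _evict(self, now):
        # Drop events older than the window (assumes non-decreasing timestamps)
        while self.events and self.events[0][0] <= now - self.window:
            _, old_key = self.events.popleft()
            self.counts[old_key] -= 1
            if self.counts[old_key] == 0:
                del self.counts[old_key]

    def snapshot(self):
        return dict(self.counts)


counter = SlidingWindowCounter(window=10)
for ts, tag in [(0, "#hadoop"), (3, "#spark"), (5, "#spark"), (12, "#storm")]:
    counter.add(ts, tag)
print(counter.snapshot())
```

Keeping a running counter alongside the deque means each event is added and evicted once, so the cost per event stays constant on average instead of rescanning the whole window.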
DATA INGESTION 8 
Process of obtaining, importing and processing data for later use or storage.
• Techniques
> Data import and export
> Data fusion – integrating multiple data sources
> Data sampling – selecting a subset of the data (rows)
> Data discovery – detecting patterns in the data
> Exploratory data analysis – summarizing the main characteristics of the data
> Feature extraction – selecting a subset of the data (columns)
> Data scrubbing – correcting errors in the data
> Missing data values – correcting or filling in missing values
> Etc.
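Several of these techniques can be shown on a tiny table in plain Python: importing CSV data, scrubbing values, filling missing values, selecting columns and sampling rows. The column names, the sample data and the choice of mean imputation are assumptions made for this sketch; libraries such as pandas provide the same operations at scale.

```python
import csv
import io
import random
import statistics

# Hypothetical raw input with inconsistent text and missing cells
RAW = """id,age,city,income
1,34, Boston ,52000
2,,chicago,61000
3,45,BOSTON,
4,29,Chicago,48000
"""


def load(text):
    """Data import: parse CSV text into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def scrub(rows):
    """Data scrubbing: normalize whitespace and capitalization of the city column."""
    for row in rows:
        row["city"] = row["city"].strip().title()
    return rows


def fill_missing(rows, column):
    """Missing values: replace empty cells with the column mean (mean imputation)."""
    present = [float(r[column]) for r in rows if r[column]]
    mean = statistics.mean(present)
    for row in rows:
        row[column] = float(row[column]) if row[column] else mean
    return rows


def select_columns(rows, columns):
    """Column subset, the slide's 'feature extraction' step."""
    return [{c: r[c] for c in columns} for r in rows]


def sample_rows(rows, k, seed=42):
    """Data sampling: pick k rows at random, reproducibly."""
    return random.Random(seed).sample(rows, k)


rows = fill_missing(fill_missing(scrub(load(RAW)), "age"), "income")
print(select_columns(rows, ["city", "income"]))
print(sample_rows(rows, 2))
```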
DATA WRANGLING / DATA MUNGING 9 
Converting or mapping data from one "raw" form into another format.
• Coursera
> Getting and Cleaning Data
– part of Data Science Specialization
https://www.coursera.org/course/getdata
• Udacity
$
> Data Wrangling with MongoDB
https://www.udacity.com/course/ud032
• School of Data
> Many different courses: http://schoolofdata.org
• Tools
> OpenRefine, DataWrangler – clean-up and transformation tools
> Talend, Pentaho – data integration
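Wrangling in the sense defined above, mapping data from one raw form into another format, can be as simple as parsing semi-structured text into structured records. The sketch assumes a hypothetical "date;name;amount" line format, converts each valid line into a record, skips malformed lines, and serializes the result as JSON.

```python
import json

# Hypothetical raw input, including one malformed line
RAW_LINES = [
    "2014-03-01;alice;12.50",
    "2014-03-02;bob;7",
    "not a valid record",
    "2014-03-02;carol;3.25",
]


def parse_line(line):
    """Map one raw 'date;name;amount' line to a dict, or None if malformed."""
    parts = line.split(";")
    if len(parts) != 3:
        return None
    date, name, amount = (p.strip() for p in parts)
    try:
        return {"date": date, "name": name.title(), "amount": float(amount)}
    except ValueError:
        return None


def wrangle(lines):
    """Convert raw lines into clean records, dropping the ones that fail to parse."""
    records = [parse_line(line) for line in lines]
    return [r for r in records if r is not None]


records = wrangle(RAW_LINES)
print(json.dumps(records, indent=2))
```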
TOOLBOX 10 
• Hadoop and real-time
> Scribe – log aggregation server
• Machine Learning
> H2O – In-memory machine learning
• Data Mining
> Rattle – GUI for data mining using R
• Python and NLP
> NLTK – Natural Language Toolkit for Python
• R and Hadoop
> RHIPE – R and Hadoop Integrated Programming Environment
• Visualization
> Many Eyes – Online visualization system from IBM
ONLINE SOURCES 
• Data science websites:
> http://www.datasciencecentral.com
> http://www.hadoop360.com
> http://www.datascienceweekly.org
• Aggregators
> https://trello.com/b/rbpEfMld/data-science
• Blogs
> http://datasciencemasters.org
> http://www.kdnuggets.com
> http://www.zipfianacademy.com/blog/post/46864003608/a-practical-intro-to-data-science
> http://datascience101.wordpress.com
> http://fivethirtyeight.blogs.nytimes.com
FREE BOOKS 
• Data Science 
> Doing Data Science 
> Agile Data Science 
> Data Science for Business 
• Statistics 
> Think Stats 
• Programming 
> R language 
– 25 Recipes for Getting Started with R 
– Learning R 
> Python 
– Learning Python, 5th Edition 
– Think Python 
FREE BOOKS CONTINUED 
• Machine Learning / Data Mining 
> Machine Learning for Hackers 
> Mining the Social Web 
• Visualization 
> Visualizing Data 
> Getting Started with D3 
> Communicating Data with Tableau 
• Text mining / Natural Language Processing 
> 21 Recipes for Mining Twitter 
> Natural Language Processing with Python 
> Natural Language Annotation for Machine Learning 
FREE BOOKS CONTINUED 
• Big Data 
> Hadoop: The Definitive Guide, 3rd Edition 
> Ethics of Big Data 
> Big Data Analytics with R and Hadoop 
• Data Ingestion 
> Data Analysis with Open Source Tools 
> Python for Data Analysis 
• Data Wrangling and Munging 
> Using OpenRefine 
• Toolbox 
$ 
> Getting Started with Storm 
> Fast Data Processing with Spark 
QUESTIONS AND ANSWERS 
By Tara Laskowski 
@dspadawan 
Contact me at datasciencepadawan@gmail.com 
Follow me at twitter @dspadawan 
Read my blog http://datasciencepadawan.blogspot.com

More Related Content

PPTX
Apache Storm
PDF
Webinar: Big Data & Hadoop - When not to use Hadoop
PPTX
Hadoop for Java Professionals
PDF
Spark streaming
PDF
Data Manipulation at Scale Systems and Algorithms
PDF
Data Manipulation at Scale Systems and Algorithms
PDF
From DARPA to Shakespeare: All the Data we Can Handle
PPT
03 preprocessing
Apache Storm
Webinar: Big Data & Hadoop - When not to use Hadoop
Hadoop for Java Professionals
Spark streaming
Data Manipulation at Scale Systems and Algorithms
Data Manipulation at Scale Systems and Algorithms
From DARPA to Shakespeare: All the Data we Can Handle
03 preprocessing

Viewers also liked (17)

PPTX
Social BPM
PDF
Deep Learning in theano
PDF
Deep learning
PDF
Data science
PPT
Big Data, Bigger Campaigns: Using IBM’s Unica and Netezza Platforms to Increa...
PPT
Selection and on boarding process
PDF
Machine Learning and Data Mining: 15 Data Exploration and Preparation
PDF
Intégration des données avec Talend ETL
PDF
Demystifying Data Science with an introduction to Machine Learning
PDF
Ideation and Design Principles Workshop
PPT
Capturing Data Requirements
PDF
2016 kcd 세미나 발표자료. 구글포토로 바라본 인공지능과 머신러닝
PPTX
Talk on Industrial Internet of Things @ Intelligent systems tech forum 2014
PDF
RMPG Learning Series CRM Workshop Day 1 session 3
PPTX
기계학습 / 딥러닝이란 무엇인가
PDF
The Field Guide to Data Science
 
PDF
Webinar Smile et Talend : Faites communiquer vos applications en temps réel
Social BPM
Deep Learning in theano
Deep learning
Data science
Big Data, Bigger Campaigns: Using IBM’s Unica and Netezza Platforms to Increa...
Selection and on boarding process
Machine Learning and Data Mining: 15 Data Exploration and Preparation
Intégration des données avec Talend ETL
Demystifying Data Science with an introduction to Machine Learning
Ideation and Design Principles Workshop
Capturing Data Requirements
2016 kcd 세미나 발표자료. 구글포토로 바라본 인공지능과 머신러닝
Talk on Industrial Internet of Things @ Intelligent systems tech forum 2014
RMPG Learning Series CRM Workshop Day 1 session 3
기계학습 / 딥러닝이란 무엇인가
The Field Guide to Data Science
 
Webinar Smile et Talend : Faites communiquer vos applications en temps réel
Ad

Similar to Big data and data science study (20)

PPTX
Big Data Analytics V2
PDF
Big data analytics 1
PDF
00-01 DSnDA.pdf
PDF
Course 8 : How to start your big data project by Eric Rodriguez
PDF
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
PPTX
Chapter 1 Introduction to Datascience (1).pptx
PPTX
selected topics in CS-CHaaapteerobe.pptx
PPT
Data Munging in concepts of data mining in DS
PPTX
Unit 1 Introduction to Data Analytics .pptx
PDF
Thinkful DC - Intro to Data Science
PDF
How to crack down big data?
PPTX
Big Data Tutorial V4
PPTX
Data science.chapter-1,2,3
PDF
Lecture1 introduction to big data
PPTX
Big Data Analysis : Deciphering the haystack
PPTX
Big data and data mining
PPTX
2016 Chapter 2 - Intro. to Data Sciences.pptx
PPTX
Data Science presentation for explanation of numpy and pandas
PPT
Data_Science.ppt
PDF
Big Data Intoduction & Hadoop ArchitectureModule1.pdf
Big Data Analytics V2
Big data analytics 1
00-01 DSnDA.pdf
Course 8 : How to start your big data project by Eric Rodriguez
ESWC SS 2013 - Wednesday Tutorial Marko Grobelnik: Introduction to Big Data A...
Chapter 1 Introduction to Datascience (1).pptx
selected topics in CS-CHaaapteerobe.pptx
Data Munging in concepts of data mining in DS
Unit 1 Introduction to Data Analytics .pptx
Thinkful DC - Intro to Data Science
How to crack down big data?
Big Data Tutorial V4
Data science.chapter-1,2,3
Lecture1 introduction to big data
Big Data Analysis : Deciphering the haystack
Big data and data mining
2016 Chapter 2 - Intro. to Data Sciences.pptx
Data Science presentation for explanation of numpy and pandas
Data_Science.ppt
Big Data Intoduction & Hadoop ArchitectureModule1.pdf
Ad

Big data and data science study

  • 1. Background © Jim Kaskade: Big Data BIG DATA AND DATA SCIENCE study materials and online courses by @dspadawan
  • 2. WHAT IS DATA SCIENCE 2 Copyright © 2013-2014 by Teradata. All rights reserved. THE DATA SCIENCE VENN DIAGRAM @dspadawan
  • 3. DATA SCIENCE DOMAINS All links go to Wiki. If you are not sure what something means you can learn. 1. Data Science (Fundamentals) 2. Statistics 3. Programming languages 4. Machine Learning / Data Mining 5. Text Mining / Natural Language Processing 6. Data Visualization 7. Big Data (Hadoop, MapReduce, NoSQL) 8. Data Ingestion 9. Data Munging or Data Wrangling 10. Toolbox (Weka, …, Spark, Storm, …, Sqoop, RHIPE, etc.) 3 Copyright © 2013-2014 by Teradata. All rights reserved. @dspadawan
  • 4. DATA SCIENCE METRO MAP 4 Copyright © 2013-2014 by Teradata. All rights reserved. BECOMING A DATA SCIENTIST
  • 5. MASSIVE OPEN ONLINE COURSES (MOOC) • Aggregator > http://www.mooc-list.com • Platforms > https://www.coursera.org > https://www.edx.org > https://www.open2study.com > https://www.udacity.com > https://www.udemy.com > http://online.stanford.edu • Interactive platforms > http://www.codecademy.com > https://www.datacamp.com 5 Copyright © 2013-2014 by Teradata. All rights reserved. @dspadawan
  • 6. WANT TO WORK AS DATA SCIENTIST? 6 Copyright © 2013-2014 by Teradata. All rights reserved. @dspadawan
  • 7. DATA SCIENCE & ANALYTICS • Coursera > Core Concepts in Data Analysis https://www.coursera.org/course/datan > Introduction to Data Science: https://www.coursera.org/course/datasci > Data Science Specialization: https://www.coursera.org/specialization/jhudatascience/1 – 9 courses + 1 capstone project – Each course or capstone takes 4 weeks – You can do it for free or you can pay 49 USD for certification > Welcome To Process Mining: Data science in Action! https://www.coursera.org/course/procmin 7 Copyright © 2013-2014 by Teradata. All rights reserved. 1 @dspadawan
  • 8. DATA SCIENCE & ANALYTICS 1 • Edx > The Analytics Edge http://www.edx.org/course/mitx/mitx-15-071x-analytics-edge- 1416 > Data, Analytics and Learning http://www.edx.org/course/utarlingtonx/utarlingtonx-link5-10x-data- analytics-2186 • Udacity $ > Intro to Data Science https://www.udacity.com/course/ud359 8 Copyright © 2013-2014 by Teradata. All rights reserved. @dspadawan
  • 9. MATH DANCE 9 Copyright © 2013-2014 by Teradata. All rights reserved. @dspadawan
  • 10. STATISTICS COURSES • Coursera > Data analysis and statistical inference: https://www.coursera.org/course/statistics > Statistical inference and exploratory data analysis: https://www.coursera.org/specialization/jhudatascience/1/courses • EdX > Introduction to Statistics: Descriptive Statistics http://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-stat2-1x-introduction- 1138 > Introduction to Statistics: Probability http://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-stat2-2x-introduction- 1534 > Introduction to Statistics: Inference http://www.edx.org/course/uc-berkeleyx/uc-berkeleyx-stat2-3x-introduction- 1533 10 Copyright © 2013-2014 by Teradata. All rights reserved. 2 @dspadawan
  • 11. STATISTICS COURSES CONT. 2 • Udacity $ > Intro to statistics: https://www.udacity.com/course/st101 > Exploratory data analysis: https://www.udacity.com/course/ud651 > Intro to Inferential Statistics https://www.udacity.com/course/ud201 • Mathematical monk > https://www.youtube.com/playlist?list=PL17567A1A3F5DB5E4 11 Copyright © 2013-2014 by Teradata. All rights reserved. @dspadawan
  • 12. PROGRAMMING LANGUAGES • Analysis/Data mining: > R language > Python > SQL > (Perl) > (Octave) • Big Data (Hadoop) > Java (!) > Python • Visualization > JavaScript 12 Copyright © 2013-2014 by Teradata. All rights reserved. 3 @dspadawan
  • 13. R LANGUAGE • Basic info and SW > R Language: http://www.r-project.org > R Studio (IDE): http://www.rstudio.com • Courses > R Programming: https://www.coursera.org/course/rprog • Practice > Interactive courses: https://www.datacamp.com/courses > Data mining examples in R: http://www.rdatamining.com 13 Copyright © 2013-2014 by Teradata. All rights reserved. 3 @dspadawan
  • 14. PYTHON • Basic info and SW: > Python language: https://www.python.org > Eclipse Python: http://pydev.org • Python for Java developers: > http://www.sthurlow.com/python • Google's Python Class > https://developers.google.com/edu/python • Code Academy Python > http://www.codecademy.com/tracks/python 14 Copyright © 2013-2014 by Teradata. All rights reserved. 3 @dspadawan
  • 15. OCTAVE • Basic info and SW: > http://octave.sourceforge.net > https://gnu.org/software/octave > http://en.wikipedia.org/wiki/GNU_Octave • Coursera: > Machine learning: https://www.coursera.org/course/ml 15 Copyright © 2013-2014 by Teradata. All rights reserved. 3 Octave is mostly compatible with MatLab. @dspadawan
  • 16. MACHINE LEARNING COURSES Subfield of computer science and artificial intelligence about learn from data. • Coursera > Machine Learning (Stanford): https://www.coursera.org/course/ml > Machine Learning: (University of Washington) https://www.coursera.org/course/machlearning > Practical Machine Learning (Johns Hopkins): https://www.coursera.org/course/predmachlearn – part of Data Science Specialization • Udacity > Machine Learning (Supervised, Reinforcement, Unsupervised) https://www.udacity.com/course/ud675 https://www.udacity.com/course/ud820 https://www.udacity.com/course/ud741 16 Copyright © 2013-2014 by Teradata. All rights reserved. 4A $ @dspadawan
  • 17. MACHINE LEARNING VIDEOS • Udemy > Hilary Mason: An Intro to Machine Learning with Web Data https://www.udemy.com/hilary-mason-an-intro-to-machine-learning- with-web-data > Hilary Mason: Advanced Machine Learning https://www.udemy.com/hilary-mason-advanced-machine-learning/ • Mathematical monk > https://www.youtube.com/playlist?list=PLD0F06AA0D2E8FFBA • Videolectures.net > http://blog.videolectures.net/100-most-popular-machine-learning- talks-at-videolectures-net/ 17 Copyright © 2013-2014 by Teradata. All rights reserved. 4A $ @dspadawan
  • 18. DATA MINING COURSES Process of discovery patterns in large data sets via machine learning or statistics. • Coursera > Mining Massive Datasets (Stanford) https://www.coursera.org/course/mmds • Udemy > Matthew Russell on Mining the Social Web https://www.udemy.com/matthew-russell-on-mining-the-social-web/ > Data Mining https://www.udemy.com/data-mining • Web page > http://www.rdatamining.com 18 Copyright © 2013-2014 by Teradata. All rights reserved. 4B $ @dspadawan
  • 19. DATA MINING COURSES & TOOLS • Courses: > Data Mining with Weka: https://weka.waikato.ac.nz/dataminingwithweka/preview > More Data Mining with Weka: https://weka.waikato.ac.nz/moredataminingwithweka • Weka > SW: http://www.cs.waikato.ac.nz/ml/weka • Knime > SW: https://www.knime.org/downloads/overview • RapidMiner > Official site: http://rapidminer.com > SW: http://sourceforge.net/projects/rapidminer 19 Copyright © 2013-2014 by Teradata. All rights reserved. 4B @dspadawan
  • 20. TEXT MINING 5A • R Data Mining (Word Cloud) TOP RECURRING THEMES ABOUT BIG DATA > http://www.rdatamining.com/examples/text-mining • Videolectures.net > http://videolectures.net/Top/Computer_Science/Text_Mining • Tool (Word Cloud) > Wordle.net 20 Copyright © 2013-2014 by Teradata. All rights reserved. @dspadawan
  • 21. NATURAL LANGUAGE PROCESSING COURSES • Coursera > Natural Language Processing Subfield of computer science and artificial intelligence and linguistics. (Columbia University): https://www.coursera.org/course/nlangp > Natural Language Processing (Stanford): https://www.coursera.org/course/nlp • Deeper Learning MOOC > http://dlmooc.deeper-learning.org/ • Wikipedia > http://en.wikipedia.org/wiki/Natural_language_processing 21 Copyright © 2013-2014 by Teradata. All rights reserved. 5B @dspadawan
  • 22. VISUALIZATION TOOLS 6 • Tableau > http://www.tableausoftware.com > Commercial visualization software • D3.js > http://d3js.org > Data Driven document visualization library • GraphViz > http://www.graphviz.org > Graph visualization tools • Gephi > https://gephi.github.io > Visualization platform 22 Copyright © 2013-2014 by Teradata. All rights reserved. @dspadawan
  • 23. TABLEAU (domain 6)
    • Training
      > http://www.tableausoftware.com/learn/training
      > On demand
      > Live online sessions scheduled for specific topics
    • Download
      > Tableau Public: http://www.tableausoftware.com/public
      > Tableau Trial: http://www.tableausoftware.com/products/trial
    • Certification
      > Desktop (Qualified Associate, Certified Professional)
      > Server (Qualified Associate, Certified Professional)
      > http://www.tableausoftware.com/support/certification
  • 24. HOW BIG IS BIG ENOUGH?
  • 25. BIG DATA STUDY (domain 7)
    • MOOC
      > http://bigdatauniversity.com
      > http://bigdatacourse.appspot.com
    • Coursera
      > Web Intelligence and Big Data: https://www.coursera.org/course/bigdata
    • Udemy $
      > Big Data and Hadoop Essentials: https://www.udemy.com/big-data-and-hadoop-essentials-free-tutorial
    • Open2Study
      > Big Data for Better Performance: http://www.open2study.com/courses/big-data-for-better-performance
  • 26. BIG DATA TOOLS (domain 7)
    • Hadoop – Big Data framework
    • Hive – data warehouse infrastructure built on top of Hadoop
    • HBase – non-relational, distributed database
    • Pig – high-level data-flow language (Pig Latin) for Hadoop
    • Storm – real-time computation system for Hadoop
    • Solr – search platform
    • Falcon – data management and processing for Hadoop
    • Sqoop – command-line application for transferring data between relational databases and Hadoop
    • Flume – large-scale log aggregation framework
    • Oozie – workflow scheduler for Hadoop
    • Ambari – simplified management of Hadoop clusters
    • Mahout – machine learning algorithms implemented on Hadoop
    • ZooKeeper – coordination service for distributed applications
    • Knox – REST API gateway for interacting with Hadoop clusters
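Much of this stack builds on Hadoop's MapReduce model: a map step emits key/value pairs, the framework groups them by key (the shuffle), and a reduce step aggregates each group. The classic word-count example can be sketched in a single Python process as below. This only simulates the data flow on an in-memory list; it is not Hadoop's Java API, and the function names are illustrative.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle/sort: group emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())  # reducers receive keys in sorted order


def reduce_phase(key, values):
    """Reducer: sum all counts emitted for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(key, values) for key, values in shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["Hadoop stores data", "Spark and Hadoop process data"]))
```

On a real cluster the mapper and reducer run on many machines and the shuffle moves data over the network; keeping the three phases as separate functions mirrors that split.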
  • 27. HADOOP STUDY (domain 7)
    • Hadoop providers
      > http://www.cloudera.com
      > http://hortonworks.com
      > http://www.mapr.com
      > http://www.teradata.com/aster
      > There are more Hadoop providers: IBM, Pivotal, etc.
    • Udacity $
      > Intro to Hadoop and MapReduce: https://www.udacity.com/course/ud617
    • Udemy $
      > Become a Certified Hadoop Developer | Training | Tutorial: https://www.udemy.com/hadoop-tutorial
  • 28. NOT ONLY SQL (NoSQL) DATABASES (domain 7)
    • MongoDB – JSON document store
      > http://www.mongodb.com
      > https://university.mongodb.com
    • CouchDB – JSON document store
      > http://couchdb.apache.org
    • Cassandra – high-performance column-oriented database
      > http://cassandra.apache.org
    • VoltDB – in-memory database
      > http://voltdb.com
    • Redis – in-memory key-value store
      > http://redis.io
    • NuoDB – distributed SQL database
      > http://www.nuodb.com
  • 29. BIG DATA UNIVERSITY (domain 7)
    • Big Data course path:
      > Big Data Fundamentals
      > Hadoop Fundamentals
      > Moving Data into Hadoop (Sqoop and Flume tools)
      > Query Languages for Hadoop (Hive, Pig and Jaql)
      > SQL Access for Hadoop
      > Using HBase for Real-time Access to your Big Data
      > Accessing Hadoop Data Using Hive
      > Introduction to Pig
      > Controlling Hadoop Jobs using Oozie
      > Hadoop Reporting and Analysis
      > Introduction to MapReduce Programming
    • Courses are provided by IBM
  • 30. IT IS EVEN BETTER, DON’T YOU THINK?
  • 31. CLOUDERA HADOOP (domain 7)
    • Tutorials
      > 8 different paths
      > On demand and free
      > Also taught together with Udacity (paid monthly)
      > http://cloudera.com/content/cloudera/en/training/courses.html
      > http://cloudera.com/content/cloudera/en/training/library.html
    • Sandbox
      > http://cloudera.com/content/support/en/downloads/quickstart_vms/cdh-5-1-x1.html
    • Certification
      > 200 USD per exam
      > http://cloudera.com/content/cloudera/en/training/certification.html
  • 32. HORTONWORKS HADOOP (domain 7)
    • Tutorials
      > http://hortonworks.com/tutorials
      > 3 paths for
        – Developers
        – Administrators
        – Data Scientists
    • Sandbox
      > http://hortonworks.com/hdp/downloads
    • Certification
      > 200 USD per exam
      > http://hortonworks.com/training/certification
  • 33. MAPR HADOOP (domain 7)
    • Tutorials
      > https://www.mapr.com/services/mapr-academy/training-videos
      > 3 paths for
        – Developers
        – Administrators
        – Business users
    • Sandbox
      > https://www.mapr.com/products/mapr-sandbox-hadoop
    • Certification
      > For administrators only
      > Requires passing the "Hadoop Cluster Administration on MapR" course
      > https://www.mapr.com/services/mapr-academy/certification
  • 34. STREAMING – NO BIG DEAL
  • 35. STREAMING DATA PROCESSING (domain 7)
    • Storm (https://storm.incubator.apache.org)
      > Open source (ASF) real-time Hadoop
      > Twitter project
    • Spark (https://spark.apache.org)
      > Open source (ASF) in-memory Hadoop
      > Originally a UC Berkeley AMPLab project
    • S4 (http://incubator.apache.org/s4)
      > Open source (ASF) processing of stream data
      > Yahoo project
    • Samza (http://samza.incubator.apache.org)
      > Open source (ASF) processing of messaging data
      > LinkedIn project
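Stream processors such as Storm, Spark, S4, or Samza typically aggregate events over time windows instead of over a finished data set. The generator below is a plain-Python sketch of a tumbling (fixed-size, non-overlapping) window count. It assumes events arrive already in timestamp order as `(timestamp_seconds, key)` pairs and ignores late or out-of-order data, which real systems must handle; it does not use any of those frameworks' APIs.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Yield (window_start, counts_by_key) for each tumbling window.

    Assumes `events` arrive in timestamp order as (timestamp_seconds, key)
    pairs. A window is emitted as soon as the first event of a later window
    arrives; the last window is flushed when the input ends. Windows that
    receive no events are not emitted.
    """
    current_start = None
    counts = Counter()
    for timestamp, key in events:
        start = int(timestamp // window_seconds) * window_seconds
        if current_start is not None and start != current_start:
            yield current_start, dict(counts)  # the previous window is closed
            counts = Counter()
        current_start = start
        counts[key] += 1
    if current_start is not None:
        yield current_start, dict(counts)


events = [(0.5, "click"), (3.2, "view"), (9.9, "click"), (10.0, "click"), (14.1, "view")]
for window_start, counts in tumbling_window_counts(events, window_seconds=10):
    print(window_start, counts)
```

Because it is a generator, results for each window become available while the input is still being consumed, which is the essential difference from batch processing.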
  • 36. DATA INGESTION (domain 8)
    Data ingestion: the process of obtaining, importing, and processing data for later use or storage.
    • Techniques
      > Data import and export
      > Data fusion – integrating data from multiple sources
      > Data sampling – selecting a subset of the data (rows)
      > Data discovery – detecting patterns in the data
      > Exploratory data analysis – summarizing the main characteristics of the data
      > Feature extraction – selecting a subset of the data (columns)
      > Data scrubbing – correcting errors in the data
      > Missing data values – handling or correcting missing values
      > Etc.
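Several of these techniques fit in a few lines of standard-library Python. The sketch below illustrates data sampling, column selection, filling missing values, and a basic exploratory summary on a toy table. The helper names and the records are hypothetical, and mean imputation is only one of many ways to treat missing values.

```python
import random
import statistics

rows = [
    {"id": 1, "age": 30, "city": "Prague"},
    {"id": 2, "age": None, "city": "Brno"},
    {"id": 3, "age": 40, "city": "Prague"},
    {"id": 4, "age": 50, "city": "Ostrava"},
]


def sample_rows(rows, k, seed=42):
    """Data sampling: pick k rows at random, without replacement."""
    return random.Random(seed).sample(rows, k)


def select_columns(rows, columns):
    """Column subset: keep only the named fields of each row."""
    return [{column: row.get(column) for column in columns} for row in rows]


def fill_missing_with_mean(rows, column):
    """Missing values: replace None in `column` with the mean of known values."""
    known = [row[column] for row in rows if row[column] is not None]
    mean = statistics.mean(known)
    return [{**row, column: mean if row[column] is None else row[column]} for row in rows]


def summarize(rows, column):
    """Exploratory analysis: count, min, max, and mean of a numeric column."""
    values = [row[column] for row in rows if row[column] is not None]
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
    }


print(sample_rows(rows, 2))
print(select_columns(rows, ["id", "city"]))
print([row["age"] for row in fill_missing_with_mean(rows, "age")])  # [30, 40, 40, 50]
print(summarize(rows, "age"))
```

Each helper returns new rows instead of mutating the input, so the raw data stays available for comparison after every step.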
  • 37. DATA WRANGLING / DATA MUNGING (domain 9)
    Data wrangling: converting or mapping data from one "raw" form into another format.
    • Coursera
      > Getting and Cleaning Data (part of the Data Science Specialization): https://www.coursera.org/course/getdata
    • Udacity $
      > Data Wrangling with MongoDB: https://www.udacity.com/course/ud032
    • School of Data
      > Many different courses: http://schoolofdata.org
    • Tools
      > OpenRefine, DataWrangler – cleanup and transformation tools
      > Talend, Pentaho – data integration
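A typical wrangling task is turning messy raw text into clean, typed records. The sketch below assumes a small CSV input with stray whitespace and an empty field, and converts it to JSON-ready dictionaries using only Python's `csv` and `json` modules; the column names and the `wrangle` function are illustrative.

```python
import csv
import io
import json

RAW = """name, age ,joined
 Alice ,34,2014-03-01
Bob,,2013-11-20
"""


def wrangle(raw_csv):
    """Convert raw CSV text into a list of clean, typed records."""
    records = []
    for row in csv.DictReader(io.StringIO(raw_csv)):
        # Trim stray whitespace from header names and values; skip the
        # overflow key (None) that DictReader uses for surplus fields.
        clean = {
            key.strip(): (value or "").strip()
            for key, value in row.items()
            if key is not None
        }
        # Type conversion: empty ages become None, the rest become integers.
        clean["age"] = int(clean["age"]) if clean["age"] else None
        records.append(clean)
    return records


print(json.dumps(wrangle(RAW), indent=2))
```

Tools such as OpenRefine perform the same kinds of steps (trimming, type conversion, handling blanks) interactively rather than in code.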
  • 38. TOOLBOX (domain 10)
    • Hadoop and real time
      > Scribe – log aggregation server, open-sourced by Facebook
    • Machine Learning
      > H2O – in-memory machine learning
    • Data Mining
      > Rattle – GUI for data mining using R
    • Python and NLP
      > NLTK – Natural Language Toolkit for Python
    • R and Hadoop
      > RHIPE – R and Hadoop Integrated Programming Environment
    • Visualization
      > Many Eyes – online visualization system from IBM
  • 39. ONLINE SOURCES
    • Data science sites
      > http://www.datasciencecentral.com
      > http://www.hadoop360.com
      > http://www.datascienceweekly.org
    • Aggregators
      > https://trello.com/b/rbpEfMld/data-science
    • Blogs
      > http://datasciencemasters.org
      > http://www.kdnuggets.com
      > http://www.zipfianacademy.com/blog/post/46864003608/a-practical-intro-to-data-science
      > http://datascience101.wordpress.com
      > http://fivethirtyeight.blogs.nytimes.com
  • 40. FREE BOOKS
    • Data Science
      > Doing Data Science
      > Agile Data Science
      > Data Science for Business
    • Statistics
      > Think Stats
    • Programming
      > R language
        – 25 Recipes for Getting Started with R
        – Learning R
      > Python
        – Learning Python, 5th Edition
        – Think Python
  • 41. FREE BOOKS CONTINUED
    • Machine Learning / Data Mining
      > Machine Learning for Hackers
      > Mining the Social Web
    • Visualization
      > Visualizing Data
      > Getting Started with D3
      > Communicating Data with Tableau
    • Text Mining / Natural Language Processing
      > 21 Recipes for Mining Twitter
      > Natural Language Processing with Python
      > Natural Language Annotation for Machine Learning
  • 42. FREE BOOKS CONTINUED
    • Big Data
      > Hadoop: The Definitive Guide, 3rd Edition
      > Ethics of Big Data
      > Big Data Analytics with R and Hadoop
    • Data Ingestion
      > Data Analysis with Open Source Tools
      > Python for Data Analysis
    • Data Wrangling and Munging
      > Using OpenRefine
    • Toolbox $
      > Getting Started with Storm
      > Fast Data Processing with Spark
  • 43. QUESTIONS AND ANSWERS
    By Tara Laskowski
    Contact me at datasciencepadawan@gmail.com
    Follow me on Twitter: @dspadawan
    Read my blog: http://datasciencepadawan.blogspot.com
