Roadmap to Data
Science
by
SWAPNIL NARAYAN
Microsoft | IIT | Hacker Cup
About the
Instructor
Hey there,
I’m Swapnil Narayan, a graduate of
IIT (ISM) Dhanbad with a major in Computer
Science.
I’m a Software Engineer at Microsoft
India, and have also received offers from
Amazon and Oracle for Software
Engineering roles.
I’m a passionate Competitive
Programming Instructor with decent
experience teaching it at
various popular edtech platforms, and
I have taken sessions with IITs, NITs, and
other engineering colleges.
I will be your mentor for this session and
will walk you through the topics in the
following slides.
What is Data Science?
O Data Science is a multidisciplinary subject that uses
mathematics, statistics, and computer science to study and
evaluate data. The key objective of Data Science is to extract
valuable information for use in strategic decision making,
product development, trend analysis, and forecasting.
O A data scientist is a sort of 'jack-of-all-trades' for data crunching.
The 3 main skills a data scientist needs are
mathematics/statistics, computer programming literacy, and
knowledge of the particular business domain.
Data Science is a Broader Field
Comparison between Different Roles
in 2018
How to Become a Data Scientist?
Math
Programming
Languages
Data Wrangling and
Management
Data Analysis and Visualization
Machine Learning
Deep Learning
Mathematics
O Linear Algebra: Matrices, Eigenvalues and Eigenvectors, Tensors, etc.
O Calculus: Differentiation and Integration.
O Probability: Bayes' Theorem, Optimization, etc.
O Statistics: Inferential Statistics, Descriptive Statistics, Chi-squared
Tests, Random Variables, Gaussian (Normal)
Distribution.
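As a small illustration of the probability topics above, here is a minimal sketch of Bayes' Theorem applied to a diagnostic test. The function name `posterior` and all the numbers (1% prevalence, 99% sensitivity, 5% false-positive rate) are hypothetical and chosen only for the example.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(condition | positive test) using Bayes' Theorem."""
    # Total probability of a positive result: true positives + false positives.
    p_positive = sensitivity * prior + false_positive_rate * (1.0 - prior)
    return sensitivity * prior / p_positive


# Hypothetical inputs, not taken from any real test.
p = posterior(prior=0.01, sensitivity=0.99, false_positive_rate=0.05)
print(f"P(condition | positive) = {p:.3f}")
```

With these assumed numbers the posterior is only about 1 in 6, which is why a positive result on a rare condition is often a false positive.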
Programming Languages
O Python: It is the Bible.
→ Easy to understand, i.e., reads like plain English
→ No semicolons
→ Simple, with tons of libraries available
O Talk about Packages
→ Data visualization packages such as ggplot2 (R) and tidy are extremely important
libraries
Data Wrangling and Management
O Data Mining
O Data Cleaning
O Data Management
Relevant Skills:
→ MySQL: RDBMS
→ NoSQL: MongoDB, Cassandra, etc.
JOIN
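To make the JOIN idea concrete, here is a minimal sketch using Python's built-in `sqlite3` module as a stand-in for MySQL (the SQL shown is standard enough to carry over). The `customers` and `orders` tables and their rows are invented for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ravi'), (3, 'Meera');
INSERT INTO orders VALUES (10, 1, 250.0), (11, 1, 100.0), (12, 2, 75.0);
""")

# INNER JOIN: only customers that have at least one order.
inner_rows = conn.execute("""
SELECT c.name, o.amount
FROM customers c
JOIN orders o ON o.customer_id = c.id
ORDER BY o.id
""").fetchall()

# LEFT JOIN: every customer, with an order count of 0 when there is no match.
left_rows = conn.execute("""
SELECT c.name, COUNT(o.id)
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.id
ORDER BY c.id
""").fetchall()

print(inner_rows)
print(left_rows)
```

The inner join drops the customer without orders, while the left join keeps that customer with a count of zero.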
Data Analysis and Visualization
O Plotting libraries in programming languages, e.g.,
• plotly, matplotlib, seaborn → Python
• ggplot2 → R
• Tableau and Power BI are booming now.
[Pandas and Numpy for Data Analysis]
Machine Learning and Deep Learning
O Domain Knowledge???
HEALTHCARE, BUSINESS, FINANCE, SPORTS etc.
Supervised Unsupervised Reinforcement
Machine Learning Algorithms
O Topics: Regression, Decision Tree, Random Forest, Naïve
Bayes, Ensemble Learning, AdaBoost, Hierarchical
Clustering, Association, k-means Clustering, SVM, KNN,
Gradient Descent, Cross Validation, Entropy, Accuracy,
Precision, Collaborative Filtering, PCA, Markov model,
Boltzmann theorem etc.
Testing Evaluation and Validation of Models
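Two of the topics listed above, Regression and Gradient Descent, can be sketched together in a few lines of plain Python. This is an illustrative toy, not a production implementation: the function name `fit_line`, the learning rate, the epoch count, and the data points (which lie exactly on y = 2x + 1) are all assumptions made for the example.

```python
def fit_line(xs, ys, lr=0.05, epochs=2000):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        # Step against the gradient.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Toy data generated from y = 2x + 1 (no noise).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
w, b = fit_line(xs, ys)
print(f"w = {w:.4f}, b = {b:.4f}")
```

In practice a library such as scikit-learn would be used instead, but writing the update rule by hand shows what the optimizer is doing.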
Deep Learning Algorithms
O Neural Networks, Feed Forward NN, Fuzzy Logic,
Sequence Model, LSTM, RNN, CNN, CapsNet, Time Series
etc.
Big Data
O Map Reduce
O Hadoop
O Apache
O Spark
O Hive
O Pig
O Mahout
O Yarn
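The MapReduce model at the top of this list can be sketched on a single machine in plain Python. This is only a conceptual illustration of the map, shuffle, and reduce steps; it does not use Hadoop or Spark, and the function names and sample documents are made up for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}


documents = ["big data big ideas", "data science"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

In a real cluster the map and reduce calls run in parallel on different machines, and the framework handles the shuffle.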
Additional Skills
NLP CV
Learning Outcomes
O Build artificial neural networks with TensorFlow and Keras
O Build Deep Learning networks to classify images with
Convolutional Neural Networks
O Implement machine learning, clustering, and search using TF-IDF
at massive scale with Apache Spark's MLlib
O Implement Sentiment Analysis with Recurrent Neural Networks
O Understand reinforcement learning - and how to build a Pac-Man
bot
O Make predictions using linear regression, polynomial
regression, and multivariate regression
O Classify medical test results with a wide variety of
supervised machine learning classification techniques
O Cluster data using K-Means clustering and classify it using Support Vector
Machines (SVM)
O Build a spam classifier using Naive Bayes
O Use decision trees to predict hiring decisions
O Apply dimensionality reduction with Principal Component
Analysis (PCA) to classify flowers
O Predict classifications using K-Nearest-Neighbor (KNN)
O Develop using IPython notebooks
O Understand statistical measures such as standard deviation
O Visualize data distributions, probability mass functions, and
probability density functions
O Visualize data with matplotlib
O Use covariance and correlation metrics
O Apply conditional probability for finding correlated
features
O Use Bayes' Theorem to identify false positives
O Understand complex multi-level models
O Use train/test and K-Fold cross validation to choose the
right model
O Build a movie recommender system using item-based and
user-based collaborative filtering
O Clean your input data to remove outliers
O Design and evaluate A/B tests using T-Tests and P-Values
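One of the outcomes above, K-Fold cross validation, comes down to splitting the sample indices into k folds and holding each fold out once. The sketch below shows only that splitting step in plain Python; the helper name `k_fold_indices` is hypothetical, and the sketch does no shuffling, which a real workflow usually adds.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k consecutive folds."""
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


folds = list(k_fold_indices(n_samples=10, k=3))
for train, test in folds:
    print("train:", train, "test:", test)
```

Each sample lands in the test set exactly once, so averaging a model's score over the folds gives a more stable estimate than a single train/test split.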
Best Blogs and Open Source
Community
O Medium AI Community
O Official Documentation
O GitHub and Stack Overflow
O Kaggle: Spend 5 hours a day here
O Cheat Sheets from Amazon AWS
Best Books
For Machine/ Deep Learning Data Science
Beginners
Book
Statistics
Overview of Data Science Tools and
Packages
Thank You
