International School of Engineering
awards
Certificate in Engineering Excellence
in Big Data Analytics and Optimization
to
Rama Srikanth Jakkam
on successful completion of all the requirements of the 352-hour program conducted between
November 28, 2015 and May 15, 2016 followed by a project defense.
This program is certified for quality of content, assessment and pedagogy by the Language Technologies Institute (LTI)
of Carnegie Mellon University (CMU). LTI also provided assistance in curriculum development for this program.
Dated this eleventh day of August, two thousand and sixteen.
Dr. Dakshinamurthy V Kolluru, President
Dr. Sridhar Pappu, Executive VP - Academics
01CSE03/201605/598
Program details are on the back
Mode: Classroom Teaching
Certificate Type: Certificate of Participation | Assessment-based training program | Professional certification
Topics Covered
Planning and Thinking Skills for Architecting Data Science Solutions
Why build models or use data to run a business? What kind of models are built? Where do models not work? How do
you make predictions? When does big unstructured data become important?
Thinking tools: Approximations and estimations, Geometric visualization of data and models
Choosing the right models and architecting a solution: Structure and anatomy of models, Problematic data and
choosing the right experimentation
Sources of errors in predictive models and techniques to minimize them
Interacting with technical and business teams; Case study
Essential Engineering Skills in Big Data Analytics
Reading from Excel, CSV and other formats; Data exploration (histograms, bar charts, box plots, line graphs and scatter
plots); Storytelling with data: The science, ggplot, bubble charts with multiple dimensions, gauge charts, tree maps,
heat maps and motion charts
Data pre-processing of structured data: R, Handling missing values, Binning, Standardization, Outliers/Noise, PCA, Type
conversion
Fundamentals of Probability and Statistical Methods
Probabilistic analysis of data and models, Analyzing networks and graphs: Analyzing transitions, Markov chains and
unstructured data
Computing the properties of an attribute: Central tendency (Mean, Median, Mode) and dispersion (Range, Variance,
Standard Deviation); Expectation of a random variable; Describing an attribute: Probability distributions (Discrete and
Continuous) - Bernoulli, Geometric, Binomial, Poisson and Exponential distributions; Special emphasis on Normal
distribution; Central Limit Theorem; t-distribution
Describing the relationship between attributes: Covariance; Correlation; Chi-square
Inferential statistics: How to learn about the population from a sample and vice-versa, Sampling distributions,
Confidence Intervals, Hypothesis Testing; ANOVA
Statistics and Probability in Decision Modeling
Regression (Linear, Multivariate Regression) in forecasting; Analyzing and interpreting regression results;
Logistic Regression for classification
Trend analysis and Time Series; Cyclical and Seasonal analysis; Box-Jenkins method; Smoothing; Moving averages;
Autocorrelation; ARIMA; Holt-Winters method
Bayesian analysis and Naïve Bayes classifier; Bayesian Belief Networks
Optimization and Decision Analysis
Genetic algorithms: The algorithm and the process, Representing data, Why and how do they work?
Linear Programming: Graphical analysis; Sensitivity and Duality analyses
Integer and Binary programming: Applications, Problem formulation, Solving in R
Goal programming; Data envelopment analysis
Quadratic programming
Engineering Big Data with R and Hadoop Ecosystem
Introduction: Big Data, Hadoop applications; Parallel and Distributed computing; Introduction to algorithms;
Concurrent algorithms; Linux refresher; NoSQL; GFS; HDFS; CDH4-HDFS
MapReduce; YARN
MapReduce Applications: Text Mining, PageRank, Graph processing
Hadoop ecosystem components: Pig, Hive, HBase, Sqoop, Mahout, Spark, H2O, Hama, Flume, Chukwa, Avro, Whirr,
Hue, Oozie, ZooKeeper, Kafka
Hadoop Streaming with Python
R-Hadoop
Text Mining, Social Network Analysis and Natural Language Processing
Introduction to text mining and text pre-processing: Write a web crawler to collect data, R, Find unique words and
counts, Handling numbers, Punctuation, Stop words, Incorrect spellings, Stemming, Lemmatization and TxD
computation
Unstructured vs. semi-structured data; Fundamentals of information retrieval
Properties of words; Vector space models; Creating Term-Document (TxD) matrices; Similarity measures
Low-level processes (Sentence Splitting; Tokenization; Part-of-Speech Tagging; Stemming; Chunking)
Text classification and feature selection: How to use Naïve Bayes classifier for text classification
Evaluating the accuracy of text mining systems
Sentiment Analysis
Natural Language Analysis
Discussion of text mining tools and applications
Methods and Algorithms in Machine Learning: Unsupervised and Supervised
Rule based knowledge: Logic of rules, Evaluating rules, Rule induction and Association rules
Construction of Decision Trees through simplified examples; Choosing the "best" attribute at each
non-leaf node; Entropy; Information Gain; Generalizing Decision Trees; Information Content and Gain Ratio; Dealing
with numerical variables; Other measures of randomness; Pruning a Decision Tree; Cost as a consideration; Unwrapping
Trees as rules
Specialized decision trees (oblique trees)
Ensemble and Hybrid models
AdaBoost, Random Forests
K-Nearest Neighbor method; Wilson editing and triangulations; K-nearest neighbors in collaborative filtering, digit
recognition
Motivation for Neural Networks and their applications; Perceptron and Single-Layer Neural Network, with hand
calculations; Learning in a Neural Net: Back propagation and conjugate gradient techniques; Application of Neural Nets
in Face and Digit Recognition
Deep Learning techniques
Connectivity models (hierarchical clustering); Centroid models (K-Means algorithm); Distribution models (Expectation
maximization); Spectral clustering
Linear learning machines and Kernel methods in learning
VC (Vapnik-Chervonenkis) dimension; Shattering power of models
Support Vector Machine (SVM) algorithm
Communication, Ethical and IP Challenges for Analytics Professionals
Why is Communication important?
How to communicate effectively: Telling stories
Communication issues from daily life, with examples from audio, video, blogs, charts, email, etc.
Seeing the big picture; Paying attention to details; Seeing things from multiple perspectives
Challenges: Mix of stakeholders, Explainability of results, Visualization
Guiding Principles: Clarity, Transparency, Integrity, Humility
Framework for Effective Presentations; Examples of bad and good presentations
Writing effective technical reports
Difference between Legal and Ethical issues
Challenges in current laws, regulations and fair information practices: Data protection, Intellectual property rights,
Confidentiality, Contractual liability, Competition law, Licensing of Open Source software and Open Data
How to handle legal, ethical and IP issues at an organization and an individual level
The “Ethics Check” questions
