Big Data Analytics 1
Big Data Analytics
Orientation
Basanta Joshi, PhD
Asst. Prof., Department of Electronics and Computer Engineering
Deputy Director, Center for Applied Research and Development
Member, Laboratory for ICT Research and Development (LICT)
Institute of Engineering
basanta@ioe.edu.np
http://www.basantajoshi.com.np
https://scholar.google.com/citations?user=iocLiGcAAAAJ
https://www.researchgate.net/profile/Basanta_Joshi2
Big Data Analytics 2
About me
Current Affiliation
• Assistant Professor, Department of Electronics and
Computer Engineering, Pulchowk Campus
https://pcampus.edu.np/
• Deputy Director, Center for Applied Research &
Development (CARD) https://card.ioe.edu.np/
• Member, Laboratory for ICT Research and Development
(LICT) http://lict.ioe.edu.np/
Education
• Bachelor of Electronics and Communication
Engineering, IOE, 2005
• MSc in Information and Communication Engineering,
IOE, 2008
• Doctor of Engineering, Osaka Sangyo University, Japan,
2013
Industry Experience
• Senior Software Engineer, D2hawkeye, Nepal
• Research Consultant, LogPoint, Nepal
• System Administration and Web Development, Japan
• IT Consultant in various national and international
projects
Interests and Research Area
• 3D Reconstruction/ Motion Tracking
• Nepali Language Processing
• Video Analytics
• Network Analytics
• Medical Data Analytics
• Edge Analytics
• Use of big data analytics in the above-mentioned areas
Big Data Analytics 3
Fill in the survey form
MSDSA Students BDA Status Survey Form
https://forms.gle/ripvzKHwq8k8NqnA8
Big Data Analytics 4
Big data world then
Big Data Analytics 5
Big data world then
Big Data Analytics 6
Big data world then
The ‘Data Lake’ of Antiquity
www.extentia.com
Big Data Analytics 7
Big data world then
Big Data Analytics by Vikram Neerugatti
Big Data Analytics 8
Big Data Statistics 2020
• Every person will generate 1.7 megabytes of data every second. The time users spend on social media is about 65 minutes.
• Facebook has around 2.7 billion monthly active users and generates 4 petabytes of data per day.
• Facebook stated that 3.3 billion people were using at least one of the company's core products (Facebook, WhatsApp,
Instagram, or Messenger) each month.
• YouTube currently counts 2 billion monthly active users, and 500 hours of content are uploaded to the platform every
minute.
• In 2019, Amazon had 150 million mobile users.
• Twitter users send more than 528,780 tweets every minute.
• Over 2.5 quintillion bytes of data are generated worldwide every day.
• The total amount of data created, captured, copied, and consumed globally reached 64.2 zettabytes in 2020 and is
forecast to grow rapidly over the next five years, to more than 180 zettabytes by 2025.
• By 2021, insight-driven businesses are predicted to take $1.8 trillion annually from their less-informed peers.
• Data-driven organizations are 23 times more likely to acquire customers than their peers.
• Businesses are spending $187 billion on big data and analytics in 2019.
• 91.6% of firms worldwide confirm an increased pace in investment in big data in 2019.
Big Data Analytics 9
Big Data Statistics 2021
• In 2020, every person generated 1.7 megabytes of data every second. The time users spend on social media is about 145 minutes.
• Facebook has around 2.9 billion monthly active users and generates 4 petabytes of data per day.
• Facebook stated that 3.58 billion people were using at least one of the company's core products (Facebook, WhatsApp,
Instagram, or Messenger) each month.
• As of August 2020, YouTube counted 2 billion monthly active users, and 500 hours of content were uploaded to the
platform every minute.
• In 2020, Amazon had 150 million mobile users.
• Twitter users send more than 529,020 tweets every minute.
• Over 2.5 quintillion bytes of data are generated worldwide every day.
• The total amount of data created, captured, copied, and consumed globally reached 64.2 zettabytes in
2020 and is forecast to grow rapidly over the next five years, to more than 180 zettabytes by 2025.
• By 2021, insight-driven businesses are predicted to take $1.8 trillion annually from their less-informed peers.
• Using big data, Netflix saves $1 billion per year on customer retention.
• Worldwide spending on big data and business analytics (BDA) solutions is forecast to reach $215.7 billion this year, an
increase of 10.1% over 2020.
• 97.2% of organizations are investing in big data and AI.
https://techjury.net/blog/big-data-statistics/#gref https://www.statista.com/topics/1464/big-data/
December 2021
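The forecast quoted on this slide (64.2 zettabytes in 2020, more than 180 zettabytes by 2025) implies an annual growth rate that the slide does not state. The following is a minimal Python sketch of that arithmetic, assuming constant compound growth over the five years; the function name `implied_cagr` is hypothetical and chosen only for illustration.

```python
def implied_cagr(start: float, end: float, years: int) -> float:
    """Return the constant annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1


# Figures quoted on the slide: 64.2 ZB in 2020, more than 180 ZB by 2025.
# Assumption: growth compounds at the same rate every year.
rate = implied_cagr(64.2, 180.0, 5)
print(f"Implied annual growth: {rate:.1%}")
```

Under that assumption the global data volume grows by roughly 23% per year, which is a lower bound because the slide says "more than" 180 zettabytes.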
Big Data Analytics 10
Big Data Statistics 2022/23
1. Each day, Google processes 8.5 billion searches. (Source: Oberlo)
2. WhatsApp users exchange up to 65 billion messages daily. (Source: Connectiva Systems)
3. 95% of businesses cite the need to manage unstructured data as a problem for their business. (Source: Statista)
4. 45% of businesses worldwide are running at least one of their Big Data workloads in the cloud. (Source: ZDNet)
5. 80-90% of the data we generate today is unstructured. (Source: CIO)
6. The market for Big Data analytics in banking is set to reach $62.10 billion by 2025. (Source: KR Elixir)
7. Big data in healthcare could be worth $71.6 billion by 2027. (Source: GlobeNewswire)
8. Cyber scams went up 400% at the start of the pandemic. (Source: Reed Smith)
9. Data creation will grow to more than 180 zettabytes by 2025. (Source: Statista)
10. Today it would take a person approximately 181 million years to download all the data from the internet. (Source: Unicorn Insights)
Big Data Analytics 11
Big Data Statistics 2022/23
11. The demand for composite data analytics professionals will grow by 31% by 2030. (Source: Forbes)
12. Internet users spent a total of 1.2 billion years online. (Source: Digital)
13. Social media accounts for 33% of the total time spent online. (Source: Global Web Index)
14. Facebook has almost two billion daily active users. (Source: Datareportal)
15. Tweeps send over 870 million tweets per day. (Source: Internet Live Stats)
16. 97.2% of organizations are investing in big data and AI. (Source: New Vantage)
17. Big data will grow at a 12% CAGR by 2026. (Source: Market Data Forecast)
18. The software sector will bring in the highest revenue by 2027. (Source: Statista)
19. The number of IoT devices could rise to 41.6 billion by 2025. (Source: IDC)
20. Worldwide spending on Big Data analytics solutions will be worth over $274.3 billion in 2022. (Source: Business Wire)
21. The ratio between unique and replicated data will be 1:10 by 2024. (Source: IDC)
22. Data science jobs will increase by around 28% by 2026. (Source: Towards Data Science)
Big Data Analytics 12
Big Data Statistics 2021
https://financesonline.com/big-data-statistics/
December 2021
Big Data Analytics 13
What is Big Data?
• Big Data is the next generation of data warehousing.
There is no single definition; here is one from Wikipedia:
Big data is the term for a collection of data sets so large
and complex that it becomes difficult to process using
on-hand database management tools or traditional data
processing applications.
• The challenges include capture, curation, storage,
search, sharing, transfer, analysis, and visualization.
The trend to larger data sets is due to the additional
information derivable from analysis of a single large set of
related data, as compared to separate smaller sets with
the same total amount of data, allowing correlations to be
found to "spot business trends, determine quality of
research, prevent diseases, link legal citations, combat
crime, and determine real-time roadway traffic
conditions."
Big Data Analytics 14
Describing Data Size
https://twitter.com/paolopisani/
Big Data Analytics 15
Big Data challenges
Big Data Analytics 16
Applications
Big Data Analytics 17
Technologies for Big data
Big Data Analytics 18
Big Data Analytics
Big Data Analytics 19
Big Data Analytics Use Cases
Big Data Analytics 21
Big data Landscape
Big Data Analytics 22
Big data Landscape
Big Data Analytics 23
Big data Landscape
Big Data Analytics 24
Big data Landscape
Big Data Analytics 27
Lambda Architecture
Big Data Analytics 28
Hadoop Ecosystem
Big Data Analytics 29
Stream Processing Architecture
Big Data Analytics 30
Course Agenda
Big Data Analytics 31
Course Objectives
• To give an overview of big data and the latest trends in big data analytics
• To introduce the technologies for handling big data
• To introduce Hadoop and the components of the Hadoop platform
• To perform basic exploration of large, complex datasets and understand
scalable big data analysis
• To apply big data tools to advanced analytics disciplines such as predictive
analytics, data mining, text analytics and statistical analysis
Big Data Analytics 32
Syllabus
• Fundamentals of Big Data Analytics (6 hours)
– Big Data and the V’s of Big Data, Handling and Processing Big Data, The Big Data Landscape, Big
Data Analytics, Examples of real-world big data problems
• Technologies for Handling Big Data (8 hours)
– GFS, HDFS, Google Bigtable, Introduction to Hadoop, Functioning of Hadoop, MapReduce,
RDDs, Cloud Computing for big data
• Understanding Big Data Technology Foundations (12 hours)
– Big data stack, i.e. data source layer, ingestion layer, storage layer, processing layer, security layer,
visualization layer, visualization approaches, etc.
– Architectural design patterns and programming models used for real-world applications, Lambda
Architecture, etc.
Big Data Analytics 33
Syllabus
• Understanding the Big Data Ecosystem (12 hours)
– Hadoop and its ecosystem, Introduction to and Experimentation with Apache Flume, Apache Kafka, Apache
ZooKeeper, Apache Spark, Apache Mesos, Apache Kudu, etc.
– Amazon and Google Cloud Platform, Microsoft Azure, Amazon Kinesis, etc.
• Using Big Data for Analytics (10 hours)
– Basic approaches to querying and exploring big data
– New Databases for Big Data Analytics: Classification, Characteristics and Comparison: Apache HBase, Apache
Hive, Apache Cassandra, etc.
– Descriptive, Diagnostic, Predictive, Prescriptive Analytics, Stream Analytics and Location Analytics
– Case studies for big data analytics
• Machine Learning with Big Data (12 hours)
– Introduction to parallel, distributed and scalable machine learning
– Using processing layer tools (Apache Spark, Apache Mahout) to train, evaluate, and validate basic predictive
models
– Case studies for application of machine learning in big data
Big Data Analytics 34
Teaching Methodology
• Lecture notes will be available on Google Classroom
CLASSROOM CODE: rnsrtvb
• Students should submit assignments, including programming assignments, in Google
Classroom
• Students should upload their presentations to Google Classroom
Big Data Analytics 35
Marking Scheme
S.N. | Course Code | Course Title | Credit | Internal | External | Total
 | | Big Data Analytics | 4 | 40 | 60 | 100
• Students will be evaluated and internal marks will be given
by the course teacher
• Marks distribution (out of 40 internal marks)
– Attendance 20% => 8
– Student Performance (Assignments + Presentation) 20% => 8
– Projects 20% => 8
– Quiz + Assessment 40% => 16
• Students have to appear in the final exam to obtain external
marks
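The internal marks breakdown on this slide is each percentage applied to the 40 internal marks. A minimal Python sketch of that arithmetic follows; the names `INTERNAL_MARKS`, `WEIGHTS_PERCENT`, and `internal_breakdown` are hypothetical, and only the numbers come from the slide.

```python
# Internal marks for the course, as listed in the marking scheme table.
INTERNAL_MARKS = 40

# Percentage weights from the "Marks distribution" list on the slide.
WEIGHTS_PERCENT = {
    "Attendance": 20,
    "Student Performance (Assignments + Presentation)": 20,
    "Projects": 20,
    "Quiz + Assessment": 40,
}


def internal_breakdown(total: int, weights_percent: dict[str, int]) -> dict[str, float]:
    """Split `total` marks across components according to percentage weights."""
    if sum(weights_percent.values()) != 100:
        raise ValueError("weights must add up to 100%")
    return {name: total * pct / 100 for name, pct in weights_percent.items()}


breakdown = internal_breakdown(INTERNAL_MARKS, WEIGHTS_PERCENT)
for name, marks in breakdown.items():
    print(f"{name}: {marks:g}")
```

The components come to 8, 8, 8 and 16 marks, which add up to the 40 internal marks; the remaining 60 marks come from the final exam.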
Big Data Analytics 36
Teaching Methodology
• Lecture notes will be available on Google Classroom
Students have already been invited to Google Classroom
• Students should submit assignments, including a project, in Google Classroom
• Students should upload their presentations to Google Classroom
• Presentations and projects will be done in groups of two
Big Data Analytics 37
References
• Tom White. Hadoop: The Definitive Guide: Storage and Analysis at Internet Scale, Fourth Edition, O'Reilly
Media, 2015.
• Nathan Marz and James Warren. Big Data: Principles and Best Practices of Scalable Realtime Data
Systems, First Edition, Manning Publications, 2015.
• Mark Grover, Ted Malaska, Jonathan Seidman, and Gwen Shapira. Hadoop Application Architectures:
Designing Real-World Big Data Applications, First Edition, O'Reilly Media, 2015.
• Holden Karau, Andy Konwinski, Patrick Wendell, and Matei Zaharia. Learning Spark: Lightning-Fast
Big Data Analysis, First Edition, O'Reilly Media, 2015.
• Nataraj Dasgupta. Practical Big Data Analytics: Hands-on techniques to implement enterprise
analytics and machine learning using Hadoop, Spark, NoSQL and R, Packt Publishing, 2018.
• http://index-of.co.uk/Big-Data-Technologies/
Big Data Analytics 38
Thank you !!!
