Velammal College of Engineering and Technology
(Autonomous)
Department of Information Technology
21IT401
BIG DATA ENGINEERING
Syllabus
UNIT I - INTRODUCTION
Big Data Overview, Evolution of Big Data,
Definition of Big Data, Challenges with Big Data -
State of practice in Analytics,
Key roles for New Big Data Ecosystem,
Data Analytics Lifecycle Overview,
Examples for Big Data Analytics.
21IT401- BIG DATA ENGINEERING UNIT-I 2
Understanding Big Data
• Devices and sensors automatically generate
diagnostic information that needs to be stored and
processed in real time.
• Credit card companies monitor every purchase
their customers make and can identify fraudulent
purchases.
• Mobile phone companies analyze subscribers’
calling patterns to determine, for example,
whether a caller’s frequent contacts are on a rival
network.
Three attributes stand out as defining
Big Data characteristics:
• Huge volume of data
• Complexity of data types and structures
• Speed of new data creation and growth
Definition of Big Data
• Big Data is data whose scale, distribution,
diversity, and/or timeliness require the use of
new technical architectures and analytics to
enable insights that unlock new sources of
business value.
Evolution of Big Data
The evolution of big data has been driven by technological
advancements, increasing data generation, and the growing
recognition of the value of data insights. Here’s a chronological
overview of the key phases and milestones in the evolution of big
data:
1. Early Data Processing (1960s - 1980s)
2. The Rise of the Internet and Digitalization (1990s)
3. Web 2.0 and the Explosion of User-Generated Content (2000s)
4. Advancements in Big Data Technologies (2010s)
5. Integration and Real-Time Analytics (Mid 2010s - Present)
6. Current Trends and Future Directions (2020s and Beyond)
1. Early Data Processing
• 1960s: The advent of databases and data management systems
began with the development of hierarchical and network
databases, such as IBM’s IMS (Information Management
System).
• 1970s: E. F. Codd at IBM introduced the relational model,
leading to the creation of the SQL language and the
development of the first relational database management
systems (RDBMS), such as Oracle.
• 1980s: Data warehousing concepts emerged, allowing
organizations to aggregate data from various sources for
analysis and reporting.
2. The Rise of the Internet and Digitalization (1990s)
• 1990s: The proliferation of the internet and the digitalization
of information led to exponential growth in data generation. E-
commerce, email, and early web applications contributed
significantly to data volumes.
• Data mining techniques were developed to extract patterns and
insights from large datasets.
3. Web 2.0 and the Explosion of User-Generated Content
(2000s)
• Early 2000s: The rise of Web 2.0 technologies, characterized
by user-generated content, social media, and multimedia,
resulted in an explosion of unstructured data.
• 2004: The term "big data" began to gain traction, emphasizing
the challenges associated with managing and processing vast
amounts of diverse and rapidly growing data.
• 2006: The introduction of Hadoop by Doug Cutting and Mike
Cafarella, inspired by Google’s MapReduce and Google File
System (GFS) papers, provided a scalable, distributed
framework for processing large datasets.
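The MapReduce model that inspired Hadoop can be illustrated with a small word-count sketch. This is a single-process illustration in plain Python, not Hadoop's API; the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and chosen only to mirror the three stages. In a real cluster, the map and reduce steps would run in parallel on many machines.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group emitted values by key, as a framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped: Dict[str, List[int]]) -> Dict[str, int]:
    """Combine the values collected for each key; here, sum the counts."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle, and reduce in sequence over all documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))


if __name__ == "__main__":
    docs = ["big data needs big storage", "data drives insight"]
    print(word_count(docs))
```

Because each map call sees only one record and each reduce call sees only one key, the work can be divided across machines without shared state, which is the property that makes the model scale.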
4. Advancements in Big Data Technologies (2010s)
• 2010s: Significant advancements in big data technologies and
frameworks emerged, including Apache Spark, whose in-memory
processing made many workloads faster than Hadoop MapReduce.
• NoSQL databases, such as MongoDB, Cassandra, and HBase,
gained wide adoption for handling unstructured and
semi-structured data.
• The cloud computing revolution provided scalable and cost-
effective storage and processing solutions, with services like
Amazon Web Services (AWS), Microsoft Azure, and Google
Cloud Platform becoming popular.
• Machine learning and AI technologies advanced, enabling
more sophisticated data analytics and predictive modeling.
5. Integration and Real-Time Analytics
(Mid 2010s - Present)
• The focus shifted to integrating big data with traditional
enterprise data systems, leading to the rise of hybrid data
architectures.
• Real-time data processing and analytics became critical, with
technologies like Apache Kafka enabling real-time data
streaming and processing.
• Data lakes emerged as a way to store vast amounts of raw data
in its native format until needed for analysis.
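To make the idea of real-time stream processing concrete, here is a minimal sketch of a tumbling-window count over an event stream. It uses only the Python standard library and is not Kafka's API; the event shape (timestamp, key) and the function name tumbling_window_counts are assumptions made for illustration. It also assumes events arrive in timestamp order, which real streams do not guarantee.

```python
from collections import Counter
from typing import Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical event shape: (timestamp in seconds, event key)
Event = Tuple[float, str]


def tumbling_window_counts(
    events: Iterable[Event], window_seconds: float
) -> Iterator[Tuple[float, Dict[str, int]]]:
    """Yield (window_start, counts) each time a fixed-size window closes.

    Windows that contain no events are skipped.
    """
    current_start: Optional[float] = None
    counts: Counter = Counter()
    for timestamp, key in events:
        # Align the timestamp to the start of its window.
        window_start = (timestamp // window_seconds) * window_seconds
        if current_start is None:
            current_start = window_start
        if window_start != current_start:
            # The previous window is complete: emit it and start a new one.
            yield current_start, dict(counts)
            current_start = window_start
            counts = Counter()
        counts[key] += 1
    if current_start is not None:
        # Emit the final, possibly partial, window.
        yield current_start, dict(counts)


if __name__ == "__main__":
    stream = [(0.5, "click"), (1.2, "view"), (4.9, "click"), (5.1, "click"), (12.0, "view")]
    for start, window in tumbling_window_counts(stream, window_seconds=5.0):
        print(start, window)
```

Results are produced as each window closes rather than after all data has been collected, which is the main difference from batch processing.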
6. Current Trends and Future Directions (2020s and
Beyond)
• 2020s: The convergence of big data with AI and machine learning
has led to more intelligent and automated data analytics.
• Edge computing is gaining traction, processing data closer to where
it is generated to reduce latency and bandwidth usage.
• Privacy, security, and ethical considerations have become
paramount, driven by regulations like GDPR and CCPA.
• The rise of DataOps (Data Operations) emphasizes the need for
collaboration and automation in data management processes to
improve the quality and speed of data analytics.
• Quantum computing, though in its early stages, holds the potential
to revolutionize data processing capabilities in the future.
Challenges with Big Data
Big data refers to the vast volumes of structured and
unstructured data generated at high velocity from a wide
variety of sources. Working with it raises the following
challenges:
1. Volume
2. Velocity
3. Variety
4. Veracity
5. Complexity
6. Scalability
7. Storage
8. Data Governance
9. Security
10. Data Integration
11. Data Analysis
12. Talent Gap
13. Cost
Strategies to Address Big Data Challenges:
• Invest in scalable and flexible infrastructure: Use cloud-
based solutions to handle data storage and processing needs.
• Employ robust data governance frameworks: Implement
policies and practices to ensure data quality, security, and
compliance.
• Use advanced analytics tools: Leverage machine learning
and AI to derive actionable insights from big data.
• Foster talent development: Invest in training and
development programs to build a skilled workforce.
• Adopt data integration platforms: Utilize ETL (Extract,
Transform, Load) tools to streamline data integration from
diverse sources.
State of practice in Analytics
Current business problems provide many opportunities
for organizations to become more analytical and data driven.
• BI Versus Data Science
• Current Analytical Architecture
• Drivers of Big Data
• Emerging Big Data Ecosystem and a New Approach to
Analytics
BI Versus Data Science
Current Analytical Architecture
Drivers of Big Data
The data now comes from multiple sources, such as these:
● Medical information, such as genomic sequencing and diagnostic imaging
● Photos and video footage uploaded to the World Wide Web
● Video surveillance, such as the thousands of video cameras spread across a
city
● Mobile devices, which provide geospatial location data of the users, as well
as metadata about text messages, phone calls, and application usage on
smart phones
● Smart devices, which provide sensor-based collection of information from
smart electric grids, smart buildings, and many other public and industry
infrastructures
● Nontraditional IT devices, including the use of radio-frequency
identification (RFID) readers, GPS navigation systems, and seismic
processing
Emerging Big Data Ecosystem and a New Approach to
Analytics
Key roles for New Big Data Ecosystem
Deep Analytical Talent
• This group is technically savvy, with strong analytical
skills.
• Members possess a combination of skills to handle
raw, unstructured data and to apply complex analytical
techniques at massive scales.
• This group has advanced training in quantitative
disciplines, such as mathematics, statistics, and
machine learning.
• Examples of current professions fitting into this group
include statisticians, economists, mathematicians, and
the new role of the Data Scientist.
Data Savvy Professionals
• This group has less technical depth but has a basic
knowledge of statistics or machine learning and can define
key questions that can be answered using advanced
analytics.
• These people tend to have a base knowledge of
working with data, or an appreciation for some of the
work being performed by data scientists and others
with deep analytical talent.
• Examples of data savvy professionals include financial
analysts, market research analysts, life scientists,
operations managers, and business and functional
managers.
Technology and Data Enablers
• This group represents people providing
technical expertise to support analytical
projects.
• This role requires skills related to computer
engineering, programming, and database
administration.
Profile of a Data Scientist
Data Analytics Lifecycle Overview:
• The Data Analytics Lifecycle defines analytics
process best practices spanning discovery to
project completion.
• Phase 1—Discovery
• Phase 2—Data preparation:
• Phase 3—Model planning:
• Phase 4—Model building:
• Phase 5—Communicate results:
• Phase 6—Operationalize:
Phase 1—Discovery: In Phase 1, the team learns
the business domain, including relevant
history such as whether the organization or
business unit has attempted similar projects in
the past from which they can learn. The team
assesses the resources available to support
the project in terms of people, technology,
time, and data.
Phase 2—Data preparation: Phase 2 requires
the presence of an analytic sandbox, in which
the team can work with data and perform
analytics for the duration of the project. The
team needs to execute extract, load, and
transform (ELT) or extract, transform and load
(ETL) to get data into the sandbox. Combining
the two approaches is sometimes called ETLT.
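The difference between ETL and ELT is mainly where the transformation happens. The sketch below contrasts the two using Python's built-in sqlite3 module, with an in-memory database standing in for the analytic sandbox. The sample rows, table names (sales_etl, sales_raw, sales_elt), and cleaning rules are hypothetical and only illustrate the ordering of the steps.

```python
import sqlite3

# Hypothetical raw records: (customer, amount as text, currency)
RAW_ROWS = [
    ("alice", " 120.50", "usd"),
    ("bob", "80", "USD"),
    ("carol", "n/a", "usd"),  # amount cannot be parsed
]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: clean in application code, then load only the transformed rows."""
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL, currency TEXT)")
    cleaned = []
    for customer, amount, currency in RAW_ROWS:
        try:
            cleaned.append((customer, float(amount), currency.upper()))
        except ValueError:
            continue  # drop rows whose amount is not numeric
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?, ?)", cleaned)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load raw rows unchanged, then transform inside the sandbox with SQL."""
    conn.execute("CREATE TABLE sales_raw (customer TEXT, amount TEXT, currency TEXT)")
    conn.executemany("INSERT INTO sales_raw VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT customer,
               CAST(TRIM(amount) AS REAL) AS amount,
               UPPER(currency) AS currency
        FROM sales_raw
        WHERE TRIM(amount) GLOB '[0-9]*'
        """
    )


if __name__ == "__main__":
    sandbox = sqlite3.connect(":memory:")
    etl(sandbox)
    elt(sandbox)
    print(sandbox.execute("SELECT * FROM sales_etl ORDER BY customer").fetchall())
    print(sandbox.execute("SELECT * FROM sales_elt ORDER BY customer").fetchall())
```

Both paths end with the same cleaned table, but the ELT path also keeps sales_raw in the sandbox, so the team can revisit the raw data if the cleaning rules change.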
Phase 3—Model planning: Phase 3 is model
planning, where the team determines the
methods, techniques, and workflow it intends
to follow for the subsequent model building
phase. The team explores the data to learn
about the relationships between variables and
subsequently selects key variables and the
most suitable models.
Phase 4—Model building: In Phase 4, the team
develops datasets for testing, training, and
production purposes. In addition, in this phase
the team builds and executes models based
on the work done in the model planning
phase.
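Developing separate datasets for training and testing can be sketched as a seeded random split. This is a minimal standard-library illustration, not a prescribed method; the function name train_test_split, the default 20% test fraction, and the seed value are assumptions chosen for the example.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def train_test_split(
    records: Sequence[T], test_fraction: float = 0.2, seed: int = 42
) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the records and split it into (train, test) lists."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be between 0 and 1")
    shuffled = list(records)
    # A dedicated, seeded generator keeps the split reproducible.
    random.Random(seed).shuffle(shuffled)
    test_size = max(1, round(len(shuffled) * test_fraction))
    return shuffled[test_size:], shuffled[:test_size]


if __name__ == "__main__":
    train, test = train_test_split(list(range(10)))
    print(len(train), len(test))
```

Holding out a test set that the model never sees during training gives the team a fairer estimate of how the model will behave on new data.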
Phase 5—Communicate results: In Phase 5, the
team, in collaboration with major
stakeholders, determines if the results of the
project are a success or a failure based on the
criteria developed in Phase 1.
Phase 6—Operationalize: In Phase 6, the team
delivers final reports, briefings, code, and
technical documents. In addition, the team
may run a pilot project to implement the
models in a production environment.
Examples for Big Data Analytics
• Retail and E-commerce
– Customer behavior analysis
– Recommendation systems
– Inventory and supply chain optimization
• Banking and Finance
– Fraud detection
– Risk analytics and credit scoring
– Algorithmic trading
• Healthcare
– Patient diagnostics and treatment predictions
– Real-time monitoring using IoT
– Disease outbreak forecasting
• Telecommunications
– Churn prediction
– Network optimization
– Customer segmentation
• Manufacturing
– Predictive maintenance
– Process optimization
– Quality control using sensor data
• Government
– Crime pattern analysis
– Traffic and transportation analytics
– Smart city planning
• Energy
– Smart grid data analysis
– Forecasting energy demand
– Equipment failure prediction
• Social Media
– Sentiment analysis
– Trend analysis
– Targeted advertising

