Course – Big Data Analytics (Professional Elective-II)
Course code-IT314B
Unit-II- BIG DATA ANALYTICS LIFE CYCLE
Sanjivani Rural Education Society’s
Sanjivani College of Engineering, Kopargaon-423603
(An Autonomous Institute Affiliated to Savitribai Phule Pune University, Pune)
NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified
Department of Information Technology
(NBA Accredited)
Mr. Rajendra N Kankrale
Asst. Prof.
BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT
Unit-I BIG DATA ANALYTICS LIFE CYCLE
• Syllabus
• Introduction to Big Data, sources of Big Data, Data Analytic Lifecycle:
Introduction, Phase 1: Discovery, Phase 2: Data Preparation, Phase 3: Model
Planning, Phase 4: Model Building, Phase 5: Communication results, Phase 6:
Operationalize.
Unit-I BIG DATA ANALYTICS LIFE CYCLE
1. Why Big Data analytics?
2. What is Big Data analytics?
3. Lifecycle of Big Data analytics
4. Types of Big Data analytics
5. Tools used in Big Data analytics
6. Big Data application domains
Why Big Data analytics?
Take the music streaming platform Spotify as an example. The company has nearly 96 million users (a figure that has since grown considerably), and they generate a tremendous amount of data every day. From this information, the cloud-based platform automatically suggests songs through a smart recommendation engine, based on likes, shares, search history, and more. The techniques, tools, and frameworks that make this possible are the result of Big Data analytics.
If you are a Spotify user, you have probably come across the top recommendations section, which is based on your likes, listening history, and other signals. It works by using a recommendation engine built on data-filtering tools that collect data and then filter it using algorithms. This is what Spotify does.
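The filtering idea above can be sketched as user-based collaborative filtering, one common recommendation technique: songs are suggested to a user if similar users liked them. This is a minimal illustration under stated assumptions; the listening data, user names, and scoring rule are hypothetical, and it is not Spotify's actual algorithm.

```python
from math import sqrt

# Hypothetical listening data: 1 = the user liked the song, 0 = did not.
LIKES = {
    "asha": {"song_a": 1, "song_b": 1, "song_c": 0, "song_d": 0},
    "ravi": {"song_a": 1, "song_b": 1, "song_c": 1, "song_d": 0},
    "meera": {"song_a": 0, "song_b": 0, "song_c": 1, "song_d": 1},
}


def cosine(u, v):
    """Cosine similarity between two users' like vectors."""
    songs = u.keys() & v.keys()
    dot = sum(u[s] * v[s] for s in songs)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def recommend(user, likes, top_n=2):
    """Score songs the user has not liked by the similarity of the users who liked them."""
    scores = {}
    for other, their_likes in likes.items():
        if other == user:
            continue
        sim = cosine(likes[user], their_likes)
        for song, liked in their_likes.items():
            if liked and not likes[user].get(song):
                scores[song] = scores.get(song, 0.0) + sim
    # Highest score first; ties broken alphabetically so results are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [song for song, score in ranked[:top_n] if score > 0]


print(recommend("asha", LIKES))  # → ['song_c']
```

Here `asha` shares likes with `ravi`, so `ravi`'s extra song is recommended, while `meera` (no overlap) contributes nothing.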
Uses and Examples of Big Data Analytics
There are many ways Big Data analytics can be used to improve businesses and organizations. Here are some examples:
• Using analytics to understand customer behavior and optimize the customer experience
• Predicting future trends to make better business decisions
• Improving marketing campaigns by understanding what works and what doesn't
• Increasing operational efficiency by finding bottlenecks and working out how to fix them
• Detecting fraud and other forms of misuse sooner
These are just a few examples; the possibilities are wide-ranging. It all depends on how an organization wants to use Big Data analytics to improve its business.
What is Big Data analytics?
• What is Big Data?
• Big Data refers to data sets so large or complex that they cannot be stored, processed, or analyzed using traditional tools.
• Big Data analytics is the process of extracting meaningful insights from such data, such as hidden patterns, unknown correlations, market trends, and customer preferences. It offers various advantages: for example, it supports better decision-making and helps prevent fraudulent activities.
Big Data sources
• Today, millions of data sources around the world generate data at a very rapid rate. Some of the largest sources are social media platforms and networks. Facebook, for example, generates more than 500 terabytes of data every day, including pictures, videos, messages, and more.
• Data also exists in different formats: structured, semi-structured, and unstructured. For example, data in a regular Excel sheet is structured, with a definite format. In contrast, emails are semi-structured, and pictures and videos are unstructured. All of this data combined makes up Big Data.
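A small sketch of how the three formats differ when a program reads them. The sample records are hypothetical, and the email is represented as JSON purely for illustration (real email messages use the MIME format).

```python
import csv
import io
import json

# Structured: fixed columns, like a spreadsheet (hypothetical orders).
structured = "order_id,amount\n101,250\n102,90\n"
rows = list(csv.DictReader(io.StringIO(structured)))
total = sum(int(r["amount"]) for r in rows)

# Semi-structured: self-describing keys, but fields can vary between records.
semi_structured = '{"from": "a@example.com", "subject": "Invoice", "tags": ["billing"]}'
email = json.loads(semi_structured)

# Unstructured: free text with no schema; any structure must be inferred.
unstructured = "Loved the new playlist, but the app crashed twice today."
words = unstructured.lower().split()

print(total, email["subject"], len(words))  # → 340 Invoice 10
```

Only the structured data can be aggregated directly; the other two need parsing or interpretation first.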
Types of Big Data analytics
The following are the four types of Big Data analytics:
1. Prescriptive Analytics (What is the solution?)
2. Diagnostic Analytics (Why did it happen?)
3. Predictive Analytics (What will happen?)
4. Descriptive Analytics (What has happened?)
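The four types can be contrasted on one toy data set. This is a minimal sketch with hypothetical monthly sales figures; the diagnostic, predictive, and prescriptive steps use deliberately simple stand-ins (the largest month-over-month drop, a least-squares trend line, and a fixed reorder rule), not production techniques.

```python
from statistics import mean

# Hypothetical monthly sales (units) for one product, oldest first.
sales = [100, 110, 120, 90, 140]

# Descriptive: what has happened? Summarize the history.
average = mean(sales)

# Diagnostic: why did it happen? Start by locating the sharpest month-over-month drop.
changes = [later - earlier for earlier, later in zip(sales, sales[1:])]
drop_month = changes.index(min(changes)) + 1  # index in `sales` where the drop shows up

# Predictive: what will happen? Fit a least-squares trend line and extrapolate one month.
xs = list(range(len(sales)))
x_bar, y_bar = mean(xs), mean(sales)
slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, sales)) / sum(
    (x - x_bar) ** 2 for x in xs
)
forecast = y_bar + slope * (len(sales) - x_bar)

# Prescriptive: what should be done? Turn the forecast into an action with a simple rule.
stock_on_hand = 100  # hypothetical inventory level
action = "reorder" if forecast > stock_on_hand else "hold"

print(average, drop_month, forecast, action)  # → 112 3 130.0 reorder
```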
Tools used in Big Data analytics
• Apache Spark: Spark is a distributed processing framework for batch and near-real-time (streaming) data analytics. It is commonly used with the Hadoop ecosystem, for example running on YARN and reading data from HDFS.
• Python: Python is a versatile programming language that is widely used for data analysis, machine learning, and many other applications.
• SAS: SAS is an advanced analytics tool used to work with large volumes of data and derive valuable insights from it.
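Hadoop MapReduce and Spark both spread a map, shuffle, and reduce pattern across many machines. The sketch below runs that pattern in a single Python process on two hypothetical documents, purely to show the data flow; it does not use either framework's API.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical input documents.
documents = ["big data needs big tools", "spark and hadoop process big data"]


def map_phase(doc):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word, 1) for word in doc.split()]


def shuffle_phase(pairs):
    """Shuffle: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's values into a single result."""
    return {key: sum(values) for key, values in groups.items()}


mapped = chain.from_iterable(map(map_phase, documents))
counts = reduce_phase(shuffle_phase(mapped))
print(counts["big"], counts["data"])  # → 3 2
```

In PySpark, the same job is commonly written with the RDD methods `flatMap`, `map`, and `reduceByKey`, with the framework handling distribution and the shuffle.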
Tools used in Big Data analytics
• Hadoop: Hadoop is one of the most widely used big data frameworks, deployed by organizations around the world to store and process big data across clusters of computers.
• SQL: SQL is the standard language for querying and managing data in relational database management systems.
• Tableau: Tableau is one of the most popular business intelligence tools, used for data visualization and business analytics.
Tools used in Big Data analytics
• Splunk: Splunk is a widely used tool for searching and parsing machine-generated data (such as logs) and deriving business insights from it.
• R: R is a programming language widely used by data scientists for statistical computing and graphics.
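Parsing machine-generated data usually starts by extracting fields from raw log lines. A minimal sketch in plain Python, assuming a hypothetical, simplified web-server log format; it does not use Splunk or its search language.

```python
import re
from collections import Counter

# Hypothetical web-server log lines (machine-generated data), including one bad line.
LOG_LINES = [
    '10.0.0.1 - - [12/Mar/2024:10:00:01] "GET /home HTTP/1.1" 200',
    '10.0.0.2 - - [12/Mar/2024:10:00:02] "GET /cart HTTP/1.1" 500',
    '10.0.0.1 - - [12/Mar/2024:10:00:05] "POST /pay HTTP/1.1" 500',
    '10.0.0.3 - - [12/Mar/2024:10:00:09] "POST /pay HTTP/1.1" 503',
    "malformed line",
]

# Extracts the client IP, HTTP method, request path, and status code.
PATTERN = re.compile(
    r'^(?P<ip>\S+) .*?"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3})$'
)


def parse(lines):
    """Yield a dict of fields for each line that matches; skip lines that do not."""
    for line in lines:
        match = PATTERN.match(line)
        if match:
            yield match.groupdict()


records = list(parse(LOG_LINES))
# Count server errors (5xx status codes) per request path.
errors = Counter(r["path"] for r in records if r["status"].startswith("5"))
print(len(records), errors.most_common(1))  # → 4 [('/pay', 2)]
```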
Tools used in Big Data analytics
Cassandra
Apache Cassandra is an open-source, distributed NoSQL database used to store and retrieve large amounts of data. It is one of the most popular tools for data analytics and is valued by many tech companies for its high scalability and availability without compromising speed and performance. It can handle thousands of operations per second and petabytes of data with almost zero downtime. It was developed at Facebook and released as open source in 2008.
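Cassandra spreads rows across nodes by hashing each row's partition key onto a token ring and storing copies on the next nodes clockwise. The sketch below is a simplified model of that idea, not Cassandra's code: the node names and replication factor are hypothetical, MD5 is used only as a stable demo hash, and real Cassandra clusters default to the Murmur3 partitioner with virtual nodes.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a key to a position on the ring (MD5 is used here only as a stable demo hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class TokenRing:
    """Simplified ring: one token per node, replicas placed on successive nodes clockwise."""

    def __init__(self, nodes, replication_factor=2):
        if replication_factor > len(nodes):
            raise ValueError("replication factor cannot exceed the number of nodes")
        self.rf = replication_factor
        self.ring = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.ring]

    def replicas(self, partition_key: str):
        """Return the nodes that would store rows with this partition key."""
        # First node whose token is greater than the key's token; wrap around at the end.
        start = bisect.bisect_right(self.tokens, token(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]


# Hypothetical three-node cluster keeping two copies of each row.
ring = TokenRing(["node1", "node2", "node3"], replication_factor=2)
owners = ring.replicas("user:42")
print(owners)
```

Because placement depends only on the hash, any node can compute where a row lives without a central lookup, which is part of what lets the design scale horizontally.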
Tools used in Big Data analytics
Apache Storm
Apache Storm is a robust, user-friendly tool for data analytics that is also used by small companies. A key strength of Storm is that it is language-agnostic: it can be used with any programming language. It was designed to process large volumes of data in a fault-tolerant, horizontally scalable way. For real-time data processing, Storm is a leading choice because it is a distributed real-time big data processing system, which is why many tech companies use Apache Storm today. Notable users include Twitter, Zendesk, and NaviSite.
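Storm applications are topologies in which spouts emit streams of tuples and bolts process them. A minimal single-process sketch of that flow using Python generators; the function names and sentences are hypothetical, and this is not the Storm API, which runs spouts and bolts in parallel across a cluster.

```python
from collections import Counter


def sentence_spout():
    """Spout: the source of the stream; a fixed list stands in for live events."""
    for sentence in ["storm processes streams", "streams of tuples", "storm scales out"]:
        yield sentence


def split_bolt(stream):
    """Bolt: transforms each incoming tuple; here, splits sentences into words."""
    for sentence in stream:
        for word in sentence.split():
            yield word


def count_bolt(stream):
    """Bolt: keeps running counts and emits (word, count) as each word arrives."""
    counts = Counter()
    for word in stream:
        counts[word] += 1
        yield word, counts[word]


# Wire the components together like a topology: spout -> split -> count.
updates = list(count_bolt(split_bolt(sentence_spout())))
print(updates[-1])  # → ('out', 1)
```

Results are emitted incrementally as each tuple arrives rather than after all data is collected, which is the essence of real-time stream processing.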
Applications of Big Data Analytics
• Customer Acquisition and Retention: Customer data helps businesses spot marketing trends and take data-driven actions that increase customer satisfaction. For example, the personalization engines of Netflix, Amazon, and Spotify improve customer experiences and build customer loyalty.
• Targeted Ads: Personalized data about interaction patterns, order history, and product-page viewing history helps create targeted ad campaigns, both at scale and for individual customers.
Applications of Big Data Analytics
• Product Development: Analytics generates insights on development decisions, product viability, and performance measurements, and directs improvements that better serve customers.
• Price Optimization: Retailers can build pricing models from diverse data sources and use them to maximize revenue.
• Supply Chain and Channel Analytics: Predictive analytical models help with B2B supplier networks, preemptive replenishment, route optimization, inventory management, and notification of potential delivery delays.
• Risk Management: Data patterns help identify new risks so that effective risk management strategies can be developed.
• Improved Decision-making: Insights extracted from data help enterprises make sound and quick decisions.
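Price optimization can be illustrated with a toy demand model. Assuming a hypothetical linear demand curve q(p) = 1000 - 20p (in practice it would be estimated from sales data), revenue p * q(p) peaks at p = 1000 / (2 * 20) = 25; the sketch below finds the same answer by searching candidate prices.

```python
def demand(price):
    """Hypothetical linear demand curve: units sold fall as the price rises."""
    return max(0.0, 1000 - 20 * price)


def best_price(candidates):
    """Return the candidate price with the highest revenue (price x units sold)."""
    return max(candidates, key=lambda p: p * demand(p))


prices = [p / 2 for p in range(0, 101)]  # 0.0, 0.5, ..., 50.0
p_star = best_price(prices)
print(p_star, p_star * demand(p_star))  # → 25.0 12500.0
```

A search over candidates is used instead of the closed-form answer because real demand curves estimated from data rarely have a neat formula.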
Examples/Areas Using Big Data Analytics Tools
• Healthcare: Big data analytics technologies and tools are being used in healthcare
to predict patient outcomes, identify at-risk patients, and improve population health.
• Retail: Big data analytics tools are being used by retailers to improve customer
experience, target marketing campaigns, and prevent fraud.
• Manufacturing: Big data analytics tools are being used in manufacturing to
improve quality control, reduce downtime, and optimize production processes.
• Banking: Real-time big data analytics tools are being used by banks to detect
fraudulent activities, prevent money laundering, and improve customer service.
• Government: Big data analytics tools are being used by government agencies to
improve public services, combat fraud and corruption, and better understand citizen
needs.
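A minimal sketch of one simple fraud-screening baseline: flag transactions whose amount lies far from a customer's usual spending, measured in standard deviations (a z-score). The transaction history and thresholds are hypothetical; real banking systems combine many more signals and models.

```python
from statistics import mean, stdev


def flag_outliers(amounts, threshold=3.0):
    """Return indices of amounts more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(amounts), stdev(amounts)
    if sigma == 0:
        return []  # all amounts identical: nothing stands out
    return [i for i, a in enumerate(amounts) if abs(a - mu) / sigma > threshold]


# Hypothetical card transactions for one customer; the last one is unusual.
history = [120, 95, 130, 110, 105, 98, 115, 5000]
print(flag_outliers(history, threshold=2.0))  # → [7]
```

With only a few data points, a single extreme value inflates the standard deviation itself (here the large transaction scores about 2.5, so the default threshold of 3 would miss it); robust statistics such as the median are a common alternative.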
• Life Cycle of Data Analytics
• The Data Analytics Lifecycle was designed for Big Data problems and data science projects. The cycle is iterative, reflecting how real projects proceed. To meet the specific demands of analyzing Big Data, a step-by-step methodology is needed to plan the tasks involved in acquiring, processing, analyzing, and reusing (recycling) data.
[Figure: Life Cycle of Data Analytics]
• Life Cycle of Data Analytics
• Phase 1: Discovery –
• The data science team learns the business domain and researches the problem.
• Create context and gain understanding.
• Learn about the data sources that are needed and available for the project.
• The team formulates initial hypotheses, which can later be tested against evidence.
• Life Cycle of Data Analytics
• Phase 1: Discovery –
• The first phase of the Data Analytics Lifecycle is the data discovery step. This
stage involves identifying potential data sources, both internal and external,
that are relevant to the business problem at hand. It is essential to define the
scope of the analysis and gather data from various databases, applications, and
online repositories. Data can come in different formats, including structured,
unstructured, and semi-structured data.
• The key to success in this phase is to ensure the data collected is accurate,
relevant, and comprehensive. Missing or flawed data can lead to misleading
insights and decisions down the line. Rigorous data quality checks and
validation procedures are necessary to maintain data integrity.
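The data quality checks mentioned above can be as simple as rule-based validation of each record. A minimal sketch; the field names and rules (required fields, an age range, an "@" in email addresses) are hypothetical examples.

```python
def validate(record):
    """Return a list of data-quality problems found in one record (empty list = valid)."""
    problems = []
    # Completeness: required fields must be present and non-empty.
    for field in ("customer_id", "age", "email"):
        if record.get(field) in (None, ""):
            problems.append(f"missing {field}")
    # Accuracy: values must fall within plausible ranges or formats.
    age = record.get("age")
    if isinstance(age, int) and not 0 <= age <= 120:
        problems.append("age out of range")
    email = record.get("email")
    if email and "@" not in email:
        problems.append("malformed email")
    return problems


# Hypothetical customer records gathered during discovery.
records = [
    {"customer_id": "C1", "age": 34, "email": "c1@example.com"},
    {"customer_id": "C2", "age": 250, "email": "c2example.com"},
    {"customer_id": "", "age": 41, "email": "c3@example.com"},
]
report = {r.get("customer_id") or f"row{i}": validate(r) for i, r in enumerate(records)}
print(report)
```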
• Life Cycle of Data Analytics
• Phase 2: Data Preparation -
• Once the data is collected, it is crucial to clean and preprocess it before
analysis. Data preparation involves identifying and rectifying errors,
duplications, and inconsistencies in the dataset. This process ensures that the
data is of high quality and ready for further analysis.
• Data preprocessing tasks may include data transformation, normalisation, and
handling missing values. Cleaning and preprocessing are time-consuming but
vital steps that significantly impact the accuracy and reliability of the final
results. Proper data preprocessing can also help in dealing with noise and
irrelevant data, leading to better outcomes.
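The cleaning and preprocessing steps above (removing duplicates, handling missing values, normalisation) can be sketched on a few records. The data is hypothetical, and median imputation and min-max scaling are assumptions chosen for illustration; other strategies may suit a given data set better.

```python
from statistics import median

# Hypothetical raw records with a missing value and an exact duplicate.
raw = [
    {"id": 1, "income": 40000},
    {"id": 2, "income": None},
    {"id": 1, "income": 40000},
    {"id": 3, "income": 80000},
]


def deduplicate(rows):
    """Drop exact duplicate rows, keeping the first occurrence."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append(row)
    return out


def impute_median(rows, field):
    """Replace missing values in `field` with the median of the known values."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def min_max(rows, field):
    """Rescale `field` to the range [0, 1]."""
    values = [r[field] for r in rows]
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1  # avoid division by zero when all values are equal
    return [{**r, field: (r[field] - lo) / span} for r in rows]


clean = min_max(impute_median(deduplicate(raw), "income"), "income")
print(clean)
```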
• Life Cycle of Data Analytics
• Phase 2: Data Preparation -
• The team explores options for preprocessing, analysing, and preparing the data before analysis and modelling.
• An analytic sandbox is required. The team extracts, loads, and transforms data (ELT, or ETL) to bring it into the sandbox.
• Data preparation tasks are often repeated and need not follow a predetermined sequence.
• Tools commonly used in this phase include Hadoop, Alpine Miner, and OpenRefine.
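A minimal sketch of moving data into an analytic sandbox, using an in-memory SQLite database as a stand-in for the sandbox. The source data, table name, and transformation rules are hypothetical.

```python
import csv
import io
import sqlite3

# Extract: a raw CSV export from a hypothetical source system (one row lacks sales).
SOURCE = "date,region,sales\n2024-01-01,north,120\n2024-01-01,south,\n2024-01-02,north,95\n"


def extract(text):
    """Read the raw export into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Normalize region names and drop rows that have no sales figure."""
    return [(r["date"], r["region"].strip().upper(), int(r["sales"])) for r in rows if r["sales"]]


def load(conn, rows):
    """Create the sandbox table and insert the prepared rows."""
    conn.execute("CREATE TABLE sales (date TEXT, region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)


sandbox = sqlite3.connect(":memory:")  # isolated from any production database
load(sandbox, transform(extract(SOURCE)))
totals = sandbox.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(totals)  # → [('NORTH', 215)]
```

Because the sandbox is separate from production systems, analysts can reshape and query the data freely and repeat these steps as often as needed.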
• Life Cycle of Data Analytics
• Phase 2: Data Preparation -
• Data preparation and processing involves gathering, sorting, processing, and cleaning the collected information so that it can be used in later steps of the analysis.
• Data Collection: Drawing information from external sources.
• Data Entry: Creating new data points within an organization, either with digital technologies or through manual input.
• Signal Reception: Accumulating data from digital devices such as Internet of Things (IoT) devices and control systems.
• An analytic sandbox is essential during the data preparation stage of the Data Analytics Lifecycle. Data analysts and scientists use this scalable, isolated platform to process their data sets; once data has been extracted, loaded, or transformed, it stays securely inside the sandbox for later examination and modification.
• Life Cycle of Data Analytics
• Phase 3: Model Planning -
• The team studies the data to discover relationships between variables, then selects the most significant variables and the most suitable models.
• In this phase, the data science team creates data sets for training, testing, and production purposes.
• The model planning work completed here is the basis on which the team builds and implements models in the next phase.
• Tools commonly used in this stage include MATLAB and STATISTICA.
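One simple way to study relationships between variables and shortlist significant ones is to rank candidate predictors by their correlation with the target. A minimal sketch, assuming hypothetical store data and an arbitrary cutoff of |r| >= 0.5; correlation screening is only one of many variable-selection techniques.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data for five stores: candidate predictors and monthly sales (the target).
features = {
    "ad_spend": [10, 20, 30, 40, 50],
    "staff": [5, 3, 6, 4, 5],
    "footfall": [200, 260, 310, 390, 450],
}
sales = [110, 150, 205, 240, 300]


def shortlist(features, target, min_abs_r=0.5):
    """Keep features whose |r| with the target meets the cutoff, strongest first."""
    scores = {name: pearson(values, target) for name, values in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= min_abs_r]
    return sorted(kept, key=lambda name: -abs(scores[name])), scores


selected, scores = shortlist(features, sales)
print(selected)  # → ['ad_spend', 'footfall']
```

Here `staff` is dropped (r is about 0.2). Note that `ad_spend` and `footfall` are also strongly correlated with each other, something a real model-planning step would check before using both.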
• Life Cycle of Data Analytics
• Phase 4: Model Building -
• The team creates datasets for training, testing, and production use.
• The team also evaluates whether its current tools are sufficient to run the models or whether a more robust environment is needed.
• Free or open-source tools: R and PL/R, Octave, WEKA.
• Commercial tools: MATLAB, STATISTICA.
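The training and testing data sets can be sketched with a shuffled split, a simple model fitted only on the training portion, and an error measure computed on the held-out test portion. The data set, the 80/20 split, the fixed seed, and the choice of ordinary least squares are assumptions for illustration.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into training and test sets."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * (1 - test_fraction))
    return rows[:cut], rows[cut:]


def fit_line(pairs):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    b = sum((x - mx) * (y - my) for x, y in pairs) / sum((x - mx) ** 2 for x, _ in pairs)
    return my - b * mx, b


def mean_abs_error(model, pairs):
    """Average absolute difference between predictions and actual values."""
    a, b = model
    return sum(abs(y - (a + b * x)) for x, y in pairs) / len(pairs)


# Hypothetical data: y = 3x + 2 with a small, deterministic wobble of +/-1.
data = [(x, 3 * x + 2 + (1 if x % 2 else -1)) for x in range(20)]
train, test = train_test_split(data)
model = fit_line(train)  # fitted on training data only
print(len(train), len(test), round(mean_abs_error(model, test), 2))
```

Evaluating on rows the model never saw gives a fairer estimate of how it will perform in production than measuring error on the training data.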
• Life Cycle of Data Analytics
• Phase 5: Communication Results -
• After the model has been executed, team members evaluate its outcomes against the criteria for success or failure.
• The team considers how best to present the findings and outcomes to team members and other stakeholders, taking caveats and assumptions into account.
• The team should identify the most important findings, quantify their business value, and build a narrative that summarizes and presents the findings to all stakeholders.
• Life Cycle of Data Analytics
• Phase 6: Operationalize -
• The team communicates the benefits of the project to a wider audience. It sets up a pilot project to deploy the work in a controlled way before extending it to the entire enterprise.
• This approach lets the team learn about the model's performance and constraints in a small-scale production setting and make necessary adjustments before full deployment.
• The team produces the final reports, briefings, and code.
• Open-source or free tools used in this phase include WEKA, SQL, MADlib, and Octave.
Lifecycle of Big Data analytics
An alternative view divides the Big Data Analytics lifecycle into nine phases:
1. Business Case/Problem Definition
2. Data Identification
3. Data Acquisition and Filtration
4. Data Extraction
5. Data Munging (Validation and Cleaning)
6. Data Aggregation and Representation (Storage)
7. Exploratory Data Analysis
8. Data Visualization (Preparation for Modeling and Assessment)
9. Utilization of Analysis Results
29

More Related Content

PPTX
Big data analytics
PDF
Comprehensive Notes on Big Data Concepts and Applications Based on University...
PPTX
Exploring the impact and evolution of Advanced Analytics Tools.pptx
PDF
Agile data science
PPTX
Big Data in Business Application use case and benefits
PPTX
Introduction to data analytics - Intro to Data Analytics
PPTX
L3 Big Data and Application.pptx
PDF
bda-unit-bda-unit-materail big data1.pdf
Big data analytics
Comprehensive Notes on Big Data Concepts and Applications Based on University...
Exploring the impact and evolution of Advanced Analytics Tools.pptx
Agile data science
Big Data in Business Application use case and benefits
Introduction to data analytics - Intro to Data Analytics
L3 Big Data and Application.pptx
bda-unit-bda-unit-materail big data1.pdf

Similar to Unit-I_Big data life cycle.pptx, sources of Big Data (20)

PDF
Big Data Intoduction & Hadoop ArchitectureModule1.pdf
DOCX
Big data - The next best thing
PDF
Big Data analytics usage
PPTX
final oracle presentation
PDF
Big data Analytics
PPTX
KIT601 Unit I.pptx
PDF
Big data analytics, research report
PPTX
Big data and Predictive Analytics By : Professor Lili Saghafi
PPTX
March Towards Big Data - Big Data Implementation, Migration, Ingestion, Manag...
PDF
Big Data, Big Thinking: Untapped Opportunities
PDF
Lecture3 business intelligence
PPTX
Introduction to Big Data
PDF
Introduction to Data Analytics, AKTU - UNIT-1
PPTX
BIG DATA CHAPTER 2 IN DSS.pptx
PPTX
Big Data Developer Career Path: Job & Interview Preparation
PPTX
Top Big data Analytics tools: Emerging trends and Best practices
PPTX
Modern Analytics And The Future Of Quality And Performance Excellence
PPTX
Lecture 1.13 & 1.14 &1.15_Business Profiles in Big Data.pptx
DOCX
A study on web analytics with reference to select sports websites
PDF
Big Data Analytics M1.pdf big data analytics
Big Data Intoduction & Hadoop ArchitectureModule1.pdf
Big data - The next best thing
Big Data analytics usage
final oracle presentation
Big data Analytics
KIT601 Unit I.pptx
Big data analytics, research report
Big data and Predictive Analytics By : Professor Lili Saghafi
March Towards Big Data - Big Data Implementation, Migration, Ingestion, Manag...
Big Data, Big Thinking: Untapped Opportunities
Lecture3 business intelligence
Introduction to Big Data
Introduction to Data Analytics, AKTU - UNIT-1
BIG DATA CHAPTER 2 IN DSS.pptx
Big Data Developer Career Path: Job & Interview Preparation
Top Big data Analytics tools: Emerging trends and Best practices
Modern Analytics And The Future Of Quality And Performance Excellence
Lecture 1.13 & 1.14 &1.15_Business Profiles in Big Data.pptx
A study on web analytics with reference to select sports websites
Big Data Analytics M1.pdf big data analytics
Ad

More from RajendraKankrale1 (7)

PPTX
5.Transaction Management and concurrency Control
PPTX
PL_SQL, Trigger, Cursor, Stored procedure ,function
PPT
UNIT 2 relational algebra and Structured Query Language
PPT
UNIT 1 ER Model 2025 -Entity Relationship (ER) Diagram
PPTX
HADOOP ECO SYSTEM Pig: Introduction to PIG, Execution Modes of Pig, Comp...
PPTX
INTRODUCTION TO APACHE HADOOP AND MAPREDUCE
PPTX
Unit2_Regression, ADVANCED ANALYTICAL THEORY AND METHODS USING PYTHON
5.Transaction Management and concurrency Control
PL_SQL, Trigger, Cursor, Stored procedure ,function
UNIT 2 relational algebra and Structured Query Language
UNIT 1 ER Model 2025 -Entity Relationship (ER) Diagram
HADOOP ECO SYSTEM Pig: Introduction to PIG, Execution Modes of Pig, Comp...
INTRODUCTION TO APACHE HADOOP AND MAPREDUCE
Unit2_Regression, ADVANCED ANALYTICAL THEORY AND METHODS USING PYTHON
Ad

Recently uploaded (20)

PPTX
web development for engineering and engineering
PDF
Digital Logic Computer Design lecture notes
PDF
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
PPTX
UNIT 4 Total Quality Management .pptx
PPTX
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
PDF
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
PDF
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
PPTX
Geodesy 1.pptx...............................................
PPTX
CYBER-CRIMES AND SECURITY A guide to understanding
PDF
composite construction of structures.pdf
PPTX
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
PPTX
Foundation to blockchain - A guide to Blockchain Tech
PPTX
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
DOCX
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
PPTX
bas. eng. economics group 4 presentation 1.pptx
PDF
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
PPTX
additive manufacturing of ss316l using mig welding
PPTX
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
PPTX
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
PDF
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT
web development for engineering and engineering
Digital Logic Computer Design lecture notes
keyrequirementskkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk
UNIT 4 Total Quality Management .pptx
Infosys Presentation by1.Riyan Bagwan 2.Samadhan Naiknavare 3.Gaurav Shinde 4...
July 2025 - Top 10 Read Articles in International Journal of Software Enginee...
BMEC211 - INTRODUCTION TO MECHATRONICS-1.pdf
Geodesy 1.pptx...............................................
CYBER-CRIMES AND SECURITY A guide to understanding
composite construction of structures.pdf
FINAL REVIEW FOR COPD DIANOSIS FOR PULMONARY DISEASE.pptx
Foundation to blockchain - A guide to Blockchain Tech
Recipes for Real Time Voice AI WebRTC, SLMs and Open Source Software.pptx
ASol_English-Language-Literature-Set-1-27-02-2023-converted.docx
bas. eng. economics group 4 presentation 1.pptx
Evaluating the Democratization of the Turkish Armed Forces from a Normative P...
additive manufacturing of ss316l using mig welding
M Tech Sem 1 Civil Engineering Environmental Sciences.pptx
IOT PPTs Week 10 Lecture Material.pptx of NPTEL Smart Cities contd
SM_6th-Sem__Cse_Internet-of-Things.pdf IOT

Unit-I_Big data life cycle.pptx, sources of Big Data

  • 1. Course – Big Data Analytics (Professional Elective-II) Course code-IT314B Unit-II- BIG DATA ANALYTICS LIFE CYCLE Sanjivani Rural Education Society’s Sanjivani College of Engineering, Kopargaon-423603 (An Autonomous Institute Affiliated to Savitribai Phule Pune University, Pune) NAAC ‘A’ Grade Accredited, ISO 9001:2015 Certified Department of Information Technology (NBA Accredited) Mr. Rajendra N Kankrale Asst. Prof. 1
  • 2. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Unit-I BIG DATA ANALYTICS LIFE CYCLE • Syllabus • Introduction to Big Data, sources of Big Data, Data Analytic Lifecycle: Introduction, Phase 1: Discovery, Phase 2: Data Preparation, Phase 3: Model Planning, Phase 4: Model Building, Phase 5: Communication results, Phase 6: Operationalize. 2
  • 3. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Unit-I BIG DATA ANALYTICS LIFE CYCLE 1. Why Big Data analytics? 2. What is Big Data analytics? 3. Lifecycle of Big Data analytics 4. Types of Big Data analytics 5. Tools used in Big Data analytics 6. Big Data application domains 3
  • 4. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Why Big Data analytics? Take the music streaming platform Spotify for example. The company has nearly 96 million users that generate a tremendous amount of data every day. Through this information, the cloud-based platform automatically generates suggested songs— through a smart recommendation engine—based on likes, shares, search history, and more. What enables this is the techniques, tools, and frameworks that are a result of Big Data analytics. If you are a Spotify user, then you must have come across the top recommendation section, which is based on your likes, past history, and other things. Utilizing a recommendation engine that leverages data filtering tools that collect data and then filter it using algorithms works. This is what Spotify does. 4
  • 5. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Uses and Examples of Big Data Analytics There are many different ways that Big Data analytics can be used in order to improve businesses and organizations. Here are some examples: • Using analytics to understand customer behavior in order to optimize the customer experience • Predicting future trends in order to make better business decisions • Improving marketing campaigns by understanding what works and what doesn't • Increasing operational efficiency by understanding where bottlenecks are and how to fix them • Detecting fraud and other forms of misuse sooner These are just a few examples — the possibilities are really endless when it comes to Big Data analytics. It all depends on how you want to use it in order to improve your business. 5
  • 6. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT What is Big Data analytics? • What is Big Data? • Big Data is a massive amount of data sets that cannot be stored, processed, or analyzed using traditional tools. • Big Data analytics is a process used to extract meaningful insights, such as hidden patterns, unknown correlations, market trends, and customer preferences. Big Data analytics provides various advantages—it can be used for better decision making, preventing fraudulent activities, among other things. 6
  • 7. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Big Data sources • Today, there are millions of data sources that generate data at a very rapid rate. These data sources are present across the world. Some of the largest sources of data are social media platforms and networks. Let’s use Facebook as an example—it generates more than 500 terabytes of data every day. This data includes pictures, videos, messages, and more. • Data also exists in different formats, like structured data, semi-structured data, and unstructured data. For example, in a regular Excel sheet, data is classified as structured data—with a definite format. In contrast, emails fall under semi-structured, and your pictures and videos fall under unstructured data. All this data combined makes up Big Data. 7
  • 8. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Types of Big Data analytics The following are the four types of big data analytics: 1. Prescriptive Analytics- (What is the solution?) 2. Diagnostic Analytics- (why did happened?) 3. Predictive Analytics- (What will happen?) 4. Descriptive Analytics- (What has happened ?) 8
  • 9. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Tools used in Big Data analytics • Apache Spark: Spark is a framework for real-time data analytics, which is a part of the Hadoop ecosystem. • Python: Python is one of the most versatile programming languages that is rapidly being deployed for various applications including machine learning. • SAS: SAS is an advanced analytical tool that is used for working with large volumes of data and deriving valuable insights from it. 9
  • 10. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Tools used in Big Data analytics • Hadoop: Hadoop is the most popular big data framework that is deployed by a wide range of organizations from around the world for making sense of big data. • SQL: SQL is used for working with relational database management systems. • Tableau: Tableau is the most popular business intelligence tool that is deployed for the purpose of data visualization and business analytics. 10
  • 11. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Tools used in Big Data analytics • Splunk: Splunk is the tool of choice for parsing machine-generated data and deriving valuable business insights out of it. • R: R is the no. 1 programming language that is being used by data scientists for statistical computing and graphical applications alike. 11
  • 12. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Tools used in Big Data analytics Cassandra APACHE Cassandra is an open-source NoSQL distributed database that is used to fetch large amounts of data. It’s one of the most popular tools for data analytics and has been praised by many tech companies due to its high scalability and availability without compromising speed and performance. It is capable of delivering thousands of operations every second and can handle petabytes of resources with almost zero downtime. It was created by Facebook back in 2008 and was published publicly. 12
  • 13. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Tools used in Big Data analytics Apache Storm A storm is a robust, user-friendly tool used for data analytics, especially in small companies. The best part about the storm is that it has no language barrier (programming) in it and can support any of them. It was designed to handle a pool of large data in fault-tolerance and horizontally scalable methods. When we talk about real-time data processing, Storm leads the chart because of its distributed real-time big data processing system, due to which today many tech giants are using APACHE Storm in their system. Some of the most notable names are Twitter, Zendesk, NaviSite, etc. 13
  • 14. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Applications of Big Data Analytics • Customer Acquisition and Retention: Customer information helps tremendously in marketing trends, through data-driven actions, to increase customer satisfaction. For example, personalization engines for Netflix, Amazon, and Spotify help with improved customer experiences and gaining customer loyalty. • Targeted Ads: Personalized data about interaction patterns, order history, and product page viewing history can help immensely to create targeted ad campaigns for customers on a larger scale and at the individual level. 14
  • 15. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Applications of Big Data Analytics • Product Development: It can generate insights on development decisions, product viability, performance measurements, etc., and direct improvements that positively serve the customers. • Price Optimization: Pricing models can be modeled and used by retailers with the help of diverse data sources to maximize revenues. • Supply Chain and Channel Analytics: Predictive analytical models help with B2B supplier networks, preemptive replenishment, route optimizations, inventory management, and notification of potential delays in deliveries. • Risk Management: It helps in the identification of new risks with the help of data patterns for the purpose of developing effective risk management strategies. • Improved Decision-making: The insights that are extracted from the data can help enterprises make sound and quick decisions. 15
  • 16. BDA- Unit-I BIG DATA ANALYTICS LIFE CYCLE Department of IT Examples/Areas Using Big Data Analytics Tools • Healthcare: Big data analytics technologies and tools are being used in healthcare to predict patient outcomes, identify at-risk patients, and improve population health. • Retail: Big data analytics tools are being used by retailers to improve customer experience, target marketing campaigns, and prevent fraud. • Manufacturing: Big data analytics tools are being used in manufacturing to improve quality control, reduce downtime, and optimize production processes. • Banking: Real time big data analytics tools are being used by banks to detect fraudulent activities, prevent money laundering, and improve customer service. • Government: Big data analytics tools are being used by government agencies to improve public services, combat fraud and corruption, and better understand citizen needs. 16
  • 17. Life Cycle of Data Analytics • The Data Analytics Lifecycle was designed for Big Data problems and data science projects. The process is iterative, which reflects how real projects proceed. To meet the specific demands of analyzing Big Data, a step-by-step methodology is needed to plan the tasks involved in acquiring, processing, analyzing, and repurposing data.
  • 18. Life Cycle of Data Analytics
  • 19. Life Cycle of Data Analytics • Phase 1: Discovery – • The data science team learns the business domain and investigates the problem. • The team establishes context and builds an understanding of the problem. • The team learns which data sources the project needs and which of them are accessible. • The team formulates initial hypotheses that can later be tested against evidence.
  • 20. Life Cycle of Data Analytics • Phase 1: Discovery – • The first phase of the Data Analytics Lifecycle is the data discovery step. This stage involves identifying potential data sources, both internal and external, that are relevant to the business problem at hand. It is essential to define the scope of the analysis and gather data from various databases, applications, and online repositories. Data can come in different formats, including structured, unstructured, and semi-structured data. • The key to success in this phase is to ensure the data collected is accurate, relevant, and comprehensive. Missing or flawed data can lead to misleading insights and decisions down the line. Rigorous data quality checks and validation procedures are necessary to maintain data integrity.
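The data quality checks mentioned above can start very simply. This sketch assumes a hypothetical customer schema (`customer_id`, `age`, `city`) and an assumed valid age range of 0 to 120. It counts missing fields, duplicate IDs, and out-of-range values, so that the team can see how complete and trustworthy a source is before relying on it.

```python
from typing import Any, Dict, List

REQUIRED = ("customer_id", "age", "city")  # hypothetical schema for illustration


def quality_report(rows: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Summarize missing fields, duplicate IDs, and implausible ages."""
    missing = {field: 0 for field in REQUIRED}
    seen_ids = set()
    duplicates = 0
    invalid_age = 0
    for row in rows:
        # Treat None and empty strings as missing.
        for field in REQUIRED:
            if row.get(field) in (None, ""):
                missing[field] += 1
        cid = row.get("customer_id")
        if cid is not None:
            if cid in seen_ids:
                duplicates += 1
            seen_ids.add(cid)
        # Assumed validity rule: ages must fall in [0, 120].
        age = row.get("age")
        if isinstance(age, (int, float)) and not 0 <= age <= 120:
            invalid_age += 1
    return {"rows": len(rows), "missing": missing,
            "duplicate_ids": duplicates, "invalid_age": invalid_age}


rows = [
    {"customer_id": 1, "age": 34, "city": "Pune"},
    {"customer_id": 2, "age": None, "city": "Nashik"},
    {"customer_id": 2, "age": 41, "city": ""},
    {"customer_id": 3, "age": 250, "city": "Kopargaon"},
]
print(quality_report(rows))
```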
  • 21. Life Cycle of Data Analytics • Phase 2: Data Preparation - • Once the data is collected, it is crucial to clean and preprocess it before analysis. Data preparation involves identifying and rectifying errors, duplications, and inconsistencies in the dataset. This process ensures that the data is of high quality and ready for further analysis. • Data preprocessing tasks may include data transformation, normalisation, and handling missing values. Cleaning and preprocessing are time-consuming but vital steps that significantly impact the accuracy and reliability of the final results. Proper data preprocessing can also help in dealing with noise and irrelevant data, leading to better outcomes.
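The three preprocessing tasks named above (removing duplicates, handling missing values, and normalisation) can be sketched in plain Python. The column names are hypothetical, and the choices of mean imputation and min-max scaling are only two of many reasonable options.

```python
import statistics
from typing import Any, Dict, List

Row = Dict[str, Any]


def clean(rows: List[Row], key: str, field: str) -> List[Row]:
    """Deduplicate on `key`, impute missing `field` values, then min-max scale `field`."""
    # 1. Remove duplicates: keep the first row seen for each key (copies, so input is untouched).
    seen = set()
    unique: List[Row] = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique.append(dict(row))
    # 2. Handle missing values: replace None with the mean of the observed values.
    observed = [r[field] for r in unique if r[field] is not None]
    fill = statistics.mean(observed) if observed else 0.0
    for r in unique:
        if r[field] is None:
            r[field] = fill
    # 3. Normalise: rescale the field into the range [0, 1].
    lo = min(r[field] for r in unique)
    hi = max(r[field] for r in unique)
    span = hi - lo
    for r in unique:
        r[field] = (r[field] - lo) / span if span else 0.0
    return unique


rows = [
    {"id": 1, "income": 20.0},
    {"id": 2, "income": None},
    {"id": 2, "income": None},  # duplicate record
    {"id": 3, "income": 60.0},
]
print(clean(rows, "id", "income"))
# [{'id': 1, 'income': 0.0}, {'id': 2, 'income': 0.5}, {'id': 3, 'income': 1.0}]
```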
  • 22. Life Cycle of Data Analytics • Phase 2: Data Preparation - • The team explores methods for preprocessing, analysing, and preparing data before analysis and modelling. • An analytic sandbox is required. The team extracts, loads, and transforms data to bring it into the sandbox. • Data preparation tasks may be repeated and need not follow a predetermined sequence. • Commonly used tools for this phase include Hadoop, Alpine Miner, and OpenRefine.
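The following is a toy version of bringing data into an analytic sandbox. An in-memory SQLite database stands in for the sandbox, and a small hypothetical CSV export stands in for a source system. The sketch extracts the rows, applies two transformations (empty amounts become NULL, region names get consistent casing), and loads the result. A real sandbox would be a much larger, scalable platform.

```python
import csv
import io
import sqlite3

# Hypothetical raw export from a source system.
RAW = """order_id,amount,region
101,250.50,west
102,,east
103,100.25,WEST
"""


def load_into_sandbox(raw_csv: str) -> sqlite3.Connection:
    """Extract rows from CSV text, clean them, and load them into an in-memory table."""
    sandbox = sqlite3.connect(":memory:")  # stands in for the analytic sandbox
    sandbox.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amount REAL, region TEXT)"
    )
    for rec in csv.DictReader(io.StringIO(raw_csv)):  # extract
        amount = float(rec["amount"]) if rec["amount"] else None  # transform: empty -> NULL
        region = rec["region"].strip().lower()  # transform: consistent casing
        sandbox.execute(
            "INSERT INTO orders VALUES (?, ?, ?)", (int(rec["order_id"]), amount, region)
        )  # load
    sandbox.commit()
    return sandbox


db = load_into_sandbox(RAW)
print(db.execute(
    "SELECT region, COUNT(*), SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall())  # [('east', 1, None), ('west', 2, 350.75)]
```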
  • 23. Life Cycle of Data Analytics • Phase 2: Data Preparation - • Data preparation and processing involves gathering, sorting, processing, and cleaning the collected information so that later steps of the analysis can use it. • Data Collection: drawing information from external sources. • Data Entry: creating new data points within the organization, either with digital technologies or through manual input. • Signal Reception: accumulating data from digital devices such as Internet of Things (IoT) devices and control systems. • An analytic sandbox is essential during the data preparation stage of the Data Analytics Lifecycle. Data analysts and data scientists use this scalable platform to process their data sets. Once data has been extracted, loaded, or transformed, it resides securely inside the sandbox for later examination and modification.
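The three acquisition channels listed above usually produce data in different shapes. This sketch assumes hypothetical formats for each one (a CSV feed for external collection, form dictionaries for data entry, and JSON messages for IoT signal reception) and maps all of them onto one common record layout, so that later preparation steps can treat them uniformly.

```python
import csv
import io
import json
from typing import Any, Dict, List

Record = Dict[str, Any]  # common layout: source, entity, value


def from_external_csv(text: str) -> List[Record]:
    """Data collection: rows drawn from an external source (hypothetical CSV feed)."""
    return [
        {"source": "external", "entity": row["id"], "value": float(row["value"])}
        for row in csv.DictReader(io.StringIO(text))
    ]


def from_data_entry(entries: List[Dict[str, str]]) -> List[Record]:
    """Data entry: values typed in by staff; strip stray whitespace from names."""
    return [
        {"source": "entry", "entity": e["name"].strip(), "value": float(e["value"])}
        for e in entries
    ]


def from_iot_messages(messages: List[str]) -> List[Record]:
    """Signal reception: JSON messages emitted by hypothetical IoT temperature sensors."""
    records = []
    for msg in messages:
        payload = json.loads(msg)
        records.append({"source": "iot", "entity": payload["device"],
                        "value": float(payload["temp_c"])})
    return records


combined = (
    from_external_csv("id,value\nsupplier_7,42.5\n")
    + from_data_entry([{"name": " store_12 ", "value": "18"}])
    + from_iot_messages(['{"device": "sensor_3", "temp_c": 21.4}'])
)
for rec in combined:
    print(rec)
```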
  • 24. Life Cycle of Data Analytics • Phase 3: Model Planning - • The team studies the data to discover relationships between variables, then selects the most significant variables and the most suitable models. • In this phase, the data science team creates data sets for training, testing, and production use. • The team builds and implements models based on the work completed in the model planning phase. • Commonly used tools for this stage include MATLAB and STATISTICA.
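One simple way to discover relationships between variables and shortlist the most significant ones is to rank candidate variables by the absolute value of their Pearson correlation with the target. The sketch below does this with hypothetical sales data. Correlation only captures linear relationships, so in practice it is a first screening step rather than a complete variable-selection method.

```python
import math
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rank_variables(features: Dict[str, List[float]], target: List[float], top_k: int = 2) -> List[str]:
    """Return the top_k feature names by |correlation| with the target."""
    scores = {name: abs(pearson(col, target)) for name, col in features.items()}
    return sorted(scores, key=lambda name: scores[name], reverse=True)[:top_k]


sales = [10.0, 12.0, 15.0, 19.0, 24.0]  # hypothetical target variable
features = {
    "ad_spend": [1.0, 2.0, 3.0, 4.0, 5.0],     # strong positive relationship
    "price": [9.0, 8.0, 8.0, 5.0, 2.0],        # strong negative relationship
    "day_of_week": [3.0, 1.0, 4.0, 1.0, 5.0],  # weak relationship
}
print(rank_variables(features, sales))  # ['ad_spend', 'price']
```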
  • 26. Life Cycle of Data Analytics • Phase 4: Model Building - • The team creates datasets for training, testing, and production use. • The team also evaluates whether its current tools are sufficient to run the models or whether it needs a more robust environment. • Free or open-source tools: R and PL/R, Octave, WEKA. • Commercial tools: MATLAB, STATISTICA.
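A minimal model-building loop might look like the following: shuffle the data with a fixed seed, split it into training and test sets, fit a model on the training set only, and measure its error on the held-out test set. The sketch uses synthetic, noise-free data from y = 2x + 1 and an ordinary least-squares line, both chosen only for illustration. New inputs passed to `predict` play the role of production data.

```python
import random
from typing import List, Tuple

Pair = Tuple[float, float]


def split(data: List[Pair], test_frac: float = 0.25, seed: int = 42) -> Tuple[List[Pair], List[Pair]]:
    """Shuffle reproducibly, then split into training and test sets."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_frac))
    return shuffled[:cut], shuffled[cut:]


def fit_line(train: List[Pair]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxy = sum((x - mx) * (y - my) for x, y in train)
    sxx = sum((x - mx) ** 2 for x, _ in train)
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(model: Tuple[float, float], x: float) -> float:
    slope, intercept = model
    return slope * x + intercept


def mse(model: Tuple[float, float], rows: List[Pair]) -> float:
    """Mean squared error of the model on the given rows."""
    return sum((y - predict(model, x)) ** 2 for x, y in rows) / len(rows)


data = [(float(x), 2.0 * x + 1.0) for x in range(20)]  # synthetic data: y = 2x + 1
train, test = split(data)
model = fit_line(train)
print(model, mse(model, test))
print(predict(model, 25.0))  # a "production" input the model has never seen
```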
  • 27. Life Cycle of Data Analytics • Phase 5: Communicate Results - • After running the model, team members evaluate its outcomes to determine whether the model has succeeded or failed against the project's success criteria. • The team considers how best to present findings and outcomes to team members and other stakeholders, taking caveats and assumptions into account. • The team should identify the most important findings, quantify their value to the business, and build a narrative that summarizes and presents the findings to all stakeholders.
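Judging success or failure is easier when the criteria are written down as numbers. This sketch assumes a binary classifier, hypothetical predictions, and hypothetical thresholds (accuracy of at least 0.80 and recall of at least 0.85). It computes standard metrics and reports which criteria were met. In this example the model passes on accuracy but fails on recall, which is exactly the kind of caveat worth presenting to stakeholders.

```python
from typing import Dict, List


def evaluate(y_true: List[int], y_pred: List[int]) -> Dict[str, float]:
    """Accuracy, precision, and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "accuracy": correct / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Hypothetical success criteria agreed with stakeholders earlier in the project.
CRITERIA = {"accuracy": 0.80, "recall": 0.85}


def meets_criteria(metrics: Dict[str, float], criteria: Dict[str, float]) -> Dict[str, bool]:
    """True for each criterion whose metric reaches its threshold."""
    return {name: metrics[name] >= threshold for name, threshold in criteria.items()}


y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
metrics = evaluate(y_true, y_pred)
print(metrics)                            # {'accuracy': 0.8, 'precision': 0.8, 'recall': 0.8}
print(meets_criteria(metrics, CRITERIA))  # {'accuracy': True, 'recall': False}
```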
  • 28. Life Cycle of Data Analytics • Phase 6: Operationalize - • The team delivers the benefits of the project to a wider audience. It sets up a pilot project that deploys the work in a controlled manner before extending it to the enterprise's entire user base. • This approach lets the team learn about the model's performance and constraints in a small-scale production setting and make any necessary adjustments before full deployment. • The team produces the final reports, briefings, and code. • Open-source or free tools include WEKA, SQL, MADlib, and Octave.
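A common way to run a controlled pilot is to send only a small, fixed share of users to the new model. This sketch uses a hypothetical `in_pilot` helper that hashes each user ID with SHA-256 into one of 100 buckets and admits users whose bucket falls below the pilot percentage. Hashing, rather than random sampling on each request, ensures that a given user always sees the same model, and widening the rollout only requires raising `pilot_pct`.

```python
import hashlib


def in_pilot(user_id: str, pilot_pct: int = 10) -> bool:
    """Deterministically assign roughly pilot_pct% of users to the pilot."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # stable bucket in the range 0..99
    return bucket < pilot_pct


def choose_model(user_id: str) -> str:
    # Hypothetical model names; a real deployment would route to actual services.
    return "pilot_model" if in_pilot(user_id) else "current_model"


users = [f"user-{i}" for i in range(10_000)]
share = sum(in_pilot(u) for u in users) / len(users)
print(f"pilot share: {share:.1%}")
print(choose_model("user-42"))
```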
  • 29. Lifecycle of Big Data Analytics • The Big Data Analytics Lifecycle is divided into nine phases: 1. Business Case/Problem Definition 2. Data Identification 3. Data Acquisition and Filtration 4. Data Extraction 5. Data Munging (Validation and Cleaning) 6. Data Aggregation and Representation (Storage) 7. Exploratory Data Analysis 8. Data Visualization (Preparation for Modeling and Assessment) 9. Utilization of Analysis Results