Unit - 1
What is data?
 Dictionary Definition:
The quantities, characters, or symbols on which
operations are performed by a computer, which may
be stored and transmitted in the form of electrical
signals and recorded on magnetic (audio tape), optical
(CD), or mechanical recording media (Phonographic
disc)
What is big data?
 It is a collection of data that is huge in volume and
still growing exponentially with time.
 Its size and complexity are so great that no
traditional data management tool can store or
process it efficiently.
 Big data is still data, only at a much larger scale.
Definition
 Big data is high-volume, high-velocity and/or high-variety
information assets that demand cost-effective,
innovative forms of information processing for
enhanced insight and decision making.
 - Gartner IT Glossary
 Such data is typically on the order of terabytes (10^12
bytes), petabytes (10^15 bytes) or zettabytes (10^21 bytes).
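To make the byte orders of magnitude above concrete, here is a minimal sketch that formats a raw byte count with decimal (SI) prefixes. The function name human_readable and the unit list are illustrative assumptions, not part of the original slides.

```python
# Decimal (SI) prefixes: each step up is a factor of 1000.
SI_UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB"]

def human_readable(num_bytes: float) -> str:
    """Format a byte count using the largest SI unit that keeps the value below 1000."""
    value = float(num_bytes)
    for unit in SI_UNITS:
        # Stop at the first unit that fits, or at the largest unit we know.
        if value < 1000 or unit == SI_UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1000

print(human_readable(10**12))  # 1.00 TB
print(human_readable(10**15))  # 1.00 PB
print(human_readable(10**21))  # 1.00 ZB
```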
Explanation of Big Data definition
Big Data Definition
Why is Big Data important?
 Using the data from any source and analyzing it, we
can find answers that
 Streamline resource management
 Improve operational efficiencies
 Optimize product development
 Drive new revenue and growth opportunities
 Enable smart decision making
Big data helps accomplish
business-related tasks
 Determine the root causes of failures, issues and
defects in near-real time (industrial usage)
 Spot anomalies faster and more accurately than the
human eye (healthcare usage)
 Recalculate entire risk portfolios in minutes
(investment / finance sector)
 Detect fraudulent behavior before it affects your
organization
Some examples of big data
 The NYSE generates about one terabyte of new trade
data per day.
 55 billion messages and 4.5 billion photos are
exchanged on WhatsApp every day.
 300 hours of video are uploaded every minute.
 Every minute, users send 31.25 million messages and
watch 2.77 million videos.
 Around 40,000 search queries are made on Google each
second.
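The figures above use different time units (per second, per minute, per day), which makes them hard to compare. A small sketch, assuming only the numbers quoted on the slide, converts them all to a per-day rate; the helper name per_day is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
MINUTES_PER_DAY = 24 * 60       # 1,440

def per_day(rate: float, per: str) -> float:
    """Scale a rate quoted per second, minute or day to a per-day figure."""
    factors = {"second": SECONDS_PER_DAY, "minute": MINUTES_PER_DAY, "day": 1}
    return rate * factors[per]

searches_per_day = per_day(40_000, "second")    # search queries
video_hours_per_day = per_day(300, "minute")    # hours of video uploaded
messages_per_day = per_day(31.25e6, "minute")   # messages sent

print(f"{searches_per_day:,.0f} searches per day")        # 3,456,000,000
print(f"{video_hours_per_day:,.0f} video hours per day")  # 432,000
print(f"{messages_per_day:,.0f} messages per day")        # 45,000,000,000
```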
Types of Big Data
 Structured
 Semi-Structured
 Unstructured
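The three types can be illustrated with one toy record in each form. This is a sketch with invented sample data (the order, customer name and review text are hypothetical), using only the Python standard library.

```python
import csv
import io
import json

# Structured: fixed columns and types, fits a relational table or CSV file.
structured = io.StringIO("order_id,customer,amount\n1001,Asha,250.00\n")
rows = list(csv.DictReader(structured))

# Semi-structured: self-describing keys, nesting and optional fields (JSON, XML).
semi_structured = json.loads(
    '{"order_id": 1001, "customer": {"name": "Asha"}, "tags": ["gift"]}'
)

# Unstructured: free text with no predefined schema (also images, audio, video).
unstructured = "Loved the product, but the delivery was two days late."

print(rows[0]["amount"])                    # a named column can be queried directly
print(semi_structured["customer"]["name"])  # nested fields are reached by key
print(len(unstructured.split()))            # free text must be parsed before analysis
```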
Types of Big Data (share of big data)
 Unstructured: 80%
 Structured: 10%
 Semi-structured: 10%
Unstructured Data
10 Characteristics of Big Data (the 10 V's)
 Volume: data size
 Velocity: speed at which data is generated
 Variety: different types of data
 Variability: dynamic, evolving behavior in data science
 Value: getting value out of data
 Visualization: data clustering, sunbursts
 Vulnerability: security concerns
 Volatility: data governance
 Validity: data quality check
 Veracity: confidence or trust in data; data accuracy
Big Data Analytics_Unit1.pptx
Unstructured data for Analytics
 Business Documents
 Emails
 Social Media
 Customer feedback
 Webpages
 Open-ended survey responses
 Images, Audio and Video
 Importance of unstructured data analysis for businesses:
 Improve the customer experience
 Discover gaps in the market and innovate
 Listen to your customers
How to Analyze unstructured Data
 Choose the End Goal
 Define a clear set of measurable goals.
 Collect Relevant Data
 Focus on the source of data
 Clean Data
 Reduce noise
 Eliminate unwanted information
 Implement Technology
 NoSQL databases
 Data visualization using Tableau, Google Data Studio
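The "Clean Data" step above (reduce noise, eliminate unwanted information) can be sketched for raw text as follows. The function clean_text and its cleaning rules are illustrative assumptions; real pipelines usually need rules tuned to the data source.

```python
import re

def clean_text(raw: str) -> str:
    """Strip markup, links and punctuation so only lowercase words remain."""
    text = re.sub(r"<[^>]+>", " ", raw)                # drop HTML tags
    text = re.sub(r"https?://\S+", " ", text)          # drop URLs
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())   # drop punctuation and symbols
    return re.sub(r"\s+", " ", text).strip()           # collapse repeated whitespace

print(clean_text("<p>GREAT phone!!! See https://example.com/review</p>"))
# great phone see
```

Tags are removed before URLs so that a closing tag glued to a link is not swallowed by the URL pattern.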
Unstructured Data Analytics
Dealing with unstructured data:
 Data mining
• Association rule mining
• Regression analysis
• Collaborative filtering
 Text mining
 NLP
 Noisy text analysis
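Association rule mining, listed above, rests on two measures: support (how often an itemset appears) and confidence (how often the rule holds when its left-hand side appears). The sketch below computes both on a toy set of shopping baskets; the transactions are invented for illustration, and a full miner such as Apriori would additionally search over all candidate itemsets.

```python
# Toy market-basket data (hypothetical).
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset: set) -> float:
    """Fraction of transactions that contain every item in the itemset."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def confidence(antecedent: set, consequent: set) -> float:
    """Estimated probability of the consequent given the antecedent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"bread", "milk"}))       # 0.5
print(confidence({"bread"}, {"milk"}))  # 2 of the 3 bread baskets also contain milk
```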
Unstructured Data Analytics Tools
 MonkeyLearn – Used for Text Analytics.
 This tool makes it simple to clean, label and visualize customer
feedback
 Word Clouds - textual data visualization which allows anyone to
see in a single glance the words which have the highest frequency
within a given body of text
 Listen to the customer's voice – open surveys and emails.
 Aspect-based sentiment analysis
 Amazon AWS
 Microsoft Azure
 IBM Cloud
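A word cloud, as described above, is driven by word frequencies. The sketch below computes those frequencies with the standard library; the stop-word list and the sample feedback are illustrative assumptions, and drawing the cloud itself would need a separate plotting tool.

```python
import re
from collections import Counter

# A tiny, hand-picked stop-word list (assumption; real lists are much longer).
STOPWORDS = {"the", "a", "is", "and", "was", "but", "to", "of"}

def top_words(text: str, n: int = 3):
    """Return the n most frequent non-stop-words with their counts."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

feedback = (
    "The delivery was slow. Slow delivery, but the support team was helpful. "
    "Delivery again slow."
)
print(top_words(feedback, 2))
```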
The Advantages of Deploying Big
Data
 Better Decision making
 Cost Reduction
 Newer Products and redevelopment of the old
 Risk Analysis
 Collection of Data
Industries using Big Data
 A global study by CA Technologies found that the benefits of
big data clearly outweigh the obstacles to implementation.
 84% of organizations have implemented or plan to
implement a big data project.
 Acquisition has increased to 54%, revenue has improved by 88%.
 hiQ: It specializes in ‘people analytics’.
 SumAI: Helps businesses optimize their social media campaigns with
the help of one single chart.
 Splunk: Visual analytics
 Alteryx: Combines structured and unstructured data from a number
of sources and stores it in one database. Spatial, predictive and
statistical analysis tasks are done on this data.
How big data is used in different
industries
 Media and entertainment:
 Companies like Hulu and Netflix work with big data to
analyze user tendencies, preferred content, trends in
consumption.
 Many services such as Spotify are introducing new
revenue models to increase profits.
 Ads are targeted more strategically thanks to big data
analytics software.
Finance
 Shift from manual trading to trading backed by
technology
 Trading models analyze big data to
 make accurate entry / exit trade decisions,
 minimize risk using machine learning, and
 gauge market sentiment using opinion mining
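Opinion mining, mentioned above, can be sketched in its simplest form as a lexicon lookup: count positive words, subtract negative words. The word lists and headlines below are invented for illustration; production systems typically use trained models rather than a fixed lexicon.

```python
import re

# Tiny hand-made sentiment lexicons (assumptions for this sketch).
POSITIVE = {"gain", "beat", "strong", "upgrade", "growth"}
NEGATIVE = {"loss", "miss", "weak", "downgrade", "lawsuit"}

def sentiment_score(headline: str) -> int:
    """Positive-word count minus negative-word count for one headline."""
    words = re.findall(r"[a-z]+", headline.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

print(sentiment_score("Strong quarter: earnings beat estimates"))  # 2
print(sentiment_score("Shares weak after downgrade and lawsuit"))  # -3
```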
Healthcare
 With predictive analytics, big data can predict
negative health events that senior citizens might
experience while in home care.
 This reduced visits by 73% and 64% among
chronically ill patients.
 Big data can identify disease trends based on
demographics, geography, socioeconomics and
other factors.
Education
 Improve learning management. Tracking how much
time learners spend on tasks, tests, and exams helps to
customize curricula efficiently.
 Improve students’ performance. Leveraging data about
learners’ performance helps educators develop
personalized learning paths.
 Provide data-driven decision-making.
 Predict learning outcomes.
 Use big data to reduce dropout rates
Retail
 Enhance Service Quality
 Optimize Price
 Manage Supply Chain
 Identify Potential Risks
 Forecast demand
Manufacturing
 Quality Assurance
 Supply Chain Optimization
 Improving Throughput and Yield
 Less downtime
 Greater Customer Service
Big Data Challenges
 1. Lack of knowledgeable professionals
 To run these large data tools, companies need skilled
data professionals (data scientists, data analysts and
data engineers).
 Solution: Adopt big data tools that can be used by professionals who are not
data science experts but have basic knowledge.
 This saves companies a lot of money.
 2. Lack of proper understanding of massive data
 Employees may not know how to store sensitive data.
 Solution: Hold data workshops and seminars at
companies for everybody.
 3. Data growth issues
 One of the biggest challenges is storing the huge volume of data.
 Solution: Compression reduces the size of stored
data.
 De-duplication removes duplicate and unwanted data.
 Data tiering stores data in different tiers (public cloud,
private cloud and flash storage).
 4. Confusion over big data tool selection
 Companies are often unsure which tool to select for data
analysis and storage (HBase, Cassandra, etc.).
 Solution: Hire experts who know which tools to use.
 5. Integrating data from a spread of sources
 Data in a corporation comes from various sources such as
social media pages, ERP applications, customer logs, etc.
 Solution: Data integration problems are solved by purchasing
proper tools.
 6. Securing data
 Companies can lose a lot of revenue due to a stolen
record or a data breach.
 Solution: Cyber-security professionals guard the data. Other
steps include encryption, identity and access control,
endpoint security and real-time security
monitoring.
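Two of the remedies named under challenge 3, de-duplication and compression, can be sketched with the Python standard library. The sample records and log line are invented; hashing with SHA-256 and compressing with zlib are assumptions chosen for the sketch, not tools named in the slides.

```python
import hashlib
import zlib

# Hypothetical sensor records containing one exact duplicate.
records = [b"sensor=42;temp=21.5", b"sensor=42;temp=21.5", b"sensor=43;temp=19.0"]

def deduplicate(items):
    """Keep the first copy of each record, identified by its SHA-256 digest."""
    seen, unique = set(), []
    for item in items:
        digest = hashlib.sha256(item).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(item)
    return unique

unique_records = deduplicate(records)
print(len(records), "->", len(unique_records), "records after de-duplication")

# Compression: repetitive data such as logs shrinks substantially.
log = b"INFO request served in 12 ms\n" * 1000
compressed = zlib.compress(log)
print(len(log), "->", len(compressed), "bytes after compression")

# Compression is lossless: the original bytes are fully recoverable.
restored = zlib.decompress(compressed)
```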
Assignment Questions
1. What is big data?
2. Explain the types of data. Also briefly mention the sources
of each type of data, with examples.
3. Why is big data important? How does it help businesses?
Briefly describe its usage across various
domains (industry, retail, healthcare, manufacturing,
education …).
4. Briefly describe the characteristics of big data.
5. Describe the types of analytics and mention the sources of
unstructured data used in analytics.
6. Mention some tools used in analytics.
7. Discuss the big data challenges briefly.
