TAMING
BIG DATA
Tools and techniques
adopted for Big Data
Analytics
JOSEPH FRANCIS
1BI11IM020
WHAT IS BIG DATA?
WHY ANALYSIS ON BIG DATA IS CRUCIAL FOR VALUE BASED
SERVICES AND PRODUCTS ?
BIG DATA CHARACTERISTICS
A BRIEF HISTORY ON ORIGINS OF BIG DATA
PHASES IN BIG DATA ANALYSIS
CHALLENGES IN BIG DATA ANALYSIS
TOOLS AND TECHNIQUES FOR DATA ANALYTICS
CASE STUDIES
CONCLUSION
Tools and techniques adopted for big data analytics
WHAT IS BIG DATA?
Extremely large data sets that may be analysed
computationally to reveal patterns, trends, and
associations, especially relating to human
behaviour and interactions.
WHY ANALYSIS ON BIG DATA IS CRUCIAL FOR VALUE
BASED SERVICES AND PRODUCTS ?
BUSINESS INTELLIGENCE
DECISION SUPPORT
PREDICTIVE ANALYTICS
GOVERNMENTS
HEALTHCARE
RESEARCH
MARKETING STRATEGIES
A brief history on origins of big data
1880 - The Start of Information Overload
8 years to complete the US census
1932 - The Population Boom
1956 - Virtual Memory
1966 - Centralized Computing Systems Enter the Scene
1970 - Relational Database
1985 - Manufacturing Resources Planning Systems
1989 - Business Intelligence
1995 - The World Wide Web Explodes
1999 - Predictive Analysis Changes Business as Usual
http://www.winshuttle.com/big-data-timeline/
Phases in Big Data analysis
Data Acquisition and Recording
Information Extraction and Cleaning
Data Integration, Aggregation, and Representation
Query Processing, Data Modelling, and Analysis
Interpretation
CHALLENGES IN DATA ANALYSIS
HETEROGENEITY
TIMELINESS
SCALE
PRIVACY
HUMAN COLLABORATION
Tools and Techniques
A/B testing
Crowdsourcing
Genetic algorithms
Machine learning
Natural language processing
Time series analysis
Visualization
Data mining
Association rule learning
Classification tree analysis
Regression analysis
A/B testing
A/B testing is a form of statistical hypothesis testing that compares
two variants, A and B. In statistics it is known as two-sample
hypothesis testing.
H0: NULL HYPOTHESIS (variants A and B perform the same)
H1: ALTERNATIVE HYPOTHESIS (variants A and B perform differently)
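A two-sample test of this kind can be sketched in Python as a two-proportion z-test on conversion counts. This is a minimal illustration, not a method taken from the slides: the function name, the visitor and conversion counts, and the 0.05 significance level are all assumptions chosen for the example.

```python
import math


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test for a difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled rate: under H0 both variants share one conversion rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / std_err
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts, for illustration only.
z, p_value = two_proportion_z_test(conv_a=200, n_a=5000, conv_b=260, n_b=5000)
reject_h0 = p_value < 0.05  # assumed significance level
print(f"z = {z:.2f}, p = {p_value:.4f}, reject H0: {reject_h0}")
```

Rejecting H0 here only says the difference is unlikely to be chance at the assumed level; whether the difference is large enough to matter is a separate business decision.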
Crowdsourcing
Crowdsourcing represents the act of a company or institution
taking a function once performed by employees and outsourcing
it to an undefined (and generally large) network of people in the
form of an open call.
Analysis of the reviews for opinion
Analysis of the interactions for need and intent
Analysis of social network interactions
Machine learning
- a scientific discipline that explores the construction and
study of algorithms that can learn from data.
- such algorithms work by building a model from example inputs and
using that model to make predictions or decisions,
- rather than by following fixed, explicitly programmed instructions.
Machine learning is closely related to, and often overlaps
with, computational statistics, a discipline that also specializes
in prediction-making.
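The idea of building a model from example inputs and then using it to predict can be sketched with an ordinary least-squares line fit. This is one simple choice of model, picked for illustration; the function names and the training values are hypothetical and do not come from the slides.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned parameters to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical example inputs (the "training data").
train_x = [1.0, 2.0, 3.0, 4.0]
train_y = [3.0, 5.0, 7.0, 9.0]

model = fit_line(train_x, train_y)   # the model is learned from examples
prediction = predict(model, 5.0)     # and then used on an unseen input
print(model, prediction)
```

Nothing in the code states the relationship between x and y; the slope and intercept are recovered from the examples, which is the distinction the slide draws against fixed instructions.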
Indian Elections 2014
- Size of the Indian electorate: 814 million voters, compared
with the USA’s 193.6 million and the UK’s 45.5 million.
[Bar chart: electorate size in millions (axis 0 to 900) for India, USA and UK]
- Variety of data: India’s voter rolls, in 12 different
languages and 900,000 PDFs amounting to 25 million pages,
made for a heterogeneous, non-uniform and deeply diverse
information set.
- Veracity: the information was often questionable. One report
noted that some voters were listed as 19,545 years old, and
others as a confounding 0 years old. Name overlap (there are
327,000 women named “Sita” in Bihar alone) further complicated
the process.
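Veracity problems like these are usually handled in the extraction and cleaning phase. The sketch below separates records with implausible ages and counts repeated names. It is an illustration under stated assumptions: the record layout, the sample records, and the upper age bound of 120 are hypothetical, and the lower bound assumes a voting age of 18.

```python
from collections import Counter
from dataclasses import dataclass

MIN_VOTING_AGE = 18      # assumed lower bound for a valid voter record
MAX_PLAUSIBLE_AGE = 120  # assumed upper bound, chosen for illustration


@dataclass
class VoterRecord:
    name: str
    district: str
    age: int


def split_by_plausible_age(records):
    """Return (valid, suspect) lists based on the age bounds above."""
    valid, suspect = [], []
    for record in records:
        if MIN_VOTING_AGE <= record.age <= MAX_PLAUSIBLE_AGE:
            valid.append(record)
        else:
            suspect.append(record)
    return valid, suspect


# Hypothetical records, for illustration only.
records = [
    VoterRecord("Sita", "Patna", 34),
    VoterRecord("Sita", "Gaya", 19545),
    VoterRecord("Ram", "Patna", 0),
    VoterRecord("Sita", "Patna", 52),
]

valid, suspect = split_by_plausible_age(records)
# Names shared by several valid records cannot identify a voter alone.
name_counts = Counter(record.name for record in valid)
ambiguous_names = [name for name, count in name_counts.items() if count > 1]
print(len(valid), len(suspect), ambiguous_names)
```

Suspect records are set aside rather than deleted, so they can be corrected against another source later.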
A tactical scenario
[Diagram: BJP website, bike website, job portal; slogan "India deserves better"]
Airbnb
- Airbnb’s team had a hunch that better photos would
increase rentals.
- They tested the idea with the least effort possible
that would still give them valid results.
- When the experiment showed good results, they
built the necessary components and rolled the change out to all
customers.
Shoppers Stop
Shoppers Stop stores retail clothing,
accessories, handbags, shoes, jewelry, fragrances,
cosmetics, health and beauty products, home furnishing
and decor products.
Shoppers Stop launched its e-store, with delivery
across major cities in India, in 2008. The website retails
all the products available at Shoppers Stop stores,
including apparel, cosmetics and accessories. Shoppers
Stop opened stores in Amritsar, Bhopal and
Aurangabad.
After analysing its First Citizen base, the company
observed that not all those who buy shirts also buy trousers.
But those who buy both men’s shirts and trousers
spend 60% more a year on average than those who buy only
shirts, and thrice as much as those who don’t buy men’s shirts at
all.
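The shirts-and-trousers observation is the kind of pattern that association rule learning measures. A minimal sketch of the three usual metrics (support, confidence and lift) for a rule such as "shirt -> trousers" follows. The baskets are hypothetical and are not Shoppers Stop data; the function assumes both items occur at least once.

```python
def rule_metrics(baskets, antecedent, consequent):
    """Support, confidence and lift for the rule antecedent -> consequent."""
    n = len(baskets)
    with_a = sum(1 for basket in baskets if antecedent in basket)
    with_c = sum(1 for basket in baskets if consequent in basket)
    with_both = sum(
        1 for basket in baskets if antecedent in basket and consequent in basket
    )
    support = with_both / n          # share of baskets containing both items
    confidence = with_both / with_a  # share of antecedent baskets with consequent
    lift = confidence / (with_c / n) # above 1: items co-occur more than chance
    return support, confidence, lift


# Hypothetical purchase baskets, for illustration only.
baskets = [
    {"shirt", "trousers"},
    {"shirt"},
    {"shirt", "trousers", "shoes"},
    {"shoes"},
    {"shirt"},
]

support, confidence, lift = rule_metrics(baskets, "shirt", "trousers")
print(support, confidence, lift)
```

A confidence below 1 corresponds to the slide's point that not every shirt buyer also buys trousers, which is what leaves room for a targeted promotion.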
9,00,000
- included customers who showed a pattern
of being interested in new brands in other non-trouser
categories. They were sent information on
new trouser brand launches and fits.
- exhibited multiple buying patterns in
other categories. They were sent attractive deals if
they bought two or more trousers.
- “control group” to measure success or
failure of the promotions.
The campaign produced a 30% increase in sales, equivalent to
30 crore.
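One way a control group can be used to measure such a promotion is to compare average spend between customers who received it and customers who did not. The sketch below is an assumption about how that comparison could be computed, not the company's actual method; the function name and all spend figures are hypothetical.

```python
def relative_uplift(treated_spend, control_spend):
    """Relative difference in mean spend between treated and control groups."""
    mean_treated = sum(treated_spend) / len(treated_spend)
    mean_control = sum(control_spend) / len(control_spend)
    return (mean_treated - mean_control) / mean_control


# Hypothetical per-customer spend, for illustration only.
treated_spend = [1200.0, 900.0, 1500.0]   # received the promotion
control_spend = [1000.0, 800.0, 1200.0]   # held out as the control group

uplift = relative_uplift(treated_spend, control_spend)
print(f"uplift: {uplift:.0%}")
```

Because both groups are observed over the same period, seasonal effects apply to both, and the difference can be attributed to the promotion with more confidence than a before-and-after comparison.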

