Big Data in Action: Operations, Analytics and more
Agenda
• Meet & Greet Introduction.
• Unfolding the term “Big Data”.
– Evolution of Data to Big Data: Static to Stream.
– 3 V’s of Big Data.
• Overview of Implementing Big Data
– Examples of implementation of Big Data
– Implementing Big Data with Hadoop infrastructure
– Implementing Big Data with NoSQL stores like Cassandra & MongoDB.
• Advantages of implementing Big Data solutions.
• Open Forum Discussion/ Networking.
Vibhu Bhutani
Technical Project Manager
I started as a Java developer and have many years of experience developing and managing
state-of-the-art applications. With extensive experience across the phases of the SDLC, I
lead the Innovations & Mobile Excellence team at Softweb Solutions. I am involved in
various innovative implementations, including Big Data systems, IoT implementations
and iBeacon development at Softweb Solutions.
in/vibhuis
Welcome
Unfolding the Term Big Data
• IBM reported in a study that every day we create roughly 2.5 quintillion bytes of data from
sources such as climate sensors, GPS signals, social media and online transactions, and that
90% of the world's data was created in the last couple of years. Big Data is the buzzword for
technology that shows the potential to process huge amounts of data so that we can get
valuable information out of it.
• How old is Big Data?
– It is as old as data itself; only the thresholds change every year. In 2012 it was about a couple of
petabytes, and now it is a few exabytes.
• Why do we hear about Big Data now?
– Although big data is old, more industries are now recognizing its implications. In 2004 Google
published a paper explaining the MapReduce technique for analyzing large datasets. Many other
companies followed, and the buzzword Big Data came into existence.
• Static data vs. dynamic data
• Static data VS Dynamic Data
Evolution of Data
Strange Fact
In 76 KB of hardwired memory, NASA
successfully took men to the Moon and
brought them back. With an 8 GB iPhone,
it could be done 108 times over.
Evolution of Data
Necessity is the mother of invention, and I believe technology is its father
3 V’s of Big Data
3 V’s of Big Data
4th V of Big Data
Application of Big Data - Cern
• In the 1960s CERN stored data on a mainframe
computer.
• In the 1970s CERN distributed data across several
machines, dividing the mainframe's work among
smaller pieces of equipment; CERNET was
introduced to bridge these machines and reduce
travel between them.
• In the 1980s these machines were placed in
different countries across the US and Europe, and
the Internet was introduced to connect them.
• Due to the enormous increase of data, in 2000 the
CERN Grid was introduced, connecting smaller
computers together to analyze and process the
data.
• Detectors with 150 million sensors are used in the
LHC, where protons collide at nearly the speed of
light. The detector works like a 3D camera, taking
pictures at a rate of 40 million per second. The
data is now stored in the cloud and analyzed using
big data techniques.
Implementation of Big Data - Cern
Proton injection for collision
Collision of particles, recording data in sensors
Other Industries using Big Data
• Government Application:
– The US government has invested heavily in big data applications. Big data analysis played a large role
in Barack Obama's successful 2012 re-election campaign.
– The Utah Data Center is a data center currently being constructed by the United States National
Security Agency. The exact amount of storage space is unknown, but more recent sources claim it
will be on the order of a few exabytes.
– Big data analysis was partly responsible for the BJP and its allies winning the Indian
General Election 2014.
– The UK government is utilizing big data to improve weather forecasting & new-drug release forecasts.
• Manufacturing Industries:
• Vast amounts of sensory data such as acoustics, vibration, pressure, current, voltage and controller
data, in addition to historical data, constitute big data in manufacturing. This generated big data
acts as the input to predictive tools and preventive strategies.
• Technology Industries:
• eBay and Amazon are industry leaders in maintaining large amounts of user search data for predictive
analysis. This helps to identify user needs and provide users with better results.
• Retail Industries:
• Walmart holds about 2.5 petabytes of data, handling 1 million customer transactions every hour.
• Amazon processes about USD 80,000 in transactions per hour and runs three of the world's largest databases.
Big Data Solutions - Hadoop
• Hadoop is an open-source system to reliably
store and process large amounts of information.
• A Big Data solution that handles the complexity
involved in the volume, variety and velocity of data.
• It turns commodity hardware into services that
handle petabytes of data in distributed
environments: "pigeon computing".
• Hadoop is redundant, reliable, powerful,
batch-process centric and distributed.
High Level Architecture of Hadoop
Map Reduce program – Word Count
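The word count shown here is the canonical MapReduce program. As an illustrative sketch only (pure Python standing in for the Hadoop Java code on the slide), the map, shuffle and reduce phases look like this:

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for every word in the input line.
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    # Shuffle phase: group all emitted values by key, as Hadoop does
    # between the map and reduce stages.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(word, counts):
    # Reduce phase: sum the counts for each word.
    return word, sum(counts)

lines = ["big data in action", "big data is big"]
pairs = [p for line in lines for p in mapper(line)]
result = dict(reducer(w, c) for w, c in shuffle(pairs).items())
print(result)  # {'big': 3, 'data': 2, 'in': 1, 'action': 1, 'is': 1}
```

In real Hadoop, the mapper and reducer run as separate tasks across the cluster and the shuffle moves data over the network; the logic per record is the same.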
Hadoop Implementation in Real World
• Yahoo:
– In 2008 Yahoo ran what it claimed was the world's largest Hadoop production application. Yahoo! Search
Webmap is a Hadoop application that runs on Linux with more than 10,000 cores.
• Facebook:
– In 2010 Facebook claimed to have the largest Hadoop cluster in the world, with 21 PB of storage. On
June 13, 2012 they announced the data had grown to 100 PB. On November 8, 2012 they announced that
the data gathered in the warehouse grows by roughly half a PB per day.
• As of 2013, Hadoop adoption is widespread. For example, more
than half of the Fortune 50 use Hadoop.
• The New York Times used 100 Amazon EC2 instances and a
Hadoop application to process 4 TB of raw TIFF image data (stored
in S3) into 11 million finished PDFs in the space of 24 hours, at a
computation cost of about $240 (not including bandwidth).
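Those numbers check out as a back-of-envelope calculation; note the $0.10 per instance-hour price below is an assumption based on EC2 small-instance pricing of that era, not a figure from the slide:

```python
# Hedged sanity check of the New York Times TIFF-to-PDF job cost.
instances = 100
hours = 24
price_per_instance_hour = 0.10  # assumed EC2 small-instance price at the time

cost = instances * hours * price_per_instance_hour
print(round(cost, 2))  # ~240, consistent with the ~$240 cited above
```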
Distributed System - CAP Theorem
Introduction to No SQL
• A NoSQL database provides a mechanism for the
storage and retrieval of data that is modeled by means
other than the tabular relations used in relational databases.
• Types of NoSQL Databases:
– Column: Cassandra, HBase
– Document: Apache CouchDB, MongoDB
– Key-value: Dynamo, Redis, Riak
– Graph: Neo4J
– Multi-model: OrientDB, Alchemy Database, CortexDB
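To make "modeled by means other than tabular relations" concrete, here is a minimal sketch using plain Python dicts as a stand-in for a document store; the record contents are invented for illustration:

```python
# Relational modeling: a fixed-column row; phone numbers would need a second table.
user_row = ("u42", "Asha", "Rao")

# Document modeling: one self-describing record with a dynamic, nested schema.
user_doc = {
    "_id": "u42",
    "name": {"first": "Asha", "last": "Rao"},
    "phones": ["+1-555-0100", "+1-555-0101"],  # nested list, no join table needed
}

# Documents in the same collection need not share a schema.
another_doc = {"_id": "u43", "name": {"first": "Ben"}, "title": "Engineer"}

print(user_doc["phones"][0])
```

This schema flexibility is exactly what the document stores above (CouchDB, MongoDB) formalize, with indexing and query languages layered on top.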
High Level Architecture - Cassandra
• Ring-based replication
• Only one type of server (cassandra)
• All nodes hold data and can answer queries
• No single point of failure
• Built for HA & scalability
• Multi-DC
• Data is found by key (CQL)
• Runs on the JVM
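The bullets above can be sketched in miniature. This is a toy Python model of ring-based, keyed placement, and a deliberate simplification: real Cassandra uses a partitioner such as Murmur3 with many virtual-node tokens per server, whereas this sketch hashes with `hashlib` and gives each node one token:

```python
import hashlib
from bisect import bisect_right

class ToyRing:
    """Toy Cassandra-style ring: every node owns a token range; data is found by key."""

    def __init__(self, nodes, replication_factor=3):
        self.rf = replication_factor
        # Place each node on the ring at the hash of its name.
        self.ring = sorted((self._hash(n), n) for n in nodes)

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode()).hexdigest(), 16)

    def replicas(self, partition_key):
        # Walk clockwise from the key's token; the next RF nodes hold the data.
        tokens = [t for t, _ in self.ring]
        start = bisect_right(tokens, self._hash(partition_key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.rf)]

ring = ToyRing(["node-a", "node-b", "node-c", "node-d"], replication_factor=3)
owners = ring.replicas("user:42")
print(owners)  # three distinct nodes; any of them can serve this key
```

Because placement is a pure function of the key, any node can compute where data lives and answer queries, which is why there is no single point of failure.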
High Level Architecture - Cassandra
High Level Architecture - Cassandra
Example: Single Row Partition
• Simple User system
• Identified by name (pk)
• 1 Row per partition
High Level Architecture - Cassandra
Example: Multiple Rows
• Comments on photos
• Comments are always selected by
the photo_id
• There are only 4 rows in 2 partitions
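The "4 rows in 2 partitions" layout can be mimicked with a short Python sketch (the comment rows are invented for illustration): grouping rows by their partition key, photo_id, the way Cassandra transposes multiple rows into one partition:

```python
from collections import defaultdict

# Four comment rows; photo_id is the partition key.
rows = [
    {"photo_id": "p1", "comment_id": 1, "text": "Nice shot"},
    {"photo_id": "p1", "comment_id": 2, "text": "Great light"},
    {"photo_id": "p2", "comment_id": 3, "text": "Love it"},
    {"photo_id": "p2", "comment_id": 4, "text": "Where is this?"},
]

# All rows sharing a partition key land in the same partition (the old "wide row").
partitions = defaultdict(list)
for row in rows:
    partitions[row["photo_id"]].append(row)

# Comments are always selected by photo_id: one partition read, no scatter-gather.
print(len(partitions))        # 2 partitions
print(len(partitions["p1"]))  # 2 rows in partition p1
```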
High Level Architecture - Cassandra
• Multiple rows are transposed into a single partition
• Partitions vary in size
• Old terminology - "wide row"
• Cassandra is built for fast writes. The data model should be denormalized so that as few reads as
possible are needed
High Level Architecture – Mongo DB
• Open-source, Document-oriented, popular for its
agile and scalable approach
• Notable Features :
– JSON/BSON data model with dynamic schema
– Auto-sharding for horizontal scalability
– Built-in replication with automated fail-overs
– Full, flexible index support including secondary
indexes
– Rich document-based queries
– Aggregation framework and Map / Reduce
– GridFS for large file storage
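As a minimal sketch of the document model and rich queries listed above: plain Python dicts stand in for BSON documents, and the `find` helper below is a hypothetical toy, not the pymongo API (a real query would go through a MongoDB driver and operators such as `$gt`):

```python
# A tiny in-memory "collection": documents need not share a fixed schema.
collection = [
    {"_id": 1, "sku": "abc", "qty": 25, "tags": ["blue"]},
    {"_id": 2, "sku": "def", "qty": 5},                       # no "tags" field
    {"_id": 3, "sku": "ghi", "qty": 60, "note": "backorder"}, # extra field
]

def find(coll, **criteria):
    # Toy equivalent of an exact-match query on any field.
    return [doc for doc in coll if all(doc.get(k) == v for k, v in criteria.items())]

print(find(collection, sku="def"))           # query by any field, indexed or not
high_qty = [d for d in collection if d["qty"] > 10]
print([d["_id"] for d in high_qty])          # range-style filter, like {"qty": {"$gt": 10}}
```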
High Level Architecture – Mongo DB
• Ensures High Availability, Redundancy, Automated
Fail-over
• Writes to the Primary, Reads from all
• Asynchronous replication
• In conventional terms, more like Master/Slave
replication
• Members can be configured to be: Secondary only
/ Non- Voting / Hidden / Arbiters / Delayed
When to use : Mongo DB
• Unstructured data from multiple suppliers
• GridFS : Stores large binary objects
• Spring Data Services
• Embedding and linking documents
• Easy replication set up for AWS
Advantages of using Big Data
Thank you for your patience.
Thank You!

Editor's Notes

  • #5: Example of streaming data: consider an application that searches for specific text in the emails we send. Emails can be treated as a stream of data; algorithms identify text matching a specific pattern and send an alert if something is found. Many government agencies are working on this kind of system nowadays.
  • #6: The image shows how data storage evolved. Archaeological findings show that around 2000 BC the Phaistos Disc was used to store information: clay discs that embedded data and preserved it for a long period of time. Later, people wrote things in pyramids, followed by stone tablets.
  • #7: Necessity is the mother of invention. The human brain always wants to know more, and to know more we need to process more. The information era gave us the data, and to process this data we created big data.
  • #8: The characteristics of Big Data consist of 3 V's: Volume, Variety and Velocity. Volume represents the bulk and size of data; every decade the definition of big data changes. Previously it was hard to store kilobytes of data, but now we store huge amounts on a smartphone. The image shows the amount of data being stored in different parts of the world. Next comes Variety, the categorization of big data: by categorizing data we make it easy for data analysts to group interdependent data and derive some advantage from it.
  • #9: Velocity represents the speed at which this data is generated. The image shows how fast we generate it. It is worth asking what happens with this enormous amount of data we are generating, which leads to the 4th V.
  • #10: Value: what is the value of analyzing the data? The image shows how various industries are utilizing and analyzing this data. Apart from monetary benefits, many other fields such as machine learning, scientific experiments and medicine benefit from Big Data.
  • #11: In 1962 Arthur Samuel wrote a computer program to play checkers. The program was defeated initially, but Samuel later wrote a subprogram to analyze the board and compute winning plays. When the subprogram was linked with the checkers program, the computer started to win. This was an early instance of artificial intelligence, where data generated by the computer was recorded and used to plan moves.
  • #13: Some car manufacturers gather data from sensors in the driver's seat; they identify the pattern when a driver feels sleepy and alert the driver by vibrating the steering wheel. The same technology is being used to detect theft based on sitting patterns.
  • #15: MapReduce is the processing part; it runs the computation and returns the results. The second part is HDFS, which stores all the data in files and directories and is highly scalable and distributed.
  • #16: This is a classic MapReduce program for word count.
  • #18: In theoretical computer science, the CAP theorem, also known as Brewer's theorem, states that it is impossible for a distributed computer system to simultaneously provide all three of the following guarantees: Consistency (all nodes see the same data at the same time), Availability (a guarantee that every request receives a response about whether it succeeded or failed) and Partition tolerance (the system continues to operate despite arbitrary message loss or failure of part of the system).
  • #19: A column in a distributed data store is a NoSQL object of the lowest level in a keyspace: a tuple (a key-value pair) consisting of three elements: unique name, value and timestamp. Document: a trivial example would be scanning paper documents, extracting the title, author and date (either by OCR or by having a human locate and enter them) and storing each document in a four-column relational database, the columns being author, title, date and a blob full of page images. Key-value: an associative array, map, symbol table or dictionary is an abstract data type composed of a collection of key-value pairs, such that each possible key appears just once in the collection. Graph: a graph database uses graph structures for semantic queries, with nodes, edges and properties to represent and store data.
  • #28: That is not to say there are no disadvantages: issues with finding the right talent; issues with finding the proper use case; impact on white-collar jobs due to the high demand for data scientists; and the difficulty of analyzing and extracting good data from big data.