The term Big Data is characterized by the 5 Vs:
Volume: a large volume of data.
Velocity: the rate at which data arrives (amount of data per second).
Variability (often listed as Veracity): the level of unintentional modification affecting data quality throughout the lifecycle of the data.
Value: the value derived from the data.
Variety: the wide range of data types received, such as video, audio, text, and images.
Source examples for the 5 Vs:
Volume: large volumes of video feeds are received and maintained at video sites such as YouTube and Vimeo.
Variety: a large variety of data (text, audio, video, images) is received by sites such as Facebook, Twitter, and other social media platforms.
Velocity: the speed at which data is received by sites such as Twitter and Facebook (1 billion people all feeding their data into one site).
Batch Processing vs. Real-Time Processing
Batch jobs run at a particular time of day, such as nightly or morning jobs, scheduled during slack time when the server has less load.
But people now want to see status in real time: in transportation, when a bus is arriving at a particular stop; in retail, as soon as customers update their status, businesses want to show real-time advertisements. This demand is driving the move towards Big Data.
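The difference between the two styles can be sketched in a few lines of Python. This is a minimal illustration under assumed names: the bus-stop events and the `nightly_batch` and `RealTimeCounter` names are hypothetical. The batch function produces counts only once the whole day's data is available, while the streaming counter updates its answer on every event.

```python
from collections import Counter
from typing import Iterable

def nightly_batch(events: Iterable[str]) -> Counter:
    """Batch style: count a whole day's stored events in one scheduled pass."""
    return Counter(events)

class RealTimeCounter:
    """Streaming style: update the count as each event arrives."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def on_event(self, stop_id: str) -> int:
        self.counts[stop_id] += 1
        return self.counts[stop_id]  # an up-to-date answer right after each event

# Hypothetical "bus arrived at stop" events.
events = ["stop-12", "stop-7", "stop-12"]

batch = nightly_batch(events)     # available only after the nightly run
live = RealTimeCounter()
for stop in events:
    live.on_event(stop)           # available immediately, event by event
print(batch == live.counts)       # True: same result, different latency
```

Both produce the same totals; what changes is when the answer becomes available.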
Problems differentiated by the 5 Vs
Velocity: with a large volume of data arriving, and a quick turnaround (low latency) required to reflect the data fed into Facebook, can it be managed by a regular DBMS?
A DBMS maintains ACID properties and enforces many constraints, such as primary keys, foreign keys, and check constraints. When quick turnaround or short latency is required, these constraints add processing time and increase the storage volume required. So these sites have their own file-based, DBMS-like storage systems that do not have these constraints. All data is maintained in files, the IDs assigned to the files are indexed, and the files are regularly moved. Publicly known examples include Cassandra, developed at Facebook and later open-sourced, and Google's BigTable.
Most of these databases are popularly categorized as NoSQL databases.
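A minimal sketch of the file-based idea described above, assuming a simple append-only log plus an in-memory index of IDs. It is similar in spirit to log-structured stores but is not the actual design of Cassandra or BigTable; the class and file names are hypothetical. It enforces no keys or check constraints, so each write is a single append.

```python
import os
import tempfile
from typing import Dict, Optional

class AppendOnlyStore:
    """Hypothetical file-based key-value store with no schema and no key or check
    constraints. Writes are appended to one data file; an in-memory index maps
    each ID to the byte offset of its latest record. Assumes keys contain no tabs
    and values contain no newlines."""

    def __init__(self, path: str) -> None:
        self.path = path
        self.index: Dict[str, int] = {}
        open(path, "ab").close()  # create the data file if it does not exist

    def put(self, key: str, value: str) -> None:
        record = f"{key}\t{value}\n".encode("utf-8")
        with open(self.path, "ab") as f:
            offset = f.seek(0, os.SEEK_END)  # new record starts at the current end of file
            f.write(record)
        self.index[key] = offset  # an update just points the ID at the newer record

    def get(self, key: str) -> Optional[str]:
        offset = self.index.get(key)
        if offset is None:
            return None
        with open(self.path, "rb") as f:
            f.seek(offset)
            _, value = f.readline().decode("utf-8").rstrip("\n").split("\t", 1)
        return value

with tempfile.TemporaryDirectory() as tmp:
    store = AppendOnlyStore(os.path.join(tmp, "data.log"))
    store.put("user:1", "Alice")
    store.put("user:1", "Alice Smith")  # no in-place UPDATE: the old record stays in the file
    latest = store.get("user:1")
print(latest)  # Alice Smith
```

Because updates never rewrite data in place, old records accumulate in the file; stores built this way typically reclaim that space later through background compaction.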
Technology | Company | Open sourced as
Cassandra | DataStax | Apache Cassandra (used by Facebook, LinkedIn, Twitter)
BigTable | Google | Proprietary; Apache HBase, modeled on BigTable, is the open-source counterpart (used by many companies, the most popular)
MongoDB | MongoDB Inc. | AGPL-licensed server, Apache-licensed drivers (written in C++)
Couchbase | Couchbase Inc. | Apache (written in C, C++, and Erlang)
Category | NoSQL databases
Column-oriented | Accumulo, Cassandra, HBase
Document | Clusterpoint, CouchDB, Couchbase, MarkLogic, MongoDB
Key-value | Dynamo, FoundationDB, MemcacheDB, Redis, Riak, FairCom c-treeACE
Graph | AllegroGraph, Neo4j, OrientDB, Virtuoso, Stardog
- Column-oriented databases store values column by column, rather than row by row as in a traditional RDBMS.
- This leads to better compression of the data, and hence less space is required to store the database.
- Still higher compression can be achieved with probabilistic databases.
- Similarly, document-oriented databases store and arrange data in the form of documents.
- Key-value stores hold data as a collection of key-value pairs, allowing pairs to be added (inserted) and deleted.
- Graph databases: every element holds a direct pointer to its adjacent elements, so no index lookup is required.
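To illustrate why storing values column by column compresses better, the sketch below pivots a few hypothetical sales rows into columns and applies run-length encoding (RLE), one common columnar compression technique. The data and function names are made up for illustration; real column stores combine several encodings.

```python
from itertools import groupby
from typing import Any, List, Tuple

# Hypothetical rows: (date, country, units sold). Stored row by row,
# adjacent values alternate between types and rarely repeat.
rows = [
    ("2013-07-01", "IN", 10),
    ("2013-07-01", "IN", 12),
    ("2013-07-01", "US", 7),
    ("2013-07-02", "US", 9),
]

def to_columns(rows: List[Tuple]) -> List[List[Any]]:
    """Pivot row-by-row records into one list per column (column-oriented layout)."""
    return [list(col) for col in zip(*rows)]

def run_length_encode(values: List[Any]) -> List[Tuple[Any, int]]:
    """Collapse consecutive repeats into (value, run_length) pairs."""
    return [(v, len(list(g))) for v, g in groupby(values)]

date_col, country_col, sales_col = to_columns(rows)
print(run_length_encode(date_col))     # [('2013-07-01', 3), ('2013-07-02', 1)]
print(run_length_encode(country_col))  # [('IN', 2), ('US', 2)]
```

Within a column, identical values sit next to each other, so four stored dates shrink to two (value, count) pairs; the same trick gains little on a row layout.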
Read more at the link below:
http://sandyclassic.wordpress.com/2013/07/02/data-warehousing-business-intelligence-and-cloud-computing
As we now know, Big Data solves the problems of the 5 Vs, such as the huge (V)olume of storage required for video sites like YouTube.
It is changing how we perceive, visualize, and analyze data: for example, HBase is used for data storage and Mahout is used to run analytics and find patterns. These databases hold a variety of data that requires different kinds of processing, which cannot be achieved by traditional RDBMS-based products. Example link below:
http://sandyclassic.wordpress.com/2013/06/18/gini-coefficient-of-economics-and-roc-curve-machine-learning/
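The linked article discusses the Gini coefficient and the ROC curve. As a separate illustration (not taken from the article), the sketch below computes the area under the ROC curve (AUC) from its pairwise-ranking definition and derives the Gini coefficient as Gini = 2*AUC - 1, the convention used in credit scoring. The labels and scores are made-up data.

```python
from typing import Sequence

def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve: the probability that a randomly chosen positive
    is scored higher than a randomly chosen negative (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def gini(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Gini coefficient of a binary classifier: 2 * AUC - 1."""
    return 2.0 * auc(labels, scores) - 1.0

labels = [1, 1, 0, 1, 0, 0]               # hypothetical true classes
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]   # hypothetical model scores
print(auc(labels, scores), gini(labels, scores))  # AUC = 8/9, Gini = 7/9
```

The double loop is O(positives x negatives), which is fine for a sketch; large data sets would sort the scores once instead.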
The MapReduce algorithm, created by Google researchers, was the starting point of everything we see in Big Data.
The Mapper divides the work into many parallel tasks, each emitting key-value pairs; these are sorted and filtered into queues, say one queue for each name (key).
The Reducer component aggregates or summarizes the data from these multiple units.
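The map, shuffle/sort, and reduce steps described above can be simulated in a single Python process with the classic word-count example. This is a teaching sketch, not Hadoop's API: the function names are hypothetical, and in a real cluster the mappers and reducers run in parallel on different machines.

```python
from collections import defaultdict
from itertools import chain
from typing import Dict, Iterable, List, Tuple

def mapper(split: str) -> List[Tuple[str, int]]:
    """Map step: emit (name, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort step: sort the pairs and route each value into one queue per key."""
    queues: Dict[str, List[int]] = defaultdict(list)
    for key, value in sorted(pairs):
        queues[key].append(value)
    return dict(queues)

def reducer(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: aggregate all values that share a key."""
    return key, sum(values)

splits = ["Alice Bob", "bob Carol", "alice bob"]  # hypothetical input splits
mapped = chain.from_iterable(mapper(s) for s in splits)  # one mapper per split
result = dict(reducer(k, v) for k, v in shuffle(mapped).items())
print(result)  # {'alice': 2, 'bob': 3, 'carol': 1}
```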
Big data technologies with Case Study Finance and Healthcare
Since the data is mostly unstructured, the best way to analyze it is with analytics, and here comes a new career called the Data Scientist.
Skill set required for a Data Scientist:
Mathematics (mostly statistics), computer science, and a domain such as sociology (for example, social media analysis).
One application of Big Data has been gathering feedback about products from social media.
Here is a sample project report showing how, and with what tools, social media can be analyzed:
http://www.slideshare.net/SandeepSharma65/social-media-analysis-project
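One simple (and admittedly crude) way to turn social media posts into product feedback is lexicon-based sentiment scoring. The sketch below assumes two tiny hypothetical word lists and made-up posts; it is not the method used in the linked report.

```python
import re
from typing import Dict, List

# Hypothetical word lists for illustration; real projects use curated lexicons or trained models.
POSITIVE = {"love", "great", "fast", "good"}
NEGATIVE = {"hate", "slow", "broken", "bad"}

def score(post: str) -> int:
    """+1 for each positive word, -1 for each negative word."""
    words = re.findall(r"[a-z']+", post.lower())
    return sum((w in POSITIVE) - (w in NEGATIVE) for w in words)

def summarize(posts: List[str]) -> Dict[str, int]:
    """Count how many posts lean positive, negative, or neutral."""
    summary = {"positive": 0, "negative": 0, "neutral": 0}
    for p in posts:
        s = score(p)
        key = "positive" if s > 0 else "negative" if s < 0 else "neutral"
        summary[key] += 1
    return summary

posts = [
    "Love the new phone, great camera!",
    "Battery is bad and charging is slow",
    "Got it yesterday",
]
print(summarize(posts))  # {'positive': 1, 'negative': 1, 'neutral': 1}
```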
Hadoop allows load to be distributed among many machines in a cluster.
There can be database clusters, OS clusters, and application or web-server-level clustering, but here we are dealing with an OS-like distributed file system (DFS).
The Hadoop Distributed File System (HDFS), developed at Yahoo, is modeled on the Google File System (GFS) that underlies Google's BigTable. It provides quick storage and retrieval of data in the form of files and is used by many social media platforms.
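The core HDFS idea, splitting a file into large fixed-size blocks and storing several copies of each block on different nodes, can be sketched as follows. The block size, node names, and round-robin placement are simplifying assumptions: HDFS uses much larger blocks (128 MB by default since Hadoop 2), a default replication factor of 3, and a rack-aware placement policy.

```python
from typing import Dict, List

def split_into_blocks(data: bytes, block_size: int) -> List[bytes]:
    """Cut a file into fixed-size blocks; the last block may be shorter."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes: List[str], replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.
    Simplification: real HDFS placement is rack-aware."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }

data = b"x" * 10                                  # tiny stand-in for a large file
blocks = split_into_blocks(data, block_size=4)    # hypothetical tiny block size
nodes = ["node1", "node2", "node3", "node4"]      # hypothetical data nodes
placement = place_replicas(len(blocks), nodes, replication=3)
print([len(b) for b in blocks])  # [4, 4, 2]
print(placement[0])              # ['node1', 'node2', 'node3']
```

Because every block lives on several nodes, losing one machine loses no data, and readers can fetch different blocks from different nodes in parallel.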
R is an open-source statistical analysis language with built-in statistical constructs, used for data analysis.
Java data mining APIs, .NET data mining APIs, and Python libraries are used to mine data and understand its trends.
Pig is another Apache Hadoop-based system; it provides a high-level language (Pig Latin) for analyzing large data sets.
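As a small example of the kind of trend mining these tools perform, the sketch below fits a straight-line trend by ordinary least squares in plain Python. In R the same fit is `lm()`, and in Python `numpy.polyfit(x, y, 1)` does it; the weekly mention counts here are hypothetical.

```python
from typing import Sequence, Tuple

def linear_trend(ys: Sequence[float]) -> Tuple[float, float]:
    """Fit y = slope * t + intercept by ordinary least squares, with t = 0, 1, 2, ..."""
    n = len(ys)
    if n < 2:
        raise ValueError("need at least two points")
    ts = range(n)
    t_mean = (n - 1) / 2
    y_mean = sum(ys) / n
    cov = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
    var = sum((t - t_mean) ** 2 for t in ts)
    slope = cov / var
    return slope, y_mean - slope * t_mean

weekly_mentions = [120, 135, 150, 170, 180]  # hypothetical counts
slope, intercept = linear_trend(weekly_mentions)
print(f"trend: {slope:+.1f} mentions per week")  # trend: +15.5 mentions per week
```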
Data Science: http://thedatascience.wordpress.com/
Big Data: http://thebigdatatrends.wordpress.com
Data Science Blog 2: http://thedatascientistview.blogspot.ie/
Retail generates a huge amount of data: products positioned on different shelves in a store, replenishment levels, reorder levels, merchandising, and assortment planning. Most of this data is structured, since many systems are automated, but there are also many forms, customer feedback, planning data, and emails and other chat platforms to analyze.
Large retail warehouses need to plan the positioning of products and containers in the aisles.
Trends from social media can be analyzed to find customer preferences for products and offers.
For retail innovation, read:
http://sandyclassic.wordpress.com/2013/10/26/retail-sector-innovations/
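The reorder level mentioned above is often computed with a textbook rule: reorder point = average daily demand × lead time in days + safety stock. The sketch below applies that rule; the formula choice, function names, and numbers are illustrative assumptions, not taken from the source.

```python
def reorder_point(avg_daily_demand: float, lead_time_days: float, safety_stock: float) -> float:
    """Stock level at which a replenishment order should be placed (textbook rule)."""
    return avg_daily_demand * lead_time_days + safety_stock

def needs_reorder(on_hand: float, rop: float) -> bool:
    """True once stock on hand has fallen to or below the reorder point."""
    return on_hand <= rop

# Hypothetical item: sells 40 units/day, supplier takes 5 days, keep 60 units as buffer.
rop = reorder_point(avg_daily_demand=40, lead_time_days=5, safety_stock=60)
print(rop, needs_reorder(on_hand=250, rop=rop))  # 260 True
```

With sensor and point-of-sale data, the demand input can be refreshed continuously instead of being estimated once per planning cycle.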
Retail uses many sensors to track items within the warehouse and inside the store. The huge volume of real-time data (video, text, and other forms) generated every millisecond by sensors embedded across every store and warehouse can be analyzed better in Hadoop or another Big Data-based system than by any other medium.
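Analyzing sensor data in real time usually means aggregating over a moving time window rather than over a whole stored data set. The sketch below keeps a sliding-window average of readings; the class name, window size, and readings are hypothetical, and it assumes readings arrive in timestamp order.

```python
from collections import deque
from typing import Deque, Tuple

class SlidingWindowAverage:
    """Average of the readings received in the last `window_ms` milliseconds,
    updated as each reading arrives. Assumes window_ms > 0 and timestamps
    that never decrease."""

    def __init__(self, window_ms: int) -> None:
        self.window_ms = window_ms
        self.readings: Deque[Tuple[int, float]] = deque()
        self.total = 0.0

    def add(self, timestamp_ms: int, value: float) -> float:
        self.readings.append((timestamp_ms, value))
        self.total += value
        # Evict readings that have fallen out of the window.
        while self.readings and self.readings[0][0] <= timestamp_ms - self.window_ms:
            _, old = self.readings.popleft()
            self.total -= old
        return self.total / len(self.readings)

window = SlidingWindowAverage(window_ms=1000)
for ts, temp in [(0, 10.0), (500, 20.0), (1200, 30.0)]:  # hypothetical readings
    print(ts, window.add(ts, temp))  # at t=1200 the reading from t=0 has left the window
```

Each reading is appended once and evicted once, so the work per reading stays constant on average however long the stream runs; a long-running version might periodically recompute `total` to limit floating-point drift.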
Finance is a game of numbers: huge amounts of data from books of accounts, P&L statements, balance sheets, etc. accumulate across different businesses over time. Most books are structured, and hence so is the data, but Hadoop offers huge, scalable clusters that can quickly analyze structured data as well.
Much social media data about interest in a share or any other instrument does get reflected in the numbers.
Spreadsheets are a popular medium of analysis, and other textual forms can be better analyzed if available on Hadoop-like clusters, as a kind of semi-structured data analysis.
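One way to check whether social media interest in a share "gets reflected in the numbers" is to correlate mention counts with price changes. The sketch below computes the Pearson correlation coefficient on two hypothetical daily series; a high coefficient shows association, not causation.

```python
import math
from typing import Sequence

def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series (between -1 and 1)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length series with at least two points")
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / (sx * sy)

mentions = [100, 150, 130, 200, 180]       # hypothetical daily mentions of a share
price_change = [0.1, 0.8, 0.4, 1.5, 1.1]   # hypothetical daily % price change
print(round(pearson(mentions, price_change), 3))
```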
