CLOUD-BASED BIG DATA ANALYTICS
INTRODUCTION:
• With the advent of the digital age, the amount of data being
generated, stored and shared has been on the rise. From data
warehouses, social media, webpages and blogs to audio/video
streams, all of these are sources of massive amounts of data.
• This data has huge potential, but it also brings ever-increasing
complexity, security risks, and a large share of irrelevant content.
• Big data, by definition, is a term used to
describe a variety of data (structured,
semi-structured and unstructured), which makes
it complex to store and manage.
• Big data is characterized by variety, volume,
velocity and veracity.
• The different types of data in a dataset
determine variety, while the rate at which data is
produced determines velocity.
• The size of the data is its volume.
• Veracity indicates how reliable the data is.
INTRODUCTION: CNTD…
• The cloud computing environment offers the
development, installation and
implementation of software and data
applications ‘as a service’, in three models:
• Software as a service (SaaS)
• Platform as a service (PaaS)
• Infrastructure as a service (IaaS)
• Infrastructure as a service is a model that
provides computing and storage resources as
a service.
• In the case of PaaS and SaaS, the cloud service
provides a software platform or the software itself, respectively.
LITERATURE SURVEY:
• Traditional data management tools and data processing or data
mining techniques cannot be used for big data analytics because of
the large volume and complexity of the datasets involved.
• Conventional business intelligence applications rely on
traditional analytics methods and techniques, and make use of
OLAP, BPM, data mining and database systems such as RDBMS.
• One of the most popular models used for data processing on
clusters of computers is MapReduce.
• Hadoop is an open-source framework that pairs an
implementation of the MapReduce model with a
distributed file system (HDFS).
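The MapReduce model mentioned above can be illustrated with a minimal single-machine sketch. This is not Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`) and the word-count task are assumptions chosen only to show the three stages that a framework would distribute across a cluster.

```python
from collections import defaultdict
from itertools import chain


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of documents."""
    mapped = chain.from_iterable(map_phase(doc) for doc in documents)
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Illustrative input, not taken from the slides.
docs = ["big data on the cloud", "cloud data analytics"]
counts = word_count(docs)
```

In a real cluster the map and reduce calls run in parallel on different machines and the shuffle moves data between them; here everything runs in one process so the data flow is easy to follow.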
PROBLEM STATEMENT:
• In order to move beyond the existing techniques and strategies
used for machine learning and data analytics, some challenges
need to be overcome. NESSI identifies the following
requirements as critical.
• In order to select an adequate method or design, a solid scientific
foundation needs to be developed.
• New efficient and scalable algorithms need to be developed.
• For proper implementation of devised solutions, appropriate
development skills and technological platforms must be identified and
developed.
• Lastly, the business value of the solutions must be explored just as
much as the data structure and its usability.
• This section describes two example applications where
large-scale data management over the cloud is used. These are specific
use-case examples in telecom and finance.
• In the telecom domain, massive volumes of call detail records
can be processed to generate near real-time network usage
information.
• In the finance domain, the example is a credit card fraud detection
application.
DESIGN, IMPLEMENTATION AND RESULT
ANALYSIS DETAILS:
1. Dashboard for CDR Processing:
• Telecom operators are interested in building a dashboard that would
allow the analysts and architects to understand the traffic flowing
through the network along various dimensions of interest.
• The traffic is captured using Call Detail Records (CDRs) whose volume
runs into a terabyte per day.
• A CDR is a structured record generated by the telecom switches to
summarize various aspects of individual services like voice, SMS, MMS,
etc.
• The dashboard's tasks include determining the cell site used most by each
customer, identifying whether users mostly make calls within their own cell
site, and, for cell sites in rural areas, identifying the source of traffic,
i.e. local versus routed calls.
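Two of the dashboard questions above can be sketched in a few lines. The record layout `(caller_id, caller_cell, callee_cell)` and the sample data are hypothetical simplifications; real CDRs carry many more fields.

```python
from collections import Counter, defaultdict

# Hypothetical, simplified CDRs: (caller_id, caller_cell, callee_cell)
cdrs = [
    ("alice", "C1", "C1"),
    ("alice", "C1", "C2"),
    ("alice", "C3", "C3"),
    ("bob", "C2", "C2"),
    ("bob", "C2", "C2"),
]


def most_used_cell(records):
    """Return the cell site each customer originates calls from most often."""
    usage = defaultdict(Counter)
    for caller, caller_cell, _ in records:
        usage[caller][caller_cell] += 1
    return {caller: cells.most_common(1)[0][0] for caller, cells in usage.items()}


def within_cell_share(records):
    """Return, per customer, the fraction of calls that stay inside one cell site."""
    total = Counter()
    local = Counter()
    for caller, caller_cell, callee_cell in records:
        total[caller] += 1
        if caller_cell == callee_cell:
            local[caller] += 1
    return {caller: local[caller] / total[caller] for caller in total}


top_cells = most_used_cell(cdrs)
local_share = within_cell_share(cdrs)
```

Both functions are simple group-and-count aggregations, which is why they map naturally onto a distributed aggregation framework once the record volume outgrows a single machine.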
• Given the huge and ever-growing customer base and large call volumes,
solutions built on a traditional data warehouse cannot keep up with
the rates required for effective operation.
• The need is to process the CDRs in near real-time, mediate them (i.e.,
collect CDRs from individual switches, stitch, validate, filter, and
normalize them), and create various indices that can be exploited by
the dashboard, among other applications.
• A system based on IBM Stream Processing Language (SPL) can
mediate 6 billion CDRs per day.
• CDRs can be loaded periodically into a cloud data management solution.
Because the cloud provides flexible storage, the storage required can be
decided according to traffic.
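The mediation steps listed above (collect, stitch, validate, filter, normalize) can be sketched as a chain of small functions. This is plain Python, not SPL, and everything specific here is an assumption made for illustration: the field names, the rule that partial records sharing a `call_id` are stitched by summing durations, and the rule that zero-length calls are filtered out.

```python
from itertools import chain

# Hypothetical raw CDRs from two switches; field names are illustrative.
switch_a = [
    {"call_id": "1", "caller": "+1-555-0100", "duration_s": "60", "service": "VOICE"},
    {"call_id": "2", "caller": "", "duration_s": "5", "service": "sms"},
]
switch_b = [
    {"call_id": "1", "caller": "+1-555-0100", "duration_s": "30", "service": "voice"},
    {"call_id": "3", "caller": "+1-555-0199", "duration_s": "0", "service": "voice"},
]


def collect(*switches):
    """Collect: merge the record streams of all switches into one stream."""
    return chain.from_iterable(switches)


def stitch(records):
    """Stitch: combine partial records with the same call_id (assumed rule: sum durations)."""
    merged = {}
    for rec in records:
        key = rec["call_id"]
        if key in merged:
            total = int(merged[key]["duration_s"]) + int(rec["duration_s"])
            merged[key]["duration_s"] = str(total)
        else:
            merged[key] = dict(rec)
    return merged.values()


def validate(records):
    """Validate: keep records with a caller and a numeric duration."""
    return (r for r in records if r["caller"] and r["duration_s"].isdigit())


def drop_zero_length(records):
    """Filter: drop calls with no duration (assumed business rule)."""
    return (r for r in records if int(r["duration_s"]) > 0)


def normalize(records):
    """Normalize: unify number format, types and service names."""
    for r in records:
        yield {
            "call_id": r["call_id"],
            "caller": r["caller"].replace("-", ""),
            "duration_s": int(r["duration_s"]),
            "service": r["service"].lower(),
        }


mediated = list(
    normalize(drop_zero_length(validate(stitch(collect(switch_a, switch_b)))))
)
```

Writing the later stages as generators mirrors how a stream processor passes records from one operator to the next without materializing the whole dataset.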
2. Credit Card Fraud Detection:
• More than one-tenth of the world’s population shops online. Credit
cards are the most popular mode of online payment. As the number of
credit card transactions rises, the opportunities for attackers to steal
credit card details and commit fraud also increase.
• As the attacker only needs to know some details about the card (card
number, expiration date, etc.), the only way to detect online credit card
fraud is to analyze spending patterns and detect any inconsistency
with respect to the usual spending patterns.
• Companies keep tabs on the geographical locations where credit
card transactions are made: if the area is far from the cardholder’s area
of residence, or if two transactions from the same credit card are made
in two very distant areas within a relatively short timeframe, then the
transactions are potentially fraudulent.
• Various data mining algorithms are used to detect patterns within the
transaction data. Detecting these patterns requires the analysis of large
amounts of data.
• Using the transaction tuples, one can compute parameters such as the
distance between the geographic locations of two consecutive transactions
and the amounts of these transactions. These parameters help identify
potentially fraudulent transactions. Further data mining, based on a particular
user’s spending profile, can be used to increase confidence that
a transaction is indeed fraudulent.
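The distance check between consecutive transactions can be sketched with the haversine great-circle formula. The tuple layout `(hours_since_start, lat, lon, amount)`, the sample coordinates and the 900 km/h threshold are assumptions for illustration, not values from the slides.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0
MAX_PLAUSIBLE_SPEED_KMH = 900.0  # assumed threshold, roughly airliner speed


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = (
        sin((lat2 - lat1) / 2) ** 2
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


def flag_suspicious(transactions):
    """Return indices of transactions that imply an implausible travel speed.

    transactions: list of (hours_since_start, lat, lon, amount), sorted by time.
    """
    flagged = []
    for i in range(1, len(transactions)):
        t0, lat0, lon0, _ = transactions[i - 1]
        t1, lat1, lon1, _ = transactions[i]
        distance = haversine_km(lat0, lon0, lat1, lon1)
        elapsed = t1 - t0
        speed = distance / elapsed if elapsed > 0 else float("inf")
        if distance > 0 and speed > MAX_PLAUSIBLE_SPEED_KMH:
            flagged.append(i)
    return flagged


# Hypothetical card history: two purchases in New York, then one in London an hour later.
txns = [
    (0.0, 40.7128, -74.0060, 25.0),
    (1.0, 40.7306, -73.9352, 60.0),
    (2.0, 51.5074, -0.1278, 900.0),
]
suspicious = flag_suspicious(txns)
```

A flag from this check is only a first signal; as the slide notes, it would be combined with the user's spending profile before a transaction is treated as fraud.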
• As the number of credit card transactions is huge and the processing
required is not typical relational processing (hence, warehouses are not
optimized for it), one can use a Hadoop-based solution
for this purpose, as depicted.
• Using Hadoop, one can create customer profiles as well as
matrices of consecutive transactions to decide whether a particular
transaction is fraudulent. As the fraud needs to be found within
some specified time, stream processing can help.
• By employing massive resources to analyze potentially fraudulent
transactions, one can meet the response time guarantees.
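Building a customer profile and scoring a new transaction against it can be sketched as a group-then-summarize job, the same shape a Hadoop job would take. The profile used here (mean and standard deviation of past amounts) and the deviation score are assumptions for illustration; the slides do not specify which profile features are used.

```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical transaction history: (card_id, amount)
history = [
    ("card-1", 20.0), ("card-1", 25.0), ("card-1", 30.0), ("card-1", 25.0),
    ("card-2", 500.0), ("card-2", 700.0),
]


def build_profiles(transactions):
    """Group amounts by card (map + shuffle), then summarize each group (reduce)."""
    amounts = defaultdict(list)
    for card, amount in transactions:
        amounts[card].append(amount)
    return {card: (mean(vals), pstdev(vals)) for card, vals in amounts.items()}


def deviation_score(profiles, card, amount):
    """Number of standard deviations by which an amount departs from the card's mean."""
    avg, spread = profiles[card]
    if spread == 0:
        return 0.0 if amount == avg else float("inf")
    return abs(amount - avg) / spread


profiles = build_profiles(history)
score_normal = deviation_score(profiles, "card-1", 27.0)
score_odd = deviation_score(profiles, "card-1", 400.0)
```

The profiles can be rebuilt in batch while incoming transactions are scored against them as they arrive, which matches the split between Hadoop and stream processing described above.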
3. Result Analysis:
• Several open-source data mining techniques, resources
and tools exist. These include R, GATE,
RapidMiner and Weka, in addition to many others.
• Cloud-based big data analytics solutions must make
these affordable analytics tools available
on the cloud so that cost-effective and
efficient services can be provided.
• The fundamental reasons for the popularity of cloud-based
analytics are easy accessibility,
cost-effectiveness and ease of setting up and testing.
CONCLUSION AND FUTURE RESEARCH
DIRECTION:
• This is an age of big data, and the emergence of this field of
study has attracted the attention of many practitioners and
researchers.
• Considering the rate at which data is being created in the
digital world, big data analytics has become all
the more relevant.
• Cloud infrastructure meets the storage and computing
requirements of data analytics algorithms. On the other hand,
open issues such as security, privacy and the lack of ownership and
control remain.
• Research studies in the area of cloud-based big data analytics
THANK YOU
