Anexinet Big Data

Solutions for Big Data Analytics
Big Data Defined

Volume
• Datasets that grow too large to easily manage in traditional RDBMS
• TBs, PBs, ZBs

Velocity
• Large volume streaming data that can overwhelm traditional BI & ETL processes

Variety
• Data sources extraneous to traditional business systems that can be unstructured and require text analytics

Value
• Big Data can have a transformational effect on business when the proper systems and processes are put in place
Big Data vs. Classic BI

• How does Big Data Analytics differ from classic DW/BI?
  * Businesses today treat data warehousing & business intelligence as must-have reporting and operational capabilities
  * Businesses that are not fully mature in the BI lifecycle may struggle with Big Data
• Big Data projects look for untapped analytics, not BI dashboards
• SCALE: Think Volume, Variety and Velocity
  * Yahoo! uses Microsoft SQL Server & Analysis Services with Hadoop, Oracle & Tableau
      * 38,000 machines distributed across 20 different clusters
  * A 2-petabyte Hadoop cluster feeds 1.2 terabytes of raw data each day into Oracle RAC
  * The data is compressed, and 135 gigabytes per day is sent to a SQL Server 2008 R2 Analysis Services cube
  * The cube produces 24 terabytes of data each quarter
  * http://www.microsoft.com/casestudies/Case_Study_Detail.aspx?CaseStudyID=710000001707
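As a rough back-of-the-envelope check on the Yahoo! figures above, the sketch below works out how much the daily feed shrinks between Oracle RAC and the Analysis Services cube, and how much data reaches the cube per quarter. It assumes decimal units (1 TB = 1000 GB) and a 91-day quarter, neither of which the slide states, and the constant and function names are purely illustrative.

```python
# Figures quoted on the slide.
RAW_TB_PER_DAY = 1.2            # raw data fed into Oracle RAC each day
CUBE_FEED_GB_PER_DAY = 135      # data sent on to the SSAS cube each day
CUBE_OUTPUT_TB_PER_QUARTER = 24 # data the cube produces each quarter

# Assumptions (not stated in the source).
GB_PER_TB = 1000
DAYS_PER_QUARTER = 91


def reduction_ratio() -> float:
    """Ratio of daily raw volume to daily volume sent to the cube."""
    return (RAW_TB_PER_DAY * GB_PER_TB) / CUBE_FEED_GB_PER_DAY


def quarterly_feed_tb() -> float:
    """Total volume sent to the cube over one assumed 91-day quarter, in TB."""
    return CUBE_FEED_GB_PER_DAY * DAYS_PER_QUARTER / GB_PER_TB


if __name__ == "__main__":
    print(f"Daily reduction ratio: about {reduction_ratio():.1f} to 1")
    print(f"Fed to cube per quarter: about {quarterly_feed_tb():.1f} TB")
    print(f"Cube output per quarter: {CUBE_OUTPUT_TB_PER_QUARTER} TB")
```

Under these assumptions the daily feed shrinks by roughly a factor of nine before it reaches the cube. The slide attributes this to compression, though filtering or aggregation may also contribute.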
Scalable Big Data Platform Architecture

[Diagram: four stages, left to right]
• Hadoop: HDFS Cluster, MapReduce Framework
• Data Warehouse: MPP Database, Star Schemas
• Analytics: In-memory cubes, Analytical Columnstore Tables
• End User Reporting: Advanced in-memory analytics, Ad-hoc data discovery

© Copyright 2013 Anexinet Corp.
Go Beyond Dashboards. Provide Advanced Analytics.

• Large number of data points adds new business value
• Big Data advanced analytics requires a tool that can sample complex data sources
• Must provide quick aggregations of large data sets that are easily consumed by the human eye
• Must provide “data discovery” for ad-hoc analysis

[Screenshots: Tableau, Microsoft Power View, QlikView]
Marketing Samples

• Enhance marketing campaigns with Big Data
• Social analytics, customer analytics, targeted marketing, brand sentiment
• Big Data has proven transformational for marketing organizations (Razorfish, Yahoo!, NBC, [x+1])

[Screenshot: Web Analytics from Google Analytics]
Anexinet Big Data Offerings

Strategy Engagement
• Customer stakeholder interviews & interactive sessions
• Define Big Data Requirements
• Design Big Data Strategy
• Deliver Strategy & Roadmap Documents


Starter Solution
• Let Anexinet handle the hardest parts of a Big Data solution
  * Getting started
  * Collecting & processing data
  * Uncovering business value from Big Data


Big Data Project Engagement
• End-to-end Big Data project
  * Big Data Discovery
  * Big Data Platform
  * Big Data Analytics
  * Big Data Visualizations
Partnerships

Big Data Platforms
• EMC Greenplum
• Hortonworks (OSS, MSFT, HP)
• Cloudera (OSS, Oracle, HP)

Big Data Databases
• HP Vertica
• EMC Greenplum
• Microsoft PDW
• Oracle Exalytics
• Oracle Big Data Appliance

Big Data Visualizations
• QlikView
• Tableau
• Microsoft PowerPivot
• Microsoft Power View
A Credible Partner to Deploy Big Data Solutions

Security
• Ensure privacy of PII
• Conform Big Data solution to your enterprise security standards

Integration
• ETL / ELT
• Integrate Hadoop into your DW & Analytics environments
• Integrate Big Data into your IT investments

Configuration
• Configure the Big Data environment to maximize throughput, performance and analytics to meet your stated SLA goals

Governance
• Ensure Data Quality
• MDM
• Process Governance
Top Impediments to Successful Big Data Analytics
Big Data Buzzword Glossary

• Big Data: Think of the 3 V's (volume, velocity, variety): unstructured data and data not currently managed in the DW. This is the data that companies need for game-changing analytics.
• Big Data Analytics: Business insights gained from mining Big Data to transform business processes.
• Columnar: Column-oriented databases, used in Big Data scenarios for their speed and compression capabilities, e.g. HP Vertica. HBase, a column-family store, is often grouped here as well.
• Hadoop: Apache open-source framework for Big Data processing, made up of multiple components. The leading Big Data platform. Commercial distributions are marketed by Cloudera & Hortonworks.
• In-memory DB: A database that resides fully in memory, avoiding disk I/O bottlenecks. Very important in Big Data Analytics systems, e.g. Microsoft PowerPivot, SSAS 2012, SAP HANA.
• MapReduce: Distributed data programming and processing framework. A key aspect of processing Big Data is running a MapReduce framework across distributed clusters of commodity servers. Available as open source in the Hadoop framework and in the various Hadoop distributions.
• MPP: Massively Parallel Processing database engine, mostly used for data warehouse & BI workloads, e.g. SQL Server PDW, IBM Netezza, Teradata.
• NoSQL: Schemaless, non-relational data stores (key-value and similar models) built for fast writes, typically trading strict ACID guarantees for eventual consistency. Big Data systems use these to store data arriving from sources that dump large amounts of data quickly, e.g. Cassandra, MongoDB.
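To make the MapReduce entry concrete, here is a minimal single-process sketch of the map, shuffle and reduce phases, using word counting as the example. It is not Hadoop code: a real framework would run these phases across many servers, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical names chosen for illustration.

```python
from collections import defaultdict
from itertools import chain


def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single total."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


if __name__ == "__main__":
    lines = ["big data big value", "data velocity"]
    print(word_count(lines))
```

Because each call to `map_phase` sees only one record and each call to `reduce_phase` sees only one key, both phases can in principle be spread across a cluster of commodity servers, which is the property the glossary entry refers to.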
