Case Study: Big Data Analytics
The client is a US-based product company that offers Enterprise Search & Discovery
Platforms based on Big Data technologies. Many large enterprises use the platform
to transform their most valuable asset, information, into valuable insights.
The client's product connects all the diverse information sources available in an
enterprise (email systems, enterprise resource planning systems, customer relationship
management systems, document management systems such as SharePoint, and many others)
into a single source. The client has added a new SQL interface that allows their
customers to search using familiar SQL.
Challenges
Benchmarking Different Data Sources and Search Platforms for Today’s Digital Enterprise
The product's SQL interface allows information in the system to be queried with
SQL. The client wanted to know how their product performed compared with other
alternatives, which might or might not be similar. The mandate was to evaluate
the performance of their product against the open-source Hadoop+Hive ecosystem
using TPC-H, the industry-standard benchmark for relational databases.
The objective was to identify areas of improvement in the product and to make
recommendations to their customers.
Benchmarking proprietary Big Data-based enterprise search and discovery
platforms against relational databases and open-source Big Data systems
Solution
- Researched the TPC-H benchmarking standard and generated test data for a specific scale factor
- Identified an appropriate hardware profile for all the data sources
- Added a MySQL Server and ran the TPC-H benchmark against it to establish a baseline
- Built a Hadoop cluster, added Hive, imported the data from MySQL using Sqoop, and ran the TPC-H benchmark against this system
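The scale factor (SF) determines how much data TPC-H generates. As a rough sizing aid, the Python sketch below computes approximate per-table row counts from the base cardinalities in the TPC-H specification: most tables grow linearly with SF, while NATION and REGION stay fixed. The function name `expected_row_counts` is hypothetical; in practice TPC-H data is usually produced with the specification's `dbgen` tool, and the scale factor used in this exercise is not stated here.

```python
# Approximate TPC-H base-table cardinalities at scale factor (SF) 1.
# NATION and REGION are fixed; the others scale linearly with SF.
# LINEITEM is only approximately 6,000,000 x SF because the number of
# line items per order varies.
BASE_ROWS_SF1 = {
    "region": 5,
    "nation": 25,
    "supplier": 10_000,
    "customer": 150_000,
    "part": 200_000,
    "partsupp": 800_000,
    "orders": 1_500_000,
    "lineitem": 6_000_000,  # approximate
}

FIXED_TABLES = {"region", "nation"}


def expected_row_counts(scale_factor: float) -> dict[str, int]:
    """Return approximate row counts for each TPC-H table at a given scale factor."""
    if scale_factor <= 0:
        raise ValueError("scale factor must be positive")
    counts = {}
    for table, rows in BASE_ROWS_SF1.items():
        if table in FIXED_TABLES:
            counts[table] = rows  # independent of SF
        else:
            counts[table] = int(round(rows * scale_factor))
    return counts


if __name__ == "__main__":
    for sf in (1, 10, 100):
        counts = expected_row_counts(sf)
        print(f"SF={sf}: orders={counts['orders']:,}, lineitem~{counts['lineitem']:,}")
```

Estimating row counts up front helps when choosing the hardware profile, since LINEITEM and ORDERS dominate storage and load time at every scale factor.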
"TenXLabs enhanced our own ideas just like a consultant. They were very
thoughtful and enabled us to gain valuable outputs. They allowed us to grab
the competitive advantage in the digitally transformed world. TenXLabs
assisted us in performing high-level testing for our different data sources."
+1 267-507-6135 (US) | +91 404-646-5532 (INDIA) | sales@tenxlabs.com
© TenXLabs Technologies. All rights reserved.
www.tenxlabs.com
BNY Mellon Center, 1735 Market Street, Suite 3750, Philadelphia PA, USA
Block A, IIIT Campus, Gachibowli, Hyderabad - 500 032, INDIA

TenXLabs took a holistic approach to this benchmarking exercise, taking both
relational and non-relational sources into consideration, and:
- Researched the client's product, built a cluster, wrote custom connectors to import data from MySQL, and ran the TPC-H benchmark against this system
- Built a JDBC/JUnit-based test harness to run the TPC-H benchmark against any system that provides a JDBC interface
- Built a web-based tool to run ad-hoc SQL queries against the client's product, Hadoop+Hive, and MySQL and to collect performance statistics in real time
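The harness code from this engagement is JDBC/JUnit-based and is not part of this case study. As an illustration of the same idea, here is a Python sketch that uses the DB-API 2.0 interface (PEP 249) as a stand-in for JDBC: any connection factory can be plugged in, each query runs several times, and min/median/max timings are collected. The function names, the in-memory SQLite data source, and the simplified query loosely modelled on TPC-H Q1 are all assumptions for the demo, not the client's setup.

```python
import sqlite3
import statistics
import time
from typing import Any, Callable

# Simplified query loosely modelled on TPC-H Q1 (pricing summary report);
# the real Q1 has more aggregates and a ship-date filter.
Q1_SIMPLIFIED = """
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity) AS sum_qty,
       SUM(l_extendedprice) AS sum_base_price,
       COUNT(*) AS count_order
FROM lineitem
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""


def time_query(conn: Any, sql: str, runs: int = 3) -> dict[str, float]:
    """Run `sql` `runs` times on a DB-API connection; return timings in seconds."""
    if runs < 1:
        raise ValueError("runs must be >= 1")
    timings = []
    for _ in range(runs):
        cur = conn.cursor()
        start = time.perf_counter()
        cur.execute(sql)
        cur.fetchall()  # include result retrieval in the measurement
        timings.append(time.perf_counter() - start)
        cur.close()
    return {"min": min(timings),
            "median": statistics.median(timings),
            "max": max(timings)}


def run_suite(connect: Callable[[], Any], queries: dict[str, str],
              runs: int = 3) -> dict[str, dict[str, float]]:
    """Open one connection via the (hypothetical) factory, time every query, then close."""
    conn = connect()
    try:
        return {name: time_query(conn, sql, runs) for name, sql in queries.items()}
    finally:
        conn.close()


def sqlite_demo_connection() -> sqlite3.Connection:
    """Stand-in data source: a tiny in-memory LINEITEM table with illustrative rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE lineitem (l_returnflag TEXT, l_linestatus TEXT, "
                 "l_quantity REAL, l_extendedprice REAL)")
    conn.executemany("INSERT INTO lineitem VALUES (?, ?, ?, ?)",
                     [("A", "F", 17, 21168.23), ("N", "O", 36, 45983.16),
                      ("R", "F", 8, 13309.60)])
    conn.commit()
    return conn


if __name__ == "__main__":
    stats = run_suite(sqlite_demo_connection, {"q1_simplified": Q1_SIMPLIFIED})
    for name, s in stats.items():
        print(f"{name}: min={s['min']:.6f}s median={s['median']:.6f}s max={s['max']:.6f}s")
```

Adding another system to this sketch only requires another connection factory (for example, one backed by a DB-API-compliant driver for MySQL or Hive), which mirrors the "any system that provides a JDBC interface" design of the original harness.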
Benefits
TenXLabs' approach gave the client a repeatable and reliable solution for
benchmarking their product. The solution was both extensible, allowing more
data sources to be added in the future, and scalable, supporting TPC-H data
at multiple scale factors.
As a result, the client was able to identify areas of focus and was pleasantly
surprised to see their product performing exceptionally well, in some cases
even compared with MySQL, something they had not anticipated.
TenXLabs is currently engaged with the client to enhance and expand this
solution.
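Comparing each system with the MySQL baseline comes down to simple per-query arithmetic. A minimal Python sketch, assuming timings are available as seconds per query name: the ratio baseline/candidate is above 1.0 where the candidate is faster, and a geometric mean summarizes the ratios across queries (TPC-H's own Power metric likewise uses a geometric mean of timing intervals). The function names and sample timings are hypothetical and do not reflect results from this engagement.

```python
import statistics


def relative_to_baseline(candidate: dict[str, float],
                         baseline: dict[str, float]) -> dict[str, float]:
    """Per-query ratio baseline_time / candidate_time.

    A value above 1.0 means the candidate system ran that query faster than
    the baseline; queries missing from the candidate are skipped.
    """
    ratios = {}
    for query, base_seconds in baseline.items():
        if query not in candidate:
            continue
        cand_seconds = candidate[query]
        if cand_seconds <= 0 or base_seconds <= 0:
            raise ValueError(f"non-positive timing for {query}")
        ratios[query] = base_seconds / cand_seconds
    return ratios


def overall_ratio(ratios: dict[str, float]) -> float:
    """Geometric mean of per-query ratios, so no single query dominates."""
    return statistics.geometric_mean(ratios.values())


if __name__ == "__main__":
    # Hypothetical timings in seconds, for illustration only.
    mysql = {"q1": 12.0, "q6": 3.0}
    product = {"q1": 6.0, "q6": 3.0}
    r = relative_to_baseline(product, mysql)
    print(r)                            # {'q1': 2.0, 'q6': 1.0}
    print(f"{overall_ratio(r):.3f}")    # 1.414
```

The geometric mean is used instead of an arithmetic mean because ratios are multiplicative: a 2x speedup on one query and a 2x slowdown on another should net out to 1.0.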