The Big Data Journey
Connexity
Shopping powers our marketing platforms
• Paid Search & Marketplace
Performance-based marketing that finds in-market shoppers and delivers conversions at lower cost.
• Bizrate Insights
A reporting and ratings platform that captures the power of the consumer voice.
• Display Media
An audience activation platform that integrates retail data and programmatic buying.
Connexity History
Don’t worry - there won’t be a test later
Connexity Technology
The Pre-Big Data Era
Connexity Technology
The Big Data Explosion
Lessons Learned
“There’s a funny thing about regret... It’s better to regret something you have done, than something you haven’t.” – Gibby Haynes
R & D
10% time: Give all engineers the opportunity to experiment
A few of our production graduates
o Use of Cassandra
o SitePerf: in-house availability monitoring tool
o Several different customer-facing advertising products
o Hadoop implementations of core bidding platform
o Mock Service: Like Wiremock with persistence to MySQL
o Numerous internal tools for managing our systems
Quality Assurance
Any new technology choice should improve or maintain test automation coverage
Case Study: Hadoop + Solr + BDD
Existing Technologies
Reasons to stay with an older technology
1. It works well
2. Your business depends on it
3. Your team is very knowledgeable in its operation
New Technologies
Reasons to use a new technology
1. It makes new things possible or very difficult things easier
• Hadoop / MapReduce
• Auto-sharding distributed key-value data stores (Cassandra, HBase, VoltDB, Riak, etc.)
• Distributed stream-processing systems (Storm)
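One common way a store can shard data automatically is consistent hashing: every key hashes to a position on a ring and belongs to the next node clockwise, so adding a node relocates only a fraction of the keys. The sketch below is a toy version in Python, chosen only for brevity. It is not the partitioner of Cassandra, HBase, VoltDB, or Riak (these differ in detail, and some partition by key range rather than by ring), and `HashRing`, `node_for`, and the node names are hypothetical.

```python
import bisect
import hashlib


def _hash(value: str) -> int:
    """Map a string to a stable integer position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Toy consistent-hash ring (hypothetical, for illustration only)."""

    def __init__(self, nodes, vnodes=64):
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node gets several virtual positions to spread load more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._positions = [pos for pos, _ in self._ring]

    def node_for(self, key: str) -> str:
        """Return the node owning the first ring position at or after the key."""
        idx = bisect.bisect_left(self._positions, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


# Usage: the caller never consults a manual shard table.
ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
```

When a fourth node joins, a key either stays where it was or moves to the new node. Keys never shuffle between the existing nodes, which is what keeps rebalancing cheap.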
2. It will save your company money
• Hardware
• Software Licensing
• Bandwidth
• Power Consumption
3. It will save you time
• Time to market
• Time spent on operational complexity
• Time fighting fires
• Compute time
4. It brings you in line with industry standards
• Moving from home-grown frameworks to Hadoop, Solr
• Where possible, running on JVM-based systems
Big Data Trends
o Like you, our working dataset is only growing
o We are consolidating the number and variety of NoSQL solutions that we use
o We’re looking at better abstractions for Java MapReduce programming: Crunch, Cascading, …
o Have dipped our toes in the water with Storm, but expect heavier stream-processing needs soon
o Still looking for a bulletproof way of importing data from various sources into Hadoop: LinkedIn’s Gobblin shows some promise there
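For readers less familiar with the model that libraries like Crunch and Cascading wrap, the three phases of a MapReduce job can be sketched in a few lines. This is a single-process toy in Python, chosen only for brevity. It does not use Hadoop or any of the libraries named above, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import chain


def map_phase(line):
    """Mapper: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reducer: sum the counts collected for one word."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over an iterable of lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


counts = word_count(["big data", "Big journey"])
```

Even this small job needs three separate pieces wired together by hand. Higher-level pipeline APIs aim to remove that plumbing once a job chains several such stages.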

Big Data Day LA 2015 - The Big Data Journey: How Big Data Practices Evolve at Connexity by Will Gage of Connexity


Editor's Notes

  • #9: This property has been true of most big-data technologies we’ve worked with, especially open source ones. Any technology that represents a step back in testability should give you a horrible icky feeling. This example is Cucumber’s Gherkin DSL. It executes with every build, runs against MiniMRCluster, starts a real Solr instance, and executes all the real code in integration.