sqrrl data, INC.
                                                        Secure. Scale. Adapt.


                                                                        Adam Fuchs, Chief Technology Officer




info@sqrrl.com | @sqrrl_inc | 617.520.4375   sqrrl data, INC., All Rights Reserved
Who We Are

                sqrrl data, Inc. is the commercial provider of:

                Mature Database Technology - Apache Accumulo
                Fine-Grained Access Controls - Data Integration and Sharing
                Proven Performance - Petabytes and Beyond
                Advanced Analytics - Search, Statistics, and Graphs


Contents


                    Core Philosophy
                    Technology
                    Techniques
                    Application APIs




Apache Accumulo Perspective

      [Diagram: three Data sources and three Applications]

      Integration across:
        Multiple business lines
        Multiple data sets
        Multiple applications
        Multiple security, privacy, legal, policy, regulatory, and compliance constraints
        New demands




Accumulo Design Drivers

                       Cell-Level Security
       1                Express common security requirements in the infrastructure, not just in the application
                        Data-centric approach encourages secure sharing



                      Scalability
       2               Near-linear performance scaling to thousands of nodes
                       Durable and reliable despite the higher failure rates that come with scale



                      Diverse, Interactive Analytics
       3               Sorted key/value core performs well in a diverse set of domains
                       Information retrieval, statistics, graph analysis, geo indexing, and more


                      Flexible, Adaptive Schema
       4               Start with universal structures and indexing
                       Refine the schema over time


Contents: Technology




Accumulo Key Structure

      An Accumulo key is a 5-tuple, consisting of:
           Row: Controls Atomicity
           Column Family: Controls Locality
           Column Qualifier: Controls Uniqueness
           Visibility Label: Controls Access
           Timestamp: Controls Versioning


      Row        Col. Fam.      Col. Qual.      Visibility     Timestamp   Value
      John Doe   Notes          PCP             PCP_JD         20120912    Patient suffers from an acute …
      John Doe   Test Results   Cholesterol     JD|PCP_JD      20120912    183
      John Doe   Test Results   Mental Health   JD|PSYCH_JD    20120801    Pass
      John Doe   Test Results   X-Ray           JD|PHYS_JD     20120513    1010110110100…

                                              Accumulo Key/Value Example
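The sort order is what makes the 5-tuple work: Accumulo keeps keys ordered by row, column family, column qualifier, and visibility, with newer timestamps sorting first. The sketch below models that ordering on the example rows above, plus one hypothetical older Cholesterol reading, and keeps only the newest version of each cell, similar in spirit to Accumulo's default versioning behavior. The `Key` class and helper names are illustrative assumptions, not Accumulo's Java API, and Python string comparison stands in for Accumulo's byte-wise comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Key:
    """Hypothetical Python model of the 5-tuple key (not Accumulo's Java Key class)."""
    row: str
    family: str
    qualifier: str
    visibility: str
    timestamp: int

    def sort_key(self):
        # Ascending by row, family, qualifier, visibility; descending by
        # timestamp, so the newest version of a cell sorts first.
        return (self.row, self.family, self.qualifier, self.visibility, -self.timestamp)

    def cell(self):
        # Everything except the timestamp: all versions of one cell share this.
        return (self.row, self.family, self.qualifier, self.visibility)


def sort_entries(entries):
    """Sort (Key, value) pairs into the order a table stores them."""
    return sorted(entries, key=lambda kv: kv[0].sort_key())


def newest_only(sorted_entries):
    """Keep the first (newest) version of each cell and drop older versions."""
    result, last_cell = [], None
    for key, value in sorted_entries:
        if key.cell() != last_cell:
            result.append((key, value))
            last_cell = key.cell()
    return result


# The rows from the example table (values shortened as on the slide).
entries = [
    (Key("John Doe", "Test Results", "X-Ray", "JD|PHYS_JD", 20120513), "1010110110100…"),
    (Key("John Doe", "Notes", "PCP", "PCP_JD", 20120912), "Patient suffers from an acute …"),
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912), "183"),
    (Key("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801), "Pass"),
    # Hypothetical older reading of the same cell, to show versioning.
    (Key("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20110301), "201"),
]

for key, value in newest_only(sort_entries(entries)):
    print(key.family, key.qualifier, key.timestamp, value)
```

Because all of John Doe's cells share one row, they sort next to each other, and the row is the unit of atomicity for updates.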

                                                                                                                   7
info@sqrrl.com | @sqrrl_inc | 617.520.4375      sqrrl data, INC., All Rights Reserved
Secure. Scale. Adapt.
Visibility Syntax & Semantics
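The content of this slide did not survive extraction. As a stand-in, here is a minimal sketch of how a visibility label such as `JD|PCP_JD` from the example above can be checked against a user's authorizations, assuming Accumulo's expression syntax: labels combined with `&` (and) and `|` (or), with parentheses required when the two operators are mixed. The function names and the `HIPAA` label in the test are hypothetical, quoted labels are not handled, and in Accumulo itself this check runs inside the server, not in client code.

```python
import re

# Hypothetical tokenizer: bare labels plus the operators & | ( ).
# Accumulo also accepts quoted labels, which this sketch does not handle.
TOKEN = re.compile(r"\s*([A-Za-z0-9_\-:./]+|[&|()])")


def tokenize(expression):
    tokens, pos, text = [], 0, expression.strip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character at {pos} in {expression!r}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def can_read(expression, authorizations):
    """True if a user holding `authorizations` may read a cell labeled `expression`."""
    tokens = tokenize(expression)
    if not tokens:
        return True  # an empty visibility label is readable by everyone
    value, pos = _expression(tokens, 0, authorizations)
    if pos != len(tokens):
        raise ValueError(f"unexpected {tokens[pos]!r} in {expression!r}")
    return value


def _expression(tokens, pos, auths):
    # A run of terms joined by one operator; mixing & and | needs parentheses.
    value, pos = _term(tokens, pos, auths)
    operator = None
    while pos < len(tokens) and tokens[pos] in ("&", "|"):
        if operator is not None and tokens[pos] != operator:
            raise ValueError("mixing & and | requires parentheses")
        operator = tokens[pos]
        right, pos = _term(tokens, pos + 1, auths)  # always parse, to catch syntax errors
        value = (value and right) if operator == "&" else (value or right)
    return value, pos


def _term(tokens, pos, auths):
    if pos >= len(tokens):
        raise ValueError("expression ends early")
    token = tokens[pos]
    if token == "(":
        value, pos = _expression(tokens, pos + 1, auths)
        if pos >= len(tokens) or tokens[pos] != ")":
            raise ValueError("missing )")
        return value, pos + 1
    if token in ("&", "|", ")"):
        raise ValueError(f"unexpected {token!r}")
    return token in auths, pos + 1


print(can_read("JD|PCP_JD", {"PCP_JD"}))    # True: the PCP can read the cholesterol result
print(can_read("JD|PSYCH_JD", {"PCP_JD"}))  # False: but not the mental-health result
```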




Tablets
      Collections of KV pairs form Tables
      Tables are partitioned into Tablets
      Metadata tablets hold info about other tablets, forming a 3-level hierarchy
      A Tablet is a unit of work for a Tablet Server

      [Diagram: tablet hierarchy]
      Well-Known Location (ZooKeeper) → Root Tablet: -∞ to ∞
      Root Tablet → Metadata Tablet 1: -∞ to “Encyclopedia:Ocelot”
                    Metadata Tablet 2: “Encyclopedia:Ocelot” to ∞
      Metadata Tablets → Data Tablets:
          Table: Adam’s Table    -∞ : thing,   thing : ∞
          Table: Encyclopedia    -∞ : Ocelot,  Ocelot : Yak,  Yak : ∞
          Table: Foo             -∞ to ∞

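Finding the tablet for a row is a binary search over sorted split points, repeated once per level of the hierarchy. The sketch below uses the tables and split points from the diagram and assumes Accumulo's convention that a tablet's end row is inclusive and its previous end row exclusive. The dictionaries are hypothetical stand-ins for metadata entries; real metadata rows are keyed by a table ID and end row rather than by table name, and clients cache these locations rather than walking the hierarchy on every read.

```python
from bisect import bisect_left


def locate(end_rows, key):
    """Index of the tablet that holds `key`.

    `end_rows` are a table's sorted split points. Tablet i covers keys in
    (end_rows[i-1], end_rows[i]]; the last tablet runs to +infinity, so a
    table with n split points has n + 1 tablets.
    """
    return bisect_left(end_rows, key)


# Hypothetical in-memory copy of the slide's example.
METADATA_END_ROWS = [("Encyclopedia", "Ocelot")]  # Metadata Tablet 1 | Metadata Tablet 2
DATA_END_ROWS = {
    "Adam's Table": ["thing"],
    "Encyclopedia": ["Ocelot", "Yak"],
    "Foo": [],
}


def find_tablet(table, row):
    """Walk the hierarchy: ZooKeeper -> root tablet -> metadata tablet -> data tablet."""
    # Level 1: the root tablet (-inf to +inf), found via ZooKeeper, covers all metadata.
    # Level 2: pick the metadata tablet whose range contains (table, row).
    metadata_tablet = locate(METADATA_END_ROWS, (table, row))
    # Level 3: that metadata tablet records which data tablet of `table` holds `row`.
    data_tablet = locate(DATA_END_ROWS[table], row)
    return metadata_tablet, data_tablet


print(find_tablet("Encyclopedia", "Panda"))  # (1, 1): Metadata Tablet 2, data tablet "Ocelot : Yak"
```

Indices are zero-based. A lookup for exactly "Ocelot" lands in the first Encyclopedia tablet because the end row is inclusive.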
Accumulo Architecture
      [Diagram: Accumulo architecture]
      Components: ZooKeeper (three nodes), Master, Tablet Servers (each hosting Tablets), Hadoop, Applications
      ZooKeeper → Master, Tablet Servers:   Delegate Authority
      Master → Tablet Servers:              Assign/Balance Tablets
      Tablet Servers → Hadoop:              Store/Replicate
      Applications ↔ Tablet Servers:        Read/Write


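The Master's "Assign/Balance" role can be pictured as placing each tablet on a live tablet server. Accumulo's balancing is pluggable; the greedy least-loaded policy below is a hypothetical stand-in, not Accumulo's algorithm. It also shows one property the design relies on: when a server dies, only that server's tablets move, and they can be served elsewhere because the underlying files live in Hadoop rather than on the failed machine. Server and tablet names are made up.

```python
import heapq


def assign(tablets, loads):
    """Place each tablet on the currently least-loaded tablet server.

    `loads` maps every live server to the number of tablets it already hosts.
    Returns {tablet: server}. Ties go to the alphabetically first server.
    """
    if not loads:
        raise ValueError("no live tablet servers")
    heap = [(count, server) for server, count in loads.items()]
    heapq.heapify(heap)
    placement = {}
    for tablet in tablets:
        count, server = heapq.heappop(heap)
        placement[tablet] = server
        heapq.heappush(heap, (count + 1, server))
    return placement


def reassign_after_failure(placement, dead_server, servers):
    """Move only the dead server's tablets; every other tablet stays put."""
    kept = {t: s for t, s in placement.items() if s != dead_server}
    orphans = [t for t, s in placement.items() if s == dead_server]
    loads = {s: 0 for s in servers if s != dead_server}
    for server in kept.values():
        loads[server] += 1
    kept.update(assign(orphans, loads))
    return kept


servers = ["ts1", "ts2", "ts3"]  # hypothetical tablet server names
placement = assign(["t1", "t2", "t3", "t4", "t5", "t6"], {s: 0 for s in servers})
print(placement)
print(reassign_after_failure(placement, "ts2", servers))
```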
Tablet Data Flow


      [Diagram: data flow within one Tablet]
      Writes → Write-Ahead Log (for recovery) and In-Memory Map
      In-Memory Map → Iterator Tree → Minor Compaction → Sorted, Indexed File
      Sorted, Indexed Files → Iterator Tree → Merging / Major Compaction → Sorted, Indexed File
      In-Memory Map + Sorted, Indexed Files → Scan Iterator Tree → Reads




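The flow above is a log-structured merge design. The toy model below walks through it: every write is logged and then buffered in memory, a minor compaction flushes the buffer to an immutable sorted file, a major compaction merges files, and a scan merges memory with all files. Keys are plain strings rather than full 5-tuples, the flush threshold is a hypothetical parameter, and "newest value wins" stands in for the iterator tree, which in Accumulo is where such logic (versioning, filtering, combining) is applied during scans and compactions.

```python
import heapq


class ToyTablet:
    """Toy model of one tablet's write and read paths (not Accumulo code)."""

    def __init__(self, flush_threshold=2):
        self.wal = []        # write-ahead log: replayed to rebuild memory after a crash
        self.memory = {}     # in-memory map of recent writes
        self.files = []      # immutable sorted files, newest first
        self.flush_threshold = flush_threshold  # hypothetical flush trigger

    def write(self, key, value):
        self.wal.append((key, value))  # log first, so the write survives a crash
        self.memory[key] = value
        if len(self.memory) >= self.flush_threshold:
            self.minor_compact()

    def minor_compact(self):
        """Write the in-memory map out as a new sorted file; the log is no longer needed."""
        if self.memory:
            self.files.insert(0, sorted(self.memory.items()))
            self.memory = {}
            self.wal = []

    def major_compact(self):
        """Merge all sorted files into one, keeping the newest value for each key."""
        if len(self.files) > 1:
            self.files = [self._merge(self.files)]

    def scan(self):
        """Read path: merge the in-memory map with every file."""
        return self._merge([sorted(self.memory.items())] + self.files)

    def recover(self):
        """Simulate a crash that loses memory, then replay the write-ahead log."""
        self.memory = {}
        for key, value in self.wal:
            self.memory[key] = value

    @staticmethod
    def _merge(sources):
        # Sources are ordered newest first; tag entries with their source's age
        # so that, for equal keys, the newest entry comes out of the merge first.
        tagged = [[(k, age, v) for k, v in src] for age, src in enumerate(sources)]
        merged, last_key = [], None
        for key, _, value in heapq.merge(*tagged):
            if key != last_key:  # later duplicates are older versions; skip them
                merged.append((key, value))
                last_key = key
        return merged


tablet = ToyTablet(flush_threshold=2)
for k, v in [("a", "1"), ("b", "1"), ("a", "2"), ("c", "1"), ("b", "3")]:
    tablet.write(k, v)
print(tablet.scan())  # [('a', '2'), ('b', '3'), ('c', '1')]
```

Files are never modified in place, which is what lets them sit in Hadoop's replicated storage; only the small in-memory map depends on the log for recovery.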
Contents: Techniques




Hierarchical Decomposition

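      Row: <person>
        Column Family: attribute
          Column Qualifier: age          Value: <age>
          Column Qualifier: discount     Value: <40%>
        Column Family: purchases
          Column Qualifier: sneakers     Value: <cost>
        Column Family: returns
          Column Qualifier: hat          Value: <cost>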

Materialized Table
      Each line below is one Key/Value Pair:

      Row       Column Family   Column Qualifier   Value
      bill      attribute       age                49
      bill      attribute       discount           40%
      bill      purchases       sneakers           $100
      george    attribute       age                27
      george    purchases       sneakers           $83
      george    returns         hat                $42

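Going from the hierarchical view to the materialized table is a flattening step: each path from row to value becomes one key/value pair, and sorting groups each person's pairs together. The sketch below flattens the example records and regroups them; the function names are hypothetical. Note that bill has no "returns" family and george has no "discount" qualifier: missing cells are simply absent, so sparse rows cost nothing.

```python
def materialize(records):
    """Flatten {row: {family: {qualifier: value}}} into sorted key/value pairs."""
    pairs = [
        ((row, family, qualifier), value)
        for row, families in records.items()
        for family, qualifiers in families.items()
        for qualifier, value in qualifiers.items()
    ]
    return sorted(pairs)


def regroup(pairs):
    """Inverse view: rebuild the nested per-row structure from key/value pairs."""
    records = {}
    for (row, family, qualifier), value in pairs:
        records.setdefault(row, {}).setdefault(family, {})[qualifier] = value
    return records


people = {
    "bill": {"attribute": {"age": "49", "discount": "40%"},
             "purchases": {"sneakers": "$100"}},
    "george": {"attribute": {"age": "27"},
               "purchases": {"sneakers": "$83"},
               "returns": {"hat": "$42"}},
}

for key, value in materialize(people):
    print(key, value)
```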
Forward and Inverted Index

                        Table:      Forward Index        Inverted Index
                          Row:      <UUID>               <Term>
                Column Family:      <Type>               <Type> + <Field>
             Column Qualifier:      <Field>              <UUID>
                        Value:      <Term>               <Digest of Event>

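The two tables store the same facts keyed two ways: the forward index answers "what is in event X?" with one row read, and the inverted index answers "which events contain term T?" with one row read. The sketch below builds both from two hypothetical events and looks up a term with a range scan over the sorted inverted index. The `/` joining Type and Field, and the text summary used as the "digest", are assumptions for illustration.

```python
from bisect import bisect_left


def index_event(uuid, event_type, fields):
    """Entries for one event in both tables, following the slide's layout."""
    # Hypothetical "digest": a short text summary shown alongside search hits.
    digest = ", ".join(f"{f}={t}" for f, t in sorted(fields.items()))
    forward = [((uuid, event_type, field), term) for field, term in fields.items()]
    inverted = [((term, f"{event_type}/{field}", uuid), digest)  # "/" joins Type + Field (assumed)
                for field, term in fields.items()]
    return forward, inverted


def build_tables(events):
    forward, inverted = [], []
    for uuid, (event_type, fields) in events.items():
        f, i = index_event(uuid, event_type, fields)
        forward += f
        inverted += i
    return sorted(forward), sorted(inverted)


def find_term(inverted, term):
    """Range scan of one row of the sorted inverted index: every hit for `term`."""
    keys = [key for key, _ in inverted]
    start = bisect_left(keys, (term,))  # (term,) sorts before every (term, ...) key
    hits = []
    for (row, type_field, uuid), digest in inverted[start:]:
        if row != term:
            break  # past the end of this row
        hits.append((uuid, type_field, digest))
    return hits


events = {  # hypothetical sample events
    "e1": ("email", {"from": "alice", "to": "bob"}),
    "e2": ("chat", {"from": "bob", "to": "carol"}),
}
forward, inverted = build_tables(events)
print(find_term(inverted, "bob"))
```

Storing a digest in the inverted index lets a search page show results without a second lookup into the forward index.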
Graph Analysis

                        Table:      Graph Table
                          Row:      <Node ID>
                Column Family:      “Node Info”     “Out Edges”              “In Edges”
    Column Qualifier (Tuples):      <Field>         <Node ID>, <Edge ID>     <Node ID>, <Edge ID>
                        Value:      <Value>         <Edge Info>              <Edge Info>

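With this layout, a node's neighbors are one contiguous range of its row, so a traversal is a sequence of short range scans. The sketch writes each edge twice (under "Out Edges" of the source and "In Edges" of the destination), finds neighbors with a binary search over the sorted keys, and runs a breadth-first search. The sample graph and function names are hypothetical.

```python
from bisect import bisect_left
from collections import deque


def build_graph_table(nodes, edges):
    """Sorted (row, family, qualifier) -> value entries in the slide's layout."""
    table = []
    for node_id, info in nodes.items():
        for field, value in info.items():
            table.append(((node_id, "Node Info", field), value))
    for edge_id, (src, dst, edge_info) in edges.items():
        # Each edge is written twice, once under each endpoint's row,
        # so it can be followed in either direction with a single row scan.
        table.append(((src, "Out Edges", (dst, edge_id)), edge_info))
        table.append(((dst, "In Edges", (src, edge_id)), edge_info))
    table.sort(key=lambda entry: entry[0])
    return table


def neighbors(table, node_id, family="Out Edges"):
    """Scan one (row, family) range and return the neighbor node IDs."""
    prefixes = [key[:2] for key, _ in table]
    i = bisect_left(prefixes, (node_id, family))
    found = []
    while i < len(table) and prefixes[i] == (node_id, family):
        found.append(table[i][0][2][0])  # qualifier is (neighbor ID, edge ID)
        i += 1
    return found


def within_hops(table, start, hops):
    """Breadth-first search: nodes reachable from `start` in at most `hops` out-edges."""
    seen, frontier = {start}, deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == hops:
            continue
        for nxt in neighbors(table, node):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return seen - {start}


nodes = {"a": {"name": "Alice"}, "b": {"name": "Bob"}, "c": {"name": "Carol"}}
edges = {"e1": ("a", "b", "knows"), "e2": ("b", "c", "knows")}
table = build_graph_table(nodes, edges)
print(sorted(within_hops(table, "a", 2)))  # ['b', 'c']
```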
Geospatial Queries
                 Table:      Geo Index
                   Row:      <GeoHash>
         Column Family:      <Event Type>
      Column Qualifier:      <UUID>
                 Value:      <Digest of Event>

      GeoHash example (one bit taken from each dimension in turn):
        Latitude:     10110101001
        Longitude:    00111010010
        Depth:        11010110110
        GeoHash:      101001110111010101011100001011100

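The GeoHash row in the example is built by quantizing each dimension into bits and then interleaving them: latitude bit, longitude bit, depth bit, then the next bit of each. The sketch reproduces the slide's 33-bit value from its three 11-bit strings. The depth range (0 to 11,000 m) in `geo_row` is a hypothetical choice. Because the interleaved bits form a space-filling curve, nearby points usually share a long row prefix (except across cell boundaries), so a bounding-box query becomes a small number of row-range scans.

```python
def quantize(value, low, high, bits):
    """Encode `value` as `bits` bits by repeatedly halving [low, high)."""
    out = []
    for _ in range(bits):
        mid = (low + high) / 2
        if value >= mid:
            out.append("1")
            low = mid
        else:
            out.append("0")
            high = mid
    return "".join(out)


def interleave(*dimensions):
    """Take one bit from each dimension in turn: lat0 lon0 depth0 lat1 lon1 ..."""
    if len({len(bits) for bits in dimensions}) != 1:
        raise ValueError("every dimension needs the same number of bits")
    return "".join("".join(column) for column in zip(*dimensions))


def geo_row(lat, lon, depth_m, bits=11):
    # Hypothetical ranges: depth 0 to 11,000 m; 11 bits per dimension as on the slide.
    return interleave(
        quantize(lat, -90.0, 90.0, bits),
        quantize(lon, -180.0, 180.0, bits),
        quantize(depth_m, 0.0, 11000.0, bits),
    )


# Reproduces the slide's example from its three bit strings.
print(interleave("10110101001", "00111010010", "11010110110"))
# 101001110111010101011100001011100
```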
Document Partitioning

                        Table:      Shard Table
                          Row:      <Partition ID>
                Column Family:      “Docs”            “Inv. Index”      “Field Index”          “Geo”
    Column Qualifier (Tuples):      <UUID>, <Field>   <Term>, <UUID>    <Field:Term>, <UUID>   <Hash>, <UUID>
                        Value:      <Value>

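Document partitioning puts each document and all of its index entries in the same row, so a query can be answered inside each partition without cross-machine joins, and all partitions work in parallel. The sketch below hashes the document UUID to a partition ID and writes the "Docs", "Inv. Index", and "Field Index" entries under that row (the "Geo" column is omitted). `NUM_PARTITIONS`, the whitespace tokenizer, and the sample documents are hypothetical; `zlib.crc32` is used because, unlike Python's built-in `hash` for strings, it is stable across processes.

```python
import zlib

NUM_PARTITIONS = 4  # hypothetical; chosen per cluster in practice


def partition_of(uuid):
    """Stable hash of the document ID -> partition row."""
    return f"p{zlib.crc32(uuid.encode()) % NUM_PARTITIONS:02d}"


def add_document(table, uuid, fields):
    """Write a document and its index entries into one partition row."""
    row = partition_of(uuid)
    for field, text in fields.items():
        table[(row, "Docs", (uuid, field))] = text
        for term in text.lower().split():
            table[(row, "Inv. Index", (term, uuid))] = ""
            table[(row, "Field Index", (f"{field}:{term}", uuid))] = ""


def search(table, term):
    """Evaluate a term query in every partition; each answers from its own row."""
    hits = {}
    for (row, family, qualifier) in sorted(table):
        if family == "Inv. Index" and qualifier[0] == term:
            uuid = qualifier[1]
            # The document lives in the same row as its index entry,
            # so this lookup never has to leave the partition.
            doc = {q[1]: v for (r, f, q), v in table.items()
                   if r == row and f == "Docs" and q[0] == uuid}
            hits[uuid] = (row, doc)
    return hits


table = {}
add_document(table, "d1", {"title": "Apache Accumulo", "body": "sorted key value store"})
add_document(table, "d2", {"title": "Hadoop", "body": "distributed file store"})
print(sorted(search(table, "store")))  # ['d1', 'd2']
```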
Intersecting Iterator
      Query: ‘foo’ and (‘bar’ or ‘baz’)

                          Row:      <Partition ID>
                Column Family:      “Docs”            “Inv. Index”
    Column Qualifier (Tuples):      <UUID>, <Field>   <Term>, <UUID>
                        Value:      <Value>




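Within one partition, each term's "Inv. Index" entries form a sorted list of UUIDs, so an AND is a merge-style intersection in which every list seeks forward to the largest current head instead of scanning everything. Accumulo ships an `IntersectingIterator` that performs this kind of AND within each partition on the server; the standalone sketch below is not its API, and the OR step (a sorted union) plus the sample postings are assumptions for illustration.

```python
from bisect import bisect_left


def intersect(*postings):
    """AND of sorted UUID lists by leapfrogging: each list seeks to the largest head."""
    if not postings:
        return []
    pos = [0] * len(postings)
    out, target = [], None
    while True:
        for i, plist in enumerate(postings):
            if target is not None:
                pos[i] = bisect_left(plist, target, pos[i])  # seek forward, never rescan
            if pos[i] >= len(plist):
                return out  # one list is exhausted: no more matches are possible
        heads = [plist[pos[i]] for i, plist in enumerate(postings)]
        highest = max(heads)
        if all(head == highest for head in heads):
            out.append(highest)
            pos = [p + 1 for p in pos]  # step every list past the match
            target = None
        else:
            target = highest


def union(*postings):
    """OR of sorted UUID lists."""
    return sorted(set().union(*postings))


# Hypothetical inverted-index entries for ONE partition: term -> sorted UUIDs.
partition = {
    "foo": ["u1", "u3", "u5", "u7"],
    "bar": ["u3", "u4"],
    "baz": ["u2", "u7"],
}

# 'foo' and ('bar' or 'baz')
print(intersect(partition["foo"], union(partition["bar"], partition["baz"])))
# ['u3', 'u7']
```

The same query runs in every partition at once, and the client only merges the small per-partition result lists.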
Contents: Application APIs




acorn

    Key/Value pairs are great!                                                                       =
    How do I construct a document
    partitioning key again?
           Techniques should be built into an API
           Let the people have polyglot
           Lucene, SQL, SPARQL, JAQL, Matlab
           (not just Key, Value, Range)
                                                                                      +
                                                                                      +
Combined IR + Graph Search




Schema-less Stats




Get Involved

                           http://accumulo.apache.org
                Help us make Accumulo even better!




Contact




                                              Adam Fuchs, CTO

                                                  sqrrl data, Inc.
                                                  617-520-4375
                                                 www.sqrrl.com
                                                    @sqrrl_inc
                                                 info@sqrrl.com

Oct 2012 HUG: Apache Accumulo: Unlocking the Power of Big Data

  • 1. sqrrl data, INC. Secure. Scale. Adapt. Adam Fuchs, Chief Technology Officer info@sqrrl.com | @sqrrl_inc | 617.520.4375 sqrrl data, INC., All Rights Reserved
  • 2. Secure. Scale. Adapt. Who We are is the commercial provider of Mature Database Technology - Apache Accumulo Fine-Grained Access Controls - Data Integration and Sharing Proven Performance - Petabytes and Beyond Advanced Analytics - Search, Statistics, and Graphs 2 info@sqrrl.com | @sqrrl_inc | 617.520.4375 sqrrl data, INC., All Rights Reserved
  • 3. Secure. Scale. Adapt. Contents Core Philosophy Technology Techniques Application APIs 3 info@sqrrl.com | @sqrrl_inc | 617.520.4375 sqrrl data, INC., All Rights Reserved
  • 4. Secure. Scale. Adapt. Apache Accumulo Perspective Data Data Data Integration across: Multiple business lines Multiple data sets Multiple applications Multiple security, privacy, legal, Application Application Application policy, regulatory, and compliance constraints New demands 4 info@sqrrl.com | @sqrrl_inc | 617.520.4375 sqrrl data, INC., All Rights Reserved
  • 5. Secure. Scale. Adapt. Accumulo Design Drivers Cell-Level Security 1  Express common security requirements in the infrastructure, not just in the application  Data-centric approach encourages secure sharing Scalability 2  Near linear performance improvements at thousands of nodes  Durable and reliable under increased failures that come with scale Diverse, Interactive Analytics 3  Sorted key/value core performs well in a diverse set of domains  Information retrieval, statistics, graph analysis, geo indexing, and more Flexible, Adaptive Schema 4  Start with universal structures and indexing  Refine the schema over time 5 info@sqrrl.com | @sqrrl_inc | 617.520.4375 sqrrl data, INC., All Rights Reserved
  • 6. Secure. Scale. Adapt. Contents Core Philosophy Technology Techniques Application APIs 6 info@sqrrl.com | @sqrrl_inc | 617.520.4375 sqrrl data, INC., All Rights Reserved
  • 7. Secure. Scale. Adapt. Accumulo Key Structure An Accumulo key is a 5-tuple, consisting of: Row: Controls Atomicity Column Family: Controls Locality Column Qualifier: Controls Uniqueness Visibility Label: Controls Access Timestamp: Controls Versioning Row Col. Fam. Col. Qual. Visibility Timestamp Value Patient suffers John Doe Notes PCP PCP_JD 20120912 from an acute … John Doe Test Results Cholesterol JD|PCP_JD 20120912 183 John Doe Test Results Mental Health JD|PSYCH_JD 20120801 Pass John Doe Test Results X-Ray JD|PHYS_JD 20120513 1010110110100… Accumulo Key/Value Example 7 info@sqrrl.com | @sqrrl_inc | 617.520.4375 sqrrl data, INC., All Rights Reserved
  • 8. Visibility Syntax & Semantics
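The slide's syntax details did not survive extraction. As a stand-in, here is a minimal evaluator for the commonly documented shape of Accumulo visibility expressions: labels combined with `&` (and) and `|` (or), grouped by parentheses, where mixing `&` and `|` at one level requires parentheses. It is a sketch under those assumptions. The function name `can_see`, the label alphabet, and the error class are hypothetical, and quoted labels are not handled.

```python
import re

# Assumed label alphabet; quoted labels are not supported in this sketch.
LABEL = re.compile(r"[A-Za-z0-9_\-:./]+")


class VisibilityError(ValueError):
    """Raised for expressions this sketch cannot parse."""


def can_see(expression: str, auths: set) -> bool:
    """Return True if the reader's authorization labels satisfy the expression."""
    if expression == "":
        return True  # an empty visibility label is readable by everyone
    pos = 0

    def parse_expression() -> bool:
        nonlocal pos
        values = [parse_term()]
        operator = None
        while pos < len(expression) and expression[pos] in "&|":
            if operator is None:
                operator = expression[pos]
            elif expression[pos] != operator:
                raise VisibilityError("mixing & and | needs parentheses")
            pos += 1
            values.append(parse_term())
        # A single term has no operator; any() then just returns its value.
        return all(values) if operator == "&" else any(values)

    def parse_term() -> bool:
        nonlocal pos
        if pos < len(expression) and expression[pos] == "(":
            pos += 1
            value = parse_expression()
            if pos >= len(expression) or expression[pos] != ")":
                raise VisibilityError("unbalanced parentheses")
            pos += 1
            return value
        match = LABEL.match(expression, pos)
        if match is None:
            raise VisibilityError(f"unexpected input at position {pos}")
        pos = match.end()
        return match.group() in auths

    result = parse_expression()
    if pos != len(expression):
        raise VisibilityError(f"unexpected input at position {pos}")
    return result


# A primary-care physician holding only the PCP_JD label:
print(can_see("JD|PCP_JD", {"PCP_JD"}))    # True
print(can_see("JD|PSYCH_JD", {"PCP_JD"}))  # False
```

With the labels from the key/value example, a reader holding only `PCP_JD` sees the cholesterol result but not the mental-health result.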
  • 9. Tablets
    Collections of KV pairs form Tables. Tables are partitioned into Tablets. Metadata tablets hold info about other tablets, forming a 3-level hierarchy. A Tablet is a unit of work for a Tablet Server.
    [Diagram: a well-known location in ZooKeeper points to the Root Tablet (-∞ to ∞). The root tablet points to Metadata Tablet 1 (-∞ to "Encyclopedia:Ocelot") and Metadata Tablet 2 ("Encyclopedia:Ocelot" to ∞), which point to the data tablets of three tables: Adam's Table (-∞ : thing, thing : ∞), Encyclopedia (-∞ : Ocelot, Ocelot : Yak, Yak : ∞), and Foo (-∞ to ∞).]
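Finding the tablet that holds a row amounts to a binary search over a table's sorted split points. The three-level hierarchy repeats that kind of search, first in the root tablet and then in a metadata tablet. The sketch below models only the last step, using the Encyclopedia split points from the diagram. The function name and the convention that each tablet includes its end row are assumptions of this sketch.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


def find_tablet(split_points: List[str], row: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (previous end row, end row) of the tablet holding `row`.

    Assumes each tablet covers (previous split, split], with None standing
    for -infinity / +infinity at the ends of the table.
    """
    index = bisect_left(split_points, row)  # first split point >= row
    start = split_points[index - 1] if index > 0 else None
    end = split_points[index] if index < len(split_points) else None
    return start, end


# Split points of the "Encyclopedia" table from the diagram.
encyclopedia = ["Ocelot", "Yak"]
print(find_tablet(encyclopedia, "Penguin"))  # ('Ocelot', 'Yak')
print(find_tablet(encyclopedia, "Zebra"))    # ('Yak', None)
```

A table with no split points, such as Foo, has a single tablet, so every row maps to `(None, None)`.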
  • 10. Accumulo Architecture
    [Diagram: applications read from and write to tablet servers, each hosting several tablets. The master assigns and balances tablets across tablet servers. ZooKeeper acts as the delegate authority. Hadoop stores and replicates the underlying files.]
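The master's assign/balance role can be illustrated with a toy greedy policy that keeps tablet counts even. This is a hypothetical policy with invented function and server names, shown only to make the idea concrete. It is not Accumulo's actual balancer.

```python
import heapq
from typing import Dict, List


def assign_tablets(tablets: List[str], servers: List[str]) -> Dict[str, List[str]]:
    """Give each tablet to whichever server currently hosts the fewest tablets."""
    load = [(0, server) for server in servers]  # (tablet count, server name)
    heapq.heapify(load)
    assignment: Dict[str, List[str]] = {server: [] for server in servers}
    for tablet in tablets:
        count, server = heapq.heappop(load)  # least-loaded server; ties go by name
        assignment[server].append(tablet)
        heapq.heappush(load, (count + 1, server))
    return assignment


tablets = ["Adam's Table (-inf, thing]", "Adam's Table (thing, +inf)",
           "Encyclopedia (-inf, Ocelot]", "Encyclopedia (Ocelot, Yak]",
           "Encyclopedia (Yak, +inf)", "Foo (-inf, +inf)"]
for server, hosted in assign_tablets(tablets, ["ts1", "ts2", "ts3"]).items():
    print(server, hosted)
```

The min-heap keyed on tablet count ensures that no server ends up with more than one tablet above any other.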
  • 11. Tablet Data Flow
    [Diagram: writes go to a write-ahead log (for recovery) and to an in-memory tree map. A minor compaction flushes the in-memory map to a sorted, indexed file. A major compaction uses a merging iterator to combine several sorted, indexed files into one. Scans read through an iterator tree that merges the in-memory map and the files.]
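This flow follows a log-structured merge design. Below is a toy model with hypothetical names (`ToyTablet`, `flush_threshold`, `merge_newest_first`). It leaves out deletes, iterators other than the merge, and real file formats, and it simplifies recovery by clearing the log after each flush.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

Entry = Tuple[str, str]


def merge_newest_first(sources: List[List[Entry]]) -> Iterator[Entry]:
    """Merge sorted runs; for duplicate keys keep the value from the earliest (newest) run."""
    tagged = [[(key, age, value) for key, value in run] for age, run in enumerate(sources)]
    last_key = None
    for key, _age, value in heapq.merge(*tagged):
        if key != last_key:  # first occurrence of a key has the lowest age, i.e. is newest
            yield key, value
            last_key = key


class ToyTablet:
    def __init__(self, flush_threshold: int = 3):
        self.write_ahead_log: List[Entry] = []  # replayed to rebuild memory after a crash
        self.in_memory: Dict[str, str] = {}     # stands in for the in-memory tree map
        self.files: List[List[Entry]] = []      # sorted, immutable files, oldest first
        self.flush_threshold = flush_threshold

    def write(self, key: str, value: str) -> None:
        self.write_ahead_log.append((key, value))  # log first, for recovery
        self.in_memory[key] = value
        if len(self.in_memory) >= self.flush_threshold:
            self.minor_compaction()

    def minor_compaction(self) -> None:
        # Flush memory to a new sorted file; the logged entries are now durable in it.
        self.files.append(sorted(self.in_memory.items()))
        self.in_memory = {}
        self.write_ahead_log = []

    def major_compaction(self) -> None:
        # Merge all files into one, keeping only the newest value per key.
        if self.files:
            self.files = [list(merge_newest_first(self.files[::-1]))]

    def scan(self) -> Iterator[Entry]:
        # Reads see memory first, then files from newest to oldest.
        return merge_newest_first([sorted(self.in_memory.items())] + self.files[::-1])
```

Because every source is already sorted, both scans and major compactions are single streaming merges. No source ever needs to be re-sorted.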
  • 12. Contents: Core Philosophy, Technology, Techniques, Application APIs
  • 13. Hierarchical Decomposition
    Row: <person>
      Column Family "attribute": Column Qualifiers age (Value: <age>) and discount (Value: <40%>)
      Column Family "purchases": Column Qualifier sneakers (Value: <cost>)
      Column Family "returns": Column Qualifier hat (Value: <cost>)
  • 14. Materialized Table
      Row      Column Family   Column Qualifier   Value
      bill     attribute       age                49
      bill     attribute       discount           40%
      bill     purchases       sneakers           $100
      george   attribute       age                27
      george   purchases       sneakers           $83
      george   returns         hat                $42
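The decomposition from a nested record into flat key/value pairs, and back, can be sketched as follows. The function names and the nested-dictionary input format are hypothetical. The data is the bill/george example from the materialized table.

```python
from typing import Dict, List, Tuple

Cell = Tuple[Tuple[str, str, str], str]  # ((row, family, qualifier), value)


def decompose(records: Dict[str, Dict[str, Dict[str, str]]]) -> List[Cell]:
    """Flatten row -> family -> qualifier -> value into sorted key/value pairs."""
    cells = []
    for row, families in records.items():
        for family, qualifiers in families.items():
            for qualifier, value in qualifiers.items():
                cells.append(((row, family, qualifier), value))
    return sorted(cells)


def reassemble(cells: List[Cell], row: str) -> Dict[str, Dict[str, str]]:
    """Rebuild one row's nested record from its key/value pairs."""
    record: Dict[str, Dict[str, str]] = {}
    for (cell_row, family, qualifier), value in cells:
        if cell_row == row:
            record.setdefault(family, {})[qualifier] = value
    return record


people = {
    "bill": {"attribute": {"age": "49", "discount": "40%"}, "purchases": {"sneakers": "$100"}},
    "george": {"attribute": {"age": "27"}, "purchases": {"sneakers": "$83"}, "returns": {"hat": "$42"}},
}
for key, value in decompose(people):
    print(key, value)
```

Rows need not share the same families or qualifiers. For example, bill has no "returns" entries, so sparse records take no space for absent fields.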
  • 15. Forward and Inverted Index
      Table:             Forward Index    Inverted Index
      Row:               <UUID>           <Term>
      Column Family:     <Type>           <Type> + <Field>
      Column Qualifier:  <Field>          <UUID>
      Value:             <Term>           <Digest of Event>
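The two tables answer opposite questions. The forward index says what is in event X. The inverted index says which events contain term T, using a single row range scan. In the sketch below, the events, the `+` joiner in the column family, and the digest string are hypothetical stand-ins.

```python
from bisect import bisect_left
from typing import Dict, List, Tuple

Event = Tuple[str, Dict[str, str]]  # (event type, {field: term})


def build_indexes(events: Dict[str, Event]):
    """Return (forward, inverted) as sorted ((row, family, qualifier), value) lists."""
    forward, inverted = [], []
    for uuid, (event_type, fields) in events.items():
        digest = f"{event_type} {uuid}"  # hypothetical stand-in for an event digest
        for field, term in fields.items():
            forward.append(((uuid, event_type, field), term))
            inverted.append(((term, f"{event_type}+{field}", uuid), digest))
    return sorted(forward), sorted(inverted)


def lookup(inverted, term: str) -> List[Tuple[str, str]]:
    """Range-scan the inverted index: every entry whose row equals `term`."""
    hits = []
    i = bisect_left(inverted, ((term,),))  # (term,) sorts before any (term, ...) key
    while i < len(inverted) and inverted[i][0][0] == term:
        _, family, uuid = inverted[i][0]
        hits.append((uuid, family))
        i += 1
    return hits


events = {
    "e1": ("login", {"user": "alice", "host": "web01"}),
    "e2": ("login", {"user": "bob", "host": "web01"}),
}
forward, inverted = build_indexes(events)
print(lookup(inverted, "web01"))  # [('e1', 'login+host'), ('e2', 'login+host')]
```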
  • 16. Forward and Inverted Index
  • 17. Graph Analysis
    Table: Graph Table
      Row: <Node ID>
      Column Family:              "Node Info"   "Out Edges"             "In Edges"
      Column Qualifier (tuples):  <Field>       <Node ID>, <Edge ID>    <Node ID>, <Edge ID>
      Value:                      <Value>       <Edge Info>             <Edge Info>
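With this layout, each edge is stored under both endpoints. As a result, outgoing and incoming neighbors are each a scan of one row and one column family. The sketch uses a dictionary in place of a sorted table. The helper names, the `:` separator between node ID and edge ID, and the sample graph are assumptions.

```python
from typing import Dict, List, Tuple

GraphTable = Dict[Tuple[str, str, str], str]  # (row, family, qualifier) -> value


def add_node(table: GraphTable, node: str, field: str, value: str) -> None:
    table[(node, "Node Info", field)] = value


def add_edge(table: GraphTable, src: str, dst: str, edge_id: str, info: str) -> None:
    # Store the edge twice so both directions are one-row lookups.
    table[(src, "Out Edges", f"{dst}:{edge_id}")] = info
    table[(dst, "In Edges", f"{src}:{edge_id}")] = info


def neighbors(table: GraphTable, node: str, family: str = "Out Edges") -> List[str]:
    # In a sorted store these entries are contiguous, so this is a single range scan.
    return sorted(q.rsplit(":", 1)[0] for (row, fam, q) in table if row == node and fam == family)


graph: GraphTable = {}
add_node(graph, "alice", "name", "Alice")
add_edge(graph, "alice", "bob", "e1", "follows")
add_edge(graph, "alice", "carol", "e2", "follows")
add_edge(graph, "carol", "bob", "e3", "follows")
print(neighbors(graph, "alice"))            # ['bob', 'carol']
print(neighbors(graph, "bob", "In Edges"))  # ['alice', 'carol']
```

The price of fast lookups in both directions is that every edge is written twice.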
  • 18. Geospatial Queries
    Table: Geo Index
      Row: <GeoHash>
      Column Family: <Event Type>
      Column Qualifier: <UUID>
      Value: <Digest of Event>
    Example row key: interleaving the bits of
      Latitude:  10110101001
      Longitude: 00111010010
      Depth:     11010110110
    one bit at a time (latitude, longitude, depth) yields 101001110111010101011100001011100.
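The row key on this slide is the bit interleaving of the three coordinates, and the sketch below reproduces it exactly. Because nearby points tend to share long key prefixes, a region can typically be covered by a limited set of row-prefix ranges. The `quantize` helper, which turns a raw coordinate into bits by repeatedly halving its range, is an assumption of this sketch, as are the function names.

```python
def quantize(value: float, low: float, high: float, bits: int) -> str:
    """Encode value as `bits` binary digits by repeatedly halving [low, high)."""
    code = []
    for _ in range(bits):
        mid = (low + high) / 2
        if value >= mid:
            code.append("1")
            low = mid
        else:
            code.append("0")
            high = mid
    return "".join(code)


def interleave(*codes: str) -> str:
    """Take one bit from each dimension in turn (a Z-order style key)."""
    if len({len(c) for c in codes}) != 1:
        raise ValueError("all dimensions need the same bit count")
    return "".join("".join(bits) for bits in zip(*codes))


row = interleave("10110101001", "00111010010", "11010110110")
print(row)  # 101001110111010101011100001011100, the row key shown on the slide
```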
  • 19. Document Partitioning
    Table: Shard Table
      Row: <Partition ID>
      Column Family:              "Docs"            "Inv. Index"      "Field Index"           "Geo"
      Column Qualifier (tuples):  <UUID>, <Field>   <Term>, <UUID>    <Field:Term>, <UUID>    <Hash>, <UUID>
      Value:                      <Value>
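The key idea of this layout is that a document and all of its index entries land in the same partition row. Each tablet server can therefore answer queries over its own partitions without consulting other servers. In the sketch below, the partition count, hashing with `zlib.crc32`, the whitespace tokenizer, and the colon-joined qualifiers are all assumptions. The "Geo" family is omitted.

```python
import zlib
from typing import Dict, List, Tuple

NUM_PARTITIONS = 8  # hypothetical; a real deployment picks its own count

ShardEntry = Tuple[Tuple[str, str, str], str]


def partition_for(uuid: str) -> str:
    # crc32 is stable across runs, unlike Python's salted built-in hash() for strings.
    return f"{zlib.crc32(uuid.encode()) % NUM_PARTITIONS:03d}"


def shard_entries(uuid: str, fields: Dict[str, str]) -> List[ShardEntry]:
    """Write a document and its index entries into the same partition row."""
    part = partition_for(uuid)
    entries = []
    for field, text in fields.items():
        entries.append(((part, "Docs", f"{uuid}:{field}"), text))
        for term in text.lower().split():
            entries.append(((part, "Inv. Index", f"{term}:{uuid}"), ""))
            entries.append(((part, "Field Index", f"{field}:{term}:{uuid}"), ""))
    return sorted(entries)


for key, value in shard_entries("doc-1", {"title": "Red fox"}):
    print(key, value)
```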
  • 20. Document Partitioning
  • 21. Intersecting Iterator
    Query: 'foo' and ('bar' or 'baz')
    [Diagram: the query is evaluated within each shard-table row <Partition ID>, using the "Inv. Index" entries (<Term>, <UUID>) to find matching documents and the "Docs" entries (<UUID>, <Field>, <Value>) to return them.]
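Within a partition, the "Inv. Index" qualifiers sort by term and then by UUID, so each term's posting list is already a sorted run of UUIDs. A boolean query then reduces to merge-style intersections and unions. The sketch below evaluates 'foo' and ('bar' or 'baz') this way. The posting lists are hypothetical, and the code is an illustration of the technique rather than Accumulo's own iterator implementation.

```python
import heapq
from typing import Dict, List


def intersect(a: List[str], b: List[str]) -> List[str]:
    """Merge-intersect two sorted UUID lists in one pass."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def union(a: List[str], b: List[str]) -> List[str]:
    """Merge two sorted UUID lists, dropping duplicates."""
    out: List[str] = []
    for uuid in heapq.merge(a, b):
        if not out or out[-1] != uuid:
            out.append(uuid)
    return out


def foo_and_bar_or_baz(postings: Dict[str, List[str]]) -> List[str]:
    """Evaluate 'foo' and ('bar' or 'baz') against one partition's inverted index."""
    return intersect(postings.get("foo", []),
                     union(postings.get("bar", []), postings.get("baz", [])))


partitions = {
    "p0": {"foo": ["d1", "d3", "d7"], "bar": ["d3"], "baz": ["d7", "d9"]},
    "p1": {"foo": ["d2"], "bar": ["d4"]},
}
# Each partition is evaluated independently, so partitions can be processed in parallel.
print({p: foo_and_bar_or_baz(lists) for p, lists in partitions.items()})
```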
  • 22. Contents: Core Philosophy, Technology, Techniques, Application APIs
  • 23. acorn
    Key/value pairs are great! But how do I construct a document-partitioning key again? Techniques should be built into an API. Let the people have polyglot: Lucene, SQL, SPARQL, JAQL, Matlab (not just Key, Value, Range).
  • 24. Combined IR + Graph Search
  • 25. Schema-less Stats
  • 26. Get Involved: http://accumulo.apache.org. Help us make Accumulo even better!
  • 27. Contact: Adam Fuchs, CTO, sqrrl data, Inc. | 617-520-4375 | www.sqrrl.com | @sqrrl_inc | info@sqrrl.com

Editor's Notes

  • #10: Tablet Servers have four primary functions: hosting RPCs (read, write, etc.); managing resources (RAM, CPU, file I/O, etc.); scheduling background tasks (compactions, caching, etc.); and handling key/value pairs (via Iterators).
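The last of these functions, handling key/value pairs via iterators, can be pictured as a stack of generators. Each stage consumes the sorted stream produced by the stage below it. The two stages here are modeled loosely on versioning and visibility filtering. The names, the simplified pure-OR visibility check, and the 20121001 reading are hypothetical simplifications, not Accumulo's iterator API.

```python
from typing import Callable, Iterable, Iterator, Tuple

Key = Tuple[str, str, str, str, int]  # (row, family, qualifier, visibility, timestamp)
Entry = Tuple[Key, str]


def versioning(entries: Iterable[Entry], max_versions: int = 1) -> Iterator[Entry]:
    """Keep the newest `max_versions` of each cell (input sorted, newest first per cell)."""
    last_cell, kept = None, 0
    for key, value in entries:
        cell = key[:4]  # everything except the timestamp
        if cell != last_cell:
            last_cell, kept = cell, 0
        if kept < max_versions:
            kept += 1
            yield key, value


def visibility_filter(entries: Iterable[Entry], can_see: Callable[[str], bool]) -> Iterator[Entry]:
    """Drop entries whose visibility label the reader cannot satisfy."""
    for key, value in entries:
        if can_see(key[3]):
            yield key, value


def or_only_check(auths):
    # Simplified check that understands only pure-OR labels such as "JD|PCP_JD".
    return lambda label: label == "" or any(part in auths for part in label.split("|"))


entries = [
    (("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20121001), "190"),
    (("John Doe", "Test Results", "Cholesterol", "JD|PCP_JD", 20120912), "183"),
    (("John Doe", "Test Results", "Mental Health", "JD|PSYCH_JD", 20120801), "Pass"),
]
stack = visibility_filter(versioning(entries), or_only_check({"PCP_JD"}))
print([value for _, value in stack])  # ['190']
```

Because each stage is lazy, entries flow through the whole stack one at a time, and nothing is buffered beyond the current cell.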