Implementing and Visualizing Click-Stream Data with MongoDB

Jan 22, 2013 - New York MongoDB User Group

Cameron Sim - LearnVest.com
Agenda	

•  About LearnVest	

•  HL Application Architecture	

•  Data Capture	

•  Event Packaging	

•  MongoDB Data Warehousing	

•  Loading & Visualization	

•  Finishing up
LearnVest Inc.
www.learnvest.com

Mission Statement
    Aiming to make financial planning as accessible as having a gym membership

Company
    Founded in 2008 by Alexa Von Tobel, CEO
    50+ people and growing rapidly
    Based in NYC

Key Products
    Account Aggregation and Management (Bank, Credit, Loan, Investment, Mortgage…)
    Original and Syndicated Newsletter Co…
    Financial Planning (tiered product offering)

Platforms
    Web, iPhone

Stack
    Operational: Wordpress, Backbone.js, Node.js; Java Spring 3, Redis, Memcached, …
    Analytics: MongoDB 2.2.0 (3-node replica-set); Java 6, Spring 3
LearnVest.com	

      Web
LearnVest.com	

     iPhone
High Level Architecture

[Diagram: Production and Analytics tiers; components labelled Delivery, Services, Loaders and Dashboard; links labelled HTTPS and pyMongo]
Collection

Capture Everything
    User-driven events over web and mobile
    System-level exceptions
    Everything else

Temporary Data
    Be 'ok' with approximate data
    Operational databases are the system of record

Aggregate events as they come in
    Remove the overhead of basic metrics (counts, sums) on core events
    Group by user unique id and increment counts per event, over time-dimensions
    (day, week-ending, month, year)
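
The grouping described above (one counter per user, event and time dimension) could be sketched roughly as follows. This is an in-memory illustration only; the helper names `time_buckets` and `record_event`, the label formats, and the choice of Sunday as the week-ending day are assumptions that do not come from the slides.

```python
from collections import Counter
from datetime import date, timedelta


def time_buckets(d: date) -> dict:
    """Return one bucket label per time dimension for the given date."""
    # Assumption: a week ends on Sunday; the slides do not say which day is used.
    week_ending = d + timedelta(days=6 - d.weekday())
    return {
        "day": d.strftime("%Y-%m-%d"),
        "week": week_ending.strftime("%Y-%m-%d"),
        "month": d.strftime("%Y-%m"),
        "year": d.strftime("%Y"),
    }


def record_event(counters: Counter, user_id: str, event_type: str, d: date) -> None:
    """Increment one counter per time dimension for this user and event."""
    for dimension, label in time_buckets(d).items():
        counters[(dimension, user_id, event_type, label)] += 1


counters = Counter()
record_event(counters, "user-1", "login", date(2013, 1, 21))
record_event(counters, "user-1", "login", date(2013, 1, 22))
```

Two logins on consecutive days produce two separate day counters but share a single week, month and year counter, which is what keeps basic counts and sums cheap to read later.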
Data Capture

iOS (Objective-C)

- (void)sendAnalyticEventType:(NSString *)eventType
                       object:(NSString *)object
                         name:(NSString *)name
                         page:(NSString *)page
                       source:(NSString *)source
{
    // Top-level request parameters and the nested event payload
    NSMutableDictionary *params = [NSMutableDictionary dictionary];
    NSMutableDictionary *eventData = [NSMutableDictionary dictionary];

    // Only set non-nil values; setObject:forKey: raises on nil
    if (eventType != nil) [params setObject:eventType forKey:@"eventType"];
    if (object != nil) [eventData setObject:object forKey:@"object"];
    if (name != nil) [eventData setObject:name forKey:@"name"];
    if (page != nil) [eventData setObject:page forKey:@"page"];
    if (source != nil) [eventData setObject:source forKey:@"source"];
    if (eventData != nil) [params setObject:eventData forKey:@"eventData"];

    [[LVNetworkEngine sharedManager] analytics_send:params];
}
Data Capture

WEB (JavaScript)

function internalTrackPageView() {
    var cookie = {
        userContext: jQuery.cookie('UserContextCookie')
    };
    var trackEvent = {
        eventType: 'pageView',
        eventData: {
            page: window.location.pathname + window.location.search
        }
    };
    // AJAX: POST the event to the tracking endpoint
    jQuery.ajax({
        url: '/api/track',
        type: 'POST',
        dataType: 'json',
        data: JSON.stringify(trackEvent),
        // Set request headers
        beforeSend: function (xhr, settings) {
            xhr.setRequestHeader('Accept', 'application/json');
            xhr.setRequestHeader('User-Context', cookie.userContext);
            if (settings.type === 'PUT' || settings.type === 'POST') {
                xhr.setRequestHeader('Content-Type', 'application/json');
            }
        }
    });
}
Bus Event Packaging

Spring 3 RESTful service layer, controller methods define the eventCode via @tracking annotation

Custom Interceptor class extends HandlerInterceptorAdapter and implements …Handle() (for each event) to invoke calls via Spring @Async to an EventPublisher

EventPublisher publishes to a common event bus queue with multiple subscribers, one of which packages the eventPayload Map<String, Object> object and forwards it to the Analytics REST service
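
The publish/subscribe flow described above could look roughly like the following sketch. It is an in-process illustration written in Python rather than the Spring stack used in the talk; `EventBus`, `analytics_subscriber` and the `forwarded` list (which stands in for the HTTP call to the analytics service) are hypothetical names.

```python
import uuid
from datetime import datetime, timezone
from typing import Callable, Dict, List

Payload = Dict[str, object]


class EventBus:
    """Minimal in-process bus: every subscriber receives every published event."""

    def __init__(self) -> None:
        self._subscribers: List[Callable[[str, Payload], None]] = []

    def subscribe(self, handler: Callable[[str, Payload], None]) -> None:
        self._subscribers.append(handler)

    def publish(self, event_code: str, event: Payload) -> None:
        for handler in self._subscribers:
            handler(event_code, event)


# Stands in for the POST to the analytics service (assumption for this sketch).
forwarded: List[Payload] = []


def analytics_subscriber(event_code: str, event: Payload) -> None:
    """Package the event into a payload and 'forward' it."""
    forwarded.append({
        "eventCode": event_code,
        "eventId": str(uuid.uuid4()),
        "eventTime": datetime.now(timezone.utc),
        "eventData": dict(event),
    })


bus = EventBus()
bus.subscribe(analytics_subscriber)
bus.publish("user.login", {"eventType": "login"})
```

Keeping the analytics packaging in its own subscriber means other subscribers can be added to the same bus without touching the controllers that raise the events.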
Bus Event Packaging

Spring RestController Methods

Interface

@RequestMapping(value = "/user/login", method = RequestMethod.POST,
        headers = "Accept=application/json")
public Map<String, Object> userLogin(@RequestBody Map<String, Object> event,
        HttpServletRequest request);

Concrete/Impl Class

@Override
@tracking("user.login")
public Map<String, Object> userLogin(@RequestBody Map<String, Object> event,
        HttpServletRequest request) {

    // Implementation

    return event;
}
Bus Event Packaging

Custom Interceptor class extends HandlerInterceptorAdapter

protected void handleTracking(String trackingCode, Map<String, Object> modelMap,
        HttpServletRequest request) {

    Map<String, Object> responseModel = new HashMap<String, Object>();

    // remove non-serializables & copy over data from modelMap

    try {
        this.eventPublisher.publish(trackingCode, responseModel, request);
    } catch (Exception e) {
        log.error("Error tracking event '" + trackingCode + "' : "
                + ExceptionUtils.getStackTrace(e));
    }
}
Bus Event Packaging

Custom Interceptor class extends HandlerInterceptorAdapter

public void publish(String eventCode, Map<String, Object> eventData,
        HttpServletRequest request) {

    Map<String, Object> payload = new HashMap<String, Object>();
    String eventId = UUID.randomUUID().toString();
    Map<String, String> requestMap = HttpRequestUtils.getRequestHeaders(request);

    // Normalize message: copy each field from its own key
    payload.put("eventType", eventData.get("eventType"));
    payload.put("eventData", eventData.get("eventData"));
    payload.put("version", eventData.get("version"));
    payload.put("eventId", eventId);
    payload.put("eventTime", new Date());
    payload.put("request", requestMap);
    // ...
    // Send to the Analytics Service for MongoDB persistence
}

public void sendPost(EventPayload payload) {
    HttpEntity<Object> request = new HttpEntity<Object>(payload.getEventPayload(), headers);
    Map m = restTemplate.postForObject(endpoint, request, java.util.Map.class);
}
Bus Event Packaging

Serialized JSON (User Action)

{
    "eventCode" : "user.login",
    "eventType" : "login",
    "version"   : "1.0",
    "eventTime" : "1358603157746",
    "eventData" : {
        "" : "",
        "" : "",
        "" : ""
    },
    "request" : {
        "call-source"     : "WEB",
        "user-context"    : "00002b4f1150249206ac2b692e48ddb3",
        "user.agent"      : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11",
        "cookie"          : "size=4; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8",
        "content-length"  : "204",
        "accept-encoding" : "gzip,deflate,sdch"
    }
}
Bus Event Packaging

Serialized JSON (Generic Event)

{
    "eventCode" : "generic.ui",
    "eventType" : "pageView",
    "version"   : "1.0",
    "eventTime" : "1358603157746",
    "eventData" : {
        "page"    : "/learnvest/moneycenter/inbox",
        "section" : "transactions",
        "name"    : "view transactions",
        "object"  : "page"
    },
    "request" : {
        "call-source"     : "WEB",
        "user-context"    : "00002b4f1150249206ac2b692e48ddb3",
        "user.agent"      : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11",
        "cookie"          : "size=4; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8",
        "content-length"  : "204",
        "accept-encoding" : "gzip,deflate,sdch"
    }
}
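
A receiving service would typically check a payload like the ones shown before storing it. The sketch below is one possible way to do that; `parse_event` and `REQUIRED_KEYS` are hypothetical names, and it assumes that `eventTime` is a string of milliseconds since the Unix epoch, which the sample values suggest but the slides do not state.

```python
import json
from datetime import datetime, timezone

# Top-level keys visible in the serialized examples.
REQUIRED_KEYS = ("eventCode", "eventType", "version", "eventTime", "eventData", "request")


def parse_event(raw: str) -> dict:
    """Decode a serialized event and convert eventTime to an aware datetime."""
    event = json.loads(raw)
    missing = [key for key in REQUIRED_KEYS if key not in event]
    if missing:
        raise ValueError("missing keys: " + ", ".join(missing))
    # Assumption: eventTime holds epoch milliseconds as a string.
    millis = int(event["eventTime"])
    event["eventTime"] = datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
    return event


raw = json.dumps({
    "eventCode": "generic.ui",
    "eventType": "pageView",
    "version": "1.0",
    "eventTime": "1358603157746",
    "eventData": {"page": "/learnvest/moneycenter/inbox", "object": "page"},
    "request": {"call-source": "WEB"},
})
event = parse_event(raw)
```

Converting the timestamp once, at the edge, means every downstream bucket (day, week, month, year) can be derived from the same datetime value.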
MongoDB Data Warehousing

MongoDB Information
    v2.2.0
    3-node replica-set
    1x Large (primary), 2x Medium (secondary) AWS Amazon-Linux machines
    Each with single 500GB EBS volumes mounted to /opt/data

MongoDB Config File
    dbpath = /opt/data/mongodb/data
    rest = true
    replSet = voyager

Volumes
    … events daily on web, ~600K on mobile
    …B per day at start, slowed to ~1GB per day
    Currently at 78GB (collecting since August 2012)

Future Scaling Strategy
    …p 2nd Replica-Set
    Add replica-sets to n at 60% / 250GB per EBS volume
    Shard key probably based on a sequential mix of email_address & additional string
MongoDB Data Warehousing

WEB/MOBILE

Persist all events, bucketed by source, event code and time:-
    WEB/MOBILE
    user.login
    Time (day, week-ending, month, year)

Insert into collection e_web / e_mobile

Upsert into:-
    e_web_user_login_day
    e_web_user_login_week
    e_web_user_login_month
    e_web_user_login_year

Predictable model for scaling and measuring business growth
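
The naming pattern above (source, event code, time dimension) is regular enough to derive in code. A minimal sketch, assuming the event code's dots are replaced by underscores as the collection names suggest; `bucket_collections` is a hypothetical helper, not something named in the talk.

```python
TIME_DIMENSIONS = ("day", "week", "month", "year")


def bucket_collections(source: str, event_code: str) -> dict:
    """Map each time dimension to its aggregate collection name."""
    prefix = "e_{}_{}".format(source.lower(), event_code.replace(".", "_"))
    return {dim: "{}_{}".format(prefix, dim) for dim in TIME_DIMENSIONS}


def raw_collection(source: str) -> str:
    """Name of the collection holding every raw event for a source."""
    return "e_{}".format(source.lower())


names = bucket_collections("WEB", "user.login")
```

Because the names are derived rather than hard-coded, a new event code gets its four aggregate collections without any schema change.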
MongoDB Data Warehousing

// Increment the counter by one on every matching event
DBObject newDocument = new BasicDBObject().append("$inc",
        new BasicDBObject().append("count", 1));

// Update day dimension (upsert = true, multi = false)
collection_day.update(new BasicDBObject().append("user-context", userContext)
        .append("eventType", eventType)
        .append("date", sdf_day.format(d)), newDocument, true, false);

// Update week dimension
collection_week.update(new BasicDBObject().append("user-context", userContext)
        .append("eventType", eventType)
        .append("date", sdf_day.format(w)), newDocument, true, false);

// Update month dimension
collection_month.update(new BasicDBObject().append("user-context", userContext)
        .append("eventType", eventType)
        .append("date", sdf_month.format(d)), newDocument, true, false);

// Update year dimension
collection_year.update(new BasicDBObject().append("user-context", userContext)
        .append("eventType", eventType)
        .append("date", sdf_year.format(d)), newDocument, true, false);
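
The same upsert-and-increment pattern can be sketched in Python. The function below only relies on an `update_one(filter, update, upsert=True)` method, which PyMongo 3+ collections provide; `InMemoryCollection` is a hypothetical stand-in (equality filters and `$inc` only) so the example runs without a database, and the date labels are illustrative.

```python
class InMemoryCollection:
    """Tiny stand-in for a MongoDB collection: equality filters and $inc only."""

    def __init__(self):
        self.docs = []

    def update_one(self, query, update, upsert=False):
        for doc in self.docs:
            if all(doc.get(key) == value for key, value in query.items()):
                break
        else:
            if not upsert:
                return
            # Upsert: create the document from the filter fields.
            doc = dict(query)
            self.docs.append(doc)
        for field, amount in update.get("$inc", {}).items():
            doc[field] = doc.get(field, 0) + amount


def increment_buckets(collections, user_context, event_type, labels):
    """Upsert one counter document per time dimension."""
    for dimension, label in labels.items():
        query = {"user-context": user_context, "eventType": event_type, "date": label}
        collections[dimension].update_one(query, {"$inc": {"count": 1}}, upsert=True)


collections = {dim: InMemoryCollection() for dim in ("day", "week", "month", "year")}
labels = {"day": "01/02", "week": "01/06", "month": "01", "year": "2013"}
increment_buckets(collections, "c4ca4238a0b923820dcc509a6f75849b", "login", labels)
increment_buckets(collections, "c4ca4238a0b923820dcc509a6f75849b", "login", labels)
```

The first call creates each counter document and the second only increments it, so no read is needed before the write.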
MongoDB Data Warehousing

…ount_addManual_week
e_web_account_addManual_year
…_user_login_day
…_user_login_week
…_user_login_month
…_user_login_year
e_mobile_generic_ui_day
e_mobile_generic_ui_month
e_mobile_generic_ui_week
e_mobile_generic_ui_year

> db.e_web_user_login_day.find()
{ "_id" : ObjectId("50e4b9871b36921910222c42"), "count" : 5, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
{ "_id" : ObjectId("50cd6cfcb9a80a2b4ee21422"), "count" : 7, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
{ "_id" : ObjectId("50cd6e51b9a80a2b4ee21427"), "count" : 2, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
{ "_id" : ObjectId("50e4b9871b36921910222c42"), "count" : 3, "date" : "01/03", "user-context" : "50e49a561b36921910222c33" }
MongoDB Data Warehousing

… "accept-charset" : "ISO-8859-1,utf-8;q=0.7,*;q=0.3", "cookie" : "size=…; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8", "content-length" : "255", "accept-encoding" : "gzip,deflate,sdch" }, "eventType" : "flick", "eventData" : { "object" : "button", "name" : "split transaction button", "page" : "#inbox/79876/", "section" : "transaction_river_details" } }
MongoDB Data Warehousing

Indexing Strategy

Indexes on core collections (e_web and e_mobile) come in under 3GB on the 7.5GB Large instance and 3.75GB on Medium instances

Store datetime in two fields and compound index on date with other fields like eventType and unique id (user-context)

Heavy insertion rates, much lower read rates, so the fewer indexes the better
MongoDB Data Warehousing

Indexing Strategy

> db.e_web.getIndexes()
[
    {
        "v" : 1,
        "key" : { "request.user-context" : 1, "created_date" : 1 },
        "ns" : "moneycenter.e_web",
        "name" : "request.user-context_1_created_date_1"
    },
    {
        "v" : 1,
        "key" : { "eventData.name" : 1, "created_date" : 1 },
        "ns" : "moneycenter.e_web",
        "name" : "eventData.name_1_created_date_1"
    }
]
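
The two compound indexes listed above could be declared from Python along these lines. This is a sketch: `INDEXES`, `index_name` and `ensure_indexes` are hypothetical names, and `ensure_indexes` assumes a PyMongo collection, whose `create_index` method accepts a list of (field, direction) pairs.

```python
ASCENDING = 1  # same value as pymongo.ASCENDING

# Key specifications matching the getIndexes() output.
INDEXES = [
    [("request.user-context", ASCENDING), ("created_date", ASCENDING)],
    [("eventData.name", ASCENDING), ("created_date", ASCENDING)],
]


def index_name(keys) -> str:
    """Default MongoDB index name: field_direction pairs joined by underscores."""
    return "_".join("{}_{}".format(field, direction) for field, direction in keys)


def ensure_indexes(collection) -> list:
    """Create each compound index; returns the index names reported by the driver."""
    return [collection.create_index(keys) for keys in INDEXES]


expected_names = [index_name(keys) for keys in INDEXES]
```

Keeping the list this short is deliberate for a write-heavy collection, since every additional index adds work to each insert.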
Loading & Visualization

Objective
    Show historic and intraday stats on core use cases (logins, conversions)
    Show user funnel rates on conversion pages
    Show general usability - how do users really use the Web and iOS platforms?

Non-Functionals
    Intraday doesn't need to be "real-time", polling is good enough for now
    Overnight batch job for historic must scale horizontally

General Implementation Strategy
    Do all heavy lifting & object manipulation, UI should just display graph or table
    Modularize the service to be able to regenerate any graphs/tables without a full load
Loading & Visualization

Java Batch Service

Java Mongo library to query key collections and return user counts and sum of events

DBCursor webUserLogins = c.find(
        new BasicDBObject("date", sdf.format(new Date())));

private HashMap<String, Object> getSumAndCount(DBCursor cursor) {
    HashMap<String, Object> m = new HashMap<String, Object>();

    int sum = 0;    // total number of events
    int count = 0;  // number of documents (one per user)
    DBObject obj;
    while (cursor.hasNext()) {
        obj = (DBObject) cursor.next();
        count++;
        sum = sum + (Integer) obj.get("count");
    }

    m.put("sum", sum);
    m.put("count", count);
    // Guard against division by zero when the cursor is empty
    m.put("average", count == 0 ? 0f : (float) sum / count);

    return m;
}
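
For comparison, the same sum/count/average roll-up can be written in a few lines of Python. This is a sketch over plain dictionaries shaped like the counter documents shown earlier; `sum_and_count` is a hypothetical name, and rounding the average to two decimals is an assumption.

```python
def sum_and_count(docs) -> dict:
    """Return total events, number of documents, and the average per document."""
    total = 0
    users = 0
    for doc in docs:
        users += 1
        total += doc["count"]
    # Avoid dividing by zero when there are no documents for the day.
    average = total / users if users else 0.0
    return {"sum": total, "count": users, "average": round(average, 2)}


sample = [
    {"count": 5, "date": "01/02"},
    {"count": 7, "date": "01/02"},
    {"count": 2, "date": "01/02"},
]
stats = sum_and_count(sample)
```

Any iterable of documents works here, including a database cursor, because the function only reads the `count` field of each item.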
Loading & Visualization

Java Batch Service

Use Aggregation Framework where required on core collections (e_web) and externa…

// Create aggregation objects
DBObject project = new BasicDBObject("$project",
        new BasicDBObject("day_value", fields));
DBObject day_value = new BasicDBObject("day_value", "$day_value");
DBObject groupFields = new BasicDBObject("_id", day_value);

// Create the fields to group by, in this case "number"
groupFields.put("number", new BasicDBObject("$sum", 1));

// Create the group
DBObject group = new BasicDBObject("$group", groupFields);

// Execute
AggregationOutput output = mycollection.aggregate(project, group);

for (DBObject obj : output.results()) {
    // ...
}
Loading & Visualization

Java Batch Service

MongoDB Command Line example on aggregation over a time period, e.g. month

> db.e_web.aggregate([
    { $match : { created_date : { $gt : ISODate("2012-10-25T00:00:00") } } },
    { $project : {
        day_value : { day : { $dayOfMonth : "$created_date" },
                      month : { $month : "$created_date" } } } },
    { $group : {
        _id : { day_value : "$day_value" },
        number : { $sum : 1 } } },
    { $sort : { "_id.day_value" : -1 } }
])
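
Expressed from Python, the same pipeline is just a list of stage documents. The sketch below only builds that list; `daily_counts_pipeline` is a hypothetical helper. It sorts on `_id.day_value` on the assumption that the intent is to order by the grouped day, since after `$group` only `_id` and `number` remain in each result.

```python
from datetime import datetime


def daily_counts_pipeline(since: datetime) -> list:
    """Build the match/project/group/sort stages for events created after `since`."""
    return [
        {"$match": {"created_date": {"$gt": since}}},
        {"$project": {"day_value": {
            "day": {"$dayOfMonth": "$created_date"},
            "month": {"$month": "$created_date"},
        }}},
        {"$group": {"_id": {"day_value": "$day_value"}, "number": {"$sum": 1}}},
        {"$sort": {"_id.day_value": -1}},
    ]


pipeline = daily_counts_pipeline(datetime(2012, 10, 25))
```

With PyMongo, a list like this would be passed to `Collection.aggregate`; building it in a function keeps the start date a parameter, so the batch job can regenerate any period.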
Loading & Visualization

Java Batch Service

Persisting events into graph and table collections

> db.homeGraphs.find()
{ "_id" : ObjectId("50f57b5c1d4e714b581674e2"), "accounts_natural" : 54, "accounts_total" : 54, "date" : ISODate("2011-02-06T05:00:00Z"), "linked_rate" : ….96, "premium_rate" : 0, "str_date" : "2011,01,06", "upgrade_rate" : 0, "users_avg_linked" : 3.43, "users_linked" : 7 }
{ "_id" : ObjectId("50f57b5c1d4e714b581674e3"), "accounts_natural" : 144, "accounts_total" : 144, "date" : ISODate("2011-02-07T05:00:00Z"), "linked_rate" : ….11, "premium_rate" : 0, "str_date" : "2011,01,07", "upgrade_rate" : 0, "users_avg_linked" : 4, "users_linked" : 16 }
{ "_id" : ObjectId("50f57b5c1d4e714b581674e4"), "accounts_natural" : 119, "accounts_total" : 119, "date" : ISODate("2011-02-08T05:00:00Z"), "linked_rate" : ….13, "premium_rate" : 0, "str_date" : "2011,01,08", "upgrade_rate" : 0, "users_avg_linked" : 4.5, "users_linked" : 18 }
Loading & Visualization

…day numbers

    try:
        conn = pymongo.Connection('localhost', …)
        db = conn['lvanalytics']
        cursor = db.accountmetrics.find(
            {"date": {"$gte": dt_from, "$lte": dt_to}}).sort("date")
        return buildMetricsDict(cursor)
    except Exception as e:
        logger.error(e.message)

Return the graph object (as a list or a dict of lists) to the view that called the method

    pagedata = {}
    pagedata['accountsGraph'] = mongodb_home.getHomeChart()

    return render_to_response('home.html', {'pagedata': pagedata},
        context_instance=RequestContext(request))

> db.homeGraphs.find()
{ "_id" : ObjectId("50f57b5c1d4e714b581674e2"), "accounts_natural" : 54, …
Loading & Visualization

Django and HighCharts

Populate the series (JavaScript with Django templating)

seriesOptions[0] = {
    id: 'naturalAccounts',
    name: "Natural Accounts",
    data: [
        {% for a in pagedata.metrics.accounts_natural %}
            {% if not forloop.first %}, {% endif %}
            [Date.UTC({{a.0}}),{{a.1}}]
        {% endfor %}
    ],
    tooltip: {
        valueDecimals: 2
    }
};
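
The template above expects each point as a pair whose first element can be dropped straight into `Date.UTC(...)`, which takes a zero-based month. A minimal sketch of preparing such pairs on the Python side follows; `to_series` is a hypothetical helper, and the "year,month,day" string format is inferred from the `str_date` values shown in the homeGraphs documents.

```python
from datetime import date


def to_series(rows) -> list:
    """Turn (date, value) rows into ("Y,MM,DD", value) pairs with a zero-based month."""
    series = []
    for day, value in rows:
        # JavaScript's Date.UTC counts months from 0, so February becomes 01.
        label = "{},{:02d},{:02d}".format(day.year, day.month - 1, day.day)
        series.append((label, value))
    return series


points = to_series([
    (date(2011, 2, 6), 54),
    (date(2011, 2, 7), 144),
    (date(2011, 2, 8), 119),
])
```

Doing the month shift once in the loader keeps the template free of date arithmetic, in line with the goal of having the UI only display what it is given.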
Loading & Visualization

Django and HighCharts

And create the charts and tables...
Lessons Learned	

• Date Time managed as two fields, Datetime and Date	

• Aggregating and upserting documents as events are received works for us	

•  Real-time Map-Reduce in pyMongo - too slow, don’t do this.	

	

• Django-nonrel - unstable; use Django and configure MongoDB as a
      datastore only


• Memcached on Django is good enough (at the moment) - use django-celery
      with rabbitmq to pre-cache all data after data loading	


•  HighCharts is buggy - considering D3 & other libraries

• Don’t need to retrieve data directly from MongoDB to Django, perhaps
      provide all data via a service layer (at the expense of ever-additional
      features in pyMongo)
Next Steps	

• A/B testing framework, experiments and variances	

•  Unauthenticated / Authenticated user tracking	

•  Provide data async over service layer	

• Segmentation with graphical libraries like D3 & Cross-Filter (http://square.github.com/crossfilter/)


• Saving Query Criteria, expanding out BI tools for internal users	

• MongoDB Connector, Hadoop and Hive (maybe Tableau and other tools)	

• Storm / Kafka for real-time analytics processing	

• Shard the Replica-Set, looking into Gizzard as the middleware
Hrishi Dixit
    Chief Technology Officer
    hrishi@learnvest.com

Kevin Connelly
    Director of Engineering
    kevin@learnvest.com

Will Larche
    Lead iOS Developer
    will@learnvest.com

Cameron Sim
    Director of Analytics Tech
    cameron@learnvest.com

Jeremy Brennan
    Director of UI/UX Technology
    jeremy@learnvest.com

your name here
    New Awesome Developer (hiring)
    you@learnvest.com

Implementing and Visualizing Clickstream data with MongoDB

  • 1. Implementing and Visualizing Click- Stream Data with MongoDB Jan 22, 2013 - New York MongoDB User Group Cameron Sim - LearnVest.com
  • 2. Agenda •  About LearnVest •  HL Application Architecture •  Data Capture •  Event Packaging •  MongoDB Data Warehousing •  Loading & Visualization •  Finishing up
  • 3. LearnVest Inc. www.learnvest.com Mission Statement Aiming to making Financial Planning as accessible as having a gym membership Company Key Products nded in 2008 by Alexa Von Tobel, CEO Account Aggregation and Managem (Bank, Credit, Loan, Investment, Mort 50+ People and Growing rapidly Based in NYC Original and Syndicated Newsletter Co Platforms Financial Planning Web iPhone (tiered product offering) Stack Analytics Operational MongoDB 2.2.0 (3-node replica-set Wordpress, Backbone.js, Node.js Java 6, Spring 3 ava Spring 3, Redis, Memcached,
  • 5. LearnVest.com IPhone
  • 6. High Level Architecture Production Analytics elivery Services Services Loaders Dashbo HTTPS pyMongo
  • 7. ure Everything Collection -Driven events over web and mobile m-level exceptions ything else porary Data ok’ with approximate data rational Databases are the system of record egate events as they come in ove the overhead of basic metrics (counts, sums) on core events p by user unique id and increment counts per event, over time-dimensions eek-ending, month, year)
  • 8. Data Capture: iOS
      - (void)sendAnalyticEventType:(NSString *)eventType object:(NSString *)object name:(NSString *)name page:(NSString *)page source:(NSString *)source;

      // params is used but never declared on the slide; assumed to be a mutable dictionary
      NSMutableDictionary *params = [NSMutableDictionary dictionary];
      NSMutableDictionary *eventData = [NSMutableDictionary dictionary];
      if (eventType != nil) [params setObject:eventType forKey:@"eventType"];
      if (object != nil) [eventData setObject:object forKey:@"object"];
      if (name != nil) [eventData setObject:name forKey:@"name"];
      if (page != nil) [eventData setObject:page forKey:@"page"];
      if (source != nil) [eventData setObject:source forKey:@"source"];
      if (eventData != nil) [params setObject:eventData forKey:@"eventData"];
      [[LVNetworkEngine sharedManager] analytics_send:params];
  • 9. Data Capture: Web (JavaScript)
      function internalTrackPageView() {
          var cookie = {
              userContext: jQuery.cookie('UserContextCookie')
          };
          var trackEvent = {
              eventType: "pageView",
              eventData: { page: window.location.pathname + window.location.search }
          };
          // AJAX
          jQuery.ajax({
              url: "/api/track",
              type: "POST",
              dataType: "json",
              data: JSON.stringify(trackEvent),
              // Set request headers
              beforeSend: function (xhr, settings) {
                  xhr.setRequestHeader('Accept', 'application/json');
                  xhr.setRequestHeader('User-Context', cookie.userContext);
                  if (settings.type === 'PUT' || settings.type === 'POST') {
                      // value is cut off on the slide ('application/js...'); 'application/json' is assumed
                      xhr.setRequestHeader('Content-Type', 'application/json');
                  }
              }
          });
      }
  • 10. Bus Event Packaging. (1) Spring 3 RESTful service layer; controller methods define the eventCode via a @Tracking annotation. (2) A custom interceptor class extends HandlerInterceptorAdapter and implements …Handle() (for each event) to invoke calls via Spring @Async to an EventPublisher. (3) The EventPublisher publishes to a common event bus queue with multiple subscribers, one of which packages the eventPayload Map<String, Object> object and forwards it to the Analytics REST…
  • 11. Bus Event Packaging: Spring RestController Methods
      // Interface
      @RequestMapping(value = "/user/login", method = RequestMethod.POST, headers = "Accept=application/json")
      public Map<String, Object> userLogin(@RequestBody Map<String, Object> event, HttpServletRequest request);

      // Concrete/Impl class
      @Override
      @Tracking("user.login")
      public Map<String, Object> userLogin(@RequestBody Map<String, Object> event, HttpServletRequest request) {
          // Implementation
          return event;
      }
  • 12. Bus Event Packaging: Custom Interceptor class extends HandlerInterceptorAdapter
      protected void handleTracking(String trackingCode, Map<String, Object> modelMap, HttpServletRequest request) {
          Map<String, Object> responseModel = new HashMap<String, Object>();
          // remove non-serializables and copy over data from modelMap
          try {
              this.eventPublisher.publish(trackingCode, responseModel, request);
          } catch (Exception e) {
              log.error("Error tracking event '" + trackingCode + "' : " + ExceptionUtils.getStackTrace(e));
          }
      }
  • 13. Bus Event Packaging: Custom Interceptor class extends HandlerInterceptorAdapter
      public void publish(String eventCode, Map<String, Object> eventData, HttpServletRequest request) {
          Map<String, Object> payload = new HashMap<String, Object>();
          String eventId = UUID.randomUUID().toString();
          Map<String, String> requestMap = HttpRequestUtils.getRequestHeaders(request);
          // Normalize message
          payload.put("eventType", eventData.get("eventType"));
          // The slide reads eventData.get("eventType") on the next two lines as well;
          // the matching keys are assumed here.
          payload.put("eventData", eventData.get("eventData"));
          payload.put("version", eventData.get("version"));
          payload.put("eventId", eventId);
          payload.put("eventTime", new Date());
          payload.put("request", requestMap);
          . . .
      }

      // Send to the Analytics Service for MongoDB persistence
      public void sendPost(EventPayload payload) {
          HttpEntity request = new HttpEntity(payload.getEventPayload(), headers);
          Map m = restTemplate.postForObject(endpoint, request, java.util.Map.class);
      }
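The normalization step on this slide can be sketched in Python as follows. This is a minimal illustration, not the LearnVest implementation: `build_payload`, the default version of "1.0", and the epoch-millisecond `eventTime` are assumptions chosen to mirror the serialized JSON on the following slides.

```python
import time
import uuid


def build_payload(event_code, event_data, request_headers):
    """Wrap a raw event in a normalized envelope (hypothetical helper)."""
    return {
        "eventCode": event_code,
        "eventType": event_data.get("eventType"),
        "version": event_data.get("version", "1.0"),  # default is an assumption
        "eventId": str(uuid.uuid4()),                  # unique id per event
        "eventTime": int(time.time() * 1000),          # epoch milliseconds
        "eventData": event_data.get("eventData", {}),
        "request": dict(request_headers),              # copy of the HTTP headers
    }


payload = build_payload(
    "generic.ui",
    {"eventType": "pageView", "eventData": {"page": "/learnvest/moneycenter/inbox"}},
    {"call-source": "WEB"},
)
```

Keeping the envelope fields fixed and pushing everything event-specific into `eventData` means one subscriber can persist every event type without per-event code.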
  • 14. Bus Event Packaging: Serialized JSON (User Action)
      {
        "eventCode" : "user.login",
        "eventType" : "login",
        "version" : "1.0",
        "eventTime" : "1358603157746",
        "eventData" : { "" : "", "" : "", "" : "" },
        "request" : {
          "call-source" : "WEB",
          "user-context" : "00002b4f1150249206ac2b692e48ddb3",
          "user.agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11",
          "cookie" : "size=4; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8",
          "content-length" : "204",
          "accept-encoding" : "gzip,deflate,sdch"
        }
      }
  • 15. Bus Event Packaging: Serialized JSON (Generic Event)
      {
        "eventCode" : "generic.ui",
        "eventType" : "pageView",
        "version" : "1.0",
        "eventTime" : "1358603157746",
        "eventData" : {
          "page" : "/learnvest/moneycenter/inbox",
          "section" : "transactions",
          "name" : "view transactions",
          "object" : "page"
        },
        "request" : {
          "call-source" : "WEB",
          "user-context" : "00002b4f1150249206ac2b692e48ddb3",
          "user.agent" : "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_2) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.101 Safari/537.11",
          "cookie" : "size=4; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8",
          "content-length" : "204",
          "accept-encoding" : "gzip,deflate,sdch"
        }
      }
  • 16. MongoDB Data Warehousing. MongoDB Information: v2.2.0; 3-node replica-set; Large (primary), 2x Medium (secondary) AWS Amazon-Linux machines with single 500GB EBS volumes mounted to /opt/data. MongoDB Config File: dbpath = /opt/data/mongodb/data, rest = true, replSet = voyager. Volumes: … events daily on web, ~600K on mobile; …B per day at start, slowed to ~1GB per day; currently at 78GB (collecting since August 2012). Future Scaling Strategy: …p 2nd Replica-Set; shard replica-sets to n at 60% / 250GB per EBS volume; shard key probably based on a sequential mix of email_address and an additional string.
  • 17. MongoDB Data Warehousing: WEB/MOBILE. Persist all events, bucketed by source, event code and time: source (WEB/MOBILE), event code (user.login), time (day, week-ending, month, year). Insert into collection e_web / e_mobile. Upsert into: e_web_user_login_day, e_web_user_login_week, e_web_user_login_month, e_web_user_login_year. Predictable model for scaling and measuring business growth.
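The bucketing rule on this slide (source, event code, time dimension) can be sketched as a function that maps one event to its four rollup targets. The helper names, the Sunday week-ending convention, and the month and year key formats are assumptions for illustration; only the collection naming pattern and the "01/02" style day key appear in the deck.

```python
from datetime import date, timedelta


def week_ending(d):
    """Return the Sunday that closes the week containing d (assumed convention)."""
    return d + timedelta(days=6 - d.weekday())


def bucket_targets(source, event_code, d):
    """Map one event to its four (collection name, date key) rollup buckets."""
    base = "e_%s_%s" % (source.lower(), event_code.replace(".", "_"))
    return [
        (base + "_day", d.strftime("%m/%d")),
        (base + "_week", week_ending(d).strftime("%m/%d")),
        (base + "_month", d.strftime("%Y/%m")),  # key format is an assumption
        (base + "_year", d.strftime("%Y")),      # key format is an assumption
    ]


targets = bucket_targets("WEB", "user.login", date(2013, 1, 2))
```

Because the collection name is derived mechanically from the source and event code, adding a new tracked event needs no schema work, which is what makes the model predictable to scale.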
  • 18. MongoDB Data Warehousing
      // Increment "count" by one; the same modifier document is reused for every dimension
      DBObject newDocument = new BasicDBObject().append("$inc", new BasicDBObject().append("count", 1));

      // Update day dimension (upsert = true, multi = false)
      collection_day.update(new BasicDBObject().append("user-context", userContext)
              .append("eventType", eventType)
              .append("date", sdf_day.format(d)), newDocument, true, false);

      // Update week dimension (w is the week-ending date)
      collection_week.update(new BasicDBObject().append("user-context", userContext)
              .append("eventType", eventType)
              .append("date", sdf_day.format(w)), newDocument, true, false);

      // Update month dimension
      collection_month.update(new BasicDBObject().append("user-context", userContext)
              .append("eventType", eventType)
              .append("date", sdf_month.format(d)), newDocument, true, false);

      // Update year dimension
      collection_year.update(new BasicDBObject().append("user-context", userContext)
              .append("eventType", eventType)
              .append("date", sdf_year.format(d)), newDocument, true, false);
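The effect of an upsert combined with `$inc` can be illustrated without a database. The sketch below is an in-memory stand-in, not a MongoDB client: `CounterCollection` and `upsert_inc` are hypothetical names, and the key tuple plays the role of the query document (user-context, eventType, date).

```python
from collections import Counter


class CounterCollection:
    """In-memory stand-in for one rollup collection (not a MongoDB client)."""

    def __init__(self):
        self.counts = Counter()

    def upsert_inc(self, user_context, event_type, date_key, amount=1):
        # Mirrors update(query, {"$inc": {"count": amount}}, upsert=True):
        # a missing key starts from 0, an existing one is incremented.
        key = (user_context, event_type, date_key)
        self.counts[key] += amount
        return self.counts[key]


day = CounterCollection()
for _ in range(3):
    day.upsert_inc("c4ca4238a0b923820dcc509a6f75849b", "login", "01/02")
day.upsert_inc("50e49a561b36921910222c33", "login", "01/03")
```

The point of the pattern is that the writer never needs to read first: the first event for a user and date creates the counter, and every later event only increments it.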
  • 19. MongoDB Data Warehousing
      Collections: e_web_account_addManual_week, e_web_account_addManual_year, e_web_user_login_day, e_web_user_login_week, e_web_user_login_month, e_web_user_login_year, e_mobile_generic_ui_day, e_mobile_generic_ui_month, e_mobile_generic_ui_week, e_mobile_generic_ui_year
      db.e_web_user_login_day.find()
      { "_id" : ObjectId("50e4b9871b36921910222c42"), "count" : 5, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
      { "_id" : ObjectId("50cd6cfcb9a80a2b4ee21422"), "count" : 7, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
      { "_id" : ObjectId("50cd6e51b9a80a2b4ee21427"), "count" : 2, "date" : "01/02", "user-context" : "c4ca4238a0b923820dcc509a6f75849b" }
      { "_id" : ObjectId("50e4b9871b36921910222c42"), "count" : 3, "date" : "01/03", "user-context" : "50e49a561b36921910222c33" }
  • 20. MongoDB Data Warehousing
      … "accept-charset" : "ISO-8859-1,utf-8;q=0.7,*;q=0.3", "cookie" : "size=…; CP.mode=B; PHPSESSID=c087908516ee2fae50cef6500101dc89; resolution=1920; JSESSIONID=56EB165266A2C4AFF946F139669D746F; csrftoken=73bdcdddf151dc56b8020855b2cb10c8", "content-length" : "255", "accept-encoding" : "gzip,deflate,sdch" }, "eventType" : "flick", "eventData" : { "obje…on", "name" : "split transaction button", "page" : "#inbox/79876/", "secti…saction_river_details" } }
  • 21. MongoDB Data Warehousing: Indexing Strategy. Indexes on the core collections (e_web and e_mobile) come in under 3GB, against 7.5GB on the Large instance and 3.75GB on the Medium instances. Datetime is held in two fields, with a compound index on date plus other fields such as eventType and the user unique id (user-context). Insertion rates are heavy and read rates much lower, so the fewer indexes the better.
  • 22. MongoDB Data Warehousing: Indexing Strategy
      db.e_web.getIndexes()
      [
        { "v" : 1, "key" : { "request.user-context" : 1, "created_date" : 1 }, "ns" : "moneycenter.e_web", "name" : "request.user-context_1_created_date_1" },
        { "v" : 1, "key" : { "eventData.name" : 1, "created_date" : 1 }, "ns" : "moneycenter.e_web", "name" : "eventData.name_1_created_date_1" }
      ]
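The two compound indexes listed here can be written as key specifications, and their names follow MongoDB's default rule of joining each field and direction with underscores. The sketch below only builds the specs and names in plain Python; `default_index_name` is a hypothetical helper, and no database call is made.

```python
def default_index_name(keys):
    """Build the default MongoDB index name from (field, direction) pairs."""
    return "_".join("%s_%s" % (field, direction) for field, direction in keys)


# Key specs in the list-of-tuples form that pymongo's create_index accepts.
user_by_date = [("request.user-context", 1), ("created_date", 1)]
name_by_date = [("eventData.name", 1), ("created_date", 1)]

index_names = [default_index_name(k) for k in (user_by_date, name_by_date)]
```

Both specs end in `created_date`, so each supports a lookup on its leading field narrowed to a date range, which matches the read pattern of the batch loaders.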
  • 23. Loading & Visualization. Objective: show historic and intraday stats on core use cases (logins, conversions); show user funnel rates on conversion pages; show general usability: how do users really use the Web and iOS platforms? Non-Functionals: intraday doesn’t need to be “real-time”, polling is good enough for now; the overnight batch job for historic data must scale horizontally. General Implementation Strategy: do all heavy lifting and object manipulation before the UI, which should just display a graph or table; modularize the service to be able to regenerate any graphs/tables without a full load.
  • 24. Loading & Visualization: Java Batch Service. Use the Java Mongo library to query key collections and return user counts and the sum of events.
      DBCursor webUserLogins = c.find(new BasicDBObject("date", sdf.format(new Date())));

      private HashMap<String, Object> getSumAndCount(DBCursor cursor) {
          HashMap<String, Object> m = new HashMap<String, Object>();
          int sum = 0;
          int count = 0;
          DBObject obj;
          while (cursor.hasNext()) {
              obj = (DBObject) cursor.next();
              count++;                                 // one document per user
              sum = sum + (Integer) obj.get("count");  // events recorded for that user
          }
          m.put("sum", sum);
          m.put("count", count);
          m.put("average", sdf.format(new Float(sum) / count));
          return m;
      }
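The same sum-and-count pass can be sketched in Python over any iterable of rollup documents. This is an illustration of the logic only: `get_sum_and_count` is a hypothetical name, the two-decimal rounding is an assumption, and the guard for an empty result set is an addition the slide does not show.

```python
def get_sum_and_count(docs):
    """Return the number of documents (users), total events, and events per user."""
    count = 0
    total = 0
    for doc in docs:
        count += 1              # one rollup document per user
        total += doc["count"]   # events recorded for that user
    # Guard against an empty cursor, where the average would divide by zero.
    average = round(total / float(count), 2) if count else 0.0
    return {"sum": total, "count": count, "average": average}


rows = [{"count": 5}, {"count": 7}, {"count": 2}]
stats = get_sum_and_count(rows)
```

With pymongo, the `docs` argument could be the cursor returned by a `find` on one of the day collections, since a cursor is iterable and yields dictionaries.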
  • 25. Loading & Visualization: Java Batch Service. Use the Aggregation Framework where required on core collections (e_web) and externa…
      // create aggregation objects
      DBObject project = new BasicDBObject("$project", new BasicDBObject("day_value", fields));
      DBObject day_value = new BasicDBObject("day_value", "$day_value");
      DBObject groupFields = new BasicDBObject("_id", day_value);
      // create the fields to group by, in this case "number"
      groupFields.put("number", new BasicDBObject("$sum", 1));
      // create the group
      DBObject group = new BasicDBObject("$group", groupFields);
      // execute
      AggregationOutput output = mycollection.aggregate(project, group);
      for (DBObject obj : output.results()) {
          …
  • 26. Loading & Visualization: Java Batch Service. MongoDB command-line example of aggregation over a time period, e.g. month:
      db.e_web.aggregate([
          { $match : { created_date : { $gt : ISODate("2012-10-25T00:00:00Z") } } },
          { $project : { day_value : { day : { $dayOfMonth : "$created_date" }, month : { $month : "$created_date" } } } },
          { $group : { _id : { day_value : "$day_value" }, number : { $sum : 1 } } },
          // after $group the day/month pair lives under _id, so sort on that path
          { $sort : { "_id.day_value" : -1 } }
      ])
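What this pipeline computes can be checked against a plain-Python equivalent on a handful of in-memory events. The sketch is illustrative only: `events_per_day` and the sample events are hypothetical, and the result is keyed by (month, day) tuples rather than the nested `_id` document MongoDB returns.

```python
from collections import Counter
from datetime import datetime


def events_per_day(events, since):
    """Count events created after `since`, grouped by (month, day), newest first."""
    counts = Counter(
        (e["created_date"].month, e["created_date"].day)  # $project + $group key
        for e in events
        if e["created_date"] > since                      # $match
    )
    return sorted(counts.items(), reverse=True)           # $sort descending


events = [
    {"created_date": datetime(2012, 10, 24, 9, 0)},   # before the cutoff, dropped
    {"created_date": datetime(2012, 10, 26, 8, 30)},
    {"created_date": datetime(2012, 10, 26, 17, 5)},
    {"created_date": datetime(2012, 11, 1, 12, 0)},
]
result = events_per_day(events, datetime(2012, 10, 25))
```

Filtering before grouping mirrors the reason `$match` comes first in the pipeline: it keeps the later stages from touching documents outside the time window.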
  • 27. Loading & Visualization: Java Batch Service. Persisting events into graph and table collections.
      db.homeGraphs.find()
      { "_id" : ObjectId("50f57b5c1d4e714b581674e2"), "accounts_natural" : 54, "accounts_total" : 54, "date" : ISODate("2011-02-06T05:00:00Z"), "linked_rate" : ….96, "premium_rate" : 0, "str_date" : "2011,01,06", "upgrade_rate" : 0, "users_avg_linked" : 3.43, "users_linked" : 7 }
      { "_id" : ObjectId("50f57b5c1d4e714b581674e3"), "accounts_natural" : 144, "accounts_total" : 144, "date" : ISODate("2011-02-07T05:00:00Z"), "linked_rate" : ….11, "premium_rate" : 0, "str_date" : "2011,01,07", "upgrade_rate" : 0, "users_avg_linked" : 4, "users_linked" : 16 }
      { "_id" : ObjectId("50f57b5c1d4e714b581674e4"), "accounts_natural" : 119, "accounts_total" : 119, "date" : ISODate("2011-02-08T05:00:00Z"), "linked_rate" : ….13, "premium_rate" : 0, "str_date" : "2011,01,08", "upgrade_rate" : 0, "users_avg_linked" : 4.5, "users_linked" : 18 }
  • 28. Loading & Visualization: … day numbers
      try:
          conn = pymongo.Connection('localhost', 27017)
          db = conn['lvanalytics']
          cursor = db.accountmetrics.find(
              {"date": {"$gte": dt_from, "$lte": dt_to}}).sort("date")
          return buildMetricsDict(cursor)
      except Exception as e:
          logger.error(e.message)

      # Return the graph object (as a list or a dict of lists) to the view that called the method
      pagedata = {}
      pagedata['accountsGraph'] = mongodb_home.getHomeChart()
      return render_to_response('home.html', {'pagedata': pagedata},
                                context_instance=RequestContext(request))
  • 29. Loading & Visualization: Django and HighCharts. Populate the series (JavaScript with Django templating):
      seriesOptions[0] = {
          id: 'naturalAccounts',
          name: "Natural Accounts",
          data: [
              {% for a in pagedata.metrics.accounts_natural %}
                  {% if not forloop.first %}, {% endif %}
                  [Date.UTC({{a.0}}),{{a.1}}]
              {% endfor %}
          ],
          tooltip: { valueDecimals: 2 }
      };
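The template expects each series entry as a pair whose first element can be dropped straight into `Date.UTC(...)`. A minimal sketch of preparing such pairs on the Python side follows; `date_utc_args` and `build_series` are hypothetical names, and the zero-based month is an assumption inferred from the stored `str_date` values (for example "2011,01,06" alongside a February date), which matches how JavaScript's `Date.UTC` counts months.

```python
from datetime import datetime


def date_utc_args(d):
    """Format a date as 'YYYY,MM,DD' arguments for JavaScript's Date.UTC (0-based month)."""
    return "%d,%02d,%02d" % (d.year, d.month - 1, d.day)


def build_series(docs, field):
    """Return [date-args, value] pairs for one metric, ordered by date."""
    ordered = sorted(docs, key=lambda doc: doc["date"])
    return [[date_utc_args(doc["date"]), doc[field]] for doc in ordered]


docs = [
    {"date": datetime(2011, 2, 7, 5), "accounts_natural": 144},
    {"date": datetime(2011, 2, 6, 5), "accounts_natural": 54},
]
series = build_series(docs, "accounts_natural")
```

Doing the date formatting and ordering in the service keeps the template to a plain loop, in line with the deck's aim that the UI should only display what it is given.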
  • 30. Loading & Visualization: Django and HighCharts. And create the charts and tables...
  • 31. Loading & Visualization: Django and HighCharts. And create the charts and tables...
  • 32. Lessons Learned • Date-time is managed as two fields, Datetime and Date • Aggregating and upserting documents as events are received works for us • Real-time Map-Reduce in pyMongo: too slow, don’t do this • Django-nonrel: unstable; use Django and configure MongoDB as a datastore only • Memcached on Django is good enough (at the moment); use django-celery with RabbitMQ to pre-cache all data after data loading • HighCharts is buggy; considering D3 and other libraries • No need to retrieve data directly from MongoDB into Django; perhaps provide all data via a service layer (at the expense of ever-additional features in pyMongo)
  • 33. Next Steps • A/B testing framework, experiments and variances • Unauthenticated / authenticated user tracking • Provide data async over a service layer • Segmentation with graphical libraries like D3 and Cross-Filter (http://square.github.com/crossfilter/) • Saving query criteria, expanding out BI tools for internal users • MongoDB Connector, Hadoop and Hive (maybe Tableau and other tools) • Storm / Kafka for real-time analytics processing • Shard the replica-set, looking into Gizzard as the middleware
  • 34. Hrishi Dixit, Chief Technology Officer, hrishi@learnvest.com • Kevin Connelly, Director of Engineering, kevin@learnvest.com • Will Larche, Lead iOS Developer, will@learnvest.com • Cameron Sim, Director of Analytics Tech…, cameron@learnvest.com • Jeremy Brennan, Director of UI/UX Technology, jeremy@learnvest.com • (your name here), New Awesome Develope…, you@learnvest.com • HIR…