A Case Study of a Distributed Architecture
for Real-Time Log Indexing, Storage,
and Analysis

         Juan Lopes
COMPLEX EVENT
   PROCESSING
TIME SERIES
REAL-TIME
LOGS
tdc2012
HUNDREDS OF
SERVERS
marvin@goldenheart ~ $ ssh root@deepthought
****
WELCOME TO 1 OF YOUR 38,157,987 SERVERS.
TRY THE VEAL. IT'S THE BEST IN THIS FARM.
****

root@deepthought ~ $ tail -f /var/log.txt




HOW TO ACCESS THE LOGS?
HOW TO "DEBUG"?
CENTRALIZE
INDEX
3 TB / DAY
10,000,000,000 MSGS / DAY
36 MB / SECOND
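The three figures above are mutually consistent, which a few lines of arithmetic can confirm. The sketch below assumes binary units (1 TB = 1024 × 1024 MB), since that is the reading under which 3 TB per day rounds to 36 MB per second; the class and method names are illustrative and not from the talk.

```java
class Throughput {
    static final long SECONDS_PER_DAY = 24L * 60 * 60;

    // Assumption: binary units, 1 TB = 1024 * 1024 MB.
    static double megabytesPerSecond(double terabytesPerDay) {
        return terabytesPerDay * 1024 * 1024 / SECONDS_PER_DAY;
    }

    static double messagesPerSecond(double messagesPerDay) {
        return messagesPerDay / SECONDS_PER_DAY;
    }

    // Average message size implied by the daily volume and message count.
    static double averageMessageBytes(double terabytesPerDay, double messagesPerDay) {
        return terabytesPerDay * Math.pow(1024, 4) / messagesPerDay;
    }

    public static void main(String[] args) {
        System.out.printf("%.1f MB/s%n", megabytesPerSecond(3));
        System.out.printf("%.0f msgs/s%n", messagesPerSecond(10_000_000_000.0));
        System.out.printf("%.0f bytes/msg%n", averageMessageBytes(3, 10_000_000_000.0));
    }
}
```

Under these assumptions the same numbers imply on the order of a hundred thousand messages per second, at a few hundred bytes per message.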
TWITTER
400,000,000 MSGS / DAY
IN JUNE 2012
LOGGLY
Widely used
First choice for the cloud
Largest non-custom plan: 12 GB/day
Price: $1,779/month
GRAYLOG2
Open Source
Self-hosted
Architecture with many moving parts
MongoDB
ElasticSearch
AMQP
SPLUNK
Well known in the Big Data space
Aimed at the enterprise world
Lots of charts and reports
$6,000 one-time fee: 500 MB/day
500MB < 3TB :(
JAVA
HOTSPOT
java.util.concurrent
OVERVIEW

                     ➜ Store
messages ➜ Parse
                     ➜ Index
RFC 3164: SYSLOG

 <34>Oct 11 22:14:15 mymachine su: 'su
 root' failed for lonvick on /dev/pts/8

 <priority = facility*8+severity>
 <date/time>
 <host>
 <process>
 <message>
KEY: VALUE

message    <34>Oct 11 22:14:15 mymachine su: 'su root'
           failed for lonvick on /dev/pts/8
text       su, root, failed, for, lonvick, on, /dev/pts/8

facility   AUTH
severity   CRITICAL
date       20121011
time       221415
host       mymachine
process    su
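As a minimal sketch of the parsing step, the code below splits an RFC 3164 line into the key/value fields listed above. The regular expression, the class name, and the short facility and severity labels are assumptions made for illustration, not the talk's actual parser. RFC 3164 timestamps carry no year, so the year is passed in by the caller, and tokenizing the `text` field is left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SyslogParser {
    // Short labels for the RFC 3164 facility and severity codes (labels are our own shorthand).
    private static final String[] FACILITIES = {
        "KERN", "USER", "MAIL", "DAEMON", "AUTH", "SYSLOG", "LPR", "NEWS",
        "UUCP", "CRON", "AUTHPRIV", "FTP", "NTP", "AUDIT", "ALERT", "CLOCK",
        "LOCAL0", "LOCAL1", "LOCAL2", "LOCAL3", "LOCAL4", "LOCAL5", "LOCAL6", "LOCAL7"};
    private static final String[] SEVERITIES = {
        "EMERGENCY", "ALERT", "CRITICAL", "ERROR", "WARNING", "NOTICE", "INFO", "DEBUG"};
    private static final String MONTHS = "JanFebMarAprMayJunJulAugSepOctNovDec";

    // <pri>Mmm dd hh:mm:ss host process[pid]: message
    private static final Pattern LINE = Pattern.compile(
        "^<(\\d{1,3})>(\\w{3}) +(\\d{1,2}) (\\d{2}):(\\d{2}):(\\d{2}) (\\S+) "
        + "([^:\\[\\s]+)(?:\\[\\d+\\])?: (.*)$");

    static Map<String, String> parse(String line, int year) {
        Matcher m = LINE.matcher(line);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an RFC 3164 line: " + line);
        }
        int priority = Integer.parseInt(m.group(1));
        int monthIndex = MONTHS.indexOf(m.group(2));
        if (priority > 191 || monthIndex < 0 || monthIndex % 3 != 0) {
            throw new IllegalArgumentException("bad priority or month: " + line);
        }
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("message", line);
        // priority = facility * 8 + severity
        fields.put("facility", FACILITIES[priority / 8]);
        fields.put("severity", SEVERITIES[priority % 8]);
        fields.put("date", String.format("%04d%02d%02d",
            year, monthIndex / 3 + 1, Integer.parseInt(m.group(3))));
        fields.put("time", m.group(4) + m.group(5) + m.group(6));
        fields.put("host", m.group(7));
        fields.put("process", m.group(8));
        return fields;
    }
}
```

For the sample line, priority 34 decodes as facility 4 and severity 2, which matches the AUTH and CRITICAL values in the table.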
?
MG4J
 Egothor
    Nutch
    Oxyus
  BDDBot
Zilverline
     YaCy
Compass
      Lius
   Regain
 Piscator
 Hounder
 HSearch
<FIELD:CONTENT, DOC*>

TEXT:ABACAXI ➜ 1, 3, 9
TEXT:BANANA ➜ 2, 3, 10, 42
TEXT:CAJU ➜ 3, 11, 50
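The mapping above is a classic inverted index: each `field:term` key points to the sorted list of documents that contain it. A minimal in-memory sketch follows; it assumes documents are added in increasing ID order, and the class and method names are hypothetical rather than taken from the talk or from any search library.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class InvertedIndex {
    // "FIELD:TERM" -> ascending list of document IDs
    private final Map<String, List<Integer>> postings = new TreeMap<>();

    // Assumption: callers add documents in increasing docId order.
    void add(int docId, String field, String... terms) {
        for (String term : terms) {
            List<Integer> docs =
                postings.computeIfAbsent(field + ":" + term, k -> new ArrayList<>());
            // Skip the append when this document is already the last entry.
            if (docs.isEmpty() || docs.get(docs.size() - 1) != docId) {
                docs.add(docId);
            }
        }
    }

    List<Integer> lookup(String field, String term) {
        return postings.getOrDefault(field + ":" + term, Collections.emptyList());
    }

    public static void main(String[] args) {
        InvertedIndex index = new InvertedIndex();
        index.add(1, "TEXT", "ABACAXI");
        index.add(2, "TEXT", "BANANA");
        index.add(3, "TEXT", "ABACAXI", "BANANA", "CAJU");
        index.add(9, "TEXT", "ABACAXI");
        index.add(10, "TEXT", "BANANA");
        index.add(11, "TEXT", "CAJU");
        index.add(42, "TEXT", "BANANA");
        index.add(50, "TEXT", "CAJU");
        System.out.println(index.lookup("TEXT", "BANANA"));
    }
}
```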
LOW ENTROPY
<10% unique terms
lower overhead per message

MESSAGE BAG
OVERVIEW

                   ➜ Store
Parse ➜ Buffer
                   ➜ Index
<DOC, FREQ, POSITION*>

1, 4 ➜ 5, 6, 10, 20
3, 1 ➜ 40
9, 4 ➜ 6, 7, 8, 9
SCORES DON'T
MATTER
NORMAL
INDEXING       SEARCH
Field          QueryParser
Document       Query
IndexWriter    IndexSearcher
HARDCORE
INDEXING       SEARCH
TokenStream    TermPositions
Document       FieldCache
IndexWriter    IndexReader
BLAME THE WIDESCREEN
WEB INTERFACE
Jersey (REST API)
Backbone.js
CometD

browser ➜ engine: "app:apache http 404"?
engine ➜ browser: "OK. listen: /comet/1234568790abcdef"
BLAME THE WIDESCREEN
COMMAND-LINE INTERFACE
Guts and determination
HttpClient
CometD




       /intelie/lognit-cli
REALTIME (AKA TAIL -F)

EVENTS ➜ subscriber
LIGHTWEIGHT TERM TRIE
ABRAÇO
ABRIGO
CHOCOLATE
<ROOT>
  ABR
    AÇO
    IGO
  CHOCOLATE
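The tree above shares the prefix ABR between ABRAÇO and ABRIGO and keeps CHOCOLATE as a single edge, which is the shape of a compressed (radix) trie. The sketch below is one way to build such a structure; it is an illustration under that assumption, not the talk's implementation, and all class and method names are hypothetical.

```java
import java.util.Map;
import java.util.TreeMap;

class TermTrie {
    private static final class Node {
        final Map<Character, Edge> edges = new TreeMap<>();
        boolean terminal; // true when a stored term ends at this node
    }

    private static final class Edge {
        String label;
        Node target;
        Edge(String label, Node target) { this.label = label; this.target = target; }
    }

    private final Node root = new Node();

    void insert(String term) {
        Node node = root;
        String rest = term;
        while (!rest.isEmpty()) {
            Edge edge = node.edges.get(rest.charAt(0));
            if (edge == null) {
                // No shared prefix: hang the whole remainder on a new leaf.
                Node leaf = new Node();
                leaf.terminal = true;
                node.edges.put(rest.charAt(0), new Edge(rest, leaf));
                return;
            }
            int common = commonPrefixLength(edge.label, rest);
            if (common < edge.label.length()) {
                // Split the edge at the end of the shared prefix.
                Node middle = new Node();
                middle.edges.put(edge.label.charAt(common),
                    new Edge(edge.label.substring(common), edge.target));
                edge.label = edge.label.substring(0, common);
                edge.target = middle;
            }
            node = edge.target;
            rest = rest.substring(common);
        }
        node.terminal = true;
    }

    boolean contains(String term) {
        Node node = root;
        String rest = term;
        while (!rest.isEmpty()) {
            Edge edge = node.edges.get(rest.charAt(0));
            if (edge == null || !rest.startsWith(edge.label)) return false;
            rest = rest.substring(edge.label.length());
            node = edge.target;
        }
        return node.terminal;
    }

    // Renders the trie as an indented outline, one edge label per line.
    String dump() {
        StringBuilder out = new StringBuilder("<ROOT>\n");
        dump(root, 1, out);
        return out.toString();
    }

    private static void dump(Node node, int depth, StringBuilder out) {
        for (Edge edge : node.edges.values()) {
            for (int i = 0; i < depth; i++) out.append("  ");
            out.append(edge.label).append('\n');
            dump(edge.target, depth + 1, out);
        }
    }

    private static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) i++;
        return i;
    }
}
```

Inserting ABRAÇO, ABRIGO, and CHOCOLATE in that order splits the first edge at ABR, so `dump()` reproduces the outline shown on the slide.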
AGGREGATION (AKA WC -L)

EVENTS

Query                                             Output rate
http                                              ~10,000 events / second
http => count()                                   1 event / second
http => count() by host                           ~100 events / second
http => count() by host every 30 seconds          ~100 events / 30 seconds
http => avg(cputime#) by host every 30 seconds    ~100 events / 30 seconds
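A query such as `http => count() by host every 30 seconds` can be read as a tumbling-window aggregation: filter the stream, group by host, and emit one row per host when the window closes. The sketch below illustrates that reading only. The query language's real semantics are not given in the slides, so the substring filter, the in-order timestamps, and every name here are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class WindowedCount {
    private final String term;
    private final long windowMillis;
    private long windowStart = Long.MIN_VALUE;
    private Map<String, Long> counts = new TreeMap<>();

    // One map (host -> count) per closed window that had matching events.
    final List<Map<String, Long>> emitted = new ArrayList<>();

    WindowedCount(String term, long windowMillis) {
        this.term = term;
        this.windowMillis = windowMillis;
    }

    // Assumption: events arrive in non-decreasing, non-negative timestamp order.
    void onEvent(long timestampMillis, String host, String text) {
        long start = timestampMillis - (timestampMillis % windowMillis);
        if (start != windowStart) {
            flush(); // the previous window is over, emit its counts
            windowStart = start;
        }
        // Simplified filter: plain substring match instead of a real term query.
        if (text.contains(term)) {
            counts.merge(host, 1L, Long::sum);
        }
    }

    // Emits the current window's counts, if any, and starts a fresh map.
    void flush() {
        if (!counts.isEmpty()) {
            emitted.add(counts);
            counts = new TreeMap<>();
        }
    }
}
```

However many raw events arrive, the output is bounded by the number of distinct hosts per window, which is the effect the rates above describe.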
BLAME THE WIDESCREEN
WE NEED TO SCALE
read rate: MODERATE
write rate: VERY HIGH
dependency between data: LOW
SHARDING

                               ➜ engine
UDP/TCP 514 ➜ Load Balancer    ➜ engine
                               ➜ engine
CLUSTER
engine   engine   engine

CLUSTER
engine   engine   engine   |   Web Server + Broker   |   HTTP   |   user

CLUSTER
engine   engine   engine   |   Web Server   |   HTTP   |   user

CLUSTER
engine   engine   (Multicast)   engine   |   HTTP   |   user
MULTICAST
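import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.jgroups.JChannel;
import org.jgroups.Message;
import org.jgroups.ReceiverAdapter;

public class Chat {
    public static void main(String[] args) throws Exception {
        // Default protocol stack. ReceiverAdapter and send(Address, Object)
        // are the JGroups 3.x API; later major versions changed both.
        JChannel channel = new JChannel();
        channel.setReceiver(new ReceiverAdapter() {
            @Override
            public void receive(Message msg) {
                // Print each message received on the channel, prefixed by its sender
                System.out.println(msg.getSrc() + ": " + msg.getObject());
            }
        });

        // Join (or create) the cluster with this name
        channel.connect("meuCanalDeChat");

        BufferedReader reader = new BufferedReader(new InputStreamReader(System.in));
        String line;
        while ((line = reader.readLine()) != null) { // stop at end of input
            channel.send(null, line); // null destination = all cluster members
        }
        channel.close();
    }
}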
CONFIGURABLE STACK
EVERYTHING IS DISTRIBUTED
SEARCH

user ➜ engines: last 10 "http_status: 404"
each engine ➜ 10 results
mergesort, take 10 ➜ user: 10 results

SEARCH

user ➜ engines: last 10 "http_status: 404" before {id:84324814}
each engine ➜ 10 results
mergesort, take 10 ➜ user: 10 results
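The "mergesort, take 10" step can be sketched as a k-way merge: every engine returns its own newest matches already sorted, and the coordinator only interleaves those lists until it has enough results. The sketch assumes that message IDs grow with recency, which the `before {id:...}` cursor suggests but the slides do not state, and that each engine's list is sorted newest first. All names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class ShardMerge {
    // perEngine: one list of message IDs per engine, each sorted in descending order.
    static List<Long> latest(List<List<Long>> perEngine, int limit) {
        // Heap entries are {engineIndex, positionInThatList}, ordered by the ID
        // they point to, largest (newest) first.
        PriorityQueue<int[]> heap = new PriorityQueue<>((a, b) -> Long.compare(
            perEngine.get(b[0]).get(b[1]), perEngine.get(a[0]).get(a[1])));
        for (int engine = 0; engine < perEngine.size(); engine++) {
            if (!perEngine.get(engine).isEmpty()) {
                heap.add(new int[] {engine, 0});
            }
        }
        List<Long> merged = new ArrayList<>();
        while (merged.size() < limit && !heap.isEmpty()) {
            int[] top = heap.poll();
            List<Long> source = perEngine.get(top[0]);
            merged.add(source.get(top[1]));
            // Advance within the engine whose result was just taken.
            if (top[1] + 1 < source.size()) {
                heap.add(new int[] {top[0], top[1] + 1});
            }
        }
        return merged;
    }
}
```

Because each engine returns at most `limit` results, the coordinator handles at most `limit` × engines items per page. Under the same assumption, the next page would repeat the query with `before` set to the last ID already shown.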
AGGREGATION

http 200 => count() by host

host    count
foo     1234
bar     2345
baz     3456

AGGREGATION

count() + count() + count()

engine    engine    engine
AGGREGATION

http 200 => avg(time) by host

host    avg_time
foo     0.888889
bar     0.224568
baz     5.623424

AGGREGATION

avg(time) + avg(time) + avg(time) ?

engine    engine    engine

AGGREGATION

  sum(time) + sum(time) + sum(time)
---------------------------------------
count(time) + count(time) + count(time)

engine    engine    engine
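Counts from different engines can simply be added, but averages cannot: an engine that saw one event would weigh as much as one that saw a thousand. The fix shown above is for each engine to report a partial (sum, count) pair and to divide only after merging. A minimal sketch of that merge follows, with hypothetical class and method names.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class PartialAvg {
    final double sum;
    final long count;

    PartialAvg(double sum, long count) {
        this.sum = sum;
        this.count = count;
    }

    // Combining two partials is just component-wise addition.
    PartialAvg merge(PartialAvg other) {
        return new PartialAvg(sum + other.sum, count + other.count);
    }

    // The division happens once, after all partials are merged.
    double value() {
        return sum / count;
    }

    // Merges the per-host partials reported by every engine.
    static Map<String, PartialAvg> mergeAll(List<Map<String, PartialAvg>> perEngine) {
        Map<String, PartialAvg> total = new TreeMap<>();
        for (Map<String, PartialAvg> partial : perEngine) {
            partial.forEach((host, p) -> total.merge(host, p, PartialAvg::merge));
        }
        return total;
    }
}
```

For example, if one engine reports sum 2.0 over 1 event and another reports sum 2.0 over 3 events for the same host, averaging the two averages gives about 1.33, while the merged pair gives the correct 4.0 / 4 = 1.0.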
AND IN PRODUCTION?
3.5 BILLION MESSAGES
1 TB OF ORIGINAL DATA
180 GB OF INDEX
3 SERVERS (LOAD < 0.2)
ONE LAST THING
1700+
          TESTS

99%
LINES
COVERED
THANK YOU!

     /juanplopes

     @juanplopes

     intelie.com.br
