The basics of Fluentd

            Masahiro Nakagawa
                  Treasure Data, Inc.
             Senior Software Engineer
Structured logging

                       Reliable forwarding

http://fluentd.org/   Pluggable architecture
Agenda



>   Background
>   Overview
>   Product Comparison
>   Use cases
Background
Data Processing
                       Data source


   Collect   Store   Process   Visualize




 Reporting
Monitoring
Related Products
            easier & shorter time


  Collect   Store Process            Visualize




???         Cloudera                Excel
            Hortonworks             Tableau
            Treasure Data           R
Before Fluentd
 Server1           Server2               Server3

Application       Application          Application


           ・・・               ・・・                   ・・・




                                   High Latency!
                                   must wait for a day...
                  Fluent
                 Log Server
After Fluentd
 Server1                Server2              Server3

Application        Application              Application


 Fluentd   ・・・          Fluentd   ・・・        Fluentd   ・・・




                                        In streaming!

              Fluentd             Fluentd
Overview
In short


>   Open source log collector written in Ruby
>   Uses the RubyGems ecosystem for plugins


    It’s like syslogd, but
uses JSON for log messages
Apache  --write-->  log file  --tail-->  Fluentd  --insert-->  Mongo
                                      (event buffering)

Log file:
127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:26:30] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:26:32] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:26:40] "GET / ...
127.0.0.1 - - [11/Dec/2012:07:27:01] "GET / ...
...

Event:
Time    2012-12-11 07:26:27
Tag     apache.log
Record  {
          "host": "127.0.0.1",
          "method": "GET",
          "path": "/",
          ...
        }
Event structure (log message)


✓ Time
>   second unit
>   from data source or adding parsed time
✓ Tag
>   for message routing
✓ Record
>   JSON format
>   MessagePack internally
>   structured (not free-form text)
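
As a rough illustration of the Time / Tag / Record triple, the sketch below parses one of the access-log lines from the previous slide into an event. The regular expression, the `Event` struct, and the UTC assumption are illustrative choices, not Fluentd's actual parser, and JSON from the standard library stands in for MessagePack.

```ruby
require "json"
require "time"

# Hypothetical, simplified pattern for the truncated log lines shown on the slide.
LINE_PATTERN = /\A(?<host>\S+) \S+ \S+ \[(?<time>[^\]]+)\] "(?<method>\S+) (?<path>\S+)/

Event = Struct.new(:tag, :time, :record)

# Turns one log line into an Event, or nil when the line does not match.
def parse_line(tag, line)
  m = LINE_PATTERN.match(line)
  return nil unless m

  # The slide's timestamps carry no zone, so UTC is assumed here.
  t = Time.strptime("#{m[:time]} +0000", "%d/%b/%Y:%H:%M:%S %z")
  record = { "host" => m[:host], "method" => m[:method], "path" => m[:path] }
  Event.new(tag, t.to_i, record) # time is kept in whole seconds
end

event = parse_line("apache.log", '127.0.0.1 - - [11/Dec/2012:07:26:27] "GET / ...')
puts JSON.generate([event.tag, event.time, event.record])
```

The tag stays outside the record so that routing never has to inspect the record itself.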
Architecture
Pluggable      Pluggable   Pluggable



  Input         Buffer     Output

> Forward      > Memory    > Forward
> HTTP         > File      > File
> File tail                > Amazon S3
> dstat                    > MongoDB
> ...                      > ...
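
The Input / Buffer / Output split can be pictured with a toy pipeline. This is only a sketch: `MemoryBuffer`, `ArrayOutput`, and `Pipeline` are hypothetical names, not Fluentd's plugin API, and the buffer flushes on size only.

```ruby
# Hypothetical mini-pipeline: class names are illustrative, not Fluentd's plugin API.

# Buffer stage: collects events into a chunk of at most `limit` events.
class MemoryBuffer
  def initialize(limit)
    @limit = limit
    @chunk = []
  end

  # Returns true when the chunk is full and should be flushed.
  def push(event)
    @chunk << event
    @chunk.size >= @limit
  end

  # Hands over the current chunk and starts a new one.
  def take
    chunk = @chunk
    @chunk = []
    chunk
  end
end

# Output stage: here it only records what it was asked to write.
class ArrayOutput
  attr_reader :written

  def initialize
    @written = []
  end

  def write(chunk)
    @written.concat(chunk)
  end
end

# Wires the stages together; an input stage would call `emit`.
class Pipeline
  def initialize(buffer, output)
    @buffer = buffer
    @output = output
  end

  def emit(tag, time, record)
    flush if @buffer.push([tag, time, record])
  end

  def flush
    chunk = @buffer.take
    @output.write(chunk) unless chunk.empty?
  end
end

output = ArrayOutput.new
pipeline = Pipeline.new(MemoryBuffer.new(2), output)
pipeline.emit("myapp.login", 1355212561, { "user" => 38 })
pipeline.emit("myapp.login", 1355212562, { "user" => 39 })
puts output.written.size
```

Because each stage only sees the one before it, swapping the memory buffer for a file-backed one would not touch the input or output code.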
Client libraries
> Ruby
> Java
> Perl
> PHP
> Python
> D
> Scala
> ...

Application  --Time:Tag:Record-->  Fluentd

# Ruby
Fluent.open("myapp")
Fluent.event("login", {"user" => 38})
#=> 2012-12-11 07:56:01 myapp.login     {"user":38}
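
What a client library does with that call can be sketched without any network code. `MiniLogger` and `format_event` below are hypothetical stand-ins, not the API of any real Fluentd client; the injected clock exists only to make the example reproducible.

```ruby
require "json"

# Hypothetical stand-in for a client library; it builds events but sends nothing.
class MiniLogger
  def initialize(prefix, clock: -> { Time.now.utc })
    @prefix = prefix
    @clock = clock
  end

  # Builds the [time, tag, record] triple an application would send.
  def event(label, record)
    [@clock.call.to_i, "#{@prefix}.#{label}", record]
  end
end

# Renders an event in the "time tag record" form used on the slide.
def format_event(time, tag, record)
  stamp = Time.at(time).utc.strftime("%Y-%m-%d %H:%M:%S")
  "#{stamp} #{tag} #{JSON.generate(record)}"
end

logger = MiniLogger.new("myapp", clock: -> { Time.utc(2012, 12, 11, 7, 56, 1) })
puts format_event(*logger.event("login", { "user" => 38 }))
```

The tag is the prefix given at open time joined to the per-event label, which is how `myapp` and `login` become `myapp.login`.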
Configuration and operation


>   No central / master node
    >   HTTP include helps share configuration
>   Operation depends on your environment
    >   Use your own daemon management tool
    >   Treasure Data uses Chef
>   Scribe-like syntax
# receive events via HTTP
<source>
 type http
 port 8888
</source>

# read logs from a file
<source>
 type tail
 path /var/log/httpd.log
 format apache
 tag apache.access
</source>

# save access logs to MongoDB
<match apache.access>
 type mongo
 database apache
 collection log
</match>

# save alerts to a file
<match alert.**>
 type file
 path /var/log/fluent/alerts
</match>

# forward other logs to servers
<match **>
 type forward
 <server>
  host 192.168.0.11
  weight 20
 </server>
 <server>
  host 192.168.0.12
  weight 60
 </server>
</match>

include http://example.com/conf
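
The `<match>` patterns in the configuration above route events by tag. The sketch below assumes that `*` matches exactly one tag part, `**` matches zero or more parts, and the first matching block wins; it handles only the pattern shapes that appear in the slide and is not Fluentd's own matcher. The route order and the output symbols are illustrative.

```ruby
# Converts a tag pattern into a Regexp (simplified; assumes "**" is first or trailing).
def pattern_to_regexp(pattern)
  src = +""
  pattern.split(".").each_with_index do |part, i|
    case part
    when "**" # zero or more tag parts
      src << (i.zero? ? "(?:[^.]+(?:\\.[^.]+)*)?" : "(?:\\.[^.]+)*")
    when "*"  # exactly one tag part
      src << (i.zero? ? "" : "\\.") << "[^.]+"
    else
      src << (i.zero? ? "" : "\\.") << Regexp.escape(part)
    end
  end
  Regexp.new("\\A#{src}\\z")
end

# Hypothetical routing table mirroring the three <match> blocks.
ROUTES = [
  ["apache.access", :mongo],
  ["alert.**", :file],
  ["**", :forward]
].freeze

# Returns the output of the first pattern that matches the tag.
def route(tag)
  ROUTES.each do |pattern, output|
    return output if pattern_to_regexp(pattern).match?(tag)
  end
  nil
end

%w[apache.access alert.disk.full myapp.login].each do |tag|
  puts "#{tag} -> #{route(tag)}"
end
```

Placing the catch-all `**` last matters under the first-match assumption: listed first, it would swallow every event.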
Reliability (core + plugin)

>   Buffering
    >   Use the file buffer to persist data
    >   Each buffer chunk has an ID, so writes can be idempotent
>   Retrying
>   Error handling
    >   transaction, failover, etc. in the forward plugin
    >   secondary
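
Why a chunk ID helps with retries can be shown with a toy destination that refuses to apply the same chunk twice. `IdempotentStore` is a hypothetical class and the random hex ID is an assumption about the ID format; how real outputs use chunk IDs is up to each plugin.

```ruby
require "securerandom"

# Hypothetical destination that remembers which chunk IDs it has applied.
class IdempotentStore
  attr_reader :rows

  def initialize
    @applied = {}
    @rows = []
  end

  # Applies a chunk once; a repeated ID is ignored and reported as false.
  def write(chunk_id, events)
    return false if @applied.key?(chunk_id)

    @rows.concat(events)
    @applied[chunk_id] = true
    true
  end
end

store = IdempotentStore.new
chunk_id = SecureRandom.hex(8)
events = [["myapp.login", 1355212561, { "user" => 38 }]]
store.write(chunk_id, events) # first delivery
store.write(chunk_id, events) # retry after a lost acknowledgement
puts store.rows.size
```

The sender can therefore retry freely: a duplicate delivery of the same chunk leaves the stored rows unchanged.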
Plugins - use rubygems


$ fluent-gem search -rd fluent-plugin


$ fluent-gem search -rd fluent-mixin


$ fluent-gem install fluent-plugin-mongo

※ Plugin development is not covered today
http://fluentd.org/plugin/
in_tail


     Apache                    Fluentd
                                                ✓ read a log file
                                                ✓ custom regexp
                                                ✓ custom parser in Ruby
       access.log

Supported format:
 >   apache         >   json
 >   apache2        >   csv
 >   syslog         >   tsv
 >   nginx          >   ltsv (since v0.10.32)
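
Of the listed formats, LTSV is simple enough to parse in a few lines: fields are separated by tabs and each field is `label:value`. The function below is a sketch of that idea with a made-up sample line, not the parser shipped with in_tail.

```ruby
# Parses one LTSV line into a Hash; fields without a colon are skipped.
def parse_ltsv(line)
  line.chomp.split("\t")
      .map { |field| field.split(":", 2) } # split on the first colon only
      .select { |pair| pair.size == 2 }
      .to_h
end

record = parse_ltsv("host:127.0.0.1\tmethod:GET\tpath:/\ttime:11/Dec/2012:07:26:27\n")
puts record["time"]
```

Splitting on the first colon only keeps values such as timestamps, which contain colons themselves, intact.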
out_mongo


  Apache        Fluentd



   access.log      buffer


                            ✓ retry automatically
                            ✓ exponential retry wait
                            ✓ persistent on a file
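
"Exponential retry wait" means the pause doubles after each failed attempt. A minimal sketch follows; the initial wait, the cap, and the attempt count are assumed values, not Fluentd defaults, and the `sleeper` argument is injected so the example runs without real delays.

```ruby
# Waits double after each failed attempt, capped at max_wait seconds.
def retry_waits(initial, attempts, max_wait)
  (0...attempts).map { |n| [initial * (2**n), max_wait].min }
end

# Calls the block until it returns true or the waits run out.
def with_retry(waits, sleeper: ->(seconds) { sleep(seconds) })
  waits.each do |wait|
    return true if yield

    sleeper.call(wait)
  end
  false
end

waits = retry_waits(1.0, 5, 10.0)
attempts = 0
ok = with_retry(waits, sleeper: ->(_seconds) {}) { (attempts += 1) >= 3 }
puts "#{ok} after #{attempts} attempts"
```

While the output is retrying, the buffered chunk stays where it is, which is why a file-backed buffer lets the data survive a restart.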
out_webhdfs



  Apache                   Fluentd



   access.log                    buffer
                                                  HDFS


    ✓ custom text formatter
    ✓ slice files based on time
      2013-01-01/01/access.log.gz
      2013-01-01/02/access.log.gz
      2013-01-01/03/access.log.gz
      ...
    ✓ retry automatically
    ✓ exponential retry wait
    ✓ persistent on a file
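
Time-based slicing amounts to deriving the file path from the event time. The helper below reproduces the hourly layout from the slide; the function name and the use of UTC are assumptions.

```ruby
# Maps an event time to an hourly path, following the slide's example layout.
def sliced_path(epoch_seconds, name = "access.log.gz")
  Time.at(epoch_seconds).utc.strftime("%Y-%m-%d/%H/") + name
end

puts sliced_path(Time.utc(2013, 1, 1, 1, 30).to_i)
```

All events from the same hour map to the same path, so each hourly file can be written and closed as a unit.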
out_copy + other plugins
                                       Hadoop
  Apache        Fluentd



   access.log      buffer

                                    Amazon S3

                          ✓ routing based on tags
                          ✓ copy to multiple storages
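
Copying to multiple storages is a fan-out: every configured output receives the same chunk. In this sketch `CopyOutput` is a hypothetical class and the two lambdas stand in for the Hadoop and Amazon S3 outputs.

```ruby
# Hypothetical fan-out: each output gets its own copy of the chunk.
class CopyOutput
  def initialize(outputs)
    @outputs = outputs
  end

  def write(chunk)
    @outputs.each { |output| output.call(chunk.dup) }
  end
end

hadoop = []
s3 = []
copy = CopyOutput.new([->(c) { hadoop.concat(c) }, ->(c) { s3.concat(c) }])
copy.write([["apache.access", 1355210787, { "path" => "/" }]])
puts [hadoop.size, s3.size].inspect
```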
out_forward
                       ✓ automatic fail-over
                       ✓ load balancing

                                       Fluentd
    apache
  Apache        Fluentd
                                       Fluentd

                                       Fluentd
   access.log      buffer


                            ✓ retry automatically
                            ✓ exponential retry wait
                            ✓ persistent on a file
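
Load balancing and fail-over can be combined in one selection step: pick among the servers that are currently alive, in proportion to their weights. The sketch reuses the hosts and weights from the configuration slide; the `Server` struct, the `alive` flag, and the selection rule are assumptions, not out_forward's implementation.

```ruby
Server = Struct.new(:host, :weight, :alive)

# Picks a live server with probability proportional to its weight.
# `r` is a number in [0, 1), e.g. from rand; passing it in keeps the sketch deterministic.
def pick_server(servers, r)
  live = servers.select(&:alive)
  return nil if live.empty?

  target = r * live.sum(&:weight)
  live.each do |server|
    target -= server.weight
    return server if target < 0
  end
  live.last
end

servers = [
  Server.new("192.168.0.11", 20, true),
  Server.new("192.168.0.12", 60, true)
]
puts pick_server(servers, 0.1).host # falls in the first 20/80 of the range
servers[0].alive = false
puts pick_server(servers, 0.1).host # fail-over: only the second server is left
```

With weights 20 and 60, the second server should receive roughly three quarters of the traffic while both are up.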
Forward topology


Fluentd   send/ack


Fluentd          Fluentd   send/ack

                                 Fluentd
Fluentd
                 Fluentd

Fluentd
Access logs                               Alerting
 Apache                                     Nagios

App logs                                  Analysis
 Frontend                                  MongoDB
 Backend                                   MySQL

System logs                                Hadoop
  syslogd
              filter / buffer / routing   Archiving
Databases                                  Amazon S3
td-agent

>   Open source distribution package of Fluentd
>   ETL part of Treasure Data
>   Includes useful components
    >   ruby, jemalloc, fluentd
    >   3rd party gems: td, mongo, webhdfs, etc.
        (the td plugin is for Treasure Data)

>   http://packages.treasure-data.com/
v11

>   Breaks source code compatibility
    >   The protocol stays compatible
>   Windows support
>   Error handling enhancement
>   Better DSL configuration
>   etc.: https://gist.github.com/frsyuki/2638703
Product Comparison
Scribe
     Scribe: log collector by Facebook
Pros and Cons

●   Pros
    >   Fast (written in C++)
●   Cons
    >   Hard to install and extend
        Are you a C++ magician?
    >   Deals with unstructured logs
    >   No longer maintained
        Replaced with Calligraphus at Facebook
Flume
Flume: distributed log collector by Cloudera

  Physical           Flume Master
 Topology

             Flume      Flume       Flume



  Logical
 Topology                                      Hadoop
                                                HDFS
Network topology
                    Master
           Agent                        ack

           Agent   Collector
Flume OG                                Collector
           Agent   Collector     send
           Agent

                    Master     Option
           Agent
                                 send/ack
           Agent   Collector
Flume NG                                Collector
           Agent   Collector
           Agent
Pros and Cons

●   Pros
    >    Uses a central master to manage all nodes
●   Cons
    >    Java culture (a pro for Java users?)
        Difficult configuration and setup
    >    Difficult topology
    >    Mainly for Hadoop
        fewer plugins?
Use cases
Treasure Data

   Frontend                           Worker
                          Job Queue                      Hadoop




                                                         Hadoop

 Applications push
 metrics to Fluentd
                                                   sums up data minutes
 (via local Fluentd)       Fluentd    Fluentd      (partial aggregation)


       Treasure                                 Librato
           Data                                 Metrics
for historical analysis                         for realtime analysis
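
The partial aggregation step in this diagram can be read as summing metric values per minute before they are sent on. That reading, the event layout `[time, name, value]`, and the function name are all assumptions made for this sketch.

```ruby
# Sums metric values per (name, minute start); a sketch of partial aggregation.
def aggregate_by_minute(events)
  events.each_with_object(Hash.new(0)) do |(time, name, value), sums|
    minute_start = time - (time % 60)
    sums[[name, minute_start]] += value
  end
end

events = [
  [1355212561, "login", 1], # 07:56:01
  [1355212579, "login", 1], # 07:56:19
  [1355212621, "login", 1]  # 07:57:01
]
p aggregate_by_minute(events)
```

Aggregating in the intermediate Fluentd nodes keeps the number of data points sent to the realtime service small, while the raw events can still go to long-term storage.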
Cookpad
hundreds of app servers


  Rails app           td-agent
               sends event logs                            Daily/Hourly      Google
                                                           Batch           Spreadsheet

  Rails app           td-agent             Treasure Data
               sends event logs
                                                                            MySQL

  Rails app           td-agent
                                  Logs are available
               sends event logs
                                  after several mins.

                                                                     KPI
                                  Feedback rankings        visualization
  Unlimited scalability
  Flexible schema
  Realtime
  Less performance impact               ✓ Over 100 RoR servers (2012/2/4)
NHN Japan
                                                                                       Archive
                                                                                       Storage
        Web
       Servers                               Fluentd                                  (scribed)
                                             Cluster
                                                                                      Notifications
                               STREAM                                                    (IRC)
                                                           Fluentd
                                                           Watchers
                                                                                          Graph
                                                                                          Tools

                               webhdfs                                        SCHEDULED
                                                              BATCH              BATCH
✓ 16 nodes               Hadoop Cluster               hive
                                                     server
✓ 120,000+ lines/sec
                             CDH4                                        Shib              ShibUI
✓ 400Mbps at peak                                  Huahin
✓ 1.5+ TB/day (raw)      (HDFS, YARN)              Manager

           http://www.slideshare.net/tagomoris/log-analysis-with-hadoop-in-livedoor-2013 by @tagomoris
Other companies
Conclusion


>   Fluentd is a widely used log collector
    >   There are many use cases
    >   Many contributors and plugins
>   Keep it simple
    >   Easy to integrate into your environment
