SlideShare a Scribd company logo
Fluentd ♥ MongoDB
                          Log Everything As JSON


                 Kazuki Ohta, CTO at Treasure Data, Inc.




Tuesday, July 17, 2012
Self-Introduction
           •       Kazuki Ohta
                   >     twitter: @kzk_mover
                   >     github: kzk

           •       Treasure Data, Inc.
                   >     Chief Technology Officer; Founder
                   >     Original Fluentd Author @frsyuki is another co-founder.

           •       Open-Source Enthusiast
                   >     KDE, uim, Hadoop, memcached, Mozilla, Mongo, etc.
                   >     Fluentd rpm/deb package manager
                                                                              2
Tuesday, July 17, 2012
Logging? Why?




Tuesday, July 17, 2012
Figure 1: Common Logging Purposes




                                                  Analytics

                                                  Error Notification

                                                  Recommendation


                                                                   4
Tuesday, July 17, 2012
Figure 2: Types of Logs




                                           App Log

                                           Access Log
                                           (Apache, Rails, etc.)
                                           System Log
                                           (syslog etc.)
                                           Others
                                                                   5
Tuesday, July 17, 2012
From “Scaling Lessons learned at Dropbox”
                                                            6
Tuesday, July 17, 2012
Fragile for format change,
                         No type information,
                         No field name, etc.


                         From “Scaling Lessons learned at Dropbox”
                                                            6
Tuesday, July 17, 2012
About Fluentd




Tuesday, July 17, 2012
It's like syslogd, but uses JSON for log
                 messages


                                                            8
Tuesday, July 17, 2012
Logs in JSON? Why?

                     1. Machine-Readable
                     > machine is goint to be a main consumer of logs


                     2. Schema-Free
                     > you want to add/remove fields from logs at anytime



    Write Logs for Machines, use JSON
    http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/
                                                                            9
Tuesday, July 17, 2012
Logs As TEXT


   Logs As JSON



                         + Field Name
                         + No Custom Parser
                         + Type Information
                         + Schema Free

                                       10
Tuesday, July 17, 2012
Logs As TEXT
         “2011-04-01 host1 myapp: cmessage size=12MB user=me”


   Logs As JSON
                         2011-04-01 myapp.message {
                             “on_host”: ”host1”,
                             ”combined”: true,
                             “size”: 12000000,     + Field Name
                             “user”: “me”          + No Custom Parser
                                                   + Type Information
                         }                         + Schema Free

                                                                 10
Tuesday, July 17, 2012
http://fluentd.org/




                                              11
Tuesday, July 17, 2012
•       Website
                   >     http://fluentd.org/

           •       Community
                   >     http://github.com/fluent
                   >     16 committers across
                         many organizations
                   >     web, game, enterprise

           •       Mailing list
                   >     Google groups

                                                   12
Tuesday, July 17, 2012
Fluentd Architecture




Tuesday, July 17, 2012
Fluentd: Log Format

                         Application



                          Fluentd




                          Storage


                                                14
Tuesday, July 17, 2012
Fluentd: Log Format

                         Application

                                       2012-02-04 01:33:51
                                       myapp.buylog {
                          Fluentd
                                           “user”: ”me”,
                                           “path”: “/buyItem”,
                                           “price”: 150,
                                           “referer”: “/landing”
                          Storage      }


                                                                   14
Tuesday, July 17, 2012
Fluentd: Log Format

                                                       time
                         Application                    tag
                                       2012-02-04 01:33:51
                                       myapp.buylog {
                          Fluentd
                                           “user”: ”me”,
                                           “path”: “/buyItem”,
                                           “price”: 150,
                                           “referer”: “/landing”
                          Storage      }
                                                    record

                                                                   14
Tuesday, July 17, 2012
Fluentd: Plugins

                             Application



                                           filter / buffer /
                              Fluentd
                                           routing




                              Storage


                                                              15
Tuesday, July 17, 2012
Fluentd: Plugins

                                       Application



                                                     filter / buffer /
                                        Fluentd
                                                     routing




                          SaaS          Storage            Fluentd

                         Plug-in        Plug-in           Plug-in
                                                                        15
Tuesday, July 17, 2012
Fluentd: Plugins

                                       Application



                                                     filter / buffer /
                                        Fluentd
                                                     routing




                          SaaS          Storage            Fluentd

                         Plug-in        Plug-in           Plug-in
                                                                        16
Tuesday, July 17, 2012
Fluentd: Plugins

            syslogd         Scribe     Application          File Plug-in

                                                     tail
           Plug-in Plug-in
                                                      filter / buffer /
                                        Fluentd
                                                      routing




                          SaaS          Storage                Fluentd

                         Plug-in        Plug-in               Plug-in
                                                                           16
Tuesday, July 17, 2012
•       Client libraries
                   > Ruby
                   > Perl             Application         Buffering

                   > PHP
                                            HTTP / TCP / UDS
                   > Python
                   > Java              Fluentd
                   > ...




                                                                17
Tuesday, July 17, 2012
•       Client libraries
                   > Ruby
                   > Perl               Application         Buffering

                   > PHP
                                              HTTP / TCP / UDS
                   > Python
                   > Java                Fluentd
                   > ...


            Fluent.open(“myapp”)
            Fluent.event(“login”, {“user”=>38})
            #=> 2012-02-04 04:56:01 myapp.login    {“user”:38}

                                                                  17
Tuesday, July 17, 2012
Typical Log Collection by `rsync`




               Burst of traffic
               rsync consumes
               all bandwidth



                                                             18
Tuesday, July 17, 2012
Typical Log Collection by `rsync`
                     App server              App server              App server

                   Application              Application            Application


               File File File ...          File File File ...     File File File ...


                                    File
               Burst of traffic                                 High latency
               rsync consumes                                   must wait for a day
               all bandwidth                 Log server         Hard to analyze
                                                                complex text parsers

                                                                                  18
Tuesday, July 17, 2012
Log Collection using Fluentd

                         Fluentd        Fluentd          Fluentd



                                                       Realtime!
                                   Fluentd   Fluentd




                                                                   19
Tuesday, July 17, 2012
Log Collection using Fluentd

                         Fluentd        Fluentd          Fluentd



                                                       Realtime!
                                   Fluentd   Fluentd



                                              Amazon     Ready to
                         Hadoop    Mongo
                                               S3 /
                          / Hive    DB
                                               EMR       Analyze!

                                                                    19
Tuesday, July 17, 2012
Fluentd Case Study
               Ruby on Rails              Ruby on Rails          Ruby on Rails


                         Fluentd              Fluentd               Fluentd




      ✓    127 RoR servers
      ✓    100,000 msgs/sec             Fluentd    Fluentd      routing
      ✓    120Mbps at peak
      ✓    1TB/day

                                      Hadoop            Mongo     User behavior
                           PV logs     / Hive            DB       logs

                                                                                 20
Tuesday, July 17, 2012
# read logs from a file         # forward other logs to servers
      <source>                        # (load-balancing + fail-over)
        type tail                     <match **>
        path /var/log/httpd.log         type forward
        format apache                   <server>
        tag apache.access                 host 192.168.0.11
      </source>                           weight 20
                                        </server>
      # save access logs to MongoDB     <server>
      <match apache.access>               host 192.168.0.12
        type mongo                        weight 60
        host 127.0.0.1                  </server>
      </match>                        </match>




Tuesday, July 17, 2012
Comparison




Tuesday, July 17, 2012
Scribe: log collector by
                               Facebook
                         Frontend servers

                                            Aggregator nodes
                             scribe
                                                scribe
                             scribe
                                                               Hadoop
                                                                HDFS
                             scribe
                                                scribe
                             scribe

                                                                        23
Tuesday, July 17, 2012
Scribe’s Pros & Cons
                • Pros.
                         • Fast (written in C++)
                • Cons.
                         • VERY HARD to install
                            • nightmare of boost, thrift, libhdfs, etc.
                         • Unstructured Logs
                            • parsing must be required before the analysis
                         • Hard to extend
                            • recompiling C++ programs are required
                         • No longer maintained

                                                                             24
Tuesday, July 17, 2012
Fluentd vs Scribe
                • Easy to install
                         • “gem install fluentd”
                         • Stable RPM and Deb packages
                           • http://packages.treasure-data.com/
                • Easy to write plugins
                         • you can use Ruby
                • Easy plugin distribution
                         • “gem search -rd fluent-plugin”


                                                                  25
Tuesday, July 17, 2012
Flume: distributed log collector by Cloudera

           Phisical
                                 Flume Master
          Topology

                         Flume      Flume       Flume




           Logical                                      Hadoop
          Topology                                       HDFS


                                                             26
Tuesday, July 17, 2012
Flume’s Pros & Cons
                • Pros.
                         • Central master server manages all nodes
                • Cons.
                         • Difficult to understand
                            • logical topologies, phisical servers and a
                              configuration of the logical/phisical mapping
                         • Difficult to configure
                            • replicated master servers, log servers and agents
                         • Big footprint
                            • 50,000 lines of Java

                                                                                  27
Tuesday, July 17, 2012
Fluentd vs Flume
                 • Easy to understand
                         • “syslogd that understands JSON”
                 • Easy to setup
                         • “sudo fluentd --setup && fluentd”
                 • Very small footprint
                         • small engine (3,000) lines + plugins
                         • small, but battle-tested!
                 • Easy to configure


                                                                  28
Tuesday, July 17, 2012
Fluentd           Scribe           Flume
          Installation          gem/rpm/deb          make          jar/rpm/deb

                                 3000 lines of    8000 lines of   50,000 lines of
          Footprint                 Ruby             C++              Java

          Plugin                    Ruby              N/A             Java

          Plugin distribution   RubyGems.org          N/A              N/A

          Master Server              No               No               Yes

          License               Apache License   Apache License   Apache License


                                                                                 29
Tuesday, July 17, 2012
Fluentd Plugin for




Tuesday, July 17, 2012
fluent-plugin-mongo
                • Included within rpm/deb by default!
                         • http://github.com/fluent/fluent-plugin-mongo
                • #1 plugin among 50+ Fluentd plugins
                         • Logs As JSON. WHY NOT Put Them Into Mongo??
                         • http://fluentd.org/plugin/
                • Supports most of the MongoDB features
                         • Authentication
                         • ReplicaSet
                         • Capped Collection

                                                                          31
Tuesday, July 17, 2012
• MongoDB Output Plugin
                     Application                           • Maintain JSON Structure
                                                           • Reliable Buffering
                                                           • Batch Insertion
                         Fluentd       Buffering           • Handle Broken Records
                                                             • Ruby Driver #82
                             Authentication


                         MongoDB              MongoDB               MongoDB    MongoDB
                                                                    MongoDB    MongoDB
                     Single Instance                                MongoDB    MongoDB
                    (Capped or Not)     MongoDB     MongoDB
                                                                          Sharding
                                              ReplicaSet

                                                                                     32
Tuesday, July 17, 2012
• MongoDB Output Plugin
                     Application                           • Maintain JSON Structure
                                                           • Reliable Buffering
                                                           • Batch Insertion
                         Fluentd       Buffering           • Handle Broken Records
                                                             • Ruby Driver #82
                             Authentication


                         MongoDB              MongoDB               MongoDB    MongoDB
                                                                    MongoDB    MongoDB
                     Single Instance                                MongoDB    MongoDB
                    (Capped or Not)     MongoDB     MongoDB
                                                                          Sharding
                                              ReplicaSet

                                                                                     32
Tuesday, July 17, 2012
ReplicaSet
                                          (Capped Collection)
             Single Instance
           (Capped Collection)                MongoDB

                    MongoDB          MongoDB        MongoDB


                         Authentication


                    Fluentd          Buffering
                                                        • MongoDB Input Plugin
                                                           • Tailing Capped Collections


                                                                                    33
Tuesday, July 17, 2012
ReplicaSet
                                          (Capped Collection)
             Single Instance
           (Capped Collection)                MongoDB

                    MongoDB          MongoDB        MongoDB


                         Authentication


                    Fluentd          Buffering
                                                        • MongoDB Input Plugin
                                                           • Tailing Capped Collections


                                                                                    33
Tuesday, July 17, 2012
Realtime Analytics with Fluentd + MongoDB

                          App                    App                 App


                         Fluentd             Fluentd                Fluentd




                             routing   Fluentd         Fluentd


          Nagios, Zabbix, etc.
                                            Mongo          query
                                                                   Charting
                         Alert               DB
                                                                              34
Tuesday, July 17, 2012
Realtime or Batch? No, BOTH!

                          App                          App                 App


                         Fluentd                   Fluentd                Fluentd




                             routing         Fluentd         Fluentd




        Hadoop                     Amazon         Mongo          query
                                                                         Charting
         / Hive                      S3            DB
             batch                 archive         realtime                         35
Tuesday, July 17, 2012
Intro of our company’s service: Treasure Data

                          App                    App                    App


                         Fluentd             Fluentd                   Fluentd




                             routing   Fluentd         Fluentd




      Treasure                              Mongo                Hadoop-based
        Data                                 DB                  Cloud Data Warehouse
             batch                           realtime                            36
Tuesday, July 17, 2012
Exercise: Apache Logs into MongoDB




Tuesday, July 17, 2012
Log File




                                    38
Tuesday, July 17, 2012
39
Tuesday, July 17, 2012
40
Tuesday, July 17, 2012
Conclusion
                • Log Everything as JSON
                         • Machine Readability
                         • Schema Freeness
                • MongoDB fits into Fluentd’s backend perfectly
                         • Both using JSON representation




                                                                  41
Tuesday, July 17, 2012

More Related Content

PDF
ビッグデータ処理データベースの全体像と使い分け
PDF
Azure サポート チームの現場からお届けする落ちないサービスのために
PDF
【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2021年7月版]
PDF
Oracle Analytics Cloud のご紹介【2021年3月版】
PPTX
Real-Time Data Flows with Apache NiFi
PDF
【Oracle Cloud ウェビナー】WebLogic Serverのご紹介
PDF
Db2 v11.5.4 高可用性構成 & HADR 構成パターンご紹介
PPTX
High Availability Content Caching with NGINX
ビッグデータ処理データベースの全体像と使い分け
Azure サポート チームの現場からお届けする落ちないサービスのために
【旧版】Oracle Database Cloud Service:サービス概要のご紹介 [2021年7月版]
Oracle Analytics Cloud のご紹介【2021年3月版】
Real-Time Data Flows with Apache NiFi
【Oracle Cloud ウェビナー】WebLogic Serverのご紹介
Db2 v11.5.4 高可用性構成 & HADR 構成パターンご紹介
High Availability Content Caching with NGINX

What's hot (20)

PPTX
BuildKitによる高速でセキュアなイメージビルド
PDF
An Introduction to Kubernetes
PPTX
Virtualization, Containers, Docker and scalable container management services
PPTX
GraalVM を普通の Java VM として使う ~クラウドベンチマークなどでの比較~
PDF
ゲームのインフラをAwsで実戦tips全て見せます
PDF
YugabyteDBを使ってみよう(NewSQL/分散SQLデータベースよろず勉強会 #1 発表資料)
PDF
Zabbix最新情報 ~Zabbix 6.0に向けて~ @OSC2021 Online/Fall
PPTX
Apache Spark on Kubernetes入門(Open Source Conference 2021 Online Hiroshima 発表資料)
PDF
20111015 勉強会 (PCIe / SR-IOV)
PDF
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
PDF
ヤフーのプライベートクラウドとクラウドエンジニアの業務について
PDF
Next-Generation Cloud Native Apps with Spring Cloud and Kubernetes
PDF
API提供におけるOAuthの役割 #apijp
PDF
Oracle GoldenGate入門
PDF
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介 [2021年7月版]
PPTX
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
PDF
CPUから見たG1GC
PDF
マイクロサービス時代の認証と認可 - AWS Dev Day Tokyo 2018 #AWSDevDay
PDF
簡単!AWRをEXCELピボットグラフで分析しよう♪
PDF
[B22] PostgresPlus Advanced Server の Oracle Database 互換機能検証 by Noriyoshi Shinoda
BuildKitによる高速でセキュアなイメージビルド
An Introduction to Kubernetes
Virtualization, Containers, Docker and scalable container management services
GraalVM を普通の Java VM として使う ~クラウドベンチマークなどでの比較~
ゲームのインフラをAwsで実戦tips全て見せます
YugabyteDBを使ってみよう(NewSQL/分散SQLデータベースよろず勉強会 #1 発表資料)
Zabbix最新情報 ~Zabbix 6.0に向けて~ @OSC2021 Online/Fall
Apache Spark on Kubernetes入門(Open Source Conference 2021 Online Hiroshima 発表資料)
20111015 勉強会 (PCIe / SR-IOV)
YugabyteDBを使ってみよう - part2 -(NewSQL/分散SQLデータベースよろず勉強会 #2 発表資料)
ヤフーのプライベートクラウドとクラウドエンジニアの業務について
Next-Generation Cloud Native Apps with Spring Cloud and Kubernetes
API提供におけるOAuthの役割 #apijp
Oracle GoldenGate入門
【旧版】Oracle Exadata Cloud Service:サービス概要のご紹介 [2021年7月版]
しばちょう先生が語る!オラクルデータベースの進化の歴史と最新技術動向#3
CPUから見たG1GC
マイクロサービス時代の認証と認可 - AWS Dev Day Tokyo 2018 #AWSDevDay
簡単!AWRをEXCELピボットグラフで分析しよう♪
[B22] PostgresPlus Advanced Server の Oracle Database 互換機能検証 by Noriyoshi Shinoda
Ad

Similar to Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012 (20)

KEY
Fluentd: the missing log collector
PDF
Fluentd meetup
PDF
Fluentd meetup #2
PDF
The basics of fluentd
PDF
Fluentd meetup in japan
PDF
upload test 1
PDF
Fluentd meetup dive into fluent plugin (outdated)
PDF
Logging Application Behavior to MongoDB
PDF
Fluentd meetup at Slideshare
PDF
Fluentd meetup logging infrastructure in paa s
PDF
The basics of fluentd
PDF
Logging in Action: With Fluentd, Kubernetes and more 1st Edition Phil Wilkins
PPTX
Hadoop in Education
PDF
Read the Docs: A completely open source Django project
PDF
Fluentd introduction at ipros
PDF
Pluggable Django Application Patterns PyCon 2011
PPTX
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
PDF
Fluent logger-scala
PDF
RSSOwl - Open source contribution report
PDF
Flowdock's full-text search with MongoDB
Fluentd: the missing log collector
Fluentd meetup
Fluentd meetup #2
The basics of fluentd
Fluentd meetup in japan
upload test 1
Fluentd meetup dive into fluent plugin (outdated)
Logging Application Behavior to MongoDB
Fluentd meetup at Slideshare
Fluentd meetup logging infrastructure in paa s
The basics of fluentd
Logging in Action: With Fluentd, Kubernetes and more 1st Edition Phil Wilkins
Hadoop in Education
Read the Docs: A completely open source Django project
Fluentd introduction at ipros
Pluggable Django Application Patterns PyCon 2011
Hadoop World 2011: Advanced HBase Schema Design - Lars George, Cloudera
Fluent logger-scala
RSSOwl - Open source contribution report
Flowdock's full-text search with MongoDB
Ad

More from Treasure Data, Inc. (20)

PPTX
GDPR: A Practical Guide for Marketers
PPTX
AR and VR by the Numbers: A Data First Approach to the Technology and Market
PPTX
Introduction to Customer Data Platforms
PPTX
Hands On: Javascript SDK
PPTX
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
PPTX
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
PPTX
How to Power Your Customer Experience with Data
PPTX
Why Your VR Game is Virtually Useless Without Data
PDF
Connecting the Customer Data Dots
PPTX
Harnessing Data for Better Customer Experience and Company Success
PDF
Packaging Ecosystems -Monki Gras 2017
PDF
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
PDF
Keynote - Fluentd meetup v14
PDF
Introduction to New features and Use cases of Hivemall
PDF
Scalable Hadoop in the cloud
PDF
Using Embulk at Treasure Data
PDF
Scaling to Infinity - Open Source meets Big Data
PDF
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
PDF
Treasure Data From MySQL to Redshift
PDF
Unifying Events and Logs into the Cloud
GDPR: A Practical Guide for Marketers
AR and VR by the Numbers: A Data First Approach to the Technology and Market
Introduction to Customer Data Platforms
Hands On: Javascript SDK
Hands-On: Managing Slowly Changing Dimensions Using TD Workflow
Brand Analytics Management: Measuring CLV Across Platforms, Devices and Apps
How to Power Your Customer Experience with Data
Why Your VR Game is Virtually Useless Without Data
Connecting the Customer Data Dots
Harnessing Data for Better Customer Experience and Company Success
Packaging Ecosystems -Monki Gras 2017
글로벌 사례로 보는 데이터로 돈 버는 법 - 트레저데이터 (Treasure Data)
Keynote - Fluentd meetup v14
Introduction to New features and Use cases of Hivemall
Scalable Hadoop in the cloud
Using Embulk at Treasure Data
Scaling to Infinity - Open Source meets Big Data
Treasure Data: Move your data from MySQL to Redshift with (not much more tha...
Treasure Data From MySQL to Redshift
Unifying Events and Logs into the Cloud

Recently uploaded (20)

PDF
Mobile App Security Testing_ A Comprehensive Guide.pdf
PDF
Modernizing your data center with Dell and AMD
PDF
KodekX | Application Modernization Development
PDF
Advanced methodologies resolving dimensionality complications for autism neur...
PDF
Electronic commerce courselecture one. Pdf
PDF
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
PPTX
Big Data Technologies - Introduction.pptx
PPT
Teaching material agriculture food technology
PDF
Spectral efficient network and resource selection model in 5G networks
PDF
Chapter 3 Spatial Domain Image Processing.pdf
PDF
Encapsulation_ Review paper, used for researhc scholars
PPTX
A Presentation on Artificial Intelligence
PDF
Agricultural_Statistics_at_a_Glance_2022_0.pdf
PDF
Building Integrated photovoltaic BIPV_UPV.pdf
DOCX
The AUB Centre for AI in Media Proposal.docx
PDF
Reach Out and Touch Someone: Haptics and Empathic Computing
PDF
Machine learning based COVID-19 study performance prediction
PPTX
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
PDF
Review of recent advances in non-invasive hemoglobin estimation
PDF
Per capita expenditure prediction using model stacking based on satellite ima...
Mobile App Security Testing_ A Comprehensive Guide.pdf
Modernizing your data center with Dell and AMD
KodekX | Application Modernization Development
Advanced methodologies resolving dimensionality complications for autism neur...
Electronic commerce courselecture one. Pdf
How UI/UX Design Impacts User Retention in Mobile Apps.pdf
Big Data Technologies - Introduction.pptx
Teaching material agriculture food technology
Spectral efficient network and resource selection model in 5G networks
Chapter 3 Spatial Domain Image Processing.pdf
Encapsulation_ Review paper, used for researhc scholars
A Presentation on Artificial Intelligence
Agricultural_Statistics_at_a_Glance_2022_0.pdf
Building Integrated photovoltaic BIPV_UPV.pdf
The AUB Centre for AI in Media Proposal.docx
Reach Out and Touch Someone: Haptics and Empathic Computing
Machine learning based COVID-19 study performance prediction
KOM of Painting work and Equipment Insulation REV00 update 25-dec.pptx
Review of recent advances in non-invasive hemoglobin estimation
Per capita expenditure prediction using model stacking based on satellite ima...

Fluentd loves MongoDB, at MongoDB SV User Group, July 17, 2012

  • 1. Fluentd ♥ MongoDB Log Everything As JSON Kazuki Ohta, CTO at Treasure Data, Inc. Tuesday, July 17, 2012
  • 2. Self-Introduction • Kazuki Ohta > twitter: @kzk_mover > github: kzk • Treasure Data, Inc. > Chief Technology Officer; Founder > Original Fluentd Author @frsyuki is another co-founder. • Open-Source Enthusiast > KDE, uim, Hadoop, memcached, Mozilla, Mongo, etc. > Fluentd rpm/deb package manager 2 Tuesday, July 17, 2012
  • 4. Figure 1: Common Logging Purposes Analytics Error Notification Recommendation 4 Tuesday, July 17, 2012
  • 5. Figure 2: Types of Logs App Log Access Log (Apache, Rails, etc.) System Log (syslog etc.) Others 5 Tuesday, July 17, 2012
  • 6. From “Scaling Lessons learned at Dropbox” 6 Tuesday, July 17, 2012
  • 7. Fragile for format change, No type information, No field name, etc. From “Scaling Lessons learned at Dropbox” 6 Tuesday, July 17, 2012
  • 9. It's like syslogd, but uses JSON for log messages 8 Tuesday, July 17, 2012
  • 10. Logs in JSON? Why? 1. Machine-Readable > machine is goint to be a main consumer of logs 2. Schema-Free > you want to add/remove fields from logs at anytime Write Logs for Machines, use JSON http://journal.paul.querna.org/articles/2011/12/26/log-for-machines-in-json/ 9 Tuesday, July 17, 2012
  • 11. Logs As TEXT Logs As JSON + Field Name + No Custom Parser + Type Information + Schema Free 10 Tuesday, July 17, 2012
  • 12. Logs As TEXT “2011-04-01 host1 myapp: cmessage size=12MB user=me” Logs As JSON 2011-04-01 myapp.message { “on_host”: ”host1”, ”combined”: true, “size”: 12000000, + Field Name “user”: “me” + No Custom Parser + Type Information } + Schema Free 10 Tuesday, July 17, 2012
  • 13. http://fluentd.org/ 11 Tuesday, July 17, 2012
  • 14. Website > http://fluentd.org/ • Community > http://github.com/fluent > 16 committers across many organizations > web, game, enterprise • Mailing list > Google groups 12 Tuesday, July 17, 2012
  • 16. Fluentd: Log Format Application Fluentd Storage 14 Tuesday, July 17, 2012
  • 17. Fluentd: Log Format Application 2012-02-04 01:33:51 myapp.buylog { Fluentd “user”: ”me”, “path”: “/buyItem”, “price”: 150, “referer”: “/landing” Storage } 14 Tuesday, July 17, 2012
  • 18. Fluentd: Log Format time Application tag 2012-02-04 01:33:51 myapp.buylog { Fluentd “user”: ”me”, “path”: “/buyItem”, “price”: 150, “referer”: “/landing” Storage } record 14 Tuesday, July 17, 2012
  • 19. Fluentd: Plugins Application filter / buffer / Fluentd routing Storage 15 Tuesday, July 17, 2012
  • 20. Fluentd: Plugins Application filter / buffer / Fluentd routing SaaS Storage Fluentd Plug-in Plug-in Plug-in 15 Tuesday, July 17, 2012
  • 21. Fluentd: Plugins Application filter / buffer / Fluentd routing SaaS Storage Fluentd Plug-in Plug-in Plug-in 16 Tuesday, July 17, 2012
  • 22. Fluentd: Plugins syslogd Scribe Application File Plug-in tail Plug-in Plug-in filter / buffer / Fluentd routing SaaS Storage Fluentd Plug-in Plug-in Plug-in 16 Tuesday, July 17, 2012
  • 23. Client libraries > Ruby > Perl Application Buffering > PHP HTTP / TCP / UDS > Python > Java Fluentd > ... 17 Tuesday, July 17, 2012
  • 24. Client libraries > Ruby > Perl Application Buffering > PHP HTTP / TCP / UDS > Python > Java Fluentd > ... Fluent.open(“myapp”) Fluent.event(“login”, {“user”=>38}) #=> 2012-02-04 04:56:01 myapp.login {“user”:38} 17 Tuesday, July 17, 2012
  • 25. Typical Log Collection by `rsync` Burst of traffic rsync consumes all bandwidth 18 Tuesday, July 17, 2012
  • 26. Typical Log Collection by `rsync` App server App server App server Application Application Application File File File ... File File File ... File File File ... File Burst of traffic High latency rsync consumes must wait for a day all bandwidth Log server Hard to analyze complex text parsers 18 Tuesday, July 17, 2012
  • 27. Log Collection using Fluentd Fluentd Fluentd Fluentd Realtime! Fluentd Fluentd 19 Tuesday, July 17, 2012
  • 28. Log Collection using Fluentd Fluentd Fluentd Fluentd Realtime! Fluentd Fluentd Amazon Ready to Hadoop Mongo S3 / / Hive DB EMR Analyze! 19 Tuesday, July 17, 2012
  • 29. Fluentd Case Study Ruby on Rails Ruby on Rails Ruby on Rails Fluentd Fluentd Fluentd ✓ 127 RoR servers ✓ 100,000 msgs/sec Fluentd Fluentd routing ✓ 120Mbps at peak ✓ 1TB/day Hadoop Mongo User behavior PV logs / Hive DB logs 20 Tuesday, July 17, 2012
  • 30. # read logs from a file # forward other logs to servers <source> # (load-balancing + fail-over) type tail <match **> path /var/log/httpd.log type forward format apache <server> tag apache.access host 192.168.0.11 </source> weight 20 </server> # save access logs to MongoDB <server> <match apache.access> host 192.168.0.12 type mongo weight 60 host 127.0.0.1 </server> </match> </match> Tuesday, July 17, 2012
  • 32. Scribe: log collector by Facebook Frontend servers Aggregator nodes scribe scribe scribe Hadoop HDFS scribe scribe scribe 23 Tuesday, July 17, 2012
  • 33. Scribe’s Pros & Cons • Pros. • Fast (written in C++) • Cons. • VERY HARD to install • nightmare of boost, thrift, libhdfs, etc. • Unstructured Logs • parsing must be required before the analysis • Hard to extend • recompiling C++ programs are required • No longer maintained 24 Tuesday, July 17, 2012
  • 34. Fluentd vs Scribe • Easy to install • “gem install fluentd” • Stable RPM and Deb packages • http://packages.treasure-data.com/ • Easy to write plugins • you can use Ruby • Easy plugin distribution • “gem search -rd fluent-plugin” 25 Tuesday, July 17, 2012
  • 35. Flume: distributed log collector by Cloudera Phisical Flume Master Topology Flume Flume Flume Logical Hadoop Topology HDFS 26 Tuesday, July 17, 2012
  • 36. Flume’s Pros & Cons • Pros. • Central master server manages all nodes • Cons. • Difficult to understand • logical topologies, phisical servers and a configuration of the logical/phisical mapping • Difficult to configure • replicated master servers, log servers and agents • Big footprint • 50,000 lines of Java 27 Tuesday, July 17, 2012
  • 37. Fluentd vs Flume • Easy to understand • “syslogd that understands JSON” • Easy to setup • “sudo fluentd --setup && fluentd” • Very small footprint • small engine (3,000) lines + plugins • small, but battle-tested! • Easy to configure 28 Tuesday, July 17, 2012
  • 38. Fluentd Scribe Flume Installation gem/rpm/deb make jar/rpm/deb 3000 lines of 8000 lines of 50,000 lines of Footprint Ruby C++ Java Plugin Ruby N/A Java Plugin distribution RubyGems.org N/A N/A Master Server No No Yes License Apache License Apache License Apache License 29 Tuesday, July 17, 2012
  • 40. fluent-plugin-mongo • Included within rpm/deb by default! • http://github.com/fluent/fluent-plugin-mongo • #1 plugin among 50+ Fluentd plugins • Logs As JSON. WHY NOT Put Them Into Mongo?? • http://fluentd.org/plugin/ • Supports most of the MongoDB features • Authentication • ReplicaSet • Capped Collection 31 Tuesday, July 17, 2012
  • 41. • MongoDB Output Plugin Application • Maintain JSON Structure • Reliable Buffering • Batch Insertion Fluentd Buffering • Handle Broken Records • Ruby Driver #82 Authentication MongoDB MongoDB MongoDB MongoDB MongoDB MongoDB Single Instance MongoDB MongoDB (Capped or Not) MongoDB MongoDB Sharding ReplicaSet 32 Tuesday, July 17, 2012
  • 42. • MongoDB Output Plugin Application • Maintain JSON Structure • Reliable Buffering • Batch Insertion Fluentd Buffering • Handle Broken Records • Ruby Driver #82 Authentication MongoDB MongoDB MongoDB MongoDB MongoDB MongoDB Single Instance MongoDB MongoDB (Capped or Not) MongoDB MongoDB Sharding ReplicaSet 32 Tuesday, July 17, 2012
  • 43. ReplicaSet (Capped Collection) Single Instance (Capped Collection) MongoDB MongoDB MongoDB MongoDB Authentication Fluentd Buffering • MongoDB Input Plugin • Tailing Capped Collections 33 Tuesday, July 17, 2012
  • 44. ReplicaSet (Capped Collection) Single Instance (Capped Collection) MongoDB MongoDB MongoDB MongoDB Authentication Fluentd Buffering • MongoDB Input Plugin • Tailing Capped Collections 33 Tuesday, July 17, 2012
  • 45. Realtime Analytics with Fluentd + MongoDB App App App Fluentd Fluentd Fluentd routing Fluentd Fluentd Nagios, Zabbix, etc. Mongo query Charting Alert DB 34 Tuesday, July 17, 2012
  • 46. Realtime or Batch? No, BOTH! App App App Fluentd Fluentd Fluentd routing Fluentd Fluentd Hadoop Amazon Mongo query Charting / Hive S3 DB batch archive realtime 35 Tuesday, July 17, 2012
  • 47. Intro of our company’s service: Treasure Data App App App Fluentd Fluentd Fluentd routing Fluentd Fluentd Treasure Mongo Hadoop-based Data DB Cloud Data Warehouse batch realtime 36 Tuesday, July 17, 2012
  • 48. Exercise: Apache Logs into MongoDB Tuesday, July 17, 2012
  • 49. Log File 38 Tuesday, July 17, 2012
  • 52. Conclusion • Log Everything as JSON • Machine Readability • Schema Freeness • MongoDB fits into Fluentd’s backend perfectly • Both using JSON representation 41 Tuesday, July 17, 2012