Discussion on the P2P Distributed Search System
Yusuke FUJISAKA, CyberAgent inc.

2011-06-18

• 2
    • 1   100 → 10 /sec.
    • TB

• Namazu
• Senna
• Lucene
• Solr
• Hyper Estraier
• etc.

• Solr

[Figure: Solr deployment with the labels Application, Search, Broker, Slave, Index Data, Master, Index]

• Zoie (with Lucene)
• Caffeine (Google)
• ...

FARE system

• Fast Autonomous Retrieval Engine

1. No Master-Slave
2. Peer to Peer
3. Realtime indexing

No Master-Slave

• …
    • 1 → Primary mode
    • → Secondary mode

Primary mode

    <Setting>
    ...
    Content = Title
    Content = Body
    Appendix = PostID
    Appendix = URL
    Appendix = PostTime

    ./fare --primary SETTING

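The setting fragment above lists field names under two roles, Content and Appendix. As a rough illustration only, the sketch below parses such "Role = Field" lines and splits a document accordingly. The function names `parse_setting` and `split_document`, the sample document, and the reading that Content fields are the ones analyzed into terms while Appendix fields are carried along as stored data are all assumptions; the real FARE setting format is only partly visible on the slide.

```python
from collections import defaultdict

# The lines visible on the slide; the part elided with "..." is left out.
SETTING_TEXT = """\
Content = Title
Content = Body
Appendix = PostID
Appendix = URL
Appendix = PostTime
"""


def parse_setting(text):
    """Group field names by role; a role such as Content may repeat."""
    roles = defaultdict(list)
    for raw in text.splitlines():
        line = raw.strip()
        if not line or "=" not in line:
            continue  # skip blanks and anything that is not "Role = Field"
        role, field = (part.strip() for part in line.split("=", 1))
        roles[role].append(field)
    return dict(roles)


def split_document(doc, roles):
    """Separate a document into Content fields and Appendix fields."""
    content = {f: doc[f] for f in roles.get("Content", []) if f in doc}
    appendix = {f: doc[f] for f in roles.get("Appendix", []) if f in doc}
    return content, appendix


roles = parse_setting(SETTING_TEXT)
# Hypothetical document, shaped after the field names in the setting.
doc = {
    "PostID": 12345,
    "Title": "hello",
    "Body": "p2p search",
    "URL": "http://example.com/1",
    "PostTime": "2011-06-18",
}
content, appendix = split_document(doc, roles)
print(roles)
print(content)
print(appendix)
```
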
Secondary mode

[Figure: three nodes, 10.0.1.1, 10.0.1.2 and 10.0.1.3]

    ./fare --secondary 10.0.1.1

[Figure: the same three nodes, with the label "Setting, Data" on a link between nodes]

    ./fare --secondary 10.0.1.2

[Figure: the same three nodes, with the label "Setting, Data" on a link between nodes]

• Consistent hashing

• Chord chain

[Figure: ring of six nodes A to F, with two hashed keys placed on the ring: "Hash: 0xEF459AB..." and "Hash: 0x6D137EA..."]

• F

[Figure: ring of six nodes A to F]

[Figure: ring of six nodes A to F, labeled (replication=3)]

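The ring figures above can be read as consistent hashing in the style of Chord: a key is hashed onto the ring and stored on the next node clockwise, and with replication=3 on the next three nodes. The following is a minimal sketch under that reading, not FARE's actual code. The `Ring` class, the use of MD5, and the 32-bit ring size are illustrative assumptions.

```python
import bisect
import hashlib


def ring_hash(key):
    """Map a string to a 32-bit ring position (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


class Ring:
    def __init__(self, nodes):
        # Sorted (position, node) pairs; one ring position per node.
        self.points = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [p for p, _ in self.points]

    def replicas(self, key, replication=3):
        """Return the first `replication` nodes clockwise from hash(key)."""
        count = min(replication, len(self.points))
        start = bisect.bisect_left(self.positions, ring_hash(key))
        return [
            self.points[(start + i) % len(self.points)][1]
            for i in range(count)
        ]

    def owner(self, key):
        """The first node clockwise from hash(key)."""
        return self.replicas(key, 1)[0]


ring = Ring(["A", "B", "C", "D", "E", "F"])
for term in ["search", "engine"]:
    print(term, hex(ring_hash(term)), ring.replicas(term))
```

One property worth noting: if a node leaves the ring, only the keys it owned move to another node; every other key keeps its owner.
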
[Figure: ring divided among three nodes, one ring position per node]

    Node 1: 56%
    Node 2: 20%
    Node 3: 24%

[Figure: ring divided among three nodes, several ring positions per node]

              one position   several positions
    Node 1    56%            42%
    Node 2    20%            32%
    Node 3    24%            26%

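The percentages above suggest that placing each node on the ring several times evens out the share of the key space each node owns. The sketch below measures those shares for a given number of positions per node. It is an illustration under assumptions: the `ring_shares` helper, the `node#i` naming of extra positions, and the MD5-based hash are not taken from the slides, and the output will not reproduce the slide's exact figures.

```python
import hashlib

RING_SIZE = 2 ** 32


def ring_hash(key):
    """Map a string to a 32-bit ring position (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


def ring_shares(nodes, positions_per_node):
    """Fraction of the ring owned by each node when every node is hashed
    onto the ring `positions_per_node` times."""
    points = sorted(
        (ring_hash(f"{node}#{i}"), node)
        for node in nodes
        for i in range(positions_per_node)
    )
    if len(points) == 1:
        return {points[0][1]: 1.0}  # a single position owns the whole ring
    shares = {node: 0.0 for node in nodes}
    for index, (position, node) in enumerate(points):
        previous = points[index - 1][0]  # index 0 wraps to the last point
        # A position owns the arc from the previous position up to itself.
        shares[node] += ((position - previous) % RING_SIZE) / RING_SIZE
    return shares


nodes = ["Node 1", "Node 2", "Node 3"]
for count in (1, 4, 100):
    shares = ring_shares(nodes, count)
    print(count, {n: f"{s:.0%}" for n, s in shares.items()})
```
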
Indexing

• RPC

Queries

• Inverted index
• Contents

[Figure: storage layout, top to bottom]

    Skip pointer …
    Dictionary …
    Inverted index …
    ID
    Skip pointer …
    Content … Appendix …

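Indexing

    ID: 12345
    Title: ...
    Body: ...
    URL: ...
    Time: ...

[Figure: ring of six nodes A to F; the document reaches node A over MessagePack-RPC; labels at A: Content, Term, Term]

[Figure: ring of six nodes A to F; labels at A: Content, Term; labels toward other nodes: Hash, Hash]
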
Searching

[Figure: ring of six nodes A to F; a Query reaches node A over MessagePack-RPC; labels: Analyze, Term, Term, Term]

[Figure: Intersection at node A of the ID lists held on other nodes]

    ID: 12, 24, 35, 49, ...
    ID: 12, 30, 49, 55, ...
    ID: 7, 12, 30, 49, ...

[Figure: Output from node A]

    ID: 49
    ID: 12
    Title:
    Body:...

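The indexing and searching figures can be sketched end to end: each term of a document's Content is hashed to pick the node that keeps its posting list, and a query intersects the posting lists of its terms. This is a simplified, single-process illustration and not FARE's implementation. The whitespace analyzer, the in-memory `node_index` dictionaries standing in for MessagePack-RPC calls, and all function names are assumptions.

```python
import bisect
import hashlib
from collections import defaultdict


def ring_hash(key):
    """Map a string to a 32-bit ring position (MD5 is an arbitrary choice)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:4], "big")


NODES = ["A", "B", "C", "D", "E", "F"]
POINTS = sorted((ring_hash(n), n) for n in NODES)
POSITIONS = [p for p, _ in POINTS]


def owner(term):
    """Node responsible for a term: first ring position at or after its hash."""
    i = bisect.bisect_left(POSITIONS, ring_hash(term)) % len(POINTS)
    return POINTS[i][1]


def analyze(text):
    """Toy analyzer: lowercase and split on whitespace."""
    return text.lower().split()


# One inverted index per node: term -> sorted list of document IDs.
node_index = {n: defaultdict(list) for n in NODES}


def index_document(doc_id, content_fields):
    """Add doc_id to the posting list of every term, on the owning node."""
    terms = {t for text in content_fields for t in analyze(text)}
    for term in terms:
        postings = node_index[owner(term)][term]
        pos = bisect.bisect_left(postings, doc_id)
        if pos == len(postings) or postings[pos] != doc_id:
            postings.insert(pos, doc_id)


def intersect(a, b):
    """Merge-style intersection of two sorted ID lists."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(query):
    """Return IDs of documents that contain every term of the query."""
    terms = analyze(query)
    if not terms:
        return []
    lists = [node_index[owner(t)].get(t, []) for t in terms]
    lists.sort(key=len)  # starting from the shortest list keeps results small
    result = list(lists[0])
    for other in lists[1:]:
        result = intersect(result, other)
    return result


# The visible prefixes of the three ID lists on the Intersection slide.
slide_lists = [[12, 24, 35, 49], [12, 30, 49, 55], [7, 12, 30, 49]]
common = slide_lists[0]
for other in slide_lists[1:]:
    common = intersect(common, other)
print(common)  # [12, 49]

index_document(12, ["p2p search", "distributed index"])
index_document(49, ["p2p index"])
index_document(30, ["search engine"])
print(search("p2p index"))  # [12, 49]
```

The intersection of the slide's visible ID prefixes gives 12 and 49, the same two IDs shown on the Output slide.
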
Beacon

[Figure: ring of six nodes A to F; a "beacon" message between F and A]

[Figure: the beacon between F and A is answered with "live"]

[Figure: the exchange between F and A fails (×)]

[Figure: state change Running → Suspend]

[Figure: state change Suspend → Node dead; label: dead "F"]

[Figure: ring of the five remaining nodes A to E]

[Figure: state change Suspend → Running; F is still on the ring]

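The beacon figures describe three states for a watched node: Running, Suspend, and Node dead, with a path back from Suspend to Running. A small state machine along those lines is sketched below. The `PeerMonitor` class, the miss thresholds, and the rule that a dead node is not revived by a later reply are assumptions for illustration; the slides do not give these details.

```python
from enum import Enum


class State(Enum):
    RUNNING = "Running"
    SUSPEND = "Suspend"
    DEAD = "Node dead"


class PeerMonitor:
    """Tracks one neighbour from the outcome of each beacon round."""

    def __init__(self, suspend_after=1, dead_after=3):
        # Thresholds are illustrative assumptions, not values from the slides.
        self.suspend_after = suspend_after
        self.dead_after = dead_after
        self.missed = 0
        self.state = State.RUNNING

    def on_beacon_result(self, got_live_reply):
        """Update the state after one beacon round and return it."""
        if self.state is State.DEAD:
            # Assumption: a dead node has left the ring and must rejoin.
            return self.state
        if got_live_reply:
            self.missed = 0
            self.state = State.RUNNING  # also covers Suspend -> Running
        else:
            self.missed += 1
            if self.missed >= self.dead_after:
                self.state = State.DEAD
            elif self.missed >= self.suspend_after:
                self.state = State.SUSPEND
        return self.state


monitor = PeerMonitor()
for reply in [True, False, True, False, False, False]:
    print(reply, monitor.on_beacon_result(reply).value)
```
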
Source codes

• http://code.google.com/p/fujene/

Bibliography

(1) I. Stoica, et al.; Chord: A Scalable Peer-to-peer Lookup Service for Internet Applications; SIGCOMM 2001; October 2001
(2) D. Karger, et al.; Consistent Hashing and Random Trees: Distributed Caching Protocols for Relieving Hot Spots on the World Wide Web; STOC ’97; 1997
(3) C. D. Manning, et al.; An Introduction to Information Retrieval; Cambridge UP; 2009
(4) T. Luu, et al.; ALVIS Peers: A Scalable Full-text Peer-to-Peer Retrieval Engine; P2PIR ’06; Nov. 2006
