OpenSearchLab and Lucene

Grant Ingersoll
Chief Scientist @LucidWorks
Member and Committer, Apache Software Foundation
Co-Founder, Apache Mahout
Hats
I’m here as an individual who happens to contribute (and commit) to Lucene, Solr, Mahout and other open source projects. I don’t officially represent the ASF or even Lucene/Solr/Mahout.
Topics
• Openness

• What are some OpenSearchLab (OSL) needs?

• The Lucene Ecosystem

• Lucene for Research?

• A Sample Architecture
Putting the Open in OpenSearchLab
• Open Development >> Open Source
• Open community

• Open corpora

• Open evaluations

• Open Research
  • w/o being onerous
(Image: http://www.facebook.com/photo.php?fbid=10151728075710181&set=a.10151045050120181.780469.68096845180&type=1&theater)
OSL Needs?
Community
• Openness Model
• Contributions:
  • Who?
  • Where?
  • How?
• Ownership/Legal:
  • Code
  • Contributions
  • Infrastructure
• Privacy
• …

Code
• Architecture
  • Flexible
  • Scalable
• Experiment Mgmt
• Content Acquisition
• Analysis
• Indexing
• Querying
• Downstream Tools
  • Faceting, highlighting, auto-suggest, spellchecking, etc.
• Records Mgmt
• Testing
• …

Infrastructure
• Hardware
  • Cloud or hosted?
• Network/Bandwidth
• Production/Staging/Dev
• $$$$
• Release Management
• Devops
• …
What’s this have to do with Lucene?
“An ecosystem is a community of living organisms in conjunction with the nonliving components of their environment interacting as a system.”
   – Wikipedia
[Diagram: pyramid; labels in extraction order: Code, Committers, Contributors, ASF, Users]
The ASF and ASL
• ASF == Apache Software Foundation
   – Volunteer-based, but many contributors are paid by their employers to work on open source
   – Community Over Code
       • Consensus-driven development
   – Meritocracy
       • “Those who do, make the decisions”
   – 100+ Top Level Projects
   – Infrastructure to support projects
   – “The Apache Way”
• ASL == Apache Software License (v2), formally the Apache License, Version 2.0
• ASL ≠ ASF
Lucene Community
•   In a nutshell: Large, Active Community
•   30+ committers, many, many more contributors
•   (Tens of?) Thousands of Practitioners
•   Thousands of production instances
    – Twitter, Apple, IBM Watson, LinkedIn, Netflix, Commercial Search Engines, …
    – “… they frequently turn to real-time search: our system serves over two billion queries a day, with an average query latency of 50 ms. Usually, tweets are searchable within 10 seconds after creation.” -- EarlyBird, Busch et al.
The Code Ecosystem
[Diagram: Lucene Core at the center, surrounded by Solr, Hadoop, Mahout, OpenNLP, Nutch and Tika]
Apache Lucene
• Flagship Java library for building search applications
    – Indexing, Searching, Language Analysis
• Powers apps large and small the world over
• More in the Apache Lucene 4 talk later
• Fast, small footprint
• Lots of useful related modules
    – Highlighting, Joins, Spatial, etc.
• http://lucene.apache.org/core
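To make the "Indexing, Searching, Language Analysis" bullet concrete, here is a minimal sketch of the idea behind an inverted index. It is plain Python, not the Lucene API: `TinyIndex`, `analyze` and the stopword list are hypothetical names invented for illustration, and real Lucene analyzers, segments and queries are far more elaborate.

```python
import re
from collections import defaultdict

# Illustrative stopword list; an assumption, not Lucene's actual list.
STOPWORDS = {"a", "an", "and", "the", "of", "to"}

def analyze(text):
    """Language analysis: lowercase, split on non-alphanumerics, drop stopwords."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

class TinyIndex:
    """Hypothetical toy inverted index mapping each term to the docs containing it."""

    def __init__(self):
        self.postings = defaultdict(set)  # term -> set of doc ids

    def add(self, doc_id, text):
        """Indexing: record doc_id under every analyzed term of the text."""
        for term in analyze(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Searching: AND query, returns sorted ids of docs with every query term."""
        terms = analyze(query)
        if not terms:
            return []
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return sorted(result)

index = TinyIndex()
index.add(1, "Lucene is a search library")
index.add(2, "Solr is a search server built on Lucene")
index.add(3, "Hadoop stores the logs")
print(index.search("search lucene"))  # docs that contain both terms
```

The same analyzer runs at index time and at query time, which is why "Lucene" in a document matches "lucene" in a query.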
Apache Solr
• Search server built using Lucene and HTTP
• Faceting, highlighting, most Lucene features, easy admin
• Highly Extensible
• Scalable (query volume and index size)
• Lucene Best Practices
• http://lucene.apache.org/solr
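Because Solr exposes search over HTTP, a query is just a URL. The sketch below builds a `/select` request with faceting and highlighting turned on. It only constructs the URL and contacts no server; the host, the core name `osl_docs`, the field names and the helper `solr_select_url` are assumptions for illustration. The parameters used (`q`, `rows`, `wt`, `facet`, `facet.field`, `hl`, `hl.fl`) are standard Solr query parameters.

```python
from urllib.parse import urlencode

def solr_select_url(base_url, core, query,
                    facet_field=None, highlight_field=None, rows=10):
    """Build a Solr /select URL (hypothetical helper; no network access)."""
    params = [("q", query), ("rows", rows), ("wt", "json")]
    if facet_field:
        # Ask Solr to count matching docs per value of this field.
        params += [("facet", "true"), ("facet.field", facet_field)]
    if highlight_field:
        # Ask Solr to return highlighted snippets from this field.
        params += [("hl", "true"), ("hl.fl", highlight_field)]
    return f"{base_url.rstrip('/')}/{core}/select?{urlencode(params)}"

url = solr_select_url("http://localhost:8983/solr", "osl_docs",
                      "title:lucene", facet_field="year",
                      highlight_field="title")
print(url)
```

Any HTTP client in any language can issue the resulting request, which is a large part of why Solr suits a services-based setup.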
Apache Hadoop
• Originally built for Nutch to solve large scale crawling problems
• Distributed File System and Computation Model
   – HDFS and MapReduce, YARN coming
• Common Use Cases: storage, log analysis, ETL
• http://hadoop.apache.org
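The MapReduce computation model named above can be sketched in a single process. The example below counts HTTP status codes in a few log lines, a small instance of the "log analysis" use case. This is not Hadoop code: the function names and the three-field log format are assumptions for illustration, and a real job would run the map and reduce phases across many machines over data in HDFS.

```python
from collections import defaultdict

def map_status(line):
    """Map: emit (status_code, 1) for one access-log line; skip malformed lines."""
    fields = line.split()
    if len(fields) >= 3:
        yield fields[2], 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_count(key, values):
    """Reduce: sum the counts collected for one key."""
    return key, sum(values)

def run_job(lines):
    """Run map, shuffle and reduce in order over an iterable of lines."""
    pairs = (pair for line in lines for pair in map_status(line))
    return dict(reduce_count(k, v) for k, v in shuffle(pairs).items())

# Assumed log format: METHOD PATH STATUS
logs = ["GET /select 200", "GET /select 500", "POST /update 200", "garbage"]
print(run_job(logs))
```

Since each map call sees one line and each reduce call sees one key, both phases can be spread over many workers without coordination.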
Apache Nutch
• Web-scale crawler and search built on Lucene/Solr and Hadoop
• Link analysis (aka PageRank)
• Plugin framework
• Parsers for common document formats (PDF, Word, HTML, etc.)
• http://nutch.apache.org
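As a rough illustration of the "Link analysis (aka PageRank)" bullet, the sketch below runs textbook PageRank by power iteration on a tiny link graph. It is not Nutch's implementation: the function name, the damping factor of 0.85, the iteration count and the example graph are all assumptions chosen for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to. Pages with no
    outgoing links spread their rank evenly over all pages.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank

# Hypothetical four-page web: "d" links out but nothing links to it.
graph = {"a": ["b"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks = pagerank(graph)
for page in sorted(ranks, key=ranks.get, reverse=True):
    print(page, round(ranks[page], 3))
```

At web scale the same iteration is typically expressed as repeated batch jobs over the link graph, which is where Hadoop comes in.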
Apache Mahout
• Scalable machine learning
  – Utilize Hadoop where appropriate
• Primary Focus: “The 3 C’s”
  – Clustering, classification, collaborative filtering
• Others
  – Frequent pattern mining, topic extraction, statistically interesting phrases
• http://mahout.apache.org
Apache Tika
• Toolkit for detecting file (MIME) types and extracting content from them
• Support for many common file formats
   – Office, PDF, HTML, etc.
• Intuitive API (think SAX parser)
• Wraps best of breed open source extractors
• Plug in your own
• http://tika.apache.org
Apache OpenNLP
• Supports common NLP tasks
  – NER, POS tagging, Chunking, Parsing, CoRef resolution
• MaxEnt and Perceptron based
  – Working to make the machine learning pluggable
• Some Multilingual support
• New life at the ASF
• Related: cTAKES, Stanbol
Other Useful Tools
• Apache ZooKeeper – Distributed coordination
• Apache Pig – Hadoop scripting w/o Java
• Apache HBase/Accumulo/Cassandra – BigTable/Dynamo-style stores
• Avro and Protobufs – Serialization frameworks
• Netty – Server framework; easy to add protocols and to scale
• Stanbol – Semantic Content Management using Solr, OpenNLP, others
• UIMA – Unstructured Information Management
LUCENE CAN HAS RESEARCH?
• Dispelling a few misconceptions:
  – No such thing as Lucene OOTB
  – Lucene ≠ Solr
• Researchers are welcome!
  – Large audience and many domains
  – http://wiki.apache.org/lucene-java/HowToContribute
  – Battle-tested code
  – Speed v. Quality tradeoffs
(Image: http://1.bp.blogspot.com/_T2ki5Em5dnI/S8gxtImG7wI/AAAAAAAAAEs/N7aZKZ6g6g4/s1600/cat%2520typing.jpg)
Research/Contribution Areas
• Work with the community to do evaluations
• Scoring
   – BM25, LM, IB (information-based), DFR and others already implemented
   – Easy to add your own
• Codecs
   – Extensible compression/storage
   – Many approaches already implemented, more coming
   – SimpleText FTW!
• Others:
   – Faceting, auto-suggest, spell-checking, highlighting, expansion and more
   – Different domains: machine generated data, mobile,
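For readers new to the scoring models listed under "Scoring", here is a minimal sketch of BM25 over pre-tokenized documents. It is plain Python rather than a Lucene Similarity class. The parameter defaults k1 = 1.2 and b = 0.75 and the idf form log(1 + (N - n + 0.5) / (n + 0.5)) are the commonly used ones, and the function name and sample documents are assumptions for illustration. Details such as how document lengths are stored are simplified away.

```python
import math
from collections import Counter

def bm25_scores(query_terms, docs, k1=1.2, b=0.75):
    """Score every doc (a list of tokens) against the query terms with BM25.

    Assumes docs is non-empty. Returns one score per doc, in input order.
    """
    n_docs = len(docs)
    avg_len = sum(len(doc) for doc in docs) / n_docs
    # Document frequency: in how many docs does each term occur?
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        # Length normalization: longer-than-average docs are penalized.
        norm = k1 * (1 - b + b * len(doc) / avg_len)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores.append(score)
    return scores

docs = [
    ["lucene", "search"],
    ["hadoop", "storage", "logs"],
    ["lucene", "lucene", "codec"],
]
for doc, score in zip(docs, bm25_scores(["lucene"], docs)):
    print(round(score, 3), doc)
```

Swapping in a different formula means changing only the two lines that compute `idf` and the per-term contribution, which is the kind of experiment the "Easy to add your own" bullet points at.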
Abstract OSL Architecture
[Diagram, top to bottom:]
• Clients
• Access APIs
• Search View (Shard 1, Shard 2, ..., Shard n) | Personalization & Machine Learning | Users/Admin/Other
• Updates/Analysis (Batch/Real Time) | Distributed, Scalable Storage (Docs, Users, Logs) | Distributed Coordination and Messaging
• Content Acquisition (Distributed Content Acquisition, ETL; Batch and Real Time)
• Data (Internet)
Keys
- Service-Oriented Architecture
- Stateless
- Failover/Fault Tolerant
- Glue is lightweight
- Smart about updates
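The Search View in this architecture fans a query out to shards 1 through n and merges their answers. A minimal sketch of that scatter-gather step follows, under heavy assumptions: each shard is an in-memory dict, the "score" is a raw term count, and `search_shard` and `search_all` are hypothetical names. In a real deployment each shard would be a separate search service reached over the network.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

def search_shard(shard_docs, term, k):
    """Return the shard's top-k (score, doc_id) hits; score is a raw term count."""
    hits = []
    for doc_id, text in shard_docs.items():
        count = text.lower().split().count(term)
        if count > 0:
            hits.append((count, doc_id))
    return heapq.nlargest(k, hits)

def search_all(shards, term, k=3):
    """Scatter the query to every shard in parallel, then merge the top-k lists."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partials = list(pool.map(lambda shard: search_shard(shard, term, k), shards))
    merged = heapq.nlargest(k, (hit for part in partials for hit in part))
    return [doc_id for _score, doc_id in merged]

# Three hypothetical shards, each holding its own slice of the corpus.
shards = [
    {"a": "lucene lucene search", "b": "hadoop"},
    {"c": "lucene"},
    {"d": "solr lucene lucene lucene"},
]
print(search_all(shards, "lucene", k=2))
```

The coordinator keeps no state between requests, in line with the "Stateless" key: any replica of it can serve any query, and a failed one can simply be replaced.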
Lucene Ecosystem Implementation
[Diagram: the extracted labels are identical to the Abstract OSL Architecture slide (Clients; Access APIs; Search View with Shards 1 to n; Personalization & Machine Learning; Users/Admin/Other; Updates/Analysis; Distributed, Scalable Storage; Distributed Coordination and Messaging; Content Acquisition; Data (Internet); the same five Keys). Any mapping of Lucene ecosystem projects onto these boxes did not survive extraction.]
Takeaways
• Open Development >> Open Source >> Shared Source
  – Corollary: You never know where good ideas are coming from
• ASF is a proven model for collaboration
• Lucene ecosystem: extensive, production ready
• Lucene 4 is viable for IR algorithms and data structure research
• OSL (IMO) needs a services-based, pluggable architecture
Resources
• Getting Started
  – {Lucene|Mahout|Hadoop} In Action
  – Taming Text
• grant@lucidworks.com
• @gsingers
• http://www.lucidworks.com

OpenSearchLab and the Lucene Ecosystem


Editor's Notes

  • #5: Shared source, visible source, and BDFL are not open source. Open DEVELOPMENT is far more powerful.
    Anyone can be a “researcher”: Jack Andraka's study resulted in over 90 percent accuracy and showed his patent-pending sensor to be 28 times faster, 28 times less expensive and over 100 times more sensitive than current tests. Jack received the Gordon E. Moore Award of $75,000, named in honor of the Intel co-founder and retired chairman and CEO. You never know where the next good idea is coming from.
    Open corpora: anyone anywhere should be able to download and run evaluations. If Common Crawl can do it, why can’t we? iBiblio, ASF, others can likely help.
    How can we build, leverage and share an open evaluation framework? How do we leverage the Internet? Crowdsourcing? Dynamic nature of content, engines, community, users, etc.? Can we time slice experiments on a real system?
    Open Research: how do we encourage open methodology, open process, publications, etc. without being heavy-handed?
  • #6: Community will be the single most important piece. Bottom up and top down are both needed to establish a community.
  • #8: https://en.wikipedia.org/wiki/Ecosystem. Most people have this pyramid backwards.
  • #9: The ASF has a well developed community model that has been proven out over time.
  • #10: Committers: many are paid to work on Lucene full time. Images: commits from Ohloh, traffic from lucene.markmail.org.
  • #11: A loose orbit around Lucene Core
  • #20: Second bullet: deferred to 2nd talk