What’s new in Apache Solr 5.0
Who am I?
• Anshum Gupta, Apache Lucene/Solr committer, Lucidworks employee.
• Working on search and related areas for 9+ years.
• Apache Lucene since 2006 and Solr since 2010.
• Organizations I am or have been a part of:
Solr - Releases
Ease of Use: Because usability doesn’t end after the first five minutes!
Scripts - Richer, faster, easier!
• Solr Demo:
• bin/post script
• Auto config-set copying
• Create -> Post -> Browse -> Delete
• bin/solr start -e cloud -noprompt; bin/post -c gettingstarted http://lucidworks -recursive 2; open http://localhost:8983/solr/gettingstarted/browse
Example is now Server
• No default collection1
• Configset options
• ant example server
• post.sh
Posting documents has never been so easy!
• The bin/post script wraps the improved SimplePostTool
• Index JSON directly, out of the box
• Developers: SolrServer is now SolrClient
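Solr's /update handler accepts a JSON array of documents when the request is sent with Content-Type: application/json. The sketch below builds such a request with the JDK's java.net.http client rather than SolrJ. The localhost:8983 address and the gettingstarted collection are assumptions borrowed from the demo above, JsonUpdateSketch and its methods are hypothetical names, and the request is only built, never sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;
import java.util.Map;

// Hypothetical helper: builds (but never sends) a JSON document update for Solr.
final class JsonUpdateSketch {

    // Quote a string as a JSON string literal, escaping quotes, backslashes and control chars.
    static String jsonString(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            if (c == '"' || c == '\\') {
                sb.append('\\').append(c);
            } else if (c < 0x20) {
                sb.append(String.format("\\u%04x", (int) c));
            } else {
                sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // Serialize flat, string-valued documents as a JSON array of objects.
    static String toJsonArray(List<Map<String, String>> docs) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i < docs.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            boolean first = true;
            for (Map.Entry<String, String> e : docs.get(i).entrySet()) {
                if (!first) sb.append(',');
                first = false;
                sb.append(jsonString(e.getKey())).append(':').append(jsonString(e.getValue()));
            }
            sb.append('}');
        }
        return sb.append(']').toString();
    }

    // POST the JSON body to /update and commit immediately (fine for a demo, not for bulk loads).
    static HttpRequest buildUpdate(String baseUrl, String collection, String jsonBody) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/" + collection + "/update?commit=true"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        String body = toJsonArray(List.of(Map.of("id", "doc1"), Map.of("id", "doc2")));
        HttpRequest req = buildUpdate("http://localhost:8983/solr", "gettingstarted", body);
        System.out.println(req.method() + " " + req.uri());
        System.out.println(body);
        // Sending it needs a running Solr, e.g.
        // HttpClient.newHttpClient().send(req, HttpResponse.BodyHandlers.ofString());
    }
}
```

SolrJ users would instead call add on a SolrClient (the 5.0 name for what used to be SolrServer); that path needs the SolrJ jar and is not shown here.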
Managing Solr
Managing Solr Configuration - Application
• Paramsets: Add/Edit
• initParams: generic appends, invariants, and defaults defined outside of the component
• Schema API: REST API for adding field types and dynamic fields
• Managing requestHandlers through API
• Implicit registration of the replication, get, and admin handlers.
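The configuration APIs listed above take JSON commands over HTTP. The sketch below only assembles three such bodies and prints them next to their endpoints: a named paramset for the Request Parameters API (/config/params, applied to a query with useParams), an add-requesthandler command for the Config API (/config), and an add-dynamic-field command for the Schema API (/schema). The endpoint paths and command names are assumed to match the Solr 5.x Reference Guide, so check them against your version. The collection, paramset, handler, and field names are hypothetical, and the Schema API call assumes a managed (mutable) schema such as the one behind the gettingstarted demo collection.

```java
import java.util.StringJoiner;

// Hypothetical helper that assembles JSON command bodies for Solr's config and schema APIs.
final class ConfigApiSketch {

    // Flat JSON object from alternating key/value pairs. Every value is emitted as a JSON string,
    // and nothing is escaped: keys and values are assumed to be plain identifiers.
    static String obj(String... kv) {
        if (kv.length % 2 != 0) {
            throw new IllegalArgumentException("expected key/value pairs");
        }
        StringJoiner j = new StringJoiner(",", "{", "}");
        for (int i = 0; i < kv.length; i += 2) {
            j.add("\"" + kv[i] + "\":\"" + kv[i + 1] + "\"");
        }
        return j.toString();
    }

    // Wrap a payload in a single named command, e.g. {"set":{...}}.
    static String command(String name, String payload) {
        return "{\"" + name + "\":" + payload + "}";
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr/gettingstarted"; // assumed collection URL

        // Request Parameters API: store a paramset named "myQueries"; apply it with useParams=myQueries.
        System.out.println("POST " + base + "/config/params");
        System.out.println(command("set", "{\"myQueries\":" + obj("defType", "edismax", "rows", "5") + "}"));

        // Config API: register a request handler without hand-editing solrconfig.xml.
        System.out.println("POST " + base + "/config");
        System.out.println(command("add-requesthandler", obj("name", "/myhandler", "class", "solr.SearchHandler")));

        // Schema API: add a dynamic field rule (needs a managed schema).
        System.out.println("POST " + base + "/schema");
        System.out.println(command("add-dynamic-field", obj("name", "*_sx", "type", "string")));
    }
}
```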
Managing the cluster - Systems
• Collection APIs
• BALANCESHARDUNIQUE: Even distribution of custom replica properties
• Improved APIs
• Option to not shuffle the createNodeSet specified during collection CREATE
• Logging
• Transaction log replay status
• Slow request logging (optional)
• Support for editing common solrconfig.xml values
• Scripts to support installing and running Solr as a service on Linux.
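Both Collections API items above are plain HTTP GETs against /admin/collections. The sketch below builds the URLs for a BALANCESHARDUNIQUE call that spreads the preferredLeader property evenly, and for a CREATE call whose createNodeSet is used in the order given because createNodeSet.shuffle=false. Parameter names are taken from the Solr Reference Guide; the collection, configset, and node names are hypothetical, and nothing is sent.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper that builds (but does not send) Collections API URLs.
final class CollectionsApiSketch {

    // Join parameters in insertion order, URL-encoding keys and values.
    static String url(String solrBase, Map<String, String> params) {
        StringJoiner q = new StringJoiner("&", solrBase + "/admin/collections?", "");
        for (Map.Entry<String, String> e : params.entrySet()) {
            q.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return q.toString();
    }

    // Assign a per-replica property (e.g. preferredLeader) to one replica per shard,
    // spread evenly across the nodes.
    static Map<String, String> balanceShardUnique(String collection, String property) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "BALANCESHARDUNIQUE");
        p.put("collection", collection);
        p.put("property", property);
        return p;
    }

    // Create a collection on an explicit node list, keeping that list's order (no shuffling).
    static Map<String, String> createUnshuffled(String name, int numShards, String configName, String nodeSet) {
        Map<String, String> p = new LinkedHashMap<>();
        p.put("action", "CREATE");
        p.put("name", name);
        p.put("numShards", Integer.toString(numShards));
        p.put("collection.configName", configName);
        p.put("createNodeSet", nodeSet);
        p.put("createNodeSet.shuffle", "false");
        return p;
    }

    public static void main(String[] args) {
        String base = "http://localhost:8983/solr";
        System.out.println(url(base, balanceShardUnique("mycoll", "preferredLeader")));
        System.out.println(url(base, createUnshuffled("mycoll", 2, "myconf", "node1:8983_solr,node2:8983_solr")));
    }
}
```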
Keeping Solr Instance(s) Stable
• ReplicationHandler now has an option to throttle the speed of replication
• timeAllowed is now respected more widely: query expansion, collection, and LBHttpSolrClient retries
• Finite default timeouts for select and update requests
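timeAllowed is a per-request time budget in milliseconds; when it runs out, Solr returns what it has collected and flags partialResults in the response header. The sketch below pairs it with a client-side timeout, in the spirit of the finite-timeouts item above. The collection name and query are assumptions, TimeAllowedSketch is a hypothetical name, and the rule that the client timeout must exceed timeAllowed is a choice made for this example, not a Solr requirement.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.time.Duration;

// Hypothetical helper: a select request with a server-side time budget and a client-side timeout.
final class TimeAllowedSketch {

    static HttpRequest select(String solrBase, String collection, String q,
                              long timeAllowedMs, Duration clientTimeout) {
        // Give Solr room to answer with partial results before the client gives up.
        if (clientTimeout.toMillis() <= timeAllowedMs) {
            throw new IllegalArgumentException("client timeout should exceed timeAllowed");
        }
        String url = solrBase + "/" + collection + "/select?q="
                + URLEncoder.encode(q, StandardCharsets.UTF_8)
                + "&timeAllowed=" + timeAllowedMs
                + "&wt=json";
        return HttpRequest.newBuilder(URI.create(url)).timeout(clientTimeout).GET().build();
    }

    public static void main(String[] args) {
        HttpRequest req = select("http://localhost:8983/solr", "gettingstarted",
                "title:solr", 500, Duration.ofSeconds(2));
        System.out.println(req.uri() + " (client timeout " + req.timeout().orElseThrow() + ")");
        // A response whose responseHeader has "partialResults": true hit the timeAllowed limit.
    }
}
```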
Scalability
• Splitting of ClusterState
• Every collection has its own cluster state
• No need to watch what everyone else is doing
• Might be the default in 5.0
• Improved Solr-ZooKeeper communication
• Speed up overseer operations by avoiding cluster state reads from ZooKeeper at the start of each loop
• Better default timeouts to operate at a large scale
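With the split cluster state (requested per collection via stateFormat=2 on the Collections API CREATE call), each collection's state lives in its own ZooKeeper node, /collections/<name>/state.json, instead of the single shared /clusterstate.json. The sketch below is a toy model, not Solr code: it only computes which ZooKeeper paths a node would watch under each format, to show why a node no longer has to watch what every other collection is doing. The class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy model (not Solr code) of where collection state lives in ZooKeeper.
final class ClusterStateSketch {

    // stateFormat 1: one shared file; stateFormat 2: one state.json per collection.
    static String statePath(String collection, int stateFormat) {
        switch (stateFormat) {
            case 1: return "/clusterstate.json";
            case 2: return "/collections/" + collection + "/state.json";
            default: throw new IllegalArgumentException("unknown stateFormat: " + stateFormat);
        }
    }

    // ZooKeeper paths a node watches for the collections it hosts.
    static Set<String> watchedPaths(Collection<String> hostedCollections, int stateFormat) {
        Set<String> paths = new TreeSet<>();
        for (String c : hostedCollections) {
            paths.add(statePath(c, stateFormat));
        }
        return paths;
    }

    public static void main(String[] args) {
        List<String> hosted = List.of("logs", "products");
        // Format 1: a single shared file, so a change to any collection wakes this node up.
        System.out.println(watchedPaths(hosted, 1));
        // Format 2: only the hosted collections' own state files are watched.
        System.out.println(watchedPaths(hosted, 2));
    }
}
```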
Solr scalability is unmatched.
Features
Distributed IDF
• Multiple contributors over almost 5 years.
• 4 implementations out of the box:
• LocalStatsCache: Local Stats
• ExactStatsCache: One-time-use aggregation
• ExactSharedStatsCache: Stats shared across requests
• LRUStatsCache: Stats shared in an LRU cache across requests
• Flow:
• Conditionally send a GET_TERM_STATS request to participating nodes
• Compute global values, then send another request for SET_TERM_STATS + GET_TOP_IDS
• Conditional GET_FIELDS
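Distributed IDF exists because, without it, each shard scores with its own document frequencies, so the same term can get a different IDF on every shard. The sketch below is a simplified model, not Solr's implementation: it applies the classic Lucene TF-IDF formula, idf = 1 + ln(maxDoc / (docFreq + 1)), to per-shard statistics and to their sums, which is roughly what a global stats cache makes available to every shard. In Solr the cache is chosen in solrconfig.xml, e.g. <statsCache class="org.apache.solr.search.stats.ExactStatsCache"/>. The shard numbers are made up for illustration and the class names are hypothetical.

```java
import java.util.List;

// Simplified model of local vs. global IDF across shards (illustrative, not Solr's code).
final class DistributedIdfSketch {

    // Statistics one shard reports for a single term.
    static final class ShardStats {
        final long docFreq; // documents in the shard containing the term
        final long maxDoc;  // documents in the shard
        ShardStats(long docFreq, long maxDoc) {
            this.docFreq = docFreq;
            this.maxDoc = maxDoc;
        }
    }

    // Classic Lucene TF-IDF idf: 1 + ln(maxDoc / (docFreq + 1)).
    static double idf(long docFreq, long maxDoc) {
        return 1.0 + Math.log(maxDoc / (double) (docFreq + 1));
    }

    // Global IDF: sum the per-shard statistics first, then apply the same formula.
    static double globalIdf(List<ShardStats> shards) {
        long docFreq = 0;
        long maxDoc = 0;
        for (ShardStats s : shards) {
            docFreq += s.docFreq;
            maxDoc += s.maxDoc;
        }
        return idf(docFreq, maxDoc);
    }

    public static void main(String[] args) {
        List<ShardStats> shards = List.of(new ShardStats(90, 1000), new ShardStats(9, 1000));
        for (ShardStats s : shards) {
            System.out.printf("local idf = %.3f%n", idf(s.docFreq, s.maxDoc));
        }
        System.out.printf("global idf = %.3f%n", globalIdf(shards));
    }
}
```

Run as is, the two shards disagree (local values of about 3.397 and 5.605), while the global value, 1 + ln(2000 / 100) = 1 + ln 20, is about 3.996 for both.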
Stats Component
• stats.field can now be used to generate stats over the numeric results of arbitrary functions:
• stats.field={!func}product(price,popularity)
• Stats can hang off pivot facets via tags
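Both stats.field forms above are ordinary request parameters. The sketch below builds an encoded query string for a function-based stats.field, and for stats attached to a pivot facet through a shared tag ({!tag=...} on stats.field, {!stats=...} on facet.pivot). The price, popularity, cat, and inStock fields come from Solr's techproducts example and are assumptions here; the helper names are hypothetical.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for StatsComponent request parameters.
final class StatsParamsSketch {

    // Encode parameters in order; repeated keys (e.g. several stats.field) are allowed.
    static String queryString(List<Map.Entry<String, String>> params) {
        StringJoiner q = new StringJoiner("&");
        for (Map.Entry<String, String> e : params) {
            q.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return q.toString();
    }

    // Stats over the result of a function query instead of a single field.
    static List<Map.Entry<String, String>> functionStats(String function) {
        return List.of(Map.entry("stats", "true"),
                       Map.entry("stats.field", "{!func}" + function));
    }

    // Stats computed for every pivot bucket: the tag links stats.field to facet.pivot.
    static List<Map.Entry<String, String>> pivotStats(String tag, String field, String pivot) {
        return List.of(Map.entry("stats", "true"),
                       Map.entry("stats.field", "{!tag=" + tag + "}" + field),
                       Map.entry("facet", "true"),
                       Map.entry("facet.pivot", "{!stats=" + tag + "}" + pivot));
    }

    public static void main(String[] args) {
        System.out.println("q=*:*&" + queryString(functionStats("product(price,popularity)")));
        System.out.println("q=*:*&" + queryString(pivotStats("piv1", "price", "cat,inStock")));
    }
}
```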
And there’s more…
• DateRangeField for indexing date ranges, especially multi-valued ones.
• Spatial fields that used to require units=degrees now take distanceUnits=degrees|kilometers|miles instead.
• MoreLikeThis QueryParser: Works in SolrCloud mode too.
• API for managing blobs
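Two of the items above are used mostly through query syntax. The sketch below builds a filter on a DateRangeField with the field query parser's op local param (Intersects, Contains, or Within, per the Solr Reference Guide) and a {!mlt} query that finds documents similar to the one with a given unique key. The field names dateRange and name, the document id, and the helper names are hypothetical; the strings still need URL encoding before being sent.

```java
import java.util.Set;

// Hypothetical helpers that build query strings for DateRangeField and the MLT query parser.
final class QuerySyntaxSketch {

    // Predicates supported for DateRangeField queries via the op local param.
    static final Set<String> DATE_RANGE_OPS = Set.of("Intersects", "Contains", "Within");

    // Produces e.g. {!field f=dateRange op=Contains}[2014-01 TO 2014-03]
    static String dateRangeFilter(String field, String op, String range) {
        if (!DATE_RANGE_OPS.contains(op)) {
            throw new IllegalArgumentException("unsupported op: " + op);
        }
        return "{!field f=" + field + " op=" + op + "}" + range;
    }

    // MoreLikeThis query parser: documents similar to the one with the given unique key.
    static String mltQuery(String qf, int minTf, int minDf, String uniqueKey) {
        return "{!mlt qf=" + qf + " mintf=" + minTf + " mindf=" + minDf + "}" + uniqueKey;
    }

    public static void main(String[] args) {
        // Indexed ranges that fully contain January through March 2014 (truncated dates).
        System.out.println("fq=" + dateRangeFilter("dateRange", "Contains", "[2014-01 TO 2014-03]"));
        System.out.println("q=" + mltQuery("name", 1, 1, "doc1"));
    }
}
```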
and more…
• First-class support in SolrJ for Collection API calls
• Upgrade Tika to 1.7: This adds support for parsing Outlook PST and Matlab (MAT) files.
Maturity
• Jepsen tests
• More unit tests and more Solr success stories.
• Protection of ZK content
No more WAR!
• Solr is now an app: starting with Solr 5.0, it no longer ships as a WAR
• Upgrade to Jetty 9 coming soon
• This will allow for a lot of things (e.g. SPDY) that wouldn’t be possible if we had to support Tomcat, Netty, Jetty, and everything else.
Between 4.10 and 5.0: The new Identity
Timeline*
• Release branch cut
• 2nd RC vote in progress.
• Vote - 3 days, 3 votes
• Artifacts propagation to ASF mirrors - 1 day
• Official release note - Right after!
* prospective and subject to how things go
Coming soon
• Collections API: REBALANCESHARDS
• Spatial 2D heat-map faceting
• Facet and analytics
• Replication performance
• More API goodness
Questions?
Connect @
http://www.twitter.com/anshumgupta
http://www.linkedin.com/in/anshumgupta/
anshum@apache.org
