Mathematical Modeling
with Apache Solr
Joel Bernstein
Senior Data Engineer, Lucidworks
Lucene/Solr Committer & PMC
@jbernste2
#Activate18 #ActivateSearch
Agenda
• Math Expressions Introduction
• Model Types
• Training Regression Models
• Assessing Regression Models
• Prediction and Anomaly Detection
• A mathematics language that runs inside SolrCloud.

• Integration of Apache Commons Math with Solr.

• Designed for fast quantitative analysis of result sets. 

• Any Streaming Expression can be used to create the data set.
Math Expressions
• Vector/Matrix Math
• Text Analysis
• Statistics
• Probability
• Monte Carlo Simulations
• Linear Regression
• Curve Fitting
• Time Series
• Interpolation, Derivatives, Integrals
• Digital Signal Processing
• Machine Learning
• Computational Geometry
Statistical Modeling
Two of the most commonly used statistical models are: 

• Regression Models: Predicting Numeric Values

• Probability Distributions: Models of probability
Regression Support in Math Expressions
Linear
• Simple Linear Regression
• Multivariate Linear Regression
Non-linear
• Loess Regression: Bivariate, robust, often used for time series modeling. 

• Polynomial Curve Fitting: Bivariate, general purpose modeling of curves.

• Harmonic Curve Fitting: Bivariate, sine wave modeling.

• Gaussian Curve Fitting: Bivariate, modeling a Gaussian peak. 

• KNN Regression: Multivariate, robust, distance based, very flexible.
Probability Distributions
• Statistical models of probability

• Used to model risk

• Perform simulations

• Natural outlier detectors

• Math Expressions supports many of the commonly used probability distributions 

• There is an important relationship between the Normal Distribution and Regression Models
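The ideas above (modeling risk, running simulations, spotting outliers) can be sketched outside of Solr with Python's standard library. This is only an illustration, not Math Expressions syntax; the mean and standard deviation below are made-up values chosen for the example.

```python
from statistics import NormalDist

# Hypothetical model: response times ~ Normal(mean=30 ms, sd=5 ms).
# Both parameters are assumptions made up for this sketch.
model = NormalDist(mu=30.0, sigma=5.0)

# Cumulative probability: chance of a response time at or below 45 ms.
p_at_or_below = model.cdf(45.0)

# Upper-tail probability: chance of seeing something even slower.
# A tiny tail probability marks 45 ms as a likely outlier under this model.
tail = 1.0 - p_at_or_below

# Monte Carlo simulation: draw samples from the model (seeded for repeatability).
samples = model.samples(10_000, seed=42)
share_above_45 = sum(s > 45.0 for s in samples) / len(samples)
```

A value three standard deviations above the mean falls in a very thin tail, which is why a fitted distribution works as an outlier detector without any extra rules.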
Training the Model
Use Case
• Detect unusual slowness in the network.

• Use Simple Linear Regression to model the linear relationship of file sizes and response times.

• Use the regression model to detect higher than expected response times.
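As a rough illustration of the use case, the sketch below fits an ordinary least-squares line to file sizes and response times in plain Python. It is not the Solr expression shown in the talk; the function names and the sample values are hypothetical.

```python
from typing import Sequence, Tuple


def fit_simple_linear(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, with at least two points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(slope: float, intercept: float, x_value: float) -> float:
    """Expected response time for a given file size under the fitted line."""
    return slope * x_value + intercept


# Hypothetical sample: file sizes (KB) and response times (ms). Made-up values.
file_sizes = [100.0, 200.0, 300.0, 400.0, 500.0]
response_times = [12.0, 21.0, 33.0, 41.0, 52.0]

slope, intercept = fit_simple_linear(file_sizes, response_times)
expected_ms = predict(slope, intercept, 350.0)
```

An observed response time well above `expected_ms` for a 350 KB file is the kind of "higher than expected" value the use case is looking for.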
Random Sample
Response
Set the Samples to a Variable
Response
Vectorize the File Sizes
Response
Vectorize the Responses
Response
Plotting the Responses (Sunplot by: Michael Suzuki)
Response Times
Simple Linear Regression
Response
Cache the Model
Assessing the Model
Statistical Analysis of the Residuals
• Residuals
– Calculate the Residuals

– Describe

– Normality Test

– Residual Plot (Homoscedasticity)

– Model the Residuals
Residuals
• The difference between the actual value and the predicted value is called the residual.

• Residuals represent the error of a regression model.

• Residuals can be analyzed and modeled as a probability distribution.

• In an ideal scenario the residuals will be normally distributed and homoscedastic.
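A minimal Python sketch of the residual calculation and a basic descriptive summary follows. The actual and predicted values are hypothetical, and the summary fields are an assumption about what a "describe" step might report, not the output format used in the talk.

```python
from statistics import mean, stdev
from typing import Dict, List, Sequence


def residuals(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Residual = actual value minus predicted value, one per observation."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return [a - p for a, p in zip(actual, predicted)]


def describe(values: Sequence[float]) -> Dict[str, float]:
    """Basic descriptive statistics for a list of residuals."""
    return {
        "n": len(values),
        "mean": mean(values),
        "stdev": stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical observed response times and the model's predictions for them.
actual = [12.0, 21.0, 33.0, 41.0, 52.0]
predicted = [11.8, 21.8, 31.8, 41.8, 51.8]

res = residuals(actual, predicted)
summary = describe(res)
```

For a least-squares fit the residual mean should be close to zero; the standard deviation then gives a sense of how far a typical observation sits from the line.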
Predictions
Response
Calculate the Residuals
Response
Describe
Response
Testing for Normality
Response
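One common way to check normality is the Kolmogorov-Smirnov statistic, which measures the largest gap between the empirical distribution of the data and a reference normal distribution. The Python sketch below is an assumption about how such a check could look; the transcript does not show which test the talk used, and the data here is simulated. Because the reference parameters are estimated from the same sample, standard KS p-values would not apply, so only the statistic is computed.

```python
from statistics import NormalDist, mean, stdev
from typing import Sequence


def ks_statistic(sample: Sequence[float], dist: NormalDist) -> float:
    """One-sample KS statistic: max distance between empirical and reference CDFs."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        cdf = dist.cdf(x)
        # The empirical CDF jumps from i/n to (i+1)/n at x; check both sides.
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


# Simulated data: one roughly normal sample and one clearly skewed sample.
normal_like = NormalDist(0.0, 1.0).samples(500, seed=7)
skewed = [x * x for x in normal_like]  # squares of normals are far from normal

d_normal = ks_statistic(normal_like, NormalDist(mean(normal_like), stdev(normal_like)))
d_skewed = ks_statistic(skewed, NormalDist(mean(skewed), stdev(skewed)))
```

A small statistic means the sample tracks the normal curve closely; a large one, as with the skewed sample, suggests the residuals should not be modeled as normal.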
Cache the Residuals Distribution
Residual Plot
Scatter Plot of Residuals
Prediction
Getting the Cached Model
Response
Request/Response Prediction
Response
Streaming Prediction
Response
Anomaly Detection
Streaming Anomaly Detection
Response
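The streaming anomaly detection step can be sketched in Python as a generator that scores each record as it arrives. This is an illustration under stated assumptions, not the Solr expression from the talk: the record fields, the model coefficients, the residual distribution, and the 0.99 threshold are all hypothetical.

```python
from statistics import NormalDist
from typing import Dict, Iterable, Iterator


def detect_anomalies(
    records: Iterable[Dict],
    slope: float,
    intercept: float,
    residual_dist: NormalDist,
    threshold: float = 0.99,
) -> Iterator[Dict]:
    """Yield records whose response time is far above what the model predicts."""
    for rec in records:
        predicted = slope * rec["file_size"] + intercept
        residual = rec["response_time"] - predicted
        # Cumulative probability of the residual under the residual distribution.
        # Values near 1.0 mean the response was much slower than expected.
        score = residual_dist.cdf(residual)
        if score > threshold:
            yield {**rec, "predicted": predicted, "residual": residual, "score": score}


# Hypothetical fitted model and residual distribution.
slope, intercept = 0.1, 1.8
residual_dist = NormalDist(mu=0.0, sigma=1.0)

# Hypothetical stream of records (file size in KB, response time in ms).
stream = [
    {"id": "a", "file_size": 100.0, "response_time": 12.0},
    {"id": "b", "file_size": 300.0, "response_time": 31.0},
    {"id": "c", "file_size": 400.0, "response_time": 49.0},
]

anomalies = list(detect_anomalies(stream, slope, intercept, residual_dist))
```

Scoring only the upper tail matches the use case: a response that is faster than predicted is not a sign of network slowness, so it is not flagged.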
Thank you!
Joel Bernstein
Senior Data Engineer, Lucidworks
@jbernste2
#Activate18 #ActivateSearch

Applied Mathematical Modeling with Apache Solr - Joel Bernstein, Lucidworks