October 11-14, 2016 • Boston, MA
Using Apache Solr for Images As Big Data: A Case Study
Kerry Koitzsch
Architect, Wipro Technologies
Overview of this Presentation
•  This quick overview of one of our ongoing projects describes why Lucene and Solr are key parts of our research, development, and client support activities.
•  The presentation highlights areas of research which involve Solr technologies in the “images as big data” arena: an automated microscope slide application prototype as well as other kinds of data analysis and visualization. The use case described relies heavily on Lucene, Solr, and related “helper libraries” to provide data storage capabilities for the software toolkit, the “Image as Big Data Toolkit” (IABDT).
•  Throughout the presentation we discuss how flexibility, high performance, and the ability to “play well with” other components make Lucene/Solr an essential part of the application described here.
Use Case Overview: How Solr Technologies Relate To:
§ ‘Old School’ statistical displays
§ Web-based data visualization
§ ‘Glue Ware’
§ A crime statistic visualization
§ An image as big data visualization
Types of Data Visualization
Statistical displays --- ‘old school’ histogram, pie chart, and time series
Tabular displays --- stylized table-based visualization with search, etc.
Notebook based visualization
Map based displays with geo-location
Images with overlays
Constructing data visualizers with Lucene | Solr components
“Old School” Statistical Visualization
Histograms, line charts, pie charts, and time series displays.
Notebook technologies and built-in visualization capabilities (such as Elasticsearch-Kibana or Apache Mahout visualization) may be used with Cassandra data and with Lucene/Solr.
A standard ETL approach may be used as part of the data pipeline, and intelligent search can be provided by Lucene/Solr.
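One way the search layer can feed an “old school” chart is through facet counts. The sketch below is an illustration, not the toolkit's code: it assumes a Solr JSON response in the default flat facet layout (value, count, value, count, ...), and the field name `primary_type` and the counts are hypothetical.

```python
from typing import List, Tuple


def facet_pairs(flat: List) -> List[Tuple[str, int]]:
    """Turn a flat facet list [value, count, value, count, ...] into (value, count) pairs."""
    return [(str(flat[i]), int(flat[i + 1])) for i in range(0, len(flat) - 1, 2)]


def text_histogram(pairs: List[Tuple[str, int]], width: int = 40) -> List[str]:
    """Render one bar per facet value, scaled so the largest count fills `width`."""
    peak = max((count for _, count in pairs), default=0)
    lines = []
    for label, count in pairs:
        bar = "#" * (round(width * count / peak) if peak else 0)
        lines.append(f"{label:<12} {bar} {count}")
    return lines


# Hypothetical response fragment, shaped like facet_counts.facet_fields.
sample_response = {
    "facet_counts": {
        "facet_fields": {
            "primary_type": ["NARCOTICS", 3, "BATTERY", 1, "BURGLARY", 1]
        }
    }
}

pairs = facet_pairs(sample_response["facet_counts"]["facet_fields"]["primary_type"])
for line in text_histogram(pairs):
    print(line)
```

The same pairs could be handed to any plotting library or notebook chart; the text rendering is only there to keep the sketch dependency-free.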
“Old School” Statistical Visualization: Standard Plots and Charts
“Old School” Visualization of Classifier Results
“Old School” Statistical Visualization: Standard Time Series Plots
Tabular Display Visualization: Hive Notebook
Graph Visualization
§ Leveraging graph databases and graph visualization toolkits with Lucene/Solr-centric systems
§ Giraph, Neo4j, OrientDB, and other graph databases in combination with a Lucene/Solr-centric technology stack
§ For example, Chicago crime data format as CSV:
ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
9955810,HY144797,02/08/2015 11:43:40 PM,081XX S COLES AVE,1811,NARCOTICS,POSS: CANNABIS 30GMS OR LESS,STREET,true,false,0422,004,7,46,18,1198273,1851626,2015,02/15/2015 12:43:39 PM,41.747693646,-87.549035389,"(41.747693646, -87.549035389)"
9955861,HY144838,02/08/2015 11:41:42 PM,118XX S STATE ST,0486,BATTERY,DOMESTIC BATTERY SIMPLE,APARTMENT,true,true,0522,005,34,53,08B,1178335,1826581,2015,02/15/2015 12:43:39 PM,41.679442289,-87.622850758,"(41.679442289, -87.622850758)"
9955801,HY144779,02/08/2015 11:30:22 PM,002XX S LARAMIE AVE,2026,NARCOTICS,POSS: PCP,SIDEWALK,true,false,1522,015,29,25,18,1141717,1898581,2015,02/15/2015 12:43:39 PM,41.87777333,-87.755117993,"(41.87777333, -87.755117993)"
9956197,HY144787,02/08/2015 11:30:23 PM,006XX E 67TH ST,1811,NARCOTICS,POSS: CANNABIS 30GMS OR LESS,STREET,true,false,0321,,6,42,18,,,2015,02/15/2015 12:43:39 PM,,,
9955846,HY144829,02/08/2015 11:30:58 PM,0000X S MAYFIELD AVE,0610,BURGLARY,FORCIBLE ENTRY,APARTMENT,false,false,1513,015,29,25,05,1137239,1899372,2015,02/15/2015 12:4
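A minimal sketch of turning CSV records in the layout above into JSON documents that a search index could accept. The output field names (`primary_type`, `location`, and so on) are assumptions chosen for illustration, not the schema used in the project; the location is written as a "lat,lon" string and is skipped when the source row has no coordinates.

```python
import csv
import io
import json

# Header plus two records copied from the sample data (one with coordinates, one without).
SAMPLE = '''ID,Case Number,Date,Block,IUCR,Primary Type,Description,Location Description,Arrest,Domestic,Beat,District,Ward,Community Area,FBI Code,X Coordinate,Y Coordinate,Year,Updated On,Latitude,Longitude,Location
9955810,HY144797,02/08/2015 11:43:40 PM,081XX S COLES AVE,1811,NARCOTICS,POSS: CANNABIS 30GMS OR LESS,STREET,true,false,0422,004,7,46,18,1198273,1851626,2015,02/15/2015 12:43:39 PM,41.747693646,-87.549035389,"(41.747693646, -87.549035389)"
9956197,HY144787,02/08/2015 11:30:23 PM,006XX E 67TH ST,1811,NARCOTICS,POSS: CANNABIS 30GMS OR LESS,STREET,true,false,0321,,6,42,18,,,2015,02/15/2015 12:43:39 PM,,,
'''


def to_search_doc(row: dict) -> dict:
    """Map one CSV row to a flat document; field names here are hypothetical."""
    doc = {
        "id": row["ID"],
        "case_number": row["Case Number"],
        "primary_type": row["Primary Type"],
        "description": row["Description"],
        "location_description": row["Location Description"],
        "arrest": row["Arrest"] == "true",
    }
    # Only emit a location when both coordinates are present in the source row.
    if row["Latitude"] and row["Longitude"]:
        doc["location"] = f'{row["Latitude"]},{row["Longitude"]}'
    return doc


docs = [to_search_doc(row) for row in csv.DictReader(io.StringIO(SAMPLE))]
print(json.dumps(docs, indent=2))
```

The resulting list can be posted to an indexing endpoint as a JSON array, or reshaped into nodes and edges (for example, crime type to location description) for a graph database.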
Graph Visualization in Neo4j
Graph Visualization Example I: Neo4j (Separate Nodes)
Graph Visualization Example: Simple UIs and Hierarchies
Graph Visualization Example II: GoJS Visualization
Notebook-Based Visualization
Jupyter or Zeppelin notebook technologies may be used to display Solr-based information and analytics results.
These notebook technologies can be used as the display component in a data-pipeline-oriented processing architecture.
Solr works well as one element of such a data pipeline.
Spring, Spring Data, and Apache Tika may be used as data pipeline components.
Simpler data pipelines may be evolved into Complex Event Processors (CEPs).
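When a notebook acts as the display component, one simple arrangement is for a notebook cell to build a Solr select URL and fetch the JSON result. The sketch below only builds the URL; the host, the collection name `crimes`, and the field names are hypothetical, while `q`, `fq`, `fl`, `rows`, and `wt` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def solr_select_url(base_url, collection, query, filters=(), rows=10, fields=()):
    """Compose a /select URL; repeated fq parameters narrow the result set."""
    params = [("q", query), ("rows", str(rows)), ("wt", "json")]
    params += [("fq", f) for f in filters]
    if fields:
        params.append(("fl", ",".join(fields)))
    return f"{base_url.rstrip('/')}/{collection}/select?{urlencode(params)}"


url = solr_select_url(
    "http://localhost:8983/solr",   # assumed local Solr instance
    "crimes",                       # hypothetical collection name
    "primary_type:NARCOTICS",
    filters=["arrest:true"],
    rows=5,
    fields=["id", "primary_type"],
)
print(url)
```

A notebook cell could pass this URL to any HTTP client and chart the returned documents.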
Notebook Visualization: Architecture and Strategy
§ A relatively simple data pipeline system may be built using a Zeppelin notebook to visualize the output results
§ Geolocation data may be visualized as in the following example
Technology components: Hadoop, HBase, NGData Lily, Solr, Lucene, Solandra, Katta, Cassandra, ELK Stack, Kafka, Apache Spark, Mesos, Akka
Notebook Based Visualization: Example: Solr-Zeppelin-Cassandra
Map / Geolocation Visualization
Crime data can easily be imported into Solr.
The data may be manipulated and pushed into Elasticsearch or Solr, or back to Cassandra.
Elasticsearch data can be visualized using Kibana and searched in a way compatible with Lucene | Solr and the other modules.
Logstash, Flume, or any of the many other import frameworks may be used to import data from “log file analysis” type applications; Apache Tika is especially useful as a support library.
Map / Geolocation Data: Crime Data in Solr
§ Technology stack includes the ELK Stack plus Cassandra plus Lucene/Solr/Hadoop
§ Data may use CSV crime data files as an original data source
§ Solr can process JSON-based data with geolocation data associated with it, and is especially powerful with Apache Tika
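For geolocation search in Solr, a radius query is typically expressed with the `geofilt` query parser as a filter query. The sketch below builds such a filter and, separately, checks a distance locally with the haversine formula to show what "within d kilometers" means. The field name `location` is an assumption; the two points are taken from the sample crime records.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def geofilt(field, lat, lon, radius_km):
    """Build a Solr geofilt filter query: sfield = spatial field, pt = center, d = radius in km."""
    return f"{{!geofilt sfield={field} pt={lat},{lon} d={radius_km}}}"


center = (41.747693646, -87.549035389)   # first sample record
other = (41.679442289, -87.622850758)    # second sample record

params = {"q": "*:*", "fq": geofilt("location", center[0], center[1], 10)}
distance = haversine_km(*center, *other)
print(params["fq"])
print(f"distance between sample points: {distance:.2f} km")
```

The local distance check is only illustrative; in practice the index performs the spatial filtering.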
Map / Geolocation : Crime Data in Kibana
§ Technology stack includes the ELK Stack plus Cassandra plus Lucene/Solr/Hadoop
§ Data may use CSV crime data files as an original data source
§ Kibana can process JSON-based data with geolocation data associated with it, as can Lucene/Solr/Tika
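For the Kibana side, documents need a field mapped as `geo_point` before they can appear on a map. A hedged sketch follows: it prepares a mapping and a bulk-index payload (newline-delimited JSON, one action line followed by one source line per document). The index name `crimes` and the field names are hypothetical, and the mapping layout shown assumes a recent Elasticsearch version without mapping types.

```python
import json

# Assumed mapping: "location" is declared as geo_point so Kibana can plot it.
mapping = {"mappings": {"properties": {"location": {"type": "geo_point"}}}}

docs = [
    {"id": "9955810", "primary_type": "NARCOTICS",
     "location": {"lat": 41.747693646, "lon": -87.549035389}},
    {"id": "9955861", "primary_type": "BATTERY",
     "location": {"lat": 41.679442289, "lon": -87.622850758}},
]


def bulk_payload(index, documents):
    """Build an NDJSON bulk body: an index action line, then the document source."""
    lines = []
    for doc in documents:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc["id"]}}))
        lines.append(json.dumps(doc))
    # The bulk format requires a trailing newline after the last line.
    return "\n".join(lines) + "\n"


payload = bulk_payload("crimes", docs)
print(json.dumps(mapping))
print(payload, end="")
```

The payload would be sent to the bulk endpoint with a newline-delimited JSON content type; that HTTP step is left out to keep the sketch self-contained.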
Map | Geolocation Visualization: Data to Image
“Image as Big Data” Visualization
A data pipeline with images as a data source.
Feature extraction can identify features of interest and write them to Cassandra as feature descriptors, using Lucene/Solr for intelligent search capability.
Deep learning and machine learning can enhance the processing pipeline.
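To make "feature descriptor" concrete, here is a small sketch that reduces a binary region mask to a few searchable numbers (area, centroid, bounding box) and wraps them in a document. This is a generic illustration, not the IABDT extraction method; the document id and field names are hypothetical.

```python
def describe_region(mask):
    """Summarize the nonzero pixels of a 2D mask as area, centroid, and bounding box."""
    points = [(r, c) for r, row in enumerate(mask) for c, value in enumerate(row) if value]
    if not points:
        return None
    rows = [r for r, _ in points]
    cols = [c for _, c in points]
    area = len(points)
    return {
        "area": area,
        "centroid": (sum(rows) / area, sum(cols) / area),        # (row, col)
        "bbox": (min(rows), min(cols), max(rows), max(cols)),    # top, left, bottom, right
    }


# A tiny stand-in for a thresholded microscope image region.
mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]

descriptor = describe_region(mask)

# Hypothetical document shape for storage and indexing.
feature_doc = {
    "id": "slide-001-region-0",
    "feature_type": "blob",
    "area": descriptor["area"],
    "centroid_row": descriptor["centroid"][0],
    "centroid_col": descriptor["centroid"][1],
    "bbox": list(descriptor["bbox"]),
}
print(feature_doc)
```

Numeric fields like these allow range queries over extracted features, for example finding all regions above a given area.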
Image as Big Data Analysis (Poggio’s MIT Vision Machine)
[Diagram components, in the order listed on the slide: Original Images; Color Analyzers; Texture Analyzers; Edge Detectors; Motion Analyzers; Stereo Image Analyzers; Discontinuity Map Generation (Including Line & Continuous Process); Cooperating Recognition Process; Analysis Result Repository]
Intelligent Search with Lucene Solr Centric Architecture
Image “As Big Data” Analytics Visualization: Linear Features
Automated Microscopy: The Original Components
Feature Extraction: Original Electron Microscope Image
Feature Extraction: Image to Data: Ellipses
Feature Extraction: Image to Data: Contours
“Image as Big Data” Visualization: Optical Microscope Hardware
Microscope Control Software, with Data Ingestion
“Image as Big Data” Visualization: Solr Search: Metadata
“Image as Big Data” Visualization: Microscopy UI
Another View of the Data Pipeline
Diagram components:
§ Image and Metadata Input Sources (or “smart sensors”)
§ Multi-sensor Fusion Software Engine
§ Short Term Computation Result Repository
§ Long-Term Result Data Repository
§ Feature Extraction and Model Builder
§ Global System Controller
Conclusions and Future Work
A use case was described in which we use a Lucene/Solr-centric technology stack to provide an intelligent search component.
Flat files, HDFS files, CSV data, data streams, and other data sources may be used, including microscope images of many different formats, resolutions, and metadata content.
“Images as big data” is a viable strategy for building image processing applications with Lucene/Solr as an intelligent search component, because of Lucene/Solr’s flexibility and ability to play well with other components.
Deep learning, machine learning, data mining, and hybrid techniques can be used to develop Lucene/Solr-centric analytics applications with “intelligent search” capabilities.
Your Questions?
Kerry.koitzsch@wipro.com