© COPYRIGHT 2017 MARKLOGIC CORPORATION. ALL RIGHTS RESERVED.
Semantics for Safeguarding & Security
a police story
Jen Shorten, Technical Delivery Architect, Consulting Services
Edward Thomas, Senior Consultant
SLIDE: 2
• Multi-model (documents, semantics)
• Load data as-is, avoid ETL
• Built-in search and indexing
• ACID transactions, HA/DR, elasticity
• Most secure NoSQL database
MarkLogic: Enterprise NoSQL Database
OVERVIEW
Data types shown: JSON, XML, semantic data, geospatial data, binary
The Nature of Policing Is Changing
“The increasing availability of information and new technologies offers us huge potential to improve how we protect the public. It sets new expectations about the services we provide.” (NPCC, Policing Vision 2025)
Police IT systems need to adapt to keep up with those changes
SLIDE: 4
• Proactive
• Impact-led
• Outcomes-driven
• Data-driven
Digital Transformation of Policing
NATIONAL POLICE OBJECTIVES
EVENTS
PEOPLE, THINGS
OUTCOMES
TIME
SLIDE: 5
A Unified, Actionable 360 View of Data
WHAT POLICE NEED TO DELIVER THAT VISION
SLIDE: 6
Data Is In Silos
THE REALITY
• Data is spread across disconnected databases
• Data quality issues are significant
• Data collection is a slow, manual process
SLIDE: 7
Current State
SLIDE: 8
Traditional RDBMS Solutions
Diagram: the same source systems (C2, crime recording, intelligence, child protection, documents) shown behind two approaches: federated search across the silos, and ETL into COTS products with their own search.
SLIDE: 9
Federated Search
SLIDE: 10
The Promise: Easy ETL, Low Costs
TRADITIONAL APPROACH WITH COTS
Diagram: police & partner source systems feeding a COTS product
SLIDE: 11
The Reality: Extract, Transform, & Lose
TRADITIONAL APPROACH WITH COTS
Diagram: police & partner source systems feeding a COTS product through ETL
SLIDE: 12
Use All of the Data
THE IDEAL SOLUTION
• Semantic linking to see relationships between people, locations, events and objects
• Extract context from narrative text
• Build a complete picture by exploiting the value in all of the data
MULTI-MODEL: DOCUMENTS & TRIPLES TOGETHER
JSON, XML, & RDF
Diagram: Nigel (suspect) and Sarah (victim) linked to a Crime and to each other as partners, with narrative-text annotations “frequent 999 caller” and “Wheelchair bound”.
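To make the linking idea concrete, here is a minimal in-memory sketch (plain Python, not MarkLogic code) of the slide's example stored as subject-predicate-object triples, with a helper that lists everything directly connected to an entity. The URIs and the predicate names suspectOf, victimOf and partnerOf are hypothetical labels for the diagram, and who is whose partner is our reading of it.

```python
# Hypothetical triples for the slide's example: a suspect, a victim and a crime.
TRIPLES = [
    ("person/Nigel", "suspectOf", "event/crime/1"),
    ("person/Sarah", "victimOf", "event/crime/1"),
    ("person/Nigel", "partnerOf", "person/Sarah"),
]


def connections(triples, entity):
    """Return (predicate, other_entity, direction) for every triple touching entity."""
    found = []
    for s, p, o in triples:
        if s == entity:
            found.append((p, o, "out"))  # entity is the subject
        elif o == entity:
            found.append((p, s, "in"))   # entity is the object
    return found


if __name__ == "__main__":
    for pred, other, direction in connections(TRIPLES, "person/Sarah"):
        print(direction, pred, other)
```

Asking for Sarah's connections returns both the crime she is a victim of and her link to Nigel: context that a keyword search on her name alone would not surface.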
SLIDE: 13
Single point of entry to all force information sources for intelligence and safeguarding
Police Intelligence Platform
THE PROJECT
• Single point of entry to all Police information sources for intelligence and safeguarding
• 4 applications built on top of a single unified set of data from 10 different police databases
• 12 weeks of development
• Data quality issues
• Disconnected data
Architecture diagram:
- 10+ data feeds: Emergency Calls, Crimes, Arrests, Missing Persons, Child Protection, Intelligence, Mapping & Addresses, + more
- Platform: automated ingestion; transformation, disambiguation & aggregation; security & permissions; search, alerts, geo, workflow
- Intelligence/analysis tools: crime patterns, vulnerability detection, inmate risk assessment, CSE, social media
SLIDE: 14
Use Triples To Find Relationships
SLIDE: 15
Combine Triples With Documents
SLIDE: 16
Add Geospatial to Triples and Documents
SLIDE: 18
<CRIME>
<REFERENCE>HW/001900/15</REFERENCE>
<CODE>MOPI GROUP 2 - ASSAULT A.B.H</CODE>
<CRIMINCIDENT_DATE>08/11/2015</CRIMINCIDENT_DATE>
<ADDRESS>
<TYPE>SCENE OF CRIME</TYPE>
<STREET>8 HAVEN ROAD, EXETER</STREET>
<POSTCODE>EX2 8BP</POSTCODE>
</ADDRESS>
<PERSON>
<AUTNPERSONTYPE>VICTIM</AUTNPERSONTYPE>
<SURNAME>PHILLIPS</SURNAME>
<FORENAME>MILDRED</FORENAME>
<DATE_OF_BIRTH>14/01/1927</DATE_OF_BIRTH>
<GENDER>F</GENDER>
<OCCUPATION>UNEMPLOYED</OCCUPATION>
<PERSONALIASLIST></PERSONALIASLIST>
...
Load ‘As Is’
• Export source data
• Load data as-is as documents – XML/JSON
• Record provenance information – PROV-O ontology
• Harmonize data – envelope pattern
• Canonicalize – POLE (People, Objects, Locations, Events) model
DATA PROCESSING
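The envelope pattern can be sketched as follows: the source record is kept untouched inside an envelope, and harmonized entities and provenance sit alongside it. This is a minimal sketch using Python's standard xml.etree.ElementTree; the element names provenance, source, loadedAt, entities and attachments, and the source name "crime-recording", are illustrative assumptions. The slide records provenance with the W3C PROV-O ontology; here it is reduced to two plain elements for brevity.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

# A trimmed copy of the source record from the slide, loaded as-is.
RAW = """<CRIME>
  <REFERENCE>HW/001900/15</REFERENCE>
  <CRIMINCIDENT_DATE>08/11/2015</CRIMINCIDENT_DATE>
</CRIME>"""


def wrap_in_envelope(raw_xml, source_system):
    """Wrap an unmodified source record in an envelope with simple provenance."""
    original = ET.fromstring(raw_xml)
    envelope = ET.Element("envelope")
    # Provenance: where the record came from and when it was loaded (illustrative fields).
    prov = ET.SubElement(envelope, "provenance")
    ET.SubElement(prov, "source").text = source_system
    ET.SubElement(prov, "loadedAt").text = datetime.now(timezone.utc).isoformat()
    # Harmonized entities (person, event, address) would be added here by later steps.
    ET.SubElement(envelope, "entities")
    # The original record is kept exactly as loaded.
    ET.SubElement(envelope, "attachments").append(original)
    return envelope


if __name__ == "__main__":
    env = wrap_in_envelope(RAW, "crime-recording")
    print(ET.tostring(env, encoding="unicode"))
```

Keeping the original inside the envelope means harmonization rules can be rerun later without re-exporting from the source system.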
SLIDE: 19
<envelope>
<person>
<uri>646569e5-5f6c-4667-96c7-09ff84b3e08e</uri>
<personType>VICTIM</personType>
<surname>PHILLIPS</surname>
<surname_dm>flps</surname_dm>
<forename>MILDRED</forename>
<forename_dm>mltrt</forename_dm>
<dob>1927-01-14</dob>
<gender>f</gender>
<occupation>UNEMPLOYED</occupation>
...
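Harmonize People
EXTRACT ENTITIES
• Generate a unique ID for the entity instance
• Harmonize element names and data formats (e.g. dates to ISO 8601, so a date entered in any format can be matched)
• Generate phonetic versions of names, so misspellings still match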
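A minimal sketch of the person harmonization step, assuming the source fields shown on slide 18 arrive as a dict of element name to text: element names are mapped to canonical names, the dd/mm/yyyy date becomes ISO 8601, gender is lower-cased, and a random UUID serves as the entity ID. The FIELD_MAP is illustrative. Phonetic keys are not computed here; the `_dm` suffix suggests Double Metaphone, but the deck does not name the algorithm, so that step is left to an external library.

```python
import uuid
from datetime import datetime

# Illustrative mapping from source element names to canonical names.
FIELD_MAP = {
    "AUTNPERSONTYPE": "personType",
    "SURNAME": "surname",
    "FORENAME": "forename",
    "DATE_OF_BIRTH": "dob",
    "GENDER": "gender",
    "OCCUPATION": "occupation",
}


def harmonize_person(source):
    """Map a source person record (element name -> text) to canonical form."""
    person = {"uri": str(uuid.uuid4())}  # unique ID for this entity instance
    for src_name, canonical in FIELD_MAP.items():
        value = source.get(src_name)
        if value:  # skip missing or empty elements
            person[canonical] = value.strip()
    if "dob" in person:
        # Source dates are dd/mm/yyyy; store ISO 8601 (yyyy-mm-dd).
        person["dob"] = datetime.strptime(person["dob"], "%d/%m/%Y").date().isoformat()
    if "gender" in person:
        person["gender"] = person["gender"].lower()
    return person


if __name__ == "__main__":
    record = {"AUTNPERSONTYPE": "VICTIM", "SURNAME": "PHILLIPS", "FORENAME": "MILDRED",
              "DATE_OF_BIRTH": "14/01/1927", "GENDER": "F", "OCCUPATION": "UNEMPLOYED"}
    print(harmonize_person(record)["dob"])  # 1927-01-14
```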
SLIDE: 20
<envelope>
<event>
<uri>police.uk/event/crime/CMS2_106600</uri>
<eventType>Crime</eventType>
<eventDate>2015-11-08</eventDate>
...
Harmonize Event
EXTRACT ENTITIES
• Generate a unique ID for the entity instance
• Harmonize element names and data formats
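For events, the ID can be deterministic rather than random, so reloading the same source record yields the same URI. The sketch assumes the URI is built from a source-system code and the record's internal identifier, following the shape of the example URI police.uk/event/crime/CMS2_106600; the deck does not say how that identifier is derived, so `system` and `record_id` are hypothetical inputs.

```python
from datetime import datetime


def harmonize_event(system, record_id, event_type, source_date):
    """Build a canonical event with a deterministic URI and an ISO 8601 date."""
    return {
        # The same inputs always give the same URI, so re-ingesting a record is idempotent.
        "uri": f"police.uk/event/{event_type.lower()}/{system}_{record_id}",
        "eventType": event_type,
        # Source dates are dd/mm/yyyy.
        "eventDate": datetime.strptime(source_date, "%d/%m/%Y").date().isoformat(),
    }


if __name__ == "__main__":
    event = harmonize_event("CMS2", "106600", "Crime", "08/11/2015")
    print(event["uri"])        # police.uk/event/crime/CMS2_106600
    print(event["eventDate"])  # 2015-11-08
```

Random UUIDs suit entities such as people, which have no trustworthy natural key across systems; a deterministic URI suits records that already carry a stable source identifier.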
SLIDE: 21
<envelope>
<event>
<uri>police.uk/event/crime/CMS2_106600</uri>
<eventType>Crime</eventType>
<eventDate>2015-11-08</eventDate>
<location>key:3a001d392f0e819e98810095a542391a96aa177e</location>
</event>
<address>
<key>key:3a001d392f0e819e98810095a542391a96aa177e</key>
</address>
...
Harmonize Locations
EXTRACT ENTITIES
• Use an authoritative reference data source for addresses, e.g. Ordnance Survey
• Record the Unique Property Reference Number (UPRN), from which latitude and longitude can be obtained
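A sketch of the location step under stated assumptions: a normalized address is looked up in a small table standing in for an authoritative gazetteer such as Ordnance Survey address data, and the location key is a SHA-1 hex digest. The 40-character keys on the slide are consistent with SHA-1, but the deck does not say what is hashed; here we hash the UPRN when one is found and fall back to the normalized address. The table and its UPRN value are placeholders, not real reference data.

```python
import hashlib
import re

# Stand-in for an authoritative address gazetteer; the UPRN value is a placeholder.
REFERENCE_ADDRESSES = {
    "8 HAVEN ROAD EXETER EX2 8BP": "EXAMPLE-UPRN-1",
}


def normalize_address(street, postcode):
    """Upper-case, replace punctuation with spaces and collapse whitespace."""
    text = f"{street} {postcode}".upper()
    text = re.sub(r"[^A-Z0-9 ]", " ", text)
    return " ".join(text.split())


def location_key(street, postcode):
    """Return (key, uprn); the key hashes the UPRN if known, else the normalized address."""
    normalized = normalize_address(street, postcode)
    uprn = REFERENCE_ADDRESSES.get(normalized)
    basis = uprn if uprn is not None else normalized
    return "key:" + hashlib.sha1(basis.encode("utf-8")).hexdigest(), uprn


if __name__ == "__main__":
    print(location_key("8 HAVEN ROAD, EXETER", "EX2 8BP"))
    print(location_key("8 Haven Road Exeter", "ex2 8bp"))
```

Both spellings of the address resolve to the same key, so crimes recorded against the same property line up; a real gazetteer lookup would also return coordinates for the UPRN for use in geospatial search.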
SLIDE: 22
<envelope>
<person>
<uri>646569e5-5f6c-4667-96c7-09ff84b3e08e</uri>
<personType>VICTIM</personType>
<surname>PHILLIPS</surname>
<surname_dm>flps</surname_dm>
<forename>MILDRED</forename>
<forename_dm>mltrt</forename_dm>
<dob>1927-01-14</dob>
<gender>f</gender>
<occupation>UNEMPLOYED</occupation>
<key>key:21a7e3154a71311e07f277d8696262d0bbd1bf94</key>
<key>key:82b93e911bd18e2612fae64d1c81e889b3858f64</key>
...
Calculate Hashes
APPLY MATCHING RULES
• Compute hash codes for combinations of fields (dimensions) that together disambiguate the entity:
- forename_dm && surname_dm && dob
- forename_dm && surname_dm && address
• Leverage MarkLogic’s universal index for resolving the keys
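The matching rules above can be sketched as follows: each rule is a tuple of field names, and each key is a SHA-1 digest of the rule name and its field values joined with a separator. SHA-1, the `|` separator, the rule names and the use of a location key as the "address" value are all assumptions; the separator matters because plain concatenation could let different field values run together and collide.

```python
import hashlib

# Each matching rule lists the fields that must all agree for two records to match.
MATCHING_RULES = {
    "name_dob": ("forename_dm", "surname_dm", "dob"),
    "name_address": ("forename_dm", "surname_dm", "address"),
}


def match_keys(person):
    """Return one hash key per rule whose fields are all present on the person."""
    keys = []
    for rule, fields in MATCHING_RULES.items():
        values = [person.get(f) for f in fields]
        if not all(values):
            continue  # a rule with a missing field cannot disambiguate this record
        # Include the rule name so keys from different rules can never coincide.
        basis = "|".join([rule, *values])
        keys.append("key:" + hashlib.sha1(basis.encode("utf-8")).hexdigest())
    return keys


if __name__ == "__main__":
    victim = {"forename_dm": "mltrt", "surname_dm": "flps", "dob": "1927-01-14",
              "address": "key:3a001d392f0e819e98810095a542391a96aa177e"}
    for key in match_keys(victim):
        print(key)
```

A record with no address yields only the name-and-dob key. Because the keys are stored in the document, the slide's approach relies on MarkLogic's universal index to find other documents carrying the same key value.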
SLIDE: 23
<envelope>
<triple>
<subject>7b9c5fb1-4a49-4d01-8979-a84408da51c5</subject>
<predicate>suspectOf</predicate>
<object>police.uk/event/crime/CMS2_106600</object>
</triple>
Store Triples
• Record relationships between entities:
- Person <suspectOf> Crime
RECORD RELATIONSHIPS
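One way to write the relationship into the envelope is MarkLogic's embedded-triple XML, where `sem:triple`, `sem:subject`, `sem:predicate` and `sem:object` live in the `http://marklogic.com/semantics` namespace. That is our reading of the MarkLogic semantics documentation; the slide shows the elements without a prefix. The sketch builds such a triple with the standard library and keeps the slide's short predicate name `suspectOf`, although a real deployment would normally use a full IRI.

```python
import xml.etree.ElementTree as ET

SEM_NS = "http://marklogic.com/semantics"
ET.register_namespace("sem", SEM_NS)  # serialize with the conventional "sem" prefix


def add_triple(envelope, subject, predicate, obj):
    """Append a sem:triple element recording subject -predicate-> object."""
    triple = ET.SubElement(envelope, f"{{{SEM_NS}}}triple")
    for name, value in (("subject", subject), ("predicate", predicate), ("object", obj)):
        ET.SubElement(triple, f"{{{SEM_NS}}}{name}").text = value
    return triple


if __name__ == "__main__":
    envelope = ET.Element("envelope")
    add_triple(envelope,
               "7b9c5fb1-4a49-4d01-8979-a84408da51c5",
               "suspectOf",
               "police.uk/event/crime/CMS2_106600")
    print(ET.tostring(envelope, encoding="unicode"))
```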
SLIDE: 24
<envelope>
<triple>
<subject>7b9c5fb1-4a49-4d01-8979-a84408da51c5</subject>
<predicate>suspectOf</predicate>
<object>police.uk/event/crime/CMS2_106600</object>
</triple>
<triple>
<subject>7b9c5fb1-4a49-4d01-8979-a84408da51c5</subject>
<predicate>sameAs</predicate>
<object>key:21a7e3154a71311e07f277d8696262d0bbd1bf94</object>
</triple>
<triple>
<subject>7b9c5fb1-4a49-4d01-8979-a84408da51c5</subject>
<predicate>sameAs</predicate>
<object>key:82b93e911bd18e2612fae64d1c81e889b3858f64</object>
</triple>
Store Triples
• Record relationships between entities:
- Person <suspectOf> Crime
• Record the relationship between the entity instance (i.e. Person) and each disambiguation hash code
RECORD RELATIONSHIPS
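Each person document can emit one `sameAs` triple per disambiguation key. Inverting those triples shows which person records share a key, which is the raw material for the collapsing step. A minimal sketch with plain tuples; the person URIs and key values are hypothetical.

```python
from collections import defaultdict


def same_as_triples(person_uri, keys):
    """One (person, sameAs, key) triple per disambiguation hash key."""
    return [(person_uri, "sameAs", key) for key in keys]


def persons_by_key(triples):
    """Invert sameAs triples: key -> set of person URIs that carry it."""
    index = defaultdict(set)
    for s, p, o in triples:
        if p == "sameAs":
            index[o].add(s)
    return index


if __name__ == "__main__":
    triples = same_as_triples("person/A", ["key:1", "key:2"])
    # A second, hypothetical record from another system that produced one of the same keys.
    triples += same_as_triples("person/B", ["key:2"])
    for key, people in sorted(persons_by_key(triples).items()):
        print(key, sorted(people))
```

Storing the link as a triple rather than merging documents at load time leaves both source records intact, so a wrong match can be undone by changing the rule rather than un-merging data.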
SLIDE: 25
Collapsing Entities Using Semantics
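Diagram: Person A, Person B and Person C are each linked by Same As relationships to hash codes (Hash Code 1, Hash Code 2, … Hash Code N); following those links collapses them into a single entity, Person X.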
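Collapsing entities is a transitive closure over the sameAs links: two person records belong to the same real-world person if a chain of shared keys connects them. The sketch below treats sameAs as undirected and runs a breadth-first search over an in-memory graph. In MarkLogic the equivalent would run as a SPARQL query, for example with a SPARQL 1.1 property path such as `(:sameAs|^:sameAs)*` (an illustration, not the project's query). The person and key names mirror the diagram and are hypothetical.

```python
from collections import defaultdict, deque


def collapse(triples, start):
    """Return every node reachable from start through sameAs links in either direction."""
    graph = defaultdict(set)
    for s, p, o in triples:
        if p == "sameAs":
            graph[s].add(o)
            graph[o].add(s)
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen


TRIPLES = [
    ("person/A", "sameAs", "key:1"),
    ("person/B", "sameAs", "key:1"),
    ("person/B", "sameAs", "key:2"),
    ("person/C", "sameAs", "key:2"),
]

if __name__ == "__main__":
    people = sorted(n for n in collapse(TRIPLES, "person/A") if n.startswith("person/"))
    print(people)  # ['person/A', 'person/B', 'person/C']
```

Person A and Person C share no key directly; Person B bridges them, which is why the closure, rather than a single key lookup, is needed to arrive at Person X.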
SLIDE: 26
Advantages of the Multi-Model Approach
Fast
- Fast search: limits joins, because entities are documents and relationships are triples
- Fast ingest: disambiguation is performed at query time, not during loading
- Fast disambiguation: the transitive closure operation is very quick
Flexible
- Query-time disambiguation allows rules to be changed, or applied on a per-user basis
- Different predicates can be used for use-case-specific deduplication and for different degrees of confidence
Secure
- By applying document-level security and storing the hashes inside the documents they were computed from, we can ensure that no secured information leaks to the wrong user
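The flexibility and security points can both be seen in a small extension of the closure idea: each sameAs-style triple carries the document it came from and a predicate naming its match rule, and the closure only follows triples whose rule the caller has chosen and whose document the caller may read. The document IDs, rule predicates and permission sets are hypothetical; in MarkLogic the filtering would come from document-level security rather than application code.

```python
from collections import defaultdict, deque

# (subject, predicate, object, source_document) -- hypothetical data.
QUADS = [
    ("person/A", "sameAsNameDob", "key:1", "doc/crime-1"),
    ("person/B", "sameAsNameDob", "key:1", "doc/intel-7"),
    ("person/B", "sameAsNameAddress", "key:2", "doc/intel-7"),
    ("person/C", "sameAsNameAddress", "key:2", "doc/custody-3"),
]


def collapse_for_user(quads, start, allowed_predicates, readable_docs):
    """Closure over match links, using only rules chosen and documents readable by the user."""
    graph = defaultdict(set)
    for s, p, o, doc in quads:
        if p in allowed_predicates and doc in readable_docs:
            graph[s].add(o)
            graph[o].add(s)
    seen, queue = {start}, deque([start])
    while queue:
        for neighbour in graph[queue.popleft()]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return {n for n in seen if n.startswith("person/")}


if __name__ == "__main__":
    rules = {"sameAsNameDob", "sameAsNameAddress"}
    docs = {"doc/crime-1", "doc/intel-7", "doc/custody-3"}
    print(sorted(collapse_for_user(QUADS, "person/A", rules, docs)))
    print(sorted(collapse_for_user(QUADS, "person/A", {"sameAsNameDob"}, docs)))
    print(sorted(collapse_for_user(QUADS, "person/A", rules, docs - {"doc/intel-7"})))
```

With every rule and document the result is A, B and C; restricting to the name-and-dob rule gives A and B; and a user who cannot read doc/intel-7 gets only A back, so the collapse itself reveals nothing from the document they cannot see.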
SLIDE: 29
The Operational Data Hub
OUR SOLUTION
SLIDE: 30
Benefits of a Document + Triple Store
Diagram: DATA, DOCUMENTS, SEMANTICS
All the benefits of each, plus:
• Docs can contain triples, triples can annotate docs, graphs can contain docs
- Faster data integration using semantics as the glue
- Ideal model for reference data, metadata, provenance
- Ability to run powerful queries across documents and triples together
• Massive speed and scale
• Simplicity of a single unified platform
• Enterprise features (security, HA/DR, ACID transactions, …)
Q&A

Session 2.3 semantics for safeguarding & security – a police story


Editor's Notes

  • #3: Load Data As Is. In reality, there are two kinds of ETL: (1) the kind that focuses on transformation, cleansing, and quality, because a business process requires standardization (to count things, do math, or disambiguate); (2) the kind that has to be done merely to allow an RDBMS to function, because of its dependency on harmonized, normalized, and de-orthogonalized data. We eliminate #2 completely and enable a more iterative approach to #1. How does MarkLogic deliver this integration? Ingest data as is: no upfront data modeling required; no ETL tool required; structured and unstructured data, including scalar, text, geospatial, binary, and semantic triples; data and metadata; schemas accepted but not required. Data formats: XML, JSON, binary, etc., with efficient tokenized storage. Methods: Content Pump (high-speed data loading, serial writes), REST APIs, Java Client API, Node.js Client API, Java/.NET XCC. Competitive advantage? Data modeling and transformation are not a mandatory prerequisite to loading data.
  • #4: Quote is from the NPCC’s Policing Vision 2025: http://www.npcc.police.uk/NPCCBusinessAreas/ReformandTransformation/PolicingVision2025.aspx
  • #6: Police need information quickly, need confidence that it is correct, and need it to be complete, so they know they have a complete picture of a suspect/perpetrator. They need to see a full picture of a suspect, with all of the right information to hand, during a live intelligence situation.
  • #7: The major problem is that their silos are blocking them. There are lots of small steps that, taken individually, do not seem insurmountable, but taken together cause paralysis. The status quo often ends up winning in this situation, because if you can’t figure out where to start, you don’t start at all. The PCC of one force shared that highly placed ministers are telling police forces that they need to get on with their transformation.
  • #8: Story: imagine you’re a police analyst and you need to provide a full history of an individual to allow officers to proceed with a terrorism investigation. Currently analysts need to log into several different databases; at some forces more than 10 different systems hold important data. None of these systems is useful for combining and sharing the results of the analysis, so to share them, analysts copy and paste the data into a Word document or Excel spreadsheet. This means that important intelligence reports are further cut off from their source data and are not reproducible.
  • #10: Google-like search appliance. Lists of documents from keyword searches. No context. False positives: for example, a search for Nigel Brown will bring back many false positives. How do you see the person in their context if you can only search for their name and not their relationships?
  • #11: The promise: access all data from a single user interface. Migrate all of the data to a new relational database and use a search engine.
  • #12: Schema design for this number and complexity of systems is time-consuming. The unstructured data is lost, e.g. the police intelligence report or the storm call transcript, i.e. the valuable data that gives meaning and context.
  • #13: Does more than just put the data in one place: it allows you to discover how that data is related by building connections between the information. It’s the powerful combination of unstructured data and semantic triples that helps surface the relationships between people, events, locations and objects. It can exploit all of the value in the unstructured data that has heretofore been considered dark matter and therefore difficult to exploit.
  • #14: An example of the IDEAL SOLUTION. Goals: create a model for the delivery of shared services, for future application development and for rolling out as good practice to other forces; increase the effectiveness and outcomes of analysts and police officers; prove that data-driven application design can produce results quickly.
  • #15: Single record for a person, not a list of documents. Handles name misspellings. Network diagram showing the relationships that have been identified by the system. Information is organised by its context (e.g. person) rather than by its format (e.g. storm call record).
  • #16: Combine external data sources with your rich linked data set. Dictionaries of words/phrases associated with grooming are highlighted in search results.
  • #17: Geospatial search making use of cleansed address data to show crime patterns over time and in geographic location.
  • #18: When there is confidence that a complete 360 view of an individual can be identified, you can start to use statistical analysis to identify behavioral patterns that indicate vulnerability.
  • #19: Hand over to JW. Higher-contrast colour of purple. SB: explain up front why you are using phonetic names (to deal with misspelling, etc.). Same for dates, so I can type in a date in any format.
  • #22: SB: from the UPRN we can get latitude and longitude.
  • #23: Give a clearer description of a dimension combination: a combination of fields that uniquely identifies the entity. SB: calculating hashes, so every unique entity can be uniquely identified.
  • #25: Explanation of transitive
  • #28: Loosely Integrated Multi System Stack. SB: we have demonstrated the approach of a multi-model data platform with semantics. Other vendors try to replicate this using multiple technologies, but this approach is different.
  • #29: Architects paid by the box. Doing joins is incredibly expensive. Brittle. Question to Jen: still thinking of a good question.
  • #30: Multi-model is a good thing; multi-model in one place is even better. Simpler, transactional, all backed up together. Query across three models, all at once.