Geolocation with Cassandra
Austin Cassandra Users – Jan 21, 2016
Matt Vorst
• Cassandra User
– Since 2011
• Architect / Java developer
• Corporate Life
– EntekIRD & Rockwell Automation
• Serial Entrepreneur
– EventsInCincinnati.com – Co-founder
– Dotloop, Inc. – Co-founder and CTO
– Physi, Inc. – Co-founder and C*O
Physi [fiz-ee] (noun)
1. a mobile app that pairs nearby people to play sports
2. a movement to make a smaller, happier, healthier
world through play
Why Cassandra
• Operations is Hard
– Most relational DBs don’t scale easily or well
– Murphy’s Law always strikes at the worst time
– Recovery shouldn’t come at a high cost
• Distributed Design
– Cassandra is a distributed technology
– Applications are designed to be distributed
Necessary Location Services
• Proximity Search
– Postal code range search
– Distance between postal codes
• Location Conversion
– Postal code to latitude/longitude
– Latitude/longitude to postal code
• Search
– City name lookup
Setup
• Create the Keyspace
cqlsh> CREATE KEYSPACE physi WITH replication =
{'class': 'SimpleStrategy', 'replication_factor': 1};
cqlsh> USE physi;
Postal Code to Latitude/Longitude
• Use Case
– Place markers on a map
• Solution
– Buy a postal code database
– PK: Country/postal code
Postal Code to Latitude/Longitude
• Create Column Family
cqlsh>CREATE TABLE zip_code_master (
location_country text, zip_code text, location_uuid uuid,
location_type text, city text, county text, state text,
latitude_e6 bigint, longitude_e6 bigint,
PRIMARY KEY (location_country, zip_code));
Postal Code to Latitude/Longitude
• Add data
cqlsh>INSERT INTO zip_code_master
(location_country, zip_code, location_uuid, location_type,
city, county, state, latitude_e6, longitude_e6)
VALUES('US','45219',
7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39,
'REGIONAL','Cincinnati','Hamilton','OH',
39127564,-84514489);
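The latitude_e6/longitude_e6 columns hold coordinates as bigints: degrees scaled by 10^6, so 39.127564 is stored as 39127564. The slides show only the stored values, not how they were produced. Here is a minimal Python sketch of the conversion, assuming rounding to the nearest millionth of a degree. The helper names are hypothetical.

```python
def to_e6(degrees: float) -> int:
    """Scale decimal degrees to an integer count of millionths of a degree."""
    # round() rather than int(): a floating-point product can land a hair
    # below the intended whole number, and int() would then truncate it
    return round(degrees * 1_000_000)


def from_e6(value_e6: int) -> float:
    """Convert a stored *_e6 integer back to decimal degrees."""
    return value_e6 / 1_000_000


# Values from the Cincinnati (45219) row above
lat_e6 = to_e6(39.127564)
lon_e6 = to_e6(-84.514489)
print(lat_e6, lon_e6)   # 39127564 -84514489
print(from_e6(lat_e6))  # 39.127564
```

Integers avoid floating-point surprises in keys and comparisons. At six decimal places, one unit of latitude is roughly a tenth of a meter.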
Postal Code to Latitude/Longitude
• Search
cqlsh>SELECT * FROM zip_code_master WHERE
location_country = 'US' AND zip_code = '45219';
• Results
location_country | zip_code | city | county | latitude_e6 | location_type | location_uuid | longitude_e6 | state
------------------+----------+------------+----------+-------------+---------------+--------------------------------------+--------------+------
US | 45219 | Cincinnati | Hamilton | 39127564 | REGIONAL | 7b0e6b7f-0d9a-3a66-9f9a-0df17ed5dc39 | -84514489 | OH
Postal Code to Latitude/Longitude
• Things to Know
– Row width: ~10
– Postal codes cover areas of very different sizes
– A single postal code can span different cities,
counties, and even states
– The largest postal code covers 10,000 mi²
Latitude/Longitude to Postal Code
• Use Case
– Determine server side which postal code
a user is currently in
– Use this to return suggestions
Latitude/Longitude to Postal Code
• The Relational Way
– Draw a bounding box, loop over the rows inside it, and calculate each distance
– Query:
SELECT * FROM location_table
WHERE :min_lat < latitude AND latitude < :max_lat
AND :min_long < longitude AND longitude < :max_long
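This sketch covers the "draw a box" step. It computes the min/max latitude and longitude for a search radius, then keeps only the points inside that box before any exact distances are calculated.

It makes a few assumptions:
- About 69 miles per degree of latitude (the figure used on the Cassandra Solution slide).
- Longitude degrees scaled by the cosine of the latitude.
- No handling of the poles or the antimeridian.

The function names are hypothetical.

```python
import math

MILES_PER_DEGREE_LAT = 69.0  # approximate; varies slightly with latitude


def bounding_box(lat: float, lon: float, radius_miles: float):
    """Return (min_lat, max_lat, min_long, max_long) enclosing a circle of radius_miles."""
    dlat = radius_miles / MILES_PER_DEGREE_LAT
    # A degree of longitude shrinks by cos(latitude) away from the equator
    dlon = radius_miles / (MILES_PER_DEGREE_LAT * math.cos(math.radians(lat)))
    return lat - dlat, lat + dlat, lon - dlon, lon + dlon


def in_box(point, box) -> bool:
    """Mirror of the SQL WHERE clause: strict inequalities on both axes."""
    min_lat, max_lat, min_long, max_long = box
    lat, lon = point
    return min_lat < lat < max_lat and min_long < lon < max_long


box = bounding_box(39.127564, -84.514489, 7.0)
print([round(v, 3) for v in box])
print(in_box((39.13, -84.5), box))
```

The box is only a coarse filter. Its corners lie farther away than the radius, so the "calculate" step still has to compute a real distance for each row that passes.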
Latitude/Longitude to Postal Code
• Cassandra Solution
– Prebuild a lookup table
• Slice the US into cells of 0.1° latitude by 0.1° longitude (7mi by <=7mi)
• ~69 miles per degree of latitude, so 0.1° ≈ 7mi
• Longitude lines converge toward the poles, so 0.1° of longitude is <=7mi
– PK: latE1|longE1
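The latE1|longE1 key is each coordinate multiplied by 10 and cut to an integer. Each cell therefore spans 0.1° of latitude (about 6.9 mi) by 0.1° of longitude (about 6.9 mi × cos(latitude)), which is where "7mi by <=7mi" comes from.

The slides don't say whether the values are truncated or rounded. This minimal sketch assumes truncation toward zero, which reproduces the (391, -845) key used for Cincinnati. The function names are hypothetical.

```python
import math


def cell_key(lat: float, lon: float) -> tuple[int, int]:
    """Return the (latitude_e1, longitude_e1) partition key for a point.

    int() truncates toward zero, so -84.514489 * 10 -> -845.
    Caveat: truncation makes the cells at 0 degrees twice as wide,
    which doesn't matter for US coordinates.
    """
    return int(lat * 10), int(lon * 10)


def cell_width_miles(lat: float) -> float:
    """Approximate east-west width of a 0.1-degree cell at this latitude."""
    return 6.9 * math.cos(math.radians(lat))


print(cell_key(39.127564, -84.514489))  # (391, -845)
print(round(cell_width_miles(39.1), 1))
```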
Latitude/Longitude to Postal Code
• Cassandra Solution (cont.)
– Build: Add bordering postal codes
– Read: Loop and calculate distance
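On the build side, a postal code is written to the cell that contains it and also to bordering cells. That way, a user standing near a cell edge still finds postal codes just across the line.

The slides don't say how "bordering" was determined. This sketch assumes the simplest rule: the cell containing the postal code's centroid plus its eight neighbors. The function name is hypothetical.

```python
def cells_for_postal_code(lat_e1: int, long_e1: int) -> list[tuple[int, int]]:
    """Containing cell plus its 8 neighbours, as (latitude_e1, longitude_e1) keys.

    Assumption: "bordering" means the adjacent 0.1-degree cells around the
    postal code's centroid cell; a real build might use boundary polygons.
    """
    return [
        (lat_e1 + dlat, long_e1 + dlon)
        for dlat in (-1, 0, 1)
        for dlon in (-1, 0, 1)
    ]


# 45219's centroid falls in cell (391, -845); one INSERT per key would follow
for key in cells_for_postal_code(391, -845):
    print(key)
```

Writing each postal code into nine cells multiplies the table size, but every read stays a single-partition query.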
Latitude/Longitude to Postal Code
• Create Column Family
cqlsh>CREATE TABLE latitude_longitude_zip_code
(latitude_e1 int, longitude_e1 int, location_country text,
zip_code text, location text,
PRIMARY KEY ((latitude_e1, longitude_e1),
location_country, zip_code));
Latitude/Longitude to Postal Code
• Add data
cqlsh>INSERT INTO latitude_longitude_zip_code
(latitude_e1, longitude_e1, location_country, zip_code,
location) VALUES(391,-845,'US','45219','{json data}');
cqlsh>INSERT INTO latitude_longitude_zip_code
(latitude_e1, longitude_e1, location_country, zip_code,
location) VALUES(391,-845,'US','45220','{json data}');
Latitude/Longitude to Postal Code
• Search
cqlsh>SELECT * FROM latitude_longitude_zip_code
WHERE latitude_e1 = 391 AND longitude_e1 = -845;
• Results
latitude_e1 | longitude_e1 | location_country | zip_code | location
-------------+--------------+------------------+----------+-------------
391 | -845 | US | 45206 | {json data}
391 | -845 | US | 45219 | {json data}
391 | -845 | US | 45220 | {json data}
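The read side loops over the rows returned for the user's cell and calculates the distance to each candidate. This sketch uses the haversine great-circle formula and assumes each row's {json data} carries a centroid latitude/longitude. The coordinates below are made-up placeholders, not real centroids for 45206, 45219, or 45220.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def nearest_postal_code(user_lat, user_lon, candidates):
    """candidates: iterable of (zip_code, lat, lon) decoded from the cell's rows."""
    return min(
        candidates,
        key=lambda row: haversine_miles(user_lat, user_lon, row[1], row[2]),
    )[0]


# Placeholder centroids (hypothetical values, for illustration only)
rows = [
    ("45206", 39.12, -84.48),
    ("45219", 39.13, -84.51),
    ("45220", 39.15, -84.52),
]
print(nearest_postal_code(39.128, -84.515, rows))  # 45219
```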
Latitude/Longitude to Postal Code
• Things to Know
– Row width: 1 to ~50
– This was a short-lived solution
– We now primarily use client-side location services
– Still used as a fallback for the web
– Building the lookup table took 3 hours on
a local machine with RAID 0 SSDs
City Name Lookup
• Use Case
– Auto-complete city name
• Solution
– Create a lookup table
– Row key: searchTerm
– Column name: (zero-padded count)|country|city
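When column names are compared as strings, the count must be zero-padded so that lexical order matches numeric order. Without padding, "9" sorts after "31".

This sketch builds such a composite column name. The 10-digit width, the helper name, and the "examplecity" row are hypothetical. The CQL table that follows makes occurrence_count an int clustering column instead, which sorts numerically without any padding.

```python
def column_name(count: int, country: str, city: str, width: int = 10) -> str:
    """Build a '(zero-padded count)|country|city' column name (hypothetical helper)."""
    return f"{count:0{width}d}|{country}|{city}"


# austin/austell counts come from the slides; examplecity is made up
unpadded = sorted(["31|US|austin", "9|US|examplecity", "10|US|austell"])
padded = sorted([
    column_name(31, "US", "austin"),
    column_name(9, "US", "examplecity"),
    column_name(10, "US", "austell"),
])
print(unpadded)  # '9|...' sorts last: string order, not numeric
print(padded)    # counts now sort numerically; read in reverse for most-frequent-first
```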
City Name Lookup
• Create Column Family
cqlsh>CREATE TABLE name_search
(search_term text, occurrence_count int,
location_country text, city text, state text, location text,
PRIMARY KEY ((search_term), occurrence_count,
location_country, city, state));
City Name Lookup
• Add data
cqlsh> INSERT INTO name_search
(search_term, occurrence_count, location_country, city,
state, location)
VALUES ('aus', 31, 'US', 'austin', 'TX', '{json data}');
cqlsh> INSERT INTO name_search
(search_term, occurrence_count, location_country, city,
state, location)
VALUES ('aus', 10, 'US', 'austell', 'GA', '{json data}');
City Name Lookup
• Search
cqlsh>SELECT * FROM name_search
WHERE search_term = 'aus'
ORDER BY occurrence_count DESC;
• Results
search_term | occurrence_count | location_country | city | state | location
-------------+------------------+------------------+-------------+-------+-------------
aus | 31 | US | austin | TX | {json data}
aus | 10 | US | austell | GA | {json data}
aus | 10 | US | ausablefork | NY | {json data}
City Name Lookup
• Things to Know
– Row width: 10 – 60K
– Strip whitespace and special characters, and
convert search terms to lowercase
– Only search once 2 or more characters have
been entered
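These normalization rules can be sketched in Python:
- Strip whitespace and special characters.
- Lowercase the text.
- Write one search_term row per prefix of at least two characters.

Treating "special characters" as anything other than ASCII letters and digits is an assumption. The helper names are hypothetical.

```python
import re


def normalize(text: str) -> str:
    """Lowercase and drop everything except ASCII letters and digits."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def search_terms(city: str, min_length: int = 2) -> list[str]:
    """Every prefix of the normalized name that is long enough to search on."""
    name = normalize(city)
    return [name[:i] for i in range(min_length, len(name) + 1)]


def should_search(user_input: str, min_length: int = 2) -> bool:
    """Only query once the normalized input has enough characters."""
    return len(normalize(user_input)) >= min_length


print(normalize("Ausable Fork"))  # ausablefork
print(search_terms("Austin"))     # ['au', 'aus', 'aust', 'austi', 'austin']
print(should_search("A"))         # False
```

The same normalize() must run on both the build and query sides. Otherwise "St. Louis" typed by a user would never match the stored "stlouis".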
Postal Code Range Search
• Use Case
– Find nearby neighborhoods
• Solution
– Create a lookup table
– Row key: country|postal code
Postal Code Range Search
• Create Column Family
cqlsh>CREATE TABLE zip_code_distance
(location_country text, zip_code text, distance_e2 int,
location text,
PRIMARY KEY ((location_country, zip_code),
distance_e2));
Postal Code Range Search
• Add Data
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 0, '{json data for 78741}');
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 180, '{json data for 78702}');
cqlsh>INSERT INTO zip_code_distance
(location_country, zip_code, distance_e2, location)
VALUES('US', '78741', 220, '{json data for 78721}');
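distance_e2 is a distance multiplied by 100 and stored as an int, so 180 reads as 1.80. The slides don't state the unit, though miles is the natural reading for this talk.

This sketch builds one partition. It assumes haversine distance between postal code centroids, rounded to the nearest hundredth. The centroids are made-up placeholders, not real coordinates for these Austin postal codes.

```python
import math

EARTH_RADIUS_MILES = 3958.8  # mean Earth radius


def haversine_miles(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in miles between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    a = (math.sin((phi2 - phi1) / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(math.radians(lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def distance_rows(origin: str, centroids: dict):
    """(distance_e2, zip_code) pairs for one partition, sorted like the clustering column."""
    o_lat, o_lon = centroids[origin]
    rows = [
        (round(haversine_miles(o_lat, o_lon, lat, lon) * 100), zip_code)
        for zip_code, (lat, lon) in centroids.items()
    ]
    return sorted(rows)


# Hypothetical centroids, for illustration only
centroids = {"78741": (30.23, -97.72), "78702": (30.26, -97.71), "78721": (30.27, -97.68)}
for distance_e2, zip_code in distance_rows("78741", centroids):
    print(distance_e2, zip_code)
```

Because distance_e2 is the only clustering column, two postal codes at exactly the same rounded distance from the origin share a primary key. The second INSERT silently overwrites the first. Adding the nearby postal code as a further clustering column would keep both rows.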
Postal Code Range Search
• Search
cqlsh>SELECT * FROM zip_code_distance
WHERE location_country = 'US' AND zip_code = '78741'
AND distance_e2 < 200
ORDER BY distance_e2;
• Results
location_country | zip_code | distance_e2 | location
------------------+----------+-------------+-----------------------
US | 78741 | 0 | {json data for 78741}
US | 78741 | 180 | {json data for 78702}
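The distance_e2 < 200 predicate is cheap because rows in a partition are stored sorted by the clustering column. The read is a contiguous slice, not a scan.

This is a small in-memory analogue using bisect on a sorted list. The radius is in the same units as distance_e2 / 100. The helper name is hypothetical.

```python
from bisect import bisect_left


def within_radius(sorted_rows, radius):
    """Rows with distance_e2 < radius * 100, like the CQL range query.

    sorted_rows: list of (distance_e2, location) tuples sorted by distance_e2.
    """
    limit_e2 = round(radius * 100)
    # (limit_e2,) sorts before any (limit_e2, ...) row, so this finds the
    # first row with distance_e2 >= limit_e2 and keeps the bound strict
    end = bisect_left(sorted_rows, (limit_e2,))
    return sorted_rows[:end]


partition = [
    (0, "{json data for 78741}"),
    (180, "{json data for 78702}"),
    (220, "{json data for 78721}"),
]
print(within_radius(partition, 2.0))  # the two rows shown in the Results above
```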
Postal Code Range Search
• Things to Know
– Row width: 1 to ~45K
Distance Between Postal Codes
• Use Case
– Estimate the distance between postal
codes
• Solution
– Create a lookup table
– Row key: country|postal code
– Column name: country|postal code
– Value: distanceE2
Distance Between Postal Codes
• Create Column Family
cqlsh>CREATE TABLE zip_code_distance_between
(location_country_1 text, zip_code_1 text,
location_country_2 text, zip_code_2 text, distance_e2 int,
PRIMARY KEY ((location_country_1, zip_code_1),
location_country_2, zip_code_2));
Distance Between Postal Codes
• Add Data
cqlsh>INSERT INTO zip_code_distance_between
(location_country_1, zip_code_1, location_country_2,
zip_code_2, distance_e2)
VALUES('US', '78741', 'US', '78741', 0);
cqlsh>INSERT INTO zip_code_distance_between
(location_country_1, zip_code_1, location_country_2,
zip_code_2, distance_e2)
VALUES('US', '78741', 'US', '78702', 180);
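zip_code_distance_between stores one row per ordered pair of postal codes. That is why each partition is about as wide as the number of postal codes (~45K).

This sketch generates those rows. The distance calculation is passed in as a function so that any great-circle formula can be plugged in. All names are hypothetical, and the lambda below supplies toy distances only.

```python
from itertools import product
from typing import Callable, Iterator


def pair_rows(
    postal_codes: list[tuple[str, str]],
    distance_e2: Callable[[tuple[str, str], tuple[str, str]], int],
) -> Iterator[tuple[str, str, str, str, int]]:
    """Yield (country_1, zip_1, country_2, zip_2, distance_e2) for every ordered pair.

    Both directions are written, so either postal code can serve as the
    partition key in a single-partition lookup.
    """
    for a, b in product(postal_codes, repeat=2):
        yield (*a, *b, distance_e2(a, b))


codes = [("US", "78741"), ("US", "78702")]
rows = list(pair_rows(codes, lambda a, b: 0 if a == b else 180))  # toy distances
print(len(rows))  # 4: n postal codes -> n * n rows
```

At ~45K postal codes, this comes to roughly 45,000² ≈ 2 billion rows. Storing only one direction under a canonical ordering would halve that, at the cost of sorting the two postal codes before every lookup.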
Distance Between Postal Codes
• Select
cqlsh>SELECT * FROM zip_code_distance_between
WHERE location_country_1 = 'US'
AND zip_code_1 = '78741'
AND location_country_2 = 'US'
AND zip_code_2 = '78702';
• Results
location_country_1 | zip_code_1 | location_country_2 | zip_code_2 | distance_e2
--------------------+------------+--------------------+------------+-------------
US | 78741 | US | 78702 | 180
Distance Between Postal Codes
• Things to Know
– Row width: ~45K
Final Thoughts
• Why just Cassandra?
– Fewer technologies to support
• Operations
• Development
– But be reasonable
• Prebuild reference data
– Consider prebuilding data to reduce read time
Questions & Contact Info
Matt Vorst
CTO Physi, Inc.
matt@physi.rocks

