Real-time Freight Visibility: How TMW Systems uses NiFi and SAM to create sub-second transportation visibility

Real-time Freight Visibility
How Trimble uses NiFi and SAM to create sub-second transportation visibility
Krishna Potluri and Donnie Wheat

Agenda
▪ Transportation Industry Overview
▪ Adding Visibility To Transportation
▪ Reflections On HDF Application Development

Safe Harbor Notice
The information presented is for informational purposes only and should not
be relied upon in making a purchasing decision. Trimble is under no legal
obligation to deliver any future products, features or functions within any
specified time frame, if at all. Release dates and content are subject to
change at Trimble’s sole discretion.

Transportation Industry Overview

Transportation Industry
▪ Freight moves by truck, train, ferry, etc.,
and any combination
▪ Trucks carry 10.55B tons of freight annually,
70.9% of 14.88B total (ATA)
▪ Shippers increasingly demand visibility of status
and estimates
▪ Industry continues to rely on 1980s EDI technology
▪ Most carriers run Transportation Management
Systems on in-house databases

Visibility, Historically Speaking
▪ Common Surface Transportation Issues
– Manual Customer Service Process
– No Proactive, Reliable Notifications
– Dynamic ETAs Not Available
– Stale Transit Data
– Lack Of Shipment Visibility

Adding Visibility To Transportation

Transportation Visibility
➢ Truck Check Calls sent multiple times
per hour
➢ End-to-end Visibility With Automated,
Geo-fenced Notifications
➢ Dynamic ETAs
➢ Proactive Customer Service Interaction
➢ Real-time Transit Data
➢ Full Shipment Visibility

Technical Requirements
▪ Streaming data application
– If data is not a stream, make
it a stream
▪ Source data from
– Database
– Web services
– Message bus
▪ Rapid development
▪ Start small and grow
infrastructure with data growth

Processing Approach
▪ Minimal Client Impact, heavy lifting in SaaS world
▪ Customers store order data in 10-20 tables in a relational
database
▪ Collect key data elements from customer database for
lookup and processing
▪ Receive updates from customer every few minutes, as often as the
customer desires
▪ As Trucks move, check calls are sent
– Look up order details
– Provide Visibility
▪ Zero touch client side for new functionality
[Diagram labels: Customer DB, Constant Updates, Phoenix, Check Calls, Look Up Order Data, Truck + Order, Visibility]

Data Reality
▪ 3 NiFi, 3 Kafka, 4 HDFS/RegionServers VMs
– Originally 1 NiFi, 1 Kafka, 3 HDFS/RegionServers
▪ 2,700,000 records saved per day average
▪ 700,000 Check Calls processed per day average
▪ 9,000,000 records initial data set per customer average
▪ 100,000,000 records saved maximum in a day (with smaller setup)
▪ 330,000,000 records stored in Phoenix
▪ 687 ms average process time for each Check Call
– 4-8 Phoenix database reads (12-21 ms average)
– 2 MSSQL configuration reads (150 ms average)
▪ 47 ms Phoenix record save average
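
A rough back-of-the-envelope reading of these figures, as a sketch only. It assumes the quoted read and save averages are per operation and that the operations run one after another; the slide states neither, so treat the result as an estimate.

```python
SECONDS_PER_DAY = 24 * 60 * 60

# Figures taken from the slide.
check_calls_per_day = 700_000
records_saved_per_day = 2_700_000
avg_process_ms = 687

# Assumption: averages are per operation and operations are sequential.
phoenix_reads = (4, 8)        # reads per check call (min, max)
phoenix_read_ms = (12, 21)    # average ms per read (min, max)
mssql_reads = 2
mssql_read_ms = 150
phoenix_save_ms = 47


def io_budget_ms():
    """Return a (low, high) estimate of database time per check call in ms."""
    fixed = mssql_reads * mssql_read_ms + phoenix_save_ms
    low = phoenix_reads[0] * phoenix_read_ms[0] + fixed
    high = phoenix_reads[1] * phoenix_read_ms[1] + fixed
    return low, high


low, high = io_budget_ms()
print(f"check calls per second (avg): {check_calls_per_day / SECONDS_PER_DAY:.1f}")
print(f"records saved per second (avg): {records_saved_per_day / SECONDS_PER_DAY:.2f}")
print(f"database time per check call: {low}-{high} ms of {avg_process_ms} ms")
```

Under these assumptions, database I/O would account for roughly 395-515 ms of the 687 ms average, with the two MSSQL configuration reads as the largest single share.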

Transportation Data Flow Architecture

HDF Architecture
[Diagram labels: Data Providers/Consumers, Trimble Identity & Authorization, Enterprise Service Bus, API Gateway, Microservices, Collect / Config / Consume, Hadoop Cluster, Analytics]

Apache NiFi
▪ Processors handle CRUD and
conversions of data
▪ Expression Language adds incredible
flexibility
▪ Jolt transforms handle most JSON
processing
▪ Few custom components, but custom
components are easy to add
▪ Scripting support handles moderate
complexity

NiFi Optimization
▪ Enable Higher Concurrent Tasks for
intensive processors
▪ NiFi automatically balances where
threads go
▪ Increase threads in controller settings
to optimize concurrency
▪ Real time and historical visibility for
performance improvement
▪ Balance Thread Pool size against
Database Pool size
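
One way to reason about the last point, as a minimal sketch: if the processors that touch the database can together run more concurrent tasks than the connection pool has connections, the extra tasks wait. The processor names and numbers below are hypothetical, and the check is a simplification rather than anything NiFi computes itself.

```python
def pool_shortfall(concurrent_tasks, db_pool_size):
    """Return how many tasks could be left waiting for a connection at peak."""
    peak_demand = sum(concurrent_tasks.values())
    return max(0, peak_demand - db_pool_size)


# Hypothetical processors that all borrow from the same connection pool,
# mapped to their Concurrent Tasks setting.
tasks = {"LookupOrder": 4, "LookupConfig": 2, "SaveRecord": 4}

print(pool_shortfall(tasks, db_pool_size=8))   # peak demand exceeds the pool
print(pool_shortfall(tasks, db_pool_size=10))  # pool covers peak demand
```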

Micro NiFi Apps
▪ Begin and end each Process Group with a
Kafka queue
▪ Process Groups focused on simple
data flows that solve simple problems
▪ Taking the microservice concept to NiFi
▪ No master flow, simply manage
Kafka Queues, consumers and
producers
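
The pattern above can be sketched in a few lines. This is an illustration only: Python's `queue.Queue` stands in for Kafka, plain functions stand in for NiFi Process Groups, and the queue names, the `ORDERS_BY_TRUCK` lookup, and the message fields are all hypothetical.

```python
import json
import queue


def run_app(source, sink, transform):
    """Drain one queue, apply a single simple transform, publish to the next queue."""
    while True:
        try:
            message = source.get_nowait()
        except queue.Empty:
            break
        sink.put(transform(message))


# Hypothetical queues; each micro app begins and ends at one of these.
raw_check_calls = queue.Queue()
parsed_check_calls = queue.Queue()
enriched_check_calls = queue.Queue()

# Hypothetical lookup data standing in for order details.
ORDERS_BY_TRUCK = {"T100": "ORD-1"}


def parse(message):
    return json.loads(message)


def enrich(event):
    return {**event, "order": ORDERS_BY_TRUCK.get(event["truck"])}


raw_check_calls.put('{"truck": "T100", "lat": 41.5, "lon": -81.7}')

# Each app only knows its input and output queue; there is no master flow.
run_app(raw_check_calls, parsed_check_calls, parse)
run_app(parsed_check_calls, enriched_check_calls, enrich)

result = enriched_check_calls.get_nowait()
print(result)
```

Because each stage depends only on its two queues, a stage can be replaced or scaled without touching the others.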

HDF Application
▪ Kafka allows data ingestion from services
– Used to scale NiFi processing across the cluster
– Enables Micro NiFi Apps to handle specific processing
▪ Schema Registry
– Schema with version control
– Seamless integration with NiFi, Kafka, and SAM
▪ SAM
– Easy ingestion to HBase, Druid
– Easy to scale to millions of transactions
– Custom processor capabilities
– Event/Rules driven workflow

HDP Integration
▪ Phoenix / HBase for fast-access storage
– 330,000,000+ records persistently stored in first 6 months
▪ Phoenix Indexes provide significant Query Performance
improvement
– Optimized Indexes for reference data, 1 to many lookup
– Sequence of columns in index crucial to performance
– Primary Key is efficient for 1 to 1 lookup of columns
▪ Hive for archive and Data Science Access
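
Why the sequence of index columns matters can be illustrated with a sorted list, as a sketch. It assumes only that a composite key is stored in sorted order, so a lookup on the leading column becomes a short range scan while a lookup on a trailing column has no prefix to seek to. The column names and data are hypothetical, and this is not Phoenix code.

```python
import bisect

# Hypothetical index entries, kept sorted by (order_id, stop_seq).
index = sorted([
    ("ORD-1", 1), ("ORD-1", 2), ("ORD-2", 1), ("ORD-3", 1), ("ORD-3", 2),
])


def lookup_by_leading_column(order_id):
    """Range scan: binary-search to the first match, read until the prefix changes."""
    start = bisect.bisect_left(index, (order_id,))
    rows = []
    for entry in index[start:]:
        if entry[0] != order_id:
            break
        rows.append(entry)
    return rows


def lookup_by_trailing_column(stop_seq):
    """No usable prefix: every entry has to be examined."""
    return [entry for entry in index if entry[1] == stop_seq]


print(lookup_by_leading_column("ORD-3"))
print(lookup_by_trailing_column(2))
```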

Custom NiFi Processor
▪ Custom Processor: JDBC Results To Attributes
▪ Flow required quick lookup of reference data
from Phoenix
▪ Reading straight to attributes increases
performance and reduces flow complexity
▪ Planned to be replaced by an Ignite cache, but it sped
time to market
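
The idea of the processor, reading a lookup result straight into attributes, can be sketched as follows. This is not the authors' processor: SQLite stands in for Phoenix over JDBC, a plain dict stands in for flow file attributes, and the table, column names, and `lookup.` prefix are hypothetical.

```python
import sqlite3


def results_to_attributes(conn, sql, params, prefix="lookup."):
    """Run a lookup and return the first row as flat string attributes."""
    cursor = conn.execute(sql, params)
    row = cursor.fetchone()
    if row is None:
        return {}
    columns = [description[0] for description in cursor.description]
    return {prefix + name.lower(): str(value) for name, value in zip(columns, row)}


# Hypothetical reference table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS (TRUCK_ID TEXT PRIMARY KEY, ORDER_ID TEXT, DEST TEXT)")
conn.execute("INSERT INTO ORDERS VALUES ('T100', 'ORD-1', 'Cleveland')")

# Attributes already on the flow file, then enriched in a single step.
attributes = {"truck.id": "T100"}
attributes.update(results_to_attributes(
    conn, "SELECT ORDER_ID, DEST FROM ORDERS WHERE TRUCK_ID = ?", ("T100",)))
print(attributes)
```

Writing the result directly into attributes avoids a separate step to parse query output from the flow file content, which is one plausible reading of the reduced flow complexity mentioned above.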

Custom and 3rd Party
▪ Data Collector
– Change Data Capture aware
– Multiple database type support
– Converts database data to events in messages
▪ Java APIs
– Manage centralized configuration of Data Collection
– Ability to configure data to collect per customer
– Zero touch remote sites
▪ Trimble Identity with WSO2
– API Gateway
– Identity Management

Deployment model
▪ Azure environment
▪ Cloudbreak Deployment
– Deploy HDP to Azure Resource group
– Customize Template to add HDF components as Compute Nodes
▪ Dockerized Deployment
– Microservices
– ESB, API Gateway
– Trimble Identity & Authorization

Reflections On HDF Application Development

HDF Successes
▪ Out of the box, NiFi has processors for pretty much everything
▪ First customer processing within 120 days
▪ NiFi for data flow, but also data warehousing
– Used NiFi to collect reporting metrics and make them available in MSSQL
Data Warehouse
▪ Performance
– Initial 6 node cluster processed over 100 million records in a day
▪ Bug forced select clients to re-push full database
▪ Each record processed by minimum 10 NiFi processors
▪ 1 Billion NiFi Tasks
▪ 4 cores, 14 GB RAM - small machines
▪ 1 NiFi, 3 Datanodes for Phoenix

HDF Challenges
▪ Initial workflows are long and sequential
– Breaking into Micro NiFi apps
– Leveraging Kafka for simpler flows
▪ Phoenix coupling to HBase requires re-thinking databases
– Manage Security In HBase
– JOIN Optimization for complex queries
– Small cluster increases difficulty
▪ SAM: feature-rich DIY abilities, but we needed fast
development and relied on NiFi

SAM Custom Processors
1. SqlServerEnrichmentProcessor
2. SqlServerEnrichmentCacheableProcessor (Cacheable and
with Hikari Pool)
3. PhoenixEnrichmentProcessor
4. PhoenixEnrichmentCacheableProcessor
5. JSONTransformationProcessor
6. RestApiSinkCustomProcessor

Apache Phoenix JOIN Optimization
▪ Traditional JOIN of 2 Large Datasets creates timeouts
▪ Indexing did not improve performance
▪ Subqueries did not improve performance
▪ Traditional Query
– SELECT A.NAME, B.REFERENCE
  FROM A
  INNER JOIN B ON A.ID = B.ID
  WHERE A.ID = <SOME_ID>
▪ JOIN to query with reduced data set
– SELECT A.NAME, B.REFERENCE
  FROM A
  LEFT JOIN (SELECT B.ID, B.REFERENCE FROM B WHERE B.ID = <SOME_ID>) AS B ON B.ID = A.ID
  WHERE A.ID = <SOME_ID>
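
A small check that the two query shapes return the same rows, as a sketch. SQLite stands in for Phoenix here purely to compare results; it says nothing about Phoenix performance, and the sample rows are hypothetical. The subquery in this sketch projects B.ID so that the outer ON clause can reference it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, NAME TEXT);
    CREATE TABLE B (ID INTEGER, REFERENCE TEXT);
    INSERT INTO A VALUES (1, 'alpha'), (2, 'beta'), (3, 'gamma');
    INSERT INTO B VALUES (1, 'ref-1a'), (1, 'ref-1b'), (2, 'ref-2');
""")

TRADITIONAL = """
    SELECT A.NAME, B.REFERENCE
    FROM A
    INNER JOIN B ON A.ID = B.ID
    WHERE A.ID = ?
"""

# B is filtered inside the subquery before the join is evaluated.
REDUCED = """
    SELECT A.NAME, B.REFERENCE
    FROM A
    LEFT JOIN (SELECT B.ID, B.REFERENCE FROM B WHERE B.ID = ?) AS B ON B.ID = A.ID
    WHERE A.ID = ?
"""


def traditional(some_id):
    return sorted(conn.execute(TRADITIONAL, (some_id,)).fetchall())


def reduced(some_id):
    return sorted(conn.execute(REDUCED, (some_id, some_id)).fetchall())


print(traditional(1) == reduced(1))   # same rows when B has matches
print(traditional(3), reduced(3))     # results differ when B has no match
```

The two forms agree whenever B contains the ID. They differ when it does not: the INNER JOIN returns no rows, while the LEFT JOIN still returns the A row with a NULL reference.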

Adding Master Data Management
▪ Applied to internal and
customer data
▪ Visibility is also required for
stakeholders
▪ Created NiFi flows to harvest
operational data
▪ Aggregated data sent to cloud
database for executive reports

Next Steps
▪ Better Data Warehouse and Data Science Integration
▪ Full integration to Ignite for lookups for complex processing
▪ Integration of additional Source Data
▪ Add additional Visibility Providers
