Real-time Freight Visibility
How Trimble uses NiFi and SAM to create sub-second transportation visibility
Krishna Potluri and Donnie Wheat
1
Agenda
▪ Transportation Industry Overview
▪ Adding Visibility To Transportation
▪ Reflections On HDF Application Development
2
Safe Harbor Notice
The information presented is for informational purposes only and should not
be relied upon in making a purchasing decision. Trimble is under no legal
obligation to deliver any future products, features or functions within any
specified time frame, if at all. Release dates and content are subject to
change at Trimble’s sole discretion.
3
Transportation Industry Overview
4
Transportation Industry
▪ Freight is moved via truck, rail, ferry, etc.,
or any combination
▪ Trucks carry 10.55B tons of freight annually,
70.9% of 14.88B total (ATA)
▪ Shippers increasingly demand visibility into
shipment status and estimates
▪ Industry continues to rely on 1980s EDI technology
▪ Most carriers run Transportation Management
Systems on in-house databases
5
The Shipper Dilemma
6
Visibility, Historically Speaking
▪ Common Surface Transportation Issues
– Manual Customer Service Process
– No Proactive, Reliable Notifications
– Dynamic ETAs Not Available
– Stale Transit Data
– Lack Of Shipment Visibility
7
Adding Visibility To Transportation
8
Transportation Visibility
➢ Truck Check Calls sent multiple times
per hour
➢ End-to-end Visibility With Automated,
Geo-fenced Notifications
➢ Dynamic ETAs
➢ Proactive Customer Service Interaction
➢ Real-time Transit Data
➢ Full Shipment Visibility
9
Technical Requirements
▪ Streaming data application
– If data is not a stream, make
it a stream
▪ Source data from
– Database
– Web services
– Message bus
▪ Rapid development
▪ Start small and grow
infrastructure with data growth
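
One way to read "if data is not a stream, make it a stream" is to poll a table and emit each new row as an event. The sketch below is a minimal illustration under stated assumptions: `sqlite3` stands in for a customer database, and the `orders` table, its columns, and the `stream_new_rows` helper are hypothetical names that do not come from the talk.

```python
import sqlite3
from typing import Iterator


def stream_new_rows(conn: sqlite3.Connection, last_id: int = 0) -> Iterator[dict]:
    """Yield rows with an id greater than last_id as event dictionaries."""
    cursor = conn.execute(
        "SELECT id, status FROM orders WHERE id > ? ORDER BY id", (last_id,)
    )
    for row_id, status in cursor:
        yield {"event": "order_changed", "id": row_id, "status": status}


# Hypothetical table standing in for a customer database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT)")
conn.executemany(
    "INSERT INTO orders (status) VALUES (?)", [("booked",), ("dispatched",)]
)

first_batch = list(stream_new_rows(conn))

# Remember the high-water mark so the next poll only sees newer rows.
high_water = first_batch[-1]["id"]
conn.execute("INSERT INTO orders (status) VALUES ('delivered')")
second_batch = list(stream_new_rows(conn, last_id=high_water))
```

Each poll returns only rows past the previous high-water mark, so downstream consumers see a sequence of events rather than repeated table snapshots.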
10
Processing Approach
▪ Minimal Client Impact, heavy lifting in SaaS world
▪ Customers store order data in 10-20 tables in Relational
Database
▪ Collect key data elements from customer database for
lookup and processing
▪ Receive updates from the customer every few minutes,
as the customer desires
▪ As Trucks move, check calls are sent
– Look up order details
– Provide Visibility
▪ Zero touch client side for new functionality
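
The check-call step above (look up order details, then provide visibility) can be sketched as a simple enrichment. This is an illustration only: the `CheckCall` fields, the in-memory `ORDERS_BY_TRUCK` lookup, and the `enrich` function are hypothetical, and a dictionary stands in for the Phoenix lookup described in the talk.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CheckCall:
    truck_id: str
    lat: float
    lon: float


# Hypothetical lookup: truck id -> order details collected from the customer DB.
ORDERS_BY_TRUCK = {
    "TRUCK-1": {"order_id": "ORD-100", "destination": "Dallas, TX"},
}


def enrich(check_call: CheckCall) -> Optional[dict]:
    """Join a check call with its order; return None when no order is known."""
    order = ORDERS_BY_TRUCK.get(check_call.truck_id)
    if order is None:
        return None
    return {
        "truck_id": check_call.truck_id,
        "position": (check_call.lat, check_call.lon),
        **order,
    }


visibility = enrich(CheckCall("TRUCK-1", 32.78, -96.80))
unknown = enrich(CheckCall("TRUCK-9", 0.0, 0.0))
```

Keeping the lookup data on the service side is what lets the client stay untouched when new functionality is added.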
11
[Diagram labels: Customer DB, Constant Updates, Phoenix, Check Calls, Look Up Order Data, Truck + Order Visibility]
Data Estimation
12
Data Reality
13
▪ 3 NiFi, 3 Kafka, 4 HDFS/RegionServer VMs
– Originally 1 NiFi, 1 Kafka, 3 HDFS/RegionServers
▪ 2,700,000 records saved per day average
▪ 700,000 Check Calls processed per day average
▪ 9,000,000 records initial data set per customer average
▪ 100,000,000 records saved maximum in a day (with smaller setup)
▪ 330,000,000 records stored in Phoenix
▪ 687 ms average process time for each Check Call
– 4-8 Phoenix database reads
▪ 12-21 ms average
– 2 MSSQL configuration reads
▪ 150 ms average
▪ 47 ms Phoenix record save average
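
As a rough check on these figures, the worked arithmetic below adds up the database time per check call and the average arrival rate. It assumes the 150 ms MSSQL figure is per read and that all calls run sequentially; the slide does not state either, so treat the result as an estimate.

```python
# Figures quoted on the slide (milliseconds).
PHOENIX_READS = (4, 8)        # reads per check call, low and high
PHOENIX_READ_MS = (12, 21)    # average per read, low and high
MSSQL_READS = 2
MSSQL_READ_MS = 150           # assumption: per read, not for both reads combined
PHOENIX_SAVE_MS = 47
TOTAL_MS = 687

low = PHOENIX_READS[0] * PHOENIX_READ_MS[0] + MSSQL_READS * MSSQL_READ_MS + PHOENIX_SAVE_MS
high = PHOENIX_READS[1] * PHOENIX_READ_MS[1] + MSSQL_READS * MSSQL_READ_MS + PHOENIX_SAVE_MS

# Average arrival rate implied by 700,000 check calls per day.
check_calls_per_second = 700_000 / 86_400

print(f"database time per check call: {low}-{high} ms of {TOTAL_MS} ms")
print(f"average check calls per second: {check_calls_per_second:.1f}")
```

Under those assumptions, database calls account for roughly 395 to 515 ms of the 687 ms average, with the two MSSQL configuration reads being the largest single contributor.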
Transportation Data Flow Architecture
14
HDF Architecture
[Diagram labels: Data Providers/Consumers; Trimble Identity & Authorization; Enterprise Service Bus; API Gateway; Microservices (Collect, Config, Consume); Hadoop Cluster; Analytics]
Apache NiFi
▪ Processors handle CRUD and
conversions of data
▪ Expression Language adds incredible
flexibility
▪ Jolt transformations handle most JSON
processing
▪ Few custom components, but custom
components are easy to add
▪ Scripting support handles moderate
complexity
16
NiFi Optimization
▪ Enable Higher Concurrent Tasks for
intensive processors
▪ NiFi automatically balances where
threads go
▪ Increase threads in controller settings
to optimize concurrency
▪ Real time and historical visibility for
performance improvement
▪ Balance Thread Pool size against
Database Pool size
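
The last point can be illustrated with a small simulation: when more tasks run concurrently than there are database connections, the extra tasks simply wait for a connection. The sketch below uses plain Python threads, not NiFi, and the pool size and task count are hypothetical values chosen for the example.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

DB_POOL_SIZE = 4       # hypothetical connection pool size
CONCURRENT_TASKS = 8   # hypothetical concurrent tasks on a DB-bound processor

pool_slots = threading.BoundedSemaphore(DB_POOL_SIZE)
lock = threading.Lock()
in_use = 0
peak_in_use = 0


def lookup(_: int) -> None:
    """Simulate one database lookup that must first borrow a connection."""
    global in_use, peak_in_use
    with pool_slots:                  # blocks while the pool is exhausted
        with lock:
            in_use += 1
            peak_in_use = max(peak_in_use, in_use)
        time.sleep(0.01)              # stand-in for query time
        with lock:
            in_use -= 1


with ThreadPoolExecutor(max_workers=CONCURRENT_TASKS) as executor:
    list(executor.map(lookup, range(32)))

print(f"peak connections in use: {peak_in_use} of {DB_POOL_SIZE}")
```

The peak never exceeds the pool size, so threads beyond that number add contention rather than throughput for database-bound work.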
17
Micro NiFi Apps
▪ Begin and End Process Group with
Kafka Queue
▪ Process Groups focused on simple
data flows that solve simple problems
▪ Taking the micro-service concept to NiFi
▪ No master flow, simply manage
Kafka Queues, consumers and
producers
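
The pattern above, small apps that each begin and end at a queue, can be sketched without any NiFi or Kafka dependency. In-memory deques stand in for Kafka topics here; the topic names and the `run_app`, `enrich`, and `notify` functions are hypothetical and only illustrate the shape of the design.

```python
from collections import deque
from typing import Callable, Dict

# In-memory stand-in for Kafka topics; topic names are hypothetical.
topics: Dict[str, deque] = {
    "check-calls": deque(),
    "enriched": deque(),
    "notifications": deque(),
}


def run_app(source: str, sink: str, step: Callable[[dict], dict]) -> None:
    """Drain one topic, apply a single small step, publish to the next topic."""
    while topics[source]:
        topics[sink].append(step(topics[source].popleft()))


def enrich(msg: dict) -> dict:
    return {**msg, "order_id": "ORD-100"}  # placeholder for a real lookup


def notify(msg: dict) -> dict:
    return {"text": f"Truck {msg['truck_id']} is carrying {msg['order_id']}"}


topics["check-calls"].append({"truck_id": "TRUCK-1"})
run_app("check-calls", "enriched", enrich)      # micro app 1
run_app("enriched", "notifications", notify)    # micro app 2
```

Because each app only knows its input and output topic, apps can be added, replaced, or scaled independently, with no master flow tying them together.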
18
HDF Application
▪ Kafka allows data ingestion from services
– Used to scale NiFi processing across the cluster
– Enables Micro NiFi Apps to handle specific processing
▪ Schema Registry
– Schema with version control
– Seamless integration with NiFi, Kafka, and SAM
▪ SAM
– Easy ingestion to HBase, Druid
– Easy to scale to millions of transactions
– Custom processor capabilities
– Event/Rules driven workflow
19
HDP Integration
▪ Phoenix / HBase for fast-access storage
– 330,000,000+ records persistently stored in first 6 months
▪ Phoenix Indexes provide significant Query Performance
improvement
– Optimized Indexes for reference data, 1 to many lookup
– Sequence of columns in index crucial to performance
– Primary Key is efficient for 1 to 1 lookup of columns
▪ Hive for archive and Data Science Access
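
The point about column sequence can be demonstrated with any database that uses ordered composite indexes. The sketch below uses `sqlite3` as a stand-in, not Phoenix, so the planner output is SQLite's; the `stops` table and `idx_stops` index are hypothetical. The general idea it illustrates is that a filter on the leading index column can use the index, while a filter on only the trailing column cannot.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stops (order_id TEXT, stop_seq INTEGER, city TEXT)")
conn.execute("CREATE INDEX idx_stops ON stops (order_id, stop_seq)")


def plan(sql: str) -> str:
    """Return SQLite's query plan description for a one-parameter query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, ("x",)).fetchall()
    return " | ".join(row[-1] for row in rows)


# Filter on the leading index column: the index can be used.
plan_leading = plan("SELECT city FROM stops WHERE order_id = ?")
# Filter on the trailing column only: the index does not help.
plan_trailing = plan("SELECT city FROM stops WHERE stop_seq = ?")

print(plan_leading)
print(plan_trailing)
```

When designing a one-to-many lookup index, this suggests placing the column that every lookup filters on first.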
20
Custom NiFi Processor
▪ Custom Processor: JDBC Results To Attributes
▪ Flow required quick lookup of reference data
from Phoenix
▪ Reading straight to attributes increases
performance and reduces flow complexity
▪ Planned to be replaced by an Ignite cache, but it sped
time to market
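
The idea of reading query results straight into attributes can be sketched as follows. This is not the custom processor's code, which the slides do not show; it is a minimal illustration in which `sqlite3` stands in for a JDBC connection to Phoenix, and the `results_to_attributes` function, the `orders` table, and the attribute prefix are hypothetical.

```python
import sqlite3
from typing import Dict


def results_to_attributes(
    conn: sqlite3.Connection, sql: str, params: tuple, prefix: str
) -> Dict[str, str]:
    """Run a lookup and return the first row as string attributes."""
    cursor = conn.execute(sql, params)
    row = cursor.fetchone()
    if row is None:
        return {}
    columns = [description[0] for description in cursor.description]
    return {f"{prefix}.{name}": str(value) for name, value in zip(columns, row)}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id TEXT, city TEXT, status TEXT)")
conn.execute("INSERT INTO orders VALUES ('ORD-100', 'Dallas', 'dispatched')")

attributes = results_to_attributes(
    conn, "SELECT city, status FROM orders WHERE id = ?", ("ORD-100",), "order"
)
missing = results_to_attributes(
    conn, "SELECT city, status FROM orders WHERE id = ?", ("ORD-999",), "order"
)
```

Putting the looked-up values on the message as attributes avoids a separate parse-and-merge step later in the flow.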
21
Custom and 3rd Party
▪ Data Collector
– Change Data Capture aware
– Multiple database type support
– Converts database data to events in messages
▪ Java APIs
– Manage centralized configuration of Data Collection
– Ability to configure data to collect per customer
– Zero touch remote sites
▪ Trimble Identity with WSO2
– API Gateway
– Identity Management
22
Deployment model
▪ Azure environment
▪ Cloudbreak Deployment
– Deploy HDP to Azure Resource group
– Customize Template to add HDF components as Compute Nodes
▪ Dockerized Deployment
– Microservices
– ESB, API Gateway
– Trimble Identity & Authorization
23
Reflections On HDF Application Development
24
HDF Successes
▪ Out of the box, NiFi has processors for pretty much everything
▪ First customer processing within 120 days
▪ NiFi for data flow, but also data warehousing
– Used NiFi to collect reporting metrics and make them available in MSSQL
Data Warehouse
▪ Performance
– Initial 6 node cluster processed over 100 million records in a day
▪ Bug forced select clients to re-push full database
▪ Each record processed by minimum 10 NiFi processors
▪ 1 Billion NiFi Tasks
▪ 4 Core, 14 GB Ram - Small Machines
▪ 1 NiFi, 3 Datanodes for Phoenix
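
The "1 Billion NiFi Tasks" figure follows from the two numbers above it. The short calculation below also derives an average rate, assuming the load was spread evenly over 24 hours, which the slide does not state.

```python
RECORDS_PER_DAY = 100_000_000
MIN_PROCESSORS_PER_RECORD = 10
SECONDS_PER_DAY = 86_400

min_tasks = RECORDS_PER_DAY * MIN_PROCESSORS_PER_RECORD
records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY  # assumes an even spread

print(f"{min_tasks:,} tasks")                   # 1,000,000,000 tasks
print(f"{records_per_second:,.0f} records/s")   # 1,157 records/s
```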
25
HDF Challenges
26
▪ Initial workflows are long and sequential
– Breaking into Micro NiFi apps
– Leveraging Kafka for simpler flows
▪ Phoenix coupling to HBase requires re-thinking databases
– Manage Security In HBase
– JOIN Optimization for complex queries
– Small cluster increases difficulty
▪ SAM: feature-rich with DIY abilities, but we needed fast
development, so we relied on NiFi
SAM Integration
27
SAM Custom Processors
1. SqlServerEnrichmentProcessor
2. SqlServerEnrichmentCacheableProcessor (Cacheable and
with Hikari Pool)
3. PhoenixEnrichmentProcessor
4. PhoenixEnrichmentCacheableProcessor
5. JSONTransformationProcessor
6. RestApiSinkCustomProcessor
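
The "Cacheable" variants in this list suggest enrichment lookups that are cached between events. The slides do not show their implementation, so the sketch below is only one plausible shape: a small time-to-live cache wrapped around a slow lookup. `CachedLookup`, `fetch_config`, and the returned fields are hypothetical names.

```python
import time
from typing import Callable, Dict, Tuple


class CachedLookup:
    """Wrap a slow lookup function with a small time-to-live cache."""

    def __init__(self, fetch: Callable[[str], dict], ttl_seconds: float = 60.0):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._entries: Dict[str, Tuple[float, dict]] = {}
        self.misses = 0

    def get(self, key: str) -> dict:
        now = time.monotonic()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]          # fresh entry: skip the database call
        self.misses += 1
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


def fetch_config(customer_id: str) -> dict:
    # Placeholder for a SQL Server or Phoenix query.
    return {"customer_id": customer_id, "geofence_miles": 5}


config = CachedLookup(fetch_config)
first = config.get("CUST-1")
second = config.get("CUST-1")   # served from the cache
```

For configuration data that changes rarely, a cache like this removes most of the per-event lookup cost at the price of slightly stale values within the TTL window.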
28
Apache Phoenix JOIN Optimization
29
▪ Traditional JOIN of 2 Large Datasets creates timeouts
▪ Indexing did not improve performance
▪ Subqueries did not improve performance
▪ Traditional Query
– SELECT A.NAME, B.REFERENCE
FROM A
INNER JOIN B ON A.ID = B.ID
WHERE A.ID = <SOME_ID>
▪ JOIN against a query with a reduced data set
– SELECT A.NAME, B.REFERENCE
FROM A
LEFT JOIN (SELECT B.ID, B.REFERENCE FROM B WHERE B.ID = <SOME_ID>) AS B ON B.ID = A.ID
WHERE A.ID = <SOME_ID>
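
The two query shapes can be compared on a toy data set. The sketch below runs them in `sqlite3` rather than Phoenix, so it shows only that the results match when a matching row exists, not the performance difference reported in the talk. The table contents are invented, and the subquery here selects `ID` as well as `REFERENCE` so that the `ON` clause can reference it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (ID INTEGER, NAME TEXT);
    CREATE TABLE B (ID INTEGER, REFERENCE TEXT);
    INSERT INTO A VALUES (1, 'alpha'), (2, 'beta');
    INSERT INTO B VALUES (1, 'ref-1a'), (1, 'ref-1b'), (2, 'ref-2');
""")

traditional = """
    SELECT A.NAME, B.REFERENCE
    FROM A
    INNER JOIN B ON A.ID = B.ID
    WHERE A.ID = ?
    ORDER BY B.REFERENCE
"""

# Filter B down to the rows of interest before joining.
reduced = """
    SELECT A.NAME, R.REFERENCE
    FROM A
    LEFT JOIN (SELECT ID, REFERENCE FROM B WHERE ID = ?) AS R ON R.ID = A.ID
    WHERE A.ID = ?
    ORDER BY R.REFERENCE
"""

rows_traditional = conn.execute(traditional, (1,)).fetchall()
rows_reduced = conn.execute(reduced, (1, 1)).fetchall()
```

One difference to keep in mind: when no row in B matches, the LEFT JOIN form still returns the row from A with a NULL reference, whereas the INNER JOIN form returns nothing.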
Adding Master Data Management
▪ Applied to internal and
customer data
▪ Visibility is also required for
stakeholders
▪ Created NiFi flows to harvest
operational data
▪ Aggregated data sent to cloud
database for executive reports
30
Next Steps
▪ Better Data Warehouse and Data Science Integration
▪ Full integration to Ignite for lookups for complex processing
▪ Integration of additional Source Data
▪ Add additional Visibility Providers
31
