PHP & Continuous Data Processing
Michael Peacock, October 2011
No. Not milk floats (anymore)
All-Electric, Commercial Vehicles.
Photo courtesy of kenjonbro: http://www.flickr.com/photos/kenjonbro/4037649210/in/set-72157623026469013
About Michael Peacock
Senior/Lead Web Developer
Web Systems Developer
Telemetry Team – Smith Electric Vehicles US Corp
Author
PHP 5 Social Networking, PHP 5 E-Commerce Development, Drupal Social Networking (6 & 7), Selling online with Drupal e-Commerce, Building Websites with TYPO3
PHPNE Volunteer
Occasional technical speaker
PHP North-East, PHPNW 2010, SuperMondays, PHPNW 2011 Unconference, ConFoo 2012
Smith Electric Vehicles & Telemetry
World's largest manufacturer of commercial, all-electric vehicles
Smith Link: an on-board vehicle telematics system, capturing over 2,500 data points each second on the vehicle and broadcasting them over the mobile network
~400 telemetry-enabled vehicles on the road
World's largest telemetry project outside of F1
System Architecture
Problem #1: We Can't Lose Any Data
Data is required as part of a $32 million grant from the US Department of Energy
Thousands of pieces of information are collected every second from a range of remote collection devices
Unpredictable amounts of data at any one time
More vehicles rolling off the production line with telemetry enabled
What about system downtime, upgrades, roll-outs and connectivity problems?
Message Queuing
Solution: We use a fast, reliable, scalable, secure, hosted message queue
If our systems are offline, data builds up in the external message queue
If we are processing at full capacity, the surplus builds up in the message queue
If the vehicle loses its GPRS signal, or the message queue becomes inaccessible, vehicles have an internal buffer of up to 7 days
Secret Weapon #1: StormMQ
Based on AMQP, an open standard
Secure: All data is encrypted and sent over SSL
Reliable: Huge investment in server infrastructure
Hosted: Backed up with an SLA
Scalable: Capable of processing huge numbers of incoming messages, with capacity to store the messages when we perform maintenance on our systems
Problem #2: Processing data quickly
We use a dedicated server and a number of dedicated applications to pull these messages and process them
This needs to happen quickly enough for live data to be seen through the web interface
Data is rapidly converted into batch SQL files, which are imported to MySQL via “LOAD DATA INFILE”
This results in a high number of inserts per second (20,000 – 80,000)
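As a rough illustration of this pipeline, the PHP sketch below drains a bounded batch of buffered messages, writes them to a delimited file, and builds the matching LOAD DATA INFILE statement (which reads rows from a delimited text file rather than executing SQL). Several details are assumptions, not from the slides: an `SplQueue` stands in for the hosted AMQP queue, each message carries `truck`, `time`, `variable` and `value` fields, and the `data_points` table and its columns are hypothetical. Actually running the statement against MySQL is left out.

```php
<?php
// Sketch: drain a bounded batch of queued messages, write them to a CSV file,
// and build the LOAD DATA INFILE statement for that file.
// Assumptions: SplQueue stands in for the hosted AMQP queue, and the
// data_points table and its column names are hypothetical.

/**
 * Pull at most $batchSize messages off the queue. Anything left over stays
 * queued for the next run, so a surplus simply builds up in the queue.
 */
function drainBatch(SplQueue $queue, int $batchSize): array
{
    $batch = [];
    while (!$queue->isEmpty() && count($batch) < $batchSize) {
        $batch[] = $queue->dequeue();
    }
    return $batch;
}

/** Write one CSV row per message and return the statement that imports the file. */
function writeBatchFile(array $batch, string $path): string
{
    $fh = fopen($path, 'w');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    foreach ($batch as $msg) {
        // Pass delimiter, enclosure and escape character explicitly
        fputcsv($fh, [$msg['truck'], $msg['time'], $msg['variable'], $msg['value']], ',', '"', '\\');
    }
    fclose($fh);

    // The FIELDS/LINES clauses must describe the file format written above
    return sprintf(
        "LOAD DATA INFILE '%s' INTO TABLE data_points"
        . " FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '\"'"
        . " LINES TERMINATED BY '\\n'"
        . " (truck_id, recorded_at, variable, value)",
        addslashes($path)
    );
}
```

Capping the batch size keeps each import bounded; whatever is not drained stays in the queue until the next run.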
LOAD DATA INFILE isn't enough on its own...
Secret Weapon #2: DBA
Sam Lambert: DBA Extraordinaire
Constantly tweaking the servers and configuration to get more and more performance
Pushing the capabilities of our SAN, tweaking configs where no DBA has gone before
www.samlambert.com
http://www.samlambert.com/2011/07/how-to-push-your-san-with-open-iscsi_13.html
http://www.samlambert.com/2011/07/diagnosing-and-fixing-mysql-io.html
sam.lambert@smithelectric.com
Sharding
Huge volumes of data being stored
We shard the data based on the truck it came from: each truck has its own database
Databases are held on one of many database servers in our cluster, each with ~100GB RAM
Live, Real Time Information
[live screen photo]
Real Time Status and Tracking
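The per-truck sharding described in the Sharding slides can be sketched as a small lookup: derive the database name from the truck ID, then find which server in the cluster hosts it. The `telemetry_truck_<id>` naming scheme, the explicit truck-to-server map and the `ShardMap` class are assumptions for illustration, not the production scheme.

```php
<?php
// Sketch: locate a truck's own database in a sharded cluster.
// The "telemetry_truck_<id>" naming scheme and the server map are hypothetical.

final class ShardMap
{
    /** @var array<int, string> truck ID => database server host */
    private array $truckToServer;

    public function __construct(array $truckToServer)
    {
        $this->truckToServer = $truckToServer;
    }

    /** Each truck has its own database, so the name is derived from the truck ID. */
    public function databaseFor(int $truckId): string
    {
        return 'telemetry_truck_' . $truckId;
    }

    /** Look up which server in the cluster holds that truck's database. */
    public function serverFor(int $truckId): string
    {
        if (!isset($this->truckToServer[$truckId])) {
            throw new OutOfBoundsException("No shard registered for truck $truckId");
        }
        return $this->truckToServer[$truckId];
    }
}
```

An explicit map (rather than, say, hashing the truck ID onto a server) lets new trucks be placed on new servers without moving existing databases.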
Live, Real Time Information: Problem
The original database design dictated:
All data points were stored in the same table
Each type of data point required a separate query, sub-query or join to obtain
The workings of the remote device collecting the data, and of the processing server, dictated:
GPS co-ordinates can be up to 6 separate data points, including: longitude; latitude; altitude; speed; number of satellites used to get the location; direction
Real Time Information: Concurrent
Initial solution from the original developers:
Pull as many pieces of real time information through asynchronously

Involved the use of Flash-based “widgets”, which called separate PHP scripts to query the data
Data points took a little time to load
Not good enough
Real Time Information: Caching
High volumes of data, and varying levels of concurrent processing, mean query times are often inconsistent
Memcache was used when processing the data from the message queue, keeping a copy of the most recent value of each data point for each truck
Live, real-time information is accessed directly from memcache, bypassing the database
Caching: Registry/DI is Ideal
Sporadic use of memcache within the web application: an ideal use case for a lazy-loading registry or DI container
Give the registry or container details of memcache
The object is only instantiated, and the connection only made, when data is requested from memcache
Lazy Loading
public function getObject( $key )
{
    if ( array_key_exists( $key, $this->objects ) ) {
        // Already instantiated: return the stored instance
        return $this->objects[ $key ];
    } elseif ( array_key_exists( $key, $this->objectSetup ) ) {
        // Registered but not yet loaded: include the optional abstract parent, then the class file
        $setup = $this->objectSetup[ $key ];
        $path = FRAMEWORK_PATH . 'registry/aspects/' . $setup['folder'] . '/';
        if ( ! is_null( $setup['abstract'] ) ) {
            require_once( $path . $setup['abstract'] . '.abstract.php' );
        }
        require_once( $path . $setup['file'] . '.class.php' );
        $class = $setup['class'];
        $o = new $class( $this );
        $this->storeObject( $o, $key );
        return $o;
    } elseif ( $key == 'memcache' ) {
        // Requesting memcache for the first time: instantiate, connect, store and return
        $mc = new Memcache();
        $mc->connect( MEMCACHE_SERVER, MEMCACHE_PORT );
        $this->storeObject( $mc, 'memcache' );
        return $mc;
    }
    // Unknown key: nothing to return
    return null;
}
This is where the registry pattern reaches its limit; a DI container is more suitable
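Since the slide points to a DI container once the registry's special cases pile up, here is a minimal sketch of one: services are registered as factory closures and only built, once, on first request. The `Container` class and its `set()`/`get()` methods are illustrative names, not a particular library's API.

```php
<?php
// Sketch: a minimal DI container with lazy, shared services.
// Class and method names are illustrative, not from the original codebase.

final class Container
{
    /** @var array<string, callable> */
    private array $factories = [];
    /** @var array<string, object> */
    private array $instances = [];

    /** Register how to build a service; nothing is instantiated yet. */
    public function set(string $id, callable $factory): void
    {
        $this->factories[$id] = $factory;
        unset($this->instances[$id]);
    }

    /** Build the service on first request, then return the same instance. */
    public function get(string $id): object
    {
        if (!array_key_exists($id, $this->instances)) {
            if (!isset($this->factories[$id])) {
                throw new InvalidArgumentException("Unknown service: $id");
            }
            $this->instances[$id] = ($this->factories[$id])($this);
        }
        return $this->instances[$id];
    }
}

// Usage with the memcache extension (not executed here):
// $c->set('memcache', function () {
//     $mc = new Memcache();
//     $mc->connect(MEMCACHE_SERVER, MEMCACHE_PORT);
//     return $mc;
// });
```

Unlike the registry above, adding a new service means registering another factory rather than adding another `elseif` branch.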
Real Time Information: Extrapolate and Assume
Our telemetry unit broadcasts each data point once per second
Data doesn't change every second, e.g.
Battery state of charge may take several minutes to lose a percentage point
Fault flags only change to 1 when there is a fault
We compare the data to the last known value: if it's the same we don't insert it, and instead assume it was unchanged
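A minimal sketch of the “compare with the last known value” check, combined with the memcache-style copy of the most recent value per truck and data point. `ArrayCache` is an in-memory stand-in for memcache, and the `truck:<id>:<variable>` key format and the `shouldInsert()` helper are hypothetical. The real memcache extension's `get()` returns `false` on a miss rather than `null`, so an adapter would need to translate that.

```php
<?php
// Sketch: keep the most recent value of each data point per truck in a cache,
// and only insert a reading when the value has changed.
// ArrayCache stands in for memcache; the key format is an assumption.

interface LastValueStore
{
    /** @return mixed|null null when no value has been seen yet */
    public function get(string $key);
    public function set(string $key, $value): void;
}

final class ArrayCache implements LastValueStore
{
    private array $data = [];

    public function get(string $key)
    {
        return $this->data[$key] ?? null;
    }

    public function set(string $key, $value): void
    {
        $this->data[$key] = $value;
    }
}

/**
 * Returns true if the reading should be inserted (first sighting or a changed value).
 * The cache is always refreshed, so live views can read it without touching MySQL.
 */
function shouldInsert(LastValueStore $cache, int $truckId, string $variable, $value): bool
{
    $key = "truck:{$truckId}:{$variable}";
    $previous = $cache->get($key);
    $cache->set($key, $value);
    return $previous === null || $previous !== $value;
}
```

Strict comparison is used, so incoming values need consistent types (for example, always cast to float), otherwise `"80"` and `80` would count as a change.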
Unfortunately, this requires us to put additional checks and balances in place
Extrapolate and Assume: “Interlation”
Built a special library which:
Accepted a number of arrays, each representing a collection of data points for one variable on the truck
Used key indicators and time differences to work out if/when the truck was off, and extrapolation should stop
For each time data was recorded, pulled down data for the other variables for consistency
Interlace
    /** Add an array to the interlation */
    public function addArray( $name, $array )
    /** Get the time that we first receive data in one of our arrays */
    public function getFirst( $field )
    /** Get the time that we last received data in any of our arrays */
    public function getLast( $field )
    /** Generate the interlaced array */
    public function generate( $keyField, $valueField )
    /** Break the interlaced array down into separate days */
    public function dayBreak( $interlationArray )
    /** Generate an interlaced array and fill for all timestamps within the range of _first_ to _last_ */
    public function generateAndFill( $keyField, $valueField )
    /** Populate the new combined array with key fields using the common field */
    public function populateKeysFromField( $field, $valueField=null )
http://www.michaelpeacock.co.uk/interlation-library
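The core “interlace and fill” idea can be sketched in a few lines: merge each variable's readings onto a single one-second timeline between the first and last timestamps, carrying the last known value forward where nothing was recorded. This is a simplified, hypothetical `interlaceAndFill()` function, not the Interlation library's actual implementation; in particular it does not detect when the truck was switched off, so it would keep extrapolating across those gaps.

```php
<?php
// Sketch of the interlace-and-fill idea: merge several per-variable series
// (timestamp => value) onto a one-second timeline, carrying the last known
// value forward where nothing was recorded. Simplified and hypothetical;
// not the Interlation library's actual implementation.

/**
 * @param array<string, array<int, mixed>> $series variable name => [timestamp => value]
 * @return array<int, array<string, mixed>> timestamp => [variable => value]
 */
function interlaceAndFill(array $series): array
{
    $timestamps = [];
    foreach ($series as $points) {
        $timestamps = array_merge($timestamps, array_keys($points));
    }
    if ($timestamps === []) {
        return [];
    }
    $first = min($timestamps);   // comparable to getFirst()
    $last = max($timestamps);    // comparable to getLast()

    $result = [];
    $current = array_fill_keys(array_keys($series), null);
    for ($t = $first; $t <= $last; $t++) {
        foreach ($series as $name => $points) {
            if (array_key_exists($t, $points)) {
                $current[$name] = $points[$t];   // a fresh reading
            }                                    // otherwise assume it is unchanged
        }
        $result[$t] = $current;
    }
    return $result;
}
```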
Real Time Information: Single Request
Currently, each piece of “live data” is loaded into a Flash graph or widget, which updates every 30 seconds using an AJAX request
The move from MySQL to memcache reduces database load, but the large number of requests still adds strain to the web server
Moving to image and JavaScript widgets, which are updated from a single AJAX request
Lots of Data: Race Conditions
Sessions in PHP close at the end of the execution cycle
Unpredictable query times
Large number of concurrent requests per screen
Session Locking
Completely locks out a user's session, as PHP hasn't closed the session
Race Conditions: PHP & Sessions
session_write_close()
Added after each write to the $_SESSION array. Closes the current session.
(Requires a call to session_start() immediately before any further reads or writes)
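One way to apply this consistently is a small helper that opens the session, does the read or write, and closes it straight away, so slow queries never run while the session lock is held. `withSession()` is a hypothetical helper; `session_start()`, `session_status()` and `session_write_close()` are standard PHP functions.

```php
<?php
// Sketch: hold the session lock only while touching $_SESSION.
// withSession() is a hypothetical helper, not part of PHP.

/**
 * Open the session, run $fn against it, then release the lock immediately
 * so other concurrent requests from the same user are not blocked.
 * Must be called before any output is sent, or session_start() will fail.
 */
function withSession(callable $fn)
{
    if (session_status() !== PHP_SESSION_ACTIVE) {
        session_start();
    }
    try {
        return $fn();
    } finally {
        session_write_close();   // writes $_SESSION and releases the lock
    }
}
```

For example, a widget endpoint could read `$_SESSION['user_id']` through `withSession()`, run its slow query with the lock already released, and reopen the session only if it needs to write something back.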
Race Conditions: Use a ******* Template Engine
V1 of the system mixed PHP and HTML
You can't re-initialise your session once output has been sent
All new code uses a template engine, so session interaction has no bearing on output. By the time the template is processed and output, all database and session work has long since been completed.
Race Conditions: Use a Single Entry Point
Race conditions are further exacerbated by the PHP timeout values
Certain exports, actions and processes take longer than 30 seconds, so the default execution time is longer
Initially the project lacked a single entry point, and execution flow was muddled
A single entry point makes it easier to enforce a lower timeout, which is overridden by intensive controllers or models
Intensive Queries & Calculations
How far did this vehicle travel?
Motor RPM x various vehicle-specific constants
Calculated for every RPM value held during the drive process
How much energy did the vehicle use?
Battery current x battery voltage x time
For every current and voltage value combination held during the driving process
How well was the vehicle driven?
Harshness of accelerator and brake pedal usage
Inappropriate duration of AC / heater on time?
What about for a customer's fleet, or all of our vehicles sold?
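The distance and energy formulas above can be sketched as running sums over a day's samples. This assumes readings keyed by Unix timestamp in seconds and multiplies each reading by the time until the next one (a left-rectangle sum). The gear ratio and wheel circumference are hypothetical stand-ins for the “vehicle specific constants”, and `distanceFromRpm()` and `energyKwh()` are illustrative names.

```php
<?php
// Sketch of the two per-sample calculations, assuming samples keyed by
// Unix timestamp (seconds). The gear ratio and wheel circumference are
// hypothetical stand-ins for the "vehicle specific constants".

/** Distance in metres from motor RPM samples: [timestamp => rpm]. */
function distanceFromRpm(array $rpmByTime, float $gearRatio, float $wheelCircumferenceM): float
{
    ksort($rpmByTime);
    $times = array_keys($rpmByTime);
    $metres = 0.0;
    for ($i = 1, $n = count($times); $i < $n; $i++) {
        $dt = $times[$i] - $times[$i - 1];                   // seconds since previous sample
        $motorRevs = $rpmByTime[$times[$i - 1]] / 60 * $dt;  // motor revolutions in that interval
        $metres += $motorRevs / $gearRatio * $wheelCircumferenceM;
    }
    return $metres;
}

/** Energy in kWh from paired samples: [timestamp => [current_A, voltage_V]]. */
function energyKwh(array $samples): float
{
    ksort($samples);
    $times = array_keys($samples);
    $joules = 0.0;
    for ($i = 1, $n = count($times); $i < $n; $i++) {
        [$amps, $volts] = $samples[$times[$i - 1]];
        // Power = current x voltage; energy = power x elapsed time
        $joules += $amps * $volts * ($times[$i] - $times[$i - 1]);
    }
    return $joules / 3.6e6; // 1 kWh = 3,600,000 J
}
```

Both functions touch every sample in the period, which is why running them per vehicle, per day, across a whole fleet is expensive enough to move into an overnight job.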
Intensive Queries & Calculations
Involves a fair number of queries per vehicle
Calculations involve holding this data in memory
Processing is required for every single record of that piece of data during that day
Takes a while!
Solution:
Calculate information overnight
Save it as a compiled report
Lookups and comparisons only need to look at the compiled / saved reports in the database
Reports
In addition to our calculated reports, we also need to export key bits of information to grant authorities
Initially our PHP-based export scripts held one database connection per database (~400 databases)
Rewrote them to maintain only one connection per server, and switch the database used
Toggles to instruct the export to apply to only 1 of the servers at a time
Modulus magic to run multiple export scripts per server
Triggers and Events
Currently a work-in-progress R&D project, evaluating two options:
Golden hammer: Use PHP
Run PHP as a daemon
Continually monitor for specific changes to memcache variables
Link into a PHP-based API to run triggers
The Future
More sharding
Based on time: keep the individual tables smaller
Currently investigating NoSQL solutions as alternatives
Do we need as much data as we collect?
We need to continually abstract concepts and ideas to make ongoing maintenance and expansion easier, especially in terms of mapping code to database shards
Expand our DB cluster, more RAM, R&D
A much-needed design refresh
Conclusions
Make the solution scalable from the start
Where data collection is critical, use a message queue, ideally hosted or “cloud based”
Hire a genius DBA to push your database engine
Make use of data caching systems to reduce strain on the database
Calculations and post-processing should be done during dead time and automated
Add more tools to your toolbox: PHP needs lots of friends in these situations
Watch out for session race conditions: where they can't be avoided, use session_write_close(), a template engine and a single entry point
Reduce the number of continuous AJAX calls
Q & A
Michael Peacock
Web Systems Developer – Telemetry Team – Smith Electric Vehicles US Corp
michael.peacock@smithelectric.com
Senior / Lead Developer, Author & Entrepreneur
me@michaelpeacock.co.uk
www.michaelpeacock.co.uk
@michaelpeacock
http://joind.in/3808
http://www.slideshare.net/michaelpeacock
Extra information!

Editor's Notes

  • #16: Imagine viewing a customer's fleet of 30 vehicles on a map: that's 60 queries refreshing every 30 seconds