Cron Jobs Gone Wrong
The Top Mistakes That Keep Your Tasks from Ticking
I hate cron!!
• There’s something wrong with the …
• ssh-ing into a server at 3am
• Grepping through log files, if there even are any
• High rates of WTF per minute
• Abandon best practices, all ye who write crons
• Basically just a BAD TIME!!
Who am I?
• Peter Meth
• Software Architect at Givelify
• Hail from Toronto, Canada
• Writing, Fixing, Breaking software for 20+ years
• Meetup organizer for 9+ years
• Conference organizer for 6 years
Why this talk?
• Spent years debugging cron jobs gone wrong
• Want to help others avoid that pain
• Built TrueCron at a TrueNorth PHP hackathon 9 years ago
• 6+ years using Jenkins as a cron runner (hacky but works)
• Recently made Cron Pilot … more on that later
• Cron is not evil; developers just keep making common mistakes
• Let’s look at 7 common mistakes
• But first …
Quick Primer
Cron Overview
• For running programs on a recurring schedule
• Available on most Linux systems
• Each user has their own list of jobs; people often use the “root” user
• List of jobs to run is called the cron table (or crontab)
• crontab -l to list
• crontab -e to edit (usually opens vi or nano)
• Each line is a job
• Lines starting with # are comments
• Saving and exiting the editor installs the new schedule right away
Use / for step values, e.g. */5 = every 5 {units}
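To see exactly which values one field matches, here is a minimal bash sketch that expands a single cron field. The helper name `expand_cron_field` is hypothetical; it handles `*`, single numbers, ranges, `/` steps, and comma lists, but not names such as `mon` or `jan`, and it does no validation.

```shell
#!/usr/bin/env bash
# expand_cron_field FIELD MIN MAX
# Prints every value one cron field matches, one per line, ascending.
# Handles: *  N  A-B  */S  A-B/S  and comma-separated lists of these.
# Hypothetical helper: no validation, no names such as mon or jan.
expand_cron_field() {
  local field=$1 min=$2 max=$3
  local part range step start end
  local -a parts
  IFS=',' read -r -a parts <<<"$field"   # split on commas, no globbing
  for part in "${parts[@]}"; do
    range=${part%%/*}                    # everything before "/"
    step=1
    if [[ $part == */* ]]; then
      step=${part#*/}                    # everything after "/"
    fi
    if [[ $range == '*' ]]; then
      start=$min end=$max                # "*" means the full range
    elif [[ $range == *-* ]]; then
      start=${range%-*} end=${range#*-}  # "A-B"
    else
      start=$range end=$range            # a single value
    fi
    seq "$start" "$step" "$end"
  done | sort -nu                        # merge overlapping list items
}
```

For example, `expand_cron_field '*/15' 0 59` prints 0, 15, 30 and 45, and `expand_cron_field '*' 0 59 | wc -l` counts 60 values, which is why a minute field of `*` with an hour of `1` fires every minute from 1:00 to 1:59.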
Misconfigured Syntax and Paths
Mistake 1
❶ ② ③ ④ ⑤ ⑥ ⑦
Scenario
Boss: We need to email a sales report to all the executives and board members every day.
You: The report is resource-intensive. We should process it during off-peak hours, say 1am. I’ll make the cron.
* 1 * * * cd /var/scripts; php report.php >> report.log
Done!
The next day… I am a hero
• NOPE
• Boss is annoyed
• Nobody got their report
• What happened?
Investigate
• Log says
• /bin/sh: 1: php: not found
• Command says
• … cd /var/scripts; php report.php
• Cron runs the job as the user who owns the crontab, with a minimal environment: it may not load your usual $PATH or other environment variables
• Can’t find “php”
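# crontab does not expand variables here, so spell out the whole PATH
PATH=/usr/local/bin:/usr/bin:/bin
# or use the full path (find it with: command -v php)
… cd /var/scripts; /usr/bin/php report.php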
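This failure is easy to reproduce well before 3am by running the command under an environment as sparse as cron's. A minimal bash sketch: `run_like_cron` is a hypothetical helper, and the variables it sets (PATH=/usr/bin:/bin, SHELL=/bin/sh) are an assumption modeled on common cron defaults, so check `man 5 crontab` for your implementation.

```shell
#!/usr/bin/env bash
# run_like_cron 'COMMAND STRING'
# Runs a command string under a stripped-down environment resembling what
# many cron implementations provide (assumed: PATH=/usr/bin:/bin,
# SHELL=/bin/sh). Everything else from your login shell is dropped by env -i.
run_like_cron() {
  env -i \
    HOME="${HOME:-/}" \
    LOGNAME="$(id -un)" \
    SHELL=/bin/sh \
    PATH=/usr/bin:/bin \
    /bin/sh -c "$1"
}
```

Run it as the user that owns the crontab: `run_like_cron 'command -v php'` prints the php that cron would find, or nothing if php is not on that PATH, and `run_like_cron 'cd /var/scripts && php report.php'` exercises the job itself.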
The next day, we’re good, right?
• Wrong!!! Boss is angry
• Big outage last night during peak hours
• Board members are complaining they got 60 copies of the same report
• What happened?
Look at the cron syntax
* 1 * * * …
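• This means “every minute during hour 1”, so
• 1:00, 1:01, 1:02, …
• Unless told otherwise, cron uses the server’s timezone (usually UTC), so
• 1am UTC = 9pm EDT (the evening before)
• In cronie, CRON_TZ sets the schedule’s timezone; a plain TZ= line only changes the job’s environment (check man 5 crontab for your cron)
CRON_TZ=America/New_York
0 1 * * * …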
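If your cron has no per-crontab timezone setting (cronie's CRON_TZ, for example), you can at least check what a UTC slot means locally. A minimal sketch that assumes GNU `date` (its `-d` option); `utc_to_local` is a hypothetical helper. It also shows why a hard-coded UTC hour drifts: 1am UTC is 9pm in New York in summer but 8pm in winter.

```shell
#!/usr/bin/env bash
# utc_to_local HH:MM YYYY-MM-DD ZONE
# Prints the wall-clock time in ZONE for the given UTC time and date.
# Requires GNU date. ZONE is any TZ value, e.g. America/New_York
# (needs the tz database) or a POSIX rule string such as EST5EDT,M3.2.0,M11.1.0.
utc_to_local() {
  TZ=$3 date -d "$2 $1 UTC" '+%Y-%m-%d %H:%M %Z'
}
```

On a system with the tz database installed, `utc_to_local 01:00 2024-06-01 America/New_York` prints `2024-05-31 21:00 EDT`; the same call for a January date lands an hour earlier, at 20:00 EST.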
Advice
• Specify the timezone in the crontab (e.g. CRON_TZ, where your cron supports it)
• Set PATH explicitly in the crontab (cron does not expand $PATH there) or use full paths
• Check your cron syntax & next few run times
• crontab.guru
• Test your jobs in advance, under the user that owns the crontab
• Produce logs that you can review afterwards
• Bonus: Look into application-based scheduling
• Laravel Scheduler
• PHP Cron Scheduler (non-Laravel)
Silent Failures
Mistake 2
❶ ❷ ③ ④ ⑤ ⑥ ⑦
Who knew?
• If a cron fails in the forest and no one gets alerted, does it make a sound?
• I dunno, but it still failed!!
• Set-it-and-forget-it mentality
• Failures can go unnoticed for a long time
• Lack of monitoring
• Slowness may not register as a failure
• Monitoring lives inside the cron job or on the same server (and fails with it)
• Depending on the type of failure, alerts may never fire
• Redirecting output to /dev/null
• Forensics after a failure is hard
War story
• Givelify uses cron jobs to process recurring donations
• Years ago, this failed unnoticed for 3 days
• Found out when someone contacted support
• Hundreds of thousands of dollars in missed donations
• Remediation with donors had to be handled carefully
• Tens of thousands in goodwill credits
• Bad app reviews
• Tarnished brand reputation
Advice
• Log all output and errors
• Option 1: … >> /var/log/report.log 2>&1
• Option 2: log within your application
• Push logs to a centralized logging platform
• Datadog, Sentry, ELK stack
• What happens if the server goes down?
• Use a monitoring / alerting service
• Healthchecks.io, Cronitor
• Roll your own??
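For the roll-your-own route, a small wrapper can cover both logging and a dead man's switch. A minimal bash sketch: `cron_wrap` and the `PING_URL` variable are hypothetical, and the ping convention (GET the URL on success, `URL/fail` on failure) follows Healthchecks.io-style checks. Because such a service alerts when an expected ping never arrives, it also catches the case where the whole server is down.

```shell
#!/usr/bin/env bash
# cron_wrap LOGFILE COMMAND [ARGS...]
# Runs COMMAND, appends its stdout and stderr to LOGFILE between timestamped
# START/END lines (with exit status and duration), and returns its status.
# If PING_URL is set (hypothetical name), it also pings a dead man's switch:
# PING_URL on success, PING_URL/fail on failure.
cron_wrap() {
  local log=$1 start status=0 url
  shift
  start=$(date +%s)
  {
    echo "[$(date '+%F %T')] START $*"
    "$@" 2>&1 || status=$?
    echo "[$(date '+%F %T')] END exit=$status duration=$(( $(date +%s) - start ))s"
  } >>"$log"
  if [[ -n ${PING_URL:-} ]]; then
    url=$PING_URL
    if (( status != 0 )); then
      url=$PING_URL/fail
    fi
    # A monitoring hiccup must not change the job's own exit status.
    curl -fsS -m 10 --retry 3 -o /dev/null "$url" || true
  fi
  return "$status"
}
```

Saved in a script that ends with `cron_wrap "$@"`, a crontab entry could call it as `… /opt/bin/cron-wrap /var/log/report.log /usr/bin/php /var/scripts/report.php` (the script path is hypothetical). Unlike `>/dev/null`, every run leaves a START/END pair, so slow runs and silent failures show up in the log.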
Writing Crons in the Wrong Language
Mistake 3
❶ ❷ ❸ ④ ⑤ ⑥ ⑦
Why are you bashing bash?
• Many devs & sysadmins think crons need to be in a shell scripting language
• bash / sh
• perl
• ...
• Not usually a core competency of your dev team
• Hard to be expressive, organize code, use abstractions, debug, refactor
• Hard to tie into reusable components
• Limited features (notifications, db queries, 3rd-party integrations)
(Code example with callouts: Collection from db?; .env file; Upload to S3; Command builder abstraction; Logging, notifications)
What about?
• Calculating run time
• Storing results in db
• Advanced querying / incremental
• Timestamps in the filename
• Connecting to 3rd-party integrations
• Unit & Feature Tests
• etc.
Leverage a framework
• Laravel (Zero) / Prompts
• Pretty console applications (TUIs)
• Elegant handling of
• Inputs
• Arguments
• Options
Advice
• Write jobs/commands instead of crons
• Does something distinct
• May or may not have a schedule
• Write jobs the same way you write the rest of your system
• Re-use components, objects, services
• Leverage a framework
• Play to the team’s core competencies
• Involve everyone in the process (product, QA, engineers, ops)
Not Following the Software Development Lifecycle
Mistake 4
Photo: Licensed CC-BY-3.0 https://geek-and-poke.com
❶ ❷ ❸ ❹ ⑤ ⑥ ⑦
What is different about crons?
• Developed by system admins
• Only live on the server, not in version control
• No code review
• No testing
• Don’t follow the deployment practices of the rest of the system
• Don’t get refactored frequently
• Generally not thought about unless they fail
Advice
• Keep jobs within your codebase
• Commit to version control
• Subject cron jobs to code reviews
• Run the same jobs in multiple environments
• Local, QA, Demo, Production
• Test your jobs
• Think about features, enhancements, refactoring
• Crons are software, not an afterthought!!
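Once the crontab lives in the repository, a deploy step or scheduled check can confirm that the server runs what was reviewed. A minimal bash sketch: `normalize_crontab`, `crontab_in_sync`, and a repo path like `deploy/crontab` are hypothetical. Comments and blank lines are ignored so cosmetic edits don't count as drift.

```shell
#!/usr/bin/env bash
# normalize_crontab [FILE...]
# Trims whitespace and drops comment and blank lines (reads stdin if no FILE).
normalize_crontab() {
  sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' "$@" | grep -v -e '^#' -e '^$' || true
}

# crontab_in_sync REPO_FILE [INSTALLED_FILE]
# Returns 0 when the committed crontab matches the installed one; otherwise
# prints a diff and returns non-zero. Without INSTALLED_FILE it reads the
# current user's live table via `crontab -l`.
crontab_in_sync() {
  local repo=$1 installed=${2:-}
  if [[ -n $installed ]]; then
    diff <(normalize_crontab "$repo") <(normalize_crontab "$installed")
  else
    diff <(normalize_crontab "$repo") <(crontab -l | normalize_crontab)
  fi
}
```

A deploy could install the reviewed file with `crontab deploy/crontab`, and a later `crontab_in_sync deploy/crontab` would flag any hand edit made on the server since.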
Resource Depletion
Mistake 5
❶ ❷ ❸ ❹ ❺ ⑥ ⑦
There’s only so much to go around
• Server: CPU, RAM, Disk Space
• Network: Bandwidth, remote file systems
• Database: Connection Limits, Locks, CPU, Slow queries
• API request limits
• Parallel jobs can compete
• Also competing with the live system
• Crons can slow down the application
• Busy dates, times of day
• Unusual dates / events (e.g. Black Friday)
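When a job has to share a machine with the live application, one common mitigation is to lower its CPU and I/O priority. A minimal bash sketch: `run_low_priority` is a hypothetical helper, util-linux `ionice` is used only if it is installed, and neither replaces running heavy work off-peak.

```shell
#!/usr/bin/env bash
# run_low_priority COMMAND [ARGS...]
# Runs COMMAND at the lowest CPU priority (nice 19) and, where util-linux
# ionice exists, in the "idle" I/O class (-c 3). The -t flag tells ionice
# to run the command anyway if it cannot set that class.
run_low_priority() {
  if command -v ionice >/dev/null 2>&1; then
    nice -n 19 ionice -c 3 -t "$@"
  else
    nice -n 19 "$@"
  fi
}
```

The same idea fits directly in a crontab line, e.g. `0 1 * * * nice -n 19 ionice -c 3 /usr/bin/php /var/scripts/report.php`.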
Advice
• Monitor system resources
• CPU, Memory, Disk Usage, Network Load
• Multiple warning levels (e.g. 75%, 90%, 100%)
• Rotate log files and other temporary files
• Pace API calls to stay within rate limits
• Run during off-peak hours (mind the timezone)
• Scale up during busy hours
• Monitor the entire system to detect slowdowns
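The multiple-warning-levels idea can be sketched for disk space in a few lines. A minimal bash sketch: `usage_level` and `disk_usage_pct` are hypothetical helpers, the thresholds are the 75% / 90% / 100% from the slide, and how you alert on a non-"ok" level is left to your monitoring tool.

```shell
#!/usr/bin/env bash
# usage_level PERCENT
# Maps a usage percentage to the slide's warning levels.
usage_level() {
  local pct=$1
  if (( pct >= 100 )); then echo full
  elif (( pct >= 90 )); then echo critical
  elif (( pct >= 75 )); then echo warning
  else echo ok
  fi
}

# disk_usage_pct PATH
# Prints the used-space percentage of the filesystem holding PATH, taken
# from the "Capacity" column of POSIX-format (df -P) output.
disk_usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}
```

For example, `usage_level "$(disk_usage_pct /var)"` prints ok, warning, critical or full for the filesystem that holds /var.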
File Fiascos
Permissions, Locks, Paths
Mistake 6
❶ ❷ ❸ ❹ ❺ ❻ ⑦
Let’s file this under … Oh Crap!!
• Many things can impact your job or the files it works with
• Incorrect paths / case sensitivity
• Deleted / moved files / broken symlinks
• File / folder permission issues
• Locked files
• The file system can have issues
• Disk full / ran out of inodes
• Unmounted drives
• Read-only file system
Advice
• It’s tough to handle every scenario proactively
• Use absolute file paths in cron and app code
• Test jobs in a production-like environment
• Handle errors in code
• Check that files & folders exist and are readable / writable
• Monitoring / alerting of failures
• General server monitoring
• Try not to lock files unless necessary
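Checking files and folders up front lets a job fail fast with a clear message instead of half-running. A minimal bash sketch: `preflight` and its `r:` / `w:` prefixes are hypothetical conventions. Run it as the same user the cron job runs as, since permission checks depend on who is asking.

```shell
#!/usr/bin/env bash
# preflight CHECK...
# Each CHECK is "r:PATH" (must be a readable regular file) or
# "w:PATH" (must be a writable directory). Every failure is reported on
# stderr; returns 1 if any check failed so the job can stop before working.
preflight() {
  local check path failed=0
  for check in "$@"; do
    path=${check#?:}                     # strip the "r:" / "w:" prefix
    case $check in
      r:*) [[ -f $path && -r $path ]] || { echo "preflight: not a readable file: $path" >&2; failed=1; } ;;
      w:*) [[ -d $path && -w $path ]] || { echo "preflight: not a writable directory: $path" >&2; failed=1; } ;;
      *)   echo "preflight: unknown check: $check" >&2; failed=1 ;;
    esac
  done
  return "$failed"
}
```

A job might start with `preflight r:/var/data/sales.csv w:/var/reports || exit 1` (both paths hypothetical), so a moved file or a permission change shows up as one readable log line.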
Overlapping Jobs
Mistake 7
❶ ❷ ❸ ❹ ❺ ❻ ❼
The great cron dogpile
• Same job starts again before previous finishes
• Double processing
• Resource contention
• Snowball effect
• Multiple different jobs running at the same time
• Resource contention
• Jobs that sometimes take much longer than expected
• Hung processes
Locking strategies to prevent overlaps
• Cron by default does not prevent overlaps
• Use flock
• Implement locking mechanism in code
• Database locking
• Redis locking
• Extra logic can be added
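`flock` from util-linux can guard a job directly in the crontab line or from inside a script. A minimal bash sketch of the script form: `run_exclusive` and the exit code 75 for "skipped, already running" are hypothetical choices, not something flock prescribes.

```shell
#!/usr/bin/env bash
# run_exclusive LOCKFILE COMMAND [ARGS...]
# Runs COMMAND while holding an exclusive, non-blocking flock on LOCKFILE.
# If another run already holds the lock, prints a notice and returns 75
# without running COMMAND. The kernel releases the lock once the job (and
# any child that inherited fd 9) exits, so a crash leaves no stale lock.
run_exclusive() {
  local lockfile=$1
  shift
  (
    flock -n 9 || { echo "skipped, already running: $*" >&2; exit 75; }
    "$@"
  ) 9>"$lockfile"
}
```

The crontab form needs no wrapper: `0 1 * * * flock -n /tmp/report.lock /usr/bin/php /var/scripts/report.php` skips a run while the previous one still holds the lock.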
Advice
• Stagger start times
• Use file locks to prevent overlaps
• Monitor run times
• Test edge cases
• Look for endless loops / waits
• Break bulk processing into smaller chunks
• Consider queues or daemons instead of cron
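For hung processes and endless waits, GNU coreutils `timeout` puts an upper bound on a run. A minimal sketch: `run_with_deadline` is a hypothetical wrapper, and the 30-second grace period between SIGTERM and SIGKILL is an arbitrary choice.

```shell
#!/usr/bin/env bash
# run_with_deadline SECONDS COMMAND [ARGS...]
# Sends SIGTERM to COMMAND after SECONDS, then SIGKILL 30 seconds later if
# it is still running. Returns COMMAND's own status, or GNU timeout's 124
# (timed out) / 137 (killed), and logs a line to stderr on timeout.
run_with_deadline() {
  local limit=$1 status=0
  shift
  timeout --kill-after=30 "$limit" "$@" || status=$?
  if (( status == 124 || status == 137 )); then
    echo "deadline: stopped after ${limit}s: $*" >&2
  fi
  return "$status"
}
```

In a crontab the two guards combine naturally: `flock -n /tmp/report.lock timeout 2h /usr/bin/php /var/scripts/report.php` prevents overlaps and caps each run at two hours.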
Recap
• Top mistakes and how to fix them
1. Misconfigured syntax and paths
2. Silent failures
3. Written in the wrong language
4. Not following SDLC
5. Resource depletion
6. File permissions, locks, paths
7. Overlapping jobs
The Jenkins Solution
• Centralized
• Easily accessible via the web
• Can set access permissions
• Captures output for later review
• Sends notifications on success/failure
• Can organize into subfolders
• Easy to test & run jobs ad-hoc
• Easy on/off switch
• Demo …
Cron Pilot
• All the good things the Jenkins solution brings
• Nicer user interface, designed around crons
• Incorporates the advice from this talk
• Searching, filtering, import/export
• Flexible, easy to understand scheduler
• Many more features planned
• It’s PHP!!!
• Demo …
What to do next?
• Check whether your cron jobs suffer from any of these mistakes
• Try Cron Pilot on your projects or at your workplace
• Contribute
• Add features, fix bugs
• Report bugs, submit ideas
• Add documentation, record videos
• Star on GitHub, share with others
• Try Jenkins if Cron Pilot is not for you
https://github.com/cronpilot/cronpilot
Rate this talk
• Helps organizers
• Helps speakers
• Helps attendees
https://joind.in/talk/2addc
The End
… until this talk runs again
@devopsmeth
phpc.social/@pmeth
petermeth
pmeth@delvia.com

Editor's Notes

  • #8: Here’s an example of listing the cron table
  • #9: If you’re lucky you might have a nicely formatted and commented file
  • #10: But more likely you’ll have something like this
  • #15: Fix it by adding the path to the executables
  • #17: Fix by adding the timezone and the minutes
  • #21: Some donors made a manual donation to make up for it. Some may not have kept funds aside to cover it. Some had missed multiple donations. Some got auto-cancelled.
  • #26: But I don't usually write PHP this way.
  • #27: If you start adding these things to a bash script, it can quickly become unreadable, at least in my opinion. I'm sure there's some bash expert out there who would look at this and say, “No problem, this makes complete sense.” I am not that person. This is really hard to read.