Guessing the Unguessable: Estimates and Legacy Code
I've often said that we shouldn't try to estimate the time it will take to fix a defect before we know what the defect is. Some defects, like a typo in some text, might be truly easy to identify, fix & verify, but most require some amount of analysis and investigative work to find the cause. Once that's found, then there's time required to make the fix and test it before releasing it to the world.
That process, in turn, is riddled with unknowns. Any experienced software developer has encountered defects that took hours or days to identify and minutes to fix and test, as well as defects that took minutes to identify but hours or days to fix and test. And that's just the time dedicated to the work itself, not including wait times in the team's process!
With that level of variability you simply cannot provide a reasonable estimate for fixing the defect until you've concluded the "identify" step.
After a recent experience, though, I'm tempted to extend this concept to legacy code.
What is Legacy Code?
I like to use the definition that Michael Feathers gave us in his book Working Effectively With Legacy Code:
To me, legacy code is simply code without tests.
He goes on to explain:
What do tests have to do with whether code is bad? To me, the answer is straightforward, and it is a point that I elaborate throughout the book: Code without tests is bad code. It doesn't matter how well written it is; it doesn't matter how pretty or object-oriented or well-encapsulated it is. With tests, we can change the behavior of our code quickly and verifiably. Without them, we really don't know if our code is getting better or worse.
This definition is quite important, because it describes my own quandary quite well.
From The "Real" World
I'm helping a group port a product from one application platform to another, and a good amount of the work has already been done. There were, however, no automated tests of any sort that would help me better understand the intent of the code and provide some sort of safety net while making changes. The code itself was a convoluted mess of spaghetti, with over a dozen global variables used throughout, among many other code smells. And documentation? Didn't exist.
The first step was to lift & replace one of the key areas of the system. Ironically, the use of a global variable meant that I could replace the old version with an adapter class that implemented the same API but delegated to the new code. This took a bit of time, but after a week or so I was confident in the replacement's implementation, backed by a substantial set of tests.
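The adapter approach might look something like the following sketch. The names (`NewConnector`, `ConnectorAdapter`, `open_connection`) are purely illustrative, not from the actual codebase; the point is that because legacy callers all went through one global, swapping in an adapter with the same API required no changes at the call sites.

```python
# Hypothetical sketch of the adapter technique described above.
# All names here are invented for illustration.

class NewConnector:
    """The new, well-tested implementation on the target platform."""
    def connect(self, host: str) -> str:
        return f"connected:{host}"


class ConnectorAdapter:
    """Implements the legacy API but delegates to the new code."""
    def __init__(self) -> None:
        self._impl = NewConnector()

    # Same method signature the legacy callers already use.
    def open_connection(self, host: str) -> str:
        return self._impl.connect(host)


# The legacy code referenced a single global; replacing what that
# global points at swaps the implementation for every caller at once.
CONNECTION = ConnectorAdapter()

print(CONNECTION.open_connection("device.local"))  # connected:device.local
```

This is the same trade-off Feathers describes as introducing a "seam": the global that was a code smell also happened to be the one place where the old implementation could be swapped out safely.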
But there was a problem.
The old code successfully initiated a connection with an external device, while the new code did not. I checked the tests and code to ensure that I hadn't missed something. I added tons of logging to expose what was really happening in the app that maybe wasn't happening in the tests. I stepped through execution in the debugger, but still nothing stood out as different.
I brought up this issue in the daily standup meeting, and the primary stakeholder wanted to know how long I thought it would take to fix. I had to tell him that I had no idea because I didn't know the cause of the issue yet.
Finally, after 4 days, I had the idea to capture a before & after view of HTTP calls between the application and the external device. I compared the two outputs and that's when I found it! There was a single query parameter that I was missing that was crucial to establishing the proper connection between the application and the device. I added the missing parameter to the new code and everything worked as it had in the old code! It was a simple oversight when porting the code and hadn't been covered in the tests either.
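A minimal sketch of that before & after comparison: capture the request each version sends, parse out the query parameters, and diff them. The URLs and the `mode` parameter below are invented for illustration; in practice the requests would come from a proxy or logging capture, not hard-coded strings.

```python
# Illustrative sketch: diff the query parameters of the "old" and "new"
# captures of the same HTTP request to spot what the port dropped.
from urllib.parse import urlparse, parse_qs

# Hypothetical captured requests; the real ones came from the running app.
old_request = "http://device.local/connect?id=42&mode=live&token=abc"
new_request = "http://device.local/connect?id=42&token=abc"

old_params = parse_qs(urlparse(old_request).query)
new_params = parse_qs(urlparse(new_request).query)

# Parameters present in the old capture but missing from the new one.
missing = set(old_params) - set(new_params)
print(missing)  # {'mode'}
```

Once the diff is that small, the fix is obvious; it's getting to the diff that took four days.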
Nine characters were all that was needed and maybe 5 minutes to verify that the change worked. This was after 4 days of trying to identify the problem.
Now, as a thought exercise, scale this single issue to a non-trivial system with a rich text editor, external device communication and interfaces with external systems and you can understand the risk of this happening many more times before the work is complete.
What Should We Do Instead?
As I said above, we're asked to make these guesses all the time! Then we're held to them despite the risks I've just described.
In this particular case, I'm working with a stakeholder who is quite enlightened and self-aware. He accepted that I wasn't able to provide an estimate, even though he wasn't particularly happy about it. In the end, there was no way that I could have said, "This will take about 4 days to complete".
So if you're a Product Owner, Product Manager, Project Manager or Stakeholder involved with a software development team and asking for work estimates, please understand that there may be situations where it's simply impossible to provide them. If the team is working with legacy code, not only will the risk of encountering issues such as the one I described be higher, but that risk will also increase substantially over time as more code is slapped into the system without tests. Your role is to provide guidance around whether the work itself is really so important that it needs to be done right now. In my case it was, so I continued. There are likely cases where it could be deferred. Similarly, could the work be simplified? Is there a smaller, well-understood part that could be delivered now? Finally, when the team is working with legacy code, let them get it under test! That investment is a "get rich slow" scheme: it will pay some dividends immediately, but it will compound over time as more and more tests are written to alleviate the risks I've talked about.
If you're the developer being asked to provide the estimate and you already know that you simply don't know how long it will take, say so! Explain the risks I just described, but stick to your guns and refuse to guess the unguessable. What you can do is provide transparency into how you're working to solve the problem: give frequent updates, accept invitations to pair and/or invite others to pair with you, or even have the whole team mob on the work for the ultimate transparency.
Conclusion
We often can’t estimate how long it’ll take to fix a defect until we actually understand what the defect is, and the same goes for legacy code. As I illustrate in this story, identifying a bug in a system with no tests, poor structure, and zero documentation can take days, even if the eventual fix takes a few minutes. That kind of variability makes early estimation virtually impossible, especially in codebases where there’s no safety net to support confident changes.
Instead of asking developers to guess the unguessable, it’s far more valuable to support the work with questions like: Is this important right now? Can it be simplified or broken down? And above all, let the team get the code under test. That investment won’t pay off all at once, but it will build momentum over time. If you’re a developer and find yourself in this situation, be honest. Say you don’t know. Share what you’re doing to figure it out and invite collaboration. It’s the only responsible way to handle the uncertainty that comes with legacy code.