Vibe coding - Let's do a code review
We all use AI to help us code, some more, some less. And we use it at different levels: some as code autocomplete on steroids, some asking it to generate entire applications with a single prompt.
The sweet spot will be somewhere in between: simple to describe, not straightforward to implement, little wiggle room in what counts as a correct implementation, lots of examples available (e.g. here).
So, dear Mr. AI, let's do a code review, just like with any other developer.
Prompt: How do I read the last 10 rows from a Kafka topic with Java?
The code generated is this:
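(The generated code was embedded as an image in the original post. What follows is my reconstruction from the review below; class, method, and variable names are mine, not necessarily what the AI produced.)

```java
import java.time.Duration;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.PartitionInfo; // this import was missing in the generated code
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.StringDeserializer;

public class LastMessagesReader {

    public static List<ConsumerRecord<String, String>> readLastMessages(
            String bootstrapServers, String topic, int numberOfMessages) {

        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, bootstrapServers);
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "last-messages-reader");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        // Start reading from the latest offset if no committed offset exists
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Discover all partitions of the topic and assign them manually
            List<TopicPartition> partitions = new ArrayList<>();
            for (PartitionInfo info : consumer.partitionsFor(topic)) {
                partitions.add(new TopicPartition(topic, info.partition()));
            }
            consumer.assign(partitions);

            // Seek each partition to its end, remember the end offset, then
            // position the reader numberOfMessages before the end (floored at 0)
            consumer.seekToEnd(partitions);
            Map<TopicPartition, Long> endOffsets = new HashMap<>();
            for (TopicPartition partition : partitions) {
                long latestOffset = consumer.position(partition);
                endOffsets.put(partition, latestOffset);
                consumer.seek(partition, Math.max(latestOffset - numberOfMessages, 0));
            }

            List<ConsumerRecord<String, String>> lastMessages = new ArrayList<>();
            boolean done = false;
            while (!done) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) { // "record" is a Java keyword
                    lastMessages.add(record);
                }
                // Exit check criticized in the review below: it walks over ALL
                // collected records in every iteration instead of checking the
                // handful of partitions once
                done = true;
                for (ConsumerRecord<String, String> rec : lastMessages) {
                    TopicPartition tp = new TopicPartition(rec.topic(), rec.partition());
                    if (consumer.position(tp) < endOffsets.get(tp)) {
                        done = false;
                    }
                }
            }

            // Sort by offset and return the last numberOfMessages records;
            // the review below explains why sorting by offset is wrong
            lastMessages.sort(Comparator.comparingLong(ConsumerRecord::offset));
            return new ArrayList<>(lastMessages.subList(
                    Math.max(lastMessages.size() - numberOfMessages, 0), lastMessages.size()));
        }
    }
}
```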
That is quite impressive. The first question is: is this code syntactically correct? The IDE says no, the import org.apache.kafka.common.PartitionInfo is missing. Embarrassing for you to deliver code that does not even compile, dear developer!
Question 1: Code styling?
Passed: quite okay, no complaints. Easy to understand, comments where they make sense but not too verbose, variables named properly. With the exception of the "record" variable in the for loop: "record" is a contextual keyword in Java since records were introduced. The code still works, it is just not nice. I am picky, I know. My apologies.
Note: I have added line breaks for better readability here on LinkedIn. The original code looks better in that regard.
Question 2: Does it provide the correct output?
Passed: It seeks all partitions to the end, positions the reader at the end position minus 10, handles the case where the end offset is below 10, reads all data via a 0.5-second poll in a loop, has a loop exit condition that must eventually be triggered, and collects all records in a single list. Fine.
Once all data has been read, it sorts the data by partition offset and returns a sub-list of the last ten. Fail: the offset number does not correlate with time, and I asked for the last ten records, not the ten with the highest offsets. A simple example: I had one partition for a long time, its current offset is 123456. Then I added another partition; its offset starts at 0, and by coincidence the ten most recently added records all ended up in the new partition with offsets 0 to 9.
The code will show me the same old 10 records with the high offsets but not the most recently added ones.
A big, unnecessary mistake. It is easy to fix, as the records returned by the poll also carry a timestamp. It needs quite some code refactoring, but no big deal.
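A sketch of the fix, based on my reconstruction above: sort the collected records by their timestamp instead of by offset. (Caveat: whether timestamp() is producer or broker time depends on the topic's log.message.timestamp.type setting.)

```java
// Sort by record timestamp instead of offset; break ties by partition and offset
lastMessages.sort(Comparator
        .comparingLong((ConsumerRecord<String, String> r) -> r.timestamp())
        .thenComparingInt(ConsumerRecord::partition)
        .thenComparingLong(ConsumerRecord::offset));
return new ArrayList<>(lastMessages.subList(
        Math.max(lastMessages.size() - numberOfMessages, 0), lastMessages.size()));
```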
Would somebody relying on the code have noticed it? Nope. A problematic fail!
Another potential issue occurs when the topic is written to at a high frequency, as the poll would then return many records beyond the end offsets determined before. This can cause an imbalance in the returned data. It would have been better to add a record to the lastMessages list only if its offset is below the end offset of its partition, as sketched below.
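Something along these lines, inside the poll loop of the reconstruction above:

```java
for (ConsumerRecord<String, String> rec : records) {
    // Only keep records that existed when the end offsets were captured,
    // so a busy producer cannot skew the result towards one partition
    long endOffset = endOffsets.get(new TopicPartition(rec.topic(), rec.partition()));
    if (rec.offset() < endOffset) {
        lastMessages.add(rec);
    }
}
```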
Question 3: Code performance and memory usage?
Passed: It tries to read the minimum amount of data by seeking to the end position minus 10 for each partition, which kind of assumes that the offsets are a dense sequence of numbers. They are not: records could have been removed by tombstone messages or by log compaction. But there is no way to do better, so I accept that. A warning would have been nice, though, dear junior developer.
Fail: What is totally unacceptable as style is this code sequence, the test whether all end offsets have been reached so the read loop can be exited.
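Quoting from my reconstruction above:

```java
// Runs once per poll iteration, over every record collected so far
done = true;
for (ConsumerRecord<String, String> rec : lastMessages) {
    TopicPartition tp = new TopicPartition(rec.topic(), rec.partition());
    if (consumer.position(tp) < endOffsets.get(tp)) {
        done = false;
    }
}
```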
In each iteration, it walks over all records read so far to check whether every end offset has been reached. So all collected records are compared in every single iteration: if the first poll returned 1000 records, 1000 comparisons are made; in the next iteration, with another 1000 records, it is already 2000 comparisons.
Yeah, yeah, I know: we did seek to the end offsets minus 10, and the default max.poll.records is 500, so that loop will never have to cope with too many records. But it is the principle. What if I ask for the last 10'000 records next time?
The other point is that it would create 2x1000 TopicPartition instances, although there are only 8 partitions, and compare the offsets for each. Why on earth 1000 comparisons when 8 are enough?
And the code does not take into consideration that each record carries its own offset, so there is no need to call consumer.position() at all.
What I would have done is define the endOffsets as Map<Integer, Long> instead of Map<TopicPartition, Long>, thus avoiding the creation of TopicPartition objects just for the comparison.
Then, inside the record loop, compare the end offset of the partition with the record offset; once it has been reached, remove that partition from the map. We then no longer need the "done" variable, as the while loop can simply check how many partitions are left: if zero, everything has been read. This gives us far fewer comparisons, because we take advantage of the fact that polling returns strictly ascending offsets per partition.
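A minimal sketch of that idea, continuing from the reconstruction above after the seek logic (again, the names are mine):

```java
// End offsets keyed by partition number - no TopicPartition objects needed later
Map<Integer, Long> remainingEndOffsets = new HashMap<>();
for (Map.Entry<TopicPartition, Long> entry : consumer.endOffsets(partitions).entrySet()) {
    if (entry.getValue() > 0) { // partitions without any data are done already
        remainingEndOffsets.put(entry.getKey().partition(), entry.getValue());
    }
}

List<ConsumerRecord<String, String>> lastMessages = new ArrayList<>();
while (!remainingEndOffsets.isEmpty()) {
    for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
        lastMessages.add(rec);
        Long endOffset = remainingEndOffsets.get(rec.partition());
        // Offsets arrive strictly ascending per partition, so the record at
        // endOffset - 1 is the last one of its partition: one comparison per
        // record, at most one map removal per partition
        if (endOffset != null && rec.offset() >= endOffset - 1) {
            remainingEndOffsets.remove(rec.partition());
        }
    }
}
```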
Question 4: Other code smells?
Passed: Not really, just two small remarks.
The code sequence
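(The exact lines are not preserved in this post; presumably it was something like the following.)

```java
// Start reading from the latest offset if no committed offset exists
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
```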
describes what this setting does, but the setting has no effect here at all: the code manually seeks to a specific position. So the comment is misleading.
I would not have used seekToEnd() and then read the position of the next fetch for each partition explicitly, but rather returned the current end offsets directly:
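A sketch, assuming the generated code used the seekToEnd()/position() pair; endOffsets() is part of the standard KafkaConsumer API:

```java
// As generated (presumably): seek to the end, then query each partition's position
consumer.seekToEnd(partitions);
Map<TopicPartition, Long> endOffsetsViaSeek = new HashMap<>();
for (TopicPartition partition : partitions) {
    endOffsetsViaSeek.put(partition, consumer.position(partition));
}

// Instead: a single call returns the end offset of every partition
Map<TopicPartition, Long> endOffsetsDirect = consumer.endOffsets(partitions);
```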
That would save a few lines of code, nothing too important.
Question 5: Production ready?
For production readiness, certain guardrails must be followed. Does it handle errors gracefully? Does it recover from errors automatically? Does it return useful error messages? As this code is just a method and not e.g. a REST service, I do not expect much here, and what I would expect is project-specific anyway.
What is missing is logging. So let me tell it that.
Prompt: Add logging to above code.
Passed: The result is again quite good. Logger info() messages at start and end and at every interesting state change, but not too many log lines. Logger debug() output for every record. An additional try-catch block to log errors.
My only complaint is this line inside the record loop
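(The exact line is not reproduced here; it was presumably something like this, with logger being an SLF4J Logger.)

```java
// Writes the full record payload into the log files
logger.debug("Received record: key={}, value={}", record.key(), record.value());
```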
This would violate legal requirements, as it spills sensitive data (PII or even PHI data under GDPR) to IT operations. Printing the entire payload is a no-go, a fail.
Artificial Intelligence?
The code piece with Math.max(latestOffset - numberOfMessages, 0) is an example where I would have told a junior developer: "Good that you thought about that situation."
But the AI did not ask itself, "What if there are only two records in the partition and we want the last 10? In this case we cannot accomplish that, we can only return the last 2, and the start offset must be set to 0," the way a developer would. All it has seen are many code pieces dealing with offsets where the Math.max() function has been used. No understanding, just experience, a.k.a. training.
When I asked "Optimize above code for better performance", this lack of understanding became even more obvious. It added an ArrayDeque storing at most 10 items, dropping the first item before adding a new one. We discussed in the code review section above what would actually make sense.
Where does that leave us?
I have absolutely no idea. Whatever I found in this experiment will be outdated tomorrow. Different prompt, different AI model, luck - all impact the result.
Based on the code review, this is clearly written by a junior. I would not move this code to production.
What we have to keep in mind:
Software development is only as good as our description of the desired result. I did not mention the logging and performance requirements in the initial prompt. Can I blame the "developer" then? Hmm, yes, to some degree. I would expect my coworkers to know that production-ready code is expected and that performance is always a factor. We are building a solution, not sample code.
AI neither understands nor can it think logically yet. The latter is currently being worked on (see reasoning models); the former is a problem for me. Given that AI does not understand, the results are impressive. It seems that even we humans only simulate understanding most of the time.
AI is a huge time saver, as it takes a lot of code development work off our hands.
AI is a huge waste of time, as I first have to explain everything in detail, and then the result needs a careful code review and probably a rewrite.