AI Field Journal: Understanding our customer’s data woes

TL;DR

At TwelveFive, we’re on a mission to help agricultural teams move faster with better data. To do that, we needed a working demo rig that didn’t just show off tech; it had to earn its place in real-world Ag workflows. The catch? We’re a startup with no time, no team, and no access to clean data. So, we leaned into our data and AI smarts to put today’s coding assistants to the test.

AWS Q showed promise, but Aider emerged as our go-to for pair programming with GenAI. If you're leading a tech team in Ag and wondering how AI can help you build faster with fewer hands on deck, skip to the end for takeaways – or read on for the details of how we built a realistic, farm-ready dataset with just one dev and a few LLMs.

So, what are we doing?

Early on in our journey, we recognized that people wouldn’t just listen to our opinions; we had to back them up. We’ve all got a history of solving problems in Ag (and elsewhere), but what can we do now, today, to make it clear that we get what’s going on and can provide solutions? Bonus points if we can also provide an accelerator for projects, giving everyone a leg up on delivering value.

What followed was much mud being thrown at many walls to work out what shape that would take. We landed on a Demo Rig to end all Demo Rigs: effectively, a set of data and components that give us our platonic ideal of how agricultural data can flow into a data lake and derive value.

Great, but that’s a lot of work, is it not? We are, by definition, a start-up: we don’t have the resources to spin up a dev team, buy in new data, pay subscription fees, and basically deliver a whole Ag data platform. So how can we deliver this?

Well, we do have something in abundance: AI smarts. We have all worked with GenAI solutions since the old gpt-3.5 days. Indeed, one of my key responsibilities in my Accenture days was asking the question: “What does this mean for our Dev Teams?” Well, this seemed a great opportunity to smoke what we were growing and start to dev with a team of… well… me.

This is the first in a series of posts documenting our journey of getting this rig up and working, and we will naturally be focusing on the first bit first…

Creating some data we can work with

We are a consultancy; we have worked with a lot of data. We have dissected it, analyzed it, transformed it, and enriched it, but we don't actually own any of it. Creating some is our first job. We need to:

  1. Create pristine farming data, the kind of thing you would find in the perfect FMIS (that clearly does not exist!).
  2. Take those data and make them more “realistic”. As we all know, some people will create fields the size of Wyoming, not record everything that happens, and then apply “Rndp 5gal/hectare” which makes less sense every time I read it.

How can we use our small team and AI to get this done fast? We will be using Python and PyCharm for this post. I know a few languages, but Python is quick, easy, and should be relatively understandable by an LLM. 

Job 1: Creating pristine farming data with AWS Q

Creating requirements for this isn’t too hard. We want something that can generate data for each of the core entities (Fields, Cropzones, and the operations performed on them).

Additionally, we want constraints added to make the data “pristine”. For this, we can repurpose the popular Cucumber framework, a tool made to allow technical and non-technical people to work together more easily. We found this approach highly effective for both data generation and testing with GenAI. For example:

Feature: Specifics about a Cropzone
    Scenario: Ensuring a Field has Cropzones for every season between 2015 and 2025
        Given a Field with one or more associated Cropzones
        When I retrieve all Cropzone Seasons for this Field
        Then there should be exactly one Cropzone for each year between 2015 and 2025 inclusive
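These statements do double duty: the same Gherkin can drive both the generation rules and the automated tests. As a flavor of how that wiring looks, here is a minimal step-definition sketch using the behave library (the context.field fixture and its attributes are illustrative assumptions, not the repo’s actual code):

    from behave import given, when, then

    @given("a Field with one or more associated Cropzones")
    def step_field_has_cropzones(context):
        # context.field is a hypothetical fixture set up elsewhere (e.g. environment.py)
        assert len(context.field.cropzones) >= 1

    @when("I retrieve all Cropzone Seasons for this Field")
    def step_retrieve_seasons(context):
        context.seasons = sorted(cz.season for cz in context.field.cropzones)

    @then("there should be exactly one Cropzone for each year between 2015 and 2025 inclusive")
    def step_one_cropzone_per_year(context):
        assert context.seasons == list(range(2015, 2026))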

Cucumber statements were defined for every single data attribute and relationship that makes up our data estate. Check out the accompanying GitHub repo for all the constraints and reference data we created ourselves.  

For a human who knows how to code, this is trivial enough: spin over each entity in order and create the relevant items based on the rules. But I’ve got a lot to do and limited bandwidth, so can I accelerate it a little with GenAI?

Q, Who?

I think AWS (along with pretty much everyone else on the planet) were a bit back-footed when OpenAI brought out their first models, but they’ve achieved great things since. What was once CodeWhisperer is now AWS Q and comes with a number of features allowing agentic development of code, tests, and documentation.

We like

  • Agentic developers for code, testing, security, and documentation
  • A standard chat model for coding questions, code chunks, and hints
  • Inline suggestions (pretty standard nowadays but magic nonetheless)
  • Devfiles which let the agent create a dev environment to run tests, linters, or other operations

We are less fond of

  • Low feedback and long agent cycles really hurt the dev experience
  • Adding code style guidelines requires pointing it at a repository you own (and you are proud of!). You can’t just hand it a document.
  • Linting, naming, and style were mediocre, to the extent that tests often wouldn’t execute properly
  • The agents loved to refactor and muck about with existing code, and the prompt size limit meant you couldn’t give them long explanations about what to change and what to leave alone

How did we use it?

Initial Setup

Opening with a prompt to set up a basic Python environment, I got a requirements.txt, a main function, and some tests, which certainly removed some manual work. Then I began feeding it one set of constraints at a time to get it generating the correct entities.

Prompting Pains

In general, this worked okay. After each iteration I’d have to review the code and either edit it myself or suggest improvements. A good example of where it got frustrating is encapsulated in this prompt:

"Changes to this code frequently create issues with column naming. Using the csv files within the schemas directory, can you create data classes (one class per file, in their own directory) that all code within the make_data folder should use for creating datasets. After this change the data_generator.py file should output a list of classes."

A human would have spotted this much sooner, or been experienced enough to stop it being a problem in the first place. Nevertheless, this prompt fixed the problem, and once I’d stood up the tests, we were able to get it working.
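To make the fix concrete, the resulting shape was roughly this (a hand-rolled sketch; the real attribute names come from the schema CSVs in the repo, so treat these as placeholders):

    from dataclasses import dataclass

    # One class per schema CSV, shared by everything under make_data so that
    # column names are defined exactly once. Attribute names are illustrative.
    @dataclass
    class Cropzone:
        cropzone_id: str
        field_id: str
        season: int
        crop: str

data_generator.py then outputs a list of these objects instead of loosely named columns.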

Rabbit holes abound

I found I often got stuck down rabbit holes with Q, and debugging was difficult owing to how long the dev agent took to complete, with only basic feedback along the way. I don’t know about you, but that’s a bad dev experience, and it just caused me to constantly context switch.

Refactor all the things!

For a window into my pain: it tended to generate READMEs whenever it decided to do a large refactor, and by the end I had seven of them! Such large changes to the codebase were largely unnecessary, and they frequently introduced errors. When you didn’t write the code, those errors are difficult to pin down, and I often felt like I was developing on shifting sands, unable to maintain a whole picture of the (tiny) codebase.

Final Take

Overall, it was better than coding bareback, but it was very “one step forward, two steps back”. I would lean into its chat features and insert the code myself over letting the agent do its thing.

Job 2: Creating real-world data with Aider

So, we have data that almost perfectly represent a Cropzone and the operations you might perform upon it. We now need to make it bad enough to be believable.  

The code will use each of our entities, but to give the LLM a hand we are also providing it with the schema, so it doesn’t have to think too hard. We are also providing twenty error cases, such as “1% chance for a Cropzone's crop to be either blank or be replaced by a random vegetable”, which dictate how corrupted our data should get.
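For a flavor of what one of those rules looks like in code, here is a minimal sketch under our assumptions (the real twenty cases and their probabilities live in the repo’s configuration; the vegetable list here is a stand-in):

    import random

    VEGETABLES = ["carrot", "parsnip", "turnip"]  # illustrative stand-ins

    def corrupt_crop(crop: str, p: float = 0.01) -> str:
        # 1% chance the Cropzone's crop is blanked or swapped for a random vegetable
        if random.random() < p:
            return random.choice([""] + VEGETABLES)
        return crop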

Ada, right?

No, Aider (but I’m quite sure that’s on purpose). This tool is a little more old school: while there are IDE plugins, it is primarily a command line tool that requires you to bring your own API keys. This means you can pick your model, and they maintain a leaderboard of the best combinations and settings (along with how much they cost: I’m looking at you, GPT-o1). I used Sonnet 3.7 and didn’t look back.

We like

  • You choose which files to add, and it has a map of the repo so it can intelligently determine if you’ve missed anything out
  • There are three chat modes: Architect uses one model to think and another to execute, Code gives you both in one command, and Ask lets you plan an approach (which you can then ask it to execute)
  • You can add coding guidelines and standard commands, such as “always make tests, use unittest”
  • Aider creates a commit for every change so you can view the diff and roll back

We are less fond of

  • If you forget to clear the chat or drop old files, you can easily exceed your context window; Anthropic especially rate limits you at 20,000 input tokens
  • It can execute code for you and evaluate the results, but this requires a bit of plumbing
  • For daily use it’s more expensive than a subscription; for occasional use, it’s probably cheaper

How did we use it?

Initial Setup

I managed to get a lot of the initial work done with very little intervention, but I did use ChatGPT – with its longer context window – to help craft some longer prompts that covered all my coding preferences. I’ve put the initial prompt here, and it certainly made my life go a bit smoother.

Refactor none of the things!

It did start to grind as I made things more complex, but those issues were more “this doesn’t work” than “I’ve just rebuilt your whole codebase”. They could often be solved by pinging error messages to the chat or doing a manual debug. In comparison to Q, it was also resistant to changing its approach unless explicitly asked.

Maybe we should abstract that?

There were still foibles. For example, it loved to throw individual dataframes or dictionaries around and repeat interaction logic in every call. I had to create abstractions manually and then tell it to start using them, but it did pick up on my approach and work with it. One such prompt:

“I'd like to add the filter_dfs functionality in main.py to the new DataFrameManager object in dataframe_manager.py. Given the business_id and optional crop name, I want to create temporary filters on all dataframes that will be used for any operations.”

Final Take

I liked the more free-form nature of dealing with Aider, and the use of git commits made dealing with diffs and rolling back any errors trivial. Certainly, if you’re working in PyCharm, having a terminal window open with Aider active is more effective and adaptable than running a plugin.

Also, a minor point, but it continued to use iterrows on my dataframes even when I told it to stop. I know everyone does it, but it’s computational insanity on large datasets, and it ran like a dog.
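To see why it matters, compare the two styles on a toy frame (illustrative only, not our actual pipeline):

    import pandas as pd

    df = pd.DataFrame({"area_ha": [12.5, 8.0, 20.1], "rate": [2.0, 3.5, 1.2]})

    # What the LLM kept writing: a Python-level loop over every row
    totals = [row["area_ha"] * row["rate"] for _, row in df.iterrows()]

    # What you actually want: one vectorized operation over whole columns
    df["total"] = df["area_ha"] * df["rate"]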

So, you have a favorite?

If you aren’t using Cursor – either because your company won’t buy it, or because you’re just a hopeless JetBrains fanatic (guilty) – there are more tools being released every day, and these are two that are definitely worth playing with. Overall, these tools made me a lot faster: I could ask questions, speed up repetitive operations, and almost completely remove boilerplate. A less seasoned dev, though, would probably have got lost in the noise and just given up (or worse, deployed it straight to prod!).

For devs, Aider has a lot of good points, and the command line isn’t really a problem for us: if you can unpack a tar on Linux, then this is a piece of cake. It gives you the power to choose your own approach, models, and guardrails in a way that Q simply doesn’t quite yet. I’ve continued to use it in my dev workflow and learned new ways to utilize it effectively.

Personally, I found that all approaches have a habit of creating long files and complicated modules that I’d have to manually simplify, because if I asked the AI to do it then it just got even more convoluted.

Closing thoughts

Whatever you use, remember you are still in charge. Yes, the internet is full of people “Vibe Coding” (more on that in a later post) but these tools aren’t quite there yet. Get some test cases written yourself, add some old school pre-commit checks, and remember that GenAI is a wingman, not a replacement. Guide it as you would a junior dev in a pair programming session.

Your primary role as a programmer is to write code that other programmers can understand. In a world of smaller, AI driven teams, that other programmer will increasingly be an AI but – for the next few years at least – there will be a human in the loop. Treat them with respect.

Honorable mentions (AI we didn’t use)

GitHub Copilot

I know, I was surprised too. The fact is, Copilot does not yet support proper agentic dev. I can ask it to review code, generate tests, or improve a module, but I can’t say “Here are some requirements; get back to me when it’s done”. As such, it’s out. Sorry, Microsoft.*

* Edit: Yes, this is now in beta, I'll get to it later!

Cursor

It’s not that I don’t like it; I was just curious what else was out there and how it could work with my day-to-day. Cursor has a lot of power and would probably be the first thing many would reach for, but it’s notable for having its own way of doing things, over-engineering stuff, and being a bit of a black box. MCPs are a killer feature, however, and I have a later post coming on what they can bring to the table.
