AI and Software Development 2026
When ChatGPT was released in November 2022, it was already a better programmer than I was. By December, I used it consistently whenever I wrote code. In 2023, I relied on ChatGPT to create an online puzzle game for my kids. At that time, my coding was mostly architecture decisions, code reviews, refactoring, and testing. I was still writing some of the code, since it was faster to make small changes by hand than to copy the right context into ChatGPT. By early 2025, I started using Cursor, one of the first coding agents, and stopped writing code completely. Coding agents solved the context problem of chatbots: the agent had complete access to my file system and code base and could autonomously explore my computer to populate its own context. By that point, my involvement was purely architecture, requirement definition, and testing. I review AI-written code only to understand the decisions it made, not to identify bugs or offer improvements.
I have been thinking a lot about my relationship with coding and software development. My conclusion is that software engineering is about accountability, advocating for users, and defining what is good. These three things are essential to good software and require humans.
Story: An Inline List Manager
I have a family website. At first, the list of people allowed to log in was managed via a configuration value, which was a pain to view and update. So I decided to use AI to build a custom web form so I could easily manage the list. I knew what the UI should look like and how I wanted to store the list in the database, so I wrote a detailed requirements doc for the AI. The spec covered UI design, edge cases, supported operations, drag-and-drop reordering, concurrency issues, and the distinction between editing an empty list vs. a list that was never created.
Even with that spec, it took another hour and six iterations to build this. How can that be? The short answer is that I didn’t know what I really wanted until I started to use what the AI built.
In the first iteration, the AI wrote 1000 lines of code over 20 files and had a working list manager. That’s fantastic!
But the AI also got several things wrong:
- Duplicate code: Adapter, widgets, and variable names were duplicated in multiple places and there were no hints they were related to each other.
- Unclear ownership boundaries: What are the bounded contexts? What are the different domains? There were no clues about who was responsible for validation or why.
- Verbose code: Code worked but could be simplified.
- Bad naming: Variable names were confusing.
The second iteration was about fixing the issues I identified. But the AI agent introduced a bug that was not caught until I tested the form manually.
In the third iteration, I asked the AI what the code would look like if I wanted to reuse it for other kinds of lists. The AI then realized its code structure was not easily reusable.
In the fourth iteration, we looked at how the code could be further simplified. We explored which options make the code less verbose yet still follow community standards.
In the fifth iteration, we looked at how to make the UI work with different database backends (including a NoSQL DB). At first, I only wanted to get the list working. I didn’t care about making sure the code worked with different backends until I got to this point.
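Supporting different backends usually comes down to putting an interface between the UI and the storage. As a hedged sketch – the class and method names below are my own invention, not from the actual project – the idea looks like this:

```python
from abc import ABC, abstractmethod

# Hypothetical storage interface; the names are illustrative, not from the
# actual project. The form's UI code talks only to this interface, so a SQL
# or NoSQL backend can be swapped in behind it.
class ListStore(ABC):
    @abstractmethod
    def load(self):
        """Return the list of names, or None if the list was never created."""

    @abstractmethod
    def save(self, items):
        """Persist the full list, creating it if needed."""

# One concrete backend. A SQL- or NoSQL-backed class would implement the
# same two methods.
class InMemoryStore(ListStore):
    def __init__(self):
        self._items = None

    def load(self):
        return None if self._items is None else list(self._items)

    def save(self, items):
        self._items = list(items)

store = InMemoryStore()
print(store.load())  # None: the list was never created
store.save([])
print(store.load())  # []: an empty list now exists
```

A side benefit of this shape is that it makes the spec’s “empty list vs. never created” distinction explicit: `load()` returns `None` for a missing list and `[]` for an empty one.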
In the sixth iteration, we reviewed the latest state of the code and identified more places where variables and functions were misnamed. These were not correctness bugs; they were legacy names from the original design that were no longer accurate. AI agents are not proactive – they will not refactor existing code unless asked to do so.
The reason for the back-and-forth was that I didn’t know what I really wanted at first. Every iteration followed a similar pattern: the AI produced working code and satisfied the requirements. Then my review enabled me to say, “But what about…”, so another iteration followed. The process ended when my gut told me the code was finally good enough.
Story: A Word Puzzle
I wanted to make a simple word puzzle. I fired up Claude Code and started talking through my concept – rows of moveable words. The AI asked good questions about winning conditions, scoring rules, and the tech stack. Sometimes I suggested things the AI didn’t bring up. At the end of this session, the AI made the game on the first try.
The game runs in the terminal. So I started a tmux session, started the game, and asked the AI to test it. It was a slow way to test. (And expensive in terms of LLM tokens.) But it was a valid way to test. I then asked the AI if it had better ideas. It did. It offered three:
- Separate logic and presentation tests.
- Mock the `curses` window.
- Use `pyte` to emulate the terminal.
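The second idea – mocking the curses window – can be done with the standard library alone. Here is a minimal sketch, assuming a hypothetical `draw_row` function as the game’s presentation layer (the real rendering code isn’t shown in this post):

```python
from unittest.mock import Mock

# Hypothetical presentation function: draws one row of words on a curses
# window. It stands in for the game's real rendering code.
def draw_row(win, y, words):
    x = 0
    for word in words:
        win.addstr(y, x, word)
        x += len(word) + 1  # one space between words

# Mock the curses window, so the test needs no terminal at all.
win = Mock()
draw_row(win, 0, ["cat", "dog"])

# Assert on the calls the presentation layer made.
win.addstr.assert_any_call(0, 0, "cat")
win.addstr.assert_any_call(0, 4, "dog")
print("render test passed")
```

Game logic gets its own plain unit tests (the first idea), while `pyte` (the third) goes further: it feeds the program’s output through a terminal emulator so tests can assert on the rendered screen.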
The AI built a working game, which was amazing. But it didn’t think about testing. It couldn’t have known my standards for testing. My review and feedback were necessary.
The AI refactored the game and informed me it had added tests. But something didn’t look right. “Did you also use `pyte` and add integration tests?”, I asked. It had not. It implemented just the first option and didn’t mention that it skipped the other two. My review was necessary to catch that.
The integration tests caught bugs in the first iteration.
Now that the game was working, I asked the AI to add a linter, code checker, and comprehensive integration tests for the game. I simply didn’t care about these things until the game was working and I honestly wanted to get to a working game ASAP.
If any working word puzzle was good enough, then the AI really could’ve done this without needing a human. But that’s not how software is developed. People don’t know they care about separation of concerns until they experience the pain of not having it. People can’t be bothered to wait for a full CI/CD pipeline, linters, static code analysis, comprehensive tests, and a variety of other things. Most of the time, they want something they can play with ASAP, then follow up with “Now another thing…”
AI can replace the typing, but it cannot read the user’s mind. If I, the user, cannot express the full details of the final product in the initial prompt, then multiple rounds of iteration are unavoidable.
Story: Word Puzzle V2
With the success of the previous game, I wanted to try a different terminal-based puzzle game. I had a different idea I wanted to play with. I gave the AI a requirements doc and asked it to code and test it. 18 minutes later, it came back with a working game.
It took most of my day to come up with the requirements doc. I had an inkling of what I wanted in the game. The player tries to make words from randomly chosen letters but is encouraged to avoid common letters. It couldn’t be a clone of a well-known board game – that wouldn’t be interesting. The mechanics and scoring system should be different. The game should feel fun. I spent several hours talking with an AI to refine the game concept and get a sense of what would and wouldn’t work. Multiple hours of design for just 18 minutes of coding.
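The post doesn’t give the game’s actual scoring rules, but the “avoid common letters” mechanic can be sketched as a weight per letter. Everything below – the letter set and the point values – is an illustrative assumption, not the real design:

```python
# Illustrative scoring sketch: rare letters are worth more, nudging players
# away from common ones. The letter set and weights are made up for this
# example; they are not the game's actual rules.
COMMON = set("etaoinshr")  # roughly the most frequent English letters

def score_word(word):
    return sum(1 if ch in COMMON else 3 for ch in word.lower())

print(score_word("rate"))  # 4: all common letters
print(score_word("jazz"))  # 10: three rare letters pay off
```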
Coding is no longer the bottleneck. The bottleneck is knowing what you want. Sometimes you can iterate without writing a line of code. More often than not, you will need to iterate with your AI coding agent until you are satisfied. Either way, human taste and feedback are necessary for a good product.
Conclusion
My experience reflects a broader shift in how software is built. AI has changed the role of the software engineer, but a human must still be in the loop. The software engineer must therefore be responsible for the things AI cannot do. Three responsibilities remain:
- Be Accountable: Take pride in good software. Feel the user’s pain when the software doesn’t work.
- Advocate For Users: Know who the users are and what they want. Make sure the AI agents know what the users want.
- Define What Is Good: Review the agent’s work. Make subjective decisions. Trust your instincts and push back when something feels wrong.