I still fondly remember my first hands-on experiences with GitHub Copilot and Cursor. When I used GitHub Copilot for the first time (around 2023), I was in my third year of college. One evening, sitting alone in the HoD's room and working on a project, I got access to GitHub Copilot through JetBrains Rider via the GitHub Education Pack. Those were the early days when Copilot offered only autocompletions, powered by a specially trained LLM called "Codex", and ChatGPT hadn't even been released yet. Still, I was pleasantly surprised at how quickly Copilot understood my current context and produced completions good enough that I finished 2-3 evenings' worth of code within 40-45 minutes. I said to myself, "I should buy this tool once I finish college".

Fast-forward to working at Sahaj (around 2024): one day, I was given access to Cursor for software development. This was some time before Claude Code was even a thing. Cursor was touted as a better tool than GitHub Copilot, and not just for autocompletions; it had introduced a new chat-based development paradigm. I wasn't initially sold on the idea of asking an LLM in a chat box to write code, but I was fascinated by the prospect of an alternate paradigm for writing it. When I finally got my hands on it, I was amazed by the results once again. An LLM from somewhere on the internet wrote the code for you, on your own machine! Some of our colleagues jokingly suggested buying some farmland, just in case…

While we were amazed at the speed of development with Cursor, we were extra cautious about ensuring that the AI-generated code was safe. Reviews were done line by line, with extra care; in fact, we gave this generated code more scrutiny than we usually gave to code written by human peers. The code Cursor generated obviously had a slightly different style from our usual one, and we did feel some of it looked better than ours. Still, the generated code wasn't without problems:

  • The code appeared nicely written, but some of the solutions and approaches it took were weaker than the ones we would have devised ourselves.
  • Some sections were far more verbose than typical human-written code.
  • There were lots of comments in the new code. (We generally avoid comments; clean code principles say that code should be self-explanatory without them. A small sketch of what we mean follows this list.)
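
To make that last point concrete, here is a minimal, hypothetical Kotlin sketch (both functions and their names are invented for illustration): the first version leans on comments the way generated code often does, while the second says the same thing through naming alone.

```kotlin
// Comment-heavy style, typical of generated code:
fun calc(values: List<Double>): Double {
    // keep only the non-negative values
    val filtered = values.filter { it >= 0 }
    // return the average, or 0.0 if nothing is left
    return if (filtered.isEmpty()) 0.0 else filtered.sum() / filtered.size
}

// Self-explanatory style: intention-revealing names make the comments redundant.
fun averageOfNonNegative(readings: List<Double>): Double {
    val nonNegative = readings.filter { it >= 0 }
    return if (nonNegative.isEmpty()) 0.0 else nonNegative.sum() / nonNegative.size
}
```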

Fast-forward to April 2026, when the latest LLMs, such as GPT 5.5, Claude Opus 4.7 and Gemini 3.1 Pro, can basically one-shot an entire app from a single prompt. This is what people call "vibe coding" these days: the quality of LLMs is now such that even non-coders can build apps just by prompting. There are lots and lots of tools for this in the market now. Some are dev-oriented, such as GitHub Copilot, Claude Code, Codex and Cursor. Others are non-dev-oriented, such as Lovable, Bolt, v0 and Replit.

Wow! So much evolution, from a simple autocompletion tool to chat-based coding agents. We have moved from writing code by hand to writing code by prompt. The entire paradigm of software development has changed drastically, and forever, in the last 4 years. But the question is:

"How much of the generated code is directly intended by human developers?"

One may ask why this matters, especially when the generated code works as intended. A Product Manager might also ask, "If the feature works as intended, what's concerning about the code? Does your product owner or end user care about the code?"

To answer this and properly understand the implications, we'll go back to some of the fundamental principles that drove software engineering teams before the advent of LLMs. One of those fundamental principles is writing clean code. Sure, there are multiple interpretations of what "clean code" means, but all of them agree on one statement: a codebase that follows none of the clean code principles will quickly "rot". And what happens when a codebase "rots"?

  • The time and effort needed to build new features or fix bugs increase over time, which in turn hurts software delivery.
  • New team members have an even harder time understanding the codebase and getting started, given that the existing team members already struggle.
  • Most importantly, these effects grow exponentially as the size of the codebase and the number of moving parts increase.

These are the primary reasons certain operating principles are enforced in software development. You enforce certain disciplines, both inside the code and outside it, to keep your code maintainable and your software delivery loops intact.

So, yes, clean code actually matters. This means the generated code should adhere to these principles as well. The following factors decide how much of your generated code will be clean:

  1. The speed at which the new code is generated
  2. The amount of context that the LLMs take from your codebase
  3. The design of the solution for the given feature or bug

Speed of Code Generation

The primary selling point of all these LLM-powered development tools is speed. They can generate in 2-3 minutes the amount of code you would write in an entire evening. They are incredibly fast, especially compared to us, because they don't spend time "thinking" about or "crafting" code the way we do. We, on the other hand, need to spend time and effort to think, plan, craft and type code by hand, line by line. Honestly, coordinating all of those tasks demands a tremendous amount of mental energy over long stretches of time.

Now, with LLMs, the code you need is almost fully generated in just a few minutes, without spending all that energy and thought. You take a few more minutes to test it and verify that it works. Suddenly, you feel like you're delivering features faster than ever. But hold on a minute before you pick up the next feature! Although your code works, if it doesn't follow the prescribed clean code practices and your codebase standards, your codebase starts to rot. More importantly, it rots faster.

Your codebase rots faster because bad code is added faster. Accumulate enough bad code and your codebase becomes what people call a "Big Ball of Mud". Once your codebase is complex enough, the LLM starts making confused decisions, bloating its context window with unnecessary files and decisions, writing even worse code and consuming a lot of tokens along the way (check your LLM costs and credit card bills).

This is where code reviews become more important than ever. If the code is bad, the next best move is to ask the LLM to either clean it up or redo it properly, until it gets it right. This might feel like cutting off your momentum but, trust me, by slowing down a bit and taking the time to review the code, you are actually saving your codebase from dying. Who knows, you might even catch the LLM deleting one of your previously working features!

Amount of Codebase Context

The next deciding factor is the amount of codebase context the LLM uses. You want the LLM to develop a complete feature, so you describe the feature, give it some code files for context and ask it to write the full feature for you. Note that the LLM doesn't have the context of the whole codebase, only gists of it.

I know some modern LLMs come equipped with larger context windows of up to 2 million tokens (such as Grok 4.1). But I am quite sure they can't hold the whole codebase in their context window, even for medium-sized repositories, let alone non-code context such as docs, mock-ups, etc. They can't remember as broad a range of things as we do, and they can't care about everything at once the way we do while writing code. LLMs are not the J.A.R.V.I.S. AI we saw in the MCU films, which seemed to keep accumulating context over the years, like humans do.

When these LLMs write code, they have the context of only a few sections of our codebase. And if the amount of code they refer to is too little to understand the codebase, the generated code will deviate more from your codebase practices. This means your codebase starts losing its uniformity, following different patterns in different places. Each part might look good on its own, but when all those parts are placed together in the big picture, you'll feel you have created a Frankenstein of a codebase.

So now we know that an LLM can't hold your entire codebase in its context, and that the fewer (quality) files we give it as context, the worse the code it writes. The solution is simple: provide some of your existing high-quality source files to the LLM as references that represent the principles followed in the rest of your codebase. These files should look like your perfect code (or at least close to it). The LLM can pick up the patterns from these context files and generate code that follows the same patterns.

This simple concept is what gave rise to convention files such as CLAUDE.md, .copilot-instructions.md, AGENTS.md and .rules. Modern coding agents treat these files as the place where you describe your codebase to the LLM. Once you provide the necessary rules, standards and patterns as plain text in these files, the agent instructs the LLM to follow them, so that the generated code has a quality similar to the high-quality code you and your colleagues usually write by hand.
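
As a minimal sketch, such a file might look something like this (the specific rules and file paths below are invented for illustration; a real file should describe your project's actual conventions):

```
# AGENTS.md – project conventions (example rules, invented for illustration)

## Code style
- Prefer small, intention-revealing functions; avoid explanatory comments.
- One feature per package under features/, mirroring the existing layout.

## Patterns to follow
- Use the repository pattern as done in features/orders/OrderRepository.kt.
- Validation lives at the API boundary, never inside domain logic.

## Things to avoid
- No new dependencies without prior discussion.
- Do not duplicate logic that already exists under common/.
```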

Design of the Solution

This is the most underrated and yet the most important deciding factor on the list. In classical software engineering, modelling the solution for a given feature or problem is an art. There are developers who deliberately spend dedicated time modelling the solution before writing any actual code. They model the solution for correctness, robustness, ease of use and long-term ease of maintenance. With LLMs, more often than not, it's easy to jump straight into writing code without devising a proper solution first. Remember that LLMs are trained on patterns, so they churn out "obvious" solutions without a proper, deep trade-off analysis. They often overlook real-world aspects necessary for the solution. The result may be functional, but it will mostly be brittle. Now, let's take a look at a non-exhaustive list of problems usually associated with poorly modelled solutions (a small sketch of the first one follows the list):

  • Missing edge cases
  • Poor performance
  • Poor scalability
  • Unnecessarily complicated to understand
  • Duplicated logic in the code
  • Overlooked security issues, and so on
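
To make "missing edge cases" concrete, here is a minimal, hypothetical Kotlin sketch (the functions and their rules are invented for illustration): the "obvious" version works on the happy path, while a deliberately modelled version also handles the inputs that break it.

```kotlin
// The "obvious" generated solution: fine on the happy path only.
// With diners == 0 it silently returns Infinity, and it happily
// accepts a negative bill total.
fun splitBill(total: Double, diners: Int): Double =
    total / diners

// A version written after actually modelling the problem:
fun splitBillSafely(total: Double, diners: Int): Double {
    require(diners > 0) { "There must be at least one diner" }
    require(total >= 0) { "The bill total cannot be negative" }
    return total / diners
}
```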

In short, suboptimal architectural designs eventually "rot" your codebase by accumulating hidden technical debt. Most importantly, a poor upfront design rots the codebase faster than a bad implementation does. The design is the skeleton of your solution: if your skeleton is weak, nothing else in your body can compensate for the lost strength. You can't patch over a poor design with other clean code practices. No matter how diligently clean code practices are followed, a poor design always demands technical debt at a later point, to be paid off by replacing it with a better design. That's why modelling solutions is considered an important art in software engineering.

So now one can see that using LLMs to fully generate solutions can lead to bad results and, more importantly, at a faster pace. At the same time, one must understand that modelling proper solutions is a universal problem, both with and without LLMs. And here lies the way out: use the LLM to explore possibilities and make adjustments to your design, including in areas outside your own expertise or abilities, but still own the design by being its primary architect. Every part of the design should be intentional and made with care. It is also better to ping-pong with the LLM to refine the design instead of trying to one-shot it. Once you get the design right, writing the code for it should be a cakewalk. Even for the code, you might need to ping-pong to generate exactly what you intended, which is totally fine if you want high-quality code.

Remember, this is where the thin line between AI-assisted development and vibe coding lies:

  • You care only about the features and let the LLM do the design – that's "vibe coding".
  • You care about both the features and the design while using the LLM for help – that's "AI-assisted development".

This subtle difference always comes down to the ratio of responsibility the developer takes versus what the LLM takes.

Final Word

Let's go back to our original question: "How much of the generated code is directly intended by human developers?"

We now have a proper answer to it. Assuming you take care of the three deciding factors above:

  • You are ensuring that your solutions have a proper design.
  • You are giving the LLM enough context to understand your codebase and styles.
  • You are taking the time to review all of your LLM-generated code.

Once you ensure the above, you'll realize you are not just delivering fast, but delivering with the same high quality as before. Most of the generated code is now exactly what you intended it to be: high-quality code that you fully understand, even without writing it yourself. And it is generated at a faster pace. As a developer, you are still exercising your brain to solve problems instead of offloading everything to the AI, which means you won't lose any of your problem-solving abilities. These abilities are going to be even more valuable in this AI era, not just for developers, but also for organizations and their customers.

"I suppose it is tempting, if the only tool you have is a hammer, to treat everything as if it were a nail".