Written by Amitb and Vighnesh Pathrikar

Test Driven Generation Cycle

Have you ever struggled to refactor your code so that it adheres to SOLID principles, or to convert user requirements into code? Has TDD (Test-Driven Development) ever felt slow? Have you ever wished for a tool that could make this process easier and more efficient? Well, look no further! With ChatGPT, one of the latest generative AI tools, you can refactor your code, convert user requirements into code, and even create tests, all while adhering to SOLID and other principles.

TDD stands for Test-Driven Development, which is a software development approach that emphasizes writing tests before writing the actual code. It follows a cyclical process of writing a failing test, writing the code to make the test pass, and then refactoring the code to improve its design without changing its behavior. This iterative cycle is often referred to as the “Red-Green-Refactor” cycle.
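To make the cycle concrete, here is a minimal sketch in Python. The `slugify` function and its tests are hypothetical examples, not taken from this article; the tests are plain functions using `assert`, in the style that pytest collects. The tests are written first (RED), then the simplest implementation that makes them pass (GREEN). Any clean-up afterwards must keep both tests passing (REFACTOR).

```python
# RED: these tests are written first and fail until slugify exists and behaves as specified.
def test_slugify_lowercases_and_joins_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"


def test_slugify_strips_surrounding_whitespace():
    assert slugify("  Clean Code  ") == "clean-code"


# GREEN: the simplest implementation that makes both tests pass.
def slugify(title: str) -> str:
    return "-".join(title.lower().split())


# REFACTOR: improve names or structure freely, re-running the same tests after each change.
if __name__ == "__main__":
    test_slugify_lowercases_and_joins_words_with_hyphens()
    test_slugify_strips_surrounding_whitespace()
    print("all tests pass")
```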

SOLID is an acronym that stands for five design principles: Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion. These principles promote modular design, separation of concerns, and reduced coupling between components, which can make code more maintainable, testable, and extensible.
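As a small illustration (a sketch, not a canonical implementation), the Python below shows how a few of these principles make code easier to test. `Notifier`, `OrderService` and `InMemoryNotifier` are hypothetical names. `OrderService` depends on the `Notifier` abstraction rather than on a concrete email or SMS class (Dependency Inversion), has a single job (Single Responsibility), and can work with new notifier types without being modified (Open-Closed).

```python
from typing import Protocol


class Notifier(Protocol):
    """Abstraction the high-level code depends on (Dependency Inversion)."""

    def send(self, recipient: str, message: str) -> None: ...


class InMemoryNotifier:
    """Test double: records messages instead of sending them."""

    def __init__(self) -> None:
        self.sent: list[tuple[str, str]] = []

    def send(self, recipient: str, message: str) -> None:
        self.sent.append((recipient, message))


class OrderService:
    """Single responsibility: decides *what* to notify, not *how* it is delivered."""

    def __init__(self, notifier: Notifier) -> None:
        self._notifier = notifier

    def confirm(self, customer: str, order_id: int) -> None:
        self._notifier.send(customer, f"Order {order_id} confirmed")


notifier = InMemoryNotifier()
OrderService(notifier).confirm("ada@example.com", 42)
print(notifier.sent)  # [('ada@example.com', 'Order 42 confirmed')]
```

Because any object with a matching `send` method can be passed in, a test can use the in-memory double instead of a real mail server, and a new delivery channel is added by writing a new class rather than editing `OrderService`.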

Code refactoring is the practice of improving existing code, without changing what it does, so that it is easier to read, understand, and maintain. It is done to improve code quality, reduce complexity, and make the code more flexible. Refactoring regularly keeps problems from piling up, which saves time and effort in the long run. It is an important part of software development that allows developers to continuously improve their code over time.
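A small before-and-after sketch in Python illustrates the idea. The invoice functions are hypothetical examples; the key line is the final `assert`, which checks that the refactored version returns exactly what the original did.

```python
# Before: one function filters, calculates and formats all at once.
def invoice_total_before(items):
    total = 0
    for item in items:
        if item["qty"] > 0:
            total = total + item["price"] * item["qty"]
    return "Total: " + str(round(total, 2))


# After: each responsibility has a name; the observable behaviour is identical.
def line_total(item: dict) -> float:
    return item["price"] * item["qty"]


def invoice_total(items: list) -> float:
    # Skip lines with zero or negative quantity, exactly as the original did.
    return sum(line_total(item) for item in items if item["qty"] > 0)


def format_total(amount: float) -> str:
    return f"Total: {round(amount, 2)}"


def invoice_total_after(items: list) -> str:
    return format_total(invoice_total(items))


items = [{"price": 2.5, "qty": 2}, {"price": 10.0, "qty": 0}, {"price": 1.25, "qty": 4}]
assert invoice_total_before(items) == invoice_total_after(items)
print(invoice_total_after(items))  # Total: 10.0
```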

We propose a new way to write software, with ChatGPT as your programming pair, which we call Test Driven Generation, or TDG. TDG is a software development methodology that emphasises writing tests first and then generating the code to meet those tests. It differs from the common “code first, test later” approach because it focuses on ensuring that the code is testable, maintainable, and extensible from the very beginning. In this respect it is similar to Test Driven Development, or TDD.

The traditional Red-Green-Refactor cycle is a fundamental concept in TDD. It involves three steps: writing a failing test (RED), writing the minimum code required to pass the test (GREEN), and refactoring the code (CYAN) to improve its maintainability, readability, and efficiency without changing its behaviour. The cycle is repeated for every new piece of functionality, and it helps developers write better code because every change is backed by a test and the design is continuously cleaned up.

The TDG process reinvents the traditional Red-Green-Refactor cycle with generative AI in the mix. The following steps make up a TDG cycle:

  1. Write tests: The developer writes tests that assert the desired behaviour of the code, based on the requirements gathered up front. These tests should be specific, clear, and easy to understand, because they are the input the generative model uses to generate code.
  2. Generate code: Once the tests are in place, the generative AI is prompted to generate code that passes them. The initial code may be rudimentary and may not follow clean code principles, which makes the subsequent refactoring step essential.
  3. Refactor: The generated code is refactored in collaboration with the generative AI to improve its design, readability, and maintainability. This step helps keep the code scalable and extensible over time.
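The three steps above can be sketched as a loop. This is a minimal Python sketch under stated assumptions, not the authors' plugin: `generate` stands for any hypothetical function that sends a prompt to a model (for example, a wrapper around the ChatGPT API) and returns source code; the generated module is assumed to be named `solution.py`; and the tests are plain `test_*` functions run in a separate process. Failure output is fed back into the next prompt. Once the tests pass (GREEN), refactoring can proceed with the same `run_tests` check as a safety net.

```python
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Callable, Optional, Tuple

# Hypothetical: any function that sends a prompt to a model and returns Python source.
Generator = Callable[[str], str]

# Minimal runner so the sketch needs no test framework: call every test_* function.
RUNNER = """
import test_solution

for name in dir(test_solution):
    if name.startswith("test_"):
        getattr(test_solution, name)()
"""


def build_prompt(tests: str, feedback: str = "") -> str:
    """The tests are the specification handed to the model (step 2)."""
    prompt = (
        "Write a Python module named solution.py that makes these tests pass. "
        "Reply with code only.\n\n" + tests
    )
    if feedback:
        prompt += "\n\nThe previous attempt failed with:\n" + feedback
    return prompt


def run_tests(code: str, tests: str) -> Tuple[bool, str]:
    """Run the tests against candidate code in a fresh Python process."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "solution.py").write_text(code)
        Path(tmp, "test_solution.py").write_text(tests)
        Path(tmp, "run_tests.py").write_text(RUNNER)
        result = subprocess.run(
            [sys.executable, "run_tests.py"],
            cwd=tmp,
            capture_output=True,
            text=True,
            timeout=30,
        )
    return result.returncode == 0, result.stderr


def tdg_cycle(tests: str, generate: Generator, max_attempts: int = 3) -> Optional[str]:
    """Generate code until the tests pass (GREEN); return None if they never do."""
    feedback = ""
    for _ in range(max_attempts):
        code = generate(build_prompt(tests, feedback))
        passed, feedback = run_tests(code, tests)
        if passed:
            return code  # next: refactor with the model, re-running run_tests each time
    return None


def fake_model(prompt: str) -> str:
    # Stand-in for a real model call: ignores the prompt, returns a fixed candidate.
    return "def slugify(title):\n    return '-'.join(title.lower().split())\n"


TESTS = '''
from solution import slugify


def test_slugify():
    assert slugify("Hello World") == "hello-world"
'''

if __name__ == "__main__":
    print(tdg_cycle(TESTS, fake_model) is not None)  # True
```

Running the generated code in a separate process keeps a broken or misbehaving candidate from crashing the loop itself; in a real setup you would also want stricter sandboxing before executing model output.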

The main difference between TDG and TDD is who produces the implementation: in TDD the developer writes the code that makes the tests pass, while in TDG a generative model generates that code and the developer reviews and refines it. Because the tests pin down the requirements, TDG lets developers generate code that meets those specific requirements, which can lead to better code quality and fewer bugs.

A prototype illustrating the concept of TDG with a virtual pair

We have built a small prototype VS Code plugin that supports this workflow. It does not support in-place editing yet, but it can demonstrate the workflow on well-scoped use cases. The plugin is like having a virtual pair programmer that never tires and is always willing to collaborate without being distracted. It takes your tests and uses ChatGPT to generate code that aims to follow clean code principles and ideas like SOLID, YAGNI, KISS, and DRY. It also helps you look at problems from different perspectives, which is useful when you’re feeling stuck and need a fresh set of eyes. And let’s be real, we could all use a little encouragement to write more tests.

Some of you may be wondering what all this jargon means. Here’s a brief explanation of each principle:

  1. SOLID: As introduced above, SOLID names five design principles (Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, and Dependency Inversion) that favour modular, loosely coupled code that is easier to maintain, test, and extend. You can read more at https://www.bmc.com/blogs/solid-design-principles/ .
  2. YAGNI: YAGNI stands for “You Ain’t Gonna Need It.” It is a principle that suggests that you should not implement functionality until it is actually needed. This helps prevent over-engineering and wasted effort, and can result in simpler, more focused code. You can read more at https://martinfowler.com/bliki/Yagni.html .
  3. KISS: KISS stands for “Keep It Simple, Stupid.” It is a principle that emphasizes the importance of simplicity in design and implementation. Simple code is often easier to understand, test, and maintain than complex code. KISS encourages developers to avoid unnecessary complexity, duplication, and abstraction, and to favor straightforward, readable solutions. You can read more at https://www.baeldung.com/cs/kiss-software-design-principle .
  4. DRY: DRY stands for “Don’t Repeat Yourself.” It is a principle that promotes code reuse and maintainability by reducing duplication. DRY encourages developers to extract common functionality into reusable modules, functions, or classes, and to avoid repeating the same code or logic in multiple places. This can help reduce errors, improve consistency, and simplify maintenance. You can read more at https://www.baeldung.com/cs/dry-software-design-principle .
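These principles often show up together. In this hypothetical Python sketch, the name-validation rule lives in one function instead of being copied into every caller (DRY), and a plain dictionary stands in for a user type because nothing yet requires more (KISS, YAGNI). `MAX_NAME_LENGTH` and its 30-character limit are assumed values for illustration.

```python
MAX_NAME_LENGTH = 30  # hypothetical business rule


def clean_name(name: str) -> str:
    """Single home for the name rule (DRY): change it here, not in every caller."""
    name = name.strip()
    if not name or len(name) > MAX_NAME_LENGTH:
        raise ValueError(f"name must be 1-{MAX_NAME_LENGTH} characters")
    return name


def create_user(name: str) -> dict:
    # KISS: a plain dict is enough until a richer type is actually needed (YAGNI).
    return {"name": clean_name(name)}


def rename_user(user: dict, name: str) -> dict:
    return {**user, "name": clean_name(name)}


user = rename_user(create_user("  Ada "), "Grace")
print(user)  # {'name': 'Grace'}
```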

Here are a few more applications of the plugin that we identified.


If a codebase is becoming difficult to manage and maintain, you can run the plugin on it: the AI analyses the code and returns a refactored version along with an explanation of what it changed and why. You can tweak the result further by hand, but the generative model can make a first pass at improving it.

Prompt to code

The plugin can also help you convert user requirements into code: the AI analyses the requirements and generates code for them. This works best for simple boilerplate tasks, since such snippets are repetitive and tedious, need little active thinking, and are therefore a good place to let a model do most of the grunt work.


Generative AI’s ability to turn prototypes around quickly is already getting a lot of traction. There is a clear benefit in shortening the cycle from idea to market, and because the plugin can also create tests, it can help ensure that a prototype does not trade away all aspects of quality for speed.

However, this approach is not without drawbacks. Some of them are:

  1. Poorly written tests will result in poor code generated by the AI. If the tests are flawed, incomplete, or biased, the AI receives a flawed or incomplete specification and will generate code that satisfies it, faults and all. As the old saying goes, “Garbage in, garbage out.”
  2. Models are limited in how many tokens they can handle in a single request (their context window). gpt-3.5-turbo, for example, launched with a 4,096-token window shared between the prompt and the response. If you have ever worked with a larger codebase, you know that a few thousand tokens is nowhere near enough. The good news is that these limits keep growing, so there is promise that generative AI will be able to handle larger codebases soon.
  3. The generated code may not actually work. It is up to the developer to verify the code generated by the AI and turn it into working code. It is vital to remember that this approach is a collaboration between a human and an AI, aiming for the best of both worlds: speed and quality.
  4. Large Language Models (LLMs), the AI models behind tools like ChatGPT, generate human-like text after being trained on massive amounts of data. They generally work better with older, widely used languages like C++ and Java, since far more code in those languages is available to train on; results in newer or less common languages may be weaker.
  5. Your code will be sent to a third-party, for-profit organisation, which may not be acceptable to the organisation you work for. Some LLM services state that submitted code is used only to generate suggestions, is not used for training, and is not retained on their servers; however, if you are dealing with sensitive code, it is best to exercise caution.
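One common way to cope with the context-window limit in the second point is to send a large file in pieces. The Python sketch below splits source code into line-aligned chunks under a token budget. It uses a rough heuristic of about four characters per token, which is an assumption; for exact counts you would use the model's own tokenizer (for OpenAI models, the `tiktoken` library). `estimate_tokens` and `chunk_source` are hypothetical helpers.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    # Assumption: roughly 4 characters per token; use the model's tokenizer for exact counts.
    return max(1, len(text) // 4)


def chunk_source(source: str, budget: int) -> List[str]:
    """Split source into line-aligned chunks whose estimated size stays within budget.

    A single line larger than the budget still becomes its own (oversized) chunk.
    """
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for line in source.splitlines(keepends=True):
        cost = estimate_tokens(line)
        if current and used + cost > budget:
            chunks.append("".join(current))
            current, used = [], 0
        current.append(line)
        used += cost
    if current:
        chunks.append("".join(current))
    return chunks


source = "x = 1\n" * 100
parts = chunk_source(source, budget=20)
print(len(parts))  # 5
```

Keep the budget well below the model's context window, since the prompt instructions and the model's reply share the same window. Splitting on line boundaries keeps statements intact, but the model still sees only one chunk at a time, so cross-file refactoring remains hard.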

Many of these limitations should ease as models improve, so the potential gains are high.

Conclusion and Future Work

In conclusion, Test Driven Generation (TDG) using AI as a pair may just be the beginning of a disruption in software development. While there are limitations to consider, the potential gains in efficiency, code quality, and maintainability make it worth exploring. As the saying goes, “two heads are better than one” — especially when one of them is an AI trained on millions of lines of code. And hey, if all else fails, at least we’ll have a cool new tool to show off to our programmer friends.

Future work on Test Driven Generation (TDG) includes quantifying the gains in efficiency, code quality, and maintainability it can offer to software development. This includes developing more advanced AI models, integrating TDG with existing software development workflows, and creating new tools and frameworks to support it. TDG could also improve software testing and debugging, leading to faster and more accurate detection and resolution of bugs. Overall, the combination of AI and TDG holds significant promise for the future of software development.