How much of it counts as “the present” is mostly a matter of what we use it for and how it’s applied. There are big claims about the tasks AI can perform (with or without human guidance). While most people are still experimenting with AI in their personal lives, businesses have begun making serious investments in using it to augment or replace human workers.
One area where this is definitely true is software development. Google estimates that over 25% of its new code is generated by AI, and I suspect that percentage is already outdated. Microsoft estimates that 30% of its code is AI-generated. For these companies, and many others, those percentages are going to increase dramatically over the next several months and years.
With all of this AI-generated code, you might think it’s only a matter of time before the role of software developer is obsolete. I don’t think that’s going to happen, and here’s why. Most developers spend more time debugging than writing code, and since neither humans nor AI write perfect code, debugging will always be a critical task. In fact, it is probably more important that AI tools can help with debugging code than with writing it.
Debug-gym is an environment designed for training and evaluating AI coding tools, primarily those based on large language models (LLMs). Its purpose is to teach agents to debug code interactively, the way human programmers do. Current AI coding tools do well at suggesting fixes by analyzing code and error messages, but they can’t seek out additional information when their proposed solutions don’t work.
Human programmers debug iteratively. First, they analyze the code (and sometimes the requirements) to form a hypothesis about what might be wrong. Then they gather evidence by stepping through the code with a debugger, using what they find to locate and repair the problem. They repeat these steps until the code works.
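To make that loop concrete, here is a minimal sketch of a debugging session using Python’s built-in pdb. The buggy function and variable names are hypothetical, purely for illustration; the pdb commands shown in the comments are the standard ones.

```python
# Step 1: observe a failure in a (hypothetical) buggy function.
def average(values):
    total = 0
    for i in range(len(values) - 1):   # off-by-one: skips the last element
        total += values[i]
    return total / len(values)

# average([2, 4, 6]) returns 2.0, but 4.0 was expected.
#
# Step 2: form a hypothesis, then gather evidence interactively:
#   >>> import pdb; pdb.run("average([2, 4, 6])")
#   (Pdb) break average      # stop inside the function
#   (Pdb) continue
#   (Pdb) next               # step line by line through the loop
#   (Pdb) p total, i         # print state: total never includes values[2]
#
# Step 3: apply the fix and re-run the failing case.
def average_fixed(values):
    total = 0
    for i in range(len(values)):       # iterate over every element
        total += values[i]
    return total / len(values)

print(average([2, 4, 6]))        # buggy result: 2.0
print(average_fixed([2, 4, 6]))  # fixed result: 4.0
```

The point is the cycle itself: hypothesize, inspect live state, fix, and re-run until the failing case passes.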
Debug-gym provides AI agents with access to a toolbox of interactive debugging tools, expanding their capabilities beyond simply rewriting code. These include an interface to Python’s pdb debugger, which lets an agent set breakpoints, step through execution, and inspect variable values, alongside tools for viewing source files and re-running code.
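As a rough illustration, here is a hypothetical sketch of how such an agent toolbox might be wired up. The tool names, the `dispatch` function, and the helper functions are my own illustrative assumptions, not debug-gym’s actual API.

```python
import subprocess

def run_tests(path="."):
    """Hypothetical 'eval' tool: run the test suite, return its output."""
    result = subprocess.run(
        ["python", "-m", "pytest", path],
        capture_output=True, text=True,
    )
    return result.stdout + result.stderr

def view_file(path):
    """Hypothetical 'view' tool: return the contents of a source file."""
    with open(path) as f:
        return f.read()

# The agent picks a tool by name each turn, observes the result,
# and folds that observation into its next decision.
TOOLS = {
    "eval": run_tests,
    "view": view_file,
    # a real toolbox would also expose an interactive debugger (pdb)
}

def dispatch(tool_name, *args):
    """Route an agent's tool call to the matching function."""
    return TOOLS[tool_name](*args)

# Example: the agent re-reads a file before proposing a fix.
# content = dispatch("view", "buggy_module.py")
```

The design point is that each tool returns text the model can condition on, so the agent can gather new information between attempts rather than guessing from the original prompt alone.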
Early research used a simple prompt-based agent in debug-gym to explore how well LLMs can use interactive debugging tools. The experiments compared agent performance with and without access to the tools. Even with them, the simple agent rarely solved more than half of the SWE-bench Lite issues.
This tells me we are not there yet when it comes to using AI for debugging. The agent with debugging tools did perform significantly better than the agent without them, which validates interactive, tool-assisted debugging as a promising research direction. But AI debugging still isn’t as good as having a developer do it.
The observed tool usage patterns showed that stronger models used a wider variety of tools and even explored the project structure out of apparent curiosity. Future work in this area will involve training or fine-tuning LLMs specifically for interactive debugging using specialized data, such as trajectory data that records interactions with debuggers. There’s also interest in having AI generate its own tests during debugging, improving trustworthiness (ensuring fixes address root causes), and expanding beyond Python and pdb.
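To give a sense of what that trajectory data might look like, here is a hypothetical sketch: a log of each debugger interaction an agent makes during a session, serialized so it could later be collected into a fine-tuning corpus. The record structure and field names are illustrative assumptions, not any published schema.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class DebugStep:
    step: int          # position in the debugging session
    tool: str          # which tool the agent invoked (e.g. "pdb")
    command: str       # the exact command sent to the tool
    observation: str   # what the tool returned to the agent

# A (made-up) two-step fragment of a session.
trajectory = [
    DebugStep(0, "pdb", "b average", "Breakpoint 1 at example.py:3"),
    DebugStep(1, "pdb", "p total", "6"),
]

# Serialize the session for collection into a training corpus.
record = json.dumps([asdict(s) for s in trajectory])
print(record)
```

Training on many such records is one plausible way to teach a model the command-observation rhythm of interactive debugging, rather than having it learn only from static code and error messages.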
The bottom line is that debugging is a task best left to human developers right now. AI agents with debugging tools can’t match, let alone outperform, their human counterparts. They lack the ability to seek out the information needed to diagnose a problem, and they struggle with the iterative process that successful debugging requires.
In the future, as these tools and the models behind them improve, AI may take on more debugging work, freeing human developers for other tasks. Even then, a human developer will still do the best job at debugging, because the hardest problems rely less on pattern recognition and more on human judgment. Human developers understand the entire system, including configurations, databases, and third-party services. They also understand the business logic and the edge cases that AI tools struggle to recognize.
There is another side to AI’s inability to handle the scenarios mentioned above: humans will need to get better at communicating requirements, so that no ambiguity or implicit domain knowledge is required to develop the solution. If that happens, AI may add more value than it does today. Even so, it still won’t be as valuable as a human team member.
Check out our Agile 101 workshop, suitable for anyone looking to learn the basics of Scrum or Kanban.