Why Microsoft’s use of AI-generated code is going to enshittify its software further
As early as April 2025, Microsoft CEO Satya Nadella said in an interview:
I’d say maybe 20%, 30% of the code that is inside of our repos today and some of our projects are probably all written by software
In other words, he is saying that as much as 30% of Microsoft code is written by AI.
Let that sink in.
In the near future, we can expect a larger percentage of code in Microsoft software to be written by AI. I expect that the quality of Microsoft’s software is going to deteriorate.
A sanguine view of AI-written code
Someone on my LinkedIn post asked a very valid question:
Does it matter if code is human or AI generated? It has to be tested and pass specific criteria before it is released or passed to the next phase of development.
This is a fair point. In theory, as long as the code passes Quality Assurance (QA) testing, it should not matter whether it was written by a human or by an AI. But in practice, there are complicating factors.
In practice, Microsoft software engineers are slowly driven ‘insane’ by the AI (Copilot) engineer
A few months ago, someone wrote this post on Reddit: “My new hobby: watching AI slowly drive Microsoft employees insane”.
In this post, the original poster had been watching a software development project at Microsoft in which Copilot (the AI) was assigned the role of an engineer. It is quite entertaining to watch the human software engineers interact with the AI engineer, which drives them to exasperation.
For example, Copilot contributed to this bug fix. Then the human engineer chastised Copilot,
This seems like it’s fixing the symptom rather than the underlying issue? What causes us to get into this situation in the first place, where we end up with an invalid index into the backtracking stack?
Copilot responded with:
You’re right that this fix addresses the symptom rather than the root cause.
But the infuriating thing is that while Copilot agreed with the human, it did nothing more. After reading the discussion between Copilot and the human engineers, Steve Gibson commented in Episode 1027 of his Security Now podcast:
So now we have a patch on a patch and no idea why the trouble appeared in the first place, where none of this should have ever been needed. And this “patched patch” has been merged into the DotNET code base. So whatever underlying flaw caused the original trouble and required the addition of an explicit bounds-check when none should have been required – which Stephen [the human engineer] explicitly asked Copilot about – has not been eliminated. It’s been covered up. And then, having done what he could to get this resolved, Stephen Toub finally accepted and closed this problem report. So it’s no longer a problem, right?
…
Stephen Toub is doing his job. He’s not being upset, because his job is no longer fixing problems. His job has changed to overseeing Microsoft’s Copilot sweeping actual problems under the rug, patching the symptoms when and as they pop-up while blithely ignoring their underlying cause. I’m seeing a term used more and more, and I’m not a big fan of its overuse. But this does feel like the automation of the “enshitification” of Windows. Stephen’s original question to Copilot suggested that he knows the proper way to solve it and that if this were still his responsibility he would have worked to understand the root cause of the erroneous backtracking stack index – which he asked Copilot about – rather than simply resolving the crash by adding a test to prevent the out-of bounds read. But this is no longer his problem.
But perhaps Stephen is an exception at Microsoft? Perhaps this is the way Microsoft’s coders have been dealing with such problems all along? In that case this doesn’t really represent any change. This would explain why they never seem to get ahead of the need to continually patch their mistakes. It seems to me that making it quicker and easier to patch edge cases that may cover up underlying structural problems will have the effect of accelerating the crumbling of an aging infrastructure.
What is the problem?
Sweeping bugs under the carpet
From this, we can see that AI is good at patching the symptoms of a bug, while humans can see the deeper root cause of the issue. It seems to me, however, that Microsoft’s internal mandate to use AI to write code is preventing humans from doing their job. The net effect is that problems in the software are swept under the rug by AI.
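To make the distinction concrete, here is a minimal Python sketch of the two kinds of fix. This is not the actual .NET regex code; the bug, the function names (`symptom_fix`, `root_cause_fix`), and the data are all hypothetical, chosen only to illustrate the pattern of an index into a backtracking stack going out of range.

```python
# Hypothetical sketch of a symptom patch versus a root-cause fix.
# The imagined bug: an index into a backtracking stack can go out of range.

def symptom_fix(stack, index):
    # Patch where the crash surfaces: guard the read and move on.
    # The invalid index still gets produced somewhere upstream;
    # we just stop looking at it.
    if 0 <= index < len(stack):
        return stack[index]
    return None

def root_cause_fix(stack):
    # Repair the logic that produced the index, so no guard is needed:
    # derive the index from the stack itself, valid by construction.
    if not stack:
        return None
    return stack[len(stack) - 1]
```

The symptom patch makes the crash disappear; only the root-cause fix answers the question the human engineer actually asked, namely how an invalid index arose in the first place.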
AI is doing QA on its own code
Notice that in this contribution, Copilot wrote both the bug-fix code and the QA test. Of course code written by Copilot will pass QA tests that Copilot itself wrote!
The question is, will there be separate, independent testing that is thorough and robust enough to catch deeper problems? My hunch is probably not, given Microsoft’s reputation.
Summary of the situation
- Microsoft has a large portfolio of software products. Their flagship product, Windows, is already extremely huge and complicated. Thoroughly testing Windows alone is a colossal task.
- As early as 2019, Microsoft decided to cheap out on testing. A former Microsoft insider explained what happened in this video.
- AI is now writing code for Microsoft software products. As shown in the interaction between the AI (Copilot) and a human software engineer, the human was prevented from doing his job of fixing bugs at the root, leaving the AI to patch the symptoms of the bug and sweep the problems under the carpet.
Sign of things to come?
I do not have a good feeling about this. As Steve Gibson commented,
It seems to me that making it quicker and easier to patch edge cases that may cover up underlying structural problems will have the effect of accelerating the crumbling of an aging infrastructure.
If your organisation relies extensively on Microsoft’s operating systems in its IT infrastructure, you really have to think carefully about what this trend means and make your decisions accordingly.