Researchers at Microsoft have developed an AI which can find bugs in code, helping developers to debug their applications more accurately and efficiently.

Miltos Allamanis, Principal Researcher, and Marc Brockschmidt, Senior Principal Research Manager, developed their AI, BugLab, in much the same way that Generative Adversarial Networks (GANs) are trained.

Microsoft set two networks against each other: one designed to introduce small bugs into existing code, and another aimed at finding those bugs. As the two competed, both improved, resulting in a detector that could identify bugs hidden in real code.

The advantage of this approach is that training is completely self-supervised and does not need labelled data.

They describe the approach and its results as follows:

In theory, we could apply the hide-and-seek game broadly, teaching a machine to identify arbitrarily complex bugs. However, such bugs are still outside the reach of modern AI methods. Instead, we are concentrating on a set of commonly appearing bugs. These include incorrect comparisons (e.g., using “<=” instead of “<” or “>”), incorrect Boolean operators (e.g., using “and” instead of “or” and vice versa), variable misuses (e.g., incorrectly using “i” instead of “j”) and a few others. To test our system, we focus on Python code.
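To make those bug categories concrete, here is a minimal sketch of how a small bug of this kind might be planted in Python code. It is not BugLab's implementation: the class `BugInserter`, the function `insert_random_bug` and the two swap tables are hypothetical names made up for illustration, and the rewrite site is picked at random rather than by a learned model. It covers only comparison and Boolean operator swaps, not variable misuse, and assumes Python 3.9 or later for `ast.unparse`.

```python
import ast
import random

# Hypothetical rewrite tables for illustration only; the rewrites used in
# the actual research may differ.
COMPARE_SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt}
BOOL_SWAPS = {ast.And: ast.Or, ast.Or: ast.And}


class BugInserter(ast.NodeTransformer):
    """Rewrite the candidate site whose index equals target_index.

    With target_index=-1 nothing is rewritten and `seen` simply ends up
    holding the number of candidate sites.
    """

    def __init__(self, target_index):
        self.target_index = target_index
        self.seen = 0

    def _hit(self):
        hit = self.seen == self.target_index
        self.seen += 1
        return hit

    def visit_Compare(self, node):
        self.generic_visit(node)
        # Only simple comparisons with one swappable operator are candidates.
        if len(node.ops) == 1 and type(node.ops[0]) in COMPARE_SWAPS and self._hit():
            node.ops = [COMPARE_SWAPS[type(node.ops[0])]()]
        return node

    def visit_BoolOp(self, node):
        self.generic_visit(node)
        if type(node.op) in BOOL_SWAPS and self._hit():
            node.op = BOOL_SWAPS[type(node.op)]()
        return node


def insert_random_bug(source, rng):
    """Return source with one randomly chosen operator swapped, if any exist."""
    tree = ast.parse(source)
    counter = BugInserter(target_index=-1)  # counting pass, no rewrite
    counter.visit(tree)
    if counter.seen == 0:
        return source
    mutated = BugInserter(rng.randrange(counter.seen)).visit(tree)
    return ast.unparse(mutated)


CLEAN = """
def in_range(i, lo, hi):
    return lo <= i and i < hi
"""

buggy = insert_random_bug(CLEAN, random.Random(0))
print(buggy)
```

Random insertion like this corresponds to the kind of baseline the researchers compare against. In their hide-and-seek setup, a trained network chooses where and how to plant the bug, so that it is harder for the detector to find.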

Once our detector is trained, we use it to detect and repair bugs in real-life code. To measure performance, we manually annotate a small dataset of bugs from packages in the Python Package Index with such bugs and show that models trained with our “hide-and-seek” method are up to 30% better compared to other alternatives, e.g., detectors trained with randomly inserted bugs. The results are promising, showing that about 26% of bugs can be found and fixed automatically. Among the bugs our detector found were 19 previously unknown bugs in real-life open-source GitHub code. However, the results also showed many false positive warnings, suggesting that further advancements are needed before such models can be practically deployed.

They conclude that their approach is promising, though much more work is needed to make such detectors reliable for practical use. Given Microsoft's work on GitHub Copilot, which is built on a descendant of GPT-3, it is quite possible that this work will eventually be commercialised.

Read their full paper here.
