Learning to Fix Programming Errors with Graph2Diff Neural Networks
Deep learning has made great advances in the last several years, and it excels in situations where we have big models and lots of data. Professional programmers generate lots of data in the course of their day-to-day work. Can we train big deep learning models on this data and create tools that are useful to professional developers?
I'll talk about our recent efforts in this direction, focusing on the problem of learning to repair build errors encountered by Google software engineers. We represent source code, build configuration files, and compiler diagnostic messages as a single graph, and then use a Graph Neural Network model to predict a diff. The model is an instance of a more general abstraction that we call Graph2Tocopo, which we argue is superior to Sequence2Sequence for problems that involve predicting how to change source code. We evaluate the model on a dataset of over 500k real build errors and their resolutions from professional developers. Compared to a recently published Sequence2Sequence-based baseline, we achieve more than double the accuracy while tackling a more difficult task.
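
To make the graph-in, diff-out framing concrete, here is a minimal toy sketch in Python. It is not the model from the talk: the node kinds, edge types, whitespace tokenizer, and the names `Graph`, `build_graph`, and `apply_diff` are all assumptions made for illustration, and the learned Graph Neural Network is replaced by a hand-written edit. It only shows how code and a compiler diagnostic might share one graph, and how a predicted edit could point at a graph node rather than regenerate the whole file.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Toy graph: nodes are (kind, text) pairs, edges are (src, dst, type) triples."""
    nodes: list = field(default_factory=list)
    edges: list = field(default_factory=list)

    def add_node(self, kind, text):
        self.nodes.append((kind, text))
        return len(self.nodes) - 1  # node id is its index in the list

    def add_edge(self, src, dst, edge_type):
        self.edges.append((src, dst, edge_type))


def build_graph(code_lines, diagnostic_line, diagnostic_text):
    """Combine source lines and one diagnostic into a single graph (hypothetical schema)."""
    g = Graph()
    line_ids = []
    for line in code_lines:
        line_id = g.add_node("line", line)
        line_ids.append(line_id)
        prev = None
        for tok in line.split():  # whitespace split stands in for a real lexer
            tok_id = g.add_node("token", tok)
            g.add_edge(line_id, tok_id, "contains")
            if prev is not None:
                g.add_edge(prev, tok_id, "next_token")
            prev = tok_id
    # Link the diagnostic to the line the compiler reported it at.
    diag_id = g.add_node("diagnostic", diagnostic_text)
    g.add_edge(diag_id, line_ids[diagnostic_line], "reported_at")
    return g, line_ids


def apply_diff(code_lines, line_ids, edit):
    """Apply an edit of the form (pointer to a line node, replacement text)."""
    pointer, replacement = edit
    idx = line_ids.index(pointer)
    return code_lines[:idx] + [replacement] + code_lines[idx + 1:]


code = ["int x = 1 ;", "retrun x ;"]
graph, line_ids = build_graph(code, 1, "error: cannot find symbol 'retrun'")
# In a learned system a model would predict this edit; here it is written by hand.
fixed = apply_diff(code, line_ids, (line_ids[1], "return x ;"))
print(fixed)
```

Under these assumptions, the output side is a small edit that refers to a location in the input graph, which is one intuition for why predicting a diff can be easier than predicting the full repaired source as a flat sequence.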