A Fresh Look at Formal Software Verification (with Insights from Formal Hardware Verification)
Early efforts in formal analysis of software dismissed viewing programs as state transition systems preferring, instead, the axiomatic semantics of the Floyd-Hoare logic which led to Dijkstra's widely-used predicate transformers. When the goal is program correctness, such an approach has its merits. But it has also fallen quite short of delivering scalable solutions. The most common "formal" methodology in reasonably wide-spread use today is symbolic execution which basically extends the capabilities of program debuggers by propagating symbolic expressions rather than concrete values along program execution paths.
While we concede that program correctness is still a distant goal, we argue that it is possible today to scalably deploy formal analysis to detect certain classes of subtle control/security bugs in large software code bases. This requires that we return to viewing software as a bone fide state transition system and to apply to it some of the lessons we learned in scaling formal hardware verification including a) the identification of specific control-centric safety properties, b) structural abstraction of irrelevant data state that has nothing or little to do with these properties, c) full unbounded abstract reachability using incremental induction ala IC3/PDR to determine if the properties can be violated under any execution, d) concretization of any abstract counterexamples to determine their feasibility, and e) automatic refinement of spurious counterexamples to bring back only those relevant data constraints needed to provably establish that the properties hold or to demonstrate a true violation. The underlying reasoning engines that make this possible are, of course, the modern CDCL SAT and SMT solvers and the minimal unsat subset (MUS) extractors which are the core competencies of this community.
In this work-in-progress talk, I will describe a novel one-hot encoding of program control flow that respects function boundaries and briefly show how it is combined with an abstraction of PHP server codes to detect the presence of cross-site scripting (XSS) vulnerabilities.