Boolean decision rules via column generation
In many applications of machine learning, interpretable or explainable models for binary classification, such as decision trees or decision lists, are preferred over potentially more accurate but less interpretable models such as random forests or support vector machines. In this talk, we consider Boolean decision rule sets (equivalent to Boolean functions in disjunctive normal form) as interpretable models for binary classification. We define the complexity of a rule set to be the number of rules (clauses) plus the number of conditions (literals) across all clauses, and assume that simpler, less complex models are more interpretable. We discuss an integer programming formulation for such models that trades off classification accuracy against rule simplicity, and we obtain high-quality classifiers of this type using column generation techniques. Compared to some recent alternatives, our algorithm dominates the accuracy-simplicity trade-off on 8 out of 16 datasets, and it produced the winning entry in the 2018 FICO Explainable Machine Learning Challenge. When compared to rule learning methods designed for accuracy, our algorithm sometimes finds significantly simpler solutions that are no less accurate.
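The two notions above, classification by a DNF rule set and the complexity measure it is scored on, can be made concrete with a small sketch. This is an illustration only, not the speaker's implementation; the function names, the feature-dictionary encoding, and the example rules are our own assumptions.

```python
# Hypothetical sketch of a DNF rule set: a list of clauses, where each
# clause is a list of conditions (literals) that must all hold.
# A sample is classified positive if ANY clause is satisfied (an OR of ANDs).

def complexity(rule_set):
    # Abstract's definition: number of clauses plus number of
    # literals summed across all clauses.
    return len(rule_set) + sum(len(clause) for clause in rule_set)

def predict(rule_set, x):
    # Positive iff some clause has all of its conditions true for x.
    return any(all(cond(x) for cond in clause) for clause in rule_set)

# Example rule set over dictionary-valued samples (illustrative features):
rule_set = [
    [lambda x: x["age"] > 50, lambda x: x["bp"] == "high"],  # clause 1: 2 literals
    [lambda x: x["smoker"]],                                 # clause 2: 1 literal
]

print(complexity(rule_set))  # 2 clauses + 3 literals = 5
print(predict(rule_set, {"age": 60, "bp": "high", "smoker": False}))  # True
print(predict(rule_set, {"age": 40, "bp": "low", "smoker": False}))   # False
```

In the integer programming view sketched in the talk, each candidate clause corresponds to a column, and column generation searches the exponentially large space of clauses without enumerating it explicitly.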
Bio
Sanjeeb Dash leads the Foundations of Optimization and Computation group at the IBM T. J. Watson Research Center. He works on both theoretical and practical aspects of discrete optimization, with a focus on integer and linear programming. He is a co-author of the QSopt and QSopt_ex linear programming solvers. He is an Area Editor of the journal Mathematical Programming Computation and an Associate Editor of the INFORMS Journal on Computing and of Operations Research Letters.