The Rankability of Data
This talk poses and solves a new problem, the rankability problem, which concerns a dataset's inherent ability to produce a meaningful ranking of its items. Ranking is a fundamental data science task, with applications in web search, data mining, cybersecurity, machine learning, and statistical learning theory. Yet little attention has been paid to whether a dataset is suitable for ranking in the first place. As a result, ranking methods are routinely applied without this check, and a ranking computed from an unrankable dataset may not be reliable.
The rankability problem asks: How can rankability be quantified? Can rankable subgraphs be identified? At what point does a dynamic, time-evolving graph become rankable? If a dataset has low rankability, can it be modified, and which modifications most improve its rankability? We present a combinatorial rankability measure and compare several algorithms for computing it. Finally, we apply the new measure to several datasets.