Sort Before You Prune: Improved Worst-Case Guarantees of the DiskANN Family of Graphs
- Siddharth Gollapudi ,
- Ravishankar Krishnaswamy ,
- Kirankumar Shiragur ,
- Harsh Wardhan
2025 International Conference on Machine Learning |
Graph-based data structures have become powerful and ubiquitous tools for scalable approximate nearest-neighbor (ANN) search over the past decade. In spite of their apparent practical performance, there has only recently been progress on the worst-case performance of these data structures. Indeed, the influential work of Indyx and Xu (2023) introduced the key concept of -reachable graphs, showing that graphs constructed by the DiskANN algorithm (Subramanya, et. al. 2023) produce an -approximate solution with a simple best-first search that runs in poly-logarithmic query time. In our work, we improve and generalize this analysis as follows: – We introduce sorted -reachable graphs, and use this notion to obtain a stronger approximation factor of for the DiskANN algorithm on Euclidean metrics. – We present the first worst-case theoretical analysis for the popular beam-search algorithm, which is used in practice to search these graphs for candidate nearest neighbors. We also present empirical results validating the significance of sorted -reachable graphs, which aligns with our theoretical findings.