DBSCAN

Density-based spatial clustering of applications with noise (DBSCAN) is a data clustering algorithm proposed by Martin Ester, Hans-Peter Kriegel, Jörg Sander, and Xiaowei Xu in 1996.[1] It is a density-based clustering non-parametric algorithm: given a set of points in some space, it groups together points that are closely packed (points with many nearby neighbors), and marks as outliers points that lie alone in low-density regions (those whose nearest neighbors are too far away). DBSCAN is one of the most commonly used and cited clustering algorithms.[2]

In 2014, the algorithm was awarded the Test of Time Award (an award given to algorithms which have received substantial attention in theory and practice) at the leading data mining conference, ACM SIGKDD.[3] As of July 2020, the follow-up paper "DBSCAN Revisited, Revisited: Why and How You Should (Still) Use DBSCAN"[4] appears in the list of the 8 most downloaded articles of the prestigious ACM Transactions on Database Systems (TODS) journal.[5]

Another follow-up, HDBSCAN*, was initially published by Ricardo J. G. Campello, David Moulavi, and Jörg Sander in 2013,[6] then expanded upon with Arthur Zimek in 2015.[7] It revises some of the original decisions such as the border points, and produces a hierarchical instead of a flat result.

  1. ^ Ester, Martin; Kriegel, Hans-Peter; Sander, Jörg; Xu, Xiaowei (1996). Simoudis, Evangelos; Han, Jiawei; Fayyad, Usama M. (eds.). A density-based algorithm for discovering clusters in large spatial databases with noise (PDF). Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96). AAAI Press. pp. 226–231. CiteSeerX 10.1.1.121.9220. ISBN 1-57735-004-9.
  2. ^ "Microsoft Academic Search: Papers". Archived from the original on April 21, 2010. Retrieved 2010-04-18. Most cited data mining articles according to Microsoft academic search; DBSCAN is on rank 24.
  3. ^ "2014 SIGKDD Test of Time Award". ACM SIGKDD. 2014-08-18. Retrieved 2016-07-27.
  4. ^ Cite error: The named reference tods was invoked but never defined (see the help page).
  5. ^ "TODS Home". tods.acm.org. Association for Computing Machinery. Retrieved 2020-07-16.
  6. ^ Campello, Ricardo J. G. B.; Moulavi, Davoud; Sander, Joerg (2013). Pei, Jian; Tseng, Vincent S.; Cao, Longbing; Motoda, Hiroshi (eds.). Density-Based Clustering Based on Hierarchical Density Estimates. Advances in Knowledge Discovery and Data Mining. Vol. 7819. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 160–172. doi:10.1007/978-3-642-37456-2_14. ISBN 978-3-642-37455-5. Retrieved 2023-08-18.
  7. ^ Campello, Ricardo J. G. B.; Moulavi, Davoud; Zimek, Arthur; Sander, Jörg (2015). "Hierarchical Density Estimates for Data Clustering, Visualization, and Outlier Detection". ACM Transactions on Knowledge Discovery from Data. 10 (1): 1–51. doi:10.1145/2733381. ISSN 1556-4681. S2CID 2887636.