Count-distinct problem

In computer science, the count-distinct problem[1] (also known in applied mathematics as the cardinality estimation problem) is the problem of finding the number of distinct elements in a data stream with repeated elements. This is a well-known problem with numerous applications. The elements might represent IP addresses of packets passing through a router, unique visitors to a web site, elements in a large database, motifs in a DNA sequence, or elements of RFID/sensor networks.

  1. ^ Ullman, Jeff; Rajaraman, Anand; Leskovec, Jure. "Mining data streams" (PDF). {{cite journal}}: Cite journal requires |journal= (help)