Unseen species problem

The unseen species problem in ecology concerns estimating the number of species in an ecosystem that have not been observed in the samples taken so far. More specifically, it asks how many new species would be discovered if more samples were taken from the ecosystem. The study of the problem began in the early 1940s with Alexander Steven Corbet, who spent two years in British Malaya trapping butterflies and wanted to know how many new species he would discover if he spent another two years trapping. Many estimation methods have since been developed to predict how many new species additional samples would reveal.

The unseen species problem also applies more broadly, since the same estimators can be used to estimate the number of new elements of any set that have not yet appeared in samples. One example is estimating how many words William Shakespeare knew, based on the words used across all of his written works.^[1]

The unseen species problem can be stated mathematically as follows. Suppose $n$ independent samples $X^{n}\triangleq X_{1},\ldots ,X_{n}$ are taken, followed by $m$ further independent samples $X_{n+1}^{m+n}\triangleq X_{n+1},\ldots ,X_{n+m}$. The number of unseen species discovered by the additional samples is $U\triangleq U(X^{n},X_{n+1}^{m+n})\triangleq \left|\{X_{n+1}^{m+n}\}\setminus \{X^{n}\}\right|,$ where $\{\cdot\}$ denotes the set of distinct species appearing in a sample; that is, $U$ counts the distinct species in the second sample that do not appear in the first.
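As a minimal sketch, the definition of $U$ can be computed directly once both samples are in hand. For comparison, the sketch also includes the Good-Toulmin estimator, one classical estimator (not described in this section) that predicts $U$ from the first sample alone, using $t = m/n$ and $\Phi_i$, the number of species seen exactly $i$ times among the first $n$ samples. The function names `unseen_count` and `good_toulmin` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from typing import Hashable, Sequence


def unseen_count(first: Sequence[Hashable], second: Sequence[Hashable]) -> int:
    """U: number of distinct species in the second sample that never
    appear in the first sample (set difference of distinct values)."""
    return len(set(second) - set(first))


def good_toulmin(first: Sequence[Hashable], m: int) -> float:
    """Good-Toulmin estimate of U using only the first sample.

    Phi_i is the number of species observed exactly i times among the
    n first-sample observations, and t = m / n. The estimate is
    -sum_{i>=1} (-t)^i * Phi_i; it is usually only trusted for t <= 1.
    """
    n = len(first)
    if n == 0:
        raise ValueError("the first sample must be non-empty")
    t = m / n
    # species -> count, then count i -> number of species seen i times (Phi_i)
    phi = Counter(Counter(first).values())
    return -sum((-t) ** i * phi_i for i, phi_i in phi.items())


# Hypothetical toy samples: n = 7 observations, then m = 7 more.
first = ["a", "a", "b", "c", "d", "d", "d"]
second = ["a", "e", "f", "b", "e", "g", "a"]
print(unseen_count(first, second))          # 3  (species e, f, g)
print(good_toulmin(first, m=len(second)))   # 2.0
```

Here $\Phi_1 = 2$ (b, c), $\Phi_2 = 1$ (a), $\Phi_3 = 1$ (d), and $t = 1$, so the estimate is $-(-2 + 1 - 1) = 2$, compared with the 3 species actually discovered in the second sample.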
