Mathematical simulation was used to analyze how sample size affects the accuracy of estimates of molecular variation in a population. The sample size was varied from 1/200 to 1/4 of the total size of the simulated population. The possible effect of the length of the compared nucleotide sequences was also assessed; the length was varied from 500 to 15,000 bp. The mean nucleotide diversity (pi) tended to be underestimated by about 25% of the expected value. The sample size and/or the length of the nucleotide sequence were shown to affect the scatter of the pi values more than the accuracy of the estimate (the proportion of correct estimates of pi is about 14%). It is suggested that sample size affects the probability of accepting a false null hypothesis in analyses of the demographic history of a species.
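As an illustration of the quantity being estimated, nucleotide diversity (pi) is commonly computed as the mean number of pairwise differences per site among aligned sequences. The sketch below is a minimal toy version under that standard definition; it is not the simulation used in the study, and the names `nucleotide_diversity` and `sample_pi` are hypothetical helpers introduced only for this example.

```python
import random
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean number of pairwise differences per site (pi) for aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    # Count mismatching sites for every unordered pair of sequences.
    total_diffs = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total_diffs / (len(pairs) * length)


def sample_pi(population, fraction, rng):
    """Estimate pi from a random sample drawn without replacement.

    `fraction` is the share of the population to sample (an assumption of
    this sketch, e.g. 1/200 to 1/4); at least two sequences are always drawn.
    """
    n = max(2, round(len(population) * fraction))
    return nucleotide_diversity(rng.sample(population, n))


if __name__ == "__main__":
    # Toy population of three aligned 4-bp sequences (made-up data).
    toy = ["AAAA", "AAAT", "AATT"]
    print(nucleotide_diversity(toy))
    print(sample_pi(toy, 2 / 3, random.Random(0)))
```

Repeating `sample_pi` many times with different seeds and sampling fractions on a larger simulated population would give a rough picture of how the scatter of the estimates changes with sample size.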
V V Gorbachev. [Effect of random sample size on the accuracy of nucleotide diversity estimation]. Genetika. 2012 Jul;48(7):880-4
PMID: 22988774