Testing Hypotheses On Simulated Data: Why Traditional Hypotheses-Testing Statistics Are Not Always Adequate For Simulated Data, And How To Modify Them |

To check whether a new algorithm is better, researchers use traditional statistical techniques for hypothesis testing. In particular, when the results are inconclusive, they run more and more simulations (n(2) > n(1), n(3) > n(2), ..., n(m) > n(m-1)) until the results become conclusive. In this paper, we point out that these results may be misleading. Indeed, in the traditional approach, we select a statistic and then choose a threshold for which the probability of this statistic "accidentally" exceeding this threshold is smaller than, say, 1%. With simulated data, it is very easy to run additional simulations with ever-larger n. The probability of error is still 1% for each n(i), but the probability that we reach an erroneous conclusion for at least one of the values n(i) increases as m increases. In this paper, we design new statistical techniques oriented towards experiments on simulated data, techniques that guarantee that the probability of error stays under, say, 1% no matter how many experiments we run. |
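The effect described in the abstract can be illustrated with a small Monte Carlo sketch. It is not the technique proposed in the paper. It assumes a hypothetical setup in which each simulation yields a performance difference that is standard normal under the null hypothesis (no real improvement, known unit variance), and it applies a one-sided z-test after each sample size n(i). The function names, the sample sizes, and the Bonferroni correction (a standard, conservative fix that tests each look at level alpha/m) are all assumptions made for illustration.

```python
import math
import random
from statistics import NormalDist


def z_threshold(alpha):
    """One-sided critical value of the standard normal for level alpha."""
    return NormalDist().inv_cdf(1.0 - alpha)


def false_positive_rate(sample_sizes, z_crit, trials=20000, seed=0):
    """Estimate P(at least one look rejects) when the null is true.

    sample_sizes must be increasing: n(1) < n(2) < ... < n(m).
    Each trial extends the same run of simulated differences and tests
    the cumulative z-statistic after every n(i).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        total = 0.0
        n_seen = 0
        rejected = False
        for n in sample_sizes:
            # The sum of k independent N(0, 1) draws is N(0, k), so the
            # k new observations are added with a single normal draw.
            k = n - n_seen
            total += rng.gauss(0.0, math.sqrt(k))
            n_seen = n
            if total / math.sqrt(n) > z_crit:
                rejected = True  # an "accidental" conclusive result
                break
        hits += rejected
    return hits / trials


alpha = 0.01
looks = [100, 200, 400, 800, 1600]  # hypothetical n(1), ..., n(m)

single = false_positive_rate(looks[:1], z_threshold(alpha))
repeated = false_positive_rate(looks, z_threshold(alpha))
bonferroni = false_positive_rate(looks, z_threshold(alpha / len(looks)))

print(f"one look at {alpha:.0%}:            {single:.4f}")
print(f"{len(looks)} looks, each at {alpha:.0%}:      {repeated:.4f}")
print(f"{len(looks)} looks, Bonferroni-adjusted: {bonferroni:.4f}")
```

Under these assumptions, the estimate for a single look should stay near the nominal 1%, the estimate for repeated looks at an unadjusted threshold should be noticeably higher, and the Bonferroni-adjusted estimate should fall at or below 1%. A fixed Bonferroni correction requires m to be chosen in advance, so it does not by itself cover the "no matter how many experiments we run" guarantee that the abstract describes.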

2006 | 10.20965/jaciii.2006.p0260 | JOURNAL OF ADVANCED COMPUTATIONAL INTELLIGENCE AND INTELLIGENT INFORMATICS |

hypothesis testing, simulated data | Data dredging, Alternative hypothesis, Population, Statistic, Computer science, Testing hypotheses suggested by the data, Artificial intelligence, Statistics, Exploratory data analysis, Standard deviation, Machine learning, Statistical hypothesis testing | Journal |

10 | 3 | 1343-0130 |

0 | 0.34 | 7 |

3 |

