Test statistical hypotheses with detailed results. Part of the DevTools Surf developer suite.
Use Cases
Test whether a measured difference between two groups is statistically significant using a t-test.
Perform a chi-squared test to evaluate independence between categorical variables.
Calculate the required sample size for a study to achieve specified power and significance level.
Run an ANOVA to compare means across three or more groups.
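Two of the use cases above can be sketched with the Python standard library alone. This is an illustrative sketch, not the tool's own implementation: the function names `chi2_independence_2x2` and `sample_size_per_group` are hypothetical, the chi-squared version handles only 2x2 tables without a continuity correction, and the sample size uses a normal approximation rather than the exact t-distribution.

```python
import math
from statistics import NormalDist


def chi2_independence_2x2(table):
    """Pearson chi-squared test of independence for a 2x2 table of counts.

    Returns (chi2, p_value). Assumes every row and column total is nonzero
    and applies no continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(X >= chi2) equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    Normal approximation: n = 2 * ((z_{1-alpha/2} + z_{power}) / d)^2,
    where d is the standardized effect size (Cohen's d). An exact
    t-based calculation typically asks for one or two more per group.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


# Hypothetical counts: rows are groups, columns are outcomes.
chi2, p = chi2_independence_2x2([[30, 10], [20, 20]])
print(f"chi2 = {chi2:.3f}, p = {p:.4f}")

# Medium effect (d = 0.5), alpha = 0.05, 80% power.
n = sample_size_per_group(0.5)
print(f"n per group = {n}")
```

For larger tables or small expected counts, a statistics library with a full chi-squared distribution and exact tests would be the safer choice.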
Tips
State the null hypothesis precisely before collecting data — vague hypotheses produce vague conclusions and lead to cherry-picking.
Check statistical power before interpreting a non-significant result — a p > 0.05 result from an underpowered study doesn't mean the effect doesn't exist.
Report effect sizes (Cohen's d, eta-squared) alongside p-values — statistical significance doesn't indicate practical significance, especially in large samples.
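As a rough illustration of the effect-size tip, Cohen's d for two independent groups can be computed from the pooled standard deviation. The helper name `cohens_d` and the sample data are made up for this sketch.

```python
import math
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled sample SD."""
    n_a, n_b = len(group_a), len(group_b)
    # variance() is the sample variance (n - 1 in the denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)


# Illustrative data only.
d = cohens_d([2, 4, 6], [1, 3, 5])
print(f"Cohen's d = {d:.2f}")
```

By the usual convention, d near 0.2 is read as a small effect, 0.5 as medium, and 0.8 as large, though what counts as practically meaningful depends on the domain.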
Fun Facts
The p-value threshold of 0.05 was proposed by Ronald Fisher in 1925 in 'Statistical Methods for Research Workers' as a 'convenient' cutoff, not a mathematically derived boundary. Fisher himself later argued against treating it as a universal rule.
A 2016 Nature survey of 1,576 scientists found that 52% agreed that science was facing a 'significant' reproducibility crisis, with selective reporting (which includes p-hacking: testing multiple hypotheses until one passes p < 0.05) among the most frequently cited contributing factors.
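The American Statistical Association (ASA) issued a statement in 2016 cautioning that scientific conclusions should not be based only on whether a p-value passes a specific threshold. A 2019 editorial in The American Statistician went further and recommended abandoning the term 'statistically significant', a notable shift from decades of scientific practice.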
FAQ
Which tests does it support?
One-sample and two-sample t-tests, paired t-test, chi-squared test of independence, ANOVA (one-way and two-way), Mann-Whitney U test, and Wilcoxon signed-rank test.
How do I interpret a p-value?
The p-value is the probability of observing results at least as extreme as yours, assuming the null hypothesis is true. A p-value of 0.03 means that, if H0 is true, there is a 3% chance of getting a result at least this extreme. It does not mean there is a 97% chance that H1 is true.
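One way to make this definition concrete is an exact permutation test, which counts how many relabelings of the data give a difference at least as extreme as the observed one. This is a sketch for intuition only; it is not one of the tests listed above, and the function name `exact_permutation_p_value` and the data are hypothetical.

```python
from itertools import combinations
from statistics import mean


def exact_permutation_p_value(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Under H0 the group labels are interchangeable, so every way of
    splitting the pooled data into groups of the same sizes is equally
    likely. The p-value is the share of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = 0
    total = 0
    for chosen in combinations(indices, n_a):
        chosen_set = set(chosen)
        relabeled_a = [pooled[i] for i in chosen]
        relabeled_b = [pooled[i] for i in indices if i not in chosen_set]
        diff = abs(mean(relabeled_a) - mean(relabeled_b))
        # Small tolerance guards against floating-point ties.
        if diff >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total


# Illustrative data: 70 possible splits of 8 values into two groups of 4.
p_value = exact_permutation_p_value([12, 14, 15, 17], [8, 9, 10, 11])
print(f"p = {p_value:.4f}")
```

Full enumeration is only practical for small samples, since the number of splits grows combinatorially; larger data sets usually rely on random resampling or a parametric test.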
What's statistical power?
Power is the probability of correctly rejecting a false null hypothesis: power = 1 - β, where β is the Type II error rate. Aim for at least 80% power (β = 0.20). Low-power studies frequently miss real effects (false negatives).
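The relationship between sample size and power can be sketched with a normal approximation for a two-sided, two-sample comparison of means. The function name `approx_power` is hypothetical, and the result is an approximation that slightly overstates the power of an actual t-test at small sample sizes.

```python
from statistics import NormalDist


def approx_power(effect_size, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample test of means.

    effect_size is the standardized difference (Cohen's d). The test
    statistic is treated as normal with mean d * sqrt(n / 2) under H1.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = effect_size * (n_per_group / 2) ** 0.5
    # Probability of landing in either rejection region under H1.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Medium effect (d = 0.5) at two sample sizes.
power_64 = approx_power(0.5, 64)
power_20 = approx_power(0.5, 20)
print(f"n = 64 per group: power = {power_64:.2f}")
print(f"n = 20 per group: power = {power_20:.2f}")
```

With 20 per group, a medium effect would be missed more often than it is detected, which is why a non-significant result from such a study says little about whether the effect exists.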