Statistic Settings
Choose the Statistical Method & Confidence Level
Ablyft supports two scientifically grounded methods for evaluating experiment results: Frequentist and Bayesian. Each comes with unique strengths and interpretations.
1. Overview: Two Statistical Approaches
A/B testing is fundamentally about comparing groups. Ablyft offers two ways to analyze these comparisons:
- Frequentist: Based on long-term repeatability and p-values
- Bayesian: Based on probability distributions over outcomes
2. Key Differences
Aspect | Frequentist | Bayesian |
---|---|---|
Focus | How likely are the results assuming no effect? | How likely is the effect given the results? |
Main Metric | p-value, Confidence Interval | Posterior Probability, Credible Interval |
Can interpret probability of hypothesis? | ❌ No | ✅ Yes |
Can peek at results early? | ❌ No (can bias results) | ✅ Yes (with caution) |
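The "can bias results" entry for frequentist peeking can be illustrated with a small simulation. The sketch below is not Ablyft's implementation: it runs hypothetical A/A tests (both arms share the same true conversion rate, so every "significant" result is a false positive) and compares stopping at the first significant look with a single look at the end. The function names, the 12% rate, the sample sizes, and the number of looks are all assumptions chosen for illustration.

```python
import math
import random


def two_sided_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value of a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0  # no conversions (or only conversions) so far: nothing to test
    z = (conv_b / n_b - conv_a / n_a) / se
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2))
    return math.erfc(abs(z) / math.sqrt(2))


def simulate_aa_tests(n_tests=400, visitors_per_arm=1000, looks=10,
                      rate=0.12, alpha=0.05, seed=1):
    """Return (false-positive rate with peeking, false-positive rate with one final look)."""
    rng = random.Random(seed)
    step = visitors_per_arm // looks
    peeking_hits = 0
    final_hits = 0
    for _ in range(n_tests):
        conv_a = conv_b = 0
        significant_at_any_look = False
        p = 1.0
        for look in range(1, looks + 1):
            # Both arms have the same true rate, so any "winner" is a false positive.
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            n = look * step
            p = two_sided_p_value(conv_a, n, conv_b, n)
            if p < alpha:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        final_hits += p < alpha  # p now holds the p-value of the last look
    return peeking_hits / n_tests, final_hits / n_tests


peeking_rate, final_rate = simulate_aa_tests()
print(f"stop at first significant look: {peeking_rate:.1%} false positives")
print(f"single look at the end:         {final_rate:.1%} false positives")
```

Because the final look is one of the looks, the peeking rate can never be lower than the single-look rate; in practice it tends to be noticeably higher, which is why a fixed-horizon frequentist test should be evaluated once, at the planned sample size.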
3. Frequentist Statistics
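In frequentist analysis, the confidence level describes the procedure rather than a single result: if the experiment were repeated many times, that share of the resulting confidence intervals (for example 95%) would contain the true effect.
The confidence shown for a result is derived from its p-value: Confidence = 100% − p-value, with the p-value expressed as a percentage (p = 0.03 gives 97%). This figure is not the probability that the variant is better.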
A p-value represents the probability of observing an effect at least this extreme if there were no real difference (the null hypothesis).
- ✅ Good for strict statistical control
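As a rough illustration of how a p-value and a confidence interval for a conversion-rate difference can be computed, here is a minimal sketch using a two-proportion z-test. This is a common textbook approach, not necessarily the calculation Ablyft performs. The function name and the visitor counts in the usage lines are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b, confidence_level=0.95):
    """Return (p_value, (ci_low, ci_high)) for the difference rate_b - rate_a."""
    rate_a = conv_a / n_a
    rate_b = conv_b / n_b
    diff = rate_b - rate_a

    # p-value: pooled standard error, because the null hypothesis says both rates are equal.
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Confidence interval: unpooled standard error (Wald interval).
    se = math.sqrt(rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    return p_value, (diff - z_crit * se, diff + z_crit * se)


# Hypothetical counts: 10,000 visitors per variant, 12.0% vs. 13.2% conversion.
p_value, (ci_low, ci_high) = two_proportion_z_test(1200, 10_000, 1320, 10_000)
print(f"p-value: {p_value:.4f}")
print(f"95% confidence interval: [{ci_low:+.2%}, {ci_high:+.2%}]")
```

The result depends heavily on the sample size, so the same two conversion rates can be significant with many visitors and inconclusive with few.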
4. Bayesian Statistics
In Bayesian analysis, we ask directly:
“What is the probability that B is better than A?”
A credible interval is the range that contains the effect with X% probability, given the observed data and the prior.
- ✅ Allows probability statements about improvements
- ✅ Naturally handles uncertainty and early decisions
- ✅ Provides posterior probability (e.g. "There’s a 93% chance B is better than A.")
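One common way to obtain such a posterior probability for conversion rates is a Beta-Binomial model evaluated by simulation. The sketch below assumes a uniform Beta(1, 1) prior and uses hypothetical visitor counts. The prior and model Ablyft uses are not stated here, so treat this as an illustration of the idea rather than the product's calculation.

```python
import random


def bayesian_comparison(conv_a, n_a, conv_b, n_b, credibility_level=0.95,
                        draws=20_000, seed=0):
    """Return (P(B > A), credible interval for rate_b - rate_a) via Monte Carlo."""
    rng = random.Random(seed)
    # Assumed Beta(1, 1) prior + binomial data
    # -> posterior Beta(1 + conversions, 1 + non-conversions) for each variant.
    diffs = sorted(
        rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        - rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        for _ in range(draws)
    )
    prob_b_better = sum(d > 0 for d in diffs) / draws

    # Equal-tailed credible interval from the sorted posterior draws.
    tail = (1 - credibility_level) / 2
    lower = diffs[int(tail * draws)]
    upper = diffs[int((1 - tail) * draws) - 1]
    return prob_b_better, (lower, upper)


# Hypothetical counts: 10,000 visitors per variant, 12.0% vs. 13.2% conversion.
prob, (lower, upper) = bayesian_comparison(1200, 10_000, 1320, 10_000)
print(f"P(B better than A): {prob:.1%}")
print(f"95% credible interval: [{lower:+.2%}, {upper:+.2%}]")
```

The fixed seed only makes the simulation repeatable; more draws give a smoother estimate at the cost of run time.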
5. Setting the Confidence / Credibility Level
In Ablyft, you can define the level of certainty you want in your conclusions, whether using frequentist or Bayesian analysis.
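- For Frequentist: This sets the coverage of the confidence interval (a higher level gives a wider interval) and the p-value threshold: a result counts as significant when the p-value is below 100% minus the level, for example p < 0.05 at 95%.
- For Bayesian: This sets how much posterior probability the credible interval must contain (a higher level gives a wider interval) and how much certainty is required before a result is read as conclusive.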
You can set a value between 50% and 99.9% depending on your use case:
- Lower levels → faster results, more risk
- Higher levels → more certainty, longer run time
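To make the trade-off concrete, the sketch below maps a chosen level to the conventional frequentist quantities: the p-value threshold (100% minus the level) and the interval half-width in standard errors. The function name is hypothetical, and the mapping follows the usual normal-approximation convention; how Ablyft applies the setting internally is an assumption here.

```python
from statistics import NormalDist


def settings_for_level(level_percent):
    """Return (alpha, z_crit) for a confidence level given in percent."""
    if not 50 <= level_percent <= 99.9:
        raise ValueError("level must be between 50 and 99.9")
    level = level_percent / 100
    alpha = 1 - level                             # p-value threshold
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # half-width in standard errors
    return alpha, z_crit


for level in (80, 90, 95, 99):
    alpha, z_crit = settings_for_level(level)
    print(f"{level}% -> significant if p < {alpha:.3f}, "
          f"interval = estimate ± {z_crit:.2f} standard errors")
```

A wider interval needs more data before it excludes zero, which is why higher levels tend to mean longer run times.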
6. Example Comparison
Variant A conversion rate: 12.0%
Variant B conversion rate: 13.2%
Metric | Frequentist | Bayesian |
---|---|---|
Difference (absolute, percentage points) | +1.2 pp | +1.2 pp |
Confidence / Credible Interval | [+0.4 pp, +2.0 pp] | [+0.3 pp, +2.1 pp] |
p-value | 0.03 | — |
Posterior Probability | — | 93% chance B is better |
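The difference in the table is an absolute difference between the two conversion rates. The short sketch below, using only the two rates from the example, shows how it relates to the relative lift, which is the figure often quoted in test reports. The variable names are illustrative.

```python
rate_a = 0.120  # Variant A conversion rate
rate_b = 0.132  # Variant B conversion rate

absolute_diff_pp = (rate_b - rate_a) * 100        # in percentage points
relative_lift = (rate_b - rate_a) / rate_a * 100  # in percent of A's rate

print(f"absolute difference: {absolute_diff_pp:+.1f} percentage points")
print(f"relative lift:       {relative_lift:+.1f}%")
```

For these rates the absolute difference is 1.2 percentage points, which corresponds to a relative lift of 10%.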
7. Minimum Conversions
To avoid false positives early in an experiment, we recommend setting a minimum conversion threshold.
Reasonable values might be 100 or 1000 conversions.
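The idea of such a threshold can be sketched as a simple gate that withholds any verdict until enough conversions have been collected. The function name is hypothetical, and applying the threshold to each variant separately (rather than to the experiment total) is an assumption made for this example.

```python
def ready_to_evaluate(conversions_by_variant, min_conversions=100):
    """True once every variant has reached the minimum number of conversions."""
    return all(count >= min_conversions
               for count in conversions_by_variant.values())


# Variant B has not reached the threshold yet, so no winner should be declared.
print(ready_to_evaluate({"A": 120, "B": 85}))
# Both variants have at least 100 conversions.
print(ready_to_evaluate({"A": 120, "B": 100}))
```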
8. Summary
- Frequentist: well-established, with strict control of false positives, but it cannot state the probability that a variant is better
- Bayesian: intuitive, flexible, and great for decision-making under uncertainty
Choose the method that best fits your team’s mindset or combine both to gain a fuller picture.