Test Reliability · 9 min read · June 2, 2026 · Updated June 9, 2026

How to Fix Flaky Tests: A Step-by-Step Reliability Playbook

Flaky tests have a handful of recurring root causes — and each has a concrete fix. This is a practical, step-by-step playbook for diagnosing flakiness and making your suite deterministic.

Alex Johnson

TL;DR

Fixing flaky tests is a repeatable process: reproduce the failure, capture a trace to find the root cause, then apply the right fix — condition-based waiting, test isolation, semantic selectors, or controlling external dependencies. Resist blind retries; treat every flake as a bug to diagnose.

A flaky test — one that passes and fails on the same code — is not a mystery to be endured. Flakiness comes from a small set of recurring root causes, and each has a concrete fix. This is the step-by-step process for turning an unreliable test into a deterministic one.

Step 1: Reproduce the failure

You cannot fix what you cannot see fail. Run the test repeatedly — many test runners have a repeat-until-failure mode — to confirm it actually flakes and to get a failing instance to study. If it only fails in CI, try to match the CI environment locally: same browser, same parallelism, similar resource limits.
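If your runner has no built-in repeat option, the idea can be sketched in a few lines of plain Python. The helper name `run_until_failure` and the simulated test `sometimes_fails` are hypothetical, made up for this illustration; a real test would drive a browser instead of counting calls.

```python
from typing import Callable, Optional


def run_until_failure(test_fn: Callable[[], None], max_runs: int = 100) -> Optional[int]:
    """Call test_fn up to max_runs times.

    Returns the 1-based number of the first failing run, or None if every run passed.
    """
    for run in range(1, max_runs + 1):
        try:
            test_fn()
        except AssertionError:
            return run
    return None


# Hypothetical stand-in for a flaky test: it fails on every third call.
calls = {"count": 0}


def sometimes_fails() -> None:
    calls["count"] += 1
    assert calls["count"] % 3 != 0, "simulated intermittent failure"


first_failure = run_until_failure(sometimes_fails, max_runs=10)
print(f"first failure on run {first_failure}")
```

Stopping at the first failure matters: it leaves you with one failing instance to study rather than a pile of mixed results.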

Step 2: Capture evidence at the moment of failure

Guessing wastes hours. Capture a trace, screenshot, or video of the failing run. A Playwright trace, for example, shows the state of the page before and after each action, plus network activity and console logs. Once you can see what the browser saw when it broke, the root cause is usually obvious.
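A real trace comes from your test framework, but the underlying principle (record what the test saw at each step, and keep that record when it fails) can be shown with a small sketch. Everything here is hypothetical: `run_with_evidence`, the `checkout_test` body, and the cart data are invented for illustration and are not part of any library.

```python
from typing import Callable, List, Tuple

StepLog = List[str]


def run_with_evidence(test_fn: Callable[[StepLog], None]) -> Tuple[bool, StepLog]:
    """Run test_fn with a step log and keep the log so a failure can be inspected."""
    log: StepLog = []
    try:
        test_fn(log)
    except AssertionError as exc:
        log.append(f"FAILED: {exc}")
        return False, log
    return True, log


def checkout_test(log: StepLog) -> None:
    # Hypothetical test body: each step records the state the test observed.
    cart = {"items": 2, "total_loaded": False}
    log.append(f"opened cart: {cart}")
    log.append("clicked 'Pay'")
    assert cart["total_loaded"], "total was not loaded when 'Pay' was clicked"


passed, log = run_with_evidence(checkout_test)
for line in log:
    print(line)
```

Reading the log top to bottom shows the click happening while `total_loaded` was still false, which points at a timing problem rather than a broken feature.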

Step 3: Identify the root cause and apply the matching fix

Most flakiness traces back to one of these causes:

Timing and race conditions → wait for conditions, not clocks

If the test acts before the page is ready, replace any hardcoded pauses with waits for the actual state you need — an element being visible, a request having finished, a spinner having disappeared. Frameworks like Playwright auto-wait for elements to be actionable, which removes most timing flakiness when you lean on it instead of fighting it.
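Frameworks with auto-waiting handle this for you, but the mechanism is worth seeing once. Below is a minimal sketch of condition-based waiting in plain Python. The helper `wait_until` and the simulated `spinner_gone` check are assumptions for this example, not a framework API.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool], timeout: float = 5.0, interval: float = 0.05) -> None:
    """Poll condition until it returns True; raise TimeoutError after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        time.sleep(interval)


# Hypothetical stand-in for a page whose spinner disappears after about 0.2 seconds.
ready_at = time.monotonic() + 0.2


def spinner_gone() -> bool:
    return time.monotonic() >= ready_at


started = time.monotonic()
wait_until(spinner_gone, timeout=2.0)   # returns as soon as the condition holds
elapsed = time.monotonic() - started
print(f"continued after {elapsed:.2f}s instead of sleeping a fixed 2s")
```

Unlike a fixed `time.sleep(2)`, this continues as soon as the page is ready and fails loudly with a timeout when it never becomes ready, so slow machines do not cause spurious failures and fast ones do not waste time.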

Shared state → isolate every test

If a test passes alone but fails when run with others, state is leaking between tests. Give each test its own data — a unique user, a fresh record — and a clean session so nothing carries over. Tests should never depend on the order they run in.
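One way to picture isolation is a data factory that never returns the same record twice, combined with fresh state in every test. The sketch below is illustrative only: `make_user`, `FakeUserStore`, and both test functions are hypothetical names standing in for your own fixtures and application.

```python
import uuid
from typing import Dict, List


def make_user(prefix: str = "user") -> Dict[str, str]:
    """Build a user record with a unique email so tests never collide on data."""
    token = uuid.uuid4().hex[:8]
    return {"name": f"{prefix}-{token}", "email": f"{prefix}-{token}@example.test"}


class FakeUserStore:
    """Hypothetical in-memory stand-in for the application's user table."""

    def __init__(self) -> None:
        self.users: List[Dict[str, str]] = []

    def add(self, user: Dict[str, str]) -> None:
        if any(u["email"] == user["email"] for u in self.users):
            raise ValueError("duplicate email")
        self.users.append(user)


def test_signup() -> None:
    store = FakeUserStore()  # fresh state: nothing carries over from other tests
    store.add(make_user())
    assert len(store.users) == 1


def test_signup_twice() -> None:
    store = FakeUserStore()
    store.add(make_user())
    store.add(make_user())  # unique emails, so no collision
    assert len(store.users) == 2


# Running the tests in either order gives the same result.
test_signup()
test_signup_twice()
test_signup_twice()
test_signup()
```

Because neither test reads anything the other wrote, reordering or parallelizing them cannot change the outcome.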

Brittle selectors → target by role and text

If a routine UI tweak breaks the test even though the feature works, the selector is tied to styling or markup structure rather than to what the user sees. Locate elements the way a user perceives them (by role, label, or visible text) rather than by fragile CSS classes or deep DOM paths.
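The difference is easy to demonstrate without a browser. The sketch below uses Python's standard `html.parser` to look up a button two ways across a restyle. It is an illustration of the principle, not Playwright code; the markup, class names, and helper functions are all made up for the example.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional


class ButtonCollector(HTMLParser):
    """Collect each <button>'s class attribute and visible text."""

    def __init__(self) -> None:
        super().__init__()
        self.buttons: List[Dict[str, str]] = []
        self._current: Optional[Dict[str, str]] = None

    def handle_starttag(self, tag, attrs):
        if tag == "button":
            self._current = {"class": dict(attrs).get("class") or "", "text": ""}

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "button" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.buttons.append(self._current)
            self._current = None


def find_buttons(html: str) -> List[Dict[str, str]]:
    parser = ButtonCollector()
    parser.feed(html)
    parser.close()
    return parser.buttons


def by_class(html: str, css_class: str) -> List[Dict[str, str]]:
    return [b for b in find_buttons(html) if css_class in b["class"].split()]


def by_text(html: str, text: str) -> List[Dict[str, str]]:
    return [b for b in find_buttons(html) if b["text"] == text]


before = '<div><button class="btn btn-blue">Submit order</button></div>'
after = '<div><button class="button primary">Submit order</button></div>'  # restyled

print(len(by_class(before, "btn-blue")), len(by_class(after, "btn-blue")))
print(len(by_text(before, "Submit order")), len(by_text(after, "Submit order")))
```

The class-based lookup stops matching after the restyle, while the text-based lookup keeps working because the thing the user sees did not change. In Playwright, role- and text-based locators such as `get_by_role` express the same idea.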

Uncontrolled dependencies → mock or pin them

If the test depends on a live third-party service, a real network, or the current date and time, it will fail whenever those wobble. Mock external services, stub network responses, and pin time-sensitive values so each run is repeatable.
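A common way to make this possible is to pass the clock and the external call in as parameters, so a test can substitute fixed values. The following is a minimal sketch under that assumption; `is_trial_expired`, `convert_price`, and the stubbed rate are hypothetical examples, not code from any particular application.

```python
from datetime import date
from typing import Callable


def is_trial_expired(trial_end: date, today: Callable[[], date] = date.today) -> bool:
    """Return True once the current date is past trial_end."""
    return today() > trial_end


def convert_price(amount: float, fetch_rate: Callable[[], float]) -> float:
    """Convert amount using a rate from fetch_rate (a live service in production)."""
    return round(amount * fetch_rate(), 2)


# In a test, pin the clock and stub the external rate service.
def pinned_today() -> date:
    return date(2026, 6, 2)


def stub_rate() -> float:
    return 1.25


print(is_trial_expired(date(2026, 6, 1), today=pinned_today))
print(convert_price(10.0, stub_rate))
```

With the real clock and a live rate service, both results would drift from run to run. With the pinned versions, every run sees the same inputs.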

Step 4: Verify the fix holds

After applying a fix, run the test many times again, both on its own and in parallel with the rest of the suite. If it passes consistently, you have strong evidence that the fix addresses the cause rather than just making the failure less frequent.
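As a rough sketch of this verification step, the helper below runs a test many times across several threads and counts the failures. `count_failures` and the two sample tests are hypothetical, and threads are only an approximation of a runner's parallel workers, so treat it as an illustration rather than a replacement for rerunning the real suite.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def count_failures(test_fn: Callable[[], None], runs: int = 50, workers: int = 4) -> int:
    """Run test_fn `runs` times across `workers` threads and count AssertionErrors."""

    def failed(_: int) -> bool:
        try:
            test_fn()
            return False
        except AssertionError:
            return True

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(failed, range(runs)))


def stable_test() -> None:
    assert sum([1, 2, 3]) == 6


def broken_test() -> None:
    assert False, "always fails"


stable_failures = count_failures(stable_test, runs=20)
broken_failures = count_failures(broken_test, runs=20)
print(stable_failures, broken_failures)
```

A fixed test should report zero failures across every run. Any nonzero count means the cause is still present.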

The anti-pattern: blind retries

It is tempting to configure tests to auto-retry until they pass. Used sparingly for rare infrastructure blips, retries are fine. Used as a default cure, they hide the problem: the test still flakes, you just stop seeing it, and the underlying race condition stays in your product. Treat a retry as a prompt to investigate, not a fix.
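If you do allow retries, one option is to make them report what they absorbed instead of hiding it. The sketch below assumes a hypothetical `run_with_retries` helper that flags any test needing more than one attempt; the names and the result format are invented for illustration.

```python
from typing import Callable, Dict, List


def run_with_retries(test_fn: Callable[[], None], max_attempts: int = 3) -> Dict[str, object]:
    """Retry a test, but record every failed attempt instead of discarding it."""
    errors: List[str] = []
    for attempt in range(1, max_attempts + 1):
        try:
            test_fn()
            # A pass after a failure is marked flaky so it can be investigated.
            return {"passed": True, "attempts": attempt, "flaky": attempt > 1, "errors": errors}
        except AssertionError as exc:
            errors.append(str(exc))
    return {"passed": False, "attempts": max_attempts, "flaky": False, "errors": errors}


# Hypothetical stand-in for a test that loses a race on its first attempt.
attempts_seen = {"count": 0}


def passes_on_second_try() -> None:
    attempts_seen["count"] += 1
    assert attempts_seen["count"] >= 2, "simulated race condition"


result = run_with_retries(passes_on_second_try)
print(result)
```

The build can still go green, but the `flaky` flag and the saved error messages give you a list of tests to investigate.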

Make reliability a habit

Flakiness creeps back as an app grows, so the real win is cultural: a clean reproduction before every fix, a trace for every failure, and a rule that a red build always means something. We lay out a fuller version of this in our blog post on why flaky tests destroy developer trust.

Sustaining this across a large suite is hard, and it is exactly what a managed QA team does full-time. QA Guardian builds tests that are deterministic by design and investigates every failure, so your suite stays trustworthy as you ship. Book a demo to see what a zero-flake suite looks like.

Frequently asked questions

How do I find the root cause of a flaky test?

Run the test repeatedly to reproduce the failure, then capture a trace or video of a failing run. Reviewing the exact state of the page at the moment of failure usually reveals whether the cause is timing, shared state, a brittle selector, or an external dependency.

Should I delete a flaky test?

Only as a last resort. A flaky test usually points at a real reliability problem worth fixing. If needed, quarantine it so it does not block your pipeline, but investigate and repair it rather than deleting the coverage entirely.

How do you quarantine a flaky test?

Tag the test with a flaky marker or move it to a separate CI job that does not block the merge queue. This keeps the flaky test from delaying your team and preserves its coverage while you investigate the root cause.
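The tagging approach boils down to splitting the suite into a blocking set and a quarantined set. A minimal sketch, assuming tests are described by a name and a set of tags (the `split_quarantined` helper and the sample suite are hypothetical); in practice your runner's own tag or marker filtering would do this job.

```python
from typing import Dict, List, Set, Tuple


def split_quarantined(
    tests: Dict[str, Set[str]], marker: str = "flaky"
) -> Tuple[List[str], List[str]]:
    """Split test names into (blocking, quarantined) based on a tag."""
    blocking = sorted(name for name, tags in tests.items() if marker not in tags)
    quarantined = sorted(name for name, tags in tests.items() if marker in tags)
    return blocking, quarantined


# Hypothetical suite: test names mapped to their tags.
suite = {
    "test_login": {"smoke"},
    "test_checkout": {"flaky", "payments"},
    "test_search": set(),
}

blocking, quarantined = split_quarantined(suite)
print("blocking job:", blocking)
print("non-blocking job:", quarantined)
```

The quarantined tests still run and still report results. They just no longer decide whether a merge is allowed.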

Tags

flaky tests, test reliability, debugging, Playwright
