Disaster Recovery Testing: How Often to Run a Drill

Your backups ran green every night for a year. Then ransomware lands on a Tuesday morning, you go to restore, and the recovery fails halfway through. Disaster recovery testing exists to catch that failure before a real incident does. An untested recovery plan is a document nobody has confirmed actually works. This guide covers how often you should run a drill, the methods worth using, and how to turn your recovery plan into something you can lean on.

What disaster recovery testing actually proves

A disaster recovery plan describes what should happen when something goes badly wrong: a server fails, a building floods, a supplier outage takes down your line-of-business app, or attackers encrypt your data. Disaster recovery testing is the practice of rehearsing that plan under controlled conditions, so you confirm it works before a real incident forces the question. A plan tells you what you intend to do. A test tells you what will actually happen.

Most failures we find in the field are quiet ones. A backup job had been skipping one critical database for months. The recovery runbook named a staff member who left last year. The restore worked, but it took eleven hours when the business assumed two. None of these surface until you run the test. Good disaster recovery testing measures the two numbers every business owner cares about: how long it takes to get running again, and how much data you lose along the way.

Know your two targets: RTO and RPO

Before you test anything, agree on two numbers. They define what success looks like and stop a drill from drifting into vague tinkering.

•Recovery Time Objective (RTO) is the longest you can tolerate being down. If your accounts team cannot raise invoices for four hours, you may survive it. Four days might sink the quarter.
•Recovery Point Objective (RPO) is the most data you can afford to lose, measured in time. An RPO of one hour means you accept losing, at most, the last hour of work. Nightly backups mean you could lose up to 24 hours of work, which catches a lot of business owners off guard.
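A quick worked example shows why nightly backups stretch your RPO. The Python sketch below assumes you can list the times your backup jobs actually completed successfully. The function name `data_loss_window` and the sample dates are hypothetical. In an incident, you lose everything written since the newest good backup taken before it.

```python
from datetime import datetime, timedelta

def data_loss_window(successful_backups: list[datetime], incident: datetime) -> timedelta:
    """Work lost if you restore from the newest successful backup before the incident."""
    earlier = [b for b in successful_backups if b <= incident]
    if not earlier:
        raise ValueError("no successful backup exists before the incident")
    return incident - max(earlier)

# Hypothetical nightly backups at 01:00; the incident hits at 23:30 on 3 March.
nightly = [datetime(2024, 3, d, 1, 0) for d in (1, 2, 3)]
incident = datetime(2024, 3, 3, 23, 30)

print(data_loss_window(nightly, incident))      # 22:30:00
# If the 3 March job silently failed, the restore point slips back a full day.
print(data_loss_window(nightly[:2], incident))  # 1 day, 22:30:00
```

A nightly schedule leaves close to a day of exposure in the worst case. Every silently failed job adds another full day on top.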

Score every drill against these targets. A recovery that works but blows past your RTO is still a failure you need to fix. You set realistic RTO and RPO figures as part of broader IT strategy and planning, because they are business decisions as much as technical ones.
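Scoring a drill can be as simple as two comparisons. This Python sketch is illustrative only. `DrillTargets` and `score_drill` are hypothetical names. The figures reuse the eleven-hour restore mentioned earlier and test it against an assumed four-hour RTO and one-hour RPO.

```python
from dataclasses import dataclass
from datetime import timedelta

@dataclass
class DrillTargets:
    rto: timedelta  # longest outage the business agreed it can tolerate
    rpo: timedelta  # most data, measured in time, it agreed it can lose

def score_drill(targets: DrillTargets, downtime: timedelta, data_lost: timedelta) -> dict[str, bool]:
    """Compare measured drill results with the agreed targets."""
    rto_met = downtime <= targets.rto
    rpo_met = data_lost <= targets.rpo
    # A restore that technically works still fails the drill if either target is missed.
    return {"rto_met": rto_met, "rpo_met": rpo_met, "passed": rto_met and rpo_met}

# Hypothetical targets: 4-hour RTO, 1-hour RPO. The drill restore took 11 hours
# and came from a backup taken 20 hours before the simulated incident.
targets = DrillTargets(rto=timedelta(hours=4), rpo=timedelta(hours=1))
print(score_drill(targets, downtime=timedelta(hours=11), data_lost=timedelta(hours=20)))
# {'rto_met': False, 'rpo_met': False, 'passed': False}
```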

The main types of disaster recovery testing

Disaster recovery testing methods sit on a ladder from low effort to high fidelity. You do not pick just one; you rotate through them across the year. Knowing the types of disaster recovery testing lets you match effort to risk.

Tabletop exercise

A discussion-based walkthrough. Your team sits in a room (or on a call) and talks through a scenario step by step: who does what, who they call, where the runbook lives. It costs an hour and exposes gaps in communication and assumptions. Start your disaster recovery plan testing here if you have never run a drill before.

Restore and recovery testing

You restore data and systems from your backups into an isolated environment and confirm they come back clean and usable. This is the single most valuable test, because it validates the thing most businesses get wrong: that backups are recoverable, not just present.
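Checksums are one way to show that restored files are intact rather than merely present. The Python sketch below assumes a hypothetical manifest, captured at backup time, that maps each file's relative path to its SHA-256 digest. Your backup tool may record integrity data differently, or not at all. A matching hash proves the bytes came back unchanged. It does not prove the application can use them, so you still need to open the files and run a transaction during the test.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large restores do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify_restore(restore_root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Check each manifest entry exists under restore_root with the expected hash."""
    missing, corrupt = [], []
    for rel_path, expected in manifest.items():
        restored = restore_root / rel_path
        if not restored.is_file():
            missing.append(rel_path)
        elif sha256_of(restored) != expected:
            corrupt.append(rel_path)
    return {"missing": missing, "corrupt": corrupt}

# Demo: a throwaway directory stands in for the isolated restore target.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "invoices.csv").write_bytes(b"id,amount\n1,100\n")
    (root / "ledger.csv").write_bytes(b"truncated")  # restored, but damaged
    manifest = {
        "invoices.csv": hashlib.sha256(b"id,amount\n1,100\n").hexdigest(),
        "ledger.csv": hashlib.sha256(b"id,balance\n1,500\n").hexdigest(),
        "payroll.db": "0" * 64,  # listed in the manifest but never restored
    }
    result = verify_restore(root, manifest)

print(result)  # {'missing': ['payroll.db'], 'corrupt': ['ledger.csv']}
```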

Disaster recovery failover testing

Disaster recovery failover testing cuts a live workload over to your secondary or cloud environment, runs it there, then fails back. This is the gold standard for anything mission critical, and modern cloud and Microsoft 365 management makes it far easier than the old physical-site model.

Full simulation

An unannounced, end-to-end drill that treats a scenario as if it were real, including staff, communications and customer-facing impact. High effort, high realism. Reserve it for organisations where downtime carries serious cost or regulatory weight.

How often should you run a disaster recovery test?

Frequency should track risk, so there is no single answer. Here is a sensible baseline for a typical Sydney SMB. It forms the backbone of good disaster recovery testing practice.

•Monthly: automated restore checks of a sample of backups. These run hands-off and verify that recent backups can be read and restored.
•Quarterly: a restore test of a critical system into an isolated environment, scored against your RTO and RPO.
•At least annually: a full disaster recovery failover test plus a tabletop exercise covering a realistic scenario such as ransomware or an extended cloud outage.
•On change: any time you migrate a server, change backup tooling, adopt a new line-of-business app, or restructure the team, retest the affected part. Recovery plans rot fastest right after a change.
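If you track drills in a spreadsheet, the due-date logic is easy to automate. This Python sketch rests on several assumptions. The drill names, the 31, 92 and 365 day intervals, and the `drills_due` function are all hypothetical stand-ins for the baseline above. The sketch also treats any significant change as making every earlier result stale. That is deliberately conservative; in practice you would retest only the affected part.

```python
from datetime import date, timedelta
from typing import Optional

# Hypothetical intervals approximating the monthly / quarterly / annual baseline.
CADENCE = {
    "automated restore check": timedelta(days=31),
    "critical-system restore test": timedelta(days=92),
    "failover test and tabletop": timedelta(days=365),
}

def drills_due(last_run: dict[str, date], today: date,
               last_change: Optional[date] = None) -> dict[str, str]:
    """Return each drill that is due, mapped to the reason it is due."""
    due = {}
    for drill, interval in CADENCE.items():
        ran = last_run.get(drill)
        if ran is None:
            due[drill] = "never run"
        elif today - ran >= interval:
            due[drill] = f"last run {(today - ran).days} days ago"
        elif last_change is not None and last_change > ran:
            # Conservative assumption: a significant change makes earlier results stale.
            due[drill] = "significant change since last run"
    return due

history = {
    "automated restore check": date(2024, 7, 1),
    "critical-system restore test": date(2024, 3, 1),
    # the annual failover test has never been run
}
print(drills_due(history, today=date(2024, 7, 15), last_change=date(2024, 7, 8)))
# {'automated restore check': 'significant change since last run',
#  'critical-system restore test': 'last run 136 days ago',
#  'failover test and tabletop': 'never run'}
```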

The Australian Cyber Security Centre's Essential Eight treats regular backups and tested restoration as core controls, and higher maturity levels expect you to test restoration in a coordinated, documented way. If you carry cyber insurance, your insurer increasingly asks whether you test your recovery, not just whether backups exist. Treat annual testing as the floor.

A backup you have never restored is a hope, not a recovery plan. The only backup that counts is the one you have proven you can bring back.

How to run a drill: a disaster recovery testing checklist

A drill is a deliberate, repeatable exercise with a clear scope and a written outcome. Use this disaster recovery testing checklist to structure any test, from a simple restore to a full failover.

•Define the scenario. Pick one realistic event, say a ransomware encryption of the file server, and write down exactly what you are simulating.
•Set the success criteria. State the target RTO and RPO for the systems in scope so you have something to measure against.
•Assign roles. Name who leads, who restores, who communicates, and who signs off. Check those people are available on the day.
•Use an isolated environment. Never test a restore over the top of production. Recover into a sandbox so a mistake during the drill cannot cause a real outage.
•Run the recovery and time it. Follow the runbook exactly as written. If a step is wrong or missing, note it rather than fixing it from memory.
•Validate the result. Confirm the recovered system works: log in, open files, run a transaction, check data integrity. A system that boots but is corrupt is not recovered.
•Record and remediate. Write down what worked, what failed, the actual times you hit, and every gap you found. Assign owners and due dates, then fix them before the next drill.

That last step is the one most businesses skip, and the cost of skipping it compounds from drill to drill. A drill that finds five problems only pays off when you close all five. Run disaster recovery testing scenarios repeatedly to drive the failure count down until recovery becomes boringly predictable.
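Closing the loop mostly means not losing track of findings. The Python sketch below is a hypothetical illustration, not a feature of any particular ticketing tool. Each `Finding` carries an owner and a due date. The `review_findings` function lists what is still open and what is overdue on a given review date, so you can chase those items before the next drill. The findings and owners are invented examples drawn from the failure types described earlier.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class Finding:
    description: str
    owner: str
    due: date
    closed_on: Optional[date] = None  # None while the finding is still open

def review_findings(findings: list[Finding], as_of: date) -> dict[str, list[str]]:
    """List findings still open on the review date, and which of those are past due."""
    still_open = [f for f in findings if f.closed_on is None or f.closed_on > as_of]
    overdue = [f for f in still_open if f.due < as_of]
    return {
        "open": [f.description for f in still_open],
        "overdue": [f"{f.description} (owner: {f.owner})" for f in overdue],
    }

# Hypothetical findings from the last drill, reviewed ahead of the next one.
findings = [
    Finding("Backup job skips a critical database", "IT lead", date(2024, 7, 1),
            closed_on=date(2024, 6, 28)),
    Finding("Runbook names a departed staff member", "Office manager", date(2024, 7, 5)),
    Finding("Restore took 11 hours against a 4-hour RTO", "IT lead", date(2024, 7, 31)),
]
report = review_findings(findings, as_of=date(2024, 7, 15))
print(report)
```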

Heads up

Always test restoration into an isolated environment, never straight over production. We have watched well-meaning drills cause real outages when a restore overwrote live data, or when someone triggered a failover with no tested way to fail back. Treat the test as a change that needs its own rollback plan. If you are unsure whether your setup is safe to test, get a second pair of eyes before you start.

Common pitfalls that make a drill worthless

A test can pass on paper and still leave you exposed. Watch for these traps, which we run into often when we review existing setups for Sydney businesses.

•Testing only the easy systems. If you keep restoring the same small, simple database, you never learn whether your complex line-of-business app comes back. Rotate the scope across drills.
•Ignoring dependencies. A restored app server is useless if the authentication, DNS or database it relies on stays down. Test systems the way they actually depend on one another.
•Forgetting the people. If the one person who knows the recovery steps is on leave during a real incident, the plan fails. Cross-train your team and keep the runbook somewhere you can reach when systems are down.
•Assuming a backup means immunity to ransomware. Modern attackers go for the backups first. Confirm you hold an offline or immutable copy an attacker cannot encrypt, and that it restores cleanly.
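Working out a safe restore order from dependencies is a topological sort. The Python sketch below uses the standard library's `graphlib.TopologicalSorter` (Python 3.9 or later). The system names and the dependency map are hypothetical, so substitute your own. The sketch groups systems into waves. Every system in a wave depends only on systems restored in earlier waves, so systems within a wave can be brought back in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each system lists what must be running before it.
DEPENDS_ON = {
    "dns": set(),
    "authentication": {"dns"},
    "database": {"dns", "authentication"},
    "app-server": {"database", "authentication"},
    "file-server": {"authentication"},
}

def restore_waves(depends_on: dict[str, set[str]]) -> list[list[str]]:
    """Group systems into restore waves that respect their dependencies."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a loop
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are restored
        waves.append(ready)
        sorter.done(*ready)
    return waves

print(restore_waves(DEPENDS_ON))
# [['dns'], ['authentication'], ['database', 'file-server'], ['app-server']]
```

If the map contains a loop, `prepare()` raises `graphlib.CycleError`. That is itself a useful finding: two systems that each need the other to start cannot be recovered without a documented workaround.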

Recovery sits inside your wider security posture, so a recovery drill also lets you check that your defences and your cyber security controls hold up under a realistic attack, not just a tidy hardware failure.

Where a managed IT partner fits

For most small and medium businesses, the barrier to regular testing is not willingness. It is time, plus the isolated infrastructure you need to test safely. This is where an MSP earns its place. As part of managed IT support, a partner schedules and runs the testing rotation, keeps the runbooks current, scores each drill against your RTO and RPO, and reports the results in plain language you can act on. An SLA often backs this commitment, so testing actually happens rather than slipping down the list.

What you buy is confidence: a documented, recently proven ability to recover within a known timeframe with a known amount of data loss. That separates a recovery plan that reassures your board from one that sits in a folder nobody has opened since the day it was written.

This article reflects best practices as of the publication date. Technology and security recommendations evolve, so verify current guidance with the original sources or our team before acting.

Frequently Asked Questions

What is disaster recovery testing?

Disaster recovery testing is the practice of rehearsing your recovery plan under controlled conditions to confirm it works before a real incident. You validate that backups restore cleanly, that systems come back within your target recovery time, and that data loss stays within an acceptable window. A plan describes intent; the test proves the outcome.

How often should you run a disaster recovery test?

A sensible baseline for an SMB is automated restore checks monthly, a critical-system restore test quarterly, and a full failover test plus a tabletop exercise at least annually. Retest whenever you make a significant change, such as migrating a server or swapping backup tooling, because recovery plans break most often right after a change.

What are the main types of disaster recovery testing?

The main methods, from lowest to highest effort, are: tabletop exercises (a discussion-based walkthrough of the plan), restore and recovery testing (restoring data into an isolated environment), disaster recovery failover testing (cutting a live workload over to a secondary or cloud environment and back), and full simulation (an unannounced end-to-end drill). Most businesses rotate through several of these across the year.

What happens if you don't test your disaster recovery plan?

You discover the plan's flaws during a real crisis, the worst possible time. Common surprises include backups that silently skipped a critical system, restores that take far longer than assumed, runbooks that name departed staff, and backups that attackers managed to encrypt. An untested plan also weakens your position on cyber insurance and against Essential Eight expectations, since both increasingly require tested restoration, not just backups.

What are good reasons to do yearly disaster recovery testing?

An annual full test catches the gaps that pile up over a year of changes: new applications, migrated servers, staff turnover and shifting threats. It confirms your end-to-end recovery still meets your RTO and RPO, satisfies Essential Eight and cyber insurance expectations around tested restoration, and gives leadership documented, recent proof that the business can recover. Treat it as the realistic minimum, with more frequent targeted testing layered on top.