Recovery time objective (RTO) and recovery point objective (RPO) are the two numbers that decide whether your client's business survives the next outage - and whether your MSP keeps the contract. RTO is how long you have to bring systems back online after an incident. RPO is how much data the client can afford to lose, measured in time. Get them wrong by ten minutes and you've either overspent on backup infrastructure or written a contract you cannot honor at 3 a.m. on a Sunday.
In this guide, you'll see how the two metrics relate, how to set them by client tier without overbuilding, how to translate them into the right backup architecture, and how to turn the recovery conversation into a tier-pricing engine that lifts MRR.
The framing here pulls from MSP backup design patterns published by Veeam, Rubrik, Acronis, and Druva, and from a decade of SMB recovery work where the gap between contracted and measured recovery times has cost MSPs real money.
What RTO and RPO Mean
RTO answers a single question: from the moment a system goes down, how long until it's back up and serving users? If a client's RTO for their accounting server is four hours, you have four hours to restore from backup, fail over to a warm standby, or rebuild the box. Miss it and the client's SLA breach clock starts.
RPO answers a different question: at the moment of failure, how far back in time is the most recent recoverable copy of the data? If the RPO is one hour, your backup process needs to capture changes at least every 60 minutes. A four-hour RPO means the client accepts losing up to four hours of work in a recovery scenario.
The two numbers point in opposite directions on the timeline. RPO sits before the incident - it's about backup frequency. RTO sits after the incident - it's about restore speed. Confusing them is the most common mistake on MSP intake calls, and it leads to clients who think a daily backup is fine for systems that need fifteen-minute snapshots.
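The "RPO is backup frequency" relationship has one wrinkle worth making concrete: the worst-case data loss is the backup interval plus the time the backup job itself takes to run. A minimal Python sketch (names and numbers are illustrative, not from any vendor's tooling):

```python
from datetime import timedelta

def worst_case_data_loss(backup_interval: timedelta,
                         backup_duration: timedelta) -> timedelta:
    # The newest usable recovery point can be a full interval old,
    # plus the time the backup job itself needs to complete.
    return backup_interval + backup_duration

# A nightly job that runs for 2 hours puts up to 26 hours of work at risk,
# which is why "daily backup" does not equal "24-hour RPO".
loss = worst_case_data_loss(timedelta(hours=24), timedelta(hours=2))
print(loss)
```

The takeaway: quote the client an RPO that accounts for job duration, not just the schedule.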
Veeam, Rubrik, Acronis, and Druva all publish their own definitions, and they all agree on the basics. Where vendors differ is on how aggressively they push you toward continuous data protection, which collapses RPO toward zero but costs more than most SMBs will pay.
Why the Difference Matters for MSPs
Pricing comes from this distinction. A client who needs a 15-minute RPO needs near-continuous replication, which means more storage, more bandwidth, and more monitoring overhead. A client who can tolerate a 24-hour RPO can run on nightly image backups and standard agent licensing. The cost delta between those two tiers is 4x to 10x on the infrastructure side, and most MSPs charge a flat backup fee that does not reflect the real difference.
The other reason this matters: liability. If your contract promises a one-hour RTO and your backup product needs the better part of a day to restore a 500GB SQL database over a 100 Mbps link, you signed a check you cannot cash. Plenty of MSPs have lost clients - and faced legal action - over recovery targets that looked reasonable on paper and were impossible in practice. The contract did not match the physics.

The third reason is renewal economics. Clients renew when the cost of switching outweighs the cost of staying, and the moment of truth is whenever something breaks. An MSP that has documented, tested, and right-sized recovery targets walks into renewal meetings with proof of value. An MSP that has paper SLAs and no test record walks into renewal meetings hoping the client forgot about last quarter's six-hour outage.
How to Set RTO and RPO by Client Tier
The right way to set these numbers is to start with the client's revenue exposure, not their wishlist. Ask: for every hour this system is down, how much money does the business lose? For every hour of data they lose, what's the rework cost? Those answers anchor the conversation in business terms instead of IT terms.
Here's a tiering model that works for most SMB MSPs:
| Tier | Client profile | Typical RTO | Typical RPO | Backup architecture |
|---|---|---|---|---|
| Tier 1 - Critical | E-commerce, healthcare, financial, 24/7 ops | 1 hour | 15 minutes | Continuous replication + warm DR site |
| Tier 2 - Important | Professional services, B2B SaaS, 8/5 ops | 4 hours | 1 hour | Hourly snapshots + cloud failover |
| Tier 3 - Standard | Office productivity, internal tools | 8-12 hours | 4 hours | Image-based backup, 4x daily |
| Tier 4 - Tolerant | Archives, legacy systems, low-touch | 24-48 hours | 24 hours | Nightly image, single offsite copy |
Two things to call out. First, the tiers are not based on technology - they're based on revenue impact. A small accounting firm might have a Tier 1 system (their practice management database during tax season) and a Tier 4 system (their shared marketing folder) under the same roof. Per-system tiering is the only honest way to price.
Second, the architecture column changes the math. Most MSPs reflexively quote Tier 2 service for every system, which over-serves the tolerant systems and under-serves the critical ones. A tiered intake conversation forces the client to articulate which systems they cannot live without, and that conversation is where the SLA upsell lives.
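Per-system tiering is easy to encode and hard to fake. A sketch of how the four-tier table above might live in an MSP's documentation tooling (the tier numbers follow the table; the system names and the `Tier` structure are hypothetical):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tier:
    name: str
    rto_hours: float   # maximum time to restore service
    rpo_hours: float   # maximum tolerated data loss

# Encoding of the four-tier table above (Tier 3 uses the 12-hour end).
TIERS = {
    1: Tier("Critical",  rto_hours=1,  rpo_hours=0.25),
    2: Tier("Important", rto_hours=4,  rpo_hours=1),
    3: Tier("Standard",  rto_hours=12, rpo_hours=4),
    4: Tier("Tolerant",  rto_hours=48, rpo_hours=24),
}

# Per-system tiering: one client, different tiers under the same roof,
# like the accounting firm example above.
systems = {
    "practice-mgmt-db": TIERS[1],
    "marketing-share":  TIERS[4],
}
for name, tier in systems.items():
    print(f"{name}: RTO {tier.rto_hours}h / RPO {tier.rpo_hours}h")
```

Once each system carries its own tier, pricing and architecture decisions fall out of a lookup instead of a negotiation.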
The Numbers Most MSPs Get Wrong
Three recurring mistakes show up in MSP backup designs. Each one breaks recovery targets the first time the plan is tested.
The first is bandwidth math. A 500GB virtual machine restored over a 100 Mbps WAN link takes roughly 11 hours at theoretical maximum, and real-world conditions cut that throughput in half. If the client's RTO is four hours and the restore path is the WAN, the math does not work. The fix is local restore appliances, image-based recovery at the hypervisor layer, or pre-staged replicas. None of that gets discussed if the only number on the contract is "RTO: 4 hours."
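The bandwidth math above is worth running for every large system on the contract. A quick sketch (the 50% efficiency figure mirrors the "real-world conditions cut that throughput in half" rule of thumb, not a measured constant):

```python
def restore_hours(data_gb: float, link_mbps: float,
                  efficiency: float = 0.5) -> float:
    """Estimated wall-clock hours to pull data_gb back over a WAN link.
    efficiency=1.0 is the theoretical line rate; 0.5 approximates
    real-world throughput loss."""
    seconds = (data_gb * 8_000) / (link_mbps * efficiency)
    return seconds / 3600

# 500 GB over 100 Mbps: ~11 hours at line rate, ~22 hours in practice.
print(round(restore_hours(500, 100, efficiency=1.0), 1))  # 11.1
print(round(restore_hours(500, 100), 1))                  # 22.2
```

If the result exceeds the contracted RTO, the restore path has to change before the contract is signed, not after the outage.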
The second is snapshot retention versus RPO. A vendor that takes a snapshot every 15 minutes does not guarantee a 15-minute RPO if snapshots are pruned after 24 hours and the corruption is only discovered on day three. RPO has a retention dimension that gets ignored. Build retention policies that match the worst-case discovery delay for each client, usually 30 to 60 days for SMBs.
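The retention dimension reduces to one comparison that most backup designs never write down. A minimal check, with the day-three corruption scenario from above (the function and its parameters are illustrative):

```python
def rpo_actually_met(snapshot_interval_min: int,
                     retention_days: float,
                     discovery_delay_days: float) -> bool:
    """A tight snapshot cadence only delivers its nominal RPO if a
    clean copy still exists when the damage is discovered. If snapshots
    are pruned before the worst-case discovery delay, the effective
    RPO collapses to whatever older copy survives."""
    return retention_days >= discovery_delay_days

# 15-minute snapshots, 1-day retention, corruption found on day 3:
# the nominal 15-minute RPO is fiction.
print(rpo_actually_met(15, retention_days=1, discovery_delay_days=3))
```

Set `discovery_delay_days` to the worst case per client, which is why the text lands on 30 to 60 days of retention for SMBs.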
The third is the dependency chain. Restoring a SQL Server is meaningless if the application server, DNS, Active Directory, and the firewall config aren't restored in the right order. MSP runbooks routinely state an RTO per system without acknowledging that systems have prerequisites. The honest RTO is the slowest path through the dependency graph, not the fastest standalone restore.
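The "slowest path through the dependency graph" is computable once the prerequisites are documented. A sketch with hypothetical restore times, assuming restores along a chain run sequentially:

```python
# Hypothetical standalone restore times per system, in hours.
standalone_hours = {
    "firewall": 0.5, "dns": 0.5, "active-directory": 1.5,
    "sql-server": 2.0, "app-server": 1.0,
}
# Prerequisites: a system cannot come back until these are up.
depends_on = {
    "dns": ["firewall"],
    "active-directory": ["dns"],
    "sql-server": ["active-directory"],
    "app-server": ["sql-server"],
}

def honest_rto(system: str) -> float:
    """A system's real RTO is its own restore time plus the slowest
    path through its prerequisites, not its standalone restore time."""
    prereqs = depends_on.get(system, [])
    return standalone_hours[system] + max(
        (honest_rto(p) for p in prereqs), default=0.0)

# The app server restores in 1 hour alone, but 5.5 hours honestly.
print(honest_rto("app-server"))  # 5.5
```

The gap between the standalone number and the honest number is exactly the gap between a paper SLA and a tested one.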
Translating RTO and RPO Into Backup Architecture
Once the tiers are set, the architecture choices fall out of them. There are five common patterns, and each maps to a specific RTO and RPO range.
Local-only backup hits an RPO of 4 to 24 hours and an RTO of 4 to 24 hours, depending on data volume. It's cheap, it's fast for small restores, and it dies completely if the site dies. Use it as the first tier of defense, never the only tier.
Cloud backup with no local cache adds offsite protection at the cost of restore speed. RPO stays in the 1 to 24 hour range. RTO climbs to 8 to 48 hours for anything large because the data has to come back over the internet. Fine for archives, dangerous for production.
Hybrid local-plus-cloud is where most MSPs land for Tier 2 clients. Local appliance for fast restores, cloud for offsite. RPO of 1 to 4 hours, RTO of 2 to 8 hours. Datto, Veeam, and Acronis all sell this pattern, and it covers the majority of SMB use cases without exotic engineering.
Replication with warm standby pushes RPO under one hour and RTO under one hour. The cost is roughly double a hybrid setup because you're paying for a running secondary environment. Reserve this for Tier 1 systems with quantifiable downtime cost.
Continuous data protection (CDP) drives RPO toward seconds and RTO toward minutes. It's expensive, it's bandwidth-hungry, and most SMBs do not need it. When a client demands "zero data loss," walk them through the cost. The conversation usually ends with a four-hour RPO.
The architecture should be documented per-system, not per-client. A single SMB office often runs three or four of these patterns simultaneously, and the documentation matters more than the technology when the recovery happens at 2 a.m.
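The five patterns above can be matched against a system's targets mechanically. A sketch that filters patterns by worst-case RTO/RPO, using the ranges from the text (the dictionary encoding and the helper function are illustrative, not vendor data):

```python
# (low, high) hour ranges per pattern, taken from the section above.
PATTERNS = {
    "local-only":     {"rpo": (4, 24),   "rto": (4, 24)},
    "cloud-no-cache": {"rpo": (1, 24),   "rto": (8, 48)},
    "hybrid":         {"rpo": (1, 4),    "rto": (2, 8)},
    "warm-standby":   {"rpo": (0, 1),    "rto": (0, 1)},
    "cdp":            {"rpo": (0, 0.1),  "rto": (0, 0.25)},
}

def patterns_meeting(rto_hours: float, rpo_hours: float) -> list[str]:
    """Patterns whose worst-case range still fits inside the targets."""
    return [name for name, p in PATTERNS.items()
            if p["rto"][1] <= rto_hours and p["rpo"][1] <= rpo_hours]

# A 4-hour RTO / 1-hour RPO guarantee rules out everything but the
# expensive end of the menu; hybrid only qualifies at the 8-hour RTO.
print(patterns_meeting(4, 1))
print(patterns_meeting(8, 4))
```

Running this per system during intake makes the price step between tiers visible before the client commits.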
How to Sell Realistic Recovery Targets to SMB Clients
The hardest part of this conversation is not technical; it's commercial. SMB clients ask for one-hour RTOs because the number sounds reassuring, then balk at the price. MSPs cave and quote an aggressive RTO at a mid-tier price, then live with the gap until something breaks.
The fix is to anchor the conversation in dollars before you anchor it in minutes. Ask the client what an hour of downtime costs their business. Ask them what an hour of lost work costs across the team. Multiply those numbers by a plausible incident frequency (one major incident per year is a defensible starting point) and you have an annual downtime budget. Now match the recovery investment to the budget.
A client whose downtime costs $500 an hour does not need a $40,000 replication setup. A client whose downtime costs $50,000 an hour cannot afford an 8-hour RTO. The numbers force the right answer. If you want a deeper take on this kind of cost-driven conversation, our piece on how to reduce IT costs walks through the framework MSPs are using to reframe price discussions with SMB owners.
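The downtime-budget arithmetic fits in a few lines, which makes it easy to put on screen during the intake call. A sketch using the two example clients above (the incident figures are illustrative; plug in the client's real numbers):

```python
def annual_downtime_budget(downtime_cost_per_hour: float,
                           expected_hours_down_per_year: float,
                           rework_cost_per_hour: float = 0.0,
                           expected_hours_lost_per_year: float = 0.0) -> float:
    """Annualized exposure: what incidents cost the client per year.
    Recovery spend far above this number is over-engineering;
    spend far below it is under-insurance."""
    return (downtime_cost_per_hour * expected_hours_down_per_year
            + rework_cost_per_hour * expected_hours_lost_per_year)

# $500/hour client, one 8-hour incident a year: a $40,000 replication
# setup is 10x their annual exposure.
print(annual_downtime_budget(500, 8))      # 4000.0
# $50,000/hour client: a single 8-hour outage costs $400,000.
print(annual_downtime_budget(50_000, 8))   # 400000.0
```

The output does the selling: the number either justifies the tier or it doesn't.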
The tier table also doubles as a sales tool. Show clients the four tiers, ask which one each system belongs in, and let them see the price step between Tier 3 and Tier 2. When the client picks the tier, they own the trade-off. That single shift, from MSP recommendation to client selection, resolves most contract disputes before they happen. The same logic applies when an MSP runs a stack review and finds tooling that's not earning its keep, a process we break down in our MSP stack audit guide.
Where OpenFrame Fits in the Recovery Stack
OpenFrame is Flamingo's AI-native all-in-one MSP and IT platform. It ships native RMM, native PSA, documentation, and AI workflow agents in a single pane, with no vendor lock-in and pricing that scales by endpoint instead of by module. For recovery work, the value is that the same tool that monitors backup job health also handles the ticket, the documentation, and the runbook automation when a recovery fires.
Most MSPs run their backup product, their RMM, their PSA, and their documentation tool as four disconnected systems. The recovery workflow crosses all four. When a SQL job fails, the alert lives in the backup tool, the ticket lives in the PSA, the runbook lives in the documentation tool, and the affected endpoints are listed in the RMM. OpenFrame collapses that into one workflow with AI-generated incident summaries and dependency mapping that respects the RTO order of operations. MSPs evaluating their current monitoring layer as part of a recovery review usually start with our RMM tools comparison, which lays out the field by price, automation depth, and integration footprint.
Frequently Asked Questions
What is the difference between RTO and RPO in plain English?
RTO is the maximum time a system can be down before the business takes serious damage. RPO is the maximum amount of recent data the business can afford to lose, measured in time. RTO is about restore speed; RPO is about backup frequency. They describe the same incident from opposite sides of the failure moment.
Can RTO and RPO be the same value?
Yes, and it happens often. A client with a four-hour RTO and a four-hour RPO is running hourly-ish backups and committing to bring systems back within four hours. The two numbers are independent variables, though, and pegging them together is a design choice, not a requirement. Critical systems usually need a tighter RPO than RTO.
Who is responsible for defining RTO and RPO, the MSP or the client?
The client owns the decision because the trade-off is commercial, not technical. The MSP's job is to surface the cost curve, present the architecture options for each tier, and document the choice. If the MSP picks the numbers without the client's input, the MSP eats the cost of any mismatch.
What is a realistic RPO for SMB email and productivity?
For Microsoft 365 and Google Workspace, a one-to-four hour RPO is standard with third-party backup tools like Veeam, Acronis, or Druva. Native Microsoft retention is not a backup. It covers user error within a 30-day window but does not protect against admin compromise or ransomware that targets the tenant.
How often should we test recovery to validate RTO and RPO?
Quarterly at minimum, monthly for Tier 1 systems. A recovery target that has never been tested is a guess. Tests should include full restores of representative workloads, not just file-level spot checks. Document the measured RTO and compare it to the contracted RTO every quarter.
Do RTO and RPO apply to ransomware scenarios?
They do, but the math changes. Ransomware introduces an unknown discovery delay. The malware may be dormant for weeks before detonation. That means RPO retention has to cover the worst-case dwell time, often 60 to 90 days, with immutable copies. Standard 30-day rolling retention is not enough for modern ransomware response.
The Recovery Number That Pays the Mortgage
Recovery targets are not a backup conversation. They're a contract conversation, a pricing conversation, and a risk-allocation conversation, and the MSPs that win the next decade will run those conversations on purpose instead of stumbling into them when a server dies. Every client of every MSP has an RTO and an RPO. The only question is whether they're written down or whether they only exist in the heads of the people who will be cleaning up at 3 a.m. Write them down. Tier them. Test them. Price them like the SLA commitments they are. The MSPs doing that today are charging two to three times what their peers charge for the same backup product, and their renewal rates are higher, not lower, because the contract finally matches reality.
Kristina Shkriabina
Kristina runs content, SEO, and community at Flamingo and OpenMSP. She spent years as a correspondent for Ukraine's Public Broadcasting Company before making the jump to tech. Now she covers MSP stack decisions and strategy. You can connect with her in the OpenMSP community or on LinkedIn.
