OpenShift Virtualization
Reference Architecture
Migrate & Modernize  ·  February 2026
The Environment — Where Most Customers Are Starting
This is an anonymized composite of the environments we consistently encounter. The specific hardware names differ. The problems are nearly identical across every organization we talk to.
20–150 hosts
Across primary datacenter, multiple hardware generations and vendor contracts
200–5,000 VMs
Mix of RHEL, Windows Server, and legacy operating systems — each with different owners
3–9 mo.
Typical renewal or contract pressure window forcing a platform decision now

What the infrastructure looks like today

Compute — Primary DC has servers across multiple hardware generations — some on active support contracts, some not. Capacity planning is manual and reactive.
Storage — VMs live on datastores. Different teams manage storage differently. There is no consistent tiering strategy, and live migration depends on shared storage being configured correctly — it often isn't.
DR — A secondary site exists but is passive and undertested. Backups are agent-based on individual VMs. RTO is measured in hours, not minutes. The last full DR test was over a year ago.
Automation — Operations are runbook-driven. Changes go through manual change management. VM provisioning takes days. Secrets live in shared spreadsheets or in the heads of two administrators.
Containers — A separate Kubernetes cluster exists somewhere, managed by a different team. VMs and containers do not share tooling, monitoring, or a management plane.

What this costs the mission

The issue isn't that the platform is old. The issue is that the platform increasingly dictates what the mission can and cannot do. Velocity slows. Security posture is harder to prove. Every audit requires manual evidence collection. Every new workload requires a negotiation between teams.

The administrators are not the problem. They are experts in a platform that is approaching end of support, end of commercial viability, or both. Their expertise is real and transferable — the goal is not to replace people, it is to give them a platform that keeps pace with the mission.

Common customer questions

"My administrators only know VMware — is this realistic for them?"
"Do I need all new hardware to make this work?"
"I have a renewal in 3–6 months. Does that timeline work?"
"Why do this over a lift-and-shift to cloud or another legacy platform?"
"What is the FedRAMP story here?"
"Can we get hands-on with this before we commit?"
Primary Datacenter — Multi-Availability Zone OpenShift Cluster
Three physically isolated availability zones — separate rooms in the same datacenter. A single OpenShift cluster spans all three, with control plane nodes distributed for HA, and each zone carries independent storage.

Architecture Overview


Multi-AZ HA  ·  Mixed VM workloads  ·  Existing hardware  ·  FedRAMP boundary
Availability Zone A — Physical Room 1
Tier 1 · All-Flash Array (NVMe)
All-Flash Array · NVMe
StorageClass: ontap-nas-t1  ·  Protocol: NFS / iSCSI  ·  RWX enabled → live VM migration
All-Flash Array — HA Pair · NVMe
OpenShift Nodes
control-plane-01 · Control
worker-01 · Worker
worker-04 · Worker
+ additional workers
VM Workloads
RHEL  ·  Third-Party Linux  ·  Windows Server  ·  Windows Desktop  ·  Mixed OS
Tier 2 · Capacity (SAS/SSD)
Capacity Array · SAS / SSD
StorageClass: ontap-nas-t2  ·  Backup target volumes  ·  SnapMirror source
Capacity Array — HA Pair · SAS / SSD
Availability Zone B — Physical Room 2
Tier 1 · All-Flash Array (NVMe)
All-Flash Array · NVMe
StorageClass: ontap-nas-t1  ·  Protocol: NFS / iSCSI  ·  RWX enabled → live VM migration
All-Flash Array — HA Pair · NVMe
OpenShift Nodes
control-plane-02 · Control
worker-02 · Worker
worker-05 · Worker
+ additional workers
VM Workloads
RHEL  ·  Third-Party Linux  ·  Windows Server  ·  Windows Desktop  ·  Legacy Applications
Tier 2 · Capacity (SAS/SSD)
Capacity Array · SAS / SSD
StorageClass: ontap-nas-t2  ·  Backup target volumes  ·  SnapMirror source
Availability Zone C — Physical Room 3
Tier 1 · All-Flash Array (NVMe)
All-Flash Array · NVMe
StorageClass: ontap-nas-t1  ·  Protocol: NFS / iSCSI  ·  RWX enabled → live VM migration
All-Flash Array — HA Pair · NVMe
OpenShift Nodes
control-plane-03 · Control
worker-03 · Worker
worker-06 · Worker
+ additional workers
VM Workloads
RHEL  ·  Third-Party Linux  ·  Windows Server  ·  Windows Desktop  ·  Containerized Apps
Tier 2 · Capacity (SAS/SSD)
Capacity Array · SAS / SSD
StorageClass: ontap-nas-t2  ·  Backup target volumes  ·  SnapMirror source
Capacity Array — HA Pair · SAS / SSD
How it fits in this environment
Conversation anchor
Platform Services
OCP Virtualization
Advanced Cluster Management
NetApp Trident CSI
Veeam Kasten K10
HashiCorp · ArgoCD / GitOps
Ansible Automation Platform
HashiCorp Vault
Elastic Stack
Prometheus / Monitoring
Compute / Platform
Red Hat OpenShift
OCP Virtualization, node sizing, CPU pinning, multi-AZ scheduling, ACM fleet
Storage / CSI
NetApp
Trident CSI, DataVolumes, StorageClass tiering, ReadWriteMany for live migration
DR / Data Protection
Veeam · Kasten K10
K8s-native backup, SnapMirror replication, app-consistent, RPO/RTO planning
Automation / Secrets
HashiCorp
ArgoCD, Ansible AAP, Vault dynamic secrets, IaC-driven VM lifecycle
Observability
Elastic / Carahsoft
Elastic Agent DaemonSet, unified logs + metrics + SIEM, FedRAMP-ready
Key
Control plane node
Worker node
Tier 1 — NetApp All-Flash
Tier 2 — NetApp Capacity
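The tiered StorageClasses shown in the diagram are defined against Trident backends. A minimal sketch of what the Tier 1 class could look like (the provisioner name is Trident's standard one; the backendType and selector values here are illustrative placeholders, not taken from a real configuration):

```yaml
# Hypothetical Tier 1 StorageClass backed by the NetApp Trident CSI driver.
# Backend type and selector label are illustrative placeholders.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ontap-nas-t1
provisioner: csi.trident.netapp.io
parameters:
  backendType: ontap-nas        # NFS backend, so volumes can be RWX
  selector: "tier=flash"        # placeholder storage-pool label
allowVolumeExpansion: true
reclaimPolicy: Delete
```

VM disks requested from this class with ReadWriteMany access are what make live migration between zones possible.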
DR & Cloud Sites — Secondary Datacenter and Cloud Egress
A reduced-footprint secondary site serves as the primary DR and backup target. An optional cloud site provides burst, tertiary DR, or archival egress inside a FedRAMP authorization boundary. These sites are managed as part of the same fleet via ACM.

The Backup Conversation is Different on Kubernetes — and More Flexible

Veeam Kasten K10 protects at three scopes: an individual VM and its attached storage, a full namespace containing multiple VMs and applications, or the entire cluster. You define the policy — Kasten executes it consistently. This is a meaningful upgrade from agent-based backup, which requires per-VM configuration and breaks when VMs move. Alongside Kasten, SnapMirror handles block-level storage replication to the DR site independently — so your data is protected at both the application layer and the storage layer. These two mechanisms together give you something more reliable and more testable than most organizations have today.

Per-VM protection  ·  Namespace-level protection  ·  Full cluster backup  ·  App-consistent restore  ·  SnapMirror storage replication  ·  Testable failover
Secondary Datacenter
Reduced footprint — DR and backup target
Passive / Active DR
OpenShift Nodes
control-plane · Control
worker-01 · Worker
worker-02 · Worker
+ scale out on failover
NetApp — less performant by design, SnapMirror destination
Capacity Array — HA Pair · SAS
Capacity Array · SAS
Object / Archival Target · S3-compatible
Veeam Kasten K10 — backup target & DR restore point
DR Modes
Passive — data replicated, cluster powered down. Manual failover. Lowest cost.
Active — cluster running, workloads live. Automated failover. Higher cost.
Cloud Egress Site
GovCloud or commercial — optional architecture component
Optional
Managed OpenShift
Managed OpenShift Cluster · Cloud-hosted
HashiCorp Vault Enterprise — secrets sync across all sites
Cloud Storage
Object Storage · S3-compatible
Managed Block (CSI) · Cloud-native
Use Cases
Burst — overflow compute for mission demand peaks.
Cloud DR — third site for additional resiliency.
Archive — long-term retention and compliance egress.
ACM provides single-pane fleet management across all three sites. The FedRAMP authorization boundary can encompass primary, DR, and cloud sites when each is configured within scope.
How Recovery Actually Works — Two Workflows Side by Side
NetApp SnapMirror — Storage Failover
Block-level replication · Storage layer recovery · Used for site-wide DR
SnapMirror continuously replicates storage volumes from the primary datacenter to the secondary site at the block level. It is storage-layer protection — independent of what is running on top. In a failover scenario, the secondary volumes are promoted and become read/write. OpenShift at the DR site mounts those volumes and starts workloads.
Step 1
SnapMirror replication running continuously. RPO is minutes, not hours — based on replication schedule configured per volume or policy.
Step 2
Primary site incident detected. Decision to failover made. SnapMirror relationship is broken — secondary volumes promoted to read/write.
Step 3
OpenShift cluster at DR site brought active. Trident CSI re-attaches the promoted volumes as PersistentVolumes.
Step 4
VMs start on DR cluster using replicated storage. Applications resume from last replication checkpoint. DNS/routing updated to DR site.
Step 5
Primary restored. Reverse replication syncs changes back. Planned failback executed. SnapMirror relationship re-established in original direction.
Best used for: Site-wide disaster, storage hardware failure, datacenter-level outage. Protects the data layer regardless of what caused the incident.
Veeam Kasten K10 — Application Restore
Application-layer recovery · Per-VM, namespace, or cluster scope · Any cluster in the fleet
Kasten K10 operates at the application layer. It captures a consistent point-in-time snapshot of a VM, a namespace, or the entire cluster — including storage volumes, configuration, and metadata. Restore can target any cluster in the fleet. This makes Kasten the right tool for accidental deletion, corruption, ransomware recovery, and cross-cluster workload mobility as well as DR.
Step 1
Kasten K10 policies run on a defined schedule. Backup scope is per-VM, per-namespace, or cluster-wide. Snapshots stored on-cluster or exported to S3-compatible object storage at the DR site.
Step 2
Recovery event occurs — deleted VM, corrupted namespace, ransomware, or full site failure. Administrator selects restore point and target cluster from the Kasten dashboard.
Step 3
Kasten restores the VM disk (PVC), VM definition, networking config, and associated secrets. Application-consistent to the backup point-in-time — not just the disk.
Step 4
VM comes online on the target cluster. If restoring to a different namespace or cluster, Kasten handles remapping of storage and network references automatically.
Step 5
Restore verified. Policy compliance and audit log available in Kasten dashboard. Immutable backup copies on object storage remain intact for compliance retention.
Best used for: Accidental deletion, ransomware, application corruption, cross-cluster workload mobility, and DR where application-layer consistency is required.
These two mechanisms are complementary, not duplicative. SnapMirror protects storage at the infrastructure layer continuously and is the primary tool for site-level failover. Kasten protects applications at the platform layer on a schedule and is the primary tool for precision recovery, mobility, and compliance. Most production environments run both.
Migration Flow — From Your Current Platform to OpenShift Virtualization
This is not a rip-and-replace. It is a phased transition that keeps existing workloads running at every step. The goal is to land VMs on the new platform, stabilize operations, and give teams the time and space to adopt cloud-native tooling at their own pace.
No New Hardware to Start
OpenShift Virtualization runs on your existing servers. No forklift. No waiting on procurement. Hardware refresh happens on your existing lifecycle, not as a prerequisite.
Two valid destinations. You choose.
Migrate & Stabilize
VMs land on OpenShift. Your team manages them the same way they do today — console, CLI, familiar workflows. The platform is new. The day-to-day is not.
Migrate & Modernize
VMs land on OpenShift. Over time, teams adopt GitOps, automated secrets, and container-based workloads alongside their VMs. One platform, evolving at your pace.
Day 1
Your VMware admins can manage migrated VMs using familiar workflows. Cloud-native tooling is adopted progressively, not on day one.
Free and paid training is available through Red Hat. Admins do not need to learn everything before the first VM migrates.
Phase 01
Assess
Who does this: Your VMware admins + Red Hat architects
Inventory every VM: operating system, CPU/memory footprint, storage dependencies, network topology, and application owner. Identify which VMs have hard dependencies on vSphere features (vMotion, vSAN, snapshots) and map those to OpenShift equivalents. Sequence the migration order — start with low-risk, non-production workloads to build team confidence before touching anything mission-critical.
Migration Toolkit for Virtualization (MTV)
VM inventory and dependency mapping
Network topology analysis
vSphere feature gap analysis
Output: prioritized migration wave plan
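The wave plan itself is expressed to MTV as a Plan resource that ties a source provider to network and storage maps. A hedged sketch, with provider, map, and VM names as placeholders:

```yaml
# Illustrative MTV (Forklift) migration Plan for one wave.
# Provider, map, and VM names are placeholders.
apiVersion: forklift.konveyor.io/v1beta1
kind: Plan
metadata:
  name: wave-1-nonprod
  namespace: openshift-mtv
spec:
  warm: false                   # cold migration for the first low-risk wave
  provider:
    source:
      name: vsphere-prod        # registered vSphere provider (placeholder)
      namespace: openshift-mtv
    destination:
      name: host                # the local OpenShift cluster
      namespace: openshift-mtv
  map:
    network:
      name: wave-1-netmap       # VLAN → NetworkAttachmentDefinition map
      namespace: openshift-mtv
    storage:
      name: wave-1-storagemap   # datastore → StorageClass map
      namespace: openshift-mtv
  vms:
    - name: dev-app-01
    - name: dev-db-01
```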
Phase 02
Prepare
Who does this: Platform + storage + security teams together
Build the landing zone before migrating anything. This means configuring OpenShift Virtualization, defining StorageClasses in Trident CSI that match your Tier 1 and Tier 2 storage, mapping existing VLANs to NetworkAttachmentDefinitions, and setting up namespace structure and RBAC so each application team owns their space. Bootstrap Vault for secrets, stand up ArgoCD for GitOps, and configure Kasten K10 backup policies before the first VM lands — so protection is in place from day one.
NetApp Trident CSI StorageClasses
NetworkAttachmentDefinitions (VLAN map)
Vault secrets bootstrap
Kasten K10 backup policies pre-configured
Output: hardened landing zone, ready to receive VMs
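Mapping an existing VLAN into the landing zone is typically done with a NetworkAttachmentDefinition. A sketch assuming a Linux bridge CNI, with the bridge name br1 and VLAN ID 120 as placeholders for values from your own topology:

```yaml
# Illustrative VLAN mapping for VM traffic. Bridge name, VLAN ID,
# and namespace are placeholders for site-specific values.
apiVersion: k8s.cni.cncf.io/v1
kind: NetworkAttachmentDefinition
metadata:
  name: vlan-120-app
  namespace: mission-app
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "vlan-120-app",
      "type": "bridge",
      "bridge": "br1",
      "vlan": 120,
      "ipam": {}
    }
```

VMs attach to this network by referencing it in their VirtualMachine spec, which keeps their existing VLAN placement through the migration.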
Phase 03
Migrate
Who does this: Primarily your existing VMware admins via MTV
Run MTV migration plans against each wave. Disk images are converted (V2V), transferred to DataVolumes on NetApp storage, and VMs are started as KubeVirt VMs. Each availability zone can run migration pipelines in parallel to reduce the total migration window. VMs are validated against the original before cutover. Source VMs remain running until cutover is confirmed.
MTV migration plans per wave
Cold — offline conversion, lowest risk
Warm — background copy, brief cutover
Hot — live replication, near-zero downtime
Output: VMs running on OpenShift, source decommissioned per wave
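Each converted disk lands as a DataVolume on the tiered storage defined during Prepare. A sketch of the kind of object involved, with name, namespace, size, and StorageClass reference as illustrative placeholders:

```yaml
# Illustrative DataVolume for a migrated VM disk on Tier 1 storage.
# Name, namespace, and size are placeholders.
apiVersion: cdi.kubevirt.io/v1beta1
kind: DataVolume
metadata:
  name: dev-app-01-disk0
  namespace: mission-app
spec:
  source:
    blank: {}                   # populated by the V2V conversion step
  storage:
    storageClassName: ontap-nas-t1
    accessModes:
      - ReadWriteMany           # RWX enables live migration of the VM
    resources:
      requests:
        storage: 100Gi
```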
Phase 04
Operate & Evolve
Who does this: Your existing teams, on their timeline
VMs are now on OpenShift. Day-2 operations begin. Your admins use the OpenShift console to manage VMs — the experience is recognizable. Kasten K10 is backing everything up. Elastic is collecting logs and metrics. Over time — months, not days — teams adopt GitOps for VM lifecycle, Vault for secrets, and begin containerizing workloads where it makes sense. The platform does not force this transition. It enables it.
Kasten K10 — backup from day one
Elastic — unified observability
VM console management (familiar UX)
GitOps adoption at team's own pace
Output: stable operations, path to modernization open
Before
Your Current Environment
Where most organizations are today.
Traditional hypervisor (VMware vSphere) — vCenter, vMotion, vSAN or external NFS
Datastores (NFS / VMFS) — storage managed separately from compute
Agent-based VM backup — per-VM policies, long RTO, hard to test
Manual operations & runbooks — changes take days, secrets in spreadsheets
Separate container platform — different team, different tooling, no shared plane
During
Transition Period — Both Worlds at Once
This phase is real and should be planned for. It is not a failure state.
VMs running on OpenShift Virtualization — managed via the OpenShift console, a familiar interface
Backup via Kasten K10 from day one — protection in place before source decommission
Some VMs still on legacy platform — wave-based migration, not all at once
Teams learning GitOps and Vault — training and adoption run in parallel with operations
Elastic observability across both platforms — single pane during the transition period
After
Destination Environment
What the platform looks like fully realized.
OpenShift Virtualization (KubeVirt) — VMs and containers on one platform, one team
NetApp Trident CSI DataVolumes — storage as code, consistent tiering, live migration
Kasten K10 namespace-aware backup — app-consistent, testable, auditable
GitOps-driven VM & app lifecycle — Vault secrets, ArgoCD, Ansible AAP
Elastic unified SIEM & observability — single pane for VMs, containers, and platform