TKGM PR-DR SITE ON VCLOUD DIRECTOR ARCHITECTURE

You build:

vSphere + vCD + NSX-T + CSE on both sites.
You deploy TKGm clusters on the primary site.
You set up Velero to back up Kubernetes resources (YAMLs) and persistent volumes.
You mirror the Harbor registry to the DR site.
You test restoring a cluster on the DR site using CSE + Velero.
You prepare DNS failover (manual or automated) so traffic can be pointed to the DR site when needed.
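The Velero steps above (back up on primary, restore on DR) can be sketched as the CLI commands you would run on each site. This is a minimal sketch that only builds argument lists; the schedule, namespace, restore, and backup names are hypothetical placeholders. It assumes Velero 1.10 or later (older releases used `--default-volumes-to-restic` instead of `--default-volumes-to-fs-backup`) and that the DR site's Velero is configured against the same object-storage bucket as the primary.

```python
"""Build Velero CLI commands for a primary-site schedule and a DR-site restore.

Sketch only: all names passed in are hypothetical examples. The lists can be
handed to subprocess.run() on a host where the velero CLI targets the right cluster.
"""
import shlex
from typing import List


def schedule_backup_cmd(schedule_name: str, namespaces: List[str],
                        cron: str = "0 */6 * * *", ttl: str = "720h") -> List[str]:
    # Run against the PRIMARY cluster: periodic backups of the given namespaces,
    # including pod volumes via Velero's file-system backup (Restic/Kopia).
    return [
        "velero", "schedule", "create", schedule_name,
        "--schedule", cron,
        "--include-namespaces", ",".join(namespaces),
        "--default-volumes-to-fs-backup",  # pre-1.10: --default-volumes-to-restic
        "--ttl", ttl,
    ]


def restore_cmd(restore_name: str, backup_name: str) -> List[str]:
    # Run against the DR cluster built by CSE; its Velero reads the shared bucket.
    return ["velero", "restore", "create", restore_name, "--from-backup", backup_name]


if __name__ == "__main__":
    # shlex.join quotes the cron expression so the printed line is shell-safe.
    print(shlex.join(schedule_backup_cmd("prod-every-6h", ["shop", "payments"])))
    print(shlex.join(restore_cmd("dr-drill-1", "prod-every-6h-20240101000000")))
```

Scheduled backups are named after their schedule plus a timestamp, so a DR drill picks the latest one from `velero backup get` before restoring.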

Primary & DR Site Layer Comparison Table

Layer	Component	Primary Site	DR Site	What Happens During DR?	Notes / Tools
1️⃣	Infrastructure	vSphere (ESXi, vCenter)	Same setup	DR vSphere takes over	Ensure hardware compatibility
2️⃣	Networking	NSX-T	Same NSX-T setup	DR NSX routes traffic	Replicate NSX segments, edge configs
3️⃣	Cloud Management	vCloud Director	vCloud Director	DR vCD deploys new VMs	Must sync templates across sites
4️⃣	K8s Provisioning	CSE (TKGM enabled)	CSE (same version)	DR CSE deploys TKGm cluster	Sync catalog/templates
5️⃣	Kubernetes Cluster	TKGm Cluster (Running)	TKGm Cluster (Rebuilt)	Apps are restored on DR cluster	Use Velero / GitOps to restore
6️⃣	Persistent Storage (PV)	CSI Volumes / Datastore	Restored from backup or replication	Apps regain their data	Use Velero+Restic, Zerto, or vSphere Replication
7️⃣	Container Images	Harbor Registry	Mirror / Backup Harbor	DR cluster pulls same images	Enable Harbor replication between sites
8️⃣	K8s Configs / YAMLs	GitOps (Flux / ArgoCD) or Velero	Same tools	Re-apply YAMLs in DR	Use Git source or Velero backup
9️⃣	DNS Failover	DNS entry points to primary	DNS updated to DR IP	DNS points to DR cluster ingress	Use manual switch or automated failover (Route 53, Cloudflare)
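The layer table also implies an order: each DR layer depends on the ones numbered below it, and DNS should only move once the apps are restored. A minimal sketch of that ordering, assuming each layer's work is wrapped in a hypothetical callable that returns True when the layer is healthy on the DR site (the `Step` type and `run_failover` function are illustrative, not part of any vCD, CSE, or Velero API):

```python
"""Run DR failover steps in the layer order of the table above."""
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Step:
    layer: int                    # row number from the layer table (1-9)
    name: str
    action: Callable[[], bool]    # hypothetical hook into your own tooling


def run_failover(steps: List[Step]) -> List[str]:
    """Execute steps lowest layer first and stop at the first failure.

    Because DNS is layer 9, it is only switched after every lower layer
    succeeded, so users are never sent to a DR ingress with no apps behind it.
    """
    done: List[str] = []
    for step in sorted(steps, key=lambda s: s.layer):
        if not step.action():
            raise RuntimeError(
                f"failover halted at layer {step.layer} ({step.name}); completed: {done}"
            )
        done.append(step.name)
    return done


if __name__ == "__main__":
    # Placeholder actions that always succeed, just to show the ordering.
    plan = [
        Step(9, "Switch DNS to DR ingress", lambda: True),
        Step(4, "CSE deploys TKGm cluster", lambda: True),
        Step(6, "Velero restores PVs", lambda: True),
    ]
    print(run_failover(plan))
```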

🧊 Cold vs 🔥 Hot Standby Table

Type	What It Means	Pros	Cons	When to Use
🧊 Cold Standby	DR site is ready but not running TKGm	Cheaper	Slow failover (10–60 mins)	Most common, low-cost DR
🔥 Hot Standby	DR cluster runs live and is kept in sync	Fast failover	High cost and complexity	For mission-critical workloads
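One way to read this table is against a recovery-time objective (RTO). A minimal sketch, assuming the 10–60 minute cold-standby failover range from the table is representative of your site; the function name and the "measure" outcome are illustrative, and real numbers should come from your own DR drills:

```python
"""Pick a standby model from an RTO, using the table's cold-failover range."""

COLD_FAILOVER_MIN = 10  # minutes, best case for cold standby (from the table)
COLD_FAILOVER_MAX = 60  # minutes, worst case for cold standby (from the table)


def choose_standby(rto_minutes: float) -> str:
    if rto_minutes >= COLD_FAILOVER_MAX:
        return "cold"     # even the slowest cold failover meets the RTO
    if rto_minutes < COLD_FAILOVER_MIN:
        return "hot"      # cold standby cannot meet this RTO even in the best case
    return "measure"      # inside the range: time a cold drill before committing
```

For example, an app with a 2-hour RTO fits cold standby, while one that must be back in 5 minutes needs a hot DR cluster.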

🌐 DNS Redirection Table

Method	Description	Tools	Speed	Recommended When
🛠️ Manual DNS Switch	You update the DNS record to the DR IP after failover	GoDaddy, Cloudflare, etc.	Slow (minutes, plus DNS TTL caching)	OK for small/low-impact apps
⚙️ Automated Failover	Health check + switch IP automatically	Route53, NS1, F5 GSLB	Fast (seconds to 1 min)	Critical apps needing <1 min downtime
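The "health check + switch IP" logic behind automated failover can be sketched as a small state machine. This is a minimal sketch, not how Route 53, NS1, or F5 GSLB implement it; the `FailoverRecord` class, the threshold of 3 consecutive failures, and the documentation-range IPs are hypothetical examples:

```python
"""Decide which ingress IP a failover DNS record should serve."""
from dataclasses import dataclass


@dataclass
class FailoverRecord:
    primary_ip: str
    dr_ip: str
    failure_threshold: int = 3      # consecutive failed checks before failing over
    consecutive_failures: int = 0

    def observe(self, primary_healthy: bool) -> str:
        """Feed one health-check result for the primary ingress; return the IP to serve."""
        if primary_healthy:
            self.consecutive_failures = 0
        else:
            self.consecutive_failures += 1
        if self.consecutive_failures >= self.failure_threshold:
            return self.dr_ip
        return self.primary_ip


if __name__ == "__main__":
    record = FailoverRecord(primary_ip="203.0.113.10", dr_ip="198.51.100.10")
    for healthy in [True, False, False, False, True]:
        print(healthy, "->", record.observe(healthy))
```

The threshold avoids flapping on a single failed probe. This sketch fails back as soon as the primary passes one check; for a TKGm DR setup you may prefer manual failback, since the DR cluster can hold newer data than the primary by then. Keep the record's TTL low, because resolvers keep serving the old IP until it expires.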
