System Architecture
Every layer of the stack has a defined primary, fallback, and emergency path. The architecture is designed for <60-second failover at the DNS layer with zero code changes required during an incident.
- Cloudflare DNS: All 4 domains proxied. Orange cloud = <60s failover. Auto SSL included.
- Sub-60s Propagation: Cloudflare proxy eliminates the TTL wait, so DNS changes are near-instant.
- Auto SSL + DDoS: TLS managed automatically. 300+ edge PoPs for attack mitigation.
- Better Uptime: 30-second HTTP checks on all 6 platforms. SMS + email within 30s of failure.
- Checkly Synthetics: 4 browser + API checks (Appointments booking, Referrals API, Reservations confirm code, Reservations Landing Page HTTP).
- status.virely.co: Public status page. Merchants can self-check during incidents.
- Vercel: Primary for all 6 platforms. Next.js native. Global CDN. Auto-deploy from GitHub main.
- 3 Customer Platforms: Referrals, Appointments, Reservations App, all on custom domains via Cloudflare.
- 2 Internal Platforms: VirelyBooks + Virely Nexus. Manual-deploy only. No auto-deploy to production.
- Cloudflare Pages: Standby sites for all 3 Dealsby platforms. Auto-synced on every prod deploy via GitHub Actions.
- standby.dealsby*.com: Pre-wired CNAME records ready to switch in <60 seconds. Verified and tested monthly.
- PWA Service Worker: Reservations App caches floor plans + confirm codes for offline staff access.
- Cloudflare R2: virely-backups bucket. Build archives (90-day retention) + nightly Supabase exports (36-month).
- Supabase PITR: Pro plan with 7-day point-in-time recovery. Target RPO <1 hour post-upgrade.
- Bitwarden Vault: All secrets. 10 folders. 2FA + quarterly rotation. Encrypted offline export monthly.
- AWS Amplify: Pre-configured apps for all 3 Dealsby platforms. <72-hour force migration target.
- Migration Runbook: force-migration.sh script. Tested quarterly in staging. DNS switch documented per domain.
- Reversion Procedure: Documented platform-by-platform revert path back to Vercel after incident resolution.
Platform Coverage
Every platform has a defined primary stack, static fallback, data backup layer, and emergency access path. All six platforms are covered.
Dealsby Referrals
- Primary: Vercel
- Fallback: CF Pages (<60s)
- Migration: AWS Amplify
- Database: Supabase Pro + PITR
- Monitor: Better Uptime + Checkly API
Dealsby Appointments
- Primary: Vercel (SSR)
- Fallback: CF Pages + OutageBanner
- Migration: AWS Amplify (SSR)
- Database: Supabase Pro + Read Replica
- Monitor: Booking flow + Stripe render
Dealsby Reservations App
- Primary: Vercel (Next.js)
- Fallback: CF Pages (baked floor plans)
- Offline: PWA Service Worker
- Database: Supabase Pro + PITR
- Monitor: Confirm code API check (daily)
Dealsby Reservations Landing Page
- Primary: Vercel (static)
- Fallback: CF Pages (<60s)
- Migration: AWS Amplify
- Database: None (static content)
- Monitor: Better Uptime (30s check)
VirelyBooks
- Primary: Vercel (locked prod)
- Emergency: Local HTML file
- Archive: R2 signed URL (Bitwarden)
- Database: Supabase Pro + RLS
- Deploy: Manual only, no auto-deploy
Virely Nexus
- Primary: Vercel
- Emergency: Local HTML + Supabase Studio
- Archive: R2 + Weekly Journey JSON export
- Database: Supabase Pro + RLS
- Campaigns: send_log retry queue
Recovery Objectives
RTO = Recovery Time Objective (how fast we're back online). RPO = Recovery Point Objective (maximum acceptable data loss). All times are maximum targets; actual recovery is typically faster.
| Platform | Type | RTO: Partial Outage | RTO: Extended Outage | RTO: Force Migration | RPO (Max Data Loss) |
|---|---|---|---|---|---|
| Dealsby Referrals | Customer | <5 min | <30 min | <10 hours | 24 hours |
| Dealsby Appointments | Customer | <5 min | <2 hours | <10 hours | 24 hours |
| Dealsby Reservations App | Customer | <5 min | <5 min (PWA) | <10 hours | 24 hours |
| Dealsby Reservations Landing Page | Customer | <5 min | <5 min (CF Pages) | <6 hours | N/A (static) |
| VirelyBooks | Internal | <10 min | <10 min (R2 URL) | <2 hours | 24 hours |
| Virely Nexus | Internal | <10 min | <10 min (R2 URL) | <12 hours | 24 hours |
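For post-incident reviews, measured downtime can be compared against the targets in the table. The following is a minimal Python sketch under stated assumptions: the helper names `parse_target` and `within_rto` are hypothetical, only the partial-outage column is transcribed, and a target is treated as met when downtime is strictly below it.

```python
import re

# Partial-outage RTO targets, transcribed from the table above.
PARTIAL_OUTAGE_RTO = {
    "Dealsby Referrals": "<5 min",
    "Dealsby Appointments": "<5 min",
    "Dealsby Reservations App": "<5 min",
    "Dealsby Reservations Landing Page": "<5 min",
    "VirelyBooks": "<10 min",
    "Virely Nexus": "<10 min",
}


def parse_target(text: str) -> int:
    """Convert a target such as '<5 min (PWA)' or '<10 hours' into minutes."""
    match = re.match(r"<\s*(\d+)\s*(min|hour)", text.strip())
    if match is None:
        raise ValueError(f"unrecognized target: {text!r}")
    value, unit = int(match.group(1)), match.group(2)
    return value * 60 if unit == "hour" else value


def within_rto(target: str, actual_minutes: float) -> bool:
    """Return True when measured downtime is strictly under the target."""
    return actual_minutes < parse_target(target)


if __name__ == "__main__":
    # Example: a 4-minute partial outage on Dealsby Referrals.
    print(within_rto(PARTIAL_OUTAGE_RTO["Dealsby Referrals"], 4))
```

The same parser handles the extended-outage and force-migration columns, since those use the same "<N min" and "<N hours" notation.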
Incident Response Playbooks
Three scenarios. Precise time-boxed actions. Automated where possible, with manual steps only where necessary. Execute strictly in order.
Scenario A
- T+0:00: Better Uptime alert fires, sending SMS + email to Brent Wright (CEO) within 30 seconds of the first failed check. [AUTO]
- T+0:02: Check vercel-status.com to identify outage scope (edge network, builds, specific region, or specific service). [Brent Wright (CEO)]
- T+0:05: Check status.supabase.com to rule out the database layer as root cause before activating fallback. [Brent Wright (CEO)]
- T+0:07: If Vercel confirms degradation, post internal comms: "Investigating platform performance issue". [Brent Wright (CEO)]
- T+0:10: Open Cloudflare DNS for the affected domain → edit primary CNAME → switch target from cname.vercel-dns.com to CF Pages URL → Save. Verify in incognito tab. [CF DNS]
- T+0:12: Verify CF Pages standby loads in an incognito browser tab; check all links and primary functionality. [Brent Wright (CEO)]
- T+0:15: If the outage exceeds 15 min, send merchant notification email using Template A from this BCP document. [Brent Wright (CEO)]
- T+0:25: Once Vercel reports resolved, verify the production build is healthy, then revert the Cloudflare DNS CNAME back to Vercel. [CF DNS]
- T+0:30: Log incident: date, time, duration, platforms affected, actions taken, resolution. File in Virely ops incident log. [Brent Wright (CEO)]
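The manual CNAME switch at T+0:10 could also be scripted against the Cloudflare v4 API, which exposes a PATCH operation on a zone's DNS records. The sketch below is an assumption-laden illustration, not part of the documented playbook: it only builds the request without sending it, and the zone ID, record ID, token, and standby hostname are placeholders.

```python
import json
import urllib.request

API_BASE = "https://api.cloudflare.com/client/v4"


def build_cname_switch(zone_id: str, record_id: str, name: str,
                       target: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a PATCH that repoints a proxied CNAME."""
    body = json.dumps({
        "type": "CNAME",
        "name": name,
        "content": target,
        "proxied": True,  # keep the orange cloud enabled on the record
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/zones/{zone_id}/dns_records/{record_id}",
        data=body,
        method="PATCH",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


if __name__ == "__main__":
    # All identifiers below are placeholders, not real values.
    req = build_cname_switch(
        zone_id="ZONE_ID",
        record_id="RECORD_ID",
        name="dealsbyreferrals.com",
        target="dealsby-referrals-standby.pages.dev",
        token="API_TOKEN",
    )
    print(req.get_method(), req.full_url)
    # To apply the change for real: urllib.request.urlopen(req)
```

Keeping the builder separate from the network call makes it possible to review the exact request during a monthly drill before anything is changed in DNS.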
Scenario B
- T+0:00: Scenario A steps 1-5 already executed. All 3 Dealsby platforms now serving from CF Pages standby. [AUTO]
- T+0:35: Pause all Nexus email/SMS campaigns to prevent delivery during the degraded state. Log paused campaigns to Supabase. [Brent Wright (CEO)]
- T+0:45: Open VirelyBooks via R2 signed URL (from Bitwarden → Virely / R2 → Stable Build URLs). Verify the team can access financial data (Build 14 v2.12). [Brent Wright (CEO)]
- T+0:45: Open Virely Nexus via R2 signed URL + Supabase Studio for direct CRM data access without the frontend. [Brent Wright (CEO)]
- T+1:00: Send extended outage merchant email (Template B) to all active merchants across all 3 Dealsby platforms. [Brent Wright (CEO)]
- T+1:00: File Vercel Priority 1 support ticket with incident timeline, business impact, and SLA reference. [Brent Wright (CEO)]
- T+1:30: If extended outage: trigger Amplify build for Dealsby Appointments (most dynamic: Stripe + SSR) via the aws amplify start-job CLI command. [Dev]
- T+2:30: Run Checkly synthetic tests against all 6 platform URLs; all must pass before declaring full recovery. [Checkly]
- T+4:00: Revert DNS platform-by-platform (not all at once). Verify each loads from Vercel before switching the next domain. [CF DNS]
- T+4:30: Incident debrief: total downtime, revenue impact estimate, SLA breach assessment, corrective actions for ops log. [Brent Wright (CEO)]
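The T+4:00 rule (revert one domain at a time and verify before moving on) can be expressed as a small control loop. This is a hedged Python sketch: `revert_in_stages` and its two callables are hypothetical names, and the real switch and verification steps would be the Cloudflare DNS edit and a load check against Vercel.

```python
from typing import Callable, Iterable, List, Optional, Tuple


def revert_in_stages(
    domains: Iterable[str],
    switch_to_primary: Callable[[str], None],
    loads_from_primary: Callable[[str], bool],
) -> Tuple[List[str], Optional[str]]:
    """Revert domains one by one; stop at the first that fails verification.

    Returns (verified_domains, failed_domain). failed_domain is None when
    every domain was switched and verified.
    """
    verified: List[str] = []
    for domain in domains:
        switch_to_primary(domain)
        if not loads_from_primary(domain):
            # Stop here so the remaining domains stay on the standby.
            return verified, domain
        verified.append(domain)
    return verified, None


if __name__ == "__main__":
    switched: List[str] = []
    # Stand-in callables; real ones would edit DNS and probe the site.
    result = revert_in_stages(
        ["dealsbyreferrals.com", "dealsbyappointments.com", "dealsbyreservations.com"],
        switch_to_primary=switched.append,
        loads_from_primary=lambda d: d != "dealsbyappointments.com",
    )
    print(result)
```

A domain that fails verification has already been switched, so the caller is expected to point it back at the standby before investigating.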
Scenario C
- Hour 0: Execute Scenario B immediately. Activate CF Pages fallback for all 3 Dealsby platforms (<60 seconds each). [Brent Wright (CEO)]
- Hour 1: Pull latest builds from R2: aws s3 sync s3://virely-backups/builds/ ./migration-builds/ --endpoint-url $R2_ENDPOINT [Dev]
- Hour 2-5: Trigger Amplify builds for all 3 platforms via CLI. Monitor in AWS Amplify Console. Update Stripe webhook temporarily to Amplify URL during migration. [Dev]
- Hour 6-8: Verify all Amplify builds pass. Run full Checkly regression against Amplify URLs. All checks must be green. [Dev]
- Hour 8-10: Switch Cloudflare DNS for all 3 domains to AWS Amplify CNAMEs. Verify each domain loads from Amplify. Revert Stripe webhook to custom domain (no change needed if DNS is switched). [CF DNS]
- Hour 10+: Monitor for 2 hours. Run Better Uptime spot-check. Send merchant resolution email. Log full incident report. [Brent Wright (CEO)]
- Hour 72: All platforms confirmed stable on Amplify. Update BCP with new primary hosting designation. Keep Amplify as new primary and evaluate Vercel reinstatement separately. [Brent Wright (CEO)]
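The two CLI steps in the force-migration playbook (pulling builds from R2, then starting Amplify jobs) can be assembled programmatically so the exact commands are reviewed before they run. This is a sketch only: the function names are hypothetical, the Amplify app ID is a placeholder, and the `main` branch and `virely-amplify` profile are taken from elsewhere in this plan rather than from the force-migration.sh script itself.

```python
import shlex
from typing import List


def r2_sync_command(endpoint_url: str,
                    dest: str = "./migration-builds/") -> List[str]:
    """Build the aws s3 sync command that pulls build archives from R2."""
    return [
        "aws", "s3", "sync",
        "s3://virely-backups/builds/", dest,
        "--endpoint-url", endpoint_url,
    ]


def amplify_release_command(app_id: str, branch: str = "main",
                            profile: str = "virely-amplify") -> List[str]:
    """Build the aws amplify start-job command for a release build."""
    return [
        "aws", "amplify", "start-job",
        "--app-id", app_id,
        "--branch-name", branch,
        "--job-type", "RELEASE",
        "--profile", profile,
    ]


if __name__ == "__main__":
    # Placeholder endpoint and app ID; substitute values from Bitwarden.
    print(shlex.join(r2_sync_command("https://example.invalid")))
    print(shlex.join(amplify_release_command("APP_ID_PLACEHOLDER")))
    # To execute: subprocess.run(cmd, check=True) with the list form.
```

Building argument lists instead of shell strings avoids quoting problems if an endpoint or path ever contains unusual characters.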
Backup Schedule
Fully automated where possible. Manual tasks are minimal and time-boxed. Color key: Green = automated · Blue = Brent Wright (CEO) · Purple = developer
- Supabase pg_dump → R2 (all 5 DB projects, 2:00 AM EST)
- CF Pages standby sync (on every prod deploy)
- Build archive → R2 (on every prod deploy)
- Uptime checks every 30 seconds (Better Uptime)
- Synthetic flow tests (Checkly, 9:00 AM EST)
- Expired DSBY confirm code anonymization
- Schema DDL export → R2 (Sunday 3:00 AM UTC)
- Nexus journey JSON export → R2
- Spot-check R2 backup receipts (Monday)
- Review Better Uptime weekly report
- Review Checkly dashboard results
- DNS failover drill (1 platform, rotate monthly)
- Review Vercel billing vs. baseline
- Audit Vercel team access and remove inactive accounts
- Verify domain expiry dates (all 4 domains)
- Export encrypted Bitwarden vault to offline
- Rotate all secrets (Supabase, Stripe, SG, Twilio)
- Full Amplify migration test in staging
- Review & update this plan
- Supabase RLS policy audit (anon access test)
- Annual full DR simulation (half-day)
- Privacy policy & data retention review
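The R2 retention windows used in this plan (90 days for build archives, 36 months for Supabase exports) can be written as a pruning rule. The following Python sketch is illustrative only and is not the cleanup Worker itself: the `kind` labels and helper names are assumptions, and "older than" is interpreted strictly.

```python
import calendar
from datetime import date, timedelta

BUILD_RETENTION = timedelta(days=90)   # build archives
EXPORT_RETENTION_MONTHS = 36           # nightly Supabase exports


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping the day to the month's length."""
    index = d.year * 12 + (d.month - 1) + months
    year, month0 = divmod(index, 12)
    day = min(d.day, calendar.monthrange(year, month0 + 1)[1])
    return date(year, month0 + 1, day)


def is_expired(kind: str, created: date, today: date) -> bool:
    """Return True when an object is strictly older than its retention window."""
    if kind == "build":
        return today - created > BUILD_RETENTION
    if kind == "export":
        return today > add_months(created, EXPORT_RETENTION_MONTHS)
    raise ValueError(f"unknown object kind: {kind!r}")


if __name__ == "__main__":
    print(is_expired("build", date(2024, 1, 1), date(2024, 4, 1)))
    print(is_expired("export", date(2021, 1, 15), date(2024, 1, 15)))
```

Using calendar months for exports rather than a fixed day count keeps the cutoff aligned with how the 36-month figure is stated.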
Backup Vendor Stack
Eight vendors forming an interlocking resilience network. Primary infrastructure cost: ~$80-140/month. Vendors are selected for cost efficiency and zero lock-in.
- 300+ edge PoPs
- Sub-60s propagation
- DDoS protection
- Auto SSL
- Unlimited bandwidth
- Global CDN
- Git auto-deploy
- Zero cold start
- S3-compatible API
- $0 egress fees
- 90-day build archive
- 36-month DB retention
- SSR + Static
- CloudFront CDN
- Git CI/CD
- Pre-configured builds
- PITR 7-day window
- Read replicas
- PgBouncer pooling
- Row Level Security
- 30s check interval
- SMS + email alerts
- On-call escalation
- Public status page
- Playwright browser tests
- API assertions
- 4 checks active
- Global test locations
- 2FA enforced
- 10 org folders
- Quarterly rotation
- Encrypted export
Vendor Setup Guides
Step-by-step configuration for each vendor. Complete in the order shown; each guide links to the full interactive walkthrough in the Vendor User Flow document.
- Create Cloudflare account at cloudflare.com using brent@virely.co
- Add each domain: + Add a Site → Free plan. Domains: dealsbyreferrals.com, dealsbyappointments.com, dealsbyreservations.com, virely.co. The Reservations Landing Page is served under dealsbyreservations.com and is covered by the same Cloudflare zone.
- Cloudflare auto-scans DNS. Verify all A, CNAME, MX, TXT records match registrar
- Update nameservers in domain registrar to Cloudflare nameservers (provided per domain)
- Enable orange cloud proxy on all A/CNAME records in DNS tab
- Add standby CNAME records for each domain (Name: standby, Target: placeholder, to be updated in M2)
- Note Zone IDs from Overview tab and store in Bitwarden under Virely / Cloudflare
- Set domain expiry reminders in Google Calendar (60 / 30 / 7 days before expiry)
- Create account at betteruptime.com using brent@virely.co. Free plan covers 5 monitors; upgrade to Starter ($20/mo) for the 6th (Reservations Landing Page).
- Monitors → + New Monitor → Website → enter URL → check every 30 seconds. Create for all 6 platform URLs (including Reservations Landing Page).
- Account Settings → Notification Channels → Add SMS (Brent Wright's mobile) + Add Email (brent@virely.co)
- Apply both channels to all monitors
- Alerting → On-Call → New Schedule → add Brent Wright (CEO) → escalate after 5 min if unacknowledged
- Status Pages → New → name = "Virely Platform Status" → subdomain = status.virely.co
- Add status.virely.co CNAME in Cloudflare DNS pointing to betteruptime.com CNAME
- Test: temporarily point one monitor to dead URL. Confirm SMS + email within 60 seconds. Revert.
- Create account at checklyhq.com (Hobby free plan, 10K checks/month)
- Install CLI: npm install -g checkly
- Check 1 (Appointments Booking Flow): Browser Check → Playwright script → validates book-btn, date-picker, Stripe element render → daily 9:00 AM EST
- Check 2 (Referrals Tracking API): API Check → GET dealsbyreferrals.com/api/track → expect 200 → assert body.status === 'ok' → every 6 hours
- Check 3 (Reservations App Confirm Code): API Check → GET dealsbyreservations.com/api/confirm-lookup → expect 200 → daily
- Check 4 (Reservations Landing Page): HTTP Check → GET dealsbyreservations.com/landing → expect 200 + HTML body contains "Dealsby Reservations" → daily
- Alert routing: all 4 checks → SMS + email after 1 failed run with 2 retries
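The pass/fail conditions described for Check 2 and Check 4 can be mirrored in plain Python, for example as a local smoke test before the Checkly scripts are written. This is not Checkly's API; the function names are hypothetical and the logic simply restates the assertions listed in the steps.

```python
import json


def referrals_api_ok(status_code: int, body: str) -> bool:
    """Check 2: expect HTTP 200 and a JSON body whose status field is 'ok'."""
    if status_code != 200:
        return False
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return isinstance(payload, dict) and payload.get("status") == "ok"


def landing_page_ok(status_code: int, html: str) -> bool:
    """Check 4: expect HTTP 200 and the product name somewhere in the HTML."""
    return status_code == 200 and "Dealsby Reservations" in html


if __name__ == "__main__":
    # Canned responses stand in for real HTTP calls.
    print(referrals_api_ok(200, '{"status": "ok"}'))
    print(landing_page_ok(200, "<title>Dealsby Reservations</title>"))
```

Both helpers take an already-fetched status code and body, so they can be paired with any HTTP client.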
- Create account at bitwarden.com with brent@virely.co and a strong master password
- Enable 2FA: Account Settings → Security → Two-step Login → Authenticator App. Save recovery code offline.
- Install desktop, browser extension, and mobile app
- Create 10 folders: Virely / Vercel, Supabase, Stripe, SendGrid, Twilio, Cloudflare, AWS, Domain Registrar, Monitoring, R2
- Enter all credentials. For Supabase: one entry per project with anon key, service role key, and project URL.
- Create Secure Note per Vercel project, titled "<Platform> - Env Vars", with all env var names and values
- Set monthly calendar reminder (first Monday): export encrypted vault to offline storage
- Set quarterly reminder: rotate Supabase, Stripe, SendGrid, and Twilio secrets
- Pages → Create a project → Connect to Git → authorize GitHub → select repo → main branch
- Build settings: Framework = Next.js → Build: npm run build → Output: out (requires output: 'export' in next.config.js)
- Add all NEXT_PUBLIC_ env vars from Bitwarden Secure Notes. Do NOT add server-side secrets.
- Save and Deploy. Wait for first build to complete.
- Custom Domains β Add standby.dealsby[platform].com. Cloudflare auto-creates DNS record.
- Update standby CNAME in Cloudflare DNS from placeholder to Pages URL (dealsby-[platform]-standby.pages.dev)
- Add backup-sync.yml GitHub Action to each repo: builds → deploys to CF Pages → archives to R2 on every push to main
- Run manual failover test. Target: <60 seconds from DNS edit to verified load. Revert and log time.
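The timed failover test can be measured with a small polling helper rather than a stopwatch. This Python sketch makes several assumptions: `time_failover` is a hypothetical name, `is_live` stands in for whatever check confirms the standby is being served, and the 2-second polling interval is arbitrary.

```python
import time
from typing import Callable, Optional


def time_failover(
    is_live: Callable[[], bool],
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll until is_live() succeeds; return elapsed seconds or None on timeout."""
    start = clock()
    while True:
        if is_live():
            return clock() - start
        if clock() - start >= timeout_s:
            return None  # the <60 second target was missed
        sleep(interval_s)


if __name__ == "__main__":
    # Simulated probe that succeeds on the third attempt.
    attempts = {"n": 0}

    def probe() -> bool:
        attempts["n"] += 1
        return attempts["n"] >= 3

    elapsed = time_failover(probe, interval_s=0.01)
    print("met target" if elapsed is not None else "missed target")
```

Start the timer immediately after saving the DNS edit, and record the returned value in the drill log.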
- Cloudflare → R2 → Enable R2. Create bucket: virely-backups → Location: Automatic.
- Create R2 API token: My Profile → API Tokens → Custom → R2 Storage: Edit → bucket: virely-backups. Store in Bitwarden.
- Configure AWS CLI for R2: aws configure --profile r2 (Region: auto, Format: json)
- Upload all stable builds manually: VirelyBooks Build 14, Nexus v2.0, Reservations App V2 Build 2.3
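- Generate signed URLs for all stable builds (planned expiry: 1 year; S3-style SigV4 presigned URLs are normally capped at 7 days, so confirm the signing method supports this or schedule regeneration). Store in Bitwarden: Virely → R2 Stable Build URLs.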
- Deploy Supabase Edge Function nightly-backup to all 5 DB projects. Schedule via pg_cron at 07:00 UTC (2:00 AM EST). The Reservations Landing Page is static, so no DB backup is required.
- Deploy Cloudflare Worker cleanup-old-builds.js: deletes archives >90 days, Supabase exports >36 months. Cron: Sundays 3:00 AM UTC.
- Upgrade order: Appointments → Nexus → Reservations App → Referrals → VirelyBooks. Settings → Billing → Upgrade to Pro.
- Enable Connection Pooling (PgBouncer): Settings → Database → Connection Pooling → Transaction mode. Update DATABASE_URL in Vercel env vars. Redeploy.
- Enable Point-in-Time Recovery: Settings → Backups → PITR → Enable. 7-day window activates within 24 hours.
- Enable Read Replica for Appointments: Settings → Infrastructure → Read Replicas → Add → us-east-1.
- Run RLS SQL: enable on all tables, merchant isolation policies, Virely team-only policies for internal platforms.
- Test anon access inside a transaction (SET LOCAL only takes effect within one): BEGIN; SET LOCAL ROLE anon; SELECT * FROM chart_of_accounts LIMIT 1; ROLLBACK; → expect 0 rows.
- Deploy nightly-backup Edge Function. Schedule via pg_cron. Test manually and verify R2 file appears.
- Create AWS account at aws.amazon.com with brent@virely.co on Basic Support (free)
- Create IAM user: virely-amplify → AWSAmplifyFullAccess → create access key → Application outside AWS. Store in Bitwarden / AWS.
- Install & configure AWS CLI: aws configure --profile virely-amplify (region: us-east-1)
- For each platform: Amplify → New App → Host Web App → GitHub → select repo → add amplify.yml to repo root
- Add all env vars from Bitwarden Secure Notes to each Amplify app. Include both NEXT_PUBLIC_ and server-side vars.
- Add Amplify domain verification CNAMEs to Cloudflare DNS (gray cloud; do NOT change primary records).
- Trigger test builds for all 3 apps. All must complete successfully. Document any build errors.
- Run full staging migration test. Verify all Checkly checks pass against Amplify URLs. Store all App IDs in Bitwarden / AWS.
Implementation Milestones
98 hours · $1,470 total · ~12 weeks part-time. Milestones are sequential by default, each building on the previous; Milestones 2-5 may be parallelized with two developers (6-8 weeks).
- Cloudflare DNS transfer (all 4 domains) + proxy enabled
- Standby CNAME records pre-created (placeholders)
- Better Uptime: 6 monitors (Starter plan for 6th), SMS + email, status.virely.co
- Bitwarden vault: 2FA, 10 folders, all credentials + env vars
- Checkly: 4 synthetic checks (booking flow, tracking API, confirm code, landing page)
- Vercel team audit + billing alerts at 50/75/100%
- Stripe webhook confirmed on custom domains, tested via CLI
- Domain expiry reminders set for all 4 domains
- Cloudflare R2 bucket + folder structure + API token
- All stable builds uploaded to R2 manually + signed URLs in Bitwarden
- CF Pages projects for all 3 Dealsby platforms + Reservations Landing Page + custom standby domains
- OutageBanner component (Appointments) + floor plan bake script (Reservations App)
- GitHub Actions backup-sync.yml on all 3 repos
- Manual failover test: all 3 domains, timed, target <60 seconds
- R2 cleanup Worker deployed (90-day build archive auto-delete)
- All 5 Supabase DB projects upgraded to Pro (Landing Page: no DB)
- PgBouncer + PITR enabled. Vercel env vars updated + redeployed
- Read replica provisioned for Dealsby Appointments
- RLS enabled on all tables across all 5 DB projects; anon access test passed
- Nightly backup Edge Function deployed + pg_cron scheduled (5 DB platforms)
- Nexus journey export weekly to R2 via GitHub Actions
- send_log table + SendGrid/Twilio retry queue logging
- AWS account + IAM user virely-amplify + AWS CLI configured
- Amplify apps created for all 3 Dealsby platforms; test builds passed
- amplify.yml committed to all 3 repos
- Amplify domain verification CNAMEs added to Cloudflare DNS
- force-migration.sh runbook created and tested in staging
- All Checkly checks green on Amplify URLs in staging test
- Post-migration reversion procedure documented
- GDPR export + delete/anonymize endpoints on all 3 Dealsby platforms + cookie consent on Landing Page
- Cookie consent banners live on all 3 platforms
- Vercel functions pinned to iad1 (US East) region
- Service Worker offline mode deployed to Dealsby Reservations App (PWA)
- MRR aggregation + Nexus campaign scheduler migrated to Supabase Edge Functions
- Annual DR simulation completed β all tasks within target times
- Data retention pg_cron jobs scheduled (36-month appointments, 30-day confirm codes)
Post-Implementation Operations
Once all 5 milestones are complete, the system runs largely on autopilot. Ongoing developer involvement is minimal and time-boxed.
| Task | Frequency | Owner | Est. Time |
|---|---|---|---|
| Verify nightly R2 backup receipts | Weekly (Monday) | Brent Wright (CEO) | 10 min |
| Review Better Uptime weekly report | Weekly (Monday) | Brent Wright (CEO) | 10 min |
| Review Checkly dashboard | Weekly (Monday) | Brent Wright (CEO) | 10 min |
| Execute monthly DNS failover drill | Monthly (1st Mon) | Brent Wright (CEO) | 30 min |
| Review Vercel billing vs. baseline | Monthly (1st Mon) | Brent Wright (CEO) | 15 min |
| Audit Vercel team access | Monthly (1st Mon) | Brent Wright (CEO) | 15 min |
| Update Checkly scripts after UI changes | Each major deploy | Developer | 30-60 min |
| Secret rotation (all vendors) | Quarterly | Brent Wright (CEO) + Developer | 2-3 hrs |
| Full Amplify migration runbook test (staging) | Quarterly | Developer | 4-6 hrs |
| Review & update this plan | Quarterly | Brent Wright (CEO) | 1-2 hrs |
| Annual DR simulation (half-day exercise) | Annually | Brent Wright (CEO) + Developer | Half-day |
| Privacy policy & data retention review | Annually | Brent Wright (CEO) | 2-3 hrs |
Estimated Ongoing Developer Cost
| Activity | Hours / Year | Cost / Year @ $15/hr |
|---|---|---|
| Quarterly migration runbook tests (4 × 5 hrs) | 20 hrs | $300 |
| Quarterly secret rotation (4 × 1 hr developer time) | 4 hrs | $60 |
| Checkly script updates (est. 6 major deploys × 1 hr) | 6 hrs | $90 |
| Annual DR simulation developer time | 4 hrs | $60 |
| Ad-hoc bug fixes and system updates | 10 hrs | $150 |
| TOTAL ongoing developer cost | 44 hrs/year | $660/year |
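As a quick arithmetic check, the totals in the table can be recomputed from the per-activity hours. The short Python sketch below also recomputes the implementation figure quoted in the milestones section, on the assumption (inferred from $1,470 / 98 hours) that the same $15/hr rate applies there.

```python
HOURLY_RATE_USD = 15

# Hours per year, transcribed from the table above.
ONGOING_HOURS = {
    "Quarterly migration runbook tests": 4 * 5,
    "Quarterly secret rotation": 4 * 1,
    "Checkly script updates": 6 * 1,
    "Annual DR simulation": 4,
    "Ad-hoc bug fixes and system updates": 10,
}

total_hours = sum(ONGOING_HOURS.values())
total_cost = total_hours * HOURLY_RATE_USD

# Implementation effort quoted in the milestones section.
IMPLEMENTATION_HOURS = 98
implementation_cost = IMPLEMENTATION_HOURS * HOURLY_RATE_USD

print(f"Ongoing: {total_hours} hrs/year, ${total_cost}/year")
print(f"Implementation: {IMPLEMENTATION_HOURS} hrs, ${implementation_cost:,}")
```

Both results agree with the figures stated in the plan: 44 hrs and $660 per year ongoing, and $1,470 for implementation.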