Static Website Hosting — Technical Design Document

Overview

The StaticWebsite service kind lets users host a static website (HTML, CSS, JS, images) on a custom domain with automatic HTTPS via cert-manager + Traefik.

Dynamic runtimes (PHP, Node.js, Next.js) are out of scope. The catalog item is named "Static Website".

Architecture

Service Topology

| local_id | Kind | Primary | Hidden | Purpose |
|---|---|---|---|---|
| site-pvc | PVC | No | No | Stores site files (served by Nginx) |
| server | Deployment | Yes | No | Nginx serving files from PVC |
| sidecar | Deployment | No | No | gRPC file access (visible for file browser) |
| sidecar-svc | ClusterIpService | No | Yes | Internal DNS for sidecar |
| server-svc | ClusterIpService | No | Yes | ClusterIP for Nginx (port 80) |
| tls-cert | TlsCertificate | No | No | cert-manager Certificate CRD |
| ingress | IngressRoute | No | No | Traefik IngressRoute referencing TLS secret |

Certificate Management

The TlsCertificate provision kind maps 1:1 to a cert-manager Certificate CRD:

apiVersion: cert-manager.io/v1
kind: Certificate
metadata:
  name: r-{provision_id}
  namespace: {cluster_namespace}
spec:
  secretName: tls-{provision_id}
  issuerRef:
    name: letsencrypt-prod
    kind: ClusterIssuer
  dnsNames:
    - example.com
    - www.example.com   # optional

Traefik's IngressRoute references that secret:

tls:
  secretName: tls-{provision_id}
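
The naming convention above (r-{provision_id} for the Certificate, tls-{provision_id} for the secret) can be sketched as a small rendering helper. This is a minimal illustration, not the actual implementation: the function names are hypothetical, and building YAML with string formatting is an assumption made to keep the example self-contained. A real implementation would more likely go through a typed Kubernetes client.

```rust
/// Name of the cert-manager Certificate resource for a provision.
pub fn certificate_name(provision_id: &str) -> String {
    format!("r-{provision_id}")
}

/// Name of the TLS secret that cert-manager writes and Traefik reads.
pub fn tls_secret_name(provision_id: &str) -> String {
    format!("tls-{provision_id}")
}

/// Renders the Certificate manifest shown above as YAML text.
/// Hypothetical helper: inputs are assumed to be already validated.
pub fn render_certificate(provision_id: &str, namespace: &str, dns_names: &[&str]) -> String {
    let mut lines = vec![
        "apiVersion: cert-manager.io/v1".to_string(),
        "kind: Certificate".to_string(),
        "metadata:".to_string(),
        format!("  name: {}", certificate_name(provision_id)),
        format!("  namespace: {namespace}"),
        "spec:".to_string(),
        format!("  secretName: {}", tls_secret_name(provision_id)),
        "  issuerRef:".to_string(),
        "    name: letsencrypt-prod".to_string(),
        "    kind: ClusterIssuer".to_string(),
        "  dnsNames:".to_string(),
    ];
    // One list entry per DNS name, in the order given by the caller.
    lines.extend(dns_names.iter().map(|d| format!("    - {d}")));
    lines.join("\n")
}
```

Keeping both names behind one pair of functions means the Certificate and the IngressRoute cannot drift apart on the secret name.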

HTTP-01 Challenge

HTTP-01 is preferred:

- No DNS API credentials required on our side
- User only needs to point their domain's A record at the cluster's public IP
- cert-manager automatically handles renewal (30 days before expiry)

ClusterIssuer Prerequisite

A ClusterIssuer named letsencrypt-prod must exist in the cluster:

apiVersion: cert-manager.io/v1
kind: ClusterIssuer
metadata:
  name: letsencrypt-prod
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: certs@zaroz.cloud
    privateKeySecretRef:
      name: letsencrypt-prod-account-key
    solvers:
      - http01:
          ingress:
            class: traefik

Domain Model Changes

OrderServiceConfig

pub struct WebsiteConfiguration {
    pub domain: String,
    pub include_www: bool,
}

ProvisionConfig

ProvisionConfig gains a new variant:

TlsCertificate(TlsCertificateConfig),

pub struct TlsCertificateConfig {
    pub domain: String,
    pub include_www: bool,
    pub tls_secret_name: String,   // always "tls-{provision_id}"
}
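
As a sketch of how the two structs above might relate, the following derives a TlsCertificateConfig from the order's WebsiteConfiguration and lists the DNS names for the certificate. The from_order and dns_names methods are hypothetical and do not appear in the design. The structs are repeated so the example compiles on its own.

```rust
#[derive(Debug, Clone, PartialEq)]
pub struct WebsiteConfiguration {
    pub domain: String,
    pub include_www: bool,
}

#[derive(Debug, Clone, PartialEq)]
pub struct TlsCertificateConfig {
    pub domain: String,
    pub include_www: bool,
    pub tls_secret_name: String, // always "tls-{provision_id}"
}

impl TlsCertificateConfig {
    /// Hypothetical constructor: copies the order fields and fixes the secret name.
    pub fn from_order(order: &WebsiteConfiguration, provision_id: &str) -> Self {
        Self {
            domain: order.domain.clone(),
            include_www: order.include_www,
            tls_secret_name: format!("tls-{provision_id}"),
        }
    }

    /// DNS names for the certificate: the apex domain, plus www if requested.
    pub fn dns_names(&self) -> Vec<String> {
        let mut names = vec![self.domain.clone()];
        if self.include_www {
            names.push(format!("www.{}", self.domain));
        }
        names
    }
}
```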

IngressRouteConfig gains:

pub custom_tls_secret: Option<String>,  // None = use shared wildcard
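
One way to read the None fallback is sketched below. The constant SHARED_WILDCARD_SECRET and its value are placeholders, since this document does not name the shared wildcard secret, and effective_tls_secret is a hypothetical helper. Other fields of IngressRouteConfig are omitted.

```rust
/// Placeholder name for the shared wildcard secret (assumption, not from the design).
pub const SHARED_WILDCARD_SECRET: &str = "wildcard-tls";

pub struct IngressRouteConfig {
    // Existing fields omitted for brevity.
    pub custom_tls_secret: Option<String>, // None = use shared wildcard
}

impl IngressRouteConfig {
    /// Secret the IngressRoute should reference: the per-site secret if set,
    /// otherwise the shared wildcard secret.
    pub fn effective_tls_secret(&self) -> &str {
        self.custom_tls_secret
            .as_deref()
            .unwrap_or(SHARED_WILDCARD_SECRET)
    }
}
```

Because the field is optional, existing IngressRoute provisions can keep using the wildcard certificate unchanged.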

Setup Flow

DomainInput → DnsInstructions → Confirm

Step 1 — DomainInput: domain (validated: no https://, no path) + include_www checkbox.
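
The validation in Step 1 could look roughly like the following sketch. Only the scheme and path checks come from this document. The label checks (length, allowed characters, at least one dot) and the DomainError type are assumptions added for illustration.

```rust
#[derive(Debug, PartialEq)]
pub enum DomainError {
    Empty,
    HasScheme,
    HasPath,
    InvalidLabel,
}

/// Returns the normalized (trimmed, lowercased) domain or the first problem found.
pub fn validate_domain(input: &str) -> Result<String, DomainError> {
    let domain = input.trim().to_ascii_lowercase();
    if domain.is_empty() {
        return Err(DomainError::Empty);
    }
    // Reject "https://example.com" and similar.
    if domain.contains("://") {
        return Err(DomainError::HasScheme);
    }
    // Reject "example.com/blog" and a trailing "/".
    if domain.contains('/') {
        return Err(DomainError::HasPath);
    }
    // Assumed extra checks: at least two labels, each a plausible DNS label.
    let labels_ok = domain.contains('.')
        && domain.split('.').all(|label| {
            !label.is_empty()
                && label.len() <= 63
                && !label.starts_with('-')
                && !label.ends_with('-')
                && label.chars().all(|c| c.is_ascii_alphanumeric() || c == '-')
        });
    if labels_ok {
        Ok(domain)
    } else {
        Err(DomainError::InvalidLabel)
    }
}
```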

Step 2 — DnsInstructions: Shows copy-paste DNS instructions:

Point your domain's A record to: {CLUSTER_PUBLIC_IP}

CLUSTER_PUBLIC_IP is read from env via config.rs.
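
A minimal sketch of Step 2, assuming the value is stored as an IPv4 address string. The function names are hypothetical and the real config.rs may be structured differently. Parsing the value before display catches a misconfigured environment early.

```rust
use std::env;
use std::net::{AddrParseError, Ipv4Addr};

/// Reads the raw value from the environment; None if the variable is unset.
pub fn cluster_public_ip_from_env() -> Option<String> {
    env::var("CLUSTER_PUBLIC_IP").ok()
}

/// Builds the instruction line shown to the user.
/// An A record holds an IPv4 address, so the value is parsed as Ipv4Addr.
pub fn dns_instructions(cluster_public_ip: &str) -> Result<String, AddrParseError> {
    let ip: Ipv4Addr = cluster_public_ip.trim().parse()?;
    Ok(format!("Point your domain's A record to: {ip}"))
}
```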

Step 3 — Confirm: Summary → "Confirm & Create".

Provision Status and the Certificate

| cert-manager state | DB status |
|---|---|
| Not yet requested | Provisioning |
| Ready: True | Running |
| Ready: False with error | Failed |

The TlsCertificate provision is not primary, so the order's primary status (Nginx deployment) transitions independently. The dashboard shows per-provision status so the user can see the certificate state separately.
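
The status table above can be expressed as a small mapping function. This is a sketch under assumptions: ReadyCondition is a simplified stand-in for the cert-manager Ready condition, has_error is a hypothetical flag for however errors are detected, and treating Ready: False without an error as Provisioning is an assumption the table does not cover.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProvisionStatus {
    Provisioning,
    Running,
    Failed,
}

/// Simplified stand-in for the Certificate's Ready condition.
pub struct ReadyCondition {
    pub ready: bool,
    pub has_error: bool,
}

/// Maps the observed certificate state to the DB status.
/// `None` means the certificate has not been requested yet.
pub fn map_certificate_status(condition: Option<&ReadyCondition>) -> ProvisionStatus {
    match condition {
        None => ProvisionStatus::Provisioning,
        Some(c) if c.ready => ProvisionStatus::Running,
        Some(c) if c.has_error => ProvisionStatus::Failed,
        // Assumption: Ready: False without an error means issuance is in progress.
        Some(_) => ProvisionStatus::Provisioning,
    }
}
```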

Blueprint Provisioning Order

PVC → Deployment → Services → IngressRoute → TlsCertificate (last)

cert-manager's HTTP-01 solver needs the IngressRoute to be active before the challenge is issued. Always list tls-cert last in the blueprint.
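
The ordering rule above can be enforced in code rather than by convention. The sketch below assumes a hypothetical ProvisionKind enum and rank function. The actual blueprint may simply list provisions in a fixed order.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ProvisionKind {
    Pvc,
    Deployment,
    ClusterIpService,
    IngressRoute,
    TlsCertificate,
}

impl ProvisionKind {
    /// Position in the provisioning sequence; TlsCertificate is always last.
    fn rank(self) -> u8 {
        match self {
            ProvisionKind::Pvc => 0,
            ProvisionKind::Deployment => 1,
            ProvisionKind::ClusterIpService => 2,
            ProvisionKind::IngressRoute => 3,
            ProvisionKind::TlsCertificate => 4,
        }
    }
}

/// Sorts (local_id, kind) pairs into provisioning order.
/// The sort is stable, so items of the same kind keep their relative order.
pub fn provisioning_order(mut items: Vec<(&str, ProvisionKind)>) -> Vec<(&str, ProvisionKind)> {
    items.sort_by_key(|(_, kind)| kind.rank());
    items
}
```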

Dashboard UI

  1. Domain — status badge: "HTTPS Active" (green), "Pending Certificate" (yellow), "Certificate Error" (red)
  2. File Manager — same component as game servers
  3. Start / Stop — toggles the Nginx deployment
  4. DNS Setup instructions — persistent info card until certificate becomes Running
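
The badge and the DNS info card in the list above both follow from the certificate provision's status. The sketch below assumes that Provisioning maps to "Pending Certificate", Running to "HTTPS Active", and Failed to "Certificate Error". The type and function names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CertificateStatus {
    Provisioning,
    Running,
    Failed,
}

#[derive(Debug, PartialEq, Eq)]
pub struct Badge {
    pub label: &'static str,
    pub color: &'static str,
}

/// Badge shown next to the domain on the dashboard.
pub fn domain_badge(status: CertificateStatus) -> Badge {
    match status {
        CertificateStatus::Running => Badge { label: "HTTPS Active", color: "green" },
        CertificateStatus::Provisioning => Badge { label: "Pending Certificate", color: "yellow" },
        CertificateStatus::Failed => Badge { label: "Certificate Error", color: "red" },
    }
}

/// The DNS setup card stays visible until the certificate is Running.
pub fn show_dns_card(status: CertificateStatus) -> bool {
    status != CertificateStatus::Running
}
```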

Implementation Phases

| Phase | What |
|---|---|
| 1 | Domain model (CatalogServiceKind, WebsiteConfiguration, TlsCertificateConfig, IngressRouteConfig update, FlowType) |
| 2 | Setup flow (WebsiteSetupFlow) |
| 3 | Blueprint (WebsiteBlueprint) |
| 4 | Infrastructure (apply_tls_certificate, delete_tls_certificate, get_certificate_status, SyncProvisionStatusTask update) |
| 5 | Catalog entry |
| 6 | Dashboard UI |

Open Questions and Risks

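1. HTTP-01 challenge reachability: Traefik's port 80 must be reachable from the public internet before the certificate is issued, because the ACME server fetches the challenge token over plain HTTP under /.well-known/acme-challenge/. Verify the cluster's HTTP entrypoint is not globally redirecting to HTTPS, since such a redirect can interfere with the challenge request.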

2. Rate limits: Let's Encrypt limits issuance to 50 certificates per registered domain per week. Use a letsencrypt-staging ClusterIssuer in dev environments so test orders do not count against the production limit.

3. Certificate failure visibility: When cert-manager fails to issue a certificate, the provision goes to Failed. Consider adding certificate_message: Option<String> to the provision view DTO to surface the raw condition message.

4. Domain ownership: No validation is done before provisioning. cert-manager's HTTP-01 challenge handles this implicitly. If the challenge fails, the user must correct DNS and retry.

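5. www redirect: If include_www is true, both example.com and www.example.com are included in the certificate and in the IngressRoute match rule. The user must create DNS records for both names; if either name does not resolve to the cluster, its HTTP-01 challenge fails and the certificate is not issued.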
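
For item 5, the IngressRoute match rule could be built as in the following sketch. The function name is hypothetical. The rule joins one Host matcher per name with ||, which is standard Traefik rule syntax.

```rust
/// Builds the Traefik match rule for the site's IngressRoute.
/// With include_www, the rule matches both the apex and the www host.
pub fn host_match_rule(domain: &str, include_www: bool) -> String {
    let mut hosts = vec![format!("Host(`{domain}`)")];
    if include_www {
        hosts.push(format!("Host(`www.{domain}`)"));
    }
    hosts.join(" || ")
}
```

This serves the same site on both hosts rather than redirecting one to the other. An actual www redirect would need additional Traefik middleware, which this document does not specify.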