Customer Backups¶
Context¶
Users can create, download, restore, and delete backups of their game server data. Backups are scoped to primary Deployment provisions that have a sidecar (Minecraft, FiveM, etc.) — the sidecar compresses/extracts via gRPC. Archives are written to the sidecar pod's /tmp (not the customer's data PVC) so a crash during backup leaves no stale files on the customer's volume.
Step 1: Sidecar proto — add ExtractTo¶
message ExtractToReq {
  string source = 1;       // e.g. "/tmp/restore-{id}.tar.gz"
  string destination = 2;  // e.g. "/data"
}

service FileIO {
  // ... existing RPCs ...
  rpc ExtractTo(ExtractToReq) returns (Empty);
}
Sidecar implementation: `tar -xzf source -C destination`.
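The core of that handler is a single subprocess call. A minimal sketch of the extraction helper (the `extract_to` free function and its error type are illustrative; the real sidecar wraps this in the gRPC handler):

```rust
use std::process::Command;

// Unpack `source` (a .tar.gz) into `destination`, i.e. the
// equivalent of `tar -xzf source -C destination`.
// Hypothetical helper; error handling simplified to a String.
fn extract_to(source: &str, destination: &str) -> Result<(), String> {
    let status = Command::new("tar")
        .args(["-xzf", source, "-C", destination])
        .status()
        .map_err(|e| format!("failed to spawn tar: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("tar exited with {status}"))
    }
}
```

Shelling out to the system `tar` keeps the sidecar image small and matches the compression side, which already produces standard `.tar.gz` archives.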
Step 2: Migration¶
CREATE TABLE backups (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
  order_id UUID NOT NULL REFERENCES orders(id) ON DELETE CASCADE,
  provision_id UUID NOT NULL REFERENCES provisions(id) ON DELETE RESTRICT,
  name TEXT NOT NULL,
  status TEXT NOT NULL DEFAULT 'creating',
  asset_id UUID REFERENCES assets(id),
  size_bytes BIGINT,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  completed_at TIMESTAMPTZ
);
ON DELETE RESTRICT on provision_id prevents hard-deleting a provision row while backups exist.
Step 3: Domain¶
pub enum BackupStatus { Creating, Ready, Failed, Restoring }
pub struct Backup {
    id: BackupId,
    order_id: OrderId,
    provision_id: ProvisionId,
    name: String,
    status: BackupStatus,
    asset_id: Option<AssetId>,
    size_bytes: Option<i64>,
    created_at: DateTime<Utc>,
    completed_at: Option<DateTime<Utc>>,
}
pub trait BackupRepository: Interface {
    async fn save(&self, backup: &Backup) -> exn::Result<Backup, RepositoryError>;
    async fn find_by_id(&self, id: &BackupId) -> exn::Result<Option<Backup>, RepositoryError>;
    async fn find_by_provision(&self, provision_id: &ProvisionId) -> exn::Result<Vec<Backup>, RepositoryError>;
    async fn delete(&self, id: &BackupId) -> exn::Result<(), RepositoryError>;
    async fn count_by_provision(&self, provision_id: &ProvisionId) -> exn::Result<i64, RepositoryError>;
    async fn has_active_operation(&self, order_id: &OrderId) -> exn::Result<bool, RepositoryError>;
}
Step 4: Application service¶
pub enum BackupServiceError {
    OrderNotFound,
    ProvisionNotFound,
    BackupNotFound,
    NoSidecar,
    OperationInProgress,
    LimitReached, // >= 10 backups for this provision
    StatusConflict,
    Unauthorized,
    Storage(String),
    Cluster(String),
    Database(RepositoryError),
}
pub trait BackupService: Interface {
    async fn create_backup(&self, order_id: &OrderId, provision_id: &ProvisionId, user_id: &UserId, name: Option<String>) -> Result<BackupId, BackupServiceError>;
    async fn list_backups(&self, order_id: &OrderId, provision_id: &ProvisionId, user_id: &UserId) -> Result<Vec<Backup>, BackupServiceError>;
    async fn get_backup(&self, backup_id: &BackupId, order_id: &OrderId, user_id: &UserId) -> Result<Backup, BackupServiceError>;
    async fn delete_backup(&self, backup_id: &BackupId, order_id: &OrderId, user_id: &UserId) -> Result<(), BackupServiceError>;
    async fn restore_backup(&self, backup_id: &BackupId, order_id: &OrderId, user_id: &UserId) -> Result<(), BackupServiceError>;
}
create_backup flow¶
- Load order + provision; verify owner or `ViewAllOrders`
- Verify provision: `kind=Deployment`, `is_primary=true`
- Find sibling sidecar provision — `NoSidecar` if absent
- `has_active_operation(order_id)` → `OperationInProgress`
- `count_by_provision(provision_id) >= 10` → `LimitReached`
- Create + persist `Backup { status: Creating }`; return `BackupId`
- `tokio::spawn`:
    - Connect to sidecar gRPC
    - `client.compress_files(sources: ["/data"], output: "/tmp/backup-{id}.tar.gz", format: TARGZ)`
    - Stream file from sidecar → upload to S3 as `backups/{order_id}/{backup_id}.tar.gz`
    - Create `Asset` record; update backup: `status=Ready`, `asset_id`, `size_bytes`
    - Delete `/tmp/backup-{id}.tar.gz` (best-effort)
    - On error: `backup.status = Failed`
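The synchronous guard portion of this flow reduces to a few pure checks plus the two naming conventions for the scratch archive and the S3 key. A sketch under those assumptions (the constant and helper names are illustrative, not the service's actual identifiers):

```rust
const MAX_BACKUPS_PER_PROVISION: i64 = 10;

#[derive(Debug, PartialEq)]
enum CreateGuardError {
    OperationInProgress,
    LimitReached,
}

// Pure precondition check run before any Backup row is created.
// `active_op` comes from has_active_operation(order_id),
// `count` from count_by_provision(provision_id).
fn check_create(active_op: bool, count: i64) -> Result<(), CreateGuardError> {
    if active_op {
        return Err(CreateGuardError::OperationInProgress);
    }
    if count >= MAX_BACKUPS_PER_PROVISION {
        return Err(CreateGuardError::LimitReached);
    }
    Ok(())
}

// Scratch archive lives on the sidecar pod's /tmp, never the data PVC.
fn scratch_path(backup_id: &str) -> String {
    format!("/tmp/backup-{backup_id}.tar.gz")
}

// Durable copy is keyed by order and backup id in S3.
fn s3_key(order_id: &str, backup_id: &str) -> String {
    format!("backups/{order_id}/{backup_id}.tar.gz")
}
```

Checking `has_active_operation` before the count means a concurrent backup surfaces as `OperationInProgress` rather than the less actionable `LimitReached`.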
restore_backup flow¶
- Verify `backup.status == Ready` → else `StatusConflict`
- `has_active_operation(order_id)` → `OperationInProgress`
- Set `status = Restoring`, persist
- `tokio::spawn`:
    - Stream from S3 → write to `/tmp/restore-{id}.tar.gz` on sidecar
    - `client.extract_to(source: "/tmp/restore-{id}.tar.gz", destination: "/data")`
    - Delete `/tmp/restore-{id}.tar.gz` (best-effort)
    - Set `status = Ready`, persist
    - On error: `status = Failed` (S3 backup still intact; user can retry)
delete_backup flow¶
- Verify ownership; `status` not in `(Creating, Restoring)` → else `StatusConflict`
- Delete S3 object; delete asset; delete backup row
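Taken together, the three flows define a small state machine over `BackupStatus`. A sketch of the per-operation gates (the `can_restore`/`can_delete` function names are hypothetical; the enum mirrors Step 3):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum BackupStatus {
    Creating,
    Ready,
    Failed,
    Restoring,
}

// Restore is only valid from Ready; anything else is StatusConflict.
fn can_restore(status: BackupStatus) -> bool {
    status == BackupStatus::Ready
}

// Delete is rejected while an async operation owns the backup.
// Failed backups are deletable so users can clean up after errors.
fn can_delete(status: BackupStatus) -> bool {
    !matches!(status, BackupStatus::Creating | BackupStatus::Restoring)
}
```

Centralizing these gates in one place keeps the service, the HTTP layer, and the dashboard's action buttons in agreement about which statuses permit which actions.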
Step 5: Reprovision guard¶
`reprovision_order_service.rs` checks `backup_repo.has_active_operation(order_id)` before teardown and rejects with `409 BackupInProgress` while a backup or restore is running.
Step 6: HTTP¶
Routes under `/api/v1/orders/:order_id/provisions/:provision_id/backups`:

| Method | Path | Description |
|---|---|---|
| GET | `/` | List backups |
| POST | `/` | Create backup `{ name?: string }` |
| DELETE | `/:backup_id` | Delete backup |
| POST | `/:backup_id/restore` | Restore backup |
Download: reuse `GET /api/v1/assets/:id` (enforces `OwnerOnly` ACL, streams from S3).
pub struct BackupView {
    pub id: String,
    pub name: String,
    pub status: String, // "creating" | "ready" | "failed" | "restoring"
    pub asset_id: Option<String>, // set when ready; use /api/v1/assets/{id} to download
    pub size_bytes: Option<i64>,
    pub created_at: DateTime<Utc>,
    pub completed_at: Option<DateTime<Utc>>,
}
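The view serializes status as the lowercase strings the dashboard keys its badges and i18n lookups on. A minimal sketch of that mapping, assuming the `BackupStatus` enum from Step 3 (the `status_str` helper is illustrative; a serde rename attribute would do the same job):

```rust
#[derive(Debug, Clone, Copy)]
enum BackupStatus {
    Creating,
    Ready,
    Failed,
    Restoring,
}

// Lowercase wire representation used by BackupView.status.
fn status_str(status: BackupStatus) -> &'static str {
    match status {
        BackupStatus::Creating => "creating",
        BackupStatus::Ready => "ready",
        BackupStatus::Failed => "failed",
        BackupStatus::Restoring => "restoring",
    }
}
```

These strings also match the `'creating'` default in the migration's `status` column, so no translation layer is needed between the database and the view.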
Step 7: Dashboard¶
backups-content.tsx:
- Fetch list on mount
- Poll every 3s while any backup is creating or restoring
- Table: Name | Size | Status badge | Created | Actions
- Actions: Download (ready only), Restore (ready only), Delete (ready or failed only)
- "Create Backup" button → optional name modal → POST → refetch
- Disabled create when >= 10 backups or operation in progress
i18n keys¶
order_detail.section_backups
backups.heading, backups.create_button
backups.column_name, backups.column_size, backups.column_status, backups.column_created, backups.column_actions
backups.action_download, backups.action_restore, backups.action_delete
backups.status_creating, backups.status_ready, backups.status_failed, backups.status_restoring
backups.confirm_delete_title, backups.confirm_restore_title
backups.empty_state, backups.limit_reached, backups.error_in_progress