Export & Storage Architecture

Last Updated: 2026-01-29
Status: Production Ready

Overview

This document describes the architecture for file exports, storage management, and cleanup mechanisms in the aaperture platform.


Architecture Diagram

┌─────────────────────────────────────────────────────────────────────────────┐
│ FRONTEND │
├─────────────────────────────────────────────────────────────────────────────┤
│ EntityExportDialog │ ExportFormatDialog │ useEntityExport hook │
│ │ │ │ │
│ └────────────────────┴───────────────────────┘ │
│ │ │
│ API Call (GET /export or /export/async) │
└─────────────────────────────────────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────────────┐
│ BACKEND │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ ExportController│───▶│ ExportService │───▶│ ExportCsvService│ │
│ │ (Rate Limited) │ │ │ │ ExportExcelSvc │ │
│ └─────────────────┘ └─────────────────┘ │ ExportPdfService│ │
│ │ │ └─────────────────┘ │
│ │ │ │ │
│ ▼ ▼ ▼ │
│ ┌─────────────────┐ ┌─────────────────┐ ┌─────────────────┐ │
│ │ MetricsService │ │ StorageService │ │ PythonPdfService│ │
│ │ (Prometheus) │ │ │ │ (WeasyPrint) │ │
│ └─────────────────┘ └─────────────────┘ └─────────────────┘ │
│ │ │
│ ▼ │
│ ┌─────────────────┐ │
│ │ StorageAccess │ │
│ │ Service │ │
│ │ (Access Control)│ │
│ └─────────────────┘ │
│ │ │
└────────────────────────────────┼────────────────────────────────────────────┘


┌─────────────────────────────────────────────────────────────────────────────┐
│ CLOUDFLARE R2 │
├─────────────────────────────────────────────────────────────────────────────┤
│ │
│ Temporary Files (TTL 24h) │ Permanent Files (No auto-delete) │
│ ───────────────────────── │ ──────────────────────────────── │
│ • exports/ │ • pdfs/quotes/ │
│ • pdfs/ai-reports/ │ • pdfs/invoices/ │
│ • temp/ │ • pdfs/contracts/ │
│ │ • documents/ │
│ │ • avatars/ │
│ │ • logos/ │
│ │
└─────────────────────────────────────────────────────────────────────────────┘

Security

1. Access Control on Signed URLs

Service: StorageAccessService (backend/src/storage/storage-access.service.ts)

All signed URL requests are validated against user ownership:

// In StorageController.getSignedUrl()
await this.storageAccessService.verifyAccessOrThrow(decodedKey, userId);
const expiration =
  this.storageAccessService.normalizeExpiration(expirationSeconds);

Verification methods:

  • Key pattern matching (userId in path)
  • file_objects table lookup (org_id check)
  • session_files table lookup (session owner check)
  • contact_files table lookup (contact owner check)
  • documents table lookup (uploaded_by check)

Max expiration: 1 hour (3600 seconds) for most files, 1 week for specific use cases.
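
The expiration cap described above could be enforced along these lines. This is only a sketch: the constant names, the `allowExtended` flag, and the choice to fall back to the maximum when no value is given are assumptions for illustration, not the actual behavior of StorageAccessService.normalizeExpiration.

```typescript
// Hypothetical limits mirroring the documented caps (1 hour, 1 week).
const DEFAULT_MAX_EXPIRATION_SECONDS = 3600;
const EXTENDED_MAX_EXPIRATION_SECONDS = 7 * 24 * 3600;

// Clamp a requested expiration to the allowed maximum.
// `allowExtended` is an assumed switch for the "specific use cases".
function normalizeExpiration(
  requestedSeconds: number | undefined,
  allowExtended: boolean = false,
): number {
  const max = allowExtended
    ? EXTENDED_MAX_EXPIRATION_SECONDS
    : DEFAULT_MAX_EXPIRATION_SECONDS;
  // Missing, non-finite, or non-positive requests fall back to the cap (assumption).
  if (
    requestedSeconds === undefined ||
    !Number.isFinite(requestedSeconds) ||
    requestedSeconds <= 0
  ) {
    return max;
  }
  // Round down to whole seconds, but never below 1 second.
  return Math.min(Math.max(1, Math.floor(requestedSeconds)), max);
}
```

Clamping rather than rejecting keeps callers working when they ask for too long a lifetime, at the cost of silently shortening it.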

2. Access Control on FileObjects

Service: FileObjectsService (backend/src/file-objects/file-objects.service.ts)

Organization-based access verification:

async getSignedUrl(
  id: string,
  requestingUserId: string,
  expirationSeconds?: number,
): Promise<string> {
  const fileObject = await this.findById(id);
  await this.verifyUserAccess(fileObject, requestingUserId);
  return this.storageService.getSignedUrl(fileObject.key, expirationSeconds);
}

3. Rate Limiting on Exports

Decorator: @RateLimit (backend/src/rate-limiting/rate-limiting.guard.ts)

// Synchronous exports: 10 requests per minute
@RateLimit({ keyPrefix: "export:sync", maxRequests: 10, windowMs: 60000 })

// Asynchronous exports: 5 requests per minute
@RateLimit({ keyPrefix: "export:async", maxRequests: 5, windowMs: 60000 })
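
To make the three options concrete, here is a minimal in-memory fixed-window limiter that takes the same keyPrefix, maxRequests, and windowMs values. It is a sketch under assumptions: the class name, the per-user key format, and the fixed-window strategy are illustrative, and the real guard may use a different algorithm or a shared store.

```typescript
interface RateLimitOptions {
  keyPrefix: string;
  maxRequests: number;
  windowMs: number;
}

interface WindowState {
  start: number; // timestamp (ms) at which the current window opened
  count: number; // requests seen in the current window
}

// Hypothetical limiter; state is kept per process, so it is not cluster-safe.
class FixedWindowLimiter {
  private readonly windows = new Map<string, WindowState>();
  private readonly options: RateLimitOptions;

  constructor(options: RateLimitOptions) {
    this.options = options;
  }

  // Returns true if the request is allowed, false if the limit is reached.
  tryAcquire(userId: string, now: number = Date.now()): boolean {
    const key = `${this.options.keyPrefix}:${userId}`;
    const current = this.windows.get(key);
    // Open a new window if none exists or the previous one has elapsed.
    if (!current || now - current.start >= this.options.windowMs) {
      this.windows.set(key, { start: now, count: 1 });
      return true;
    }
    if (current.count >= this.options.maxRequests) {
      return false;
    }
    current.count += 1;
    return true;
  }
}
```

With the asynchronous export settings, the sixth call from one user inside the same minute returns false, while other users are unaffected.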

File Storage

Unified File Objects

Service: EntityFilesService (backend/src/file-objects/entity-files.service.ts)

All files are stored in the file_objects table with polymorphic references:

interface FileObjectTable {
  id: string;
  org_id: string;
  provider: string; // 'R2'
  bucket: string;
  key: string;
  content_type: string;
  size_bytes: number;
  checksum: string | null;
  created_at: Date;
  created_by: string | null;
  // Unified columns (migration 0158)
  file_name: string | null;
  file_url: string | null;
  is_image: boolean | null;
  updated_at: Date | null;
  entity_type: string | null; // 'session', 'contact', 'export', 'quote', etc.
  entity_id: string | null; // Polymorphic reference
}

Entity types:

  • session - Session attachments
  • contact - Contact attachments
  • export - Export files
  • quote - Quote PDFs
  • invoice - Invoice PDFs
  • contract - Contract PDFs
  • document - Uploaded documents
  • avatar - User avatars
  • logo - Company logos
  • ai-report - AI-generated reports

R2 Key Prefixes

Constants: backend/src/export/export.constants.ts

// Temporary files (cleaned after 24h)
export const TEMPORARY_FILE_PREFIXES = [
  "exports/",
  "pdfs/ai-reports/",
  "temp/",
] as const;

// Permanent files (never auto-deleted)
export const PERMANENT_FILE_PREFIXES = [
  "pdfs/quotes/",
  "pdfs/invoices/",
  "pdfs/contracts/",
  "documents/",
  "avatars/",
  "logos/",
] as const;
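
A small classifier shows how these two lists can decide whether a given R2 key is eligible for automatic cleanup. The function name `classifyKey` and the "unknown" bucket are hypothetical; the prefix lists are copied from the constants above.

```typescript
const TEMPORARY_FILE_PREFIXES = ["exports/", "pdfs/ai-reports/", "temp/"] as const;

const PERMANENT_FILE_PREFIXES = [
  "pdfs/quotes/",
  "pdfs/invoices/",
  "pdfs/contracts/",
  "documents/",
  "avatars/",
  "logos/",
] as const;

type Retention = "temporary" | "permanent" | "unknown";

// Hypothetical helper: maps an object key to its retention class by prefix.
function classifyKey(key: string): Retention {
  if (TEMPORARY_FILE_PREFIXES.some((prefix) => key.startsWith(prefix))) {
    return "temporary";
  }
  if (PERMANENT_FILE_PREFIXES.some((prefix) => key.startsWith(prefix))) {
    return "permanent";
  }
  // Keys outside both lists are left for a human or another job to decide.
  return "unknown";
}
```

Treating unlisted prefixes as "unknown" rather than "temporary" is the conservative choice: a cleanup job built on this would never delete a key it cannot positively classify.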

Cleanup Mechanisms

1. Export Files Cleanup (Daily)

Scheduler: ExportCleanupScheduler (backend/src/export/export-cleanup.scheduler.ts)

  • Schedule: Every day at 4 AM (0 4 * * *)
  • Retention: 24 hours (configurable via EXPORT_RETENTION_HOURS)
  • Scope: Only TEMPORARY_FILE_PREFIXES
  • Timeout: 30 minutes max
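
@Cron("0 4 * * *")
async cleanupOldExportFiles() {
  const startedAt = Date.now();
  // Retention window in hours; reading it straight from the environment is an
  // assumption here, with the documented 24h default
  const maxAgeHours = Number(process.env.EXPORT_RETENTION_HOURS ?? 24);
  const deleted = await this.storageService.deleteExportObjectsOlderThan(
    [...TEMPORARY_FILE_PREFIXES],
    maxAgeHours,
  );
  const duration = (Date.now() - startedAt) / 1000; // elapsed seconds
  this.metricsService?.recordCleanupJob("export_files", "success", duration, undefined, deleted);
}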
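
The retention setting and the resulting cutoff can be illustrated with two small helpers. Both names are hypothetical, and how the scheduler actually parses EXPORT_RETENTION_HOURS is not shown in this document; the sketch only assumes that invalid values should fall back to the documented 24 hours.

```typescript
const DEFAULT_RETENTION_HOURS = 24;

// Hypothetical parser for the raw EXPORT_RETENTION_HOURS value.
function parseRetentionHours(raw: string | undefined): number {
  if (raw === undefined || raw.trim() === "") {
    return DEFAULT_RETENTION_HOURS;
  }
  const parsed = Number(raw);
  // Reject non-numeric and non-positive values instead of deleting too eagerly.
  if (!Number.isFinite(parsed) || parsed <= 0) {
    return DEFAULT_RETENTION_HOURS;
  }
  return parsed;
}

// Objects last modified before this instant are eligible for deletion.
function retentionCutoff(now: Date, retentionHours: number): Date {
  return new Date(now.getTime() - retentionHours * 60 * 60 * 1000);
}
```

For a run at 04:00 UTC with the default retention, the cutoff is 04:00 UTC on the previous day, so a file created shortly after one run survives until the run two days later.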

2. Orphaned File Objects Cleanup (Weekly)

Scheduler: FileObjectsCleanupScheduler (backend/src/file-objects/file-objects-cleanup.scheduler.ts)

  • Schedule: Every Sunday at 3 AM (0 3 * * 0)
  • Scope: file_objects not referenced by documents or export_jobs
  • Min age: 7 days (to avoid deleting files being uploaded)

@Cron("0 3 * * 0")
async cleanupOrphanedFileObjects() {
  // Find orphaned file_objects (no reference in documents or export_jobs)
  // Delete R2 file + database record
}
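
The selection rule (unreferenced and older than the minimum age) can be sketched as a pure function. The row shape, the `referencedIds` set, and the function name are assumptions for illustration; the real scheduler presumably expresses this as a database query.

```typescript
interface FileObjectRow {
  id: string;
  created_at: Date;
}

const MIN_AGE_DAYS = 7;

// Hypothetical filter: keeps rows that nothing references and that are old
// enough that an in-progress upload cannot explain the missing reference.
function selectOrphans(
  files: FileObjectRow[],
  referencedIds: Set<string>,
  now: Date,
): FileObjectRow[] {
  const cutoff = now.getTime() - MIN_AGE_DAYS * 24 * 60 * 60 * 1000;
  return files.filter(
    (file) => !referencedIds.has(file.id) && file.created_at.getTime() < cutoff,
  );
}
```

The age check matters because a file_objects row can exist briefly before the record that references it is written; without it, the job could delete a file mid-upload.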

3. Orphaned R2 Files Cleanup (Monthly)

Scheduler: FileObjectsCleanupScheduler

  • Schedule: 1st of every month at 4 AM (0 4 1 * *)
  • Scope: R2 files not referenced in any database table
  • Prefixes scanned: documents/, session-files/, contact-files/
  • Min age: 7 days

@Cron("0 4 1 * *")
async cleanupOrphanedR2Files() {
  // Scan R2 prefixes
  // Check if file exists in file_objects, session_files, or contact_files
  // Delete if orphaned and older than 7 days
}

4. Avatar/Logo Replacement Cleanup

Location: StorageController (backend/src/storage/storage.controller.ts)

When a user uploads a new avatar or logo, the old file is automatically deleted:

async uploadAvatar(userId: string, file: Express.Multer.File) {
  // Get old avatar URL
  const user = await this.getUser(userId);

  // Upload new avatar
  const newUrl = await this.storageService.uploadFile({ ... });

  // Delete old avatar if different
  if (user.avatar_url && user.avatar_url !== newUrl) {
    await this.deleteOldAvatarSafely(user.avatar_url);
  }

  return newUrl;
}

Performance Optimizations

1. Streaming for Large Exports

Service: ExportCsvService (backend/src/export/export-csv.service.ts)

For exports exceeding 5000 records, data is fetched and written in batches:

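async exportToCSVWithPagination<T>(
  columns: CsvColumn[],
  fetchBatch: BatchFetcher<T>,
  batchSize: number = 1000,
): Promise<Buffer> {
  const chunks: Buffer[] = []; // serialized CSV output (accumulation is illustrative)
  let offset = 0;
  // Fetch and write one batch at a time instead of loading every record into memory
  while (true) {
    const batch = await fetchBatch(offset, batchSize);
    if (batch.length === 0) break;
    // Write batch to CSV stream (serialization elided)
    offset += batch.length;
  }
  return Buffer.concat(chunks);
}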
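
A self-contained version of the batching loop is sketched below. It is not the ExportCsvService implementation: the `CsvColumn` shape, the escaping rules, and the choice to return a string rather than a Buffer are assumptions made to keep the example runnable without Node typings. It also collects output lines in memory, so it only bounds how many source rows are held at once; a real streaming export would write each batch to a stream instead.

```typescript
type BatchFetcher<T> = (offset: number, limit: number) => Promise<T[]>;

interface CsvColumn {
  header: string;
  key: string;
}

// Quote a value if it contains a comma, quote, or line break.
function escapeCsv(value: unknown): string {
  const text = value === null || value === undefined ? "" : String(value);
  return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
}

async function exportToCsvInBatches<T extends Record<string, unknown>>(
  columns: CsvColumn[],
  fetchBatch: BatchFetcher<T>,
  batchSize: number = 1000,
): Promise<string> {
  const lines: string[] = [columns.map((c) => escapeCsv(c.header)).join(",")];
  let offset = 0;
  while (true) {
    const batch = await fetchBatch(offset, batchSize);
    if (batch.length === 0) break;
    for (const row of batch) {
      lines.push(columns.map((c) => escapeCsv(row[c.key])).join(","));
    }
    offset += batch.length;
    // A short page means the source is exhausted; skip the extra round trip.
    if (batch.length < batchSize) break;
  }
  return lines.join("\n") + "\n";
}
```

Advancing the offset by `batch.length` rather than `batchSize` keeps the loop correct even if the fetcher returns fewer rows than requested.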

2. Batch Deletion in Cleanup

Service: StorageFileManagementService (backend/src/storage/storage-file-management.service.ts)

Files are deleted in batches of 1000 to avoid memory issues:

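async deleteExportObjectsOlderThan(prefixes: string[], maxAgeHours: number): Promise<number> {
  const batchSize = 1000;
  // Objects last modified before this instant are deleted (lastModified assumed to be a Date)
  const cutoff = new Date(Date.now() - maxAgeHours * 60 * 60 * 1000);
  let batch: string[] = [];
  let deleted = 0;

  for (const prefix of prefixes) {
    for await (const obj of this.listObjectsByPrefix(prefix)) {
      if (obj.lastModified < cutoff) {
        batch.push(obj.key);
        if (batch.length >= batchSize) {
          await this.deleteBatch(batch);
          deleted += batch.length;
          batch = [];
        }
      }
    }
  }
  // Delete the remaining partial batch
  if (batch.length > 0) {
    await this.deleteBatch(batch);
    deleted += batch.length;
  }
  return deleted;
}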

Monitoring (Prometheus Metrics)

Service: MetricsService (backend/src/common/metrics/metrics.service.ts)

Export Metrics

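| Metric                  | Type      | Labels                     | Description             |
|-------------------------|-----------|----------------------------|-------------------------|
| exports_total           | Counter   | entity, format, status     | Total exports performed |
| export_duration_seconds | Histogram | entity, format             | Export duration         |
| export_size_bytes       | Histogram | entity, format             | Export file size        |
| export_errors_total     | Counter   | entity, format, error_type | Export errors           |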

Cleanup Metrics

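| Metric                      | Type      | Labels               | Description           |
|-----------------------------|-----------|----------------------|-----------------------|
| cleanup_jobs_total          | Counter   | job_type, status     | Cleanup jobs executed |
| cleanup_duration_seconds    | Histogram | job_type             | Cleanup duration      |
| cleanup_files_scanned_total | Counter   | job_type             | Files scanned         |
| cleanup_files_deleted_total | Counter   | job_type             | Files deleted         |
| cleanup_errors_total        | Counter   | job_type, error_type | Cleanup errors        |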

Usage

// Record export metrics
this.metricsService.recordExport("sessions", "csv", "success", 2.5, 1024000);

// Record cleanup metrics
this.metricsService.recordCleanupJob("export_files", "success", 120, 5000, 150);

Column Definitions

Constants: backend/src/export/export-columns.constants.ts

Centralized column definitions for all export formats:

export const SESSION_COLUMNS: ColumnDefinition[] = [
  {
    csvHeader: "Date",
    excelHeader: "Date",
    key: "start_date",
    pdfHeader: "Date",
  },
  {
    csvHeader: "Location",
    excelHeader: "Location",
    key: "location",
    pdfHeader: "Location",
  },
  // ...
];

export const CONTACT_COLUMNS: ColumnDefinition[] = [
  {
    csvHeader: "First Name",
    excelHeader: "First Name",
    key: "first_name",
    pdfHeader: "First Name",
  },
  // ...
];

// Helper functions
export function getValidKeysForEntity(entity: string): string[];
export function getColumnsForEntity(
  entity: string,
  format: "csv" | "excel" | "pdf",
): Column[];
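
One plausible way to implement the two helpers is a lookup table keyed by entity name. Everything beyond the two function signatures is an assumption: the entity names "sessions" and "contacts", the `Column` shape, and the decision to return an empty list for unknown entities are illustrative only, and the column lists are truncated to the entries shown above.

```typescript
interface ColumnDefinition {
  key: string;
  csvHeader: string;
  excelHeader: string;
  pdfHeader: string;
}

type ExportFormat = "csv" | "excel" | "pdf";

// Assumed shape of the per-format column returned to exporters.
interface Column {
  key: string;
  header: string;
}

// Hypothetical registry; the real constants file may organize this differently.
const COLUMNS_BY_ENTITY: Record<string, ColumnDefinition[]> = {
  sessions: [
    { key: "start_date", csvHeader: "Date", excelHeader: "Date", pdfHeader: "Date" },
    { key: "location", csvHeader: "Location", excelHeader: "Location", pdfHeader: "Location" },
  ],
  contacts: [
    { key: "first_name", csvHeader: "First Name", excelHeader: "First Name", pdfHeader: "First Name" },
  ],
};

const HEADER_FIELD: Record<ExportFormat, "csvHeader" | "excelHeader" | "pdfHeader"> = {
  csv: "csvHeader",
  excel: "excelHeader",
  pdf: "pdfHeader",
};

function getValidKeysForEntity(entity: string): string[] {
  return (COLUMNS_BY_ENTITY[entity] ?? []).map((column) => column.key);
}

function getColumnsForEntity(entity: string, format: ExportFormat): Column[] {
  const field = HEADER_FIELD[format];
  return (COLUMNS_BY_ENTITY[entity] ?? []).map((column) => ({
    key: column.key,
    header: column[field],
  }));
}
```

Keeping all three headers on one definition means a new column is added in a single place and every export format picks it up.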

Field Validation

Service: ExportService (backend/src/export/export.service.ts)

Selected fields are validated against allowed columns:

private validateSelectedFields(entityType: string, selectedFields: string[]): void {
  const validKeys = getValidKeysForEntity(entityType);
  const invalidFields = selectedFields.filter(f => !validKeys.includes(f));

  if (invalidFields.length > 0) {
    throw new BadRequestException(
      `Invalid fields for ${entityType}: ${invalidFields.join(", ")}. Valid fields: ${validKeys.join(", ")}`
    );
  }
}

Migration: Unified file_objects

Migration: 0158_unify_file_objects (infra/liquibase/changes/0158_unify_file_objects/)

This migration:

  1. Adds new columns to file_objects (file_name, file_url, is_image, updated_at, entity_type, entity_id)
  2. Migrates data from session_files to file_objects with entity_type='session'
  3. Migrates data from contact_files to file_objects with entity_type='contact'
  4. Creates compatibility views (session_files_view, contact_files_view)
  5. Categorizes existing files by R2 key prefix

Backward compatibility: The original session_files and contact_files tables are preserved. New code should use EntityFilesService which reads/writes to file_objects.


API Endpoints

Synchronous Export

GET /api/export?entity={entity}&format={format}&fields={fields}

Rate limit: 10 requests per minute

Response:

{
  "url": "https://r2.../exports/user123/sessions_2026-01-29.csv?...",
  "key": "exports/user123/sessions_2026-01-29.csv",
  "filename": "sessions-2026-01-29.csv",
  "expiration": 3600
}
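
A client might assemble the request URL as sketched below. The helper name is hypothetical, and the comma-separated encoding of `fields` is an assumption, since this document does not specify how multiple fields are passed.

```typescript
type ExportFormat = "csv" | "excel" | "pdf";

// Hypothetical client helper; `fields` is assumed to be a comma-separated list.
function buildExportUrl(
  entity: string,
  format: ExportFormat,
  fields: string[],
  asyncExport: boolean = false,
): string {
  const path = asyncExport ? "/api/export/async" : "/api/export";
  const query = [
    `entity=${encodeURIComponent(entity)}`,
    `format=${encodeURIComponent(format)}`,
    `fields=${encodeURIComponent(fields.join(","))}`,
  ].join("&");
  return `${path}?${query}`;
}
```

The same helper covers both endpoints, which share their query parameters and differ only in path and rate limit.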

Asynchronous Export

GET /api/export/async?entity={entity}&format={format}&fields={fields}

Rate limit: 5 requests per minute

Creates a BullMQ job and returns immediately. Result is delivered via WebSocket or can be polled.

Signed URL Generation

GET /api/storage/signed-url/:key?expirationSeconds={seconds}

Access control: Verified via StorageAccessService

Max expiration: 3600 seconds (1 hour)
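
Because R2 keys contain slashes and the key travels as a single path segment, a caller presumably has to percent-encode it; the `decodedKey` variable in the controller snippet earlier suggests the server decodes it again. The helper below is a sketch under that assumption, and its name is hypothetical.

```typescript
// Hypothetical client helper for GET /api/storage/signed-url/:key.
function buildSignedUrlPath(key: string, expirationSeconds?: number): string {
  // Encode the whole key so its "/" separators do not split the route parameter.
  const path = `/api/storage/signed-url/${encodeURIComponent(key)}`;
  if (expirationSeconds === undefined) {
    return path;
  }
  return `${path}?expirationSeconds=${expirationSeconds}`;
}
```

Requests for more than 3600 seconds can still be sent, but the documented cap means the server is expected to shorten them.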


Best Practices

  1. Always use TEMPORARY_FILE_PREFIXES for exports - Files will be auto-cleaned after 24h
  2. Use PERMANENT_FILE_PREFIXES for business documents - Quotes, invoices, contracts are never auto-deleted
  3. Record metrics - Use MetricsService for all export and cleanup operations
  4. Validate fields - Always validate selectedFields against allowed columns
  5. Use EntityFilesService - For new file operations, use the unified service instead of direct table access
  6. Handle timeouts - Cleanup jobs have 30-minute timeout protection