Aller au contenu principal

Database Cleanup Audit

This document is the active entry point for schema and data cleanup work.

It is intentionally short: the goal is to identify likely cleanup targets before any destructive migration or data purge.

Current Approach

  1. Identify legacy tables no longer used by application code.
  2. Separate:
    • schema candidates to drop,
    • data candidates to purge,
    • active tables that only need retention rules.
  3. Require a usage check before deletion:
    • backend queries,
    • frontend/API exposure,
    • operational scripts,
    • migration history and rollback expectations.

Confirmed Schema/Code Drift

These need cleanup or correction before any broad delete campaign.

Drift already cleaned

  • session_reminders
    • stale runtime reference removed from the insights rule
    • backend runtime no longer references session_reminders
    • table/type can now be treated as a true schema cleanup candidate instead of an active dependency
  • legacy workflow compatibility references
    • removed from backend/src/db/database.types.ts
    • runtime fallbacks removed from:
      • progression/helpers/badge-checker.helper.ts
      • session-info/session-client-info-prefill.service.ts
    • orphan service removed:
      • client-access/client-access-tasks.service.ts
    • this clears the remaining code dependency on:
      • user_workflows
      • user_workflow_phases
      • user_workflow_tasks
      • default_workflows
      • default_workflow_phases
      • default_workflow_tasks
      • session_workflow_tasks

This means the next cleanup pass should include:

  1. removing stale runtime references,
  2. aligning Kysely database types with the real schema,
  3. only then dropping confirmed-dead schema leftovers.

Initial Candidate Tables

These are the first tables to review because they look legacy or replaced by newer flows.

High-confidence review candidates

  • projects
  • project_tags
  • project_contacts
  • project_vendors
  • project_files
  • contact_projects

Rationale:

  • the product pivot is now sessions, not projects,
  • recent code search shows no active backend table queries (selectFrom, insertInto, updateTable, deleteFrom) for these legacy project tables in the main app code,
  • some remaining references are documentation, generated client types, or legacy E2E artifacts that need separate cleanup review.

Review carefully before touching

  • client_access_tokens
    • legacy /client UX was hard-cut, but the table still appears in active backend flows (client-access services, contracts/billing link generation, contacts merge)
  • migration_logs
    • active operational role in migration observability and sync with Liquibase
  • duplicates
  • user_duplicates
    • still active in backend, frontend, OpenAPI, navigation, and E2E
  • user_reminder_settings
    • still used by billing reminder services
  • session_workflow_action_runs
    • part of the workflow v2 runtime schema even if direct query usage is limited

Zero-usage or near-zero-usage review candidates

These currently appear to have no direct backend runtime references outside docs/types and should be reviewed table-by-table:

  • professional_type_profiles
  • organization_settings
  • progression_settings
  • gdpr_processing_activities

Important:

  • zero direct code references do not automatically mean safe to drop,
  • but they are strong review candidates for the next schema cleanup batch.

Current code verdict after strict scan:

  • professional_type_profiles
    • no direct runtime query found in backend/frontend
    • currently referenced only in DB types and documentation
  • organization_settings
    • no direct runtime query found in backend/frontend
    • apparent false positive: insights “organization settings” flows use insight_settings, not organization_settings
  • progression_settings
    • no direct runtime query found in backend/frontend
    • currently referenced only in DB types and documentation
  • gdpr_processing_activities
    • no direct runtime query found in backend/frontend
    • currently referenced only in DB types and documentation

Now strong schema cleanup candidates after code cleanup

  • user_workflows
  • user_workflow_phases
  • user_workflow_tasks
  • default_workflows
  • default_workflow_phases
  • default_workflow_tasks
  • session_workflow_tasks

Rationale:

  • no backend runtime references remain after the latest cleanup pass,
  • the compatibility types and fallback code have been removed,
  • these are now candidates for a dedicated Liquibase drop migration after a final DB-level existence check.

Data Cleanup Candidates

These are likely to need retention or purge rules rather than immediate table deletion.

  • expired tokens in auth-related tables
  • old webhook event logs
  • old scheduled email execution artifacts
  • worker/runtime tables with historical rows:
    • job_runs
    • outbox_events
    • file_objects
    • export_jobs
    • ai_tasks
  • old notification/event tracking rows:
    • email_tracking
    • audit_logs
    • migration_logs
  1. Export an authoritative inventory from the live database:
    • row counts per table
    • total relation size
    • last write timestamps when available
    • use infra/db-cleanup-audit.sql for a non-destructive first pass
  2. Fix schema/code drift first:
    • confirm the live schema still contains the legacy workflow/session reminder tables before dropping them
  3. Confirm no runtime references remain for the legacy project tables.
  4. Draft cleanup migrations in small batches:
    • batch 1: dead legacy tables
    • batch 2: retention policies and archival/purge jobs
  5. Validate backups and rollback before dropping anything in production.

Non-Goals

  • No deletion based only on documentation.
  • No dropping tables that still appear in code, OpenAPI, or operational workflows.
  • No data purge without a retention decision and a production backup path.

Audit Command

Run the inventory audit before any destructive migration:

psql "$DATABASE_URL" -f infra/db-cleanup-audit.sql

Expected outputs:

  • candidate table presence in public
  • approximate live row counts
  • total relation size
  • insert/update/delete counters from pg_stat_user_tables
  • max created_at / updated_at when those columns exist