orchard/docs/design/integrity-verification.md
Mondo Diaz b0d65f3509 Add integrity verification workflow design document
Define SHA256 checksum verification process for artifact downloads:
- Five verification modes: none, header, stream, pre, strict
- Failure detection for hash/size mismatch, S3 errors, truncation
- Retry mechanism with exponential backoff
- Quarantine process for strict mode failures
- Configuration options and client integration examples
2025-12-15 12:30:18 -06:00

Integrity Verification Workflow Design

This document defines the process for SHA256 checksum verification on artifact downloads, including failure handling and retry mechanisms.

Overview

Orchard uses content-addressable storage where the artifact ID is the SHA256 hash of the content. This design leverages that property to provide configurable integrity verification during downloads.

Current State

| Aspect | Status |
| --- | --- |
| Download streams content directly from S3 | Implemented |
| Artifact ID is the SHA256 hash | Implemented |
| S3 key derived from SHA256 hash | Implemented |
| Verification during download | Not implemented |
| Checksum headers in response | Not implemented |
| Retry mechanism on failure | Not implemented |
| Failure handling beyond S3 errors | Not implemented |

Verification Modes

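Clients select the verification mode per request with the query parameter ?verify=<mode>. When the parameter is absent, or when per-request override is disabled (ORCHARD_VERIFY_ALLOW_OVERRIDE is not true), the server-wide default from ORCHARD_VERIFY_MODE applies.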

| Mode | Performance | Integrity | Use Case |
| --- | --- | --- | --- |
| none | Fastest | Client-side | Trusted networks, high throughput |
| header | Fast | Client-side | Standard downloads, client verification |
| stream | Moderate | Post-hoc server | Logging/auditing, non-blocking |
| pre | Slower | Guaranteed | Critical downloads, untrusted storage |
| strict | Slower | Guaranteed + Alert | Security-sensitive, compliance |

Mode: None (Default)

Behavior:

  • Stream content directly from S3 with no server-side processing
  • Maximum download performance
  • Client is responsible for verification

Headers Returned:

X-Checksum-SHA256: <expected_hash>
Content-Length: <expected_size>

Flow:

Client Request → Lookup Artifact → Stream from S3 → Client

Mode: Header

Behavior:

  • Stream content directly from S3
  • Include comprehensive checksum headers
  • Client performs verification using headers

Headers Returned:

X-Checksum-SHA256: <expected_hash>
Content-Length: <expected_size>
Digest: sha-256=<base64_encoded_hash>
ETag: "<sha256_hash>"
X-Content-SHA256: <expected_hash>

Flow:

Client Request → Lookup Artifact → Add Headers → Stream from S3 → Client Verifies
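The header set above can be derived entirely from the artifact ID and size already stored for the artifact. The following is a minimal sketch, not the actual Orchard code: the function name checksum_headers is hypothetical, and it assumes the artifact ID is a lowercase hex SHA256 string. It mainly shows that the Digest value is the base64 encoding of the raw hash bytes, not of the hex string.

```python
import base64


def checksum_headers(sha256_hex: str, size: int) -> dict:
    """Build the header-mode response headers (hypothetical helper)."""
    # Digest carries base64 of the 32 raw hash bytes, so decode the hex first.
    digest_b64 = base64.b64encode(bytes.fromhex(sha256_hex)).decode("ascii")
    return {
        "X-Checksum-SHA256": sha256_hex,
        "Content-Length": str(size),
        "Digest": f"sha-256={digest_b64}",
        "ETag": f'"{sha256_hex}"',
        "X-Content-SHA256": sha256_hex,
    }
```

A client that prefers the Digest header can reverse the encoding with base64.b64decode(...).hex() and compare the result with its locally computed hex digest.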

Client Verification Example:

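# Download to a known filename, then verify against the checksum header
URL="https://orchard/project/foo/bar/+/v1.0.0"
curl -s -o downloaded_file "$URL"
# Header names are case-insensitive and lines end in CR, so match with -i and strip \r
EXPECTED=$(curl -sI "$URL" | grep -i '^X-Checksum-SHA256:' | tr -d '\r' | cut -d' ' -f2)
ACTUAL=$(sha256sum downloaded_file | cut -d' ' -f1)
[ "$EXPECTED" = "$ACTUAL" ] && echo "OK" || echo "MISMATCH"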

Mode: Stream (Post-Hoc Verification)

Behavior:

  • Wrap S3 stream with HashingStreamWrapper
  • Compute SHA256 incrementally while streaming to client
  • Verify hash after stream completes
  • Log verification result
  • Cannot reject content (already sent to client)

Headers Returned:

X-Checksum-SHA256: <expected_hash>
Content-Length: <expected_size>
X-Verify-Mode: stream
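Trailer: X-Verified, X-Computed-SHA256

Trailers (if the client supports them; HTTP/1.1 trailers require chunked transfer encoding, in which case Content-Length is not sent):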

X-Verified: true|false
X-Computed-SHA256: <computed_hash>

Flow:

Client Request → Lookup Artifact → Wrap Stream → Stream to Client
                                       ↓
                              Compute Hash Incrementally
                                       ↓
                              Verify After Complete → Log Result

Implementation:

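import hashlib
from typing import Callable, Iterable

class HashingStreamWrapper:
    """Pass chunks through unchanged while hashing them incrementally."""

    def __init__(self, stream: Iterable[bytes], expected_hash: str,
                 on_complete: Callable[[bool, str], None]):
        self.stream = stream
        self.hasher = hashlib.sha256()
        self.expected_hash = expected_hash
        self.on_complete = on_complete

    def __iter__(self):
        for chunk in self.stream:
            self.hasher.update(chunk)
            yield chunk
        # Reached only when the stream is fully consumed; a client that
        # disconnects early never triggers on_complete.
        computed = self.hasher.hexdigest()
        self.on_complete(computed == self.expected_hash, computed)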

Mode: Pre-Verify (Blocking)

Behavior:

  • Download entire content from S3 to memory/temp file
  • Compute SHA256 hash before sending to client
  • On match: stream verified content to client
  • On mismatch: retry from S3 (up to ORCHARD_VERIFY_MAX_RETRIES times)
  • If retries are exhausted: return a 500 error

Headers Returned:

X-Checksum-SHA256: <expected_hash>
Content-Length: <expected_size>
X-Verify-Mode: pre
X-Verified: true

Flow:

Client Request → Lookup Artifact → Download from S3 → Compute Hash
                                                          ↓
                                                    Hash Matches?
                                                    ↓           ↓
                                                   Yes          No
                                                    ↓           ↓
                                            Stream to Client   Retry?
                                                                ↓
                                                          Yes → Loop
                                                          No  → 500 Error

Memory Considerations:

  • For files < ORCHARD_VERIFY_MEMORY_LIMIT (default 100 MiB, i.e. 104857600 bytes): buffer in memory
  • For larger files: use temporary file with streaming hash computation
  • Cleanup temp files after response sent
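One way to get the memory-or-temp-file behavior described above is Python's tempfile.SpooledTemporaryFile, which keeps data in memory until it grows past max_size and then spills to disk. The sketch below is an illustration under assumptions: buffer_and_hash is a hypothetical name, and chunks stands in for the S3 body iterator. Note that the spill happens once the size exceeds the limit, which only approximates the "< limit" rule above.

```python
import hashlib
import tempfile
from typing import IO, Iterable, Tuple

# Mirrors the documented default of ORCHARD_VERIFY_MEMORY_LIMIT (100 MiB).
MEMORY_LIMIT = 100 * 1024 * 1024


def buffer_and_hash(chunks: Iterable[bytes],
                    memory_limit: int = MEMORY_LIMIT) -> Tuple[IO[bytes], str, int]:
    """Buffer the content and hash it in one pass (hypothetical helper).

    Returns (buffer rewound to the start, hex digest, byte count).
    The caller must close the buffer, which also removes any temp file.
    """
    hasher = hashlib.sha256()
    size = 0
    buf = tempfile.SpooledTemporaryFile(max_size=memory_limit)
    try:
        for chunk in chunks:
            hasher.update(chunk)
            buf.write(chunk)
            size += len(chunk)
    except Exception:
        buf.close()  # do not leak the buffer if the S3 read fails midway
        raise
    buf.seek(0)
    return buf, hasher.hexdigest(), size
```

Closing the returned buffer after the response has been sent covers the cleanup step, since the underlying temporary file is deleted on close.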

Mode: Strict

Behavior:

  • Same as pre-verify but with no retries
  • Fail immediately on any mismatch
  • Quarantine artifact on failure (mark as potentially corrupted)
  • Trigger alert/notification on failure
  • For security-critical downloads

Headers Returned (on success):

X-Checksum-SHA256: <expected_hash>
Content-Length: <expected_size>
X-Verify-Mode: strict
X-Verified: true

Error Response (on failure):

{
  "error": "integrity_verification_failed",
  "message": "Artifact content does not match expected checksum",
  "expected_hash": "<expected>",
  "computed_hash": "<computed>",
  "artifact_id": "<id>",
  "action_taken": "quarantined"
}

Quarantine Process:

  1. Mark artifact status = 'quarantined' in database
  2. Log security event to audit_logs
  3. Optionally notify via webhook/email
  4. Artifact becomes unavailable for download until resolved
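The quarantine steps can be sketched against a toy database. Everything about the schema here is an assumption made for illustration: the artifacts and audit_logs table layouts, the action name artifact.quarantined, and the helper names are hypothetical, and SQLite stands in for whatever database Orchard uses. Step 3 (webhook or email notification) is left out.

```python
import json
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Hypothetical minimal schema, just enough for the sketch."""
    conn.execute("CREATE TABLE artifacts (id TEXT PRIMARY KEY, status TEXT NOT NULL)")
    conn.execute("CREATE TABLE audit_logs (action TEXT, resource TEXT, details TEXT)")


def quarantine_artifact(conn: sqlite3.Connection, artifact_id: str,
                        expected_hash: str, computed_hash: str) -> dict:
    """Mark the artifact quarantined, log the event, and build the error body."""
    details = json.dumps({"expected_hash": expected_hash,
                          "computed_hash": computed_hash})
    with conn:  # both writes commit together, or both roll back
        conn.execute("UPDATE artifacts SET status = 'quarantined' WHERE id = ?",
                     (artifact_id,))
        conn.execute("INSERT INTO audit_logs (action, resource, details) VALUES (?, ?, ?)",
                     ("artifact.quarantined", artifact_id, details))
    return {
        "error": "integrity_verification_failed",
        "message": "Artifact content does not match expected checksum",
        "expected_hash": expected_hash,
        "computed_hash": computed_hash,
        "artifact_id": artifact_id,
        "action_taken": "quarantined",
    }


def is_downloadable(conn: sqlite3.Connection, artifact_id: str) -> bool:
    """Downloads are refused for unknown or quarantined artifacts."""
    row = conn.execute("SELECT status FROM artifacts WHERE id = ?",
                       (artifact_id,)).fetchone()
    return row is not None and row[0] != "quarantined"
```

Writing the status change and the audit record in one transaction avoids a state where an artifact is quarantined without a matching audit entry.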

Failure Detection

Failure Types

| Failure Type | Detection Method | Severity |
| --- | --- | --- |
| Hash mismatch | Computed SHA256 ≠ Expected | Critical |
| Size mismatch | Actual bytes ≠ Content-Length | High |
| S3 read error | boto3 exception | Medium |
| Truncated content | Stream ends early | High |
| S3 object missing | NoSuchKey error | Critical |
| ETag mismatch | S3 ETag ≠ expected | Medium |

Detection Implementation

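from dataclasses import dataclass
from typing import Optional

@dataclass
class VerificationResult:
    """Outcome of one verification, including any retries performed."""
    success: bool
    failure_type: Optional[str]  # hash_mismatch, size_mismatch, etc.
    expected_hash: str
    computed_hash: Optional[str]  # None if content could not be read
    expected_size: int
    actual_size: Optional[int]
    error_message: Optional[str]
    retry_count: int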
Retry Mechanism

Configuration

| Environment Variable | Default | Description |
| --- | --- | --- |
| ORCHARD_VERIFY_MAX_RETRIES | 3 | Maximum retry attempts |
| ORCHARD_VERIFY_RETRY_DELAY_MS | 100 | Base delay between retries (ms) |
| ORCHARD_VERIFY_RETRY_BACKOFF | 2.0 | Exponential backoff multiplier |
| ORCHARD_VERIFY_RETRY_MAX_DELAY_MS | 5000 | Maximum delay cap (ms) |

Backoff Formula

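delay = min(base_delay * (backoff ^ attempt), max_delay)

Here ^ denotes exponentiation and attempt is the zero-based index of the attempt that just failed, so the first retry waits exactly base_delay.

Example with defaults:

  • After attempt 1 fails (attempt = 0): 100 ms
  • After attempt 2 fails (attempt = 1): 200 ms
  • After attempt 3 fails (attempt = 2): 400 ms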
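As a concrete reading of the formula, here is a minimal sketch of a calculate_backoff helper. The parameter names are assumptions that mirror the environment variables above, and attempt is taken to be zero-based. In Python the exponent is written **, since ^ is bitwise XOR.

```python
def calculate_backoff(attempt: int,
                      base_delay_ms: float = 100,
                      backoff: float = 2.0,
                      max_delay_ms: float = 5000) -> float:
    """Delay in milliseconds before the retry that follows `attempt`.

    `attempt` is zero-based: 0 is the first failed attempt.
    """
    return min(base_delay_ms * (backoff ** attempt), max_delay_ms)


if __name__ == "__main__":
    for attempt in range(8):
        print(attempt, calculate_backoff(attempt))
```

With the defaults, the cap takes effect from attempt 6 onward, where the uncapped value (6400 ms) would exceed ORCHARD_VERIFY_RETRY_MAX_DELAY_MS.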

Retry Flow

import asyncio
import logging

log = logging.getLogger(__name__)

class IntegrityError(Exception):
    """Content still fails verification after all retries."""

# fetch_from_s3, compute_sha256, calculate_backoff, and S3Error are
# assumed to be provided elsewhere in the service.
async def download_with_retry(artifact, max_retries=3):
    for attempt in range(max_retries + 1):
        try:
            content = await fetch_from_s3(artifact.s3_key)
        except S3Error:
            # S3 failure: re-raise once retries are exhausted
            if attempt >= max_retries:
                raise
        else:
            computed_hash = compute_sha256(content)
            if computed_hash == artifact.id:
                return content  # Success

            # Hash mismatch
            log.warning("Verification failed, attempt %d/%d",
                        attempt + 1, max_retries + 1)
            if attempt >= max_retries:
                raise IntegrityError("Max retries exceeded")

        # Back off before the next attempt (delay is in milliseconds)
        await asyncio.sleep(calculate_backoff(attempt) / 1000)

Retryable vs Non-Retryable Failures

Retryable:

  • S3 read timeout
  • S3 connection error
  • Hash mismatch (may be transient S3 issue)
  • Truncated content

Non-Retryable:

  • S3 object not found (404)
  • S3 access denied (403)
  • Artifact not in database
  • Strict mode failures
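The two lists above can be captured as a small decision function. This is a sketch only: the failure-type strings and the name should_retry are hypothetical labels chosen for illustration, not identifiers from the Orchard codebase.

```python
# Hypothetical failure-type labels for the retryable cases listed above.
RETRYABLE = frozenset({
    "s3_timeout",
    "s3_connection_error",
    "hash_mismatch",      # may be a transient S3 issue
    "truncated_content",
})


def should_retry(failure_type: str, mode: str,
                 attempt: int, max_retries: int = 3) -> bool:
    """Decide whether another attempt is allowed.

    `attempt` is the zero-based index of the attempt that just failed.
    """
    if mode == "strict":
        return False  # strict mode fails immediately, whatever the cause
    if failure_type not in RETRYABLE:
        return False  # e.g. not found, access denied, artifact not in database
    return attempt < max_retries
```

Treating every unlisted failure type as non-retryable keeps the default conservative: a new failure type has to be added to the retryable set explicitly before it triggers extra S3 reads.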

Configuration Reference

Environment Variables

# Verification mode (none, header, stream, pre, strict)
ORCHARD_VERIFY_MODE=none

# Retry settings
ORCHARD_VERIFY_MAX_RETRIES=3
ORCHARD_VERIFY_RETRY_DELAY_MS=100
ORCHARD_VERIFY_RETRY_BACKOFF=2.0
ORCHARD_VERIFY_RETRY_MAX_DELAY_MS=5000

# Memory limit for pre-verify buffering (bytes)
ORCHARD_VERIFY_MEMORY_LIMIT=104857600  # 100 MiB

# Strict mode settings
ORCHARD_VERIFY_QUARANTINE_ON_FAILURE=true
ORCHARD_VERIFY_ALERT_WEBHOOK=https://alerts.example.com/webhook

# Allow per-request mode override
ORCHARD_VERIFY_ALLOW_OVERRIDE=true

Per-Request Override

When ORCHARD_VERIFY_ALLOW_OVERRIDE=true, clients can specify verification mode:

GET /api/v1/project/foo/bar/+/v1.0.0?verify=pre
GET /api/v1/project/foo/bar/+/v1.0.0?verify=none
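How the query parameter and the server-wide settings interact can be sketched as a small resolver. Several details here are assumptions, since the design does not specify them: the function name resolve_verify_mode, treating an unset ORCHARD_VERIFY_ALLOW_OVERRIDE as true, silently falling back to the default when override is disabled, and raising ValueError for an unknown mode (a server would likely turn that into a 400 response).

```python
import os
from typing import Mapping, Optional

VALID_MODES = ("none", "header", "stream", "pre", "strict")


def resolve_verify_mode(query_value: Optional[str],
                        env: Mapping[str, str] = os.environ) -> str:
    """Pick the effective verification mode for one request."""
    default = env.get("ORCHARD_VERIFY_MODE", "none")
    if default not in VALID_MODES:
        raise ValueError(f"invalid ORCHARD_VERIFY_MODE: {default!r}")

    # Assumption: override is allowed unless explicitly disabled.
    allow_override = env.get("ORCHARD_VERIFY_ALLOW_OVERRIDE", "true").lower() == "true"
    if query_value is None or not allow_override:
        return default

    if query_value not in VALID_MODES:
        raise ValueError(f"invalid verify mode: {query_value!r}")
    return query_value
```

Passing the environment as a mapping keeps the function easy to test without touching the real process environment.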

API Changes

Download Endpoint

Request:

GET /api/v1/project/{project}/{package}/+/{ref}?verify={mode}

New Query Parameters:

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| verify | string | from config | Verification mode |

New Response Headers:

| Header | Description |
| --- | --- |
| X-Checksum-SHA256 | Expected SHA256 hash |
| X-Verify-Mode | Active verification mode |
| X-Verified | true if server verified content |
| Digest | RFC 3230 digest header |

New Endpoint: Verify Artifact

Request:

POST /api/v1/project/{project}/{package}/+/{ref}/verify

Response:

{
  "artifact_id": "abc123...",
  "verified": true,
  "expected_hash": "abc123...",
  "computed_hash": "abc123...",
  "size_match": true,
  "expected_size": 1048576,
  "actual_size": 1048576,
  "verification_time_ms": 45
}

Logging and Monitoring

Log Events

| Event | Level | When |
| --- | --- | --- |
| verification.success | INFO | Hash verified successfully |
| verification.failure | ERROR | Hash mismatch detected |
| verification.retry | WARN | Retry attempt initiated |
| verification.quarantine | ERROR | Artifact quarantined |
| verification.skip | DEBUG | Verification skipped (mode=none) |

Metrics

| Metric | Type | Description |
| --- | --- | --- |
| orchard_verification_total | Counter | Total verification attempts |
| orchard_verification_failures | Counter | Failed verifications |
| orchard_verification_retries | Counter | Retry attempts |
| orchard_verification_duration_ms | Histogram | Verification time |

Audit Log Entry

{
  "action": "artifact.download.verified",
  "resource": "project/foo/package/bar/artifact/abc123",
  "user_id": "user@example.com",
  "details": {
    "verification_mode": "pre",
    "verified": true,
    "retry_count": 0,
    "duration_ms": 45
  }
}

Security Considerations

  1. Strict Mode for Sensitive Data: Use strict mode for artifacts containing credentials, certificates, or security-critical code.

  2. Quarantine Isolation: Quarantined artifacts should be moved to a separate S3 prefix or bucket for forensic analysis.

  3. Alert on Repeated Failures: Multiple verification failures for the same artifact may indicate storage corruption or tampering.

  4. Audit Trail: All verification events should be logged for compliance and forensic purposes.

  5. Client Trust: In none and header modes, clients must implement their own verification for security guarantees.

Implementation Phases

Phase 1: Headers Only

  • Add X-Checksum-SHA256 header to all downloads
  • Add verify=header mode support
  • Add configuration options

Phase 2: Stream Verification

  • Implement HashingStreamWrapper
  • Add verify=stream mode
  • Add verification logging

Phase 3: Pre-Verification

  • Implement buffered verification
  • Add retry mechanism
  • Add verify=pre mode

Phase 4: Strict Mode

  • Implement quarantine mechanism
  • Add alerting integration
  • Add verify=strict mode

Client Integration Examples

curl with Verification

#!/bin/bash
URL="https://orchard.example.com/api/v1/project/myproject/mypackage/+/v1.0.0"

# Get expected hash from headers
EXPECTED=$(curl -sI "$URL" | grep -i "X-Checksum-SHA256" | tr -d '\r' | cut -d' ' -f2)

# Download file
curl -sO "$URL"
FILENAME=$(basename "$URL")

# Verify
ACTUAL=$(sha256sum "$FILENAME" | cut -d' ' -f1)

if [ "$EXPECTED" = "$ACTUAL" ]; then
    echo "✓ Verification passed"
else
    echo "✗ Verification FAILED"
    echo "  Expected: $EXPECTED"
    echo "  Actual:   $ACTUAL"
    exit 1
fi

Python Client

import hashlib
import requests

def download_verified(url: str) -> bytes:
    # Get the expected checksum and size from the headers first
    head = requests.head(url, allow_redirects=True, timeout=30)
    head.raise_for_status()
    expected_hash = head.headers.get('X-Checksum-SHA256')
    if expected_hash is None:
        raise ValueError("Server did not return X-Checksum-SHA256")
    expected_size = int(head.headers.get('Content-Length', 0))

    # Download content
    response = requests.get(url, timeout=30)
    response.raise_for_status()
    content = response.content

    # Verify size
    if len(content) != expected_size:
        raise ValueError(f"Size mismatch: {len(content)} != {expected_size}")

    # Verify hash
    actual_hash = hashlib.sha256(content).hexdigest()
    if actual_hash != expected_hash:
        raise ValueError(f"Hash mismatch: {actual_hash} != {expected_hash}")

    return content

Server-Side Verification

# Force server to verify before sending
curl -O "https://orchard.example.com/api/v1/project/myproject/mypackage/+/v1.0.0?verify=pre"

# Check if verification was performed
curl -sI "https://orchard.example.com/api/v1/project/myproject/mypackage/+/v1.0.0?verify=pre" | grep -i X-Verified
# X-Verified: true