# Integrity Verification

Orchard uses content-addressable storage with SHA256 hashing to ensure artifact integrity. This document describes how integrity verification works and how to use it.

## How It Works

### Content-Addressable Storage

Orchard stores artifacts using their SHA256 hash as the unique identifier. This provides several benefits:

1. **Automatic deduplication**: Identical content is stored only once
2. **Built-in integrity**: The artifact ID *is* the content hash
3. **Tamper detection**: Any modification changes the hash, making corruption detectable

When you upload a file:

1. Orchard computes the SHA256 hash of the content
2. The hash becomes the artifact ID (64-character hex string)
3. The file is stored in S3 at `fruits/{hash[0:2]}/{hash[2:4]}/{hash}`
4. The hash and metadata are recorded in the database

### Hash Format

- Algorithm: SHA256
- Format: 64-character lowercase hexadecimal string
- Example: `dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f`

## Client-Side Verification

### Before Upload

Compute the hash locally before uploading so you can verify that the server received your content correctly:

```python
import hashlib

import requests

def compute_sha256(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()

# Compute hash before upload
with open("myfile.tar.gz", "rb") as f:
    content = f.read()
local_hash = compute_sha256(content)

# Upload the file
response = requests.post(
    f"{base_url}/api/v1/project/{project}/{package}/upload",
    files={"file": ("myfile.tar.gz", content)},
)
result = response.json()

# Verify the server computed the same hash
assert result["artifact_id"] == local_hash, "Hash mismatch!"
```

### Providing Expected Hash on Upload

You can provide the expected hash in the upload request. The server will reject the upload if the computed hash doesn't match:

```python
response = requests.post(
    f"{base_url}/api/v1/project/{project}/{package}/upload",
    files={"file": ("myfile.tar.gz", content)},
    headers={"X-Checksum-SHA256": local_hash},
)

# Returns 422 if the hash doesn't match
if response.status_code == 422:
    print("Checksum mismatch - upload rejected")
```

### After Download

Verify downloaded content matches the expected hash using response headers:

```python
response = requests.get(
    f"{base_url}/api/v1/project/{project}/{package}/+/{tag}",
    params={"mode": "proxy"},
)

# Get expected hash from header
expected_hash = response.headers.get("X-Checksum-SHA256")

# Compute hash of downloaded content
actual_hash = compute_sha256(response.content)

# Verify
if actual_hash != expected_hash:
    raise Exception(f"Integrity check failed! Expected {expected_hash}, got {actual_hash}")
```
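
For large artifacts, loading the whole response into memory just to hash it can be wasteful. Below is a minimal sketch of an incremental alternative, assuming the same proxy-mode endpoint and `X-Checksum-SHA256` header shown above; the 1 MiB chunk size is arbitrary:

```python
import hashlib

import requests

url = f"{base_url}/api/v1/project/{project}/{package}/+/{tag}"
hasher = hashlib.sha256()

# Stream the download, hashing each chunk as it arrives
with requests.get(url, params={"mode": "proxy"}, stream=True) as response:
    response.raise_for_status()
    expected_hash = response.headers.get("X-Checksum-SHA256")
    with open("myfile.tar.gz", "wb") as out:
        for chunk in response.iter_content(chunk_size=1024 * 1024):
            hasher.update(chunk)
            out.write(chunk)

if hasher.hexdigest() != expected_hash:
    raise Exception(f"Integrity check failed! Expected {expected_hash}, got {hasher.hexdigest()}")
```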

### Response Headers for Verification

Download responses include multiple headers for verification:

| Header | Format | Description |
|--------|--------|-------------|
| `X-Checksum-SHA256` | Hex string | SHA256 hash (64 chars) |
| `ETag` | `"<sha256>"` | SHA256 hash in quotes |
| `Digest` | `sha-256=<base64>` | RFC 3230 format (base64-encoded) |
| `Content-Length` | Integer | File size in bytes |

### Server-Side Verification on Download

Request server-side verification during download:

```bash
# Pre-verification: Server verifies before streaming (returns 500 if corrupt)
curl "${base_url}/api/v1/project/${project}/${package}/+/${tag}?mode=proxy&verify=true&verify_mode=pre"

# Stream verification: Server verifies while streaming (logs error if corrupt)
curl "${base_url}/api/v1/project/${project}/${package}/+/${tag}?mode=proxy&verify=true&verify_mode=stream"
```

The `X-Verified` header indicates whether server-side verification was performed:

- `X-Verified: true` - Content was verified by the server

## Server-Side Consistency Check

### Consistency Check Endpoint

Administrators can run a consistency check to verify all stored artifacts:

```bash
curl "${base_url}/api/v1/admin/consistency-check"
```

Response:

```json
{
  "total_artifacts_checked": 1234,
  "healthy": true,
  "orphaned_s3_objects": 0,
  "missing_s3_objects": 0,
  "size_mismatches": 0,
  "orphaned_s3_keys": [],
  "missing_s3_keys": [],
  "size_mismatch_artifacts": []
}
```

### What the Check Verifies

1. **Missing S3 objects**: Database records with no corresponding S3 object
2. **Orphaned S3 objects**: S3 objects with no database record
3. **Size mismatches**: S3 object size doesn't match the database record

### Running Consistency Checks

**Manual check:**

```bash
# Check all artifacts
curl "${base_url}/api/v1/admin/consistency-check"

# Limit results (for large deployments)
curl "${base_url}/api/v1/admin/consistency-check?limit=100"
```

**Scheduled checks (recommended):**

Set up a cron job or Kubernetes CronJob to run periodic checks:

```yaml
# Kubernetes CronJob example
apiVersion: batch/v1
kind: CronJob
metadata:
  name: orchard-consistency-check
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: check
              # NOTE: the stock curlimages/curl image does not ship jq;
              # use an image that provides both curl and jq
              image: curlimages/curl
              command:
                - /bin/sh
                - -c
                - |
                  response=$(curl -s "${ORCHARD_URL}/api/v1/admin/consistency-check")
                  healthy=$(echo "$response" | jq -r '.healthy')
                  if [ "$healthy" != "true" ]; then
                    echo "ALERT: Consistency check failed!"
                    echo "$response"
                    exit 1
                  fi
                  echo "Consistency check passed"
          restartPolicy: OnFailure
```

## Recovery Procedures

### Corrupted Artifact (Size Mismatch)

If the consistency check reports size mismatches:

1. **Identify affected artifacts:**

   ```bash
   curl "${base_url}/api/v1/admin/consistency-check" | jq '.size_mismatch_artifacts'
   ```

2. **Check whether the artifact can be re-uploaded:**
   - If the original content is available, delete the corrupted artifact and re-upload
   - The same content will produce the same artifact ID

3. **If the original content is lost:**
   - The artifact data is corrupted and cannot be recovered
   - Delete the artifact record and notify affected users
   - Consider restoring from backup if available

### Missing S3 Object

If database records exist but S3 objects are missing:

1. **Identify affected artifacts:**

   ```bash
   curl "${base_url}/api/v1/admin/consistency-check" | jq '.missing_s3_keys'
   ```

2. **Check the S3 bucket:**
   - Verify the S3 bucket exists and is accessible
   - Check S3 access logs for deletion events
   - Check whether objects were moved or lifecycle-deleted

3. **Recovery options:**
   - Restore from S3 versioning (if enabled)
   - Restore from backup
   - Re-upload the original content (if available)
   - Delete the orphaned database records

### Orphaned S3 Objects

If S3 objects exist without database records:

1. **Identify orphaned objects:**

   ```bash
   curl "${base_url}/api/v1/admin/consistency-check" | jq '.orphaned_s3_keys'
   ```

2. **Investigate the cause:**
   - Was the upload interrupted before the database commit?
   - Was the database record deleted while S3 cleanup failed?

3. **Resolution:**
   - If the content is needed, create the database record manually
   - If the content is not needed, delete the S3 object to reclaim storage (see the sketch below)
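
If you script the cleanup, drive it from the `orphaned_s3_keys` list in the consistency-check response. A hedged sketch using boto3 follows; `ORCHARD_URL` and the `orchard-artifacts` bucket name are stand-ins for your deployment's values, and you should review each key before deleting, since deletion is irreversible unless S3 versioning is enabled:

```python
import boto3
import requests

ORCHARD_URL = "https://orchard.example.com"  # stand-in for your deployment URL
BUCKET = "orchard-artifacts"                 # stand-in for your S3 bucket

# Pull the orphaned keys from the consistency-check report
report = requests.get(f"{ORCHARD_URL}/api/v1/admin/consistency-check").json()

s3 = boto3.client("s3")
for key in report["orphaned_s3_keys"]:
    # Irreversible unless S3 versioning is enabled -- review each key first
    print(f"Deleting orphaned object: {key}")
    s3.delete_object(Bucket=BUCKET, Key=key)
```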

### Preventive Measures

1. **Enable S3 versioning** to recover from accidental deletions
2. **Take regular backups** of both the database and the S3 bucket
3. **Schedule consistency checks** to detect issues early
4. **Monitor and alert** on consistency check failures
5. **Enable audit logging** to track all artifact operations

## Verification in CI/CD

### Verifying Artifacts in Pipelines

```bash
#!/bin/bash
# Download and verify an artifact in a CI pipeline
ARTIFACT_URL="${ORCHARD_URL}/api/v1/project/${PROJECT}/${PACKAGE}/+/${TAG}"

# Download the artifact, capturing the response headers
response=$(curl -s -D - "${ARTIFACT_URL}?mode=proxy" -o artifact.tar.gz)
expected_hash=$(echo "$response" | grep -i "X-Checksum-SHA256" | cut -d: -f2 | tr -d ' \r')

# Compute the actual hash
actual_hash=$(sha256sum artifact.tar.gz | cut -d' ' -f1)

# Verify
if [ "$actual_hash" != "$expected_hash" ]; then
    echo "ERROR: Integrity check failed!"
    echo "Expected: $expected_hash"
    echo "Actual:   $actual_hash"
    exit 1
fi

echo "Integrity verified: $actual_hash"
```

### Using Server-Side Verification

For critical deployments, use server-side pre-verification:

```bash
# Server verifies before streaming - returns 500 if corrupt
curl -f "${ARTIFACT_URL}?mode=proxy&verify=true&verify_mode=pre" -o artifact.tar.gz
```

This ensures the artifact is verified before any bytes are streamed to your pipeline.
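
The same defense-in-depth flow can be expressed in Python, combining server-side pre-verification with a client-side re-check. This is a minimal sketch assuming the endpoint, query parameters, and headers documented above:

```python
import hashlib

import requests

url = f"{base_url}/api/v1/project/{project}/{package}/+/{tag}"

# Ask the server to verify the artifact before streaming any bytes
response = requests.get(
    url,
    params={"mode": "proxy", "verify": "true", "verify_mode": "pre"},
)
response.raise_for_status()  # a corrupt artifact surfaces as HTTP 500

# Confirm the server actually performed the verification
assert response.headers.get("X-Verified") == "true"

# Belt and braces: re-check the hash client-side
expected_hash = response.headers.get("X-Checksum-SHA256")
actual_hash = hashlib.sha256(response.content).hexdigest()
assert actual_hash == expected_hash, f"Expected {expected_hash}, got {actual_hash}"

with open("artifact.tar.gz", "wb") as out:
    out.write(response.content)
```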