Merge branch 'feature/presigned-url-downloads' into 'main'

Add presigned URL support for direct S3 downloads (#48)

Closes #48

See merge request esv/bsf/bsf-integration/orchard/orchard-mvp!17
Add presigned URL support for direct S3 downloads (#48)
2025-12-15 16:06:51 -06:00 · 2025-12-15 16:06:51 -06:00 · 2025-12-15 14:47:31 -06:00 · 2025-12-15 14:47:30 -06:00 · 2025-12-15 14:00:32 -06:00 · 2025-12-15 14:00:32 -06:00
11 changed files with 844 additions and 43 deletions
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -7,12 +7,23 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0

 ## [Unreleased]
 ### Added
+- Added presigned URL support for direct S3 downloads (#48)
+- Added `ORCHARD_DOWNLOAD_MODE` config option (`presigned`, `redirect`, `proxy`) (#48)
+- Added `ORCHARD_PRESIGNED_URL_EXPIRY` config option (default: 3600 seconds) (#48)
+- Added `?mode=` query parameter to override download mode per-request (#48)
+- Added `/api/v1/project/{project}/{package}/+/{ref}/url` endpoint for getting presigned URLs (#48)
+- Added `PresignedUrlResponse` schema with URL, expiry, checksums, and artifact metadata (#48)
+- Added MinIO ingress support in Helm chart for presigned URL access (#48)
+- Added `orchard.download.mode` and `orchard.download.presignedUrlExpiry` Helm values (#48)
+- Added integrity verification workflow design document (#24)
 - Added `sha256` field to API responses for clarity (alias of `id`) (#25)
 - Added `checksum_sha1` field to artifacts table for compatibility (#25)
 - Added `s3_etag` field to artifacts table for S3 verification (#25)
 - Compute and store MD5, SHA1, and S3 ETag alongside SHA256 during upload (#25)
 - Added `Dockerfile.local` and `docker-compose.local.yml` for local development (#25)
 - Added migration script `003_checksum_fields.sql` for existing databases (#25)
+### Changed
+- Changed default download mode from `proxy` to `presigned` for better performance (#48)

 ## [0.2.0] - 2025-12-15
 ### Changed
--- a/README.md
+++ b/README.md
@@ -60,7 +60,8 @@ Orchard is a centralized binary artifact storage system that provides content-ad
 | `GET` | `/api/v1/project/:project/packages/:package` | Get single package with metadata |
 | `POST` | `/api/v1/project/:project/packages` | Create a new package |
 | `POST` | `/api/v1/project/:project/:package/upload` | Upload an artifact |
-| `GET` | `/api/v1/project/:project/:package/+/:ref` | Download an artifact (supports Range header) |
+| `GET` | `/api/v1/project/:project/:package/+/:ref` | Download an artifact (supports `Range` header and `?mode=` parameter) |
+| `GET` | `/api/v1/project/:project/:package/+/:ref/url` | Get presigned URL for direct S3 download |
 | `HEAD` | `/api/v1/project/:project/:package/+/:ref` | Get artifact metadata without downloading |
 | `GET` | `/api/v1/project/:project/:package/tags` | List tags (with pagination, search, sorting, artifact metadata) |
 | `POST` | `/api/v1/project/:project/:package/tags` | Create a tag |
@@ -292,6 +293,12 @@ curl -H "Range: bytes=0-1023" http://localhost:8080/api/v1/project/my-project/re

 # Check file info without downloading (HEAD request)
 curl -I http://localhost:8080/api/v1/project/my-project/releases/+/v1.0.0
+
+# Download with specific mode (presigned, redirect, or proxy)
+curl "http://localhost:8080/api/v1/project/my-project/releases/+/v1.0.0?mode=proxy"
+
+# Get presigned URL for direct S3 download
+curl http://localhost:8080/api/v1/project/my-project/releases/+/v1.0.0/url
 ```

 > **Note on curl flags:**
@@ -300,6 +307,33 @@ curl -I http://localhost:8080/api/v1/project/my-project/releases/+/v1.0.0
 > - `-OJ` combines both: download to a file using the server-provided filename
 > - `-o <filename>` saves to a specific filename you choose

+#### Download Modes
+
+Orchard supports three download modes, configurable via `ORCHARD_DOWNLOAD_MODE` or per-request with `?mode=`:
+
+| Mode | Description | Use Case |
+|------|-------------|----------|
+| `presigned` (default) | Returns JSON with a presigned S3 URL | Clients that fetch the URL themselves, web UIs |
+| `redirect` | Returns HTTP 302 redirect to presigned S3 URL | Simple clients, browsers, wget |
+| `proxy` | Streams content through the backend | When S3 isn't directly accessible to clients |
+
+**Presigned URL Response:**
+```json
+{
+  "url": "https://minio.example.com/bucket/...",
+  "expires_at": "2025-01-01T01:00:00Z",
+  "method": "GET",
+  "artifact_id": "a3f5d8e...",
+  "size": 1048576,
+  "content_type": "application/gzip",
+  "original_name": "app-v1.0.0.tar.gz",
+  "checksum_sha256": "a3f5d8e...",
+  "checksum_md5": "d41d8cd..."
+}
+```
+
+> **Note:** For presigned URLs to work, clients must be able to reach the S3 endpoint directly. In Kubernetes, this requires exposing MinIO via ingress (see Helm configuration below).
+
 ### Create a Tag

 ```bash
@@ -485,6 +519,8 @@ Configuration is provided via environment variables prefixed with `ORCHARD_`:
 | `ORCHARD_S3_BUCKET` | S3 bucket name | `orchard-artifacts` |
 | `ORCHARD_S3_ACCESS_KEY_ID` | S3 access key | - |
 | `ORCHARD_S3_SECRET_ACCESS_KEY` | S3 secret key | - |
+| `ORCHARD_DOWNLOAD_MODE` | Download mode: `presigned`, `redirect`, or `proxy` | `presigned` |
+| `ORCHARD_PRESIGNED_URL_EXPIRY` | Presigned URL expiry in seconds | `3600` |

 ## Kubernetes Deployment

@@ -505,6 +541,32 @@ helm install orchard ./helm/orchard -n orchard --create-namespace
 helm install orchard ./helm/orchard -f my-values.yaml
 ```

+### Helm Configuration
+
+Key configuration options in `values.yaml`:
+
+```yaml
+orchard:
+  # Download configuration
+  download:
+    mode: "presigned"       # presigned, redirect, or proxy
+    presignedUrlExpiry: 3600
+
+# MinIO ingress (required for presigned URL downloads)
+minio:
+  ingress:
+    enabled: true
+    className: "nginx"
+    annotations:
+      cert-manager.io/cluster-issuer: "letsencrypt"
+    host: "minio.your-domain.com"
+    tls:
+      enabled: true
+      secretName: minio-tls
+```
+
+When `minio.ingress.enabled` is `true` and `minio.ingress.host` is set, the S3 endpoint uses the external URL (`https://minio.your-domain.com` when TLS is enabled, `http://` otherwise), making presigned URLs accessible to external clients.
+
 See `helm/orchard/values.yaml` for all configuration options.

 ## Database Schema
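
For the `presigned` download mode documented in the README changes above, a client might consume the JSON descriptor roughly as follows. This is a minimal sketch using only the Python standard library. The helper names (`parse_expiry`, `matches_sha256`, `download_presigned`) are illustrative and not part of this merge request, and the sketch assumes the response fields match the `PresignedUrlResponse` example.

```python
import hashlib
import json
import urllib.request
from datetime import datetime, timezone


def parse_expiry(expires_at: str) -> datetime:
    """Parse the ISO 8601 `expires_at` field; a trailing 'Z' means UTC."""
    parsed = datetime.fromisoformat(expires_at.replace("Z", "+00:00"))
    # Assumption: a timestamp without an offset is treated as UTC.
    return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)


def matches_sha256(data: bytes, expected_hex: str) -> bool:
    """Compare the SHA256 of the downloaded bytes with `checksum_sha256`."""
    return hashlib.sha256(data).hexdigest() == expected_hex.lower()


def download_presigned(api_url: str) -> bytes:
    """Fetch the JSON descriptor, then the object itself, and verify it."""
    with urllib.request.urlopen(api_url) as resp:
        info = json.load(resp)
    if parse_expiry(info["expires_at"]) <= datetime.now(timezone.utc):
        raise RuntimeError("presigned URL has already expired")
    # The second request goes straight to S3/MinIO, not through Orchard.
    with urllib.request.urlopen(info["url"]) as resp:
        data = resp.read()
    expected = info.get("checksum_sha256")
    if expected and not matches_sha256(data, expected):
        raise ValueError("SHA256 mismatch for downloaded artifact")
    return data
```

Calling `download_presigned("http://localhost:8080/api/v1/project/my-project/releases/+/v1.0.0/url")` would exercise the dedicated `/url` endpoint. The whole object is read into memory, so a real client would likely stream large artifacts to disk instead.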
--- a/backend/app/config.py
+++ b/backend/app/config.py
@@ -32,6 +32,10 @@ class Settings(BaseSettings):
    s3_secret_access_key: str = ""
    s3_use_path_style: bool = True

+    # Download settings
+    download_mode: str = "presigned"  # "presigned", "redirect", or "proxy"
+    presigned_url_expiry: int = 3600  # Presigned URL expiry in seconds (default: 1 hour)
+
    @property
    def database_url(self) -> str:
        sslmode = f"?sslmode={self.database_sslmode}" if self.database_sslmode else ""
--- a/backend/app/routes.py
+++ b/backend/app/routes.py
@@ -1,9 +1,9 @@
-from datetime import datetime
+from datetime import datetime, timedelta, timezone
 from fastapi import APIRouter, Depends, HTTPException, UploadFile, File, Form, Request, Query, Header, Response
-from fastapi.responses import StreamingResponse
+from fastapi.responses import StreamingResponse, RedirectResponse
 from sqlalchemy.orm import Session
 from sqlalchemy import or_, func
-from typing import List, Optional
+from typing import List, Optional, Literal
 import math
 import re
 import io
@@ -29,8 +29,10 @@ from .schemas import (
    ResumableUploadCompleteResponse,
    ResumableUploadStatusResponse,
    GlobalSearchResponse, SearchResultProject, SearchResultPackage, SearchResultArtifact,
+    PresignedUrlResponse,
 )
 from .metadata import extract_metadata
+from .config import get_settings

 router = APIRouter()

@@ -844,27 +846,13 @@ def get_upload_status(
        raise HTTPException(status_code=404, detail=str(e))


-# Download artifact with range request support
-@router.get("/api/v1/project/{project_name}/{package_name}/+/{ref}")
-def download_artifact(
-    project_name: str,
-    package_name: str,
+# Helper function to resolve artifact reference
+def _resolve_artifact_ref(
    ref: str,
-    request: Request,
-    db: Session = Depends(get_db),
-    storage: S3Storage = Depends(get_storage),
-    range: Optional[str] = Header(None),
-):
-    # Get project and package
-    project = db.query(Project).filter(Project.name == project_name).first()
-    if not project:
-        raise HTTPException(status_code=404, detail="Project not found")
-
-    package = db.query(Package).filter(Package.project_id == project.id, Package.name == package_name).first()
-    if not package:
-        raise HTTPException(status_code=404, detail="Package not found")
-
-    # Resolve reference to artifact
+    package: Package,
+    db: Session,
+) -> Optional[Artifact]:
+    """Resolve a reference (tag name, artifact:<hash>, tag:<name>, version:<name>, or bare artifact ID) to an artifact."""
    artifact = None

    # Check for explicit prefixes
@@ -885,11 +873,76 @@ def download_artifact(
            # Try as direct artifact ID
            artifact = db.query(Artifact).filter(Artifact.id == ref).first()

+    return artifact
+
+
+# Download artifact with range request support and download modes
+@router.get("/api/v1/project/{project_name}/{package_name}/+/{ref}")
+def download_artifact(
+    project_name: str,
+    package_name: str,
+    ref: str,
+    request: Request,
+    db: Session = Depends(get_db),
+    storage: S3Storage = Depends(get_storage),
+    range: Optional[str] = Header(None),
+    mode: Optional[Literal["proxy", "redirect", "presigned"]] = Query(
+        default=None,
+        description="Download mode: proxy (stream through backend), redirect (302 to presigned URL), presigned (return JSON with URL)"
+    ),
+):
+    settings = get_settings()
+
+    # Get project and package
+    project = db.query(Project).filter(Project.name == project_name).first()
+    if not project:
+        raise HTTPException(status_code=404, detail="Project not found")
+
+    package = db.query(Package).filter(Package.project_id == project.id, Package.name == package_name).first()
+    if not package:
+        raise HTTPException(status_code=404, detail="Package not found")
+
+    # Resolve reference to artifact
+    artifact = _resolve_artifact_ref(ref, package, db)
    if not artifact:
        raise HTTPException(status_code=404, detail="Artifact not found")

    filename = artifact.original_name or f"{artifact.id}"

+    # Determine download mode (query param overrides server default)
+    download_mode = mode or settings.download_mode
+
+    # Handle presigned mode - return JSON with presigned URL
+    if download_mode == "presigned":
+        presigned_url = storage.generate_presigned_url(
+            artifact.s3_key,
+            response_content_type=artifact.content_type,
+            response_content_disposition=f'attachment; filename="{filename}"',
+        )
+        expires_at = datetime.now(timezone.utc) + timedelta(seconds=settings.presigned_url_expiry)
+
+        return PresignedUrlResponse(
+            url=presigned_url,
+            expires_at=expires_at,
+            method="GET",
+            artifact_id=artifact.id,
+            size=artifact.size,
+            content_type=artifact.content_type,
+            original_name=artifact.original_name,
+            checksum_sha256=artifact.id,
+            checksum_md5=artifact.checksum_md5,
+        )
+
+    # Handle redirect mode - return 302 redirect to presigned URL
+    if download_mode == "redirect":
+        presigned_url = storage.generate_presigned_url(
+            artifact.s3_key,
+            response_content_type=artifact.content_type,
+            response_content_disposition=f'attachment; filename="{filename}"',
+        )
+        return RedirectResponse(url=presigned_url, status_code=302)
+
+    # Proxy mode (also the fallback for any unrecognized configured mode) - stream through backend
    # Handle range requests
    if range:
        stream, content_length, content_range = storage.get_stream(artifact.s3_key, range)
@@ -923,6 +976,63 @@ def download_artifact(
    )


+# Get presigned URL endpoint (explicit endpoint for getting URL without redirect)
+@router.get("/api/v1/project/{project_name}/{package_name}/+/{ref}/url", response_model=PresignedUrlResponse)
+def get_artifact_url(
+    project_name: str,
+    package_name: str,
+    ref: str,
+    db: Session = Depends(get_db),
+    storage: S3Storage = Depends(get_storage),
+    expiry: Optional[int] = Query(
+        default=None,
+        description="Custom expiry time in seconds (defaults to server setting)"
+    ),
+):
+    """
+    Get a presigned URL for direct S3 download.
+    This endpoint always returns a presigned URL regardless of server download mode.
+    """
+    settings = get_settings()
+
+    # Get project and package
+    project = db.query(Project).filter(Project.name == project_name).first()
+    if not project:
+        raise HTTPException(status_code=404, detail="Project not found")
+
+    package = db.query(Package).filter(Package.project_id == project.id, Package.name == package_name).first()
+    if not package:
+        raise HTTPException(status_code=404, detail="Package not found")
+
+    # Resolve reference to artifact
+    artifact = _resolve_artifact_ref(ref, package, db)
+    if not artifact:
+        raise HTTPException(status_code=404, detail="Artifact not found")
+
+    filename = artifact.original_name or f"{artifact.id}"
+    url_expiry = expiry or settings.presigned_url_expiry
+
+    presigned_url = storage.generate_presigned_url(
+        artifact.s3_key,
+        expiry=url_expiry,
+        response_content_type=artifact.content_type,
+        response_content_disposition=f'attachment; filename="{filename}"',
+    )
+    expires_at = datetime.now(timezone.utc) + timedelta(seconds=url_expiry)
+
+    return PresignedUrlResponse(
+        url=presigned_url,
+        expires_at=expires_at,
+        method="GET",
+        artifact_id=artifact.id,
+        size=artifact.size,
+        content_type=artifact.content_type,
+        original_name=artifact.original_name,
+        checksum_sha256=artifact.id,
+        checksum_md5=artifact.checksum_md5,
+    )
+
+
 # HEAD request for download (to check file info without downloading)
 @router.head("/api/v1/project/{project_name}/{package_name}/+/{ref}")
 def head_artifact(
@@ -941,23 +1051,8 @@ def head_artifact(
    if not package:
        raise HTTPException(status_code=404, detail="Package not found")

-    # Resolve reference to artifact (same logic as download)
-    artifact = None
-    if ref.startswith("artifact:"):
-        artifact_id = ref[9:]
-        artifact = db.query(Artifact).filter(Artifact.id == artifact_id).first()
-    elif ref.startswith("tag:") or ref.startswith("version:"):
-        tag_name = ref.split(":", 1)[1]
-        tag = db.query(Tag).filter(Tag.package_id == package.id, Tag.name == tag_name).first()
-        if tag:
-            artifact = db.query(Artifact).filter(Artifact.id == tag.artifact_id).first()
-    else:
-        tag = db.query(Tag).filter(Tag.package_id == package.id, Tag.name == ref).first()
-        if tag:
-            artifact = db.query(Artifact).filter(Artifact.id == tag.artifact_id).first()
-        else:
-            artifact = db.query(Artifact).filter(Artifact.id == ref).first()
-
+    # Resolve reference to artifact
+    artifact = _resolve_artifact_ref(ref, package, db)
    if not artifact:
        raise HTTPException(status_code=404, detail="Artifact not found")
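
Both download handlers in `routes.py` build the `Content-Disposition` value by interpolating `filename` directly into a quoted string, which can produce a malformed header if the original name contains a double quote, a control character, or non-ASCII text. One way to harden this is sketched below. The `content_disposition` helper is hypothetical (it is not in this merge request) and follows the `filename` / `filename*` pattern from RFC 6266. Whether the S3 backend passes the extended parameter through unchanged is an assumption worth testing.

```python
from urllib.parse import quote


def content_disposition(filename: str) -> str:
    """Build an attachment header value that is safe for arbitrary names."""
    # Drop control characters such as CR/LF so the value stays on one line.
    cleaned = "".join(ch for ch in filename if ch.isprintable())
    # ASCII fallback for older clients: strip non-ASCII, escape \ and ".
    ascii_name = cleaned.encode("ascii", "ignore").decode("ascii")
    ascii_name = ascii_name.replace("\\", "\\\\").replace('"', '\\"')
    # Extended parameter carries the exact name, percent-encoded as UTF-8.
    encoded = quote(cleaned, safe="")
    return f"attachment; filename=\"{ascii_name}\"; filename*=UTF-8''{encoded}"
```

The result could be passed as `response_content_disposition` in place of the inline f-string, keeping the two handlers consistent.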

--- a/backend/app/schemas.py
+++ b/backend/app/schemas.py
@@ -330,6 +330,20 @@ class GlobalSearchResponse(BaseModel):
    counts: Dict[str, int]  # Total counts for each type


+# Presigned URL response
+class PresignedUrlResponse(BaseModel):
+    """Response containing a presigned URL for direct S3 download"""
+    url: str
+    expires_at: datetime
+    method: str = "GET"
+    artifact_id: str
+    size: int
+    content_type: Optional[str] = None
+    original_name: Optional[str] = None
+    checksum_sha256: Optional[str] = None
+    checksum_md5: Optional[str] = None
+
+
 # Health check
 class HealthResponse(BaseModel):
    status: str
--- a/backend/app/storage.py
+++ b/backend/app/storage.py
@@ -450,6 +450,46 @@ class S3Storage:
        except ClientError:
            return False

+    def generate_presigned_url(
+        self,
+        s3_key: str,
+        expiry: Optional[int] = None,
+        response_content_type: Optional[str] = None,
+        response_content_disposition: Optional[str] = None,
+    ) -> str:
+        """
+        Generate a presigned URL for downloading an object.
+
+        Args:
+            s3_key: The S3 key of the object
+            expiry: URL expiry in seconds (defaults to settings.presigned_url_expiry)
+            response_content_type: Override Content-Type header in response
+            response_content_disposition: Override Content-Disposition header in response
+
+        Returns:
+            Presigned URL string
+        """
+        if expiry is None:
+            expiry = settings.presigned_url_expiry
+
+        params = {
+            "Bucket": self.bucket,
+            "Key": s3_key,
+        }
+
+        # Add response header overrides if specified
+        if response_content_type:
+            params["ResponseContentType"] = response_content_type
+        if response_content_disposition:
+            params["ResponseContentDisposition"] = response_content_disposition
+
+        url = self.client.generate_presigned_url(
+            "get_object",
+            Params=params,
+            ExpiresIn=expiry,
+        )
+        return url
+

 # Singleton instance
 _storage = None
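
The `expiry` query parameter on the `/url` endpoint reaches `generate_presigned_url` unvalidated, and `expiry or settings.presigned_url_expiry` silently turns `0` into the default. A possible guard is sketched below. `effective_expiry` and `MAX_PRESIGNED_EXPIRY` are hypothetical names, and the 7-day ceiling reflects the usual Signature Version 4 limit for presigned URLs, which is assumed to apply to the storage backend in use.

```python
from typing import Optional

# Assumption: SigV4 presigned URLs are limited to 7 days (604800 seconds).
MAX_PRESIGNED_EXPIRY = 7 * 24 * 3600


def effective_expiry(
    requested: Optional[int],
    default: int = 3600,
    maximum: int = MAX_PRESIGNED_EXPIRY,
) -> int:
    """Pick the expiry to sign with: the request value if given, else the default."""
    if requested is None:
        return min(default, maximum)
    if requested < 1:
        # Reject zero and negative values instead of falling back silently.
        raise ValueError("expiry must be a positive number of seconds")
    return min(requested, maximum)
```

In the route, the same constraint could instead be expressed declaratively with `Query(default=None, ge=1, le=604800)`, which makes FastAPI return a 422 for out-of-range values.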
--- a/docs/design/integrity-verification.md
+++ b/docs/design/integrity-verification.md
@@ -0,0 +1,504 @@
+# Integrity Verification Workflow Design
+
+This document defines the process for SHA256 checksum verification on artifact downloads, including failure handling and retry mechanisms.
+
+## Overview
+
+Orchard uses content-addressable storage where the artifact ID is the SHA256 hash of the content. This design leverages that property to provide configurable integrity verification during downloads.
+
+## Current State
+
+| Aspect | Status |
+|--------|--------|
+| Download streams content directly from S3 | ✅ Implemented |
+| Artifact ID is the SHA256 hash | ✅ Implemented |
+| S3 key derived from SHA256 hash | ✅ Implemented |
+| Verification during download | ❌ Not implemented |
+| Checksum headers in response | ❌ Not implemented |
+| Retry mechanism on failure | ❌ Not implemented |
+| Failure handling beyond S3 errors | ❌ Not implemented |
+
+## Verification Modes
+
+The verification mode is selected per request with the `?verify=<mode>` query parameter, or server-wide with `ORCHARD_VERIFY_MODE`.
+
+| Mode | Performance | Integrity | Use Case |
+|------|-------------|-----------|----------|
+| `none` | ⚡ Fastest | Client-side | Trusted networks, high throughput |
+| `header` | ⚡ Fast | Client-side | Standard downloads, client verification |
+| `stream` | 🔄 Moderate | Post-hoc server | Logging/auditing, non-blocking |
+| `pre` | 🐢 Slower | Guaranteed | Critical downloads, untrusted storage |
+| `strict` | 🐢 Slower | Guaranteed + Alert | Security-sensitive, compliance |
+
+### Mode: None (Default)
+
+**Behavior:**
+- Stream content directly from S3 with no server-side processing
+- Maximum download performance
+- Client is responsible for verification
+
+**Headers Returned:**
+```
+X-Checksum-SHA256: <expected_hash>
+Content-Length: <expected_size>
+```
+
+**Flow:**
+```
+Client Request → Lookup Artifact → Stream from S3 → Client
+```
+
+### Mode: Header
+
+**Behavior:**
+- Stream content directly from S3
+- Include comprehensive checksum headers
+- Client performs verification using headers
+
+**Headers Returned:**
+```
+X-Checksum-SHA256: <expected_hash>
+Content-Length: <expected_size>
+Digest: sha-256=<base64_encoded_hash>
+ETag: "<sha256_hash>"
+X-Content-SHA256: <expected_hash>
+```
+
+**Flow:**
+```
+Client Request → Lookup Artifact → Add Headers → Stream from S3 → Client Verifies
+```
+
+**Client Verification Example:**
+```bash
+# Download, then compare against the checksum header
+curl -s -o downloaded_file https://orchard/api/v1/project/foo/bar/+/v1.0.0
+EXPECTED=$(curl -sI https://orchard/api/v1/project/foo/bar/+/v1.0.0 | grep -i '^X-Checksum-SHA256:' | tr -d '\r' | cut -d' ' -f2)
+ACTUAL=$(sha256sum downloaded_file | cut -d' ' -f1)
+[ "$EXPECTED" = "$ACTUAL" ] && echo "OK" || echo "MISMATCH"
+```
+
+### Mode: Stream (Post-Hoc Verification)
+
+**Behavior:**
+- Wrap S3 stream with `HashingStreamWrapper`
+- Compute SHA256 incrementally while streaming to client
+- Verify hash after stream completes
+- Log verification result
+- Cannot reject content (already sent to client)
+
+**Headers Returned:**
+```
+X-Checksum-SHA256: <expected_hash>
+Content-Length: <expected_size>
+X-Verify-Mode: stream
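+Trailer: X-Verified, X-Computed-SHA256
+```
+
+**Trailers (sent only if the client supports them; over HTTP/1.1 they require chunked transfer encoding, in which case `Content-Length` is omitted):**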
+```
+X-Verified: true|false
+X-Computed-SHA256: <computed_hash>
+```
+
+**Flow:**
+```
+Client Request → Lookup Artifact → Wrap Stream → Stream to Client
+                                       ↓
+                              Compute Hash Incrementally
+                                       ↓
+                              Verify After Complete → Log Result
+```
+
+**Implementation:**
+```python
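+import hashlib
+from typing import Callable, Iterable
+
+
+class HashingStreamWrapper:
+    """Pass chunks through unchanged while hashing them."""
+
+    def __init__(self, stream: Iterable[bytes], expected_hash: str,
+                 on_complete: Callable[[bool, str], None]):
+        self.stream = stream
+        self.hasher = hashlib.sha256()
+        self.expected_hash = expected_hash
+        self.on_complete = on_complete
+
+    def __iter__(self):
+        for chunk in self.stream:
+            self.hasher.update(chunk)
+            yield chunk
+        # Reached only if the stream was fully consumed; a client that
+        # disconnects early never triggers the callback.
+        computed = self.hasher.hexdigest()
+        self.on_complete(computed == self.expected_hash, computed)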
+```
+
+### Mode: Pre-Verify (Blocking)
+
+**Behavior:**
+- Download entire content from S3 to memory/temp file
+- Compute SHA256 hash before sending to client
+- On match: stream verified content to client
+- On mismatch: retry from S3 (up to N times)
+- If retries exhausted: return 500 error
+
+**Headers Returned:**
+```
+X-Checksum-SHA256: <expected_hash>
+Content-Length: <expected_size>
+X-Verify-Mode: pre
+X-Verified: true
+```
+
+**Flow:**
+```
+Client Request → Lookup Artifact → Download from S3 → Compute Hash
+                                                          ↓
+                                                    Hash Matches?
+                                                    ↓           ↓
+                                                   Yes          No
+                                                    ↓           ↓
+                                            Stream to Client   Retry?
+                                                                ↓
+                                                          Yes → Loop
+                                                          No  → 500 Error
+```
+
+**Memory Considerations:**
+- For files < `ORCHARD_VERIFY_MEMORY_LIMIT` (default 100MB): buffer in memory
+- For larger files: use temporary file with streaming hash computation
+- Cleanup temp files after response sent
+
+### Mode: Strict
+
+**Behavior:**
+- Same as pre-verify but with no retries
+- Fail immediately on any mismatch
+- Quarantine artifact on failure (mark as potentially corrupted)
+- Trigger alert/notification on failure
+- For security-critical downloads
+
+**Headers Returned (on success):**
+```
+X-Checksum-SHA256: <expected_hash>
+Content-Length: <expected_size>
+X-Verify-Mode: strict
+X-Verified: true
+```
+
+**Error Response (on failure):**
+```json
+{
+  "error": "integrity_verification_failed",
+  "message": "Artifact content does not match expected checksum",
+  "expected_hash": "<expected>",
+  "computed_hash": "<computed>",
+  "artifact_id": "<id>",
+  "action_taken": "quarantined"
+}
+```
+
+**Quarantine Process:**
+1. Mark artifact `status = 'quarantined'` in database
+2. Log security event to audit_logs
+3. Optionally notify via webhook/email
+4. Artifact becomes unavailable for download until resolved
+
+## Failure Detection
+
+### Failure Types
+
+| Failure Type | Detection Method | Severity |
+|--------------|------------------|----------|
+| Hash mismatch | Computed SHA256 ≠ Expected | Critical |
+| Size mismatch | Actual bytes ≠ `Content-Length` | High |
+| S3 read error | boto3 exception | Medium |
+| Truncated content | Stream ends early | High |
+| S3 object missing | `NoSuchKey` error | Critical |
+| ETag mismatch | S3 ETag ≠ expected | Medium |
+
+### Detection Implementation
+
+```python
+from dataclasses import dataclass
+from typing import Optional
+
+
+@dataclass
+class VerificationResult:
+    success: bool
+    failure_type: Optional[str]  # hash_mismatch, size_mismatch, etc.
+    expected_hash: str
+    computed_hash: Optional[str]
+    expected_size: int
+    actual_size: Optional[int]
+    error_message: Optional[str]
+    retry_count: int
+```
+
+## Retry Mechanism
+
+### Configuration
+
+| Environment Variable | Default | Description |
+|---------------------|---------|-------------|
+| `ORCHARD_VERIFY_MAX_RETRIES` | 3 | Maximum retry attempts |
+| `ORCHARD_VERIFY_RETRY_DELAY_MS` | 100 | Base delay between retries |
+| `ORCHARD_VERIFY_RETRY_BACKOFF` | 2.0 | Exponential backoff multiplier |
+| `ORCHARD_VERIFY_RETRY_MAX_DELAY_MS` | 5000 | Maximum delay cap |
+
+### Backoff Formula
+
+```
+delay = min(base_delay * (backoff ^ attempt), max_delay)
+```
+
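+Example with defaults, where `attempt` is zero-based (as in the retry loop below):
+- Attempt 0 (first retry): 100ms
+- Attempt 1 (second retry): 200ms
+- Attempt 2 (third retry): 400ms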
+
+### Retry Flow
+
+```python
+import asyncio
+import logging
+
+log = logging.getLogger(__name__)
+
+# fetch_from_s3, compute_sha256, calculate_backoff, is_retryable,
+# IntegrityError and S3Error are placeholders for helpers that this
+# design does not define yet.
+async def download_with_retry(artifact, max_retries=3):
+    for attempt in range(max_retries + 1):
+        try:
+            content = await fetch_from_s3(artifact.s3_key)
+            computed_hash = compute_sha256(content)
+
+            if computed_hash == artifact.id:
+                return content  # Success
+
+            # Hash mismatch
+            log.warning(f"Verification failed, attempt {attempt + 1}/{max_retries + 1}")
+
+            if attempt < max_retries:
+                delay = calculate_backoff(attempt)
+                await asyncio.sleep(delay / 1000)
+            else:
+                raise IntegrityError("Max retries exceeded")
+
+        except S3Error as e:
+            # Not found (404) and access denied (403) are non-retryable
+            if not is_retryable(e) or attempt >= max_retries:
+                raise
+            await asyncio.sleep(calculate_backoff(attempt) / 1000)
+```
+
+### Retryable vs Non-Retryable Failures
+
+**Retryable:**
+- S3 read timeout
+- S3 connection error
+- Hash mismatch (may be transient S3 issue)
+- Truncated content
+
+**Non-Retryable:**
+- S3 object not found (404)
+- S3 access denied (403)
+- Artifact not in database
+- Strict mode failures
+
+## Configuration Reference
+
+### Environment Variables
+
+```bash
+# Verification mode (none, header, stream, pre, strict)
+ORCHARD_VERIFY_MODE=none
+
+# Retry settings
+ORCHARD_VERIFY_MAX_RETRIES=3
+ORCHARD_VERIFY_RETRY_DELAY_MS=100
+ORCHARD_VERIFY_RETRY_BACKOFF=2.0
+ORCHARD_VERIFY_RETRY_MAX_DELAY_MS=5000
+
+# Memory limit for pre-verify buffering (bytes)
+ORCHARD_VERIFY_MEMORY_LIMIT=104857600  # 100MB
+
+# Strict mode settings
+ORCHARD_VERIFY_QUARANTINE_ON_FAILURE=true
+ORCHARD_VERIFY_ALERT_WEBHOOK=https://alerts.example.com/webhook
+
+# Allow per-request mode override
+ORCHARD_VERIFY_ALLOW_OVERRIDE=true
+```
+
+### Per-Request Override
+
+When `ORCHARD_VERIFY_ALLOW_OVERRIDE=true`, clients can specify verification mode:
+
+```
+GET /api/v1/project/foo/bar/+/v1.0.0?verify=pre
+GET /api/v1/project/foo/bar/+/v1.0.0?verify=none
+```
+
+## API Changes
+
+### Download Endpoint
+
+**Request:**
+```
+GET /api/v1/project/{project}/{package}/+/{ref}?verify={mode}
+```
+
+**New Query Parameters:**
+| Parameter | Type | Default | Description |
+|-----------|------|---------|-------------|
+| `verify` | string | from config | Verification mode |
+
+**New Response Headers:**
+| Header | Description |
+|--------|-------------|
+| `X-Checksum-SHA256` | Expected SHA256 hash |
+| `X-Verify-Mode` | Active verification mode |
+| `X-Verified` | `true` if server verified content |
+| `Digest` | RFC 3230 digest header |
+
+### New Endpoint: Verify Artifact
+
+**Request:**
+```
+POST /api/v1/project/{project}/{package}/+/{ref}/verify
+```
+
+**Response:**
+```json
+{
+  "artifact_id": "abc123...",
+  "verified": true,
+  "expected_hash": "abc123...",
+  "computed_hash": "abc123...",
+  "size_match": true,
+  "expected_size": 1048576,
+  "actual_size": 1048576,
+  "verification_time_ms": 45
+}
+```
+
+## Logging and Monitoring
+
+### Log Events
+
+| Event | Level | When |
+|-------|-------|------|
+| `verification.success` | INFO | Hash verified successfully |
+| `verification.failure` | ERROR | Hash mismatch detected |
+| `verification.retry` | WARN | Retry attempt initiated |
+| `verification.quarantine` | ERROR | Artifact quarantined |
+| `verification.skip` | DEBUG | Verification skipped (mode=none) |
+
+### Metrics
+
+| Metric | Type | Description |
+|--------|------|-------------|
+| `orchard_verification_total` | Counter | Total verification attempts |
+| `orchard_verification_failures` | Counter | Failed verifications |
+| `orchard_verification_retries` | Counter | Retry attempts |
+| `orchard_verification_duration_ms` | Histogram | Verification time |
+
+### Audit Log Entry
+
+```json
+{
+  "action": "artifact.download.verified",
+  "resource": "project/foo/package/bar/artifact/abc123",
+  "user_id": "user@example.com",
+  "details": {
+    "verification_mode": "pre",
+    "verified": true,
+    "retry_count": 0,
+    "duration_ms": 45
+  }
+}
+```
+
+## Security Considerations
+
+1. **Strict Mode for Sensitive Data**: Use strict mode for artifacts containing credentials, certificates, or security-critical code.
+
+2. **Quarantine Isolation**: Quarantined artifacts should be moved to a separate S3 prefix or bucket for forensic analysis.
+
+3. **Alert on Repeated Failures**: Multiple verification failures for the same artifact may indicate storage corruption or tampering.
+
+4. **Audit Trail**: All verification events should be logged for compliance and forensic purposes.
+
+5. **Client Trust**: In `none` and `header` modes, clients must implement their own verification for security guarantees.
+
+## Implementation Phases
+
+### Phase 1: Headers Only
+- Add `X-Checksum-SHA256` header to all downloads
+- Add `verify=header` mode support
+- Add configuration options
+
+### Phase 2: Stream Verification
+- Implement `HashingStreamWrapper`
+- Add `verify=stream` mode
+- Add verification logging
+
+### Phase 3: Pre-Verification
+- Implement buffered verification
+- Add retry mechanism
+- Add `verify=pre` mode
+
+### Phase 4: Strict Mode
+- Implement quarantine mechanism
+- Add alerting integration
+- Add `verify=strict` mode
+
+## Client Integration Examples
+
+### curl with Verification
+```bash
+#!/bin/bash
+URL="https://orchard.example.com/api/v1/project/myproject/mypackage/+/v1.0.0"
+
+# Get expected hash from headers
+EXPECTED=$(curl -sI "$URL" | grep -i "X-Checksum-SHA256" | tr -d '\r' | cut -d' ' -f2)
+
+# Download file
+curl -sO "$URL"
+FILENAME=$(basename "$URL")
+
+# Verify
+ACTUAL=$(sha256sum "$FILENAME" | cut -d' ' -f1)
+
+if [ "$EXPECTED" = "$ACTUAL" ]; then
+    echo "✓ Verification passed"
+else
+    echo "✗ Verification FAILED"
+    echo "  Expected: $EXPECTED"
+    echo "  Actual:   $ACTUAL"
+    exit 1
+fi
+```
+
+### Python Client
+```python
+import hashlib
+import requests
+
+def download_verified(url: str) -> bytes:
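+    # Get headers first
+    head = requests.head(url)
+    head.raise_for_status()
+    expected_hash = head.headers.get('X-Checksum-SHA256')
+    if expected_hash is None:
+        raise ValueError("Server did not send X-Checksum-SHA256")
+    expected_size = int(head.headers.get('Content-Length', 0))
+
+    # Download content
+    response = requests.get(url)
+    response.raise_for_status()
+    content = response.content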
+
+    # Verify size
+    if len(content) != expected_size:
+        raise ValueError(f"Size mismatch: {len(content)} != {expected_size}")
+
+    # Verify hash
+    actual_hash = hashlib.sha256(content).hexdigest()
+    if actual_hash != expected_hash:
+        raise ValueError(f"Hash mismatch: {actual_hash} != {expected_hash}")
+
+    return content
+```
+
+### Server-Side Verification
+```bash
+# Force server to verify before sending
+curl -O "https://orchard.example.com/api/v1/project/myproject/mypackage/+/v1.0.0?verify=pre"
+
+# Check if verification was performed
+curl -sI "https://orchard.example.com/api/v1/project/myproject/mypackage/+/v1.0.0?verify=pre" | grep -i X-Verified
+# X-Verified: true
+```
--- a/helm/orchard/templates/_helpers.tpl
+++ b/helm/orchard/templates/_helpers.tpl
@@ -97,10 +97,27 @@ password
 {{- end }}

 {{/*
-MinIO host
+MinIO internal host (for server-side operations)
+*/}}
+{{- define "orchard.minio.internalHost" -}}
+{{- if .Values.minio.enabled }}
+{{- printf "http://%s-minio:9000" .Release.Name }}
+{{- else }}
+{{- .Values.orchard.s3.endpoint }}
+{{- end }}
+{{- end }}
+
+{{/*
+MinIO host (uses external URL if ingress enabled, for presigned URLs)
 */}}
 {{- define "orchard.minio.host" -}}
-{{- if .Values.minio.enabled }}
+{{- if and .Values.minio.enabled .Values.minio.ingress.enabled .Values.minio.ingress.host }}
+{{- if .Values.minio.ingress.tls.enabled }}
+{{- printf "https://%s" .Values.minio.ingress.host }}
+{{- else }}
+{{- printf "http://%s" .Values.minio.ingress.host }}
+{{- end }}
+{{- else if .Values.minio.enabled }}
 {{- printf "http://%s-minio:9000" .Release.Name }}
 {{- else }}
 {{- .Values.orchard.s3.endpoint }}
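
Reviewer sketch: the branch order in the updated `orchard.minio.host` helper is easier to check when written out as a plain function. The following Python mirrors the three branches of the template in the hunk above. The function name and parameter names are hypothetical, and this is only a model of the template logic, not code from the chart.

```python
def resolve_minio_host(
    release_name: str,
    minio_enabled: bool,
    ingress_enabled: bool,
    ingress_host: str,
    ingress_tls_enabled: bool,
    s3_endpoint: str,
) -> str:
    """Model of the `orchard.minio.host` template branches (illustrative only)."""
    # Branch 1: bundled MinIO with an ingress and a non-empty host -> external URL.
    if minio_enabled and ingress_enabled and ingress_host:
        scheme = "https" if ingress_tls_enabled else "http"
        return f"{scheme}://{ingress_host}"
    # Branch 2: bundled MinIO without a usable ingress -> in-cluster service.
    if minio_enabled:
        return f"http://{release_name}-minio:9000"
    # Branch 3: external S3 -> whatever endpoint the operator configured.
    return s3_endpoint
```

With the chart defaults in this merge request (`minio.ingress.enabled: false` and an empty `minio.ingress.host`), this model returns the in-cluster address, the same value the new `orchard.minio.internalHost` helper produces.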
--- a/helm/orchard/templates/deployment.yaml
+++ b/helm/orchard/templates/deployment.yaml
@@ -92,6 +92,10 @@ spec:
                secretKeyRef:
                  name: {{ include "orchard.minio.secretName" . }}
                  key: {{ if .Values.minio.enabled }}root-password{{ else }}{{ .Values.orchard.s3.existingSecretSecretKeyKey }}{{ end }}
+            - name: ORCHARD_DOWNLOAD_MODE
+              value: {{ .Values.orchard.download.mode | quote }}
+            - name: ORCHARD_PRESIGNED_URL_EXPIRY
+              value: {{ .Values.orchard.download.presignedUrlExpiry | quote }}
          livenessProbe:
            {{- toYaml .Values.livenessProbe | nindent 12 }}
          readinessProbe:
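
Reviewer sketch: the two environment variables added above feed the download behaviour described in the changelog (`presigned`, `redirect`, or `proxy`, a 3600-second default expiry, and a per-request `?mode=` override). The following is a minimal sketch of how a server might combine them. The function name, the constants, and the choice to reject unknown modes with `ValueError` are assumptions for illustration, since the server code is not part of this section.

```python
from typing import Mapping, Optional, Tuple

VALID_MODES = ("presigned", "redirect", "proxy")
DEFAULT_MODE = "presigned"
DEFAULT_EXPIRY_SECONDS = 3600


def resolve_download_settings(
    query_mode: Optional[str], env: Mapping[str, str]
) -> Tuple[str, int]:
    """Pick the download mode and presigned URL expiry (hypothetical helper)."""
    # A per-request ?mode= value takes precedence over the configured default.
    mode = query_mode or env.get("ORCHARD_DOWNLOAD_MODE", DEFAULT_MODE)
    if mode not in VALID_MODES:
        # Assumption: unknown modes are rejected rather than silently ignored.
        raise ValueError(f"unknown download mode: {mode!r}")
    # Environment values arrive as strings, so the expiry is parsed explicitly.
    expiry = int(env.get("ORCHARD_PRESIGNED_URL_EXPIRY", str(DEFAULT_EXPIRY_SECONDS)))
    if expiry <= 0:
        raise ValueError("presigned URL expiry must be a positive number of seconds")
    return mode, expiry
```

In a real service, `env` would typically be `os.environ` and `query_mode` the parsed query parameter. Passing both in explicitly keeps the sketch easy to test.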
--- a/helm/orchard/templates/minio-ingress.yaml
+++ b/helm/orchard/templates/minio-ingress.yaml
@@ -0,0 +1,34 @@
+{{- if and .Values.minio.enabled .Values.minio.ingress.enabled -}}
+apiVersion: networking.k8s.io/v1
+kind: Ingress
+metadata:
+  name: {{ include "orchard.fullname" . }}-minio
+  labels:
+    {{- include "orchard.labels" . | nindent 4 }}
+    app.kubernetes.io/component: minio
+  {{- with .Values.minio.ingress.annotations }}
+  annotations:
+    {{- toYaml . | nindent 4 }}
+  {{- end }}
+spec:
+  {{- if .Values.minio.ingress.className }}
+  ingressClassName: {{ .Values.minio.ingress.className }}
+  {{- end }}
+  {{- if .Values.minio.ingress.tls.enabled }}
+  tls:
+    - hosts:
+        - {{ .Values.minio.ingress.host | quote }}
+      secretName: {{ .Values.minio.ingress.tls.secretName }}
+  {{- end }}
+  rules:
+    - host: {{ .Values.minio.ingress.host | quote }}
+      http:
+        paths:
+          - path: /
+            pathType: Prefix
+            backend:
+              service:
+                name: {{ .Release.Name }}-minio
+                port:
+                  number: 9000
+{{- end }}
--- a/helm/orchard/values.yaml
+++ b/helm/orchard/values.yaml
@@ -115,6 +115,11 @@ orchard:
    existingSecretAccessKeyKey: "access-key-id"
    existingSecretSecretKeyKey: "secret-access-key"

+  # Download configuration
+  download:
+    mode: "presigned"  # presigned, redirect, or proxy
+    presignedUrlExpiry: 3600  # Presigned URL expiry in seconds
+
 # PostgreSQL subchart configuration
 postgresql:
  enabled: true
@@ -147,6 +152,17 @@ minio:
  persistence:
    enabled: false
    size: 50Gi
+  # MinIO ingress for presigned URL access
+  ingress:
+    enabled: false
+    className: "nginx"
+    annotations:
+      cert-manager.io/cluster-issuer: "letsencrypt"
+      nginx.ingress.kubernetes.io/proxy-body-size: "0"  # Disable body size limit for uploads
+    host: ""  # e.g., minio.your-domain.com
+    tls:
+      enabled: true
+      secretName: minio-tls

 # Redis subchart configuration (for future caching)
 redis:
Author	SHA1	Message	Date
Mondo Diaz	8999552949	Merge branch 'feature/presigned-url-downloads' into 'main' Add presigned URL support for direct S3 downloads (#48) Closes #48 See merge request esv/bsf/bsf-integration/orchard/orchard-mvp!17	2025-12-15 16:06:51 -06:00
Mondo Diaz	2df97ae94a	Add presigned URL support for direct S3 downloads (#48 )	2025-12-15 16:06:51 -06:00
Mondo Diaz	caa0c5af0c	Merge branch 'feature/store-sha256-checksums' into 'main' Store SHA256 checksums with artifacts and add multiple hash support Closes #25 See merge request esv/bsf/bsf-integration/orchard/orchard-mvp!16	2025-12-15 14:47:31 -06:00
Mondo Diaz	3fd2747ae4	Store SHA256 checksums with artifacts and add multiple hash support	2025-12-15 14:47:30 -06:00
Mondo Diaz	96367da448	Merge branch 'feature/integrity-verification-design' into 'main' Add integrity verification workflow design document Closes #24 See merge request esv/bsf/bsf-integration/orchard/orchard-mvp!15	2025-12-15 14:00:32 -06:00
Mondo Diaz	2686fdcb89	Add integrity verification workflow design document	2025-12-15 14:00:32 -06:00