# Integrity Verification

Orchard uses content-addressable storage with SHA256 hashing to ensure artifact integrity. This document describes how integrity verification works and how to use it.

## How It Works

### Content-Addressable Storage

Orchard stores artifacts using their SHA256 hash as the unique identifier. This provides several benefits:

1. **Automatic deduplication**: Identical content is stored only once
2. **Built-in integrity**: The artifact ID *is* the content hash
3. **Tamper detection**: Any modification changes the hash, making corruption detectable

When you upload a file:

1. Orchard computes the SHA256 hash of the content
2. The hash becomes the artifact ID (64-character hex string)
3. The file is stored in S3 at `fruits/{hash[0:2]}/{hash[2:4]}/{hash}`
4. The hash and metadata are recorded in the database

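The fan-out layout in step 3 can be sketched in a few lines (the `s3_key_for` helper name is hypothetical; the hash value is only illustrative):

```python
def s3_key_for(sha256_hex: str) -> str:
    """Build the S3 key fruits/{hash[0:2]}/{hash[2:4]}/{hash} for an artifact."""
    return f"fruits/{sha256_hex[0:2]}/{sha256_hex[2:4]}/{sha256_hex}"


# The two-level prefix fans objects out across many key prefixes,
# which keeps any single listing under fruits/ manageable.
key = s3_key_for("dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f")
# → fruits/df/fd/dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f
```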
### Hash Format

- Algorithm: SHA256
- Format: 64-character lowercase hexadecimal string
- Example: `dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f`

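The example digest above is the SHA256 of the byte string `Hello, World!`, so any SHA256 implementation can reproduce it; with the Python standard library:

```python
import hashlib

# hexdigest() yields the 64-character lowercase hex form Orchard uses as the ID
digest = hashlib.sha256(b"Hello, World!").hexdigest()
assert digest == "dffd6021bb2bd5b0af676290809ec3a53191dd81c7f70a4b28688a362182986f"
assert len(digest) == 64 and digest == digest.lower()
```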
## Client-Side Verification

### Before Upload

Compute the hash locally before uploading to verify the server received your content correctly:

```python
import hashlib

import requests


def compute_sha256(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


# Compute hash before upload
with open("myfile.tar.gz", "rb") as f:
    content = f.read()
local_hash = compute_sha256(content)

# Upload the file
response = requests.post(
    f"{base_url}/api/v1/project/{project}/{package}/upload",
    files={"file": ("myfile.tar.gz", content)},
)
result = response.json()

# Verify server computed the same hash
assert result["artifact_id"] == local_hash, "Hash mismatch!"
```

### Providing Expected Hash on Upload

You can provide the expected hash in the upload request. The server will reject the upload if the computed hash doesn't match:

```python
response = requests.post(
    f"{base_url}/api/v1/project/{project}/{package}/upload",
    files={"file": ("myfile.tar.gz", content)},
    headers={"X-Checksum-SHA256": local_hash},
)

# Returns 422 if the hash doesn't match
if response.status_code == 422:
    print("Checksum mismatch - upload rejected")
```

### After Download

Verify downloaded content matches the expected hash using response headers:

```python
response = requests.get(
    f"{base_url}/api/v1/project/{project}/{package}/+/{tag}",
    params={"mode": "proxy"},
)

# Get expected hash from header
expected_hash = response.headers.get("X-Checksum-SHA256")

# Compute hash of downloaded content
actual_hash = compute_sha256(response.content)

# Verify
if actual_hash != expected_hash:
    raise Exception(f"Integrity check failed! Expected {expected_hash}, got {actual_hash}")
```

### Response Headers for Verification

Download responses include multiple headers for verification:

| Header | Format | Description |
|--------|--------|-------------|
| `X-Checksum-SHA256` | Hex string | SHA256 hash (64 chars) |
| `ETag` | `"<hash>"` | SHA256 hash in quotes |
| `Digest` | `sha-256=<base64>` | RFC 3230 format (base64-encoded) |
| `Content-Length` | Integer | File size in bytes |

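The three hash-bearing headers are redundant encodings of the same digest, so a client can cross-check them. A minimal sketch, assuming the header names and formats in the table above (the `digests_agree` helper is hypothetical):

```python
import base64


def digests_agree(headers: dict) -> bool:
    """Check that X-Checksum-SHA256, ETag, and Digest all describe one digest."""
    hex_digest = headers["X-Checksum-SHA256"]
    etag_hash = headers["ETag"].strip('"')  # ETag wraps the hex hash in quotes
    # RFC 3230 Digest carries the raw digest bytes, base64-encoded
    b64 = base64.b64encode(bytes.fromhex(hex_digest)).decode("ascii")
    return etag_hash == hex_digest and headers["Digest"] == f"sha-256={b64}"
```

Verifying any one of the three against the downloaded bytes is sufficient; checking that they agree with each other is a cheap extra sanity check.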
### Server-Side Verification on Download

Request server-side verification during download:

```bash
# Pre-verification: Server verifies before streaming (returns 500 if corrupt)
curl "${base_url}/api/v1/project/${project}/${package}/+/${tag}?mode=proxy&verify=true&verify_mode=pre"

# Stream verification: Server verifies while streaming (logs error if corrupt)
curl "${base_url}/api/v1/project/${project}/${package}/+/${tag}?mode=proxy&verify=true&verify_mode=stream"
```

The `X-Verified` header indicates whether server-side verification was performed:

- `X-Verified: true` - Content was verified by the server

## Server-Side Consistency Check

### Consistency Check Endpoint

Administrators can run a consistency check to verify all stored artifacts:

```bash
curl "${base_url}/api/v1/admin/consistency-check"
```

Response:

```json
{
  "total_artifacts_checked": 1234,
  "healthy": true,
  "orphaned_s3_objects": 0,
  "missing_s3_objects": 0,
  "size_mismatches": 0,
  "orphaned_s3_keys": [],
  "missing_s3_keys": [],
  "size_mismatch_artifacts": []
}
```

### What the Check Verifies

1. **Missing S3 objects**: Database records with no corresponding S3 object
2. **Orphaned S3 objects**: S3 objects with no database record
3. **Size mismatches**: S3 object size doesn't match database record

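The `healthy` flag in the response presumably aggregates these three counters; a monitoring script can recompute it per failure class (the `report_is_healthy` helper is a sketch; field names are taken from the response shown earlier):

```python
def report_is_healthy(report: dict) -> bool:
    """True when the consistency check found no inconsistencies of any kind."""
    return (
        report["missing_s3_objects"] == 0
        and report["orphaned_s3_objects"] == 0
        and report["size_mismatches"] == 0
    )
```

In practice you can read the `healthy` field directly; breaking it out by counter is useful when you want to alert differently on, say, missing objects versus orphans.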
### Running Consistency Checks

**Manual check:**

```bash
# Check all artifacts
curl "${base_url}/api/v1/admin/consistency-check"

# Limit results (for large deployments)
curl "${base_url}/api/v1/admin/consistency-check?limit=100"
```

**Scheduled checks (recommended):**

Set up a cron job or Kubernetes CronJob to run periodic checks:

```yaml
# Kubernetes CronJob example
apiVersion: batch/v1
kind: CronJob
metadata:
  name: orchard-consistency-check
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: check
              image: curlimages/curl
              command:
                - /bin/sh
                - -c
                - |
                  response=$(curl -s "${ORCHARD_URL}/api/v1/admin/consistency-check")
                  # curlimages/curl does not ship jq, so test the healthy flag with grep
                  if ! echo "$response" | grep -q '"healthy": *true'; then
                    echo "ALERT: Consistency check failed!"
                    echo "$response"
                    exit 1
                  fi
                  echo "Consistency check passed"
          restartPolicy: OnFailure
```

## Recovery Procedures

### Corrupted Artifact (Size Mismatch)

If the consistency check reports size mismatches:

1. **Identify affected artifacts:**
   ```bash
   curl "${base_url}/api/v1/admin/consistency-check" | jq '.size_mismatch_artifacts'
   ```

2. **Check if the artifact can be re-uploaded:**
   - If the original content is available, delete the corrupted artifact and re-upload
   - The same content will produce the same artifact ID

3. **If the original content is lost:**
   - The artifact data is corrupted and cannot be recovered
   - Delete the artifact record and notify affected users
   - Consider restoring from backup if available

### Missing S3 Object

If database records exist but S3 objects are missing:

1. **Identify affected artifacts:**
   ```bash
   curl "${base_url}/api/v1/admin/consistency-check" | jq '.missing_s3_keys'
   ```

2. **Check the S3 bucket:**
   - Verify the S3 bucket exists and is accessible
   - Check S3 access logs for deletion events
   - Check if objects were moved or lifecycle-deleted

3. **Recovery options:**
   - Restore from S3 versioning (if enabled)
   - Restore from backup
   - Re-upload the original content (if available)
   - Delete orphaned database records

### Orphaned S3 Objects

If S3 objects exist without database records:

1. **Identify orphaned objects:**
   ```bash
   curl "${base_url}/api/v1/admin/consistency-check" | jq '.orphaned_s3_keys'
   ```

2. **Investigate the cause:**
   - Upload interrupted before the database commit?
   - Database record deleted but S3 cleanup failed?

3. **Resolution:**
   - If the content is needed, create the database record manually
   - If the content is not needed, delete the S3 object to reclaim storage

### Preventive Measures

1. **Enable S3 versioning** to recover from accidental deletions
2. **Regular backups** of both the database and the S3 bucket
3. **Scheduled consistency checks** to detect issues early
4. **Monitoring and alerting** on consistency check failures
5. **Audit logging** to track all artifact operations

## Verification in CI/CD

### Verifying Artifacts in Pipelines

```bash
#!/bin/bash
# Download and verify an artifact in a CI pipeline

ARTIFACT_URL="${ORCHARD_URL}/api/v1/project/${PROJECT}/${PACKAGE}/+/${TAG}"

# Download, dumping response headers to stdout for verification
response=$(curl -s -D - "${ARTIFACT_URL}?mode=proxy" -o artifact.tar.gz)
expected_hash=$(echo "$response" | grep -i "^x-checksum-sha256:" | cut -d: -f2 | tr -d ' \r')

# Compute the actual hash
actual_hash=$(sha256sum artifact.tar.gz | cut -d' ' -f1)

# Verify
if [ "$actual_hash" != "$expected_hash" ]; then
    echo "ERROR: Integrity check failed!"
    echo "Expected: $expected_hash"
    echo "Actual:   $actual_hash"
    exit 1
fi

echo "Integrity verified: $actual_hash"
```

### Using Server-Side Verification

For critical deployments, use server-side pre-verification:

```bash
# Server verifies before streaming - returns 500 if corrupt
curl -f "${ARTIFACT_URL}?mode=proxy&verify=true&verify_mode=pre" -o artifact.tar.gz
```

This ensures the artifact is verified before any bytes are streamed to your pipeline.