Mondo Diaz 82f93d8cdd Fix Helm chart: rename minio.ingress to minioIngress to avoid subchart conflict
The minio.ingress config was conflicting with the Bitnami MinIO subchart's
own ingress configuration, causing coalesce.go warnings. Renamed to
minioIngress as a top-level config to avoid the collision.
2025-12-16 12:28:49 -06:00
# Orchard
**Content-Addressable Storage System**
Orchard is a centralized binary artifact storage system that provides content-addressable storage with automatic deduplication, flexible access control, and multi-format package support.
## Tech Stack
- **Backend**: Python 3.12 + FastAPI
- **Frontend**: React 18 + TypeScript + Vite
- **Database**: PostgreSQL 16
- **Object Storage**: MinIO (S3-compatible)
- **Cache**: Redis (for future use)
## Features
### Currently Implemented
- **Content-Addressable Storage** - Artifacts are stored and referenced by their SHA256 hash, ensuring deduplication and data integrity
- **Project/Package/Artifact Hierarchy** - Organized storage structure:
  - **Project** - Top-level organizational container
  - **Package** - Named collection within a project
  - **Artifact** - Specific content instance identified by SHA256
- **Tags** - Alias system for referencing artifacts by human-readable names (e.g., `v1.0.0`, `latest`, `stable`)
- **Package Formats & Platforms** - Packages can be tagged with format (npm, pypi, docker, deb, rpm, etc.) and platform (linux, darwin, windows, etc.)
- **Rich Package Metadata** - Package listings include aggregated stats (tag count, artifact count, total size, latest tag)
- **S3-Compatible Backend** - Uses MinIO (or any S3-compatible storage) for artifact storage
- **PostgreSQL Metadata** - Relational database for metadata, access control, and audit trails
- **REST API** - Full HTTP API for all operations
- **Web UI** - React-based interface for managing artifacts with:
  - Hierarchical navigation (Projects → Packages → Tags/Artifacts)
  - Search, sort, and filter capabilities on all list views
  - URL-based state persistence for filters and pagination
  - Keyboard navigation (Backspace to go up hierarchy)
  - Copy-to-clipboard for artifact IDs
  - Responsive design for mobile and desktop
- **Docker Compose Setup** - Easy local development environment
- **Helm Chart** - Kubernetes deployment with PostgreSQL, MinIO, and Redis subcharts
- **Multipart Upload** - Automatic multipart upload for files larger than 100MB
- **Resumable Uploads** - API for resumable uploads with part-by-part upload support
- **Range Requests** - HTTP range request support for partial downloads
- **Format-Specific Metadata** - Automatic extraction of metadata from package formats:
  - `.deb` - Debian packages (name, version, architecture, maintainer)
  - `.rpm` - RPM packages (name, version, release, architecture)
  - `.tar.gz` / `.tgz` - Tarballs (name, version from filename)
  - `.whl` - Python wheels (name, version, author)
  - `.jar` - Java JARs (manifest info, Maven coordinates)
  - `.zip` - ZIP files (file count, uncompressed size)
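
To make the content-addressable idea concrete, here is a minimal sketch of how an artifact ID could be derived from file contents. The function name `artifact_id` and the 1 MiB read size are assumptions for illustration, not Orchard's actual implementation.

```python
import hashlib
import io
from typing import BinaryIO


def artifact_id(stream: BinaryIO, chunk_size: int = 1024 * 1024) -> str:
    """Return the hex SHA256 digest of a binary stream (hypothetical helper)."""
    digest = hashlib.sha256()
    # Read fixed-size chunks so large artifacts never sit fully in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


# Identical bytes hash to the same ID, which is what makes deduplication possible.
first = artifact_id(io.BytesIO(b"release build"))
second = artifact_id(io.BytesIO(b"release build"))
different = artifact_id(io.BytesIO(b"another build"))
print(first == second, first == different)
```

Because the ID depends only on the bytes, uploading the same file under a different name or tag would resolve to the same stored object.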
### API Endpoints
| Method | Endpoint | Description |
|--------|----------|-------------|
| `GET` | `/` | Web UI |
| `GET` | `/health` | Health check |
| `GET` | `/api/v1/projects` | List all projects |
| `POST` | `/api/v1/projects` | Create a new project |
| `GET` | `/api/v1/projects/:project` | Get project details |
| `GET` | `/api/v1/project/:project/packages` | List packages (with pagination, search, filtering) |
| `GET` | `/api/v1/project/:project/packages/:package` | Get single package with metadata |
| `POST` | `/api/v1/project/:project/packages` | Create a new package |
| `POST` | `/api/v1/project/:project/:package/upload` | Upload an artifact |
| `GET` | `/api/v1/project/:project/:package/+/:ref` | Download an artifact (supports Range header, mode param) |
| `GET` | `/api/v1/project/:project/:package/+/:ref/url` | Get presigned URL for direct S3 download |
| `HEAD` | `/api/v1/project/:project/:package/+/:ref` | Get artifact metadata without downloading |
| `GET` | `/api/v1/project/:project/:package/tags` | List tags (with pagination, search, sorting, artifact metadata) |
| `POST` | `/api/v1/project/:project/:package/tags` | Create a tag |
| `GET` | `/api/v1/project/:project/:package/tags/:tag_name` | Get single tag with artifact metadata |
| `GET` | `/api/v1/project/:project/:package/tags/:tag_name/history` | Get tag change history |
| `GET` | `/api/v1/project/:project/:package/artifacts` | List artifacts in package (with filtering) |
| `GET` | `/api/v1/project/:project/:package/consumers` | List consumers of a package |
| `GET` | `/api/v1/artifact/:id` | Get artifact metadata with referencing tags |
#### Resumable Upload Endpoints
For large files, use the resumable upload API:
| Method | Endpoint | Description |
|--------|----------|-------------|
| `POST` | `/api/v1/project/:project/:package/upload/init` | Initialize resumable upload |
| `PUT` | `/api/v1/project/:project/:package/upload/:upload_id/part/:part_number` | Upload a part |
| `POST` | `/api/v1/project/:project/:package/upload/:upload_id/complete` | Complete upload |
| `DELETE` | `/api/v1/project/:project/:package/upload/:upload_id` | Abort upload |
| `GET` | `/api/v1/project/:project/:package/upload/:upload_id/status` | Get upload status |
### Reference Formats
When downloading artifacts, the `:ref` parameter supports multiple formats:
- `latest` - Tag name directly
- `v1.0.0` - Version tag
- `tag:stable` - Explicit tag reference
- `version:2024.1` - Version reference
- `artifact:a3f5d8e12b4c6789...` - Direct SHA256 hash reference
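
The reference formats above could be told apart roughly as follows. This is a sketch only: `parse_ref` is a hypothetical helper, and treating every unprefixed reference as a tag name is an assumption based on the list above, not a description of the server's resolution logic.

```python
from typing import Tuple

# Prefixes taken from the list above.
KNOWN_PREFIXES = ("tag", "version", "artifact")


def parse_ref(ref: str) -> Tuple[str, str]:
    """Split a :ref value into (kind, value); hypothetical helper."""
    prefix, sep, value = ref.partition(":")
    if sep and prefix in KNOWN_PREFIXES and value:
        return prefix, value
    # Assumption: anything without a known prefix is looked up as a tag name.
    return "tag", ref


for example in ("latest", "v1.0.0", "tag:stable", "version:2024.1"):
    print(example, "->", parse_ref(example))
```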
## Quick Start
### Prerequisites
- Docker and Docker Compose
### Running Locally
```bash
# Start all services
docker-compose up -d
# View logs
docker-compose logs -f orchard-server
# Stop services
docker-compose down
```
### Services
| Service | Port | Description |
|---------|------|-------------|
| orchard-server | 8080 | Main API server and Web UI |
| postgres | 5432 | PostgreSQL database |
| minio | 9000 | S3-compatible object storage |
| minio (console) | 9001 | MinIO web console |
| redis | 6379 | Cache (for future use) |
### Access Points
- **Web UI**: http://localhost:8080
- **API**: http://localhost:8080/api/v1
- **API Docs**: http://localhost:8080/docs
- **MinIO Console**: http://localhost:9001 (user: `minioadmin`, pass: `minioadmin`)
## Development
### Backend (FastAPI)
```bash
cd backend
pip install -r requirements.txt
uvicorn app.main:app --reload --port 8080
```
### Frontend (React)
```bash
cd frontend
npm install
npm run dev
```
The frontend dev server proxies API requests to `localhost:8080`.
## Usage Examples
### Create a Project
```bash
curl -X POST http://localhost:8080/api/v1/projects \
  -H "Content-Type: application/json" \
  -d '{"name": "my-project", "description": "My project artifacts", "is_public": true}'
```
### Create a Package
```bash
curl -X POST http://localhost:8080/api/v1/project/my-project/packages \
  -H "Content-Type: application/json" \
  -d '{"name": "releases", "description": "Release builds", "format": "generic", "platform": "any"}'
```
Supported formats: `generic`, `npm`, `pypi`, `docker`, `deb`, `rpm`, `maven`, `nuget`, `helm`

Supported platforms: `any`, `linux`, `darwin`, `windows`, `linux-amd64`, `linux-arm64`, `darwin-amd64`, `darwin-arm64`, `windows-amd64`
### List Packages
```bash
# Basic listing
curl http://localhost:8080/api/v1/project/my-project/packages
# With pagination
curl "http://localhost:8080/api/v1/project/my-project/packages?page=1&limit=10"
# With search
curl "http://localhost:8080/api/v1/project/my-project/packages?search=release"
# With sorting
curl "http://localhost:8080/api/v1/project/my-project/packages?sort=created_at&order=desc"
# Filter by format/platform
curl "http://localhost:8080/api/v1/project/my-project/packages?format=npm&platform=linux"
```
Response includes aggregated metadata:
```json
{
  "items": [
    {
      "id": "uuid",
      "name": "releases",
      "description": "Release builds",
      "format": "generic",
      "platform": "any",
      "tag_count": 5,
      "artifact_count": 3,
      "total_size": 1048576,
      "latest_tag": "v1.0.0",
      "latest_upload_at": "2025-01-01T00:00:00Z",
      "recent_tags": [...]
    }
  ],
  "pagination": {"page": 1, "limit": 20, "total": 1, "total_pages": 1}
}
```
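
For scripted clients, the query parameters shown in the listing examples can be assembled programmatically. The sketch below uses only the Python standard library; `packages_url` is a hypothetical helper, and only the parameter names that appear in the curl examples are taken from this README.

```python
from urllib.parse import quote, urlencode


def packages_url(base_url: str, project: str, **params: object) -> str:
    """Build the package-listing URL for a project (hypothetical helper)."""
    path = f"{base_url.rstrip('/')}/api/v1/project/{quote(project, safe='')}/packages"
    # Drop parameters that were not supplied so the URL stays minimal.
    query = urlencode({key: value for key, value in params.items() if value is not None})
    return f"{path}?{query}" if query else path


print(packages_url("http://localhost:8080", "my-project", page=1, limit=10))
print(packages_url("http://localhost:8080", "my-project", format="npm", platform="linux"))
```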
### Get Single Package
```bash
curl http://localhost:8080/api/v1/project/my-project/packages/releases
# Include all tags (not just recent 5)
curl "http://localhost:8080/api/v1/project/my-project/packages/releases?include_tags=true"
```
### Upload an Artifact
```bash
curl -X POST http://localhost:8080/api/v1/project/my-project/releases/upload \
  -F "file=@./build/app-v1.0.0.tar.gz" \
  -F "tag=v1.0.0"
```
Response:
```json
{
  "artifact_id": "a3f5d8e12b4c67890abcdef1234567890abcdef1234567890abcdef12345678",
  "size": 1048576,
  "project": "my-project",
  "package": "releases",
  "tag": "v1.0.0",
  "format_metadata": {
    "format": "tarball",
    "package_name": "app",
    "version": "1.0.0"
  },
  "deduplicated": false
}
```
### Resumable Upload (for large files)
For files larger than 100MB, use the resumable upload API:
```bash
# 1. Initialize upload (client must compute SHA256 hash first)
curl -X POST http://localhost:8080/api/v1/project/my-project/releases/upload/init \
  -H "Content-Type: application/json" \
  -d '{
    "expected_hash": "a3f5d8e12b4c67890abcdef1234567890abcdef1234567890abcdef12345678",
    "filename": "large-file.tar.gz",
    "size": 524288000,
    "tag": "v2.0.0"
  }'
# Response: {"upload_id": "abc123", "already_exists": false, "chunk_size": 10485760}

# 2. Upload parts (10MB chunks recommended)
curl -X PUT http://localhost:8080/api/v1/project/my-project/releases/upload/abc123/part/1 \
  --data-binary @chunk1.bin

# 3. Complete the upload
curl -X POST http://localhost:8080/api/v1/project/my-project/releases/upload/abc123/complete \
  -H "Content-Type: application/json" \
  -d '{"tag": "v2.0.0"}'
```
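
Step 2 above uploads a single part; a real client has to split the file into parts first. The following sketch plans the parts for a given file size and the `chunk_size` returned by the init call. `plan_parts` is a hypothetical helper, and numbering parts from 1 is an assumption inferred from the `/part/1` URL in the example.

```python
from typing import List, Tuple


def plan_parts(size: int, chunk_size: int) -> List[Tuple[int, int, int]]:
    """Return (part_number, offset, length) for each part of an upload."""
    if size < 0 or chunk_size <= 0:
        raise ValueError("size must be >= 0 and chunk_size must be > 0")
    parts = []
    offset = 0
    part_number = 1  # Assumption: part numbers start at 1, as in the curl example.
    while offset < size:
        # The final part may be shorter than chunk_size.
        length = min(chunk_size, size - offset)
        parts.append((part_number, offset, length))
        offset += length
        part_number += 1
    return parts


# Values from the example above: a 524288000-byte file in 10485760-byte chunks.
parts = plan_parts(size=524288000, chunk_size=10485760)
print(len(parts), parts[0], parts[-1])
```

Each tuple says which byte range to read from the file and which part number to send it under.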
### Download an Artifact
```bash
# By tag (use -OJ to save with the correct filename from Content-Disposition header)
curl -OJ http://localhost:8080/api/v1/project/my-project/releases/+/v1.0.0
# By artifact ID
curl -OJ http://localhost:8080/api/v1/project/my-project/releases/+/artifact:a3f5d8e12b4c6789...
# Using the short URL pattern
curl -OJ http://localhost:8080/project/my-project/releases/+/latest
# Save to a specific filename
curl -o myfile.tar.gz http://localhost:8080/api/v1/project/my-project/releases/+/v1.0.0
# Partial download (range request)
curl -H "Range: bytes=0-1023" http://localhost:8080/api/v1/project/my-project/releases/+/v1.0.0
# Check file info without downloading (HEAD request)
curl -I http://localhost:8080/api/v1/project/my-project/releases/+/v1.0.0
# Download with specific mode (presigned, redirect, or proxy)
curl "http://localhost:8080/api/v1/project/my-project/releases/+/v1.0.0?mode=proxy"
# Get presigned URL for direct S3 download
curl http://localhost:8080/api/v1/project/my-project/releases/+/v1.0.0/url
```
> **Note on curl flags:**
> - `-O` saves the file under the last segment of the URL path (e.g., `latest`, `v1.0.0`)
> - `-J` tells curl to prefer the filename from the `Content-Disposition` header (e.g., `app-v1.0.0.tar.gz`); it only takes effect together with `-O`
> - `-OJ` combines both: download to a file using the server-provided filename
> - `-o <filename>` saves to a specific filename you choose
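
When issuing range requests from code rather than curl, remember that HTTP byte ranges are inclusive on both ends. A small sketch (the helper name `range_header` is hypothetical):

```python
from typing import Dict


def range_header(start: int, length: int) -> Dict[str, str]:
    """Build a Range header covering `length` bytes starting at `start`."""
    if start < 0 or length <= 0:
        raise ValueError("start must be >= 0 and length must be > 0")
    # HTTP byte ranges include the end position, hence the "- 1".
    end = start + length - 1
    return {"Range": f"bytes={start}-{end}"}


# The first 1024 bytes, matching the curl example above.
print(range_header(0, 1024))
```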
#### Download Modes
Orchard supports three download modes, configurable via `ORCHARD_DOWNLOAD_MODE` or per-request with `?mode=`:
| Mode | Description | Use Case |
|------|-------------|----------|
| `presigned` (default) | Returns JSON with a presigned S3 URL | Clients that handle redirects themselves, web UIs |
| `redirect` | Returns HTTP 302 redirect to presigned S3 URL | Simple clients, browsers, wget |
| `proxy` | Streams content through the backend | When S3 isn't directly accessible to clients |
**Presigned URL Response:**
```json
{
  "url": "https://minio.example.com/bucket/...",
  "expires_at": "2025-01-01T01:00:00Z",
  "method": "GET",
  "artifact_id": "a3f5d8e...",
  "size": 1048576,
  "content_type": "application/gzip",
  "original_name": "app-v1.0.0.tar.gz",
  "checksum_sha256": "a3f5d8e...",
  "checksum_md5": "d41d8cd..."
}
```
> **Note:** For presigned URLs to work, clients must be able to reach the S3 endpoint directly. In Kubernetes, this requires exposing MinIO via ingress (see Helm configuration below).
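
Since presigned URLs expire, a client that caches the JSON response may want to check `expires_at` before reusing it. A minimal sketch, assuming the timestamp is always ISO 8601 in UTC as in the sample response; `presigned_url_is_usable` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Optional


def presigned_url_is_usable(response: dict, now: Optional[datetime] = None) -> bool:
    """Return True while the presigned URL in `response` has not expired."""
    now = now or datetime.now(timezone.utc)
    # Replace the trailing "Z" so fromisoformat also works on Python < 3.11.
    expires_at = datetime.fromisoformat(response["expires_at"].replace("Z", "+00:00"))
    return now < expires_at


sample = {"url": "https://minio.example.com/bucket/...", "expires_at": "2025-01-01T01:00:00Z"}
print(presigned_url_is_usable(sample, now=datetime(2025, 1, 1, 0, 30, tzinfo=timezone.utc)))
```

If the check fails, requesting a fresh URL from the `/url` endpoint is the simplest recovery.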
### Create a Tag
```bash
curl -X POST http://localhost:8080/api/v1/project/my-project/releases/tags \
  -H "Content-Type: application/json" \
  -d '{"name": "stable", "artifact_id": "a3f5d8e12b4c6789..."}'
```
### List Tags
```bash
# Basic listing with artifact metadata
curl http://localhost:8080/api/v1/project/my-project/releases/tags
# With pagination
curl "http://localhost:8080/api/v1/project/my-project/releases/tags?page=1&limit=10"
# Search by tag name
curl "http://localhost:8080/api/v1/project/my-project/releases/tags?search=v1"
# Sort by created_at descending
curl "http://localhost:8080/api/v1/project/my-project/releases/tags?sort=created_at&order=desc"
```
Response includes artifact metadata:
```json
{
  "items": [
    {
      "id": "uuid",
      "package_id": "uuid",
      "name": "v1.0.0",
      "artifact_id": "a3f5d8e...",
      "created_at": "2025-01-01T00:00:00Z",
      "created_by": "user",
      "artifact_size": 1048576,
      "artifact_content_type": "application/gzip",
      "artifact_original_name": "app-v1.0.0.tar.gz",
      "artifact_created_at": "2025-01-01T00:00:00Z",
      "artifact_format_metadata": {}
    }
  ],
  "pagination": {"page": 1, "limit": 20, "total": 1, "total_pages": 1}
}
```
### Get Single Tag
```bash
curl http://localhost:8080/api/v1/project/my-project/releases/tags/v1.0.0
```
### Get Tag History
```bash
curl http://localhost:8080/api/v1/project/my-project/releases/tags/latest/history
```
Returns the list of artifact changes recorded for the tag, most recent first.
### List Artifacts in Package
```bash
# Basic listing
curl http://localhost:8080/api/v1/project/my-project/releases/artifacts
# Filter by content type
curl "http://localhost:8080/api/v1/project/my-project/releases/artifacts?content_type=application/gzip"
# Filter by date range
curl "http://localhost:8080/api/v1/project/my-project/releases/artifacts?created_after=2025-01-01T00:00:00Z"
```
Response includes tags pointing to each artifact:
```json
{
  "items": [
    {
      "id": "a3f5d8e...",
      "size": 1048576,
      "content_type": "application/gzip",
      "original_name": "app-v1.0.0.tar.gz",
      "created_at": "2025-01-01T00:00:00Z",
      "created_by": "user",
      "format_metadata": {},
      "tags": ["v1.0.0", "latest", "stable"]
    }
  ],
  "pagination": {"page": 1, "limit": 20, "total": 1, "total_pages": 1}
}
```
### Get Artifact by ID
```bash
curl http://localhost:8080/api/v1/artifact/a3f5d8e12b4c67890abcdef1234567890abcdef1234567890abcdef12345678
```
Response includes all tags/packages referencing the artifact:
```json
{
  "id": "a3f5d8e...",
  "size": 1048576,
  "content_type": "application/gzip",
  "original_name": "app-v1.0.0.tar.gz",
  "created_at": "2025-01-01T00:00:00Z",
  "created_by": "user",
  "ref_count": 2,
  "format_metadata": {},
  "tags": [
    {
      "id": "uuid",
      "name": "v1.0.0",
      "package_id": "uuid",
      "package_name": "releases",
      "project_name": "my-project"
    }
  ]
}
```
## Project Structure
```
orchard/
├── backend/
│   ├── app/
│   │   ├── __init__.py
│   │   ├── config.py             # Pydantic settings
│   │   ├── database.py           # SQLAlchemy setup and migrations
│   │   ├── main.py               # FastAPI application
│   │   ├── metadata.py           # Format-specific metadata extraction
│   │   ├── models.py             # SQLAlchemy models
│   │   ├── routes.py             # API endpoints
│   │   ├── schemas.py            # Pydantic schemas
│   │   └── storage.py            # S3 storage layer with multipart support
│   └── requirements.txt
├── frontend/
│   ├── src/
│   │   ├── components/           # Reusable UI components
│   │   │   ├── Badge.tsx         # Status/type badges
│   │   │   ├── Breadcrumb.tsx    # Navigation breadcrumbs
│   │   │   ├── Card.tsx          # Card containers
│   │   │   ├── DataTable.tsx     # Sortable data tables
│   │   │   ├── FilterChip.tsx    # Active filter chips
│   │   │   ├── Pagination.tsx    # Page navigation
│   │   │   ├── SearchInput.tsx   # Debounced search
│   │   │   └── SortDropdown.tsx  # Sort field selector
│   │   ├── pages/                # Page components
│   │   │   ├── Home.tsx          # Project list
│   │   │   ├── ProjectPage.tsx   # Package list within project
│   │   │   └── PackagePage.tsx   # Tag/artifact list within package
│   │   ├── api.ts                # API client with pagination support
│   │   ├── types.ts              # TypeScript interfaces
│   │   ├── App.tsx
│   │   └── main.tsx
│   ├── index.html
│   ├── package.json
│   ├── tsconfig.json
│   └── vite.config.ts
├── helm/
│   └── orchard/                  # Helm chart
├── Dockerfile                    # Multi-stage build (Node + Python)
├── docker-compose.yml            # Local development stack
└── .gitlab-ci.yml                # CI/CD pipeline
```
## Configuration
Configuration is provided via environment variables prefixed with `ORCHARD_`:
| Environment Variable | Description | Default |
|---------------------|-------------|---------|
| `ORCHARD_SERVER_HOST` | Server bind address | `0.0.0.0` |
| `ORCHARD_SERVER_PORT` | Server port | `8080` |
| `ORCHARD_DATABASE_HOST` | PostgreSQL host | `localhost` |
| `ORCHARD_DATABASE_PORT` | PostgreSQL port | `5432` |
| `ORCHARD_DATABASE_USER` | PostgreSQL user | `orchard` |
| `ORCHARD_DATABASE_PASSWORD` | PostgreSQL password | - |
| `ORCHARD_DATABASE_DBNAME` | PostgreSQL database | `orchard` |
| `ORCHARD_S3_ENDPOINT` | S3 endpoint URL | - |
| `ORCHARD_S3_REGION` | S3 region | `us-east-1` |
| `ORCHARD_S3_BUCKET` | S3 bucket name | `orchard-artifacts` |
| `ORCHARD_S3_ACCESS_KEY_ID` | S3 access key | - |
| `ORCHARD_S3_SECRET_ACCESS_KEY` | S3 secret key | - |
| `ORCHARD_DOWNLOAD_MODE` | Download mode: `presigned`, `redirect`, or `proxy` | `presigned` |
| `ORCHARD_PRESIGNED_URL_EXPIRY` | Presigned URL expiry in seconds | `3600` |
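
As an illustration of how these variables and defaults fit together, the sketch below reads the two download-related settings from the environment. It uses only the standard library and is not the project's `config.py` (which uses Pydantic settings); `DownloadSettings` and `load_download_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Modes and defaults taken from the table above.
VALID_MODES = ("presigned", "redirect", "proxy")


@dataclass(frozen=True)
class DownloadSettings:
    mode: str
    presigned_url_expiry: int  # seconds


def load_download_settings(env: Mapping[str, str] = os.environ) -> DownloadSettings:
    """Read download settings from ORCHARD_-prefixed variables (illustrative only)."""
    mode = env.get("ORCHARD_DOWNLOAD_MODE", "presigned")
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported download mode: {mode!r}")
    expiry = int(env.get("ORCHARD_PRESIGNED_URL_EXPIRY", "3600"))
    return DownloadSettings(mode=mode, presigned_url_expiry=expiry)


print(load_download_settings({"ORCHARD_DOWNLOAD_MODE": "proxy"}))
```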
## Kubernetes Deployment
### Using Helm
```bash
# Add Bitnami repo for dependencies
helm repo add bitnami https://charts.bitnami.com/bitnami
# Update dependencies (run from the repository root)
helm dependency update ./helm/orchard
# Install
helm install orchard ./helm/orchard -n orchard --create-namespace
# Install with custom values
helm install orchard ./helm/orchard -f my-values.yaml
```
### Helm Configuration
Key configuration options in `values.yaml`:
```yaml
orchard:
  # Download configuration
  download:
    mode: "presigned"  # presigned, redirect, or proxy
    presignedUrlExpiry: 3600

# MinIO ingress (required for presigned URL downloads)
minioIngress:
  enabled: true
  className: "nginx"
  annotations:
    cert-manager.io/cluster-issuer: "letsencrypt"
  host: "minio.your-domain.com"
  tls:
    enabled: true
    secretName: minio-tls
```
When `minioIngress.enabled` is `true`, the S3 endpoint automatically uses the external URL (`https://minio.your-domain.com`), making presigned URLs accessible to external clients.
See `helm/orchard/values.yaml` for all configuration options.
## Database Schema
### Core Tables
- **projects** - Top-level organizational containers
- **packages** - Collections within projects
- **artifacts** - Content-addressable artifacts (SHA256)
- **tags** - Aliases pointing to artifacts
- **tag_history** - Audit trail for tag changes
- **uploads** - Upload event records
- **consumers** - Dependency tracking
- **access_permissions** - Project-level access control
- **api_keys** - Programmatic access tokens
- **audit_logs** - Immutable operation logs
## Future Work
The following features are planned but not yet implemented:
- [ ] CLI tool (`orchard` command)
- [ ] Dependency file parsing
- [ ] Lock file generation
- [ ] Export/Import for air-gapped systems
- [ ] Consumer notification
- [ ] Automated update propagation
- [ ] OIDC/SAML authentication
- [ ] API key management
- [ ] Redis caching layer
- [ ] Garbage collection for orphaned artifacts
## License
Internal use only.