Comprehensive design for:
- HTTP connection pooling with lifecycle management
- Redis caching layer (TTL for discovery, permanent for immutable)
- Abstract PackageProxyBase for multi-protocol support (npm, Maven)
- Database query optimization with batch operations
- Dependency resolution caching for ensure files
- Observability via health endpoints
Maintains hermetic build guarantees: artifact content and extracted
metadata are immutable, only discovery data uses TTL-based caching.
Wheel METADATA files can list the same dependency multiple times under
different extras (e.g., bokeh appears under [docs] and [bokeh-tests]).
This caused unique constraint violations when storing dependencies.
Fix by deduplicating extracted deps before DB insertion.
Adds the tag removal migration to the inline migrations in database.py:
- Drops tag-related triggers and functions
- Removes tag_constraint column from artifact_dependencies
- Makes version_constraint NOT NULL
- Drops tags and tag_history tables
- Renames uploads.tag_name to version
Re-adds the dependency extraction that was accidentally removed with the
proactive caching feature. Now when a PyPI package is cached:
1. Extract METADATA from wheel or PKG-INFO from sdist
2. Parse Requires-Dist lines for dependencies
3. Store in artifact_dependencies table
This restores the dependency graph functionality for PyPI packages.
Adds ORCHARD_PYPI_DOWNLOAD_MODE setting (default: "redirect"):
- "redirect": Redirect pip to S3 presigned URL - reduces pod bandwidth
- "proxy": Stream through Orchard pod - for environments where clients can't reach S3
In redirect mode, Orchard only handles metadata requests and upstream fetches.
All file transfers go directly from S3 to the client.
The list_package_artifacts endpoint was only querying artifacts via the
Upload table. PyPI proxy creates PackageVersion records but not Upload
records, so cached packages would show stats (size, version count) but
no artifacts in the listing.
Now queries artifacts from both Upload and PackageVersion tables using
a union, so PyPI-cached packages display their artifacts correctly.
Large packages like TensorFlow (~600MB) caused read timeouts because the
entire file was loaded into memory before responding to the client. Now
the file is stored to S3 first, then streamed back using StreamingResponse.
- Add sha256 field to list_package_artifacts response (artifact ID is SHA256)
- Add version field to PackageArtifactResponse schema
- Add version field to frontend PackageArtifact type
- Update getArtifactVersion to prefer direct version field
- Fix upload response to return actual version (not requested version)
when artifact already has a version in the package
- Update ref_count tests to use multiple packages (one version per
artifact per package design constraint)
- Remove allow_public_internet references from upstream caching tests
- Update consistency check test to not assert global system health
- Add versions field to artifact schemas
- Fix dependencies resolution to handle removed tag constraint
- Update CacheRequest test to use version field
- Fix upload_test_file calls that still used tag parameter
- Update artifact history test to check versions instead of tags
- Update artifact stats tests to check versions instead of tags
- Fix garbage collection tests to delete versions instead of tags
- Remove TestGlobalTags class (endpoint removed)
- Update project/package stats tests to check version_count
- Fix upload_test_file fixture in test_download_verification
- Remove Tag/TagHistory model tests from unit tests
- Update CacheSettings tests to remove allow_public_internet field
- Replace tag= with version= in upload_test_file calls
- Update test assertions to use versions instead of tags
- Remove tests for tag: prefix downloads (now uses version:)
- Update dependency tests for version-only schema
Tags were mutable aliases that caused confusion alongside the immutable
version system. This removes tags entirely, keeping only PackageVersion
for artifact references.
Changes:
- Remove tags and tag_history tables (migration 012)
- Remove Tag model, TagRepository, and 6 tag API endpoints
- Update cache system to create versions instead of tags
- Update frontend to display versions instead of tags
- Remove tag-related schemas and types
- Update artifact cleanup service for version-based ref_count
The artifacts endpoint only supports sorting by: created_at, size, original_name
But the frontend was defaulting to 'name' (from the old tags endpoint).
- Change default sort from 'name' to 'created_at'
- Change default order from 'asc' to 'desc' (newest first)
- Remove sortable flag from version/tags columns (not DB fields)
- Add sortable flag to original_name and size columns
The circular dependency error '_pypi/psutil → _pypi/psutil' occurred because
dependencies with extras like 'psutil[test]' weren't being recognized as
self-dependencies. The comparison 'psutil[test] != psutil' failed.
- Add _normalize_pypi_package_name() helper that strips extras brackets
and normalizes separators per PEP 503
- Update _detect_package_cycle to use normalized names for cycle detection
- Update check_circular_dependencies to use normalized initial path
- Simplify self-dependency check in resolve_dependencies to use helper
- Add artifact: prefix handling in resolve_dependencies for direct artifact
ID references, enabling dependency resolution for tagless artifacts
- Refactor PackagePage from tag-based to artifact-based data display
- Add PackageArtifact type with tags array for artifact-centric API responses
- Update download URLs to use artifact:ID prefix when no tags exist
- Conditionally show "View Ensure File" only when artifact has tags
- Add _parse_upstream_error() to extract policy messages from JFrog/Artifactory
- Pass through 403 and other 4xx errors with detailed messages
- Pin babel and electron-to-chromium to older versions for CI compatibility
- Install reactflow and dagre for professional graph visualization
- Use dagre for automatic tree layout (top-to-bottom)
- Custom styled nodes with package name, version, and size
- Built-in zoom/pan controls and minimap
- Click nodes to navigate to package page
- Cleaner, more professional appearance
- Add artifact-level self-dependency check (skip if dep resolves to same artifact)
- Close dependency graph modal if package has no dependencies to show
(only root package with no children and no missing deps)
PyPI packages can have self-referential dependencies for extras
(e.g., pytest[testing] depends on pytest). These were incorrectly
detected as circular dependencies. Now we skip them.
The backend returns detail as an object for some errors (circular dependency,
conflicts, etc.). The API client now JSON.stringifies object details so they
can be properly parsed by error handlers like DependencyGraph.
When dependencies are not cached on the server (common since we removed
proactive caching), the dependency graph now:
- Continues resolving what it can find
- Shows missing dependencies in a separate section with amber styling
- Displays the constraint and which package required them
- Updates the header stats to show "X cached • Y not cached"
This provides a better user experience than showing an error when
some dependencies haven't been downloaded yet.
When a dependency has an invalid version constraint like '>=' (without
a version number), the resolver now treats it as a wildcard and returns
the latest available version instead of failing with 'Dependency not found'.
This handles malformed metadata that may have been stored from PyPI packages.
The background task queue for proactively caching package dependencies was
causing server instability and unnecessary growth. The PyPI proxy now only
caches packages on-demand when users request them.
Removed:
- PyPI cache worker (background task queue and worker pool)
- PyPICacheTask model and related database schema
- Cache management API endpoints (/pypi/cache/*)
- Background Jobs admin dashboard
- Dependency extraction and queueing logic
Kept:
- On-demand package caching (still works when users request packages)
- Async httpx for non-blocking downloads (prevents health check failures)
- URL-based cache lookups for deduplication
The pypi_download_file, pypi_simple_index, and pypi_package_versions endpoints
were using synchronous httpx.Client inside async functions. When upstream PyPI
servers respond slowly, this blocked the entire FastAPI event loop, preventing
health checks from responding. Kubernetes would then kill the pod after the
liveness probe timed out.
Changes:
- httpx.Client → httpx.AsyncClient
- client.get() → await client.get()
- response.iter_bytes() → response.aiter_bytes()
This ensures the event loop remains responsive during slow upstream downloads,
allowing health checks to succeed even when downloads take 20+ seconds.
- Remove "All Jobs" title
- Move Status column to front of table
- Add Cancel button for in-progress jobs
- Add cancel endpoint: POST /pypi/cache/cancel/{package_name}
- Add btn-danger CSS styling
- Download packages in 64KB chunks to temp file instead of loading into memory
- Upload to S3 from temp file (streaming)
- Clean up temp file after processing
- Reduces memory footprint from 2x file size to 1x file size
- Add orchard.pypiCache config section to helm values
- Set default workers to 2 (reduced from 5 to limit memory)
- Bump pod memory from 512Mi to 768Mi (request=limit)
- Add ORCHARD_PYPI_CACHE_* env vars to deployment template