263 Commits

Author SHA1 Message Date
Mondo Diaz
561b82da92 chore: remove unused auto_fetch_max_depth config setting 2026-02-04 14:04:35 -06:00
Mondo Diaz
0fb69a6aaa feat: remove fetch depth limit for dependency resolution
Real package managers (pip, npm, Maven) don't have depth limits - they
resolve the full dependency tree. We have other safeguards:
- Loop prevention via fetch_attempted set
- Timeout via auto_fetch_timeout setting
- Dependency trees are finite
2026-02-04 13:55:53 -06:00
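The safeguards listed in 0fb69a6aaa (a fetch_attempted set plus a timeout, with no depth limit) can be illustrated with a minimal sketch. This is not the project's resolver: `resolve_all`, `get_deps`, and `timeout_s` are hypothetical names, and the breadth-first order is an assumption made for the example.

```python
import time
from collections import deque


def resolve_all(root, get_deps, timeout_s=5.0):
    """Walk the full dependency tree with no depth limit.

    get_deps is a hypothetical callable: package name -> list of dependency names.
    """
    deadline = time.monotonic() + timeout_s
    fetch_attempted = set()  # loop prevention: each package is visited at most once
    queue = deque([root])
    order = []
    while queue:
        if time.monotonic() > deadline:
            raise TimeoutError("dependency resolution exceeded timeout")
        name = queue.popleft()
        if name in fetch_attempted:
            continue  # already handled, so cycles cannot loop forever
        fetch_attempted.add(name)
        order.append(name)
        queue.extend(get_deps(name))
    return order


# A small graph containing a cycle (b depends back on a).
graph = {"a": ["b", "c"], "b": ["c", "a"], "c": []}
resolved = resolve_all("a", lambda n: graph.get(n, []))
```

Because every package enters fetch_attempted exactly once, the walk ends after at most one visit per distinct package, which is why a separate depth cap is not needed for termination.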
Mondo Diaz
f1ac43c1cb fix: use lenient conflict handling for dependency resolution
Instead of failing with 409 on version conflicts, use "first version wins"
strategy. This allows resolution to succeed for complex dependency trees
like tensorflow where transitive dependencies may have overlapping but
not identical version requirements.

The resolver now:
- Checks if an already-resolved version satisfies a new constraint
- If yes, reuses the existing version
- If no, logs the mismatch and uses the first-encountered version

This matches pip's behavior of picking a working version rather than
failing on theoretical conflicts.
2026-02-04 13:45:15 -06:00
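The "first version wins" strategy from f1ac43c1cb might look roughly like the sketch below. Everything here is illustrative: `satisfies` is a hypothetical stand-in for a real specifier check (versions are plain tuples and a constraint is a lower-inclusive, upper-exclusive pair), and picking the newest matching candidate on first encounter is an assumption, not something the commit states.

```python
import logging

log = logging.getLogger("resolver")


def satisfies(version, constraint):
    """Hypothetical specifier check: lower <= version < upper."""
    lower, upper = constraint
    return lower <= version < upper


def pick_version(resolved, name, constraint, candidates):
    """Return the version to use for `name` under a 'first version wins' policy."""
    if name in resolved:
        existing = resolved[name]
        if not satisfies(existing, constraint):
            # Lenient handling: log the mismatch instead of failing with a conflict.
            log.warning("%s: keeping %s despite constraint %s", name, existing, constraint)
        return existing
    # First encounter (assumption): choose the newest candidate that matches.
    matching = [v for v in candidates if satisfies(v, constraint)]
    if not matching:
        raise LookupError(f"no version of {name} satisfies {constraint}")
    resolved[name] = max(matching)
    return resolved[name]


resolved = {}
candidates = [(1, 0), (1, 5), (2, 0)]
first = pick_version(resolved, "example-pkg", ((1, 0), (2, 0)), candidates)
reused = pick_version(resolved, "example-pkg", ((1, 2), (3, 0)), candidates)
kept = pick_version(resolved, "example-pkg", ((2, 0), (3, 0)), candidates)
```

The third call carries a constraint the stored version does not meet, yet it still returns the first-encountered version and only logs the mismatch.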
Mondo Diaz
23ffbada00 feat: increase auto_fetch_max_depth from 3 to 10 2026-02-04 13:38:57 -06:00
Mondo Diaz
2ea3a39416 fix: prevent false circular dependency detection on self-dependencies
When packages like pytest have extras (e.g., pytest[testing]) that depend
on the base package, the resolution was incorrectly detecting this as a
circular dependency.

Add a check that skips dependencies resolving to an artifact
already in the visiting set, preventing the false cycle detection while
still catching real circular dependencies.
2026-02-04 13:19:03 -06:00
Mondo Diaz
7bfec020c8 feat: change auto_fetch default to true
Auto-fetching missing dependencies from upstream is the more useful default
behavior. Users who need fast, network-free resolution can explicitly set
auto_fetch=false.

Artifacts are content-addressed by SHA256, so reproducibility concerns don't
apply - the same version always produces the same artifact.
2026-02-04 12:23:01 -06:00
Mondo Diaz
6b9863f9c3 fix: fetch root artifact from upstream when missing in auto_fetch mode
When auto_fetch=true and the root artifact doesn't exist locally in a
system project (_pypi), now attempts to fetch it from upstream before
starting dependency resolution. Also fixed a bug where fetched_artifacts
was being redeclared, which would lose the root artifact from the list.
2026-02-04 12:18:44 -06:00
Mondo Diaz
5cff4092e3 feat: add auto-fetch for missing dependencies from upstream registries
Add auto_fetch parameter to dependency resolution endpoint that fetches
missing dependencies from upstream registries (PyPI) when resolving.

- Add RegistryClient abstraction with PyPIRegistryClient implementation
- Extract fetch_and_cache_pypi_package() for reuse
- Add resolve_dependencies_with_fetch() async function
- Extend MissingDependency schema with fetch_attempted/fetch_error
- Add fetched list to DependencyResolutionResponse
- Add auto_fetch_max_depth config setting (default: 3)
- Remove Usage section from Package page UI
- Add 6 integration tests for auto-fetch functionality
2026-02-04 12:01:49 -06:00
Mondo Diaz
b82bd1c85a fix: remove dead code and security issue from code review
- Remove unused _get_pypi_upstream_sources_cached function (never called)
- Remove unused CacheService import and get_cache helper
- Remove unused cache parameter from pypi_download_file
- Fix asyncio.get_event_loop() deprecation - use get_running_loop()
- Note: The caching implementation was incomplete but the other
  performance improvements (connection pooling, batch DB ops) remain
2026-02-04 10:57:32 -06:00
Mondo Diaz
632bf54087 fix: correct test imports and health endpoint assertions
- Fix import in test_db_utils.py: use app.models instead of backend.app.models
- Update health endpoint test to expect 'ok' status and infrastructure keys
- Add CHANGELOG entries for PyPI proxy performance improvements
2026-02-04 10:37:12 -06:00
Mondo Diaz
170561b32a feat: add infrastructure status to health endpoint 2026-02-04 09:54:45 -06:00
Mondo Diaz
6e05697ae2 infra: enable Redis in Helm chart values for all environments 2026-02-04 09:53:38 -06:00
Mondo Diaz
08b6589712 test: add infrastructure integration tests for pypi_proxy 2026-02-04 09:53:02 -06:00
Mondo Diaz
7ad5a15ef4 perf: use batch dependency storage in pypi_proxy 2026-02-04 09:52:16 -06:00
Mondo Diaz
8fdb73901e perf: use shared HTTP client pool in pypi_download_file 2026-02-04 09:51:05 -06:00
Mondo Diaz
79dd7b833e perf: cache upstream sources lookup in pypi_proxy 2026-02-04 09:49:59 -06:00
Mondo Diaz
71089aee0e refactor: add infrastructure dependency injection to pypi_proxy
Add dependency injection helper functions for HttpClientManager
and CacheService, along with imports for the new infrastructure
modules (http_client, cache_service, db_utils).
2026-02-04 09:49:04 -06:00
Mondo Diaz
ffe0529ea8 feat: add ArtifactRepository with batch DB operations
Add optimized database operations for artifact storage:
- Atomic upserts using ON CONFLICT for artifact creation
- Batch inserts for dependencies to eliminate N+1 queries
- Joined queries for cached URL lookups
- All methods include comprehensive unit tests
2026-02-04 09:48:08 -06:00
Mondo Diaz
146ca2ad74 feat: integrate HttpClientManager and CacheService into lifespan 2026-02-04 09:45:09 -06:00
Mondo Diaz
a045509fe4 feat: add CacheService with Redis caching and graceful fallback
Implements Redis-backed caching with category-aware TTL management:
- Immutable categories (artifact metadata, dependencies) cached forever
- Mutable categories (index pages, upstream sources) use configurable TTL
- Graceful fallback when Redis unavailable or disabled
- Pattern-based invalidation for bulk cache clearing
2026-02-04 09:44:12 -06:00
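The category-aware TTL idea in a045509fe4 can be sketched without Redis. The class below is an in-memory stand-in, not the actual CacheService: the category names, TTL values, `TTLCache`, and its `enabled` flag are all hypothetical, chosen only to show immutable categories that never expire, mutable categories that do, and a disabled cache that degrades to "always miss".

```python
import time

# Hypothetical categories; None means the entry never expires.
CATEGORY_TTL = {
    "artifact_metadata": None,
    "dependencies": None,
    "index_page": 300,
    "upstream_sources": 60,
}


class TTLCache:
    def __init__(self, enabled=True, clock=time.monotonic):
        self.enabled = enabled
        self._clock = clock
        self._store = {}

    def set(self, category, key, value):
        if not self.enabled:
            return  # fallback: skip caching rather than raise
        ttl = CATEGORY_TTL[category]
        expires = None if ttl is None else self._clock() + ttl
        self._store[(category, key)] = (value, expires)

    def get(self, category, key):
        if not self.enabled:
            return None  # fallback: behave as a permanent cache miss
        entry = self._store.get((category, key))
        if entry is None:
            return None
        value, expires = entry
        if expires is not None and self._clock() >= expires:
            del self._store[(category, key)]  # expired mutable entry
            return None
        return value


# Drive the cache with a fake clock so expiry is deterministic.
now = [0.0]
cache = TTLCache(clock=lambda: now[0])
cache.set("index_page", "requests", "<html>")
cache.set("artifact_metadata", "abc123", {"size": 42})
now[0] = 301.0
expired_page = cache.get("index_page", "requests")
kept_metadata = cache.get("artifact_metadata", "abc123")
```

Callers treat a `None` result as a miss and fall through to the real lookup, so a disabled or unavailable cache changes performance but not behavior.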
Mondo Diaz
14806b05f0 feat: add HttpClientManager with connection pooling
Add HttpClientManager class for managing httpx.AsyncClient pools with
FastAPI lifespan integration. Features include:
- Default shared connection pool for general requests
- Configurable max connections, keep-alive, and timeouts
- Dedicated thread pool for blocking I/O operations
- Graceful startup/shutdown lifecycle management
- Per-upstream client isolation support (for future use)

Includes comprehensive unit tests covering initialization, startup,
shutdown, client retrieval, blocking operations, idempotency, and
error handling.
2026-02-04 09:16:27 -06:00
Mondo Diaz
c67004af52 config: add HTTP pool, Redis, and updated DB pool settings 2026-02-04 09:12:01 -06:00
Mondo Diaz
8c6ba01a73 deps: add redis-py for caching layer 2026-02-04 09:11:12 -06:00
Mondo Diaz
196f3f957c docs: add detailed implementation plan for PyPI proxy performance 2026-02-04 09:05:18 -06:00
Mondo Diaz
9cadfa3b1b Add PyPI proxy performance & multi-protocol architecture design
Comprehensive design for:
- HTTP connection pooling with lifecycle management
- Redis caching layer (TTL for discovery, permanent for immutable)
- Abstract PackageProxyBase for multi-protocol support (npm, Maven)
- Database query optimization with batch operations
- Dependency resolution caching for ensure files
- Observability via health endpoints

Maintains hermetic build guarantees: artifact content and extracted
metadata are immutable, only discovery data uses TTL-based caching.
2026-02-04 08:56:40 -06:00
Mondo Diaz
19e034ef56 Fix duplicate dependency extraction from PyPI wheel METADATA
Wheel METADATA files can list the same dependency multiple times under
different extras (e.g., bokeh appears under [docs] and [bokeh-tests]).
This caused unique constraint violations when storing dependencies.

Fix by deduplicating extracted deps before DB insertion.
2026-02-03 17:43:38 -06:00
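The deduplication step from 19e034ef56 could be as small as the sketch below. The function name and the choice of (lowercased name, constraint) as the uniqueness key are assumptions; the actual unique constraint on the dependencies table is not shown in the commit.

```python
def dedupe_dependencies(deps):
    """Drop repeated (name, constraint) pairs while preserving first-seen order."""
    seen = set()
    unique = []
    for name, constraint in deps:
        key = (name.lower(), constraint)  # assumed uniqueness key
        if key not in seen:
            seen.add(key)
            unique.append((name, constraint))
    return unique


# bokeh appears twice, as it would when listed under two different extras.
extracted = [("bokeh", ">=2.4"), ("numpy", ">=1.20"), ("bokeh", ">=2.4")]
deduped = dedupe_dependencies(extracted)
```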
Mondo Diaz
45a48cc1ee Add inline migration for tag removal (024_remove_tags)
Adds the tag removal migration to the inline migrations in database.py:
- Drops tag-related triggers and functions
- Removes tag_constraint column from artifact_dependencies
- Makes version_constraint NOT NULL
- Drops tags and tag_history tables
- Renames uploads.tag_name to version
2026-02-03 17:22:40 -06:00
Mondo Diaz
7068f36cb5 Restore dependency extraction from PyPI packages
Re-adds the dependency extraction that was accidentally removed with the
proactive caching feature. Now when a PyPI package is cached:
1. Extract METADATA from wheel or PKG-INFO from sdist
2. Parse Requires-Dist lines for dependencies
3. Store in artifact_dependencies table

This restores the dependency graph functionality for PyPI packages.
2026-02-03 17:18:54 -06:00
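Step 2 of 7068f36cb5 (parsing Requires-Dist lines) can be done with the standard library, since METADATA and PKG-INFO use an email-header style format. The helper name and the sample metadata are made up for illustration; how the project itself parses these files is not shown in the commit.

```python
from email.parser import Parser


def parse_requires_dist(metadata_text):
    """Return the raw Requires-Dist values from a METADATA or PKG-INFO string."""
    msg = Parser().parsestr(metadata_text)
    # get_all returns None when the header is absent, so normalize to a list.
    return msg.get_all("Requires-Dist") or []


sample = (
    "Metadata-Version: 2.1\n"
    "Name: demo\n"
    "Version: 1.0\n"
    "Requires-Dist: requests (>=2.0)\n"
    "Requires-Dist: bokeh ; extra == 'docs'\n"
    "\n"
    "Long description goes here.\n"
)
requirements = parse_requires_dist(sample)
```

Each returned string still carries its version specifier and any environment marker, so a further parsing step is needed before storing rows.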
Mondo Diaz
e471202f2e Fix SQLAlchemy subquery warning in artifact listing 2026-02-03 17:10:34 -06:00
Mondo Diaz
d12e4cdfc5 Add configurable PyPI download mode (redirect vs proxy)
Adds ORCHARD_PYPI_DOWNLOAD_MODE setting (default: "redirect"):
- "redirect": Redirect pip to S3 presigned URL - reduces pod bandwidth
- "proxy": Stream through Orchard pod - for environments where clients can't reach S3

In redirect mode, Orchard only handles metadata requests and upstream fetches.
All file transfers go directly from S3 to the client.
2026-02-03 17:09:05 -06:00
Mondo Diaz
1ffe17bf62 Fix artifact listing to include PyPI proxy cached packages
The list_package_artifacts endpoint was only querying artifacts via the
Upload table. PyPI proxy creates PackageVersion records but not Upload
records, so cached packages would show stats (size, version count) but
no artifacts in the listing.

Now queries artifacts from both Upload and PackageVersion tables using
a union, so PyPI-cached packages display their artifacts correctly.
2026-02-03 16:46:35 -06:00
Mondo Diaz
c21af708af Fix PyPI proxy timeout by streaming from S3 instead of loading into memory
Large packages like TensorFlow (~600MB) caused read timeouts because the
entire file was loaded into memory before responding to the client. Now
the file is stored to S3 first, then streamed back using StreamingResponse.
2026-02-03 16:42:30 -06:00
Mondo Diaz
1ae989249b Fix PackageArtifactResponse missing sha256 and version fields
- Add sha256 field to list_package_artifacts response (artifact ID is SHA256)
- Add version field to PackageArtifactResponse schema
- Add version field to frontend PackageArtifact type
- Update getArtifactVersion to prefer direct version field
2026-02-03 16:24:31 -06:00
Mondo Diaz
c0c8603d05 Fix migrations 008 and 011 to handle removed tags table 2026-02-03 16:05:29 -06:00
Mondo Diaz
2501ba21d4 Fix migration 005 to not create indexes on removed tags table 2026-02-03 16:01:09 -06:00
Mondo Diaz
c94fe0389b Fix tests for tag removal and version behavior
- Fix upload response to return actual version (not requested version)
  when artifact already has a version in the package
- Update ref_count tests to use multiple packages (one version per
  artifact per package design constraint)
- Remove allow_public_internet references from upstream caching tests
- Update consistency check test to not assert global system health
- Add versions field to artifact schemas
- Fix dependencies resolution to handle removed tag constraint
2026-02-03 15:35:44 -06:00
Mondo Diaz
9a95421064 Fix remaining tag references in tests
- Update CacheRequest test to use version field
- Fix upload_test_file calls that still used tag parameter
- Update artifact history test to check versions instead of tags
- Update artifact stats tests to check versions instead of tags
- Fix garbage collection tests to delete versions instead of tags
- Remove TestGlobalTags class (endpoint removed)
- Update project/package stats tests to check version_count
- Fix upload_test_file fixture in test_download_verification
2026-02-03 12:51:31 -06:00
Mondo Diaz
87f30ea898 Update tests for tag removal
- Remove Tag/TagHistory model tests from unit tests
- Update CacheSettings tests to remove allow_public_internet field
- Replace tag= with version= in upload_test_file calls
- Update test assertions to use versions instead of tags
- Remove tests for tag: prefix downloads (now uses version:)
- Update dependency tests for version-only schema
2026-02-03 12:45:44 -06:00
Mondo Diaz
106e30b533 Remove obsolete tag support test from DragDropUpload
The tag functionality was removed in the previous commit, so
this test that expected a 'tag' field in the upload FormData
is no longer valid.
2026-02-03 12:32:11 -06:00
Mondo Diaz
c4c9c20763 Remove tag system, use versions only for artifact references
Tags were mutable aliases that caused confusion alongside the immutable
version system. This removes tags entirely, keeping only PackageVersion
for artifact references.

Changes:
- Remove tags and tag_history tables (migration 012)
- Remove Tag model, TagRepository, and 6 tag API endpoints
- Update cache system to create versions instead of tags
- Update frontend to display versions instead of tags
- Remove tag-related schemas and types
- Update artifact cleanup service for version-based ref_count
2026-02-03 12:18:19 -06:00
Mondo Diaz
62c709e368 Remove superuser-only session_replication_role from factory reset 2026-02-03 11:19:50 -06:00
Mondo Diaz
b6fb9e7546 Use same variable pattern as integration tests for reset job 2026-02-03 11:05:04 -06:00
Mondo Diaz
9db94d035d Add shell-level debug for password variable 2026-02-03 11:01:01 -06:00
Mondo Diaz
6d9cd9d45d Add debug to detect hidden characters in password 2026-02-03 10:59:00 -06:00
Mondo Diaz
f5b60468ce Fix invalid sort field error on package artifact listing
The artifacts endpoint only supports sorting by created_at, size, and original_name,
but the frontend was defaulting to 'name' (a holdover from the old tags endpoint).

- Change default sort from 'name' to 'created_at'
- Change default order from 'asc' to 'desc' (newest first)
- Remove sortable flag from version/tags columns (not DB fields)
- Add sortable flag to original_name and size columns
2026-02-03 10:55:00 -06:00
Mondo Diaz
f7643a5c13 Add debug output to reset_feature job for auth troubleshooting 2026-02-03 10:25:36 -06:00
Mondo Diaz
281474d72f Fix self-dependency detection to strip PyPI extras brackets
The circular dependency error '_pypi/psutil → _pypi/psutil' occurred because
dependencies with extras like 'psutil[test]' weren't being recognized as
self-dependencies. The comparison 'psutil[test] != psutil' failed.

- Add _normalize_pypi_package_name() helper that strips extras brackets
  and normalizes separators per PEP 503
- Update _detect_package_cycle to use normalized names for cycle detection
- Update check_circular_dependencies to use normalized initial path
- Simplify self-dependency check in resolve_dependencies to use helper
2026-02-03 10:17:13 -06:00
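A helper along the lines described in 281474d72f might look like the sketch below. It is not the project's `_normalize_pypi_package_name`; the name `normalize_pypi_name` is hypothetical. The separator rule follows PEP 503: runs of `-`, `_`, and `.` collapse to a single `-`, and the result is lowercased.

```python
import re


def normalize_pypi_name(requirement):
    """Strip an extras suffix like [test], then apply PEP 503 normalization."""
    base = requirement.split("[", 1)[0].strip()
    return re.sub(r"[-_.]+", "-", base).lower()


# With normalization, psutil[test] compares equal to psutil.
same_package = normalize_pypi_name("psutil[test]") == normalize_pypi_name("psutil")
```

Comparing normalized names on both sides is what lets a self-dependency through an extra be recognized instead of being reported as a cycle.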
Mondo Diaz
bb7c30b15c Fix circular dependency resolution by switching to artifact-centric display
- Add artifact: prefix handling in resolve_dependencies for direct artifact
  ID references, enabling dependency resolution for tagless artifacts
- Refactor PackagePage from tag-based to artifact-based data display
- Add PackageArtifact type with tags array for artifact-centric API responses
- Update download URLs to use artifact:ID prefix when no tags exist
- Conditionally show "View Ensure File" only when artifact has tags
2026-02-03 10:00:15 -06:00
Mondo Diaz
9587ed8f17 Fix progress bar CSS scoping conflict between upload and dashboard 2026-02-03 08:29:03 -06:00
Mondo Diaz
e86d974339 Add reset job after integration tests on feature branches 2026-02-03 08:24:22 -06:00