Commit Graph

93 Commits

Author SHA1 Message Date
Mondo Diaz
c94fe0389b Fix tests for tag removal and version behavior
- Fix upload response to return actual version (not requested version)
  when artifact already has a version in the package
- Update ref_count tests to use multiple packages (one version per
  artifact per package design constraint)
- Remove allow_public_internet references from upstream caching tests
- Update consistency check test to not assert global system health
- Add versions field to artifact schemas
- Fix dependencies resolution to handle removed tag constraint
2026-02-03 15:35:44 -06:00
Mondo Diaz
9a95421064 Fix remaining tag references in tests
- Update CacheRequest test to use version field
- Fix upload_test_file calls that still used tag parameter
- Update artifact history test to check versions instead of tags
- Update artifact stats tests to check versions instead of tags
- Fix garbage collection tests to delete versions instead of tags
- Remove TestGlobalTags class (endpoint removed)
- Update project/package stats tests to check version_count
- Fix upload_test_file fixture in test_download_verification
2026-02-03 12:51:31 -06:00
Mondo Diaz
87f30ea898 Update tests for tag removal
- Remove Tag/TagHistory model tests from unit tests
- Update CacheSettings tests to remove allow_public_internet field
- Replace tag= with version= in upload_test_file calls
- Update test assertions to use versions instead of tags
- Remove tests for tag: prefix downloads (now uses version:)
- Update dependency tests for version-only schema
2026-02-03 12:45:44 -06:00
Mondo Diaz
c4c9c20763 Remove tag system, use versions only for artifact references
Tags were mutable aliases that caused confusion alongside the immutable
version system. This removes tags entirely, keeping only PackageVersion
for artifact references.

Changes:
- Remove tags and tag_history tables (migration 012)
- Remove Tag model, TagRepository, and 6 tag API endpoints
- Update cache system to create versions instead of tags
- Update frontend to display versions instead of tags
- Remove tag-related schemas and types
- Update artifact cleanup service for version-based ref_count
2026-02-03 12:18:19 -06:00
Mondo Diaz
62c709e368 Remove superuser-only session_replication_role from factory reset 2026-02-03 11:19:50 -06:00
Mondo Diaz
281474d72f Fix self-dependency detection to strip PyPI extras brackets
The circular dependency error '_pypi/psutil → _pypi/psutil' occurred because
dependencies with extras like 'psutil[test]' weren't being recognized as
self-dependencies. The comparison 'psutil[test] != psutil' failed.

- Add _normalize_pypi_package_name() helper that strips extras brackets
  and normalizes separators per PEP 503
- Update _detect_package_cycle to use normalized names for cycle detection
- Update check_circular_dependencies to use normalized initial path
- Simplify self-dependency check in resolve_dependencies to use helper
2026-02-03 10:17:13 -06:00
Mondo Diaz
bb7c30b15c Fix circular dependency resolution by switching to artifact-centric display
- Add artifact: prefix handling in resolve_dependencies for direct artifact
  ID references, enabling dependency resolution for tagless artifacts
- Refactor PackagePage from tag-based to artifact-based data display
- Add PackageArtifact type with tags array for artifact-centric API responses
- Update download URLs to use artifact:ID prefix when no tags exist
- Conditionally show "View Ensure File" only when artifact has tags
2026-02-03 10:00:15 -06:00
Mondo Diaz
bf2737b3a2 Fix self-dependency check to use case-insensitive PyPI name normalization 2026-02-03 08:23:39 -06:00
Mondo Diaz
17d3004058 Pass upstream policy errors through PyPI proxy to users
- Add _parse_upstream_error() to extract policy messages from JFrog/Artifactory
- Pass through 403 and other 4xx errors with detailed messages
- Pin babel and electron-to-chromium to older versions for CI compatibility
2026-02-03 08:09:08 -06:00
Mondo Diaz
34ff9caa08 Fix circular dependency error message to show actual cycle path
The error was hardcoding [pkg_key, pkg_key] regardless of actual cycle.
Now tracks the path through dependencies to report the real cycle.
2026-02-02 20:43:05 -06:00
Mondo Diaz
01915bcb45 Fix circular dependency detection and hide empty graph modal
- Add artifact-level self-dependency check (skip if dep resolves to same artifact)
- Close dependency graph modal if package has no dependencies to show
  (only root package with no children and no missing deps)
2026-02-02 20:31:46 -06:00
Mondo Diaz
72952d84a1 Skip self-dependencies in dependency resolver
PyPI packages can have self-referential dependencies for extras
(e.g., pytest[testing] depends on pytest). These were incorrectly
detected as circular dependencies. Now we skip them.
2026-02-02 19:45:34 -06:00
Mondo Diaz
b3ae3b03eb Show missing dependencies in dependency graph instead of failing
When dependencies are not cached on the server (common since we removed
proactive caching), the dependency graph now:
- Continues resolving what it can find
- Shows missing dependencies in a separate section with amber styling
- Displays the constraint and which package required them
- Updates the header stats to show "X cached • Y not cached"

This provides a better user experience than showing an error when
some dependencies haven't been downloaded yet.
2026-02-02 16:29:37 -06:00
Mondo Diaz
ba0a658611 Fix dependency graph error for invalid version constraints
When a dependency has an invalid version constraint like '>=' (without
a version number), the resolver now treats it as a wildcard and returns
the latest available version instead of failing with 'Dependency not found'.

This handles malformed metadata that may have been stored from PyPI packages.
2026-02-02 16:26:18 -06:00
Mondo Diaz
081cc6df83 Remove proactive PyPI dependency caching feature
The background task queue for proactively caching package dependencies was
causing server instability and unnecessary growth. The PyPI proxy now only
caches packages on-demand when users request them.

Removed:
- PyPI cache worker (background task queue and worker pool)
- PyPICacheTask model and related database schema
- Cache management API endpoints (/pypi/cache/*)
- Background Jobs admin dashboard
- Dependency extraction and queueing logic

Kept:
- On-demand package caching (still works when users request packages)
- Async httpx for non-blocking downloads (prevents health check failures)
- URL-based cache lookups for deduplication
2026-02-02 16:17:33 -06:00
Mondo Diaz
1329d380a4 Convert PyPI proxy from sync to async httpx to prevent event loop blocking
The pypi_download_file, pypi_simple_index, and pypi_package_versions endpoints
were using synchronous httpx.Client inside async functions. When upstream PyPI
servers respond slowly, this blocked the entire FastAPI event loop, preventing
health checks from responding. Kubernetes would then kill the pod after the
liveness probe timed out.

Changes:
- httpx.Client → httpx.AsyncClient
- client.get() → await client.get()
- response.iter_bytes() → response.aiter_bytes()

This ensures the event loop remains responsive during slow upstream downloads,
allowing health checks to succeed even when downloads take 20+ seconds.
2026-02-02 15:26:24 -06:00
Mondo Diaz
361210a2bc Add cancel job button and improve jobs table UI
- Remove "All Jobs" title
- Move Status column to front of table
- Add Cancel button for in-progress jobs
- Add cancel endpoint: POST /pypi/cache/cancel/{package_name}
- Add btn-danger CSS styling
2026-02-02 15:18:59 -06:00
Mondo Diaz
415ad9a29a Stream downloads to temp file to reduce memory usage
- Download packages in 64KB chunks to temp file instead of loading into memory
- Upload to S3 from temp file (streaming)
- Clean up temp file after processing
- Reduces memory footprint from 2x file size to 1x file size
2026-02-02 15:10:25 -06:00
Mondo Diaz
92edef92e6 Redesign jobs dashboard with unified table and progress bar
- Add overall progress bar showing completed/active/failed counts
- Unify all job types into single table with Type column
- Simplify status to Working/Pending/Failed badges
- Remove NPM "Coming Soon" section
- Add get_recent_activity() function for future activity feed
- Fix dark mode CSS using CSS variables
2026-02-02 14:34:48 -06:00
Mondo Diaz
47b137f4eb Improve Active Workers table and recover stale tasks
Backend:
- Add _recover_stale_tasks() to reset tasks stuck in 'in_progress'
  from previous crashes (tasks >5 min old get reset to pending)
- Called automatically on startup

Frontend:
- Fix dark mode colors using CSS variables instead of hardcoded values
- Add elapsed time column showing how long task has been running
- Add spinning indicator next to package name
- Add status badge (Running/Stale?)
- Highlight stale tasks (>5 min) in amber
- Auto-updates every 5 seconds with existing refresh
2026-02-02 14:29:17 -06:00
Mondo Diaz
1138309aaa Add Active Workers table to Background Jobs dashboard
Shows currently processing cache tasks in a dynamic table with:
- Package name and version constraint being cached
- Recursion depth and attempt number
- Start timestamp
- Pulsing indicator to show live activity

Backend changes:
- Add get_active_tasks() function to pypi_cache_worker.py
- Add GET /pypi/cache/active endpoint to pypi_proxy.py

Frontend changes:
- Add PyPICacheActiveTask type
- Add getPyPICacheActiveTasks() API function
- Add Active Workers section with animated table
- Auto-refreshes every 5 seconds with existing data
2026-02-02 13:50:45 -06:00
Mondo Diaz
3bdeade7ca Fix nested dependency depth tracking in PyPI cache worker
When the cache worker downloaded a package through the proxy, dependencies
were always queued with depth=0 instead of depth+1. This meant depth limits
weren't properly enforced for nested dependencies.

Changes:
- Add cache-depth query parameter to pypi_download_file endpoint
- Worker now passes its current depth when fetching packages
- Dependencies are queued at cache_depth+1 instead of hardcoded 0
- Add tests for depth tracking behavior
2026-02-02 13:47:22 -06:00
Mondo Diaz
97b39d000b Add security fixes and code cleanup for PyPI cache
- Add require_admin authentication to cache management endpoints
- Add limit validation (1-500) on failed tasks query
- Add thread lock for worker pool thread safety
- Fix exception handling with separate recovery DB session
- Remove obsolete design doc
2026-02-02 11:37:25 -06:00
Mondo Diaz
d274f3f375 Add robust PyPI dependency caching with task queue
Replace unbounded thread spawning with managed worker pool:
- New pypi_cache_tasks table tracks caching jobs
- Thread pool with 5 workers (configurable via ORCHARD_PYPI_CACHE_WORKERS)
- Automatic retries with exponential backoff (30s, 60s, then fail)
- Deduplication to prevent duplicate caching attempts

New API endpoints for visibility and control:
- GET /pypi/cache/status - queue health summary
- GET /pypi/cache/failed - list failed tasks with errors
- POST /pypi/cache/retry/{package} - retry single package
- POST /pypi/cache/retry-all - retry all failed packages

This fixes silent failures in background dependency caching where
packages would fail to cache without any tracking or retry mechanism.
2026-02-02 11:16:02 -06:00
Mondo Diaz
3c2ab70ef0 Fix proactive dependency caching HTTPS redirect issue
When background threads fetch from our own proxy using the request's
base_url, it returns http:// but ingress requires https://. The 308
redirect was dropping trailing slashes, causing requests to hit the
frontend catch-all route instead of /pypi/simple/.

Force HTTPS explicitly in the background caching function to avoid
the redirect entirely.
2026-01-30 18:59:31 -06:00
Mondo Diaz
109a593f83 Add debug logging for proactive caching regex failures 2026-01-30 18:43:09 -06:00
Mondo Diaz
1d727b3f8c Fix proactive caching regex to match both hyphens and underscores
PEP 503 normalizes package names to use hyphens, but wheel filenames
may use underscores (e.g., typing_extensions-4.0.0-py3-none-any.whl).

Convert the search pattern to match either separator.
2026-01-30 18:25:30 -06:00
Mondo Diaz
47aa0afe91 Fix proactive caching failing on HTTP->HTTPS redirects
The background dependency caching was getting 308 redirects because
request.base_url returns http:// but the ingress redirects to https://.

Enable follow_redirects=True in httpx client to handle this.
2026-01-30 18:11:08 -06:00
Mondo Diaz
f992fc540e Add proactive dependency caching for PyPI packages
When a PyPI package is cached, its dependencies are now automatically
fetched in background threads. This ensures the entire dependency tree
is cached even if pip already has some packages installed locally.

Features:
- Background threads fetch each dependency without blocking the response
- Uses our own proxy endpoint to cache, which recursively caches transitive deps
- Max depth of 10 to prevent infinite loops
- Daemon threads so they don't block process shutdown
2026-01-30 17:45:30 -06:00
Mondo Diaz
044a6c1d27 Fix duplicate dependency constraint causing 500 errors
- Deduplicate dependencies by package name before inserting
- Some packages (like anyio) list the same dep (trio) multiple times with
  different version constraints for different extras
- The unique constraint on (artifact_id, project, package) rejected these
- Also removed debug logging from dependencies.py
2026-01-30 17:43:49 -06:00
Mondo Diaz
62c77dc16d Add detailed debug logging to _resolve_dependency_to_artifact 2026-01-30 17:29:19 -06:00
Mondo Diaz
7c05360eed Add debug logging to resolve_dependencies 2026-01-30 17:21:04 -06:00
Mondo Diaz
76878279e9 Add backfill script for PyPI package dependencies
Script extracts Requires-Dist metadata from cached PyPI packages
and stores them in artifact_dependencies table.

Usage:
  docker exec <container> python -m backend.scripts.backfill_pypi_dependencies
  docker exec <container> python -m backend.scripts.backfill_pypi_dependencies --dry-run
2026-01-30 15:38:45 -06:00
Mondo Diaz
e1b01abf9b Add PEP 440 version constraint matching for dependency resolution
- Parse version constraints like >=1.9, <2.0 using packaging library
- Find the latest version that satisfies the constraint
- Support wildcard (*) to get latest version
- Fall back to exact version and tag matching
2026-01-30 15:34:19 -06:00
Mondo Diaz
47b3eb439d Extract and store dependencies from PyPI packages
- Add functions to parse Requires-Dist metadata from wheel and sdist files
- Store extracted dependencies in artifact_dependencies table
- Fix streaming response for cached artifacts (proper tuple unpacking)
- Fix version uniqueness check to use version string instead of artifact_id
- Skip creating versions for .metadata files
2026-01-30 15:14:52 -06:00
Mondo Diaz
c5f75e4fd6 Add is_system to all ProjectResponse constructions in routes 2026-01-30 13:34:26 -06:00
Mondo Diaz
ff31379649 Fix: ensure existing _pypi project gets is_system=true 2026-01-30 13:33:31 -06:00
Mondo Diaz
424b1e5770 Add is_system field to ProjectResponse schema 2026-01-30 13:11:11 -06:00
Mondo Diaz
fe6c6c52d2 Fix PyPI proxy UX and package stats calculation
- Fix artifact_count and total_size calculation to use Tags instead of
  Uploads, so PyPI cached packages show their stats correctly
- Fix PackagePage dropdown menu positioning (use fixed position with backdrop)
- Add system project detection for projects starting with "_"
- Show Version as primary column for system projects, hide Tag column
- Hide upload button for system projects (they're cache-only)
- Rename section header to "Versions" for system projects
- Fix test_projects_sort_by_name to exclude system projects from sort comparison
2026-01-30 12:16:05 -06:00
Mondo Diaz
f3afdd3bbf Improve PyPI proxy and Package page UX
PyPI proxy improvements:
- Set package format to "pypi" instead of "generic"
- Extract version from filename and create PackageVersion record
- Support .whl, .tar.gz, and .zip filename formats

Package page UX overhaul:
- Move upload to header button with modal
- Simplify table: combine Tag/Version, remove Type and Artifact ID columns
- Add row action menu (⋯) with: Copy ID, Ensure File, Create Tag, Dependencies
- Remove cluttered "Download by Artifact ID" and "Create/Update Tag" sections
- Add modals for upload and create tag actions
- Cleaner, more scannable table layout
2026-01-30 11:52:37 -06:00
Mondo Diaz
2dc7fe5a7b Fix PyPI proxy: use correct storage method and make project public
- Use storage.get_stream(s3_key) instead of non-existent get_artifact_stream()
- Make _pypi project public (is_public=True) so cached packages are visible
2026-01-30 10:59:50 -06:00
Mondo Diaz
534e4b964f Fix Project and Tag model fields in PyPI proxy
Use correct model fields:
- Project: is_public, is_system, created_by (not visibility)
- Tag: add required created_by field
2026-01-30 10:29:25 -06:00
Mondo Diaz
757e43fc34 Fix Artifact model field names in PyPI proxy
Use correct Artifact model fields:
- original_name instead of filename
- Add required created_by and s3_key fields
- Include checksum fields from storage result
2026-01-30 09:58:15 -06:00
Mondo Diaz
d78092de55 Fix PyPI proxy to use correct storage.store() method
The code was calling storage.store_artifact() which doesn't exist.
Changed to use storage.store() which handles content-addressable
storage with automatic deduplication.
2026-01-30 09:41:34 -06:00
Mondo Diaz
0fa991f536 Allow full path in PyPI upstream source URL
Users can now configure the full path including /simple in their
upstream source URL (e.g., https://example.com/api/pypi/repo/simple)
instead of having the code append /simple/ automatically.

This matches pip's --index-url format, making configuration more
intuitive and copy/paste friendly.
2026-01-30 09:24:05 -06:00
Mondo Diaz
00fb2729e4 Fix test_rewrite_relative_links assertion to expect correct URL
The test was checking for the wrong URL pattern. When urljoin resolves
../../packages/ab/cd/... relative to /api/pypi/pypi-remote/simple/requests/,
it correctly produces /api/pypi/pypi-remote/packages/ab/cd/... (not
/api/pypi/packages/...).
2026-01-30 08:51:30 -06:00
Mondo Diaz
8ae4d7a685 Improve PyPI proxy test assertions for all status codes
Tests now verify the correct response for each scenario:
- 200: HTML content-type
- 404: "not found" error message
- 503: "No PyPI upstream sources configured" error message
2026-01-29 19:35:20 -06:00
Mondo Diaz
4b887d1aad Fix PyPI proxy tests to work with or without upstream sources
- Tests now accept 200/404/503 responses since upstream sources may or
  may not be configured in the test environment
- Added upstream_base_url parameter to _rewrite_package_links test
- Added test for relative URL resolution (Artifactory-style URLs)
2026-01-29 19:34:33 -06:00
Mondo Diaz
4dc54ace8a Fix HTTPS scheme detection behind reverse proxy
When behind a reverse proxy that terminates SSL, the server sees HTTP
requests internally. Added _get_base_url() helper that respects the
X-Forwarded-Proto header to generate correct external HTTPS URLs.

This fixes links in the PyPI simple index showing http:// instead of
https:// when accessed via HTTPS through a load balancer.
2026-01-29 18:02:21 -06:00
Mondo Diaz
64bfd3902f Fix relative URL handling in PyPI proxy
Artifactory and other registries may return relative URLs in their
Simple API responses (e.g., ../../packages/...). The proxy now resolves
these to absolute URLs using urljoin() before encoding them in the
upstream parameter.

This fixes package downloads failing when the upstream registry uses
relative URLs in its package index.
2026-01-29 18:01:19 -06:00