Commit Graph

190 Commits

Author SHA1 Message Date
Mondo Diaz
5517048f05 Fix nested dependency depth tracking in PyPI cache worker
When the cache worker downloaded a package through the proxy, dependencies
were always queued with depth=0 instead of depth+1. This meant depth limits
weren't properly enforced for nested dependencies.

Changes:
- Add cache-depth query parameter to pypi_download_file endpoint
- Worker now passes its current depth when fetching packages
- Dependencies are queued at cache_depth+1 instead of hardcoded 0
- Add tests for depth tracking behavior
2026-02-05 09:15:08 -06:00
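The depth fix in 5517048f05 amounts to "children are one level deeper than their parent". The sketch below assumes that rule. The `CacheTask` dataclass, the `queue_dependencies` helper and the in-memory list queue are hypothetical stand-ins for the real task table. The limit of 10 comes from the "Max depth of 10" in commit c1060feb5f, and whether depth 10 itself is allowed is also an assumption.

```python
from dataclasses import dataclass

MAX_DEPTH = 10  # assumed: taken from the proactive-caching commit; depth 10 is still allowed


@dataclass(frozen=True)
class CacheTask:
    """Hypothetical task record: a package name and its depth in the dependency tree."""
    package: str
    depth: int


def queue_dependencies(parent: CacheTask, deps: list[str], queue: list[CacheTask]) -> None:
    """Queue each dependency at parent.depth + 1 (the fix) instead of a hardcoded 0."""
    child_depth = parent.depth + 1
    if child_depth > MAX_DEPTH:
        return  # depth limit reached: do not descend further
    for dep in deps:
        queue.append(CacheTask(dep, child_depth))
```

With the old behaviour every child started again at depth 0, so the `child_depth > MAX_DEPTH` check could never trigger for nested dependencies.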
Mondo Diaz
c7eca269f4 Fix jobs dashboard showing misleading completion message
The dashboard was showing "All jobs completed successfully" whenever
there were no failed tasks, even if there were pending or in-progress
jobs. Now shows:
- "All jobs completed" only when pending=0 and in_progress=0
- "Jobs are processing. No failures yet." when jobs are pending or in progress
2026-02-05 09:15:08 -06:00
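The message selection in c7eca269f4 can be sketched as follows. `dashboard_message` is a hypothetical helper. The commit does not describe what is shown when tasks have failed, so this sketch returns `None` in that case and leaves the failure view to the caller.

```python
from typing import Optional


def dashboard_message(pending: int, in_progress: int, failed: int) -> Optional[str]:
    """Pick the status text; None means the (undescribed) failure view applies."""
    if failed > 0:
        return None  # assumption: failures are rendered elsewhere
    if pending == 0 and in_progress == 0:
        return "All jobs completed"
    # No failures, but work is still queued or running.
    return "Jobs are processing. No failures yet."
```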
Mondo Diaz
6a3a875a9c Add security fixes and code cleanup for PyPI cache
- Add require_admin authentication to cache management endpoints
- Add limit validation (1-500) on failed tasks query
- Add thread lock for worker pool thread safety
- Fix exception handling with separate recovery DB session
- Remove obsolete design doc
2026-02-05 09:15:08 -06:00
Mondo Diaz
a39b6f098f Add Background Jobs dashboard for admin users
New admin page at /admin/jobs showing:
- PyPI cache job status (pending, in-progress, completed, failed)
- Failed task list with error details
- Retry individual packages or retry all failed
- Auto-refresh every 5 seconds (toggleable)
- Placeholder for future NPM cache jobs

Accessible from admin dropdown menu as "Background Jobs".
2026-02-05 09:15:08 -06:00
Mondo Diaz
e0562195df Add robust PyPI dependency caching with task queue
Replace unbounded thread spawning with managed worker pool:
- New pypi_cache_tasks table tracks caching jobs
- Thread pool with 5 workers (configurable via ORCHARD_PYPI_CACHE_WORKERS)
- Automatic retries with exponential backoff (30s, 60s, then fail)
- Deduplication to prevent duplicate caching attempts

New API endpoints for visibility and control:
- GET /pypi/cache/status - queue health summary
- GET /pypi/cache/failed - list failed tasks with errors
- POST /pypi/cache/retry/{package} - retry single package
- POST /pypi/cache/retry-all - retry all failed packages

This fixes silent failures in background dependency caching where
packages would fail to cache without any tracking or retry mechanism.
2026-02-05 09:15:08 -06:00
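The retry schedule in e0562195df (30s, 60s, then fail) is a doubling backoff with two retries. The deduplication step can be sketched the same way. `next_retry_delay` and `TaskQueue` are hypothetical. The real implementation tracks this state in the pypi_cache_tasks table, and keying on the PEP 503 normalized name is an assumption.

```python
import re
from typing import Optional

BASE_DELAY_SECONDS = 30
MAX_RETRIES = 2  # 30s, then 60s, then the task is marked failed


def next_retry_delay(retries_done: int) -> Optional[int]:
    """Exponential backoff: 30 * 2**n seconds, or None once retries are exhausted."""
    if retries_done >= MAX_RETRIES:
        return None
    return BASE_DELAY_SECONDS * 2 ** retries_done


class TaskQueue:
    """Hypothetical in-memory stand-in for the pypi_cache_tasks table."""

    def __init__(self) -> None:
        self._active: set[str] = set()

    def enqueue(self, package: str) -> bool:
        """Add a task unless one already exists for the same normalized name."""
        key = re.sub(r"[-_.]+", "-", package).lower()  # PEP 503 normalization
        if key in self._active:
            return False
        self._active.add(key)
        return True
```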
Mondo Diaz
db7d0bb7c4 Add design doc for PyPI cache robustness improvements 2026-02-05 09:15:08 -06:00
Mondo Diaz
4a287d46c8 Fix proactive dependency caching HTTPS redirect issue
When background threads fetch from our own proxy using the request's
base_url, that URL uses http://, but the ingress requires https://. The
308 redirect was dropping trailing slashes, causing requests to hit the
frontend catch-all route instead of /pypi/simple/.

Force HTTPS explicitly in the background caching function to avoid
the redirect entirely.
2026-02-05 09:15:08 -06:00
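Forcing HTTPS as described in 4a287d46c8 can be done with the standard library, rewriting the scheme before the URL is used. `force_https` and `simple_index_url` are hypothetical helper names, and the hostname in the usage check is an example. The point is that the trailing slash on /pypi/simple/{package}/ survives because no redirect happens.

```python
from urllib.parse import urlsplit, urlunsplit


def force_https(base_url: str) -> str:
    """Rewrite an http:// URL to https:// so the ingress never issues a 308."""
    parts = urlsplit(base_url)
    if parts.scheme == "http":
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)


def simple_index_url(base_url: str, package: str) -> str:
    """Build the proxy's Simple API URL, keeping the trailing slash intact."""
    return f"{force_https(base_url).rstrip('/')}/pypi/simple/{package}/"
```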
Mondo Diaz
cbea91a528 Add debug logging for proactive caching regex failures 2026-02-05 09:15:08 -06:00
Mondo Diaz
80e2f3d157 Fix proactive caching regex to match both hyphens and underscores
PEP 503 normalizes package names to use hyphens, but wheel filenames
may use underscores (e.g., typing_extensions-4.0.0-py3-none-any.whl).

Convert the search pattern to match either separator.
2026-02-05 09:15:08 -06:00
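One way to "match either separator", as 80e2f3d157 describes, is to split the normalized name on hyphens and rejoin the pieces with a `[-_]` character class. This is a sketch, not the commit's actual regex. Requiring a digit after the name is an extra assumption that stops "typing" from matching "typing-extensions-...".

```python
import re


def filename_pattern(normalized_name: str) -> re.Pattern:
    """Regex matching a PEP 503 name in a filename that may use '-' or '_'."""
    parts = [re.escape(p) for p in normalized_name.split("-")]
    # Separator inside the name may be '-' or '_'; then '-' and a version digit.
    return re.compile("^" + "[-_]".join(parts) + r"-\d", re.IGNORECASE)
```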
Mondo Diaz
522d23ec01 Fix proactive caching failing on HTTP->HTTPS redirects
The background dependency caching was getting 308 redirects because
request.base_url returns http:// but the ingress redirects to https://.

Enable follow_redirects=True in httpx client to handle this.
2026-02-05 09:15:08 -06:00
Mondo Diaz
c1060feb5f Add proactive dependency caching for PyPI packages
When a PyPI package is cached, its dependencies are now automatically
fetched in background threads. This ensures the entire dependency tree
is cached even if pip already has some packages installed locally.

Features:
- Background threads fetch each dependency without blocking the response
- Uses our own proxy endpoint to cache, which recursively caches transitive deps
- Max depth of 10 to prevent infinite loops
- Daemon threads so they don't block process shutdown
2026-02-05 09:15:08 -06:00
Mondo Diaz
e62e75bade Fix duplicate dependency constraint causing 500 errors
- Deduplicate dependencies by package name before inserting
- Some packages (like anyio) list the same dep (trio) multiple times with
  different version constraints for different extras
- The unique constraint on (artifact_id, project, package) rejected these
- Also removed debug logging from dependencies.py
2026-02-05 09:15:08 -06:00
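Because of the unique constraint on (artifact_id, project, package), only one row per dependency name can be stored. The commit does not say which constraint is kept when a name repeats. This sketch keeps the first occurrence and compares PEP 503 normalized names, and both choices are assumptions. The sample constraints below are illustrative, not anyio's real metadata.

```python
import re


def dedupe_dependencies(deps: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Keep one (name, constraint) pair per package; first occurrence wins (assumed)."""
    seen: set[str] = set()
    result: list[tuple[str, str]] = []
    for name, constraint in deps:
        key = re.sub(r"[-_.]+", "-", name).lower()
        if key in seen:
            continue  # duplicate from another extra: would violate the unique constraint
        seen.add(key)
        result.append((name, constraint))
    return result
```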
Mondo Diaz
befa517485 Add detailed debug logging to _resolve_dependency_to_artifact 2026-02-05 09:15:08 -06:00
Mondo Diaz
7a2c0a54c6 Add debug logging to resolve_dependencies 2026-02-05 09:15:08 -06:00
Mondo Diaz
ead016208d Add backfill script for PyPI package dependencies
Script extracts Requires-Dist metadata from cached PyPI packages
and stores them in artifact_dependencies table.

Usage:
  docker exec <container> python -m backend.scripts.backfill_pypi_dependencies
  docker exec <container> python -m backend.scripts.backfill_pypi_dependencies --dry-run
2026-02-05 09:15:08 -06:00
Mondo Diaz
4b76ca2046 Add PEP 440 version constraint matching for dependency resolution
- Parse version constraints like >=1.9, <2.0 using packaging library
- Find the latest version that satisfies the constraint
- Support wildcard (*) to get latest version
- Fall back to exact version and tag matching
2026-02-05 09:15:08 -06:00
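The resolution order in 4b76ca2046 (PEP 440 constraint, wildcard, then exact match) can be sketched with the `packaging` library the commit names. `pick_version` is a hypothetical helper. The tag-matching fallback is omitted because it needs the project's tag data, and pre-releases are excluded by `SpecifierSet`'s default behaviour.

```python
from typing import Optional

from packaging.specifiers import InvalidSpecifier, SpecifierSet
from packaging.version import InvalidVersion, Version


def pick_version(constraint: str, available: list[str]) -> Optional[str]:
    """Return the highest available version satisfying `constraint`, or None."""
    parsed: dict[str, Version] = {}
    for v in available:
        try:
            parsed[v] = Version(v)
        except InvalidVersion:
            continue  # skip strings that are not PEP 440 versions
    if constraint.strip() == "*":
        return max(parsed, key=parsed.get) if parsed else None
    try:
        spec = SpecifierSet(constraint)  # e.g. ">=1.9, <2.0"
    except InvalidSpecifier:
        # Not a specifier: fall back to an exact version-string match.
        return constraint if constraint in available else None
    matching = [v for v, pv in parsed.items() if pv in spec]
    return max(matching, key=parsed.get) if matching else None
```

Comparing `Version` objects rather than strings matters here: "1.10" sorts above "1.9" as a version, but below it as a string.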
Mondo Diaz
94bbd87e6b Fix ensure file modal z-index when opened from deps modal 2026-02-05 09:15:08 -06:00
Mondo Diaz
2cf04a43ef Extract and store dependencies from PyPI packages
- Add functions to parse Requires-Dist metadata from wheel and sdist files
- Store extracted dependencies in artifact_dependencies table
- Fix streaming response for cached artifacts (proper tuple unpacking)
- Fix version uniqueness check to use version string instead of artifact_id
- Skip creating versions for .metadata files
2026-02-05 09:15:08 -06:00
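For wheels, the Requires-Dist lines that 2cf04a43ef and the backfill script read live in the `*.dist-info/METADATA` file, which uses RFC 822-style headers. Here is a minimal sketch using only the standard library. `wheel_requires_dist` is a hypothetical name, and sdist handling (PKG-INFO inside a tarball) is left out.

```python
import io
import zipfile
from email.parser import Parser


def wheel_requires_dist(wheel_bytes: bytes) -> list[str]:
    """Return the Requires-Dist entries from a wheel's *.dist-info/METADATA."""
    with zipfile.ZipFile(io.BytesIO(wheel_bytes)) as zf:
        for member in zf.namelist():
            if member.endswith(".dist-info/METADATA"):
                text = zf.read(member).decode("utf-8")
                # headersonly=True skips parsing the long-description body.
                msg = Parser().parsestr(text, headersonly=True)
                return msg.get_all("Requires-Dist") or []
    return []  # no METADATA found
```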
Mondo Diaz
9acef055b6 Add is_system to all ProjectResponse constructions in routes 2026-02-05 09:15:08 -06:00
Mondo Diaz
694f25ac9b Fix: ensure existing _pypi project gets is_system=true 2026-02-05 09:15:08 -06:00
Mondo Diaz
06b2beb152 Add is_system field to ProjectResponse schema 2026-02-05 09:15:08 -06:00
Mondo Diaz
2b2dbae38b Hide Tags and Latest columns for system projects in package table 2026-02-05 09:15:08 -06:00
Mondo Diaz
cd56d00ebf Improve system project UX and make dependencies a modal
- Hide tag count stat for system projects (show "versions" instead of "artifacts")
- Hide "Latest" tag stat for system projects
- Change "Create/Update Tag" to only show for non-system projects
- Add "View Artifact ID" menu option with modal showing the SHA256 hash
- Move dependencies section to a modal (opened via "View Dependencies" menu)
- Add deps-modal and artifact-id-modal CSS styles
2026-02-05 09:15:08 -06:00
Mondo Diaz
558e1bc78f Fix PyPI proxy UX and package stats calculation
- Fix artifact_count and total_size calculation to use Tags instead of
  Uploads, so PyPI cached packages show their stats correctly
- Fix PackagePage dropdown menu positioning (use fixed position with backdrop)
- Add system project detection for projects whose names start with "_"
- Show Version as primary column for system projects, hide Tag column
- Hide upload button for system projects (they're cache-only)
- Rename section header to "Versions" for system projects
- Fix test_projects_sort_by_name to exclude system projects from sort comparison
2026-02-05 09:15:08 -06:00
Mondo Diaz
32218dbb1c Hide format filter and column for system projects
System projects like _pypi only contain packages of one format,
so the format filter dropdown and column are redundant.
2026-02-05 09:15:08 -06:00
Mondo Diaz
006df9dff9 Hide Settings and New Package buttons for system projects
System projects should be system-controlled only. Users should not
be able to create packages or change settings on system cache projects.
2026-02-05 09:15:08 -06:00
Mondo Diaz
844e937071 Improve PyPI proxy and Package page UX
PyPI proxy improvements:
- Set package format to "pypi" instead of "generic"
- Extract version from filename and create PackageVersion record
- Support .whl, .tar.gz, and .zip filename formats

Package page UX overhaul:
- Move upload to header button with modal
- Simplify table: combine Tag/Version, remove Type and Artifact ID columns
- Add row action menu (⋯) with: Copy ID, Ensure File, Create Tag, Dependencies
- Remove cluttered "Download by Artifact ID" and "Create/Update Tag" sections
- Add modals for upload and create tag actions
- Cleaner, more scannable table layout
2026-02-05 09:15:08 -06:00
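Version extraction for the three formats listed in 844e937071 can be sketched as below. `version_from_filename` is hypothetical. It relies on the wheel naming rule that hyphens in the distribution name are escaped to underscores. For sdists it splits on the last hyphen so that hyphenated project names survive. Files such as `*.whl.metadata` fall through to `None`.

```python
from typing import Optional


def version_from_filename(filename: str) -> Optional[str]:
    """Extract the version from a .whl, .tar.gz or .zip filename."""
    if filename.endswith(".whl"):
        # {name}-{version}(-{build})?-{python}-{abi}-{platform}.whl
        parts = filename[: -len(".whl")].split("-")
        return parts[1] if len(parts) >= 5 else None
    for ext in (".tar.gz", ".zip"):
        if filename.endswith(ext):
            # {name}-{version}{ext}: split on the last hyphen.
            name, sep, version = filename[: -len(ext)].rpartition("-")
            return version if sep and name else None
    return None
```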
Mondo Diaz
77c7526023 Show team name instead of individual user in Owner column
Projects owned by teams now display the team name in the Owner column
for better organizational continuity when team members change.
Falls back to created_by if no team is assigned.
2026-02-05 09:15:08 -06:00
Mondo Diaz
ec69d7619b Add "(coming soon)" label for unsupported upstream source types
Only pypi and generic are currently supported. Other types now show
"(coming soon)" in both the dropdown and the sources table.
2026-02-05 09:15:08 -06:00
Mondo Diaz
8e3af8c4f5 Fix PyPI proxy: use correct storage method and make project public
- Use storage.get_stream(s3_key) instead of non-existent get_artifact_stream()
- Make _pypi project public (is_public=True) so cached packages are visible
2026-02-05 09:15:08 -06:00
Mondo Diaz
24a0a71cf4 Fix Project and Tag model fields in PyPI proxy
Use correct model fields:
- Project: is_public, is_system, created_by (not visibility)
- Tag: add required created_by field
2026-02-05 09:15:08 -06:00
Mondo Diaz
ab50148a60 Fix Artifact model field names in PyPI proxy
Use correct Artifact model fields:
- original_name instead of filename
- Add required created_by and s3_key fields
- Include checksum fields from storage result
2026-02-05 09:15:08 -06:00
Mondo Diaz
acee458b3c Fix PyPI proxy to use correct storage.store() method
The code was calling storage.store_artifact() which doesn't exist.
Changed to use storage.store() which handles content-addressable
storage with automatic deduplication.
2026-02-05 09:15:08 -06:00
Mondo Diaz
f18b8ed560 Allow full path in PyPI upstream source URL
Users can now configure the full path including /simple in their
upstream source URL (e.g., https://example.com/api/pypi/repo/simple)
instead of having the code append /simple/ automatically.

This matches pip's --index-url format, making configuration more
intuitive and copy/paste friendly.
2026-02-05 09:15:08 -06:00
Mondo Diaz
7e84dd3958 Fix test_rewrite_relative_links assertion to expect correct URL
The test was checking for the wrong URL pattern. When urljoin resolves
../../packages/ab/cd/... relative to /api/pypi/pypi-remote/simple/requests/,
it correctly produces /api/pypi/pypi-remote/packages/ab/cd/... (not
/api/pypi/packages/...).
2026-02-05 09:15:08 -06:00
Mondo Diaz
a72c9d3f6e Improve PyPI proxy test assertions for all status codes
Tests now verify the correct response for each scenario:
- 200: HTML content-type
- 404: "not found" error message
- 503: "No PyPI upstream sources configured" error message
2026-02-05 09:15:08 -06:00
Mondo Diaz
a6618fe550 Fix PyPI proxy tests to work with or without upstream sources
- Tests now accept 200/404/503 responses since upstream sources may or
  may not be configured in the test environment
- Added upstream_base_url parameter to _rewrite_package_links test
- Added test for relative URL resolution (Artifactory-style URLs)
2026-02-05 09:15:08 -06:00
Mondo Diaz
796176c251 Fix HTTPS scheme detection behind reverse proxy
When behind a reverse proxy that terminates SSL, the server sees HTTP
requests internally. Added _get_base_url() helper that respects the
X-Forwarded-Proto header to generate correct external HTTPS URLs.

This fixes links in the PyPI simple index showing http:// instead of
https:// when accessed via HTTPS through a load balancer.
2026-02-05 09:15:08 -06:00
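The idea behind the `_get_base_url()` helper in 796176c251 is to let X-Forwarded-Proto override the scheme the server sees internally. The function below is a hypothetical sketch, not that helper's actual signature. It assumes a header mapping with lowercase keys (or a case-insensitive one, as web frameworks usually provide) and takes the first entry when proxies append a comma-separated list.

```python
from typing import Mapping
from urllib.parse import urlsplit, urlunsplit


def get_base_url(base_url: str, headers: Mapping[str, str]) -> str:
    """Use the client-facing scheme from X-Forwarded-Proto, if present."""
    proto = headers.get("x-forwarded-proto", "").split(",")[0].strip().lower()
    parts = urlsplit(base_url)
    if proto in ("http", "https") and proto != parts.scheme:
        parts = parts._replace(scheme=proto)
    return urlunsplit(parts)
```

Only trust this header when the app is actually behind a proxy that sets it. Otherwise clients can spoof the scheme.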
Mondo Diaz
f58fb0079a Fix relative URL handling in PyPI proxy
Artifactory and other registries may return relative URLs in their
Simple API responses (e.g., ../../packages/...). The proxy now resolves
these to absolute URLs using urljoin() before encoding them in the
upstream parameter.

This fixes package downloads failing when the upstream registry uses
relative URLs in its package index.
2026-02-05 09:15:08 -06:00
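The relative-URL fix in f58fb0079a (and the expectation corrected in 7e84dd3958) follows RFC 3986 resolution, which `urllib.parse.urljoin` implements. The helper names below are hypothetical and the hosts are example values. Percent-encoding with `safe=""` is one assumed way to embed the absolute URL in the upstream query parameter.

```python
from urllib.parse import quote, urljoin


def absolutize_link(page_url: str, href: str) -> str:
    """Resolve an href from an upstream Simple API page against that page's URL."""
    return urljoin(page_url, href)  # absolute hrefs pass through unchanged


def encode_upstream(url: str) -> str:
    """Percent-encode a full URL so it can travel in a query parameter."""
    return quote(url, safe="")
```

Each "../" removes one trailing path segment of the page URL, which is why the result keeps /api/pypi/pypi-remote/ rather than collapsing to /api/pypi/.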
Mondo Diaz
f57762334f Remove dead code from pypi_proxy.py
- Remove unused imports (UpstreamClient, UpstreamClientConfig,
  UpstreamHTTPError, UpstreamConnectionError, UpstreamTimeoutError)
- Simplify matched_source selection logic, removing dead conditional
  that always evaluated to True due to 'or True'
2026-02-05 09:15:08 -06:00
Mondo Diaz
599c8c1d5b Fix httpx.Timeout configuration in PyPI proxy
httpx.Timeout requires either a default value or all four parameters.
Changed to httpx.Timeout(default, connect=X) format.
2026-02-05 09:15:08 -06:00
Evan Cohen-Doty
11c5aee0f1 Merge branch 's3_bucket_provisioner' into 'main'
59 Add S3 Bucket Provisioner

Closes #59

See merge request esv/bsf/bsf-integration/orchard/orchard-mvp!49
2026-02-04 11:32:12 -08:00
Evan Cohen-Doty
1b706fe858 59 Add S3 Bucket Provisioner 2026-02-04 11:32:12 -08:00
Mondo Diaz
dcd405679a Merge branch 'feature/transparent-proxy' into 'main'
Add transparent PyPI proxy and improve upstream sources UI

Closes #108

See merge request esv/bsf/bsf-integration/orchard/orchard-mvp!56
2026-01-29 16:12:57 -06:00
Mondo Diaz
97498b2f86 Add transparent PyPI proxy and improve upstream sources UI 2026-01-29 16:12:57 -06:00
Mondo Diaz
e8cf2462b7 Merge branch 'fix/upstream-caching-bugs-2' into 'main'
Simplify cache management UI and improve test status display (#107)

See merge request esv/bsf/bsf-integration/orchard/orchard-mvp!55
2026-01-29 14:25:19 -06:00
Mondo Diaz
038ad4ed1b Simplify cache management UI and improve test status display (#107) 2026-01-29 14:25:19 -06:00
Mondo Diaz
858b45d434 Merge branch 'fix/purge-seed-data-user-id' into 'main'
Fix purge_seed_data type mismatch for access_permissions.user_id (#107)

See merge request esv/bsf/bsf-integration/orchard/orchard-mvp!54
2026-01-29 13:48:21 -06:00
Mondo Diaz
95470b2bf6 Fix purge_seed_data type mismatch for access_permissions.user_id (#107) 2026-01-29 13:48:21 -06:00
Mondo Diaz
c512d85f9e Merge branch 'fix/upstream-caching-bugs' into 'main'
Remove public internet features and fix upstream source UI (#107)

See merge request esv/bsf/bsf-integration/orchard/orchard-mvp!53
2026-01-29 13:26:29 -06:00