8 Commits

Author SHA1 Message Date
Mondo Diaz
7d4091221a Update CHANGELOG for issues 33, 34, 35 2026-01-05 15:11:48 -06:00
Mondo Diaz
939192f425 Add integration tests for stats endpoints and fix JSON report serialization 2026-01-05 15:11:15 -06:00
Mondo Diaz
c977d1d465 Fix API/frontend type mismatch for dashboard
- Backend: Change 'id' to 'artifact_id' in most_referenced_artifacts response
- Backend: Add content_type field to referenced artifacts
- Frontend: Add orphaned_size_bytes to Stats interface
- Frontend: Add missing fields to DeduplicationStats interface
2026-01-05 15:05:41 -06:00
Mondo Diaz
e37e1892b2 Add Dashboard UI for storage and deduplication statistics
New components:
- Dashboard.tsx - main dashboard page with stats overview
- Dashboard.css - responsive styling with dark theme

Features:
- Storage overview cards (total used, saved, ratio, percentage)
- Artifact statistics cards
- Deduplication effectiveness visualization with progress bars
- Circular progress indicator for deduplication rate
- Top referenced artifacts table

Updated files:
- api.ts - added getStats, getDeduplicationStats, getTimelineStats, getCrossProjectStats
- types.ts - added Stats, DeduplicationStats, TimelineStats interfaces
- App.tsx - added /dashboard route
- Layout.tsx - added Dashboard navigation link
2026-01-05 15:02:30 -06:00
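The overview cards in this commit boil down to a little arithmetic over the stats responses. A minimal sketch of those derivations, using illustrative names rather than the actual API fields:

```python
def storage_overview(total_size: int, storage_saved: int,
                     total_uploads: int, unique_artifacts: int) -> dict:
    """Derive the dashboard's headline numbers from raw aggregates.

    Parameter and key names are illustrative, not the real endpoint schema.
    """
    total_with_dupes = total_size + storage_saved
    return {
        "total_used_bytes": total_size,
        "saved_bytes": storage_saved,
        # Average number of uploads each stored artifact serves.
        "deduplication_ratio": total_uploads / unique_artifacts if unique_artifacts else 1.0,
        # Share of would-be storage avoided by deduplication.
        "savings_percentage": storage_saved / total_with_dupes * 100 if total_with_dupes else 0.0,
    }

print(storage_overview(total_size=800, storage_saved=200,
                       total_uploads=10, unique_artifacts=5))
```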
Mondo Diaz
c79b10cbc5 Add comprehensive stats endpoints and reporting features
Backend stats endpoints:
- GET /api/v1/project/:project/packages/:package/stats - per-package stats
- GET /api/v1/artifact/:id/stats - artifact reference statistics
- GET /api/v1/stats/cross-project - cross-project deduplication detection
- GET /api/v1/stats/timeline - time-based metrics (daily/weekly/monthly)
- GET /api/v1/stats/export - CSV/JSON export
- GET /api/v1/stats/report - markdown/JSON summary report generation

Enhanced existing endpoints:
- Added storage_saved_bytes and deduplication_ratio to project stats
- Added date range filtering via from_date/to_date params

New schemas:
- PackageStatsResponse
- ArtifactStatsResponse
- CrossProjectDeduplicationResponse
- TimeBasedStatsResponse
- StatsReportResponse
2026-01-05 14:57:47 -06:00
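The CSV export added here is essentially a flat metric/value table. A rough sketch of that shape (the real endpoint's field set may differ):

```python
import csv
import io

def stats_to_csv(stats: dict) -> str:
    """Render a flat stats dict in the two-column Metric/Value CSV layout
    the export endpoint produces (sketch; keys are whatever the stats carry)."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["Metric", "Value"])
    for key, value in stats.items():
        writer.writerow([key, value])
    return out.getvalue()

print(stats_to_csv({"total_uploads": 10, "storage_saved_bytes": 3234}))
```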
Mondo Diaz
e215ecabcd Add S3 configuration options and improved error handling
- Add s3_verify_ssl config option for SSL/TLS verification
- Add s3_connect_timeout and s3_read_timeout config options
- Add s3_max_retries config option with adaptive retry mode
- Add S3StorageUnavailableError for backend availability issues
- Add HashCollisionError for detecting extremely rare hash collisions
- Add hash collision detection by comparing file sizes on dedup
- Handle network interruption and timeout errors explicitly
- Update routes.py to handle new exception types with appropriate HTTP codes
2026-01-05 14:46:18 -06:00
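The size-comparison collision check in this commit rests on a simple invariant: identical content implies identical length, so two payloads with the same SHA-256 but different sizes cannot be the same bytes. A standalone sketch of the guard (not the server's actual code):

```python
class HashCollisionError(Exception):
    """Same digest but different payload size: a (vanishingly rare) collision."""

def check_dedup_size(sha256_hash: str, existing_size: int, new_size: int) -> None:
    """Guard run when an upload dedups against an existing object: a size
    mismatch under the same hash means the payloads differ, i.e. a collision."""
    if existing_size != new_size:
        raise HashCollisionError(
            f"hash {sha256_hash}: existing size {existing_size} != new size {new_size}"
        )

check_dedup_size("ab" * 32, 1024, 1024)  # same size: passes silently
```

Note the limitation: the check cannot catch a collision between equal-length payloads, so it is a cheap tripwire, not a proof of integrity.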
Mondo Diaz
eca291d194 Add S3 verification and failure cleanup tests
- Add test_s3_bucket_single_object_after_duplicates to verify only one S3 object exists
- Add tests for upload failure scenarios (invalid project/package, empty file)
- Add cleanup tests for orphaned S3 objects and orphaned database records
- Add S3 direct access helpers (list_s3_objects_by_hash, s3_object_exists, etc.)
- Fix conftest.py to use setdefault for env vars (don't override container config)

All 52 tests now pass.
2026-01-05 14:39:22 -06:00
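The conftest.py fix leans on `os.environ.setdefault`, which only writes a key that is absent, so values injected by the container environment survive. Sketched in isolation:

```python
import os

# Pretend docker-compose already injected a real host; the local fallback
# must not clobber it.
os.environ["ORCHARD_DATABASE_HOST"] = "db.internal"
os.environ.pop("ORCHARD_DATABASE_PORT", None)  # ensure this one is absent

os.environ.setdefault("ORCHARD_DATABASE_HOST", "localhost")  # no-op: key exists
os.environ.setdefault("ORCHARD_DATABASE_PORT", "5432")       # applies: key absent

print(os.environ["ORCHARD_DATABASE_HOST"])
print(os.environ["ORCHARD_DATABASE_PORT"])
```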
Mondo Diaz
7c31b6a244 Add integration tests for deduplication and ref_count
- Add test_integration_uploads.py with 12 tests for duplicate upload scenarios
- Add test_ref_count.py with 7 tests for ref_count management
- Fix ArtifactDetailResponse to include sha256 and checksum fields
- Fix health check SQL warning by wrapping in text()
- Update tests to use unique content per test run for idempotency
2026-01-05 14:29:12 -06:00
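Using unique content per test run is what keeps the dedup tests idempotent: content reused from an earlier run would already exist in storage and skew the duplicate counts. One way to sketch it (the helper name is hypothetical, not from the test suite):

```python
import hashlib
import uuid

def unique_payload(label: str) -> bytes:
    """Return a payload that differs on every call, so re-running the suite
    never collides with artifacts deduplicated from a previous run."""
    return f"{label}-{uuid.uuid4().hex}".encode()

first = unique_payload("dup-test")
second = unique_payload("dup-test")
# Distinct payloads hash to distinct SHA-256 ids, so each run gets a fresh artifact.
print(hashlib.sha256(first).hexdigest() != hashlib.sha256(second).hexdigest())
```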
16 changed files with 3234 additions and 21 deletions

View File

@@ -10,18 +10,32 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- Added `StorageBackend` protocol/interface for backend-agnostic storage (#33)
- Added `health_check()` method to storage backend with `/health` endpoint integration (#33)
- Added `verify_integrity()` method for post-upload hash validation (#33)
- Added S3 configuration options: `s3_verify_ssl`, `s3_connect_timeout`, `s3_read_timeout`, `s3_max_retries` (#33)
- Added `S3StorageUnavailableError` and `HashCollisionError` exception types (#33)
- Added hash collision detection by comparing file sizes during deduplication (#33)
- Added garbage collection endpoint `POST /api/v1/admin/garbage-collect` for orphaned artifacts (#36)
- Added orphaned artifacts listing endpoint `GET /api/v1/admin/orphaned-artifacts` (#36)
- Added global storage statistics endpoint `GET /api/v1/stats` (#34)
- Added storage breakdown endpoint `GET /api/v1/stats/storage` (#34)
- Added deduplication metrics endpoint `GET /api/v1/stats/deduplication` (#34)
- Added per-project statistics endpoint `GET /api/v1/projects/{project}/stats` (#34)
- Added per-package statistics endpoint `GET /api/v1/project/{project}/packages/{package}/stats` (#34)
- Added per-artifact statistics endpoint `GET /api/v1/artifact/{id}/stats` (#34)
- Added cross-project deduplication endpoint `GET /api/v1/stats/cross-project` (#34)
- Added timeline statistics endpoint `GET /api/v1/stats/timeline` with daily/weekly/monthly periods (#34)
- Added stats export endpoint `GET /api/v1/stats/export` with JSON/CSV formats (#34)
- Added summary report endpoint `GET /api/v1/stats/report` with markdown/JSON formats (#34)
- Added Dashboard page at `/dashboard` with storage and deduplication visualizations (#34)
- Added pytest infrastructure with mock S3 client for unit testing (#35)
- Added unit tests for SHA256 hash calculation (#35)
- Added unit tests for duplicate detection and deduplication behavior (#35)
- Added integration tests for upload scenarios and ref_count management (#35)
- Added integration tests for S3 verification and failure cleanup (#35)
- Added integration tests for all stats endpoints (#35)
- Added test dependencies to requirements.txt (pytest, pytest-asyncio, pytest-cov, httpx, moto) (#35)
### Fixed
- Fixed Helm chart `minio.ingress` conflicting with Bitnami MinIO subchart by renaming to `minioIngress` (#48)
- Fixed JSON report serialization error for Decimal types in `GET /api/v1/stats/report` (#34)
## [0.3.0] - 2025-12-15
### Changed
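The Decimal serialization fix noted in the changelog is a common pitfall: SQL aggregates like SUM often come back as `decimal.Decimal`, which `json.dumps` rejects. The diff above doesn't show the actual fix; one standard approach is a `default=` hook (a sketch, not necessarily what the commit did):

```python
import json
from decimal import Decimal

def jsonable(value):
    """json.dumps default= hook: coerce Decimal aggregates to float.
    (Use str() instead if exact decimal representation matters.)"""
    if isinstance(value, Decimal):
        return float(value)
    raise TypeError(f"not JSON serializable: {type(value).__name__}")

report = {"storage_saved_bytes": Decimal("3234"), "deduplication_ratio": Decimal("1.25")}
print(json.dumps(report, default=jsonable))
```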

View File

@@ -22,7 +22,9 @@ class Settings(BaseSettings):
database_pool_size: int = 5 # Number of connections to keep open
database_max_overflow: int = 10 # Max additional connections beyond pool_size
database_pool_timeout: int = 30 # Seconds to wait for a connection from pool
database_pool_recycle: int = 1800 # Recycle connections after this many seconds (30 min)
database_pool_recycle: int = (
1800 # Recycle connections after this many seconds (30 min)
)
# S3
s3_endpoint: str = ""
@@ -31,10 +33,16 @@ class Settings(BaseSettings):
s3_access_key_id: str = ""
s3_secret_access_key: str = ""
s3_use_path_style: bool = True
s3_verify_ssl: bool = True # Set to False for self-signed certs (dev only)
s3_connect_timeout: int = 10 # Connection timeout in seconds
s3_read_timeout: int = 60 # Read timeout in seconds
s3_max_retries: int = 3 # Max retry attempts for transient failures
# Download settings
download_mode: str = "presigned" # "presigned", "redirect", or "proxy"
presigned_url_expiry: int = 3600 # Presigned URL expiry in seconds (default: 1 hour)
presigned_url_expiry: int = (
3600 # Presigned URL expiry in seconds (default: 1 hour)
)
@property
def database_url(self) -> str:

View File

@@ -1,3 +1,4 @@
import json
from datetime import datetime, timedelta, timezone
from fastapi import (
APIRouter,
@@ -13,7 +14,7 @@ from fastapi import (
)
from fastapi.responses import StreamingResponse, RedirectResponse
from sqlalchemy.orm import Session
from sqlalchemy import or_, func
from sqlalchemy import or_, func, text
from typing import List, Optional, Literal
import math
import re
@@ -29,6 +30,8 @@ from .storage import (
HashComputationError,
S3ExistenceCheckError,
S3UploadError,
S3StorageUnavailableError,
HashCollisionError,
)
from .models import (
Project,
@@ -78,6 +81,11 @@ from .schemas import (
StorageStatsResponse,
DeduplicationStatsResponse,
ProjectStatsResponse,
PackageStatsResponse,
ArtifactStatsResponse,
CrossProjectDeduplicationResponse,
TimeBasedStatsResponse,
StatsReportResponse,
)
from .metadata import extract_metadata
from .config import get_settings
@@ -263,7 +271,7 @@ def health_check(
# Check database connectivity
try:
db.execute("SELECT 1")
db.execute(text("SELECT 1"))
database_healthy = True
except Exception as e:
logger.warning(f"Database health check failed: {e}")
@@ -998,6 +1006,19 @@ def upload_artifact(
status_code=503,
detail="Storage service temporarily unavailable. Please retry.",
)
except S3StorageUnavailableError as e:
logger.error(f"S3 storage unavailable: {e}")
raise HTTPException(
status_code=503,
detail="Storage backend is unavailable. Please retry later.",
)
except HashCollisionError as e:
# This is extremely rare - log critical alert
logger.critical(f"HASH COLLISION DETECTED: {e}")
raise HTTPException(
status_code=500,
detail="Data integrity error detected. Please contact support.",
)
except StorageError as e:
logger.error(f"Storage error during upload: {e}")
raise HTTPException(status_code=500, detail="Internal storage error")
@@ -2131,9 +2152,13 @@ def get_artifact(artifact_id: str, db: Session = Depends(get_db)):
return ArtifactDetailResponse(
id=artifact.id,
sha256=artifact.id, # SHA256 hash is the artifact ID
size=artifact.size,
content_type=artifact.content_type,
original_name=artifact.original_name,
checksum_md5=artifact.checksum_md5,
checksum_sha1=artifact.checksum_sha1,
s3_etag=artifact.s3_etag,
created_at=artifact.created_at,
created_by=artifact.created_by,
ref_count=artifact.ref_count,
@@ -2411,11 +2436,12 @@ def get_deduplication_stats(
most_referenced = [
{
"id": a.id,
"artifact_id": a.id,
"ref_count": a.ref_count,
"size": a.size,
"storage_saved": a.size * (a.ref_count - 1),
"original_name": a.original_name,
"content_type": a.content_type,
}
for a in top_artifacts
]
@@ -2480,17 +2506,25 @@ def get_project_stats(
artifact_count = artifact_stats[0] if artifact_stats else 0
total_size_bytes = artifact_stats[1] if artifact_stats else 0
# Upload counts
# Upload counts and storage saved
upload_stats = (
db.query(
func.count(Upload.id),
func.count(Upload.id).filter(Upload.deduplicated == True),
func.coalesce(
func.sum(Artifact.size).filter(Upload.deduplicated == True), 0
),
)
.join(Artifact, Upload.artifact_id == Artifact.id)
.filter(Upload.package_id.in_(package_ids))
.first()
)
upload_count = upload_stats[0] if upload_stats else 0
deduplicated_uploads = upload_stats[1] if upload_stats else 0
storage_saved_bytes = upload_stats[2] if upload_stats else 0
# Calculate deduplication ratio
deduplication_ratio = upload_count / artifact_count if artifact_count > 0 else 1.0
return ProjectStatsResponse(
project_id=str(project.id),
@@ -2501,4 +2535,502 @@ def get_project_stats(
total_size_bytes=total_size_bytes,
upload_count=upload_count,
deduplicated_uploads=deduplicated_uploads,
storage_saved_bytes=storage_saved_bytes,
deduplication_ratio=deduplication_ratio,
)
# =============================================================================
# Package Statistics Endpoint
# =============================================================================
@router.get(
"/api/v1/project/{project_name}/packages/{package_name}/stats",
response_model=PackageStatsResponse,
)
def get_package_stats(
project_name: str,
package_name: str,
db: Session = Depends(get_db),
):
"""Get statistics for a specific package."""
project = db.query(Project).filter(Project.name == project_name).first()
if not project:
raise HTTPException(status_code=404, detail="Project not found")
package = (
db.query(Package)
.filter(Package.project_id == project.id, Package.name == package_name)
.first()
)
if not package:
raise HTTPException(status_code=404, detail="Package not found")
# Tag count
tag_count = (
db.query(func.count(Tag.id)).filter(Tag.package_id == package.id).scalar() or 0
)
# Artifact stats via uploads
artifact_stats = (
db.query(
func.count(func.distinct(Upload.artifact_id)),
func.coalesce(func.sum(Artifact.size), 0),
)
.join(Artifact, Upload.artifact_id == Artifact.id)
.filter(Upload.package_id == package.id)
.first()
)
artifact_count = artifact_stats[0] if artifact_stats else 0
total_size_bytes = artifact_stats[1] if artifact_stats else 0
# Upload stats
upload_stats = (
db.query(
func.count(Upload.id),
func.count(Upload.id).filter(Upload.deduplicated == True),
func.coalesce(
func.sum(Artifact.size).filter(Upload.deduplicated == True), 0
),
)
.join(Artifact, Upload.artifact_id == Artifact.id)
.filter(Upload.package_id == package.id)
.first()
)
upload_count = upload_stats[0] if upload_stats else 0
deduplicated_uploads = upload_stats[1] if upload_stats else 0
storage_saved_bytes = upload_stats[2] if upload_stats else 0
deduplication_ratio = upload_count / artifact_count if artifact_count > 0 else 1.0
return PackageStatsResponse(
package_id=str(package.id),
package_name=package.name,
project_name=project.name,
tag_count=tag_count,
artifact_count=artifact_count,
total_size_bytes=total_size_bytes,
upload_count=upload_count,
deduplicated_uploads=deduplicated_uploads,
storage_saved_bytes=storage_saved_bytes,
deduplication_ratio=deduplication_ratio,
)
# =============================================================================
# Artifact Statistics Endpoint
# =============================================================================
@router.get(
"/api/v1/artifact/{artifact_id}/stats", response_model=ArtifactStatsResponse
)
def get_artifact_stats(
artifact_id: str,
db: Session = Depends(get_db),
):
"""Get detailed statistics for a specific artifact."""
artifact = db.query(Artifact).filter(Artifact.id == artifact_id).first()
if not artifact:
raise HTTPException(status_code=404, detail="Artifact not found")
# Get all tags referencing this artifact
tags = (
db.query(Tag, Package, Project)
.join(Package, Tag.package_id == Package.id)
.join(Project, Package.project_id == Project.id)
.filter(Tag.artifact_id == artifact_id)
.all()
)
tag_list = [
{
"tag_name": tag.name,
"package_name": pkg.name,
"project_name": proj.name,
"created_at": tag.created_at.isoformat() if tag.created_at else None,
}
for tag, pkg, proj in tags
]
# Get unique projects and packages
projects = list(set(proj.name for _, _, proj in tags))
packages = list(set(f"{proj.name}/{pkg.name}" for _, pkg, proj in tags))
# Get first and last upload times
upload_times = (
db.query(func.min(Upload.uploaded_at), func.max(Upload.uploaded_at))
.filter(Upload.artifact_id == artifact_id)
.first()
)
return ArtifactStatsResponse(
artifact_id=artifact.id,
sha256=artifact.id,
size=artifact.size,
ref_count=artifact.ref_count,
storage_savings=(artifact.ref_count - 1) * artifact.size
if artifact.ref_count > 1
else 0,
tags=tag_list,
projects=projects,
packages=packages,
first_uploaded=upload_times[0] if upload_times else None,
last_referenced=upload_times[1] if upload_times else None,
)
# =============================================================================
# Cross-Project Deduplication Endpoint
# =============================================================================
@router.get(
"/api/v1/stats/cross-project", response_model=CrossProjectDeduplicationResponse
)
def get_cross_project_deduplication(
limit: int = Query(default=20, ge=1, le=100),
db: Session = Depends(get_db),
):
"""Get statistics about artifacts shared across multiple projects."""
# Find artifacts that appear in multiple projects
# Subquery to count distinct projects per artifact
project_counts = (
db.query(
Upload.artifact_id,
func.count(func.distinct(Package.project_id)).label("project_count"),
)
.join(Package, Upload.package_id == Package.id)
.group_by(Upload.artifact_id)
.subquery()
)
# Get artifacts with more than one project
shared_artifacts_query = (
db.query(Artifact, project_counts.c.project_count)
.join(project_counts, Artifact.id == project_counts.c.artifact_id)
.filter(project_counts.c.project_count > 1)
.order_by(project_counts.c.project_count.desc(), Artifact.size.desc())
.limit(limit)
)
shared_artifacts = []
total_savings = 0
for artifact, project_count in shared_artifacts_query:
# Calculate savings: (project_count - 1) * size
savings = (project_count - 1) * artifact.size
total_savings += savings
# Get project names
project_names = (
db.query(func.distinct(Project.name))
.join(Package, Package.project_id == Project.id)
.join(Upload, Upload.package_id == Package.id)
.filter(Upload.artifact_id == artifact.id)
.all()
)
shared_artifacts.append(
{
"artifact_id": artifact.id,
"size": artifact.size,
"project_count": project_count,
"projects": [p[0] for p in project_names],
"storage_savings": savings,
}
)
# Total count of shared artifacts
shared_count = (
db.query(func.count())
.select_from(project_counts)
.filter(project_counts.c.project_count > 1)
.scalar()
or 0
)
return CrossProjectDeduplicationResponse(
shared_artifacts_count=shared_count,
total_cross_project_savings=total_savings,
shared_artifacts=shared_artifacts,
)
# =============================================================================
# Time-Based Statistics Endpoint
# =============================================================================
@router.get("/api/v1/stats/timeline", response_model=TimeBasedStatsResponse)
def get_time_based_stats(
period: str = Query(default="daily", regex="^(daily|weekly|monthly)$"),
from_date: Optional[datetime] = Query(default=None),
to_date: Optional[datetime] = Query(default=None),
db: Session = Depends(get_db),
):
"""Get deduplication statistics over time."""
from datetime import timedelta
# Default date range: last 30 days
if to_date is None:
to_date = datetime.utcnow()
if from_date is None:
from_date = to_date - timedelta(days=30)
# Determine date truncation based on period
if period == "daily":
date_trunc = func.date_trunc("day", Upload.uploaded_at)
elif period == "weekly":
date_trunc = func.date_trunc("week", Upload.uploaded_at)
else: # monthly
date_trunc = func.date_trunc("month", Upload.uploaded_at)
# Query uploads grouped by period
stats = (
db.query(
date_trunc.label("period_start"),
func.count(Upload.id).label("total_uploads"),
func.count(func.distinct(Upload.artifact_id)).label("unique_artifacts"),
func.count(Upload.id)
.filter(Upload.deduplicated == True)
.label("duplicated"),
func.coalesce(
func.sum(Artifact.size).filter(Upload.deduplicated == True), 0
).label("bytes_saved"),
)
.join(Artifact, Upload.artifact_id == Artifact.id)
.filter(Upload.uploaded_at >= from_date, Upload.uploaded_at <= to_date)
.group_by(date_trunc)
.order_by(date_trunc)
.all()
)
data_points = [
{
"date": row.period_start.isoformat() if row.period_start else None,
"total_uploads": row.total_uploads,
"unique_artifacts": row.unique_artifacts,
"duplicated_uploads": row.duplicated,
"bytes_saved": row.bytes_saved,
}
for row in stats
]
return TimeBasedStatsResponse(
period=period,
start_date=from_date,
end_date=to_date,
data_points=data_points,
)
# =============================================================================
# CSV Export Endpoint
# =============================================================================
@router.get("/api/v1/stats/export")
def export_stats(
format: str = Query(default="json", regex="^(json|csv)$"),
db: Session = Depends(get_db),
):
"""Export global statistics in JSON or CSV format."""
from fastapi.responses import Response
# Gather all stats
total_artifacts = db.query(func.count(Artifact.id)).scalar() or 0
total_size = db.query(func.coalesce(func.sum(Artifact.size), 0)).scalar() or 0
total_uploads = db.query(func.count(Upload.id)).scalar() or 0
deduplicated_uploads = (
db.query(func.count(Upload.id)).filter(Upload.deduplicated == True).scalar()
or 0
)
unique_artifacts = (
db.query(func.count(Artifact.id)).filter(Artifact.ref_count > 0).scalar() or 0
)
storage_saved = (
db.query(func.coalesce(func.sum(Artifact.size), 0))
.join(Upload, Upload.artifact_id == Artifact.id)
.filter(Upload.deduplicated == True)
.scalar()
or 0
)
stats = {
"generated_at": datetime.utcnow().isoformat(),
"total_artifacts": total_artifacts,
"total_size_bytes": total_size,
"total_uploads": total_uploads,
"unique_artifacts": unique_artifacts,
"deduplicated_uploads": deduplicated_uploads,
"storage_saved_bytes": storage_saved,
"deduplication_ratio": total_uploads / unique_artifacts
if unique_artifacts > 0
else 1.0,
}
if format == "csv":
import csv
import io
output = io.StringIO()
writer = csv.writer(output)
writer.writerow(["Metric", "Value"])
for key, value in stats.items():
writer.writerow([key, value])
return Response(
content=output.getvalue(),
media_type="text/csv",
headers={"Content-Disposition": "attachment; filename=orchard_stats.csv"},
)
return stats
# =============================================================================
# Summary Report Endpoint
# =============================================================================
@router.get("/api/v1/stats/report", response_model=StatsReportResponse)
def generate_stats_report(
format: str = Query(default="markdown", regex="^(markdown|json)$"),
db: Session = Depends(get_db),
):
"""Generate a summary report of storage and deduplication statistics."""
# Gather stats
total_artifacts = db.query(func.count(Artifact.id)).scalar() or 0
total_size = int(db.query(func.coalesce(func.sum(Artifact.size), 0)).scalar() or 0)
total_uploads = db.query(func.count(Upload.id)).scalar() or 0
deduplicated_uploads = (
db.query(func.count(Upload.id)).filter(Upload.deduplicated == True).scalar()
or 0
)
unique_artifacts = (
db.query(func.count(Artifact.id)).filter(Artifact.ref_count > 0).scalar() or 0
)
orphaned_artifacts = (
db.query(func.count(Artifact.id)).filter(Artifact.ref_count == 0).scalar() or 0
)
storage_saved = int(
db.query(func.coalesce(func.sum(Artifact.size), 0))
.join(Upload, Upload.artifact_id == Artifact.id)
.filter(Upload.deduplicated == True)
.scalar()
or 0
)
project_count = db.query(func.count(Project.id)).scalar() or 0
package_count = db.query(func.count(Package.id)).scalar() or 0
# Top 5 most referenced artifacts
top_artifacts = (
db.query(Artifact)
.filter(Artifact.ref_count > 1)
.order_by(Artifact.ref_count.desc())
.limit(5)
.all()
)
def format_bytes(b):
for unit in ["B", "KB", "MB", "GB", "TB"]:
if b < 1024:
return f"{b:.2f} {unit}"
b /= 1024
return f"{b:.2f} PB"
generated_at = datetime.utcnow()
if format == "markdown":
report = f"""# Orchard Storage Report
Generated: {generated_at.strftime("%Y-%m-%d %H:%M:%S UTC")}
## Overview
| Metric | Value |
|--------|-------|
| Projects | {project_count} |
| Packages | {package_count} |
| Total Artifacts | {total_artifacts} |
| Unique Artifacts | {unique_artifacts} |
| Orphaned Artifacts | {orphaned_artifacts} |
## Storage
| Metric | Value |
|--------|-------|
| Total Storage Used | {format_bytes(total_size)} |
| Storage Saved | {format_bytes(storage_saved)} |
| Savings Percentage | {(storage_saved / (total_size + storage_saved) * 100) if (total_size + storage_saved) > 0 else 0:.1f}% |
## Uploads
| Metric | Value |
|--------|-------|
| Total Uploads | {total_uploads} |
| Deduplicated Uploads | {deduplicated_uploads} |
| Deduplication Ratio | {total_uploads / unique_artifacts if unique_artifacts > 0 else 1:.2f}x |
## Top Referenced Artifacts
| Artifact ID | Size | References | Savings |
|-------------|------|------------|---------|
"""
for art in top_artifacts:
savings = (art.ref_count - 1) * art.size
report += f"| `{art.id[:12]}...` | {format_bytes(art.size)} | {art.ref_count} | {format_bytes(savings)} |\n"
return StatsReportResponse(
format="markdown",
generated_at=generated_at,
content=report,
)
# JSON format
return StatsReportResponse(
format="json",
generated_at=generated_at,
content=json.dumps(
{
"overview": {
"projects": project_count,
"packages": package_count,
"total_artifacts": total_artifacts,
"unique_artifacts": unique_artifacts,
"orphaned_artifacts": orphaned_artifacts,
},
"storage": {
"total_bytes": total_size,
"saved_bytes": storage_saved,
"savings_percentage": (
storage_saved / (total_size + storage_saved) * 100
)
if (total_size + storage_saved) > 0
else 0,
},
"uploads": {
"total": total_uploads,
"deduplicated": deduplicated_uploads,
"ratio": total_uploads / unique_artifacts
if unique_artifacts > 0
else 1,
},
"top_artifacts": [
{
"id": art.id,
"size": art.size,
"ref_count": art.ref_count,
"savings": (art.ref_count - 1) * art.size,
}
for art in top_artifacts
],
},
indent=2,
),
)

View File

@@ -456,3 +456,62 @@ class ProjectStatsResponse(BaseModel):
total_size_bytes: int
upload_count: int
deduplicated_uploads: int
storage_saved_bytes: int = 0 # Bytes saved through deduplication
deduplication_ratio: float = 1.0 # upload_count / artifact_count
class PackageStatsResponse(BaseModel):
"""Per-package statistics"""
package_id: str
package_name: str
project_name: str
tag_count: int
artifact_count: int
total_size_bytes: int
upload_count: int
deduplicated_uploads: int
storage_saved_bytes: int = 0
deduplication_ratio: float = 1.0
class ArtifactStatsResponse(BaseModel):
"""Per-artifact reference statistics"""
artifact_id: str
sha256: str
size: int
ref_count: int
storage_savings: int # (ref_count - 1) * size
tags: List[Dict[str, Any]] # Tags referencing this artifact
projects: List[str] # Projects using this artifact
packages: List[str] # Packages using this artifact
first_uploaded: Optional[datetime] = None
last_referenced: Optional[datetime] = None
class CrossProjectDeduplicationResponse(BaseModel):
"""Cross-project deduplication statistics"""
shared_artifacts_count: int # Artifacts used in multiple projects
total_cross_project_savings: int # Bytes saved by cross-project sharing
shared_artifacts: List[Dict[str, Any]] # Details of shared artifacts
class TimeBasedStatsResponse(BaseModel):
"""Time-based deduplication statistics"""
period: str # "daily", "weekly", "monthly"
start_date: datetime
end_date: datetime
data_points: List[
Dict[str, Any]
] # List of {date, uploads, unique, duplicated, bytes_saved}
class StatsReportResponse(BaseModel):
"""Summary report in various formats"""
format: str # "json", "csv", "markdown"
generated_at: datetime
content: str # The report content

View File

@@ -14,7 +14,13 @@ from typing import (
)
import boto3
from botocore.config import Config
from botocore.exceptions import ClientError
from botocore.exceptions import (
ClientError,
ConnectionError as BotoConnectionError,
EndpointConnectionError,
ReadTimeoutError,
ConnectTimeoutError,
)
from .config import get_settings
@@ -193,10 +199,33 @@ class StorageResult(NamedTuple):
s3_etag: Optional[str] = None
class S3StorageUnavailableError(StorageError):
"""Raised when S3 storage backend is unavailable"""
pass
class HashCollisionError(StorageError):
"""Raised when a hash collision is detected (extremely rare)"""
pass
class S3Storage:
def __init__(self):
# Build config with retry and timeout settings
s3_config = {}
if settings.s3_use_path_style:
s3_config["addressing_style"] = "path"
config = Config(
s3={"addressing_style": "path"} if settings.s3_use_path_style else {}
s3=s3_config if s3_config else None,
connect_timeout=settings.s3_connect_timeout,
read_timeout=settings.s3_read_timeout,
retries={
"max_attempts": settings.s3_max_retries,
"mode": "adaptive", # Adaptive retry mode for better handling
},
)
self.client = boto3.client(
@@ -206,6 +235,7 @@ class S3Storage:
aws_access_key_id=settings.s3_access_key_id,
aws_secret_access_key=settings.s3_secret_access_key,
config=config,
verify=settings.s3_verify_ssl, # SSL/TLS verification
)
self.bucket = settings.s3_bucket
# Store active multipart uploads for resumable support
@@ -271,14 +301,38 @@ class S3Storage:
Body=content,
)
s3_etag = response.get("ETag", "").strip('"')
except (EndpointConnectionError, BotoConnectionError) as e:
logger.error(f"S3 storage unavailable: {e}")
raise S3StorageUnavailableError(
f"Storage backend unavailable: {e}"
) from e
except (ReadTimeoutError, ConnectTimeoutError) as e:
logger.error(f"S3 operation timed out: {e}")
raise S3UploadError(f"Upload timed out: {e}") from e
except ClientError as e:
error_code = e.response.get("Error", {}).get("Code", "")
if error_code == "ServiceUnavailable":
logger.error(f"S3 service unavailable: {e}")
raise S3StorageUnavailableError(
f"Storage service unavailable: {e}"
) from e
logger.error(f"S3 upload failed: {e}")
raise S3UploadError(f"Failed to upload to S3: {e}") from e
else:
# Get existing ETag
# Get existing ETag and verify integrity (detect potential hash collision)
obj_info = self.get_object_info(s3_key)
if obj_info:
s3_etag = obj_info.get("etag", "").strip('"')
# Check for hash collision by comparing size
existing_size = obj_info.get("size", 0)
if existing_size != size:
logger.critical(
f"HASH COLLISION DETECTED! Hash {sha256_hash} has size mismatch: "
f"existing={existing_size}, new={size}. This is extremely rare."
)
raise HashCollisionError(
f"Hash collision detected for {sha256_hash}: size mismatch"
)
return StorageResult(
sha256=sha256_hash,
@@ -341,6 +395,17 @@ class S3Storage:
if exists:
obj_info = self.get_object_info(s3_key)
s3_etag = obj_info.get("etag", "").strip('"') if obj_info else None
# Check for hash collision by comparing size
if obj_info:
existing_size = obj_info.get("size", 0)
if existing_size != size:
logger.critical(
f"HASH COLLISION DETECTED! Hash {sha256_hash} has size mismatch: "
f"existing={existing_size}, new={size}. This is extremely rare."
)
raise HashCollisionError(
f"Hash collision detected for {sha256_hash}: size mismatch"
)
return StorageResult(
sha256=sha256_hash,
size=size,
@@ -354,7 +419,11 @@ class S3Storage:
file.seek(0)
# Start multipart upload
mpu = self.client.create_multipart_upload(Bucket=self.bucket, Key=s3_key)
try:
mpu = self.client.create_multipart_upload(Bucket=self.bucket, Key=s3_key)
except (EndpointConnectionError, BotoConnectionError) as e:
logger.error(f"S3 storage unavailable for multipart upload: {e}")
raise S3StorageUnavailableError(f"Storage backend unavailable: {e}") from e
upload_id = mpu["UploadId"]
try:

View File

@@ -4,7 +4,7 @@ python_files = test_*.py
python_functions = test_*
python_classes = Test*
asyncio_mode = auto
addopts = -v --tb=short
addopts = -v --tb=short --cov=app --cov-report=term-missing --cov-report=html:coverage_html --cov-fail-under=0
filterwarnings =
ignore::DeprecationWarning
ignore::UserWarning
@@ -12,3 +12,18 @@ markers =
unit: Unit tests (no external dependencies)
integration: Integration tests (require database/storage)
slow: Slow tests (skip with -m "not slow")
# Coverage configuration
[coverage:run]
source = app
omit =
*/tests/*
*/__pycache__/*
[coverage:report]
exclude_lines =
pragma: no cover
def __repr__
raise NotImplementedError
if __name__ == .__main__.:
pass

View File

@@ -14,16 +14,17 @@ from typing import Generator, BinaryIO
from unittest.mock import MagicMock, patch
import io
-# Set test environment before importing app modules
-os.environ["ORCHARD_DATABASE_HOST"] = "localhost"
-os.environ["ORCHARD_DATABASE_PORT"] = "5432"
-os.environ["ORCHARD_DATABASE_USER"] = "test"
-os.environ["ORCHARD_DATABASE_PASSWORD"] = "test"
-os.environ["ORCHARD_DATABASE_DBNAME"] = "orchard_test"
-os.environ["ORCHARD_S3_ENDPOINT"] = "http://localhost:9000"
-os.environ["ORCHARD_S3_BUCKET"] = "test-bucket"
-os.environ["ORCHARD_S3_ACCESS_KEY_ID"] = "test"
-os.environ["ORCHARD_S3_SECRET_ACCESS_KEY"] = "test"
+# Set test environment defaults before importing app modules
+# Use setdefault to NOT override existing env vars (from docker-compose)
+os.environ.setdefault("ORCHARD_DATABASE_HOST", "localhost")
+os.environ.setdefault("ORCHARD_DATABASE_PORT", "5432")
+os.environ.setdefault("ORCHARD_DATABASE_USER", "test")
+os.environ.setdefault("ORCHARD_DATABASE_PASSWORD", "test")
+os.environ.setdefault("ORCHARD_DATABASE_DBNAME", "orchard_test")
+os.environ.setdefault("ORCHARD_S3_ENDPOINT", "http://localhost:9000")
+os.environ.setdefault("ORCHARD_S3_BUCKET", "test-bucket")
+os.environ.setdefault("ORCHARD_S3_ACCESS_KEY_ID", "test")
+os.environ.setdefault("ORCHARD_S3_SECRET_ACCESS_KEY", "test")
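The switch to `setdefault` matters because it only writes a key when it is absent, so values injected by docker-compose (or the shell) always win over these test defaults. A minimal sketch with hypothetical `DEMO_*` keys:

```python
import os

# Simulate a value pre-set by docker-compose before conftest runs.
os.environ["DEMO_HOST"] = "db.internal"
os.environ.setdefault("DEMO_HOST", "localhost")  # no-op: key already present
os.environ.setdefault("DEMO_PORT", "5432")       # applied: key was absent

assert os.environ["DEMO_HOST"] == "db.internal"
assert os.environ["DEMO_PORT"] == "5432"
```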
# =============================================================================
@@ -199,3 +200,215 @@ def test_app():
from app.main import app
return TestClient(app)
# =============================================================================
# Integration Test Fixtures
# =============================================================================
@pytest.fixture
def integration_client():
"""
Create a test client for integration tests.
Uses the real database and MinIO from docker-compose.local.yml.
"""
from httpx import Client
# Connect to the running orchard-server container
base_url = os.environ.get("ORCHARD_TEST_URL", "http://localhost:8080")
with Client(base_url=base_url, timeout=30.0) as client:
yield client
@pytest.fixture
def unique_test_id():
"""Generate a unique ID for test isolation."""
import uuid
return f"test-{uuid.uuid4().hex[:8]}"
@pytest.fixture
def test_project(integration_client, unique_test_id):
"""
Create a test project and clean it up after the test.
Yields the project name.
"""
project_name = f"test-project-{unique_test_id}"
# Create project
response = integration_client.post(
"/api/v1/projects",
json={"name": project_name, "description": "Test project", "is_public": True},
)
assert response.status_code == 200, f"Failed to create project: {response.text}"
yield project_name
# Cleanup: delete project
try:
integration_client.delete(f"/api/v1/projects/{project_name}")
except Exception:
pass # Ignore cleanup errors
@pytest.fixture
def test_package(integration_client, test_project, unique_test_id):
"""
Create a test package within a test project.
Yields (project_name, package_name) tuple.
"""
package_name = f"test-package-{unique_test_id}"
# Create package
response = integration_client.post(
f"/api/v1/project/{test_project}/packages",
json={"name": package_name, "description": "Test package"},
)
assert response.status_code == 200, f"Failed to create package: {response.text}"
yield (test_project, package_name)
# Cleanup handled by test_project fixture (cascade delete)
@pytest.fixture
def test_content():
"""
Generate unique test content for each test.
Returns (content_bytes, expected_sha256) tuple.
"""
import uuid
content = f"test-content-{uuid.uuid4().hex}".encode()
sha256 = compute_sha256(content)
return (content, sha256)
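`compute_sha256` is defined earlier in conftest, outside this hunk; the fixtures assume it is a thin wrapper over `hashlib` whose hex digest matches the `artifact_id` the server returns. A sketch of that assumption:

```python
import hashlib

def compute_sha256(content: bytes) -> str:
    """Hex-encoded SHA-256 digest; identical content maps to one artifact_id."""
    return hashlib.sha256(content).hexdigest()

# Deterministic: re-hashing the same bytes yields the same 64-char hex id.
assert compute_sha256(b"abc") == compute_sha256(b"abc")
assert len(compute_sha256(b"abc")) == 64
```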
def upload_test_file(
client,
project: str,
package: str,
content: bytes,
filename: str = "test.bin",
tag: str | None = None,
) -> dict:
"""
Helper function to upload a test file.
Returns the upload response as a dict.
"""
files = {"file": (filename, io.BytesIO(content), "application/octet-stream")}
data = {}
if tag:
data["tag"] = tag
response = client.post(
f"/api/v1/project/{project}/{package}/upload",
files=files,
data=data if data else None,
)
assert response.status_code == 200, f"Upload failed: {response.text}"
return response.json()
# =============================================================================
# S3 Direct Access Helpers (for integration tests)
# =============================================================================
def get_s3_client():
"""
Create a boto3 S3 client for direct S3 access in integration tests.
Uses environment variables for configuration (same as the app).
Note: When running in container, S3 endpoint should be 'minio:9000' not 'localhost:9000'.
"""
import boto3
from botocore.config import Config
config = Config(s3={"addressing_style": "path"})
# Use the same endpoint as the app (minio:9000 in container, localhost:9000 locally)
endpoint = os.environ.get("ORCHARD_S3_ENDPOINT", "http://minio:9000")
return boto3.client(
"s3",
endpoint_url=endpoint,
region_name=os.environ.get("ORCHARD_S3_REGION", "us-east-1"),
aws_access_key_id=os.environ.get("ORCHARD_S3_ACCESS_KEY_ID", "minioadmin"),
aws_secret_access_key=os.environ.get(
"ORCHARD_S3_SECRET_ACCESS_KEY", "minioadmin"
),
config=config,
)
def get_s3_bucket():
"""Get the S3 bucket name from environment."""
return os.environ.get("ORCHARD_S3_BUCKET", "orchard-artifacts")
def list_s3_objects_by_hash(sha256_hash: str) -> list:
"""
List S3 objects that match a specific SHA256 hash.
Uses the fruits/{hash[:2]}/{hash[2:4]}/{hash} key pattern.
Returns list of matching object keys.
"""
client = get_s3_client()
bucket = get_s3_bucket()
prefix = f"fruits/{sha256_hash[:2]}/{sha256_hash[2:4]}/{sha256_hash}"
response = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
if "Contents" not in response:
return []
return [obj["Key"] for obj in response["Contents"]]
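The `fruits/{hash[:2]}/{hash[2:4]}/{hash}` layout shards objects by the first two byte-pairs of the digest so no single prefix listing grows unbounded; deriving a key is pure string slicing (the digest below is illustrative, not from a real upload):

```python
import hashlib

def fruits_key(sha256_hash: str) -> str:
    # Two-level fan-out: e.g. "fruits/ab/cd/abcd…" for a digest starting "abcd".
    return f"fruits/{sha256_hash[:2]}/{sha256_hash[2:4]}/{sha256_hash}"

digest = hashlib.sha256(b"example").hexdigest()
key = fruits_key(digest)
assert key.startswith(f"fruits/{digest[:2]}/{digest[2:4]}/")
assert key.endswith(digest)
```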
def count_s3_objects_by_prefix(prefix: str) -> int:
"""
Count S3 objects with a given prefix.
Useful for checking if duplicate uploads created multiple objects.
"""
client = get_s3_client()
bucket = get_s3_bucket()
response = client.list_objects_v2(Bucket=bucket, Prefix=prefix)
if "Contents" not in response:
return 0
return len(response["Contents"])
def s3_object_exists(sha256_hash: str) -> bool:
"""
Check if an S3 object exists for a given SHA256 hash.
"""
objects = list_s3_objects_by_hash(sha256_hash)
return len(objects) > 0
def delete_s3_object_by_hash(sha256_hash: str) -> bool:
"""
Delete an S3 object by its SHA256 hash (for test cleanup).
"""
client = get_s3_client()
bucket = get_s3_bucket()
s3_key = f"fruits/{sha256_hash[:2]}/{sha256_hash[2:4]}/{sha256_hash}"
try:
client.delete_object(Bucket=bucket, Key=s3_key)
return True
except Exception:
return False

View File

@@ -0,0 +1,551 @@
"""
Integration tests for duplicate uploads and storage verification.
These tests require the full stack to be running (docker-compose.local.yml).
Tests cover:
- Duplicate upload scenarios across packages and projects
- Storage verification (single S3 object, single artifact row)
- Upload table tracking
- Content integrity verification
- Concurrent upload handling
- Failure cleanup
"""
import pytest
import io
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from tests.conftest import (
compute_sha256,
upload_test_file,
list_s3_objects_by_hash,
s3_object_exists,
delete_s3_object_by_hash,
)
class TestDuplicateUploadScenarios:
"""Integration tests for duplicate upload behavior."""
@pytest.mark.integration
def test_same_file_twice_returns_same_artifact_id(
self, integration_client, test_package
):
"""Test uploading same file twice returns same artifact_id."""
project, package = test_package
content = b"content uploaded twice for same artifact test"
expected_hash = compute_sha256(content)
# First upload
result1 = upload_test_file(
integration_client, project, package, content, tag="first"
)
assert result1["artifact_id"] == expected_hash
# Second upload
result2 = upload_test_file(
integration_client, project, package, content, tag="second"
)
assert result2["artifact_id"] == expected_hash
assert result1["artifact_id"] == result2["artifact_id"]
@pytest.mark.integration
def test_same_file_twice_increments_ref_count(
self, integration_client, test_package
):
"""Test uploading same file twice increments ref_count to 2."""
project, package = test_package
content = b"content for ref count increment test"
# First upload
result1 = upload_test_file(
integration_client, project, package, content, tag="v1"
)
assert result1["ref_count"] == 1
# Second upload
result2 = upload_test_file(
integration_client, project, package, content, tag="v2"
)
assert result2["ref_count"] == 2
@pytest.mark.integration
def test_same_file_different_packages_shares_artifact(
self, integration_client, test_project, unique_test_id
):
"""Test uploading same file to different packages shares artifact."""
project = test_project
content = f"content shared across packages {unique_test_id}".encode()
expected_hash = compute_sha256(content)
# Create two packages
pkg1 = f"package-a-{unique_test_id}"
pkg2 = f"package-b-{unique_test_id}"
integration_client.post(
f"/api/v1/project/{project}/packages",
json={"name": pkg1, "description": "Package A"},
)
integration_client.post(
f"/api/v1/project/{project}/packages",
json={"name": pkg2, "description": "Package B"},
)
# Upload to first package
result1 = upload_test_file(integration_client, project, pkg1, content, tag="v1")
assert result1["artifact_id"] == expected_hash
assert result1["deduplicated"] is False
# Upload to second package
result2 = upload_test_file(integration_client, project, pkg2, content, tag="v1")
assert result2["artifact_id"] == expected_hash
assert result2["deduplicated"] is True
@pytest.mark.integration
def test_same_file_different_projects_shares_artifact(
self, integration_client, unique_test_id
):
"""Test uploading same file to different projects shares artifact."""
content = f"content shared across projects {unique_test_id}".encode()
expected_hash = compute_sha256(content)
# Create two projects with packages
proj1 = f"project-x-{unique_test_id}"
proj2 = f"project-y-{unique_test_id}"
pkg_name = "shared-pkg"
try:
# Create projects and packages
integration_client.post(
"/api/v1/projects",
json={"name": proj1, "description": "Project X", "is_public": True},
)
integration_client.post(
"/api/v1/projects",
json={"name": proj2, "description": "Project Y", "is_public": True},
)
integration_client.post(
f"/api/v1/project/{proj1}/packages",
json={"name": pkg_name, "description": "Package"},
)
integration_client.post(
f"/api/v1/project/{proj2}/packages",
json={"name": pkg_name, "description": "Package"},
)
# Upload to first project
result1 = upload_test_file(
integration_client, proj1, pkg_name, content, tag="v1"
)
assert result1["artifact_id"] == expected_hash
assert result1["deduplicated"] is False
# Upload to second project
result2 = upload_test_file(
integration_client, proj2, pkg_name, content, tag="v1"
)
assert result2["artifact_id"] == expected_hash
assert result2["deduplicated"] is True
finally:
# Cleanup
integration_client.delete(f"/api/v1/projects/{proj1}")
integration_client.delete(f"/api/v1/projects/{proj2}")
@pytest.mark.integration
def test_same_file_different_filenames_shares_artifact(
self, integration_client, test_package
):
"""Test uploading same file with different original filenames shares artifact."""
project, package = test_package
content = b"content with different filenames"
expected_hash = compute_sha256(content)
# Upload with filename1
result1 = upload_test_file(
integration_client,
project,
package,
content,
filename="file1.bin",
tag="v1",
)
assert result1["artifact_id"] == expected_hash
# Upload with filename2
result2 = upload_test_file(
integration_client,
project,
package,
content,
filename="file2.bin",
tag="v2",
)
assert result2["artifact_id"] == expected_hash
assert result2["deduplicated"] is True
@pytest.mark.integration
def test_same_file_different_tags_shares_artifact(
self, integration_client, test_package, unique_test_id
):
"""Test uploading same file with different tags shares artifact."""
project, package = test_package
content = f"content with different tags {unique_test_id}".encode()
expected_hash = compute_sha256(content)
tags = ["latest", "stable", "v1.0.0", "release"]
for i, tag in enumerate(tags):
result = upload_test_file(
integration_client, project, package, content, tag=tag
)
assert result["artifact_id"] == expected_hash
if i == 0:
assert result["deduplicated"] is False
else:
assert result["deduplicated"] is True
class TestStorageVerification:
"""Tests to verify storage behavior after duplicate uploads."""
@pytest.mark.integration
def test_artifact_table_single_row_after_duplicates(
self, integration_client, test_package
):
"""Test artifact table contains only one row after duplicate uploads."""
project, package = test_package
content = b"content for single row test"
expected_hash = compute_sha256(content)
# Upload same content multiple times with different tags
for tag in ["v1", "v2", "v3"]:
upload_test_file(integration_client, project, package, content, tag=tag)
# Query artifact - should exist and be unique
response = integration_client.get(f"/api/v1/artifact/{expected_hash}")
assert response.status_code == 200
artifact = response.json()
assert artifact["id"] == expected_hash
assert artifact["ref_count"] == 3
@pytest.mark.integration
def test_upload_table_multiple_rows_for_duplicates(
self, integration_client, test_package
):
"""Test upload table contains multiple rows for duplicate uploads (event tracking)."""
project, package = test_package
content = b"content for upload tracking test"
# Upload same content 3 times
for tag in ["upload1", "upload2", "upload3"]:
upload_test_file(integration_client, project, package, content, tag=tag)
# Check package info - three tagged uploads should yield three tags, all on one artifact
response = integration_client.get(
f"/api/v1/project/{project}/packages/{package}"
)
assert response.status_code == 200
pkg_info = response.json()
assert pkg_info["tag_count"] == 3
@pytest.mark.integration
def test_artifact_content_matches_original(self, integration_client, test_package):
"""Test artifact content retrieved matches original content exactly."""
project, package = test_package
original_content = b"exact content verification test data 12345"
# Upload
result = upload_test_file(
integration_client, project, package, original_content, tag="verify"
)
# Download and compare
download_response = integration_client.get(
f"/api/v1/project/{project}/{package}/+/verify", params={"mode": "proxy"}
)
assert download_response.status_code == 200
downloaded_content = download_response.content
assert downloaded_content == original_content
@pytest.mark.integration
def test_storage_stats_reflect_deduplication(
self, integration_client, test_package
):
"""Test total storage size matches single artifact size after duplicates."""
project, package = test_package
content = b"content for storage stats test - should only count once"
content_size = len(content)
# Upload same content 5 times
for tag in ["a", "b", "c", "d", "e"]:
upload_test_file(integration_client, project, package, content, tag=tag)
# Check global stats
response = integration_client.get("/api/v1/stats")
assert response.status_code == 200
stats = response.json()
# Deduplication should show savings
assert stats["deduplicated_uploads"] > 0
assert stats["storage_saved_bytes"] > 0
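The savings this assertion expects follow from simple arithmetic: five uploads of one unique blob store its bytes once. The field names below mirror the `/api/v1/stats` response, but the formulas are an assumption about how the server derives them, not its actual query:

```python
content_size = 55       # size of the test payload, in bytes
total_uploads = 5       # five tags, same content
unique_artifacts = 1    # one content-addressed blob

deduplicated_uploads = total_uploads - unique_artifacts       # uploads that hit dedup
storage_saved_bytes = deduplicated_uploads * content_size     # bytes never re-stored
deduplication_ratio = deduplicated_uploads / total_uploads    # fraction deduplicated

assert deduplicated_uploads > 0
assert storage_saved_bytes > 0
```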
class TestConcurrentUploads:
"""Tests for concurrent upload handling."""
@pytest.mark.integration
def test_concurrent_uploads_same_file(self, integration_client, test_package):
"""Test concurrent uploads of same file handle deduplication correctly."""
project, package = test_package
content = b"content for concurrent upload test"
expected_hash = compute_sha256(content)
num_concurrent = 5
results = []
errors = []
def upload_worker(tag_suffix):
try:
# Create a new client for this thread
import os
from httpx import Client
# Honor ORCHARD_TEST_URL like the integration_client fixture does,
# so this also works when tests run inside a container.
base_url = os.environ.get("ORCHARD_TEST_URL", "http://localhost:8080")
with Client(base_url=base_url, timeout=30.0) as client:
files = {
"file": (
f"concurrent-{tag_suffix}.bin",
io.BytesIO(content),
"application/octet-stream",
)
}
response = client.post(
f"/api/v1/project/{project}/{package}/upload",
files=files,
data={"tag": f"concurrent-{tag_suffix}"},
)
if response.status_code == 200:
results.append(response.json())
else:
errors.append(f"Status {response.status_code}: {response.text}")
except Exception as e:
errors.append(str(e))
# Run concurrent uploads
with ThreadPoolExecutor(max_workers=num_concurrent) as executor:
futures = [executor.submit(upload_worker, i) for i in range(num_concurrent)]
for future in as_completed(futures):
future.result()  # join all workers; they record their own results/errors
# Verify results
assert len(errors) == 0, f"Errors during concurrent uploads: {errors}"
assert len(results) == num_concurrent
# All should have same artifact_id
artifact_ids = set(r["artifact_id"] for r in results)
assert len(artifact_ids) == 1
assert expected_hash in artifact_ids
# Verify final ref_count
response = integration_client.get(f"/api/v1/artifact/{expected_hash}")
assert response.status_code == 200
assert response.json()["ref_count"] == num_concurrent
class TestDeduplicationAcrossRestarts:
"""Tests for deduplication persistence."""
@pytest.mark.integration
def test_deduplication_persists(
self, integration_client, test_package, unique_test_id
):
"""
Test deduplication works with persisted data.
This test uploads content, then uploads the same content again.
Since the database persists, the second upload should detect
the existing artifact even without server restart.
"""
project, package = test_package
content = f"persisted content for dedup test {unique_test_id}".encode()
expected_hash = compute_sha256(content)
# First upload
result1 = upload_test_file(
integration_client, project, package, content, tag="persist1"
)
assert result1["artifact_id"] == expected_hash
assert result1["deduplicated"] is False
# Second upload (simulating after restart - data is persisted)
result2 = upload_test_file(
integration_client, project, package, content, tag="persist2"
)
assert result2["artifact_id"] == expected_hash
assert result2["deduplicated"] is True
# Verify artifact exists with correct ref_count
response = integration_client.get(f"/api/v1/artifact/{expected_hash}")
assert response.status_code == 200
assert response.json()["ref_count"] == 2
class TestS3ObjectVerification:
"""Tests to verify S3 storage behavior directly."""
@pytest.mark.integration
def test_s3_bucket_single_object_after_duplicates(
self, integration_client, test_package, unique_test_id
):
"""Test S3 bucket contains only one object after duplicate uploads."""
project, package = test_package
content = f"content for s3 object count test {unique_test_id}".encode()
expected_hash = compute_sha256(content)
# Upload same content multiple times with different tags
for tag in ["s3test1", "s3test2", "s3test3"]:
upload_test_file(integration_client, project, package, content, tag=tag)
# Verify only one S3 object exists for this hash
s3_objects = list_s3_objects_by_hash(expected_hash)
assert len(s3_objects) == 1, (
f"Expected 1 S3 object, found {len(s3_objects)}: {s3_objects}"
)
# Verify the object key follows expected pattern
expected_key = (
f"fruits/{expected_hash[:2]}/{expected_hash[2:4]}/{expected_hash}"
)
assert s3_objects[0] == expected_key
class TestUploadFailureCleanup:
"""Tests for cleanup when uploads fail."""
@pytest.mark.integration
def test_upload_failure_invalid_project_no_orphaned_s3(
self, integration_client, unique_test_id
):
"""Test upload to non-existent project doesn't leave orphaned S3 objects."""
content = f"content for orphan s3 test {unique_test_id}".encode()
expected_hash = compute_sha256(content)
# Attempt upload to non-existent project
files = {"file": ("test.bin", io.BytesIO(content), "application/octet-stream")}
response = integration_client.post(
f"/api/v1/project/nonexistent-project-{unique_test_id}/nonexistent-pkg/upload",
files=files,
data={"tag": "test"},
)
# Upload should fail
assert response.status_code == 404
# Verify no S3 object was created
assert not s3_object_exists(expected_hash), (
"Orphaned S3 object found after failed upload"
)
@pytest.mark.integration
def test_upload_failure_invalid_package_no_orphaned_s3(
self, integration_client, test_project, unique_test_id
):
"""Test upload to non-existent package doesn't leave orphaned S3 objects."""
content = f"content for orphan s3 test pkg {unique_test_id}".encode()
expected_hash = compute_sha256(content)
# Attempt upload to non-existent package
files = {"file": ("test.bin", io.BytesIO(content), "application/octet-stream")}
response = integration_client.post(
f"/api/v1/project/{test_project}/nonexistent-package-{unique_test_id}/upload",
files=files,
data={"tag": "test"},
)
# Upload should fail
assert response.status_code == 404
# Verify no S3 object was created
assert not s3_object_exists(expected_hash), (
"Orphaned S3 object found after failed upload"
)
@pytest.mark.integration
def test_upload_failure_empty_file_no_orphaned_s3(
self, integration_client, test_package, unique_test_id
):
"""Test upload of empty file doesn't leave orphaned S3 objects or DB records."""
project, package = test_package
content = b"" # Empty content
# Attempt upload of empty file
files = {"file": ("empty.bin", io.BytesIO(content), "application/octet-stream")}
response = integration_client.post(
f"/api/v1/project/{project}/{package}/upload",
files=files,
data={"tag": f"empty-{unique_test_id}"},
)
# Upload should fail (empty files are rejected)
assert response.status_code in (400, 422), (
f"Expected 400/422, got {response.status_code}"
)
@pytest.mark.integration
def test_upload_failure_no_orphaned_database_records(
self, integration_client, test_project, unique_test_id
):
"""Test failed upload doesn't leave orphaned database records."""
content = f"content for db orphan test {unique_test_id}".encode()
expected_hash = compute_sha256(content)
# Attempt upload to non-existent package (should fail before DB insert)
files = {"file": ("test.bin", io.BytesIO(content), "application/octet-stream")}
response = integration_client.post(
f"/api/v1/project/{test_project}/nonexistent-package-{unique_test_id}/upload",
files=files,
data={"tag": "test"},
)
# Upload should fail
assert response.status_code == 404
# Verify no artifact record was created
artifact_response = integration_client.get(f"/api/v1/artifact/{expected_hash}")
assert artifact_response.status_code == 404, (
"Orphaned artifact record found after failed upload"
)
@pytest.mark.integration
def test_duplicate_tag_upload_handles_gracefully(
self, integration_client, test_package, unique_test_id
):
"""Test uploading with duplicate tag is handled without orphaned data."""
project, package = test_package
content1 = f"content version 1 {unique_test_id}".encode()
content2 = f"content version 2 {unique_test_id}".encode()
tag = f"duplicate-tag-{unique_test_id}"
# First upload with tag
result1 = upload_test_file(
integration_client, project, package, content1, tag=tag
)
hash1 = result1["artifact_id"]
# Second upload with same tag (should update the tag to point to new artifact)
result2 = upload_test_file(
integration_client, project, package, content2, tag=tag
)
hash2 = result2["artifact_id"]
# Both artifacts should exist
assert integration_client.get(f"/api/v1/artifact/{hash1}").status_code == 200
assert integration_client.get(f"/api/v1/artifact/{hash2}").status_code == 200
# Tag should point to the second artifact
tag_response = integration_client.get(
f"/api/v1/project/{project}/{package}/tags/{tag}"
)
assert tag_response.status_code == 200
assert tag_response.json()["artifact_id"] == hash2

View File

@@ -0,0 +1,176 @@
"""
Unit and integration tests for reference counting behavior.
Tests cover:
- ref_count is set correctly for new artifacts
- ref_count increments on duplicate uploads
- ref_count query correctly identifies existing artifacts
- Artifact lookup by SHA256 hash works correctly
"""
import pytest
import io
from tests.conftest import (
compute_sha256,
upload_test_file,
TEST_CONTENT_HELLO,
TEST_HASH_HELLO,
)
class TestRefCountQuery:
"""Tests for ref_count querying and artifact lookup."""
@pytest.mark.integration
def test_artifact_lookup_by_sha256(self, integration_client, test_package):
"""Test artifact lookup by SHA256 hash (primary key) works correctly."""
project, package = test_package
content = b"unique content for lookup test"
expected_hash = compute_sha256(content)
# Upload a file
upload_result = upload_test_file(
integration_client, project, package, content, tag="v1"
)
assert upload_result["artifact_id"] == expected_hash
# Look up artifact by ID (SHA256)
response = integration_client.get(f"/api/v1/artifact/{expected_hash}")
assert response.status_code == 200
artifact = response.json()
assert artifact["id"] == expected_hash
assert artifact["sha256"] == expected_hash
assert artifact["size"] == len(content)
@pytest.mark.integration
def test_ref_count_query_identifies_existing_artifact(
self, integration_client, test_package
):
"""Test ref_count query correctly identifies existing artifacts by hash."""
project, package = test_package
content = b"content for ref count query test"
expected_hash = compute_sha256(content)
# Upload a file with a tag
upload_result = upload_test_file(
integration_client, project, package, content, tag="v1"
)
# Query artifact and check ref_count
response = integration_client.get(f"/api/v1/artifact/{expected_hash}")
assert response.status_code == 200
artifact = response.json()
assert artifact["ref_count"] >= 1 # At least 1 from the tag
@pytest.mark.integration
def test_ref_count_set_to_1_for_new_artifact_with_tag(
self, integration_client, test_package, unique_test_id
):
"""Test ref_count is set to 1 for new artifacts when created with a tag."""
project, package = test_package
content = f"brand new content for ref count test {unique_test_id}".encode()
expected_hash = compute_sha256(content)
# Upload a new file with a tag
upload_result = upload_test_file(
integration_client, project, package, content, tag="initial"
)
assert upload_result["artifact_id"] == expected_hash
assert upload_result["ref_count"] == 1
assert upload_result["deduplicated"] is False
@pytest.mark.integration
def test_ref_count_increments_on_duplicate_upload_with_tag(
self, integration_client, test_package, unique_test_id
):
"""Test ref_count is incremented when duplicate content is uploaded with a new tag."""
project, package = test_package
content = f"content that will be uploaded twice {unique_test_id}".encode()
expected_hash = compute_sha256(content)
# First upload with tag
result1 = upload_test_file(
integration_client, project, package, content, tag="v1"
)
assert result1["ref_count"] == 1
assert result1["deduplicated"] is False
# Second upload with different tag (same content)
result2 = upload_test_file(
integration_client, project, package, content, tag="v2"
)
assert result2["artifact_id"] == expected_hash
assert result2["ref_count"] == 2
assert result2["deduplicated"] is True
@pytest.mark.integration
def test_ref_count_after_multiple_tags(self, integration_client, test_package):
"""Test ref_count correctly reflects number of tags pointing to artifact."""
project, package = test_package
content = b"content for multiple tag test"
expected_hash = compute_sha256(content)
# Upload with multiple tags
tags = ["v1", "v2", "v3", "latest"]
for i, tag in enumerate(tags):
result = upload_test_file(
integration_client, project, package, content, tag=tag
)
assert result["artifact_id"] == expected_hash
assert result["ref_count"] == i + 1
# Verify final ref_count via artifact endpoint
response = integration_client.get(f"/api/v1/artifact/{expected_hash}")
assert response.status_code == 200
assert response.json()["ref_count"] == len(tags)
class TestRefCountWithDeletion:
"""Tests for ref_count behavior when tags are deleted."""
@pytest.mark.integration
def test_ref_count_decrements_on_tag_delete(self, integration_client, test_package):
"""Test ref_count decrements when a tag is deleted."""
project, package = test_package
content = b"content for delete test"
expected_hash = compute_sha256(content)
# Upload with two tags
upload_test_file(integration_client, project, package, content, tag="v1")
upload_test_file(integration_client, project, package, content, tag="v2")
# Verify ref_count is 2
response = integration_client.get(f"/api/v1/artifact/{expected_hash}")
assert response.json()["ref_count"] == 2
# Delete one tag
delete_response = integration_client.delete(
f"/api/v1/project/{project}/{package}/tags/v1"
)
assert delete_response.status_code == 204
# Verify ref_count is now 1
response = integration_client.get(f"/api/v1/artifact/{expected_hash}")
assert response.json()["ref_count"] == 1
@pytest.mark.integration
def test_ref_count_zero_after_all_tags_deleted(
self, integration_client, test_package
):
"""Test ref_count goes to 0 when all tags are deleted."""
project, package = test_package
content = b"content that will be orphaned"
expected_hash = compute_sha256(content)
# Upload with one tag
upload_test_file(integration_client, project, package, content, tag="only-tag")
# Delete the tag
integration_client.delete(f"/api/v1/project/{project}/{package}/tags/only-tag")
# Verify ref_count is 0
response = integration_client.get(f"/api/v1/artifact/{expected_hash}")
assert response.json()["ref_count"] == 0
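The ref_count lifecycle these tests pin down (created at 1, incremented per new tag, decremented on tag delete, 0 when orphaned) can be modeled in a few lines. This is a sketch of the semantics only, not the server's SQL or API:

```python
import hashlib

class RefCountModel:
    """Toy model: one artifact row per sha256; ref_count = live tag references."""

    def __init__(self):
        self.artifacts = {}  # sha256 -> ref_count
        self.tags = {}       # (project, package, tag) -> sha256

    def upload(self, project, package, tag, content: bytes) -> dict:
        artifact_id = hashlib.sha256(content).hexdigest()
        deduplicated = artifact_id in self.artifacts
        self.artifacts[artifact_id] = self.artifacts.get(artifact_id, 0) + 1
        self.tags[(project, package, tag)] = artifact_id
        return {"artifact_id": artifact_id,
                "ref_count": self.artifacts[artifact_id],
                "deduplicated": deduplicated}

    def delete_tag(self, project, package, tag) -> None:
        artifact_id = self.tags.pop((project, package, tag))
        self.artifacts[artifact_id] -= 1

m = RefCountModel()
r1 = m.upload("p", "pkg", "v1", b"blob")
r2 = m.upload("p", "pkg", "v2", b"blob")
assert (r1["deduplicated"], r2["deduplicated"]) == (False, True)
assert r2["ref_count"] == 2
m.delete_tag("p", "pkg", "v1")
m.delete_tag("p", "pkg", "v2")
assert m.artifacts[r1["artifact_id"]] == 0  # orphaned, like the test above
```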

View File

@@ -0,0 +1,488 @@
"""
Integration tests for statistics endpoints.
Tests cover:
- Global stats endpoint
- Deduplication stats endpoint
- Cross-project deduplication
- Timeline stats
- Export and report endpoints
- Package and artifact stats
"""
import pytest
from tests.conftest import compute_sha256, upload_test_file
class TestGlobalStats:
"""Tests for GET /api/v1/stats endpoint."""
@pytest.mark.integration
def test_stats_returns_valid_response(self, integration_client):
"""Test stats endpoint returns expected fields."""
response = integration_client.get("/api/v1/stats")
assert response.status_code == 200
data = response.json()
# Check all required fields exist
assert "total_artifacts" in data
assert "total_size_bytes" in data
assert "unique_artifacts" in data
assert "orphaned_artifacts" in data
assert "orphaned_size_bytes" in data
assert "total_uploads" in data
assert "deduplicated_uploads" in data
assert "deduplication_ratio" in data
assert "storage_saved_bytes" in data
@pytest.mark.integration
def test_stats_values_are_non_negative(self, integration_client):
"""Test all stat values are non-negative."""
response = integration_client.get("/api/v1/stats")
assert response.status_code == 200
data = response.json()
assert data["total_artifacts"] >= 0
assert data["total_size_bytes"] >= 0
assert data["unique_artifacts"] >= 0
assert data["orphaned_artifacts"] >= 0
assert data["total_uploads"] >= 0
assert data["deduplicated_uploads"] >= 0
assert data["deduplication_ratio"] >= 0
assert data["storage_saved_bytes"] >= 0
@pytest.mark.integration
def test_stats_update_after_upload(
self, integration_client, test_package, unique_test_id
):
"""Test stats update after uploading an artifact."""
project, package = test_package
# Get initial stats
initial_response = integration_client.get("/api/v1/stats")
initial_stats = initial_response.json()
# Upload a new file
content = f"stats test content {unique_test_id}".encode()
upload_test_file(
integration_client, project, package, content, tag=f"stats-{unique_test_id}"
)
# Get updated stats
updated_response = integration_client.get("/api/v1/stats")
updated_stats = updated_response.json()
# Verify stats increased
assert updated_stats["total_uploads"] >= initial_stats["total_uploads"]
class TestDeduplicationStats:
"""Tests for GET /api/v1/stats/deduplication endpoint."""
@pytest.mark.integration
def test_dedup_stats_returns_valid_response(self, integration_client):
"""Test deduplication stats returns expected fields."""
response = integration_client.get("/api/v1/stats/deduplication")
assert response.status_code == 200
data = response.json()
assert "total_logical_bytes" in data
assert "total_physical_bytes" in data
assert "bytes_saved" in data
assert "savings_percentage" in data
assert "total_uploads" in data
assert "unique_artifacts" in data
assert "duplicate_uploads" in data
assert "average_ref_count" in data
assert "max_ref_count" in data
assert "most_referenced_artifacts" in data
@pytest.mark.integration
def test_most_referenced_artifacts_format(self, integration_client):
"""Test most_referenced_artifacts has correct structure."""
response = integration_client.get("/api/v1/stats/deduplication")
assert response.status_code == 200
data = response.json()
artifacts = data["most_referenced_artifacts"]
assert isinstance(artifacts, list)
if len(artifacts) > 0:
artifact = artifacts[0]
assert "artifact_id" in artifact
assert "ref_count" in artifact
assert "size" in artifact
assert "storage_saved" in artifact
@pytest.mark.integration
def test_dedup_stats_with_top_n_param(self, integration_client):
"""Test deduplication stats respects top_n parameter."""
response = integration_client.get("/api/v1/stats/deduplication?top_n=3")
assert response.status_code == 200
data = response.json()
assert len(data["most_referenced_artifacts"]) <= 3
@pytest.mark.integration
def test_savings_percentage_valid_range(self, integration_client):
"""Test savings percentage is between 0 and 100."""
response = integration_client.get("/api/v1/stats/deduplication")
assert response.status_code == 200
data = response.json()
assert 0 <= data["savings_percentage"] <= 100
class TestCrossProjectStats:
"""Tests for GET /api/v1/stats/cross-project endpoint."""
@pytest.mark.integration
def test_cross_project_returns_valid_response(self, integration_client):
"""Test cross-project stats returns expected fields."""
response = integration_client.get("/api/v1/stats/cross-project")
assert response.status_code == 200
data = response.json()
assert "shared_artifacts_count" in data
assert "total_cross_project_savings" in data
assert "shared_artifacts" in data
assert isinstance(data["shared_artifacts"], list)
@pytest.mark.integration
def test_cross_project_respects_limit(self, integration_client):
"""Test cross-project stats respects limit parameter."""
response = integration_client.get("/api/v1/stats/cross-project?limit=5")
assert response.status_code == 200
data = response.json()
assert len(data["shared_artifacts"]) <= 5
@pytest.mark.integration
def test_cross_project_detects_shared_artifacts(
self, integration_client, unique_test_id
):
"""Test cross-project deduplication is detected."""
content = f"shared across projects {unique_test_id}".encode()
# Create two projects
proj1 = f"cross-proj-a-{unique_test_id}"
proj2 = f"cross-proj-b-{unique_test_id}"
try:
# Create projects and packages
integration_client.post(
"/api/v1/projects",
json={"name": proj1, "description": "Test", "is_public": True},
)
integration_client.post(
"/api/v1/projects",
json={"name": proj2, "description": "Test", "is_public": True},
)
integration_client.post(
f"/api/v1/project/{proj1}/packages",
json={"name": "pkg", "description": "Test"},
)
integration_client.post(
f"/api/v1/project/{proj2}/packages",
json={"name": "pkg", "description": "Test"},
)
# Upload same content to both projects
upload_test_file(integration_client, proj1, "pkg", content, tag="v1")
upload_test_file(integration_client, proj2, "pkg", content, tag="v1")
# Check cross-project stats
response = integration_client.get("/api/v1/stats/cross-project")
assert response.status_code == 200
data = response.json()
assert data["shared_artifacts_count"] >= 1
finally:
# Cleanup
integration_client.delete(f"/api/v1/projects/{proj1}")
integration_client.delete(f"/api/v1/projects/{proj2}")
class TestTimelineStats:
"""Tests for GET /api/v1/stats/timeline endpoint."""
@pytest.mark.integration
def test_timeline_returns_valid_response(self, integration_client):
"""Test timeline stats returns expected fields."""
response = integration_client.get("/api/v1/stats/timeline")
assert response.status_code == 200
data = response.json()
assert "period" in data
assert "start_date" in data
assert "end_date" in data
assert "data_points" in data
assert isinstance(data["data_points"], list)
@pytest.mark.integration
def test_timeline_daily_period(self, integration_client):
"""Test timeline with daily period."""
response = integration_client.get("/api/v1/stats/timeline?period=daily")
assert response.status_code == 200
data = response.json()
assert data["period"] == "daily"
@pytest.mark.integration
def test_timeline_weekly_period(self, integration_client):
"""Test timeline with weekly period."""
response = integration_client.get("/api/v1/stats/timeline?period=weekly")
assert response.status_code == 200
data = response.json()
assert data["period"] == "weekly"
@pytest.mark.integration
def test_timeline_monthly_period(self, integration_client):
"""Test timeline with monthly period."""
response = integration_client.get("/api/v1/stats/timeline?period=monthly")
assert response.status_code == 200
data = response.json()
assert data["period"] == "monthly"
@pytest.mark.integration
def test_timeline_invalid_period_rejected(self, integration_client):
"""Test timeline rejects invalid period."""
response = integration_client.get("/api/v1/stats/timeline?period=invalid")
assert response.status_code == 422
@pytest.mark.integration
def test_timeline_data_point_structure(self, integration_client):
"""Test timeline data points have correct structure."""
response = integration_client.get("/api/v1/stats/timeline")
assert response.status_code == 200
data = response.json()
if len(data["data_points"]) > 0:
point = data["data_points"][0]
assert "date" in point
assert "total_uploads" in point
assert "unique_artifacts" in point
assert "duplicated_uploads" in point
assert "bytes_saved" in point
class TestExportEndpoint:
"""Tests for GET /api/v1/stats/export endpoint."""
@pytest.mark.integration
def test_export_json_format(self, integration_client):
"""Test export with JSON format."""
response = integration_client.get("/api/v1/stats/export?format=json")
assert response.status_code == 200
data = response.json()
assert "total_artifacts" in data
assert "generated_at" in data
@pytest.mark.integration
def test_export_csv_format(self, integration_client):
"""Test export with CSV format."""
response = integration_client.get("/api/v1/stats/export?format=csv")
assert response.status_code == 200
assert "text/csv" in response.headers.get("content-type", "")
content = response.text
assert "Metric,Value" in content
assert "total_artifacts" in content
@pytest.mark.integration
def test_export_invalid_format_rejected(self, integration_client):
"""Test export rejects invalid format."""
response = integration_client.get("/api/v1/stats/export?format=xml")
assert response.status_code == 422
class TestReportEndpoint:
"""Tests for GET /api/v1/stats/report endpoint."""
@pytest.mark.integration
def test_report_markdown_format(self, integration_client):
"""Test report with markdown format."""
response = integration_client.get("/api/v1/stats/report?format=markdown")
assert response.status_code == 200
data = response.json()
assert data["format"] == "markdown"
assert "generated_at" in data
assert "content" in data
assert "# Orchard Storage Report" in data["content"]
@pytest.mark.integration
def test_report_json_format(self, integration_client):
"""Test report with JSON format."""
response = integration_client.get("/api/v1/stats/report?format=json")
assert response.status_code == 200
data = response.json()
assert data["format"] == "json"
assert "content" in data
@pytest.mark.integration
def test_report_contains_sections(self, integration_client):
"""Test markdown report contains expected sections."""
response = integration_client.get("/api/v1/stats/report?format=markdown")
assert response.status_code == 200
content = response.json()["content"]
assert "## Overview" in content
assert "## Storage" in content
assert "## Uploads" in content
class TestProjectStats:
"""Tests for GET /api/v1/projects/:project/stats endpoint."""
@pytest.mark.integration
def test_project_stats_returns_valid_response(
self, integration_client, test_project
):
"""Test project stats returns expected fields."""
response = integration_client.get(f"/api/v1/projects/{test_project}/stats")
assert response.status_code == 200
data = response.json()
assert "project_id" in data
assert "project_name" in data
assert "package_count" in data
assert "tag_count" in data
assert "artifact_count" in data
assert "total_size_bytes" in data
assert "upload_count" in data
assert "deduplicated_uploads" in data
assert "storage_saved_bytes" in data
assert "deduplication_ratio" in data
@pytest.mark.integration
def test_project_stats_not_found(self, integration_client):
"""Test project stats returns 404 for non-existent project."""
response = integration_client.get("/api/v1/projects/nonexistent-project/stats")
assert response.status_code == 404
class TestPackageStats:
"""Tests for GET /api/v1/project/:project/packages/:package/stats endpoint."""
@pytest.mark.integration
def test_package_stats_returns_valid_response(
self, integration_client, test_package
):
"""Test package stats returns expected fields."""
project, package = test_package
response = integration_client.get(
f"/api/v1/project/{project}/packages/{package}/stats"
)
assert response.status_code == 200
data = response.json()
assert "package_id" in data
assert "package_name" in data
assert "project_name" in data
assert "tag_count" in data
assert "artifact_count" in data
assert "total_size_bytes" in data
assert "upload_count" in data
assert "deduplicated_uploads" in data
assert "storage_saved_bytes" in data
assert "deduplication_ratio" in data
@pytest.mark.integration
def test_package_stats_not_found(self, integration_client, test_project):
"""Test package stats returns 404 for non-existent package."""
response = integration_client.get(
f"/api/v1/project/{test_project}/packages/nonexistent-package/stats"
)
assert response.status_code == 404
class TestArtifactStats:
"""Tests for GET /api/v1/artifact/:id/stats endpoint."""
@pytest.mark.integration
def test_artifact_stats_returns_valid_response(
self, integration_client, test_package, unique_test_id
):
"""Test artifact stats returns expected fields."""
project, package = test_package
content = f"artifact stats test {unique_test_id}".encode()
expected_hash = compute_sha256(content)
# Upload artifact
upload_test_file(
integration_client, project, package, content, tag=f"art-{unique_test_id}"
)
# Get artifact stats
response = integration_client.get(f"/api/v1/artifact/{expected_hash}/stats")
assert response.status_code == 200
data = response.json()
assert "artifact_id" in data
assert "sha256" in data
assert "size" in data
assert "ref_count" in data
assert "storage_savings" in data
assert "tags" in data
assert "projects" in data
assert "packages" in data
@pytest.mark.integration
def test_artifact_stats_not_found(self, integration_client):
"""Test artifact stats returns 404 for non-existent artifact."""
fake_hash = "0" * 64
response = integration_client.get(f"/api/v1/artifact/{fake_hash}/stats")
assert response.status_code == 404
@pytest.mark.integration
def test_artifact_stats_shows_correct_projects(
self, integration_client, unique_test_id
):
"""Test artifact stats shows all projects using the artifact."""
content = f"multi-project artifact {unique_test_id}".encode()
expected_hash = compute_sha256(content)
proj1 = f"art-stats-a-{unique_test_id}"
proj2 = f"art-stats-b-{unique_test_id}"
try:
# Create projects and packages
integration_client.post(
"/api/v1/projects",
json={"name": proj1, "description": "Test", "is_public": True},
)
integration_client.post(
"/api/v1/projects",
json={"name": proj2, "description": "Test", "is_public": True},
)
integration_client.post(
f"/api/v1/project/{proj1}/packages",
json={"name": "pkg", "description": "Test"},
)
integration_client.post(
f"/api/v1/project/{proj2}/packages",
json={"name": "pkg", "description": "Test"},
)
# Upload same content to both projects
upload_test_file(integration_client, proj1, "pkg", content, tag="v1")
upload_test_file(integration_client, proj2, "pkg", content, tag="v1")
# Check artifact stats
response = integration_client.get(f"/api/v1/artifact/{expected_hash}/stats")
assert response.status_code == 200
data = response.json()
assert len(data["projects"]) == 2
assert proj1 in data["projects"]
assert proj2 in data["projects"]
finally:
integration_client.delete(f"/api/v1/projects/{proj1}")
integration_client.delete(f"/api/v1/projects/{proj2}")
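The tests above lean on helpers (`compute_sha256`, `upload_test_file`) defined elsewhere in the suite. A minimal sketch of what they might look like — the upload URL, multipart field names, and `tag` form field are assumptions, not confirmed by this diff:

```python
import hashlib


def compute_sha256(content: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of raw bytes."""
    return hashlib.sha256(content).hexdigest()


def upload_test_file(client, project: str, package: str, content: bytes, tag: str):
    """Upload bytes as a multipart file and tag the resulting artifact.

    Hypothetical endpoint shape; the real route may differ.
    """
    return client.post(
        f"/api/v1/project/{project}/{package}/upload",
        files={"file": ("test.bin", content)},
        data={"tag": tag},
    )
```

`compute_sha256` is what lets a test predict the artifact ID before uploading, since artifacts are content-addressed by their SHA-256.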

@@ -3,12 +3,14 @@ import Layout from './components/Layout';
import Home from './pages/Home';
import ProjectPage from './pages/ProjectPage';
import PackagePage from './pages/PackagePage';
import Dashboard from './pages/Dashboard';
function App() {
return (
<Layout>
<Routes>
<Route path="/" element={<Home />} />
<Route path="/dashboard" element={<Dashboard />} />
<Route path="/project/:projectName" element={<ProjectPage />} />
<Route path="/project/:projectName/:packageName" element={<PackagePage />} />
</Routes>

@@ -13,6 +13,10 @@ import {
ArtifactListParams,
ProjectListParams,
GlobalSearchResponse,
Stats,
DeduplicationStats,
TimelineStats,
CrossProjectStats,
} from './types';
const API_BASE = '/api/v1';
@@ -156,3 +160,29 @@ export async function uploadArtifact(projectName: string, packageName: string, f
export function getDownloadUrl(projectName: string, packageName: string, ref: string): string {
return `${API_BASE}/project/${projectName}/${packageName}/+/${ref}`;
}
// Stats API
export async function getStats(): Promise<Stats> {
const response = await fetch(`${API_BASE}/stats`);
return handleResponse<Stats>(response);
}
export async function getDeduplicationStats(): Promise<DeduplicationStats> {
const response = await fetch(`${API_BASE}/stats/deduplication`);
return handleResponse<DeduplicationStats>(response);
}
export async function getTimelineStats(
period: 'daily' | 'weekly' | 'monthly' = 'daily',
fromDate?: string,
toDate?: string
): Promise<TimelineStats> {
const params = buildQueryString({ period, from_date: fromDate, to_date: toDate });
const response = await fetch(`${API_BASE}/stats/timeline${params}`);
return handleResponse<TimelineStats>(response);
}
export async function getCrossProjectStats(): Promise<CrossProjectStats> {
const response = await fetch(`${API_BASE}/stats/cross-project`);
return handleResponse<CrossProjectStats>(response);
}

@@ -42,6 +42,15 @@ function Layout({ children }: LayoutProps) {
</svg>
Projects
</Link>
<Link to="/dashboard" className={location.pathname === '/dashboard' ? 'active' : ''}>
<svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<rect x="3" y="3" width="7" height="7" rx="1"/>
<rect x="14" y="3" width="7" height="7" rx="1"/>
<rect x="3" y="14" width="7" height="7" rx="1"/>
<rect x="14" y="14" width="7" height="7" rx="1"/>
</svg>
Dashboard
</Link>
<a href="/docs" className="nav-link-muted">
<svg width="16" height="16" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<path d="M14 2H6a2 2 0 0 0-2 2v16a2 2 0 0 0 2 2h12a2 2 0 0 0 2-2V8z"/>

@@ -0,0 +1,547 @@
.dashboard {
max-width: 1200px;
margin: 0 auto;
}
.dashboard__header {
position: relative;
margin-bottom: 48px;
padding-bottom: 32px;
border-bottom: 1px solid var(--border-primary);
overflow: hidden;
}
.dashboard__header-content {
position: relative;
z-index: 1;
}
.dashboard__header h1 {
font-size: 2.5rem;
font-weight: 700;
color: var(--text-primary);
letter-spacing: -0.03em;
margin-bottom: 8px;
background: linear-gradient(135deg, var(--text-primary) 0%, var(--accent-primary) 100%);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
background-clip: text;
}
.dashboard__subtitle {
font-size: 1rem;
color: var(--text-tertiary);
letter-spacing: -0.01em;
}
.dashboard__header-accent {
position: absolute;
top: -100px;
right: -100px;
width: 400px;
height: 400px;
background: radial-gradient(circle, rgba(16, 185, 129, 0.08) 0%, transparent 70%);
pointer-events: none;
}
.dashboard__section {
margin-bottom: 48px;
}
.dashboard__section-title {
display: flex;
align-items: center;
gap: 12px;
font-size: 1.125rem;
font-weight: 600;
color: var(--text-primary);
margin-bottom: 20px;
letter-spacing: -0.01em;
}
.dashboard__section-title svg {
color: var(--accent-primary);
}
.dashboard__section-description {
color: var(--text-tertiary);
font-size: 0.875rem;
margin-bottom: 20px;
margin-top: -8px;
}
.stat-grid {
display: grid;
gap: 16px;
}
.stat-grid--4 {
grid-template-columns: repeat(4, 1fr);
}
.stat-grid--3 {
grid-template-columns: repeat(3, 1fr);
}
.stat-grid--2 {
grid-template-columns: repeat(2, 1fr);
}
@media (max-width: 1024px) {
.stat-grid--4 {
grid-template-columns: repeat(2, 1fr);
}
}
@media (max-width: 640px) {
.stat-grid--4,
.stat-grid--3,
.stat-grid--2 {
grid-template-columns: 1fr;
}
}
.stat-card {
position: relative;
display: flex;
align-items: flex-start;
gap: 16px;
background: var(--bg-secondary);
border: 1px solid var(--border-primary);
border-radius: var(--radius-lg);
padding: 20px;
transition: all var(--transition-normal);
overflow: hidden;
}
.stat-card::before {
content: '';
position: absolute;
top: 0;
left: 0;
right: 0;
height: 3px;
background: var(--border-secondary);
transition: background var(--transition-normal);
}
.stat-card:hover {
border-color: var(--border-secondary);
transform: translateY(-2px);
box-shadow: var(--shadow-lg);
}
.stat-card--success::before {
background: var(--accent-gradient);
}
.stat-card--success {
background: linear-gradient(135deg, rgba(16, 185, 129, 0.03) 0%, transparent 50%);
}
.stat-card--accent::before {
background: linear-gradient(135deg, #3b82f6 0%, #8b5cf6 100%);
}
.stat-card--accent {
background: linear-gradient(135deg, rgba(59, 130, 246, 0.03) 0%, transparent 50%);
}
.stat-card__icon {
display: flex;
align-items: center;
justify-content: center;
width: 48px;
height: 48px;
border-radius: var(--radius-md);
background: var(--bg-tertiary);
color: var(--text-tertiary);
flex-shrink: 0;
}
.stat-card--success .stat-card__icon {
background: rgba(16, 185, 129, 0.1);
color: var(--accent-primary);
}
.stat-card--accent .stat-card__icon {
background: rgba(59, 130, 246, 0.1);
color: #3b82f6;
}
.stat-card__content {
display: flex;
flex-direction: column;
gap: 2px;
min-width: 0;
}
.stat-card__label {
font-size: 0.75rem;
font-weight: 500;
text-transform: uppercase;
letter-spacing: 0.05em;
color: var(--text-tertiary);
}
.stat-card__value {
font-size: 1.75rem;
font-weight: 700;
color: var(--text-primary);
letter-spacing: -0.02em;
line-height: 1.2;
display: flex;
align-items: baseline;
gap: 8px;
}
.stat-card__subvalue {
font-size: 0.75rem;
color: var(--text-muted);
margin-top: 2px;
}
.stat-card__trend {
font-size: 0.875rem;
font-weight: 600;
}
.stat-card__trend--up {
color: var(--success);
}
.stat-card__trend--down {
color: var(--error);
}
.progress-bar {
width: 100%;
}
.progress-bar__header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 8px;
}
.progress-bar__label {
font-size: 0.8125rem;
color: var(--text-secondary);
}
.progress-bar__percentage {
font-size: 0.8125rem;
font-weight: 600;
color: var(--text-primary);
}
.progress-bar__track {
position: relative;
height: 8px;
background: var(--bg-tertiary);
border-radius: 100px;
overflow: hidden;
}
.progress-bar__fill {
position: absolute;
top: 0;
left: 0;
height: 100%;
background: var(--border-secondary);
border-radius: 100px;
transition: width 0.5s ease-out;
}
.progress-bar__glow {
position: absolute;
top: 0;
left: 0;
height: 100%;
background: transparent;
border-radius: 100px;
transition: width 0.5s ease-out;
}
.progress-bar--success .progress-bar__fill {
background: var(--accent-gradient);
}
.progress-bar--success .progress-bar__glow {
box-shadow: 0 0 12px rgba(16, 185, 129, 0.4);
}
.progress-bar--accent .progress-bar__fill {
background: linear-gradient(135deg, #3b82f6 0%, #8b5cf6 100%);
}
.effectiveness-grid {
display: grid;
grid-template-columns: 1.5fr 1fr;
gap: 16px;
}
@media (max-width: 900px) {
.effectiveness-grid {
grid-template-columns: 1fr;
}
}
.effectiveness-card {
background: var(--bg-secondary);
border: 1px solid var(--border-primary);
border-radius: var(--radius-lg);
padding: 24px;
}
.effectiveness-card h3 {
font-size: 0.875rem;
font-weight: 600;
color: var(--text-primary);
margin-bottom: 24px;
text-transform: uppercase;
letter-spacing: 0.05em;
}
.storage-comparison {
display: flex;
flex-direction: column;
gap: 20px;
margin-bottom: 24px;
}
.storage-bar__label {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 8px;
font-size: 0.8125rem;
color: var(--text-secondary);
}
.storage-bar__value {
font-weight: 600;
color: var(--text-primary);
font-family: 'JetBrains Mono', 'Fira Code', monospace;
}
.storage-savings {
display: flex;
align-items: center;
gap: 16px;
padding: 20px;
background: linear-gradient(135deg, rgba(16, 185, 129, 0.08) 0%, rgba(5, 150, 105, 0.04) 100%);
border: 1px solid rgba(16, 185, 129, 0.2);
border-radius: var(--radius-md);
}
.storage-savings__icon {
display: flex;
align-items: center;
justify-content: center;
width: 56px;
height: 56px;
border-radius: 50%;
background: var(--accent-gradient);
color: white;
flex-shrink: 0;
box-shadow: 0 0 24px rgba(16, 185, 129, 0.3);
}
.storage-savings__content {
display: flex;
flex-direction: column;
}
.storage-savings__value {
font-size: 1.5rem;
font-weight: 700;
color: var(--accent-primary);
letter-spacing: -0.02em;
}
.storage-savings__label {
font-size: 0.8125rem;
color: var(--text-tertiary);
}
.dedup-rate {
display: flex;
flex-direction: column;
align-items: center;
gap: 24px;
}
.dedup-rate__circle {
position: relative;
width: 160px;
height: 160px;
}
.dedup-rate__svg {
width: 100%;
height: 100%;
transform: rotate(0deg);
}
.dedup-rate__value {
position: absolute;
top: 50%;
left: 50%;
transform: translate(-50%, -50%);
display: flex;
align-items: baseline;
gap: 2px;
}
.dedup-rate__number {
font-size: 2.5rem;
font-weight: 700;
color: var(--text-primary);
letter-spacing: -0.03em;
}
.dedup-rate__symbol {
font-size: 1.25rem;
font-weight: 600;
color: var(--text-tertiary);
}
.dedup-rate__details {
display: flex;
gap: 32px;
}
.dedup-rate__detail {
display: flex;
flex-direction: column;
align-items: center;
text-align: center;
}
.dedup-rate__detail-value {
font-size: 1.25rem;
font-weight: 700;
color: var(--text-primary);
}
.dedup-rate__detail-label {
font-size: 0.6875rem;
color: var(--text-muted);
text-transform: uppercase;
letter-spacing: 0.05em;
margin-top: 4px;
}
.artifacts-table {
margin-top: 16px;
}
.artifact-link {
display: inline-block;
}
.artifact-link code {
font-family: 'JetBrains Mono', 'Fira Code', monospace;
font-size: 0.8125rem;
padding: 4px 8px;
background: var(--bg-tertiary);
border-radius: var(--radius-sm);
color: var(--accent-primary);
transition: all var(--transition-fast);
}
.artifact-link:hover code {
background: rgba(16, 185, 129, 0.15);
}
.artifact-name {
max-width: 200px;
overflow: hidden;
text-overflow: ellipsis;
white-space: nowrap;
display: block;
color: var(--text-secondary);
}
.ref-count {
display: inline-flex;
align-items: baseline;
gap: 4px;
}
.ref-count__value {
font-weight: 600;
color: var(--text-primary);
font-size: 1rem;
}
.ref-count__label {
font-size: 0.6875rem;
color: var(--text-muted);
text-transform: uppercase;
}
.storage-saved {
color: var(--success);
font-weight: 600;
}
.dashboard__loading {
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
gap: 16px;
padding: 80px 32px;
color: var(--text-tertiary);
}
.dashboard__loading-spinner {
width: 40px;
height: 40px;
border: 3px solid var(--border-primary);
border-top-color: var(--accent-primary);
border-radius: 50%;
animation: spin 1s linear infinite;
}
@keyframes spin {
to {
transform: rotate(360deg);
}
}
.dashboard__error {
display: flex;
flex-direction: column;
align-items: center;
justify-content: center;
gap: 16px;
padding: 80px 32px;
text-align: center;
background: var(--bg-secondary);
border: 1px solid var(--border-primary);
border-radius: var(--radius-lg);
}
.dashboard__error svg {
color: var(--error);
opacity: 0.5;
}
.dashboard__error h3 {
font-size: 1.25rem;
font-weight: 600;
color: var(--text-primary);
}
.dashboard__error p {
color: var(--text-tertiary);
max-width: 400px;
}
.dashboard__error .btn {
margin-top: 8px;
}

@@ -0,0 +1,436 @@
import { useState, useEffect } from 'react';
import { Link } from 'react-router-dom';
import { Stats, DeduplicationStats, ReferencedArtifact } from '../types';
import { getStats, getDeduplicationStats } from '../api';
import { DataTable } from '../components/DataTable';
import './Dashboard.css';
function formatBytes(bytes: number): string {
if (bytes === 0) return '0 B';
const k = 1024;
const sizes = ['B', 'KB', 'MB', 'GB', 'TB'];
const i = Math.floor(Math.log(bytes) / Math.log(k));
return `${parseFloat((bytes / Math.pow(k, i)).toFixed(2))} ${sizes[i]}`;
}
function formatNumber(num: number): string {
return num.toLocaleString();
}
function truncateHash(hash: string, length: number = 12): string {
if (hash.length <= length) return hash;
return `${hash.slice(0, length)}...`;
}
interface StatCardProps {
label: string;
value: string;
subvalue?: string;
icon: React.ReactNode;
variant?: 'default' | 'success' | 'accent';
trend?: 'up' | 'down' | 'neutral';
}
function StatCard({ label, value, subvalue, icon, variant = 'default', trend }: StatCardProps) {
return (
<div className={`stat-card stat-card--${variant}`}>
<div className="stat-card__icon">{icon}</div>
<div className="stat-card__content">
<span className="stat-card__label">{label}</span>
<span className="stat-card__value">
{value}
{trend && (
<span className={`stat-card__trend stat-card__trend--${trend}`}>
{trend === 'up' && '↑'}
{trend === 'down' && '↓'}
</span>
)}
</span>
{subvalue && <span className="stat-card__subvalue">{subvalue}</span>}
</div>
</div>
);
}
interface ProgressBarProps {
value: number;
max: number;
label?: string;
showPercentage?: boolean;
variant?: 'default' | 'success' | 'accent';
}
function ProgressBar({ value, max, label, showPercentage = true, variant = 'default' }: ProgressBarProps) {
const percentage = max > 0 ? Math.min((value / max) * 100, 100) : 0;
return (
<div className={`progress-bar progress-bar--${variant}`}>
{label && (
<div className="progress-bar__header">
<span className="progress-bar__label">{label}</span>
{showPercentage && <span className="progress-bar__percentage">{percentage.toFixed(1)}%</span>}
</div>
)}
<div className="progress-bar__track">
<div
className="progress-bar__fill"
style={{ width: `${percentage}%` }}
/>
<div className="progress-bar__glow" style={{ width: `${percentage}%` }} />
</div>
</div>
);
}
function Dashboard() {
const [stats, setStats] = useState<Stats | null>(null);
const [dedupStats, setDedupStats] = useState<DeduplicationStats | null>(null);
const [loading, setLoading] = useState(true);
const [error, setError] = useState<string | null>(null);
useEffect(() => {
async function loadStats() {
try {
setLoading(true);
const [statsData, dedupData] = await Promise.all([
getStats(),
getDeduplicationStats(),
]);
setStats(statsData);
setDedupStats(dedupData);
setError(null);
} catch (err) {
setError(err instanceof Error ? err.message : 'Failed to load statistics');
} finally {
setLoading(false);
}
}
loadStats();
}, []);
if (loading) {
return (
<div className="dashboard">
<div className="dashboard__loading">
<div className="dashboard__loading-spinner" />
<span>Loading statistics...</span>
</div>
</div>
);
}
if (error) {
return (
<div className="dashboard">
<div className="dashboard__error">
<svg width="48" height="48" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="1.5">
<circle cx="12" cy="12" r="10"/>
<line x1="12" y1="8" x2="12" y2="12"/>
<line x1="12" y1="16" x2="12.01" y2="16"/>
</svg>
<h3>Unable to load dashboard</h3>
<p>{error}</p>
<button className="btn btn-primary" onClick={() => window.location.reload()}>
Try Again
</button>
</div>
</div>
);
}
const artifactColumns = [
{
key: 'artifact_id',
header: 'Artifact ID',
render: (item: ReferencedArtifact) => (
<Link to={`/artifact/${item.artifact_id}`} className="artifact-link">
<code>{truncateHash(item.artifact_id, 16)}</code>
</Link>
),
},
{
key: 'original_name',
header: 'Name',
render: (item: ReferencedArtifact) => (
<span className="artifact-name" title={item.original_name || 'Unknown'}>
{item.original_name || '—'}
</span>
),
},
{
key: 'size',
header: 'Size',
render: (item: ReferencedArtifact) => formatBytes(item.size),
},
{
key: 'ref_count',
header: 'References',
render: (item: ReferencedArtifact) => (
<span className="ref-count">
<span className="ref-count__value">{formatNumber(item.ref_count)}</span>
<span className="ref-count__label">refs</span>
</span>
),
},
{
key: 'storage_saved',
header: 'Storage Saved',
render: (item: ReferencedArtifact) => (
<span className="storage-saved">
{formatBytes(item.storage_saved)}
</span>
),
},
];
return (
<div className="dashboard">
<header className="dashboard__header">
<div className="dashboard__header-content">
<h1>Storage Dashboard</h1>
<p className="dashboard__subtitle">Real-time deduplication and storage analytics</p>
</div>
<div className="dashboard__header-accent" />
</header>
<section className="dashboard__section">
<h2 className="dashboard__section-title">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<path d="M21 16V8a2 2 0 0 0-1-1.73l-7-4a2 2 0 0 0-2 0l-7 4A2 2 0 0 0 3 8v8a2 2 0 0 0 1 1.73l7 4a2 2 0 0 0 2 0l7-4A2 2 0 0 0 21 16z"/>
</svg>
Storage Overview
</h2>
<div className="stat-grid stat-grid--4">
<StatCard
label="Total Storage Used"
value={formatBytes(stats?.total_size_bytes || 0)}
icon={
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<path d="M22 12h-4l-3 9L9 3l-3 9H2"/>
</svg>
}
variant="default"
/>
<StatCard
label="Storage Saved"
value={formatBytes(stats?.storage_saved_bytes || 0)}
subvalue="through deduplication"
icon={
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<polyline points="23 6 13.5 15.5 8.5 10.5 1 18"/>
<polyline points="17 6 23 6 23 12"/>
</svg>
}
variant="success"
/>
<StatCard
label="Deduplication Ratio"
value={`${(stats?.deduplication_ratio || 1).toFixed(2)}x`}
subvalue="compression achieved"
icon={
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<rect x="3" y="3" width="18" height="18" rx="2" ry="2"/>
<line x1="3" y1="9" x2="21" y2="9"/>
<line x1="9" y1="21" x2="9" y2="9"/>
</svg>
}
variant="accent"
/>
<StatCard
label="Savings Percentage"
value={`${(dedupStats?.savings_percentage || 0).toFixed(1)}%`}
subvalue="of logical storage"
icon={
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<circle cx="12" cy="12" r="10"/>
<polyline points="12 6 12 12 16 14"/>
</svg>
}
variant="success"
/>
</div>
</section>
<section className="dashboard__section">
<h2 className="dashboard__section-title">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<path d="M12 20V10"/>
<path d="M18 20V4"/>
<path d="M6 20v-4"/>
</svg>
Artifact Statistics
</h2>
<div className="stat-grid stat-grid--4">
<StatCard
label="Total Artifacts"
value={formatNumber(stats?.total_artifacts || 0)}
icon={
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<path d="M14.5 2H6a2 2 0 0 0-2 2v16a2 2 0 0 0 2 2h12a2 2 0 0 0 2-2V7.5L14.5 2z"/>
<polyline points="14 2 14 8 20 8"/>
</svg>
}
/>
<StatCard
label="Total Uploads"
value={formatNumber(stats?.total_uploads || 0)}
icon={
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<path d="M21 15v4a2 2 0 0 1-2 2H5a2 2 0 0 1-2-2v-4"/>
<polyline points="17 8 12 3 7 8"/>
<line x1="12" y1="3" x2="12" y2="15"/>
</svg>
}
/>
<StatCard
label="Deduplicated Uploads"
value={formatNumber(stats?.deduplicated_uploads || 0)}
subvalue="uploads reusing existing"
icon={
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<rect x="9" y="9" width="13" height="13" rx="2" ry="2"/>
<path d="M5 15H4a2 2 0 0 1-2-2V4a2 2 0 0 1 2-2h9a2 2 0 0 1 2 2v1"/>
</svg>
}
variant="success"
/>
<StatCard
label="Unique Artifacts"
value={formatNumber(stats?.unique_artifacts || 0)}
subvalue="distinct content hashes"
icon={
<svg width="24" height="24" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<polygon points="12 2 15.09 8.26 22 9.27 17 14.14 18.18 21.02 12 17.77 5.82 21.02 7 14.14 2 9.27 8.91 8.26 12 2"/>
</svg>
}
/>
</div>
</section>
<section className="dashboard__section">
<h2 className="dashboard__section-title">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<line x1="12" y1="20" x2="12" y2="10"/>
<line x1="18" y1="20" x2="18" y2="4"/>
<line x1="6" y1="20" x2="6" y2="16"/>
</svg>
Deduplication Effectiveness
</h2>
<div className="effectiveness-grid">
<div className="effectiveness-card">
<h3>Logical vs Physical Storage</h3>
<div className="storage-comparison">
<div className="storage-bar">
<div className="storage-bar__label">
<span>Logical (with duplicates)</span>
<span className="storage-bar__value">{formatBytes(dedupStats?.total_logical_bytes || 0)}</span>
</div>
<ProgressBar
value={dedupStats?.total_logical_bytes || 0}
max={dedupStats?.total_logical_bytes || 1}
showPercentage={false}
variant="default"
/>
</div>
<div className="storage-bar">
<div className="storage-bar__label">
<span>Physical (actual storage)</span>
<span className="storage-bar__value">{formatBytes(dedupStats?.total_physical_bytes || 0)}</span>
</div>
<ProgressBar
value={dedupStats?.total_physical_bytes || 0}
max={dedupStats?.total_logical_bytes || 1}
showPercentage={false}
variant="success"
/>
</div>
</div>
<div className="storage-savings">
<div className="storage-savings__icon">
<svg width="32" height="32" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<polyline points="20 6 9 17 4 12"/>
</svg>
</div>
<div className="storage-savings__content">
<span className="storage-savings__value">{formatBytes(dedupStats?.bytes_saved || 0)}</span>
<span className="storage-savings__label">saved through deduplication</span>
</div>
</div>
</div>
<div className="effectiveness-card">
<h3>Deduplication Rate</h3>
<div className="dedup-rate">
<div className="dedup-rate__circle">
<svg viewBox="0 0 100 100" className="dedup-rate__svg">
<circle
cx="50"
cy="50"
r="45"
fill="none"
stroke="var(--border-primary)"
strokeWidth="8"
/>
<circle
cx="50"
cy="50"
r="45"
fill="none"
stroke="url(#gradient)"
strokeWidth="8"
strokeLinecap="round"
strokeDasharray={`${(dedupStats?.savings_percentage || 0) * 2.827} 282.7`}
transform="rotate(-90 50 50)"
/>
<defs>
<linearGradient id="gradient" x1="0%" y1="0%" x2="100%" y2="0%">
<stop offset="0%" stopColor="#10b981" />
<stop offset="100%" stopColor="#059669" />
</linearGradient>
</defs>
</svg>
<div className="dedup-rate__value">
<span className="dedup-rate__number">{(dedupStats?.savings_percentage || 0).toFixed(1)}</span>
<span className="dedup-rate__symbol">%</span>
</div>
</div>
<div className="dedup-rate__details">
<div className="dedup-rate__detail">
<span className="dedup-rate__detail-value">{(stats?.deduplication_ratio || 1).toFixed(2)}x</span>
<span className="dedup-rate__detail-label">Compression Ratio</span>
</div>
<div className="dedup-rate__detail">
<span className="dedup-rate__detail-value">{formatNumber(stats?.deduplicated_uploads || 0)}</span>
<span className="dedup-rate__detail-label">Duplicate Uploads</span>
</div>
</div>
</div>
</div>
</div>
</section>
{dedupStats && dedupStats.most_referenced_artifacts.length > 0 && (
<section className="dashboard__section">
<h2 className="dashboard__section-title">
<svg width="20" height="20" viewBox="0 0 24 24" fill="none" stroke="currentColor" strokeWidth="2">
<polygon points="12 2 15.09 8.26 22 9.27 17 14.14 18.18 21.02 12 17.77 5.82 21.02 7 14.14 2 9.27 8.91 8.26 12 2"/>
</svg>
Top Referenced Artifacts
</h2>
<p className="dashboard__section-description">
            These artifacts have the highest reference counts across your storage, so they account for the largest share of deduplication savings.
</p>
<DataTable
data={dedupStats.most_referenced_artifacts.slice(0, 10)}
columns={artifactColumns}
keyExtractor={(item) => item.artifact_id}
emptyMessage="No referenced artifacts found"
className="artifacts-table"
/>
</section>
)}
</div>
);
}
export default Dashboard;
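A note on the gauge above: the `strokeDasharray` constant `2.827` is derived from the progress circle's circumference. With `r = 45`, the circumference is `2πr ≈ 282.74`, so each percentage point maps to roughly 2.827 dash units. A minimal sketch of that derivation (the helper name is illustrative, not part of the component):

```typescript
// Derivation of the dash constants used by the dedup-rate gauge.
// The progress circle has r = 45, so its circumference is 2 * PI * 45 ~= 282.74.
const RADIUS = 45;
const CIRCUMFERENCE = 2 * Math.PI * RADIUS;

// Map a savings percentage (0-100) to a strokeDasharray string:
// the first number is the filled arc length, the second the full circumference.
function dashArrayFor(savingsPercentage: number): string {
  const filled = (savingsPercentage / 100) * CIRCUMFERENCE;
  return `${filled.toFixed(1)} ${CIRCUMFERENCE.toFixed(1)}`;
}
```

At 100% the filled arc equals the circumference, so the ring closes completely; the component's inline `* 2.827` is the same computation with the division by 100 folded in.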



@@ -161,3 +161,67 @@ export interface GlobalSearchResponse {
export interface ProjectListParams extends ListParams {
visibility?: 'public' | 'private';
}
// Stats types
export interface Stats {
total_artifacts: number;
total_size_bytes: number;
unique_artifacts: number;
orphaned_artifacts: number;
orphaned_size_bytes: number;
total_uploads: number;
deduplicated_uploads: number;
deduplication_ratio: number;
storage_saved_bytes: number;
}
export interface ReferencedArtifact {
artifact_id: string;
ref_count: number;
size: number;
original_name: string | null;
content_type: string | null;
storage_saved: number;
}
export interface DeduplicationStats {
total_logical_bytes: number;
total_physical_bytes: number;
bytes_saved: number;
savings_percentage: number;
total_uploads: number;
unique_artifacts: number;
duplicate_uploads: number;
average_ref_count: number;
max_ref_count: number;
most_referenced_artifacts: ReferencedArtifact[];
}
export interface TimelineDataPoint {
date: string;
uploads: number;
deduplicated: number;
bytes_uploaded: number;
bytes_saved: number;
}
export interface TimelineStats {
period: 'day' | 'week' | 'month';
start_date: string;
end_date: string;
data_points: TimelineDataPoint[];
}
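As an illustration of consuming `TimelineStats`, chart headers typically need totals folded from `data_points`. The helper below is hypothetical (not part of `api.ts`), and a local lite type keeps the sketch self-contained:

```typescript
// Hypothetical helper: sum uploads and bytes saved across timeline points.
// Mirrors the TimelineDataPoint fields defined above.
interface TimelineDataPointLite {
  uploads: number;
  bytes_saved: number;
}

function timelineTotals(points: TimelineDataPointLite[]) {
  return points.reduce(
    (acc, p) => ({
      uploads: acc.uploads + p.uploads,
      bytesSaved: acc.bytesSaved + p.bytes_saved,
    }),
    { uploads: 0, bytesSaved: 0 },
  );
}
```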
export interface CrossProjectDuplicate {
artifact_id: string;
size: number;
original_name: string | null;
projects: string[];
total_references: number;
}
export interface CrossProjectStats {
total_cross_project_duplicates: number;
bytes_saved_cross_project: number;
duplicates: CrossProjectDuplicate[];
}
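The `DeduplicationStats` fields are related: `bytes_saved` should equal `total_logical_bytes - total_physical_bytes`, and `savings_percentage` follows from the same pair (this is how the dashboard's storage bars and gauge stay consistent). A hedged sketch of that presumed arithmetic, with illustrative sample values:

```typescript
// Illustrative only: derive the dependent DeduplicationStats fields from
// the two base byte counts, mirroring the backend's presumed arithmetic.
function deriveSavings(logicalBytes: number, physicalBytes: number) {
  const bytesSaved = logicalBytes - physicalBytes;
  const savingsPercentage =
    logicalBytes > 0 ? (bytesSaved / logicalBytes) * 100 : 0;
  return { bytesSaved, savingsPercentage };
}

// e.g. 1 MB logical, 400 KB physical => 600 KB saved, 60% savings
const sample = deriveSavings(1_000_000, 400_000);
```

Guarding the zero-logical case matters for a fresh deployment with no uploads, where a naive division would yield `NaN` in the gauge.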