Obsidian

Enterprise Test Artifact Storage

A lightweight, cloud-native API for storing and querying test artifacts, including CSV files, JSON files, binary files, and packet captures (PCAP). It is built with FastAPI and supports both AWS S3 and self-hosted MinIO as storage backends.

Features

  • Multi-format Support: Store CSV, JSON, binary files, and PCAP files
  • Flexible Storage: Switch between AWS S3 and self-hosted MinIO
  • Rich Metadata: Track test configurations, results, and custom metadata
  • Powerful Querying: Query artifacts by test name, suite, result, tags, date ranges, and more
  • RESTful API: Clean REST API with automatic OpenAPI documentation
  • Cloud-Native: Fully containerized with Docker and Kubernetes/Helm support
  • Production-Ready: Includes a GitLab CI/CD pipeline for automated deployments

Architecture

┌─────────────┐
│   FastAPI   │  ← REST API
│   Backend   │
└──────┬──────┘
       │
       ├─────────┐
       ↓         ↓
┌──────────┐  ┌────────────┐
│PostgreSQL│  │ S3/MinIO   │
│(Metadata)│  │ (Blobs)    │
└──────────┘  └────────────┘
  • PostgreSQL: Stores artifact metadata, test configs, and query indexes
  • S3/MinIO: Stores actual file contents (blob storage)
  • FastAPI: Async REST API for uploads, downloads, and queries

Quick Start

One-Command Setup

Linux/macOS:

./scripts/quickstart.sh

Windows (PowerShell):

.\scripts\quickstart.ps1

Manual Setup with Docker Compose

  1. Clone the repository:
git clone <repository-url>
cd datalake
  2. Copy environment configuration:
cp .env.example .env
  3. Start all services:
docker-compose up -d
  4. Access the application:

Using Python Directly

  1. Install dependencies:
pip install -r requirements.txt
  2. Set up PostgreSQL and MinIO/S3.
  3. Configure environment variables in .env.
  4. Run the application:
python -m uvicorn app.main:app --reload

API Usage

Upload an Artifact

curl -X POST "http://localhost:8000/api/v1/artifacts/upload" \
  -F "file=@test_results.csv" \
  -F "test_name=auth_test" \
  -F "test_suite=integration" \
  -F "test_result=pass" \
  -F 'test_config={"browser":"chrome","timeout":30}' \
  -F 'tags=["regression","smoke"]' \
  -F "description=Authentication test results"
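The same curl upload can be reproduced in a Python script using only the standard library. This sketch assumes the endpoint accepts the multipart field names from the curl example, with `test_config` and `tags` sent as JSON-encoded strings. `build_upload_request` is a hypothetical helper and is not part of this project. It builds the request without sending it, so you can inspect the request before passing it to `urllib.request.urlopen`.

```python
import json
import uuid
from pathlib import Path
from urllib.request import Request


def build_upload_request(base_url, file_path, test_name, test_suite=None,
                         test_result=None, test_config=None, tags=None,
                         description=None):
    """Build (without sending) a multipart/form-data POST like the curl example.

    Hypothetical helper: field names are copied from the curl call above.
    """
    fields = {
        "test_name": test_name,
        "test_suite": test_suite,
        "test_result": test_result,
        # Structured fields travel as JSON-encoded strings, as in the curl call.
        "test_config": json.dumps(test_config) if test_config is not None else None,
        "tags": json.dumps(tags) if tags is not None else None,
        "description": description,
    }
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        if value is None:
            continue  # omit optional fields that were not provided
        parts.append(
            (f"--{boundary}\r\n"
             f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
             f"{value}\r\n").encode("utf-8")
        )
    path = Path(file_path)
    file_header = (f"--{boundary}\r\n"
                   f'Content-Disposition: form-data; name="file"; filename="{path.name}"\r\n'
                   "Content-Type: application/octet-stream\r\n\r\n")
    # The file part carries the raw bytes, followed by the closing boundary.
    parts.append(file_header.encode("utf-8") + path.read_bytes() + b"\r\n")
    parts.append(f"--{boundary}--\r\n".encode("utf-8"))
    return Request(
        f"{base_url}/api/v1/artifacts/upload",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

To send it, call `urllib.request.urlopen(req)`. A library such as `requests` would handle the multipart encoding for you, at the cost of an extra dependency.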

Query Artifacts

curl -X POST "http://localhost:8000/api/v1/artifacts/query" \
  -H "Content-Type: application/json" \
  -d '{
    "test_suite": "integration",
    "test_result": "fail",
    "start_date": "2024-01-01T00:00:00",
    "limit": 50
  }'
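The query body is plain JSON, so a Python client only has to serialize the filters. In this sketch, `build_query_request` is a hypothetical name. It drops unset filters and converts `datetime` values to ISO-8601 strings matching the `start_date` format in the curl example. Whether the server accepts filter keys beyond those shown is an assumption; check `/docs`.

```python
import json
from datetime import datetime
from urllib.request import Request


def build_query_request(base_url, **filters):
    """Build a POST to /api/v1/artifacts/query from keyword filters (hypothetical helper)."""
    payload = {}
    for key, value in filters.items():
        if value is None:
            continue  # leave unset filters out so the server uses its defaults
        if isinstance(value, datetime):
            value = value.isoformat()  # e.g. "2024-01-01T00:00:00"
        payload[key] = value
    return Request(
        f"{base_url}/api/v1/artifacts/query",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```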

Download an Artifact

curl -X GET "http://localhost:8000/api/v1/artifacts/123/download" \
  -o downloaded_file.csv
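Artifacts such as PCAPs can approach the upload limit, so it makes sense to stream downloads to disk rather than hold them in memory. This is a minimal standard-library sketch, assuming the download endpoint returns the raw file bytes. `download_artifact` is hypothetical. The `opener` argument is injectable so the function can be exercised without a running server.

```python
import shutil
from pathlib import Path
from urllib.request import urlopen


def download_artifact(base_url, artifact_id, dest, opener=urlopen):
    """Stream /api/v1/artifacts/{id}/download to `dest` in 1 MiB chunks."""
    url = f"{base_url}/api/v1/artifacts/{artifact_id}/download"
    dest = Path(dest)
    # Copy in chunks so large PCAPs or binaries never sit fully in memory.
    with opener(url) as response, dest.open("wb") as out:
        shutil.copyfileobj(response, out, 1024 * 1024)
    return dest
```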

Get Presigned URL

curl -X GET "http://localhost:8000/api/v1/artifacts/123/url?expiration=3600"

List All Artifacts

curl -X GET "http://localhost:8000/api/v1/artifacts/?limit=100&offset=0"
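The list endpoint is paginated with `limit` and `offset`. This sketch walks every page, assuming the response body is a JSON array of artifacts. If the API wraps results in an object, `fetch_page` would need to unwrap it. Both function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def fetch_page(base_url, limit, offset):
    """GET one page from /api/v1/artifacts/ (assumes a JSON array response)."""
    query = urlencode({"limit": limit, "offset": offset})
    with urlopen(f"{base_url}/api/v1/artifacts/?{query}") as response:
        return json.load(response)


def iter_artifacts(base_url, page_size=100, fetch=fetch_page):
    """Yield every artifact by advancing `offset` until a short page comes back."""
    offset = 0
    while True:
        page = fetch(base_url, page_size, offset)
        yield from page
        if len(page) < page_size:
            break  # a short (or empty) page means there is nothing left
        offset += page_size
```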

Delete an Artifact

curl -X DELETE "http://localhost:8000/api/v1/artifacts/123"

API Endpoints

Method  Endpoint                         Description
POST    /api/v1/artifacts/upload         Upload a new artifact with metadata
GET     /api/v1/artifacts/{id}           Get artifact metadata by ID
GET     /api/v1/artifacts/{id}/download  Download artifact file
GET     /api/v1/artifacts/{id}/url       Get presigned download URL
DELETE  /api/v1/artifacts/{id}           Delete artifact and file
POST    /api/v1/artifacts/query          Query artifacts with filters
GET     /api/v1/artifacts/               List all artifacts (paginated)
GET     /                                API information
GET     /health                          Health check
GET     /docs                            Interactive API documentation

Configuration

Environment Variables

Variable               Description                    Default
DATABASE_URL           PostgreSQL connection string   postgresql://user:password@localhost:5432/datalake
STORAGE_BACKEND        Storage backend (s3 or minio)  minio
AWS_ACCESS_KEY_ID      AWS access key (for S3)        -
AWS_SECRET_ACCESS_KEY  AWS secret key (for S3)        -
AWS_REGION             AWS region (for S3)            us-east-1
S3_BUCKET_NAME         S3 bucket name                 test-artifacts
MINIO_ENDPOINT         MinIO endpoint                 localhost:9000
MINIO_ACCESS_KEY       MinIO access key               minioadmin
MINIO_SECRET_KEY       MinIO secret key               minioadmin
MINIO_BUCKET_NAME      MinIO bucket name              test-artifacts
MINIO_SECURE           Use HTTPS for MinIO            false
API_HOST               API host                       0.0.0.0
API_PORT               API port                       8000
MAX_UPLOAD_SIZE        Max upload size (bytes)        524288000 (500 MB)
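This standard-library sketch loads these variables with the defaults from the table. The `Settings` class and its field names are hypothetical. The application itself may use a different mechanism, such as a Pydantic settings class. Read it as an illustration of the documented defaults, not as the project's code.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object mirroring the documented variables."""
    database_url: str
    storage_backend: str
    minio_endpoint: str
    minio_secure: bool
    bucket_name: str
    api_host: str
    api_port: int
    max_upload_size: int

    @classmethod
    def from_env(cls, env=None):
        """Read the documented variables, falling back to the documented defaults."""
        env = os.environ if env is None else env
        backend = env.get("STORAGE_BACKEND", "minio").strip().lower()
        if backend not in ("s3", "minio"):
            raise ValueError(f"STORAGE_BACKEND must be 's3' or 'minio', got {backend!r}")
        # Each backend has its own bucket variable; both default to test-artifacts.
        bucket_var = "S3_BUCKET_NAME" if backend == "s3" else "MINIO_BUCKET_NAME"
        return cls(
            database_url=env.get("DATABASE_URL", "postgresql://user:password@localhost:5432/datalake"),
            storage_backend=backend,
            minio_endpoint=env.get("MINIO_ENDPOINT", "localhost:9000"),
            minio_secure=env.get("MINIO_SECURE", "false").strip().lower() in ("1", "true", "yes"),
            bucket_name=env.get(bucket_var, "test-artifacts"),
            api_host=env.get("API_HOST", "0.0.0.0"),
            api_port=int(env.get("API_PORT", "8000")),
            max_upload_size=int(env.get("MAX_UPLOAD_SIZE", "524288000")),  # 500 MB
        )
```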

Switching Between S3 and MinIO

To use AWS S3:

STORAGE_BACKEND=s3
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-bucket

To use self-hosted MinIO:

STORAGE_BACKEND=minio
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET_NAME=test-artifacts
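Switching backends changes which variables matter. This sketch uses a hypothetical `resolve_storage_target` helper. It assumes S3 credentials are mandatory, since the table lists no defaults for them. It also turns the scheme-less `MINIO_ENDPOINT` and `MINIO_SECURE` into a full endpoint URL, which is the form S3-compatible SDKs such as boto3 expect in `endpoint_url` when talking to MinIO.

```python
def resolve_storage_target(env):
    """Return the client settings implied by STORAGE_BACKEND, or raise if incomplete.

    Hypothetical helper; the keys of the returned dict are illustrative.
    """
    backend = env.get("STORAGE_BACKEND", "minio").strip().lower()
    if backend == "s3":
        # S3 credentials have no documented defaults, so both keys are required.
        missing = [k for k in ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY") if not env.get(k)]
        if missing:
            raise ValueError("STORAGE_BACKEND=s3 requires " + ", ".join(missing))
        return {
            "endpoint_url": None,  # None lets an S3 SDK use the regional AWS endpoint
            "region": env.get("AWS_REGION", "us-east-1"),
            "bucket": env.get("S3_BUCKET_NAME", "test-artifacts"),
        }
    if backend == "minio":
        # MINIO_ENDPOINT is host:port with no scheme; MINIO_SECURE picks http vs https.
        secure = env.get("MINIO_SECURE", "false").strip().lower() == "true"
        scheme = "https" if secure else "http"
        return {
            "endpoint_url": f"{scheme}://{env.get('MINIO_ENDPOINT', 'localhost:9000')}",
            "region": None,
            "bucket": env.get("MINIO_BUCKET_NAME", "test-artifacts"),
        }
    raise ValueError(f"unknown STORAGE_BACKEND {backend!r}")
```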

Deployment

Kubernetes with Helm

  1. Build and push Docker image:
docker build -t your-registry/datalake:latest .
docker push your-registry/datalake:latest
  2. Install with Helm:
helm install datalake ./helm \
  --set image.repository=your-registry/datalake \
  --set image.tag=latest \
  --namespace datalake \
  --create-namespace
  3. Access the API:
kubectl port-forward -n datalake svc/datalake 8000:8000
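Once the port-forward is running, a script can wait for the `/health` endpoint before continuing. This sketch assumes `/health` returns HTTP 200 when the service is ready. `wait_for_health` is a hypothetical helper, and `opener` can be replaced for testing.

```python
import time
from urllib.request import urlopen


def wait_for_health(base_url="http://localhost:8000", timeout=60.0,
                    interval=2.0, opener=urlopen):
    """Poll GET /health until it returns 200 or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener(f"{base_url}/health", timeout=5) as response:
                if response.status == 200:
                    return True
        except OSError:
            pass  # URLError, HTTPError and refused connections all subclass OSError
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```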

Helm Configuration

Edit helm/values.yaml to customize:

  • Replica count
  • Resource limits
  • Storage backend (S3 vs MinIO)
  • Ingress settings
  • PostgreSQL settings
  • Autoscaling

GitLab CI/CD

The included .gitlab-ci.yml provides:

  • Automated testing
  • Linting
  • Docker image builds
  • Deployments to dev/staging/prod

Required GitLab CI/CD Variables:

  • CI_REGISTRY_USER: Docker registry username
  • CI_REGISTRY_PASSWORD: Docker registry password
  • KUBE_CONFIG_DEV: Base64-encoded kubeconfig for dev
  • KUBE_CONFIG_STAGING: Base64-encoded kubeconfig for staging
  • KUBE_CONFIG_PROD: Base64-encoded kubeconfig for prod
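Each `KUBE_CONFIG_*` variable holds a base64-encoded kubeconfig. This minimal sketch produces that value and also shows the reverse step. Both helpers are hypothetical, and the decode step is only an assumption about how a deploy job might consume the variable.

```python
import base64
from pathlib import Path


def encode_kubeconfig(path="~/.kube/config"):
    """Return the kubeconfig as one base64 line, suitable for a CI/CD variable."""
    return base64.b64encode(Path(path).expanduser().read_bytes()).decode("ascii")


def decode_kubeconfig(value, dest):
    """Reverse of encode_kubeconfig: write the decoded kubeconfig to `dest`."""
    dest = Path(dest)
    dest.write_bytes(base64.b64decode(value))
    dest.chmod(0o600)  # kubeconfigs hold credentials; keep them owner-only
    return dest
```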

Database Schema

The artifacts table stores:

  • File metadata (name, type, size, storage path)
  • Test information (name, suite, config, result)
  • Custom metadata and tags
  • Timestamps and versioning
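The artifacts table links each stored object to its test metadata. This sketch uses SQLite from the standard library so it runs anywhere, although the production database is PostgreSQL. Every column name is hypothetical, chosen to mirror the fields listed above. `query_artifacts` is also hypothetical and shows one way the query endpoint's filters could map onto SQL.

```python
import json
import sqlite3

# Hypothetical column names; the real PostgreSQL schema may differ.
SCHEMA = """
CREATE TABLE artifacts (
    id              INTEGER PRIMARY KEY,
    filename        TEXT NOT NULL,
    file_type       TEXT NOT NULL,
    file_size       INTEGER NOT NULL,
    storage_path    TEXT NOT NULL,      -- object key in S3/MinIO
    test_name       TEXT,
    test_suite      TEXT,
    test_config     TEXT,               -- JSON-encoded
    test_result     TEXT,
    tags            TEXT DEFAULT '[]',  -- JSON-encoded list
    custom_metadata TEXT DEFAULT '{}',  -- JSON-encoded
    version         INTEGER NOT NULL DEFAULT 1,
    created_at      TEXT NOT NULL DEFAULT (strftime('%Y-%m-%dT%H:%M:%S', 'now'))
)
"""


def insert_artifact(conn, **row):
    """Insert one row; dict/list values are stored as JSON text.

    Column names come from trusted calling code, never from user input.
    """
    row = {k: json.dumps(v) if isinstance(v, (dict, list)) else v for k, v in row.items()}
    cols = ", ".join(row)
    marks = ", ".join("?" for _ in row)
    cur = conn.execute(f"INSERT INTO artifacts ({cols}) VALUES ({marks})", list(row.values()))
    return cur.lastrowid


def query_artifacts(conn, test_suite=None, test_result=None, start_date=None,
                    tag=None, limit=50):
    """Return matching artifact ids, newest first."""
    clauses, params = [], []
    for column, value in (("test_suite", test_suite), ("test_result", test_result)):
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    if start_date is not None:
        clauses.append("created_at >= ?")  # ISO-8601 strings sort chronologically
        params.append(start_date)
    where = f"WHERE {' AND '.join(clauses)}" if clauses else ""
    rows = conn.execute(
        f"SELECT id, tags FROM artifacts {where} ORDER BY created_at DESC, id DESC",
        params,
    ).fetchall()
    # Tag matching is done in Python to avoid depending on SQLite's JSON functions.
    ids = [row_id for row_id, tags in rows if tag is None or tag in json.loads(tags or "[]")]
    return ids[:limit]
```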

Example Use Cases

Store Test Results

Upload CSV files containing test execution results with metadata about the test suite and configuration.

Archive Packet Captures

Store PCAP files from network tests with tags for easy filtering and retrieval.

Track Test Configurations

Upload JSON test configurations and query them by date, test suite, or custom tags.

Binary Artifact Storage

Store compiled binaries, test data files, or any binary artifacts with full metadata.

Development

NPM Registry Configuration

The frontend supports working with multiple npm registries (public npm vs corporate Artifactory). See frontend/README-REGISTRY.md for detailed instructions.

Quick switch:

cd frontend

# Use public npm (default)
npm run registry:public
npm ci --force

# Use Artifactory
npm run registry:artifactory
npm ci --force

Running Tests

pytest tests/ -v

Code Formatting

black app/
flake8 app/

Database Migrations

alembic revision --autogenerate -m "description"
alembic upgrade head

Troubleshooting

Cannot Connect to Database

  • Verify that PostgreSQL is running
  • Check that DATABASE_URL is correct
  • Ensure the database exists
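Parsing `DATABASE_URL` and checking its parts can rule out a malformed connection string before you look at the database itself. This sketch uses a hypothetical `describe_database_url` helper. It accepts driver-qualified schemes such as `postgresql+asyncpg`, which is an assumption about how the app may be configured, and it deliberately leaves the password out of its output.

```python
from urllib.parse import urlsplit


def describe_database_url(url):
    """Split a PostgreSQL URL into user/host/port/database, omitting the password."""
    parts = urlsplit(url)
    if not parts.scheme.startswith("postgresql"):
        raise ValueError(f"expected a postgresql:// URL, got scheme {parts.scheme!r}")
    database = parts.path.lstrip("/")
    if not database:
        raise ValueError("DATABASE_URL has no database name")
    return {
        "user": parts.username,
        "host": parts.hostname,
        "port": parts.port or 5432,  # PostgreSQL's default port when none is given
        "database": database,
    }
```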

Cannot Upload Files

  • Check that the storage backend is running (MinIO) or reachable (S3)
  • Verify that the credentials are correct
  • Check that the file size is under MAX_UPLOAD_SIZE
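Checking the file against `MAX_UPLOAD_SIZE` before uploading avoids sending a large file only to have it rejected. This sketch assumes the server rejects files strictly larger than the limit. `check_upload_size` is hypothetical and falls back to the documented 500 MB default.

```python
import os
from pathlib import Path

DEFAULT_MAX_UPLOAD_SIZE = 524_288_000  # 500 MB, the documented default


def check_upload_size(path, max_size=None):
    """Return the file size in bytes, or raise ValueError if it exceeds the limit."""
    if max_size is None:
        max_size = int(os.environ.get("MAX_UPLOAD_SIZE", DEFAULT_MAX_UPLOAD_SIZE))
    size = Path(path).stat().st_size
    if size > max_size:
        raise ValueError(f"{path} is {size} bytes; limit is {max_size} bytes")
    return size
```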

MinIO Connection Failed

  • Ensure the MinIO service is running
  • Verify that MINIO_ENDPOINT is correct
  • Check the MinIO credentials
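A quick TCP check separates network problems from credential problems. If the port is unreachable, the credentials are not yet the issue. This sketch uses a hypothetical `is_reachable` helper. It expects `MINIO_ENDPOINT` in the documented `host:port` form, with no scheme and no bracketed IPv6 address.

```python
import socket


def is_reachable(endpoint, timeout=3.0):
    """Return True if a TCP connection to `endpoint` ("host:port") succeeds."""
    host, sep, port = endpoint.rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected host:port, got {endpoint!r}")
    try:
        # Only opens and closes a TCP connection; no MinIO credentials are sent.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        return False
```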

Documentation

Detailed documentation is available in the docs/ folder:

Scripts

Helper scripts are available in the scripts/ folder:

  • quickstart.sh / quickstart.ps1 - Quick start with Docker Compose
  • quickstart-build.sh - Quick start with image rebuild
  • dev-start.sh / dev-start.ps1 - Start development environment

License

[Your License Here]

Support

For issues and questions, please open an issue in the repository.
