Test Artifact Data Lake
A lightweight, cloud-native API for storing and querying test artifacts, including CSV files, JSON files, binary files, and packet captures (PCAP). It is built with FastAPI and supports both AWS S3 and self-hosted MinIO storage backends.
Features
- Multi-format Support: Store CSV, JSON, binary, and PCAP files
- Flexible Storage: Switch between AWS S3 and self-hosted MinIO
- Rich Metadata: Track test configurations, results, and custom metadata
- Powerful Querying: Query artifacts by test name, suite, result, tags, date ranges, and more
- RESTful API: Clean REST API with automatic OpenAPI documentation
- Cloud-Native: Fully containerized with Docker and Kubernetes/Helm support
- Production-Ready: Includes a GitLab CI/CD pipeline for automated deployments
Architecture
┌─────────────┐
│   FastAPI   │ ← REST API
│   Backend   │
└──────┬──────┘
       │
       ├─────────┐
       ↓         ↓
┌──────────┐ ┌────────────┐
│PostgreSQL│ │  S3/MinIO  │
│(Metadata)│ │  (Blobs)   │
└──────────┘ └────────────┘
- PostgreSQL: Stores artifact metadata, test configs, and query indexes
- S3/MinIO: Stores actual file contents (blob storage)
- FastAPI: Async REST API for uploads, downloads, and queries
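The split between metadata and blobs described above can be illustrated with a toy model. This is only a sketch: plain dictionaries stand in for PostgreSQL and S3/MinIO, and the class name, field names, and storage-key layout are hypothetical rather than taken from the project's code.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class InMemoryLake:
    """Toy model of the two stores; plain dicts replace the real services."""

    metadata: dict = field(default_factory=dict)  # stands in for PostgreSQL
    blobs: dict = field(default_factory=dict)     # stands in for S3/MinIO

    def upload(self, filename: str, content: bytes, test_name: str) -> int:
        # The blob goes to object storage under a unique key (hypothetical layout).
        storage_path = f"{uuid.uuid4().hex}/{filename}"
        self.blobs[storage_path] = content
        # The metadata row records where the blob lives, not its bytes.
        artifact_id = len(self.metadata) + 1
        self.metadata[artifact_id] = {
            "filename": filename,
            "size": len(content),
            "storage_path": storage_path,
            "test_name": test_name,
        }
        return artifact_id

    def download(self, artifact_id: int) -> bytes:
        # Look up the storage path first, then fetch the blob it points to.
        return self.blobs[self.metadata[artifact_id]["storage_path"]]
```

Queries only ever touch the metadata store in this model; the blob store is read when a file is actually downloaded.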
Quick Start
One-Command Setup
Linux/macOS:
./scripts/quickstart.sh
Windows (PowerShell):
.\scripts\quickstart.ps1
Manual Setup with Docker Compose
- Clone the repository:
git clone <repository-url>
cd datalake
- Copy environment configuration:
cp .env.example .env
- Start all services:
docker-compose up -d
- Access the application:
- Web UI: http://localhost:8000
- API Docs: http://localhost:8000/docs
- MinIO Console: http://localhost:9001
Using Python Directly
- Install dependencies:
pip install -r requirements.txt
- Set up PostgreSQL and MinIO/S3
- Configure environment variables in .env
- Run the application:
python -m uvicorn app.main:app --reload
API Usage
Upload an Artifact
curl -X POST "http://localhost:8000/api/v1/artifacts/upload" \
-F "file=@test_results.csv" \
-F "test_name=auth_test" \
-F "test_suite=integration" \
-F "test_result=pass" \
-F 'test_config={"browser":"chrome","timeout":30}' \
-F 'tags=["regression","smoke"]' \
-F "description=Authentication test results"
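When uploading from Python instead of curl, the structured fields need the same encoding as in the command above: `test_config` and `tags` are sent as JSON strings inside the multipart form. The helper below is a minimal sketch of that encoding step; the function name `build_upload_fields` is hypothetical, and only the field names come from the curl example.

```python
import json


def build_upload_fields(test_name, test_suite, test_result,
                        test_config=None, tags=None, description=None):
    """Build the non-file form fields for the upload endpoint."""
    fields = {
        "test_name": test_name,
        "test_suite": test_suite,
        "test_result": test_result,
    }
    # Structured values travel as JSON strings, as in the curl example.
    if test_config is not None:
        fields["test_config"] = json.dumps(test_config)
    if tags is not None:
        fields["tags"] = json.dumps(list(tags))
    if description is not None:
        fields["description"] = description
    return fields
```

With the third-party `requests` library, the returned dictionary could be passed as `data=` alongside `files={"file": open("test_results.csv", "rb")}` in a `requests.post` call to the upload URL.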
Query Artifacts
curl -X POST "http://localhost:8000/api/v1/artifacts/query" \
-H "Content-Type: application/json" \
-d '{
"test_suite": "integration",
"test_result": "fail",
"start_date": "2024-01-01T00:00:00",
"limit": 50
}'
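The same query body can be assembled programmatically. The sketch below assumes the endpoint treats absent filters as "no constraint", so it simply leaves out any filter that was not supplied; that behavior, and the function name `build_query`, are assumptions rather than documented facts.

```python
import json
from datetime import datetime
from typing import Optional


def build_query(test_suite: Optional[str] = None,
                test_result: Optional[str] = None,
                start_date: Optional[datetime] = None,
                limit: int = 50) -> str:
    """Serialize query filters to a JSON body, omitting unset filters."""
    body = {
        "test_suite": test_suite,
        "test_result": test_result,
        # isoformat() yields the same shape as the example: 2024-01-01T00:00:00
        "start_date": start_date.isoformat() if start_date else None,
        "limit": limit,
    }
    return json.dumps({k: v for k, v in body.items() if v is not None})
```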
Download an Artifact
curl -X GET "http://localhost:8000/api/v1/artifacts/123/download" \
-o downloaded_file.csv
Get Presigned URL
curl -X GET "http://localhost:8000/api/v1/artifacts/123/url?expiration=3600"
List All Artifacts
curl -X GET "http://localhost:8000/api/v1/artifacts/?limit=100&offset=0"
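To walk the full listing, a client can keep advancing `offset` until a short page comes back. The generator below sketches that loop. It assumes each request returns a plain list of artifact objects, which this README does not confirm, and it takes the HTTP call as an injected `fetch_page` function (a hypothetical name) so the paging logic stays independent of any HTTP library.

```python
from typing import Callable, Iterator, List


def iter_artifacts(fetch_page: Callable[[int, int], List[dict]],
                   page_size: int = 100) -> Iterator[dict]:
    """Yield artifacts page by page using limit/offset pagination."""
    offset = 0
    while True:
        page = fetch_page(page_size, offset)  # called as fetch_page(limit, offset)
        yield from page
        # A page shorter than the limit means there is nothing left to fetch.
        if len(page) < page_size:
            return
        offset += page_size
```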
Delete an Artifact
curl -X DELETE "http://localhost:8000/api/v1/artifacts/123"
API Endpoints
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/artifacts/upload | Upload a new artifact with metadata |
| GET | /api/v1/artifacts/{id} | Get artifact metadata by ID |
| GET | /api/v1/artifacts/{id}/download | Download artifact file |
| GET | /api/v1/artifacts/{id}/url | Get presigned download URL |
| DELETE | /api/v1/artifacts/{id} | Delete artifact and file |
| POST | /api/v1/artifacts/query | Query artifacts with filters |
| GET | /api/v1/artifacts/ | List all artifacts (paginated) |
| GET | / | API information |
| GET | /health | Health check |
| GET | /docs | Interactive API documentation |
Configuration
Environment Variables
| Variable | Description | Default |
|---|---|---|
| DATABASE_URL | PostgreSQL connection string | postgresql://user:password@localhost:5432/datalake |
| STORAGE_BACKEND | Storage backend (s3 or minio) | minio |
| AWS_ACCESS_KEY_ID | AWS access key (for S3) | - |
| AWS_SECRET_ACCESS_KEY | AWS secret key (for S3) | - |
| AWS_REGION | AWS region (for S3) | us-east-1 |
| S3_BUCKET_NAME | S3 bucket name | test-artifacts |
| MINIO_ENDPOINT | MinIO endpoint | localhost:9000 |
| MINIO_ACCESS_KEY | MinIO access key | minioadmin |
| MINIO_SECRET_KEY | MinIO secret key | minioadmin |
| MINIO_BUCKET_NAME | MinIO bucket name | test-artifacts |
| MINIO_SECURE | Use HTTPS for MinIO | false |
| API_HOST | API host | 0.0.0.0 |
| API_PORT | API port | 8000 |
| MAX_UPLOAD_SIZE | Max upload size (bytes) | 524288000 (500MB) |
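For scripts that need to read the same configuration, the defaults in the table can be mirrored in a small loader. This is a sketch only: the `Settings` class and `load_settings` function are hypothetical and cover a subset of the variables, and the set of strings accepted as "true" for MINIO_SECURE is an assumption. The application's own settings code may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    database_url: str
    storage_backend: str
    minio_endpoint: str
    minio_secure: bool
    api_port: int
    max_upload_size: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, falling back to the table defaults."""
    return Settings(
        database_url=env.get(
            "DATABASE_URL", "postgresql://user:password@localhost:5432/datalake"
        ),
        storage_backend=env.get("STORAGE_BACKEND", "minio").lower(),
        minio_endpoint=env.get("MINIO_ENDPOINT", "localhost:9000"),
        # Assumption: "1", "true", and "yes" (any case) all mean enabled.
        minio_secure=env.get("MINIO_SECURE", "false").strip().lower()
        in ("1", "true", "yes"),
        api_port=int(env.get("API_PORT", "8000")),
        max_upload_size=int(env.get("MAX_UPLOAD_SIZE", "524288000")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests.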
Switching Between S3 and MinIO
To use AWS S3:
STORAGE_BACKEND=s3
AWS_ACCESS_KEY_ID=your_key
AWS_SECRET_ACCESS_KEY=your_secret
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-bucket
To use self-hosted MinIO:
STORAGE_BACKEND=minio
MINIO_ENDPOINT=minio:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
MINIO_BUCKET_NAME=test-artifacts
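Because MinIO speaks the S3 protocol, one S3 client can serve both backends, with only the connection arguments changing. The sketch below maps the variables above to keyword arguments in the shape that `boto3.client("s3", ...)` accepts. Whether this project actually uses boto3 is an assumption, and the function name `storage_client_kwargs` is hypothetical.

```python
from typing import Mapping


def storage_client_kwargs(env: Mapping[str, str]) -> dict:
    """Translate STORAGE_BACKEND and related variables into S3 client arguments."""
    backend = env.get("STORAGE_BACKEND", "minio").lower()
    if backend == "s3":
        return {
            "aws_access_key_id": env["AWS_ACCESS_KEY_ID"],
            "aws_secret_access_key": env["AWS_SECRET_ACCESS_KEY"],
            "region_name": env.get("AWS_REGION", "us-east-1"),
        }
    if backend == "minio":
        # MINIO_SECURE decides between http and https for the endpoint URL.
        secure = env.get("MINIO_SECURE", "false").strip().lower() == "true"
        scheme = "https" if secure else "http"
        endpoint = env.get("MINIO_ENDPOINT", "localhost:9000")
        return {
            "endpoint_url": f"{scheme}://{endpoint}",
            "aws_access_key_id": env.get("MINIO_ACCESS_KEY", "minioadmin"),
            "aws_secret_access_key": env.get("MINIO_SECRET_KEY", "minioadmin"),
        }
    raise ValueError(f"Unsupported STORAGE_BACKEND: {backend!r}")
```

Rejecting unknown backend names up front surfaces a typo in `.env` at startup rather than on the first upload.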
Deployment
Kubernetes with Helm
- Build and push Docker image:
docker build -t your-registry/datalake:latest .
docker push your-registry/datalake:latest
- Install with Helm:
helm install datalake ./helm \
--set image.repository=your-registry/datalake \
--set image.tag=latest \
--namespace datalake \
--create-namespace
- Access the API:
kubectl port-forward -n datalake svc/datalake 8000:8000
Helm Configuration
Edit helm/values.yaml to customize:
- Replica count
- Resource limits
- Storage backend (S3 vs MinIO)
- Ingress settings
- PostgreSQL settings
- Autoscaling
GitLab CI/CD
The included .gitlab-ci.yml provides:
- Automated testing
- Linting
- Docker image builds
- Deployments to dev/staging/prod
Required GitLab CI/CD Variables:
- CI_REGISTRY_USER: Docker registry username
- CI_REGISTRY_PASSWORD: Docker registry password
- KUBE_CONFIG_DEV: Base64-encoded kubeconfig for dev
- KUBE_CONFIG_STAGING: Base64-encoded kubeconfig for staging
- KUBE_CONFIG_PROD: Base64-encoded kubeconfig for prod
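The KUBE_CONFIG_* variables hold a kubeconfig file encoded as Base64. The sketch below produces such a value with the standard library and decodes it again as a sanity check before pasting it into GitLab. The helper names are hypothetical, and the sample bytes in the usage note are placeholders, not a real kubeconfig.

```python
import base64


def encode_kubeconfig(raw: bytes) -> str:
    """Return the single-line Base64 form suitable for a CI/CD variable."""
    return base64.b64encode(raw).decode("ascii")


def decode_kubeconfig(value: str) -> bytes:
    """Decode a variable value; validate=True rejects non-Base64 characters."""
    return base64.b64decode(value, validate=True)
```

For example, `encode_kubeconfig(open("kubeconfig", "rb").read())` yields the string to store, and `decode_kubeconfig` should return the original bytes unchanged.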
Database Schema
The artifacts table stores:
- File metadata (name, type, size, storage path)
- Test information (name, suite, config, result)
- Custom metadata and tags
- Timestamps and versioning
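One way to picture a row in this table is as a record with one group of fields per bullet above. The dataclass below is an illustrative sketch: every field name and type is a guess based on the list, not the actual column definition, so consult the models and Alembic migrations for the real schema.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass
class ArtifactRecord:
    # File metadata
    id: int
    filename: str
    file_type: str
    file_size: int
    storage_path: str
    # Test information
    test_name: Optional[str] = None
    test_suite: Optional[str] = None
    test_config: dict = field(default_factory=dict)
    test_result: Optional[str] = None
    # Custom metadata and tags
    custom_metadata: dict = field(default_factory=dict)
    tags: list = field(default_factory=list)
    # Timestamps and versioning
    created_at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    version: int = 1
```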
Example Use Cases
Store Test Results
Upload CSV files containing test execution results with metadata about the test suite and configuration.
Archive Packet Captures
Store PCAP files from network tests with tags for easy filtering and retrieval.
Track Test Configurations
Upload JSON test configurations and query them by date, test suite, or custom tags.
Binary Artifact Storage
Store compiled binaries, test data files, or any binary artifacts with full metadata.
Development
Running Tests
pytest tests/ -v
Code Formatting
black app/
flake8 app/
Database Migrations
alembic revision --autogenerate -m "description"
alembic upgrade head
Troubleshooting
Cannot Connect to Database
- Verify PostgreSQL is running
- Check DATABASE_URL is correct
- Ensure database exists
Cannot Upload Files
- Check storage backend is running (MinIO or S3 accessible)
- Verify credentials are correct
- Check file size is under MAX_UPLOAD_SIZE
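The size check in the last item can be done on the client before sending anything. The sketch below compares a byte count against the default limit from the configuration table. Whether the server accepts a file of exactly MAX_UPLOAD_SIZE bytes is an assumption here, and `fits_upload_limit` is a hypothetical helper name.

```python
DEFAULT_MAX_UPLOAD_SIZE = 524288000  # 500 MB, the documented default


def fits_upload_limit(size_bytes: int,
                      max_bytes: int = DEFAULT_MAX_UPLOAD_SIZE) -> bool:
    """Return True when a file of size_bytes is within the upload limit."""
    # Assumption: the limit is inclusive.
    return size_bytes <= max_bytes
```

The size of a local file is available from `os.path.getsize(path)`.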
MinIO Connection Failed
- Ensure MinIO service is running
- Verify MINIO_ENDPOINT is correct
- Check MinIO credentials
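For both the database and MinIO connection problems, a quick first step is to confirm that the configured host and port accept TCP connections at all. The sketch below extracts host and port from DATABASE_URL and MINIO_ENDPOINT and attempts a plain socket connection. The helper names are hypothetical, and the fallback ports (5432 for PostgreSQL, 9000 for MinIO) are the conventional defaults, assumed when the setting omits a port.

```python
import socket
from urllib.parse import urlsplit


def db_host_port(database_url: str) -> tuple:
    """Extract (host, port) from a PostgreSQL connection URL."""
    parts = urlsplit(database_url)
    return parts.hostname, parts.port or 5432


def minio_host_port(endpoint: str) -> tuple:
    """Split a host:port endpoint such as 'localhost:9000'."""
    host, _, port = endpoint.partition(":")
    return host, int(port) if port else 9000


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

A successful connection shows only that something is listening on the port; wrong credentials or a missing database would still need to be checked separately.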
Documentation
Detailed documentation is available in the docs/ folder:
- Quick Start Guide - Get started in minutes
- API Documentation - Complete API reference
- Architecture - System design and architecture
- Features - Detailed feature descriptions
- Deployment Guide - Production deployment instructions
- Frontend Setup - Angular frontend setup
- Frontend Usage - Using the web UI
Scripts
Helper scripts are available in the scripts/ folder:
- quickstart.sh / quickstart.ps1 - Quick start with Docker Compose
- quickstart-build.sh - Quick start with image rebuild
- dev-start.sh / dev-start.ps1 - Start development environment
License
[Your License Here]
Support
For issues and questions, please open an issue in the repository.