# Deployment Guide

This guide covers deploying the Test Artifact Data Lake in various environments.

## Table of Contents

- [Local Development](#local-development)
- [Docker Compose](#docker-compose)
- [Kubernetes/Helm](#kuberneteshelm)
- [AWS Deployment](#aws-deployment)
- [Self-Hosted Deployment](#self-hosted-deployment)
- [GitLab CI/CD](#gitlab-cicd)
- [Monitoring](#monitoring)
- [Backup and Recovery](#backup-and-recovery)
- [Troubleshooting](#troubleshooting)
- [Security Considerations](#security-considerations)
- [Performance Tuning](#performance-tuning)
## Local Development

### Prerequisites

- Python 3.11+
- PostgreSQL 15+
- MinIO or AWS S3 access

### Steps

1. Create a virtual environment:

   ```bash
   python -m venv venv
   source venv/bin/activate  # On Windows: venv\Scripts\activate
   ```

2. Install dependencies:

   ```bash
   pip install -r requirements.txt
   ```

3. Set up PostgreSQL:

   ```bash
   createdb datalake
   ```

4. Configure the environment:

   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```

5. Run the application:

   ```bash
   python -m uvicorn app.main:app --reload
   ```
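As a rough sketch of what the `.env` file might contain — the variable names below are assumptions; check `.env.example` for the names the application actually reads:

```bash
# Hypothetical .env values -- see .env.example for the real variable names
DATABASE_URL=postgresql://user:pass@localhost:5432/datalake
STORAGE_BACKEND=minio
S3_ENDPOINT=http://localhost:9000
S3_ACCESS_KEY=minioadmin
S3_SECRET_KEY=minioadmin
S3_BUCKET=test-artifacts
```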
## Docker Compose

### Quick Start

1. Start all services:

   ```bash
   docker-compose up -d
   ```

2. Check the logs:

   ```bash
   docker-compose logs -f api
   ```

3. Stop services:

   ```bash
   docker-compose down
   ```

### Services Included

- PostgreSQL (port 5432)
- MinIO (port 9000, console 9001)
- API (port 8000)

### Customization

Edit `docker-compose.yml` to:

- Change port mappings
- Adjust resource limits
- Add environment variables
- Configure volumes
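Rather than editing `docker-compose.yml` directly, overrides can also live in a `docker-compose.override.yml`, which Compose merges in automatically. The service names below (`api`, `postgres`) are assumptions about the Compose file's layout:

```yaml
# docker-compose.override.yml -- hypothetical service names; match them
# to the ones defined in docker-compose.yml
services:
  api:
    ports:
      - "8080:8000"        # remap the API onto host port 8080
    environment:
      LOG_LEVEL: debug
  postgres:
    volumes:
      - ./pgdata:/var/lib/postgresql/data
```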
## Kubernetes/Helm

### Prerequisites

- Kubernetes cluster (1.24+)
- Helm 3.x
- kubectl configured

### Installation

1. Add chart dependencies (if using PostgreSQL/MinIO from Bitnami):

   ```bash
   helm repo add bitnami https://charts.bitnami.com/bitnami
   helm repo update
   ```

2. Install with default values:

   ```bash
   helm install datalake ./helm \
     --namespace datalake \
     --create-namespace
   ```

3. Or customize the installation:

   ```bash
   helm install datalake ./helm \
     --namespace datalake \
     --create-namespace \
     --set image.repository=your-registry/datalake \
     --set image.tag=1.0.0 \
     --set ingress.enabled=true \
     --set ingress.hosts[0].host=datalake.yourdomain.com
   ```
### Configuration Options

Image:

```bash
--set image.repository=your-registry/datalake
--set image.tag=1.0.0
--set image.pullPolicy=Always
```

Resources:

```bash
--set resources.requests.cpu=1000m
--set resources.requests.memory=1Gi
--set resources.limits.cpu=2000m
--set resources.limits.memory=2Gi
```

Autoscaling:

```bash
--set autoscaling.enabled=true
--set autoscaling.minReplicas=3
--set autoscaling.maxReplicas=10
--set autoscaling.targetCPUUtilizationPercentage=80
```

Ingress:

```bash
--set ingress.enabled=true
--set ingress.className=nginx
--set ingress.hosts[0].host=datalake.example.com
--set ingress.hosts[0].paths[0].path=/
--set ingress.hosts[0].paths[0].pathType=Prefix
```
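When several of these options are needed at once, the `--set` flags above can be collected into a values file and passed with `-f`. This sketch simply restates the same settings in YAML form:

```yaml
# values-prod.yaml -- equivalent to the --set flags listed above
image:
  repository: your-registry/datalake
  tag: "1.0.0"
  pullPolicy: Always
resources:
  requests:
    cpu: 1000m
    memory: 1Gi
  limits:
    cpu: 2000m
    memory: 2Gi
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
  targetCPUUtilizationPercentage: 80
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: datalake.example.com
      paths:
        - path: /
          pathType: Prefix
```

Then install with `helm install datalake ./helm --namespace datalake --create-namespace -f values-prod.yaml`.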
### Upgrade

```bash
helm upgrade datalake ./helm \
  --namespace datalake \
  --set image.tag=1.1.0
```

### Uninstall

```bash
helm uninstall datalake --namespace datalake
```
## AWS Deployment

### Using AWS S3 Storage

1. Create an S3 bucket:

   ```bash
   aws s3 mb s3://your-test-artifacts-bucket
   ```

2. Create an IAM user with S3 access (for production, prefer a policy scoped to this bucket over the broad `AmazonS3FullAccess`):

   ```bash
   aws iam create-user --user-name datalake-service
   aws iam attach-user-policy --user-name datalake-service \
     --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
   ```

3. Generate access keys:

   ```bash
   aws iam create-access-key --user-name datalake-service
   ```

4. Deploy with Helm:

   ```bash
   helm install datalake ./helm \
     --namespace datalake \
     --create-namespace \
     --set config.storageBackend=s3 \
     --set aws.enabled=true \
     --set aws.accessKeyId=YOUR_ACCESS_KEY \
     --set aws.secretAccessKey=YOUR_SECRET_KEY \
     --set aws.region=us-east-1 \
     --set aws.bucketName=your-test-artifacts-bucket \
     --set minio.enabled=false
   ```
### Using EKS

1. Create an EKS cluster:

   ```bash
   eksctl create cluster \
     --name datalake-cluster \
     --region us-east-1 \
     --nodegroup-name standard-workers \
     --node-type t3.medium \
     --nodes 3
   ```

2. Configure kubectl:

   ```bash
   aws eks update-kubeconfig --name datalake-cluster --region us-east-1
   ```

3. Deploy the application:

   ```bash
   helm install datalake ./helm \
     --namespace datalake \
     --create-namespace \
     --set config.storageBackend=s3
   ```
### Using RDS for PostgreSQL

Disable the bundled PostgreSQL and point the application at your RDS endpoint:

```bash
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  --set postgresql.enabled=false \
  --set config.databaseUrl="postgresql://user:pass@your-rds-endpoint:5432/datalake"
```
## Self-Hosted Deployment

### Using MinIO

1. Deploy MinIO:

   ```bash
   helm install minio bitnami/minio \
     --namespace datalake \
     --create-namespace \
     --set auth.rootUser=admin \
     --set auth.rootPassword=adminpassword \
     --set persistence.size=100Gi
   ```

2. Deploy the application against it (the chart's bundled MinIO is disabled because the standalone release above is used instead):

   ```bash
   helm install datalake ./helm \
     --namespace datalake \
     --set config.storageBackend=minio \
     --set minio.enabled=false \
     --set minio.endpoint=minio:9000 \
     --set minio.accessKey=admin \
     --set minio.secretKey=adminpassword
   ```
### On-Premise Kubernetes

1. Prepare persistent volumes:

   ```yaml
   apiVersion: v1
   kind: PersistentVolume
   metadata:
     name: datalake-postgres-pv
   spec:
     capacity:
       storage: 20Gi
     accessModes:
       - ReadWriteOnce
     hostPath:
       path: /data/postgres
   ```

2. Deploy with local storage:

   ```bash
   helm install datalake ./helm \
     --namespace datalake \
     --create-namespace \
     --set postgresql.persistence.storageClass=local-storage \
     --set minio.persistence.storageClass=local-storage
   ```
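The `local-storage` class referenced above must exist in the cluster. A minimal sketch using the no-provisioner pattern, where volumes are pre-created by hand as in step 1:

```yaml
# Minimal sketch: a manually-managed StorageClass with no dynamic provisioner
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
```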
## GitLab CI/CD

### Setup

1. Configure GitLab variables. Go to Settings → CI/CD → Variables and add:

   | Variable | Description | Protected | Masked |
   |---|---|---|---|
   | `CI_REGISTRY_USER` | Docker registry username | No | No |
   | `CI_REGISTRY_PASSWORD` | Docker registry password | No | Yes |
   | `KUBE_CONFIG_DEV` | Base64-encoded kubeconfig for dev | No | Yes |
   | `KUBE_CONFIG_STAGING` | Base64-encoded kubeconfig for staging | Yes | Yes |
   | `KUBE_CONFIG_PROD` | Base64-encoded kubeconfig for prod | Yes | Yes |

2. Encode the kubeconfig:

   ```bash
   base64 -w 0 ~/.kube/config   # On macOS: base64 -i ~/.kube/config
   ```
### Pipeline Stages

- **Test**: runs on all branches and merge requests
- **Build**: builds the Docker image on main/develop/tags
- **Deploy**: manual deployment to dev/staging/prod

### Deployment Flow

Development:

```bash
git push origin develop
# Manually trigger the deploy:dev job in GitLab
```

Staging:

```bash
git push origin main
# Manually trigger the deploy:staging job in GitLab
```

Production:

```bash
git tag v1.0.0
git push origin v1.0.0
# Manually trigger the deploy:prod job in GitLab
```
### Customizing the Pipeline

Edit `.gitlab-ci.yml` to:

- Add more test stages
- Change deployment namespaces
- Adjust Helm values per environment
- Add security scanning
- Configure rollback procedures
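As a sketch of what one of the manual deploy jobs might look like in `.gitlab-ci.yml` — the stage layout, Helm chart path, and target namespace are assumptions; only the `KUBE_CONFIG_DEV` variable comes from the table above:

```yaml
# Hypothetical deploy:dev job -- adjust image, namespace, and chart path
deploy:dev:
  stage: deploy
  image: alpine/helm:3.14.0
  environment: development
  when: manual
  only:
    - develop
  script:
    - mkdir -p ~/.kube
    - echo "$KUBE_CONFIG_DEV" | base64 -d > ~/.kube/config
    - helm upgrade --install datalake ./helm
        --namespace datalake-dev --create-namespace
        --set image.tag=$CI_COMMIT_SHORT_SHA
```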
## Monitoring

### Health Checks

```bash
# Kubernetes
kubectl get pods -n datalake
kubectl logs -f -n datalake deployment/datalake

# Direct
curl http://localhost:8000/health
```

### Metrics

Enable Prometheus monitoring on an existing release:

```bash
helm upgrade datalake ./helm \
  --namespace datalake \
  --set metrics.enabled=true \
  --set serviceMonitor.enabled=true
```
## Backup and Recovery

### Database Backup

```bash
# PostgreSQL
kubectl exec -n datalake deployment/datalake-postgresql -- \
  pg_dump -U user datalake > backup.sql

# Restore
kubectl exec -i -n datalake deployment/datalake-postgresql -- \
  psql -U user datalake < backup.sql
```
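Backups can also be scheduled in-cluster. A sketch of a nightly CronJob — the secret name, PVC name, and database credentials are assumptions about what the chart creates:

```yaml
# Hypothetical nightly pg_dump CronJob; adjust names to your release
apiVersion: batch/v1
kind: CronJob
metadata:
  name: datalake-db-backup
  namespace: datalake
spec:
  schedule: "0 2 * * *"          # every night at 02:00
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
            - name: pg-dump
              image: postgres:15
              command: ["/bin/sh", "-c"]
              args:
                - pg_dump -h datalake-postgresql -U user datalake
                  > /backup/backup-$(date +%F).sql
              env:
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: datalake-postgresql   # assumed chart-created secret
                      key: password
              volumeMounts:
                - name: backup
                  mountPath: /backup
          volumes:
            - name: backup
              persistentVolumeClaim:
                claimName: datalake-backups       # assumed pre-existing PVC
```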
### Storage Backup

S3:

```bash
aws s3 sync s3://your-bucket s3://backup-bucket
```

MinIO (using the `mc` client):

```bash
mc mirror minio/test-artifacts backup/test-artifacts
```
## Troubleshooting

### Pod Not Starting

```bash
kubectl describe pod -n datalake <pod-name>
kubectl logs -n datalake <pod-name>
```

### Database Connection Issues

```bash
kubectl exec -it -n datalake deployment/datalake -- \
  psql $DATABASE_URL
```

### Storage Issues

```bash
# Check MinIO (port 9000 is the S3 API; the web console listens on 9001)
kubectl port-forward -n datalake svc/minio 9001:9001
# Access the console at http://localhost:9001
```
## Security Considerations

- Use secrets management:
  - Kubernetes Secrets
  - AWS Secrets Manager
  - HashiCorp Vault
- Enable TLS:
  - Configure ingress with TLS certificates
  - Use cert-manager for automatic certificates
- Network policies:
  - Restrict pod-to-pod communication
  - Limit external access
- RBAC:
  - Configure Kubernetes RBAC
  - Limit service account permissions
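For example, credentials passed as plain `--set` values (as in the AWS deployment above) end up in shell history and in the stored Helm release; a Kubernetes Secret keeps them out of both. The key names here are assumptions about what the chart expects:

```yaml
# Hypothetical Secret holding the S3 credentials; key names may differ
apiVersion: v1
kind: Secret
metadata:
  name: datalake-aws-credentials
  namespace: datalake
type: Opaque
stringData:
  AWS_ACCESS_KEY_ID: YOUR_ACCESS_KEY
  AWS_SECRET_ACCESS_KEY: YOUR_SECRET_KEY
```

The deployment would then reference the Secret (for instance via `envFrom`, or a chart value such as `aws.existingSecret` if the chart supports one) instead of receiving the keys inline.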
## Performance Tuning

### Database

- Increase the connection pool size
- Add database indexes
- Configure autovacuum
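If PostgreSQL is deployed via the Bitnami subchart, such settings can be passed through its extended configuration. The parameter names below are standard `postgresql.conf` settings, but the values-file key path is the Bitnami chart's and may differ in your setup:

```yaml
# Sketch: PostgreSQL tuning via the Bitnami subchart (key path may differ)
postgresql:
  primary:
    extendedConfiguration: |
      max_connections = 200
      shared_buffers = 512MB
      autovacuum_vacuum_scale_factor = 0.05
```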
### API

- Increase the replica count
- Configure horizontal pod autoscaling
- Adjust resource requests/limits

### Storage

- Use a CDN for frequently accessed files
- Configure S3 Transfer Acceleration
- Optimize the MinIO deployment