# Deployment Guide

This guide covers deploying the Test Artifact Data Lake in various environments.

## Table of Contents

- [Local Development](#local-development)
- [Docker Compose](#docker-compose)
- [Kubernetes/Helm](#kuberneteshelm)
- [AWS Deployment](#aws-deployment)
- [Self-Hosted Deployment](#self-hosted-deployment)
- [GitLab CI/CD](#gitlab-cicd)
- [Monitoring](#monitoring)
- [Backup and Recovery](#backup-and-recovery)
- [Troubleshooting](#troubleshooting)
- [Security Considerations](#security-considerations)
- [Performance Tuning](#performance-tuning)

---
## Local Development

### Prerequisites

- Python 3.11+
- PostgreSQL 15+
- MinIO or AWS S3 access

### Steps

1. **Create a virtual environment:**

```bash
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
```

2. **Install dependencies:**

```bash
pip install -r requirements.txt
```

3. **Set up PostgreSQL:**

```bash
createdb datalake
```

4. **Configure the environment** (a sample `.env` is sketched after these steps):

```bash
cp .env.example .env
# Edit .env with your configuration
```

5. **Run the application:**

```bash
python -m uvicorn app.main:app --reload
```
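
For step 4, a minimal `.env` might look like the sketch below. The variable names here are illustrative assumptions, not the project's confirmed settings; `.env.example` in the repository is the authoritative list.

```bash
# Hypothetical settings; adjust names and values to match .env.example
DATABASE_URL=postgresql://postgres:postgres@localhost:5432/datalake
STORAGE_BACKEND=minio            # or "s3"
MINIO_ENDPOINT=localhost:9000
MINIO_ACCESS_KEY=minioadmin
MINIO_SECRET_KEY=minioadmin
S3_BUCKET=test-artifacts
```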
---
## Docker Compose

### Quick Start

1. **Start all services:**

```bash
docker-compose up -d
```

2. **Check logs:**

```bash
docker-compose logs -f api
```

3. **Stop services:**

```bash
docker-compose down
```

### Services Included

- PostgreSQL (port 5432)
- MinIO (port 9000, console 9001)
- API (port 8000)

### Customization

Edit `docker-compose.yml` (or add an override file, as sketched after this list) to:

- Change port mappings
- Adjust resource limits
- Add environment variables
- Configure volumes
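
One way to keep such tweaks out of the tracked `docker-compose.yml` is an override file, which Compose merges automatically. A minimal sketch; the `api` service name matches the services listed above, and the environment variable is a hypothetical example:

```bash
# docker-compose.override.yml is picked up automatically by `docker-compose up`
cat > docker-compose.override.yml <<'EOF'
services:
  api:
    ports:
      - "8080:8000"        # expose the API on 8080 instead of 8000
    environment:
      LOG_LEVEL: debug     # hypothetical variable; check your .env.example
EOF

docker-compose up -d
```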
---
## Kubernetes/Helm

### Prerequisites

- Kubernetes cluster (1.24+)
- Helm 3.x
- kubectl configured

### Installation

1. **Add dependencies (if using PostgreSQL/MinIO from Bitnami):**

```bash
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
```

2. **Install with default values:**

```bash
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace
```

3. **Custom installation:**

```bash
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  --set image.repository=your-registry/datalake \
  --set image.tag=1.0.0 \
  --set ingress.enabled=true \
  --set ingress.hosts[0].host=datalake.yourdomain.com
```

### Configuration Options

**Image:**

```bash
--set image.repository=your-registry/datalake
--set image.tag=1.0.0
--set image.pullPolicy=Always
```

**Resources:**

```bash
--set resources.requests.cpu=1000m
--set resources.requests.memory=1Gi
--set resources.limits.cpu=2000m
--set resources.limits.memory=2Gi
```

**Autoscaling:**

```bash
--set autoscaling.enabled=true
--set autoscaling.minReplicas=3
--set autoscaling.maxReplicas=10
--set autoscaling.targetCPUUtilizationPercentage=80
```

**Ingress:**

```bash
--set ingress.enabled=true
--set ingress.className=nginx
--set ingress.hosts[0].host=datalake.example.com
--set ingress.hosts[0].paths[0].path=/
--set ingress.hosts[0].paths[0].pathType=Prefix
```
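
Instead of repeating `--set` flags, the same options can live in a values file passed with `-f`. A sketch, assuming the key names above match the chart's `values.yaml`:

```bash
cat > my-values.yaml <<'EOF'
image:
  repository: your-registry/datalake
  tag: "1.0.0"
resources:
  requests:
    cpu: 1000m
    memory: 1Gi
  limits:
    cpu: 2000m
    memory: 2Gi
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
ingress:
  enabled: true
  className: nginx
  hosts:
    - host: datalake.example.com
      paths:
        - path: /
          pathType: Prefix
EOF

helm upgrade --install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  -f my-values.yaml
```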
### Upgrade

```bash
helm upgrade datalake ./helm \
  --namespace datalake \
  --set image.tag=1.1.0
```

### Uninstall

```bash
helm uninstall datalake --namespace datalake
```

---
## AWS Deployment

### Using AWS S3 Storage

1. **Create an S3 bucket:**

```bash
aws s3 mb s3://your-test-artifacts-bucket
```

2. **Create an IAM user with S3 access** (`AmazonS3FullAccess` is broader than necessary; a bucket-scoped alternative is sketched after these steps):

```bash
aws iam create-user --user-name datalake-service
aws iam attach-user-policy --user-name datalake-service \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
```

3. **Generate access keys:**

```bash
aws iam create-access-key --user-name datalake-service
```

4. **Deploy with Helm:**

```bash
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  --set config.storageBackend=s3 \
  --set aws.enabled=true \
  --set aws.accessKeyId=YOUR_ACCESS_KEY \
  --set aws.secretAccessKey=YOUR_SECRET_KEY \
  --set aws.region=us-east-1 \
  --set aws.bucketName=your-test-artifacts-bucket \
  --set minio.enabled=false
```
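
If you prefer least-privilege access over `AmazonS3FullAccess`, an inline policy limited to the artifact bucket can be attached instead. A sketch; the policy name and bucket are placeholders:

```bash
cat > datalake-s3-policy.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-test-artifacts-bucket",
        "arn:aws:s3:::your-test-artifacts-bucket/*"
      ]
    }
  ]
}
EOF

aws iam put-user-policy --user-name datalake-service \
  --policy-name datalake-s3-access \
  --policy-document file://datalake-s3-policy.json
```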
### Using EKS

1. **Create an EKS cluster:**

```bash
eksctl create cluster \
  --name datalake-cluster \
  --region us-east-1 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 3
```

2. **Configure kubectl:**

```bash
aws eks update-kubeconfig --name datalake-cluster --region us-east-1
```

3. **Deploy the application:**

```bash
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  --set config.storageBackend=s3
```

### Using RDS for PostgreSQL

```bash
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  --set postgresql.enabled=false \
  --set config.databaseUrl="postgresql://user:pass@your-rds-endpoint:5432/datalake"
```

---
## Self-Hosted Deployment

### Using MinIO

1. **Deploy MinIO:**

```bash
helm install minio bitnami/minio \
  --namespace datalake \
  --create-namespace \
  --set auth.rootUser=admin \
  --set auth.rootPassword=adminpassword \
  --set persistence.size=100Gi
```

2. **Deploy the application:**

```bash
helm install datalake ./helm \
  --namespace datalake \
  --set config.storageBackend=minio \
  --set minio.enabled=false \
  --set minio.endpoint=minio:9000 \
  --set minio.accessKey=admin \
  --set minio.secretKey=adminpassword
```
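
Uploads will fail until the artifact bucket exists. One way to create it is with the MinIO client `mc` through a port-forward; the bucket name `test-artifacts` is an assumption carried over from the backup examples later in this guide:

```bash
kubectl port-forward -n datalake svc/minio 9000:9000 &

# Register the deployment under an alias, then create and list the bucket
mc alias set datalake-minio http://localhost:9000 admin adminpassword
mc mb datalake-minio/test-artifacts
mc ls datalake-minio
```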
### On-Premise Kubernetes

1. **Prepare persistent volumes:**

```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: datalake-postgres-pv
spec:
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  # Must match the storageClass referenced in the Helm values below
  storageClassName: local-storage
  hostPath:
    path: /data/postgres
```

2. **Deploy with local storage:**

```bash
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  --set postgresql.persistence.storageClass=local-storage \
  --set minio.persistence.storageClass=local-storage
```

---
## GitLab CI/CD

### Setup

1. **Configure GitLab variables:**

Go to Settings → CI/CD → Variables and add:

| Variable | Description | Protected | Masked |
|----------|-------------|-----------|--------|
| `CI_REGISTRY_USER` | Docker registry username | No | No |
| `CI_REGISTRY_PASSWORD` | Docker registry password | No | Yes |
| `KUBE_CONFIG_DEV` | Base64 kubeconfig for dev | No | Yes |
| `KUBE_CONFIG_STAGING` | Base64 kubeconfig for staging | Yes | Yes |
| `KUBE_CONFIG_PROD` | Base64 kubeconfig for prod | Yes | Yes |

2. **Encode the kubeconfig:**

```bash
cat ~/.kube/config | base64 -w 0
```
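
In the deploy jobs, the base64 variable is typically decoded back into a kubeconfig before kubectl or Helm runs. A sketch of what such a job script might do; the exact steps depend on your `.gitlab-ci.yml`:

```bash
# Inside a deploy job, e.g. deploy:dev
echo "$KUBE_CONFIG_DEV" | base64 -d > kubeconfig
export KUBECONFIG="$PWD/kubeconfig"

kubectl get nodes   # sanity check that the cluster is reachable
```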
### Pipeline Stages

1. **Test**: runs on all branches and MRs
2. **Build**: builds the Docker image on main/develop/tags
3. **Deploy**: manual deployment to dev/staging/prod

### Deployment Flow

**Development:**

```bash
git push origin develop
# Manually trigger the deploy:dev job in GitLab
```

**Staging:**

```bash
git push origin main
# Manually trigger the deploy:staging job in GitLab
```

**Production:**

```bash
git tag v1.0.0
git push origin v1.0.0
# Manually trigger the deploy:prod job in GitLab
```

### Customizing the Pipeline

Edit `.gitlab-ci.yml` to:

- Add more test stages
- Change deployment namespaces
- Adjust Helm values per environment
- Add security scanning
- Configure rollback procedures

---
## Monitoring

### Health Checks

```bash
# Kubernetes
kubectl get pods -n datalake
kubectl logs -f -n datalake deployment/datalake

# Direct
curl http://localhost:8000/health
```

### Metrics

Add Prometheus monitoring:

```bash
helm install datalake ./helm \
  --set metrics.enabled=true \
  --set serviceMonitor.enabled=true
```

---
## Backup and Recovery

### Database Backup

```bash
# PostgreSQL
kubectl exec -n datalake deployment/datalake-postgresql -- \
  pg_dump -U user datalake > backup.sql

# Restore
kubectl exec -i -n datalake deployment/datalake-postgresql -- \
  psql -U user datalake < backup.sql
```
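
For recurring backups, the dump can be wrapped in a small script that timestamps the file and copies it to object storage. A minimal sketch, assuming the same resource names as above and an existing backup bucket:

```bash
#!/usr/bin/env bash
set -euo pipefail

STAMP=$(date +%Y%m%d-%H%M%S)
FILE="datalake-${STAMP}.sql"

# Dump the database from the running PostgreSQL pod
kubectl exec -n datalake deployment/datalake-postgresql -- \
  pg_dump -U user datalake > "${FILE}"

# Ship the dump to S3 (bucket name is a placeholder)
aws s3 cp "${FILE}" "s3://backup-bucket/postgres/${FILE}"
```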
### Storage Backup

**S3:**

```bash
aws s3 sync s3://your-bucket s3://backup-bucket
```

**MinIO:**

```bash
mc mirror minio/test-artifacts backup/test-artifacts
```

---
## Troubleshooting

### Pod Not Starting

```bash
kubectl describe pod -n datalake <pod-name>
kubectl logs -n datalake <pod-name>
```

### Database Connection Issues

```bash
kubectl exec -it -n datalake deployment/datalake -- \
  psql $DATABASE_URL
```

### Storage Issues

```bash
# Check MinIO
kubectl port-forward -n datalake svc/minio 9000:9000
# Access http://localhost:9000
```
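
With the port-forward in place, the MinIO client can confirm that the bucket exists and the configured credentials work; the alias and credentials below follow the self-hosted example earlier in this guide:

```bash
mc alias set datalake-minio http://localhost:9000 admin adminpassword
mc ls datalake-minio/test-artifacts   # lists uploaded artifacts, or nothing if the bucket is empty
```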
---
## Security Considerations

1. **Use secrets management** (a Kubernetes Secret sketch follows this list):
   - Kubernetes Secrets
   - AWS Secrets Manager
   - HashiCorp Vault

2. **Enable TLS:**
   - Configure ingress with TLS certificates
   - Use cert-manager for automatic certificates

3. **Network policies:**
   - Restrict pod-to-pod communication
   - Limit external access

4. **RBAC:**
   - Configure Kubernetes RBAC
   - Limit service account permissions
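
A sketch of keeping credentials out of `--set` flags with a plain Kubernetes Secret. The secret name and keys are assumptions; how the chart consumes them depends on its `values.yaml`:

```bash
kubectl create secret generic datalake-credentials \
  --namespace datalake \
  --from-literal=DATABASE_URL='postgresql://user:pass@your-rds-endpoint:5432/datalake' \
  --from-literal=AWS_ACCESS_KEY_ID='YOUR_ACCESS_KEY' \
  --from-literal=AWS_SECRET_ACCESS_KEY='YOUR_SECRET_KEY'
```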
---
## Performance Tuning

### Database

- Increase connection pool size
- Add database indexes
- Configure autovacuum

### API

- Increase replica count (a manual-scale sketch follows this list)
- Configure horizontal pod autoscaling
- Adjust resource requests/limits
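
For a quick manual bump before autoscaling is tuned, the deployment can be scaled directly; the deployment name `datalake` is an assumption based on the release name used throughout this guide:

```bash
kubectl scale deployment/datalake --replicas=5 -n datalake
kubectl get pods -n datalake -w   # watch the new replicas come up
```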
### Storage

- Use CDN for frequently accessed files
- Configure S3 Transfer Acceleration
- Optimize MinIO deployment