
Deployment Guide

This guide covers deploying the Test Artifact Data Lake locally, with Docker Compose, on Kubernetes via Helm, and on AWS or self-hosted infrastructure.

Table of Contents

  • Local Development
  • Docker Compose
  • Kubernetes/Helm
  • AWS Deployment
  • Self-Hosted Deployment
  • GitLab CI/CD
  • Monitoring
  • Backup and Recovery
  • Troubleshooting
  • Security Considerations
  • Performance Tuning
Local Development

Prerequisites

  • Python 3.11+
  • PostgreSQL 15+
  • MinIO or AWS S3 access

Steps

  1. Create virtual environment:
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
  2. Install dependencies:
pip install -r requirements.txt
  3. Set up PostgreSQL:
createdb datalake
  4. Configure environment (a sample .env is sketched after these steps):
cp .env.example .env
# Edit .env with your configuration
  5. Run the application:
python -m uvicorn app.main:app --reload
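
For reference, a local .env might contain entries like the following. The variable names are illustrative assumptions; treat .env.example as the authoritative list:

# Hypothetical variable names -- confirm against .env.example
DATABASE_URL=postgresql://user:pass@localhost:5432/datalake
S3_ENDPOINT=http://localhost:9000
S3_ACCESS_KEY=admin
S3_SECRET_KEY=adminpassword
S3_BUCKET=test-artifacts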

Docker Compose

Quick Start

  1. Start all services:
docker-compose up -d
  2. Check logs:
docker-compose logs -f api
  3. Stop services:
docker-compose down

Services Included

  • PostgreSQL (port 5432)
  • MinIO (port 9000, console 9001)
  • API (port 8000)

Customization

Edit docker-compose.yml, or an override file as sketched after this list, to:

  • Change port mappings
  • Adjust resource limits
  • Add environment variables
  • Configure volumes
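
Rather than editing the base file, local tweaks can go in docker-compose.override.yml, which Docker Compose merges automatically. A minimal sketch, assuming the API service is named api as in the logs command above (LOG_LEVEL is an illustrative variable name):

cat > docker-compose.override.yml <<'EOF'
services:
  api:
    ports:
      - "8080:8000"   # remap the host port
    environment:
      LOG_LEVEL: debug
EOF
docker-compose up -d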

Kubernetes/Helm

Prerequisites

  • Kubernetes cluster (1.24+)
  • Helm 3.x
  • kubectl configured

Installation

  1. Add dependencies (if using PostgreSQL/MinIO from Bitnami):
helm repo add bitnami https://charts.bitnami.com/bitnami
helm repo update
  2. Install with default values:
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace
  3. Custom installation:
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  --set image.repository=your-registry/datalake \
  --set image.tag=1.0.0 \
  --set ingress.enabled=true \
  --set ingress.hosts[0].host=datalake.yourdomain.com
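
Either way, confirm the release came up before configuring further:

helm status datalake --namespace datalake
kubectl get pods -n datalake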

Configuration Options

Image:

--set image.repository=your-registry/datalake
--set image.tag=1.0.0
--set image.pullPolicy=Always

Resources:

--set resources.requests.cpu=1000m
--set resources.requests.memory=1Gi
--set resources.limits.cpu=2000m
--set resources.limits.memory=2Gi

Autoscaling:

--set autoscaling.enabled=true
--set autoscaling.minReplicas=3
--set autoscaling.maxReplicas=10
--set autoscaling.targetCPUUtilizationPercentage=80

Ingress:

--set ingress.enabled=true
--set ingress.className=nginx
--set ingress.hosts[0].host=datalake.example.com
--set ingress.hosts[0].paths[0].path=/
--set ingress.hosts[0].paths[0].pathType=Prefix
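
Long --set chains get unwieldy; the same keys can live in a values file instead (this assumes the chart exposes exactly the keys shown above):

cat > values-prod.yaml <<'EOF'
image:
  repository: your-registry/datalake
  tag: "1.0.0"
autoscaling:
  enabled: true
  minReplicas: 3
  maxReplicas: 10
ingress:
  enabled: true
  className: nginx
EOF

helm upgrade --install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  -f values-prod.yaml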

Upgrade

helm upgrade datalake ./helm \
  --namespace datalake \
  --set image.tag=1.1.0
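
If an upgrade misbehaves, Helm keeps the release history, so you can roll back:

helm history datalake --namespace datalake
helm rollback datalake 1 --namespace datalake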

Uninstall

helm uninstall datalake --namespace datalake

AWS Deployment

Using AWS S3 Storage

  1. Create S3 bucket:
aws s3 mb s3://your-test-artifacts-bucket
  2. Create an IAM user with S3 access (a least-privilege alternative is sketched after these steps):
aws iam create-user --user-name datalake-service
aws iam attach-user-policy --user-name datalake-service \
  --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
  3. Generate access keys:
aws iam create-access-key --user-name datalake-service
  4. Deploy with Helm:
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  --set config.storageBackend=s3 \
  --set aws.enabled=true \
  --set aws.accessKeyId=YOUR_ACCESS_KEY \
  --set aws.secretAccessKey=YOUR_SECRET_KEY \
  --set aws.region=us-east-1 \
  --set aws.bucketName=your-test-artifacts-bucket \
  --set minio.enabled=false
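
AmazonS3FullAccess grants far more than this service needs. A tighter inline policy scoped to the artifact bucket might look like this sketch (adjust the bucket name to yours):

aws iam put-user-policy \
  --user-name datalake-service \
  --policy-name datalake-s3-access \
  --policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject", "s3:ListBucket"],
      "Resource": [
        "arn:aws:s3:::your-test-artifacts-bucket",
        "arn:aws:s3:::your-test-artifacts-bucket/*"
      ]
    }]
  }'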

Using EKS

  1. Create EKS cluster:
eksctl create cluster \
  --name datalake-cluster \
  --region us-east-1 \
  --nodegroup-name standard-workers \
  --node-type t3.medium \
  --nodes 3
  2. Configure kubectl:
aws eks update-kubeconfig --name datalake-cluster --region us-east-1
  3. Deploy application:
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  --set config.storageBackend=s3
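
Verify that the nodes joined and the pods scheduled:

kubectl get nodes
kubectl get pods -n datalake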

Using RDS for PostgreSQL

helm install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  --set postgresql.enabled=false \
  --set config.databaseUrl="postgresql://user:pass@your-rds-endpoint:5432/datalake"
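
Passing the full connection string via --set leaves credentials in shell history and in the stored release values. If the chart can read the database URL from an existing Secret (an assumption; check the chart's values.yaml), create one and reference it instead:

kubectl create secret generic datalake-db \
  --namespace datalake \
  --from-literal=databaseUrl='postgresql://user:pass@your-rds-endpoint:5432/datalake'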

Self-Hosted Deployment

Using MinIO

  1. Deploy MinIO:
helm install minio bitnami/minio \
  --namespace datalake \
  --create-namespace \
  --set auth.rootUser=admin \
  --set auth.rootPassword=adminpassword \
  --set persistence.size=100Gi
  2. Deploy application:
helm install datalake ./helm \
  --namespace datalake \
  --set config.storageBackend=minio \
  --set minio.enabled=false \
  --set minio.endpoint=minio:9000 \
  --set minio.accessKey=admin \
  --set minio.secretKey=adminpassword
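
The chart may not create the bucket for you (worth verifying). After step 1, you can create the test-artifacts bucket referenced later in this guide with the MinIO client:

kubectl port-forward -n datalake svc/minio 9000:9000 &
mc alias set datalake-minio http://localhost:9000 admin adminpassword
mc mb datalake-minio/test-artifacts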

On-Premise Kubernetes

  1. Prepare persistent volumes:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: datalake-postgres-pv
spec:
  storageClassName: local-storage
  capacity:
    storage: 20Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: /data/postgres
  2. Deploy with local storage:
helm install datalake ./helm \
  --namespace datalake \
  --create-namespace \
  --set postgresql.persistence.storageClass=local-storage \
  --set minio.persistence.storageClass=local-storage
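
The local-storage class referenced above has to exist. A minimal no-provisioner StorageClass for statically provisioned volumes like the hostPath PV in step 1 (suitable for single-node or test clusters):

kubectl apply -f - <<'EOF'
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: local-storage
provisioner: kubernetes.io/no-provisioner
volumeBindingMode: WaitForFirstConsumer
EOF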

GitLab CI/CD

Setup

  1. Configure GitLab variables:

Go to Settings → CI/CD → Variables and add:

Variable               Description                     Protected   Masked
CI_REGISTRY_USER       Docker registry username        No          No
CI_REGISTRY_PASSWORD   Docker registry password        No          Yes
KUBE_CONFIG_DEV        Base64 kubeconfig for dev       No          Yes
KUBE_CONFIG_STAGING    Base64 kubeconfig for staging   Yes         Yes
KUBE_CONFIG_PROD       Base64 kubeconfig for prod      Yes         Yes
  2. Encode kubeconfig:
base64 -w 0 ~/.kube/config   # GNU base64; on macOS use: base64 -i ~/.kube/config

Pipeline Stages

  1. Test: Runs on all branches and MRs
  2. Build: Builds Docker image on main/develop/tags
  3. Deploy: Manual deployment to dev/staging/prod

Deployment Flow

Development:

git push origin develop
# Manually trigger deploy:dev job in GitLab

Staging:

git push origin main
# Manually trigger deploy:staging job in GitLab

Production:

git tag v1.0.0
git push origin v1.0.0
# Manually trigger deploy:prod job in GitLab

Customizing Pipeline

Edit .gitlab-ci.yml to:

  • Add more test stages
  • Change deployment namespaces
  • Adjust Helm values per environment
  • Add security scanning
  • Configure rollback procedures

Monitoring

Health Checks

# Kubernetes
kubectl get pods -n datalake
kubectl logs -f -n datalake deployment/datalake

# Direct
curl http://localhost:8000/health

Metrics

Add Prometheus monitoring:

helm install datalake ./helm \
  --set metrics.enabled=true \
  --set serviceMonitor.enabled=true
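
Assuming the chart exposes Prometheus metrics on the API service (the service name and the /metrics path here are assumptions), you can spot-check the endpoint:

kubectl port-forward -n datalake svc/datalake 8000:8000
curl http://localhost:8000/metrics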

Backup and Recovery

Database Backup

# PostgreSQL
kubectl exec -n datalake deployment/datalake-postgresql -- \
  pg_dump -U user datalake > backup.sql

# Restore
kubectl exec -i -n datalake deployment/datalake-postgresql -- \
  psql -U user datalake < backup.sql
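
For routine use, date-stamp the dumps and prune old ones; a minimal sketch:

kubectl exec -n datalake deployment/datalake-postgresql -- \
  pg_dump -U user datalake > "backup-$(date +%F).sql"
find . -maxdepth 1 -name 'backup-*.sql' -mtime +14 -delete   # keep ~two weeks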

Storage Backup

S3:

aws s3 sync s3://your-bucket s3://backup-bucket

MinIO:

mc mirror minio/test-artifacts backup/test-artifacts

Troubleshooting

Pod Not Starting

kubectl describe pod -n datalake <pod-name>
kubectl logs -n datalake <pod-name>

Database Connection Issues

kubectl exec -it -n datalake deployment/datalake -- \
  sh -c 'psql "$DATABASE_URL"'   # quote so the variable expands inside the pod, not in your shell

Storage Issues

# Check MinIO (S3 API on 9000, web console on 9001)
kubectl port-forward -n datalake svc/minio 9001:9001
# Access the console at http://localhost:9001

Security Considerations

  1. Use secrets management:

    • Kubernetes Secrets
    • AWS Secrets Manager
    • HashiCorp Vault
  2. Enable TLS:

    • Configure ingress with TLS certificates
    • Use cert-manager for automatic certificates
  3. Network policies (a default-deny example follows this list):

    • Restrict pod-to-pod communication
    • Limit external access
  4. RBAC:

    • Configure Kubernetes RBAC
    • Limit service account permissions
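
As a starting point for item 3, a default-deny ingress policy for the namespace; anything that should still receive traffic (the ingress controller, for instance) must then be re-allowed explicitly:

kubectl apply -f - <<'EOF'
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: datalake
spec:
  podSelector: {}
  policyTypes:
    - Ingress
EOF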

Performance Tuning

Database

  • Increase connection pool size
  • Add database indexes
  • Configure autovacuum

API

  • Increase replica count
  • Configure horizontal pod autoscaling
  • Adjust resource requests/limits
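
The autoscaling Helm values shown earlier render into a standard HorizontalPodAutoscaler; for a quick experiment outside Helm, the imperative equivalent is:

kubectl autoscale deployment datalake -n datalake \
  --cpu-percent=80 --min=3 --max=10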

Storage

  • Use CDN for frequently accessed files
  • Configure S3 Transfer Acceleration
  • Optimize MinIO deployment