GitOps CI/CD with Flask, Kubernetes, and Webhook Orchestration

“Manual deployments are technical debt with compound interest. Every time you run
kubectl apply by hand, you’re borrowing against future reliability.”
This is a full breakdown of a push-to-deploy GitOps pipeline on Kubernetes: a Flask webhook orchestration server, an isolated test namespace with resource quotas, RBAC scoped to minimum permissions, network policy isolation between test and production, and blue-green deployment with automated rollback. Built because the manual process was unsustainable, documented because the failure modes are worth knowing.
The Problem with Manual Deployments
# The old process
docker build -t my-app:v1.2.3 .
docker push my-registry/my-app:v1.2.3
kubectl set image deployment/my-app my-app=my-registry/my-app:v1.2.3
kubectl rollout status deployment/my-app
# Realize config map wasn't updated
kubectl apply -f configmap.yaml
kubectl rollout restart deployment/my-app
# Watch pods crashloop
kubectl get pods --watch
The failure modes compound. You forget a config map. You push to the wrong environment. You apply a manifest that was edited locally and never committed. Manual processes don’t just create toil; they create inconsistency, and inconsistency is where incidents come from.
GitOps fixes this at the source: Git is the single source of truth. If it’s not committed, it doesn’t exist in the cluster. Every deployment is auditable, every rollback is a revert.
Architecture
Git Push
   ↓
GitLab Webhook (HTTPS + signature verification)
   ↓
Flask Orchestration Server
   ├─ Signature validation
   ├─ Payload parsing
   └─ Pipeline triggering
   ↓
Kubernetes: test namespace
   ├─ Clone repo
   ├─ Run tests
   └─ Build image
   ↓
Kubernetes: production namespace
   ├─ Blue-green rollout
   ├─ Health checks
   └─ Automatic rollback on failure
Flask Orchestration Server
Flask handles incoming webhook events, validates them, and triggers Kubernetes jobs. Lightweight, containerizable, easy to deploy inside the cluster.
from flask import Flask, request, jsonify
import subprocess
import logging
import hmac
import os
import shlex

app = Flask(__name__)
app.logger.setLevel(logging.INFO)

# Shared secret configured on the GitLab webhook; if unset, every request is rejected
WEBHOOK_SECRET = os.getenv('WEBHOOK_SECRET', '')


def verify_webhook_signature(token: str) -> bool:
    """Verify the webhook came from GitLab.

    GitLab sends the configured secret token verbatim in X-Gitlab-Token
    (it is not an HMAC of the body), so compare it in constant time.
    """
    if not WEBHOOK_SECRET:
        return False
    return hmac.compare_digest(token.encode(), WEBHOOK_SECRET.encode())


def trigger_test_pipeline(repo_url: str, branch: str, commit_sha: str) -> bool:
    app.logger.info(f"Pipeline: {repo_url}@{branch} ({commit_sha[:8]})")
    # `kubectl create job` has no --env flag, so the values are shell-quoted
    # into the command. python:3.9-slim ships without git, hence the install.
    script = (
        'apt-get update -qq && apt-get install -y -qq git'
        f' && git clone -b {shlex.quote(branch)} {shlex.quote(repo_url)} /app'
        # Pin the checkout to the pushed commit, not whatever the branch points at now
        f' && cd /app && git checkout {shlex.quote(commit_sha)}'
        ' && pip install -r requirements.txt && python -m pytest tests/ -v'
    )
    try:
        result = subprocess.run([
            'kubectl', 'create', 'job', f'test-{commit_sha[:8]}',
            '--image=python:3.9-slim',
            '--namespace=test',
            '--', 'sh', '-c', script
        ], capture_output=True, text=True, timeout=30)
    except subprocess.TimeoutExpired:
        app.logger.error("Pipeline trigger timed out")
        return False
    if result.returncode != 0:
        app.logger.error(f"Pipeline trigger failed: {result.stderr}")
        return False
    return True


@app.route('/webhook', methods=['POST'])
def handle_webhook():
    token = request.headers.get('X-Gitlab-Token', '')
    if not verify_webhook_signature(token):
        app.logger.warning("Invalid webhook signature: rejected")
        return "Invalid signature", 401
    # silent=True returns None instead of raising on a non-JSON body
    payload = request.get_json(silent=True)
    if not payload or 'repository' not in payload:
        return "Invalid payload", 400
    try:
        # 'refs/heads/feature/x' -> 'feature/x' (keeps slashes in branch names)
        branch = payload['ref'].removeprefix('refs/heads/')
        commit_sha = payload['after']
        repo_url = payload['repository']['git_http_url']
    except KeyError:
        return "Invalid payload", 400
    success = trigger_test_pipeline(repo_url, branch, commit_sha)
    if not success:
        return jsonify({"status": "failed"}), 500
    return jsonify({"status": "started"}), 202
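The handler derives the branch from the push event's ref field, which GitLab sends as a full ref such as refs/heads/feature/login. Splitting on the last slash would turn that into login, so a stricter parser is worth having. A minimal sketch, assuming only branch pushes should trigger the pipeline; branch_from_ref is a hypothetical helper, not part of the original server:

```python
from typing import Optional

BRANCH_PREFIX = "refs/heads/"


def branch_from_ref(ref: str) -> Optional[str]:
    """Return the branch name for a branch ref, or None for tags and other refs."""
    if not ref.startswith(BRANCH_PREFIX):
        return None
    branch = ref[len(BRANCH_PREFIX):]
    # An empty remainder ("refs/heads/") is not a usable branch name
    return branch or None


if __name__ == "__main__":
    for ref in ("refs/heads/main", "refs/heads/feature/login", "refs/tags/v1.2.3"):
        print(ref, "->", branch_from_ref(ref))
```

Returning None for tag pushes lets the handler skip them explicitly instead of starting a job against a branch that does not exist.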
On ssl_context='adhoc': The Flask dev server supports ssl_context='adhoc' for quick local TLS testing, but it’s not for production. In production, run Flask behind Nginx or a Kubernetes Ingress controller with a proper cert. The orchestration server itself runs on HTTP internally; TLS termination happens at the ingress layer.
Webhook signature verification is not optional. Without it, any HTTP client that discovers your endpoint can trigger a deployment. I skipped this initially and a web crawler hit the endpoint and triggered a pipeline against stale code. Verify signatures before touching the payload.
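As a hedged illustration of that check: GitLab sends the webhook's configured secret token in the X-Gitlab-Token header, so verification comes down to a constant-time comparison against the expected value. The function name token_is_valid and the sample secret below are assumptions made for this sketch:

```python
import hmac


def token_is_valid(received: str, expected: str) -> bool:
    """Constant-time comparison; an unset secret rejects every request."""
    if not expected:
        return False
    # Compare as bytes so non-ASCII input cannot raise inside compare_digest
    return hmac.compare_digest(received.encode("utf-8"), expected.encode("utf-8"))


if __name__ == "__main__":
    secret = "example-secret"  # placeholder; load the real value from the environment
    print(token_is_valid("example-secret", secret))
    print(token_is_valid("guess", secret))
```

Rejecting when the expected secret is empty matters: otherwise a missing environment variable would quietly accept requests that carry an empty header.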
Kubernetes Test Environment
Namespace and Resource Quotas
kubectl create namespace test

apiVersion: v1
kind: ResourceQuota
metadata:
  name: test-quota
  namespace: test
spec:
  hard:
    requests.cpu: "2"
    requests.memory: 4Gi
    limits.cpu: "4"
    limits.memory: 8Gi
    pods: "10"
Resource limits on the namespace, not just the pod. A test with an infinite loop and no limits will consume all available cluster memory. Ask me how I know.
Test Runner Job
apiVersion: batch/v1
kind: Job
metadata:
  name: test-runner
  namespace: test
spec:
  backoffLimit: 2
  ttlSecondsAfterFinished: 3600
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: test-runner
          image: python:3.9-slim
          env:
            - name: REPO_URL
              value: "https://gitlab.com/your-repo.git"
            - name: BRANCH
              value: "main"
            - name: COMMIT_SHA
              value: "abc123"
          command: ["sh", "-c"]
          args:
            - |
              set -e
              # python:3.9-slim does not include git
              apt-get update -qq && apt-get install -y -qq git
              git clone "$REPO_URL" -b "$BRANCH" /test-code
              cd /test-code
              pip install -r requirements.txt
              python -m pytest tests/ -v --junitxml=/test-results/test-results.xml
          # Mount the results volume so the JUnit report lands on it
          volumeMounts:
            - name: test-results
              mountPath: /test-results
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
      volumes:
        - name: test-results
          emptyDir: {}
ttlSecondsAfterFinished: 3600 means finished jobs clean themselves up after an hour. Without this, failed jobs accumulate and eventually exhaust the namespace quota, blocking new jobs from starting.
set -e in the shell script exits on the first command that returns non-zero. Without it, a failed pip install continues to the test run and you get misleading failures.
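The quota and the per-pod resources together set a ceiling on how many test runners can exist at once. A rough sketch of that arithmetic, using the values from the two manifests above; the quantity parsers are simplified assumptions that handle only the suffixes appearing here (m, Ki, Mi, Gi), not the full Kubernetes quantity grammar:

```python
from typing import Dict

MEMORY_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_millicores(quantity: str) -> int:
    """'250m' -> 250, '2' -> 2000."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return int(float(quantity) * 1000)


def memory_bytes(quantity: str) -> int:
    """'256Mi' -> 268435456; a bare number is treated as bytes."""
    for suffix, factor in MEMORY_UNITS.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)


def max_concurrent_pods(quota: Dict[str, str], pod: Dict[str, str]) -> int:
    """Smallest ceiling across the pod count and every compute resource."""
    ceilings = [int(quota["pods"])]
    for key, amount in pod.items():
        parse = cpu_millicores if key.endswith("cpu") else memory_bytes
        ceilings.append(parse(quota[key]) // parse(amount))
    return min(ceilings)


QUOTA = {"requests.cpu": "2", "requests.memory": "4Gi",
         "limits.cpu": "4", "limits.memory": "8Gi", "pods": "10"}
TEST_RUNNER = {"requests.cpu": "250m", "requests.memory": "256Mi",
               "limits.cpu": "500m", "limits.memory": "512Mi"}

if __name__ == "__main__":
    print(max_concurrent_pods(QUOTA, TEST_RUNNER))  # 8
```

With these numbers CPU is the binding constraint: 2 cores of requests divided by 250m per pod allows 8 runners, below the pods: "10" cap.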
RBAC: Minimum Permissions
The orchestration server needs specific permissions in specific namespaces. Not cluster-admin. Not wildcard resources.
apiVersion: v1
kind: ServiceAccount
metadata:
  name: orchestration-sa
  namespace: orchestration
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: test
  name: test-job-manager
rules:
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["create", "delete", "list"]
  - apiGroups: [""]
    resources: ["pods", "pods/log"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  namespace: production
  name: production-deployer
rules:
  - apiGroups: ["apps"]
    resources: ["deployments"]
    verbs: ["get", "patch", "list"]
  - apiGroups: [""]
    resources: ["services", "configmaps"]
    verbs: ["get", "list"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: orchestration-test-binding
  namespace: test
subjects:
  - kind: ServiceAccount
    name: orchestration-sa
    namespace: orchestration
roleRef:
  kind: Role
  name: test-job-manager
  apiGroup: rbac.authorization.k8s.io
---
# Assumed companion binding (name is illustrative): the production Role
# grants nothing until it is bound to the service account
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: orchestration-production-binding
  namespace: production
subjects:
  - kind: ServiceAccount
    name: orchestration-sa
    namespace: orchestration
roleRef:
  kind: Role
  name: production-deployer
  apiGroup: rbac.authorization.k8s.io
The orchestration service account can create, delete, and list jobs in test, and get, list, and patch deployments in production. It cannot access secrets, delete namespaces, or touch any other namespace. If the orchestration server is compromised, the blast radius is contained to those rules in those two namespaces.
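To reason about what these Roles do and do not grant, the rules can be modelled as plain data and queried. This is only a simplified sketch: it assumes both Roles are bound to the service account, ignores wildcards and resourceNames, and the names Rule, ROLES, and is_allowed are hypothetical:

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Rule:
    api_groups: Tuple[str, ...]
    resources: Tuple[str, ...]
    verbs: Tuple[str, ...]


# Transcribed from the two Role manifests, keyed by namespace
ROLES: Dict[str, List[Rule]] = {
    "test": [
        Rule(("batch",), ("jobs",), ("create", "delete", "list")),
        Rule(("",), ("pods", "pods/log"), ("get", "list")),
    ],
    "production": [
        Rule(("apps",), ("deployments",), ("get", "patch", "list")),
        Rule(("",), ("services", "configmaps"), ("get", "list")),
    ],
}


def is_allowed(namespace: str, api_group: str, resource: str, verb: str) -> bool:
    """True if any rule in that namespace covers the group, resource, and verb."""
    return any(
        api_group in rule.api_groups
        and resource in rule.resources
        and verb in rule.verbs
        for rule in ROLES.get(namespace, [])
    )


if __name__ == "__main__":
    print(is_allowed("test", "batch", "jobs", "create"))
    print(is_allowed("production", "", "secrets", "get"))
    print(is_allowed("production", "", "services", "patch"))
```

The last query returns False: as written, the production Role lets the server read Services but not patch them, so a pipeline step that switches a Service selector would need that verb added. Against a live cluster, kubectl auth can-i with --as=system:serviceaccount:orchestration:orchestration-sa answers the same questions authoritatively.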
Network Policy: Namespace Isolation
Test jobs should not be able to reach production databases, internal services, or anything outside what they need.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: test-namespace-isolation
  namespace: test
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              name: orchestration
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              name: production
      ports:
        - protocol: TCP
          port: 443
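Ingress to the test namespace: only from the orchestration namespace. Egress from test: only to production on TCP 443. A compromised test job cannot reach your production database or any internal API that isn't served on 443 in production. The selectors match on a name label, which has to be added to each namespace by hand; the only label Kubernetes sets automatically is kubernetes.io/metadata.name.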
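A small model of the policy makes its reach easier to check. The sketch below mirrors the single ingress and egress rule as predicates over the peer namespace's labels; the function names are hypothetical, and it assumes the CNI plugin enforces NetworkPolicy at all:

```python
from typing import Dict


def ingress_allowed(source_ns_labels: Dict[str, str]) -> bool:
    """Only pods in a namespace labelled name=orchestration may connect in."""
    return source_ns_labels.get("name") == "orchestration"


def egress_allowed(dest_ns_labels: Dict[str, str], protocol: str, port: int) -> bool:
    """Only TCP 443 to a namespace labelled name=production may leave."""
    return (
        dest_ns_labels.get("name") == "production"
        and protocol == "TCP"
        and port == 443
    )


if __name__ == "__main__":
    production = {"name": "production"}
    kube_system = {"kubernetes.io/metadata.name": "kube-system"}
    print(egress_allowed(production, "TCP", 443))    # HTTPS service in production
    print(egress_allowed(production, "TCP", 5432))   # PostgreSQL in production
    print(egress_allowed(kube_system, "UDP", 53))    # cluster DNS
    print(egress_allowed({}, "TCP", 443))            # destination outside the cluster
```

The last two cases are worth noticing: read literally, the policy also denies DNS lookups and traffic to external hosts, so a job that clones from a hosted Git server or installs from PyPI would probably need additional egress rules for those destinations.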
Blue-Green Production Deployment
# Continues the orchestration server module above (uses subprocess and app)
def deploy_to_production(image_tag: str):
    # Apply green deployment
    subprocess.run([
        'kubectl', 'apply', '-f',
        f'deployment-green-{image_tag}.yaml'
    ], check=True)
    # Wait for green to be healthy; check=True raises if the rollout
    # times out, so the traffic switch below is never reached
    subprocess.run([
        'kubectl', 'rollout', 'status',
        'deployment/green-deployment',
        '--timeout=300s'
    ], check=True)
    # Cut traffic to green
    subprocess.run([
        'kubectl', 'patch', 'service', 'my-app',
        '-p', '{"spec":{"selector":{"version":"green"}}}'
    ], check=True)
    # Blue stays running for immediate rollback
    app.logger.info(f"Deployed {image_tag}: blue retained for rollback")
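Because blue keeps running, rolling back is the same selector patch pointed the other way. A minimal sketch of that step, assuming the blue pods carry a version: blue label (the source only shows the green selector) and that the Service is named my-app as above; the helper names are hypothetical:

```python
import json
import subprocess
from typing import Callable, List


def selector_patch(version: str) -> str:
    """Build the JSON patch body that points the Service at one colour."""
    return json.dumps({"spec": {"selector": {"version": version}}})


def switch_traffic_command(version: str, service: str = "my-app") -> List[str]:
    """Argument list for kubectl; no shell is involved, so no quoting issues."""
    return ["kubectl", "patch", "service", service, "-p", selector_patch(version)]


def rollback_to_blue(run: Callable[..., object] = subprocess.run) -> None:
    """Point the Service back at blue; raises if kubectl exits non-zero."""
    run(switch_traffic_command("blue"), check=True)


if __name__ == "__main__":
    # Print the command instead of running it, so this works without a cluster
    print(" ".join(switch_traffic_command("blue")))
```

Passing the runner in as a parameter keeps the function testable without a cluster, and building the patch with json.dumps avoids hand-written JSON strings drifting out of sync.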
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  minReadySeconds: 10
  revisionHistoryLimit: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  # apps/v1 requires a selector that matches the pod labels;
  # the app: my-app label is illustrative
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:v1.2.3
          livenessProbe:
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 10
            failureThreshold: 3
          readinessProbe:
            httpGet:
              path: /ready
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
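maxUnavailable: 0 means no pods go down before new ones are up. Zero-downtime rollout.
revisionHistoryLimit: 3 keeps the last three ReplicaSets for fast rollback.
minReadySeconds: 10 means a pod must be ready for 10 seconds before Kubernetes counts it as available. This prevents a pod that crashes shortly after startup from being counted as healthy.
Kubernetes does not undo a Deployment on its own when health checks fail: the rollout stalls with the old pods still serving, and kubectl rollout status exits non-zero once its timeout expires. Because deploy_to_production runs that command with check=True, a failed rollout raises before the service selector is patched, so traffic never leaves blue. The blue deployment stays running until green is confirmed stable, which also allows an instant traffic switch back if needed.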
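For a plain rolling update (no blue-green pair), one way a pipeline might turn a failed rollout into a rollback is to wait on kubectl rollout status and run kubectl rollout undo if it exits non-zero. The sketch below assumes that approach; rollout_or_undo and its parameters are hypothetical, and the runner is injectable so the logic can be exercised without a cluster:

```python
import subprocess
from typing import Callable


def rollout_or_undo(
    deployment: str,
    namespace: str,
    timeout_s: int = 300,
    run: Callable[..., object] = subprocess.run,
) -> bool:
    """Return True if the rollout completed, False if it was rolled back."""
    target = f"deployment/{deployment}"
    try:
        run(
            ["kubectl", "rollout", "status", target,
             "-n", namespace, f"--timeout={timeout_s}s"],
            check=True,
        )
        return True
    except subprocess.CalledProcessError:
        # Rollout did not become healthy in time: return to the previous revision
        run(["kubectl", "rollout", "undo", target, "-n", namespace], check=True)
        return False


if __name__ == "__main__":
    def fake_failed_rollout(cmd, check=False):
        # Stand-in for kubectl: the status check fails, the undo succeeds
        if "status" in cmd:
            raise subprocess.CalledProcessError(1, cmd)

    print(rollout_or_undo("my-app", "production", run=fake_failed_rollout))
```

The undo relies on an earlier ReplicaSet still existing, which is what revisionHistoryLimit: 3 preserves.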
Troubleshooting Reference
Webhook timeout (GitLab reports failure):
# Flask dev server can't handle concurrent long-running requests
# Use a production WSGI server
from gevent.pywsgi import WSGIServer
http_server = WSGIServer(('0.0.0.0', 5000), app)
http_server.serve_forever()
Jobs stuck in Pending (resource quota exhausted):
kubectl describe resourcequota test-quota -n test
# Check which resources are at limit
kubectl delete jobs --field-selector status.successful=1 -n test
ttlSecondsAfterFinished prevents this accumulation if set correctly from the start.
ImagePullBackOff on test jobs:
# Attach pull secret to the test runner service account
apiVersion: v1
kind: ServiceAccount
metadata:
  name: test-runner-sa
  namespace: test
imagePullSecrets:
  - name: registry-credentials
Git authentication in Kubernetes jobs:
volumes:
  - name: git-credentials
    secret:
      secretName: git-credentials
      defaultMode: 0400
Mount credentials as a read-only secret volume. Never pass credentials as literal environment variable values: they show up in kubectl describe pod output.
Results
| Metric | Manual | Automated |
|---|---|---|
| Deployment time | 15–30 min | 2–5 min |
| Deployment frequency | Weekly | Multiple daily |
| Rollback time | 10–15 min | 30 seconds |
| Failed production deployments | Regular | 2 in 150+ runs, both auto-rolled back |
150+ pipeline runs over the project lifetime. The two production failures were both caught by health checks and rolled back automatically before they affected users.
What I’d Do Differently
External secrets manager from the start. Git credentials and webhook secrets mounted as Kubernetes secrets work, but Secrets Manager or Vault with rotation is the right answer for anything beyond a personal project.
Pipeline observability from day one. I added metrics and alerting after the fact. Build time, success rate, and failure reasons should be instrumented before the pipeline handles real workloads.
Staging environment between test and production. The current setup goes test → production. A staging namespace that mirrors production configuration would catch environment-specific failures before they reach prod.
Source
Full code and manifests on GitHub.
Tags
#Infrastructure #Kubernetes #GitOps #CI/CD #PlatformEngineering #Python
About the Author
Elijah Udom (elijahu) is an Infrastructure & Cloud Engineer based in Lagos, Nigeria. AWS, Kubernetes, eBPF security, AI/ML infrastructure. Building in the open.
