Environment Management

Orchestr8 provides multi-environment deployment strategies that enable teams to promote applications safely from development to production while maintaining consistency, security, and compliance across every stage.

Environment Philosophy

Environment as Code

Every environment is defined declaratively in Git:

  • Infrastructure configuration in Terraform/Pulumi
  • Platform configuration in Helm values
  • Application configuration in environment-specific overlays
  • Security policies enforced consistently across environments
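
Taken together, these layers mean an environment can be rebuilt entirely from the repository. A minimal sketch of how one environment might map to its Git-managed sources (all paths and keys are illustrative, not a required Orchestr8 layout):

```yaml
# Illustrative mapping of one environment to its Git-managed sources
# (paths hypothetical)
environment: staging
sources:
  infrastructure: terraform/environments/staging/   # Terraform root module
  platform: helm/values/staging.yaml                # Helm values overlay
  application: kustomize/overlays/staging/          # app-level overlay
  policies: policies/baseline/                      # enforced in every environment
```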

Progressive Delivery

Applications flow through environments with increasing production-likeness:

Development → Integration → Staging → Production

  • Development: fast feedback, unit tests, developer access
  • Integration: integration testing, API tests, E2E tests
  • Staging: production simulation, load tests, security testing
  • Production: live traffic, monitoring, SLAs, compliance

Environment Types

Development Environment

Purpose: Rapid iteration and developer productivity

Characteristics:

  • Fast deployment: Changes deploy within seconds
  • Relaxed security: Developer-friendly debugging access
  • Resource efficient: Minimal resource allocation
  • Data isolation: Synthetic or anonymized test data

```yaml
# Dev environment configuration
environments:
  dev:
    cluster: dev-cluster
    namespace: my-service-dev

    # Resource constraints for cost efficiency
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        cpu: 200m
        memory: 256Mi

    # Relaxed security for debugging
    security:
      podSecurityStandard: baseline
      debugging: enabled

    # Fast iteration settings
    deployment:
      strategy: RollingUpdate
      maxUnavailable: "50%"
      maxSurge: "100%"

    # Developer access
    access:
      developers: read-write
      qa: read-only
```

Integration Environment

Purpose: Automated testing and validation

Characteristics:

  • Automated testing: CI/CD pipeline integration
  • Service integration: Multiple services working together
  • Data consistency: Stable test datasets
  • Quality gates: Automated quality checks

```yaml
environments:
  integration:
    cluster: shared-cluster
    namespace: my-service-int

    # Test-optimized resources
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi

    # Test data management
    data:
      source: test-dataset-v1
      refresh: daily
      anonymization: enabled

    # Integration testing
    testing:
      smoke_tests: enabled
      integration_tests: enabled
      performance_tests: basic
      security_scans: enabled

    # Automated promotion
    promotion:
      on_success: staging
      on_failure: block
```

Staging Environment

Purpose: Production simulation and final validation

Characteristics:

  • Production parity: Mirrors production configuration
  • Performance testing: Load and stress testing
  • Security validation: Full security posture
  • User acceptance: Business stakeholder testing

```yaml
environments:
  staging:
    cluster: staging-cluster
    namespace: my-service-staging

    # Production-like resources
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 1Gi

    # Production security
    security:
      podSecurityStandard: restricted
      networkPolicies: strict
      secretsManagement: external

    # Production-like testing
    testing:
      load_tests: enabled
      chaos_engineering: enabled
      penetration_tests: weekly

    # Manual approval gates
    promotion:
      approval_required: true
      approvers: [platform-team, security-team]
```

Production Environment

Purpose: Live traffic and business operations

Characteristics:

  • High availability: Multi-zone deployment
  • Monitoring: Comprehensive observability
  • Security: Maximum security posture
  • Compliance: Full audit trail and controls

```yaml
environments:
  production:
    cluster: prod-cluster-primary
    failover_cluster: prod-cluster-secondary
    namespace: my-service

    # Production resources
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 2000m
        memory: 2Gi

    # High availability and change management
    # (merged into one block: YAML forbids duplicate keys)
    deployment:
      replicas: 3
      strategy: RollingUpdate
      maxUnavailable: 1
      maxSurge: 1
      change_window: business-hours
      rollback_enabled: true
      canary_deployment: true

    # Production security
    security:
      podSecurityStandard: restricted
      networkPolicies: strict
      secretsManagement: vault
      compliance: [soc2, pci-dss]

    # Monitoring and alerting
    monitoring:
      sla_objectives:
        availability: "99.9%"
        latency_p95: 500ms
        error_rate: "<0.1%"
```

Environment Configuration

Configuration Hierarchy

Orchestr8 uses a layered configuration approach:

base configuration (defaults)
├── environment overrides
├── cluster-specific settings
├── region-specific values
└── runtime secrets
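
As a concrete (hypothetical) illustration of the layering, a later layer overrides only the keys it names; everything else falls through from the base:

```yaml
# Illustrative three-layer merge (file names and keys hypothetical)
---
# base/values.yaml (defaults)
replicas: 2
log_level: info
db_host: db.internal
---
# environments/staging/values.yaml (environment override)
log_level: debug
---
# clusters/staging-east/values.yaml (cluster-specific)
db_host: staging-db.us-east.internal
# effective config: replicas=2, log_level=debug,
#                   db_host=staging-db.us-east.internal
```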

Kustomize Integration

```yaml
# kustomization.yaml for staging
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization

resources:
  - ../../base

patches:
  - target:
      kind: Deployment
      name: my-service
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 2
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 1Gi

configMapGenerator:
  - name: app-config
    literals:
      - LOG_LEVEL=info
      - DB_POOL_SIZE=10
      - FEATURE_FLAG_NEW_UI=true
```

Helm Values Override

```yaml
# environments/staging/values.yaml
global:
  environment: staging
  cluster: staging-cluster

# Application configuration
app:
  replicaCount: 2
  image:
    tag: "v1.2.3"

  resources:
    limits:
      memory: 1Gi
      cpu: 1000m
    requests:
      memory: 512Mi
      cpu: 200m

# Environment-specific features
features:
  debug_mode: false
  performance_monitoring: true
  synthetic_data: true

# Database configuration
database:
  host: staging-db.internal
  ssl_mode: require
  connection_pool: 20
```

Promotion Strategies

Automated Promotion Pipeline

GitOps Promotion

```shell
# Automated promotion via ArgoCD
o8 environment promote my-service \
  --from dev \
  --to staging \
  --auto-approve

# Manual promotion with approval
o8 environment promote my-service \
  --from staging \
  --to production \
  --require-approval \
  --approvers platform-team,security-team
```

Blue-Green Deployment

```yaml
# Blue-green deployment configuration
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-service
spec:
  strategy:
    blueGreen:
      autoPromotionEnabled: false
      scaleDownDelaySeconds: 30
      prePromotionAnalysis:
        templates:
          - templateName: success-rate
        args:
          - name: service-name
            value: my-service
      activeService: my-service-active
      previewService: my-service-preview
```
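
The `success-rate` template referenced in the pre-promotion analysis lives in a separate AnalysisTemplate resource. A minimal sketch, assuming a Prometheus provider; the query, address, and threshold are illustrative:

```yaml
# Hedged sketch of the referenced AnalysisTemplate (values illustrative)
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      successCondition: result[0] >= 0.95   # promote only if >= 95% non-5xx
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))
```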

Environment Security

Network Isolation

Each environment operates in isolated network segments:

```yaml
# Environment-specific network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: environment-isolation
  namespace: my-service-prod
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only allow traffic from production namespaces
    - from:
        - namespaceSelector:
            matchLabels:
              environment: production
  egress:
    # Only allow traffic to production services
    - to:
        - namespaceSelector:
            matchLabels:
              environment: production
```

Secret Management by Environment

```yaml
# Environment-specific secret store
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-staging
  namespace: my-service-staging
spec:
  provider:
    vault:
      server: "https://vault.staging.company.com"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes-staging"
          role: "staging-secret-reader"
```

RBAC by Environment

```yaml
# Staging environment access
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: staging-developers
  namespace: my-service-staging
subjects:
  - kind: Group
    name: staging-developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: developer
  apiGroup: rbac.authorization.k8s.io
```
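
The `developer` ClusterRole this binding references is not defined on this page. A plausible read-mostly definition might look like the following; the rule set is illustrative, not Orchestr8's shipped role:

```yaml
# Hypothetical "developer" ClusterRole (rules illustrative)
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: developer
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "pods/log", "deployments", "configmaps", "services"]
    verbs: ["get", "list", "watch"]
  - apiGroups: [""]
    resources: ["pods/exec"]   # debugging access, scoped by the RoleBinding's namespace
    verbs: ["create"]
```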

Data Management

Test Data Strategy

```yaml
# Test data configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-data-config
data:
  data_strategy.yaml: |
    environments:
      dev:
        data_source: synthetic
        refresh_frequency: daily
        anonymization: basic

      staging:
        data_source: production_snapshot
        refresh_frequency: weekly
        anonymization: full
        retention: 30_days

      production:
        data_source: live
        backup_frequency: hourly
        retention: 7_years
```

Database Per Environment

```yaml
# Environment-specific database
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-service-db
  namespace: my-service-staging
spec:
  instances: 2

  # Staging-appropriate resources
  resources:
    requests:
      memory: "1Gi"
      cpu: 500m
    limits:
      memory: "2Gi"
      cpu: 1

  # Staging backup policy
  backup:
    retentionPolicy: "7d"
    barmanObjectStore:
      destinationPath: "s3://staging-backups/my-service"
```

Monitoring and Observability

Environment-Specific Dashboards

```yaml
# Grafana dashboard for staging
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: staging-overview
spec:
  datasources:
    - inputName: "DS_PROMETHEUS"
      datasourceName: "staging-prometheus"

  json: |
    {
      "dashboard": {
        "title": "Staging Environment - My Service",
        "panels": [
          {
            "title": "Request Rate",
            "targets": [
              {
                "expr": "rate(http_requests_total{environment=\"staging\"}[5m])"
              }
            ]
          }
        ]
      }
    }
```

Environment-Specific Alerts

```yaml
# Staging alerts (less sensitive than production)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: staging-alerts
spec:
  groups:
    - name: staging.rules
      rules:
        - alert: StagingHighLatency
          expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{environment="staging"}[5m])) > 1
          for: 10m
          labels:
            severity: warning
            environment: staging
          annotations:
            summary: "High latency in staging environment"
```
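
For contrast, the equivalent production rule would typically use a tighter threshold, a shorter `for` window, and a paging severity, in line with the latency SLA objectives shown earlier. A sketch with illustrative values:

```yaml
# Illustrative production counterpart (thresholds assumed, not prescribed)
- alert: ProductionHighLatency
  expr: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket{environment="production"}[5m])) > 0.5
  for: 5m
  labels:
    severity: critical
    environment: production
  annotations:
    summary: "High latency in production environment"
```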

Cost Optimization

Right-Sizing by Environment

```yaml
environments:
  dev:
    # Minimal resources for development
    node_pool:
      instance_type: "t3.small"
      min_nodes: 1
      max_nodes: 3

  staging:
    # Production-like but smaller
    node_pool:
      instance_type: "t3.large"
      min_nodes: 2
      max_nodes: 5

  production:
    # Full production resources
    node_pool:
      instance_type: "c5.2xlarge"
      min_nodes: 3
      max_nodes: 10
```

Auto-Scaling by Usage

```yaml
# Development auto-shutdown
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dev-environment-shutdown
spec:
  schedule: "0 20 * * 1-5"  # 8 PM on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure  # required for Job pods
          containers:
            - name: shutdown
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - scale
                - deployment
                - --all
                - --replicas=0
                - -n
                - my-service-dev
```
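
Scheduled shutdown covers idle hours; for load-driven scaling during working hours, a standard Kubernetes HorizontalPodAutoscaler can be layered on top. A sketch with illustrative targets:

```yaml
# Standard autoscaling/v2 HPA (replica bounds and CPU target illustrative)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service
  namespace: my-service-staging
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```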

Best Practices

Environment Parity

  1. Infrastructure as Code: Use identical infrastructure definitions
  2. Configuration Management: Minimize environment-specific differences
  3. Dependency Versions: Pin versions consistently across environments
  4. Security Policies: Apply consistent security baselines
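
Point 3 is often easiest to enforce by keeping version pins in one place that every environment references. A sketch using Helm chart dependencies; chart names and versions are illustrative:

```yaml
# Shared version pins referenced by all environments (values illustrative)
dependencies:
  - name: postgresql
    version: 12.5.8        # same chart version in dev, staging, production
    repository: https://charts.bitnami.com/bitnami
app:
  image:
    tag: "v1.2.3"          # promote a pinned tag or digest, never "latest"
```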

Change Management

  1. Progressive Rollout: Deploy to environments in order
  2. Quality Gates: Automated checks between environments
  3. Rollback Plans: Quick rollback procedures for each environment
  4. Change Windows: Scheduled maintenance windows for production
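
A quality gate between two environments could be expressed declaratively alongside the promotion settings shown above. The sketch below is hypothetical, not Orchestr8's actual schema:

```yaml
# Hypothetical quality-gate configuration between environments
promotion:
  from: integration
  to: staging
  gates:
    - type: tests
      require: [smoke_tests, integration_tests]
    - type: security_scan
      max_severity: medium
  rollback:
    automatic: true
    on: [failed_health_check, error_rate_spike]
```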

Monitoring and Alerting

  1. Environment Labels: Tag all metrics with environment labels
  2. Alert Sensitivity: Different thresholds for different environments
  3. Dashboards: Environment-specific monitoring dashboards
  4. Log Aggregation: Centralized logging with environment filtering
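
The first point can be enforced at the monitoring layer rather than per-application: Prometheus `external_labels`, for example, are attached to every alert and remote-written series a server emits:

```yaml
# Prometheus global config: label everything this server forwards
global:
  external_labels:
    environment: staging
    cluster: staging-cluster
```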

Troubleshooting

Common Environment Issues

Configuration Drift

```shell
# Compare configurations between environments
o8 environment diff staging production

# Sync configuration from Git
o8 environment sync staging --force
```

Resource Constraints

```shell
# Check resource usage by environment
o8 environment resources --environment staging

# Scale environment resources
o8 environment scale staging --nodes 3
```

Network Connectivity

```shell
# Test connectivity between environments
o8 network test --from staging --to production

# Debug network policies
o8 network policies --environment staging --debug
```

Next Steps