Environment Management
Orchestr8 provides multi-environment deployment strategies that let teams promote applications safely from development to production while maintaining consistency, security, and compliance at every stage.
Environment Philosophy
Environment as Code
Every environment is defined declaratively in Git:
- Infrastructure configuration in Terraform/Pulumi
- Platform configuration in Helm values
- Application configuration in environment-specific overlays
- Security policies enforced consistently across environments
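As an illustration, a single service's environment definitions might be laid out in Git like this (the directory names are hypothetical, not an Orchestr8 requirement):
platform-config/
├── terraform/                  # cluster and cloud infrastructure
├── helm/
│   └── my-service/             # chart and default values
└── environments/
    ├── dev/values.yaml
    ├── staging/values.yaml
    └── production/values.yaml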
Progressive Delivery
Applications flow through environments that become progressively more production-like:
Development → Integration → Staging → Production
- Development: fast feedback, unit tests, direct developer access
- Integration: integration testing, API tests, end-to-end tests
- Staging: production simulation, load tests, security testing
- Production: live traffic, monitoring, SLAs, compliance
Environment Types
Development Environment
Purpose: Rapid iteration and developer productivity
Characteristics:
- Fast deployment: Changes deploy within seconds
- Relaxed security: Developer-friendly debugging access
- Resource efficient: Minimal resource allocation
- Data isolation: Synthetic or anonymized test data
# dev environment configuration
environments:
  dev:
    cluster: dev-cluster
    namespace: my-service-dev
    # Resource constraints for cost efficiency
    resources:
      requests:
        cpu: 50m
        memory: 128Mi
      limits:
        cpu: 200m
        memory: 256Mi
    # Relaxed security for debugging
    security:
      podSecurityStandard: baseline
      debugging: enabled
    # Fast iteration settings
    deployment:
      strategy: RollingUpdate
      maxUnavailable: 50%
      maxSurge: 100%
    # Developer access
    access:
      developers: read-write
      qa: read-only
Integration Environment
Purpose: Automated testing and validation
Characteristics:
- Automated testing: CI/CD pipeline integration
- Service integration: Multiple services working together
- Data consistency: Stable test datasets
- Quality gates: Automated quality checks
environments:
  integration:
    cluster: shared-cluster
    namespace: my-service-int
    # Test-optimized resources
    resources:
      requests:
        cpu: 100m
        memory: 256Mi
      limits:
        cpu: 500m
        memory: 512Mi
    # Test data management
    data:
      source: test-dataset-v1
      refresh: daily
      anonymization: enabled
    # Integration testing
    testing:
      smoke_tests: enabled
      integration_tests: enabled
      performance_tests: basic
      security_scans: enabled
    # Automated promotion
    promotion:
      on_success: staging
      on_failure: block
Staging Environment
Purpose: Production simulation and final validation
Characteristics:
- Production parity: Mirrors production configuration
- Performance testing: Load and stress testing
- Security validation: Full security posture
- User acceptance: Business stakeholder testing
environments:
  staging:
    cluster: staging-cluster
    namespace: my-service-staging
    # Production-like resources
    resources:
      requests:
        cpu: 200m
        memory: 512Mi
      limits:
        cpu: 1000m
        memory: 1Gi
    # Production security
    security:
      podSecurityStandard: restricted
      networkPolicies: strict
      secretsManagement: external
    # Production-like testing
    testing:
      load_tests: enabled
      chaos_engineering: enabled
      penetration_tests: weekly
    # Manual approval gates
    promotion:
      approvers: [platform-team, security-team]
      approval_required: true
Production Environment
Purpose: Live traffic and business operations
Characteristics:
- High availability: Multi-zone deployment
- Monitoring: Comprehensive observability
- Security: Maximum security posture
- Compliance: Full audit trail and controls
environments:
  production:
    cluster: prod-cluster-primary
    failover_cluster: prod-cluster-secondary
    namespace: my-service
    # Production resources
    resources:
      requests:
        cpu: 500m
        memory: 1Gi
      limits:
        cpu: 2000m
        memory: 2Gi
    # High availability
    deployment:
      replicas: 3
      strategy: RollingUpdate
      maxUnavailable: 1
      maxSurge: 1
      # Change management
      change_window: business-hours
      rollback_enabled: true
      canary_deployment: true
    # Production security
    security:
      podSecurityStandard: restricted
      networkPolicies: strict
      secretsManagement: vault
      compliance: [soc2, pci-dss]
    # Monitoring and alerting
    monitoring:
      sla_objectives:
        availability: 99.9%
        latency_p95: 500ms
        error_rate: <0.1%
Environment Configuration
Configuration Hierarchy
Orchestr8 uses a layered configuration approach:
base configuration (defaults)
├── environment overrides
├── cluster-specific settings
├── region-specific values
└── runtime secrets
Kustomize Integration
# kustomization.yaml for staging
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - ../../base
patches:
  - target:
      kind: Deployment
      name: my-service
    patch: |-
      - op: replace
        path: /spec/replicas
        value: 2
      - op: replace
        path: /spec/template/spec/containers/0/resources/limits/memory
        value: 1Gi
configMapGenerator:
  - name: app-config
    literals:
      - LOG_LEVEL=info
      - DB_POOL_SIZE=10
      - FEATURE_FLAG_NEW_UI=true
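The ../../base directory referenced above contains the environment-agnostic manifests. A minimal sketch of that base kustomization, assuming the Deployment and Service manifests sit next to it:
# base/kustomization.yaml (illustrative)
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
commonLabels:
  app.kubernetes.io/name: my-service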
Helm Values Override
# environments/staging/values.yaml
global:
  environment: staging
  cluster: staging-cluster

# Application configuration
app:
  replicaCount: 2
  image:
    tag: "v1.2.3"
  resources:
    limits:
      memory: 1Gi
      cpu: 1000m
    requests:
      memory: 512Mi
      cpu: 200m

# Environment-specific features
features:
  debug_mode: false
  performance_monitoring: true
  synthetic_data: true

# Database configuration
database:
  host: staging-db.internal
  ssl_mode: require
  connection_pool: 20
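If you render the chart outside the Orchestr8 CLI, the overlay is applied by passing the environment values file to Helm. The chart path and release name below are assumptions:
# Apply the staging overlay with plain Helm (paths are illustrative)
helm upgrade --install my-service ./charts/my-service \
  -f environments/staging/values.yaml \
  --namespace my-service-staging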
Promotion Strategies
Automated Promotion Pipeline
GitOps Promotion
# Automated promotion via ArgoCD
o8 environment promote my-service \
--from dev \
--to staging \
--auto-approve
# Manual promotion with approval
o8 environment promote my-service \
--from staging \
--to production \
--require-approval \
--approvers platform-team,security-team
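Behind an ArgoCD-driven promotion, each environment is typically represented by an Application that tracks that environment's overlay in Git. A minimal sketch, with the repository URL and path assumed:
# Illustrative ArgoCD Application tracking the staging overlay
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: my-service-staging
  namespace: argocd
spec:
  project: default
  source:
    repoURL: https://git.example.com/platform/my-service-config.git  # assumed repository
    targetRevision: main
    path: environments/staging
  destination:
    server: https://kubernetes.default.svc
    namespace: my-service-staging
  syncPolicy:
    automated:
      prune: true
      selfHeal: true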
Blue-Green Deployment
# Blue-Green deployment configuration
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-service
spec:
  strategy:
    blueGreen:
      autoPromotionEnabled: false
      scaleDownDelaySeconds: 30
      prePromotionAnalysis:
        templates:
          - templateName: success-rate
        args:
          - name: service-name
            value: my-service
      activeService: my-service-active
      previewService: my-service-preview
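The prePromotionAnalysis step references an AnalysisTemplate named success-rate. A sketch of what that template could look like, assuming a Prometheus endpoint at the address shown and a conventional success-rate query:
# Illustrative AnalysisTemplate backing the pre-promotion check
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: success-rate
spec:
  args:
    - name: service-name
  metrics:
    - name: success-rate
      interval: 1m
      count: 5
      successCondition: result[0] >= 0.95
      provider:
        prometheus:
          address: http://prometheus.monitoring.svc:9090  # assumed address
          query: |
            sum(rate(http_requests_total{service="{{args.service-name}}",status!~"5.."}[5m]))
            /
            sum(rate(http_requests_total{service="{{args.service-name}}"}[5m]))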
Environment Security
Network Isolation
Each environment operates in isolated network segments:
# Environment-specific network policy
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: environment-isolation
  namespace: my-service-prod
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
  ingress:
    # Only accept traffic from namespaces labeled as production
    - from:
        - namespaceSelector:
            matchLabels:
              environment: production
  egress:
    # Only send traffic to namespaces labeled as production
    - to:
        - namespaceSelector:
            matchLabels:
              environment: production
Secret Management by Environment
# Environment-specific secret store
apiVersion: external-secrets.io/v1beta1
kind: SecretStore
metadata:
  name: vault-staging
  namespace: my-service-staging
spec:
  provider:
    vault:
      server: "https://vault.staging.company.com"
      path: "secret"
      version: "v2"
      auth:
        kubernetes:
          mountPath: "kubernetes-staging"
          role: "staging-secret-reader"
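Applications then consume individual secrets through ExternalSecret resources that reference this store. The Vault key and property below are assumptions:
# Illustrative ExternalSecret consuming the staging SecretStore
apiVersion: external-secrets.io/v1beta1
kind: ExternalSecret
metadata:
  name: my-service-db-credentials
  namespace: my-service-staging
spec:
  refreshInterval: 1h
  secretStoreRef:
    name: vault-staging
    kind: SecretStore
  target:
    name: my-service-db-credentials
  data:
    - secretKey: DB_PASSWORD
      remoteRef:
        key: my-service/staging/database  # assumed Vault path
        property: password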
RBAC by Environment
# Staging environment access
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: staging-developers
  namespace: my-service-staging
subjects:
  - kind: Group
    name: staging-developers
    apiGroup: rbac.authorization.k8s.io
roleRef:
  kind: ClusterRole
  name: developer
  apiGroup: rbac.authorization.k8s.io
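The developer ClusterRole referenced above is defined once and bound per environment namespace. A minimal sketch of such a role; the exact resource and verb lists are an assumption, not the Orchestr8 default:
# Illustrative ClusterRole bound per environment namespace
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: developer
rules:
  - apiGroups: ["", "apps"]
    resources: ["pods", "pods/log", "services", "configmaps", "deployments"]
    verbs: ["get", "list", "watch"]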
Data Management
Test Data Strategy
# Test data configuration
apiVersion: v1
kind: ConfigMap
metadata:
  name: test-data-config
data:
  data_strategy.yaml: |
    environments:
      dev:
        data_source: synthetic
        refresh_frequency: daily
        anonymization: basic
      staging:
        data_source: production_snapshot
        refresh_frequency: weekly
        anonymization: full
        retention: 30_days
      production:
        data_source: live
        backup_frequency: hourly
        retention: 7_years
Database Per Environment
# Environment-specific database
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: my-service-db
  namespace: my-service-staging
spec:
  instances: 2
  # Staging-appropriate resources
  resources:
    requests:
      memory: "1Gi"
      cpu: 500m
    limits:
      memory: "2Gi"
      cpu: 1
  # Staging backup policy
  backup:
    retentionPolicy: "7d"
    barmanObjectStore:
      destinationPath: "s3://staging-backups/my-service"
Monitoring and Observability
Environment-Specific Dashboards
# Grafana dashboard for staging
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: staging-overview
spec:
  datasources:
    - inputName: "DS_PROMETHEUS"
      datasourceName: "staging-prometheus"
  json: |
    {
      "dashboard": {
        "title": "Staging Environment - My Service",
        "panels": [
          {
            "title": "Request Rate",
            "targets": [
              {
                "expr": "rate(http_requests_total{environment=\"staging\"}[5m])"
              }
            ]
          }
        ]
      }
    }
Environment-Specific Alerts
# Staging alerts (less sensitive than production)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: staging-alerts
spec:
  groups:
    - name: staging.rules
      rules:
        - alert: StagingHighLatency
          expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{environment="staging"}[5m]))) > 1
          for: 10m
          labels:
            severity: warning
            environment: staging
          annotations:
            summary: "High latency in staging environment"
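For comparison, the production counterpart would typically use a tighter threshold, a shorter evaluation window, and a paging severity. The rule below is illustrative and would sit in a production PrometheusRule; the 0.5s threshold mirrors the latency_p95 SLO defined earlier:
# Illustrative production rule with stricter thresholds
- alert: ProductionHighLatency
  expr: histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{environment="production"}[5m]))) > 0.5
  for: 5m
  labels:
    severity: critical
    environment: production
  annotations:
    summary: "P95 latency above 500ms in production"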
Cost Optimization
Right-Sizing by Environment
environments:
  dev:
    # Minimal resources for development
    node_pool:
      instance_type: "t3.small"
      min_nodes: 1
      max_nodes: 3
  staging:
    # Production-like but smaller
    node_pool:
      instance_type: "t3.large"
      min_nodes: 2
      max_nodes: 5
  production:
    # Full production resources
    node_pool:
      instance_type: "c5.2xlarge"
      min_nodes: 3
      max_nodes: 10
Auto-Scaling by Usage
# Development auto-shutdown
apiVersion: batch/v1
kind: CronJob
metadata:
  name: dev-environment-shutdown
spec:
  schedule: "0 20 * * 1-5"  # 8 PM on weekdays
  jobTemplate:
    spec:
      template:
        spec:
          # Needs a ServiceAccount with RBAC to scale deployments (not shown)
          restartPolicy: Never
          containers:
            - name: shutdown
              image: bitnami/kubectl:latest
              command:
                - kubectl
                - scale
                - deployment
                - --all
                - --replicas=0
                - -n
                - my-service-dev
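The CronJob above only covers after-hours shutdown; day-to-day, usage-based scaling is normally delegated to a HorizontalPodAutoscaler per environment. A sketch for staging, with the utilization target being an assumption:
# Illustrative usage-based autoscaling for the staging deployment
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-service
  namespace: my-service-staging
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-service
  minReplicas: 2
  maxReplicas: 6
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70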
Best Practices
Environment Parity
- Infrastructure as Code: Use identical infrastructure definitions
- Configuration Management: Minimize environment-specific differences
- Dependency Versions: Pin versions consistently across environments
- Security Policies: Apply consistent security baselines
Change Management
- Progressive Rollout: Deploy to environments in order
- Quality Gates: Automated checks between environments
- Rollback Plans: Quick rollback procedures for each environment
- Change Windows: Scheduled maintenance windows for production
Monitoring and Alerting
- Environment Labels: Tag all metrics with environment labels
- Alert Sensitivity: Different thresholds for different environments
- Dashboards: Environment-specific monitoring dashboards
- Log Aggregation: Centralized logging with environment filtering
Troubleshooting
Common Environment Issues
Configuration Drift
# Compare configurations between environments
o8 environment diff staging production
# Sync configuration from Git
o8 environment sync staging --force
Resource Constraints
# Check resource usage by environment
o8 environment resources --environment staging
# Scale environment resources
o8 environment scale staging --nodes 3
Network Connectivity
# Test connectivity between environments
o8 network test --from staging --to production
# Debug network policies
o8 network policies --environment staging --debug
Next Steps
- Learn about GitOps Fundamentals for deployment automation
- Explore Security Model for environment security
- Review Module System for application packaging
- Check Monitoring for observability setup