Systems Availability and Capacity Management is the discipline of ensuring IT systems are operational when needed (availability) and that sufficient resources exist to meet current and future workload demands without degradation (capacity). Availability is expressed as a percentage of uptime over a defined period — '99.9% availability' means no more than 8.77 hours of unplanned downtime per year. Capacity management is the continuous practice of forecasting demand, measuring utilization, and provisioning resources proactively so that performance SLAs are met before constraints are hit, not after. Together, these two practices form the operational backbone that determines whether systems can reliably deliver business value.
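The downtime figure quoted above follows directly from the availability percentage. A minimal sketch of that arithmetic, assuming a year of 365.25 days (8,766 hours); the function name `downtime_budget_hours` and the constant `HOURS_PER_YEAR` are illustrative, not part of any standard:

```python
# Assumption: a "year" averages in leap years (365.25 days = 8766 hours).
HOURS_PER_YEAR = 365.25 * 24


def downtime_budget_hours(availability_pct: float,
                          period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the maximum downtime (in hours) that an availability target allows."""
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability_pct must be between 0 and 100")
    return period_hours * (1.0 - availability_pct / 100.0)


# Print the yearly downtime budget for a few common targets.
for target in (99.0, 99.9, 99.99):
    print(f"{target}% -> {downtime_budget_hours(target):.2f} hours/year")
```

For a 99.9% target this gives roughly 8.77 hours per year, matching the figure in the definition. Passing a different `period_hours` (for example 30 * 24) gives the budget for a monthly reporting window instead.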
Where it stops · what it isn't
- IS: Proactive planning and monitoring of resource utilization (CPU, memory, storage, network, database connections) against defined SLA/SLO thresholds
- IS: Defining RPO (Recovery Point Objective) and RTO (Recovery Time Objective) per business criticality tier and designing redundancy accordingly
- IS: Trend analysis and statistical forecasting 12–24 months out, including seasonal and event-driven demand spikes
- IS: Governance of cloud capacity commitments (reserved instances, auto-scaling policies, multi-region load distribution)
- IS NOT: Incident response or break-fix operations. Those belong to Problem and Incident Management, though capacity events do trigger incidents
- IS NOT: Disaster Recovery planning itself. DR is a sibling competency; capacity management defines the resource baseline that DR plans must replicate
- IS NOT: Application performance engineering or code optimization. Capacity management governs infrastructure resources, not software architecture
- IS NOT: IT financial management or FinOps. Capacity management informs cloud spend decisions, but cost governance is a separate function
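The trend analysis and threshold monitoring listed above can be illustrated with a simple linear projection. This is a rough sketch only: it fits a straight line to monthly utilization samples and estimates when the trend would reach a threshold. It deliberately ignores the seasonal and event-driven spikes the scope calls out, which a real forecast would need to model. The function names, the 80% threshold, and the sample data are all hypothetical.

```python
from typing import Optional, Sequence, Tuple


def fit_linear_trend(samples: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of samples against their index (0, 1, 2, ...).

    Returns (slope, intercept) of the fitted line.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples to fit a trend")
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def months_until_threshold(samples: Sequence[float],
                           threshold_pct: float) -> Optional[float]:
    """Months after the last sample until the fitted trend reaches the threshold.

    Returns None when the trend is flat or falling (no projected breach),
    and 0.0 when the fitted line is already at or above the threshold.
    """
    slope, intercept = fit_linear_trend(samples)
    if slope <= 0:
        return None
    crossing_index = (threshold_pct - intercept) / slope
    return max(0.0, crossing_index - (len(samples) - 1))


# Hypothetical monthly peak CPU utilization (%); illustrative data only.
monthly_peak_cpu = [52.0, 54.0, 56.0, 58.0, 60.0, 62.0]
print(months_until_threshold(monthly_peak_cpu, threshold_pct=80.0))
```

With the sample data rising 2 points per month from 62%, the projection reaches 80% about nine months out. The useful output is the lead time: if provisioning takes longer than the projected months remaining, the constraint will be hit before new capacity arrives.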
Connected concepts in the graph
Every cubelet sits in a knowledge graph. Here's what this one connects to.
- REQUIRES: Problem and Incident Management; IT Components (Infrastructure)
- ENABLES: IT Service Level Management; Business Continuity Planning; Disaster Recovery Plans
- PART OF: System and Operational Resilience
- RELATED TO: Database Management
- CONSTRAINS: IT Change Management