A data warehouse is a centralized, integrated repository of structured data organized for analytical querying, historical reporting, and business intelligence rather than day-to-day transaction processing. It consolidates data from multiple source systems (ERP, CRM, POS, IoT, etc.) into a single, consistent, time-stamped store that supports complex queries across large volumes of historical data. Within ISACA's CDPSE framework, data warehousing represents the Data Persistence phase of the Data Lifecycle: the deliberate act of storing collected data in a form that preserves its integrity, supports long-term value extraction, and satisfies compliance obligations. A data warehouse is NOT an operational database (which handles live transactions), NOT a data lake (which stores raw data in any format without an enforced schema), and NOT a simple archive or backup. It is an actively governed, analytically optimized asset.
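The consolidation step described above can be sketched in a few lines. This is a minimal illustration, not a prescribed design: the `CustomerRow` shape, the source field names (`CustomerID`, `cust_no`, and so on), and the `consolidate` helper are all hypothetical, chosen only to show two differently shaped sources being mapped onto one consistent, time-stamped schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class CustomerRow:
    """One conformed warehouse row (hypothetical schema)."""
    customer_id: str
    email: str
    source_system: str   # records where the row came from
    loaded_at: datetime  # load timestamp, supports historical queries


def from_crm(record: dict, loaded_at: datetime) -> CustomerRow:
    # Assumed CRM export shape: {"CustomerID": ..., "Email": ...}
    return CustomerRow(
        customer_id=str(record["CustomerID"]),
        email=record["Email"].strip().lower(),
        source_system="crm",
        loaded_at=loaded_at,
    )


def from_pos(record: dict, loaded_at: datetime) -> CustomerRow:
    # Assumed POS export shape: {"cust_no": ..., "contact_email": ...}
    return CustomerRow(
        customer_id=str(record["cust_no"]),
        email=record["contact_email"].strip().lower(),
        source_system="pos",
        loaded_at=loaded_at,
    )


def consolidate(crm_records, pos_records, loaded_at=None):
    """Map both sources onto one schema and stamp every row with the load time."""
    loaded_at = loaded_at or datetime.now(timezone.utc)
    rows = [from_crm(r, loaded_at) for r in crm_records]
    rows += [from_pos(r, loaded_at) for r in pos_records]
    return rows
```

Keeping `source_system` and `loaded_at` on every row is one simple way to preserve lineage and support time-based queries. A real pipeline would add validation, deduplication, and key matching across sources.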
Where it stops · what it isn't
- IS: A subject-oriented, integrated, time-variant, non-volatile analytical data store (Inmon, 1990) optimized for OLAP workloads
- IS: Inclusive of modern cloud-native platforms (Snowflake, BigQuery, Redshift, Azure Synapse) and on-premises implementations
- IS: Inclusive of adjacent patterns such as data lakehouses (Delta Lake, Apache Iceberg) where warehouse-grade governance is applied to lake-style storage
- IS NOT: An OLTP database optimized for INSERT/UPDATE/DELETE throughput
- IS NOT: A data lake, which stores raw, unvalidated data in any format without enforced schemas or quality guarantees
- IS NOT: A data mart, which is a subject-specific subset of a warehouse rather than the warehouse itself
- IS NOT: A backup or archival system; warehouses are actively queried and governed, not passively stored
- IS NOT: A real-time streaming platform; although modern warehouses increasingly ingest streaming data, the warehouse is a persistence layer, not a message broker
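To make the OLAP side of this boundary concrete, here is a small sketch of a star schema and an analytical query, using Python's built-in `sqlite3` module purely as a stand-in for a warehouse engine. The table names (`fact_sales`, `dim_date`, `dim_product`), the sample rows, and both helper functions are illustrative assumptions, not part of any particular platform.

```python
import sqlite3


def build_demo_warehouse() -> sqlite3.Connection:
    """Create a tiny in-memory star schema: one fact table, two dimensions."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, year INTEGER, month INTEGER);
        CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
        CREATE TABLE fact_sales (
            date_key INTEGER REFERENCES dim_date(date_key),
            product_key INTEGER REFERENCES dim_product(product_key),
            amount REAL
        );
    """)
    conn.executemany(
        "INSERT INTO dim_date VALUES (?, ?, ?)",
        [(20230115, 2023, 1), (20240115, 2024, 1)],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?)",
        [(1, "grocery"), (2, "apparel")],
    )
    # Facts are only appended, never updated in place (non-volatile),
    # and each one carries a date key (time-variant).
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?)",
        [(20230115, 1, 100.0), (20230115, 2, 50.0),
         (20240115, 1, 120.0), (20240115, 1, 30.0)],
    )
    conn.commit()
    return conn


def sales_by_year_and_category(conn: sqlite3.Connection):
    """A typical OLAP-style query: join facts to dimensions, then aggregate."""
    return conn.execute("""
        SELECT d.year, p.category, SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.year, p.category
        ORDER BY d.year, p.category
    """).fetchall()
```

An OLTP workload would instead issue many small single-row writes against normalized tables. The query here scans and aggregates history across years, which is the access pattern a warehouse is organized around.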
Connected concepts in the graph
Every cubelet sits in a knowledge graph. Here's what this one connects to.
PART OF: Data Lifecycle, Data Persistence phase (ISACA CDPSE Domain 3)
REQUIRES: Data Collection and Ingestion (upstream lifecycle phase); Data Governance Framework (lineage, quality, metadata management); Data Integration / ETL/ELT Pipelines
ENABLES: Data Usage and Analytics (downstream lifecycle phase)
RELATED TO: Data Lake (alternative persistence pattern); Data Lakehouse (convergent persistence pattern)
CONSTRAINS: Regulatory Compliance (GDPR, CCPA, HIPAA, BCBS 239: data residency, retention, access)