DATA MODELING FOR DATA WAREHOUSE (DMDW)
Many enterprises today are building one or more "data warehouses" as a means of dealing with the underlying data mess resulting from their disintegrated application systems and databases. Although the building of a data warehouse will not remedy the underlying systems and data mess, it can be an effective interim strategy to allow the appearance of consistent and integrated data, and provide a data store where different business processes can at least share the same data at the same time.
The principles of correct data organization are every bit as applicable (if not more so) to a data warehouse as they are to designing any shared database. In fact, since data warehouses tend to be searched and accessed in highly spontaneous and ad hoc ways, having a design which is completely sensible to the user is crucial - a mirror image of the real world persons, places, things, concepts and events about which the enterprise must collect and store data. This seminar combines the discipline of good data modeling practice with the additional concerns of identifying the best source data in existing systems, and planning the continuous or periodic movement of data from the source files/databases to the data warehouse.
Data Modeling for Data Warehouse is a skill-building seminar which teaches the familiar three-level approach to data modeling (Conceptual, Logical and Physical) applied to the problem of designing a data warehouse. It integrates the disciplines of entity/relationship modeling, data element analysis and standardization, normalization, and physical database design to fully equip the student to design a stable, flexible, highly sharable, effective, non-redundant data warehouse. Also, the necessary metadata to support the data warehouse (where did this data come from?) is addressed in detail with models and discussion of a repository in which to hold the appropriate metadata. The seminar material is completely independent of any particular vendor's CASE tool or physical data base management system (DBMS).
Because of the large amount of material covered, this seminar is appropriate for experienced data modelers who are/will be engaged in the modeling and construction of data warehouse. It focuses on the special and often confusing issues and differences between a data model independent of the existing systems/data, and a data model for a data warehouse which is constrained by the legacy data/systems which already exist in the business and which will serve as the sources of the data which will be imported into the data warehouse.
TOPICAL OUTLINE
- Background Concepts
- IRM Environment
- Traditional "Application Systems" Approach
- The IRM View
- Data Warehouse Definitions and Concepts
- Data Warehouse as Path Forward to IRM Environment
- The Eight Greatest Misconceptions about Data Warehousing
- Why Data Models are Needed for Data Warehouse Design
- Differences in Data Modeling for True Data Resource vs. Data Warehouse
- Three Data Model Continuity
- Data Modeling Sessions
- Defining the Scope of the Data Warehouse
- Avoiding Redundant Data Warehouses
- Ways of Expressing/Defining Scope
- Determinants of the Scope
- Steps to Define Scope
- Guidelines for Scope Definition
- Conceptual Data Modeling of Data Warehouse
- Purpose, Form and Content
- Conceptual Data Modeling Definitions
- Conceptual Data Modeling Steps
- 1. Detecting and Qualifying Entities
- 2. Diagramming Entities and Relationships
- Relationship Rules
- E/R Diagram Symbols
- Reading the E/R Diagram
- Instance Diagram
- Normal Forms in the E/R Model
- Best-Fit Decisions
- E/R Modeler's Useful Tips and Questions
- Special E/R Constructs
- N-ary Relationship
- Recursive Relationship
- Subtype/Supertype Construct
- Characteristic Entity
- Associative Entity
- Relationship Roles
- Role Entity
- 3. Analyzing and Defining E/R States
- Multiple Entity vs. State Decision
- Multiple Relationship vs. State Decision
- State/Transition Analysis
- Example State/Transition Diagram
- General State Rules
- State Variations
- True Data Resource vs. Data Warehouse Perspective of States
- Capturing State Information in the Metadata Repository
- Example Questions for State/Transition Analysis
- 4. Fully Defining Entities and Relationships
- Entity Definition Pro Forma
- Notes on Primary Keys
- Relationship Definition Pro Forma
- Recording In-Between Rules
- Example Questions for Writing Definitions
- 5. Matching Entities to Candidate Legacy Data Sources
- 6. Reviewing and Stabilizing the DW E/R Model
- Stability Review
- Example Questions for Reviewing Model
- Metadata Repository Meta-E/R Model to Support Conceptual Modeling and Data Source Data
- Logical Data Modeling of Data Warehouse
- Purpose, Form and Content
- Logical Data Modeling Definitions
- Alternative Approaches
- Logical Data Model Diagrams
- Logical Data Modeling Steps
- 1. Identifying Pertinent Dataviews/Transactions
- 2. Analyzing Dataviews/Transactions
- 3. Standardizing and Defining Data Elements
- Detecting and Understanding Need and Meaning
- Determining Entity or Relationship of Residence
- Choosing Best Representation
- Rules for Data Elements
- Handling Different Types of Data Elements
- Data Elements to Represent States
- Fully Defining the Data Element
- Data Element Definition Pro Forma
- Data Element Domains and Synonyms
- Naming the Data Element
- Example Data Element Naming Standard
- Standard Keyword/Abbreviation Glossary
- Checking the Dictionary/Repository for Redundancy
- Matching Data Elements to Candidate Legacy Source Fields/Columns
- Documenting New Data Elements in the Metadata Repository
- Data Element Modeler's Useful Tips and Questions
- 4. Diagramming and Normalizing the Data for the Transaction
- Bubblecharting Symbols
- Bubblechart/Logical Structure Rules
- Building the Bubblechart
- The Normal Forms
- First Normal Form Examples
- Second Normal Form Examples
- Third Normal Form Examples
- Normalizing Time-Variant Attributes
- Bubblecharting Notes
- Recursive Relationship/Multiple Relationships
- N-ary Relationship
- Subtype/Supertype
- Characteristic Entity vs. Subordinate Data Group
- Relationship with Data vs. Associative Entity
- Relationship Roles
- Questions to Assist Normalizing Data Elements
- 5. Fully Defining New Entities, Relationships, Logical Data Groups, Associations
- 6. Verifying the Dataview Bubblechart
- 7. Synthesizing into Logical Model
- Synthesis Example
- Metadata Repository Meta-E/R Model to Support Logical Data Modeling
- 8. Matching DW Data Elements with Source Data Fields/Columns
- Example Transforms
- Example Questions for Data Source Matching
- 9. Reviewing and Stabilizing the DW Logical Data Model
- Example Questions for Reviewing/Stabilizing
- Physical Data Warehouse Modeling
- Purpose, Form and Content
- Physical Data Modeling Definitions
- Physical Design Issues
- Physical Modeling Steps
- 1. Formalizing and Weighting Design Objectives
- 2. Defining Physical and Technological Environment
- 3. Laying Out First Cut Physical Design(s)
- General Logical to Physical Transform
- General Foreign Key Rules
- Example Logical to Physical Transforms
- 4. Deciding Stored vs. Virtual Derived Data
- 5. Analyzing and Adjusting for Volume and Growth
- 6. Analyzing and Adjusting for Security Requirements
- 7. Analyzing and Adjusting for Transaction Performance
- Modeling Predominant Patterns of Use
- Focusing on Critical Transaction Performance
- Collecting or Separating
- Denormalizing
- 1st Normal Form Compromise
- 2nd Normal Form Compromise
- 3rd Normal Form Compromise
- "Star" Schema
- "Snowflake" Schema
- Physical Clustering to Avoid, Minimize or Ease Joining
- Deciding Secondary Key Indices
- 8. Analyzing and Adjusting for Ease of Use
- 9. Assessing Design Objectives
- 10. Finalizing the Physical Model
- 11. Specifing the Design in DBMS Data Definition Language (DDL)
- Specifying Final Source to Data Warehouse Field/Column Transforms
- Metadata Repository Meta-E/R Model to Support Physical Design and Source Data Mapping
- Summary
- Workshops
- Identifying Candidate Entities
- E/R Diagramming
- Defining Entity States/Transitions
- Defining Entities/Relationships
- Data Element Analysis and Bubblecharting
DURATION: 5 days
TARGET AUDIENCES: (recommended maximum number of attendees - 25)
- CIO/IS/IT/IRM Management
- Conceptual, Logical and Physical Data Modelers
- Process Analysts/Modelers
- DW Development Project Managers
- Business persons who participate in data warehouse development projects
PREREQUISITES: (very strongly recommended, not mandatory)
RETURN TO TOP OF PAGE