BusinessObjects Topics

Sunday, September 14, 2008

DATA MODELING CONCEPTS CHAPTER 1

The Role of Data Modeling

Data modeling tasks provide the most benefit when performed early in the development lifecycle. The model provides information critical to understanding the scope of a project for iterative development phases. Beginning the implementation phase without a clear understanding of the data requirements might cause your project to incur costly overruns or end up on the scrap heap.

An Introduction to Project Development

Many publications discuss project development, and this text does not cover the subject in detail. I included this section to help modelers understand the role of data modeling in project development and to show when modeling should occur.

Most companies follow a methodology that outlines the development lifecycle selected to guide the development process. To some degree, most adhere to the same order of high-level concepts:

1. Problem definition
2. Requirements analysis
3. Conceptual design
4. Detail design
5. Implementation
6. Testing

This development method is generally referred to as the waterfall method. As you can see in Figure 1.1, each phase is completed before moving to the next, creating a “waterfall” effect.


Figure 1.1
The waterfall method of project development. Note that the results of each phase cascade into the next.

Many projects are developed using iterations or phases. An iterative development approach decreases risk by breaking the project into discrete manageable phases. Each phase includes analysis, detail design, implementation, and testing. Subsequent phases build upon and leverage the functionality of the preceding phase. However, within each phase, the waterfall method applies.

As with most engineering projects, you create a data model by following a set of steps.
1. Problem and scope definition
2. Requirements gathering
3. Analysis
4. Logical data model creation
5. Physical data model creation
6. Database creation

Figure 1.2 illustrates how each step provides input for the next.



Figure 1.2
Logical data model creation can occur prior to selecting a database platform (Oracle, DB2, Sybase, and so on). ERwin can provide support for specific physical properties if the physical data model is produced after the database platform is selected.

Problem and Scope Definition

Begin logical data modeling by defining the problem. This step is sometimes referred to as writing a mission or scoping statement. The problem definition can be a simple paragraph, or it can be a complex document that outlines a series of business objectives. The problem definition defines the scope, or boundary, of the data model, much the way a survey defines property boundaries.

Gathering Information Requirements

Most industry experts agree that the most critical task in a development project is an accurate and complete definition of the requirements. In fact, an incomplete or inaccurate understanding of requirements can cause expensive re-work and significant delay.

Gathering information requirements is the act of discovering and documenting the information necessary to identify and define the entities, attributes, and business rules for the logical model. There are two well-recognized methods for gathering requirements: facilitated sessions and interviews. Most development methodologies recommend facilitated sessions. The sections that follow provide high-level guidelines for gathering information requirements using facilitated sessions. A later exercise demonstrates how to use the information gathered to create a data model using ERwin.

Analysis

You must analyze and research the data requirements and business rules to produce a complete logical model. Analysis tasks should provide accurate and complete definitions for all entities, attributes, and relationships. Metadata, data about the data, is collected and documented during the analysis phase.

The analysis can be performed by the modeler or by a business analyst. Either the modeler or the business analyst works with users to document how users intend to use the data. These tasks drive out the corporate business objects needed to support the information requirements. Corporate business objects are also called code, reference, or classification data structures. This is also the opportunity to document the code values that will be used. You should carefully document any derived data, data that is created by manipulating or combining one or more other data elements, along with the data elements used in the derivation.
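For instance, an order's total amount computed from quantity and unit price is derived data, and both source elements belong in the documentation. A minimal sketch in Python, with all names invented for illustration:

```python
# Hypothetical sketch of derived data; every name here is invented.
# order_total_amount is not stored input: it is derived by combining
# the quantity and unit_price data elements, which should both be
# documented alongside the derivation rule.
def order_total_amount(quantity: int, unit_price: float) -> float:
    """Derived data element: quantity * unit_price.

    Source data elements: quantity, unit_price.
    """
    return quantity * unit_price

print(order_total_amount(3, 9.99))  # 29.97
```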

Logical Data Model

A logical data model is a visual representation of data structures, data attributes, and business rules. The logical model represents data in a way that can be easily understood by business users. The logical model design should be independent of platform or implementation language requirements or how the data will be used.

The modeler uses the data requirements and the results of analysis to produce the logical data model. The modeler also resolves the logical model to third normal form and validates it against the enterprise data model, if one is available. Later sections describe a complete logical model, explain how to resolve a logical model to third normal form, give an overview of an enterprise model, and offer some tips on validating a logical model against an enterprise model.

After you compare the logical model and enterprise data model and make any necessary changes, it is important to review the model for accuracy and completeness. The best practice includes a peer review as well as a review with the business partners and development team.


Entities

Entities represent the things about which the enterprise is interested in keeping data. An entity can be a tangible object such as a person or a book, but it can also be conceptual, such as a cost center or business unit. Entities are nouns and are expressed in singular form, CUSTOMER as opposed to CUSTOMERS, for clarity and consistency.

You should describe an entity using factual particulars that make it uniquely identifiable. Each instance of an entity must be separate and clearly identifiable from all other instances of that entity. For example, a data model to store information about customers must have a way of distinguishing one customer from another.

Figure 1.3 provides some examples of entities.




Figure 1.3
Here are examples of using ERwin to display entities in their simplest form.


Attributes

Attributes represent the data the enterprise is interested in keeping about objects. Attributes are nouns that describe the characteristics of entities.
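A rough analogue in code can make entities and attributes concrete: the entity becomes a record type, its attributes become fields, and a unique identifier keeps instances distinguishable. This is only an illustrative sketch with invented names, not ERwin output:

```python
from dataclasses import dataclass

# Sketch: the CUSTOMER entity as a record type. Each field is an
# attribute describing the entity; customer_id is the identifier that
# keeps one instance distinguishable from another. Names are invented.
@dataclass(frozen=True)
class Customer:
    customer_id: int  # unique identifier
    name: str         # attribute
    city: str         # attribute

alice = Customer(customer_id=1, name="Alice", city="Austin")
bob = Customer(customer_id=2, name="Bob", city="Boston")
assert alice.customer_id != bob.customer_id  # instances stay distinct
```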



Relationships

Relationships represent the associations between the objects about which the enterprise is interested in keeping data. A relationship is expressed as a verb or verb phrase that describes the association. Figure 1.5 provides some examples using ERwin’s Information Engineering (IE) notation to represent relationships.
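In code terms, a relationship is often carried as a reference from one entity to the other, and the verb phrase names the association. A minimal sketch with invented names:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Customer:
    customer_id: int
    name: str

# "CUSTOMER places ORDER": the association is carried by the
# customer_id reference on Order. All names are invented.
@dataclass(frozen=True)
class Order:
    order_id: int
    customer_id: int  # refers back to the Customer who places the order

alice = Customer(customer_id=1, name="Alice")
order = Order(order_id=100, customer_id=alice.customer_id)
```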

Normalization

Normalization is the act of moving attributes to appropriate entities to satisfy the normal forms. Normalization is usually presented as a set of complex statements that make it seem a complicated concept. Actually, normalization is quite straightforward: “One fact in one place,” as stated by C. J. Date in his 1999 book An Introduction to Database Systems. Normalizing data means you design the data structures in such a way as to remove redundancy and limit unrelated structures.

Five normal forms are widely accepted in the industry. The forms are simply named first normal form, second normal form, third normal form, fourth normal form, and fifth normal form. In practice, many logical models are only resolved to third normal form.
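A small sketch of “one fact in one place,” with invented names: in the denormalized form the customer's city repeats on every order row, so updating it means touching many rows; normalizing stores the fact once.

```python
# Denormalized (names invented): the fact "Alice lives in Austin" is
# repeated on every order row, inviting inconsistent updates.
orders_denormalized = [
    {"order_id": 100, "customer": "Alice", "city": "Austin"},
    {"order_id": 101, "customer": "Alice", "city": "Austin"},
]

# Normalized: one fact in one place. Orders refer to the customer;
# the city is stored exactly once.
customers = {"Alice": {"city": "Austin"}}
orders = [
    {"order_id": 100, "customer": "Alice"},
    {"order_id": 101, "customer": "Alice"},
]
```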

Formal Definitions of Normal Forms

The following normal form definitions might seem intimidating; just consider them formulas for achieving normalization. Normal forms are based on relational algebra and should be interpreted as mathematical functions.

Business Normal Forms

In his 1992 book, Strategic Systems Development, Clive Finkelstein takes a different approach to normalization. He defines business normal forms in terms of the resolution to those forms. Many modelers, myself included, find this business approach more intuitive and practical.

First business normal form (1BNF) removes repeating groups to another entity. This entity takes its name, and primary (compound) key attributes, from the original entity and its repeating group.
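A data sketch of 1BNF, with invented names: the repeating phone columns move to their own entity, whose compound key combines the original entity's key with the repeating item.

```python
# Before 1BNF (names invented): a repeating group of phone columns
# inside CUSTOMER wastes slots and caps the number of phones.
customer_unnormalized = {
    "customer_id": 1,
    "name": "Alice",
    "phone_1": "555-0100",
    "phone_2": "555-0101",
    "phone_3": None,  # unused slot; a fourth phone would not fit
}

# After 1BNF: the repeating group becomes a CUSTOMER PHONE entity with
# the compound key (customer_id, phone_number).
customer = {"customer_id": 1, "name": "Alice"}
customer_phones = [
    {"customer_id": 1, "phone_number": "555-0100"},
    {"customer_id": 1, "phone_number": "555-0101"},
]
```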

Second business normal form (2BNF) removes attributes that are partially dependent on the primary key to another entity. The primary (compound) key of this entity is the primary key of the entity in which it originally resided, together with all additional keys on which the attribute is wholly dependent.

Third business normal form (3BNF) removes attributes that are not dependent at all on the primary key to another entity where they are wholly dependent on the primary key of that entity.
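Sketches of 2BNF and 3BNF in the same spirit, again with invented names:

```python
# 2BNF (names invented): in ORDER LINE, keyed by (order_id, product_id),
# product_name depends on only part of the key, so it moves to PRODUCT.
order_line_unnormalized = {
    "order_id": 100, "product_id": 7,
    "quantity": 3, "product_name": "Widget",  # partial key dependency
}
product = {"product_id": 7, "product_name": "Widget"}
order_line = {"order_id": 100, "product_id": 7, "quantity": 3}

# 3BNF: in EMPLOYEE, department_name does not depend on employee_id at
# all; it moves to DEPARTMENT, where it is wholly dependent on that
# entity's primary key.
employee_unnormalized = {
    "employee_id": 42, "department_id": 5,
    "department_name": "Sales",  # non-key (transitive) dependency
}
department = {"department_id": 5, "department_name": "Sales"}
employee = {"employee_id": 42, "department_id": 5}
```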

Fourth business normal form (4BNF) removes attributes that are dependent on the value of the primary key, or that are optional, to a secondary entity where they wholly depend on the value of the primary key or where they must exist in that entity (that is, they are mandatory).

Fifth business normal form (5BNF) exists as a structure entity if recursive or other associations exist between occurrences of secondary entities, or if recursive associations exist between occurrences of their principal entity.

A Complete Logical Data Model

A complete logical model should be in third business normal form and include all entities, attributes, and relationships required to support the data requirements and the business rules associated with the data.



Data Warehouse Concepts

Central Data Warehouse

A Central Data Warehouse is a repository of company data in which a database is created from operational data extracts. This database adheres to a single, consistent enterprise data model to ensure consistent decision support across the company.

Active Metadata Warehouse

An Active Metadata Warehouse is a repository of metadata that helps speed up reporting and analysis from an active data warehouse. In its simplest definition, metadata is data describing data.
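As a small invented illustration of data describing data, a metadata record says what a data element is without holding the element's values:

```python
# Sketch only: a metadata record describing the "revenue" data element,
# not the revenue values themselves. Field names are invented.
revenue_metadata = {
    "name": "revenue",
    "type": "decimal(15, 2)",
    "source_system": "billing",
    "refresh_frequency": "daily",
    "description": "Invoiced revenue in USD",
}
```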

Enterprise Data Warehouse

An Enterprise Data Warehouse is a centralized warehouse that provides service for the entire enterprise. A data warehouse is in essence a large repository of historical and current transaction data of an organization. An Enterprise Data Warehouse is a specialized data warehouse that may have several interpretations.

Functional Data Warehouse

The Enterprise Data Warehouse is broken down into Functional Data Warehouses based on functional data. Depending on the size of the company and its financial capability, a Functional Data Warehouse may serve one department or several. Companies with branches in many geographic locations around the globe may set up their Enterprise Data Warehouse differently, with different clusterings of Functional Data Warehouses.

Operational Data Store

An Operational Data Store (ODS) is an integrated database of operational data. Its sources include legacy systems, and it contains current or near-term data. An ODS may contain 30 to 60 days of information, while a data warehouse typically contains years of data.
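The retention difference can be pictured as a simple date filter; a toy sketch with invented names and cutoffs:

```python
from datetime import date, timedelta

# Toy sketch (names and cutoff invented): an ODS keeps roughly the
# last 30-60 days, while the warehouse retains the full history.
rows = [
    {"order_id": 1, "order_date": date(2008, 9, 10)},
    {"order_id": 2, "order_date": date(2006, 1, 15)},
]

today = date(2008, 9, 14)
ods = [r for r in rows if today - r["order_date"] <= timedelta(days=60)]
warehouse = rows  # years of data stay available

print(len(ods), len(warehouse))  # 1 2
```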


Friday, September 12, 2008

ODS vs. Data Warehouse

Operational Data Store

- Data-focused integration from transaction-processing systems
- Age of the data: current, near term (today, last week)
- Primary use: day-to-day decisions, tactical reporting, current operational results
- Frequency of load: twice daily, daily, weekly

Data Warehouse

- Subject oriented, integrated, non-volatile, time variant
- Age of the data: historic (last month, quarterly, five years)
- Primary use: long-term decisions, strategic reporting, trend detection
- Frequency of load: weekly, monthly, quarterly


Tuesday, September 2, 2008

Incompatible objects

Sometimes it is not possible to use certain combinations of objects in data providers. This situation arises when objects bear no relationship to one another; these objects are called incompatible objects.

For example, the Island Resorts Marketing universe contains the [Reservation Year] and [Revenue] objects, which are incompatible. This is because there is no revenue associated with a reservation; revenue is generated only when the customer is invoiced. The underlying database structure reflects this: you cannot build a query that aggregates revenue by reservation year because there is no such thing as revenue by reservation year. In other words, the aggregation context that you specified for the [Revenue] object does not exist.
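One way to picture incompatibility, purely as an invented illustration and not how BusinessObjects actually implements it: treat the universe's joins as a graph, and call two objects incompatible when no join path connects their underlying tables.

```python
# Illustrative sketch only; table names and the join graph are invented,
# and this is not the BusinessObjects implementation. Two objects are
# treated as incompatible when their tables share no join path.
joins = {
    "RESERVATIONS": {"RESERVATION_LINE"},
    "RESERVATION_LINE": {"RESERVATIONS"},
    "INVOICE": {"INVOICE_LINE"},
    "INVOICE_LINE": {"INVOICE"},
}

def join_path_exists(start: str, goal: str) -> bool:
    """Depth-first search over the join graph."""
    seen, stack = set(), [start]
    while stack:
        table = stack.pop()
        if table == goal:
            return True
        if table not in seen:
            seen.add(table)
            stack.extend(joins.get(table, ()))
    return False

# [Reservation Year] lives on RESERVATIONS, [Revenue] on INVOICE_LINE:
print(join_path_exists("RESERVATIONS", "INVOICE_LINE"))  # False
```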


What is a scope of analysis?

The scope of analysis for a query is extra data that you can retrieve from the database to give more details on the data returned by each of the objects in a query. This extra data does not appear in the initial result document, but it remains available in the data cube, so you can pull it into the report to access more detail at any time. This process of refining the data to lower levels of detail is called drilling down on an object.

In the universe, the scope of analysis corresponds to the hierarchical levels below the object selected for a query.

You can set this level when you build a query.

You can also create a custom scope of analysis by adding specific dimensions to the Scope of Analysis pane.

For example, a scope of analysis of one level down for the object Year would include the object Quarter, which appears immediately under Year.
