BusinessObjects Topics


Tuesday, November 11, 2008

Oracle Learning -1

SQL: SELECT Statement

The SELECT statement allows you to retrieve records from one or more tables in your database.
The syntax for the SELECT statement is:
SELECT columns
FROM tables
WHERE predicates;

Example #1
Let's take a look at how to select all fields from a table.
SELECT *
FROM supplier
WHERE city = 'Newark';
In our example, we've used * to signify that we wish to view all fields from the supplier table where the supplier resides in Newark.

Example #2
You can also choose to select individual fields as opposed to all fields in the table.
For example:
SELECT name, city, state
FROM supplier
WHERE supplier_id > 1000;
This select statement would return all name, city, and state values from the supplier table where the supplier_id value is greater than 1000.

Example #3
You can also use the select statement to retrieve fields from multiple tables.
SELECT orders.order_id, supplier.name
FROM supplier, orders
WHERE supplier.supplier_id = orders.supplier_id;
The result set would display the order_id and supplier name fields where the supplier_id value existed in both the supplier and orders tables.
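The three SELECT examples above can be tried without an Oracle database. The sketch below is only a stand-in: it assumes Python's built-in sqlite3 module and a tiny, made-up supplier/orders dataset (the rows and exact column set are hypothetical), but the SQL text is the same as in Examples #1 to #3.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle; the rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER PRIMARY KEY, name TEXT,
                           city TEXT, state TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, supplier_id INTEGER);
    INSERT INTO supplier VALUES (999,  'Acme',    'Newark', 'NJ'),
                                (1001, 'IBM',     'Armonk', 'NY'),
                                (1002, 'Gateway', 'Newark', 'DE');
    INSERT INTO orders VALUES (1, 1001), (2, 1002), (3, 1002);
""")

# Example #1: every field of the suppliers located in Newark.
example1 = conn.execute("SELECT * FROM supplier WHERE city = 'Newark'").fetchall()

# Example #2: only the name, city and state fields.
example2 = conn.execute(
    "SELECT name, city, state FROM supplier WHERE supplier_id > 1000"
).fetchall()

# Example #3: fields from two tables, matched on supplier_id.
example3 = conn.execute(
    "SELECT orders.order_id, supplier.name FROM supplier, orders "
    "WHERE supplier.supplier_id = orders.supplier_id "
    "ORDER BY orders.order_id"
).fetchall()

print(example3)  # [(1, 'IBM'), (2, 'Gateway'), (3, 'Gateway')]
```

The ORDER BY in the last query is not part of the original example; it is added only so the output order is predictable.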



SQL: DISTINCT Clause

The DISTINCT clause allows you to remove duplicates from the result set. The DISTINCT clause can only be used with select statements.
The syntax for the DISTINCT clause is:
SELECT DISTINCT columns
FROM tables
WHERE predicates;

Example #1
Let's take a look at a very simple example.
SELECT DISTINCT city
FROM supplier;
This SQL statement would return all unique cities from the supplier table.

Example #2
The DISTINCT clause can be used with more than one field.
For example:
SELECT DISTINCT city, state
FROM supplier;
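This select statement would return each unique city and state combination. In this case, DISTINCT applies to the combination of all fields listed after the DISTINCT keyword, not to each field separately, so the same city can appear more than once if it occurs in more than one state.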
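To see the difference between DISTINCT on one field and on a combination of fields, here is a small sketch using Python's sqlite3 module as a stand-in database (an assumption; the supplier rows are hypothetical, with "Newark" deliberately appearing in two states).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (name TEXT, city TEXT, state TEXT)")
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [("Acme", "Newark", "NJ"), ("Bolt", "Newark", "NJ"),
     ("Gateway", "Newark", "DE"), ("IBM", "Armonk", "NY")],
)

# Example #1: one row per unique city.
cities = conn.execute("SELECT DISTINCT city FROM supplier ORDER BY city").fetchall()

# Example #2: one row per unique (city, state) combination.
city_states = conn.execute(
    "SELECT DISTINCT city, state FROM supplier ORDER BY city, state"
).fetchall()

print(cities)       # [('Armonk',), ('Newark',)]
print(city_states)  # [('Armonk', 'NY'), ('Newark', 'DE'), ('Newark', 'NJ')]
```

The two Newark/NJ rows collapse into one, but Newark/DE survives as a separate combination.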

SQL: COUNT Function

The COUNT function returns the number of rows returned by a query (for COUNT(expression), the number of rows where the expression is not NULL).
The syntax for the COUNT function is:
SELECT COUNT(expression)
FROM tables
WHERE predicates;



Simple Example
For example, you might wish to know how many employees have a salary that is above $25,000 / year.
SELECT COUNT(*) as "Number of employees"
FROM employees
WHERE salary > 25000;
In this example, we've aliased the count(*) field as "Number of employees". As a result, "Number of employees" will display as the field name when the result set is returned.

Example using DISTINCT
You can use the DISTINCT clause within the COUNT function.
For example, the SQL statement below returns the number of unique departments where at least one employee makes over $25,000 / year.
SELECT COUNT(DISTINCT department) as "Unique departments"
FROM employees
WHERE salary > 25000;
Again, the count(DISTINCT department) field is aliased as "Unique departments". This is the field name that will display in the result set.

Example using GROUP BY
In some cases, you will be required to use a GROUP BY clause with the COUNT function.
For example, you could use the COUNT function to return the name of the department and the number of employees (in the associated department) that make over $25,000 / year.
SELECT department, COUNT(*) as "Number of employees"
FROM employees
WHERE salary > 25000
GROUP BY department;
Because you have listed one column in your SELECT statement that is not encapsulated in the COUNT function, you must use a GROUP BY clause. The department field must, therefore, be listed in the GROUP BY section.
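The three COUNT forms above can be compared side by side in the sketch below. It assumes Python's sqlite3 module as the database and a handful of hypothetical employees; the SQL is the same as in the examples, with an ORDER BY added to the GROUP BY query so its output order is stable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Sales", 30000), ("Bob", "Sales", 26000), ("Cid", "Sales", 20000),
     ("Dee", "IT", 40000), ("Eve", "HR", 24000)],
)

# Simple example: the alias becomes the column name of the result.
cur = conn.execute(
    'SELECT COUNT(*) AS "Number of employees" FROM employees WHERE salary > 25000'
)
column_name = cur.description[0][0]
total = cur.fetchone()[0]

# COUNT(DISTINCT ...): departments with at least one employee above 25,000.
unique_departments = conn.execute(
    'SELECT COUNT(DISTINCT department) AS "Unique departments" '
    "FROM employees WHERE salary > 25000"
).fetchone()[0]

# GROUP BY: one count per department.
per_department = conn.execute(
    'SELECT department, COUNT(*) AS "Number of employees" '
    "FROM employees WHERE salary > 25000 "
    "GROUP BY department ORDER BY department"
).fetchall()

print(column_name, total)   # Number of employees 3
print(unique_departments)   # 2
print(per_department)       # [('IT', 1), ('Sales', 2)]
```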

TIP: COUNT(*), COUNT(1) and COUNT(column)
COUNT(*) and COUNT(1) return the same result: the number of rows that meet your criteria. COUNT(column) is different, because it counts only the rows where that column is not NULL, so it can return a smaller number. COUNT(1) is sometimes recommended for better performance on the grounds that the database engine will not have to fetch back the data fields, but Oracle's optimizer treats COUNT(1) and COUNT(*) the same way, so you should not expect a performance difference between them.

For example, the following statement returns exactly the same result as the GROUP BY example above:
SELECT department, COUNT(1) as "Number of employees"
FROM employees
WHERE salary > 25000
GROUP BY department;
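COUNT(*) and COUNT(1) always agree, but COUNT(column) skips rows where that column is NULL. The sketch below shows the difference using SQLite through Python's sqlite3 module (an assumption; this NULL behavior is standard SQL and Oracle follows it), with hypothetical rows in which one employee has no department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [("Ann", "Sales", 30000), ("Bob", None, 26000), ("Dee", "IT", 40000)],
)

# All three employees earn over 25,000, but Bob's department is NULL.
count_star, count_one, count_dept = conn.execute(
    "SELECT COUNT(*), COUNT(1), COUNT(department) "
    "FROM employees WHERE salary > 25000"
).fetchone()

print(count_star, count_one, count_dept)  # 3 3 2
```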
SQL: WHERE Clause

The WHERE clause allows you to filter the results from an SQL statement - select, insert, update, or delete statement.
It is difficult to explain the basic syntax for the WHERE clause, so instead, we'll take a look at some examples.

Example #1
SELECT *
FROM supplier
WHERE supplier_name = 'IBM';
In this first example, we've used the WHERE clause to filter our results from the supplier table. The SQL statement above would return all rows from the supplier table where the supplier_name is IBM. Because the * is used in the select, all fields from the supplier table would appear in the result set.

Example #2
SELECT supplier_id
FROM supplier
WHERE supplier_name = 'IBM'
or supplier_city = 'Newark';
We can define a WHERE clause with multiple conditions. This SQL statement would return all supplier_id values where the supplier_name is IBM or the supplier_city is Newark.

Example #3
SELECT supplier.supplier_name, orders.order_id
FROM supplier, orders
WHERE supplier.supplier_id = orders.supplier_id
and supplier.supplier_city = 'Atlantic City';
We can also use the WHERE clause to join multiple tables together in a single SQL statement. This SQL statement would return all supplier names and order_ids where there is a matching record in the supplier and orders tables based on supplier_id, and where the supplier_city is Atlantic City.
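Here is a runnable sketch of the three WHERE examples, assuming Python's sqlite3 module as the database and hypothetical supplier and orders rows (the supplier "Boardwalk Co" is invented). ORDER BY clauses are added only to make the output order predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER, supplier_name TEXT, supplier_city TEXT);
    CREATE TABLE orders (order_id INTEGER, supplier_id INTEGER);
    INSERT INTO supplier VALUES (1, 'IBM', 'Armonk'), (2, 'Acme', 'Newark'),
                                (3, 'Boardwalk Co', 'Atlantic City');
    INSERT INTO orders VALUES (10, 1), (11, 3), (12, 3);
""")

# Example #1: a single condition.
ibm_rows = conn.execute(
    "SELECT * FROM supplier WHERE supplier_name = 'IBM'"
).fetchall()

# Example #2: two conditions combined with OR.
ids = conn.execute(
    "SELECT supplier_id FROM supplier "
    "WHERE supplier_name = 'IBM' OR supplier_city = 'Newark' ORDER BY supplier_id"
).fetchall()

# Example #3: a join condition plus a filter condition.
atlantic_orders = conn.execute(
    "SELECT supplier.supplier_name, orders.order_id FROM supplier, orders "
    "WHERE supplier.supplier_id = orders.supplier_id "
    "AND supplier.supplier_city = 'Atlantic City' ORDER BY orders.order_id"
).fetchall()

print(atlantic_orders)  # [('Boardwalk Co', 11), ('Boardwalk Co', 12)]
```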

SQL: "AND" Condition

The AND condition allows you to create an SQL statement based on 2 or more conditions being met. It can be used in any valid SQL statement - select, insert, update, or delete.
The syntax for the AND condition is:
SELECT columns
FROM tables
WHERE column1 = 'value1'
and column2 = 'value2';
The AND condition requires that every condition be met for the record to be included in the result set. In this case, column1 has to equal 'value1' and column2 has to equal 'value2'.

Example #1
The first example that we'll take a look at is a very simple use of the AND condition.
SELECT *
FROM supplier
WHERE city = 'New York'
and type = 'PC Manufacturer';
This would return all suppliers that reside in New York and are PC Manufacturers. Because the * is used in the select, all fields from the supplier table would appear in the result set.

Example #2
Our next example demonstrates how the AND condition can be used to "join" multiple tables in an SQL statement.
SELECT orders.order_id, supplier.supplier_name
FROM supplier, orders
WHERE supplier.supplier_id = orders.supplier_id
and supplier.supplier_name = 'IBM';
This would return all rows where the supplier_name is IBM, with the supplier and orders tables joined on supplier_id. (The table is named orders rather than order because ORDER is a reserved word in SQL.) You will notice that all of the fields are prefixed with the table names (ie: orders.order_id). The prefix is required for any field name that exists in both tables, such as supplier_id, to eliminate ambiguity as to which field is being referenced; for the other fields it is optional but makes the statement easier to read.
In this case, the result set would only display the order_id and supplier_name fields (as listed in the first part of the select statement).
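A sketch of both AND examples follows, assuming Python's sqlite3 module and hypothetical rows (the suppliers "Zenith" and "Bodega" are invented). The table is named orders because ORDER is a reserved word; the last statement shows that using it unquoted as a table name fails with a syntax error.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER, supplier_name TEXT,
                           city TEXT, type TEXT);
    CREATE TABLE orders (order_id INTEGER, supplier_id INTEGER);
    INSERT INTO supplier VALUES (1, 'IBM',    'New York', 'PC Manufacturer'),
                                (2, 'Zenith', 'Chicago',  'PC Manufacturer'),
                                (3, 'Bodega', 'New York', 'Grocer');
    INSERT INTO orders VALUES (100, 1), (101, 2), (102, 1);
""")

# Example #1: both conditions must hold, so only supplier 1 qualifies.
ny_pc = conn.execute(
    "SELECT supplier_id FROM supplier "
    "WHERE city = 'New York' AND type = 'PC Manufacturer'"
).fetchall()

# Example #2: AND combines the join condition with a filter.
ibm_orders = conn.execute(
    "SELECT orders.order_id, supplier.supplier_name FROM supplier, orders "
    "WHERE supplier.supplier_id = orders.supplier_id "
    "AND supplier.supplier_name = 'IBM' ORDER BY orders.order_id"
).fetchall()

# The reserved word ORDER cannot be used as an unquoted table name.
try:
    conn.execute("SELECT order.order_id FROM supplier, order")
    order_rejected = False
except sqlite3.OperationalError:
    order_rejected = True

print(ny_pc, ibm_orders, order_rejected)
```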
SQL: "OR" Condition

The OR condition allows you to create an SQL statement where records are returned when any one of the conditions is met. It can be used in any valid SQL statement - select, insert, update, or delete.

The syntax for the OR condition is:
SELECT columns
FROM tables
WHERE column1 = 'value1'
or column2 = 'value2';
The OR condition requires that at least one of the conditions be met for the record to be included in the result set. In this case, column1 has to equal 'value1' OR column2 has to equal 'value2'.

Example #1
The first example that we'll take a look at is a very simple use of the OR condition.
SELECT *
FROM supplier
WHERE city = 'New York'
or city = 'Newark';
This would return all suppliers that reside in either New York or Newark. Because the * is used in the select, all fields from the supplier table would appear in the result set.

Example #2
The next example takes a look at three conditions. If any of these conditions is met, the record will be included in the result set.
For example:
SELECT supplier_id
FROM supplier
WHERE name = 'IBM'
or name = 'Hewlett Packard'
or name = 'Gateway';
This SQL statement would return all supplier_id values where the supplier's name is either IBM, Hewlett Packard or Gateway.
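Both OR examples can be run in the sketch below, which assumes Python's sqlite3 module and hypothetical supplier rows (the cities and the suppliers "Acme" and "Zeta" are invented). A row is returned as soon as any one condition matches.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE supplier (supplier_id INTEGER, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO supplier VALUES (?, ?, ?)",
    [(1, "IBM", "Armonk"), (2, "Hewlett Packard", "Palo Alto"),
     (3, "Gateway", "Newark"), (4, "Acme", "New York"), (5, "Zeta", "Boston")],
)

# Example #1: either city qualifies.
ny_or_newark = conn.execute(
    "SELECT name FROM supplier WHERE city = 'New York' OR city = 'Newark' "
    "ORDER BY name"
).fetchall()

# Example #2: any one of three names qualifies.
ids = conn.execute(
    "SELECT supplier_id FROM supplier "
    "WHERE name = 'IBM' OR name = 'Hewlett Packard' OR name = 'Gateway' "
    "ORDER BY supplier_id"
).fetchall()

print(ny_or_newark)  # [('Acme',), ('Gateway',)]
print(ids)           # [(1,), (2,), (3,)]
```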


Tuesday, November 4, 2008

Business Objects FAQs

1. What is the difference between thin client & thick client?
The thin client is the browser-based version, whereas the thick client is the desktop-based version. The thick client offers more functions and formatting options.

Desktop Intelligence is a full client with a 2-tier architecture, whereas Web-I has a 3-tier architecture with the Enterprise server in between.

Desktop-I and Web-I differ in some syntax.
E.g.: objects are enclosed in [] in Web-I and in <> in Deski.

Also, scheduling can be done directly in Web-I (XI R2), whereas additional software is needed to schedule Deski reports.

You can view Deski reports in Web-I, but you cannot view Web-I reports in Deski.

However, Deski reports can be scheduled via Web-I.

WebI: 3-tier architecture, also known as the thin client. In BO XI R2, the merge option and Edit SQL are available. Reports can be scheduled directly, and they can be generated not only as corporate documents but also in other formats (Excel, PDF, ...). The hide option is not available.

Deski: 2-tier architecture, also known as the full client. The hide, Edit SQL, and rank options are available. Desktop Intelligence reports are dynamic. Desktop Intelligence is a Windows-based tool and must be installed on every PC, whereas WebI is a web-based tool and can be accessed anywhere through Internet Explorer.

Crystal Reports are static and page-wise.

InfoView: WebI is part of InfoView. It generates Java-based reports when you open them through InfoView.

2. In BO XI R2 we have the following tools:
Import Wizard
Report Conversion Tool
Repository Migration Wizard

3. Multipass SQL:
Multipass breaks one large SQL statement into multiple SQL statements. If you are using a star schema with two or more fact tables and you enable this feature, BO will automatically generate two or more SQL statements (i.e. one SQL statement for each fact table whose objects are used in the report). The results are then synchronised in the report.
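Why one statement per fact table helps can be sketched with two hypothetical fact tables, sales and refunds, that share a product dimension. Joining both facts in a single statement repeats each sales row once per matching refunds row, which inflates the sums; running one query per fact table and then combining the results on the shared dimension, roughly what the multipass feature is described as doing, gives the correct totals. Python's sqlite3 module and all table names are assumptions, and this is not the SQL that BO itself generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product (product_id INTEGER, name TEXT);
    CREATE TABLE sales   (product_id INTEGER, amount INTEGER);
    CREATE TABLE refunds (product_id INTEGER, amount INTEGER);
    INSERT INTO product VALUES (1, 'Cone'), (2, 'Cup');
    INSERT INTO sales   VALUES (1, 10), (1, 20), (2, 5);
    INSERT INTO refunds VALUES (1, 1), (1, 2), (2, 1);
""")

# One statement joining both fact tables: each sales row is repeated once per
# matching refunds row, so both sums are inflated for 'Cone'.
single_pass = conn.execute("""
    SELECT p.name, SUM(s.amount), SUM(r.amount)
    FROM product p
    JOIN sales s   ON s.product_id = p.product_id
    JOIN refunds r ON r.product_id = p.product_id
    GROUP BY p.name ORDER BY p.name
""").fetchall()

# Multipass style: one statement per fact table ...
sales = dict(conn.execute(
    "SELECT p.name, SUM(s.amount) FROM product p "
    "JOIN sales s ON s.product_id = p.product_id GROUP BY p.name"))
refunds = dict(conn.execute(
    "SELECT p.name, SUM(r.amount) FROM product p "
    "JOIN refunds r ON r.product_id = p.product_id GROUP BY p.name"))

# ... then the results are synchronised on the shared dimension (product name).
multi_pass = {name: (sales.get(name, 0), refunds.get(name, 0))
              for name in sorted(set(sales) | set(refunds))}

print(single_pass)  # [('Cone', 60, 6), ('Cup', 5, 1)]
print(multi_pass)   # {'Cone': (30, 3), 'Cup': (5, 1)}
```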

4. What are isolated joins in Check Integrity?
An isolated join is a join that is not included in any of your contexts, which is why Check Integrity reports it as an error.

Solution:
First, find all the joins that you left out of your contexts, then add each of them to whichever context you think is appropriate.

5. Migration Process:
To migrate from BO 6.5 to XI R2:

a) Open the migration wizard.
b) Select your source location (here, give your BO 6.5 document).
c) Click Next.
d) Select your destination location (your BO XI R2 environment).
e) Select the users: admin or specific users.
f) Click Next.
g) Click OK.

6. What is the difference between a break and a section?

A break removes duplicates, but a section cannot do the same.
A break displays data within the block, whereas a section appears outside the grid.
When you apply an arithmetic operation such as sum or count on a break, you can see the result for each individual group and a total for all the groups at the bottom.

In a section, the operation is performed only on each individual section.

7. What is a master-detail report?
A master-detail report displays the results section-wise. It splits large blocks of data into sections and minimizes repeating values. It can also include subtotals.

For example, if a report contains Country, Store, and Sales, you can turn it into a master-detail report by country: drag Country and drop it when the cursor shows the text 'Drop here to create a section'. You can then see the data country-wise.

8. How big is your team?

In my environment there are 3000+ business end users, 4 report developers, two universe designers, and one administrator.

9. Is it possible to create reports without a universe?
Yes, by using personal data files and free-hand SQL. This is possible only in Deski reports, not in WebI.

11. How do you solve #MULTIVALUE, #SYNTAX, and #ERROR? I want a complete, practical solution process.
#MULTIVALUE: this error occurs in 3 ways:
1) #MULTIVALUE in an aggregation
2) #MULTIVALUE in a break header or footer
3) #MULTIVALUE at section level.

1: This error occurs when the output context does not include the input context.
Ex: a report has Year and City dimensions and a Revenue measure.
= In
When the query runs with the above condition, the Revenue column shows a #MULTIVALUE error.

Solution: click Formula Bar in the View menu, select the cell containing the error, and edit the formula to the condition below.
= In(,) In
With this formula, the correct data appears in the report.
Note: by default, the above condition uses the Sum aggregate function.

#SYNTAX:
This error occurs when a variable used in the formula no longer exists in the condition.

Ex: *
When the above condition runs, this error occurs.

Solution: click Edit Data Provider --> add the new object that is needed --> select the error cell --> edit the formula --> click OK.

#ERROR:
This error occurs when a variable in the formula is incorrect.

Solution: go to the Data menu --> click Variables, select the cell containing the error --> copy the formula using the Edit menu --> paste it into a new cell --> open the Formula Bar from the View menu --> --> take the first cell containing an error --> edit the formula --> repeat the above steps.

12. What is the difference between a variable and a formula?
Whenever we execute a formula, its result is stored in a variable.

13. What is meant by the incompatible objects error at report level?

When the contexts are not properly defined, we get the error "Incompatible combination of objects".

14. In a report I want to fetch data from 2 data providers. Which condition must be satisfied to link the 2 data providers?
Ex: Q1 has columns A, B, C and Q2 has columns X, Y, Z. The requirement is to get all the columns from those 2 tables at report level, i.e. A, B, C, X, Y, Z in a single report.

In BO XI R2 it is possible. If you have a base universe and derived universes, I think your requirement can be solved by using the "combined query" option and selecting the "union" operator. Alternatively, in WebI, the "MERGE" option makes it possible.

Otherwise, your requirement is not possible, because your column names do not match.

In Deski, select "Data Manager" --> click the "Link to" option to make it possible.

15. What is meant by the ForEach and ForAll functions? In which cases do we use them in BO?

ForEach adds the listed objects to the calculation context,
ForAll removes the listed objects from the calculation context.
We use ForAll for summary purposes and ForEach for detail purposes.

16. How can we improve the performance of reports and universes?
By creating summary tables and using aggregate awareness.

17. In XI R2, how do you send reports to end users?
You can send reports to any user via the scheduling options for a report. The report will then run as per the scheduled options and when successful, it will send a copy to the user's email address or inbox (in BO), depending on the options selected.

18. Which is the best way to resolve loops: contexts or aliases?
If the schema has only one lookup loop, you can create an alias. When it has multiple lookups, you can create contexts.
In most cases contexts are used to resolve loops, because the loops tend to be more complex and the schema in your universe design often contains 2 or more fact tables.
If there is more than one fact table, use a context; if there is only one fact table, use an alias.
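The effect of an alias can be sketched at the SQL level. In this hypothetical schema (all names invented, Python's sqlite3 module assumed), one COUNTRY lookup table is joined to both CUSTOMER and SUPPLIER, forming a loop with ORDERS. Joining a single COUNTRY table to both forces the two countries to be equal and silently drops rows; naming COUNTRY twice, roughly what a universe alias produces in the generated SQL, breaks the loop.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE country  (country_id INTEGER, name TEXT);
    CREATE TABLE customer (customer_id INTEGER, country_id INTEGER);
    CREATE TABLE supplier (supplier_id INTEGER, country_id INTEGER);
    CREATE TABLE orders   (order_id INTEGER, customer_id INTEGER, supplier_id INTEGER);
    INSERT INTO country  VALUES (1, 'USA'), (2, 'France');
    INSERT INTO customer VALUES (1, 1);
    INSERT INTO supplier VALUES (1, 1), (2, 2);
    INSERT INTO orders   VALUES (100, 1, 1), (101, 1, 2);
""")

# The loop: one COUNTRY table joined to both CUSTOMER and SUPPLIER forces the
# two countries to be equal, silently dropping order 101.
looped = conn.execute("""
    SELECT o.order_id, country.name
    FROM orders o, customer c, supplier s, country
    WHERE o.customer_id = c.customer_id
      AND o.supplier_id = s.supplier_id
      AND c.country_id = country.country_id
      AND s.country_id = country.country_id
    ORDER BY o.order_id
""").fetchall()

# The alias fix: COUNTRY appears twice under different names.
aliased = conn.execute("""
    SELECT o.order_id, cc.name AS customer_country, sc.name AS supplier_country
    FROM orders o, customer c, supplier s, country cc, country sc
    WHERE o.customer_id = c.customer_id
      AND o.supplier_id = s.supplier_id
      AND c.country_id = cc.country_id
      AND s.country_id = sc.country_id
    ORDER BY o.order_id
""").fetchall()

print(looped)   # [(100, 'USA')]
print(aliased)  # [(100, 'USA', 'USA'), (101, 'USA', 'France')]
```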

19. How do we link 2 universes?
The linking can be done at report level by linking data providers. We can link the dimensions and measures of two different universes with 2 different connections by linking the data providers built upon them.

20. What types of joins does a universe support?
Inner join and outer joins: left outer, right outer, and full outer join.
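The join types differ in which unmatched rows they keep. The sketch below, assuming Python's sqlite3 module and hypothetical tables, compares an inner join with a left outer join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE supplier (supplier_id INTEGER, supplier_name TEXT);
    CREATE TABLE orders (order_id INTEGER, supplier_id INTEGER);
    INSERT INTO supplier VALUES (1, 'IBM'), (2, 'Acme');
    INSERT INTO orders VALUES (10, 1), (11, 3);
""")

# Inner join: only rows with a match on both sides.
inner = conn.execute(
    "SELECT s.supplier_name, o.order_id FROM supplier s "
    "INNER JOIN orders o ON o.supplier_id = s.supplier_id"
).fetchall()

# Left outer join: every supplier, with NULL (None) where no order matches.
left = conn.execute(
    "SELECT s.supplier_name, o.order_id FROM supplier s "
    "LEFT OUTER JOIN orders o ON o.supplier_id = s.supplier_id "
    "ORDER BY s.supplier_name"
).fetchall()

print(inner)  # [('IBM', 10)]
print(left)   # [('Acme', None), ('IBM', 10)]
```

A right outer join would instead keep order 11 (whose supplier 3 does not exist), and a full outer join would keep both unmatched rows. Those two are left out of the sketch because SQLite only supports them from version 3.39, which older Python builds may not bundle.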


Monday, November 3, 2008

Desktop Intelligence vs. WebIntelligence XI R2

Entering Deski/Webi:
For Deski:
Wizard: Universe vs. Other Data Source
4 wizard options (cell, table, crosstab, chart)
Many Microsoft formatting toolbars
For Webi:
Universes (Or OLAP) Only
No personal data files (Excel, XML, etc)
No real wizard
Limited Microsoft formatting toolbars
Interactive Mode: Can Enter By accident

Query Panel:
For Deski: Data Tab
When editing query, does add new objects to the report
Radio button for display of classes and objects or predefined conditions
Button For: Save & Close/View/Run/Cancel
View Button for look at data and other functions
Add Query From Report Manager Window
Right Click in white area in Data Section
Insert New Data Wizard pops up
Report Manager: Click radio button to sort by data provider
Edit only 1 query at a time
User Objects can be created
View SQL

For Webi: Data Tab
When editing existing query, does NOT add in to the report
Edit Query/Edit Report Icon
Properties tab for queries
Predefined conditions integrated together with classes and objects
Run Query Button on top (Only 1 option)
Can selectively run only 1 instead of all queries (Refresh too)
No View Button
No statistics/view data options
Can hide the Query Filter Box
Add Query Button (To open up another query panel)
Creates a Query Tab in Query Window
Has mini speed menu for those Tabs
Report Manager: Click down arrow to sort by query
Can click on query tab to edit directly (jump around)
No regular templates option
No User Objects capability
View SQL now available
Scope of Analysis Option (Click On/Off)
Appears on bottom of query panel (Below Query Filters Box)
Creating Query Filters (Conditions) more convenient: List of Operators and some Operand settings displayed within Query Filter-Builder.
No ‘Show List of prompts’ choice in Query Filters.
(Properties?) Tab next to Data Tab has box for changing retrieval record limit or retrieval time.

Report Manager:
For Deski:
Slice & Dice Panel
Format Templates
No drag and drop templates
Microsoft Formatting Toolbars
No Report Filter Window
Drilling: Must Grab All dimensions down path, or use scope of analysis

For Webi:
No Slice & Dice Panel
“Templates” Option (Drag and Drop)
No Format Templates
No Query on Query/Subquery Calc
No Grouping (Clip Icon)
No hide Objects
No Count All
No Fold option
Dragging/Dropping within Report Window very easy.
Can drag objects directly from Results Object window to Query Filters
No personal LOVs
Limited Microsoft Formatting Toolbars
Right Click on Edge of Report: Turn To Option
4 Report Options + 1 Full Chart Options as well
Report Filter Window Option (Appears on top of display)
To Remove Calcs: Drag Off or Structure Mode or Right Click/Remove Row or Column
Custom Sorts: But less sorting options
Breaks: Less Property Options
Appear on left side via properties tab (Must drill down)
Ranking: But less property options
Properties Tab on Left:
Have to click on option to see pull-downs
Contexts now different
Prompting options far more powerful and easy to use
Formulas/Variables:
Includes most Deski functions now
IF is a Function (Not a command): Like Excel
Display Format: More Difficult
Tabs on Left: Data/Functions/Operators
Formula on Right/Bottom
Name/Definition on Right/Top
Operators list remains fixed
Subquery Done Via Toolbar Option (Not in conditions)
Linking Multiple Data Providers: Merge Dimensions
New Toolbar Option
Easy to Use Menu
Drilling: Will Drill via New Query to lower level
Snapshot more limited


Tuesday, September 16, 2008

UNDERSTANDING ENTITIES - CHAPTER 2

What is an Entity?

An entity is a physical representation of a logical grouping of data. Entities can be tangible, real things, such as a PERSON or ICE CREAM, or intangible concepts, such as a COST CENTER or MARKET. Entities do not represent single things. Instead, they represent collections of instances that contain the information of interest for all instances or occurrences. For example, a PERSON entity represents instances of things of type Person. Gabriel De Angelies, R.J. Golcher, Jessica Corter, and Venessa Westley are examples of specific instances of PERSON. A specific instance of an entity is represented by a row and is identified by a primary key.

An entity has the following characteristics:
• It has a name and description.
• It represents a class, rather than a single instance of a concept.
• It has the ability to uniquely identify each specific instance.
• It contains a logical grouping of attributes representing the information of interest to the enterprise.


Formal Entity Definitions

The following list contains entity definitions from some of the most influential leaders in data modeling. Notice the similarities:
• Chen (1976): “A thing which can be distinctly identified.”
• Date (1986): “Any distinguishable object that is to be represented in the database.”
• Finkelstein (1989): “A data entity represents some ‘thing’ that is to be stored for later reference. The term entity refers to the logical representation of data.”

Defining Entity Types

Within the independent and dependent entities are entity types:
• Core entities -- These are sometimes called primary or prime entities. They represent the important objects about which the enterprise is interested in keeping data.
• Code/reference/classification entities -- These entities contain rows that define the set of values, or domain, for an attribute.
• Associative entities -- These entities are used to resolve many-to-many relationships.
• Subtype entities -- These entities come in two types, exclusive and inclusive.

Core Entity

Core entities are the most important objects about which an enterprise is interested in keeping data. They are often referred to as prime, principal, or primary entities. Because these entities are so important, it is likely that they are used elsewhere in the enterprise. Take the time to look for similar entities because there are many opportunities for the reuse of core entities. Core entities should be modeled consistently throughout the enterprise. Good modelers consider this an essential best practice.


Figure 2.1
Note the straight corners of the independent entities STORE and ICE CREAM and the rounded corners of the dependent entity STORE ICE CREAM.

A core entity can be an independent entity or a dependent entity. Figure 2.1 provides examples of core entities for an enterprise that sells ice cream. ICE CREAM represents the base products sold by the enterprise. STORE is an example of a distribution channel, or the vehicle through which a product is sold.

Consider that the enterprise is doing well and has decided to add another STORE. The model requires no change to support the addition of a new instance of STORE. It is simply another row added to the STORE entity. The same applies to ICE CREAM.

Notice the core entities ICE CREAM and STORE. Although the example may seem straightforward, it illustrates a powerful concept regarding the modeling of core entities.

Understanding how to model core entities as scalable and extensible containers of information requires the modeler to think about the entity as an abstract concept and to model the information independently of the way it is used today. In this example, model ICE CREAM completely outside the context of STORE and vice versa. So, if the enterprise decides to sell ICE CREAM using an additional channel, such as the Internet or door-to-door, the new channel can be added without disturbing other entities.

Code Entity

Code entities are always independent entities. They are often referred to as reference, classification, or type entities, depending on the methodology. The unique instances represented by code entities define the domain of values for attributes present in other entities. You might be tempted to use a single attribute in a code table. It is a best practice to include at least three attributes in a code entity: an identifier, a name (sometimes called a short name), and a description.

In Figure 2.2, TOPPING is an independent entity; note the sharp corners. TOPPING is also a code or classification entity. The instances (or rows) of TOPPING define the list of toppings available.





Figure 2.2
Code entities allow an enterprise to define a set of values for consistent use throughout the enterprise. The instances of a code entity define a domain of values for use elsewhere in the model.

Code entities usually contain a limited number of attributes. I have seen instances where these entities contain only a single attribute. I prefer to model code entities with an artificial identifier. Using an artificial identifier, along with a name and description, allows new kinds of TOPPING to be added as instances (rows) in the entity. Note that TOPPING contains three attributes.

I often refer to code entities as corporate business objects. The name, corporate business objects, indicates that the entities are defined and shared at a corporate level, not by a single application, system, or business unit. These entities are often shared by many databases to allow consistent roll-up reporting or trending analysis.
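A physical version of the TOPPING code entity might look like the sketch below. It is only an illustration under stated assumptions: Python's sqlite3 module stands in for the target database, the column names are invented, and sundae_order is a hypothetical table that draws its values from TOPPING. It shows that a new topping is just a new row, and that a foreign key keeps other tables inside the domain the code entity defines.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- Code entity: artificial identifier, name and description.
    CREATE TABLE topping (
        topping_id          INTEGER PRIMARY KEY,
        topping_name        TEXT NOT NULL UNIQUE,
        topping_description TEXT
    );
    -- Hypothetical table that takes its topping values from the code entity.
    CREATE TABLE sundae_order (
        sundae_order_id INTEGER PRIMARY KEY,
        topping_id      INTEGER NOT NULL REFERENCES topping (topping_id)
    );
    INSERT INTO topping VALUES (1, 'Sprinkles', 'Rainbow sugar sprinkles'),
                               (2, 'Hot Fudge', 'Warm chocolate sauce');
""")

# A new kind of topping is just another row; the model itself does not change.
conn.execute("INSERT INTO topping VALUES (3, 'Caramel', 'Salted caramel sauce')")
conn.execute("INSERT INTO sundae_order VALUES (1, 3)")

# A value outside the domain defined by TOPPING is rejected.
try:
    conn.execute("INSERT INTO sundae_order VALUES (2, 99)")
    out_of_domain_rejected = False
except sqlite3.IntegrityError:
    out_of_domain_rejected = True

topping_count = conn.execute("SELECT COUNT(*) FROM topping").fetchone()[0]
print(topping_count, out_of_domain_rejected)  # 3 True
```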

Associative entity

Associative entities are entities that contain the primary key from two or more other entities. Associative entities are always dependent entities. They are used to resolve many-to-many relationships between other entities. Many-to-many relationships are those in which many instances of one entity are related to many instances of another. Associative entities allow us to model the intersection between the instances of the two entities, thereby allowing each instance in the associative entity to be unique.

Note

Many-to-many relationships cannot be implemented directly in a physical database. ERwin will automatically create an associative entity to resolve a many-to-many relationship when the model is changed from logical to physical mode.

Figure 2.1 uses an associative entity to resolve a many-to-many relationship between STORE and ICE CREAM. The addition of an associative entity allows the same ICE CREAM to be sold in many instances of STORE, while not requiring every STORE to sell the same ICE CREAM. The associative entity STORE ICE CREAM resolves the fact that an instance of STORE sells many instances of ICE CREAM and an instance of ICE CREAM is sold by many instances of STORE.
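The STORE / ICE CREAM example can be sketched as physical tables. This assumes Python's sqlite3 module and invented column names and rows; the point is that STORE ICE CREAM carries the keys of both parents, and its composite primary key makes each pairing unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store     (store_id INTEGER PRIMARY KEY, store_name TEXT);
    CREATE TABLE ice_cream (ice_cream_id INTEGER PRIMARY KEY, ice_cream_name TEXT);
    -- Associative entity: its primary key is made of both parents' keys.
    CREATE TABLE store_ice_cream (
        store_id     INTEGER NOT NULL REFERENCES store (store_id),
        ice_cream_id INTEGER NOT NULL REFERENCES ice_cream (ice_cream_id),
        PRIMARY KEY (store_id, ice_cream_id)
    );
    INSERT INTO store VALUES (1, 'Main Street'), (2, 'Harbor');
    INSERT INTO ice_cream VALUES (1, 'Vanilla'), (2, 'Chocolate'), (3, 'Mint');
    -- Vanilla is sold in both stores; Chocolate only on Main Street, Mint only at Harbor.
    INSERT INTO store_ice_cream VALUES (1, 1), (1, 2), (2, 1), (2, 3);
""")

def names(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# One ICE CREAM sold by many instances of STORE ...
vanilla_stores = names("""
    SELECT s.store_name FROM store s
    JOIN store_ice_cream sic ON sic.store_id = s.store_id
    WHERE sic.ice_cream_id = 1 ORDER BY s.store_name
""")
# ... and one STORE selling many instances of ICE CREAM.
harbor_flavors = names("""
    SELECT ic.ice_cream_name FROM ice_cream ic
    JOIN store_ice_cream sic ON sic.ice_cream_id = ic.ice_cream_id
    WHERE sic.store_id = 2 ORDER BY ic.ice_cream_name
""")

# The composite key keeps each STORE / ICE CREAM pairing unique.
try:
    conn.execute("INSERT INTO store_ice_cream VALUES (1, 1)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(vanilla_stores, harbor_flavors)  # ['Harbor', 'Main Street'] ['Mint', 'Vanilla']
```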


Subtype Entity

Subtype entities are always dependent entities. You should use subtype entities when it makes sense to keep different sets of attributes for the instances of an entity. Finkelstein refers to subtype entities as secondary entities. Subtype entities almost always have one or more “sibling” entities. The subtype entity siblings are related to a parent entity through a special relationship that is either exclusive or inclusive.

Note

Subtype sibling entities that have an exclusive relationship to the parent entity indicate that only one sibling has an instance for each instance of the parent entity. Exclusive subtypes represent an “is a” relationship.
Subtype sibling entities that have an inclusive relationship to the parent entity indicate that more than one sibling can have an instance for each instance of the parent entity.

Figure 2.3 shows the CONTAINER entity and the subtype entities CONE and CUP. The ice cream store apparently does not sell ice cream in bulk, only single servings. Note that an instance of CONTAINER must be either a CONE or a CUP. A CONTAINER cannot be both a CONE and a CUP. This is an exclusive subtype.

In Figure 2.3, the PERSON entity has two subtypes, EMPLOYEE and CUSTOMER. Note that an exclusive subtype would not allow a single instance of PERSON to contain facts common to both an EMPLOYEE and a CUSTOMER. A VENDOR can also be a CUSTOMER. These are examples of inclusive subtypes.






Figure 2.3
Two examples of subtype entities, PERSON and CONTAINER. Both use ERwin IE notation to represent exclusive and inclusive subtypes. The (X) in the subtype symbol of CONTAINER indicates exclusive. The absence of the (X) in the subtype symbol indicates inclusive.
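One common way to enforce an exclusive subtype such as CONTAINER / CONE / CUP in a physical database is a discriminator column on the parent plus a composite foreign key from each subtype. This is an assumption; the chapter does not prescribe a physical design. The sketch uses Python's sqlite3 module, and the columns cone_style and cup_size are invented. An inclusive subtype like PERSON / EMPLOYEE / CUSTOMER would simply drop the discriminator so one parent row can appear in several subtype tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    -- Parent entity with a discriminator saying which subtype each row is.
    CREATE TABLE container (
        container_id   INTEGER PRIMARY KEY,
        container_type TEXT NOT NULL CHECK (container_type IN ('CONE', 'CUP')),
        UNIQUE (container_id, container_type)
    );
    -- Each subtype can only point at parent rows carrying its own type.
    CREATE TABLE cone (
        container_id   INTEGER PRIMARY KEY,
        container_type TEXT NOT NULL DEFAULT 'CONE' CHECK (container_type = 'CONE'),
        cone_style     TEXT,
        FOREIGN KEY (container_id, container_type)
            REFERENCES container (container_id, container_type)
    );
    CREATE TABLE cup (
        container_id   INTEGER PRIMARY KEY,
        container_type TEXT NOT NULL DEFAULT 'CUP' CHECK (container_type = 'CUP'),
        cup_size       TEXT,
        FOREIGN KEY (container_id, container_type)
            REFERENCES container (container_id, container_type)
    );
    INSERT INTO container VALUES (1, 'CONE'), (2, 'CUP');
    INSERT INTO cone (container_id, cone_style) VALUES (1, 'Waffle');
    INSERT INTO cup  (container_id, cup_size)   VALUES (2, 'Small');
""")

# Container 1 is a CONE, so it cannot also be recorded as a CUP.
try:
    conn.execute("INSERT INTO cup (container_id, cup_size) VALUES (1, 'Large')")
    both_allowed = True
except sqlite3.IntegrityError:
    both_allowed = False

print(both_allowed)  # False
```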


Structure Entity

Sometimes, instances of the same entity are related. In his 1992 book Strategic Systems Development, Clive Finkelstein proposes the use of a structure entity to represent relationships between instances of an entity. Relationships between instances of an entity are called recursive relationships. Recursive relationships are a logical concept, a concept sometimes difficult for users to grasp.

Figure 2.4 shows the addition of a structure entity that allows a relationship between instances of EMPLOYEE. The diagram shows that the EMPLOYEE subtype of the PERSON entity has two subtypes, SERVER and MANAGER. The EMPLOYEE STRUCTURE entity represents the relationship between instances of EMPLOYEE.




Figure 2.4
Structure entity illustrates Clive Finkelstein’s resolution for recursive relationships.
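An EMPLOYEE STRUCTURE entity can be sketched as a table whose two columns both point back at EMPLOYEE. The sketch assumes Python's sqlite3 module, invented names and rows, and a recursive query (WITH RECURSIVE, supported by SQLite) to follow the relationship to any depth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER PRIMARY KEY, employee_name TEXT);
    -- Structure entity: both columns point back at EMPLOYEE.
    CREATE TABLE employee_structure (
        manager_id INTEGER NOT NULL REFERENCES employee (employee_id),
        report_id  INTEGER NOT NULL REFERENCES employee (employee_id),
        PRIMARY KEY (manager_id, report_id)
    );
    INSERT INTO employee VALUES (1, 'Ana'), (2, 'Ben'), (3, 'Cal'), (4, 'Dia');
    -- Ana manages Ben; Ben manages Cal and Dia.
    INSERT INTO employee_structure VALUES (1, 2), (2, 3), (2, 4);
""")

# Direct reports: one hop through the structure entity.
bens_reports = [row[0] for row in conn.execute("""
    SELECT e.employee_name
    FROM employee_structure es JOIN employee e ON e.employee_id = es.report_id
    WHERE es.manager_id = 2 ORDER BY e.employee_name
""")]

# Everyone below Ana at any depth. UNION (rather than UNION ALL) discards
# rows already seen, so the recursion stops even if the data ever loops.
everyone_under_ana = [row[0] for row in conn.execute("""
    WITH RECURSIVE chain(employee_id) AS (
        SELECT report_id FROM employee_structure WHERE manager_id = 1
        UNION
        SELECT es.report_id
        FROM employee_structure es JOIN chain ON es.manager_id = chain.employee_id
    )
    SELECT e.employee_name FROM employee e JOIN chain ON e.employee_id = chain.employee_id
    ORDER BY e.employee_name
""")]

print(bens_reports)        # ['Cal', 'Dia']
print(everyone_under_ana)  # ['Ben', 'Cal', 'Dia']
```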

Naming Entities

The name assigned to an entity should be indicative of the instances of the entity. The name should be understood and accepted across the enterprise. When selecting a name, keep an enterprise view and take care to use a name that reflects how the data is used throughout the entire enterprise, not just a single area. Use names that are meaningful to the user community and domain experts.

I hope you have a set of naming conventions that were developed for use in the enterprise, or an enterprise data model, to guide you. Using naming conventions ensures that names are constructed consistently across the enterprise, regardless of who constructs the name. The following sections provide a starter set of naming conventions and give examples of good and bad names.

Entity Naming Conventions

Naming conventions might not seem important if you work in a small organization with a small set of users. However, in a large organization with many development teams and many users, naming conventions greatly facilitate communication and data sharing. As a best practice, you should develop and maintain naming conventions in a central location and then document and publish them for the whole enterprise.

I include some pointers for beginning a good set of naming conventions, just in case your organization has not yet developed one:

• An entity name should be as descriptive as necessary. Use single-word names only when the name is a widely accepted concept. Consider using noun phrases.

• An entity name should be a singular noun or noun phrase. Use PERSON instead of PERSONS or PEOPLE, or CONTAINER instead of CONTAINERS.

• An entity name should be unique. Using the same entity name to contain different data, or a different entity name to contain the same data, is needlessly confusing to developers and users alike.

• An entity name should be indicative of the data that will be contained for each instance.

• An entity name should not contain special characters (such as !, @, #, $, %, ^, &, *, and so on) or show possession (PERSON’S ICE CREAM).

• An entity name should not include acronyms or abbreviations unless they are part of the accepted naming conventions.

I encourage modelers to use good naming conventions if they are available and, if they are not, to develop them following these guidelines.


Sunday, September 14, 2008

DATA MODELING CONCEPTS CHAPTER 1



The Role of Data Modeling

Data modeling tasks provide the most benefit when performed early in the development lifecycle. The model provides information critical to understanding the scope of a project for iterative development phases. Beginning the implementation phase without a clear understanding of the data requirements might cause your project to incur costly overruns or end up on the scrap heap.

An Introduction to Project Development

Many publications discuss project development, and this text does not cover this subject in detail. I included this section to assist modelers in understanding the role of data modeling in project development and to provide an understanding of when modeling should occur.

Most companies follow a methodology that defines the development lifecycle and guides the development process. To some degree, most methodologies follow the same sequence of high-level phases:

1. Problem definition
2. Requirements analysis
3. Conceptual design
4. Detail design
5. Implementation
6. Testing

This development method is generally referred to as the waterfall method. As you can see in Figure 1.1, each phase is completed before moving to the next, creating a “waterfall” effect.


Figure 1.1
The waterfall method of project development. Note that the results of each phase cascade into the next.

Many projects are developed using iterations or phases. An iterative development approach decreases risk by breaking the project into discrete, manageable phases. Each phase includes analysis, detail design, implementation, and testing. Subsequent phases build upon and leverage the functionality of the preceding phase. However, within each phase, the waterfall method applies.

As with most engineering projects, you create a data model by following a set of steps:
1. Problem and scope definition
2. Requirements gathering
3. Analysis
4. Logical data model creation
5. Physical data model creation
6. Database creation

Figure 1.2 illustrates how each step provides input for the next.



Figure 1.2
Logical data model creation can occur prior to selecting a database platform (Oracle, DB2, Sybase, and so on). ERwin can provide support for specific physical properties if the physical data model is produced after the database platform is selected.

Problem and Scope Definition

Begin logical data modeling by defining the problem. This step is sometimes referred to as writing a mission or scoping statement. The problem definition can be a simple paragraph, or it can be a complex document that outlines a series of business objectives. The problem definition defines the scope, or boundary, of the data model, much the way a survey defines property boundaries.

Gathering Information Requirements

Most industry experts agree that the most critical task in a development project is an accurate and complete definition of the requirements. An incomplete or inaccurate understanding of requirements can cause expensive rework and significant delays.

Gathering information requirements is the act of discovering and documenting the information necessary to identify and define the entities, attributes, and business rules for the logical model. There are two well-recognized methods for gathering requirements: facilitated sessions and interviews. Most development methodologies recommend facilitated sessions. The sections that follow provide high-level guidelines for gathering information requirements using facilitated sessions. A later exercise demonstrates how to use the information gathered to create a data model using ERwin.

Analysis

You must analyze and research the data requirements and business rules to produce a complete logical model. Analysis tasks should provide accurate and complete definitions for all entities, attributes, and relationships. Metadata, data about the data, is collected and documented during the analysis phase.

The analysis can be performed by the modeler or by a business analyst. Either the modeler or the business analyst works with users to document how they intend to use the data. These tasks drive out the corporate business objects needed to support the information requirements. Corporate business objects are also called code, reference, or classification data structures. This is also the opportunity to document the code values that will be used. You should carefully document any derived data (data that is created by manipulating or combining one or more other data elements), along with the data elements used in the derivation.
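Documenting derived data can be as simple as recording, for each derived element, its derivation rule and its source elements, and then checking that every source element is itself documented. The Python sketch below assumes a hypothetical DataElement record and hypothetical element names; it is not a feature of ERwin or of any particular metadata repository.

```python
from dataclasses import dataclass, field


@dataclass
class DataElement:
    """One entry in a hypothetical data dictionary kept during analysis."""
    name: str
    definition: str
    derivation_rule: str = ""  # left empty for base (non-derived) data
    source_elements: list[str] = field(default_factory=list)


def build_dictionary(*elements: DataElement) -> dict[str, DataElement]:
    return {element.name: element for element in elements}


def undocumented_sources(dictionary: dict[str, DataElement]) -> list[tuple[str, str]]:
    """Return (derived element, source element) pairs whose source is not documented."""
    missing = []
    for element in dictionary.values():
        for source in element.source_elements:
            if source not in dictionary:
                missing.append((element.name, source))
    return missing


# Illustrative entries; the names and rule are hypothetical.
quantity = DataElement("ORDER LINE QUANTITY", "Number of units ordered on the line.")
unit_price = DataElement("PRODUCT UNIT PRICE", "Current list price of one unit.")
extended = DataElement(
    "ORDER LINE EXTENDED PRICE",
    "Total price of the order line before discounts.",
    derivation_rule="ORDER LINE QUANTITY * PRODUCT UNIT PRICE",
    source_elements=["ORDER LINE QUANTITY", "PRODUCT UNIT PRICE"],
)

complete = build_dictionary(quantity, unit_price, extended)
incomplete = build_dictionary(quantity, extended)  # PRODUCT UNIT PRICE never documented
```

Calling undocumented_sources(incomplete) reports the gap, while the complete dictionary passes.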

Logical Data Model

A logical data model is a visual representation of data structures, data attributes, and business rules. The logical model represents data in a way that business users can easily understand. The logical model design should be independent of platform and implementation-language requirements, and of how the data will be used.

The modeler uses the data requirements and the results of analysis to produce the logical data model. The modeler also resolves the logical model to third normal form and validates it against the enterprise data model, if one is available. Later sections describe a complete logical model, explain how to resolve a logical model to third normal form, give an overview of an enterprise model, and offer some tips on validating a logical model against an enterprise model.

After you compare the logical model and enterprise data model and make any necessary changes, it is important to review the model for accuracy and completeness. The best practice includes a peer review as well as a review with the business partners and development team.


Entities

Entities represent the things about which the enterprise is interested in keeping data. An entity can be a tangible object such as a person or a book, but it can also be conceptual, such as a cost center or business unit. Entities are nouns and are expressed in singular form (CUSTOMER as opposed to CUSTOMERS) for clarity and consistency.

You should describe an entity using factual particulars that make it uniquely identifiable. Each instance of an entity must be separate and clearly identifiable from all other instances of that entity. For example, a data model to store information about customers must have a way of distinguishing one customer from another.
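In a physical database, the particulars that identify each instance usually become a primary key. This minimal sketch uses Python's built-in sqlite3 module with a hypothetical customer table: two customers may share a name, but the database rejects a second row with the same customer_id.

```python
import sqlite3


def create_customer_entity(conn: sqlite3.Connection) -> None:
    # Singular entity name; customer_id is the identifying particular.
    conn.execute(
        """
        CREATE TABLE customer (
            customer_id   INTEGER PRIMARY KEY,
            customer_name TEXT NOT NULL
        )
        """
    )


def insert_customer(conn: sqlite3.Connection, customer_id: int, name: str) -> bool:
    """Insert one CUSTOMER instance; return False if its identifier is taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO customer (customer_id, customer_name) VALUES (?, ?)",
                (customer_id, name),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
create_customer_entity(conn)
insert_customer(conn, 1, "Acme Supply")
insert_customer(conn, 2, "Acme Supply")  # same name, but a distinct instance
```

Here the name alone could not distinguish the two customers; the identifier does.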

Figure 1.3 provides some examples of entities.




Figure 1.3
Here are examples of using ERwin to display entities in their simplest form.


Attributes

Attributes represent the data the enterprise is interested in keeping about objects. Attributes are nouns that describe the characteristics of entities.



Relationships

Relationships represent the associations between the objects about which the enterprise is interested in keeping data. A relationship is expressed as a verb or verb phrase that describes the association. Figure 1.5 provides some examples using ERwin’s Information Engineering (IE) notation to represent relationships.
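A relationship in the logical model is commonly implemented as a foreign key in the physical model. In this sketch (Python's sqlite3 module, hypothetical table and column names), CUSTOMER places ORDER: each order must refer to an existing customer, and a customer may place many orders. The table is named customer_order because ORDER is an SQL reserved word. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(
    """
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_name TEXT NOT NULL
    );
    -- CUSTOMER places CUSTOMER_ORDER: every order refers to exactly one
    -- existing customer; a customer may place zero, one, or many orders.
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customer (customer_id),
        order_date  TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Acme Supply');
    """
)


def place_order(conn: sqlite3.Connection, order_id: int, customer_id: int, order_date: str) -> bool:
    """Record an order; return False if the customer does not exist."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO customer_order VALUES (?, ?, ?)",
                (order_id, customer_id, order_date),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def orders_placed_by(conn: sqlite3.Connection, customer_id: int) -> list[int]:
    rows = conn.execute(
        "SELECT order_id FROM customer_order WHERE customer_id = ? ORDER BY order_id",
        (customer_id,),
    )
    return [row[0] for row in rows]


place_order(conn, 100, 1, "2008-09-01")
place_order(conn, 101, 1, "2008-09-02")
```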

Normalization

Normalization is the act of moving attributes to appropriate entities to satisfy the normal forms. Normalization is usually presented as a set of complex statements that make it seem like a complicated concept. Actually, normalization is quite straightforward: “One fact in one place,” as C. J. Date stated in his 1999 book An Introduction to Database Systems. Normalizing data means you design the data structures in such a way as to remove redundancy and limit unrelated structures.
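To make "one fact in one place" concrete, this sketch (Python's sqlite3 module, hypothetical tables and data) stores a customer's city two ways. In the redundant design, the city is repeated on every order, so an update that misses one row leaves contradictory facts; in the normalized design, the city is stored once and a single-row update keeps it consistent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Redundant design: the customer's city is repeated on every order row.
    CREATE TABLE order_flat (
        order_id      INTEGER PRIMARY KEY,
        customer_id   INTEGER,
        customer_city TEXT
    );
    INSERT INTO order_flat VALUES (100, 1, 'Newark'), (101, 1, 'Newark');

    -- Normalized design: the city is one fact, stored once on CUSTOMER.
    CREATE TABLE customer (
        customer_id   INTEGER PRIMARY KEY,
        customer_city TEXT
    );
    CREATE TABLE customer_order (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer (customer_id)
    );
    INSERT INTO customer VALUES (1, 'Newark');
    INSERT INTO customer_order VALUES (100, 1), (101, 1);
    """
)

with conn:
    # The customer moves. An update that touches only one order row in the
    # redundant design leaves two contradictory cities on record...
    conn.execute("UPDATE order_flat SET customer_city = 'Trenton' WHERE order_id = 100")
    # ...while the normalized design needs only a single-row update.
    conn.execute("UPDATE customer SET customer_city = 'Trenton' WHERE customer_id = 1")

flat_cities = {
    row[0]
    for row in conn.execute("SELECT customer_city FROM order_flat WHERE customer_id = 1")
}
normalized_cities = {
    row[0]
    for row in conn.execute(
        """
        SELECT c.customer_city
        FROM customer_order o JOIN customer c ON c.customer_id = o.customer_id
        WHERE o.customer_id = 1
        """
    )
}
```

After the updates, flat_cities holds both Newark and Trenton for the same customer, while normalized_cities holds only Trenton.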

Five normal forms are widely accepted in the industry. The forms are simply named first normal form, second normal form, third normal form, fourth normal form, and fifth normal form. In practice, many logical models are only resolved to third normal form.

Formal Definitions of Normal Forms

The following normal form definitions might seem intimidating; just consider them formulas for achieving normalization. Normal forms are based on relational algebra and should be interpreted as mathematical functions.

Business Normal Forms

In his 1992 book Strategic Systems Development, Clive Finkelstein takes a different approach to normalization. He defines business normal forms in terms of the resolution to those forms. Many modelers, myself included, find this business approach more intuitive and practical.

First business normal form (1BNF) removes repeating groups to another entity. This entity takes its name, and its primary (compound) key attributes, from the original entity and from the repeating group.
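The following Python sketch applies 1BNF to a hypothetical EMPLOYEE entity that stores skills in the repeating columns skill_1 through skill_3. The repeating group moves to a new EMPLOYEE SKILL entity whose compound key combines the original key (employee_id) with the value from the repeating group (skill_name). The function, column names, and data are illustrative assumptions.

```python
def remove_repeating_group(
    rows: list[dict], key: str, group_columns: list[str], group_attribute: str
) -> tuple[list[dict], list[dict]]:
    """Apply 1BNF to a list of dict rows.

    Returns (base_rows, group_rows): base_rows keep every column except the
    repeating group; group_rows hold one row per non-null repeating value,
    identified by the original key plus that value (a compound key).
    """
    group_set = set(group_columns)
    base_rows, group_rows = [], []
    for row in rows:
        base_rows.append({col: val for col, val in row.items() if col not in group_set})
        for col in group_columns:  # preserve the original column order
            if row.get(col) is not None:
                group_rows.append({key: row[key], group_attribute: row[col]})
    return base_rows, group_rows


employees = [
    {"employee_id": 1, "employee_name": "Ann", "skill_1": "SQL", "skill_2": "ERwin", "skill_3": None},
    {"employee_id": 2, "employee_name": "Raj", "skill_1": "Java", "skill_2": None, "skill_3": None},
]
employee, employee_skill = remove_repeating_group(
    employees, "employee_id", ["skill_1", "skill_2", "skill_3"], "skill_name"
)
```

Besides removing the empty slots, the new entity no longer caps an employee at three skills.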

Second business normal form (2BNF) removes attributes that are partially dependent on the primary key to another entity. The primary (compound) key of this entity is the primary key of the entity in which it originally resided, together with all additional keys on which the attribute is wholly dependent.
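Finding 2BNF candidates means asking which non-key attributes depend on only part of a compound key. The sketch below tests this against sample rows of a hypothetical ORDER LINE entity keyed by (order_id, product_id). Sample data can only disprove a dependency, never prove one, so treat the output as candidates to confirm against the business rules. The sketch stops at detection and does not build the new entity.

```python
from itertools import combinations


def holds(rows: list[dict], lhs: tuple[str, ...], rhs: str) -> bool:
    """True if, in these sample rows, equal lhs values always pair with one rhs value."""
    seen = {}
    for row in rows:
        determinant = tuple(row[col] for col in lhs)
        if seen.setdefault(determinant, row[rhs]) != row[rhs]:
            return False
    return True


def partial_dependencies(
    rows: list[dict], key: tuple[str, ...], attributes: list[str]
) -> dict[str, tuple[str, ...]]:
    """Map each attribute to the smallest proper subset of the key that determines it."""
    found = {}
    for attribute in attributes:
        for size in range(1, len(key)):  # proper subsets only
            subset = next(
                (s for s in combinations(key, size) if holds(rows, s, attribute)), None
            )
            if subset is not None:
                found[attribute] = subset
                break
    return found


order_lines = [
    {"order_id": 100, "product_id": "P1", "quantity": 5, "product_name": "Bolt"},
    {"order_id": 100, "product_id": "P2", "quantity": 2, "product_name": "Nut"},
    {"order_id": 101, "product_id": "P1", "quantity": 7, "product_name": "Bolt"},
]
candidates = partial_dependencies(
    order_lines, ("order_id", "product_id"), ["quantity", "product_name"]
)
```

In this sample, product_name depends on product_id alone, so it is a candidate to move out of ORDER LINE, while quantity needs the whole key.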

Third business normal form (3BNF) removes attributes that are not dependent at all on the primary key to another entity where they are wholly dependent on the primary key of that entity.
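As a conventional illustration of 3BNF, this sqlite3 sketch uses a hypothetical EMPLOYEE table in which department_name depends on department_id rather than on the employee key. The attribute moves to a DEPARTMENT entity, where it is wholly dependent on that entity's primary key, and joining the two entities reproduces the original rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Before: department_name depends on department_id, not on employee_id.
    CREATE TABLE employee_before (
        employee_id     INTEGER PRIMARY KEY,
        employee_name   TEXT,
        department_id   INTEGER,
        department_name TEXT
    );
    INSERT INTO employee_before VALUES
        (1, 'Ann', 10, 'Finance'),
        (2, 'Raj', 10, 'Finance'),
        (3, 'Lee', 20, 'Shipping');

    -- After: department_name lives on DEPARTMENT, keyed by department_id.
    CREATE TABLE department (
        department_id   INTEGER PRIMARY KEY,
        department_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id   INTEGER PRIMARY KEY,
        employee_name TEXT,
        department_id INTEGER REFERENCES department (department_id)
    );
    INSERT INTO department
        SELECT DISTINCT department_id, department_name FROM employee_before;
    INSERT INTO employee
        SELECT employee_id, employee_name, department_id FROM employee_before;
    """
)

# Joining the two entities should reproduce the original rows exactly.
rejoined = conn.execute(
    """
    SELECT e.employee_id, e.employee_name, d.department_id, d.department_name
    FROM employee e JOIN department d ON d.department_id = e.department_id
    ORDER BY e.employee_id
    """
).fetchall()
original = conn.execute("SELECT * FROM employee_before ORDER BY employee_id").fetchall()
```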

Fourth business normal form (4BNF) removes attributes that are dependent on the values of the primary key or that are optional to a secondary entity where they wholly depend on the value of the primary key or where they must (it is mandatory) exist in that entity.

Fifth business normal form (5BNF) exists as a structure entity if recursive or other associations exist between occurrences of secondary entities, or if recursive associations exist between occurrences of their principal entity.
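A structure entity is easiest to see with a recursive association such as a bill of materials, where one PART occurrence is a component of another. In this sketch (sqlite3, hypothetical names and data), PART_STRUCTURE resolves that recursive association, and a recursive common table expression walks it to list every component of an assembly at any depth.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE part (
        part_id   TEXT PRIMARY KEY,
        part_name TEXT NOT NULL
    );
    -- Structure entity: each row associates one PART occurrence (the
    -- assembly) with another PART occurrence (one of its components).
    CREATE TABLE part_structure (
        assembly_part_id  TEXT REFERENCES part (part_id),
        component_part_id TEXT REFERENCES part (part_id),
        quantity          INTEGER NOT NULL,
        PRIMARY KEY (assembly_part_id, component_part_id)
    );
    INSERT INTO part VALUES
        ('BIKE', 'Bicycle'), ('WHEEL', 'Wheel'), ('SPOKE', 'Spoke'), ('FRAME', 'Frame');
    INSERT INTO part_structure VALUES
        ('BIKE', 'WHEEL', 2), ('BIKE', 'FRAME', 1), ('WHEEL', 'SPOKE', 32);
    """
)


def all_components(conn: sqlite3.Connection, part_id: str) -> list[str]:
    """Walk the recursive association to list every component of a part."""
    rows = conn.execute(
        """
        WITH RECURSIVE parts_of(component_part_id) AS (
            SELECT component_part_id FROM part_structure WHERE assembly_part_id = ?
            UNION
            SELECT ps.component_part_id
            FROM part_structure ps
            JOIN parts_of p ON ps.assembly_part_id = p.component_part_id
        )
        SELECT component_part_id FROM parts_of ORDER BY component_part_id
        """,
        (part_id,),
    )
    return [row[0] for row in rows]
```

Because every assembly-component pair is a row in PART_STRUCTURE, a part can appear in many assemblies without duplicating its PART data.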

A Complete Logical Data Model

A complete logical model should be in third business normal form and include all entities, attributes, and relationships required to support the data requirements and the business rules associated with the data.

