
Monday 1 September 2014

The following are the different phases involved in an ETL project development life cycle.

1) Requirement Gathering
a) Business Requirement Collection (BRD)
--> Business requirement gathering is carried out by the business analyst, the onsite technical lead, and the client's business users.
--> In this phase, the business analyst prepares the business requirement document (BRD), also called the business requirement specification (BRS).
--> BR collection takes place at the client location.
--> The outputs of business requirement analysis are:
--> BRS: business requirement specification
--> SRS: system requirement specification
b) System Requirement Collection (SRS)
Senior technical people or the ETL architect prepare the SRS, which contains the software and hardware details.
The SRS includes:
a) the OS to be used (Windows or Unix)
b) the RDBMS required to build the database (Oracle, Teradata, etc.)
c) the ETL tools required (Informatica, DataStage)
d) the OLAP tools required (Cognos, BO)
The SRS is also called the technical requirement specification (TRS).

2) Design Phase
a) High Level Design document (HLD)
An ETL architect and a DWH architect participate in designing the solution to build the DWH.
The HLD document is prepared based on the business requirements.
b) Low Level Design document (LLD)
Based on the HLD, a senior ETL developer prepares the low level design document.
The LLD contains the more technical details of the ETL system.
The LLD contains the data flow diagram (DFD) and details of the sources and targets of each mapping.
The LLD also contains information about full and incremental loads.
After the LLD, the development phase starts.
3) Development Phase
a) Mapping design
--> Based on the LLD, the ETL team creates the mappings (ETL code).
--> After the mappings are designed, the code (mappings) is reviewed by the developers.
b) Code review
--> Code review is done by a developer.
--> In code review, the developer reviews the code and the logic, but not the data.
--> The following activities take place in code review:
--> check the naming standards of the transformations, mappings, etc.
--> check the source-to-target mapping (whether the correct logic is placed in the mapping or not).
c) Peer review
--> The code is reviewed by a team member (a third-party developer).
4) Testing
a) Unit testing
--> A unit test for the DWH is a white box test; it checks the ETL procedures and mappings.
--> The following test cases can be executed by an ETL developer:
1) verify that there is no data loss
2) compare the number of records in the source and target
3) data load / insert
4) data load / update
5) incremental load
6) data accuracy
7) verify naming standards
8) verify column mapping
--> The unit test is carried out by the ETL developer in the development phase.
--> The ETL developer also has to do the data validations in this phase; a couple of these checks are sketched in SQL below.
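For example, test cases 1 and 2 can be validated with plain SQL; the table names src_customer and dim_customer below are hypothetical stand-ins for an actual source and target:

    -- Test case 2: record counts in the source and target should match
    SELECT (SELECT COUNT(*) FROM src_customer) AS source_count,
           (SELECT COUNT(*) FROM dim_customer) AS target_count
    FROM   dual;

    -- Test case 1: verify data loss -- source keys missing from the target
    SELECT s.customer_id
    FROM   src_customer s
    WHERE  NOT EXISTS (SELECT 1
                       FROM   dim_customer t
                       WHERE  t.customer_id = s.customer_id);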

b) Development integration testing
--> Run all the mappings in sequence order.
--> First run the source-to-stage mappings.
--> Then run the mappings related to the dimensions and facts.
c) System integration testing
--> After the development phase, we move the code to the QA environment.
--> In this environment, the testing people are given read-only permissions.
--> They test all the workflows.
--> They test the code according to their own standards.
d) User acceptance testing (UAT)
--> This test is carried out in the presence of client-side technical users to verify the data migration from source to destination.
5) Pre-production
6) Production (go-live)

Regards,
Venkat

Could anyone please tell me how to explain an Informatica project in an interview?

What are your daily routines?
            Checking whether any important mails have come; discussing with the team lead if there is any work to do; sending a mail to the team lead about the work by EOD; attending the weekly status meeting.

How many mappings have you created altogether in your project?

      As of now I have done 3 projects. First project: 48 workflows; second: 52 workflows; third project: 156 workflows so far.

In which account does your project fall?
          Account? I believe it is manufacturing and advanced services.

What is your reporting hierarchy?

         me --> team lead --> project manager --> programming manager
         me --> AR --> HR

How many complex mappings have you created? Could you please describe the situation for which you developed that complex mapping?
          I think 8. One was a character-by-character comparison (not possible with Informatica alone), so I wrote a single SQL query of about 950 lines; one was for dynamic hierarchy distribution; and one was for multi-byte characters (Japanese, Chinese, etc.), and so on.
What is your involvement in performance tuning of your project?
   Sometimes; in any case, the performance team takes care of my code.
What is the schema of your project? And why did you opt for that particular schema?
     some_Prd --> that is the source for us.

What are your roles in this project?

          As a developer: designing the workflows, unit testing, and so on.


Can I have one situation you handled by which performance improved dramatically?
         Yes; in my first project I faced a performance issue after go-live.

Were you involved in more than two projects simultaneously?
         Of course; I was involved in 3 projects at a time.

Do you have any experience in production support?

         No.

What kinds of testing have you done on your project (unit, integration, system, or UAT)? And were enhancements done after testing?
        Unit testing and sometimes integration testing. UAT is done by the business, not by the BI team.
How many dimension tables are there in your project and how are they linked to the fact table?

         In the current project, 18 dimensions and 2 facts. They are always linked to the fact table through the dimension keys.


How do we do the fact load?
           Loading the fact table? It runs after the dimension load completes, as sketched below.
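As a rough SQL sketch of such a fact load (all table and column names here are hypothetical), the stage rows look up the surrogate keys of the already-loaded dimensions:

    -- Fact load sketch: runs only after the dimension loads complete,
    -- because it needs the dimension surrogate keys
    INSERT INTO fact_sales (customer_key, product_key, date_key, amount)
    SELECT c.customer_key,
           p.product_key,
           d.date_key,
           s.amount
    FROM   stg_sales    s
    JOIN   dim_customer c ON c.customer_id = s.customer_id
    JOIN   dim_product  p ON p.product_id  = s.product_id
    JOIN   dim_date     d ON d.cal_date    = s.sale_date;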

How did you implement CDC in your project?

        Change data capture? It is always based on the source modification date, as sketched below.
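A minimal sketch of that kind of CDC extract, assuming a hypothetical src_orders table with a last_modified_date column and an Informatica mapping parameter $$LAST_RUN_DATE holding the previous successful run time:

    -- Incremental (CDC) extract: pick up only rows modified since the last run
    SELECT *
    FROM   src_orders
    WHERE  last_modified_date > TO_DATE('$$LAST_RUN_DATE', 'YYYY-MM-DD HH24:MI:SS');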


What does your file-to-load mapping look like?

            source --> ODS --> flat file (.dat), cap delimiter

What does your load-to-stage mapping look like?

         It depends on the project:
          source --> ODS --> stage

What does your stage-to-ODS mapping look like?

          Stage to ODS? Never.

What is the size of your data warehouse?
           10 TB
What is your daily feed size and weekly feed size?
        Feed size?
Which approach (top-down or bottom-up) was used in building your project?
            Bottom-up. I assume you are asking about dimension-to-fact versus fact-to-dimension?
How do you access your sources (are they flat files or relational)?

             Relational, and sometimes flat files as well.

Have you developed any stored procedures or triggers in this project? How did you use them and in which situation?

            No.

Did your project go live? What are the issues that you faced while moving your project from the test environment to the production environment?

         Yes; I faced some issues.

What is the biggest challenge that you encountered in this project?

        Dynamic hierarchy data distribution, and moving the files from the Unix box to the Informatica directory through a shell script; the data had to be cleansed in the flat file itself.


What is the scheduler tool you have used in this project? How did you schedule jobs using it?

          Dollar Universe, by using the $U session task.

Friday 8 August 2014

Slowly Changing Dimensions(SCD) full Description

Slowly Changing Dimensions: Slowly changing dimensions are dimensions in which the data changes slowly, rather than changing regularly on a time basis.
Let's say I have a customer dimension with mainly these columns:
(Customer Id, Customer First Name, Customer Last Name, Customer Country)
Customer Id | Customer First Name | Customer Last Name | Customer Country
1           | Sudheer             | Sharma             | India

Now, this customer has moved to the US. In the source the country has been changed to US, and we need to update our target dimension to reflect this change.
SCD Type 1: The new incoming record (the changed/modified data) replaces the existing old record in the target.
Customer Id | Customer First Name | Customer Last Name | Customer Country
1           | Sudheer             | Sharma             | US

The old value (India) is overwritten by the new value (US), and there is no way to find the old version of the data; the table holds only the current version.
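A minimal SQL sketch of a Type 1 load (Oracle MERGE syntax; the table names src_customer and dim_customer are hypothetical):

    -- SCD Type 1: overwrite in place, keep no history
    MERGE INTO dim_customer t
    USING src_customer s
    ON (t.customer_id = s.customer_id)
    WHEN MATCHED THEN UPDATE SET
         t.customer_country = s.customer_country
    WHEN NOT MATCHED THEN INSERT
         (customer_id, customer_first_name, customer_last_name, customer_country)
         VALUES (s.customer_id, s.customer_first_name, s.customer_last_name, s.customer_country);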

SCD Type 2: In this case, an additional record is added to the customer dimension. The beauty of this approach is that it maintains two versions: you will find two records, the older version and the current version. In other words, it maintains history. We can implement Type 2 by the following methods:
1. Versioning
2. Effective dates
3. Setting current-flag values / record indicators
Method 1: Versioning

Customer Id | Customer First Name | Customer Last Name | Customer Country | Effective Year | Version
1           | Sudheer             | Sharma             | India            | 2008           | 0
1           | Sudheer             | Sharma             | US               | 2009           | 1

Method 2: Effective Dates

Customer Id | Customer First Name | Customer Last Name | Customer Country | Effective Start Date | Effective End Date
1           | Sudheer             | Sharma             | India            | 01/01/2008           | 12/31/2008
1           | Sudheer             | Sharma             | US               | 01/01/2009           | till date

Method 3: Effective Dates & Current Record Indicators

Customer Id | Customer First Name | Customer Last Name | Customer Country | Effective Start Date | Effective End Date | Current Record IND
1           | Sudheer             | Sharma             | India            | 01/01/2008           | 12/31/2008         | N
1           | Sudheer             | Sharma             | US               | 01/01/2009           | till date          | Y
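A minimal SQL sketch of a method 3 load for this change (the dim_customer table is hypothetical; the "till date" sentinel is represented as a far-future date):

    -- Step 1: close out the current version of the row
    UPDATE dim_customer
    SET    effective_end_date = DATE '2008-12-31',
           current_record_ind = 'N'
    WHERE  customer_id = 1
    AND    current_record_ind = 'Y';

    -- Step 2: insert the new version as the current record
    INSERT INTO dim_customer
           (customer_id, customer_first_name, customer_last_name, customer_country,
            effective_start_date, effective_end_date, current_record_ind)
    VALUES (1, 'Sudheer', 'Sharma', 'US',
            DATE '2009-01-01', DATE '9999-12-31', 'Y');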

SCD Type 3: In this approach, only the information about the previous value of a dimension attribute is written into the database. An 'old' or 'previous' column is created to store the immediately previous value of the attribute.
Product ID | Product Name | Current Year | Current Price | Previous Year | Previous Price
1          | Close-up     | 2008         | 50.00         | 2007          | 45.00


The problem with this approach is that if the product price changes continuously over the years, the complete history is not stored; only the latest change is kept. For example, if in 2009 the price changes to 60, we would no longer be able to see the 2007 price, since the old values would have been overwritten with the 2008 information.
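The 2009 price change from the example would be applied like this in a Type 3 table (column names follow the hypothetical table above):

    -- SCD Type 3: shift current values into the "previous" columns,
    -- then overwrite them; only one prior version survives
    UPDATE dim_product
    SET    previous_year  = current_year,
           previous_price = current_price,
           current_year   = 2009,
           current_price  = 60.00
    WHERE  product_id = 1;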

Top 30 Informatica Interview Questions and Answers

1.What is Data Warehousing?
Ans:
Data warehouses are widely used within the largest and most complex businesses in the world.
Usage within moderately large organizations, even those with more than 1,000 employees, remains surprisingly low at the moment.
We are confident that use of this technology will grow dramatically in the next few years. In challenging times, good decision-making becomes critical.
The best decisions are made when all the relevant data available is taken into consideration.
The best possible source for that data is a well-designed data warehouse.
To make any new decision or to introduce a new plan, data warehousing is very important.
*ETL is one of the main processes in data warehousing. ETL means extract, transform, and load data into the data warehouse.
Informatica is an ETL tool. It is very flexible and cheaper compared with other ETL tools.

2.What is Business Intelligence (BI)?
Ans:
BI is an abbreviation of the two words Business Intelligence: bringing the right information at the right time to the right people in the right format.
It is a 5-step process to run your business smarter, starting with registering the right data correctly, collecting the data from multiple sources,
then transforming, combining, and storing it in a data warehouse.

3.What is a Dimension Table?
Ans:
A dimension table stores attributes, or dimensions, that describe the objects in a fact table.
A dimension table has a primary key column that uniquely identifies each dimension record (row).
Types of Dimensions:
 * Slowly Changing Dimensions
 * Rapidly Changing Dimensions
 * Junk Dimensions
 * Inferred Dimensions
 * Conformed Dimensions
 etc.

4.What is a Fact Table?
Ans:

A fact table stores quantitative information for analysis and is often denormalized.

A fact table works with dimension tables. The fact table holds the data to be analyzed, and the dimension tables store data about the ways in which that data can be analyzed. Thus, the fact table consists of two types of columns: the foreign key columns allow joins with the dimension tables, and the measure columns contain the data that is being analyzed.
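To illustrate the two column types, a typical star query sums a measure from the fact table while grouping by an attribute from a joined dimension (table and column names here are hypothetical):

    -- Measures come from the fact table; the descriptive attribute and the
    -- grouping come from the dimension joined via the foreign key
    SELECT d.customer_country,
           SUM(f.amount) AS total_amount
    FROM   fact_sales   f
    JOIN   dim_customer d ON d.customer_key = f.customer_key
    GROUP  BY d.customer_country;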

5.What are the Fundamental Stages of Data Warehousing?
Ans:
i. Offline Operational Databases: data warehouses in this initial stage are developed by simply copying the database of an operational system to an offline server, where the processing load of reporting does not impact the operational system's performance.
ii. Offline Data Warehouse: data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly, or monthly) from the operational systems, and the data is stored in an integrated, reporting-oriented data structure.
iii. Real Time Data Warehouse: data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order, a delivery, a booking, etc.).
iv. Integrated Data Warehouse: data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.

8.What are the Different Methods of Loading Dimension tables?
Ans:
i. Conventional load: before loading the data, all the table constraints are checked against the data.
ii. Direct load (faster loading): all the constraints are disabled and the data is loaded directly. Later, the data is checked against the table constraints, and the bad data is not indexed.

9.What is Data Mining?
Ans:
Data mining is the practice of automatically searching large stores of data to discover patterns and trends that go beyond simple analysis.
Data mining uses sophisticated mathematical algorithms to segment the data and evaluate the probability of future events.
Data mining is also known as Knowledge Discovery in Data (KDD).

The key properties of data mining are:

Automatic discovery of patterns

Prediction of likely outcomes

Creation of actionable information

Focus on large data sets and databases

Data mining can answer questions that cannot be addressed through simple query and reporting techniques.

10.What is the Difference between a View and a Materialized View?
Ans:
A view is nothing but a saved SQL query: it takes the output of a query and makes it appear like a virtual table, which does not take up any storage space or contain any data.

A materialized view, however, is a schema object that stores the result of a query in a separate schema object (i.e. it takes up storage space and contains data). This means the materialized view returns a physically separate copy of the table data.
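Side by side, the difference shows in the DDL (Oracle syntax; the names v_active_customers and mv_active_customers are hypothetical):

    -- A view stores only the query text; rows are fetched from the
    -- base tables every time the view is queried
    CREATE VIEW v_active_customers AS
    SELECT customer_id, customer_country
    FROM   dim_customer
    WHERE  current_record_ind = 'Y';

    -- A materialized view stores the result rows themselves and
    -- must be refreshed to pick up changes in the base tables
    CREATE MATERIALIZED VIEW mv_active_customers
    REFRESH COMPLETE ON DEMAND
    AS
    SELECT customer_id, customer_country
    FROM   dim_customer
    WHERE  current_record_ind = 'Y';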

11.What is a staging area?
Ans:
The staging area is the place where you hold temporary tables on the data warehouse server.
Staging tables are connected to the work area or to the fact tables. We basically need a staging area to hold the data and to perform data cleansing and merging before loading the data into the warehouse.

12.What is OLAP?
Ans:
OLAP (online analytical processing) enables a user to easily and selectively extract and view data from different points-of-view.

13.What is the Difference between OLTP and OLAP?


Ans:
                       OLTP                                        OLAP
  1. Application:      Operational: ERP, CRM, legacy apps          Management Information System, Decision Support System
  2. Typical users:    Staff                                       Managers, executives
  3. Horizon:          Weeks, months                               Years
  4.                   Normalization is promoted                   Denormalization is promoted
  5.                   On-Line Transaction Processing              On-Line Analytical Processing

14.What is ODS?
Ans:
> ODS - Operational Data Store.
> The ODS comes between the staging area and the data warehouse.
> The ODS is essentially a staging area in which you can keep your OLTP-type data, i.e. your day-to-day transactional data.

15.What is ER Diagram?
Ans:
An entity-relationship diagram is a data modeling technique that creates a graphical representation of the entities,
and the relationships between entities, within an information system.

16.What are Lookup Tables?
Ans:
The Lookup stage performs the same action as the Merge and Join stages do, but here we have a lookup table that shares a key column with the source table. If the key columns in the source match the lookup table, only those rows are passed to the output; the others can be sent to reject.

17.What are Aggregate Tables?
Ans:
An aggregate table contains a summary of the existing warehouse data, grouped accordingly (by month, quarter, etc.) depending on the business needs. For example, the fact table may contain transactions for all customers, but sometimes you want the sum of the amounts for a particular customer over a month; in that case you would have to run the query against a fact table containing millions of rows, which is time-consuming. To avoid this problem, we pre-aggregate the amounts over each month for each customer in an aggregate table; this improves performance, and you can retrieve the result very fast.
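A sketch of the monthly aggregate described above (the fact_sales table and its sale_date column are hypothetical):

    -- Pre-aggregate amount per customer per month so reports no longer
    -- have to scan millions of detail rows in the fact table
    CREATE TABLE agg_sales_monthly AS
    SELECT customer_key,
           TRUNC(sale_date, 'MM') AS sale_month,
           SUM(amount)            AS total_amount
    FROM   fact_sales
    GROUP  BY customer_key, TRUNC(sale_date, 'MM');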

18.What are Conformed Dimensions?
Ans: Conformed dimensions are dimensions which are common across cubes (cubes are the schemas containing fact and dimension tables). Consider Cube-1 containing F1, D1, D2, D3 and Cube-2 containing F2, D1, D2, D4 as the facts and dimensions; here D1 and D2 are the conformed dimensions.

19.How do you Load the Time Dimension?
Ans:
We can use SCD Type 1/2/3 to load any dimension, based on the requirement.

OR

You can load the time dimension manually by writing scripts in PL/SQL to populate the time dimension table with values for a period.
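Instead of a PL/SQL loop, one way to sketch such a manual load is with Oracle's CONNECT BY row generator (the dim_date table and its columns are hypothetical):

    -- Generate one row per day for a year and load the time dimension
    INSERT INTO dim_date (date_key, cal_date, cal_year, cal_month, cal_day)
    SELECT TO_NUMBER(TO_CHAR(d, 'YYYYMMDD')),
           d,
           EXTRACT(YEAR  FROM d),
           EXTRACT(MONTH FROM d),
           EXTRACT(DAY   FROM d)
    FROM  (SELECT DATE '2014-01-01' + LEVEL - 1 AS d
           FROM   dual
           CONNECT BY LEVEL <= 365);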

20.What is a Level of Granularity of a Fact Table?
Ans:
The granularity is the lowest level of information stored in the fact table; the depth of the data level is known as granularity.
In a date dimension, the level of granularity could be year, quarter, month, period, week, or day.

21.What is a Factless Fact Table?
Ans: A factless fact table is a fact table that does not contain any measures; it contains only keys from the different dimension tables. It is often used to resolve a many-to-many cardinality issue.
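For example, a student-attendance factless fact table holds only keys, and the analysis is simply counting rows (all names hypothetical):

    -- No measure columns: the existence of a row is the recorded fact
    SELECT d.cal_month,
           COUNT(*) AS attendance_days
    FROM   fact_attendance f
    JOIN   dim_date d ON d.date_key = f.date_key
    GROUP  BY d.cal_month;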

22.What are Slowly Changing Dimensions (SCD)?
Ans:
Slowly changing dimensions are dimensions in which the data changes slowly, rather than changing regularly on a time basis.

23.What is a Surrogate Key?
Ans: A surrogate key is a system-generated sequence number, an artificial key, which can be used in maintaining the history of the data.

24.What is Junk Dimension?
Ans:
A "junk" dimension is a collection of random transactional codes, flags, and/or text attributes that are unrelated to any particular dimension.

The junk dimension is simply a structure that provides a convenient place to store the junk attributes. A good example would be a trade fact in a company that brokers equity trades.

25.What is a Data Mart?
Ans: A data mart is a part of the data warehouse.

26.What is Schema? & Types of Schema?
Ans:
There are 3 types of schemas:
1.Star Schema
2.Snowflake Schema
3.Galaxy/Integrated/Hybrid/Constellation Schema
1) Star schema: the fact table is centrally located, surrounded by the dimension tables. Here the dimension tables are denormalized and the fact table is normalized. (Or) A star schema is defined as a fact table and two or more dimension tables related through foreign keys.
2) Snowflake schema: the denormalized dimension tables are split into two or more normalized dimension tables. Here both the dimension tables and the fact table are normalized.
3) Hybrid (constellation) schema: a dimension table is shared by two or more fact tables.

27.Explain the Paradigm of Bill Inmon and Ralph Kimball.
Ans:
    Bill Inmon’s paradigm: Data warehouse is one part of the overall business intelligence system. An enterprise has one data warehouse, and data marts source their information from the data warehouse. In the data warehouse, information is stored in 3rd normal form.

    Ralph Kimball’s paradigm: Data warehouse is the conglomerate of all data marts within the enterprise. Information is always stored in the dimensional model.



Thursday 3 July 2014

Top 20 Informatica Interview Questions with Answers-Part-5

1]    HOW CAN YOU STOP A BATCH?
Answer: By using server manager or pmcmd.

2]    WHAT IS A COMMAND THAT USED TO RUN A BATCH?
Answer: pmcmd is used to start a batch.

3]    WHAT IS A DIMENSION TABLE EXACTLY?
Answer: Dimension tables give descriptions about something. For example, if we take Student as a dimension table, we have various attributes like college name, age, gender, etc., which give some description about it.
4]    WHEN DOES THE INFORMATICA SERVER MARK A BATCH AS FAILED?
Answer: If one of the sessions is configured to "run if previous completes" and that previous session fails.
5]  What is Batch in Informatica & types?
  Ans:  Batches provide a way to group sessions for either serial or parallel execution by the
     Informatica  Server. There are two types of batches:
   a) Sequential: Runs sessions one after the other.
    b) Concurrent: Runs sessions at the same time.
 You might create a sequential batch if you have sessions with source-target dependencies that you   want to run in a specific order. You might also create a concurrent batch if you have several independent sessions you need scheduled at the same time. You can place them all in one batch, then schedule the batch as needed instead of scheduling each individual session.
You can create, edit, start, schedule, and stop batches with the Server Manager. However, you cannot copy or abort batches. With pmcmd, you can start and stop batches.

6]    HOW DOES THE SERVER RECOGNISE THE SOURCE AND TARGET DATABASES?
Answer: By using an ODBC connection if the source is relational, and an FTP connection if it is a flat file. We configure the connections for both sources and targets in the properties of the session.
7]    WHAT IS THE LIMIT TO THE NUMBER OF SOURCES AND TARGETS YOU CAN HAVE IN A MAPPING?
Answer: There is no such restriction on the number of sources or targets inside a mapping.

8] Briefly explain the Aggregator transformation?
Ans: It allows one to do aggregate calculations such as sums, averages, etc. Unlike the Expression transformation, it lets one do calculations on groups of rows.

9] Describe Expression transformation?
Ans: Values can be calculated in a single row before writing to the target in this form of transformation. It can be used to perform non-aggregate calculations. Conditional statements can also be tested before the output results go to the target tables.

10]  What do you mean by filter transformation?
Ans: It is a means of filtering rows in a mapping. Data is passed through the filter transformation and the filter condition is applied. All ports of a filter transformation are input/output, and only the rows that meet the condition pass through the filter.

11] What is Joiner transformation?
Ans: The Joiner transformation combines two related heterogeneous sources residing in different locations, while a Source Qualifier transformation can combine data coming from a common source.
12] What is Lookup transformation?
Ans: It is used for looking up data in a relational table through a mapping. A lookup definition can be imported from any relational database to which both the client and server can connect. One can use multiple Lookup transformations in a mapping.

13]  What are the types of metadata that stores in repository?
Ans: The types of metadata includes Source definition, Target definition, Mappings, Mapplet, Transformations.

14]  What is the difference between a database, a data warehouse and a data mart?
Ans: A database contains a set of sensibly related data which is normally small in size compared with a data warehouse. A data warehouse contains assortments of all sorts of data, and data is extracted from it only according to the customer's needs. A data mart, on the other hand, is a set of data designed to cater to the needs of different domains; for instance, an organization may have a separate chunk of data for each of its departments, i.e. sales, finance, marketing, etc.

15]  What is meant by a domain?
Ans: When all related relationships and nodes are covered by a sole organizational point, it is called a domain. Through this, data management can be improved.

16]  What is the difference between a repository server and a powerhouse?
Ans: Repository  server controls the complete repository which includes tables, charts, and various procedures etc. Its main function is to assure the repository  integrity and consistency. While a powerhouse server governs the implementation of various processes among the factors of server’s database repository.

17]  How can one identify whether a mapping is correct or not without connecting a session?
Ans: One can find out whether the mapping is correct or not, without connecting a session, with the help of the debugging option.


Wednesday 2 July 2014

Top 20 Informatica Interview Questions with Answers-Part-4

1]    CAN U TELL ME HOW TO GO FOR SCDs AND THEIR TYPES? WHERE DO WE USE THEM MOSTLY?
Answer: The "Slowly Changing Dimension" problem is a common one particular to data warehousing. In a nutshell, this applies to cases where the attribute for a record varies over time.

2]    WHAT IS MEANT BY EDW?
Answer: EDW is an Enterprise Data Warehouse, i.e. a centralised DW for the whole organization. This is the Inmon approach, which relies on having a single central warehouse.

3]    IF YOU WANT TO CREATE INDEXES AFTER THE LOAD PROCESS WHICH TRANSFORMATION YOU CHOOSE?
Answer: It is usually not done at the mapping (transformation) level but at the session level. Create a command task which executes a shell script (on Unix) or any other script that contains the create index statements.

4]    WHAT IS A WORKLET, WHAT IS THE USE OF A WORKLET, AND IN WHICH SITUATION CAN WE USE IT?
Answer: A set of workflow tasks is called a worklet. Workflow tasks include 1) timer, 2) decision, 3) command, 4) event wait, 5) event raise, 6) email, etc. We use a worklet to reuse such a set of tasks in different situations.

5]    CAN YOU COPY THE BATCHES?
Answer: NO.

6]    WHAT ARE COST BASED AND RULE BASED APPROACHES AND THE DIFFERENCE
Answer: Cost-based and rule-based approaches are optimization techniques used with databases, where we need to optimize a SQL query. Basically, Oracle provides two types of optimizer: cost-based and rule-based.

7]    HOW MANY WAYS YOU CAN UPDATE A RELATIONAL SOURCE DEFINTION AND WHAT R THEY?
Answer: Two ways: 1. Edit the definition. 2. Reimport the definition.
8]    HOW MANY WAYS CAN YOU CREATE PORTS?
Answer: Two ways: 1. Drag the port from another transformation. 2. Click the Add button on the Ports tab.

9]    WHERE ARE THE JOINER AND SOURCE QUALIFIER JOIN CONDITIONS EXECUTED?
Answer: The joiner condition is executed within the Informatica staging layer; the source qualifier join condition is executed at the database level.

10]    WHAT IS A VIEW? HOW IS IT RELATED TO DATA INDEPENDENCE? WHAT ARE THE DIFFERENT TYPES OF VIEWS, AND WHAT IS A MATERIALIZED VIEW?
Answer: A view is a combination of one or more tables. A view does not store the data; it just stores the query. When we execute the view, the query fetches the data from the tables.

11]    WHAT IS MEANT BY JUNK ATTRIBUTE IN INFORMATICA?
Answer: A dimension is called a junk dimension if it contains attributes which are rarely changed or modified. For example, in the banking domain we can fetch four such attributes into a junk dimension.

12]    WHAT IS THE BEST WAY TO SHOW METADATA (NUMBER OF ROWS AT SOURCE, TARGET AND EACH TRANSFORMATION LEVEL, ERROR RELATED DATA) IN A REPORT FORMAT?
Answer: You can select these details from the repository tables; you can use the view REP_SESS_LOG to get these data.

13]    HOW TO JOIN TWO TABLES WITHOUT USING THE JOINER TRANSFORMATION.
Answer: It is possible to join two or more tables by using the Source Qualifier, provided the tables have a relationship. When you drag and drop the tables, you get a source qualifier for each source; the join can then be specified in the source qualifier, as sketched below.
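For instance, the joined query given to the source qualifier (via its SQL override) might look like this, assuming hypothetical orders and customers tables related on customer_id:

    -- The database executes the join, so no Joiner transformation is needed
    SELECT o.order_id,
           o.order_date,
           o.amount,
           c.customer_name
    FROM   orders o,
           customers c
    WHERE  o.customer_id = c.customer_id;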

14]    HOW CAN WE ELIMINATE DUPLICATE ROWS FROM FLAT FILE?
Answer: Use Sorter Transformation. When you configure the Sorter Transformation to treat output rows as distinct, it configures all ports as part of the sort key.

15]    HOW TO GET TWO TARGETS T1 CONTAINING DISTINCT VALUES AND T2 CONTAINING DUPLICATE VALUES FROM ONE SOURCE S1.
Answer: Use a filter transformation for loading the target with no duplicates, and load the other target directly from the source.

16]    WHAT IS THE DIFF B/W STOP & ABORT IN INFORMATICA SESS LEVEL ?
Answer: Stop: we can restart the session. Abort: we cannot restart the session; we should truncate the whole pipeline and then start the session again.

17]    CAN YOU GENERATE REPORTS IN INFORMATICA?
Answer: Yes. By using the Metadata Reporter we can generate reports in Informatica.

18]    HOW DO WE ANALYSE THE DATA AT DATABASE LEVEL?
Answer: Data can be viewed using Informatica's designer tool. If you want to view the data on source/target we can preview the data but with some limitations. We can use data profiling too.

19]    HOW TO RETRIEVE THE RECORDS FROM A REJECTED FILE. EXPLAIN WITH SYNTAX OR EXAMPLE
Answer: During the execution of a workflow, all the rejected rows are stored in bad files (under the directory where your Informatica server is installed, e.g. C:\Program Files\Informatica PowerCenter 7.1\Server). These bad files can be..

20]    WHAT IS MEANT BY COMPLEX MAPPING?
Answer: Complex mapping means a mapping involving more logic and more business rules. Actually, in my bank project I was involved in constructing a data warehouse ..

Top 20 Informatica Interview Questions with Answers-Part-3

1]    HOW TO EXPORT MAPPINGS TO THE PRODUCTION ENVIRONMENT?
Answer: In the Designer, go to the main menu, where one can see the export/import options. Import the exported mapping into the production repository with the replace option.

2]    HOW TO GENERATE THE METADATA REPORTS IN INFORMATICA?
Answer: You can generate PowerCenter Metadata Reporter from a browser on any workstation, even a workstation that does not have PowerCenter tools installed.

3]    IN MY SOURCE TABLE THERE ARE 1000 RECORDS. I WANT TO LOAD RECORDS 501 TO 1000 INTO MY TARGET TABLE. HOW CAN YOU DO THIS?
Answer: You can override the SQL query in the Workflow Manager, like: select * from tab_name where rownum <= 1000 minus select * from tab_name where rownum <= 500; This will work fine. Try it.

4]    CAN U GENERATE REPORTS IN INFORMATICA?
Answer: It is an ETL tool; you cannot make reports from it, but you can generate a metadata report, which is not meant to be used for business analysis.

5]    IF I MAKE ANY MODIFICATIONS TO MY TABLE IN THE BACK END, DO THEY REFLECT IN THE INFORMATICA WAREHOUSE OR MAPPING DESIGNER OR SOURCE ANALYZER?
Answer: NO. Informatica is not at all concerned with the back-end database. It displays all the information that is stored in the repository. If you want back-end changes reflected on the Informatica screens, you have to reimport the definitions.

6]    HOW TO RECOVER THE STANDALONE SESSION?
Answer: A standalone session is a session that is not nested in a batch. If a standalone session fails, you can run recovery using a menu command or pmcmd. These options are not available for batched sessions.

7]    CAN U COPY THE SESSION TO A DIFFERENT FOLDER OR REPOSITORY?
Answer: In addition, you can copy the workflow from the Repository Manager. This will automatically copy the mapping, associated sources, targets, and session to the target folder.

8]    WHAT IS DATA CLEANSING..?
Answer: Data cleansing is a two-step process including DETECTION and then CORRECTION of errors in a data set.

9]    WHAT ARE THE ACTIVE AND PASSIVE TRANSFORMATIONS?
Answer: An active transformation can change the number of rows that pass through it. A passive transformation does not change the number of rows that pass through it.

10]    THE DESIGNER INCLUDES A "FIND" SEARCH TOOL AS PART OF THE STANDARD TOOL BAR. WHAT CAN IT BE USED TO FIND?
Answer: This is used for finding source columns in the workspace.

11]    CAN U USE THE MAPPING PARAMETERS OR VARIABLES CREATED IN ONE MAPPING IN ANOTHER MAPPING?
Answer: NO. You might want to use a workflow parameter/variable if you want it to be visible to other mappings/sessions.

12]    WHERE SHOULD YOU PLACE THE FLAT FILE TO IMPORT THE FLAT FILE DEFINITION TO THE DESIGNER?
Answer: Place it in a local folder.

13]    IF YOUR WORKFLOW IS RUNNING SLOW IN INFORMATICA, WHERE DO YOU START TROUBLESHOOTING AND WHAT ARE THE STEPS YOU FOLLOW?
Answer: When the workflow is running slowly, you have to find the bottlenecks, in this order: target, source, mapping, session, system.

14]    WHAT DO THE EXPRESSION AND FILTER TRANSFORMATIONS DO IN THE INFORMATICA SLOWLY GROWING TARGET WIZARD?
Answer: The Expression transformation detects and flags the rows from the source. The Filter transformation filters out the rows that are not flagged and passes the flagged rows to the Update Strategy transformation.

15]    HOW CAN WE JOIN TABLES IF THEY HAVE NO PRIMARY AND FOREIGN KEY RELATION AND NO MATCHING PORT TO JOIN?
Answer: Without a common column or a common data type, we can join two sources using dummy ports: 1. Add one dummy port in each source. 2. In an Expression transformation, assign '1' to each dummy port. 3. Use a Joiner transformation to join on the dummy ports.

16]    WHAT IS POLLING?
Answer: It displays the updated information about the session in the monitor window. The monitor window displays the status of each session when you poll the informatica server.

17]    WHY DO YOU USE REPOSITORY CONNECTIVITY?
Answer: Each time you edit or schedule a session, the Informatica server directly communicates with the repository to check whether the session and the users are valid. All the metadata of sessions and mappings is stored in the repository.

18]    WHAT IS CODE PAGE USED FOR?
Answer: A code page is used to identify characters that might be in different languages. If you are importing Japanese data into a mapping, you must select the Japanese code page for the source data.

19]    WHAT ARE VARIOUS TYPES OF AGGREGATION?
Answer: Various types of aggregation are SUM, AVG, COUNT, MAX, MIN, FIRST, LAST, MEDIAN, PERCENTILE, STDDEV, and VARIANCE.

20]    WHAT IS CHANGE DATA CAPTURE?
Answer: Change data capture (CDC) is a set of software design patterns used to determine the data that has changed in a database so that action can be taken using the changed data.