Sunday, 17 March 2013

What is a FACT LESS FACT TABLE? Where we use Factless Fact?


We know that fact table is a collection of many facts and measures having multiple keys joined with one or more dimension tables. Facts contain both numeric and additive fields. But factless fact table are different from all these.
factless fact table is fact table that does not contain fact. They contain only dimensional keys and it captures events that happen only at information level but not included in the calculations level. just an information about an event that happen over a period.
A factless fact table captures the many-to-many relationships between dimensions, but contains no numeric or textual facts. They are often used to record events or coverage information. Common examples of factless fact tables include:
  • Identifying product promotion events (to determine promoted products that didn’t sell)
  • Tracking student attendance or registration events
  • Tracking insurance-related accident events
  • Identifying building, facility, and equipment schedules for a hospital or university
Factless fact tables are used for tracking a process or collecting stats. They are called so because, the fact table does not have aggregate table numeric values or information. There are two types of factless fact tables: those that describe events, and those that describe conditions. Both may play important roles in your dimensional models.
Factless fact tables for events
The first type of factless fact table is a table that records an event. Many event-tracking tables in dimensional data warehouses turn out to be factless. Sometimes there seem to be no facts associated with an importantbusiness process. Events or activities occur that you wish to track, but you find no measurements. In situations like this, build a standard transaction-grained fact table that contains no facts.
For eg.
The above fact is used to capture the leave taken by an employee. Whenever an employee takes leave a record is created with the dimensions. Using the fact FACT_LEAVE we can answer many questions like
  • Number of leaves taken by an employee
  • The type of leave an employee takes
  • Details of the employee who took leave
Factless fact tables for conditions
Factless fact tables are also used to model conditions or other important relationships among dimensions. In these cases, there are no clear transactions or events. It is used to support negative analysis report. For example a Store that did not sell a product for a given period.  To produce such report, you need to have a fact table to capture all the possible combinations.  You can then figure out what is missing.
For eg, fact_promo gives the information about the products which have promotions but still did not sell
This  fact answers the below questions:
  • To find out products that have promotions.
  • To find out products that have promotion that sell.
  • The list of products that have promotion but did not sell.
This kind of factless fact table is used to track conditions, coverage or eligibility.  In Kimball terminology, it is called a “coverage table.”
Note:
We may have the question that why we cannot include these information in the actual fact table .The problem is that if we do so then the fact size will increase enormously.
Factless fact table is crucial in many complex business processes. By applying you can design a dimensional model that has no clear facts to produce more meaningful information for your business processes. Factless fact table itself can be used to generate the useful reports.

Event Wait Task for Dynamically Changing File Names


Whenever we schedule an Informatica Workflow, most of the times even before the actual session can run, we need to check some prerequisites like, successful completion of source system data loads, arrival of source data files etc.  Informatica PowerCenter Workflow Manager provides task such as ‘Timer’, ‘Event Wait’, Event Raise’ for such validations and checks.  In this article lets see how we can configure ‘Event Wait’ task to check the existence for a file with dynamically changing name.

For the demonstration, lets consider a scenario, where we need to reads data from the daily sales file and load into a relational table.  This source file is supposed to be available daily at 08:00 PM with a naming standard ‘daily_sales_mm-dd-yyyy.csv’

Informatica Implementation

Here we will just concentrate on building the workflow.
As the first step, lets create the workflow and a workflow variable $$FileName as shown in below image.
    • $$FileName as NSTRING
Informatica Woekfloe Variable
Now Create an Assignment task and assign the value of the Variable as below. This variable will set the value of the file, to be waited dynamically.
  • $$FileName := 'daily_sales_' || TO_CHAR(SYSDATE,'MM-DD-YYYY') || '.csv' 
Informatica Assignment Task
Now we need to create an ‘Event Wait’ task to wait for the Source File, which has the dynamic name. We will be using the workflow variable created in the previous step for the same as shown down below.

Informatica Event Wait
Now we just need to add the session which reads data from the source file. You can use the same parameter for the source file.

Informatica Workflow
You can schedule the workflow to run daily 08:00 PM as the last step.
Informatica Scheduler

Extracting Information from the Source File Name can Never be Easier


Many times we get data files from different sources and the source file name itself might contain some valid information like 'file creation date', 'file generated department name', 'region name' etc... We used to use Unix script to extract such information from the file name, but we don't have to write such scripts anymore. In this article let's look at the option provided by Informatica PowerCenter to extract information from the source file name.

Business Requirement

Let's directly jump into the demonstration. For our demonstration, lets consider the "Customer Data" file, which includes all the new customers information. This file comes with the naming format as below
    • Customer_Data_YYYYMMDD.csv
When the file is loaded into the staging table, along with all the source columns, the date part from the source file name needs to be extracted and loaded into the target column AS_OF_DATE as DATE.

Informatica Implementation

As the first step, lets create the source definition based on the source file. Now open the source definition and navigate to the 'Property' tab and set the 'Add Currently Processed Flat File Name Port' as shown in the picture below.


Setting 'Add Currently Processed Flat File Name Port' property will add an additional port'CurrentlyProcessedFileName' in your source definition as shown below image. This port will provide the value of your source file name.


Note : The newly added port can be treated as any normal transformation port. So lets pull all the columns to an expression transformation and extract the date value.

In the Expression Transformation add an output port of DATE data type.

          AS_OF_DATE as Output Port.

And following is  the expression for the port.

         TO_DATE(
                  SUBSTR(CurrentlyProcessedFileName,
                               LENGTH(CurrentlyProcessedFileName)-12,
                               LENGTH(CurrentlyProcessedFileName)-4),
         'YYYYMMDD')


Now map all the columns from the Expression Transformation to the target. After this step the mapping will look like we have it in the below image.


We are all done with the required configuration at the mapping level and we do not need any special setting at theworkflow configuration. Once you run the workflow you will see the target table is loaded with all the column along with AS_OF_DATE which is extracted from the source file name.

Friday, 15 March 2013

How to Change the Session Log Name in Informatica Workflow?

Joiner Cache
When a Joiner transformation occurs in a session, the Informatica Server reads all the records from the master source and builds index and data caches based on the master rows. After building the caches, the Joiner transformation reads records from the detail source and performs joins.
Joiner cache have 2 types of cache 1.Index cache 2. Data cache.
Index cache stores all the port values which are participated in the join condition and Data cache have stored all ports which are not participated in the join condition.




How to Change the Session Log Name in Informatica Workflow?


Whenever we run the session in Informatica, integration service creates session log file. By default Session Log File Directory defaults to the service process variable, $PMSessionLogDir and the Session Log File Name defaults to $PMSessionLogFile.
In order to change the session log name you can set the $PMSessionLogFileto Session_test.txt in the parameter file. When you make this change in the parameter file and run the session, the integration service creates a session log file named Session_test.txt

Reusability is a great feature in Informatica PowerCenter which can be used by developers. Its general purpose is to reduce unnecessary coding which ultimately reduces development time and increases supportability. In this article lets see different options available in Informatica PowerCenter to make your code reusable.
  1. Mapplet
  2. Reusable Transformation
  3. Shared Folder
  4. Global Repository
  5. Worklet
  6. Reusable Session
  7. Reusable Tasks
  8. Mapping Parameter
  9. Mapping Variable
  10. WorkFlow Variable
  11. Worklet Variable

1. Mapplet

Mapplet is a reusable object that you create in the Mapplet Designer. It contains a set of transformations and lets you reuse the transformation logic in multiple mappings. When you use a mapplet in a mapping, you use an instance of the mapplet. Any change made to the mapplet is inherited by all instances of the mapplet.
If you have several fact tables that require a series of dimension keys, you can create a mapplet containing a series of Lookup transformations to find each dimension key. You can then use the mapplet in each fact table mapping.

Learn more about Mapplets from the Demo Video
Informatica Mapplet

2. Reusable Transformation

Reusable Transformations can be created in Transformation Developer and can be reused in multiple mappings. When you use a reusable transformation in a mapping, you use an instance of the transformation. Any change made to the transformation is inherited by all its instances.

If you have a business rule to trim spaces from customer name and customer address columns, then you can create a reusable expression transformation to trim spaces from the column, which can be reused in multiple mappings.

Learn more about Reusable Transformations from the Demo Video
Informatica Reusable Transformation

3. Shared Folder

A Shared folder can be created in a Repository from the repository manager. The objects from the shared folder can be accessed using shortcuts from different folders in the same Repository.

You can create all the reusable framework objects (CDC Framework, Error Handling Framework etc...) in the Shared Folder  and can be accessed using shortcuts from different folders in the same  repository.
Informatica PowerCenter Shared Folder

4. Global Repository

A Global Repository can be created in a Repository Domain which is linked to multiple Local Repositories. An object created in the Global Repository is accessible from the Local Repositories using the shortcuts created to the global objects. Any change in the Global Repository Object will be inherited to all the shortcut objects.

You can create all the reusable framework objects (CDC Framework, Error Handling Framework etc...) in the Global Repository and can be accessed using shortcuts from different local repositories.
Informatica PowerCenter Object Shortcut

5. Worklet

A worklet is an object created by combining set of tasks to build a workflow logic. Worklet can be reused in multiple  workflows, which can be configured to run concurrently. You can create a worklet in the Worklet Designer.
Reusable worklet implemented in the Change Data Capture Framework, which is discussed in one of our prior article is a practical  application of worklet.
Informatica PowerCenter Worklet Designer

6. Reusable Session

A session is a set of instructions that tells the Integration Service how and when to move data from sources to  targets. You can create a reusable Session task in the Task Developer. A reusable session can be used multiple workflows and even in a worklet.

Reusable session used in the Operational Metadata Logging Framework, which is discussed in one of our prior article is a practical  implementation of reusable session.
Informatica PowerCenter Reusable Session

7. Reusable Tasks

Apart from the reusable session task, we can create reusable email task and reusable command task in the Task  Developer. We can reuse these reusable tasks in multiple workflows and worklets.

A reusable email task can be used to create a standard session failure email notification, which can be reused in different session tasks.
Informatica PowerCenter Task Developer

8. Mapping Parameter

Define values that remain constant throughout a session, such as state sales tax rates. When declared in a mapping or mapplet, $$ParameterName is a user-defined mapping parameter. 
Informatica Mapping Parameter

9. Mapping Variable

Define values that can change during a session. The Integration Service saves the value of a mapping variable to the repository at the end of each successful session run and uses that value the next time you run the session. When declared in a mapping or mapplet, $$VariableName is a mapping variable. Read more about mapping variables implementation from the article Change Data Capture (CDC) Made Easy Using Mapping Variables
Informatica Mapping variables

10. WorkFlow Variable

You can create user defined workflow variables to pass variable values between mapping, sessions within a workflow. Even if the workflow variables alone do not give code reusability, it works with other components to provide reusability. Read more on workflow variable from the article An ETL Framework for Change Data Capture.
Informatica Workflow Variables

11. Worklet Variable

User defined worklet variables can be create in worklets to pass values between mapping, sessions, worklets within a workflow. Worklet variables by itself do not give code reusability, but it works with other components to facilitate code reusability. Learn more on worklet variable from the article An ETL Framework for Change Data Capture.
Informatica Worklet Variables
In addition to the parameters and variables mentioned above, Informatica PowerCenter provides much more type of Variables and Parameters, which provide more flexibility to build reusable code, such as
    • Service variables.
    • Service process variables.
    • Session parameters.
    • $Source. $Target connection variables.
    • Email variables.
    • Local variables.
    • Built-in variables.

Thursday, 14 March 2013

Error Handling


Error Handling
Identifying errors and creating an error handling strategy is very important. The 2 types of errors in an ETL process are – Data Errors & Process Errors.
To handle Data errors we can use the Row Error Logging feature. The errors are captured into the error tables. We can then analyse, correct and reprocess them.
To handle Process errors we can configure an email task to notify the event of a session failure.
Row Error Logging: When we configure the session with this option the Integration service logs errors information to relational tables or to an error log file.
First time it creates the table or a file and then onwards it appends to the existing table or file. This log file contains information such as source name, row ID, row data, transformation error code etc. which can be used to determine the cause & source of an error.
By default the Integration service does not write the dropped rows to session log or create a reject file. So we can enable verbose tracing to write to session log. Performance is decreased as one row at a time is processed.

There is one file called the bad file which generally has the format as *.bad and it contains the records rejected by informatica server. There are two parameters one fort the types of row and other for the types of columns. The row indicators signifies what operation is going to take place ( i.e. insertion, deletion, updation etc.). The column indicators contain information regarding why the column has been rejected.( such as violation of not null constraint, value error, overflow etc.) If one rectifies the error in the data preesent in the bad file and then reloads the data in the target,then the table will contain only valid data.

Error Handling is one of the must have components in any Data Warehouse or Data Integration project. When we start with any Data Warehouse or Data Integration projects,business users come up with set of exceptions to be handled in the ETL process. In this article,  lets talk about how do we easily handle these user defined error.

Informatica Functions Used

We are going to use two functions provided by Informatica PowerCenter to define our user defined error capture logic. Before we get into the coding lets understand the functions, which we are going to use.
  1. ERROR() 
  2. ABORT()
ERROR() : This function Causes the PowerCenter Integration Service to skip a row and issue an error message, which you define. The error message displays in the session log or written to the error log tables based on the error logging type configuration in the session.

ABORT() 
: Stops the session, and issues a specified error message to the session log file or written to the error log tables based on the error logging type configuration in the session. When the PowerCenter Integration Service encounters an ABORT function, it stops transforming data at that row. It processes any rows read before the session aborts.

Note : Use the ERROR, ABORT function for both input and output port default values. You might use these functions for input ports to keep null values from passing into a transformation and use for output ports to handle any kind of transformation error.

Informatica Implementation

For the demonstration lets consider a workflow which loads daily credit card transactions and below two user defined data validation checks
  1. Should not load any transaction with 0 (zero) amount, but capture such transactions into error tables
  2. Should not process any transactions with out credit card number and Stop the workflow.

Mapping Level Changes

To handle both the exceptions, lets create an expression transformation and add two variable ports.
  • TEST_TRANS_AMOUNT as Variable Port
  • TEST_CREDIT_CARD_NB as Variable Port
Add below expression for both ports. First expression will take care of the user defined data validation check No 1 and second expression will take care of user defined data validation check No 2.
  • TEST_TRANS_AMOUNT :- IIF(TRANS_AMOUNT = 0,ERROR('0 (Zero) Transaction Amount'))
  • TEST_CREDIT_CARD_NB :- IIF(ISNULL(LTRIM(RTRIM(CREDIT_CARD_ND))),ABORT('Empty Credit Card Number'))
The complete expression transformation is shown in below image.
Informatica Error ABORT functions usage
Now insert this transformation in the mapping where you need the data validation and complete the mapping.

Hint :  This Expression can be converted into a Reusable transformation, So that any mapping needs this data validation can reuse this transformation.

Session Level Changes

Once the mapping is complete, configure the session and provide the settings for row error logging as shown in below image. Please read the article Error handling made easy using Informatica Row Error Logging for more details on row error logging.
informatica row error logging
With this configuration we specified, Informatica PowerCenter will create four different tables for error logging and the table details as below.
  • ETL_PMERR_DATA :- Stores data about a transformation row error and its corresponding source row.
  • ETL_PMERR_MSG :- Stores metadata about an error and the error message.
  • ETL_PMERR_SESS :- Stores metadata about the session.
  • ETL_PMERR_TRANS :- Stores metadata about the source and transformation ports, when error occurs.
With this, we are done with the setting required to capture user defined errors.  Any data records which violates our data validation check will be captured into PMERR tables mentioned above.

Report the Error Data.

Now we have the error data stored in the error table, we can pull the error report using an SQL querry. Below is a basic query to get the error report. We can be more fancy with the SQL and get more information from the error tables.

select 
       sess.FOLDER_NAME as 'Folder Name',
       sess.WORKFLOW_NAME as 'WorkFlow Name',
       sess.TASK_INST_PATH as 'Session Name',
       data.SOURCE_ROW_DATA as 'Source Data',
       msg.ERROR_MSG as 'Error MSG'
from 
       ETL_PMERR_SESS sess
left outer join ETL_PMERR_DATA data
      on data.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID and 
      data.SESS_INST_ID = sess.SESS_INST_ID
left outer join ETL_PMERR_MSG msg
      on msg.WORKFLOW_RUN_ID = sess.WORKFLOW_RUN_ID and 
      msg.SESS_INST_ID = sess.SESS_INST_ID
where 
      sess.FOLDER_NAME = <Project Folder Name> and
      sess.WORKFLOW_NAME = <Workflow Name> and
      sess.TASK_INST_PATH = <Session Name> and
      sess.SESS_START_TIME = <Session Run Time>

Pros and Cons of this Approach.

We should know the Pros and Cons of this approach  before applying this to your project.

Pros.

  1. Out of the box Solution Provided by Informatica.
  2. Less Coding and Testing efforts required by the development team.

Cons.

  1. Added overhead to the Session performance, which is expected and acceptable.

Wednesday, 13 March 2013

Scenario

1 Q. i have two source, in that send matched rows into one target and unmached rows into another target by using joiner

Ans:JOINER: 
Join Type as - Full Outer Join
Join Condition- M_PK = D_PK (e.g.)


ROUTER:

MATCH GROUP:
Condition M_PK IS NOT NULL AND D_PK IS NOT NULL

DEFAULT: Will have Unmatched records.

2 Q> Source
ID NAME MOBILE
101 RAM 12345
101 RAM 67890
102 RAJ 23456
102 RAJ 78901
103 RACK 34567
103 RACK 89012
104 SAM 45678
104 SAM 90123

Target
ID NAME MOBILE
101 RAM 1234567890
102 RAJ 2345678901
103 RACK 3456789012
104 SAM 4567890123
Ans:

Assuming that the source records do not change the order-

Use an Expression transform:

Port Name | Port Type | Expression
--------------------------------------------
ID ------------ | I/O 
NAME ------ | I/O
MOBILE ---- | I
V_MOB ---- | V ---------- | IIF( V_PREV_ID = ID, V_MOB || MOBILE, MOBILE )
V_PREV_ID | V ---------- | ID
O_MOB -----| O ---------- | V_MOB
O_FLAG --- | O ---------- | CUME(1)


Next use a Filter transform:

Condition: MOD( O_FLAG, 2 ) = 0

Finally the target.

3 Q: Why cant we use dynamic lookup in unconnected lookup ?

Ans: Dynamic lookup feature can never be handled by Unconnected lookup. 

For Dynamic Lookup we need the NewlookupRow column as output, along with all the columns of the lookup which needs to be associated first with the input ports as output to populate the target. We need multiple ports as output that cannot be achieved by Unconnected lookup.

4 Q: how we can add header and footer to the flat file?
ANS:
There are two options available in session properties to take care of this requirement. For this, Go to Mapping Tab Target Properties and Choose the header option as Output Field names OR Use Header Command output File.

Option 1, will create your output file with a header record and the column heading names will be same as your Target transformation port names.

Option 2, we can create our command to generate the header record text. We can use an 'echo' command here to get this created. Here is an example 
echo '"EMPNO"|"DEPTNO"'

It is recommended using the second option as it gives more flexibility for writing the column names. 

For footer in the mapping level, create the footer information like number of records processed, sum of revenue etc. and generate a flat file.

Use post-session command task to append this information of the footer file to the data file.

cat footer.txt >> EMP.txt