Sunday, 17 March 2013

Extracting Information from the Source File Name can Never be Easier


Many times we get data files from different sources and the source file name itself might contain some valid information like 'file creation date', 'file generated department name', 'region name' etc... We used to use Unix script to extract such information from the file name, but we don't have to write such scripts anymore. In this article let's look at the option provided by Informatica PowerCenter to extract information from the source file name.

Business Requirement

Let's directly jump into the demonstration. For our demonstration, lets consider the "Customer Data" file, which includes all the new customers information. This file comes with the naming format as below
    • Customer_Data_YYYYMMDD.csv
When the file is loaded into the staging table, along with all the source columns, the date part from the source file name needs to be extracted and loaded into the target column AS_OF_DATE as DATE.

Informatica Implementation

As the first step, lets create the source definition based on the source file. Now open the source definition and navigate to the 'Property' tab and set the 'Add Currently Processed Flat File Name Port' as shown in the picture below.


Setting 'Add Currently Processed Flat File Name Port' property will add an additional port'CurrentlyProcessedFileName' in your source definition as shown below image. This port will provide the value of your source file name.


Note : The newly added port can be treated as any normal transformation port. So lets pull all the columns to an expression transformation and extract the date value.

In the Expression Transformation add an output port of DATE data type.

          AS_OF_DATE as Output Port.

And following is  the expression for the port.

         TO_DATE(
                  SUBSTR(CurrentlyProcessedFileName,
                               LENGTH(CurrentlyProcessedFileName)-12,
                               LENGTH(CurrentlyProcessedFileName)-4),
         'YYYYMMDD')


Now map all the columns from the Expression Transformation to the target. After this step the mapping will look like we have it in the below image.


We are all done with the required configuration at the mapping level and we do not need any special setting at theworkflow configuration. Once you run the workflow you will see the target table is loaded with all the column along with AS_OF_DATE which is extracted from the source file name.

No comments:

Post a Comment