Data Modeling & Warehousing Basics
A data model is a database design that describes the tables in a database and the relationships between them. A data model helps organize the data in an efficient manner. There are three different phases in data modeling.
a) Conceptual Data Modeling: This is the first phase of data modeling. After analyzing the FSD and BRD we identify the list of entities and the relationships between them. No attributes or keys are defined in this phase.
b) Logical Data Modeling: This is the second phase of data modeling. In this phase we define the attributes for each entity and the keys in each entity, i.e. primary keys and foreign keys. In this phase we go for approvals from the Data Architect team.
c) Physical Data Modeling: This is the third phase of data modeling. In this phase we define table names, column names, data types, constraints etc. according to the standards. We use this model to create the database objects in the database.
Dimensional modeling is used in data warehousing projects. Dimensional modeling includes star schema, snowflake schema and hybrid schema. A hybrid schema is a combination of the star and snowflake schemas. All these schemas contain dimension tables and fact tables, with relationships between the dimension tables and fact tables.
A Star Schema is a dimensional model with a centrally located fact table surrounded by multiple dimension tables. In this schema, each dimension table has its own primary key. The fact table contains its own primary key and foreign keys from each dimension table.
a) Easy to understand: Relationships between dimensions and facts are based on primary and foreign keys.
b) High performance: To get the relevant data for business reports we need to join only dimensions and facts, which leads to fewer joins.
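A minimal star schema sketch in SQL, assuming hypothetical dim_customer, dim_product, dim_time and fact_sales tables:
Eg:-
CREATE TABLE dim_customer (customer_key INT PRIMARY KEY, customer_name VARCHAR(100), city VARCHAR(50));
CREATE TABLE dim_product (product_key INT PRIMARY KEY, product_name VARCHAR(100), category VARCHAR(50));
CREATE TABLE dim_time (time_key INT PRIMARY KEY, calendar_date DATE, year_no INT);
-- Fact table: numeric measures plus a foreign key to each dimension
CREATE TABLE fact_sales (
sales_key INT PRIMARY KEY,
customer_key INT REFERENCES dim_customer(customer_key),
product_key INT REFERENCES dim_product(product_key),
time_key INT REFERENCES dim_time(time_key),
quantity INT,
amount DECIMAL(12,2)
);
-- A business report needs only one join per dimension
SELECT p.category, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
GROUP BY p.category;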
A Snowflake Schema is similar to a star schema except that there are hierarchical relationships between dimensions (dimension to dimension). A snowflake schema also contains dimensions and facts. Data is normalized in a snowflake schema, so it occupies less space in the database. Compared to a star schema, snowflake schema performance is lower, because more tables need to be joined to get the relevant data for business reports.
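A minimal sketch of snowflaking the hypothetical dim_product dimension from the previous example: the category attribute is normalized out into its own table, so the report query needs one more join.
Eg:-
CREATE TABLE dim_category (category_key INT PRIMARY KEY, category_name VARCHAR(50));
CREATE TABLE dim_product (product_key INT PRIMARY KEY, product_name VARCHAR(100),
category_key INT REFERENCES dim_category(category_key));
SELECT c.category_name, SUM(f.amount) AS total_amount
FROM fact_sales f
JOIN dim_product p ON p.product_key = f.product_key
JOIN dim_category c ON c.category_key = p.category_key
GROUP BY c.category_name;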
A dimension represents descriptive or textual information. The data modeler identifies the attributes that come under a dimension table. Common examples of dimension tables are customer, time, product etc.
A fact represents a numeric value, but not every numeric value is a fact. A numeric value that is used for business analysis is considered a fact. For example, customer number is not a fact even though it is a number. Quantity is a fact because we can do business analysis on it: how much quantity was sold in a particular location, how much quantity was sold in a particular year etc.
In general, in DWH projects we prefer to use the star schema only. The star schema is a powerful design from a performance point of view. We go for a snowflake schema in DWH projects in the cases below.
a) Database space is a constraint: Data is normalized in a snowflake schema, so there is less data redundancy and it occupies less space.
b) The number of columns in a table is very large (for example more than 500): If a table has a huge number of columns, retrieval is slower. To overcome this we split the columns into multiple tables.
Data purging is the process of deleting old data from the data warehouse. Based on the business requirement we maintain, say, 5, 7 or 10 years of data in the DWH dimension and fact tables; data older than that we delete from the dimensions and fact tables. We create separate mappings to purge (delete) data from dimensions and facts.
We schedule these mappings once a year to delete old data. Based on key columns we delete data from dimensions and facts using an update strategy transformation. The data purging process removes unnecessary data from the DWH and improves performance.
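A minimal sketch of the purge step in SQL (Oracle syntax), assuming a hypothetical fact_sales table with a sale_date column and a 7-year retention rule:
Eg:-
-- Delete fact rows older than the retention period (7 years)
DELETE FROM fact_sales
WHERE sale_date < ADD_MONTHS(TRUNC(SYSDATE), -12 * 7);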
Data cleansing means removing unwanted or inconsistent data. For example, you have an Employee table with a city column, and its values come in different ways from different source systems:
S1- Hyd S2- hyd S3- Hyderabad S4- hyderabad S5- HYDERABAD
In the DWH we maintain all these values as "HYDERABAD" based on the business decision: we convert the S1 value Hyd, the S2 value hyd, the S3 value Hyderabad and the S4 value hyderabad all to HYDERABAD.
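A minimal sketch of this standardization in SQL, assuming a hypothetical employee staging table:
Eg:-
UPDATE employee_stg
SET city = 'HYDERABAD'
WHERE UPPER(city) IN ('HYD', 'HYDERABAD');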
Data merging means combining data from multiple source systems. We use join and union operations to perform this. To join two sources, you should have at least one common column. To use a union, both sources should have the same number of columns and the same data types.
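A minimal sketch of both operations in SQL, assuming hypothetical customer and order tables coming from two source systems:
Eg:-
-- Join: combine columns from two sources on a common column
SELECT c.customer_id, c.customer_name, o.order_amount
FROM src1_customer c
JOIN src1_orders o ON o.customer_id = c.customer_id;
-- Union: combine rows from two sources with the same structure
SELECT customer_id, customer_name FROM src1_customer
UNION
SELECT customer_id, customer_name FROM src2_customer;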
Data scrubbing means deriving new column values based on existing column values. For example, in an employee table the columns eno, ename and sal come from the source, and we need to calculate tax based on salary, but tax is not coming from the source. Deriving the tax value is known as data scrubbing.
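A minimal sketch of this derivation in SQL, assuming a hypothetical employee table and a flat 10% tax rate purely for illustration:
Eg:-
SELECT eno,
ename,
sal,
sal * 0.10 AS tax, -- derived column, not present in the source
sal - (sal * 0.10) AS net_sal
FROM employee;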
A data mart is a subset of the enterprise data warehouse; at any point of time it contains information for only one subject or department. If it contains sales information we call it a sales data mart, if it contains IT information we call it an IT data mart, and if it contains finance information we call it a finance data mart.
a) Requirements gathering is simple: if you want to construct only a sales data mart, you need to connect only with the sales people.
c) Security is high: if you construct a sales data mart, you provide access only to the sales people.
The Aggregator transformation is an active and connected transformation, mainly used to perform calculations on groups of rows.
For example, each department has several employees. To know how much salary we are paying for each department, we first group the rows based on department number and then use the SUM() aggregate function.
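The SQL equivalent of this aggregation, as a rough sketch assuming an emp table with dno and sal columns:
Eg:-
SELECT dno, SUM(sal) AS total_sal
FROM emp
GROUP BY dno;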
If you don't select any group by ports in the aggregator transformation, by default it treats all rows as a single group and returns the last row. For example, if you send 1 million rows as input to the aggregator, the output is only 1 row, i.e. the last row. When you double click the aggregator transformation it opens a dialog box; select the ports tab, and under this tab you can select the group by ports.
If you select all ports as group by ports in the aggregator, it applies grouping on all ports and eliminates duplicate records. In grouping it considers all ports; if all port values are the same, the rows are treated as a single group.
Eg:- (illustrative)
Input: (10, A), (10, A), (20, B)
Output: (10, A), (20, B)
There are two different types of caches in the aggregator transformation: index cache and data cache. By default the cache files are stored under the directory $PMCacheDir.
Following are the different types of aggregate functions available in informatica power center tool.
a. Sum()
b. Max()
c. Min()
d. Avg()
e. First()
f. Last() etc
Conditional statements can be included inside the aggregate function itself. While performing aggregate calculations, we can use conditional logic with the help of IIF.
Eg:- SUM(IIF(CITY='HYDERABAD',SAL,0))
If you place one aggregate function inside another, it is called nesting of aggregate functions. If you want to use incremental aggregation, you should not use nested aggregate functions.
Eg:- MAX(SUM(SAL))
The above example returns the highest total salary paid by any department. The inner function calculates how much salary we are paying for each department.
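A rough SQL equivalent of this nested aggregation, assuming the same emp table:
Eg:-
SELECT MAX(total_sal) AS highest_dept_total
FROM (SELECT dno, SUM(sal) AS total_sal
FROM emp
GROUP BY dno) dept_totals;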
All rows containing null values in the group by ports are treated as a separate group, and the aggregator transformation returns the last row from that group.
Yes we can create variable ports in aggregator transformation. Generally we define variable ports for
temporary calculations, which are used in other output ports calculations.
There are two optimization techniques in the aggregator transformation to improve performance.
a. Sorted Input
b. Incremental Aggregation
The Sorted Input option is available under the properties tab of the aggregator transformation. When you select this option you need to send sorted data to the aggregator: you need to sort on the group by ports before sending data to the aggregator. For example, if DNO is the group by port then you need to sort on DNO.
When we sort the data and send it to the aggregator, it performs the aggregate calculations immediately; it won't wait until all rows have entered the aggregator. Informatica can't read all rows at a time, it reads block by block. If you don't select sorted input, the aggregator waits until all rows have entered the aggregator.
Incremental aggregation means aggregating values based on the existing aggregated values. For example, suppose department 10 has two employees with salaries 2000 and 3000, so after the first run the aggregated output is:
Output:- 10 5000
On the next run a new employee with salary 4000 arrives for department 10. With incremental aggregation it performs 5000+4000 using the cached aggregate, instead of again adding all employee salaries 2000+3000+4000.
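A rough sketch of the same idea in SQL, assuming a hypothetical dept_salary_agg table that already holds the previous totals:
Eg:-
-- Add only the new employee's salary to the existing aggregate for department 10
UPDATE dept_salary_agg
SET total_sal = total_sal + 4000
WHERE dno = 10;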
25. What happens if the cache files for incremental aggregation are deleted?
When the cache files are not available on the Informatica server, the session won't fail; it will recreate the cache automatically. But it takes more time compared to a normal run, because recreating the cache takes some time.
26. Can we select both sorted input and incremental aggregation at a time?
No, we can't select both. The reason is that if you select sorted input it performs calculations in memory, whereas incremental aggregation performs calculations on the existing cache values.
All the cache files are by default stored under the server/infa_shared/cache directory. You can also change the default directory by assigning some other directory.
Cache files are deleted automatically after the session succeeds; no manual intervention is needed.
If the session fails, the cache files are not deleted and we need to delete them manually: connect to the Informatica server, go to the cache directory and apply the rm command in unix/linux.
By default you get the last record from the aggregator; to get the first record use the FIRST() function on each port.
Source, SQ, two aggregators: by default one aggregator returns the last row, so connect it to one target instance; in the other aggregator select the FIRST() function on each port and connect it to the other target instance.
A transformation is a repository object that generates, modifies or passes data and produces output. Transformations are generally used to implement different types of business logic.
There are two different types of transformations available in the Informatica PowerCenter tool: active and passive.
A transformation is said to be active if the number of rows passing through that transformation can change. For example, if you send 5 rows as input to that transformation, the output may be fewer than 5 or more than 5 rows.
Examples of active transformations are: aggregator, filter, router, rank, sorter, normalizer, joiner, update strategy, source qualifier etc.
A transformation is said to be passive if the number of rows passing through that transformation can't change. For example, if you send 5 rows as input to that transformation, the output is also 5 rows.
Examples of passive transformations are: expression, stored procedure, sequence generator etc.
We use expression transformation to perform calculations on each and every row. For example to
calculate tax and net salary for each and every employee.
We use aggregator to perform calculations on group of rows. For example in order to find out the total
amount of salary paying for each department, then we use sum() aggregate function. For example if you
want to know who is taking highest salary from each department then we use max() aggregate function.
The filter transformation supports a single condition, whereas the router transformation supports multiple conditions by creating multiple groups. You cannot capture rejected records in a filter, whereas in a router you can capture rejected records using the default group.
We can eliminate duplicate records using distinct option under properties tab of sorter transformation.
For example if you have 5 rows in source and 2 rows are duplicate then 4 rows come as output.
40. Can you sort data on more than one port in the sorter transformation?
Yes, we can sort data on more than one port; the default port order for sorting is top to bottom. You can select one port ascending and another port descending. You can also move the ports up and down if you are expecting a different sort order.
There are two ways to eliminate duplicate rows coming from the source: (1) select the distinct option (or use a SQL override with DISTINCT) in the source qualifier, and (2) use a sorter transformation with the distinct option. If your source is relational you can use either option; performance wise the first option is the best one. If your source is a file then you need to go for the second option only.
42. What are the different types of caches that exist for the sorter transformation?
In general, transformations that contain caches internally have two types: a data cache and an index cache. The sorter transformation has only one cache, i.e. the data cache. This data cache contains the output ports of the sorter transformation.
There are four different types of joins in the joiner transformation. To use a joiner transformation you should have at least two sources. The sources may be tables or files or a combination of both.
1. Normal Join
2. Master Outer Join
3. Detail Outer Join
4. Full Outer Join
We can join only two sources at a time. One we need to consider as the master and the other as the detail. From a performance point of view we take the source with fewer rows as the master and the one with a huge number of rows as the detail. Even if you take them in the reverse direction the output is the same; internally the joiner builds its cache from the master source.
Two joiner transformations are required to join 3 tables, because you can join only two tables at a time. First take two tables and connect them to a joiner; then connect the third table and the output of the first joiner to a second joiner.
Normal Join: matching rows from both the sources come as output. You should have at least one common column to join the sources.
Master Outer Join: matching rows from both the sources plus non-matching rows from the detail source, i.e. normal join + non-matching rows from the detail source.
Detail Outer Join: matching rows from both the sources plus non-matching rows from the master source, i.e. normal join + non-matching rows from the master source.
Full Outer Join: matching rows from both the sources plus non-matching rows from both the master and detail sources, i.e. normal join + non-matching rows from both sources.
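As a rough SQL analogy, assuming a hypothetical master table m and detail table d joined on key_col, the four join types correspond approximately to:
Eg:-
-- Normal join
SELECT * FROM m JOIN d ON d.key_col = m.key_col;
-- Master outer join: keep all detail rows
SELECT * FROM d LEFT JOIN m ON m.key_col = d.key_col;
-- Detail outer join: keep all master rows
SELECT * FROM m LEFT JOIN d ON d.key_col = m.key_col;
-- Full outer join: keep all rows from both sources
SELECT * FROM m FULL OUTER JOIN d ON d.key_col = m.key_col;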
The index cache contains the join condition columns; the join condition might be on a single column or multiple columns. The data cache contains the output ports of the joiner transformation.
51. Can you assign more than one port as the rank port in the rank transformation?
No, we can assign only one port as the rank port in the rank transformation. Double click on the rank transformation and select the ports tab; under this you can select the rank port. If you try to select more than one port, only the last port you selected stays checked and the rest are unchecked.
52. How many maximum number of ranks you can give in rank transformation?
The maximum number of ranks allowed in rank transformation is 2147483647. If you try to give more
than this number, then it will throw an error.
Input/output ports
Input Ports
Variable Ports
Output Ports
54. What are different types of constants available in update strategy transformation?
Update strategy transformation allows us to provide either character constants or numeric constants.
The default value of update strategy expression is 0. Zero means flagging rows for insert, 1 means
flagging rows for update, 2 means flagging rows for delete and 3 means flagging rows for reject.
DD_INSERT OR 0
DD_UPDATE OR 1
DD_DELETE OR 2
DD_REJECT OR 3
55. Can you update target table data without using an update strategy transformation?
Yes, we can update target table data without using an update strategy transformation in two ways.
1. Use the update override option available on the target side; double click on the target instance (see the sketch below).
2. At the session level, select the update as update option and set treat source rows as update. For this, a key column should be defined in the target.
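A hedged example of a target update override for option 1, assuming a hypothetical T_EMPLOYEE target with EMP_ID as the key (:TU refers to the target transformation ports):
Eg:-
UPDATE T_EMPLOYEE
SET EMP_NAME = :TU.EMP_NAME, SAL = :TU.SAL
WHERE EMP_ID = :TU.EMP_ID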
SCD means slowly changing dimensions: it defines what we need to do in our DWH tables whenever there is a change in the source. Dimension tables in data warehousing projects use SCD.
SCD-I: Maintains only current information. If a new record comes from the source we insert it, otherwise we update the existing record.
SCD-II: Maintains complete history. If a new record comes from the source, or a source row with changes exists, then we insert it as a new record; the old record we just update on its flag, version or record_end_dt column (see the sketch after this list).
SCD-III: Maintains only partial history (current and previous). We maintain two columns for this, for example Curr_Sal and Previous_Sal. For a new employee we assign the value to the Curr_Sal column and set the Previous_Sal column to null.
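A minimal sketch of the SCD-II idea in SQL, assuming a hypothetical dim_employee table with record_end_dt and curr_flag columns (in the mapping this is normally done with a lookup plus update strategy rather than hand-written SQL):
Eg:-
-- Close the old version of the changed row
UPDATE dim_employee
SET record_end_dt = CURRENT_DATE, curr_flag = 'N'
WHERE emp_id = 101 AND curr_flag = 'Y';
-- Insert the new version of the row with the changed attributes
INSERT INTO dim_employee (emp_key, emp_id, city, record_start_dt, record_end_dt, curr_flag)
VALUES (5001, 101, 'HYDERABAD', CURRENT_DATE, NULL, 'Y');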
58. What is the difference between connected lookup and unconnected lookup?
a. Connected lookup participates in a mapping flow, unconnected lookup doesn't participate in the
mapping flow
b. Connected lookup returns multiple ports, unconnected lookup can return only one return port
c. Connected lookup supports user defined default values, unconnected lookup doesn't support user
defined default values
d. Connected lookup supports dynamic cache, unconnected lookup doesn't support dynamic cache
59. When do we go for a connected lookup and when for an unconnected lookup?
If you want to return multiple ports, you choose a connected lookup. If you want to return only one port you can choose either a connected or an unconnected lookup. If you want to call the same lookup multiple times with different inputs for a single row, then we choose an unconnected lookup.
There are different types of caches available in look up transformation. By default it will take Static
Cache.
Static Cache : The cache values won't change during session run
Dynamic Cache
Persistent Cache
61. What is a dynamic cache?
A dynamic cache is a cache whose data you can modify while loading data into your target table. Generally we go for a dynamic cache to filter out one record randomly from the source when the source has duplicates and the target has key columns. To overcome the failure on the target side, we use a dynamic cache and load only one record into the target.
A persistent cache is a saved cache: once you build it, you can use the same cache in multiple mappings. Generally we use a persistent cache when the same lookup data is used in multiple mappings; instead of building the same cache multiple times we build it only once and reuse it. For the first lookup we assign a cache file name, and we use this name for all the other lookup caches.
The NewLookupRow port comes into the picture when we use a dynamic cache. This port is added automatically when you select the dynamic cache option in the lookup transformation. It returns the following values: 0 when the row is neither inserted into nor updated in the cache, 1 when the row is inserted into the cache, and 2 when the row is updated in the cache.
The associated port also comes automatically into the picture when we select dynamic cache in the lookup transformation. It represents the value with which the data in the cache needs to be updated.
65. What is the use of stored procedure transformation?
To call a pl/sql procedure from informatica, we use stored procedure transformation. There are two
types of stored procedure transformations connected and unconnected.
Using the Normalizer transformation you can convert columns to rows. After you add the ports in the normalizer tab, you need to select the occurs clause. If you want to convert 4 columns into 4 rows then you need to set the occurs clause value to 4.
When you drag a COBOL source into the mapping designer from the navigation bar, a normalizer comes in as the source qualifier automatically.
68. What is the difference between GC_ID and GK_ID in the normalizer when you define a port with an occurs clause?
GC_ID and GK_ID come automatically for a port when you select an occurs clause for that port. GC_ID (GCID) means Generated Column ID; it gives repeating values to the rows based on the occurs clause. If we give occurs 3, then it repeats the sequence 1,2,3 then again 1,2,3 etc.
GK_ID (GK) means Generated Key; it gives a continuous sequence 1,2,3,4,5,6 etc.
69. What is the priority in the source qualifier if we give a filter condition (dno=10) and also a SQL override (dno=20)?
If you double click on the source qualifier, you can see both properties: filter condition and SQL override. The SQL override has the highest priority, so it takes the condition dno=20. If you don't provide a SQL override then it takes the value from the filter condition.
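A hedged example of what such a SQL override might look like, assuming a hypothetical EMP source table:
Eg:-
SELECT EMP.ENO, EMP.ENAME, EMP.SAL, EMP.DNO
FROM EMP
WHERE EMP.DNO = 20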
70. Can we connect more than one source to a single source qualifier?
Yes, we can connect more than one source to a single source qualifier. When you drag multiple sources, one source qualifier appears for each source; you need to manually delete all the source qualifiers except one and then connect the ports of all the other sources to that one source qualifier.
71. What is the difference you observed between a source and a source qualifier?
The source contains the source database data types, whereas the source qualifier contains Informatica data types. Informatica cannot understand the source data types, so it automatically converts them to Informatica data types. For example, if you drag an Oracle source, number is converted to integer and varchar is converted to string.
a) Connected Stored Procedure: It is connected in the mapping flow and linked to other transformations like any other transformation.
b) Unconnected Stored Procedure: It is not linked to any other transformation in the mapping. We can call it using an expression transformation.
73. What is the difference between top down and bottom up approach?
In the top down approach we build the enterprise data warehouse first and then derive the data marts from it; in the bottom up approach we build subject-area data marts first and then integrate them into the enterprise data warehouse.
Target load plan: if a mapping contains multiple flows, the target load plan decides which flow's target table should load first. Generally we use this if one flow's target becomes the source for another flow. This option is available under the mappings menu.
We can create reusable transformations using the transformation developer tool. Once you create a reusable transformation you can see it in the navigation bar under the transformations folder.
If there is a change in the logic, you need not touch all the mappings wherever you used the reusable transformation; just change it in one place, i.e. in the transformation developer tool, and the change automatically reflects in all the mappings.
A mapplet is reusable logic built from multiple transformations. You can create mapplets under the mapplet designer tool. The same mapplet can be used in n number of mappings; in future, if there is any change in the logic you need not touch all the mappings, you just change it in one place, i.e. in the mapplet.
a) Active Mapplets
b) Passive Mapplets
If a mapplet contains at least one active transformation then it is called an active mapplet. Every mapplet contains one mapplet input and one mapplet output transformation.
If a mapplet contains all passive transformations then it is called a passive mapplet. Passive mapplets give better performance than active mapplets.
a) Take less no.of rows as master and huge number of rows as detail
83. When you select the sorted input option in the joiner, which ports do you need to sort?
The join condition columns in both the sources need to be sorted. You can sort those ports using sorter transformations before the joiner or in the source qualifier itself. We can sort the ports in either ascending or descending order.
84. What happens if you don't sort the data but you selected the sorted input option?
If you select the sorted input option and don't sort the source data, then the session will fail.
c) Persistent Cache
Export the code as xml from one repository and import that xml into another repository or you can
create a deployment group in one repository and copy that deployment group to another repository.
If any of the components in the mapping are changed, for example sources, targets, reusable transformations or mapplets, then the mapping becomes impacted. It is shown with a yellow triangle symbol. We have to validate the mapping to make the impacted symbol disappear.
An invalid mapping is shown with a red ball symbol. In order to execute a mapping it should be in a valid state. For example, if you use a joiner transformation in a mapping and don't give any join condition, the mapping becomes invalid.
A mapping parameter represents a constant value; generally we use mapping parameters when values change frequently between runs. A mapping parameter starts with the $$ symbol.
91. Where are the parameter values stored?
Mapping parameter values are stored in a parameter file. A parameter file is a text file that contains a parameter header (section) followed by parameters and their values.
[GLOBAL]
[session_name]
[workflow name]
[folder_name.wf:workflowname.st:session_name] etc
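A hedged example of a small parameter file (the folder, workflow, session and parameter names here are hypothetical):
Eg:-
[INFA_FOLDER.WF:wf_daily_load.ST:s_m_load_customer]
$$LOAD_DATE=2024-01-01
$$DEPT_NO=10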
If a parameter value is not found in the parameter file, the session won't fail; it uses the default value. In the session log we can verify whether the parameter has picked up a value or not.
In the PowerCenter designer tool, in the menu bar select mappings and then parameters and variables. You can declare multiple parameters within a mapping.
A mapping variable is a variable whose value is stored in the Informatica repository at the end of a successful session run. Both parameters and variables start with the $$ symbol.
Parameter values are stored in a file, whereas variable values are stored in the repository. Parameter values can't change during a session run; variable values can change during a session run.
SetVariable()
SetMaxVariable()
SetMinVariable() etc
In the workflow manager tool, open the workflow designer, right click the session and select the view persistent values option.
Yes, we can reset the value of a variable. In the workflow manager tool, open the workflow designer, right click the session and select the view persistent values option; here you can see the reset option.
The tracing level represents the amount of information written into the session log. You can set the tracing level at the transformation level or the session level. To set it at the transformation level, double click on the transformation and select the properties tab; under the properties tab you can select the tracing level. To set it at the session level, double click on the session, select the config object tab and then select the override tracing option.
Terse: the lowest amount of information compared to all other tracing levels; logs just status information and errors.
Normal: logs initialization information, status information and rows skipped due to transformation errors.
Verbose Initialization: normal tracing information plus the names of the data and index cache files.
Verbose Data: all the information you can get: status, initialization, errors, and column by column and row by row values.
We use verbose data if we want to debug our code. Verbose data gives more information compared to all other tracing levels.
This option is available at session level under config object. The default value for this option is 1024
bytes. We need to alter the value for this option when the length of the record in the flat file exceeds
1024 characters. If the length of the row in a flat file is 2000 characters then we need to set line
sequential buffer length as 2000.
A flat file is a notepad/text file that contains organised data. There are two different types of flat files.
a) Fixed Width: In this file each and every field is defined with a fixed length. If the value of a field exceeds the field length, the exceeded portion moves into the next field's value.
b) Delimited: In this file each and every field is separated by some symbol. In real time we mostly see comma and pipe delimited flat files.
Informatica can handle flat files with multiple delimiters; we can specify a list of delimiters one after another. For example, if the data uses both the , and | symbols as separators, we need to specify both delimiters.
When you drag flat file source into a mapping, along with source, source qualifier comes automatically.
In the source qualifier properties tab you need to select "Currently Processed File Name" option. Then
you can connect this port to target table.
To skip one header row, you need to set the property number of initial rows to skip to 1. You can set this option at the session level or under the source analyzer tool.
To skip three header rows, you need to set the property number of initial rows to skip to 3, again at the session level or under the source analyzer tool.
To skip a footer, read the footer value into the first port, use a filter transformation and then you can skip the footer row.
110. What are the characteristics of a data warehouse?
A data warehouse is subject-oriented, integrated, time-variant and non-volatile.
https://youtu.be/3-kZaJOtcfs
111. How do you load data from multiple flat files at the same time?
Use the file list or indirect file method. At the session level you need to set the source file type to Indirect. As the source file name you need to give a file that contains the list of all the other files.
112. What are the prerequisites to use file list or file indirect method?
d) All files should have same data types for the columns
113. What is the most frequently used row delimiter in a flat file?
The newline (line feed) character is the most frequently used row delimiter in flat files.
115. In how many ways can you get the source structure of a flat file?
If the file is already available, we can directly select the import from file option under the source analyzer tool. If the file does not exist, we need to select the create option under the source analyzer.
116. In how many ways can you get the target structure of a flat file?
If the file is already available, we can directly select the import from file option under the target designer tool. If the file does not exist, we need to select the create option under the target designer.
117. How can you eliminate duplicate records from a flat file?
You need to use a sorter transformation; under the properties tab you need to select the distinct option.
The default value of Stop On Errors is zero: if any error comes it is written into the session log by default and the session will succeed.
To fail the session on the first error, you need to set Stop On Errors to 1. This property is available under the session's config object tab.
Task Developer
Worklet Designer
Workflow Designer
If we create a session under the task developer tool, it is automatically reusable and is available in the navigation bar under the sessions folder.
If we create a session under the workflow designer tool, it is automatically non-reusable; you cannot use this session in another workflow and it is restricted to that workflow only.
If you create any task under the task developer it is reusable, and you can use that task in n number of workflows. If you create any task under the workflow designer it is non-reusable, i.e. restricted to that workflow only.
Command
Event Wait
Event Raise
Timer
Decision
Assignment
Control
Generally the E-mail task is used at the end of the workflow to send a mail to the business about the status of today's data loads.
The Command task is used to execute unix commands or to call a shell script.
The Timer task is used to make the process wait until a particular time; at that time the timer task completes. For example, you have two sessions, and after one session succeeds you need to wait 1 minute and then start the other session. In this scenario we can use a timer of 1 minute.
The Decision task is used to avoid multiple link conditions. For example, if you want to run another task only after 10 sessions succeed, we can use a single decision task after the 10 sessions.
You need to check the property, Fail Parent If this Task Fails.
Double click on that task and check the property Disable the task
Yes, we can convert a non-reusable task to a reusable task: double click on the task and select the reusable check box in the top corner.
Parallel: From start icon in the workflow, we can connect multiple tasks, all these will run at the same
time.
Sequential: From start icon in the workflow, we can connect one task after another task sequentially.
140. Where do you set the parameter file?
You can give the parameter file name and path either at the session level or at the workflow level. Sometimes we also pass the parameter file name directly with the pmcmd command.
The default commit interval is 10000. Once 10000 rows reached to target it will apply commit operation.
We have the flexibility to increase or decrease commit interval value.
142. Will the session log be overwritten when you run the session a second time?
Yes, by default the session log is overwritten whenever we run a session, unless you save session logs by number of runs or by timestamp.
1. By Runs: It keeps the session logs for the number of previous runs you specify.
2. By Timestamp: It creates a new session log every time you run the session.
144. What might be the reasons your session failed with "table or view doesn't exist"?
Possible reasons include: the table is not created in the database, the relational connection used in the session points to a different database or schema, the table owner/schema name is not specified correctly, or the connection user doesn't have privileges on the table.
145. What happens to the third session if the second session fails in a workflow that contains 3 sessions?
By default the third session will execute even if the second session fails, and the workflow will show as succeeded, unless you specify the fail parent if this task fails option or add link conditions.
146. How do you make one task in a workflow run only after another task succeeds?
We need to connect the tasks within the workflow using links. On the link we can set a condition such as $task_name.Status = SUCCEEDED; just double click on the link and it opens a dialog box. If you set this, the second task starts only when the first task succeeds.
A domain is a collection of nodes and services. You configure the domain details at the time of Informatica server installation. You can use the PowerCenter administration console to get more details about your environment.
A node is a logical representation of a machine. In real time we can see multiple nodes in a single environment: one node acts as the master gateway node and all other nodes work as worker nodes.
149. What are the main services in PowerCenter?
Repository Service: It is responsible for retrieving data from and inserting data into the repository database.
Integration Service: It is responsible for running workflows and talks to the repository service.
When you connect to the repository using a user id and password, the service manager is responsible for authentication and authorization. The service manager is one of the components of the PowerCenter domain.
We can know the current value of the sequence generator transformation without running the session: from the navigation bar, disconnect from the folder and connect again, open the mapping, double click on the sequence generator transformation and select the properties tab; there you can see the current value.
The reset option is available under the properties tab of the sequence generator transformation. When you select the reset option, every time you run the session the sequence generator starts with the same value. In real time we set this option if the target table is truncate and load.
154. What happens if you connect the same sequence generator to two targets?
When you connect the same sequence generator transformation to two different targets, it gives different values to each target. For example, if it assigns values 1,2,3,4 to the first target then for the second target it assigns values 5,6,7,8.
155. What happens if you connect only the CURRVAL port from the sequence generator?
If you connect only the CURRVAL port from the sequence generator transformation, it assigns the same current value to all the rows. The CURRVAL port comes automatically when you create a sequence generator, and you can't delete this port from the transformation.
The union transformation won't eliminate duplicates; it is equivalent to the UNION ALL operation in Oracle. If you want to remove duplicate rows you need to place a sorter transformation after the union and enable the distinct option, which is available under the properties tab of the sorter transformation.
You can connect n number of sources to a union transformation. All the sources should have the same number of columns and the same data types. If one of the sources has fewer columns, we can create a dummy port and use it while connecting to the union transformation.
The target pre SQL option is available at the session level: double click on the session, select the mappings tab, select the target on the left side and you can see this option on the right side. If you want to execute any SQL statement before loading into your target you can use this. In real time we use this option to drop indexes before loading data into the target table.
159. What is target post SQL?
The target post SQL option is available at the session level: double click on the session, select the mappings tab, select the target on the left side and you can see this option on the right side. If you want to execute any SQL statement after loading into your target you can use this. In real time we use this option to recreate indexes after loading data into the target table.
There is a predefined option at the session level to send a mail when the session fails. Double click on the session and select the components tab; there is a property "On Failure E-mail". You can select either a reusable or a non-reusable e-mail task.
We can fail a session explicitly by calling the ABORT() function in the mapping, for example from an expression transformation. If you want to fail a session when a date is invalid, in the expression you create an output port and set its value like this:
IIF(IS_DATE(EXP_DATE,'YYYYMMDD')!=1,ABORT("Invalid Date"))
Using repository manager, we can compare folders from different repositories. Connect to both the
repositories using login credentials. After this you can select same folder from each repository and
compare. It gives result similarities and differences from both the folders.
Up to Informatica version 8, the lookup is passive. From version 9 onward it can be active. By default it is passive in version 9 as well; if you want to make it active, select the check box at the time of creation.
When you click stop, the Integration Service immediately stops reading but continues processing and writing data to the target, and the process waits until the commit is applied on the target. Abort also stops reading and continues processing and writing data to the target for the commit, but if the commit doesn't happen within 60 seconds it immediately kills the writer thread.
166. What is the difference between reusable transformation and mapplet?
You can't link multiple transformations under transformation developer as reusable. Mapplets allow you
to link multiple transformations in order to build a reusable business logic. You need to use
transformation developer tool to create reusable transformation. You need to use mapplet designer tool
to create a mapplet.
167. Can you tell 4 output files that informatica generates during session running?
Session Log
Workflow Log
Error Log
Bad File
Yes, we can return multiple ports from an unconnected lookup indirectly: concatenate multiple ports into a single port and select that port as the return port. Once we get this value, we can break it back into multiple ports using an expression transformation.
The disadvantage of bulk load is that you cannot recover the data, because the database writes the data directly into the data files and does not write it to the redo log. To recover data, the data should also be available in the redo log.
171. Can you use bulk load if indexes exist on the target table?
Bulk load is not supported if you have defined indexes on your target table. To use bulk load, drop the indexes before the load and recreate them after loading data into the target table.
A code page represents a character set. A character set represents alphabets, digits and special characters. There are different types of code pages: ASCII, EBCDIC, UTF etc. Source and target data fall under some code page.
We give the code page while creating relational connections. If you want to write data into the database beyond plain alphabets, digits and special characters, for example non-English characters, then we need to change the code page accordingly. But your target database should also support that code page.
A surrogate key is an artificial key, which is not coming from the source. We are going to generate
surrogate key values using sequence generator transformation.
Pushdown optimization means pushing the handling of business logic to the database side instead of handling it at the Informatica server level. When we use this option, internally it creates SQL for the transformations in the mapping and fires it on the database.
a) Source side pushdown optimization: from the source through the upstream transformations, it creates SQL and fires it on the database.
b) Target side pushdown optimization: from the target through the downstream transformations, it creates SQL and fires it on the database.
c) Full pushdown optimization: from sources to targets, it creates SQL and fires it on the database.
PMCMD is a command line utility used to run workflows from the command line. Generally we use this command in unix shell scripts. There are different options available with this command; you can also set the parameter file in the command itself.
A worklet is a group of reusable tasks. You can link multiple tasks within a worklet, either in parallel or sequentially. The tasks might be session, email, command, decision, timer, control etc.
You can create a worklet under the worklet designer tool, which is available in the PowerCenter workflow manager. When you create a worklet you can see a start icon similar to the workflow icon.
We use a worklet when we want to execute the same set of tasks in multiple workflows to fulfil a particular piece of business logic. In future, if you want to add one more task you need not touch all the workflows; just add it in one place under the worklet designer and the change automatically reflects in all the workflows. It saves a lot of build time.
Yes, you can include one worklet in another worklet; this is known as nesting worklets. In the worklet designer tool select the insert option and then select the worklet you want to include.
We can't run a worklet without a workflow even though a start symbol exists for the worklet as well. If you want to execute a worklet you need to place it in a workflow, similar to the other tasks in the workflow.
183. Can you run a workflow without a session?
Yes, we can run a workflow without a session task. We can run any of the other tasks (email, command, timer, control, event wait etc.) within the workflow.
Throughput represents the number of rows read from the source per second and the number of rows written to the target per second. It shows the number of bytes along with the number of rows.
Using the workflow monitor we can view the load statistics: how many rows were read from the source, how many rows were written, and the failure and success statuses. Using this we can also see the history of previous runs.
Gantt Chart view: This view displays workflow run details in chronological (time) format.
Task view: This view displays workflow run details in report format.
187. What is the difference between applied rows and affected rows?
Applied rows means the number of rows the writer applied to the target, and affected rows means the number of rows actually inserted, updated or deleted in the target.
188. In how many ways can you create ports?
There are different ways to create ports. One way is to double click the transformation, go to the ports tab and select the add a new port icon; another option is to directly drag ports from another transformation.
Using this option you can propagate attribute names, data types and sizes from one transformation through the rest of the flow. Generally we use this option during project enhancements.
Under workflow manager tool, in the menu bar select connections, select database, give connection
name, userid and password. Generally we can create relational connections only for databases.
The default condition for the filter transformation is TRUE. If we connect a filter and don't give any condition, whatever rows enter the filter move on to the next transformation or target.
Eg:- REPLACECHR(0,'abcd','ac','fx')
Output: fbfd (REPLACECHR treats 'ac' as a set of characters and replaces each of them with the first character of 'fx')
Eg:- REPLACESTR(0,'abcd','ac','fx')
Output: abcd (REPLACESTR looks for the whole string 'ac', which does not occur in 'abcd', so nothing is replaced)
We can compare two dates in Informatica using the predefined function DATE_COMPARE(). This function takes two arguments: if date1 is earlier than date2 it returns -1, if date1 is later than date2 it returns 1, and if date1 and date2 are equal it returns 0.
We can remove spaces at the beginning and end of a string using the LTRIM and RTRIM functions. You can apply both functions at the same time on a string: LTRIM(RTRIM(string)).
We can check date is valid or not using predefined function IS_DATE(). This function takes two
arguments, first argument is date and the second argument is format of the date. If it is valid date then it
returns 1 else it returns 0.
SYSDATE gives the current time of the node at the moment it is evaluated; you can observe it changing while loading huge data. SESSSTARTTIME represents a constant value during the entire session run: the time at which the session started.
We can remove new line character in a character column using replacestr() function. In real time we face
this kind of issue, if the column value contains huge amount of data.
REPLACESTR(1,STRING_PORT,CHR(10),CHR(13),'')
History Load: First we load the entire data into the DWH, from the beginning of the business till date.
Incremental Load or Delta Load: Incremental data includes the daily data loaded after the history load, or sometimes weekly data.