
DATA Q Technologies

DataStage is a product of Ascential Software.

DataStage is an ETL (Extraction, Transformation, Loading) tool for data warehousing. It is a powerful software suite used to develop and run DataStage jobs. A DataStage job can extract data from different sources and then cleanse, integrate, and transform the data according to requirements.

1) Which DataStage version are you using?
Ans : Version 7.1
2) What data sources did you use in ETL (Extraction Transformation Loading)?
Ans : Oracle, text files
3) What are the components of DataStage?
Ans : 1. Administrator 2. DataStage Manager 3. DataStage Director 4. DataStage Designer
4) What are the uses of DataStage Director?
Ans : DataStage Director is the client component that validates, runs, schedules, and monitors jobs on the DataStage Server.
5) What are the uses of DataStage Manager?
Ans : To import metadata information, and to export and import DataStage jobs for backup purposes.
6) What are the uses of DataStage Designer?
Ans : To develop DataStage jobs that extract data and load it to the target.
7) What is meant by Repository?
Ans : A server area where projects and jobs are stored. The repository also holds definitions for the data, stages, etc.
8) What is an active stage?
Ans : A stage in a job that carries out processing. Ex: Transformer, Aggregator, Sort.
9) What is a passive stage?
Ans : A stage in a job that carries out no processing. Ex: Sequential File stage, ODBC stage, Hashed File stage, Oracle OCI stage.
10) What is a process?
Ans : A process is the flow from a passive stage through an active stage to another passive stage.
11) What are the uses of hashed files?
Ans : Hashed files are used as reference files to improve job performance.
12) How to improve performance when using hashed files?
Ans : By enabling cache memory.
13) Do you know what the latest version of DataStage is?
Ans : Version 7.5
14) What is the back end you are using for data storage?
Ans : Oracle, version 8i.
15) What is a job batch?
Ans : A job batch is a group of jobs, or separate instances of the same job, that are set up to run together.
16) What is the maximum number of Transformers you have used in a job?
Ans : 4
17) How do you measure performance?
Ans : By the number of rows processed per second.

DATASTAGE FAQs


In DataStage, how can we meet this type of requirement?

The sales data will be extracted into 2 flat files.

Here is the naming convention suggested:

SALE_HEADER_XXXXX_YYYYMMDD.PSV
SALE_LINE_XXXXX_YYYYMMDD.PSV

XXXXX = LVM sequence number, to ensure uniqueness and continuity of file exchanges.

Caution: there will be an increment to implement.
YYYYMMDD = LVM date of file creation.

Compression and delivery to: SALE_HEADER_XXXXX_YYYYMMDD.ZIP and SALE_LINE_XXXXX_YYYYMMDD.ZIP

If we run the job the first time, the file names are sale_header_1_<currentdate> and sale_line_1_<currentdate>; on the next run they are sale_header_2_<currentdate> and sale_line_2_<currentdate>; on the run after that, sale_header_3_<currentdate> and sale_line_3_<currentdate>.

We want the increment in that XXXXX place.

Can anyone give a suggestion on this issue?
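One possible approach (a minimal sketch in Unix shell, assuming a small counter file is allowed to hold the last used sequence; all file and path names here are placeholders, not from the original question):

    #!/bin/sh
    # read the last sequence number, increment it, and persist it for the next run
    SEQ_FILE=/tmp/sale_seq.txt                       # hypothetical counter file
    SEQ=`cat "$SEQ_FILE" 2>/dev/null || echo 0`
    SEQ=`expr $SEQ + 1`
    echo "$SEQ" > "$SEQ_FILE"
    XXXXX=`printf "%05d" $SEQ`                       # zero-pad to match XXXXX
    TODAY=`date +%Y%m%d`                             # YYYYMMDD part
    HEADER_FILE="SALE_HEADER_${XXXXX}_${TODAY}.PSV"
    LINE_FILE="SALE_LINE_${XXXXX}_${TODAY}.PSV"
    echo "$HEADER_FILE $LINE_FILE"
    # the two names could then be passed to the DataStage job as parameters, e.g. via dsjob -param

The script would typically run as a before-job step or as part of the wrapper that launches the job.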


While using an Oracle database as target or source, why do we use the OCI stage rather than the ODBC stage?

Using OCI we can transfer the data faster than with ODBC.

OCI also supports OraBulk and plug-ins; using these, quick bulk loads are possible.

What is the default size of the hash file?

2 GB

What type of files are created when we create a hashed file?

Three files are created: .dat, .type, and .over.
How to improve the performance of a hashed file?

You can improve the performance of a hashed file by:

1. Preloading the hashed file into memory --> this can be done by enabling the preload option in the hashed file output stage.

2. Write caching --> data is written into a cache before being flushed to disk. You can enable this to ensure that rows are written to disk in order from the cache, instead of in the order in which individual rows are written.

3. Pre-allocating --> estimating the approximate size of the hashed file so that the file does not need to be split too often after write operations.

How to implement slowly changing dimensions in Datastage?


Slowly changing dimensions are a common problem in data warehousing. For example: there is a customer called Lisa in a company ABC and she lives in New York. Later she moves to Florida. The company must now modify her address. In general there are 3 ways to solve this problem:

Type 1: The new record replaces the original record; no trace of the old record remains. Type 2: A new record is added to the customer dimension table, so the customer is treated essentially as two different people. Type 3: The original record is modified to reflect the change.

In Type 1 the new value overwrites the existing one, which means no history is maintained: the history of where she stayed before is lost, but it is simple to use.

In Type 2 a new record is added, so both the original and the new record are present, and the new record gets its own primary key. The advantage of Type 2 is that historical information is maintained, but the dimension table grows, and storage and performance can become a concern.
Type 2 should only be used if it is necessary for the data warehouse to track the historical changes.

In Type 3 there are 2 columns, one to hold the original value and the other to hold the current value. For example, a new column is added which shows the original address as New York and the current address as Florida. This keeps some part of the history and the table size is not increased. But one problem is that when the customer moves from Florida to Texas, the New York information is lost. So Type 3 should only be used if the changes will only occur a finite number of times.

========
We can handle SCDs in the following ways:
Type 1: Just use "Insert rows Else Update rows" or "Update rows Else Insert rows" in the update action of the target.
Type 2: Use the following steps:
a) Use one hashed file to look up the target.
b) Take 3 instances of the target.
c) Give different conditions depending on the process.
d) Give different update actions in the target.
e) Use system variables like Sysdate and Null.
========
To develop SCD Type 2, use the update action "Insert new rows only" in the target.

For this you need to maintain the primary key as a composite key in the target table. It is better to use a timestamp column as one of the key columns in the target table.

What is the difference between symmetric multiprocessing (SMP) and massively parallel processing (MPP)?
How to remove locked jobs in DataStage?
Go to Director -> Tools -> Clear the Job Resources option and note the PID number there. Then:
1. Go to Administrator. 2. Properties. 3. Command. 4. Type DS.TOOLS. 5. You will get a list of options; select the 4th option. 6. Select the 6th option. 7. Then enter the PID number.
How to use an Excel file as input in DataStage?
Save the input Excel file as a .csv file and use a Sequential File stage to read it.
==========

You can also use an Excel file as input by importing the .xls file:

Step 1 --> Go to Administrative Tools -> Data Sources (ODBC) --> System DSN. Click the Add button and configure the corresponding .xls file as a system DSN. Make sure the workbook contains the name of your Excel sheet.

Step 2 --> Import the Excel file into DataStage as an ODBC table definition.

Step 3 --> Use an ODBC stage as the input stage.

You should be able to use the Excel file very effectively.

How can I identify the duplicate rows in a sequential or comma-delimited file?
The case is: the source has 4 fields, such as agent id, agent name, etc. Our requirement is that the ID should not be repeated. So how can I identify the duplicate rows, set a flag, and send the rejects to the specified reject file? The source system's data is given to us directly; that is why we are getting these duplicates. If a primary key had already been set up it would have been very easy.

We have to choose between DataStage and Informatica.

Which of these is better?
Our DW is approx. 3-5 GB.
The environment is NT.
We have INFA 7.2 and the business we just took over has DataStage 7.5. Here's one side of the story:
A customer recently migrated from Ascential DataStage 7.5 to Informatica PowerCenter 7.
The customer identified that Informatica outperformed Ascential DataStage in the following areas:

· Performance - Informatica PowerCenter was 2.5x faster than DataStage 7.5.

· Developer productivity - Informatica was 6 times more productive (it took 6 DataStage mappings compared to 1 Informatica mapping).

· Informatica mappings required no custom coding; Ascential DataStage required 20% custom coding.

· Vendor independence - The underlying DataStage engine is being replaced by DB2. The customer has to upgrade for future support, and upgrades require the customer to pay for services. The customer is an Oracle/SQL Server shop with no DB2 expertise.

I would like a suggestion from someone who has worked on both tools.
I am new to DataStage. We need to load an Excel file which has both ARABIC and ENGLISH text. Here my source is IBM DB2. If I want to display the Arabic content in my browser I need to store it in UNICODE format in the database. Can anyone tell me how to store the content of an Excel file as UNICODE (NLS)? In DataStage the option is provided only for sequential files.

To pass parameters to the job we can use a Unix shell program.

But I do not know exactly how to do that.

Please let me know the detailed steps to pass the parameters through a shell script.
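One way to do this (a hedged sketch in Unix shell; the project, job and parameter names below are placeholders, not from the original question) is to call the dsjob command-line utility and pass each parameter with -param:

    #!/bin/sh
    # run a DataStage job from a shell script and pass job parameters to it
    PROJECT=MyProject                 # hypothetical project name
    JOBNAME=LoadSales                 # hypothetical job name
    dsjob -run \
          -param SRC_FILE=/data/in/sales.psv \
          -param RUN_DATE=`date +%Y%m%d` \
          -jobstatus "$PROJECT" "$JOBNAME"
    RC=$?
    echo "dsjob finished with exit status $RC"

The -jobstatus option makes dsjob wait for the job and return its status as the exit code, so the script can react to failures.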
I am trying to execute a batch file which has a dsjob command to run a DataStage job from the CLI.
I am getting error code 81011 (project not found, but the project is there).
I can run other jobs; I am getting this problem only for one particular batch.
How to eliminate the errors in a delimited flat file?
Can a Sequential File stage read its path from another sequential file in which the data is stored?
I have defined all the user-defined environment variables and assigned values to them, but I am not able to read the encrypted password. If it is used only in the server job it can be read, but not from the sequencer; it says password error.
What are the different types of links in DataStage?
There are 3 different links in DataStage:
1. Stream link, meaning a straight link.
2. Reference link, which acts like a lookup.
3. Reject link, used in parallel jobs.
=======================

There are also 2 link-handling stages in DataStage:

1. Link Partitioner stage

2. Link Collector stage

The Link Partitioner takes the data from a single input link and diverts it to a maximum of 64 output links at the same time.

The Link Collector stage collects the data from those links back into a single output link before a passive stage.

What are the Datastage strings?


What is MetaStage? Explain.

Basically it is Ascential's metadata repository.


What is DataStage Parallel Extender?
It is a newer version of DataStage called PX. It processes data using partitioning/collecting methods (algorithms) and is very fast compared to the older (server) version. Partitioning can be defined via the config file, which stores the number of nodes and resource pool information. There are various new stages added in the PX version, for example Data Set, File Set, Row Generator, Lookup stage, and many more.
======
Parallel Extender provides parallel processing of data extraction and transformation applications.
There are two types of parallel processing: 1) pipeline parallelism and 2) partition parallelism.

Is it possible to rename (change the name of) a column of a table?


What is atomicity?
What is the exact definition of RDBMS and ORDBMS?

Yes, we can alter a column name in a table using the following query: ALTER TABLE <table> RENAME COLUMN <old_name> TO <new_name>.
If a DataStage job has aborted after, say, 1000 records, how do we continue the load from the 1000th record after fixing the error?
What are static hashed files and dynamic hashed files?
Hashed files have a default size established by their modulus and separation when you create them, and this can be static or dynamic.
Overflow space is only used when data grows beyond the size reserved for one of the groups (sectors) within the file. There are as many groups as specified by the modulus.
What does separation option in static hash-file mean?
What is the difference between a sequential file and a data set? When to use the Copy stage?
A sequential file stores a small amount of data, with any extension such as .txt, whereas a data set stores a huge amount of data and opens only with the extension .ds.
====
The Sequential File stage stores a small amount of data with any extension in order to access the file, whereas a Data Set is used to store a huge amount of data and opens only with the extension .ds. The Copy stage copies a single input data set to a number of output data sets. Each record of the input data set is copied to every output data set. Records can be copied without modification, or you can drop or change the order of columns.
What is the purpose of the Exception activity in DataStage 7.5?
It is used to catch exceptions raised while running the job.
Where do we use the Link Partitioner in a DataStage job? Explain with an example.
We use the Link Partitioner in DataStage server jobs. The Link Partitioner stage is an active stage which takes one input and allows you to distribute partitioned rows to up to 64 output links.
How to kill a job in DataStage?
By killing the respective process ID.
How to parameterize a field in a sequential file? I am using DataStage as the ETL tool and a sequential file as the source.
We cannot parameterize a particular field in a sequential file; instead we can parameterize the source file name, e.g. #FILENAME#.
How to drop the index before loading data into the target, and how to rebuild it, in DataStage?
If the size of the hash file exceeds 2 GB, what happens? Does it overwrite the current rows?
It overwrites the file.
Is it possible for two users to access the same job at a time in DataStage?
No, it is not possible for two users to access the same job at the same time. DS will produce the following error: "Job is accessed by other user".
How to find errors in a job sequence?
Using DataStage Director we can find the errors in a job sequence.
What is job control? How can it be used? Explain with steps.
JCL stands for Job Control Language; it is used to run a number of jobs at a time, with or without using loops. Steps: click Edit in the menu bar and select 'Job Properties', then enter the parameters as parameter / prompt / type, e.g. STEP_ID / STEP_ID / string, Source / SRC / string, DSN / DSN / string, Username / uname / string, Password / pwd / string. After editing the above, go to the Job Control tab, select the jobs from the list box, and run the job.
How can we call a routine in a DataStage job? Explain with steps.
Routines are used for implementing business logic. They are of two types: 1) Before-job subroutines and 2) After-job subroutines. Steps: double-click on the Transformer stage, right-click on any one of the mapping fields and select the [ds routines] option; within the edit window give the business logic and select either of the options (Before / After Subroutine).
What are the most important aspects that a beginner must consider when doing his first DS project?
He should be good at data warehousing concepts and he should be familiar with all the stages.
What are the different types of lookups in DataStage?

- Lookup file set stage - generally used with the Lookup stage

- Hashed file lookup

- You can also implement a "lookup" using the Merge stage

Where are the flat files actually stored? What is the path?

Flat files store the data, and the path can be given in the General tab of the Sequential File stage.

How to find the number of rows in a sequential file?

Using the row count system variable.

What is the difference between the Merge stage and the Join stage?

Join can have a maximum of two input data sets; Merge can have more than two input data sets.

Merge and Join stage differences:

1. Merge has reject links.

2. Merge can take multiple update links.

3. If you use it for comparison, then the first matching data will be the output, because it uses the update links to extend the primary details coming from the master link.

Someone was saying that Join does not support more than two inputs, while Merge supports two or more inputs (one master and one or more update links). I would say that is highly incomplete information. The fact is that Join does support two or more input links (left, right and possibly intermediate links). But yes, if you are talking about a full outer join then more than two links are not supported.

Coming back to the main question of the difference between the Join and Merge stages, the other significant differences that I have noticed are:

1) Number of reject links

(Join) does not support reject links.

(Merge) has as many reject links as update links (if there are n input links then 1 will be the master link and n-1 will be update links).

2) Data selection

(Join) There are various ways in which data can be selected, e.g. we have different types of joins: inner, outer (left, right, full), cross join, etc. So you have different selection criteria for dropping/selecting a row.

(Merge) Data in the master record and update records are merged only when both have the same value for the merge key columns.
What is the difference between 'Validated OK' and 'Compiled' in DataStage?

When you compile a job, it ensures that basic things like all the important stage parameters have
been set, mappings are correct, etc. and then it creates an executable job.

You validate a compiled job to make sure that all the connections are valid. All the job
parameters are set and a valid output can be expected after running this job. It is like a
dry run where you don't actually play with the live data but you are confident that things
will work.

When we say "Validating a Job", we are talking about running the Job in the "check only" mode.
The following checks are made:

- Connections are made to the data sources or data warehouse.


- SQL SELECT statements are prepared.
- Files are opened. Intermediate files in Hashed File, UniVerse, or ODBC stages that use the local
data source are created, if they do not already exist.

What are the environment variables in DataStage? Give some examples.

These are variables used at the project or job level. We can use them to configure the job, i.e. associate the configuration file (without this you cannot run a parallel job), increase the sequential or data set read/write buffer size, and so on.

ex: $APT_CONFIG_FILE

Like the above, there are many environment variables. Go to job properties and click on "Add environment variable" to see most of them.

What is the purpose of using keys, and what is the difference between a surrogate key and a natural key?

We use keys to provide relationships between the entities (tables). By using a primary and foreign key relationship, we can maintain the integrity of the data.

The natural key is the one coming from the OLTP system.

The surrogate key is an artificial key which we create in the target DW. We can use these surrogate keys instead of the natural key. In SCD Type 2 scenarios, surrogate keys play a major role.

How do you do Usage Analysis in DataStage?

1. If you want to know whether some job is part of a sequence, then in the Manager right-click the job and select Usage Analysis. It will show all the job's dependents.
2. To find how many jobs are using a particular table.

3. To find how many jobs are using a particular routine.

Like this, you can find all the dependents of a particular object.

It is nested: you can move forward and backward and see all the dependents.

How to remove duplicates in a server job?

1) Use a hashed file stage, or

2) use the sort command in UNIX (as a before-job subroutine) and reject duplicated records using the -u option (see the sketch below), or

3) use a Sort stage.
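A minimal sketch of option 2 in Unix shell (the file names are placeholders):

    # keep one copy of each row; sort -u drops the duplicates
    sort -u input.txt > deduped.txt
    # separately list the rows that occurred more than once, e.g. for a reject file
    sort input.txt | uniq -d > dups.txt

Note that this compares whole rows; if duplicates are defined only by a key column, the hashed file or Sort stage approach is the more direct route.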

Will DataStage consider the second constraint in the transformer once the first condition is satisfied (if link ordering is given)?

A: Yes.

* What are constraints and derivations?


* Explain the process of taking a backup in DataStage.
* What are the different types of lookups available in DataStage?

Constraints are used to check for a condition and filter the data. Example: Cust_Id<>0 is set as a constraint, and it means that only those records meeting this condition will be processed further.

Derivation is a method of deriving the fields, for example if you need to get a SUM, AVG, etc.

What is a project? Specify its various components?

You always enter Datastage through a Datastage project. When you start a Datastage client you
are prompted to connect to a project. Each project contains:

• Datastage jobs.
• Built-in components. These are predefined components used in a job.
• User-defined components. These are customized components created using the Datastage
Manager or Datastage Designer

Briefly describe the various client components.

There are four client components:

DataStage Designer. A design interface used to create DataStage applications (known as jobs). Each job specifies the data sources, the transforms required, and the destination of the data. Jobs are compiled to create executables that are scheduled by the Director and run by the Server.

DataStage Director. A user interface used to validate, schedule, run, and monitor DataStage jobs.

DataStage Manager. A user interface used to view and edit the contents of the Repository.

DataStage Administrator. A user interface used to configure DataStage projects and users.

What are the steps involved in the development of a job in DataStage?

The steps required are:

Select the data source stage depending on the source, for example flat file, database, XML, etc.

Select the required stages for the transformation logic, such as Transformer, Link Collector, Link Partitioner, Aggregator, Merge, etc.

Select the final target stage where you want to load the data, whether it is a data warehouse, data mart, ODS, staging area, etc.

What is the meaning of file extender in DataStage server jobs? Can we run a DataStage job from one job to another job, and where is that file data stored?

File extender means adding columns or records to an already existing file in DataStage.

We can run one DataStage job from another job in DataStage.

What is the difference between the DRS and ODBC stages?

The DRS and ODBC stages are similar, as both use Open Database Connectivity to connect to a database. Performance-wise there is not much of a difference. We use the DRS stage in parallel jobs.

How can I convert server jobs into parallel jobs?

You can't convert a server job into a parallel job; you have to rebuild the whole job.

What is the difference between in-process and inter-process row buffering?

In-process is where the server transfers only one row at a time to the target, and inter-process means that the server sends a group of rows to the target table. Both are available on the Tunables tab of the Administrator client component.

Can you convert a snowflake schema into a star schema?

Yes, we can convert it by attaching one hierarchy to the lowest level of another hierarchy.

Is it possible to move data from an Oracle warehouse to a SAP warehouse using the DataStage tool?

We can use the DataStage Extract Pack for SAP R/3 and the DataStage Load Pack for SAP BW to transfer the data from Oracle to the SAP warehouse. These plug-in packs are available with DataStage version 7.5.

If a DataStage job aborts after, say, 1000 records, how do we continue the job from the 1000th record after fixing the error?

By specifying checkpointing in the job sequence properties. If we then restart the sequence, the job will start by skipping the records up to the failed record. This option is available in the 7.5 edition.

What is OCI?

If you mean by Oracle Call Interface (OCI), it is a set of low-level APIs used to interact with
Oracle databases. It allows one to use operations like logon, execute, parses etc. using a C or C++
program

What is hashing algorithm and explain briefly how it works?

Hashing is key-to-address translation. This means the value of a key is transformed into a disk
address by means of an algorithm, usually a relative block and anchor point within the block. It's
closely related to statistical probability as to how well the algorithms work.

It sounds fancy but these algorithms are usually quite simple and use division and remainder
techniques. Any good book on database systems will have information on these techniques.
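As a toy illustration of the division-and-remainder idea (a hedged sketch in Unix shell; the key and modulus values are made up):

    # the remainder of key / modulus picks the group (bucket) the record lands in
    MODULUS=7          # number of groups in the hashed file
    KEY=4312           # numeric form of the record key
    GROUP=`expr $KEY % $MODULUS`
    echo "key $KEY hashes to group $GROUP of $MODULUS"

Real hashed-file algorithms first reduce the key string to a number and use a carefully chosen modulus, but the principle is the same.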

Interesting to note that these approaches are called "Monte Carlo Techniques" because the
behavior of the hashing or randomizing algorithms can be simulated by a roulette wheel where
the slots represent the blocks and the balls represent the records (on this roulette wheel there are
many balls not just one).
Is it possible to call one job from another job in server jobs?

We cannot call one job within another in DataStage; however, we can write a wrapper to access the jobs in a stated sequence. We can also use a sequencer to run a series of jobs.

1. What about System variables?


2. How can we create Containers?
3. How can we improve the performance of Datastage?
4. What are the Job parameters?
5. What is the difference between routine and transform and function?
6. What are all the third party tools used in Datastage?
7. How can we implement Lookup in Datastage Server jobs?
8. How can we implement Slowly Changing Dimensions in Datastage?.
9. How can we join one Oracle source and Sequential file?.
10. What is iconv and oconv functions?
11. Difference between Hashfile and Sequential File?
12. Maximum how many characters we can give for a Job name in Datastage?

If I add a new environment variable in Windows, how can I access it in DataStage?

You can add it in the Designer window, under the job properties: there you can add a new environment variable or use an existing one.

2] You can view all the environment variables in the Designer; check Job Properties. You can add and access the environment variables from Job Properties.

What are the enhancements made in DataStage 7.5 compared with 7.0?

Many new stages were introduced compared to DataStage version 7.0. In server jobs we have the Stored Procedure stage and Command stage, and the Generate Report option was added in the File tab. In job sequences many activities like Start Loop, End Loop, Terminate Loop and User Variables were introduced. In parallel jobs the Surrogate Key stage and Stored Procedure stage were introduced.

What is a data set? And what is a file set?

I assume you are referring to the Lookup File Set; it is used only with lookup stages. Data set: DataStage parallel extender jobs use data sets to manage data within a job. You can think of each link in a job as carrying a data set. The Data Set stage allows you to store the data being operated on in a persistent form, which can then be used by other DataStage jobs. File set: DataStage can generate and name exported files, write them to their destination, and list the files it has generated in a file whose extension is, by convention, .fs. The data files and the file that lists them are called a file set. This capability is useful because some operating systems impose a 2 GB limit on the size of a file, and you need to distribute files among nodes to prevent overruns.

How does the hashed file do a lookup in server jobs? How does it compare the key values?

A hashed file is used for two purposes: 1. to remove duplicate records, and 2. as a reference for lookups. Each record in the hashed file has three parts: a hashed key, a key header, and a data portion. By using the hashing algorithm and the key value, the lookup is faster.

What are the differences between DataStage 7.0 and 7.5 in server jobs?

There are a lot of differences: many new stages are available in DS 7.5, for example the CDC stage, the Stored Procedure stage, etc.

How to handle the rejected rows in DataStage?

We can handle them by using constraints and storing them in a file or DB.

2] We can handle rejected rows in two ways with the help of constraints in a Transformer: 1) by ticking the Reject Row option against the constraint in the Transformer properties, or 2) by using REJECTED in the expression editor of the constraint. Create a hashed file as temporary storage for rejected rows, create a link and use it as one of the outputs of the transformer, and apply either of the two steps above on that link. All the rows which are rejected by all the constraints will go to the hashed file.

What are the ORABULK and BCP stages?

ORABULK is used to load bulk data into a single table of a target Oracle database.

BCP is used to load bulk data into a single table for Microsoft SQL Server and Sybase.

How is DataStage 4.0 functionally different from the Enterprise Edition now? What are the exact changes?

There are a lot of changes in DS EE: the CDC stage, Stored Procedure stage, etc.

What is the difference between DataStage and Informatica?

The main difference between DataStage and Informatica is SCALABILITY: Informatica is more scalable than DataStage.

In my view DataStage is also scalable; the difference lies in the number of built-in functions, which makes DataStage more user-friendly.

DataStage from staging to the MDW is only running at 1 row per second! What do we do to remedy this?

I am assuming that there are too many stages, which is causing the problem, and providing the solution accordingly.

In general, if you have too many stages (especially transformers and hashed-file lookups), there is a lot of overhead and the performance degrades drastically. I would suggest you write a query instead of doing several lookups. It may seem embarrassing to have a tool and still write a query, but that is best at times.

If there are too many lookups being done, ensure that you have appropriate indexes while querying. If you do not want to write the query and instead use intermediate stages, ensure that you eliminate data properly between stages so that data volumes do not cause overhead. So there might be a re-ordering of stages needed for good performance.

Other things that could generally be looked into:

1) For massive transactions, set the hashing size and buffer size to appropriate values to perform as much as possible in memory, so there is no I/O overhead to disk.

2) Enable row buffering and set an appropriate size for the row buffer.

3) It is important to use appropriate objects between stages for performance.

What is the User Variables activity? When and how is it used? Explain with a real example.

Using the User Variables activity we can create variables in a job sequence; these variables are available to all the activities in that sequence. Usually this activity is placed at the start of the job sequence.

There are three different types of user-created stages available for PX.
What are they? Which would you use? What is the disadvantage of using each type?

These are the three different stages: i) Custom ii) Build iii) Wrapped

What is the exact difference between the Join, Merge and Lookup stages?

The three stages differ mainly in the memory they use.

DataStage doesn't know how large your data is, so it cannot make an informed choice whether to combine data using a Join stage or a Lookup stage. Here's how to decide which to use:

If the reference datasets are big enough to cause trouble, use a join. A join does a high-speed sort on the driving and reference datasets. This can involve I/O if the data is big enough, but the I/O is all highly optimized and sequential. Once the sort is over, the join processing is very fast and never involves paging or other I/O.

Unlike the Join and Lookup stages, the Merge stage allows you to specify several reject links, as many as there are update links.

Can anyone tell me how to extract data from more than one heterogeneous source, for example a sequential file, Sybase and Oracle, in a single job?

Yes, you can extract data from two heterogeneous sources in DataStage using the transformer stage; it is simple, you just need to form a link between the two sources in the transformer stage.

2] You can convert all heterogeneous sources into sequential files and join them using Merge, or you can write a user-defined query in the source itself to join them.

Can we use a shared container as a lookup in DataStage server jobs?

We can use a shared container as a lookup in server jobs.

Wherever we need to use the same lookup in multiple places, we develop the lookup in a shared container and then use the shared container as the lookup.

How can I specify a filter command for processing data while defining sequential file output data?

We have something called an after-job subroutine and a before-job subroutine, with which we can execute UNIX commands. Here we can use the sort command or a filter command, for example as sketched below.
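A hedged example of such a filter step in Unix shell (sales.psv is a hypothetical pipe-delimited file; exactly where the command is configured depends on your job design and DataStage version):

    # drop comment lines, then sort on the first pipe-delimited column
    grep -v '^#' sales.psv | sort -t'|' -k1,1 > sales_filtered.psv

The resulting file can then be read by the Sequential File stage as usual.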

What validations do you perform after creating jobs in the Designer?


What are the different types of errors you faced during loading, and how did you solve them?

Check for parameters.

Also check whether the input files exist, whether the input tables exist, and check the usernames, data source names, passwords, and so on.

What is the difference between DataStage and DataStage TX?

It's a difficult question to answer, but one thing I can tell you is that DataStage TX is not an ETL tool and it is not a new version of DataStage 7.5. TX is used for ODS sources; this much I know.

If data is partitioned in your job on key 1 and then you aggregate on key 2, what issues could arise?

The data will be partitioned on both the keys; it will hardly take any more time for execution.

If you're running 4-way parallel and you have 10 stages on the canvas, how many processes does DataStage create?

The answer is 40: you have 10 stages and each stage can be partitioned and run on 4 nodes, which makes the total number of processes generated 40.

How can you do an incremental load in DataStage?

You can create a table where you store the last successful refresh time for each table/dimension.
Then, in the source query, take the delta between the last successful refresh time and sysdate; that should give you the incremental load (a sketch follows).
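A hedged sketch of the idea in Unix shell (the table, column and timestamp values are placeholders; normally the last refresh time would come from the control table itself):

    # build a delta extract query from the stored last-refresh timestamp
    LAST_REFRESH="2005-01-01 00:00:00"
    QUERY="SELECT * FROM SALES WHERE LAST_UPDATED > TO_DATE('${LAST_REFRESH}','YYYY-MM-DD HH24:MI:SS')"
    echo "$QUERY"
    # the string could be handed to the job as a parameter, e.g. dsjob -run -param SRC_QUERY="$QUERY" ...

After a successful run, the job (or a small after-job step) updates the stored refresh time to the run's start time.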

Does the Enterprise Edition only add parallel processing for better performance?
Are any stages/transformations available in the Enterprise Edition only?

• DataStage Standard Edition was previously called DataStage and DataStage Server Edition.
• DataStage Enterprise Edition was originally called Orchestrate, then renamed to Parallel Extender when purchased by Ascential.
• DataStage Enterprise: server jobs, sequence jobs, parallel jobs. The Enterprise Edition offers parallel processing features for scalable high-volume solutions. Designed originally for Unix, it now supports Windows, Linux and Unix System Services on mainframes.
• DataStage Enterprise MVS: server jobs, sequence jobs, parallel jobs, MVS jobs. MVS jobs are designed using an alternative set of stages that are generated into COBOL/JCL code and transferred to a mainframe to be compiled and run. Jobs are developed on a UNIX or Windows server and transferred to the mainframe to be compiled and run.

The first two versions share the same Designer interface but have a different set of design stages depending on the type of job you are working on. Parallel jobs have parallel stages but also accept some server stages via a container. Server jobs only accept server stages; MVS jobs only accept MVS stages. There are some stages that are common to all types (such as aggregation), but they tend to have different fields and options within the stage.

How can you implement complex jobs in DataStage?

A complex design means having more joins and more lookups; such a job design is called a complex job. We can easily implement any complex design in DataStage by following simple tips, also in terms of increasing performance. There is no limit on the number of stages used in a job, but for better performance use at most about 20 stages in each job; if it exceeds 20 stages, go for another job. Use not more than 7 lookups per transformer; otherwise include one more transformer.

How can you implement slowly changing dimensions in DataStage? Explain.

2) Can you join a flat file and a database in DataStage? How?

Yes, we can join a flat file and a database in an indirect way. First create a job which populates the data from the database into a sequential file, and name it, say, Seq_First. Take the flat file you have and use a Merge stage to join these two files. You have various join types in the Merge stage, like pure inner join, left outer join, right outer join, etc.; you can use whichever one suits your requirements.

What is the meaning of the following?

1) If an input file has an excessive number of rows and can be split up, then use standard logic to run jobs in parallel.

2) Tuning should occur on a job-by-job basis. Use the power of the DBMS.

The question is not clear, but I will try to answer something:

If you have SMP machines you can use the IPC, Link Collector and Link Partitioner stages for performance tuning; splitting a large input file and running several job instances in parallel is sketched below.

If you have cluster or MPP machines you can use parallel jobs.
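A hedged illustration of the split-file idea in Unix shell (the file names and line count are placeholders):

    # break a large input file into 500,000-line chunks: part_aa, part_ab, ...
    split -l 500000 bigfile.psv part_
    # each chunk can then be fed to a separate instance of the same job, e.g.
    #   dsjob -run -param SRC_FILE=part_aa ...
    #   dsjob -run -param SRC_FILE=part_ab ...

This is presumably the kind of "standard logic to run jobs in parallel" the tip refers to: the same job, invoked once per chunk.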

What is the meaning of "Try to have the constraints in the 'Selection' criteria of the jobs itself; this will eliminate the unnecessary records even getting in before joins are made"?

It probably means that you can put the selection criteria in the WHERE clause, i.e. whatever data you need to filter, filter it out in the SQL rather than carrying it forward and then filtering it out.

Constraints are nothing but restrictions on data. Here it is a restriction on the data at entry itself; as stated, it will avoid unnecessary data getting in.

This means: try to improve performance by avoiding the use of constraints wherever possible and instead filter while selecting the data itself, using a WHERE clause. This improves performance.

How can we ETL an Excel file to a data mart?

Take the source file (Excel file) in .csv format and apply the conditions which satisfy the data mart.
What is the difference between server jobs and parallel jobs?

Server jobs. These are available if you have installed Datastage Server. They run on the
Datastage Server, connecting to other data sources as necessary.

Parallel jobs. These are only available if you have installed Enterprise Edition. These run on
Datastage servers that are SMP, MPP, or cluster systems. They can also run on a separate z/OS
(USS) machine if required.

What is merging? And how do we use Merge?

Merge is nothing but a stage that applies a filter condition while combining data.

2] Merge is a stage that is available in both parallel and server jobs. The Merge stage is used to join two tables (server/parallel) or two tables/datasets (parallel). Merge requires that the master table/dataset and the update table/dataset be sorted. Merge is performed on a key field, and the key field is mandatory in the master and update dataset/table.

How do we use NLS in DataStage? What are the advantages of NLS? Where can we use it? Explain briefly.

By using NLS function we can do the following


- Process the data in a wide range of languages
- Use Local formats for dates, times and money
- Sort the data according to the local rules

If NLS is installed, various extra features appear in the product.


For Server jobs, NLS is implemented in Datastage Server engine
For Parallel jobs, NLS is implemented using the ICU library.

What is APT_CONFIG in DataStage?

DataStage understands the architecture of the system through this file (APT_CONFIG_FILE). For example, this file contains information about node names, disk storage, etc.

What is OCI? And how is it used in the ETL tool?

OCI doesn't mean the Orabulk data. It actually uses the "Oracle Call Interface" of Oracle to load the data. It is the lowest level of Oracle being used for loading the data.
How can I extract data from DB2 (on IBM iSeries) to the data warehouse via DataStage as the ETL tool? I mean, do I first need to use ODBC to create the connectivity and use an adapter for the extraction and transformation of data? Thanks so much if anybody can provide an answer.

We can extract the data in ETL from a DB2 stage.

What happens if the output of a hashed file is connected to a transformer? What error does it throw?

If you connect the output of a hashed file to a transformer, it will act like a reference; there are no errors at all. It can be used in implementing SCDs.

What is Version Control?

Version Control stores different versions of DS jobs, runs different versions of the same job, reverts to a previous version of a job, and views version histories.

What are the Repository tables in DataStage, and what are they for?

A data warehouse is a repository (centralized as well as distributed) of data, able to answer any ad-hoc, analytical, historical or complex queries. Metadata is data about data. Examples of metadata include data element descriptions, data type descriptions, attribute/property descriptions, range/domain descriptions, and process/method descriptions. The repository environment encompasses all corporate metadata resources: database catalogs, data dictionaries, and navigation services. Metadata includes things like the name, length, valid values, and description of a data element. Metadata is stored in a data dictionary and repository. It insulates the data warehouse from changes in the schema of operational systems. In the DataStage I/O and Transfer stage, under the Interface tab there are Input, Output and Transfer pages; you will have 4 tabs, and the last one is Build, under which you can find the TABLE NAME. The DataStage client components are: Administrator, which administers DataStage projects and conducts housekeeping on the server; Designer, which creates DataStage jobs that are compiled into executable programs; Director, used to run and monitor the DataStage jobs; and Manager, which allows you to view and edit the contents of the repository.

Question: Orchestrate vs DataStage Parallel Extender?


A: Orchestrate itself is an ETL tool with extensive parallel processing capabilities, running on the UNIX platform. DataStage used Orchestrate with DataStage XE (beta version of 6.0) to incorporate the parallel processing capabilities. Ascential then purchased Orchestrate, integrated it with DataStage XE and released a new version, DataStage 6.0, i.e. Parallel Extender.

Question: Differentiate Primary Key and Partition Key?


A: A Primary Key is a combination of unique and not null. It can be a collection of key values, called a composite primary key. A Partition Key is just a part of the Primary Key. There are several methods of partitioning, like Hash, DB2, Random, etc. While using Hash partitioning we specify the Partition Key.

Question: How do you execute a DataStage job from the command line prompt?
A: Using the "dsjob" command, as follows:
dsjob -run -jobstatus <projectname> <jobname>

Question: What are Stage Variables, Derivations and Constraints?


A: Stage Variable - an intermediate processing variable that retains its value during a read and does not pass the value into a target column.

Derivation - an expression that specifies the value to be passed on to the target column.

Constraint - a condition used to filter which rows are passed to the output link.

Question: What is the default cache size? How do you change the cache size if needed?
A: The default cache size is 256 MB. We can increase it by going into DataStage Administrator, selecting the Tunables tab and specifying the cache size there.

Question: What are the types of Hashed File?


A: Hashed Files are classified broadly into 2 types.
a) Static - subdivided into 17 types based on the primary key pattern.
b) Dynamic - subdivided into 2 types:
i) Generic
ii) Specific.

The default hashed file is "Dynamic - Type Random 30 D".

Question: Containers: Usage and Types?


A: Container is a collection of stages used for the purpose of Reusability.

There are 2 types of Containers.


a) Local Container: Job Specific
b) Shared Container: Used in any job within a project.

Question: Compare and Contrast ODBC and Plug-In stages?


A: ODBC: a) Poor Performance.
b) Can be used for Variety of Databases.
c) Can handle Stored Procedures.

Plug-In: a) Good Performance.


b) Database specific.(Only one database)
c) Cannot handle Stored Procedures

Question: How do you run a shell script within the scope of a DataStage job?
A: By using the "ExecSH" command in the Before/After job subroutine properties.

Question: How do you handle date conversions in DataStage? Convert mm/dd/yyyy format to
yyyy-dd-mm.
A: We use a) the "Iconv" function - internal conversion, and
b) the "Oconv" function - external conversion.
The function to convert mm/dd/yyyy format to yyyy-dd-mm is:
Oconv(Iconv(FieldName, "D/MDY[2,2,4]"), "D-YDM[4,2,2]")

Question: Types of Parallel Processing?


A: Parallel Processing is broadly classified into 2 types.
a) SMP - Symmetrical Multi Processing.
b) MPP - Massive Parallel Processing

Question: What does a Config File in parallel extender consist of?


A: Config file consists of the following.
a) Number of Processes or Nodes.
b) Actual Disk Storage Location

Question: Functionality of Link Partitioner and Link Collector?


A: Link Partitioner: It actually splits data into various partitions or data flows using
various partition methods.

Link Collector: It collects the data coming from partitions, merges it into a single data
flow and loads to target.

Question: What are Modulus and Splitting in a Dynamic Hashed File?


A: In a Hashed File, the size of the file keeps changing randomly.
If the size of the file increases it is called "Modulus".
If the size of the file decreases it is called "Splitting".

Question: Types of views in DataStage Director?


A: There are 3 types of views in DataStage Director:
a) Job View - dates of jobs compiled.
b) Log View - status of the job's last run.
c) Status View - warning messages, event messages, program-generated messages.
Question: What are the difficulties faced in using DataStage? Or what are the constraints
in using DataStage?

Question: How do you eliminate duplicate rows?

Question: How do we do the automation of dsjobs


Question: What are XML files, how do you read data from XML files, and what stage
is to be used?
Question: How do you catch bad rows from OCI stage?

Question: Why do you use SQL LOADER or OCI STAGE?

Question: Suppose there are a million records, did you use OCI? If not, then what stage
do you prefer?

Question: What are the often used Stages or stages you worked with in your last project?
A: A) Transformer, ORAOCI8/9, ODBC, Link-Partitioner, Link-Collector, Hash, ODBC,
Aggregator, Sort.

Question: How many jobs have you created in your last project?
A: 100+ jobs for every 6 months if you are in development; if you are in testing, around 40 jobs
for every 6 months, although it need not be the same number for everybody.

Question: Tell me the environment in your last projects


A: Give the OS of the server and the OS of the client of your most recent project.

Question: How do you merge two files in DS?


A: Either use the Copy command as a before-job subroutine if the metadata of the 2 files
is the same, or create a job to concatenate the 2 files into one if the metadata is different.

Question: Did you parameterize the job or hard-code the values in the jobs?
A: Always parameterize the job. Either the values come from Job Properties or
from a 'Parameter Manager', a third-party tool. There is no way you should hard-code
parameters in your jobs. The often-parameterized variables in a job are: DB DSN name,
username, password, and dates with respect to the data to be looked up against.

Question: Have you ever been involved in updating DS versions, like DS 5.X? If so, tell us
some of the steps you have taken in doing so.
A: Yes. The following are some of the steps I have taken in doing so:
1) Definitely take a backup of the whole project(s) by exporting the project as a
.dsx file.
2) Make sure you use the same parent folder for the new version, so that your
old jobs using hard-coded file paths continue to work.
3) After installing the new version, import the old project(s); you then have to
compile them all again. You can use the 'Compile All' tool for this.
4) Make sure that all your DB DSNs are created with the same names as the old ones.
This step is for moving DS from one machine to another.
5) In case you are just upgrading your DB from Oracle 8i to Oracle 9i, there is a
tool on the DS CD that can do this for you.
6) Do not stop the 6.0 server before the upgrade; the version 7.0 install process
collects project information during the upgrade. There is NO rework (recompilation of
existing jobs/routines) needed after the upgrade.

Question: What is the Hashed File stage and what is it used for?


A: Used for lookups; it is like a reference table. It is also used in place of ODBC or OCI
tables for better performance.

Question: What are Static Hashed files and Dynamic Hashed files?
A: As the names themselves suggest what they mean. In general we use Type 30 dynamic hashed
files. The data file has a default size of 2 GB, and the overflow file is used if the data
exceeds the 2 GB size.

Question: What is DS Manager used for - did you use it?


Question: What is DS Director used for - did you use it?
Question: What is DS Designer used for - did you use it?

Question: What versions of DS you worked with?


A: DS 7.0.2/6.0/5.2

Question: What other ETL's you have worked with?


A: Informatica and also DataJunction if it is present in your Resume.

Question: Did you work in UNIX environment?


A: Yes. One of the most important requirements.

Question: How good are you with your PL/SQL?


A: On the scale of 1-10 say 8.5-9

Question: Explain the differences between Oracle8i/9i?


Question: Do you know about the INTEGRITY/QUALITY stage?
Question: Do you know about METASTAGE?

Question: How did you connect to DB2 in your last project?


A: Using DB2 ODBC drivers.

Question: What are OConv () and Iconv () functions and where are they used?
A: IConv () - Converts a string to an internal storage format
OConv () - Converts an expression to an output format

Question: What are Routines and where/how are they written and have you written any
routines before?
A: Routines are stored in the Routines branch of the DataStage Repository, where you
can create, view or edit. The following are different types of routines:
1) Transform functions
2) Before-after job subroutines
3) Job Control routines

Question: If worked with DS6.0 and latest versions what are Link-Partitioner and Link-
Collector used for?
A: Link Partitioner - Used for partitioning the data.
Link Collector - Used for collecting the partitioned data.

Question: How did you handle reject data?


A: Typically a reject link is defined and the rejected data is loaded back into the data
warehouse. So a reject link has to be defined for every output link from which you wish to collect
rejected data. Rejected data is typically bad data, like duplicates of primary keys or null
rows where data is expected.

Question: What are other Performance tunings you have done in your last project to
increase the performance of slowly running jobs?
A:
• Staged the data coming from ODBC/OCI/DB2UDB stages or any database on the
server using Hash/Sequential files for optimum performance also for data
recovery in case job aborts.
• Tuned the OCI stage for 'Array Size' and 'Rows per Transaction' numerical values
for faster inserts, updates and selects.
• Tuned the 'Project Tunables' in Administrator for better performance.
• Used sorted data for Aggregator.
• Sorted the data as much as possible in DB and reduced the use of DS-Sort for
better performance of jobs
• Removed the data not used from the source as early as possible in the job.
• Worked with DB-admin to create appropriate Indexes on tables for better
performance of DS queries
• Converted some of the complex joins/business in DS to Stored Procedures on DS
for faster execution of the jobs.
• If an input file has an excessive number of rows and can be split-up then use
standard logic to run jobs in parallel.
• Before writing a routine or a transform, make sure that the required functionality is
not already available in one of the standard routines supplied in the sdk or ds
utilities categories.
• Constraints are generally CPU intensive and take a significant amount of time to
process. This may be the case if the constraint calls routines or external macros
but if it is inline code then the overhead will be minimal.
• Try to have the constraints in the 'Selection' criteria of the jobs itself. This will
eliminate the unnecessary records even getting in before joins are made.
• Tuning should occur on a job-by-job basis.
• Use the power of DBMS.
• Try not to use a sort stage when you can use an ORDER BY clause in the
database.
• Using a constraint to filter a record set is much slower than performing a SELECT
… WHERE….
• Make every attempt to use the bulk loader for your particular database. Bulk
loaders are generally faster than using ODBC or OLE.

Question: How did you handle an 'Aborted' sequencer?


A: In almost all cases we have to delete the data inserted by it from the DB manually,
fix the job, and then run the job again.

Question: What are Sequencers?


A: Sequencers are job control programs that execute other jobs with preset Job
parameters.

Question: How did you connect to DB2 in your last project?


A: Most of the time the data was sent to us in the form of flat files; the data is dumped
and sent to us. In some cases where we needed to connect to DB2 for lookups, for instance,
we used ODBC drivers to connect to DB2 (or) DB2-UDB, depending on the situation
and availability. Certainly DB2-UDB is better in terms of performance, as you know the
native drivers are always better than ODBC drivers. 'iSeries Access ODBC Driver
9.00.02.02' - ODBC drivers to connect to AS400/DB2.

Question: Describe the string functions in DS.


A:
Functions like [] -> the sub-string function, and ':' -> the concatenation operator.
Syntax: string [ [ start, ] length ]
or string [ delimiter, instance, repeats ]

Question: What will you do in a situation where somebody wants to send you a file and use
that file as an input or reference and then run the job?
A: A. Under Windows: use the 'WaitForFileActivity' under the Sequencers and then run
the job. Maybe you can schedule the sequencer around the time the file is expected to
arrive.
B. Under UNIX: poll for the file. Once the file has arrived, start the job or sequencer, depending
on the file.
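A hedged sketch of the UNIX polling approach (file, project and job names are placeholders):

    #!/bin/sh
    # wait until the expected file shows up, then kick off the job
    FILE=/data/in/SALES_TODAY.PSV
    while [ ! -f "$FILE" ]; do
        sleep 60            # check once a minute
    done
    dsjob -run -jobstatus MyProject LoadSales

In practice you would also add a timeout so the script does not wait forever.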

Question: How would you call an external Java function which is not supported by
DataStage?
A: Starting from DS 6.0 we have the ability to call external Java functions using a Java
package from Ascential. In this case we can even use the command line to invoke the
Java function, write the return values from the Java program (if any) to a file, and use that file
as a source in the DataStage job.

Question: What utility do you use to schedule the jobs on a UNIX server, other than
using Ascential Director?
A: Use the crontab utility along with a script that runs the job (for example via the dsjob
command or the DSExecute routine), with the proper parameters passed; a sketch follows.
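A hedged example (the script path, schedule and log file are placeholders):

    # crontab entry: start the load every night at 01:30 and append output to a log
    30 1 * * * /opt/scripts/run_loadsales.sh >> /var/log/loadsales.log 2>&1

where run_loadsales.sh would be a small wrapper that calls dsjob -run with the project, job name and parameters, as shown earlier.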

Question: What are the command line functions that import and export the DS jobs?
A: A. dsimport.exe- imports the DataStage components.
B. dsexport.exe- exports the DataStage components.

Question: How will you determine the sequence of jobs to load into the data warehouse?
A: First we execute the jobs that load the data into the dimension tables, then the fact tables,
then we load the aggregate tables (if any).

Question: The above might raise another question: why do we have to load the
dimension tables first, then the fact tables?
A: As we load the dimension tables, the (primary) keys are generated, and these
keys are foreign keys in the fact tables.

Question: Tell me one situation from your last project where you faced a problem and
how you solved it.
A: A. The jobs in which data was read directly from OCI stages were running extremely
slowly. I had to stage the data before sending it to the transformer to make the jobs run faster.
B. A job aborted in the middle of loading some 500,000 rows. We had the option of
either cleaning/deleting the loaded data and then running the fixed job, or running the job again
from the row at which the job had aborted. To make sure the load was proper, we opted for the former.

Question: Does the selection of 'Clear the table and Insert rows' in the ODBC stage send a Truncate statement to the DB, or does it do some kind of Delete logic?
A: There is no TRUNCATE on ODBC stages. 'Clear the table' issues a DELETE FROM statement. On an OCI stage such as Oracle, you do have both Clear and Truncate options. They are radically different in permissions (Truncate requires you to have alter table permissions, whereas Delete doesn't).
Question: How do you rename all of the jobs to support your new file-naming conventions?
A: Create an Excel spreadsheet with the new and old names. Export the whole project as a dsx. Write a Perl program which can do a simple rename of the strings by looking up the Excel file. Then import the new dsx file, probably into a new project for testing. Recompile all jobs. Be cautious that the job names may also be referenced in your job control jobs or Sequencer jobs, so you have to make the necessary changes to those Sequencers.
Question: When should we use an ODS?
A: DWHs are typically read-only and batch updated on a schedule, whereas ODSs are maintained in more real time and trickle-fed constantly.

Question: What is the difference between an operational data store (ODS) and a data warehouse?
Question: What are the difficulties faced in using DataStage? Or what are the constraints in using DataStage?
A: Many; it depends on the project and the transformation rules.
Question: How do you eliminate duplicate rows?
A: By creating a hashed file with a unique key in the Server edition, and by using the Remove Duplicates stage in PX.
Question: How do we do the automation of dsjobs?
A: By using a Batch job or Sequencer, which can be run through the scheduler (DataStage Director) or through external automation tools like Control-M.
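As a rough sketch of the Batch/job-control approach, the DataStage BASIC job-control API can attach, parameterise and run a job; the job name and parameter below are hypothetical, for illustration only:

* Minimal job-control sketch (assumes a job called "LoadCustomerDim" with a SourceDir parameter)
hJob = DSAttachJob("LoadCustomerDim", DSJ.ERRFATAL)
ErrCode = DSSetParam(hJob, "SourceDir", "/data/in")
ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
ErrCode = DSWaitForJob(hJob)
Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
If Status = DSJS.RUNFAILED Or Status = DSJS.CRASHED Then
   Call DSLogWarn("LoadCustomerDim failed", "JobControl")
End
ErrCode = DSDetachJob(hJob)

The same logic can be wrapped in a script and scheduled from cron or Control-M.
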
Question: What are XML files and how do you read data from XML files, and what stage is to be used?
A: XML files can be read in DataStage by using the XML input stage.
Question: How do you catch bad rows from an OCI stage?
A: Place constraints in the Transformer stage and write the rejected rows to reject links/tables.
Question: Why do you use SQL LOADER/OCI STAGE?
A: To load data into an Oracle database.
Question: Suppose there are a million records, do you use OCI? If not, then what stage do you prefer?
A: That depends on the target database; using a native database stage is faster than using the ODBC stage.
Question: How do you populate source files?
A: From the source systems.

Question: How do you pass a filename as a parameter for a job?
A: From the Designer: define it under job properties -> parameters and reference it in the stage's file name property.

Question: How do you pass the parameter to the job sequence if the job is running at night?
A: Using scheduling tools or OS scripts via the command line.

Question: What happens if the job fails at night?
A: That depends on the problem. If the problem is with the data or the job design, manual action is required to fix it and re-run the job. If the problem is an environment breakdown, the job can be rescheduled using the scheduling tools.

Question: What is SQL tuning? How do you do it?
A: Tuning the SQL statements a job issues so they run efficiently, typically by examining the execution plan with EXPLAIN PLAN.

Question: What is the project life cycle and how do you implement it?
A: Go through the SDLC (Software Development Life Cycle) from any software engineering book: requirement gathering, gap analysis, design of the target model (logical and physical), design of the data flow, development, implementation.
Question: How will you call an external function or subroutine from DataStage?
A: External programs can be invoked through before/after subroutines (for example ExecSH); DataStage routines are called from Transformer derivations.
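As a rough sketch (the routine name and columns are hypothetical): a server routine written in DataStage BASIC is called from a derivation like any other function, while an external command is run through a before/after subroutine such as ExecSH.

* Hypothetical server routine 'TrimUpper' with one argument (Arg1), created in the Manager
Ans = Upcase(Trim(Arg1))

* In a Transformer output derivation it is then called as:
*    OutLink.CUST_NAME = TrimUpper(InLink.CUST_NAME)
* An external script can be run by choosing ExecSH as a before/after subroutine and
* passing the command line (for example "/scripts/archive_files.sh") as its input value.
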
Question: How do you track performance statistics and enhance them?
A: From the DataStage Director Monitor (rows per second and per-link row counts); enhance performance using the tuning techniques listed above.
Question: How do you do an Oracle 4-way inner join if there are 4 Oracle input files?
A: Use a single OCI/ODBC stage and join the tables.

Question: What is the order of execution done internally in the transformer, with the stage editor having input links on the left hand side and output links?
A: Lookups, stage variables, constraints and derivations.
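A small illustration of why that order matters (the column and variable names below are hypothetical): a stage variable is evaluated before the constraints and derivations, so both can safely reference it.

Stage variable (evaluated first):        svAmount = If IsNull(InLink.AMOUNT) Then 0 Else InLink.AMOUNT
Output link constraint (evaluated next): svAmount > 0
Column derivation (evaluated last):      OutLink.AMOUNT_USD = svAmount * InLink.FX_RATE
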
Question: Explain your last project and your role in it.
A: Project team member/developer.
Designed DataStage jobs based on the source-to-target mapping form.
Unit tested the jobs.
Performance tuning of jobs, and documentation.
Question: Explain the differences between Oracle 8i/9i?
A: Read the release notes.
Question: Do you know about the INTEGRITY/QUALITY stage?
A: They are used for integration and cleansing of data.
Question: Do you know about MetaStage?
A: MetaStage is used for documentation purposes and makes impact analysis easy.
Question: Difference between Hashfile and Sequential File?
A:
Hashed files are used as reference files, whereas sequential files are data files.
Hashed files are stored as buckets (keyed on a hash of the key columns), which makes searching fast.
Sequential files are stored as a single bucket and are read in sequential order.
Question: What are the Iconv and Oconv functions?
A:
Iconv converts a value from its external (display) format to DataStage's internal format - for example, a date string to an internal day number.
Oconv converts a value from the internal format back to an external format - for example, an internal day number to a formatted date string.
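For example, a small DataStage BASIC sketch (the date value and format codes are illustrative only):

InternalDate = Iconv("2006-01-15", "D-YMD[4,2,2]")    ;* external date string -> internal day number
DisplayDate  = Oconv(InternalDate, "D/MDY[2,2,4]")    ;* internal day number -> "01/15/2006"
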
Question: How can we join one Oracle source and a Sequential file?
A: In the Server edition, first unload the Oracle data to a sequential file and join the two sequential files using the Merge stage (or load it into a hashed file and use it as a lookup). In PX, use a Join, Lookup or Merge stage to combine the Oracle source and the sequential file directly.
Question: How can we implement Slowly Changing Dimensions in DataStage?
A: Slowly changing dimensions:
Type 1: When an attribute changes, the old value is overwritten with the new value.
Type 2: When an attribute changes, a new member record is created with an effective date.
Type 3: When an attribute changes, a new member attribute (column) is created, with an indicator.
These dimensions can be implemented with a delta process in DataStage.
Delta process: comparing the previous snapshot with the current snapshot.
Question: How can we implement Lookup in DataStage Server jobs?
A: Using hashed files as reference inputs to the Transformer (there is no separate Lookup stage in Server jobs).

Question: What are all the third party tools used with DataStage?
A: Shell scripts, Oracle procedures, external schedulers such as Control-M, etc.
Question: What is the difference between a routine, a transform and a function?
A: A transform is a single-expression function, whereas a routine is a multi-line function. Routines are executed outside of the job and are not compiled into the job executable; a transform is executed within the job because it is compiled into the job executable. Hence using a transform is faster than using a routine.
Question: How can we create Containers?
A: Select the part of the job which is to be made into a container and use "Construct Container" (Edit -> Construct Container).
Question: What about System variables?
A: System variables are read-only variables supplied by DataStage, such as @DATE, @TIME, @INROWNUM, @OUTROWNUM and @NULL, which can be used in derivations and constraints.
Question: How do we call a procedure (Oracle) from DataStage?
A: Through the OCI stage, using its before- or after-SQL properties.
Question: About the DataStage Architecture?
A: DataStage has two sides to its architecture: 1) design-time and 2) run-time. The design-time architecture includes the four client interfaces; the run-time architecture includes the job executables and the universe objects for the client/server links.

Question: What is parallel extender in DataStage?
A: PX uses a parallel architecture and splits the processing into many processes to speed up the work on the server.
Question: What are the interfaces you used?
A: Manager, Director and Designer.
1. What is a job parameter?
A: We use job parameters for flexible job design; a parameter can stand in for values such as a user account name and password, a file path, etc. Simply put, a job parameter is a run-time variable with a defined default value.
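For instance (a hypothetical sketch), a parameter named SourceDir could be defined under job properties with a default value and then referenced both in stage properties and in derivations:

* In a Sequential File stage the file name property can be set to:  #SourceDir#/customers.txt
* In a Transformer derivation the parameter is referenced by name:
OutLink.LOAD_PATH = SourceDir : "/customers.txt"
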
2. What is a hashed file?
A: We can use a hashed file as a source or a target, but it is mainly used for lookup operations, since there is no Lookup stage in server jobs.
3. What are the Link Partitioner and Link Collector?
A: The Link Partitioner splits a server job's data stream into multiple streams so the job can exploit SMP systems; the Link Collector merges those streams back into a single stream.
4. What is inter-process (IPC)?
A: To speed up communication between active stages we can use inter-process buffering (enable the buffer; the default buffer size is 128 KB).
5. How do you enhance the performance of server jobs?
A: By using inter-process buffering, or by using the Link Partitioner and Link Collector.
6. How do you call a stored procedure in server jobs?
A: Using the ODBC stage we can call a stored procedure.
7. What are the types of partitioning in the Link Partitioner?
A: 1. Round robin 2. Random 3. Modulus 4. Hash
8. What is NLS?
A: National Language Support; it means DataStage can support many different character sets.
9. What is the difference between server jobs and parallel jobs?
A: Server jobs run on the server engine, whereas parallel jobs run in SMP and MPP modes; a limitation is that parallel jobs run only on a UNIX server.
10. What is a job sequence and what are the types of triggers in a job sequence?
A: A job sequence is used to run server or parallel jobs in a defined order, and lets us control the flow by defining triggers on the links (unconditional, conditional - e.g. OK, Failed, Warnings, Custom expression - and 'otherwise').
11. What are the new features in version 7.5?
A: Start Loop, End Loop and Terminator activities (in job sequences), and the Surrogate Key stage.
12. What is a container and what are the types of containers?
A: A container is a template for reusability. There are two types: local and shared.
13. What is a routine and what are the types of routines?
A: A routine provides some predefined functionality. There are four kinds of before/after routines: before-job, after-job, before-stage and after-stage.
14. What is a plug-in stage?
A: A stage used to perform a special task that is not supported by the built-in stages.
15. What is a stage and what are the types of stages?
A: A stage processes or holds data; there are two types, active and passive.
16. What are the Merge and Join stages and what is the difference between them?
A: Merge is for merging two sequential files, whereas Join is for joining any two database sources.
17. What is a configuration file?
A: The configuration file describes the processing nodes and their resources (hostname, disk and scratch-disk paths), so the same parallel job can be run on different hardware or operating systems simply by pointing it at a different configuration file.