
Sunday, July 22, 2012

Setting environment variables for the parallel engine in DataStage


You set environment variables to ensure smooth operation of the parallel engine. Environment variables are set on a per-project basis from the Administrator client.

Procedure
1. Click Start > All Programs > IBM Information Server > IBM InfoSphere DataStage and QualityStage Administrator, and log in to the Administrator client.
2. Click the Project tab, and select a project.
3. Click Properties.
4. On the General tab, click Environment.
5. Set the values for the environment variables as necessary.

 

Environment variables for the parallel engine
Set each of the environment variables listed below if your environment meets the conditions described for that variable.


Network settings

1.      APT_IO_MAXIMUM_OUTSTANDING
If the system connects to multiple processing nodes through a network, set the APT_IO_MAXIMUM_OUTSTANDING environment variable to specify the amount of memory, in bytes, to reserve for the parallel engine on every node for TCP/IP communications. The default value is 2 MB.
If TCP/IP throughput at that setting is so low that processors sit idle, raise the value by doubling it until performance improves. However, if the system is paging, or if your job fails with messages about broken pipes or broken TCP connections, the setting is probably too high.
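
The doubling approach can be sketched in Python. This is only an illustration: the function name and the number of steps are made up for the example, and it assumes the 2 MB default means 2 x 1024 x 1024 bytes. It lists the values you might try one at a time, watching for paging or broken-pipe messages after each change.

```python
# Assumption: the 2 MB default is read as 2 * 1024 * 1024 bytes.
DEFAULT_BYTES = 2 * 1024 * 1024


def doubling_candidates(start=DEFAULT_BYTES, steps=4):
    """Return successive values to try, each double the previous one."""
    values = []
    current = start
    for _ in range(steps):
        current *= 2
        values.append(current)
    return values


if __name__ == "__main__":
    # Print the candidate settings in the order you would test them.
    for value in doubling_candidates():
        print(f"APT_IO_MAXIMUM_OUTSTANDING={value}")
```

Stop at the first value where throughput improves, and step back if the symptoms of a too-high setting appear.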

 
2.      APT_RECVBUFSIZE
If any of the stages within a job has a large number of communication links between nodes, specify this environment variable with the TCP/IP buffer space that is allocated for each connection. Specify the value in bytes.
The APT_SENDBUFSIZE and APT_RECVBUFSIZE values are the same. If you set one of these environment variables, the other is automatically set to the same value. These environment variables override the APT_IO_MAXIMUM_OUTSTANDING environment variable that sets the total amount of TCP/IP buffer space that is used by one partition of a stage.

 
3.      APT_SENDBUFSIZE
If any of the stages within a job has a large number of communication links between nodes, specify this environment variable with the TCP/IP buffer space that is allocated for each connection. Specify the value in bytes.
The APT_SENDBUFSIZE and APT_RECVBUFSIZE values are the same. If you set one of these environment variables, the other is automatically set to the same value. These environment variables override the APT_IO_MAXIMUM_OUTSTANDING environment variable that sets the total amount of TCP/IP buffer space that is used by one partition of a stage.

 
4.      Transform library
If you are working on a non-NFS MPP system, set the APT_COPY_TRANSFORM_OPERATOR environment variable to true to enable Transformer stages to work in this environment. IBM® InfoSphere® DataStage® and QualityStage™ Administrator users must have the appropriate privileges to create project directory paths on all the remote nodes at runtime. This environment variable is set to false by default.

 
5.      Job monitoring
By default, the job monitor in the InfoSphere DataStage and QualityStage Director uses time-based monitoring, and the job monitor window is updated every five seconds. You can instead base monitoring on size, so that the window is updated after a given number of new entries. To do so, set a value for the APT_MONITOR_SIZE environment variable. If you override the default setting for the APT_MONITOR_TIME environment variable, the setting of the APT_MONITOR_SIZE environment variable is also overridden.
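
One way to read the precedence rule above is shown in this small Python sketch. It is a model of the description only, not the engine's actual logic. The function name is hypothetical, and the settings are passed as a plain dictionary of strings.

```python
def effective_monitoring(env):
    """Describe which job-monitor trigger applies for a dict of settings."""
    if env.get("APT_MONITOR_TIME"):
        # An explicit time setting overrides any size setting.
        return ("time", int(env["APT_MONITOR_TIME"]))
    if env.get("APT_MONITOR_SIZE"):
        # Size-based: update after this many new entries.
        return ("size", int(env["APT_MONITOR_SIZE"]))
    # Neither variable set: time-based updates every five seconds.
    return ("time", 5)


if __name__ == "__main__":
    print(effective_monitoring({"APT_MONITOR_SIZE": "5000"}))
```

Under this reading, size-based monitoring only takes effect when APT_MONITOR_TIME is left at its default.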

 
6.      Detailed information about jobs
To produce detailed information about jobs as they run, set the APT_DUMP_SCORE value to True. By default, this environment variable is set to False.

 
7.      C++ compiler
The environment variables APT_COMPILER and APT_LINKER are set at installation time to point to the default locations of the supported compilers. If your compiler is installed on a different computer from the parallel engine, you must change the default environment variables for every project by using the Administrator client.

 
8.      Temporary directory
By default, the parallel engine uses the C:\tmp directory for some temporary file storage. If you do not want to use this directory, set the TMPDIR environment variable to the path of a different directory.
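
Before pointing TMPDIR at a new location, it is worth confirming that the directory exists and is writable. The following Python sketch is one way to check. The helper name is hypothetical, and the script is run separately on the engine host; it is not part of DataStage.

```python
import os
import tempfile


def usable_temp_dir(path):
    """Return True if path is an existing directory that accepts a new file."""
    if not os.path.isdir(path):
        return False
    try:
        # Create and immediately discard a scratch file as a write test.
        with tempfile.TemporaryFile(dir=path):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Fall back to the platform's default temp directory if TMPDIR is unset.
    candidate = os.environ.get("TMPDIR", tempfile.gettempdir())
    print(candidate, "usable" if usable_temp_dir(candidate) else "NOT usable")
```

Run it as the same operating system user that runs the parallel jobs, since permissions may differ between accounts.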


 

Specifying C++ compiler settings



If you are using one of the compilers listed, then you must make the specified changes before you create and run parallel jobs. Complete this task for every development engine computer or any production engine computer where jobs will be recompiled. The compiler settings must be specified for each project that requires them.

Procedure


 For GCC:

Use the IBM® InfoSphere® DataStage® and QualityStage™ Administrator to configure compiler settings for each project.

APT_COMPILEOPT: -O -fPIC -Wno-deprecated -c -m32
APT_COMPILER: g++
APT_LINKER: g++
APT_LINKOPT: -shared -m32 -Wl,-Bsymbolic,--allow-shlib-undefined
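
Before saving compiler settings such as these, you may want to confirm that the configured compiler and linker can actually be found on the engine computer. The Python sketch below does this with the standard `shutil.which` lookup. The function name and the dictionary layout are assumptions made for the example; the Administrator client itself does not provide this check.

```python
import shutil


def missing_tools(settings, search_path=None):
    """Return the variable names whose executable cannot be found."""
    missing = []
    for name in ("APT_COMPILER", "APT_LINKER"):
        command = settings.get(name, "")
        # shutil.which resolves bare names via the search path and also
        # accepts absolute paths such as /opt/SUNWspro/bin/CC.
        if not command or shutil.which(command, path=search_path) is None:
            missing.append(name)
    return missing


if __name__ == "__main__":
    gcc_settings = {"APT_COMPILER": "g++", "APT_LINKER": "g++"}
    print(missing_tools(gcc_settings) or "compiler and linker found")
```

An empty result means both executables were located; any name in the list points to a setting that needs attention.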


 For Solaris 9 and 10:

Use the IBM InfoSphere DataStage and QualityStage Administrator to configure compiler settings for each project.

APT_COMPILEOPT: -c -O -xarch=v9 -library=iostream -KPIC
APT_COMPILER: /opt/SUNWspro/bin/CC
APT_LINKER: /opt/SUNWspro/bin/CC
APT_LINKOPT: -xarch=v9 -library=iostream -G -KPIC

 
 For Microsoft Visual Studio .NET 2003:

1. Select Start > IBM InfoSphere Information Server > IBM InfoSphere DataStage and QualityStage Administrator, and then log in to the Administrator client.
2. Select Projects > Properties > Environment.
3. Select Parallel > Compiler > APT_COMPILEOPT and delete the string -W/Zc:wchar_t- from the end of the existing setting in the Value field.
4. Select Parallel > Compiler > APT_LINKOPT and delete the string -W/Zc:wchar_t- from the end of the existing setting in the Value field.


 For Microsoft Visual Studio 2005 Professional Edition C++:
No additional changes are required to run the parallel engine transforms when this compiler is installed before you install IBM InfoSphere Information Server.

 
 For Microsoft Visual Studio .NET 2005 Express® Edition C++ and Microsoft Visual Studio .NET 2008 Express Edition C++:

The compiler settings for this version are correctly set by default; however, ensure that you also installed the SDK and that its resources are available to the system environment.
From the Windows Control Panel, select System > Advanced > Environment Variables > System Variables.
Set the LIB environment variable to the location of the library directory for the SDK. For example, for Microsoft Visual Studio .NET 2008 Express Edition C++, a typical location is C:\Program Files\Microsoft SDKs\Windows\v6.0A\Lib.
Set the INCLUDE environment variable to the location of the include directory for the SDK. For example, for Microsoft Visual Studio .NET 2008 Express Edition C++, a typical location is C:\Program Files\Microsoft SDKs\Windows\v6.0A\Include.
Note: The LIB and INCLUDE environment variable names must be specified in uppercase characters.
Save the settings and restart the computer.

Note: You must restart the computer for the environment variable settings to take effect.
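
The LIB and INCLUDE requirements above (uppercase names, paths that point at the SDK directories) can be checked with a short Python sketch. The function name is hypothetical, and the check works on a plain dictionary of settings that you supply. Python's own `os.environ` uppercases variable names on Windows, so it would hide a wrongly cased name.

```python
import os


def sdk_variable_problems(env):
    """List problems with the LIB and INCLUDE entries in a settings dict."""
    problems = []
    for name in ("LIB", "INCLUDE"):
        if name not in env:
            # Catch a wrongly cased name such as "Lib" or "include".
            wrong_case = [key for key in env if key.upper() == name]
            if wrong_case:
                problems.append(f"{wrong_case[0]} should be spelled {name}")
            else:
                problems.append(f"{name} is not set")
        # Each value may hold several directories separated by semicolons.
        elif not any(os.path.isdir(part) for part in env[name].split(";") if part):
            problems.append(f"{name} does not point to an existing directory")
    return problems


if __name__ == "__main__":
    sdk = r"C:\Program Files\Microsoft SDKs\Windows\v6.0A"
    print(sdk_variable_problems({"LIB": sdk + r"\Lib", "INCLUDE": sdk + r"\Include"}))
```

An empty list means both variables are named correctly and at least one directory in each value exists.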





Enjoy the simplicity.
Atul Singh
