DataStage job run statistics (such as rows per second processed) do not update in the DataStage Designer or Director clients.
Check these details:
This section contains a quick series of diagnosis steps for those familiar with DataStage and Job Monitor. If more detail is needed for any step, refer to the more detailed instructions in the "Resolving the Problem" section.
Have the customer check and verify the following:
- Check the DataStage Director job log for jobs that do not show job run statistics, and confirm whether the following variable is defined:
APT_NO_JOBMON
If APT_NO_JOBMON is defined and set to a value of 1 or true, it disables job monitoring and process metadata reporting for parallel jobs.
- Confirm the JobMonApp process is up and running:
ps -ef | grep JobMonApp
- Confirm the default ports 13400 and 13401 are listening:
netstat -an | grep 134
- Check the job monitor log file for errors:
cat /ibm/InformationServer/Server/PXEngine/java/JobMonApp.log
- Confirm job monitor is set up to use ports 13400 and 13401:
cat /ibm/InformationServer/Server/PXEngine/etc/jobmon_ports
- If the job monitor log shows no errors but the job log reports "Failed to connect to JobMonApp on port 13401", update the jobmon_ports file to use 2 new ports that are not already in use. This requires a restart of JobMonApp.
- If the problem still occurs, confirm that the /etc/hosts file contains the following entry:
127.0.0.1 localhost
Without a localhost entry, Job Monitor will be unable to use the ports correctly.
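The APT_NO_JOBMON check in the first step can also be run against a copy of the job log saved as text. The following Python sketch is illustrative only: the function name jobmon_disabled is hypothetical, and it assumes the exported log lists environment variables as NAME=value lines. It applies the rule stated above, treating a value of 1 or true as disabling job monitoring.

```python
def jobmon_disabled(job_log_text: str) -> bool:
    """Return True if the log text sets APT_NO_JOBMON to 1 or true (any letter case).

    Hypothetical helper. Assumes variables appear as NAME=value, one per line;
    it inspects exported text only and does not query DataStage itself.
    """
    for line in job_log_text.splitlines():
        name, sep, value = line.strip().partition("=")
        if sep and name.strip() == "APT_NO_JOBMON":
            if value.strip().lower() in ("1", "true"):
                return True
    return False
```

A result of True means job monitoring was switched off for that run, so missing statistics are expected and the remaining checks are unnecessary for that job.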
Resolving the Problem:
DataStage jobs generate statistics (such as rows per second processed) that can be displayed on each link when a job is run from Designer. However, these statistics update only when the job monitor application, JobMonApp, is running. JobMonApp is started by the jobmoninit script, located in the directory:
- .../ibm/InformationServer/Server/PXEngine/java
Verify that JobMonApp is running
On Unix systems, enter the following command to confirm whether the JobMonApp process is running:
- ps -ef | grep JobMonApp
Sample output (this example was captured on a Windows server, so the paths are Windows paths):
- $ ps -ef | grep JobMonApp
SYSTEM 4964 4044 0 09:22:27 con 0:00 C:\IBM\InformationServer\ASBNode\apps\jre\bin\java -Xrs -classpath C:/IBM/InformationServer/Server/PXEngine/java/JobMonApp.jar;C:/IBM/InformationServer/Server/PXEngine/java/xerces/xmlParserAPIs.jar;C:/IBM/InformationServer/Server/PXEngine/java/xerces/xercesImpl.jar JobMonApp 13400 13401 -debug
dsadm 7788 4136 0 16:48:50 con 0:00 grep JobMonApp
However, on some platforms such as Solaris, the ps output may be truncated, and because JobMonApp appears at the end of the command string, the above command may not find a match even though JobMonApp is running. In this situation, look instead in the .jobmonpid file, located in the same directory as jobmoninit. The .jobmonpid file contains the process ID last used for JobMonApp. You can then query that process ID to see whether it is running. For example, if the .jobmonpid file contains 3174, enter the command:
- $ ps -ef | grep 3174
If JobMonApp is not running, then run the "jobmoninit" command script to restart it.
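Where ps output is truncated, the .jobmonpid check can be scripted. The following is a minimal Python sketch for POSIX systems; the helper names are hypothetical. It uses os.kill(pid, 0), which delivers no signal but fails if no process with that ID exists. A live process ID only shows that some process currently holds that ID, so confirm with ps -ef | grep on the ID that it really is JobMonApp.

```python
import os
from pathlib import Path


def is_pid_alive(pid: int) -> bool:
    """Return True if a process with this ID exists (POSIX only)."""
    if pid <= 0:
        return False  # 0 and negative values address process groups, not one process
    try:
        os.kill(pid, 0)  # signal 0: existence/permission check only, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def jobmonapp_pid_alive(pid_file: Path) -> bool:
    """Read the ID stored in .jobmonpid and check whether that process is alive."""
    try:
        pid = int(pid_file.read_text().strip())
    except (OSError, ValueError):
        return False  # file missing, unreadable, or not a number
    return is_pid_alive(pid)
```

On the installation used in the quick checks above, the file would be /ibm/InformationServer/Server/PXEngine/java/.jobmonpid; adjust the path for your own installation.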
Check for errors in JobMonApp.log
If JobMonApp is running but your jobs do not update statistics, the next place to check for errors is the JobMonApp.log file written to the above directory. Historical logs are also saved in the same directory. During a normal startup, JobMonApp requires that its two defined ports be available and not used by other programs. One port is used to communicate with the job, while the other is used to communicate with the DataStage engine. Both ports must be available, so if the startup messages show that one port is available but the other has a conflict, Job Monitor will not function correctly.
A normal startup log will appear as follows:
- WELCOME to the Job Mon Application.
Tue Jun 30 09:22:28 PDT 2009
Using ports: 13400 and 13401
A startup with port conflict will instead contain:
- Tue May 19 13:48:19 CDT 2009
Using ports: 13400 and 13401
Could not listen on port: 13400 Address already in use
If the failing port is the one used to communicate with the running job, an additional error may also appear in the job log:
- Failed to connect to JobMonApp on port 13401
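The log messages above can be checked mechanically. This Python sketch uses a hypothetical helper, summarize_jobmon_log, and matches only the message text shown in the samples above: it reports the ports named in the last "Using ports" line and any ports from "Could not listen on port" lines. An empty failed-port list does not rule out other errors in the log.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

_USING = re.compile(r"Using ports:\s*(\d+)\s+and\s+(\d+)")
_FAILED = re.compile(r"Could not listen on port:\s*(\d+)")


class StartupSummary(NamedTuple):
    ports: Optional[Tuple[int, int]]  # from the last "Using ports: A and B" line
    failed_ports: List[int]           # from "Could not listen on port: N ..." lines


def summarize_jobmon_log(log_text: str) -> StartupSummary:
    """Pull the configured ports and any port bind failures out of JobMonApp.log text."""
    using = _USING.findall(log_text)
    ports = (int(using[-1][0]), int(using[-1][1])) if using else None
    failed = [int(p) for p in _FAILED.findall(log_text)]
    return StartupSummary(ports, failed)
```

Applied to the port conflict sample above, the summary would list 13400 as a failed port while the configured pair remains 13400 and 13401.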
Resolving port conflicts for JobMonApp
To resolve port conflict issues for JobMonApp, use the following command to determine the current usage of each port used by job monitor:
- netstat -a | grep 134
If the ports are found, they should have a status of "LISTENING" (some platforms display it as LISTEN). If the status is CLOSE_WAIT or something else, an older instance of DataStage or JobMonApp may not have released the port successfully. Some operating systems have commands to force the release of a port or to kill the application holding it, but in some cases a system restart may be needed to free the port.
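As a complement to netstat, a quick test is whether a port can be bound at all. The Python sketch below is an approximation, not a replacement for netstat: port_is_free is a hypothetical helper, and the behavior described is for Linux, where a bind on 127.0.0.1 fails with EADDRINUSE if another process is already listening on that port. Because SO_REUSEADDR is deliberately not set, a port still tied up by lingering connections may also be reported as busy. Run it while JobMonApp is stopped; otherwise JobMonApp's own ports will, correctly, show as busy.

```python
import errno
import socket


def port_is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if a TCP socket can bind to host:port right now.

    Returns False on EADDRINUSE; any other bind error (for example a
    permission error) is re-raised so it is not mistaken for a conflict.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError as exc:
            if exc.errno == errno.EADDRINUSE:
                return False
            raise
    return True
```

For example, checking 13400 and 13401 before starting JobMonApp shows whether some other program has already claimed either port.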
If this port conflict continues even after a system restart, multiple applications may have been set up to use this port. If you are running multiple DataStage instances on one server, check the /etc/services file to confirm your ports have not been allocated to multiple applications. Then look at the following file:
- .../ibm/InformationServer/Server/PXEngine/etc/jobmon_ports
- APT_JOBMON_PORT1=13400
APT_JOBMON_PORT2=13401
If two DataStage instances are using the same job monitor ports, you will need to update this file for one instance. After changing the above port values, you will need to stop and restart JobMonApp for the change to take effect.
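When two instances share ports, the jobmon_ports change can be prepared in a script. This Python sketch uses hypothetical helper names: it reads the two APT_JOBMON_PORT settings in the KEY=VALUE format shown above and renders replacement contents for a new, distinct pair of ports. It assumes the file holds only these two settings (lines starting with # are skipped as comments, also an assumption), so if your copy contains anything else, edit it by hand instead.

```python
from typing import Dict, Tuple


def read_jobmon_ports(text: str) -> Tuple[int, int]:
    """Parse APT_JOBMON_PORT1 and APT_JOBMON_PORT2 from jobmon_ports text."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, assumed comments, and non KEY=VALUE lines
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return int(values["APT_JOBMON_PORT1"]), int(values["APT_JOBMON_PORT2"])


def render_jobmon_ports(port1: int, port2: int) -> str:
    """Build replacement jobmon_ports contents for a new pair of ports."""
    if port1 == port2:
        raise ValueError("JobMonApp needs two different ports")
    for port in (port1, port2):
        if not 0 < port < 65536:
            raise ValueError(f"not a valid TCP port: {port}")
    return f"APT_JOBMON_PORT1={port1}\nAPT_JOBMON_PORT2={port2}\n"
```

Check that the new ports are unused (netstat and /etc/services) before writing the file, and restart JobMonApp afterwards.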
Also confirm that the /etc/hosts file on the DataStage server machine contains the following entry:
127.0.0.1 localhost
Without a localhost definition, the job monitor may not be able to communicate correctly on the above ports.
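The /etc/hosts check can be sketched in Python as follows. has_localhost_entry is a hypothetical helper that looks for a non-comment line whose address is 127.0.0.1 and whose host names include localhost; pass it the contents of /etc/hosts from the DataStage server.

```python
def has_localhost_entry(hosts_text: str) -> bool:
    """Return True if /etc/hosts-style text maps 127.0.0.1 to the name localhost."""
    for raw in hosts_text.splitlines():
        fields = raw.split("#", 1)[0].split()  # drop trailing comments, split on whitespace
        if len(fields) >= 2 and fields[0] == "127.0.0.1" and "localhost" in fields[1:]:
            return True
    return False
```

Aliases on the same line (for example localhost.localdomain) are accepted, as long as the bare name localhost is also listed.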
If no port conflict exists and JobMonApp.log shows no port errors but does contain other errors, you may need to contact Information Server technical support when the error message does not clearly identify the cause of the problem.
An additional problem with localhost can occur if the /etc/nsswitch.conf file is not set up to check the hosts file before the domain name server (DNS). The correct entry normally appears as:
hosts: files dns
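The nsswitch.conf ordering can be checked with a short Python sketch (hypothetical helper names). It reads the hosts: line and reports whether files comes before dns; other sources or bracketed actions on that line are tolerated, since only the positions of those two words are compared.

```python
from typing import List, Optional


def hosts_sources(nsswitch_text: str) -> Optional[List[str]]:
    """Return the sources listed on the 'hosts:' line, or None if there is none."""
    for raw in nsswitch_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            return line[len("hosts:"):].split()
    return None


def files_before_dns(nsswitch_text: str) -> bool:
    """True if 'files' is listed for hosts and comes ahead of any 'dns' source."""
    sources = hosts_sources(nsswitch_text)
    if not sources or "files" not in sources:
        return False
    if "dns" not in sources:
        return True
    return sources.index("files") < sources.index("dns")
```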
Running JobMonApp with debug output
If no errors appear in the log file, or if more detailed error messages are needed, you can run JobMonApp in debug mode. Because enabling debug output requires modifying jobmoninit, first create a backup copy of jobmoninit. Next, edit jobmoninit, find the section of the script for your current operating system, and locate a line similar to:
- nohup $APT_ORCHHOME/java/jre/bin/java -classpath $CLASSPATH JobMonApp $jobmon_port1 $jobmon_port2 > $logfile 2>&1 &
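Add the -debug option after the port arguments so that the line reads:
- nohup $APT_ORCHHOME/java/jre/bin/java -classpath $CLASSPATH JobMonApp $jobmon_port1 $jobmon_port2 -debug > $logfile 2>&1 &
Then stop and restart JobMonApp for the change to take effect.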
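The jobmoninit edit can also be scripted. The following Python sketch is illustrative: the function names and the .bak backup name are hypothetical, and it assumes the whole nohup java launch command sits on a single line of the script. It adds -debug only to lines that start JobMonApp with $jobmon_port2 and do not already carry the flag. Unlike the manual edit, it changes every matching line, which may include the sections for other operating systems; only the section for your platform runs, so this is expected to be harmless, but review the result before restarting JobMonApp.

```python
import shutil
from pathlib import Path


def add_debug_flag(script_text: str) -> str:
    """Return script text with -debug added after $jobmon_port2 on JobMonApp launch lines."""
    out = []
    for line in script_text.splitlines(keepends=True):
        # Only touch the java launch line; leave lines that already have the flag alone.
        if "JobMonApp" in line and "$jobmon_port2" in line and "-debug" not in line:
            line = line.replace("$jobmon_port2", "$jobmon_port2 -debug", 1)
        out.append(line)
    return "".join(out)


def enable_jobmon_debug(jobmoninit: Path) -> Path:
    """Copy jobmoninit to a .bak file, then rewrite it with -debug. Returns the backup path."""
    backup = jobmoninit.with_name(jobmoninit.name + ".bak")
    shutil.copy2(jobmoninit, backup)  # keep the original, including its permission bits
    jobmoninit.write_text(add_debug_flag(jobmoninit.read_text()))
    return backup
```

To turn debug output off again, copy the .bak file back over jobmoninit and restart JobMonApp.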
Tracing job monitor calls originating in DataStage jobs
Enabling debug mode for JobMonApp through jobmoninit traces only problems that occur within the JobMonApp process. For problems where DataStage jobs cannot connect to job monitor or do not work correctly with it, an additional trace needs to be enabled for the failing job.
On the Parameters page of the job properties dialog, use the Add Environment Variables button to add the variable:
OSHMON_TRACE
and set it to a value of 1 (or true if presented with a selection dialog for the value). Compile and rerun the failing job. When OSHMON_TRACE=1 is set, additional trace files are written to the &PH& directory of the project that owns the failing job.
Review the new files written at the time of the job failure for additional errors. For example, when the job fails to connect to job monitor, the trace may show the following line:
"opensocket() returned 30"
which means that the host name is not recognized (in this case, a problem with the localhost definition).
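To find the opensocket() message among the &PH& trace files, a small Python sketch can extract the return codes. Both helper names are hypothetical, and the regular expression only matches the message text quoted above; the meaning of each code should still be taken from the trace context. The directory scan uses file modification times to skip files older than the failing run.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches the trace message quoted above, for example: opensocket() returned 30
_OPENSOCKET = re.compile(r"opensocket\(\) returned (-?\d+)")


def opensocket_codes(trace_text: str) -> List[int]:
    """Return every return code reported by opensocket() in one trace file's text."""
    return [int(code) for code in _OPENSOCKET.findall(trace_text)]


def scan_ph_directory(ph_dir: Path, since: float) -> List[Tuple[str, int]]:
    """Scan files in a project's &PH& directory modified at or after `since`
    (seconds since the epoch) and return (file name, code) pairs."""
    hits: List[Tuple[str, int]] = []
    for path in sorted(ph_dir.iterdir()):
        if not path.is_file() or path.stat().st_mtime < since:
            continue
        for code in opensocket_codes(path.read_text(errors="replace")):
            hits.append((path.name, code))
    return hits
```

Pass the &PH& directory of the project that owns the failing job and a timestamp shortly before the rerun; a code of 30 points back to the localhost checks described earlier.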
Contacting technical support for job monitor problems
When contacting technical support with a job monitor issue, provide the following files and details:
- OS platform/release info
- Problem symptoms/errors
- Version.xml
- JobMonApp.log
- jobmon_ports file
- /etc/services file, /etc/hosts file, /etc/nsswitch.conf file
- output of command: ps -ef | grep JobMonApp
- output of command: netstat -a
- whether an error occurs when you issue the command: telnet localhost 13400 (or whichever port job monitor uses).
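If telnet is not installed on the server, a Python sketch can perform an equivalent connection test for the last item. can_connect is a hypothetical helper; it uses socket.create_connection, which resolves the host name first, so connecting to localhost also exercises the /etc/hosts and nsswitch.conf settings discussed above. A False result means the connection could not be opened (refused, timed out, or the name did not resolve).

```python
import socket


def can_connect(port: int, host: str = "localhost", timeout: float = 3.0) -> bool:
    """Rough equivalent of 'telnet localhost <port>': True if a TCP connection opens."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and name resolution errors
        return False
```

For example, can_connect(13400) with JobMonApp running should return True; report the result together with the other items in the list.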
Courtesy: IBM