We have moved to www.dataGenX.net, Keep Learning with us.
Showing posts with label Configuration. Show all posts

Wednesday, June 03, 2015

MongoDB Configuration in Linux


Download the stable version of MongoDB. It will be a tar file.

1. Create a folder named 'learn' (or any other name you like)
     $ mkdir -p /learn/mongodb  /app/dbMongo
      # /learn/mongodb  = holds the extracted MongoDB files
      # /app/dbMongo = holds the MongoDB database files
2. Extract the MongoDB tar file into the /learn/mongodb folder
  $ tar -xvf MongoDB.tar -C /learn/mongodb
3. Change the ownership of both folders to the user who will run the database. In my case, user 'atul' and group 'atul':
  $ chown -R atul:atul  /learn/mongodb  /app/dbMongo
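Putting the three steps together, the sequence can be sketched end to end. The paths and tarball below are throwaway placeholders under /tmp so the commands are safe to try; a real install would use /learn/mongodb, /app/dbMongo, and the downloaded tarball.

```shell
# Step 1: create the install and database folders (demo paths under /tmp).
mkdir -p /tmp/learn/mongodb /tmp/app/dbMongo

# Fake a downloaded tarball just for this demo.
mkdir -p /tmp/stage/mongodb-demo/bin
echo 'demo' > /tmp/stage/mongodb-demo/bin/mongod
tar -cf /tmp/MongoDB.tar -C /tmp/stage mongodb-demo

# Step 2: -C tells tar where to extract; a bare directory argument would not.
tar -xf /tmp/MongoDB.tar -C /tmp/learn/mongodb

# Step 3: hand both trees to the user that will run mongod
# (commented out here because the demo user may not exist on your box):
# chown -R atul:atul /tmp/learn/mongodb /tmp/app/dbMongo

ls /tmp/learn/mongodb   # shows the extracted mongodb-demo directory
```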
Thursday, June 19, 2014

APT_CONFIG_FILE : Configuration File


APT_CONFIG_FILE is the environment variable DataStage uses to determine which configuration file to use (a project can have many configuration files). In fact, this is what is generally used in production. But if this environment variable is not defined, how does DataStage determine which file to use?

If the APT_CONFIG_FILE environment variable is not defined, DataStage looks for the default configuration file (config.apt) in the following paths:
1)Current working directory.
2)INSTALL_DIR/etc, where INSTALL_DIR ($APT_ORCHHOME) is the top-level directory of the DataStage installation.
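Setting the variable explicitly skips the default lookup altogether. A minimal sketch, using a placeholder path and a one-node file:

```shell
# Point DataStage at a specific configuration file (placeholder path).
export APT_CONFIG_FILE=/tmp/demo_config.apt

# A minimal one-node configuration file for illustration.
printf '{ node "node0" { fastname "server1" pools "" } }\n' > "$APT_CONFIG_FILE"

echo "$APT_CONFIG_FILE"
```

In practice this export usually lives in the job's or project's environment settings rather than a login shell.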


What are the different options a logical node can have in the configuration file?

Monday, February 17, 2014

Datastage Coding Checklist


  1. Ensure that null handling properties are set for all nullable fields. Do not set the null field value to a value that may be present in the source data.
  2. Ensure that all character fields are trimmed before any processing. Extra spaces in the data may lead to errors, such as lookup mismatches, which are hard to detect.
  3. Always save the metadata (for source, target, or lookup definitions) in the repository to ensure reusability and consistency.

Thursday, November 28, 2013

Tail Stage in DataStage


The Tail Stage is another stage from the Development/Debug category. It can have a single input link and a single output link.
The Tail Stage selects the last N records from each partition of an input data set and copies the selected records to an output data set.
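Conceptually, per partition the stage behaves like the Unix tail command. A rough analogy only, with one file standing in for each partition:

```shell
# Analogy: two files stand in for two partitions; keep the last 2 rows of each.
printf 'a\nb\nc\n' > /tmp/tail_p0.txt
printf 'd\ne\nf\n' > /tmp/tail_p1.txt
for p in /tmp/tail_p0.txt /tmp/tail_p1.txt; do
  tail -n 2 "$p"
done
# prints: b, c, e, f
```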



a) Job Design :

Wednesday, November 27, 2013

Head Stage in DataStage


Welcome to the Basic Intro to Stages series. Here we look into the HEAD stage (Development/Debug category). It can have a single input link and a single output link.
The Head Stage selects the first N rows from each partition of an input data set and copies the selected rows to an output data set. You determine which rows are copied by setting properties which allow you to specify:
  • The number of rows to copy
  • The partition from which the rows are copied
  • The location of the rows to copy
  • The number of rows to skip before the copying operation begins.
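The "number of rows" and "rows to skip" properties combine per partition much like the Unix head command with an offset. An analogy only, with a file standing in for one partition:

```shell
# Analogy: one file per "partition"; skip 1 row, then copy 2 rows.
printf 'r1\nr2\nr3\nr4\n' > /tmp/head_p0.txt
tail -n +2 /tmp/head_p0.txt | head -n 2
# prints: r2, r3
```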


Friday, November 08, 2013

Conductor Node in Datastage


Below is a sample APT_CONFIG_FILE; note the "conductor" pool entries on node0, which mark it as the conductor node.


{
node "node0"
{
fastname "server1"
pools "conductor"
resource disk "/datastage/Ascential/DataStage/Datasets/node0" {pools "conductor"}
resource scratchdisk "/datastage/Ascential/DataStage/Scratch/node0" {pools ""}
}
}

Monday, October 07, 2013

Create a unique counter in datastage


This entry describes various ways of creating a unique counter in DataStage jobs.
A parallel job has a surrogate key stage that creates unique IDs, however it is limited in that it does not support conditional code and it may be more efficient to add a counter to an existing transformer rather than add a new stage.

In a server job there is a set of key-increment routines, installed with the SDK routine samples, that offer a more complex counter which remembers values between job executions.
The following section outlines a transformer only technique.
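The transformer technique relies on the system variables @PARTITIONNUM and @NUMPARTITIONS to keep the counter unique across partitions: each partition generates id = @PARTITIONNUM + @NUMPARTITIONS * (rows seen so far). The arithmetic can be sanity-checked in the shell (2 partitions, 3 rows each):

```shell
# Emulate the parallel counter: with NUMPARTITIONS=2, partition 0 yields
# 0,2,4 and partition 1 yields 1,3,5 -- six distinct ids, no collisions.
NUMPARTITIONS=2
for PARTITIONNUM in 0 1; do
  for row in 0 1 2; do
    echo $((PARTITIONNUM + NUMPARTITIONS * row))
  done
done
```

Because each partition strides by the partition count from a distinct offset, no two partitions can ever produce the same value.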

Tuesday, September 17, 2013

How can I run the osh command of the InfoSphere Parallel Engine?


The osh command is the main program of the InfoSphere Parallel Engine. This command is used by DataStage to perform several different tasks including parallel job execution and dataset management. Normally, there is no need to run this command directly but sometimes it is useful to use it for troubleshooting purposes.
To run this command there are 3 environment variables that must be set. These are:
  1. APT_ORCHHOME should point to Parallel Engine location
  2. APT_CONFIG_FILE should point to a configuration file
  3. LD_LIBRARY_PATH should include the path to the parallel engine libraries. Please note that the name of this environment variable may take a different name (such as LIBPATH in AIX or SLIB_PATH in HP-UX) depending on your Operating System. Note: This variable does not need to be set in Windows environments.
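Under those three requirements, the setup might look like the sketch below. The install path is a placeholder; substitute your actual Parallel Engine location, and use LIBPATH (AIX) or SLIB_PATH (HP-UX) where applicable.

```shell
# Placeholder paths -- substitute your actual install locations.
export APT_ORCHHOME=/opt/IBM/InformationServer/Server/PXEngine
export APT_CONFIG_FILE=$APT_ORCHHOME/etc/config.apt
export LD_LIBRARY_PATH=$APT_ORCHHOME/lib:$LD_LIBRARY_PATH

# With the environment in place, osh can be invoked, e.g.
# (commented out here since it requires an actual installation):
# $APT_ORCHHOME/bin/osh -f script.osh
echo "$APT_CONFIG_FILE"
```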

Friday, September 06, 2013

Managing and Deleting Persistent Data Sets within IBM InfoSphere Datastage


Data Sets sometimes take up too much disk space. This technote describes how to obtain information about datasets and how to delete them.

 

Data sets can be managed using the Data Set Management tool, invoked from the Tools > Data Set Management menu option within DataStage Designer (DataStage Manager in the 7.5 releases.) Alternatively, the 'orchadmin' command line program can be used to perform the same tasks.
The files which store the actual data persist in the locations identified as resource disks in the configuration files. These files are named according to the pattern below:

descriptor.user.host.ssss.pppp.nnnn.pid.time.index.random
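For the command-line route, typical orchadmin invocations look like the sketch below (the dataset name is hypothetical, and the tool needs a DataStage environment with APT_CONFIG_FILE set):

```
orchadmin ll /data/mydata.ds       # list the data files behind the descriptor
orchadmin delete /data/mydata.ds   # remove descriptor and data files together
```

Deleting only the descriptor file with plain rm would leave the data files orphaned on the resource disks, which is why orchadmin (or the Data Set Management tool) should do the deletion.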

Tuesday, August 20, 2013

Oracle Interview Questions - Part-3


51. What is a database instance? Explain.
A database instance (server) is a set of memory structures and background processes that access a set of database files. The processes can be shared by all of the users. The memory structures are used to cache the most frequently queried data from the database. This helps improve database performance by decreasing the amount of I/O performed against the data files.

52. What is Parallel Server?
Multiple instances accessing the same database (only in multi-CPU environments)

Monday, August 12, 2013

WinSCP - Save all configurations in an INI file


You can configure WinSCP to save configurations to an INI file (instead of Windows Registry):
  • Open WinSCP and go to the Preferences section.
  • Next to "Other general options" click on the "Preferences" button.

Thursday, June 20, 2013

First time IIS User Setup - DataStage 8.7


Typical users list:
  • wasadmin
  • isadmin
  • db2inst1,db2iadm1
  • db2fenc1,db2fadm1
  • dasusr1,dasadm1
  • xmeta,xmeta (db owner)
  • xmetasr (staging area)
  • iauser,iadb (db user)
  • dsadm,dstage

Wednesday, March 13, 2013

All about 000 - 421 : DataStage Certification Exam Test Preparation



1. DataStage v8 Configuration (5%)
  1. Describe how to properly configure DataStage V.8.0.
    1. This is kind of vague but focus on how DataStage 8 gets attached to a Metadata Server via the Metadata Console and how security rights are set up. 
    2. Read up on configuring DB2 and Oracle client and ODBC. 
    3. Get to know the dsenv file.  Read the DataStage Installation Guide for post-installation steps.

Monday, February 11, 2013

uvodbc.config File


This is a part of the ODBC Configuration in DataStage tutorial.


Update the uvodbc.config File

The uvodbc.config file is located in the root of the project directory. (The project directory can be determined by opening DataStage Administrator, clicking the Projects tab, and selecting the project.)
This step simply adds the data source to the drop-down list on the ODBC Import screen.
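An added DSN entry in uvodbc.config typically looks like this (TESTDB is a placeholder; the name must match a DSN defined in .odbc.ini):

```
<TESTDB>
DBMSTYPE = ODBC
```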

Thursday, February 07, 2013

.odbc.ini EXAMPLE file


This is a part of the ODBC Configuration in DataStage tutorial.
This is an example .odbc.ini file.


[ODBC Data Sources]
TESTDB=DataDirect DB2 Wire Protocol Driver
DB2 Wire Protocol=DataDirect DB2 Wire Protocol Driver
dBase=DataDirect dBaseFile(*.dbf)
Informix=DataDirect Informix

.odbc.ini file in DataStage


This is a part of the ODBC Configuration in DataStage tutorial.

Update the .odbc.ini File

The .odbc.ini file creates the relationship between the data source name and the ODBC driver that is used to connect to it.

The .odbc.ini file is located in the $DSHOME directory.
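A DSN stanza for the DataDirect DB2 Wire Protocol driver might look like the sketch below. The driver file name, host, port, and database are all placeholders; check the branded-ODBC lib directory of your installation for the exact driver file name.

```
[TESTDB]
Driver=/opt/IBM/InformationServer/Server/branded_odbc/lib/VMdb200.so
Description=DataDirect DB2 Wire Protocol Driver
IpAddress=db2host.example.com
TcpPort=50000
Database=TESTDB
```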

Tuesday, February 05, 2013

ODBC Configuration in DataStage


To configure DataStage ODBC connections, you need to edit three files to set up the required connections. These are:
  • dsenv
  • .odbc.ini
  • uvodbc.config
All three are located in the $DSHOME directory. Copies of uvodbc.config are also placed in the project directories.

Monday, January 07, 2013

An introduction to MQ - Part2




Let's look at the Remote Queue definition for CAPA.TO.APPB.SENDQ.REMOTE, shown next. On the left-hand side are the definitions on QMA, which comprise the Remote Queue, the Transmission Queue, and the Channel definition. The definitions on QMB are on the right-hand side and comprise the Local Queue and the Receiver Channel.
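In MQSC terms, those left-hand (QMA) and right-hand (QMB) definitions can be sketched as follows. The remote queue and local queue names follow the example; the channel and transmission-queue names are assumptions for illustration:

```
* On QMA: remote queue -> transmission queue -> sender channel
DEFINE QREMOTE('CAPA.TO.APPB.SENDQ.REMOTE') RNAME('APPB.LOCALQ') +
       RQMNAME('QMB') XMITQ('QMB.XMIT')
DEFINE QLOCAL('QMB.XMIT') USAGE(XMITQ)
DEFINE CHANNEL('QMA.TO.QMB') CHLTYPE(SDR) TRPTYPE(TCP) +
       CONNAME('qmbhost(1414)') XMITQ('QMB.XMIT')

* On QMB: receiver channel and the local queue that finally holds the message
DEFINE CHANNEL('QMA.TO.QMB') CHLTYPE(RCVR) TRPTYPE(TCP)
DEFINE QLOCAL('APPB.LOCALQ')
```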

Tuesday, January 01, 2013

An introduction to MQ - Part1


In a nutshell, WebSphere MQ is an assured delivery mechanism, which consists of queues managed by Queue Managers. We can put messages onto, and retrieve messages from queues, and the movement of messages between queues is facilitated by components called Channels and Transmission Queues.
There are a number of fundamental points that we need to know about WebSphere MQ:
  • All objects in WebSphere MQ are case sensitive
  • We cannot read messages from a Remote Queue (only from a Local Queue)
  • We can only put a message onto a Local Queue (not a Remote Queue)