1. Introduction

DSC Communications produces hardware and software for telecommunications. The software products run on Digital UNIX (DU) systems on top of a Digital product called TeMIP. Furthermore, we use databases of different kinds, Oracle and Ingres. Almost all customer installations are tailored by hand by our Systems Integration department. The installations range from standalone systems to distributed systems with many clusters.

I experience many problems with putting those systems together. Not that ASE does not work, but simply because of the mess in the installations. The problems are (prioritized):

1: ASE errors, such as getting the same service present on two members at the same time. This leads to disk corruption.
2: Failover errors because the filesystems could not be mounted or unmounted, as they were still occupied.
3: Errors due to bad syntax in the action scripts. This may lead to a dialog where the only possibility available is to delete the service.
4: Errors in the software where it is misunderstood whether a service is present or not.
5: Errors caused by the software when the integrator tries to put two services together. Aliases go wrong, and it becomes even harder to determine, in a consistent way, what is running where.
6: Problems describing for the systems integrator what to do in a manner general enough to cover both standalone systems and large multi-clusters.
7: Problems when migrating from standalone to cluster.

It is my idea that it must be possible to define a model that covers all our system activity, from the standalone case to the cluster case, so that software installation and physical upgrades can be done the same way independently of the system configuration.

2. Model Considerations

If I make a service X, this service can only be present on one cluster member at a time. If I have more clusters, I can have more X'es; in particular, I can have one per standalone host. I would like a naming scheme where no two different services would ever collide. This can be done by inventing a 'cluster name', one different name per cluster and standalone host, and then concatenating the service name and the cluster name. For internal use in the cluster itself, the plain X is enough.

For cluster management, I would like all management things, such as start and stop scripts and other utilities, to be placed with the service data. That is, I hate having to check all the other members whenever I have made a change to one file.

I would also like as few mountpoints per cluster service as possible. Some of the services that I deal with have above 10 mountpoints, simply because the service directories are scattered across the filesystem. The only case where more mountpoints should be allowed is where the service has to span more than one disk. For example, Oracle installations love to span multiple disks.

3. Proposal

3.1 /disk

Below /disk is one directory for each cluster service. The naming is simple: /disk/s00, /disk/s01, /disk/s02 and so on. The disks owned by a cluster service are below that directory. If the service owns more than one disk, they are numbered /disk/s02/00, /disk/s02/01 and so on. The important point is that, when putting the cluster together, the systems integrator needs only a worksheet telling how many services there are and the capacity of each. He should not care about what each service will be used for. Local disks are also put here; they are, however, numbered /disk/l00, /disk/l01, /disk/l02 and so on.

3.2 /service

Below /service are the logical services: the database, NSR, NIS and so on. The following may seem a little backwards, but an explanation follows. Beneath /service are the disk numbers 00, 01, 02 and so on, and beneath those again is the local service name as described above. If nsr only needs one disk, it is represented by /service/00/nsr, but Oracle, which needs three disks, will look like /service/00/oracle, /service/01/oracle, /service/02/oracle. Administrative scripts, of which the start and stop scripts are mandatory, are put in /service/00/name/sbin. A sketch of the resulting skeleton follows below.
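To make the layout of 3.1 and 3.2 concrete, here is a minimal sketch that creates the skeleton for the two example services, nsr (one disk) and oracle (three disks). The names and disk counts are just the examples from the text; a real kit would take them from the integrator's worksheet.

    #!/bin/sh
    # Sketch: create the /service skeleton for the example services.

    # nsr needs one disk, so it gets one numbered directory.
    mkdir -p /service/00/nsr/sbin

    # oracle needs three disks: one numbered directory per disk,
    # all carrying the same local service name.
    for n in 00 01 02
    do
        mkdir -p /service/$n/oracle
    done
    mkdir -p /service/00/oracle/sbin   # admin scripts live in .../00/name/sbin

    # The mandatory administrative scripts (empty placeholders here).
    for svc in nsr oracle
    do
        touch /service/00/$svc/sbin/start /service/00/$svc/sbin/stop
        chmod +x /service/00/$svc/sbin/start /service/00/$svc/sbin/stop
    done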
3.3 Putting services, local disks and ASE together

If I would like the nsr service to be on ASE service s02, I simply copy recursively everything below /service/00/nsr to /disk/s02/00/nsr and make the link /service/00/nsr -> /disk/s02/00/nsr. That is trivial (a sketch of such a migrate utility follows after 3.3.5). But the perspectives are nice, because:

3.3.1 I can have any number of logical services moved to the same disks or cluster services.

3.3.2 I can move the services around anytime I want, migrating from standalone to cluster and back again.

3.3.3 There is only ONE start and ONE stop script to add to ASE. On start of ASE service s02, run the scripts /disk/s02/00/*/sbin/start.

3.3.4 Local services are easily managed by /sbin/rcN.d, as the start script runs /disk/l*/00/*/sbin/start. Multi-disk services like Oracle go the same way; there are just more directories to link and copy. In general, when migrating a service from one place to another, just copy all of /service/00/name, /service/01/name and so on until there are no more. General and elegant.

Migrating ASE services also means editing /etc/hosts on the cluster members. When the ASE services are added, /etc/hosts may look like this:

    129.25.13.101 claudia
    129.25.13.102 cathy
    129.25.13.103 cindy
    129.25.13.104 s00
    129.25.13.105 s01
    129.25.13.106 s02
    129.25.13.107 s03

After migrating Oracle to s02, it will look like this:

    129.25.13.101 claudia
    129.25.13.102 cathy
    129.25.13.103 cindy
    129.25.13.104 s00
    129.25.13.105 s01
    129.25.13.106 s02 oracle_c07
    129.25.13.107 s03

where c07 is the cluster name and oracle is the local name of the service. If NIS is run as an ASE service, there will be problems with XIsso if the service name is not an alias to the hostname of the system: XIsso will complain that the server mentioned in the NIS map is not the local host.

3.3.5 It will do a lot of good if the 'generic' start and stop scripts move the aliases of the started service from the service's address to that of the local host (also sketched below). After a start of the oracle service, /etc/hosts on claudia will look like this:

    129.25.13.101 claudia oracle_c07
    129.25.13.102 cathy
    129.25.13.103 cindy
    129.25.13.104 s00
    129.25.13.105 s01
    129.25.13.106 s02
    129.25.13.107 s03
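Here is a minimal sketch of the copy-and-link step, in the shape of the migrate utility that appears in the master kit in 3.6. It assumes the /disk and /service layout above and does no error handling; a real utility would also have to stop the service first and handle the reverse direction.

    #!/bin/sh
    # Sketch: migrate logical service $1 onto ASE service $2,
    # e.g. 'migrate nsr s02'.
    name=$1
    ase=$2

    for d in /service/[0-9][0-9]
    do
        n=`basename $d`                 # disk number: 00, 01, ...
        [ -d $d/$name ] || continue     # service uses no disk of this number
        mkdir -p /disk/$ase/$n/$name
        # copy recursively, then replace the original with a link
        (cd $d/$name && tar cf - .) | (cd /disk/$ase/$n/$name && tar xf -)
        rm -rf $d/$name
        ln -s /disk/$ase/$n/$name $d/$name
    done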
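And a sketch of the ONE generic start action of 3.3.3, including the alias move of 3.3.5. That the ASE service name arrives as $1 is an assumption of this sketch; check how your ASE version invokes its action scripts. The stop script would be symmetrical.

    #!/bin/sh
    # Sketch: generic ASE start action for service $1 (e.g. s02).
    ase=$1
    me=`hostname`

    # 3.3.3: run every service start script found on the disks of $ase.
    for script in /disk/$ase/[0-9][0-9]/*/sbin/start
    do
        [ -x "$script" ] && "$script"
    done

    # 3.3.5: move the aliases of the service's /etc/hosts line to the
    # local host's line.  Assumes simple whitespace-separated lines.
    aliases=`awk '$2 == "'$ase'" { for (i = 3; i <= NF; i++) printf " %s", $i }' /etc/hosts`
    awk '$2 == "'$ase'" { print $1 "\t" $2; next }
         $2 == "'$me'"  { print $0 "'"$aliases"'"; next }
         { print }' /etc/hosts > /etc/hosts.tmp && mv /etc/hosts.tmp /etc/hosts
    exit 0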
3.4 And the 'real' filesystem...

If I want to put some part of the system into this scheme, I just need to go into the filesystem, point out which directories are specific to that thing, and make them links into /service/... For example, if backup should be enrolled, let /var/nsr be a link to /service/00/backup/var/nsr, then migrate it to a local disk or to the cluster as you please. For Oracle, /u01 thru /u05, or however many are used, could be made links to /service/00...05/oracle/u01...05.

3.4.1 This reduces the number of mountpoints I otherwise have to make. I will of course make one setld kit for each local service name as I invent those local services.

3.5 Obtaining state of the system

3.5.1 Q: Is this service configured on this cluster member?
A: Yes, if /service/00/servicename points to anything in /disk.

3.5.2 Q: Is this service running here?
A: Yes, if /service/00/servicename points to anything existing.

3.5.3 Q: Are the processes of the service running?
A: This can be determined if a state program is provided with the service kit (in /service/00/servicename/sbin/state). A sketch of these three checks closes this note.

3.6 Putting it all together

The master kit contains:

    /disk
    /service
    /service/etc
    /service/etc/clustername
    /service/sbin
    /service/sbin/migrate
    /sbin/init.d/service

The backup service kit contains:

    /service/00/backup
    /service/00/backup/sbin
    /service/00/backup/sbin/start
    /service/00/backup/sbin/stop
    /service/00/backup/sbin/state

and a script that converts /var/nsr into a link to /service/00/backup/var/nsr.
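Finally, a minimal sketch of the state queries of 3.5 as one utility. The name svcstate and its argument convention are inventions of this sketch.

    #!/bin/sh
    # Sketch: answer the questions of 3.5 for one local service,
    # e.g. 'svcstate nsr'.
    name=$1
    link=/service/00/$name

    # 3.5.1: configured here if the link points into /disk.
    case `ls -ld $link 2>/dev/null` in
        *" -> /disk/"*) echo "$name: configured" ;;
        *)              echo "$name: not configured"; exit 1 ;;
    esac

    # 3.5.2: running here if the link target actually exists,
    # i.e. the service's disks are mounted on this member.
    if [ -d $link/. ]
    then
        echo "$name: filesystems present"
    else
        echo "$name: not running here"; exit 1
    fi

    # 3.5.3: processes can only be checked if the kit provides
    # a state program.
    if [ -x $link/sbin/state ]
    then
        $link/sbin/state
    else
        echo "$name: no state program provided"
    fi

Lars Bro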