1. Introduction

DSC Communications produces hardware and software for telecommunications. The software products run on Digital UNIX (DU) systems on top of a Digital product called TeMIP. Furthermore, we use databases of different kinds, Oracle and Ingres. Almost all customer installations are tailored by hand by our Systems Integration department. The installations range from standalone systems to distributed systems with many clusters.

I experience many problems with putting those systems together. Not that ASE does not work, but simply because of the mess in the installations. The problems are (prioritized):

1: ASE errors, such as getting the same service present on two members at the same time. This leads to disk corruption.
2: Failover errors because the filesystems could not be mounted or unmounted, as they were still occupied.
3: Errors due to bad syntax in the action scripts. This may lead to a dialog where the only possibility available is to delete the service.
4: Errors in the software where it is misunderstood whether a service is present or not.
5: Errors caused by the software when the integrator tries to put two services together. Aliases go wrong, and it becomes even harder to determine, in a consistent way, what is running where.
6: Problems describing for the systems integrator what to do in a manner general enough to cover both standalone systems and large multi-clusters.
7: Problems when migrating from standalone to cluster.

It is my idea that it must be possible to define a model that covers all our system activity, from the standalone case to the cluster case, so that software installation and physical upgrades can be done the same way independently of the system configuration.

2. Model Considerations

If I make a service X, this service can only be present on one cluster member at a time. If I have more clusters, I can have more X'es; in particular, I can have one per standalone host. I would like a naming scheme where no two different services would ever collide. This can be done by inventing a 'cluster name', one different name per cluster and standalone host, and then concatenating the service name and the cluster name. For internal use in the cluster itself, the plain X is enough.

For cluster management, I would like all management things, such as start and stop scripts and other utilities, to be placed with the service data. That is, I hate having to check all the other members whenever I have made a change to one file.

I would also like as few mountpoints per cluster service as possible. Some of the services that I deal with have above 10 mountpoints, simply because the service directories are scattered across the filesystem. The only case where more mountpoints should be allowed is where the service has to span more than one disk. For example, Oracle installations love to span multiple disks.

3. Proposal

3.1 /disk

Below /disk is one directory for each cluster service. The naming is simple: /disk/s00, /disk/s01, /disk/s02 and so on. The disks owned by a cluster service are below that directory. If the service owns more than one disk, they are numbered /disk/s02/00, /disk/s02/01 and so on. The important point is that, when putting the cluster together, the systems integrator needs only a worksheet telling how many services there are and the capacity of each. He should not care about what each service will be used for. Local disks are also put here; they are, however, numbered /disk/l00, /disk/l01, /disk/l02 and so on.

3.2 /service

Below /service are the logical services: the database, NSR, NIS and so on. The following may seem a little backwards, but an explanation follows. Beneath /service are the disk numbers 00, 01, 02 and so on, and beneath those again is the local service name as described above. If nsr only needs one disk, it is represented by /service/00/nsr, but Oracle, which needs three disks, will look like /service/00/oracle, /service/01/oracle, /service/02/oracle. Administrative scripts, of which the start and stop scripts are mandatory, are put in /service/00/name/sbin. A sketch of the resulting skeleton follows below.
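To make the layout of 3.1 and 3.2 concrete, here is a minimal sketch that creates the skeleton for the two example services, nsr (one disk) and oracle (three disks). The names and disk counts are just the examples from the text; a real kit would take them from the integrator's worksheet.

    #!/bin/sh
    # Sketch: create the /service skeleton for the example services.

    # nsr needs one disk, so it gets one numbered directory.
    mkdir -p /service/00/nsr/sbin

    # oracle needs three disks: one numbered directory per disk,
    # all carrying the same local service name.
    for n in 00 01 02
    do
        mkdir -p /service/$n/oracle
    done
    mkdir -p /service/00/oracle/sbin   # admin scripts live in .../00/name/sbin

    # The mandatory administrative scripts (empty placeholders here).
    for svc in nsr oracle
    do
        touch /service/00/$svc/sbin/start /service/00/$svc/sbin/stop
        chmod +x /service/00/$svc/sbin/start /service/00/$svc/sbin/stop
    done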
3.3 Putting services, local disks and ASE together

If I would like the nsr service to be on ASE service s02, I simply copy recursively everything below /service/00/nsr to /disk/s02/00/nsr and make the link /service/00/nsr -> /disk/s02/00/nsr. That is trivial (a sketch of such a migrate utility follows after 3.3.5). But the perspectives are nice, because:

3.3.1 I can have any number of logical services moved to the same disks or cluster services.

3.3.2 I can move the services around anytime I want, migrating from standalone to cluster and back again.

3.3.3 There is only ONE start and ONE stop script to add to ASE. On start of ASE service s02, run the scripts /disk/s02/00/*/sbin/start.

3.3.4 Local services are easily managed by /sbin/rcN.d, as the start script runs /disk/l*/00/*/sbin/start. Multi-disk services like Oracle go the same way; there are just more directories to link and copy. In general, when migrating a service from one place to another, just copy all of /service/00/name, /service/01/name and so on until there are no more. General and elegant.

Migrating ASE services also means editing /etc/hosts on the cluster members. When the ASE services are added, /etc/hosts may look like this:

    129.25.13.101 claudia
    129.25.13.102 cathy
    129.25.13.103 cindy
    129.25.13.104 s00
    129.25.13.105 s01
    129.25.13.106 s02
    129.25.13.107 s03

After migrating Oracle to s02, it will look like this:

    129.25.13.101 claudia
    129.25.13.102 cathy
    129.25.13.103 cindy
    129.25.13.104 s00
    129.25.13.105 s01
    129.25.13.106 s02 oracle_c07
    129.25.13.107 s03

where c07 is the cluster name and oracle is the local name of the service. If NIS is run as an ASE service, there will be problems with XIsso if the service name is not an alias to the hostname of the system: XIsso will complain that the server mentioned in the NIS map is not the local host.

3.3.5 It will do a lot of good if the 'generic' start and stop scripts move the aliases of the started service from the service's address to that of the local host (also sketched below). After a start of the oracle service, /etc/hosts on claudia will look like this:

    129.25.13.101 claudia oracle_c07
    129.25.13.102 cathy
    129.25.13.103 cindy
    129.25.13.104 s00
    129.25.13.105 s01
    129.25.13.106 s02
    129.25.13.107 s03
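Here is a minimal sketch of the copy-and-link step, in the shape of the migrate utility that appears in the master kit in 3.6. It assumes the /disk and /service layout above and does no error handling; a real utility would also have to stop the service first and handle the reverse direction.

    #!/bin/sh
    # Sketch: migrate logical service $1 onto ASE service $2,
    # e.g. 'migrate nsr s02'.
    name=$1
    ase=$2

    for d in /service/[0-9][0-9]
    do
        n=`basename $d`                 # disk number: 00, 01, ...
        [ -d $d/$name ] || continue     # service uses no disk of this number
        mkdir -p /disk/$ase/$n/$name
        # copy recursively, then replace the original with a link
        (cd $d/$name && tar cf - .) | (cd /disk/$ase/$n/$name && tar xf -)
        rm -rf $d/$name
        ln -s /disk/$ase/$n/$name $d/$name
    done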
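And a sketch of the ONE generic start action of 3.3.3, including the alias move of 3.3.5. That the ASE service name arrives as $1 is an assumption of this sketch; check how your ASE version invokes its action scripts. The stop script would be symmetrical.

    #!/bin/sh
    # Sketch: generic ASE start action for service $1 (e.g. s02).
    ase=$1
    me=`hostname`

    # 3.3.3: run every service start script found on the disks of $ase.
    for script in /disk/$ase/[0-9][0-9]/*/sbin/start
    do
        [ -x "$script" ] && "$script"
    done

    # 3.3.5: move the aliases of the service's /etc/hosts line to the
    # local host's line.  Assumes simple whitespace-separated lines.
    aliases=`awk '$2 == "'$ase'" { for (i = 3; i <= NF; i++) printf " %s", $i }' /etc/hosts`
    awk '$2 == "'$ase'" { print $1 "\t" $2; next }
         $2 == "'$me'"  { print $0 "'"$aliases"'"; next }
         { print }' /etc/hosts > /etc/hosts.tmp && mv /etc/hosts.tmp /etc/hosts
    exit 0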
3.4 And the 'real' filesystem...

If I want to put some part of the system into this scheme, I just need to go into the filesystem, point out which directories are specific to that thing, and make them links into /service/... For example, if backup should be enrolled, let /var/nsr be a link to /service/00/backup/var/nsr, then migrate it to a local disk or to the cluster as you please. For Oracle, /u01 thru /u05, or however many are used, could be made links to /service/00...05/oracle/u01...05.

3.4.1 This reduces the number of mountpoints I otherwise have to make. I will of course make one setld kit for each local service name as I invent those local services.

3.5 Obtaining state of the system

3.5.1 Q: Is this service configured on this cluster member?
A: Yes, if /service/00/servicename points to anything in /disk.

3.5.2 Q: Is this service running here?
A: Yes, if /service/00/servicename points to anything existing.

3.5.3 Q: Are the processes of the service running?
A: This can be determined if a state program is provided with the service kit (in /service/00/servicename/sbin/state). A sketch of these three checks closes this note.

3.6 Putting it all together

The master kit contains:

    /disk
    /service
    /service/etc
    /service/etc/clustername
    /service/sbin
    /service/sbin/migrate
    /sbin/init.d/service

The backup service kit contains:

    /service/00/backup
    /service/00/backup/sbin
    /service/00/backup/sbin/start
    /service/00/backup/sbin/stop
    /service/00/backup/sbin/state

and a script that converts /var/nsr into a link to /service/00/backup/var/nsr.
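Finally, a minimal sketch of the state queries of 3.5 as one utility. The name svcstate and its argument convention are inventions of this sketch.

    #!/bin/sh
    # Sketch: answer the questions of 3.5 for one local service,
    # e.g. 'svcstate nsr'.
    name=$1
    link=/service/00/$name

    # 3.5.1: configured here if the link points into /disk.
    case `ls -ld $link 2>/dev/null` in
        *" -> /disk/"*) echo "$name: configured" ;;
        *)              echo "$name: not configured"; exit 1 ;;
    esac

    # 3.5.2: running here if the link target actually exists,
    # i.e. the service's disks are mounted on this member.
    if [ -d $link/. ]
    then
        echo "$name: filesystems present"
    else
        echo "$name: not running here"; exit 1
    fi

    # 3.5.3: processes can only be checked if the kit provides
    # a state program.
    if [ -x $link/sbin/state ]
    then
        $link/sbin/state
    else
        echo "$name: no state program provided"
    fi

Lars Bro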