The openMosix HOWTO

Live free() or die()

Kris Buytaert

and Others

Revision History
Revision v1.0          11 may 2003
At last
Revision v1.0 RC 1     07 may 2003
Major Cleaning
Revision v0.95         04 april 2003
Replaced ClumpOS by PlumpOS
Revision v0.94         25 february 2003
Patches by Mirko Caserta
Revision v0.93         16 february 2003
Extra features and fixes
Revision v0.92         21 january 2003
Revision v0.91         27 september 2002
Revision v0.90         03 september 2002
Revision v0.71         26 August 2002
Spleling Fexis
Revision v0.70         22 August 2002
Stripped out empty parts, replaced Mosixview with openMosixView
Revision v0.50         6 July 2002
First openMosix HOWTO
Revision v0.20         5 July 2002
Latest Mosix HOWTO (for now)
Revision v0.17         28 June 2002
Revision v0.15         13 March 2002
Revision v0.13         18 Feb 2002
Revision ALPHA 0.03    09 October 2001

"The best way to become acquainted with a subject is to write a book about it." (Benjamin Disraeli)


Table of Contents
I. Introduction
1. Introduction
2. So what is openMosix Anyway ?
II. Installing openMosix
3. Requirements and Planning
4. Distribution specific installations
5. Autodiscovery
6. Cluster Installation
III. Administrating openMosix
7. Administrating openMosix
8. Tuning Mosix
9. openMosixview
10. Other openMosix related Programs
11. Common Problems
12. Hints and Tips
13. (stress)Testing your openMosix installation
IV. Running Applications on openMosix
14. Improving Compiling Performance
15. Imaging with openMosix
16. BioInformatics and openMosix
V. openMosix Development
17. Getting started with openMosix internals
VI. FAQ
A. More Info
A.1. irc
A.2. Further Reading
A.3. Translations
A.4. Links
A.5. Mailing List
B. Credits
C. GNU Free Documentation License
0. PREAMBLE
1. APPLICABILITY AND DEFINITIONS
2. VERBATIM COPYING
3. COPYING IN QUANTITY
4. MODIFICATIONS
5. COMBINING DOCUMENTS
6. COLLECTIONS OF DOCUMENTS
7. AGGREGATION WITH INDEPENDENT WORKS
8. TRANSLATION
9. TERMINATION
10. FUTURE REVISIONS OF THIS LICENSE
How to use this License for your documents
Index
List of Tables
2-1. Pros of openMosix
2-2. Cons of openMosix
4-1. Other Directories
7-1. Changing /proc/hpc parameters
7-2. /proc/hpc/admin/
7-3. Writing a 1 to the following files /proc/hpc/decay/
7-4. Informations about the other nodes
7-5. Additional Informations about local processes
7-6. more detailed
7-7. extra options for mosrun
9-1. how to start

Chapter 1. Introduction

1.1. openMosix HOWTO

In the beginning there was Mosix; then came openMosix, in my opinion the more interesting project, not only from a technical point of view but also because of its more correct license. I decided to focus this HOWTO on openMosix rather than on Mosix, mainly because openMosix has the bigger user base. (As of 5 July 2002, Moshe Bar states that about 97% of the old Mosix community has switched over to openMosix.) Although a lot of the information is valuable to users of both Mosix and openMosix, I decided to split the HOWTO. The last release of the Mosix HOWTO containing info about both Mosix and openMosix will be 0.20. My intention is to focus on the openMosix HOWTO without neglecting the Mosix users. More info on http://howto.ipng.be/Mosix-HOWTO/


1.2. Introduction

This document gives a brief description of openMosix, a software package that turns a network of GNU/Linux computers into a computer cluster. Along the way, some background to parallel processing is given, as well as a brief introduction to programs that make special use of openMosix's capabilities. The HOWTO expands on the documentation as it provides more background information and discusses the quirks of various distributions.

After this HOWTO was first written, some members of the Mosix team created openMosix (more info later), and initially both openMosix and Mosix were discussed here. Although a lot of the information is valuable to users of both Mosix and openMosix, I decided to split the HOWTO. The last release of the Mosix HOWTO containing info about both Mosix and openMosix will be 0.20 and can be found on http://howto.ipng.be/Mosix-HOWTO/Mosix-HOWTO/

Kris Buytaert got involved in this piece of work in February 2002, when Scot Stevenson was looking for somebody to take over the job. While initially we discussed both Mosix and openMosix, this version of the HOWTO now mainly focuses on openMosix. Please note that the document often still mentions Mosix where it should read openMosix.

You will notice that some of the headings are not as serious as they should be. Scot had planned to write the HOWTO in a slightly lighter style, as the world (and even the part of the world with a burping penguin as a mascot) is full of technical literature that is deadly. Therefore some parts still have these comments.


1.3. Disclaimer

Use the information in this document at your own risk. I disavow potential liability for the contents of this document. Use of these concepts, examples, and/or other content of this document is entirely at your own risk.

All copyrights are owned by their respective owners, unless specified otherwise. Use of a term in this document should not be regarded as affecting the validity of any trademark or service mark. openMosix is Copyright (c) by Moshe Bar. Mosix is Copyright (c) by Amnon Barak. Linux is a Registered Trademark of Linus Torvalds. openMosix is licensed under version 2 of the GNU General Public License as published by the Free Software Foundation.

Naming of particular products or brands should not be seen as endorsements.

You are strongly advised to back up your system before any major installation and to make backups at regular intervals.


1.4. Distribution policy

Copyright (c) 2002 by Kris Buytaert and Scot W. Stevenson. This document may be distributed under the terms of the GNU Free Documentation License, Version 1.1 or any later version published by the Free Software Foundation; with no Invariant Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A copy of the license is included in the appendix entitled "GNU Free Documentation License".


1.5. New versions of this document

Official new versions of this document can be found on the web pages of the Linux Documentation Project. Drafts and beta versions will be available on howto.ipng.be in the appropriate subfolder. Changes to this document will usually be discussed on the openMosix mailing lists. See the openMosix website for details.


1.6. Feedback

Currently this HOWTO is being maintained by Kris Buytaert. Please send remarks and updates for the HOWTO to him.

If you have a technical question about openMosix/Mosix itself, please post it on the appropriate mailing list.


Chapter 2. So what is openMosix Anyway ?

2.1. A very, very brief introduction to clustering

Most of the time, your computer is bored. Start a program like xload or top that monitors your system use, and you will probably find that your processor load is not even hitting the 1.0 mark. If you have two or more computers, chances are that at any given time, at least one of them is doing nothing. Unfortunately, when you really do need CPU power - during a C++ compile, or encoding Ogg Vorbis music files - you need a lot of it at once. The idea behind clustering is to spread these loads among all available computers, using the resources that are free on other machines.

The basic unit of a cluster is a single computer, also called a "node". Clusters can grow in size - they "scale" - by adding more machines. A cluster as a whole will be more powerful the faster the individual computers and the faster their connection speeds are. In addition, the operating system of the cluster must make the best use of the available hardware in response to changing conditions. This becomes more of a challenge if the cluster is composed of different hardware types (a "heterogeneous" cluster), if the configuration of the cluster changes unpredictably (machines joining and leaving the cluster), and the loads cannot be predicted ahead of time.


2.1.1. A very, very brief introduction to clustering

2.1.1.1. HPC vs Fail-over vs Load-balancing

Basically there are 3 types of clusters: Fail-over, Load-balancing and High Performance Computing. The most widely deployed are probably the Fail-over cluster and the Load-balancing cluster.

  • Fail-over Clusters consist of 2 or more network connected computers with a separate heartbeat connection between the hosts. The heartbeat connection is used to monitor whether all the services are still running: as soon as a service on one machine breaks down, the other machines try to take over.

  • With load-balancing clusters the concept is that when a request for, say, a web-server comes in, the cluster checks which machine is the least busy and then sends the request to that machine. In practice a Load-balancing cluster is most of the time also a Fail-over cluster, but with the extra load-balancing functionality and often with more nodes.

  • The last variation of clustering is the High Performance Computing Cluster: the machines are configured specially to give data centers that require extreme performance what they need. Beowulfs have been developed especially to give research facilities the computing speed they need. These kinds of clusters also have some load-balancing features; they try to spread different processes over more machines in order to gain performance. But what it mainly comes down to in this situation is that a process is parallelized, and that routines that can be run separately are spread over different machines instead of having to wait until they get done one after another.
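The "least busy machine" decision made by a load-balancing cluster can be sketched in a few lines of shell. This is only an illustration of the idea, not how any particular load balancer works: the function name least_busy, the host names, and the "hostname load" input format are all assumptions made for the example.

```shell
# Hypothetical helper: read "hostname load" pairs on stdin and print
# the host with the lowest load figure.
least_busy() {
    awk 'NR == 1 || $2 + 0 < min { min = $2 + 0; host = $1 } END { print host }'
}

# Example input with made-up hosts and load averages.
printf 'web1 0.90\nweb2 0.15\nweb3 0.40\n' | least_busy   # prints: web2
```

A real load balancer would of course collect the load figures itself and would also have to cope with hosts that stop answering, which is where the fail-over part comes in.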

The most commonly known examples of load-balancing and fail-over clusters are web farms, databases and firewalls. People want 99.99999% uptime for their services: the Internet is open 24 hours a day, 7 days a week, 365 days a year, unlike in the old days when you could shut down your server when the office closed.

People who need CPU cycles can often afford to schedule downtime for their environments, as long as they can use the maximum power of their machines when they need it.


2.1.1.2. Supercomputers vs. clusters

Traditionally Supercomputers have only been built by a selected number of vendors: a company or organization that required the performance of such a machine had to have a huge budget available for its Supercomputer. Lots of universities could not afford the costs of a Supercomputer by themselves, therefore other alternatives were being researched by them. The concept of a cluster was born when people first tried to spread different jobs over more computers and then gather back the data those jobs produced. With cheaper and more common hardware available to everybody, results similar to real Supercomputers were only to be dreamed of during the first years, but as the PC platform developed further, the performance gap between a Supercomputer and a cluster of multiple personal computers became smaller.


2.1.1.3. Cluster models [(N)UMA, PVM/MPI]

There are different ways of doing parallel processing: (N)UMA, DSM, PVM and MPI are all different kinds of Parallel Processing schemes. Some of them are implemented in hardware, others in software, others in both.

(N)UMA ((Non-)Uniform Memory Access) machines, for example, have shared access to the memory where they can execute their code. In the Linux kernel there is a NUMA implementation that takes into account the different memory access times for different regions of memory. It is then the kernel's task to use the memory that is closest to the CPU it is using.

DSM, or Distributed Shared Memory, has been implemented in both software and hardware; the concept is to provide an abstraction layer for physically distributed memory.

PVM and MPI are the tools that are most commonly being used when people talk about GNU/Linux based Beowulfs.

MPI stands for Message Passing Interface. It is the open standard specification for message passing libraries. MPICH is one of the most used implementations of MPI. Next to MPICH you also can find LAM, another implementation of MPI based on the free reference implementation of the libraries.

PVM or Parallel Virtual Machine is another cousin of MPI that is also quite often being used as a tool to create a Beowulf. PVM lives in user space so no special kernel modifications are required: basically each user with enough rights can run PVM.


2.1.1.4. openMosix's role

The openMosix software package turns networked computers running GNU/Linux into a cluster. It automatically balances the load between different nodes of the cluster, and nodes can join or leave the running cluster without disruption of the service. The load is spread out among nodes according to their connection and CPU speeds.

Since openMosix is part of the kernel and maintains full compatibility with Linux, a user's programs, files, and other resources will all work as before without any further changes. The casual user will not notice the difference between a Linux and an openMosix system. To her, the whole cluster will function as one (fast) GNU/Linux system.

openMosix is a Linux-kernel patch which provides full compatibility with standard Linux for IA32-compatible platforms. The internal load-balancing algorithm transparently migrates processes to other cluster members. The advantage is a better load-sharing between the nodes. The cluster itself tries to optimize utilization at any time (of course the sysadmin can affect the automatic load-balancing by manual configuration during runtime).

This transparent process-migration feature makes the whole cluster look like one BIG SMP system with as many processors as there are cluster nodes (multiplied by X for X-processor systems such as dual/quad systems and so on). openMosix also provides a powerful optimized File System (oMFS) for HPC applications which, unlike NFS, provides cache, time stamp and link consistency.


2.2. The story so far

2.2.1. Historical Development

Rumours say that Mosix comes from Moshe Unix. Initially Mosix started out as an application running on BSD/OS 3.0.
Announcing MO6 for BSD/OS 3.0
Oren Laadan (orenl@cs.huji.ac.il)
Tue, 9 Sep 1997 19:50:12 +0300 (IDT)

Hi:

We are pleased to announce the availability of MO6 Version 3.0
Release 1.04 (beta-4) - compatible with BSD/OS 3.0, patch level
K300-001 through M300-029.

MO6 is a 6 processor version of the MOSIX multicomputer enhancements
of BSD/OS for a PC Cluster. If you have 2 to 6 PC's connected by a
LAN, you can experience truly multi-computing environment by using
the MO6 enhancements.

The MO6 Distribution
--------------------
MO6 is available either in "source" or "binary" distribution. It is
installed as a patch to BSD/OS, using an interactive installation
script.

MO6 is available at http://www.cnds.jhu.edu/mirrors/mosix/
or at our site: http://www.cs.huji.ac.il/mosix/

Main highlights of the current release:
--------------------------------------
- Memory ushering (depletion prevention) by process migration.
- Improved installation procedure.
- Enhanced migration control.
- Improved administration tools.
- More user utilities.
- More documentation and new man pages.
- Dynamic configurations.

Please send feedback and comments to mosix@cs.huji.ac.il.
-------------------
GNU/Linux was chosen as the development platform for the 7th incarnation in 1999. In early 1999 Mosix M06 Beta was released for Linux 2.2.1. At the end of 2001 and in early 2002 openMosix, the open version of Mosix, was born (more in the next section).


2.2.2. openMosix

openMosix is in addition to whatever you find at mosix.org and in full appreciation and respect for Prof. Barak's leadership in the outstanding Mosix project.

Moshe Bar has been involved for a number of years with the Mosix project (www.mosix.com) and was co-project manager of the Mosix project and general manager of the commercial Mosix company.

After a difference of opinion on the commercial future of Mosix, he started a new clustering company, Qlusters, Inc., and Prof. Barak decided not to participate for the moment in this venture (although he did seriously consider joining) and held long-running negotiations with investors. It appears that Mosix is no longer supported openly as a GPL project. Because there is a significant user base out there (about 1000 installations world-wide), Moshe Bar decided to continue the development and support of the Mosix project under a new name, openMosix, and under the full GPL2 license. Whatever code in openMosix comes from the old Mosix project is Copyright 2002 by Amnon Barak. All the new code is Copyright 2002 by Moshe Bar.

There could (and will) be significant changes in the architecture of future openMosix versions. New concepts about auto-configuration, node-discovery and new user-land tools are being discussed on the openMosix mailing lists. Most of these new functionalities are already implemented, while some of them, such as DSM (Distributed Shared Memory), are still being worked on at the moment I write this (March 2003).

To approach standardization and future compatibility the proc-interface has changed from /proc/mosix to /proc/hpc and the /etc/mosix.map was changed to /etc/hpc.map. More recently the standard for the config file has been set to be located in /etc/openmosix.map (this is in fact the first config file the /etc/init.d/openmosix script will look for). Adapted command-line user-space tools for openMosix are already available on the web-page of the project.

The openmosix.map config file can be replaced with a node auto-discovery system called omdiscd (openMosix auto DISCovery Daemon), which we will discuss later.

openMosix is supported by various competent people (see openmosix.sourceforge.net) working together around the world. The main goal of the project is to create a standardized clustering-environment for all kinds of HPC-applications.

openMosix also has a project web-page at http://openMosix.sourceforge.net with a CVS tree and mailing-lists for developers as well as users.


2.2.3. Current state

Like most active Open Source programs, openMosix's rate of change tends to outstrip the followers' ability to keep the documentation up to date.

As I write this part in February 2003, openMosix 2.4.20 and the openMosix Userland Tools v0.2.4 are available, including the new autodiscovery tools.

For a more recent state of development please take a look at the openMosix website.


2.2.4. Which applications work

It is almost impossible to give a list of all the applications that work with openMosix. The community however tries to keep track of the applications that migrate and the ones that don't.


2.3. openMosix in action: An example

openMosix clusters can take various forms. To demonstrate this, let's assume you are a student and share a dorm room with a rich computer science guy, with whom you have linked computers to form an openMosix cluster. Let's also assume you are currently converting music files from your CDs to Ogg Vorbis for your private use, which is legal in your country. Your roommate is working on a project in C++ that he says will bring World Peace. However, at just this moment he is in the bathroom doing unspeakable things, and his computer is idle.

So when you start a program like bladeenc to convert Bach's .... from .wav to .ogg format, the openMosix routines on your machine compare the load on both nodes and decide that things will go faster if that process is sent from your Pentium-233 to his Athlon XP. This happens automatically: you just type or click your commands as you would if you were on a standalone machine. All you notice is that when you start two more coding runs, things go a lot faster, and the response time doesn't go down.

Now while you're still typing ...., your roommate comes back, mumbling something about red chili peppers in cafeteria food. He resumes his tests, using a program called 'pmake', a version of 'make' optimized for parallel execution. Whatever he's doing, it uses up so much CPU time that openMosix even starts to send subprocesses to your machine to balance the load.

This setup is called *single-pool*: all computers are used as a single cluster. The advantage/disadvantage of this is that your computer is part of the pool: your stuff will run on other computers, but their stuff will run on yours too.


2.4. Components

2.4.1. Process migration

With openMosix you can start a process on one machine and find out it actually runs on another machine in the cluster. Each process has its own Unique Home Node (UHN) where it gets created.

Migration means that a process is split into 2 parts, a user part and a system part. The user part is moved to a remote node while the system part stays on the UHN. This system part is sometimes called the deputy process: it takes care of resolving most of the system calls.

openMosix takes care of the communication between these 2 processes.


2.4.2. The openMosix File System (oMFS)

oMFS is a feature of openMosix which allows you to access remote filesystems in a cluster as if they were locally mounted. The filesystems of your other nodes can be mounted on /mfs and you will, for instance, find the files in /home on node 3 on each machine in /mfs/3/home.
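The /mfs naming scheme described above is simple enough to capture in a one-line helper. The sketch below only builds the path; the function name mfs_path is made up for this example, and it assumes oMFS is mounted on /mfs as in the text.

```shell
# Hypothetical helper: build the cluster-wide oMFS path for a file
# that lives at an absolute path on a given node.
#   $1: openMosix node ID   $2: absolute path on that node
mfs_path() {
    printf '/mfs/%s%s\n' "$1" "$2"
}

# On a running cluster you could then, for instance, list node 3's /home:
#   ls "$(mfs_path 3 /home)"
```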


2.4.3. Direct File System Access (DFSA)

Both Mosix and openMosix provide a cluster-wide file-system (MFS) with the DFSA-option (Direct File-System Access). It provides access to all local and remote file-systems of the nodes in a Mosix or openMosix cluster.


2.5. openMosix Test Drive

In support of openMosix, Major Chai Mee Joon is giving OM users a free trial account on his online openMosix cluster service, which users can use to test and experiment with openMosix.

The availability of this online openMosix cluster service will help new users overcome the initial openMosix configuration issues, and it also provides more computing power to openMosix users who are developing or porting their applications.

To get your userid and password to the cluster: http://www.mosixcluster.com/trial.php


2.6. Pros of openMosix

Table 2-1. Pros of openMosix

No extra packages are required.
No code changes to your application are required.
Simple to install/configure.
On a Red-Hat based system/distro, installing openMosix is as simple as typing: # rpm -Uvh openMosix*.rpm
DSM is being released soon (late March 2003).
Well integrated with openAFS.
Port to IA-64 as well as AMD-64 is underway.
oMFS has been improved much since plain MFS.
It is a clustering platform with more than 10 products based on it: openMosixView, openMosixWebView, openMosixApplet, RxLinux, PlumpOS, K12LTSP, LTSP and many others.
openMosix is a product developed by the users themselves, so it is by definition closer to the user.
Node autodiscovery/fail-over daemon already implemented in the user land tools via multicast messaging.
Aliases for hosts with multiple interfaces.
Basic routing available (in the rare case where true multicast routing is undesirable).
Cluster Mask lets you specify to which nodes a given process can migrate.


2.7. Cons of openMosix

Table 2-2. Cons of openMosix

Kernel dependent.
Shared memory issues (an alpha release of DSM should be available as of late March 2003).
There are issues with multiple threads not gaining performance.
You won't gain performance when running one single process, such as your web browser, on an openMosix cluster: the process won't spread itself over the cluster. The exception, of course, is that your process may migrate to a faster machine.


Chapter 3. Requirements and Planning

3.1. Hardware requirements

Installing a basic cluster requires at least 2 network connected machines, either using a cross-cable between the two network cards or using a switch or hub (a switch is much better than a hub though and only costs a few bucks more). Of course, the faster your network cards, the easier it is to get good performance from your cluster.

These days Fast Ethernet (100 Mbps) is standard; putting multiple ports in a machine isn't that difficult, but make sure to connect them through separate physical networks in order to gain the speed you want. Gigabit Ethernet is getting cheaper every day, but I suggest that you don't rush to the shop to spend your money before you have actually tested your setup with multiple 100Mbit cards and noticed that you really do need the extra network capacity. Besides installing a Gigabit card, you might also want to try bonding several 100Mbit cards together. An even cheaper alternative can be found in Firewire, as discussed in this paper.


3.2. Hardware Setup Guidelines

Setting up a big cluster requires some thinking: where are you going to put the machines? Not under a table somewhere or in the middle of your office, I hope! That's OK if you just want to do some small tests, but if you are planning to deploy an N-node cluster you will have to make sure that the environment that will hold these machines is capable of doing so.

I'm talking about preparing one or more 19" racks to host the machines and configuring the appropriate network topology, either straight, single connected or even a 1 to 1 cross connected network between all your nodes. You will also need to make sure that there is enough power to support such a range of machines, that your air-conditioning system can handle the load and that in case of power failure your UPS can cleanly shut down all the required systems. You might want to invest in a KVM (Keyboard, Video, Mouse) switch in order to facilitate access to the machines' consoles.

But even if you don't have the number of nodes that justifies such an investment, make sure that you can always easily access the different nodes: you never know when you will have to replace a CPU fan or a hard disk of a machine in trouble. If that means that you have to unload a stack of machines to reach the bottom one, hence shutting down your cluster, you are in trouble.


3.3. Software requirements

The systems we plan to use will need a basic Linux installation of your choice: Red Hat, Suse, Debian, Gentoo or any other distribution; it doesn't really matter which one. What does matter is that the kernel is at least at the 2.4 level and that your network cards are configured correctly; besides that, you'll need a healthy amount of swap space.


3.4. Planning your Cluster

When it comes to configuring openMosix Clusters with a pool of servers and a set of (personal) workstations, you have different options that will have their advantages and disadvantages.

  • In a Single-pool configuration all the servers and workstations are used as a single cluster: each machine is a part of the cluster and can migrate processes to each other existing node. This of course makes your workstation a part of the pool.

  • In an environment that is called a Server-pool, servers are a part of the cluster while workstations aren't part of it; they don't even have an openMosix kernel. If you want to run applications on the cluster you will need to specifically log on to these servers. However, your workstation will also stay clean and no remote processes will migrate to it.

  • A third alternative is called an Adaptive-pool configuration: here servers are shared while workstations join or leave the cluster. Imagine your workstation being used during daytime by yourself but, as soon as you log out in the evening, a script tells the workstation to join the cluster and start crunching numbers. This way your machine is being used while you don't need it. If you need the resources of the machine again just run the openmosix stop script and your processes will stay away from the cluster and vice-versa.

    Practically this means that you will change the role of your machine by using mosctl.
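As a rough illustration of the adaptive-pool idea, the sketch below wraps the start and stop actions of the openMosix init script so that they can be called from a login/logout hook or from cron. It is a sketch under assumptions: the function name pool_role is invented for this example, the init script is assumed to live at /etc/init.d/openmosix and to accept "start" and "stop", and it uses the init script rather than mosctl.

```shell
# Assumed location of the openMosix init script; override the variable
# if your distribution puts it elsewhere.
OPENMOSIX_INIT="${OPENMOSIX_INIT:-/etc/init.d/openmosix}"

# Hypothetical helper: make this workstation join or leave the cluster.
pool_role() {
    case "$1" in
        join)  "$OPENMOSIX_INIT" start ;;
        leave) "$OPENMOSIX_INIT" stop ;;
        *)     echo "usage: pool_role join|leave" >&2; return 2 ;;
    esac
}

# Typical use (as root):
#   pool_role join     # e.g. from a logout script in the evening
#   pool_role leave    # e.g. when you sit down at the machine again
```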


3.5. Classrooms

Although it might seem a good idea to convert your classroom into an openMosix cluster at night, you'll have to consider training your end users not to pull the power switch of those machines when they want to use them again. More recent machines support automatic shutdowns when the power button is pressed, but with older machines you might lose some data now and then when this actually happens.


Chapter 4. Distribution specific installations

4.1. Installing openMosix

This chapter deals with installing openMosix on different distributions. It won't be an exhaustive list of all the possible combinations. However throughout the chapter you should find enough information on installing openMosix in your environment.

Techniques for installing multiple machines with openMosix will be discussed in one of the next chapters.


4.2. Getting openMosix

You can download the latest versions of openMosix from http://sourceforge.net/project/showfiles.php?group_id=46729. You can either choose the binary (even in rpm) compiled for UP or SMP or download the source code. You will need both the kernel patch or binaries and the userland tools. Alternatively you can get the CVS version:

cvs -d:pserver:anonymous@cvs.openmosix.sourceforge.net:/cvsroot/openmosix login
cvs -z3 -d:pserver:anonymous@cvs.openmosix.sourceforge.net:/cvsroot/openmosix co linux-openmosix
cvs -z3 -d:pserver:anonymous@cvs.openmosix.sourceforge.net:/cvsroot/openmosix co userspace-tools
At the password prompt, just press Enter since you're doing an anonymous login. Please be aware that CVS trees DO BREAK now and then and that this might not be the easiest way to install openMosix ;-)


4.3. openMosix General Instructions

4.3.1. Kernel Compilation

Always use pure vanilla kernel sources from http://www.kernel.org/ to compile an openMosix kernel! Please be kind enough to download the kernel from a mirror near you, and always try to download patches against the latest kernel sources you already have instead of downloading the whole thing. This will be much appreciated by the Linux community and will greatly increase your geeky Karma ;-) Be sure to use the openMosix patch that matches your kernel version. At the moment I write this, the latest 2.4 kernel is 2.4.20, so you should download the openMosix-2.4.20-x.gz patch, where the "x" stands for the patch revision (i.e. the greater the revision number, the more recent the patch). Do not use the kernel that comes with any Linux distribution: it won't work. These kernel sources are heavily patched by the distribution makers, so applying the openMosix patch to such a kernel is going to fail for sure! Been there, done that: trust me ;-)

Download the current version of the openMosix patch and move it into your kernel-source directory (e.g. /usr/src/linux-2.4.20). If your kernel-source directory is anything other than "/usr/src/linux-[version_number]", you must at least create a symbolic link "/usr/src/linux-[version_number]" pointing to it. Supposing you're the root user and you've downloaded the gzipped patch file to your home directory, apply the patch using (guess what?) the patch utility:
mv /root/openMosix-2.4.20-2.gz /usr/src/linux-2.4.20
cd /usr/src/linux-2.4.20
zcat openMosix-2.4.20-2.gz | patch -Np1
In the rare case you don't have "zcat" on your system, do:
mv /root/openMosix-2.4.20-2.gz /usr/src/linux-2.4.20
cd /usr/src/linux-2.4.20
gunzip openMosix-2.4.20-2.gz
cat openMosix-2.4.20-2 | patch -Np1
In the even weirder case that you don't have "cat" on your system (!), do:
mv /root/openMosix-2.4.20-2.gz /usr/src/linux-2.4.20
cd /usr/src/linux-2.4.20
gunzip openMosix-2.4.20-2.gz
patch -Np1 < openMosix-2.4.20-2
The "patch" command should now display a list of patched files from the kernel-sources. If you feel adventurous enough, enable the openMosix related options in the kernel-configuration file, e.g.
...
CONFIG_MOSIX=y
# CONFIG_MOSIX_TOPOLOGY is not set
CONFIG_MOSIX_UDB=y
# CONFIG_MOSIX_DEBUG is not set
# CONFIG_MOSIX_CHEAT_MIGSELF is not set
CONFIG_MOSIX_WEEEEEEEEE=y
CONFIG_MOSIX_DIAG=y
CONFIG_MOSIX_SECUREPORTS=y
CONFIG_MOSIX_DISCLOSURE=3
CONFIG_QKERNEL_EXT=y
CONFIG_MOSIX_DFSA=y
CONFIG_MOSIX_FS=y
CONFIG_MOSIX_PIPE_EXCEPTIONS=y
CONFIG_QOS_JID=y
...
However, it is much easier to configure the above options using one of the Linux-kernel configuration tools:
make config | menuconfig | xconfig
The above means you have to choose one of "config", "menuconfig", and "xconfig". It's a matter of taste. By the way, "config" is going to work on any system; "menuconfig" needs the curses libraries installed while "xconfig" needs an installed X-window environment plus the TCL/TK libraries and interpreters.
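
Whichever tool you pick, it can be useful to confirm that the openMosix options really ended up in the resulting kernel configuration file. The following Python sketch parses the text of a kernel .config and reports the state of every CONFIG_MOSIX option. The function name and the sample text are illustrative assumptions, not part of openMosix.

```python
def mosix_options(config_text):
    """Return {option: value} for every CONFIG_MOSIX* line in kernel .config text.

    Options that are switched off appear in .config as
    '# CONFIG_FOO is not set' and are reported with the value None.
    """
    options = {}
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_MOSIX") and line.endswith(" is not set"):
            name = line[2:-len(" is not set")]
            options[name] = None
        elif line.startswith("CONFIG_MOSIX") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


# A few lines taken from the option list shown earlier in this section.
sample = """\
CONFIG_MOSIX=y
# CONFIG_MOSIX_TOPOLOGY is not set
CONFIG_MOSIX_DISCLOSURE=3
CONFIG_MOSIX_FS=y
"""
for name, value in sorted(mosix_options(sample).items()):
    print(name, "=", value if value is not None else "(not set)")
```

To check a real tree, pass in the contents of the .config file from your kernel-source directory instead of the sample string.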

Now compile it with:
make dep bzImage modules modules_install
After compilation, install the new kernel with the openMosix options in your boot-loader; e.g. insert an entry for the new kernel in /etc/lilo.conf and run lilo after that.

Reboot and your openMosix-cluster-node is up!


4.3.2. Syntax of the /etc/openmosix.map file

Before starting openMosix, there has to be an /etc/openmosix.map configuration file which must be the same on each node.

The standard is now /etc/openmosix.map. /etc/mosix.map and /etc/hpc.map are old standards, but the CVS version of the tools is backwards compatible and looks for /etc/openmosix.map, /etc/mosix.map and /etc/hpc.map (in that order).

The openmosix.map file contains three space separated fields:
openMosix-Node_ID               IP-Address(or hostname)          Range-size
An example openmosix.map file could look like this:
1       node1   1
2       node2   1
3       node3   1
4       node4   1
or
1       192.168.1.1     1
2       192.168.1.2     1
3       192.168.1.3     1
4       192.168.1.4     1
or, with the help of the range-size, both of the above examples are equivalent to:
1       192.168.1.1     4
openMosix "counts-up" the last byte of the ip-address of the node according to its openMosix-Node_ID. Of course, if you use a range-size greater than 1 you have to use ip-addresses instead of hostnames.
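
To make the range-size rule concrete, here is a minimal Python sketch that expands map entries into one address per node ID. It is only an illustration, not the parser used by setpe: it assumes numeric range sizes and IP addresses rather than hostnames, it skips comment lines and entries with a non-numeric range-size field, and the function name expand_map is made up for this example.

```python
import ipaddress


def expand_map(map_text):
    """Expand openmosix.map text into a {node_id: ip_address} dictionary.

    Assumes IP addresses (not hostnames) and numeric range sizes.
    Comment lines, blank lines and entries whose range-size field is
    not a number are skipped.
    """
    nodes = {}
    for line in map_text.splitlines():
        fields = line.split()
        if len(fields) != 3 or fields[0].startswith("#"):
            continue
        first_id, base_ip, range_size = fields
        if not range_size.isdigit():
            continue
        base = ipaddress.IPv4Address(base_ip)
        # Node first_id gets the base address, the next node base + 1, ...
        for offset in range(int(range_size)):
            nodes[int(first_id) + offset] = str(base + offset)
    return nodes


for node_id, address in expand_map("1 192.168.1.1 4").items():
    print(node_id, address)
```

For the single-line map above this prints the same four node/address pairs as the longer four-line version.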

If a node has more than one network interface, it can be configured with the ALIAS option in the range-size field (which is equivalent to setting the range-size to 0), e.g.
1       192.168.1.1     1
2       192.168.1.2     1
3       192.168.1.3     1
4       192.168.1.4     1
4       192.168.10.10   ALIAS
Here the node with the openMosix-Node_ID 4 has two network-interfaces (192.168.1.4 + 192.168.10.10) which are both visible to openMosix.

Always be sure to run the same openMosix version AND configuration on each of your Cluster's nodes!

Start openMosix with the "setpe" utility on each node :
setpe -w -f /etc/openmosix.map
Execute this command (which will be described later on in this HOWTO) on every node in your openMosix cluster.

Alternatively, you can grab the "openmosix" script which can be found in the scripts directory of the userspace-tools, copy it to the /etc/init.d directory, chmod 0755 it, then use the following commands as root:
/etc/init.d/openmosix stop
/etc/init.d/openmosix start
/etc/init.d/openmosix restart

Installation is finished now: the cluster is up and running :)


4.3.3. oMFS

First of all, the CONFIG_MOSIX_FS option in the kernel configuration has to be enabled. If the current kernel was compiled without this option, then recompilation with this option enabled is required.

Also, the UIDs (User IDs) and GIDs (Group IDs) must be the same on the file-systems of all the cluster's nodes. You might want to accomplish this using openldap. The CONFIG_MOSIX_DFSA kernel option is optional, but of course required if DFSA is to be used. To mount oMFS on the cluster there has to be an additional entry in each node's /etc/fstab.

To have DFSA enabled:
mfs_mnt         /mfs            mfs     dfsa=1          0 0
To have DFSA disabled:
mfs_mnt          /mfs           mfs     dfsa=0          0 0
The syntax of this fstab-entry is:
[device_name]           [mount_point]   mfs     defaults        0 0
After mounting the /mfs mount-point on each node, each node's file-system is going to be accessible through the /mfs/[openMosix-Node_ID]/ directories.

With the help of some symbolic links all cluster-nodes can access the same data, e.g. /work on node1:
on node2 :      ln -s /mfs/1/work /work
on node3 :      ln -s /mfs/1/work /work
on node4 :      ln -s /mfs/1/work /work
...
Now every node can read+write from and to /work !

The following special files are excluded from the oMFS:

  • the /proc directory

  • special files which are not regular-files, directories or symbolic links (e.g. /dev/hda1)

Creating links like:
ln -s /mfs/1/mfs/1/usr         
or
ln -s /mfs/1/mfs/3/usr
is invalid.

With DFSA, the following system calls are supported without sending the migrated process back to its home node (the call is executed on the remote node where the process is running):

read, readv, write, writev, readahead, lseek, llseek, open, creat, close, dup, dup2, fcntl/fcntl64, getdents, getdents64, old_readdir, fsync, fdatasync, chdir, fchdir, getcwd, stat, stat64, newstat, lstat, lstat64, newlstat, fstat, fstat64, newfstat, access, truncate, truncate64, ftruncate, ftruncate64, chmod, chown, chown16, lchown, lchown16, fchmod, fchown, fchown16, utime, utimes, symlink, readlink, mkdir, rmdir, link, unlink, rename

Here are situations when system calls on DFSA mounted file-systems may not work:

  • different mfs/dfsa configuration on the cluster-nodes

  • dup2 if the second file-pointer is non-DFSA

  • chdir/fchdir if the parent dir is non-DFSA

  • pathnames that leave the DFSA-filesystem

  • when the process which executes the system-call is being traced

  • if there are pending requests for the process which executes the system-call

Next to the /mfs/1/, /mfs/2/ and so on directories, you will find some other directories as well.

Table 4-1. Other Directories

/mfs/here       The current node where your process runs
/mfs/home       Your home node
/mfs/magic      The current node when used by the "creat" system call (or an "open" with the "O_CREAT" option) - otherwise, the last node on which an oMFS magical file was successfully created (this is very useful for creating temporary-files, then immediately unlinking them)
/mfs/lastexec   The node on which the process last issued a successful "execve" system-call.
/mfs/selected   The node selected by your process itself, or by one of its ancestors (before forking this process), by writing a number into "/proc/self/selected".

Note that these magic files are all ``per process''. That is, their content depends on which process opens them.

A last note about oMFS: there are versions around that return faulty results when you run "df" on those filesystems. Don't be surprised if you suddenly have about 1.3 TB available on those systems.


4.4. Red Hat and openMosix

If you are running RedHat 7.2, 7.3 or 8.0, this is probably the easiest *Mosix install you have ever done. Choose the appropriate openMosix RPMs from sourceforge. They have precompiled kernels (2.4.20 as I write this) that work seamlessly: I have tested them on several machines, including laptops with PCMCIA cards and servers with SCSI disks. If you are a grub user, the kernel rpm even modifies your grub.conf. So all you have to do is install 2 RPMs:
rpm -Uvh openmosix-kernel-2.4.20-openmosix2.i686.rpm openmosix-tools-0.2.4-1.i386.rpm
and edit your /etc/openmosix.map. Since this seems to be a problem for lots of people, let's go with another example. Say you have 3 machines: 192.168.10.220, 192.168.10.78 and 192.168.10.84. Your openmosix.map will look like this.
[root@oscar0 root]# more /etc/openmosix.map 
# openMosix CONFIGURATION
# ===================
#
# Each line should contain 3 fields, mapping IP addresses to openMosix node-numbers:
# 1) first openMosix node-number in range.
# 2) IP address of the above node (or node-name from /etc/hosts).
# 3) number of nodes in this range.
#
# Example: 10 machines with IP 192.168.1.50 - 192.168.1.59
# 1	   192.168.1.50	    10
#
# openMosix-#  IP  number-of-nodes
# ============================
1 192.168.10.220 1
2 192.168.10.78  1
3 192.168.10.84  1
Now by rebooting the different machines with the newly installed kernel you will get one step closer to having a working cluster.

Most RedHat installations have one extra thing to fix. You often get the following error:
[root@inspon root]# /etc/init.d/openmosix start 
Initializing openMosix...
setpe: the supplied table is well-formatted,
but my IP address (127.0.0.1) is not there!
This means that your hostname is not listed in /etc/hosts with the same IP as in your openmosix.map. You might have a machine called omosix1.localhost.org listed in your /etc/hosts file as
127.0.0.1	omosix1.localhost.org localhost 
If you modify your /etc/hosts to look like below, openMosix will have less trouble starting up.
192.168.10.78   omosix1.localhost.org
127.0.0.1       localhost 
[root@inspon root]# /etc/init.d/openmosix start 
Initializing openMosix...
[root@inspon root]# /etc/init.d/openmosix status
This is openMosix node #2
Network protocol: 2 (AF_INET)
openMosix range     1-1     begins at 192.168.10.220
openMosix range     2-2     begins at inspon.localhost.be
openMosix range     3-3     begins at 192.168.10.84
Total configured: 3
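
The loopback problem described above can be spotted before starting openMosix. The following Python sketch checks whether the text of a hosts file maps a given hostname to a 127.x.x.x address. It is a hypothetical helper written for this HOWTO, not something shipped with the openMosix tools, and it only looks at the file contents, not at DNS or other name services.

```python
def maps_to_loopback(hosts_text, hostname):
    """Return True if hosts-file text maps hostname to a 127.x.x.x address."""
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0]  # drop comments
        fields = line.split()
        if len(fields) < 2:
            continue
        address, names = fields[0], fields[1:]
        if hostname in names and address.startswith("127."):
            return True
    return False


# The two /etc/hosts variants discussed above.
broken = "127.0.0.1\tomosix1.localhost.org localhost\n"
fixed = "192.168.10.78   omosix1.localhost.org\n127.0.0.1       localhost\n"

print(maps_to_loopback(broken, "omosix1.localhost.org"))  # True
print(maps_to_loopback(fixed, "omosix1.localhost.org"))   # False
```

If the function returns True for your own hostname, setpe is likely to complain that 127.0.0.1 is not in the map, as in the error shown earlier.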

If you would like to use more bleeding-edge patches, you can always opt for the src rpm and run rpmbuild --rebuild on it. This will install the source for you and create an initial config file. From there you can go further and apply patches to openMosix.

As new RedHat versions come out, they might be supported out of the box, so feel free to drop the author a note and help him keep this information up to date.


4.5. Suse and openMosix

Although the RPMs are being built on a RedHat based environment, you can use most of them on other RPM based systems.

Suse, however, has /sbin/mk_initrd as a link to /sbin/mkinitrd, which makes RPMs before release 20-2 fail. Newer versions should have a fix for this.


4.6. Debian and openMosix

Installing openMosix ``the Debian way'' can be easily done as described below.

The first step consists in downloading the packages from the net. I had to use a 2.4.19 kernel since the openMosix patches package is not yet available for 2.4.20 at the moment I write this. Since we are using a Debian setup we needed: http://packages.debian.org/unstable/net/openmosix.html, http://packages.debian.org/unstable/net/kernel-patch-openmosix.html, http://packages.debian.org/unstable/misc/kernel-package.html, http://packages.debian.org/unstable/devel/kernel-source-2.4.19.html. You can also apt-get install them ;).

The next part is making the kernel openMosix capable.

Basically, the procedure to follow is:
cd /usr/src
apt-get install kernel-source-2.4.19 kernel-package \
        openmosix kernel-patch-openmosix
tar vxjf kernel-source-2.4.19.tar.bz2
ln -s /usr/src/kernel-source-2.4.19 /usr/src/linux
cd /usr/src/linux
../kernel-patches/i386/apply/openmosix
make menuconfig
make-kpkg kernel_image modules_image
cd ..
dpkg -i kernel-image-*-openmosix-*.deb
You now will need to edit your /etc/openmosix.map. Please follow the instructions given in the ``Syntax of /etc/openmosix.map'' part of this HOWTO.

After rebooting with this kernel and a configured /etc/openmosix.map, you should have a cluster of openMosix machines that talk to each other and migrate processes.

You can test that by running the following small script:
awk 'BEGIN {for(i=0;i<10000;i++)for(j=0;j<10000;j++);}'
a couple of times, and monitor its behaviour with "mosmon" where you will see that it spreads the load between the different nodes.

We also set up openMosixView on the Debian machine:
apt-get install openmosixview
In order to be able to actually use openMosixView you will need to run it from a user who can log in to the different nodes as root. We suggest you set this up using ssh. Please note that there is a difference between the ssh and ssh2 implementations. If you do have an identity.pub file, ssh will check authorized_keys, while if you do have an id_dsa.pub you will need authorized_keys2!

openMosixView gives you a nice interface that shows the load of different machines and gives you the possibility to migrate processes manually.

A detailed discussion of openMosixView can be found elsewhere in this document.


4.7. openMosix and Gentoo

First, install Gentoo Linux.

Then, install openMosix: type "emerge sys-apps/openmosix-user", which will install an openMosix kernel source tree in /usr/src/linux along with the openMosix userland tools.

Michael Imhof, aka tantive, keeps Gentoo current for the latest openMosix version.

Daniel Robbins, the President/CEO of Gentoo Technologies, Inc. and the creator of Gentoo Linux, wrote the articles we use as our Introduction to openMosix Clusters.


4.8. Other distributions

Based on the explanations above you should be able to install openMosix on most other Linux platforms.


Chapter 5. Autodiscovery

5.1. Easy Configuration

The auto-discovery daemon (omdiscd) provides a way to automatically configure an openMosix cluster, eliminating the need for an /etc/mosix.map or similar manual configuration. Auto-discovery uses multicast packets to notify other nodes that it is an openMosix node. Adding an extra node to your openMosix cluster thus means that you just have to start omdiscd on the new machine and it will join the cluster.

However, there are some small requirements. Like with any openMosix cluster, you need to have networking configured correctly, mainly the routing. Without a default route, you must specify an interface to omdiscd with the -i option. Otherwise omdiscd will exit with an error like:
Aug 31 20:41:49 localhost omdiscd[1290]: Unable to determine address of 
default interface.  This may happen because there is no default route 
configured.  Without a default route, an interface must be: Network is 
unreachable 
Aug 31 20:41:49 localhost omdiscd[1290]: Unable to initialize network.  
Exiting. 
An example of a correct routing table is shown below:
[root@localhost log]# route -n 
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
10.0.0.0        0.0.0.0         255.0.0.0       U     0      0        0 eth0
127.0.0.0       0.0.0.0         255.0.0.0       U     0      0        0 lo
0.0.0.0         10.0.0.99       0.0.0.0         UG    0      0        0 eth0
Basically from now on everything will get easier. Just start
omdiscd
Then have a look at your logfiles; you should see something similar to this:
Sep  2 10:00:49 oscar0 kernel: openMosix configuration changed: This is openMosix #2780 (of 6 configured)
Sep  2 10:00:49 oscar0 kernel: openMosix #2780 is at IP address 192.168.10.220
Sep  2 10:00:49 oscar0 kernel: openMosix #2638 is at IP address 192.168.10.78
Sep  2 10:00:49 oscar0 kernel: openMosix #2646 is at IP address 192.168.10.86
Sep  2 10:00:49 oscar0 kernel: openMosix #2627 is at IP address 192.168.10.67
Sep  2 10:00:49 oscar0 kernel: openMosix #2634 is at IP address 192.168.10.74
Congratulations, your openMosix cluster is now working.

omdiscd has some other options that you can use. You can run omdiscd either as a daemon (the default) or, with omdiscd -n, in the foreground, where output goes to the screen (standard output). An interface can be specified with the -i option.

Now let's have a short look at the other tool, showmap. This tool will show you the newly auto-generated openMosix map.
[root@oscar0 root]# showmap
My Node-Id: 0x0adc

Base Node-Id Address          Count
------------ ---------------- -----
0x0adc       192.168.10.220   1
0x0a4e       192.168.10.78    1
0x0a56       192.168.10.86    1
0x0a43       192.168.10.67    1
0x0a4a       192.168.10.74    1
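
In the sample output above, every auto-generated node ID equals the last two octets of the node's IP address read as one 16-bit number (for example 192.168.10.220 gives 10*256 + 220 = 2780 = 0x0adc). The following Python sketch reproduces that arithmetic. Note that this is only an observation about the output shown in this chapter, not a documented guarantee of omdiscd, and the function name is made up for the example.

```python
def autodiscovery_node_id(ip_address):
    """Combine the last two octets of an IPv4 address into one number.

    This matches the IDs in the sample output of this chapter; it is an
    observation about that output, not a documented omdiscd rule.
    """
    octets = [int(part) for part in ip_address.split(".")]
    return octets[2] * 256 + octets[3]


for ip in ("192.168.10.220", "192.168.10.78", "192.168.10.86"):
    node_id = autodiscovery_node_id(ip)
    print(ip, node_id, hex(node_id))
```

This explains why the kernel log reports node numbers such as #2780 while showmap prints the same IDs in hexadecimal.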

Auto-discovery has some other features not listed here such as a routing mechanism for clusters with more than one network. More detailed information can be found in the README and DESIGN files in the user-land tools source tree.

More recent versions of the openMosix rc scripts will first verify whether an /etc/openmosix.map file or similar exists before trying to use autoconfiguration.


5.2. Compiling auto-discovering

If you are compiling autodiscovery from source you will need to make a small modification to openmosix.c. One of the first lines will be
#define ALPHA
You will need to comment this line out. If you want more logging, edit main.c so that it calls log_set_debug(DEBUG_TRACE_ALL); (somewhere around line 84). Now run

% make clean
% make


5.3. Troubleshooting autodiscovery

Sometimes, however, autodiscovery does not function as you would like; for example, a node might not see multicast traffic from other nodes. This has occurred with some PCMCIA ethernet drivers. One solution is to place the interface in promiscuous and/or multicast mode, as detailed below. A node with this problem logs something like:
Aug 31 20:45:58 localhost kernel: openMosix configuration changed: This is openMosix #98 (of 1 configured)
Aug 31 20:45:58 localhost kernel: openMosix #98 is at IP address 10.0.0.98
Aug 31 20:45:58 localhost omdiscd[1627]: Notified kernel to activate openMosix
Aug 31 20:45:58 localhost kernel: Received an unauthorized information request from 10.0.0.99
What you should do then is try to force your NIC into promiscuous and/or multicast mode manually.
ifconfig ethx promisc
or
ifconfig ethx multicast 
You might also want to run
tcpdump -i eth0 ether multicast 
which will have the same effect, but you will now also be able to see the packets yourself.

Aug 31 22:14:43 inspon omdiscd[1422]: Simulated notification to activate openMosix
[root@inspon root]# showmap
My Node-Id: 0x0063

Base Node-Id Address          Count
------------ ---------------- -----
0x0063       10.0.0.99        1
[root@inspon root]# /etc/init.d/openmosix status
OpenMosix is currently disabled
[root@inspon root]# 
If you see the "Simulated notification" message, you have probably forgotten to comment out the
#define ALPHA
line.

I have also noticed that autodiscovery does not work with FireWire based network cards.


Chapter 6. Cluster Installation

6.1. Cluster Installations

This chapter does not deal with installing openMosix as such; it deals with installing openMosix on multiple machines, i.e. automated or semi-automated mass installs.


6.2. DSH, Distributed Shell

At the time of this writing (May 2003) the most current DSH release is available from http://www.netfort.gr.jp/~dancer/software/downloads/ and more info on the package can be found on http://www.netfort.gr.jp/~dancer/software/dsh.html The latest version available for download is 0.23.6. You will need both libdshconfig-0.20.8.tar.gz and dsh-0.23.5.tar.gz. Start with installing libdshconfig:
./configure
make
make install 
Repeat the process for the dsh package.

Say we have a small cluster with a couple of nodes. To make life easier we want to type each command once but have it executed on each node. To do that, create a file $HOME/.dsh/group/clustername that lists the IPs of your cluster, e.g.
[root@inspon root]# cat .dsh/group/mosix 
192.168.10.220
192.168.10.84
As an example we run ls on each of these machines. We use -g to select the mosix group (this way you can create subsets of a group with different configurations):
[root@inspon root]# dsh -r ssh -g mosix ls
192.168.10.84: anaconda-ks.cfg
192.168.10.84: id_rsa.pub
192.168.10.84: install.log
192.168.10.84: install.log.syslog
192.168.10.84: openmosix-kernel-2.4.17-openmosix1.i686.rpm
192.168.10.84: openmosix-tools-0.2.0-1.i386.rpm
192.168.10.220: anaconda-ks.cfg
192.168.10.220: id_dsa.pub
192.168.10.220: id_rsa.pub
192.168.10.220: openmosix-kernel-2.4.17-openmosix1.i686.rpm
192.168.10.220: openmosix-tools-0.2.0-1.i386.rpm
192.168.10.220: oscar-1.2.1rh72
192.168.10.220: oscar-1.2.1rh72.tar.gz
Note that neither of the machines asks for a password. This is because we have set up RSA authentication between the different accounts. If you want to run commands with multiple parameters you will either have to put the command between quotes:
[root@inspon root]# dsh -r ssh -g mosix "uname -a"
192.168.10.84: Linux omosix2.office.be.stone-it.com 2.4.17-openmosix1 #1 
Wed May 29 14:32:28 CEST 2002 i686 unknown
192.168.10.220: Linux oscar0 2.4.17-openmosix1 #1 Wed May 29 14:32:28 CEST 
2002 i686 unknown
or use the -c -- option. Both give basically the same output.
[root@inspon root]# dsh -r ssh -g mosix -c -- uname -a 
192.168.10.220: Linux oscar0 2.4.17-openmosix1 #1 Wed May 29 14:32:28 CEST 
2002 i686 unknown
192.168.10.84: Linux omosix2.office.be.stone-it.com 2.4.17-openmosix1 #1 
Wed May 29 14:32:28 CEST 2002 i686 unknown


Chapter 7. Administrating openMosix

7.1. Basic Administration

openMosix provides the advantage of process migration to HPC applications. The administrator can configure and tune the openMosix cluster by using the openMosix user-space tools or the /proc/hpc interface, which will now be described in detail.

Up to openMosix version 2.4.16 the /proc interface was named /proc/mosix! Since openMosix version 2.4.17 it is named /proc/hpc.


7.2. Configuration

The values in the flat files in the /proc/hpc/admin directory represent the current configuration of the cluster. The administrator can also write new values into these files to change the configuration at runtime, e.g.

Table 7-1. Changing /proc/hpc parameters

echo 1 > /proc/hpc/admin/block   blocks the arrival of remote processes
echo 1 > /proc/hpc/admin/bring   bring all migrated processes home

...

Table 7-2. /proc/hpc/admin/

(binary files)  config          the main configuration file (written by the setpe util)
(flat files)    block           allow/forbid arrival of remote processes
                bring           bring home all migrated processes
                dfsalinks       list of current symbolic dfsa-links
                expel           sending guest processes home
                gateways        maximum number of gateways
                lstay           local processes should stay
                mospe           contains the openMosix node id
                nomfs           disables/enables MFS
                overheads       for tuning
                quiet           stop collecting load-balancing information
                decay-interval  interval for collecting information about load-balancing
                slow-decay      default 975
                fast-decay      default 926
                speed           speed relative to PIII/1GHz
                stay            enables/disables automatic process migration

Table 7-3. Writing a 1 to the following files /proc/hpc/decay/

clear clears the decay statistics
cpujob tells openMosix that the process is cpu-bound
iojob tells openMosix that the process is io-bound
slow tells openMosix to decay its statistics slowly
fast tells openMosix to decay its statistics fast

Table 7-4. Information about the other nodes

/proc/hpc/nodes/[openMosix_ID]/CPUs how many CPUs the node has
/proc/hpc/nodes/[openMosix_ID]/load the openMosix load of this node
/proc/hpc/nodes/[openMosix_ID]/mem available memory as openMosix believes
/proc/hpc/nodes/[openMosix_ID]/rmem available memory as Linux believes
/proc/hpc/nodes/[openMosix_ID]/speed speed of the node relative to PIII/1GHz
/proc/hpc/nodes/[openMosix_ID]/status status of the node
/proc/hpc/nodes/[openMosix_ID]/tmem available memory
/proc/hpc/nodes/[openMosix_ID]/util utilization of the node
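
Because these are ordinary files, they are easy to read from a script. The following Python sketch collects the files listed in the table above for one node. The file names come from the table; the helper name, the root parameter (handy for testing against a copy of the tree) and the choice to return the raw text are assumptions of this example, since the exact format of each file is not described here.

```python
import os

# File names as listed in the table above.
NODE_FILES = ("CPUs", "load", "mem", "rmem", "speed", "status", "tmem", "util")


def read_node_info(node_id, root="/proc/hpc/nodes"):
    """Return {file_name: contents} for one node; unreadable files map to None."""
    info = {}
    for name in NODE_FILES:
        path = os.path.join(root, str(node_id), name)
        try:
            with open(path) as handle:
                info[name] = handle.read().strip()
        except OSError:
            # Not an openMosix kernel, unknown node ID, or no permission.
            info[name] = None
    return info


for name, value in read_node_info(1).items():
    print(name, value)
```

On a machine without an openMosix kernel every value is simply None, so the sketch can be tried anywhere.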

Table 7-5. Additional information about local processes

/proc/[PID]/cantmove reason why a process cannot be migrated
/proc/[PID]/goto to which node the process should migrate
/proc/[PID]/lock if a process is locked to its home node
/proc/[PID]/nmigs how many times the process migrated
/proc/[PID]/where where the process is currently being computed
/proc/[PID]/migrate same as goto remote processes
/proc/hpc/remote/from the home node of the process
/proc/hpc/remote/identity additional informations about the process
/proc/hpc/remote/statm memory statistic of the process
/proc/hpc/remote/stats cpu statistics of the process

7.3. the userspace-tools

The following tools provide easy administration of openMosix clusters.
migrate -send a migrate request to a process
                syntax: 
                        migrate [PID] [openMosix_ID]

mon             -is an ncurses-based terminal monitor
                 information about the current status is displayed in bar-charts

mosctl          -is the openMosix main configuration utility
                syntax:
                        mosctl  [stay|nostay]
                                [lstay|nolstay]
                                [block|noblock]
                                [quiet|noquiet]
                                [nomfs|mfs]
                                [expel|bring]
                                [gettune|getyard|getdecay]

                        mosctl  whois   [openMosix_ID|IP-address|hostname]

                        mosctl  [getload|getspeed|status|isup|getmem|getfree|getutil]   [openMosix_ID]

                        mosctl  setyard [Processor-Type|openMosix_ID|this]

                        mosctl  setspeed        integer-value

                        mosctl  setdecay interval       [slow fast]

Table 7-6. more detailed

stay no automatic process migration
nostay automatic process migration (default)
lstay local processes should stay
nolstay local processes could migrate
block block arriving of guest processes
noblock allow arriving of guest processes
quiet disable gathering of load-balancing informations
noquiet enable gathering of load-balancing informations
nomfs disables MFS
mfs enables MFS
expel send away guest processes
bring bring all migrated processes home
gettune shows the current overhead parameter
getyard shows the current used Yardstick
getdecay shows the current decay parameter
whois resolves openMosix-ID, ip-addresses and hostnames of the cluster
getload display the (openMosix-) load
getspeed shows the (openMosix-) speed
status displays the current status and configuration
isup is a node up or down (openMosix kind of ping)
getmem shows logical free memory
getfree shows physical free mem
getutil display utilization
setyard sets a new Yardstick-value
setspeed sets a new (openMosix-) speed value
setdecay sets a new decay-interval

mosrun          -run a special configured command on a chosen node
                syntax:
                        mosrun  [-h|openMosix_ID| list_of_openMosix_IDs] command [arguments]

The mosrun command can be executed with several more command-line options. To make this easier there are several preconfigured run-scripts for executing jobs with a special (openMosix) configuration.

Table 7-7. extra options for mosrun

nomig runs a command whose process(es) won't migrate
runhome executes a command locked to its home node
runon runs a command which will be directly migrated and locked to a node
cpujob tells the openMosix cluster that this is a cpu-bound process
iojob tells the openMosix cluster that this is an io-bound process
nodecay executes a command and tells the cluster not to refresh the load-balancing statistics
slowdecay executes a command with a slow decay interval for collecting load-balancing statistics
fastdecay executes a command with a fast decay interval for collecting load-balancing statistics

setpe           -manual node configuration utility
                syntax:
                        setpe   -w -f   [hpc_map]
                        setpe   -r [-f  [hpc_map]]
                        setpe   -off

-w reads the openMosix configuration from a file (typically /etc/hpc.map)
-r writes the current openMosix configuration to a file (typically /etc/hpc.map)
-off turns the current openMosix configuration off

tune            openMosix calibration and optimizations utility.
                (for further information see the tune man page)

In addition to the /proc interface and the command-line openMosix utilities (which use the /proc interface), there are patched versions of "ps" and "top" available (called "mps" and "mtop") which also display the openMosix node ID in a column. This is useful for finding out where a specific process is currently being computed.

This summarises the command-line tools, but have a look at openMosixview, which is a GUI for the most common administration tasks and which will be discussed in a later chapter.


7.4. Cluster Mask

(by Moshe Bar)

Several people have asked for a feature in openMosix which allows specifying to which nodes a given process and its children can migrate and to which nodes they cannot.

Simone Ettore has just committed a new patch to the CVS which allows you to do just that.

Here is how it works:

  • /proc/[pid]/migfilter enables/disables migration filtering for the process.

  • /proc/[pid]/mignodes is a bit-list of nodes. The bit value for a node is calculated as 2^(PE-1), where PE is the node number.

  • /proc/[pid]/migpolicy is the policy of the filtering. 0=DENY: the process can migrate to all nodes except those whose bit in mignodes is 1. 1=ALLOW: the process can migrate to all nodes whose bit in mignodes is 1.
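
As a worked example of the 2^(PE-1) rule, the following Python sketch computes the bit-list value for a set of node numbers. It only does the arithmetic; the function name is made up for this example, and the exact text format expected when writing the value into /proc/[pid]/mignodes is not described here, so check the patch before relying on it.

```python
def mignodes_mask(node_numbers):
    """Return the bit-list value with bit 2^(PE-1) set for each node number PE."""
    mask = 0
    for pe in node_numbers:
        if pe < 1:
            raise ValueError("node numbers start at 1")
        mask |= 1 << (pe - 1)
    return mask


print(mignodes_mask([1, 3]))     # 2^0 + 2^2 = 5
print(mignodes_mask([2, 3, 4]))  # 2^1 + 2^2 + 2^3 = 14
```

With migpolicy set to 1 (ALLOW), a value of 5 would restrict a process to nodes 1 and 3; with migpolicy 0 (DENY), the same value would keep it away from exactly those two nodes.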

We are shortly going to release a simple user-land tool to set the node mask as well, but I would like you guys to give it a try asap before we release it as openMosix 2.4.20-3.


Chapter 8. Tuning Mosix

8.1. Introduction

Some of the parts below are still from the old Mosix HOWTO. As time passes these parts will be replaced by relevant openMosix parts. Some things are still the same, but your mileage may vary.


8.2. Creating a "Master" node

Although the openMosix architecture does not require a master node as such, you might want to have a head node from which you launch processes; this might be a multihomed node through which users log in to your cluster. You want to configure this machine so that processes migrate away from it.

You have to trick the node into thinking it is the slowest node around and that it had better migrate all its processes to the faster nodes.

You will have to make it "slow" with :
mosctl setspeed [n]
where n should be much lower than the speed of the other nodes. Processes will then move/migrate away fast. You can get the speed of a node with :
mosctl getspeed


8.3. Optimizing Mosix

Editorial Comment: To be checked with openMosix versions

Log in to a normal terminal as root. Type
       setpe -r 
which, if everything went right, will give you a listing of your /etc/mosix.map. If things did not go right, try
        setpe -w -f /etc/mosix.map 
to set up your node. Then, type
       cat /proc/$$/lock
to see if your child processes are locked to your node (1) or can migrate (0). If for some reason you find your processes are locked, you can change this with
        echo 0 > /proc/$$/lock
until you fix the problem. Repeat the whole configuration scheme for a second computer. The programs tune_kernel and prep_tune that Mosix uses to calibrate the individual nodes do not work with the SuSE distribution. However, you can fake it. First, bring the computer you want to tune and another computer with Mosix installed down to single user mode by typing
        init 1
as root. All other computers on the network should be shutdown if possible. On both machines, run the following commands:
        /etc/init.d/network start
        /etc/init.d/mosix start
        echo 1 > /proc/mosix/admin/quiet
This fakes prep_tune and the first parts of tune_kernel. Note that if you have a laptop with a pcmcia network card, you will have to run
        /etc/init.d/pcmcia start
instead of "/etc/init.d/network start". On the computer you want to tune, run tune_kernel and follow instructions. Depending on your machines, this can take a while - if you have a dog, this might be the time to go on that long, long walk you've always promised him. tune_kernel will create a program called "pg" in /root for testing reasons. Ignore it. After tuning is over, copy the contents of /tmp/overheads to the file /etc/overheads (and/or recompile the kernel). Repeat the tuning procedure for each computer. Reboot, enjoy Mosix, and don't forget to brag to your friends about your new cluster.


8.4. Channel Bonding Made Easy

Contributed by Evan Hisey

Channel bonding is actually horribly easy. This may explain the lack of documentation on this subject. A bonded network appears as a normal network to the applications. All machines on a subnet must be bonded the same way. Bonded and non-bonded machines really don't talk well to each other.

Channel bonding needs at least two physical subnets but can have more (currently I have a tri-bonded cluster). To enable bonding, build the Channel Bonding kernel code either into the kernel or as a module (bonding.o); as of 2.4.x it is a standard option of the kernel. The NICs are set up as normal, except that you only use 'ifconfig' to initialize the first card of the bond. 'ifenslave' is used to initialize the remaining cards in the bonded connection. 'ifenslave' can be found in the linux/Documentation/networking/ directory. It will need to be compiled as it is a .c file. The basic format for use is
ifenslave <master> <slave1> <slave2> ...
Channel bonded networks can connect to standard networks via a router or bridge that supports channel bonding (I just use an extra NIC and port-forwarding in the head node).
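To make the order of the steps concrete, here is a hedged sketch that only prints the commands instead of running them. The master name bond0, the address and the netmask are assumptions chosen for illustration (bond0 is the name the kernel bonding driver normally creates), and bond_commands is a made-up helper, not an existing tool.

```shell
#!/bin/sh
# bond_commands MASTER ADDRESS NETMASK SLAVE...
# Print (do not run) the commands for bringing up a bonded interface.
# All names and addresses passed in are site-specific assumptions.
bond_commands() {
    master=$1
    addr=$2
    mask=$3
    shift 3
    echo "modprobe bonding"                           # load the bonding module
    echo "ifconfig $master $addr netmask $mask up"    # configure the bond itself
    echo "ifenslave $master $*"                       # attach the physical NICs
}

# Example:
#   bond_commands bond0 192.168.1.1 255.255.255.0 eth0 eth1
```

Review the printed lines, then run them as root on every machine in the subnet, since all machines have to be bonded the same way.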


8.5. Updatedb

Updatedb in combination with mfs can cause some issues. You might want to add /mfs to the PRUNEPATHS or mfs to the PRUNEFS in your /etc/updatedb.conf to prevent updatedb from indexing these mount points.
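As an illustration, a fragment of /etc/updatedb.conf with both changes applied might look like the following. The path variable is spelled PRUNEPATHS in the slocate-style configuration files I know. The other entries in both lists are assumptions standing in for whatever your distribution already ships, so keep your existing values and only append the mfs entries. Either of the two additions should be enough on its own.

```shell
# /etc/updatedb.conf (fragment, shell variable syntax)
# Only "mfs" and "/mfs" are the additions; the rest is illustrative.
PRUNEFS="devpts nfs afs proc smbfs autofs mfs"
PRUNEPATHS="/tmp /usr/tmp /var/tmp /afs /net /mfs"
export PRUNEFS
export PRUNEPATHS
```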


8.6. openMosix and FireWire

openMosix can gain performance by using another type of network device, as described in the paper about openMosix and FireWire.


Chapter 9. openMosixview

9.1. Introduction

openMosixview is the next version and a complete rewrite of Mosixview. It is a cluster-management GUI for openMosix clusters and everybody is invited to download and use it (at your own risk and responsibility). The openMosixview suite contains 5 useful applications for monitoring and administrating an openMosix cluster.

openMosixview: the main monitoring+administration application
openMosixprocs: a process-box for managing processes
openMosixcollector: collecting daemon which logs cluster+node information
openMosixanalyzer: for analyzing the data collected by the openMosixcollector
openMosixhistory: a process-history for your cluster

All parts are accessible from the main application window. The most common openMosix commands can be executed with a few mouse clicks. An advanced execution dialog helps to start applications on the cluster. "Priority-sliders" for each node simplify the manual and automatic load-balancing. openMosixview is now adapted to the openMosix auto-discovery and gets all configuration values from the openMosix /proc-interface.


9.2. openMosixview vs Mosixview

openMosixview is designed for openMosix clusters only. The Mosixview website (and all mirrors) will stay as they are, but all further development will continue with openMosixview, located at the new domain www.openmosixview.com.

If you have questions, feature requests, problems during installation, comments, experiences to share etc., feel free to mail me, Matt Rechenburg, or subscribe to the openMosix/Mosixview mailing list and post there.

Changes (compared to Mosixview 1.1): openMosixview is a complete rewrite "from scratch" of Mosixview! It has the same functionality, but there are fundamental changes in ALL parts of the openMosixview source-code. It is tested with a constantly changing cluster topology (required for the openMosix auto-discovery). All "buggy" parts are removed or rewritten and it (should ;) run much more stably now.

adapted to the openMosix-auto-discovery
not using /etc/mosix.map or any cluster-map file anymore
removed the (buggy) map-file parser
rewrote all parts/functions/methods to a cleaner c++ interface
fixed some smaller bugs in the display
replaced MosixMem+Load with the openMosixanalyzer
.. many more changes


9.3. Installation

Requirements

QT
root rights !
rlogin and rsh (or ssh) to all cluster-nodes without password
the openMosix userland-tools mosctl, migrate, runon, iojob, cpujob ... (download them from the www.openmosix.org website)

On Red Hat 8.0 you will need at least the following RPMs: qt-3.0.5-17, libmng-1.0.4, XFree86-Mesa-libGLU-4.2.0, glut-3.7 etc ...

Documentation about openMosixview

There is full HTML documentation about openMosixview included in every package. You will find the start page of the documentation in your openMosixview installation directory: openmosixview/openmosixview/docs/en/index.html

The RPM-packages have their installation directories in: /usr/local/openmosixview


9.3.1. Installation of the RPM-distribution

Download the latest version of openMosixview rpm-package. Then just execute e.g.:
rpm -i openmosixview-1.4.rpm 
This will install all binaries in /usr/bin. To uninstall:
rpm -e openmosixview 


9.3.2. Installation of the source-distribution

Download the latest version of openMosixview, copy the tarball to e.g. /usr/local/, and then unzip and untar the sources there:
gunzip openmosixview-1.4.tar.gz 
tar -xvf openmosixview-1.4.tar 


9.3.3. Automatic setup-script

Just cd to the openmosixview-directory and execute
./setup [your_qt_2.3.x_installation_directory] 


9.3.4. Manual compiling

Set the QTDIR variable to your actual QT distribution, e.g. for bash:
export QTDIR=/usr/lib/qt-2.3.0
or for csh:
setenv QTDIR /usr/lib/qt-2.3.0


9.3.5. Hints

(Hints from the testers of openMosixview/Mosixview who compiled it on different Linux distributions, thanks again.) Create the link /usr/lib/qt pointing to your QT-2.3.x installation, e.g. if QT-2.3.x is installed in /usr/local/qt-2.3.0:
ln -s /usr/local/qt-2.3.0 /usr/lib/qt
Then set the QTDIR environment variable, for bash:
export QTDIR=/usr/lib/qt
or for csh:
setenv QTDIR /usr/lib/qt
After that the rest should work fine:
./configure 
make 
Then do the same in the subdirectories openmosixcollector, openmosixanalyzer, openmosixhistory and openmosixviewprocs. Copy all binaries to /usr/bin:
cp openmosixview/openmosixview /usr/bin 
cp openmosixviewproc/openmosixviewprocs/mosixviewprocs /usr/bin 
cp openmosixcollector/openmosixcollector/openmosixcollector /usr/bin 
cp openmosixanalyzer/openmosixanalyzer/openmosixanalyzer /usr/bin 
cp openmosixhistory/openmosixhistory/openmosixhistory /usr/bin 
Also copy the openmosixcollector init-script to your init-directory, e.g.
cp openmosixcollector/openmosixcollector.init /etc/init.d/openmosixcollector 
or 
cp openmosixcollector/openmosixcollector.init /etc/rc.d/init.d/openmosixcollector 
Now copy the openmosixprocs binary to /usr/bin/openmosixprocs on each of your cluster-nodes:
rcp openmosixprocs/openmosixprocs your_node:/usr/bin/openmosixprocs 
You can now execute openMosixview:
openmosixview 


9.4. using openMosixview

9.4.1. main application

Here is a picture of the main application-window. The functionality is explained in the following.

openMosixview displays a row with a lamp, a button, a slider, an lcd-number, two progress-bars and some labels for each cluster-member. The lights at the left display the openMosix-Id and the status of the cluster-node: red if down, green if available.

If you click on a button displaying the ip-address of a node, a configuration-dialog will pop up. It shows buttons to execute the most commonly used "mosctl"-commands (described later in this HOWTO). With the "speed-sliders" you can set the openMosix-speed for each host. The current speed is displayed by the lcd-number.

You can influence the load-balancing of the whole cluster by changing these values. Processes in an openMosix cluster migrate more readily to a node with a higher openMosix-speed than to nodes with a lower one. Of course this is not the physical speed of the machine; it is the speed openMosix "thinks" a node has. For example, a CPU-intensive job on a cluster-node whose speed is set to the lowest value in the whole cluster will look for a better processor to run on and migrate away easily.

The progress bar in the middle gives an overview of the load on each cluster-member. It is displayed in percent, so it does not represent exactly the load written to the file /proc/hpc/nodes/x/load (by openMosix), but it should give an overview.

The next progress bar shows the memory used on the nodes, as a percentage of the memory available on each host (the label to the right displays the available memory). The number of CPUs in your cluster is shown in the box to the right. The first line of the main window contains a configuration button for "all-nodes". With this option you can configure all nodes in your cluster in the same way.

How well the load-balancing works is displayed by the progress bar in the top left. 100% is very good and means that all nodes have nearly the same load.

Use the collector- and analyzer-menu to manage the openMosixcollector and open the openMosixanalyzer. These two parts of the openMosixview application suite are useful for getting an overview of your cluster over a longer period.


9.4.2. the configuration-window

This dialog will pop up if a "cluster-node"-button is clicked.

The openMosix-configuration of each host can be changed easily now. All commands will be executed per "rsh" or "ssh" on the remote hosts (even on the local node) so "root" has to "rsh" (or "ssh") to each host in the cluster without prompting for a password (it is well described in a Beowulf documentation or on the HOWTO on this page how to configure it).
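Before relying on the dialog it can help to verify that root really reaches every node without a password prompt. The sketch below assumes OpenSSH (the BatchMode and ConnectTimeout options are OpenSSH client options). The helper name check_nodes and the node names in the example are hypothetical.

```shell
#!/bin/sh
# check_nodes NODE...
# Report for each node whether root can log in without a password prompt.
# BatchMode=yes makes ssh fail instead of asking for a password.
check_nodes() {
    for node in "$@"; do
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "root@$node" true 2>/dev/null; then
            echo "$node ok"
        else
            echo "$node FAILED"
        fi
    done
}

# Example:
#   check_nodes node1 node2 node3
```

Any node reported as FAILED will also fail when openMosixview tries to run its commands there.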

The commands are:

automigration on/off 
quiet yes/no 
bring/lstay yes/no 
expel yes/no
openMosix start/stop 
If openMosixprocs is properly installed on the remote cluster-nodes, click the "remote proc-box"-button to open openMosixprocs (proc-box) remotely. xhost +hostname will be set and the display will point to your localhost. The client is also executed on the remote host via "rsh" or "ssh" (the binary openmosixprocs must be copied to e.g. /usr/bin on each host of the cluster). openMosixprocs is a process-box for managing your programs. It is useful for managing programs started and running locally on the remote nodes and is described later in this HOWTO.

If you are logged on to your cluster from a remote workstation, insert your local hostname in the edit-box below the "remote proc-box". Then openMosixprocs will be displayed on your workstation and not on the cluster-member you are logged on to (you may have to set "xhost +clusternode" on your workstation). There is a history in the combo-box, so you have to type the hostname only once.


9.4.3. advanced-execution

If you want to start jobs on your cluster the "advanced execution"-dialog may help you.

Choose a program to start with the "run-prog" button (file-open-icon) and you can specify how and where the job is started by this execution-dialog. There are several options to explain.


9.4.4. the command-line

You can specify additional commandline-arguments in the lineedit-widget on top of the window.

Table 9-1. how to start

Option          Description
-no migration   start a local job which won't migrate
-run home       start a local job
-run on         start a job on the node you can choose with the "host-chooser"
-cpu job        start a computation intensive job on a node (host-chooser)
-io job         start an io intensive job on a node (host-chooser)
-no decay       start a job with no decay (host-chooser)
-slow decay     start a job with slow decay (host-chooser)
-fast decay     start a job with fast decay (host-chooser)
-parallel       start a job in parallel on some or all nodes (special host-chooser)

9.4.5. the host-chooser

For all jobs you start non-locally, simply choose a host with the dial-widget. The openMosix-id of the node is also displayed by an lcd-number. Then click execute to start the job.


9.4.6. the parallel host-chooser

You can set the first and last node with 2 spinboxes. Then the command will be executed on all nodes from the first node to the last node. You can also invert this selection.
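Outside the GUI, the same first-to-last idea can be sketched as a shell loop. This is not how openMosixview itself is implemented. It assumes the userland tool runon is called as "runon node-id command", which you should check against your installed tools. The helper name run_on_range and the RUNNER variable are made up for this example.

```shell
#!/bin/sh
# run_on_range FIRST LAST COMMAND [ARGS...]
# Start COMMAND once per node ID from FIRST to LAST and wait for all copies.
# RUNNER defaults to runon; the calling convention
# "runon <node-id> <command>" is an assumption.
run_on_range() {
    first=$1
    last=$2
    shift 2
    id=$first
    while [ "$id" -le "$last" ]; do
        "${RUNNER:-runon}" "$id" "$@" &    # one background job per node
        id=$((id + 1))
    done
    wait
}

# Example:
#   run_on_range 2 4 ./my_job --input data.txt
```

Setting RUNNER=echo prints what would be started on each node without running anything.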


9.5. openMosixprocs

9.5.1. intro

This process-box is really useful for managing the processes running on your cluster.

You should install it on every cluster-node!

The processlist gives an overview of what is running where. The second column displays the openMosix-node ID of each process. 0 means local, all other values are remote nodes. Migrated processes are marked with a green icon and non-movable processes have a lock.

Double-clicking a process in the list pops up the migrator-window for managing (e.g. migrating) the process. There are also options to migrate the remote processes away, send SIGSTOP and SIGCONT to a process, or "renice" it.

If you click on the "manage procs from remote" button, a new window will come up (the remote-procs window) displaying the processes currently migrated to this host.


9.5.2. the migrator-window

This dialog will pop up if a process from the process box is clicked.

The openMosixview-migrator window displays all nodes in your openMosix-cluster. This window is for managing one process (with additional status-information). By double-clicking on a host from the list the process will migrate to this host. After a short moment the process-icon for the managed process will be green, which means it is running remotely.

The "home"-button sends the process to its home node. With the "best"-button the process is sent to the best available node in your cluster. This migration is influenced by the load, speed, number of CPUs and what openMosix "thinks" of each node. It will probably migrate to the host with the most CPUs and/or the best speed. With the "kill"-button you can kill the process immediately.

To pause a program just click the "SIGSTOP"-button, and to continue it click the "SIGCONT"-button. With the renice-slider below you can renice the currently managed process (-20 means very fast, 0 normal and 20 very slow).
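For reference, the signal and renice buttons correspond to standard shell commands. The helper names below are made up. Which signal the "kill"-button actually sends is not stated in this HOWTO, so SIGKILL is an assumption based on "immediately".

```shell
#!/bin/sh
# Rough shell equivalents of the migrator-window controls for one PID.
pause_proc()  { kill -STOP "$1"; }       # "SIGSTOP"-button: suspend the process
resume_proc() { kill -CONT "$1"; }       # "SIGCONT"-button: let it continue
kill_proc()   { kill -KILL "$1"; }       # "kill"-button (SIGKILL is an assumption)
set_nice()    { renice "$2" -p "$1"; }   # renice-slider: set_nice PID PRIORITY

# Example:
#   pause_proc 1234; resume_proc 1234; set_nice 1234 10
```

Lowering the nice value below 0 normally requires root rights.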


9.5.3. managing processes from remote

This dialog will pop up if the "manage procs from remote"-button beneath the process-box is clicked.

The TabView displays processes that are migrated to the local host. The procs come from other nodes in your cluster and are currently computed on the host openMosixview is started on. Similar to the two buttons in the migrator-window, the process is sent home by the "goto home node"-button and sent to the best available node by the "goto best node"-button.


9.6. openMosixcollector

The openMosixcollector is a daemon which should be started on one cluster-member. It logs the openMosix-load of each node to the directory /tmp/openmosixcollector/*. These history log-files, analyzed by the openMosixanalyzer (as described later), give a continuous overview of the load, memory and processes in your cluster. There is one main log-file called /tmp/openmosixcollector/cluster. In addition, there are further files in this directory to which the data is written.

At startup the openMosixcollector writes its PID (process id) to /var/run/openMosixcollector.pid

The openMosixcollector-daemon restarts every 12 hours and saves the current history to /tmp/openmosixcollector[date]/*. These backups are done automatically, but you can also trigger them manually.

There is an option to write a checkpoint to the history. These checkpoints are graphically marked as a blue vertical line if you analyze the history log-files with the openMosixanalyzer. For example, you can set a checkpoint when you start a job on your cluster and another one at the end.

Here is the explanation of the possible commandline-arguments:
openmosixcollector -d      //starts the collector as a daemon 
openmosixcollector -k      //stops the collector 
openmosixcollector -n      //writes a checkpoint to the history 
openmosixcollector -r      //saves the current history and starts a new one 
openmosixcollector         //print out a short help 

You can start this daemon with its init-script in /etc/init.d or /etc/rc.d/init.d. You just have to create a symbolic link to one of the runlevels for automatic startup.
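Creating that link could look like the sketch below. The runlevel directory (for example /etc/rc3.d or /etc/rc.d/rc3.d) and the S99 start-order prefix are assumptions that depend on your distribution, and enable_collector is a made-up helper name.

```shell
#!/bin/sh
# enable_collector INIT_SCRIPT RC_DIR
# Link the init-script into a runlevel directory so that the
# collector is started at boot. S99 (start late) is an assumption.
enable_collector() {
    ln -s "$1" "$2/S99openmosixcollector"
}

# Example (as root, SysV-style layout assumed):
#   enable_collector /etc/init.d/openmosixcollector /etc/rc3.d
```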

How to analyze the created logfiles is described in the openMosixanalyzer-section.


9.7. openMosixanalyzer

9.7.1. the load-overview

This picture shows the graphical Load-overview in the openMosixanalyzer

With the openMosixanalyzer you can have a continuous openMosix-history of your cluster. The history log-files created by openMosixcollector are displayed graphically, so that you have a long-term overview of what happened and is happening on your cluster. The openMosixanalyzer can analyze the current "online" logfiles, but you can also open older backups of your openMosixcollector history logs via the file menu. The logfiles are placed in /tmp/openmosixcollector/* (the backups in /tmp/openmosixcollector[date]/*) and you only have to open the main history file "cluster" to take a look at older load information (the [date] in the backup directories for the log-files is the date the history was saved). The start time is displayed at the top and you have a full-day view in the openMosixanalyzer (12 h).

If you are using the openMosixanalyzer for looking at "online"-logfiles (current history) you can enable the "refresh"-checkbox and the view will auto-refresh.

The load-lines are normally black. If the load increases to >75 the lines are drawn red. These values are openMosix information. The openMosixanalyzer gets this information from the files /proc/hpc/nodes/[openMosix ID]/*
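The same values can be inspected from a shell. The sketch below prints the load of every node and flags values above 75, mirroring the red lines. It assumes each load file contains a single integer. The helper name show_load is made up for this example.

```shell
#!/bin/sh
# show_load [BASE]
# Print "<node-id> <load>" for every node below BASE
# (default /proc/hpc/nodes) and append HIGH when the load is above 75.
# Assumption: each load file holds one integer.
show_load() {
    base=${1:-/proc/hpc/nodes}
    for dir in "$base"/*; do
        [ -r "$dir/load" ] || continue
        load=$(cat "$dir/load")
        flag=""
        if [ "$load" -gt 75 ]; then
            flag=" HIGH"
        fi
        echo "$(basename "$dir") $load$flag"
    done
}

# Example, on an openMosix node:
#   show_load
```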

The Find-out-button of each node calculates several useful statistical values. Clicking it will open a small new window in which you get the average load and memory values and some more static and dynamic information about the specific node or the whole cluster.


9.7.2. statistical informations about a cluster-node

If there are checkpoints written to the load-history by the openMosixcollector they are displayed as a vertical blue line. You now can compare the load values at a certain moment much easier.


9.7.3. the memory-overview

This picture shows the graphical Memory-overview in the openMosixanalyzer

With the Memory-overview in the openMosixanalyzer you can have a continuous memory history similar to the Load-overview. The history log-files created by openMosixcollector are displayed graphically, so that you have a long-term overview of what happened and is happening on your cluster. It analyzes the current "online" logfiles, but you can also open older backups of your openMosixcollector history logs via the file menu.

The displayed values are openMosix information. The openMosixanalyzer gets this information from the files
/proc/hpc/nodes/[openMosix-ID]/mem
/proc/hpc/nodes/[openMosix-ID]/rmem
/proc/hpc/nodes/[openMosix-ID]/tmem

If there are checkpoints written to the memory-history by the openMosixcollector they are displaye