KestrelHPC

How it works

KestrelCluster comes with an extensible template system that generates and edits the system and node files. The behaviour and values of the templates can be changed with variables, and any template can be overridden by a user-customized one simply by creating a new template with the same name and label (the label matters because different templates can edit the same file; flags are ignored in the override process).

The minimal node images are created with debootstrap, which bootstraps a minimal Debian system; this ensures that the OS run on the nodes is minimal. Afterwards, the different configuration stages are run over this image.

KestrelCluster dynamically adds entries to the /etc/hosts file, and parses the same file to know which nodes are connected. Registered nodes are stored in /var/lib/kestrel/registered_nodes in dnsmasq's host format; this way we can map MAC addresses to hostnames, and dnsmasq gives the correct hostnames to the nodes from the start. Both files can be changed with the KESTREL_CON_NODES and KESTREL_REG_NODES variables.
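
As a rough illustration of the MAC-to-hostname mapping, here is a minimal sketch. It assumes each line of the registered-nodes file has the form `<mac>,<hostname>` (one of the layouts dnsmasq accepts for host entries) and that KESTREL_REG_NODES holds the file path; the function name hostname_for_mac and the sample data are hypothetical and not part of KestrelCluster.

```shell
# Hypothetical helper: print the hostname registered for a MAC address.
# Assumes lines of the form "<mac>,<hostname>" (an assumption about the file).
hostname_for_mac() (
    mac=$1
    file=${2:-${KESTREL_REG_NODES:-/var/lib/kestrel/registered_nodes}}
    # Compare case-insensitively, since MAC addresses may be written either way.
    awk -F, -v mac="$mac" 'tolower($1) == tolower(mac) { print $2; exit }' "$file"
)

# Usage with a throwaway file instead of the real one:
reg_file=$(mktemp)
printf '%s\n' '00:16:3e:00:00:01,node001' '00:16:3e:00:00:02,node002' > "$reg_file"
hostname_for_mac 00:16:3E:00:00:02 "$reg_file"
rm -f "$reg_file"
```

An unknown MAC address simply prints nothing, which a caller could treat as "not registered".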

The node system and home directory are shared using NFS 4, although this can be extended in the future.

Boot sequence

Design of KestrelHPC 2.0's templates

We tried to keep a simple design so that it would be easy to understand how everything works and what each part does. The problem is that each time we add a new abstraction/stage, it gets more difficult to understand how everything works and what each part does.

The first approach was to create a naming convention for every script, in an attempt to give each one a good descriptive name. The first naming convention was:

(host|image)_service_description[_chroot]

This way we indicated whether a script was modifying system files or image files (for example, the /etc/exports file when configuring an image), and whether it needed to be run in a chroot (for example, for holding a package with dpkg).

We also split each script as it got too big, under the motto: one script, one functionality.

In the end, this approach became too hard to maintain.

  • Script names ended up being too long, yet they were never enough to describe what the scripts did.
  • We changed their names too often as they grew, sometimes to express better what they did, but this ruined the template override capability, which works by defining another file under /etc/kestrel/ with the same name.
  • Even I sometimes found it hard to know which script was modifying which file, so it would be even more difficult for anyone else to study, review or extend the code.

Design of KestrelCluster 3.0's templates

Templates are named after the files they generate, scripts are named after the files they edit, and non-editing scripts describe what they do:

Examples:

1. node/install.d/sbin/dhclient-script(edit)
2. node/install.d/etc/hostname
3. node/install.d/install/etc/init.d/kestrel_disconnect(OS=Ubuntu)
4. node/configure/etc/fstab("nfs4",edit)
5. node/configure/restart.dnsmasq

As we can see, 2 and 3 are templates, 1 and 4 are editing scripts, and 5 is a non-editing script.
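
The classification above can be sketched in a few lines of shell. This is only an illustration of the naming scheme, not KestrelCluster's own parser; the function names template_path, template_flags and template_kind are hypothetical.

```shell
# Hypothetical helpers; the real KestrelCluster parser may differ.

# Print the path part of a name, i.e. everything before "(flags)".
template_path() {
    printf '%s\n' "${1%%"("*}"
}

# Print the flags one per line, with quotes around labels removed.
template_flags() (
    case $1 in
        *"("*")")
            flags=${1##*"("}
            flags=${flags%")"}
            printf '%s\n' "$flags" | tr ',' '\n' | tr -d '"'
            ;;
    esac
)

# Classify a name as "template", "edit" or "script" (non-editing script).
template_kind() (
    path=$(template_path "$1")
    case ${path##*/} in
        run.*|restart.*|check.*) echo script ;;
        *)
            if template_flags "$1" | grep -qx edit; then echo edit
            elif template_flags "$1" | grep -qx run; then echo script
            else echo template
            fi ;;
    esac
)

for name in \
    'node/install.d/sbin/dhclient-script(edit)' \
    'node/install.d/etc/hostname' \
    'node/install.d/install/etc/init.d/kestrel_disconnect(OS=Ubuntu)' \
    'node/configure/etc/fstab("nfs4",edit)' \
    'node/configure/restart.dnsmasq'
do
    printf '%-8s %s\n' "$(template_kind "$name")" "$name"
done
```

Because the path and the flags are split at the first opening parenthesis, the target file can be read directly from the name, which is what makes reviews and backups straightforward.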

Pros

  • It is much easier to understand what files we are editing.
  • It is much easier to review. Could anyone have spotted before that we were editing the apt configuration?
  • Two modules can provide different templates for modifying the same file, as long as they use different labels.
  • We can use distinct templates for Ubuntu and Debian.
  • It is easier for the system to know what is being edited, and to make backups of those files.

Cons

  • We lose the description of the reason for an action at the name level, but we continue adding explanations in the file itself.

The Flag System

Flags describe whether a file is a template (the default) or an editing script and whether it needs to be run under a chroot, and they group sets of templates under a common "label". Grouped templates can be enabled or disabled globally with the option <label>_disabled=yes, or for a single image with <label>_<image>_disabled=yes.

We may also want to create different templates for different OS releases, and to disable templates with variables.
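
A minimal sketch of how the <label>_disabled and <label>_<image>_disabled options could be checked follows. The function name label_disabled is hypothetical, and the sketch assumes labels and image names are plain shell identifiers (letters, digits and underscores).

```shell
# Hypothetical check: succeed if templates with this label are disabled,
# either globally or for the given image.
label_disabled() (
    label=$1 image=${2:-}
    # Only plain identifiers are safe to pass through eval.
    case $label$image in *[!A-Za-z0-9_]*) exit 2 ;; esac
    eval "global=\${${label}_disabled:-}"
    [ "$global" = yes ] && exit 0
    [ -n "$image" ] || exit 1
    eval "per_image=\${${label}_${image}_disabled:-}"
    [ "$per_image" = yes ]
)

# Example settings (the image names "compute" and "login" are made up):
pxelinux_disabled=yes
nfs4_compute_disabled=yes

label_disabled pxelinux compute && echo "pxelinux: skipped everywhere"
label_disabled nfs4 compute     && echo "nfs4: skipped for image compute"
label_disabled nfs4 login       || echo "nfs4: applied for image login"
```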

  • Templates

    system/configure.d/etc/dnsmasq.d/kestrel
    system/configure.d/etc/dnsmasq.d/kestrel(template)

    By default a file is a template; when it is executed, a sed script replaces the variables found in the template with their values.
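
    The substitution step can be approximated as below. This is a sketch under assumptions, not the project's evaluate_template: it assumes placeholders are written as ${NAME}, takes the variable names explicitly, and does not handle multi-line values. The function name evaluate_template_sketch is hypothetical.

```shell
# Hypothetical evaluator: replace ${NAME} placeholders with the value of NAME
# for each variable name given after the template file.
evaluate_template_sketch() (
    template=$1; shift
    script=
    for name in "$@"; do
        eval "value=\${$name:-}"
        # Escape characters that are special in the replacement of s|...|...|.
        value=$(printf '%s' "$value" | sed 's/[\\&|]/\\&/g')
        # Builds one sed command per variable, e.g. s|\${FRONTEND_IP}|1.2.3.4|g
        script="${script}s|\\\${${name}}|${value}|g
"
    done
    sed "$script" "$template"
)

# Usage with a throwaway template:
FRONTEND_IP=192.168.30.1
tpl=$(mktemp)
printf '%s\n' 'dhcp-option=option:router,${FRONTEND_IP}' > "$tpl"
evaluate_template_sketch "$tpl" FRONTEND_IP
rm -f "$tpl"
```

    Placeholders for variables that are not listed are left untouched, which makes missing values easy to spot in the generated file.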

  • Editing scripts

    share/node/system-install.d/etc/exports(edit)
  • Non editing scripts

    share/system/configure.d/${TFTPBOOT_DIR}/kernel(run)
  • Prefixed non editing scripts

    run.<name>, restart.<name>, check.<name>
    Examples:
    share/system/configure.d/check.nfs
    share/system/configure.d/run.recreate_ssh_keys
    share/system/configure.d/restart.ufw
    share/system/configure.d/restart.dnsmasq

    check scripts are executed in the stage before everything else. For example, check.nfs parses the /etc/exports file to autodetect the NFS root directory and saves it in a variable.

    run and restart scripts are executed in the stage after everything else.
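
    The ordering just described can be sketched as a small filter. The function name order_stage_scripts is hypothetical; the order inside each group (here alphabetical) is an assumption, and names carrying flags such as kernel(run) are not handled.

```shell
# Hypothetical ordering filter: reads names on stdin, one per line, and prints
# check.* first, then templates and editing scripts, then run.* and restart.*.
order_stage_scripts() {
    awk -F/ '
        { base = $NF; rank = 2 }
        base ~ /^check\./         { rank = 1 }
        base ~ /^(run|restart)\./ { rank = 3 }
        { print rank "\t" $0 }
    ' | LC_ALL=C sort | cut -f2-
}

printf '%s\n' \
    share/system/configure.d/restart.dnsmasq \
    share/system/configure.d/etc/dnsmasq.d/kestrel \
    share/system/configure.d/check.nfs \
    share/system/configure.d/run.recreate_ssh_keys |
    order_stage_scripts
```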

    set_key_value function

    This is a useful function for editing files, since it makes it easy to edit key-value based configuration files.

    set_key_value <key> <value> <file> [<separator_re> [<separator> [<space>]]]

    Examples:

    set_key_value FRONTEND_IP "192.168.30.1" ${FILE}
    set_key_value PrintMotd no ${FILE} " "

    FILE is a special variable that holds the absolute path to the file being edited. If the script is not executed under a chroot on an image, FILE points to the complete path where the image is found.
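
    To make the behaviour concrete, here is a simplified stand-in for set_key_value. It is not the project's implementation: it only handles the first two optional arguments, assumes the key starts at the beginning of the line, and assumes that a missing key is appended to the file. The name set_key_value_sketch is hypothetical.

```shell
# Hypothetical stand-in:
#   set_key_value_sketch <key> <value> <file> [<separator_re> [<separator>]]
set_key_value_sketch() (
    key=$1 value=$2 file=$3
    sep_re=${4:-=}          # regex that matches the separator when searching
    sep=${5:-$sep_re}       # text written between key and value
    tmp=$(mktemp) || exit 1
    awk -v key="$key" -v value="$value" -v sep_re="$sep_re" -v sep="$sep" '
        # A line matches when it starts with the key followed by the separator.
        index($0, key) == 1 && match(substr($0, length(key) + 1), "^(" sep_re ")") {
            print key sep value; found = 1; next
        }
        { print }
        END { if (!found) print key sep value }   # assumed: append if missing
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    exit "$status"
)

# Usage on a throwaway file; in a real editing script FILE is already set.
FILE=$(mktemp)
printf '%s\n' 'PrintMotd yes' 'FRONTEND_IP=10.0.0.1' > "$FILE"
set_key_value_sketch FRONTEND_IP "192.168.30.1" "$FILE"
set_key_value_sketch PrintMotd no "$FILE" " "
cat "$FILE"
rm -f "$FILE"
```

    Writing the result back with cat rather than mv keeps the original file's owner and mode.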

  • User, group and mode setting

    system/install.d/${KESTREL_DATA_DIR}/rpc/fifo(edit,user=kestrel,group=root,mode=660)

    Allows setting the user, group and mode of generated/edited files.

  • Scripts run under a chroot jail

    node/pre-install.d/run.disable-dpkg-upstart(chroot)
    node/pre-install.d/run.disable-dpkg-upstart(nochroot)
  • Distribution and architecture flags

    Syntax:
    os=${OS_DISTRIBUTION}-${OS_CODENAME}
    os=${OS_DISTRIBUTION}-${OS_RELEASE}
    Examples:
    node/install.d/etc/init.d/kestrel_disconnect(os=Ubuntu-11.04)
    node/install.d/etc/init.d/kestrel_connect(os=Debian-squeeze)
    node/packages.d/openmpi(arch=amd64)
    

    Templates and scripts are applied only to machines that match the given distribution or architecture.
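
    A possible matching rule for these flags is sketched below. OS_DISTRIBUTION, OS_CODENAME and OS_RELEASE come from the syntax above; the variable OS_ARCH and the function name flag_applies are assumptions made for this example.

```shell
# Hypothetical matcher: succeed if a single os=... or arch=... flag applies
# to the target described by the OS_* variables.
flag_applies() (
    case $1 in
        os=*)
            want=${1#os=}
            [ "$want" = "${OS_DISTRIBUTION}-${OS_CODENAME}" ] ||
                [ "$want" = "${OS_DISTRIBUTION}-${OS_RELEASE}" ]
            ;;
        arch=*)
            [ "${1#arch=}" = "$OS_ARCH" ]    # OS_ARCH is an assumed name
            ;;
        *)  # Flags this sketch does not know about never block a template.
            true ;;
    esac
)

OS_DISTRIBUTION=Ubuntu OS_RELEASE=11.04 OS_CODENAME=natty OS_ARCH=amd64
flag_applies os=Ubuntu-11.04   && echo "kestrel_disconnect applies"
flag_applies os=Debian-squeeze || echo "kestrel_connect skipped"
flag_applies arch=amd64        && echo "openmpi applies"
```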

  • Variables on paths

    system/configure.d/${TFTPBOOT_DIR}/reboot("pxelinux")

    Variables in paths are expanded, allowing a user to change the tftpboot directory.
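
    One way to expand such paths without evaluating the whole string is sketched here. The function name expand_path is hypothetical, the sketch expects the path without its flags, and the value /srv/tftp is only an example.

```shell
# Hypothetical expander: replace each ${NAME} in a path with the value of NAME.
expand_path() (
    path=$1 out=
    open='${' close='}'
    while :; do
        case $path in
            *"$open"*"$close"*)
                out=$out${path%%"$open"*}      # text before the variable
                rest=${path#*"$open"}
                name=${rest%%"$close"*}        # variable name
                path=${rest#*"$close"}         # text after the variable
                # Refuse anything that is not a plain identifier.
                case $name in ''|*[!A-Za-z0-9_]*) exit 1 ;; esac
                eval "out=\$out\${$name:-}"
                ;;
            *) break ;;
        esac
    done
    printf '%s\n' "$out$path"
)

TFTPBOOT_DIR=/srv/tftp
expand_path 'system/configure.d/${TFTPBOOT_DIR}/reboot'
```

    With an absolute TFTPBOOT_DIR the result contains a doubled slash after the stage directory, which is harmless in a path.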

  • Variable flag

    system/configure.d/etc/ufw/applications.d/nfs(${secure_nfs})

    We can enable or disable this script by setting the variable secure_nfs_<image>=yes or secure_nfs_=yes.

Kestrel Libraries

  • Configuration library.

    Merges the user and the system configuration.

    load_config, variable_list, variable_user_list, variable_values, export_config, detect_iface
  • Kestrel Library.

    Common functions.

    • Image functions
      mount_image, check_image, list_image, lock_image, ...
    • User functions
      check_user, list_users, sshkeygen, check_root
    • Util functions
      run_script, kestrel_dialog, question_yN, msg, warn, die, msg_config, warn_config, eval_variables, 
      check_kestrel_daemon, check_enabled
  • Node Status Library.

    This library parses /etc/hosts (the kestrel daemon dynamically adds an entry for each connected node) to get the list of connected nodes, and parses a dnsmasq DHCP config file (when a node is registered we add an entry containing the MAC address and a unique hostname) to get the list of registered nodes.

    (connected|disconnected|registered)_(nodes|images|groups), list_groups, check_(hostname|group|mac)
  • Reconfiguration Library.

    This library contains functions for configuring files.

    • Template functions:
      create_evaluation_script, find_templates, run_template, applicable_templates
    • Backup functions:
      search_backup_file, get_backup_version, list_(backup_versions|original_files|edited_files), 
      diff_files
    • Stage functions:
      list_stages, check_stages, run_configuration_stage
    • Configure
      node_configure, node_install, system_configure, system_install
  • Edit Library.

    This library contains functions for editing scripts.

    copy_file, link_file, template_file, test_backup, perms_and_backup, restore_file, evaluate_template,
    set_key_value

RPC and Daemon

Node stages

  1. Pre-Install

    Templates to be run before the package installation stage.

    For example, we hold the dpkg and upstart packages, since we temporarily replace start-stop-daemon to avoid starting any service while installing packages under a chroot jail.

  2. Package Installation

    The package names output by the scripts in this stage are installed with apt-get.

  3. Install

    Templates in this stage are run only once when KestrelCluster is installed/uninstalled on the system/image.

  4. Configure

    Templates in this stage are run each time you change any configuration parameter.

  5. System-install

    Templates in this stage are run only once when KestrelCluster is installed/uninstalled on the system/image.

  6. System-Configure

    Templates in this stage are run each time you change any configuration parameter.
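
The Package Installation stage above can be pictured with the following sketch. It assumes the scripts are plain sh scripts that print package names separated by whitespace; the function name collect_packages and the demo scripts are hypothetical, and the apt-get command is printed rather than executed.

```shell
# Hypothetical sketch: run every script in a packages.d directory and merge
# the package names they print into one de-duplicated, space-separated list.
collect_packages() (
    dir=$1
    for script in "$dir"/*; do
        [ -f "$script" ] || continue
        sh "$script"
    done | tr -s ' \t' '\n\n' | sed '/^$/d' | LC_ALL=C sort -u | tr '\n' ' '
)

# Demo with two throwaway scripts:
demo=$(mktemp -d)
printf 'echo openmpi-bin libopenmpi-dev\n' > "$demo/openmpi"
printf 'echo nfs-common\n' > "$demo/nfs"
# Printed instead of run, so the sketch is safe to try on any machine.
echo "apt-get install --yes $(collect_packages "$demo")"
rm -rf "$demo"
```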

System stages

  1. Pre-Install

    Currently unused.

  2. Package Installation

    The package names output by the scripts in this stage are installed with apt-get.

    Currently unused; we provide the same functionality through the package's dependencies.

  3. Install

    Templates in this stage are run only once when KestrelCluster is installed/uninstalled on the system/image.

  4. Configure

    Templates in this stage are run each time you change any configuration parameter.

Configuration variables

Variables get their default value from the configuration files located at:

/usr/share/kestrel/* files should not be modified by users because they will be overwritten when a package is updated. Users should modify /etc/kestrel/kestrel.conf instead.
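
The precedence between packaged defaults and the user file can be sketched as follows. This is not the project's load_config: it assumes both kinds of files are plain shell VAR=value files and that the defaults are *.conf files directly under /usr/share/kestrel; the function name load_config_sketch is hypothetical.

```shell
# Hypothetical loader: read packaged defaults first, then let the user's
# file override them.
load_config_sketch() {
    defaults_dir=${1:-/usr/share/kestrel}
    user_conf=${2:-/etc/kestrel/kestrel.conf}
    for conf in "$defaults_dir"/*.conf; do
        [ -r "$conf" ] && . "$conf"
    done
    [ -r "$user_conf" ] && . "$user_conf"
    return 0
}

# Demo with throwaway files standing in for the real locations:
demo=$(mktemp -d)
echo 'FRONTEND_IP=192.168.30.1' > "$demo/defaults.conf"
echo 'FRONTEND_IP=10.0.0.1'     > "$demo/user_settings"
load_config_sketch "$demo" "$demo/user_settings"
echo "$FRONTEND_IP"    # the user's value wins
rm -rf "$demo"
```

Because the user file is read last, a package update that rewrites the defaults never discards local settings.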

Development

Mailing list:

Mail to:
kestrelhpc-developers@lists.sourceforge.net
Subscribe or browse archives:
https://lists.sourceforge.net/lists/listinfo/kestrelhpc-developers

Git

Browse code online:
https://github.com/KestrelCluster/KestrelCluster
Download code:
git clone git@github.com:jonanh/KestrelCluster.git

Developers

  • Jon Ander Hernández.
    • Rewrote KestrelHPC 2 from scratch while on an unpaid summer internship at the CEIT (July - September 2010).
    • Has been the sole developer since the rewrite of KestrelHPC 2.0, creating the new website, logo (avoiding possible copyright troubles) and documentation.
    • Since the unpaid internship at the CEIT, he has continued developing KestrelHPC in his spare time and has not received any financial support for his work; he happily accepts donations :-).
    • Developed KestrelHPC with the free software philosophy in mind with the aim of learning lots of Linux internals and sharing what could be a very useful tool.
  • Ander Martínez
    • Wrote the plugin system of kestrel-daemon
  • Denis Sánchez Argoitia.
    • Wrote KestrelHPC 1.0 as his final project while studying at EHU/UPV with a paid internship at CEIT.
    • KestrelHPC 1.0 was a script/package which installed the scripts and packages from PelicanHPC over a standard Debian installation.
    • Directed Jon Ander Hernández during his summer internship at the CEIT.

Special thanks to:

  • MickD
    • Has been an important contributor, reporting tons of issues in KestrelHPC 2.0 and helping this project mature.
  • Michael Creel.
    • He was a really important asset to the project.

Statistics

Total Files
184
Total Lines of Code
9622 (21591 added, 11969 removed)
Total Commits
285 (average 5.4 commits per active day, 0.9 per all days)
Authors
1 (average 285.0 commits per author)