@dan-blanchard
Last active March 20, 2022 10:40
How to set up a single-machine (Sun) Grid Engine installation for unit tests on Travis-CI

I recently needed a way to run unit tests on Travis for Grid Map, a project that uses Sun Grid Engine (SGE). Unfortunately, it seemed like no one had figured out how to set that up on Travis before (or even how to create a single-machine installation without any user interaction). After hours of trial and error, I now know the secrets to making a single-machine installation of SGE that runs on Travis, and I'm sharing my script so that other people don't have to go through the same frustrating experience.

To use the install_sge.sh script below, copy all of the files in this gist to a travis sub-directory directly under the root of your GitHub project, and add the following lines to your .travis.yml:

before_install:
  - travis/install_sge.sh
  - export SGE_ROOT=/var/lib/gridengine
  - export SGE_CELL=default
  - export DRMAA_LIBRARY_PATH=/usr/lib/libdrmaa.so.1.0

Once you've done that, you should be able to use qsub or any library that uses DRMAA to talk to the grid engine.
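One thing to watch out for: Travis runs travis/install_sge.sh directly, so the file has to be executable in your repository. A commenter on this gist hit a "Permission denied" error (exit code 126) until they ran chmod +x on it. Here is a minimal sketch of a check you could run from the repository root before committing. The ensure_executable helper is a hypothetical name for illustration and is not part of the gist.

```shell
# ensure_executable is a hypothetical helper, not part of the gist.
# It adds the executable bit to a file only if the file exists and lacks it.
ensure_executable() {
    if [ -f "$1" ] && [ ! -x "$1" ]; then
        chmod +x "$1"
    fi
}

# Run from the repository root before committing (does nothing if the file is missing).
ensure_executable travis/install_sge.sh
```

If your system does not track the executable bit, git update-index --chmod=+x travis/install_sge.sh records it in the index instead.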

How it works

If you care about what the script actually does, here are the steps. First, we modify the /etc/hosts file so that the VM's hostname maps back to 127.0.0.1, which prevents SGE from complaining that localhost doesn't point to the same IP as the hostname. Then we pre-specify some installer options using debconf-set-selections and install SGE. After that, we determine who the current user is and how many cores the current machine has, and modify the settings template files to reflect that. Finally, we apply the settings in the template files and print out some debugging info to make sure the grid started properly.
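If you want to see what the /etc/hosts rewrite does without touching the real file, here is a small sketch that applies the same substitution to a scratch copy. The sample hosts line and the testing-worker hostname are assumptions for illustration. The real script uses $(hostname) and edits /etc/hosts with sudo.

```shell
# Work on a scratch copy so the real /etc/hosts is left alone.
hosts_copy=$(mktemp)
printf '127.0.0.1 localhost.localdomain localhost\n' > "$hosts_copy"

# Hypothetical hostname; install_sge.sh substitutes $(hostname) here.
vm_hostname="testing-worker"

# Same substitution as the script: reorder the localhost names and append
# the VM's hostname to the 127.0.0.1 line (GNU sed is assumed for -i and -r).
sed -i -r "s/^(127\.0\.0\.1\s)(localhost\.localdomain\slocalhost)/\1localhost localhost.localdomain ${vm_hostname} /" "$hosts_copy"

cat "$hosts_copy"
```

Afterwards the 127.0.0.1 line lists localhost, localhost.localdomain, and the VM's hostname, so all three resolve to the same address.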

The weirdest issue I had when creating this was that sometimes, when I tried to add localhost as an execution host, I would get an error saying that it already was one. That's why there's that rather crazy-looking check to see if that's the case.
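The idea behind that check can be sketched on its own: look at the list of execution hosts and pick "modify" if localhost is already there, "add" otherwise. The exec_host_flag function below is a hypothetical illustration rather than the script's code. It reads the host list on stdin (the script gets it from qconf -sel) and uses an exact-line match instead of the script's grep -c count.

```shell
# exec_host_flag is a hypothetical helper, not part of the gist.
# It reads a list of execution hosts (one per line) on stdin and prints
# the qconf flag to use: -Me (modify) if the host is listed, -Ae (add) if not.
exec_host_flag() {
    if grep -qxF -- "$1"; then
        echo "-Me"
    else
        echo "-Ae"
    fi
}

# Simulated `qconf -sel` output, so this runs without SGE installed.
printf 'localhost\n' | exec_host_flag localhost
printf 'otherhost\n' | exec_host_flag localhost
```

With SGE installed, the same function could be fed real data, for example qconf -sel | exec_host_flag localhost.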

Additional files (only shows up on roughdraft.io)

install_sge.sh

host_template

queue_template

smp_template

host_template

hostname localhost
load_scaling NONE
complex_values NONE
user_lists arusers
xuser_lists NONE
projects NONE
xprojects NONE
usage_scaling NONE
report_variables NONE

install_sge.sh

#!/bin/bash
# This script installs and configures a Sun Grid Engine installation for use
# on a Travis instance.
#
# Written by Dan Blanchard ([email protected]), September 2013

# The template files live next to this script in the travis directory.
cd travis

# Map the VM's hostname to 127.0.0.1 so SGE does not complain that localhost
# and the hostname point to different IPs.
sudo sed -i -r "s/^(127\.0\.0\.1\s)(localhost\.localdomain\slocalhost)/\1localhost localhost.localdomain $(hostname) /" /etc/hosts

# Pre-specify the installer's answers, then install the master (-y avoids prompts).
sudo apt-get update -qq
echo "gridengine-master shared/gridenginemaster string localhost" | sudo debconf-set-selections
echo "gridengine-master shared/gridenginecell string default" | sudo debconf-set-selections
echo "gridengine-master shared/gridengineconfig boolean true" | sudo debconf-set-selections
sudo apt-get install -y gridengine-common gridengine-master
# Do this in a separate step to give master time to start
sudo apt-get install -y libdrmaa1.0 gridengine-client gridengine-exec

# Count the cores, then register the current user and add them to arusers.
export CORES=$(grep -c '^processor' /proc/cpuinfo)
sed -i -r "s/template/$USER/" user_template
sudo qconf -Auser user_template
sudo qconf -au "$USER" arusers

# Make localhost a submit host, then add it as an execution host, or modify
# the existing entry if SGE already lists it as one.
sudo qconf -as localhost
export LOCALHOST_IN_SEL=$(qconf -sel | grep -c 'localhost')
if [ "$LOCALHOST_IN_SEL" != "1" ]; then sudo qconf -Ae host_template; else sudo qconf -Me host_template; fi

# Fill in the core count, then add the smp parallel environment and the queue.
sed -i -r "s/UNDEFINED/$CORES/" queue_template
sudo qconf -Ap smp_template
sudo qconf -Aq queue_template

echo "Printing queue info to verify that things are working correctly."
qstat -f -q all.q -explain a
echo "You should see sge_execd and sge_qmaster running below:"
# The [s] keeps grep from matching its own process.
ps aux | grep "[s]ge"

queue_template

qname all.q
hostlist localhost
seq_no 0
load_thresholds np_load_avg=1.75
suspend_thresholds NONE
nsuspend 1
suspend_interval 00:05:00
priority 0
min_cpu_interval 00:05:00
processors UNDEFINED
qtype BATCH INTERACTIVE
ckpt_list NONE
pe_list make smp
rerun FALSE
slots UNDEFINED
tmpdir /tmp
shell /bin/bash
prolog NONE
epilog NONE
shell_start_mode posix_compliant
starter_method NONE
suspend_method NONE
resume_method NONE
terminate_method NONE
notify 00:00:60
owner_list NONE
user_lists arusers
xuser_lists NONE
subordinate_list NONE
complex_values NONE
projects NONE
xprojects NONE
calendar NONE
initial_state default
s_rt INFINITY
h_rt INFINITY
s_cpu INFINITY
h_cpu INFINITY
s_fsize INFINITY
h_fsize INFINITY
s_data INFINITY
h_data INFINITY
s_stack INFINITY
h_stack INFINITY
s_core INFINITY
h_core INFINITY
s_rss INFINITY
h_rss INFINITY
s_vmem INFINITY
h_vmem INFINITY

smp_template

pe_name smp
slots 999
user_lists NONE
xuser_lists NONE
start_proc_args /bin/true
stop_proc_args /bin/true
allocation_rule $pe_slots
control_slaves FALSE
job_is_first_task TRUE
urgency_slots min
accounting_summary FALSE

user_template

name template
oticket 0
fshare 0
delete_time 0
default_project NONE
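
To see how install_sge.sh fills in these templates, here is a small sketch that runs the same two sed substitutions on scratch copies. The shortened template contents, the core count of 4, and the travis user name are assumptions for illustration. The real script uses the full templates, the count from /proc/cpuinfo, and $USER.

```shell
# Scratch copies holding only the lines the substitutions touch.
workdir=$(mktemp -d)
printf 'processors UNDEFINED\nslots UNDEFINED\n' > "$workdir/queue_template"
printf 'name template\noticket 0\n' > "$workdir/user_template"

CORES=4            # hypothetical; the script counts processors in /proc/cpuinfo
DEMO_USER=travis   # hypothetical; the script uses $USER

# Same substitutions as install_sge.sh (GNU sed is assumed for -i and -r).
sed -i -r "s/UNDEFINED/$CORES/" "$workdir/queue_template"
sed -i -r "s/template/$DEMO_USER/" "$workdir/user_template"

cat "$workdir/queue_template" "$workdir/user_template"
```

Both UNDEFINED placeholders in the queue template become the core count, and the user template's name becomes the current user.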

stevekm commented Aug 15, 2017

I am getting this error from Travis:

$ travis/install_sge.sh

/home/travis/.travis/job_stages: line 57: travis/install_sge.sh: Permission denied

The command "travis/install_sge.sh" failed and exited with 126 during .

Your build has been stopped.

Any ideas?
EDIT: Never mind, I had to run chmod +x on install_sge.sh first. However, even after running this, the qstat command is not available.
