From CAC Wiki
Revision as of 13:11, 14 March 2018 by Hasch (Talk | contribs) (Redirect temporary files)

Note: This page has recently undergone substantial changes due to our move from the "old" SW to the "new" Frontenac cluster. Please re-read if you are using COMSOL.

Although COMSOL is installed on our clusters, users will need to provide their own license before they are able to use the software.

Every COMSOL job requires four components to run: a license key (named in the format "CMC_#####.key"), a licensing script, a job script, and an input .mph file. The CMC license key and licensing script should be obtained from CMC. If CMC asks for the IP address of the machine COMSOL will be run on, give them (this is the external IP address of the Frontenac cluster).

Using your own license

The script requires editing before it can be used. These changes are designed to allow your job scripts to be run non-interactively. Please edit the line beginning with "ssh" and replace it with the following:

ssh -o UserKnownHostsFile=/dev/null -o StrictHostKeyChecking=no -p 443 -i $CMC_KEY -L 6601:lmserver-8:6601 -L 16601:lmserver-8:16601 -l $CMC_AG

As a test, verify that your license is active. To do this, run with the command "./ <yourUID>". This will open up a connection with CMC and fetch your license status. Your UID is the numeric portion of your COMSOL key. As an example, the UID of "CMC_12345.key" would be "12345".
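The UID rule above (the numeric portion of the key file name) can be sketched as a small shell helper. The function name uid_from_key is hypothetical and not part of the CMC licensing script; it only illustrates how the number is derived from a file name of the form CMC_#####.key.

```shell
#!/bin/bash
# Hypothetical helper (not part of the CMC script): print the numeric UID
# embedded in a key file name of the form CMC_#####.key.
uid_from_key() {
    local name
    name=$(basename "$1")   # drop any leading directory
    name=${name#CMC_}       # strip the "CMC_" prefix
    name=${name%.key}       # strip the ".key" suffix
    case "$name" in
        ''|*[!0-9]*)        # empty, or anything other than digits is left over
            echo "not a CMC key name: $1" >&2
            return 1
            ;;
    esac
    printf '%s\n' "$name"
}

uid_from_key "CMC_12345.key"   # prints 12345
```

The result can then be passed to the licensing script in place of <yourUID>.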

An active license will show the following output (type "quit" to quit):

Warning: Permanently added '[]:443,[]:443' (RSA) to the list of known hosts.

IP access services                                            Status
 CMC_COMSOL_lmgrd  CMC COMSOL lmgrd (CMC Data center)           Active
 CMC_COMSOL_vendor  CMC COMSOL vendor                            Active

Enter a service name above, or 'help' for further instructions.


If you see dashes ("-") instead of "Active", you should get in touch with CMC and ensure your license gets activated. COMSOL jobs will fail with a licensing error until this is resolved.
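If you have saved the status text to a variable or file, the "Active" check can be automated. The following is a minimal sketch under that assumption; the function name license_is_active is hypothetical, and the two service names are taken from the sample output above. How the text is captured from the interactive session is left to you.

```shell
#!/bin/bash
# Hypothetical helper: succeed only if both CMC services are listed as "Active"
# in the status text passed as the first argument.
license_is_active() {
    local status_text=$1 service
    for service in CMC_COMSOL_lmgrd CMC_COMSOL_vendor; do
        # A healthy line starts with the service name and ends with "Active".
        printf '%s\n' "$status_text" |
            grep -Eq "^[[:space:]]*${service}[[:space:]].*Active[[:space:]]*$" || return 1
    done
}

# Sample text copied from the output of an active license.
sample=' CMC_COMSOL_lmgrd  CMC COMSOL lmgrd (CMC Data center)           Active
 CMC_COMSOL_vendor  CMC COMSOL vendor                            Active'

if license_is_active "$sample"; then
    echo "license active"
else
    echo "license NOT active, contact CMC"
fi
```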

Redirect temporary files

COMSOL will attempt to place a large number of temporary and configuration files (several GB per run) in your home directory and in /tmp. This is not recommended on compute clusters: /tmp is not shared between nodes and can fill up quickly, causing COMSOL runs to crash with a disk error, and the files placed in your home directory may use up a significant part of your disk quota under /home. To avoid this, we suggest redirecting all COMSOL temporary files to your scratch directory. Follow these directions to set up the redirection (replace hpc1234 with your user name):

mkdir -p /global/scratch/hpc1234/comsol_scratch
mv ~/.comsol/* /global/scratch/hpc1234/comsol_scratch             # moves any existing COMSOL tempfiles to your new scratch directory
rmdir ~/.comsol                                                   # remove the now-empty .comsol directory (fails if hidden files were left behind)
ln -s /global/scratch/hpc1234/comsol_scratch ~/.comsol
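The four commands above assume that ~/.comsol already exists and contains no hidden files. A more defensive variant is sketched below; the function name redirect_comsol_dir and its two arguments are hypothetical, and the sketch is meant only to show how the missing-directory, hidden-file, and already-redirected cases could be handled.

```shell
#!/bin/bash
# Hypothetical helper: redirect_comsol_dir <home_dir> <scratch_dir>
# Moves an existing <home_dir>/.comsol into <scratch_dir> and replaces it
# with a symbolic link. Safe to run more than once.
redirect_comsol_dir() {
    local home_dir=$1 scratch=$2
    local dot_dir="$home_dir/.comsol"

    mkdir -p "$scratch" || return 1

    # Already a symbolic link: the redirection is in place, nothing to do.
    if [ -L "$dot_dir" ]; then
        return 0
    fi

    if [ -d "$dot_dir" ]; then
        # Move every entry, including hidden files, then remove the empty directory.
        find "$dot_dir" -mindepth 1 -maxdepth 1 -exec mv {} "$scratch"/ \;
        rmdir "$dot_dir" || return 1
    fi

    ln -s "$scratch" "$dot_dir"
}

# Example call (replace hpc1234 with your user name):
#   redirect_comsol_dir "$HOME" /global/scratch/hpc1234/comsol_scratch
```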

Running jobs

Assuming all required files are in the same directory, a typical COMSOL job might look like this (replace <Your_UID> with your UID number):

#!/bin/bash
#SBATCH --job-name=COMSOL_job
#SBATCH --mail-type=ALL
#SBATCH -o COMSOL-job.out
#SBATCH -e COMSOL-job.err
#SBATCH -n 1
#SBATCH -c 12
#SBATCH -t 30:00
#SBATCH --mem=24000
module load comsol
(while true; do sleep 60 ; done) | ./ <Your_UID> &
sleep 30
comsol -np $SLURM_CPUS_PER_TASK batch -tmpdir $TMPDIR -inputfile inputFilename.mph -outputfile outputFilename

Of course, this script needs to be modified to fit the individual case. Specify your email address with the --mail-user option to get notified when a job starts and finishes. The -o and -e options redirect the job's standard output and standard error (the screen output from COMSOL and from the system, respectively) to the named files. The -N and -n options specify the number of nodes and processes and need to stay at 1 for COMSOL. The -c option specifies the number of cores to be allocated and sets the environment variable SLURM_CPUS_PER_TASK, which is passed to COMSOL through -np.
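The job script uses a fixed "sleep 30" to give the license tunnel time to come up. As an alternative, a job could poll the forwarded port until it accepts connections. The sketch below assumes bash (it relies on bash's /dev/tcp redirection) and assumes that the tunnel listens on local port 6601, as in the ssh line shown earlier; the function name wait_for_port is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: wait_for_port <host> <port> <timeout_seconds>
# Returns 0 as soon as a TCP connection to host:port succeeds,
# or 1 if it still fails after roughly <timeout_seconds> seconds.
wait_for_port() {
    local host=$1 port=$2 timeout=$3 waited=0
    # bash opens a TCP connection when redirecting to /dev/tcp/<host>/<port>;
    # the subshell closes the connection again as soon as it exits.
    until (exec 3<>"/dev/tcp/$host/$port") 2>/dev/null; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
}

# Possible use in a job script, in place of "sleep 30":
#   wait_for_port localhost 6601 60 || { echo "license tunnel not up" >&2; exit 1; }
```

Failing early with a clear message is easier to diagnose than a COMSOL licensing error later in the run.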

Please consult our [short guide to the SLURM scheduler] about how to run jobs using SLURM.

Once done creating this job script, submit the job with "sbatch". If you've reached this point, congratulations! You can now run COMSOL jobs on Frontenac.
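Because the job depends on the key and the input file being in the job directory, a short pre-flight check before submitting can save a failed run. The following is a minimal sketch; the function name preflight is hypothetical, and it checks only the key (by the CMC_*.key pattern) and the input file. The licensing script is not checked here and would need to be added under its actual name.

```shell
#!/bin/bash
# Hypothetical helper: preflight <job_dir> <input_file>
# Reports missing pieces on stderr and returns non-zero if anything is absent.
preflight() {
    local dir=$1 input=$2 missing=0

    # The key must sit in the job's working directory (pattern CMC_#####.key).
    # If nothing matches, the pattern stays unexpanded and the -e test fails.
    set -- "$dir"/CMC_*.key
    if [ ! -e "$1" ]; then
        echo "no CMC_*.key file in $dir" >&2
        missing=1
    fi

    if [ ! -f "$dir/$input" ]; then
        echo "input file $dir/$input not found" >&2
        missing=1
    fi

    return "$missing"
}

# Example call from the job directory:
#   preflight . inputFilename.mph && sbatch <your job script>
```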

License troubleshooting

A start error occured on node 11: Could_not_obtain_license_for#Cluster Node
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 11

Check that you have registered and activated your hpc#### username with CMC.

Could not obtain license for COMSOL Multiphysics.
License error -5. 
No such product exists.

All of the available seats for your license have been checked out and are in use.

A start error occured on node 0: License_error_-15_Cannot_connect_to_license_server_system

The licensing script will fail silently (and not connect your job to the license server) unless the CMC_#####.key file is in the working directory of the job. To change this behavior, edit the script so that it points directly to the location of your key. In this case, change the line beginning with CMC_KEY to the following:


My license only works for a single node! How can I schedule multiple jobs to one machine?

(... to be added ...)