MediaWiki API result

{
    "warnings": {
        "query": {
            "*": "Formatting of continuation data has changed. To receive raw query-continue data, use the 'rawcontinue' parameter. To silence this warning, pass an empty string for 'continue' in the initial query."
        }
    },
    "batchcomplete": "",
    "continue": {
        "gapcontinue": "SLURM_Accounting",
        "continue": "gapcontinue||"
    },
    "query": {
        "pages": {
            "174": {
                "pageid": 174,
                "ns": 0,
                "title": "Renewal",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "= '''Account Renewals''' =\n\nThis is a guide to the account renewal process at the Centre for Advanced Computing. It explains what you need to do in order to keep your account active.\nIt also explains details of our present effort to bring our accounts up to date and synchronize the account structure with the one at Compute Canada.\n \n== How to renew an account at the Centre for Advanced Computing ==\n\nAlmost all accounts that entitle users to access to our '''Frontenac''' cluster require an active '''Compute Canada Role'''. The exception are temporary and teaching accounts. If you have a username starting with \"hpc\", you had to obtain Compute Canada credentials to get it. Compute Canada conducts annual renewals of their accounts/roles and will notify you with a deadline. Please follow those instructions. A separate renewal of your account at the Centre for Advanced Computing is not necessary as long as you keep the associated Compute Canada role active.\n\n'''If you let your Compute Canada role expire without renewing it, it will be deactivated. The CAC account that is associated with this role will also be de-activated and you will loose access to your CAC account'''.\n\nFind details about the Compute Canada renewal process at https://www.computecanada.ca/research-portal/account-management/account-renewals\n\n<pre>\nImportant: The next deadline for active Compute Canada accounts is April 23, 2018.\nCAC accounts whose associated CCRI is not active after that date, will be de-activated.\n</pre>\n\nIf your account at the Centre for Advanced Computing is de-activated because you failed to renew or activate the associated Compute Canada role, you can re-activate it by contacting us at cac.admin@queensu.ca with a request for re-activation. 
However, you must supply us with an active CCRI from Compute Canada, which means that you have to renew or re-activate your Compute Canada role first.\n\n== Account cleanup at CAC ==\n\nIn the spring of 2018, we are conducting a review of our accounts to bring them in line with Compute Canada's practice of regular renewals. We will contact users whose accounts are associated with inactive CCRIs with a request to participate in the annual Compute Canada account renewal to keep their accounts with us active.\n\n=== Old accounts ===\n\nSome of our users have accounts that are associated with Compute Canada roles that are not active anymore. In past years we have kept these accounts active unless a de-activation was requested or became necessary for other reasons. We cannot continue this practice, and will '''de-activate CAC accounts that are found to be associated with inactive CCRIs after April 23, 2018'''.\n\nIf you receive an email reminding you to renew or re-activate your Compute Canada role to maintain an active account with the Centre for Advanced Computing, please do so before the deadline set by Compute Canada for account renewal. We cannot make exceptions, as active accounts (roles) with Compute Canada are a pre-condition for an account with the CAC and are essential for proper accounting and usage reporting.\n\n=== What if I don't have a Compute Canada account? ===\n\nIn some rare cases, users may not have Compute Canada credentials. We will contact these users and direct them to apply for credentials. This is done through the [https://ccdb.computecanada.ca/account_application Compute Canada Database Registration Page]. Once you have obtained a CCRI (Compute Canada Role Identifier), please contact us at cac.admin@queensu.ca with that information, and we will link the role to your account. 
Make sure you apply for the right type of role: if you are a Principal Investigator (PI), apply for a role as faculty or another PI role; if you are \"sponsored\" by a PI, apply for one of the sponsored roles (student, post-doc, researcher, etc.) and provide the CCRI of your sponsor. It is important that this matches your account with us so we can link it.\n\n=== What if my CCRI has expired / was de-activated? ===\n\nIf you have a CCRI with Compute Canada, and it has expired, you will need to re-activate it or get a new one:\n\n* If the role has expired because you failed to renew it, but otherwise still reflects your current status, you need to log in to the Compute Canada database and follow the instructions for role renewal there. If you are a PI, this may require providing a \"CCV\" (Canadian Common CV).\n* If your role was de-activated because the account (role) of your \"sponsor\" (PI) became inactive, you need to ask your sponsor to renew their role first.\n* If your role was de-activated because your current status changed and is no longer reflected by that role, you need to apply for a new role. If this is the case, once you have this role, you may have to re-apply for a different account with CAC as well, because it is likely that the old account is no longer appropriate for you. If you are in doubt, please contact cac.help@queensu.ca and ask. We will help you navigate the re-activation or account migration.\n\n'''Important''': Note that we will check the CCRI that we have associated with your account at CAC after the deadline on April 23, 2018. If it is found to be inactive, we will de-activate the CAC account as well. This is done automatically. If your CCRI has changed recently, please contact us at cac.help@queensu.ca so we can reflect the change in our records and avoid de-activation.\n\n=== Why? 
===\n\nRegular account renewals are necessary to keep account information up-to-date and to avoid issues with account conditions no longer applying. Since Compute Canada account credentials are a necessary condition for a default CAC account, we are using the renewal cycle of Compute Canada to test for the continued existence of this condition. This also makes it possible to conduct regular \"syncs\" between the Compute Canada database and ours, which is necessary for proper reporting of usage."
                    }
                ]
            },
            "122": {
                "pageid": 122,
                "ns": 0,
                "title": "SLURM",
                "revisions": [
                    {
                        "contentformat": "text/x-wiki",
                        "contentmodel": "wikitext",
                        "*": "__TOC__\n\n== About SLURM ==\n\nSLURM is the scheduler used by the Frontenac cluster. Like Sun Grid Engine (the scheduler used for the M9000 and SW clusters), SLURM is used for submitting, monitoring, and controlling jobs on a cluster. Any jobs or computations done on the Frontenac cluster must be started via SLURM. Reading this tutorial will supply all the information necessary to run jobs on Frontenac.\n\nAlthough existing users are likely very familiar with Sun Grid Engine (SGE), switching to SLURM offers a number of advantages over the old system. The biggest advantage is that the scheduling algorithm is significantly better than that offered by SGE, allowing more jobs to be run on the same amount of hardware. SLURM also supports new types of jobs- users will now be able to schedule interactive sessions or run individual commands via the scheduler. In terms of administration and accounting, SLURM is also considerably more flexible. Although easier cluster administration does not directly impact users in the short term, CAC will be able to more easily reconfigure our systems over time to meet the changing needs of users and perform critical system maintenance. All in all, we believe switching to SLURM will offer our users an all-around better experience when using our systems.\n\n=== How SLURM works ===\n\nSLURM is the piece of software that allows many users to share a compute cluster. A cluster is a set of networked computers- each computer represents one \"node\" of the cluster. When a user submits a job, SLURM will schedule this job on a node (or nodes) that meets the resource requirements indicated by the user. If no resources are currently available, the users job will wait in a queue until the resources they have requested become available for use.\n\nNodes in SLURM are divided into distinct \"partitions\" (similar to queues in SGE) and a node may be part of multiple partitions. 
Different partitions may have different uses, such as directing users' jobs to nodes with a particular piece of software installed (some software licenses only allow us to install software on a given number of nodes). Generally, the default partition (named \"standard\") will suffice for most uses and encompasses the largest amount of hardware. \n\nAll users will have one or more SLURM usage accounts. Accounts are used to record accounting information and may be used to control access to certain partitions (such as those for RAC allocations). For everyday, default use, most users will not need to bother with accounts or accounting details (just be aware that they exist). For a detailed overview of SLURM accounts and accounting, please see [http://cac.queensu.ca/wiki/index.php/SLURM_Accounting our guide to SLURM accounting].\n\n== Basic SLURM commands ==\n\nThese commands cover most basic operations with SLURM.\n\n\n==== sinfo - Check the status of the cluster/partitions ====\n<pre>\nsinfo\nsinfo -lNe  # same as above, but shows per-node status\n</pre>\n\nExample output of sinfo on a small demonstration cluster. Nodes cac002-cac006 are part of the \"standard\" partition (jobs are submitted to this partition by default, indicated by the '*' character), and nodes cac007-cac009 are part of the \"large\" partition. One node in the \"large\" partition is currently allocated and being used (cac007).\n<pre>\n$ sinfo\nPARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST\nstandard*    up 2-00:00:00      5   idle cac[002-006]\nlarge        up 14-00:00:0      1  alloc cac007\nlarge        up 14-00:00:0      2   idle cac[008-009]\n</pre>\n\n==== squeue - Show status of jobs ====\n<pre>\nsqueue                  # your jobs\nsqueue -u <username>    # show jobs for user <username>\nsqueue --start          # show expected start times of jobs in queue\n</pre>\n\nExample output of squeue on a demonstration cluster. 
User jeffs has 5 jobs running on nodes cac002-cac006 (in partition \"standard\"), and 4 jobs in the queue. Job 1164 has not started because no resources are available for that job, and jobs 1165-1167 have not started because job 1164 has priority.\n<pre>\n$ squeue\n             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)\n              1166  standard long-job    jeffs PD       0:00      1 (Priority)\n              1167  standard long-job    jeffs PD       0:00      1 (Priority)\n              1165  standard long-job    jeffs PD       0:00      1 (Priority)\n              1164  standard long-job    jeffs PD       0:00      1 (Resources)\n              1161  standard long-job    jeffs  R       0:08      1 cac004\n              1162  standard long-job    jeffs  R       0:08      1 cac005\n              1163  standard long-job    jeffs  R       0:08      1 cac006\n              1160  standard long-job    jeffs  R       0:12      1 cac003\n              1159  standard long-job    jeffs  R       0:16      1 cac002\n</pre>\n\n\n==== scancel - Kill a job ====\n\nYou can get job IDs with '''squeue'''. Note that you can only kill your own jobs.\n<pre>\nscancel <jobID>         # kill job <jobID>. (you can get the job IDs with \"squeue\")\nscancel -u <username>   # kill all jobs for user <username>.\nscancel -t <state>      # kill all jobs in state <state>. <state> can be one of: PENDING, RUNNING, SUSPENDED\n</pre>\n\n== Running jobs ==\n\nThere are three methods of submitting jobs under SLURM: '''sbatch''', '''srun''', and '''salloc'''. Although this may initially seem unnecessarily complicated, these commands share the same options and allow users to submit new types of jobs.\n\n=== sbatch - Submit a job script to be run ===\n\n'''sbatch''' will submit a job script to be run by the cluster. Job scripts under SLURM are simply shell scripts (*.sh) with a set of resource requests at the top of the script. 
Users of Sun Grid Engine should note that SLURM's '''sbatch''' is functionally identical to SGE's '''qsub'''.\n\nTo submit a job script to SLURM:\n<pre>\nsbatch nameOfScript.sh\n</pre>\n\nExample output:\n<pre>\n$ sbatch long-job.sh\nSubmitted batch job 1169\n</pre>\n\nJob scripts specify the resources requested and other considerations with special \"#SBATCH\" comments at the top of a job script. Although many of these directives are optional, those dealing with resource requests (CPUs, memory, and walltime) are mandatory. All directives should be added to your scripts in the following manner:\n\n<pre>\n#SBATCH <directive>\n</pre>\n\nTo specify a job name, for instance, you would add the following to your script:\n<pre>\n#SBATCH --job-name=myJobName\n</pre>\n\nFor users looking to get started with SLURM as fast as possible, a minimalist template job script is shown below:\n\n<pre>\n#!/bin/bash\n#SBATCH -c cpus                            # Number of CPUs requested. If omitted, the default is 1 CPU.\n#SBATCH --mem=megabytes                    # Memory requested in megabytes. If omitted, the default is 1024 MB.\n#SBATCH --time=days-hours:minutes:seconds      # How long will your job run for? If omitted, the default is 3 hours.\n\n# commands for your job go here\n</pre>\n\n==== Mandatory directives ====\n\nDirectives in this section are treated as mandatory and are used by SLURM to determine where and when your jobs will run. If you omit one of them, the scheduler assigns your job the default value for that resource. Unlike with Sun Grid Engine, jobs that exceed their resource requests will be automatically killed by SLURM. Though this seems harsh, it means that users exceeding the resources that the scheduler has given them will not degrade the experience of other users on the system. 
Jobs requesting more resources may be harder to schedule (because they have to wait for a larger slot).\n\n'''-c <cpus>''' -- This is the number of CPUs your job needs. Note that SLURM is relatively generous with CPUs, and the value specified here is the ''minimum'' number of CPUs that your job will be assigned. If additional CPUs are available on a node beyond what was requested, your job will be given those CPUs until they are needed by other jobs. Default value is 1 CPU. Attempting to use more CPUs than you have been allocated will result in your extra processes taking turns on the same CPU (slowing your job down).\n\n'''--mem=<megabytes>''' -- This is the amount of memory your job needs to run. Chances are, you may not know how much memory your job will use. If this is the case, a good rule of thumb is 2048 megabytes (2 gigabytes) per processor that your job uses. Note that jobs will be killed if they exceed their memory allocations, so it's best to err on the safe side and request extra memory if you are unsure of things (there is no penalty for requesting too much memory). Default value is 1024 MB.\n\n'''-t <days-hours:minutes:seconds>''' -- Walltime for your job. The walltime is the length of time you expect your job to run. Again, your job will be killed if it runs for longer than the requested walltime. If you do not know how long your job will run for, err on the side of requesting too much walltime, rather than too little. A typical rule of thumb is asking for twice or three times the amount of time you think you will need. The walltime may also be given in the format \"hours:minutes:seconds\". Default value is 3 hours, and the maximum walltime is 2 weeks (please contact us if you need to run longer jobs; this is quite easy to accommodate).\n\n==== Optional directives ====\n\nFor a list of all directives available, see the SLURM documentation at [http://slurm.schedmd.com/sbatch.html http://slurm.schedmd.com/sbatch.html]. 
The directives covered in this article are those most relevant for typical use cases.\n\n'''--mail-type=BEGIN,END,FAIL,ALL''' and '''--mail-user=<emailAddress>''' -- Be emailed when your job starts/finishes/fails. You can specify multiple values for this (separated by commas) if need be.\n\n'''-p <partition>, --partition=<partition>''' -- Submit a job to a specific partition. Your submission may be rejected if you do not have permission to run in the requested partition.\n\n'''-A <account>, --account=<account>''' -- Associate a job with a particular SLURM usage account. Unnecessary unless you wish to submit jobs to a partition that requires the use of a particular account.\n\n'''-D <directory>, --chdir=<directory>''' -- The working directory you want your job script to execute in. By default, the job's working directory is the location where '''sbatch <script>''' was run.\n\n'''-J <name>, --job-name=<name>''' -- Specify a name for your job.\n\n'''-o <STDOUT_log>, --output=<STDOUT_log>''' -- Redirect output to the log file you specify. By default, both STDOUT and STDERR are sent to this file. You can specify %j as part of the log filename to indicate the job ID (as an example, \"#SBATCH -o output_%j.o\" would redirect the output of job 123456 to \"output_123456.o\").\n\n'''-e <STDERR_log>, --error=<STDERR_log>''' -- Redirect STDERR to a separate file. Works exactly the same as \"-o\".\n\n==== Array jobs ====\n\nWhen running hundreds or thousands of jobs, it may be advantageous to run these jobs as an \"array job\". Array jobs allow you to submit thousands of such jobs (called \"job steps\") with a single job script. Each job will be assigned a unique value for the environment variable SLURM_ARRAY_TASK_ID. You can use this variable to read parameters for individual steps from a given line of a file, for instance.\n\nA sample array job that creates 7 job steps, with SLURM_ARRAY_TASK_ID running from 0 to 18 in increments of 3. 
STDOUT and STDERR output streams have been redirected to the same file: arrayJob_%A_%a.out (%A is the job number of the array job itself, %a is the job step).\n<pre>\n#!/bin/bash\n#SBATCH --array=0-20:3\n#SBATCH --output=arrayJob_%A_%a.out\n\necho 'This is job step '${SLURM_ARRAY_TASK_ID}\n</pre>\n\n=== srun - Run a single command on the cluster ===\n\nSometimes it may be advantageous to run a single command on the cluster as a test or to quickly perform an operation with additional resources. '''srun''' enables users to do this, and accepts the same directives as '''sbatch'''. STDOUT and STDERR for an '''srun''' job will be redirected to the user's screen. Ctrl-C will cancel an srun job.\n\nBasic usage:\n<pre>\nsrun <someCommand>\n</pre>\n\nExample output (running the command \"hostname\" to return which computer you are running on):\n<pre>\n$ srun hostname\ncac003\n</pre>\n\nSubmit a command with additional directives (in this case run the program \"test\" with 12 CPUs and 20 gigabytes of memory in partition \"bigjob\"):\n<pre>\nsrun -c 12 --mem=20000 --partition=bigjob test\n</pre>\n\n=== salloc - Schedule an interactive job ===\n\nSLURM can schedule interactive sessions for a user. An \"interactive session\" is equivalent to normal command-line use of one of the cluster nodes with the resources requested. Need to run a program that requires using a GUI or test out a program? No problem: this just requires a slight modification to '''srun's''' syntax. \n\nTo start an interactive shell, use '''salloc''' in the following manner. 
Note that use of X11 forwarding requires that you have connected to the cluster using an SSH client that supports X-forwarding (done using \"ssh -X\" on logon, requires XQuartz on OSX or MobaXTerm on Windows).\n\n<pre>\nsalloc [other slurm options here]\n</pre>\n\nExample usage (use 4 processors and 6 gigabytes of RAM interactively):\n<pre>\n[jeffs@caclogin02 ~]$ salloc -c 4 --mem=6g      # start an interactive session with x forwarding for graphics\n[jeffs@cac002 ~]$ xeyes                                     # test graphics forwarding\n[jeffs@cac002 ~]$ exit                                      # quit interactive session\nexit\n[jeffs@caclogin02 ~]$                                           # we are now back on the node where we started\n</pre>\n\n== Parallel Jobs ==\n\nMany of the jobs running on a production cluster are going to involve more than one processor (CPU, core). Such parallel jobs need to request the required resources through additional options. The most common ones are:\n\n<pre>\n-N, --nodes=          Number of cluster nodes requested\n-n, --ntasks=         Total number of tasks (processes)\n-c, --cpus-per-task=  Number of cpus (cores) per task\n</pre>\n\nDifferent types of parallel jobs require different options. The most common parallel jobs are MPI (distributed memory) jobs, multi-threaded (shared memory) jobs, and so-called hybrids that are a combination of the two. Let's discuss them separately with an example for each.\n\n=== MPI Jobs ===\nMPI (Message Passing Interface) is the standard communication API for parallel distributed-memory jobs that can be deployed on a cluster. To schedule such a job, it is necessary to specify the number of cluster nodes that will be used, and the number of processes (tasks) that are going to run on each node. \n\nCurrently, each MPI job on our cluster is restricted to run on a single node, i.e. 
all processes are scheduled on different CPUs (cores) and use a so-called shared-memory layer to communicate with each other. The upside of this type of scheduling is that communication is fast and efficient compared with inter-node communication. The downside is that the total number of tasks (processes) used by a program is limited by the size of the node on which it runs. A typical MPI script for such a program looks like this:\n\n<pre>\n#!/bin/bash\n#SBATCH --job-name=MPI_test\n#SBATCH --mail-type=ALL\n#SBATCH --mail-user=joe.user@email.ca\n#SBATCH --output=STD.out\n#SBATCH --error=STD.err\n#SBATCH --nodes=1\n#SBATCH --ntasks=8\n#SBATCH --cpus-per-task=1\n#SBATCH --time=30:00\n#SBATCH --mem=1G\nmpirun -np $SLURM_NTASKS ./mpi_program\n</pre>\n\nThe key option here is \"--ntasks=8\", which requests enough cores for 8 MPI tasks. \n\nThe \"--nodes\" and \"--cpus-per-task\" options need to be kept at 1 to indicate that all processes are to be run on a single node, and that each process is single-threaded (i.e. we are not doing any multi-threading on the MPI processes).\n\nA specification of the number of processes in the mpirun line may be omitted, as mpirun interfaces with SLURM and selects the proper number automatically from the \"--ntasks\" option.\n\n=== Multi-threaded Jobs ===\n\nParallel jobs designed to run on a multi-core (shared-memory) system are usually \"multi-threaded\". Scheduling such a job requires specifying the number of cores needed to accommodate the threads.\n\nOpenMP is the most commonly used set of compiler directives to facilitate the development of multi-threaded programs. 
A typical SLURM script for such a program looks like this:\n\n<pre>\n#!/bin/bash\n#SBATCH --job-name=OMPtest\n#SBATCH --mail-type=ALL\n#SBATCH --mail-user=my.email@whatever.ca\n#SBATCH --output=STD.out\n#SBATCH --error=STD.err\n#SBATCH --nodes=1\n#SBATCH --ntasks=1\n#SBATCH --cpus-per-task=4\n#SBATCH --time=30:00\n#SBATCH --mem=1G\nOMP_NUM_THREADS=$SLURM_CPUS_PER_TASK time ./omp-program \n</pre>\n\nWhen using an OpenMP program, the number of threads (and therefore the required number of cores) is specified via the environment variable OMP_NUM_THREADS, which therefore appears in the script in front of the call to the program. We are setting it to the internal variable SLURM_CPUS_PER_TASK, which is set through the \"--cpus-per-task\" option (to 4 in our case). \n\nThe \"--nodes\" and \"--ntasks\" options are kept at 1 to indicate a single main program running on a single node.\n\nMulti-threaded programs that use different multi-threading techniques (for instance, the POSIX thread libraries) use a slightly different approach, but the principle is the same:\n\nSpecify the number of required cores through the \"--cpus-per-task\" option and pass that number to the program through the variable SLURM_CPUS_PER_TASK.\n\n=== Hybrid Jobs ===\n\nMPI distributed-memory and OpenMP shared-memory parallelism may be combined to obtain a \"hybrid\" program. This has to be done with great care to avoid race conditions in the process-to-process communication. 
However, such programs are particularly useful when it is important to exploit the multi-core nature of the nodes in a cluster.\n\nThe following script works for a simple run of a hybrid program on a single node, assuming each MPI process uses the same number of sub-threads:\n\n<pre>\n#!/bin/bash\n#SBATCH --job-name=OMP_test\n#SBATCH --mail-type=ALL\n#SBATCH --mail-user=my.email@whatever.ca\n#SBATCH --output=STD.out\n#SBATCH --error=STD.err\n#SBATCH --nodes=1\n#SBATCH --ntasks=8\n#SBATCH --cpus-per-task=4\n#SBATCH --time=30:00\n#SBATCH --mem=1000\nOMP_NUM_THREADS=$SLURM_CPUS_PER_TASK mpirun -np $SLURM_NTASKS ./hybrid-program \n</pre>\n\nThis example would run the program \"hybrid-program\" with 8 MPI processes, each utilizing 4 threads, for a total of 32 threads. Note that the number of nodes (i.e. the --nodes option) is set to one to indicate that all cores need to be allocated on a single node. This setting should not be changed in the current cluster configuration.\n\n== GPU jobs ==\n\nCAC has a small number of NVIDIA V100 GPUs (nodes cac107-109), NVIDIA GP100 GPUs (cac104-cac106, 3xGP100/node), and a few RTX4000 and Titan GPUs for short jobs, all available for general use. To access these, add the following to your job script:\n\n<pre>\n#SBATCH --partition=gpu\n#SBATCH --gres gpu:1\n</pre>\n\nTo request a specific type of GPU, use one of these lines:\n<pre>\n#SBATCH --gres gpu:v100:1\n#SBATCH --gres gpu:quadro:1      #(or 2 or 3)\n#SBATCH --gres gpu:titan:1       #(maxtime is 3 hours)\n#SBATCH --gres gpu:rtx4000:1     #(maxtime is 3 hours)\n#SBATCH --gres gpu:rtx4000:2     #(maxtime is 3 hours)\n</pre>\n\nThe <code>--partition=gpu</code> flag sends your job to a partition with GPUs available. <code>--gres gpu:1</code> requests a single GPU. Each GPU is billed as 10 CPUs when performing fairshare/job priority calculations: a job counts as using either #GPUs x 10 or #CPUs, whichever is higher. 
Note that most GPU-accelerated software will not be displayed by default when running <code>module avail</code>. Use <code>module spider softwarename</code> to find GPU-specific software modules and learn how to load them.\n\nFor an interactive job using GPUs, run the following (this example uses 1 GPU, 10 CPUs, and 40GB of memory for 8 hours):\n\n<pre>\nsalloc -p gpu --gres gpu:1 -c 10 --mem 40g -t 8:0:0\n</pre>\n\nHere is an example job that would run the \"deviceQuery\" and \"bandwidthTest\" programs from the NVIDIA CUDA developer samples:\n\n<pre>\n#!/bin/bash\n#SBATCH --cpus-per-task=1\n#SBATCH --mem=10g\n#SBATCH --time=1:00\n#SBATCH --partition=gpu\n#SBATCH --gres gpu:1\n\n#SBATCH --output=STD.out\n#SBATCH --error=STD.err\n\nmodule load cuda\nmodule load cuda-samples\n# display all allocated GPUs and their stats\ndeviceQuery\n# show GPU bandwidth for 1 GPU only\nbandwidthTest\n</pre>\n\nThe line \"module load cuda-samples\" adds to the PATH a directory containing pre-compiled CUDA sample programs. The output of these simple sample programs is directed to STD.out (standard output) and STD.err (standard error).\n\n== Accounts and Partitions ==\n\n=== Accounts ===\n\nA substantial part of our resources is allocated beforehand for large projects. This is handled through the scheduler using priorities and allocation limits. A more detailed account of allocations on the Frontenac cluster can be found on our [[Allocation|Allocations Page]].\n\nEvery user of the Frontenac cluster is issued a '''Default Account''' for the scheduler. This is done automatically at first login. It entitles the user to access the \"standard\" partition of nodes. This partition contains a (somewhat variable) number of nodes. Most of these have 24 cores and 256 GB of memory. For details see [https://cac.queensu.ca/wiki/index.php/Allocation#Default_accounts this entry]. If no partition and no account are specified, this default will be used. 
This account is also associated with a default priority, which is used to determine when a job gets scheduled if there is competition for resources. \n\n'''Note:''' The scheduler is trying to maximize the utilization of scarce resources. Due to the relatively low priority of the default accounts, you have to expect long waiting times if many users are on the cluster. Some of the resources (for instance, nodes with large amounts of memory) may not be available to a default account at all. \n\nIf a user has been given an allocation (for instance, from an application to Compute Canada), an additional non-default account is issued. This is done \"manually\" and the account is only used if specified explicitly in the job script. Account specification is done through the SLURM '''-A''' or '''--account=''' option, for instance\n\n<pre>\n#SBATCH --account=rac-2017-hpcg1234\n</pre>\n\nAn account specification consists of three parts:\n* The type of account and its associated allocation. Presently this may be \"def\" (for default), \"rac\" (for RAC allocation from Compute Canada), or \"con\" (for contributed systems). In the above example we are specifying a \"RAC\" type account, thus the \"rac-\".\n* The year of the associated allocation (2017 in the above example).\n* The name of the group. Typically this is \"hpcg\" followed by 4 digits. Since allocation limits usually apply on a group level, this needs to be specified; in the above example it is hpcg1234.\n\nNote that if you are entitled to use a special allocation, you '''must''' specify the proper account or you will not be able to access the extended resources that go with it. Non-default accounts also receive a higher priority on shared resources, i.e. their jobs will be scheduled preferentially if resources are scarce (as they usually are).\n\n=== Partitions ===\n\nThe Frontenac cluster is split up into partly overlapping \"partitions\", i.e. groups of nodes. 
There are currently two of these:\n* The '''standard''' partition is accessible by default accounts. It cannot be used from a non-default account. It consists largely of smaller nodes with 24 cores and 256 GB of memory.\n* The '''reserved''' partition can only be accessed by rac- and con- accounts (i.e. non-default ones). It contains large-memory and other nodes with an extended number of cores (40-144).\n\nThe two partitions are partially overlapping, i.e. some nodes may be accessed from either. However, for those nodes the non-default accounts take precedence because of their higher priority, so that if a default account and a non-default account compete, the latter will be scheduled first.\n\n'''Note:''' When you use a non-default account, the partition '''must''' be specified and it must be compatible with the account type. To specify it, you use the '''-p''' or '''--partition''' option:\n\n<pre>\n#SBATCH --account=rac-2017-hpcg1234\n#SBATCH --partition=reserved\n</pre>\n\nIf no partition is specified, '''standard''' is assumed. \n'''Important:''' If you are using a \"rac-\" or \"con-\" account, you must specify the \"reserved\" partition, as the \"standard\" one is incompatible and the job will not be scheduled. This means that with such an account you always need to specify '''both''' the \"--account\" and the \"--partition\" option. Specifying only one of these will cause the job to remain in the queue indefinitely.\n\n== Migrating from other Schedulers ==\n\n=== Sun Grid Engine ===\n\nMost SGE commands (qsub, qstat, etc.) will work on SLURM, although you will need to rewrite your scripts to use #SBATCH directives instead of the #$ directives used by SGE. The command <code>sge2slurm</code> will convert an SGE job script to a SLURM job script.\n\n=== PBS/TORQUE ===\n\nSLURM [https://hpc.nih.gov/docs/pbs2slurm.html can actually run PBS job scripts] in many cases. Most PBS commands (qstat, qsub, etc.) will work on SLURM. 
The \"pbs2slurm\" script can be used to convert a PBS script to a SLURM one.\n\n\n== Using SSE3 Nodes (Decommissioned) ==\n\n'''CAC''' has several older nodes that use the SSE3 architecture (as opposed to AVX). These nodes may or may not work with your code, but users are welcome to try. They are currently underutilized and may offer a solution during times of high usage of the regular nodes on the Frontenac cluster. A different software stack needs to be loaded prior to running jobs on these nodes. Simply run '''load-sse3''' prior to submitting your SLURM script."
                    }
                ]
            }
        }
    }
}