Computing Center

Frequently Asked Questions

Accounts

Storage

Queuing/Scheduler

  • What are the differences in the queues?
    • batch: The 'batch' queue is the default queue. The walltime limit is 120 hours (120:00:00).

    • express: The 'express' queue is for short jobs and debugging/testing scripts. The express queue has a walltime limit of one hour (1:00:00).

    • long: The 'long' queue is for long-running jobs that are unable to use a checkpoint/restart feature. The walltime limit is 21 days (504 hours). Jobs in this queue may be killed at the discretion of HPCC administrators to address hardware and software issues.

    • bigmem: The 'bigmem' queue directs jobs to one of a dozen large-memory nodes that each have 768 GB of RAM. The walltime limit is 7 days, or 168 hours (7-00:00:00 or 168:00:00).

    • supermem: The 'supermem' queue consists of one 1.5 TB memory node. Users must demonstrate a need for this queue and should not attempt to use it unless HPC staff have verified that the 'bigmem' queue is not sufficient.

    • bullet: The 'bullet' queue consists of 10 nodes, each with dual NVIDIA Quadro RTX 6000 GPUs, for a total of 20 GPUs. The walltime limit for this queue is 5 days, or 120 hours.
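
    For reference, here is a minimal Slurm submission script showing how a queue (partition) and walltime are selected. The job name, output file, resource counts, and program are placeholders, not site defaults:

        #!/bin/bash
        # Job name and output file (placeholders); %j expands to the job ID
        #SBATCH -J myjob
        #SBATCH -o myjob_%j.out
        # Queue/partition: batch, express, long, bigmem, supermem, or bullet
        #SBATCH -p batch
        # Requested walltime; must be within the chosen queue's limit
        #SBATCH -t 24:00:00
        # Requested resources (placeholders)
        #SBATCH --nodes=1
        #SBATCH --ntasks-per-node=8

        # Commands to run (placeholder program)
        ./my_program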

  • Does it matter if I estimate my walltime accurately?

    If you specify an excessively long runtime, your job may wait in the queue longer than it needs to. The scheduler may allow your job to start ahead of certain MPI jobs if its walltime is short enough; therefore, please try to estimate your walltime accurately!
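
    For example, a job expected to finish in roughly 90 minutes might request a modest cushion rather than the full queue limit (the figures here are only an illustration):

        # Request ~2 hours for a ~90-minute job, instead of the 120-hour maximum
        #SBATCH -p batch
        #SBATCH -t 02:00:00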

  • How can I extend the walltime for a job that’s already running?

    Email hpcc@okstate.edu and request a walltime extension.
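
    It can help to include the job ID and the job's current limit in your request; both can be read from the scheduler (123456 is a placeholder job ID):

        # Show the job's elapsed run time and current time limit
        scontrol show job 123456 | grep -E "RunTime|TimeLimit"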

  • Do I get a node to myself when I submit a job?

    It depends. When your job runs on a node, no other users' jobs will be placed on that node if all of its cores are reserved by your job(s). However, if other users' jobs can fit on the node alongside yours, then they will share the resources. Note that the nodes have controls in place to keep multiple users' jobs from competing for memory/CPU resources.
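
    If your job genuinely needs a whole node to itself, you can ask the scheduler not to place other jobs on it; this is a sketch of the relevant directive, and requesting more than you need may increase your wait time:

        # Reserve every core on the node for this job, even if fewer are used
        #SBATCH --exclusive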

  • My job only needs to run for a few minutes, but it has been in the queue all day and hasn't started yet! How can I get the job to start?

    If your job only needs to run for a few minutes to an hour, specify "#SBATCH -p express" in your submission script. The requested walltime must be one hour or less. New jobs on the express nodes usually start within anywhere from a few seconds to a few hours. Note, however, that we will not extend the walltime for jobs submitted to this queue.
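
    A sketch of an express-queue submission; the walltime, task count, and command are placeholders:

        #!/bin/bash
        # Express queue for short/debug jobs; requested walltime must be one hour or less
        #SBATCH -p express
        #SBATCH -t 00:30:00
        #SBATCH --ntasks=1

        # Placeholder command
        ./quick_test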

  • I’ve been submitting numerous jobs to the cluster over the last several weeks and I’ve noticed my jobs are beginning to start behind other users’ jobs. Why is this happening?

    The scheduler keeps track of how much walltime each user consumes over the course of several weeks. It begins to give lower priority to the 'heaviest' users who have submitted many jobs. This helps ensure that new and less active users have an opportunity to utilize our free resources. All jobs will start normally if nodes are available and no other jobs are waiting in the queue.
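
    You can inspect how fair-share usage is affecting a pending job's priority with standard Slurm tools (123456 is a placeholder job ID):

        # Per-job priority breakdown, including the fair-share component
        sprio -j 123456
        # Your recorded usage and fair-share factor
        sshare -u $USER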

  • I’ve submitted hundreds of jobs to the cluster at once and many of them have not started, and the "Reason" is "QOSMaxCpuPerUserLimit". Do I need to kill those jobs and resubmit them?

    The scheduler limits the maximum number of CPU cores (and thus jobs) that a single user can use at any given time. The excess jobs will remain pending with the 'QOSMaxCpuPerUserLimit' reason until resources become free. The jobs will eventually start on their own without any need for intervention.
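
    You can confirm which of your jobs are waiting on this limit by listing their pending reasons:

        # Job ID, partition, state, and pending reason for each of your jobs
        squeue -u $USER -o "%.10i %.9P %.2t %.20r"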

  • Why do jobs with large amounts of cores seem to ‘cut in line’ to the front of the queue?

    The large core jobs are MPI jobs, which often require numerous nodes. MPI jobs have the ability to ‘partially reserve’ nodes that become free. The scheduler will still run, or “backfill,” some jobs onto these nodes while the MPI job is waiting for more resources to become available. A job will typically backfill as long as the specified walltime is short enough that it can start and finish before the MPI job is scheduled to have enough nodes. This functionality becomes more efficient when all users estimate walltimes accurately.
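
    You can also ask the scheduler for its current start-time estimates, which reflect these backfill decisions:

        # Estimated start times for your pending jobs
        squeue --start -u $USER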

 

Software

