FAQ
ACCOUNTS
Am I allowed to have an account on OSU HPCC resources? If you are a student, faculty member, or staff member of Oklahoma State University, you are entitled to an HPCC account simply by filling out an HPCC Account Request form. Other researchers throughout the state of Oklahoma, and those affiliated through various means with research/academic ventures underway at OSU, may also be eligible for an account. If you have any questions about your eligibility, please contact jessejs@okstate.edu.
I forgot my password. Can you tell me what it is? We are not able to give you the current password on our systems, but we can reset your password and text the new temporary password to the mobile phone number on record. (NOTE: passwords are never emailed. For security purposes, they are only sent by text message to the mobile phone number on record or provided to you in person.)
Why was my account deactivated or suspended? How can I reactivate it? Your account may be suspended if you have not accessed HPCC resources for six months or if it becomes necessary to protect the security of the HPC system. Please send an email to jessejs@okstate.edu, including your account name and a description of the issue you are experiencing.
Can I share my account with my colleague(s), student(s), friend(s), neighbor(s), etc? Sharing of a personal account is not allowed on OSU or OSU HPCC resources.
I failed several login attempts from an off-campus computer and I can no longer connect. How can I regain access? Please send an email to jessejs@okstate.edu and request that your account be reactivated.
STORAGE
How much space do I have in my /home/username directory? Each user begins with a default 1GB quota on their /home/username directory; this can be increased on request to a maximum of 10GB per user. Other storage offerings are also available; please contact jessejs@okstate.edu with questions.
How can I determine how much space I’ve used? Type `du -sh ~` at the command prompt.
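For example, a quick check from any directory might look like this (the output shown is illustrative only):

```bash
# Report the total size of your home directory
du -sh ~
# Example output (illustrative): 850M    /home/username
```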
Is my data guaranteed to be backed up? The file systems are very reliable; however, data can be lost or damaged due to media failures, software bugs, hardware failures, or other problems. IT IS YOUR RESPONSIBILITY to back up critical files. If you need archival storage, PetaStore is available here in Oklahoma.
How much space can I use in /scratch? The /scratch space is not limited by quotas, but it is a resource shared among all users. You should only store current work and files in your scratch directory. You may be asked to move or delete data when total scratch usage is high. If you need archival storage, PetaStore is available here in Oklahoma.
I need to transfer numerous large files to/from another location. What is the best way to do this? There are several options for transferring large files outlined HERE.
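As a general illustration (not a statement of HPCC's recommended method), command-line tools such as scp and rsync are commonly used for bulk transfers. The hostname and paths below are placeholders; substitute the actual login node and your own directories:

```bash
# Copy a single large file to your scratch directory on the cluster
# (login.example.edu is a placeholder hostname)
scp bigfile.tar.gz username@login.example.edu:/scratch/username/

# Recursively synchronize a directory; rsync only copies files that
# have changed and can resume interrupted transfers (-P shows progress)
rsync -avP results/ username@login.example.edu:/scratch/username/results/
```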
QUEUING/SCHEDULER
What are the differences between the 'queues,' sometimes known as 'partitions'?
The queues on Pete vary, based on hardware capabilities (CPU/RAM/GPU) and walltime limits. It's important to note that the 196 general compute nodes on Pete may be assigned to more than one queue.
- batch: The 'batch' queue is the default queue. This includes 195 of the general purpose compute nodes (a mix of "Skylake" and "Cascade Lake" CPUs). The walltime limit is 120 hours (120:00:00).
- express: The 'express' queue is for short jobs and debugging/testing scripts. This includes all 196 general purpose compute nodes (a mix of "Skylake" and "Cascade Lake" CPUs). The walltime limit is one hour (1:00:00).
- skylake: This includes the 163 general purpose compute nodes with "Skylake" CPUs. The walltime limit is 120 hours (120:00:00).
- cascadelake: This includes the 32 general purpose compute nodes with "Cascade Lake" CPUs. The walltime limit is 120 hours (120:00:00).
- long: The 'long' queue is for long-running jobs that are unable to use a checkpoint/restart feature. Jobs in this queue are subject to being killed at the discretion of HPCC administrators for hardware and software issues. This includes 195 of the general purpose compute nodes (a mix of "Skylake" and "Cascade Lake" CPUs). The walltime limit is 21 days (504 hours).
- bigmem: The 'bigmem' queue directs jobs to one of a dozen large-memory nodes that each have 768 GB RAM. The walltime limit is 7 days, or 168 hours (7-00:00:00 or 168:00:00).
- supermem: The 'supermem' queue consists of one 1.5TB memory node. Users must have demonstrated a need to use this queue, and should not attempt to use it unless HPC staff have verified that the 'bigmem' queue is not sufficient.
- bullet: The 'bullet' queue consists of 10 nodes, each containing at least two NVIDIA Quadro RTX 6000 cards, for a total of 22 GPUs. The walltime limit for this queue is 5 days, or 120 hours.
More information about node architecture can be found on Pete's information page: Pete Supercomputer
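As a minimal sketch of how a queue is selected in practice, the submission script below requests the 'batch' queue with a 24-hour walltime; the module name and program are placeholders, and the resource amounts are illustrative only:

```bash
#!/bin/bash
#SBATCH -p batch            # queue/partition (batch, express, bigmem, etc.)
#SBATCH -t 24:00:00         # requested walltime, within the queue's limit
#SBATCH -N 1                # number of nodes
#SBATCH -n 8                # number of cores/tasks
#SBATCH -J myjob            # job name
#SBATCH -o myjob_%j.out     # output file (%j expands to the job ID)

# Load any software your job needs (module name is a placeholder)
module load example_module

# Run the program (placeholder command)
./my_program input.dat
```

Save the script (for example as myjob.sbatch) and submit it with `sbatch myjob.sbatch`.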
Does it matter if I estimate my walltime accurately? If you specify an excessively long runtime, your job may be delayed in the queue longer than it should be. The scheduler may allow your job to start ahead of certain MPI jobs if it has a short enough walltime; therefore, please attempt to estimate your walltime accurately!
How can I extend the walltime for a job that’s already running? Email jessejs@okstate.edu and request a walltime extension.
Do I get a node to myself when I submit a job? It depends. If your job(s) reserve all of the cores on a node, no other users' jobs can be scheduled on that node. However, if other users' jobs can fit on the node alongside yours, then they will share the resources. Note that the nodes have controls in place to keep multiple users' jobs from competing for memory/CPU resources.
My job only needs to run for a few minutes, but it has been in the queue all day and hasn’t started yet! How can I get the job to start? You should specify "#SBATCH -p express" in your submission script if your job only needs to run for a few minutes to an hour. The walltime must be one hour or less. New jobs in the express queue usually start within a few seconds to a few hours. We will not extend walltime for jobs submitted to this queue, however.
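For example, a short test job could request the express queue with directives such as the following (the walltime shown is illustrative):

```bash
#SBATCH -p express          # short-job/debugging queue
#SBATCH -t 00:30:00         # must be one hour or less
```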
I’ve been submitting numerous jobs to the cluster over the last several weeks and I’ve noticed my jobs are beginning to start behind other users’ jobs. Why is this happening? The scheduler keeps track of how much walltime is used by each user over the course of several weeks. The scheduler will begin to give lower priority to the 'heaviest' users who have submitted a lot of jobs. This helps ensure that new and less active users have an opportunity to utilize our free resources. All jobs will start normally if nodes are available and no other jobs are waiting in the queue.
I’ve submitted hundreds of jobs to the cluster at once and many of them are in the “Blocked Jobs” section! Do I need to kill those jobs and resubmit them? The scheduler limits the maximum number of jobs that a user can run at any given time. The excess jobs will be placed in the 'Blocked Jobs' section until nodes become free. The jobs will eventually start on their own without the need for intervention.
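To see which of your jobs are running and which are still waiting, you can use the standard Slurm status command, for example:

```bash
# List all of your jobs and their states (R = running, PD = pending)
squeue -u $USER
```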
Why do jobs with large amounts of cores seem to ‘cut in line’ to the front of the queue? The large core jobs are MPI jobs, which often require numerous nodes. MPI jobs have the ability to ‘partially reserve’ nodes that become free. The scheduler will still run, or “backfill,” some jobs onto these nodes while the MPI job is waiting for more resources to become available. A job will typically backfill as long as the specified walltime is short enough that it can start and finish before the MPI job is scheduled to have enough nodes. This functionality becomes more efficient when all users estimate walltimes accurately.
SOFTWARE
I need to use Software XYZ, can I install it myself? We encourage all users to install their own software if possible. We expect users to work with the software developers, online tutorials, or external user forums when performing self-installs.
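As a generic sketch only (the package name, URL, and version are placeholders, and the actual build steps depend entirely on the software), a typical self-install into your home directory looks something like this:

```bash
# Download and unpack the source (URL and version are placeholders)
wget https://example.org/xyz-1.0.tar.gz
tar -xzf xyz-1.0.tar.gz
cd xyz-1.0

# Configure the build to install under your home directory
./configure --prefix=$HOME/software/xyz
make && make install

# Make the installed programs available on your PATH (e.g., in ~/.bashrc)
export PATH=$HOME/software/xyz/bin:$PATH
```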
I need to use Software XYZ, will you install it for me? We are often backlogged with numerous requests, but generally we'll install any open-source software that multiple users can/will use. Please email jessejs@okstate.edu with these requests.
Does graphical software run on the supercomputer? Yes, but it is usually not ideal. The graphics are usually choppy or unresponsive. The TIGER research cloud is the appropriate system for graphical software, especially Windows-based applications.