Launch and manage jobs
How to create a job script
If your environment lacks an editor you are familiar with, you can use the cat utility to write a file. Run the command below, type the script contents, then press Ctrl+D on an empty line to finish:
cat > job.sh
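When the script contents are already known, a here-document can write the file in one step instead of typing it interactively. This is a minimal sketch; the script body is a placeholder and not part of the original instructions.

```shell
# Write job.sh in one step with a quoted here-document.
# Quoting 'EOF' keeps $(...) and variables unexpanded until the job runs.
cat > job.sh <<'EOF'
#!/bin/bash
# Placeholder workload: replace with your own commands.
echo "running on $(hostname)"
EOF
```

The same pattern works for any of the job scripts discussed here, since only the lines between the two EOF markers change.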
Add the OpenCL startup commands for your target FPGA to the job script. The commands depend on the device:
Arria 10:
source /glob/development-tools/versions/fpgasupportstack/a10/1.2.1/intelFPGA_pro/hld/init_opencl.sh
source /glob/development-tools/versions/fpgasupportstack/a10/1.2.1/inteldevstack/init_env.sh
export FPGA_BBB_CCI_SRC=/usr/local/intel-fpga-bbb
export PATH=/glob/intel-python/python2/bin:${PATH}
Stratix 10:
source /glob/development-tools/versions/fpgasupportstack/d5005/2.0.1/inteldevstack/init_env.sh
source /glob/development-tools/versions/fpgasupportstack/d5005/2.0.1/inteldevstack/hld/init_opencl.sh
export FPGA_BBB_CCI_SRC=/usr/local/intel-fpga-bbb
export PATH=/glob/intel-python/python2/bin:${PATH}
How to submit a batch job
qsub -l nodes=1:gpu:ppn=2 -d . job.sh
Note: -l nodes=1:gpu:ppn=2 (lower case L) is used to assign one full GPU node to the job.
Note: The -d . is used to configure the current folder as the working directory for the task.
Note: job.sh is the script that gets executed on the compute node.
How to request interactive mode
qsub -I -l nodes=1:gpu:ppn=2 -d .
Note: -I (upper case i) is the argument used to request an interactive session.
How to validate a job script
To test your job script, first request a compute node in interactive mode. At the new prompt, run the script and then leave the session:
bash job.sh
exit
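Before using an interactive session for a test run, bash can also parse a script without executing it. This step is a suggestion rather than part of the original procedure. The sketch builds a deliberately broken throwaway script to show how `bash -n` reports a syntax error.

```shell
# Create a throwaway script with an unclosed "if", then check it without running it.
tmp=$(mktemp)
printf 'echo start\nif true; then\necho missing fi\n' > "$tmp"

# bash -n parses the file only; a non-zero exit status means a syntax error.
if bash -n "$tmp" 2>/dev/null; then
    result="syntax ok"
else
    result="syntax error"
fi
echo "$result"
rm -f "$tmp"
```

For your own script, `bash -n job.sh` returns 0 when the syntax is valid. It does not catch runtime problems such as missing files or unset environment variables, so the interactive test is still needed.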
How to measure job execution time
One option is to add timestamps to the job stdout. This is what a job script could look like after adding start and stop timestamps:
#!/bin/bash
echo
echo start: $(date "+%y%m%d.%H%M%S.%3N")
echo
# TODO list
echo
echo stop: $(date "+%y%m%d.%H%M%S.%3N")
echo
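If a single elapsed-time figure is more convenient than two timestamps, the script can compute the difference itself. This is a minimal sketch that assumes GNU date (for the %N format, as in the timestamps above); the sleep command stands in for the real workload.

```shell
#!/bin/bash
# Record the start time in nanoseconds since the epoch (GNU date).
start_ns=$(date +%s%N)

sleep 0.2   # placeholder for the real workload

# Record the stop time and convert the difference to milliseconds.
stop_ns=$(date +%s%N)
elapsed_ms=$(( (stop_ns - start_ns) / 1000000 ))
echo "elapsed: ${elapsed_ms} ms"
```

The measured time covers only the commands between the two date calls, not the time the job spent waiting in the queue.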
How to run a job after another job completes successfully
A job can be configured to be triggered automatically after another job completes successfully. For example, in the case of the FPGA HW, the execution of a workload can be triggered automatically after the compilation completes successfully:
- Submit the build FPGA HW job and take note of the job ID XXXX that is returned:
qsub -l nodes=1:fpga_compile:ppn=2 -d . build_fpga_hw.sh
- Use the -W depend=afterok:[job_id] argument when submitting the FPGA HW execution job:
Note: <fpga type> must match the device type you targeted when compiling your executable.
qsub -W depend=afterok:XXXX -l nodes=1:fpga_runtime:<fpga type>:ppn=2 -d . run_fpga_hw.sh
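The two submissions can be chained in a small helper so the job ID does not have to be copied by hand. This is a sketch: the function name submit_build_then_run is hypothetical, and it assumes that qsub prints the new job ID on stdout, which is the ID referred to as XXXX above.

```shell
#!/bin/bash
# Submit the build job, then a run job that starts only if the build succeeds.
# Argument: the FPGA type used at compile time (for example arria10).
submit_build_then_run() {
    local fpga_type="$1"
    local build_id

    # Assumption: qsub prints the ID of the newly created job on stdout.
    build_id=$(qsub -l nodes=1:fpga_compile:ppn=2 -d . build_fpga_hw.sh) || return 1
    echo "build job: ${build_id}"

    # The run job is held until the build job exits with status 0.
    qsub -W depend=afterok:"${build_id}" \
         -l nodes=1:fpga_runtime:"${fpga_type}":ppn=2 -d . run_fpga_hw.sh
}
```

Calling `submit_build_then_run arria10` from the project folder submits both jobs. If the build job fails, the scheduler does not start the dependent job.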
How to change job timeout (max = 24h)
By default, a job is terminated automatically at the 6h mark. If your job needs more than 6h to complete (up to the 24h maximum), request a longer walltime:
qsub […] -l walltime=hh:mm:ss
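The hh:mm:ss value can be generated from a duration in minutes, which avoids formatting mistakes. This is a sketch: the helper name walltime_from_minutes is hypothetical, and the 1440-minute cap reflects the 24h maximum stated above.

```shell
#!/bin/bash
# Format a duration in whole minutes as hh:mm:ss for -l walltime=...
walltime_from_minutes() {
    local minutes="$1"
    # Accept only positive decimal integers without leading zeros.
    case "$minutes" in
        ''|0*|*[!0-9]*)
            echo "expected a positive whole number of minutes" >&2
            return 1 ;;
    esac
    # Enforce the 24h (1440 minute) maximum.
    if [ "$minutes" -gt 1440 ]; then
        echo "walltime is capped at 24h (1440 minutes)" >&2
        return 1
    fi
    printf '%02d:%02d:00\n' $((minutes / 60)) $((minutes % 60))
}
```

A possible use is `qsub -l nodes=1:gpu:ppn=2 -l walltime="$(walltime_from_minutes 750)" -d . job.sh`, which requests twelve and a half hours.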
How to monitor jobs
watch -n 1 qstat -n -1
How to terminate a job
qdel <job_id>
Accessing compute nodes
How to request an interactive shell (upper-case ‘i’)
qsub -I […]
How to request a node by node property. (lower-case ‘L’)
qsub […] -l nodes=1:[property]:ppn=2
How to request a node by node name. (lower-case ‘L’)
qsub […] -l nodes=[node_name]:ppn=2
How to list all compute nodes and their properties
pbsnodes
How to list the free compute nodes (lower-case ‘L’)
pbsnodes -l free
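To narrow the free nodes down to those with particular properties, the full pbsnodes output can be filtered. This is a sketch: the function name free_nodes_with is hypothetical, and it assumes the usual pbsnodes layout, where an unindented node name is followed by indented "state = ..." and "properties = ..." lines.

```shell
#!/bin/bash
# Print the names of free nodes whose property list contains every given property.
# Reads pbsnodes output on stdin. Usage: pbsnodes | free_nodes_with gpu gen9
free_nodes_with() {
    awk -v want="$*" '
        BEGIN { n = split(want, props, " ") }
        # An unindented, non-empty line starts a new node record.
        /^[^ \t]/ { node = $1; state = "" }
        $1 == "state" { state = $3 }
        $1 == "properties" {
            ok = (state == "free")
            # Wrap in commas so "gpu" does not match "dual_gpu".
            for (i = 1; i <= n; i++)
                if (index(("," $3 ","), ("," props[i] ",")) == 0) ok = 0
            if (ok) print node
        }
    '
}
```

An empty result means no matching node is free at the moment; a job submitted with those properties would wait in the queue.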
Listing compute node properties
pbsnodes | sort | grep properties
Example output:
properties = core,cfl,i9-10920x,ram32gb,net1gbe,iris_xe_max,dual_gpu
properties = core,cfl,i9-10920x,ram32gb,net1gbe,iris_xe_max,quad_gpu
properties = xeon,cfl,e-2176g,ram64gb,net1gbe,gpu,gen9
properties = xeon,clx,ram192gb,net1gbe
properties = xeon,skl,gold6128,ram192gb,net1gbe,fpga,arria10,fpga_runtime
properties = xeon,skl,gold6128,ram192gb,net1gbe,jupyter,batch
properties = xeon,skl,gold6128,ram192gb,net1gbe,jupyter,batch,fpga_compile
properties = xeon,skl,plat8153,ram384gb,net1gbe,renderkit
Properties describe the capabilities of a compute node: CPU type and name, accelerator type and name, available DRAM, type of interconnect, number and type of accelerator devices, and intended or recommended use.
Some of the properties describe classes of devices:
- core
- fpga
- gpu
- xeon
Other properties describe the devices by name (includes nda):
- arria10
- stratix10
- e-2176g
- gen9
- gold6128
- i9-10920x
- iris_xe_max
- plat8153
Number of devices:
- dual_gpu
- quad_gpu
Intended use:
- batch
- fpga_compile
- fpga_runtime
- jupyter
- renderkit
- fpga_opencl_compile
- fpga_opencl_runtime
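To see how many nodes carry each of these properties, the property lines can be split and counted. This is a sketch: the function name count_properties is hypothetical, and it assumes the "properties = a,b,c" line format shown in the example output above.

```shell
#!/bin/bash
# Count how many nodes list each property.
# Reads pbsnodes output on stdin. Usage: pbsnodes | count_properties
count_properties() {
    # Keep only the comma-separated list from each "properties = ..." line,
    # put one property per line, then count and sort by frequency.
    sed -n 's/^[[:space:]]*properties = //p' \
        | tr ',' '\n' \
        | sort | uniq -c | sort -k1,1nr -k2 \
        | awk '{ print $2, $1 }'
}
```

Each output line holds a property name followed by the number of nodes that advertise it, with the most common properties first.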
Targeting specific compute nodes
For a full reference of PBS utilities please check this resource: TORQUE PBS - Commands Overview
As an example of how to target specific compute nodes on the DevCloud let’s look at the compute nodes equipped with Intel® Graphics cards. At the time of this writing, by running the pbsnodes utility, we observe the following list of compute node properties:
pbsnodes | sort | grep properties | grep gpu
properties = xeon,cfl,e-2176g,ram64gb,net1gbe,gpu,gen9
properties = core,cfl,i9-10920x,ram32gb,net1gbe,gpu,iris_xe_max,dual_gpu
properties = core,cfl,i9-10920x,ram32gb,net1gbe,gpu,iris_xe_max,quad_gpu
How to target specific GPUs
The command for submitting a job to a compute node hosting a GPU is:
qsub -l nodes=1:gpu:ppn=2 job_script.sh
This will submit the job script to the first available compute node that hosts a GPU. That could be either the Intel® UHD Graphics P630 or the Intel® Iris® Xe MAX Graphics.
We can get more specific. In order to submit a job to an Intel® UHD Graphics P630 use the gen9 property:
qsub -l nodes=1:gen9:ppn=2 job_script.sh
In order to submit a job to a compute node hosting Intel® Iris® Xe MAX Graphics cards use the iris_xe_max property:
qsub -l nodes=1:iris_xe_max:ppn=2 job_script.sh
This command allocates either a dual or a quad Intel® Iris® Xe MAX Graphics compute node. To request a dual Intel® Iris® Xe MAX Graphics compute node, combine the iris_xe_max and dual_gpu properties:
qsub -l nodes=1:iris_xe_max:dual_gpu:ppn=2 job_script.sh
Similarly, in order to request a quad Intel® Iris® Xe MAX Graphics compute node:
qsub -l nodes=1:iris_xe_max:quad_gpu:ppn=2 job_script.sh
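The commands above all follow one pattern: the properties are joined with colons between nodes=1 and ppn=2. A small helper can build that string from a list of properties. This is a sketch, and the function name node_spec is hypothetical.

```shell
#!/bin/bash
# Join node properties into a -l resource string such as nodes=1:gen9:ppn=2.
node_spec() {
    # Without properties, request any node.
    if [ "$#" -eq 0 ]; then
        echo "nodes=1:ppn=2"
        return
    fi
    # With IFS set to ":", "$*" joins all arguments with colons.
    local IFS=:
    echo "nodes=1:$*:ppn=2"
}
```

For example, `qsub -l "$(node_spec iris_xe_max quad_gpu)" job_script.sh` is equivalent to the quad-GPU command above.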
How to target a specific FPGA on the FPGA server
qsub -q batch@v-qsvr-fpga -I -l nodes=darby:ppn=2
qsub -q batch@v-qsvr-fpga -I -l nodes=arria10:ppn=2
Transferring Files
You can transfer files from the DevCloud to your local system, or transfer from your local system to the DevCloud.
Upload to DevCloud
Open a terminal on your local system that is not connected to the DevCloud.
To upload a file use the syntax:
scp <FILE_NAME> devcloud:<PATH_TO_DESTINATION>
First navigate to the folder where the target file is located. In the example below, the target file is in a folder titled My-Project and the file name is my-application.py. The command transfers my-application.py to your DevCloud home folder (~/).
scp my-application.py devcloud:~/
If the transfer is successful, you will see output indicating that the transfer is 100% complete.
In a separate terminal, log in to DevCloud to verify the file transfer.
Download from DevCloud
Open a local terminal that is not connected to the DevCloud.
To download a file, use the syntax:
scp devcloud:<PATH_TO_FILE>/<FILE_NAME> .
First navigate to the folder where you want the file downloaded. In the example below, the target folder is titled My-Reports.
In this example, we will download the file my-report.txt from your DevCloud home folder (~/) into the current folder. Note the trailing dot, which names the destination.
scp devcloud:~/my-report.txt .
If the transfer is successful, you will see output indicating that the transfer is 100% complete and the file will be in your folder.
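Beyond the progress output, a transfer in either direction can be checked by comparing checksums of the two copies. This is a sketch: the function name verify_transfer is hypothetical, it reuses the devcloud host alias from the scp commands, and it assumes sha256sum is installed on both systems.

```shell
#!/bin/bash
# Compare SHA-256 checksums of a local file and its copy on the DevCloud.
# Usage: verify_transfer <local file> <remote file>
# A relative remote path is resolved against your DevCloud home folder.
# Paths containing single quotes are not supported by this sketch.
verify_transfer() {
    local local_file="$1" remote_file="$2"
    local local_sum remote_sum

    local_sum=$(sha256sum < "$local_file" | cut -d' ' -f1)
    # Run sha256sum on the DevCloud over ssh and keep only the hash.
    remote_sum=$(ssh devcloud "sha256sum < '$remote_file'" | cut -d' ' -f1)

    # Succeed only if a local hash was computed and both hashes match.
    [ -n "$local_sum" ] && [ "$local_sum" = "$remote_sum" ]
}
```

For the upload example, `verify_transfer my-application.py my-application.py && echo "copies match"` prints the message only when both files are identical.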