In this section we’ll monitor the status of the GPUs using the excellent tool nvtop.
First, run squeue to find a node the job is running on. For example, if the job is running on p5-dy-cr-48xlarge-[1-10], we’ll use p5-dy-cr-48xlarge-1.
ubuntu@ip-10-0-21-245:~$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                 6        p5 megatron   ubuntu  R       0:22      1 p5-dy-cr-48xlarge-1
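Picking the first node out of a bracketed node list such as p5-dy-cr-48xlarge-[1-10] can also be scripted. The sketch below is only an illustration: first_node is a hypothetical helper name, not part of Slurm, and it handles just the simple forms shown here (a plain hostname, a comma-separated list, or one prefix followed by a bracketed range). On a real cluster, scontrol show hostnames expands any node list and is the more robust choice.

```shell
#!/usr/bin/env bash
# first_node (hypothetical helper): print the first hostname in a Slurm
# node list expression such as "p5-dy-cr-48xlarge-[1-10]".
# Assumption: only simple expressions are handled; nested or multi-dimensional
# ranges are out of scope for this sketch.
first_node() {
  local nodelist="$1"
  local prefix="${nodelist%%\[*}"   # text before the first '[' (all of it if none)

  # No bracket at all, or a plain hostname precedes the first bracketed group:
  # the first comma-separated item is already a complete hostname.
  if [[ "$nodelist" != *"["* || "$prefix" == *,* ]]; then
    printf '%s\n' "${prefix%%,*}"
    return 0
  fi

  local ranges="${nodelist#*\[}"    # text after the first '['
  ranges="${ranges%%\]*}"           # drop the closing ']' and anything after it
  local first="${ranges%%,*}"       # first item in the group, e.g. "1-10"
  first="${first%%-*}"              # lower bound of that item, e.g. "1"
  printf '%s%s\n' "$prefix" "$first"
}

# Example (on the cluster, assuming job ID 6 from the output above):
#   node=$(first_node "$(squeue -h -j 6 -o %N)")
#   ssh "$node"
```

Here squeue -h suppresses the header and -o %N prints only the node list, so the helper receives a single expression to parse.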
Next, SSH into that node and install nvtop:
ssh p5-dy-cr-48xlarge-1
sudo apt-get -y install nvtop
Then run nvtop on the node to view the status of its GPUs.