Terminal for slurm part 1

Author: 李时刻 | Published: 2023-05-06 17:22

If you do not have root access (no sudo), you can still work with a modified copy of the slurm.conf configuration file by following these steps:

  1. Copy the slurm.conf file to a location where you have write access, such as your home directory.

    cp /etc/slurm/slurm.conf ~/slurm.conf
    
  2. Open the slurm.conf file for editing using a text editor such as nano, vim, or gedit.

    nano ~/slurm.conf
    
  3. Modify the various configuration parameters to suit your needs.

  4. Save the modified slurm.conf file and exit the text editor.

  5. Set the SLURM_CONF environment variable to point to the modified configuration file.

    export SLURM_CONF=~/slurm.conf
    
  6. Verify that the SLURM_CONF environment variable is set correctly by running the following command:

    echo $SLURM_CONF
    
  7. Run the Slurm commands as usual, such as:

    srun -N 2 --ntasks-per-node=4 hostname
    sinfo
    sacctmgr list users
    

    These commands respectively launch a job that runs the hostname command on two nodes with four tasks per node (eight tasks in total), display the state of the partitions and nodes in the cluster, and list the users in the accounting database.

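Note that setting the SLURM_CONF environment variable only affects the current shell session and the Slurm client commands started from it. Daemons that are already running keep using the configuration they were started with. To make the variable persistent, add the export line to your shell startup file, such as .bashrc or .zshrc.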
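Steps 1, 5, and 6 can be wrapped in a small shell function. This is a minimal sketch rather than anything shipped with Slurm: the function name use_private_slurm_conf is hypothetical, and it assumes the system configuration is readable at the path you pass in.

```shell
# Hypothetical helper: copy a Slurm config to a writable location and
# point SLURM_CONF at the copy for the current shell session.
use_private_slurm_conf() {
    src="$1"    # system config, e.g. /etc/slurm/slurm.conf
    dest="$2"   # private copy, e.g. "$HOME/slurm.conf"

    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi

    # Keep an existing private copy so earlier edits are not overwritten.
    if [ ! -e "$dest" ]; then
        cp "$src" "$dest" || return 1
    fi

    export SLURM_CONF="$dest"
    echo "SLURM_CONF=$SLURM_CONF"
}
```

You would call it as use_private_slurm_conf /etc/slurm/slurm.conf "$HOME/slurm.conf" and then edit the copy as in steps 2 to 4. Because the function uses export, it has to be defined and called in the shell where you run the Slurm commands, not in a separate script.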

To check whether Slurm is installed on your system, print the installed version by running the following command in your terminal:

$ sinfo --version

This command prints the version of Slurm that is currently installed on your system. If Slurm is not installed, the shell reports that the command was not found.
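If you want a clearer message than the shell's own error when Slurm is missing, the check can be wrapped as follows. This is only a sketch, and the function name slurm_version is an assumption made for illustration.

```shell
# Hypothetical helper: print the Slurm version if the client tools are
# on PATH, otherwise print a hint and return a non-zero status.
slurm_version() {
    if command -v sinfo >/dev/null 2>&1; then
        sinfo --version
    else
        echo "sinfo not found: Slurm client tools are not installed or not on PATH"
        return 1
    fi
}
```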

You can also check the status of the compute node daemon (slurmd) and the controller daemon (slurmctld) by running the following command:

$ systemctl status slurmd slurmctld

This command shows whether each daemon is running. slurmctld normally runs only on the controller node and slurmd on the compute nodes, so on a given machine only one of them may be present. A daemon reported as active (running) is installed and started on this machine.
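To get one short line per daemon instead of the full status output, you can query each unit with systemctl is-active. The sketch below assumes the units are named slurmd and slurmctld, as in the command above; the function name slurm_daemon_states is hypothetical.

```shell
# Hypothetical helper: report the state of each Slurm daemon separately.
# `systemctl is-active` prints a state such as "active" or "inactive".
slurm_daemon_states() {
    for unit in slurmd slurmctld; do
        # is-active exits non-zero for units that are not active,
        # so the exit status is ignored and only the printed state is used.
        state=$(systemctl is-active "$unit" 2>/dev/null) || true
        echo "$unit: ${state:-unknown}"
    done
}
```

A state of unknown means systemctl printed nothing, for example because systemd is not available on the machine.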

If a Slurm command fails with an error saying that it cannot find a configuration source, Slurm is not correctly installed or configured on this machine. The error occurs because Slurm found no configuration file and fell back to a DNS SRV lookup to locate the configuration source, and that lookup failed as well.

You can troubleshoot this issue by checking that the Slurm configuration file slurm.conf is located in the expected directory, /etc/slurm, and that your user has permission to read it.
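The location and permission check can be scripted. The sketch below assumes the default path /etc/slurm/slurm.conf mentioned above, which may differ on your installation, and honors SLURM_CONF if it is set. The function name check_slurm_conf is hypothetical.

```shell
# Hypothetical helper: report whether a Slurm config file exists and
# is readable by the current user.
check_slurm_conf() {
    # Use the argument if given, then SLURM_CONF, then the assumed default.
    conf="${1:-${SLURM_CONF:-/etc/slurm/slurm.conf}}"

    if [ ! -e "$conf" ]; then
        echo "missing: $conf"
        return 1
    elif [ ! -r "$conf" ]; then
        echo "not readable: $conf"
        return 1
    fi
    echo "ok: $conf"
}
```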

You may also need to check the Slurm service and make sure it is running using the following command:

$ systemctl status slurmd

If the service is not running, you can start it using:

$ sudo systemctl start slurmd

If you have made any changes to the slurm.conf file, you should restart the Slurm service using the following command:

$ sudo systemctl restart slurmd

After that, you can try running the sinfo --version command again to verify that Slurm is installed and running properly.

If you get the error message "Unit slurmd.service could not be found" when running systemctl --user status slurmd, it means that no slurmd unit is registered with your per-user systemd instance.

This could be due to one of the following reasons:

  • The Slurm service was installed system-wide rather than under your user account, so it is a system unit and managing it requires administrator privileges.
  • No systemd unit for slurmd was installed at all, in which case the daemon has to be started directly rather than through systemctl.

To resolve this issue, you can try the following steps:

  1. Check if Slurm was installed under your user account or system-wide. You can check by running the following command:

    $ which slurmd
    

    If the command returns a path that starts with /usr, then Slurm is installed system-wide and you may need administrator privileges to manage it. If the command returns a path that starts with /home, then Slurm was installed under your user account.

  2. If Slurm was installed system-wide, query the system unit instead of the user unit. Use sudo if it is available so that the recent log lines are included in the output:

    $ sudo systemctl status slurmd
    

    This will show you the status of the slurmd daemon and whether it is running or not.

  3. If Slurm was installed under your user account, you can start the slurmd daemon by running the following command:

    $ slurmd -Dvvv
    

    This runs slurmd in the foreground (-D) with verbose logging (-vvv), so any error messages appear directly in the terminal.

    Alternatively, if a user-level systemd unit for slurmd exists, you can start it through systemctl:

    $ systemctl --user start slurmd
    

    If slurmd was already running, you may need to stop it first before you can start it again:

    $ systemctl --user stop slurmd
    

After performing these steps, running systemctl --user status slurmd should show you the status of the slurmd daemon and whether it is running correctly or not.
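The path check from step 1 can be expressed as a small function. This is a sketch of the /usr versus /home rule of thumb only; the function name classify_slurmd_path and the output labels are hypothetical, and installations under other prefixes such as /opt are reported as unknown.

```shell
# Hypothetical helper: classify a slurmd path by its prefix,
# following the /usr versus /home rule of thumb from step 1.
classify_slurmd_path() {
    case "$1" in
        "")       echo "not found" ;;
        /home/*)  echo "user install" ;;
        /usr/*)   echo "system-wide install" ;;
        *)        echo "unknown location" ;;
    esac
}
```

Calling classify_slurmd_path "$(command -v slurmd)" applies it to the slurmd on your PATH. command -v is the shell built-in counterpart of which and prints nothing when slurmd is not found.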
