
Install

To install a new node or reinstall an existing one:

  1. If the node is already in the cluster, first make sure it is not part of Ceph and is not acting as a gateway (MetalLB, ingress, etc.). Ceph nodes can only be taken out one at a time, allowing the cluster time to recover after each node is brought back.

  2. Find the network settings: IP, subnet, gateway, DNS (if not Google or Cloudflare)

  3. Note the current disk setup, in particular whether the node has similar OS drives suitable for an MD RAID mirror

  4. Drain the node

  5. Log in to the node’s IPMI console

  6. Attach the Ubuntu 22.04 image via virtual media

  7. Reboot the node

  8. Trigger the boot menu (usually F10) and choose to boot from the virtual media

  9. Start the install with the media check turned off

  10. Agree to everything it asks

  11. Set up the network:

    1. DNS can be 1.1.1.1,8.8.8.8

    2. Disable unused networks

    3. You can use a subnet calculator to figure out the subnet mask (e.g., a /24 prefix corresponds to 255.255.255.0)

  12. For the disk: if the node has an OS drive mirror, use the custom layout (a quick verification example follows this list):

    1. Delete all existing MD arrays

    2. Click the drives you’re going to use, choose reformat

    3. Add unformatted GPT partitions to the drives

    4. Create MD array with those partitions

    5. For the 2nd disk, choose “Add as another boot device”

    6. Create an ext4 GPT partition on the created MD array

    7. Proceed with installation

  13. For the username, choose nautilus

  14. Choose to install the SSH server, and optionally import a key from GitHub

  15. Don’t install any additional packages

  16. At the end, disconnect the virtual media and reboot

  17. After the node boots, make the nautilus user a sudoer with NOPASSWD:

    1. Run sudo visudo and set the %sudo line to %sudo ALL=(ALL:ALL) NOPASSWD:ALL

    2. Add mtu: 9000 to /etc/netplan/00-installer-config.yaml, then run netplan apply. The mtu key goes under the ethernet device in the ethernets section (see the netplan excerpt after this list).
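
After the first boot, one way to confirm that the OS drive mirror from item 12 was assembled correctly (these are standard Linux commands; the array and device names they print will differ per node):

    cat /proc/mdstat
    lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT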
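
For the mtu change in item 17, a minimal netplan excerpt showing where the key goes (the interface name eno1 is only an example; keep the device names the installer generated), followed by applying it:

    # /etc/netplan/00-installer-config.yaml (excerpt)
    network:
      ethernets:
        eno1:          # example interface name
          mtu: 9000

    sudo netplan apply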

Steps for NRP Administrators

The steps below are meant for NRP administrators and do not need to be performed by site system administrators.

  1. Make changes in the Ansible inventory file if needed. The node should be in the proper region and zone section, with zone labels added (an illustrative sketch follows this list).

  2. Generate a join_token by logging into the controller and running:

    kubeadm token create

  3. Run the Ansible playbook according to the docs:

    ansible-playbook setup.yml -l <node> -e join_token=...
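
The inventory layout is defined by the Nautilus Ansible repository, so the sketch below is purely illustrative (the group name and hostname are made up); a node placed in its zone group might look like:

    # Ansible inventory excerpt, illustrative only; follow the structure already in the repo
    [unl]
    node-1.example.org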

Labels added by Ansible

Check that proper labels were added by Ansible:

host-endpoint: "true"

mtu: "9000"

nautilus.io/network: "10000" - network speed (10000/40000/100000), needed for perfSONAR MaDDash

netbox.io/site: UNL - the slug of the NetBox site (must already exist in NetBox)

topology.kubernetes.io/region: us-central - region (us-west, us-east, etc.)

topology.kubernetes.io/zone: unl - zone
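
To check which labels a node currently has (node_name as in the command below):

    kubectl get node node_name --show-labels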

To set all labels run:

    kubectl label node node_name nautilus.io/network="10000" netbox.io/site="UNL" topology.kubernetes.io/region="us-central" topology.kubernetes.io/zone="unl"

Cluster firewall

The node’s CIDR should be in the https://gitlab.nrp-nautilus.io/prp/calico/-/blob/master/networksets.yaml list for the node to be accessible by other cluster nodes.
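
That file holds Calico network set resources; the sketch below assumes the GlobalNetworkSet form and uses placeholder name, label, and CIDR values, so mirror the existing entries in the repo rather than copying it verbatim:

    # illustrative only
    apiVersion: projectcalico.org/v3
    kind: GlobalNetworkSet
    metadata:
      name: example-site
      labels:
        role: cluster-nodes   # placeholder label
    spec:
      nets:
        - 192.0.2.0/24        # the new node's CIDR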