Clustering and High Availability with Proxmox

One of my favorite features of Proxmox is its ability to form clusters from multiple nodes (servers).

For example, if you have 3 physical servers with Proxmox installed on each of them, you can cluster them together. This way, if you need to take a node down for maintenance, you can simply move the VMs on that node to another server without incurring any downtime.

The other thing you can do, if you have at least 3 physical servers in a cluster, is set up high availability (HA) between them.

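Proxmox requires at least 3 nodes for this because HA depends on quorum: the cluster only acts when a majority of nodes can see each other and agree on its state. With 3 nodes, the remaining two still form a majority if one goes down unexpectedly, so the VMs from the failed node can be restarted on one of the survivors. You will also need a NAS or some other type of shared storage for this to work correctly.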
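To make the 3-node requirement concrete, here is a small sketch of the majority arithmetic. It assumes every node carries exactly one vote, and the helper names `quorum_needed` and `tolerated_failures` are my own, not Proxmox commands.

```shell
#!/bin/sh
# Hypothetical helper: votes needed for a majority, assuming one vote per node.
quorum_needed() {
    total_votes=$1
    echo $(( total_votes / 2 + 1 ))
}

# Hypothetical helper: how many nodes can fail while the rest still hold a majority.
tolerated_failures() {
    total_votes=$1
    echo $(( total_votes - (total_votes / 2 + 1) ))
}

for n in 2 3 5; do
    echo "$n nodes: quorum=$(quorum_needed "$n"), can lose $(tolerated_failures "$n")"
done
```

With two nodes, both votes are needed for a majority, so losing either node costs you quorum. With three, one node can fail and the other two still outvote it.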

In this tutorial, I will show you how to create a cluster out of three nodes, set up high availability, and remove nodes from a cluster. That last part doesn't seem to be well-documented officially, and it can get messy if you do it the wrong way.

Creating a 3 node cluster in Proxmox

This will require at least 3 physical Proxmox servers. You can create a cluster with two, but HA will not function properly if there are fewer than 3 nodes in a cluster. Log into the first server, click on the "Datacenter" tab, then click on "Cluster" and "Create Cluster":

Create Cluster in Proxmox

Give the cluster a name and then click on "Create":

Name the cluster and create

As usual with Proxmox, the cluster creates itself and returns "TASK OK" when complete. Close the activity window:

Task OK for Cluster Creation

Go to your second server, click the "Datacenter" tab once again, then click "Cluster" and "Join Cluster":

Click Join Cluster in second server

Go back to your first server and click on "Join Information" followed by "Copy Information":

Copy Join information from first server

On the second server, press Ctrl+V in the information window to populate the fields. Make sure you enter the root password of the first server in the field below it. If everything looks good, click "Join testcluster" (the button label will vary depending on the name you gave your cluster):

Paste the Join Information

The join operation will commence. If it hangs, just refresh the entire page and log in again. You'll end up with a screen like this:

Join operation completed

You will now have 2 nodes in your cluster, like so:

Two nodes in your cluster

To add the third node, log into it, click "Cluster" again under the Datacenter tab, and then click "Join Cluster". Go back to the first node, copy the "Join Information", and paste it into the Join Information box on the third node, along with the root password of the first node. This is the same procedure you followed for the second node. You'll end up with all three nodes being shown on all Proxmox installations like this:

All three nodes listed
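The same cluster can also be built from the shell with `pvecm`, the tool used later in this article for removing nodes. Below is a rough sketch that only prints the commands unless `DRY_RUN=0` is set. The `run`, `create_cluster`, and `join_cluster` helper names are mine, and `192.0.2.10` is a placeholder for the IP address of your first node.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Run on the FIRST node: create a new cluster with the given name.
create_cluster() {
    run pvecm create "$1"
}

# Run on each JOINING node: point it at the IP of an existing cluster member.
# When executed for real, pvecm asks for that member's root password.
join_cluster() {
    run pvecm add "$1"
}

create_cluster testcluster      # on the first node
join_cluster 192.0.2.10         # on the second node, then again on the third
```

Keeping the dry-run default means you can read exactly what would be executed before committing to it on a real node.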

High Availability with a 3 node cluster

The nice thing about having HA is that if a node (server) goes down for whatever reason, the virtual machines on that server will automatically be brought back up on a different node. This relies on quorum: the nodes vote, and only the part of the cluster that holds a majority of the votes is allowed to take over the VMs from the failed node. It is possible to set up a "quorum node", which is a node that doesn't host any virtual machines but just votes. That is out of the scope of this article, however. For this walkthrough, I have my "test" VM set up on my first server, with its disk stored on my NAS.

To start, log into one of your nodes, click on the "Datacenter" tab, and then click on the "HA" option in the menu to the right of it.

Click on Add button

Click on "Add" and select the desired VM. I only have one VM, so that's the one I'm using.

Add desired Virtual Machine

You'll now see the VM in the list of machines that are being monitored for HA. You'll notice that if you shut down the node that the virtual machine is being hosted on, the VM will automatically move itself to a node that's powered on.

List of machines that are being monitored for HA
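If you prefer the command line, the `ha-manager` tool can do the same thing. This is a minimal dry-run sketch: the helper names are mine, and VM ID `100` is an assumption, so substitute the ID of your own VM. Nothing is executed unless you set `DRY_RUN=0`.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: put a VM under HA management and ask for it to be kept running.
ha_add_vm() {
    run ha-manager add "vm:$1" --state started
}

# Hypothetical wrapper: show the HA manager's view of nodes and managed services.
ha_show_status() {
    run ha-manager status
}

ha_add_vm 100       # 100 is an assumed VM ID
ha_show_status
```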

Removing a node from a Cluster

The first thing you want to do is move all the VMs off of the node you're going to remove. Also make sure you have backups of everything, just in case something goes wrong. To migrate a VM, right-click on it and click "Migrate":

Right-click and select Migrate

In the window that pops up, select the node you want to move the VM to and click "Migrate". Make sure the VM's virtual CD/DVD drive is empty when you do this, or it will complain and refuse to migrate.

Select the target node

The virtual machine will take a couple of moments to migrate, but it should give you a TASK OK if everything went well.

Task OK for migration
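The migration can also be started from a shell on the node that currently hosts the VM, using `qm migrate`. The sketch below only prints the command by default. The `migrate_vm` helper is mine, VM ID `100` is an assumption, and `proxmox2` is the target node name from this walkthrough.

```shell
#!/bin/sh
# Print commands by default; set DRY_RUN=0 to actually execute them.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper: live-migrate VM $1 to target node $2.
# --online keeps a running VM up during the move; drop it for a stopped VM.
migrate_vm() {
    run qm migrate "$1" "$2" --online
}

migrate_vm 100 proxmox2     # 100 is an assumed VM ID
```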

At this point you should make sure you're not logged into the node you want to remove. If you are, go ahead and log out of Proxmox.

SSH into a different node using the terminal as shown below:

ssh root@192.168.122.53

In my case, I'm going to be removing the node named "test". You can see I'm logged into the proxmox2 node because it is marked "(local)".

root@proxmox2:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         1          1 test
         2          1 proxmox2 (local)
         3          1 proxmox3
root@proxmox2:~# 
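Since removing the wrong node is an easy mistake to make, it can help to check the membership list in a script first. The sketch below parses `pvecm nodes` output in the format shown above. The function names are my own, and `sample_nodes` simply replays the listing from this article so the example runs anywhere.

```shell
#!/bin/sh
# Replays the membership listing from this article (stand-in for `pvecm nodes`).
sample_nodes() {
    cat <<'EOF'

Membership information
----------------------
    Nodeid      Votes Name
         1          1 test
         2          1 proxmox2 (local)
         3          1 proxmox3
EOF
}

# Print the name of the node marked "(local)".
local_node() {
    awk '$NF == "(local)" { print $3 }'
}

# Print the names of all other member nodes, one per line.
other_nodes() {
    awk '$1 ~ /^[0-9]+$/ && $NF != "(local)" { print $3 }'
}

# Succeed only if $1 is a listed member and is NOT the local node.
is_remote_member() {
    awk -v target="$1" '
        $1 ~ /^[0-9]+$/ && $3 == target && $NF != "(local)" { found = 1 }
        END { exit !found }'
}

echo "logged into: $(sample_nodes | local_node)"
if sample_nodes | is_remote_member test; then
    echo "test is a remote member, so it can be removed from here"
fi
```

On a real node you would pipe the live command instead, for example `pvecm nodes | is_remote_member test`.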

Power down the node you are removing by clicking on it, and then clicking "Shutdown" on the right hand side:

Shutdown node

Go ahead and confirm the shutdown, and wait for the node to power down.

You can now remove the node from the cluster using the following command in your SSH session:

root@proxmox2:~# pvecm delnode test
Could not kill node (error = CS_ERR_NOT_EXIST)
Killing node 1
root@proxmox2:~# pvecm nodes

Membership information
----------------------
    Nodeid      Votes Name
         2          1 proxmox2 (local)
         3          1 proxmox3
root@proxmox2:~# 

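If you get a "Could not kill node (error = CS_ERR_NOT_EXIST)" error like the one above, it can be safely ignored as long as the node no longer appears in the "pvecm nodes" output.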

You can now verify that the node has been removed using the following command:

root@proxmox2:~# pvecm status
Date:             Wed Aug 16 10:47:23 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          2.11
Quorate:          Yes

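You can see that the node has been removed from the cluster by looking at the "Nodes" line in the quorum information, which now shows 2. The "Quorate: Yes" line confirms that the two remaining nodes still have quorum.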
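If you want to check this from a script rather than by eye, the "Key: value" lines of `pvecm status` are easy to pick apart. This is a small sketch with helper names of my own choosing. `sample_status` replays the output shown above so the example runs without a cluster.

```shell
#!/bin/sh
# Replays the status output from this article (stand-in for `pvecm status`).
sample_status() {
    cat <<'EOF'
Date:             Wed Aug 16 10:47:23 2023
Quorum provider:  corosync_votequorum
Nodes:            2
Node ID:          0x00000002
Ring ID:          2.11
Quorate:          Yes
EOF
}

# Print the value of the first line that starts with "<key>:".
status_field() {
    awk -v key="$1" 'index($0, key ":") == 1 {
        sub(/^[^:]*:[ \t]*/, "")
        print
        exit
    }'
}

# Succeed if stdin reports the expected node count ($1) and the cluster is quorate.
cluster_ok() {
    status=$(cat)
    nodes=$(printf '%s\n' "$status" | status_field Nodes)
    quorate=$(printf '%s\n' "$status" | status_field Quorate)
    [ "$nodes" = "$1" ] && [ "$quorate" = "Yes" ]
}

if sample_status | cluster_ok 2; then
    echo "cluster has 2 nodes and is quorate"
fi
```

On a real node you would run `pvecm status | cluster_ok 2` instead of using the sample.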

Note that if you want to re-add the node to the cluster, you'll have to reinstall Proxmox on the node, and then add it again using the instructions above.

✍🏻
Doron is a long-time system mangler who got his first taste of Linux compiling and configuring ircd servers from source in the mid 90s. He then delved into web hosting operations through reseller accounts and dedicated servers. Offline he plays bass, and is an avid music lover. He co-owns an internet radio station called Genesis Radio which plays all kinds of music 24x7 and features events and live shows. If you need hosting services, you can check out his current business, Genesis Hosting.