Following a question I saw in one of the online forums, I decided to write a step-by-step guide on how to recover a failed node when the host is no longer available.
If you lose a node in your cluster and can no longer access it, you won't be able to remove it from the database, because the node is not up. You have to fix this another way.
Before you jump in head first, remember that if you use this solution in your environment, I cannot be held responsible for any damage caused.
So let's see how we can destroy one of the hosts and recover the entire host plus its Vertica node.
In this scenario I will be using a Vertica cluster with 3 nodes running on AWS, as shown below:
My database is up and running, as can be seen.
I'll go ahead and terminate one of the nodes (this simulates the loss of the node):
Next, I can see that my database is still running (because of built-in HA) but is missing a node (the one we dropped).
So how can we fix this?
Here is the list of steps:
Create a new EC2 instance.
Install the required packages.
Download the Vertica rpm and install it.
Create the dbadmin user and the verticadba group.
Enable password-less ssh access between the nodes for the dbadmin user.
Create the same directory structure as on the lost node.
Copy the admintools.conf file.
Create the catalog location.
Recover the database using the force option.
Create a new EC2 instance.
It has to be in the same VPC and on the same subnet as the lost node.
Give it the same IP address as the lost node, in our case 10.333.333.226.
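If you prefer to script this step rather than use the AWS console, here is a minimal sketch that assumes Python with the boto3 library and configured AWS credentials. The helper names, the AMI ID, the instance type, the subnet ID and the key name are all hypothetical placeholders; only the IP address comes from this walkthrough. The point it illustrates is pinning the new instance to the lost node's subnet and private IP.

```python
def build_run_instances_params(ami_id, instance_type, subnet_id, private_ip, key_name):
    """Build keyword arguments for the EC2 run_instances call (hypothetical helper)."""
    return {
        "ImageId": ami_id,
        "InstanceType": instance_type,
        "SubnetId": subnet_id,            # same subnet as the lost node
        "PrivateIpAddress": private_ip,   # reuse the lost node's private IP
        "KeyName": key_name,
        "MinCount": 1,
        "MaxCount": 1,
    }


def launch_replacement(params):
    """Launch the instance; needs boto3 installed and AWS credentials configured."""
    import boto3  # imported here so the helper above works without boto3

    ec2 = boto3.client("ec2")
    response = ec2.run_instances(**params)
    return response["Instances"][0]["InstanceId"]


if __name__ == "__main__":
    # Placeholder values: replace them with the ones from your own account.
    params = build_run_instances_params(
        ami_id="ami-PLACEHOLDER",
        instance_type="m4.large",
        subnet_id="subnet-PLACEHOLDER",
        private_ip="10.333.333.226",
        key_name="my-key",
    )
    print(params)
    # launch_replacement(params)  # uncomment to actually create the instance
```

The launch call is left commented out on purpose, so running the script only prints the parameters for review.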
Install the required packages and set up the host for the Vertica installation.
For this task use the script below unless you have an AMI template ready.
Download the Vertica rpm and install it.
Make sure you use the same version as the one on the other two nodes.
Create the dbadmin user and the verticadba group.
Run the command on the 10.333.333.226 node.
Enable password-less ssh access between the nodes for the dbadmin user.
This has to be enabled between all the nodes.
I won't describe how this is done here, since it is straightforward and there are many sources on the internet.
Create the same directory structure as on the lost node.
To see which directories you require, open the /opt/vertica/config/admintools.conf file on one of the good nodes.
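As a rough illustration of reading those directories, the sketch below parses the node entries with Python's configparser. It assumes the file has a [Nodes] section whose entries look like `node_name = host,catalog_base,data_base`; verify that against your own file before relying on it. The sample content, the database name mydb, the 10.0.0.x addresses and the function name are hypothetical.

```python
import configparser

# Hypothetical sample in the layout this sketch assumes for admintools.conf.
SAMPLE_CONF = """\
[Cluster]
hosts = 10.0.0.1,10.0.0.2,10.0.0.3

[Nodes]
v_mydb_node0001 = 10.0.0.1,/vertica/catalog,/vertica/data
v_mydb_node0002 = 10.0.0.2,/vertica/catalog,/vertica/data
v_mydb_node0003 = 10.0.0.3,/vertica/catalog,/vertica/data
"""


def node_directories(conf_text):
    """Return {node_name: {host, catalog_base, data_base}} from the [Nodes] section."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep node names exactly as written
    parser.read_string(conf_text)

    nodes = {}
    for name, value in parser.items("Nodes"):
        parts = [part.strip() for part in value.split(",")]
        if len(parts) < 3:
            continue  # skip entries that do not match the assumed layout
        host, catalog_base, data_base = parts[:3]
        nodes[name] = {
            "host": host,
            "catalog_base": catalog_base,
            "data_base": data_base,
        }
    return nodes


if __name__ == "__main__":
    # On a good node you would read the real file instead of the sample:
    #   conf_text = open("/opt/vertica/config/admintools.conf").read()
    for name, info in node_directories(SAMPLE_CONF).items():
        print(name, info)
```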
Copy the admintools.conf file.
Copy the admintools.conf file from one of the good nodes onto the new node.
Create the catalog location.
You need to create the catalog location on the new node.
Since it is node 3 that we lost, we need to recreate the catalog directory structure as it was before. Look at how it is laid out on one of the good nodes and replicate it with the node0003 naming.
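The naming can be sketched as follows. This assumes the catalog directory follows the pattern `<catalog_base>/<db_name>/v_<db_name>_node0003_catalog`, which is an assumption you should confirm against a good node, as described above. The database name mydb, the base path and the function names are hypothetical.

```python
import os
import tempfile


def catalog_path(catalog_base, db_name, node_number):
    """Build the catalog directory path for a node under the assumed naming pattern."""
    node_name = "v_{}_node{:04d}".format(db_name, node_number)
    return os.path.join(catalog_base, db_name, node_name + "_catalog")


def create_catalog_dir(catalog_base, db_name, node_number):
    """Create the catalog directory (and parents) and return its path."""
    path = catalog_path(catalog_base, db_name, node_number)
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    # Demonstrated in a temporary directory; on the real node you would pass
    # the catalog base found in admintools.conf and run this as dbadmin.
    with tempfile.TemporaryDirectory() as base:
        print(create_catalog_dir(base, "mydb", 3))
```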
Recover the database using the force option.
Now that everything is ready, we need to recover the node. We use the --force option, which enables auto recovery of the node.
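For reference, one way to issue the restart is through the admintools `restart_node` tool. The sketch below only builds and prints the command so it can be reviewed before running it as dbadmin on one of the good nodes. The database name mydb and the function name are hypothetical, and the admintools path assumes the default /opt/vertica installation; check `admintools -t restart_node --help` on your version to confirm the options.

```python
import shlex

ADMINTOOLS = "/opt/vertica/bin/admintools"  # assumed default install location


def restart_node_command(db_name, host):
    """Build the admintools command that restarts one node with --force."""
    return [
        ADMINTOOLS,
        "-t", "restart_node",
        "-d", db_name,   # database name
        "-s", host,      # host of the node to restart
        "--force",       # force the start and let the node auto recover
    ]


if __name__ == "__main__":
    cmd = restart_node_command("mydb", "10.333.333.226")
    print(" ".join(shlex.quote(arg) for arg in cmd))
    # To execute it for real (as dbadmin):
    #   import subprocess
    #   subprocess.run(cmd, check=True)
```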
See if all nodes are up:
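If you want to check the node states from a script, here is a small sketch that scans the text printed by `admintools -t list_allnodes` for nodes that are not UP. It assumes the output is a pipe-separated table with the node name in the first column and the state in the third; the sample output, node names and addresses are hypothetical, so compare them with what your version actually prints.

```python
# Hypothetical sample of the table layout this sketch assumes.
SAMPLE_OUTPUT = """\
 Node            | Host     | State      | Version | DB
-----------------+----------+------------+---------+------
 v_mydb_node0001 | 10.0.0.1 | UP         | x.y.z   | mydb
 v_mydb_node0002 | 10.0.0.2 | UP         | x.y.z   | mydb
 v_mydb_node0003 | 10.0.0.3 | RECOVERING | x.y.z   | mydb
"""


def nodes_not_up(output):
    """Return the names of nodes whose state column is anything other than UP."""
    not_up = []
    for line in output.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        # Skip the separator line, the header row and anything malformed.
        if len(cells) < 3 or cells[0] in ("", "Node"):
            continue
        if cells[2] != "UP":
            not_up.append(cells[0])
    return not_up


if __name__ == "__main__":
    pending = nodes_not_up(SAMPLE_OUTPUT)
    if pending:
        print("Not UP yet:", ", ".join(pending))
    else:
        print("All nodes are UP")
```

A node that is still recovering shows up in the list until it reaches the UP state, so the check can simply be repeated.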
Conclusion:
Yes, it can be done! It is not a solution I recommend; it is just a workaround. But sometimes when you get stuck you have to handle it the way you can :).