Following up on a question I saw in one of the online forums, I decided to write a step-by-step guide on how to recover a failed node when its host is no longer available.
[dbadmin@ip-10-333-333-210 ~]$ admintools -t list_allnodes
Node | Host | State | Version | DB
----------------+---------------+-------+-----------------+-----
v_db1_node0001 | 10.333.333.210 | UP | vertica-7.2.1.0 | db1
v_db1_node0002 | 10.333.333.211 | UP | vertica-7.2.1.0 | db1
v_db1_node0003 | 10.333.333.226 | UP | vertica-7.2.1.0 | db1
[root@ip-10-333-333-210 ~]# aws ec2 terminate-instances --instance-ids i-5d65634232
-------------------------------
| TerminateInstances |
+-----------------------------+
|| TerminatingInstances ||
|+---------------------------+|
|| InstanceId ||
|+---------------------------+|
|| i-5d656683 ||
|+---------------------------+|
||| CurrentState |||
||+-------+-----------------+||
||| Code | Name |||
||+-------+-----------------+||
||| 32 | shutting-down |||
||+-------+-----------------+||
||| PreviousState |||
||+---------+---------------+||
||| Code | Name |||
||+---------+---------------+||
||| 16 | running |||
||+---------+---------------+||
[dbadmin@ip-10-333-333-210 ~]$ admintools -t list_allnodes
Node | Host | State | Version | DB
----------------+---------------+-------+-----------------+-----
v_db1_node0001 | 10.333.333.210 | UP | vertica-7.2.1.0 | db1
v_db1_node0002 | 10.333.333.211 | UP | vertica-7.2.1.0 | db1
v_db1_node0003 | 10.333.333.226 | DOWN | unavailable | db1
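If you want to detect the failed node from a script instead of reading the table by eye, a small awk filter over the list_allnodes output is enough. This is a sketch that assumes the pipe-separated layout shown above; `down_nodes` is a helper name made up for illustration, not part of admintools.

```shell
# Print the names of nodes whose State column is DOWN in
# `admintools -t list_allnodes` output read from stdin.
# Assumes the pipe-separated layout: Node | Host | State | Version | DB
down_nodes() {
  awk -F'|' '{
    name = $1; state = $3
    gsub(/[[:space:]]/, "", name)   # trim padding around the node name
    gsub(/[[:space:]]/, "", state)  # trim padding around the state
    if (state == "DOWN") print name
  }'
}
```

Usage: `admintools -t list_allnodes | down_nodes` should print `v_db1_node0003` for the cluster state above.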
Create a new EC2 instance.
Install the required packages and set up the host for the Vertica installation.
For this task, use the script below unless you already have an AMI template ready.
yum install openssh which dialog gdb mcelog sysstat rsync python* telnet ruby* java* sudo openssh-server openssh-clients ntp wget -y
chkconfig sshd on
service sshd start
service iptables save
service iptables stop
chkconfig iptables off
chkconfig ntpd on
service ntpd start
echo 'session required pam_limits.so' >> /etc/pam.d/su
echo '# Controls the default maximum open files' >> /etc/sysctl.conf
echo 'fs.file-max = 65536' >> /etc/sysctl.conf
echo '# Controls the default maximum size of a message queue' >> /etc/sysctl.conf
echo 'kernel.msgmnb = 65536' >> /etc/sysctl.conf
echo '# Controls the maximum size of a message, in bytes' >> /etc/sysctl.conf
echo 'kernel.msgmax = 65536' >> /etc/sysctl.conf
echo '# Controls the maximum shared segment size, in bytes' >> /etc/sysctl.conf
echo 'kernel.shmmax = 68719476736' >> /etc/sysctl.conf
echo '# Controls the maximum number of shared memory segments, in pages' >> /etc/sysctl.conf
echo 'kernel.shmall = 4294967296' >> /etc/sysctl.conf
echo '# The following 1 line added by Vertica tools.' >> /etc/sysctl.conf
echo 'vm.max_map_count = 503831' >> /etc/sysctl.conf
echo 'vm.swappiness = 10' >> /etc/sysctl.conf
# Limits
echo 'dbadmin - nproc 4096' >> /etc/security/limits.conf
echo 'dbadmin - fsize unlimited' >> /etc/security/limits.conf
echo 'dbadmin - nofile 65536' >> /etc/security/limits.conf
echo 'dbadmin - nice 0' >> /etc/security/limits.conf
# Disk readahead set to 4096: apply it now and persist it in rc.local
for DISK in $(df | grep vertica | awk '{print $1}'); do
  blockdev --setra 4096 "$DISK"
  echo "blockdev --setra 4096 $DISK" >> /etc/rc.d/rc.local
done
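# Disable transparent hugepages
# (on RHEL/CentOS 6 the path may be /sys/kernel/mm/redhat_transparent_hugepage/ instead)
echo never > /sys/kernel/mm/transparent_hugepage/enabled
# Disable defrag
echo never > /sys/kernel/mm/transparent_hugepage/defrag
# I/O scheduler set to deadline; the mount point name must contain "vertica",
# and the device is assumed to be a whole disk (e.g. /dev/xvdb), not a partition
for DISK in $(df | grep vertica | awk '{print $1}' | sed 's#^/dev/##'); do
  echo deadline > /sys/block/$DISK/queue/scheduler
done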
sysctl -p
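One caveat with the script above: running it twice appends every line twice. A hypothetical `append_once` helper, sketched below, only appends a line when that exact line is not already in the file. It is not part of Vertica or the OS tooling, just a small convenience you could swap in for the `echo ... >>` lines.

```shell
# Append LINE to FILE only if the exact line is not already present,
# so re-running the host-preparation script does not duplicate entries.
# Usage: append_once 'vm.swappiness = 10' /etc/sysctl.conf
append_once() {
  local line=$1 file=$2
  # -x: whole-line match, -F: fixed string, -q: no output.
  # A missing file makes grep fail, so the line is appended (creating the file).
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}
```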
Download the Vertica RPM and install it.
Make sure you use the same version that is running on the other two nodes.
[root@ip-10-333-333-226 tmp]# rpm -ihv vertica-7.2.1-0.x86_64.RHEL6.rpm
Preparing... ################################# [100%]
Updating / installing...
1:vertica-7.2.1-0 ################################# [100%]
Vertica Analytic Database V7.2.1-0 successfully installed on host ip-10-333-333-226
To complete your NEW installation and configure the cluster, run:
/opt/vertica/sbin/install_vertica
To complete your Vertica UPGRADE, run:
/opt/vertica/sbin/update_vertica
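Since the versions must match, it is worth checking before going further. The simplest check is to run `rpm -q vertica` on a healthy node and on the new node and compare the two strings. If you would rather compare against the Version column of `admintools -t list_allnodes`, the hypothetical helper below rewrites the RPM name into that form. The mapping (`7.2.1-0` becoming `7.2.1.0`) is inferred from the two outputs in this post, not from Vertica documentation, so treat it as an assumption.

```shell
# Convert an RPM package name (as printed by `rpm -q vertica`) into the
# form shown in the Version column of list_allnodes, e.g.
#   vertica-7.2.1-0.x86_64  ->  vertica-7.2.1.0
# Assumption: mapping inferred from the outputs in this post only.
rpm_to_admintools_version() {
  printf '%s\n' "$1" | sed -e 's/\.x86_64$//' -e 's/-\([0-9][0-9]*\)$/.\1/'
}
```

Usage on the new node: `rpm_to_admintools_version "$(rpm -q vertica)"`, then compare the result with the Version column on a healthy node.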
Create the dbadmin user and the verticadba group.
Run these commands on the 10.333.333.226 node.
useradd dbadmin
groupadd verticadba
gpasswd -a dbadmin verticadba
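To confirm the group membership took effect, a quick check built on `id -nG` works. `in_group` is a hypothetical helper name used only for illustration.

```shell
# Succeed if USER belongs to GROUP (primary or supplementary).
# Usage: in_group dbadmin verticadba && echo "dbadmin is in verticadba"
in_group() {
  # id -nG lists all group names for the user, space-separated;
  # one name per line lets grep match a whole group name exactly.
  id -nG "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}
```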
Enable passwordless SSH access for the dbadmin user.
The dbadmin user on every node must be able to log in to every other node without a password.
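One way to set this up is sketched below. It assumes OpenSSH's `ssh-keygen` and `ssh-copy-id` are installed and that dbadmin has a password on every node for the initial key copy (on the new node, set one with `passwd dbadmin`). The host list is copied from the listings above; replace it with your real addresses. The function names are hypothetical.

```shell
# Hypothetical host list: the three node addresses from this post.
HOSTS="10.333.333.210 10.333.333.211 10.333.333.226"

# Run as dbadmin on EACH node: create a key pair if none exists,
# then push the public key to every node (including this one).
setup_dbadmin_keys() {
  mkdir -p -m 700 "$HOME/.ssh"
  [ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -t rsa -N '' -f "$HOME/.ssh/id_rsa"
  for h in $HOSTS; do
    ssh-copy-id "dbadmin@$h"   # prompts for dbadmin's password once per host
  done
}

# Succeed only if every given host accepts a key-based login.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_passwordless() {
  local h
  for h in "$@"; do
    ssh -o BatchMode=yes "dbadmin@$h" true || { echo "no key login to $h" >&2; return 1; }
  done
}
```

After running `setup_dbadmin_keys` on all three nodes, `check_passwordless $HOSTS` from each node should return without any prompt.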
Create the same directory structure that existed on the lost node.
To see which directories you need, open the /opt/vertica/config/admintools.conf file on one of the healthy nodes.
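The [Nodes] section of admintools.conf records, for each node, its address followed by its catalog and data base paths. The sketch below pulls those two paths out for one node. It assumes entries shaped like `v_db1_node0003 = 10.333.333.226,/vertica_catalog,/vertica_data`, so check your own file first; `node_dirs` is a hypothetical helper. Additional storage locations (presumably a TEMP location such as /mnt/vertica_temp here) are, as far as I know, not recorded in that file; running `SELECT node_name, location_path, location_usage FROM v_catalog.storage_locations;` in vsql on a healthy node should list them.

```shell
# Print the catalog and data base paths recorded for NODE in the
# [Nodes] section of an admintools.conf file, separated by a space.
# Assumed entry format: NODE = ADDRESS,CATALOG_PATH,DATA_PATH
# Usage: node_dirs /opt/vertica/config/admintools.conf v_db1_node0003
node_dirs() {
  local conf=$1 node=$2
  awk -v node="$node" '
    /^\[/ { in_nodes = ($0 ~ /^\[Nodes\]/); next }   # track current section
    in_nodes {
      split($0, kv, "=")
      key = kv[1]; val = kv[2]
      gsub(/[[:space:]]/, "", key); gsub(/[[:space:]]/, "", val)
      if (key == node) {
        split(val, parts, ",")
        print parts[2], parts[3]   # catalog path, data path
      }
    }' "$conf"
}
```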
# create dirs
mkdir /vertica_catalog
mkdir /vertica_data
mkdir /mnt/vertica_temp
# grant ownership to dbadmin
chown -R dbadmin:verticadba /vertica_catalog
chown -R dbadmin:verticadba /vertica_data
chown -R dbadmin:verticadba /mnt/vertica_temp
chown -R dbadmin:verticadba /opt/vertica
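Before moving on, it helps to confirm the ownership actually changed, since a directory left owned by root is likely to cause problems when the node starts. `check_owner` is a hypothetical helper and relies on GNU `stat -c %U`.

```shell
# Report any given path that is not owned by USER; fail if any mismatch.
# Usage: check_owner dbadmin /vertica_catalog /vertica_data /mnt/vertica_temp /opt/vertica
check_owner() {
  local user=$1 rc=0 dir owner
  shift
  for dir in "$@"; do
    if ! owner=$(stat -c %U -- "$dir"); then
      rc=1          # path missing or unreadable; stat already printed why
      continue
    fi
    if [ "$owner" != "$user" ]; then
      echo "$dir is owned by $owner, expected $user" >&2
      rc=1
    fi
  done
  return "$rc"
}
```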
Copy the admintools.conf file.
Copy admintools.conf from one of the healthy nodes to the new node.
[dbadmin@ip-10-333-333-210 .ssh]$ scp /opt/vertica/config/admintools.conf dbadmin@10.333.333.226:/opt/vertica/config/admintools.conf
dbadmin@10.333.333.226's password:
admintools.conf
Create the catalog location.
You need to create the catalog location on the new node.
[root@ip-10-333-333-226 vertica_catalog]# mkdir -p /vertica_catalog/db1/v_db1_node0003_catalog/Catalog
[root@ip-10-333-333-226 vertica_catalog]# chown -R dbadmin:verticadba /vertica_catalog
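In this example the catalog path follows the pattern `<catalog base>/<database>/<node name>_catalog/Catalog`. A hypothetical helper that builds it makes the same step reusable for another node number; the pattern is taken from this post, so confirm it against a healthy node's catalog directory.

```shell
# Build the catalog directory path for a node, following the layout
# seen in this post: BASE/DB/NODE_catalog/Catalog
# Usage (as root):
#   mkdir -p "$(catalog_path /vertica_catalog db1 v_db1_node0003)"
#   chown -R dbadmin:verticadba /vertica_catalog
catalog_path() {
  printf '%s/%s/%s_catalog/Catalog\n' "$1" "$2" "$3"
}
```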
Recover the node using the --force option.
Now that everything is in place, we need to recover the node. We use the --force option, which forces the node to start and recover automatically if necessary.
[dbadmin@ip-10-333-333-226 v_db1_node0003_catalog]$ /opt/vertica/bin/admintools -t restart_node -s 10.333.333.226 -d db1 --force
Info: no password specified, using none
*** Restarting nodes for database db1 ***
restart host 10.333.333.226 with catalog v_db1_node0003_catalog
issuing multi-node restart
Starting nodes:
v_db1_node0003 (10.333.333.226)
Starting Vertica on all nodes. Please wait, databases with large catalog may take a while to initialize.
Node Status: v_db1_node0001: (UP) v_db1_node0003: (DOWN)
Node Status: v_db1_node0001: (UP) v_db1_node0003: (DOWN)
Node Status: v_db1_node0001: (UP) v_db1_node0003: (RECOVERING)
Node Status: v_db1_node0001: (UP) v_db1_node0003: (UP)
Restart Nodes result: 1
[dbadmin@ip-10-333-333-226 Catalog]$ /opt/vertica/bin/admintools -t list_allnodes
Node | Host | State | Version | DB
----------------+---------------+-------+-----------------+-----
v_db1_node0001 | 10.333.333.210 | UP | vertica-7.2.1.0 | db1
v_db1_node0002 | 10.333.333.211 | UP | vertica-7.2.1.0 | db1
v_db1_node0003 | 10.333.333.226 | UP | vertica-7.2.1.0 | db1
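To wait for recovery from a script instead of watching the Node Status lines, you can poll `list_allnodes` until the node reports UP (it passes through RECOVERING first). The sketch below takes the listing command as arguments so it can be tried without a cluster; `wait_until_up` and the `POLL_SECONDS` variable are illustrative names, not admintools features.

```shell
# Poll a node-listing command until NODE shows State UP.
# Usage: wait_until_up NODE MAX_TRIES COMMAND [ARGS...]
# POLL_SECONDS (default 10) sets the pause between polls.
wait_until_up() {
  local node=$1 tries=$2 i=0 state
  shift 2
  while [ "$i" -lt "$tries" ]; do
    # Extract the State column for NODE from the pipe-separated listing.
    state=$("$@" | awk -F'|' -v n="$node" '{
      k = $1; s = $3
      gsub(/[[:space:]]/, "", k); gsub(/[[:space:]]/, "", s)
      if (k == n) print s
    }')
    echo "$(date '+%H:%M:%S') $node: ${state:-unknown}"
    [ "$state" = "UP" ] && return 0
    i=$((i + 1))
    sleep "${POLL_SECONDS:-10}"
  done
  return 1
}
```

Usage: `wait_until_up v_db1_node0003 60 /opt/vertica/bin/admintools -t list_allnodes` waits up to roughly ten minutes with the default interval.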
Conclusion: