Understanding the Vertica Storage Mechanism

I find it fundamental for a DBA to understand what is going on under the hood of the database engine. One of the strong points of HP Vertica is its ROS and WOS storage model, so every DBA who works with Vertica should know the steps the database takes when storing data. Unlike databases that store data in tablespaces (logical groups of data files in Oracle) or filegroups (SQL Server), Vertica stores data in containers, and these containers come in two types: WOS and ROS.

  • WOS - Write Optimized Storage: a memory-resident area where data lands unsorted and uncompressed, but is immediately available for querying (think of it as a cache).
  • ROS - Read Optimized Storage: the on-disk area where data is stored sorted and compressed.
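A quick way to see how much data currently sits in each storage type is to query the storage_containers system table used throughout this demo. This is only a sketch; the column names match the output shown later in the post.

-- Sketch: summarize container count and size per node and storage type (WOS vs ROS)
select node_name,
       storage_type,
       count(*)        as container_count,
       sum(used_bytes) as used_bytes
from   storage_containers
group  by node_name, storage_type
order  by node_name, storage_type;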
In this demo I will show how data is stored in Vertica.
  • We start with an empty database and check that nothing is stored in the data location.
[dbadmin@DCXLG001 v_d_node0001_data]$ pwd
/home/dbadmin/d/v_d_node0001_data
[dbadmin@DCXLG001 v_d_node0001_data]$ tree -sD
.
+-- [       4096 Apr 16  8:22]  311
+-- [       4096 Apr 16  8:22]  315
+-- [       4096 Apr 16  8:22]  319
+-- [       4096 Apr 16  8:22]  321
+-- [       4096 Apr 16  8:22]  325
+-- [       4096 Apr 16  8:22]  329
+-- [       4096 Apr 16  8:52]  611
+-- [       4096 Apr 16  8:52]  615
+-- [       4096 Apr 16  8:52]  625
+-- [       4096 Apr 16  8:52]  629
+-- [       4096 Apr 16  8:22]  675
+-- [       4096 Apr 16  8:22]  679
+-- [       4096 Apr 16  8:22]  683
+-- [       4096 Apr 16  8:52]  783
+-- [       4096 Apr 16  8:52]  787
+-- [       4096 Apr 16  8:53]  821
+-- [       4096 Apr 16  8:53]  825
+-- [       4096 Apr 14 13:37]  RejectionTableData
  • Create a new table and insert a row into it.
create table one (id int);

-- insert data into table one
dbadmin= insert into one values(1);
OUTPUT
------
1
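Even though the row has not reached disk yet, it is immediately queryable, since WOS data is visible to queries:

-- the row is visible right away, even while it only lives in the WOS
select * from one;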
  • Check the data location on disk and see that, even though we have inserted a row into the table, no data has been written to the file system.
[dbadmin@DCXLG001 v_d_node0001_data]$ tree -sD
.
+-- [       4096 Apr 16  8:22]  311
+-- [       4096 Apr 16  8:22]  315
+-- [       4096 Apr 16  8:22]  319
+-- [       4096 Apr 16  8:22]  321
+-- [       4096 Apr 16  8:22]  325
+-- [       4096 Apr 16  8:22]  329
+-- [       4096 Apr 16  8:52]  611
+-- [       4096 Apr 16  8:52]  615
+-- [       4096 Apr 16  8:52]  625
+-- [       4096 Apr 16  8:52]  629
+-- [       4096 Apr 16  8:22]  675
+-- [       4096 Apr 16  8:22]  679
+-- [       4096 Apr 16  8:22]  683
+-- [       4096 Apr 16  8:52]  783
+-- [       4096 Apr 16  8:52]  787
+-- [       4096 Apr 16  8:53]  821
+-- [       4096 Apr 16  8:53]  825
+-- [       4096 Apr 14 13:37]  RejectionTableData
Why is there no data written to disk? Because by default Vertica stores incoming data in the WOS. As seen below, the storage container associated with our projection has storage_type WOS.
dbadmin= select * from storage_containers;
-[ RECORD 1 ]-------+------------------
node_name           | v_d_node0001
schema_name         | public
projection_id       | 45035996273732732
projection_name     | one_super
storage_type        | WOS
storage_oid         | 45035996273732743
total_row_count     | 1
deleted_row_count   | 0
used_bytes          | 16384
start_epoch         | 7
end_epoch           | 7
grouping            | ALL
segment_lower_bound | 0
segment_upper_bound | 4294967295
is_sorted           | f
location_label      |
delete_vector_count | 0
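If the database already contains other tables, the same check can be narrowed down to our projection only; a sketch using the columns from the output above:

-- Sketch: look only at the containers belonging to the one_super projection
select storage_type, total_row_count, used_bytes, is_sorted
from   storage_containers
where  projection_name = 'one_super';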
  • How do we move it to disk (make it persistent)?
Vertica has a background process called the Tuple Mover; its moveout task runs periodically, at an interval controlled by the MoveOutInterval parameter.
dbadmin= SELECT parameter_name, current_value FROM configuration_parameters where parameter_name='MoveOutInterval';
 parameter_name  | current_value
-----------------+---------------
 MoveOutInterval | 300
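If needed, the interval can be changed with the SET_CONFIG_PARAMETER function; the 120 below is just an illustrative value, not a recommendation.

-- Sketch: make the moveout task run every 120 seconds (example value only)
select set_config_parameter('MoveOutInterval', 120);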
We can also trigger this process on demand by using the run_tuple_mover_once() function.
dbadmin= select run_tuple_mover_once();
-[ RECORD 1 ]--------+---------------------
run_tuple_mover_once | Tuple mover ran once
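As an alternative to run_tuple_mover_once(), the moveout task can be targeted at a specific projection with DO_TM_TASK (the same function used later for the mergeout); a sketch:

-- Sketch: run only the moveout task, only for the one_super projection
select do_tm_task('moveout', 'one_super');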
After running run_tuple_mover_once(), the storage type changed to ROS and the data was written to disk.
dbadmin= select * from storage_containers;
-[ RECORD 1 ]-------+------------------
node_name           | v_d_node0001
schema_name         | public
projection_id       | 45035996273732732
projection_name     | one_super
storage_type        | ROS
storage_oid         | 45035996273732785
total_row_count     | 1
deleted_row_count   | 0
used_bytes          | 51
start_epoch         | 7
end_epoch           | 7
grouping            | PROJECTION
segment_lower_bound | 0
segment_upper_bound | 4294967295
is_sorted           | t
location_label      |
delete_vector_count | 0
Check that the data was written to the disk location.
[dbadmin@DCXLG001 v_d_node0001_data]$ tree -sD
¦   +-- [       4096 Apr 16  9:45]  45035996273732785
¦       +-- [          3 Apr 16  9:45]  45035996273732785_0.fdb
¦       +-- [         56 Apr 16  9:45]  45035996273732785_0.pidx
+-- [       4096 Apr 16  8:52]  787
+-- [       4096 Apr 16  9:45]  789
¦   +-- [       4096 Apr 16  9:45]  45035996273732789
¦       +-- [         48 Apr 16  9:45]  45035996273732789_0.fdb
¦       +-- [         56 Apr 16  9:45]  45035996273732789_0.pidx
+-- [       4096 Apr 16  8:53]  821
+-- [       4096 Apr 16  8:53]  825
+-- [       4096 Apr 16  9:45]  MvO_28.29.del
¦   +-- [       4096 Apr 16  9:45]  000000
+-- [       4096 Apr 14 13:37]  RejectionTableData
-note that the storage_oid column holds the identifier of the on-disk storage files (45035996273732785_*), as seen above.
  • Now let's insert some more data into our table, but this time using the /*+ direct */ hint.
  • This hint tells Vertica to skip the WOS and write the data directly to ROS.
dbadmin= insert /*+ direct */ into one values(1);
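The same idea applies to bulk loads: COPY accepts a DIRECT keyword that also bypasses the WOS. A sketch, where /tmp/one.csv is a made-up file path used only for illustration:

-- Sketch: bulk load straight into ROS (file path is hypothetical)
copy one from '/tmp/one.csv' delimiter ',' direct;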
  • Now check the containers that were created by the insert with the direct hint.
  • Note that the storage_type is ROS (because the direct hint was used).
dbadmin= select * from storage_containers;
-[ RECORD 1 ]-------+------------------
node_name           | v_d_node0001
schema_name         | public
projection_id       | 45035996273732732
projection_name     | one_super
storage_type        | ROS
storage_oid         | 45035996273732785
total_row_count     | 1
deleted_row_count   | 0
used_bytes          | 51
start_epoch         | 7
end_epoch           | 7
grouping            | PROJECTION
segment_lower_bound | 0
segment_upper_bound | 4294967295
is_sorted           | t
location_label      |
delete_vector_count | 0
-[ RECORD 2 ]-------+------------------
node_name           | v_d_node0001
schema_name         | public
projection_id       | 45035996273732732
projection_name     | one_super
storage_type        | ROS
storage_oid         | 45035996273732829
total_row_count     | 1
deleted_row_count   | 0
used_bytes          | 51
start_epoch         |
end_epoch           |
grouping            | PROJECTION
segment_lower_bound | 0
segment_upper_bound | 4294967295
is_sorted           | t
location_label      |
delete_vector_count | 0
  • Again check the data files on the file system.
-note that new files were added, e.g. 45035996273732833_0.*
[dbadmin@DCXLG001 v_d_node0001_data]$ tree -sD

¦   +-- [       4096 Apr 16  9:45]  45035996273732785
¦       +-- [          3 Apr 16  9:45]  45035996273732785_0.fdb
¦       +-- [         56 Apr 16  9:45]  45035996273732785_0.pidx
+-- [       4096 Apr 16  8:52]  787
+-- [       4096 Apr 16  9:45]  789
¦   +-- [       4096 Apr 16  9:45]  45035996273732789
¦       +-- [         48 Apr 16  9:45]  45035996273732789_0.fdb
¦       +-- [         56 Apr 16  9:45]  45035996273732789_0.pidx
+-- [       4096 Apr 16  8:53]  821
+-- [       4096 Apr 16  8:53]  825
+-- [       4096 Apr 16 10:12]  829
¦   +-- [       4096 Apr 16 10:12]  45035996273732829
¦       +-- [          3 Apr 16 10:12]  45035996273732829_0.fdb
¦       +-- [         56 Apr 16 10:12]  45035996273732829_0.pidx
+-- [       4096 Apr 16 10:12]  833
¦   +-- [       4096 Apr 16 10:12]  45035996273732833
¦       +-- [         48 Apr 16 10:12]  45035996273732833_0.fdb
¦       +-- [         56 Apr 16 10:12]  45035996273732833_0.pidx
+-- [       4096 Apr 14 13:37]  RejectionTableData
  • Insert some more data into the table.
dbadmin= insert /*+ direct */ into one values(1);
-[ RECORD 1 ]
OUTPUT | 1
  • See that one more container has been added to our projection one_super.
dbadmin= select * from storage_containers;
-[ RECORD 1 ]-------+------------------
node_name           | v_d_node0001
schema_name         | public
projection_id       | 45035996273732732
projection_name     | one_super
storage_type        | ROS
storage_oid         | 45035996273732785
total_row_count     | 1
deleted_row_count   | 0
used_bytes          | 51
start_epoch         | 7
end_epoch           | 7
grouping            | PROJECTION
segment_lower_bound | 0
segment_upper_bound | 4294967295
is_sorted           | t
location_label      |
delete_vector_count | 0
-[ RECORD 2 ]-------+------------------
node_name           | v_d_node0001
schema_name         | public
projection_id       | 45035996273732732
projection_name     | one_super
storage_type        | ROS
storage_oid         | 45035996273732829
total_row_count     | 1
deleted_row_count   | 0
used_bytes          | 51
start_epoch         |
end_epoch           |
grouping            | PROJECTION
segment_lower_bound | 0
segment_upper_bound | 4294967295
is_sorted           | t
location_label      |
delete_vector_count | 0
-[ RECORD 3 ]-------+------------------
node_name           | v_d_node0001
schema_name         | public
projection_id       | 45035996273732732
projection_name     | one_super
storage_type        | ROS
storage_oid         | 45035996273732873
total_row_count     | 1
deleted_row_count   | 0
used_bytes          | 51
start_epoch         |
end_epoch           |
grouping            | PROJECTION
segment_lower_bound | 0
segment_upper_bound | 4294967295
is_sorted           | t
location_label      |
delete_vector_count | 0
  • A new set of data files was also added to the data directory on the file system.
[dbadmin@DCXLG001 v_d_node0001_data]$ tree -sD
.
¦   +-- [       4096 Apr 16  9:45]  45035996273732785
¦       +-- [          3 Apr 16  9:45]  45035996273732785_0.fdb
¦       +-- [         56 Apr 16  9:45]  45035996273732785_0.pidx
+-- [       4096 Apr 16  8:52]  787
+-- [       4096 Apr 16  9:45]  789
¦   +-- [       4096 Apr 16  9:45]  45035996273732789
¦       +-- [         48 Apr 16  9:45]  45035996273732789_0.fdb
¦       +-- [         56 Apr 16  9:45]  45035996273732789_0.pidx
+-- [       4096 Apr 16  8:53]  821
+-- [       4096 Apr 16  8:53]  825
+-- [       4096 Apr 16 10:12]  829
¦   +-- [       4096 Apr 16 10:12]  45035996273732829
¦       +-- [          3 Apr 16 10:12]  45035996273732829_0.fdb
¦       +-- [         56 Apr 16 10:12]  45035996273732829_0.pidx
+-- [       4096 Apr 16 10:12]  833
¦   +-- [       4096 Apr 16 10:12]  45035996273732833
¦       +-- [         48 Apr 16 10:12]  45035996273732833_0.fdb
¦       +-- [         56 Apr 16 10:12]  45035996273732833_0.pidx
+-- [       4096 Apr 16 10:15]  873
¦   +-- [       4096 Apr 16 10:15]  45035996273732873
¦       +-- [          3 Apr 16 10:15]  45035996273732873_0.fdb
¦       +-- [         56 Apr 16 10:15]  45035996273732873_0.pidx
+-- [       4096 Apr 16 10:15]  877
¦   +-- [       4096 Apr 16 10:15]  45035996273732877
¦       +-- [         48 Apr 16 10:15]  45035996273732877_0.fdb
¦       +-- [         56 Apr 16 10:15]  45035996273732877_0.pidx
+-- [       4096 Apr 16 10:15]  CpD_32.33.del
¦   +-- [       4096 Apr 16 10:15]  000000
+-- [       4096 Apr 14 13:37]  RejectionTableData
Now let's see how we can merge all of these containers into a single container.
  • For this we will use DO_TM_TASK() with the mergeout option.
dbadmin= SELECT DO_TM_TASK('mergeout', 'one_super');
-[ RECORD 1 ]------------------------------------------------------------------
DO_TM_TASK | Task: mergeout
(Table: public.one) (Projection: public.one_super)
This process also runs in the background; its interval is defined by the MergeOutInterval parameter (it runs every 600 seconds).
dbadmin= SELECT parameter_name, current_value FROM configuration_parameters where parameter_name='MergeOutInterval';
  parameter_name  | current_value
------------------+---------------
 MergeOutInterval | 600
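Before (or after) forcing a mergeout it can be useful to see how fragmented a projection is; a sketch that counts ROS containers per projection, using the same system table as above:

-- Sketch: count how many ROS containers each projection currently has
select projection_name, count(*) as ros_container_count
from   storage_containers
where  storage_type = 'ROS'
group  by projection_name
order  by ros_container_count desc;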
The DO_TM_TASK call consolidates the ROS containers; now we have only one ROS container for our one_super projection.
dbadmin= select * from storage_containers;
-[ RECORD 1 ]-------+------------------
node_name           | v_d_node0001
schema_name         | public
projection_id       | 45035996273732732
projection_name     | one_super
storage_type        | ROS
storage_oid         | 45035996273733013
total_row_count     | 3
deleted_row_count   | 0
used_bytes          | 53
start_epoch         | 7
end_epoch           | 8
grouping            | PROJECTION
segment_lower_bound | 0
segment_upper_bound | 4294967295
is_sorted           | t
location_label      |
delete_vector_count | 0
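The consolidation does not touch the row data itself; a quick check confirms that all three inserted rows are still there:

-- the table still returns the three rows we inserted
select count(*) from one;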
Also note that the files on disk were consolidated:
[dbadmin@DCXLG001 v_d_node0001_data]$ tree -sD
.
+-- [       4096 Apr 16 10:34]  013
¦   +-- [       4096 Apr 16 10:34]  45035996273733013
¦       +-- [          5 Apr 16 10:34]  45035996273733013_0.fdb
¦       +-- [         56 Apr 16 10:34]  45035996273733013_0.pidx
+-- [       4096 Apr 16 10:34]  017
¦   +-- [       4096 Apr 16 10:34]  45035996273733017
¦       +-- [         48 Apr 16 10:34]  45035996273733017_0.fdb
¦       +-- [         56 Apr 16 10:34]  45035996273733017_0.pidx