What about that Vertica WRITE ONCE stuff !

In this article I will discus about something i have initiated in  Vertica Storage Concepts. This Is something that DW admin that migrate from SQL Server platform or Oracle must understand when working with Vertica database as their DW database engine. So we all now by now that Vertica is storing it's data in WOS or ROS containers(I went and explained this concepts in previous articles). ROS container are write-once and again I will say write-once, what write-once really means ? Well if you have 1 or many rows of data stored in one ROS container and you decide you want to remove or update one of them the ENTIRE ROS CONTAINER MUST BE RE-CREATED. So after we decide to update an existing data this are the steps that Vertica will need to follow the steps bellow:

  • Insert new data - if you user default insert with no DIRECT option data will go to a WOS container.
  • Initial storage id after default insert
          45035996273721463(this is stored in memory/not-persistent so there is no reference of it on disk).
  • After move-out happens we have a ROS container created and his id is : 45035996273721571 (stored on disk/physical file/persistent).
  • You decide you want to update a row that is contained in this ROS container. -after update you will end-up with 2 ROS containers as bellow. 45035996273721571 -- old ROS 45035996273722143 -- new ROS
  • Next the database will trigger the merge-out operation and will consolidate the present ROS containers.
  • After merge-out we end-up with one container 45035996273722437 -- Consolidated ROS
  • Finally the purge operation runs and removed the delete vectors and we end up with: 45035996273722797 -- Final Consistent and up-to date ROS container.
The whole process I have related is part of the Tuple Mover (TM) operation and it happens at a set interval. For a more deep understanding of this operation follow the link 

Here is an example how this actually work in reality.

I will go and create a table and simulate all the steps i have related previous. Create a new table.
CREATE TABLE TEST(NAME VARCHAR(10));
Insert data into the table with default option(in WOS storage)
(dbadmin@:5433) [dbadmin] * INSERT INTO TEST VALUES('ADRIAN');
-[ RECORD 1 ]
OUTPUT | 1
See the storage container for that table(by default the initial projection name will be
(dbadmin@:5433) [dbadmin]  select * from storage_containers where projection_name='TEST_super';
-[ RECORD 1 ]-------+------------------

node_name | v_d_node0001
schema_name | public
projection_id | 45035996273721452
projection_name | TEST_super
storage_type | WOS
storage_oid | 45035996273721463
total_row_count | 1
deleted_row_count | 0
used_bytes | 16384
start_epoch | 1
end_epoch | 1
grouping | ALL
segment_lower_bound | 0
segment_upper_bound | 4294967295
is_sorted | f
location_label |
delete_vector_count | 0
Move your data from WOS to ROS by using the DO_tm_task('MOVEOUT') operation.
(dbadmin@:5433) [dbadmin] * SELECT DO_tm_task('MOVEOUT');
-[ RECORD 1 ]-------------------------------------------------------------------

DO_tm_task | Task: moveout
(Table: public.TEST) (Projection: public.TEST_super)
Query again the storage container for that table and see that the storage_oid changed and also the storage_type is now ROS.
(dbadmin@:5433) [dbadmin]  select * from storage_containers where projection_name='TEST_super';
-[ RECORD 1 ]-------+------------------

node_name | v_d_node0001
schema_name | public
projection_id | 45035996273721452
projection_name | TEST_super
storage_type | ROS
storage_oid | 45035996273721571
total_row_count | 1
deleted_row_count | 0
used_bytes | 79
start_epoch | 1
end_epoch | 1
grouping | PROJECTION
segment_lower_bound | 0
segment_upper_bound | 4294967295
is_sorted | t
location_label |
delete_vector_count | 0
Now look for the ROS container file location on disk. - find the location
(dbadmin@:5433) [dbadmin] * select storage_path from disk_storage where storage_usage like '%DATA%';
storage_path
-----------------------------------

/home/dbadmin/d/v_d_node0001_data
Look for the storage_oid in your host file system
(dbadmin@:5433) [dbadmin] * ! find /home/dbadmin/ -iname 45035996273721571
/home/dbadmin/d/v_d_node0001_data/571/45035996273721571
See the content of the 571 folder.
  • this will contain two files one is the .fdb which is the ROS container and the second is the .pidx holds indexes and statistics.
  • as small hint on the folders and file naming is that the last three digits of the storage containers will allays correspond to the folder/directory name that holds them.
(dbadmin@:5433) [dbadmin] * ! tree /home/dbadmin/d/v_d_node0001_data/571
/home/dbadmin/d/v_d_node0001_data/571
-- 45035996273721571

|-- 45035996273721571_0.fdb

-- 45035996273721571_0.pidx
Also count the number of ROS containers that are present for our TEST table.
(dbadmin@:5433) [dbadmin] * select ps.projection_name, ps.row_count, ps.ros_count,sc.storage_oid from v_monitor.projection_storage ps
join v_monitor.storage_containers sc on ps.projection_id=sc.projection_id
where anchor_table_name = 'TEST';
-[ RECORD 1 ]---+------------------

projection_name | TEST_super
row_count | 1
ros_count | 1
storage_oid | 45035996273721571
Update the TEST table. Note: -the update statement in general and this is not only in Vertica will do a delete and an insert.
(dbadmin@:5433) [dbadmin] * UPDATE TEST SET NAME='DBA' WHERE NAME='ADRIAN';
-[ RECORD 1 ]
OUTPUT | 1
Count again the number of containers that are present for our TEST table after the update statement now. See we have a new container present.  Normally in a regular database as Oracle or SQL Server this is not done this way, other types of structures are generated(REDO LOGS,TRAN LOGS,etc..).
(dbadmin@:5433) [dbadmin]  select ps.projection_name, ps.row_count, ps.ros_count,sc.storage_oid from v_monitor.projection_storage ps
dbadmin- join v_monitor.storage_containers sc on ps.projection_id=sc.projection_id
dbadmin- where anchor_table_name = 'TEST';
-[ RECORD 1 ]---+------------------

projection_name | TEST_super
row_count | 2
ros_count | 2
storage_oid | 45035996273721571
-[ RECORD 2 ]---+------------------

projection_name | TEST_super
row_count | 2
ros_count | 2
storage_oid | 45035996273722143
Consolidate your containers now by running the do_tm_task('mergout') opertion. - remember that this operation is done by your database on a set interval, we only do it manually just for the purpose of this example.
(dbadmin@:5433) [dbadmin]  select do_tm_task('mergeout');
-[ RECORD 1 ]--------------------------------------------------------------------

do_tm_task | Task: mergeout
(Table: public.TEST) (Projection: public.TEST_super)
After the mergeout occurs we end up with only one ROS container for our TEST table, and you can see the storage_oid changed and we have 2 rows in the row count(this is because of the delete vectors).
(dbadmin@:5433) [dbadmin]  select ps.projection_name, ps.row_count, ps.ros_count,sc.storage_oid from v_monitor.projection_storage ps
dbadmin- join v_monitor.storage_containers sc on ps.projection_id=sc.projection_id
dbadmin- where anchor_table_name = 'TEST';
-[ RECORD 1 ]---+------------------

projection_name | TEST_super
row_count | 2
ros_count | 1
storage_oid | 45035996273722437
Delete/Purge the delete vectors using the purge_table() function.
(dbadmin@:5433) [dbadmin] * select purge_table('test');
-[ RECORD 1 ]----------------------------------------------------------------------------

purge_table | Task: purge operation
(Table: public.TEST) (Projection: public.TEST_super)

Count again the number of containers that are present for our TEST table after the purge statement.
See we have a new container present.
(dbadmin@:5433) [dbadmin]  select ps.projection_name, ps.row_count, ps.ros_count,sc.storage_oid from v_monitor.projection_storage ps
dbadmin- join v_monitor.storage_containers sc on ps.projection_id=sc.projection_id
dbadmin- where anchor_table_name = 'TEST';
-[ RECORD 1 ]---+------------------

projection_name | TEST_super
row_count | 1
ros_count | 1
storage_oid | 45035996273722797
See your data files on the file system now.
(dbadmin@:5433) [dbadmin]  ! tree /home/dbadmin/d/v_d_node0001_data
/home/dbadmin/d/v_d_node0001_data
|-- 143

|-- 147

|-- 151

|-- 153

|-- 437

|-- 441

|-- 445

|-- 447

|-- 571

|-- 575

|-- 797

| -- 45035996273722797

| |-- 45035996273722797_0.fdb

| -- 45035996273722797_0.pidx

-- 801

-- 45035996273722801

|-- 45035996273722801_0.fdb

-- 45035996273722801_0.pidx
 Look for the initial ROC container.
  • is gone.
(dbadmin@:5433) [dbadmin] * ! find /home/dbadmin/ -iname 45035996273721571
Conclusion: There is nothing to update in Vertica !   It only re-creates data so high I/O is required for this type of operations.  The concept of Tablespace(Oracle), FileGroup(SQL Server) does not exists in Vertica this is very important to understand when planning a DW that used Vertica as it's storage engine.