In this article I will discus about something i have initiated in Vertica Storage Concepts.
This Is something that DW admin that migrate from SQL Server platform or Oracle must understand when working with Vertica database as their DW database engine.
So we all now by now that Vertica is storing it's data in WOS or ROS containers(I went and explained this concepts in previous articles).
ROS container are write-once and again I will say write-once, what write-once really means ?
Well if you have 1 or many rows of data stored in one ROS container and you decide you want to remove or update one of them the ENTIRE ROS CONTAINER MUST BE RE-CREATED.
So after we decide to update an existing data this are the steps that Vertica will need to follow the steps bellow:
Insert new data
- if you user default insert with no DIRECT option data will go to a WOS container.
Initial storage id after default insert
45035996273721463(this is stored in memory/not-persistent so there is no reference of it on disk).
After move-out happens we have a ROS container created and his id is :
45035996273721571 (stored on disk/physical file/persistent).
You decide you want to update a row that is contained in this ROS container.
-after update you will end-up with 2 ROS containers as bellow.
45035996273721571 -- old ROS
45035996273722143 -- new ROS
Next the database will trigger the merge-out operation and will consolidate the present ROS containers.
After merge-out we end-up with one container
45035996273722437 -- Consolidated ROS
Finally the purge operation runs and removed the delete vectors and we end up with:
45035996273722797 -- Final Consistent and up-to date ROS container.
The whole process I have related is part of the Tuple Mover (TM) operation and it happens at a set interval. For a more deep understanding of this operation follow the link
Here is an example how this actually work in reality.
I will go and create a table and simulate all the steps i have related previous.
Create a new table.
Insert data into the table with default option(in WOS storage)
See the storage container for that table(by default the initial projection name will be
Move your data from WOS to ROS by using the DO_tm_task('MOVEOUT') operation.
Query again the storage container for that table and see that the storage_oid changed and also the storage_type is now ROS.
Now look for the ROS container file location on disk.
- find the location
Look for the storage_oid in your host file system
See the content of the 571 folder.
this will contain two files one is the .fdb which is the ROS container and the second is the .pidx holds indexes and statistics.
as small hint on the folders and file naming is that the last three digits of the storage containers will allays correspond to the folder/directory name that holds them.
Also count the number of ROS containers that are present for our TEST table.
Update the TEST table.
Note:
-the update statement in general and this is not only in Vertica will do a delete and an insert.
Count again the number of containers that are present for our TEST table after the update statement now.
See we have a new container present. Normally in a regular database as Oracle or SQL Server this is not done this way, other types of structures are generated(REDO LOGS,TRAN LOGS,etc..).
Consolidate your containers now by running the do_tm_task('mergout') opertion.
- remember that this operation is done by your database on a set interval, we only do it manually just for the purpose of this example.
After the mergeout occurs we end up with only one ROS container for our TEST table, and you can see the storage_oid changed and we have 2 rows in the row count(this is because of the delete vectors).
Delete/Purge the delete vectors using the purge_table() function.
See your data files on the file system now.
Look for the initial ROC container.
is gone.
Conclusion:
There is nothing to update in Vertica ! It only re-creates data so high I/O is required for this type of operations. The concept of Tablespace(Oracle), FileGroup(SQL Server) does not exists in Vertica this is very important to understand when planning a DW that used Vertica as it's storage engine.