How to get the raw size of your data in HP Vertica

Some times we need to know what is our Raw data size that our tables is occupying in HP Vertica. We already know that Vertica is very good when we talk about data compression and data encoding so the data size that is stored in the projection_storage.used_bytes it holds the value of the data size after the compression or encoding is applied.   I will the StockTransaction_Fact as example. The Raw data inside the StockTransaction_Fact.dat file has 685M in size.

du -sh StockTransaction_Fact.dat

685M StockTransaction_Fact.tbl
1- Now after we loaded that data into Vertica we need to run the audit_license_size() function. -this will populate the license_audits table. -it runs in the backgound but you can anticipate it's action by executing:
SELECT audit_license_size();
2 - We need to get the compression Ratio that our database is using.
  • for this use the query bellow
SELECT cast(database_size_bytes /
(SELECT SUM(used_bytes) FROM v_monitor.projection_storage) as numeric(10,3)) as compression_ratio
FROM v_catalog.license_audits
ORDER BY audit_start_timestamp DESC LIMIT 1;
3- Next calculate the size of your table based on the projection_storage data.
  • for this use the query bellow
select cast(SUM(used_bytes) / (1024 ^ 3) AS numeric(3,3))AS compressed_space
  from projection_storage
 where anchor_table_name = 'StockTransaction_Fact';
4- Finally you combine the results and find the raw size of your data.
  • for this use the query bellow
select cast((compressed_space * compression_ratio)as numeric(10,3)) as raw_size;
In my case the raw_size result was 96% of my Raw load file size(the 4 % might be the 'delimiter' maybe ! ).