Many times when we load data in HP Vertica we really do not know what is happening and unless we have a tool such as Vertica Management Console installed and configured capturing information about the load streams is quite hard.
I am going to walk you thru some simple steps that will help manage and understand better at what point is my load stream. The table that captures the information about the Data load streams is called load_streams. This table contains all the information you require to understand the status of your loads. So before you jump in and COPY+PASTE the script go thru some of this bullet points below: 1 - Make sure you setup the STREAM NAME of the Data Load, this will allow you to easy track and measure your data loads over periods of time. 2 - Use DIRECT option when you load large amounts of data. Why DIRECT ? - the DIRECT option enables you to load data straight into the ROS containers. 3 - Put your Rejected data into a table. - i find this very useful as you can query it and work with easy on it to fix any loading issues. Here is a basic syntax of COPY command:copy schema.Table_Name FROM '/tmp/file.csv' SKIP 1 DELIMITER '|' NULL as 'null' DIRECT STREAM NAME 'Load Stream Identifier' REJECTED DATA AS TABLE staging.Table_Name_rej;
SELECT
stream_name,
table_name,
is_executing,
accepted_row_count,
rejected_row_count,
DATEDIFF(ss,load_start::TIMESTAMP,GETDATE()::TIMESTAMP) AS DurationSec,
ROUND((accepted_row_count+rejected_row_count)/DATEDIFF(ss,load_start::TIMESTAMP,GETDATE()::TIMESTAMP),3.0) AS RowsPerSec
FROM
load_streams
WHERE
is_executing='true';
stream_name | table_name | is_executing | accepted_row_count | rejected_row_count | DurationSec | RowsPerSec
-------------------------------------------------------+--------------------+--------------+--------------------+--------------------+-------------+------------
Loading dev.Mapping from Sql Server | UAT_Mapping | t | 29885952 | 1 | 757 | 39479.462
Loading dev.Item from Sql Server | UAT_Item | t | 23785472 | 3919 | 754 | 31550.916
(2 rows)