How to Manage and Work with Data Collector in Vertica

Data Collector is the utility that retains essential information on performance and resource utilization counters. The main use of the Data Collector utility is to assist into finding performance issues and help improve the overall performance of your database. Now that we know what is the Data Collector utility let's go and see how data is collected where is it stored and some ways we can manage this data to make it useful to us. Data Collector comes enabled by default and it can be disabled by the database superuser. The data is retained by the Data Collector utility in the DataCollector directory under the own Vertica/catalog path and also is available to be queried under the v_internal schema and the table are identified by the dc_ prefix.

Topics treated in this article.

  1. Enabling and disabling the Data Collector.
  2. Configure Data Collector retention policy.
  3. How to see the retention policy for a specific component.
  4. How to alter/modify the retention policy for a specific component
  5. Set data retention policy by time interval.
  6. Manage the data collected in the Data Collector tables and log files.
  7. How to manipulate the DC data.
  8. How to synchronize the disk storage with the data collected in the memory.
  9. How to move change the location of the Data Collector logs.
Note: The retention of data can be adjusted/configured by the database superuser.

Enabling and disabling the Data Collector

  • To disable the Data Collector:
SELECT SET_CONFIG_PARAMETER('EnableDataCollector', '0');
  • To re-enable the Data Collector:
SELECT SET_CONFIG_PARAMETER('EnableDataCollector', '1');

Configure Data Collector retention policy

In Vertica v7.1.1-0 we have about 158 components that can have their retention policy configured as per our needs. To see the components and their description use the query bellow:
dbadmin= SELECT DISTINCT component, description FROM data_collector ORDER BY 1 ASC;
              component               |                                                    description
--------------------------------------+--------------------------------------------------------------------------------------------------------------------

 AllocationPoolStatistics             | Information about global memory pools, which generally cannot be recovered without restart
 AllocationPoolStatisticsByDay        | Information about global memory pools, which generally cannot be recovered without restart (historical, by day)
 AllocationPoolStatisticsByHour       | Information about global memory pools, which generally cannot be recovered without restart (historical, by hour)
 AllocationPoolStatisticsByMinute     | Information about global memory pools, which generally cannot be recovered without restart (historical, by minute)
 AllocationPoolStatisticsBySecond     | Information about global memory pools, which generally cannot be recovered without restart (historical, by second)
 AnalyzeStatistics                    | History of statistics collection
 Backups                              | Monitoring successful backups
 BlockMemoryManagerEvents             | Events for Block Memory Manager

 How to see the retention policy for a specific component

-for this task we need to use the  get_data_collector_policy() function.
dbadmin= select get_data_collector_policy('Backups');
                       get_data_collector_policy
------------------------------------------------------------------------

 10KB kept in memory, 10KB kept on disk. Time based retention disabled.
(1 row)
-also we can use the data_collector table.
dbadmin= SELECT *  FROM data_collector where component='Backups';
-[ RECORD 1 ]----------+------------------------------

node_name              | v_d_node0001
component              | Backups
table_name             | dc_backups
description            | Monitoring successful backups
access_restricted      | t
in_db_log              | f
in_vertica_log         | f
memory_buffer_size_kb  | 10
disk_size_kb           | 10
set_interval           | f
interval_time_s        | 0
record_too_big_errors  | 0
lost_buffers           | 0
lost_records           | 0
retired_files          | 0
retired_records        | 0
current_memory_records | 0
current_disk_records   | 0
current_memory_bytes   | 0
current_disk_bytes     | 0
first_time             |
last_time              |
kb_per_day             |
  • where we see that the table name that will hold the collected data is dc_backups, it's description, memory and disk space allocated to it.

How to alter/modify the retention policy for a specific component

  • for this task we need to use the set_data_collector_policy('component', 'memoryKB', 'diskKB') function. I will use the same backup component for our example:
dbadmin= select set_data_collector_policy('Backups', '20', '20');
-[ RECORD 1 ]-------------+----
set_data_collector_policy | SET

-- see the alterd policy
dbadmin= select get_data_collector_policy('Backups');
-[ RECORD 1 ]-------------+-----------------------------------------------------------------------
get_data_collector_policy | 20KB kept in memory, 20KB kept on disk. Time based retention disabled.
- we managed to alter the component and allocated 20KB of memory and 20KB of disk space

 Set data retention policy by time interval

  • for this task we will use the SET_DATA_COLLECTOR_TIME_POLICY() function.
  • the function will allow you to retain the data on disk based on a time interval.
  •  How to enable time interval retention for a single component
dbadmin=  SELECT set_data_collector_time_policy('Backups', '7 day'::interval);
 set_data_collector_time_policy
--------------------------------

 SET
(1 row)
dbadmin= select get_data_collector_policy('Backups');
                  get_data_collector_policy
--------------------------------------------------------------

 20KB kept in memory, 20KB kept on disk. 7 days kept on disk.
(1 row)
[otw_is sidebar=otw-sidebar-1]
  •  How to disable time interval retention for a single component
dbadmin= SELECT set_data_collector_time_policy('Backups', '-1');
 set_data_collector_time_policy
--------------------------------

 SET
(1 row)
 Important: You need to find how much space your DC tables are using per day * interval so you wont run out of disk space.
(dbadmin@:5433) [dbadmin] * select node_name,cast(sum(kb_per_day)/(1024^2) as numeric(10,3))||' Mb' as Used_Mb_per_Day from v_monitor.data_collector group by 1;
     node_name     | Used_Mb_per_Day
-------------------+-----------------

 v_dcg023_node0001 | 9.976 Mb
 Manage the data collected in the Data Collector tables and log files
  • in the DataCollector directory located in our Vertica node catalog we can find 3 types of files as per each component. This files as follows:
  • CREATE_
  • COPY_
The Create and Copy files can be useful if you what to load your DC data onto another server.

 How to manipulate the DC data.

  •  To clear the disk data and memory data and reset your DC use the CLEAR_DATA_COLLECTOR('component') function. - this function can be used to clear all data or just the data for a specific component.
--check that we have data in the WosesDestroyed component disk stored file.
[dbadmin@DCXLG001 DataCollector]$ cat WosesDestroyed_482341541015797.log | wc
    170     170    3682

-- check that we have data in the WosesDestroyed component v_internal.dc_woses_destroyed
dbadmin= select count(*) from dc_woses_destroyed;
 count
-------
    10

-- run the CLEAR_DATA_COLLECTOR() funtion
dbadmin= select CLEAR_DATA_COLLECTOR('WosesDestroyed');
 CLEAR_DATA_COLLECTOR
----------------------
 CLEAR
(1 row)

-- see that data was cleared
dbadmin= select count(*) from dc_woses_destroyed;
 count
-------
 0
  •  The following command clears data collection for all components on all nodes:
SELECT CLEAR_DATA_COLLECTOR();
CLEAR_DATA_COLLECTOR
----------------------

CLEAR
 Note: user must be superuser in order to proceed with this action. [otw_is sidebar=otw-sidebar-1]

How to synchronize the disk storage with the data collected in the memory

- for this task we need to use the flush_data_collector('
-- synchronize all

dbadmin= SELECT flush_data_collector();
 flush_data_collector
----------------------

 FLUSH
(1 row)

-- synchronize specific componenet

dbadmin= SELECT flush_data_collector('WosesDestroyed');
 flush_data_collector
----------------------

 FLUSH
(1 row)

 How to move change the location of the Data Collector logs.

To move the data collector logs and instructions to other storage locations use the set_data_collector_storage_location(see here how to do it. I already have the location created as seen bellow:
dbadmin= select * from storage_locations where location_label='DataCollector';
    location_id    |  node_name   |  location_path   | location_usage | is_retired | location_label | rank | throughput | latency
-------------------+--------------+------------------+----------------+------------+----------------+------+------------+---------

 45035996273757994 | v_d_node0001 | /u03/data_store1 | DATA           | f          | DataCollector  |    0 |         71 |     818
(1 row)
 Change storage location for the DataCollector
dbadmin= select set_data_collector_storage_location('DataCollector');
set_data_collector_storage_location
-------------------------------------
SET
(1 row)
Check the storage location content
dbadmin= ! ls -la /u03/data_store1/| head -2
total 10004
drwxr-xr-x 2 dbadmin verticadba   36864 Apr 17 10:28 .
dbadmin= ! ls -la /u03/data_store1/| head -20
total 10388
drwxr-xr-x 2 dbadmin verticadba   36864 Apr 17 10:28 .
drwxr-xr-x 4 oracle  oinstall      4096 Apr 17 09:48 ..
-rw------- 1 dbadmin verticadba   20440 Apr 17 10:28 AllocationPoolStatistics_482596048217937.log

-rw------- 1 dbadmin verticadba       0 Apr 17 10:27 AllocationPoolStatisticsByDay_482596048227182.log

-rw------- 1 dbadmin verticadba       0 Apr 17 10:27 AllocationPoolStatisticsByHour_482596048217996.log

-rw------- 1 dbadmin verticadba    1294 Apr 17 10:28 AllocationPoolStatisticsByMinute_482596048217984.log

-rw------- 1 dbadmin verticadba   65076 Apr 17 10:28 AllocationPoolStatisticsBySecond_482596048217961.log

-rw------- 1 dbadmin verticadba   28072 Apr 17 10:28 AllocationPoolStatisticsBySecond_482596100001870.log

-rw------- 1 dbadmin verticadba   18031 Apr 17 10:28 BlockMemoryManagerStatistics_482596048228656.log

-rw------- 1 dbadmin verticadba       0 Apr 17 10:27 BlockMemoryManagerStatisticsByDay_482596048228702.log

-rw------- 1 dbadmin verticadba       0 Apr 17 10:27 BlockMemoryManagerStatisticsByHour_482596048228690.log

-rw------- 1 dbadmin verticadba    1551 Apr 17 10:28 BlockMemoryManagerStatisticsByMinute_482596048228678.log

-rw------- 1 dbadmin verticadba   64782 Apr 17 10:28 BlockMemoryManagerStatisticsBySecond_482596048228668.log

-rw------- 1 dbadmin verticadba   15372 Apr 17 10:28 BlockMemoryManagerStatisticsBySecond_482596108001823.log

-rw------- 1 dbadmin verticadba   15768 Apr 17 10:28 CatalogInfo_482596048227205.log

-rw------- 1 dbadmin verticadba       0 Apr 17 10:27 CatalogInfoByDay_482596048227264.log

-rw------- 1 dbadmin verticadba       0 Apr 17 10:27 CatalogInfoByHour_482596048227243.log

-rw------- 1 dbadmin verticadba    1240 Apr 17 10:28 CatalogInfoByMinute_482596048227232.log

-rw------- 1 dbadmin verticadba   51465 Apr 17 10:28 CatalogInfoBySecond_482596048227219.log
Great we managed to move the DataCollector files to the new location.