row store , memory based method to add data to vertica.
is used to reduce latency.
is used to recive large numbers of transactions containing small amounts of data ,is
also called TRICKLE-LOAD
Commands to load data in WOS
LCOPY -used with ODBC loads
COPY -used with other types of loads
Read optimized Storage (ROS)
column store , disk based method to add data to vertica.
is used in to recive large amounts of data called BULK LOAD .
LCOPY DIRECT-used with ODBC loads
COPY DIRECT-used with other types of loads
LOAD Commands :
LCOPY - best to use this command when moving data from a remote location using ODBC.
COPY - best to use this command when moving data from within the cluster.
After the commands run the data will be redistibuted across the db nodes.
Data is moved from WOS to ROS by the TUPLE MOVER , the data movement is called
"moveout"
-the data in WOS is:
uncompressed
unsorted
sorted in rows
sorted in insertion order
During the moveout the data is transformed so that will be (commpresed,sorted,converted in columnar format).
// ]]>
Database parameters that control the behavior of the "moveout" process:
MoveOutInterval(-will check the WOS fill levels)
-is 5 min as default
MoveOutMaxAgeTime(-how long data can reside on the WOS before being moved out to ROS)
default is 30 min.
MoveOutSizePct (- he percentage that is required for the WOS to move data to ROS)
-default is 0% (data will always move to ROS)
WOSDATA resource pool parameters
(set adjustments in memory area and priority for the WOS activities)
Adding data in Vertica
INSERT
COPY(to WOS and ROS)
LCOPY
INSERT command
is used frequentlly and it has high overhead.
is stored in the WOS unless you use the /*+DIRECT*/ is used.
Ex:
COPY command
-used for bulk loads directly to ROS(to disk).
options:
DIRECT - trickle loads of frequent data contained into smaller amounts of data to WOS(in memory)
AUTO - fast approach of loading a datawarehouse.
bulk loads data from files to Vertica.
// ]]>
COPY DIRECT command
used for very large loads
is more efficient then AUTO
the WOS is not involved
no MoveOuts of data from WOS to ROS is done.
data in ROS is fully available and optimized upon commit.
To many ROS containers can cause ROS pushback(-this occurs when the maximum container number is reached
-to monitore this use the ros_count from projection_storage table).
ROS pushback solutions :
1-increase the MAXCONCURRENCY in the Tuple Mover Resource pool.
2-reduce partitions in the load table .
3-stop the load and allow the database to process and merge containers.
UPDATEING data in VERTICA
DELETE
UPDATE
DELETE
deleted data is not immediatly removed from the DB.
the records are marked fro delete by the "delete vector".
the objects that are marked by the delete vector can not be queryed thow they are
phisicly present .
it provides faster way of removing the data using the vectors marking then
physical deletion
"Purge command " will remove files that are beyond AHM (ancient history mark)
purge can be run manually.
purge is run automaticly on margeout.
UPDATE
update is an insert+delete combo.
// ]]>
PARTITIONING
Auto-partion in Vertica
Partition pruning
Auto-partion in Vertica
in Vertica you can setup our schema to have partioned tables
when loading data into partitioned tables multiple ROS containers will be created since
a container can contain data from a single partition only.
partition should not go beyond 700 per projection.
Post-Loading Tasks
Update the Statistics - when to run
this should be done after a initial load
if data was changed more then 50% from last statistics collection.
when query plans change overtime.
How to update statistics:
we can run statistics for a single table or for all tables :