Vertica ERROR: Join did not fit in memory - is happening because memory resource are in fault. This might happen at the moment you are running the query where your system is already using a part of the allocated memory so it will not fit into memory and when the system is not busy the join fits in the memory with no problems. The fastest way to solve this error is to enable join spill.
dbadmin= select add_vertica_options('EE', 'ENABLE_JOIN_SPILL');
set_vertica_options
--------------------------------------------------------------
EE Vertica Options
--------------------
ENABLE_JOIN_SPILL
dbadmin= select clr_vertica_options('EE', 'ENABLE_JOIN_SPILL');
set_vertica_options
--------------------------------------------------------------
EE Vertica Options
--------------------
ENABLE_JOIN_SPILL
dbadmin= SELECT /*+set_vertica_options(EE, ENABLE_JOIN_SPILL)*/ * from dual;
dummy
-------
X
(1 row)
by default this option is disabled as we can see in the CONFIGURATION_PARAMETERS table.
dbadmin= SELECT * FROM CONFIGURATION_PARAMETERS where parameter_name='EEVerticaOptions';
-[ RECORD 1 ]-----------------+----------------------------------------
node_name | ALL
parameter_name | EEVerticaOptions
current_value |
catalog_value |
database_value |
default_value |
source | DEFAULT
is_mismatch | f
groups |
change_under_support_guidance | f
change_requires_restart | f
description | Use as recommended by Technical Support
dbadmin= SELECT set_config_parameter('EEVerticaOptions','ENABLE_JOIN_SPILL');
-[ RECORD 1 ]--------+---------------------------
set_config_parameter | Parameter set successfully
Query the CONFIGURATION_PARAMETERS table again.
dbadmin= SELECT * FROM CONFIGURATION_PARAMETERS where parameter_name='EEVerticaOptions';
-[ RECORD 1 ]-----------------+----------------------------------------
node_name | ALL
parameter_name | EEVerticaOptions
current_value | ENABLE_JOIN_SPILL
catalog_value | ENABLE_JOIN_SPILL
database_value | ENABLE_JOIN_SPILL
default_value |
source | DATABASE
is_mismatch | f
groups |
change_under_support_guidance | f
change_requires_restart | f
description | Use as recommended by Technical Support
dbadmin= ALTER DATABASE d CLEAR EEVerticaOptions;
ALTER DATABASE
dbadmin= SELECT * FROM CONFIGURATION_PARAMETERS where parameter_name='EEVerticaOptions';
-[ RECORD 1 ]-----------------+----------------------------------------
node_name | ALL
parameter_name | EEVerticaOptions
current_value |
catalog_value |
database_value |
default_value |
source | DEFAULT
is_mismatch | f
groups |
change_under_support_guidance | f
change_requires_restart | f
description | Use as recommended by Technical Support
- as per this parameter description should only be altered under Vertica Support guidance, this if you are using a EE license if you are running a CE version of Vertica database that you should now that enabling join spill will allow your query to finish but with performance costs as the spill will be written to disk
- are your Vertica cluster resources well planned ? you could follow this guide to see what is recommended by Vertica.
- have you run DB designer after you have loaded the data ? Is there a comprehensive design present on your database ? if so are any incremental design required ?
- using query specific projection will improve your performance as Vertica will require less resource.
- try to use optimized statement, look for a better join order, try to implement sub-queries, are you predicates pushed down ?
- do you really need all those rows ?
- make sure you use proper filters, these way Vertica will do less work.
- it is often helpful when you are exploring a problem to break a query down into several steps and create temporary tables to hold the intermediate results. After you have worked through the problem, combining the queries into one query by using in-line views can be more efficient. However, under certain circumstances it is more efficient to use temporary tables. You should try both methods to determine which is more efficient for your case.
- the explain plan will allow you to see what projections are used by your query, view the plan and make sure the expected projections are used if they are not make sure the optimizer gets to use them.