How to work External Tables in Vertica

In Vertica we can make use of external tables by using  CREATE EXTERNAL TABLE AS COPY syntax.

So, what is an External Table ?

  • basically is a flat file on your operational system that can be queried from inside the database as if it was a database table.

  • there is no ROS or WOS concepts when working with External Tables.

External tables are useful and is a very flexible mechanism to feed data to our systems.

Point of strength while working with External Tables

  • no internal work is done to maintain them(refreshes,statistics,..
  • easy to build
  • take advantage of the server managed parallelism
  • no delete vectors are created
  • no extra work for the TM(tuple mover)
  • can read from compressed files(bzip2)
  • can combine multiple files into one table
  • data transformation during the load time
Syntax used to create External Tables in Vertica  To create an external table you need to have superuser privileges(required in order to access the tables via select statement) and also have READ access on the storage location. Basic Syntax :
CREATE EXTERNAL TABLE Table_Name (col1 data type)
 AS COPY FROM '/full path to the data file'
 DELIMITER ',';
For the complete syntax follow this link to Vertica official documentation.

Let's see what is not supported while creating External Tables:

  • ORDER BY
  • ENCODED BY
  • KSAFE
  • PARTITION BY
  • UNSEGMENTED

Examples of create External Tables in Vertica

  • Create external table using a csv file.
create external table ext(
  id int,
  name1 varchar(20),
  name2 varchar(20),
  email varchar(40),
  country varchar(30),
  ip varchar(15)
) as copy from '/tmp/data/xaa' delimiter ',';
  •  Create external table using multiple csv file.
  • you can use multiple files located in the same location '/tmp/data/*'
create external table ext(
  id int,
  name1 varchar(20),
  name2 varchar(20),
  email varchar(40),
  country varchar(30),
  ip varchar(15)
) as copy from '/tmp/data/*' delimiter ',';
  • Create external table using bzip2 compressed files.
create external table ext(
  id int,
  name1 varchar(20),
  name2 varchar(20),
  email varchar(40),
  country varchar(30),
  ip varchar(15)
) as copy from  '/tmp/data/xaa.bz2'  BZIP delimiter ',';

What type of actions are not permitted when using External Tables

  • INSERT
(dbadmin@:5433) [dbadmin]  insert into ext values(1111,'Lisa',
'Wheeler','[email protected]','Indonesia','127.42.139.76');
ERROR 5263: Unsupported access to external table
  •  UPDATE
(dbadmin@:5433) [dbadmin] * update ext set id=id+1 where id=1;
ERROR 5263: Unsupported access to external table
  •  DELETE
(dbadmin@:5433) [dbadmin] * delete from ext where id=1;
ERROR 5263: Unsupported access to external table
  • TEXT INDEX
(dbadmin@:5433) [dbadmin] * CREATE TEXT INDEX text_index ON ext (id, email);
ERROR 6170: Cannot index external table 'ext'
HINT: In addition to supporting text indexing, native Vertica tables have
much better unindexed query performance

Possible actions on External Tables

  •   Add PRIMARY KEY
(dbadmin@:5433) [dbadmin] * alter table ext ADD PRIMARY KEY(id);
WARNING 2623: Column "id" definition changed to NOT NULL
ALTER TABLE
  • JOIN
  • joins are possible between external tables and also between and external table and regular table.
(dbadmin@:5433) [dbadmin] * select e.id,t.name1 from ext e
                             join tbl1 t
                             on e.id=t.id limit 1;
id | name1
-----+-------

500 | Jose
  • Merge
  • you will always have the external table as the source table.
 MERGE /*+ direct */
INTO tbl1  --target

USING ext  --source

ON tbl1.id = ext.id
WHEN MATCHED THEN
  UPDATE
     SET id      = ext.id,
         name1   = ext.name1,
         name2   = ext.name2,
         email   = ext.email,
         country = ext.country,
         ip      = ext.ip
WHEN NOT MATCHED THEN
  INSERT
    (id, name1, name2, email, country, ip)
  VALUES
    (ext.id, ext.name1, ext.name2, ext.email, ext.country, ext.ip);
  •  Views
  • views can be create on a external table
(dbadmin@:5433) [dbadmin] * create view view_ext as select * from ext limit 1;
CREATE VIEW
(dbadmin@:5433) [dbadmin]  select * from view_ext ;
id | name1 | name2 | email | country | ip
-----+---------+---------+------------------------+---------+-----------------

499 | William | Simmons | wsimmonsdu@nytimes.com | Sweden | 192.188.175.171
(1 row)
  •  Alter External Table name
(dbadmin@:5433) [dbadmin] * alter table ext rename to ext1;
ALTER TABLE

Thing i don't like about External Table in Vertica

  • The Create table statement shows no error and the table is create in the catalog even if there is no file present on the file system to feed the data to it. You only get the error when you try to query the table.
  • i think this should be treated/validated.
(dbadmin@:5433) [dbadmin]  create external table ext1(id int
,name1 varchar(20),name2 varchar(20),email varchar(40),country varchar(30),ip varchar(15))
 as copy from '/tmp/data/xaa' delimiter ',' ABORT ON ERROR;
CREATE TABLE

(dbadmin@:5433) [dbadmin]  select * from ext1;
ERROR 2886: Could not open file [/tmp/data/xaa] for reading; No such file or directory
HINT: Make sure the file is available on the specified node. If using ON ANY NODE, it must
 be available at this path on all nodes in the cluster since any node could load the file

(dbadmin@:5433) [dbadmin] * d ext1;
 List of Fields by Tables
 Schema | Table | Column | Type | Size | Default | Not Null | Primary Key | Foreign Key
--------+-------+---------+-------------+------+---------+----------+-------------+-------------

 public | ext1 | id | int | 8 | | f | f |
 public | ext1 | name1 | varchar(20) | 20 | | f | f |
 public | ext1 | name2 | varchar(20) | 20 | | f | f |
 public | ext1 | email | varchar(40) | 40 | | f | f |
 public | ext1 | country | varchar(30) | 30 | | f | f |
 public | ext1 | ip | varchar(15) | 15 | | f | f |
(6 rows)