Hive HQL introduction at the Hive Command Line

Best way to learn HIVE and the rest of Hadoop puzzle pieces is to get one of the "Hadoop Distros" available now days such as Horton, Cloudera or MapR.

In this article i will use Horton Work Hadoop platform, is easy to use and they have some great free online documentation that will get on your way with learning Hadoop and all its components.

So, What is HIVE ?

Apache Hive is the data warehouse infrastructure that runs on top of Apache Hadoop and provides data ad-hoc query and analysis of large data-sets. It provides a mechanism to project structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL (HQL).

We will assume you have downloaded and started the Horton Sandbox on your local machine, if not follow this link to see how is done.

To start using HIVE you just need to run the hive command:

[root@sandbox ~]# hive
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
hive

Hive HQL is very similar to MySQL SQL language, i said is similar and not exactly the same !

So let us go over some examples of using HiveQL that similar to MySQL Sql

Metadata Commands

Create a database

hive create database testdb;
OK
Time taken: 4.752 seconds

Show database

hive show databases;
OK
testdb
Time taken: 0.206 seconds, Fetched: 1 row(s)

Use Database

hive use testdb;
OK
Time taken: 5.562 seconds

Create table in Hive

we are going to go over data type in future tutorial.

hive create table tblOne ( Name string);
OK
Time taken: 1.844 seconds

Describe Table

describe also comes with FORMATTED|EXTENDED options that will be covered in future tutorials.

hive describe tblOne;
OK
name                    string
Time taken: 0.632 seconds, Fetched: 1 row(s)

List Tables in a Hive Database

hive show tables;
OK
tblone
Time taken: 0.45 seconds, Fetched: 1 row(s)

Data Retrival/Select Commands

Select Statement using * or column name

hive select * from tblone;
OK
1
Time taken: 0.818 seconds, Fetched: 1 row(s)


hive select name from tblone;
OK
1
Time taken: 1.62 seconds, Fetched: 1 row(s)

Select using predicates:

hive select * from tblone where name='1';
OK
1
Time taken: 1.823 seconds, Fetched: 1 row(s)

Select using sort

a job is created for this operation.

hive select * from tblone order by name desc;
Query ID = root_20160819030757_f9f7537a-8212-4358-9a79-f665b7ca94ad
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1471501976636_0006)

--------------------------------------------------------------------------------

        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------

Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------

VERTICES: 02/02  [==========================] 100%  ELAPSED TIME: 7.91 s
--------------------------------------------------------------------------------

OK
1
Time taken: 10.613 seconds, Fetched: 1 row(s)

Count/Aggregate, Group and order data in HiveQL

hive select count(*),name from tblone group by name order by name desc;
Query ID = root_20160819030949_530ac19b-326c-48ac-8131-504380725a60
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1471501976636_0006)

--------------------------------------------------------------------------------

        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------

Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
Reducer 3 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------

VERTICES: 03/03  [==========================] 100%  ELAPSED TIME: 7.47 s
--------------------------------------------------------------------------------

OK
1       1
Time taken: 9.932 seconds, Fetched: 1 row(s)

Select run a count(*) in HiveQL

hive select max(name) from tblone;
Query ID = root_20160819031105_92ae2105-6bb0-4f8c-891e-79e891e1647a
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1471501976636_0006)

--------------------------------------------------------------------------------

        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------

Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------

VERTICES: 02/02  [==========================] 100%  ELAPSED TIME: 6.72 s
--------------------------------------------------------------------------------

OK
1
Time taken: 9.252 seconds, Fetched: 1 row(s)

This are just a few of the most used HiveQL commands that can be related to MySQL SQL language. Done with this tutorial ? Jump into Hive Data Types and get more knowledge For the full manual on Hive HQL see the following link

Search Loading... Please wait

Subscribe to our Newsletter

Buy me a coffee

Join our Youtube Channel

A Rock Solid, Responsive CSS Framework built to work on all devices big, small and in-between