Hive HQL introduction at the Hive Command Line

    Best way to learn HIVE and the rest of Hadoop puzzle pieces is to get one of the "Hadoop Distros" available now days such as Horton, Cloudera or MapR.

   In this article i will use Horton Work Hadoop platform, is easy to use and they have some great free online documentation that will get on your way with learning Hadoop and all its components.

So, What is HIVE ? 

  Apache Hive is the data warehouse infrastructure that runs on top of Apache Hadoop and provides data ad-hoc query and analysis of large data-sets. It provides a mechanism to project structure onto the data in Hadoop and to query that data using a SQL-like language called HiveQL (HQL).

 We will assume you have downloaded and started the Horton Sandbox on your local machine, if not follow this link to see how is done.

To start using HIVE you just need to run the hive command:

[root@sandbox ~]# hive
WARNING: Use "yarn jar" to launch YARN applications.

Logging initialized using configuration in file:/etc/hive/2.4.0.0-169/0/hive-log4j.properties
hive
Hive HQL is very similar to MySQL SQL language, i said is similar and not exactly the same !

So let us go over some examples of using HiveQL  that similar to MySQL Sql

Metadata Commands 

Create a database 
hive create database testdb;
OK
Time taken: 4.752 seconds
Show database
hive show databases;
OK
testdb
Time taken: 0.206 seconds, Fetched: 1 row(s)
Use Database 
hive use testdb;
OK
Time taken: 5.562 seconds
Create table in Hive 
  • we are going to go over data type in future tutorial.
hive create table tblOne ( Name string);
OK
Time taken: 1.844 seconds
Describe Table 
  • describe also comes with FORMATTED|EXTENDED options that will be covered in future tutorials.
hive describe tblOne;
OK
name                    string
Time taken: 0.632 seconds, Fetched: 1 row(s)
List Tables in a Hive Database
hive show tables;
OK
tblone
Time taken: 0.45 seconds, Fetched: 1 row(s)

Data Retrival/Select Commands 

Select Statement using * or column name
hive select * from tblone;
OK
1
Time taken: 0.818 seconds, Fetched: 1 row(s)


hive select name from tblone;
OK
1
Time taken: 1.62 seconds, Fetched: 1 row(s)
Select using predicates:
hive select * from tblone where name='1';
OK
1
Time taken: 1.823 seconds, Fetched: 1 row(s)
Select using sort
  • a job is created for this operation.
hive select * from tblone order by name desc;
Query ID = root_20160819030757_f9f7537a-8212-4358-9a79-f665b7ca94ad
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1471501976636_0006)

--------------------------------------------------------------------------------

        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------

Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------

VERTICES: 02/02  [==========================] 100%  ELAPSED TIME: 7.91 s
--------------------------------------------------------------------------------

OK
1
Time taken: 10.613 seconds, Fetched: 1 row(s)
Count/Aggregate, Group  and order data in HiveQL
hive select count(*),name from tblone group by name order by name desc;
Query ID = root_20160819030949_530ac19b-326c-48ac-8131-504380725a60
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1471501976636_0006)

--------------------------------------------------------------------------------

        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------

Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
Reducer 3 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------

VERTICES: 03/03  [==========================] 100%  ELAPSED TIME: 7.47 s
--------------------------------------------------------------------------------

OK
1       1
Time taken: 9.932 seconds, Fetched: 1 row(s)
Select run a count(*) in HiveQL
hive select max(name) from tblone;
Query ID = root_20160819031105_92ae2105-6bb0-4f8c-891e-79e891e1647a
Total jobs = 1
Launching Job 1 out of 1


Status: Running (Executing on YARN cluster with App id application_1471501976636_0006)

--------------------------------------------------------------------------------

        VERTICES      STATUS  TOTAL  COMPLETED  RUNNING  PENDING  FAILED  KILLED
--------------------------------------------------------------------------------

Map 1 ..........   SUCCEEDED      1          1        0        0       0       0
Reducer 2 ......   SUCCEEDED      1          1        0        0       0       0
--------------------------------------------------------------------------------

VERTICES: 02/02  [==========================] 100%  ELAPSED TIME: 6.72 s
--------------------------------------------------------------------------------

OK
1
Time taken: 9.252 seconds, Fetched: 1 row(s)
This are just a few of the most used HiveQL commands that can be related to MySQL SQL language. Done with this tutorial ? Jump into Hive Data Types and get more knowledge  For the full manual on Hive HQL see the following link