In our last post we saw how to install Hadoop on a single machine and how to start and stop the Hadoop components (see the post here). In this post we will dive into the hadoop command line and try to understand how Hadoop stores files in HDFS. Everything here happens at the command line; no GUI is involved. The general syntax of the hadoop command is:
hadoop command [genericOptions] [commandOptions]
The generic options are:

-conf <configuration file>                    specify an application configuration file
-D <property=value>                           use value for given property
-fs <local|namenode:port>                     specify a namenode
-jt <local|resourcemanager:port>              specify a ResourceManager
-files <comma separated list of files>        specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>       specify comma separated jar files to include in the classpath
-archives <comma separated list of archives>  specify comma separated archives to be unarchived on the compute machines
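A generic option always goes before the command-specific options. For example, -fs points the command at an explicit namenode, and -D overrides a single configuration property for one invocation (a minimal sketch; localhost:9000 is the namenode address from our single-machine setup, and dfs.replication=2 simply asks for two copies of each block, which a single node cannot actually satisfy):

hadoop fs -fs hdfs://localhost:9000 -ls /
hadoop fs -D dfs.replication=2 -put /tmp/file.txt /

Running hadoop fs without any arguments prints the full list of file system shell commands: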
[hadoop@localhost sbin]$ hadoop fs
Usage: hadoop fs [generic options]
[-appendToFile <localsrc> ... <dst>]
[-cat [-ignoreCrc] <src> ...]
[-checksum <src> ...]
[-chgrp [-R] GROUP PATH...]
....................................
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
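Let's try a few of these commands, starting with listing the root of HDFS and creating and removing a directory: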
hadoop fs -ls /
hadoop fs -mkdir /new_dir
hadoop fs -ls /new_dir
hadoop fs -rmdir /new_dir
hadoop fs -rm -R /new_dir
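Note the difference between the last two commands: -rmdir only succeeds on an empty directory, while -rm -R removes the directory and everything beneath it recursively, so it is the one to use for non-empty trees. If the trash feature is enabled, adding -skipTrash deletes the data immediately instead of moving it to the trash:

hadoop fs -rm -R -skipTrash /new_dir

Next we upload files from the local file system with -put; globs work too: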
hadoop fs -put /tmp/file.txt /newdir/
hadoop fs -put /tmp/file* /newdir/
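The reverse of -put is -get (also available as -copyToLocal), which copies a file from HDFS back to the local file system; for example (the local destination name here is ours to choose):

hadoop fs -get /newdir/file.txt /tmp/file_copy.txt

Listing the target directory confirms the uploads: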
[hadoop@localhost sbin]$ hadoop fs -ls /newdir/
Found 3 items
-rw-r--r-- 1 hadoop supergroup 6 2016-09-14 22:51 /newdir/file.txt
-rw-r--r-- 1 hadoop supergroup 6 2016-09-14 22:51 /newdir/file2.txt
-rw-r--r-- 1 hadoop supergroup 6 2016-09-14 22:51 /newdir/file_new.txt
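The columns of the listing are: the permissions, the replication factor (1 on our single-node cluster), the owner, the group, the file size in bytes, the modification time, and the path. We can inspect file contents with -cat: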
[hadoop@localhost sbin]$ hadoop fs -cat /newdir/file.txt
1,2,3
[hadoop@localhost sbin]$ hadoop fs -cat /newdir/file*
1,2,3
1,2,3
1,2,3
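With a glob, -cat simply concatenates every matching file to standard output, which is why the three identical files print as three lines.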
[hadoop@localhost sbin]$ hadoop fs -appendToFile /tmp/file.txt /newdir/file.txt
[hadoop@localhost sbin]$ hadoop fs -cat /newdir/file.txt
1,2,3
1,2,3
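-appendToFile appends the content of one or more local files to a file in HDFS. It can also read from standard input when the source is given as -, which is handy for quick tests:

echo "4,5,6" | hadoop fs -appendToFile - /newdir/file.txt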
[hadoop@localhost sbin]$ hadoop fs -count /
2 4 24 /
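The four columns of -count are DIR_COUNT, FILE_COUNT, CONTENT_SIZE and PATHNAME: the number of directories, the number of files, and the total size in bytes under the given path. Similarly, -df shows the capacity and usage of the file system: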
[hadoop@localhost sbin]$ hadoop fs -df /
Filesystem Size Used Available Use%
hdfs://localhost:9000 11170750464 61440 5839032320 0%
The -h flag prints the same figures in human-readable units:
[hadoop@localhost sbin]$ hadoop fs -df -h /
Filesystem Size Used Available Use%
hdfs://localhost:9000 10.4 G 60 K 5.4 G 0%
[hadoop@localhost sbin]$ hadoop fs -find / -iname file.txt
/newdir/file.txt
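-find accepts -name for case-sensitive matching and -iname for case-insensitive matching, and both take globs; -print (the default action) prints each match, as in this sketch:

hadoop fs -find / -name "file*" -print

Finally, let's look at the ACLs on our directory: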
[hadoop@localhost sbin]$ hadoop fs -getfacl /newdir
# file: /newdir
# owner: hadoop
# group: supergroup
getfacl: The ACL operation has been rejected. Support for ACLs has been disabled by setting dfs.namenode.acls.enabled to false.
[hadoop@localhost sbin]$ hadoop fs -help getfacl
-getfacl [-R] <path> :
  Displays the Access Control Lists (ACLs) of files and directories. If a
  directory has a default ACL, then getfacl also displays the default ACL.
  -R      List the ACLs of all files and directories recursively.
  <path>  File or directory to list.
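The earlier rejection tells us exactly what to change: ACL support is off by default. To turn it on, set dfs.namenode.acls.enabled to true in hdfs-site.xml and restart the NameNode (a sketch against the single-node setup from the previous post):

<property>
  <name>dfs.namenode.acls.enabled</name>
  <value>true</value>
</property>

After the restart, -getfacl should print the base entries (user::, group:: and other::) for the path, and -setfacl can then be used to add finer-grained entries.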