HDFS Command Basics for Beginners

Hadoop HDFS is a Java-based file system that provides scalable and reliable data storage in the Hadoop ecosystem, so you need to know the basic HDFS commands to work with it.
Let’s first discuss why HDFS is used and the advantages of using it in Hadoop.
HDFS – Features and Advantages
HDFS, short for Hadoop Distributed File System, is the core storage component of Hadoop. It is a Java-based file system and is the place where all the data in a Hadoop cluster resides. In typical terms, Hadoop has a Master-Slave architecture, and this naming comes from the perspective of HDFS.
It is called a Master-Slave architecture because there is one Master that controls all the Slaves. Here the Master is called the NameNode and the Slaves are called DataNodes.
HDFS was restructured in the second version of Hadoop to support multiple types of data processing engines.
HDFS has become a key tool for managing pools of Big Data and supporting Big Data Analytics applications.
The advantages of HDFS in clusters are as follows:
  • Offers a cost-effective storage solution for businesses.
  • Uses commodity, direct-attached storage and shares the cost of the network and computers.
  • It is a highly scalable storage platform, because it can store and distribute very large data sets across hundreds of inexpensive servers operating in parallel.
  • Businesses can use Hadoop to derive valuable insights from flexible data sources such as social media, email conversations, or clickstream data.
  • Maps data quickly, wherever it is located on a cluster.
  • When data is sent to an individual node, that data is also replicated to other nodes in the cluster, meaning that in the event of a failure another copy of the data is available for use.

The HDFS commands below are common across the different Hadoop distributions.

  1. General Syntax – hadoop dfs [COMMAND [COMMAND_OPTIONS]]
  2. Put Command – hadoop dfs -put <source path> <destination path>
  3. List Command – hadoop dfs -ls <directory path>
  4. Get Command – hadoop dfs -get <source path> <destination path>
  5. Make Directory Command – hadoop dfs -mkdir <directory path>
  6. View the Contents of a Particular File – hadoop dfs -cat <path/filename>
  7. Copy a File from the Local File System into HDFS – hadoop dfs -copyFromLocal <local source path> <HDFS destination path>
  8. Copy a File from HDFS to the Local File System – hadoop dfs -copyToLocal <HDFS source path> <local destination path>
  9. Remove a File – hadoop dfs -rm <path/filename>
  10. Run the DFS Filesystem Checking Utility – hadoop fsck <file path>
  11. Run the Cluster Balancing Utility – hadoop balancer
  12. Check Directory Space – hadoop dfs -du -s -h <directory path>
  13. List All the Hadoop File System Shell Commands – hadoop fs [options]
  14. For Help – hadoop fs -help

Basic HDFS Commands

Moving forward to HDFS commands, we need to understand the syntax of each command. The general syntax is as follows:
 hadoop dfs [COMMAND [COMMAND_OPTIONS]]
This runs a filesystem command on the file system supported by Hadoop (HDFS). Let's discuss each of the command options in detail.

1. Put Command

The ‘put’ command copies data from the local file system into HDFS.
Syntax: hadoop dfs -put <source path> <destination path>
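For example, assuming a local file /home/hadoop/sample.txt and an existing HDFS directory /user/hadoop (both paths are placeholders), the file can be copied into HDFS like this:
Example: hadoop dfs -put /home/hadoop/sample.txt /user/hadoop/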

2. List Command

The ‘ls’ command lists all the files and directories available under a particular path.
Syntax: hadoop dfs -ls <directory path>
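For example, to list everything under the placeholder HDFS directory /user/hadoop:
Example: hadoop dfs -ls /user/hadoop/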

3. Get Command

The ‘get’ command copies the specified file from HDFS to the local file system.
Syntax: hadoop dfs -get <source path> <destination path>
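For example, to copy the placeholder HDFS file /user/hadoop/sample.txt back to the local directory /home/hadoop:
Example: hadoop dfs -get /user/hadoop/sample.txt /home/hadoop/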

4. Make Directory Command

The ‘mkdir’ command creates a new directory in the specified location.
Syntax: hadoop dfs -mkdir <directory path>
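For example, to create a new directory named input under the placeholder HDFS path /user/hadoop:
Example: hadoop dfs -mkdir /user/hadoop/input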

5. View the Contents of a Particular File

The ‘cat’ command displays the contents of a file on standard output.
Syntax: hadoop dfs -cat <path/filename>
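For example, to print the contents of the placeholder HDFS file /user/hadoop/sample.txt to the console:
Example: hadoop dfs -cat /user/hadoop/sample.txt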

6. Copying a File from the Local File System into HDFS

The ‘copyFromLocal’ command copies a file from the local file system into HDFS.
Syntax: hadoop dfs -copyFromLocal <local source path> <HDFS destination path>
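For example, using the same placeholder paths as above, a local file can be copied into HDFS with:
Example: hadoop dfs -copyFromLocal /home/hadoop/sample.txt /user/hadoop/
Note that copyFromLocal behaves like put, except that the source is restricted to the local file system.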

7. Copying a File from HDFS to the Local File System

The ‘copyToLocal’ command copies files from HDFS to the local file system.
Syntax: hadoop dfs -copyToLocal <HDFS source path> <local destination path>
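For example, to copy the placeholder HDFS file /user/hadoop/sample.txt into the local directory /home/hadoop:
Example: hadoop dfs -copyToLocal /user/hadoop/sample.txt /home/hadoop/
Similarly, copyToLocal mirrors get, with the destination restricted to the local file system.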

8. Removing a File

The ‘rm’ command deletes a file stored in HDFS.
Syntax: hadoop dfs -rm <path/filename>
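For example, to delete the placeholder file /user/hadoop/sample.txt from HDFS:
Example: hadoop dfs -rm /user/hadoop/sample.txt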

9. Run the DFS Filesystem Checking Utility

The ‘fsck’ command is used for checking the consistency of the file system.
Syntax: hadoop fsck <file path>
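For example, to check the health of everything under the placeholder HDFS directory /user/hadoop and report its files and blocks:
Example: hadoop fsck /user/hadoop -files -blocks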

10. Run a Cluster Balancing Utility

The ‘balancer’ command checks the workload on the nodes in the cluster and rebalances data blocks across the DataNodes.
Syntax: hadoop balancer
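For example, the balancer can be started with an optional threshold (a percentage of disk capacity, 10 by default) that defines how balanced the cluster should be:
Example: hadoop balancer -threshold 10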

11. Check Directory Space in HDFS

The ‘du’ command shows the space occupied by a file or directory in the cluster; the -s option summarizes the total and -h prints it in a human-readable format.
Syntax: hadoop dfs -du -s -h <directory path>
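For example, to see the total space used by the placeholder directory /user/hadoop in a human-readable form:
Example: hadoop dfs -du -s -h /user/hadoop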

12. List all the Hadoop File System Shell Commands

The ‘fs’ command, when run without any arguments, lists all the shell commands of the Hadoop file system.
Syntax: hadoop fs [options]
[hadoop@acadgild ~]$ hadoop fs
Usage: hadoop fs [generic options]
      [-appendToFile <localsrc> ... <dst>]
      [-cat [-ignoreCrc] <src> ...]
      [-checksum <src> ...]
      [-chgrp [-R] GROUP PATH...]
      [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
      [-chown [-R] [OWNER][:[GROUP]] PATH...]
      [-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
      [-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
      [-count [-q] [-h] <path> ...]
      [-cp [-f] [-p | -p[topax]] <src> ... <dst>]
      [-createSnapshot <snapshotDir> [<snapshotName>]]
      [-deleteSnapshot <snapshotDir> <snapshotName>]
      [-df [-h] [<path> ...]]
      [-du [-s] [-h] <path> ...]
      [-expunge]
      [-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
      [-getfacl [-R] <path>]
      [-getfattr [-R] {-n name | -d} [-e en] <path>]
      [-getmerge [-nl] <src> <localdst>]
      [-help [cmd ...]]
      [-ls [-d] [-h] [-R] [<path> ...]]
      [-mkdir [-p] <path> ...]
      [-moveFromLocal <localsrc> ... <dst>]
      [-moveToLocal <src> <localdst>]
      [-mv <src> ... <dst>]
      [-put [-f] [-p] [-l] <localsrc> ... <dst>]
      [-renameSnapshot <snapshotDir> <oldName> <newName>]
      [-rm [-f] [-r|-R] [-skipTrash] <src> ...]
      [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
      [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
      [-setfattr {-n name [-v value] | -x name} <path>]
      [-setrep [-R] [-w] <rep> <path> ...]
      [-stat [format] <path> ...]
      [-tail [-f] <file>]
      [-test -[defsz] <path>]
      [-text [-ignoreCrc] <src> ...]
      [-touchz <path> ...]
      [-usage [cmd ...]]
Generic options supported are
-conf <configuration file>     specify an application configuration file
-D <property=value>           use value for given property
-fs <local|namenode:port>     specify a namenode
-jt <local|resourcemanager:port>   specify a ResourceManager
-files <comma separated list of files>   specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>   specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>   specify comma separated archives to be unarchived on the compute machines.
The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]
[hadoop@acadgild ~]$

Last but not least, always ask for help!

13. Asking for Help

The ‘help’ command displays usage information for a particular command or, when run without an argument, for all commands.
Command: hadoop fs -help
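For example, to print the help text for a single command such as ls:
Example: hadoop fs -help ls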
