HDFS (Hadoop Distributed File System) is one of Hadoop's core components. This article shows how to interact with HDFS using shell commands.

Overview

  1. Enter the following command in a terminal to see which commands dfs supports:

    $ bin/hdfs dfs
    Usage: hadoop fs [generic options]
        [-appendToFile <localsrc> ... <dst>]
        [-cat [-ignoreCrc] <src> ...]
        [-checksum [-v] <src> ...]
        [-chgrp [-R] GROUP PATH...]
        [-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
        [-chown [-R] [OWNER][:[GROUP]] PATH...]
        [-concat <target path> <src path> <src path> ...]
        [-copyFromLocal [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
        [-copyToLocal [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-count [-q] [-h] [-v] [-t [<storage type>]] [-u] [-x] [-e] [-s] <path> ...]
        [-cp [-f] [-p | -p[topax]] [-d] <src> ... <dst>]
        [-createSnapshot <snapshotDir> [<snapshotName>]]
        [-deleteSnapshot <snapshotDir> <snapshotName>]
        [-df [-h] [<path> ...]]
        [-du [-s] [-h] [-v] [-x] <path> ...]
        [-expunge [-immediate] [-fs <path>]]
        [-find <path> ... <expression> ...]
        [-get [-f] [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
        [-getfacl [-R] <path>]
        [-getfattr [-R] {-n name | -d} [-e en] <path>]
        [-getmerge [-nl] [-skip-empty-file] <src> <localdst>]
        [-head <file>]
        [-help [cmd ...]]
        [-ls [-C] [-d] [-h] [-q] [-R] [-t] [-S] [-r] [-u] [-e] [<path> ...]]
        [-mkdir [-p] <path> ...]
        [-moveFromLocal [-f] [-p] [-l] [-d] <localsrc> ... <dst>]
        [-moveToLocal <src> <localdst>]
        [-mv <src> ... <dst>]
        [-put [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst>]
        [-renameSnapshot <snapshotDir> <oldName> <newName>]
        [-rm [-f] [-r|-R] [-skipTrash] [-safely] <src> ...]
        [-rmdir [--ignore-fail-on-non-empty] <dir> ...]
        [-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
        [-setfattr {-n name [-v value] | -x name} <path>]
        [-setrep [-R] [-w] <rep> <path> ...]
        [-stat [format] <path> ...]
        [-tail [-f] [-s <sleep interval>] <file>]
        [-test -[defswrz] <path>]
        [-text [-ignoreCrc] <src> ...]
        [-touch [-a] [-m] [-t TIMESTAMP (yyyyMMdd:HHmmss) ] [-c] <path> ...]
        [-touchz <path> ...]
        [-truncate [-w] <length> <path> ...]
        [-usage [cmd ...]]
  2. Enter the following command in a terminal to see what a specific command does:

    # Using the put command as an example
    $ ./bin/hdfs dfs -help put
    -put [-f] [-p] [-l] [-d] [-t <thread count>] <localsrc> ... <dst> :
      Copy files from the local file system into fs. Copying fails if the file already
      exists, unless the -f flag is given.
      Flags:
      -p                 Preserves timestamps, ownership and the mode.
      -f                 Overwrites the destination if it already exists.
      -t <thread count>  Number of threads to be used, default is 1.
      -l                 Allow DataNode to lazily persist the file to disk. Forces
                         replication factor of 1. This flag will result in reduced
                         durability. Use with care.
      -d                 Skip creation of temporary file(<dst>._COPYING_).

Common directory operations

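  3. Note: if the current user is using HDFS for the first time, that user's directory must first be created in HDFS.

    $ ./bin/hdfs dfs -mkdir -p /user/<your_username>

    -mkdir creates a directory. The -p option creates a multi-level path, so any missing parent directories are created together with the child directory.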
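
The step above can be wrapped in a small helper. This is only a minimal sketch: the `HDFS_BIN` variable and the `create_home_dir` function are names invented for this illustration (they are not part of Hadoop), and the default assumes the command is run from the Hadoop installation directory so that `./bin/hdfs` resolves.

```shell
# Path to the hdfs launcher (assumption: run from the Hadoop install directory).
HDFS_BIN="${HDFS_BIN:-./bin/hdfs}"

# Hypothetical helper: create /user/<name> in HDFS.
# With no argument it falls back to the local login name.
create_home_dir() {
    local user="${1:-$(whoami)}"
    # -p creates missing parent directories and does not fail
    # if the directory already exists.
    "$HDFS_BIN" dfs -mkdir -p "/user/${user}"
}
```

Calling `create_home_dir` with no argument uses the current login name; `create_home_dir alice` would create `/user/alice`.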

  4. -ls lists everything under a directory in HDFS

    $ ./bin/hdfs dfs -ls /user
  5. -rm deletes a file; with the -r option it deletes a directory and its contents

    $ ./bin/hdfs dfs -rm -r /user
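
Note that a recursive delete of /user removes every user directory beneath it, including the one created in the earlier step. One way to reduce that risk is a guard around the delete. The sketch below is an illustration only: `remove_dir` and `HDFS_BIN` are hypothetical names, and the list of protected paths is an assumption you would adapt to your own cluster.

```shell
# Path to the hdfs launcher (assumption: run from the Hadoop install directory).
HDFS_BIN="${HDFS_BIN:-./bin/hdfs}"

# Hypothetical helper: recursively delete an HDFS directory, but refuse
# an empty argument and a few top-level paths shared by all users.
remove_dir() {
    local target="$1"
    case "$target" in
        ""|/|/user|/user/)
            echo "refusing to delete '${target}'" >&2
            return 1
            ;;
    esac
    "$HDFS_BIN" dfs -rm -r "$target"
}
```

With this guard, `remove_dir /user/<your_username>/input` runs the delete, while `remove_dir /user` prints a message and returns a non-zero status.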

Common file operations

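  6. -put uploads a file from the local file system to HDFS; a relative destination path is resolved against the current user's HDFS directory

    # The following command uploads "/home/<your_username>/myLocalFile.txt" to the path /user/<your_username>/input in HDFS
    $ ./bin/hdfs dfs -put /home/<your_username>/myLocalFile.txt input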
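
One detail worth knowing: if `input` does not exist yet, -put creates a file named `input` instead of placing the file inside a directory of that name. A minimal sketch that creates the directory first is shown below; `upload_to_dir` and `HDFS_BIN` are hypothetical names used only for this example.

```shell
# Path to the hdfs launcher (assumption: run from the Hadoop install directory).
HDFS_BIN="${HDFS_BIN:-./bin/hdfs}"

# Hypothetical helper: upload one local file into an HDFS directory,
# creating the directory first so the file keeps its own name.
upload_to_dir() {
    local src="$1" dest_dir="$2"
    # Only upload if the directory could be created (or already exists).
    "$HDFS_BIN" dfs -mkdir -p "$dest_dir" &&
        "$HDFS_BIN" dfs -put "$src" "$dest_dir"
}
```

For example, `upload_to_dir /home/<your_username>/myLocalFile.txt input` would leave the file at `input/myLocalFile.txt` under the user's HDFS directory.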
  7. -cat prints the contents of a file

    $ ./bin/hdfs dfs -cat input/myLocalFile.txt
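
Because -cat prints the entire file, it is often piped through a standard tool when the file is large. The sketch below shows one possible pattern; `preview_file` and `HDFS_BIN` are names made up for this illustration. The command listing earlier also includes -head and -tail, which print the first and last kilobyte of a file.

```shell
# Path to the hdfs launcher (assumption: run from the Hadoop install directory).
HDFS_BIN="${HDFS_BIN:-./bin/hdfs}"

# Hypothetical helper: print only the first N lines (default 10) of an HDFS file.
preview_file() {
    local file="$1" lines="${2:-10}"
    "$HDFS_BIN" dfs -cat "$file" | head -n "$lines"
}
```

`preview_file input/myLocalFile.txt 5` would show the first five lines of the uploaded file.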
  8. -get downloads a file from HDFS to the given directory in the local file system

    # The following command downloads "/user/<your_username>/input/myLocalFile.txt" from HDFS to the local directory "/home/fudan/Downloads"
    $ ./bin/hdfs dfs -get input/myLocalFile.txt /home/fudan/Downloads
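
After a download it can be reassuring to confirm that the local copy matches what is stored in HDFS. The following is a rough sketch of one way to do that by streaming the HDFS file into `cmp`; `same_content` and `HDFS_BIN` are hypothetical names, and the approach reads the whole file, so it is best suited to small files.

```shell
# Path to the hdfs launcher (assumption: run from the Hadoop install directory).
HDFS_BIN="${HDFS_BIN:-./bin/hdfs}"

# Hypothetical helper: succeed only if the HDFS file and the local file
# contain exactly the same bytes.
same_content() {
    local hdfs_file="$1" local_file="$2"
    # cmp reads the HDFS copy from standard input ("-") and stays silent (-s);
    # the function's exit status is the exit status of cmp.
    "$HDFS_BIN" dfs -cat "$hdfs_file" | cmp -s - "$local_file"
}
```

It could be used as `same_content input/myLocalFile.txt /home/fudan/Downloads/myLocalFile.txt && echo "copies match"`.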
  9. -cp copies a file from one HDFS directory to another HDFS directory

    $ ./bin/hdfs dfs -cp /user/<your_username>/input/myLocalFile.txt /user/<your_username>/
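
As the command listing shows, -cp accepts -f; without it the copy fails when the target file already exists. The sketch below checks for the target first using -test -e, which exits with status 0 when the path exists. `copy_if_absent` and `HDFS_BIN` are hypothetical names, and the helper assumes the second argument is the full path of the target file rather than a directory.

```shell
# Path to the hdfs launcher (assumption: run from the Hadoop install directory).
HDFS_BIN="${HDFS_BIN:-./bin/hdfs}"

# Hypothetical helper: copy src to dest inside HDFS unless dest already exists.
# dest is expected to be the full path of the target file.
copy_if_absent() {
    local src="$1" dest="$2"
    if "$HDFS_BIN" dfs -test -e "$dest"; then
        echo "skipping: ${dest} already exists" >&2
        return 0
    fi
    "$HDFS_BIN" dfs -cp "$src" "$dest"
}
```

For instance, `copy_if_absent /user/<your_username>/input/myLocalFile.txt /user/<your_username>/myLocalFile.txt` copies the file once and skips the copy on later runs.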