crypto


FileSystem subclasses

ftp

This package contains FTPFileSystem.

kfs

A Hadoop FileSystem implementation backed by another cluster file system (KFS, the Kosmos File System).

Related material: https://blog.csdn.net/Cloudeep/article/details/4467238

s3 and s3native

s3

A distributed, block-based implementation of {@link org.apache.hadoop.fs.FileSystem} that uses Amazon S3 as a backing store.

Files are stored in S3 as blocks (represented by {@link org.apache.hadoop.fs.s3.Block}), which have an ID and a length. Block metadata is stored in S3 as a small record (represented by {@link org.apache.hadoop.fs.s3.INode}) using the URL-encoded path string as a key. Inodes record the file type (regular file or directory) and the list of blocks.

This design makes it easy to seek to any given position in a file by reading the inode data to compute which block to access, then using S3’s support for HTTP Range headers to start streaming from the correct position. Renames are also efficient since only the inode is moved (by a DELETE followed by a PUT since S3 does not support renames).

For a single file /dir1/file1 which takes two blocks of storage, the file structure in S3 would be something like this:

  /
  /dir1
  /dir1/file1
  block-6415776850131549260
  block-3026438247347758425

Inodes start with a leading /, while blocks are prefixed with block-.
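The rename behaviour described above can be illustrated with a minimal sketch. It assumes a hypothetical in-memory map standing in for an S3 bucket (object key to object body); the class name `InodeRename` and its methods are made up for illustration and are not Hadoop's API. It handles a single file only and skips the URL-encoding of keys.

```java
import java.util.Map;
import java.util.TreeMap;

public class InodeRename {
    // Hypothetical in-memory stand-in for an S3 bucket: object key -> object body.
    private final Map<String, String> bucket = new TreeMap<>();

    public void put(String key, String body) {
        bucket.put(key, body);
    }

    public String get(String key) {
        return bucket.get(key);
    }

    /** Moves only the inode record: DELETE the old key, then PUT the same body under the new key. */
    public void rename(String src, String dst) {
        String inode = bucket.remove(src);   // DELETE of the old inode object
        if (inode == null) {
            throw new IllegalArgumentException("no inode at " + src);
        }
        bucket.put(dst, inode);              // PUT of the inode under its new key
    }

    public static void main(String[] args) {
        InodeRename store = new InodeRename();
        // The inode body format here is invented; it only needs to list the block IDs.
        store.put("/dir1/file1", "FILE blocks=[6415776850131549260,3026438247347758425]");
        store.put("block-6415776850131549260", "first block data");
        store.put("block-3026438247347758425", "second block data");

        store.rename("/dir1/file1", "/dir1/file2");

        System.out.println(store.get("/dir1/file1"));  // prints "null"
        System.out.println(store.get("/dir1/file2"));  // prints the inode body
    }
}
```

The block objects are never touched, which is why the cost of a rename does not depend on the file size.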

s3native:

A distributed implementation of {@link org.apache.hadoop.fs.FileSystem} for reading and writing files on Amazon S3. Unlike {@link org.apache.hadoop.fs.s3.S3FileSystem}, which is block-based, this implementation stores files on S3 in their native form for interoperability with other S3 tools.

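S3 itself is Amazon's cloud object store; its job is to store files.
S3FileSystem
Its design closely resembles a Linux file system: a "file" is stored in Amazon's cloud as blocks (the objects actually stored in S3 are blocks), and an inode indexes the file. This layout exists to make seek operations easy.
Each inode and each block is stored as its own S3 object; the only difference is the object name:

  • Inode names start with "/"
  • Block names start with "block-"

For example, for a file /dir1/file1 that needs two blocks of storage, the layout in S3 is: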

  /                          # inode
  /dir1                      # inode
  /dir1/file1                # inode
  block-6415776850131549260  # block
  block-3026438247347758425  # block
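To show why this layout makes seek cheap, here is a small sketch of the lookup step: given the block lengths recorded in an inode, find which block holds a file offset and where inside that block to start reading. The class `S3BlockSeek`, the method `locate`, and the block sizes (64 and 36 bytes) are assumptions for illustration, not Hadoop code.

```java
public class S3BlockSeek {
    /**
     * Returns {blockIndex, offsetInBlock} for the given file offset,
     * walking the block lengths listed in the inode in order.
     */
    public static long[] locate(long[] blockLengths, long fileOffset) {
        if (fileOffset < 0) {
            throw new IllegalArgumentException("negative offset: " + fileOffset);
        }
        long blockStart = 0;  // file offset at which the current block begins
        for (int i = 0; i < blockLengths.length; i++) {
            long blockEnd = blockStart + blockLengths[i];
            if (fileOffset < blockEnd) {
                return new long[] {i, fileOffset - blockStart};
            }
            blockStart = blockEnd;
        }
        throw new IllegalArgumentException("offset past end of file: " + fileOffset);
    }

    public static void main(String[] args) {
        long[] blockIds = {6415776850131549260L, 3026438247347758425L};
        long[] blockLengths = {64, 36};  // hypothetical block sizes in bytes

        long[] hit = locate(blockLengths, 70);
        // The reader would then fetch that block with an HTTP Range header.
        System.out.println("GET block-" + blockIds[(int) hit[0]]
                + " with Range: bytes=" + hit[1] + "-");
        // prints "GET block-3026438247347758425 with Range: bytes=6-"
    }
}
```

Only the inode has to be read before the lookup; blocks before the target offset are never downloaded.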

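NativeS3FileSystem
Reads and writes files on S3 in their native form, so other S3 tools can work with the same objects. It is the usual way to read and write ordinary S3 files from Hadoop. Note that the block-based S3FileSystem above does not go through it: it reads and writes its inodes and blocks through its own store (Jets3tFileSystemStore).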

permission

Hadoop implements a file permission model that follows the POSIX model, much like Linux permissions:

  Owner/user  Group  Others
  rwx         rwx    rwx
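As a concrete reading of the owner/group/others layout above, the following sketch renders a 9-bit octal mode as its symbolic form. `PermissionBits` and `toSymbolic` are hypothetical names for illustration; the sketch uses only the JDK and does not call Hadoop's own permission classes.

```java
public class PermissionBits {
    /** Renders a 9-bit mode such as 0754 as "rwxr-xr--" (owner, group, others). */
    public static String toSymbolic(int mode) {
        String flags = "rwxrwxrwx";
        StringBuilder sb = new StringBuilder(9);
        for (int i = 0; i < 9; i++) {
            int bit = 1 << (8 - i);  // highest bit is the owner's read permission
            sb.append((mode & bit) != 0 ? flags.charAt(i) : '-');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toSymbolic(0754));  // prints "rwxr-xr--"
        System.out.println(toSymbolic(0644));  // prints "rw-r--r--"
    }
}
```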

ViewFs

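ViewFs is a way to manage multiple Hadoop file system namespaces, similar to a Unix mount table. ViewFileSystem itself holds the mount table and its mount points (MountPoint).
The mount table can be configured in core-site.xml as follows: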

  <property>
    <name>fs.defaultFS</name>
    <value>viewfs://ClusterX</value>
  </property>
  <!-- Set the home directory -->
  <property>
    <name>fs.viewfs.mounttable.ClusterX.homedir</name>
    <value>/home</value>
  </property>
  <!-- Map /Users/zrwang on the local file system to /home in viewfs -->
  <property>
    <name>fs.viewfs.mounttable.ClusterX.link./home</name>
    <value>file:///Users/zrwang</value>
  </property>
  <!-- Map /tmp on HDFS to /tmp in viewfs -->
  <property>
    <name>fs.viewfs.mounttable.ClusterX.link./tmp</name>
    <value>hdfs://dn/tmp</value>
  </property>
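The effect of the two link entries above can be sketched as a prefix lookup: a viewfs path is matched against the mount points and rewritten onto the target URI. This is a simplified model under stated assumptions; `MountTableSketch`, `addLink`, and `resolve` are hypothetical names, and the real ViewFileSystem resolves paths through its own internal structures rather than a flat map.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MountTableSketch {
    // Mount point (viewfs path) -> target URI, mirroring the link.* properties.
    private final Map<String, String> links = new LinkedHashMap<>();

    public void addLink(String mountPoint, String target) {
        links.put(mountPoint, target);
    }

    /** Rewrites a viewfs path onto the target of the longest matching mount point. */
    public String resolve(String path) {
        String best = null;
        for (String mountPoint : links.keySet()) {
            // Match whole path components only, so "/tmpfoo" does not match "/tmp".
            boolean matches = path.equals(mountPoint) || path.startsWith(mountPoint + "/");
            if (matches && (best == null || mountPoint.length() > best.length())) {
                best = mountPoint;
            }
        }
        if (best == null) {
            throw new IllegalArgumentException("no mount point for " + path);
        }
        return links.get(best) + path.substring(best.length());
    }

    public static void main(String[] args) {
        MountTableSketch table = new MountTableSketch();
        table.addLink("/home", "file:///Users/zrwang");
        table.addLink("/tmp", "hdfs://dn/tmp");

        System.out.println(table.resolve("/tmp/a/b"));        // prints "hdfs://dn/tmp/a/b"
        System.out.println(table.resolve("/home/notes.txt")); // prints "file:///Users/zrwang/notes.txt"
    }
}
```

A path that falls under no mount point, such as /data here, is rejected instead of being passed to a default file system.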

[Figure: diagram of the ViewFs mount table]

https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-hdfs/ViewFs.html

Shell


Shell workflow

[Figure: shell workflow diagram]

FileSystem