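# Environment setup

The machine where we write the Java code only acts as a client of the HDFS server, yet the Hadoop client libraries still expect a local Hadoop installation on that machine. Hadoop, however, does not officially ship a Windows installation package:

On Linux / macOS, installing Hadoop on the development machine is enough: unpack the Hadoop archive and configure the environment variables.
On Windows, install the winutils.exe and hdfs.dll that match the Hadoop server version (Hadoop has no official Windows installer). winutils.exe can be found on GitHub, or built from the Hadoop source code.

Without this, the client fails with: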

HADOOP_HOME and hadoop.home.dir are unset
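
The error message names the two places Hadoop looks for its home directory: the `hadoop.home.dir` JVM system property and the `HADOOP_HOME` environment variable. The following is a minimal, standard-library-only sketch of a pre-flight check that can be run before any Hadoop call. The class name `HadoopHomeCheck` and the method `problemWith` are hypothetical helpers invented for this illustration, not part of Hadoop, and the check for `bin/winutils.exe` assumes the usual layout of a Windows setup.

```java
import java.io.File;

class HadoopHomeCheck {

    /**
     * Returns null when the directory looks usable, otherwise a short
     * description of what is missing. (Hypothetical helper, not a Hadoop API.)
     */
    static String problemWith(String hadoopHome, boolean isWindows) {
        if (hadoopHome == null || hadoopHome.isEmpty()) {
            return "HADOOP_HOME and hadoop.home.dir are unset";
        }
        File home = new File(hadoopHome);
        if (!home.isDirectory()) {
            return "not a directory: " + hadoopHome;
        }
        // Assumption: on Windows the helper binary lives at <home>/bin/winutils.exe
        if (isWindows && !new File(new File(home, "bin"), "winutils.exe").isFile()) {
            return "missing bin" + File.separator + "winutils.exe under " + hadoopHome;
        }
        return null;
    }

    public static void main(String[] args) {
        // Same lookup order as the error message: system property first, then environment variable
        String home = System.getProperty("hadoop.home.dir");
        if (home == null) {
            home = System.getenv("HADOOP_HOME");
        }
        boolean isWindows = System.getProperty("os.name").toLowerCase().startsWith("windows");
        String problem = problemWith(home, isWindows);
        System.out.println(problem == null ? "Hadoop home looks usable: " + home : problem);
    }
}
```

If changing environment variables is inconvenient, one option is to call `System.setProperty("hadoop.home.dir", ...)` with your own installation path before the first Hadoop class is used.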

# Write a demo

Create a Maven project and add the dependencies:

<dependency>
 <groupId>org.apache.hadoop</groupId>
 <artifactId>hadoop-client</artifactId>
 <!-- Keep this version in line with the Hadoop server version, otherwise the client may be incompatible -->
 <version>3.2.3</version>
</dependency>
<dependency>
 <groupId>junit</groupId>
 <artifactId>junit</artifactId>
 <version>4.12</version>
 <scope>test</scope>
</dependency>
<dependency>
 <groupId>org.slf4j</groupId>
 <artifactId>slf4j-log4j12</artifactId>
 <version>1.7.30</version>
</dependency>

Configure log4j logging in log4j.properties:

log4j.rootLogger=INFO, stdout
log4j.appender.stdout=org.apache.log4j.ConsoleAppender
log4j.appender.stdout.layout=org.apache.log4j.PatternLayout
log4j.appender.stdout.layout.ConversionPattern=%d %p [%c] - %m%n
log4j.appender.logfile=org.apache.log4j.FileAppender
log4j.appender.logfile.File=target/hadoop-client.log
log4j.appender.logfile.layout=org.apache.log4j.PatternLayout
log4j.appender.logfile.layout.ConversionPattern=%d %p [%c] - %m%n

Write the demo test code:
1. Obtain the client object
2. Run the operations
3. Release the resources

```java
package com.study.hdfs;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.junit.After;
import org.junit.Before;
import org.junit.Test;

import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;

public class HdfsClient {

    private FileSystem fs;

    /**
     * Obtain the client object
     */
    @Before
    public void init() throws URISyntaxException, IOException, InterruptedException {
        Configuration configuration = new Configuration();
        // Address of the cluster's NameNode
        String uri = "hdfs://hadoop102:8020";
        // User to connect as
        String user = "tengyer";
        fs = FileSystem.get(new URI(uri), configuration, user);
    }

    /**
     * Release resources
     */
    @After
    public void close() throws IOException {
        fs.close();
    }

    @Test
    public void testMkdirs() throws IOException, URISyntaxException, InterruptedException {
        // Create a directory on HDFS
        fs.mkdirs(new Path("/xiyouji/huaguoshan"));
    }

}
```
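
The NameNode address passed to `FileSystem.get` is an ordinary URI, and the paths used in the API calls may be written either as full URIs or as bare absolute paths. As a small illustration, the sketch below uses only `java.net.URI` from the JDK to show which parts each form carries; the class name `HdfsUriParts` and the method `describe` are hypothetical names made up for this example. A bare path has no scheme, host, or port of its own, so it has to be resolved against the file system the client object is connected to.

```java
import java.net.URI;

class HdfsUriParts {

    /** Formats the URI components that matter when addressing a NameNode. */
    static String describe(String text) {
        URI uri = URI.create(text);
        return "scheme=" + uri.getScheme()
                + " host=" + uri.getHost()
                + " port=" + uri.getPort()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        // Cluster address only: no path component
        System.out.println(describe("hdfs://hadoop102:8020"));
        // Fully qualified path on that cluster
        System.out.println(describe("hdfs://hadoop102:8020/xiyouji/huaguoshan/"));
        // Bare path: scheme and host are null, port is -1
        System.out.println(describe("/xiyouji/huaguoshan/"));
    }
}
```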

# Common Java APIs
## Create a directory
```java
@Test
public void testMkdirs() throws IOException, URISyntaxException, InterruptedException {
    // Create a directory on HDFS
    fs.mkdirs(new Path("/xiyouji/huaguoshan"));
}
```

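## Upload a file

/**
 * Test uploading
 */
@Test
public void testPut() throws IOException {
    // Whether to delete the local source once the upload finishes
    boolean delSrc = false;
    // Whether an existing file on HDFS may be overwritten (if not, an exception is thrown when the destination file already exists)
    boolean overwrite = false;
    // Local source path
    Path src = new Path("/app/testData/sunwukong.txt");
    // Destination path; it may be written as a full URI with the scheme, hdfs://hadoop102:8020/xiyouji/huaguoshan/, or without it
    Path dst = new Path("/xiyouji/huaguoshan/");
    fs.copyFromLocalFile(delSrc, overwrite, src, dst);
}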

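## Download a file or directory

/**
 * Test downloading from HDFS
 */
@Test
public void testGet() throws IOException {
    // Whether to delete the source on HDFS once the download finishes
    boolean delSrc = false;
    // Source path on HDFS (file or directory); it may also be written as a full URI starting with hdfs://hadoop102:8020/
    Path src = new Path("/xiyouji/huaguoshan/sunwukong.txt");
    // Destination is a local directory; the HDFS file is downloaded into it
    Path dst = new Path("/app/testData/");
    // true: write through the raw local file system, so no local .crc checksum file is produced
    // false: use the checksummed local file system, which also writes a .crc file next to the download
    boolean useRawLocalFileSystem = true;
    fs.copyToLocalFile(delSrc, src, dst, useRawLocalFileSystem);
}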

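## Delete a file or directory on HDFS

/**
 * Test deleting a file on HDFS
 */
@Test
public void testRM() throws IOException {
    // Path to delete (file or directory)
    Path path = new Path("/xiyouji/huaguoshan/sunwukong.txt");
    // Whether to delete recursively. A non-empty directory needs recursive deletion, like the -r option of rm;
    // a file or an empty directory can be deleted without it
    boolean recursive = false;
    fs.delete(path, recursive);
}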

## Move and rename a file (or directory)

/**
 * Test moving and renaming a file on HDFS
 */
@Test
public void testMV() throws IOException {
    // Rename within the same directory:
    //        Path src = new Path("/xiyouji/huaguoshan/sunwukong.txt");
    //        Path dst = new Path("/xiyouji/huaguoshan/meihouwang.txt");
    //        fs.rename(src, dst);
    // Move the file to another directory and rename it in one step
    fs.rename(new Path("/xiyouji/huaguoshan/meihouwang.txt"), new Path("/xiyouji/sunxingzhe.txt"));
}

## List the files in a directory (listFiles)

/**
 * List details of the files under a directory, similar to the ls command
 * Requires imports of org.apache.hadoop.fs.LocatedFileStatus and org.apache.hadoop.fs.RemoteIterator
 */
@Test
public void testLS() throws IOException {
    Path path = new Path("/xiyouji");
    boolean recursive = false; // Whether to descend into subdirectories
    // Similar to ls, but only files are returned (not directories); the result is an iterator
    RemoteIterator<LocatedFileStatus> listFilesIterator = fs.listFiles(path, recursive);
    while (listFilesIterator.hasNext()) {
        LocatedFileStatus fileStatus = listFilesIterator.next();  // Attributes of one file
        Path filePath = fileStatus.getPath();
        String owner = fileStatus.getOwner();
        String group = fileStatus.getGroup();
        long blockSize = fileStatus.getBlockSize();
        short replication = fileStatus.getReplication();
        System.out.println(fileStatus);
    }
}

## List the entries in a directory (listStatus)

/**
 * Get the attributes of every entry directly under a path with listStatus
 * Requires an import of org.apache.hadoop.fs.FileStatus
 */
@Test
public void testFileStatus() throws IOException {
    Path path = new Path("/jinguo");
    FileStatus[] fileStatuses = fs.listStatus(path);
    for (FileStatus fileStatus : fileStatuses) {
        System.out.println("------------------------");
        // Name of the file or directory
        System.out.println(fileStatus.getPath().getName());
        // Check whether the entry is a file or a directory
        if (fileStatus.isFile()) {
            System.out.println("This is a file");
        }
    }
}

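# Modify configuration settings

## Use a configuration file

A Java program can also override individual settings by adding configuration files such as hdfs-site.xml under resources.
For example, create hdfs-site.xml under resources: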

<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- Number of replicas; the hdfs-site.xml in the program overrides the value from the Hadoop installation -->
    <property>
        <name>dfs.replication</name>
        <value>4</value>
    </property>
</configuration>

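With this file in place, files uploaded to HDFS by the program are stored with 4 replicas.

## Use the Configuration object

A Java program can also change settings through the Configuration object, for example:

Configuration configuration = new Configuration();
// Set the number of file replicas to 5
configuration.set("dfs.replication", "5");
String uri = "hdfs://hadoop102:8020";  // Address of the cluster's NameNode
String user = "tengyer";  // User to connect as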
fs = FileSystem.get(new URI(uri), configuration, user);

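## Configuration precedence

Parameter precedence, using the hdfs-xxx.xml configuration files as an example (precedence increases down the list; each entry overrides the ones above it):

1. hdfs-default.xml shipped with Hadoop
2. hdfs-site.xml of the Hadoop installation
3. hdfs-site.xml under the project's resources
4. Values set on the Configuration object in the program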
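
The override order can be pictured as layers merged from lowest to highest precedence, where a later layer replaces any key already set by an earlier one. The sketch below is a toy model built on plain `java.util` maps. It is not how Hadoop's `Configuration` class is implemented; the class name `ConfigPrecedence`, the method `merge`, and the replication value of 2 used for the cluster's hdfs-site.xml are assumptions chosen only for illustration. The values 4 and 5 are the ones used in the examples above.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

class ConfigPrecedence {

    /** Merges layers in order; a key set in a later layer replaces the earlier value. */
    @SafeVarargs
    static Map<String, String> merge(Map<String, String>... layers) {
        Map<String, String> effective = new LinkedHashMap<>();
        for (Map<String, String> layer : layers) {
            effective.putAll(layer);
        }
        return effective;
    }

    public static void main(String[] args) {
        // 1. Defaults shipped with Hadoop
        Map<String, String> hdfsDefault = new LinkedHashMap<>();
        hdfsDefault.put("dfs.replication", "3");
        hdfsDefault.put("dfs.blocksize", "134217728");
        // 2. hdfs-site.xml of the installation (value assumed for this example)
        Map<String, String> clusterSite = Collections.singletonMap("dfs.replication", "2");
        // 3. hdfs-site.xml under the project's resources
        Map<String, String> projectSite = Collections.singletonMap("dfs.replication", "4");
        // 4. Value set on the Configuration object in code
        Map<String, String> inCode = Collections.singletonMap("dfs.replication", "5");

        Map<String, String> effective = merge(hdfsDefault, clusterSite, projectSite, inCode);
        // dfs.replication comes from the highest layer that sets it,
        // dfs.blocksize falls through to the default because no other layer sets it
        System.out.println(effective);
    }
}
```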
