Java IO - Java IO(1) - 《开发笔记》

The Java IO class library
Examples
- Byte streams
- Reading characters
What happens underneath
Summary
Reference links

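The java.io package contains many classes that a program can use to read and write data. Most of them implement sequential-access streams, which fall into two categories: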

Streams that read and write bytes
Streams that read and write Unicode characters

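Each stream has its own specialty, such as reading from or writing to a file, filtering data while it is being read or written, or serializing objects.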

The Java IO class library

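Java IO(1) - Figure 1

When you first meet Java IO, the class library looks fairly large. It is not obvious what each class is for, let alone how the classes are meant to be combined.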

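Judging by the class names, Java IO splits into two families:

Byte streams: network transfer always moves bytes, so bytes need direct support.
Character streams: raw bytes are awkward for everyday text, such as the contents of a text file, so Java provides character streams that decode the bytes into characters. Binary data such as an image still belongs on a byte stream.

Each family has an input side and an output side:

Input: InputStream for bytes, Reader for characters
Output: OutputStream for bytes, Writer for characters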
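To make the four base types concrete, here is a minimal sketch that copies data once through the byte family and once through the character family. It uses the in-memory implementations from java.io so that it runs without any file; the class name StreamFamilies and its two helper methods are made up for this illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.Writer;

public class StreamFamilies {

    // Byte family: copy every byte from an InputStream to an OutputStream.
    static byte[] copyBytes(byte[] source) throws IOException {
        try (InputStream in = new ByteArrayInputStream(source);
             ByteArrayOutputStream out = new ByteArrayOutputStream()) {
            int b;
            while ((b = in.read()) != -1) {   // one byte as 0..255, or -1 at end of stream
                out.write(b);
            }
            return out.toByteArray();
        }
    }

    // Character family: copy every char from a Reader to a Writer.
    static String copyChars(String source) throws IOException {
        try (Reader in = new StringReader(source);
             Writer out = new StringWriter()) {
            int c;
            while ((c = in.read()) != -1) {   // one char as an int, or -1 at end of stream
                out.write(c);
            }
            return out.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(copyBytes(new byte[]{1, 2, 3}).length);
        // \u4f60\u597d is a two-character Chinese greeting, escaped to keep the source ASCII-only
        System.out.println(copyChars("\u4f60\u597d, IO"));
    }
}
```

The two loops have the same shape; only the unit differs (a byte on one side, a char on the other).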

Examples

Byte streams

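    import java.io.FileInputStream;
    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.charset.StandardCharsets;

    // The class name is only there to make the snippet compile.
    public class ByteStreamExamples {
        public static void main(String[] args) throws IOException {
            final String path = "/paths/Test.java";
            {
                // Read one byte at a time
                try (InputStream inputStream = new FileInputStream(path)) {
                    int read;
                    while ((read = inputStream.read()) != -1) {
                        // Each byte is decoded on its own, so multi-byte characters are garbled
                        System.out.print(new String(new byte[]{(byte) read}, 0, 1));
                    }
                }
            }
            {
                // Read several bytes at a time
                try (InputStream inputStream = new FileInputStream(path)) {
                    byte[] bytes = new byte[1024];
                    int read;
                    while ((read = inputStream.read(bytes)) != -1) {
                        // A character split across two reads is garbled as well
                        System.out.print(new String(bytes, 0, read));
                    }
                }
            }
            {
                // Read the whole file at once
                try (InputStream inputStream = new FileInputStream(path)) {
                    // available() is only an estimate and read() may return fewer bytes,
                    // so this is acceptable for a small local file only
                    final int available = inputStream.available();
                    byte[] bytes = new byte[available];
                    inputStream.read(bytes, 0, bytes.length);
                    System.out.println(new String(bytes, StandardCharsets.UTF_8));
                }
            }
        }
    }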

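Reading one byte at a time is clearly inconvenient, and reading several bytes at a time has a problem too.

An ASCII character occupies one byte, but UTF-8 is a variable-length encoding and a Chinese character takes three or four bytes in it. Decoding the file one byte at a time is therefore bound to produce garbled text, and a multi-byte read garbles text as well whenever the end of the buffer happens to fall in the middle of a character.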
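The effect is easy to reproduce in memory. The sketch below decodes the UTF-8 bytes of a two-character string in the same two ways as the loops above; the class SplitCharacterDemo and its helpers are hypothetical names used only for this demonstration.

```java
import java.nio.charset.StandardCharsets;

public class SplitCharacterDemo {

    // Decode each byte on its own, the way the single-byte loop does.
    static String decodeByteByByte(byte[] data) {
        StringBuilder sb = new StringBuilder();
        for (byte b : data) {
            sb.append(new String(new byte[]{b}, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    // Decode in fixed-size chunks, the way the byte[] buffer loop does.
    static String decodeInChunks(byte[] data, int chunkSize) {
        StringBuilder sb = new StringBuilder();
        for (int off = 0; off < data.length; off += chunkSize) {
            int len = Math.min(chunkSize, data.length - off);
            sb.append(new String(data, off, len, StandardCharsets.UTF_8));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // 'a' followed by the Chinese character U+4E2D (escaped to keep the source ASCII-only)
        String text = "a\u4e2d";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        System.out.println(utf8.length);                              // 4: one byte for 'a', three for U+4E2D
        System.out.println(decodeByteByByte(utf8).equals(text));      // false
        System.out.println(decodeInChunks(utf8, 2).equals(text));     // false: the boundary falls inside U+4E2D
        System.out.println(decodeInChunks(utf8, 4).equals(text));     // true: everything fits in one chunk
    }
}
```

In the failing cases the decoder substitutes the replacement character U+FFFD for every byte sequence it cannot interpret, which is what shows up as garbled output.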

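Reading the whole file in one go is not realistic either: imagine a 10 GB file on a machine with only 4 GB of memory. :(

To get around this problem, can the local file be read as characters instead?

Reading characters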

    try (InputStream inputStream = new FileInputStream(path)) {
        final InputStreamReader inputStreamReader = new InputStreamReader(inputStream, StandardCharsets.UTF_8);
        int read;
        while ((read = inputStreamReader.read()) != -1) {
            System.out.print(new String(new char[]{(char) read}));
        }
    }

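Reading one character at a time gets rid of the garbled output, as the result shows. The trick is to wrap the InputStream once more: while the bytes are being read, they are decoded into characters.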
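The same wrapper also supports bulk reads into a char[] buffer. Because that buffer is counted in characters rather than bytes, its boundary can no longer fall inside a UTF-8 byte sequence. A minimal sketch, assuming an in-memory source in place of the file (CharBufferRead and readAll are illustrative names, not part of the original example):

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

public class CharBufferRead {

    // Read all characters from a UTF-8 byte source through a char[] buffer of the given size.
    static String readAll(InputStream source, int bufferSize) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (Reader reader = new InputStreamReader(source, StandardCharsets.UTF_8)) {
            char[] chars = new char[bufferSize];
            int n;
            while ((n = reader.read(chars)) != -1) {   // n chars were decoded into the buffer
                sb.append(chars, 0, n);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // ASCII mixed with two Chinese characters, escaped to keep the source ASCII-only
        String text = "a\u4e2d\u6587b";
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        // Even a one-char buffer decodes correctly
        System.out.println(readAll(new ByteArrayInputStream(utf8), 1).equals(text));
    }
}
```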

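Can it be made more convenient still, for example by reading one line at a time?

    try (InputStream inputStream = new FileInputStream(path);
         BufferedReader bufferedReader = new BufferedReader(
                 new InputStreamReader(inputStream, StandardCharsets.UTF_8))) {
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            System.out.println(line);
        }
    }

This wraps the single-character reader from the previous example once more: BufferedReader keeps reading until it has a complete line.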

All of these refinements and combinations exist to make reading and writing more convenient. What holds for reading holds for writing as well.
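For the writing side, the mirror image of the reader chain might look like the sketch below. It is an illustration under assumptions rather than code from the original post: the class WriteLinesDemo, its helper methods, and the temporary file are all made up here.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class WriteLinesDemo {

    // Write each line through BufferedWriter -> OutputStreamWriter -> FileOutputStream.
    static void writeLines(File file, String... lines) throws IOException {
        try (BufferedWriter writer = new BufferedWriter(
                new OutputStreamWriter(new FileOutputStream(file), StandardCharsets.UTF_8))) {
            for (String line : lines) {
                writer.write(line);
                writer.newLine();   // appends the platform line separator
            }
        }                           // close() flushes the buffered characters to the file
    }

    // Read the first line back with the matching reader chain.
    static String readFirstLine(File file) throws IOException {
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new FileInputStream(file), StandardCharsets.UTF_8))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("write-demo", ".txt");   // scratch file for the demo
        file.deleteOnExit();
        writeLines(file, "first line", "second line");
        System.out.println(readFirstLine(file));
    }
}
```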

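What happens underneath

A plain write or read is all it takes to write or read a file, so what is actually going on underneath?

To find out, let's first look at how a C program reads and writes a file :)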

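#include <stdio.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
int main(void)
{
    int in;
    const char tempWrite[] = "test write";
    // open() system call: open in.txt for reading and writing, creating it if it does not exist.
    // With O_CREAT a third argument must supply the permission bits of the new file.
    // in is the file descriptor
    in = open("in.txt", O_RDWR | O_CREAT, 0644);
    if (-1 == in) // opening the file failed: report the error and exit
    {
        perror("open");
        return -1;
    }
    // write() system call: write the 10 bytes of the string (without the trailing NUL)
    if (write(in, tempWrite, sizeof(tempWrite) - 1) == -1)
    {
        perror("write");
    }
    // close() system call: release the file descriptor
    close(in);
    return 0;
}
// gcc -o read readAndWrite.c && ./read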

The steps are: open the file with open, prepare the data to write, write it to the file with write, and close the file with close.
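For comparison, the same open, write, close sequence can be spelled out in Java. This is a rough counterpart rather than an exact translation: FileOutputStream opens the file write-only and truncates existing content, which the C call above does not. The class OpenWriteClose and the temporary file are illustrative assumptions.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

public class OpenWriteClose {

    // open -> write -> close, written without try-with-resources so each step is visible
    static void writeOnce(File file, byte[] data) throws IOException {
        FileOutputStream out = new FileOutputStream(file);   // open (creates the file if missing)
        try {
            out.write(data);                                 // write
        } finally {
            out.close();                                     // close
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("in", ".txt");       // scratch file for the demo
        file.deleteOnExit();
        writeOnce(file, "test write".getBytes(StandardCharsets.US_ASCII));
        System.out.println(Files.readAllBytes(file.toPath()).length);   // 10
    }
}
```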

Opening the file

    public FileInputStream(File file) throws FileNotFoundException {
        String name = (file != null ? file.getPath() : null);
        SecurityManager security = System.getSecurityManager();
        if (security != null) security.checkRead(name);
        if (name == null) throw new NullPointerException();
        if (file.isInvalid()) throw new FileNotFoundException("Invalid file path");
        fd = new FileDescriptor();
        fd.attach(this);
        path = name;
        open(name);
    }
    // wrap native call to allow instrumentation
    /**
     * Opens the specified file for reading.
     * @param name the name of the file
     */
    private void open(String name) throws FileNotFoundException {
        open0(name);
    }
    private native void open0(String name) throws FileNotFoundException;

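Basic checks on the file
Create the file descriptor (FD)
Call the native method open0

To find the native implementation, search the JDK source tree with grep -nr "search string" . (here: grep -nr "Java_java_io_FileInputStream_open0" .)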

jdk8u_jdk/src/share/native/java/io/FileInputStream.c +60
JNIEXPORT void JNICALL
Java_java_io_FileInputStream_open0(JNIEnv *env, jobject this, jstring path) {
    fileOpen(env, this, path, fis_fd, O_RDONLY);
}
------------------------------------------------------------------------------
vim +101 ./jdk8u_jdk/src/solaris/native/java/io/io_util_md.c
void fileOpen(JNIEnv *env, jobject this, jstring path, jfieldID fid, int flags)
{
    // WITH_PLATFORM_STRING runs the code between the braces with ps set to the platform-encoded path
    WITH_PLATFORM_STRING(env, path, ps) {
        FD fd;
        // ... some less important code omitted
        fd = handleOpen(ps, flags, 0666);
        if (fd != -1) {
            SET_FD(this, fd, fid);
        } else {
            throwFileNotFoundException(env, path);
        }
    } END_PLATFORM_STRING(env, ps);
}
------------------------------------------------------------------------------
vim +79 ./jdk8u_jdk/src/solaris/native/java/io/io_util_md.c
FD handleOpen(const char *path, int oflag, int mode) {
    FD fd;
    // Open the file with open64(); RESTARTABLE retries the call while it fails with EINTR.
    // The open64() function is a part of the large file extensions, and is
    // equivalent to calling open() with the O_LARGEFILE flag.
    RESTARTABLE(open64(path, oflag, mode), fd);
    if (fd != -1) {
        struct stat64 buf64;
        int result;
        RESTARTABLE(fstat64(fd, &buf64), result);
        if (result != -1) {
            if (S_ISDIR(buf64.st_mode)) {
                close(fd);
                errno = EISDIR;
                fd = -1;
            }
        } else {
            close(fd);
            fd = -1;
        }
    }
    return fd;
}
------------------------------------------------------------------------------
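Seen from the Java side, both failure paths in the native code (the open call itself failing, and the S_ISDIR check rejecting a directory) surface as FileNotFoundException. A small sketch to observe this; the class OpenFailureDemo and the temporary directory are assumptions made for the demonstration, and the exception message text depends on the platform.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.nio.file.Files;

public class OpenFailureDemo {

    // Returns true when opening the path for reading fails with FileNotFoundException.
    static boolean failsToOpen(File file) throws IOException {
        try (FileInputStream in = new FileInputStream(file)) {
            return false;                          // opened fine
        } catch (FileNotFoundException e) {
            System.out.println(e.getMessage());    // platform-specific detail
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("open-demo").toFile();
        dir.deleteOnExit();
        File missing = new File(dir, "no-such-file.txt");
        System.out.println(failsToOpen(dir));      // a directory cannot be opened as a stream
        System.out.println(failsToOpen(missing));  // neither can a file that does not exist
    }
}
```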

RESTARTABLE

/* 
 * vim +95 ./solaris/native/java/io/io_util_md.h
 * Retry the operation if it is interrupted
 */
#define RESTARTABLE(_cmd, _result) do { \
    do { \
        _result = _cmd; \
    } while((_result == -1) && (errno == EINTR)); \
} while(0)
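The control flow of the macro can be mimicked in Java to see what it does. Java code never sees errno, so everything in this sketch is a stand-in: the errno field, the EINTR constant, and the flakyCall helper are hypothetical and exist only to simulate a call that is interrupted a few times before it succeeds.

```java
import java.util.function.IntSupplier;

public class RestartableSketch {

    static final int EINTR = 4;   // stand-in constant, illustration only
    static int errno = 0;         // stand-in for the C errno variable

    // Java rendering of RESTARTABLE: repeat the call while it fails with EINTR.
    static int restartable(IntSupplier call) {
        int result;
        do {
            result = call.getAsInt();
        } while (result == -1 && errno == EINTR);
        return result;
    }

    // Simulated "system call": interrupted the given number of times, then returns 7.
    static IntSupplier flakyCall(int interruptions) {
        int[] remaining = {interruptions};
        return () -> {
            if (remaining[0] > 0) {
                remaining[0]--;
                errno = EINTR;
                return -1;
            }
            errno = 0;
            return 7;
        };
    }

    public static void main(String[] args) {
        System.out.println(restartable(flakyCall(2)));   // 7, after two retried interruptions
    }
}
```

Any other failure (a result of -1 with a different errno) ends the loop immediately and is returned to the caller, which is why the native code still checks for -1 after the macro.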

Reading data

inputStream.read();  // reads a single byte
-------------------------------------------------------------------------------
vim +64 jdk8u_jdk/src/share/native/java/io/FileInputStream.c
JNIEXPORT jint JNICALL
Java_java_io_FileInputStream_read0(JNIEnv *env, jobject this) {
    return readSingle(env, this, fis_fd);
}
jint
readSingle(JNIEnv *env, jobject this, jfieldID fid) {
    jint nread;
    char ret;
    // fetch the file descriptor stored in the Java object
    FD fd = GET_FD(this, fid);
    if (fd == -1) {
        JNU_ThrowIOException(env, "Stream Closed");
        return -1;
    }
    nread = IO_Read(fd, &ret, 1);
    if (nread == 0) { /* EOF */
        return -1;
    } else if (nread == -1) { /* error */
        JNU_ThrowIOExceptionWithLastError(env, "Read error");
    }
    return ret & 0xFF;
}
#define IO_Read handleRead
vim +157 ./solaris/native/java/io/io_util_md.c
ssize_t
handleRead(FD fd, void *buf, jint len)
{
    ssize_t result;
    RESTARTABLE(read(fd, buf, len), result);
    return result;
}
In the end a single byte is read by the system call read(fd, buf, len).
-------------------------------------------------------------------------------
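The `ret & 0xFF` at the end of readSingle is what lets read() tell a data byte of 0xFF apart from end of file: data bytes come back as 0..255, and only EOF is -1. The sketch below shows the same contract using ByteArrayInputStream as a stand-in for the file stream; the class name ReadReturnValue is made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadReturnValue {

    // Collect the raw int values returned by successive read() calls.
    static int[] readValues(InputStream in, int count) throws IOException {
        int[] values = new int[count];
        for (int i = 0; i < count; i++) {
            values[i] = in.read();
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        byte[] data = {(byte) 0xFF, 0x41};
        int[] values = readValues(new ByteArrayInputStream(data), 3);
        System.out.println(values[0]);   // 255: the byte 0xFF, not -1
        System.out.println(values[1]);   // 65
        System.out.println(values[2]);   // -1: end of stream
    }
}
```

This is also why the loops in the examples store the result in an int and compare it with -1 before casting it to byte.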
byte[] b = new byte[1024]; // destination byte array
inputStream.read(b); // reads up to 1024 bytes into b by calling the native readBytes(b, 0, b.length)
private native int readBytes(byte b[], int off, int len) throws IOException;
vim +74 ./share/native/java/io/io_util.c
jint readBytes(JNIEnv *env, jobject this, jbyteArray bytes,
          jint off, jint len, jfieldID fid)
{
    jint nread;
    char stackBuf[BUF_SIZE];//BUF_SIZE == 8192
    char *buf = NULL;
    FD fd;
    if (IS_NULL(bytes)) {
        JNU_ThrowNullPointerException(env, NULL);
        return -1;
    }
    if (outOfBounds(env, off, len, bytes)) {
        JNU_ThrowByName(env, "java/lang/IndexOutOfBoundsException", NULL);
        return -1;
    }
    if (len == 0) {
        return 0;
    } else if (len > BUF_SIZE) {
        // the request is larger than BUF_SIZE (8 KB): allocate a temporary heap buffer
        buf = malloc(len);
        if (buf == NULL) {// OOM
            JNU_ThrowOutOfMemoryError(env, NULL);
            return 0;
        }
    } else {
        buf = stackBuf;
    }
    fd = GET_FD(this, fid);
    if (fd == -1) {
        JNU_ThrowIOException(env, "Stream Closed");
        nread = -1;
    } else {
        nread = IO_Read(fd, buf, len);
        if (nread > 0) {
            (*env)->SetByteArrayRegion(env, bytes, off, nread, (jbyte *)buf);
        } else if (nread == -1) {
            JNU_ThrowIOExceptionWithLastError(env, "Read error");
        } else { /* EOF */
            nread = -1;
        }
    }
    if (buf != stackBuf) {
        // free the temporary heap buffer
        free(buf);
    }
    return nread;
}
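Note: a single read of more than 8 KB (BUF_SIZE, 8192 bytes) has to malloc a temporary buffer and free it afterwards, so a Java-side buffer of at most 8192 bytes avoids that extra allocation on every call.
IO_Read is the same as above: it ends in the same read system call.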
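In practice this suggests reading files in chunks no larger than BUF_SIZE, which also keeps memory use constant however large the file is. A minimal sketch, assuming the 8192 figure from the JDK 8 native code quoted above; the class ChunkedRead, the helper countBytes, and the temporary file are illustrative.

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class ChunkedRead {

    static final int CHUNK = 8192;   // same size as BUF_SIZE in the native readBytes

    // Read a whole file through a fixed 8 KB buffer and return the number of bytes seen.
    static long countBytes(File file) throws IOException {
        long total = 0;
        try (InputStream in = new FileInputStream(file)) {
            byte[] buffer = new byte[CHUNK];
            int n;
            while ((n = in.read(buffer)) != -1) {   // n may be smaller than CHUNK
                total += n;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("chunked", ".bin");   // scratch file for the demo
        file.deleteOnExit();
        try (FileOutputStream out = new FileOutputStream(file)) {
            out.write(new byte[20000]);
        }
        System.out.println(countBytes(file));   // 20000
    }
}
```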

Overall, reading works much as it does in C. Java adds some extra checks and the JNI call, but the basic flow is the same as in C.

Closing the file

void
fileClose(JNIEnv *env, jobject this, jfieldID fid)
{
    FD fd = GET_FD(this, fid);
    if (fd == -1) {
        return;
    }
    /* Set the fd to -1 before closing it so that the timing window
     * of other threads using the wrong fd (closed but recycled fd,
     * that gets re-opened with some other filename) is reduced.
     * Practically the chance of its occurance is low, however, we are
     * taking extra precaution over here.
     */
    SET_FD(this, -1, fid);
    // ... some checks omitted
    if (close(fd) == -1) {
        JNU_ThrowIOExceptionWithLastError(env, "close failed");
    }
}
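The fd == -1 checks in the native code are visible from Java: closing a stream a second time does nothing, and reading from a closed stream fails with an IOException (the "Stream Closed" message thrown in readSingle). A small sketch to observe this, with ClosedStreamDemo and the temporary file as assumptions:

```java
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

public class ClosedStreamDemo {

    // Close the stream twice, then try to read; returns true if the read is rejected.
    static boolean readAfterCloseFails(File file) throws IOException {
        FileInputStream in = new FileInputStream(file);
        in.close();
        in.close();                                // second close is a no-op
        try {
            in.read();
            return false;
        } catch (IOException e) {
            System.out.println(e.getMessage());    // message raised by the native read path
            return true;
        }
    }

    public static void main(String[] args) throws IOException {
        File file = File.createTempFile("closed-demo", ".txt");   // scratch file for the demo
        file.deleteOnExit();
        System.out.println(readAfterCloseFails(file));
    }
}
```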

For more on file descriptors in Java, see this article.

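Summary

The java.io library is rather sprawling, and in practice it is mostly used in combination with the Buffered classes. It can feel cumbersome at first, but it becomes easy once repeated use has built muscle memory.
File IO in Java follows much the same flow as the C calls: open, read, write, close.
- Compared with using the system calls directly, reading a file in Java costs one extra copy: read first fills an array in native (C) memory, and the data then has to be copied into an array on the JVM heap.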
