
Background

Polling, Interrupts and DMA

Polling

Polling is a software procedure that repeatedly checks whether an event has occurred. There are two types of polling: blocking and non-blocking.
In a blocking poll, the processor tests a flag or bit and waits until it changes to the desired state. The processor cannot proceed until the new state is detected.
In a non-blocking poll, the processor proceeds with other tasks and acts only when it detects the defined flag state. Since the flag may change state while the processor is not testing it, there is a delay before the desired action is taken. This delay can be on the order of milliseconds, depending on what the processor is doing.
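
The difference between the two polling styles can be sketched in a few lines of Python. This is only an illustration: `Flag` is a hypothetical stand-in for a hardware status bit that becomes ready after a fixed number of reads, and the "other tasks" are plain callables.

```python
class Flag:
    """Hypothetical status flag that reads as set after `ready_after` reads."""

    def __init__(self, ready_after):
        self._reads = 0
        self._ready_after = ready_after

    def is_set(self):
        self._reads += 1
        return self._reads > self._ready_after


def blocking_poll(flag):
    """Spin until the flag is set; no other work is done in the meantime."""
    spins = 0
    while not flag.is_set():
        spins += 1
    return spins


def non_blocking_poll(flag, tasks):
    """Check the flag between units of other work and stop once it is set."""
    completed = []
    for task in tasks:
        if flag.is_set():
            break
        completed.append(task())
    return completed
```

In the non-blocking version the flag is only noticed between tasks, which is where the extra reaction delay described above comes from.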

Interrupts

Interrupts overcome the shortcomings of polling. An event can interrupt the processor at any time. Interrupt latency is the time it takes the processor to acknowledge the notification, and it can be under a microsecond. The processor may service the event immediately or postpone servicing until an appropriate moment.

Direct Memory Access

DMA is a hardware mechanism that handles data transfers without processor intervention, so events can happen behind the scenes without interrupting the processor. DMA is best suited to (but not limited to) bulk data transfers: peripheral to memory, memory to peripheral, or memory to memory. Typical examples of DMA transfers are high-speed ADC sampling, video recording and playback, and audio recording and playback.

Reference links:
https://forum.allaboutcircuits.com/threads/difference-between-polling-dma-and-interrupt.152493/
https://www.eventhelix.com/fault-handling/dma-interrupt-handling/

File System


File system interface

  • read() / write()
  • lseek()
  • rename()
  • stat()
  • unlink()
  • mkdir() rmdir()
  • Hard link / Soft link
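
As a rough illustration of the calls listed above, the following sketch exercises them through Python's `os` module, which wraps the corresponding system calls. The file names and the temporary directory are made up for the demo.

```python
import os
import tempfile


def demo_file_interface():
    """Run mkdir, write, lseek, read, rename, stat, unlink and rmdir once."""
    with tempfile.TemporaryDirectory() as root:
        subdir = os.path.join(root, "demo")
        os.mkdir(subdir)                                    # mkdir()
        path = os.path.join(subdir, "a.txt")
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            os.write(fd, b"hello world")                    # write()
            os.lseek(fd, 6, os.SEEK_SET)                    # lseek() to offset 6
            tail = os.read(fd, 5)                           # read() 5 bytes
        finally:
            os.close(fd)
        new_path = os.path.join(subdir, "b.txt")
        os.rename(path, new_path)                           # rename()
        size = os.stat(new_path).st_size                    # stat()
        os.unlink(new_path)                                 # unlink()
        os.rmdir(subdir)                                    # rmdir()
        return tail, size, os.path.exists(subdir)
```

Hard and soft links are demonstrated separately at the end of these notes.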

Block

Block: The smallest unit writable by a disk or file system. Everything a file system does is composed of operations done on blocks. A file system block is always the same size as or larger (in integer multiples) than the disk block size.

block size

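With a larger block size, fewer I/O operations are needed to move the same amount of data. However, if the block size is large and the files are small, a lot of storage space is wasted.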

[root@ip-172-16-1-245 ~]# blockdev --getbsz /dev/nvme0n1p1
4096
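
The space trade-off can be made concrete with a small calculation. This is a simplified model that assumes every file occupies a whole number of blocks and ignores features such as inline data or sparse files; the function names are illustrative.

```python
def allocated_bytes(file_size, block_size):
    """Bytes occupied on disk when space is handed out in whole blocks."""
    blocks = -(-file_size // block_size)  # ceiling division
    return blocks * block_size


def wasted_bytes(file_size, block_size):
    """Unused space in the last block of the file."""
    return allocated_bytes(file_size, block_size) - file_size
```

For example, a 1211-byte file on a 4096-byte block size occupies one full block and leaves 2885 bytes unused; with a 65536-byte block size the same file would leave 64325 bytes unused.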

Block usage

[root@ip-172-16-1-245 ~]# df -B 4096 /dev/nvme0n1p2
Filesystem     4K-blocks    Used Available Use% Mounted on
/dev/nvme0n1p2  13104123 4746275   8357848  37% /

block group

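An ext file system divides the entire space into block groups of equal size.
The number of blocks per group is fixed when the file system is created and cannot be changed afterwards. It is generally 8 × the block size in bytes, because each group's block bitmap occupies one block and each bit tracks one block: with 4096-byte blocks, that is 32768 blocks per group.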
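
A quick sketch of that arithmetic, assuming the default of one bitmap block per group (the function name is illustrative):

```python
def block_group_layout(total_blocks, block_size):
    """Return (blocks per group, number of groups) for an ext-style layout."""
    blocks_per_group = 8 * block_size  # one bit per block in a one-block bitmap
    full_groups, remainder = divmod(total_blocks, blocks_per_group)
    group_count = full_groups + (1 if remainder else 0)
    return blocks_per_group, group_count
```

For a file system of 26214400 blocks of 4096 bytes each, this gives 32768 blocks per group and 800 groups.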

inode

The file name and its inode number are stored in the directory entry, not in the inode itself.

Information stored in an inode

=> File type (executable, block special etc)
=> Permissions (read, write etc)
=> Owner
=> Group
=> File Size
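=> File access, change and modification times (traditional UNIX file systems do not store a file creation time, a favorite question in UNIX/Linux sysadmin job interviews; newer file systems such as ext4 do record a birth time, although stat may still display it as "-")
=> File deletion time
=> Number of hard links (soft links are not counted)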
=> Access Control List (ACLs)

Each inode is identified by an inode number that is unique within the file system. The inode number is also known as the index number.
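
Most of these fields can be read from user space. The sketch below uses `os.lstat` to collect them into a dictionary; the dictionary keys and the `demo_describe_inode` helper are illustrative names, not part of any standard API.

```python
import os
import stat
import tempfile


def describe_inode(path):
    """Collect the inode metadata of `path` without following symlinks."""
    st = os.lstat(path)
    if stat.S_ISDIR(st.st_mode):
        kind = "directory"
    elif stat.S_ISLNK(st.st_mode):
        kind = "symbolic link"
    elif stat.S_ISREG(st.st_mode):
        kind = "regular file"
    else:
        kind = "other"
    return {
        "inode": st.st_ino,
        "type": kind,
        "permissions": stat.filemode(st.st_mode),
        "uid": st.st_uid,
        "gid": st.st_gid,
        "size": st.st_size,
        "links": st.st_nlink,
        "atime": st.st_atime,
        "mtime": st.st_mtime,
        "ctime": st.st_ctime,
    }


def demo_describe_inode():
    """Describe a freshly written 5-byte temporary file."""
    with tempfile.NamedTemporaryFile() as handle:
        handle.write(b"1111\n")
        handle.flush()
        return describe_inode(handle.name)
```

Note that the file name is not among the returned fields, which matches the point that names live in directory entries.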

Viewing inode information

inode number of file

[root@ip-172-31-15-250 ~]# ls -i /etc/passwd
199355 /etc/passwd

inode count of the partition:

[root@ip-172-16-1-245 ~]# df -i /dev/nvme0n1p2
Filesystem       Inodes  IUsed    IFree IUse% Mounted on
/dev/nvme0n1p2 26213312 266096 25947216    2% /

inode number and its attribute:

[root@ip-172-31-15-250 ~]# stat /etc/passwd
  File: /etc/passwd
  Size: 1211       Blocks: 8          IO Block: 4096   regular file
Device: 10302h/66306d   Inode: 199355      Links: 1
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: system_u:object_r:passwd_file_t:s0
Access: 2021-07-23 03:54:19.855810229 +0000
Modify: 2021-07-16 03:20:47.567588251 +0000
Change: 2021-07-16 03:20:47.569588258 +0000
 Birth: -

[root@ip-172-31-15-250 tmp]# ls -il
total 0
104868177 drwx------. 2 ec2-user ec2-user  6 Jul 16 06:03 pyright-13956-1gvvF0BV2Ji6
 41948605 drwx------. 2 ec2-user ec2-user  6 Jul 16 06:09 pyright-14456-8buzo8tCIPex
 88083382 drwxrwxr-x. 2 ec2-user ec2-user  6 Jul 16 12:19 python-languageserver-cancellation
134221180 drwx------. 3 root     root     17 Jul 22 07:45 snap.stress-ng
   402020 drwx------. 3 root     root     17 Jul 22 07:39 systemd-private-5c20985bc0fe48b39c239f318d398681-chronyd.service-wyjiUf
 33554683 drwxrwxr-x. 2 ec2-user ec2-user  6 Jul 16 12:19 vscode-typescript1000

Find and delete a file by inode number

find . -inum 782263 -exec rm -i {} \;
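
The lookup half of that command can be approximated in Python by walking the tree and comparing inode numbers. This is a sketch only: `find_by_inode` and the demo file names are illustrative, and unlike `find` it does nothing to stay on a single file system, where inode numbers are unique.

```python
import os
import tempfile


def find_by_inode(root, inode_number):
    """Return every path under `root` whose inode number matches."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_ino == inode_number:
                matches.append(path)
    return sorted(matches)


def demo_find_by_inode():
    """Create a file plus a hard link and look both up by inode number."""
    with tempfile.TemporaryDirectory() as root:
        original = os.path.join(root, "a.txt")
        with open(original, "w") as handle:
            handle.write("data")
        os.link(original, os.path.join(root, "b.txt"))
        with open(os.path.join(root, "c.txt"), "w") as handle:
            handle.write("other")
        inode = os.stat(original).st_ino
        return [os.path.basename(p) for p in find_by_inode(root, inode)]
```

Because hard links share an inode, a search by inode number returns every name that points to the same file.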

https://www.howtogeek.com/465350/everything-you-ever-wanted-to-know-about-inodes-on-linux/

Access Path

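Read path

[Figure: file read path]
The write that appears in the read path updates the file's access time.

Write path

[Figure: file write path]

The dentry cache exists to reduce the time these lookups take.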

Dentry Cache

A dentry (short for “directory entry”) is what the Linux kernel uses to keep track of the hierarchy of files in directories. Each dentry maps an inode number to a file name and a parent directory.
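A dentry is created whenever a directory is opened (more generally, whenever a path component is looked up for the first time).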

[root@ip-172-16-1-245 ~]# cat /proc/slabinfo  | grep dentry
dentry             25633  26124    192   21    1 : tunables    0    0    0 : slabdata   1244   1244      0
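
The columns of a slabinfo line are, in order, the cache name, active objects, total objects, object size in bytes, objects per slab and pages per slab. A minimal parser for such a line might look like this (the class and function names are illustrative):

```python
from typing import NamedTuple


class SlabStat(NamedTuple):
    name: str
    active_objs: int
    num_objs: int
    objsize: int
    objperslab: int
    pagesperslab: int


def parse_slabinfo_line(line):
    """Parse the first six columns of one /proc/slabinfo data line."""
    fields = line.split()
    return SlabStat(fields[0], *(int(value) for value in fields[1:6]))


def slab_bytes(slab):
    """Approximate memory held by the cache's objects, in bytes."""
    return slab.num_objs * slab.objsize
```

Applied to the line above, this reports 26124 dentry objects of 192 bytes each, roughly 5 MB of dentry cache.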

Super Block

A superblock is a record of the characteristics of a filesystem, including its size, the block size, the empty and the filled blocks and their respective counts, the size and location of the inode tables, the disk block map and usage information, and the size of the block groups.

An inode is a data structure on a filesystem on a Unix-like operating system that stores all the information about a file except its name and its actual data. A data structure is a way of storing data so that it can be used efficiently; different types of data structures are suited to different types of applications, and some are highly specialized for specific types of tasks.

If the superblock of a file system is corrupted, you will have trouble mounting that file system. The system verifies and updates the superblock each time the file system is mounted.

Information stored in the superblock

  • Number of blocks in the file system
  • Number of free blocks in the file system
  • Inodes per block group
  • Blocks per block group
  • Number of times the file system was mounted since the last fsck
  • Mount time
  • UUID of the file system
  • Write time
  • File system state (i.e. whether it was cleanly unmounted, errors detected, etc.)
  • The file system type (i.e. whether it is ext2, ext3 or ext4)
  • The operating system on which the file system was formatted

Super Block location

ext2/3/4

[root@ip-172-31-15-250 ~]# mke2fs -n /dev/nvme1n1
mke2fs 1.45.6 (20-Mar-2020)
Found a dos partition table in /dev/nvme1n1
Proceed anyway? (y,N) y
Creating filesystem with 26214400 4k blocks and 6553600 inodes
Filesystem UUID: 51b71a80-7297-43f6-b630-4c078c552646
Superblock backups stored on blocks: 
        32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208, 
        4096000, 7962624, 11239424, 20480000, 23887872

[root@ip-172-31-15-250 ~]# dumpe2fs /dev/nvme1n1p1 |grep -i superblock
dumpe2fs 1.45.6 (20-Mar-2020)
  Primary superblock at 0, Group descriptors at 1-13
  Backup superblock at 32768, Group descriptors at 32769-32781
  Backup superblock at 98304, Group descriptors at 98305-98317
  Backup superblock at 163840, Group descriptors at 163841-163853
  Backup superblock at 229376, Group descriptors at 229377-229389
  Backup superblock at 294912, Group descriptors at 294913-294925
  Backup superblock at 819200, Group descriptors at 819201-819213
  Backup superblock at 884736, Group descriptors at 884737-884749
  Backup superblock at 1605632, Group descriptors at 1605633-1605645
  Backup superblock at 2654208, Group descriptors at 2654209-2654221
  Backup superblock at 4096000, Group descriptors at 4096001-4096013
  Backup superblock at 7962624, Group descriptors at 7962625-7962637
  Backup superblock at 11239424, Group descriptors at 11239425-11239437
  Backup superblock at 20480000, Group descriptors at 20480001-20480013
  Backup superblock at 23887872, Group descriptors at 23887873-23887885
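
The backup locations follow a pattern. With the ext `sparse_super` feature, backups are kept in block group 1 and in groups whose numbers are powers of 3, 5 and 7. The sketch below reproduces the list under two assumptions: 8 × block size blocks per group, and a first data block of 1 for 1024-byte blocks and 0 otherwise. The function name is illustrative.

```python
def backup_superblock_blocks(total_blocks, block_size=4096):
    """Block numbers of backup superblocks under the sparse_super layout."""
    blocks_per_group = 8 * block_size
    first_data_block = 1 if block_size == 1024 else 0
    group_count = -(-(total_blocks - first_data_block) // blocks_per_group)

    groups = {1}
    for base in (3, 5, 7):
        group = base
        while group < group_count:
            groups.add(group)
            group *= base

    return [
        first_data_block + group * blocks_per_group
        for group in sorted(groups)
        if group < group_count
    ]
```

For the 26214400-block file system above this yields 32768, 98304, 163840 and so on up to 23887872, matching the mke2fs and dumpe2fs output. With 1024-byte blocks the first backup lands at block 8193.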

XFS

[root@ip-172-31-15-250 ~]# xfs_info /dev/nvme0n1p2
meta-data=/dev/nvme0n1p2         isize=512    agcount=34, agsize=393216 blks
         =                       sectsz=512   attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=1
data     =                       bsize=4096   blocks=13106683, imaxpct=25
         =                       sunit=0      swidth=0 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=0 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

https://linux.die.net/man/8/tune2fs

Power failure

[Figures: power failure]

Disk repair

Repair using a backup superblock

e2fsck -f -b 8193 /dev/sda3

FSCK

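fsck checks all blocks marked as free in the cylinder group block maps, that is, blocks not occupied by any file (note: a cylinder group is a subdivision of the file system, comparable to an ext block group). It verifies that the number of free blocks plus the number of blocks claimed by inodes equals the total number of blocks in the file system.
If there are any errors in the block allocation maps, fsck rebuilds them from the allocated blocks it has computed.
The superblock also stores the total number of free blocks. fsck compares its own count with the value in the superblock, and if the two differ, it writes its own count to the superblock.
fsck supports repairing multiple file system types.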
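
The accounting described above can be modelled with a toy example. This is not how e2fsck is implemented; it only shows the invariant being checked. The inode table is a hypothetical dictionary mapping inode numbers to the block numbers each inode claims.

```python
def rebuild_allocation(total_blocks, inode_blocks, stored_bitmap, stored_free_count):
    """Recompute the allocation map from inode claims and compare with disk."""
    claimed = set()
    for blocks in inode_blocks.values():
        claimed.update(blocks)

    rebuilt_bitmap = [block in claimed for block in range(total_blocks)]
    free_count = total_blocks - len(claimed)

    return {
        "bitmap": rebuilt_bitmap,
        "free_count": free_count,
        # True when the on-disk values disagree with the recomputed ones
        "bitmap_fixed": rebuilt_bitmap != stored_bitmap,
        "superblock_fixed": free_count != stored_free_count,
    }
```

Free blocks plus claimed blocks always add up to the total here, because the map is derived from the inode claims rather than trusted from disk.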

[root@ip-172-31-15-250 ~]# fsck.ext4 /dev/nvme1n1p1 
e2fsck 1.45.6 (20-Mar-2020)
/dev/nvme1n1p1: clean, 11/6553600 files, 557848/26214144 blocks

The exit code returned by fsck is the sum of the following conditions:
0 No errors
1 Filesystem errors corrected
2 System should be rebooted
4 Filesystem errors left uncorrected
8 Operational error
16 Usage or syntax error
32 Checking canceled by user request
128 Shared-library error
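
Because each condition is a distinct power of two, a script can decode a combined exit code by testing individual bits. A minimal sketch (the function and table names are illustrative):

```python
FSCK_EXIT_FLAGS = {
    1: "Filesystem errors corrected",
    2: "System should be rebooted",
    4: "Filesystem errors left uncorrected",
    8: "Operational error",
    16: "Usage or syntax error",
    32: "Checking canceled by user request",
    128: "Shared-library error",
}


def decode_fsck_exit(code):
    """Translate an fsck exit code into the list of conditions it encodes."""
    if code == 0:
        return ["No errors"]
    return [text for bit, text in FSCK_EXIT_FLAGS.items() if code & bit]
```

An exit code of 5, for example, means that some errors were corrected and others were left uncorrected.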

The -f (“force”) option specifies that fsck should check parts of the filesystem even if they are not “dirty”. The result is a less efficient, but a more thorough check.

https://www.linuxprobe.com/linux-fsck-command.html
https://blog.csdn.net/weixin_30704893/article/details/116907035

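Non-XFS (ext2/3/4)

# e2fsck -f /dev/sda3    force a check (unmount the disk first)
# mke2fs -n /dev/sda3    locate the superblocks (dry run, nothing is written)
# dumpe2fs /dev/sda3|grep -i superblock
# e2fsck -f -b 8193 /dev/sda3  repair using the backup superblock at block 8193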

XFS

xfs_repair /dev/vdb1
xfs_check

Reference links
https://developer.ibm.com/tutorials/l-linux-filesystem/
http://www.linfo.org/superblock
https://zorrozou.github.io/docs/xfs/XFS%E6%96%87%E4%BB%B6%E7%B3%BB%E7%BB%9F%E7%BB%93%E6%9E%84.html
https://help.ubuntu.com/community/LinuxFilesystemsExplained

Journal

A journaling filesystem maintains a journal, which is a data structure that describes pending operations. Before writing data to the disk's main data structures, Linux describes what it is about to do in the journal. When the operations are complete, their entries are removed from the journal. Thus, at any given moment the journal contains a list of disk structures that might be undergoing modification. In the event of a crash or power failure, the system can examine the journal and check only the data structures described in it. If inconsistencies are found, the system can roll back or complete the changes, returning the disk to a consistent state without checking every data structure in the filesystem. This greatly speeds up the disk check after power failures and system crashes. Today, journaling filesystems are the standard for most Linux disk partitions.
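
The idea can be illustrated with a toy in-memory model. This is a sketch of the write-ahead principle only, not of any real filesystem's journal format: `JournaledStore` and its `crash_before_apply` switch are hypothetical, and recovery here always completes pending operations rather than rolling them back.

```python
class JournaledStore:
    """Toy key-value store that journals each write before applying it."""

    def __init__(self):
        self.data = {}
        self.journal = []

    def write(self, key, value, crash_before_apply=False):
        entry = (key, value)
        self.journal.append(entry)      # 1. describe the pending operation
        if crash_before_apply:
            return                      # simulated crash: data not yet updated
        self.data[key] = value          # 2. update the main data structure
        self.journal.remove(entry)      # 3. operation complete, drop the entry

    def recover(self):
        """Replay only what the journal lists, then clear it."""
        replayed = list(self.journal)
        for key, value in replayed:
            self.data[key] = value
        self.journal.clear()
        return replayed
```

After a simulated crash, recovery touches only the entries left in the journal instead of scanning all of `data`, which is the speed-up described above.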

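Soft links vs hard links

Hard links
In Linux, a directory listing is really a list of references that map names to inodes. A hard link is another reference to the same inode as the original file. Hard links let a user have two names for identical content without copying the data on disk. Unlike a copy, however, modifying the file through a hard link also modifies the original, because both names refer to the same inode. Hard links cannot be created across file systems.

How to create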

ln

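Soft links
A symbolic link resembles a hard link in that it links to an existing file, but it is implemented very differently. A symbolic link is not a reference to the target's inode; it is a pointer that redirects to another file or directory.

How to create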

ln -s

Inode of the original file

[root@ip-172-31-15-250 tmp]# stat realfile
  File: realfile
  Size: 5          Blocks: 8          IO Block: 4096   regular file
Device: 10302h/66306d   Inode: 4935185     Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2021-07-23 09:52:02.646626931 +0000
Modify: 2021-07-23 09:51:24.655419747 +0000
Change: 2021-07-23 09:52:28.492768635 +0000
 Birth: -

Inode of the soft link

[root@ip-172-31-15-250 tmp]# stat sfile
  File: sfile -> realfile
  Size: 8          Blocks: 0          IO Block: 4096   symbolic link
Device: 10302h/66306d   Inode: 4935186     Links: 1
Access: (0777/lrwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2021-07-23 09:52:04.097634933 +0000
Modify: 2021-07-23 09:52:00.076612963 +0000
Change: 2021-07-23 09:52:00.076612963 +0000
 Birth: -

Inode of the hard link

[root@ip-172-31-15-250 tmp]# stat hfile
  File: hfile
  Size: 5          Blocks: 8          IO Block: 4096   regular file
Device: 10302h/66306d   Inode: 4935185     Links: 2
Access: (0644/-rw-r--r--)  Uid: (    0/    root)   Gid: (    0/    root)
Context: unconfined_u:object_r:user_tmp_t:s0
Access: 2021-07-23 09:52:02.646626931 +0000
Modify: 2021-07-23 09:51:24.655419747 +0000
Change: 2021-07-23 09:52:28.492768635 +0000
 Birth: -

After the original file is deleted, the hard link is still accessible but the soft link is not

[root@ip-172-31-15-250 tmp]# rm -rf realfile
[root@ip-172-31-15-250 tmp]#
[root@ip-172-31-15-250 tmp]# cat sfile
cat: sfile: No such file or directory
[root@ip-172-31-15-250 tmp]# cat hfile
1111
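
The same experiment can be reproduced programmatically. The sketch below uses `os.link` and `os.symlink` in a temporary directory, reusing the file names realfile, hfile and sfile from the transcript; the function name and return layout are illustrative.

```python
import os
import tempfile


def demo_links():
    """Create a hard link and a soft link, delete the original, then compare."""
    with tempfile.TemporaryDirectory() as root:
        real = os.path.join(root, "realfile")
        hard = os.path.join(root, "hfile")
        soft = os.path.join(root, "sfile")

        with open(real, "w") as handle:
            handle.write("1111\n")
        os.link(real, hard)              # equivalent of: ln realfile hfile
        os.symlink("realfile", soft)     # equivalent of: ln -s realfile sfile

        same_inode = os.stat(real).st_ino == os.stat(hard).st_ino
        link_count = os.stat(real).st_nlink
        soft_has_own_inode = os.lstat(soft).st_ino != os.stat(real).st_ino

        os.unlink(real)                  # delete the original name

        with open(hard) as handle:
            hard_content = handle.read()
        soft_resolves = os.path.exists(soft)    # follows the link to its target
        soft_still_listed = os.path.lexists(soft)

        return (same_inode, link_count, soft_has_own_inode,
                hard_content, soft_resolves, soft_still_listed)
```

The hard link keeps the data reachable because the inode's link count is still above zero, while the soft link remains in the directory but points to a name that no longer exists.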