Virtual file systems are the magic abstraction that makes the "everything is a file" philosophy of Linux possible.
What is a file system? Based on the words of one of the first contributors and authors of Linux
Filesystem Basics
The Linux kernel has certain requirements for an entity that can be considered a file system. It must implement the methods open(), read(), and write() for persistent objects that have names. From an object-oriented point of view
If we can open, read, and write to an entity, then that entity is considered a file, as we can see from the example in the console above.
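As a minimal sketch of that idea, the Python snippet below treats two device nodes exactly like regular files. It assumes a Unix-like system where /dev/null and /dev/zero exist; the helper names write_to and read_from are made up for this illustration.

```python
import os


def write_to(path, data):
    """Open an entity, write bytes to it, and return how many were accepted."""
    fd = os.open(path, os.O_WRONLY)
    try:
        return os.write(fd, data)
    finally:
        os.close(fd)


def read_from(path, count):
    """Open an entity and read up to `count` bytes from it."""
    fd = os.open(path, os.O_RDONLY)
    try:
        return os.read(fd, count)
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Neither path is an on-disk file, yet the same three calls work.
    print(write_to("/dev/null", b"hello\n"))
    print(read_from("/dev/zero", 4))
```

Nothing in the calling code needs to know what kind of entity sits behind the path, which is the point of the abstraction.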
The existence of VFS underscores the observation that on Unix-like systems "everything is a file". Consider how odd it is that the tiny /dev/console example above shows how the console actually works. The picture shows an interactive Bash session. Sending a string to the virtual console device displays it on a virtual screen. VFS has other, even stranger properties. For example, it allows you to search by
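Familiar file systems such as ext4, NFS, and /proc all define the three important functions in a C data structure called file_operations. On behalf of userspace, a process can take data in with the read() method of one file system and then use the write() method of another file system for data output.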
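A short sketch of such a copy, seen from userspace, might look like the following. The function name copy_between and the chunk size are assumptions made for this example; the only calls involved are the generic open, read, and write, regardless of which file systems hold the two paths.

```python
import os


def copy_between(src, dst, chunk=4096):
    """Copy src to dst with plain read()/write() calls; return bytes copied."""
    total = 0
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
        try:
            while True:
                buf = os.read(in_fd, chunk)
                if not buf:  # an empty read means there is no more data
                    break
                view = memoryview(buf)
                while view:  # write() may accept fewer bytes than offered
                    written = os.write(out_fd, view)
                    view = view[written:]
                    total += written
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
    return total
```

The source could be a file under /proc and the destination a file on tmpfs or a disk; the loop stays the same.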
The function definitions that belong to the base VFS types are in the files in fs/ in the kernel source, while its subdirectories contain the specific file systems. The kernel also contains file-system-like entities such as cgroups, /dev, and tmpfs, which are required early in the boot process and are therefore defined in the kernel subdirectory init/. Note that cgroups, /dev, and tmpfs do not call the "big three" file_operations functions, but read from and write to memory directly.
The diagram below shows how userspace accesses the different types of file systems commonly mounted on Linux systems. Not shown are constructs such as pipes, dmesg, and POSIX clocks, which also implement the file_operations structure and are therefore accessed through the VFS layer.
VFS is a "wrapper layer" between system calls and the implementations of specific file_operations, such as ext4 and procfs. The file_operations functions can interact with either device drivers or memory accessors. tmpfs, devtmpfs, and cgroups do not use file_operations, but access memory directly.
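The wrapper-layer idea can be mimicked in a few lines of Python. This is only a loose analogy: the real file_operations is a C structure of function pointers with many more members, and every name below (FileOperations, REGISTRY, vfs_read, vfs_write, and the two toy "file systems") is hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class FileOperations:
    """Toy stand-in for a table of function pointers."""
    read: Callable[[str], bytes]
    write: Callable[[str, bytes], int]


# Hypothetical backing store for a disk-like file system.
_disk: Dict[str, bytes] = {}


def disk_read(name):
    return _disk.get(name, b"")


def disk_write(name, data):
    _disk[name] = data
    return len(data)


def proc_read(name):
    # Content is generated at read time rather than stored anywhere.
    return f"generated for {name}\n".encode()


def proc_write(name, data):
    raise PermissionError(f"{name} is read-only in this sketch")


REGISTRY = {
    "toydisk": FileOperations(read=disk_read, write=disk_write),
    "toyproc": FileOperations(read=proc_read, write=proc_write),
}


def vfs_read(fstype, name):
    """The wrapper: pick the implementation by type, then delegate."""
    return REGISTRY[fstype].read(name)


def vfs_write(fstype, name, data):
    return REGISTRY[fstype].write(name, data)
```

Callers only ever use vfs_read and vfs_write; adding another toy file system means adding one more entry to the table.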
The existence of VFS makes code reuse possible, since the basic methods associated with file systems do not have to be re-implemented by every file system type. Code reuse is a common practice among software engineers! However, if the reusable code contains serious bugs, every implementation that relies on the shared methods suffers from them.
/tmp: Simple hint
An easy way to find out which VFSes are present on a system is to type mount | grep -v sd | grep -v :/, which will list all mounted file systems that are neither disk-resident nor NFS, which is true on most computers. One of the listed VFS mounts will undoubtedly be /tmp, right?
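The same filter can be sketched in Python. The function below applies the two exclusions of the pipeline above to text in the format printed by mount; the name virtual_mounts is an assumption of this example.

```python
def virtual_mounts(mount_output):
    """Keep lines that contain neither 'sd' nor ':/' (the two grep -v filters)."""
    return [
        line
        for line in mount_output.splitlines()
        if "sd" not in line and ":/" not in line
    ]
```

It can be fed the captured output of the mount command. Like the shell version, it is a plain substring match, so a disk whose device name does not contain "sd" would slip through.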
Everyone knows that keeping /tmp on a physical storage medium is madness!
Why is it undesirable to store /tmp on physical media? Because the files in /tmp are temporary, and storage devices are slower than the memory where tmpfs is created. Moreover, physical media wear out from repeated writes more readily than memory does. Finally, files in /tmp can contain sensitive information, so making them disappear on every reboot is an essential feature.
Unfortunately, some Linux distribution installation scripts create /tmp on the storage device by default. Don't despair if this has happened on your system as well. A few simple instructions are enough to fix it, but keep in mind that memory allocated to tmpfs becomes unavailable for other purposes. In other words, a system with a giant tmpfs and large files on it can run out of memory and crash. Another hint: while editing the file /etc/fstab, remember that it must end with a newline, otherwise your system will not boot.
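Before rebooting, the edited file can be sanity-checked with a small script. The sketch below works on the text of an fstab-style file; the helper names and the sample entry are illustrative assumptions, not a recommended configuration.

```python
# Illustrative content only, not a recommendation for any particular system.
SAMPLE_FSTAB = (
    "# <device> <mount point> <type> <options> <dump> <pass>\n"
    "tmpfs /tmp tmpfs defaults,size=2G 0 0\n"
)


def ends_with_newline(text):
    """True if the content is non-empty and its last character is a newline."""
    return text.endswith("\n")


def tmpfs_entries(fstab_text):
    """Return (mount_point, options) for each non-comment line of type tmpfs."""
    entries = []
    for line in fstab_text.splitlines():
        fields = line.split()
        if len(fields) < 4 or fields[0].startswith("#"):
            continue
        if fields[2] == "tmpfs":
            entries.append((fields[1], fields[3]))
    return entries
```

Reading /etc/fstab into a string and passing it to both helpers shows whether the trailing newline is in place and which mount points are backed by tmpfs.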
/proc and /sys
Besides /tmp, the VFSes (virtual file systems) most familiar to Linux users are /proc and /sys. (/dev resides in shared memory and has no file_operations.) Why these two? Let's look into it.
procfs provides userspace with a snapshot of the kernel and the processes it controls. In /proc, the kernel publishes information about the facilities it provides, such as interrupts, virtual memory, and the scheduler. In addition, /proc/sys is where the parameters configured with the sysctl command are made available to userspace. The status and statistics of individual processes are reported in per-process directories under /proc/.
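As a small illustration of how such per-process information can be consumed, the sketch below parses "Key: value" text of the kind found in /proc/self/status. The function names are assumptions of this example, and read_self_status requires a system with /proc mounted.

```python
def parse_status(text):
    """Turn lines of the form 'Key:<whitespace>value' into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that contain no colon at all
            fields[key.strip()] = value.strip()
    return fields


def read_self_status():
    """Read the calling process's own status entry from procfs."""
    with open("/proc/self/status") as handle:
        return parse_status(handle.read())
```

Calling read_self_status() returns a dictionary from which fields such as the process name can be looked up by key.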
Here /proc/meminfo is an empty file that nevertheless contains valuable information.
The behavior of /proc files shows how different VFS can be from on-disk file systems. On the one hand, /proc/meminfo contains the information that can be viewed with the command free. On the other hand, it's empty! How does that work? The situation is reminiscent of a famous article: the kernel gathers the statistics only when a process requests them from /proc, and there is actually nothing in the files in /proc when no one is looking.
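One way to observe this is to compare the size that stat reports for a file with the number of bytes a read actually returns. The helper below is a sketch with a made-up name.

```python
import os


def reported_and_actual_size(path):
    """Return (size reported by stat, number of bytes actually read)."""
    reported = os.stat(path).st_size
    with open(path, "rb") as handle:
        actual = len(handle.read())
    return reported, actual
```

For an ordinary file the two numbers match. On a typical Linux system, /proc/meminfo would be expected to report a size of 0 while still returning data when read.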
The apparent emptiness of procfs makes sense, because the information there is dynamic. The situation with sysfs is slightly different. Let's compare how many files of at least one byte in size there are in /proc and in /sys.
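A rough way to make that comparison from Python is sketched below. The function name is an assumption; it counts regular files by the size stat reports and silently skips entries that disappear or cannot be inspected, which happens routinely under /proc.

```python
import os
import stat


def count_nonempty_files(root):
    """Count regular files under root whose reported size is at least one byte."""
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                info = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue  # entry vanished or is not accessible
            if stat.S_ISREG(info.st_mode) and info.st_size >= 1:
                count += 1
    return count
```

Running it once on /proc and once on /sys gives the two numbers to compare; the walk can take a while and the results depend on the kernel and on permissions.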
Procfs has exactly one such file, namely the exported kernel configuration, which is an exception because it only needs to be generated once per boot. In /sys, on the other hand, there are many larger files, many of which take up an entire page of memory. Files in sysfs usually contain exactly one number or string, unlike the tables of information obtained from reading files such as /proc/meminfo.
The purpose of sysfs is to expose to userspace the readable and writable properties of what the kernel calls "kobjects". The only purpose of kobjects is reference counting: when the last reference to a kobject is removed, the system reclaims the resources associated with it. Nevertheless, /sys makes up most of the kernel's famous stable ABI to userspace.
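Reference counting of this kind can be illustrated with a toy Python class. This is an analogy only and not the kernel's kobject interface; the class name, method names, and release callback are all hypothetical.

```python
class RefCounted:
    """Toy reference-counted object; `release` runs once when the count hits zero."""

    def __init__(self, name, release):
        self.name = name
        self._release = release
        self._count = 1  # the creator holds the first reference

    def get(self):
        """Take an additional reference."""
        if self._count <= 0:
            raise RuntimeError("object already released")
        self._count += 1
        return self

    def put(self):
        """Drop one reference and reclaim resources when none remain."""
        if self._count <= 0:
            raise RuntimeError("put() without a matching reference")
        self._count -= 1
        if self._count == 0:
            self._release(self)
```

Every get() must eventually be paired with a put(); only the final put() triggers the cleanup callback.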
The kernel's stable ABI limits what can appear in /sys, not what is actually present at any particular moment. Listing the permissions of files in sysfs gives an idea of which settings for devices, modules, file systems, and so on can be set or read. The logical conclusion is that procfs is also part of the kernel's stable ABI, although this is not explicitly stated.
Files in sysfs each describe exactly one property of an entity and can be readable, writable, or both. The "0" in the file means that the SSD is not removable.
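The readable/writable distinction can be read straight from the permission bits. The sketch below classifies a mode and applies that to every regular file in a directory; both function names are assumptions of this example.

```python
import os
import stat


def access_kind(mode):
    """Classify a file mode by whether any read or write permission bit is set."""
    readable = bool(mode & (stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH))
    writable = bool(mode & (stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH))
    if readable and writable:
        return "read/write"
    if readable:
        return "read-only"
    if writable:
        return "write-only"
    return "no access"


def list_attributes(directory):
    """Map each regular file in `directory` to its access kind."""
    result = {}
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            mode = entry.stat(follow_symlinks=False).st_mode
            result[entry.name] = access_kind(mode)
    return result
```

Pointing list_attributes at a device directory under /sys shows which attributes are merely reported and which can also be set.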
The second part of this translation will begin with how to monitor VFS using the eBPF and bcc tools. For now, we look forward to your comments.
Source: habr.com