Are file edits in Linux directly saved into disk?
if I turn off the computer immediately after I edit and save a file, my changes will be most likely lost?
They might be. I wouldn't say "most likely", but the likelihood depends on a lot of things.
An easy way to increase performance of file writes, is for the OS to just cache the data, tell (lie to) the application the write went through, and then actually do the write later. This is especially useful if there's other disk activity going on at the same time: the OS can prioritize reads and do the writes later. It can also remove the need for an actual write completely, e.g., in the case where a temporary file is removed quickly afterwards.
The caching issue is more pronounced if the storage is slow. Copying files from a fast SSD to a slow USB stick will probably involve a lot of write caching, since the USB stick just can't keep up. But your cp
command returns faster, so you can carry on working, possibly even editing the files that were just copied.
Of course caching like that has the downside you note, some data might be lost before it's actually saved. The user will be miffed if their editor told them the write was successful, but the file wasn't actually on the disk. Which is why there's the fsync()
system call, which is supposed to return only after the file has actually hit the disk. Your editor can use that to make sure the data is fine before reporting to the user that the write succeeded.
I said, "is supposed to", since the drive itself might tell the same lies to the OS and say that the write is complete, while the file really only exists in a volatile write cache within the drive. Depending on the drive, there might be no way around that.
In addition to fsync()
, there are also the sync()
and syncfs()
system calls that ask the system to make sure all system-wide writes or all writes on a particular filesystem have hit the disk. The utility sync
can be used to call those.
Then there's also the O_DIRECT
flag to open()
, which is supposed to "try to minimize cache effects of the I/O to and from this file." Removing caching reduces performance, so that's mostly used by applications (databases) that do their own caching and want to be in control of it.
(O_DIRECT
isn't without its issues, the comments about it in the man page are somewhat amusing.)
What happens on a power-out also depends on the filesystem. It's not just the file data that you should be concerned about, but the filesystem metadata. Having the file data on disk isn't much use if you can't find it. Just extending a file to a larger size will require allocating new data blocks, and they need to be marked somewhere.
How a filesystem deals with metadata changes and the ordering between metadata and data writes varies a lot. E.g., with ext4
, if you set the mount flag data=journal
, then all writes – even data writes – go through the journal and should be rather safe. That also means they get written twice, so performance goes down. The default options try to order the writes so that the data is on the disk before the metadata is updated. Other options or other filesystem may be better or worse; I won't even try a comprehensive study.
In practice, on a lightly loaded system, the file should hit the disk within a few seconds. If you're dealing with removable storage, unmount the filesystem before pulling the media to make sure the data is actually sent to the drive, and there's no further activity. (Or have your GUI environment do that for you.)
There is an extremely simple way to prove that it cannot be true that file edits are always directly saved to disk, namely the fact that there are filesystems that aren't backed by a disk in the first place. If a filesystem doesn't have a disk in the first place, then it cannot possibly write the changes to disk, ever.
Some examples are:
tmpfs
, a file system that only exists in RAM (or more precisely, in the buffer cache)ramfs
, a file system that only exists in RAM- any network file system (NFS, CIFS/SMB, AFS, AFP, …)
- any virtual filesystem (
sysfs
,procfs
,devfs
,shmfs
, …)
But even for disk-backed file systems this is usually not true. The page How To Corrupt An SQLite Database has a chapter called Failure to sync which describes many different ways in which writes (in this cases commits to an SQLite database) can fail to arrive on disk. SQLite also has a white paper explaining the many hoops you have to jump through to guarantee Atomic Commit In SQLite. (Note that Atomic Write is a much harder than problem than just Write, but of course writing to disk is sub-problem of atomic writing, and you can learn a lot about that problem, too, from this paper.) This paper has a section on Things That Can Go Wrong which includes a subsection about Incomplete Disk Flushes that give some examples of subtle intricacies that might prevent a write from reaching the disk (such as the HDD controller reporting that it has written to disk when it fact it hasn't – yes, there are HDD manufacturers that do this, and it might even be legal according to the ATA spec, because it is ambiguously worded in this respect).
It is true that most operating systems, including Unix, Linux and Windows use a write cache to speed up operations. That means that turning a computer off without shutting it down is a bad idea and may lead to data loss. The same is true if you remove an USB storage before it is ready to be removed.
Most systems also offer the option to make writes synchronous. That means that the data will be on disk before an application receives a success confirmation, at the cost of being slower.
In short, there is a reason why you should properly shut down your computer and properly prepare USB storage for removal.