Is a write operation in unix atomic?
To call the Posix semantics "atomic" is perhaps an oversimplification. Posix requires that reads and writes occur in some order:
Writes can be serialized with respect to other reads and writes. If a
read()
of file data can be proven (by any means) to occur after awrite()
of the data, it must reflect thatwrite()
, even if the calls are made by different processes. A similar requirement applies to multiple write operations to the same file position. This is needed to guarantee the propagation of data fromwrite()
calls to subsequentread()
calls. (from the Rationale section of the Posix specification forpwrite
andwrite
)
The atomicity guarantee mentioned in APUE refers to the use of the O_APPEND
flag, which forces writes to be performed at the end of the file:
If the
O_APPEND
flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation.
With respect to pread
and pwrite
, APUE says (correctly, of course) that these interfaces allow the application to seek and perform I/O atomically; in other words, that the I/O operation will occur at the specified file position regardless of what any other process does. (Because the position is specified in the call itself, and does not affect the persistent file position.)
The Posix sequencing guarantee is as follows (from the Description of the write()
and pwrite()
functions):
After a write() to a regular file has successfully returned:
Any successful
read()
from each byte position in the file that was modified by that write shall return the data specified by the write() for that position until such byte positions are again modified.Any subsequent successful
write()
to the same byte position in the file shall overwrite that file data.
As mentioned in the Rationale, this wording does guarantee that two simultaneous write
calls (even in different unrelated processes) will not interleave data, because if data were interleaved during a write which will eventually succeed the second guarantee would be impossible to provide. How this is accomplished is up to the implementation.
It must be noted that not all filesystems conform to Posix, and modular OS design, which allows multiple filesystems to coexist in a single installation, make it impossible for the kernel itself to provide guarantees about write
which apply to all available filesystems. Network filesystems are particularly prone to data races (and local mutexes won't help much either), as is mentioned as well by Posix (at the end of the paragraph quoted from the Rationale):
This requirement is particularly significant for networked file systems, where some caching schemes violate these semantics.
The first guarantee (about subsequent reads) requires some bookkeeping in the filesystem, because data which has been successfully "written" to a kernel buffer but not yet synched to disk must be made transparently available to processes reading from that file. This also requires some internal locking of kernel metadata.
Since writing to regular files is typically accomplished via kernel buffers and actually synching the data to the physical storage device is definitely not atomic, the locks necessary to provide these guarantee don't have to be very long-lasting. But they must be done inside the filesystem because nothing in the Posix wording limits the guarantees to simultaneous writes within a single threaded process.
Within a multithreaded process, Posix does require read()
, write()
, pread()
and pwrite()
to be atomic when they operate on regular files (or symbolic links). See Thread Interactions with Regular File Operations for a complete list of interfaces which must obey this requirement.