How does the Linux kernel find out which process to wake up during interrupt handling?
Q1: “It” is wake_up. It wakes up all tasks that are waiting for the disk data. If they weren't waiting for that data, they wouldn't be waiting on that queue.
Q2: I'm not sure I understand the question. Each wait queue entry contains a pointer to the task. try_to_wake_up receives a pointer to the task that it's supposed to wake up. It is called once per function.
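To make Q1 and Q2 concrete, here is a minimal userspace sketch of the idea, not kernel code. The names sleep_on, model_wake_up and model_try_to_wake_up and all the types are hypothetical simplifications I made up for illustration; the real kernel structures carry more state (locks, flags, a wake function per entry).

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model only: simplified stand-ins for the kernel's types. */
enum task_state { TASK_RUNNING, TASK_SLEEPING };

struct task {
    int pid;
    enum task_state state;
};

/* One entry per waiting task; each entry holds a pointer to its task. */
struct wait_entry {
    struct task *task;
    struct wait_entry *next;
};

struct wait_queue {
    struct wait_entry *head;
};

/* Mark the task as sleeping and link it onto the queue.
 * The caller supplies the storage for the entry. */
static void sleep_on(struct wait_queue *q, struct wait_entry *e, struct task *t)
{
    assert(q != NULL && e != NULL && t != NULL);
    t->state = TASK_SLEEPING;
    e->task = t;
    e->next = q->head;
    q->head = e;
}

/* Wake exactly one task, identified by the pointer passed in.
 * Returns 1 if the task was sleeping, 0 otherwise. */
static int model_try_to_wake_up(struct task *t)
{
    if (t->state != TASK_SLEEPING)
        return 0;
    t->state = TASK_RUNNING;
    return 1;
}

/* Called once per event: walks this queue only and wakes every task on it.
 * Returns the number of tasks woken and leaves the queue empty. */
static int model_wake_up(struct wait_queue *q)
{
    int woken = 0;
    while (q->head != NULL) {
        struct wait_entry *e = q->head;
        q->head = e->next;
        woken += model_try_to_wake_up(e->task);
    }
    return woken;
}
```

Tasks sleeping on a different queue are never touched, which is the whole point: the queue itself is what says who is waiting for this event.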
Q3: There are lots of wait queues. There's one for every event that can happen. The disk driver sets up a wait queue for each request to the disk. For example, when the filesystem driver wants the content of a certain disk block, it asks the disk driver for that block, and the request's wait queue starts out containing the task that made the filesystem request. Other entries may be added to the wait queue if another request for the same block comes in while this one is still outstanding.
When an interrupt happens, the disk driver determines which disk has data available from the information passed by the hardware and looks up the data structure that contains the kernel data for this disk to find which request was to be filled. This data structure holds, among other things, the location where the data is to be written and the corresponding wait queue indicating what to do next.
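The lookup described above can be sketched roughly as follows. This is a userspace model under my own assumptions, not how any real driver is written: the request table, submit_read, disk_interrupt and the fixed sizes are all hypothetical, and a second waiter on the same block simply shares the first request's destination buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_REQUESTS 8
#define MAX_WAITERS  4
#define BLOCK_SIZE   16

struct waiter {
    int pid;
    int awake;                      /* 0 while sleeping, 1 once woken */
};

/* One outstanding request, with its own list of waiting tasks. */
struct disk_request {
    int in_use;
    int disk;
    long block;
    char *dest;                     /* where the data is to be written */
    struct waiter *waiters[MAX_WAITERS];
    size_t nwaiters;
};

static struct disk_request requests[MAX_REQUESTS];

static struct disk_request *find_request(int disk, long block)
{
    for (size_t i = 0; i < MAX_REQUESTS; i++)
        if (requests[i].in_use && requests[i].disk == disk &&
            requests[i].block == block)
            return &requests[i];
    return NULL;
}

/* Returns 1 if a new request was started, 0 if the caller joined an
 * outstanding request for the same block, -1 if there is no room. */
static int submit_read(int disk, long block, char *dest, struct waiter *w)
{
    struct disk_request *r = find_request(disk, block);
    int started = 0;

    if (r == NULL) {
        for (size_t i = 0; i < MAX_REQUESTS && r == NULL; i++)
            if (!requests[i].in_use)
                r = &requests[i];
        if (r == NULL)
            return -1;
        r->in_use = 1;
        r->disk = disk;
        r->block = block;
        r->dest = dest;
        r->nwaiters = 0;
        started = 1;
    }
    if (r->nwaiters == MAX_WAITERS)
        return -1;
    w->awake = 0;                   /* the caller now sleeps on this request */
    r->waiters[r->nwaiters++] = w;
    return started;
}

/* Model of the interrupt handler: the "hardware" says which disk and block
 * completed. Returns the number of tasks woken (0 if nothing was pending). */
static int disk_interrupt(int disk, long block, const char *data)
{
    struct disk_request *r = find_request(disk, block);
    int woken = 0;

    if (r == NULL)
        return 0;
    memcpy(r->dest, data, BLOCK_SIZE);
    for (size_t i = 0; i < r->nwaiters; i++) {
        r->waiters[i]->awake = 1;
        woken++;
    }
    r->in_use = 0;                  /* request is finished, slot can be reused */
    return woken;
}
```

Note that the handler never searches for processes. It finds the request from what the hardware reported, and the request already carries the list of who to wake.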
Q4: The process makes a system call, let's say to read a file. This triggers some code in the filesystem driver which determines that the data needs to be loaded from the disk. That code makes a request to the disk driver and adds the calling process to the request's wait queue. (There are actually more layers than that, but you get the idea.) When the disk read completes, the wait queue event triggers, and the process is thus removed from the request's wait queue. The code triggered by the wait queue event is a function supplied by the filesystem driver, which copies the data to the process's memory and causes the read system call to return.
To build on Gilles’s answer,
Q2: I’m not sure, but I would interpret the passage from the book
to mean that the interrupt handler calls wake_up() once
(passing the queue identifier as an argument),
and wake_up() calls try_to_wake_up()
for each process that is on that queue.
(This reiterates Gilles’s answer to Q1:
it doesn’t wake all tasks, just the ones that are on the wait queue
associated with the event that called wake_up().)
Q3: Each wait queue is “owned” by some piece of kernel code —
mostly device drivers, some others.
The routine that owns a queue assigns it a unique identifier,
based on some unique characteristic of the event that the queue is for.
When it (the driver/other module) puts a process to sleep,
it specifies (by ID) what queue to put it on.
The routine that calls wake_up()
(typically an interrupt handler)
must be part of the same module that put the process to sleep,
so it knows the identifier for the queue
that corresponds to the event that happened.
Last time I looked at Unix kernel source code (which was many years ago), the disk drivers had a different event ID for each I/O request. As Gilles says, multiple processes can be waiting for the same event if they are reading the same file at the same time. (This, of course, also relates to Q1.)
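For what it's worth, here is a small userspace sketch of that older event-ID style as I remember it, where the ID is just an address and waking means scanning the process table for matching sleepers. The names model_sleep, model_wakeup, proc_table and the wchan field are my own for this illustration, and using the address of a per-request object as the ID is an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define NPROC 8

struct proc {
    int pid;                /* 0 means the slot is unused */
    const void *wchan;      /* event ID this process sleeps on, NULL if runnable */
};

static struct proc proc_table[NPROC];

/* Record which event the process is waiting for. */
static void model_sleep(struct proc *p, const void *chan)
{
    assert(p != NULL && chan != NULL);
    p->wchan = chan;
}

/* Make every process sleeping on this event ID runnable again.
 * Returns the number of processes woken. */
static int model_wakeup(const void *chan)
{
    int woken = 0;
    for (size_t i = 0; i < NPROC; i++) {
        if (proc_table[i].pid != 0 && proc_table[i].wchan == chan) {
            proc_table[i].wchan = NULL;
            woken++;
        }
    }
    return woken;
}
```

The sleeper and the waker only have to agree on the ID, which is why they need to live in the same module. Two processes that sleep on the same ID are both woken by a single call.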
Q4: When I hear the phrase “disk controller”, I think of hardware.
But, aside from that, you’re right; the disk driver
(a software module in the kernel) has (at least potentially)
access to all information about any process that invokes it
(i.e., by doing disk I/O).
So, when the disk driver puts a process to sleep
because it has initiated a physical I/O that takes time to complete,
it (the driver) puts “the process’s PID, memory address or something”
into the wait queue.
Whatever this is, it is enough for try_to_wake_up()
to wake the process.
The last sentence of the passage that you quoted says,
“… VFS calls wake_up() on the wait queue …”.
I question whether this is literally accurate.
Filesystem code is a layer above the disk driver.
I would expect the interrupt (a signal from the disk hardware to the CPU)
to be handled by the disk interrupt handler (part of the disk driver)
which would wake up the process(es) that are waiting
(by calling wake_up()).
The driver would subsequently wake the filesystem code.
(This terminology might be imprecise, too.
It might be better to say that the driver does something
to allow the filesystem code to resume processing.)
The filesystem code might then return to the user process,
or it might invoke the disk driver again,
resulting in the process being put to sleep again.
I quibble with your step #4. If a device is using DMA, it won’t interrupt until after the data transfer has completed.