Tracing down Bio in Block Subsystems
I wrote this blog to answer two questions:
- Which layers are involved when an I/O request (say, writing something to a file) travels through a local machine?
- How does an md device (or any other block device, like the null block driver) receive its data?
Bottom halves¶
- Bottom halves perform the interrupt-related work that was not done by the interrupt handler (top half)
- They run with all interrupts enabled
- Deferring work means "not now"
- A work queue is a simple interface for deferring work to a generic kernel thread
runqueue & waitqueue
Interface¶
- queuing work to a workqueue[1]: `queue_work`, `queue_work_on`, `queue_delayed_work`, `queue_delayed_work_on`
- scheduling work to the shared system workqueue: `schedule_work`
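The idea behind these interfaces can be modeled in userspace: a "workqueue" is just a queue of work items drained by a generic worker thread. The sketch below uses pthreads; `my_queue_work` and friends are invented names, not the kernel API.

```c
#include <pthread.h>
#include <stddef.h>

struct work_item {
        void (*fn)(void *);
        void *arg;
        struct work_item *next;
};

static struct work_item *wq_head, *wq_tail;
static pthread_mutex_t wq_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wq_wake = PTHREAD_COND_INITIALIZER;
static int wq_stopping;

/* model of queue_work(): enqueue the item and wake the worker */
void my_queue_work(struct work_item *w)
{
        pthread_mutex_lock(&wq_lock);
        w->next = NULL;
        if (wq_tail)
                wq_tail->next = w;
        else
                wq_head = w;
        wq_tail = w;
        pthread_cond_signal(&wq_wake);
        pthread_mutex_unlock(&wq_lock);
}

/* the generic worker thread: drains items, then exits once told to stop */
void *wq_worker(void *unused)
{
        (void)unused;
        for (;;) {
                struct work_item *w;

                pthread_mutex_lock(&wq_lock);
                while (!wq_head && !wq_stopping)
                        pthread_cond_wait(&wq_wake, &wq_lock);
                if (!wq_head) {                 /* stopping and no work left */
                        pthread_mutex_unlock(&wq_lock);
                        return NULL;
                }
                w = wq_head;
                wq_head = w->next;
                if (!wq_head)
                        wq_tail = NULL;
                pthread_mutex_unlock(&wq_lock);

                w->fn(w->arg);  /* deferred work runs in thread context */
        }
}
```

This mirrors why no extra kernel thread is needed per user: one generic worker serves many queued items.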
Block drivers¶
- No need to spawn another kernel thread when using workqueues
- A waitqueue waits in a loop until its condition is met: https://stackoverflow.com/questions/11184581/why-does-wait-queue-implementation-wait-on-a-loop-until-condition-is-met
- A wakeup will trigger an interrupt
- wake_up_interruptible wakes only the processes that are in interruptible sleep
- BIOs can be split and merged (chained); this happens in the scheduling layer.
- The null_blk driver is a bit different from other drivers: it can receive commands in two ways, bio based or request based.
- Device drivers are normally request based. BIOs are already split/merged in the block (scheduling) layer and grouped into a request, which is sent to the device driver. A driver should not touch the BIOs inside a request/command; its job is to translate a request into the corresponding device command.
- In-flight BIOs in the device driver do not include the BIOs inside requests.
- Linux runs in an asynchronous context.
- Flow control in a device driver may not be a good idea: many places in the block layer already do (or could do) it, e.g. the scheduling layer, where requests are regulated.
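The wait-in-a-loop point above can be shown in userspace: `pthread_cond_wait` has the same spurious-wakeup caveat as the kernel's waitqueues, which is why `wait_event`-style helpers re-check the condition in a loop. A minimal sketch (the function names here are made up):

```c
#include <pthread.h>

static pthread_mutex_t cond_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond_var  = PTHREAD_COND_INITIALIZER;
static int condition;

/* analogue of wait_event(): sleep until `condition` becomes true.
 * The while loop is essential: a wakeup (even a spurious one) only
 * means "re-check the condition", not "the condition now holds". */
void wait_for_condition(void)
{
        pthread_mutex_lock(&cond_lock);
        while (!condition)
                pthread_cond_wait(&cond_var, &cond_lock);
        pthread_mutex_unlock(&cond_lock);
}

/* analogue of wake_up(): set the condition, then wake the sleepers */
void set_condition(void)
{
        pthread_mutex_lock(&cond_lock);
        condition = 1;
        pthread_cond_broadcast(&cond_var);
        pthread_mutex_unlock(&cond_lock);
}
```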
Block I/O¶
v6.3-rc2
High level: application -> filesystem -> block layer
bio -> bio_vec/bi_sector -> memory page
(`struct bio` is the descriptor of an in-flight block I/O: `bi_iter.bi_sector` gives the starting sector on the device, and `bi_io_vec` points to the array of `bio_vec` segments that map memory pages.)
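As a rough userspace model (`my_bio` and `my_bio_vec` are invented names, not the kernel structures), a bio pairs a starting sector with an array of memory segments:

```c
#include <stdint.h>
#include <stddef.h>

#define SECTOR_SIZE 512

struct my_bio_vec {             /* one contiguous chunk of memory */
        void     *page;
        uint32_t  offset;       /* byte offset within the page */
        uint32_t  len;          /* byte length of this segment */
};

struct my_bio {
        uint64_t           sector;      /* starting sector (cf. bi_iter.bi_sector) */
        unsigned int       vcnt;        /* number of segments (cf. bi_vcnt) */
        struct my_bio_vec *vec;         /* segment array (cf. bi_io_vec) */
};

/* total payload in bytes, i.e. how many sectors this bio covers */
size_t my_bio_bytes(const struct my_bio *bio)
{
        size_t n = 0;
        for (unsigned int i = 0; i < bio->vcnt; i++)
                n += bio->vec[i].len;
        return n;
}
```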
gendisk -> request queue/block device -> request
submit_bio() -> submit_bio_noacct() -> submit_bio_noacct_nocheck() -> __submit_bio()/__submit_bio_noacct()
(generic_make_request[2] in kernels v5.8 and earlier)
(`struct request_queue` holds the per-device queue state: elevator, queue limits, and the hardware context tables.)
bio layer[3] -> request layer -> device driver[4]
request queue[5]
create/delete a request queue: `blk_mq_init_queue`[6]
process a request: `blk_mq_start_request`
device mapper[7]
Flow control¶
Problem: implement flow control for the bios in the md block device (or in any other block device, like the null block driver) in the Linux kernel.
A bio is the unit that maps data in memory to a generic block offset. Block device drivers receive requests or I/Os from the I/O scheduler. The scheduler groups bios into requests and uses a multi-queue mechanism to dispatch events. The device mapper sits on top of the I/O scheduler: DM receives BIOs from the user/filesystem and remaps them first according to the DM target's properties.
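How the scheduler groups bios into requests can be sketched with a contiguity check (all names below are hypothetical, not the block layer's actual helpers): a bio can be back-merged into a request when it begins at the sector right after the request's current end.

```c
#include <stdint.h>

struct sketch_bio { uint64_t sector; uint32_t nr_sectors; };
struct sketch_req { uint64_t sector; uint32_t nr_sectors; };

/* a bio is back-mergeable when it starts exactly where the request ends */
int can_back_merge(const struct sketch_req *rq, const struct sketch_bio *bio)
{
        return bio->sector == rq->sector + rq->nr_sectors;
}

/* grow the request to cover the merged bio's sectors */
void back_merge(struct sketch_req *rq, const struct sketch_bio *bio)
{
        rq->nr_sectors += bio->nr_sectors;
}
```

The real merge path also checks queue limits (max segments, max request size); this sketch keeps only the contiguity idea.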
First, look into the null_blk driver. An in-flight BIO is an operation that has been requested but has not yet completed. The shared resource in flow control is the set of in-flight BIOs. The invariants of flow control for the bios in the null_blk driver are:
- Increasing: if the number of in-flight bios >= high, the thread processing the bio is blocked
- Decreasing: if the number of in-flight bios < low, the blocked threads resume execution
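These two invariants can be modeled in userspace with a condition variable standing in for the kernel waitqueue. `fc_submit`/`fc_complete` and the watermark values are assumptions for illustration, not the patch's actual code:

```c
#include <pthread.h>

#define FC_HIGH 8       /* assumed high watermark */
#define FC_LOW  4       /* assumed low watermark */

static pthread_mutex_t fc_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  fc_wake = PTHREAD_COND_INITIALIZER;
static int fc_in_flight;

/* submission path: block while the high watermark is reached */
void fc_submit(void)
{
        pthread_mutex_lock(&fc_lock);
        while (fc_in_flight >= FC_HIGH)         /* invariant 1: increasing */
                pthread_cond_wait(&fc_wake, &fc_lock);
        fc_in_flight++;
        pthread_mutex_unlock(&fc_lock);
}

/* completion path: wake blocked submitters below the low watermark */
void fc_complete(void)
{
        pthread_mutex_lock(&fc_lock);
        fc_in_flight--;
        if (fc_in_flight < FC_LOW)              /* invariant 2: decreasing */
                pthread_cond_broadcast(&fc_wake);
        pthread_mutex_unlock(&fc_lock);
}
```

The gap between the two watermarks gives hysteresis: submitters blocked at `FC_HIGH` are not woken until completions drain the count below `FC_LOW`, which avoids waking and re-blocking threads on every single completion.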
Second, the null_blk driver receives either requests from the mq block layer or I/Os directly. Both the request-queue model and the I/O-queue model handle BIOs through *_handle_cmd and complete BIOs through *_complete_cmd. Zoned commands are one part of I/O processing and can be set aside like the other processing paths.
Intuitively, there are two ways to add locking to the null_blk driver:
- spinlock, or semaphore + sleep/wakeup (prefer a spinlock when the expected blocking time is smaller than the thread-switch time)
- sleep lock, atomics
This patch applies method 2, which can be implemented with atomic ops, a waitqueue, and a spinlock.
A waitqueue has its disadvantages in interrupt context; a workqueue is better for handling such tasks without spawning another thread.
Finally, the cases that increase the number of in-flight BIOs are handled in two parts.
Let n be the number of BIOs in a request:
- n = 1 and bio_in_flight > high
- n > 1 and n + bio_in_flight > high
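Put as a single predicate (`must_throttle` is a hypothetical name, not from the patch), the submit side blocks when:

```c
/* decide whether a request of n bios must wait, given the current
 * in-flight count and the high watermark */
int must_throttle(unsigned int n, unsigned int in_flight, unsigned int high)
{
        if (n == 1)
                return in_flight > high;        /* single-bio case */
        return n + in_flight > high;            /* whole-request case */
}
```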
https://embetronicx.com/tutorials/linux/device-drivers/work-queue-in-linux-own-workqueue/ ↩︎
http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch13lev1sec3.html ↩︎
http://blog.vmsplice.net/2020/04/how-linux-vfs-block-layer-and-device.html ↩︎
https://linux-kernel-labs.github.io/refs/heads/master/labs/block_device_drivers.html#request-queues-multi-queue-block-layer ↩︎
https://linux-kernel-labs.github.io/refs/heads/master/labs/block_device_drivers.html#create-and-delete-a-request-queue ↩︎
https://xuechendi.github.io/2013/11/14/device-mapper-deep-dive ↩︎