Tracing down Bio in Block Subsystems

This post sets out to answer two questions:

  1. Which layers does an I/O request pass through, for example when writing something to a file on a local machine?
  2. How does an md device (or any other block device, such as the null block driver) receive its data?

Bottom halves

  1. Bottom halves perform the interrupt-related work that the interrupt handler (top half) did not perform
    • They run with all interrupts enabled
    • Deferring work means doing it later, not inside the handler itself
  2. A work queue is a simple interface for deferring work to a generic kernel thread

runqueue & waitqueue

Interface

  1. queuing work to workqueue[1]

    queue_work
    queue_work_on
    queue_delayed_work
    queue_delayed_work_on
  2. schedule work to workqueue
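
To make "deferring work means not now" concrete, here is a minimal user-space model of the idea behind queue_work. It is only a sketch: the names `work_item`, `work_queue`, `model_queue_work` and `model_run_pending` are made up for this post, and a plain function call stands in for the kernel worker thread. The one behavior borrowed from the real API is that queuing an item that is already pending is a no-op and reports failure.

```c
#include <assert.h>
#include <stddef.h>

/* User-space model only; none of this is kernel code. */
struct work_item {
    void (*func)(struct work_item *work); /* handler to run later */
    struct work_item *next;               /* link in the pending list */
    int pending;                          /* 1 while the item sits on a queue */
};

struct work_queue {
    struct work_item *head;
    struct work_item *tail;
};

/* Queue the item. Returns 1 if newly queued, 0 if it was already pending. */
static int model_queue_work(struct work_queue *wq, struct work_item *work)
{
    if (work->pending)
        return 0;
    work->pending = 1;
    work->next = NULL;
    if (wq->tail)
        wq->tail->next = work;
    else
        wq->head = work;
    wq->tail = work;
    return 1;
}

/* Stand-in for the worker thread: run everything queued so far.
 * Returns the number of handlers that ran. */
static int model_run_pending(struct work_queue *wq)
{
    int ran = 0;

    while (wq->head) {
        struct work_item *work = wq->head;

        assert(work->pending);
        wq->head = work->next;
        if (!wq->head)
            wq->tail = NULL;
        work->pending = 0; /* cleared before the handler, so it may requeue itself */
        work->func(work);
        ran++;
    }
    return ran;
}

/* Example handler: just counts how many times it was called. */
static int handled;

static void count_handler(struct work_item *work)
{
    (void)work;
    handled++;
}
```

The "top half" would call `model_queue_work()` and return immediately; the handler only runs once `model_run_pending()` is called, which is the part the kernel hands to a worker thread.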

Block drivers

  1. There is no need to create a separate kernel thread when using workqueues
  2. A waitqueue waiter re-checks its condition in a loop until the condition is met: https://stackoverflow.com/questions/11184581/why-does-wait-queue-implementation-wait-on-a-loop-until-condition-is-met
  3. A wakeup makes the sleeping task runnable again; waking a task on another CPU may involve a reschedule interrupt (IPI)
    • wake_up_interruptible wakes up only the processes that are in interruptible sleeps
  4. BIOs can be split and merged (chained). This happens in the scheduling layer.
  5. The null_blk driver is a bit different from the others. It has two ways of receiving commands: bio-based and request-based.
  6. Device drivers are normally request-based. BIOs are already split/merged in the block layer (scheduling) and grouped into a request, which is sent to the device driver. The driver should not touch the BIOs inside a request/command. Its job is to translate a request into the corresponding device command.
  7. In-flight BIOs counted in the device driver do not include the BIOs carried inside requests.
  8. The Linux block I/O path is asynchronous: a submission returns before the I/O completes.
  9. Flow control in device drivers may not be a good idea. Many places in the block layer already do it, or could do it, such as the scheduling layer, where requests are regulated.
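
One example of the block layer already doing flow control (item 9): in blk-mq every in-flight request holds a tag from a fixed-size set, so the number of outstanding requests is bounded before the driver sees any of them. Below is a small user-space model of that idea. The names `tag_set`, `model_get_tag`, `model_put_tag` and the depth of 4 are hypothetical and chosen for illustration; the real kernel uses a scalable bitmap rather than a linear scan.

```c
#include <assert.h>

#define MODEL_QUEUE_DEPTH 4 /* hypothetical queue depth for this sketch */

struct tag_set {
    unsigned int busy; /* bit i set => tag i belongs to an in-flight request */
};

/* Hand out the lowest free tag, or -1 when the in-flight limit is reached.
 * In the kernel the submitter would then sleep or back off. */
static int model_get_tag(struct tag_set *tags)
{
    for (int i = 0; i < MODEL_QUEUE_DEPTH; i++) {
        if (!(tags->busy & (1u << i))) {
            tags->busy |= 1u << i;
            return i;
        }
    }
    return -1;
}

/* Release a tag when its request completes. */
static void model_put_tag(struct tag_set *tags, int tag)
{
    assert(tag >= 0 && tag < MODEL_QUEUE_DEPTH);
    assert(tags->busy & (1u << tag));
    tags->busy &= ~(1u << tag);
}
```

With a depth of 4, the fifth `model_get_tag()` fails until some earlier request completes and calls `model_put_tag()`, so a driver sitting below this point never sees more than four requests at once.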

Block IO

v6.3-rc2

high level: app -> fs -> block level

    application
    VFS
    File system (XFS, btrfs, etc)
    Page cache
    Block layer
    - Device mapper
    Driver Level
    - I/O scheduler
    - Physical device driver

bio -> bio_vec/bi_sector -> memory page

    struct bio {
        struct bio *bi_next; /* request queue link */
        struct block_device *bi_bdev;
        blk_opf_t bi_opf;
        unsigned short bi_flags; /* BIO_* below */
        unsigned short bi_ioprio;
        blk_status_t bi_status;
        atomic_t __bi_remaining; /* usage counter */

        struct bvec_iter bi_iter;

        blk_qc_t bi_cookie;
        bio_end_io_t *bi_end_io;
        void *bi_private;
        ...

        atomic_t __bi_cnt; /* pin count */
        struct bio_vec *bi_io_vec; /* the actual vec list */
        struct bio_set *bi_pool;
        struct bio_vec bi_inline_vecs[];
    };
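
The bio -> bio_vec -> memory page relationship can be sketched in user space as follows. This is a simplified model, not the kernel definitions: `model_bio_vec` and `model_bio` are names invented here, `bi_sector` is pulled out of `bi_iter` for brevity, `bi_vcnt` (the number of vectors) sits in the part of struct bio elided above, and a `void *` stands in for `struct page *`. The sketch assumes 512-byte sectors.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_SECTOR_SHIFT 9 /* 512-byte sectors */

/* One contiguous chunk of memory inside a single page. */
struct model_bio_vec {
    void *bv_page;          /* the kernel uses struct page * here */
    unsigned int bv_len;    /* bytes in this segment */
    unsigned int bv_offset; /* byte offset inside the page */
};

/* A bio: a start sector on the device plus a list of memory segments. */
struct model_bio {
    uint64_t bi_sector;              /* lives in bi_iter in the kernel */
    unsigned short bi_vcnt;          /* number of entries in bi_io_vec */
    struct model_bio_vec *bi_io_vec; /* the segment list */
};

/* Total payload in bytes: the sum of all segment lengths. */
static unsigned int model_bio_bytes(const struct model_bio *bio)
{
    unsigned int total = 0;

    for (unsigned short i = 0; i < bio->bi_vcnt; i++)
        total += bio->bi_io_vec[i].bv_len;
    return total;
}

/* First sector after the range this bio covers on the device. */
static uint64_t model_bio_end_sector(const struct model_bio *bio)
{
    return bio->bi_sector + (model_bio_bytes(bio) >> MODEL_SECTOR_SHIFT);
}
```

The point of the model is that the memory side may be scattered across pages, while the device side is always one contiguous sector range starting at `bi_sector`.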

gendisk -> request queue/block device -> request

submit_bio() -> submit_bio_noacct -> submit_bio_noacct_nocheck -> __submit_bio/__submit_bio_noacct

(submit_bio_noacct was named generic_make_request[2] in v<=5.8)

    struct request_queue {
        struct request *last_merge;
        struct elevator_queue *elevator;

        struct percpu_ref q_usage_counter;
        struct blk_queue_stats *stats;
        struct rq_qos *rq_qos;
        const struct blk_mq_ops *mq_ops;
        struct blk_mq_ctx __percpu *queue_ctx;
        unsigned int queue_depth;
        void *queuedata;
        unsigned long queue_flags;
        ...
        spinlock_t queue_lock;
        struct gendisk *disk;
        unsigned long nr_requests; /* Max # of requests */
        ...
    };

    struct request {
        struct request_queue *q;
        blk_opf_t cmd_flags; /* op and common flags */
        req_flags_t rq_flags;
        ...

        /* the following two fields are internal, NEVER access directly */
        unsigned int __data_len; /* total data len */
        sector_t __sector; /* sector cursor */

        struct bio *bio;
        struct bio *biotail;

        union {
            struct list_head queuelist;
            struct request *rq_next;
        };

        struct block_device *part;
        ...
    };
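
The `bio`/`biotail` pair in struct request is what lets the block layer group several BIOs into one request. Here is a rough user-space sketch of a back merge, where a new bio that starts exactly where the request ends gets appended to the chain. The types `chain_bio` and `chain_request` and both functions are invented for this post; the real merge path also checks operation type, size limits and more, none of which is modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHAIN_SECTOR_SHIFT 9 /* 512-byte sectors */

struct chain_bio {
    uint64_t sector;           /* start sector on the device */
    unsigned int size;         /* payload in bytes */
    struct chain_bio *bi_next; /* next bio in the same request */
};

struct chain_request {
    uint64_t sector;           /* models __sector */
    unsigned int data_len;     /* models __data_len */
    struct chain_bio *bio;     /* head of the bio chain */
    struct chain_bio *biotail; /* tail, so appending is O(1) */
};

/* Start a request from a single bio. */
static void chain_rq_init(struct chain_request *rq, struct chain_bio *bio)
{
    bio->bi_next = NULL;
    rq->sector = bio->sector;
    rq->data_len = bio->size;
    rq->bio = rq->biotail = bio;
}

/* Append bio if it begins at the sector right after the request's range.
 * Returns 1 on merge, 0 if the bio is not contiguous. */
static int chain_back_merge(struct chain_request *rq, struct chain_bio *bio)
{
    uint64_t rq_end = rq->sector + (rq->data_len >> CHAIN_SECTOR_SHIFT);

    assert(rq->biotail != NULL);
    if (bio->sector != rq_end)
        return 0;
    bio->bi_next = NULL;
    rq->biotail->bi_next = bio;
    rq->biotail = bio;
    rq->data_len += bio->size;
    return 1;
}
```

After a successful merge the driver still receives a single request covering one contiguous sector range, which is why it has no reason to look at the individual BIOs inside it.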

bio layer[3] -> request layer -> device driver[4]

request queue[5]

create/delete a request queue: blk_mq_init_queue[6]

process a request: blk_mq_start_request

device mapper[7]


  1. https://embetronicx.com/tutorials/linux/device-drivers/work-queue-in-linux-own-workqueue/

  2. v4.5 block layer

  3. http://books.gigatux.nl/mirror/kerneldevelopment/0672327201/ch13lev1sec3.html

  4. http://blog.vmsplice.net/2020/04/how-linux-vfs-block-layer-and-device.html

  5. https://linux-kernel-labs.github.io/refs/heads/master/labs/block_device_drivers.html#request-queues-multi-queue-block-layer

  6. https://linux-kernel-labs.github.io/refs/heads/master/labs/block_device_drivers.html#create-and-delete-a-request-queue

  7. https://xuechendi.github.io/2013/11/14/device-mapper-deep-dive