The block layer in 2.4 (and prior) kernels was organized around the buffer
head data structure. The limits of buffer heads have long been clear,
however. It is hard to create a truly high-performance block I/O subsystem
when the underlying buffer head structure forces each I/O request to be
split into 512-byte chunks. So one of the first items on the 2.5 block
"todo" list was the creation of a way to represent block I/O requests that
supported higher performance and greater flexibility. The result was the
new BIO structure.
BIO basics
As with most real-world code, the BIO structure incorporates a fair number
of tricky details.
The core of the structure (as defined in <linux/bio.h>) is
not that complicated, however; it is as shown
in the diagram to the right. The BIO structure itself contains the usual
collection of housekeeping information, along with a pointer
(bi_io_vec) pointing to an array of bio_vec structures.
This array represents the (possibly multiple) segments which make up this
I/O request.
There is also an index (bi_idx) giving an offset into the
bi_io_vec array; we'll get into its use shortly.
The bio_vec structure itself has a simple definition:
    struct bio_vec {
        struct page     *bv_page;
        unsigned int    bv_len;
        unsigned int    bv_offset;
    };
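To make the layout concrete, here is a minimal user-space sketch of the two structures. It is a model, not the kernel definition: struct model_page, struct model_bio, and model_bio_bytes() are hypothetical names, and only fields mentioned in this article (bi_io_vec, bi_idx, bi_vcnt, bi_size) are included. The helper adds up bv_len for each segment that remains to be processed, which shows how the segments relate to the size of the whole request.

```c
#include <assert.h>

/* Stand-in for the kernel's struct page (hypothetical, user space only). */
struct model_page { unsigned char data[4096]; };

/* The same three fields as the bio_vec shown above. */
struct model_bio_vec {
    struct model_page *bv_page;   /* page holding this segment's data */
    unsigned int bv_len;          /* length of the segment in bytes */
    unsigned int bv_offset;       /* offset of the data within the page */
};

/* Simplified model of a BIO: only the fields discussed in the text. */
struct model_bio {
    struct model_bio_vec *bi_io_vec; /* array of segments */
    unsigned short bi_vcnt;          /* number of entries to process */
    unsigned short bi_idx;           /* current index into bi_io_vec */
    unsigned int bi_size;            /* total size of the request in bytes */
};

/* Sum the lengths of the segments from bi_idx up to (not including) bi_vcnt. */
unsigned int model_bio_bytes(const struct model_bio *bio)
{
    unsigned int total = 0;

    assert(bio->bi_idx <= bio->bi_vcnt);
    for (unsigned short i = bio->bi_idx; i < bio->bi_vcnt; i++)
        total += bio->bi_io_vec[i].bv_len;
    return total;
}
```

In this model, a request built from one full page and one 1024-byte segment would have a bi_size of 5120, and the helper would report the same figure until bi_idx is advanced.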
As is increasingly the case with internal kernel data structures, the BIO
now tracks data buffers using struct page pointers. There are
some implications of this change for driver writers:
- Data buffers for block transfers can be anywhere - kernel or user
space. The driver author need not be concerned about the ultimate
source or destination of the data.
- These buffers could be in high memory, unless the driver author has
explicitly requested that bounce buffers be used (Request Queues I covers how to do
that). The
driver author cannot count on the existence of a kernel-space mapping
for the buffer unless one has been created explicitly.
- More than ever, block I/O operations are scatter/gather operations,
with data coming from multiple, dispersed buffers.
At first glance, the BIO structure may seem more difficult to work with
than the old buffer head, which provided a nice kernel virtual address for
a single chunk of data. Working with BIOs is not hard, however.
Getting request information from a BIO
A driver author could use the information above (along with the other BIO
fields) to get the needed information out of the structure without too much
trouble. As a general rule, however, direct access to the bio_vec
array is discouraged. A set of accessor routines has been provided which
hides the details of how the BIO structure works and eases access to that
structure. Use of these routines will make the driver author's job easier,
and, with luck, will enable a driver to keep working in the face of future
block I/O changes.
So how does one get request information from the BIO structure? The
beginning sector for the entire BIO is in the bi_sector field -
there is no accessor function for that. The total size of the operation is
in bi_size (in bytes). One can also get the total size in sectors with:
bio_sectors(struct bio *bio);
The function (macro, actually):
int bio_data_dir(struct bio *bio);
returns either READ or WRITE, depending on what type of
operation is encapsulated by this BIO.
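The relationship between bi_sector, bi_size, and the sector count can be sketched in user space as follows. This is a model under stated assumptions: struct model_bio and both helper functions are hypothetical names, and the shift by 9 assumes that sectors are 512 bytes long. It is meant to illustrate the arithmetic, not to reproduce the kernel's macro.

```c
#include <assert.h>

#define MODEL_SECTOR_SHIFT 9  /* assumption: one sector is 512 bytes */

/* Hypothetical model holding only the two fields used here. */
struct model_bio {
    unsigned long long bi_sector; /* first sector of the request */
    unsigned int bi_size;         /* total size in bytes */
};

/* Size of the request in sectors, derived from the byte count. */
unsigned int model_bio_sectors(const struct model_bio *bio)
{
    /* Model assumption: a request always covers whole sectors. */
    assert((bio->bi_size & ((1u << MODEL_SECTOR_SHIFT) - 1)) == 0);
    return bio->bi_size >> MODEL_SECTOR_SHIFT;
}

/* First sector past the end of the request. */
unsigned long long model_bio_end_sector(const struct model_bio *bio)
{
    return bio->bi_sector + model_bio_sectors(bio);
}
```

A 4096-byte request starting at sector 100 thus covers eight sectors and ends just before sector 108.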
Almost everything else requires working through the bio_vec
array. The encouraged way of doing that is to use the special
bio_for_each_segment macro:
    int segno;
    struct bio_vec *bvec;

    bio_for_each_segment(bvec, bio, segno) {
        /* Do something with this segment */
    }
Within the loop, the integer variable segno will be the current
index into the array, and bvec will point to the current
bio_vec structure. Usually the driver programmer need not use
either variable; instead, a new set of macros is available for use within
this sort of loop:
- struct page *bio_page(struct bio *bio)
- Returns a pointer to the current page structure.
- int bio_offset(struct bio *bio)
- Returns the offset within the current page for this operation. Block
I/O operations are often page-aligned, but that is not always the case.
- int bio_cur_sectors(struct bio *bio)
- The number of sectors to transfer for this bio_vec.
- char *bio_data(struct bio *bio)
- Returns the kernel virtual address for the data buffer. Note
that this address will only exist if the buffer is not in high memory.
- char *bvec_kmap_irq(struct bio_vec *bvec, unsigned long *flags)
- This function returns a kernel virtual address which can be used to
access the data buffer pointed to by the given bio_vec entry;
it also disables
interrupts and returns an atomic kmap - so the driver should not sleep
until bvec_kunmap_irq() has been called. Note that the
flags argument is a pointer value, which is a departure from
the usual convention for macros which disable interrupts.
- void bvec_kunmap_irq(char *buffer, unsigned long *flags)
- Undo a mapping which was created with bvec_kmap_irq().
- char *bio_kmap_irq(struct bio *bio, unsigned long *flags)
- This function is a wrapper around bvec_kmap_irq(); it returns
a mapping for the current bio_vec entry in the given
bio. There is, of course, a corresponding
bio_kunmap_irq().
- char *__bio_kmap_atomic(struct bio *bio, int i, enum km_type type)
- Use kmap_atomic() to obtain a kernel virtual address for the
ith buffer in the bio; the kmap slot
designated by type will be used.
- void __bio_kunmap_atomic(char *addr, enum km_type type)
- Release the mapping for a kernel virtual address obtained with
__bio_kmap_atomic().
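The shape of a segment loop can be tried out in user space with a small model. Everything here is an assumption made for illustration: model_for_each_segment is a hypothetical look-alike written for this sketch (it is not the kernel's macro definition), the structures carry only the fields the loop needs, and model_walk() is an invented helper. The loop runs from bi_idx up to bi_vcnt, with segno as the current index and bvec pointing at the current entry, as described above.

```c
#include <assert.h>

struct model_bio_vec {
    void *bv_page;            /* stand-in for struct page * */
    unsigned int bv_len;      /* segment length in bytes */
    unsigned int bv_offset;   /* offset within the page */
};

struct model_bio {
    struct model_bio_vec *bi_io_vec;
    unsigned short bi_vcnt;   /* number of entries to process */
    unsigned short bi_idx;    /* index of the first unprocessed entry */
};

/* Hypothetical look-alike of bio_for_each_segment, for this model only. */
#define model_for_each_segment(bvec, bio, segno)                          \
    for ((segno) = (bio)->bi_idx, (bvec) = &(bio)->bi_io_vec[(segno)];    \
         (segno) < (bio)->bi_vcnt;                                        \
         (segno)++, (bvec)++)

/*
 * Walk the remaining segments. Returns their combined length in
 * 512-byte sectors and stores the number of segments seen in *nsegs.
 */
unsigned int model_walk(struct model_bio *bio, int *nsegs)
{
    struct model_bio_vec *bvec;
    int segno;
    unsigned int sectors = 0;

    *nsegs = 0;
    model_for_each_segment(bvec, bio, segno) {
        /* bvec always refers to entry number segno. */
        assert(bvec == &bio->bi_io_vec[segno]);
        sectors += bvec->bv_len >> 9;
        (*nsegs)++;
    }
    return sectors;
}
```

Because the walk starts at bi_idx rather than at zero, a request whose first segments have already been handled is picked up where it left off.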
A little detail which is worth noting: all of bio_data(),
bvec_kmap_irq(), and bio_kmap_irq() add the segment offset
(bio_offset(bio)) to the address before returning it. It is
tempting to add the offset separately, but that is an error which leads to
weird problems. Trust me.
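The offset pitfall is easy to demonstrate with a user-space model. The names below (struct model_page, model_bvec_map(), model_copy_segment()) are hypothetical, and the "mapping" is ordinary pointer arithmetic on a char array rather than a real kmap; the only point of the sketch is that the returned address already includes bv_offset.

```c
#include <assert.h>
#include <string.h>

/* Stand-in for a page of memory (hypothetical, user space only). */
struct model_page { char data[4096]; };

struct model_bio_vec {
    struct model_page *bv_page;
    unsigned int bv_len;
    unsigned int bv_offset;
};

/*
 * Model of a mapping helper. Like the accessors described above, it
 * returns an address that ALREADY includes the segment offset.
 */
char *model_bvec_map(struct model_bio_vec *bvec)
{
    return bvec->bv_page->data + bvec->bv_offset;
}

/* Copy one segment's data to dst; returns the number of bytes copied. */
unsigned int model_copy_segment(struct model_bio_vec *bvec, char *dst)
{
    /* Correct: use the address exactly as returned. */
    char *src = model_bvec_map(bvec);

    /*
     * Wrong: model_bvec_map(bvec) + bvec->bv_offset would apply the
     * offset twice and read from the wrong place in the page.
     */
    assert(bvec->bv_offset + bvec->bv_len <= sizeof(bvec->bv_page->data));
    memcpy(dst, src, bvec->bv_len);
    return bvec->bv_len;
}
```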
Completing I/O
Given the information from the BIO, each block driver should be able to
arrange a transfer to or from its particular device. Note that a helper
function (blk_rq_map_sg()) exists which makes it easy to set up
DMA scatter/gather lists from a block request; we'll get into that when we
look at request queue management.
When the operation is complete, the driver must inform the block subsystem
of that fact. That is done with bio_endio():
void bio_endio(struct bio *bio, unsigned int nbytes, int error);
Here, bio is the BIO of interest, nbytes is the number of
bytes actually transferred, and error indicates the status of the
operation; it should be zero for a successful transfer, and a negative
error code otherwise.
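A driver's completion path might follow the convention along these lines. This is a user-space sketch, not kernel code: model_bio_endio() is a stand-in that merely records its arguments, model_driver_complete() and its hw_status parameter are invented for illustration, and the choice of -EIO as the error code is an assumption (any suitable negative error code would fit the description above).

```c
#include <assert.h>
#include <errno.h>

struct model_bio {
    unsigned int bi_size;     /* bytes requested */
    /* The fields below exist only in this model, to record the completion. */
    unsigned int done_bytes;
    int done_error;
    int completed;
};

/* Stand-in for bio_endio(): just records what the driver reported. */
void model_bio_endio(struct model_bio *bio, unsigned int nbytes, int error)
{
    bio->done_bytes = nbytes;
    bio->done_error = error;
    bio->completed = 1;
}

/*
 * Hypothetical driver completion path. hw_status is what an imaginary
 * device reported (0 for success, nonzero for failure); transferred is
 * the number of bytes that actually moved.
 */
void model_driver_complete(struct model_bio *bio, int hw_status,
                           unsigned int transferred)
{
    assert(transferred <= bio->bi_size);
    if (hw_status == 0)
        model_bio_endio(bio, transferred, 0);     /* success: error is zero */
    else
        model_bio_endio(bio, transferred, -EIO);  /* failure: negative code */
}
```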
Other BIO details
The bi_private field in the BIO structure is not used by the block
subsystem, and is available for the owner of the structure to use. Drivers
do not own BIOs passed in to their request function and should not
touch bi_private there. If your driver creates its own BIO
structures (using the functions listed below, usually), then the
bi_private field in those BIOs is available to it.
As mentioned above, the bi_idx BIO field is an index into the
bi_io_vec array. This index is maintained for a couple of
reasons. One is that it can be used to keep track of partially-complete
operations. But this field (along with bi_vcnt, which says how
many bio_vec entries are to be processed) can also be used to
split a BIO into multiple chunks. Using this facility, a RAID or volume
manager driver can "clone" a BIO into multiple structures all pointing at
different parts of the bio_vec array. The operation is quick and
efficient, and allows a large operation to be quickly dispatched across a
number of physical drives.
To clone a BIO in this way, use:
struct bio *bio_clone(struct bio *bio, int gfp_mask);
bio_clone() creates a second BIO pointing to the same
bio_vec array as the original. This function uses the
given gfp_mask when allocating memory.
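The idea of splitting by cloning can be modeled in user space as follows. The sketch makes several assumptions: model_bio_clone() and model_bio_window() are hypothetical helpers, malloc() stands in for the kernel allocator (so there is no gfp_mask), bi_vcnt is treated as the index one past the last entry to process, and the starting sector of each piece is not modeled at all. What it does show is that both clones share a single bio_vec array and differ only in the range of entries they cover.

```c
#include <assert.h>
#include <stdlib.h>

struct model_bio_vec {
    void *bv_page;            /* stand-in for struct page * */
    unsigned int bv_len;
    unsigned int bv_offset;
};

struct model_bio {
    struct model_bio_vec *bi_io_vec; /* shared between original and clones */
    unsigned short bi_vcnt;          /* model: one past the last entry */
    unsigned short bi_idx;           /* first entry to process */
    unsigned int bi_size;            /* bytes covered by this BIO */
};

/* New structure, same bio_vec array. Returns NULL if allocation fails. */
struct model_bio *model_bio_clone(const struct model_bio *bio)
{
    struct model_bio *clone = malloc(sizeof(*clone));

    if (clone)
        *clone = *bio;   /* copies the pointer, not the array itself */
    return clone;
}

/* Restrict a clone to entries [first, end) of the shared array. */
void model_bio_window(struct model_bio *bio, unsigned short first,
                      unsigned short end)
{
    assert(first <= end && end <= bio->bi_vcnt);
    bio->bi_idx = first;
    bio->bi_vcnt = end;
    bio->bi_size = 0;
    for (unsigned short i = first; i < end; i++)
        bio->bi_size += bio->bi_io_vec[i].bv_len;
}
```

A four-segment request could be cloned twice, with one clone windowed to entries 0 and 1 and the other to entries 2 and 3; the original structure is left untouched.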
BIO structures contain reference counts; the structure is released when the
reference count hits zero. Drivers normally need not manipulate BIO
reference counts, but, should the need arise, functions exist in the usual
form:
void bio_get(struct bio *bio);
void bio_put(struct bio *bio);
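The get/put pattern behaves roughly like the following user-space model. The structure, its refcount and released fields, and both functions are hypothetical; a plain int is used where the kernel would need an atomic counter, and "releasing" the structure is reduced to setting a flag.

```c
#include <assert.h>

struct model_bio {
    int refcount;   /* model: plain int, not an atomic counter */
    int released;   /* model-only flag, set when the last reference goes */
};

/* Take an additional reference on a live structure. */
void model_bio_get(struct model_bio *bio)
{
    assert(bio->refcount > 0);
    bio->refcount++;
}

/* Drop a reference; the structure is released when the count hits zero. */
void model_bio_put(struct model_bio *bio)
{
    assert(bio->refcount > 0);
    if (--bio->refcount == 0)
        bio->released = 1;   /* real code would free the structure here */
}
```

Each get must be balanced by exactly one put; the creator's initial reference is dropped last in the usual case.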
Numerous other functions exist for working with BIO structures; most of the
functions not covered here are involved with creating BIOs. More
information can be found in <linux/bio.h> and
block/biodoc.txt in the kernel documentation directory.