A simple block driver

This article is part of the LWN Porting Drivers to 2.6 series.
Given the large number of changes to the 2.6 block layer, it is hard to know where to start describing them. We'll begin by examining the simplest possible block driver. The sbd ("simple block device") driver simulates a block device with a region of kernel memory; it is, essentially, a naive ramdisk driver implemented in fewer than 200 lines of code. It allows us to demonstrate some changes in how block drivers work with the rest of the system without the complexity required when dealing with real hardware. Code fragments will be shown below; the full driver source can be found on this page.

If you have not read the block layer overview, you might want to head over there for a moment; this article will still be here when you get back.

Initialization

In our simple driver, the module initialization function is called sbd_init(). Its job, of course, is to get set up for block operations and to make its disk available to the system. The first step is to set up our internal data structure; within the driver a disk (the disk, in this case) is represented by:

	static struct sbd_device {
	    unsigned long size;
	    spinlock_t lock;
	    u8 *data;
	    struct gendisk *gd;
	} Device;

Here size is the size of the device (in bytes), data is the array where the "disk" stores its data, lock is a spinlock for mutual exclusion, and gd is the kernel representation of our device.

The device initialization is pretty straightforward; it is just a matter of allocating the memory to actually store the data and initializing the spinlock:

    Device.size = nsectors*hardsect_size;
    spin_lock_init(&Device.lock);
    Device.data = vmalloc(Device.size);
    if (Device.data == NULL)
	return -ENOMEM;

(nsectors and hardsect_size are load-time parameters that control how big the device should be).

About now is where block drivers traditionally register themselves with the kernel, and sbd does that too:

    major_num = register_blkdev(major_num, "sbd");
    if (major_num <= 0) {
	printk(KERN_WARNING "sbd: unable to get major number\n");
	goto out;
    }

Note that, in 2.6, no device operations structure is passed to register_blkdev(). As it turns out, a block driver can happily get by without calling register_blkdev() at all. That function does little work, at this point, and will likely be removed sooner or later. About the only remaining tasks performed by register_blkdev() are the assignment of a dynamic major number (if requested), and causing the block driver to show up in /proc/devices.

Generic disks

If register_blkdev() does so little, where does the real work get done? The answer lies in the much improved 2.6 "generic disk" (or "gendisk") code. The gendisk interface is covered in a separate article, so we'll look only quickly at how sbd does its gendisk setup.

The first step is to get a gendisk structure to represent the sbd device:

    Device.gd = alloc_disk(16);
    if (! Device.gd)
	goto out_unregister;

Note that a memory allocation is involved, so the return value should be checked. The parameter to alloc_disk() indicates the number of minor numbers that should be dedicated to this device. We have requested 16 minor numbers: one for the whole disk, leaving room for 15 partitions.

The gendisk must be initialized; the sbd driver starts that task as follows:

    Device.gd->major = major_num;
    Device.gd->first_minor = 0;
    Device.gd->fops = &sbd_ops;
    Device.gd->private_data = &Device;
    strcpy (Device.gd->disk_name, "sbd0");
    set_capacity(Device.gd, nsectors*(hardsect_size/KERNEL_SECTOR_SIZE));

Most of the above should be relatively self-explanatory. The fops field is a pointer to the block_device_operations structure for this device; we'll get to that shortly. The private_data field can be used by the driver, so we stick a pointer to our sbd_device structure there. The set_capacity() call tells the kernel how large the device is. Note that the kernel can handle block devices which have sectors greater than 512 bytes, but it always deals with 512-byte sectors internally. So we need to normalize the sector count before passing it to the kernel.
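The normalization done in the set_capacity() call can be checked in isolation. What follows is a minimal userspace sketch, not driver code: it assumes KERNEL_SECTOR_SIZE is 512 (as the text above implies) and that the hardware sector size is a nonzero multiple of 512. The helper name to_kernel_sectors() is hypothetical.

```c
#include <stdint.h>

/* Assumption: the kernel's internal sector size, per the text above. */
#define KERNEL_SECTOR_SIZE 512

/*
 * Hypothetical helper: convert a count of hardware sectors into the
 * 512-byte sectors that set_capacity() expects. Assumes hardsect_size
 * is a nonzero multiple of KERNEL_SECTOR_SIZE.
 */
static uint64_t to_kernel_sectors(uint64_t nsectors, unsigned int hardsect_size)
{
    return nsectors * (hardsect_size / KERNEL_SECTOR_SIZE);
}
```

With nsectors set to 1024 and hardsect_size set to 1024, for example, the device would be presented to the kernel as 2048 sectors of 512 bytes each.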

Another thing that (usually) goes into the gendisk is the request queue to use. The BLK_DEFAULT_QUEUE macro from 2.4 is no more; a block driver must explicitly create and set up the request queue(s) it will use. Furthermore, request queues must be allocated dynamically, at run time. The sbd driver sets up its request queue as follows:

    static struct request_queue *Queue;
    /* ... */
    Queue = blk_init_queue(sbd_request, &Device.lock);
    if (Queue == NULL)
	    goto out;
    blk_queue_hardsect_size(Queue, hardsect_size);
    Device.gd->queue = Queue;

Here, sbd_request is the request function, which we will get to soon. Note that a spinlock must be passed into blk_init_queue(). The global io_request_lock is gone forevermore, and each block driver must manage its own locking. Typically, the lock used by the driver to serialize access to internal resources is the best choice for controlling access to the request queue as well. For that reason, the block layer expects the driver to provide a lock of its own for the queue. If a nonstandard hard sector size (i.e. not 512 bytes) is in use, the sector size should be stored into the request queue with blk_queue_hardsect_size(). Finally, a pointer to the queue must be stored in the gendisk structure.

At this point, the gendisk setup is complete. All that remains is to add the disk to the system:

    add_disk(Device.gd);

Note that add_disk() may well generate I/O to the device before it returns - the driver must be in a state where it can handle requests before adding disks. The driver also should not fail initialization after it has successfully added a disk.
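The fragments above jump to labels named out and out_unregister when a step fails. The bodies of those labels are not shown in this article; the following userspace sketch assumes that they undo the earlier steps in reverse order. All names here (fake_state, fake_init, fail_at) are hypothetical, and malloc() and free() merely stand in for the kernel allocators.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for the resources acquired during init. */
struct fake_state {
    void *data;        /* stands in for the vmalloc()ed storage */
    int registered;    /* stands in for register_blkdev() */
    void *disk;        /* stands in for alloc_disk() */
};

/* fail_at selects which step fails (0 = none), to exercise each path. */
static int fake_init(struct fake_state *s, int fail_at)
{
    /* Step 1: allocate the backing store; nothing to undo on failure. */
    s->data = (fail_at == 1) ? NULL : malloc(64);
    if (s->data == NULL)
        return -1;

    /* Step 2: registration; on failure only the storage must go. */
    if (fail_at == 2)
        goto out;
    s->registered = 1;

    /* Step 3: allocate the disk; on failure undo steps 2 and 1. */
    s->disk = (fail_at == 3) ? NULL : malloc(16);
    if (s->disk == NULL)
        goto out_unregister;

    return 0;

out_unregister:
    s->registered = 0;
out:
    free(s->data);
    s->data = NULL;
    return -1;
}
```

Each label undoes one step and falls through to the next, so a failure at any point releases exactly what had been acquired so far. Nothing in this sketch can fail after the last step, which mirrors the rule that initialization should not fail once add_disk() has succeeded.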

What you don't have to do

That is the end of the initialization process for the sbd driver. What you don't have to do is as notable as what does need to be done. For example, there are no assignments to global arrays; the whole set of global variables that used to describe block devices is gone. There is also nothing here for dealing with partition setup. Partition handling is now done in the generic block layer, and there is almost nothing that individual drivers must do at this point. "Almost" because the driver must handle one ioctl() call, as described below.

Open and release

The open and release methods (which are kept in the block_device_operations structure) actually have not changed since 2.4. The sbd driver has nothing to do at open or release time, so it doesn't even bother to define these methods. Drivers for real hardware may need to lock and unlock doors, check for media, etc. in these methods.

The request method

The core of a block driver, of course, is its request method. The sbd driver has the simplest possible request function; it does not concern itself with things like request clustering, barriers, etc. It does not understand the new bio structure used to represent requests at all. But it works. Real drivers will almost certainly require a more serious request method; see the other Driver Porting Series articles for the gory details on how to do that.

Here is the whole thing:

    static void sbd_request(request_queue_t *q)
    {
        struct request *req;
    
        while ((req = elv_next_request(q)) != NULL) {
            if (! blk_fs_request(req)) {
                end_request(req, 0);
                continue;
            }
            sbd_transfer(&Device, req->sector, req->current_nr_sectors,
                            req->buffer, rq_data_dir(req));
            end_request(req, 1);
        }
    }

The first thing to notice is that all of the old <linux/blk.h> cruft has been removed. Macros like INIT_REQUEST (with its hidden return statement), CURRENT, and QUEUE_EMPTY are gone. It is now necessary to deal with the request queue functions directly, but, as can be seen, that is not particularly hard.

Note that the Device.lock will be held on entry to the request function, much like io_request_lock is in 2.4.

The function for getting the first request in the queue is now elv_next_request(). A NULL return means that there are no more requests on the queue that are ready to process. A simple request loop like this one can simply run until the request queue is empty; drivers for real hardware will also have to take into account how many operations the device can handle, of course. Note that this function does not actually remove the request from the queue; it just returns a properly adjusted view of the top request.

Note also that, in 2.6, there can be multiple types of requests. Thus the test:

	if (! blk_fs_request(req)) {
	    end_request(req, 0);
	    continue;
	}

A nonzero return value from the blk_fs_request() macro says "this is a normal filesystem request." Other types of requests (e.g. packet-mode or device-specific diagnostic operations) are not something that sbd supports, so it simply fails any such requests.
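The shape of this loop (peek at the head of the queue, fail anything that is not a filesystem request, complete the rest) can be modeled in userspace. The sketch below is a simplified illustration only: the mock_* types and functions are hypothetical, and each mock request is completed in a single step, whereas a real request may need several passes through the loop.

```c
#include <stddef.h>

enum req_type { REQ_FS, REQ_OTHER };   /* hypothetical request types */

struct mock_request {
    enum req_type type;
    int done;       /* set once the request has been completed */
    int uptodate;   /* 1 = success, 0 = failure */
};

struct mock_queue {
    struct mock_request *reqs;
    size_t count;
    size_t head;    /* index of the first unfinished request */
};

/* Peek at the head of the queue without removing anything. */
static struct mock_request *mock_next_request(struct mock_queue *q)
{
    return q->head < q->count ? &q->reqs[q->head] : NULL;
}

/* Complete the head request and advance the queue past it. */
static void mock_end_request(struct mock_queue *q,
                             struct mock_request *req, int uptodate)
{
    req->done = 1;
    req->uptodate = uptodate;
    q->head++;
}

/* Same structure as the request function shown above. */
static void mock_request_fn(struct mock_queue *q)
{
    struct mock_request *req;

    while ((req = mock_next_request(q)) != NULL) {
        if (req->type != REQ_FS) {
            mock_end_request(q, req, 0);   /* unsupported: fail it */
            continue;
        }
        /* The data transfer would happen here. */
        mock_end_request(q, req, 1);
    }
}
```

Calling mock_next_request() twice in a row returns the same request; only completing a request moves the queue forward, which is why the loop terminates.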

The function sbd_transfer() is really just a memcpy() with some checking; see the full source if you are interested. The key is in the parameters: the various fields of the request structure (sector, current_nr_sectors, and buffer) look just like they did in 2.4. They also have the same meaning: they are a window looking at the first part of a (possibly larger) request. If you deal with block requests at this level, you need know nothing of the bio structures underlying the request. This approach only works for the simplest of drivers, however.
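For readers who do not want to open the full source, here is a userspace sketch of what "a memcpy() with some checking" might look like. It is not the driver's actual sbd_transfer(): the ram_disk structure and ram_disk_transfer() are hypothetical names, KERNEL_SECTOR_SIZE is assumed to be 512, and the sector arithmetic is assumed not to overflow an unsigned long.

```c
#include <stddef.h>
#include <string.h>

/* Assumption: the kernel's internal sector size. */
#define KERNEL_SECTOR_SIZE 512

/* Userspace stand-in for the driver's device structure. */
struct ram_disk {
    unsigned long size;     /* device size in bytes */
    unsigned char *data;    /* backing memory */
};

/*
 * Copy nsect 512-byte sectors between the disk and buffer, starting at
 * the given sector. A nonzero write flag copies into the disk. Returns
 * 0 on success, or -1 (copying nothing) if the range would run past
 * the end of the device.
 */
static int ram_disk_transfer(struct ram_disk *dev, unsigned long sector,
                             unsigned long nsect, char *buffer, int write)
{
    unsigned long offset = sector * KERNEL_SECTOR_SIZE;
    unsigned long nbytes = nsect * KERNEL_SECTOR_SIZE;

    if (offset > dev->size || nbytes > dev->size - offset)
        return -1;
    if (write)
        memcpy(dev->data + offset, buffer, nbytes);
    else
        memcpy(buffer, dev->data + offset, nbytes);
    return 0;
}
```

The bounds check is written as a comparison against dev->size - offset, rather than as offset + nbytes > dev->size, so that the check itself cannot wrap around.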

Note that the direction of the request is now found in the flags field, and can be tested with rq_data_dir(). A nonzero value (WRITE) indicates that this is a write request. Note also the absence of any code adding partition offsets; all of that is handled in the higher layers.

Finally, end_request() is called to finish processing of this request. This function has picked up a new parameter in 2.6: a pointer to the request structure, passed as its first argument.

Other block operations

The two other block_device_operations methods from 2.4 - check_media_change() and revalidate() - have seen prototype changes in 2.5. They are now called media_changed() and revalidate_disk(), and both take a gendisk structure as their only argument. The basic task performed by these methods remains unchanged, however.

In 2.4, a block driver's ioctl() method would handle any commands it understood, and pass the rest on to blk_ioctl() for generic processing. In 2.6, the generic code gets the first crack at any ioctl() calls, and only invokes the driver for those it can't implement itself. As a result, ioctl() methods in drivers can often be pretty small. The sbd driver includes an ioctl method which handles a single command:

    int sbd_ioctl (struct inode *inode, struct file *filp,
                   unsigned int cmd, unsigned long arg)
    {
	long size;
	struct hd_geometry geo;

	switch(cmd) {
	/*
	 * The only command we need to interpret is HDIO_GETGEO, since
	 * we can't partition the drive otherwise.  We have no real
	 * geometry, of course, so make something up.
	 */
	    case HDIO_GETGEO:
		/* Device.size is in bytes; convert to 512-byte sectors */
		size = Device.size / KERNEL_SECTOR_SIZE;
		geo.cylinders = (size & ~0x3f) >> 6;
		geo.heads = 4;
		geo.sectors = 16;
		geo.start = 4;
		if (copy_to_user((void *) arg, &geo, sizeof(geo)))
			return -EFAULT;
		return 0;
        }
        return -ENOTTY; /* unknown command */
    }

The notion of a regular geometry has been fiction for most devices for some years now. Tools like fdisk still work with cylinders, however, so a driver must make up some sort of convincing geometry story. The sbd implementation claims four heads and 16 sectors per track (64 sectors per cylinder), but anything else reasonable would have worked as well.
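The cylinder count reported by the HDIO_GETGEO handler is just the device length in 512-byte sectors divided by heads times sectors per track (4 times 16, or 64). A small userspace sketch of that arithmetic, with hypothetical names (FAKE_HEADS, FAKE_SECTORS, fake_cylinders) that do not appear in the driver:

```c
/* The made-up geometry used by the ioctl: 4 heads, 16 sectors per track. */
#define FAKE_HEADS   4
#define FAKE_SECTORS 16

/*
 * Hypothetical helper: number of whole cylinders for a device of the
 * given length in 512-byte sectors. Dividing by 64 gives the same
 * result as the (size & ~0x3f) >> 6 expression in the ioctl.
 */
static unsigned long fake_cylinders(unsigned long nr_sectors)
{
    return nr_sectors / (FAKE_HEADS * FAKE_SECTORS);
}
```

Any sectors left over after the last whole cylinder are simply not described by the geometry, which is harmless for a made-up layout like this one.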

Shutting down

The last thing to look at is what happens when the module is unloaded. We must, of course, clean up our various data structures and free memory - the usual stuff. The sbd cleanup function looks like this:

    static void __exit sbd_exit(void)
    {
        del_gendisk(Device.gd);
        put_disk(Device.gd);
        unregister_blkdev(major_num, "sbd");
        blk_cleanup_queue(Queue);
        vfree(Device.data);
    }

del_gendisk() cleans up any partitioning information, and generally makes the system forget about the gendisk passed to it. The call to put_disk() then releases our reference to the gendisk structure (obtained when we first called alloc_disk()) so that it can be freed. After that, we unregister the block device, release the request queue, and free the memory used for the device itself. The memory is freed only after the gendisk has been cleaned up, so we know that no more operations can be requested.

Conclusion

That is about as simple as it gets; the above implements a true virtual block device that can support a filesystem. Real drivers, of course, will tend to be more complicated. For details on how to make them more complicated, continue with the Driver Porting Series; the next block driver article is The Gendisk Interface.

  -ENOTTY ?
(Posted Jan 15, 2004 22:54 UTC (Thu) by xav)

I don't understand the -ENOTTY in the sbd_ioctl(). Why not simply a -EINVAL ?

Copyright (©) 2003, Eklektix, Inc.
Linux (®) is a registered trademark of Linus Torvalds