Summary page
Return to the Kernel page

Recent Features

Home	Weekly edition	Kernel	Security	Distributions
Archives	Search	Penguin Gallery	Calendar	LWN.net FAQ
Subscriptions	Advertise	Write for LWN	Contact us	Privacy

Driver porting: supporting asynchronous I/O

This article is part of the LWN Porting Drivers to 2.6 series.

One of the key "enterprise" features added to the 2.6 kernel is asynchronous I/O (AIO). The AIO facility allows user processes to initiate multiple I/O operations without waiting for any of them to complete; the status of the operations can then be retrieved at some later time. Block and network drivers are already fully asynchronous, and thus there is nothing special that needs to be done to them to support the new asynchronous operations. Character drivers, however, have a synchronous API, and will not support AIO without some additional work. For most char drivers, there is little benefit to be gained from AIO support. In a few rare cases, however, it may be beneficial to make AIO available to your users.

AIO file operations

The first step in supporting AIO (beyond including <linux/aio.h>) is the implementation of three new methods which have been added to the file_operations structure:

    ssize_t (*aio_read) (struct kiocb *iocb, char __user *buffer, 
			 size_t count, loff_t pos);
    ssize_t (*aio_write) (struct kiocb *iocb, const char __user *buffer, 
			  size_t count, loff_t pos);
    int (*aio_fsync) (struct kiocb *, int datasync);

For most drivers, the real work will be in the implementation of aio_read() and aio_write(). These functions are analogous to the standard read() and write() methods, with a couple of changes: the file parameter has been replaced with an I/O control block (iocb), and they (usually) need not complete the requested operations immediately. The iocb argument can usually be treated as an opaque cookie used by the AIO subsystem; if you need the struct file pointer for this file descriptor, however, you can find it as iocb->ki_filp.

The aio_ operations can be synchronous. One obvious example is when the requested operation can be completed without blocking. If the operation is complete before aio_read() or aio_write() returns, the return value should be the usual status or error code. So, the following aio_read() method, while being pointless, is entirely correct:

    ssize_t my_aio_read(struct kiocb *iocb, char __user *buffer, 
                        size_t count, loff_t pos)
    {
	return my_read(iocb->ki_filp, buf, count, &pos);
    }

In some cases, synchronous behavior may actually be required. The so-called "synchronous iocb's" allow the AIO subsystem to be used synchronously when need be. The macro:

    is_sync_kiocb(struct kiocb *iocb)

will return a true value if the request must be handled synchronously.

In most cases, though, it is assumed that the I/O request will not be satisfied immediately by aio_read() or aio_write(). In this case, those functions should do whatever is required to get the operation started, then return -EIOCBQUEUED. Note that any work that must be done within the user process's context must be done before returning; you will not have access to that context later. In order to access the user buffer, you will probably need to either set up a DMA mapping or turn the buffer pointer into a series of struct page pointers before returning. Bear in mind also that there can be multiple asynchronous I/O requests active at any given time. A driver which implements AIO will have to include proper locking (and, probably queueing) to keep these requests from interfering with each other.

When the I/O operation completes, you must inform the AIO subsystem of the fact by calling aio_complete():

    int aio_complete(struct kiocb *iocb, long res, long res2);

Here, iocb is, of course, the IOCB you were given when the request was initiated. res is the usual result of an I/O operation: the number of bytes transfered, or a negative error code. res2 is a second status value which will be returned to the user; currently (2.6.0-test9), callers of aio_complete() within the kernel always set res2 to zero. aio_complete() can be safely called in an interrupt handler. Once you have called aio_complete(), you no longer own the IOCB or the user buffer, and should not touch them again.

The aio_fsync() method serves the same purpose as the fsync() method; its purpose is to ensure that all pending data are written to disk. As a general rule, device drivers will not need to implement aio_fsync().

Cancellation

The design of the AIO subsystem includes the ability to cancel outstanding operations. Cancellation may occur as the result of a specific user-mode request, or during the cleanup of a process which has exited. It is worth noting that, as of 2.6.0-test9, no code in the kernel actually performs cancellation. So cancellation may not work properly, and the interface could change in the process of making it work. That said, here is how the interface looks today.

A driver which implements cancellation needs to implement a function for that purpose:

    int my_aio_cancel(struct kiocb *iocb, struct io_event *event);

A pointer to this function can be stored into any IOCB which can be cancelled:

    iocb->ki_cancel = my_aio_cancel;

Should the operation be cancelled, your cancellation function will be called with pointers to the IOCB and an io_event structure. If it is possible to cancel (or successfuly complete) the operation prior to returning from the cancellation function, the result of the operation should be stored into the res and res2 fields of the io_event structure, and return zero. A non-zero return value from the cancellation function indicates that cancellation was not possible.

Post a comment

AIO *is* useful in driver APIs

(Posted Dec 13, 2003 21:00 UTC (Sat) by HalfMoon) (Post reply)

For most char drivers, there is little benefit to be gained from AIO support. In a few rare cases, however, it may be beneficial to make AIO available to your users.

Not so rare as that. In fact, the first time I happened across AIO was in some other OS, where it was used to help provide drivers (a) access to I/O overlap with CPU processing, (b) user-mode control over readahead or writebehind buffering policies and (c) better accomodation for guaranteed-latency ("realtime") processing. Lots of UNIX APIs are basically just special case workarounds for not having AIO ... but they're more complicated to implement. Why not use the real thing, now that it's finally available? And for that matter, new code might be able to implement only AIO and thereby simplify quite a lot of things.

It's true that most of the 2.6 kernel AIO code is involved with improving disk concurrency, but classically that's only one of the problem domains for AIO. Linux can catch up with other operating systems in that respect pretty easily.

One more thing... the "retry" mechanism in AIO is something you'll probably have to use. It's still getting changed around a bit though.