Driver porting: supporting mmap()

This article is part of the LWN Porting Drivers to 2.6 series.
Occasionally, a device driver will need to map an address range into a user process's address space. This mapping can be done to give the process direct access to a device's I/O memory area, or to the driver's DMA buffers. The 2.6 kernel features a number of changes to the virtual memory subsystem, but, for most drivers, supporting mmap() will be relatively painless.

Using remap_page_range()

There are two techniques for implementing mmap(); the simpler of the two is often remap_page_range(). This function creates a set of page table entries covering a given physical address range. The prototype of remap_page_range() changed slightly in 2.5.3; the relevant virtual memory area (VMA) pointer must now be passed as the first parameter:

    int remap_page_range(struct vm_area_struct *vma, unsigned long from,
                         unsigned long to, unsigned long size,
                         pgprot_t prot);

remap_page_range() is now explicitly documented as requiring that the memory management semaphore (usually current->mm->mmap_sem) be held when the function is called. Drivers will almost invariably call remap_page_range() from their mmap() method, where that semaphore is already held, so driver writers do not normally need to worry about acquiring mmap_sem themselves. If you use remap_page_range() from somewhere other than your mmap() method, however, be sure to acquire the semaphore first.
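As an illustration of the usual calling pattern, here is a minimal sketch of an mmap() method built around the five-argument remap_page_range(). The device (mydev), its physical base address, and its region size are all made up for the example. The small block of stand-in definitions at the top exists only so the sketch compiles in user space; a real driver would get those types from <linux/mm.h>, and pgprot_t in particular is not a plain integer there.

```c
#include <errno.h>
#include <stddef.h>

/* ---- Stand-ins so the sketch builds in user space (assumptions). ----
 * A real 2.6 driver gets these from <linux/mm.h> and <linux/fs.h>. */
#define PAGE_SHIFT 12
typedef unsigned long pgprot_t;
struct file;
struct vm_area_struct {
    unsigned long vm_start, vm_end;   /* user virtual range [start, end) */
    unsigned long vm_pgoff;           /* offset into the device, in pages */
    pgprot_t vm_page_prot;
};

/* Stub with the 2.6 prototype; it only records its arguments. */
static unsigned long last_from, last_to, last_size;
static int remap_page_range(struct vm_area_struct *vma, unsigned long from,
                            unsigned long to, unsigned long size,
                            pgprot_t prot)
{
    (void)vma; (void)prot;
    last_from = from;
    last_to = to;
    last_size = size;
    return 0;
}

/* ---- Hypothetical device: 64KB of I/O memory at a made-up address. ---- */
#define MYDEV_PHYS_BASE 0xd0000000UL
#define MYDEV_PHYS_SIZE 0x10000UL

/* The driver's mmap() method; the caller already holds mmap_sem. */
static int mydev_mmap(struct file *filp, struct vm_area_struct *vma)
{
    unsigned long size = vma->vm_end - vma->vm_start;
    unsigned long offset;

    (void)filp;
    /* Refuse mappings that start or end beyond the device region. */
    if (vma->vm_pgoff > (MYDEV_PHYS_SIZE >> PAGE_SHIFT))
        return -EINVAL;
    offset = vma->vm_pgoff << PAGE_SHIFT;
    if (size > MYDEV_PHYS_SIZE - offset)
        return -EINVAL;

    /* New in 2.5.3: the VMA itself is the first argument. */
    if (remap_page_range(vma, vma->vm_start, MYDEV_PHYS_BASE + offset,
                         size, vma->vm_page_prot))
        return -EAGAIN;
    return 0;
}
```

The bounds check matters because both the offset and the length of the mapping are chosen by user space; without it, a process could ask for physical addresses beyond the device's region.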

Note that, if you are remapping into I/O space, you may want to use:

    int io_remap_page_range(struct vm_area_struct *vma, unsigned long from,
                            unsigned long to, unsigned long size,
                            pgprot_t prot);

On all architectures other than SPARC, io_remap_page_range() is just another name for remap_page_range(). On SPARC systems, however, io_remap_page_range() uses the system's I/O mapping hardware to provide access to I/O memory.

remap_page_range() retains its longstanding limitation: it cannot be used to remap most system RAM. Thus, it works well for I/O memory areas, but not for internal buffers. For that case, it is necessary to define a nopage() method. (Yes, if you are curious, the "mark pages reserved" hack still works as a way of getting around this limitation, but its use is strongly discouraged.)

Using vm_operations

The other way of implementing mmap() is to override the default VMA operations to set up a driver-specific nopage() method. That method will be called to deal with page faults in the mapped area; it is expected to return a struct page pointer to satisfy the fault. The nopage() approach is flexible, but it cannot be used to remap I/O regions; only memory represented in the system memory map can be mapped in this way.

The nopage() method made it through the entire 2.5 development series without changes, only to be modified in the 2.6.1 release. The prototype for that function used to be:

    struct page *(*nopage)(struct vm_area_struct *area,
                           unsigned long address,
                           int unused);

As of 2.6.1, the unused argument is no longer unused, and the prototype has changed to:

    struct page *(*nopage)(struct vm_area_struct *area,
                           unsigned long address,
                           int *type);

The type argument is now used to return the type of the page fault. VM_FAULT_MINOR indicates a minor fault: one where the page was in memory, and all that was needed was a page table fixup. VM_FAULT_MAJOR indicates, instead, that the page had to be fetched from disk. Driver code using nopage() to implement a device mapping would probably set *type to VM_FAULT_MINOR. In-tree code checks whether type is NULL before assigning the fault type; other users would be well advised to do the same.
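To make the 2.6.1 calling convention concrete, the following sketch shows a nopage() method with the int *type argument, along with the mmap() method that installs it. The driver name (mydev) and its four-page buffer are hypothetical. The stand-in definitions at the top are assumptions that let the sketch compile in user space; in a real driver they come from <linux/mm.h>, where struct page and get_page() are rather more involved.

```c
#include <stddef.h>

/* ---- Stand-ins so the sketch builds in user space (assumptions). ---- */
#define PAGE_SHIFT 12
#define VM_FAULT_MINOR 1
#define NOPAGE_SIGBUS ((struct page *)NULL)
struct file;
struct page { int count; };
struct vm_area_struct;
struct vm_operations_struct {
    struct page *(*nopage)(struct vm_area_struct *area,
                           unsigned long address, int *type);
};
struct vm_area_struct {
    unsigned long vm_start, vm_end;
    unsigned long vm_pgoff;
    struct vm_operations_struct *vm_ops;
};
static void get_page(struct page *page) { page->count++; }

/* ---- Hypothetical driver buffer: pretend these came from alloc_page(). */
#define MYDEV_NPAGES 4
static struct page mydev_backing[MYDEV_NPAGES];
static struct page *mydev_pages[MYDEV_NPAGES] = {
    &mydev_backing[0], &mydev_backing[1],
    &mydev_backing[2], &mydev_backing[3],
};

/* Called on a page fault inside the mapped area. */
static struct page *mydev_nopage(struct vm_area_struct *area,
                                 unsigned long address, int *type)
{
    unsigned long index = ((address - area->vm_start) >> PAGE_SHIFT)
                          + area->vm_pgoff;
    struct page *page;

    if (index >= MYDEV_NPAGES)
        return NOPAGE_SIGBUS;       /* fault beyond the buffer */

    page = mydev_pages[index];
    get_page(page);                 /* the caller expects a reference */
    if (type)                       /* type may be NULL, so check first */
        *type = VM_FAULT_MINOR;
    return page;
}

static struct vm_operations_struct mydev_vm_ops = {
    .nopage = mydev_nopage,
};

/* The mmap() method only installs the operations; pages arrive on demand. */
static int mydev_mmap(struct file *filp, struct vm_area_struct *vma)
{
    (void)filp;
    vma->vm_ops = &mydev_vm_ops;
    return 0;
}
```

Note that the fault type travels back through *type, while the function's return value remains the struct page pointer.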

There are a couple of other things worth mentioning. One is that the vm_operations_struct is rather smaller than it was in 2.4.0; the protect(), swapout(), sync(), unmap(), and wppage() methods have all gone away (they were actually deleted in 2.4.2). Device drivers made little use of these methods, and should not be affected by their removal.

There is also one new vm_operations_struct method:

    int (*populate)(struct vm_area_struct *area, unsigned long address,
                    unsigned long len, pgprot_t prot, unsigned long pgoff,
                    int nonblock);

The populate() method was added in 2.5.46; its purpose is to "prefault" pages within a VMA. A device driver could certainly implement this method by simply invoking its nopage() method for each page within the given range, then using:

    int install_page(struct mm_struct *mm, struct vm_area_struct *vma,
                     unsigned long addr, struct page *page,
                     pgprot_t prot);

to create the page table entries. In practice, however, there is no real advantage to doing things in this way. No driver in the mainline (2.5.67) kernel tree implements the populate() method.

Finally, one use of nopage() is to allow a user process to map a kernel buffer which was created with vmalloc(). In the past, a driver had to walk through the page tables to find a struct page corresponding to a vmalloc() address. As of 2.5.5 (and 2.4.19), however, all that is needed is a call to:

    struct page *vmalloc_to_page(void *address);

This call is not a variant of vmalloc() - it allocates no memory. It simply returns a pointer to the struct page associated with an address obtained from vmalloc().
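A nopage() method for a vmalloc() buffer might look something like the sketch below. The driver state (mydev_buf, mydev_buf_size) is hypothetical, and the "vmalloc area" here is faked with a static array and a stub vmalloc_to_page() so that the example compiles in user space; those stand-ins are assumptions, not the kernel's implementation. In a real driver, mydev_buf would come from vmalloc() and the remaining definitions from <linux/mm.h> and <linux/vmalloc.h>.

```c
#include <stddef.h>

/* ---- Stand-ins so the sketch builds in user space (assumptions). ---- */
#define PAGE_SHIFT 12
#define PAGE_SIZE (1UL << PAGE_SHIFT)
#define VM_FAULT_MINOR 1
#define NOPAGE_SIGBUS ((struct page *)NULL)
struct page { int count; };
struct vm_area_struct {
    unsigned long vm_start, vm_end;
    unsigned long vm_pgoff;
};
static void get_page(struct page *page) { page->count++; }

/* Fake "vmalloc area": a static buffer plus one struct page per page. */
#define MYDEV_BUF_PAGES 4
static char fake_vmalloc_area[MYDEV_BUF_PAGES * PAGE_SIZE];
static struct page fake_pages[MYDEV_BUF_PAGES];

/* Stub with the real prototype; it looks the page up in the fake table. */
static struct page *vmalloc_to_page(void *address)
{
    size_t off = (size_t)((char *)address - fake_vmalloc_area);
    return &fake_pages[off >> PAGE_SHIFT];
}

/* ---- Hypothetical driver state; really mydev_buf = vmalloc(size). ---- */
static char *mydev_buf = fake_vmalloc_area;
static unsigned long mydev_buf_size = sizeof(fake_vmalloc_area);

/* Map a faulting user address to the page backing the vmalloc() buffer. */
static struct page *mydev_vmalloc_nopage(struct vm_area_struct *area,
                                         unsigned long address, int *type)
{
    unsigned long offset = (address - area->vm_start)
                           + (area->vm_pgoff << PAGE_SHIFT);
    struct page *page;

    if (offset >= mydev_buf_size)
        return NOPAGE_SIGBUS;       /* fault beyond the buffer */

    /* No page table walk needed: one call finds the struct page. */
    page = vmalloc_to_page(mydev_buf + offset);
    get_page(page);
    if (type)
        *type = VM_FAULT_MINOR;
    return page;
}
```

The lookup has to be repeated for each faulting page because memory obtained from vmalloc() is contiguous only in kernel virtual space; the underlying physical pages can be scattered.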

Copyright (©) 2003, Eklektix, Inc.