--- layout: default title: usr/share/doc/df/html/dd.html ---

Device driver development using DF

1. Intro

A device driver is a task that:

serves I/O requests sent by clients (user applications, services and other device drivers)
typically drives a given piece of hardware
sometimes acts as a client and makes I/O requests to other device drivers (this case is out of the scope of the current document)

A device driver can be designed and written with or without using DF (the Device Framework). This document focuses on writing DF-based device drivers.

2. Device framework, device manager and device basics

2.1. Flows

The generic data flow is illustrated in Figure 1 below. Clients first open the device, then make asynchronous I/O requests that are answered by the device. A client program may use the I/O request API directly, either synchronously or asynchronously, or may rely on synchronous wrapper functions.

When using DF, the device driver initializes a DF instance on program entry (DF being a lib the device is linked against). I/O requests are received by a DF function and are tracked, queued and passed as async function calls to device driver callback functions registered on device startup.

Figure 1. Data flow
Nodes are code blocks, arrows represent communication

The device driver shall perform one of the following actions with the received and properly preprocessed I/O request (this must be implemented in the workrequest callback):

refuse the I/O request with an error code
respond to the I/O request immediately
start/trigger some background processing or hardware operation and delay (block) the I/O request and return to it later, responding to it with success or an error code.

The DF keeps track of the I/O requests using so-called [request] heads. There is a pool of free heads allocated on startup. When a new request arrives at DF, a free head is assigned to it, or the request is queued in a FIFO if there is no free head available. This results in a finite set of I/O requests that have heads. These are the requests the device driver (or DF) tries to serve in parallel, even though the code runs in a single task / single thread mode.

This implies that the device driver code needs to be written in an async style: it needs to spend time in small chunks on headed requests and delay them while waiting for a timeout, data or events from the hardware.
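The head pool and overflow FIFO described above can be modelled in a few lines. This is a minimal sketch, not DF code: the `Pool` and `Head` types, the sizes and the `pool_submit()` / `pool_release()` functions are all hypothetical, and the sketch assumes that a released head is immediately rebound to the oldest queued request.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, NOT the DF API: a fixed pool of request heads
 * plus a FIFO for requests that arrive while every head is busy. */
#define HEAD_COUNT 2   /* assumed pool size (cf. head_count) */
#define FIFO_CAP   8   /* assumed FIFO capacity */

typedef struct {
    int in_use;        /* 0 = free, 1 = bound to a request */
    int request_id;    /* the request this head tracks */
} Head;

typedef struct {
    Head   heads[HEAD_COUNT];
    int    fifo[FIFO_CAP];   /* waiting request ids, oldest first */
    size_t fifo_first;       /* index of the oldest entry */
    size_t fifo_len;         /* number of waiting requests */
} Pool;

void pool_init(Pool *p)
{
    for (size_t i = 0; i < HEAD_COUNT; i++)
        p->heads[i].in_use = 0;
    p->fifo_first = 0;
    p->fifo_len = 0;
}

/* Returns the head index (>= 0) if a head was bound, -1 if the request
 * was queued, -2 if the FIFO is full. */
int pool_submit(Pool *p, int request_id)
{
    for (size_t i = 0; i < HEAD_COUNT; i++) {
        if (!p->heads[i].in_use) {
            p->heads[i].in_use = 1;
            p->heads[i].request_id = request_id;
            return (int)i;
        }
    }
    if (p->fifo_len == FIFO_CAP)
        return -2;
    p->fifo[(p->fifo_first + p->fifo_len) % FIFO_CAP] = request_id;
    p->fifo_len++;
    return -1;
}

/* Releases a head. If a request is waiting, the head is rebound to it and
 * its id is returned; otherwise the head becomes free and -1 is returned. */
int pool_release(Pool *p, int head_index)
{
    assert(head_index >= 0 && head_index < HEAD_COUNT);
    Head *h = &p->heads[head_index];
    assert(h->in_use);
    if (p->fifo_len == 0) {
        h->in_use = 0;
        return -1;
    }
    h->request_id = p->fifo[p->fifo_first];
    p->fifo_first = (p->fifo_first + 1) % FIFO_CAP;
    p->fifo_len--;
    return h->request_id;
}
```

With two heads, a third and a fourth request wait in the FIFO and are picked up in arrival order as heads are released.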

How the device driver communicates with the hardware depends largely on the architecture and hardware implementation details. A common pattern is to receive interrupts from the hardware.

Interrupts need to be served immediately, sparing CPU cycles if possible, running in a context that is different from the device context that serves the I/O requests. Thus device drivers are often split in two parts, a "top half" and a "bottom half": the "top half" handles interrupts, while the "bottom half" handles software requests from the system. There is an infrastructure available for registering interrupt handlers and for efficiently and safely passing on information from the "top half" to the "bottom half".

2.2. Request heads

A request head is DF's way of keeping track of parallel requests. Each request being served has a head attached. The head has a state that tells what the request is doing at the moment (see Figure 2 below).

Figure 2. DF request head states
nodes are states, arrows are transitions
drq=requested by the device code
urq=requested by the user (client) code

Normally, there are free heads preallocated, waiting on a list. When a new request arrives at DF, DF binds it to one of the free heads and marks the head active. The list of all states and transitions:

free: head not in use
- -> active: by DF, upon the arrival of a new request.
active: bound to an I/O request; being served
- -> blocked: by the device, with DFBlockHead() or by returning DFWR_DELAY from the workrequest
- -> finalized: by the device, with DFFinalizeHead() or by returning DFWR_DONE or DFWR_ERROR from the workrequest
- -> aborting: by the user (client), using AbortIO() on the client side
blocked (a.k.a. delayed): postpone the request for later processing
- -> active: by the device, upon a call to DFActivateHead() - will trigger a workrequest call with the head
- -> finalized: by the device, upon a call to DFFinalizeHead()
- -> aborting: by the user (client), using AbortIO() on the client side
aborting: an abort request arrived, but the device has not yet aborted all pending operations related to the request
- -> finalized: by the device, calling DFFinalizeHead()
finalized: serving the request is finished by the device, with a positive or negative result; DF is working on getting the responses back to the original sender
- -> free: by DF, after all the responses are sent

Notes:

The workrequest callback of the device is called whenever a head becomes active
DFActivateHead() and DFFinalizeHead() cannot be called from the workrequest on the current head; use the return value of the workrequest instead. These functions can, however, be called on any other head.
List of heads: in some situations the device may need to keep lists of active heads; there's no easy/efficient way of searching the heads in DF. Before a head transitions to the free state, on_timedout is called so that the device can remove it from its lists.
In the finalized state the DF will send multiple responses only if one or more abort requests were received.
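The states and transitions listed above can be captured in a small lookup table. The following is an illustrative sketch only: the `HeadState` enum and both functions are hypothetical names, not DF identifiers, and the table encodes just the transitions named in the list.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; DF's own state identifiers may differ. */
typedef enum {
    HS_FREE, HS_ACTIVE, HS_BLOCKED, HS_ABORTING, HS_FINALIZED, HS_COUNT
} HeadState;

/* allowed[from][to] is true when the list above names that transition. */
static const bool allowed[HS_COUNT][HS_COUNT] = {
    [HS_FREE]      = { [HS_ACTIVE] = true },
    [HS_ACTIVE]    = { [HS_BLOCKED] = true, [HS_FINALIZED] = true,
                       [HS_ABORTING] = true },
    [HS_BLOCKED]   = { [HS_ACTIVE] = true, [HS_FINALIZED] = true,
                       [HS_ABORTING] = true },
    [HS_ABORTING]  = { [HS_FINALIZED] = true },
    [HS_FINALIZED] = { [HS_FREE] = true },
};

bool head_transition_allowed(HeadState from, HeadState to)
{
    assert(from < HS_COUNT && to < HS_COUNT);
    return allowed[from][to];
}

/* Applies the transition if it is legal. Returns true on success and
 * leaves *state unchanged otherwise. */
bool head_transition(HeadState *state, HeadState to)
{
    if (!head_transition_allowed(*state, to))
        return false;
    *state = to;
    return true;
}
```

A table like this is mainly useful in debug builds or unit tests of a driver, to catch code paths that try to move a head along an edge that does not exist, such as reactivating a head that is already aborting.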

2.3. I/O request types

An I/O request is a message exchange between a client and a server, usually with some attached data in shared memory. What the data looks like in C and how/where it is allocated depends on the I/O request type (include/device/base.h):

IOreq_buffer: the most common one, with a C struct attached. Example: include/device/syslog.h; the void *data field of the IOreq points to the memory following the IOreq_buffer structure (which is always allocated to be large enough for the largest struct used by the device). The void *data part is cast to the right struct by client-side wrapper functions, according to the request cmd value. A union of the data structures is often created, which makes it easier to determine the overallocation needed.
custom structs: instead of the IOreq_buffer struct, define custom structs in the device class header. The first field must be IOreq req. Advantages: no need to overallocate, no void *. Example: include/device/repo.h
IOreq_direct: rarely used; it's the simpler version of IOreq_buffer with a fixed size payload.
IOreq_service: do not use
IOreq_scatter_gather: do not use
_CLSMSGBASE: for interrupts and device class specific messages; messages can be useful for passing on events without memory allocation. See also: 4.3.
IOREQ_EC_*: generic error messages; these should be used by the DF mainly. A device should avoid using these, except for IOREQ_EC_OK, IOREQ_EC_INVALID_CMD, IOREQ_EC_INVALID_REQ, IOREQ_EC_CMD_NOT_IMPLEMENTED. For the rest, prefer device-specific error codes (see also: 3.2)
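The two most common layouts from the list above, the buffer-style request with an overallocated payload and the custom struct whose first field is the request, can be sketched as follows. All types here (`ExIOreq`, `ExIOreqBuffer`, `ExLogArgs` and so on) are simplified, hypothetical stand-ins; the real definitions are in include/device/base.h and the class headers and will differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-ins for the types in base.h. */
typedef struct {
    int   cmd;     /* command id */
    int   unit;    /* unit selector */
    void *data;    /* points at the payload that follows the header */
} ExIOreq;

typedef struct { ExIOreq req; } ExIOreqBuffer;   /* generic header */

/* Payloads of an imaginary device. */
typedef struct { int level; char text[48]; } ExLogArgs;
typedef struct { int flags; } ExFlushArgs;

/* Union of all payloads: its size is the overallocation needed. */
typedef union { ExLogArgs log; ExFlushArgs flush; } ExAnyArgs;

/* Buffer style: header plus room for the largest payload. */
ExIOreqBuffer *ex_alloc_buffer_request(void)
{
    ExIOreqBuffer *b = calloc(1, sizeof *b + sizeof(ExAnyArgs));
    if (b != NULL)
        b->req.data = b + 1;   /* payload begins right after the header */
    return b;
}

/* Custom struct style: typed fields, no void * and no overallocation. */
typedef struct {
    ExIOreq req;               /* must be the first field */
    int     level;
    char    text[48];
} ExLogRequest;

static_assert(offsetof(ExLogRequest, req) == 0,
              "the request must be the first field");

/* Device side: recover the typed request from the generic pointer.
 * Valid because a pointer to a struct and a pointer to its first
 * member can be converted into each other. */
ExLogRequest *ex_as_log_request(ExIOreq *ioreq)
{
    return (ExLogRequest *)ioreq;
}
```

The custom struct variant trades the single generic allocation size for type safety: each command gets its own struct, and the device casts the incoming request pointer according to the command value.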

2.4. Units

The unit is an integer field of the IOreq struct, thus each request can select a unit of the device to operate on. The meaning of unit is device-specific:

it may refer to a physical instance of a piece of hardware (e.g. SCSI LUNs)
it may be an instance of an underlying driver connection
it may represent a session identifier (e.g. fat32's block device instance and internal states)

Units should be numbered from 0.

2.5. Device manager (and detector)

Devices are started by the device manager. The device manager is a supervisor: it restarts a device if it quits. Device configuration is kept at the device manager (and detector) too, in a tagdesc format. It is accessible through an I/O request interface defined in include/device/devdet.h.

3. Device implementation: class headers and API

If a new device driver is an implementation of an existing class, it does not require modification of any files described in this section. Creating a new device class involves modifications in base.h and creating a new class header.

3.1. `base.h` modifications

Add a new DCLS_ constant for the new class. The upper 5 nibbles should be the same as in other DCLS_ values, the lower bits should be unique. There's no specific convention in assigning numbers, but it's good if the hex digits resemble the device name (e.g. F5 for FS, file system). Never change any existing value. Do not reorder the list.
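The numbering rule can be checked mechanically. The sketch below assumes 32-bit class values, so that the upper 5 nibbles correspond to the mask 0xFFFFF000; the prefix 0xDC150 and the two example constants are invented for illustration and are not the values used in base.h.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: class values are 32 bits wide, so the shared part
 * (upper 5 nibbles) is selected by this mask. */
#define EX_DCLS_PREFIX_MASK 0xFFFFF000u

/* Invented example values, NOT the real DCLS_ constants. */
#define EX_DCLS_FS  0xDC1500F5u   /* "F5" resembles FS */
#define EX_DCLS_LOG 0xDC150106u

/* Returns true if candidate shares the upper 5 nibbles with every
 * existing value and does not collide with any of them. */
bool ex_dcls_is_valid_new(const uint32_t *existing, size_t n,
                          uint32_t candidate)
{
    assert(existing != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if ((existing[i] & EX_DCLS_PREFIX_MASK) !=
            (candidate & EX_DCLS_PREFIX_MASK))
            return false;          /* different prefix */
        if (existing[i] == candidate)
            return false;          /* lower bits are not unique */
    }
    return true;
}
```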

3.2. Device driver's class header

The public API of a device is specified in its class header. The class header is called include/device/dname.h, where dname is the name of the device. A typical example is include/device/vfs.h.

The header must first include base.h and may define the name of the device (DNAME_NAME); this name is used by clients as an argument for OpenDevice(). The name should not be longer than DEVNAME_MAX_LEN characters (including the terminating zero). For devices started by the device manager, the device manager will determine the name of the device. These devices shall accept the offered name.

There should be a list of device-specific error code #defines, all relative to the base configured in base.h. This is followed by request payload structures and request command ID #defines (DNAME_CMD_*), with a base of _CLSCMDBASE(DNAME).

Error codes are #define'd with base _CLSERRBASE(DNAME). Error codes should be detailed and specific. Naming convention: DNAME_EC_*.
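A class header following these conventions might be laid out as in the sketch below, for an imaginary device called exdev. The numeric bases and the length limit are placeholder constants chosen for this example; in a real header they would come from base.h through _CLSERRBASE(DNAME), _CLSCMDBASE(DNAME) and DEVNAME_MAX_LEN. The `exdev_strerror()` helper is likewise an invention of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder values for this sketch only; a real header would
 * include base.h and use its macros instead. */
#define EX_DEVNAME_MAX_LEN 16
#define EXDEV_ERRBASE      0x1000   /* stands in for _CLSERRBASE(DNAME) */
#define EXDEV_CMDBASE      0x2000   /* stands in for _CLSCMDBASE(DNAME) */

/* Device name used by clients when opening the device. */
#define EXDEV_NAME "exdev"
static_assert(sizeof(EXDEV_NAME) <= EX_DEVNAME_MAX_LEN,
              "name, including the terminating zero, must fit");

/* Device-specific error codes: detailed and specific. */
#define EXDEV_EC_NO_MEDIA      (EXDEV_ERRBASE + 0)
#define EXDEV_EC_WRITE_PROTECT (EXDEV_ERRBASE + 1)
#define EXDEV_EC_BAD_SECTOR    (EXDEV_ERRBASE + 2)

/* Request payload structures. */
typedef struct { unsigned long sector; unsigned count; } ExdevReadArgs;
typedef struct { unsigned long first; unsigned long last; } ExdevEraseArgs;

/* Request command ids. */
#define EXDEV_CMD_READ  (EXDEV_CMDBASE + 0)
#define EXDEV_CMD_ERASE (EXDEV_CMDBASE + 1)

/* Optional helper: human-readable text for the device-specific codes.
 * Returns NULL for codes that do not belong to this device. */
const char *exdev_strerror(int ec)
{
    switch (ec) {
    case EXDEV_EC_NO_MEDIA:      return "no media present";
    case EXDEV_EC_WRITE_PROTECT: return "media is write protected";
    case EXDEV_EC_BAD_SECTOR:    return "unreadable sector";
    default:                     return NULL;
    }
}
```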

4. Device implementation: implementation with df

4.1. File naming advice

The following is only a possibility. Some device implementations use a single-file solution.

The directory the implementation goes in is device/dname. It should contain at least the following files:

dname_start.c - entry point
dname_df.[ch] - device state and df callback functions
dname_commands.[ch] - functions to serve the actual I/O requests.
dname_command_*.c - in case the number of commands is too high to keep them in a single file (see service/vfs)

4.2. Entry point

The entry point should be PROGRAMENTRY(dname) { } in file dname_start.c. It should initialize the DF by setting up a DFCtorData structure and calling DFCreate() with it. A simple example is df-examples/blkex1.c. The fields of DFCtorData are documented in lib/df/df.h. DFCreate() also spawns a new thread. There are some undocumented features and tricks, though:

on_timedout is called whenever a head is freed, either because it timed out or because it was finalized with a positive or negative answer; it cannot call any DF* function and cannot change the state of the IOreq passed as its parameter.
head_count can not be zero (it represents the number of request heads: the maximum number of parallel requests the device is trying to serve)
head_extra: if non-zero, each head is allocated with this many bytes larger, so the device can store per-request states in it
peropen_size: df can keep a per-device-open state (typically a struct) - this is the size of that data, in bytes
the name of the device should be dynamic and received from the device manager; this is part of the device manager configuration. Get it via e_getenv(DEVMAN_ENV_DEVNAME).
initializing device state should happen in on_init (this is the only callback that runs in the original thread)

4.3. DF callback functions

dname_df.h should contain the device state structure (optional) and the prototypes of the callback functions used for initializing DFCtorData fields.

dname_df.c implements the callback functions. The most important one is DFWorkResult dname_workrequest(DFState* state, IOreq* ioreq, DFIOreqHead* head), which is called each time a DFIOreqHead is activated. The callback can check whether this is its first invocation for the current head as follows (the check is not mandatory):

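	/* ioreq is the IOreq* argument of the workrequest callback */
	if (ioreq->io_state == IOREQ_STATE_NEW) {
		/* first invocation for this head: mark the request as pending */
		ioreq->io_state = IOREQ_STATE_PEND;
		...
	}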

It is recommended to check the type of the I/O request, e.g. (ioreq->io_type == IOREQ_TYPE_BUFFER).

After the sanity checks, most workrequest functions would use a switch on the ioreq->io_cmd field and call a function implemented in dname_commands.c.

The workrequest function should return one of:

DFWR_ERROR: refuse the request with an error code (the code should set IOreq.io_error_code field)
DFWR_DONE: finished processing the request, DF should send the response, no further workrequest calls for this request (IOreq.io_error_code will be overwritten with IOREQ_EC_OK)
DFWR_DELAY: block the I/O request, postponing the response
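The overall shape of a workrequest function (type check, switch on the command, one of three return values) can be sketched as below. Everything in it is a hypothetical stand-in: the `Ex*` types and constants only mimic the roles of DFWorkResult, IOreq, DFWR_* and IOREQ_EC_*, and the signature is simplified to a single argument, whereas the real callback also receives the DF state and the head.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the DF types and constants. */
typedef enum { EX_WR_ERROR, EX_WR_DONE, EX_WR_DELAY } ExWorkResult;
enum { EX_TYPE_BUFFER = 1 };
enum { EX_EC_OK = 0, EX_EC_INVALID_REQ, EX_EC_INVALID_CMD };
enum { EX_CMD_PING = 1, EX_CMD_READ };

typedef struct {
    int io_type;
    int io_cmd;
    int io_error_code;
} ExIOreq;

/* Command handlers; in the suggested layout these would live in
 * dname_commands.c. */
static ExWorkResult ex_cmd_ping(ExIOreq *r)
{
    (void)r;
    return EX_WR_DONE;     /* answered immediately */
}

static ExWorkResult ex_cmd_read(ExIOreq *r)
{
    (void)r;
    /* A real driver would start the hardware operation here. */
    return EX_WR_DELAY;    /* block the head until the data arrives */
}

ExWorkResult ex_workrequest(ExIOreq *ioreq)
{
    assert(ioreq != NULL);

    /* Sanity check: this device only accepts buffer-type requests. */
    if (ioreq->io_type != EX_TYPE_BUFFER) {
        ioreq->io_error_code = EX_EC_INVALID_REQ;
        return EX_WR_ERROR;
    }

    switch (ioreq->io_cmd) {
    case EX_CMD_PING:
        return ex_cmd_ping(ioreq);
    case EX_CMD_READ:
        return ex_cmd_read(ioreq);
    default:
        ioreq->io_error_code = EX_EC_INVALID_CMD;
        return EX_WR_ERROR;
    }
}
```

Note that the error code is set before returning the error result, and that the handlers never finalize or activate the current head themselves; the return value carries that decision.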

Workrequests handle I/O requests. For simpler events, usually without payload or with at most 1 pointer-sized field, it is possible (but not recommended) to use messages instead of I/O requests. The entry point for this is on_msgreceived. This is also the default mechanism for msg based interrupt handling. Related fields: msg_mask, msg_filter.

Other commonly used DF callbacks:

on_init: called from the program-entry thread of the device. Should return DF_TRUE on success, DF_FALSE on failure. This is the only callback that runs in the program-entry thread.
on_start: called right before the main loop of the device (which is implemented by the DF). Everything is initialized but no request handled yet.
on_destroy: called immediately before the device is destroyed. This can be used to uninit the hardware or clean up allocations. (In the current implementation it is not called)
on_abortrequest: called when an ABORT_IO_MSG message is received; the device should determine the fate of the pending I/O request. If the head is linked into a list, the head shall be removed from that list. (see an example in df-examples/chex2.c)
on_ioreq*: used when the device is also a client passing on the I/O request (TODO: refer to ext doc)
on_changepwrlvl: power level; drivers should act according to the given power level (DFPL_*), e.g. switch to a lower consumption mode when running on battery
on_debugcmd: clients may make debug requests without opening the device (e.g. dump internal hardware registers via debugtrace)
on_getcounters: query internal, but public counters (diagnostics); add the counters with DFAppendCounter(); if DFAppendCounter() returns DF_FALSE, this function needs to return DF_FALSE immediately. Example: devman_getcounters(). The unit parameter is sometimes used to navigate in a menu system of counters, which is useful when many counters are available (see service/bufgroup and service/tcpdev2).

NOTE: state->data points to the device state instance for the device. Devices should use this struct instead of global variables. There is only one such struct per device, shared by all instances (the per-instance storage is called peropen). The struct is allocated by the DF.
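The early-return rule for on_getcounters from the callback list above can be illustrated with a mock. The sink type and `ex_append_counter()` below are invented for this sketch and only imitate the behaviour attributed to DFAppendCounter(), namely that it reports failure when no more counters can be appended. That the callback returns a true value on success is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

#define EX_TRUE  1
#define EX_FALSE 0
#define EX_SINK_MAX 4

/* Mock destination for counters; not a DF type. */
typedef struct {
    const char *names[EX_SINK_MAX];
    long        values[EX_SINK_MAX];
    size_t      used;
    size_t      cap;    /* how many counters fit, <= EX_SINK_MAX */
} ExCounterSink;

/* Mock of the append call: fails once the sink is full. */
static int ex_append_counter(ExCounterSink *s, const char *name, long value)
{
    assert(s->cap <= EX_SINK_MAX);
    if (s->used == s->cap)
        return EX_FALSE;
    s->names[s->used] = name;
    s->values[s->used] = value;
    s->used++;
    return EX_TRUE;
}

/* Device state kept in one struct instead of global variables. */
typedef struct { long reads, writes, errors; } ExDevState;

/* Stops at the first failed append, as the callback description requires. */
int ex_getcounters(const ExDevState *st, ExCounterSink *sink)
{
    if (!ex_append_counter(sink, "reads", st->reads))
        return EX_FALSE;
    if (!ex_append_counter(sink, "writes", st->writes))
        return EX_FALSE;
    if (!ex_append_counter(sink, "errors", st->errors))
        return EX_FALSE;
    return EX_TRUE;
}
```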

4.4. Interrupt handling

One mechanism for passing information from the interrupt handler ("top half") to the device driver core ("bottom half") is the ring buffer (see lextras/ringbuffer.h):

with 1 reader and 1 writer it is transparent and lock-free (recommended for interrupt handling)
multiple writers, 1 reader: lock at writers, lock-free at the reader
1 writer, multiple readers: lock at readers, lock-free at the writer (recommended for interrupt handling)
multiple readers and writers: needs two locks, one for writers, one for readers
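To show why the single-reader, single-writer case needs no lock, here is a minimal generic ring buffer written with C11 atomics. It is not the API of lextras/ringbuffer.h; the `ExRing` type, the function names and the capacity are assumptions of this sketch. The producer only ever writes `head` and the consumer only ever writes `tail`, so each side publishes its progress with a release store and observes the other side with an acquire load.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define EX_RB_CAP 8u   /* capacity in slots; must be a power of two */
static_assert((EX_RB_CAP & (EX_RB_CAP - 1u)) == 0,
              "capacity must be a power of two");

typedef struct {
    uint32_t      slots[EX_RB_CAP];
    atomic_size_t head;   /* next slot to write; written by the producer */
    atomic_size_t tail;   /* next slot to read; written by the consumer */
} ExRing;

void ex_ring_init(ExRing *rb)
{
    atomic_init(&rb->head, 0);
    atomic_init(&rb->tail, 0);
}

/* Producer side (e.g. an interrupt handler). Returns false when full. */
bool ex_ring_push(ExRing *rb, uint32_t value)
{
    size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    if (head - tail == EX_RB_CAP)
        return false;
    rb->slots[head & (EX_RB_CAP - 1u)] = value;
    /* Publish the slot only after its content has been written. */
    atomic_store_explicit(&rb->head, head + 1, memory_order_release);
    return true;
}

/* Consumer side (e.g. the driver core). Returns false when empty. */
bool ex_ring_pop(ExRing *rb, uint32_t *out)
{
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
    if (head == tail)
        return false;
    *out = rb->slots[tail & (EX_RB_CAP - 1u)];
    /* Hand the slot back only after its content has been read. */
    atomic_store_explicit(&rb->tail, tail + 1, memory_order_release);
    return true;
}
```

As soon as a second writer or a second reader is added, two parties would update the same index, which is why the other configurations in the list need a lock on the shared side.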

The accepted mechanism for notifying the DF about an interrupt is signal messages (_SendSignalMsg()):

use _CreateSignalMsg() to create signal messages (e.g. one per event type)
the interrupt handler uses _SendSignalMsg() to send one of the signal messages
the "bottom half" can receive the message through an async callback on_msgreceived
sending messages that will be filtered out (msg_mask, msg_filter) is strongly discouraged, so take care when configuring the filters

To hook to an interrupt, use _CreateISR() (see the kernel reference manual for details). The interrupt handler function needs to be prefetched first (with PrefetchView() or by reading a few words from it), as well as the related data (stack and data structures). The device may obtain the interrupt number from different sources, e.g. from the devman detect data, or from its detector (see DCLS_DEVRES).

Generic considerations when communicating from interrupt handlers:

never call DF functions from the ISR function!
use SignalMsg to notify the DF; a small amount of data (typically events) can also be passed this way, lock-free and copy-free.
use the ringbuffer API for passing on data bigger than what fits in a SignalMsg; the content may be structs or a character stream. Avoid locking. This still means a copy.
avoid copy if possible

4.5. Memory management

If the device requires memory for DMA operations, the memory object should be created using _CreateMemo() with MEMO_CR_FIX specified. After creating the memo, it must be mapped with the _AllocView() call.

5. Device implementation: client side wrapper lib

Using SyncIO() directly from the client code is not always the most convenient approach. Client code is often more readable when it can call wrapper functions with verbose C function call API. Such a wrapper function then performs the SyncIO(), hiding the binary API from the user code. The wrapper functions are always blocking calls.

DF has generic convenience macros for supporting wrapper function implementation. Using these macros has the advantage that the wrapper code needs to know the device type only and doesn't need to know the I/O request type.

For an example, please refer to libc/vfs_wrapper.c.
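The idea of a blocking wrapper can be sketched without the DF macros. In the example below the device is replaced by a mock: `ex_sync_io()` is a hypothetical stand-in that plays the role of SyncIO() and of the device answering the request, and the request struct, command id and `exdev_get_label()` are invented for illustration. A real wrapper would use the request types from the class header and the DF convenience macros instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the request type and constants. */
enum { EX_EC_OK = 0, EX_EC_INVALID_CMD = 1 };
enum { EX_CMD_GET_LABEL = 1 };

typedef struct {
    int io_cmd;
    int io_unit;
    int io_error_code;
} ExIOreq;

typedef struct {
    ExIOreq req;        /* first field, as for custom request structs */
    char    label[16];  /* filled in by the device */
} ExGetLabelReq;

/* Mock of a blocking request: acts as the device and returns the
 * resulting error code. */
static int ex_sync_io(ExIOreq *ioreq)
{
    if (ioreq->io_cmd != EX_CMD_GET_LABEL) {
        ioreq->io_error_code = EX_EC_INVALID_CMD;
        return ioreq->io_error_code;
    }
    ExGetLabelReq *r = (ExGetLabelReq *)ioreq;
    strcpy(r->label, ioreq->io_unit == 0 ? "disk0" : "other");
    ioreq->io_error_code = EX_EC_OK;
    return EX_EC_OK;
}

/* Wrapper: a plain, blocking C call that hides the binary request
 * layout from the client code. Returns an error code. */
int exdev_get_label(int unit, char *out, size_t out_size)
{
    assert(out != NULL && out_size > 0);

    ExGetLabelReq r;
    memset(&r, 0, sizeof r);
    r.req.io_cmd = EX_CMD_GET_LABEL;
    r.req.io_unit = unit;

    int ec = ex_sync_io(&r.req);
    if (ec != EX_EC_OK)
        return ec;

    /* Copy out the result, truncating if the caller's buffer is small. */
    strncpy(out, r.label, out_size - 1);
    out[out_size - 1] = '\0';
    return EX_EC_OK;
}
```

Client code then reads as an ordinary function call, for example `exdev_get_label(0, buf, sizeof buf)`, with no request structures in sight.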

By convention wrapper libs are usually one lib per device, placed in lib/. There's no specific naming convention other than the name of the lib should refer to the name of the device.