Customizing Image Startup Programs

Caution: This version of this document is no longer maintained. For the latest documentation, see

The first program in a bootable Neutrino image is a startup program whose purpose is to:

  1. Initialize the hardware.
  2. Initialize the system page.
  3. Initialize callouts.
  4. Load and transfer control to the next program in the image.

You can customize Neutrino for different embedded-system hardware by changing the startup program.

Initialize hardware

Perform basic hardware initialization at this stage. How much you need to do here depends on what the IPL has already done.

Note that you don't need to initialize standard peripheral hardware such as an IDE interface or the baud rate of serial ports. This will be done by the drivers that manage this hardware when they're started.

Initialize system page

Information about the system is collected and placed in an in-memory data structure called the system page. This includes information such as the processor type, bus type, and the location and size of available system RAM.

The kernel as well as applications can access this information as a read-only data structure. The hardware/system-specific code to interrogate the system for this information is confined to the startup program. This code doesn't occupy any system RAM after it has run.

Initialize callouts

Another key function of the startup code is to bind in the system page callouts. The kernel uses these callouts to perform various hardware- and system-specific functions that must be specified by the systems integrator.

Anatomy of a startup program

Each release of Neutrino ships with a growing number of startup programs for many boards. To find out what boards we currently support, please refer to the following sources:

Each startup program is provided as a ready-to-execute binary. Full source and a Makefile are also available so you can customize and remake each one. The files are kept in the following directory structure:

[Figure: Startup directory structure.]

Generally speaking, the following directory structure applies in the startup source for the startup-boardname module:


Structure of a startup program

Each startup program consists of a main() with the following structure (in pseudo code):

Global variables

    Call add_callout_array()

    Argument parsing (Call handle_common_option())

    Call init_raminfo()
    Remove ram used by modules in the image

    if (virtual) Call init_mmu() to initialize the MMU

    Call init_intrinfo()
    Call init_qtime()
    Call init_cacheattr()
    Call init_cpuinfo()

    Set hardware machine name

    Call init_system_private()

    Call print_syspage() to print debugging output

Note: You should examine the commented source for each of the functions within the library to see if you need to replace a library function with one of your own.

Creating a new startup program

To create a new startup program, you should make a new directory under bsp_working_dir/src/hardware/startup/boards and copy the files from one of the existing startup program directories. For example, to create something close to the Intel PXA250TMDP board, called my_new_board, you would:

  1. cd bsp_working_dir/src/hardware/startup/boards
  2. mkdir my_new_board
  3. cp -r pxa250tmdp/* my_new_board
  4. cd my_new_board
  5. make clean

For descriptions of all the startup functions, see "The startup library" section in this chapter.

Structure of the system page

As mentioned earlier (see the section "Initialize system page"), one of the main jobs of the startup program is to initialize the system page.

The system page structure struct syspage_entry is defined in the include file <sys/syspage.h>. The structure contains a number of constants, references to other structures, and a union shared between the various processor platforms supported by Neutrino.

It's important to realize that there are two ways of accessing the data within the system page, depending on whether you're adding data to the system page at startup time or reading data from the system page later (as would be done by an application program running after the system has been booted). Regardless of which access method you use, the fields are the same.

Here's the system page structure definition, taken from <sys/syspage.h>:

/* contains at least the following: */
struct syspage_entry {
    uint16_t            size;
    uint16_t            total_size;
    uint16_t            type;
    uint16_t            num_cpu;
    syspage_entry_info  system_private;
    syspage_entry_info  asinfo;
    syspage_entry_info  hwinfo;
    syspage_entry_info  cpuinfo;
    syspage_entry_info  cacheattr;
    syspage_entry_info  qtime;
    syspage_entry_info  callout;
    syspage_entry_info  callin;
    syspage_entry_info  typed_strings;
    syspage_entry_info  strings;
    syspage_entry_info  intrinfo;
    syspage_entry_info  smp;
    syspage_entry_info  pminfo;
    union {
        struct x86_syspage_entry    x86;
        struct ppc_syspage_entry    ppc;
        struct mips_syspage_entry   mips;
        struct arm_syspage_entry    arm;
        struct sh_syspage_entry     sh;
    } un;
};

Note that some of the fields presented here may be initialized by the code provided in the startup library, while some may need to be initialized by code provided by you. The amount of initialization required really depends on the amount of customization that you need to perform.

Let's look at the various fields.


size

The size of the system page entry. This member is set automatically by the library.

total_size

The size of the system page entry plus the referenced substructures; effectively the size of the entire system-page database. This member is set automatically by the library and adjusted later (grown) as required by other library calls.

type

Indicates the CPU family, which determines which member of the un union to use. Can be one of:


The library sets this member automatically.

num_cpu

The num_cpu member indicates the number of CPUs present on the given system. This member is initialized to the default value 1 in the library and adjusted by the library call init_smp() if additional processors are detected.

syspage_entry system_private

The system_private area contains information that the operating system needs to know when it boots. This is filled in by the startup library's init_system_private() function.

Member             Description
user_cpupageptr    User address (R/O) for cpupage pointer.
user_syspageptr    User address (R/O) for syspage pointer.
kern_cpupageptr    Kernel address (R/W) for cpupage pointer.
kern_syspageptr    Kernel address (R/W) for syspage pointer.
pagesize           Granularity of the OS memory allocator (usually 16 in physical mode or 4096 in virtual mode).


syspage_entry asinfo

The asinfo section consists of an array of the following structure. Each entry describes the attributes of one section of address space on the machine.

struct asinfo_entry {
    uint64_t            start;
    uint64_t            end;
    uint16_t            owner;
    uint16_t            name;
    uint16_t            attr;
    uint16_t            priority;
    int                 (*alloc_checker)(struct syspage_entry *__sp, 
                                     uint64_t   *__base,
                                     uint64_t   *__len,
                                     size_t     __size,
                                     size_t     __align);
    uint32_t            spare;
};

Member      Description
start       Gives the first physical address of the range being described.
end         Gives the last physical address of the range being described. Note that this is the actual last byte, not one beyond the end.
owner       An offset from the start of the section giving the owner of this entry (its "parent" in the tree). It's set to AS_NULL_OFF if the entry doesn't have an owner (it's at the "root" of the address space tree).
name        An offset from the start of the strings section of the system page giving the string name of this entry.
attr        Contains several bits affecting the address range (see below).
priority    Indicates the speed of the memory in the address range. Lower numbers mean slower memory. The macro AS_PRIORITY_DEFAULT is defined to use a default value for this field (currently defined as 100).
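
Because end holds the last byte of the range rather than one past it, length and containment calculations need a little care. The following is a minimal sketch, not library code: struct as_range and both helper functions are hypothetical names that mirror only the start and end members described above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the start/end pair of an asinfo_entry. */
struct as_range {
    uint64_t start;   /* first physical address of the range     */
    uint64_t end;     /* last physical address (inclusive)       */
};

/* Number of bytes covered. The +1 is needed because end is inclusive.
 * (A range spanning the whole 64-bit space would wrap to 0.) */
uint64_t as_range_len(const struct as_range *r)
{
    assert(r->end >= r->start);
    return r->end - r->start + 1;
}

/* Nonzero if addr lies inside the range, including its last byte. */
int as_range_contains(const struct as_range *r, uint64_t addr)
{
    return addr >= r->start && addr <= r->end;
}
```

For a 1 MB range starting at 0x100000, end would be 0x1fffff, and as_range_len() would report 0x100000 bytes.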

Note: The alloc_checker isn't currently used. When implemented, it will let you provide finer-grain control over how the system allocates memory (e.g. making sure that ISA memory used for DMA doesn't cross 64 KB boundaries).

The attr field

The attr field can have the following bits:

#define AS_ATTR_READABLE 0x0001
Address range is readable.
#define AS_ATTR_WRITABLE 0x0002
Address range is writable.
#define AS_ATTR_CACHABLE 0x0004
Address range can be cached (this bit should be off if you're using device memory).
#define AS_ATTR_KIDS 0x0010
Indicates that there are other entries that use this one as their owner. Note that the library turns on this bit automatically; you shouldn't specify it when creating the section.
#define AS_ATTR_CONTINUED 0x0020
Indicates that there are multiple entries being used to describe one "logical" address range. This bit will be on in all but the last one. Note that the library turns on this bit and uses it internally; you shouldn't specify it when creating the section.
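
As an illustration of how these bits combine, here is a small sketch. The bit values are the ones listed above; the three helper functions are hypothetical and aren't part of the startup library.

```c
#include <assert.h>
#include <stdint.h>

/* Bit values as listed above. */
#define AS_ATTR_READABLE   0x0001
#define AS_ATTR_WRITABLE   0x0002
#define AS_ATTR_CACHABLE   0x0004
#define AS_ATTR_KIDS       0x0010
#define AS_ATTR_CONTINUED  0x0020

/* Attributes a caller might choose for ordinary RAM. */
uint16_t attr_for_ram(void)
{
    return AS_ATTR_READABLE | AS_ATTR_WRITABLE | AS_ATTR_CACHABLE;
}

/* Device memory: same access rights, but with caching left off. */
uint16_t attr_for_device(void)
{
    return AS_ATTR_READABLE | AS_ATTR_WRITABLE;
}

/* Strip the bits that the library manages by itself, leaving only
 * the ones a caller is expected to specify. */
uint16_t attr_caller_bits(uint16_t attr)
{
    return attr & (uint16_t)~(AS_ATTR_KIDS | AS_ATTR_CONTINUED);
}
```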

Address space trees

The asinfo section contains trees describing address spaces (where RAM, ROM, flash, etc. are located).

The general hierarchy for address spaces is:

    /memory/memclass/...
    /io/memclass/...
    /memory/io/memclass/...

The memory or io component indicates whether this is describing something in the memory or I/O address space (the third form is used on a machine without separate in/out instructions and where everything is memory-mapped).

The memclass is something like: ram, rom, flash, etc. Below that would be further classifications, allowing the process manager to provide typed memory support.


syspage_entry hwinfo

The hwinfo area contains information about the hardware platform (type of bus, devices, IRQs, etc). This is filled in by the startup library's init_hwinfo() function.

This is one of the more elaborate sections of the Neutrino system page. The hwinfo section doesn't consist of a single structure or an array of the same type. Instead, it consists of a sequence of symbolically "tagged" structures that as a whole describe the hardware installed on the board. The following types and constants are all defined in the <hw/sysinfo.h> file.

Note: The hwinfo section doesn't have to describe all the hardware. For instance, the startup program doesn't have to do PCI queries to discover what's been plugged into any slots if it doesn't want to. It's up to you as the startup implementor to decide how full to make the hwinfo description. As a rule, if a component is hardwired on your board, consider putting it into hwinfo.


Tags

Each structure (or tag) in the section starts the same way:

struct hwi_prefix {
    uint16_t        size;
    uint16_t        name;
};

The size field gives the size, in 4-byte quantities, of the structure (including the hwi_prefix).

The name field is an offset into the strings section of the system page, giving a zero-terminated string name for the structure. It might seem wasteful to use an ASCII string rather than an enumerated type to identify the structure, but it actually isn't. The system page is typically allocated in 4 KB granularity, so the extra storage required by the strings doesn't cost anything. On the upside, people can add new structures to the section without requiring QNX Software Systems to act as a central repository for handing out enumerated type values. When processing the section, code should ignore any tag that it doesn't recognize (using the size field to skip over it).
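
The skipping rule can be sketched as follows. This is an illustration over a mock in-memory section, not the library's own code: struct tag_prefix copies the hwi_prefix layout shown above, and count_tags(), its parameters, and the way the section and string table are passed in are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as the hwi_prefix shown above. */
struct tag_prefix {
    uint16_t size;   /* tag size in 4-byte units, prefix included  */
    uint16_t name;   /* offset of the tag name in the string table */
};

/*
 * Count the tags called `wanted` in a section of `nwords` 4-byte words.
 * A tag with any other name is stepped over using its size field.
 */
unsigned count_tags(const uint32_t *section, size_t nwords,
                    const char *strings, const char *wanted)
{
    unsigned found = 0;
    size_t   pos = 0;

    while (pos < nwords) {
        struct tag_prefix p;

        memcpy(&p, &section[pos], sizeof p);
        if (p.size == 0 || p.size > nwords - pos)
            break;                      /* malformed tag; stop walking */
        if (strcmp(strings + p.name, wanted) == 0)
            found++;
        pos += p.size;                  /* move to the next tag */
    }
    return found;
}
```

Because unknown names are simply skipped, a section that contains tags added by someone else can still be processed safely.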


Items

Each piece of hardware is described by a sequence of tags. This conglomeration of tags is known as an item. Each item describes one piece of hardware. The first tag in each item always starts out with the following structure (note that the first thing in it is a hwi_prefix structure):

struct hwi_item {
    struct hwi_prefix   prefix;
    uint16_t            itemsize;
    uint16_t            itemname;
    uint16_t            owner;
    uint16_t            kids;
};

The itemsize field gives the distance, in 4-byte quantities, until the start of the next item tag.

The itemname field gives an offset into the strings section of the system page for the name of the item being described. Note that this differs from the prefix.name field, which identifies the type of the structure that the hwi_item is embedded in.

The owner field gives the offset, in bytes, from the start of the hwinfo section to the item that this item is owned by. This field allows groups of items to be organized in a tree structure, similar to a filesystem directory hierarchy. We'll see how this is used later. If the item is at the root of a tree of ownership, the owner field is set to HWI_NULL_OFF.

The kids field indicates how many other items call this one "daddy."

Note: The code currently requires that the tag name of any item structure must start with an uppercase letter; nonitem tags have to start with a lowercase letter.

Device trees

The hwinfo section contains trees describing the various hardware devices on the board.

The general hierarchy for devices is:

    /hw/bus/devclass/device

where:

hw
the root of the hardware tree.
bus
the bus the hardware is on (pci, eisa, etc.).
devclass
the general class of the device (serial, rtc, etc.).
device
the actual chip implementing the device (8250, mc146818, etc.).

Building the section

Two basic calls in the startup library are used to add things to the hwinfo section:

void *hwi_alloc_tag(const char *name, unsigned size, unsigned align);

This call allocates a tag of size size with the tag name of name. If the structure contains any 64-bit integer fields within it, the align field should be set to 8; otherwise, it should be 4. The function returns a pointer to memory that can be filled in as appropriate. Note that the hwi_prefix fields are automatically filled in by the hwi_alloc_tag() function.

void *hwi_alloc_item(const char *name, unsigned size, 
                     unsigned align, const char *itemname,
                     unsigned owner);

This call allocates an item structure. The first three parameters are the same as in the hwi_alloc_tag() function.

The itemname and owner parameters are used to set the itemname and owner fields of the hwi_item structure. All hwi_alloc_tag() calls done after a hwi_alloc_item() call are assumed to belong to that item and the itemsize field is adjusted appropriately.

Here are the general steps for building an item:

  1. Call hwi_alloc_item() to build a top-level item (one with the owner field to be HWI_NULL_OFF).
  2. Add whatever other tag structures you want in the item.
  3. Use hwi_alloc_item() to start a new item. This item could be either another top-level one or a child of the first.

Note that you can build the items in any order you wish, provided that the parent is built before the child.

When building a child item, you may not have saved the offset of its owner in a variable, or you may know only the owner's item name. To find the correct value for the owner parameter, you can use the following function (which is defined in the C library, since it's useful for people processing the section):

unsigned hwi_find_item(unsigned start, ...);

The start parameter indicates where to start the search for the given item. For an initial call, it should be set to HWI_NULL_OFF. If the item found isn't the one wanted, then the return value from the first hwi_find_item() is used as the start parameter of the second call. The search will pick up where it left off. This can be repeated as many times as required (the return value from the second call going into the start parameter of the third, etc). The item being searched is identified by a sequence of char * parameters following start. The sequence is terminated by a NULL. The last string before the NULL is the bottom-level itemname being searched for, the string in front of that is the name of the item that owns the bottom-level item, etc.

For example, this call finds the first occurrence of an item called "foobar":

item_off = hwi_find_item(HWI_NULL_OFF, "foobar", NULL);

The following call finds the first occurrence of an item called "foobar" that's owned by "sam":

item_off = hwi_find_item(HWI_NULL_OFF, "sam", "foobar", NULL);

If the requested item can't be found, HWI_NULL_OFF is returned.
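
To make the owner-path matching and the resumable start parameter concrete, here is a toy model. It is a sketch under stated assumptions, not the library implementation: items live in a plain array, owners are array indexes rather than byte offsets, and struct toy_item, NO_ITEM (standing in for HWI_NULL_OFF), and toy_find_item() are all hypothetical names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

#define NO_ITEM  ((unsigned)-1)   /* plays the role of HWI_NULL_OFF */
#define MAX_PATH 8

struct toy_item {
    const char *name;
    unsigned    owner;            /* index of the owner, or NO_ITEM */
};

/* Does item i match names[n-1], its owner names[n-2], and so on? */
int chain_matches(const struct toy_item *items, unsigned i,
                  const char *const *names, int n)
{
    while (n > 0) {
        if (i == NO_ITEM || strcmp(items[i].name, names[n - 1]) != 0)
            return 0;
        i = items[i].owner;
        n--;
    }
    return 1;
}

/* Names are listed from the outermost owner down to the wanted item
 * and terminated by a null pointer. The search resumes after `start`. */
unsigned toy_find_item(const struct toy_item *items, unsigned count,
                       unsigned start, ...)
{
    const char *names[MAX_PATH];
    const char *s;
    int         n = 0;
    unsigned    i;
    va_list     ap;

    va_start(ap, start);
    while (n < MAX_PATH && (s = va_arg(ap, char *)) != NULL)
        names[n++] = s;
    va_end(ap);

    for (i = (start == NO_ITEM) ? 0 : start + 1; i < count; i++)
        if (chain_matches(items, i, names, n))
            return i;
    return NO_ITEM;
}
```

Passing the previous return value back in as start continues the search from where it left off, which mirrors the calling pattern described above.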

Other functions

The following functions are in the C library for use in processing the hwinfo section:

unsigned hwi_tag2off(void *);
Given a pointer to the start of a tag, return the offset, in bytes, from the start of the hwinfo section.
void *hwi_off2tag(unsigned);
Given an offset, in bytes, from the start of the hwinfo section, return a pointer to the start of the tag.
unsigned hwi_find_tag(unsigned start, int curr_item, const char *tagname);
Find the tag named tagname. The start parameter works the same as the one in hwi_find_item(). If curr_item is nonzero, the search stops at the end of the current item (whatever item the start parameter points into). If curr_item is zero, the search continues until the end of the section. If the tag isn't found, HWI_NULL_OFF is returned.


Defaults

Before main() is invoked in the startup program, the library adds some initial entries to serve as a basis for later items.

HWI_TAG_INFO() is a macro defined in the <startup.h> header and expands out to the three name, size, align parameters for hwi_alloc_tag() and hwi_alloc_item() based on some clever macro names.

hwi_default() {
    hwi_tag     *tag;
    unsigned    loc;

    /* Root items for the address-space and hardware trees. */
    hwi_alloc_item(HWI_TAG_INFO(group), HWI_ITEM_ROOT_AS,
                   HWI_NULL_OFF);
    tag = hwi_alloc_item(HWI_TAG_INFO(group), HWI_ITEM_ROOT_HW,
                         HWI_NULL_OFF);

    /* Placeholder bus, owned by the hardware root. */
    hwi_alloc_item(HWI_TAG_INFO(bus), HWI_ITEM_BUS_UNKNOWN,
                   hwi_tag2off(tag));

    loc = hwi_find_item(HWI_NULL_OFF, HWI_ITEM_ROOT_AS, NULL);

    tag = hwi_alloc_item(HWI_TAG_INFO(addrspace),
                         HWI_ITEM_AS_MEMORY, loc);
    tag->addrspace.base = 0;
    tag->addrspace.len  = (uint64_t)1 << 32;
    #ifndef __X86__
        /* No separate I/O space: the io item goes under memory. */
        loc = hwi_tag2off(tag);
    #endif
    tag = hwi_alloc_item(HWI_TAG_INFO(addrspace), HWI_ITEM_AS_IO,
                         loc);
    tag->addrspace.base = 0;
    #ifdef __X86__
        tag->addrspace.len  = (uint64_t)1 << 16;
    #else
        tag->addrspace.len  = (uint64_t)1 << 32;
    #endif
}

Predefined items and tags

These are the items defined in the hw/sysinfo.h file. Note that you're free to create additional items -- these are just what we needed for our own purposes. You'll notice that all things are defined as HWI_TAG_NAME_*, HWI_TAG_ALIGN_*, and struct hwi_*. The names are chosen that way so that the HWI_TAG_INFO() macro in startup works properly.

Group item

#define HWI_TAG_NAME_group  "Group"
#define HWI_TAG_ALIGN_group (sizeof(uint32_t))
struct hwi_group {
    struct hwi_item     item;
};

The Group item is used when you wish to group a number of items together. It serves the same purpose as a directory in a filesystem. For example, the devclass level of the /hw tree would use a Group item.

Bus item

#define HWI_TAG_NAME_bus    "Bus"
#define HWI_TAG_ALIGN_bus   (sizeof(uint32_t))
struct hwi_bus {
    struct hwi_item     item;
};

The Bus item tells the system about a hardware bus. Item names can be (but are not limited to):

#define HWI_ITEM_BUS_PCI        "pci"
#define HWI_ITEM_BUS_ISA        "isa"
#define HWI_ITEM_BUS_EISA       "eisa"
#define HWI_ITEM_BUS_MCA        "mca"
#define HWI_ITEM_BUS_PCMCIA     "pcmcia"
#define HWI_ITEM_BUS_UNKNOWN    "unknown"

Device item

#define HWI_TAG_NAME_device     "Device"
#define HWI_TAG_ALIGN_device    (sizeof(uint32_t))
struct hwi_device {
    struct hwi_item     item;
    uint32_t            pnpid;
};

The Device item tells the system about an individual device (the device level from the "Device trees" section; the devclass level is done with a "Group" tag). The pnpid field is the Plug and Play device identifier assigned by Microsoft.

location tag

#define HWI_TAG_NAME_location   "location"
#define HWI_TAG_ALIGN_location  (sizeof(uint64_t))
struct hwi_location {
    struct hwi_prefix   prefix;
    uint32_t            len;
    uint64_t            base;
    uint16_t            regshift;
    uint16_t            addrspace;
};

Note that location is a simple tag, not an item. It gives the location of the hardware device's registers, whether in a separate I/O space or memory-mapped. There may be more than one of these tags in an item description if there's more than one grouping of registers.

The base field gives the physical address of the start of the registers. The len field gives the length, in bytes, of the registers. The regshift field tells how much each register access is shifted by. If a register is documented at a given offset of a device, then the driver will actually access offset * 2^regshift (that is, the offset shifted left by regshift bits) to get to that register.
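
The address arithmetic implied by base and regshift can be sketched like this. The function reg_addr() is a hypothetical helper written for illustration; it isn't a library call.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Physical address of the register that the device's data sheet
 * documents at `offset`, given the base and regshift values from a
 * location tag.
 */
uint64_t reg_addr(uint64_t base, uint32_t offset, uint16_t regshift)
{
    assert(regshift < 32);
    return base + ((uint64_t)offset << regshift);
}
```

With regshift set to 0, registers are packed byte by byte; with regshift set to 2, each documented register sits on its own 4-byte boundary, so register 3 is found 12 bytes past base.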

The addrspace field is an offset, in bytes, from the start of the asinfo section. It should identify either the memory or io address space item to tell whether the device registers are memory-mapped.

irq tag

#define HWI_TAG_NAME_irq        "irq"
#define HWI_TAG_ALIGN_irq       (sizeof(uint32_t))
struct hwi_irq {
    struct hwi_prefix   prefix;
    uint32_t            vector;
};

Note that this is a simple tag, not an item. The vector field gives the logical interrupt vector number of the device.

diskgeometry tag

#define HWI_TAG_NAME_diskgeometry   "diskgeometry"
#define HWI_TAG_ALIGN_diskgeometry  (sizeof(uint32_t))
struct hwi_diskgeometry {
    struct hwi_prefix   prefix;
    uint8_t             disknumber;
    uint8_t             sectorsize;   /* as a power of two */
    uint16_t            heads;
    uint16_t            cyls;
    uint16_t            sectors;
    uint32_t            nblocks;
};

Note that this is a simple tag, not an item. This is an x86-only mechanism used to transfer the information from the BIOS about disk geometry.

pad tag

#define HWI_TAG_NAME_pad        "pad"
#define HWI_TAG_ALIGN_pad       (sizeof(uint32_t))
struct hwi_pad {
    struct hwi_prefix   prefix;
};

Note that this is a simple tag, not an item. This tag is used when padding must be inserted to meet the alignment constraints for the subsequent tag.


syspage_entry cpuinfo

The cpuinfo area contains information about each CPU chip in the system, such as the CPU type, speed, capabilities, performance, and cache sizes. There are as many elements in the cpuinfo structure as the num_cpu member indicates (e.g. on a dual-processor system, there will be two cpuinfo entries).

This table is filled automatically by the library function init_cpuinfo().

Member        Description
cpu           This is a number that represents the type of CPU. Note that this number will vary with the CPU architecture. For example, on the x86 processor family, this number will be the processor chip number (e.g. 386, 586). On MIPS and PowerPC, this is filled with the contents of the version registers.
speed         Contains the MHz rating of the processor. For example, on a 300 MHz MIPS R4000, this number would be 300.
flags         See below.
name          Contains an index into the strings member in the system page structure. The character string at the specified index contains an ASCII, NULL-terminated machine name (e.g. on a MIPS R4000 it will be the string "R4000").
ins_cache     Contains an index into the cacheattr array, described below. This index points to the first definition in a list for the instruction cache.
data_cache    Contains an index into the cacheattr array, described below. This index points to the first definition in a list for the data cache.

The flags member contains a bitmapped indication of the capabilities of the CPU chip. Note that the prefix for the manifest constant indicates which CPU family it applies to (e.g. PPC_ indicates this constant is for use by the PowerPC family of processors). In the case of no prefix, it indicates that it's generic to any CPU.

Here are the constants and their defined meanings:

This constant:                     Means that the CPU has or supports:
CPU_FLAG_FPU                       Floating Point Unit (FPU).
CPU_FLAG_MMU                       Memory Management Unit (MMU), and the MMU is enabled (i.e. the CPU is currently in virtual addressing mode).
X86_CPU_CPUID                      CPUID instruction.
X86_CPU_RDTSC                      RDTSC instruction.
X86_CPU_INVLPG                     INVLPG instruction.
X86_CPU_WP                         WP bit in the CR0 register.
X86_CPU_BSWAP                      BSWAP instruction.
X86_CPU_MMX                        MMX instructions.
X86_CPU_CMOV                       CMOVxx instructions.
X86_CPU_PSE                        Page size extensions.
X86_CPU_PGE                        TLB (Translation Lookaside Buffer) global mappings.
X86_CPU_MTRR                       MTRR (Memory Type Range Register) registers.
X86_CPU_SIMD                       SIMD instructions.
X86_CPU_PAE                        Extended addressing.
PPC_CPU_EAR                        EAR (External Address Register) register.
PPC_CPU_HW_HT                      Hardware hash table.
PPC_CPU_HW_POW                     Power management.
PPC_CPU_FPREGS                     Floating point registers.
PPC_CPU_SW_HT                      Software hash table.
PPC_CPU_ALTIVEC                    AltiVec extensions.
PPC_CPU_XAEN                       Extended addressing.
PPC_CPU_TLB_SHADOW                 Shadow registers in TLB handler.
MIPS_CPU_FLAG_MAX_PGSIZE_MASK      Maximum number of masks.
MIPS_CPU_FLAGS_MAX_PGSIZE_SHIFT    Maximum number of shifts.
MIPS_CPU_FLAG_64BIT                64-bit registers.
MIPS_CPU_FLAG_128BIT               128-bit registers.
MIPS_CPU_FLAG_NO_WIRED             No wired register.
MIPS_CPU_FLAG_NO_COUNT             No count register.

syspage_entry cacheattr

The cacheattr area contains information about the configuration of the on-chip and off-chip cache system. It also contains the control() callout used for cache control operations. This entry is filled by the library routines init_cpuinfo() and init_cacheattr().

Note that init_cpuinfo() deals with caches implemented on the CPU itself; init_cacheattr() handles board-level caches.

Each entry in the cacheattr area consists of the following:

Member       Description
next         Index of the next lower-level entry.
line_size    Size of a cache line, in bytes.
num_lines    Number of cache lines.
flags        See below.
control      Callout supplied by startup code (see below).

The total number of bytes described by a particular cacheattr entry is defined by line_size * num_lines.

The flags parameter is a bitmapped variable consisting of the following:

This constant:          Means that the cache:
CACHE_FLAG_INSTR        Holds instructions.
CACHE_FLAG_DATA         Holds data.
CACHE_FLAG_UNIFIED      Holds both instructions and data.
CACHE_FLAG_SHARED       Is shared between multiple processors in an SMP system.
CACHE_FLAG_SNOOPED      Implements a bus-snooping protocol.
CACHE_FLAG_VIRTUAL      Is virtually tagged.
CACHE_FLAG_WRITEBACK    Does write-back, not write-through.
CACHE_FLAG_CTRL_PHYS    Takes physical addresses via its control() function.
CACHE_FLAG_SUBSET       Obeys the subset property: anything held in a given cache level is also held in all the lower-level caches. This affects flushing, because the control() function can effectively ignore a "subsetted" level, knowing that the operation will be performed on the lower-level cache.
CACHE_FLAG_NONISA       Doesn't obey ISA cache instructions.

The cacheattr entries are organized in a linked list, with the next member indicating the index of the next lower cache entry. This was done because some architectures will have separate instruction and data caches at one level, but a unified cache at another level. This linking allows the system page to efficiently contain the information. Note that the entry into the cacheattr tables is done through the cpuinfo's ins_cache and data_cache. Since the cpuinfo is an array indexed by the CPU number for SMP systems, it's possible to construct a description of caches for CPUs with different cache architectures. Here's a diagram showing a two-processor system, with separate L1 instruction and data caches as well as a unified L2 cache:

Diagram showing two-processor system

Two-processor system with separate L1 instruction and data caches.

Given the above memory layout, here's what the cpuinfo and cacheattr fields will look like:

cpuinfo[0].ins_cache  = 0;
cpuinfo[0].data_cache = 1;

cpuinfo[1].ins_cache  = 0;
cpuinfo[1].data_cache = 1;

cacheattr[0].next = 2;
cacheattr[0].line_size = linesize;
cacheattr[0].num_lines = numlines;
cacheattr[0].flags = CACHE_FLAG_INSTR;

cacheattr[1].next = 2;
cacheattr[1].line_size = linesize;
cacheattr[1].num_lines = numlines;
cacheattr[1].flags = CACHE_FLAG_DATA;

cacheattr[2].next = CACHE_LIST_END;
cacheattr[2].line_size = linesize;
cacheattr[2].num_lines = numlines;
cacheattr[2].flags = CACHE_FLAG_UNIFIED;

Note that the actual values chosen for linesize and numlines will, of course, depend on the actual configuration of the caches present on the system.
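
Following the next links from a CPU's ins_cache or data_cache index visits every cache level that CPU uses. Here's a minimal sketch of such a walk that adds up line_size * num_lines along the way. It uses a mock table: struct toy_cacheattr, TOY_LIST_END (standing in for CACHE_LIST_END, whose real value isn't given here), and cache_chain_bytes() are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_LIST_END 0xffffu   /* stand-in for CACHE_LIST_END */

/* Only the members needed for the walk are modelled. */
struct toy_cacheattr {
    unsigned next;        /* index of the next lower-level entry */
    unsigned line_size;   /* bytes per cache line                */
    unsigned num_lines;   /* number of cache lines               */
};

/* Total bytes in all cache levels reachable from entry `first`. */
uint64_t cache_chain_bytes(const struct toy_cacheattr *tbl, unsigned first)
{
    uint64_t total = 0;
    unsigned i;

    for (i = first; i != TOY_LIST_END; i = tbl[i].next)
        total += (uint64_t)tbl[i].line_size * tbl[i].num_lines;
    return total;
}
```

Starting the walk at entry 0 or entry 1 of the layout above reaches the shared entry 2 in both cases, which is why the unified L2 cache needs to be described only once.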

syspage_entry qtime

The qtime area contains information about the timebase present on the system, as well as other time-related information. The library routine init_qtime() fills these data structures.

Member             Description
intr               Contains the interrupt vector that the clock chip uses to interrupt the processor.
boot_time          Seconds since Jan 1 1970 00:00:00 GMT when the system was booted.
nsec               This 64-bit field holds the number of nanoseconds since the system was booted.
nsec_tod_adjust    When added to the nsec field, this field gives the number of nanoseconds from the start of the epoch (1970).
nsec_inc           Number of nanoseconds deemed to have elapsed each time the clock triggers an interrupt.
adjust             Set to zero at startup; contains any current timebase adjustment runtime parameters (as specified by the kernel call ClockAdjust()).
timer_rate         Used in conjunction with timer_scale (see below).
timer_scale        See below.
timer_load         Timer chip divisor value. The startup program leaves this zero. The kernel sets it based on the last ClockPeriod() and timer_rate/timer_scale values to a number, which is then put into the timer chip by the timer_load/timer_reload kernel callouts.
cycles_per_sec     For ClockCycles().
epoch              Currently set to 1970, but not used.
flags              Indicates when timer hardware is specific to CPU0.

Note: The nsec field always increases monotonically and is never affected by setting the current time of day via ClockTime() or ClockAdjust(). Since both nsec and nsec_tod_adjust are modified in the kernel's timer interrupt handler and are too big to load in an atomic read operation, to inspect them you must either:
  • disable interrupts
    or
  • get the value(s) twice and make sure that they haven't changed between the first and second read.
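
The second approach (read twice and compare) might look like the following sketch. The function name read_stable_u64() is hypothetical, and the pointer is assumed to refer to a field such as nsec that an interrupt handler may update between the two reads.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Read a 64-bit value that an interrupt handler may be updating.
 * The value is sampled twice and the pair is accepted only when both
 * samples agree; otherwise the read is retried.
 */
uint64_t read_stable_u64(const volatile uint64_t *p)
{
    uint64_t first, second;

    do {
        first  = *p;
        second = *p;
    } while (first != second);
    return first;
}
```

The volatile qualifier keeps the compiler from folding the two loads into one, which would defeat the comparison.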

The parameters timer_rate and timer_scale relate to the external counter chip's input frequency, in Hz, as follows:

    input frequency = 1 / (timer_rate * 10^timer_scale)

Yes, this does imply that timer_scale is a negative number. The goal when expressing the relationship is to make timer_rate as large as possible in order to maximize the number of significant digits available during calculations.

For example, on an x86 PC with standard hardware, the values would be 838095345UL for the timer_rate and -15 for the timer_scale. This indicates that the timer value is specified in femtoseconds (the -15 means "ten to the negative fifteen"); the actual value is 838,095,345 femtoseconds (approximately 838 nanoseconds).
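
As a quick check of that example, the following sketch converts a timer_rate/timer_scale pair into a period and an input frequency. It assumes the reading given above, namely that timer_rate * 10^timer_scale is the length of one input clock in seconds; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* 10 raised to a (possibly negative) integer power. */
double pow10i(int e)
{
    double r = 1.0;

    for (; e > 0; e--) r *= 10.0;
    for (; e < 0; e++) r /= 10.0;
    return r;
}

/* Length of one input clock, in seconds. */
double timer_period_sec(uint32_t timer_rate, int timer_scale)
{
    return (double)timer_rate * pow10i(timer_scale);
}

/* Input frequency of the counter chip, in Hz. */
double timer_input_hz(uint32_t timer_rate, int timer_scale)
{
    assert(timer_rate != 0);
    return 1.0 / timer_period_sec(timer_rate, timer_scale);
}
```

For the PC values (838095345 and -15), timer_input_hz() comes out at roughly 1.193 MHz.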

If you need to change the number of nsecs that the OS adds to the time when a tick fires, you can manually adjust the nsec_inc value in SYSPAGE_ENTRY (qtime).

The idea is to adjust for differences between the clock interval and the real expired time. The closer they become the less need there is for ClockAdjust() calls.

What you'll need to do is find out the physical address of the syspage. If it's already in nsec_inc you won't need to modify startup. If not, modify startup to put it there. Then use the mmap_device_memory() function to make the physical address of the syspage writable. That is, get the offset to the read-only page, and map a new block of memory to the address.

You could give ClockAdjust() a value of 0 for the number of ticks, to indicate that you want to make this adjustment "permanent". If you don't want to do that, you can give the ClockAdjust() function the maximum possible value for tick_count.

When you call and modify nsec_inc, you overwrite the ClockPeriod() function. The timer_rate and timer_scale fields are used as the input frequency to the clock hardware. The code uses these fields and the requested tick rate to calculate the number of input frequency clocks to count before generating an interrupt. The number of input frequency clocks that are counted, combined with timer_rate and timer_scale provides the nsec_inc value. For example:

timer_load = requested_ticksize / (timer_rate * 10^timer_scale)
nsec_inc   = timer_load * (timer_rate * 10^timer_scale)

The nsec_inc value is used to adjust the time of day when the clock interrupt goes off.

The changed value in ClockPeriod() is used to determine the new ticksize.
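The two-step calculation above can be worked through in integer arithmetic. The following C sketch is an illustration only: it assumes timer_scale is -15 (so timer_rate is the input clock period in femtoseconds) and truncating division; the kernel's actual rounding may differ. The names tick_calc and calc_tick() are hypothetical.

```c
#include <stdint.h>

struct tick_calc {
    uint64_t timer_load;  /* input clocks counted per tick */
    uint64_t nsec_inc;    /* nanoseconds actually added per tick */
};

/* requested_ns: requested tick size in nanoseconds.
 * period_fs:    input clock period in femtoseconds (timer_rate when
 *               timer_scale is -15; an assumption of this sketch). */
static struct tick_calc calc_tick(uint64_t requested_ns, uint64_t period_fs)
{
    const uint64_t FS_PER_NS = 1000000ULL;
    struct tick_calc t;

    /* Whole number of input clocks that fit in the requested tick. */
    t.timer_load = (requested_ns * FS_PER_NS) / period_fs;

    /* Time those clocks really take, converted back to nanoseconds. */
    t.nsec_inc = (t.timer_load * period_fs) / FS_PER_NS;
    return t;
}
```

For a requested 1 ms tick and the x86 value of 838095345 fs, this gives a timer_load of 1193 and an nsec_inc of 999847, i.e., the real tick is slightly shorter than the one requested because only a whole number of input clocks can be counted.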


The callout area is where various callouts get bound. These callouts allow you to "hook into" the kernel and gain control when a given event occurs. The callouts operate in an environment similar to that of an interrupt service routine -- you have a very limited stack, and you can't invoke any kernel calls (e.g. mutex operations). On standard hardware platforms (MIPS and PowerPC eval boards, x86-PC compatibles), you won't have to supply any functionality -- it's already provided by the startup code we supply.

Member Description
reboot Used by the kernel to reset the system.
power Provided for power management.
timer_load, timer_reload, timer_value The kernel uses these timer_* callouts to deal with the hardware timer chip.
debug Used by the kernel when it wishes to interact with a serial port, console, or other device (e.g. when it needs to print out some internal debugging information or when there's a fault).

For details about the characteristics of the callouts, please see the sections "Callout information" and "Writing your own kernel callout" later in this chapter.


For internal use.


The typed_strings area consists of several entries, each of which is a number and a string. The number is 4 bytes and the string is NULL-terminated as per C. The number in the entry corresponds to a particular constant from the system include file <confname.h> (see the C function confstr() for more information).

Generally, you wouldn't access this member yourself; the various init_*() library functions put things into the typed strings literal pool themselves. But if you need to add something, you can use the function call add_typed_string() from the library.


The strings member is a literal pool used for nontyped strings. Users of these strings would typically specify an index into strings (for example, cpuinfo's name member).

Generally, you wouldn't access this member yourself; the various init_*() library functions put things into the literal pool themselves. But if you need to add something, you can use the function call add_string() from the library.


The intrinfo area is used to store information about the interrupt system. It also contains the callouts used to manipulate the interrupt controller hardware.

This area is automatically filled in by the library routine init_intrinfo().

If you need to override some of the defaults provided by init_intrinfo(), or if the function isn't appropriate for your custom environment, you can call add_interrupt_array() directly with a table of the following format:

Note: In all probability, you will need to modify this for non-x86 platforms.

Member Description
vector_base The base number of the logical interrupt numbers that programs will use (e.g. the interrupt vector passed to InterruptAttach()).
num_vectors The number of vectors starting at vector_base described by this entry.
cascade_vector If this interrupt entry describes a set of interrupts that are cascaded into another interrupt controller, then this variable contains the logical interrupt number that this controller cascades into.
cpu_intr_base The association between this set of interrupts and the CPU's view of the source of the interrupt (see below).
cpu_intr_stride The spacing between interrupt vector entries for interrupt systems that do autovectoring. On an x86 platform with the standard 8259 controller setup, this is the value 1, meaning that the interrupt vector corresponding to the hardware interrupt sources is offset by 1 (e.g. interrupt vector 0 goes to interrupt 0x30, interrupt vector 1 goes to interrupt 0x31, and so on). On non-x86 systems it's usually 0, because those interrupt systems generally don't do autovectoring. A value of 0 indicates that it's not autovectored.
flags Used by the startup code when generating the kernel's interrupt service routine entry points. See below under INTR_FLAG_* and PPC_INTR_FLAG_*.
id A code snippet that gets copied into the kernel's interrupt service routine used to identify the source of the interrupt, in case of multiple hardware events being able to trigger one CPU-visible interrupt. Further modified by the INTR_GENFLAG_* flags, defined below.
eoi A code snippet that gets copied into the kernel's interrupt service routine that provides the EOI (End Of Interrupt) functionality. This code snippet is responsible for telling the controller that the interrupt is done and for unmasking the interrupt level. For CPU fault-as-an-interrupt handling, eoi identifies the cause of the fault.
mask An outcall to mask an interrupt source at the hardware controller level. The numbers passed to this function are the interrupt vector numbers (from 0 to num_vectors - 1).
unmask An outcall to unmask an interrupt source at the hardware controller level. Same vector numbers as mask, above.
config Provides configuration information on individual interrupt levels. Passed the system page pointer (1st argument), a pointer to this interrupt info entry (2nd argument), and the zero-based interrupt level. Returns a bitmask; see INTR_CONFIG_FLAG* below.
patch_data Provides information about patched data. The patched data is passed to the patcher() routine that gets called once for each callout in a startup_intrinfo structure.

Note: Each group of callouts (i.e. id, eoi, mask, unmask) for each level of interrupt controller deals with a set of interrupt vectors that start at 0 (zero-based). Set the callouts for each interrupt controller level accordingly.

Interrupt vector numbers are passed without offset to the callout routines. The association between the zero-based interrupt vectors the callouts use and the system-wide interrupt vectors is configured within the startup_intrinfo structures. These structures are found in the init_intrinfo() routine of startup.
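The relationship between the system-wide (logical) vectors, the zero-based vectors that the callouts see, and the CPU's view can be sketched in C. This is an illustration inferred from the member descriptions and the x86 example above, not the startup library's code; the structure intr_range and both function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical subset of the fields described in the table above. */
struct intr_range {
    unsigned vector_base;      /* first logical vector of this entry */
    unsigned num_vectors;      /* number of vectors in this entry */
    unsigned cpu_intr_base;    /* CPU's view of the first vector */
    unsigned cpu_intr_stride;  /* spacing between CPU entries; 0 = none */
};

/* Convert a logical vector to the zero-based vector the callouts use.
 * Returns 1 and stores the result in *local if the vector belongs to
 * this entry; returns 0 otherwise. */
static int to_local_vector(const struct intr_range *r, unsigned vector,
                           unsigned *local)
{
    if (vector < r->vector_base ||
        vector - r->vector_base >= r->num_vectors)
        return 0;
    *local = vector - r->vector_base;
    return 1;
}

/* CPU-side interrupt number for a zero-based vector. With a stride of 0,
 * every vector of the entry arrives at cpu_intr_base. */
static unsigned cpu_interrupt_for(const struct intr_range *r, unsigned local)
{
    return r->cpu_intr_base + local * r->cpu_intr_stride;
}
```

For an x86 entry with a cpu_intr_base of 0x30 and a stride of 1, zero-based vector 1 maps to interrupt 0x31, which matches the example given for cpu_intr_stride.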

The cpu_intr_base member

The interpretation of the cpu_intr_base member varies with the processor:

Processor Interpretation
x86 The IDT (Interrupt Descriptor Table) entry, typically 0x30.
PPC The offset from the beginning of the exception table where execution begins when an external interrupt occurs. A sample value is 0x0140, calculated by 0x0500 / 4.
PPC/BE Interrupts no longer start at fixed locations in low memory. Instead there's a set of IVOR (Interrupt Vector Offset Register) registers. Each exception class has a different IVOR. When you specify the interrupt layout to startup, you'll need to identify the particular IVOR register the processor will use when the interrupt occurs. For example, PPCBKE_SPR_IVOR4 is used for normal external interrupts; PPCBKE_SPR_IVOR10 is used for decrementer interrupts. See startup/boards/440rb/init_intrinfo.c for an example of what to do on bookE CPUs.
PPC/Non-BE --
MIPS The value in the "cause" register when an external interrupt occurs. A sample value is 0.
ARM This value should be 0, since all ARM interrupts are handled via the IRQ exception.
SH The offset from the beginning of the exception table where execution starts when an interrupt occurs. For example, for 7750, the value is 0x600.

The flags member

The flags member takes two sets of flags. The first set deals with the characteristics of the interrupts:

INTR_FLAG_NMI
Indicates that this is a NonMaskable Interrupt (NMI). An NMI is an interrupt which can't be disabled by clearing the CPU's interrupt enable flag, unlike most normal interrupts. NonMaskable interrupts are typically used to signal events that require immediate action, such as a parity error, a hardware failure, or imminent loss of power. The address of the NMI handler is stored in the BIOS's Interrupt Vector table at position 02H. For this reason an NMI is often referred to as INT 02H.

The code in the kernel needs to differentiate between normal interrupts and NMIs, because with an NMI the kernel needs to know that it can't protect (mask) the interrupt (hence the "N" in NonMaskable Interrupt). We strongly discourage the use of the NMI vector in x86 designs; we don't support it on any non-x86 platforms.

Note: Regular interrupts that are normally used and referred to by number are called maskable interrupts. Unlike nonmaskable interrupts, maskable interrupts can be masked, or ignored, to allow the processor to complete a task.

Indicates that an EOI to the primary interrupt controller is not required when handling a cascaded interrupt (e.g. it's done automatically). Only used if this entry describes a cascaded controller.
Indicates that one or more of the vectors described by this entry is not connected to a hardware interrupt source, but rather is generated as a result of a CPU fault (e.g. bus fault, parity error). Note that we strongly discourage designing your hardware this way. The implication is that a check needs to be inserted for an exception into the generated code stream; after the interrupt has been identified, an EOI needs to be sent to the controller. The EOI code burst has the additional responsibility of detecting what address caused the fault, retrieving the fault type, and then passing the fault on. The primary disadvantage of this approach is that it causes extra code to be inserted into the code path.
PPC_INTR_FLAG_400ALT
Similar to INTR_FLAG_NMI, this indicates to the code generator that a different kernel entry sequence is required. This is because the PPC400 series doesn't have an NMI, but rather has a critical interrupt that can be masked. This interrupt shows up differently from a "regular" external interrupt, so this flag indicates this fact to the kernel.
Same as PPC_INTR_FLAG_400ALT, where CI refers to critical interrupt.
Indicates that the exception table doesn't have the normal 256 bytes of memory space between this and the next vector.

The second set of flags deals with code generation:

INTR_GENFLAG_LOAD_SYSPAGE
Before the interrupt identification or EOI code sequence is generated, a piece of code needs to be inserted to fetch the system page pointer into a register so that it's usable within the identification code sequence.
Same as INTR_GENFLAG_LOAD_SYSPAGE, except that it loads a pointer to this structure.
Used only by EOI routines for hardware that doesn't automatically mask at the chip level. When the EOI routine is about to reenable interrupts, it should reenable only those interrupts that are actually enabled at the user level (e.g. managed by the functions InterruptMask() and InterruptUnmask()). When this flag is set, the existing interrupt mask is stored in a register for access by the EOI routine. A zero in the register indicates that the interrupt should be unmasked; a nonzero indicates it should remain masked.
Used by the interrupt ID code to cause a check to be made to see if the interrupt was due to a glitch or to a different controller. If this flag is set, the check is omitted -- you're indicating that there's no reason (other than the fact that the hardware actually did generate an interrupt) to be in the interrupt service routine. If this flag is not set, the check is made to verify that the suspected hardware really is the source of the interrupt.
Same as INTR_GENFLAG_LOAD_SYSPAGE, except that it loads a pointer to the number of the CPU this structure uses.

config return values

The config callout may return zero or more of the following flags:

INTR_CONFIG_FLAG_PREATTACH
Normally, an interrupt is masked off until a routine attaches to it via InterruptAttach() or InterruptAttachEvent(). If CPU fault indications are routed through to a hardware interrupt (not recommended!), the interrupt would, by default, be disabled. Setting this flag causes a "dummy" connection to be made to this source, causing this level to become unmasked.
Prevents user code from attaching to this interrupt level. Generally used with INTR_CONFIG_FLAG_PREATTACH, but could be used to prevent user code from attaching to any interrupt in general.
Identifies the vector that's used as the target of an inter-processor interrupt in an SMP system.

syspage_entry union un

The un union is where processor-specific system page information is kept. The purpose of the union is to serve as a demultiplexing point for the various CPU families. It is demultiplexed based on the value of the type member of the system page structure.

Member   Processor                               type
x86      The x86 family                          SYSPAGE_X86
ppc      PowerPC family                          SYSPAGE_PPC
mips     The MIPS family                         SYSPAGE_MIPS
arm      The ARM family                          SYSPAGE_ARM
sh       The Hitachi SH family of processors.    SYSPAGE_SH


This structure contains the x86-specific information. On a standard PC-compatible platform, the library routines (described later) fill these fields:

Contains info on how to manipulate the SMP control hardware; filled in by the library call init_smp().
Contains the Global Descriptor Table (GDT); filled in by the library.
Contains the Interrupt Descriptor Table (IDT); filled in by the library.
Contains pointers to the Page Directory Table(s); filled in by the library.
The virtual address corresponding to the physical address range 0 through 0xFFFFF inclusive (the bottom 1 megabyte).

un.x86.smpinfo (deprecated)

The members of this field are filled automatically by the function init_smp() within the startup library.

un.ppc (deprecated)

This structure contains the PowerPC-specific information. On a supported evaluation platform, the library routines (described later) fill these fields. On customized hardware, you'll have to supply the information.

Contains info on how to manipulate the SMP control hardware; filled in by the library call init_smp().
Kernel information, filled by the library.
Points at system exception table, filled by the library.


Contains information relevant to the kernel:

Allows us to specify an override for the CPU ID register so that the kernel can pretend it is a "known" CPU type. This is done because the kernel "knows" only about certain types of PPC CPUs; different variants require specialized support. When a new variant is manufactured, the kernel will not recognize it. By stuffing the pretend_cpu field with a CPU ID from a known CPU, the kernel will pretend that it's running on the known variant.
Template of what bits to have on in the MSR when creating a thread. Since the MSR changes among the variants in the PPC family, this allows you to specify some additional bits that the kernel doesn't necessarily know about.
Indicates what family the PPC CPU belongs to.
Identifies what address space bits are active.
Lets callouts know whether to turn off data translation to get at their hardware.


This structure contains the MIPS-specific information:

A shadow copy of the interrupt mask bits for the builtin MIPS interrupt controller.


This structure contains the ARM-specific information:

Virtual address of the MMU level 1 page table used to map the kernel.
Physical address of the MMU level 1 page table used to map the kernel.
Virtual address of a 1-1 virtual-physical mapping used to map the startup code that enables the MMU. This virtual mapping is removed when the kernel is initialized.
Size of the mapping used for startup_base.
Structure containing ARM core-specific operations and data. Currently this contains the following:
A routine used to implement CPU-specific cache/TLB flushing when the memory manager unmaps or changes the access protections to a virtual memory mapping for a page. This routine is called for each page in a range being modified by the virtual memory manager.
A routine used to perform any operations that can be deferred when page_flush is called. For example on the SA-1110 processor, an Icache flush is deferred until all pages being operated on have been modified.

This structure contains the Hitachi SH-specific information:

Points at system exception table, filled by the library.


The smp area is CPU-independent and contains the following elements:

This element Description
send_ipi Sends an interprocessor interrupt (IPI) to the CPU.
start_address Get the starting address for the IPI.
pending Identify the pending interrupts for the SMP processor.
cpu Identify the SMP CPU.


The pminfo area is a communication area between the power manager and startup/power callout.

The pminfo area contains the following elements which are customizable in the power manager structure and are power-manager dependent:

This element Description
wakeup_pending Notifies the power callout that a wakeup condition has occurred. The power manager requires write access so it can modify this entry.
wakeup_condition Indicates to the power manager what has caused the wakeup, i.e. whether it's a power-on reset or an interrupt from peripherals or other devices. The value is set by the power callout.
managed_storage This entry is an area where the power manager can store any data it chooses. This storage is not persistent storage; it needs to be manually stored and restored by the startup and power callout.

The managed_storage element is initialized by the init_pminfo() function call in startup and can be modified at startup. The value passed into init_pminfo() determines the size of the managed_storage array.

Callout information

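All the callout routines share a set of similar characteristics:
  • they're coded in assembler
  • they must be position-independent
  • they must not use any static read/write storage

Callouts are essentially standalone pieces of code that the kernel can invoke without their having to be statically linked into the kernel.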

The requirement for coding the callouts in assembler stems from the second requirement (i.e. that they must be written to be position-independent). This is because the callouts are provided as part of the startup code, which will get overwritten when the kernel starts up. In order to circumvent this, the startup program will copy the callouts to a safe place -- since they won't be in the location that they were loaded in, they must be coded to be position-independent.

We need to qualify the last requirement (i.e. that callouts not use any static read/write storage). There's a mechanism available for a given callout to allocate a small amount of storage space within the system page, but the callouts cannot have any static read/write storage space defined within themselves.

Debug interface

The debug interface consists of the following callouts:

These three callouts are used by the kernel when it wishes to interact with a serial port, console, or other device (e.g. when it needs to print out some internal debugging information or when there's a fault). Only display_char() is required; the others are optional.

Clock/timer interface

Here are the clock and timer interface callouts:
  • timer_load()
  • timer_reload()
  • timer_value()

The kernel uses these callouts to deal with the hardware timer chip.

The timer_load() callout is responsible for stuffing the divisor value passed by the kernel into the timer/counter chip. Since the kernel doesn't know the characteristics of the timer chip, it's up to the timer_load() callout to take the passed value and validate it. The kernel will then use the new value in any internal calculations it performs. You can access the new value in the qtime_entry element of the system page as well as through the ClockPeriod() function call.

The timer_reload() callout is called after the timer chip generates an interrupt. It's used in two cases:

The timer_value() callout is used to return the value of the timer chip's internal count as a delta from the last interrupt. This is used on processors that don't have a high-precision counter built into the CPU (e.g. 80386, 80486).

Interrupt controller interface

Here are the callouts for the interrupt controller interface:
  • mask()
  • unmask()
  • config()

In addition, two "code stubs" are provided:
  • id
  • eoi

The mask() and unmask() callouts perform masking and unmasking of a particular interrupt vector.

The config() callout is used to ascertain the configuration of an interrupt level.

For more information about these callouts, refer to the intrinfo structure in the system page above.

Cache controller interface

Depending on the cache controller circuitry in your system, you may need to provide a callout for the kernel to interface to the cache controller.

On the x86 architecture, the cache controller is integrated tightly with the CPU, meaning that the kernel doesn't have to talk to the cache controller. On other architectures, like the MIPS and PowerPC, the cache controllers need to be told to invalidate portions of the cache when certain functions are performed in the kernel.

The callout for cache control is control(). This callout gets passed:

The callout is responsible for returning the number of cache lines that it affected -- this allows the caller (the kernel) to call the control() callout repeatedly at a higher level. A return of 0 indicates that the entire cache was affected (e.g. all cache entries were invalidated).

System reset callout

The miscellaneous callout, reboot(), gets called whenever the kernel needs to reboot the machine.

The reboot() callout is responsible for resetting the system. This callout lets developers customize the events that occur when proc needs to reboot (e.g. turning off a watchdog or banging the right registers) without customizing proc each time.

A "shutdown" of the binary will call sysmgr_reboot(), which will eventually trigger the reboot() callout.

Power management callout

The power() callout gets called whenever power management needs to be activated.

The startup library

The startup library contains a rich set of routines consisting of high-level functions that are called by your main() through to utility functions for interrogating the hardware, initializing the system page, loading the next process in the image, and switching to protected mode. Full source is provided for all these functions, allowing you to make local copies with minor modifications in your target startup directory.

The following are the available library functions (in alphabetical order):

init_syspage_memory() (deprecated)


int add_cache(int next, 
              unsigned flags, 
              unsigned line_size, 
              unsigned num_lines, 
              const struct callout_rtn *rtn);

Add an entry to the cacheattr section of the system page structure. Parameters map one-to-one with the structure's fields. The return value is the array index number of the added entry. Note that if there's already an entry that matches the one you're trying to add, that entry's index is returned -- nothing new is added to the section.
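The "return the existing index if a matching entry is already present" behavior can be illustrated with a small self-contained C sketch. It isn't the library's implementation: the table, the structure, and demo_add_cache() are hypothetical stand-ins, and only three of the fields are compared here.

```c
#include <stddef.h>

#define DEMO_MAX_CACHES 8

/* Hypothetical, simplified cache description. */
struct demo_cache {
    unsigned flags;
    unsigned line_size;
    unsigned num_lines;
};

struct demo_cache_table {
    struct demo_cache entry[DEMO_MAX_CACHES];
    size_t count;
};

/* Add an entry unless an identical one exists. Returns the index of the
 * matching or newly added entry, or -1 if the demo table is full. */
static int demo_add_cache(struct demo_cache_table *t, unsigned flags,
                          unsigned line_size, unsigned num_lines)
{
    size_t i;

    for (i = 0; i < t->count; i++) {
        const struct demo_cache *c = &t->entry[i];
        if (c->flags == flags && c->line_size == line_size &&
            c->num_lines == num_lines)
            return (int)i;          /* already present: reuse its index */
    }
    if (t->count == DEMO_MAX_CACHES)
        return -1;

    t->entry[t->count].flags = flags;
    t->entry[t->count].line_size = line_size;
    t->entry[t->count].num_lines = num_lines;
    return (int)t->count++;
}
```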


void add_callout(unsigned offset, 
                 const struct callout_rtn *callout);

Add a callout to the callout_info section of the system page. The offset parameter holds the offset from the start of the section (as returned by the offsetof() macro) that the new routine's address should be placed in.
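The idea of identifying a slot by its offsetof() value can be sketched in plain C. The structure demo_callouts and the function store_callout() below are hypothetical stand-ins for the callout section and for what add_callout() does with its offset parameter; they aren't part of the startup library.

```c
#include <stddef.h>
#include <string.h>

typedef void (*callout_fn)(void);

/* Hypothetical stand-in for a section holding callout addresses. */
struct demo_callouts {
    callout_fn reboot;
    callout_fn power;
};

/* Store a routine's address at a byte offset from the section start. */
static void store_callout(void *section, size_t offset, callout_fn fn)
{
    memcpy((char *)section + offset, &fn, sizeof fn);
}

/* Placeholder routine used only to have an address to store. */
static void demo_reboot(void)
{
}
```

A caller would select the slot with offsetof(), for example store_callout(&section, offsetof(struct demo_callouts, power), demo_reboot).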


void add_callout_array(const struct callout_slot *slots,
                       unsigned size);

Add the callout array specified by slots (for size bytes) into the callout array in the system page.


struct intrinfo_entry 
        *add_interrupt(const struct startup_intrinfo 

Add a new entry to the intrinfo section. Returns a pointer to the newly added entry.


void add_interrupt_array(const struct startup_intrinfo *intrs,
                         unsigned size);

Add the interrupt array callouts specified by intrs (for size bytes) into the interrupt callout array in the system page.


void add_ram(paddr_t start, 
             paddr_t size);

Tell the system that there's RAM available starting at physical address start for size bytes.


unsigned add_string(const char *name);

Add the string specified by name into the string literal pool in the system page and return the index.


unsigned add_typed_string(int type_index,
                          const char *name);

Add the typed string specified by name (of type type_index) into the typed string literal pool in the system page and return the index.


struct qtime_entry *alloc_qtime(void);

Allocate space in the system page for the qtime section and fill in the epoch, boot_time, and nsec_tod_adjust fields. Returns a pointer to the newly allocated structure so that user code can fill in the other fields.


paddr_t alloc_ram(paddr_t addr,
                  paddr_t size,
                  paddr_t align);

Allocate memory from the free memory pool initialized by the call to init_raminfo(). The RAM is not cleared.


unsigned as_add(paddr_t start, 
                paddr_t end, 
                unsigned attr, 
                const char *name, 
                unsigned owner);

Add an entry to the asinfo section of the system page. Parameters map one-to-one with field names. Returns the offset from the start of the section for the new entry.


unsigned as_add_containing(paddr_t start,
                           paddr_t end, 
                           unsigned attr, 
                           const char *name, 
                           const char *container);

Add new entries to the asinfo section, with the owner field set to whatever entries are named by the string pointed to by container. This function can add multiple entries because the start and end values are constrained to stay within the start and end of the containing entry (e.g. they get clipped such that they don't go outside the parent). If more than one entry is added, the AS_ATTR_CONTINUED bit will be turned on in all but the last. Returns the offset from the start of the section for the first entry added.
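The clipping of a requested range to its container can be sketched as below. This is a hedged illustration rather than the library's code: clip_to_container() and demo_paddr_t are hypothetical names, and the sketch assumes that both start and end addresses are inclusive.

```c
#include <stdint.h>

typedef uint64_t demo_paddr_t;   /* stand-in for paddr_t */

/* Clip the inclusive range [*start, *end] so that it stays inside the
 * inclusive container range [c_start, c_end]. Returns 1 if part of the
 * range survives, or 0 if the two ranges don't overlap at all. */
static int clip_to_container(demo_paddr_t *start, demo_paddr_t *end,
                             demo_paddr_t c_start, demo_paddr_t c_end)
{
    if (*end < c_start || *start > c_end)
        return 0;                /* nothing inside this container */
    if (*start < c_start)
        *start = c_start;        /* trim the low side */
    if (*end > c_end)
        *end = c_end;            /* trim the high side */
    return 1;
}
```

When several containers with the same name overlap the requested range, repeating this step for each one yields the multiple entries described above.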


unsigned as_default(void);

Add the default memory and io entries to the asinfo section of the system page.


unsigned as_find(unsigned start, ...);

The start parameter indicates where to start the search for the given item. For an initial call, it should be set to AS_NULL_OFF. If the item found isn't the one wanted, then the return value from the first as_find() is used as the start parameter of the second call. The search will pick up where it left off. This can be repeated as many times as required (the return value from the second call going into the start parameter of the third, etc).

The item being searched is identified by a sequence of char * parameters following start. The sequence is terminated by a NULL. The last string before the NULL is the bottom-level itemname being searched for, the string in front of that is the name of the item that owns the bottom-level item, etc.

For example, this call finds the first occurrence of an item called "foobar":

item_off = as_find(AS_NULL_OFF, "foobar", NULL);

The following call finds the first occurrence of an item called "foobar" that's owned by "sam":

item_off = as_find(AS_NULL_OFF, "sam", "foobar", NULL);

If the requested item can't be found, AS_NULL_OFF is returned.


unsigned as_find_containing(unsigned off, 
                            paddr_t start, 
                            paddr_t end, 
                            const char *container);

Find an asinfo entry with the name pointed to by container that at least partially covers the range given by start and end. Follows the same rules as as_find() to know where the search starts. Returns the offset of the matching entry or AS_NULL_OFF if none is found. (The as_add_containing() function uses this to find what the owner fields should be for the entries it's adding.)


unsigned as_info2off(const struct asinfo_entry *);

Given a pointer to an asinfo entry, return the offset from the start of the section.


struct asinfo_entry *as_off2info(unsigned offset);

Given an offset from the start of the asinfo section, return a pointer to the entry.


void as_set_checker(unsigned off, 
                    const struct callout_rtn *rtn);

Set the checker callout field of the indicated asinfo entry. If the AS_ATTR_CONTINUED bit is on in the entry, advance to the next entry in the section and set its checker callout as well (see as_add_containing() for why AS_ATTR_CONTINUED would be on). Repeat until an entry without AS_ATTR_CONTINUED is found.


void as_set_priority(unsigned as_off, 
                     unsigned priority);

Set the priority field of the indicated entry. If the AS_ATTR_CONTINUED bit is on in the entry, advance to the next entry in the section and set its priority as well (see as_add_containing() for why AS_ATTR_CONTINUED would be on). Repeat until an entry without AS_ATTR_CONTINUED is found.
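The "keep going while AS_ATTR_CONTINUED is set" rule can be sketched with a small array walk. Everything here is a hypothetical stand-in (the structure, the bit value of DEMO_ATTR_CONTINUED, and the function name), and the sketch indexes an array instead of using section offsets.

```c
#include <stddef.h>

#define DEMO_ATTR_CONTINUED 0x0001u   /* hypothetical bit value */

/* Hypothetical, simplified entry. */
struct demo_entry {
    unsigned attr;
    unsigned priority;
};

/* Set the priority of entry idx and of every following entry in the same
 * continued run. Returns the number of entries that were updated. */
static size_t set_priority_chain(struct demo_entry *entries, size_t count,
                                 size_t idx, unsigned priority)
{
    size_t changed = 0;

    while (idx < count) {
        entries[idx].priority = priority;
        changed++;
        if (!(entries[idx].attr & DEMO_ATTR_CONTINUED))
            break;               /* last entry of the run */
        idx++;
    }
    return changed;
}
```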


void avoid_ram( paddr32_t start,
                size_t size);

Make startup avoid using the specified RAM for any of its internal allocations. Memory remains available for procnto to use. This function is useful for specifying RAM that the IPL/ROM monitor needs to keep intact while startup runs. Because it takes only a paddr32_t, only addresses in the first 4 GB can be specified. It doesn't need a full paddr_t because startup will never use memory above 4 GB for its own storage requirements.


unsigned long calc_time_t(const struct tm *tm);

Given a struct tm (with values appropriate for the UTC timezone), calculate the value to be placed in the boot_time field of the qtime section.
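As an illustration of this kind of conversion, the C sketch below turns a UTC struct tm into a count of seconds. It isn't the library's implementation, and it assumes that the value wanted is the number of seconds since 1970-01-01 00:00:00 UTC; utc_seconds() and days_from_civil() are hypothetical names.

```c
#include <time.h>

/* Days from 1970-01-01 to the given proleptic Gregorian date.
 * m is 1..12 and d is 1..31. */
static long long days_from_civil(int y, int m, int d)
{
    long long era, yoe, mp, doy, doe;

    y -= (m <= 2);                          /* year starts in March */
    era = (y >= 0 ? y : y - 399) / 400;     /* 400-year cycle number */
    yoe = y - era * 400;                    /* year of era, 0..399 */
    mp  = (m + 9) % 12;                     /* March = 0 ... February = 11 */
    doy = (153 * mp + 2) / 5 + d - 1;       /* day of shifted year */
    doe = yoe * 365 + yoe / 4 - yoe / 100 + doy;
    return era * 146097 + doe - 719468;
}

/* Seconds since 1970-01-01 00:00:00 UTC for a struct tm holding UTC
 * values; tm_wday, tm_yday, and tm_isdst are ignored. */
static unsigned long utc_seconds(const struct tm *tm)
{
    long long days = days_from_civil(tm->tm_year + 1900,
                                     tm->tm_mon + 1, tm->tm_mday);

    return (unsigned long)(days * 86400 + tm->tm_hour * 3600L +
                           tm->tm_min * 60L + tm->tm_sec);
}
```

Working from the calendar fields directly avoids any dependence on the local timezone, which matters because the input is stated to be in UTC.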


paddr32_t calloc_ram(size_t size,
                     unsigned align);

Allocate memory from the free memory pool initialized by the call to init_raminfo(). The RAM is cleared.


uintptr_t callout_io_map_indirect(unsigned size, 
                                 paddr_t phys);

Same as mmap_device_io() in the C library -- provide access to an I/O port on the x86 (for other systems, callout_io_map() is the same as callout_memory_map_indirect()) at a given physical address for a given size. The return value is for use in the CPU's equivalent of in/out instructions (regular moves on all but the x86). The value is for use in any kernel callouts (i.e. they live beyond the end of the startup program and are maintained by the OS while running).


void *callout_memory_map_indirect(unsigned size, 
                                 paddr_t phys, 
                                 unsigned prot_flags);

Same as mmap_device_memory() in the C library -- provide access to a memory-mapped device. The value is for use in any kernel callouts (i.e. they live beyond the end of the startup program and are maintained by the OS while running).


void callout_register_data( void *rp,
                            void *data );

This function lets you associate a pointer to arbitrary data with a callout. This data pointer is passed to the patcher routine (see "Patching the callout code," below).

The rp argument is a pointer to the pointer where the callout address is stored in the system page you're building. For example, say you have a pointer to a system page section that you're working on called foo. In the section there's a field bar that points to a callout when the system page is finished. Here's the code:

// This sets the callout in the syspage:

foo->bar = (void *)&callout_routine_name;

// This registers data to pass to the patcher when we're
// building the final version of the system page:

callout_register_data(&foo->bar, &some_interesting_data_for_patcher);

When the patcher is called to fix up the callout that's pointed at by foo->bar, &some_interesting_data_for_patcher is passed to it.


void chip_access(paddr_t base, 
                 unsigned reg_shift, 
                 unsigned mem_mapped, 
                 unsigned size);

Get access to a hardware chip at physical address base with a register shift value of reg_shift (0 if registers are one byte apart; 1 if registers are two bytes apart, etc. See devc-ser8250 for more information).

If mem_mapped is zero, the function uses startup_io_map() to get access; otherwise, it uses startup_memory_map(). The size parameter gives the number of locations to make accessible (the value is scaled by the reg_shift parameter to give the actual amount that's mapped). After this call is made, the chip_read*() and chip_write*() functions can access the specified device. You can have only one chip_access() in effect at any one time.


void chip_done(void);

Terminate access to the hardware chip specified by chip_access().


unsigned chip_read8(unsigned off);

Read one byte from the device specified by chip_access(). The off parameter is first scaled by the reg_shift value specified in chip_access() before being used.


unsigned chip_read16(unsigned off);

Same as chip_read8(), but for 16 bits.


unsigned chip_read32(unsigned off);

Same as chip_read16(), but for 32 bits.


void chip_write8(unsigned off, 
                 unsigned val);

Write one byte to the device specified by chip_access(). The off parameter is first scaled by the reg_shift value specified in chip_access() before being used.


void chip_write16(unsigned off, 
                  unsigned val);

Same as chip_write8(), but for 16 bits.


void chip_write32(unsigned off,
                  unsigned val);

Same as chip_write16(), but for 32 bits.
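The scaling of off by reg_shift can be pictured as a left shift of the register number. The following is a minimal sketch under that assumption, using a plain byte array in place of real hardware. The sim_* names are hypothetical stand-ins for illustration; they aren't the library's chip_*() functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a device: a byte array instead of real hardware. */
static uint8_t  sim_regs[64];
static unsigned sim_shift;      /* register spacing, as a power of two */

/* Records the register shift, as chip_access() is described as doing. */
static void sim_chip_access(unsigned reg_shift)
{
    sim_shift = reg_shift;
}

/* Register number -> byte offset: registers are (1 << reg_shift) bytes apart. */
static size_t sim_scaled_offset(unsigned off)
{
    return (size_t)off << sim_shift;
}

/* Writes the low byte of val to the scaled register offset. */
static void sim_chip_write8(unsigned off, unsigned val)
{
    sim_regs[sim_scaled_offset(off)] = (uint8_t)val;
}

/* Reads one byte from the scaled register offset. */
static unsigned sim_chip_read8(unsigned off)
{
    return sim_regs[sim_scaled_offset(off)];
}
```

With a reg_shift of 2, register 3 lands at byte offset 12 from the base; with a reg_shift of 0, it lands at byte offset 3.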


void copy_memory (paddr_t dst, 
                  paddr_t src, 
                  paddr_t len)

Copy len bytes of memory from physical memory at src to dst.


int del_typed_string(int type_index);

Find the string in the typed_strings section of the system page indicated by the type type_index and remove it. Returns the offset where the removed string was, or -1 if no such string was present.


void falcon_init_l2_cache(paddr_t base);

Enable the L2 cache on a board with a Falcon system controller chip. The base physical address of the Falcon controller registers is given by base.


void falcon_init_raminfo(paddr_t falcon_base);

On a system with the Falcon system controller chip located at falcon_base, determine how much/where RAM is installed and call add_ram() with the appropriate parameters.


unsigned falcon_system_clock(paddr_t falcon_base);

On a system with a Falcon chipset located at physical address falcon_base, return the speed of the main clock input to the CPU (in Hertz). This can then be used in turn to set the cpu_freq, timer_freq, and cycles_freq variables.


const void *find_startup_info (const void *start, 
                               unsigned type)

Attempt to locate the kind of information specified by type in the data area used by the IPL code to communicate such information. Pass start as NULL to find the first occurrence of the given type of information. Pass start as the return value from a previous call in order to get the next information of that type. Returns 0 if no information of that type is found starting from start.
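The "pass the previous return value to get the next match" contract can be illustrated with a small simulation. This is only a sketch: the record layout, the sim_area table, and sim_find_info() are hypothetical and don't reflect the real IPL data area or the library's implementation.

```c
#include <stddef.h>

/* Hypothetical record layout, for illustration only. */
struct sim_info {
    unsigned type;
    unsigned value;
};

/* Simulated data area; a type of 0 terminates it (an assumption of this sketch). */
static const struct sim_info sim_area[] = {
    { 1, 10 }, { 2, 20 }, { 1, 30 }, { 0, 0 }
};

/* Mimics the described contract: a NULL start searches from the beginning;
 * otherwise the search resumes just after the record that start points to. */
static const void *sim_find_info(const void *start, unsigned type)
{
    const struct sim_info *p =
        start ? (const struct sim_info *)start + 1 : sim_area;

    for (; p->type != 0; p++) {
        if (p->type == type) {
            return p;
        }
    }
    return NULL;
}

/* Visits every record of the given type and sums the values. */
static unsigned sum_values(unsigned type)
{
    const struct sim_info *p = NULL;
    unsigned total = 0;

    while ((p = sim_find_info(p, type)) != NULL) {
        total += p->value;
    }
    return total;
}
```

The loop in sum_values() is the part that carries over to real code: start with NULL, then feed each result back in until the function reports that nothing more was found.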


int find_typed_string(int type_index);

Return the offset from the beginning of the typed_strings section of the string with the type_index type. Return -1 if no such string is present.


void handle_common_option (int opt)

Take the option identified by opt (a single ASCII character) and process it. This function assumes that the global variable optarg points to the argument string for the option.

Valid values for opt and their actions are:

Reboot switch. If set, an OS crash will cause the system to reboot. If not set, an OS crash will cause the system to hang.
Output channel specification (e.g. kprintf(), stdout, etc.).
f [cpu_freq][,[cycles_freq][,timer_freq]]
Specify CPU frequencies. All frequencies can be followed by H for hertz, K for kilohertz, or M for megahertz (these suffixes aren't case-sensitive). If no suffix is given, the library assumes megahertz if the number is less than 1000; otherwise, it assumes hertz.

If they're specified, cpu_freq, cycles_freq, and timer_freq are used to set the corresponding variables in the startup code:

  • cpu_freq -- the CPU clock frequency. Also sets the speed field in the cpuinfo section of the system page.
  • cycles_freq -- the frequency at which the value returned by ClockCycles() increments. Also sets the cycles_per_sec field in the qtime section of the system page.
  • timer_freq -- the frequency at which the timer chip input runs. Also sets the timer_rate and timer_scale values of the qtime section of the system page.
kdebug remote debug protocol channel.
Placeholder for processing additional memory blocks. The parsing of additional memory blocks is deferred until init_system_private().
Add the hostname specified to the typed name string space under the identifier _CS_HOSTNAME.
Used for reserving memory at the bottom of the address space.
Used for reserving memory at any address space you specify.
Placeholder for processing debug code's -S option.
Specify maximum number of CPUs in an SMP system.
Add Jtag-related options. Reserves four bytes of memory at the specified location and copies the physical address of the system page to this location so the hardware debugger can retrieve it.
Increment the verbosity global flag, debug_flag.
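The suffix rule described for the f option's frequencies can be sketched as follows. The helper name parse_freq_hz() is hypothetical, and this isn't the library's own parser; it only illustrates the stated rule (H, K, or M suffix, not case-sensitive; with no suffix, values under 1000 are taken as megahertz and anything else as hertz).

```c
#include <ctype.h>
#include <stdlib.h>

/* Hypothetical helper: converts one frequency argument to hertz. */
static unsigned long parse_freq_hz(const char *arg)
{
    char *end;
    unsigned long value = strtoul(arg, &end, 10);

    switch (toupper((unsigned char)*end)) {
    case 'H':
        return value;
    case 'K':
        return value * 1000UL;
    case 'M':
        return value * 1000000UL;
    default:
        /* No suffix: small numbers are megahertz, larger ones are hertz. */
        return (value < 1000UL) ? value * 1000000UL : value;
    }
}
```

Under this rule, "66" and "66M" both mean 66 MHz, while "14318180" is read as a value in hertz.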


void hwi_add_device(const char *bus, 
                    const char *class, 
                    const char *name, 
                    unsigned pnp);

Add an hwi_device item to the hwinfo section. The bus and class parameters are used to locate where in the device tree the new device is placed.


void hwi_add_inputclk(unsigned clk, 
                      unsigned div);

Add an hwi_inputclk tag to the hw item currently being constructed.


void hwi_add_irq(unsigned vector);

Add an irq tag structure to the hwinfo section. The logical vector number for the interrupt will be set to vector.


void hwi_add_location(paddr_t base, 
                      paddr_t len, 
                      unsigned reg_shift, 
                      unsigned addr_space);

Add a location tag structure to the hwinfo section. The fields of the structure will be set to the given parameters.


void hwi_add_nicaddr(const uint8 *addr, 
                     unsigned len);

Add an hwi_nicaddr tag to the hw item currently being constructed.


void hwi_add_rtc(const char *name, 
                 paddr_t base, 
                 unsigned reg_shift, 
                 unsigned len, 
                 int mmap, 
                 int cent_reg);

Add an hwi_device item describing the realtime clock to the hwinfo section. The name of the device is name. The hwi_location tag items are given by base, reg_shift, len, and mmap. The mmap parameter indicates if the device is memory-mapped or I/O-space-mapped and is used to set the addrspace field.

If the cent_reg parameter is not -1, it's used to add an hwi_regname tag with the offset field set to its value. This indicates the offset from the start of the device where the century byte is stored.


void *hwi_alloc_item(const char *tagname, 
                     unsigned size, 
                     unsigned align, 
                     const char *itemname, 
                     unsigned owner);

Add an item structure to the hwinfo section.


void *hwi_alloc_tag(const char *tagname, 
                    unsigned size, 
                    unsigned align);

Add a tag structure to the hwinfo section.


unsigned hwi_find_as(paddr_t base, 
                     int mmap);

Given a physical address of base and mmap (indicating 1 for memory-mapped and 0 for I/O-space-mapped), return the offset from the start of the asinfo section indicating the appropriate addrspace field value for an hwi_location tag.


unsigned hwi_find_item(unsigned start, ...);

Note: Although the hwi_find_item() function resides in the C library (proto in <hw/sysinfo.h>), the function is still usable from startup programs.

Search for a given item in the hwinfo section of the system page. If start is HWI_NULL_OFF, the search begins at the start of the hwinfo section. If not, it starts from the item after the offset of the one passed in (this allows people to find multiple tags of the same type; it works just like the find_startup_info() function). The var args portion is a list of character pointers, giving item names; the list is terminated with a NULL. The order of the item names gives ownership information. For example:

item = hwi_find_item(HWI_NULL_OFF, "foobar", NULL);

searches for an item name called "foobar." The following:

item = hwi_find_item(HWI_NULL_OFF, "mumblyshwartz",
                     "foobar", NULL);

also searches for "foobar," but this time it has to be owned by an item called "mumblyshwartz."

If the item can't be found, HWI_NULL_OFF is returned; otherwise, the byte offset within the hwinfo section is returned.


unsigned hwi_find_tag(unsigned start, 
                      int curr_item, 
                      const char *tagname);

Note: Although the hwi_find_tag() function resides in the C library (proto in <hw/sysinfo.h>), the function is still usable from startup programs.

Search for a given tagname in the hwinfo section of startup. The start parameter works just like in hwi_find_item(). If curr_item is nonzero, the tagname must occur within the current item. If zero, the tagname can occur anywhere from the starting point of the search to the end of the section. If the tag can't be found, then HWI_NULL_OFF is returned; otherwise, the byte offset within the hwinfo section is returned.


void *hwi_off2tag(unsigned off);

Note: Although the hwi_off2tag() function resides in the C library (proto in <hw/sysinfo.h>), the function is still usable from startup programs.

Given a byte offset from the start of the hwinfo section, return a pointer to the hwinfo tag structure.


unsigned hwi_tag2off(void *tag);

Note: Although the hwi_tag2off() function resides in the C library (proto in <hw/sysinfo.h>), the function is still usable from startup programs.

Given a pointer to the start of a hwinfo tag structure, convert it to a byte offset from the start of the hwinfo system page section.


void init_asinfo(unsigned mem);

Initialize the asinfo section of the system page. The mem parameter is the offset of the memory entry in the section and can be used as the owner parameter value for as_add()s that are adding memory.


void init_cacheattr (void)

Initialize the cacheattr member. For all platforms, this is a do-nothing stub.


void init_cpuinfo (void)

Initialize the members of the cpuinfo structure with information about the installed CPU(s) and related capabilities. Most systems will be able to use this function directly from the library.


void init_hwinfo (void)

Initialize the appropriate variant of the hwinfo structure in the system page.


void init_intrinfo (void)

Initialize the intrinfo structure.

You would need to change this only if your hardware doesn't have the standard PC-compatible dual 8259 configuration.
The default library version sets up the internal MIPS interrupt controller.
No default version exists; you must supply one.
No default version exists; you must supply one.
The default library version sets up the SH-4 on-chip peripheral interrupt. You need to provide the external interrupt code.

If you're providing your own function, make sure it initializes:

This initialization of the structure is done via a call to the function add_interrupt_array().


void init_mmu (void)

Sets up the processor for virtual addressing mode by setting up page-mapping hardware and enabling the pager.

On the x86 family, it sets up the page tables as well as special mappings to "known" physical address ranges (e.g. sets up a virtual address for the physical address ranges 0 through 0xFFFFF inclusive).

On the PowerPC family, this function is a stub for the 400 and 800 series processors; for the others (i.e. the 600 series and BookE processors), it isn't. On the MIPS and SH families, this function is currently a stub.

On the ARM family, this function simply sets up the page tables.


*init_pminfo (unsigned managed_size)

Initialize the pminfo section of the system page and set the number of elements in the managed storage array.


void init_qtime (void)

Initialize the qtime structure in the system page. Most systems will be able to use this function directly from the library.

This function doesn't exist for ARM. Specific functions exist for ARM processors with on-chip timers; currently, this includes only init_qtime_sa1100().


void init_qtime_sa1100 (void)

Initialize the qtime structure and kernel callouts in the system page to use the on-chip timer for the SA1100 and SA1110 processors.


void init_raminfo (void)

Determine the location and size of available system RAM and initialize the asinfo structure in the system page.

If you know the exact amount and location of RAM in your system, you can replace this library function with one that simply hard-codes the values via one or more add_ram() calls.

If the RAM configuration is known (e.g. set by the IPL code, or the multi-boot IPL code gets set by the gnu utility), then the library version of init_raminfo() will call the library routine find_startup_info() to fetch the information from a known location in memory. If the RAM configuration isn't known, then a RAM scan (via x86_scanmem()) is performed looking for valid memory between locations 0 and 0xFFFFFF, inclusive. (Note that the VGA aperture that usually starts at location 0xB0000 is specifically ignored.)
There's no library default. You must supply your own init_raminfo() function.


void init_smp (void)

Initialize the SMP functionality of the system, assuming the hardware (e.g. x86, PPC, MIPS) supports SMP.

init_syspage_memory() (deprecated)

void init_syspage_memory (void *base, 
                          unsigned size)

Initialize the system page structure's individual member pointers to point to the data areas for the system page substructures (e.g. typed_strings). The base parameter is a pointer to where the system page is currently stored (it will be moved to the kernel's address space later); the size indicates how big this area is. On all platforms, this routine shouldn't require modification.


void init_system_private (void)

Find all the boot images that need to be started and fill a structure with that information; parse any -M options used to specify memory regions that should be added; tell Neutrino where the image filesystem is located; and finally allocate room for the actual storage of the system page. On all platforms, this shouldn't require modification.

Note: This must be the last init_*() function called.


void jtag_reserve_memory (unsigned long resmem_addr, 
                         unsigned long resmem_size,
                         uint8_t resmem_flag)  

Reserve a user-specified block of memory at the location specified in resmem_addr, if the resmem_flag is set to 0.


void kprintf (const char *fmt, ... )

Display output using the put_char() function you provide. It supports a very limited set of printf() style formats.


void mips41xx_set_clock_freqs(unsigned sysclk);

On a MIPS R41xx series chip, set the cpu_freq, timer_freq, and cycles_freq variables appropriately, given a system clock input frequency of sysclk.


void openbios_init_raminfo(void);

On a system that contains an OpenBIOS ROM monitor, add the system RAM information.


void pcnet_reset(paddr_t base, 
                 int mmap);

Ensure that a PCnet-style Ethernet controller chip at the given physical address (either I/O or memory-mapped as specified by mmap) is disabled. Some ROM monitors leave the Ethernet receiver enabled after downloading the OS image. This causes memory to be corrupted after the system starts and before Neutrino's Ethernet driver is run, due to the reception of broadcast packets. This function makes sure that no further packets are received by the chip until the Neutrino driver starts up and properly initializes it.


void ppc400_pit_init_qtime(void);

On a PPC 400 series chip, initialize the qtime section and timer kernel callouts of the system page to use the on-board Programmable Interval Timer.


void ppc405_set_clock_freqs(unsigned sys_clk, 
                            unsigned timer_clk);

Initialize the timer_freq and cycles_freq variables based on the given timer_clk. The cpu_freq variable is initialized by multiplying the given system clock (sys_clk) by a multiplication value that's found using the CPCO_PSR DCR.


void ppc600_set_clock_freqs(unsigned sysclk);

On a PPC 600 series chip, set the cpu_freq, timer_freq, and cycles_freq variables appropriately, given a system clock input frequency of sysclk.


void ppc700_init_l2_cache(unsigned flags);

On a PPC 700 series system, initialize the L2 cache. The flags indicate which bits in the L2 configuration register are set. In particular, they decide the L2 size, clock speed, and so on. For details, see the Motorola PPC 700 series user's documentation for the particular hardware you're using.

For example, on a Sandpoint board, flags might be:


This would set the following for L2CR:


void ppc800_pit_init_qtime(void);

On a PPC 800 series chip, initialize the qtime section and timer kernel callouts of the system page to use the on-board Programmable Interval Timer.


void ppc800_set_clock_freqs(unsigned extclk_freq, 
                            unsigned extal_freq, 
                            int is_extclk);

On a PPC 800 series chip, set the cpu_freq, timer_freq, and cycles_freq variables appropriately, given input frequencies of extclk_freq at the EXTCLK pin and extal_freq at the XTAL/EXTAL pins.

If is_extclk is nonzero, then extclk_freq is used as the main timing reference (MODCLK1 signal is one at reset). If zero, extal_freq is used as the main timing reference (MODCLK1 signal is zero at reset).

Note that the setting of the frequency variables assumes that the ppc800_pit_init_qtime() routine is being used. If some other initialization of the qtime section and timer callouts takes place, the values in the frequency variables may have to be modified.


void ppc_dec_init_qtime(void);

On a PPC, initialize the qtime section and timer kernel callouts of the system page to use the decrementer register.

Note: The ppc_dec_init_qtime() routine may not be used on a PPC 400 series chip, which omits the decrementer register.


void print_syspage (void)

Print the contents of all the structures in the system page. The global variable debug_level is used to determine what gets printed. The debug_level must be at least 2 to print anything; a debug_level of 3 will print the information within the individual substructures.

Note that you can set the debug level at the command line by specifying multiple -v options to the startup program.

You can also use the startup program's -S command-line option to select which entries are printed from the system page: -Sname selects name to be printed, whereas -S~name disables name from being printed. The name can be selected from the following list:

Name            Processors  Syspage entry
cacheattr       all         Cache attributes
callout         all         Callouts
cpuinfo         all         CPU info
gdt             x86         Global Descriptor Table
hwinfo          all         Hardware info
idt             x86         Interrupt Descriptor Table
intrinfo        all         Interrupt info
kerinfo         PPC         Kernel info
pgdir           x86         Page directory
qtime           all         System time info
smp             all         SMP info
strings         all         Strings
syspage         all         Entire system page
system_private  all         System private info
typed_strings   all         Typed strings


unsigned long rtc_time (void)

This is a user-replaceable function responsible for returning the number of seconds since January 1, 1970, 00:00:00 GMT.

This function defaults to calling rtc_time_mc146818(), which knows how to get the time from an IBM-PC standard clock chip.
The default library version simply returns zero.
The default function calls rtc_time_sh4(), which knows how to get the time from the SH-4 on-chip rtc.

Currently, these are the chip-specific versions:

Dallas Semiconductor DS-1386 compatible
SGS-Thomson M48T59 RTC/NVRAM chip
Motorola 146818 compatible
FOX RTC-72423 compatible
PPC 800 onboard RTC hardware

There's also a "none" version to use if your board doesn't have RTC hardware:

unsigned long rtc_time_none(void);

For the PPC 800 onboard RTC hardware, the function is simply as follows:

unsigned long rtc_time_rtc8xx(void);

If you're supplying the rtc_time() routine, you should call one of the chip-specific routines or write your own. The chip-specific routines all share the same parameter list:

(paddr_t base, unsigned reg_shift, int mmap, int cent_reg);

The base parameter indicates the physical base address or I/O port of the device. The reg_shift indicates the register offset as a power of two.

A typical value would be 0 (meaning 2^0, i.e. 1), indicating that the registers of the device are one byte apart in the address space. As another example, a value of 2 (meaning 2^2, i.e. 4) indicates that the registers in the device are four bytes apart.

If the mmap variable is 0, then the device is in I/O space. If mmap is 1, then the device is in memory space.

Finally, cent_reg indicates which register in the device contains the century byte (-1 indicates no such register). If there's no century byte register, then the behavior is chip-specific. If the chip is year 2000-compliant, then we will get the correct time. If the chip isn't compliant, then if the year is less than 70, we assume it's in the range 2000 to 2069; else we assume it's in the range 1970 to 1999.
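Two of the rules above reduce to one-line calculations: the register spacing implied by reg_shift, and the year windowing that's applied when there's no century register and the chip isn't year 2000-compliant. The sketch below shows both; the helper names are hypothetical and aren't part of the startup library.

```c
/* Hypothetical helper: byte spacing between device registers for a
 * given reg_shift (spacing is 2 raised to the power reg_shift). */
static unsigned reg_spacing(unsigned reg_shift)
{
    return 1u << reg_shift;
}

/* Hypothetical helper: expands a two-digit year read from a chip that
 * has no century register and isn't year 2000-compliant.
 * Years below 70 map to 2000..2069; the rest map to 1970..1999. */
static unsigned expand_rtc_year(unsigned yy)
{
    return (yy < 70) ? 2000 + yy : 1900 + yy;
}
```

For example, a two-digit year of 69 is read as 2069, while 70 is read as 1970.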


uintptr_t startup_io_map(unsigned size, 
                         paddr_t phys);

Same as mmap_device_io() in the C library -- provide access to an I/O port on the x86 (for other systems, startup_io_map() is the same as startup_memory_map()) at a given physical address for a given size. The return value is for use in the in*/out* functions in the C library. The value is for use during the time the startup program is running (as opposed to callout_io_map(), which is for use after startup is completed).


void startup_io_unmap(uintptr_t port);

Same as unmap_device_io() in the C library -- remove access to an I/O port on the x86 (on other systems, startup_io_unmap() is the same as startup_memory_unmap()) at the given port location.


void *startup_memory_map(unsigned size, 
                         paddr_t phys, 
                         unsigned prot_flags);

Same as mmap_device_memory() in the C library -- provide access to a memory-mapped device. The value is for use during the time the startup program is running (as opposed to callout_memory_map(), which is for use after startup is completed).


void startup_memory_unmap(void *vaddr);

Same as unmap_device_memory() in the C library -- remove access to a memory-mapped device at the given location.


void tulip_reset(paddr_t phys, 
                 int mem_mapped);

Ensure that a Tulip Ethernet chip (Digital 21x4x) at the given physical address (either I/O or memory-mapped as specified by mem_mapped) is disabled. Some ROM monitors leave the Ethernet receiver enabled after downloading the OS image. This causes memory to be corrupted after the system starts and before Neutrino's Ethernet driver is run, due to the reception of broadcast packets. This function makes sure that no further packets are received by the chip until the Neutrino driver starts up and properly initializes it.


int uncompress(char *dst, 
               int *dstlen, 
               char *src, 
               int srclen, 
               char *win);

This function resides in the startup library and is responsible for expanding a compressed OS image out to full size (this is invoked before main() gets called). If you know you're never going to be given a compressed image, you can replace this function with a stub version in your own code and thus make a smaller startup program.


int x86_cpuid_string (char *buf, 
                      int max)

Place a string representation of the CPU in the string buf to a maximum of max characters. The general format of the string is:

manufacturer part Ffamily Mmodel Sstepping

This information is determined using the cpuid instruction. If it's not supported, then a subset (typically only the part) will be placed in the buffer (e.g. 386).


unsigned x86_cputype (void)

An x86 platform-only function that determines the type of CPU and returns the number (e.g. 386).


int x86_enable_a20 (unsigned long cpu, 
                    int only_keyboard)

Enable address line A20, which is often disabled on many PCs on reset. It first checks if address line A20 is enabled and if so returns 0. Otherwise, it sets bit 0x02 in port 0x92, which is used by many systems as a fast A20 enable. It again checks to see if A20 is enabled and if so returns 0. Otherwise, it uses the keyboard microcontroller to enable A20 as defined by the old PC/AT standard. It again checks to see if A20 is enabled and if so returns 0. Otherwise, it returns -1.

If cpu is a 486 or greater, it issues a wbinvd opcode to invalidate the cache when doing a read/write test of memory to see if A20 is enabled.

In the rare case where setting bit 0x02 in port 0x92 may affect other hardware, you can skip this by setting only_keyboard to 1. In this case, it will attempt to use only the keyboard microcontroller.
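The order of attempts described above can be sketched with the hardware accesses replaced by function pointers. Everything here is hypothetical scaffolding for illustration: the a20_ops structure, the sim_* functions, and enable_a20_sketch() aren't library code, and the cache invalidation that depends on the cpu parameter is left out.

```c
/* Hypothetical hooks standing in for the real hardware accesses. */
struct a20_ops {
    int  (*is_enabled)(void);       /* nonzero if A20 is enabled */
    void (*fast_enable)(void);      /* set bit 0x02 in port 0x92 */
    void (*keyboard_enable)(void);  /* use the keyboard microcontroller */
};

/* Follows the described sequence: check, fast enable (unless skipped),
 * check, keyboard controller, check. Returns 0 on success, -1 on failure. */
static int enable_a20_sketch(const struct a20_ops *ops, int only_keyboard)
{
    if (ops->is_enabled()) {
        return 0;
    }
    if (!only_keyboard) {
        ops->fast_enable();
        if (ops->is_enabled()) {
            return 0;
        }
    }
    ops->keyboard_enable();
    return ops->is_enabled() ? 0 : -1;
}

/* Simulated hardware state, so the sequence can be exercised on a host. */
static int sim_a20_on, sim_fast_works, sim_kbd_works, sim_fast_calls;

static int sim_is_enabled(void)
{
    return sim_a20_on;
}

static void sim_fast(void)
{
    sim_fast_calls++;
    if (sim_fast_works) {
        sim_a20_on = 1;
    }
}

static void sim_kbd(void)
{
    if (sim_kbd_works) {
        sim_a20_on = 1;
    }
}

static const struct a20_ops sim_ops = { sim_is_enabled, sim_fast, sim_kbd };
```

Setting only_keyboard to 1 in this sketch means sim_fast() is never called, which mirrors the case where port 0x92 must not be touched.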


unsigned x86_fputype (void)

An x86-only function that returns the FPU type number (e.g. 387).


void x86_init_pcbios(void);

Perform initialization unique to an IBM PC BIOS system.


int x86_pcbios_shadow_rom(paddr_t rom, 
                          size_t size);

Given the physical address of a ROM BIOS extension, this function makes a copy of the ROM in a RAM location and sets the x86 page tables in the _syspage_ptr->un.x86.real_addr range to refer to the RAM copy rather than the ROM version. When something runs in V86 mode, it'll use the RAM locations when accessing the memory.

The amount of ROM shadowed is the maximum of the size parameter and the size indicated by the third byte of the BIOS extension.

The function returns:

if there's no ROM BIOS extension signature at the address given
if you're starting the system in physical mode and there's no MMU to make a RAM copy be referenced
if everything works.
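The size calculation described above can be sketched as follows. Two details are assumptions taken from the conventional PC option ROM header rather than from this document: the signature bytes 0x55 0xAA at the start of the extension, and the third byte giving the length in 512-byte units. The function name rom_shadow_size() is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the number of bytes to shadow, or 0 if
 * there's no ROM BIOS extension signature at the given location. */
static size_t rom_shadow_size(const uint8_t *rom, size_t size)
{
    size_t rom_size;

    /* Assumed header layout: 0x55, 0xAA, then length in 512-byte units. */
    if (rom[0] != 0x55 || rom[1] != 0xAA) {
        return 0;
    }
    rom_size = (size_t)rom[2] * 512u;

    /* Shadow the larger of the requested size and the size the ROM reports. */
    return (rom_size > size) ? rom_size : size;
}
```

A header whose third byte is 64 reports 32 KB, so a request for a smaller size would still shadow 32 KB.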


unsigned x86_scanmem (paddr_t beg, 
                      paddr_t end)

An x86-only function that scans memory between beg and end looking for RAM, and returns the total amount of RAM found. It scans memory performing a R/W test of 3 values at the start of each 4 KB page. Each page is marked with a unique value. It then rescans the memory looking for contiguous areas of memory and adds them to the asinfo entry in the system page.

A special check is made for a block of memory between addresses 0xB0000 and 0xBFFFF, inclusive. If memory is found there, the block is skipped (since it's probably the dual-ported memory of a VGA card).

The call x86_scanmem (0, 0xFFFFFF) would locate all memory in the first 16 megabytes of memory (except VGA memory). You may make multiple calls to x86_scanmem() to different areas of memory in order to step over known areas of dual-ported memory with hardware.
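The second pass of the scan, which groups pages into contiguous areas while stepping over the VGA range, can be sketched on a simulated page map. This is an illustration only: the real function probes memory, whereas this sketch takes the per-page R/W test results as an input array. The names coalesce_ram(), struct ram_run, and the SCAN_* constants are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SCAN_PAGE_SIZE 0x1000u   /* 4 KB pages, as in the description */
#define SCAN_VGA_BEG   0xB0000u
#define SCAN_VGA_END   0xBFFFFu

struct ram_run {
    uint32_t base;   /* physical start address of the run */
    uint32_t len;    /* length of the run in bytes */
};

/* page_has_ram[i] is nonzero if the page at i * SCAN_PAGE_SIZE passed the
 * R/W test (simulated input). Returns the number of runs stored in runs[]. */
static size_t coalesce_ram(const unsigned char *page_has_ram, size_t npages,
                           struct ram_run *runs, size_t max_runs)
{
    size_t i;
    size_t nruns = 0;
    int in_run = 0;

    for (i = 0; i < npages; i++) {
        uint32_t addr = (uint32_t)(i * SCAN_PAGE_SIZE);
        int usable = page_has_ram[i] &&
                     !(addr >= SCAN_VGA_BEG && addr <= SCAN_VGA_END);

        if (usable && in_run) {
            runs[nruns - 1].len += SCAN_PAGE_SIZE;   /* extend current run */
        } else if (usable && nruns < max_runs) {
            runs[nruns].base = addr;                 /* start a new run */
            runs[nruns].len = SCAN_PAGE_SIZE;
            nruns++;
            in_run = 1;
        } else {
            in_run = 0;                              /* gap, or table full */
        }
    }
    return nruns;
}
```

With the first megabyte fully populated, this yields two runs: one from 0 up to 0xB0000 and one from 0xC0000 up to 0x100000. In the library, each such area would be added to the asinfo entry instead of being stored in an array.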

Writing your own kernel callout

In order for the Neutrino microkernel to work on all boards, all hardware-dependent operations have been factored out of the code. Known as kernel callouts, these routines must be provided by the startup program.

The startup can actually have a number of different versions of the same callout available -- during hardware discovery it can determine which one is appropriate for the board it's running on and make that particular instance of the callout available to the kernel. Alternatively, if you're on a deeply embedded system and the startup knows exactly what hardware is present, only one of each callout might be present; the startup program simply tells the kernel about them with no discovery process.

The callout code is copied from the startup program into the system page and after this, the startup memory (text and data) is freed.

At the point where the reboot callout is called:

The patch code is run during execution of the startup program itself, so regular calls work as normal.

Once copied, your code must be completely self-contained and position independent. The purpose of the patch routines is to allow you to patch up the code with constants, access to RW data storage etc. so that your code is self-contained and contains all the virtual-physical mappings required.

Find out who's gone before

The startup library provides a number of different callout routines that we've already written. You should check the source tree (originally installed in bsp_working_dir/src/hardware/startup/lib/) to see if a routine for your device/board is already available before embarking on the odyssey of writing your own. This directory includes generic code, as well as processor-specific directories.

In the CPU-dependent level of the tree for all the source files, look for files that match the pattern:


Those are all the callouts provided by the library. Whether a file ends in .s or .S depends on whether it's sent through the C preprocessor before being handed off to an assembler. For our purposes here, we'll simply refer to them as .s files.

The names break down further like this:


where category is one of:

cache control routines
kernel debug input and output routines
interrupt handling routines
timer chip routine
rebooting the system

The device identifies the unique hardware that the callouts are for. Typically, all the routines in a particular source file would be used (or not) as a group by the kernel. For example, the callout_debug_8250.s file contains the display_char_8250(), poll_key_8250(), and break_detect_8250() routines for dealing with an 8250-style UART chip.

Why are they in assembly language?

Since the memory used by the startup executable is reclaimed by the OS after startup has finished, the callouts that are selected for use by the kernel can't be used in place. Instead, they must be copied to a safe location (the library takes care of this for you). Therefore, the callout code must be completely position-independent, which is why callouts have to be written in assembly language. We need to know where the callout begins and where it ends; there isn't a portable way to tell where a C function ends.

The other issue is that there isn't a portable way to control the preamble/postamble creation or code generation. So if an ABI change occurs or a build configuration issue occurs, we could have a very latent bug.

For all but two of the routines, the kernel invokes the callouts with the normal function-calling conventions. Later we'll deal with the two exceptions (interrupt_id() and interrupt_eoi()).

Starting off

Find a callout source file of the appropriate category that's close to what you want and copy it to a new filename. If the new routines will be useful on more than one board, you might want to keep the source file in your own private copy of the startup library. If not, you can just copy to the directory where you've put your board-specific files.

Now edit the new source file. At the top you'll see something that looks like this:

#include "callout.ah"

or:

.include "callout.ah"

The difference depends on the assembler syntax being used.

This include file defines the CALLOUT_START and CALLOUT_END macros. The CALLOUT_START macro takes three parameters and marks the start of one callout. The first parameter is the name of the callout routine (we'll come back to the two remaining parameters later).

The CALLOUT_END macro indicates the end of the callout routine source. It takes one parameter, which has to be the same as the first parameter in the preceding CALLOUT_START. If this particular routine is selected to be used by the kernel, the startup library will copy the code between the CALLOUT_START and CALLOUT_END to a safe place for the kernel to use. The exact syntax of the two macros depends on exactly which assembler is being used on the source. Two common versions are:

CALLOUT_START(timer_load_8254, 0, 0)
CALLOUT_END(timer_load_8254)

or:

CALLOUT_START timer_load_8254, 0, 0
CALLOUT_END timer_load_8254

Just keep whatever syntax is being used by the original file you started from. The original file will also have C prototypes for the routines as comments, so you'll know what parameters are being passed in. Now you should replace the code from the original file with what will work for the new device you're dealing with.

"Patching" the callout code

You may need to write a callout that deals with a device that may appear in different locations on different boards. You can do this by "patching" the callout code as it is copied to its final position. The third parameter of the CALLOUT_START macro is either a zero or the address of a patcher() routine. This routine has the following prototype:

void patcher(paddr_t paddr, 
             paddr_t vaddr, 
             unsigned rtn_offset,
             unsigned rw_offset,
             void *data,
             struct callout_rtn *src );

This routine is invoked immediately after the callout has been copied to its final resting place. The parameters are as follows:

paddr
Physical address of the start of the system page.
vaddr
Virtual address of the system page that allows read/write access (usable only by the kernel).
rtn_offset
Offset from the beginning of the system page to the start of the callout's code.
rw_offset
See the section on "Getting some R/W storage" below.
data
A pointer to arbitrary data registered by callout_register_data() (see above).
src
A pointer to the callout_rtn structure that's being copied into place.

Note: The data and src arguments were added in the QNX Neutrino Core OS 6.3.2. Earlier patcher functions can ignore them.

Here's an example of a patcher routine for an x86 processor:

patch_debug_8250:
    movl    0x4(%esp),%eax              // get paddr of routine
    addl    0xc(%esp),%eax              // ... (system page paddr + rtn_offset)
    movl    0x14(%esp),%edx             // get base info (the data argument)

    movl    DDI_BASE(%edx),%ecx         // patch code with real serial port
    movl    %ecx,0x1(%eax)
    movl    DDI_SHIFT(%edx),%ecx        // patch code with register shift
    movl    $REG_LS,%edx
    shll    %cl,%edx
    movl    %edx,0x6(%eax)
    ret

CALLOUT_START(display_char_8250, 0, patch_debug_8250)
    movl    $0x12345678,%edx            // get serial port base (patched)
    movl    $0x12345678,%ecx            // get serial port shift (patched)

After the display_char_8250() routine has been copied, the patch_debug_8250() routine is invoked, where it modifies the constants in the first two instructions to the appropriate I/O port location and register spacing for the particular board. The patcher routines don't have to be written in assembler, but they typically are to keep them in the same source file as the code they're patching. By arranging the first instructions in a group of related callouts all the same (e.g. debug_char_*(), poll_key_*(), break_detect_*()), the same patcher routine can be used for all of them.
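Since patcher routines don't have to be written in assembler, the same patch can be sketched in C. Treat the following as an illustration only: the paddr_t typedef, the port_info structure, and the REG_LS value of 5 are stand-ins chosen for this sketch, not the startup library's definitions, and the code assumes (as the assembly version does) that the physical address can be dereferenced directly.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for the startup library's types (assumptions for this sketch). */
typedef uintptr_t paddr_t;
struct callout_rtn;                 /* opaque; not used by this patcher */

/* Hypothetical layout of the registered data: port base and register shift. */
struct port_info {
    uint32_t base;
    uint32_t shift;
};

#define REG_LS 5u                   /* assumed 8250 line-status register index */

/* Same job as the assembly patch_debug_8250: overwrite the two 32-bit
 * immediates that follow the one-byte opcodes at offsets 0 and 5. */
void patch_debug_8250_c(paddr_t paddr, paddr_t vaddr,
                        unsigned rtn_offset, unsigned rw_offset,
                        void *data, struct callout_rtn *src)
{
    uint8_t *code = (uint8_t *)(paddr + rtn_offset);
    const struct port_info *info = data;
    uint32_t base = info->base;
    uint32_t reg  = REG_LS << info->shift;

    (void)vaddr; (void)rw_offset; (void)src;    /* unused here */

    memcpy(code + 1, &base, sizeof base);   /* immediate of the first movl  */
    memcpy(code + 6, &reg,  sizeof reg);    /* immediate of the second movl */
}
```

The byte offsets 1 and 6 match the 0x1(%eax) and 0x6(%eax) stores in the assembly version; they depend on the exact instruction encoding at the top of the callout, so any callout sharing this patcher has to start with the same two instructions.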

Getting some R/W storage

Your callouts may need to have access to some static read/write storage. Normally this wouldn't be possible because of the position-independent requirements of a callout. But you can do it by using the patcher routines and the second parameter to CALLOUT_START. The second parameter to CALLOUT_START is the address of a four-byte variable that contains the amount of read/write storage the callout needs. For example:

rw_interrupt:
	.long	4

patch_interrupt:
	add		a1,a1,a2
	j		ra
	sh		a3,0+LOW16(a1)

/*
 * Mask the specified interrupt
 */
CALLOUT_START(interrupt_mask_mips, rw_interrupt, patch_interrupt)
	/*
	 * Input Parameters :
	 *      a0 - syspage_ptr
	 *      a1 - Interrupt Number
	 * Returns:
	 *      v0 - error status
	 */

	/*
	 * Mark the interrupt disabled
	 */
	la		t3,0x1234(a0)			# get enabled levels addr (patched)
	li		t1, MIPS_SREG_IMASK0

Passing the rw_interrupt address as the second parameter tells the startup library that the routine needs four bytes of read/write storage (because the value stored at that location is 4). The startup library allocates space at the end of the system page and passes the offset to it as the rw_offset parameter of the patcher routine. The patcher routine then modifies the initial instruction of the callout to use the appropriate offset. While the callout is executing, the t3 register will contain a pointer to the read/write storage.

The question you're undoubtedly asking at this point is: Why is the CALLOUT_START parameter the address of a location containing the amount of storage? Why not just pass the amount of storage directly?

That's a fair question. It's all part of a clever plan. A group of related callouts may want to have access to shared storage so that they can pass information among themselves. The library passes the same rw_offset value to the patcher routine for all routines that share the same address as the second parameter to CALLOUT_START. In other words:

CALLOUT_START(interrupt_mask_mips, rw_interrupt, patch_interrupt)

CALLOUT_START(interrupt_unmask_mips, rw_interrupt, patch_interrupt)

CALLOUT_START(interrupt_eoi_mips, rw_interrupt, patch_interrupt)

CALLOUT_START(interrupt_id_mips, rw_interrupt, patch_interrupt)

will all get the same rw_offset parameter value passed to patch_interrupt() and thus will share the same read/write storage.
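The sharing rule, and what the patcher then does with the offset, can be sketched in C. This is a simulation under stated assumptions rather than the library's code: rw_offset_for(), patch_imm16(), the fixed-size table, and the starting offset of 0x1000 are all invented for illustration, and patch_imm16() assumes the patched instruction keeps its offset in the low 16 bits of a 32-bit instruction word.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_RW_AREAS 16

struct rw_area {
    const uint32_t *size_var;   /* address given as CALLOUT_START's 2nd parameter */
    unsigned        offset;     /* value handed to the patcher as rw_offset */
};

static struct rw_area areas[MAX_RW_AREAS];
static size_t         n_areas;
static unsigned       next_offset = 0x1000;   /* arbitrary start for this demo */

/* Return the rw_offset for a size variable, allocating storage on first use.
 * Callouts that name the same variable get the same offset back. */
unsigned rw_offset_for(const uint32_t *size_var)
{
    size_t i;

    for (i = 0; i < n_areas; i++) {
        if (areas[i].size_var == size_var)
            return areas[i].offset;
    }
    if (n_areas == MAX_RW_AREAS)
        return 0;                       /* table full; this sketch just reports 0 */

    areas[n_areas].size_var = size_var;
    areas[n_areas].offset   = next_offset;
    next_offset += *size_var;           /* reserve the requested number of bytes */
    return areas[n_areas++].offset;
}

/* Replace the 16-bit immediate of an instruction word with the low 16 bits
 * of rw_offset, leaving the opcode and register fields untouched. */
uint32_t patch_imm16(uint32_t insn, unsigned rw_offset)
{
    return (insn & 0xFFFF0000u) | ((uint32_t)rw_offset & 0xFFFFu);
}
```

Keying the lookup on the address rather than on the size is what lets two callouts that both ask for four bytes either share one area (same variable) or get separate areas (different variables).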

The exception that proves the rule

To clean up a final point, the interrupt_id() and interrupt_eoi() routines aren't called as normal routines. Instead, for performance reasons, the kernel intermixes these routines directly with kernel code -- the normal function-calling conventions aren't followed. The callout_interrupt_*.s files in the startup library will have a description of what registers are used to pass values into and out of these callouts for your particular CPU. Note also that you can't return from the middle of the routine as you normally would. Instead, you're required to "fall off the end" of the code.

PPC chips support

The PPC startup library has been modified in order to:

The new routines and data variables all begin with ppcv_ for PPC variant, and are separated out into one function or data variable per source file. This separation allows maximum code reuse and minimum code duplication.

There are two new data structures:

The first is:

    struct ppcv_chip {
        unsigned short  chip;
        uint8_t         paddr_bits;
        uint8_t         cache_lsize;
        unsigned short  icache_lines;
        unsigned short  dcache_lines;
        unsigned        cpu_flags;
        unsigned        pretend_cpu;
        const char      *name;
        void            (*setup)(void);
    };

Every supported CPU has a statically initialized variable of this type (in its own source file, e.g. <ppcv_chip_603e7.c>).

If the chip field matches the upper 16 bits of the PVR register, this ppcv_chip structure is selected and the ppcv global variable in the library is pointed at it. Only the upper 16 bits are checked, so you can use constants like PPC_750 defined in <ppc/cpu.h> when initializing the field.

The paddr_bits field is the number of physical address lines on the chip, usually 32.

The cache_lsize field is the base-2 logarithm of the chip's cache line size in bytes, usually 5 (32-byte lines), but sometimes 4 (16-byte lines).

The icache_lines and dcache_lines are the number of lines in the instruction and data cache, respectively.
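As a worked example of how these three fields fit together, the following sketch reads cache_lsize as the base-2 logarithm of the line size in bytes (an interpretation that fits the typical values of 5 and 4). The helper names are made up for illustration and aren't part of the startup library.

```c
#include <stdint.h>

/* Cache line size in bytes, given the log2 value stored in cache_lsize. */
uint32_t cache_line_bytes(uint8_t cache_lsize)
{
    return (uint32_t)1 << cache_lsize;
}

/* Total cache size in bytes: number of lines times bytes per line. */
uint32_t cache_total_bytes(uint16_t lines, uint8_t cache_lsize)
{
    return (uint32_t)lines << cache_lsize;
}
```

Under that reading, a chip described with cache_lsize set to 5 and icache_lines set to 1024 would have a 32 KB instruction cache.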

The cpu_flags field holds the PPC_CPU_* flag constants from <ppc/syspage.h> that are appropriate for this CPU. Note that the older startups sometimes left out flags like PPC_CPU_HW_HT and depended on the kernel to check the PVR and turn them on if appropriate. This is no longer the case. The kernel will continue to turn on those bits if it detects an old style startup, but will NOT with a new style one.

The pretend_cpu field goes into the ppc_kerinfo_entry.pretend_cpu field of the system page and, as before, tells the kernel that even though it doesn't recognize the PVR, it can act as if the chip were the pretend one.

The name field is the string name of the CPU that gets put in the cpuinfo section.

The setup function is called when a particular ppcv_chip structure has been selected by the library as the one to use. It continues the library customization process by filling the second new structure.
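The selection step described above (match the upper 16 bits of the PVR, then call the chosen structure's setup function) can be sketched as follows. The function name select_chip() and the NULL-terminated list are assumptions made for this example; the library's own lookup code may well be organized differently.

```c
#include <stddef.h>
#include <stdint.h>

struct ppcv_chip {
    unsigned short  chip;
    uint8_t         paddr_bits;
    uint8_t         cache_lsize;
    unsigned short  icache_lines;
    unsigned short  dcache_lines;
    unsigned        cpu_flags;
    unsigned        pretend_cpu;
    const char      *name;
    void            (*setup)(void);
};

/* Walk a NULL-terminated list (the termination style is an assumption) and
 * return the entry whose chip field equals the upper 16 bits of the PVR. */
const struct ppcv_chip *select_chip(const struct ppcv_chip *const *list,
                                    uint32_t pvr)
{
    unsigned short id = (unsigned short)(pvr >> 16);

    for (; *list != NULL; list++) {
        if ((*list)->chip == id) {
            if ((*list)->setup != NULL)
                (*list)->setup();       /* continue the customization */
            return *list;
        }
    }
    return NULL;                        /* no matching chip in the list */
}
```

Because the low 16 bits of the PVR are ignored, a single ppcv_chip entry covers every revision of a given chip.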

The second data structure is:

struct ppcv_config {
    unsigned    family;
    void        (*cpuconfig1)(int cpu);
    void        (*cpuconfig2)(int cpu);
    void        (*cpuinfo)(struct cpuinfo_entry *cpu);
    void        (*qtime)(void);
    void        *(*map)(unsigned size, paddr_t phys,
                                   unsigned prot_flags);
    void        (*unmap)(void *);
    int         (*mmu_info)(enum mmu_info info, unsigned tlb);
//NYI: tlb_read/write
};

There's a single variable defined of this type in the library, called ppcv_config. The setup function identified by the selected ppcv_chip is responsible for filling in the fields with the appropriate routines for the chip. The variable is statically initialized with a set of do-nothing routines, so if a particular chip doesn't need something done in one spot (typically the cpuconfig[1/2] routines), the setup routine doesn't have to fill anything in.
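The "do-nothing defaults" pattern might look like the sketch below. It uses a cut-down structure and invented names (ppcv_config_demo, config_demo, setup_xyz(), and a placeholder family value of 1), so it shows the idea rather than the library's actual definitions.

```c
#include <stddef.h>

/* Reduced version of struct ppcv_config: only the fields this sketch uses. */
struct ppcv_config_demo {
    unsigned family;
    void   (*cpuconfig1)(int cpu);
    void   (*cpuconfig2)(int cpu);
};

static void do_nothing(int cpu) { (void)cpu; }

/* Statically initialized with do-nothing routines, like the library variable. */
struct ppcv_config_demo config_demo = { 0, do_nothing, do_nothing };

int cpuconfig2_calls;       /* demo counter so the effect is observable */

static void cpuconfig2_xyz(int cpu) { (void)cpu; cpuconfig2_calls++; }

/* Hypothetical setup routine: fills in only what chip "xyz" needs. */
void setup_xyz(void)
{
    config_demo.family     = 1;     /* placeholder for a PPC_FAMILY_* value */
    config_demo.cpuconfig2 = cpuconfig2_xyz;
    /* cpuconfig1 is left alone, so it stays the do-nothing default */
}
```

The benefit of this arrangement is that callers can invoke every function pointer unconditionally, without checking for NULL first.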

The general design rule for these routines is that they should perform every chip-specific action that isn't also board-specific. For example, the old startup main() functions would sometimes turn off data translation, since some IPLs turned it on. With the new startups, the library handles this automatically. On the other hand, both the old and new startups call ppc700_init_l2_cache() manually in main(), since the exact bits to put in the L2CR register are board-specific. Modify the library routines to work with the IPL and initialize the CPU properly, rather than modifying the board-specific code to hack around the problem (e.g. the aforementioned disabling of data translation).

The setup routine might also initialize a couple of other freestanding variables that other support routines use, so that they don't have to check the PVR value again (see the ppc600_set_clock_freqs() and ppcv_setup_7450() functions for an example).

The new startup (and kernel, when used with a new startup) no longer depends on the PVR to identify the chip family. Instead the "family" field is filled in with a PPC_FAMILY_* value from <ppc/syspage.h>. This is transferred to the field on the system page, which the kernel uses to verify that the right version of procnto is being used.

If the kernel sees a value of PPC_FAMILY_UNKNOWN (zero) in the system page, it assumes that an old style startup is being used and will attempt to determine the family (and cpuinfo->flags) fields on its own. DO NOT USE that feature with new startups. Fill in the family and ppcv_chip.cpu_flags fields properly.

The cpuconfig1 routine is used to configure the CPU for use in startup, and is called early, before main() is called. For example, it makes sure that instruction and data translation is turned off, the exception table is pointed at low memory, etc. It's called once for every CPU in an SMP system, with the cpu parameter indicating the CPU number being initialized.

The cpuconfig2 routine is called just before startup transfers control to the first bootstrap executable in the image file system. It configures the CPU for running in the bootstrap environment, e.g. turning on CPU-specific features such as HID0 and HID1 bits. Again, it's called once per CPU in an SMP system, with the cpu parameter indicating which one.

The cpuinfo routine is called by init_one_cpuinfo() to fill in the cpuinfo_entry structure for each CPU. The qtime routine is called by init_qtime() to set up the qtime syspage section.

The map and unmap routines, used to create/delete memory mappings for startup and callout use, are called by:

There's one more data variable to mention. This is ppcv_list, which is a statically initialized array of pointers to ppcv_chip structures. The default version of the variable in the library has a list of all the ppcv_chip variables defined by the library so, by default, the library is capable of handling any type of PPC chip.

By defining a ppcv_list variable in the board-specific directory and adding only the ppcv_chip_* variable(s) that can be used with that board, all the chip-specific code for the processors that can't possibly be there will be left out.

For example, the new shasta-ssc startup with the default ppcv_list is about 1 KB bigger than the old version. By restricting the ppcv_list to only ppcv_chip_750, the new startup drops to 1 KB smaller than the original.
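A board-specific override could look something like the following. The cut-down structure, the placeholder chip value, and the NULL terminator are assumptions made so the sketch compiles on its own; in a real startup, ppcv_chip_750 comes from the library and struct ppcv_chip has all the fields shown earlier.

```c
#include <stddef.h>

/* Minimal stand-in; the real struct ppcv_chip has more fields. */
struct ppcv_chip {
    unsigned short  chip;
    const char      *name;
};

/* In a real startup this variable is supplied by the library. It's defined
 * here, with a placeholder chip value, only to keep the sketch standalone. */
struct ppcv_chip ppcv_chip_750 = { 0x0008, "750" };

/* Board-specific override: list only the chips this board can carry.
 * NULL termination is an assumption about how the library walks the list. */
struct ppcv_chip *ppcv_list[] = {
    &ppcv_chip_750,
    NULL
};
```

Presumably the size saving comes from the linker: once the board directory defines ppcv_list, the library's own <ppcv_list.c> isn't pulled in, and neither are the chip variables and routines that only it referenced. That would also explain why each function and variable lives in its own source file.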

Adding a new CPU to the startup library

For a CPU called xyz, create a <ppcv_chip_xyz.c> and in it put an appropriately initialized struct ppcv_chip ppcv_chip_xyz variable. Add the ppcv_chip_xyz variable to the default ppcv_list (in <ppcv_list.c>).

If you were able to use an already existing ppcv_setup_*() function for the ppcv_chip_xyz initialization, you're done. Otherwise, create a <ppcv_setup_xyz.c> file with the properly coded ppcv_setup_xyz() function in it (don't forget to add the prototype to <cpu_startup.h>).

If you were able to use already existing ppcv_* routines in the ppcv_setup_xyz() function, you're done. Otherwise, create the routines in the appropriate <ppcv_*_xyz.c> files (don't forget to add the prototype(s) to <cpu_startup.h>). When possible, code the routines in an object-oriented manner, calling already existing routines to fill more generic information, e.g. ppcv_cpuconfig2_700() uses ppcv_cpuconfig2_600() to do most of the work and then it just fills in the 700 series-specific info.
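The layering described above (a chip-specific routine that calls a more generic one and then adds its own part) can be sketched like this. The names cpuconfig2_generic() and ppcv_cpuconfig2_xyz(), and the trace buffer used to make the call order visible, are hypothetical and exist only for this example.

```c
/* Trace of which routines ran, so the layering is observable in this sketch. */
char trace[8];
int  trace_len;

static void note(char c)
{
    if (trace_len < 7)
        trace[trace_len++] = c;
}

/* Stand-in for an existing, more generic routine in the library. */
void cpuconfig2_generic(int cpu)
{
    (void)cpu;
    note('g');                  /* generic work shared by a whole family */
}

/* Hypothetical routine for CPU "xyz": reuse the generic one, then add
 * only the xyz-specific part. */
void ppcv_cpuconfig2_xyz(int cpu)
{
    cpuconfig2_generic(cpu);
    note('x');                  /* xyz-specific settings would go here */
}
```

Calling the generic routine first keeps the family-wide settings in one place, so a fix there benefits every chip layered on top of it.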

With the new design, the following routines are now deprecated (and they spit out a message to that effect if you call them):

ppc600_init_features(), ppc600_init_caches(), ppc600_flush_caches()
Handled automatically by the library now.
Use ppc700_init_l2_cache() instead.