Embedded Linux start-up time optimization

November 2015 Friday 6 May

Some applications have special requirements for system start-up time. In many cases these systems do not need every task to be ready immediately, but a few key tasks (for example, receiving Ethernet traffic or displaying a user interface) must be able to respond quickly. This post presents some simple steps and methods for optimizing the start-up time of Toradex system-on-modules.


Note: some of the methods described here require recompiling U-Boot, the kernel, and the root file system. Please refer to the relevant articles on our Developer Center website.

Before we start optimizing, we need a proper way to measure start-up time. Very accurate measurements need to involve hardware (such as a GPIO and an oscilloscope). In most cases, monitoring the output on the serial console is accurate enough. Tim Bird's grabserial is a widely used tool for adding timing information to serial console output. It prefixes each received line with timestamp information, as shown below:

$ ./grabserial -d /dev/ttyUSB1 -t
[0.000002 0.000002]
[0.000171 0.000169]
[0.000216 0.000045] U-Boot 2015.04-00006-g6762920 (Oct 12 2015 - 15:35:50)
[0.005177 0.004961]
[0.005227 0.000050] CPU: Freescale Vybrid VF610 at 500 MHz
[0.008938 0.003711] Reset cause: POWER ON RESET
[0.011153 0.002215] DRAM:  256 MiB
[0.063692 0.052539] NAND:  512 MiB
[0.065568 0.001876] MMC:   FSL_SDHC: 0

The first number is the timestamp (counted from the first character received); the second number is the time elapsed between the previous line and the current line.
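A capture like the one above can be post-processed to find where time is lost. The following Python sketch is illustrative only: the function name `slowest_lines` and the regular expression are my own, and it assumes every line carries the `[timestamp delta]` prefix shown above.

```python
import re

# Matches the "[timestamp delta]" prefix that grabserial -t adds to each line.
STAMP = re.compile(r"^\[(\d+\.\d+) (\d+\.\d+)\]\s?(.*)$")

def slowest_lines(capture, top=3):
    """Return the `top` (delta_seconds, text) pairs with the largest delta."""
    rows = []
    for line in capture.splitlines():
        m = STAMP.match(line.strip())
        if m:
            rows.append((float(m.group(2)), m.group(3)))
    return sorted(rows, key=lambda r: r[0], reverse=True)[:top]

# A few lines copied from the capture above.
SAMPLE = """\
[0.005227 0.000050] CPU: Freescale Vybrid VF610 at 500 MHz
[0.008938 0.003711] Reset cause: POWER ON RESET
[0.011153 0.002215] DRAM:  256 MiB
[0.063692 0.052539] NAND:  512 MiB
[0.065568 0.001876] MMC:   FSL_SDHC: 0
"""

for delta, text in slowest_lines(SAMPLE):
    print(f"{delta * 1000:8.3f} ms  {text}")
```

In this excerpt the largest gap (about 52 ms) precedes the NAND line, which makes it the first place to look.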

This article applies to essentially all of our modules. However, the methods I used and the improvements measured are based on the NXP®/Freescale Vybrid-based Colibri VF61 module.

Booting a Linux system can be divided into three phases, which this article discusses one by one:

  • Boot loader
  • Linux kernel
  • User space (init system)
Video: Colibri VF61 Linux 2 seconds faster start

Boot loader
Before the boot loader actually starts, there are two steps: hardware initialization and the boot ROM. The hardware initialization sequence has to satisfy the power-on sequencing and reset timing requirements of the power rails and the processor; this stage typically takes a fixed 10 to 200 ms. The Arm processor then starts executing firmware located in the internal boot ROM, which loads the boot loader from the boot media. This phase is generally very short, and its duration depends on the size of the boot loader. Apart from reducing the size of the boot loader, there is little that can be optimized here. In practice, the first place where optimization and tuning are possible is the boot loader itself (U-Boot).

With the current V2.5 Beta 1 release, the time from the first character output to the start of the kernel is about 1.85 seconds. This mainly consists of the following steps:

  • U-Boot initialization (about 110 ms, measured from the first received character)
  • Autoboot delay (1 s)
  • UBI and UBIFS initialization and mounting (about 300 ms thanks to the Fastmap feature; without it this would take 1.6 s)
  • Loading the kernel (375 ms)
  • Loading and applying the device tree (about 35 ms)
  • Finally jump to the kernel start address
Boot time to Kernel start: ~1850ms
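
As a quick sanity check, the phase durations listed above can be added up and compared with the measured total. This is only bookkeeping on the numbers from this article; the dictionary keys are my own labels.

```python
# Phase durations in milliseconds, taken from the list above.
phases_ms = {
    "U-Boot initialization": 110,
    "Autoboot delay": 1000,
    "UBI/UBIFS initialization": 300,
    "Kernel load": 375,
    "Device tree load": 35,
}
measured_total_ms = 1850  # first character to kernel start

accounted_ms = sum(phases_ms.values())
unaccounted_ms = measured_total_ms - accounted_ms
autoboot_share = phases_ms["Autoboot delay"] / measured_total_ms

print(f"accounted for: {accounted_ms} ms, remainder: {unaccounted_ms} ms")
print(f"autoboot delay share: {autoboot_share:.0%}")
```

The listed phases cover all but a few tens of milliseconds of the measured total, and the autoboot delay alone is more than half of it.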

The most obvious optimization is to reduce the autoboot delay. It can be set to 0 with the following commands:

setenv bootdelay 0
saveenv

CONFIG_BOOTDELAY can also be used to make this the compiled-in default. In the current release, if bootdelay is set to 0 there is no direct way to enter the boot loader command line. The U-Boot option CONFIG_ZERO_BOOTDELAY_CHECK makes U-Boot check for a key press even when bootdelay is 0. It will be part of the default configuration in the next release.

Boot time to Kernel start with this improvement: ~860ms

Serial output is transmitted synchronously. This means that the CPU waits until each character has been sent over the serial line, so every character printed slows down U-Boot. UBI in particular prints a lot of messages, which makes it a good candidate for optimization. The configuration flag CONFIG_UBI_SILENCE_MSG serves this purpose.

Boot time to Kernel start with this improvement: ~800ms
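
To get a feel for what synchronous output costs, here is a rough estimate in Python. The 115200 baud 8N1 line setting is an assumption (it is not stated above), and the helper names are made up for illustration. The estimate treats the whole 60 ms saving (860 ms down to 800 ms) as pure transmission time, which is a simplification.

```python
def char_time_s(baud, bits_per_char=10):
    """Time to send one character; 8N1 framing is 1 start + 8 data + 1 stop bit."""
    return bits_per_char / baud

def chars_in(duration_s, baud):
    """Number of characters that fit into `duration_s` of pure transmission."""
    return duration_s / char_time_s(baud)

BAUD = 115200  # assumed console speed, not taken from the article

print(f"one character: {char_time_s(BAUD) * 1e6:.1f} us")
print(f"60 ms of output: about {chars_in(0.060, BAUD):.0f} characters")
```

Under these assumptions, the time saved by silencing UBI corresponds to a few hundred characters of console output.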

Making sure the hardware is used as efficiently as possible requires a deep understanding of the hardware's capabilities and of the current implementation. One feature that was not being used is the Level 2 cache (Colibri VF61 only). Enabling the Level 2 cache improves start-up time by 40 ms.

Boot time to Kernel start with this improvement: ~760ms

Removing features helps reduce the time spent loading and initializing them. Candidates include display support (DCU), EXT3/EXT4 support, and USB drivers and gadgets such as mass storage and DFU. This reduces the U-Boot size to 366 KB and saves 10 ms.

Boot time to Kernel start with this improvement: ~750ms

The timestamps show that most of the remaining time is spent attaching UBI and mounting UBIFS, and loading the kernel (about 380 ms). Kernel size and load time are obviously linearly related, so reducing the kernel size improves start-up time further.

Kernel
To measure only the kernel boot time, grabserial's match option can be used to reset the timestamps at the last line printed by the boot loader:

./grabserial -d /dev/ttyUSB1 -t -m "^Starting kernel.*"

The end of kernel start-up is somewhat hard to pin down, because the kernel continues to initialize hardware even after the root file system has been mounted and the first user-space process (init) is running (deferred initialization). "Freeing unused kernel memory" is the last message printed before the init process starts, and thus marks the end of the kernel's linear start-up tasks (see kernel_init in init/main.c). I use this message as the end point when comparing timestamps. The default compressed kernel for our modules is 4316 KB in size and takes 2.56 seconds to start.

Kernel boot time to Init start: 2.56s

Like U-Boot, the Linux kernel also writes its messages to the serial port synchronously. The details depend on the serial driver; LPUART (the console driver on Vybrid) waits until each character has been transmitted on the serial port. The advantage is that when the kernel panics, all messages up to that point are visible. If output were asynchronous, the last messages printed would not show where the kernel crashed.

The kernel parameter "quiet" minimizes the output. However, it also suppresses the message we use to measure the start-up time ("Freeing unused kernel memory"). The easiest workaround is to print that particular message at a higher log level: search for "Freeing %s memory" in 'mm/page_alloc.c' and print it with 'pr_alert'. This reduces the time by as much as 1.55 seconds compared to the normal boot.

Kernel boot time to Init start with this improvement: ~1.01s

Another simple way to improve start-up time further is to remove features. The Yocto Project provides a handy tool, ksize.py, which has to be run in the kernel build directory. It shows the size of each part of the kernel. The first table gives a rough overview (for accurate figures, run make clean before compiling):

Linux Kernel              total |       text       data        bss
-------------------------------------------------------------------
vmlinux                 8305381 |    7882273     247732     175376
drivers/built-in.o      2010229 |    1881545     109796      18888
fs/built-in.o           1944926 |    1911100      19422      14404
net/built-in.o          1477404 |    1398316      44832      34256
kernel/built-in.o        628094 |     514935      17099      96060
sound/built-in.o         326322 |     316298       8248       1776
mm/built-in.o            288456 |     276492       8000       3964
lib/built-in.o           160209 |     157659        217       2333
block/built-in.o         137262 |     133614       2420       1228
crypto/built-in.o        104157 |     100063       4082         12
security/built-in.o       37391 |      36303        788        300
init/built-in.o           31064 |      16208      14772         84
ipc/built-in.o            29366 |      28640        722          4
usr/built-in.o              138 |        138          0          0
-------------------------------------------------------------------
sum                     7175018 |    6771311     230398     173309
delta                   1130363 |    1110962      17334       2067

What can safely be removed generally depends on the application. Looking through the top-level directories first helps to quickly identify the most likely candidates. For this article, I removed several file systems (cifs, nfs, ext4, ntfs), the audio subsystem, multimedia support, USB support, and Wi-Fi adapter drivers. The kernel shrank to 3356 KB, nearly 1 MB smaller than the original. This also reduces the kernel load time by about 85 ms.

Kernel boot time to Init start with this improvement: ~0.90s
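
The load-time saving can be cross-checked against the linear relation between kernel size and load time noted in the boot loader section. The sketch below uses only figures from this article (375 ms to load 4316 KB) and assumes the load time scales strictly proportionally with size, which ignores any fixed overhead.

```python
ORIGINAL_SIZE_KB = 4316   # default compressed kernel
ORIGINAL_LOAD_MS = 375    # kernel load time measured in U-Boot
REDUCED_SIZE_KB = 3356    # kernel after removing features

# Assumption: load time is proportional to image size (no fixed overhead).
ms_per_kb = ORIGINAL_LOAD_MS / ORIGINAL_SIZE_KB
predicted_load_ms = REDUCED_SIZE_KB * ms_per_kb
predicted_saving_ms = ORIGINAL_LOAD_MS - predicted_load_ms

print(f"predicted load time: {predicted_load_ms:.0f} ms")
print(f"predicted saving:    {predicted_saving_ms:.0f} ms")
```

The proportional estimate lands within a few milliseconds of the roughly 85 ms measured, which supports treating load time as linear in kernel size.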

Another way to improve start-up time is to use a different compression algorithm. However, our default kernel configuration already uses LZO, so this potential has already been exploited.

User Space
In Linux user space, the initial work is done by the init system. Toradex BSP images use the standard init system of the Ångström distribution, systemd. systemd has become the standard init system on desktop Linux and has a wealth of features, designed especially for dynamic systems. systemd also affects start-up time: multiple daemons can be started simultaneously (making use of today's multi-core systems); socket activation allows services to be loaded on demand at a later time; and device activation allows services to be started when the corresponding device appears. In addition, the integrated logging daemon journald uses binary log files, and its log management helps save space.

Depending on the application, an embedded system can be quite static, in which case the dynamic features of systemd are not needed. Unfortunately, systemd is not very modular, and its components depend on each other, which makes it hard to slim down. This section is divided into two parts: the first covers techniques for optimizing start-up with systemd, the second uses System V init and other approaches.

In both parts, we use "Freeing unused kernel memory" as the reference point for measurements:

./grabserial -d /dev/ttyUSB1 -t -m "^\[ *[]0-9.]* Freeing unused kernel memory.*"
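
The two patterns passed to -m in this article can be checked offline before a measurement run. The sketch below assumes that grabserial evaluates the pattern with Python's re module against the raw console line; the sample lines and the helper `first_match` are made up for illustration.

```python
import re

# Patterns copied from the grabserial commands in this article.
KERNEL_START = r"^Starting kernel.*"
INIT_START = r"^\[ *[]0-9.]* Freeing unused kernel memory.*"

# Hypothetical console lines, made up for illustration only.
samples = [
    "Starting kernel ...",
    "[    0.000000] Booting Linux on physical CPU 0x0",
    "[    2.560000] Freeing unused kernel memory: 284K",
]

def first_match(pattern, lines):
    """Return the first line that matches `pattern` at its start, or None."""
    for line in lines:
        if re.match(pattern, line):
            return line
    return None

print(first_match(KERNEL_START, samples))
print(first_match(INIT_START, samples))
```

The character class `[]0-9.]` contains a literal `]` (placed first), the digits, and the dot, so it consumes the kernel timestamp together with its closing bracket.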

systemd
In this blog post, we define the login shell on the serial console as the critical task. The login shell service is defined with "Type=idle", which by definition means it runs only after all other services have been started.

To start a headless or framebuffer-based application, you generally need to create a new service. systemd lets a service declare the conditions it needs before it runs (for example, "Wants=network-online.target" for networking) and automatically starts the service once those conditions are met. Because services are started in parallel, they have to share CPU resources. It is nevertheless possible for such an application to be up and running before the serial console login is available, so the figures below may be higher than the start-up time of such an application.

User space boot time to Login without improvements: ~8.6s

The kernel parameter quiet also applies to systemd. This reduces systemd's start-up time by about 1.6 seconds.

User space boot time to Login with this improvement: ~6.5s

systemd provides the systemd-analyze tool; with the "blame" argument it prints each service together with the time it took to start. This helps find the services that consume the most start-up time. The values can be misleading, however, because what is measured is elapsed wall-clock time. A service may be sleeping while the CPU is actually busy with other tasks, so the services at the top of the list are not necessarily the ones consuming the most CPU time, especially on a single-core system.

Services can be turned off with the disable command. Some services (particularly those provided by systemd itself) may need to be masked instead. Some services may also be required for the system to work, so take special care when turning services off, and disable only one at a time. For this blog post, the following services were disabled:

systemctl disable usbg
systemctl disable connman.service # replaced with networkd
systemctl mask alsa-restore.service
User space boot time to Login with this improvement: ~6.1s

The system log daemon that comes with systemd is journald. It is one of the components that should not be disabled completely. At start-up, the log daemon manages and deletes old log files on disk and writes new ones. Preventing logs from being written to disk improves start-up time, at the cost that no log files are saved. Setting Storage=none in /etc/systemd/journald.conf disables log storage.

User space boot time to Login with this improvement: ~5.6s

System V init and other methods
For a long time, SysV init was the standard init system on Linux. As a script-based system it is modular and relatively easy to slim down. Especially for relatively static systems that do not need systemd's device activation and socket activation, SysV can be a good choice.

My last post [on building the Yocto Project reference distribution for Toradex hardware] mentioned that the Yocto Project reference distribution "poky" uses SysV by default. Using the 'minimal-console-image' and a static IP configuration, user-space start-up on the Colibri VF61 takes about 2.3 s.

User space boot time to Shell with System V: ~2.3s

The meta-yocto layer also provides the 'poky-tiny' distribution, which uses a plain shell script as its init system. Simply select this distribution and build a regular Yocto image with it, for example 'console-image-minimal'. The distribution is meant to be used as an initramfs. However, by removing MACHINE_ESSENTIAL_EXTRA_RDEPENDS, IMAGE_FSTYPES and PREFERRED_PROVIDER_virtual/kernel from conf/distro/poky-tiny.conf, I was able to build a usable UBIFS image. To make these configuration changes properly, you need to create a new distribution layer and copy the configuration file into it. With this setup the shell starts quickly (220 ms), and the overall time from power-on until simple commands can be executed is less than 2 seconds. Of course, this only mounts the root file system and provides some basic virtual file systems and a shell. Still, for projects that need only a limited set of features, this can be a good starting point.

User space boot time to Shell with a Shell script only: ~0.2s
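
Adding up the per-phase figures from this article gives a rough before/after picture. This is a simplified tally: it combines numbers measured in different configurations, it ignores hardware initialization and the boot ROM before the first character, and it does not subtract the shorter load time of the smaller kernel from the boot loader figure.

```python
# Per-phase figures in milliseconds, taken from the results in this article.
before_ms = {"boot loader": 1850, "kernel": 2560, "user space (systemd)": 8600}
after_ms = {"boot loader": 750, "kernel": 900, "user space (shell script)": 220}

total_before_ms = sum(before_ms.values())
total_after_ms = sum(after_ms.values())

print(f"before: {total_before_ms / 1000:.2f} s")
print(f"after:  {total_after_ms / 1000:.2f} s")
print(f"saved:  {(total_before_ms - total_after_ms) / 1000:.2f} s")
```

The optimized total stays below the 2 second mark mentioned above, with most of the saving coming from user space.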

More resources:
http://free-electrons.com/doc/training/boot-time/boot-time-slides.pdf