Boot stages#
To provide its functionality, cloud-init must be integrated into the boot process in a fairly controlled way. There are five stages to boot:
Generator
Local
Network
Config
Final
Generator#
When booting under systemd, a generator will run that determines if cloud-init.target should be included in the boot goals. ds-identify runs at this stage.
Local#
systemd service |
runs | as soon as possible with
blocks | as much of boot as possible, must block network
modules | none
The purpose of the local stage is to:
Locate “local” data sources, and
Apply networking configuration to the system (including “fallback”).
In most cases, this stage does not do much more than that. It finds the datasource and determines the network configuration to be used. That network configuration can come from:
datasource: Cloud-provided network configuration via metadata.
fallback: Cloud-init’s fallback networking consists of rendering the equivalent to dhcp on eth0, which was historically the most popular mechanism for network configuration of a guest.
none: Network configuration can be disabled by writing the file /etc/cloud/cloud.cfg with the content: network: {config: disabled}.
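The three sources above can be pictured as a small selection function. This is only a minimal sketch: `select_network_config`, `FALLBACK_CONFIG`, and the argument names are hypothetical and are not cloud-init's internal API, and the order in which the sources are checked is an assumption made for illustration.

```python
from typing import Optional, Tuple

# Illustrative stand-in for "dhcp on eth0"; the exact structure that
# cloud-init renders is not specified in this section.
FALLBACK_CONFIG = {
    "version": 1,
    "config": [
        {"type": "physical", "name": "eth0", "subnets": [{"type": "dhcp"}]}
    ],
}


def select_network_config(
    system_cfg: Optional[dict], datasource_cfg: Optional[dict]
) -> Tuple[str, Optional[dict]]:
    """Return (source, config) for the network configuration to render.

    Hypothetical helper. The precedence used here (disabled first, then
    datasource, then fallback) is an assumption of this sketch.
    """
    network = (system_cfg or {}).get("network") or {}
    if network.get("config") == "disabled":
        # Corresponds to "network: {config: disabled}" in the system config.
        return "none", None
    if datasource_cfg:
        # Cloud-provided network configuration via metadata.
        return "datasource", datasource_cfg
    # Nothing else available: fall back to DHCP on eth0.
    return "fallback", FALLBACK_CONFIG
```

For example, `select_network_config({"network": {"config": "disabled"}}, None)` returns `("none", None)`, while calling it with no system or datasource configuration selects the fallback.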
If this is an instance’s first boot, then the selected network configuration is rendered. This includes clearing of all previous (stale) configuration including persistent device naming with old MAC addresses.
This stage must block network bring-up or any stale configuration that might have already been applied. Otherwise, that could have negative effects such as DHCP hooks or broadcast of an old hostname. It would also put the system in an odd state to recover from, as it may then have to restart network devices.
Cloud-init then exits and expects the continued boot of the operating system to bring network configuration up as configured.
Note
In the past, local datasources have been only those that were available without network (such as ‘ConfigDrive’). However, as seen in the recent additions to the DigitalOcean datasource, even data sources that require a network can operate at this stage.
Network#
systemd service |
runs | after local stage and configured networking is up
blocks | as much of remaining boot as possible
modules | cloud_init_modules in
This stage requires all configured networking to be online, as it will fully process any user data that is found. Here, processing means it will:
retrieve any #include or #include-once (recursively) including http,
decompress any compressed content, and
run any part-handler found.
This stage runs the disk_setup and mounts modules, which may partition and format disks and configure mount points (such as in /etc/fstab).
Those modules cannot run earlier as they may receive configuration input
from sources only available via the network. For example, a user may have
provided user data in a network resource that describes how local mounts
should be done.
On some clouds, such as Azure, this stage will create filesystems to be mounted, including ones that have stale (previous instance) references in /etc/fstab. As such, entries in /etc/fstab other than those necessary for cloud-init to run should not be made until after this stage.
A part-handler will run at this stage, as will boothooks including cloud-config bootcmd. Users of this functionality must be aware that the system is still in the process of booting when their code runs.
Config#
systemd service |
runs | after network
blocks | nothing
modules | cloud_config_modules in
This stage runs config modules only. Modules that do not really have an effect on other stages of boot are run here, including runcmd.
Final#
systemd service |
runs | as final part of boot (traditional “rc.local”)
blocks | nothing
modules | cloud_final_modules in
This stage runs as late in boot as possible. Any scripts that a user is accustomed to running after logging into a system should run correctly here. Things that run here include:
package installations,
configuration management plugins (Ansible, Puppet, Chef, salt-minion), and
user-defined scripts (i.e., shell scripts passed as user data).
For scripts external to cloud-init that need to wait until cloud-init is finished, the cloud-init status --wait subcommand can block them until cloud-init is done, without having to write your own systemd unit dependency chains. See status for more info.
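As a minimal sketch of how an external script might use this, the helper below shells out to cloud-init status --wait and reports whether it exited cleanly. The function name `wait_for_cloud_init` and its injectable `run` parameter are hypothetical conveniences for this example. Treating only exit status 0 as success is an assumption of the sketch, and interpreting other exit codes is left out.

```python
import subprocess


def wait_for_cloud_init(run=subprocess.run) -> bool:
    """Block until cloud-init is done; return True if the command exited with 0.

    Hypothetical helper: `run` defaults to subprocess.run and can be
    replaced with a stand-in on systems without cloud-init installed.
    """
    # "cloud-init status --wait" blocks until cloud-init has finished.
    result = run(["cloud-init", "status", "--wait"], check=False)
    return result.returncode == 0


if __name__ == "__main__":
    if wait_for_cloud_init():
        print("cloud-init finished; continuing")
    else:
        print("cloud-init reported a non-zero exit status")
```

Passing the runner as a parameter keeps the sketch exercisable in environments where the cloud-init command is not available.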
First boot determination#
Cloud-init has to determine whether or not the current boot is the first boot of a new instance, so that it applies the appropriate configuration. On an instance’s first boot, it should run all “per-instance” configuration, whereas on a subsequent boot it should run only “per-boot” configuration. This section describes how cloud-init performs this determination, as well as why it is necessary.
When it runs, cloud-init stores a cache of its internal state for use across stages and boots.
If this cache is present, then cloud-init has run on this system before [1]. There are two cases where this could occur. Most commonly, the instance has been rebooted, and this is a second/subsequent boot. Alternatively, the filesystem has been attached to a new instance, and this is the instance’s first boot. The most obvious case where this happens is when an instance is launched from an image captured from a launched instance.
By default, cloud-init attempts to determine which case it is running in by checking the instance ID in the cache against the instance ID it determines at runtime. If they do not match, then this is an instance’s first boot; otherwise, it’s a subsequent boot. Internally, cloud-init refers to this behaviour as check.
This behaviour is required for images captured from launched instances to behave correctly, and so is the default that generic cloud images ship with. However, there are cases where it can cause problems [2]. For these cases, cloud-init has support for modifying its behaviour to trust the instance ID that is present in the system unconditionally. This means that cloud-init will never detect a new instance when the cache is present, and it follows that the only way to cause cloud-init to detect a new instance (and therefore its first boot) is to manually remove cloud-init’s cache. Internally, this behaviour is referred to as trust.
To configure which of these behaviours to use, cloud-init exposes the manual_cache_clean configuration option. When false (the default, as discussed above), cloud-init will check and clean the cache if the instance IDs do not match. When true, cloud-init will trust the existing cache (and therefore not clean it).
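The check and trust behaviours described above can be summarised in a few lines. This is a minimal sketch, not cloud-init's implementation: `is_first_boot` and its parameter names are hypothetical, and treating a missing cache as a first boot is an assumption inferred from the statement that a present cache means cloud-init has run before.

```python
from typing import Optional


def is_first_boot(
    cached_instance_id: Optional[str],
    runtime_instance_id: str,
    manual_cache_clean: bool = False,
) -> bool:
    """Decide whether per-instance configuration should run.

    Hypothetical helper mirroring the prose:
      - manual_cache_clean=False -> "check": compare the cached and runtime IDs
      - manual_cache_clean=True  -> "trust": an existing cache is never a new instance
    """
    if cached_instance_id is None:
        # Assumption: no cache means cloud-init has not run here before.
        return True
    if manual_cache_clean:
        # trust: the cached instance ID is accepted unconditionally.
        return False
    # check: a mismatch means the filesystem now belongs to a new instance.
    return cached_instance_id != runtime_instance_id
```

With the default setting, an image captured from instance `i-aaa` and launched as `i-bbb` is detected as a first boot. With `manual_cache_clean=True`, the same situation is treated as a subsequent boot until the cache is removed.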
Manual cache cleaning#
Cloud-init ships a command for manually cleaning the cache: cloud-init clean. See clean’s documentation for further details.
Reverting manual_cache_clean setting#
Currently there is no support for switching an instance that is launched with manual_cache_clean: true from trust behaviour to check behaviour, other than manually cleaning the cache.
Warning
If you want to capture an instance that is currently in trust mode as an image for launching other instances, you must manually clean the cache. If you do not do so, then instances launched from the captured image will all detect their first boot as a subsequent boot of the captured instance, and will not apply any per-instance configuration.
This is a functional issue, but also a potential security one: cloud-init is responsible for rotating SSH host keys on first boot, and this will not happen on these instances.