How to debug cloud-init
There are several cloud-init failure modes that one may need to debug. Debugging is specific to the scenario, but the starting points are often similar:
I can’t log in to my instance
One of the more challenging scenarios to debug is when you don’t have shell access to your instance. You have a few options:
Acquire log messages from the serial console and check for any errors.
To access instances without SSH available, create a user with password access (using the user-data) and log in via the cloud serial port console. This only works if cc_users_groups successfully ran.
Try running the same user-data locally, such as in one of the tutorials. Use LXD or QEMU locally to get a shell or logs, then debug with these steps.
Try copying the image to your local system, mount the filesystem locally and inspect the image logs for clues.
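The password-access option above can be sketched as a small helper that writes a cloud-config file. This is a minimal sketch, not a recommended production setup: the function name, the user name debug, and the placeholder password are hypothetical, and the plain-text password should only be used on throwaway debugging instances.

```shell
# Write a cloud-config that adds a password-login user for console debugging.
# Assumption: the user name and password defaults are hypothetical placeholders.
write_debug_user_data() {
    out_file=$1
    user_name=${2:-debug}
    password=${3:-change-me}
    # "- default" keeps the image's default user alongside the new one.
    cat > "$out_file" <<EOF
#cloud-config
users:
  - default
  - name: ${user_name}
    plain_text_passwd: ${password}
    lock_passwd: false
    sudo: ALL=(ALL) NOPASSWD:ALL
EOF
}

# Usage: write_debug_user_data user-data.yaml debug 'some-password'
```

The resulting file can be passed as user-data when launching the instance, or tried first in a local LXD or QEMU guest.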
Cloud-init did not run
Check the output of cloud-init status --long
What is the value of the 'extended_status' key?
What is the value of the 'boot_status_code' key?
See our reported status explanation for more information on the status.
Check the contents of /run/cloud-init/ds-identify.log
This log file is written when cloud-init detects the platform it is running on. This detection stage enables or disables cloud-init.
Check the status of the services:
systemctl status cloud-init-local.service cloud-init-network.service cloud-config.service cloud-final.service
Cloud-init may have started to run, but not completed. This shows how many, and which, cloud-init stages completed.
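The status and service checks above can be scripted with two small helpers. This is a sketch under assumptions: status_field assumes that cloud-init status --long prints scalar values as single "key: value" lines (multi-line values such as lists are not handled), and the SYSTEMCTL override is a hypothetical hook that exists only so the function can be exercised without systemd.

```shell
# Print the value of one scalar "key: value" line read from stdin.
status_field() {
    awk -v key="$1" -F': ' '$1 == key { print $2; exit }'
}

# Print "unit: state" for each cloud-init unit listed in the text above.
# Assumption: SYSTEMCTL is a hypothetical override for the systemctl command.
unit_states() {
    for unit in cloud-init-local.service cloud-init-network.service \
                cloud-config.service cloud-final.service; do
        # is-active exits non-zero for inactive units, so ignore its status.
        state=$("${SYSTEMCTL:-systemctl}" is-active "$unit" 2>/dev/null) || true
        echo "$unit: ${state:-unknown}"
    done
}

# Usage on an instance:
#   cloud-init status --long | status_field extended_status
#   cloud-init status --long | status_field boot_status_code
#   unit_states
```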
Cloud-init ran, but didn’t do what I want it to
If you are using cloud-init’s user-data cloud config, make sure to validate it.
Check for errors in cloud-init status --long
What is the value of the 'errors' key?
What is the value of the 'recoverable_errors' key?
See our guide on exported errors for more information on these exported errors.
For more context on errors, check the log files:
/var/log/cloud-init.log
/var/log/cloud-init-output.log
Identify errors in the logs and the lines preceding these errors.
Ask yourself:
According to the log files, what went wrong?
How does the cloud-init error relate to the configuration provided to this instance?
What does the documentation say about the parts of the configuration that relate to this error? Did a configuration module fail?
What failure state is cloud-init in?
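Two of the steps above, validating the user-data and reading the logs around each error, lend themselves to small helpers. This is a rough sketch: the function names are hypothetical, the log pattern assumes that cloud-init log lines carry their level in square brackets (for example [WARNING]), and the -B context option assumes a grep that supports it, such as GNU grep.

```shell
# Fail early if the file lacks the "#cloud-config" header, then run the
# schema validator when the cloud-init CLI is available on this machine.
check_user_data() {
    file=$1
    if [ "$(head -n 1 "$file")" != "#cloud-config" ]; then
        echo "first line of $file is not '#cloud-config'" >&2
        return 1
    fi
    if command -v cloud-init >/dev/null 2>&1; then
        cloud-init schema --config-file "$file"
    fi
}

# Print warning/error lines and tracebacks, each with preceding context lines.
# Assumption: levels appear as [WARNING], [ERROR] or [CRITICAL] in the log.
log_errors() {
    file=$1
    context=${2:-5}
    grep -n -B "$context" -E '\[(WARNING|ERROR|CRITICAL)\]|Traceback' "$file"
}

# Usage:
#   check_user_data user-data.yaml
#   log_errors /var/log/cloud-init.log 10
```

The context lines matter because the cause of a failure is often logged a few lines before the error itself.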
Cloud-init never finished running
There are many reasons why cloud-init may fail to complete. Some reasons are internal to cloud-init, but in other cases a failure to complete may be a symptom of failure in other components of the system, or the result of a user configuration.
External reasons
Other services failed or are stuck.
Bugs in the kernel or drivers.
Bugs in external userspace tools that are called by cloud-init.
Internal reasons
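A command in bootcmd or runcmd that never completes (e.g., running cloud-init status --wait from either module will deadlock, because the command waits for cloud-init to finish while cloud-init waits for the command to return).
Configurations that disable timeouts or set extremely high timeout values.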
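One way to keep a bootcmd or runcmd entry from blocking forever is to bound its run time. The sketch below is an illustration, not something cloud-init provides: run_bounded is a hypothetical helper, and it assumes the GNU coreutils timeout command, which exits with status 124 when the limit is reached.

```shell
# Run a command, but give up after LIMIT seconds instead of blocking boot.
# Returns the command's own exit status, or 124 if the time limit was hit
# (assumption: GNU coreutils timeout semantics).
run_bounded() {
    limit=$1
    shift
    rc=0
    timeout "$limit" "$@" || rc=$?
    if [ "$rc" -eq 124 ]; then
        echo "timed out after ${limit}s: $*" >&2
    fi
    return "$rc"
}

# Usage: run_bounded 60 some-long-running-command --with-args
```

The same effect can be had by prefixing a single runcmd entry with timeout and a limit; the wrapper only adds a log message that names the command that was cut short.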
To start debugging
Check dmesg for errors:
dmesg -T | grep -i -e warning -e error -e fatal -e exception
Investigate other systemd services that failed
systemctl --failed
Check the output of cloud-init status --long
What is the value of the 'extended_status' key?
What is the value of the 'boot_status_code' key?
See our guide on exported errors for more information on these exported errors.
Inspect the boot stage of the running services:
$ systemctl list-jobs --after
JOB UNIT                      TYPE  STATE
150 cloud-final.service       start waiting
└─      waiting for job 147 (cloud-init.target/start)       -     -
155 blocking-daemon.service   start running
└─      waiting for job 150 (cloud-final.service/start)     -     -
147 cloud-init.target         start waiting

3 jobs listed.
In the above example we can see that cloud-final.service is waiting and is ordered before cloud-init.target, and that blocking-daemon.service is currently running and is ordered before cloud-final.service. From this output, we deduce that cloud-init is not complete because the service named blocking-daemon.service hasn’t yet completed, and that we should investigate blocking-daemon.service to understand why it is still running.
Use the PID of the running service to find all running subprocesses. Any running process that was spawned by cloud-init may be blocking cloud-init from continuing.
pstree <PID>
Ask yourself:
Which process is still running?
Why is this process still running?
How does this process relate to the configuration that I provided?
For more context on errors, check the log files:
/var/log/cloud-init.log
/var/log/cloud-init-output.log
Identify errors in the logs and the lines preceding these errors.
Ask yourself:
According to the log files, what went wrong?
How does the cloud-init error relate to the configuration provided to this instance?
What does the documentation say about the parts of the configuration that relate to this error?
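The systemctl list-jobs and pstree steps above can be chained so that each running job is mapped to its process tree. This is a minimal sketch with hypothetical function names: it assumes the four-column JOB UNIT TYPE STATE layout shown in the example output, a systemd recent enough to support systemctl show --value, and pstree from the psmisc package.

```shell
# Print the units whose job state is "running", reading the output of
# `systemctl list-jobs` from stdin. Rows that do not start with a numeric
# job ID (the header and the "waiting for job" rows) are skipped.
running_jobs() {
    awk '$1 ~ /^[0-9]+$/ && $4 == "running" { print $2 }'
}

# Show the process tree under the main PID of every running job.
show_blockers() {
    systemctl list-jobs --after | running_jobs | while read -r unit; do
        pid=$(systemctl show -p MainPID --value "$unit")
        echo "== $unit (main PID $pid)"
        # A main PID of 0 means the unit has no main process right now.
        if [ "${pid:-0}" -gt 0 ]; then
            pstree -p "$pid"
        fi
    done
}

# Usage on an instance where cloud-init has not finished: show_blockers
```

For the example output shown earlier, running_jobs would report blocking-daemon.service, which is the unit to investigate.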