Oracle DataSource Fails When Used With a Bionic Image
Affects | Status | Importance | Assigned to | Milestone |
---|---|---|---|---|
cloud-init (Ubuntu) | Fix Released | Medium | Unassigned | |
Bionic | Fix Released | Undecided | Unassigned | |
Focal | Fix Released | Undecided | Unassigned | |
Hirsute | Fix Released | Undecided | Unassigned | |
Bug Description
=== Begin SRU Template ===
[Impact]
When attempting to launch a Bionic instance on Oracle Cloud Infrastructure with the datasource explicitly set to datasource_list: [ Oracle ], the instance fails to run the OracleDataSource. This eventually leads to cloud-init falling back to NoDataSource. The root cause is that cloud-init tries to add a default route that already exists while setting up an ephemeral DHCP network, and the failed `ip route add` aborts the datasource. We can instead check for a response from the hardcoded metadata URL and skip adding unnecessary routes when it is already reachable.
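The idea behind the fix can be sketched roughly as follows. This is an illustrative sketch, not the upstream patch: the function names `metadata_reachable` and `pick_network_context` are hypothetical, and the metadata URL is an assumption (the report only says the URL is hardcoded). The point is simply to probe the metadata service first and only bring up an ephemeral network when the probe fails.

```python
import contextlib
import urllib.error
import urllib.request

# Assumption: link-local metadata address; the bug report does not spell it out.
METADATA_URL = "http://169.254.169.254/opc/v2/instance/"


def metadata_reachable(url=METADATA_URL, timeout=2.0,
                       opener=urllib.request.urlopen):
    """Return True if the metadata service sends back any HTTP response."""
    try:
        with opener(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # An HTTP error status still proves the network path works.
        return True
    except (urllib.error.URLError, OSError):
        # No route, timeout, connection refused, and similar failures.
        return False


def pick_network_context(ephemeral_factory, reachable=metadata_reachable):
    """Skip ephemeral DHCP setup when the metadata URL already answers.

    ephemeral_factory is a hypothetical callable that returns a context
    manager which configures a temporary address and routes.
    """
    if reachable():
        # Networking is already usable, so do not touch addresses or routes.
        return contextlib.nullcontext()
    return ephemeral_factory()
```

With this shape, a boot where `ens3` already has its address and default route never reaches the `ip route add` call that failed in this bug.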
[Test Case]
1. Launch Oracle Bionic instance
2. Install cloud-init proposed version
3. mv /etc/cloud/
4. Enable Oracle in `dpkg-reconfigure cloud-init` # Only required for existing instances
5. Verify the datasource listed via `cloud-init status -l` shows DataSourceOracle and not DataSourceNoCloud or DataSourceOpenStack
6. Verify /var/log/
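Step 5 of the test case could be scripted along these lines. This is a minimal sketch: `datasource_ok` and `check_instance` are hypothetical helper names, and the check is a plain substring match on the output of `cloud-init status -l`, not a parser for its exact format.

```python
import subprocess

# Datasource names taken from step 5 of the test case.
EXPECTED = "DataSourceOracle"
UNEXPECTED = ("DataSourceNoCloud", "DataSourceOpenStack")


def datasource_ok(status_text):
    """Return True if the status text names Oracle and no fallback datasource."""
    if EXPECTED not in status_text:
        return False
    return not any(name in status_text for name in UNEXPECTED)


def check_instance():
    """Run `cloud-init status -l` on the instance and evaluate its output."""
    proc = subprocess.run(["cloud-init", "status", "-l"],
                          capture_output=True, text=True, check=False)
    return datasource_ok(proc.stdout)
```

Only `check_instance()` needs to run on the instance itself; `datasource_ok` can be applied to output captured elsewhere.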
[Regression Potential]
If the metadata service is down, we'll fall back to the erroneous behavior. However, cloud-init will fail in other ways if the metadata service is inaccessible.
[Other Info]
Github PR: https:/
Upstream commit:
https:/
=== End SRU Template ===
Initial bug:
When attempting to launch a Bionic instance on Oracle Cloud Infrastructure with the datasource explicitly set to datasource_list: [ Oracle ], the instance fails to run the OracleDataSource. As a result, the instance does not have SSH keys imported from the metadata service. The failure is related to the command:
Running command ['ip', '-4', 'route', 'add', '0.0.0.0/0', 'via', '10.0.0.1', 'dev', 'ens3'] with allowed return codes [0] (shell=False, capture=True)
It showed up in the logs as follows:
2021-08-11 13:56:13,289 - util.py[DEBUG]: Reading from /var/tmp/
2021-08-11 13:56:13,289 - util.py[DEBUG]: Read 519 bytes from /var/tmp/
2021-08-11 13:56:13,289 - dhcp.py[DEBUG]: Received dhcp lease on ens3 for 10.0.0.
2021-08-11 13:56:13,289 - __init__.py[DEBUG]: Attempting setup of ephemeral network on ens3 with 10.0.0.66/24 brd 10.0.0.255
2021-08-11 13:56:13,289 - subp.py[DEBUG]: Running command ['ip', '-family', 'inet', 'addr', 'add', '10.0.0.66/24', 'broadcast', '10.0.0.255', 'dev', 'ens3'] with allowed return codes [0] (shell=False, capture=True)
2021-08-11 13:56:13,291 - __init__.py[DEBUG]: Skip ephemeral network setup, ens3 already has address 10.0.0.66
2021-08-11 13:56:13,291 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'add', '0.0.0.0/0', 'via', '10.0.0.1', 'dev', 'ens3'] with allowed return codes [0] (shell=False, capture=True)
2021-08-11 13:56:13,293 - handlers.py[DEBUG]: finish: init-local/
2021-08-11 13:56:13,293 - util.py[WARNING]: Getting data from <class 'cloudinit.
2021-08-11 13:56:13,293 - util.py[DEBUG]: Getting data from <class 'cloudinit.
Traceback (most recent call last):
File "/usr/lib/
if s.update_
File "/usr/lib/
result = self.get_data()
File "/usr/lib/
return_value = self._get_data()
File "/usr/lib/
with network_context:
File "/usr/lib/
return self.obtain_lease()
File "/usr/lib/
ephipv4.
File "/usr/lib/
self.
File "/usr/lib/
['dev', self.interface], capture=True)
File "/usr/lib/
cmd=args)
cloudinit.
Command: ['ip', '-4', 'route', 'add', '0.0.0.0/0', 'via', '10.0.0.1', 'dev', 'ens3']
Exit code: 2
Reason: -
Stdout:
Stderr: RTNETLINK answers: File exists
This eventually leads to cloud-init falling back to NoDataSource.
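The log shows why the failure is fatal: the command runs with allowed return codes [0], so exit code 2 with "File exists" raises an exception even though the route is already in place. As an illustration only (this is not how the fix in this bug works, and `add_default_route` is a hypothetical helper), a caller could treat that specific error as success:

```python
import subprocess


def add_default_route(gateway, dev, run=subprocess.run):
    """Add an IPv4 default route, tolerating one that already exists.

    Returns "added" or "already-present"; raises RuntimeError otherwise.
    The `run` parameter exists so the command runner can be replaced.
    """
    cmd = ["ip", "-4", "route", "add", "0.0.0.0/0",
           "via", gateway, "dev", dev]
    proc = run(cmd, capture_output=True, text=True)
    if proc.returncode == 0:
        return "added"
    # Stderr text seen in the traceback above when the route is present.
    if "File exists" in (proc.stderr or ""):
        return "already-present"
    raise RuntimeError(
        "route add failed (exit %d): %s" % (proc.returncode, proc.stderr)
    )
```

Matching on stderr text is brittle, which is one reason probing the metadata URL first and skipping route setup entirely is the cleaner approach.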
To create this image, I:
* Updated CPC's livecd-rootfs code for Oracle to include:
# etc/cloud/
# Configuration for Oracle Cloud Infrastructure
datasource_list: [ Oracle ]
* created an image using CPC's livecd-rootfs using ubuntu-bartender
* registered a custom image in OCI
* attempted to create an instance using the custom image
I was unable to connect via SSH, getting "Permission denied (publickey)".
I attempted to create a serial connection; however, I was never able to SSH in successfully. It just hung forever.
In a second attempt, I tried to pass a username:password to cloud-init. However, because the datasource failed and cloud-init fell back to NoDataSource, my custom data was not loaded either.
I was able to collect logs by terminating the instance, but keeping the boot volume. I then created a Bionic instance using the platform image, and verified that it worked with the OpenStack datasource currently in use. I then attached the boot volume from the now terminated instance as a block volume, ran the required iscsi commands (found via the web console after attaching the block volume), and mounted the drive to /mnt/nods. I was then able to collect the logs in /mnt/nods/
To reproduce, an image would need to be made with the datasource explicitly set to Oracle.
Changed in cloud-init (Ubuntu): | |
status: | New → Triaged |
importance: | Undecided → Medium |
Changed in cloud-init (Ubuntu): | |
status: | Triaged → Fix Committed |
description: | updated |
tags: | added: verification-done verification-done-bionic verification-done-focal verification-done-hirsute; removed: verification-needed verification-needed-bionic verification-needed-focal verification-needed-hirsute |
Changed in cloud-init (Ubuntu): | |
status: | Fix Committed → Fix Released |
PR is up at https://github.com/canonical/cloud-init/pull/988