The default timeout in PVE/RADOS.pm is 5 seconds, but this is not
always enough for external clusters under load. Workers can and should
be given more time, so they don't fail too quickly here.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
In preparation for increasing the timeout for workers. Neither of the
existing callers of librados_connect() currently uses the parameter.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
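A minimal sketch of where the parameter could come into play (the
option name and the merging are assumptions, not the final interface):

    # Sketch: librados_connect() gains an optional $options hash that
    # may carry a per-caller timeout (names illustrative).
    sub librados_connect {
        my ($scfg, $storeid, $options) = @_;

        my $librados_config = PVE::CephConfig::ceph_connect_option($scfg, $storeid);

        # workers may override the short 5 second default
        $librados_config->{timeout} = $options->{timeout} if $options->{timeout};

        return PVE::RADOS->new(%$librados_config);
    }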
The return value of get_rbd_dev_path() is only used when $scfg->{krbd}
evaluates to true and the function shouldn't have any side effects
that are needed later, so the call can be avoided otherwise.
This also saves a RADOS connection and command for configurations
with external clusters and krbd disabled.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
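A sketch of the resulting control flow in the plugin's path handling
(simplified; the get_rbd_dev_path signature is assumed for
illustration):

    # Only resolve the kernel device path when krbd is enabled; the
    # librbd protocol path needs no RADOS round-trip for this.
    my ($vtype, $name, $vmid) = $class->parse_volname($volname);

    my $path;
    if ($scfg->{krbd}) {
        # may trigger a RADOS connection/command for external clusters
        $path = $class->get_rbd_dev_path($scfg, $storeid, $name, $snapname);
    } else {
        $path = "rbd:" . get_rbd_path($scfg, $name);
    }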
The changes in cfe46e2d4a did not catch
all situations.
In the case of a guest having two disk images with the same name, on
pools with the same name, but in two different Ceph clusters, we
still had issues when starting it. The first disk got mapped as
expected. The second disk did not get mapped, because we returned the
old $path "/dev/rbd/<pool>/<image>", which already existed from the
first disk.
In the case that only the "old" /dev/rbd path exists and the
/dev/rbd-pve/<cluster>/... path is not available, we now check
whether the cluster fsid used by that rbd device matches the one we
expect. If it does, the image was mapped before the new rbd-pve udev
rule was introduced. If it does not, the mapping in /dev/rbd is
ambiguous and we return the $pve_path.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
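A sketch of the disambiguation, with the sysfs lookup summarized in a
hypothetical helper:

    # If only the legacy path exists, trust it only when the device's
    # cluster fsid (exposed via /sys/bus/rbd/devices/<id>/cluster_fsid)
    # matches the cluster this storage points at.
    if (!-b $pve_path && -b $path) {
        my $dev_fsid = get_device_cluster_fsid($path); # hypothetical helper
        return $path if defined($dev_fsid) && $dev_fsid eq $fsid;
    }
    return $pve_path; # unmapped or ambiguous -> fsid-qualified path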
When a data-pool is configured, use it for status info. The
'data-pool' config option is used to mark the erasure coded pool,
while 'pool' will be the replicated pool holding metadata such as the
omap.
This means the 'pool' will only use a small amount of space, and
people are interested in how much they can store in the erasure coded
pool anyway. Therefore this patch reorders the assignment of the used
pool name by availability of the scfg parameters:
data-pool -> pool -> fallback 'rbd'
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
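The resulting precedence is essentially a one-liner (sketch):

    # data-pool (EC) first, then the replicated pool, then Ceph's
    # default pool name
    my $pool = $scfg->{'data-pool'} // $scfg->{pool} // 'rbd';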
A missing pool, e.g. due to a mistyped pool name, now yields an error
message that should be more helpful than:
`Use of uninitialized value $free in addition (+) at \
/usr/share/perl5/PVE/Storage/RBDPlugin.pm line 64`
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
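A sketch of the kind of guard that produces the friendlier error ($d
holding the per-pool entry from the status output is an assumption):

    # fail early with a readable message instead of warning about
    # uninitialized values further down
    die "could not get usage stats for pool '$pool'\n"
        if !defined($d) || !defined($d->{stats});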
the fallback to a default pool name of 'rbd' was introduced in:
1440604a4b
and worked for the status command, because it used the `rados_cmd`
sub.
This fallback was lost with the changes in:
41aacc6cde
leading to confusing errors:
`Use of uninitialized value in string eq at \
/usr/share/perl5/PVE/Storage/RBDPlugin.pm line 633`
(e.g. in the journal from pvestatd)
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
to avoid calls into RADOS connect that trigger 'RPCEnv not
initialized' breakage in regression tests, and which wouldn't really
work otherwise either.
In the future the RBD $scfg could actually support this (or a
similarly named) property, to save it on storage addition and then
avoid frequent mon commands.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
When krbd is used, subsequent removal after an operation involving
a rename could fail with
> librbd::image::PreRemoveRequest: 0x559b7506a470 \
> check_image_watchers: image has watchers - not removing
because the old mapping was still present.
For both operations with a rename, the owning guest should be offline,
but even if it weren't, unmap simply fails when the volume is in-use.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
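A sketch of the added cleanup (helper invocation and path layout are
assumptions for illustration):

    use PVE::Tools qw(run_command);

    # drop a stale krbd mapping left over from before the rename;
    # 'rbd unmap' fails on its own if the volume is still in use
    my $dev_path = "/dev/rbd-pve/$fsid/$pool/$name"; # assumed layout
    if (-b $dev_path) {
        run_command(['/usr/bin/rbd', 'unmap', $dev_path],
            errmsg => "can't unmap rbd device $dev_path");
    }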
it only redirected to get_rbd_dev_path with the same signature, and
both are private subs.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
the new udev rule is expected to be in place and active, switching the
checks around means 1 instead of 2 stat()s in this rather hot code path.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
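In sketch form:

    # fsid-qualified path first: with the udev rule active this hits
    # on the first stat() in the common case
    return $pve_path if -b $pve_path;
    return $path if -b $path; # legacy mapping from before the rule
    return $pve_path;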
By adding our own customized rbd udev rules and ceph-rbdnamer we can
create device paths that include the cluster fsid and avoid any
ambiguity if the same pool and namespace combination is used in
different clusters we connect to.
In addition to the '/dev/rbd/<pool>/...' paths we now have
'/dev/rbd-pve/<cluster fsid>/<pool>/...' paths.
The other half of the patch makes use of the new device paths in the RBD
plugin.
The new 'get_rbd_dev_path' method returns the full device path. In
case the image has been mapped before the rbd-pve udev rule was
installed, it returns the old path.
The cluster fsid is read from the 'ceph.conf' file in the case of a
hyperconverged setup. In the case of an external Ceph cluster, we
need to fetch it via a RADOS API call.
Co-authored-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
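An illustrative sketch of such a rule (file name, namer path and
field order are assumptions, not the shipped file):

    # 50-rbd-pve.rules (sketch): the custom namer prints the cluster
    # fsid as an additional field, which becomes part of the symlink
    KERNEL=="rbd*", ENV{DEVTYPE}=="disk", PROGRAM="/usr/libexec/ceph-rbdnamer-pve %k", SYMLINK+="rbd-pve/%c{1}/%c{2}/%c{3}"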
The first step is to allocate rbd images correctly.
The metadata objects still need to be stored in a replicated pool, but
by providing the --data-pool parameter on image creation, we can place
the data objects on the erasure coded (EC) pool.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
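A sketch of the image allocation (command assembly simplified; $size
in KiB, as elsewhere in the plugin, is an assumption):

    use PVE::Tools qw(run_command);

    # place data objects on the EC pool; metadata stays on $scfg->{pool}
    my $cmd = ['/usr/bin/rbd', '-p', $scfg->{pool}, 'create'];
    push @$cmd, '--data-pool', $scfg->{'data-pool'} if $scfg->{'data-pool'};
    push @$cmd, '--size', int(($size + 1023) / 1024), $name;
    run_command($cmd, errmsg => "rbd create '$name' error");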
Functionality has been added for the following storage types:
* directory ones, based on the default implementation:
    * directory
    * NFS
    * CIFS
    * gluster
* ZFS
* (thin) LVM
* Ceph
A new feature `rename` has been introduced to mark which storage
plugins support it.
Version API and AGE have been bumped.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
the intention of this feature is to support the following use-cases:
- reassign a volume from one owning guest to another (which usually
entails a rename, since the owning vmid is encoded in the volume name)
- rename a volume (e.g., to use a more meaningful name instead of the
auto-assigned ...-disk-123)
only the former is implemented at the caller side in
qemu-server/pve-container for now, but since the lower-level feature is
basically the same for both, we can take advantage of the storage plugin
API bump now to get the building block for this future feature in place
already.
adapted the ApiChangelog change to fix conflicts and added more detail above
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
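A sketch of the new plugin entry point (parameter names inferred from
the description, not the authoritative signature):

    sub rename_volume {
        my ($class, $scfg, $storeid, $source_volname, $target_vmid, $target_volname) = @_;
        # plugins advertising the 'rename' feature implement the
        # actual move/rename here and return the new volume ID
        die "not implemented in storage plugin '$class'\n";
    }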
By adding the keyring for RBD storage or the secret for CephFS ones, it
is possible to add an external Ceph cluster with only one API call.
Previously the keyring / secret file needed to be placed in
/etc/pve/priv/ceph/$storeID.{keyring,secret} manually.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
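Illustrative invocation (storage ID, addresses and file name are
placeholders):

    pvesm add rbd ext-ceph --monhost "192.0.2.1 192.0.2.2" \
        --pool rbd --content images --keyring /root/ext-ceph.keyring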
This patch introduces support for Ceph's RBD namespaces.
A new storage config parameter 'namespace' defines the namespace to be
used for the RBD storage.
The namespace must already exist in the Ceph cluster as it is not
automatically created.
The main intention is to use this for external Ceph clusters. With
namespaces, each PVE cluster can get its own namespace and will not
conflict with other PVE clusters.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
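An illustrative storage.cfg entry (values are placeholders; the
namespace has to be created beforehand, e.g. with
`rbd namespace create <pool>/<namespace>`):

    rbd: ext-ceph
        monhost 192.0.2.1 192.0.2.2
        pool rbd
        namespace pve-cluster-a
        username admin
        content images,rootdir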
The <pool>/<image> paths are needed in quite a lot of places. Having one
single place where they are created helps to reduce duplicate code and
makes it easier to introduce new features.
The 'add_pool_to_disk' sub was already doing that but the name was not
really fitting. This commit renames it to the more general
'get_rbd_path' and changes the second parameter to the more widely used
$volume instead of $disk.
Furthermore, all occurrences where "$pool/$volume" was concatenated
have been replaced with a call to get_rbd_path.
Plus some minor code style cleanups for long function calls that were
touched.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
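A sketch of the renamed helper (close to, but not necessarily
verbatim, the final code):

    sub get_rbd_path {
        my ($scfg, $volume) = @_;
        my $path = $scfg->{pool} ? $scfg->{pool} : 'rbd';
        $path .= "/$scfg->{namespace}" if defined($scfg->{namespace});
        $path .= "/$volume" if defined($volume);
        return $path;
    }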
In order to take a snapshot of a container volume, which can be
mounted read-only with RBD, the volume needs to be frozen
(fsfreeze(8)) before taking the snapshot.
This commit adds helpers to determine if the FIFREEZE ioctl needs to be called
for the volume.
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
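In sketch form, the plugin-level helper is a simple predicate
(default shown; the RBD plugin would override it to return true for
krbd-backed container volumes):

    sub volume_snapshot_needs_fsfreeze {
        # default: snapshots do not require freezing the filesystem
        return 0;
    }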
A version newer than Luminous is shipped with Buster, and our Ceph
repos are on Nautilus (14.2) in PVE 6.
This allows dropping a check for really old Ceph versions (< 10, i.e.
Infernalis and older).
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
dmesg: libceph: bad option at 'conf=/etc/pve/ceph.conf'
After the upgrade to PVE 6 with Ceph Luminous, the mount.ceph helper
doesn't understand the conf= option yet, and the CephFS mount with
the kernel client fails. After upgrading to Ceph Nautilus, the option
exists in the mount.ceph helper.
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
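A sketch of the conditional option handling (the version helper is
hypothetical, for illustration only):

    # only pass conf= when the installed mount.ceph understands it,
    # i.e. with Ceph Nautilus (14) or newer
    push @$options, "conf=$configfile"
        if ceph_version_at_least(14); # hypothetical helper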
We can use 'list_images' to get the desired volume IDs in
'find_free_diskname' for most plugins. For the two LVM plugins, 'list_images'
potentially skips untagged volumes, so we keep the custom version. For the
RBD plugin, 'list_images' is much more costly than the custom version, so we
keep the custom version.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
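A sketch of the shared default (helper names follow
PVE::Storage::Plugin conventions; not guaranteed verbatim):

    sub find_free_diskname {
        my ($class, $storeid, $scfg, $vmid, $fmt, $add_fmt_suffix) = @_;

        my $disks = $class->list_images($storeid, $scfg, $vmid);
        my $disk_list = [ map { $_->{volid} } @$disks ];

        return get_next_vm_diskname($disk_list, $storeid, $vmid, $fmt, $scfg, $add_fmt_suffix);
    }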
we need to unprotect more snapshots than just the base one, since we
allow linked clones of regular VM snapshots. unprotection will only work
if no linked clones exist anymore.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
it does not work:
disable RBD image features this kernel RBD drivers is not compatible with: fast-diff,object-map,deep-flatten
clone failed: could not disable krbd-incompatible image features 'fast-diff,object-map,deep-flatten' for rbd image: vm-123123123-disk-0@test: rbd: snapshot name specified for a command that doesn't use it
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Modern kernels, like 5.3, support all those features ('fast-diff',
'object-map', 'deep-flatten'), so we do not want to disable them
there. 5.0 already supports exclusive-locks, so there is no need to
disable exclusive locking there.
Further, we also want to profit from newly available features, so
let's enable those which can be enabled "live" (i.e., after image
creation) if they're available.
While we could also parse the kernel information directly from
/sys/module/libceph/parameters/supported_features
there's not much advantage to that: features cannot be disabled via
Kconfig, and they're also very dependent on the booted kernel
version. So for us it's enough to check the kernel version.
This only affects container and VMs backed by a storage with KRBD
explicitly enabled. But as the enabling and disabling happens
transparently, it has no effect on the running guest.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
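A sketch of the version gate (the kernel-version helper is
hypothetical; the exact per-version feature sets are as described
above, not taken from the code):

    my ($major, $minor) = kernel_version_tuple(); # hypothetical helper

    my @disable; # features the running krbd driver can't handle
    if ($major < 5) {
        @disable = ('exclusive-lock', 'fast-diff', 'object-map', 'deep-flatten');
    } elsif ($major == 5 && $minor < 3) {
        @disable = ('fast-diff', 'object-map', 'deep-flatten');
    }
    # anything supported and live-togglable gets enabled instead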
The patch uses the value from the field 'stored' if it is available.
In Ceph 14.2.2 the storage calculation changed to a per-pool basis.
This introduced an additional field 'stored' that holds the amount of
data that has been written to the pool, while the field 'used' now
holds the data after replication for the pool.
The new calculation will be used only if all OSDs are running with the
on-disk format introduced by Ceph 14.2.2.
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
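In sketch form:

    # 'stored' (pre-replication) if reported by Ceph >= 14.2.2 with
    # all OSDs upgraded, otherwise fall back to 'used'
    my $used = $d->{stats}->{stored} // $d->{stats}->{used};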
it is not a storage plugin, and it makes more sense to have it
top-level, but there we cannot name it CephTools because of the
existing module of that name in pve-manager
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
This allows requesting a mapped device/path explicitly, regardless
of the storage option, e.g. the krbd option in the RBD plugin.
Bump of the storage ABI => 2
Co-authored-by: Alwin Antreich <a.antreich@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
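Illustrative call from the consumer side (the top-level wrapper name
follows PVE::Storage conventions of the time):

    # always returns a locally accessible device/path, mapping the
    # volume first if the plugin requires it
    my $device_path = PVE::Storage::map_volume($cfg, $volid, $snapname);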
$features is actually an array reference, so use it as one.
This broke creation and migration of disks on rbd storages
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
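In sketch form, the fix boils down to dereferencing (illustrative,
not the exact hunk):

    # $features is an ARRAY ref, e.g. ['fast-diff', 'object-map'], so
    # dereference it instead of interpolating the reference itself
    my $feature_str = join(',', @$features);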
Non-conforming image names are not ignored anymore by the new rbd_ls
implementation; this patch restores the old behaviour.
This fix is a temporary workaround and should be removed once the new
image name parser is ready.
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>