Callers interested in the list of valid formats from
storage_default_format() actually want this functionality.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Rely on get_formats() rather than just the static plugin data in the
'status' API call. This removes the need for the special-casing of
LVM storages without the 'snapshot-as-volume-chain' option. It also
fixes that the 'format' storage configuration option, which overrides
the default format, was previously ignored there.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
As the alloc_lvm_image() helper asserts, qcow2 cannot be used as a
format without the 'snapshot-as-volume-chain' configuration option.
Therefore it is necessary to implement get_formats() and distinguish
based on the storage configuration.
If the 'snapshot-as-volume-chain' option is set, qcow2 is even
preferred and thus declared the default format.
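A minimal sketch of the idea (not verbatim from the patch):

    sub get_formats {
        my ($class, $scfg, $storeid) = @_;
        if ($scfg->{'snapshot-as-volume-chain'}) {
            # qcow2 is preferred and becomes the default
            return { default => 'qcow2', valid => { raw => 1, qcow2 => 1 } };
        }
        # without the option, only raw is safe to offer
        return { default => 'raw', valid => { raw => 1 } };
    }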
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
The LVM plugin can only use qcow2 format when the
'snapshot-as-volume-chain' configuration option is set. The format
information is currently only recorded statically in the plugin data.
This causes issues: for example, restoring a guest volume that uses
qcow2 as a format hint on an LVM storage without the option set will
fail, because the plugin data indicates that qcow2 is supported.
Introduce a dedicated method, so that plugins can indicate what is
actually supported according to the storage configuration.
The implementation for LVM is done in a separate commit.
Remove the now unused default_format() function from Plugin.pm.
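A hedged sketch of what the default implementation could look like,
assuming the usual plugindata() format layout of [$valid_hash, $default]:

    sub get_formats {
        my ($class, $scfg, $storeid) = @_;
        my ($valid, $default) = $class->plugindata()->{format}->@*;
        return {
            default => $scfg->{format} // $default,
            valid => { map { $_ => !!$valid->{$_} } keys %$valid },
        };
    }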
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[WB: docs: add missing params, drop =pod line, use !! for bools]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Because LvmThinPlugin.pm uses LVMPlugin.pm as a base, it inherits the
`volume_rollback_is_possible()` subroutine added in eda88c94.
However, its implementation causes snapshot rollbacks to fail with
"can't rollback snapshot for 'raw' volume".
Fix this by implementing `volume_rollback_is_possible()` in
LvmThinPlugin.pm itself.
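A sketch of the override (the exact signature is assumed from the
base plugin API):

    sub volume_rollback_is_possible {
        my ($class, $scfg, $storeid, $volname, $snap, $blockers) = @_;
        # thin snapshots can be rolled back directly
        return 1;
    }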
Closes: #6553
Signed-off-by: Max R. Carrara <m.carrara@proxmox.com>
Not perfect, but it's still easy to rename now, and the new variant
fits the actual design and implementation a bit better.
Add a best-effort migration for storage.cfg; after all, this has
never been publicly released.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This also changes the return values, since their meanings are rather
weird from the storage point of view. For instance, "internal" meant
it is *not* the storage which does the snapshot, while "external"
meant a mixture of storage and qemu-server side actions. `undef` meant
the storage does it all...
┌────────────┬───────────┐
│ previous │ new │
├────────────┼───────────┤
│ "internal" │ "qemu" │
│ "external" │ "mixed" │
│ undef │ "storage" │
└────────────┴───────────┘
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Just to be safe, as this is already checked higher up in the stack.
Technically, it's possible to implement snapshot file renaming and to
update the backing_file info with "qemu-img rebase -u".
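For illustration, such a rebase could look roughly like this (paths
are hypothetical):

    use PVE::Tools qw(run_command);
    # '-u' only updates the metadata, no data is copied
    run_command([
        'qemu-img', 'rebase', '-u',
        '-F', 'qcow2', '-b', $renamed_backing_path,
        $overlay_path,
    ]);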
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
by moving the preallocation handling to the call site, and preparing
them to take further options like cluster size in the future.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
We format the LVM logical volume with qcow2 to handle the snapshot
chain. As with qcow2 files, when a snapshot is taken, the current LVM
volume is renamed to the snapshot volname, and a new current LVM
volume is created with the snapshot volname as backing file.
The snapshot volname is similar to LVM-thin: snap_${volname}_${snapname}.qcow2
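A rough sketch of the flow (volume and variable names hypothetical):

    use PVE::Tools qw(run_command);
    # 1. the current LV becomes the read-only snapshot
    run_command(['lvrename', $vg, $current_lv, $snap_lv]);
    # 2. allocate a new current LV and format it as qcow2 with the
    #    snapshot as backing file
    run_command(['lvcreate', '-n', $current_lv, '-L', "${size}k", $vg]);
    run_command([
        'qemu-img', 'create', '-f', 'qcow2',
        '-b', "/dev/$vg/$snap_lv", '-F', 'qcow2',
        "/dev/$vg/$current_lv",
    ]);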
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
Add a snapext option to enable the feature.
When a snapshot is taken, the current volume is renamed to the
snapshot volname, and a new current image is created with the
snapshot volume as backing file.
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
Returns whether the volume supports qemu snapshots:
'internal' : do the snapshot with a qemu internal snapshot
'external' : do the snapshot with a qemu external snapshot
undef      : qemu snapshots are not supported
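Hedged sketch of a plugin implementation (method name as used at this
point in the series; the helper usage is illustrative only):

    sub volume_support_qemu_snapshot {
        my ($class, $scfg, $storeid, $volname) = @_;
        my $format = ($class->parse_volname($volname))[6];
        return if $format ne 'qcow2'; # undef: no qemu snapshot support
        return 'external' if $scfg->{snapext};
        return 'internal';
    }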
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
This adds a $running param to volume_snapshot; it can be used if some
extra actions need to be done at the storage layer when the snapshot
has already been done at the qemu level.
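The extended signature might look like this (sketch):

    sub volume_snapshot {
        my ($class, $scfg, $storeid, $volname, $snap, $running) = @_;
        # $running: qemu already took the snapshot while the guest is
        # running; only do the extra storage-side steps here
        ...;
    }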
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
Template guests are never running and never write to their
disks/mountpoints, so the $running parameters there can be dropped.
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
This computes the whole size of a qcow2 volume, data + metadata.
This is needed for qcow2 over an LVM volume.
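One hedged way to obtain the data + metadata size is to ask qemu-img
itself; the patch may well compute this by hand instead:

    use JSON;
    use PVE::Tools qw(run_command);

    my $json = '';
    run_command(
        ['qemu-img', 'measure', '--output=json', '--size', "${size}k", '-O', 'qcow2'],
        outfunc => sub { $json .= shift },
    );
    my $total = decode_json($json)->{'fully-allocated'}; # bytes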
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
and use it for plugin linked clones.
This also enables extended_l2=on, as it's mandatory for backing file
preallocation.
Preallocation was missing previously, so this should increase
performance for linked clones now (around 5x in randwrite 4k).
cluster_size is set to 128k, as it reduces qcow2 overhead (less disk
space, but also less memory needed to cache metadata).
extended_l2 is not enabled yet on the base image, but it could also
help to reduce overhead without impacting performance.
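A hedged sketch of the resulting creation call for a linked clone
(the option plumbing in the actual patch may differ):

    use PVE::Tools qw(run_command);
    run_command([
        'qemu-img', 'create', '-f', 'qcow2',
        '-o', 'preallocation=metadata,extended_l2=on,cluster_size=128k',
        '-b', $base_path, '-F', 'qcow2',
        $clone_path,
    ]);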
bench on a 100G qcow2 file (all runs with prealloc=metadata, 4k block
size, values are IOPS):

fio --filename=/dev/sdb --direct=1 --rw=randwrite --bs=4k --iodepth=32 --ioengine=libaio --name=test
fio --filename=/dev/sdb --direct=1 --rw=randread --bs=4k --iodepth=32 --ioengine=libaio --name=test

┌─────────────────────────┬─────────────┬──────────────┬───────────┬──────────┐
│ image                   │ extended_l2 │ cluster_size │ randwrite │ randread │
├─────────────────────────┼─────────────┼──────────────┼───────────┼──────────┤
│ base                    │ off         │ 64k          │     20215 │    22219 │
│ base                    │ on          │ 64k          │     20217 │    21742 │
│ base                    │ on          │ 128k         │     21599 │    22037 │
│ clone with backing file │ off         │ 64k          │      3912 │    21476 │
│ clone with backing file │ on          │ 64k          │     20563 │    22265 │
│ clone with backing file │ on          │ 128k         │     18016 │    21611 │
└─────────────────────────┴─────────────┴──────────────┴───────────┴──────────┘
Signed-off-by: Alexandre Derumier <alexandre.derumier@groupe-cyllene.com>
In 7684225 ("ceph/rbd: set 'keyring' in ceph configuration for
externally managed RBD storages") the ceph config creation was packed
into a new function which checks whether the installation is an
external Ceph cluster or not.
However, one of these checks was forgotten in the RBDPlugin and is
now added. Without this check, a configuration in
/etc/pve/priv/ceph/<pool>.conf is created and pvestatd complains

> pvestatd[1144]: ignoring custom ceph config for storage 'pool', 'monhost' is not set (assuming pveceph managed cluster)!

because the file /etc/pve/priv/ceph/pool.conf exists.
Fixes: 7684225 ("ceph/rbd: set 'keyring' in ceph configuration for externally managed RBD storages")
Signed-off-by: Hannes Duerr <h.duerr@proxmox.com>
Reviewed-by: Fiona Ebner <f.ebner@proxmox.com>
Link: https://lore.proxmox.com/20250716130117.71785-1-h.duerr@proxmox.com
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Use '/dev/zvol' as a base path for new storages for providers 'iet'
and 'LIO', because that is what modern distributions use.
This is a breaking change regarding the addition of new storages on
older distributions, but it's enough to specify the base path '/dev'
explicitly for setups that require it.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Christoph Heiss <c.heiss@proxmox.com>
Link: https://lore.proxmox.com/20250605111109.52712-1-f.ebner@proxmox.com
When discovering a new volume group (VG), for example on boot, LVM
triggers autoactivation. With the default settings, this activates all
logical volumes (LVs) in the VG. Activating an LV creates a
device-mapper device and a block device under /dev/mapper.
Autoactivation is problematic for shared LVM storages, see #4997 [1].
For the inherently local LVM-thin storage it is less problematic, but
it still makes sense to avoid unnecessarily activating LVs and thus
making them visible on the host at boot.
To avoid that, disable autoactivation after creating new LVs. lvcreate
on trixie does not accept the --setautoactivation flag for thin LVs
yet; support was only added with [2]. Hence, setting the flag is
done with an additional lvchange command for now. With this setting,
LVM autoactivation will not activate these LVs, and the storage stack
will take care of activating/deactivating LVs when needed.
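Sketched, the two-step creation looks roughly like this (sizes and
names hypothetical):

    use PVE::Tools qw(run_command);
    run_command(['lvcreate', '-V', "${size}k", '--name', $lvname, '--thinpool', "$vg/$pool"]);
    # lvcreate cannot set the flag for thin LVs on trixie yet
    run_command(['lvchange', '--setautoactivation', 'n', "$vg/$lvname"]);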
The flag is only set for newly created LVs, so LVs created before this
patch can still trigger #4997. To avoid this, users will be advised to
run a script to disable autoactivation for existing LVs.
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=4997
[2] 1fba3b876b
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Link: https://lore.proxmox.com/20250709141034.169726-3-f.weber@proxmox.com
When discovering a new volume group (VG), for example on boot, LVM
triggers autoactivation. With the default settings, this activates all
logical volumes (LVs) in the VG. Activating an LV creates a
device-mapper device and a block device under /dev/mapper.
This is not necessarily problematic for local LVM VGs, but it is
problematic for VGs on top of a shared LUN used by multiple cluster
nodes (accessed via e.g. iSCSI/Fibre Channel/direct-attached SAS).
Concretely, in a cluster with a shared LVM VG where an LV is active on
nodes 1 and 2, deleting the LV on node 1 will not clean up the
device-mapper device on node 2. If an LV with the same name is
recreated later, the leftover device-mapper device will cause
activation of that LV on node 2 to fail with:
> device-mapper: create ioctl on [...] failed: Device or resource busy
Hence, certain combinations of guest removal (and thus LV removals)
and node reboots can cause guest creation or VM live migration (which
both entail LV activation) to fail with the above error message for
certain VMIDs, see bug #4997 for more information [1].
To avoid this issue in the future, disable autoactivation when
creating new LVs using the `--setautoactivation` flag. With this
setting, LVM autoactivation will not activate these LVs, and the
storage stack will take care of activating/deactivating the LV (only)
on the correct node when needed.
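Sketch of the creation call with the flag (names hypothetical):

    use PVE::Tools qw(run_command);
    run_command([
        'lvcreate', '--setautoactivation', 'n',
        '-L', "${size}k", '-n', $lvname, $vg,
    ]);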
This additionally fixes an issue with multipath on FC/SAS-attached
LUNs where LVs would be activated too early after boot, when
multipath is not yet available; see [3] for more details and the
current workaround.
The `--setautoactivation` flag was introduced with LVM 2.03.12 [2], so
it is available since Bookworm/PVE 8, which ships 2.03.16. Nodes with
older LVM versions ignore the flag and remove it on metadata updates,
which is why PVE 8 could not use the flag reliably, since there may
still be PVE 7 nodes in the cluster that reset it on metadata updates.
The flag is only set for newly created LVs, so LVs created before this
patch can still trigger #4997. To avoid this, users will be advised to
run a script to disable autoactivation for existing LVs.
[1] https://bugzilla.proxmox.com/show_bug.cgi?id=4997
[2] https://gitlab.com/lvmteam/lvm2/-/blob/main/WHATS_NEW
[3] https://pve.proxmox.com/mediawiki/index.php?title=Multipath&oldid=12039#FC/SAS-specific_configuration
Signed-off-by: Friedrich Weber <f.weber@proxmox.com>
Link: https://lore.proxmox.com/20250709141034.169726-2-f.weber@proxmox.com
Introduce qemu_blockdev_options() plugin method.
In terms of the plugin API only, adding the qemu_blockdev_options()
method is a fully backwards-compatible change. However, once
qemu-server switches to '-blockdev', plugins for which the default
implementation is not sufficient will not be usable for virtual
machines anymore.
Therefore, this is intended for the next major release, Proxmox VE 9.
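A hedged sketch of what a path-based default implementation could
look like (signature and helper usage assumed):

    sub qemu_blockdev_options {
        my ($class, $scfg, $storeid, $volname, $machine_version, $options) = @_;
        my ($path) = $class->filesystem_path($scfg, $volname, $options->{'snapshot-name'});
        if ($path =~ m|^/dev/|) {
            return { driver => 'host_device', filename => $path };
        } elsif ($path =~ m|^/|) {
            return { driver => 'file', filename => $path };
        }
        die "unable to determine blockdev options for '$volname'\n";
    }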
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
[FG: fixed typo, add paragraph break]
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>