When working with several ZFS over iSCSI / LIO storages, we might lookup
between them with less than 15 sec interval.
Previously, the cache of the previous storage was used, which was breaking
disk move for example
Signed-off-by: Daniel Berteaud <daniel@firewall-services.com>
The common ZFSPlugin was missing volume name parsing
in a few places. This was not a problem for standard
volumes, but broke functionnalities (like resize,
snapshot, rollback) with linked clones as the name of
the zvol must be extracted from the entry in the config
(remove base-X-disk-Y prefix)
Signed-off-by: Daniel Berteaud <daniel@firewall-services.com>
In the default config, emulate_tpu is set to 0, which disables
unmap support. Once enabled, trim can run from guest to reclaim free
space.
Signed-off-by: Daniel Berteaud <daniel@firewall-services.com>
It's not needed, LIO sees the new size automatically.
And it was broken anyway. Partially fix#2335
Signed-off-by: Daniel Berteaud <daniel@firewall-services.com>
Using the json output, as suggested by Thomas, we now die if the decoding
fails and, if not, all return values are set to the corresponding decoded
values. That should prevent any unforeseen null size values, except if
qemu-img info reports it, which we then consider as valid.
Signed-off-by: Tim Marx <t.marx@proxmox.com>
$1 and $2 get set to undef from the vmid filter regex, so we have to do
the name/format regex after, else we get errors like:
'use of unitiialized value $1[...]'
and the listing is empty
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
The patch uses the value from the field 'stored' if it is available.
In Ceph 14.2.2 the storage calculation changed to a per pool basis. This
introduced an additional field 'stored' that holds the amount of data
that has been written to the pool. While the field 'used' now has the
data after replication for the pool.
The new calculation will be used only if all OSDs are running with the
on-disk format introduced by Ceph 14.2.2.
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
To maintain full (backwards) compatibility, leave the type name as
'iso' - this makes this patch work without changing every consumer of
storage APIs.
Note that currently these files can only be attached as a CDROM/DVD
drive, so USB-only images can be uploaded but might not work in VMs.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
and actually do that not just for creating zvols, but also when
activating them. this should fix a range of issues/races that sometimes
occured on bootup, snapshot rollback or similar operations.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
plugins can still override list_volumes if they want separate methods to
list rootdir and images content.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
since list_volumes is only supposed to be called with filtered content
types, this should ensure that get_subdir is only called for plugins
that have a defined 'path' property, like the old code in
PVE::Storage::template_list did.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
checking '$server:$subdir' is too strict to work in all cirumcstances,
e.g. adding/removing a monitor would mean that it is not the same
anymore, same if one is adding/removing the ports from the config
check only if the subdir is the same and if it is a cephfs
this way, it still returns true if someone changes the config
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
don't require any specific file types, if something is here which can
be requested to delete over API/CLI it either comes from an API/CLI
operation and should be thus OK to delete, else a user caused the
creation of the special file and it either works and all is good or
the user gets notified as we check if unlink succeeded anyway
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Symlinks with a non-existing target fail Perls '-f' test and were thus
not deleteable via the API (failing with '$path does not exist').
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Since ceph-fuse is called directly in the CephFS storage plugin, which
can not process the _netdev option, mounting the CephFS storage fails
when fuse is set in the storage.cfg.
This patch moves the _netdev option into the else part of the if fuse is
set statement. _netdev is only added if the CephFS kernel client mounts
the storage.
It seems _netdev is not needed anyway for the fuse mount, as the
connection is closed, once the fuse process gets killed on shutdown.
Signed-off-by: Alwin Antreich <a.antreich@proxmox.com>
Adds a fallback to 'Plugin::path' in the default implementation of
'map_volume' to simplify a common case of calling 'map_volume' followed
by a defined-check and a call to path if it is not. The path is now
always returned if the plugin in question does not override
'map_volume'.
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
The underlying issue is that a zpool can get imported only once, so
we first check if it's in `zpool list`, and thus imported, and only
if it does not shows up there we try to import it.
But, this can race with either:
* parallel running activate_storage call, through CLI/API/daemon
* a zpool import from an admin (a bit unlikely, but hey that's the
thing with race conditions ;))
So refactor the "is pool imported" check into a closure, and call it
addditionally if the import failed, and silent the error if the pool
is now listed, and thus imported. This makes it a little bit nicer to
read too, IMO.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
during storage activation.
for pools that don't get imported at boot (e.g. because their vdevs are
not available when zfs-import-*.service runs) it is fatal to include
them in the cachefile, for those that do get imported at boot this code
should never run anyway as they are already imported.
in any case, a fallback to import without cachefile is the safe variant.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
fixes the 'cannot create 'nvme/foo': volume size must be a multiple of
volume block size' error by always rounding the size up to the next 1M
boundary. this is a workaround until
https://github.com/zfsonlinux/zfs/issues/8541 is solved.
the current manpage says 128k is the maximum blocksize, but a local test
showed that values up to 1M are allowed. it might be possible to
increase it even further (see f1512ee61).
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
When trying to create a qcow2 disk image with a size larger than available on the
storage, this will fail.
As qemu-img does not clean up the disk afterwards, it needs to be deleted
explicitly. Further, the vmid folder is cleaned up once it is empty.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
will be used to contain files which can be executed as hookscripts or
contain custom cloud-init configs
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
use the port where the main glusterfs daemon listens on as ping port,
this one is also used by QEMU as default.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Remove directories if they are empty, which can happen if all images
from a VM got deleted, e.g., after destroying said VM.
Signed-off-by: Christian Ebner <c.ebner@proxmox.com>
LVMPlugin->volume_import (used by storage_migrate on either offline
migration with local disks, or online migration with storage-only
referenced disks) passed 'conv=sparse' to `dd`. This can lead to
data-corruption, if the target volume is not zero-initialized.
dropping the sparse argument completely would fix the problem, but
breaks keeping data sparse for LvmThinPlugin.
This patch moves the dd out into (LVM*) plugin specific sub so that
each can control the parameters.
Steps for reproducing the issue:
* create a cluster with (at least) 2 nodes A and B, with a free
disk-device (/dev/sdx)
* write a recognizable pattern to /dev/sdx on B:
`dd if=/dev/zero bs=10M | tr '\000' '\255' | dd of=/dev/sdb bs=10M`
(would be grateful for alternatives to the dd| tr| dd)
* on both A and B create a lvm-vg (pvcreate, vgcreate)
* add it as _not_ shared storage, which is available on nodes A and B
* create a small guest on A
* fill a file in the guest with zeros
`dd if=/dev/zero of=/zerofil bs=10M`
* stop the guest, migrate it to B
* start the guest - check that the file `/zerofil` contains `ad`
instead of `00`
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
From `man 8 lvchange`:
--refresh
If the logical volume is active, reload its metadata. This is not
necessary in normal operation, but may be useful ... if you're doing
clustering manually without a clustered lock manager.
Fixes migration in a shared LVM (iscsi) setup, where a disk gets resized on one
node A and the guest is afterwards migrated to another node B: B still presents
the old size to the guest, leading to data corruption.
It is necessary to run `lvchange` twice because the options `-ay` and
`--refresh` are mutually exclusive.
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Without volume_size_info a Storage plugin falls back to the Implementation
in PVE/Storage/Plugin.pm, which relies on `qemu-img info`.
`qemu-img info` returns wrong results on a node in the case of shared volume
groups (e.g. when sharing disks via iSCSI), if a disk was resized on another
node (it lseeks to the end of the block-device, and this yields the old size).
Using lvs directly fixes the issue, since the LVM metadata gets updated when
invoking lvs.
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
it is not a storage plugin, and it makes more sense to have it
top-level, but there we cannot name it CephTools because of the
existing ones in pve-manager
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
As we mount this manually and thus systemd doesn't know about any
dependency for cephFS mounts, this got umounted only at the last
stage of shutdown, where network wasn't active anymore.
But, CephFS needs to be connected to an active MDS for a clean
unmount so without network this mount would delay shutdown for quite
a bit, until after some minutes systemd gave up and forced unmount.
So tell systemd that this mount requires network, which can be done
with the '_netdev'[0] mount option, that lucky for us can be also
passed to a mount call and isn't only available for fstab.
with this a mount gets, among others:
> Wants=network-online.target
> Before=umount.target remote-fs.target
> After=remote-fs-pre.target system.slice network.target network-online.target -.mount
Which does the trick for us.
[0]: https://www.freedesktop.org/software/systemd/man/systemd.mount.html#_netdev
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This is important for shared LVM storages. As with deletes and
creates of images, as else we may have not the up-to-date metadata
and extents may get reused if another node created an image during
the same time, for example.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This allows to request a mapped device/path explicitly, regardles of
the storage option, eg. krbd option in the RBDplugin.
Bump of the storage ABI => 2
Co-authored-by: Alwin Antreich <a.antreich@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>