useful for rollback, so that only the required replication snapshots
can be removed, and it's possible to abort early without deleting any
replication snapshots if there are other non-replication snasphots
blocking rollback.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
by not dying when the dataset is already unmounted. Can be triggered
for a container by doing two rollbacks in a row.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Bumps APIVER to 9 and resets APIAGE to zero.
The import methods (volume_import, volume_import_formats):
These additionally get the '$snapshot' parameter which is
already present on the export side as an informational piece
to know which of the snapshots is the *current* one.
This parameter is inserted *in the middle* of the current
parameters, so the import & export format methods now have
the same signatures.
The current "disk" state will be set to this snapshot.
This, too, is required for our btrfs implementation.
`volume_import_formats` can obviously not make much
*use* of this parameter, but it'll still be useful to know
that the information is actually available in the import
call, so its presence will be checked in the btrfs
implementation.
Currently this is intended to be used for btrfs send/recv
support, which in theory could also get additional metadata
similar to how we do the "tar+size" format, however, we
currently only really use this within this repository in
storage_migrate() which has this information readily
available anyway.
On the export side (volume_export, volume_export_formats):
The `$with_snapshots` option is now "defined" to be an
ordered array of snapshots to include, as a hint for
storages which need this. (As of the next commit this is
only btrfs, and only when also specifying a base snapshot,
which is a case we can currently not run into except on the
command line interface.)
The current providers of the `with_snapshot` option will
still treat it as a boolean (since eg. for ZFS you cannot
really "skip" snapshots AFAIK).
This is mainly intended for storages which do not have a
strong association between snapshots and the originals, or
an ordering (eg. btrfs and lvm-thin allow creating
arbitrary snapshot trees, and with btrfs you can even
create a "circular" connection between subvolumes, also we
could consider reflink based copies snapshots on xfs in
the future maybe?)
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
A restore to ZFS for a container which has a volume (rootfs / mount
point) of size 0 failed because the refquota property does not accept
'0k' but wants 'none' in that situation.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Early return when mounted heuristics returns true, that allows to get
rid of an indentation level.
Moving the heuristic out makes the activate method smaller and easier
to grasp
Best viewed with ignoring whitespace changes (`git show -w`).
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
highly unlikely to fail in our setups, most realistic case is when
procfs is not mounted at /proc, which breaks much else anyway and is
a requirement
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
this was mistakenly done as the procfs code uses it and it was
assumed we need to decode this too to get both in the same
encoding-space and thus correct comparission.
But only procfs has that encoding, we don't have it for pool values
in the storage config, so we must not do a decode on that value, that
could potentially break.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
This commit is a small performance optimization to the previous one:
`zpool list` is cheaper than `zpool import -d /dev..` (the latter
scans the disks in the provided directory for zfs signatures,
unconditionally)
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
This patch addresses an issue we recently saw on a production machine:
* after booting a ZFS pool failed to get imported (due to an empty
/etc/zfs/zpool.cache)
* pvestatd/guest-startall eventually tried to import the pool
* the pool was imported, yet the datasets of the pool remained
not mounted
A bit of debugging showed that `zpool import <poolname>` is not
atomic, in fact it does fork+exec `mount` with appropriate parameters.
If an import ran longer than the hardcoded timeout of 15s, it could
happen that the pool got imported, but the zpool command (and its
forks) got terminated due to timing out.
reproducing this is straight-forward by setting (drastic) bw+iops
limits on a guests' disk (which contains a zpool) - e.g.:
`qm set 100 -scsi1 wd:vm-100-disk-1,iops_rd=10,iops_rd_max=20,\
iops_wr=15,iops_wr_max=20,mbps_rd=10,mbps_rd_max=15,mbps_wr=10,\
mbps_wr_max=15`
afterwards running `timeout 15 zpool import <poolname>` resulted in
that situation in the guest on my machine
The patch changes the check in activate_storage for the ZFSPoolPlugin,
to check if any dataset below the 'pool' (which can also be a sub-dataset)
is mounted by parsing /proc/mounts:
* this is cheaper than running `zfs get` or `zpool list`
* it catches a properly imported and mounted pool in case the
root-dataset has 'canmount' set to off (or noauto), as long
as any dataset below is mounted
After trying to import the pool, we also run `zfs mount -a` (in case
another check of /proc/mounts fails).
Potential for regression:
* running `zfs mount -a` is problematic, if a dataset is manually
umounted after booting (without setting 'canmount')
* a pool without any mounted dataset (no mountpoint property set and
only zvols) - will result in repeated calls to `zfs mount -a`
both of the above seem unlikely and should not occur, if using our
tooling.
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
as described in the zfs bug https://github.com/openzfs/zfs/issues/10931
the kernel keeps around cached data from mmaps after a rollback, thus
having invalid data in files that were allegedly rolled back
to workaround this (until a real fix comes along), we unmount the subvol,
invalidating the kernel cache anyway
the dataset gets mounted on the next 'activate_volume' again
Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
When using the path to request properties, and no ZFS file system is mounted
at that path, ZFS will fall back to the parent filesystem:
> # zfs unmount myzpool/subvol-172-disk-0
> # zfs get mounted /myzpool/subvol-172-disk-0
> NAME PROPERTY VALUE SOURCE
> myzpool mounted yes -
> # zfs get mounted myzpool/subvol-172-disk-0
> NAME PROPERTY VALUE SOURCE
> myzpool/subvol-172-disk-0 mounted no -
Thus, we cannot use the path and need to use the dataset directly.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
quiet takes care of both the error and success case.
Without this, there are lines like:
myzpool/vm-4352-disk-0@__replicate_4352-7_1601538554__ name myzpool/vm-4352-disk-0@__replicate_4352-7_1601538554__ -
in the log if the dataset exists, and this information is
already present in more readable form.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
commit 815df2dd08 introduced a small issue
when activating linked clone volumes - the volname passed contains
basevol/subvol, which needs to be translated to subvol.
using the path method should be a robust way to get the actual path for
activation.
Found and tested by building the package as root (otherwise the zfs
regressiontests are skipped).
Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
Makes it possible to clone and start a container whose
ZFS subvols are not yet mounted for some reason. If a
subvol cannot be mounted, there's a better error now:
zfs error: cannot mount '/myzpool/subvol-103-disk-0': directory is not empty
Previously, cloning would quietly do an "empty" clone,
and startup would fail with:
mount_autodev: 1074 Permission denied - Failed to create "/dev" directory
lxc_setup: 3238 Failed to mount "/dev"
do_start: 1224 Failed to setup container "103"
__sync_wait: 41 An error occurred in another process (expected sequence number 5)
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
we don't even know whether $snap exists at all, so the old variant could
be rather misleading..
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
ZFS supports the -p flag in the list command since a few years now.
Let us use the real byte values and avoid the error prone calculation
from human readable numbers that can lead to incorrect numbers if the
reported human readable value is a rounded number.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Getting the volume sizes as byte values instead of converted to human
readable units helps to avoid rounding errors in the further processing
if the volume size is more on the odd side.
The `zfs list` command supports the -p(arseable) flag since a few years
now.
When returning the size in bytes there is no calculation performed and
thus we need to explicitly cast the size to an integer before returning
it.
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
and also return the ID of the allocated volume. This option
allows plugins to choose a new name if there is a collision.
In storage_migrate, the API version of the receiving side is checked.
In Storage.pm's volume_import, when a plugin returns 'undef',
it can be assumed that the import with the requested volid was
successful (it should've died otherwise) and so volid is returned.
This is done for backwards compatibility with foreign plugins.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
We can use 'list_images' to get the desired volume IDs in
'find_free_diskname' for most plugins. For the two LVM plugins, 'list_images'
potentially skips untagged volumes, so we keep the custom version. For the
RBD plugin, 'list_images' is much more costly than the custom version, so we
keep the custom version.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
The size is required to be a multiple of volblocksize. Make sure
that the requirement is always met, so ZFS won't complain when we do
things like 'qm resize 102 scsi1 +0.01G'.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
When adding a zfspool storage with 'pvesm add' the mount point is now
added automatically to the storage configuration if it can be
determined. path() does not assume the default mountpoint anymore,
fixing 2085.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
and actually do that not just for creating zvols, but also when
activating them. this should fix a range of issues/races that sometimes
occured on bootup, snapshot rollback or similar operations.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
The underlying issue is that a zpool can get imported only once, so
we first check if it's in `zpool list`, and thus imported, and only
if it does not shows up there we try to import it.
But, this can race with either:
* parallel running activate_storage call, through CLI/API/daemon
* a zpool import from an admin (a bit unlikely, but hey that's the
thing with race conditions ;))
So refactor the "is pool imported" check into a closure, and call it
addditionally if the import failed, and silent the error if the pool
is now listed, and thus imported. This makes it a little bit nicer to
read too, IMO.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
during storage activation.
for pools that don't get imported at boot (e.g. because their vdevs are
not available when zfs-import-*.service runs) it is fatal to include
them in the cachefile, for those that do get imported at boot this code
should never run anyway as they are already imported.
in any case, a fallback to import without cachefile is the safe variant.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
fixes the 'cannot create 'nvme/foo': volume size must be a multiple of
volume block size' error by always rounding the size up to the next 1M
boundary. this is a workaround until
https://github.com/zfsonlinux/zfs/issues/8541 is solved.
the current manpage says 128k is the maximum blocksize, but a local test
showed that values up to 1M are allowed. it might be possible to
increase it even further (see f1512ee61).
Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
`zfs create` add the creation job in a worker queue,
which should normally execute instantly. But there are circumstances
where the job will take a while to get processed.
If this is the case udev settle will see no dev in the queue and the program
will continue without an allocated dev.
The busy waiting is not best practice but the only way to be sure,
that the block device exists.
Takes an operation, an optional requested bandwidth
limit override, and a list of storages involved in the
operation and lowers the requested bandwidth against global
and storage-specific limits unless the user has permissions
to change those.
This means:
* Global limits apply to all users without Sys.Modify on /
(as they can change datacenter.cfg options via the API).
* Storage specific limits apply to users without
Datastore.Allocate access on /storage/X for any involved
storage X.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>