Commit Graph

99 Commits

8c20d8afa3 plugin: add volume_snapshot_info function
which allows for better choices of common replication snapshots.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-11-08 10:35:53 +01:00
9a5d50950c zfspool: add blockers parameter to volume_snapshot_is_possible
useful for rollback, so that only the required replication snapshots
can be removed, and it's possible to abort early without deleting any
replication snapshots if there are other non-replication snapshots
blocking rollback.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-11-08 10:34:00 +01:00
ac5c1af57c zfspool: add zfs_get_sorted_snapshot_list helper
replacing the current zfs_get_latest_snapshot. For
volume_snapshot_list, ignore errors as before.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-11-08 10:34:00 +01:00
ab3516a6d7 zfs: fix unmount request
by not dying when the dataset is already unmounted. Can be triggered
for a container by doing two rollbacks in a row.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-08-12 11:48:42 +02:00
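
The idea of the fix, as a minimal Perl sketch (helper name and the 'mounted'-property check are illustrative, not the actual plugin code):

    use strict;
    use warnings;
    use PVE::Tools qw(run_command);

    # Only issue the unmount if the dataset is still mounted, so a repeated
    # unmount request (e.g. two rollbacks in a row) does not die.
    sub unmount_dataset {
        my ($dataset) = @_;

        my $mounted = '';
        run_command(
            ['zfs', 'get', '-H', '-o', 'value', 'mounted', $dataset],
            outfunc => sub { $mounted = shift },
        );
        return if $mounted ne 'yes';    # already unmounted, nothing to do

        run_command(['zfs', 'unmount', $dataset]);
    }
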
3cc29a0487 bump storage API: update import/export methods
Bumps APIVER to 9 and resets APIAGE to zero.

The import methods (volume_import, volume_import_formats):

These additionally get the '$snapshot' parameter which is
already present on the export side as an informational piece
to know which of the snapshots is the *current* one.
This parameter is inserted *in the middle* of the current
parameters, so the import & export format methods now have
the same signatures.
The current "disk" state will be set to this snapshot.
This, too, is required for our btrfs implementation.
  `volume_import_formats` can obviously not make much
*use* of this parameter, but it'll still be useful to know
that the information is actually available in the import
call, so its presence will be checked in the btrfs
implementation.

Currently this is intended to be used for btrfs send/recv
support, which in theory could also get additional metadata
similar to how we do the "tar+size" format. However, we
currently only really use this within this repository, in
storage_migrate(), which has this information readily
available anyway.

On the export side (volume_export, volume_export_formats):

The `$with_snapshots` option is now "defined" to be an
ordered array of snapshots to include, as a hint for
storages which need this. (As of the next commit this is
only btrfs, and only when also specifying a base snapshot,
which is a case we can currently not run into except on the
command line interface.)
  The current providers of the `with_snapshot` option will
still treat it as a boolean (since e.g. for ZFS you cannot
really "skip" snapshots AFAIK).
  This is mainly intended for storages which do not have a
strong association between snapshots and the originals, or
an ordering (e.g. btrfs and lvm-thin allow creating
arbitrary snapshot trees, and with btrfs you can even
create a "circular" connection between subvolumes; we might
also consider reflink-based copies on xfs to be snapshots
in the future.)

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2021-06-23 20:20:31 +02:00
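
A small sketch of how an export implementation might accept the redefined `$with_snapshots` value, either as the old-style boolean or as the new ordered list (the function name is made up for illustration):

    use strict;
    use warnings;

    # Normalize $with_snapshots: an array reference is the new "ordered list
    # of snapshots" form, any other true value keeps the old boolean meaning.
    sub snapshot_list_from_option {
        my ($with_snapshots) = @_;

        return [] if !$with_snapshots;                              # no snapshots at all
        return $with_snapshots if ref($with_snapshots) eq 'ARRAY';  # explicit, ordered list
        return undef;   # plain boolean: "include snapshots", ordering left to the storage
    }
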
8c858f7eeb fix #3345: zfs: restore container volume to ZFS with size 0
A restore to ZFS for a container which has a volume (rootfs / mount
point) of size 0 failed because the refquota property does not accept
'0k' but wants 'none' in that situation.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-12 14:37:50 +02:00
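
The fix boils down to mapping a requested size of 0 to the special 'none' value; a minimal sketch (helper name is illustrative):

    use strict;
    use warnings;

    # ZFS rejects 'refquota=0k'; "no quota" is spelled 'none' instead.
    sub refquota_value {
        my ($size_kib) = @_;    # requested size in KiB, 0 means unlimited
        return $size_kib ? "${size_kib}k" : 'none';
    }

    # e.g. used when creating a container subvol:
    #   zfs create -o refquota=<value> myzpool/subvol-100-disk-0
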
26de022b56 zpool: activate: move mount check out and make program flow easier
Early return when the mounted heuristic returns true; that allows
getting rid of an indentation level.

Moving the heuristic out makes the activate method smaller and easier
to grasp.

Best viewed with ignoring whitespace changes (`git show -w`).

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-19 15:21:21 +01:00
9440330aba zpool: activate: don't eval procfs read, if it fails it should be fatal
highly unlikely to fail in our setups; the most realistic case is
procfs not being mounted at /proc, which is a requirement anyway and
would break much else

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-19 15:06:22 +01:00
5b715fd984 zpool: activate: drop intermediate state variable, return directly
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-19 15:04:53 +01:00
c31aee36d8 zpool: avoid wrong mount-decoding of dataset
this was mistakenly done because the procfs code uses it, and it was
assumed we need to decode this too to get both into the same
encoding-space and thus a correct comparison.

But only procfs has that encoding; we don't have it for pool values
in the storage config, so we must not decode that value, as doing so
could potentially break things.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-19 15:01:39 +01:00
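
The asymmetry as a sketch: only the /proc/mounts side uses octal escapes and gets decoded, the configured value is compared verbatim (helper names are illustrative):

    use strict;
    use warnings;

    # /proc/mounts escapes special characters octally (e.g. "\040" for a space).
    sub decode_procfs_name {
        my ($name) = @_;
        $name =~ s/\\(\d{3})/chr(oct($1))/ge;
        return $name;
    }

    # The configured dataset name from storage.cfg is NOT decoded before comparing.
    sub mountpoint_matches {
        my ($procfs_what, $configured_dataset) = @_;
        return decode_procfs_name($procfs_what) eq $configured_dataset;
    }
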
e74b9dd79d zfspoolplugin: check if imported before importing
This commit is a small performance optimization to the previous one:
`zpool list` is cheaper than `zpool import -d /dev..` (the latter
scans the disks in the provided directory for zfs signatures,
unconditionally)

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2021-02-19 14:22:43 +01:00
56a7637adb zfspoolplugin: check if mounted instead of imported
This patch addresses an issue we recently saw on a production machine:
* after booting a ZFS pool failed to get imported (due to an empty
  /etc/zfs/zpool.cache)
* pvestatd/guest-startall eventually tried to import the pool
* the pool was imported, yet the datasets of the pool remained
  not mounted

A bit of debugging showed that `zpool import <poolname>` is not
atomic; in fact, it does fork+exec `mount` with appropriate parameters.
If an import ran longer than the hardcoded timeout of 15s, it could
happen that the pool got imported, but the zpool command (and its
forks) got terminated due to timing out.

Reproducing this is straightforward by setting (drastic) bw+iops
limits on a guest's disk (which contains a zpool) - e.g.:
`qm set 100 -scsi1 wd:vm-100-disk-1,iops_rd=10,iops_rd_max=20,\
iops_wr=15,iops_wr_max=20,mbps_rd=10,mbps_rd_max=15,mbps_wr=10,\
mbps_wr_max=15`
Afterwards, running `timeout 15 zpool import <poolname>` resulted in
that situation in the guest on my machine.

The patch changes the check in activate_storage for the ZFSPoolPlugin,
to check if any dataset below the 'pool' (which can also be a sub-dataset)
is mounted by parsing /proc/mounts:
* this is cheaper than running `zfs get` or `zpool list`
* it catches a properly imported and mounted pool in case the
  root-dataset has 'canmount' set to off (or noauto), as long
  as any dataset below is mounted

After trying to import the pool, we also run `zfs mount -a` (in case
another check of /proc/mounts fails).

Potential for regression:
* running `zfs mount -a` is problematic if a dataset is manually
  unmounted after booting (without setting 'canmount')
* a pool without any mounted dataset (no mountpoint property set and
  only zvols) will result in repeated calls to `zfs mount -a`

Both of the above seem unlikely and should not occur when using our
tooling.

Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2021-02-19 14:18:36 +01:00
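
A rough sketch of the new heuristic: scan /proc/mounts for any zfs filesystem whose source is the configured pool/dataset or anything below it (decoding of the octal escapes in the procfs fields is omitted here):

    use strict;
    use warnings;

    sub pool_has_mounted_dataset {
        my ($pool) = @_;    # may also be a sub-dataset, e.g. 'rpool/data'

        open(my $fh, '<', '/proc/mounts')
            or die "unable to read /proc/mounts: $!\n";

        while (my $line = <$fh>) {
            my ($what, undef, $fstype) = split(/\s+/, $line);
            next if !defined($fstype) || $fstype ne 'zfs';
            # the configured dataset itself, or any dataset below it
            return 1 if $what eq $pool || $what =~ m!^\Q$pool\E/!;
        }
        return 0;
    }
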
a2d747a118 zfspoolplugin: activate_storage: minor cleanup
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2021-02-19 14:18:06 +01:00
26a8f21ab2 add workaround for zfs rollback bug
as described in the zfs bug https://github.com/openzfs/zfs/issues/10931,
the kernel keeps around cached data from mmaps after a rollback, thus
leaving invalid data in files that were allegedly rolled back.

To work around this (until a real fix comes along), we unmount the subvol,
invalidating the kernel cache anyway.

The dataset gets mounted again on the next 'activate_volume'.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-01-26 18:32:47 +01:00
f3ccd0ef3d plugin: hooks: add explicit returns
to avoid returning something unexpected. Finish what
afeda18256 already started for all the other
plugins. At least for ZFS's on_add_hook this is necessary (adding a ZFS storage
currently fails as reported here [0]), but it cannot hurt
in the other places either as the only hooks we expect to return something
currently are PBS's on_add_hook and on_update_hook.

[0]: https://forum.proxmox.com/threads/gui-add-zfs-storage-verification-failed-400-config-type-check-object-failed.79734/

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-11-27 10:45:42 +01:00
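
The change is essentially about ending each hook with a bare `return`; a sketch with the body elided (parameter list as commonly used by storage plugin hooks):

    use strict;
    use warnings;

    sub on_add_hook {
        my ($class, $storeid, $scfg, %param) = @_;

        # ... ZFS-specific setup would go here ...

        return;   # explicit: never leak the value of the last expression
    }
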
3893d2755e fix volume activation for ZFS subvols
When using the path to request properties, and no ZFS file system is mounted
at that path, ZFS will fall back to the parent filesystem:

> # zfs unmount myzpool/subvol-172-disk-0
> # zfs get mounted /myzpool/subvol-172-disk-0
> NAME     PROPERTY  VALUE    SOURCE
> myzpool  mounted   yes      -
> # zfs get mounted myzpool/subvol-172-disk-0
> NAME                       PROPERTY  VALUE    SOURCE
> myzpool/subvol-172-disk-0  mounted   no       -

Thus, we cannot use the path and need to use the dataset directly.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-11-22 18:36:23 +01:00
0fd0a6270b avoid output of zfs get command on volume import
quiet takes care of both the error and success case.
Without this, there are lines like:
myzpool/vm-4352-disk-0@__replicate_4352-7_1601538554__	name	myzpool/vm-4352-disk-0@__replicate_4352-7_1601538554__	-
in the log if the dataset exists, and this information is
already present in more readable form.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-10-28 14:05:49 +01:00
59fdc2b71e fix regression in zfs volume activation
commit 815df2dd08 introduced a small issue
when activating linked clone volumes - the volname passed contains
basevol/subvol, which needs to be translated to subvol.

using the path method should be a robust way to get the actual path for
activation.

Found and tested by building the package as root (otherwise the zfs
regression tests are skipped).

Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2020-09-29 18:52:32 +02:00
e3eb131ec5 zfs pool: clean up use statements
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-09-29 05:08:57 +02:00
815df2dd08 ZFS: mount subvols in activate_volume
Makes it possible to clone and start a container whose
ZFS subvols are not yet mounted for some reason. If a
subvol cannot be mounted, there's a better error now:
zfs error: cannot mount '/myzpool/subvol-103-disk-0': directory is not empty

Previously, cloning would quietly do an "empty" clone,
and startup would fail with:
mount_autodev: 1074 Permission denied - Failed to create "/dev" directory
lxc_setup: 3238 Failed to mount "/dev"
do_start: 1224 Failed to setup container "103"
__sync_wait: 41 An error occurred in another process (expected sequence number 5)

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-09-29 05:07:15 +02:00
d0eaf18571 zfs: rollback: improve error message
we don't even know whether $snap exists at all, so the old variant could
be rather misleading.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-09-23 15:11:17 +02:00
c8eb017867 zfs: handle unexpectedly missing snapshots better
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-09-23 15:11:17 +02:00
e05113fbe5 ZFS: use -p flag where possible
Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2020-04-09 10:20:06 +02:00
3881e68025 ZFS: use -p flag and remove zfs_parse_size
ZFS has supported the -p flag in the list command for a few years now.
Let us use the real byte values and avoid the error-prone calculation
from human-readable numbers, which can lead to incorrect results if the
reported human-readable value is a rounded number.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2020-04-09 10:20:06 +02:00
d99de0f898 ZFSPoolPlugin: fix #2662 get volume size correctly
Getting the volume sizes as byte values instead of values converted to
human-readable units helps to avoid rounding errors in the further
processing if the volume size is more on the odd side.

The `zfs list` command has supported the -p(arseable) flag for a few
years now.
When returning the size in bytes, no calculation is performed that would
implicitly turn the string into a number, so we need to explicitly cast
the size to an integer before returning it.

Signed-off-by: Aaron Lauterer <a.lauterer@proxmox.com>
2020-04-09 10:19:59 +02:00
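
A sketch of reading the exact byte value via `zfs list -p` and returning it as a number (helper name is illustrative):

    use strict;
    use warnings;
    use PVE::Tools qw(run_command);

    # -p prints exact byte values, -H drops the header, so the output is a
    # single integer string that only needs a numeric cast.
    sub zvol_size_bytes {
        my ($dataset) = @_;

        my $size;
        run_command(
            ['zfs', 'list', '-p', '-H', '-o', 'volsize', $dataset],
            outfunc => sub { $size = shift },
        );
        die "could not read volsize of '$dataset'\n"
            if !defined($size) || $size !~ /^\d+$/;

        return int($size);
    }
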
a97d3ee49f Introduce allow_rename parameter for pvesm import and storage_migrate
and also return the ID of the allocated volume. This option
allows plugins to choose a new name if there is a collision.

In storage_migrate, the API version of the receiving side is checked.

In Storage.pm's volume_import, when a plugin returns 'undef',
it can be assumed that the import with the requested volid was
successful (it should've died otherwise) and so volid is returned.
This is done for backwards compatibility with foreign plugins.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-04-09 09:41:01 +02:00
75815bf556 Check whether 'zfs get mountpoint' returns a valid absolute path
The command 'zfs get mountpoint' can return 'none' and so 'mountpoint
none' was written to storage.cfg, which would block the fall-back to
using the default mount point when requesting a path, see [0].

[0]: https://forum.proxmox.com/threads/zfs-backup-with-snapshot-mode-fails.61927/#post-284123

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2020-02-18 13:26:46 +01:00
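
A sketch of the validity check: only an absolute path is accepted as mount point, anything else (e.g. 'none' or 'legacy') is treated as unusable (helper name is illustrative):

    use strict;
    use warnings;
    use PVE::Tools qw(run_command);

    sub usable_mountpoint {
        my ($pool) = @_;

        my $value;
        run_command(
            ['zfs', 'get', '-H', '-o', 'value', 'mountpoint', $pool],
            outfunc => sub { $value = shift },
        );

        # 'none', 'legacy' or '-' must not end up as 'mountpoint' in storage.cfg
        return (defined($value) && $value =~ m!^/!) ? $value : undef;
    }
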
a44c0147bc Use a common interface for find_free_diskname
We can use 'list_images' to get the desired volume IDs in
'find_free_diskname' for most plugins. For the two LVM plugins, 'list_images'
potentially skips untagged volumes, so we keep the custom version. For the
RBD plugin, 'list_images' is much more costly than the custom version, so we
keep the custom version.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2019-12-12 12:52:43 +01:00
d3e3e5d6bd When resizing a ZFS volume, align size to 1M
The size is required to be a multiple of volblocksize. Make sure
that the requirement is always met, so ZFS won't complain when we do
things like 'qm resize 102 scsi1 +0.01G'.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2019-12-09 14:07:50 +01:00
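
The alignment itself is simple integer arithmetic; a sketch operating on a size given in KiB:

    use strict;
    use warnings;

    # Round a size in KiB up to the next 1 MiB (1024 KiB) boundary, so the
    # resulting zvol size stays a multiple of the volblocksize.
    sub align_size_up_1m {
        my ($size_kib) = @_;
        my $padding = (1024 - $size_kib % 1024) % 1024;
        return $size_kib + $padding;
    }

    # e.g. align_size_up_1m(1025) == 2048, align_size_up_1m(2048) == 2048
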
528aa0eefb followup: remove no-op JSONSchema::check_format on mountpoint
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-11-21 12:43:55 +01:00
dcefd9dd28 fix #2085: add mountpoint property for non-default ZFS pool MPs
When adding a zfspool storage with 'pvesm add' the mount point is now
added automatically to the storage configuration if it can be
determined. path() does not assume the default mountpoint anymore,
fixing #2085.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2019-11-21 12:42:38 +01:00
4966c8869e Introduce zfs_get_properties helper
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2019-11-14 12:38:25 +01:00
3824ba88a9 Whitespace cleanup
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2019-10-08 11:22:12 +02:00
56362cfb55 ZFS: refactor waiting for zvol symlinks
and actually do that not just when creating zvols, but also when
activating them. This should fix a range of issues/races that sometimes
occurred on bootup, snapshot rollback or similar operations.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2019-08-06 13:39:43 +02:00
b5c8278a3e zpool: handle race with other zpool imports
The underlying issue is that a zpool can get imported only once, so
we first check if it's in `zpool list`, and thus imported, and only
if it does not show up there do we try to import it.

But, this can race with either:
* parallel running activate_storage call, through CLI/API/daemon
* a zpool import from an admin (a bit unlikely, but hey that's the
  thing with race conditions ;))

So refactor the "is pool imported" check into a closure, call it
additionally if the import failed, and silence the error if the pool
is now listed, and thus imported. This makes it a little bit nicer to
read too, IMO.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-04-18 10:01:39 +02:00
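
A sketch of the "check, import, re-check" pattern with the check factored into a closure (command flags and helper names are illustrative, not the actual code):

    use strict;
    use warnings;
    use PVE::Tools qw(run_command);

    sub import_pool_if_needed {
        my ($pool) = @_;

        my $pool_imported = sub {
            my $found = 0;
            run_command(
                ['zpool', 'list', '-H', '-o', 'name', $pool],
                outfunc => sub { $found = 1 if $_[0] eq $pool },
                noerr => 1,    # a missing pool is not fatal here
            );
            return $found;
        };

        return if $pool_imported->();    # already imported, nothing to do

        eval { run_command(['zpool', 'import', '-d', '/dev/disk/by-id/', $pool]) };
        if (my $err = $@) {
            # a concurrent activate_storage (or an admin) may have imported the
            # pool in the meantime; only then can the error be ignored
            die $err if !$pool_imported->();
        }
    }
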
e9ab8ea313 zPool: fixup timeout setting for import
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-04-17 14:54:42 +00:00
a10695b4e8 zpool: cleanup zfs_request command a bit
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2019-04-17 14:31:05 +00:00
dc18abe07b ZFS Pool: improve error output from activate_storage
related to #2154 ("Buggy 'pvesm status' output"), which has some issues
on zpool list, but we do not see the error messages from that step.
2019-04-17 08:05:04 +00:00
c6f1315524 zfs: don't generate/update cachefile on pool import
during storage activation.

For pools that don't get imported at boot (e.g. because their vdevs are
not available when zfs-import-*.service runs) it is fatal to include
them in the cachefile; for those that do get imported at boot, this code
should never run anyway as they are already imported.

In any case, a fallback to importing without a cachefile is the safe variant.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2019-04-03 12:18:37 +02:00
cdef3abb25 workaround zfs create -V error for unaligned sizes
Fixes the 'cannot create 'nvme/foo': volume size must be a multiple of
volume block size' error by always rounding the size up to the next 1M
boundary. This is a workaround until
https://github.com/zfsonlinux/zfs/issues/8541 is solved.
The current manpage says 128k is the maximum blocksize, but a local test
showed that values up to 1M are allowed. It might be possible to
increase it even further (see f1512ee61).

Signed-off-by: Mira Limbeck <m.limbeck@proxmox.com>
2019-03-30 15:38:35 +01:00
c4a29df483 refactor finding next diskname for all plugins
Signed-off-by: Stoiko Ivanov <s.ivanov@proxmox.com>
2018-09-10 12:21:10 +02:00
9a75947b49 fix #1770: allow ex/import linked clones.
We encode the base-volume-name in the volname, which is not needed.
On ex/import we need the correct dataset name.
2018-05-22 11:36:36 +02:00
fc05c9a0d8 fixup error message typo: s/sould/should/
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2018-05-08 09:34:37 +02:00
cf8ded6a96 fix #1691: increase timeout in worker
A ZFS storage under heavy load can take more time.
2018-05-07 12:01:20 +02:00
894b9ecc1e fix #1691: replace udev check
`zfs create` adds the creation job to a worker queue,
which should normally execute instantly. But there are circumstances
where the job will take a while to get processed.
If this is the case, udev settle will see no dev in the queue and the
program will continue without an allocated dev.

The busy waiting is not best practice, but it is the only way to be sure
that the block device exists.
2018-05-07 12:01:20 +02:00
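
A sketch of the busy wait: poll for the zvol's device node with a short sleep instead of relying on udev settle alone (timeout and path layout are assumptions for illustration):

    use strict;
    use warnings;
    use Time::HiRes qw(usleep);

    sub wait_for_zvol_device {
        my ($pool, $volname, $timeout) = @_;
        $timeout //= 10;    # seconds

        my $path = "/dev/zvol/$pool/$volname";
        for (my $i = 0; $i < $timeout * 10; $i++) {
            return $path if -e $path;    # device (symlink) showed up
            usleep(100_000);             # 100 ms
        }
        die "timeout: device '$path' did not appear\n";
    }
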
9edb99a5a7 add Storage::get_bandwidth_limit helper
Takes an operation, an optional requested bandwidth
limit override, and a list of storages involved in the
operation and lowers the requested bandwidth against global
and storage-specific limits unless the user has permissions
to change those.
This means:
 * Global limits apply to all users without Sys.Modify on /
   (users with it can change datacenter.cfg options via the API anyway).
 * Storage specific limits apply to users without
   Datastore.Allocate access on /storage/X for any involved
   storage X.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2018-01-31 12:25:32 +01:00
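
The lowering rule as a pure-function sketch: a requested override only wins if the user may override limits, otherwise the configured limit caps it (semantics simplified, names illustrative):

    use strict;
    use warnings;

    # $requested    - optional override asked for by the caller (undef = no wish)
    # $configured   - limit from datacenter.cfg / storage.cfg (undef = unlimited)
    # $may_override - true if the user has the permission described above
    sub effective_bwlimit {
        my ($requested, $configured, $may_override) = @_;

        return $requested if $may_override;           # privileged: take it as-is
        return $configured if !defined($requested);   # no wish: use the configured limit
        return $requested if !defined($configured);   # nothing configured: keep the wish
        return $configured < $requested ? $configured : $requested;   # lower of the two
    }
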
7118dd916b PVE::Storage::storage_can_replicate - new helper 2017-06-27 06:17:58 +02:00
b43d0f3043 ZFSPoolPlugin.pm: remove unused code 2017-06-08 08:45:22 +02:00
8b622c2dff PVE::Storage::volume_snapshot_list - remove $prefix parameter
Always return the full list of snapshots. Users of this library can easily
filter with a simple 'grep' instead.
2017-06-07 06:20:07 +02:00
d390328bfd api: add import/export format querying 2017-05-12 14:42:17 +02:00