From 520309591712f72a36f87d086f005e9f2c21f388 Mon Sep 17 00:00:00 2001
From: Jason Ekstrand <jason@jlekstrand.net>
Date: Wed, 8 Jun 2022 10:21:42 -0500
Subject: [PATCH 3/5] dma-buf: Add an API for importing sync files (v10)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

This patch is analogous to the previous sync file export patch in that
it allows you to import a sync_file into a dma-buf. Unlike the previous
patch, however, this does add genuinely new functionality to dma-buf.
Without this, the only way to attach a sync_file to a dma-buf is to
submit a batch to your driver of choice which waits on the sync_file and
claims to write to the dma-buf. Even if said batch is a no-op, a submit
is typically way more overhead than just attaching a fence. A submit
may also imply extra synchronization with other work because it happens
on a hardware queue.

In the Vulkan world, this is useful for dealing with the out-fence from
vkQueuePresent. Current Linux window-systems (X11, Wayland, etc.) all
rely on dma-buf implicit sync. Since Vulkan is an explicit sync API, we
get a set of fences (VkSemaphores) in vkQueuePresent and have to stash
those as an exclusive (write) fence on the dma-buf. We handle it in
Mesa today with the above mentioned dummy submit trick. This ioctl
would allow us to set it directly without the dummy submit.
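
As a purely illustrative sketch (not part of this patch), userspace could
replace the dummy submit with a single ioctl call. The helper name and the
two fd parameters are made up, the sync_file fd is assumed to come from
somewhere like VK_KHR_external_fence_fd, and <linux/dma-buf.h> is assumed
to already carry the uapi additions from this series:

    #include <linux/dma-buf.h>
    #include <sys/ioctl.h>

    /* Attach an explicit-sync completion fence to a dma-buf as its
     * implicit write (exclusive) fence via the new ioctl.
     */
    static int attach_write_fence(int dmabuf_fd, int sync_file_fd)
    {
            struct dma_buf_import_sync_file import_arg = {
                    .flags = DMA_BUF_SYNC_WRITE,
                    .fd = sync_file_fd,
            };

            return ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &import_arg);
    }

Per this patch, DMA_BUF_SYNC_WRITE makes the imported fence the dma-buf's
exclusive fence, which is what the vkQueuePresent case above needs.
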
This may also open up possibilities for GPU drivers to move away from
implicit sync for their kernel driver uAPI and instead provide sync
files and rely on dma-buf import/export for communicating with other
implicit sync clients.

We make the explicit choice here to only allow setting RW fences which
translates to an exclusive fence on the dma_resv. There's no use for
read-only fences for communicating with other implicit sync userspace
and any such attempts are likely to be racy at best. When we go to
insert the RW fence, the actual fence we set as the new exclusive fence
is a combination of the sync_file provided by the user and all the other
fences on the dma_resv. This ensures that the newly added exclusive
fence will never signal before the old one would have and ensures that
we don't break any dma_resv contracts. We require userspace to specify
RW in the flags for symmetry with the export ioctl and in case we ever
want to support read fences in the future.

There is one downside here that's worth documenting: If two clients
writing to the same dma-buf using this API race with each other, their
actions on the dma-buf may happen in parallel or in an undefined order.
Both with and without this API, the pattern is the same: Collect all
the fences on the dma-buf, submit work which depends on said fences, and
then set a new exclusive (write) fence on the dma-buf which depends on
said work. The difference is that, when it's all handled by the GPU
driver's submit ioctl, the three operations happen atomically under the
dma_resv lock. If two userspace submits race, one will happen before
the other. You aren't guaranteed which but you are guaranteed that
they're strictly ordered. If userspace manages the fences itself, then
these three operations happen separately and the two render operations
may happen genuinely in parallel or get interleaved. However, this is a
case of userspace racing with itself. As long as we ensure userspace
can't back the kernel into a corner, it should be fine.
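
To spell out that pattern, here is a hypothetical userspace sketch (not
part of this patch) of the full round trip using the two ioctls. The
submit_rendering() helper stands in for a driver-specific submit and is
assumed, not a real API; the uapi definitions are assumed to come from an
updated <linux/dma-buf.h>:

    #include <linux/dma-buf.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    /* Placeholder: driver-specific submit which waits on wait_fd and
     * returns a sync_file fd that signals when the rendering completes.
     */
    extern int submit_rendering(int wait_fd);

    static int render_to_dmabuf(int dmabuf_fd)
    {
            struct dma_buf_export_sync_file export_arg = { .flags = DMA_BUF_SYNC_WRITE };
            struct dma_buf_import_sync_file import_arg = { .flags = DMA_BUF_SYNC_WRITE };
            int done_fd, ret;

            /* 1. Collect the fences a writer must wait on. */
            ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_EXPORT_SYNC_FILE, &export_arg);
            if (ret)
                    return ret;

            /* 2. Submit work which waits on them and signals done_fd. */
            done_fd = submit_rendering(export_arg.fd);
            close(export_arg.fd);
            if (done_fd < 0)
                    return done_fd;

            /* 3. Publish the completion fence as the new write fence. */
            import_arg.fd = done_fd;
            ret = ioctl(dmabuf_fd, DMA_BUF_IOCTL_IMPORT_SYNC_FILE, &import_arg);
            close(done_fd);
            return ret;
    }

As the paragraph above notes, nothing makes these three steps atomic; if
ordering between contexts matters, userspace has to provide its own locking
around them.
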
v2 (Jason Ekstrand):
- Use a wrapper dma_fence_array of all fences including the new one
  when importing an exclusive fence.

v3 (Jason Ekstrand):
- Lock around setting shared fences as well as exclusive
- Mark SIGNAL_SYNC_FILE as a read-write ioctl.
- Initialize ret to 0 in dma_buf_wait_sync_file

v4 (Jason Ekstrand):
- Use the new dma_resv_get_singleton helper

v5 (Jason Ekstrand):
- Rename the IOCTLs to import/export rather than wait/signal
- Drop the WRITE flag and always get/set the exclusive fence

v6 (Jason Ekstrand):
- Split import and export into separate patches
- New commit message

v7 (Daniel Vetter):
- Fix the uapi header to use the right struct in the ioctl
- Use a separate dma_buf_import_sync_file struct
- Add kerneldoc for dma_buf_import_sync_file

v8 (Jason Ekstrand):
- Rebase on Christian König's fence rework

v9 (Daniel Vetter):
- Fix -EINVAL checks for the flags parameter
- Add documentation about read/write fences
- Add documentation about the expected usage of import/export and
  specifically call out the possible userspace race.

v10 (Simon Ser):
- Fix a typo in the docs

Signed-off-by: Jason Ekstrand <jason@jlekstrand.net>
Signed-off-by: Jason Ekstrand <jason.ekstrand@intel.com>
Signed-off-by: Jason Ekstrand <jason.ekstrand@collabora.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Sumit Semwal <sumit.semwal@linaro.org>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Signed-off-by: Simon Ser <contact@emersion.fr>
Link: https://patchwork.freedesktop.org/patch/msgid/20220608152142.14495-3-jason@jlekstrand.net
(cherry picked from commit 594740497e998d30477ab26093bfb81c28cd3ff1)
---
 drivers/dma-buf/dma-buf.c    | 42 +++++++++++++++++++++++++++++++
 include/uapi/linux/dma-buf.h | 49 ++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/drivers/dma-buf/dma-buf.c b/drivers/dma-buf/dma-buf.c
index 437f3620dce8..41ebb645efda 100644
--- a/drivers/dma-buf/dma-buf.c
+++ b/drivers/dma-buf/dma-buf.c
@@ -523,6 +523,46 @@ static long dma_buf_export_sync_file(struct dma_buf *dmabuf,
         put_unused_fd(fd);
         return ret;
 }
+
+static long dma_buf_import_sync_file(struct dma_buf *dmabuf,
+                                     const void __user *user_data)
+{
+        struct dma_buf_import_sync_file arg;
+        struct dma_fence *fence;
+        bool write;
+        int ret = 0;
+
+        if (copy_from_user(&arg, user_data, sizeof(arg)))
+                return -EFAULT;
+
+        if (arg.flags & ~DMA_BUF_SYNC_RW)
+                return -EINVAL;
+
+        if ((arg.flags & DMA_BUF_SYNC_RW) == 0)
+                return -EINVAL;
+
+        fence = sync_file_get_fence(arg.fd);
+        if (!fence)
+                return -EINVAL;
+
+        write = (arg.flags & DMA_BUF_SYNC_WRITE);
+
+        dma_resv_lock(dmabuf->resv, NULL);
+
+        if (write) {
+                dma_resv_add_excl_fence(dmabuf->resv, fence);
+        } else {
+                ret = dma_resv_reserve_shared(dmabuf->resv, 1);
+                if (!ret)
+                        dma_resv_add_shared_fence(dmabuf->resv, fence);
+        }
+
+        dma_resv_unlock(dmabuf->resv);
+
+        dma_fence_put(fence);
+
+        return ret;
+}
 #endif
 
 static long dma_buf_ioctl(struct file *file,
@@ -610,6 +650,8 @@ static long dma_buf_ioctl(struct file *file,
 #if IS_ENABLED(CONFIG_SYNC_FILE)
         case DMA_BUF_IOCTL_EXPORT_SYNC_FILE:
                 return dma_buf_export_sync_file(dmabuf, (void __user *)arg);
+        case DMA_BUF_IOCTL_IMPORT_SYNC_FILE:
+                return dma_buf_import_sync_file(dmabuf, (const void __user *)arg);
 #endif
 
         default:
diff --git a/include/uapi/linux/dma-buf.h b/include/uapi/linux/dma-buf.h
index 522bcee5498e..b4ceeaedfa87 100644
--- a/include/uapi/linux/dma-buf.h
+++ b/include/uapi/linux/dma-buf.h
@@ -48,6 +48,24 @@ struct dma_buf_sync {
  * dma-buf for waiting later instead of waiting immediately. This is
  * useful for modern graphics APIs such as Vulkan which assume an explicit
  * synchronization model but still need to inter-operate with dma-buf.
+ *
+ * The intended usage pattern is the following:
+ *
+ * 1. Export a sync_file with flags corresponding to the expected GPU usage
+ *    via DMA_BUF_IOCTL_EXPORT_SYNC_FILE.
+ *
+ * 2. Submit rendering work which uses the dma-buf. The work should wait on
+ *    the exported sync file before rendering and produce another sync_file
+ *    when complete.
+ *
+ * 3. Import the rendering-complete sync_file into the dma-buf with flags
+ *    corresponding to the GPU usage via DMA_BUF_IOCTL_IMPORT_SYNC_FILE.
+ *
+ * Unlike doing implicit synchronization via a GPU kernel driver's exec ioctl,
+ * the above is not a single atomic operation. If userspace wants to ensure
+ * ordering via these fences, it is the responsibility of userspace to use
+ * locks or other mechanisms to ensure that no other context adds fences or
+ * submits work between steps 1 and 3 above.
  */
 struct dma_buf_export_sync_file {
         /**
@@ -71,6 +89,36 @@ struct dma_buf_export_sync_file {
         __s32 fd;
 };
 
+/**
+ * struct dma_buf_import_sync_file - Insert a sync_file into a dma-buf
+ *
+ * Userspace can perform a DMA_BUF_IOCTL_IMPORT_SYNC_FILE to insert a
+ * sync_file into a dma-buf for the purposes of implicit synchronization
+ * with other dma-buf consumers. This allows clients using explicitly
+ * synchronized APIs such as Vulkan to inter-op with dma-buf consumers
+ * which expect implicit synchronization such as OpenGL or most media
+ * drivers/video.
+ */
+struct dma_buf_import_sync_file {
+        /**
+         * @flags: Read/write flags
+         *
+         * Must be DMA_BUF_SYNC_READ, DMA_BUF_SYNC_WRITE, or both.
+         *
+         * If DMA_BUF_SYNC_READ is set and DMA_BUF_SYNC_WRITE is not set,
+         * this inserts the sync_file as a read-only fence. Any subsequent
+         * implicitly synchronized writes to this dma-buf will wait on this
+         * fence but reads will not.
+         *
+         * If DMA_BUF_SYNC_WRITE is set, this inserts the sync_file as a
+         * write fence. All subsequent implicitly synchronized access to
+         * this dma-buf will wait on this fence.
+         */
+        __u32 flags;
+        /** @fd: Sync file descriptor */
+        __s32 fd;
+};
+
 #define DMA_BUF_BASE 'b'
 #define DMA_BUF_IOCTL_SYNC _IOW(DMA_BUF_BASE, 0, struct dma_buf_sync)
 
@@ -81,6 +129,7 @@ struct dma_buf_export_sync_file {
 #define DMA_BUF_SET_NAME_A _IOW(DMA_BUF_BASE, 1, u32)
 #define DMA_BUF_SET_NAME_B _IOW(DMA_BUF_BASE, 1, u64)
 #define DMA_BUF_IOCTL_EXPORT_SYNC_FILE _IOWR(DMA_BUF_BASE, 2, struct dma_buf_export_sync_file)
+#define DMA_BUF_IOCTL_IMPORT_SYNC_FILE _IOW(DMA_BUF_BASE, 3, struct dma_buf_import_sync_file)
 
 struct dma_buf_sync_partial {
         __u64 flags;
-- 
2.38.1