# hello-workgroups

Now you finally know what that silly little `@workgroup_size(1)` means!

This example is an extremely bare-bones and arguably somewhat unreasonable demonstration of what workgroup sizes mean in an attempt to explain workgroups in general.

The example starts with two arrays of numbers: one where `a[i] = i` and another where `b[i] = i * 2`. Both are bound to the shader. The program dispatches one workgroup per index, each workgroup covering the elements at that index in both arrays. Each invocation in a workgroup works on its respective array, adding 1 to the element there.
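
For concreteness, here is a minimal sketch of what a shader doing this could look like. It is illustrative only; the binding names and exact logic are assumptions, so see `shader.wgsl` for the example's actual code.

```wgsl
// Illustrative sketch only; names and details are assumptions,
// not necessarily identical to the example's shader.wgsl.
@group(0) @binding(0)
var<storage, read_write> a: array<u32>;
@group(0) @binding(1)
var<storage, read_write> b: array<u32>;

// One workgroup per index; the two invocations in each workgroup
// each bump the element of "their" array at that index.
@compute @workgroup_size(2)
fn main(
    @builtin(workgroup_id) wid: vec3<u32>,
    @builtin(local_invocation_id) lid: vec3<u32>
) {
    if (lid.x == 0u) {
        a[wid.x] = a[wid.x] + 1u;
    } else {
        b[wid.x] = b[wid.x] + 1u;
    }
}
```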

## What are Workgroups?

### TLDR / Key Takeaways

- Workgroups fit in a 3D grid of workgroups executed in a single dispatch.
- All invocations in a workgroup are guaranteed to execute concurrently.
- Workgroups carry no other concurrency guarantees beyond those individual workgroups, meaning...
  - No two workgroups can be guaranteed to execute in parallel.
  - No two workgroups can be guaranteed NOT to execute in parallel.
  - No set of workgroups can be guaranteed to execute in any predictable or reliable order relative to each other.
- The size of a workgroup is defined with the `@workgroup_size` attribute on a compute shader's main function.
- The location of an invocation within its workgroup grid can be gotten with `@builtin(local_invocation_id)`.
- The location of an invocation within the entire compute shader grid can be gotten with `@builtin(global_invocation_id)`.
- The location of an invocation's workgroup within the dispatch grid can be gotten with `@builtin(workgroup_id)`.
- Workgroups share memory within the workgroup address space. Workgroup memory is similar to private memory except that it is shared within a workgroup: invocations within a workgroup see the same memory, while invocations in different workgroups access different memory (see the sketch after this list).
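
As a minimal sketch of that last point, a variable in the workgroup address space is declared with `var<workgroup>`. The name and size here are made up for illustration:

```wgsl
// Hypothetical scratch buffer in the workgroup address space.
var<workgroup> scratch: array<u32, 64>;

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_id) lid: vec3<u32>) {
    // All 64 invocations in this workgroup see the same `scratch`;
    // invocations in other workgroups get a separate copy.
    scratch[lid.x] = lid.x;
}
```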

### Introduction

When you call `ComputePass::dispatch_workgroups`, the function dispatches multiple workgroups in a 3D grid defined by the `x`, `y`, and `z` parameters you pass to the function. For example, `dispatch_workgroups(5, 2, 1)` would create a dispatch grid like this:

```
W W W W W
W W W W W
```

Where each `W` is a workgroup. If you want your shader to know which workgroup within the dispatch the current invocation belongs to, add a function argument of type `vec3<u32>` with the attribute `@builtin(workgroup_id)`.
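
A minimal sketch of such an entry point (the function name and body here are illustrative, not part of the example):

```wgsl
@compute @workgroup_size(1)
fn main(@builtin(workgroup_id) wid: vec3<u32>) {
    // For dispatch_workgroups(5, 2, 1):
    // wid.x is 0..4, wid.y is 0..1, and wid.z is always 0.
}
```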

Note that this README uses the term "dispatch grid" throughout to mean the grid of workgroups within a dispatch; it is not an official WGSL term. Two related terms that are official: "workgroup grid", which refers to the invocations within a single workgroup, and "compute shader grid", which refers to the grid of all invocations in the entire dispatch.

### Within the Workgroup

In `hello-compute` and `repeated-compute`, we used a workgroup size of `(1)`, or rather `(1, 1, 1)`, so each workgroup dispatched by `dispatch_workgroups` produced a single invocation, but this isn't always the case. Each workgroup represents its own little grid of invocations tied together; that grid can contain just one invocation or practically any number, arranged in 3D. The grid size of each workgroup, and thus the number of invocations spawned per workgroup, is determined by the `@workgroup_size` attribute you've seen in other compute shaders. To get the current invocation's location within its workgroup, add a `vec3<u32>` argument to the main function with the attribute `@builtin(local_invocation_id)`.

Let's look at the compute shader grid for a dispatch of size `(2, 2, 1)` with a workgroup size of `(2, 2, 1)` as well. Let `w` be the `workgroup_id` and `i` be the `local_invocation_id`:

```
w(0, 0, 0) i(0, 0, 0) | w(0, 0, 0) i(1, 0, 0) | w(1, 0, 0) i(0, 0, 0) | w(1, 0, 0) i(1, 0, 0)
w(0, 0, 0) i(0, 1, 0) | w(0, 0, 0) i(1, 1, 0) | w(1, 0, 0) i(0, 1, 0) | w(1, 0, 0) i(1, 1, 0)
w(0, 1, 0) i(0, 0, 0) | w(0, 1, 0) i(1, 0, 0) | w(1, 1, 0) i(0, 0, 0) | w(1, 1, 0) i(1, 0, 0)
w(0, 1, 0) i(0, 1, 0) | w(0, 1, 0) i(1, 1, 0) | w(1, 1, 0) i(0, 1, 0) | w(1, 1, 0) i(1, 1, 0)
```

### Execution of Workgroups

As stated before, workgroups are groups of invocations. The invocations within a workgroup are guaranteed to execute concurrently, but the guarantees basically stop there. You cannot get any guarantee as to when any given workgroup will execute, including in relation to other workgroups: you can't guarantee that any two workgroups will execute together, nor can you guarantee that they won't. Among workgroups that don't execute together, you additionally cannot guarantee any particular order of execution. When your function runs in an invocation, you know it will be working together with its workgroup buddies, and that's basically it.

See the WGSL spec on compute shader execution for more details.

### Workgroups and their Invocations in a Global Scope

As mentioned above, an invocation exists both within a workgroup grid and within the compute shader grid: the grid of all invocations in the dispatch, divided into workgroup sections. Just as `@builtin(local_invocation_id)` gives you the invocation's place within its workgroup grid, `@builtin(global_invocation_id)` gives you its place within the entire compute shader grid.

Slight trivia: you might imagine that the global id is computed from `local_invocation_id` and `workgroup_id`, but it's actually the opposite. Everything operates on the compute shader grid, the workgroups are imagined sectors within that grid, and `local_invocation_id` and `workgroup_id` are calculated from the global id and the known workgroup size. Yes, we live in a matrix... of compute shader invocations. This isn't super useful information, but it can help fit things into a larger picture.
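
As a sketch of that relationship (parameter names are illustrative), for a workgroup size of `(2, 2, 1)`:

```wgsl
@compute @workgroup_size(2, 2, 1)
fn main(
    @builtin(global_invocation_id) gid: vec3<u32>,
    @builtin(workgroup_id) wid: vec3<u32>,
    @builtin(local_invocation_id) lid: vec3<u32>
) {
    // The three ids always satisfy:
    //   gid == wid * vec3<u32>(2u, 2u, 1u) + lid
    // i.e. global id = workgroup id * workgroup size + local id.
}
```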

### Barriers and Workgroups

Arguably, workgroups are at their most useful when used alongside barriers. Since barriers are explained more thoroughly in the `hello-synchronization` example, this section will be short: despite affecting different memory address spaces, all synchronization functions operate at the workgroup level, synchronizing the invocations of a single workgroup. See `hello-synchronization/README.md` for more.
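
As a minimal, hypothetical sketch of the usual pattern (`workgroupBarrier` is the real WGSL function; the rest is made up for illustration):

```wgsl
var<workgroup> scratch: array<u32, 64>;

@compute @workgroup_size(64)
fn main(@builtin(local_invocation_id) lid: vec3<u32>) {
    scratch[lid.x] = lid.x;
    // Block here until every invocation in this workgroup has done its write.
    workgroupBarrier();
    // Now it's safe to read a slot written by a different invocation.
    let neighbor = scratch[(lid.x + 1u) % 64u];
    _ = neighbor;
}
```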

## Technical Links

For a rather long explainer, this README may still leave the more technically minded reader with questions. The specifications for both WebGPU and WGSL ("WebGPU Shading Language") are long, and it's rather unintuitive that the vast majority of the specification of how workgroups, and compute shaders more generally, work lives in the WGSL spec. Below are some links into the specifications at a couple of interesting points:

- Here is the main section on workgroups, which outlines the important terminology in technical terms. If you're looking for something covered in this README within the spec, it's recommended you start by reading this.
- Here is a section on compute shaders from a WebGPU perspective (instead of WGSL's). It's still a stub, but hopefully it will grow in the future.
- Don't forget your `@builtin()`s!