summaryrefslogtreecommitdiffstats
path: root/docs/cpu-target.md
diff options
context:
space:
mode:
authorjsmall-nvidia <jsmall@nvidia.com>2019-09-16 09:38:21 -0400
committerGitHub <noreply@github.com>2019-09-16 09:38:21 -0400
commit40d8f3aeedf018c7c6766e98ec64733abd90671e (patch)
tree0c9cae7bc88d4344dd53596a88c3ce9918f2df13 /docs/cpu-target.md
parentc2e5d2468ad6a38cdb8a067da0678302f6cc6066 (diff)
CPU Performance/Testing improvements (#1055)
* First pass of render-test refactor. * Make window construction a function that can choose an implementation. * Remove OpenGL as currently has windows dependency. * Disable Vulkan as Renderer impl has dependency on windows. * Pass Window in as parameter of 'update'. * Add win-window.cpp as was missing. * Fix warning on windows about signs during comparison. * * Added mechanism to add random arrays as buffer inputs and select type * Improved RenderGenerator to generate more types, and to be more careful around int32 ranges. * Added support for security checks (for Visual Studio C++) * Disable Execption handling being on by default when compiling kernels * Added a 'Group' version of the entry point that will evaluate all threads in a group in a single call. In test code use this method if available. * Added -compile-arg to be able to pass arguments to the compile within render-test * Add documention for the _Group execution feature. * Fix some typos in cpu-target.md
Diffstat (limited to 'docs/cpu-target.md')
-rw-r--r--docs/cpu-target.md17
1 files changed, 17 insertions, 0 deletions
diff --git a/docs/cpu-target.md b/docs/cpu-target.md
index d4ef6ddd8..ac1499218 100644
--- a/docs/cpu-target.md
+++ b/docs/cpu-target.md
@@ -112,6 +112,23 @@ When compiled into a shared library/dll - how is it invoked? The entry point is
void computeMain(ComputeVaryingInput* varyingInput, UniformEntryPointParams* uniformParams, UniformState* uniformState);
```
+
+If compiled with `SLANG_HOST_CALLABLE` the `ISlangSharedLibrary` will export a function named `computeMain` the same name as the entry point in the original source.
+
+ComputeVaryingInput is defined in the prelude as
+
+```
+struct ComputeVaryingInput
+{
+ uint3 groupID;
+ uint3 groupThreadID;
+};
+```
+
+Typically when invoking the kernel it is a question of updating the groupID/groupThreadID, to specify which 'thread' of the computation to execute. For the example above we have `[numthreads(4, 1, 1)]`. This means groupThreadID.x can vary from 0-3 and .y and .z must be 0. That groupID.x indicates which 'group of 4' to execute. So groupID.x = 1, with groupThreadID.x=0,1,2,3 runs the 4th, 5th, 6th and 7th 'thread'. Being able to invoke each thread in this way is flexible - in that any specific thread can specified and executed. It is not necessarily very efficient because there is the call overhead and a small amount of extra work that is performed inside the kernel.
+
+For improved performance there is a mechanism to execute a 'thread group' all in a single invocation. A function with the same signature will be exposed with the entry point name postfixed with `_Group` - in the example above the function would be called 'computeMain_Group'. When calling this function only the groupID need be specified, the groupThreadID is ignored. All of the threads within the group (as specified by `[numthreads]`) will be executed in a single call.
+
The UniformState and UniformEntryPointParams struct typically vary by shader. UniformState holds 'normal' bindings, whereas UniformEntryPointParams hold the uniform entry point parameters. Where specific bindings or parameters are located can be determined by reflection. The structures for the example above would be something like the following...
```