path: root/tests/autodiff/autobind-plain-vector-input.slang
author: Sai Praveen Bangaru <31557731+saipraveenb25@users.noreply.github.com> 2024-04-30 16:05:33 -0400
committer: GitHub <noreply@github.com> 2024-04-30 16:05:33 -0400
commit: 52b91231cdadc048f93b224f5035759cf1a96eaa
tree: 23d3263bc662eb96d6284266282695a9b0f1e2db /tests/autodiff/autobind-plain-vector-input.slang
parent: 70111daf43c87e182695666c34345e061e114a68
Added diagnostics & built-in type lowering for `[CUDAKernel]` functions (#4042)
* Added diagnostics & built-in type lowering for `[CUDAKernel]` functions

This PR adds
- Diagnostics for non-void return from a CUDA kernel entry point
- Diagnostics for using differentiable types in a differentiable CUDA kernel entry point
- Logic for converting built-in types (float3, float3x3, etc.) to portable struct types and unpacking the parameter back into a built-in type on the CUDA side. This is because built-in types have different implementations in CUDA & CPP targets, which causes a signature mismatch when linking.

* Fix error codes

* Add ability to lower structs and arrays that contain built-in types.

+ Added tests
+ Fix issue where the host-side was not marshalling data to lowered types.

* Update slang-ir-pytorch-cpp-binding.cpp

---------

Co-authored-by: Yong He <yonghe@outlook.com>
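
The lowering described above can be illustrated with a rough host-side sketch. This is not the code the compiler emits: the names `VectorStorageFloat3`, `DeviceFloat3`, `packFloat3`, `unpackFloat3`, and `plainCopyBody` are hypothetical, and the array-of-floats layout is an assumption (the test below only shows that the generated struct is named `_VectorStorage_float3_0`). The idea is that both sides of the link agree on one plain struct in the kernel signature, and the built-in vector type is rebuilt from it inside the kernel.

```cpp
// Hypothetical stand-in for a target-specific built-in vector type.
// On a real CUDA target this role would be played by the built-in float3.
struct DeviceFloat3 {
    float x, y, z;
};

// Portable storage struct used in the kernel signature.
// Assumption: a plain array of three floats; the real layout may differ.
struct VectorStorageFloat3 {
    float data[3];
};

// Host side: marshal the components into the portable struct.
VectorStorageFloat3 packFloat3(float x, float y, float z) {
    VectorStorageFloat3 storage{};
    storage.data[0] = x;
    storage.data[1] = y;
    storage.data[2] = z;
    return storage;
}

// Kernel side: unpack the portable struct back into the built-in type.
DeviceFloat3 unpackFloat3(const VectorStorageFloat3& storage) {
    return DeviceFloat3{storage.data[0], storage.data[1], storage.data[2]};
}

// Plain C++ stand-in for the body of a kernel like plain_copy:
// unpack first, then use the vector as usual.
void plainCopyBody(const VectorStorageFloat3& input, float* output) {
    DeviceFloat3 v = unpackFloat3(input);
    output[0] = v.x;
    output[1] = v.y;
    output[2] = v.z;
}
```

Because only the plain struct crosses the host/device boundary in this sketch, the signature does not depend on how either target implements its vector types.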
Diffstat (limited to 'tests/autodiff/autobind-plain-vector-input.slang')
-rw-r--r-- tests/autodiff/autobind-plain-vector-input.slang | 21
1 files changed, 21 insertions, 0 deletions
diff --git a/tests/autodiff/autobind-plain-vector-input.slang b/tests/autodiff/autobind-plain-vector-input.slang
new file mode 100644
index 000000000..216585093
--- /dev/null
+++ b/tests/autodiff/autobind-plain-vector-input.slang
@@ -0,0 +1,21 @@
+//TEST:SIMPLE(filecheck=CUDA): -target cuda -line-directive-mode none
+//TEST:SIMPLE(filecheck=TORCH): -target torch -line-directive-mode none
+
+[AutoPyBindCUDA]
+[CUDAKernel]
+void plain_copy(float3 input, TensorView<float> output)
+{
+ // CUDA: __global__ void __kernel__plain_copy(_VectorStorage_float3_0 input_0, TensorView output_0)
+ // TORCH: void __kernel__plain_copy(_VectorStorage_float3_0 _0, TensorView _1);
+
+ // Get the 'global' index of this thread.
+ uint3 dispatchIdx = cudaThreadIdx() + cudaBlockIdx() * cudaBlockDim();
+
+ // If the thread index is beyond the input size, exit early.
+ if (dispatchIdx.x >= 1)
+ return;
+
+ output[0] = input.x;
+ output[1] = input.y;
+ output[2] = input.z;
+}