From 52b91231cdadc048f93b224f5035759cf1a96eaa Mon Sep 17 00:00:00 2001
From: Sai Praveen Bangaru <31557731+saipraveenb25@users.noreply.github.com>
Date: Tue, 30 Apr 2024 16:05:33 -0400
Subject: Added diagnostics & built-in type lowering for `[CUDAKernel]`
 functions (#4042)

* Added diagnostics & built-in type lowering for `[CUDAKernel]` functions

This PR adds
- Diagnostics for a non-void return type on a CUDA kernel entry point
- Diagnostics for using differentiable types in a differentiable CUDA kernel entry point
- Logic that converts built-in types (float3, float3x3, etc.) to portable struct types and unpacks each parameter back into a built-in type on the CUDA side. This is needed because built-in types have different implementations in the CUDA and C++ targets, which causes a signature mismatch when linking.
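
A minimal sketch of the lowering idea in the last item, in plain C++. It is not the code added by this PR: every type and function name below (`HostFloat3`, `DeviceFloat3`, `PortableFloat3`, `packFloat3`, `unpackFloat3`, `kernelEntry`) is a hypothetical stand-in, and the two "built-in" layouts are invented only to show why a shared, portable struct keeps the host and device signatures in agreement.

```cpp
#include <array>

// Hypothetical stand-ins for the two targets' built-in float3 types.
// Assumption for illustration: the host (C++) flavor is array-backed,
// while the device (CUDA-like) flavor uses named fields.
struct HostFloat3 { std::array<float, 3> data; };
struct DeviceFloat3 { float x, y, z; };

// Hypothetical portable struct that appears in the shared kernel
// signature, so both sides agree on the parameter type when linking.
struct PortableFloat3 { float v0, v1, v2; };

// Host side: marshal the built-in value into the portable struct
// before passing it to the kernel entry point.
PortableFloat3 packFloat3(const HostFloat3& h)
{
    return {h.data[0], h.data[1], h.data[2]};
}

// Device side: unpack the portable struct back into the built-in type
// at the top of the kernel entry point.
DeviceFloat3 unpackFloat3(const PortableFloat3& p)
{
    return {p.v0, p.v1, p.v2};
}

// Stand-in for the original kernel body, which still works on the
// built-in type.
float kernelBody(const DeviceFloat3& v)
{
    return v.x + v.y + v.z;
}

// Entry point with the lowered signature: it takes the portable struct
// and unpacks it before running the body.
float kernelEntry(PortableFloat3 p)
{
    return kernelBody(unpackFloat3(p));
}
```

In this sketch the host would call `kernelEntry(packFloat3(value))`, so only `PortableFloat3` crosses the host/device boundary.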

* Fix error codes

* Add ability to lower structs and arrays that contain built-in types.

+ Added tests
+ Fix issue where the host-side was not marshalling data to lowered types.

* Update slang-ir-pytorch-cpp-binding.cpp

---------

Co-authored-by: Yong He <yonghe@outlook.com>
---
 source/slang/slang-emit.cpp | 3 +++
 1 file changed, 3 insertions(+)

(limited to 'source/slang/slang-emit.cpp')

diff --git a/source/slang/slang-emit.cpp b/source/slang/slang-emit.cpp
index 1fa04b4be..afdd37fce 100644
--- a/source/slang/slang-emit.cpp
+++ b/source/slang/slang-emit.cpp
@@ -471,10 +471,13 @@ Result linkAndOptimizeIR(
     switch (target)
     {
     case CodeGenTarget::PyTorchCppBinding:
+        generateHostFunctionsForAutoBindCuda(irModule, sink);
+        lowerBuiltinTypesForKernelEntryPoints(irModule, sink);
         generatePyTorchCppBinding(irModule, sink);
         handleAutoBindNames(irModule);
         break;
     case CodeGenTarget::CUDASource:
+        lowerBuiltinTypesForKernelEntryPoints(irModule, sink);
         removeTorchKernels(irModule);
         handleAutoBindNames(irModule);
         break;
-- 
cgit v1.2.3