From a8669ade5cb3add8b9ce08e2c3bd96e93190bca8 Mon Sep 17 00:00:00 2001
From: jsmall-nvidia
Date: Fri, 17 Jan 2020 09:15:06 -0500
Subject: Slang -> CUDA kernel runs correctly in test infrastructure (#1167)

* First pass at BindLocation.

* Added BindSet::init - for initializing with two input constant buffers.
Needs better name, and perhaps should be another class.

* Fix handling of constant buffer stripping.
Improved initialization.

* Trying to generalize BindLocation a little more.
Split out CPULikeBindRoot.

* More work to make BindLocation et al work with non uniform bindings.

* Added parsing to a location.

* WIP: Trying to get CPU working with BindLocation.

* Describe problem of knowing the type of the reference point in the binding table.

* More ideas on getBindings fix.

* Remove BindSet as member of BindLocation.

* Added BindLocation::Invalid

* Made BindLocation able to be key in hash

* Use BindLocation for bindings on BindingSet.

* Added cuda and nvrtc categories to test infrastructure.
Disabled CUDA synthetic tests by default.
Fixed such that all tests now produce something in BindLocation style.

* Use m_userIndex instead of m_userData on Resource.
Move the binding setup out of cpu-compute-util (as no longer CPU specific)

* Removed CPUBinding - used BindLocation/BindSet instead.
Fixed some bugs around indexOf around uniform indirection.

* Renamed BindSet::Resource -> BindSet::Value.

* Document BindLocation.

* Fixes for Clang/GCC
Improve invariant requirement handling when constructing from BindPoints.

* WIP: First attempt to run CUDA kernel.

* Fix some issues around doing CUDA kernel launch.

* Fix issues around use of cudaMemCpy.

* Better cuda runtime error checking mechanism.

* Fixed bug in passing parameters to cuda kernel launch.
Simplified initialisation of context.

* WIP: Fix CUDA runtime issues.

* Add explicit CUDA synchronize so failures don't appear on implicit ones.

* Fix problem emitting non shared variable on CUDA.

* Fix some typos in CUDA layout.
Use just a pointer for now for CUDA StructuredBuffer.

* Arg order for CUDA launch was wrong.

* First compute kernel runs on CUDA.
---
 tests/cuda/compile-to-cuda.slang | 24 +++++++-----------------
 1 file changed, 7 insertions(+), 17 deletions(-)

(limited to 'tests/cuda')

diff --git a/tests/cuda/compile-to-cuda.slang b/tests/cuda/compile-to-cuda.slang
index 6166aaf0b..be7d775bd 100644
--- a/tests/cuda/compile-to-cuda.slang
+++ b/tests/cuda/compile-to-cuda.slang
@@ -1,29 +1,19 @@
 //DISABLE_TEST(smoke):SIMPLE: -target ptx -entry computeMain -stage compute
+//DISABLE_TEST(compute):COMPARE_COMPUTE:-cpu -compute
+//TEST(compute):COMPARE_COMPUTE:-cuda -compute
 
 //TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name=outputBuffer
 
 RWStructuredBuffer<int> outputBuffer : register(u0);
 
-int quantize(double value)
-{
-    return int(value * 256);
-}
-
-int quantize(float value)
-{
-    return int(value * 256);
-}
-
 [numthreads(4, 1, 1)]
 void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
 {
-    float values[] = { -9, 9, -3, 3 };
     int tid = int(dispatchThreadID.x);
 
-    float value = values[tid];
-
-    outputBuffer[tid * 4] = quantize(sin(value));
-    outputBuffer[tid * 4 + 1] = quantize(cos(value));
-    outputBuffer[tid * 4 + 2] = quantize(sin(double(value)));
-    outputBuffer[tid * 4 + 3] = quantize(cos(double(value)));
+    outputBuffer[tid * 4] = tid;
+    outputBuffer[tid * 4 + 1] = tid + 1;
+    outputBuffer[tid * 4 + 2] = tid + 2;
+    outputBuffer[tid * 4 + 3] = tid + 3;
+
 }
-- 
cgit v1.2.3