| author | jsmall-nvidia <jsmall@nvidia.com> | 2020-01-17 09:15:06 -0500 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2020-01-17 09:15:06 -0500 |
| commit | a8669ade5cb3add8b9ce08e2c3bd96e93190bca8 (patch) | |
| tree | 63be2fa7829c5bf956a5ce4d52af4e1d4073bf84 /tests/cuda | |
| parent | 662721ba4ab0e38924701df4c876a86eb8390968 (diff) | |
Slang -> CUDA kernel runs correctly in test infrastructure (#1167)
* First pass at BindLocation.
* Added BindSet::init - for initializing with two input constant buffers. Needs better name, and perhaps should be another class.
* Fix handling of constant buffer stripping.
Improved initialization.
* Trying to generalize BindLocation a little more.
Split out CPULikeBindRoot.
* More work to make BindLocation et al work with non uniform bindings.
* Added parsing to a location.
* WIP: Trying to get CPU working with BindLocation.
* Describe problem of knowing the type of the reference point in the binding table.
* More ideas on getBindings fix.
* Remove BindSet as member of BindLocation.
* Added BindLocation::Invalid
* Made BindLocation able to be key in hash
* Use BindLocation for bindings on BindingSet.
* Added cuda and nvrtc categories to test infrastructure.
Disabled CUDA synthetic tests by default.
Fixed such that all tests now produce something in BindLocation style.
* Use m_userIndex instead of m_userData on Resource.
Move the binding setup out of cpu-compute-util (as no longer CPU specific)
* Removed CPUBinding - used BindLocation/BindSet instead.
Fixed some bugs around indexOf around uniform indirection.
* Renamed BindSet::Resource -> BindSet::Value.
* Document BindLocation.
* Fixes for Clang/GCC
Improve invariant requirement handling when constructing from BindPoints.
* WIP: First attempt to run CUDA kernel.
* Fix some issues around doing CUDA kernel launch.
* Fix issues around use of cudaMemcpy.
* Better cuda runtime error checking mechanism.
* Fixed bug in passing parameters to cuda kernel launch.
Simplified initialisation of context.
* WIP: Fix CUDA runtime issues.
* Add explicit CUDA synchronize so failures don't appear on implicit ones.
* Fix problem emitting non shared variable on CUDA.
* Fix some typos in CUDA layout.
Use just a pointer for now for CUDA StructuredBuffer.
* Arg order for CUDA launch was wrong.
* First compute kernel runs on CUDA.
Diffstat (limited to 'tests/cuda')
| -rw-r--r-- | tests/cuda/compile-to-cuda.slang | 24 |
1 files changed, 7 insertions, 17 deletions
diff --git a/tests/cuda/compile-to-cuda.slang b/tests/cuda/compile-to-cuda.slang
index 6166aaf0b..be7d775bd 100644
--- a/tests/cuda/compile-to-cuda.slang
+++ b/tests/cuda/compile-to-cuda.slang
@@ -1,29 +1,19 @@
 //DISABLE_TEST(smoke):SIMPLE: -target ptx -entry computeMain -stage compute
+//DISABLE_TEST(compute):COMPARE_COMPUTE:-cpu -compute
+//TEST(compute):COMPARE_COMPUTE:-cuda -compute
 
 //TEST_INPUT:ubuffer(data=[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0], stride=4):out,name=outputBuffer
 RWStructuredBuffer<int> outputBuffer : register(u0);
 
-int quantize(double value)
-{
-    return int(value * 256);
-}
-
-int quantize(float value)
-{
-    return int(value * 256);
-}
-
 [numthreads(4, 1, 1)]
 void computeMain(uint3 dispatchThreadID : SV_DispatchThreadID)
 {
-    float values[] = { -9, 9, -3, 3 };
     int tid = int(dispatchThreadID.x);
-    float value = values[tid];
-
-    outputBuffer[tid * 4] = quantize(sin(value));
-    outputBuffer[tid * 4 + 1] = quantize(cos(value));
-    outputBuffer[tid * 4 + 2] = quantize(sin(double(value)));
-    outputBuffer[tid * 4 + 3] = quantize(cos(double(value)));
+    outputBuffer[tid * 4] = tid;
+    outputBuffer[tid * 4 + 1] = tid + 1;
+    outputBuffer[tid * 4 + 2] = tid + 2;
+    outputBuffer[tid * 4 + 3] = tid + 3;
+
 }
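To make the effect of the simplified kernel concrete, the following is a rough CPU-side sketch in C++ of what the rewritten `computeMain` writes into `outputBuffer`. It is not part of the commit and does not use the Slang test infrastructure; the function name `emulateComputeMain` is hypothetical, and the loop assumes a single thread group of 4 threads (matching `numthreads(4, 1, 1)` and the 16-element buffer), since the dispatch size is not shown in the diff.

```cpp
#include <array>

// Hypothetical CPU stand-in for the rewritten computeMain (not from the commit).
// Assumption: one thread group of 4 threads is dispatched, so tid runs 0..3.
std::array<int, 16> emulateComputeMain()
{
    // TEST_INPUT initializes the buffer to 16 zeros.
    std::array<int, 16> outputBuffer{};

    // Each "thread" fills its own 4-element slice with tid, tid+1, tid+2, tid+3.
    for (int tid = 0; tid < 4; ++tid)
    {
        outputBuffer[tid * 4]     = tid;
        outputBuffer[tid * 4 + 1] = tid + 1;
        outputBuffer[tid * 4 + 2] = tid + 2;
        outputBuffer[tid * 4 + 3] = tid + 3;
    }
    return outputBuffer;
}
```

Under those assumptions, calling `emulateComputeMain()` yields `0 1 2 3 1 2 3 4 2 3 4 5 3 4 5 6`. Because the new kernel only stores integers derived from the thread index, its result does not depend on `sin`/`cos` precision or on `double` support, which plausibly makes it a simpler first check that a CUDA kernel launches and writes its output correctly.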
