| author | Tim Foley <tfoleyNV@users.noreply.github.com> | 2020-03-11 08:50:38 -0700 |
|---|---|---|
| committer | GitHub <noreply@github.com> | 2020-03-11 08:50:38 -0700 |
| commit | 935768c6a00c258bf5122a2d04b84064a1eee67d (patch) | |
| tree | 68dac944da274a21acb8c8bf651401c26e289f4c /prelude | |
| parent | b380b1af6ba6f5f58e3841c2a5b14db7ee8c372d (diff) | |
Clean-ups related to expanded standard library coverage (#1269)
This change continues the work already started in moving the definitions of many built-in functions to the standard library.
The main focus in this change was reducing the number of operations that had to be special-cased on the CPU and CUDA targets by making sure that the scalar cases of built-in functions map to the proper names in the prelude (e.g., `F32_sin()`) via the ordinary `__target_intrinsic` mechanism. In some cases this cleanup meant that special-case logic that was constructing definitions for those functions using C++ code could be scrapped.
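As a rough illustration of the naming scheme described above, the sketch below shows two prelude-style scalar wrappers and the kind of call that generated code could make for `sin` on a `float`. `SLANG_FORCE_INLINE` is stubbed as plain `inline` here, and `emittedExample` is a hypothetical stand-in for generated code, not actual compiler output.

```cpp
#include <math.h>

// Stand-in for the prelude macro; the real definition lives in the prelude headers.
#define SLANG_FORCE_INLINE inline

// Scalar wrappers follow the pattern <type prefix>_<operation>, as in the prelude.
SLANG_FORCE_INLINE float  F32_sin(float f)  { return ::sinf(f); }
SLANG_FORCE_INLINE double F64_sin(double f) { return ::sin(f); }

// Hypothetical generated code: a scalar sin() on a float is emitted as a call
// to the prelude name, so the emitter needs no target-specific renaming.
float emittedExample(float x)
{
    return F32_sin(x);
}
```

Because both the C++/CPU and CUDA preludes expose the same `F32_`/`F64_` names, the same emitted call should compile against either one.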
Additional changes made along the way:
* A few scalar functions that were missing in the CPU/CUDA preludes got added: `round`, hyperbolic trigonometric functions, `frexp`, `modf`, and `fma`
* The floating-point `min()` and `max()` definitions in the preludes were changed to use intrinsic operations on the target (which are likely to follow IEEE semantics, while our definitions did not)
* For the CUDA target, many of the functions had their names translated during code emit from, e.g., `sin` to `sinf`. This change makes the CUDA target more closely match the C++/CPU target in using names like `F32_sin` consistently.
* For the CUDA target, a few additional functions have intrinsics that don't exist (portably) on CPU: `sincos()` and `rsqrt()`.
* For the Slang stdlib definitions to work, a new `$P` replacement was defined for `__target_intrinsic` that expands to a type based on the first operand of the function (e.g., `F32` for `float`).
* I removed the dedicated opcodes for matrix-matrix, matrix-vector, and vector-matrix multiplication, and instead turned them into ordinary functions with definitions and `__target_intrinsic` modifiers to map them appropriately for HLSL and GLSL. This is realistically how we would have implemented these if we'd had `__target_intrinsic` from the start.
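The `min()`/`max()` item above is easiest to see with a NaN operand. The sketch below compares the old ternary form with a definition that delegates to `fminf`, assuming a host compiler whose `fminf` follows the C99 rule of returning the non-NaN operand. The names `minTernary` and `minIntrinsic` are illustrative only and do not appear in the prelude.

```cpp
#include <math.h>

// Shape of the old prelude definition: any comparison with NaN is false,
// so the result depends on which operand is NaN.
inline float minTernary(float a, float b) { return a < b ? a : b; }

// Shape of the new prelude definition: delegate to the target's fminf.
inline float minIntrinsic(float a, float b) { return ::fminf(a, b); }

// Helper for the comparison below: NaN is the only value not equal to itself.
inline bool isNaNValue(float x) { return x != x; }
```

With `n = NAN`, `minTernary(1.0f, n)` yields NaN while `minTernary(n, 1.0f)` yields `1.0f`, so the old form is order-dependent. `minIntrinsic` yields `1.0f` in both cases.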
Notes about possible follow-on work:
* The `ldexp` function is still left in the Slang stdlib because it has to account for a floating-point exponent and the `math.h` version only handles integers for the exponent. It is possible that we can/should define another overload for `ldexp` (and `frexp`) that uses an integer for exponent, and then have that one be a built-in on CPU/CUDA, with the HLSL `frexp` being defined in the stdlib to delegate to the correct `frexp` for those targets.
* The `firstbithigh` and related functions are missing for our CPU and CUDA targets, and will need to be added. It is worth noting that `firstbithigh` apparently has some very odd functionality around signed integer arguments (which are supported, despite MSDN being unclear on that point). General cleanup will be required for those functions.
* Making the various matrix and vector products no longer intrinsic ops might affect how we emit code for them as sub-expressions (both whether we fold them into use sites and how we parenthesize them). This doesn't seem to affect any of our existing tests, but we could consider marking these functions with `[__readNone]` to ensure they can be folded, and then also adding whatever modifier(s) we might invent to control precedence and parentheses insertion during emit.
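To make the `ldexp`/`frexp` note concrete, here is a minimal sketch of the two exponent conventions. `ldexpFloatExp` has the shape of the float-exponent definition this change removes from the preludes, `ldexpIntExp` is a hypothetical integer-exponent overload of the kind suggested above, and `frexpFloatExp` mirrors the `F32_frexp` wrapper in the diff. All three names are illustrative.

```cpp
#include <math.h>

// Float-exponent form (HLSL-style signature): math.h has no direct equivalent,
// so it is computed as m * 2^e.
inline float ldexpFloatExp(float m, float e) { return m * ::powf(2.0f, e); }

// Hypothetical integer-exponent overload: maps directly onto math.h's ldexpf.
inline float ldexpIntExp(float m, int e) { return ::ldexpf(m, e); }

// Same shape as the F32_frexp wrapper in the diff: math.h's frexpf writes an
// int exponent, which is then converted to float for the caller.
inline float frexpFloatExp(float x, float& e)
{
    int ei;
    float m = ::frexpf(x, &ei);
    e = float(ei);
    return m;
}
```

For example, `frexpFloatExp(8.0f, e)` returns a mantissa of `0.5f` with `e` set to `4.0f`, and `ldexpIntExp(0.5f, 4)` reconstructs `8.0f`.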
Diffstat (limited to 'prelude')
| -rw-r--r-- | prelude/slang-cpp-scalar-intrinsics.h | 82 | ||||
| -rw-r--r-- | prelude/slang-cuda-prelude.h | 133 |
2 files changed, 124 insertions, 91 deletions
diff --git a/prelude/slang-cpp-scalar-intrinsics.h b/prelude/slang-cpp-scalar-intrinsics.h index 95acd9335..c814365c6 100644 --- a/prelude/slang-cpp-scalar-intrinsics.h +++ b/prelude/slang-cpp-scalar-intrinsics.h @@ -46,12 +46,16 @@ SLANG_FORCE_INLINE float F32_calcSafeRadians(float radians) // Unary SLANG_FORCE_INLINE float F32_ceil(float f) { return ::ceilf(f); } SLANG_FORCE_INLINE float F32_floor(float f) { return ::floorf(f); } +SLANG_FORCE_INLINE float F32_round(float f) { return ::roundf(f); } SLANG_FORCE_INLINE float F32_sin(float f) { return ::sinf(f); } SLANG_FORCE_INLINE float F32_cos(float f) { return ::cosf(f); } SLANG_FORCE_INLINE float F32_tan(float f) { return ::tanf(f); } SLANG_FORCE_INLINE float F32_asin(float f) { return ::asinf(f); } SLANG_FORCE_INLINE float F32_acos(float f) { return ::acosf(f); } SLANG_FORCE_INLINE float F32_atan(float f) { return ::atanf(f); } +SLANG_FORCE_INLINE float F32_sinh(float f) { return ::sinhf(f); } +SLANG_FORCE_INLINE float F32_cosh(float f) { return ::coshf(f); } +SLANG_FORCE_INLINE float F32_tanh(float f) { return ::tanhf(f); } SLANG_FORCE_INLINE float F32_log2(float f) { return ::log2f(f); } SLANG_FORCE_INLINE float F32_log(float f) { return ::logf(f); } SLANG_FORCE_INLINE float F32_log10(float f) { return ::log10f(f); } @@ -61,42 +65,39 @@ SLANG_FORCE_INLINE float F32_abs(float f) { return ::fabsf(f); } SLANG_FORCE_INLINE float F32_trunc(float f) { return ::truncf(f); } SLANG_FORCE_INLINE float F32_sqrt(float f) { return ::sqrtf(f); } SLANG_FORCE_INLINE float F32_rsqrt(float f) { return 1.0f / F32_sqrt(f); } -SLANG_FORCE_INLINE float F32_rcp(float f) { return 1.0f / f; } SLANG_FORCE_INLINE float F32_sign(float f) { return ( f == 0.0f) ? f : (( f < 0.0f) ? -1.0f : 1.0f); } -SLANG_FORCE_INLINE float F32_saturate(float f) { return (f < 0.0f) ? 0.0f : (f > 1.0f) ? 
1.0f : f; } SLANG_FORCE_INLINE float F32_frac(float f) { return f - F32_floor(f); } -SLANG_FORCE_INLINE float F32_radians(float f) { return f * 0.01745329222f; } SLANG_FORCE_INLINE bool F32_isnan(float f) { return isnan(f); } SLANG_FORCE_INLINE bool F32_isfinite(float f) { return isfinite(f); } SLANG_FORCE_INLINE bool F32_isinf(float f) { return isinf(f); } // Binary -SLANG_FORCE_INLINE float F32_min(float a, float b) { return a < b ? a : b; } -SLANG_FORCE_INLINE float F32_max(float a, float b) { return a > b ? a : b; } +SLANG_FORCE_INLINE float F32_min(float a, float b) { return ::fminf(a, b); } +SLANG_FORCE_INLINE float F32_max(float a, float b) { return ::fmaxf(a, b); } SLANG_FORCE_INLINE float F32_pow(float a, float b) { return ::powf(a, b); } SLANG_FORCE_INLINE float F32_fmod(float a, float b) { return ::fmodf(a, b); } SLANG_FORCE_INLINE float F32_remainder(float a, float b) { return ::remainderf(a, b); } -SLANG_FORCE_INLINE float F32_step(float a, float b) { return float(b >= a); } SLANG_FORCE_INLINE float F32_atan2(float a, float b) { return float(::atan2(a, b)); } -// TODO(JS): -// Note C++ has ldexp, but it takes an integer for the exponent, it seems HLSL takes both as float -SLANG_FORCE_INLINE float F32_ldexp(float m, float e) { return m * ::powf(2.0f, e); } - -// Ternary -SLANG_FORCE_INLINE float F32_smoothstep(float min, float max, float x) -{ - const float t = x < min ? 0.0f : ((x > max) ? 1.0f : (x - min) / (max - min)); - return t * t * (3.0 - 2.0 * t); +SLANG_FORCE_INLINE float F32_frexp(float x, float& e) +{ + int ei; + float m = ::frexpf(x, &ei); + e = ei; + return m; +} +SLANG_FORCE_INLINE float F32_modf(float x, float& ip) +{ + return ::modff(x, &ip); } -SLANG_FORCE_INLINE float F32_lerp(float x, float y, float s) { return x + s * (y - x); } -SLANG_FORCE_INLINE float F32_clamp(float x, float min, float max) { return ( x < min) ? min : ((x > max) ? 
max : x); } -SLANG_FORCE_INLINE void F32_sincos(float f, float& outSin, float& outCos) { outSin = F32_sin(f); outCos = F32_cos(f); } SLANG_FORCE_INLINE uint32_t F32_asuint(float f) { Union32 u; u.f = f; return u.u; } SLANG_FORCE_INLINE int32_t F32_asint(float f) { Union32 u; u.f = f; return u.i; } +// Ternary +SLANG_FORCE_INLINE float F32_fma(float a, float b, float c) { return ::fmaf(a, b, c); } + // ----------------------------- F64 ----------------------------------------- SLANG_FORCE_INLINE double F64_calcSafeRadians(double radians) @@ -112,12 +113,16 @@ SLANG_FORCE_INLINE double F64_calcSafeRadians(double radians) // Unary SLANG_FORCE_INLINE double F64_ceil(double f) { return ::ceil(f); } SLANG_FORCE_INLINE double F64_floor(double f) { return ::floor(f); } +SLANG_FORCE_INLINE double F64_round(double f) { return ::round(f); } SLANG_FORCE_INLINE double F64_sin(double f) { return ::sin(f); } SLANG_FORCE_INLINE double F64_cos(double f) { return ::cos(f); } SLANG_FORCE_INLINE double F64_tan(double f) { return ::tan(f); } SLANG_FORCE_INLINE double F64_asin(double f) { return ::asin(f); } SLANG_FORCE_INLINE double F64_acos(double f) { return ::acos(f); } SLANG_FORCE_INLINE double F64_atan(double f) { return ::atan(f); } +SLANG_FORCE_INLINE double F64_sinh(double f) { return ::sinh(f); } +SLANG_FORCE_INLINE double F64_cosh(double f) { return ::cosh(f); } +SLANG_FORCE_INLINE double F64_tanh(double f) { return ::tanh(f); } SLANG_FORCE_INLINE double F64_log2(double f) { return ::log2(f); } SLANG_FORCE_INLINE double F64_log(double f) { return ::log(f); } SLANG_FORCE_INLINE double F64_log10(float f) { return ::log10(f); } @@ -127,38 +132,32 @@ SLANG_FORCE_INLINE double F64_abs(double f) { return ::fabs(f); } SLANG_FORCE_INLINE double F64_trunc(double f) { return ::trunc(f); } SLANG_FORCE_INLINE double F64_sqrt(double f) { return ::sqrt(f); } SLANG_FORCE_INLINE double F64_rsqrt(double f) { return 1.0 / F64_sqrt(f); } -SLANG_FORCE_INLINE double F64_rcp(double f) { return 1.0 
/ f; } SLANG_FORCE_INLINE double F64_sign(double f) { return (f == 0.0) ? f : ((f < 0.0) ? -1.0 : 1.0); } -SLANG_FORCE_INLINE double F64_saturate(double f) { return (f < 0.0) ? 0.0 : (f > 1.0) ? 1.0 : f; } SLANG_FORCE_INLINE double F64_frac(double f) { return f - F64_floor(f); } -SLANG_FORCE_INLINE double F64_radians(double f) { return f * 0.01745329222; } SLANG_FORCE_INLINE bool F64_isnan(double f) { return isnan(f); } SLANG_FORCE_INLINE bool F64_isfinite(double f) { return isfinite(f); } SLANG_FORCE_INLINE bool F64_isinf(double f) { return isinf(f); } // Binary -SLANG_FORCE_INLINE double F64_min(double a, double b) { return a < b ? a : b; } -SLANG_FORCE_INLINE double F64_max(double a, double b) { return a > b ? a : b; } +SLANG_FORCE_INLINE double F64_min(double a, double b) { return ::fmin(a, b); } +SLANG_FORCE_INLINE double F64_max(double a, double b) { return ::fmax(a, b); } SLANG_FORCE_INLINE double F64_pow(double a, double b) { return ::pow(a, b); } SLANG_FORCE_INLINE double F64_fmod(double a, double b) { return ::fmod(a, b); } SLANG_FORCE_INLINE double F64_remainder(double a, double b) { return ::remainder(a, b); } -SLANG_FORCE_INLINE double F64_step(double a, double b) { return double(b >= a); } SLANG_FORCE_INLINE double F64_atan2(double a, double b) { return ::atan2(a, b); } -// TODO(JS): -// Note C++ has ldexp, but it takes an integer for the exponent, it seems HLSL takes both as float -SLANG_FORCE_INLINE double F64_ldexp(double m, double e) { return m * ::pow(2.0, e); } - -// Ternary -SLANG_FORCE_INLINE double F64_smoothstep(double min, double max, double x) -{ - const double t = x < min ? 0.0 : ((x > max) ? 
1.0 : (x - min) / (max - min)); - return t * t * (3.0 - 2.0 * t); +SLANG_FORCE_INLINE double F64_frexp(double x, double& e) +{ + int ei; + double m = ::frexp(x, &ei); + e = ei; + return m; +} +SLANG_FORCE_INLINE double F64_modf(double x, double& ip) +{ + return ::modf(x, &ip); } -SLANG_FORCE_INLINE double F64_lerp(double x, double y, double s) { return x + s * (y - x); } -SLANG_FORCE_INLINE double F64_clamp(double x, double min, double max) { return (x < min) ? min : ((x > max) ? max : x); } -SLANG_FORCE_INLINE void F64_sincos(double f, double& outSin, double& outCos) { outSin = F64_sin(f); outCos = F64_cos(f); } SLANG_FORCE_INLINE void F64_asuint(double d, uint32_t& low, uint32_t& hi) { @@ -176,6 +175,9 @@ SLANG_FORCE_INLINE void F64_asint(double d, int32_t& low, int32_t& hi) hi = int32_t(u.u >> 32); } +// Ternary +SLANG_FORCE_INLINE double F64_fma(double a, double b, double c) { return ::fma(a, b, c); } + // ----------------------------- I32 ----------------------------------------- SLANG_FORCE_INLINE int32_t I32_abs(int32_t f) { return (f < 0) ? -f : f; } @@ -183,8 +185,6 @@ SLANG_FORCE_INLINE int32_t I32_abs(int32_t f) { return (f < 0) ? -f : f; } SLANG_FORCE_INLINE int32_t I32_min(int32_t a, int32_t b) { return a < b ? a : b; } SLANG_FORCE_INLINE int32_t I32_max(int32_t a, int32_t b) { return a > b ? a : b; } -SLANG_FORCE_INLINE int32_t I32_clamp(int32_t x, int32_t min, int32_t max) { return ( x < min) ? min : ((x > max) ? max : x); } - SLANG_FORCE_INLINE float I32_asfloat(int32_t x) { Union32 u; u.i = x; return u.f; } SLANG_FORCE_INLINE uint32_t I32_asuint(int32_t x) { return uint32_t(x); } SLANG_FORCE_INLINE double I32_asdouble(int32_t low, int32_t hi ) @@ -201,8 +201,6 @@ SLANG_FORCE_INLINE uint32_t U32_abs(uint32_t f) { return f; } SLANG_FORCE_INLINE uint32_t U32_min(uint32_t a, uint32_t b) { return a < b ? a : b; } SLANG_FORCE_INLINE uint32_t U32_max(uint32_t a, uint32_t b) { return a > b ? 
a : b; } -SLANG_FORCE_INLINE uint32_t U32_clamp(uint32_t x, uint32_t min, uint32_t max) { return ( x < min) ? min : ((x > max) ? max : x); } - SLANG_FORCE_INLINE float U32_asfloat(uint32_t x) { Union32 u; u.u = x; return u.f; } SLANG_FORCE_INLINE uint32_t U32_asint(int32_t x) { return uint32_t(x); } @@ -238,8 +236,6 @@ SLANG_FORCE_INLINE uint64_t U64_abs(uint64_t f) { return f; } SLANG_FORCE_INLINE uint64_t U64_min(uint64_t a, uint64_t b) { return a < b ? a : b; } SLANG_FORCE_INLINE uint64_t U64_max(uint64_t a, uint64_t b) { return a > b ? a : b; } -SLANG_FORCE_INLINE uint64_t U64_clamp(uint64_t x, uint64_t min, uint64_t max) { return ( x < min) ? min : ((x > max) ? max : x); } - SLANG_FORCE_INLINE uint32_t U64_countbits(uint64_t v) { #if SLANG_GCC_FAMILY @@ -264,8 +260,6 @@ SLANG_FORCE_INLINE int64_t I64_abs(int64_t f) { return (f < 0) ? -f : f; } SLANG_FORCE_INLINE int64_t I64_min(int64_t a, int64_t b) { return a < b ? a : b; } SLANG_FORCE_INLINE int64_t I64_max(int64_t a, int64_t b) { return a > b ? a : b; } -SLANG_FORCE_INLINE int64_t I64_clamp(int64_t x, int64_t min, int64_t max) { return ( x < min) ? min : ((x > max) ? 
max : x); } - #ifdef SLANG_PRELUDE_NAMESPACE } #endif diff --git a/prelude/slang-cuda-prelude.h b/prelude/slang-cuda-prelude.h index 457fb4246..0a2ec088b 100644 --- a/prelude/slang-cuda-prelude.h +++ b/prelude/slang-cuda-prelude.h @@ -131,67 +131,113 @@ union Union64 // ----------------------------- F32 ----------------------------------------- // Unary -SLANG_CUDA_CALL float F32_rcp(float f) { return 1.0f / f; } +SLANG_CUDA_CALL float F32_ceil(float f) { return ::ceilf(f); } +SLANG_CUDA_CALL float F32_floor(float f) { return ::floorf(f); } +SLANG_CUDA_CALL float F32_round(float f) { return ::roundf(f); } +SLANG_CUDA_CALL float F32_sin(float f) { return ::sinf(f); } +SLANG_CUDA_CALL float F32_cos(float f) { return ::cosf(f); } +SLANG_CUDA_CALL void F32_sincos(float f, float& s, float& c) { ::sincosf(f, &s, &c); } +SLANG_CUDA_CALL float F32_tan(float f) { return ::tanf(f); } +SLANG_CUDA_CALL float F32_asin(float f) { return ::asinf(f); } +SLANG_CUDA_CALL float F32_acos(float f) { return ::acosf(f); } +SLANG_CUDA_CALL float F32_atan(float f) { return ::atanf(f); } +SLANG_CUDA_CALL float F32_sinh(float f) { return ::sinhf(f); } +SLANG_CUDA_CALL float F32_cosh(float f) { return ::coshf(f); } +SLANG_CUDA_CALL float F32_tanh(float f) { return ::tanhf(f); } +SLANG_CUDA_CALL float F32_log2(float f) { return ::log2f(f); } +SLANG_CUDA_CALL float F32_log(float f) { return ::logf(f); } +SLANG_CUDA_CALL float F32_log10(float f) { return ::log10f(f); } +SLANG_CUDA_CALL float F32_exp2(float f) { return ::exp2f(f); } +SLANG_CUDA_CALL float F32_exp(float f) { return ::expf(f); } +SLANG_CUDA_CALL float F32_abs(float f) { return ::fabsf(f); } +SLANG_CUDA_CALL float F32_trunc(float f) { return ::truncf(f); } +SLANG_CUDA_CALL float F32_sqrt(float f) { return ::sqrtf(f); } +SLANG_CUDA_CALL float F32_rsqrt(float f) { return ::rsqrtf(f); } SLANG_CUDA_CALL float F32_sign(float f) { return ( f == 0.0f) ? f : (( f < 0.0f) ? 
-1.0f : 1.0f); } -SLANG_CUDA_CALL float F32_saturate(float f) { return (f < 0.0f) ? 0.0f : (f > 1.0f) ? 1.0f : f; } -SLANG_CUDA_CALL float F32_frac(float f) { return f - floorf(f); } +SLANG_CUDA_CALL float F32_frac(float f) { return f - F32_floor(f); } SLANG_CUDA_CALL bool F32_isnan(float f) { return isnan(f); } SLANG_CUDA_CALL bool F32_isfinite(float f) { return isfinite(f); } SLANG_CUDA_CALL bool F32_isinf(float f) { return isinf(f); } // Binary -SLANG_CUDA_CALL float F32_min(float a, float b) { return a < b ? a : b; } -SLANG_CUDA_CALL float F32_max(float a, float b) { return a > b ? a : b; } -SLANG_CUDA_CALL float F32_step(float a, float b) { return float(b >= a); } - -// TODO(JS): -// Note CUDA has ldexp, but it takes an integer for the exponent, it seems HLSL takes both as float -SLANG_CUDA_CALL float F32_ldexp(float m, float e) { return m * powf(2.0f, e); } - -// Ternary -SLANG_CUDA_CALL float F32_lerp(float x, float y, float s) { return x + s * (y - x); } -SLANG_CUDA_CALL void F32_sincos(float f, float& outSin, float& outCos) { sincosf(f, &outSin, &outCos); } -SLANG_CUDA_CALL float F32_smoothstep(float min, float max, float x) +SLANG_CUDA_CALL float F32_min(float a, float b) { return ::fminf(a, b); } +SLANG_CUDA_CALL float F32_max(float a, float b) { return ::fmaxf(a, b); } +SLANG_CUDA_CALL float F32_pow(float a, float b) { return ::powf(a, b); } +SLANG_CUDA_CALL float F32_fmod(float a, float b) { return ::fmodf(a, b); } +SLANG_CUDA_CALL float F32_remainder(float a, float b) { return ::remainderf(a, b); } +SLANG_CUDA_CALL float F32_atan2(float a, float b) { return float(::atan2(a, b)); } + +SLANG_CUDA_CALL float F32_frexp(float x, float& e) +{ + int ei; + float m = ::frexpf(x, &ei); + e = ei; + return m; +} +SLANG_CUDA_CALL float F32_modf(float x, float& ip) { - const float t = x < min ? 0.0f : ((x > max) ? 
1.0f : (x - min) / (max - min)); - return t * t * (3.0 - 2.0 * t); + return ::modff(x, &ip); } -SLANG_CUDA_CALL float F32_clamp(float x, float min, float max) { return ( x < min) ? min : ((x > max) ? max : x); } SLANG_CUDA_CALL uint32_t F32_asuint(float f) { Union32 u; u.f = f; return u.u; } SLANG_CUDA_CALL int32_t F32_asint(float f) { Union32 u; u.f = f; return u.i; } +// Ternary +SLANG_CUDA_CALL float F32_fma(float a, float b, float c) { return ::fmaf(a, b, c); } + + // ----------------------------- F64 ----------------------------------------- // Unary -SLANG_CUDA_CALL double F64_rcp(double f) { return 1.0 / f; } +SLANG_CUDA_CALL double F64_ceil(double f) { return ::ceil(f); } +SLANG_CUDA_CALL double F64_floor(double f) { return ::floor(f); } +SLANG_CUDA_CALL double F64_round(double f) { return ::round(f); } +SLANG_CUDA_CALL double F64_sin(double f) { return ::sin(f); } +SLANG_CUDA_CALL double F64_cos(double f) { return ::cos(f); } +SLANG_CUDA_CALL void F64_sincos(double f, double& s, double& c) { ::sincos(f, &s, &c); } +SLANG_CUDA_CALL double F64_tan(double f) { return ::tan(f); } +SLANG_CUDA_CALL double F64_asin(double f) { return ::asin(f); } +SLANG_CUDA_CALL double F64_acos(double f) { return ::acos(f); } +SLANG_CUDA_CALL double F64_atan(double f) { return ::atan(f); } +SLANG_CUDA_CALL double F64_sinh(double f) { return ::sinh(f); } +SLANG_CUDA_CALL double F64_cosh(double f) { return ::cosh(f); } +SLANG_CUDA_CALL double F64_tanh(double f) { return ::tanh(f); } +SLANG_CUDA_CALL double F64_log2(double f) { return ::log2(f); } +SLANG_CUDA_CALL double F64_log(double f) { return ::log(f); } +SLANG_CUDA_CALL double F64_log10(float f) { return ::log10(f); } +SLANG_CUDA_CALL double F64_exp2(double f) { return ::exp2(f); } +SLANG_CUDA_CALL double F64_exp(double f) { return ::exp(f); } +SLANG_CUDA_CALL double F64_abs(double f) { return ::fabs(f); } +SLANG_CUDA_CALL double F64_trunc(double f) { return ::trunc(f); } +SLANG_CUDA_CALL double F64_sqrt(double f) { return 
::sqrt(f); } +SLANG_CUDA_CALL double F64_rsqrt(double f) { return ::rsqrt(f); } SLANG_CUDA_CALL double F64_sign(double f) { return (f == 0.0) ? f : ((f < 0.0) ? -1.0 : 1.0); } -SLANG_CUDA_CALL double F64_saturate(double f) { return (f < 0.0) ? 0.0 : (f > 1.0) ? 1.0 : f; } -SLANG_CUDA_CALL double F64_frac(double f) { return f - floor(f); } +SLANG_CUDA_CALL double F64_frac(double f) { return f - F64_floor(f); } SLANG_CUDA_CALL bool F64_isnan(double f) { return isnan(f); } SLANG_CUDA_CALL bool F64_isfinite(double f) { return isfinite(f); } SLANG_CUDA_CALL bool F64_isinf(double f) { return isinf(f); } // Binary -SLANG_CUDA_CALL double F64_min(double a, double b) { return a < b ? a : b; } -SLANG_CUDA_CALL double F64_max(double a, double b) { return a > b ? a : b; } -SLANG_CUDA_CALL double F64_step(double a, double b) { return double(b >= a); } - -// TODO(JS): -// Note CUDA has ldexp, but it takes an integer for the exponent, it seems HLSL takes both as float -SLANG_CUDA_CALL double F64_ldexp(double m, double e) { return m * pow(2.0, e); } - -// Ternary -SLANG_CUDA_CALL double F64_lerp(double x, double y, double s) { return x + s * (y - x); } -SLANG_CUDA_CALL void F64_sincos(double f, double& outSin, double& outCos) { sincos(f, &outSin, &outCos); } -SLANG_CUDA_CALL double F64_smoothstep(double min, double max, double x) -{ - const double t = x < min ? 0.0 : ((x > max) ? 
1.0 : (x - min) / (max - min)); - return t * t * (3.0 - 2.0 * t); +SLANG_CUDA_CALL double F64_min(double a, double b) { return ::fmin(a, b); } +SLANG_CUDA_CALL double F64_max(double a, double b) { return ::fmax(a, b); } +SLANG_CUDA_CALL double F64_pow(double a, double b) { return ::pow(a, b); } +SLANG_CUDA_CALL double F64_fmod(double a, double b) { return ::fmod(a, b); } +SLANG_CUDA_CALL double F64_remainder(double a, double b) { return ::remainder(a, b); } +SLANG_CUDA_CALL double F64_atan2(double a, double b) { return ::atan2(a, b); } + +SLANG_CUDA_CALL double F64_frexp(double x, double& e) +{ + int ei; + double m = ::frexp(x, &ei); + e = ei; + return m; +} +SLANG_CUDA_CALL double F64_modf(double x, double& ip) +{ + return ::modf(x, &ip); } -SLANG_CUDA_CALL double F64_clamp(double x, double min, double max) { return (x < min) ? min : ((x > max) ? max : x); } SLANG_CUDA_CALL void F64_asuint(double d, uint32_t& low, uint32_t& hi) { @@ -209,6 +255,9 @@ SLANG_CUDA_CALL void F64_asint(double d, int32_t& low, int32_t& hi) hi = int32_t(u.u >> 32); } +// Ternary +SLANG_CUDA_CALL double F64_fma(double a, double b, double c) { return ::fma(a, b, c); } + // ----------------------------- I32 ----------------------------------------- // Unary @@ -218,9 +267,6 @@ SLANG_CUDA_CALL int32_t I32_abs(int32_t f) { return (f < 0) ? -f : f; } SLANG_CUDA_CALL int32_t I32_min(int32_t a, int32_t b) { return a < b ? a : b; } SLANG_CUDA_CALL int32_t I32_max(int32_t a, int32_t b) { return a > b ? a : b; } -// Ternary -SLANG_CUDA_CALL int32_t I32_clamp(int32_t x, int32_t min, int32_t max) { return ( x < min) ? min : ((x > max) ? 
max : x); } - SLANG_CUDA_CALL float I32_asfloat(int32_t x) { Union32 u; u.i = x; return u.f; } SLANG_CUDA_CALL uint32_t I32_asuint(int32_t x) { return uint32_t(x); } SLANG_CUDA_CALL double I32_asdouble(int32_t low, int32_t hi ) @@ -239,9 +285,6 @@ SLANG_CUDA_CALL uint32_t U32_abs(uint32_t f) { return f; } SLANG_CUDA_CALL uint32_t U32_min(uint32_t a, uint32_t b) { return a < b ? a : b; } SLANG_CUDA_CALL uint32_t U32_max(uint32_t a, uint32_t b) { return a > b ? a : b; } -// Ternary -SLANG_CUDA_CALL uint32_t U32_clamp(uint32_t x, uint32_t min, uint32_t max) { return ( x < min) ? min : ((x > max) ? max : x); } - SLANG_CUDA_CALL float U32_asfloat(uint32_t x) { Union32 u; u.u = x; return u.f; } SLANG_CUDA_CALL uint32_t U32_asint(int32_t x) { return uint32_t(x); } @@ -266,8 +309,6 @@ SLANG_CUDA_CALL int64_t I64_abs(int64_t f) { return (f < 0) ? -f : f; } SLANG_CUDA_CALL int64_t I64_min(int64_t a, int64_t b) { return a < b ? a : b; } SLANG_CUDA_CALL int64_t I64_max(int64_t a, int64_t b) { return a > b ? a : b; } -SLANG_CUDA_CALL int64_t I64_clamp(int64_t x, int64_t min, int64_t max) { return ( x < min) ? min : ((x > max) ? max : x); } - // ----------------------------- U64 ----------------------------------------- SLANG_CUDA_CALL int64_t U64_abs(uint64_t f) { return f; } @@ -275,8 +316,6 @@ SLANG_CUDA_CALL int64_t U64_abs(uint64_t f) { return f; } SLANG_CUDA_CALL int64_t U64_min(uint64_t a, uint64_t b) { return a < b ? a : b; } SLANG_CUDA_CALL int64_t U64_max(uint64_t a, uint64_t b) { return a > b ? a : b; } -SLANG_CUDA_CALL int64_t U64_clamp(uint64_t x, uint64_t min, uint64_t max) { return ( x < min) ? min : ((x > max) ? max : x); } - SLANG_CUDA_CALL uint32_t U64_countbits(uint64_t v) { // https://docs.nvidia.com/cuda/cuda-math-api/group__CUDA__MATH__INTRINSIC__INT.html#group__CUDA__MATH__INTRINSIC__INT_1g43c9c7d2b9ebf202ff1ef5769989be46 |
