|
|
Slang APIs are documented as taking UTF-8 encoded shader source,
though it's not explicitly documented whether it is allowed to
include a BOM (Byte Order Marker). This change adds support for
UTF-8 BOM markers by virtue of disposing of BOM data. As a bonus,
UTF-16 input which can cleanly decode to UTF-8 is now also
accepted.
Throwing out the BOM on input is done by leveraging existing
functionality in "determineEncoding()", however a bug exists there
for null-terminated single character input, where the null byte
caused a heuristic to guess UTF-16, even though the null byte
isn't part of the string. The bug in "determineEncoding" is fixed
by only guessing when bytes >= 2 and not looking past the end
of the buffer. The 'implicit-cast' test was mistakenly relying
on the bug to pass, as its expected file was being read as UTF16
and cropped to zero length due to the bug. The expected output
of implicit-cast is updated to pass with the bug fix in place.
The decoding of UTF-16 to UTF-8 is done through an existing
'decode' method. This change fixes a bug in UTF16-LE 'decode'
where it was decoded as if it were Big-Endian.
Adds 3 small tests to ensure the compiler doesn't choke on source
files in UTF-8 (with BOM), UTF16-LE, or UTF16-BE.
Bonus: Fixes a bug in diagnostic reporting where hex values were
incorrectly translated to text, leading to incorrect, possibly
truncated strings.
Fixes #4046
Co-authored-by: Yong He <yonghe@outlook.com>
|