Remove incorrect UTF decode assert (#5028)

The assert assumed that after removing a BOM and "deflating" UTF* to UTF8, the decoded (UTF8) size would be no larger than the raw size (UTF8 or UTF16). However, UTF8 is not always smaller than UTF16. Specifically, code points in the range U+0800..U+FFFF take 2 bytes in UTF16 but 3 bytes in UTF8; code points above U+FFFF take 4 bytes in both encodings. The assert mostly holds for source code files, which are dominated by ASCII, but is easily violated for binary files with more random values.

Wikipedia clarifies why: https://en.wikipedia.org/wiki/UTF-8#UTF-16

"Text encoded in UTF-8 will be smaller than the same text encoded in UTF-16 if there are more code points below U+0080 than in the range U+0800..U+FFFF. This is true for all modern European languages. It is often true even for languages like Chinese, due to the large number of spaces, newlines, digits, and HTML markup in typical files."
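
The size relationship can be checked with a small standalone sketch. The helper names utf8ByteCount and utf16ByteCount are hypothetical and are not part of slang-source-loc.cpp; they only encode the standard per-code-point byte counts and assume the input is a valid Unicode scalar value.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: bytes needed to encode one code point in UTF-8.
constexpr std::size_t utf8ByteCount(std::uint32_t codePoint)
{
    return codePoint < 0x80 ? 1 : codePoint < 0x800 ? 2 : codePoint < 0x10000 ? 3 : 4;
}

// Hypothetical helper: bytes needed in UTF-16 (one 16-bit unit, or a surrogate pair).
constexpr std::size_t utf16ByteCount(std::uint32_t codePoint)
{
    return codePoint < 0x10000 ? 2 : 4;
}

// ASCII 'A': UTF-8 is smaller, so the removed assert would hold.
static_assert(utf8ByteCount(0x41) == 1 && utf16ByteCount(0x41) == 2, "ASCII shrinks");

// U+4E2D: 2 bytes in UTF-16 but 3 in UTF-8, so the decoded size grows.
static_assert(utf8ByteCount(0x4E2D) == 3 && utf16ByteCount(0x4E2D) == 2, "BMP above U+07FF grows");

// U+1F600: 4 bytes in both encodings.
static_assert(utf8ByteCount(0x1F600) == 4 && utf16ByteCount(0x1F600) == 4, "supplementary is equal");
```

A UTF16 input consisting mostly of code points in U+0800..U+FFFF therefore decodes to a UTF8 buffer up to 1.5 times the raw size, which is the case the removed assert rejected.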
author: cheneym2 <acheney@nvidia.com> 2024-09-06 14:38:33 -0400
committer: GitHub <noreply@github.com> 2024-09-06 14:38:33 -0400
commit: dcd6c246a14d2321e75c657f381b08b7ab08016e (patch)
tree: a2d2cf83661101004ce3b7714ffcb198ca6d4818 /source/compiler-core
parent: 8662375b075a6cdb86baffd188af721320dac8b1 (diff)
1 files changed, 0 insertions, 1 deletions
diff --git a/source/compiler-core/slang-source-loc.cpp b/source/compiler-core/slang-source-loc.cpp
index 75601b815..5f1f51a38 100644
--- a/source/compiler-core/slang-source-loc.cpp
+++ b/source/compiler-core/slang-source-loc.cpp
@@ -597,7 +597,6 @@ void SourceFile::setContents(ISlangBlob* blob)
 
     char const* decodedContentBegin = (char const*)m_contentBlob->getBufferPointer();
     const UInt decodedContentSize = m_contentBlob->getBufferSize();
-    assert(decodedContentSize <= rawContentSize);
     char const* decodedContentEnd = decodedContentBegin + decodedContentSize;
 
     m_content = UnownedStringSlice(decodedContentBegin, decodedContentEnd);