Bug fix for newline escaping.

The previous changes left out logic for "scrubbing" a token value that includes an escaped newline, because I expected escaped newlines to occur only within whitespace. Unfortunately, some user code looked like this: ``` a + b ``` That is, there was a token at the very start of the line, after the escaped newline. As a result, after consuming the leading whitespace (which didn't end up consuming the escaped newline, though we could consider making it do so in the future), the lexer started to lex a token that *starts* with an escaped newline but turns out to be an identifier (which gets an invalid name). This change adds some ad-hoc code to "scrub" the value of *every* token, which is wasteful but at least solves the problem.
author: Tim Foley <tfoley@nvidia.com> 2017-06-19 14:13:55 -0700
committer: Tim Foley <tfoley@nvidia.com> 2017-06-19 14:13:55 -0700
commit: e65d2ad1e776610427b85dd20e861d3ad5e0ea71 (patch)
tree: 2693987d8cbac64369b30857f6a6b8e6d78cd903 /source/slang
parent: 838e8331da24744948539c12d2a8edcd9c594ee5 (diff)
1 files changed, 32 insertions, 1 deletions
diff --git a/source/slang/lexer.cpp b/source/slang/lexer.cpp
index cb718b538..f81ead87c 100644
--- a/source/slang/lexer.cpp
+++ b/source/slang/lexer.cpp
@@ -1052,8 +1052,39 @@ namespace Slang
             // Note(tfoley): `StringBuilder::Append()` seems to crash when appending zero bytes
             if(textEnd != textBegin)
             {
+                // HACK(tfoley): "scrubbing" token value here to remove escaped newlines...
+                //
+                // TODO: Only perform this work if we encountered an escaped newline
+                // while lexing this token (e.g., keep a flag on the lexer), or
+                // do it on-demand when the actual value of the token is needed.
+
                 StringBuilder valueBuilder;
-                valueBuilder.Append(textBegin, int(textEnd - textBegin));
+                auto tt = textBegin;
+                while(tt != textEnd)
+                {
+                    char c = *tt++;
+                    if(c == '\\')
+                    {
+                        char d = *tt;
+                        switch(d)
+                        {
+                        case '\r': case '\n':
+                            {
+                                tt++;
+                                char e = *tt;
+                                if((d ^ e) == ('\r' ^ '\n'))
+                                {
+                                    tt++;
+                                }
+                            }
+                            continue;
+
+                        default:
+                            break;
+                        }
+                    }
+                    valueBuilder.Append(c);
+                }
                 token.Content = valueBuilder.ProduceString();
             }
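
For readers who want to try the scrubbing logic outside the lexer, the loop added above can be approximated as a standalone function. This is only a sketch under stated assumptions: `scrubEscapedNewlines` is a hypothetical name that does not appear in the Slang source, `std::string` stands in for `StringBuilder`, and the `cursor != end` bounds checks are an addition that the patch itself does not perform.

```cpp
#include <string>

// Hypothetical standalone helper (not part of the Slang source).
// Copies [begin, end) into a string, dropping every backslash that is
// immediately followed by a line break, together with that line break.
std::string scrubEscapedNewlines(const char* begin, const char* end)
{
    std::string result;
    const char* cursor = begin;
    while (cursor != end)
    {
        char c = *cursor++;

        // A backslash only counts as an escape when a '\r' or '\n' follows.
        // The bounds check is an assumption added for this sketch.
        if (c == '\\' && cursor != end && (*cursor == '\r' || *cursor == '\n'))
        {
            char d = *cursor++;

            // Treat "\r\n" and "\n\r" as a single line break: the XOR test
            // is true only when the two characters are the two *different*
            // line-break characters.
            if (cursor != end && (d ^ *cursor) == ('\r' ^ '\n'))
            {
                cursor++;
            }
            continue; // emit nothing for the escaped newline
        }

        result.push_back(c);
    }
    return result;
}
```

With this sketch, the input `a\` + newline + `b` yields `ab`, while a backslash that is not followed by a line break is copied through unchanged.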