From 7911c9437333692db275d2dff41264f4c8023be8 Mon Sep 17 00:00:00 2001
From: Yong He
Date: Wed, 5 Feb 2025 12:32:56 -0800
Subject: Use two-stage parsing to disambiguate generic app and comparison. (#6281)

* Use two-stage parsing to disambiguate generic app and comparison.

* Typo fix.

* Update doc.
---
 docs/design/parsing.md | 68 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 68 insertions(+)
 create mode 100644 docs/design/parsing.md

diff --git a/docs/design/parsing.md b/docs/design/parsing.md
new file mode 100644
index 000000000..9027e06dc
--- /dev/null
+++ b/docs/design/parsing.md
@@ -0,0 +1,68 @@
+# Resolving Ambiguity in Slang's Parser
+
+A typical textbook-style compiler front-end features explicit stages: tokenization, parsing, and semantic checking. Slang's original design followed this pattern, but it has a drawback: the parser cannot effectively disambiguate the syntax because semantic information is not available during parsing.
+
+For example, without knowing what `X` is, it is impossible to tell whether `X<a&&b>(5)` means calling a generic function `X` with argument `5`, or computing the logical `AND` between the conditions `X < a` and `b > 5`.
+
+Slang initially addressed this problem with a heuristic: if the compiler sees an `IDENTIFIER` followed by `<`, it first tries to parse the expression as a generic specialization. If that succeeds, it checks whether the token after the closing `>` is one of the possible "generic specialization followers". In this example, the next token is `(`, which is a "generic specialization follower", so the compiler determines that the expression being parsed is very likely a generic function call and parses it as such. For reference, the full set of "generic specialization followers" is: `::`, `.`, `(`, `)`, `[`, `]`, `:`, `,`, `?`, `;`, `==`, `!=`, `>` and `>>`.
+
+This simplistic heuristic originated in the C# compiler, where it works well because C# doesn't allow generic value arguments, so things like `X...` or `X...` can never be valid generic specializations. This isn't the case for Slang, where generic arguments can be int or boolean values, so `a&&b` and `a