diff options
| -rw-r--r-- | docs/design/parsing.md | 68 | ||||
| -rw-r--r-- | source/slang/slang-ast-base.h | 2 | ||||
| -rw-r--r-- | source/slang/slang-ast-dump.cpp | 2 | ||||
| -rw-r--r-- | source/slang/slang-ast-stmt.h | 4 | ||||
| -rw-r--r-- | source/slang/slang-ast-support-types.h | 4 | ||||
| -rw-r--r-- | source/slang/slang-check-conversion.cpp | 3 | ||||
| -rw-r--r-- | source/slang/slang-check-decl.cpp | 28 | ||||
| -rw-r--r-- | source/slang/slang-check-expr.cpp | 6 | ||||
| -rw-r--r-- | source/slang/slang-check-impl.h | 12 | ||||
| -rw-r--r-- | source/slang/slang-check-stmt.cpp | 31 | ||||
| -rw-r--r-- | source/slang/slang-diagnostic-defs.h | 5 | ||||
| -rw-r--r-- | source/slang/slang-language-server-document-symbols.cpp | 5 | ||||
| -rw-r--r-- | source/slang/slang-lookup.cpp | 4 | ||||
| -rw-r--r-- | source/slang/slang-parser.cpp | 238 | ||||
| -rw-r--r-- | source/slang/slang-parser.h | 13 | ||||
| -rw-r--r-- | tests/bugs/gh-4150.slang | 6 | ||||
| -rw-r--r-- | tests/front-end/generic-disambiguate-2.slang | 28 |
17 files changed, 426 insertions, 33 deletions
diff --git a/docs/design/parsing.md b/docs/design/parsing.md new file mode 100644 index 000000000..9027e06dc --- /dev/null +++ b/docs/design/parsing.md @@ -0,0 +1,68 @@ +# Resolving Ambiguity in Slang's Parser + +A typical text-book style compiler front-end usually features explicit stages: tokenization, parsing, and semantic checking. Slang's original design follows this pattern, but the design has a drawback that it cannot effectively disambiguate the syntax due to lack of semantic info during parsing. + +For example, without knowing what `X` is, it is impossible to tell whether `X<a&&b>(5)` means calling a generic function `X` with argument `5`, or computing the logical `AND` between condition `X < a` and `b > 5`. + +Slang initially addresses this problem with a heursitic: if the compiler sees `IDENTIFIER` followed by `<`, it will try to parse the expression as a generic specialization first, and if that succeeds, it checks the token after the closing `>` to see if the following token is one of the possible "generic specialization followers". In this example, the next token is `(`, which is a "generic specialization follower", so the compiler determines that the expression being parsed is very likely a generic function call, and it will parse the expression as such. For reference, the full set of "generic specialization followers" are: `::`, `.`, `(`, `)`, `[`, `]`, `:`, `,`, `?`, `;`, `==`, `!=`, `>` and `>>`. + +This simplistic heuristic is originated from the C# compiler, which works well there since C# doesn't allow generic value arguments, therefore things like `X<a&&b>...` or `X<a<y>...` can never be valid generic specializations. This isn't the case for Slang, where generic arguments can be int or boolean values, so `a&&b` and `a<y` are valid as generic arguments. Although using the same heuristic here works most of the time, it is still causing a lot of confusion to the users when the heuristic fails. + +The ambiguity problem can be systematically solved if the parser has access to semantic info. If the parser knows that `X` is / isn't a generic, then it can parse the expression accordingly without any guess work. The key challenge is to make such semantic info available while we are still parsing. + +## Two-stage Parsing + +Slang solves this problem by breaking parsing into two stages: the decl parsing stage, and body parsing stage. Initially, we will parse the user source in the decl parsing stage. In this stage, we parse all decls, such as `struct`s, variables, functions etc. as usual, except that when we are about to parse the body of a function, we will just collect all tokens enclosed by `{` and `}` and store them in a raw list as a `UnparsedStmt` AST node. By deferring the parsing of function bodies, we no longer need to guess whether a `<` token inside a function body means generic specialization or less-than comparison. + +After the decl parsing stage, we have the AST that represents the decl structure but not the function bodies. With this initial AST, we can start semantic checking. Once we reached the `UnparsedStmt` nodes, the semantic visitor will spawn a new `Parser` and start to parse the tokens stored in the `UnparsedStmt` node. When we spawn the parser in a semantic visitor, initialize the parser to be in `Body` parsing stage, and pass a pointer to the semantic visitor to the parser. This way, we are triggering the second parsing stage from the semantic visitor. + +During the second parsing stage, whenever we see a `<` and need to disambiguate, we will use the semantic visitor to check the expression that has been parsed so far before `<`. If we are able to type check the expression and find it to be a `DeclRefExpr` referencing a generic decl, or an `OverloadedExpr` where one of the candidate is a generic decl, then we know `<` should be parsed as a generic specialization instead of `operator <`. If the expression before `<` checks to be a reference to a variable or a property, we should parse it as the comparison operator. The reason we are still parsing `<` as generic specialization when the expression before it is an non-generic function or type, is to allow us provide better error messages instead of just a "syntax error" somewhere down the line: in this case the user is most likely treating the non-generic type or function as a generic one by mistake, so we should diagnose as such. In the case that we are unable to properly check the preceeding expression or it checks to something else that we don't know, the compiler will fallback to the heuristic based method for disambiguation. + +Note that in the second stage, parsing and semantic checking is interleaved organically. We no longer have a clean boundary between parsing and checking. However, the checking that happens in the second stage is on-demand and checks only necessary parts of the code to determine the type of the expression preceeding the `<` token. Any other code irrelevant to disambiguation purposes are left unchecked. Once the function body is fully parsed, the semantic visitor working on the function will make sure every node of the parsed AST is visited. + +This two stage parsing technique should work well to correctly disambiguate code inside a function body. However the current implementation is not 100% bulletproof. Expressions at decl level, such as default values for struct members or function parameters, are still fully parsed in the first stage using the heuristic based method. However this should be a lesser problem in practice, because the default values are typically simple expressions and the chances of running into wrongly disambiguated case is much lower than in function bodies. + +## Scope of Local Variables + +Another issue linked with parsing is to correctly support the scope of local variables. A local variable should only be visible to code after its declaration within the same `{}` block. Consider this example: + +```cpp +static int input = 100; +int f() +{ + input = 2; // global `input` is now 2 + int input = input + 1; // local `input` is now 3 + input = input + 2; // local `input` is now 5 + return input; // returns 5. +} +``` + +In Slang's implementation, we are creating a `ScopeDecl` container node for each `BlockStatement`, and variable declarations inside the block are added to the same `ScopeDecl`. This creates a problem for two stage parsing: to allow any expression to check during disambiguation, we need to insert variables into the scope as soon as they are parsed, but this means that when we are doing the "full checking" after the entire body is parsed, all variables are already registered in scope and discoverable when we are checking the earlier statements in the block. This means that the compiler cannot report an error if the user attempts to use a variable that is defined later in the block. In the example above, it means that when we are checking the first statement `input = 2`, the lookup logic for `input` will find the local variable instead of the global variable, thus generating the wrong code. + +One way to solve this problem is instead of registering all local variables to the same scope owned by the containing `BlockStmt`, we make each local variable declaration own its own scope, that is ended at the end of the owning block. This way, all statements following the local variable declaration become the children of the local variable `DeclStmt`, effectively parsing the above example as: + +```cpp +static int input = 100; +int f() +{ + input = 2; // global `input` is now 2 + { + int input = input + 1; // local `input` is now 3 + input = input + 2; // local `input` is now 5 + return input; // returns 5. + } +} + +``` + +This will ensure the scope data-structure matches the semantic scope of the variable, and allow the compiler to produce the correct diagnostics. + +However, expressing scope this way creates long nested chains in the AST, and leads to inefficient lookup and deep ASTs that risk overflowing the stack. Instead, Slang stays with the design to put all variables in the same block registered to the same `ScopeDecl`, but uses a separate state on each `VarDecl` called `hiddenFromLookup` to track whether or not the decl should be visible to lookup. During parsing, all decls are set to visible by default, so they can be used for disambiguation purpose. Once parsing is fully done and we are about to check a `BlockStmt`, we will first visit all `DeclStmt`s in the block, mark it as `invisible`, then continue checking the children statements. When checking encounters a `DeclStmt`, it will then mark the decl as `visible`, allowing it to be found by lookup logic for code after the declaration side. This solution allows us to respect the semantic scope of local variables without actually forming a long chain of scopes for a sequence of statements. + +## Future Work: Extend Staged Parsing to Decl Scopes + +We can further extend this to properly support expressions in global/decl scopes, such as default value expressions for struct members, or the type expressions for functions and global/member variables. To do so, we will use a different strategy for parsing expressions in the first parsing stage. Instead of parsing the expression directly, we should identify the token boundary of an expression without detailed understanding of the syntax. We will parse all expressions into `UnparsedExpr` nodes, which contain unparsed tokens for each expression. By doing so, the first parsing stage will give us an AST that is detailed enough to identify the names of types and functions, and whether or not they are generic. Then we can perform the semantic checking on the intial AST, and use the semantic checking to drive the parsing and checking of any `UnparsedExpr` and `UnparsedStmt`s. + +## Future Work: ScopeRef + +We can get rid of the `hiddenFromLookup` flag and use a more immutable representation of AST nodes if we introduce the concept of a `ScopeRef` that is a `Scope*` + `endIndex` to mark the boundary of the referenced scope. This way, different statements in a block can have different `ScopeRef` to the same scope but different ending member index. If we are looking up through a `ScopeRef` and find a variable in the scope that has an index greater than `endIndex`, we should treat the variable as invisible and report an error. This is cleaner, allowing better error messages, and avoids having to maintain mutable state flags on Decls.
\ No newline at end of file diff --git a/source/slang/slang-ast-base.h b/source/slang/slang-ast-base.h index 8c2f7afa7..9cc36e530 100644 --- a/source/slang/slang-ast-base.h +++ b/source/slang/slang-ast-base.h @@ -791,6 +791,8 @@ public: // Track the decl reference that caused the requirement of a capability atom. SLANG_UNREFLECTED List<DeclReferenceWithLoc> capabilityRequirementProvenance; + SLANG_UNREFLECTED bool hiddenFromLookup = false; + private: SLANG_UNREFLECTED DeclRefBase* m_defaultDeclRef = nullptr; }; diff --git a/source/slang/slang-ast-dump.cpp b/source/slang/slang-ast-dump.cpp index 85d2d0d9f..f6cdf50d8 100644 --- a/source/slang/slang-ast-dump.cpp +++ b/source/slang/slang-ast-dump.cpp @@ -645,6 +645,8 @@ struct ASTDumpContext m_writer->emit(info->m_name); } + void dump(SourceLanguage language) { m_writer->emit((int)language); } + void dump(KeyValuePair<DeclRefBase*, SubtypeWitness*> pair) { m_writer->emit("("); diff --git a/source/slang/slang-ast-stmt.h b/source/slang/slang-ast-stmt.h index 8c21f78a5..eba7027c2 100644 --- a/source/slang/slang-ast-stmt.h +++ b/source/slang/slang-ast-stmt.h @@ -52,6 +52,10 @@ class UnparsedStmt : public Stmt // The tokens that were contained between `{` and `}` List<Token> tokens; + Scope* currentScope = nullptr; + Scope* outerScope = nullptr; + SourceLanguage sourceLanguage; + bool isInVariadicGenerics = false; }; class EmptyStmt : public Stmt diff --git a/source/slang/slang-ast-support-types.h b/source/slang/slang-ast-support-types.h index 2581630dd..72707ab38 100644 --- a/source/slang/slang-ast-support-types.h +++ b/source/slang/slang-ast-support-types.h @@ -419,6 +419,10 @@ enum class DeclCheckState : uint8_t /// Unchecked, + /// The declaration is parsed and inserted into the initial scope, + /// ready for future lookups from within the parser for disambiguation purposes. + ReadyForParserLookup, + /// Basic checks on the modifiers of the declaration have been applied. /// /// For example, when a declaration has attributes, the transformation diff --git a/source/slang/slang-check-conversion.cpp b/source/slang/slang-check-conversion.cpp index d89997bad..6bbdc1477 100644 --- a/source/slang/slang-check-conversion.cpp +++ b/source/slang/slang-check-conversion.cpp @@ -1254,6 +1254,7 @@ bool SemanticsVisitor::_coerce( auto resultExpr = getASTBuilder()->create<MakeOptionalExpr>(); resultExpr->loc = fromExpr->loc; resultExpr->type = toType; + resultExpr->checked = true; *outToExpr = resultExpr; } return true; @@ -1379,6 +1380,7 @@ bool SemanticsVisitor::_coerce( derefExpr = m_astBuilder->create<DerefExpr>(); derefExpr->base = fromExpr; derefExpr->type = QualType(fromElementType); + derefExpr->checked = true; } if (!_coerce(site, toType, outToExpr, fromElementType, derefExpr, &subCost)) @@ -1407,6 +1409,7 @@ bool SemanticsVisitor::_coerce( refExpr->base = fromExpr; refExpr->type = QualType(refType); refExpr->type.isLeftValue = false; + refExpr->checked = true; *outToExpr = refExpr; } if (outCost) diff --git a/source/slang/slang-check-decl.cpp b/source/slang/slang-check-decl.cpp index b03126512..ab3335bfb 100644 --- a/source/slang/slang-check-decl.cpp +++ b/source/slang/slang-check-decl.cpp @@ -16,6 +16,7 @@ #include "slang-ast-reflect.h" #include "slang-ast-synthesis.h" #include "slang-lookup.h" +#include "slang-parser.h" #include "slang-syntax.h" #include <limits> @@ -4558,6 +4559,7 @@ void SemanticsVisitor::addRequiredParamsToSynthesizedDecl( synMemberExpr->elementIndices.add((uint32_t)i); synMemberExpr->type = elementType; synMemberExpr->type.isLeftValue = paramType.isLeftValue; + synMemberExpr->checked = true; synArgs.add(synMemberExpr); } } @@ -8165,6 +8167,7 @@ SemanticsContext SemanticsDeclBodyVisitor::registerDifferentiableTypesForFunc( void SemanticsDeclBodyVisitor::visitFunctionDeclBase(FunctionDeclBase* decl) { auto newContext = registerDifferentiableTypesForFunc(decl); + decl->body = maybeParseStmt(decl->body, newContext); if (const auto body = decl->body) { checkStmt(decl->body, newContext); @@ -9472,6 +9475,28 @@ void SemanticsDeclBodyVisitor::visitAggTypeDecl(AggTypeDecl* aggTypeDecl) } } +Stmt* SemanticsVisitor::maybeParseStmt(Stmt* stmt, const SemanticsContext& context) +{ + if (auto unparsedStmt = as<UnparsedStmt>(stmt)) + { + // Parse the statement now, and check it. + SemanticsVisitor subVisitor(context); + TokenList tokenList; + tokenList.m_tokens = _Move(unparsedStmt->tokens); + return parseUnparsedStmt( + m_astBuilder, + &subVisitor, + getShared()->getTranslationUnitRequest(), + unparsedStmt->sourceLanguage, + unparsedStmt->isInVariadicGenerics, + tokenList, + getShared()->getSink(), + unparsedStmt->currentScope, + unparsedStmt->outerScope); + } + return stmt; +} + void SemanticsDeclHeaderVisitor::cloneModifiers(Decl* dest, Decl* src) { dest->modifiers = src->modifiers; @@ -11182,6 +11207,9 @@ static void _dispatchDeclCheckingVisitor(Decl* decl, DeclCheckState state, Seman { switch (state) { + case DeclCheckState::ReadyForParserLookup: + // We don't need to do anything to make a decl ready for parser lookup. + break; case DeclCheckState::ModifiersChecked: SemanticsDeclModifiersVisitor(shared).dispatch(decl); break; diff --git a/source/slang/slang-check-expr.cpp b/source/slang/slang-check-expr.cpp index 7726fd6c8..7d4fbdf4c 100644 --- a/source/slang/slang-check-expr.cpp +++ b/source/slang/slang-check-expr.cpp @@ -167,7 +167,7 @@ Expr* SemanticsVisitor::openExistential(Expr* expr, DeclRef<InterfaceDecl> inter openedValue->declRef = varDeclRef; openedValue->type = QualType(openedType); openedValue->originalExpr = expr; - + openedValue->checked = true; // The result of opening an existential is an l-value // if the original existential is an l-value. // @@ -226,6 +226,7 @@ Expr* SemanticsVisitor::maybeOpenRef(Expr* expr) openRef->innerExpr = expr; openRef->type.isLeftValue = (as<RefType>(exprType) != nullptr); openRef->type.type = refType->getValueType(); + openRef->checked = true; return openRef; } return expr; @@ -522,6 +523,7 @@ Expr* SemanticsVisitor::constructDerefExpr(Expr* base, QualType elementType, Sou derefExpr->loc = loc; derefExpr->base = base; derefExpr->type = QualType(elementType); + derefExpr->checked = true; if (as<PtrType>(base->type) || as<RefType>(base->type)) { @@ -3918,6 +3920,7 @@ Expr* SemanticsExprVisitor::visitAsTypeExpr(AsTypeExpr* expr) makeOptional->type = optType; makeOptional->value = castToSuperType; makeOptional->typeExpr = typeExpr.exp; + makeOptional->checked = true; return makeOptional; } @@ -4125,6 +4128,7 @@ Expr* SemanticsVisitor::CheckMatrixSwizzleExpr( swizExpr->loc = memberRefExpr->loc; swizExpr->base = memberRefExpr->baseExpression; swizExpr->memberOpLoc = memberRefExpr->memberOperatorLoc; + swizExpr->checked = true; // We can have up to 4 swizzles of two elements each MatrixCoord elementCoords[4]; diff --git a/source/slang/slang-check-impl.h b/source/slang/slang-check-impl.h index 3276b7534..e6520cfd3 100644 --- a/source/slang/slang-check-impl.h +++ b/source/slang/slang-check-impl.h @@ -2035,6 +2035,8 @@ public: void checkStmt(Stmt* stmt, SemanticsContext const& context); + Stmt* maybeParseStmt(Stmt* stmt, const SemanticsContext& context); + void getGenericParams( GenericDecl* decl, List<Decl*>& outParams, @@ -2877,14 +2879,8 @@ public: // deal with this cases here, even if they are no-ops. // -#define CASE(NAME) \ - Expr* visit##NAME(NAME* expr) \ - { \ - if (!getShared()->isInLanguageServer()) \ - SLANG_DIAGNOSE_UNEXPECTED(getSink(), expr, "should not appear in input syntax"); \ - expr->type = m_astBuilder->getErrorType(); \ - return expr; \ - } +#define CASE(NAME) \ + Expr* visit##NAME(NAME* expr) { return expr; } CASE(DerefExpr) CASE(MakeRefExpr) diff --git a/source/slang/slang-check-stmt.cpp b/source/slang/slang-check-stmt.cpp index 550993957..151c9324a 100644 --- a/source/slang/slang-check-stmt.cpp +++ b/source/slang/slang-check-stmt.cpp @@ -54,6 +54,8 @@ void SemanticsStmtVisitor::visitDeclStmt(DeclStmt* stmt) // that need to be recursively checked. // ensureDeclBase(stmt->decl, DeclCheckState::DefinitionChecked, this); + if (auto decl = as<Decl>(stmt->decl)) + decl->hiddenFromLookup = false; } void SemanticsStmtVisitor::visitBlockStmt(BlockStmt* stmt) @@ -66,14 +68,41 @@ void SemanticsStmtVisitor::visitBlockStmt(BlockStmt* stmt) if (as<AggTypeDeclBase>(decl)) ensureAllDeclsRec(decl, DeclCheckState::DefinitionChecked); } + + // Consider this code: + // ``` + // { + // int a = 5 + b; // should error. + // int b = 3; + // } + // + // ``` + // In order to detect the error trying to use `b` before it's declared within + // a block, our lookup logic contains a condition that ignores a decl if its + // `hiddenFromLookup` field is set to `true`. + // See _lookUpDirectAndTransparentMembers(). + // This field will be set to false when we reach the decl through the DeclStmt. + // + if (auto seqStmt = as<SeqStmt>(stmt->body)) + { + for (auto subStmt : seqStmt->stmts) + { + if (auto declStmt = as<DeclStmt>(subStmt)) + { + if (auto decl = as<Decl>(declStmt->decl)) + decl->hiddenFromLookup = true; + } + } + } } checkStmt(stmt->body); } void SemanticsStmtVisitor::visitSeqStmt(SeqStmt* stmt) { - for (auto ss : stmt->stmts) + for (auto& ss : stmt->stmts) { + ss = maybeParseStmt(ss, *this); checkStmt(ss); } } diff --git a/source/slang/slang-diagnostic-defs.h b/source/slang/slang-diagnostic-defs.h index 682980107..4d037b68e 100644 --- a/source/slang/slang-diagnostic-defs.h +++ b/source/slang/slang-diagnostic-defs.h @@ -456,6 +456,11 @@ DIAGNOSTIC( Error, unexpectedTokenExpectedComponentDefinition, "unexpected token '$0', only component definitions are allowed in a shader scope.") +DIAGNOSTIC( + 20005, + Error, + invalidEmptyParenthesisExpr, + "empty parenthesis '()' is not a valid expression.") DIAGNOSTIC(20008, Error, invalidOperator, "invalid operator '$0'.") DIAGNOSTIC(20011, Error, unexpectedColon, "unexpected ':'.") DIAGNOSTIC( diff --git a/source/slang/slang-language-server-document-symbols.cpp b/source/slang/slang-language-server-document-symbols.cpp index c13316a71..e8a041ca8 100644 --- a/source/slang/slang-language-server-document-symbols.cpp +++ b/source/slang/slang-language-server-document-symbols.cpp @@ -90,6 +90,11 @@ static SourceLoc _findClosingSourceLoc(Decl* decl) { return block->closingSourceLoc; } + else if (auto unparsedStmt = as<UnparsedStmt>(func->body)) + { + if (unparsedStmt->tokens.getCount()) + return unparsedStmt->tokens.getLast().getLoc(); + } else if (func->body) { return func->body->loc; diff --git a/source/slang/slang-lookup.cpp b/source/slang/slang-lookup.cpp index 72fbed581..9b9034c63 100644 --- a/source/slang/slang-lookup.cpp +++ b/source/slang/slang-lookup.cpp @@ -161,8 +161,8 @@ static void _lookUpMembersInValue( static bool _isUncheckedLocalVar(const Decl* decl) { auto checkStateExt = decl->checkState; - auto isUnchecked = - checkStateExt.getState() == DeclCheckState::Unchecked || checkStateExt.isBeingChecked(); + auto isUnchecked = checkStateExt.getState() == DeclCheckState::Unchecked || + checkStateExt.isBeingChecked() || decl->hiddenFromLookup; return isUnchecked && isLocalVar(decl); } diff --git a/source/slang/slang-parser.cpp b/source/slang/slang-parser.cpp index 0d7281e3f..342ac9e55 100644 --- a/source/slang/slang-parser.cpp +++ b/source/slang/slang-parser.cpp @@ -1,6 +1,7 @@ #include "slang-parser.h" #include "../core/slang-semantic-version.h" +#include "slang-check-impl.h" #include "slang-compiler.h" #include "slang-lookup-spirv.h" #include "slang-lookup.h" @@ -75,11 +76,18 @@ enum Precedence : int Postfix, }; +enum class ParsingStage +{ + Decl, + Body, +}; + struct ParserOptions { bool enableEffectAnnotations = false; bool allowGLSLInput = false; bool isInLanguageServer = false; + ParsingStage stage = ParsingStage::Body; CompilerOptionSet optionSet; }; @@ -91,6 +99,7 @@ public: NamePool* namePool; SourceLanguage sourceLanguage; ASTBuilder* astBuilder; + SemanticsVisitor* semanticsVisitor = nullptr; NamePool* getNamePool() { return namePool; } SourceLanguage getSourceLanguage() { return sourceLanguage; } @@ -156,6 +165,8 @@ public: resetLookupScope(); } + ParsingStage getStage() { return options.stage; } + ModuleDecl* getCurrentModuleDecl() { return currentModule; } Parser( @@ -1811,14 +1822,62 @@ static Stmt* parseOptBody(Parser* parser) if (peekTokenType(parser) == TokenType::LBrace) { parser->sink->diagnose(semiColonToken.loc, Diagnostics::unexpectedBodyAfterSemicolon); - return parser->parseBlockStatement(); + + // Fall through to parse the block stmt. + } + else + { + return nullptr; } - return nullptr; } - else + + if (parser->getStage() == ParsingStage::Decl) { - return parser->parseBlockStatement(); + // If we are at the initial parsing stage, just collect the tokens + // without actually parsing them. + if (peekTokenType(parser) != TokenType::LBrace) + { + return parser->parseBlockStatement(); + } + auto unparsedStmt = parser->astBuilder->create<UnparsedStmt>(); + unparsedStmt->currentScope = parser->currentScope; + unparsedStmt->outerScope = parser->outerScope; + unparsedStmt->sourceLanguage = parser->getSourceLanguage(); + unparsedStmt->isInVariadicGenerics = parser->isInVariadicGenerics; + parser->FillPosition(unparsedStmt); + List<Token>& tokens = unparsedStmt->tokens; + int braceDepth = 0; + for (;;) + { + auto token = parser->ReadToken(); + if (token.type == TokenType::EndOfFile) + { + break; + } + if (token.type == TokenType::LBrace) + { + braceDepth++; + } + else if (token.type == TokenType::RBrace) + { + braceDepth--; + } + tokens.add(token); + if (braceDepth == 0) + { + break; + } + } + Token eofToken; + eofToken.type = TokenType::EndOfFile; + eofToken.loc = parser->tokenReader.peekLoc(); + tokens.add(eofToken); + return unparsedStmt; } + + // If we are in the second stage of parsing, then we need to actually + // parse the block statement for real. + return parser->parseBlockStatement(); } /// Complete parsing of a function using traditional (C-like) declarator syntax @@ -1867,9 +1926,12 @@ static Decl* parseTraditionalFuncDecl(Parser* parser, DeclaratorInfo const& decl parser->PushScope(funcScope); decl->body = parseOptBody(parser); - if (auto block = as<BlockStmt>(decl->body)) + if (auto blockStmt = as<BlockStmt>(decl->body)) + decl->closingSourceLoc = blockStmt->closingSourceLoc; + else if (auto unparsedStmt = as<UnparsedStmt>(decl->body)) { - decl->closingSourceLoc = block->closingSourceLoc; + if (unparsedStmt->tokens.getCount()) + decl->closingSourceLoc = unparsedStmt->tokens.getLast().getLoc(); } parser->PopScope(); @@ -2311,15 +2373,79 @@ static bool isGenericName(Parser* parser, Name* name) return lookupResult.item.declRef.is<GenericDecl>(); } +enum class BaseGenericKind +{ + Unknown, + Generic, + NonGeneric, +}; + static Expr* tryParseGenericApp(Parser* parser, Expr* base) { Name* baseName = nullptr; - if (auto varExpr = as<VarExpr>(base)) - baseName = varExpr->name; - // if base is a known generics, parse as generics - if (baseName && isGenericName(parser, baseName)) + BaseGenericKind baseKind = BaseGenericKind::Unknown; + if (parser->semanticsVisitor) + { + // If we have access to a semantic visitor, we can check the base + // and see if it refers to a generic. + auto checkedBase = parser->semanticsVisitor->CheckTerm(base); + if (auto declRefExpr = as<DeclRefExpr>(checkedBase)) + { + if (declRefExpr->declRef.is<GenericDecl>()) + { + baseKind = BaseGenericKind::Generic; + } + else if ( + declRefExpr->declRef.is<FunctionDeclBase>() || + declRefExpr->declRef.is<AggTypeDeclBase>()) + { + // If declref is a function or type, even if it is not a generic, + // we should parse the `<` as a generic application for better error + // messages. This is because functions or types can never precede a + // `<` in valid Slang code, and it is more likely that the user assumed + // the function or type is generic by mistake. + // + baseKind = BaseGenericKind::Generic; + } + else + { + baseKind = BaseGenericKind::NonGeneric; + } + } + else if (auto overloadedExpr = as<OverloadedExpr>(checkedBase)) + { + baseKind = BaseGenericKind::NonGeneric; + for (auto candidate : overloadedExpr->lookupResult2) + { + if (candidate.declRef.is<GenericDecl>() || + declRefExpr->declRef.is<FunctionDeclBase>() || + declRefExpr->declRef.is<AggTypeDeclBase>()) + { + baseKind = BaseGenericKind::Generic; + break; + } + } + } + } + else + { + // Without a semantic visitor, we fallback to a more simplistic lookup + // and guessing. + if (auto varExpr = as<VarExpr>(base)) + baseName = varExpr->name; + // if base is a known generics, parse as generics + if (baseName && isGenericName(parser, baseName)) + baseKind = BaseGenericKind::Generic; + } + + // If base is known to be a generic, just parse as generic app. + if (baseKind == BaseGenericKind::Generic) return parseGenericApp(parser, base); + // If base is known to be non-generic, just return base. + if (baseKind == BaseGenericKind::NonGeneric) + return base; + // otherwise, we speculate as generics, and fallback to comparison when parsing failed TokenSpan tokenSpan; tokenSpan.m_begin = parser->tokenReader.m_cursor; @@ -3844,8 +3970,13 @@ static NodeBase* parseConstructorDecl(Parser* parser, void* /*userData*/) decl->body = parseOptBody(parser); - if (auto block = as<BlockStmt>(decl->body)) - decl->closingSourceLoc = block->closingSourceLoc; + if (auto blockStmt = as<BlockStmt>(decl->body)) + decl->closingSourceLoc = blockStmt->closingSourceLoc; + else if (auto unparsedStmt = as<UnparsedStmt>(decl->body)) + { + if (unparsedStmt->tokens.getCount()) + decl->closingSourceLoc = unparsedStmt->tokens.getLast().getLoc(); + } parser->PopScope(); return decl; @@ -3898,10 +4029,13 @@ static AccessorDecl* parseAccessorDecl(Parser* parser) if (parser->tokenReader.peekTokenType() == TokenType::LBrace) { - decl->body = parser->parseBlockStatement(); - if (auto block = as<BlockStmt>(decl->body)) + decl->body = parseOptBody(parser); + if (auto blockStmt = as<BlockStmt>(decl->body)) + decl->closingSourceLoc = blockStmt->closingSourceLoc; + else if (auto unparsedStmt = as<UnparsedStmt>(decl->body)) { - decl->closingSourceLoc = block->closingSourceLoc; + if (unparsedStmt->tokens.getCount()) + decl->closingSourceLoc = unparsedStmt->tokens.getLast().getLoc(); } } else @@ -4184,6 +4318,11 @@ static NodeBase* parseFuncDecl(Parser* parser, void* /*userData*/) decl->body = parseOptBody(parser); if (auto blockStmt = as<BlockStmt>(decl->body)) decl->closingSourceLoc = blockStmt->closingSourceLoc; + else if (auto unparsedStmt = as<UnparsedStmt>(decl->body)) + { + if (unparsedStmt->tokens.getCount()) + decl->closingSourceLoc = unparsedStmt->tokens.getLast().getLoc(); + } parser->PopScope(); return decl; }); @@ -4734,6 +4873,19 @@ static void CompleteDecl( AddMember(containerDecl, decl); } } + + if (parser->semanticsVisitor && parser->getStage() == ParsingStage::Body) + { + // When we are in a deferred parsing stage for function bodies, + // we will mark all local var decls as `ReadyForParserLookup` so they can + // be returned via lookup. + // Note that our lookup logic will ignore all unchecked decls, but during + // parsing we don't want to ignore them, so we mark them as `ReadyForParserLookup` + // here, which is a pseudo state that is only used during parsing. + // Before checking the decl in semantic checking, we will mark them back as + // `Unchecked`. + decl->checkState = DeclCheckState::ReadyForParserLookup; + } } static DeclBase* ParseDeclWithModifiers( @@ -5482,6 +5634,7 @@ Stmt* parseCompileTimeForStmt(Parser* parser) VarDecl* varDecl = parser->astBuilder->create<VarDecl>(); varDecl->nameAndLoc = varNameAndLoc; varDecl->loc = varNameAndLoc.loc; + varDecl->checkState = DeclCheckState::ReadyForParserLookup; stmt->varDecl = varDecl; @@ -5863,7 +6016,6 @@ DeclStmt* Parser::parseVarDeclrStatement(Modifiers modifiers) { sink->diagnose(decl->loc, Diagnostics::declNotAllowed, decl->astNodeType); } - return varDeclrStatement; } @@ -5902,6 +6054,11 @@ Stmt* Parser::parseIfLetStatement() tempVarDecl->nameAndLoc = NameLoc(getName(this, "$OptVar"), identifierToken.loc); tempVarDecl->initExpr = initExpr; AddMember(currentScope->containerDecl, tempVarDecl); + if (semanticsVisitor) + semanticsVisitor->ensureDecl( + (Decl*)tempVarDecl, + DeclCheckState::DefinitionChecked, + nullptr); DeclStmt* tmpVarDeclStmt = astBuilder->create<DeclStmt>(); FillPosition(tmpVarDeclStmt); @@ -6991,9 +7148,19 @@ static Expr* parseAtomicExpr(Parser* parser) // as an expression. Expr* base = nullptr; - if (!tryParseExpression(parser, base, TokenType::RParent)) + if (parser->LookAheadToken(TokenType::RParent)) + { + // We don't support empty parentheses `()` as a valid expression. + parser->diagnose(openParen, Diagnostics::invalidEmptyParenthesisExpr); + base = parser->astBuilder->create<IncompleteExpr>(); + base->type = parser->astBuilder->getErrorType(); + } + else { - base = parser->ParseType(); + if (!tryParseExpression(parser, base, TokenType::RParent)) + { + base = parser->ParseType(); + } } parser->ReadToken(TokenType::RParent); @@ -8086,6 +8253,7 @@ Expr* parseTermFromSourceFile( { ParserOptions options; options.allowGLSLInput = sourceLanguage == SourceLanguage::GLSL; + options.stage = ParsingStage::Body; Parser parser(astBuilder, tokens, sink, outerScope, options); parser.currentScope = outerScope; parser.namePool = namePool; @@ -8093,6 +8261,39 @@ Expr* parseTermFromSourceFile( return parser.ParseExpression(); } +Stmt* parseUnparsedStmt( + ASTBuilder* astBuilder, + SemanticsVisitor* semanticsVisitor, + TranslationUnitRequest* translationUnit, + SourceLanguage sourceLanguage, + bool isInVariadicGenerics, + TokenSpan const& tokens, + DiagnosticSink* sink, + Scope* currentScope, + Scope* outerScope) +{ + ParserOptions options = {}; + options.stage = ParsingStage::Body; + options.enableEffectAnnotations = translationUnit->compileRequest->optionSet.getBoolOption( + CompilerOptionName::EnableEffectAnnotations); + options.allowGLSLInput = + translationUnit->compileRequest->optionSet.getBoolOption(CompilerOptionName::AllowGLSL) || + sourceLanguage == SourceLanguage::GLSL; + options.isInLanguageServer = + translationUnit->compileRequest->getLinkage()->isInLanguageServer(); + options.optionSet = translationUnit->compileRequest->optionSet; + + Parser parser(astBuilder, tokens, sink, outerScope, options); + parser.currentScope = outerScope; + parser.namePool = translationUnit->getNamePool(); + parser.sourceLanguage = sourceLanguage; + parser.semanticsVisitor = semanticsVisitor; + parser.currentScope = parser.currentLookupScope = currentScope; + parser.currentModule = semanticsVisitor->getShared()->getModule()->getModuleDecl(); + parser.isInVariadicGenerics = isInVariadicGenerics; + return parser.parseBlockStatement(); +} + // Parse a source file into an existing translation unit void parseSourceFile( ASTBuilder* astBuilder, @@ -8104,6 +8305,7 @@ void parseSourceFile( ContainerDecl* parentDecl) { ParserOptions options = {}; + options.stage = ParsingStage::Decl; options.enableEffectAnnotations = translationUnit->compileRequest->optionSet.getBoolOption( CompilerOptionName::EnableEffectAnnotations); options.allowGLSLInput = diff --git a/source/slang/slang-parser.h b/source/slang/slang-parser.h index bef03c9fe..9f9f4972a 100644 --- a/source/slang/slang-parser.h +++ b/source/slang/slang-parser.h @@ -25,6 +25,19 @@ Expr* parseTermFromSourceFile( NamePool* namePool, SourceLanguage sourceLanguage); +struct SemanticsVisitor; + +Stmt* parseUnparsedStmt( + ASTBuilder* astBuilder, + SemanticsVisitor* semantics, + TranslationUnitRequest* translationUnit, + SourceLanguage sourceLanguage, + bool isInVariadicGenerics, + TokenSpan const& tokens, + DiagnosticSink* sink, + Scope* currentScope, + Scope* outerScope); + ModuleDecl* populateBaseLanguageModule(ASTBuilder* astBuilder, Scope* scope); /// Information used to set up SyntaxDecl. Such decls diff --git a/tests/bugs/gh-4150.slang b/tests/bugs/gh-4150.slang index 031d778a9..a6c9f7a0b 100644 --- a/tests/bugs/gh-4150.slang +++ b/tests/bugs/gh-4150.slang @@ -27,15 +27,15 @@ void main(uint3 pixel_i : SV_DispatchThreadID) output[0] = #ifdef ERROR1 // expect error: trying to specialize RWTex, which has two arguments, with only one argument. - // CHECK1:([[# @LINE+1]]): error 30075 + // CHECK1-DAG:([[# @LINE+1]]): error 30075 RWTex<float3>::get(p.image_id); #else RWTex<float, 3>::get(p.image_id); #endif - //CHECK1:([[# @LINE+1]]): error 30071 + //CHECK1-DAG:([[# @LINE+1]]): error 30071 static float sa1[]; - //CHECK1:([[# @LINE+1]]): error 30071 + //CHECK1-DAG:([[# @LINE+1]]): error 30071 float sa2[]; //CHECK1-NOT:([[# @LINE+1]]): error diff --git a/tests/front-end/generic-disambiguate-2.slang b/tests/front-end/generic-disambiguate-2.slang new file mode 100644 index 000000000..81fd9bc20 --- /dev/null +++ b/tests/front-end/generic-disambiguate-2.slang @@ -0,0 +1,28 @@ +//TEST:COMPARE_COMPUTE(filecheck-buffer=CHECK): -output-using-type + +static const uint X_SIZE = 32; + +uint y<int x> (int y){return 0;} + +uint test(uint x) +{ + uint y = 0; + // With two stage parsing, we should be able to disambiguate this + // from a generic function call. + uint surround_mask = x + + y<3?1:x>(X_SIZE) // generic call or comparison? + ? 5 : 0; + return surround_mask; +} + +//TEST_INPUT: set outputBuffer = out ubuffer(data=[0 0 0 0], stride=4) +RWStructuredBuffer<uint> outputBuffer; + +[numthreads(1,1,1)] +void computeMain() +{ + // CHECK: 1 + outputBuffer[0] = test(1); + // CHECK: 5 + outputBuffer[1] = test(33); +}
\ No newline at end of file |
