summaryrefslogtreecommitdiffstats
path: root/source/slang/lexer.cpp
Commit message (Collapse)AuthorAge
* Use slang- prefix on slang compiler and core source (#973)jsmall-nvidia2019-05-31
| | | | | | | | | | | | * Prefixing source files in source/slang with slang- * Prefix source in source/slang with slang- prefix. * Rename core source files with slang- prefix. * Update project files. * Fix problems from automatic merge.
* String/List closer to conventions, and use Index type (#959)jsmall-nvidia2019-04-29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * List made members m_ Tweaked types to closer match conventions. * Use asserts for checking conditions on List. Other small improvements. * List<T>.Count() -> getSize() * List<T> Add -> add First -> getFirst Last -> getLast RemoveLast -> removeLast ReleaseBuffer -> detachBuffer GetArrayView -> getArrayView * List<T>:: AddRange -> addRange Capacity -> getCapacity Insert -> insert InsertRange -> insertRange AddRange -> addRange RemoveRange -> removeRange RemoveAt -> removeAt Remove -> remove Reverse -> reverse FastRemove -> fastRemove FastRemoveAt -> fastRemoveAt Clear -> clear * List<T> FreeBuffer -> _deallocateBuffer Free -> clearAndDeallocate SwapWith -> swapWith * List<T> SetSize -> setSize Reserve -> reserve GrowToSize growToSize * UnsafeShrinkToSize -> unsafeShrinkToSize Compress -> compress FindLast -> findLastIndex FindLast -> findLastIndex Simplify Contains * List<T> Removed m_allocator (wasn't used) Swap -> swapElements Sort -> sort Contains -> contains ForEach -> forEach QuickSort -> quickSort InsertionSort -> insertionSort BinarySearch -> binarySearch Max -> calcMax Min -> calcMin * Initializer::Initialize -> initialize List<T>:: Allocate -> _allocate Init -> _init IndexOf -> indexOf * * Put #include <assert.h> in common.h, and remove unneeded inclusions * Small refactor of ArrayView - remove stride as not used * getSize -> getCount setSize -> setCount unsafeShrinkToSize->unsafeShrinkToCount growToSize -> growToCount m_size -> m_count * Some tidy up around Allocator. * Use Index type on List. * Refactor of IntSet. First tentative look at using Index. * Made Index an Int Did preliminary fixes. Made String use Index. * Partial refactor of String. * String::Buffer -> getBuffer ToWString -> toWString * Small improvements to String. String:: Buffer() -> getBuffer() Equals() -> equals * Try to use Index where appropriate. * Fix warnings on windows x86 builds.
* Allow generics to close with >>Yong He2019-02-05
|
* Feature/lex memory reduction (#762)jsmall-nvidia2018-12-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * Only do scrubbing if needed. When allocating content try to limit size (with scrubbing each token takes up 1k), now it's 16 bytes min size. * Don't allocate for every call to write on the CallbackWriter - use the m_appendBuffer. * Don't allocate memory for CallbackWriter use m_appendBuffer. * Use UnownedStringSlice for suffix output for parsing float/int literals. Fix typo in invalidFloatingPointLiteralSuffix * Using memory arena to hold tokens that are not in SourceManager. * Improve comment on lexing. * Make UnownedStringSlice allocation simpler on SourceManager. * Fix error on gcc around UnownedStringSlice - because VC converted string + UnownedStringSlice automatically into a String. * Fix generateName needing concat string for gcc. * When constructing a Token in parseAttributeName - because it's a Identifier, we have to set the Name. * Remove translation through String on getIntrinsicOp * Make func-cbuffer-param disablable with -exclude compatibility-issue * Move memory leak in render-test. * From review - can just use "?:" instead of performing a concat.
* Feature/source loc review (#672)jsmall-nvidia2018-10-12
| | | | | | | | | | | | | | * Fixes/improvements based around review comments. * SourceUnit -> SourceView * * Removed the HumaneSourceLoc as it's POD-like ness seemed to make that unnecessary * Made exposed member variables in SourceManager protected - so make clear where/how can be accesed * Improved description about SourceLoc and associated structures * Changed SourceLocType to 'Actual' and 'Nominal'. * Improved a comment.
* Feature/source loc refactor (#668)jsmall-nvidia2018-10-10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * * Remove the need for IRHighLevelDecoration in Emit * Use the IRLayoutDecoration for GeometryShaderPrimitiveTypeModifier * Initial look at at variable byte encoding, and simple unit test. * Fixing problems with comparison due to naming differences with slang/fxc. * * More tests and perf improvements for byte encoding. * Mechanism to detect processor and processor features in main slang header. * Split out cpu based defines into slang-cpu-defines.h so do not polute slang.h * Support for variable byte encoding on serialization. * Removed unused flag. * Fix warning. * Fix calcMsByte32 for 0 values without using intrinsic. * Fix a mistake in calculating maximum instruction size. * Introduced the idea of SourceUnit. * Small improvements around naming. Add more functionality - including getting the HumaneLoc. * Add support for #line default * Compiling with new SourceLoc handling. * Fix off by one on #line directives. * Can use 32bits for SourceLoc. Fix serialize to use that. * Small fixes and comment on usage. * Premake run. * Fix signed warning. * Fix typo on StringSlicePool::has found in review.
* Improve support for non-32-bit types. (#643)Tim Foley2018-09-20
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The main change here is to fill out the `BaseType` enumeration so that it covers the full range of 8/16/32/64-bit signed and unsigned integers, as well as 16/32/64-bit floating-point numbers, and then propagate that completion through various places in the code. More details: * The current `half`, `float`, `double`, `int`, and `uint` types are still the default names for their types, so things like `float16_t` and `int32_t` were added as `typedef`s. * We still need to generate the full gamut of vector/matrix `typedef`s for the new types, so that things like `float16_t4x3` will work (yes, I know that is ugly as sin, but that's the HLSL syntax...). * A few pieces of dead code from earlier in the compiler's life got removed, since I did a find-in-files for `BaseType::` and tried to either update or delete every site. * A few call sites that were enumerating integer base types in an ad-hoc fashion were changed to use a single `isIntegerBaseType()` function that I added in `check.cpp` * When compiling with dxc for shader model 6.2 and up, we enable the compiler's support for native 16-bit types via a flag. * The public API enumeration for reflection of scalar types added cases for 8- and 16-bit integers (it already exposed the other cases we need) * The lexer was updated to be extremely liberal in what kinds of suffixes it allows on literals. I also removed the logic that was treating, e.g., `0f` as a floating-point literal (it doesn't seem to be the right behavior). That would now be an integer literal with an invalid suffix. * The logic in the parser that applies types to literals was updated to handle a few more cases: `LL` and `ULL` for 64-bit integers, and `H` for 16-bit floats. * The mangling logic needed to be updated to handle the new cases, and I consolidated the handling of those types in their front-end and IR forms. * Removed the explicit `BasicExpressionType::ToString` logic, since all basic types are `DeclRefType`s in the front end, and we can just print them out as such. * As a bit of a gross hack, fudged the conversion costs so that `int` to `int64_t` conversion is a bit more costly. The problem there is that given an operation like `int(0) + uint(0)`, the best applicable candidates ended up being `+(uint,uint)` and `+(int64_t,int64_t)` because the cost of a single `int`-to-`uint` conversion was the same as the sum of the cost of an `int`-to-`int64_t` and a `uint`-to-`int64_t`. A better long-term fix here is to completely change our overload resolution strategy, but that is obviously way too big to squeeze into this change. * Type layout computation was updated to handle all the new types and give them their natural size/alignment. Note that this does *not* work for down-level HLSL where `half` is treated as a synonym for `float`. It also doesn't deal with the fact that many of these types aren't actually allowed in constant buffers for certain shader models. A future change should work to add error messages for unsupported stuff during type layout (or just make the types themselves require support for certain capabilities)
* Feature/attributed binding (#621)jsmall-nvidia2018-07-31
| | | | | | | | | | | | | | | * Typo fix, and added dxc to command line documentation. * Fix small typos. Added support for Scope to lexer. Fix bug in Token ctor. * Add support for attribute names that are scoped. * Added GLSLBindingAttribute. Make binding work through core.met.slang. * Allow [[gl::binding(binding, set)]] [[vk::binding(binding,set)]]
* Preprocessor cleanups (#484)Tim Foley2018-04-12
| | | | | | | | * For a `#error` or `#warning`, read the rest of the line as raw text to include in the error message * When skipping tokens (e.g., in an `#ifdef`d out block), don't emit errors on invalid characters * TODO: we could clearly get more efficient and skip whole raw lines in the future * Fix an issue when a macro invocation that expands to nothing (zero tokens) is the last thing before a directive. The preprocessor was returning the `#` as an ordinary token, because it has already gone past its test for directives.
* fixed last couple warnings under release/x64 build.Yong He2017-11-04
|
* Add an explicit `Name` typeTim Foley2017-08-14
| | | | | | | | | | | | | Fixes #23 Up to this point, the compiler has used the ordinary `String` type to represent declaration names, which means a bunch of lookup structures throughout the compiler were string-to-whatever maps, which can reduce efficiency. It also means that things like the `Token` type end up carying a `String` by value and paying for things like reference-counting. This change adds a `Name` type that is used to represent names of variables, types, macros, etc. Names are cached and unique'd globally for a session, and the string-to-name mapping gets done during lexing. From that point on, most mapping is from pointers, which should make all the various table lookups faster. More importantly (possibly), this brings us one step closer to being able to pool-allocate the AST nodes.
* Look up declaration keywords using ordinary scoping.Tim Foley2017-08-11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | The existing parser code was doing string-based matching on the lookahead token to figure out how to parse a declaration, e.g.: ``` if(lookAhead == "struct") { /* do struct thing */ } else if(lookAhead == "interface") { /* do interface thing * } ... ``` That approach has some annoying down-sides: - It is slower than it needs to be - It is annoying to deal with cases where the available declaration keywords might differ by language - Most importantly, it is not possible for us to introduce "extended" keywords that the user can make use of, but which can be ignored by the user and treated as an ordinary identifier. That last part is important. Suppose the user wanted to have a local variable named `import`, but we also had a Slang extension that added an `import` keyword. Then a line of code like `import += 1` would lead to a failure because we'd try to parse an import declaration, even when it is obvious that the user meant their local variable. This would mean that Slang can't parse existing user code that might clash with syntax extensions. This issue is the reason why we currently have keywords like `__import`. A traditional solution in a compiler is to map keywords to distinct token codes as part of lexing, which eliminates the first conern (performance) because now we can dispatch with `switch`. It can also aleviate the second concern if we add/remove names from the string->code mapping based on language (the rest of the parsing logic doesn't have to know about keywords being added/removed). The solution we go for here is more aggressive. Instead of mapping keyword names to special token codes during lexing, we instead introduce logical "syntax declarations" into the AST, which are looked up using the ordinary scoping rules of the language. Depending on what code is imported into the scope where parsing is going on, different keywords may then be visible. This solves our last concern, since a user-defined variable that just happens to use the same name as a keyword is now allowed to shadow the imported declaration for syntax (this is akin to, e.g., Scheme where there really aren't any "keywords"). This also opens the door to the possibility of eventually allowing user to define their own syntax (again, like Scheme). For now I'm only using this for the declaration keywords. With this change it should be pretty easy to also add statement keywords in the same fashion.
* Make source location lightweightTim Foley2017-08-10
| | | | | | | | | | | | | | | | Fixes #24 So far the code has used a representation for source locations that is heavy-weight, but typical of research or hobby compilers: a `struct` type containing a line number and a (heap-allocated) string. This is actually very convenient for debugging, but it means that any data structure that might contain a source location needs careful memory management (because of those strings) and has a tendency to bloat. The new represnetation is that a source location is just a pointer-sized integer. In the simplest mental model, you can think of this as just counting every byte of source text that is passed in, and using those to name locations. Finding the path and line number that corresponds to a location involves a lookup step, but we can arrange to store all the files in an array sorted by their start locations, and do a binary search. Finding line numbers inside a file is similarly fast (one you pay a one-time cost to build an array of starting offsets for lines). More advanced compilers like clang actually go further and create a unique range of source locations to represent a file each time it gets included, so that they can track the include stack and reproduce it in diagnostic messages. I'm not doing anything that clever here.
* Major naming overhaul:Tim Foley2017-08-09
| | | | | | | | | | - `ExpressionSyntaxNode` becomes `Expr` - `StatementSyntaxNode` becomes `Stmt` - `StructSyntaxNode` becomes `StructDecl` - `ProgramSyntaxNode` becomes `ModuleDecl` - `ExpressionType` becomes `Type` - Existing fields names `Type` become `type` - There might be some collateral damage here if there were, e.g., `enum`s named `Type`, but I can live with that for now and fix those up as a I see them
* Try to improve handling of failures during compilationTim Foley2017-07-19
| | | | | | | The change is mostly about trying to make sure the compiler "fails safe" when it encounters an internal assumption that isn't met. Most internal errors will now throw exceptions (yes, exceptions are evil, but this will work for now), and these get caught in `spCompile` so that they don't propagate to the user (they just see a message that compilation aborted due to an internal error). Subsequent changes are going to need to work on diagnosing as many of these situations as possible, so that users can at least know what construct in their code was unexpected or unhandled by the compiler.
* Fully parse function bodies, even in "rewriter" modeTim Foley2017-07-08
| | | | | | | | | | | This is in anticipation of needing to have more complete knowledge to be able to handle user code that `import`s library functionality. The big picture of this change is just to remove the `UnparsedStmt` class that was used to hold the bodies of user functions as opaque token streams, and thus to let the full parser and compiler loose on that code. That is the easy part, of course, and the hard part is all the fixes that this requires in the rest of the compielr to make this even remotely work. Subsequent commit address a lot of other issues, so this particular commit mostly represents work-in-progress. One detail is that this change puts a conditional around nearly every diagnostic message in `check.cpp` to suppress thing when in rewriter mode. I have yet to check how that works out if there are errors in anything we actually need to understand for the purposes of generating reflection data.
* Overhaul `RefPtr` and `String`Tim Foley2017-06-29
| | | | | | | | | | | | | | | | | | | | | - `RefPtr` no longer tries to have distinct cases for interal-vs-external reference counts. Instead we always require an internal reference count. - Types the used `RefPtr` but weren't `RefObject` were made to inherit `RefObject` - The `ReferenceCounted` base class was removed, so that only `RefObject` remains - Implicit conversion from `RefPtr<T>` to `T*` added - This created some complicates for other types that relied on implicit conversions, so this isn't a net cleanup right now - The main type that got messed up by the above was `String`, which previously held a `RefPtr<char, ...>`. This change thus *also* includes a major overhaul of `String`: - `String` now holds all its data via indirection, using a `StringRepresentation` that is a `RefObject`. This object holds a length, capacity, and directly stores the character data in its allocation. This means that `sizeof(String)==sizeof(void*)` - It is now possible to directly mutate a `String` by appending to its representation (we just need to ensure it has a reference count of `1`, possibly by cloning it). This means that `StringBuilder` is now basically just an idomatic use of `String` - A couple operations that just return sub-ranges of a `String` now return `StringSlice` to avoid allocation when it isn't needed. This required more work. - Indices into strings changed from `int` to `UInt` (which is pointer-sized). This had a bunch of follow-on changes because the value `-1` sometimes needs to be special-cased in code that uses indices. Further cleanups are probably needed here.
* Actually respect suffixes on numeric literals.Tim Foley2017-06-28
| | | | | | | | | | | | | | | | | - Add logic to extract the value and suffix from a numeric literal - This duplicates some of the lexing logic, but this is hard to avoid without redundant runtime work - Note that I'm not using and stdlib string-to-number code. This should be more robust once it is working, but it is obviously error prone in the near term. The main up-sides to this are: - We can handle binary integer literals - We can handle hexadecimal floating-point literals without stdlib support - We can hypothetically support digit separators, if we ever wanted - The parser looks at the suffix characters sliced off by the lexer, and tries to pick a type to use for a literal - It uses `NULL` if there is no suffix, to avoid some nasty order dependencies where the stdlib might need to parse a number before it has seen the definition of `int` - Right now I only handle a few cases, so there may be bugs lurking here - The emit logic needs to handle the fact that a literal node in the AST might have a non-default type attached. - Right now I just quickly check for the most likely types, and emit the literal with a matching suffix. This doesn't seem robust if any source language supports a suffix for a type where a target has no corresponding suffix. In the long term some amount of casting is probably required.
* Rename literal tokens.Tim Foley2017-06-28
| | | | | These had a typo (`Literial`), so they needed a fix eventually. I also went ahead and made things a bit more verbose (`IntegerLiteral`, `FloatingPointLiteral`) because these names don't get used often enough for the brevity to pay off.
* Bug fix for newline escaping.Tim Foley2017-06-19
| | | | | | | | | | | | | The previous changes had left out logic for "scrubbing" a token value that includes an escaped newline, because I expected it would only occur within whitespace. Unfortunately, some user code looked like this: ``` a + b ``` That is, there was a token at the very start of the line, after the escaped newline. As a result, after consuming the leading whitespace (which didn't end up consuming the escaped newline - but we could consider making it do so in future), the lexer started to lex a token that *starts* with an escaped newline, but turns out to be an identifer (which gets an invalid name). This change adds some ad-hoc code to "scrub" the value of *every* token, which wasteful but at least solves the problem.
* Rename `Slang::Compiler` -> `Slang`Tim Foley2017-06-15
| | | | This gets rid of one unecessary namespace.
* Fixups for newline-escaping behavior.Tim Foley2017-06-13
| | | | This is really messy and I'm not entirely happy with the result, but at least we handle a couple more corner cases.
* Lexer: handle escaped newlinesTim Foley2017-06-12
| | | | | | | | | | | | | | | This is mostly to allow for the idiomatic style of defining a multi-line macro in C: #define FOO(a,b) \ x(a) \ y(b) \ /* end */ The handling is reasonably general: in the lexer whenever we need to consume or "peek" the next code point, we check if we are at the start of a backslash-newline sequence, and if so we skip past that to find what we were looking for. However, the way I'm handling things right now there is no step taken to "clean" a token and remove the backslash or newline from its value, so downstream code that actually inspects token values will probably break if users start putting escaped newlines in the middle of names. We can fix that issue when (if) it comes up.
* Initial import of code.Tim Foley2017-06-09