This article is within the scope of WikiProject Computer science, a collaborative effort to improve the coverage of Computer science related articles on Wikipedia. If you would like to participate, please visit the project page, where you can join the discussion and see a list of open tasks.
Added a disadvantages section and edited the advantages. This article needs to be balanced: scannerless parsing is a technique that makes sense in limited circumstances, usually when the language being parsed is very simple. Remember, there is a reason the lexer/parser distinction was made in the first place.
In particular:
lexer/parser distinction not necessary: actually, yes it is, depending on what your needs are. As I said above, there was a very good reason the distinction was developed; combining the two functions (scanning and parsing) becomes messy with more complex languages, and harder to understand, develop, and maintain. Moved this into the introduction and changed the wording to explain when this technique is appropriate (as opposed to implying it's a universal truth).
no keywordification: keywords are often included as a feature, and having a separate lexer and parser doesn't mean you have to have keywords; scannerless parsing can do without them simply because it has fewer of the design constraints that make keywords attractive to implement in the first place. Also, many people would rightly consider keywords a feature, not a requirement; look up the early Fortran days to get an inkling why. As such, moved this information to the 'token classification not required' advantage.
I've been observing this article for a while, and I've been dismayed at how poor the article still is. It contains a number of factual mistakes and does not really explain anything. I'm reluctant to improve the article myself though, because I'm one of the researchers publishing on the merits of scannerless parsing. I have a few problems with the article:
There is no decent explanation of the scanner/parsing process. This article should explain how, in a traditional scanner/parser division, a scanner splits up a character stream into tokens, and how the parser consumes the tokens (a small sketch of this follows after this list).
The article does not give any actual examples of cases where scannerless parsing is useful. The current list of applications is not correct. In fact, scannerless parsing is mostly useful for languages with a complex, context-sensitive lexical syntax. Typically, these are languages that involve a mixture of different sublanguages. We've published a series of papers on this: "Concrete Syntax for Objects" (OOPSLA'04) and "Declarative, Formal, and Extensible Syntax Definition for AspectJ - A Case for Scannerless Generalized-LR Parsing". The second paper in particular illustrates how the traditional scanner/parser separation breaks down on languages with a complex context-sensitive lexical syntax.
The 'required extensions' section is largely focussed on language extensions in SDF/SGLR. Some of these extensions are not related to scannerless parsing at all. In particular: preference attributes (more an aspect of GLR) and per-production transitions (related to the priorities mechanism, which is unrelated to scannerless parsing).
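Regarding the first point above: a minimal sketch of what the traditional division looks like, using a toy expression language with made-up token names (purely illustrative, not taken from the article or the papers cited). The scanner turns the character stream into tokens, and the parser only ever sees tokens, never raw characters.

```python
import re

# A toy scanner: splits a character stream into (kind, lexeme) tokens.
# Token kinds and the tiny expression language are illustrative only.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{k}>{p})" for k, p in TOKEN_SPEC))

def scan(text):
    """Yield (kind, lexeme) pairs; whitespace is discarded here, not in the parser."""
    for m in TOKEN_RE.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())
    yield ("EOF", "")

# A toy recursive-descent parser: consumes tokens produced by the scanner.
def parse(tokens):
    toks = list(tokens)
    pos = 0

    def peek():
        return toks[pos][0]

    def eat(kind):
        nonlocal pos
        assert toks[pos][0] == kind, f"expected {kind}, got {toks[pos]}"
        tok = toks[pos]
        pos += 1
        return tok

    def expr():               # expr := term ('+' term)*
        node = term()
        while peek() == "PLUS":
            eat("PLUS")
            node = ("+", node, term())
        return node

    def term():               # term := factor ('*' factor)*
        node = factor()
        while peek() == "TIMES":
            eat("TIMES")
            node = ("*", node, factor())
        return node

    def factor():             # factor := NUMBER | '(' expr ')'
        if peek() == "NUMBER":
            return int(eat("NUMBER")[1])
        eat("LPAREN")
        node = expr()
        eat("RPAREN")
        return node

    tree = expr()
    eat("EOF")
    return tree

print(parse(scan("1 + 2 * (3 + 4)")))   # ('+', 1, ('*', 2, ('+', 3, 4)))
```

In a scannerless parser, by contrast, the grammar's productions go all the way down to individual characters, so there is no separate tokenization pass like the one above.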
I also think this article contains many errors. For example, the advantage 'Grammars are compositional' is not related to scannerless parsers: it is theoretically possible to have an LL(1) parser that is scannerless, and LL(1) is not closed under composition. The third page of this paper describes it: http://www.springerlink.com/content/xugat38tyrxvtm9w/. So compositional grammars seem to be a feature of generalized parsers rather than of scannerless ones. The two are related only in that most scannerless parsers are generalized.
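To make the composition point concrete, here is a tiny made-up illustration (my own example, not the one from the linked paper): two grammars that are each LL(1) on their own, but whose naive union has two S-productions beginning with the same terminal, so one token of lookahead no longer suffices. In this trivial case left-factoring would repair it, but it shows why LL(1) is not closed under composition.

```python
# Two toy grammars, each LL(1) on its own (illustrative only).
g1 = {"S": [["a", "B"]], "B": [["b"]]}
g2 = {"S": [["a", "C"]], "C": [["c"]]}

# Naive composition: take the union of the productions for each nonterminal.
composed = {nt: g1.get(nt, []) + g2.get(nt, []) for nt in set(g1) | set(g2)}

# With one token of lookahead, a predictive parser must choose an S-production
# from the next terminal alone; here both S-alternatives start with 'a'.
first_terminals = [alt[0] for alt in composed["S"]]
print(composed["S"])        # [['a', 'B'], ['a', 'C']]
print(first_terminals)      # ['a', 'a']  -> FIRST sets overlap, so the composed grammar is not LL(1)
```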
Seems to make sense. Unfortunately I don't understand it well enough to edit the footnote so that it makes real sense; can you do that? Mbvlist (talk) 10:10, 14 July 2009 (UTC)