What do people mean when they say C++ has "undecidable grammar"?

Posted by 哑剧 on 2021-11-29 · 104 words · 648 views · 5 replies

What do people mean when they say this? What are the implications for programmers and compilers?


-旧情勿念。 2022-06-07 reply #5

The implication for those of us using the language is that the error messages can get very weird, very fast. In practice this isn't such a big deal: errors coming out of the STL are usually worse than anything you end up with because of the language grammar.

The implication for those who write the compilers is that they have to spend a lot of extra time and effort on parsing the language.

逆蝶 2022-06-07 reply #4

'Undecidable grammar' is a very poor choice of words. A truly undecidable grammar is one for which no parser exists that terminates on all possible inputs. What they likely mean is that the C++ grammar is not context-free, but even this is somewhat a matter of taste: where do you draw the line between syntax and semantics? Any compiler accepts only a proper subset of the programs that pass the parser stage without syntax errors, and only a proper subset of those actually run without errors, so no language is truly context-free or even decidable (barring perhaps some esoteric languages).

溺深海 2022-06-07 reply #3

If "some people" includes Yossi Kreinin, then based on what he writes here ...

Consider this example:

x * y(z);

in two different contexts:

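int main() {
    int x, y(int), z;   // x and z are ints, y is a function taking an int
    x * y(z);           // expression: multiply x by the result of y(z)
}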

int main() {
    struct x { x(int) {} } *z;   // x is a type, z is an uninitialized x*
    x * y(z);                    // declaration: y is an x* initialized from z
}

... he means "You cannot decide by looking at x * y(z) whether it is an expression or a declaration." In the first case, it means "call function y with argument z, then invoke operator*(int, int) with x and the return value of the function call, and finally discard the result." In the second case, it means "y is a pointer to a struct x, initialized to point to the same (garbage & time-bomb) address as does z."
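To make the two readings checkable, here is a minimal sketch of the same statement in both contexts. The global `calls` counter, the definition of `y`, the initial values, and the names `expression_context` and `declaration_context` are my own additions for illustration; they are not from Kreinin's text. The pointer is initialized to `nullptr` here so the sketch avoids the garbage-address problem mentioned above.

```cpp
#include <type_traits>

int calls = 0;                       // counts how often the function y runs
int y(int v) { ++calls; return v; }  // hypothetical definition so the sketch links

// Context 1: x and z are ints and y is a function, so the statement is an expression.
int expression_context() {
    int x = 2, y(int), z = 5;  // y(int) redeclares the global function y
    x * y(z);                  // calls y(z), multiplies by x, discards the result
    return calls;              // observable side effect of the call
}

// Context 2: x names a type, so the very same tokens form a declaration.
bool declaration_context() {
    struct x { x(int) {} } *z = nullptr;  // z is an x* (null instead of garbage)
    x * y(z);                             // y is a local x* initialized from z
    return std::is_same<decltype(y), x*>::value && y == nullptr;
}
```

Compilers will typically warn that the result of the expression in the first function is unused, which is exactly the "discard the result" part of the description.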

Say you had a fit of COBOLmania and added DECLARE to the language. Then the second would become

int main() {
    DECLARE struct x { x(int) {} } *z;
    DECLARE x * y(z);
}

and the statement would become decidable: the DECLARE keyword alone tells the parser it is looking at a declaration. Note that being decidable does not make the pointer-to-garbage problem go away.

叫思念不要吵 2022-06-07 reply #2

What it probably means is that the C++ grammar is syntactically ambiguous: you can write down some code that could mean different things, depending on context. (The grammar is a description of the syntax of a language. It's what determines that a + b is an addition operation, involving variables a and b.)

For example, foo bar(int(x));, as written, could be a declaration of a variable called bar, of type foo, with int(x) being an initializer. It could also be a declaration of a function called bar, taking an int, and returning a foo. The language rules settle this in favour of the function declaration, but that rule is part of the language definition, not of the grammar.
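A small sketch of how a compiler actually treats `foo bar(int(x));`. The struct `foo` and the two function names are hypothetical, chosen only to make the example compile; the brace form in the second function is one common way to force the variable reading and assumes C++11 or later.

```cpp
#include <type_traits>

// Hypothetical type, only here so the statement from the text compiles.
struct foo {
    int value;
    explicit foo(int v) : value(v) {}
};

bool bar_is_function() {
    double x = 3.7;
    (void)x;          // silence the unused-variable warning; x is not used below
    foo bar(int(x));  // read as "foo bar(int x);", a function declaration
    return std::is_function<decltype(bar)>::value;
}

int bar_is_object() {
    double x = 3.7;
    foo bar{int(x)};   // braces cannot start a parameter list, so bar is an object
    return bar.value;  // int(3.7) truncates to 3
}
```

In the first function the `x` inside the parentheses is a parameter name and has nothing to do with the local `double x`, which is why so many people are surprised by this parse.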

The grammar of a programming language is important. First, it's a way to understand the language; second, parsing is a part of compilation that can be made fast when the grammar is unambiguous. C++ compilers are therefore harder to write and slower to use than they would be if C++ had an unambiguous grammar. It is also easier to introduce certain classes of bugs, although a good compiler will provide enough clues.

知你几分 2022-06-07 reply #1

This is related to the fact that C++'s template system is Turing complete. This means (theoretically) that you can compute anything at compile time with templates that you could compute using any other Turing complete language or system.
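As a tiny illustration of computing with templates, here is the classic compile-time factorial. It is only a sketch of the style (recursion through instantiation, a specialization as the base case) and not a proof of Turing completeness; the names `Factorial` and `factorial_of_10` are mine.

```cpp
// Recursive case: instantiating Factorial<N> forces Factorial<N - 1>.
template <unsigned N>
struct Factorial {
    static const unsigned long long value = N * Factorial<N - 1>::value;
};

// Base case: an explicit specialization stops the recursion.
template <>
struct Factorial<0> {
    static const unsigned long long value = 1;
};

// The compiler has to evaluate the whole chain before it can accept this line.
static_assert(Factorial<10>::value == 3628800ULL, "10! is computed at compile time");

// Hypothetical helper so the value can also be inspected at run time.
unsigned long long factorial_of_10() { return Factorial<10>::value; }
```

Leaving out the `Factorial<0>` specialization makes the instantiation chain wrap around and never end on its own; real compilers give up at an implementation-defined depth limit rather than loop forever.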

This has the side effect that some apparently valid C++ programs cannot be compiled; the compiler will never be able to decide whether the program is valid or not. If the compiler could decide the validity of all programs, it would be able to solve the Halting problem.

Note this has nothing to do with the ambiguity of the C++ grammar.

Edit: Josh Haberman pointed out in the comments below and in a blog post with an excellent example that constructing a parse tree for C++ actually is undecidable. Due to the ambiguity of the grammar, it's impossible to separate syntax analysis from semantic analysis, and since semantic analysis is undecidable, so is syntax analysis.
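The following sketch is in the spirit of that argument, though it is not Josh Haberman's actual code: the names `Pick`, `IsPrime`, `HasDivisor` and the two functions are my own, and a small primality test stands in for an arbitrary template computation. Whether `... ::name * p;` is a declaration or an expression depends on the result of that computation, so the parser cannot build the tree without running it.

```cpp
#include <type_traits>

int p = 3;  // global p, used when the statement below parses as an expression

// Primary template: name is a type.
template <bool B> struct Pick { typedef int name; };
// Specialization for false: name is a variable.
template <> struct Pick<false> { static int name; };
int Pick<false>::name = 2;

// Stand-in for an arbitrary compile-time computation (intended for N >= 2).
template <unsigned N, unsigned D>
struct HasDivisor {
    static const bool value = (N % D == 0) || HasDivisor<N, D - 1>::value;
};
template <unsigned N>
struct HasDivisor<N, 1> { static const bool value = false; };
template <unsigned N>
struct IsPrime { static const bool value = !HasDivisor<N, N - 1>::value; };

bool parses_as_declaration() {
    Pick<IsPrime<7>::value>::name * p;  // 7 is prime: "int * p;", a local pointer
    return std::is_pointer<decltype(p)>::value;
}

bool parses_as_expression() {
    Pick<IsPrime<8>::value>::name * p;  // 8 is not prime: 2 * 3, result discarded
    return std::is_pointer<decltype(p)>::value;  // p is still the global int
}
```

Swap the primality test for a template program that may or may not terminate and the parser inherits the halting problem, which is the point of the edit above.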

See also (links from Josh's post):