Stories
Slash Boxes
Comments

Dev.SN ♥ developers

Submission Preview

Link to Story

Testing Parsing_C_Is_Literally_Undecidable_2013.htm

Accepted submission by SNAPI_Test at 2019-10-06 13:45:01
Software

Title: Parsing C++ Is Literally Undecidable (2013)

--- --- --- --- Entire Story Below - Must Be Edited --- --- --- --- --- --- ---

Arthur T Knackerbracket has found the following story [reverberate.org]:

Many programmers are aware that C++ templates are Turing-complete, and this was proved in the 2003 paper C++ Templates are Turing Complete [port70.net].

However, there is an even stronger result that many people are not aware of. The C++ FQA has a section showing that parsing C++ is undecidable [yosefk.com], but many people have misinterpreted the full implications of this (understandable, since the FQA is discussing several issues over the course of its questions and does not make explicit the undecidability proof).

Some people misinterpret this statement to simply mean that fully compiling a C++ program is undecidable, or that showing the program valid is undecidable. This line of thinking presumes that constructing a parse tree is decidable, but only further stages of the compiler such as template instantiation are undecidable.

For example, see this (incorrect, but top-voted) Stack Overflow answer [stackoverflow.com] to the question What do people mean when they say C++ has “undecidable grammar”? [stackoverflow.com] This answer errs when it says: “Note this has nothing to do with the ambiguity of the C++ grammar.”

In fact, simply producing a parse tree for a C++ program is undecidable, because producing a parse tree can require arbitrary template instantiation. I will demonstrate this with a short program, which is a simplification/adaptation of what is in the FQA link above.

The parse tree for this program depends on whether TuringMachine::output is SomeType or not. If it is SomeType then ::name is an integer and the parse tree for the program is multiplying two integers and throwing away the result. If it is not SomeType, then ::name is a typedef for int and the parse tree is declaring a pointer-to-int named x. These two are completely different parse trees, and the difference between them cannot be delayed to further stages of the compiler.

The parse tree itself depends on arbitrary template instantiation, and is therefore the parsing step is undecidable.

In practice, compilers limit template instantiation depth, so this is more of a theoretical problem than a practical one. But it is still a deep and significant result if you are ever planning on writing a C++ parser.

Parsing, performance, and low-level programming.


Original Submission