Undefined Behaviour in Code

June 12, 2021 5-minute read

Marcus

C++ • JavaScript

Recently, while browsing the ever-interesting Internet, I came across an interesting snippet of code. The snippet was written in C/C++, but if one were to port it to an interpreted language like JavaScript, the output would be different.

Like many people, I was initially confused about the difference between compiled and interpreted programming languages, and I still am, but this article really helped me understand it better.

Comparison between C++ and JS snippet

Consider the following snippet of C++ code:

#include <iostream>
using namespace std;

int main() {
  int a = 1, b;
  a++;
  b = ++a + a++ + ++a;
  cout << b << endl; // Output is 12 (on the compiler I used)
}

Then, consider the following snippet written in JS:

let a = 1, b
a++
b = ++a + a++ + ++a
console.log(b) // Output is 11

JavaScript snippet and my interpretation

In JavaScript, the documentation is clear about the increment (++) operator. Postfix increment (i.e. a++) still increments the value, but returns the value from before the increment, while prefix increment (++a) increments the value and returns the value after the increment.

Therefore, by tracing the JavaScript code snippet from left to right, we get the following:

let a = 1, b
a++ // value of a is 2
b = ++a + a++ + ++a // taken as: b = 3 + 3 + 5
console.log(b) // Output is 11 = 3 + 3 + 5

This might be confusing, but after digging around, I was able to come up with the following explanation:

In line 2, a is incremented to 2.
In line 3, the first ++a increments a to 3, and contributes 3 to the expression.
In line 3, a++ contributes the current value of a, which is 3, and only then increments a to 4.
In line 3, the final ++a increments a to 5, and contributes 5 to the expression.

C++ snippet and my interpretation

In C++, however, the increment operator is not as straightforward. After digging around Stack Overflow for a while and watching a tutorial in Hindi, I managed to trace the C++ code, using the steps in the tutorial.

#include <iostream>
using namespace std;

int main() {
  int a = 1, b;
  a++; // value of a is 2, same as JS
  b = ++a + a++ + ++a; // taken as: b = 4 + 3 + 5
  cout << b << endl; // Output is 12
}

This code trace is more confusing than the one for JavaScript:

The a++ statement increments a to 2.
The assignment is grouped as b = (++a + a++) + ++a, since + associates from left to right.
The first ++a increments a to 3, and a++ registers the second a’s value as 3.
After registering the second a’s value, a++ then increments a to 4.
The first ++a refers to a itself rather than to a saved copy, so by the time its value is read, it registers the first a’s value as 4.
Finally, the last ++a increments a to 5, and registers it, giving (4 + 3) + 5 = 12.

Undefined Behaviour in C/C++?

The tracing method prescribed by the tutorial gives a sense of comfort, since it lets one successfully trace and solve the C/C++ code, at least on paper.

However, I don’t really believe that the previous code trace is how the code was meant to be read. In fact, according to most of the C++ community, b = ++a + a++ + ++a; bears no sensible meaning. Because its behaviour is undefined, different compilers are free to compile it in different ways. Since the line is syntactically valid, a compiler will usually accept it and give some output, like 12, although many can warn about it (GCC’s -Wsequence-point and Clang’s -Wunsequenced, for example). In fact, when I placed brackets to get b = ++a + (a++ + ++a);, the output given was 13, and I could not apply the code tracing technique above to reach any reasonable conclusion.

Why is this code considered undefined behaviour?

The order in which ++a, a++ and ++a are evaluated is unspecified, and since each one both reads and modifies a, the input of each evaluation depends on which one ran first. On top of that, the assignment to b contains multiple side effects on a, since the increment operator is applied to a three times with nothing to order those modifications. Together, these make the behaviour undefined.

This undefined behaviour is documented, and our C++ code snippet commits both violations that lead to it. Firstly, “a side effect on a scalar object is unsequenced relative to another side effect on the same scalar object”: a++ and ++a both modify a within the same expression, and nothing orders those modifications. Secondly, “a side effect on a scalar object is unsequenced relative to a value computation using the value of the same scalar object”: each operand of + reads a to produce its value while a neighbouring operand is modifying a, and nothing orders the read against the write. Therefore, the output of the code becomes compiler dependent, and may vary.

Why does there exist undefined behaviour?

This is probably a legacy question, but I would think that because hardware differed so much between machines, it was tough to create a programming language and compilers that behaved identically everywhere. Bear in mind that computers are unable to make sense of words and letters, and they perform based on the 1s and 0s provided. As such, the responsibility of writing sensible code fell on the programmers.

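However, as C++ has improved over the years, people have come together to agree on certain behaviour. With C++17, some previously undefined behaviour became well defined: for example, the right-hand side of an assignment is now fully evaluated before the left-hand side. The operands of + are still unsequenced, though, so our snippet remains undefined.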

Conclusion

Moral of the story: Write Good Code! This is a reminder to myself that coding is about making sense of the senseless 1s and 0s in machine code. By writing a few extra characters and lines every now and then, like breaking the expression up into separate statements such as a += 1, code becomes more readable even without documentation. These snippets of code probably will not and should not exist anywhere, other than forums and blog posts in the name of learning and fun. After all, imagine having to sit down and spend a few hours trying to decipher a line of code just because someone wanted to “save some time”.
