Abstract Machine
The first presentation at CppCon 2020 was given by Bob Steagall. He gave an overview of the abstract machine in terms of which C++ is defined. Here is my summary of the presentation, peppered with some complementary notes.
Definition
When we write code, we do not typically target any specific operating system or hardware. Instead, we target an abstraction of them that is described by the language specification. The corresponding abstract machine is defined in §4.1.2.
C++ Specification, §4.1.2/1.
The semantic descriptions in this document define a parameterized nondeterministic abstract machine. This document places no requirement on the structure of conforming implementations. In particular, they need not copy or emulate the structure of the abstract machine. Rather, conforming implementations are required to emulate (only) the observable behavior of the abstract machine as explained below.
Specific implementations like gcc or clang translate our code for the physical machine. The behavior of the abstract machine and the physical machine has to match only at specific, observable points. According to the specification, these are accesses to volatile objects, the data written to files by the time the program terminates, and the input and output dynamics of interactive devices. Other side effects, such as modifying an ordinary object, may be reordered or eliminated by the implementation as long as the observable behavior stays the same.
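As a rough illustration of how much freedom an implementation has between observable points, consider the following sketch. The function names are hypothetical and not taken from the talk. Both functions return the same value, but only the second one forces the implementation to perform every read and write as written.

```cpp
#include <cassert>

// Hypothetical functions for illustration; the names are not from the talk.

// Nothing inside this loop is an access to a volatile object or I/O.
// An implementation may compute the result in any way it likes, for
// example with the closed form n * (n - 1) / 2, as long as the returned
// value is the same.
int sum_plain(int n) {
    assert(n >= 0);
    int total = 0;
    for (int i = 0; i < n; ++i) {
        total += i;
    }
    return total;
}

// Accesses to a volatile object have to be carried out strictly as the
// abstract machine prescribes, so every read and write of `total` stays.
int sum_volatile(int n) {
    assert(n >= 0);
    volatile int total = 0;
    for (int i = 0; i < n; ++i) {
        total = total + i;
    }
    return total;
}
```

Comparing the generated assembly of both functions with optimizations enabled is a quick way to see the difference in practice.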
The abstract machine is parameterized through implementation-defined behavior. It is nondeterministic due to unspecified behavior. A program is well-formed if it follows the syntax rules, the diagnosable semantic rules and the one definition rule [1].
Behavior
If not exactly specified, the behavior of the abstract machine is categorized as implementation-defined, unspecified or undefined. A program can also be explicitly ill-formed.
Implementation Defined
The behavior is not exactly specified, but every implementation has to document its choice. One example is sizeof(void*), the value of which depends on the platform. Another is the exact message provided by std::bad_alloc::what().
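Both examples can be inspected with a few lines of code. This is a minimal sketch; the helper names pointer_bits and bad_alloc_message are made up for illustration, and the concrete values depend on the compiler and platform.

```cpp
#include <climits>
#include <cstddef>
#include <new>
#include <string>

// Hypothetical helper: number of bits in a pointer. 64 is common on
// desktop platforms, but the value is up to the implementation.
constexpr std::size_t pointer_bits() {
    return sizeof(void*) * CHAR_BIT;
}

// Hypothetical helper: the text returned by what() is chosen by the
// implementation as well.
std::string bad_alloc_message() {
    return std::bad_alloc{}.what();
}
```

Printing both values on different toolchains shows that the results may differ while each program remains perfectly valid.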
Unspecified Behavior
Unspecified behavior allows several outcomes, and the implementation does not need to document which one it picks. One example is the evaluation order of function arguments. Another is whether identical string literals are stored in one place or individually. Finally, the order and contiguity of storage returned by successive allocation requests, as well as its initial value, are unspecified.
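The evaluation order of function arguments can be made visible with a small sketch. All names below are hypothetical. The program is valid either way; only the recorded order may differ between implementations.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helpers for illustration only.
std::vector<int> evaluation_trace;  // records the order in which record() ran

int record(int id) {
    evaluation_trace.push_back(id);
    return id;
}

int add(int a, int b) { return a + b; }

// The two calls to record() may run in either order, so the trace ends
// up as {1, 2} or {2, 1}. The sum is 3 in both cases.
int sum_with_trace() {
    evaluation_trace.clear();
    const int result = add(record(1), record(2));
    assert(evaluation_trace.size() == 2);
    return result;
}
```

Code that relies on one particular order is not portable, even if it happens to work with the compiler at hand.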
Undefined Behavior
Undefined behavior is behavior for which the specification imposes no requirements. No diagnostic is required. Examples include dereferencing a null pointer and signed integer overflow.
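Since the compiler does not have to warn about it, avoiding undefined behavior is up to the programmer. The following sketch shows one possible way to guard against signed overflow; checked_add is a hypothetical helper and assumes C++17 for std::optional.

```cpp
#include <limits>
#include <optional>

// Hypothetical helper: adds two ints and returns std::nullopt instead of
// overflowing. The range check happens before the addition, so the
// overflowing expression is never evaluated.
constexpr std::optional<int> checked_add(int a, int b) {
    constexpr int max = std::numeric_limits<int>::max();
    constexpr int min = std::numeric_limits<int>::min();
    if (b > 0 && a > max - b) return std::nullopt;  // would exceed max
    if (b < 0 && a < min - b) return std::nullopt;  // would fall below min
    return a + b;
}
```

Because the function is constexpr, it can also be used in constant expressions, where the compiler has to reject any evaluation that runs into undefined behavior.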
Ill-Formed
Diagnosable semantic errors fall into this category. There is a sub-category called ill-formed, no diagnostic required (IFNDR). It covers semantic errors that an implementation is not required to diagnose, typically because they cannot be detected while compiling a single translation unit. One example is a constructor that directly or indirectly delegates to itself. Another is a function declared with [[noreturn]] in one translation unit and without it in another. Finally, some violations of the one definition rule fall into this subcategory.
Structure
The structure of the abstract machine roughly consists of its memory, threads and expressions.
Memory
The memory consists of a single, flat space. According to the specification, all memory is always reachable. The abstract machine provides no concept of a stack, registers or cache. However, the specification does mention stack unwinding in the context of exceptions. There is no definition for external memory on a GPU or a coprocessor.
Memory is composed of bytes and every byte has an address.
Objects
Memory is organized in objects. Each object has the following properties.
Type | The object's type.
Value | The object's value.
Name | Optional name of the object. It is optional, as temporary objects do not have a name.
Location | The address of the object's first byte. It is optional, as temporaries do not have an address.
Size | The value returned by sizeof.
Alignment | The value returned by alignof.
Storage Duration | One of automatic, static, thread or dynamic.
Lifetime | The lifetime of an object begins once its storage is obtained and its initialization is complete. It ends when the object is destroyed, or when its storage is released or reused by another object that is not nested within the original one.
The specification allows pointing one past the last element of an array. Such a pointer can be compared with other pointers into the same array. However, dereferencing it is not allowed.
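Some of these properties can be checked directly in code. The sketch below uses a hypothetical type Sample and assumes C++17. It first states a few relations between size and alignment, then walks an array using a one-past-the-end pointer that is compared but never dereferenced.

```cpp
// Hypothetical type used only for illustration.
struct Sample {
    char tag;
    int count;
};

// Size and alignment are properties of the type. The concrete numbers
// depend on the implementation, but these relations always hold.
static_assert(sizeof(Sample) >= sizeof(char) + sizeof(int));
static_assert(sizeof(Sample) % alignof(Sample) == 0);
static_assert(alignof(Sample) >= alignof(int));

// Sums the half-open range [first, last). `last` is only compared,
// never dereferenced.
constexpr int sum(const int* first, const int* last) {
    int total = 0;
    for (const int* p = first; p != last; ++p) {
        total += *p;
    }
    return total;
}

constexpr int sum_of_four() {
    const int values[4] = {1, 2, 3, 4};
    return sum(values, values + 4);  // values + 4 points one past the end
}
```

The half-open range used here is the same convention that the iterators of the standard library follow.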
Lifetime gap in the specification.
Note an interesting gap in the specification. If you receive an object in raw byte representation, accessing it through a pointer obtained via reinterpret_cast<>() is undefined behavior, because such an object has never been created in the first place. Another example is dynamic construction of arrays. To close the gap, std::start_lifetime_as<>() is expected in the next standard. Unfortunately, it has not passed into C++20 due to time constraints.
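Until such a facility is available, a common workaround is to create a real object and copy the bytes into it with std::memcpy. The following is a minimal sketch of that idea, not something shown in the talk; the Header type and its layout are assumptions made for this example, and it requires C++17.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>

// Hypothetical message header; the layout is an assumption for this sketch.
struct Header {
    std::uint32_t id;
    std::uint32_t length;
};

// Copying raw bytes is only meaningful for trivially copyable types.
static_assert(std::is_trivially_copyable_v<Header>);

// The problematic variant would read through a cast pointer:
//   const Header* h = reinterpret_cast<const Header*>(bytes);
// No Header object has been created in the buffer in that case.

// Instead, create a Header object first and copy the bytes into it.
Header read_header(const unsigned char* bytes) {
    assert(bytes != nullptr);
    Header header;
    std::memcpy(&header, bytes, sizeof header);
    return header;
}
```

Compilers typically optimize the copy away, so this approach usually costs nothing compared to the cast.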
Threads
A thread is a single flow of execution. It starts with its top-level function, which is executed by the new thread and not by its creator. The thread then recursively includes all functions called by the top-level one.
According to the abstract machine, every thread has access to all memory.
Every program has at least one thread with main() as its entry point. Valid signatures are at least int main() and int main(int, char**). It cannot be overloaded and it cannot be a coroutine. The program cannot call it explicitly or define a global variable named main. main() cannot be deleted, static, inline or constexpr. It cannot be declared with a linkage specification, and the name main cannot have C language linkage.
Expression
An expression has a type and a value category. The categories can be represented as the following tree.
expression ──┬──> glvalue ──┬─> lvalue
             │              │
             │              └─> xvalue
             │              ┌─> -"-
             │              │
             └──> rvalue ───┴─> prvalue
An rvalue has no name and its address cannot be taken. Temporary objects are one example. A glvalue is a generalized lvalue, that is, an expression whose evaluation determines the identity of an object. An lvalue has a name and its address can be retrieved. A prvalue initializes an object or computes the value of an operand, depending on its context. Examples are literals (except string literals, which are lvalues) and calls to functions returning non-reference types. An xvalue is a glvalue that can be moved from. It can initialize an rvalue reference. An example is a call to a function returning an rvalue reference, such as std::move().
A rough way to determine whether something is an rvalue or an lvalue is its name. If it has one, it is an lvalue. If it has none, it is an rvalue. To be sure, the type traits std::is_rvalue_reference<> and std::is_lvalue_reference<> can be applied to a forwarding reference. The following example provides a short demonstration.
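#include <type_traits>

// T&& is a forwarding reference: it becomes S& for lvalues and S&& for rvalues.
template <typename T>
constexpr std::is_lvalue_reference<T&&> is_lvalue(T&&) { return {}; }

template <typename T>
constexpr std::is_rvalue_reference<T&&> is_rvalue(T&&) { return {}; }

auto main() -> int {
    struct S {} s;
    static_assert(is_lvalue(s));    // s has a name
    static_assert(is_rvalue(S{}));  // S{} is a temporary
}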
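The example above cannot tell an xvalue from a prvalue, since both bind to an rvalue reference. One way to distinguish all three leaf categories is decltype with an extra pair of parentheses, which yields T& for an lvalue, T&& for an xvalue and plain T for a prvalue. The following sketch builds on that; Category, category_of, Widget and make_widget are hypothetical names, and C++17 is assumed.

```cpp
#include <utility>

enum class Category { lvalue, xvalue, prvalue };

// decltype((expr)) yields T& for an lvalue, T&& for an xvalue and plain T
// for a prvalue. The partial specializations map that back to a category.
template <typename T> constexpr Category category_of = Category::prvalue;
template <typename T> constexpr Category category_of<T&> = Category::lvalue;
template <typename T> constexpr Category category_of<T&&> = Category::xvalue;

struct Widget {};                    // hypothetical type
Widget make_widget() { return {}; }  // returns by value

inline void check_categories() {
    [[maybe_unused]] Widget w;
    static_assert(category_of<decltype((w))> == Category::lvalue);
    static_assert(category_of<decltype((std::move(w)))> == Category::xvalue);
    static_assert(category_of<decltype((make_widget()))> == Category::prvalue);
    static_assert(category_of<decltype((Widget{}))> == Category::prvalue);
}
```

All checks happen at compile time, so the function never needs to be called.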
Conclusion
From my perspective, the presentation provides a great entry point to the language specification. I doubt it will make you a better developer right away. Your code will be as solid as before. Rather, it gives you some context for understanding basic terminology such as objects and expression categories. The target audience consists of developers who are interested in language specifics.
[1] The one definition rule is defined in §6.3 of the specification. According to it, no translation unit shall contain more than one definition of any variable, function, class type, enumeration type, template, or default argument of a function or template. While some violations must be diagnosed by the compiler, those that span multiple translation units do not require a diagnostic.