How C++ Works: Understanding Compilation

Bjarne Stroustrup’s The C++ Programming Language has a chapter titled “A Tour of C++: The Basics” (Standard C++). In section 2.2, that chapter covers the compilation and linking process in C++ in half a page. Compilation and linking are two very basic processes that happen all the time during C++ software development, but oddly enough, they aren’t well understood by many C++ developers.

Why is C++ source code split into header and source files? How is each part seen by the compiler? How does that affect compilation and linking? There are many more questions like these that you may have thought about, but have come to accept as convention.

Whether you are designing a C++ application, implementing new features for it, trying to address bugs (especially certain strange bugs), or trying to make C and C++ code work together, knowing how compilation and linking work will save you a lot of time and make those tasks much more pleasant. In this article, you will learn exactly that.

The article will explain how a C++ compiler works with some of the basic language constructs, answer some common questions that are related to their processes, and help you work around some related mistakes that developers often make in C++ development.

Note: This article has some example source code that can be downloaded from https://bitbucket.org/danielmunoz/cpp-article

The examples were compiled on a CentOS Linux machine:


$ uname -sr
Linux 3.10.0-327.36.3.el7.x86_64

Using g++ version:


$ g++ --version
g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-11)

The source files provided should be portable to other operating systems, although the Makefiles that accompany them for the automated build process should only be portable to Unix-like systems.

Each C++ source file needs to be compiled into an object file. The object files resulting from the compilation of multiple source files are then linked into an executable, a shared library, or a static library (the last of these being just an archive of object files). C++ source files generally have the .cpp, .cxx or .cc extension suffixes.

A C++ source file can include other files, known as header files, with the #include directive. Header files have extensions like .h, .hpp, or .hxx, or have no extension at all, like the headers of the C++ standard library and of other libraries (like Qt). The extension doesn’t matter to the C++ preprocessor, which will literally replace the line containing the #include directive with the entire content of the included file.

The first thing the compiler does with a source file is run the preprocessor on it. Only source files are passed to the compiler (to preprocess and compile them). Header files aren’t passed to the compiler; instead, they are included from source files.

Each header file can be opened multiple times during the preprocessing of all source files, depending on how many source files include it, or how many other header files included from source files also include it (there can be many levels of indirection). Source files, on the other hand, are opened only once by the compiler (and preprocessor), when they are passed to it.

For each C++ source file, the preprocessor builds a translation unit: it inserts content whenever it finds an #include directive, and at the same time it strips code out of the source file and the headers whenever it finds a conditional compilation block whose directive evaluates to false. It also does some other tasks, like macro replacement.
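A minimal sketch of the two preprocessor jobs just described, with hypothetical macro names (GREETING, ENABLE_TRACE): the macro is replaced by its text, and only one branch of the conditional block survives into the translation unit. Running g++ -E on a file like this should show the #ifdef branch gone and "Hello" substituted wherever GREETING was used.

```cpp
#include <string>

// Object-like macro (hypothetical): the preprocessor replaces every later
// use of GREETING with the literal text "Hello" before compilation starts.
#define GREETING "Hello"

// ENABLE_TRACE is not defined here (unless passed with -DENABLE_TRACE), so
// the preprocessor strips the first branch and only the #else branch
// reaches the compiler.
#ifdef ENABLE_TRACE
std::string buildMode() { return "trace"; }
#else
std::string buildMode() { return "release"; }
#endif

std::string greet(const std::string& name) {
    return std::string(GREETING) + ", " + name;
}
```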

Once the preprocessor finishes creating that (sometimes huge) translation unit, the compiler starts the compilation phase and produces the object file.

To obtain that translation unit (the preprocessed source code), the -E option can be passed to the g++ compiler, along with the -o option to specify the desired name of the preprocessed source file.

In the cpp-article/hello-world directory, there is a “hello-world.cpp” example file:


#include <iostream>

int main(int argc, char* argv[]) {
    std::cout << "Hello world" << std::endl;
    return 0;
}

Create the preprocessed file by:


$ g++ -E hello-world.cpp -o hello-world.ii

And see the number of lines:


$ wc -l hello-world.ii
17558 hello-world.ii

It has 17,558 lines on my machine. You can also just run make in that directory and it’ll do those steps for you.

We can see that the compiler must compile a much larger file than the simple source file we wrote. This is because of the included headers, and our example includes just one. The translation unit grows with every header we include.

The preprocessing and compilation process is similar for C. It follows the C rules for compiling, but the way it includes header files and produces object code is nearly the same.

How Source Files Import and Export Symbols

Let’s now look at the files in the cpp-article/symbols/c-vs-cpp-names directory.

There is a simple C (not C++) source file named sum.c that exports two functions, one for adding two integers and one for adding two floats:


int sumI(int a, int b) {
    return a + b;
}

float sumF(float a, float b) {
    return a + b;
}

Compile it to create the sum.o object file (or run make, which performs all the steps needed to create the two example apps):


$ gcc -c sum.c

Now look at the symbols exported and imported by this object file:


$ nm sum.o
0000000000000014 T sumF
0000000000000000 T sumI

No symbols are imported and two symbols are exported: sumF and sumI. Both are marked with T, meaning they are defined in the .text (code) section, so they are function names: executable code.

If other source files (C or C++) want to call those functions, they need to declare them before calling them.

The standard way to do it is to create a header file that declares them and include it in whatever source file we want to call them from. The header can have any name and extension. I chose sum.h:


#ifdef __cplusplus
extern "C" {
#endif

int sumI(int a, int b);
float sumF(float a, float b);

#ifdef __cplusplus
} // end extern "C"
#endif

What are those #ifdef/#endif conditional compilation blocks? If I include this header from a C source file, I want it to become:


int sumI(int a, int b);
float sumF(float a, float b);

But if I include it from a C++ source file, I want it to become:


extern "C" {

int sumI(int a, int b);
float sumF(float a, float b);

} // end extern "C"

The C language doesn’t know anything about the extern "C" directive, but C++ does, and it needs this directive applied to C function declarations. This is because C++ supports function (and method) overloading, which C doesn’t, and implements it by mangling function and method names; extern "C" tells the C++ compiler to use the plain, unmangled C name instead.
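On the C++ side, the same linkage can be attached to a definition through an earlier declaration. The sketch below assumes a hypothetical mulI function: the guarded declaration gets C linkage when compiled as C++, and the later definition, which repeats no linkage specification, keeps it, so nm should list the symbol as plain mulI rather than a mangled name.

```cpp
// What a C-compatible header would contribute when included from C++:
// inside a C++ translation unit __cplusplus is defined, so the
// declaration gets C language linkage.
#ifdef __cplusplus
extern "C" {
#endif

int mulI(int a, int b);  // hypothetical C-callable function

#ifdef __cplusplus
}  // end extern "C"
#endif

// The definition repeats no linkage specification; it inherits C linkage
// from the earlier declaration, so its symbol name is not mangled.
int mulI(int a, int b) {
    return a * b;
}
```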

This can be seen in the C++ source file named print.cpp:


#include <iostream> // std::cout, std::endl
#include "sum.h" // sumI, sumF

void printSum(int a, int b) {
    std::cout << a << " + " << b << " = " << sumI(a, b) << std::endl;
}

void printSum(float a, float b) {
    std::cout << a << " + " << b << " = " << sumF(a, b) << std::endl;
}

extern "C" void printSumInt(int a, int b) {
    printSum(a, b);
}

extern "C" void printSumFloat(float a, float b) {
    printSum(a, b);
}

There are two functions with the same name (printSum) that differ only in their parameters’ type: int or float. Function overloading is a C++ feature which isn’t present in C. To implement this feature and differentiate those functions, C++ mangles the function name, as we can see in their exported symbol names (I’ll only pick what’s relevant from nm’s output):


$ g++ -c print.cpp
$ nm print.o
0000000000000132 T printSumFloat
0000000000000113 T printSumInt
                 U sumF
                 U sumI
0000000000000074 T _Z8printSumff
0000000000000000 T _Z8printSumii
                 U _ZSt4cout

Those functions are exported (on my system) as _Z8printSumff for the float version and _Z8printSumii for the int version. Every function name in C++ is mangled unless declared as extern "C". There are two functions that were declared with C linkage in print.cpp: printSumInt and printSumFloat.

Therefore, they cannot be overloaded, or their exported names would be the same, since they aren’t mangled. I had to differentiate them from each other by appending Int or Float to their names.

Since they are not mangled they can be called from C code, as we’ll soon see.

To see the names as they would appear in C++ source code, we can use the -C (demangle) option of the nm command. Again, I’ll only copy the same relevant part of the output:


$ nm -C print.o
0000000000000132 T printSumFloat
0000000000000113 T printSumInt
                 U sumF
                 U sumI
0000000000000074 T printSum(float, float)
0000000000000000 T printSum(int, int)
                 U std::cout

With this option, instead of _Z8printSumff we see printSum(float, float), and instead of _ZSt4cout we see std::cout, which are more human-friendly names.
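The same translation that nm -C performs is available to programs on toolchains that follow the Itanium C++ ABI (GCC and Clang on Linux, for example) through abi::__cxa_demangle from <cxxabi.h>. This is a sketch under that assumption; MSVC uses a different mangling scheme and doesn’t provide this header. The wrapper name demangle is hypothetical.

```cpp
#include <cxxabi.h>  // abi::__cxa_demangle (GCC/Clang, Itanium C++ ABI)
#include <cstdlib>   // std::free
#include <memory>
#include <string>

// Turns a mangled name such as "_Z8printSumii" into readable form.
// Returns the input unchanged if it cannot be demangled.
std::string demangle(const char* mangled) {
    int status = 0;
    // __cxa_demangle allocates the result with malloc, so it must be
    // released with free; unique_ptr takes care of that.
    std::unique_ptr<char, decltype(&std::free)> result(
        abi::__cxa_demangle(mangled, nullptr, nullptr, &status), &std::free);
    if (status != 0 || !result) {
        return mangled;
    }
    return result.get();
}
```

For example, demangle("_Z8printSumii") yields printSum(int, int), the same text nm -C printed above.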

We also see that our C++ code is calling C code: print.cpp is calling sumI and sumF, which are C functions declared as having C linkage in sum.h. This can be seen in the nm output of print.o above, which lists some undefined (U) symbols: sumF, sumI, and std::cout. Those undefined symbols are supposed to be provided by one of the object files (or libraries) that will be linked together with this object file in the link phase.

So far we have only compiled source code into object code; we haven’t linked yet. If we don’t link the object file that contains the definitions of those imported symbols together with this object file, the linker will stop with an undefined symbol error (GNU ld reports it as an “undefined reference”).
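The difference between declaring and defining can be sketched in one file, with hypothetical names: addThree compiles against the declaration of addInts alone, and a definition then resolves the reference. If that definition lived in another source file instead, nm -C would list addInts(int, int) as an undefined (U) symbol of this object file, just like sumI in print.o. A declaration that nothing calls, like neverDefined, produces no reference at all, so its missing definition goes unnoticed.

```cpp
// Declared only; no definition exists anywhere (hypothetical name).
// Since nothing calls it, the object file never references the symbol
// and the program still links.
int neverDefined(int value);

// Declaration: enough for the compiler to emit a call.
int addInts(int a, int b);

// Compiled against the declaration alone.
int addThree(int x) {
    return addInts(x, 3);
}

// The definition that satisfies the call. Here it lives in the same
// source file, but it could equally come from another object file or a library.
int addInts(int a, int b) {
    return a + b;
}

// Uncommenting this function would typically make the link fail with an
// undefined reference to neverDefined(int):
// int useIt() { return neverDefined(1); }
```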

Note also that since print.cpp is a C++ source file, compiled with a C++ compiler (g++), all the code in it is compiled as C++ code. Functions with C linkage like printSumInt and printSumFloat are also C++ functions that can use C++ features. Only the names of the symbols are compatible with C; the code is C++, which can be seen from the fact that both functions call an overloaded function (printSum), which couldn’t happen if printSumInt or printSumFloat were compiled as C.

Let’s now look at print.hpp, a header file that can be included from both C and C++ source files, which will allow printSumInt and printSumFloat to be called from both C and C++, and printSum to be called from C++:


#ifdef __cplusplus
void printSum(int a, int b);
void printSum(float a, float b);
extern "C" {
#endif

void printSumInt(int a, int b);
void printSumFloat(float a, float b);

#ifdef __cplusplus
} // end extern "C"
#endif

If we are including it from a C source file, we just want to see:

void printSumInt(int a, int b);
void printSumFloat(float a, float b);

printSum can’t be seen from C code since its name is mangled, so we don’t have a (standard and portable) way to declare it for C code. Yes, I could declare them as:


void _Z8printSumii(int a, int b);
void _Z8printSumff(float a, float b);

And the linker won’t complain, since those are the exact names that my currently installed compiler invented for them, but I don’t know if it’ll work with your toolchain (if your compiler generates different mangled names), or even with the next version of my compiler. I don’t even know if the call will work as expected, because calling conventions (how parameters are passed and return values are returned) are compiler specific and may differ between C and C++ calls (especially for C++ member functions, which receive the this pointer as a parameter).

Your compiler can potentially use one calling convention for regular C++ functions and a different one for functions declared with extern "C" linkage. So, cheating the compiler by saying that a function uses the C calling convention while it actually uses the C++ one can deliver unexpected results if the two conventions happen to differ in your toolchain.

There are standard ways to mix C and C++ code, and the standard way to call overloaded C++ functions from C is to wrap them in functions with C linkage, as we did by wrapping printSum with printSumInt and printSumFloat.
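Wrappers like these are often given two extra precautions, shown in the sketch below with a hypothetical parseCount function: the C-linkage wrapper uses only C-compatible types in its signature, and it catches every exception, because C callers can’t catch C++ exceptions and unwinding through C frames isn’t portable. Reporting failure through a return code is an assumption of this sketch, not something the article’s example does.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical C++ API: takes std::string and reports errors by throwing.
int parseCount(const std::string& text) {
    std::size_t consumed = 0;
    int value = std::stoi(text, &consumed);  // throws on non-numeric input
    if (consumed != text.size()) {
        throw std::invalid_argument("trailing characters");
    }
    return value;
}

// C-callable wrapper: C-compatible parameter types, unmangled name,
// and a status code instead of exceptions (0 = success, -1 = failure).
extern "C" int parseCountC(const char* text, int* out) {
    if (text == nullptr || out == nullptr) {
        return -1;
    }
    try {
        *out = parseCount(std::string(text));
        return 0;
    } catch (...) {
        return -1;  // never let an exception cross into C code
    }
}
```

A header for C callers would then declare only int parseCountC(const char* text, int* out); inside the same __cplusplus/extern "C" guards used in print.hpp.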

If we include print.hpp from a C++ source file, the __cplusplus preprocessor macro will be defined and the file will be seen as:


void printSum(int a, int b);
void printSum(float a, float b);
extern "C" {

void printSumInt(int a, int b);
void printSumFloat(float a, float b);

} // end extern "C"

This will allow C++ code to call the overloaded function printSum or its wrappers printSumInt and printSumFloat.

Now let’s create a C source file containing the main function, which is the entry point for a program. This C main function will call printSumInt and printSumFloat; that is, it will call both C++ functions with C linkage. Remember, those are C++ functions (their bodies execute C++ code) that simply don’t have mangled names. The file is named c-main.c:


#include "print.hpp"

int main(int argc, char* argv[]) {
    printSumInt(1, 2);
    printSumFloat(1.5f, 2.5f);
    return 0;
}

Compile it to generate the object file:


$ gcc -c c-main.c

And see the imported/exported symbols:


$ nm c-main.o
0000000000000000 T main
                 U printSumFloat
                 U printSumInt

It exports main and imports printSumFloat and printSumInt, as expected.

To link it all together into an executable file, we need to link with g++ (the C++ compiler driver, which also adds the C++ standard library to the link), since at least one of the files that we’ll link, print.o, was compiled as C++:


$ g++ -o c-app sum.o print.o c-main.o

The execution produces the expected result:


$ ./c-app
1 + 2 = 3
1.5 + 2.5 = 4

Now let’s try with a C++ main file, named cpp-main.cpp:


#include "print.hpp"

int main(int argc, char* argv[]) {
    printSum(1, 2);
    printSum(1.5f, 2.5f);
    printSumInt(3, 4);
    printSumFloat(3.5f, 4.5f);
    return 0;
}

Compile and see the imported/exported symbols of the cpp-main.o object file:


$ g++ -c cpp-main.cpp
$ nm -C cpp-main.o
0000000000000000 T main
                 U printSumFloat
                 U printSumInt
                 U printSum(float, float)
                 U printSum(int, int)

It exports main and imports the C-linkage functions printSumFloat and printSumInt, and both mangled versions of printSum.

You may be wondering why the main symbol isn’t exported as a mangled symbol like main(int, char**), since this is a C++ source file and main isn’t defined as extern "C". Well, main is a special function whose linkage is implementation-defined, and my implementation seems to have chosen C linkage for it, whether it’s defined in a C or a C++ source file.

Linking and running the program gives the expected result:


$ g++ -o cpp-app sum.o print.o cpp-main.o
$ ./cpp-app
1 + 2 = 3
1.5 + 2.5 = 4
3 + 4 = 7
3.5 + 4.5 = 8

How Header Guards Work

So far, I’ve been careful not to include my headers twice, directly or indirectly, from the same source file. But since one header can include other headers, the same header can indirectly be included multiple times. And since header content is just inserted where it was included, it’s easy to end up with duplicated definitions.

See the example files in cpp-article/header-guards.


// unguarded.hpp
class A {
public:
    A(int a) : m_a(a) {}
    void setA(int a) { m_a = a; }
    int getA() const { return m_a; }
private:
    int m_a;
};

// guarded.hpp:
#ifndef __GUARDED_HPP
#define __GUARDED_HPP

class A {
public:
    A(int a) : m_a(a) {}
    void setA(int a) { m_a = a; }
    int getA() const { return m_a; }
private:
    int m_a;
};

#endif // __GUARDED_HPP

The difference is that, in guarded.hpp, we surround the entire header in a conditional that will only be included if the __GUARDED_HPP preprocessor macro isn’t defined. The first time the preprocessor includes this file, the macro won’t be defined. But since the macro is defined inside that guarded code, the next time the file is included (from the same source file, directly or indirectly), the #ifndef condition will be false and the preprocessor will discard all the code between the #ifndef and the #endif.

Note that this process happens for every source file that we compile. It means that the content of this header file is included at most once for each source file. The fact that it was included from one source file won’t prevent it from being included from a different source file when that source file is compiled; it only prevents it from being included more than once into the same source file.
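The effect can be sketched in a single file by pasting the same guarded header text twice, which is what the preprocessor produces when a header is included twice into one source file. The header name point.hpp and its guard macro POINT_HPP_INCLUDED are hypothetical; the guard avoids a leading double underscore because such names are reserved for the implementation. GCC, Clang, and MSVC also accept the non-standard #pragma once as an alternative to this pattern.

```cpp
// --- first inclusion of a hypothetical point.hpp ---
#ifndef POINT_HPP_INCLUDED   // not defined yet: this block is kept
#define POINT_HPP_INCLUDED
struct Point {
    int x;
    int y;
};
#endif

// --- second inclusion of the same header into the same source file ---
#ifndef POINT_HPP_INCLUDED   // already defined: everything up to #endif is discarded
#define POINT_HPP_INCLUDED
struct Point {
    int x;
    int y;
};
#endif

int manhattan(const Point& p) {
    return (p.x < 0 ? -p.x : p.x) + (p.y < 0 ? -p.y : p.y);
}
```

Without the guards, the second struct Point would trigger the same “redefinition” error as unguarded.hpp.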

The example file main-guarded.cpp includes guarded.hpp twice:


#include "guarded.hpp"
#include "guarded.hpp"

int main(int argc, char* argv[]) {
    A a(5);
    a.setA(0);
    return a.getA();
}

But the preprocessed output only shows one definition of class A:


$ g++ -E main-guarded.cpp
# 1 "main-guarded.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "main-guarded.cpp"
# 1 "guarded.hpp" 1



class A {
public:
    A(int a) : m_a(a) {}
    void setA(int a) { m_a = a; }
    int getA() const { return m_a; }
private:
    int m_a;
};
# 2 "main-guarded.cpp" 2


int main(int argc, char* argv[]) {
    A a(5);
    a.setA(0);
    return a.getA();
}

Therefore, it can be compiled without problems:


$ g++ -o guarded main-guarded.cpp

But the main-unguarded.cpp file includes unguarded.hpp twice:


#include "unguarded.hpp"
#include "unguarded.hpp"

int main(int argc, char* argv[]) {
    A a(5);
    a.setA(0);
    return a.getA();
}

And the preprocessed output shows two definitions of class A:


$ g++ -E main-unguarded.cpp
# 1 "main-unguarded.cpp"
# 1 "<built-in>"
# 1 "<command-line>"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "<command-line>" 2
# 1 "main-unguarded.cpp"
# 1 "unguarded.hpp" 1
class A {
public:
    A(int a) : m_a(a) {}
    void setA(int a) { m_a = a; }
    int getA() const { return m_a; }
private:
    int m_a;
};
# 2 "main-unguarded.cpp" 2
# 1 "unguarded.hpp" 1
class A {
public:
    A(int a) : m_a(a) {}
    void setA(int a) { m_a = a; }
    int getA() const { return m_a; }
private:
    int m_a;
};
# 3 "main-unguarded.cpp" 2

int main(int argc, char* argv[]) {
    A a(5);
    a.setA(0);
    return a.getA();
}

This will cause problems when compiling:


$ g++ -o unguarded main-unguarded.cpp
In file included from main-unguarded.cpp:2:0:
unguarded.hpp:1:7: error: redefinition of 'class A'
 class A {
       ^
In file included from main-unguarded.cpp:1:0:
unguarded.hpp:1:7: error: previous definition of 'class A'
 class A {
       ^

For the sake of brevity, I won’t use guarded headers in this article unless necessary, since most examples are short. But always guard your header files. Not your source files, which won’t be included from anywhere; just header files.

Pass by Value and Constness of Parameters

Look at the by-value.cpp file in cpp-article/symbols/pass-by:


#include <vector>
#include <numeric>
#include <iostream>

// std::vector, std::accumulate, std::cout, std::endl
using namespace std;

int sum(int a, const int b) {
    cout << "sum(int, const int)" << endl;
    const int c = a + b;
    ++a; // Possible, not const
    // ++b; // Not possible, this would result in a compilation error
    return c;
}

float sum(const float a, float b) {
    cout << "sum(const float, float)" << endl;
    return a + b;
}

int sum(vector<int> v) {
    cout << "sum(vector<int>)" << endl;
    return accumulate(v.begin(), v.end(), 0);
}

float sum(const vector<float> v) {
    cout << "sum(const vector<float>)" << endl;
    return accumulate(v.begin(), v.end(), 0.0f);
}

Since I use the using namespace std directive, I don’t have to qualify the names of symbols (functions or classes) inside the std namespace in the rest of the translation unit, which in my case is the rest of the source file. If this were a header file, I shouldn’t use this directive, because a header file is supposed to be included from multiple source files; the directive would bring the entire std namespace into the global scope of each of those source files, from the point where they include my header.

Even headers included after mine in those files will have those symbols in scope. This can produce name clashes, since those headers weren’t expecting this to happen. Therefore, don’t use this directive in headers. Only use it in source files, if you want, and only after you have included all headers.

Note how some parameters are const. This means that they can’t be modified in the body of the function; trying to do so would give a compilation error. Also, note that all the parameters in this source file are passed by value, not by reference (&) or by pointer (*). This means that the caller makes a copy of each argument and passes the copy to the function. So it doesn’t matter to the caller whether they are const or not, because if we modify them in the function body, we’ll only be modifying the copy, not the original value the caller passed to the function.

Since the constness of a parameter passed by value (copy) doesn’t matter to the caller, it isn’t part of the function’s type and doesn’t appear in the mangled name, as can be seen after compiling and inspecting the object code (only the relevant output):


$ g++ -c by-value.cpp
$ nm -C by-value.o
000000000000001e T sum(float, float)
0000000000000000 T sum(int, int)
0000000000000087 T sum(std::vector<float, std::allocator<float> >)
0000000000000048 T sum(std::vector<int, std::allocator<int> >)

The signatures don’t express whether the copied parameters are const or not in the bodies of the functions. That doesn’t matter to the caller; it matters for the function definition only, to show the reader of the function body at a glance whether those values will ever change. In the example, only half of the parameters are declared as const, so we can see the contrast. To be const-correct, all of them except a in sum(int, const int) (which is incremented only to show that it’s allowed) should have been declared const, since none of the others are, or should be, modified in the function body.

Since it doesn’t matter for the function declaration, which is what the caller sees, we can create the by-value.hpp header like this:


#include <vector>

int sum(int a, int b);
float sum(float a, float b);
int sum(std::vector<int> v);
float sum(std::vector<float> v); // returns float, matching the definition

Adding the const qualifiers here is allowed (you can even declare a parameter as const here that isn’t const in the definition, and it’ll work), but it isn’t necessary and only makes the declarations more verbose.
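A short sketch can confirm this rule at compile time (the function name twice is hypothetical): the static_assert passes because both function types are the same, and the const-qualified definition defines the function declared without const instead of adding an overload.

```cpp
#include <type_traits>

// Top-level const on a by-value parameter is dropped from the function type.
static_assert(std::is_same<void(int), void(const int)>::value,
              "top-level const is not part of the function type");

// Declaration as a header would write it: no const.
int twice(int x);

// Definition with a const parameter: this defines the SAME function as the
// declaration above, not a new overload.
int twice(const int x) {
    // x = 0;  // would not compile: x is const inside the body
    return 2 * x;
}
```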

Pass by Reference

Let’s see by-reference.cpp:


#include <vector>
#include <iostream>
#include <numeric>

using namespace std;

int sum(const int& a, int& b) {
    cout << "sum(const int&, int&)" << endl;
    const int c = a + b;
    ++b; // Will modify caller variable
    // ++a; // Not allowed, but would also modify caller variable
    return c;
}

float sum(float& a, const float& b) {
    cout << "sum(float&, const float&)" << endl;
    return a + b;
}

int sum(const std::vector<int>& v) {
    cout << "sum(const std::vector<int>&)" << endl;
    return accumulate(v.begin(), v.end(), 0);
}

float sum(const std::vector<float>& v) {
    cout << "sum(const std::vector<float>&)" << endl;
    return accumulate(v.begin(), v.end(), 0.0f);
}

Constness when passing by reference matters for the caller, because it’ll tell the caller whether its argument will be modified or not by the callee. Therefore, the symbols are exported with their constness:


$ g++ -c by-reference.cpp
$ nm -C by-reference.o
0000000000000051 T sum(float&, float const&)
0000000000000000 T sum(int const&, int&)
00000000000000fe T sum(std::vector<float, std::allocator<float> > const&)
00000000000000a3 T sum(std::vector<int, std::allocator<int> > const&)

That should also be reflected in the header that callers will use:


#include <vector>

int sum(const int&, int&);
float sum(float&, const float&);
int sum(const std::vector<int>&);
float sum(const std::vector<float>&);
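By contrast, const on the referred-to type is part of the parameter type. In this sketch, with a hypothetical kind function, int& and const int& are distinct overloads: a modifiable lvalue selects the int& version, while a const object or a temporary can only bind to const int&.

```cpp
#include <string>
#include <type_traits>

// Unlike top-level const on by-value parameters, const on the referred-to
// type changes the parameter type, so these are two distinct overloads.
static_assert(!std::is_same<void(int&), void(const int&)>::value,
              "int& and const int& are different parameter types");

std::string kind(int&) { return "int&"; }
std::string kind(const int&) { return "const int&"; }
```

With int x = 1; const int y = 2;, kind(x) returns "int&", while kind(y) and kind(3) return "const int&".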

Note that I didn’t write the names of the parameters in the declarations (in the header) as I’d been doing so far. This is also legal, for this example and for the previous ones. Parameter names aren’t required in a declaration, since the caller doesn’t need to know how you want to name your parameters. But parameter names are generally desirable in declarations, so the user can know at a glance what each parameter means and therefore what to pass in the call.

Surprisingly, parameter names aren’t needed in the definition of a function either. They are only needed if you actually use the parameter in the function; if you never use it, you can leave the parameter with its type but without a name. Why would a function declare a parameter that it never uses? Sometimes functions (or methods) are just part of an interface, like a callback interface, which defines certain parameters that are passed to the observer. The observer must create a callback with all the parameters that the interface specifies, since they’ll all be sent by the caller. But the observer may not be interested in all of them, so instead of getting a compiler warning about an “unused parameter,” the function definition can just leave it unnamed.
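A minimal sketch of that situation, assuming a hypothetical DownloadObserver callback interface: the observer only needs the progress values, so it leaves the url parameter unnamed (keeping the name in a comment for readability) while its signature still matches the interface.

```cpp
#include <string>

// Hypothetical callback interface: every observer receives all three values.
class DownloadObserver {
public:
    virtual ~DownloadObserver() = default;
    virtual void onProgress(const std::string& url, long done, long total) = 0;
};

// This observer only cares about the percentage, so it leaves the url
// parameter unnamed: the signature still matches the interface, and
// -Wunused-parameter has nothing to warn about.
class PercentObserver : public DownloadObserver {
public:
    int lastPercent = 0;

    void onProgress(const std::string& /* url */, long done, long total) override {
        if (total > 0) {
            lastPercent = static_cast<int>(done * 100 / total);
        }
    }
};
```

Since C++17, marking the parameter [[maybe_unused]] is another way to silence the warning, useful when a parameter is used only in some builds; the GCC 4.8.5 used for this article doesn’t support C++17, though.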

Pass by Pointer


// by-pointer.cpp:
#include <cstddef> // NULL
#include <iostream>
#include <vector>
#include <numeric>

using namespace std;

int sum(int const * a, int const * const b) {
    cout << "sum(int const *, int const * const)" << endl;
    const int c = *a + *b;
    // *a = 4; // Can't change. The value pointed to is const.
    // *b = 4; // Can't change. The value pointed to is const.
    a = b; // I can make a point to another const int
    // b = a; // Can't change where b points because the pointer itself is const.
    return c;
}

float sum(float * const a, float * b) {
    cout << "sum(float * const, float *)" << endl;
    return *a + *b;
}

int sum(const std::vector<int>* v) {
    cout << "sum(std::vector<int> const *)" << endl;
    // v->clear(); // I can't modify the const object pointed to by v
    const int c = accumulate(v->begin(), v->end(), 0);
    v = NULL; // I can make v point to somewhere else
    return c;
}

float sum(const std::vector<float> * const v) {
    cout << "sum(std::vector<float> const * const)" << endl;
    // v->clear(); // I can't modify the const object pointed to by v
    // v = NULL; // I can't modify where the pointer points to
    return accumulate(v->begin(), v->end(), 0.0f);
}

To declare a pointer to a const element (an int in the example), you can write the type in either of two ways:


int const *
const int *

If you also want the pointer itself to be const, so that it cannot be changed to point to something else, add a const after the star:


int const * const
const int * const

If you want the pointer itself to be const, but not the element it points to:


int * const
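If you have a C++11 compiler (GCC 4.8 needs -std=c++11 for this), these equivalences can be checked at compile time with the standard <type_traits> header. In this short sketch, each static_assert compiles only if the stated relationship between the types holds:

```cpp
#include <cassert>
#include <type_traits>

// Both spellings name the same type: a (non-const) pointer to const int.
static_assert(std::is_same<int const *, const int *>::value, "same type");
static_assert(std::is_same<int const * const, const int * const>::value, "same type");

// The pointer itself is const only when const appears after the star.
static_assert(!std::is_const<const int *>::value, "the pointer is not const");
static_assert(std::is_const<int * const>::value, "the pointer is const");

// std::remove_pointer yields the pointed-to type, which carries the pointee's const.
static_assert(std::is_const<std::remove_pointer<const int *>::type>::value, "pointee is const");
static_assert(!std::is_const<std::remove_pointer<int * const>::type>::value, "pointee is not const");
```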

Compare the function signatures with the demangled inspection of the object file:


$ g++ -c by-pointer.cpp
$ nm -C by-pointer.o
000000000000004a T sum(float*, float*)
0000000000000000 T sum(int const*, int const*)
0000000000000105 T sum(std::vector<float, std::allocator<float> > const*)
000000000000009c T sum(std::vector<int, std::allocator<int> > const*)

As you see, the nm tool uses the first notation (const after the type). Also note that the only constness that is exported, and that matters to the caller, is whether the function will modify the element the pointer points to. The constness of the pointer itself is irrelevant to the caller, since the pointer is always passed as a copy: the function can only make its own copy point somewhere else, which the caller never sees.
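The same rule applies to any parameter: const at the top level of a parameter’s type only restricts the function body and is dropped from the function’s type, which is why nm doesn’t show it. A minimal sketch with a hypothetical scale function (static_assert and <type_traits> need C++11):

```cpp
#include <cassert>
#include <type_traits>

// Declaration, as a header would contain it: no top-level const.
float scale(float* value, float factor);

// Definition: making the pointer (and the copy of factor) const only
// restricts what the body may do; it is still the same function.
float scale(float* const value, const float factor) {
    *value *= factor;   // OK: the pointed-to float is not const
    // value = 0;       // error: the pointer itself is const here
    return *value;
}

// Top-level const is removed from parameter types when forming the function type.
static_assert(std::is_same<float(float*, float), float(float* const, const float)>::value,
              "top-level const is not part of the function type");
```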

So, a header file can be created as:


#include <vector>

int sum(int const* a, int const* b);
float sum(float* a, float* b);
int sum(std::vector<int> const*);
float sum(std::vector<float> const*);

Passing by pointer is similar to passing by reference. One difference is that when you pass by reference, the caller is expected to pass a reference to a valid object (a reference can’t legitimately refer to NULL or some other invalid address), while a pointer can be NULL. Pointers can be used instead of references when passing NULL has a special meaning.
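A typical case where NULL has a special meaning is an optional output parameter. In this sketch, the function name sumAndCount is hypothetical: the vector is passed by const reference because it must always exist, while count is a pointer so the caller can pass NULL when it doesn’t need the element count.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// v must always be a valid vector, so it is taken by const reference.
// count is optional: passing NULL means "I don't need the count".
int sumAndCount(const std::vector<int>& v, std::size_t* count) {
    if (count != NULL) {   // only write through the pointer if one was given
        *count = v.size();
    }
    return std::accumulate(v.begin(), v.end(), 0);
}
```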

Since C++11, values can also be passed with move semantics. This topic isn’t covered in this article, but you can study it in other articles such as Argument Passing in C++.

Another related topic that won’t be covered here is how to call all those functions. If a source file includes all those headers but doesn’t call the functions, compilation and linking will succeed. But if you try to call all of them, you will get errors, because some calls will be ambiguous: for certain arguments the compiler can choose more than one version of sum, especially when choosing between passing by copy and passing by reference (or const reference). That analysis is out of the scope of this article.

Compiling with Different Flags

Now let’s look at a real-life situation related to this subject, where hard-to-find bugs can show up.

Go to the directory cpp-article/diff-flags and look at Counters.hpp:


class Counters {
public:
    Counters() :
#ifndef NDEBUG // Enabled in debug builds
    m_debugAllCounters(0),
#endif
    m_counter1(0),
    m_counter2(0) {
    }

    void inc1() {
#ifndef NDEBUG // Enabled in debug builds
        ++m_debugAllCounters;
#endif
        ++m_counter1;
    }

    void inc2() {
#ifndef NDEBUG // Enabled in debug builds
        ++m_debugAllCounters;
#endif
        ++m_counter2;
    }

#ifndef NDEBUG // Enabled in debug builds
    int getDebugAllCounters() { return m_debugAllCounters; }
#endif
    int get1() const { return m_counter1; }
    int get2() const { return m_counter2; }

private:
#ifndef NDEBUG // Enabled in debug builds
    int m_debugAllCounters;
#endif
    int m_counter1;
    int m_counter2;
};

This class has two counters, which start at zero and can be incremented or read. For debug builds, which is what I’ll call builds where the NDEBUG macro isn’t defined, I also add a third counter, which is incremented whenever either of the other two counters is incremented. It acts as a kind of debug helper for this class. Many third-party library classes, and even standard library headers (depending on the compiler), use tricks like this to allow different levels of debugging; GCC’s libstdc++, for example, has a debug mode that is enabled by defining _GLIBCXX_DEBUG. This allows debug builds to detect iterators going out of range and other errors the library author anticipated. I’ll call release builds “builds where the NDEBUG macro is defined.”
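Here is a minimal sketch of that kind of trick, using a hypothetical helper (debugCheckedGet is not part of any real library): the same inline function compiles to a checked access in builds without NDEBUG and to a plain unchecked access in builds with NDEBUG.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper imitating a library debug mode: builds without NDEBUG
// check the index, builds with NDEBUG skip the check entirely.
inline int debugCheckedGet(const std::vector<int>& v, std::size_t i) {
#ifndef NDEBUG
    if (i >= v.size()) {
        throw std::out_of_range("debugCheckedGet: index out of range");
    }
#endif
    return v[i];
}
```

Like Counters, this helper has two different definitions depending on the flags, so every file that uses it should be compiled with the same NDEBUG setting.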

For release builds, the preprocessed header looks like this (I use grep to remove blank lines):


$ g++ -E -DNDEBUG Counters.hpp | grep -v -e '^$'
# 1 "Counters.hpp"
# 1 "&lt;built-in&gt;"
# 1 "&lt;command-line&gt;"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "&lt;command-line&gt;" 2
# 1 "Counters.hpp"
class Counters {
public:
    Counters() :
    m_counter1(0),
    m_counter2(0) {
    }
    void inc1() {
        ++m_counter1;
    }
    void inc2() {
        ++m_counter2;
    }
    int get1() const { return m_counter1; }
    int get2() const { return m_counter2; }
private:
    int m_counter1;
    int m_counter2;
};

For debug builds, it looks like this:


$ g++ -E Counters.hpp | grep -v -e '^$'
# 1 "Counters.hpp"
# 1 "&lt;built-in&gt;"
# 1 "&lt;command-line&gt;"
# 1 "/usr/include/stdc-predef.h" 1 3 4
# 1 "&lt;command-line&gt;" 2
# 1 "Counters.hpp"
class Counters {
public:
    Counters() :
    m_debugAllCounters(0),
    m_counter1(0),
    m_counter2(0) {
    }
    void inc1() {
        ++m_debugAllCounters;
        ++m_counter1;
    }
    void inc2() {
        ++m_debugAllCounters;
        ++m_counter2;
    }
    int getDebugAllCounters() { return m_debugAllCounters; }
    int get1() const { return m_counter1; }
    int get2() const { return m_counter2; }
private:
    int m_debugAllCounters;
    int m_counter1;
    int m_counter2;
};

There is one more counter in debug builds, as I explained earlier.
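The extra member matters because it changes where every other member lives inside the object. The sketch below copies the data members of each variant into two hypothetical structs, ReleaseLayout and DebugLayout (names made up for this comparison), and compares member offsets with the standard offsetof macro. On a typical x86-64 system with 4-byte int, m_counter1 is at offset 0 in the release layout but at offset 4 in the debug layout, where offset 0 belongs to m_debugAllCounters.

```cpp
#include <cassert>
#include <cstddef>

// Data members of Counters as compiled with -DNDEBUG (release).
struct ReleaseLayout {
    int m_counter1;
    int m_counter2;
};

// Data members of Counters as compiled without NDEBUG (debug).
struct DebugLayout {
    int m_debugAllCounters;
    int m_counter1;
    int m_counter2;
};

// Byte offset at which each build expects to find m_counter1.
const std::size_t releaseCounter1Offset = offsetof(ReleaseLayout, m_counter1);
const std::size_t debugCounter1Offset   = offsetof(DebugLayout, m_counter1);
```

Code compiled against the release layout that increments m_counter1 therefore writes to the bytes where code compiled against the debug layout keeps m_debugAllCounters.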

I also created some helper files.


// increment1.hpp:
// Forward declaration so I don't have to include the entire header here
class Counters;

// The return type must match the definition in increment1.cpp
void increment1(Counters&);

// increment1.cpp:
#include "Counters.hpp"

void increment1(Counters& c) {
    c.inc1();
}

// increment2.hpp:
// Forward declaration so I don't have to include the entire header here
class Counters;

// The return type must match the definition in increment2.cpp
void increment2(Counters&);

// increment2.cpp:
#include "Counters.hpp"

void increment2(Counters& c) {
    c.inc2();
}

// main.cpp:
#include <iostream>
#include "Counters.hpp"
#include "increment1.hpp"
#include "increment2.hpp"

using namespace std;

int main(int argc, char* argv[]) {
    Counters c;
    increment1(c); // 3 times
    increment1(c);
    increment1(c);
    increment2(c); // 4 times
    increment2(c);
    increment2(c);
    increment2(c);
    cout << "c.get1(): " << c.get1() << endl; // Should be 3
    cout << "c.get2(): " << c.get2() << endl; // Should be 4
#ifndef NDEBUG // For debug builds
    cout << "c.getDebugAllCounters(): " << c.getDebugAllCounters() << endl; // Should be 3 + 4 = 7
#endif
    return 0;
}

And a Makefile that can customize the compiler flags for increment1.cpp only:


all: main.o increment1.o increment2.o
	g++ -o diff-flags main.o increment1.o increment2.o

main.o: main.cpp increment1.hpp increment2.hpp Counters.hpp
	g++ -c -O2 main.cpp

increment1.o: increment1.cpp Counters.hpp
	g++ -c $(CFLAGS) -O2 increment1.cpp

increment2.o: increment2.cpp Counters.hpp
	g++ -c -O2 increment2.cpp

clean:
	rm -f *.o diff-flags

So, let’s compile it all in debug mode, without defining NDEBUG:


$ CFLAGS='' make
g++ -c -O2 main.cpp
g++ -c  -O2 increment1.cpp
g++ -c -O2 increment2.cpp
g++ -o diff-flags main.o increment1.o increment2.o

Now run:


$ ./diff-flags
c.get1(): 3
c.get2(): 4
c.getDebugAllCounters(): 7

The output is just as expected. Now let’s compile just one of the files with NDEBUG defined, which would be release mode, and see what happens:


$ make clean
rm -f *.o diff-flags
$ CFLAGS='-DNDEBUG' make
g++ -c -O2 main.cpp
g++ -c -DNDEBUG -O2 increment1.cpp
g++ -c -O2 increment2.cpp
g++ -o diff-flags main.o increment1.o increment2.o
$ ./diff-flags
c.get1(): 0
c.get2(): 4
c.getDebugAllCounters(): 7

The output isn’t as expected. The increment1 function saw a release version of the Counters class, in which there are only two int member fields. So, it incremented the first field, thinking that it was m_counter1, and didn’t increment anything else, since it knows nothing about the m_debugAllCounters field. I say that increment1 incremented the counter because the inc1 method in Counters is defined inside the class body, which makes it implicitly inline, so it was most likely inlined into the increment1 function body rather than called from it. The compiler probably decided to inline it because the -O2 optimization level flag was used.

So, m_counter1 was never incremented, and increment1 incremented m_debugAllCounters in its place by mistake. That’s why we see 0 for m_counter1 but still see 7 for m_debugAllCounters: 3 mistaken increments from increment1 plus 4 correct ones from increment2. Formally, the program violates the One Definition Rule, since two different definitions of Counters ended up in the same program, so its behavior is undefined and the compiler and linker are not required to diagnose it.

In one project I worked on, with tons of source files grouped into many static libraries, some of those libraries were compiled without the debugging options for std::vector, while others were compiled with them.

Probably at some point all libraries were using the same flags, but as time passed, new libraries were added without taking those flags into consideration (they weren’t default flags; they had been added by hand). We used an IDE to compile, so to see the flags for each library you had to dig through tabs and windows, with different (and multiple) flags for each compilation mode (release, debug, profile…), which made it even harder to notice that the flags weren’t consistent.

As a result, on the rare occasions when an object file compiled with one set of flags passed a std::vector to an object file compiled with a different set of flags, which then performed certain operations on that vector, the application crashed. As you can imagine, it wasn’t easy to debug, since the crash was reported in the release version and didn’t happen in the debug version (at least not in the same situations that were reported).

The debugger also behaved strangely because it was debugging highly optimized code, and the crashes appeared to happen in correct, trivial code.

The Compiler Does a Lot More Than You May Think

In this article, you have learned about some of the basic language constructs of C++ and how the compiler works with them, from the preprocessing stage to the linking stage. Knowing how it works can help you look at the whole process differently and give you more insight into these processes that we take for granted in C++ development.

From a three-step compilation process to the mangling of function names and the production of different function signatures in different situations, the compiler does a lot of work to offer the power of C++ as a compiled programming language.

I hope you will find the knowledge from this article useful in your C++ projects.

This article was originally posted on Toptal.
