Extreme Integers – Doom from Below

As a beginner or intermediate C++ programmer, you have probably heard that you should never mix unsigned and signed integer types, or that you should avoid unsigned integers altogether. There was also this talk about undefined behaviour. Yet, in embedded software development, there is no way around unsigned integers – so what is behind all these warnings?

Wonderful that you strive to know the facts! I’m gladly taking you on a short journey – exploring some of the deepest abysses of the C++ language. After this experience, I hope you will perfectly understand how the compiler handles integer operations and therefore write better and more reliable code.

Preface

Before we start, I would like to talk about a few boring but crucial premises. Even though I address embedded development, all the facts are valid for desktop applications too. I will use a modern C++ syntax in my examples but keep all code on the level of the C++17 standard.

Be aware of the differences between C and C++. Not all things said for C++ are automatically true for C. If you are still using C, please move on to a more modern language, like Rust, Go or C++.

You will find all code examples in a git repository with CMake build files. As these are merely demonstrations of the topics I explain in this article, I omitted most of the usual comments and API documentation to keep them small and readable.

GitHub Repository

Integers and Processors

Wikipedia has an interesting article about integers. It states the word integer is Latin, meaning “whole” (or “untouched”). In mathematics, the term defines positive and negative whole numbers, including zero. This is basically true for computer processors as well, but for practical reasons, the numbers are limited to a certain size, and there is a special type with no negative numbers.

The size of an integer is given as the number of bits, and it is either signed or unsigned. Signed integers can represent negative numbers, while unsigned ones can only represent zero and positive numbers. On the hardware level, both kinds are stored as the same plain bit patterns. The only difference is the interpretation: for signed integers, a set highest bit indicates a negative value.
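To make the difference in interpretation concrete, here is a minimal sketch that reads the same eight bits once as an unsigned value and once as a two's complement signed value. The helper names `asUnsigned8` and `asSigned8` are made up for this illustration and are not part of the example repository. All checks run at compile time.

```cpp
#include <cstdint>

// Hypothetical helpers for illustration: both receive the same 8-bit pattern.

// Unsigned interpretation: the plain binary value, 0 to 255.
constexpr auto asUnsigned8(std::uint8_t bits) -> int {
    return bits;
}

// Signed (two's complement) interpretation: if the highest bit is set,
// the pattern stands for the unsigned value minus 2^8.
constexpr auto asSigned8(std::uint8_t bits) -> int {
    return (bits & 0x80u) != 0 ? static_cast<int>(bits) - 256
                               : static_cast<int>(bits);
}

static_assert(asUnsigned8(0b0111'1111) == 127 && asSigned8(0b0111'1111) == 127);
static_assert(asUnsigned8(0b1000'0000) == 128 && asSigned8(0b1000'0000) == -128);
static_assert(asUnsigned8(0b1111'1111) == 255 && asSigned8(0b1111'1111) == -1);
```

As long as the highest bit is clear, both interpretations agree. They only differ for patterns with the highest bit set.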

The illustration above shows the most commonly used integer sizes: 8, 16, 32 and 64 bit. Many processors can handle 128-bit and larger integers too. For each size, there is a signed and unsigned version. On the left side of the bit illustration, I wrote the type names from the standard library for a type of the given size.

The format of these type names is simple: unsigned integers start with “uint”, signed ones start with “int”. This is followed by the number of bits (8, 16, 32 or 64), and the name ends with the suffix _t to mark it as a type.

Small Excursion through the C++ Standard

The C++ standard is particularly fuzzy in how integer types are defined:

There are five standard signed integer types: “signed char”, “short int”, “int”, “long int”, and “long long int”. In this list, each type provides at least as much storage as those preceding it in the list. (…) The range of representable values for a signed integer type is −2^(N−1) to 2^(N−1) − 1 (inclusive), where N is called the width of the type.

– from ISO/IEC 14882:2020, Fundamental types

Also, the standard defines unsigned integers like this:

For each of the standard signed integer types, there exists a corresponding (but different) standard unsigned integer type: “unsigned char”, “unsigned short int”, “unsigned int”, “unsigned long int”, and “unsigned long long int”. (…) The range of representable values for the unsigned type is 0 to 2^N − 1 (inclusive); arithmetic for the unsigned type is performed modulo 2^N.

– from ISO/IEC 14882:2020, Fundamental types

These two sections leave much room for interpretation. Later, the sizes and behaviour of these fundamental types are defined further. There are minimum sizes defined for the types:

Type	Minimum Size in Bits
signed char	8
short	16
int	16 (!)
long	32
long long	64

The minimum size of the commonly used int type can be as small as 16 bits. A C++ implementation could also decide to make these types larger than expected, like 32 bits for char, short and int.

As the sizes of short, int and long depend on the compiler implementation for a platform, the standard library comes with the header <cstdint> where a set of types with defined sizes is declared:

int8_t, uint8_t,
int16_t, uint16_t,
int32_t, uint32_t,
int64_t, uint64_t

Never rely on specific sizes of short, int and long!

Use the intX_t and uintX_t type definitions from <cstdint> to get an integer of the specified size. If an integer of that size does not exist for the platform, the corresponding type is not defined.

So if you rely on a 32-bit value, but someone tries to compile your code on a platform that does not support 32-bit values, the compilation will fail instead of creating a program with unpredictable behaviour.
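The same idea works for assumptions about the language types: you can spell them out with `static_assert`, so the build fails early where they do not hold. The following is only a sketch. The variable template `bitWidth` is a hypothetical helper, and the last assertion is an example of a project-specific assumption, not a guarantee of the standard.

```cpp
#include <climits>
#include <cstdint>
#include <limits>

// Hypothetical helper: number of value bits plus the sign bit of an integer type.
template<typename T>
constexpr int bitWidth =
    std::numeric_limits<T>::digits + (std::numeric_limits<T>::is_signed ? 1 : 0);

// Minimum sizes that hold on every conforming implementation.
static_assert(bitWidth<signed char> >= 8);
static_assert(bitWidth<short> >= 16);
static_assert(bitWidth<int> >= 16);
static_assert(bitWidth<long> >= 32);
static_assert(bitWidth<long long> >= 64);

// If the exact-width aliases exist, they have exactly the stated width.
static_assert(bitWidth<std::int32_t> == 32);
static_assert(bitWidth<std::uint64_t> == 64);

// An assumption of one particular code base, written down explicitly.
// On a platform with a 16-bit int, the build stops here with a clear message.
static_assert(bitWidth<int> >= 32, "this code assumes int has at least 32 bits");
```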

Test the Language Implementation

Let’s start with a lightweight test of our compiler environment. We use a simple program checking out the implementation of all fundamental integer types.

Project 01-Types

#include <TypeInfo.hpp>

auto main() -> int {
    std::cout << "Fundamental Language Types:\n";
    printTypeInfo<char              >("char");
    printTypeInfo<wchar_t           >("wchar_t");
    printTypeInfo<signed char       >("signed char");
    printTypeInfo<unsigned char     >("unsigned char");
    printTypeInfo<signed short      >("signed short");
    printTypeInfo<unsigned short    >("unsigned short");
    printTypeInfo<signed int        >("signed int");
    printTypeInfo<unsigned int      >("unsigned int");
    printTypeInfo<signed long       >("signed long");
    printTypeInfo<unsigned long     >("unsigned long");
    printTypeInfo<signed long long  >("signed long long");
    printTypeInfo<unsigned long long>("unsigned long long");
    printTypeInfo<float             >("float");
    printTypeInfo<double            >("double");
    printTypeInfo<long double       >("long double");
    printTypeInfo<bool              >("bool");
    std::cout << "\nDefinitions from <cstdint>:\n";
    printTypeInfo<int8_t            >("int8_t");
    printTypeInfo<uint8_t           >("uint8_t");
    printTypeInfo<int16_t           >("int16_t");
    printTypeInfo<uint16_t          >("uint16_t");
    printTypeInfo<int32_t           >("int32_t");
    printTypeInfo<uint32_t          >("uint32_t");
    printTypeInfo<int64_t           >("int64_t");
    printTypeInfo<uint64_t          >("uint64_t");
// (...)
    return 0;
}

I use a template function printTypeInfo to display basic information about the type on the console. This template function is implemented in the header TypeInfo.hpp. As you see, I added char, but also the signed and unsigned version of char. From the C++ standard:

Type char is a distinct type that has an implementation-defined choice of “signed char” or “unsigned char” as its underlying type. The values of type char can represent distinct codes for all members of the implementation’s basic character set. (…)

– from ISO/IEC 14882:2020, Fundamental types

First, char is a distinct type, and second, it is either based on a signed char or an unsigned char.
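Both statements can be checked with type traits. This is a small sketch. The constant `charIsSigned` is a name chosen for this illustration.

```cpp
#include <limits>
#include <type_traits>

// char is its own type, different from both signed char and unsigned char ...
static_assert(!std::is_same_v<char, signed char>);
static_assert(!std::is_same_v<char, unsigned char>);

// ... but it has the same size as both of them.
static_assert(sizeof(char) == sizeof(signed char));
static_assert(sizeof(char) == sizeof(unsigned char));

// Whether plain char is signed depends on the platform.
// Query it instead of assuming one or the other.
constexpr bool charIsSigned = std::numeric_limits<char>::is_signed;
static_assert(charIsSigned == std::is_signed_v<char>);
static_assert(charIsSigned == (std::numeric_limits<char>::min() < 0));
```

Because char is a distinct type, an overload set can contain separate functions for char, signed char and unsigned char.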

When I compile and run the small executable on my ARM-based computer using clang, I get the following output for the language-defined types.

Fundamental Language Types:
                char: int8
             wchar_t: int32
         signed char: int8
       unsigned char: uint8
        signed short: int16
      unsigned short: uint16
          signed int: int32
        unsigned int: uint32
         signed long: int64
       unsigned long: uint64
    signed long long: int64
  unsigned long long: uint64
               float: float
              double: double
         long double: double
                bool: uint8

The main types defined in <cstdint> produce the output below:

Definitions from <cstdint>:
              int8_t: int8
             uint8_t: uint8
             int16_t: int16
            uint16_t: uint16
             int32_t: int32
            uint32_t: uint32
             int64_t: int64
            uint64_t: uint64

Compile the example with the compilers and platforms of your choice. You will most likely see different results, especially for short and long. Also, char is signed on some platforms and unsigned on others.

In contrast to language types like int, the types int8_t to uint64_t from the <cstdint> header have exactly the stated size whenever they are defined for your compile environment, never smaller or larger. This is defined in the C standard, so you can rely on these sizes.

The <cstdint> header defines further sets of integer types: int_leastX_t and int_fastX_t, plus their unsigned counterparts. These are the smallest integer type with at least X bits and the fastest integer type for the CPU with at least X bits. There is also intmax_t, the largest integer type of the platform, and intptr_t, an integer that can hold a pointer.

More Definitions from <cstdint>:
         int_fast8_t: int8
        uint_fast8_t: uint8
        int_fast16_t: int16
       uint_fast16_t: uint16
        int_fast32_t: int32
       uint_fast32_t: uint32
        int_fast64_t: int64
       uint_fast64_t: uint64
        int_least8_t: int8
       uint_least8_t: uint8
       int_least16_t: int16
      uint_least16_t: uint16
       int_least32_t: int32
      uint_least32_t: uint32
       int_least64_t: int64
      uint_least64_t: uint64
            intmax_t: int64
           uintmax_t: uint64
            intmax_t: int64
           uintmax_t: uint64
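What you can rely on for these types is only the lower bound, not the exact size. A short sketch of checks that should hold on any mainstream platform (intptr_t is an optional type, so the last line assumes that it exists):

```cpp
#include <cstdint>
#include <limits>

// "least" types: at least the requested number of bits.
static_assert(std::numeric_limits<std::int_least16_t>::digits >= 15);
static_assert(std::numeric_limits<std::uint_least16_t>::digits >= 16);

// "fast" types: never narrower than the matching "least" type, but possibly wider.
static_assert(std::numeric_limits<std::int_fast16_t>::digits >=
              std::numeric_limits<std::int_least16_t>::digits);

// intmax_t covers at least the range of long long.
static_assert(std::numeric_limits<std::intmax_t>::digits >=
              std::numeric_limits<long long>::digits);

// Assumption: intptr_t exists on this platform. It is as large as a data pointer here.
static_assert(sizeof(std::intptr_t) >= sizeof(void*));
```

Do not assume that int_fast16_t has 16 bits. Use it only for values where a wider type is acceptable.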

(Optional: Compare the result with GCC on a 64-bit Linux system)

Fundamental Language Types:
                char: uint8
             wchar_t: uint32
         signed char: int8
       unsigned char: uint8
        signed short: int16
      unsigned short: uint16
          signed int: int32
        unsigned int: uint32
         signed long: int64
       unsigned long: uint64
    signed long long: int64
  unsigned long long: uint64
               float: float
              double: double
         long double: long double
                bool: uint8

Definitions from <cstdint>:
              int8_t: int8
             uint8_t: uint8
             int16_t: int16
            uint16_t: uint16
             int32_t: int32
            uint32_t: uint32
             int64_t: int64
            uint64_t: uint64

More Definitions from <cstdint>:
         int_fast8_t: int8
        uint_fast8_t: uint8
        int_fast16_t: int64
       uint_fast16_t: uint64
        int_fast32_t: int64
       uint_fast32_t: uint64
        int_fast64_t: int64
       uint_fast64_t: uint64
        int_least8_t: int8
       uint_least8_t: uint8
       int_least16_t: int16
      uint_least16_t: uint16
       int_least32_t: int32
      uint_least32_t: uint32
       int_least64_t: int64
      uint_least64_t: uint64
            intmax_t: int64
           uintmax_t: uint64
            intmax_t: int64
           uintmax_t: uint64

Unexpected Results with Integer Literals

To use specific integer values in your program, you write them as literals. There are many options for writing integer literals: prefixes select the number system, and suffixes select the integer type.

Prefixes

Prefix	Meaning	Examples
no prefix	Decimal Number	`0` `56'293` `7000`
`0b` or `0B`	Binary Number	`0b10010011` `0B00110011'00001111`
`0`	Octal Number	`074`
`0x` or `0X`	Hexadecimal Number	`0xffff` `0X20` `0x1000'47ab`

You can (and should) use the apostrophe character ' to group digits for all number systems. If a decimal number literal contains a decimal point ., it is interpreted as a floating point number.
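A few compile-time checks illustrate the prefixes and the digit separator. The value 60 is just an arbitrary example.

```cpp
#include <type_traits>

// The same value, 60, written in all four number systems.
static_assert(60 == 0b11'1100);
static_assert(60 == 074);   // leading zero means octal, this is not decimal 74!
static_assert(60 == 0x3c);

// Digit separators do not change the value.
static_assert(56'293 == 56293);
static_assert(0x1000'47ab == 0x100047ab);

// A decimal point turns the literal into a floating-point value.
static_assert(std::is_same_v<decltype(7000), int>);
static_assert(std::is_same_v<decltype(7000.0), double>);
```

The octal prefix is the one that causes accidents: a leading zero added to align numbers in a table changes the value.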

Suffixes

Suffix	Meaning	Examples
no suffix	`int`, or a larger type if the value does not fit	`5780`
`l` or `L`	`long`	`5780l` `5780ul`
`u` or `U`	`unsigned`	`5780u`
`ll` or `LL`	`long long`	`5780ll` `5780ull`

There are no suffixes to create char, unsigned char, signed char and short values from numbers. To create a char, you have to use the syntax like '\xff'.

Project 02-literals

Let’s try this out and see what types we get from several different literals:

#include <TypeInfo.hpp>


auto main() -> int {
    std::cout << "Type from Literal:\n";
    std::cout << "100         => " << TypeInfo(100).str() << "\n";
    std::cout << "070         => " << TypeInfo(070).str() << "\n";
    std::cout << "0b10001000  => " << TypeInfo(0b10001000).str() << "\n";
    std::cout << "0b10001000u => " << TypeInfo(0b10001000u).str() << "\n";
    std::cout << "0x1000      => " << TypeInfo(0x1000).str() << "\n";
    std::cout << "0x1000u     => " << TypeInfo(0x1000u).str() << "\n";
    std::cout << "100u        => " << TypeInfo(100u).str() << "\n";
    std::cout << "100l        => " << TypeInfo(100l).str() << "\n";
    std::cout << "100ul       => " << TypeInfo(100ul).str() << "\n";
    std::cout << "100ll       => " << TypeInfo(100ll).str() << "\n";
    std::cout << "100ull      => " << TypeInfo(100ull).str() << "\n";
}

After compiling this code, I get the following output:

Type from Literal:
100         => int32{100}
070         => int32{56}
0b10001000  => int32{136}
0b10001000u => uint32{136}
0x1000      => int32{4096}
0x1000u     => uint32{4096}
100u        => uint32{100}
100l        => int64{100}
100ul       => uint64{100}
100ll       => int64{100}
100ull      => uint64{100}

All literals without a suffix get the type int, as long as the value fits into it. If you add u for unsigned, you get an unsigned int. Adding l produces long types, and ll produces long long types.
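The types can also be verified without running anything, using `decltype`. The sketch below assumes a 32-bit int, as on the platforms shown above, and states this assumption as its own assertion.

```cpp
#include <cstdint>
#include <type_traits>

// The suffix selects the type.
static_assert(std::is_same_v<decltype(100), int>);
static_assert(std::is_same_v<decltype(100u), unsigned int>);
static_assert(std::is_same_v<decltype(100l), long>);
static_assert(std::is_same_v<decltype(100ul), unsigned long>);
static_assert(std::is_same_v<decltype(100ll), long long>);
static_assert(std::is_same_v<decltype(100ull), unsigned long long>);

// Assumption for the next checks: int has 32 bits.
static_assert(sizeof(int) == 4, "sketch assumes a 32-bit int");

// A decimal literal that is too large for int moves to a larger signed type.
static_assert(sizeof(decltype(3'000'000'000)) > sizeof(int));
static_assert(std::is_signed_v<decltype(3'000'000'000)>);

// Hexadecimal, octal and binary literals may become unsigned instead.
static_assert(std::is_same_v<decltype(0xffff'ffff), unsigned int>);

// There is no suffix for the small types. Brace initialisation creates them.
static_assert(std::is_same_v<decltype(std::int16_t{100}), std::int16_t>);
static_assert(std::is_same_v<decltype('\xff'), char>);
```

Note the different rules for decimal and non-decimal literals: only the latter can silently become unsigned.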

How Negative Numbers are Handled

You may have wondered why I only used positive values so far, even for the signed integer types. The reason is that there are no negative integer literals in C++. Instead, the C++ language defines the unary + and - operators. While the unary + operator does not change the value, the unary - operator is defined as follows:

The operand of the unary – operator shall have arithmetic or unscoped enumeration type and the result is the negation of its operand. Integral promotion is performed on integral or enumeration operands. The negative of an unsigned quantity is computed by subtracting its value from 2ⁿ, where n is the number of bits in the promoted operand. The type of the result is the type of the promoted operand.

– from ISO/IEC 14882:2020, Unary operators

If you write a negative integer literal, like -500, the compiler interprets this as negation(500). This is no problem for most integer literals, but at the extremes, you get side effects.
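The rule from the standard can be observed directly with small values. A minimal sketch, checked at compile time:

```cpp
#include <limits>
#include <type_traits>

// -500 is the literal 500 with the unary minus operator applied to it.
static_assert(std::is_same_v<decltype(-500), int>);

// Applied to an unsigned operand, the result stays unsigned:
// the value is subtracted from 2^n.
static_assert(std::is_same_v<decltype(-1u), unsigned int>);
static_assert(-1u == std::numeric_limits<unsigned int>::max());
static_assert(-500u == std::numeric_limits<unsigned int>::max() - 499u);

// Types smaller than int are promoted to int first,
// so negating an unsigned char gives a negative int.
static_assert(std::is_same_v<decltype(-static_cast<unsigned char>(1)), int>);
static_assert(-static_cast<unsigned char>(1) == -1);
```

The operator never changes the type of its promoted operand. A "negative" unsigned literal is therefore always a large positive number.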

Let’s compile the following project and check its output:

Project 03-negation

#include <TypeInfo.hpp>


void literals() {
    std::cout << "Unexpected Side Effects:\n";
    auto int8a = std::numeric_limits<int8_t>::min();
    auto int8b = -128;
    int8_t int8c = -128;
    std::cout << "int8a = " << TypeInfo(int8a).str() << "\n";
    std::cout << "int8b = " << TypeInfo(int8b).str() << "\n";
    std::cout << "int8c = " << TypeInfo(int8c).str() << "\n\n";

    auto int16a = std::numeric_limits<int16_t>::min();
    auto int16b = -32768;
    int16_t int16c = -32768;
    std::cout << "int16a = " << TypeInfo(int16a).str() << "\n";
    std::cout << "int16b = " << TypeInfo(int16b).str() << "\n";
    std::cout << "int16c = " << TypeInfo(int16c).str() << "\n\n";

    auto int32a = std::numeric_limits<int32_t>::min();
    auto int32b = -2147483648;
    int32_t int32c = -2147483648;
    std::cout << "int32a = " << TypeInfo(int32a).str() << "\n";
    std::cout << "int32b = " << TypeInfo(int32b).str() << "\n";
    std::cout << "int32c = " << TypeInfo(int32c).str() << "\n\n";

    auto int64a = std::numeric_limits<int64_t>::min();
    auto int64b = -9223372036854775808ll;
    int64_t int64c = -9223372036854775808ll;
    std::cout << "int64a = " << TypeInfo(int64a).str() << "\n";
    std::cout << "int64b = " << TypeInfo(int64b).str() << "\n";
    std::cout << "int64c = " << TypeInfo(int64c).str() << "\n\n";
}

For each signed integer type, we create a variable and initialise it with the smallest possible value. First using std::numeric_limits<T>::min(), second as integer literal with auto as type and last with the same integer literal – forcing it into the expected type.

If you compile the project, you get the following warning:

warning: integer literal is too large to be represented in a signed integer type, interpreting as unsigned [-Wimplicitly-unsigned-literal]
    auto int64b = -9223372036854775808ll;

The reason for this warning is how the compiler reads the code. It reads the literal without the negative sign and tries to fit it into the largest signed integer type. This is not possible, as the number is too large. The compiler is forced to interpret the number as an unsigned integer, which is why it issues this warning.

After reading the literal, the unary negation operator is applied to the value. Yet, in this special case, the operator does not change the value, as you will see shortly.

Signed Integer Ranges are not Balanced

The illustration below shows this imbalance using a 4-bit integer.

As you can see, a signed integer can represent more negative than positive numbers. This is true of all signed integers. There is always one negative value more, which is the source of many problems.
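The imbalance is easy to verify. In the sketch below, `signedMin`, `signedMax` and `hasOneExtraNegative` are hypothetical helpers written for this illustration. They assume the two's complement representation used by all mainstream compilers.

```cpp
#include <cstdint>
#include <limits>

// Range of a two's complement integer with the given number of bits.
// This sketch is only valid for 1 to 63 bits.
constexpr auto signedMin(int bits) -> long long { return -(1LL << (bits - 1)); }
constexpr auto signedMax(int bits) -> long long { return (1LL << (bits - 1)) - 1; }

// The 4-bit example: eight negative values, but only seven positive ones.
static_assert(signedMin(4) == -8 && signedMax(4) == 7);
static_assert(signedMin(8) == std::numeric_limits<std::int8_t>::min());
static_assert(signedMax(8) == std::numeric_limits<std::int8_t>::max());

// min + max == -1 means there is exactly one more negative value.
template<typename T>
constexpr bool hasOneExtraNegative =
    std::numeric_limits<T>::min() + std::numeric_limits<T>::max() == -1;

static_assert(hasOneExtraNegative<std::int8_t>);
static_assert(hasOneExtraNegative<std::int16_t>);
static_assert(hasOneExtraNegative<std::int32_t>);
static_assert(hasOneExtraNegative<std::int64_t>);
```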

Analysing the Output from the Project

With this knowledge, we understand the strange output from the project. The code was compiled using clang on a 64-bit AMD platform.

Unexpected Side Effects:
int8a = int8{-128}
int8b = int32{-128}
int8c = int8{-128}

int16a = int16{-32768}
int16b = int32{-32768}
int16c = int16{-32768}

int32a = int32{-2147483648}
int32b = int64{-2147483648}
int32c = int32{-2147483648}

int64a = int64{-9223372036854775808}
int64b = uint64{9223372036854775808}
int64c = int64{-9223372036854775808}

(Optional: See the (correct) results compiled with GCC on a 64-bit Linux platform.)

Unexpected Side Effects:
int8a = int8{-128}
int8b = int32{-128}
int8c = int8{-128}

int16a = int16{-32768}
int16b = int32{-32768}
int16c = int16{-32768}

int32a = int32{-2147483648}
int32b = int64{-2147483648}
int32c = int32{-2147483648}

int64a = int64{-9223372036854775808}
int64b = int128{-9223372036854775808}
int64c = int64{-9223372036854775808}

Side Effects:
-9'223'372'036'854'775'808ll / 10'000'000 = int128{-922337203685}
int64_t{-9'223'372'036'854'775'808ll / 10'000'000} = int64{-922337203685}
std::numeric_limits<int64_t>::min() / 10'000'000 = int64{-922337203685}

Because GCC is using a 128-bit integer, these results are correct.

The 8-bit and 16-bit values behave as expected. As there is no suffix to limit an integer literal to a type smaller than int, the values -128 and -32768 were interpreted as signed 32-bit values.

The first interesting oddity is how variable int32b with value -2'147'483'648 is interpreted as 64-bit, even though the value perfectly fits into a signed 32-bit integer. As you now understand, a compiler has to come to this conclusion.

If an integer-literal cannot be represented by any type in its list and an extended integer type can represent its value, it may have that extended integer type. (…) A program is ill-formed if one of its translation units contains an integer-literal that cannot be represented by any of the allowed types.

– from ISO/IEC 14882:2020, Integer literals

The compiler reads the literal 2'147'483'648, which is larger than the largest signed 32-bit integer (2'147'483'647). Therefore, it uses a signed 64-bit integer.
Next, the negate operator is applied to the literal. As the operand is signed, the result is simply the negative value -2'147'483'648, still stored as a 64-bit integer.

If you force the value back into a signed 32-bit integer, you end up with the correct value – this is also the case with the last problem, variable int64b.

The compiler reads the literal 9'223'372'036'854'775'808. It does not fit into a signed 64-bit integer, and there is no larger fundamental integer type. Therefore, the compiler puts the value into an unsigned 64-bit value, issuing a warning.
Now, the negate operator is applied to the literal:
2⁶⁴ – 9'223'372'036'854'775'808 => 9'223'372'036'854'775'808

Compared to the change from a signed 32-bit to a signed 64-bit value, this situation is problematic, which is why the compiler issued the warning. If this literal is used as part of an expression, the expression is evaluated with an unsigned type, which can cause unexpected results.

void sideEffects() {
    std::cout << "Side Effects:\n";
    auto r1 = -9'223'372'036'854'775'808ll / 10'000'000;
    std::cout << "-9'223'372'036'854'775'808ll / 10'000'000 = " << TypeInfo(r1).str() << "\n";
    int64_t r2 = -9'223'372'036'854'775'808ll / 10'000'000;
    std::cout << "int64_t{-9'223'372'036'854'775'808ll / 10'000'000} = " << TypeInfo(r2).str() << "\n";
    auto r3 = std::numeric_limits<int64_t>::min() / 10'000'000;
    std::cout << "std::numeric_limits<int64_t>::min() / 10'000'000 = " << TypeInfo(r3).str() << "\n";
}

The output from this part is shown below:

Side Effects:
-9'223'372'036'854'775'808ll / 10'000'000 = uint64{922337203685}
int64_t{-9'223'372'036'854'775'808ll / 10'000'000} = int64{922337203685}
std::numeric_limits<int64_t>::min() / 10'000'000 = int64{-922337203685}

Because the dividend of the division is an unsigned integer, the result is a positive value instead of a negative one. Even if the result of the expression is assigned to a signed integer, the problem remains.

The safe way to get the correct result is by using std::numeric_limits<int64_t>::min().
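If you ever have to write the minimum as a literal expression, a common idiom is to write the largest value that still fits and subtract one. The sketch below shows this for 64 bits. The constant name `int64Min` is chosen for this illustration.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// No literal in this expression exceeds the range of long long:
// 9'223'372'036'854'775'807 is the largest value that still fits.
constexpr auto int64Min = -9'223'372'036'854'775'807ll - 1;

static_assert(std::is_same_v<decltype(int64Min), const long long>);
static_assert(int64Min == std::numeric_limits<std::int64_t>::min());

// The division from the example above now stays signed.
static_assert(int64Min / 10'000'000 == -922'337'203'685ll);
```

Still, std::numeric_limits remains the more readable choice, as it names the intention instead of a magic number.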

If you need an integer literal of a defined size and type, initialize it like this: int16_t{-12}
Use std::numeric_limits<T>::min() and std::numeric_limits<T>::max() to get the smallest or largest value of an integer type. While writing the maximum value as a literal is not problematic, numeric_limits spells out your intention to use a special value of the integer range.
Be aware that the minus sign in front of numbers is an operator and not part of the literal.
If undefined behaviour is involved, the code may work fine with one compiler or even one configuration (debug) but fail with another compiler or in another configuration (release).
Enable all warnings, e.g. with -Wall and -Wextra. Never ignore warnings.

Signed Integer Math is Strange

Math with signed integers has several weaknesses. Let’s discover them. First, as mentioned before, signed integers have more negative numbers than positive ones. There is also always one negative number that cannot be converted into a positive equivalent.

Make Negative Numbers Positive / Project 04-math

void makePositive() {
    std::cout << "Make Positive:\n";
    int8_t r1 = std::numeric_limits<int8_t>::min() * int8_t{-1};
    std::cout << "int8_t r1 = std::numeric_limits<int8_t>::min() * int8_t{-1}; r1 = " << TypeInfo(r1).str() << "\n";
    int32_t r2 = std::numeric_limits<int32_t>::min() * -1;
    std::cout << "int32_t r2 = std::numeric_limits<int32_t>::min() * -1; r2 = " << TypeInfo(r2).str() << "\n";
    int8_t r3 = std::abs(std::numeric_limits<int8_t>::min());
    std::cout << "int8_t r3 = std::abs(std::numeric_limits<int8_t>::min()); r3 = " << TypeInfo(r3).str() << "\n";
    int32_t r4 = std::abs(std::numeric_limits<int32_t>::min());
    std::cout << "int32_t r4 = std::abs(std::numeric_limits<int32_t>::min()); r4 = " << TypeInfo(r4).str() << "\n\n";
}

If I compile the code above, I get the following two warning messages:

04-math/src/main.cpp:12:54: warning: overflow in expression; result is -2147483648 with type 'int' [-Winteger-overflow]
    int32_t r2 = std::numeric_limits<int32_t>::min() * -1;
                                                     ^
04-math/src/main.cpp:10:52: warning: implicit conversion from 'int' to 'int8_t' (aka 'signed char') changes value from 128 to -128 [-Wconstant-conversion]
    int8_t r1 = std::numeric_limits<int8_t>::min() * int8_t{-1};
           ~~   ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~~~~~~~~~~

Because of the two warning messages, we have to expect problems with the first two calculations (r1 and r2). For the last two calculations, there are no warnings.

Make Positive:
int8_t r1 = std::numeric_limits<int8_t>::min() * int8_t{-1}; r1 = int8{-128}
int32_t r2 = std::numeric_limits<int32_t>::min() * -1; r2 = int32{-2147483648}
int8_t r3 = std::abs(std::numeric_limits<int8_t>::min()); r3 = int8{-128}
int32_t r4 = std::abs(std::numeric_limits<int32_t>::min()); r4 = int32{-2147483648}

By using auto the compiler can choose the result type:

void makePositiveWithAuto() {
    std::cout << "Make Positive using auto:\n";
    auto r1 = std::numeric_limits<int8_t>::min() * int8_t{-1};
    std::cout << "auto r1 = std::numeric_limits<int8_t>::min() * int8_t{-1}; r1 = " << TypeInfo(r1).str() << "\n";
    auto r2 = std::numeric_limits<int32_t>::min() * -1;
    std::cout << "auto r2 = std::numeric_limits<int32_t>::min() * -1; r2 = " << TypeInfo(r2).str() << "\n";
    auto r3 = std::abs(std::numeric_limits<int8_t>::min());
    std::cout << "auto r3 = std::abs(std::numeric_limits<int8_t>::min()); r3 = " << TypeInfo(r3).str() << "\n";
    auto r4 = std::abs(std::numeric_limits<int32_t>::min());
    std::cout << "auto r4 = std::abs(std::numeric_limits<int32_t>::min()); r4 = " << TypeInfo(r4).str() << "\n";
    auto r5 = static_cast<uint32_t>(std::abs(std::numeric_limits<int32_t>::min()));
    std::cout << "auto r5 = static_cast<uint32_t>(std::abs(std::numeric_limits<int32_t>::min())); r5 = " << TypeInfo(r5).str() << "\n\n";
}

Make Positive using auto:
auto r1 = std::numeric_limits<int8_t>::min() * int8_t{-1}; r1 = int32{128}
auto r2 = std::numeric_limits<int32_t>::min() * -1; r2 = int32{-2147483648}
auto r3 = std::abs(std::numeric_limits<int8_t>::min()); r3 = int32{128}
auto r4 = std::abs(std::numeric_limits<int32_t>::min()); r4 = int32{-2147483648}
auto r5 = static_cast<uint32_t>(std::abs(std::numeric_limits<int32_t>::min())); r5 = uint32{2147483648}

Because int8_t is automatically promoted to int, which is a 32-bit integer here, the results r1 and r3 are correct now.

The literal -1 is interpreted as int, and std::numeric_limits<int32_t>::min() is an int as well, so the multiplication for r2 does not change the integer size of the result. The same is true for std::abs at r4. As there is no matching positive number for the minimum, both results are incorrect.

The compiler issues a warning for the expression at r2, but there is no warning if you use std::abs. For the result r4, the function std::abs seems to have no effect: in two's complement, negating the minimum value produces the same bit pattern again. Formally, the behaviour of std::abs is undefined if the result cannot be represented.

If you cast the result to an unsigned integer, as shown with r5, you see the correct result.
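The promotion that rescues the 8-bit case, and the missing room in the 32-bit case, can both be seen in the types of the expressions. The following sketch assumes a 32-bit int. Widening one operand by hand, as in the last check, is one possible way to give the multiplication enough room.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Operands smaller than int are promoted to int before the multiplication,
// so the result 128 fits without problems.
static_assert(std::is_same_v<decltype(std::int8_t{1} * std::int8_t{1}), int>);
static_assert(std::numeric_limits<std::int8_t>::min() * std::int8_t{-1} == 128);

// Operands that already are int stay int: no promotion, no extra room.
static_assert(std::is_same_v<decltype(std::int32_t{1} * -1), int>);

// Widening one operand first makes the whole multiplication 64-bit.
static_assert(std::int64_t{std::numeric_limits<std::int32_t>::min()} * -1
              == 2'147'483'648);
```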

Make Signed to Unsigned Absolute

If your software at one point switches from signed to unsigned math and you have to deal with negative numbers, one solution is a simple function converting signed into absolute unsigned numbers.

template<typename T>
constexpr auto unsignedAbs(T value) -> std::make_unsigned_t<T> {
    static_assert(std::is_integral_v<T>);
    if constexpr (std::is_signed_v<T>) {
        using R = std::make_unsigned_t<T>;
        return (value == std::numeric_limits<T>::min()) ?
            (static_cast<R>(std::numeric_limits<T>::max()) + R{1u}) :
            static_cast<R>(std::abs(value));
    } else {
        return value;
    }
}


void makeUnsignedPositive() {
    std::cout << "Make Unsigned Positive:\n";
    auto r1 = unsignedAbs(std::numeric_limits<int8_t>::min());
    std::cout << "auto r1 = unsignedAbs(std::numeric_limits<int8_t>::min()); r1 = " << TypeInfo(r1).str() << "\n";
    auto r2 = unsignedAbs(std::numeric_limits<int16_t>::min());
    std::cout << "auto r2 = unsignedAbs(std::numeric_limits<int16_t>::min()); r2 = " << TypeInfo(r2).str() << "\n";
    auto r3 = unsignedAbs(std::numeric_limits<int32_t>::min());
    std::cout << "auto r3 = unsignedAbs(std::numeric_limits<int32_t>::min()); r3 = " << TypeInfo(r3).str() << "\n";
    auto r4 = unsignedAbs(std::numeric_limits<int64_t>::min());
    std::cout << "auto r4 = unsignedAbs(std::numeric_limits<int64_t>::min()); r4 = " << TypeInfo(r4).str() << "\n\n";
}

This function has the benefit of returning a valid positive number. If you look at the results, you get:

Make Unsigned Positive:
auto r1 = unsignedAbs(std::numeric_limits<int8_t>::min()); r1 = uint8{128}
auto r2 = unsignedAbs(std::numeric_limits<int16_t>::min()); r2 = uint16{32768}
auto r3 = unsignedAbs(std::numeric_limits<int32_t>::min()); r3 = uint32{2147483648}
auto r4 = unsignedAbs(std::numeric_limits<int64_t>::min()); r4 = uint64{9223372036854775808}
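For comparison, here is an alternative sketch that avoids std::abs and the special case altogether. It is not the implementation from the repository. The name `unsignedAbsModular` is made up, and the technique is different: the value is first converted to the unsigned type, which is well-defined modulo 2^N, and then negated with unsigned arithmetic.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical alternative to unsignedAbs, restricted to signed integer types.
template<typename T>
constexpr auto unsignedAbsModular(T value) -> std::make_unsigned_t<T> {
    static_assert(std::is_integral_v<T> && std::is_signed_v<T>,
                  "only signed integer types are supported");
    using R = std::make_unsigned_t<T>;
    // Converting to unsigned is defined as "value modulo 2^N".
    const auto bits = static_cast<R>(value);
    // For negative values, 0 - bits (again modulo 2^N) is the magnitude.
    // The cast back to R is required for types smaller than int.
    return value < 0 ? static_cast<R>(R{0} - bits) : bits;
}

static_assert(unsignedAbsModular(std::int8_t{-5}) == 5);
static_assert(unsignedAbsModular(std::int8_t{7}) == 7);
static_assert(unsignedAbsModular(std::numeric_limits<std::int8_t>::min()) == 128);
static_assert(unsignedAbsModular(std::numeric_limits<std::int32_t>::min())
              == 2'147'483'648u);
static_assert(unsignedAbsModular(std::numeric_limits<std::int64_t>::min())
              == 9'223'372'036'854'775'808ull);
```

No signed operation in this function can overflow, so the minimum value needs no extra branch.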

Remember that there is one exceptional negative value that cannot be converted into a positive one.
Implement a strategy to deal with this situation:
- Limit your number range to keep all calculations in a valid range.
- Switch from signed to unsigned integer values if you need to capture the full range of negative values.
- Always work with integers of defined size to stay in control of the side effects.
Compiler warnings about overflows are serious, even if they sometimes point at the wrong part of your code.
If undefined behaviour is involved, the code may work fine with one compiler or even one configuration (debug) but fail with another compiler or in another configuration (release).
Enable all warnings, e.g. with -Wall and -Wextra. Never ignore warnings.

(Optional: Undefined behaviour, and why the unsignedAbs function does not simply use static_cast and std::abs)

It is very tempting to implement the unsignedAbs function like this:

template<typename T>
constexpr auto unsignedAbs_undefined(T value) -> std::make_unsigned_t<T> {
    // THIS IS BAD CODE! IT RELIES ON UNDEFINED COMPILER BEHAVIOUR!
    static_assert(std::is_integral_v<T>);
    return static_cast<std::make_unsigned_t<T>>(std::abs(value));
}

This solution seems shorter and more elegant. If I compile it using clang, with no optimizations, I get this output:

Make Unsigned Positive (undefined behaviour):
auto r1 = unsignedAbs_undefined(std::numeric_limits<int8_t>::min()); r1 = uint8{128}
auto r2 = unsignedAbs_undefined(std::numeric_limits<int16_t>::min()); r2 = uint16{32768}
auto r3 = unsignedAbs_undefined(std::numeric_limits<int32_t>::min()); r3 = uint32{2147483648}
auto r4 = unsignedAbs_undefined(std::numeric_limits<int64_t>::min()); r4 = uint64{9223372036854775808}

This result looks perfect. It creates the unsigned absolute value for all four sizes correctly. If the same code is compiled using clang with optimizations, I get the following result:

Make Unsigned Positive (undefined behaviour):
auto r1 = unsignedAbs_undefined(std::numeric_limits<int8_t>::min()); r1 = uint8{128}
auto r2 = unsignedAbs_undefined(std::numeric_limits<int16_t>::min()); r2 = uint16{32768}
auto r3 = unsignedAbs_undefined(std::numeric_limits<int32_t>::min()); r3 = uint32{32768}
auto r4 = unsignedAbs_undefined(std::numeric_limits<int64_t>::min()); r4 = uint64{9223372036854808576}

If you get this undefined behaviour in your software, you will probably spend weeks trying to find it. Because only the release build is affected, you will have a hard time even finding the location of the problem. As both the static_cast and std::numeric_limits::min() have defined behaviour, the problem has to be in std::abs. I can only guess that the subtraction used to negate the value causes an overflow, which is undefined behaviour for signed integers.

Problems with Overflowing Operations

C++ has no protection in place to guarantee safe mathematical operations for integer types. Overflows in integer operations are silently ignored, leaving you in the best case with the lower part of the result or in the worst case causing undefined behaviour.

The range of representable values for the unsigned type is 0 to 2^N − 1 (inclusive); arithmetic for the unsigned type is performed modulo 2^N.

Unsigned arithmetic does not overflow. Overflow for signed arithmetic yields undefined behavior.

– from ISO/IEC 14882:2020, Types

No Problems with Unsigned Integers

If you do math using unsigned integers and an operation overflows, you simply get the expected result modulo 2^N, where N is the size of the type in bits. This works for additions, subtractions and multiplications, providing a predictable result for all these operations.

See the illustration above for the principle of modulo artithmetic. As the standard states, operations never overflow. If the result of an operation would generate bits outside of the size of the target integer, they are simply ignored.
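The modulo behaviour can be checked at compile time. This is a small sketch; the helper names wrappingAdd, wrappingSub, wrappingMul and wrappingAdd8 are hypothetical and only exist for this illustration.

```cpp
#include <cstdint>

// Hypothetical helpers, named only for this illustration.
constexpr auto wrappingAdd(std::uint32_t a, std::uint32_t b) noexcept -> std::uint32_t {
    return a + b; // result is reduced modulo 2^32
}

constexpr auto wrappingSub(std::uint32_t a, std::uint32_t b) noexcept -> std::uint32_t {
    return a - b; // result is reduced modulo 2^32
}

constexpr auto wrappingMul(std::uint32_t a, std::uint32_t b) noexcept -> std::uint32_t {
    return a * b; // result is reduced modulo 2^32
}

constexpr auto wrappingAdd8(std::uint8_t a, std::uint8_t b) noexcept -> std::uint8_t {
    // Both operands are promoted to int, so the sum itself does not wrap.
    // The conversion back to uint8_t keeps only the lowest 8 bits.
    return static_cast<std::uint8_t>(a + b);
}

static_assert(wrappingAdd(0xffffffffu, 1u) == 0u);
static_assert(wrappingSub(0u, 1u) == 0xffffffffu);
static_assert(wrappingMul(0x10000u, 0x10000u) == 0u);
static_assert(wrappingAdd8(200, 100) == 44); // 300 modulo 256
```

Note the 8-bit case: the wrap-around happens in the conversion back to the small type, not in the addition itself.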

The reason why C++ implements operations like this is backward compatibility with C. And C implemented the operations like this to stay as close as possible to how early CPUs worked.

A good illustration is how the add operation is compiled into machine code. Look at the simple function below, which just adds two unsigned 32-bit integer values.

#include <cstdint>
auto add(uint32_t a, uint32_t b) -> uint32_t {
    return a + b;
}

The generated machine code is dead simple on every architecture. The three listings below show non-optimised output for x86-64, MIPS64 and ARM64:

add(unsigned int, unsigned int):
        push    rbp
        mov     rbp, rsp
        mov     DWORD PTR [rbp-4], edi
        mov     DWORD PTR [rbp-8], esi
        mov     edx, DWORD PTR [rbp-4]
        mov     eax, DWORD PTR [rbp-8]
        add     eax, edx
        pop     rbp
        ret

add(unsigned int, unsigned int):
        daddiu  $sp,$sp,-32
        sd      $fp,24($sp)
        move    $fp,$sp
        move    $3,$4
        move    $2,$5
        sll     $3,$3,0
        sw      $3,0($fp)
        sll     $2,$2,0
        sw      $2,4($fp)
        lw      $3,0($fp)
        lw      $2,4($fp)
        addu    $2,$3,$2
        move    $sp,$fp
        ld      $fp,24($sp)
        daddiu  $sp,$sp,32
        jr      $31
        nop

add(unsigned int, unsigned int):                               // @add(unsigned int, unsigned int)
        sub     sp, sp, #16
        str     w0, [sp, #12]
        str     w1, [sp, #8]
        ldr     w8, [sp, #12]
        ldr     w9, [sp, #8]
        add     w0, w8, w9
        add     sp, sp, #16
        ret

Every version boils down to a single add instruction (add or addu) that adds the values of two registers.

Unexpected Results with Signed Integers

In my experience, the short sentence “Overflow for signed arithmetic yields undefined behavior” is often overlooked by beginners. One of the problems is that compilers, most of the time, produce code that behaves exactly as if you were working with unsigned integers. Non-optimised code in particular is compiled into simple add instructions that generate the same predictable overflowing results.

Let’s try to expose undefined behaviour with the following example code.

Project 05-overflow

#include <TypeInfo.hpp>

#include <cmath>

template<typename T>
constexpr T badSaturatingSubtract(T a, T b) noexcept {
    // BAD CODE! Signed integer overflow is undefined.
    static_assert(std::is_integral_v<T>);
    if (b == 0) return a;
    const T result = a - b; // <-- BAD: undefined behaviour on signed overflow
    if constexpr (std::is_signed_v<T>) {
        if ((result < a) == std::signbit(b)) {
            return std::signbit(b) ? std::numeric_limits<T>::max() : std::numeric_limits<T>::min();
        }
    } else {
        if (result > a) {
            return 0;
        }
    }
    return result;
}

template<typename T>
inline void doShadySaturatingMath(T a, T b) noexcept {
    auto result = badSaturatingSubtract(a, b);
    std::cout << "badSaturatingSubtract(" << TypeInfo(a).str() << ", " << TypeInfo(b).str() << ") = "
              << TypeInfo(result).str() << "\n";
}

template<typename T>
inline void doShadySaturatingMath() noexcept {
    auto a = std::numeric_limits<T>::min();
    auto b = T{1};
    doShadySaturatingMath(a, b);
    a = std::numeric_limits<T>::max();
    b = T{-1};
    doShadySaturatingMath(a, b);
    a = T{-100};
    b = T{-500};
    doShadySaturatingMath(a, b);
}

auto main() -> int {
    std::cout << "\nSigned Math 16-bit:\n";
    doShadySaturatingMath<int16_t>();
    std::cout << "\nSigned Math 32-bit:\n";
    doShadySaturatingMath<int32_t>();
    std::cout << "\nSigned Math 64-bit:\n";
    doShadySaturatingMath<int64_t>();
    return 0;
}

The bad code is the subtraction in line 10 (marked): subtracting two signed integers can overflow. Next, the code tests whether the result changed in the expected direction. If it did not, an overflow would, theoretically, have been detected, but only if signed integers worked like unsigned ones.

I use std::signbit to test for negative values because this enables additional optimisations that expose the undefined behaviour with the clang compiler (Version 13.1.6 / clang-1316.0.21.2.5).

Signed Math 16-bit:
badSaturatingSubtract(int16{-32768}, int16{1}) = int16{-32768}
badSaturatingSubtract(int16{32767}, int16{-1}) = int16{32767}
badSaturatingSubtract(int16{-100}, int16{-500}) = int16{400}

Signed Math 32-bit:
badSaturatingSubtract(int32{-2147483648}, int32{1}) = int32{-2147483648}
badSaturatingSubtract(int32{2147483647}, int32{-1}) = int32{2147483647}
badSaturatingSubtract(int32{-100}, int32{-500}) = int32{400}

Signed Math 64-bit:
badSaturatingSubtract(int64{-9223372036854775808}, int64{1}) = int64{-9223372036854775808}
badSaturatingSubtract(int64{9223372036854775807}, int64{-1}) = int64{9223372036854775807}
badSaturatingSubtract(int64{-100}, int64{-500}) = int64{400}

Signed Math 16-bit:
badSaturatingSubtract(int16{-32768}, int16{1}) = int16{-32768}
badSaturatingSubtract(int16{32767}, int16{-1}) = int16{32767}
badSaturatingSubtract(int16{-100}, int16{-500}) = int16{400}

Signed Math 32-bit:
badSaturatingSubtract(int32{-2147483648}, int32{1}) = int32{2147483647}
badSaturatingSubtract(int32{2147483647}, int32{-1}) = int32{-2147483648}
badSaturatingSubtract(int32{-100}, int32{-500}) = int32{400}

Signed Math 64-bit:
badSaturatingSubtract(int64{-9223372036854775808}, int64{1}) = int64{9223372036854775807}
badSaturatingSubtract(int64{9223372036854775807}, int64{-1}) = int64{-9223372036854775808}
badSaturatingSubtract(int64{-100}, int64{-500}) = int64{400}

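You can see the difference between the debug build (first output) and the release build (second output). Only the operations where an overflow occurs are affected. The 16-bit results stay correct in both builds because the operands are promoted to int before the subtraction, so no signed overflow happens there.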

You can rely on the defined “overflow” behaviour of unsigned integers.
But overflow for signed arithmetic yields undefined behaviour.
Never rely on a specific result if a signed integer overflows. The effect may not be solely a wrong result; as demonstrated, the undefined behaviour can also affect following operations and lead to unpredictable behaviour.
Always write unit tests where possible. Compile and run the unit tests not only with debug settings but also optimized with release settings.
If you do a lot of integer mathematics, work with saturating operations. Many processors have instructions for this, so they cost almost no additional instructions.
Enable all warnings, e.g. with -Wall and -Wextra. Never ignore warnings.
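A saturating subtraction can be written without relying on overflow by checking the operands before the subtraction is made. The following is a minimal sketch of that idea, not the implementation from the example project; the name saturatingSubtractSketch is hypothetical.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical helper: tests the operands first, so a - b never overflows.
template<typename T>
constexpr T saturatingSubtractSketch(T a, T b) noexcept {
    static_assert(std::is_integral_v<T>);
    using Limits = std::numeric_limits<T>;
    if constexpr (std::is_signed_v<T>) {
        // a - b < min  <=>  a < min + b  (min + b cannot overflow for b > 0)
        if (b > 0 && a < Limits::min() + b) return Limits::min();
        // a - b > max  <=>  a > max + b  (max + b cannot overflow for b < 0)
        if (b < 0 && a > Limits::max() + b) return Limits::max();
    } else {
        // An unsigned result below zero saturates at zero.
        if (b > a) return T{0};
    }
    // The cast undoes the promotion to int for 8-bit and 16-bit types.
    return static_cast<T>(a - b);
}

static_assert(saturatingSubtractSketch<std::int32_t>(std::numeric_limits<std::int32_t>::min(), 1)
              == std::numeric_limits<std::int32_t>::min());
static_assert(saturatingSubtractSketch<std::int32_t>(std::numeric_limits<std::int32_t>::max(), -1)
              == std::numeric_limits<std::int32_t>::max());
static_assert(saturatingSubtractSketch<std::int32_t>(-100, -500) == 400);
static_assert(saturatingSubtractSketch<std::uint32_t>(3u, 5u) == 0u);
```

The checks use the same three test cases as the example above. Because they are evaluated at compile time, the result does not depend on the optimisation level.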

Comparisons with Unexpected Results

Integer comparison can be problematic if you mix signed and unsigned integers. This can happen by accident because literals without suffixes are signed by default.

uint32_t a = 0xf0000000u;
if (a < 1) { ... };

This compares a uint32_t with a signed int. You need to know that C++ does not compare the mathematical values of signed and unsigned types; instead, it first converts both sides into the same type before the comparison is made.

If both operands are of arithmetic or enumeration type, the usual arithmetic conversions are performed on both operands; (…)

– from ISO/IEC 14882:2020, Equality Operators

Integer Promotions

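For integral types, these conversions start with the “integer promotions”, which convert every type with a rank lower than int to int (or unsigned int). After that, both operands of an arithmetic expression are brought to a common type. The rules for finding this common type are shown below: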

If type A == type B, the types are used unchanged.
If both types are signed, or both are unsigned, use the one with the larger rank for both operands.
int16_t + int64_t → int64_t.
If one type is signed and the other is unsigned:
- If the unsigned type has an equal or larger rank than the signed type, use the unsigned type.
  uint32_t + int32_t → uint32_t
  uint64_t + int32_t → uint64_t
- If the signed type has a larger rank than the unsigned type and can hold all values of the unsigned type, use the signed type.
  uint16_t + int32_t → int32_t
- If no other rule matches, the unsigned type that corresponds to the signed type is used for both sides.
  long long int + unsigned long int → unsigned long long int (if both are 64 bits wide)

The last rule seems to be redundant, but note the difference between rank and size. The rank of integers was mainly introduced to deal with different integer types of the same size.
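The conversion rules above can be verified with decltype and static_assert. This is a sketch that assumes a typical platform with a 32-bit int and integer types without padding bits; the alias SumType is a hypothetical helper.

```cpp
#include <cstdint>
#include <type_traits>
#include <utility>

// Hypothetical helper: the type the compiler picks for A + B.
template<typename A, typename B>
using SumType = decltype(std::declval<A>() + std::declval<B>());

// Same signedness: the larger rank wins.
static_assert(std::is_same_v<SumType<std::int16_t, std::int64_t>, std::int64_t>);
// Mixed signedness, unsigned rank is equal or larger: unsigned wins.
static_assert(std::is_same_v<SumType<std::uint32_t, std::int32_t>, std::uint32_t>);
static_assert(std::is_same_v<SumType<std::uint64_t, std::int32_t>, std::uint64_t>);
// Mixed signedness, the signed type can hold all unsigned values: signed wins.
static_assert(std::is_same_v<SumType<std::uint16_t, std::int32_t>, std::int32_t>);
// Last rule: long long has the larger rank, but if unsigned long has the
// same size, it cannot hold all of its values, so the result becomes
// unsigned long long. Otherwise the signed type is used.
using ExpectedLast = std::conditional_t<sizeof(unsigned long) == sizeof(long long),
                                        unsigned long long, long long>;
static_assert(std::is_same_v<SumType<long long, unsigned long>, ExpectedLast>);
```

The last check adapts to the platform: with a 64-bit unsigned long the result is unsigned long long, with a 32-bit unsigned long it is long long.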

Rank Rules

All chars have the same rank:
char == signed char == unsigned char
Standard integers are ranked like this:
signed char < short int < int < long int < long long int
Unsigned integers have the same rank as signed ones:
unsigned X == signed X
Standard integers must always have a higher rank than extended ones of the same size.
If long long int is 128-bit wide, it has a higher rank than __int128.
Bool shall have a lower rank than all other standard integers:
bool < char, signed char, unsigned char, etc.
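The effect of the rank rules shows up in the result type of simple expressions. The following checks are a sketch for a typical platform where int can hold all values of char; the alias ResultOf is a hypothetical helper.

```cpp
#include <type_traits>
#include <utility>

// Hypothetical helper: the type the compiler picks for A + B.
template<typename A, typename B>
using ResultOf = decltype(std::declval<A>() + std::declval<B>());

// Types with a rank below int are promoted to int first.
static_assert(std::is_same_v<ResultOf<bool, bool>, int>);
static_assert(std::is_same_v<ResultOf<char, signed char>, int>);
static_assert(std::is_same_v<ResultOf<short, short>, int>);
// The larger rank wins, even if both types have the same size.
static_assert(std::is_same_v<ResultOf<int, long>, long>);
static_assert(std::is_same_v<ResultOf<long, long long>, long long>);
// Same rank, mixed signedness: the unsigned type wins.
static_assert(std::is_same_v<ResultOf<int, unsigned int>, unsigned int>);
```

On a platform where long and long long are both 64 bits wide, the fifth check shows the point of ranks: the result is still long long, although both types have the same size.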

With this knowledge, you can understand how the types of operands of an operation are converted before it is executed:

uint32_t a = 0xf0000000u;
if (a < 1) { ... };

uint32_t{0xf0000000u} < (int{1} → uint32_t{1})
uint32_t{0xf0000000u} < uint32_t{1} == false

Project 06-comparison

Let’s look at some examples where the comparison seems to get the wrong result.

#include <TypeInfo.hpp>

template<typename A, typename B>
void printIsEqual(A a, B b) {
    const bool isEqual = (a == b);
    std::cout << TypeInfo(a).str() << " == " << TypeInfo(b).str() << " => " << std::boolalpha << isEqual << "\n";
}

template<typename A, typename B>
void printIsLess(A a, B b) {
    const bool isLess = (a < b);
    std::cout << TypeInfo(a).str() << " < " << TypeInfo(b).str() << " => " << std::boolalpha << isLess << "\n";
}

void languageComparisons() {
    std::cout << "\nC++ Comparisons:\n";
    printIsEqual(0, 0);
    printIsEqual(std::numeric_limits<int32_t>::min(), 0x80000000u);
    printIsEqual(-1, 0xffffffffu);
    printIsLess(-1, 0u);
}

If I compile this code, I get several warnings about comparisons between signed and unsigned integers. But the code is perfectly valid; it does not rely on undefined behaviour. The compiler warns because the way the integers are converted before the comparison will most likely not give you the result you expect:

C++ Comparisons:
int32{0} == int32{0} => true
int32{-2147483648} == uint32{2147483648} => true
int32{-1} == uint32{4294967295} => true
int32{-1} < uint32{0} => false

I added the first comparison to remind you that the literal value zero has a type. While the value zero is unproblematic in a comparison, adding a plain 0 to an expression may change the type of the result.

The next lines show a few unexpected results, even though they are correct once all the rules from above are applied. In plain sight like this, they are easy to spot and understand, but if such a comparison is part of a larger expression and you ignore the warning messages from the compiler, you may end up with strange side effects.

Don’t compare signed with unsigned integers.
Always remember that plain literal values, like 0 or 1, are signed integers unless you add the suffix u.
Enable all warnings, e.g. with -Wall and -Wextra and never ignore warnings.

If you have to compare signed and unsigned integers in your code, the best is to write a comparison function that handles all cases as expected:

enum class [[nodiscard]] Ordering {
    Equal,
    Less,
    Greater
};

template<typename A, typename B>
constexpr auto compareInt(A a, B b) noexcept -> Ordering {
    static_assert(std::is_integral_v<A> && std::is_integral_v<B>);
    if constexpr (std::is_signed_v<A> == std::is_signed_v<B>) {
        using C = typename std::common_type_t<A, B>;
        const auto ca = static_cast<C>(a);
        const auto cb = static_cast<C>(b);
        return (ca <= cb) ? ((ca == cb) ? Ordering::Equal : Ordering::Greater) : Ordering::Less;
    } else if constexpr (std::is_signed_v<B>) {
        using C = typename std::common_type_t<A, std::make_unsigned_t<B>>;
        const auto ca = static_cast<C>(a);
        const auto cb = static_cast<C>(b);
        return (b < 0) ? Ordering::Less :
               ((ca <= cb) ? ((ca == cb) ? Ordering::Equal : Ordering::Greater) : Ordering::Less);
    } else {
        using C = std::common_type_t<std::make_unsigned_t<A>, B>;
        const auto ca = static_cast<C>(a);
        const auto cb = static_cast<C>(b);
        return (a < 0) ? Ordering::Greater :
               ((ca <= cb) ? ((ca == cb) ? Ordering::Equal : Ordering::Greater) : Ordering::Less);
    }
}

template<typename A, typename B>
void printCompareInt(A a, B b) {
    const auto result = compareInt(a, b);
    std::cout << "compareInt(" << TypeInfo(a).str() << ", " << TypeInfo(b).str() << ") => ";
    switch (result) {
        case Ordering::Equal: std::cout << "Equal"; break;
        case Ordering::Less: std::cout << "Less"; break;
        case Ordering::Greater: std::cout << "Greater"; break;
    }
    std::cout << "\n";
}


void comparisonsWithCompareInt() {
    std::cout << "\ncompareInt() Comparisons:\n";
    printCompareInt(0, 0);
    printCompareInt(std::numeric_limits<int32_t>::min(), 0x80000000u);
    printCompareInt(-1, 0xffffffffu);
    printCompareInt(-1, 0u);
}

Using this function in your code produces only very few additional CPU instructions, but you can safely compare every combination of integer types. The result indicates how b compares to a, so Ordering::Less means b is smaller than a. (Please note that the C++20 three-way comparison operator <=> reports the opposite direction: how a compares to b.)

The result of this code looks like this:

compareInt() Comparisons:
compareInt(int32{0}, int32{0}) => Equal
compareInt(int32{-2147483648}, uint32{2147483648}) => Greater
compareInt(int32{-1}, uint32{4294967295}) => Greater
compareInt(int32{-1}, uint32{0}) => Greater
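If you only need a boolean answer to the conventional question “is a less than b?”, a smaller helper is enough. C++20 provides std::cmp_less in <utility> for this purpose; for C++17, a similar helper could be sketched as follows. The name safeLess is hypothetical and not part of the example project.

```cpp
#include <cstdint>
#include <limits>
#include <type_traits>

// Hypothetical C++17 helper: true if the mathematical value of a is
// smaller than the mathematical value of b.
template<typename A, typename B>
constexpr bool safeLess(A a, B b) noexcept {
    static_assert(std::is_integral_v<A> && std::is_integral_v<B>);
    if constexpr (std::is_signed_v<A> == std::is_signed_v<B>) {
        // Same signedness: the built-in comparison is already correct.
        return a < b;
    } else if constexpr (std::is_signed_v<A>) {
        // A negative a is smaller than every unsigned value.
        return a < 0 || static_cast<std::make_unsigned_t<A>>(a) < b;
    } else {
        // A negative b is smaller than every unsigned value.
        return b >= 0 && a < static_cast<std::make_unsigned_t<B>>(b);
    }
}

static_assert(safeLess(-1, 0u));
static_assert(!safeLess(0u, -1));
static_assert(safeLess(-1, 0xffffffffu));
static_assert(safeLess(std::numeric_limits<std::int32_t>::min(), 0x80000000u));
```

The sign is tested first, so the conversion to the unsigned type only happens for values that are not negative and therefore keep their value.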

Conclusion

I hope you found this journey to the abysses of the C++ language interesting. I’m aware this post mostly points out the traps you have to avoid and offers few solutions for navigating around them. Yet, I am afraid adding various solutions for the individual topics would have added too much content to this overview.

If you have questions, missed any information, or wish to provide feedback, simply add a comment below or send me a message.
