0%

Type Punning

类型双关(type punning)经常在需要高性能的代码和网络编程中出现。比如:

  • 利用强制类型转换,将floatint互相转换。
  • malloc申请来的内存指针void *强制类型转换为一个对象的指针类型X *,并通过该指针访问X的成员。

然而,C++中的大部分的type punning(包括以上的两种)都会导致undefined behaviour。
我们将解决以下问题:

  • type punning 会导致怎样的undefined behaviour?
  • type punning 为何会导致undefined behaviour?
  • 哪些type punning是正确的,哪些是错误的?
  • 当我们必须进行type punning时,如何避免undefined behaviour来实现type punning的功能?

CppCon 2019: Timur Doumler “Type punning in modern C++”

会导致怎样的ub

Example for twitter: @lunasorcery

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
#include <cstdio>

struct Widget
{
virtual void doSomething()
{
printf("Widget");
}
};

struct Gizmo
{
virtual void doSomethingCompleteDifferent()
{
printf("Gizmo");
}
};

int main()
{
Gizmo g;
Widget* w = reinterpret_cast<Widget*>(&g);
w->doSomething();
}

这段代码在clang和gcc都可以编译,但是却产生了UB:
在x86-64 clang10.0,-O3优化下,这段代码会打印出Gizmo
1
2
3
4
5
6
7
8
9
10
main:                                   # @main
push rax
mov edi, offset .L.str
xor eax, eax
call printf
xor eax, eax
pop rcx
ret
.L.str:
.asciz "Gizmo"

在x86-64 gcc10.1,-O3优化下,这段代码被编译器全部忽略:
1
main:

在gcc看来,g是一个Gizmo而不是一个WidgetGizmoWidget又无继承关系,不可能在g上调用Widge的成员函数doSomething()。因此gcc将第23行的代码直接优化掉。

为什么会导致ub

错误地进行type punning可能会导致代码违反以下四种规则:

  • aliasing rules
  • object lifetime rules
  • alignment rules
  • rules for valid value representations

aliasing rules

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
void test(int* a, int* b)
{
*a = 1;
*b = 2;
*a += *b;
}

// legal
int f1()
{
int a = 0;
int b = 0;

test(&a, &b);
return a; // a == 3
}

// legal
int f2()
{
int a = 0;

test(&a, &a);
return a; // a == 4
}

在这个例子当中,调用f1()会返回3,调用f2()会返回4。注意到在f2()调用test(&a, &a)时,两个指针同时指向了a。gcc生成的汇编如下

1
2
3
4
5
test(int*, int*):
mov DWORD PTR [rdi], 1
mov DWORD PTR [rsi], 2
add DWORD PTR [rdi], 2
ret

但是在fortran当中,不允许两个引用引用到相同的地址。

1
2
3
4
5
6
subroutine test(a, b)
integer, integer(inout) :: a, b
a = 1
b = 2
a = a + b
end subroutine test

gcc生成的汇编如下,因为预先知道了ab不会是相同地址的引用,生成的代码更加简洁。

1
2
3
4
test_:
mov DWORD PTR [rsi], 2
mov DWORD PTR [rdi], 3
ret

编译器在知道ab不是相同地址的前提下,看到了对ab的赋值。并且了解到在a = a + b之前,a的值一直都是1,因此编译器就可以调整语句顺序,直接将3赋值给a

但是在C++中允许这样的aliasing(即ab指向相同的地址),所以我们就不能作出像fortran一样的优化。然而编译器不可能假设任何指针之间都有aliasing关系,否则C++生成的代码会变得非常缓慢。我们需要在 aliasing 和 效率 之间找到一个平衡点,这个就是C++的aliasing rule。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
void test(int* a, float* b)
{
*a = 1;
*b = 2;
*a += static_cast<int>(*b);
}

int main()
{
int x = 0;
// violation of aliasing rule
test(&x, reinterpret_cast<float*>(&x));
return x;
}

在上面这段代码中,ab的类型是不相关的,因此编译器可以假设ab没有aliasing关系,进行更好的优化。
然而在main()函数中调用test()函数时,违反了aliasing rule。这段代码会在假设ab指向不相同地址的假设下编译,如果用相同的地址调用,就产生的UB。

关于C++的strict aliasing rule,可以阅读这篇文章What is the Strict Aliasing Rule and Why do we care?

object lifetime rules

标准中关于object lifetime的定义:

The lifetime of an object or reference is a runtime property of the object or reference. A variable is said to have vacuous initialization if it is default-initialized and, if it is of class type or a (possibly multi-dimensional) array thereof, that class type has a trivial default constructor. The lifetime of an object of type T begins when:

  • (1.1) — storage with the proper alignment and size for type T is obtained, and
  • (1.2) — its initialization (if any) is complete (including vacuous initialization) (9.4), except that if the object is a union member or subobject thereof, its lifetime only begins if that union member is the initialized member in the union (9.4.1, 11.10.2), or as described in 11.5 and 11.4.4.2, and except as described in 20.10.10.1. The lifetime of an object o of type T ends when:
  • (1.3) — if T is a non-class type, the object is destroyed, or
  • (1.4) — if T is a class type, the destructor call starts, or
  • (1.5) — the storage which the object occupies is released, or is reused by an object that is not nested within
    o (6.7.2).

在已经对一个对象分配好存储空间后,一个对象的lifetime开始前,对于一个对象的指针使用有着以下限制:

Before the lifetime of an object has started but after the storage which the object will occupy has been allocated29 or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any pointer that represents the address of the storage location where the object will be or was located may be used but only in limited ways. For an object under construction or destruction,
see 11.10.4. Otherwise, such a pointer refers to allocated storage (6.7.5.4.1), and using the pointer as if the pointer were of type void* is well-defined. Indirection through such a pointer is permitted but the resulting lvalue may only be used in limited ways, as described below. The program has undefined behavior if:

  • (6.1) — the object will be or was of a class type with a non-trivial destructor and the pointer is used as the operand of a delete-expression,
  • (6.2) — the pointer is used to access a non-static data member or call a non-static member function of the object, or
  • (6.3) — the pointer is implicitly converted (7.3.11) to a pointer to a virtual base class, or
  • (6.4) — the pointer is used as the operand of a static_cast (7.6.1.8), except when the conversion is to pointer to cv void, or to pointer to cv void and subsequently to pointer to cv char, cv unsigned char, or cv std::byte (17.2.1), or
  • (6.5) — the pointer is used as the operand of a dynamic_cast (7.6.1.6).

同样地,在上述情况下,对一个对象的引用也有以下限制

Similarly, before the lifetime of an object has started but after the storage which the object will occupy has been allocated or, after the lifetime of an object has ended and before the storage which the object occupied is reused or released, any glvalue that refers to the original object may be used but only in limited ways.
For an object under construction or destruction, see 11.10.4. Otherwise, such a glvalue refers to allocated storage (6.7.5.4.1), and using the properties of the glvalue that do not depend on its value is well-defined. The program has undefined behavior if:

  • (7.1) — the glvalue is used to access the object, or
  • (7.2) — the glvalue is used to call a non-static member function of the object, or
  • (7.3) — the glvalue is bound to a reference to a virtual base class (9.4.3), or
  • (7.4) — the glvalue is used as the operand of a dynamic_cast (7.6.1.6) or as the operand of typeid.

那么如果产生一个object呢?

An object is created by a definition (6.2), by a new-expression (7.6.2.7), by an operation that implicitly creates objects (see below), when implicitly changing the active member of a union (11.5), or when a temporary object is created (7.3.4, 6.7.7)


例子1:

1
2
3
4
5
6
7
8
9
10
11
12
13
struct X
{
int a;
int b;
};

X* make_x()
{
X* p = (X*)malloc(sizeof(struct X)); //UB
p->a = 1;
p->b = 2;
return p;
}

注意在注释有//UB的一行,X所指的对象处于”the lifetime of an object has started but after the storage which the object will occupy has been allocated”所描述的状态。即虽然已经分配了储存空间,但是却未调用X的构造函数。后续的操作为UB。


修复例子1中的错误可以使用placement new。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
struct X
{
int a;
int b;
};

X* make_x()
{
char* storage = new char[sizeof(X)];
X* p = new(storage) X; // ctor is called, object is created
p->a = 1;
p->b = 2;
return p;
}

alignment rules

继续前面object lifetime中例子1的修复,我们是否可以将storage放在栈上呢?
例子2:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
struct X
{
int a;
int b;
};

X* make_x()
{
char storage[sizeof(X)];
X* p = new(storage) X;
p->a = 1;
p->b = 2;
return p;
}

不可以,因为这违反了alignment rules。


标准中关于alignment rules的说明:

Object types have alignment requirements (6.8.1, 6.8.2) which place restrictions on the addresses at which an object of that type may be allocated. An alignment is an implementation-defined integer value representing the number of bytes between successive addresses at which a given object can be allocated. An object type imposes an alignment requirement on every object of that type; stricter alignment can be requested using the alignment specifier.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
      64bit Machine

An int32_t with false alignment +-------+
|
+----+An int32_t with correct alignment |
| |
| |
| +--+--+--+--+--+--+--+--+ |
| | | | | | | | | | |
| +-----------------------+ |
| | | | | | | | | | |
| +-----------------------+ |
+---->+AA|AA|AA|AA| | | | | |
+-----------------------+ |
| | | | | | |AA|AA+<--------------+
+-----------------------+
|AA|AA| | | | | | |
+-----------------------+
| | | | | | | |AA+<--------------+
+-----------------------+ |
|AA| | | | | | | | |
+-----+--+--+--+--+--+--+ |
|
a char[2] array with correct alignment +-+

如果这里的int32_t没有正常对齐。这可能会导致硬件对内存的访问变慢或产生错误。


如何修复例子2呢?我们可以使用aligned_storage_t

1
2
3
4
5
6
7
8
9
10
11
12
13
14
struct X
{
int a;
int b;
};

X* make_x()
{
std::aligned_storage<sizeof(X), alignof(x)> storage;
X* p = new(storage) X;
p->a = 1;
p->b = 2;
return p;
}

rules for valid value representations

将一个float通过type punning看作int时,这个float的4个bit,变成int后,所表示的int是合法的。

待补充。

正确的type punning

正确的type punning很少:

  • 通过一个类型的动态类型访问
  • 通过一个类型的被unsignedsigned修饰的动态类型访问
  • 通过一个char, unsinged char或者std::byte访问
  • 如果两个类型是pointer-interconvertible的(典型的例子为std::complex)

pointer-interconvertible

标准中提到:

Two objects a and b are pointer-interconvertible if:

  • (4.1) — they are the same object, or
  • (4.2) — one is a union object and the other is a non-static data member of that object (11.5), or
  • (4.3) — one is a standard-layout class object and the other is the first non-static data member of that object, or, if the object has no non-static data members, any base class subobject of that object (11.4), or
  • (4.4) — there exists an object c such that a and c are pointer-interconvertible, and c and b are pointer-interconvertible.
1
2
3
4
5
6
7
8
9
10
11
12
13
struct Widget
{
int i = 42;
float f = 0;
};

int main()
{
Widget w;
// NOT UB
int i = *reinterpret_cast<int*>(&w);
return i;
}

上面的例子中//NOT UB注释下的语句并不是UB,因为Widgetint是pointer-interconvertible的。std::complex是标准钦定的,可以使用type punning的类型。而std::complex正是典型的该性质在标准库中的利用。

标准中对于std::complex提到:

If z is an lvalue of type cv complex then:

  • (4.1) — the expression reinterpret_cast<cv T(&)[2]>(z) is well-formed,
  • (4.2) — reinterpret_cast<cv T(&)[2]>(z)[0] designates the real part of z, and
  • (4.3) — reinterpret_cast<cv T(&)[2]>(z)[1] designates the imaginary part of z.
    Moreover, if a is an expression of type cv complex<T>* and the expression a[i] is well-defined for an integer expression i, then:
  • (4.4) — reinterpret_cast<cv T*>(a)[2*i] designates the real part of a[i], and
  • (4.5) — reinterpret_cast<cv T*>(a)[2*i + 1] designates the imaginary part of a[i].

标准中关于type punning说明的变化

来自C++17 final working draft - N4659

  • If a program attempts to access the stored value of an object through a glvalue of other than one of the following types the behavior is undefined:
    • (8.1) — the dynamic type of the object,
    • (8.2) — a cv-qualified version of the dynamic type of the object,
    • (8.3) — a type similar (as defined in 7.5) to the dynamic type of the object,
    • (8.4) — a type that is the signed or unsigned type corresponding to the dynamic type of the object,
    • (8.5) — a type that is the signed or unsigned type corresponding to a cv-qualified version of the dynamic type
      of the object,
    • (8.6) — an aggregate or union type that includes one of the aforementioned types among its elements or nonstatic data members (including, recursively, an element or non-static data member of a subaggregate or contained union),
    • (8.7) — a type that is a (possibly cv-qualified) base class type of the dynamic type of the object,
    • (8.8) — a char, unsigned char, or std::byte type.

注意(8.6)中提到,使用union或者aggregate来进行type punning是可行的。然而这和标准中的这段话相矛盾。

12.3 Unions:
In a union, a non-static data member is active if its name refers to an object whose lifetime has begun and has not ended (6.8). At most one of the non-static data members of an object of union type can be active at any time

因此出现了下面这提案P1359r0

The aliasing rules of 7.2.1 [basic.lval] paragraph 10 were adapted from C with additions for C++. However, a number of the points either do not apply or are subsumed by other points. For example, the provision for aggregate and union types is needed in C for struct assignment, which in C++ is done via constructors and assignment operators in C++, not by accessing the complete object.

修复后的C++20 final working draft - N4860去掉了大部分条目,使得规则更精简明确

  • If a program attempts to access (3.1) the stored value of an object through a glvalue whose type is not similar (7.3.5) to one of the following types the behavior is undefined:
    • (11.1) — the dynamic type of the object,
    • (11.2) — a type that is the signed or unsigned type corresponding to the dynamic type of the object, or
    • (11.3) — a char, unsigned char, or std::byte type.

If a program invokes a defaulted copy/move constructor or copy/move assignment operator for a union of type U with a glvalue argument that does not denote an object of type cv U within its lifetime, the behavior is undefined. [Note: Unlike in C, C++ has no accesses of class type. — end note]

The intent of this list is to specify those circumstances in which an object may or may not be aliased.


错误的type punning

1
2
3
4
5
6
7
8
9
10
    float              float                    float
+-+-+-+-+ +-+-+-+-+ +-+-+-+-+
+---+ | | | | +---+ | | | | +---+ | | | |
| +-+-+-+-+ | +-+-+-+-+ | +-+-+-+-+
| | |
| int32_t | |
| +-+-+-+-+ | +-+ +-+ +-+ +-+ | +-+-+ +-+-+
+-->+ | | | | +-->+ | | | | | | | +-->+ | | | | |
+-+-+-+-+ +-+ +-+ +-+ +-+ +-+-+ +-+-+
(unsinged) char* / std::byte* int16_t[2]
  1. 将一个类型看作另一个类型,两个类型的长度相同
  2. 将一个类型看作一些单独的bytes
  3. 将一个类型看作另一个类型,两个类型的长度不同

形式1和形式3往往都是错误的type punning,其中形式1更为常见,经常在一些需要高性能的场景出现。比如下面这一段来自Quake3的FastInvSqrt()的代码
Fast inverse square root - wikipedia

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;

x2 = number * 0.5F;
y = number;

// Type Punning
i = * ( long * ) &y; // evil floating point bit level hacking
i = 0x5f3759df - ( i >> 1 ); // what the fuck?

// Type Punning
y = * ( float * ) &i;
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration
// y = y * ( threehalfs - ( x2 * y * y ) ); // 2nd iteration, this can be removed
return y;
}

第11行和第15行出现了type punning,这种type punning即使在C中也是UB。在C中,可以使用union的方法来修复这段代码
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
float Q_rsqrt( float number )
{
const float x2 = number * 0.5F;
const float threehalfs = 1.5F;

union
{
float f;
int i;
} conv = {number}; // member 'f' set to value of 'number'

conv.i = 0x5f3759df - (conv.i >> 1);
conv.f *= (threehalfs - (x2 * conv.f * conv.f));

return conv.f;
}

但这样的修复方式在C++,无论是根据C++17还是C++20关于type punning的说明,仍然是UB。即便gcc在文档中标注了支持这种type punning: gcc Non-bugs

… To fix the code above, you can use a union instead of a cast (note that this is a GCC extension which might not work with other compilers) …

但为了写出portable的代码,我们需要寻找type punning的替代品。

type punning的替代品

std::memcpy

为了修复Quake3中的UB,我们可以将reinterpret_cast改为memcpy

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
float Q_rsqrt( float number )
{
long i;
float x2, y;
const float threehalfs = 1.5F;

x2 = number * 0.5F;
y = number;
//i = * ( long * ) &y;
std::memcpy(&i, &y, sizeof(float));
i = 0x5f3759df - ( i >> 1 );
//y = * ( float * ) &i;
std::memcpy(&y, &i, sizeof(float));
y = y * ( threehalfs - ( x2 * y * y ) ); // 1st iteration

return y;
}

一般来讲,编译器都会帮助优化memcpy,并不会产生额外的性能开销。上面的这段代码,修复前和修复后的版本,在gcc,clang,MSVC中都产生了相同的汇编。

std::bit_cast

C++20加入了constexprbit_cast(),方便以memcpy的方式来代替type punning

1
2
template< class To, class From >
constexpr To bit_cast(const From& from) noexcept;

bit_cast - cppreference。注意为了能够调用memcpy,必须满足如下条件

  • sizeof(To) == sizeof(From)
  • To 和 From 两者都是可平凡复制 (TriviallyCopyable) 类型

一个可能的实现:

1
2
3
4
5
6
7
8
9
10
11
12
13
template <class To, class From>
typename std::enable_if<
(sizeof(To) == sizeof(From)) &&
std::is_trivially_copyable<From>::value &&
std::is_trivially_copyable<To>::value,
To>::type
// constexpr 支持需要编译器魔法
bit_cast(const From &src) noexcept
{
To dst;
std::memcpy(&dst, &src, sizeof(To));
return dst;
}

然而需要注意的是,在标准中,对bit_cast作出了这样的限制:

If there is no value of type ‘To’ corresponding to the value represetation produced, the behavior is undefined.

仍不能修复的type punning

在生产中常会见到这样的代码

1
2
3
4
5
6
7
8
9
10
11
12
void process(Stream* stream)
{
std::unique_ptr<char[]> buffer = stream->read();

if(buffer[0] == WIDGET)
{
// UB
processWidget(reinterpret_cast<Widget*>(buffer.get()));
}
else
// ...
}

Widget并没有被created,因此为UB。但是如果Widget不是可平凡复制的,将无法使用memcpy或者bit_cast。为了修复在这种情况下无法找到type punning替代品你的问题,可以查看可能会在C++23进入标准的提案P0593R4,其中提出了一种产生对象的新方式std::start_lifetime_as

其他问题

1
2
3
4
5
6
7
8
9
void print_bytes(float f)
{
auto* buf = reinterpret_cast<unsigned char*>(&f);
for(auto i = size_t{}; i < sizeof(float); ++i)
{
// UB: buf may not point to the first byte of f
std::cout<< buf[i];
}
}

在上面的这段代码中,标准中只说了buf会指向对象f,但没说buf是否会指向f的首个byte。然而在任何的主流编译器中,buf都会指向f的第一个byte。