I recently spoke with a smart friend who's going through the process of learning C++ coding at a university. We started talking about abstract classes and one thing led to another until I brought up pure-call exceptions. More discussion ensued and I posed the question, "How can you actually cause a pure-call exception?"
If you think about it a bit, this should be impossible. There's no way to instantiate an abstract class. The compiler just won't let you. For any real subclasses of the abstract class the compiler will fill in an appropriate function pointer table. So what's the deal? How does this happen?
First, let's have a look at the memory structure of a typical C++ object containing virtual functions.
0:000:x86> ?? tmp
class BaseReal * 0x004a49a0
+0x000 __VFN_table : 0x01312110
+0x004 m_data : 0x1337beef
0:000:x86> dps 0x01312110 L5
01312110 013110c0 cppstuff!BaseReal::`scalar deleting destructor'
01312114 01311120 cppstuff!BaseReal::Get
01312118 01311140 cppstuff!BaseReal::Sum
0131211c 00000000
01312120 00000048
Here's an object, tmp, which contains a virtual function table pointer, __VFN_table, and then a single data element called m_data. I can see from dumping pointer-sized chunks of __VFN_table with symbol matching turned on that it's actually got the function pointers for the class called BaseReal.
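To make the hidden table pointer a little more concrete, here is a minimal sketch that compares the size of a plain struct with one that has virtual functions. The type names Plain and WithVirtual and the helper VirtualOverhead are made up for illustration and are not part of the listing in this post. Object layout is implementation-defined, so the exact numbers are an assumption about mainstream compilers; on the 32-bit build in the dump above the extra space is the 4-byte __VFN_table at offset 0.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical types for illustration only.
struct Plain {
    unsigned int m_data;
};

struct WithVirtual {
    virtual ~WithVirtual() {}
    virtual unsigned int Get() { return m_data; }
    unsigned int m_data = 0x1337BEEF;
};

// Bytes added to every object by having virtual functions. On common
// compilers this is the hidden table pointer plus any padding needed
// to keep the object aligned.
std::size_t VirtualOverhead() {
    return sizeof(WithVirtual) - sizeof(Plain);
}
```

On typical compilers VirtualOverhead() is at least sizeof(void*), no matter how many virtual functions the class declares, because each object carries one pointer to a shared table rather than a copy of the table.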
The debugger session above corresponds to the following code listing:
#include <stdio.h>
#include <stdlib.h>
class BaseAbstract
{
public:
BaseAbstract ();
virtual ~BaseAbstract ();
virtual unsigned int Get () = 0;
virtual unsigned int Sum ();
};
class BaseReal : public BaseAbstract
{
public:
BaseReal ();
virtual ~BaseReal ();
virtual unsigned int Get ();
virtual unsigned int Sum ();
unsigned int m_data;
};
BaseAbstract::BaseAbstract ()
{
//Sum (); // BOOM.
}
BaseAbstract::~BaseAbstract ()
{
}
unsigned int BaseAbstract::Sum ()
{
return Get () + 0;
}
BaseReal::BaseReal () : BaseAbstract ()
{
m_data = 0x1337BEEF;
}
BaseReal::~BaseReal ()
{
m_data = 0xDEADBEEF;
}
unsigned int BaseReal::Get ()
{
return m_data;
}
unsigned int BaseReal::Sum ()
{
return Get () + m_data;
}
int main (int argc, char* argv[])
{
BaseReal *tmp = new BaseReal ();
unsigned int value = tmp->Sum (); // <<--- break point here.
delete tmp;
return 0;
}
If I uncomment the "BOOM" line and run again, the application will crash before it gets to the break point. The interesting part is what happens before the crash. Let's have a look at the constructors' disassembly. First, the BaseReal constructor:
0:000:x86> uf cppstuff!BaseReal::BaseReal
cppstuff!BaseReal::BaseReal :
// Function prologue...
41 00df1090 55 push ebp
41 00df1091 8bec mov ebp,esp
// Setting up the "this" pointer (ecx usually contains 'this') and then
// calling the BaseAbstract constructor.
41 00df1093 51 push ecx
41 00df1094 894dfc mov dword ptr [ebp-4],ecx
41 00df1097 8b4dfc mov ecx,dword ptr [ebp-4]
41 00df109a e861ffffff call cppstuff!BaseAbstract::BaseAbstract (00df1000)
// Loading eax with pointer to 'this' and then storing the address of the
// virtual function table for BaseReal, cppstuff!BaseReal::`vftable' (00df2120).
41 00df109f 8b45fc mov eax,dword ptr [ebp-4]
41 00df10a2 c7002021df00 mov dword ptr [eax],
offset cppstuff!BaseReal::`vftable' (00df2120)
// Saving 0x1337BEEF to m_data.
42 00df10a8 8b4dfc mov ecx,dword ptr [ebp-4]
42 00df10ab c74104efbe3713 mov dword ptr [ecx+4],1337BEEFh
// Function epilogue...
43 00df10b2 8b45fc mov eax,dword ptr [ebp-4]
43 00df10b5 8be5 mov esp,ebp
43 00df10b7 5d pop ebp
43 00df10b8 c3 ret
// Dumping the function table...
0:000:x86> dps cppstuff!BaseReal::`vftable' L3
00df2120 00df10c0 cppstuff!BaseReal::`scalar deleting destructor'
00df2124 00df1120 cppstuff!BaseReal::Get
00df2128 00df1140 cppstuff!BaseReal::Sum
This looks pretty reasonable. First there's the function prologue and then we do some C++ "this pointer" setup to make all that work. After that we immediately jump into the constructor for BaseAbstract. Only once that call returns does the constructor store the address of the BaseReal function table into the object, and then the m_data member is initialized.
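The ordering in this disassembly (base constructor first, derived table pointer stored afterwards) has a visible effect at the C++ level. Here is a small sketch, with hypothetical classes Base and Derived and a hypothetical g_log string that are not part of the listing above, that records which override runs when each constructor calls the same virtual function. It uses a non-pure virtual so the behavior is well defined.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes; the names are illustrative only.
static std::string g_log;  // records which functions ran, in order

struct Base {
    Base() { g_log += "Base::Base;"; Name(); }
    virtual ~Base() {}
    virtual void Name() { g_log += "Base::Name;"; }
};

struct Derived : Base {
    Derived() { g_log += "Derived::Derived;"; Name(); }
    void Name() override { g_log += "Derived::Name;"; }
};

// Constructs a Derived and returns the order in which things ran.
std::string ConstructionLog() {
    g_log.clear();
    Derived d;
    return g_log;
}
```

While the Base constructor is running, the call to Name() resolves to Base::Name even though the object being built is a Derived. The derived override only becomes reachable once the Derived constructor body starts.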
And now a look at the BaseAbstract constructor:
0:000:x86> uf cppstuff!BaseAbstract::BaseAbstract
cppstuff!BaseAbstract::BaseAbstract :
// Function prologue...
27 00df1000 55 push ebp
27 00df1001 8bec mov ebp,esp
// Setting up the "this" pointer (ecx usually contains 'this') and
// saving it on the stack as a local in preparation for calling
// the "Sum" function.
27 00df1003 51 push ecx
27 00df1004 894dfc mov dword ptr [ebp-4],ecx
// Loading eax with pointer to 'this' and then storing the address of the
// virtual function table for BaseAbstract, cppstuff!BaseAbstract::`vftable' (00df2110).
27 00df1007 8b45fc mov eax,dword ptr [ebp-4]
27 00df100a c7001021df00 mov dword ptr [eax],offset
cppstuff!BaseAbstract::`vftable' (00df2110)
28 00df1010 8b4dfc mov ecx,dword ptr [ebp-4]
// Calling Sum -- which will fail.
28 00df1013 e858000000 call cppstuff!BaseAbstract::Sum (00df1070)
// Function epilogue...
29 00df1018 8b45fc mov eax,dword ptr [ebp-4]
29 00df101b 8be5 mov esp,ebp
29 00df101d 5d pop ebp
29 00df101e c3 ret
// Dumping the function table...
0:000:x86> dps cppstuff!BaseAbstract::`vftable' L3
00df2110 00df1020 cppstuff!BaseAbstract::`scalar deleting destructor'
00df2114 00df1224 cppstuff!purecall
00df2118 00df1070 cppstuff!BaseAbstract::Sum
// Disassembly for BaseAbstract::Sum -- called in the constructor.
0:000:x86> uf cppstuff!BaseAbstract::Sum
cppstuff!BaseAbstract::Sum :
// Prologue...
36 00df1070 55 push ebp
36 00df1071 8bec mov ebp,esp
// Saving ecx. This is somewhat important. Note: the "thiscall"
// calling convention requires the "this pointer" to be in ecx. The code
// is using ebp-4 to stash the "this" pointer.
36 00df1073 51 push ecx
// Copying "this" (ecx) to local storage -- anything with negative
// ebp references is a local / spill location.
36 00df1074 894dfc mov dword ptr [ebp-4],ecx
// Load "this" into eax. The function table pointer is its first field.
37 00df1077 8b45fc mov eax,dword ptr [ebp-4]
// Dereference eax into edx -- now we have the function table.
37 00df107a 8b10 mov edx,dword ptr [eax]
// Set ecx to "this" for the call, per calling convention.
37 00df107c 8b4dfc mov ecx,dword ptr [ebp-4]
// Dereference the 2nd function table entry (the one for "Get").
37 00df107f 8b4204 mov eax,dword ptr [edx+4]
37 00df1082 ffd0 call eax
// Epilogue...
38 00df1084 8be5 mov esp,ebp
38 00df1086 5d pop ebp
38 00df1087 c3 ret
There's the same "this" pointer initialization (although this was probably already done, these constructors have to work in a vacuum, so they may duplicate a little work). Next comes the setup of the virtual function table, this time pointing at the BaseAbstract table, and then the call to Sum. I think the compiler took a nice optimization here and didn't use the virtual function table to get the address of Sum. That is safe because, while the BaseAbstract constructor is running, the object is treated as a BaseAbstract, so the target of the call is known at compile time. If this were code anywhere other than the constructor I imagine it would have used the function table pointer instead.
So now this brings us to the Sum code, which I also dumped. You can see it dereferences the virtual function table for the Get function call and then calls it. The problem is that the table installed at this point is still the BaseAbstract one, and Get is a pure virtual function there, so the table entry is for cppstuff!purecall, which is a function added by the compiler as a placeholder to indicate failure.
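The whole sequence can be modeled by hand with a table of function pointers. The sketch below is only a model: every name in it (Object, Slot, PureCallStub, the two tables and the two "constructor" functions) is hypothetical, and the real compiler-generated placeholder ends the program rather than returning a sentinel value as this stand-in does.

```cpp
#include <cassert>

// A hand-rolled model of the mechanism. All names are hypothetical.
struct Object;
typedef int (*Slot)(Object*);

struct Object {
    const Slot* vftable;  // first member, like __VFN_table
    int m_data;
};

const int kPureCall = -1;  // sentinel returned by the placeholder

// Stands in for purecall. The real one reports an error and does not return.
int PureCallStub(Object*) { return kPureCall; }
int RealGet(Object* self) { return self->m_data; }

// Sum always goes through slot 0 ("Get") of whichever table is installed.
int Sum(Object* self) { return self->vftable[0](self); }

const Slot kAbstractTable[] = { PureCallStub };
const Slot kRealTable[] = { RealGet };

// Mimics the BaseAbstract constructor with the "BOOM" line enabled:
// install the abstract table, then call Sum.
int AbstractCtor(Object* self) {
    self->vftable = kAbstractTable;
    return Sum(self);
}

// Mimics the BaseReal constructor: base constructor first, then the real
// table, then the data member. Returns what the base constructor's call saw.
int RealCtor(Object* self) {
    int seen_in_base = AbstractCtor(self);
    self->vftable = kRealTable;
    self->m_data = 0x1337BEEF;
    return seen_in_base;
}
```

Calling RealCtor on a fresh Object returns kPureCall, because Sum ran while the abstract table was still installed. Calling Sum on the same object afterwards returns the stored data, because the real table has replaced it by then.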
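What are the lessons learned? You should never call virtual functions (or functions that call virtual functions) in a constructor or destructor. I didn't show the destructor code, but the whole process of loading the proper function table pointer and setting it is reversed: each destructor puts its own class's table back before its body runs, so by the time the BaseAbstract destructor executes, the object is pointing at the table with the purecall entry again.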
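The destructor half of this can be observed without a debugger. The following sketch uses hypothetical classes Shape and Circle and a hypothetical g_trace string, none of which come from the listing above, and again sticks to a non-pure virtual so nothing crashes. It also shows the safer alternative: make the virtual call after construction has finished.

```cpp
#include <cassert>
#include <string>

// Hypothetical classes; the names are illustrative only.
static std::string g_trace;  // records what each destructor observed

struct Shape {
    virtual ~Shape() { g_trace += "~Shape sees " + Name() + ";"; }
    virtual std::string Name() { return "Shape"; }
};

struct Circle : Shape {
    ~Circle() override { g_trace += "~Circle sees " + Name() + ";"; }
    std::string Name() override { return "Circle"; }
};

// Destroys a Circle and returns what each destructor saw, in order.
std::string DestructionTrace() {
    g_trace.clear();
    { Circle c; }  // c is destroyed at the closing brace
    return g_trace;
}

// The safer pattern: call the virtual function on a fully built object.
std::string NameAfterConstruction() {
    Circle c;
    Shape& s = c;
    return s.Name();
}
```

The derived destructor runs first and still sees the Circle override. By the time the Shape destructor runs, the same call resolves to Shape::Name. Had Name been pure virtual in Shape, that second call is where a pure call would happen.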
Clear as mud?