The This Pointer in C++

2017-02 Update

I didn’t know about boxing and unboxing of primitives to objects (or much about references as it happens) when I wrote this back in 2013. Primitives like int/bool/float/char/double etc. DO get copied correctly by the default copy constructor and assignment operator, because those simply copy each member in turn. A member that is itself an object, say a DeepCopyWang (it was the first thing I could think of), is likewise copied using its own copy constructor. The trouble starts when a class holds a pointer: if Example held a DeepCopyWang* pointing at heap memory, the default copy would copy only the pointer, so any new object would point at the original DeepCopyWang object and its properties. That is, the new object trying to be a copy of the first would be a shallow copy (i.e. it shares data with the first object) and not a deep copy (i.e. completely separate objects and all properties therein, which can be individually manipulated). So take the below with a suitably sized pinch of salt.
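To make the shallow-versus-deep distinction concrete, here is a minimal sketch. It assumes two hypothetical types, ShallowHolder and DeepHolder (neither appears elsewhere in this post), that each hold a pointer to an int; the helper function names are likewise made up for illustration:

```cpp
#include <cassert>

// Hypothetical type: holds a raw pointer it does not own and relies on the
// compiler-generated (memberwise) copy constructor.
struct ShallowHolder
{
	int* data;
};

// Hypothetical type: owns its int on the heap and copies the pointee.
class DeepHolder
{
public:
	explicit DeepHolder(int v) : data(new int(v)) {}
	DeepHolder(const DeepHolder& rhs) : data(new int(*rhs.data)) {}
	DeepHolder& operator=(const DeepHolder& rhs)
	{
		if (this != &rhs)
			*data = *rhs.data; // copy the value, keep our own allocation
		return *this;
	}
	~DeepHolder() { delete data; }

	int* data;
};

// Returns true if a write through the copy is visible through the original.
bool shallowCopySharesData()
{
	int shared = 1;
	ShallowHolder a{&shared};
	ShallowHolder b = a; // memberwise copy: only the pointer value is copied
	*b.data = 2;
	return *a.data == 2;
}

// Same experiment, but the user-written copy constructor allocates a new int.
bool deepCopySharesData()
{
	DeepHolder c(1);
	DeepHolder d = c;
	*d.data = 2;
	return *c.data == 2;
}
```

Calling shallowCopySharesData() from main should return true (both objects point at the same int), while deepCopySharesData() should return false, because each DeepHolder has its own allocation.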


Still on my fundamentals trip, I’m hitting up the ‘this’ pointer. Every non-static member function of a class that you create has a ‘this’ pointer invisibly passed to it by the compiler. Let’s look at a simple class to see what’s going on:

class Example
{
private:
	int a;
public:
	void setA(int value) { a = value; }
	int  getA()          { return a;  }
};

When you write the above code the compiler does some fun things with it, such as invisibly adding four methods (C++11 and later may also add a move constructor and a move assignment operator, which we’ll ignore here):

  • A default constructor (that takes no parameters) which is automatically executed when you instantiate an object of this type,
  • A destructor (again no parameters) which is automatically executed when an object of this type is deleted or goes out of scope,
  • A copy constructor (that takes another object of this type) and copies each member from the source object to the (new) destination object, which amounts to a shallow copy wherever a member is a pointer, and
  • An assignment operator (that takes another object of this type) and which again copies each member from that object to the object you’re assigning to.

If we explicitly write these four methods into our class, we end up with code that is (for our purposes here) equivalent:

class Example
{
private:
	int a;
public:
	// Constructor
	Example()
	{
		// Do nothing
	}
 
	// Destructor
	~Example()
	{
		// Do nothing
	}
 
	// Copy constructor
	Example(const Example& rhs)
	{
		a = rhs.a;
	}
 
	// Overloaded assignment operator
	Example& operator=(const Example& rhs)
	{
		if (this == &rhs)
			return *this;
 
		a = rhs.a;
		return *this;
	}
 
	void setA(int value) { a = value; }
	int  getA()          { return a;  }
};

You can substitute either of these classes into a project, compile it (in Release mode; if you have both in there, just comment each out in turn!), and with optimisations on you’ll most likely end up with byte-wise identical executables. The two classes behave the same way in everything we do in this article, although strictly speaking the language treats them a little differently: once you write these methods yourself the class is no longer ‘trivial’, which matters for things like copying objects around with memcpy. Don’t take my word for it; try it out, if you’d like!
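One caveat, offered as a sketch rather than gospel: the language itself does distinguish between the two versions in a way that type traits can detect. This assumes a C++11 compiler, and the names ImplicitExample and ExplicitExample are hypothetical stand-ins for the two versions of the class above:

```cpp
#include <type_traits>

// The first version: the compiler supplies all four special methods.
class ImplicitExample
{
private:
	int a;
public:
	void setA(int value) { a = value; }
	int  getA()          { return a;  }
};

// The second version: we wrote the four special methods ourselves.
class ExplicitExample
{
private:
	int a;
public:
	ExplicitExample() {}
	~ExplicitExample() {}
	ExplicitExample(const ExplicitExample& rhs) { a = rhs.a; }
	ExplicitExample& operator=(const ExplicitExample& rhs)
	{
		if (this == &rhs)
			return *this;

		a = rhs.a;
		return *this;
	}

	void setA(int value) { a = value; }
	int  getA()          { return a;  }
};

// Compiler-generated copy operations are "trivial"; user-written ones are not.
static_assert(std::is_trivially_copyable<ImplicitExample>::value,
              "implicit version is trivially copyable");
static_assert(!std::is_trivially_copyable<ExplicitExample>::value,
              "explicit version is not trivially copyable");
```

Both checks happen at compile time, so if the snippet compiles, the difference is real. It doesn’t change how the classes behave in the rest of this article.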

The ‘this’ pointer’s already being used, but what exactly is it doing? Well, let’s drill down into the nuts and bolts of it and take a look…

What is this?

As I’ve kind of been forced to jump ahead of myself here (because the copy constructor and assignment operator we wrote used the this pointer already), let’s cut to the chase and ask the question: what is this?

It turns out that ‘this’ is a pointer to the current object (i.e. it holds the object’s memory address). Every object must exist in memory, and for us to do anything useful with the object, we need to know where in memory it is. And that is where ‘this’ comes into things.

We wrote some simple setters and getters earlier, using code as follows:

void setA(int value) { a = value; }
int  getA()          { return a;  }

However, when the compiler is making this into an executable, it does some further injecting so that our code can function correctly. In effect, it treats our code as if we had written:

void setA(int value) { this->a = value; }
int  getA()          { return this->a;  }

The code “this->a” (with -> being the arrow operator, which dereferences a pointer and accesses a member of the object it points to) takes the this pointer (which, as we said, is the memory address of the object) and, because the compiler knows the offset of every property within the class, adds the offset of the a member (in this example) to arrive at that member’s memory address. Once we have the memory address for a specific property of a specific object, we can “set” or “get” it by simply reading or writing however many bytes (depending on the size of the property; an int is generally 4 bytes) at that memory address.
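One way to picture the hidden pointer is to write it out by hand. The sketch below is an analogy, not what any particular compiler literally emits; ExampleData, the self parameter name and roundTrip are all hypothetical names introduced for illustration:

```cpp
#include <cassert>

// Hypothetical stand-in for the Example class, with a public member so that
// free functions can reach it.
struct ExampleData
{
	int a;
};

// Roughly what a member function looks like once the hidden parameter is
// written out: the object's address arrives as an ordinary pointer argument.
void setA(ExampleData* self, int value) { self->a = value; }
int  getA(const ExampleData* self)      { return self->a;  }

// Sets and then reads back a value through the explicit pointer.
int roundTrip(int value)
{
	ExampleData obj{0};
	setA(&obj, value); // compare with obj.setA(value)
	return getA(&obj); // compare with obj.getA()
}
```

Here roundTrip(42) returns 42: passing &obj by hand plays the same role that ‘this’ plays when you call obj.setA(42).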

Memory layout

When we create a class, we can look at the memory size of each instance of the class, as well as the offsets of class properties, using the sizeof operator and the offsetof macro.

Let’s give this a go by recreating our Example class to give it a few extra data members, and then print out the size of an Example instance and the offset of any properties:

#include <iostream>
#include <stddef.h> // Required for offsetof
 
using namespace std;
 
class Example
{
public:
	char   someChar;     // 1 byte per char
	int    intArray[3];  // 4 bytes per int * 3 ints = 12 bytes
	double someDouble;   // 8 bytes per double
};
 
int main()
{
	Example testObject;
 
	cout << "Example class object size: " << sizeof(Example) << " bytes" << endl;
	cout << "Individual class property sizes are:" << endl;
	cout << "- someChar           = " << sizeof(char) << " bytes," << endl;
	cout << "- intArray (3 ints)  = " << sizeof(int) * 3 << " bytes, and" << endl;
	cout << "- someDouble         = " << sizeof(double) << " bytes." << endl << endl;
	cout << "Class property offsets are: " << endl;
	cout << "- someChar   offset: " << offsetof(Example, someChar  ) << endl;
	cout << "- intArray   offset: " << offsetof(Example, intArray  ) << endl;
	cout << "- someDouble offset: " << offsetof(Example, someDouble) << endl;
 
	return 0;
}

The output of which (sizes and offsets can vary by platform and compiler) is:

Example class object size: 24 bytes
Individual class property sizes are:
- someChar           = 1 bytes,
- intArray (3 ints)  = 12 bytes, and
- someDouble         = 8 bytes.
 
Class property offsets are:
- someChar   offset: 0
- intArray   offset: 4
- someDouble offset: 16

So 1 + 12 + 8 = … um, 21 bytes… so why is the size of each instance 24 bytes? Again, more complexity: this time it’s byte alignment. At this stage (and in this article) we’re not too fussy about the hows and whys, so we’ll just say that the compiler pads the memory allocation so that each class property starts at an address that suits its type. Here, 3 bytes of padding follow someChar so that intArray can start at offset 4, which gives 1 + 3 + 12 + 8 = 24. You can think of it as the compiler’s way of justifying the data if you like:

[Figure: ByteAlignment2, the byte alignment of the Example class properties]
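If you’d like the compiler to confirm the alignment rules rather than take them on trust, here is a small sketch assuming a C++11 compiler (for alignof and static_assert). The paddingBytes constant is a hypothetical name, and its value depends on your platform:

```cpp
#include <cstddef> // offsetof, std::size_t

class Example
{
public:
	char   someChar;
	int    intArray[3];
	double someDouble;
};

// Each property must start at a multiple of its own alignment requirement.
static_assert(offsetof(Example, intArray) % alignof(int) == 0,
              "intArray starts on an int boundary");
static_assert(offsetof(Example, someDouble) % alignof(double) == 0,
              "someDouble starts on a double boundary");

// The whole object is padded to a multiple of its alignment, so that
// arrays of Example keep every element aligned.
static_assert(sizeof(Example) % alignof(Example) == 0,
              "object size is a multiple of its alignment");

// Bytes the compiler added on top of the raw property sizes.
// Platform-dependent: with the sizes and offsets printed above it is 24 - 21.
const std::size_t paddingBytes =
	sizeof(Example) - (sizeof(char) + sizeof(int[3]) + sizeof(double));
```

Nothing here prints anything; if the file compiles, the three alignment claims hold on your platform.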

This is easy

As a final demonstration of the difference between this (the memory address of an object instance) and *this (the actual object at that memory address), we’ll do one final run-through of how we can use the ‘this’ pointer in our own methods, if we so choose:

#include <iostream>
 
using namespace std;
 
class Test
{
public:
	// Default constructor
	Test() : value(0) {}
 
	// Constructor
	Test(int theValue) : value(theValue)
	{
		// Nothing to do here - the value is set via the member initialiser list for efficiency
		// That is, when using an initialiser list we:
		//      1 - Initialise members directly with the given values in one hit
		//
		// If we'd assigned any values here in the constructor body, we would instead be doing:
		//      1 - Default-initialise members first (a default constructor call for class types),
		//      2 - Assign parameter values over the top of the existing values
		//
		// Hence, using an initialiser list can be quicker (for a plain int it makes little difference).
	}
 
	// Return the 'this' pointer for the instance
	// Note: It's a pointer to an object of type Test so we have to use pointer to Test as
	// the return type. We could, optionally, return void*, but there's no reason to as
	// we know the type of object the pointer will be pointing to!
	Test* getThis() {
		return this;
	}
 
	// Dereference and return the this pointer so we return the actual instance/object itself, not the address of it
	Test& getThisDereferenced() {
		return *this;
	}
 
	// Overloaded assignment operator which returns a reference to a Test object for chaining
	Test& operator=(const Test& rhs)
	{
		// Avoid self-assignment
		if (this == &rhs)
		{
			return *this;
		}
 
		value = rhs.value;
		return *this;
	}
 
	// Methods to set and get our value property
	void setValue(int someValue) { value = someValue; }
	int  getValue()              { return value;      }
 
private:
	// Class member properties - just a single int in this case
	int value;
};
 
int main()
{
	// Create an instance of Test called foo and assign it the value 123
	Test foo(123);
 
	// Print what 'this' is in relation to the foo instance
	cout << "In terms of our foo object, 'this' is: " << endl;
	cout << foo.getThis() << " with value: " << foo.getValue() << endl << endl;
 
	// Create a second instance of Test called bar
	Test bar(456);
	cout << "In terms of our bar object, 'this' is: " << endl;
	cout << bar.getThis() << " with value: " << bar.getValue() << endl << endl;
 
	// Assign foo's '*this' to bar. getThisDereferenced returns a reference to foo,
	// which our overloaded assignment operator then copies into bar
	bar = foo.getThisDereferenced();
	cout << "After assigning foo.getThisDereferenced to bar, bar's 'this' is:" << endl;
	cout << bar.getThis() << " with value: " << bar.getValue() << endl;
	cout << "Note: Memory address of bar has not changed!" << endl << endl;
 
	// Modify the value of the bar instance
	bar.setValue(789);
	cout << "After modifying the value of bar, bar's 'this' is:" << endl;
	cout << bar.getThis() << " with value: " << bar.getValue() << endl << endl;
 
	// Demonstrate that foo is still its own, separate object with no reference to bar
	cout << "After modifying bar, foo's 'this' is still: " << endl;
	cout << foo.getThis() << " and its value is still: " << foo.getValue() << endl << endl;
	cout << "This is proof that even after assigning foo to bar, we did a member-by-member ";
	cout << "copy rather than making bar's value property be a reference to foo's value property!" << endl;
	cout << "We actually performed a shallow-copy (each member's value was copied as-is)." << endl;
	cout << "If our class had memory allocated on the heap we'd have to write our own" << endl;
	cout << "custom code to perform the deep copy of heap data between objects!" << endl << endl;
 
	cout << "Final chaining test using our overloaded assignment operator." << endl;
	// Instantiate a final Test object and chain assignments.
	// Note: We could have easily written this as 'Test baz = bar = foo' if we'd wanted
	Test baz;
	baz = bar = foo;
	cout << "After assigning foo to bar and bar to baz, baz's this is:" << endl;
	cout << baz.getThis() << " with value: " << baz.getValue() << endl << endl;
 
	return 0;
}

Wrapup

The very final ‘loophole’ that was bothering me was: as we have a copy constructor, what happens when we do this?

Example example(example);

That is, we instantiate an instance of a class using the copy constructor with a copy of itself, which is currently un-instantiated! The answer, disappointingly, is not a lot. It compiles and appears to work, but strictly speaking reading the uninitialised properties is undefined behaviour, so it’s not something to rely on. In our Example class, the properties remain undefined/uninitialised, and if you do the same thing with an int, via:

int foo(foo);

Then you get the same behaviour: try printing ‘foo’ and you’ll typically get whatever garbage was already in the memory at that location (again, technically undefined behaviour). If you had a serious class, and especially if you had one that allocated memory on the heap, then you’d want to deal gracefully with such shenanigans, much as the self-assignment guard does in our assignment operator. We’ll look at that in the next post, which’ll be on creating templated resizable arrays, if you’re up for it ;-)

Cheers!
