How To: Read and Write ASCII and Binary Files in C++

I’m working on my C++ fundamentals at the moment, so I took a little time to cover reading and writing objects to/from files in ASCII and binary formats. I’ve worked with Java’s Serializable mechanism before and liked it, so I was interested to see how C++ handled serialisation. In the usual C++ way, it does so at a lower level than you might expect.

Reading Strings From Binary Files

One of the interesting things I found was that if you write just the characters of a string to a binary file, you can’t reliably get it back, because nothing in the file records its length. As you might imagine, one solution to this is to add a null-terminator (i.e. ‘\0’) at the end of your string, so you can pull it back in char-by-char and stop when you hit the terminator. This isn’t going to be very fast as we’re reading in such tiny (1 Byte) chunks – but depending on your scenario it may be perfectly acceptable:

// Method to read a string from a binary file by reading a char at a time until we hit the null-terminator
static string readBinaryString()
{
	string tempString = "";
	char tempChar;
 
	do {
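		// Read a single char worth of data from the file (i.e. 1 byte), and bail out if the
		// read fails (e.g. we hit the end of the file) so we can't loop forever
		if ( !infileBinary.read( &tempChar, 1 ) )
		{
			break;
		}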
 
		// If the character we read isn't a null-terminator, add it onto the end of the string
		if (tempChar != '\0')
		{
			tempString += tempChar;
		}
 
	} while (tempChar != '\0'); // Keep trucking until we hit the null-character that signifies the end of the string
 
	// Return the string we've just built
	return tempString;
}

There are other ways of reading strings back from binary files, like writing the size of the string first, and then the string itself. When reading, you’d then read the size first (for example, as a 4 Byte unsigned int), and then read however many Bytes of data that size specifies. Or, alternatively, you can simply not use strings – instead, have a fixed size char array so, for example, name is always 30 characters. You might not use all 30, but even if you waste 20 chars or so on average – so what? It’s only 20 Bytes. Then, you can just go: read 30 Bytes of data and store it in my name char array, no null-terminator required.
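
Here’s a rough sketch of those two alternatives, just to make them concrete. None of this is in the source code further down: the helper names (writeSizedString, readSizedString, writeFixedName, readFixedName, roundTripDemo) and the NAME_SIZE constant are made up for illustration, and it assumes the file is read back on a machine with the same byte order as the one that wrote it.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Option 1: write a 4-byte length, then the characters themselves.
// Assumes the string is under 4 GiB and that reader and writer share a byte order.
void writeSizedString(std::ostream& out, const std::string& s)
{
    const std::uint32_t length = static_cast<std::uint32_t>(s.size());
    out.write(reinterpret_cast<const char*>(&length), sizeof(length));
    out.write(s.data(), length);
}

// Read the length first, then pull in that many bytes in a single read.
// Returns false if the stream runs out of data.
bool readSizedString(std::istream& in, std::string& s)
{
    std::uint32_t length = 0;
    if (!in.read(reinterpret_cast<char*>(&length), sizeof(length)))
    {
        return false;
    }

    s.resize(length);
    if (length > 0 && !in.read(&s[0], length))
    {
        return false;
    }
    return true;
}

// Option 2: always store exactly NAME_SIZE bytes, padded with '\0'.
// Names longer than NAME_SIZE are truncated.
const std::size_t NAME_SIZE = 30;

void writeFixedName(std::ostream& out, const std::string& name)
{
    char buffer[NAME_SIZE] = {};   // Zero-filled, so unused bytes are '\0'
    name.copy(buffer, NAME_SIZE);  // Copies at most NAME_SIZE characters
    out.write(buffer, NAME_SIZE);
}

bool readFixedName(std::istream& in, std::string& name)
{
    char buffer[NAME_SIZE];
    if (!in.read(buffer, NAME_SIZE))
    {
        return false;
    }

    // Keep everything up to the first padding byte (or all 30 chars if there is none)
    name.assign(buffer, std::find(buffer, buffer + NAME_SIZE, '\0'));
    return true;
}

// Round-trips two names through an in-memory stream, one per layout
bool roundTripDemo()
{
    std::stringstream file(std::ios::in | std::ios::out | std::ios::binary);
    writeSizedString(file, "Alice");
    writeFixedName(file, "Bob");

    std::string first, second;
    return readSizedString(file, first) && first == "Alice"
        && readFixedName(file, second) && second == "Bob";
}
```

The helpers take std::ostream / std::istream references, so the same calls should work with an ofstream / ifstream opened in binary mode. The length-prefixed version costs 4 extra Bytes per string but reads it back in one go, while the fixed-size version trades wasted padding for records that are always the same size.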

Why Use Binary?

Another interesting thing is why you’d bother storing things in binary in the first place – once your data’s in binary it’s not human-readable anymore, you need a hex editor if you really want to modify it, and just… why? The answer lies in size and efficiency.

If we want to store an integer value, then typically an int is 4 Bytes. So let’s say we had an unsigned int and we wanted to store the maximum value we can fit in those 4 Bytes. In that case we’d be storing the value 4,294,967,295 (just under 4.3 billion). Now, if we wanted to store that as a char array to keep things human readable, how many chars would we need? Counting them up gives us 10 chars worth of data (we won’t count the commas – I just put those in for formatting, and we’ll leave off any kind of terminator for now).

So, at 1 Byte of data per char, that’s 10 Bytes. What about if we wanted to store it in binary format? Well, we already said that one unsigned int takes 4 Bytes – so there’s our answer – which means we’ve saved 6 Bytes of bloat on this one value alone. When you start working with 3D models, which might have thousands or even millions of vertices, each of which is typically 3 floats (x/y/z location) and maybe another 3 floats on top of that per vertex (x/y/z surface normal), and we may also have colour values (r/g/b/a) and maybe texture coordinates (s/t/[also p if 3D textures]) – we’re looking at a huge amount of data in a single model (and this doesn’t even account for multiple animation frames of a model!).
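
If you’d rather check that arithmetic than take my word for it, here’s a tiny sketch. The function names textSize and binarySize are made up for this example, and it uses std::uint32_t so the “4 Bytes” part doesn’t depend on how big an int happens to be on your platform.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Bytes needed to store the value as decimal text (no commas, no terminator)
std::size_t textSize(std::uint32_t value)
{
    return std::to_string(value).size();
}

// Bytes needed to store the same value in raw binary form
std::size_t binarySize(std::uint32_t value)
{
    return sizeof(value);
}
```

Calling textSize(4294967295u) gives 10, while binarySize(4294967295u) gives 4, which is where the 6 Bytes of saving per value comes from.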

When our files get this big we’re never going to manually go into the data and twiddle it, so we gain nothing by keeping it human-readable. In fact, we lose time in loading/processing/writing the data as it will be significantly larger in ASCII than pure binary. So even though it’s a little bit more work to set up the read/write in binary, we gain a lot down the line by doing so.

Those of you who’re sharp as a tack will have noticed that I picked a large unsigned integer value (10 chars) to store in our example, and you may well be wondering: what if we stored a small value, like something between 0 and 9, instead? Well, in that specific case we’d actually save 3 Bytes by storing it as a char (1 Byte) rather than an unsigned int (4 Bytes). But… if you’re dealing with such a small range, a single digit char (1 Byte) can only hold a value between 0 and 9, while the same Byte treated as a signed numerical type can hold a value in the range -128 to +127 (or 0 to 255 unsigned) for the exact same storage. In effect, you’re never going to gain anything by using chars over numeric types!

If you really wanted to maximise your storage you can go further by using a tightly packed bit field (cppreference.com, wikipedia) instead of a ‘wasteful’ numeric type with all its fancy-pants byte alignment =P (Although bit-scrimper beware: unpacking non-boundary aligned data can come with performance penalties!).
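
As a rough illustration of the bit field idea (this isn’t part of the source code below), here’s a hypothetical colour record squeezed into 5/6/5 bits per channel. The struct and function names are invented for the example, and the exact size and layout of a bit field is up to your compiler, so treat it as a sketch rather than a portable file format.

```cpp
#include <cstdint>

// An RGB colour using a full 16-bit value per channel
struct PlainColour
{
    std::uint16_t red;
    std::uint16_t green;
    std::uint16_t blue;
};

// The same three channels packed into 5, 6 and 5 bits
struct PackedColour
{
    std::uint16_t red   : 5; // 0 to 31
    std::uint16_t green : 6; // 0 to 63
    std::uint16_t blue  : 5; // 0 to 31
};

// Builds a packed colour, masking each channel so it always fits its field
PackedColour makePackedColour(unsigned red, unsigned green, unsigned blue)
{
    PackedColour colour;
    colour.red   = red   & 0x1Fu; // Keep the low 5 bits
    colour.green = green & 0x3Fu; // Keep the low 6 bits
    colour.blue  = blue  & 0x1Fu; // Keep the low 5 bits
    return colour;
}
```

On the compilers I’d expect most people to be using, sizeof(PackedColour) comes out smaller than sizeof(PlainColour), but the standard leaves the details implementation-defined, so check it on your own setup before relying on it.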

Anyways, that’s the theory. You can find some heavily commented source code for reading/writing objects (which include a string property) to files in ASCII and binary formats below. In this simple example we write and then read two objects to a file in ASCII mode, and then do the exact same thing in binary mode, which results in file sizes of: 156 Bytes (ASCII) Vs. 33 Bytes (Binary).

Cheers!

Source Code

/***
Project: Reading and writing objects in ASCII and binary
Author : r3dux
Date   : 02/12/2013
Version: 0.1
***/
 
#include <iostream>
#include <fstream>
#include <string>
 
using namespace std;
 
class Test
{
public:
	// Member properties
	string name;
	int age;
	float heightInMetres;
	short lastThreeTestScores[3];
 
	// Static filestream objects
	static ofstream outfileASCII;
	static ifstream infileASCII;
 
	static ofstream outfileBinary;
	static ifstream infileBinary;
 
	// Method to write a record to the outfile
	void writeASCIIRecord()
	{
		if (outfileASCII)
		{
			// IMPORTANT: The single space character after each label is what the reader splits on
			outfileASCII << "Name: " << name << endl;
			outfileASCII << "Age: " << age  << endl;
			outfileASCII << "HeightInMetres: " << heightInMetres << endl;
			outfileASCII << "LastThreeTestScores: " << lastThreeTestScores[0] << " " << lastThreeTestScores[1] << " " << lastThreeTestScores[2] << endl << endl;
		}
		else
		{
			cout << "ASCII file not open for writing!" << endl;
		}
	}
 
	// Method to read a record from the infile in ASCII format
	void readASCIIRecord()
	{
		// If the file's open, then read it...
		if (infileASCII)
		{
			// Note: We throw away the humanReadablePrefix and split on the space between it
			// and our actual data
			string humanReadablePrefix;
			infileASCII >> humanReadablePrefix >> name;
			infileASCII >> humanReadablePrefix >> age;
			infileASCII >> humanReadablePrefix >> heightInMetres;
			infileASCII >> humanReadablePrefix >> lastThreeTestScores[0] >> lastThreeTestScores[1] >> lastThreeTestScores[2];
 
			// NOTE: If the line we read is empty it moves onto the next line so we don't need to read
			// the blank line we placed between records here. However, if you start mixing in getline()
			// calls this is no longer the case! For example, try reading JUST the name using:
			//
			//  getline(infileASCII, name);
			//
			// And read everything else with the >> operator as above. It gets the first name as
			// "Name: Alice" (which you'd expect it to), but then fails to get the second name.
			// This is because >> stops right after the last test score and leaves the newline in
			// the stream, so the next getline() call just reads the (empty) rest of that line.
			// Take away: Be careful when mixing extraction operators and getline - odd things happen.
		}
		else
		{
			cout << "File not open for reading!" << endl;
		}
	}
 
	// Method to write a record to file in binary format
	void writeBinaryRecord()
	{
		// IMPORTANT: Name must be null terminated so we can extract it from the binary file
		// Note: If using C++11, instead of using the trick of de-referencing the
		// reverse_iterator, we can instead call:
		//
		//      char lastChar = name.back();
		//
		if (!name.empty())
		{
			char lastChar = *name.rbegin();
 
			// Name string not null terminated? Make it null terminated.
			if (lastChar != '\0')
			{
				name += '\0';
			}
		}
		else // Name is empty, so...
		{
			name = "NoNameSet"; // ...specify a placeholder name...
			name += '\0';       // ...and add the null terminator.
		}
 
		if (outfileBinary)
		{
			// Note: name.size() includes the null-terminator we added above, so it gets written to the file too
			outfileBinary.write( name.c_str()               , name.size()       ); // Note: We don't have to go name.size() * sizeof(char) here as a char is 1 byte!
			outfileBinary.write( (char*)&age                , sizeof(int)       );
			outfileBinary.write( (char*)&heightInMetres     , sizeof(float)     );
			outfileBinary.write( (char*)&lastThreeTestScores, sizeof(short) * 3 );
		}
		else
		{
			cout << "Binary file not open for writing!" << endl;
		}
	}
 
	// Method to read a string from a binary file by reading a char at a time until we hit the null-terminator
	static string readBinaryString()
	{
		string tempString = "";
		char tempChar;
 
		do {
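			// Read a single char worth of data from the file (i.e. 1 byte), and bail out if the
			// read fails (e.g. we hit the end of the file) so we can't loop forever
			if ( !infileBinary.read( &tempChar, 1 ) )
			{
				break;
			}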
 
			// If the character we read isn't a null-terminator, add it onto the end of the string
			if (tempChar != '\0')
			{
				tempString += tempChar;
			}
 
		} while (tempChar != '\0'); // Keep trucking until we hit the null-character that signifies the end of the string
 
		// Return the string we've just built
		return tempString;
	}
 
	// Method to read a record from the infile
	void readBinaryRecord()
	{
		// If our file is open for reading...
		if (infileBinary)
		{
			// We have to read strings in a special way as we don't know their size...
			name = readBinaryString();
 
			// ...but we do know the sizes (in bytes) of ints, floats and our array of 3 shorts.
			infileBinary.read( (char*)&age, sizeof(int) );
			infileBinary.read( (char*)&heightInMetres, sizeof(float) );
			infileBinary.read( (char*)&lastThreeTestScores, sizeof(short) * 3 );
		}
		else
		{
			cout << "Binary file not open for reading!" << endl;
		}
	}
 
	// Open and close the outfile for writing in ASCII
	static void openToWriteASCII()  {
		outfileASCII.open("testfileASCII.txt");
	}
	static void closeOutfileASCII() {
		outfileASCII.close();
	}
 
	// Open and close the infile for reading in ASCII
	static void openToReadASCII()  {
		infileASCII.open("testfileASCII.txt");
	}
	static void closeInfileASCII() {
		infileASCII.close();
	}
 
	// Open and close the outfile for writing in binary
	static void openToWriteBinary()  {
		outfileBinary.open("testfileBinary.bin", ios_base::binary);
	}
	static void closeOutfileBinary() {
		outfileBinary.close();
	}
 
	// Open and close the infile for reading binary
	static void openToReadBinary()  {
		infileBinary.open("testfileBinary.bin", ios_base::binary);
	}
	static void closeInfileBinary() {
		infileBinary.close();
	}
 
	// Method to dump the object properties to the screen
	void print()
	{
		cout << "Name                  : " << name << endl;
		cout << "Age                   : " << age  << endl;
		cout << "Height in Metres      : " << heightInMetres << endl;
		cout << "Last Three Test Scores: " << lastThreeTestScores[0] << "\t" << lastThreeTestScores[1] << "\t" << lastThreeTestScores[2] << endl  << endl;
	}
};
 
/*** IMPORTANT: We MUST instantiate the static objects declared in the class! ***/
ofstream Test::outfileASCII;
ifstream Test::infileASCII;
 
ofstream Test::outfileBinary;
ifstream Test::infileBinary;
 
int main()
{
	// Create two objects of our Test class
	Test test;
	test.name = "Alice";
	test.age  = 33;
	test.heightInMetres = 1.7f;
	test.lastThreeTestScores[0] = 123;
	test.lastThreeTestScores[1] = 456;
	test.lastThreeTestScores[2] = 789;
 
	Test test2;
	test2.name = "Bob";
	test2.age  = 66;
	test2.heightInMetres = 3.4f;
	test2.lastThreeTestScores[0] = 111;
	test2.lastThreeTestScores[1] = 222;
	test2.lastThreeTestScores[2] = 333;
 
	// Print out their details so we know what to expect
	cout << "BEFORE writing to file..." << endl;
	test.print();
	test2.print();
 
	// Open the outfile for writing, write the two objects to it and close the file to save it
	Test::openToWriteASCII();
	test.writeASCIIRecord();
	test2.writeASCIIRecord();
	Test::closeOutfileASCII();
 
	// Create two new objects and read a saved record into each object
	Test test3, test4;
	Test::openToReadASCII();
	test3.readASCIIRecord();
	test4.readASCIIRecord();
 
	// Display the records we just read back from the file to ensure the data is the same
	cout << "AFTER reading data back from ASCII file: " << endl;
	test3.print();
	test4.print();
 
	Test::closeInfileASCII(); // Close the input file now we're done with it
 
	// Write test3 and test4 objects in binary format
	cout << "Writing retrieved objects as binary..." << endl << endl;
	Test::openToWriteBinary();
	test3.writeBinaryRecord();
	test4.writeBinaryRecord();
	Test::closeOutfileBinary();
 
	// Read binary objects back in
	cout << "Reading binary data back into new objects..." << endl;
	Test test5, test6;
	Test::openToReadBinary();
	test5.readBinaryRecord();
	test6.readBinaryRecord();
	test5.print();
	test6.print();
 
	// Don't forget to close the file now we're done with it
	Test::closeInfileBinary();
 
	return 0;
}
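
On the getline() note in readASCIIRecord() above: here’s a small sketch of what’s going on when you mix >> and getline(), plus one common way around it. The function names and the “age on one line, name on the next” layout are made up for this example (they’re not the record format used above), and skipping whitespace with std::ws is just one option.

```cpp
#include <istream>
#include <sstream>
#include <string>

// Naive version: >> leaves the '\n' after the number in the stream, so
// getline() returns straight away with an empty name.
bool readAgeThenNameNaive(std::istream& in, int& age, std::string& name)
{
    if (!(in >> age))
    {
        return false;
    }
    return static_cast<bool>(std::getline(in, name));
}

// Fixed version: skip the leftover newline (and any other leading
// whitespace) before asking getline() for the rest of the record.
bool readAgeThenName(std::istream& in, int& age, std::string& name)
{
    if (!(in >> age))
    {
        return false;
    }
    in >> std::ws;
    return static_cast<bool>(std::getline(in, name));
}

// Runs both versions over the same text and compares what they read
bool mixingDemo()
{
    std::istringstream naive("33\nAlice Smith\n");
    std::istringstream fixed("33\nAlice Smith\n");

    int age = 0;
    std::string naiveName, fixedName;
    readAgeThenNameNaive(naive, age, naiveName);
    readAgeThenName(fixed, age, fixedName);

    // The naive read gets an empty name, the fixed one gets the full line
    return naiveName.empty() && fixedName == "Alice Smith";
}
```

The upside of getline() is that names containing spaces survive, which the >> based reader above can’t manage because it splits on whitespace.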
