I’m working on my C++ fundamentals at the moment, so I took a little time to cover reading and writing objects to/from files in ASCII and binary formats. I’ve worked with Java’s Serializable mechanism before and liked it, so I was interested to see how C++ handled serialisation. In the usual C++ way, it does so at a lower level than you might expect.
Reading Strings From Binary Files
One of the interesting things I found was that if you write just the characters of a string to a binary file, you can’t reliably get it back, because nothing in the file records its length. One solution is to add a null-terminator (i.e. '\0') at the end of your string, so you can pull it back in char-by-char and stop when you hit the terminator. This isn’t going to be very fast as we’re reading in such tiny (1 Byte) chunks, but depending on your scenario it may be perfectly acceptable:
```cpp
// Method to read a string from a binary file by reading a char at a time until we hit the null-terminator
static string readBinaryString()
{
    string tempString = "";
    char tempChar = '\0';

    do
    {
        // Read a single char worth of data from the file (i.e. 1 byte)
        infileBinary.read( &tempChar, 1 );

        // If the read failed (for example at the end of the file), stop rather than loop forever
        if (!infileBinary) { break; }

        // If the character we read isn't a null-terminator, add it onto the end of the string
        if (tempChar != '\0') { tempString += tempChar; }

    } while (tempChar != '\0'); // Keep trucking until we hit the null-character that signifies the end of the string

    // Return the string we've just built
    return tempString;
}
```
There are other ways of reading strings back from binary files, like writing the size of the string first, and then the string itself. When reading, you’d then read the size first (for example, as a 4 Byte unsigned int), and then read as many Bytes of data as that size specifies. Alternatively, you can simply not use strings: instead, have a fixed-size char array so that, for example, name is always 30 characters. You might not use all 30, but even if you waste 20 chars or so on average, so what? It’s only 20 Bytes. Then you can just read 30 Bytes of data and store them in the name char array, no null-terminator required.
Why Use Binary?
Another interesting question is why you’d bother storing things in binary in the first place. Once your data’s in binary it’s not human-readable anymore, you need a hex editor if you really want to modify it, and just… why? The answer lies in size and efficiency.
If we want to store an integer value, then typically an int is 4 Bytes. So let’s say we had an unsigned int and we wanted to store the maximum value we can fit in those 4 Bytes. In that case we’d be storing the value 4,294,967,295 (just under 4.3 billion). Now, if we wanted to store that as a char array to keep things human-readable, how many chars would we need? Counting them up gives us 10 chars worth of data (we won’t count the commas, as I just put those in for formatting, and we’ll leave off any kind of terminator for now).
So, at 1 Byte of data per char, that’s 10 Bytes. What about if we wanted to store it in binary format? Well, we already said that one unsigned int takes 4 Bytes, so there’s our answer, which means we’ve saved 6 Bytes of bloat on this one value alone. When you start working with 3D models, which might have thousands or even millions of vertices, each of which is typically 3 floats (x/y/z location) and maybe another 3 floats on top of that per vertex (x/y/z surface normal), and we may also have colour values (r/g/b/a) and maybe texture coordinates (s/t/[also p if 3D textures]), we’re looking at a huge amount of data in a single model (and this doesn’t even account for multiple animation frames of a model!).
When our files get this big we’re never going to manually go into the data and twiddle it, so we gain nothing by keeping it human-readable. In fact, we lose time in loading/processing/writing the data as it will be significantly larger in ASCII than pure binary. So even though it’s a little bit more work to set up the read/write in binary, we gain a lot down the line by doing so.
Those of you who’re sharp as a tack will have noticed that I picked a large unsigned integer value (10 chars) to store in our example, and you may well be wondering what happens if we store a small value, like something between 0 and 9, instead. Well, in that specific case we’d actually save 3 Bytes by storing it as a char (1 Byte each) rather than an unsigned int (4 Bytes each). But… if you were dealing with such a small range, a single digit character (1 Byte) can only represent 0 to 9, while the same Byte treated as a signed numerical type can store a value in the range -128 to +127 (or 0 to 255 unsigned) for the exact same storage. In effect, you’re never going to gain anything by storing numbers as chars rather than as numeric types!
If you really wanted to maximise your storage you can go further by using a tightly packed bit field (cppreference.com, wikipedia) instead of a ‘wasteful’ numeric type with all its fancy-pants byte alignment =P (Although bit-scrimper beware: unpacking non-boundary aligned data can come with performance penalties!).
Anyways, that’s the theory. You can find some heavily commented source code for reading/writing objects (which include a string property) to files in ASCII and binary formats below. In this simple example we write two objects to a file and read them back in ASCII mode, and then do the exact same thing in binary mode, which results in file sizes of: 156 Bytes (ASCII) vs. 33 Bytes (Binary).
Cheers!
Source Code
```cpp
/***
    Project: Reading and writing objects in ASCII and binary
    Author : r3dux
    Date   : 02/12/2013
    Version: 0.1
***/

#include <iostream>
#include <fstream>
#include <string>

using namespace std;

class Test
{
public:
    // Member properties
    string name;
    int    age;
    float  heightInMetres;
    short  lastThreeTestScores[3];

    // Static filestream objects
    static ofstream outfileASCII;
    static ifstream infileASCII;
    static ofstream outfileBinary;
    static ifstream infileBinary;

    // Method to write a record to the outfile
    void writeASCIIRecord()
    {
        if (outfileASCII)
        {
            // IMPORTANT: The single space character at the end of each label is what the reader splits on
            outfileASCII << "Name: "                << name                   << endl;
            outfileASCII << "Age: "                 << age                    << endl;
            outfileASCII << "HeightInMetres: "      << heightInMetres         << endl;
            outfileASCII << "LastThreeTestScores: " << lastThreeTestScores[0] << " "
                                                    << lastThreeTestScores[1] << " "
                                                    << lastThreeTestScores[2] << endl << endl;
        }
        else
        {
            cout << "ASCII file not open for writing!" << endl;
        }
    }

    // Method to read a record from the infile in ASCII format
    void readASCIIRecord()
    {
        // If the file's open, then read it...
        if (infileASCII)
        {
            // Note: We throw away the humanReadablePrefix and split on the space between it
            // and our actual data
            string humanReadablePrefix;
            infileASCII >> humanReadablePrefix >> name;
            infileASCII >> humanReadablePrefix >> age;
            infileASCII >> humanReadablePrefix >> heightInMetres;
            infileASCII >> humanReadablePrefix >> lastThreeTestScores[0] >> lastThreeTestScores[1] >> lastThreeTestScores[2];

            // NOTE: The >> operator skips leading whitespace (including newlines), so we don't need
            // to read the blank line we placed between records here. However, if you start mixing in
            // getline() calls this is no longer the case! For example, try reading JUST the name using:
            //
            //      getline(infileASCII, name);
            //
            // And read everything else with the >> operator as above. It gets the first name as
            // "Name: Alice" (which you'd expect it to), but then fails to get the second name.
            // This is because >> leaves the newline after the last value unread, so the next getline()
            // call returns the empty remainder of that line instead of the line we wanted.
            // Take away: Don't mix extraction operators and getline without dealing with the leftover newline.
        }
        else
        {
            cout << "File not open for reading!" << endl;
        }
    }

    // Method to write a record to file in binary format
    void writeBinaryRecord()
    {
        // IMPORTANT: Name must be null terminated so we can extract it from the binary file
        // Note: If using C++11, instead of using the trick of de-referencing the
        // reverse_iterator, we can instead call:
        //
        //      char lastChar = name.back();
        //
        if (!name.empty())
        {
            char lastChar = *name.rbegin();

            // Name string not null terminated? Make it null terminated.
            if (lastChar != '\0')
            {
                name += '\0';
            }
        }
        else // Name is empty, so...
        {
            name = "NoNameSet"; // ...specify a placeholder name...
            name += '\0';       // ...and add the null terminator.
        }

        if (outfileBinary)
        {
            // Write the name including the null terminator we added above, which is what the reader stops on
            outfileBinary.write( name.c_str(), name.size() ); // Note: We don't have to go name.size() * sizeof(char) here as a char is 1 byte!
            outfileBinary.write( (char*)&age                , sizeof(int)       );
            outfileBinary.write( (char*)&heightInMetres     , sizeof(float)     );
            outfileBinary.write( (char*)&lastThreeTestScores, sizeof(short) * 3 );
        }
        else
        {
            cout << "Binary file not open for writing!" << endl;
        }
    }

    // Method to read a string from a binary file by reading a char at a time until we hit the null-terminator
    static string readBinaryString()
    {
        string tempString = "";
        char tempChar = '\0';

        do
        {
            // Read a single char worth of data from the file (i.e. 1 byte)
            infileBinary.read( &tempChar, 1 );

            // If the read failed (for example at the end of the file), stop rather than loop forever
            if (!infileBinary) { break; }

            // If the character we read isn't a null-terminator, add it onto the end of the string
            if (tempChar != '\0') { tempString += tempChar; }

        } while (tempChar != '\0'); // Keep trucking until we hit the null-character that signifies the end of the string

        // Return the string we've just built
        return tempString;
    }

    // Method to read a record from the infile
    void readBinaryRecord()
    {
        // If our file is open for reading...
        if (infileBinary)
        {
            // We have to read strings in a special way as we don't know their size
            name = readBinaryString();

            // ...but we do know the sizes (in bytes) of ints, floats and our array of 3 shorts.
            infileBinary.read( (char*)&age                , sizeof(int)       );
            infileBinary.read( (char*)&heightInMetres     , sizeof(float)     );
            infileBinary.read( (char*)&lastThreeTestScores, sizeof(short) * 3 );
        }
        else
        {
            cout << "Binary file not open for reading!" << endl;
        }
    }

    // Open and close the outfile for writing in ASCII
    static void openToWriteASCII()   { outfileASCII.open("testfileASCII.txt"); }
    static void closeOutfileASCII()  { outfileASCII.close();                   }

    // Open and close the infile for reading in ASCII
    static void openToReadASCII()    { infileASCII.open("testfileASCII.txt");  }
    static void closeInfileASCII()   { infileASCII.close();                    }

    // Open and close the outfile for writing in binary
    static void openToWriteBinary()  { outfileBinary.open("testfileBinary.bin", ios_base::binary); }
    static void closeOutfileBinary() { outfileBinary.close();                                      }

    // Open and close the infile for reading binary
    static void openToReadBinary()   { infileBinary.open("testfileBinary.bin", ios_base::binary);  }
    static void closeInfileBinary()  { infileBinary.close();                                       }

    // Method to dump the object properties to the screen
    void print()
    {
        cout << "Name                  : " << name                   << endl;
        cout << "Age                   : " << age                    << endl;
        cout << "Height in Metres      : " << heightInMetres         << endl;
        cout << "Last Three Test Scores: " << lastThreeTestScores[0] << "\t"
                                           << lastThreeTestScores[1] << "\t"
                                           << lastThreeTestScores[2] << endl << endl;
    }
};

/*** IMPORTANT: We MUST instantiate the static objects declared in the class! ***/
ofstream Test::outfileASCII;
ifstream Test::infileASCII;
ofstream Test::outfileBinary;
ifstream Test::infileBinary;

int main()
{
    // Create two objects of our Test class
    Test test;
    test.name                   = "Alice";
    test.age                    = 33;
    test.heightInMetres         = 1.7f;
    test.lastThreeTestScores[0] = 123;
    test.lastThreeTestScores[1] = 456;
    test.lastThreeTestScores[2] = 789;

    Test test2;
    test2.name                   = "Bob";
    test2.age                    = 66;
    test2.heightInMetres         = 3.4f;
    test2.lastThreeTestScores[0] = 111;
    test2.lastThreeTestScores[1] = 222;
    test2.lastThreeTestScores[2] = 333;

    // Print out their details so we know what to expect
    cout << "BEFORE writing to file..." << endl;
    test.print();
    test2.print();

    // Open the outfile for writing, write the two objects to it and close the file to save it
    Test::openToWriteASCII();
    test.writeASCIIRecord();
    test2.writeASCIIRecord();
    Test::closeOutfileASCII();

    // Create two new objects and read a saved record into each object
    Test test3, test4;
    Test::openToReadASCII();
    test3.readASCIIRecord();
    test4.readASCIIRecord();

    // Display the records we just read back from the file to ensure the data is the same
    cout << "AFTER reading data back from ASCII file: " << endl;
    test3.print();
    test4.print();
    Test::closeInfileASCII(); // Close the input file now we're done with it

    // Write test3 and test4 objects in binary format
    cout << "Writing retrieved objects as binary..." << endl << endl;
    Test::openToWriteBinary();
    test3.writeBinaryRecord();
    test4.writeBinaryRecord();
    Test::closeOutfileBinary();

    // Read binary objects back in
    cout << "Reading binary data back into new objects..." << endl;
    Test test5, test6;
    Test::openToReadBinary();
    test5.readBinaryRecord();
    test6.readBinaryRecord();
    test5.print();
    test6.print();

    // Don't forget to close the file now we're done with it
    Test::closeInfileBinary();

    return 0;
}
```