How To: Shrink/Compress a VirtualBox Windows Guest Machine

When you create a VirtualBox virtual machine, you get the option to choose either a fixed-size hard drive (in which case the entire amount of space is allocated and fixed immediately), or a dynamically expanding hard drive for the virtual machine’s OS and file storage (i.e. you specify a maximum size for the hard drive, it starts at 0 bytes, and increases as necessary up to the maximum you gave it on creation – at which point the drive is full).

Linux and Windows via VirtualBox

This is all well and good, but the problem with dynamic storage is that although it’s more than happy to increase in size, it doesn’t come down in size again when data is removed. So, to give an example, if you created a 10GB dynamic disk for a virtual machine it starts off at 0 bytes, you install a 1GB operating system and the drive is now 1GB in size (and hence taking up 1GB of space on your actual hard drive), you rip a 4.7GB DVD to the virtual drive which makes its size now 5.7GB, you delete the DVD rip so only the OS remains – and you might think that the “dynamic” drive will automatically shrink back down to 1GB, only it doesn’t. You’re holding on to 4.7GB of unrecoverable* bloat. Lucky you… =P

You could rip another DVD and re-use that space without the drive expanding any further, but really, it’s just going to increase and increase, and you’ll know in your heart of hearts that when you’re running low on disk space you could really do with that space back on your real hard drive. In VMware you can compact the drive image as a menu option, but in VirtualBox we have to do a three step process… So, shall we?

* = unrecoverable unless you jump through the below 3 hoops, or create a new dynamic drive image, expand it to a size just over the size of your data on your original drive, then raw-copy the data across, that is… IMHO the steps below are easier!

A-B-C, Easy As…

  1. Defrag your Windows guest machine
  2. Now we need to replace all the “blank” (but still taking up space!) areas of our drive with zeros so they can be recognised and stripped out later. Thankfully, this is really easy. Just download the free (and tiny – 47KB) command-line utility SDelete from Microsoft and run it from within your virtual Windows guest machine with the following command (the -c switch is important! Note that newer releases of SDelete swapped the meanings of -c and -z, so check the usage text and pick whichever switch it describes as “zero free space”):
    sdelete -c
  3. Once that’s finished running, shut down your virtual machine, navigate to the folder where your virtual machine hard drive is (such as ~/.VirtualBox/HardDisks), then from your host system run the following command to compress the hard drive down to a more reasonable size:
    VBoxManage modifyhd NAME-OF-YOUR-VIRTUAL-HARD-DRIVE.vdi --compact

    So if your virtual machine name (and thus by default the hard drive name) was “XP_Client_1”, then you’d use:

    VBoxManage modifyhd XP_Client_1.vdi --compact

With that done I trimmed down an excessively bloated 25GB .VDI of Windows 7 into a still excessively bloated 15GB – but that’s just in the nature of Microsoft OSes… =P
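If you want to see what you actually got back, the host-side step can be wrapped up along these lines. This is just a sketch: the function names (file_bytes, compact_vdi) are made up for illustration, it assumes the guest is shut down and already zeroed, and it sticks to the modifyhd spelling used above (newer VirtualBox releases document the same command as modifymedium, as far as I know).

```shell
#!/bin/bash
# Sketch only: compact a VDI from the host and report how much space came back.
# Assumes the guest is shut down and its free space has already been zeroed.
# file_bytes and compact_vdi are hypothetical helper names, not part of VirtualBox.

file_bytes() {
    # Size of a file in bytes (the arithmetic expansion strips any padding from wc)
    echo $(( $(wc -c < "$1") ))
}

compact_vdi() {
    local vdi="$1"
    if [ ! -f "$vdi" ]; then
        echo "No such disk image: $vdi" >&2
        return 1
    fi

    local before after
    before=$(file_bytes "$vdi")
    VBoxManage modifyhd "$vdi" --compact || return 1
    after=$(file_bytes "$vdi")

    # Report in whole megabytes (1 MB = 1048576 bytes)
    echo "Before: $(( before / 1048576 )) MB"
    echo "After : $(( after / 1048576 )) MB"
    echo "Saved : $(( (before - after) / 1048576 )) MB"
}
```

Paste it into a script and call it with the path to your image, e.g. compact_vdi ~/.VirtualBox/HardDisks/XP_Client_1.vdi.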

Update: If you get an error stating things along the lines of:

VBoxManage: error: Cannot register the hard disk 'BLAH' because a hard disk 'BLAH' with UUID {LOTS-O-HEX} already exists

then you can fix it like this:

  1. Detach the drive from your virtual machine,
  2. Edit the file ~/.VirtualBox/VirtualBox.xml and remove all lines with the drive you want to compact mentioned in the HardDrives section (Note: be careful you don’t delete the virtual machine entry itself from the MachineRegistry section! Only remove the drive from the HardDrives section.),
  3. Now you’ll be able to compact the drive, and when it’s done you can re-attach the drive to your virtual machine. Good as old! =D
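Before hacking at VirtualBox.xml by hand, it’s worth taking a copy and finding out exactly which lines mention the drive. A rough sketch of that, assuming you search by the .vdi filename (the function name list_disk_entries is made up, and searching by the bare VM name would also match the machine entry you want to keep):

```shell
#!/bin/bash
# Sketch: back up VirtualBox.xml, then list every line that mentions a given disk
# so you know exactly what to remove by hand. Nothing is deleted automatically.
# list_disk_entries is a hypothetical helper name.

list_disk_entries() {
    local config="$1"   # e.g. ~/.VirtualBox/VirtualBox.xml
    local disk="$2"     # e.g. XP_Client_1.vdi

    # Keep a pristine copy before anything gets edited
    cp -p "$config" "$config.bak" || return 1

    # -n prints line numbers, -F treats the disk name as plain text rather than a regex
    grep -nF -- "$disk" "$config"
}
```

Calling list_disk_entries ~/.VirtualBox/VirtualBox.xml XP_Client_1.vdi prints the matching lines with their line numbers, and leaves a VirtualBox.xml.bak next to the original in case the edit goes wrong.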

Snap Happy

Another thing you can do to decrease the disk usage of VirtualBox machines is get rid of all your snapshots if you don’t need them anymore. Each snapshot is basically an entire disk image which you can roll back to, so if you have Windows 7 installed it’s about 7GB or so after a fresh install, if you then put on 500MB of patches and take another snapshot you’re storing another 7.5GB. If you then install Office or something and that takes up 2GB and take yet another snapshot you’re burning through yet another 9.5GB, so we’re up to 24GB already for a single drive holding 9.5GB of data! (Strictly speaking, VirtualBox stores each snapshot as a differencing image containing only the blocks changed since the previous one, so treat those figures as the worst case, but the space still adds up quickly.)

You should definitely be careful when merging snapshots into the main image (basically getting rid of the snapshots), as it has the potential to break, but more likely it has the potential to confuse and cause you to throw away data you didn’t mean to. This is because of some particularly ambiguous and misleading phrasing used in VirtualBox circles – the crux of the matter being:

  • When you restore a snapshot, it will throw the current state and/or any subsequent snapshots away and leave the machine in the state defined by the snapshot you’re restoring. This can mean a lot of changes which currently exist in the image being undone, and a lot of files disappearing: for example, the only copy of important documents created since the snapshot you’re restoring was made. Use with care.
  • When you delete a snapshot, it will actually merge the current state of the machine into the snapshot before removing that snapshot and leaving the machine at its current merged state but without the snapshot existing…

Yeah, I know it’s confusing, so just be careful, okay? If you’ve got the space available just take a copy of the .VDI file from the HardDisks folder AS WELL AS a copy of the snapshots by copying the folder with the name of your VM from the Machines folder, and then merge in the snapshots – this way if it all goes nuts you can throw away the knackered copy and replace it with your pristine pre-merge copies.
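That pre-merge backup could be scripted something like this. It’s a sketch under assumptions: the function name backup_vm is made up, and it assumes both the HardDisks and Machines folders live under ~/.VirtualBox and that the disk is named after the VM, so pass a different base folder as the third argument if yours lives elsewhere.

```shell
#!/bin/bash
# Sketch: copy a VM's disk image and its machine folder somewhere safe before merging snapshots.
# Assumes the layout <base>/HardDisks/<vm>.vdi and <base>/Machines/<vm>.
# backup_vm is a hypothetical helper name.

backup_vm() {
    local vm="$1"                          # VM name, e.g. XP_Client_1
    local dest="$2"                        # where the copies should go
    local base="${3:-$HOME/.VirtualBox}"   # VirtualBox data folder (assumed default)

    mkdir -p "$dest" || return 1

    # The main disk image...
    cp -p "$base/HardDisks/$vm.vdi" "$dest/" || return 1

    # ...and the machine folder, which is where the snapshots live
    cp -Rp "$base/Machines/$vm" "$dest/" || return 1

    echo "Backed up $vm to $dest"
}
```

Run it with the VM shut down, e.g. backup_vm XP_Client_1 /some/backup/folder, and only then start merging.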


Credits: Thanks to Damien for his article at MakeTechEasier for the initial information (you can also find out how to compress Linux guests there too, but just be aware that the technique he outlines involves cloning the drives then shrinking and re-importing them) and to Alphatek’s article for the simplification!

How To: Compress Each Folder/Directory to Separate Archives in Linux

Let’s say you’ve got a bunch of folders taking up a large swathe of space which you never really use but want to keep, just not taking up stacks of your NAS… How can you easily compress them all up to individual archives of each folder? Dead easy:

    #!/bin/bash
    for folder in */
    do
      7z a -mx9 -mmt "${folder%/}.7z" "$folder"
    done

Save that to a file, chmod +x it and run in the location you want to compress the folders. Every folder (and all contents within) will be compressed to its own foldername.7z archive.

With 7z, -mx9 is the flag for maximum compression, and -mmt says to use multiple CPUs to speed up compression, so omit that part if you’re on a single core machine.

2016 Now-That-I-Think-Of-It Update

The way I use this script is by simply putting it into a file called zipeach, making it executable, and then moving it to /usr/local/bin – which makes it convenient to be able to compress all folders in your pwd at whim.

Also, in case it’s important to you, the 7z format does not maintain file permissions, so if you need to preserve file permissions then you’ll likely want to compress each folder into a .tar archive, and then compress that into a .tar.7z.
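A permission-preserving variant might look like the sketch below: tar bundles each folder (and records the permissions), and 7z compresses the tar stream straight from stdin via its -si switch. The function name tar7z_each is made up for illustration.

```shell
#!/bin/bash
# Sketch: archive each folder with tar (which records permissions),
# then compress the tar stream with 7z into foldername.tar.7z.
# tar7z_each is a hypothetical helper name.

tar7z_each() {
    local folder name
    for folder in */; do
        # If there are no folders, the pattern comes through unexpanded - skip it
        [ -d "$folder" ] || continue

        name="${folder%/}"

        # tar writes the archive to stdout (-f -); 7z reads it from stdin (-si)
        tar -cf - "$name" | 7z a -si -mx9 -mmt "$name.tar.7z" || return 1
    done
}
```

Call tar7z_each from the directory holding your folders. To get a folder back with its permissions, something like 7z x -so foldername.tar.7z | tar -xpf - should do the trick.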

How To: Compress a Directory of Files into Individual Archives

I’ve got a stack of files all thrown together in the same directory, and I wanted them compressed – simple enough, eh? Only thing is I wanted each file compressed to its own archive, so I can see at a glance what’s there, and if for some reason an archive gets corrupted, it’s just one file lost and I can replace it instead of having to dick around repairing corrupted “blob” archives that contain the entire bunch of files. And I want to be able to specify all files with a given file extension to compress.

Although I wouldn’t be surprised if you could do this in 4 lines of Perl, I don’t know flippin’ Perl (yet), so I wrote a bash script to do the job.

    #!/bin/bash
    #
    # Purpose: Script to compress all files of given extension to individual archives using 7z
    # Usage  : <script> <extension-without-prefix-dot>   e.g. n64
    # Author : r3dux
    # Date   : 16/04/2009
    #

    count=0   # File counter

    # If there's no copy of 7z - we're not going to be doing much compressing... Exit stage left.
    if ! command -v 7z >/dev/null 2>&1; then
        echo "No copy of 7z found on system! Try running: sudo apt-get install p7zip-full"
        exit 1
    fi

    # If we have 7z, and have been given a file extension parameter...
    if [ -n "$1" ]; then

        # Stop the script from compressing its own archives should user mistakenly enter 7z as filetype to compress...
        if [ "$1" = "7z" ]; then
            echo "Recursion neatly sidestepped - no 7z filetypes ya scurvy seadog! =P"
            exit 1
        fi

        echo "Starting Zipeach..."

        # For each file with the given extension in the current directory...
        for file in ./*."$1"; do

            # If a file exists with given extension...
            # (when nothing matches, bash hands us the unexpanded pattern, which won't exist)
            if [ -e "$file" ]; then

                # Compress the file with maximum compression (-mx9) and use multiple threads for multiCPU machines (-mmt)
                # NOTE: Remove -mmt flag to run this on a single CPU box...
                7z a -mx9 -mmt "$file".7z "$file"

                # Increment our file counter
                count=$((count + 1))
            else
                # If no file of required extension has been found then notify user and quit
                echo "No files of extension .$1 found in current directory!"
                exit 1
            fi # End of if file exists condition

        done # End of for each file loop

        # Exit when all files compressed
        echo # Cheap blank line =P
        echo "Zipeach completed. $count files of extension .$1 found and compressed."
        exit 0

    else
        # No extension parameter given? No worky...
        echo
        echo "Please run the script with an extension to compress i.e. \"$(basename "$0") n64\" to compress each .n64 file into its own 7z archive."
        exit 1
    fi

Bash, as it turns out, is a fiddly, finicky beast in that you really have to think about what the command-line will see under different circumstances and enclose variables in inverted commas or not in very precise ways (see this article to understand what I mean). All that if / fi stuff too… very odd.
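To make the quoting business concrete, here’s a tiny sketch showing how many arguments a command actually receives with and without the quotes. The filename is made up, and count_args is just a helper for the demo:

```shell
#!/bin/bash
# Sketch: how many arguments does a command see, quoted vs unquoted?

count_args() {
    # Print the number of arguments this function received
    echo "$#"
}

file="Super Mario 64.n64"   # hypothetical filename containing spaces

unquoted=$(count_args $file)     # word-split into: Super / Mario / 64.n64
quoted=$(count_args "$file")     # passed through as one argument

echo "Unquoted: $unquoted argument(s)"
echo "Quoted  : $quoted argument(s)"
```

Unquoted, the one filename turns into three separate arguments, which is exactly how a command like 7z ends up looking for files that don’t exist. Quoted, it stays as one.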

To use the script, copy and paste it into a text file (in my case I’ve called it, save it, make the file executable using chmod +x and move it to /usr/bin or something so it’s in your path using sudo mv ./ /usr/bin/ – then run it inside any directory you want to zip files to individual archives by calling it with nds (for example) to compress all the .nds (Nintendo DS roms) in a folder into individual archives. Or use this link ;)

Anyway, job done – suggestions? improvements? props? death-threats? Let me know below!

Oh, and cheers to James McDonald for his WP-Syntax hack to stop all the embedded code appearing on a single (incredibly long) line!