Voice recognition – we’re not quite there yet

Voice recognition is only ever going to get better (check out CMU Sphinx if you want to play with speech rec. for yourself), but at the moment… um, it’s not quite up to scratch:

Okay, so he’s putting it on a bit for the lulz, and if he’d just said “where’s the nearest pub?” he’d probably get a useful answer – but that’s half the battle with speech recognition. At the moment we have to adapt our speech to the software, but in the future I don’t think we’ll have to at all.

At least for the time being, as Penny Arcade put it, it’s all going to be a bit like this:

Penny Arcade Kinect Integration - Jan-2012

Best keep your deer-combs handy.

How-To: Easily Remove the Vocals from Most Songs

2015 Shortcut: When I wrote this article Audacity didn’t have an automatic center-panned vocal canceling effect… but now it does, so rather than do the stereo-separate / invert-one-track / play-both-as-mono trick (and that’s pretty much all there is to it), you should be able to find the Vocal Remover option in the Effects menu – but it’s more fun / interesting and can give better results if you do it yourself! =D

Audacity now has a built-in center-panned vocal canceling effect.

I found this trick the other day whilst stumbling the Interwebs and thought I’d do a quick write-up w/ pictures to make it as easy as possible… For this exercise we’re going to be using a piece of free audio software called Audacity, which you can get for Linux, Windows and Mac.

The track I’m using in this example is the first 50 seconds of Ben Folds – Zak and Sara, where the voice kicks in at the 11 second mark, and the original sounds like this:

Once you’ve got a copy of Audacity for your platform of choice, fire it up and follow these simple steps to get rid of the vocals from most songs:

1.) Import Some Audio

From the menu in Audacity, choose File | Import | Audio and then select an mp3 (or any audio format Audacity understands) to work with.

Audacity - Import

2.) Duplicate The Tracks

We’re going to come back later and use the bass from this to give it a nice, full sound – but for now just duplicate your imported audio by going to Edit | Duplicate:

Audacity - Duplicate

Once you’ve duplicated the tracks, we’ll mute our copy for now by clicking on the Mute button to the left of the waveform as shown:

Audacity - Mute

3.) Separate Our Original Tracks, Convert To Mono and Invert One Of Them

This is the key part of the process. Vocal tracks on songs are commonly recorded as mono and then mixed equally into both stereo channels, so if we split the stereo track, invert one channel and play both back as mono, anything that’s identical in the two channels cancels itself out! And since usually only the vocal waveform is identical (i.e. mono mixed to stereo) it’s only the vocals that magically disappear from the sound! Ha!

So, to start off we need to click on the little down-arrow to the left of our original wave form and select Split Stereo Track:

Audacity - Split Stereo Track

Once the waveform’s been split (so we can mess with both channels individually) double click in the lower of the two waveforms (the right channel) to select it all, and then from the menu choose Effect | Invert as shown:

Audacity - Invert Right Channel

Now for the last really important step – simply set both left and right channels to output as mono by clicking on the little down-arrow to the left of each waveform and selecting Mono. Don’t forget to set both of them to Mono or the magic won’t happen!

Audacity - Convert to Mono

With that done, give it a play and see what happens! With any luck, there won’t be any vocals in the track – so with my example, it now sounds like this:

You’ll notice at the end that the vocals come back (the backing singing etc.) – why? Because they weren’t recorded as a mono source, and hence don’t get cancelled out by the inversion we did earlier. So this technique won’t work for all songs – only ones where the voice is recorded in mono and then mixed into stereo, which to be fair, I think is a pretty large swathe of ’em. It’d be perfect for karaoke or something like that anyway, because you’d want the backing vocals there!

If you wanted to know more about how this wave-form cancellation works, you can always look up Superposition of Waves, but I’ll leave that as an exercise for the curious =D
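For the curious, here’s a minimal sketch of the cancellation in plain Python. It isn’t what Audacity does internally, just the same arithmetic on some made-up sine waves: the "vocal", "guitar" and "backing" signals, their frequencies and their pan levels are all assumptions for illustration. The vocal is identical in both channels, so inverting the right channel and summing wipes it out, while anything that differs between the channels survives.

```python
import math

SAMPLE_RATE = 8000  # samples per second (arbitrary for this sketch)


def sine(freq_hz, n_samples, amplitude=1.0):
    """Return n_samples of a sine wave at freq_hz."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE)
            for i in range(n_samples)]


def cancel_centre(left, right):
    """Invert the right channel and sum it with the left (i.e. left - right)."""
    return [l + (-r) for l, r in zip(left, right)]


n = 800
vocal = sine(440, n)         # centre-panned: identical in both channels
guitar = sine(330, n, 0.5)   # hypothetical part panned hard left
backing = sine(550, n, 0.5)  # hypothetical wide part: louder on the right

left = [v + g + 0.2 * b for v, g, b in zip(vocal, guitar, backing)]
right = [v + 0.8 * b for v, b in zip(vocal, backing)]

result = cancel_centre(left, right)

# What should remain: the guitar plus the *difference* in the backing part.
expected = [g + (0.2 - 0.8) * b for g, b in zip(guitar, backing)]
max_error = max(abs(r - e) for r, e in zip(result, expected))
print(f"largest deviation from the vocal-free mix: {max_error:.2e}")
```

The printed deviation should be down at floating-point rounding level, i.e. the vocal term has dropped out completely, while the unevenly panned backing part is still in there. That’s the same reason the backing singing comes back at the end of my example.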

4.) Filter Our Original To Add Back The Bass

Update: BigFuz points out in the comments below that an easier way than using equalisation to filter our copy so that it only keeps the bass is to use a Low Pass filter and just enter a value of 200Hz or 250Hz (whichever works best for you). You won’t be able to add back both bass and treble with a single pass using this method, but you may not want or need to! To apply a low pass filter to the copy, you can just select Effects | Effects 1 to 9 | Low Pass Filter from the menu – too easy! Relatedly (and yeah, it’s a bit obvious, but I use this to keep track myself), a quick way to remember which way around low-pass/high-pass goes is to think that a low pass filter allows everything below the given frequency to pass through, so a high pass filter must allow any frequencies higher than what you provide to pass through.
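To make the low-pass idea a bit more concrete, here’s a rough sketch of the simplest possible low-pass filter (a first-order, one-pole filter) in plain Python. This is an assumption-laden toy and not Audacity’s implementation, whose filter rolls off far more steeply; the 60Hz and 4000Hz test tones are just values I picked to stand in for "bass" and "not bass".

```python
import math

SAMPLE_RATE = 44100  # samples per second


def low_pass(samples, cutoff_hz, sample_rate=SAMPLE_RATE):
    """One-pole low-pass: each output moves a fraction alpha towards the input."""
    dt = 1.0 / sample_rate
    rc = 1.0 / (2 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def tone(freq_hz, seconds=0.5):
    """A full-scale sine wave at freq_hz."""
    n = int(SAMPLE_RATE * seconds)
    return [math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE) for i in range(n)]


def settled_peak(samples):
    """Peak level of the second half, skipping the filter's start-up transient."""
    half = samples[len(samples) // 2:]
    return max(abs(s) for s in half)


bass_peak = settled_peak(low_pass(tone(60), cutoff_hz=200))
treble_peak = settled_peak(low_pass(tone(4000), cutoff_hz=200))
print(f"60Hz tone keeps {bass_peak:.0%} of its level")
print(f"4000Hz tone keeps {treble_peak:.0%} of its level")
```

The tone below the 200Hz cutoff passes almost untouched and the one well above it is knocked right down, which is all "low pass" means.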

The voice-cancelled audio above sounds pretty good, and the vocals are definitely gone, but in the process we’ve stripped out a lot of the lower frequency sounds (i.e. the bass). So remember when we duplicated our waveform and muted it right at the beginning? This is where it fits in…

Un-mute our duplicated (and still stereo) audio copy by clicking on the Mute button to the left of the waveform, double click on the waveform to select it all, and then from the menu choose Effect | Equalization as shown:

Audacity - Equalisation

When the equalisation window pops up, we’re going to filter it so that all sounds above 200Hz are stripped out. To do this, just click somewhere on the main part of the window and a white dot will appear, click again and another will – then click on them to drag them around until you get a shape that looks kinda like this:

Audacity - Only Keep Bass

Notice that I’ve dragged the bottom-left slider all the way down to get access to the full 120dB and not just the 30dB on the scale by default.

You might have to have a bit of a play to get it right, but all we’re really doing is saying “Leave anything with a frequency of 200Hz or less alone, but drop the volume of anything over that frequency by around 120dB” (i.e. remove it entirely!).
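If you’re wondering why 120dB counts as "remove it entirely" when the default 30dB doesn’t, the standard decibel-to-amplitude conversion shows it. This is just a quick sketch of that formula; the function name is my own.

```python
def db_to_gain(db):
    """Convert a level change in decibels to an amplitude multiplier."""
    return 10 ** (db / 20)


for cut in (-30, -120):
    print(f"{cut:>5} dB -> amplitude x {db_to_gain(cut):.6f}")
```

A 30dB cut still leaves roughly 3% of the original amplitude, which can be audible, whereas a 120dB cut leaves about one millionth of it.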

If you mute our top two mono tracks and play it back, you should get the filtered version of the stereo track with only the bass remaining, which for my example sounds like this:

5.) Un-mute Our Original Voice Cancelled Tracks

With the vocal-free (but a bit tinny) audio playing at the same time as our bass-only version, we get a pretty neat sound with good bass and no vocals! Result! =D

You can then just go to File | Export to save the finished vocal-free version to an mp3 or such, if you wanted to keep it.
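If you’d rather script the whole thing than click through it, here’s a hedged sketch of steps 2 to 5 using only Python’s standard library. It’s my own approximation and not what Audacity does under the hood: it assumes a 16-bit stereo WAV (not an mp3), uses a gentle one-pole low-pass in place of the EQ curve, collapses the bass copy to mono, and halves the levels to avoid clipping. The `remove_vocals` function and its `bass_cutoff_hz` parameter are hypothetical names.

```python
import math
import struct
import wave


def remove_vocals(in_path, out_path, bass_cutoff_hz=200.0):
    """Write a mono WAV of (left - right) plus low-passed bass; return frame count."""
    with wave.open(in_path, "rb") as src:
        if src.getnchannels() != 2 or src.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit stereo WAV files")
        rate = src.getframerate()
        raw = src.readframes(src.getnframes())

    # 16-bit stereo PCM is interleaved: L0, R0, L1, R1, ...
    samples = struct.unpack("<%dh" % (len(raw) // 2), raw)
    left, right = samples[0::2], samples[1::2]

    # One-pole low-pass coefficient for the bass copy (assumption: a simple
    # first-order filter, much gentler than a -120dB EQ curve).
    dt = 1.0 / rate
    rc = 1.0 / (2 * math.pi * bass_cutoff_hz)
    alpha = dt / (rc + dt)

    out = []
    bass = 0.0
    for l, r in zip(left, right):
        no_vocals = (l - r) / 2        # step 3: invert right, sum as mono
        mono = (l + r) / 2             # the duplicate, collapsed to mono here
        bass += alpha * (mono - bass)  # step 4: keep only the low end
        mixed = no_vocals + bass       # step 5: play both together
        out.append(max(-32768, min(32767, int(round(mixed)))))

    with wave.open(out_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(2)
        dst.setframerate(rate)
        dst.writeframes(struct.pack("<%dh" % len(out), *out))
    return len(out)
```

You’d call it with something like `remove_vocals("song.wav", "song_no_vocals.wav")`, where both file names are placeholders. Because the filter here is so gentle, expect a bit more vocal bleed in the bass than you’d get with the steep EQ curve above.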

Wrap Up

I’ve read that some people like to cut out the sections between 200Hz and 1000Hz or so (1kHz, although I’ve also seen people push it up to 6kHz) to keep the low-end and high-end sounds, but when I was playing with this I kept getting some voice creeping back into the mix. This could well have been because I was only dropping 30dB when I was messing around with it though – so go nuts and experiment if ya wanna!

The shape I used for that EQ setting was:

Audacity - Keep High and Low Only

With that all said and done, I hope you found this guide useful – I didn’t come up with the technique or anything like that, I just saw a 10 line how-to and had to mess around for half an hour to get it to work, so thought I could knock up a quick guide that shows how it’s done really clearly, and I hope you have fun with the technique!


New XBox 360 Motion Capture System

Looks interesting, and I really don’t want to be a hater, but you’ve got to wonder about some things:

– How fine a degree of control will you have?
– If I’m playing a skateboarding game, can the camera tell if I want the board to do a heelflip, or a heelflip shove-it? Ya know, is it accurate?
– How accurate do -I- have to be?
– How knackering will playing fighting games be on a scale of 1 to fuck-that?
– How’s a FPS gonna work? One hand for gun angle and the other for forward/back/left/right/strafe?
– Who do I sue when I break my neck on the coffee table playing Mirror’s Edge 2?

Also, it’s all showing an idealised version of things, not real gameplay, as the kid’s fingers were covering part of the board when he scanned it, yet there were no fingers in the scanned image. Hmm, I’m coming off as a hater here – and I’m sure lots of cool uses for this will arise, I’m just not sure about how many of them aren’t kinda gimmicky…

Also, the voice recognition will be like this – no matter WHAT you ask it to do =P :


Okay, so I’m cynical as all hell AND a playah-hater… Oh, well. I really hope it works out, but I guess the only way to find out will be to wait and see…