For this chain, I want to focus on creating an incredibly detailed, present, and forward vocal but retain the natural sound of the performance and minimize the impact of unwanted artifacts.
We’ll start with phase alignment, subtractive EQ, compression, disharmonious frequency attenuation, and de-essing on our main channel.
Then on buses or sends we’ll perform a really unique form of saturation that results in a super controlled and clean sound, as well as amplifying mid frequencies without adding noise or distortion.
Lastly, we’ll route the vocal and sends to a collective bus, on which we’ll mimic how our ears compress low frequencies, add natural room reflections, clarify the vocal with high-frequency delay, and shape the overall sound with additive EQ.
This chain is somewhat complex and may seem like overkill to some, but it really does result in a super clean sound. Additionally, I’ll explain everything in detail as we go and offer free plugin alternatives, so stick around to get the best understanding of why this chain works.
Let’s listen to a full A/B comparison with peak-normalized tracks to hear how the chain affects the vocal.
I’ve detailed this step in the context of mastering, but I think it’s even more important when processing vocals, since the issue is more likely to occur.
In short, a recording can have a DC offset caused by the mic or preamp - this means the entire waveform is shifted toward the positive or negative side.
Ideally, the peaks and troughs of a waveform should be in equal proportion, centered on the zero line.
If a recording is shifted, any processor I use will misread the peaks - this is especially important with processors that rely on thresholds, like compressors, saturators, and more.
That said, it’s a good idea to fix it before processing.
With the iZotope RX standalone app, I’ll import my vocal recording, open the Phase module, then hit Suggest. It’ll show me if the vocal has an offset and by how much, and it will automatically fix the issue.
Note that nothing is deleted from the vocal, and it has no effect on the sound, only on how accurately my processors will measure it.
Unfortunately, I don’t know of a free alternative for this software, so if you do please let me and others know about it in the comments.
Also, there aren’t any tools or plugins I can find that create a DC offset, so my options for demonstrating it are limited. But let’s observe how a compressor attenuates more on a vocal that hasn’t had its offset fixed than on one that has.
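If you’re curious what the correction itself amounts to, here’s a minimal Python sketch - not iZotope’s algorithm, just the basic idea - that measures a mono recording’s DC offset as the mean of its samples and re-centers the waveform. The file names are placeholders.

```python
# Minimal sketch of DC offset removal (not iZotope's algorithm).
# Assumes a mono WAV file; the file names are hypothetical.
import numpy as np
from scipy.io import wavfile

rate, audio = wavfile.read("vocal_take.wav")
audio = audio.astype(np.float64)

dc_offset = audio.mean()          # how far the waveform sits off the zero line
print(f"DC offset: {dc_offset:.4f}")

centered = audio - dc_offset      # re-center the waveform
wavfile.write("vocal_take_centered.wav", rate, centered.astype(np.int16))
```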
Before we begin to introduce additive processing, let’s remove frequencies from the vocal that are disharmonious or present in excess - attenuating these problem frequencies will make the vocal sound more balanced and musical.
I’ll use the FabFilter Pro-Q 3, but a good free alternative is MEqualizer by Melda Audio - it only offers 6 bands, but it’s very useful nonetheless.
I’ll start with a high-pass filter to remove resonances picked up from the mic and mic stand - then I usually dip a little of 250Hz. Although there are rarely hard-set rules in audio, excess 250Hz almost always covers up or masks 2-5kHz, which is where a lot of the vocal’s clarity is.
Next, I want to figure out the key of the vocal and song. If I don’t know this, I’ll use TuneBat’s analyzer, which will tell me the key, an alternative key, and the BPM, which will be helpful later on.
To keep things simple, let’s say the song’s key is C Major - what I’d do is look and listen for frequencies in the mids that don’t fit into this key - so potentially D#, F#, or another note that isn’t in the key but has a high amplitude in the recording.
I’ll center a band on the frequency, use a slightly narrower Q value, and then attenuate by a couple of dB.
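If you’d rather calculate where those out-of-key notes sit than hunt for them on the analyzer, here’s a small sketch assuming equal temperament with A4 at 440Hz - it lists every note outside C Major across the vocal’s midrange.

```python
# Sketch: frequencies of notes *outside* a given major key (equal temperament, A4 = 440 Hz).
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]
MAJOR_STEPS = [0, 2, 4, 5, 7, 9, 11]          # semitone pattern of a major scale

def midi_to_hz(midi_note):
    return 440.0 * 2 ** ((midi_note - 69) / 12)

key_root = NOTE_NAMES.index("C")               # C Major, as in the example
in_key = {(key_root + step) % 12 for step in MAJOR_STEPS}

for midi_note in range(48, 96):                # C3 up to B6, roughly the vocal midrange
    if midi_note % 12 not in in_key:
        name = NOTE_NAMES[midi_note % 12]
        octave = midi_note // 12 - 1
        print(f"{name}{octave}: {midi_to_hz(midi_note):.1f} Hz")
```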
Lastly, I like to attenuate a little of the vocal’s sibilance or ‘Ess’ sounds in the highest frequency range. You can use your ears, but the easiest way is to observe the frequency analyzer and find where the energy is concentrated whenever a sibilant occurs.
Then I’ll attenuate it by 1-2dB with a bandwidth that matches that of the sibilant range.
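As a rough alternative to eyeballing the analyzer, here’s a sketch that takes an FFT of a single sibilant moment and reports where its energy peaks above 3kHz - the file name and the time position of the ‘ess’ are hypothetical, and a mono recording is assumed.

```python
# Sketch: find where a sibilant's energy is concentrated with a single FFT.
import numpy as np
from scipy.io import wavfile

rate, audio = wavfile.read("vocal_take.wav")         # hypothetical mono file
start = int(2.3 * rate)                              # a spot where an "ess" occurs (example)
frame = audio[start:start + 4096].astype(np.float64)

spectrum = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
freqs = np.fft.rfftfreq(len(frame), 1.0 / rate)

offset = np.count_nonzero(freqs <= 3000)             # only look above 3 kHz
peak = freqs[offset + np.argmax(spectrum[offset:])]
print(f"Sibilant energy peaks near {peak:.0f} Hz")
```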
Let’s take a listen to our vocal processed with subtractive EQ, and notice how it sounds slightly more balanced.
Next, let’s use compression to control the vocal’s dynamics but just as importantly, bring the vocal forward with post-compression amplification.
We’ll want a quick attack to capture the transient - one issue, though, is that fast attack times can cut into a transient, causing distortion.
To work around this, let’s use 2ms of lookahead, which lets the compressor see the transient slightly ahead of time, allowing it to attenuate the full transient without causing distortion.
Then we’ll set a 40-50ms release time to return the vocal to unity quickly, while avoiding the same type of distortion a super-quick release would cause.
A 4:1 ratio with a softer knee works well - but be sure to use your ears and try to avoid more than 6dB of attenuation.
Lastly, I’ll use automatic makeup gain to amplify the vocal post-compression. By reducing the peaks and then amplifying, we bring forward quieter details of the vocal that would’ve otherwise been masked by louder signals.
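To make the roles of the lookahead, release, ratio, and make-up gain concrete, here’s a simplified Python sketch of a peak compressor wired up the same way - it isn’t the Pro-C 2’s algorithm, and the threshold is just an example value.

```python
# Simplified peak compressor sketch: lookahead, 4:1 ratio, crude auto make-up gain.
import numpy as np

def compress(audio, rate, threshold_db=-18.0, ratio=4.0,
             attack_ms=1.0, release_ms=45.0, lookahead_ms=2.0):
    atk = np.exp(-1.0 / (rate * attack_ms / 1000.0))
    rel = np.exp(-1.0 / (rate * release_ms / 1000.0))
    look = int(rate * lookahead_ms / 1000.0)

    # The envelope follower reads the *undelayed* signal...
    env = 0.0
    gain_db = np.zeros(len(audio))
    for i, x in enumerate(np.abs(audio)):
        coeff = atk if x > env else rel
        env = coeff * env + (1.0 - coeff) * x
        level_db = 20.0 * np.log10(max(env, 1e-9))
        over = max(level_db - threshold_db, 0.0)
        gain_db[i] = -over * (1.0 - 1.0 / ratio)     # 4:1 -> keep 1/4 of the overshoot

    # ...while the audio itself is delayed, so attenuation lands on the full transient.
    delayed = np.concatenate([np.zeros(look), audio[:len(audio) - look]])
    out = delayed * 10 ** (gain_db / 20.0)

    makeup_db = -gain_db.mean()                      # crude stand-in for auto make-up gain
    return out * 10 ** (makeup_db / 20.0)
```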
Let’s listen and notice how the vocal sounds more controlled, and how the increased details from post-compression amplification make the vocalist sound closer to the listener.
Last chapter we used the FabFilter Pro-C2 for our compression, but a great free alternative is MCompressor by Melda Audio.
Using it will look a little different than last chapter, so let’s cover how to use it quickly in case you want to follow along but don’t have the Pro-C2.
With it, I’ll set a 1.5ms RMS setting, which will average the input amplitude, causing less erratic compression. Then I’ll set a 20ms attack time and a 20ms release - since we don’t have lookahead with this plugin, we’ll have to avoid super-short attack times to prevent distortion.
I’ll select the soft-knee setting, increase the knee size, and use a 4:1 ratio - again, trying to attenuate by no more than 6dB.
Lastly, in the output section, we can enable automatic gain compensation, which will adjust the output similar to the auto-make-up gain setting from last chapter.
Let’s take a listen and compare this compressor with the one we used last chapter to see if they create a similar sound.
In chapter 3, we discussed the idea of disharmonious frequencies or ones that exist outside the key of our track. Although we don’t want to get rid of these completely, it helps to attenuate them whenever they get too loud.
I’ll use this multi-band transient expander called Punctuate by Newfangled Audio, and select 10 bands as the setting. Then I’ll go over to the frequency section and, instead of using the preset Mel frequencies, alter these to frequencies that are outside the song’s key.
You’ll probably need to look these frequencies up first depending on the key of the vocal you’re working on, but I’ll set mine according to this particular vocal.
Once the frequencies are set, I’ll use a suppress setting to attenuate them by about 1-2dB whenever they trigger the plugin. Additionally, I’ll set the measurement to a multi-band setting to allow each band to work independently.
This way, whenever a particular out-of-key frequency is strong enough, the processor will attenuate it.
If you’re having trouble getting this to sound natural, use the mix dial and blend in the effect until the vocal sounds more musical but the processing itself isn’t too noticeable.
Unfortunately, I don’t know of a free multi-band, frequency-specific transient suppressor, but if you do let me and others know about it in the comment section.
Let’s listen to the processor, and notice how the vocal sounds more musical by dynamically reducing disharmonious signals.
Although we attenuated some of the sibilance in chapter 3, I want to do it dynamically with a high-frequency compressor, more commonly called a de-esser.
The idea is to center the band over the frequency range as we did in chapter 3 and then set the threshold carefully so that only the sibilance is attenuated.
If you have access to one with an attack and release, set a quick attack and a quick release to return the signal to unity quickly. Since we’re working with higher frequencies, we don’t need to worry as much about these quick settings causing distortion.
I usually attenuate by 3dB at most, but you might need to attenuate less or more depending on the vocalist, the EQ of the microphone used, and more.
Lastly, I’ll adjust the makeup gain to amplify as much as I attenuated. This will reduce the harshness of the discordant sibilance and amplify more musical-sounding highs.
I’ll use this Weiss de-esser, but a good alternative is T-De-Esser by Techivation, or an even better, really affordable option is Sibilance 4 by Toneboosters.
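Whichever plugin you use, the underlying structure is roughly what’s sketched below: isolate the sibilant band, duck it when it crosses a threshold, and add it back. The band split here is approximate (a real de-esser uses matched crossover filters), and the numbers are just starting points.

```python
# Rough split-band de-esser sketch (not the Weiss algorithm).
import numpy as np
from scipy.signal import butter, sosfilt

def deess(audio, rate, band=(5000, 9000), threshold_db=-30.0, max_cut_db=3.0):
    sos_band = butter(4, band, btype="bandpass", fs=rate, output="sos")
    sibilance = sosfilt(sos_band, audio)
    rest = audio - sibilance                # approximate complement of the sibilant band

    out = np.copy(sibilance)
    block = int(rate * 0.005)               # ~5 ms blocks stand in for a fast attack/release
    for start in range(0, len(audio), block):
        seg = sibilance[start:start + block]
        rms_db = 20 * np.log10(np.sqrt(np.mean(seg ** 2)) + 1e-9)
        if rms_db > threshold_db:
            out[start:start + block] = seg * 10 ** (-max_cut_db / 20)

    return rest + out
```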
Let’s listen to the de-esser being introduced and notice how the highs become cleaner.
For the past 7 chapters we’ve been inserting our processing on the main vocal track, but let’s start introducing parallel processing on auxiliary tracks.
The method I’m about to describe is a little complex, but it results in very controlled saturation.
To explain it, let me quickly detail what a saturator does - in short, it introduces harmonics. Harmonics are multiples of a fundamental frequency, so if the fundamental is 80Hz, a 2nd-order harmonic is 160Hz, while a 3rd-order harmonic is 240Hz, and so on.
That said, our ears perceive these multiple frequencies as one note since they’re related to one another.
Intermodulation distortion, on the other hand, is somewhat similar in that specific frequencies are being amplified; however, unlike harmonics, these frequencies are not related to the fundamental. For example, if distortion occurs to 80Hz and it results in the amplification of 200Hz, this would be intermodulation distortion.
The reason I bring this up is that saturators can cause intermodulation distortion if they misread the fundamental frequency. If a saturator reads the fundamental as 100Hz, but the fundamental is actually 80Hz, then it could result in the amplification of 200Hz, which, as we just covered, is disharmonious to our actual fundamental in this example.
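Here’s that arithmetic laid out, so you can see exactly which of the generated frequencies stop lining up with the real fundamental when it’s misread.

```python
# Worked example: harmonics of the true 80 Hz fundamental versus what a saturator
# would generate if it misread the fundamental as 100 Hz.
true_fundamental = 80.0
misread_fundamental = 100.0

true_harmonics = [true_fundamental * n for n in range(1, 6)]        # 80, 160, 240, 320, 400 Hz
misread_harmonics = [misread_fundamental * n for n in range(1, 6)]  # 100, 200, 300, 400, 500 Hz

# Anything that isn't a multiple of 80 Hz (like 200 Hz) is unrelated to the real
# fundamental - that's the intermodulation-style distortion we want to avoid.
unrelated = [f for f in misread_harmonics if f % true_fundamental != 0]
print(unrelated)   # [100.0, 200.0, 300.0, 500.0]
```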
So to completely avoid this issue, here’s an option I’d like to try out.
First, send the vocal to an auxiliary track, on which we’ve inserted a linear-phase EQ. Next, use this linear-phase EQ to isolate the fundamental frequency of the vocal; keep in mind that this probably won’t be just one note, since the singer will sing various notes, but it’s often within an observable range of frequencies.
With a high-slope high-pass and a high-slope low-pass, isolate the range of fundamental vocal frequencies.
Then insert your saturation plugin and dial in the effect. If we observe the output of the saturator using an analyzer, we’ll notice that the harmonics are all uniform - and that we don’t have intermodulation distortion.
But once I start increasing the frequency range being affected, the output of the saturator shows both harmonic and intermodulation distortion.
Once I have the harmonics that I want, I’ll use the aux track’s channel fader to blend the effect with my original vocal. One more thing I like about this setup is that I can use a second EQ after the saturator to amplify various frequency ranges - so say I want some more harmonics around 2kHz, I could create a bell and amplify that range. Additionally, if I want to exclude the fundamental, I can use a high-pass - resulting in this channel having only the harmonics or overtones.
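For anyone who wants to see the whole routing as one picture, here’s a rough Python sketch of the idea - a zero-phase band-pass standing in for the linear-phase EQ, a tanh soft clipper standing in for the saturator, an optional high-pass to keep only the overtones, and a blend amount standing in for the aux fader. The frequency range and drive values are only examples.

```python
# Sketch of the controlled parallel saturation described above (example values throughout).
import numpy as np
from scipy.signal import butter, sosfiltfilt

def parallel_saturation(audio, rate, fund_range=(100, 350), drive=4.0, blend=0.3):
    # 1) Isolate the range where the vocal's fundamentals live (steep, zero-phase slopes)
    sos = butter(8, fund_range, btype="bandpass", fs=rate, output="sos")
    fundamentals = sosfiltfilt(sos, audio)

    # 2) Saturate only that band, so every harmonic relates to a real fundamental
    saturated = np.tanh(drive * fundamentals) / drive

    # 3) Optionally keep only the overtones by high-passing out the fundamentals
    sos_hp = butter(4, fund_range[1], btype="highpass", fs=rate, output="sos")
    overtones = sosfiltfilt(sos_hp, saturated)

    # 4) Blend the aux "channel" under the dry vocal, like raising the aux fader
    return audio + blend * overtones
```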
Now keep in mind, intermodulation distortion may not always sound bad or even be perceivable, so you’ll have to use your ears, but let’s take a listen to how this controlled saturation sounds, and see if it makes the vocal sound full yet retains its clean nature.
In the last chapter, I used the clean tube setting of FabFilter’s Saturn 2 to saturate the signal, but if you don’t have this processor, a good alternative is GSat+ by TBProAudio.
Additionally, if you don’t have a linear phase EQ, Logic Pro offers one as a stock plugin. For anyone not using Logic Pro, LKJB has a free plugin called QRange which is a good option.
Let’s use these free plugins with the same setup and notice how the sound is similar.
For this next step, I’m going to bring up quieter aspects of the mid frequencies, but in a way that doesn’t increase the noise floor.
First, let’s again set up a send from our channel track, and on the corresponding auxiliary track insert a linear-phase EQ. With it, I’ll isolate the mid frequencies, usually between 400-500Hz and 5kHz.
Like the last chapter, I’ll use low- and high-pass filters to isolate the range, but this time with more gradual slopes - 18-24dB per octave should work well.
Then, I’ll insert a gate that will attenuate the vocal whenever it falls below a specific amplitude, as determined by the threshold. I’ll set the threshold around -50dB, and use a softer knee and a little lookahead.
I’ll use this FabFilter gate, but if you don’t have it, you can use this MCompressor by Melda Audio. If you set a 1:1 ratio and select a custom shape, you can determine the amplitude at which the signal will be attenuated. In this instance, we’ll want to drop the amplitude when the signal falls below -50dB.
Last up, let’s insert the upward compressor MV2 and increase the low-level signal. This will measure and compress quieter parts of the signal, then introduce make-up gain to amplify the details of the mid-frequency range.
Since we inserted a gate before this, the noise floor won’t be amplified by MV2. Although MV2 is an affordable option, a good free alternative is OTT by Xfer Records.
Once we’ve dialed in the compression and made any needed adjustments to the EQ or gate, we can determine how much of this effect we want by adjusting the aux channel fader level.
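Here’s the same send sketched end to end, under similar assumptions as before - Butterworth filters standing in for the EQ, and simple block-based level detection for the gate and the upward compression. The exact numbers are placeholders.

```python
# Sketch of the mid-frequency "detail" send: band-pass, gate, upward compression, blend.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def mid_detail_send(audio, rate, band=(450, 5000), gate_db=-50.0,
                    up_thresh_db=-30.0, up_ratio=2.0, blend=0.25):
    sos = butter(4, band, btype="bandpass", fs=rate, output="sos")   # gentler slopes
    mids = sosfiltfilt(sos, audio)

    out = np.zeros_like(mids)
    block = int(rate * 0.010)                       # 10 ms blocks for level detection
    for start in range(0, len(mids), block):
        seg = mids[start:start + block]
        rms_db = 20 * np.log10(np.sqrt(np.mean(seg ** 2)) + 1e-9)
        if rms_db < gate_db:
            continue                                # gate: leave the noise floor alone
        gain_db = 0.0
        if rms_db < up_thresh_db:                   # upward compression of quiet details
            gain_db = (up_thresh_db - rms_db) * (1.0 - 1.0 / up_ratio)
        out[start:start + block] = seg * 10 ** (gain_db / 20.0)

    return audio + blend * out                      # blend like the aux channel fader
```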
Let’s listen to this effect blended in and notice how the mids sound much more present and detailed.
At this point we’re done creating aux channels - what we want to do now is route both the original channel and our auxiliary tracks to a collective bus where we can process them all together.
To do this, I’ll change the output of the channel and aux channels to the same bus.
For the rest of the video, we’ll add processing to this bus.
First, let’s insert a multi-band compressor - with it, I’ll create incredibly natural-sounding compression to add a final stage of dynamic control to the vocal.
Let me quickly explain the thinking behind this step - in short, two muscles in our ears naturally protect our hearing from loud sounds.
To do this, the muscles contract and reduce how strongly sound is transmitted to the inner ear, resulting in anywhere from 1dB to 20dB of attenuation depending on the intensity of the sound.
The contraction of the 2 muscles isn’t instant though - it takes between 35-40ms for it to happen, and these muscles stay contracted for about 130-150ms. Additionally, this contraction mainly attenuates frequencies below 1000Hz.
With all of this in mind, let’s use a multi-band compressor, and isolate our attenuation to 1000Hz and below. Then, we’ll set the attack to 40ms and release to 150ms, and if possible, set a softer knee to mimic the gradual attenuation caused by the tensing of the muscles.
I’ll only introduce a couple of dB of attenuation to keep the effect from being too obvious. Be sure things like lookahead, make-up gain, and any other unrelated functions are turned off.
I’ll use this iZotope compressor, but a good free alternative is TDR Nova by Tokyo Dawn Labs.
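As a sketch of what those settings are doing, here’s a simplified low-band-only compressor with the 40ms attack and 150ms release - the crossover here is approximate, and the threshold and ratio are just example values, so treat it as an illustration rather than a substitute for a proper multi-band compressor.

```python
# Sketch of the "acoustic reflex" idea: compress only the band below 1 kHz
# with a ~40 ms attack and ~150 ms release, then sum the bands back together.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def reflex_compress(audio, rate, threshold_db=-20.0, ratio=2.0):
    sos_lp = butter(4, 1000, btype="lowpass", fs=rate, output="sos")
    low = sosfiltfilt(sos_lp, audio)
    high = audio - low                      # everything above ~1 kHz passes through untouched

    atk = np.exp(-1.0 / (rate * 0.040))     # ~40 ms, like the muscles tensing
    rel = np.exp(-1.0 / (rate * 0.150))     # ~150 ms, like them relaxing
    env, out = 0.0, np.zeros_like(low)
    for i, x in enumerate(np.abs(low)):
        coeff = atk if x > env else rel
        env = coeff * env + (1.0 - coeff) * x
        level_db = 20 * np.log10(max(env, 1e-9))
        over = max(level_db - threshold_db, 0.0)
        out[i] = low[i] * 10 ** (-over * (1 - 1 / ratio) / 20)

    return out + high
```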
Let’s take a listen, and let me know in the comments if the compression sounds natural, and if this is something you plan on trying.
At this point in the chain, our frequency response is balanced, our dynamics are controlled, and the overall sound of the vocal is full and present. Let’s add some reflections to give the vocal some depth and help it fit into a mix.
I’ll use this Seventh Heaven reverb, but a good free alternative is this Reverb by Stone Voices.
With it, I’ll find a room emulation - a studio setting will work well. I’ll blend in the effect with the mix dial and ensure that I have a pre-delay of at least 10 milliseconds to let some of the dry vocal cut through.
If you’re using the free alternative, try a decay time of around 1 second, again use a longer pre-delay, and ensure that the width isn’t too aggressive.
This step is really simple, but these natural-sounding reflections really help a vocal sound like it was cleanly recorded in an actual space, instead of maybe a bedroom studio.
Let’s take a listen to it.
To help the vocal stick out and give it a unique characteristic, I’m going to introduce delay, but only to the frequencies that provide the most vocal clarity.
I’ll use this FabFilter Timeless plugin, but most stock delay plugins will be fine.
With it, I’ll use a 1/8th note delay on one channel, and a dotted 1/8th note delay on the other - then isolate the reflections to the clarifying ranges of the vocal.
Be careful not to extend the delay into the sibilant range - for one, it’ll amplify unwanted sibilance, and two, since sibilance has a quick attack and decay, the repeated transients will be really apparent, making the effect too noticeable.
Like last chapter, this is a super simple step, but these settings are a great starting point for vocal delay.
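If your delay plugin doesn’t sync to the session tempo, the times are easy to work out from the BPM we looked up earlier - here’s the arithmetic, with 120 BPM used purely as an example.

```python
# 1/8th note and dotted 1/8th note delay times in milliseconds for a given tempo.
def eighth_note_ms(bpm):
    quarter_ms = 60000.0 / bpm        # one beat in milliseconds
    eighth = quarter_ms / 2
    dotted_eighth = eighth * 1.5
    return eighth, dotted_eighth

print(eighth_note_ms(120))            # (250.0, 375.0) -> one channel vs the other
```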
Let’s take a listen.
This will be the last step in our clean vocal chain - with an EQ, we’ll boost any and all frequencies that we want more of, creating a final shape for the vocal. Additionally, if I find a range that was amplified too much, I’ll dip some of it.
What’s great about adding this EQ at the end of our chain, is that it shapes all of the processing that comes before it - including the saturation, delay, and so on.
For this vocal, I’ll boost a little of the fundamental, dip a little of 250Hz to add some clarity to the highs, maybe amplify a little of 500Hz to increase vowel pronunciation, and then amplify some of the vocal’s clarifying 2.5-5kHz range.
What you do is completely up to you, and you should by no means feel like these are the only frequencies you can affect, but I find they make a good starting point.
Let’s take a listen to the vocal with this final shaping, and note that I’ll probably change some of these bands slightly to best suit this particular vocal.
Thanks for watching, and be sure to check out the link in the description for a free mastered sample of your mix.