When mixing, I’d recommend you create the following routing.
Route each instrument group to a bus. So bass bus, drum bus, etc.
The lead vocal should stand alone - all doubles and harmonies can go to one bus, but the lead vocal should have its own bus, to which the lead comp and all of its auxiliary effects are routed.
Then all instrument busses and the backing vocals can be routed to an instrumental bus.
Aside from giving you a lot of control, this setup lets you measure something incredibly important: the LUFS of the instrumental compared to the LUFS of the lead vocal.
In the majority of popular mixes, the lead vocal has the same loudness as the instrumental.
Whether they did this on purpose or not, mixers of previous decades found this to be the optimal balance between the lead vocal and instrumental, and the one that drew the most favorable response from listeners.
So once you’ve created this routing, use LUFS meters to compare the loudness of these 2 busses. Try to get them to match as closely as possible. Then, if you want the vocal to sit slightly below the mix, lower its level slightly. Want it to sit on top of the mix? Raise it slightly.
As I mentioned, you’re not obligated to follow this exactly, but knowing this relationship is crucial to determining where you’d like your vocal to sit.
Here’s the setup with matched loudnesses, the vocal slightly below, and the vocal slightly above.
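If it helps to see the arithmetic behind matching the 2 busses, here’s a rough sketch. It uses plain RMS level as a stand-in for loudness, which is an assumption on my part: a real LUFS meter (ITU-R BS.1770) also applies K-weighting and gating, so use your DAW’s meter for actual values. The function names, the test tones and the `offset_db` parameter are all hypothetical.

```python
import math

def rms_dbfs(samples):
    """RMS level of a block of samples in dBFS (1.0 = full scale).
    Simplified stand-in for a LUFS reading: no K-weighting, no gating."""
    if not samples:
        raise ValueError("need at least one sample")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0.0:
        return float("-inf")
    return 10.0 * math.log10(mean_square)

def vocal_trim_db(vocal, instrumental, offset_db=0.0):
    """Gain in dB to apply to the vocal bus so that it sits offset_db
    above (positive) or below (negative) the instrumental bus."""
    return rms_dbfs(instrumental) + offset_db - rms_dbfs(vocal)

def sine(amplitude, freq_hz, seconds=1.0, sample_rate=48000):
    """Test tone standing in for a bus; real material would come from a bounce."""
    n = int(seconds * sample_rate)
    return [amplitude * math.sin(2.0 * math.pi * freq_hz * i / sample_rate)
            for i in range(n)]

vocal = sine(0.25, 440.0)         # hypothetical vocal bus, half the amplitude
instrumental = sine(0.5, 110.0)   # hypothetical instrumental bus

print("trim to match:", round(vocal_trim_db(vocal, instrumental), 2), "dB")
print("trim to sit 1 dB below:",
      round(vocal_trim_db(vocal, instrumental, offset_db=-1.0), 2), "dB")
```

With these test tones the vocal is half the amplitude of the instrumental, so the matching trim comes out to about 6 dB. A positive or negative `offset_db` then nudges the vocal above or below that matched point.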
Aside from overall loudness, frequency masking is the primary reason it can be difficult to discern the lyrics or vocal performance. The vocal has 3 formants, or areas responsible for vowel and consonant articulation, the most prevalent of which sits between 3-5kHz.
Meanwhile, roughly 200-400Hz is the primary masker of 3-5kHz. In other words, if 200-400Hz is too high in amplitude in either the vocal, or the surrounding instrumentation, the vocal’s 3rd formant will be covered. In turn it’s hard to understand the vocal performance and it’ll sound unclear or muffled.
On top of that, 200-400Hz is an incredibly populated range - for a lot of instruments, it either holds the fundamental frequency or contains a low-order overtone or harmonic.
So, if the vocal sounds unclear or muffled, odds are it’s due to this range.
To fix it, you can find the instruments with too much amplitude in this range and subtly attenuate it on each. Instead of cutting aggressively on one instrument, spreading the attenuation across multiple tracks will sound more natural.
Additionally, you can subtly attenuate this range in the lead vocal. Lastly, you can subtly boost the 3rd formant in the vocal.
Keep in mind that other processors like reverb, saturation, delay and other additive forms of processing, can emphasize 200-400Hz, so these processors may also need to be adjusted if you’re having this common issue.
Take a listen to how affecting these ranges alters the perceived clarity of the vocal.
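To put rough numbers on a subtle cut and boost like this, here’s a small sketch that builds a peaking (bell) EQ band and checks its gain at a few frequencies. It uses the widely published RBJ Audio EQ Cookbook biquad formulas, which is my choice for illustration and not necessarily what your EQ plugin does internally. The -2dB, +1.5dB and Q values are hypothetical starting points, not recommendations from the mix.

```python
import cmath
import math

def peaking_eq_coeffs(center_hz, gain_db, q, sample_rate=48000):
    """Biquad coefficients (b, a) for a peaking EQ band, RBJ cookbook form."""
    amp = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / sample_rate
    alpha = math.sin(w0) / (2.0 * q)
    b = (1.0 + alpha * amp, -2.0 * math.cos(w0), 1.0 - alpha * amp)
    a = (1.0 + alpha / amp, -2.0 * math.cos(w0), 1.0 - alpha / amp)
    return b, a

def response_db(b, a, freq_hz, sample_rate=48000):
    """Gain of the biquad in dB at a single frequency."""
    z1 = cmath.exp(-1j * 2.0 * math.pi * freq_hz / sample_rate)
    z2 = z1 * z1
    h = (b[0] + b[1] * z1 + b[2] * z2) / (a[0] + a[1] * z1 + a[2] * z2)
    return 20.0 * math.log10(abs(h))

# Hypothetical settings: a gentle cut in the masking range on an instrument,
# and a gentle boost around the 3rd formant on the lead vocal.
cut_b, cut_a = peaking_eq_coeffs(300.0, -2.0, 1.0)
boost_b, boost_a = peaking_eq_coeffs(4000.0, 1.5, 1.0)

for freq in (100.0, 300.0, 1000.0, 4000.0):
    print(f"{freq:>6.0f} Hz  cut band: {response_db(cut_b, cut_a, freq):+.2f} dB"
          f"  boost band: {response_db(boost_b, boost_a, freq):+.2f} dB")
```

The point of the printout is that a gentle bell centered in the 200-400Hz range leaves 3-5kHz essentially untouched, so a few small cuts on different tracks can add up without thinning out any single instrument.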
Although the expected vocal sound changes from genre to genre, the most popular is an upfront and detailed vocal.
Peak-down compression with makeup gain is the most common way to achieve this sound, but it can often sound over-processed due to the distortion and ADSR changes it causes.
The good news is that a vocal, and any other instrument, can have its quieter details amplified without first attenuating peaks.
For example, if I use the Oxford Inflator, notice that the lower amplitude aspects of the signal are boosted while the peaks aren’t attenuated until they reach nearly 0dB.
The Omnipressor does something similar, as does parallel compression - lower amplitudes are boosted while peaks are left alone for the most part.
By blending subtle to moderate peak-down compression with maximization (the amplification of quieter details without affecting peaks), we can easily achieve that upfront sound while avoiding unwanted artifacts or a noticeably processed vocal.
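To see why parallel compression lifts quiet details more than peaks, here’s a minimal sketch of the static gain curve. It models only the parallel compression case, not the Inflator or the Omnipressor, and it assumes a hard-knee compressor whose output sums perfectly in phase with the dry signal. The threshold, ratio and wet gain values are hypothetical.

```python
import math

def compressor_gain_db(level_db, threshold_db=-20.0, ratio=4.0):
    """Static gain of a hard-knee downward compressor at a given input level."""
    if level_db <= threshold_db:
        return 0.0
    compressed_level = threshold_db + (level_db - threshold_db) / ratio
    return compressed_level - level_db

def parallel_gain_db(level_db, wet_gain_db=0.0, threshold_db=-20.0, ratio=4.0):
    """Net gain when the dry signal is summed with its compressed copy.
    Assumes the two paths are in phase, so their amplitudes add directly."""
    dry = 1.0
    wet_db = compressor_gain_db(level_db, threshold_db, ratio) + wet_gain_db
    wet = 10.0 ** (wet_db / 20.0)
    return 20.0 * math.log10(dry + wet)

for level in (-40.0, -20.0, -10.0, 0.0):
    print(f"input {level:>6.1f} dB -> net gain {parallel_gain_db(level):+.2f} dB")
```

Below the threshold the compressed copy is just a second copy of the signal, so quiet material gets the full boost. Above it, the copy is turned down more and more, so the peaks gain far less. With these settings the quiet material is lifted by about 6dB while a peak at 0dB gains under 2dB.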
As you probably know, doubling a lead vocal is a great way to make the lead sound fuller and more impressive. Small variations in timing, pitch, amplitude, frequency response and more blend in with the original performance, creating a more complex sound.
There is a caveat to this though. Sibilance and aggressive consonants ruin the illusion of the 2 or more performances coming from 1 vocalist.
Although small timing differences are beneficial, sibilance and consonants are short and percussive, so if they’re out of time, they cue the listener that these are 2 separate vocals.
For that reason, heavily de-essing BGVs and doubles is a good idea if you want them to blend in with the lead vocal.
Alternatively, you can attenuate high frequencies with an EQ, or a dynamic EQ.
Furthermore, amplifying the mids can help them blend by accentuating the areas with the greatest differential in pitch, in turn causing a very subtle chorus effect.
Let’s listen to a double in which the sibilance is aggressive enough to indicate separate performances, and then that double de-essed heavily to help it blend in.
Temporal effects like reverb and delay are needed to add space, thicken a vocal, or add a creative sound; however, they can quickly decrease the vocal’s intelligibility.
There are 2 main ways to add as much reverb or delay as you want without washing out the vocal.
The first is ducking.
Some reverb and delay plugins include ducking, which attenuates the beginning of the effect to allow the vocal’s transient to come through. As a result the vocal retains its clarity, then the effect comes in soon after.
But relying on plugins that have built-in ducking severely limits what you can use.
So, here’s how to duck any reverb or delay.
Set up the temporal effect on an auxiliary or parallel track. For example, say I want to use Logic’s stock reverb plugin - I’d insert it on the aux channel, dial in the settings I want, and then insert a compressor.
With the compressor, I’ll side-chain the original dry vocal track. With the compressor triggered by the dry vocal, I’ll attenuate the reverb for the vocal’s attack and decay.
This will reduce the amplitude of the reverb or delay for that initial period, letting the vocal’s transient through.
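Under the hood, the side-chained compressor is doing something like the sketch below: it follows the level of the dry vocal and turns the effect down whenever that level crosses a threshold. This is a simplified model for illustration, not how Logic’s compressor is actually implemented, and the attack, release, threshold and ratio values are hypothetical.

```python
import math

def envelope(samples, sample_rate, attack_ms=5.0, release_ms=150.0):
    """One-pole envelope follower: rises quickly, falls slowly."""
    attack = math.exp(-1.0 / (sample_rate * attack_ms / 1000.0))
    release = math.exp(-1.0 / (sample_rate * release_ms / 1000.0))
    env, out = 0.0, []
    for s in samples:
        level = abs(s)
        coeff = attack if level > env else release
        env = coeff * env + (1.0 - coeff) * level
        out.append(env)
    return out

def duck(effect, sidechain, sample_rate, threshold_db=-30.0, ratio=4.0):
    """Attenuate the effect signal whenever the side-chain (dry vocal) is loud."""
    if len(effect) != len(sidechain):
        raise ValueError("effect and sidechain must be the same length")
    ducked = []
    for wet, env in zip(effect, envelope(sidechain, sample_rate)):
        level_db = 20.0 * math.log10(max(env, 1e-9))
        over_db = max(0.0, level_db - threshold_db)
        gain_db = -over_db * (1.0 - 1.0 / ratio)
        ducked.append(wet * 10.0 ** (gain_db / 20.0))
    return ducked

# Hypothetical signals: a 1-second vocal phrase followed by 1 second of
# silence, and a reverb return held at a constant level for simplicity.
sample_rate = 8000
dry_vocal = [0.5 * math.sin(2.0 * math.pi * 220.0 * i / sample_rate)
             for i in range(sample_rate)] + [0.0] * sample_rate
reverb = [1.0] * len(dry_vocal)

result = duck(reverb, dry_vocal, sample_rate)
print("reverb gain during the phrase:", round(result[sample_rate // 2], 3))
print("reverb gain after the phrase: ", round(result[-1], 3))
```

The release time decides how soon the reverb swells back in once the vocal stops, so that’s the setting to adjust if the effect returns too abruptly or too late.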
The second method is with EQ after the temporal effect.
By reducing the masking frequencies (again, 200-400Hz) in the effect’s output, the effect interferes with the vocal’s clarity less than it normally would.
But, the best sound comes from a combination of these 2 methods.
Let’s listen to high levels of reverb without ducking and EQ shaping, and then high levels of reverb with ducking and EQ shaping. Notice how even though the reverb is aggressive, ducking and EQ significantly help the vocal stay intelligible.