Peak Normalization: Not the Solution

Example 1 in the attached image represents a dual-channel (spoken word) recording with two individual participants (L+R). The recorded waveforms exhibit wide level variations within channels and between channels. Distributing this recording in its current state would be bothersome to the listener: the level variations result in wide dynamic range and less-than-ideal average loudness.

I’ve heard some community members mistakenly recommend per channel Peak Normalization to optimize audio with inherent level inconsistencies. I’m going to explain why Peak Normalization is not the solution.

So what is Peak Normalization?

Peak levels do not represent any aspect of perceived or “average” loudness. Digital Audio Peak Level or Amplitude represents the proportional voltage of a measured audio signal. A Peak Program Meter (PPM) displays the signal relative to full scale, where 0 dBFS represents the ultimate ceiling. Signals that would exceed 0 dBFS result in clipped audio.
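To make the measurement concrete, here is a minimal Python sketch of a sample-peak reading. It assumes floating-point samples where 1.0 corresponds to full scale (0 dBFS); the function name peak_dbfs and the sample values are purely illustrative, and the reading is a plain sample peak, not an inter-sample "true peak".

```python
import math

def peak_dbfs(samples):
    """Return the maximum sample peak in dBFS (full scale assumed to be 1.0)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return float("-inf")  # digital silence has no finite dB value
    return 20 * math.log10(peak)

# Hypothetical sample data: one clip is quiet except for a single spike,
# the other is consistently moderate in level.
quiet_with_spike = [0.01, -0.02, 0.9, 0.01]
steady = [0.5, -0.5, 0.5, -0.5]

print(peak_dbfs(quiet_with_spike))
print(peak_dbfs(steady))
```

The mostly quiet clip reports the higher peak because a single sample sets the reading, which is why a peak value says nothing about average level.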

The process of Peak Normalization is based on the relationship between the existing maximum peak ceiling and a user defined peak ceiling. A global gain shift reestablishes the user’s targeted peak ceiling.

For example, if an audio clip exhibits a -2.0 dBFS ceiling and the user elects to Peak Normalize to -1.0 dBFS, 1 dB of global gain will be added. Conversely, if an audio clip exhibits a 0 dBFS ceiling and the user elects to Peak Normalize to -1.0 dBFS, 1 dB of global gain will be subtracted.

If you refer back to Example 1, the Left Channel has a -1.60 dBFS maximum peak. The Right Channel has a -1.70 dBFS maximum peak. If the Peak Normalization target is -1.0 dBFS, the Left Channel’s global gain offset will be +0.60 dB. The Right Channel’s global gain offset will be +0.70 dB.
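The arithmetic above is simply target minus measured peak. A small Python sketch, using the Example 1 readings quoted above (the function names are hypothetical, not taken from any particular tool):

```python
def peak_normalization_gain_db(current_peak_dbfs, target_peak_dbfs):
    """Global gain offset (dB) that moves the existing peak to the target."""
    return target_peak_dbfs - current_peak_dbfs

def db_to_linear(gain_db):
    """Convert a gain in dB to the linear factor applied to every sample."""
    return 10 ** (gain_db / 20)

target = -1.0  # user-defined peak ceiling in dBFS

# Maximum peaks quoted for the two channels of Example 1
for name, peak in (("Left", -1.60), ("Right", -1.70)):
    gain_db = peak_normalization_gain_db(peak, target)
    print(f"{name}: {gain_db:+.2f} dB (linear factor {db_to_linear(gain_db):.3f})")
```

Note that the offset depends only on the single highest peak in each channel; nothing else about the audio enters the calculation.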

Example 2 displays the normalized audio. 

Example 3 displays the normalized stereo clip bounced to mono. Notice the clear indication of inconsistent signal level and wide dynamics. This proves the ineffectiveness of Peak Normalization for level matching (Leveling) between two very different clips. In this state the inconsistent processed audio would be difficult to listen to. More importantly, any attempt to Loudness Normalize the Peak Normalized clip to the recommended Internet/Mobile targets would require aggressive limiting in order to meet compliance. Not good …
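The same effect can be reproduced with synthetic signals. The Python sketch below is only an illustration under stated assumptions: the two test signals are made up (they are not the clips from the examples), samples are floats with 1.0 as full scale, and RMS is used as a rough stand-in for average level. It is not a perceptual loudness (LUFS) measurement.

```python
import math

def peak_dbfs(samples):
    """Maximum sample peak in dBFS (full scale assumed to be 1.0)."""
    peak = max(abs(s) for s in samples)
    return 20 * math.log10(peak) if peak > 0 else float("-inf")

def rms_dbfs(samples):
    """RMS level in dBFS, a crude proxy for average level."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square) if mean_square > 0 else float("-inf")

def peak_normalize(samples, target_dbfs):
    """Apply one global gain so the maximum peak lands on target_dbfs."""
    gain = 10 ** ((target_dbfs - peak_dbfs(samples)) / 20)
    return [s * gain for s in samples]

n = 48000  # one second at an assumed 48 kHz sample rate

# Hypothetical "participant A": consistently strong signal
steady = [0.5 * math.sin(2 * math.pi * 220 * i / n) for i in range(n)]

# Hypothetical "participant B": quiet signal with one loud transient
spiky = [0.05 * math.sin(2 * math.pi * 220 * i / n) for i in range(n)]
spiky[1000] = 0.8

steady_norm = peak_normalize(steady, -1.0)
spiky_norm = peak_normalize(spiky, -1.0)

for name, sig in (("steady", steady_norm), ("spiky", spiky_norm)):
    print(f"{name}: peak {peak_dbfs(sig):.2f} dBFS, RMS {rms_dbfs(sig):.2f} dBFS")
```

Both signals end up with an identical -1.0 dBFS peak, yet their RMS levels remain more than 20 dB apart: the single transient in the quiet signal determines its gain offset, so the body of the signal stays low.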

Example 4 (“Mono Bounce Proper”) displays a subjective example of a properly leveled version of the same clip, with preliminary leveling accomplished manually before the clip was bounced to mono. Dynamic Range Compression was applied in order to reestablish uniformity to the density of the waveform, resulting in improved level consistency between participants. The processed audio is now well suited for final Loudness Normalization.

Example 5 displays the stereo clip (prior to bouncing to mono) with each channel manually processed to establish level consistency.

I am a firm believer that controlled dynamics (for spoken word) will improve the listening experience when using ubiquitous internet/mobile consumption devices in less than ideal environments. 

Let me stress that I do not support hyper-compression. Compression can potentially elevate a problematic noise floor, exaggerate problematic breaths, and degrade overall fidelity. The key is to use it to your advantage without sacrificing quality. If you feel you lack the manual skills and/or tools to achieve acceptable results, consider using something like Auphonic and take advantage of their perfectly acceptable Leveling algorithm(s).

Remember: Peak Amplitude gain shifts will not establish average perceptual loudness consistency between multiple clips.

-paul.