<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd"> <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en"> <head> <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=UTF-8" /> <meta name="generator" content="AsciiDoc 8.6.3" /> <title>Psychoacoustic Models in TwoLAME</title> <link rel="stylesheet" href="./twolame.css" type="text/css" /> <script type="text/javascript"> /*<![CDATA[*/ window.onload = function(){asciidoc.footnotes();} /*]]>*/ </script> <script type="text/javascript" src="./asciidoc-xhtml11.js"></script> </head> <body class="article"> <div id="header"> <h1>Psychoacoustic Models in TwoLAME</h1> <span id="revnumber">version 0.3.13</span> </div> <div id="content"> <div class="sect1"> <h2 id="_introduction">Introduction</h2> <div class="sectionbody"> <div class="paragraph"><p>In MPEG audio encoding, a psychoacoustic model (PAM) is used to determine which are the sonically important parts of the waveform that is being encoded. The PAM looks for loud sounds which may mask soft sounds, noise which may affect the level of sounds nearby, sounds which are too soft for us to hear and should be ignored and so on. The information from the PAM is used to determine which parts of the spectrum should get more bits and thus be encoded at greater quality - and which parts are inaudible/unimportant and should thus get fewer bits.</p></div> <div class="paragraph"><p>In MPEG Audio LayerII encoding, 1152 sound samples are read in - this constitutes a <em>frame</em>. For each frame the PAM outputs just <strong>32</strong> values (The values are the Signal to Masking Ratio [SMR] in that subband). This is important! There are only 32 values to determine how to alloctate bits for 1152 samples - this is a pretty coarse technique.</p></div> <div class="paragraph"><p>The different PAMs listed below use different techniques to decide on these 32 values. Some models are better than others - meaning that the 32 values chosen are pretty good at spreading the bits where they should go. Even with a really bad PAM (e.g. Model -1) you can still get satisfactory results a lot of the time. All of these models have strengths and weaknesses. The model <em>you</em> end up using will be the one that produces the best sound for your ears, for your audio.</p></div> </div> </div> <div class="sect1"> <h2 id="_psychoacoustic_model_1">Psychoacoustic Model -1</h2> <div class="sectionbody"> <div class="paragraph"><p>This PAM doesn’t actually look at the samples being encoded to decide upon the output values. There is simply a set of 32 default values which are used, regardless of input.</p></div> <div class="paragraph"><p><strong>Pros</strong>: Faaaast. Low complexity. Surprisingly good. "Surprising" in that the other PAMs go to the effort of calculating FFTs and subbands and masking, and this one does absolutely <strong>nothing</strong>. Zip. Nada. Diddly Squat. This model might be the best example of why it is hard to make a good model - if having no computations sounds OK, how do you improve on it?</p></div> <div class="paragraph"><p><strong>Cons</strong>: Absolutely no attempt to consider any of the masking effects that would help the audio sound better.</p></div> </div> </div> <div class="sect1"> <h2 id="_psychoacoustic_model_0">Psychoacoustic Model 0</h2> <div class="sectionbody"> <div class="paragraph"><p>This PAM looks at the sizes of the <em>scalefactors</em> for the audio and combines it with the Absolute Threshold of Hearing (ATH) to make the 32 SMR values.</p></div> <div class="paragraph"><p><strong>Pros</strong>: Faaast. Low complexity.</p></div> <div class="paragraph"><p><strong>Cons</strong>: This model has absolutely no mathematical basis and does not use any perceptual model of hearing. It simply juggles some of the numbers of the input sound to determine the values. Feel free to hack the daylights out of this PAM - add multipliers, constants, log-tables <strong>anything</strong>. Tweak it until you begin to like the sound.</p></div> </div> </div> <div class="sect1"> <h2 id="_psychoacoustic_model_1_and_2">Psychoacoustic Model 1 and 2</h2> <div class="sectionbody"> <div class="paragraph"><p>These PAMs are from the ISO standard. Just because they are the standard, doesn’t mean that they are any good. Look at LAME which basically threw out the MP3 standard psycho models and made their own (GPSYCHO).</p></div> <div class="paragraph"><p><strong>Pros</strong>: A reference for future PAMs</p></div> <div class="paragraph"><p><strong>Cons</strong>: Terrible ISO code, buggy tables, poor documentation.</p></div> </div> </div> <div class="sect1"> <h2 id="_psychoacoustic_model_3">Psychoacoustic Model 3</h2> <div class="sectionbody"> <div class="paragraph"><p>A re-implementation of psychoacoustic model 1. ISO11172 was used as the guide for re-writing this PAM from the ground up.</p></div> <div class="paragraph"><p><strong>Pros</strong>: No more obscure tables of values from the ISO code. Hopefully a good base to work upon for tweaking PAMs</p></div> <div class="paragraph"><p><strong>Cons</strong>: At the moment, doesn’t really sound any better than PAM1</p></div> </div> </div> <div class="sect1"> <h2 id="_psychoacoustic_model_4">Psychoacoustic Model 4</h2> <div class="sectionbody"> <div class="paragraph"><p>A cleaned up version of PAM2.</p></div> <div class="paragraph"><p><strong>Pros</strong>: Faster than PAM2. No more obscure tables of values from the ISO standard. Hopefully a good base to work from for improving the PAMs</p></div> <div class="paragraph"><p><strong>Cons</strong>: Still has the same "warbling"/"Davros" problems as PAM2.</p></div> </div> </div> <div class="sect1"> <h2 id="_future_psychoacoustic_models">Future psychoacoustic models</h2> <div class="sectionbody"> <div class="paragraph"><p>There’s a heap that could be done. Unfortunately, I’ve got a set of tin ears, crappy speakers and a noisy computer room. If you’ve got the capability to do proper PAM testing then please feel free to do so. Otherwise, I’ll just keep plodding along with new ideas as they arise, such as:</p></div> <div class="ulist"><ul> <li> <p> Temporal masking (there’s no pre-echo or anything in TwoLAME) </p> </li> <li> <p> Left Right Masking </p> </li> <li> <p> A PAM that’s fully tuneable from the command line? </p> </li> <li> <p> Graphical output of SMR values etc. Would allow better debugging of PAMs </p> </li> <li> <p> Re-sampling routines </p> </li> <li> <p> Low/High pass filtering </p> </li> </ul></div> </div> </div> </div> <div id="footnotes"><hr /></div> <div id="footer"> <div id="footer-text"> Version 0.3.13<br /> Last updated 2011-01-01 19:15:01 GMT </div> </div> </body> </html>