2015-01-04 19:11:45

Hello,
Okay, so it seems I totally underestimated the complexity of "downsampling" PCM data.
Since I have to mix all of my audio in one place and feed it to a single playout mechanism, I've been trying to achieve resampling functionality for (1) dealing with audio files that are sampled differently from my system's native sample rate and (2) (unfortunately) having to use it for handling pitch fluctuation.
Within an hour or so I managed to get something that gets the size and speed of the new audio data exactly correct, but the high frequencies are distorted (I'm assuming this is aliasing)?
As of now, I'm just calculating how many samples would have to be removed per second given the source and requested rates, then deriving a fraction from that so I know to delete one out of every N samples.
Is anyone able to comment on how the filters work which prevent this kind of distortion or point me to some resource that helped you understand it?
This is what I have thus far:


short* downsample(short* src_data, int src_size, int src_rate, int channels, int dst_rate, int* dst_size)
{
    float div = (float)src_rate / dst_rate;

    int target_size = src_size / div;//this computes the number of shorts required to hold the result.
   
    *dst_size = target_size;//return output size value.

    short* result = new short[target_size];
    memset(result, 0, target_size * sizeof(short));
    float interval = (float)dst_rate / (src_rate - dst_rate) * channels;//this computes the number of samples (sample groups times the number of channels) which are to be kept before one sample group is to be dropped


    short* src_pos = src_data;//keeps track of where in the source data we're drawing from.
    short* dst_pos = result;//keeps track of where in the destination buffer we're writing to.
    float tracker = interval;//keeps track of deletion intervals.


    for (int i = 0; i < src_size / channels; i++)//one iteration per sample group (one sample for each channel)
    {
        if (tracker > 0)
        {
            for (int x = 0; x < channels; x++)
            {
                *dst_pos = *src_pos;//copy one short from source to destination.

                dst_pos++;
                src_pos++;

                tracker--;

            }

        }
        else
        {
            tracker += interval;//+= used because interval is not a whole number, so sometimes copy ceil(interval) and other times copy floor(interval) to maintain averages.
            src_pos += channels;//skip copying one sample group (one sample per channel).

        }

    }





    return result;

}


What I really don't understand about this is that if I'm downsampling by 50 percent (where every other sample per channel is being dropped), I don't hear any distortion in the result; however, lowering from, say, 48000 down to 44100 (where only one out of every 22 to 23 samples is being dropped) results in high frequencies sounding distorted.
I can provide some test audio if interested,
but what do I have to do to accomplish this cleanly?

Thanks.


2015-01-04 22:01:11 (edited by camlorn 2015-01-04 22:03:06)

First, the easy, high-quality way.  See further down for how to do a basic but low-quality one yourself:
You have to go download Speex, pull out the files related to its resampler, and plug them into your project.  They implement the entirety of this algorithm.  Yes, that page is very scary.  No, you don't have to actually go that far.  But since someone has already given you the code...
Alternatively, just use Audacity to convert your files.  Most audio libraries will let you open at any sample rate, so you can put everything at 44.1k and call it a day.
The low-quality approach:
I do not have isolated code for this.  The Libaudioverse resampler copes with streaming, and is consequently quite, quite long.  I'm also going to be replacing it with Speex.  Anyhow.
You can't just drop samples.  That's wrong.  It will work for dropping the sample rate by a power of 2, for example, or possibly other exact fractions.  But it's not going to work on everything.
Instead, you need to perform linear interpolation.  If you're linear interpolating, you think of the index as a float, not an integer.
If I ask you for sample 5.2 and you're linear interpolating, you would do as follows:
index1=floor(5.2)=5
index2=ceil(5.2)=6
sample1=samples[index1]
sample2=samples[index2]
weight1=index2-index=.80 or 80%
weight2=1-weight1=1-.80=.20=20%
output=sample1*weight1+sample2*weight2
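For anyone who wants the steps above as code, here is a minimal C++ sketch for a mono float buffer.  The name sample_at and the use of std::vector<float> are just assumptions for illustration, not anything from Libaudioverse or Speex.  It uses index1+1 instead of ceil, which gives the same result because the second weight is 0 when the index is a whole number, and it treats reads outside the buffer as 0.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: linearly interpolated value at a fractional index.
// Anything outside the buffer is treated as silence (0).
float sample_at(const std::vector<float>& samples, double index)
{
    if (index < 0.0 || index >= static_cast<double>(samples.size()))
        return 0.0f;

    const std::size_t index1 = static_cast<std::size_t>(std::floor(index));
    const std::size_t index2 = index1 + 1;

    const float sample1 = samples[index1];
    const float sample2 = index2 < samples.size() ? samples[index2] : 0.0f;

    const double weight2 = index - static_cast<double>(index1); // 0.2 for index 5.2
    const double weight1 = 1.0 - weight2;                       // 0.8 for index 5.2

    return static_cast<float>(sample1 * weight1 + sample2 * weight2);
}
```

With samples 0, 10, 20, 30, 40, 50, 60, asking for index 5.2 mixes 80% of 50 with 20% of 60.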
The only part that is left is figuring out what to increment by.  I call this the delta.  If you have 44100 samples per second and want to get to 22050 samples per second, you just do 44100/22050=2, then run a loop that adds the delta to the time (measured in input samples) and linear interpolates at each step.  If you want to go from 44100 to 48000, do 44100/48000=0.91875.
But that's the math.  Translation to code has a few complexities:
If you are reading past the end of the input buffer, assume 0.  Otherwise the behavior is unpredictable, up to and including random crashes.
If you continually add to the time variable without wrapping, you will eventually run into very interesting floating point issues.  I personally fixed this by using an integer index and a float offset, but using a double or a long double may push this problem back far enough that you don't need to worry about it.
Figuring out the size of the output buffer is very tricky.  If you know the total duration in seconds (and you do), multiply that by the output sample rate.  The trick here is that you're assuming zero for reads past the end of the input, so you can make your loop compute exactly enough samples to fill the output buffer.  In most cases, if this misses the very last sample of the input, you literally will be unable to tell.
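Putting those three points together, a rough C++ sketch might look like the following.  It assumes mono float samples and whole-buffer (non-streaming) input, and the name resample_linear is made up for this post.  It tracks the position as an integer index plus a fractional offset, sizes the output from the duration, and reads 0 past the end of the input.  It does no filtering, so it is still the low-quality approach.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of a mono linear-interpolation resampler (illustrative names).
std::vector<float> resample_linear(const std::vector<float>& in,
                                   int src_rate, int dst_rate)
{
    // Output length = duration in seconds * output rate, kept in integers.
    const std::size_t out_size = static_cast<std::size_t>(
        static_cast<std::uint64_t>(in.size()) * dst_rate / src_rate);
    std::vector<float> out(out_size);

    // The delta, split into a whole part and a fractional part.
    const double delta = static_cast<double>(src_rate) / dst_rate;
    const std::size_t step_whole = static_cast<std::size_t>(delta);
    const double step_frac = delta - static_cast<double>(step_whole);

    std::size_t index = 0; // whole part of the read position
    double offset = 0.0;   // fractional part, always in [0, 1)

    // Reads past the end of the input are treated as 0.
    auto read = [&in](std::size_t i) { return i < in.size() ? in[i] : 0.0f; };

    for (std::size_t n = 0; n < out_size; ++n) {
        out[n] = static_cast<float>(read(index) * (1.0 - offset) +
                                    read(index + 1) * offset);
        index += step_whole;
        offset += step_frac;
        if (offset >= 1.0) { // carry into the integer index
            offset -= 1.0;
            ++index;
        }
    }
    return out;
}
```

Halving the rate of the eight samples 0 through 7 with this gives 0, 2, 4, 6.  For interleaved stereo you would run the same loop per channel, or index by frame rather than by sample.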
As for filters, if you have to ask you're not ready.  I can point you at this textbook.  Reading it requires knowing LaTeX as a blind person.  Here there be complex numbers.  Here there be integrals.  Here there be integrals involving complex numbers and bounds at infinity.  The truth of the matter is that you can do it without understanding 90% of the content of that book, but you still need to read something to see the final results (understanding the frequency domain, knowing what the shift and convolution theorems mean).
If you can understand far enough to know how to code them, here's the RBJ cookbook.  If you google for filter design, everyone just tells you to use a biquad and to use the RBJ cookbook.  This guy doesn't seem super famous outside that, but as far as I can tell that text file has been the thing that everyone has used for like 10 years.
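To give an idea of how little code a biquad actually is, here is a C++ sketch of a lowpass using the RBJ cookbook's lowpass coefficient formulas.  The struct name, the double-precision state, and the default Q of about 0.707 are my assumptions for illustration.  A single biquad rolls off gently (around 12 dB per octave), so treat this as a starting point and nowhere near a brick wall.

```cpp
#include <cmath>

// Biquad lowpass using the RBJ cookbook lowpass formulas (sketch).
struct BiquadLowpass {
    double b0, b1, b2, a1, a2;             // coefficients, already divided by a0
    double x1 = 0.0, x2 = 0.0;             // previous two inputs
    double y1 = 0.0, y2 = 0.0;             // previous two outputs

    BiquadLowpass(double sample_rate, double cutoff_hz,
                  double q = 0.7071067811865476)
    {
        const double pi = 3.14159265358979323846;
        const double w0 = 2.0 * pi * cutoff_hz / sample_rate;
        const double cos_w0 = std::cos(w0);
        const double alpha = std::sin(w0) / (2.0 * q);
        const double a0 = 1.0 + alpha;

        b0 = (1.0 - cos_w0) / 2.0 / a0;
        b1 = (1.0 - cos_w0) / a0;
        b2 = b0;
        a1 = -2.0 * cos_w0 / a0;
        a2 = (1.0 - alpha) / a0;
    }

    // Filter one sample (direct form 1).
    double process(double x)
    {
        const double y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2;
        x2 = x1; x1 = x;
        y2 = y1; y1 = y;
        return y;
    }
};
```

You would create one of these per channel at the input sample rate, with the cutoff somewhat below half the output rate, and run every input sample through process() before resampling.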
The filter is only needed if you are downsampling and, for the most part, you won't need it even then.  Few audio files go high enough for aliasing to be an issue, and cutting the sampling rate in half is probably already losing something.  For complete accuracy, you technically need to apply a "brick wall" lowpass filter, with the cutoff at the Nyquist frequency (half the output sampling rate), before you resample.
And now you know why I'm saying just go get the code from Speex.

My Blog
Twitter: @ajhicks1992

2015-01-06 18:43:35

Judging by the lack of response, I think I scared everyone off.  Should I explain further or point at Speex or something?
