You can't, not easily.
Whoever originally implemented it decided to replace 10 lines of code with a custom data format and an algorithm to run that format. He removed the phase info from the HRIR and replaced it with some delay lines. If you do take it out, then given the architecture of the rest of it, it probably won't be fast enough for what you need. The way it's designed internally is, IMHO, a mess. Partially this comes from the OpenAL API, partially from a project that can be traced back to 2005, and partially from the fact that, back then, strange bit tricks really were faster than plain multiplication. That last one is a very, very long discussion that I'm not really qualified to have; it's basically stacked compiler and CPU improvements out our ears. Nevertheless, OpenALSoft somewhere contains my favorite line of code: x &= ~4. If you stopped and stared at that for a minute or two before getting it, as I did, you will understand why I dislike it so much. If you didn't, then go ahead, because you've probably got a decent chance.
The net result of all of this is that the naive O(m*n) algorithm, which needs about 22 million math operations a second at 44.1 kHz, is going to run in such a way that you can't take advantage of the CPU features that let you do that kind of thing without bringing the computer to its knees. The rest of OpenALSoft already takes a ton, and I'm almost certain you can't just use my loop--the model that OpenAL uses for buffers ruins such tricks. I'll post the convolution kernel at the bottom of this post, but you'll still need a rather large dose of trigonometry to figure out what the impulse response itself needs to be (that code, in my much simpler version, verges on 100 lines including loading the datasets, and is mostly math).
I'll make a point of posting a recording of mine for you in the next couple of days. I'm too busy at the moment, and it's not trivial for me to do until I set a couple of things back up. I don't have a side-by-side test, as I said, but you should at least be able to tell whether it's different enough that you really care. I can load other datasets, but the only other interesting one is CIPIC, and CIPIC is formatted in such a way that it can't be loaded into much of anything without a lot of data munging and conversion that I haven't cared to do yet. I don't know if it's worth it, so I don't want to spend a day or two figuring out how to deal with some of the really weird things about CIPIC.
And now the two crucial functions (be aware that Libaudioverse is itself GPL, but I hereby give anyone who wants it permission to use what I'm putting here, for what small bit of good it does). Also be sure to scroll down and read the header snippet and my explanation; you will probably shoot yourself in the foot repeatedly until you do:
void convolutionKernel(float* input, unsigned int outputSampleCount, float* output, unsigned int responseLength, float* response) {
    for(unsigned int i = 0; i < outputSampleCount; i++) {
        float samp = 0.0f;
        for(unsigned int j = 0; j < responseLength; j++) {
            samp += input[i+j]*response[responseLength-j-1];
        }
        output[i] = samp;
    }
}
void crossfadeConvolutionKernel(float* input, unsigned int outputSampleCount, float* output, unsigned int responseLength, float* from, float* to) {
    float delta = 1.0f/outputSampleCount;
    for(unsigned int i = 0; i < outputSampleCount; i++) {
        float weight1 = 1.0f-delta*i;
        float weight2 = i*delta;
        float samp = 0.0f;
        for(unsigned int j = 0; j < responseLength; j++) {
            samp += input[i + j] * (weight1*from[responseLength-j-1] + weight2*to[responseLength-j-1]);
        }
        output[i] = samp;
    }
}
And this snippet from the header is also important:
/**The convolution kernel.
The first responseLength-1 samples of the input buffer are assumed to be a running history, so the actual length of the input buffer needs to be outputSampleCount+responseLength-1.
*/
void convolutionKernel(float* input, unsigned int outputSampleCount, float* output, unsigned int responseLength, float* response);
/**Same as convolutionKernel, but will crossfade from the first response to the second smoothly over the interval outputSampleCount.*/
void crossfadeConvolutionKernel(float* input, unsigned int outputSampleCount, float* output, unsigned int responseLength, float* from, float* to);
And my explanation:
You call these on blocks of data, but you have to maintain a running history for them. In internal Libaudioverse code, these are "kernels", my name for a class of functions that are somewhat like plus: fundamental operations, separated completely from supporting structure for clarity and optimization purposes (I can make these about 4x faster with SSE). You need a block of data that is block_length+impulse_response_length-1 samples long. Every time you "advance", you need to copy the last impulse_response_length-1 samples of the input buffer to its beginning (be extra sure to use memcpy, or it will possibly be much too slow). These functions need the tail end of the last block to be there in order to work right; convolution is a weighted average of the past, and omitting the history will just give you periodic silence and audio dropouts. You should initialize the history section of the buffer with zeros.
The reason the history is not a separate buffer is that doing so is something between 4 and 20 times slower, depending. The checks to see where to read would be happening 22 million times a second on average for a 44.1 kHz output sampling rate. Checks swamp multiplication and addition by an order of magnitude, and switching back and forth between reading one buffer and another causes cache misses, which are worse than both of the preceding put together by an order of magnitude on a modern system.
I hope this helps some. I don't think it does as much as I'd like. There's a great deal of context and knowledge that I can't just shove in a post--the implications of these particular functions are actually deep, but we'd have to go talk about the FFT and stuff.
My Blog | Twitter: @ajhicks1992