2014-12-15 18:44:13 (edited by Victorious 2014-12-15 18:45:21)

I've been experimenting with using HRTF. It does a good job for me when playing sounds that are left/right of the listener, and above/below to the left or right. However, all of the datasets I've tried so far aren't that good at letting the listener distinguish the height of a sound when it's directly above/below. The KEMAR dataset, which is what openal-soft uses by default, seems to be the best so far, and I've also tried some datasets from the Listen HRTF database. Does anyone have any suggestions for a better HRTF dataset?

2014-12-15 21:05:19

Unfortunately, no.
This particular plane is almost impossible to distinguish.  To be honest, I'd give up on vertical aiming now, if that's what you're considering.  It's a pain mathematically, and you can't do it without additional radars on top of the HRTF.  Anything above +70 degrees starts snapping hard to the directly-above position, the particular circle in question is incredibly difficult, and there are no datasets that go below -45 anyway (at that point the torso/chair/whatever starts becoming a problem, and any points recorded there wouldn't work well as a consequence).  When it comes down to it, HRTF makes what we already have much more immersive, but doesn't add too many things you can do with gameplay itself.
Also be aware that, if your goal is to distribute OpenALSoft, that's going to be...difficult.  Not impossible--I've got a fork on Github that lets you override the user's config file.  But getting it to do what you want on all machines can be quite finicky.  As I recall, you have to kill everything but Winmm and distribute HRTF datasets for all common sampling rates; the file name used for selection has placeholders.  It will also fail, in some configurations, with the Logitech G930 and anything else that's already "faking" surround sound, unless you specify an additional option forcing stereo.
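To give you a flavor, the config you end up shipping looks something like the following.  This is from memory, so treat the exact keys as version-dependent and check them against the alsoftrc.sample that comes with your build:

# keys from memory; verify against your version's alsoftrc.sample
drivers = winmm
channels = stereo
hrtf = true
hrtf_tables = hrtf\default-%r.mhr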

My Blog
Twitter: @ajhicks1992

2014-12-16 06:36:20 (edited by Victorious 2014-12-16 14:23:41)

Hmm, I listened to a Rapture3D HRTF demo and was able to distinguish whether the sound was directly above or below.

I use openal-mob, a fork of openal-soft that allows specifying config options at runtime instead of through an annoying .ini file. I've successfully distributed my demo app with just the release DLL of openal-mob.

If the z plane is not possible to do with HRTF, how do other games, or BGT, do it?

2014-12-16 16:05:26

I didn't know about OpenAL-mob.  My fork doesn't allow much changing at runtime: you have to use special functions that take the place of reading the config file.  Given that it took me about ten minutes to integrate, not a big loss.  But I'd say you're not out of the water yet--you may just have 100% "normal" hardware.  Just be aware of it.
What you're calling the z plane (a better name might be the YZ plane, though there's actually a scientific term for it relating to HRTF, and z means forward in the sighted/OpenAL world anyway) isn't used.  By anyone, for anything.  Well, except Audioquake, which uses special radars.  BGT doesn't even have a third dimension.
Rapture is using a lot of proprietary stuff.  It's possible they've figured out something we haven't, and their web pages imply that they have done all sorts of mathematical analysis of multiple datasets to come up with theirs.  I may one day work on doing whatever it is they did, but not for a while--directly above and directly below are not important as isolated positions.  In normal use, the sound will move into and out of the directly-above position naturally, and I think that's probably good enough.
When you do isolated tests like this, there are three things that affect it.  One of them occurs because you are using OpenALSoft's implementation.
The first is that you're missing literally all cues from movement.  These come in when you actually move sounds, and especially when the player is controlling the player character.  If you knew the sound was up and to the left and that it's moving above you, then when it hits that one position your brain just goes "okay, it did that", at least to some extent.
The second is a consequence of convolution, and you may have already seen it.  An impulse response is a magical list of numbers that is applied to the sound inside what is basically two nested for loops.  I could cut/paste them here, but the for loops are not the point--it's a case where the code is maybe 5% of the story, and the rest comes from knowing the why and wherefore of the math.  Nevertheless, every sound is a sum of individual sine waves.  All an impulse response can do is modify what is already there: if you feed one sine wave through your HRTF, you don't get anything except stereo panning out.  If you want your tests and the stuff you build on them to sound really good, you need to use sounds with what is called a wide frequency response: various forms of noise, drums, footsteps, and things of that ilk.  The trick is to have enough individual frequencies in the sound that the impulse response from the HRTF has something to add the cues to.
The third is OpenALSoft's optimization and conversion process.  If you're using makehrtf, you'll want to play with all the command line options.  Even if you turn everything up to maximum quality, however, OpenALSoft throws out all the information related to something called phase and replaces it with a simplified model.  The irony is that it does this in the name of optimization but, with Libaudioverse, which doesn't, I get more sources.  No, you cannot disable this optimization.
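To make the frequency-response point concrete, here's a minimal sketch of two test signals--this is just an illustration, not Libaudioverse or OpenALSoft code.  Feed both through the same HRTF source: the sine comes out as little more than stereo panning, while the noise actually localizes:

#include <cmath>
#include <cstdlib>

// A pure 440 Hz tone: a single frequency, so the HRIR has almost
// nothing to shape.
void makeSine(float* out, unsigned int count, float sampleRate) {
    for(unsigned int i = 0; i < count; i++)
        out[i] = sinf(2.0f*3.14159265f*440.0f*i/sampleRate);
}

// White noise: energy at every frequency, so the HRIR has plenty of
// material to imprint the positional cues on.
void makeNoise(float* out, unsigned int count) {
    for(unsigned int i = 0; i < count; i++)
        out[i] = 2.0f*(rand()/(float)RAND_MAX)-1.0f;
}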
I'd be interested to see your tests.  I'm not set up with Rapture3D.

My Blog
Twitter: @ajhicks1992

2014-12-16 17:27:01

Hmm, the test programme (https://dl.dropboxusercontent.com/u/213 … 20Test.zip) does two things: it decreases the height of the sound, then plays a sound at various locations around the listener. Are you saying that it'll sound better if it has more sounds to work with?

Which settings do I modify? I don't even know what most of the options here mean or do.
Usage:  makehrtf <command> [<option>...]

Commands:
 -m, --make-mhr  Makes an OpenAL Soft compatible HRTF data set.
                 Defaults output to: ./oalsoft_hrtf_%r.mhr
 -t, --make-tab  Makes the built-in table used when compiling OpenAL Soft.
                 Defaults output to: ./hrtf_tables.inc
 -h, --help      Displays this help information.

Options:
 -r=<rate>       Change the data set sample rate to the specified value and
                 resample the HRIRs accordingly.
 -f=<points>     Override the FFT window size (defaults to the first
                 power-of-two that fits four times the number of HRIR points).
 -e={on|off}     Toggle diffuse-field equalization (default: on).
 -s={on|off}     Toggle surface-weighted diffuse-field average (default: on).
 -l={<dB>|none}  Specify a limit to the magnitude range of the diffuse-field
                 average (default: 24.00).
 -w=<points>     Specify the size of the truncation window that's applied
                 after minimum-phase reconstruction (default: 32).
 -i=<filename>   Specify an HRIR definition file to use (defaults to stdin).
 -o=<filename>   Specify an output file.  Overrides command-selected default.
                 Use of '%r' will be substituted with the data set sample rate.

What HRTF dataset did you choose for LibAudioVerse?

2014-12-16 19:13:59

I just used the MIT KEMAR dataset.  Specifically, the one they say is diffuse-field equalized.  The difference is that I'm not throwing out the phase information using a C file that's basically the size of a small novel in the interest of speed.  OpenALSoft is interesting internally, mostly because it started in the days when compilers didn't do the million optimizations they do now, and I've personally found that not doing what he did, and instead just writing for cache friendliness and minimizing superfluous math operations, is better, at least on an x86 PC.  It also takes literally something like a thousandth of the code, and the only concern I have is that his approach might be required for iOS down the road.  I'm not sure how it sounds versus OpenALSoft as I never set up a side-by-side listening test, but it's definitely less hollow and more full.
I'm not saying you want to use more sounds, I'm saying you want to use a sound with as wide a frequency response as possible.  Examples of "bad" sounds for HRTF include flutes, sine/cosine waves, whistles, etc.  Good examples include clicks, pops, footsteps, drums, various zapping-like sounds, etc.  The HRIR can only modify what is given to it in terms of frequencies, and so you need to give it enough frequencies that it can actually insert the cues.  This is about frequencies, not sounds: every sound is composed of a number of frequencies added together.
Go into Audacity and make 10 sine waves at random frequencies.  Play them.  So long as they're not all astronomically far apart, you don't hear them as 10 distinct sounds.  Normal sounds have hundreds or thousands of individual sine waves, but it's still theoretically possible to make any sound that way.  What the HRIR does, with some deceptively simple code, is modify these in the same way that your head does.  Playing more sounds simultaneously isn't the same at all, though playing multiple different sounds simultaneously at the same point in space would potentially be.  For the record, your current sound might be fine.  It's just a thing to be aware of when you're playing with this stuff.
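If you'd rather do that experiment in code than in Audacity, the equivalent sketch is only a few lines (again, just an illustration, not code from any library):

#include <cmath>
#include <cstdlib>

// Sum 10 sine waves at random audible frequencies into one buffer.
// Played back, this is heard as one complex tone, not 10 separate sounds.
void sumRandomSines(float* out, unsigned int count, float sampleRate) {
    for(unsigned int i = 0; i < count; i++) out[i] = 0.0f;
    for(int wave = 0; wave < 10; wave++) {
        float freq = 100.0f + rand()%5000; // somewhere in the audible range
        for(unsigned int i = 0; i < count; i++)
            out[i] += 0.1f*sinf(2.0f*3.14159265f*freq*i/sampleRate);
    }
}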
I'm afraid I can't tell you what the OpenALSoft options mean.  I never invested the time to understand the algorithm because, as I said above, I think it's pointless.  Back when I actually did OpenAL, I didn't understand the math well enough.  As I recall, the really important one is -w, and you want to turn it up to the max it will let you.
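If memory serves, that means an invocation roughly like the following, run once per sample rate you plan to ship (kemar.def is a placeholder for whatever HRIR definition file you're using, and swap in whatever maximum -w actually accepts):

makehrtf --make-mhr -r=44100 -w=128 -i=kemar.def -o=oalsoft_hrtf_%r.mhr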
As for your test, it's not the program I wanted.  I was hoping you had a recording.  I would need to install Rapture3D, and it might be worth doing so, but I'm not set up with it at the moment.  I should perhaps also post one for comparison purposes, but this will take a little setup that I don't have time for at the moment (and I have got to code the recording object.  Maybe I'll do that first).

My Blog
Twitter: @ajhicks1992

2014-12-17 04:01:31

I've tried pushing -w up to its max value (128) and -f to the max of 16,384, but I see no difference in the generated file. How easy would it be for me to remove the discarding of the phase info, and where should I look for that code? Would it be in the makehrtf utility?

2014-12-17 04:43:19

You can't, not easily.
Whoever originally implemented it decided to replace 10 lines of code with a custom data format and an algorithm to run that format.  He removes the phase info from the HRIR and replaces it with some delay lines.  If you do take it out, then given the architecture of the rest of it, it probably won't be fast enough for what you need.  The way it's designed internally is, IMHO, a mess.  Partially this comes from the OpenAL API, partially from being a project that can be traced back to 2005, and partially from the fact that, back then, strange bit tricks and the like really were faster than just multiplication.  That last one is a very, very long discussion that I'm not really qualified to have; the short version is that compiler and CPU improvements have stacked up out our ears since then.  Nevertheless, OpenALSoft somewhere contains my favorite line of code: x &= ~4.  If you stopped and stared at that for a minute or two before getting it, as I did, you will understand why I dislike it so much.  If you didn't, then go ahead and dig in, because you've probably got a decent chance.
The net result of all of this is that the naive O(m*n) algorithm--which at 44.1khz works out to something like 22 million math operations a second, e.g. 44100 samples times a 512-point response--is going to run in such a way that you're not able to take advantage of all the features of your CPU that let you do that kind of thing without bringing the computer to its knees.  The rest of OpenALSoft already takes a ton, and I'm almost certain that you can't just use my loop--the model that OpenAL uses for buffers ruins such tricks.  I'll post the convolution kernel at the bottom of this post, but you'll still need a rather large dose of trigonometry to figure out what the impulse response itself needs to be (that code, in my much simpler version, verges on 100 lines including loading the datasets, and is mostly math).
I'll make a point of posting a recording for you in the next couple of days with mine.  I'm too busy at the moment, and it's not trivial for me to do until I set a couple things back up.  I don't have a side-by-side test, as I said, but you should at least be able to tell whether it's different enough that you really care.  I can load other datasets, but the only other interesting one is CIPIC, and CIPIC is formatted in such a way that it can't be loaded into much of anything without a lot of data munging and conversion that I haven't cared to do yet--I don't know if it's worth it, so I don't want to spend a day or two figuring out how to deal with some of the really weird things about CIPIC.
And here are the two crucial functions (be aware that Libaudioverse is itself GPL, but I hereby give anyone who wants it permission to use what I'm putting here, for what small bit of good it does).  Also be sure to scroll down and read the header snippet and my explanation--you will probably shoot yourself in the foot repeatedly until you do:

void convolutionKernel(float* input, unsigned int outputSampleCount, float* output, unsigned int responseLength, float* response) {
    for(unsigned int i = 0; i < outputSampleCount; i++) {
        float samp = 0.0f;
        // Each output sample is the weighted sum of the current input
        // sample and the responseLength-1 samples before it; the response
        // is stored forward, so it's indexed backward here.
        for(unsigned int j = 0; j < responseLength; j++) {
            samp += input[i+j]*response[responseLength-j-1];
        }
        output[i] = samp;
    }
}


void crossfadeConvolutionKernel(float* input, unsigned int outputSampleCount, float* output, unsigned int responseLength, float* from, float* to) {
    // Per-sample step for a linear crossfade across the block.
    float delta = 1.0f/outputSampleCount;
    for(unsigned int i = 0; i < outputSampleCount; i++) {
        float weight1 = 1.0f-delta*i; // weight of the old response
        float weight2 = i*delta; // weight of the new response
        float samp = 0.0f;
        for(unsigned int j = 0; j < responseLength; j++) {
            samp += input[i + j] * (weight1*from[responseLength-j-1] + weight2*to[responseLength-j-1]);
        }
        output[i] = samp;
    }
}

And this snippet from the header is also important:

/**The convolution kernel.
The first responseLength-1 samples of the input buffer are assumed to be a running history, so the actual length of the input buffer needs to be outputSampleCount+responseLength-1.
*/
void convolutionKernel(float* input, unsigned int outputSampleCount, float* output, unsigned int responseLength, float* response);

/**Same as convolutionKernel, but will crossfade from the first response to the second smoothly over the interval outputSampleCount.*/
void crossfadeConvolutionKernel(float* input, unsigned int outputSampleCount, float* output, unsigned int responseLength, float* from, float* to);

And my explanation:
You call these on blocks of data, but you have to maintain a running history for them.  In internal Libaudioverse code, these are "kernels", my name for a class of functions that are somewhat like plus: fundamental operations, separated completely from supporting structure for clarity and optimization purposes (I can make these about 4x faster with SSE).  You need a block of data that is block_length+impulse_response_length-1 samples long.  Every time you "advance", you need to copy the last impulse_response_length-1 samples of the input buffer to its beginning (be extra sure to use memcpy, or it will possibly be much too slow).  These functions need the tail end of the last block to be there in order to work right: convolution is a weighted average of the past, and not keeping the history will just give you periodic silence and dropouts in the audio.  You should initialize the history section of the buffer to zero.
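To make that buffer layout concrete, here's a minimal driving loop for convolutionKernel (the lengths are just example values):

#include <cstring> // memcpy

// From the post above.
void convolutionKernel(float* input, unsigned int outputSampleCount, float* output, unsigned int responseLength, float* response);

const unsigned int blockLength = 1024;
const unsigned int responseLength = 128;
// History lives at the front, so the buffer is blockLength+responseLength-1 long.
float input[blockLength+responseLength-1] = {0.0f}; // history starts zeroed
float output[blockLength];
float response[responseLength]; // the HRIR, filled in elsewhere

void processBlock(float* newSamples) {
    // New samples go after the responseLength-1 samples of history.
    memcpy(input+responseLength-1, newSamples, blockLength*sizeof(float));
    convolutionKernel(input, blockLength, output, responseLength, response);
    // Advance: the tail of this block becomes the next block's history.
    memcpy(input, input+blockLength, (responseLength-1)*sizeof(float));
}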
The reason the history is not kept in a separate buffer is that doing so is something between 4 and 20 times slower, depending.  The checks to see where to read would be happening 22 million times a second on average at a 44.1khz output sampling rate.  Checks like that swamp multiplication and addition by an order of magnitude, and switching back and forth between reading one buffer and another causes a phenomenon known as cache misses, which is worse than both of the preceding put together by an order of magnitude on a modern system.
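And since I mentioned SSE above, here's roughly what the 4x version looks like.  This is a hypothetical sketch, not the Libaudioverse internals, and it cheats by assuming responseLength is a multiple of 4 and that you pre-reversed the response so both arrays can be read forward:

#include <xmmintrin.h> // SSE intrinsics

void convolutionKernelSse(float* input, unsigned int outputSampleCount, float* output, unsigned int responseLength, float* reversedResponse) {
    for(unsigned int i = 0; i < outputSampleCount; i++) {
        __m128 acc = _mm_setzero_ps();
        // Four multiply-adds per iteration instead of one.
        for(unsigned int j = 0; j < responseLength; j += 4) {
            acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(input+i+j), _mm_loadu_ps(reversedResponse+j)));
        }
        // Horizontal sum of the four accumulator lanes.
        float lanes[4];
        _mm_storeu_ps(lanes, acc);
        output[i] = lanes[0]+lanes[1]+lanes[2]+lanes[3];
    }
}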
I hope this helps some.  I don't think it does as much as I'd like.  There's a great deal of context and knowledge that I can't just shove in a post--the implications of these particular functions are actually deep, but we'd have to go talk about the FFT and stuff.

My Blog
Twitter: @ajhicks1992

2014-12-17 06:41:32

I think I'm going to concentrate on producing something first, then worry about this later. I'm hoping that LibAudioVerse will be mostly finished by then. Btw, what license will it be released under? Will it be usable in commercial applications?

2014-12-17 06:42:25

All x &= ~4 does is turn off the 4 bit, no?

Look over here!
"If you want utopia but reality gives you Lovecraft, you don't give up, you carve your utopia out of the corpses of dead gods."
MaxAngor wrote:
    George... Don't do that.

2014-12-17 15:26:13 (edited by camlorn 2014-12-17 15:27:40)

Yeah. I realized very, very late last night that I typoed that.  It's actually x &= ~3, which is the most cryptic way I can think of to round a number down to the nearest multiple of 4.  I mean, if you know the bit tricks and walk through it, fine, but why not just x = x-x%4 or x = 4*(x/4)?  Either is clearer.  Unfortunately, OpenALSoft isn't from 2010, it's from 2003, so compiler optimizations like that weren't something you could count on.  Nowadays I'd expect either of those lines to optimize down to the bit trick with a fair degree of certainty; now multiply that by the whole codebase and you begin to understand what this thing looks like internally.  If you really need the trick, it should at least be put behind a macro or a function, though 2003 compilers probably didn't aggressively inline across translation units--another significant optimization that's somewhat newer, at least in practice.
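For anyone following along, the three forms really are interchangeable for unsigned values; a quick check:

#include <cassert>

int main() {
    unsigned int x = 23;
    assert((x & ~3u) == 20); // clear the low two bits
    assert((x - x%4) == 20); // subtract the remainder
    assert((4*(x/4)) == 20); // truncating divide, then multiply back
    return 0;
}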
Libaudioverse will be released under a dual license: GPL and commercial.  I will be charging for the commercial one, with various pricing schemes yet to be decided--philosophically, making all my cool stuff free and open source for anything is nice, but I'd also like things like an income and a retirement plan.  I chose the GPL because it's basically the demo version of Libaudioverse for the commercial people, and it will hopefully make this community start sharing code--people do a lot of hoarding around here, IMHO.
For the record, I'm working on a first-person shooter as of now.  Libaudioverse has hit the point where development can only proceed by me actually using it in things and seeing where it comes up short.  Also, I need an actual game to test and improve my reverb--I've got an algorithm started which looks like it will be good enough to literally let you hear walls, but I need a bunch of environments to test it in.  It won't be an MMO-style game, because that's not the point, but online multiplayer is on the table as of now (I'm using it to prototype a bunch of stuff for my next, and hopefully last, attempt at a WoW-style MMO, and the fallout is that multiplayer might be really easy to add).

My Blog
Twitter: @ajhicks1992