2020-10-31 22:35:47 (edited by shiftBacktick 2020-10-31 22:39:09)

Over the past year I’ve used web technologies to create audio games like soundStrider and participate in numerous game jams. With syngen I’m releasing my tools as open-source so folks can join me in crafting dynamic audio experiences and games for the web.

Links
Example projects
Disclaimer

syngen is experimental software. Please use at your own risk. Currently I have limited time to support this, but please let me know if you encounter any issues. I suggest reading the API documentation and tinkering with the example projects. I'm excited to hear your creations!

2020-11-01 10:08:58 (edited by ogomez92 2020-11-01 10:09:26)

Very good engine. Impressed. I've been wanting to build something with it, but I've been working on other projects first.

ReferenceError: Signature is not defined.

2020-11-01 19:46:11

Hey cool!  I'll have to take a look at this.  Nice work.

2020-11-03 10:45:15

Will this work in pure JS also, or do I need to make use of Node?

best regards
never give up on what ever you are doing.

2020-11-03 15:47:53

@4:

You can absolutely use this with pure JS. Check out the example directory to find examples of that. Sometimes for larger projects I like to use Node to help glue things together, but it's not required (see my template repository, a work in progress).
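For the curious, a no-build setup can be as small as one HTML file. This is only a sketch: the script filename and the global it exposes are my assumptions here, so check the API documentation and the example directory for the real details.

    <!DOCTYPE html>
    <html>
      <head>
        <!-- Assumed filename; point this at whichever syngen build you download -->
        <script src="syngen.js"></script>
      </head>
      <body>
        <script>
          // Browsers only allow audio to start after a user gesture,
          // so kick things off from a click handler.
          document.addEventListener('click', () => {
            console.log(typeof syngen) // assumes the library exposes a global once loaded
          })
        </script>
      </body>
    </html>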

2020-11-04 05:51:51

I didn't know about these! Can you do a build and upload the games to gh-pages so I don't need to download a file and run npm install?
Also, how is syngen different from something like:
https://github.com/rserota/wad

2020-11-04 07:49:25

@5, keep up the great work. This sounds cool. I love it.

best regards
never give up on what ever you are doing.

2020-11-04 08:15:09 (edited by shiftBacktick 2020-11-04 08:26:05)

@6:

You can download the games from their releases on GitHub (portable HTML, Windows, and Linux builds are available), or you can play directly in the browser by going to the itch pages linked from their About sections.

I've never used WadJS so I'm basing my response off its API documentation. They're similar in how they allow positioning sounds in 3D space and have some built-in synths and effects. syngen differs in a few key ways:

  • syngen is also an engine. It provides the scaffolding and utilities for building games, like an event loop, input processing, math functions, and object streaming. WadJS is meant to be the audio layer for an existing engine.

  • Wads are triggerable sounds. Props in syngen are behavior definitions. When instantiated they represent a physical object in space that has that behavior. Physics can be applied to them, and they can be the source for multiple sounds.

  • WadJS has a more elegant yet restrictive API that abstracts away the Web Audio API to synthesize sounds. The synths and effects built into syngen are more low-level and give you direct access to the Web Audio API, or you can build your own if you prefer.

  • syngen has configurable, real-time binaural rendering that simulates the Doppler effect. You can adjust parameters like head size, ear angle, head shadow, and the speed of sound. WadJS only has a stock HRTF.

  • syngen has a virtualized mixer with a built-in master effects chain. In WadJS you would need to build this yourself with a chain of PolyWads.

  • syngen has a global reverb send with a configurable impulse response and filtering. It's meant to glue sounds together as if they exist in the same physical space. It's unclear how reverb works in WadJS.
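To illustrate the global reverb send idea in generic Web Audio terms, here's the standard send/return pattern. This is a sketch, not syngen's actual implementation, and the impulse response URL and helper are assumed:

    // Generic send/return reverb sketch, not syngen's code.
    const context = new AudioContext()

    const reverb = context.createConvolver()
    const reverbReturn = context.createGain()

    reverb.connect(reverbReturn)
    reverbReturn.connect(context.destination)

    // Assumed helper: fetch and decode an impulse response of your choosing
    async function loadImpulseResponse(url) {
      const response = await fetch(url)
      const arrayBuffer = await response.arrayBuffer()
      reverb.buffer = await context.decodeAudioData(arrayBuffer)
    }

    // Each sound connects to the mix dry and also to the shared reverb via its own send
    function addSource(node, sendLevel = 0.25) {
      const send = context.createGain()
      send.gain.value = sendLevel

      node.connect(context.destination) // dry
      node.connect(send)
      send.connect(reverb) // wet, shared with every other source
    }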

Hope this helps!

2020-11-10 20:42:26

@shiftBacktick I sent you a PM.

2020-11-11 01:36:09

What do I need to use your program? I'm a beginner programmer and I wanted to know what programs I need to learn before using this guide.
Congratulations on your new program!!! I cannot wait to use it!

2020-11-11 02:41:11

For what it's worth, you may find in practice that just using a database of impulse responses gives you a much better HRTF implementation than trying to make it configurable.  Almost no one will play with the configuration, and in practice those sorts of mathematical models are actually very poor approximations to the head, so the trade-off very much can become "1% of users configure it, but it's worse for the 99%".  Unfortunately the built-in WebAudio one is highly variable per browser.

The only good configurable one I know of is OpenALSoft, and there's a frankly massive C program which is responsible for taking the impulse responses and the configuration parameters and producing a big data file that gets fed in at runtime.

My Blog
Twitter: @ajhicks1992

2020-11-11 03:16:21 (edited by shiftBacktick 2020-11-11 03:18:10)

Many replies below:

@9:

Thanks for letting me know! I wish this forum had better notifications for PMs. I've sent you a response (and, oh geez, some other folks too).

@10:

This is a great question! I can't say this is entirely for beginners, mainly because I've used it for my own projects and haven't put the time into making it super approachable. Here's how I might start:

  • Know your tools. Find a text editor or IDE that you like and can work efficiently within.

  • Have an idea. Determine what you want to make, and outline it before you start.

  • Read the docs. Learn what this provides for you and what you'll need to build yourself.

  • Check the examples. Discover how you might approach building it, but know there's no right way except what feels right.

Otherwise you don't need anything else to get started besides the library and a web browser. If you need to learn more about sound design or synthesis (which this is primarily geared towards), then I might recommend the excellent Synth Secrets article series. It's a lot to take in, but we're here on this forum to help you get started.
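If you've never touched the Web Audio API (which syngen builds on), the "hello world" of synthesis is only a few lines. This is plain Web Audio rather than anything syngen-specific:

    // Plain Web Audio: play a quiet 440 Hz sine tone for one second.
    // Run this from a click handler, since browsers require a user gesture to start audio.
    const context = new AudioContext()

    const oscillator = new OscillatorNode(context, {type: 'sine', frequency: 440})
    const gain = new GainNode(context, {gain: 0.25}) // keep the volume comfortable

    oscillator.connect(gain).connect(context.destination)
    oscillator.start()
    oscillator.stop(context.currentTime + 1)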

@11:

I totally agree with your assessment in general. I created this library primarily as a learning exercise. There are two main reasons why I kept this direction with respect to binaural rendering:

  • ConvolverNode is extremely expensive in the Web Audio API. Although this model applies two BiquadFilterNodes, DelayNodes, and GainNodes per sound source (one of each per ear), it's still much more performant than two StereoPannerNodes (stock HRTF) or two ConvolverNodes (impulse responses). See the sketch after this list for the general shape of that chain.

  • HRTF has fixed spectral qualities. With a configurable acoustic shadow we can apply more specific roll-off frequencies. For example, for sounds tuned to specific musical pitches with a lot of harmonics, we can apply the shadow to directionally filter out frequencies above the root, so a sawtooth might turn into a sine wave when behind you. I think this is important for eliminating the cone of confusion. In a future version I'd like to make this configurable per instance so musical sounds could have even more directionality.
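For anyone following along, here's roughly what a per-source chain of that kind looks like in raw Web Audio. This is a minimal sketch of the general technique, not syngen's code, and the parameter curves are illustrative guesses rather than the library's tuned values:

    const context = new AudioContext()

    function createBinauralChain(source) {
      const merger = context.createChannelMerger(2)
      merger.connect(context.destination)

      // One delay (interaural time difference), lowpass (acoustic shadow),
      // and gain (interaural level difference) per ear
      const ears = [0, 1].map((channel) => {
        const delay = context.createDelay(0.001)
        const filter = context.createBiquadFilter()
        const gain = context.createGain()

        filter.type = 'lowpass'
        source.connect(delay).connect(filter).connect(gain).connect(merger, 0, channel)

        return {delay, filter, gain}
      })

      return {
        // azimuth in radians, 0 = front, positive = toward the right ear
        update(azimuth) {
          const pan = Math.sin(azimuth) // -1 (hard left) .. 1 (hard right)

          ears.forEach(({delay, filter, gain}, channel) => {
            // How directly this ear faces the source, in [-1, 1]
            const facing = channel === 1 ? pan : -pan

            delay.delayTime.value = 0.00025 * (1 - facing)
            filter.frequency.value = 200 + 9800 * (facing + 1) / 2
            gain.gain.value = 0.5 + 0.25 * (facing + 1)
          })
        },
      }
    }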

If there's anything I'm getting totally wrong with my model, please let me know!

2020-11-11 04:29:51

@12
Synthizer gets double-digit sources in debug builds with the boring old non-optimized convolution that you're never supposed to use, and I expect upwards of 1000 in practice on a good machine in the microbenchmarks, whenever they happen.  That's partly sarcasm--when deciding to use boring old O(n^2) convolution, everyone always misses the bit about the crossover point where the FFT version starts being faster, which is something I persistently see people get wrong.  Sadly it sounds like the WebAudio people missed that too.  Fun.  Admittedly part of that is that I am very clever about packing data appropriately for SSE vectorization, but still.  Every time I think I'm done being disappointed with the implementation quality of it, I find that I was in fact mistaken.

I haven't read your code, but I'm assuming that you're doing a biquad per channel.  The problem with that is that a single biquad isn't actually enough to capture HRTF.  You'll permanently lose out on any vertical effects for one thing.  What I may eventually do in Synthizer is allow an additional lowpass filter for emphasis.  But even that doesn't work out so well in practice: the lowpass filter for behind the player due to HRTF and the lowpass filter for occluded by the wall are effectively the same filter.  It might be possible to construct a more accurate model with 2 or 3 biquads in series, but I haven't tried.  The actual frequency response of those impulses is more like a complicated equalizer, not just a couple filters in series.  The "the head is a sphere and" models throw out a lot of detail.

As a benchmark, check this or any of the other variants of it with different HRTF datasets.  OpenALSoft is kind of the benchmark for this, and it's just convolution plus reintroducing removed interaural time difference that was extracted with magic.  I can go into the specifics of the magic, if you know enough math, or you can read the scripts in Synthizer's repository where I do something in the same general ballpark of it.  The general idea is that you convert the filters to minimum phase, then you do some normalization that lets you select between flat frequency response and "this is Bob's head and only Bob's head" response, and just land somewhere in the middle, enough to emphasize what most people have in common without overindividualizing it.

But in general you're stuck with WebAudio and probably can't do better, at least unless you feel like getting into web assembly.  It might be possible to make a program which determines the low-level coefficients for biquad equivalents, but I don't think WebAudio is big on crossfading that properly for you and the naive thing where you crossfade the coefficients one by one actually produces intermediate unstable filters.

One thing that does really leap out at me is that you say what you do is faster than 2 StereoPannerNodes per source.  What was the motivation there?  As far as I know you should only ever need one per source.

My Blog
Twitter: @ajhicks1992

2020-11-11 04:58:31 (edited by shiftBacktick 2020-11-11 04:59:08)

@camlorn:

You're right, I misspoke about two StereoPannerNodes per source. When I write these posts I proofread a lot, but sometimes I miss a few things. The performance difference between a StereoPannerNode and my implementation is negligible (you'll notice it a lot more with ConvolverNodes, which are much worse), but for me the customization makes mine preferable.

You're also correct that the WebAudio folks overlooked some issues with panning, and it's detrimental to the specification. I share that opinion. Unfortunately, I believe their reliance on AudioWorklets means we'd have to reimplement things ourselves if we wish to do better. Eventually I'd like to learn WebAssembly so I can work on my own implementations that are more accurate and performant. Then what I'd love to do for my games is offer presets for "Bob's head" so it's more individualized than the averaged defaults I've provided.

Currently I set up the biquad filters for 2D audio, which was easy and sounds decent. The transition to full 3D audio is a very recent development for me. In the future I plan to reevaluate everything for 3D. It's possible that I need more filters, as you've carefully alluded to. My model is very simple: a float between 0 and 1 which expresses how directly the sound source hits the ear. My initial idea is to compare the quaternions against the conjugates relative to the ears. This is very helpful for the use case I outlined in @12, which involves musical sounds. I'm not sure if I want a more realistic model here yet.
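In flat 2D terms (before any quaternions are involved), one way to express that "how directly the ear faces the source" value is a simple dot product on the horizontal plane. This helper is hypothetical, just shorthand for the idea above:

    // Hypothetical helper: returns a value in [0, 1], where 1 means the ear points
    // straight at the source and 0 means it points directly away.
    // azimuth: angle to the source from straight ahead; earAngle: the ear's angle
    // (e.g. plus or minus 90 degrees expressed in radians).
    function earFacing(azimuth, earAngle) {
      // Dot product of two unit vectors on the horizontal plane, remapped from [-1, 1] to [0, 1]
      return (Math.cos(azimuth - earAngle) + 1) / 2
    }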

I'll definitely need to read more into your examples and recommendations; you're certainly more versed in this than I am. I've got several semesters, a year of real experience, and this tome by Curtis Roads next to me, but that's about it! smile

2020-11-11 16:57:05

Most of my knowledge is self-taught and practical, so possibly you know more of the DSP math than I do.  But if you want a good starting point that may or may not be a quick improvement, you might port my ITD computations, which will require a clockwise angle in degrees from the front of the head (yes, I know, it's a weird coordinate system. But that's what all the HRTF literature does).  In terms of pieces you can drop in to WebAudio, that's probably a pretty simple one.  That said, I don't promise that all the geometry is right, just that it sounds good when I and some others test it.
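For reference only, a common textbook approximation of ITD is the Woodworth spherical-head model. This is not Synthizer's actual computation, just a rough stand-in with assumed constants:

    // Woodworth spherical-head approximation of interaural time difference, in seconds.
    // Constants are assumed averages, not taken from any particular library.
    const HEAD_RADIUS = 0.0875 // meters
    const SPEED_OF_SOUND = 343 // meters per second

    // azimuthDegrees: clockwise angle from the front of the head
    function interauralTimeDifference(azimuthDegrees) {
      const radians = (((azimuthDegrees % 360) + 360) % 360) * Math.PI / 180
      const folded = radians > Math.PI ? 2 * Math.PI - radians : radians // left/right symmetry
      const angle = folded > Math.PI / 2 ? Math.PI - folded : folded // front/back symmetry

      return (HEAD_RADIUS / SPEED_OF_SOUND) * (angle + Math.sin(angle))
    }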

You probably just flat out can't get better than a lowpass in WebAudio though.  The specification is fine as regards panning.  It's not fine as regards everything else.  And it's not fine in the sense that all the implementors do a really bad job with it, which really surprises me given how relatively simple audio is as compared to things like WebGL.  In this case, Webasm might save you, but the performance of getting audio into and out of it might not be so great.  The last time I got as far as wanting to use Webasm for custom nodes, I got tired of just fighting WebAudio nonstop, realized that everything I might ever want to do needs a desktop app because most blind people can't download 100MB of sounds every time they start my game, and went and started Synthizer instead.  There's more justification to it than that, namely that we really need an audio library like it that's not thousands of dollars, so it wasn't just me ragequitting WebAudio or anything.  But nonetheless here we are.

In either case, if it's helpful in some fashion, my scripts for HRTF impulse normalization are here.  They're not perfect, in particular they take out too much of the frequency dependent effects, but I'll be improving them and they're like 90% of the way there.

My Blog
Twitter: @ajhicks1992

2020-11-12 12:12:56

OK, so I just have one question here: will this lib be able to run audio games on the web on mobile devices too?

best regards
never give up on what ever you are doing.

2020-11-12 16:10:48

I can't speak for ShiftBacktick, but that's going to highly depend on the mobile device you want to run it on.  In theory it should work.  In practice, WebAudio can't be optimized nearly as much as something like Synthizer can and it can't offload to something like a GPU either, so it's going to heavily depend on the CPU and be expensive no matter how you slice it.  Plus, lots of mobile devices have older browsers.  New iPhones?  Fine.  Cheap Androids?  Maybe not.

My Blog
Twitter: @ajhicks1992

2020-11-12 20:41:00 (edited by shiftBacktick 2020-11-12 20:41:44)

@16:

In general, yes: it works in any browser and on any device that supports the Web Audio API. The specification is mature enough that it should be supported on any mobile device manufactured in the last five years. I have no problem playing the examples (especially Audo and Kaleidophone, which have touch controls) on my Pixel.

In practice it's more complicated. The amount of resources you have available to you for web audio is entirely dependent on the device and browser. In my soundStrider postmortem I dive deeply into the lack of tools Web Audio provides to profile and scale performance, which led to many design concessions I had to make to ensure the game could run smoothly. Basically Web Audio tends to stutter or drop out when it runs out of resources, rather than giving you tools to avoid that.

Paul Adenot's Web Audio API performance and debugging notes are a great resource to understand the CPU and memory costs of every AudioNode. Ultimately it depends on the complexity of your audio graph. You'll especially want to be aware of the number of OscillatorNodes you have running at all times. For example, Kaleidophone always has 34 concurrent oscillators, a number that was carefully tuned to balance performance and sound. You might be able to get better performance by pre-rendering some sounds and playing them back with buffers.
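As a sketch of that pre-rendering idea (the tone and numbers are just illustrative, not taken from Kaleidophone), you can render a sound once with an OfflineAudioContext and then play it back with cheap buffer sources:

    // Render a synthesized sound once, offline, into an AudioBuffer.
    async function prerenderTone(frequency = 220, seconds = 2, sampleRate = 44100) {
      const offline = new OfflineAudioContext(2, seconds * sampleRate, sampleRate)

      const oscillator = new OscillatorNode(offline, {type: 'sawtooth', frequency})
      const gain = new GainNode(offline, {gain: 0.5})

      oscillator.connect(gain).connect(offline.destination)
      oscillator.start()
      oscillator.stop(seconds)

      return offline.startRendering() // resolves to an AudioBuffer
    }

    // Playback is then an AudioBufferSourceNode rather than a live oscillator.
    async function playPrerendered(context) {
      const buffer = await prerenderTone()
      const source = new AudioBufferSourceNode(context, {buffer})

      source.connect(context.destination)
      source.start()
    }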

Hope that helps!