2015-05-30 00:59:16

Hello everyone,

I've been lurking around for a bit and have been studying this community's use of audio interfaces, such as one-dimensional side scrollers, and the multi-directional sonar pings in Swamp, I believe? AudioQuake's 3D positional audio and the approaches of SoundRTS and tower defense games are also rather interesting.

I'm curious, however: have any of you heard of image-to-sound rendering?

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2015-05-30 01:43:13

I'm pretty sure there was something called "The Sound", "The Voice", or something to that effect, that was talked about on here a few years ago.  If more has been done in that area, I'd be interested to learn about it.

- Aprone
Please try out my games and programs:
Aprone's software

2015-05-30 02:31:41

Yes, Peter Meijer's The vOICe software. I've been working with some of his source code (http://www.seeingwithsound.com/im2sound.htm) for game engine applications for the past few weeks. It's fairly difficult to find much on the subject outside academic papers, although I did find some research on incorporating color spectrums using instruments during sweeps, something I'd like to look into at some point.

I managed to port some of the code to Python and integrate parts of it into some engines, and have been experimenting with sonified depth maps for first person environments, among others. If your interesting I can show you what I've come up with so far, though I'm still working on performance issues.

As for Mr. Meijer himself, he seems to be busy working closely with members of the Raspberry Pi community to create affordable sonifying kits for the non-sighted.

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2015-05-30 05:23:51

It would be awesome if we could hear images with sound. That would be epic!

"On two occasions I have been asked [by members of Parliament!]: 'Pray, Mr. Babbage, if you put into the machine wrong figures, will the right answers come out ?' I am not able rightly to apprehend the kind of confusion of ideas that could provoke such a question."    — Charles Babbage.
My Github

2015-05-31 02:14:58

Oops, I meant if you're interestED I can show you what I've come up with so far. Damn typos.

@Ethin
Well you can, just download the free version of The vOICe software from Mr. Meijer's site (http://www.seeingwithsound.com/voice.exe). Although it may not be as easy as you might imagine: research seems to suggest it takes roughly 70 hours of training to really get used to it, and I find sonified depth maps to be a lot easier to make out than light-based images.

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2015-05-31 08:54:39

Just to be clear, if anyone isn't quite understanding what Magurp is saying:
The vOICe is much, much better for real-world, first person use.
I've played with it a lot in the hopes of making 2D games more accessible, with very little success.
But the one time I tried walking around my house and outside with it (using my laptop's webcam), it was much more helpful.
Generally speaking, I expect my primary use of the vOICe from now on to be when I make videos with my computer, so I can hear whether I'm in frame or not.

I might try to throw together an example of what the Swamp Radar might sound like if it were handled by the vOICe. I think I still have sonified screenshots from Sonic 3 in my Dropbox somewhere.

看過來! (Look over here!)
"If you want utopia but reality gives you Lovecraft, you don't give up, you carve your utopia out of the corpses of dead gods."
MaxAngor wrote:
    George... Don't do that.

2015-05-31 23:44:59

So wait, is this actually a feasible way to play a game?

I'm honestly skeptical and think that the best thing would be to adapt the interface with proper audio indicators and such, but if that idea or a variant of it (e.g. using panning to indicate horizontal position) can yield somewhat usable results, I may consider giving it a quick try as a stopgap until I can implement a full-blown audio mode in my game.

Also I wouldn't rule out 2D games completely, but any attempt to do this with them will definitely require changing the way things are rendered (e.g. no background, replacing sprites with detail-less boxes representing the object type, possibly reducing the viewport to allow things to be larger within the limited resolution, etc.).

2015-06-01 02:30:43

That depends, but I think I should go into a bit more detail on how it works.

The vOICe software can only convert images into black and white, even if the original image is in color. The time at which a sound plays represents the X axis of the image: a pixel on the far left of the picture sounds at the very beginning of the sweep. The pitch of the sound represents the Y axis: a very high pitch means the pixel is at the top of the image, a low pitch at the bottom. The brightness of each pixel is represented by volume: the louder the sound, the brighter the pixel; the quieter, the darker.
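
To make that mapping concrete, here's a minimal sketch of the idea in Python with Numpy. This is my own illustration of the principle, not Meijer's actual code; the sweep length and frequency range are placeholder values:

    import numpy as np

    def sonify(image, sweep=1.05, rate=44100, f_low=500.0, f_high=5000.0):
        # image: 2D grayscale array, values 0..255, row 0 = top of the picture
        height, width = image.shape
        col_len = int(sweep * rate / width)          # samples per image column
        freqs = np.linspace(f_high, f_low, height)   # top row = highest pitch
        out = []
        for x in range(width):                       # left to right = time
            t = (np.arange(col_len) + x * col_len) / float(rate)
            tones = np.sin(2 * np.pi * freqs[:, None] * t)
            amps = image[:, x, None] / 255.0         # brightness = volume
            out.append((amps * tones).sum(axis=0))
        wave = np.concatenate(out)
        return wave / max(1e-9, np.abs(wave).max())  # normalize to -1..1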

Now, moving through a real-life 3D space with shadows and lighting creates contrast and provides a sense of depth, allowing you to more easily make out shapes and determine the distance between them by how loud or quiet parts of the sound are, when they play, and what their pitch is. In sighted 3D first person games it's less effective because a lot of them have crappy lighting, it often being just as bright down a hallway as it is right next to you, among other details, so when converting the image to black and white and then into sound you get little contrast and can lose your sense of depth and positioning.

It's often worse in 2D games, which often have no shadows and use a lot of bright colors, making it very difficult to make out what's going on.

This doesn't necessarily mean it's impossible to make 2D and 3D video games using The vOICe, but I do think it takes a very different approach compared to sighted games.

@sik
This is the exact approach I've taken with my own 2D engine: I've rendered the background black, the terrain dark grey, and the player as a white block, to provide contrast and make it easier for the player to know where they are in the environment, combined with positional feedback sounds for walking, bumping into obstacles, and jumping. I've also tried embedding the audio renderer into the game itself, and I want to try incorporating different musical instruments to represent colors when sonifying each screen, to help identify different kinds of objects and opponents, since the vOICe can only do black and white.
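
For illustration, the contrast trick boils down to something like this (hypothetical tile ids and shades, not my engine's actual values):

    import numpy as np

    # Brightness doubles as loudness in the soundscape, so the player
    # block stands out against the terrain and empty space.
    PALETTE = {0: 0,     # background: black, silent
               1: 64,    # terrain: dark grey, quiet
               2: 255}   # player: white, loud

    def contrast_frame(tile_map):
        # tile_map: 2D array of tile ids -> grayscale frame for sonification
        frame = np.zeros(tile_map.shape, dtype=np.uint8)
        for tile_id, shade in PALETTE.items():
            frame[tile_map == tile_id] = shade
        return frame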

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2015-06-01 10:02:20

I made a rather different image-to-sound program, whose biggest disadvantage compared to the vOICe is that it doesn't do the whole image at once (I meant to add this feature, but never got around to it). In its current form, it lets you move a cursor around (currently via the arrow keys, but a mouse or touch screen would work as well in theory), and not only does it play a sound representing the color at that position, it also ray-scans for the nearest areas of a different color, to give a sense of what is nearby. (It does have a means of detecting shading, also.)
I found that most game images are too complicated to get much out of them using sound, this way or with the vOICe. I tried it on screenshots from Sonic 3 (way too much going on in the background), and even on sprite sheets from Streets of Rage (I could find individual sprites, but couldn't make sense of what positions they were supposed to be in).
It does include a method of converting images to braille, but I couldn't get my braille display to work in order to test it.
I don't see it playing well with the vOICe in its current form, but now I'm wondering if it could be adapted to work together with something similar.
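
The ray-scanning step amounts to something like the following sketch (illustrative names and an assumed eight-direction scan, not the program's actual code):

    import numpy as np

    # Step outward from the cursor in eight directions until a pixel of a
    # different color id turns up; report distance and color per direction.
    DIRECTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1),
                  (-1, -1), (-1, 1), (1, -1), (1, 1)]

    def scan_rays(image, cx, cy, max_dist=64):
        # image: 2D array of color ids; (cx, cy) is the cursor position
        h, w = image.shape
        here = image[cy, cx]
        hits = {}
        for dx, dy in DIRECTIONS:
            for dist in range(1, max_dist):
                x, y = cx + dx * dist, cy + dy * dist
                if not (0 <= x < w and 0 <= y < h):
                    break                        # ray left the image
                if image[y, x] != here:
                    hits[(dx, dy)] = (dist, image[y, x])
                    break
        return hits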

看過來! (Look over here!)
"If you want utopia but reality gives you Lovecraft, you don't give up, you carve your utopia out of the corpses of dead gods."
MaxAngor wrote:
    George... Don't do that.

2015-06-02 01:00:02

@CAE_Jones
Hmm, I think you could try adapting your program to make a match-3 game like Candy Crush, since you can tell what color you're on and what colors are adjacent to it; you then press a button to swap the colors and try to get three squares of the same color in a row. You also don't really need to know what's happening on the rest of the screen until you pan over it, and it would work well on a tablet. Or you could potentially adapt it into a maze game, where you have to trace your way along lines of a particular color to get from start to finish, maybe even something like Pac-Man perhaps. These ideas may depend on whether you can tell the positions of adjacent squares relative to your current one; if you can, there could be other potential applications for it as well.

Color in audio games, I think, is best used as information, which is why I want to incorporate it. To give an example, let's say the color red is represented by the sound of a trumpet, the color blue by a piano, and the color green by a drum. Now let's say you're playing a side scrolling game and you hear a trumpet off to the right; that represents a powerup. Hear a drum further behind it? That's an enemy. Hear a piano off to your left? That's a door. Instead of associating each sound with a particular color, you associate the sound with a particular effect or object; the color in this case is purely arbitrary and used by the game itself for data and tracking purposes. The players never know what color the sound actually represents, and they don't need to. Audio games already do these sorts of things to a degree with 3D positional audio and representational sound effects; the difference is that this method maps those effects precisely to a position on the screen with greater accuracy.
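
As a rough sketch of what I mean, with crude waveform stand-ins for the instruments (the mapping and tones here are made up purely to illustrate the color-to-timbre idea):

    import numpy as np

    RATE = 44100

    def trumpet(freq, t):   # harmonic-rich tone standing in for "red"
        return sum(np.sin(2 * np.pi * freq * k * t) / k for k in (1, 2, 3, 4))

    def piano(freq, t):     # decaying sine standing in for "blue"
        return np.sin(2 * np.pi * freq * t) * np.exp(-3.0 * t)

    def drum(freq, t):      # noise burst standing in for "green"
        return np.random.uniform(-1, 1, t.shape) * np.exp(-8.0 * t)

    # The color is just a data tag; players only ever learn the timbre.
    INSTRUMENT_FOR_COLOR = {'red': trumpet, 'blue': piano, 'green': drum}

    def object_tone(color, freq=880.0, duration=0.25):
        t = np.arange(int(RATE * duration)) / float(RATE)
        return INSTRUMENT_FOR_COLOR[color](freq, t)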

This has a lot of practical applications. For example, in theory someone could make a top down real time strategy game like Dune 2 or Command & Conquer with different factions represented by different solid colors; by using the above method, a non-sighted player could tell in a single sweep where all the units are on the screen and who they belong to, because of the specific sound of each faction's color.

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2015-06-02 09:40:15

@magurp244: How practical would it be, though, in the use case you describe, for a strategic map view of units in an RTS, for example? The amount of resolution that sound can provide will always be less than what a sighted person can see.

2015-06-02 17:34:37

The real difficulty seems to be two things: information overload (sound is nowhere near as parallelizable as sight), and distinguishing signal from noise. In a room filled with people, it's easy to pick out an individual face if one has perfect vision, but even with perfect hearing, picking out an individual voice when everyone is speaking is extremely difficult.
The vOICe's solution is to organize data temporally: instead of getting everything at once, you get a column at a time. It can still be messy, but once everything plays on a predictable interval, it becomes easier to search for the desired information.
I'm still not sure how well this would work on an RTS scale, though.
The vOICe comes with a Tic-tac-toe game, which is very easy to pick up. Would something similar work for, say, Chess? Or even just Checkers?
I have no idea; it seems like trying to search columns of 8 (well, Checkers is more like staggered columns of 4) would be much harder.

Actually, that's probably worth an actual experiment: try to determine how much information can fit into a single column and still be usable (bonus points for speed and a decreased learning curve). It's probably possible to design a game based around that.
(I would consider doing it myself if I felt like setting up an online scoreboard to track the results. sad )

看過來! (Look over here!)
"If you want utopia but reality gives you Lovecraft, you don't give up, you carve your utopia out of the corpses of dead gods."
MaxAngor wrote:
    George... Don't do that.

2015-06-03 07:46:12 (edited by magurp244 2015-06-25 09:02:42)

@Victorious
I can't really say how practical it would be, since this has never really been done before. There are all sorts of challenges and potential ways to go about it; I think a game similar to Dune 2 may be possible, but not without some sort of color system in place to make it easier to differentiate between factions and objects.

As for resolution, let's say you have a sighted game running at 640 by 480. Typically, sprites in such a game would be around 32 by 32, or even 64 by 64 pixels, so players can visually read them properly. That means the maximum number of tiles players could see at once would be 20 by 15, or 300 tiles; for 64 by 64 sprites it would be 10 by 7.5, or 75 tiles. Players also typically spend most of their time focusing on only a handful of those: their character, enemies, powerups, etc.

In a non-sighted game we could use solid single-color blocks to symbolically represent an entire sprite or object as a single tone. This means that unlike sighted games, you don't need sprites to be 32 by 32 or 64 by 64; it's all the same. The only difference would be that a larger block plays longer and over a wider pitch range; the tone and what it represents wouldn't change. This means you could potentially get away with 4 by 4 or 8 by 8 pixel blocks and still convey a similar amount of relevant information over a smaller area.
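
A quick sketch of that downscaling step, assuming a Numpy grayscale frame (the block size is arbitrary):

    import numpy as np

    def downscale_to_blocks(frame, block=8):
        # Average each block x block region into one pixel, so a whole
        # sprite collapses into a single tone for the audio renderer.
        h, w = frame.shape
        h, w = h - h % block, w - w % block      # trim to a multiple of block
        cells = frame[:h, :w].reshape(h // block, block, w // block, block)
        return cells.mean(axis=(1, 3)).astype(np.uint8)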

@CAE_Jones
If nothing else, I think it would be interesting to explore this, even if it doesn't work out. I created a mockup image of a chess board with 32 pieces, white blocks on top, grey blocks on the bottom. Running the vOICe in very slow motion seemed to work alright for making out where each piece was. If it were an actual game, the player could put their mouse over a piece, or move a selection box around, and it could tell them what the currently selected piece was to help guide them. I might try throwing together a functional prototype and see how it goes.


In other news, I've uploaded two prototypes I've been working on, and I'd be interested to see what everyone thinks. Unfortunately it seems Google Drive is flagging one of them, sigh. Anyway, you can download the other one as a win32 binary here:

EDIT: (moved, see later posts for Bitbucket repository)

The first is a 2D side scrolling engine I wrote, and it's going a lot better than the other. It has a built-in audio renderer and I've fixed some performance issues, but it's stuck at a single speed for now. If you want to use the vOICe with it for faster rendering, press TAB in game to disable the internal audio renderer.

For controls:
Left and Right arrow keys to move Left and Right
Up arrow for jumping
ESC to quit

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2015-06-03 17:54:23

I think this is relevant to this discussion.

Yes, I just went ahead and gave it a try, after a few tweaks to the rendering engine to remove as much superfluous information as possible (during the early tests it resembled CrazyBus). It's preliminary work and needs tweaking (as well as then adapting the rest of the interface), but it's something.

2015-06-04 05:30:25

@Sik
Hmm, that's interesting. I completed your demo and it got me thinking; I hadn't quite realized how effective audio indicators might be. Though I do think audio rendering could still add to the input and help make a more intuitive experience, especially for terrain or barriers. I'm going to try a few additional experiments in my 2D engine with this.

Do you plan on implementing an audio renderer? Or adjusting the textures to make it compatible with the vOICe externally?

My 2D engine was coded with Python 2.7.9, Pyglet 1.2.3b1, and Numpy 1.9.2 for performance. How it works: I wait for a soundscape to finish playing (or check whether one has been rendered yet), then on the next draw call I resize the screen using glScalef() to the target resolution, in this case 64 by 64, draw the scene, grab a section of the frame buffer and pass it to my audio rendering class, then resize the screen back and do another rendering pass. So technically I can draw a soundscape for non-sighted players and a full visual sprite scene for sighted players simultaneously.
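
In rough outline, the draw call works something like this; a simplified sketch against Pyglet 1.2-era APIs, where window, draw_scene, and the audio_renderer interface are placeholders rather than the engine's actual names:

    import pyglet
    from pyglet.gl import glPushMatrix, glPopMatrix, glScalef

    TARGET = 64  # sonification resolution

    def on_draw():
        window.clear()
        if audio_renderer.idle():                # last soundscape finished
            glPushMatrix()
            glScalef(TARGET / float(window.width),
                     TARGET / float(window.height), 1.0)
            draw_scene()                         # tiny pass for the ears
            glPopMatrix()
            buf = pyglet.image.get_buffer_manager().get_color_buffer()
            frame = buf.get_region(0, 0, TARGET, TARGET).get_image_data()
            audio_renderer.submit(frame)         # hand the pixels off
            window.clear()
        draw_scene()                             # full-size pass for the eyes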

The audio renderer then decodes the image into a uint8 byte array, strips out unused color/alpha data, and uses Numpy arrays to do the waveform calculations. I optimized it in such a way that it builds the wave incrementally each cycle to maintain performance, but you can't multiply unevenly-sized Numpy arrays with each other, and some of the Numpy functions don't take floating point, so I fudged some of the array sizes for the time being, which is why it's stuck at one speed. After the audio data is processed, it's passed off to a modified version of one of Pyglet's procedural audio generator classes for output.
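
The decoding step is essentially the following, assuming RGBA bytes as returned by Pyglet's ImageData.get_data() (the function name is mine):

    import numpy as np

    def luminance(raw_rgba, width, height):
        pixels = np.frombuffer(raw_rgba, dtype=np.uint8)
        pixels = pixels.reshape(height, width, 4)[:, :, :3]  # strip alpha
        gray = pixels.mean(axis=2).astype(np.uint8)
        return gray[::-1]   # OpenGL rows come bottom-up; flip to top-down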

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2015-06-04 16:28:37

magurp244 wrote:

Do you plan on implementing an audio renderer? Or adjusting the textures to make it compatible with the vOICe externally?

I have the full-blown audio renderer already implemented.

First, the game renders things in a completely different way: the level gets turned into a solid white silhouette, while objects get turned into colored blocks (with separate colors for the player, goodies (items), and dangers (enemies, hazards)). Then this gets filtered so that only the lines at the top and bottom of each color area are stored (otherwise it becomes a mess), and the result is processed as follows:

Horizontal position: panning
Vertical position: frequency (pitch)
Color: waveform (how it sounds)

Note how, unlike the vOICe, I'm not using time as a factor, so I can update this at 60 FPS without problems. So far I've only got one person to test, but the little testing done so far seems to suggest that yes, the idea works (and it's now in the process of being tweaked).
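
In rough Python terms (the actual game isn't written in Python; the waveform choices and frequency range here are simplified stand-ins), one frame of this boils down to something like:

    import numpy as np

    RATE = 44100
    WAVEFORMS = {
        'player': lambda ph: np.sin(2 * np.pi * ph),           # pure tone
        'goodie': lambda ph: np.sign(np.sin(2 * np.pi * ph)),  # square
        'danger': lambda ph: 2.0 * (ph % 1.0) - 1.0,           # harsh saw
    }

    def mix_frame(blocks, frame_len=RATE // 60, f_low=200.0, f_high=2000.0):
        # blocks: list of (x, y, kind), with x and y normalized to 0..1
        t = np.arange(frame_len) / float(RATE)
        left = np.zeros(frame_len)
        right = np.zeros(frame_len)
        for x, y, kind in blocks:
            freq = f_high - y * (f_high - f_low)   # top of screen = high pitch
            voice = WAVEFORMS[kind](freq * t)
            left += voice * (1.0 - x)              # horizontal position = pan
            right += voice * x
        return np.vstack([left, right])            # one 60 FPS stereo frame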

If you want to test it, just follow the contact instructions and I'll send you what you need over e-mail. (I'd have uploaded a recording of how it sounds, except for the part where the recording code still isn't set up to record this output.)

2015-06-05 07:38:31

@Sik
What, really? That's great!

When you say that only the top and bottom lines of each color area are stored, does that mean that, for a 64 by 64 image for example, you only calculate the top and bottom rows' pitches into the waveform and ignore everything in between, like rendering an outline? Or do you include the pitches between the two lines in the audio sample for the whole solid block? I also take it you don't scale the image in any way before processing it?

Also, what do you mean by not using time as a factor? That instead of using a sweep from left to right to determine position along the X axis via time, you're using stereo positioning on the left and right with a high speed sweep?

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2015-06-05 08:41:14

@Sik: could you upload the recording and provide a link? I'd be really interested in listening to it.

2015-06-05 16:32:58 (edited by Sik 2015-06-05 16:34:36)

magurp244 wrote:

When you say that only the top and bottom lines of each color area are stored, does that mean that, for a 64 by 64 image for example, you only calculate the top and bottom rows' pitches into the waveform and ignore everything in between, like rendering an outline?

Er, it's for each block (e.g. a player or an enemy), but yeah. If I include everything in between, then all of those pitches play, which makes it an unbearable mess to hear and renders the waveforms indistinguishable from each other because of how much they get distorted by all the mixing. You only need to know the covered range, so I'm only using the top and bottom rows instead.

magurp244 wrote:

I also take it you don't scale the image in any way before processing it?

Actually it is scaled, from 320×200 to 40×25.

Note that replacing the graphics with the color indicators happens earlier (in fact, the graphics are never rendered; the colored blocks are rendered directly in their place). Then this is shrunk down, then filtered so only the top and bottom rows remain (as mentioned earlier), then finally converted into audio.

Note that the scaling is just to make things easier, so I don't have to go all over the code changing the resolution it expects (I'm literally adapting a full-blown finished game for the sighted into an audio game, remember).

magurp244 wrote:

Also, what do you mean by not using time as a factor? That instead of using a sweep from left to right to determine position along the X axis via time, you're using stereo positioning on the left and right with a high speed sweep?

Yep, except there's no sweep at all; it just mixes everything together (remember the entire image is processed at once).

Victorious wrote:

@Sik: could you upload the recording and provide a link? I'd be really interesting in listening to it.

That's precisely the part I'm having trouble with tongue (the recording system has to replay every single sound in order to account for the framerate loss, and that means figuring out a reasonable way to log the new output first)

2015-06-06 07:57:20

@Sik
Ahh, yes, that makes sense, since we're working strictly with square blocks instead of abstract or 3D shapes.

I like your stereo implementation, but there may be a few things to consider. I think the vOICe does have stereo panning support, and I know Meijer has hi-fi stereo code on his site that I've only just started to reverse engineer. His code sample has source for both mono and stereo output, and I think he may have implemented it this way because some systems may not have stereo support. But even on systems that do, some non-sighted people may not be able to fully perceive stereo because of partial or total hearing loss in one or both ears, cochlear implants, hearing aids, etc. So, for example, someone with partial hearing loss could end up with perceptual gaps in the rendered scene under a stereo setup, while in the mono case it may be difficult to orient without a point of reference.

It's also difficult to say how using different waveforms may affect accessibility given the above issues, since I don't think the vOICe has ever been tested with these parameters. Either way, getting a wider test group across different hearing ranges and hardware will be important to determine whether any of these cause problems.

For reference, there is another program based on the vOICe that uses color encoding, called EyeMusic, developed by Amir Amedi (http://brain.huji.ac.il/site/em.html), which creates soundscapes with musical notes instead of tones. There's not a lot to go on other than a video on his YouTube channel and some research papers, though it can be bought on iTunes.

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2015-06-15 01:07:59

OK, the recording code is trolling me, because it decided to start working out of nowhere:
http://sik.titandemo.de/.junk/sonar.ogg

Later I'll do a proper playthrough recording, and I need to retweak the volume levels and such, but for now this is what is going on.

2015-06-15 08:57:09

Wow, that kind of sounds like a LucasArts adventure, hehe. But yeah, the audio seems a bit harsh around the 2 and 4 second marks.

It's hard to tell what's happening without a bit more context, though I can tell you're jumping twice. Looking forward to your playthrough.

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2015-06-15 18:20:33

Yeah, there's a pit there if I recall correctly (dangerous stuff has a harsh sound).

The most important part is whether you can tell where there's a wall and such, although I think that with the latest tweaks it became too weak... Also, I need to figure out a way to prevent the sound from being overloaded when there's a lot of the same stuff together (as happens with that pit).

2015-06-16 08:04:21 (edited by magurp244 2015-06-16 08:08:25)

I've noticed that problem in some of my builds too; it seems to be the signal-to-noise issue CAE_Jones mentioned. It gets too hard to distinguish between competing sounds in close proximity, which could prove problematic when using different waveforms/layers in a single soundscape.

Hmmm... I think I have an idea that might help. I've recently been working on proximity audio with conventional audio cues that work alongside the soundscape instead of in it. So, for example, the soundscape sonifies the terrain, but when the player gets close to, say, a body of water, the closer the player is, the louder the sound of the water; the further away, the quieter.

You could try making object volume proximity-based relative to the player's position, or proximity-based relative to other objects, so a cluster reduces its collective volume based on adjacent objects down to a more readable level. When processing your image data for audio rendering, you could do this by adjusting the brightness levels of the pixel data.
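
Here's a sketch of that second idea, dimming the pixel data before sonification wherever bright pixels cluster (the threshold, radius, and floor values are arbitrary):

    import numpy as np

    def attenuate_clusters(frame, radius=2, floor=0.4):
        # frame: 2D grayscale array, 0..255. Count bright neighbours around
        # each pixel (np.roll wraps at the edges; ignored for brevity).
        bright = (frame > 128).astype(float)
        neighbours = np.zeros_like(bright)
        for dy in range(-radius, radius + 1):
            for dx in range(-radius, radius + 1):
                if dx or dy:
                    neighbours += np.roll(np.roll(bright, dy, 0), dx, 1)
        # More bright neighbours -> quieter pixel, clamped at `floor`.
        scale = np.maximum(floor,
                           1.0 - neighbours / max(1.0, neighbours.max()))
        return (frame * scale).astype(np.uint8)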

-BrushTone v1.3.3: Accessible Paint Tool
-AudiMesh3D v1.0.0: Accessible 3D Model Viewer

2015-06-16 17:04:35 (edited by Sik 2015-06-16 18:42:54)

The game already does the volume thing in both directions (horizontally as a result of panning, and vertically to mimic the same effect). The problem is that the pit has too many objects and is indeed close by tongue Also, at the last minute I had lowered the volume of the level sound because I thought it was too loud, but I think I may have overshot that one...

Here's another attempt making only the corners of objects visible instead of their whole silhouette (also restoring the original volume for the level):
http://sik.titandemo.de/.junk/sonar2.ogg

EDIT: it also doesn't help that the viewport is now rather large. It used to be smaller so I could reduce the amount of stuff being "shown", and apparently that was better for moving around, but the problem is that it made enemies too hard to notice (and unlike zoomed mode, it's harder to tell when an enemy is about to come).

EDIT 2: great, I just realized the panning wasn't being recorded at all. Looking into this...