Archive for User Interface

Positional Sound in User Interfaces

Video games are at the forefront of the kinds of rich interactions people can have with computers. In the past decade, there’s been a push for more and more immersive virtual environments, resulting in more advanced APIs and hardware to provide things such as super-fast 3D rendering. In recent years, OS X has leveraged these advances in the predominantly 2D world of user interfaces, often in brilliant ways, as seen with QuartzGL, CoreAnimation and CoreImage.

In video games, it’s quite common to exploit stereo output or, even better, surround sound to provide positional audio cues. Just as graphics can simulate a 3D space, sounds can be placed positionally in the same space. If you, super-genetically-modified-mutant-soldier, are running around on the virtual battlefield and there is some big-bad-alien-Nazi-demon-zombie dude shooting at you from the side, you will hear it coming from that direction and react accordingly. Directional audio cues can supplement visual cues or even supplant them when visual ones cannot be shown (e.g. something requiring attention outside your field of view).

On OS X, sound is used rather sparingly in the interface, which is probably a good thing. But for those cases where its use is warranted, why not take advantage of the technology available? Just as animation can be used to guide the user’s focus, why not sound? OS X does ship with OpenAL, which is to sound what OpenGL is to graphics, providing a way to render sounds in a 3D space.

I’ve put together a quick proof of concept app (download link near the end of the article). Move the window around the screen and click the button to make a sound. Based on the window’s position, the sound will appear to come from different directions. For the most part, that means left or right, since most sound output systems are not designed to articulate things in the up/down direction. The program itself basically maps the window position to a point in the 3D sound space. Right now, it doesn’t really use the z-axis (the axis that goes into your screen), but conceivably you could do things like make the sound appear farther away based on window ordering. Try using headphones if the effect is not as apparent using speakers.
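To make the mapping concrete, here is a minimal sketch in Python of one way a window position could be turned into a point in the sound space. This is not the demo's actual code. The Rect type, the function names and the normalization to the range -1 to 1 are all assumptions made for illustration. The only convention borrowed from OpenAL is that the default listener sits at the origin looking down the negative z-axis, so a source in front of the listener has a negative z value.

```python
from typing import NamedTuple, Tuple


class Rect(NamedTuple):
    """A rectangle in screen coordinates (hypothetical helper type)."""
    x: float
    y: float
    width: float
    height: float


def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the range [low, high]."""
    return max(low, min(high, value))


def window_to_sound_position(window: Rect, screen: Rect,
                             depth: float = 1.0) -> Tuple[float, float, float]:
    """Map the center of a window to an (x, y, z) point around the listener.

    x and y are normalized so the screen edges land on -1 and 1, with the
    screen center at 0. z is fixed at -depth, i.e. in front of a listener
    that faces down the negative z-axis.
    """
    center_x = window.x + window.width / 2
    center_y = window.y + window.height / 2
    x = clamp(2 * (center_x - screen.x) / screen.width - 1, -1.0, 1.0)
    y = clamp(2 * (center_y - screen.y) / screen.height - 1, -1.0, 1.0)
    return (x, y, -depth)


if __name__ == "__main__":
    screen = Rect(0, 0, 1440, 900)
    # A window hugging the left edge ends up with a negative x.
    print(window_to_sound_position(Rect(0, 300, 400, 300), screen))
```

The resulting triple is the sort of value one would hand to the audio API as the source position. Varying depth with window ordering would be one way to experiment with the z-axis idea.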

There is a significant technical issue, though. You can’t really know the actual physical dimensions and layout of a user’s screens. In addition, the position of the speakers relative to the screens is also not known. While you can get screen resolutions and relative positions of the screens, these are mostly hints at the actual layout. In my demo program, it is assumed that the screens are relatively close to each other forming one gigantic screen. It is also assumed that the speakers produce a soundstage roughly centered on the primary display (the one with the menubar). It assumes a model like this (the circle is the user and the thin slabs are the monitors, from a top-down view):

screen-setup1.png

In reality, it’s probably more likely the user would have a setup like the following:

screen-setup2.png

But who knows, it could possibly be something like this:

screen-setup3.png

The point here is that the effectiveness of this depends on the user’s setup. A particular idealized model would have to be chosen that hopefully works well enough for most people. While pinpoint accuracy is not really feasible, it probably isn’t required either. Human hearing is imprecise; otherwise ventriloquists would never be able to pick up a paycheck. Just an indication of left, right or center is probably enough for these purposes.
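As a rough illustration of the first idealized model (all screens treated as one big surface, with the soundstage centered on the primary display), the horizontal placement could be computed along these lines. This is a sketch under assumptions, not the demo's code: the function names, the (x origin, width) screen tuples, and the 0.25 dead zone are all made up for the example.

```python
from typing import List, Tuple

# Each screen is described by (x origin, width) in global screen coordinates.
Screen = Tuple[float, float]


def horizontal_pan(window_center_x: float, screens: List[Screen],
                   primary_index: int = 0) -> float:
    """Return a pan value in [-1, 1], where 0 is the center of the primary display.

    The screens are treated as one continuous surface. The distance from the
    primary display's center to the farthest outer edge maps to +/-1.
    """
    primary_x, primary_width = screens[primary_index]
    center = primary_x + primary_width / 2
    left_edge = min(x for x, _ in screens)
    right_edge = max(x + width for x, width in screens)
    reach = max(center - left_edge, right_edge - center)
    pan = (window_center_x - center) / reach
    return max(-1.0, min(1.0, pan))


def coarse_direction(pan: float, dead_zone: float = 0.25) -> str:
    """Reduce a pan value to 'left', 'center' or 'right'."""
    if pan < -dead_zone:
        return "left"
    if pan > dead_zone:
        return "right"
    return "center"


if __name__ == "__main__":
    # Primary display at x = 0, plus a second display to its left.
    screens = [(0.0, 1440.0), (-1280.0, 1280.0)]
    print(coarse_direction(horizontal_pan(-640.0, screens)))
```

Since only a left, center or right indication is needed, the coarse bucketing may be all an application uses. The continuous pan value is there for setups where finer placement turns out to be audible.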

Where would this be useful? Well, this all came up yesterday when I received an IM (via Adium). I had my IM windows split up across two screens so I had to scan around a bit to find out which window had the new message. Though the window was on the screen to the left, the audio alert made me look at the main screen since the sound was centered straight ahead. It would be great to see an idea like this implemented in Adium and I’ve filed a feature request with them for their consideration. It’s ticket #11292 so you don’t go and submit a duplicate request.

It would be interesting to see more use of this in user interfaces out there. I don’t want to encourage people to add sounds to their apps if they weren’t already using them but for those that are, it’s something to consider. Overall, the effect is quite subtle but with some tweaking, it can be quite effective.

The link to download the demo program is below. Sorry, no source is provided this time. The code is a hacked-together mess of stuff copied and pasted from an Apple example, as I have never used OpenAL before. This can probably also be implemented in CoreAudio by adjusting the balance between the channels. If you are considering implementing something like this, email me and I’d be happy to discuss details as long as they don’t involve audio APIs since, well, I don’t know them particularly well.

Download PositionalAudioAlertTest.zip (Leopard only)
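For anyone curious about the channel-balance alternative mentioned above, one common way to turn a left/right position into per-channel volumes is a constant-power pan law. The sketch below is an assumption about how that could look, not something taken from the demo or from CoreAudio. The function name is hypothetical, and pan is assumed to run from -1 (fully left) to 1 (fully right).

```python
import math
from typing import Tuple


def constant_power_gains(pan: float) -> Tuple[float, float]:
    """Return (left_gain, right_gain) for a pan value in [-1, 1].

    Uses a constant-power pan law: the squares of the two gains always
    sum to 1, so the perceived loudness stays roughly even as the sound
    moves between the channels.
    """
    pan = max(-1.0, min(1.0, pan))
    angle = (pan + 1.0) * math.pi / 4.0  # 0 at full left, pi/2 at full right
    return (math.cos(angle), math.sin(angle))


if __name__ == "__main__":
    left, right = constant_power_gains(-0.5)
    print(left, right)
```

The two gains would then be applied as the volumes of the left and right output channels, using whatever mechanism the audio API provides for that.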

Thanks to Mike Ashe and Chris Liscio for advice on CoreAudio, which I ended up not needing: Daniel Jalkut suggested I use OpenAL instead, which made things easier.


The Invisible Interface

This is something that I’ve thought about for some time, so I thought I’d write a series on the topic of invisible interfaces. What is the invisible interface? When people think of a user interface, they think of something visual made up of windows and widgets. Even for a command-line program, it’s the arguments, the output and the error messages. But what many people aren’t aware of are the choices the designer made and the logic the programmer coded that makes decisions for you. An interface encompasses not only what the developer put into it, but also what the developer specifically kept out. This benefits the user in a number of ways: a less cluttered interface, a simpler interaction paradigm and fewer steps to accomplish a task. Many of these things are too subtle to be noticed normally, which is the beauty of it. Sometimes the best interface is the one you never know is there.

Let’s take for example the flush toilet. Yes, sorry if this example is a bit disgusting but I’ll try and keep it clean and it is a fitting example. Just bear with me here. So, where were we? Ah yes, the toilet. Simple interface. Push down on the lever, water is flushed down and it stops and refills the tank ready for the next flush. It doesn’t get much simpler than that (well, it can but more on that below). Notice how you don’t have to stop the flush. If the toilet is calibrated properly, it should have enough water to flush down whatever you may put in there.

Of course, from a performance/efficiency standpoint, it’s not optimal. You are flushing the same amount of water each time, whether there is liquid or solid matter to be disposed of. How does one work this new requirement (the need to save water) into the interface? In Europe (I have yet to see them stateside), there are toilets with a split button. Hit one side and a lesser amount of water is flushed whereas hitting the other side flushes down a full measure. There are usually markings to indicate 1 or 2 (one or two dots is what I’ve seen) so you can figure out which one is which. Now, the interface has become more complicated. Yes, in the grand scheme of things, it’s not rocket science, but humor me. Now a decision has been added. Do I hit the 1 or 2 button? The user is now required to give the device more information than they had to before. The question is, is complicating the interface worth the functional gain and also, is there a way to effect the same result without changing the interface at all?

How about auto-detecting the amount of water needed? Not only does this optimize the efficiency of the device, it also takes away a decision. Now, of course, whether this can be practically done is in question. It is unclear whether the technology to do this reliably exists and there are also issues of manufacturing, cost and maintenance that play into it. But the point is that from a pure interface standpoint, it would seem to be a better solution. It meets the new requirement while retaining the one button simplicity from before.

And to take it even further, it could sense when a flush is needed, eliminating the need for the button altogether. While these types of toilets are becoming more common in public restrooms, I haven’t heard of any demand for these in the home. Here, it’s possible that automatically doing something on the user’s behalf becomes unwelcome. I can imagine in your smaller bathroom at home, you are likely to trigger it accidentally by walking by it, which can be startling. In a public setting, you probably don’t care if toilets are firing off left and right like the cannons in the 1812 Overture. On one hand, it could be just an issue of implementation; maybe the technology just isn’t accurate enough. On the other hand, it’s very possible that this is a feature (when to flush) that the user wants control over. Either way, it’s an issue that the designer must grapple with.

The point of all this is that there is some room for improvement in terms of simplifying interfaces when one strives to have their program/device do more for the user. The more your program does, the less the user has to. But one can also overstep their bounds to create something that may be seen as intrusive. It’s about defining the balance between what the user does and what the machine does, with an eye towards putting more on the machine’s side.

As I mentioned in the beginning, I’m intending this to be the start of a series. Don’t expect some well-thought-out arc with this; it will probably just be an occasional article here and there. While part of me wouldn’t mind writing more about toilets (I haven’t even touched upon those wacky Japanese toilets [1]), in the next installments, I’ll try and come up with examples more relevant to computer/human interaction.

[1] Warning: linked page features bare asses
