I'm going to talk about a few different aspects of audio in games; there is a surprising amount to cover.
Audio is state
Most naive implementations simply provide a facility to play (via a command or function) a single audio file, or maybe several audio files simultaneously.
playSound("mysound.ogg");
Once started, the sound (or sounds) will play until finished and then stop. For trivial games such as tic-tac-toe, Minesweeper, or maybe even Tetris, this is fine; but for RPGs, adventure games, FPSs, or anything else non-trivial it won't fly, and using it won't impress anyone.
Why not?
Well, because there is a severe lack of control. It's the graphical equivalent of only being able to draw a maximum of 20 white squares on a black background: you'll see something, but it is horribly uninteresting. So, what kind of control do we need? First off, it would be nice if sounds could loop repeatedly, which is very handy for background music or ambient audio... but then you would need a way to stop a looping sound... hmm... and for that you would need a way to reference a playing sound. It all gets complicated very quickly, but fear not, I can demystify it.
For developers who get past the one-shot sound playing commands, chances are they end up with something where you can play multiple sounds, specify them as one-shot or looping, and each sound played has an alias so you can reference it to stop it later. This is certainly an improvement but it still falls short.
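A minimal sketch of that kind of alias-based interface, in Python. The `SimpleMixer` class and its method names are hypothetical, invented for illustration; nothing is decoded or sent to the sound card, it only tracks which aliases are active.

```python
class SimpleMixer:
    """Hypothetical alias-based mixer: tracks what is playing, renders nothing."""

    def __init__(self):
        self.playing = {}  # alias -> (filename, loop)

    def play(self, alias, filename, loop=False):
        # Starting a sound under an existing alias replaces the old one.
        self.playing[alias] = (filename, loop)

    def stop(self, alias):
        # Stopping an unknown alias is a harmless no-op.
        self.playing.pop(alias, None)

    def is_playing(self, alias):
        return alias in self.playing


mixer = SimpleMixer()
mixer.play("music", "theme.ogg", loop=True)  # looping background music
mixer.play("door", "creak.ogg")              # one-shot effect
mixer.stop("music")                          # the alias lets us stop it later
```

Note that nothing here knows who or what is making the sound, which is exactly the shortcoming discussed next.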
Disembodied Audio
In some games it is quite fun to hear the sounds of spirits from beyond. However, if your finely crafted NPC is whistling a merry tune and then walks off screen, chances are his haunting whistling will remain though he has gone.
Audio is state - revisited
Sound is never disembodied; something always causes it, an 'emitter' of sound. The process of playing audio on a computer doesn't require this, though: a computer 'renders' sound but does not own the sound emissions. Sound familiar? For many it will echo the realization that our graphics are not our characters, but rather a rendering of the character data.
Sounds should be members of your game state. For each object that lives in your game state, be it The World, a Room, or an Actor, the game should know what sounds it could potentially play (as constant data); most importantly, the state should know whether any of those sounds are playing.
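As a rough illustration, sounds-as-state could be modeled like this in Python. `SoundDef` and `GameObject` are hypothetical names chosen for this sketch, not Selenite's actual types: the definitions are constant data, and the set of playing names is the part that lives in state.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class SoundDef:
    """Constant data: a sound an object could potentially play."""
    name: str
    filename: str
    loop: bool = False


@dataclass
class GameObject:
    """Any state object: the world, a room, or an actor."""
    name: str
    sounds: dict = field(default_factory=dict)  # name -> SoundDef
    playing: set = field(default_factory=set)   # names logically playing

    def add_sound(self, sound):
        self.sounds[sound.name] = sound

    def start(self, sound_name):
        # Only changes state; nothing is rendered here.
        if sound_name not in self.sounds:
            raise KeyError(sound_name)
        self.playing.add(sound_name)

    def stop(self, sound_name):
        self.playing.discard(sound_name)


npc = GameObject("whistler")
npc.add_sound(SoundDef("whistle", "whistle.ogg", loop=True))
npc.start("whistle")  # the NPC is now whistling, heard or not
```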
If a tree falls in the woods and no one is around, does it make a sound?
No, it doesn't, at least not one that anyone hears. The data about which sounds are Logically Playing should be kept in state. This is often very different from which sounds are actually heard.
Things which determine audibility:
- Is the sound supposed to be playing? (sound logical state)
- Is the player (viewer) within coarse audibility (in the same room)?
- Is the actor within fine audibility of the player character (spatially, for 3D sound)?
- Has the user enabled a particular sound category (music/sfx/voice)?
- Is there an open sound channel to actually render another audio file?
- Does the user even have sound hardware?
So, if a tree falls in our game, and we're not in the same room, or even around it, the state says it makes a sound, but it isn't audibly heard. Lots of good wisdom here.
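The checklist above can be sketched as a single predicate. Everything here is an assumption made for illustration: the `Listener` and `Emitter` types, the 2D distance test, and the `max_distance` cutoff are stand-ins for whatever your engine provides.

```python
from dataclasses import dataclass


@dataclass
class Listener:
    room: str
    position: tuple                # (x, y) of the player character
    enabled_categories: frozenset  # e.g. {"music", "sfx", "voice"}
    has_sound_hardware: bool = True


@dataclass
class Emitter:
    room: str
    position: tuple
    category: str
    logically_playing: bool


def is_heard(emitter, listener, free_channels, max_distance=10.0):
    """Return True only if every audibility condition holds."""
    if not emitter.logically_playing:              # logical state
        return False
    if emitter.room != listener.room:              # coarse audibility
        return False
    dx = emitter.position[0] - listener.position[0]
    dy = emitter.position[1] - listener.position[1]
    if (dx * dx + dy * dy) ** 0.5 > max_distance:  # fine audibility
        return False
    if emitter.category not in listener.enabled_categories:
        return False
    if free_channels <= 0:                         # nothing left to render on
        return False
    return listener.has_sound_hardware


player = Listener("cabin", (0.0, 0.0), frozenset({"sfx"}))
tree = Emitter("woods", (3.0, 4.0), "sfx", logically_playing=True)
print(is_heard(tree, player, free_channels=16))  # False: different room
```

The tree's `logically_playing` flag stays true the whole time; only the answer to "is it heard?" changes.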
How Selenite Rolls
In Selenite, each first-class object can have sounds associated with it, and the audibility of those sounds depends on the object type.
Note: when I say 'sounds are heard' I mean barring user settings or hardware limitations.
- Game - All playing sounds are always heard.
- Room - All playing sounds are always heard if the player's current room is this room.
- Actor - All playing sounds are heard if the actor is within the current player's room, and for 3D sound, within a close proximity.
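These three rules are small enough to sketch as one function. This is an illustrative Python sketch, not Selenite's code: `StateObject` and its `kind` strings are hypothetical, and the 3D proximity test, user settings, and hardware limits are deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StateObject:
    kind: str                   # "game", "room", or "actor"
    name: str
    room: Optional[str] = None  # containing room, used for actors only


def is_audible(obj, player_room):
    """Apply the per-type rule for the player's current room."""
    if obj.kind == "game":
        return True                     # always heard
    if obj.kind == "room":
        return obj.name == player_room  # heard only from inside it
    if obj.kind == "actor":
        return obj.room == player_room  # heard only in the same room
    raise ValueError(f"unknown kind: {obj.kind}")


game = StateObject("game", "main")
cellar = StateObject("room", "cellar")
rat = StateObject("actor", "rat", room="cellar")
```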
What this means to the developer/designer
As the player changes the structure of the state (that is, changes the current room, current actor, or what actors exist in what rooms) an Audibility test is run on the state, and each state object is marked as audible or not.
Thus, as an actor walks into the room you're in, it is marked as audible. This marking makes the object check whether any of its sounds are supposed to be playing; if they are, it will actually render them.
Similarly, if the actor walks out of the current room, it will be marked as inaudible, and any of its sounds, whether logically playing or not, will cease to render.
This is very important to the ease of development: you don't have to micro-manage the audio state. With this weight off our backs we can feel free to add lots more audio and create a richer environment.
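A small sketch of that marking pass, assuming a hypothetical `Actor` class where `playing` is the logical state and `rendering` stands in for whatever is really being sent to the audio renderer:

```python
class Actor:
    def __init__(self, name, room):
        self.name = name
        self.room = room
        self.audible = False
        self.playing = set()    # logical state, untouched by audibility
        self.rendering = set()  # what is actually being rendered

    def set_audible(self, audible):
        if audible == self.audible:
            return
        self.audible = audible
        # Becoming audible starts rendering whatever is logically playing;
        # becoming inaudible stops rendering but keeps the logical state.
        self.rendering = set(self.playing) if audible else set()


def run_audibility_test(actors, player_room):
    """Call whenever the structure of the state changes."""
    for actor in actors:
        actor.set_audible(actor.room == player_room)


npc = Actor("whistler", room="hall")
npc.playing.add("whistle")
run_audibility_test([npc], player_room="kitchen")  # not heard yet
npc.room = "kitchen"                               # the NPC walks in
run_audibility_test([npc], player_room="kitchen")  # now rendered
```

The designer only ever touches `playing` and `room`; the pass works out what to render.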
Audio Rendering
Now that we've talked exhaustively about how the concept of audio should be structured, we need to talk about how to actually render this audio. Not so much how to render it in great detail (most of you should know that you'll be sending PCM samples to the sound card via some API), but some high-level concepts.
Streaming
I am a big fan of streaming audio, mainly because, if you have digital music of significant length, you're going to need to stream; and streaming is very memory friendly. For SeleniteWin32 there exists a high-level streaming interface for audio, and here is how it works:
Requesting a Channel
When you've decided it is time to render some audio, you first request a channel. A channel in this case is some object or handle that represents a currently streaming audio file. When you request it, you pass in the audio file you would like loaded into it and whether or not it should loop. Let's deal with the worst-case scenario first: due to resource limitations, it is very possible that you will get back a null handle. This is the audio renderer's way of telling you "I can't play this audio right now." In such a case you should honor this; as far as the state is concerned, your audio is still playing. This might happen under high audio load and is fine.
Assuming that you do get back a valid channel, your audio will now be playing and the channel is your responsibility.
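From the caller's side, honoring a null handle might look like the following. This is a sketch under assumptions: `BusyRenderer`, `Sound`, and `request_channel` are hypothetical names, and the renderer here refuses every request so the worst case is easy to see.

```python
class BusyRenderer:
    """Hypothetical renderer with no free channels: every request fails."""

    def request_channel(self, filename, loop):
        return None


class Sound:
    def __init__(self, filename, loop=False):
        self.filename = filename
        self.loop = loop
        self.playing = False  # logical state
        self.channel = None   # render handle, if we got one


def start(sound, renderer):
    """Mark the sound as playing and try to render it."""
    sound.playing = True  # the state changes regardless of the renderer
    sound.channel = renderer.request_channel(sound.filename, sound.loop)
    # A None channel is not an error: the sound is simply not rendered now.
    return sound.channel is not None


rain = Sound("rain.ogg", loop=True)
rendered = start(rain, BusyRenderer())
```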
Keeping an eye on the channel
Once you have a valid channel, you'll want to periodically check whether it is still rendering. If it is, then no worries, business as usual. If it's not, your sound has completed (this will never happen for looping sounds); at this point you should mark your state sound as no longer playing, release the channel, and raise any 'sound done' events you'd like.
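The periodic check can be sketched as one function called from the update loop. `FakeChannel` is a hypothetical stand-in that finishes after a fixed number of ticks, and `on_done` is an assumed callback playing the role of the 'sound done' event.

```python
class FakeChannel:
    """Hypothetical channel that finishes after a fixed number of updates."""

    def __init__(self, updates_left):
        self.updates_left = updates_left
        self.released = False

    def is_rendering(self):
        return self.updates_left > 0

    def tick(self):
        self.updates_left = max(0, self.updates_left - 1)

    def release(self):
        self.released = True


class Sound:
    def __init__(self, channel):
        self.playing = True
        self.channel = channel


def poll(sound, on_done):
    """Call once per update loop for each sound that holds a channel."""
    if sound.channel is None or sound.channel.is_rendering():
        return             # nothing to do: no channel, or still rendering
    sound.playing = False  # the logical state follows completion
    sound.channel.release()
    sound.channel = None
    on_done(sound)         # raise the 'sound done' event


events = []
channel = FakeChannel(updates_left=1)
bark = Sound(channel)
poll(bark, events.append)  # still rendering, nothing happens
channel.tick()             # the one-shot sound runs out
poll(bark, events.append)  # completion is noticed here
```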
Releasing a Channel
Releasing a channel, playing or not, puts it back into the pool of available channels (I use 16 channels). The idea is that while your state may have hundreds or thousands of logically playing sounds, at any given time you should only need to hear 16 of them; any more than that would likely be a great cacophony.
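A fixed-size pool along these lines could be sketched as follows. `ChannelPool` and `Channel` are hypothetical names and no audio is actually streamed; the point is only the request/release bookkeeping and the `None` result when the pool is exhausted.

```python
class Channel:
    def __init__(self, index):
        self.index = index
        self.filename = None
        self.loop = False


class ChannelPool:
    def __init__(self, size=16):
        self.free = [Channel(i) for i in range(size)]

    def request(self, filename, loop=False):
        """Return a channel loaded with the file, or None if none are free."""
        if not self.free:
            return None
        channel = self.free.pop()
        channel.filename, channel.loop = filename, loop
        return channel

    def release(self, channel):
        """Return a channel to the pool, whether or not it was still playing."""
        if channel in self.free:
            return  # already released; ignore the duplicate
        channel.filename, channel.loop = None, False
        self.free.append(channel)


pool = ChannelPool(size=2)  # tiny pool so exhaustion is easy to see
a = pool.request("wind.ogg", loop=True)
b = pool.request("step.ogg")
c = pool.request("bird.ogg")  # pool is empty at this point
pool.release(a)
d = pool.request("bird.ogg")  # succeeds now that a channel is free
```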
Tying it all together
As a state object is marked audible, it goes through all of its sounds, and for ones that should be playing it requests channels for each of them. It checks these channels each update loop to see if they've stopped; if they have it marks the sounds as not playing and releases the channels. If the object gets marked as inaudible (say a character leaves the room) all valid channels of the sounds are swiftly released, but the sound states are kept as they were.
As far as I can tell, this is the only post of this nature I've ever seen on audio programming. I'd love to hear people's opinions on it, and about the systems you use.