
LPC: The past, present, and future of Linux audio

By Jake Edge
October 7, 2009

The history, status, and future of audio for Linux systems was the topic of two talks—coming at the theme from two different directions—at the Linux Plumbers Conference (LPC). Ardour and JACK developer Paul Davis looked at audio from mostly the professional audio perspective, while PulseAudio developer Lennart Poettering, unsurprisingly, discussed desktop audio. Davis's talk ranged over the full history of Linux audio and gave a look at where he'd like to see things go, while Poettering focused on the changes since last year's conference and "action items" for the coming year.

Davis: origins and futures

Davis started using Linux in 1994, while working as the second employee at Amazon, and began writing audio and MIDI software for Linux in 1998, so he has been working on Linux audio for more than ten years. His presentation was meant to provide a historical overview of why "audio on linux still sucks, even though I had my fingers in all the pies that make it suck". In addition, Davis believes there are lessons to be learned from the other two major desktop operating systems, Windows and Mac OS X, which may help in getting to better Linux audio.

He outlined what kind of audio support is needed for Linux, or, really, any operating system. Audio data should be able to be brought into or sent out of the system via any available audio interface as well as via the network. Audio data, as well as audio routing information, should be able to be shared between applications, and that routing should be able to be changed on the fly based on user requests or hardware reconfiguration. There needs to be a "unified approach" to mixer controls, as well. Most important, perhaps, is that the system needs to be "easy to understand and to reason about".

Some history

Linux audio support began in the early 1990s with the Creative SoundBlaster driver, which became the foundation for the Open Sound System (OSS). By 1998, Davis said, there was growing dissatisfaction with the design of OSS, which led Jaroslav Kysela and others to begin work on the Advanced Linux Sound Architecture (ALSA).

Between 1999 and 2001, ALSA was redesigned several times, each time requiring audio applications to change because they would no longer compile. The ALSA sequencer, a kernel-space MIDI router, was also added during this time frame. By the end of 2001, ALSA was adopted as the official Linux audio system instead of OSS. But OSS didn't disappear; it is still developed and used on both Linux and other Unixes.

In the early parts of this decade, the Linux audio developer community started discussing techniques for connecting audio applications together, something that is not supported directly by ALSA. At roughly the same time, Davis started working on the Ardour digital audio workstation, which led to JACK. The audio handling engine from Ardour was turned into JACK, which is an "audio connection kit" that works on most operating systems. JACK is mostly concerned with the low-latency requirements of professional audio and music creation, rather than the needs of desktop users.

Since that time, the kernel has made strides in supporting realtime scheduling that can be used by JACK and others to provide skip-free audio performance, but much of that work is not available to users. Access to realtime scheduling is tightly controlled, so there is a significant amount of per-system configuration that must be done to access this functionality. Most distributions do not provide a means for regular users to enable realtime scheduling for audio applications, so most users are not benefiting from those changes.
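
To give a sense of what that per-system configuration looks like: distributions that do grant realtime privileges to audio users typically do so through PAM's limits mechanism. A minimal sketch (the file name and the "audio" group are common conventions, not universal) might be:

    # /etc/security/limits.d/audio.conf
    # let members of the audio group request realtime scheduling
    # and lock enough memory for glitch-free operation
    @audio   -   rtprio    95
    @audio   -   memlock   unlimited

The user still has to be added to that group, and the application still has to ask for SCHED_FIFO itself, which is exactly the kind of manual setup that keeps most users from benefiting.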

In the mid-2000s, Poettering started work on the PulseAudio server, KDE stopped using the aRts sound server, GStreamer emerged as a means for intra-application audio streaming, and so on. Desktops wanted "simple" audio access APIs and created things like Phonon and libsydney, but meanwhile JACK was the only way to access Firewire audio. All of that led to great confusion for Linux audio users, Davis said.

Audio application models

At the bottom, audio hardware works in a very simple manner. For recording (or capture), there is a circular buffer in memory to which the hardware writes, and from which the software reads. Playback is just the reverse. In both cases, user space can add buffering on top of the circular buffer used by the hardware, which is useful for some purposes, and not for others.

There are two separate models that can be used between the software and the hardware. In a "push" model, the application decides when to read or write data and how much, while the "pull" model reverses that, requiring the hardware to determine when and how much I/O needs to be done. Supporting a push model requires buffering in the system to smooth over arbitrary application behavior. The pull model requires an application that can meet deadlines imposed by the hardware.

Davis maintains that supporting push functionality on top of pull is easy, just by adding buffering and an API. But supporting pull on top of push is difficult and tends to perform poorly. So, audio support needs to be based on the pull model at the low levels, with a push-based API added in on top, he said.
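
To make the distinction concrete, here is a minimal sketch of a pull-model client in C, written in the JACK callback style (the client name and the silence-only rendering are purely illustrative, and the code is not from the talk). The push-model equivalent would simply be a loop that fills a buffer and calls a blocking write whenever the application feels like it; here, the server decides when process() runs and the callback must finish before the hardware deadline:

    #include <string.h>
    #include <unistd.h>
    #include <jack/jack.h>

    static jack_port_t *out_port;

    /* Pull model: JACK calls this when the hardware needs the next block. */
    static int process(jack_nframes_t nframes, void *arg)
    {
        float *out = jack_port_get_buffer(out_port, nframes);
        memset(out, 0, nframes * sizeof(float));  /* render silence before the deadline */
        return 0;
    }

    int main(void)
    {
        jack_client_t *client = jack_client_open("pull-sketch", JackNullOption, NULL);
        if (!client)
            return 1;
        out_port = jack_port_register(client, "out", JACK_DEFAULT_AUDIO_TYPE,
                                      JackPortIsOutput, 0);
        jack_set_process_callback(client, process, NULL);
        jack_activate(client);
        for (;;)
            pause();  /* all audio work happens in the callback thread */
    }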

Audio and video have much in common

OSS is based around the standard POSIX system calls, such as open(), read(), write(), mmap(), etc., while ALSA (which supports those same calls) is generally accessed through libasound, which has a "huge set of functions". Those functions provide ways to control hardware and software configuration along with a large number of commands to support various application styles.
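
For readers who have not used either interface, the difference in flavor looks roughly like this. Both fragments play interleaved 16-bit stereo samples; they are sketches rather than complete programs (error handling omitted, and the samples buffer and sizes are placeholders):

    /* OSS: plain POSIX calls plus ioctl()s on a device node
       (needs <fcntl.h>, <sys/ioctl.h>, <sys/soundcard.h>) */
    int fd = open("/dev/dsp", O_WRONLY);
    int fmt = AFMT_S16_LE, channels = 2, rate = 44100;
    ioctl(fd, SNDCTL_DSP_SETFMT, &fmt);
    ioctl(fd, SNDCTL_DSP_CHANNELS, &channels);
    ioctl(fd, SNDCTL_DSP_SPEED, &rate);
    write(fd, samples, nbytes);

    /* ALSA: everything goes through libasound
       (needs <alsa/asoundlib.h>) */
    snd_pcm_t *pcm;
    snd_pcm_open(&pcm, "default", SND_PCM_STREAM_PLAYBACK, 0);
    snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE, SND_PCM_ACCESS_RW_INTERLEAVED,
                       2, 44100, 1, 500000);  /* 2 ch, 44.1kHz, allow resampling, 0.5s latency */
    snd_pcm_writei(pcm, samples, nframes);

Both of these are push-style usage; much of the "huge set of functions" in libasound comes from supporting other access patterns (mmap access, polling, asynchronous callbacks, and so on) on top of the same devices.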

In many ways, audio is like video, Davis said. Both generate a "human sensory experience" by rescanning a data buffer and "rendering" it to the output device. There are differences as well, mostly in refresh rates and the effect of missing refresh deadlines. Unlike audio, video data doesn't change that frequently when someone is just running a GUI—unless they are playing back a video. Missed video deadlines are often imperceptible, which is generally not true for audio.

So, Davis asked, does anyone seriously propose that video/graphics applications should talk to the hardware directly via open/read/write/etc.? For graphics, that has been mediated by a server or server-like API for many years. Audio should be the same way, even though some disagree, "but they are wrong", he said.

The problem with UNIX

The standard UNIX methods of device handling, using open/read/write/etc., are not necessarily suitable interfaces for interacting with realtime hardware. Davis noted that he has been using UNIX for 25 years and loves it, but that the driver API lacks some important pieces for handling audio (and video). Neither temporal nor data-format semantics are part of that API, though both are necessary for handling audio/video data. The standard interfaces can be used, but they don't promote a pull-based application design.

What is needed is a "server-esque architecture" and API that can explicitly handle data format, routing, latency inquiries, and synchronization. That server would mediate all device interaction, and would live in user space. The API would not require that various services be put into the kernel. Applications would have to stop believing that they can and should directly control the hardware.

The OSS API must die

The OSS API requires any services (like data format conversion, routing, etc.) be implemented in the kernel. It also encourages applications to do things that do not work well with other applications that are also trying to do some kind of audio task. OSS applications are written such that they believe they completely control the hardware.

Because of that, Davis was quite clear that the "OSS API must die". He noted that Fedora no longer supports OSS and was hopeful that other distributions would follow that lead.

When ALSA was adopted, there might have been an opportunity to get rid of OSS, but, at the time, there were a number of reasons not to do that, Davis said. Backward compatibility with OSS was felt to be important, and there was concern that doing realtime processing in user space was not going to be possible—which turned out to be wrong. He noted that even today there is nothing stopping users or distributors from installing OSS, nor anything stopping developers from writing OSS applications.

Looking at OS X and Windows audio

Apple took a completely different approach when they redesigned the audio API for Mac OS X. Mac OS 9 had a "crude audio architecture" that was completely replaced in OS X. No backward compatibility was supported and developers were just told to rewrite their applications. So, the CoreAudio component provides a single API that can support users on the desktop as well as professional audio applications.

On the other side of the coin, Windows has had three separate audio interfaces along the way. Each maintained backward compatibility at the API level, so that application developers did not need to change their code, though driver writers were required to. Windows has taken much longer to get low latency audio than either Linux or Mac OS X.

The clear implication is that backward compatibility tends to slow things down, which may not be a big surprise.

JACK and PulseAudio: are both needed?

JACK and PulseAudio currently serve different needs, but, according to Davis, there is hope that there could be convergence between them down the road. JACK is primarily concerned with low latency, while PulseAudio is targeted at the desktop, where application compatibility and power consumption are two of the highest priorities.

Both are certainly needed right now, as JACK conflicts with the application design of many desktop applications, while PulseAudio is not able to support professional audio applications. Even if an interface were designed to handle all of the requirements that are currently filled by JACK and PulseAudio, Davis wondered if there were a way to force the adoption of a new API. Distributions dropping support for OSS may provide the "stick" to move application developers away from that interface, but could something similar be done for a new API in the future?

If not, there are some real questions about how to improve the Linux audio infrastructure, Davis said. The continued existence of both JACK and PulseAudio, along with supporting older APIs, just leads to "continued confusion" about what the right way to do audio on Linux really is. He believes a unified API is possible from a technical perspective—Apple's CoreAudio is a good example—but it can only happen with "political and social manipulation".

Poettering: The state of Linux audio

The focus of Poettering's talk was desktop audio, rather than embedded or professional audio applications. He started by looking at what had changed since last year's LPC, noting that EsounD and OSS were officially gone ("RIP"), at least in Fedora. OSS can still be enabled in Fedora, but it was a "great achievement" to have it removed, he said.

Only three applications had bugs reported against them because of the OSS removal, VMware and quake2 amongst them. He said that there "weren't many complaints", but an audience member noted the "12,000 screaming users" of VMware as a significant problem. Poettering shrugged that off, saying that he encouraged other distributions to follow suit.

Confusion at last year's LPC led him to create the "Linux Audio API Guide", which has helped clarify the situation, though there were complaints about what he said about KDE and OSS.

Coming in Fedora 12, and in other distributions at "roughly the same time", is the use of realtime scheduling by default for desktop audio applications. There is a new mechanism for handing out realtime priority (RealtimeKit) that will prevent buggy or malicious applications from monopolizing the CPU—essentially causing a denial of service. The desktop now makes use of high-resolution timers, because they "really needed to get better than 1/HZ resolution" for audio applications.

Support for buffers of up to two seconds has been added. ALSA used to restrict the buffer size to 64K, which equates to roughly 370ms of CD-quality audio. Allowing bigger buffers is "the best thing you can do for power consumption" as well as for avoiding dropouts, he said.
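
The arithmetic behind that figure is simple enough: CD-quality audio is 44100 frames per second, with two channels of two bytes each, so

    65536 bytes / (44100 frames/s × 2 channels × 2 bytes) ≈ 0.37 s

or roughly 370ms; a power-saving buffer of a couple of seconds therefore needs several times the old 64K limit.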

Several things were moved into the audio server, including timer-based audio scheduling, which allows the server to "make decisions with respect to latency and interrupt rates". A new mixer abstraction was added, even though four already exist in ALSA. Those are very hardware-specific, Poettering said, while the new one is a very basic abstraction.

Audio hardware has acquired udev integration over the last year, and there is now "Bluetooth audio that actually works". Poettering also noted that audio often didn't work "out of the box" because there was no mixer information available for the hardware. Since last year, an ALSA mixer initialization database has been created and populated: "It's pretty complete", he said.

Challenges for the next year

There were a number of issues with the current sound drivers that Poettering listed as needing attention in the coming year. Currently, for power saving purposes, PulseAudio shuts down devices two seconds after they become idle. That can lead to problems with drivers that make noise when they are opened or closed.

In addition, there are areas where the drivers do not report correct information to the system. The decibel range of the device is one of those, along with device strings that are broken or missing in many drivers, which makes it difficult to automatically discover the hardware. The various mixer element names are often wrong as well; in the past it "usually didn't matter much", but it is becoming increasingly important for those elements to be consistently named by drivers. Some drivers are missing from the mixer initialization database, which should be fixed as well.

The negotiation logic for sample rates, data formats, and so on is not standardized. The order in which those parameters are changed can be interpreted differently by each driver, which leads to problems at the higher levels, he said. There are also problems with timing for synchronization between audio and video that need to be addressed at the driver level.

Poettering also had a whole slew of changes that need to be made to the ALSA API so that PulseAudio (and others) can get more information about the hardware: things like the routing and mixer element mappings, jack status (and any re-routing that is done on jack insertion), and data transfer parameters such as the timing and granularity of transfers. Many of the current assumptions are based on consumer-grade hardware, which doesn't work for professional or embedded hardware, he said. It would be "great if ALSA could give us a hint how stuff is connected".

There is also a need to synchronize multiple PCM clocks within a device, along with adding atomic mixer updates that sync to the PCM clock. Latency control, better channel mapping, atomic status updates, and HDMI negotiation are all on his list as well.

Further out, there are a number of additional problems to be solved. Codec pass-through—sending unaltered codec data, such as SPDIF, HDMI, or A2DP, to the device—is "very messy" and no one has figured out how to handle synchronization issues with that. There is a need for a simpler, higher-level PCM API, Poettering said, so that applications can use the pull model, rather than being forced into the push model.
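
For comparison, the "simple" desktop-side API that exists today is PulseAudio's pa_simple interface, and it is firmly push-model: the application writes whenever it likes and the library blocks as needed. A sketch (the stream names and format are chosen arbitrarily, and the caller supplies the sample data):

    #include <stddef.h>
    #include <pulse/simple.h>

    void play(const void *samples, size_t nbytes)   /* samples/nbytes: caller-provided */
    {
        static const pa_sample_spec spec = {
            .format   = PA_SAMPLE_S16LE,
            .rate     = 44100,
            .channels = 2,
        };
        pa_simple *s = pa_simple_new(NULL, "sketch", PA_STREAM_PLAYBACK, NULL,
                                     "playback", &spec, NULL, NULL, NULL);
        pa_simple_write(s, samples, nbytes, NULL);  /* blocks until the data is buffered */
        pa_simple_drain(s, NULL);
        pa_simple_free(s);
    }

A pull-model API at this level of simplicity is what Poettering is asking for; today, pull-style behavior on the desktop means dealing with the much more involved asynchronous pa_stream interface and its write callbacks.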

Another area that needs work is handling 20-second buffering, which brings a whole new set of problems. As an example, Poettering pointed out the problems that can occur if the user changes some setting after that much audio data has been buffered. There need to be ways to revoke the data that has been buffered, or there will be lags of up to 20 seconds between a user action and changes to the audio.

Conclusion

Both presentations gave a clear sense that things are getting better in the Linux audio space, though perhaps not with the speed that users would like to see. Progress has clearly been made and there is a roadmap for the near future. Whether Davis's vision of a unified API for Linux audio can be realized remains to be seen, but there are lots of smart hackers working on Linux audio. Sooner or later, the "one true Linux audio API" may come to pass.


LPC: The past, present, and future of Linux audio

Posted Oct 7, 2009 18:02 UTC (Wed) by cventers (guest, #31465) [Link] (5 responses)

One user's opinion:

I do a lot of mixing with the excellent xwax, and I'm also getting into music production. This follows years of using Linux as my primary desktop.

ALSA used to frustrate me, just because it was another thing that didn't always work "just right" out of the box. I've been less than impressed by reliability problems I've had with Intel HDA audio in the past, especially the fact you often have to tell the driver how the card is wired and experiment with different module loading options to get it to work. There may be a good hardware reason for this, but since most hardware works out of the box with Linux these days, the little bit that doesn't really stands out.

I gave Linux a shot for music production but left it behind and began using Windows for that purpose alone. It's actually the first use for Windows I've found in years. I don't blame this on sound in Linux; rather, the stability and quality of some of the open-source tools isn't quite what I might wish it to be, and although there are ways to use VSTs under Wine, there is nothing dependable and functional enough for serious, heavy and everyday use. So for the first time in years, I dual-boot so that I can use a big heap of proprietary software to compose music.

All of that said, when it comes to Vinyl Emulation, I couldn't imagine using anything but xwax + ALSA + Linux. I've read the xwax source code, and while it's not the prettiest I've seen, the author clearly understood how to write simple, reliable, real-time programs. ALSA is great because it supported my USB preamp out of the box, and provided a simple mechanism (asoundrc) where I could apply software gain to assist xwax in better tracking the timecode. udev lets me plug in the USB preamp wherever I want and makes sure it will pick the same device nodes, which is *not possible* with Windows and ASIO. This solution also lets me run with 100% reliability at 2 ms latency, which is great for live mixing.
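
(The softvol idea in ~/.asoundrc looks roughly like this; not my exact config, and the card number and control name are made up:

    pcm.preamp_boost {
        type softvol
        slave.pcm "hw:1"
        control {
            name "Timecode Gain"
            card 1
        }
        max_dB 20.0
    }

and then you just point xwax at the "preamp_boost" PCM.)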

Frankly, I wouldn't trust Windows with live mixing. I tried the Torq software that came with my preamp, and it seemed like a big, bloated mess... but didn't even work on more than one occasion, even when the hardware was configured precisely as it is when I use it in my Linux environment. Moreover, you can't achieve the same low-latencies with Windows, and I have in fact seen a BSOD in one of my production sessions.

With a handy mlockall() added to the xwax source code, and the fact that it buffers tracks into RAM that are decoded by external command-line utilities, my system *should* keep playing, even if the hard drive (with swap!) crashes, at least until the current tracks are over. Anyone trust Windows to do the same?

As an amateur musician, I should have an opinion about the state of affairs on OS X, but I don't because I'm not fond of microkernel performance, solid-gold prices and extremely basic user interfaces. :p

LPC: The past, present, and future of Linux audio

Posted Oct 7, 2009 18:32 UTC (Wed) by drag (guest, #31333) [Link] (3 responses)

Well the fact is that OS X does not use a Microkernel and never did. It was a lie. To keep the Apple reality distortion bubble from popping people have created the concept of 'hybrid microkernel' to describe the OS X kernel as if that was some sort of CS concept.

All that it means is that Apple copied the NT kernel design approach by incorporating some features of microkernels into what ends up being fundamentally a monolithic design.

But as far as audio stuff goes I am told that Apple's CoreAudio is actually a compelling feature over what is available in other operating systems. It's designed with music production in mind.

--------------------------

Getting the best out of Linux is still very tedious and highly technical.

It involves:

* Installing and configuring Jack
* Configuring your applications to use Jack
* Purchasing an audio card with good performance characteristics. (Intel-HDA, while it is fine for music playback, is not designed for low-latency performance regardless of what drivers you use on it)
* Installing a custom OS kernel with *-rt patches.

And a great deal of learning the ins and outs of how to manage all the above.

Generally the biggest difference between the actual workflows of Linux vs Windows is that instead of using big music production apps with plugins you use a lot of smaller applications chained together through Jack.

Now keep in mind that it has been a _long_ time since I mucked around with this stuff.

But I have a simple piano-style M-audio midi controller. It connects to the PC using a USB connection.

So the workflow went like this:

USB Controller -(jack midi routing)-> Software Synth (I forget which) -(pcm audio routing)-> Alsa Modular Synth (for effects processing) -(pcm audio routing)-> volume controls -(pcm audio routing)-> digital out on my sound card --> digital receiver --> speakers.

All in all I got the system to reliably operate to the point where I could not notice a delay from when I press a key to when I heard the sound.

Of course this required a couple hours of mucking around and setup. Debian by default could barely do software synth on its own before I started customising it.

The situation has improved somewhat with the introduction of custom Linux variants in the form of 64Studio and Ubuntu Studio and things of that nature. So at least the software setup is mostly taken care of.

LPC: The past, present, and future of Linux audio

Posted Oct 7, 2009 18:57 UTC (Wed) by cventers (guest, #31465) [Link]

Perhaps not, I don't remember all of the technical details, but I do remember the funnel lock. That was enough to keep me away from it at the time.

And you're absolutely right about the tedious nature of setup on Linux. I too have a MIDI USB controller from M-Audio, and I too got it working under Linux (actually even patched into reFX Vanguard courtesy of dssi-vst). But what I found is that when musical inspiration hits, I want to spend the least amount of time possible getting into working music software, because it generally doesn't survive having to debug some arcane software issue.

LPC: The past, present, and future of Linux audio

Posted Oct 7, 2009 18:58 UTC (Wed) by jebba (guest, #4439) [Link] (1 responses)

I appreciate that it may be pretty hard to get jack/rt going using some approaches (e.g. older debian). But a fedora(ish) install with planetccrma packages (including kernel) makes it quite easy. I have repeatedly read that Intel HDA can't be used with realtime, but I have used it successfully on various EeePCs and my thinkpad. It seems that the package defaults are more reasonable now--with F11 and the thinkpad I basically just installed the ccrma packages, started up qjackctl and it "just worked".

LPC: The past, present, and future of Linux audio

Posted Oct 7, 2009 19:20 UTC (Wed) by drag (guest, #31333) [Link]

It's gotten better.

But for best performance you still need to patch and recompile your kernel, as well as learn the ins and outs of dealing with the multiple Linux user interfaces.

With my setup I was getting pretty reliable sub-10msec latencies with Jack's settings with no xruns, although I usually let things slide to 60-70 just so I could have a more responsive system.

The other thing that sucks about Intel HDA (besides the low quality of digital-analog conversion chips and relatively high buffer requirements) as far as audio creation stuff is concerned is just the lack of I/O options. This is the biggest real difference between 'professional' and 'consumer' audio hardware. My old M-Audio Audiophile 24/96 has analog stereo in, stereo out, digital in, digital out, and midi in and midi out. It also has nice-quality D-A/A-D conversion and the difference is enough that with a quiet room and nice headphones pretty much anybody can tell the difference.

But, of course, that's PCI.

Otherwise I have no problems with using Intel-HDA for anything. It's the sound card I use the most since that is what is on my laptops. For music playback and doing some recording stuff it's perfectly fine, and unless you are in a quiet area with high quality headphones the chances of anybody being able to tell the difference are very low.

LPC: The past, present, and future of Linux audio

Posted Oct 29, 2009 18:08 UTC (Thu) by jrigg (guest, #30848) [Link]

Quote:
I gave Linux a shot for music production but left it behind and began using Windows for that purpose alone. It's actually the first use for Windows I've found in years. I don't blame this on sound in Linux; rather, the stability and quality of some of the open-source tools isn't quite what I might wish it to be, and although there are ways to use VSTs under Wine, there is nothing dependable and functional enough for serious, heavy and everyday use. So for the first time in years, I dual-boot so that I can use a big heap of proprietary software to compose music.


Another user's perspective:
I've used Ardour on Linux for music recording for a few years now. I suspect I'm one of a relatively small number who use it for paid work, but I've found it to be very solid and reliable for multi track recording and editing. The current lack of "pretty" plugin GUIs is a positive advantage to me (few things are more disruptive to work flow than having to turn a picture of a knob with a mouse). Those who need good MIDI support might still be better off with Mac or Windows, but I don't require this. I would say stability of my system is noticeably better (for straightforward recording and editing) than that experienced by many of my Mac- and Windows-using colleagues.

One area that still needs improving is using multiple sound cards to boost channel count. In my mobile system I use an RME MADI card (up to 64 channels of simultaneous in/out at 48kHz) with external AD/DA converters, but that is probably too expensive an option for most semi-pro and hobby users. Combining a few eight channel cards for a cheaper setup still requires jumping through difficult configuration hoops (so difficult that AFAIK none of the dedicated media distros come with configurations for doing this).

LPC: The past, present, and future of Linux audio

Posted Oct 7, 2009 18:24 UTC (Wed) by mezcalero (subscriber, #45103) [Link]

The slides for these talks are now available here:

http://linuxplumbersconf.org/2009/program/

One correction:

"ALSA used to restrict the buffer size to 64K, which equates to 70ms of CD quality audio."

Did I really say that? 64k is actually 370ms @ 44khz/16bit/2ch. Which usually means an interrupt rate of at least 1/180ms or so.

API target

Posted Oct 7, 2009 18:40 UTC (Wed) by ncm (guest, #165) [Link] (15 responses)

Last time I read about this, the major gap was that application coders had no correct, stable, portable API to code against. Has there been any progress on that? Is ALSA (or some subset or future version) supposed to be that API now? What should somebody writing, e.g., a VoIP phone code against for maximum portability and minimum fuss?

API target

Posted Oct 7, 2009 19:33 UTC (Wed) by drag (guest, #31333) [Link] (11 responses)

> What should somebody writing, e.g., a VoIP phone code against for maximum portability and minimum fuss?

The Maemo 5 folks and Palm WebOS folks seem to prefer to just write directly against PulseAudio for their audio needs.

Although for cross-platform compatibility your best bet would be to target Gstreamer since that runs on Alsa/PA/OSS/Windows/OSX/etc.
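
The idea is that you build a pipeline and let an output element pick the platform backend. Something like this (using the 0.10-era gst-launch tool) shows the principle:

    gst-launch-0.10 audiotestsrc ! audioconvert ! autoaudiosink

autoaudiosink resolves to PulseAudio, ALSA, OSS, DirectSound or whatever is available; an application does the same thing programmatically.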

API target

Posted Oct 7, 2009 20:02 UTC (Wed) by ncm (guest, #165) [Link] (10 responses)

Thanks, drag, I suppose there really is an interpretation of "portability" that applies to Maemo and Palm targets, but it's not the one I meant.

API target

Posted Oct 7, 2009 21:12 UTC (Wed) by drag (guest, #31333) [Link] (8 responses)

I was just giving it as an example.

PulseAudio is portable also. It runs on Alsa/OSS/Windows/OSX/ blah blah blah. But I think for what you're asking you'd have better luck with Gstreamer.

If you target Alsa then you can use the 'safe' subset that is supported by PulseAudio then you might be fine. It's possible to port the 'safe' parts of the libasound to other platforms, but what a pain.

You could target SDL and get cross platform compatibility, but that is mostly for game makers.

If you target full Alsa then that is Linux-only. If you target OSS then that means it's only useful on some of the BSDs and possibly Solaris.

API target

Posted Oct 7, 2009 21:31 UTC (Wed) by ncm (guest, #165) [Link] (1 responses)

Thanks, I had not got that PA itself had been ported to all these platforms. That seems to change everything.

API target

Posted Oct 8, 2009 12:25 UTC (Thu) by nye (subscriber, #51576) [Link]

>Thanks, I had not got that PA itself had been ported to all these platforms. That seems to change everything.

Not so much - because it's only technically true (in other words, it's lies). A couple of years ago they managed to build it on a selection of platforms so they could claim 'portability' as a ticklist item.

Try building PulseAudio for Windows. When you've given up, try installing the binary package that is usually mentioned whenever this discussion comes up - it's two years old and I couldn't get it to work *at all* with a few hours' hair-pulling.

I don't know where the OS X idea came from - PA doesn't even *claim* to work there, and according to the PA website, the last time it was tested on anything other than Linux was 2007.

API target

Posted Oct 7, 2009 22:10 UTC (Wed) by ncm (guest, #165) [Link] (1 responses)

Do I understand correctly, that if I write directly to PA, then a PA server needs to be running alongside my program, but if I write to Gstreamer, then I might actually be talking to PA, or to ALSA, or to Apple's or MS's services, and most of my code needn't know which? I expect discovering microphones and headsets, and volume controls for them, will be a nuisance no matter what.

API target

Posted Oct 7, 2009 22:43 UTC (Wed) by drag (guest, #31333) [Link]

Yes. I think that is right. I never actually programmed anything dealing with sound and expected it to work in Windows (myself being limited to relatively low-complexity Python programs), but I expect that with Gstreamer you'll have to depend on the platform to set up everything like that for you to use. So it seems most appropriate for a standalone application you want to integrate into the OS it's running in.

API target

Posted Oct 8, 2009 12:09 UTC (Thu) by nix (subscriber, #2304) [Link]

Well, some of those PA ports are very old. The last time PA was ported to Windows was so long ago that it predates glitch-free (which means years ago).

API target

Posted Oct 14, 2009 12:15 UTC (Wed) by pharm (guest, #22305) [Link] (2 responses)

> If you target Alsa then you can use the 'safe' subset that is supported by PulseAudio then you might be fine.

The (slightly abrasive, but ultimately useful) discussion on the Braid blog about audio support under Linux eventually revealed that the safe Alsa subset isn't really a great deal of use, because you can't guarantee to get your hands on the audio ring buffer & rewrite the parts that haven't been played yet on the fly: The alsa mmap functions that let you do this aren't part of the safe core :(

The *biggest* issue that arose from that discussion was that it's well nigh on impossible for a developer to work out what they're expected to use if they need more than the basic SDL sound API (which can't do a great deal more than 'play this sound now please'). The safe ALSA subset plus the mmap alsa functions (since most hardware can expose those in reality) is probably it, but that isn't exactly well-advertised.

API target

Posted Oct 20, 2009 9:32 UTC (Tue) by njs (subscriber, #40338) [Link] (1 responses)

But, uh, the mmap functions can't possibly be supported over an emulated-in-userspace sound device, i.e., what we're all using now that our "alsa" output is going to pulseaudio?

A better API is clearly needed, but I don't think it involves mmap.

API target

Posted Oct 20, 2009 10:57 UTC (Tue) by cladisch (✭ supporter ✭, #50193) [Link]

> But, uh, the mmap functions can't possibly be supported over an emulated-in-userspace sound device

Classic mmap() can't. However, the ALSA API requires that the application tells when and where it wants to access the buffer, and when it is done, so it is possible to emulate mmap on top of devices without a memory buffer. (In that case, the extra buffer adds latency, of course.)

> A better API is clearly needed

ALSA has snd_pcm_forward/rewind functions to move around in the buffer. However, these functions are optional, and the PulseAudio plugin does not implement them.

Gstreamer and codecs

Posted Oct 7, 2009 21:19 UTC (Wed) by dmarti (subscriber, #11625) [Link]

Gstreamer gives you much more than just a basic audio API. It also lets you easily use codecs that the user might have decided to install but that you don't want to distribute for whatever reason. The user can play MP3s or SID files, even if you just had FLAC and Ogg on your machine when you built the application.

API target

Posted Oct 9, 2009 5:44 UTC (Fri) by magnus (subscriber, #34778) [Link] (2 responses)

It's not my favorite API, but for a cross-platform (incl Windows) VoIP app, I probably would go with Portaudio.

SDL has a nice and friendly audio API, is cross-platform and has worked very well in my experience but it doesn't do recording.

The PulseAudio API is OK to work with as well. It probably would be my choice for a VoIP app if I didn't have to care about portability.

GStreamer is extremely focused on media-player like applications, and all API documentation is built around the assumption that data comes from somewhere else and you're just building a pipeline. Using an application as the data source seems to be rare and it's not obvious how to do it.

API target

Posted Oct 15, 2009 10:12 UTC (Thu) by Uraeus (guest, #33755) [Link] (1 responses)

GStreamer is not focused on 'media player' type applications at all, GStreamer was designed from the very beginning to have a much wider use area. There is a large host of applications using GStreamer which are not media player style applications like Buzztard, Jokosher, PiTiVi, Arista, Transmageddon, Empathy and so on.

As for using an application for the data, that was addressed quite some time ago and there are now two GStreamer elements called appsrc and appsink which specifically target getting or sending data to an application.

API target

Posted Oct 17, 2009 21:21 UTC (Sat) by magnus (subscriber, #34778) [Link]

OK. I was speaking from my own experience from a couple of years ago, and I don't recall that appsrc/appsink existed at that time.

Still, I don't think that it is obvious how one should port an audio app using another audio API (ALSA for example) to use GStreamer for output. GStreamer seems to be designed more like a toolkit which you have to design your app around (like GTK for graphics) rather than just the audio bottom layer that most other APIs provide.

LPC: The past, present, and future of Linux audio

Posted Oct 7, 2009 21:08 UTC (Wed) by mjthayer (guest, #39183) [Link] (1 responses)

> He outlined what kind of audio support is needed for Linux, or, really, any operating system. Audio data should be able to be brought in or sent out of the system via any available audio interface as well as via the network. Audio data, as well as audio routing information, should be able to be shared between applications, and that routing should be able to changed on the fly based on user requests or hardware reconfiguration. There needs to be a "unified approach" to mixer controls, as well. Most importantly, perhaps, is that the system needs to be "easy to understand and to reason about".

BeOS anyone?

LPC: The past, present, and future of Linux audio

Posted Oct 8, 2009 10:57 UTC (Thu) by tialaramex (subscriber, #21167) [Link]

BeOS punts most of the issues they were discussing there as far as I can tell.

For example, where is the audio data going? In _theory_ BeOS lets applications connect up any kind of graph. In practice, nearly all software asks for the system's default (software) mixer and feeds it 16-bit PCM.

Someone wrote a piece of software "Cortex" which exposes the graph, but if you actually install it and play around, first of all you'll crash a lot (Cortex and sometimes BeOS too) and secondly you'll start to find all the weird little bugs no-one encountered because they always hooked things up to the default mixer. So rather than exposing the graph in a way that users can play with it, like the various JACK graph tools, it behaves more as a debug tool for developers who know how to tread carefully.

Unclear sentence

Posted Oct 8, 2009 12:52 UTC (Thu) by epa (subscriber, #39769) [Link]

> By the end of 2001, ALSA was adopted as the official Linux audio system in favor of OSS.
I think you meant to say 'ALSA was adopted instead of OSS', or 'OSS was dropped in favor of ALSA'.

And BTW, why is the <q> element not allowed in comments?

LPC: The past, present, and future of Linux audio

Posted Oct 8, 2009 14:29 UTC (Thu) by mezcalero (subscriber, #45103) [Link]

A small correction regarding the 20s hw buffering thing:

We actually already do the revoking for the 2s buffers. It works quite well these days. The big difference between 2s and 20s when doing this however is that when you fill up the full 20s with new audio, this might be a quite expensive operation, for example because you decode it from MP3. Now if the user is seeking around and we have to revoke what we already wrote to the hw playback buffer, dropping 20s and decoding that again comes at a very steep price, while dropping 2s and decoding that again might still have been acceptable. So, the idea I was ventilating at LPC (or at least wanted to explain, I might not have been clear on this) was that we use some kind of stochastic model so that for some time after the last seek we don't fill up the full 20s but only 5s or so, which is much cheaper. And then when during the next iteration we notice that we never had to revoke it, we generate 10s for the next iteration. And when after it we notice that we didn't have to revoke it we go for the full 20s for the following iterations. However, if the user seeks around during that time we go back to filling up only 5s again. We'd do that mechanism under the assumption that if the user seeks around he does that in "bursts", i.e. seldom, but when he does it he does it a couple of times shortly following on each other. And we don't want to pay the price for having to decode the full 20s again each time.

Copy CoreAudio?

Posted Oct 13, 2009 14:53 UTC (Tue) by christian.convey (guest, #39159) [Link]

If CoreAudio successfully supports both simple desktop progams *and* professional audio programs, why doesn't Linux simply adopt the CoreAudio API and architecture?

LPC: The past, present, and future of Linux audio

Posted Oct 18, 2009 11:52 UTC (Sun) by hannu (guest, #61409) [Link] (3 responses)

The article seems to be full of misinformation about OSS. I don't know
if there were already in the original presentation. Hopefully not since
both Paul and Lennart have been around long enough to learn the
basic facts.

"The OSS API requires any services (like data format conversion,
routing, etc.) be implemented in the kernel."

Operations like data format conversions are no rocket science. They do
require few extra CPU cycles but so what. The same number of CPU cycles
will be spent even if the conversions are done in user space. The same
is true with "routing". Routing/mixing is always based on some kernel
level IPC mechanisms. In case of OSS these mechanisms are just hidden
behind the kernel level audio API.

"It also encourages applications to do things that do not work well with
other applications that are also trying to do some kind of audio task."

Like what?

History of the anti-OSS campaign is based on this kind of silly
arguments that don't make any sense. Any API can be misused by
developers who don't care to read the documentation. This is true with
OSS, ALSA, Jack, PulseAudio as well as every single API in the software
industry.

"OSS applications are written such that they believe they completely
control the hardware."

This is complete BS. Yes, there are many OSS applications that open
/dev/mixer and peek/poke the global hardware level volume controls
directly. However this is not the way how OSS is designed to be used.

"Because of that, Davis was quite clear that the 'OSS API must die'. He
noted that Fedora no longer supports OSS and was hopeful that other
distributions would follow that lead."

This is intentional misinformation. There is a very loud group of audio
(API) developers who insist that OSS must die based on arguments like
the above. This started about 10 years ago but OSS is still used by a
large number of applications. Why? Simply because OSS is perfectly
adequate for needs of 99% of the audio programs. And I doubt ALSA is any
better than OSS for the remaining 1% of applications (the laws of the
nature are the same for both of them).

The big reason to kill OSS is not OSS itself. If OSS is really that bad
then it should have died spontaneously years ago. However it's still
hanging around. The real reason is a design mistake made by developers
of ALSA. It prevents ALSA from co-existing with OSS. This mistake
prevents ALSA from becoming successful as long as there are any
applications using the OSS API. To be specific this "mistake" was ALSA's
decision to implement their own kernel level drivers instead of just
implementing alsa-lib on top of OSS drivers. Another fundamental mistake
was dmix that separates software mixing conceptually from hardware
mixing. For dmix to work it's necessary that all applications go through
ALSA's library interface even they don't need any of ALSA's "advanced"
features. Without these fundamental design flaws both OSS and ALSA could
exist at the same time.

As a workaround it's mandatory that linux distributions like Fedora drop
OSS prematurely. They are also required to move to RT kernels because
ALSA and its followers depend on real time response times. Only after
that the emperor can finally get his new clothes.

Re: OSS vs ALSA flamewar

Posted Oct 19, 2009 12:34 UTC (Mon) by cladisch (✭ supporter ✭, #50193) [Link]

> The article seems to be full of misinformation about OSS. I don't know
> if there were already in the original presentation.

In the original LPC article, there was a link to the slides:
http://linuxplumbersconf.org/2009/program/

> "The OSS API requires any services (like data format conversion,
> routing, etc.) be implemented in the kernel."

This is perfectly true.

> Operations like data format conversions are no rocket science. They do
> require few extra CPU cycles but so what. The same number of CPU cycles
> will be spent even if the conversions are done in user space. The same
> is true with "routing". Routing/mixing is always based on some kernel
> level IPC mechanisms. In case of OSS these mechanisms are just hidden
> behind the kernel level audio API.

These are not arguments for putting the services in the kernel.

Davis' point was that the OSS API requires that _all_ services _must_
be in the kernel; it's not possible to add user-defined plugins without
going through some sort of loopback device.

> "It also encourages applications to do things that do not work well with
> other applications that are also trying to do some kind of audio task."
>
> Like what?

E.g., mmap(), which cannot be reasonably emulated; or mixer devices.

(The OSS v4 API is better in this regard, but developers cannot rely on
it as long as most implementations only offer v3.)

> "OSS applications are written such that they believe they completely
> control the hardware."
>
> This is complete BS.

Please watch your language. And your next sentence proves you false:

> Yes, there are many OSS applications that open /dev/mixer and
> peek/poke the global hardware level volume controls directly.
> However this is not the way how OSS is designed to be used.

Opening /dev/mixer and using it was the _only_ way how OSS was
designed to be used.

The /dev/something interface implies hardware control; even when
implementing 'virtual' devices, OSS has to use the same interface and
to pretend that it's a 'real' hardware device.

> OSS is perfectly adequate for needs of 99% of the audio programs.
> And I doubt ALSA is any better than OSS for the remaining 1% of
> applications (the laws of the nature are the same for both of them).

It's the OSS implementation that's lacking.
The percentage of programs that, for example,
* want to make use of the full capabilities of any modern device (like
   USB, HD-Audio, Bluetooth, Xonar), or
* want to use MIDI, or
* want to work correctly with suspend/resume, or
* must work on embedded architectures,
is way more than 1%. As long as these deficiencies exist, OSS will not
even be considered a theoretical replacement for ALSA.

> They are also required to move to RT kernels because
> ALSA and its followers depend on real time response times.

Please don't publish misinformation. Real-time response times are only
needed for real-time tasks like low-latency signal processing or
synthesizers, and this applies to any implementation, i.e., to both OSS
and ALSA.

LPC: The past, present, and future of Linux audio

Posted Oct 20, 2009 9:38 UTC (Tue) by njs (subscriber, #40338) [Link] (1 responses)

I have no technical opinion on sound APIs, though it's obvious that ALSA is a total mess. But this cult-like thing around OSS is just *bizarre*, and plenty to convince me that ALSA is a better choice.

LPC: The past, present, and future of Linux audio

Posted Oct 20, 2009 17:43 UTC (Tue) by bronson (subscriber, #4806) [Link]

njs, Hannu's the guy who wrote OSS, took it private to try to extract money, and is now acting surprised that nobody's interested in using it (even the BSDs, which are OSSish by legacy but still have rewrites: http://wiki.freebsd.org/Sound )

He can safely be ignored.

That said, I do wish ALSA wasn't such a mess. I'm hoping GStreamer and Phonon become the de facto application APIs so app writers don't have to care about the slippery kernel sound APIs anymore.


Copyright © 2009, Eklektix, Inc.
This article may be redistributed under the terms of the Creative Commons CC BY-SA 4.0 license
Comments and public postings are copyrighted by their creators.
Linux is a registered trademark of Linus Torvalds