Simon - speech activated user interface for KDE (KDE.News)

[Posted August 24, 2009 by jake]

KDE.News has a look at simon, which is a speech-activated interface for KDE. It looks like an interesting project, but, unfortunately, may suffer from some licensing snags: "HTK, the toolkit responsible for the HMM [Hidden Markov Model] evaluation is distributed under GPL-incompatible, restrictive license that prevents redistribution. In order to install simon, one must separately download HTK from their website which requires registration. The source is available, and they encourage you to modify and contribute to it, but it cannot be redistributed. [...] Additionally, Julius, used for the voice recognition has an attribution clause which causes problems with the GPL in a way that is reminiscent of the old-style BSD license (the one with the advertising clause). Any research conducted with simon would thereby require a reference to the Julius authors in the bibliography."

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 16:04 UTC (Mon) by JoeBuck (subscriber, #2330) [Link] (13 responses)

The authors evidently think that a user-does-the-link approach, where a proprietary component has to be downloaded and linked by the end user into a GPL code base, will fly.

Steve Jobs tried that: his old company, NeXT, wanted to ship a proprietary Objective-C compiler that was at least 95% FSF GCC code. They backed down, as the FSF (or their lawyer, Eben Moglen) convinced them that NeXT would lose badly. They wound up contributing their Objective-C changes to GCC and making it free instead. But because of this history, I'd be amazed if Novell or Red Hat would touch this with a ten meter pole.

I think that the code could be re-architected to make it legal, so that there is a separate, standalone executable that contains the speech recognition code, perhaps structured as a server that any apps (KDE or otherwise) could interact with. The protocol could be specified and documented, to allow the proprietary speech recognition component to be replaced with a free one.
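To make the suggestion concrete, here is a minimal sketch of such a split: a standalone server speaking a small, documented line protocol, with the recognition backend hidden behind an interface so it can be swapped. Everything in it is an assumption for illustration (the `RECOGNIZE` verb, the `Recognizer` and `EchoRecognizer` classes, the helper names); none of it is taken from simon's actual code or protocol.

```python
import socket
import socketserver
import threading


class Recognizer:
    """Hypothetical interface any recognition backend (free or not) would implement."""

    def recognize(self, utterance_id: str) -> str:
        raise NotImplementedError


class EchoRecognizer(Recognizer):
    """Stand-in backend: returns a canned string instead of doing real recognition."""

    def recognize(self, utterance_id: str) -> str:
        return f"text-for-{utterance_id}"


class Handler(socketserver.StreamRequestHandler):
    def handle(self) -> None:
        # Hypothetical protocol: one request line "RECOGNIZE <id>", one reply line.
        line = self.rfile.readline().decode("utf-8").strip()
        verb, _, arg = line.partition(" ")
        if verb == "RECOGNIZE" and arg:
            reply = "OK " + self.server.backend.recognize(arg)
        else:
            reply = "ERR unknown request"
        self.wfile.write((reply + "\n").encode("utf-8"))


def start_server(backend: Recognizer) -> socketserver.TCPServer:
    """Start the server on an ephemeral localhost port in a background thread."""
    server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), Handler)
    server.daemon_threads = True
    server.backend = backend  # the only place the backend is chosen
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def ask(port: int, request: str) -> str:
    """Send one request line to the server and return its one-line reply."""
    with socket.create_connection(("127.0.0.1", port), timeout=5) as conn:
        conn.sendall((request + "\n").encode("utf-8"))
        return conn.makefile("r", encoding="utf-8").readline().strip()
```

Clients only ever see the wire protocol, so replacing `EchoRecognizer` with a different backend would not require touching them.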

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 16:15 UTC (Mon) by rahulsundaram (subscriber, #21946) [Link] (10 responses)

As far as Fedora is concerned, the licensing guidelines are very clear and
detailed on this matter. Unless all the dependencies are free and open
source, it has no chance of going in. This is regardless of whether such an
approach is legal or not.

https://fedoraproject.org/wiki/Licensing:Main

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 17:32 UTC (Mon) by bedahr (guest, #60420) [Link] (9 responses)

The HTK is, strictly speaking, not a dependency of simon. It extends simon's functionality: without it, it is not possible to create speech models, but you can still use existing ones.

So I would much rather compare it to firefox / flash. Just because firefox's functionality can be extended using flash, that does not make firefox itself non-free...?

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 17:50 UTC (Mon) by jspaleta (subscriber, #50639) [Link] (8 responses)

Relying on code that can only be generated with a proprietary tool is touchy.
Are there existing speech models which can be shipped under an appropriate license? Are speech models binary blobs akin to byte-compiled Java or Python code (i.e. not allowed in Fedora)? Or are the speech models themselves self-documenting scripts that are then interpreted at run time by Simon? Could I realistically write or edit a speech model manually? If Fedora did choose to ship a pre-existing speech model generated by the proprietary tool and there was a bug found in the speech model, what steps would the Fedora maintainer of Simon need to take to fix the problem?

If Simon needs at least one speech model locally to be useful...we'd have to understand what the speech models are in terms of codebase and the implications thereof.

-jef

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 18:11 UTC (Mon) by dlang (guest, #313) [Link] (1 responses)

The GPL does not require that the entire toolchain be free.

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 18:50 UTC (Mon) by jspaleta (subscriber, #50639) [Link]

I did not say that it did. Fedora's policy is more nuanced than what is strictly allowed by the GPL. If individual users want to use the proprietary toolchain, they are free to, but at the project level Fedora puts a heavy emphasis on an open toolchain for contributors to use in testing and maintenance. I can use Intel's icc compiler on my own systems for its parallelization support... Fedora's not going to stop me... but I can't be expected to submit code to Fedora packages which requires icc-specific features to be useful. If a dependency has to be built with a proprietary toolchain, its inclusion would be exceptional and would require significant discussion, methinks.

The devil's in the details. I think a lot of people would need to study up on the details of this codebase's interaction with proprietary bits; this is not a fire-and-forget situation by any means. The point is, this isn't a common situation in terms of licensing a functional software stack, nor is it ideal. It is outside standing policy and common practice.

-jef

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 18:47 UTC (Mon) by bedahr (guest, #60420) [Link] (5 responses)

Speech models are not code. Think of them as documents (in this metaphor simon is a document editor).

Of course there are existing speech models.

You could even use speech models created by SPHINX-Train by using a speech model converter to convert the model to HTK format (there is such a converter available on sourceforge).

BUT: Speech models created by the HTK can be used _freely_ anyway. You can create models using HTK and then basically use them for whatever you want. This is also the reason why the voxforge initiative can build their speech model using the HTK and still license the model itself under the GPL.

The HTK plain text HMM format is well documented.

You can check out an example here: http://www.repository.voxforge1.org/downloads/Nightly_Bui...
(The file hmmdefs is the HMM model created by the HTK).
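To illustrate why a plain text model can be treated like a document, here is a small sketch that scans HMM definitions with nothing but the standard library. The `SAMPLE` text is a hand-written, heavily abbreviated fragment in the style the HTK Book describes (`~h "name"` macros, `<BeginHMM>` ... `<EndHMM>` blocks); it is not taken from the VoxForge file, and `hmm_state_counts` is a hypothetical helper, not part of HTK or simon.

```python
import re
from typing import Dict

# Hand-written, abbreviated fragment for illustration only (not a complete model).
SAMPLE = '''~h "ah"
<BEGINHMM>
<NUMSTATES> 3
<STATE> 2
<MEAN> 3
 0.1 0.2 0.3
<VARIANCE> 3
 1.0 1.0 1.0
<TRANSP> 3
 0.0 1.0 0.0
 0.0 0.6 0.4
 0.0 0.0 0.0
<ENDHMM>
~h "sil"
<BEGINHMM>
<NUMSTATES> 5
<ENDHMM>
'''


def hmm_state_counts(text: str) -> Dict[str, int]:
    """Map each HMM name (from a ~h macro line) to its declared number of states."""
    counts: Dict[str, int] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        name = re.match(r'~h\s+"([^"]+)"', line)
        if name:
            current = name.group(1)
            continue
        # Matched case-insensitively so that <NumStates> and <NUMSTATES> both work.
        states = re.match(r"<NUMSTATES>\s+(\d+)", line, re.IGNORECASE)
        if states and current is not None:
            counts[current] = int(states.group(1))
    return counts
```

A real hmmdefs file is far larger, but the same kind of text tooling (grep, diff, a text editor) would apply to it.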

I don't know what you mean by "bug in a speech model" but I am going to assume that you mean e.g. wrongly transcribed training samples. Well, fixing that would depend on how you built the model in the first place. In all likelihood you would end up changing the input files and re-generating the whole model with those new parameters (using the HTK, SPHINX or whatever was used in the first place).

For the record: There is an open source initiative called ghmm which tries to create a GPL-licensed library for working with HMM models. I contacted them and they said they were not ready for this kind of usage and generally want to be more general-purpose than the HTK, so I am not sure if they will be ready soon, or ever.

Also, the HTK is very high quality software and a good recognition rate is obviously the main goal for any speech recognition software - GPL or not.

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 18:56 UTC (Mon) by jspaleta (subscriber, #50639) [Link] (4 responses)

great!
...document format..not compiled code.
...open tool to convert other formats into that format.
...other formats creatable by open codebase.

This should be a non-issue if this comes up for discussion in a package review.

-jef

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 19:01 UTC (Mon) by bedahr (guest, #60420) [Link] (3 responses)

Thanks for actually _discussing_ this!

I can't count how often I have had this exact issue raised, but it always ended with someone crying out: "Uses non-GPL code! Kill it with fire!" (or similar) and not responding to any of my replies or explanations at all.

So again, thanks for understanding the complicated situation!

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 19:46 UTC (Mon) by jspaleta (subscriber, #50639) [Link] (2 responses)

Make sure you are able to make the speech-models-as-documents argument clear when someone steps up to submit the package. You might want to drop a blurb in a high-level readme in the simon codebase which speaks to this (if it's not there already). When/if this comes up for submission as a Fedora package, there's no guarantee the reviewers will have read the discussion here... but they will review the material in the simon codebase in discussion with the packager. Dropping a note into a readme will help make reviewers aware that speech models are editable text file content and should note, at a minimum, the existence of sphinx-train and the speech model format converter tool.

-jef

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 20:31 UTC (Mon) by bedahr (guest, #60420) [Link] (1 responses)

Yes I will add this information tomorrow.

Maybe I'll even add it to the FAQ of the project wiki...

But btw.: Has anyone even talked to the Fedora team? Or is this a hypothetical discussion? If so, it is oddly Fedora-specific IMHO.

Greetings,
Peter

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 20:47 UTC (Mon) by jspaleta (subscriber, #50639) [Link]

This is somewhat hypothetical... someone has to do the packaging work and submit it for review... and I'm not aware of anyone working on packaging Simon yet for Fedora. Hell, this is the first I've heard of it. I'm holding out for direct neural interfaces instead of speech... moving my mouth takes soooo much effort.

I'll bet you dollars to doughnuts members of Fedora's Technical leadership will read the discussion here and will be aware of the content argument. But ultimately it comes down to someone taking the responsibility to maintain the Simon package and start the package submission review process. A summary of the situation in faq or readme will help prevent an unnecessary delay once someone does step forward.

I would think a Debian packaging effort would also benefit from a summary of this discussion... if they aren't already working on packages. I think they'll have similar concerns but I'm less informed about the details of Debian policy with regard to "content" versus "code" than I am about Fedora's policy.

-jef

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 17:28 UTC (Mon) by bedahr (guest, #60420) [Link] (1 responses)

Yes. We are doing exactly what you described in your last paragraph.

The HTK is never linked to simon - simon starts the HTK executables during the building of the model, which makes it _perfectly_legal_ (if starting proprietary applications from within other applications were not allowed, simon would not be allowed to start, say, MS Office either).

All HTK-specific code is bundled in the simonspeechcompilation library (in one class) which could easily be swapped for a GPL replacement - if there were one.
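As a rough sketch of that arrangement (an assumption-laden illustration, not simon's actual code): the external toolkit is only ever run as a separate process, and that detail is confined to one class behind an interface. `ModelCompiler`, `ExternalToolCompiler` and `STAND_IN_TOOL` are hypothetical names; the stand-in command simply runs the Python interpreter, since the real HTK command lines are not shown here.

```python
import subprocess
import sys
from abc import ABC, abstractmethod
from typing import List


class ModelCompiler(ABC):
    """Hypothetical interface: the rest of the application only depends on this."""

    @abstractmethod
    def compile(self, training_dir: str) -> str:
        """Build a model from training data and return a status string."""


class ExternalToolCompiler(ModelCompiler):
    """Runs an external executable as a child process; nothing is linked in."""

    def __init__(self, command: List[str]) -> None:
        self.command = command

    def compile(self, training_dir: str) -> str:
        result = subprocess.run(
            self.command + [training_dir],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the tool exits non-zero
        )
        return result.stdout.strip()


# Stand-in for the real toolkit: a child Python process that prints its argument.
STAND_IN_TOOL = [sys.executable, "-c", "import sys; print('compiled ' + sys.argv[1])"]
```

Swapping in a different toolkit would then mean writing another `ModelCompiler` subclass, leaving callers unchanged.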

The same goes for Julius, btw.

The server application is called simond and uses TCP/IP. Audio streaming over the network is supported. As only the server uses the HTK, you could do a huge setup of simon with one main server that compiles the model, thus further limiting the need for the HTK (it only has to be installed once, on the server side).

As we have neither the know-how nor the resources to rewrite the HTK, this is all we can do for now, sadly.

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 24, 2009 18:21 UTC (Mon) by JoeBuck (subscriber, #2330) [Link]

As long as there is no executable that contains both GPL and GPL-incompatible code, and the coupling between the two executables is loose, it's probably OK. But my guess is Fedora still wouldn't touch it, and Debian would have to put even the free part into contrib (since it depends on a proprietary component).

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 25, 2009 2:58 UTC (Tue) by pabs (subscriber, #43278) [Link] (2 responses)

I am very, very much reminded of the Java trap:

http://www.gnu.org/philosophy/java-trap.html

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 25, 2009 7:39 UTC (Tue) by bedahr (guest, #60420) [Link] (1 responses)

... with the difference that the HTK is not needed to start / use simon.

You won't be able to compile a model without it but you can do other stuff (and use existing models, as said before).

In other words: Even in an "all free system" simon can still be useful.

Simon - speech activated user interface for KDE (KDE.News)

Posted Aug 28, 2009 3:08 UTC (Fri) by pabs (subscriber, #43278) [Link]

So, this would mean no models can get into Debian until HTK is replaced?

Licensing Amateurs

Posted Aug 25, 2009 10:43 UTC (Tue) by pboddie (guest, #50784) [Link] (1 responses)

For all we can tell, the software and work that went into making HTK could be quite good, but the licensing is marred by the restrictive, amateur mentality that comes up again and again in academia: either people want to "commercialize" or "monetize" their research, or they want other people to use their stuff and yet maintain almost total control. Both of these things contradict the proper practice of scientific inquiry, of course, and large regions of academia would do well to discover how Free Software has enabled collaboration in a far more effective way than some of the tame and inefficient ways that some scientists seem to think are "state of the art" (undoubtedly prompted by their corporate overlords at their institutions' "intellectual property" offices).

Licensing Amateurs

Posted Aug 25, 2009 12:39 UTC (Tue) by droundy (subscriber, #4559) [Link]

This is something FFTW got right. Just license it as GPL, with an option to pay for a more restrictive license. It's totally open, and the university (MIT in this case) gets any licensing fees they would have gotten were it restrictively distributed. Copyleft works for you!

The only missing element is an attribution clause for research done with the software. That and the GPL isn't so helpful for research binaries (as opposed to libraries), as other researchers are unlikely to redistribute their modified code.