file formats & ultra-low bandwidth encoding

PullMeUnder · Post by **PullMeUnder** » December 18th, 2005, 7:10 am

Hi all!

First of all, I have some quarrels with the available file formats on LibriVox, or rather with how they are presented:

1. No file sizes given with the links. Being on broadband, this isn't of too much importance to me, but last time I checked, there still were loads of modem users out there. Obviously, they'd rather choose the smallest file, and it doesn't become apparent which one that is (okay, 64 is smaller than 128, but what about the ogg file?)
2. "Ogg" isn't really saying much. The ogg file format is just a container format; you could compare it to a zip file (this goes wrong on many web pages). In theory, an ogg file could contain vorbis (the most common, comparable to mp3), flac (lossless, pretty common by now), speex (a voice compression(!)), or even theora (a video format). In real life, most files containing anything but ogg vorbis will have a different file extension (e.g. .flac, .spx, etc.), but this is not obligatory.
3. No bitrate or quality setting given for the ogg vorbis file.
4. Why only one bitrate for ogg vorbis files? Especially since ogg vorbis generally fares better than mp3 in the lower bitrates, as far as I know.
5. Additional formats. For one, allowing aac/mp4 might be considered, this format/container that's the dedicated successor of mp3. iTunes allows for a pretty simple and painless generation of these file types, and they come over pretty well in the low bitrates, too. And, in my opinion, even better/more important: Adding speex. Most probably you won't know this format, since it's not a very popular one. It's an open source codec designed for speech. I just experimented a bit with it on the Raven file. Not having the lossless version, I used the 128kbit ogg vorbis file, and transcoded it to a 14(!) kbit speex file. You definitely do notice the difference, but the quality still is very good. If anyone is interested, I can send them the sample, or put it on my webspace (I just hope this is within my rights to do. If not, someone please notify me).

Without further research, I assume that these problems partly arise from the usage of the archive.org facilities. But points 1 and 2 could still be easily remedied. And I think it is possible to upload different file formats to the archive, it just doesn't "know" them then. But when I googled for speex on archive.org, I found loads of spx files. And the difference between 8.7MB and 1MB in the file I tested it on is quite impressive, don't you think?
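To put the 8.7MB versus 1MB difference in modem terms, here is a back-of-the-envelope sketch in Python. It assumes a nominal 56 kbit/s modem running at its full rated speed and decimal megabytes; neither figure comes from this thread, and real dial-up downloads would be slower.

```python
def download_seconds(size_mb: float, link_kbps: float = 56.0) -> float:
    """Ideal transfer time for size_mb (decimal MB) over a link_kbps line."""
    bits = size_mb * 1_000_000 * 8
    return bits / (link_kbps * 1000)


# File sizes are the ones quoted in the post; the 56 kbit/s link is an assumption.
for label, size_mb in [("128kbit ogg vorbis", 8.7), ("speex", 1.0)]:
    minutes = download_seconds(size_mb) / 60
    print(f"{label}: about {minutes:.1f} min")
```

Under those assumptions the larger file ties up the line for roughly twenty minutes, the speex file for a bit over two.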

PS: http://www.hydrogenaudio.org is a good, probably the best, place to read about audio compression/formats.

Post by **hugh** » December 18th, 2005, 3:15 pm

Hello pullmeunder,

Thanks for the post, I'll respond point-by-point, but the general answer which covers just about all is this: LibriVox is an entirely volunteer effort, and the management of files (collection, verification, meta-data management, uploading, cataloging etc) is tedious and time-consuming, and to boot the archive.org system is not what one would call user-friendly, or management-friendly. All this work is done by volunteers, on their own time, with no remuneration except the satisfaction of doing the crucial work without which LibriVox would be a nice idea, and nothing else. So, I encourage you to volunteer your services to help remedy the problems you outline!

as for specifics:

1. No file sizes given with the links.

our process is this: individuals submit their files to book coordinators, who then submit to our (volunteer) catalogger (currently one person, with some help here and there), who then must check files and upload to archive.org, then import, add relevant metadata etc. This is a thankless, tedious, and time-consuming job as it is. to add file conversions, extracting & inputting time/file size data, etc is just too onerous for a volunteer to take on. We are currently building a more automated system that should, we hope, strip and publish much of this info, but we are not yet there, and doing the best we can. so again, if anyone would like to help on this, please step up.

2 "Ogg" isn't really saying much.

For those in the know you are right, for most people tho, ogg IS ogg vorbis, but your point is taken. we should specify at vorbis. speex would be fine (tho see below), tho I know none of my players (on my mac ibook) wish to play speex for whatever reason.

3. No bitrate or quality setting given for the ogg vorbis file.

The vorbis files are generated automatically by archive.org when you upload a 128 kbps mp3 (hence we require 128 kb mp3 from our volunteers); and to be honest I don't know what bitrate archive generates for vorbis, tho we could, of course check.

4. Why only one bitrate for ogg vorbis files? Especially since ogg vorbis generally fares better than mp3 in the lower bitrates, as far as I know.

See 3 above. Archive generates an ogg vorbis file, that's what we use. If some passionate volunteer very interested in sound file open formats wishes to find a way to provide us with other bitrate ogg vorbis files (or speex or ... ) without adding to the workload of the cataloggers, I would be happy.

5. Additional formats. For one, allowing aac/mp4 might be considered, this format/container that's the dedicated successor of mp3. iTunes allows for a pretty simple and painless generation of these file types, and they come over pretty well in the low bitrates, too.

we went with simplicity, but also recognize the importance of open formats. 128 mp3 converts automatically through archive.org to 64kbps mp3 and ogg vorbis. So that's what we do.

Again, if a volunteer can find a way to add formats, compression, or in any way help the project, I am supportive, as long as it does not add to the already arduous, thankless, and difficult job that our catalogging volunteers take on.

tis · Post by **tis** » December 18th, 2005, 3:57 pm

PullMeUnder wrote: Not having the lossless version, I used the 128kbit ogg vorbis file, and transcoded it to a 14(!) kbit speex file. You definitely do notice the difference, but the quality still is very good. If anyone is interested, I can send them the sample, or put it on my webspace (I just hope this is within my rights to do. If not, someone please notify me).

Please do put the sample on your webspace (or email it to me) - "all Librivox recordings are in the public domain" so you are certainly within your rights!

Perhaps you can also point me to a suitable tool for transcoding (ideally from 128kbps MP3 to speex), ideally that will compile and run on Debian and/or MacOSX.

Jon Ingram · Post by **Jon Ingram** » December 18th, 2005, 4:34 pm

I have a 12kbit Speex encode of the first part of my reading of The Communist Manifesto available here:
http://s47.yousendit.com/d.aspx?id=04MRRCKK9A8SA0488WD8Q50I2R
(should be there -- can't check at the moment as Yousendit's being very slow)
Sounds better than any equivalent 12kbit MP3 I could generate!

speexenc and speexdec, the commandline tools which encode and decode Speex files from .wav, are available for many operating systems from the Speex homepage.

PullMeUnder · Post by **PullMeUnder** » December 19th, 2005, 3:44 am

Thanks for all the replies to my post!

First and foremost, I'm sorry if I appeared to be making demands as if this was a service I had paid good money for. I recognize and deeply respect all the effort that has gone and is going into the librivox project for no profit at all. I merely wanted to point out a few areas where improvement (imho) is possible/desirable. Unfortunately, right now I don't have the time to look into the specifics of several of the issues that arose, but I hope to remedy that during the next few days. I'll answer what can be answered relatively quickly (though I have just spent over half an hour trying to find Mac software).

hugh wrote:
1. No file sizes given with the links.
our process is this: individuals submit their files to book coordinators, who then submit to our (volunteer) catalogger [...], who then must check files and upload to archive.org, then import, add relevant metadata etc. This is a thankless, tedious, and time-consuming job as it is. [...] We are currently building a more automated system [...]

This is a process I'll obviously have to look into more closely if I want to say anything serious about it. However, what are you expecting from the automated system? It shouldn't be too hard to whip up a simple & basic Python script that takes an uncompressed/losslessly compressed input file (wav, flac, ...) and does anything that needs to be done with it, i.e., converts it to various file formats, uploads it (depends on the exact upload procedure on archive.org, though), extracts meta-info and even generates a first draft of the publication website. And yes, I could imagine trying to come up with something in that area, once I understand what is needed. I can't make any promises beforehand, and it definitely will not be beautiful.

hugh wrote:
2 "Ogg" isn't really saying much.
For those in the know you are right, for most people tho, ogg IS ogg vorbis, but your point is taken. we should specify at vorbis.

I know I was being nit-picky, but in this case doing it right wouldn't hurt anyone. You might even consider underlaying the "ogg vorbis" with a link to the xiph homepage, so people can read up on that format.

hugh wrote:speex would be fine (tho see below), tho I know none of my players (on my mac ibook) wish to play speex for whatever reason.

Mac... oh, oops. Sometimes I wonder where the Mac got its "multimedia computer" reputation from. In a more serious vein, there really seems to be a lack of audio players capable of playing speex on the Mac. I don't have the time right now to look any further, but these are the possible solutions I've found: The Xiph Quicktime project seems to integrate with iTunes, and is thus probably the preferable way to go. Then there's the Java speex player, which, being Java, should run on the Mac as well. And then there's the speex Darwinport, which seems to be part of a larger project, no idea. Maybe someone could try these and tell us of the results? Especially the ease of installation and usage would be of interest here, I think.
By the way, for windows there are loads of possible ways to play speex files, including:

Foobar2000, a player that may be ugly in its basic configuration, but still very powerful, versatile, and an excellent format converter, too.
The Winamp plugin on the recommendable rarewares site. This has to be unpacked into the Winamp\plugins folder, no further configuration necessary.
The directshow filters, which allow for any directshow based player to play ALL the ogg formats (i.e. vorbis, speex, flac and theora). AFAIK this should allow playing speex in the Windows Media Player, though because of a profound dislike for that player I've never tried it.

hugh wrote:
3. No bitrate or quality setting given for the ogg vorbis file.

[...]I don't know what bitrate archive generates for vorbis, tho we could, of course check.

It's 128kbit, too, probably for all files. This suggests archive.org uses a constant bitrate setting for ogg vorbis, which is not at all desirable. But that's not under discussion here, for obvious reasons.

hugh wrote: Again, if a volunteer can find a way to add formats, compression, or in any way help the project, I am supportive, as long as it does not add to the already arduous, thankless, and difficult job that our catalogging volunteers take on.

As I said before, I'll have to look into the whole procedure and see how much effort it would be to come up with a very basic script that would automate at least some of the steps necessary. I can't make any promises, since if there's more to it than I'm seeing atm, I shouldn't really let my university workload suffer even more. But so far it SEEMS such a script would only have to take an input file, compress it to various formats, add tags that can either be derived from a descriptive filename or entered by the user, upload it to archive.org (this totally depends on the conditions of archive.org uploads; I have absolutely no idea how they work) and generate an html page containing tag-derived information and download links.
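As a rough illustration of the "tags derived from a descriptive filename" and "generate an html page" steps, here is a minimal Python sketch. The `author-title-section` naming scheme, the function names and the example URL are all invented for this example and are not LibriVox's actual conventions; the upload step is left out because the archive.org procedure isn't known here.

```python
import html
import re
from pathlib import Path

# Hypothetical naming scheme (an assumption): author-title-section.ext
FILENAME_RE = re.compile(
    r"^(?P<author>[a-z_]+)-(?P<title>[a-z0-9_]+)-(?P<section>\d+)$"
)


def tags_from_filename(path: str) -> dict:
    """Derive author/title/section tags from a file named under the assumed scheme."""
    match = FILENAME_RE.match(Path(path).stem)
    if match is None:
        raise ValueError(f"filename does not follow the assumed scheme: {path}")
    return {
        "author": match["author"].replace("_", " ").title(),
        "title": match["title"].replace("_", " ").title(),
        "section": int(match["section"]),
    }


def download_item(tags: dict, url: str) -> str:
    """Render one download link as an HTML list item, escaping text and URL."""
    label = f'{tags["title"]}, part {tags["section"]} ({tags["author"]})'
    return f'<li><a href="{html.escape(url, quote=True)}">{html.escape(label)}</a></li>'


tags = tags_from_filename("poe-the_raven-01.mp3")
print(download_item(tags, "http://example.org/poe-the_raven-01.spx"))
```

Tags that cannot be derived this way would still have to be entered by the user, as noted above.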

And now some meat - here are the Raven transcodes I spoke of in my first post. Both are transcoded from the ogg vorbis file@128kbit, raven_q4 is transcoded at quality=4 setting, raven_q1 at quality=1 setting. The vorbis file was 8.7MB, the speex q4 file is extremely well listenable (imho) and 1MB, the speex q1 file is still okay to listen to and it's only a meager 660KB(!). Try them for yourselves, compare them to the downloads on the librivox page, and tell us what you think. For those interested in the gory details, I used Foobar2000 to convert the files, and the settings for speexenc were: -u --vbr --quality 4 %d.
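For anyone without Foobar2000, the same settings can be passed to speexenc directly. The Python sketch below only builds the command lines; the `.wav` and `.spx` filenames are placeholders, and it assumes the input has already been decoded to a 32 kHz wav, since `-u` tells speexenc to treat the input as ultra-wideband (32 kHz). The `%d` in the settings above appears to be Foobar2000's placeholder for the output file, so it is replaced by an explicit filename here.

```python
import shlex


def speexenc_command(wav_in: str, spx_out: str, quality: int = 4) -> list:
    """Build a speexenc call using the flags quoted in the post."""
    if not 0 <= quality <= 10:
        raise ValueError("speexenc quality must be between 0 and 10")
    return ["speexenc", "-u", "--vbr", "--quality", str(quality), wav_in, spx_out]


# Placeholder filenames; the two quality settings are the ones from the post.
for q in (4, 1):
    print(shlex.join(speexenc_command("raven_32k.wav", f"raven_q{q}.spx", q)))
```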

One last issue: Did anyone ever try to create 32000 Hz mp3s instead of the usual 44100 Hz and upload those? Without testing it, I assume 32000 Hz is more than enough for mp3 voice files, and shaving off about one fourth of the frequency information should result in higher quality encodes, especially at 64 kbps. And yes, the speex files I transcoded are at 32 kHz, resampled using SSRC before transcoding them. In Foobar2000 that's one step.
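The "about one fourth" figure can be checked with a little arithmetic: by the sampling theorem, a sample rate of f can represent frequencies up to f/2. The Python sketch below is just that calculation; the function names are made up for the example.

```python
def nyquist_hz(sample_rate_hz: int) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


def bandwidth_saved(new_rate_hz: int, old_rate_hz: int = 44100) -> float:
    """Fraction of the representable frequency range dropped by resampling."""
    return 1 - new_rate_hz / old_rate_hz


for rate in (44100, 32000):
    print(f"{rate} Hz -> up to {nyquist_hz(rate):.0f} Hz, "
          f"{bandwidth_saved(rate):.0%} less than 44100 Hz")
```

At 32000 Hz the encoder still gets everything up to 16 kHz, while having roughly 27% less frequency range to spend bits on.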

Post by **hugh** » December 19th, 2005, 5:20 am

hi pullme, chris (tis) is managing the uploader/catalog software devt project - so give him a dingle if you're interested in helping out - even as a bug tester etc.

good luck with exams!

BradBush · Post by **BradBush** » December 19th, 2005, 7:43 am

I actually think you should be able to go down to 22 kHz for all speech (as that is what I use on my podcast with a VBR encoding of 96, and it sounds great - of course, I'm using a premium dither for 24bit, 44.1). I like all your suggestions, but we have to make sure various file formats are available, because I would say 99% of the people will not search out new software if they cannot get it to work with what they have (i.e. iTunes and Windows Media Player, which don't support ogg vorbis or speex out of the box).

Brad

thistlechick · Post by **thistlechick** » December 19th, 2005, 3:15 pm

PullMeUnder wrote:

No file sizes given with the links. Being on broadband, this isn't of too much importance to me, but last time I checked, there still were loads of modem users out there. Obviously, they'd rather choose the smallest file, and it doesn't become apparent which one that is (okay, 64 is smaller than 128, but what about the ogg file?)
...

Thanks for noting these issues... we'll be updating the catalog pages with file size info and additional information over the next couple of weeks. Thanks for your patience. =)

tis · Post by **tis** » December 19th, 2005, 3:19 pm

PullMe...

The automated system is currently intended to do the following:

- upload 128kbps mp3s to a staging server
- validate that all files are in the correct format (128kbps) and have consistent filenames, metadata etc.
- allow the correction of any files that have metadata errors
- upload the files to archive.org when the book is complete

It was initially designed to solve two problems: (i) that uploads to archive.org are password protected and that's a bottleneck and (ii) that volunteers make mistakes with metadata which can be a pain to check.
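To make the "consistent filenames" check concrete, here is a minimal sketch. The real validator is written in Ruby; this is Python, purely for illustration, and the `<book>_<nn>.mp3` scheme is a made-up convention, not the one the validator enforces. The 128kbps check is omitted because it needs an mp3 parsing library.

```python
import re
from pathlib import PurePath

# Hypothetical convention (an assumption): <book>_<two-digit section>.mp3
SECTION_RE = re.compile(r"^(?P<book>[a-z0-9_]+?)_(?P<number>\d{2})\.mp3$")


def filename_problems(filenames):
    """Return a list of human-readable problems; an empty list means consistent."""
    problems = []
    parsed = {}
    for name in filenames:
        match = SECTION_RE.match(PurePath(name).name)
        if match is None:
            problems.append(f"{name}: does not match <book>_<nn>.mp3")
        else:
            parsed[name] = (match["book"], int(match["number"]))
    # All sections of one book should share the same prefix.
    books = {book for book, _ in parsed.values()}
    if len(books) > 1:
        problems.append("mixed book prefixes: " + ", ".join(sorted(books)))
    # Section numbers should run 1, 2, 3, ... without gaps or repeats.
    numbers = sorted(number for _, number in parsed.values())
    if numbers != list(range(1, len(numbers) + 1)):
        problems.append(f"section numbers {numbers} are not 1..{len(numbers)}")
    return problems


print(filename_problems(["manifesto_01.mp3", "manifesto_03.mp3"]))
```

Returning a list of problems rather than failing on the first one fits the "allow the correction" step, since a volunteer could then fix everything in one pass.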

A lot of people have trouble moving (large!) wav files around, and a lot of our volunteers are not very technical, so I'm hesitant to ask them to do any more than generate an mp3. However, if we can find a _transcoder_ (I know, I know, never transcode!) from mp3 to speex then we could certainly add 'autogenerate speex files' to the automated system.

The validator runs on our server (Debian) and is developed on a mac, in Ruby (because we want to be consistent with our RoR catalog backend, also under development). If you know Python, Ruby won't be hard to pick up (I've never used it before this project, but it's pretty intuitive).

PullMeUnder · Post by **PullMeUnder** » December 20th, 2005, 9:18 am

I'm still on it, but X-Mas preps have finally caught up with me. I don't know when I'll be able to look into this some more, might be till after the holidays. My intentions at this point are:

1. Offering Chris a possibility to transcode from the 128kbit to low-bitrate speex (my problem atm is the tagging, but I haven't looked into it all too intensively yet).
2. Maybe enhance this mechanism and enable it to provide all sorts of hassle-free transcodes.
3. Offer a few speex files in varying bitrates for people to decide which bitrate(s) would be desirable to include in publications. Obviously, this only makes sense if a few people are interested in testing the various encodes and give their opinion on them - do we have those "few people"?

Of course, we wouldn't want to overdo this and offer dozens of possible encodings to the listener, who'd then be way too overwhelmed. This means either only one or two additional encodes at all, or more encodes but hide them from the main catalogue page.

kayray · Post by **kayray** » December 20th, 2005, 9:22 am

Hi Pullmeunder and all,

I'm sorry if this question has already been asked and answered (I've merely skimmed this thread), but do we know if archive.org will allow us to upload and store speex files with them? If we don't know, we'd better check before doing a lot of additional work, as we currently have no other long-term file storage options.

Kara

PullMeUnder · Post by **PullMeUnder** » December 20th, 2005, 11:40 am

Hi kayray,

from what I've been told, they allow the upload & storage and just don't DO anything with the file (such as the transcoding to 64 kbps and ogg vorbis upon an mp3 upload). So, to my very limited knowledge, this shouldn't be a problem.

tis · Post by **tis** » December 20th, 2005, 2:40 pm

That was my understanding; I'll contact them and confirm.

thistlechick · Post by **thistlechick** » January 1st, 2006, 7:53 pm

Each catalog page is now updated with the latest layout and file sizes ... there are a few file sizes missing and a few suspect, but overall, it is complete... we have also resolved a couple of the other concerns.

freqmod · Post by **freqmod** » July 26th, 2006, 1:01 pm

Hello, I would like to volunteer for the necessary software/changes to make LibriVox offer speex files for download.

I can program in (in order of preference) Ruby, Python, PHP, C(++) & Java, and can make an mp3 (libmad+libid3tag) -(libresample)> speex (libspeex+libogg) converter if necessary and integrate it with the existing converting software.

Speex is optimized for 32kHz, 16kHz or 8kHz files, and the files should be resampled to 32kHz. A 22050Hz file is bigger than a 32kHz file when coded at the same quality setting. A 32kHz mono audio file would have good quality when encoded with quality 4 vbr (which gives on average 15 kbps, or 8:50 (min:sec) per MB).
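The min:sec-per-MB figure follows directly from the average bitrate. A small Python sketch of the arithmetic, assuming a decimal megabyte (10^6 bytes) and ignoring container overhead:

```python
def playing_time_per_mb(avg_kbps: float) -> str:
    """Playing time (min:sec) that fits into one decimal MB at avg_kbps."""
    seconds = round(1_000_000 * 8 / (avg_kbps * 1000))
    return f"{seconds // 60}:{seconds % 60:02d}"


print(playing_time_per_mb(15))  # speex quality 4 vbr, average quoted above
print(playing_time_per_mb(64))  # the 64kbps mp3, for comparison
```

At exactly 15 kbps this gives 8:53, which is in line with the 8:50 quoted above given that the average bitrate of a vbr encode is only approximate.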

In case of a converter, would it run on the computers of the uploaders or a server, and if applicable which OS/languages does the server support?

I have already made my own converter Ruby script for re-encoding many files, using lame --decode, sox & speexenc, in case those programs may be installed.
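The three-tool chain mentioned above can be sketched as follows. This is a Python illustration, not the Ruby script itself; the intermediate filenames and function names are made up, the speexenc flags are the ones given earlier in the thread, and the sox call assumes a reasonably recent sox that resamples automatically when given an output rate.

```python
import shutil
import subprocess
from pathlib import Path

TOOLS = ("lame", "sox", "speexenc")


def conversion_commands(mp3_path: str, quality: int = 4) -> list:
    """Commands for mp3 -> wav -> 32 kHz mono wav -> speex, in order."""
    stem = Path(mp3_path).with_suffix("")
    decoded = f"{stem}.wav"
    resampled = f"{stem}_32k.wav"
    return [
        ["lame", "--decode", mp3_path, decoded],
        ["sox", decoded, "-r", "32000", "-c", "1", resampled],
        ["speexenc", "-u", "--vbr", "--quality", str(quality),
         resampled, f"{stem}.spx"],
    ]


def convert(mp3_path: str, quality: int = 4) -> None:
    """Run the chain, stopping at the first tool that fails."""
    missing = [tool for tool in TOOLS if shutil.which(tool) is None]
    if missing:
        raise RuntimeError("missing tools: " + ", ".join(missing))
    for command in conversion_commands(mp3_path, quality):
        subprocess.run(command, check=True)


for command in conversion_commands("raven.mp3"):
    print(" ".join(command))
```

Keeping the command building separate from `convert` means the chain can be inspected, or run on a server, without the tools being installed on the uploader's machine.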