Synthetic voices from Librivox recordings
Hi,
I wanted to share some information on what we are able to do with Librivox recordings, and give people a thread to discuss it, if they want.
I work for Toshiba Research Europe Ltd, in a group that does research on speech synthesis ("text-to-speech", "computer voices", i.e. where the computer reads out a text). As you might know, speech synthesis has improved a lot since it was first implemented, and is now often quite good for limited domains. However, it is still a long way away from humans for tasks such as reading a book. This is why there is a lot of research interest in this field. And as speech synthesis relies heavily on data these days, data (i.e. recordings of people reading books) is very important. This is how we came across Librivox. It's great.
We have downloaded some recordings and processed them. We can now turn recordings into individual synthetic voices, with which new texts can be synthesized. We have not yet made any texts synthesized with these voices public. However, we would like to do so. For example, we would like to be able to get feedback from people on which synthetic voices sound best. And maybe allow people to download synthesized books. And in the future, it is theoretically possible that the company would also like to use these voices for products (although it would be very unusual to use non-studio recordings directly for this).
I understand (and someone from the Librivox Admin Team has kindly confirmed this) that the public domain license also gives us the legal right to do this. However, we are planning to contact individual readers before we use the synthetic version of their voice publicly. So some of you might be getting a Private Message from me in the near future. But feel free to express your views here instead/in addition.
We know we are not the only ones using these recordings but there doesn't seem to be any previous forum discussion about this topic.
Thanks to everybody who contributed to the Librivox resource!
Dr Sabine Buchholz
Speech Technology Group
Cambridge Research Laboratory
Toshiba Research Europe Limited
http://www.toshiba-europe.com/research/crl/stg/index.html
-
- LibriVox Admin Team
- Posts: 60799
- Joined: June 15th, 2008, 10:30 pm
- Location: Toronto, ON (but Minnesotan to age 32)
Thanks for letting us know. I think this is great!
And yes, public domain means that you can use our recordings as you will, without permission. But it's very considerate of you to ask the readers.
(P.S. My later recordings, after about September 15, 2010, are of better quality due to a better microphone. This is just in case you ever wanted to use my recordings.)
School fiction: David Blaize
America Exploration: The First Four Voyages of Amerigo Vespucci
Serial novel: The Wandering Jew
Medieval England meets Civil War Americans: Centuries Apart
Wow that is exciting! So glad we can help benefit many more people like this too.
Esther
"Reasonable people adapt themselves to the world. Unreasonable
people attempt to adapt the world to themselves. All progress,
therefore, depends on unreasonable people." George Bernard Shaw
I second Tricia's opinion. There are all kinds of people using our stuff out there, so it is very nice of you to take the time to post and explain your goals. Thank you!
Interested in French recordings?
-
- Posts: 8065
- Joined: October 24th, 2007, 12:17 pm
- Location: Germany
Text to Speech is a wonderful thing; I use it quite a lot (there are not so many German recordings in our catalog). It's great to know that our recordings are used to create new and better voices, and I really appreciate that you take the trouble to tell us about it.
As a potential customer I'd like to say that I'm not so much interested in buying text to speech recordings. I'd rather buy the voice and generate the mp3 from the text I want to listen to.
My hope is that in the future tts will have developed so far that it will be possible to create voices easily. One day, I hope, there will be voices with a free license, so people can share the computer generated mp3 with others, like a sort of Librivox for tts. (What you get today is for personal use only or horribly expensive.)
Every piece of research is a tiny step in this direction, and it's great to know that LV recordings are a part of it.
Hokuspokus wrote: My hope is that in the future tts will have developed so far that it will be possible to create voices easily. One day, I hope, there will be voices with a free license, so people can share the computer generated mp3 with others, like a sort of Librivox for tts. (What you get today is for personal use only or horribly expensive.)
This might be what you are looking for:
http://mary.dfki.de/ (They have different voices, including German ones, with different licenses, and you can build your own. And I'm sure they appreciate feedback from a keen TTS user.)
or http://www.cstr.ed.ac.uk/projects/festival/ and http://festvox.org/
-
- Posts: 8065
- Joined: October 24th, 2007, 12:17 pm
- Location: Germany
You mean that someday I could click the text-to-voice feature on my Kindle...and hear myself??? I can't tell you how frightening that sounds.
sjmarky wrote: You mean that someday I could click the text-to-voice feature on my Kindle...and hear myself???
Well, yes, theoretically... But I wouldn't hold my breath.
sjmarky wrote: I can't tell you how frightening that sounds.
Double thanks then!
-
- Posts: 3647
- Joined: February 15th, 2009, 6:25 pm
- Location: Florida
Not to throw a wet blanket on things, but this reminds me of The Stepford Wives where the heroine is seemingly innocently made to record several hundred words, only to have them (spoiler alert)
.
.
.
.
.
.
.
.
.
installed into the robot replicant of herself by the mad scientists of Stepford.
They call me Threadkiller.
My Catalog Page
Hi, so, finally (took a long time to set the website up):
Check out
http://www.speechtechgroup.co.uk/eBook/enUS/index.php
to hear some synthetic versions of real Librivox readers.
Voices and the website are just first versions, and comments are very welcome!
Thanks,
Sabine
Wow, very interesting! Sounds a bit fuzzy, though, as if there was a lot of noise and heavy Noise Removal was done. Sorry, forgot to remove that hat!
No British accents, though?
A bit scary to think what they could "make" you say if the technology got really, really good, but if someone really wanted to do something like that, they could take someone's vast catalog and then cut and paste to make them say just about anything anyway (I think we've had this discussion before...).
Still, very cool and very awesome that Librivox readers are on the cutting edge of technology in this area!!
Dennis
"I ask to be allowed to have a lamp in the evening;
it is indeed wearisome sitting alone in the dark." ~ William Tyndale (1494-1536)
-
- Posts: 8065
- Joined: October 24th, 2007, 12:17 pm
- Location: Germany
Wow, that's pretty cool!
I like the way the voices deal with grammar, the structure of a sentence, the pauses etc. In that they all sound very human. I'm surprised that the sound quality is so different. Some sound like a very bad human recording with plosives and tinkle from too much noise cleaning, and others sound much better. I like the sound of voices 4-6, especially 6.
I have no idea who the original speakers might be.
With the voices I use on my computer I like that I can adjust the speed. To my taste they are just a tiny bit too fast. That might not be true for native speakers.
Great work!