ReVoxer - the LibriVox Recorder Development Proj

Post by **Stephan** » January 18th, 2006, 2:20 pm

Please check out the "replace" function in GoldWave. It's right next to copy, paste and mix.
Tbh, it's a function I wouldn't want to miss. If ReVoxer didn't have it, I'd stick with GoldWave.

I found it VERY helpful for redoing parts of our audiobooks,
especially when you have two files open and want to copy something from one file and replace-paste it into the other.
(Perfect for doing dialogue audiobooks too!)

Also check out the way it handles re-recording of marked areas:
You can mark an area, press record, and speak into the microphone. If what you say is longer than the old marked area, it adds new space on the go until you stop recording.
If what you say is shorter than the marked area, then it cuts out the rest of the erroneous marked area.

This way the text after the marked area always ends up automatically in the right place. No need to cut out empty parts or to fill in needed space.

Two examples:
An erroneous "he jumped [erm.in-to.ahh. the] tree and found...."
easily becomes "he jumped [higher into the thick olive] tree and found..."

An erroneous "he jumped [erm.in-to.ahh. the] tree and found...."
easily becomes "he jumped [into the] tree and found..."

Post by **hugh** » January 18th, 2006, 4:53 pm

* What compression algorithm and settings (bitrate?) do
LibriVox volunteers use?

most record in Audacity and export via LAME to 128 kbps mp3 ... which the Internet Archive then converts to 64 kbps mp3 and ogg.

Which ones are preferred by listeners? Does anybody
re-encode the audio?

the 64 kbps mp3 is the most used. we would like to offer Speex too if possible.

Post by **kri** » January 18th, 2006, 5:31 pm

Woohoo! I'm so excited to see this project progress. I was just thinking that I wish I had something to contribute to this project...and I do have web design skills. If you'd like, I could put together a more attractive design for the project page you have up. Check out my website (http://www.greenkri.com), I swear I do a pretty OK job!

Let me know if this would be of help, and I'll get started.

Post by **hugh** » January 19th, 2006, 8:05 am

cool! umut - i told Brewster Kahle of the Internet Archive about this project - i think he will like it.

kri, I bet you've found yourself a job, but umut is boss so I'll let him make the official benediction.

any logo makers out there?

Post by **tshirt** » January 19th, 2006, 10:07 am

kri wrote: Woohoo! I'm so excited to see this project progress. I was just thinking that I wish I had something to contribute to this project...and I do have web design skills. If you'd like, I could put together a more attractive design for the project page you have up. Check out my website (http://www.greenkri.com), I swear I do a pretty OK job!

Let me know if this would be of help, and I'll get started.

Hugh is right; with such a nice web page as your reference, you
sure get the job.

Welcome on board!

Post by **tshirt** » January 19th, 2006, 11:04 am

Stephan wrote: Two examples:
An erroneous "he jumped [erm.in-to.ahh. the] tree and found...."
easily becomes "he jumped [higher into the thick olive] tree and found..."

An erroneous "he jumped [erm.in-to.ahh. the] tree and found...."
easily becomes "he jumped [into the] tree and found..."

Hi Stephan,

I understand the importance of the Replace functionality.
However, your examples confused me in another
way. Do you use it to replace parts of a sentence? Or
do you mostly replace whole sentences? Can you tell
us more about how often you edit partial sentences?
Do you think it would negatively affect your productivity
if you were only able to edit whole sentences?

Post by **Stephan** » January 19th, 2006, 11:36 am

Sure. Let me try to explain in other words:

Words, sentences, paragraphs, large parts of text:
you often re-record them, don't you?
You mark them and redo them.

Whenever you re-record, there is no way that the new recording has exactly the same length as the old marked area. You talk slower or faster the second time.

In the older version of GoldWave, if you marked an area to correct and re-record, it recorded until the time marker reached the end of the marked area, and then it STOPPED recording.

If you were faster and finished your sentence earlier, fine, there was some space left over to cut out.

If you weren't finished with your sentence by then...BANG...out of recording time. It stopped in mid-sentence. You had to insert some "silent space" and re-record, hoping it would fit into the enlarged area this time.
-----------------
Today it works like this:
When you record into a marked area but your new take is longer, it adds the needed space on the go. When you click stop, the marked area has grown to the length of your new take, and the recording after the marked area is perfectly connected to the new one. It gets pushed back.

If your new take is shorter than the old marked area and you click stop, then the old marked area gets trimmed to the new recording. Erroneous left-overs are automatically cut out, because you had marked them to be replaced.
...again the recording after the marked area is perfectly connected to the re-recorded part. It gets pulled forward to meet the new recording.
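
To make the push-back/pull-forward behaviour concrete, here is a minimal sketch in Python. It is not GoldWave's actual code, just my assumption of the simplest thing that behaves as described: the marked area is cut out and the new take is spliced in, whatever its length. The function name `replace_marked_region` is made up for illustration, and words stand in for audio samples so the result is easy to read.

```python
def replace_marked_region(samples, start, end, new_take):
    """Return a new list in which samples[start:end] is replaced by new_take.

    The new take may be longer or shorter than the marked region; everything
    after the region is simply re-attached right behind the new take.
    """
    if not 0 <= start <= end <= len(samples):
        raise ValueError("marked region must lie inside the recording")
    return samples[:start] + new_take + samples[end:]


# Words stand in for audio samples (an illustration only).
recording = ["he", "jumped", "erm", "in-to", "ahh", "the", "tree", "and", "found"]

# Longer re-take: the rest of the recording gets pushed back.
longer = replace_marked_region(
    recording, 2, 6, ["higher", "into", "the", "thick", "olive"]
)

# Shorter re-take: the left-overs are cut and the rest is pulled forward.
shorter = replace_marked_region(recording, 2, 6, ["into", "the"])
```

In both cases "tree and found" ends up directly behind the new take, with no silent gap to cut and no space to insert by hand.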

I'd recommend you try out GoldWave's shareware trial and check out the replace function AND the "record into marked area" workflow.
I was so surprised when I installed a new GoldWave version and it worked like this. I thought: "Wow. Never wanna miss this."

Post by **tshirt** » January 19th, 2006, 11:55 am

Stephan wrote: If you weren't finished with your sentence by then...BANG...out of recording time. It stopped in mid-sentence. You had to insert some "silent space" and re-record, hoping it would fit into the enlarged area this time.

I thought: "Wow. Never wanna miss this."

I see, that's an important feature indeed. Thank you for pointing
it out. I think we should have this as one of the core functionalities.

We need to think about a simpler way to mark the "replace" portions
though. Since we are specifically dealing with voice recordings, maybe
editing at whole-sentence granularity will be sufficiently functional, and
easier to use. Basically we are targeting the simplest user interface
we can have for the core functionality (e.g. one text box). Then the
interface can be enriched by extensions to the core.

Post by **tshirt** » January 19th, 2006, 8:47 pm

hugh wrote: cool! umut - i told Brewster Kahle of the Internet Archive about this project - i think he will like it.

They are hosting the audio files, right? This reminds me of a question...
Is there a server where we can collect user created time-synch data?
We will need about 1 GB for starters, and reasonable upload speed.
It is not urgent though. I think I could use my server at home for some
alpha testing, but I don't think it can support more than a handful of clients
at a time.

Post by **HerrSchildkroete** » January 21st, 2006, 1:20 am

Hi all, this is an ambitious project - Great!

tshirt wrote: Is there a server where we can collect user created time-synch data?

With regard to the time-sync data I have two questions.

1.) How are we going to store the above data? An extra file? Some compressed data embedded in ID3v2 Tags? I can think of many possibilities, all of which have their advantages and drawbacks.
2.) Where and how is the information generated? While recording by manual user interaction? Automatically while recording, by some kind of voice analysis? This part seems quite complicated to me.

Post by **tshirt** » January 21st, 2006, 9:29 am

Hi HerrSchildkroete,

HerrSchildkroete wrote: 1.) How are we going to store the above data? An extra file? Some compressed data embedded in ID3v2 Tags? I can think of many possibilities, all of which have their advantages and drawbacks.

We will just collect the data as a separate file, which can be
easily converted for many other uses, including the ones you
have mentioned. However, as you understand, the information in
this file will depend on the audio, so the file by itself has no
value without the corresponding audio file.
The content of the file will depend on the type of information
collected; at its most basic it will include audio timestamps for text. (We
have discussed other types of feedback too; you can read about them
in the threads listed below.)
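
To give a feel for what such a separate file could hold, here is a rough sketch in Python. Nothing about the format is decided; the JSON layout, the field names (`audio`, `entries`, `start`, `text`), the file name `chapter01.mp3` and both function names are assumptions made up for this example.

```python
import bisect
import json


def dump_sync(audio_file, entries):
    """Serialise (start_seconds, text) pairs, sorted by time, as a JSON string.

    The audio file name is stored too, because the timestamps are
    meaningless without the recording they refer to.
    """
    ordered = sorted(entries)
    return json.dumps(
        {
            "audio": audio_file,
            "entries": [{"start": s, "text": t} for s, t in ordered],
        }
    )


def text_at(sync_json, seconds):
    """Return the text being read at the given playback time.

    Returns None if the time is before the first timestamp.
    """
    entries = json.loads(sync_json)["entries"]
    starts = [e["start"] for e in entries]
    i = bisect.bisect_right(starts, seconds) - 1
    return entries[i]["text"] if i >= 0 else None


# Hypothetical data: two text chunks with their start times in seconds.
sync = dump_sync(
    "chapter01.mp3",
    [(2.5, "and found..."), (0.0, "he jumped into the tree")],
)
current = text_at(sync, 1.0)
```

A player that shows text in step with the audio would only need a lookup like `text_at`, and the same pairs could later be converted to other containers such as tags embedded in the audio file.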

HerrSchildkroete wrote: 2.) Where and how is the information generated? While recording by manual user interaction? Automatically while recording, by some kind of voice analysis? This part seems quite complicated to me.

The information relevant to this question is spread across these earlier
threads:
http://librivox.org/forum/viewtopic.php?t=945
http://librivox.org/forum/viewtopic.php?t=660
and emails.

Eventually, we will use all 3 sources: listener feedback, reader feedback and audio analysis. We will start with listener feedback
by implementing a player program that displays text and plays audio in
synch. We are hoping to get more people interested in this project after
this stage.

There are challenges in the first stage already; one that I can see
right now is deciding on the tags when we have multiple (hopefully
a lot of) feedback submissions from many clients. We will need to experiment with
different ways of combining this feedback into one common result.
There are many machine learning algorithms for this problem. Which
one fits our application best will become clear as the system is used
and as we get feedback about their accuracy from users.
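
As a baseline to compare fancier methods against, here is a minimal sketch in Python. It is plain median voting, not a machine learning algorithm and not a decision about what ReVoxer will use. It assumes each client submits a mapping from sentence index to timestamp in seconds; that layout and the name `merge_feedback` are made up for this example.

```python
from statistics import median


def merge_feedback(submissions):
    """Combine per-client timestamps into one timestamp per sentence.

    submissions: list of dicts mapping sentence index -> timestamp (seconds).
    The median is used because a single careless click barely moves it.
    """
    by_sentence = {}
    for submission in submissions:
        for index, timestamp in submission.items():
            by_sentence.setdefault(index, []).append(timestamp)
    return {index: median(times) for index, times in sorted(by_sentence.items())}


# Three hypothetical clients; the third one tagged sentence 1 far too late.
clients = [
    {0: 0.0, 1: 2.4},
    {0: 0.1, 1: 2.6},
    {0: 0.0, 1: 9.0},
]
merged = merge_feedback(clients)
```

Here the outlier of 9.0 seconds does not drag sentence 1 away from where the other two clients put it, which a plain average would not manage.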

Once we have enough people to work on the harder problem of
integrating a recorder and a player for ReVoxer, we will incorporate
reader feedback and audio analysis. The recorder has to be easy to
use and custom-made for text recording; for this reason we left this
part to the second stage of development.

I hope these answer your questions to some degree. We are still in the
design phase, so please let us know if you have more questions/comments.

And... Please join us if you have time to contribute code to the
implementation.

Post by **tshirt** » January 22nd, 2006, 5:08 pm

At this point I think we need a design document, so that
everyone who reads it shares the same
understanding of what ReVoxer is planned to be and
how it will get to that point.

Could anybody help me put together a software design
document for ReVoxer?

If you could study the previous discussions and you have
basic software engineering and design experience, please
join us in this stage.

Post by **hugh** » January 22nd, 2006, 5:57 pm

umut, I can help with conceptual stuff, but nothing that requires technical knowledge. i.e. if you need any more help explaining what it should do, I can do that.

Post by **tshirt** » January 22nd, 2006, 7:11 pm

hugh wrote: umut, I can help with conceptual stuff, but nothing that
requires technical knowledge. i.e. if you need any more help explaining
what it should do, I can do that.

Deal

Details to come soon.

Post by **kri** » January 23rd, 2006, 10:08 am

This doesn't have anything to do with the design of ReVoxer, but more with the implementation. In a discussion about footnotes I was thinking that ReVoxer would be a great tool for avoiding having to read footnotes aloud in the audio, since they sometimes seem to be such a hindrance to the reading. This was my idea:

I just thought that it would be useful if you could still actually read the footnotes without having the audio interrupted. I'm just thinking aloud, so to speak, but one could put a little note in the text where a footnote appears that says "Pause for note" and insert the note in the text at that point. Hmm....

The reader would see that there's a note not in the audio, pause the recording to read it, and play when ready to continue.

Maybe you could even include a feature for this into ReVoxer itself. This would probably be a feature to be added when the core important stuff is set. For example, you could put something akin to a link in the text where you have the presence of a note. When one clicks on the [1] or * that marks a note, it would pause the recording and pop up a little text box with the footnote so you can read it.
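
Just to sketch how the clickable-marker idea might look, here is a small Python example. It assumes notes are marked as `[1]`, `[2]` and so on in the text; the `Player` class is a hypothetical stand-in that only tracks a paused flag, and the function names and sample sentence are made up as well.

```python
import re

# Matches numeric footnote markers such as [1] or [12].
MARKER = re.compile(r"\[(\d+)\]")


def find_footnote_markers(text):
    """Return (character_offset, footnote_number) for each [n] marker."""
    return [(m.start(), int(m.group(1))) for m in MARKER.finditer(text)]


class Player:
    """Hypothetical stand-in for the audio player; it only tracks a flag."""

    def __init__(self):
        self.paused = False


def on_marker_clicked(player, footnotes, number):
    """Pause playback and return the footnote text to show in a pop-up."""
    player.paused = True
    return footnotes[number]


# Made-up sample text and notes.
text = "He climbed the olive tree[1] and found a nest[2]."
footnotes = {1: "Olive trees can live for centuries.", 2: "Probably a dove's nest."}

markers = find_footnote_markers(text)
player = Player()
popup_text = on_marker_clicked(player, footnotes, 1)
```

The offsets from `find_footnote_markers` tell the display where to draw the clickable spots, and the audio itself never needs to contain the footnote.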