LibriVox
Forums



Posted: January 1st, 2010, 12:34 pm
LibriVox Admin Team

Joined: September 26th, 2005, 4:14 am
Posts: 7996
Location: Montreal, QC
@slattery ... what a great tool. how complicated is the syncing process?

_________________
hughmcguire.net | @hughmcguire


Posted: January 1st, 2010, 5:27 pm

Joined: December 31st, 2009, 10:55 pm
Posts: 7
Thanks for the feedback! I developed the online tool over the past few months. It uses the Google Web Toolkit, which helps make interactive web applications.

I didn't actually use the LRC format, nor a typical subtitling tool. The best tool I have used for aligning text to audio is the free, open-source Transcriber: http://trans.sourceforge.net/
I think it's about as easy to use as the hypothetical "ideal" tool you described.

Transcriber creates a .trs file, containing text and timestamp information. So you would have one .trs file per chapter. My online player can read .trs files, which is how it generates the interactive text display that you see.
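
As a rough illustration of how a player might read such a file: Transcriber stores its output as XML in which empty Sync elements carry the timestamps and the text for each point follows the tag. The sketch below reads a small made-up sample in that general shape. The sample text, the simplified structure (a real file also has a DOCTYPE and speaker/topic headers), and the function name read_segments are assumptions for illustration, not code from the DingLabs player.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the general shape of a Transcriber .trs file.
SAMPLE_TRS = """<Trans>
  <Episode>
    <Section type="report" startTime="0" endTime="7.5">
      <Turn startTime="0" endTime="7.5">
        <Sync time="0"/>
        Mind is the Master power
        <Sync time="3.2"/>
        that moulds and makes,
      </Turn>
    </Section>
  </Episode>
</Trans>"""


def read_segments(trs_text):
    """Return a list of (start_seconds, text) pairs, one per <Sync> point."""
    root = ET.fromstring(trs_text)
    segments = []
    for turn in root.iter("Turn"):
        for sync in turn.iter("Sync"):
            # The text for a sync point follows the empty <Sync/> tag,
            # so ElementTree exposes it as the element's "tail".
            text = (sync.tail or "").strip()
            segments.append((float(sync.get("time")), text))
    return segments


segments = read_segments(SAMPLE_TRS)
```

Each pair gives the time at which a piece of text starts, which is all a player needs to highlight text during playback.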

To support a book with multiple chapters, I created a simple XML file that lists which .trs and MP3 files are used for each chapter. Here's an example of such a file: http://content.dinglabs.com/book/thinketh
My player just takes the URL of such an XML file and generates the interface to play back that book. You can see the URL passed as a parameter at the end: http://reader.dinglabs.com/#b:=http://content.dinglabs.com/book/thinketh
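
For anyone scripting against links like this, here is a small sketch of pulling the book URL back out of a reader link. It assumes the parameter always appears in the URL fragment as "b:=" followed by the book URL, which is inferred only from the single link above; the function name is made up.

```python
from urllib.parse import urlsplit


def book_url_from_reader_link(link):
    """Return the book-description URL carried in the reader link's fragment."""
    fragment = urlsplit(link).fragment  # everything after the first '#'
    key, sep, value = fragment.partition(":=")
    if key != "b" or not sep:
        raise ValueError("no 'b:=' parameter in link fragment")
    return value


book_url = book_url_from_reader_link(
    "http://reader.dinglabs.com/#b:=http://content.dinglabs.com/book/thinketh"
)
```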

I don't know of any better tool than Transcriber for manually synchronizing audio to text. Maybe there are some tools we can use to do this automatically? I know YouTube has recently added functionality like this!


Posted: January 2nd, 2010, 1:14 am

Joined: February 6th, 2009, 8:21 am
Posts: 4124
Location: Pittsburgh, PA
That's NIFTY, slattery! :9:
Like Jc said, maybe we should put this up on Other Projects and see how many are interested. It would be wonderful if we could offer the text along with the audio!


Posted: January 2nd, 2010, 9:58 am
LibriVox Admin Team

Joined: September 26th, 2005, 4:14 am
Posts: 7996
Location: Montreal, QC
Quote:
Transcriber creates a .trs file, containing text and timestamp information.

I'm not clear though: Is the .trs file generated from the (LibriVox) audio, or from the (Gutenberg) text? Or both?

That is:
* If the text is generated from the audio, you will certainly have transcription errors?
* If the text comes from the original source, how do you match it with timestamps on the audio?

Or, does the generated text (from audio) get compared with the original? That would probably make the most sense, I guess, and, ahem, "shouldn't be too difficult."

_________________
hughmcguire.net | @hughmcguire


Posted: January 2nd, 2010, 12:20 pm

Joined: December 31st, 2009, 10:55 pm
Posts: 7
hugh wrote:
I'm not clear though: Is the .trs file generated from the (LibriVox) audio, or from the (Gutenberg) text? Or both?

Good question, I should explain that better. Using Transcriber is a completely manual process. You must provide the text and the audio, and it just provides a good interface for creating timestamps between the two.

So when I run Transcriber to align a chapter of a book, I tell it which audio to use, and then I paste in the exact text of the chapter from Project Gutenberg. As the audio plays, you can hit 'Enter' at different points in the text to create a synchronization point there with the audio. It's the fastest way I know of to manually align a given transcript with an audio file. I also add some special markup to indicate paragraph breaks, bold headings, and images.

So the .trs file pretty much contains the original text from Project Gutenberg, except it has added timestamp information. When my online tool reads the .trs file, it assembles all of the time-stamped text segments to display nicely into paragraphs.
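
A minimal sketch of that assembly step, assuming the segments have already been read as (seconds, text) pairs. The "{p}" paragraph marker is hypothetical, since the actual markup isn't described here, and the lookup helper shows one common way (binary search) to find which segment to highlight at a given playback time.

```python
from bisect import bisect_right

PARAGRAPH_BREAK = "{p}"  # hypothetical marker; the real markup is not described

segments = [
    (0.0, "First sentence of the chapter."),
    (2.5, "Second sentence, same paragraph."),
    (5.0, PARAGRAPH_BREAK),
    (5.0, "A new paragraph starts here."),
]


def build_paragraphs(segments):
    """Join consecutive text segments, starting a new paragraph at each marker."""
    paragraphs, current = [], []
    for _, text in segments:
        if text == PARAGRAPH_BREAK:
            if current:
                paragraphs.append(" ".join(current))
                current = []
        else:
            current.append(text)
    if current:
        paragraphs.append(" ".join(current))
    return paragraphs


def segment_index_at(segments, seconds):
    """Index of the last segment starting at or before `seconds` (0 if none)."""
    times = [t for t, _ in segments]
    return max(bisect_right(times, seconds) - 1, 0)
```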

I'm looking into software which might be able to automate a lot of this synchronization work. P2FA stands out as a good candidate: http://www.ling.upenn.edu/phonetics/p2fa/
If anyone else has experience with such software, I would love to talk with them!


Posted: January 2nd, 2010, 12:38 pm
LibriVox Admin Team

Joined: September 26th, 2005, 4:14 am
Posts: 7996
Location: Montreal, QC
ah right - so a pretty labour-intensive effort...would be great to have it more automated ;-)

_________________
hughmcguire.net | @hughmcguire


Posted: January 7th, 2011, 12:23 pm

Joined: August 27th, 2010, 11:09 am
Posts: 125
Location: Portland, Oregon, US
JC,

I was just thinking about this very thing, and I posted this to another forum thread:

This is a free lyric editor. You copy and paste the text of your file into the editor and while Winamp is playing the mp3, you press F5 whenever the recording reaches a new line. I copied and pasted from the txt file at Gutenberg, and tried it out. It works great. You then save the file with the same name as the mp3, only use lrc as the file extension.

http://www.mycnknow.com/download/TUTORIAL/tutor.htm

Here's a Winamp plugin that displays the lrc file. It installs itself into the Visualization plugin area of Winamp. It has an option of left-justifying the text, too. It highlights each line as you get to it during playback.

http://www.winamp.com/plugin/joseph-dke-lyrics-plugin/221546

I played with one of my own recordings, and it works great, and it's very easy to do.

The only additional work needed, beyond the ordinary LibriVox recording process, would be for someone to listen to the final recording and press F5 in the lyric editor to make an lrc file. The PL'er could do this :). Then the lrc file could be made available for download along with the mp3 files (I don't know if this works with ogg files too).
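
For reference, an lrc file is plain text with one time tag per line in the form [mm:ss.xx]. Here is a small sketch of producing those lines from (seconds, text) pairs; the function names and sample lines are made up, and it covers only the simple line-level tags, not the extended per-word variants some players support.

```python
def lrc_timestamp(seconds):
    """Format seconds as the [mm:ss.xx] tag used in LRC files."""
    centis = round(seconds * 100)
    minutes, centis = divmod(centis, 6000)
    secs, centis = divmod(centis, 100)
    return f"[{minutes:02d}:{secs:02d}.{centis:02d}]"


def to_lrc(lines):
    """Turn (seconds, text) pairs into the body of an .lrc file."""
    return "\n".join(f"{lrc_timestamp(t)}{text}" for t, text in lines)


lrc_text = to_lrc([(0.0, "Chapter one."), (4.25, "The first line of text.")])
```

Saving lrc_text under the mp3's base name with an .lrc extension matches the naming rule described above.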

_________________
"A mind once stretched by a new idea never regains its original dimensions."


Posted: April 15th, 2013, 4:14 pm

Joined: March 27th, 2013, 9:46 pm
Posts: 5
Hey Slattery,

Great work so far on your web-based UI and on your research into forced alignment. Transcriber is really cool too.

I've been searching for mobile products that play audio books with text synchronized because I'm trying to learn another language. Looks like Kindle (amazon.com) has made some strides in that area recently with "Whispersync with Immersion Reading." Here's the information on that:
http://www.amazon.com/gp/help/customer/display.html?nodeId=200375890

Unfortunately, while they have 15,000 titles with that feature, they don't have the synchronized audio books that I want to play. So I tried playing around with the CMU Sphinx aligner, but I'm still struggling to get it to work.
Here's what I'm working off of:
http://cmusphinx.sourceforge.net/2011/08/long-audio-alignment-phrase-spotter-and-the-subsequent-improvements/
http://cmusphinx.sourceforge.net/wiki/longaudioalignment
http://sourceforge.net/p/cmusphinx/code/HEAD/tree/branches/long-audio-aligner/

Have you been working much on your project?


Posted: April 16th, 2013, 6:05 am

Joined: March 1st, 2011, 2:19 pm
Posts: 2033
Location: Surrey, England
I'm not sure that you'll get a response from slattery. His/her last post was in January 2011.

Carol

_________________
My Librivox Recordings


Posted: April 17th, 2013, 8:39 am

Joined: December 31st, 2009, 10:55 pm
Posts: 7
Hi derrill,

I did have some success with the P2FA aligner. It worked for English, out of the box. In fact, I aligned an entire book, The Linguist by Steve Kaufmann, and have posted the complete book online. Here's an example chapter in my online player. You can see how every word is synchronized to the audio.

My workflow was to use the p2fa command line to align the content, then I wrote my own converter to turn that output into a Transcriber .trs file. From there, I could do some manual touchups and verification, and it was ready to use with the DingLabs Reader.
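
The converter itself isn't shown here, but the idea can be sketched roughly as follows, assuming the aligner output has already been reduced to (start, end, word) tuples. The function name, the sample words, and the stripped-down XML (no DOCTYPE, speakers, or topics, which a real Transcriber file carries) are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def words_to_trs(words):
    """Build a minimal Transcriber-style XML string from (start, end, word) tuples."""
    end_time = str(words[-1][1])
    trans = ET.Element("Trans")
    episode = ET.SubElement(trans, "Episode")
    section = ET.SubElement(
        episode, "Section", type="report", startTime="0", endTime=end_time
    )
    turn = ET.SubElement(section, "Turn", startTime="0", endTime=end_time)
    for start, _end, word in words:
        # One sync point per word; the word itself goes after the empty tag.
        sync = ET.SubElement(turn, "Sync", time=str(start))
        sync.tail = "\n" + word + "\n"
    return ET.tostring(trans, encoding="unicode")


trs = words_to_trs([(0.0, 0.41, "ABOU"), (0.41, 0.7, "BEN")])
```

In practice you would probably want to substitute the original cased and punctuated text for the aligner's upper-case tokens before writing the file.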

I'm interested in doing this for other languages, but I haven't invested any more time in this area.

One other tool I wanted to try out was: Prosodylab-Aligner
That looked like a great way to align content in any language.


Posted: April 18th, 2013, 11:24 am

Joined: March 27th, 2013, 9:46 pm
Posts: 5
Hi Slattery,

Thanks for replying!

First, let me tell you what I'm trying to do and why. I'm trying to learn a new language (Spanish) and I have tons of audio books with text. However, the audio goes too fast for me, and I need to rewind a lot. What I'm doing now is trying to pause on every period, then clicking rewind 15s on my mobile phone. This is really a pain. I would like it to automatically pause on the period, and then allow me to continue on to the next sentence or rewind that sentence. It would also be cool to display the sentence last played.

Pretty much, I'm trying to do what you did in your web app, but in a mobile app with automatic sentence pausing and navigation buttons: repeat previous sentence, next sentence, etc. Obviously, if something like this has already been done, there's no sense reinventing the wheel, but I have yet to find anything like it that works with arbitrary content. (As I mentioned, Kindle's Immersion Reading has a feature like it, but they don't have the content I want.)

Anyway, I did download p2fa and tried it out on a random LibriVox recording/text. The text did require some massaging to get rid of unknown word errors (adding spaces, replacing single quotes with double quotes, etc.). Unfortunately, I'm getting a strange error, "ERROR [+8522] LatFromPaths: Align have dur<=0" (below). Did you run into this error?

ddabkoski@ddabkoski-wsl:~/Downloads/p2fa$ python align.py -s 25 abou_hunt_py_64kb.wav abou_hunt_py_64kb.txt ./test/abou.TextGrid
Resampling wav file from 24000 to 11025 trim 25...
sox WARN sox: effect `polyphase' is deprecated; see sox(1) for an alternative
SKIPPING WORD ADHEM
SKIPPING WORD —
SKIPPING WORD —
SKIPPING WORD ADHEM
SKIPPING WORD WRITEST
SKIPPING WORD —
SKIPPING WORD CHEERLY
SKIPPING WORD WAKENING
SKIPPING WORD ADHEM’S
./tmp/sound.wav -> ./tmp/tmp.plp
ERROR [+8522] LatFromPaths: Align have dur<=0
FATAL ERROR - Terminating program HVite
Traceback (most recent call last):
  File "align.py", line 316, in <module>
    writeTextGrid(outfile, readAlignedMLF(output_mlf, SR, float(wave_start)))
  File "align.py", line 135, in readAlignedMLF
    raise ValueError("Alignment did not complete succesfully.")
ValueError: Alignment did not complete succesfully.
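
The SKIPPING WORD lines in the log above include a bare dash and a word with a curly apostrophe, which suggests typographic punctuation is part of the problem. Below is a rough sketch of the kind of text clean-up that might help before alignment. It assumes the aligner's dictionary uses upper-case ASCII words; the function names are made up, and genuinely unknown words (names, archaic forms) would still need dictionary entries, which this does not address.

```python
import re


def normalize_for_aligner(text):
    """Replace typographic punctuation that an ASCII dictionary cannot match."""
    text = text.replace("\u2019", "'").replace("\u2018", "'")  # curly apostrophes
    text = text.replace("\u201c", '"').replace("\u201d", '"')  # curly double quotes
    text = re.sub(r"[\u2013\u2014]", " ", text)  # en and em dashes become spaces
    return re.sub(r"\s+", " ", text).strip()


def unknown_words(text, dictionary):
    """List upper-cased words from `text` that are missing from `dictionary`."""
    words = re.findall(r"[A-Za-z']+", text.upper())
    return [w for w in words if w not in dictionary]
```

Running unknown_words over the cleaned text with the set of words from your pronunciation dictionary gives a list to fix up before starting a long alignment run.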

Source:
Audio: http://www.archive.org/download/short_poetry_001_librivox/abou_hunt_py_64kb.mp3
Text: http://www.bartleby.com/41/524.html


Posted: April 18th, 2013, 2:02 pm

Joined: March 27th, 2013, 9:46 pm
Posts: 5
The error might come from running HTK 3.4.1 rather than 3.4... Stand by.


Posted: April 18th, 2013, 2:21 pm

Joined: March 27th, 2013, 9:46 pm
Posts: 5
Yes, using 3.4 got rid of the error! The TextGrid looks accurate too.


Posted: April 18th, 2013, 2:37 pm

Joined: March 27th, 2013, 9:46 pm
Posts: 5
Slattery,

So in thinking about what I'm trying to achieve, one very crude solution could be to break the wav file into tracks by sentence. That way I could just use the track navigation that comes with a standard audio player on iPhone/Android. Pretty crude, but at least I can start rewinding on a per-sentence basis. (If I wanted, I could also add the sentence text for each segment as the track "lyrics".)
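
A minimal sketch of that splitting step using only Python's standard wave module, assuming the sentence boundaries are already known as (start_seconds, end_seconds) pairs, for example derived from the aligner output. The function name and the output file pattern are made up.

```python
import wave


def split_wav(path, spans, out_pattern="sentence_{:03d}.wav"):
    """Write one WAV file per (start_seconds, end_seconds) span; return the paths."""
    out_paths = []
    with wave.open(path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for i, (start, end) in enumerate(spans, 1):
            src.setpos(int(start * rate))  # seek to the first frame of the span
            frames = src.readframes(int((end - start) * rate))
            out_path = out_pattern.format(i)
            with wave.open(out_path, "wb") as dst:
                dst.setparams(params)  # frame count in the header is fixed on close
                dst.writeframes(frames)
            out_paths.append(out_path)
    return out_paths
```

Calling split_wav("chapter.wav", [(0.0, 3.2), (3.2, 7.5)]) would write sentence_001.wav and sentence_002.wav; converting those to mp3 and tagging them would be a separate step.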

So I'm not particularly familiar with the TextGrid format. Can you describe more about the implementation you used for your web-based version? I'm assuming you had to convert the TextGrid into something else...
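
For what it's worth, TextGrid is Praat's annotation format: a list of tiers, each holding intervals with a start time, an end time, and a label. Here is a rough reader sketch, assuming the file is in Praat's short text format (the first line says "ooTextFile short") and contains only interval tiers with single-line labels. If your file is in the long format with "xmin = ..." labels, this will not apply as written. The function name and sample data are made up.

```python
SAMPLE_TEXTGRID = '''File type = "ooTextFile short"
"TextGrid"

0.0
1.2
<exists>
1
"IntervalTier"
"word"
0.0
1.2
2
0.0
0.5
"ABOU"
0.5
1.2
"BEN"
'''


def read_short_textgrid(text):
    """Parse interval tiers into {tier_name: [(xmin, xmax, label), ...]}."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    pos = 2   # skip the 'File type' and '"TextGrid"' header lines
    pos += 3  # skip global xmin, global xmax and the <exists> flag
    n_tiers = int(lines[pos])
    pos += 1
    tiers = {}
    for _ in range(n_tiers):
        tier_class = lines[pos].strip('"')
        name = lines[pos + 1].strip('"')
        n_items = int(lines[pos + 4])  # after the tier's own xmin and xmax
        pos += 5
        if tier_class != "IntervalTier":
            raise ValueError("only interval tiers are handled in this sketch")
        intervals = []
        for _ in range(n_items):
            xmin, xmax = float(lines[pos]), float(lines[pos + 1])
            label = lines[pos + 2][1:-1].replace('""', '"')  # undo quote doubling
            intervals.append((xmin, xmax, label))
            pos += 3
        tiers[name] = intervals
    return tiers


tiers = read_short_textgrid(SAMPLE_TEXTGRID)
```

Word intervals in this shape can then be grouped into sentences by matching them back against the punctuated source text.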


Posted: April 20th, 2013, 1:43 pm

Joined: July 24th, 2008, 11:48 am
Posts: 2298
Location: Midwest, USA
The tech-y part of this thread is way over my head, but just to say there is a synchronized voice/text video on YouTube of my reading of Geronimo, http://www.youtube.com/watch?v=0oWS4ydlMEA, done by somebody (?) called the 16th Cavern. I had nothing to do with it.

_________________
Sue

My LibriVox Recordings
For Variety Visit the Nonfiction Collection

