Suggestion: synchronized audio/text

Post by **hugh** » January 1st, 2010, 12:34 pm

@slattery ... what a great tool. how complicated is the syncing process?

slattery · Post by **slattery** » January 1st, 2010, 5:27 pm

Thanks for the feedback! I developed the online tool over the past few months. It uses the Google Web Toolkit, which helps make interactive web applications.

I didn't use the LRC format, actually, nor a typical sub-titler tool. The best tool I have used for aligning text-to-audio has been the free, open-source Transcriber: http://trans.sourceforge.net/
I think it's about as easy to use as the hypothetical "ideal" tool you described.

Transcriber creates a .trs file containing text and timestamp information, so you would have one .trs file per chapter. My online player can read .trs files; that's how it generates the interactive text display that you see.

To support a book with multiple chapters, I created a simple XML file to list which .trs and MP3 files are used for each chapter. Here's an example of such a file: http://content.dinglabs.com/book/thinketh
My player just takes the URL to such an XML file and generates the interface to play back that book. You can see the URL passed as a parameter at the end: http://reader.dinglabs.com/#b:=http://content.dinglabs.com/book/thinketh
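As a rough illustration of the idea (not the actual DingLabs schema, which isn't shown in this thread), a chapter list like that could be read as follows. The element and attribute names (`book`, `chapter`, `audio`, `transcript`) and the file names are all hypothetical; only the `#b:=` URL shape comes from the link above.

```python
import xml.etree.ElementTree as ET

# Hypothetical manifest layout; the real file at content.dinglabs.com may differ.
MANIFEST = """<book title="Example Book">
  <chapter title="Chapter 1" audio="ch01.mp3" transcript="ch01.trs"/>
  <chapter title="Chapter 2" audio="ch02.mp3" transcript="ch02.trs"/>
</book>"""

def read_chapters(xml_text):
    """Return (title, audio file, transcript file) for each chapter, in order."""
    root = ET.fromstring(xml_text)
    return [(c.get("title"), c.get("audio"), c.get("transcript"))
            for c in root.iter("chapter")]

def reader_url(manifest_url, reader="http://reader.dinglabs.com/"):
    """Build a player link that passes the manifest URL after '#b:='."""
    return f"{reader}#b:={manifest_url}"

for title, audio, transcript in read_chapters(MANIFEST):
    print(title, audio, transcript)
print(reader_url("http://content.dinglabs.com/book/thinketh"))
```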

I don't know of any better tool than Transcriber for manually synchronizing audio to text. Maybe there are some tools we can use to do this automatically? I know YouTube has recently added functionality like this!

neerajanagarajan · Post by **neerajanagarajan** » January 2nd, 2010, 1:14 am

That's NIFTY, slattery!

Like Jc said, maybe we should put this up on Other Projects and see how many are interested. It would be wonderful if we could offer the text along with the audio!

Post by **hugh** » January 2nd, 2010, 9:58 am

slattery wrote:Transcriber creates a .trs file, containing text and timestamp information.

I'm not clear though: Is the .trs file generated from the (LibriVox) audio, or from the (Gutenberg) text? Or both?

That is:
* If the text is generated from the audio, you will certainly have transcription errors?
* If the text comes from the original source, how do you match it with timestamps on the audio?

Or does the generated text (from audio) get compared with the original? That would probably make the most sense, I guess, and, ahem, "shouldn't be too difficult."

slattery · Post by **slattery** » January 2nd, 2010, 12:20 pm

hugh wrote:I'm not clear though: Is the .trs file generated from the (LibriVox) audio, or from the (Gutenberg) text? Or both?

Good question, I should explain that better. Using Transcriber is a completely manual process. You must provide the text and the audio, and it just provides a good interface for creating timestamps between the two.

So when I run Transcriber to align a chapter of a book, I tell it which audio to use, and then I paste in the exact text of the chapter from Project Gutenberg. As the audio plays, you can hit 'Enter' at different points in the text to create a synchronization point there with the audio. It's the fastest way I know of to manually align a given transcript with its audio. I also add some special markup to indicate paragraph breaks, bold headings, and images.

So the .trs file pretty much contains the original text from Project Gutenberg, except it has added timestamp information. When my online tool reads the .trs file, it assembles all of the time-stamped text segments to display nicely into paragraphs.
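To make the idea concrete, here is a minimal sketch of pulling time-stamped segments out of a .trs file. It assumes the usual Transcriber layout, where each `<Sync time="..."/>` marker inside a `<Turn>` is followed by the text spoken from that point; the sample content is made up, and this is not the code my player uses.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the assumed Transcriber layout (Trans > Episode > Section > Turn > Sync).
SAMPLE_TRS = """<Trans audio_filename="chapter1">
  <Episode>
    <Section type="report" startTime="0" endTime="5.0">
      <Turn startTime="0" endTime="5.0">
        <Sync time="0"/>
        First segment of the chapter text
        <Sync time="2.4"/>
        second segment of the chapter text.
      </Turn>
    </Section>
  </Episode>
</Trans>"""

def read_segments(trs_text):
    """Return (start_seconds, text) pairs, one per non-empty Sync segment."""
    segments = []
    for turn in ET.fromstring(trs_text).iter("Turn"):
        for sync in turn.iter("Sync"):
            # The text following a Sync marker is stored as that element's tail.
            text = (sync.tail or "").strip()
            if text:
                segments.append((float(sync.get("time")), text))
    return segments

for start, text in read_segments(SAMPLE_TRS):
    print(f"{start:6.2f}  {text}")
```

A player can then highlight whichever segment has the latest start time not exceeding the current playback position.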

I'm looking into software which might be able to automate a lot of this synchronization work. P2FA stands out as a good candidate: http://www.ling.upenn.edu/phonetics/p2fa/
If anyone else has experience with such software, I would love to talk with them!

Post by **hugh** » January 2nd, 2010, 12:38 pm

ah right - so a pretty labour-intensive effort...would be great to have it more automated

corinna · Post by **corinna** » January 7th, 2011, 12:23 pm

JC,

I was just thinking about this very thing, and I posted this to another forum thread:

This is a free lyric editor. You copy and paste the text of your file into the editor and while Winamp is playing the mp3, you press F5 whenever the recording reaches a new line. I copied and pasted from the txt file at Gutenberg, and tried it out. It works great. You then save the file with the same name as the mp3, only use lrc as the file extension.

http://www.mycnknow.com/download/TUTORIAL/tutor.htm

Here's a Winamp plugin that displays the lrc file. It installs itself into the Visualization plugin area of Winamp. It has an option of left-justifying the text, too. It highlights each line as you get to it during playback.

http://www.winamp.com/plugin/joseph-dke-lyrics-plugin/221546

I played with one of my own recordings, and it works great, and it's very easy to do.
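For anyone curious what ends up in the file: an lrc file is plain text with a `[mm:ss.xx]` time tag in front of each line. A small sketch of writing one from (seconds, text) pairs follows; the function names and sample lines are just for illustration and have nothing to do with the editor above.

```python
def lrc_timestamp(seconds):
    """Format seconds as an LRC time tag, [mm:ss.xx], rounded to centiseconds."""
    total_cs = round(seconds * 100)
    minutes, rest = divmod(total_cs, 6000)
    secs, cs = divmod(rest, 100)
    return f"[{minutes:02d}:{secs:02d}.{cs:02d}]"

def to_lrc(lines):
    """lines: iterable of (start_seconds, text). Returns the LRC file content."""
    return "\n".join(f"{lrc_timestamp(t)}{text}" for t, text in lines)

print(to_lrc([(0, "First line of the text"), (75.5, "Second line of the text")]))
```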

The only additional work needed, beyond the ordinary LibriVox recording process, would be for someone to listen to the final recording and press F5 in the lyric editor to make an lrc file. The PL'er could do this. Then the lrc file could be made available for download along with the mp3 files (I don't know if this works with ogg files too).

derrill · Post by **derrill** » April 15th, 2013, 4:14 pm

Hey Slattery,

Great work so far on your web-based UI and your research on forced alignment. Transcriber is really cool too.

I've been searching for mobile products that play audio books with text synchronized because I'm trying to learn another language. Looks like Kindle (amazon.com) has made some strides in that area recently with "Whispersync with Immersion Reading." Here's the information on that:
http://www.amazon.com/gp/help/customer/display.html?nodeId=200375890

Unfortunately, while they have 15,000 titles with that feature, they don't have the synchronized audio books that I want to play. So I tried playing around with the CMU Sphinx aligner, but I'm still struggling to get it to work.
Here's what I'm working off of:
http://cmusphinx.sourceforge.net/2011/08/long-audio-alignment-phrase-spotter-and-the-subsequent-improvements/
http://cmusphinx.sourceforge.net/wiki/longaudioalignment
http://sourceforge.net/p/cmusphinx/code/HEAD/tree/branches/long-audio-aligner/

Have you been working much on your project?

carolb · Post by **carolb** » April 16th, 2013, 6:05 am

I'm not sure that you'll get a response from slattery. His/her last post was in January 2011.

Carol

slattery · Post by **slattery** » April 17th, 2013, 8:39 am

Hi derrill,

I did have some success with the P2FA aligner. It worked for English, out of the box. In fact, I aligned an entire book, The Linguist by Steve Kaufmann, and have posted the complete book online. Here's an example chapter in my online player. You can see how every word is synchronized to the audio.

My workflow was to use the p2fa command line to align the content, then I wrote my own converter to turn that output into a Transcriber .trs file. From there, I could do some manual touchups and verification, and it was ready to use with the DingLabs Reader.
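The converter itself isn't posted here, but the core of such a conversion can be sketched roughly like this: take word timings from the aligner and write one `<Sync>` marker per word. This is a simplified illustration, not my actual converter. The input shape (a non-empty list of `(word, start, end)` tuples) and the minimal set of .trs elements are assumptions, and a real .trs file carries more metadata.

```python
import xml.etree.ElementTree as ET

def words_to_trs(words, audio_name):
    """words: non-empty list of (word, start_seconds, end_seconds), in order."""
    end_time = f"{words[-1][2]:.3f}"
    trans = ET.Element("Trans", audio_filename=audio_name)
    episode = ET.SubElement(trans, "Episode")
    section = ET.SubElement(episode, "Section", type="report",
                            startTime="0", endTime=end_time)
    turn = ET.SubElement(section, "Turn", startTime="0", endTime=end_time)
    for word, start, _end in words:
        # One Sync marker per word; the word itself follows the marker as tail text.
        sync = ET.SubElement(turn, "Sync", time=f"{start:.3f}")
        sync.tail = f"\n{word}\n"
    return ET.tostring(trans, encoding="unicode")

print(words_to_trs([("ABOU", 0.5, 0.9), ("BEN", 0.9, 1.2)], "example"))
```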

I'm interested in doing this for other languages, but I haven't invested any more time in this area.

One other tool I wanted to try out was: Prosodylab-Aligner
That looked like a great way to align content in any language.

derrill · Post by **derrill** » April 18th, 2013, 11:24 am

Hi Slattery,

Thanks for replying!

First, let me tell you what I'm trying to do and why. I'm trying to learn a new language (Spanish) and I have tons of audio books with text. However, the audio goes too fast for me, and I need to rewind a lot. What I'm doing now is trying to pause on every period, then clicking rewind 15s on my mobile phone. This is really a pain. I would like it to automatically pause on the period, and then allow me to continue on to the next sentence or rewind that sentence. It would also be cool to display the sentence last played.

Pretty much I'm trying to do what you did in your web app, but in a mobile app with automatic sentence pausing and navigation buttons: repeat previous sentence, next sentence, etc. Obviously, if something like this has already been done, there's no sense reinventing the wheel, but I have yet to find anything like it that works with arbitrary content. (As I mentioned, Kindle's Immersion Reading has a feature like it, but they don't have the content I want.)
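The sentence-pausing part could be sketched like this, assuming word-level timings are already available and line up one-to-one with the words of the original text (punctuation kept). The function name and the sample data are hypothetical, and the sentence test is naive: abbreviations ending in a period would be split too.

```python
def sentence_spans(tokens, timings, enders=".!?"):
    """Group timed words into sentences.

    tokens:  words from the original text, punctuation attached
    timings: one (start_seconds, end_seconds) pair per token
    Returns a list of (start, end, sentence_text).
    """
    spans, current = [], []
    start = last_end = None
    for token, (t0, t1) in zip(tokens, timings):
        if start is None:
            start = t0
        current.append(token)
        last_end = t1
        # A sentence ends at ., ! or ?, optionally followed by a closing quote or bracket.
        if token.rstrip("\"')").endswith(tuple(enders)):
            spans.append((start, last_end, " ".join(current)))
            current, start = [], None
    if current:  # trailing words with no closing punctuation
        spans.append((start, last_end, " ".join(current)))
    return spans

tokens = "Hola. ¿Cómo estás? Bien".split()
timings = [(0.0, 0.5), (0.8, 1.1), (1.1, 1.6), (2.0, 2.4)]
for span in sentence_spans(tokens, timings):
    print(span)
```

A player could pause whenever playback reaches a span's end time and seek to a span's start time for "repeat sentence".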

Anyway, I did download p2fa and tried it out on a random LibriVox recording/text. The text did require some massaging to get rid of unknown-word errors (adding spaces, replacing single quotes with double quotes, etc.). Unfortunately I'm getting a strange error, "ERROR [+8522] LatFromPaths: Align have dur<=0" (below). Did you run into this error?

ddabkoski@ddabkoski-wsl:~/Downloads/p2fa$ python align.py -s 25 abou_hunt_py_64kb.wav abou_hunt_py_64kb.txt ./test/abou.TextGrid
Resampling wav file from 24000 to 11025 trim 25...
sox WARN sox: effect `polyphase' is deprecated; see sox(1) for an alternative
SKIPPING WORD ADHEM
SKIPPING WORD —
SKIPPING WORD —
SKIPPING WORD ADHEM
SKIPPING WORD WRITEST
SKIPPING WORD —
SKIPPING WORD CHEERLY
SKIPPING WORD WAKENING
SKIPPING WORD ADHEM’S
./tmp/sound.wav -> ./tmp/tmp.plp
ERROR [+8522] LatFromPaths: Align have dur<=0
FATAL ERROR - Terminating program HVite
Traceback (most recent call last):
  File "align.py", line 316, in <module>
    writeTextGrid(outfile, readAlignedMLF(output_mlf, SR, float(wave_start)))
  File "align.py", line 135, in readAlignedMLF
    raise ValueError("Alignment did not complete succesfully.")
ValueError: Alignment did not complete succesfully.

Source:
Audio: http://www.archive.org/download/short_poetry_001_librivox/abou_hunt_py_64kb.mp3
Text: http://www.bartleby.com/41/524.html
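The skipped tokens in the log above include a long dash and a word with a curly apostrophe. A generic cleanup pass along these lines is one way to do that kind of text massaging before alignment. It is only a sketch: the replacement table is an assumption about what helps, not something p2fa documents, and words missing from the aligner's dictionary (ADHEM, for example) would still be skipped.

```python
import re

# Typographic characters mapped to plain ASCII equivalents (or a space).
REPLACEMENTS = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes / apostrophes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2014": " ", "\u2013": " ",   # em and en dashes
}

def normalize_for_alignment(text):
    """Replace typographic punctuation and collapse runs of whitespace."""
    for src, dst in REPLACEMENTS.items():
        text = text.replace(src, dst)
    return re.sub(r"\s+", " ", text).strip()

print(normalize_for_alignment("Abou Ben Adhem\u2019s name \u2014 led  all"))
```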

derrill · Post by **derrill** » April 18th, 2013, 2:02 pm

It might have to do with needing HTK 3.4 and not 3.4.1... Stand by.

derrill · Post by **derrill** » April 18th, 2013, 2:21 pm

Yes, using 3.4 got rid of the error! The TextGrid looks accurate too.

derrill · Post by **derrill** » April 18th, 2013, 2:37 pm

Slattery,

So in thinking about what I'm trying to achieve, one very crude solution could be to break the wav file into tracks by sentence. That way I could just use the track navigation that comes with a standard audio player on iPhone/Android. Pretty crude, but at least I can start rewinding on a per-sentence basis. (If I wanted, I could also add the sentence text for each segment as the track "lyrics".)
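The crude track-splitting idea could look roughly like this with Python's standard `wave` module. It assumes the sentence boundaries are already known as (start, end) times in seconds and that they fall inside the file; the function name and the output file pattern are made up for the example.

```python
import wave

def split_wav(src_path, spans, out_pattern="track_{:03d}.wav"):
    """Write one wav file per (start_seconds, end_seconds) span; return the file names."""
    written = []
    with wave.open(src_path, "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        for i, (start, end) in enumerate(spans, 1):
            src.setpos(int(start * rate))                 # seek to the sentence start
            frames = src.readframes(int((end - start) * rate))
            name = out_pattern.format(i)
            with wave.open(name, "wb") as out:
                out.setparams(params)                     # same channels, width, rate
                out.writeframes(frames)                   # header length is fixed up on close
            written.append(name)
    return written
```

Calling `split_wav("chapter.wav", [(0.0, 3.2), (3.2, 7.9)])` would write `track_001.wav` and `track_002.wav`; converting those to mp3 and tagging them is a separate step.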

So I'm not particularly familiar with the TextGrid format. Can you describe more about the implementation you used for your web-based version? I'm assuming you had to convert the TextGrid into something else...
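For reference, a TextGrid in Praat's long text format is a list of tiers, each holding intervals with `xmin`, `xmax` and `text` fields. Here is a minimal sketch of reading one tier. It assumes the long format and a tier named "word", which is my understanding of what p2fa writes; the sample below is hand-made rather than real aligner output.

```python
import re

INTERVAL = re.compile(r'xmin = ([\d.]+)\s+xmax = ([\d.]+)\s+text = "([^"]*)"')

def read_tier(textgrid_text, tier_name):
    """Return (start, end, label) tuples for the named interval tier."""
    # Each tier starts with an 'item [n]:' header; the file header comes before the first one.
    for block in re.split(r"item \[\d+\]:", textgrid_text)[1:]:
        if re.search(r'name = "%s"' % re.escape(tier_name), block):
            return [(float(a), float(b), label)
                    for a, b, label in INTERVAL.findall(block)]
    return []

SAMPLE = '''File type = "ooTextFile"
Object class = "TextGrid"

xmin = 0
xmax = 1.2
tiers? <exists>
size = 2
item []:
    item [1]:
        class = "IntervalTier"
        name = "phone"
        xmin = 0
        xmax = 1.2
        intervals: size = 1
        intervals [1]:
            xmin = 0
            xmax = 1.2
            text = "sp"
    item [2]:
        class = "IntervalTier"
        name = "word"
        xmin = 0
        xmax = 1.2
        intervals: size = 2
        intervals [1]:
            xmin = 0
            xmax = 0.5
            text = "ABOU"
        intervals [2]:
            xmin = 0.5
            xmax = 1.2
            text = "BEN"
'''

print(read_tier(SAMPLE, "word"))
```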

Sue Anderson · Post by **Sue Anderson** » April 20th, 2013, 1:43 pm

The tech-y part of this thread is way over my head, but just to say there is a synchronized voice/text video on YouTube of my reading of Geronimo http://www.youtube.com/watch?v=0oWS4ydlMEA, done by somebody (?) called the 16th Cavern. I had nothing to do with it.