Reading screen and file format

berlingunning
Posts: 78
Joined: January 24th, 2023, 1:23 pm

Post by berlingunning »

Not sure how to pose these two questions but here goes.

Let's say I've got my assigned reading section in a group project. I've downloaded the text I've been instructed to read from, and only that text. It comes in several file formats: PDF, Kindle, and so on.

Question one. Which file format do I use?

Question two. When I'm running Audacity, is it running in the background on my main computer screen while I read from that same screen? Or am I running Audacity on one machine and reading from another? For example, can I run the software on my laptop while I read from my tablet screen? It seems like I should also be watching to make sure that the software is running properly as I'm recording.
stepheather
Posts: 707
Joined: July 14th, 2007, 5:18 pm
Location: In the urban wild

Post by stepheather »

Hi, there--

Most of these options are viable!

I've noticed that the pdf format often comes through a lot cleaner than the other versions for documents from Internet Archive, so I tend to use that if I'm not just reading from the online version itself. I can't remember what I use for Gutenberg.

I read on my computer screen and have the Audacity program to the left and the document to the right, each taking up half my screen. This is highly customizable, though--I know some people read from a tablet instead of the computer screen. The issue is--as you've noted--you might want to keep an eye on Audacity to make sure you've hit record, your mic isn't muted, the background noise hasn't taken over the whole wave form, and you aren't constantly hitting the limit where clipping happens. This is why I organize my space the way I do. The downsides of my way are that both windows are smaller (I'm on a laptop) and I get mouse clicks when I have to adjust the text. I don't think I'd get as many of those noises if I were scrolling on a tablet.

Hope this helps. :) Phil Chenevert has a thread going somewhere about how people set up their spaces and how they record. I just can't find it right at this moment...

Thanks,
Stephanie
--Stephanie
*******************

Current solo:
Life among the Piutes

Native American history--Come read about removal plans, education, and laws:
Annual Report of the Commissioner of Indian Affairs, December 1837
TriciaG
LibriVox Admin Team
Posts: 60799
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG »

(1) Whichever you prefer.

(2) Whichever you prefer.

8-)

As long as you got the text from the link in the project, any format of it is OK to use.

Yes, it's good to have your eye on the Audacity window as you're recording, to make sure you're not clipping or even that your mic is turned on! (Some people have "recorded" a whole section only to discover that, for whatever reason, it didn't actually record! Seeing the wave forms scroll by would prevent this.) But it's not required.
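
For anyone who would rather double-check a take after the fact, here is a minimal Python sketch of the same two checks (clipping, and a recording that captured nothing). It assumes the take has been exported as a 16-bit PCM WAV file; the function names and the thresholds (`clip_margin`, `silence_level`) are illustrative choices, not anything Audacity or LibriVox prescribes.

```python
import array
import sys
import wave


def summarize_levels(samples, full_scale=32767, clip_margin=0.99, silence_level=0.01):
    """Summarize 16-bit PCM samples: peak level, clipped count, likely silence.

    The thresholds are assumptions for illustration only.
    """
    if not samples:
        return {"peak": 0.0, "clipped_samples": 0, "looks_silent": True}
    peak = max(abs(s) for s in samples) / full_scale
    # Count samples sitting at (or within 1% of) the maximum possible value.
    clipped = sum(1 for s in samples if abs(s) >= clip_margin * full_scale)
    return {
        "peak": peak,
        "clipped_samples": clipped,
        "looks_silent": peak < silence_level,
    }


def read_wav_samples(path):
    """Read a 16-bit PCM WAV file into a list of integer samples."""
    with wave.open(path, "rb") as wav_file:
        if wav_file.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav_file.readframes(wav_file.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        # WAV data is little-endian; swap on big-endian machines.
        samples.byteswap()
    return samples.tolist()
```

Calling `summarize_levels(read_wav_samples("section02.wav"))` (the file name is hypothetical) returns a small dictionary; a nonzero `clipped_samples` or a true `looks_silent` would be a hint to look at the take again.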
School fiction: David Blaize
America Exploration: The First Four Voyages of Amerigo Vespucci
Serial novel: The Wandering Jew
Medieval England meets Civil War Americans: Centuries Apart
mightyfelix
LibriVox Admin Team
Posts: 11140
Joined: August 7th, 2016, 6:39 pm

Post by mightyfelix »

TriciaG wrote: January 30th, 2023, 12:28 pm As long as you got the text from the link in the project, any format of it is OK to use.
I will offer one caveat to this. There's one text format that I think readers should avoid like the plague, if you have a text from archive.org, and that is the "Full Text" link. This is a plain-text format that is easy to copy and paste, and so it may be tempting if you like to mark up your text before reading. However, this is a computer-generated text file with no human oversight or correction, so it's always riddled with errors, some of which are clearly ridiculous, and others which are rather hard to catch!

Aside from this caveat, I totally agree! :D
ChristopherW
Posts: 34
Joined: August 4th, 2009, 1:29 pm

Post by ChristopherW »

mightyfelix wrote: February 1st, 2023, 1:46 pm I will offer one caveat to this. There's one text format that I think readers should avoid like the plague, if you have a text from archive.org, and that is the "Full Text" link. This is a plain-text format that is easy to copy and paste, and so it may be tempting if you like to mark up your text before reading. However, this is a computer-generated text file with no human oversight or correction, so it's always riddled with errors, some of which are clearly ridiculous, and others which are rather hard to catch!
For sure. Once I found the word "fire" changed to "tire" in the computer-generated text file. That kind of error can silently change the meaning of a sentence!

(By the way, this process is called "Optical Character Recognition" or OCR. Some OCR software is better than others, but I've found that a lot of it is still confused by specks and ink spots, and by similar-looking characters such as capital I, lowercase l, and the numeral 1. And sometimes words will be smudged or partially printed in a book, which is virtually impossible for OCR software to read but (usually) can be read by a human with a brain.)
lightcrystal
Posts: 1254
Joined: October 22nd, 2021, 10:55 pm
Location: Melbourne with kangaroos

Post by lightcrystal »

1. I use software called Okular to read from a pdf.

2. In Reaper [my DAW] I have a delay set of about 2 seconds once I hit record. That gives me time to click on the pdf of the text and read from it. Thus I have the pdf filling my whole screen. I am not looking at the waves or the DAW. When I want to stop I press the space bar. If I have included any blank space at the start or end I easily delete them with 1 select and click [that is set in my Reaper template as well]. Note that I use a desktop computer with 1 monitor. I do not use any other device.

But I probably wouldn't suggest method [2] for people who have never recorded. It's a more advanced way of doing it. It requires a sense of timing so that you don't start reading "too early," before the delay ends. But after a while you do it at the right time without thinking about it. I remember where I am up to each time; I don't in any way mark the pdf text.
Fan of all 80s pop music except Meatloaf.
rlc77jrm
Posts: 83
Joined: September 28th, 2022, 9:21 am
Contact:

Post by rlc77jrm »

mightyfelix wrote: February 1st, 2023, 1:46 pm
TriciaG wrote: January 30th, 2023, 12:28 pm As long as you got the text from the link in the project, any format of it is OK to use.
I will offer one caveat to this. There's one text format that I think readers should avoid like the plague, if you have a text from archive.org, and that is the "Full Text" link. This is a plain-text format that is easy to copy and paste, and so it may be tempting if you like to mark up your text before reading. However, this is a computer-generated text file with no human oversight or correction, so it's always riddled with errors, some of which are clearly ridiculous, and others which are rather hard to catch!

Aside from this caveat, I totally agree! :D
I'm contributing to a project that specifies reading from a 1-up image text on archive.org (the project page link goes direct to the image). For each of the various chapters, I've taken its text and formatted it into a document, which I then proofread for corrections before reading. (The OCR is probably 85% good, but it has lots of little problems. See the later paragraph.) This works OK for me because I tend to do shorter sections/chapters, and each chapter gets its own document. I also include the intro and outro for the section/chapter in the document, so I don't have to refer to the project page while recording. Even the solo project I've completed, and the one I have in progress, were laid out this way.

Yes, it's a lot of work, but it allows me to arrange windows on my screen similar to what stepheather described. I have Audacity open in the upper left of the screen, Checker open in the lower left corner, and the document open and covering the right half. I also have a File Explorer-type window (I use a flavor of Linux, not Windows) open in the center of the screen behind the other windows. This allows me to drag the exported .MP3 to Checker when I'm finalizing the file for upload.

The most consistent problem in this particular OCR transcription is the substitution of a space for the apostrophe in a possessive (e.g. Paul s instead of Paul's). It almost always misreads Rome (most often Eome) and Roman (Eoman or Koman). The ligature Æ (in Aegean) has been read as simply E, and as 1.E in the worst case.
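
As a rough illustration of how misreadings like these could be flagged before proofreading, here is a small Python sketch. The pattern list covers only the errors named above, and the names `SUSPECT_PATTERNS` and `flag_ocr_suspects` are made up for this example; a real transcription will need its own patterns, and the possessive pattern will also produce some false positives.

```python
import re

# Patterns based only on the errors described above; the regular
# expressions themselves are illustrative assumptions.
SUSPECT_PATTERNS = {
    "possessive with a space for the apostrophe": re.compile(r"\b[A-Za-z]+ s\b"),
    "Rome/Roman misread": re.compile(r"\b(?:Eome|Eoman|Koman)\b"),
}


def flag_ocr_suspects(text):
    """Return (line_number, description, matched_text) for each suspect."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        for description, pattern in SUSPECT_PATTERNS.items():
            for match in pattern.finditer(line):
                findings.append((line_number, description, match.group(0)))
    return findings


if __name__ == "__main__":
    sample = "Paul s letter reached Eome.\nThe Koman roads were good."
    for line_number, description, matched in flag_ocr_suspects(sample):
        print(f"line {line_number}: {matched!r} ({description})")
```

This only points at places to look; each hit still needs a human to check it against the page image.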
John R Moore [rlc77jrm]
Albertville, AL
txphred
Posts: 73
Joined: June 29th, 2021, 10:40 pm
Location: Nueces county Texas
Contact:

Post by txphred »

What to do about text errors

"I will offer one caveat to this. There's one text format that I think readers should avoid like the plague, if you have a text from archive.org, and that is the "Full Text" link. This is a plain-text format that is easy to copy and paste, and so it may be tempting if you like to mark up your text before reading. However, this is a computer-generated text file with no human oversight or correction, so it's always riddled with errors, some of which are clearly ridiculous, and others which are rather hard to catch!"
mightyfelix


I just started on my first reading project (Section 2 of The Inside of the Cup by Winston Churchill). I downloaded the Plain Text UTF-8 file from the Project Gutenberg link provided by LibriVox. While reading the chapter I noticed some oddities. I'm going to proofread the text and make notes of questionable words, phrases, etc. Should I just record the text exactly as given, or should I record an edited version? Is there a definitive text available (like a Norton Critical Edition) to check against? Should I submit my proof notes to the project coordinator? In view of concerns about the public domain status of the text, I want to be careful. I'd appreciate some advice.

Thanks,

txphred
redrun
LibriVox Admin Team
Posts: 2936
Joined: August 11th, 2022, 8:32 pm
Contact:

Post by redrun »

txphred wrote: March 2nd, 2023, 3:14 pm What to do about text errors
You can try downloading one of the other formats rather than the "Plain Text UTF-8" and see if they have the same issues, but be sure you download them from that same page that's linked from LibriVox.

If the issues are still there, then the project/book coordinator (often abbreviated BC) will be the best person to ask how to handle it on that particular project.
I'll be out for a bit on this last weekend of April, but still checking in as I get the chance. I will try to follow up on Monday, with anything I can't do on the go.
TriciaG
LibriVox Admin Team
Posts: 60799
Joined: June 15th, 2008, 10:30 pm
Location: Toronto, ON (but Minnesotan to age 32)

Post by TriciaG »

Some typos may get through the PG proofreaders, but it isn't overly common.

If you can find a scan of the text (at Internet Archive or HathiTrust or somewhere else), you can compare the texts there. But do consult with the BC as well. :)
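
If both versions can be had as plain text, the comparison can be partly automated. The following Python sketch uses the standard library's `difflib` to list the places where two texts disagree word by word. It assumes plain-text input for both versions (a page-image scan would first need transcribing), and the function name `word_differences` is just for illustration.

```python
import difflib


def word_differences(reference_text, candidate_text):
    """List (reference_words, candidate_words) pairs where the texts disagree."""
    reference_words = reference_text.split()
    candidate_words = candidate_text.split()
    matcher = difflib.SequenceMatcher(
        a=reference_words, b=candidate_words, autojunk=False
    )
    differences = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != "equal":
            # Either side may be empty when words were inserted or dropped.
            differences.append((
                " ".join(reference_words[a_start:a_end]),
                " ".join(candidate_words[b_start:b_end]),
            ))
    return differences


if __name__ == "__main__":
    for reference, candidate in word_differences(
        "the fire was lit", "the tire was lit"
    ):
        print(f"reference: {reference!r}  candidate: {candidate!r}")
```

Splitting on whitespace means differences in line breaks are ignored, which suits texts that have been reflowed differently. Punctuation stays attached to words, so a changed comma shows up as a changed word.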

If they are legitimate errors, they can be reported to PG. Here's their page explaining the process: https://www.gutenberg.org/help/errata.html
txphred
Posts: 73
Joined: June 29th, 2021, 10:40 pm
Location: Nueces county Texas
Contact:

Post by txphred »

I just finished comparing the Gutenberg Plain Text UTF-8 version of The Inside of the Cup by Winston Churchill with a PDF scan of the original published book. There weren't a great number of errors in the text file, but they did alter the book's meaning and tone. The PDF I used was a scan of the edition published c. 1913 by Grosset & Dunlap, New York. It is part of the University of California Libraries collection, and it matches a couple of other scans I found. I'll write up a note and send it to the BC.

txphred

aka: Fred
zachh
Posts: 435
Joined: November 1st, 2020, 5:02 am
Location: Piercefield, NY
Contact:

Post by zachh »

My method for making sure I'm recording is to watch the Audacity window while I am recording the disclaimer and section title, and then I click on the text to bring it to the front and continue reading without making any changes to Audacity. This way it just keeps running in the background. I'm recording on a laptop with a fairly small screen, but if I had more space I would opt for the side by side approach, as it would be nice to be able to see the recording happening throughout the reading.