Back-Up Project

Non-reading activities need your help too!
Cori
Posts: 12124
Joined: November 22nd, 2005, 10:22 am
Location: Britain

Post by Cori »

While we're talking about backups ... it has been mentioned that ibiblio and Gutenberg both hold some of our files.

Perhaps it might be incredibly useful, though potentially tedious, to check what files of ours they do hold? This has the following benefits:

* Gutenberg hold a variety of formats. (Not sure about ibiblio.)
* Both are high-profile storage servers, with mirrors.
* Knowing where the gaps are will make it easier for Josh to submit our complete collection to PG (or train up someone else to help him, possibly..?) I don't know about ibiblio submissions, but it'd be very handy for missing books to be added over there.
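The gap check described above is essentially a set difference between our catalog and what each mirror already holds. Here is a minimal Python sketch. It assumes both lists have already been gathered as plain title strings. The sample titles and the title-normalising rule are hypothetical and only for illustration:

```python
def normalise(title: str) -> str:
    """Lower-case and collapse whitespace so trivial differences don't count as gaps."""
    return " ".join(title.lower().split())


def find_gaps(our_catalog: list[str], mirror_holdings: list[str]) -> list[str]:
    """Return titles in our catalog that the mirror does not appear to hold, in catalog order."""
    held = {normalise(t) for t in mirror_holdings}
    return [t for t in our_catalog if normalise(t) not in held]


if __name__ == "__main__":
    # Hypothetical sample data; real lists would come from our catalog and from each mirror.
    ours = ["Pride and Prejudice", "Moby Dick", "The Time Machine"]
    gutenberg = ["pride and  prejudice", "The Time Machine"]
    print(find_gaps(ours, gutenberg))  # ['Moby Dick']
```

Matching on titles alone is crude. If both sides expose a stable identifier, matching on that identifier would be more reliable.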

I don't mean to cast gloom on this effort, Alex, as it seems very handy to have a snapshot of how things are now ... but I don't think it's scalable in the long term, for the reasons that Jim mentions about size, and also because it requires someone to upload every new release manually. That's fine for a few months while everyone's interested, but again, long-term ... not sure.

Finally ... 300GB is a LOT of stuff. Do you have a download limit..? Or is this pure emergency storage, and the only access should be in case of problems at archive..?
There's honestly no such thing as a stupid question -- but I'm afraid I can't rule out giving a stupid answer : : To Posterity and Beyond!
hugh
LibriVox Admin Team
Posts: 7972
Joined: September 26th, 2005, 4:14 am
Location: Montreal, QC

Post by hugh »

a couple of points:

-i think a complete back-up is a very good idea: while I have faith in archive.org... our files regularly get corrupted/chopped in the upload/derivation process, and who knows what could happen. much safer to have all files kept somewhere else.

-i suggest we use the back-up as just that: safety for us in case we lose some files - so there would be heavy upload bandwidth needs, but barely any download bandwidth.

-gutenberg adds stuff ad-hoc, so they are not really a reliable back up (unless we formalize the system with them - which so far has not happened, and does not seem likely, as their process is just a guy adding files).

-ibiblio theoretically should be our back-up, but they have not functioned reliably for a while, with a number of problems ranging from accessibility of the files to disappearing files.

-for a back-up system to work as a back-up system, it has to be systematic and reliable.

-a "systematic and reliable" system suggests a script that automatically goes thru the entire catalog, downloads and stores; and then does the same for all new releases...ad-hoc humans won't do the trick.

-a "systematic and reliable" system could be set up using elbow grease and human effort, but I'm not sure what it would look like.

-for a back-up there is no need to have the 128kbps files, 128kbps zips, 64kbps files, 64kbps zips ...all we need is the 128kbps files (or maybe even just the 64kbps files). the zips and different formats give the end-user choice, but for a back-up we just want the best-quality files in case we need to replace anything.
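The points above (automatic, covers the whole catalog and then each new release, keeps only the best-quality files, and guards against corrupted uploads) can be sketched as a single select-and-verify pass. This is a minimal Python sketch, not an existing LibriVox tool. The `CatalogFile` record, its `bitrate_kbps`/`is_zip` fields, and the assumption that an expected MD5 is available for each file are all hypothetical:

```python
import hashlib
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class CatalogFile:
    """One file in the catalog listing (hypothetical record shape)."""
    name: str
    bitrate_kbps: int  # e.g. 64 or 128
    is_zip: bool
    md5: str  # expected checksum, assumed to be published alongside the file


def files_to_back_up(catalog: list[CatalogFile], already_stored: set[str]) -> list[CatalogFile]:
    """Keep only individual 128kbps files, skipping anything already backed up.

    On the first run `already_stored` is empty, so the whole catalog is selected;
    on later runs only new releases come through.
    """
    return [
        f for f in catalog
        if f.bitrate_kbps == 128 and not f.is_zip and f.name not in already_stored
    ]


def md5_of(path: Path) -> str:
    """Compute a file's MD5 in 1 MiB chunks so large MP3s aren't read into memory at once."""
    digest = hashlib.md5()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_md5: str) -> bool:
    """True if the stored copy exists and matches its checksum (catches truncated or corrupted files)."""
    return path.exists() and md5_of(path) == expected_md5.lower()


if __name__ == "__main__":
    # Hypothetical catalog entries; the checksums are dummies.
    catalog = [
        CatalogFile("book_01.mp3", 128, False, "0" * 32),
        CatalogFile("book_01_64kb.mp3", 64, False, "0" * 32),  # lower bitrate: skipped
        CatalogFile("book_128kb.zip", 128, True, "0" * 32),  # zip: skipped
    ]
    print([f.name for f in files_to_back_up(catalog, already_stored=set())])  # ['book_01.mp3']
```

Verifying each copy against a checksum, rather than just checking that the file exists, is what makes this pass useful against the corrupted or chopped uploads mentioned above.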
ab2525
Posts: 628
Joined: June 20th, 2006, 8:55 pm
Location: Woodbridge, Virginia

Post by ab2525 »

I have 50TB of monthly bandwidth. I'm also working on finding an automated script, but in the meantime, I think ad-hoc humans are better than nothing at all.
What's this little box thingy for? Oh! COLOR
ab2525
Posts: 628
Joined: June 20th, 2006, 8:55 pm
Location: Woodbridge, Virginia

Post by ab2525 »

Could someone help me write an rsync command that would only download .mp3 files?
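rsync's filter rules can do this. Include every directory so rsync can descend into them, include `*.mp3`, then exclude everything else. `--prune-empty-dirs` stops it from leaving behind folders that contain no MP3s. Here is a minimal sketch that assembles the command in Python. The source and destination paths are placeholders, and it assumes the far end actually offers rsync access:

```python
import shlex


def rsync_mp3_only(source: str, dest: str) -> list[str]:
    """Build an rsync command that copies only .mp3 files, keeping the directory layout."""
    return [
        "rsync",
        "-av",  # archive mode (recursive, keep times/permissions), verbose
        "--prune-empty-dirs",  # don't create directories that end up with no .mp3 files
        "--include=*/",  # descend into every directory...
        "--include=*.mp3",  # ...copy .mp3 files...
        "--exclude=*",  # ...and skip everything else
        source,
        dest,
    ]


if __name__ == "__main__":
    # Placeholder paths: replace with the real rsync source and local backup directory.
    cmd = rsync_mp3_only("backup-host::librivox/", "/srv/librivox-backup/")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

The rule order matters because rsync applies the first filter rule that matches. Typed directly at a shell, the equivalent is `rsync -av --prune-empty-dirs --include='*/' --include='*.mp3' --exclude='*' SRC DEST`.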
ductapeguy
Posts: 1870
Joined: January 2nd, 2006, 9:51 am
Location: Ontario, Canada

Post by ductapeguy »

You might want to try wget too. I know there is a fairly large community of wget scripters online.
Sean McGaughey
Librivox: Catalog (http://librivox.org/newcatalog/people_public.php?peopleid=231) | ductapeguy.net -- My music and podcasts (http://ductapeguy.net)
ab2525
Posts: 628
Joined: June 20th, 2006, 8:55 pm
Location: Woodbridge, Virginia
Contact:

Post by ab2525 »

I'm going to use wget, running weekly as a cron job.
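A wget-based version of the same idea does a recursive download, accepts only `.mp3`, never climbs to parent directories, and uses timestamping so that a weekly re-run only fetches files that are new or changed. This is a minimal sketch in Python using standard GNU wget options. The URL and directory are placeholders, not real LibriVox locations:

```python
import shlex


def wget_mp3_mirror(url: str, dest_dir: str, wait_seconds: int = 1) -> list[str]:
    """Build a GNU wget command that mirrors only .mp3 files under `url` into `dest_dir`."""
    return [
        "wget",
        "--recursive",  # follow links below the starting URL
        "--level=inf",  # no depth limit (wget's default is 5 levels)
        "--no-parent",  # never climb above the starting directory
        "--timestamping",  # skip files whose local copy is already up to date
        "--accept=mp3",  # keep only files ending in .mp3
        f"--wait={wait_seconds}",  # pause between requests to be polite to the server
        f"--directory-prefix={dest_dir}",
        url,
    ]


if __name__ == "__main__":
    # Placeholder URL and directory.
    cmd = wget_mp3_mirror("https://example.org/librivox/", "/srv/librivox-backup")
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

wget still fetches the HTML index pages to find links, then discards them because they don't match `--accept`. To run the script weekly, a crontab entry such as `0 3 * * 0 /usr/bin/python3 /path/to/backup_script.py` (Sundays at 03:00) would do. The script path is hypothetical.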
hugh
LibriVox Admin Team
Posts: 7972
Joined: September 26th, 2005, 4:14 am
Location: Montreal, QC

Post by hugh »

ab2525 wrote: "I'm also working on finding an automated script, but in the mean time, I think ad-hoc humans are better than nothing at all"
well, i'm not sure: if we figure out a sensible automated system, then whatever effort is put in beforehand will be duplicated; if we *don't* figure out an automated system, then having a handful of backups might be good, but probably not much better than the current situation (where BCs generally keep their projects for a period of time).

mind you, if we can come up with a comprehensive human-based system that makes sense and is workable ... that would be useful.

so i support the idea and the effort, i just think we need to think about the system more to get it to work properly.
ab2525
Posts: 628
Joined: June 20th, 2006, 8:55 pm
Location: Woodbridge, Virginia

Post by ab2525 »

Maybe different people can be responsible for backing up each letter of the catalog, so there is an A person, a B person, a C person, etc. As for now, if my calculations are correct, then if we upload the 128kbps zips, I can hold 4 times the current size of the catalog before I hit my limit.
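Splitting the catalog by first letter is easy to automate, and it also shows at a glance whether one volunteer's share is much bigger than the others. This is a minimal Python sketch. Dropping a leading "The"/"A"/"An" (so the T person doesn't get a huge share) and lumping digits and punctuation into a `#` bucket are both assumptions for illustration, not anything agreed in the thread:

```python
from collections import defaultdict

LEADING_ARTICLES = ("the ", "a ", "an ")


def sort_key(title: str) -> str:
    """Lower-case the title and drop a leading English article, as library catalogs often do."""
    t = title.strip().lower()
    for article in LEADING_ARTICLES:
        if t.startswith(article):
            return t[len(article):]
    return t


def partition_by_letter(titles: list[str]) -> dict[str, list[str]]:
    """Group titles by the first letter of their sort key; anything else goes under '#'."""
    buckets: dict[str, list[str]] = defaultdict(list)
    for title in titles:
        first = sort_key(title)[:1].upper()
        buckets[first if first.isalpha() else "#"].append(title)
    return dict(buckets)


if __name__ == "__main__":
    titles = ["The Time Machine", "A Christmas Carol", "Moby Dick",
              "20,000 Leagues Under the Sea", "Middlemarch"]
    for letter, group in sorted(partition_by_letter(titles).items()):
        print(letter, group)
```

Counting the total file size per bucket, rather than the number of titles, would be a fairer way to balance the load between volunteers.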
ab2525
Posts: 628
Joined: June 20th, 2006, 8:55 pm
Location: Woodbridge, Virginia

Post by ab2525 »

Once everything we have is backed up, I can probably maintain the backups myself, as the weekly turnout is only around 12 new releases.
Kenny
Posts: 15
Joined: November 5th, 2006, 11:43 pm

Post by Kenny »

I can upload 600GB of files to at least several places, including one or more servers at home. The more, the better, right?
Gothoborgensis
Posts: 4
Joined: August 13th, 2008, 2:30 am
Location: Gothenburg Sweden

Post by Gothoborgensis »

Could there not be interest from some official institution in keeping a backup of files like the ones at LibriVox? The Library of Congress, for example, or any big city library in England or the USA... I don't know the US well, but in Western Europe I think all states and regions have public archives dedicated to preserving written and other documentation from our own and earlier times.
Shipley
Posts: 680
Joined: February 18th, 2009, 10:05 am
Location: MA, USA

Post by Shipley »

Has any thought been given to backing up files on DVD-Rs? At 4.7 GB per disc, we would need no more than a couple of hundred for all existing files, and they are easy enough to duplicate that you could keep multiple copies in widely spaced locations. Alternatively, you can now keep about 7 GB in a Gmail account: have a hundred volunteers sign up for an extra Gmail account each, and upload the files via Gspace or some similar Mozilla add-on. I keep several hundred MB of backup files for a church and a home business using Gmail in this manner.
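Working the numbers: at 4.7 GB per disc, the 300GB figure mentioned earlier in the thread comes to 300 / 4.7 ≈ 63.8, so 64 discs per copy as a lower bound. That fits comfortably within "a couple of hundred". Whole files won't pack perfectly, so a first-fit-decreasing packing gives a more realistic disc count and a per-disc file list. First-fit decreasing is a standard bin-packing heuristic and was not proposed in the thread. This is a minimal Python sketch. It uses the nominal 4.7 GB (decimal) capacity, and real usable space after filesystem overhead is somewhat lower:

```python
import math

DVD_R_BYTES = 4_700_000_000  # nominal single-layer DVD-R capacity (4.7 GB, decimal)


def min_discs(total_bytes: int, disc_bytes: int = DVD_R_BYTES) -> int:
    """Lower bound on discs needed, as if files could be split freely across discs."""
    return math.ceil(total_bytes / disc_bytes)


def pack_first_fit_decreasing(sizes: dict[str, int], disc_bytes: int = DVD_R_BYTES) -> list[list[str]]:
    """Assign whole files to discs: largest first, each into the first disc with room left."""
    discs: list[list[str]] = []
    free: list[int] = []  # remaining bytes on each disc, parallel to `discs`
    for name, size in sorted(sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > disc_bytes:
            raise ValueError(f"{name} ({size} bytes) is larger than one disc")
        for i, room in enumerate(free):
            if size <= room:
                discs[i].append(name)
                free[i] -= size
                break
        else:  # no existing disc had room, so start a new one
            discs.append([name])
            free.append(disc_bytes - size)
    return discs


if __name__ == "__main__":
    print(min_discs(300_000_000_000))  # 64
    print(pack_first_fit_decreasing({"a": 3, "b": 2, "c": 2, "d": 1}, disc_bytes=4))
    # [['a', 'd'], ['b', 'c']]
```

The Gmail option works out similarly: 100 accounts at about 7 GB each is roughly 700 GB, enough for the 300GB collection with room for a second copy.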
Mariane
Posts: 11
Joined: September 4th, 2009, 11:44 am

Post by Mariane »

You can also get a one-terabyte external hard disk for about 200 euros now. This is the best backup solution, as it can be put online (while DVDs cannot, unless they are kept in the drive).

You need someone with some Apache experience, though; it is not trivial.

Maybe Cambridge University would be willing to store the data? They are working on HTK (the Hidden Markov Model Toolkit, used for speech recognition), so it would be useful for them too.

Mariane