[COMPLETE] English and Cantonese Dictionary, by John Chalmers - availle

Post by **DavidReader** » July 13th, 2019, 9:31 pm

English and Cantonese Dictionary by John Chalmers (1825 - 1899) and others.

This project is now complete! All files can be downloaded from the catalog page:
https://librivox.org/english-and-cantonese-dictionary-by-john-chalmers/

Text source (only read from this text!): https://archive.org/details/en00glishcantonesechalrich
Type of proof-listening required (Note: please read the PL FAQ): standard

IMPORTANT - soloist, please note: in order to limit the amount of languishing projects (and hence the amount of files on our hard-pressed server), we ask that you post an update at least once a month in your project thread, even if you haven't managed to record anything. If we don't hear from you for three months, your project may be opened up to a group project if a Book Coordinator is found. Files you have completed will be used in this project. If you haven't recorded anything yet, your project will be removed from the forum (contact any admin to see if it can be re-instated).
Please don't download or listen to files belonging to projects in process (unless you are the BC or PL). Our servers are not set up to handle the greater volume of traffic. Please wait until the project has been completed. Thanks!

Magic Window:

BC Admin
============================================
Genres for the project: *Non-fiction/Reference
Keywords that describe the book: English, Cantonese, dictionary
============================================
The reader will record the following at the beginning and end of each file:
No more than 0.5 to 1 second of silence at the beginning of the recording!
START of recording (Intro):
- "Section [number] of English and Cantonese Dictionary. This is a LibriVox recording. All LibriVox recordings are in the public domain. For more information, or to volunteer, please visit: librivox DOT org"
- If you wish, say:
  "Recording by [your name], [city, your blog, podcast, web address]"
- Say:
  "English and Cantonese Dictionary, by John Chalmers. [Chapter]"
For the second and all subsequent sections, you may optionally use the shortened form of this intro disclaimer:
- "Section [number] of English and Cantonese Dictionary by John Chalmers. This LibriVox recording is in the Public Domain."
- If you wish, say:
  "Recording by [your name], [city, your blog, podcast, web address]"
- Only if applicable, say:
  "[Section title]"
END of recording:
- At the end of the section, say:
  "End of [Section]"
- If you wish, say:
  "Recording by [your name], [city, your blog, podcast, web address]"
- At the end of the book, say (in addition):
  "End of English and Cantonese Dictionary, by John Chalmers. "
There should be ~5 seconds silence at the end of the recording.
Example filename: englishcantonesedictionary_###_chalmers_128kb.mp3 (all lower-case), where ### is the section number (e.g. englishcantonesedictionary_001_chalmers_128kb.mp3)
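For anyone scripting the file names, the pattern above can be sketched as follows. This is only an illustration: the helper name `section_filename` is hypothetical and not part of any LibriVox tool, and it assumes the section number is simply zero-padded to three digits.

```python
def section_filename(section: int) -> str:
    """Build the file name for a section, zero-padding the number to three digits."""
    # Hypothetical helper; the fixed parts are copied from the example filename above.
    return f"englishcantonesedictionary_{section:03d}_chalmers_128kb.mp3"


print(section_filename(1))
```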

Transfer of files (completed recordings)
Please always post in this forum thread when you've sent a file. Also, post the length of the recording (file duration: mm:ss) together with the link.
- Upload your file with the LibriVox Uploader: https://librivox.org/login/uploader
  (If you have trouble reading the image above, please message an admin)
- You'll need to select the MC, which for this project is: Availle
- When your upload is complete, you will receive a link - please post it in this thread.
- If this doesn't work, or you have questions, please check our How To Send Your Recording wiki page.
Any questions?
Please post below

Post by **annise** » July 13th, 2019, 9:55 pm

You will need to give some idea of the number of sections you will use. What plan do you have to divide it up?

Anne

Post by **Availle** » July 13th, 2019, 10:00 pm

Ummm... how shall I put this.... politely....?

David, I think this is not something that would make a good audiobook. Besides the prefaces and the intros and everything, you have some 820 pages of words English - Cantonese. And only word pairs and nothing else (because the characters will be impossible to bring over to an audiobook).

I don't think the way we organise our audiobooks would work for a dictionary like this because

- you cannot look up a single word: no matter how long/short the section you will have to listen from the beginning or do some very painful back and forth searching in the audio (and we can only have projects with 255 sections max)
- the book is more than 100 years old: languages and pronunciations change, new words are added, old words are deleted which means
- maybe the word you're looking for is not even in the book

In general, LV is very reader-centric, in a sense that if a reader wants to record a book, as long as it's PD, we're: go for it! without caring for listeners on the other end.
But this one, because of its sheer length and contents, I'm afraid it will be excruciatingly boring for you after a while and very likely to put you off audiobook recording for good (and for very egocentric reasons, I can't have that right now).

I really think it's a better option to read a book in Cantonese; with a bit of luck, there may even be bilingual books out there.

Post by **mightyfelix** » July 14th, 2019, 7:42 am

Maybe a selection from this would be great for the insomnia collection, though!

viewtopic.php?f=19&t=74778

Post by **DavidReader** » July 14th, 2019, 9:52 pm

Thanks for the comments and advice.

My case for embarking on this project is this:
1. There is an absolute dearth of similar projects, as far as I am aware;
2. I guess Cantonese is spoken by nearly 100 million people in the world, and I think it is worth letting English-speaking people have some inkling of how a dialect spoken by so many people sounds (of course with the caveat that this may not be exactly what it was like a hundred years ago). This is made more relevant by the lack of such material at present;
3. Being a native Cantonese speaker, with English as a second language, I am in a suitable position to take on the task.

Comments on the difficulties mentioned:
1. that "...the characters will be impossible to bring over to an audiobook":
The way this dictionary is presented is exactly what makes the above-mentioned difficulty of rendering the characters in an audiobook disappear. The Chinese characters giving the meaning of the English word are followed by their phonetic symbols. Therefore, I will be reading both the Chinese characters and the phonetic symbols at the same time.

2. that this book is too long:
Yes, it is over 800 pages long, but the amount of material printed on each page is actually not that much. Being a native Cantonese speaker myself, I have no difficulty in reading through a page smoothly. I have done a trial to ascertain the time it takes me to read one page, and it amounts to about 2.5 minutes. Thus for an 820-page book, it will take 2050 minutes, i.e., about 34 hours. Also, since the whole process will stay relatively unchanged from beginning to end, it should just be a matter of plodding through it at a steady pace with patience.

3. difficulty for people to look up a single word:
Yes, this is a relevant logistical problem that needs to be thought over, but nevertheless this should not be totally insurmountable. Initially, I have the following plan to tackle the problem, which is also the rationale for how to divide the work into reading sections:
i) I shall divide the whole work into sections of 20 minutes each, stating for each section the starting word and the ending word. E.g. "section 3: Abuse to Arbitrator".
ii) To be even more searchable, we can add more details by stating the starting word for each minute of the section. E.g. "section 3: Abuse to Arbitrator; 2m: Account; 3m: Acquire;..." Thus, if someone wants to search for "Achieve", he/she can just search from the second to the third minute of the recording. With the versatility of Audacity, these annotations should not require too much extra work.
iii) From the above calculations, the whole work can be divided up into sections in the following way:
Given an average of 2.5 minutes per page, a 20-min. section will cover 8 pages. E.g. "Section 1: Front Matter, including: Preface, Notes to the Fourth and Fifth Edition, Rules for Pronouncing the Chinese"; "Section 2: A - Ambitious, p. 1 - 8"; "Section 3: Ambush - Babble, p. 9 - 16", etc...

4. that "the book is more than 100 years old: languages and pronunciations change etc...word looking for is not even in the book"
I find this a bit baffling. We are only able to read books that are in the public domain, that is, at least 50 or 70 years old. If a book that is 112 years old (this book was published in 1907) is considered "too old", then we have rather limited choices indeed. From my brief perusal of the book, the English words included (the words that people look up, not the Chinese characters) are mostly rather ordinary English words in common usage (even at present). I admit that some of the Chinese equivalents given seem a bit archaic and awkward (though most of them are pretty accurate), but listeners should also bear in mind the historical context of the work, and the fact that the compiler was a Scottish Protestant missionary in China who may not have been totally conversant with the language. But I don't think people listening to it will treat it as a guide for their conversations with a contemporary Cantonese speaker.

On the whole, I feel that this project is doable, a rather novel inclusion in the LibriVox opus, with some real practical use, and most importantly, may even be interesting to the listeners.
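As a quick check of the arithmetic in this post, here is a minimal sketch. It assumes only the figures quoted in the thread (820 pages, about 2.5 minutes per page, a 255-section limit per project); the variable and function names are made up for illustration.

```python
import math

PAGES = 820             # page count quoted in the thread
MINUTES_PER_PAGE = 2.5  # from the trial reading described above
MAX_SECTIONS = 255      # per-project limit mentioned earlier in the thread

total_minutes = PAGES * MINUTES_PER_PAGE
total_hours = total_minutes / 60


def sections_needed(section_minutes: float) -> int:
    """Number of sections of the given length needed to cover the whole reading."""
    return math.ceil(total_minutes / section_minutes)


print(f"{total_minutes:.0f} minutes, about {total_hours:.0f} hours")
for length in (20, 10):
    count = sections_needed(length)
    print(f"{length}-minute sections: {count} (within limit: {count <= MAX_SECTIONS})")
```

The front matter is not counted here, so the real number of sections would be slightly higher.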

Post by **annise** » July 14th, 2019, 10:15 pm

That answers my question. We do have a limit of 255 sections (files) because Archive - our file hoster - cannot handle any more and still keep them playing in order, but I think with 20-minute readings that will be OK - and you could split it into 2 projects if it didn't fit.

Anne

Post by **silverquill** » July 14th, 2019, 10:17 pm

Well, I'm not an admin, and maybe this discussion should take place on the Book Suggestions forum instead of here on the Launch Pad.
But, I think you make a good case for this project. It is something out of the usual offering, but I think that's the beauty of LibriVox -- it is a big tent with room for all kinds of acts. Obviously you have given this some serious thought and know what you are facing, which I think counts for a lot.

Dividing it into 20-minute sections seems valid. That is a comfortable size for both reader and listener to handle. I'm not sure additional tagging is necessary or feasible. If a person wants to look up "dog," then it is easy to see which section it is in, and shouldn't be too hard to find within the section, depending on what program they are using for listening.

Although I'm sure we have 34-hour and longer books in the catalog, I wonder if it might make this more manageable if you divided it into two or three volumes. Just a thought.

And, would you need someone bilingual to do the proof listening?

~Larry

Post by **DavidReader** » July 14th, 2019, 11:39 pm

Thanks to silverquill and annise for your quick responses.
Yes, a full run of 34 hours in one project seems a bit long, but in view of the continuous and integral nature of a dictionary, it seems that the most satisfactory result will be a single unit.

BTW, I have just prepared a tentative division of the sections as follows:
Sections/From(page)/To(page)/"word x" - "word y"
0 Front Matters
1 1 8 A - Ambitious
2 9 16 Ambush - Back
3 17 24 Back-bite - Blade
4 25 32 Bladed knife, three - Butler
5 33 40 Butt - Chasm
6 41 48 Chaste - Compete
7 49 56 Competent - Courtesy
8 57 64 Cousin - Deep
9 65 72 Deer - Discretionary
10 73 80 Discriminate - Dyspepsia
11 81 88 Dysury - Err
12 89 96 Error - Feces
13 97 104 Fee - Foundation
14 105 112 Foundling - Glauber-salts
15 113 120 Glazed - Hale
16 121 128 Half - Hound
17 129 136 Hour - Independent
18 137 144 Index - Intestines
19 145 152 Intimate - Knell
20 153 160 Knife - Lift
21 161 168 Light - Mahogany
22 169 176 Mahommedanism - Merciful
23 177 184 Merciless - Mountain
24 185 192 Mourn - Night-dress
25 193 200 Nimble - Office
26 201 208 Officer - Over-night
27 209 216 Overpass - Part
28 217 224 Partake - Pelvis
29 225 232 Pen, pencil - Philosopher
30 233 240 Phlegm - Plain
31 241 248 Plaintiff - Polyedron
32 249 256 Polypode - Pox
33 257 264 Practicable - Pretty
34 265 272 Prevail - Prolapse
35 273 280 Prolegomena - Pulverize
36 281 288 Pumice - Quinquennial
37 289 296 Quinsy - Rations
38 297 304 Rattan - Red-hot
39 305 312 Redeem - Remain
40 313 320 Remainder - Respect
41 321 328 Respectable - Rhyme
42 329 336 Rib - Roast
43 337 344 Rob - Run
44 345 352 Runner - Sample
45 353 360 Sanctify - Scissors
46 361 368 Scoff - Secretary
47 369 376 Secretion - Serious
48 377 384 Sermon - Sharp
49 385 392 Sharpen - Shot
50 393 400 Should - Silk
51 401 408 Silk-worm - Sky
52 409 416 Sky-blue - Slough
53 417 424 Slovenly - Snub-nosed
54 425 432 Snuff - Son
55 433 440 Son-in-law - Special
56 441 448 Specially - Spoil
57 449 456 Spoil, to - Stabling
58 457 464 Stack - Statement
59 465 472 State-paper - Sting
60 473 480 Stingily - Straightway
61 481 488 Strain - Strive
62 489 496 Stroke - Submissive
63 497 504 Submissively - Sulphate
64 505 512 Sulphur - Suppose
65 513 520 Supposing - Swarthiness
66 521 528 Swath - Synagogue
67 529 536 Synchronize - Talisman
68 537 544 Talk - Tea-tray
69 545 552 Teach - Ten
70 553 560 Tenable - Thank
71 561 568 Thankful - Third
72 569 576 Thirst - Throw
73 577 584 Throwster - Tightly
74 585 592 Tile - To
75 593 600 Toad - Tooth-ache
76 601 608 Tooth-brush - Train
77 609 616 Train, to - Tread
78 617 624 Treason - Trill
79 625 632 Trigger - Trustee
80 633 640 Trustful - Twelve
81 641 648 Twentieth - Unassisted
82 649 656 Unassuaged - Undercurrent
83 657 664 Underdo - Unison
84 665 672 Unit - Unreserved
85 673 680 Unreservedly - Unwell
86 681 688 Unwholesome - Use
87 689 696 Useful - Vapouring
88 697 704 Vaporize - Vernacular
89 705 712 Vernal - Vileness
90 713 720 Vilify - Vituperate
91 721 728 Vituperative - Wake
92 729 736 Wakeful - Warp
93 737 744 Warped - Wax
94 745 752 Wax-candle - Weigh
95 753 760 Weighing-machine - Whereas
96 761 768 Whereat - Wholly
97 769 776 Whore - Wily
98 777 785 Win - With
99 786 792 Withal - Wood
100 793 800 Woodbine - Worry
101 801 808 Worse - Wrongfully
102 809 816 Wrongly - You
103 817 822 Young - Zymotic
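The page ranges in the list above follow a fixed 8-pages-per-section scheme, which can be sketched as below. This is an illustration only: the function name `page_ranges` is hypothetical, the last page (822) is taken from the final row of the list, and the posted list deviates slightly from a strict 8-page step at sections 98 and 99 (777-785 and 786-792). The word ranges still have to be read off the book itself.

```python
def page_ranges(last_page: int = 822, pages_per_section: int = 8):
    """Return (section, first_page, last_page) tuples for a fixed-size page split."""
    ranges = []
    starts = range(1, last_page + 1, pages_per_section)
    for section, start in enumerate(starts, start=1):
        # The final section may be shorter than the others.
        end = min(start + pages_per_section - 1, last_page)
        ranges.append((section, start, end))
    return ranges


for section, start, end in page_ranges()[:3]:
    print(section, start, end)
```

Section 0 (the front matter) is left out because it has no page range in the list.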

Any comments are welcome

Post by **Availle** » July 15th, 2019, 1:44 am

Okay David, let me rebut your rebuttal

I'm ready to MC this and most of the problems that I have here are not really issues with the project itself; but I'm seriously worried you'd burn yourself out on reading - and editing! - 34 hours worth of word pairs...

DavidReader wrote: ↑July 14th, 2019, 9:52 pm 1. There is an absolute dearth of similar projects, as far as I am aware;
2. I guess Cantonese is spoken by nearly 100 million people in the world, and I think it is worth letting English-speaking people have some inkling of how a dialect spoken by so many people sounds

And an audiobook is the best way to do that? I mean, you could be the next hit on YouTube "HongKong Style" or something, if spreading the sound of the language is all you want, really.

Comments on the difficulties mentioned:
1. that "...the characters will be impossible to bring over to an audiobook":
The way this dictionary is presented is exactly what makes the above-mentioned difficulty of rendering the characters in an audiobook disappear. The Chinese characters giving the meaning of the English word are followed by their phonetic symbols. Therefore, I will be reading both the Chinese characters and the phonetic symbols at the same time.

Sorry, I was not clear what I meant. I realise that you will read the characters; the point I was trying to make is that if you intend this for learners of the language, the characters are very important. The sound can be exactly the same, but the meaning lies in the characters.
Looking at my Japanese dictionary here, the word "kyo" can mean bulletin or proceedings, clever, today, bad luck, capital, lord, sutra, a spiritual state of selflessness or uninhabited, interest or fun. The pronunciation is exactly the same, but without the kanji the meaning will be lost.
I understand that here, where you start out with the English, it's less of an issue, and of course in normal conversation the context will guide you, but this here is just a list of words.

This is the major issue:

3. difficulty for people to look up a single word:
i) I shall divide the whole work into sections of 20 minutes each, stating for each section the starting word and the ending word. E.g. "section 3: Abuse to Arbitrator".

Good, but I think the sections are still too long. Since you get only a bit over 100 sections like this, I would suggest making 10-minute sections instead.

ii) To be even more searchable, we can add more details by stating the starting word for each minute of the section. E.g. "section 3: Abuse to Arbitrator; 2m: Account; 3m: Acquire;..." Thus, if someone wants to search for "Achieve", he/she can just search from the second to the third minute of the recording. With the versatility of Audacity, these annotations should not require too much extra work.

I don't understand what you mean here?
- You want to first record the whole 10 minutes, then add the beginning/ending word at each minute mark? That's violating our rule of "reading the text as it is written". If there is something like this on the top of each page (like modern dictionaries have), I'll let you get away with it, but if it isn't, that's a no-go I'm afraid.
- Or do you want to put text annotations into Audacity? I'm pretty sure that these do not transfer over to the mp3, and even if they do for the files you deliver here, Archive will make other derivatives and there is no guarantee that these will transfer.

4. that "the book is more than 100 years old: languages and pronunciations change etc...word looking for is not even in the book"
I find this a bit baffling. [snip] I admit that some of the Chinese equivalents given seem a bit archaic and awkward (though most of them are pretty accurate), but listeners should also bear in mind the historical context of the work, and the fact that the compiler was a Scottish Protestant missionary in China who may not have been totally conversant with the language. But I don't think people listening to it will treat it as a guide for their conversations with a contemporary Cantonese speaker.

You/we have no control over where the project will end up and how. Yes, you may give all this information in the project summary and people downloading straight from LV or Archive will see and (hopefully) heed it. But it's just as possible that somebody will take all your 34 hours of word pairs, cut them up and post them individually on YouTube. And somehow I have the suspicion there will be no commentary on where all this goodness is coming from.

As I said, I don't think a dictionary (no matter which language) makes a good audiobook, mostly because of restrictions of the format.
The question is probably:

- What do you want for the listener?
Do you want the listener just to get an idea of how Cantonese sounds? Then maybe recording books in Cantonese would be a better choice.
If you want the listener to really have a dictionary, then go for it!

- What listeners do you want?
Some English speaker who is interested in learning Cantonese? Then go for it!
Some slightly more advanced Cantonese speaker? Then maybe again, a novel would be a better choice imo.

I'm not questioning your determination here, because seeing what you do over in the Red Chamber, I'm pretty confident you'll push through once you get started.

Post by **DavidReader** » July 15th, 2019, 8:55 pm

Thanks for your comments.

Actually, there is another reason for taking on this project that I have not yet mentioned:
One important target audience for this project is people like myself: native Cantonese speakers with English as a second language. When I first stumbled upon this book, I realized that many of the Cantonese expressions used by Mr. Chalmers are colloquial usages that have since lapsed somewhat and are less commonly used nowadays, but that show the provenance of some of the altered usages we use today. I was enlightened on many occasions as I perused the book. The historical context of this work is important in providing clues as to how our dialect has evolved.

As for my response to your rebuttals:
1. I am sorry that I still do not quite catch your point about how an audiobook does not do justice to the Chinese characters printed in Chalmers' book... Of course, Chinese characters have many homophones etc., but my chief intention in this project is NOT to create a guide for people to learn the language. (I would be making a very boastful claim if that were my intention.) The meaning of the Chinese characters printed is exactly the English word included by Mr. Chalmers; can I add anything more?

2. I don't much mind whether we divide the recording into 20-min. or 10-min. sections, but if we opt for the latter, then the whole thing will be composed of over 200 sections, making the list of contents rather unwieldy.

3. My idea for making it more searcher-friendly with additional bookmarks is to incorporate the added details in the list of contents, not in the Audacity file itself (I wouldn't know how if I were asked to do it!). What I meant is, while I am doing the recording, I can note from the timeline shown in the Audacity window which word falls at a certain time, and report it back in the list of contents.

4. As to how my future work will be used (or abused) outside of LV, well, who knows? But if this is a concern, then this will apply to all other projects as well.

5. That I should record a fiction or a non-fiction book written in Cantonese instead. This approach has its pros, but despite all the cons of recording a dictionary, it is the only format that can convey some sense of system and comprehensiveness in exhibiting this language. The reasons I mentioned earlier also apply.

I am ready to submit a trial section for comments if you are interested.

One last point: I have made my case for this project clear. But if the Administration of LV is still adamant that this project is unfeasible or inappropriate, then please state that definitively, and I will not continue pressing forward.

Post by **Availle** » July 15th, 2019, 9:44 pm

Well David, you know what they say: If you can't fight them, join them!

I'm ready to set this up for you, but I need to know whether you'll go with 10- or 20-minute sections, i.e., whether you need some 100 or 200 sections.

DavidReader wrote: ↑July 15th, 2019, 8:55 pm 2. I don't much mind whether we divide the recording into 20-min. or 10-min. sections, but if we opt for the latter, then the whole thing will be composed of over 200 sections, making the list of contents rather unwieldy.

3. My idea for making it more searcher-friendly with additional bookmarks is to incorporate the added details in the list of contents, not in the Audacity file itself (I wouldn't know how if I were asked to do it!). What I meant is, while I am doing the recording, I can note from the timeline shown in the Audacity window which word falls at a certain time, and report it back in the list of contents.

I'm still not sure what you mean by "list of contents".

Does that mean you want to put the words at the full minute marks into the section titles (this will be the "title" field of the ID3 tags in the end)? I think they have a limited length, and many devices may cut them off even before that, so it's best to keep them comparatively short.
If that's what you want to do, you'll need to choose the section length wisely.

Post by **DavidReader** » July 16th, 2019, 8:13 pm

Thanks for your reply.

As I have proposed, I would prefer the 20-min. per section option, out of concern that the list of contents should not become too long. Another advantage is that it halves the number of times the mandatory preamble must be repeated, which may be tedious for the listener to hear at the start of every relatively short 10-min. clip.

Please also note that I prepared a rough list of divisions in a previous post, which laid out a scheme of 8 pages per section (based on the rough estimate of 20 min. / 2.5 min. per page), with the beginning and ending page number for each section and the corresponding range of English word items. To make word searches more efficient without requiring too much extra work, I can also note down the word appearing at the 10-min. mark, and then incorporate this added information into the list of contents of the final product at the completion of the project.

I also propose to name the section for the front matters (prefaces, rules of pronunciation etc.) as section 0, since this would make naming the sections for the remaining dictionary proper more logical, comprehensible and consolidated.

As for the wording of the preamble, it should always be read as "section x", never "chapter".

Thanks for your patience, I think we can get it started.

Post by **silverquill** » July 16th, 2019, 8:21 pm

And, as long as you understand that I can't comment on the Cantonese, I am ready to come on board as DPL, just checking for the obvious technical things and obvious mistakes. Let's start with one section, so we see how it goes and can make any changes needed before getting too far into it.

Post by **Availle** » July 17th, 2019, 4:02 pm

Okay David, we have a MW!

Since you are now the BC here, you are in charge of keeping it pretty and in good shape. For that, you will need a new password for our workflow and uploader. Details are here: https://wiki.librivox.org/index.php?title=Soloists:_How_to_update_the_Magic_Window
From now on, you can use your personal MW password for the uploader as well.
Let me know if you have questions or can't get in.

Other than this, we look good. You will need three digits in the filenames since you have more than 100 sections; and you probably should name both authors in the intro: by John Chalmers and Thomas Kirkman Dealy. Let me know if you'll take up Larry on his DPL offer so I can add him to the project as DPL.

That's pretty much it. Have fun!

Post by **DavidReader** » July 17th, 2019, 7:54 pm

There are new things to learn every day! I will surely take time to read carefully the link for being a BC. But in the meantime, I would like to proceed as far as I know how, and to tidy up things while I am stumbling along and to deal with any unforeseen problems as may arise in due course. Stay tuned for my stress calls!
As for Larry's offer, I can only say that he is welcome with open arms!
Let's see how it goes then...
Thank you all.
