[FIXED] Japanese Fairy Tales: Typo in section 10 title

Report & help check download problems, corrupted files, badly-named files, bad links etc. (NOT for style & reading complaints)
dekymo
Posts: 7
Joined: December 28th, 2020, 3:59 am

Post by dekymo »

In the audiobook "Japanese Fairy Tales" by Yei Theodora OZAKI (1871 - 1932)
https://librivox.org/japanese-fairy-tales-by-yei-theodora-ozaki/
there is a typo in the title of section 10: "The Mirror of Maysuyama"
should be "The Mirror of Matsuyama".

The MP3 file
https://www.archive.org/download/jap_fairytales_0808_librivox/japanesefairytales_10_ozaki.mp3
has an ID3v1.1 tag (at end of file) and an ID3v2.3 tag (at beginning of file).
The title error is present in both.
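For reference, here is a minimal Python sketch of where the two tags live, assuming the standard layouts. An ID3v2 tag starts at byte 0 with "ID3", two version bytes, a flags byte and a 4-byte "syncsafe" size (only 7 bits per byte count). An ID3v1/v1.1 tag is the fixed 128-byte block at the very end of the file, starting with "TAG". The function names are hypothetical, and the ID3v1.1 block in the usage lines is invented test data, not read from the real file.

```python
def syncsafe_to_int(raw: bytes) -> int:
    """Decode an ID3v2 syncsafe integer: only the low 7 bits of each byte count."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def find_id3_tags(data: bytes) -> dict:
    """Hypothetical helper: report the ID3v2 header and the ID3v1.1 title/track, if present."""
    found = {}
    # ID3v2 header: "ID3", major version, revision, flags, 4-byte syncsafe size
    if len(data) >= 10 and data[:3] == b"ID3":
        found["v2"] = {
            "version": f"2.{data[3]}.{data[4]}",
            "size": syncsafe_to_int(data[6:10]),  # excludes the 10-byte header itself
        }
    # ID3v1: the last 128 bytes of the file, beginning with "TAG"
    if len(data) >= 128 and data[-128:-125] == b"TAG":
        tag = data[-128:]
        title = tag[3:33].rstrip(b"\x00 ").decode("latin-1")  # 30-byte title field
        # ID3v1.1: a zero byte at offset 125 means offset 126 holds the track number
        track = tag[126] if tag[125] == 0 else None
        found["v1"] = {"title": title, "track": track}
    return found


# Synthetic example: the first 10 bytes of the hex dump further down this thread,
# some filler, then an invented ID3v1.1 block (title, track 11, genre byte 101)
header = bytes.fromhex("4944 3303 0000 0000 1131")
v1 = (b"TAG" + b"10 - The Mirror of Maysuyama".ljust(30, b"\x00")
      + b"\x00" * 93  # artist, album, year, 28-byte comment, zero byte
      + bytes([11, 101]))
print(find_id3_tags(header + b"\x00" * 200 + v1))
```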

Additionally, in the v2.3 tag, the artist & album tags are corrupted.
The corruption is visible in the preview/playlist window on the Internet Archive page: for section 10 (track 11), random Chinese characters appear instead of the author (artist) name.
knotyouraveragejo
LibriVox Admin Team
Posts: 22080
Joined: November 18th, 2006, 4:37 pm

Post by knotyouraveragejo »

Hi dekymo

Thanks for letting us know. I've fixed the typo on the catalog page. The problem with the ID3 tags is a little more complicated to fix. Will post when it is fixed. This is quite an old project cataloged in 2008. Was this the only file affected?
Jo
dekymo
Posts: 7
Joined: December 28th, 2020, 3:59 am

Post by dekymo »

knotyouraveragejo wrote: December 30th, 2020, 4:46 pm This is quite an old project cataloged in 2008. Was this the only file affected?
The artist/album corruption is only present in section 10 of the audiobook "Japanese Fairy Tales"; none of the other files are affected.

However, I checked sections by the same reader (Clarke Bell) in other audiobooks, and the same kind of ID3 tag corruption is present in some of their sections in other books:

section 84 ("The Villefort Family Vault") of The Count of Monte Cristo
https://librivox.org/the-count-of-monte-cristo-by-alexandre-dumas/

section 21 ("THE COUNTESS DE WINTER") of The Three Musketeers
https://librivox.org/the-three-musketeers-by-alexandre-dumas/

(section 37 in the same book, and the sections by Clarke Bell in other books look OK).
knotyouraveragejo
LibriVox Admin Team
Posts: 22080
Joined: November 18th, 2006, 4:37 pm

Post by knotyouraveragejo »

This was happening in the past for a while on some files, and if I remember correctly it had to do with the software used to add the ID3 tags to the original files. If you download the file and check the tags in Windows, they look fine, as does the archive.org metadata. I would have to download the files, replace the tags and reupload. This is a little more work for these older projects, since archive has changed how they derive the files since then.
Jo
knotyouraveragejo
LibriVox Admin Team
Posts: 22080
Joined: November 18th, 2006, 4:37 pm

Post by knotyouraveragejo »

The mp3 files are fixed for The Japanese Fairy Tales. The ogg file for section 10 is now missing, but I doubt if anyone will notice. Newer projects no longer have these files at all.

As for the others, I'll see about them sometime when I have time. The extra characters on the archive page do not affect streaming or downloading the files.
Jo
dekymo
Posts: 7
Joined: December 28th, 2020, 3:59 am

Post by dekymo »

knotyouraveragejo wrote: December 31st, 2020, 2:41 pm The mp3 files are fixed for The Japanese Fairy Tales. [...]
As for the others, I'll see about them sometime when I have time. The extra characters on the archive page do not affect streaming or downloading the files.
Great! I agree with you that the ID3 tag issue isn't a problem in practice for most people.

I tried to investigate the cause of the problem; I'm including what I found for the sake of completeness and for future reference. Feel free to ignore this post.

I downloaded japanesefairytales_10_ozaki.mp3 (the broken version) and looked at the ID3v2.3 tag data at the start of the file in a hex editor:

00000000: 4944 3303 0000 0000 1131 5441 4c42 0000  ID3......1TALB..
00000010: 002f 0000 01ff fefe ff00 4a00 6100 7000  ./........J.a.p.
00000020: 6100 6e00 6500 7300 6500 2000 4600 6100  a.n.e.s.e. .F.a.
00000030: 6900 7200 7900 2000 5400 6100 6c00 6500  i.r.y. .T.a.l.e.
00000040: 7300 0054 5045 3100 0000 2b00 0001 fffe  s..TPE1...+.....
00000050: feff 0059 0065 0069 0020 0054 0068 0065  ...Y.e.i. .T.h.e
00000060: 006f 0064 006f 0072 0061 0020 004f 007a  .o.d.o.r.a. .O.z
00000070: 0061 006b 0069 0000 5452 434b 0000 0006  .a.k.i..TRCK....
00000080: 0000 0031 312f 3232 5443 4f4e 0000 0006  ...11/22TCON....
00000090: 0000 0028 3130 3129 5449 5432 0000 001d  ...(101)TIT2....
000000a0: 0000 0031 3020 2d20 5468 6520 4d69 7272  ...10 - The Mirr
000000b0: 6f72 206f 6620 4d61 7973 7579 616d 6100  or of Maysuyama.

You can make sense of this using the ID3v2.3 specification: https://id3.org/id3v2.3.0

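Starting towards the end of the first line, after the 10-byte tag header ("ID3", version bytes 03 00 for ID3v2.3.0, a flags byte and the 4-byte tag size), comes the first frame: the frame ID TALB (album title), its size (00 00 00 2f = 47 bytes), two flag bytes (00 00), the text encoding byte 01 (Unicode, i.e. UTF-16 with a byte order mark), and then ff fe, a byte order mark (BOM) indicating little-endian. Thus you would expect the contents of the album title to follow in UTF-16-LE.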
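The same walk through the frames can be done programmatically. Below is a minimal Python sketch, assuming the ID3v2.3 layout from the specification linked above: after the 10-byte tag header, each frame has a 4-character ID, a 4-byte big-endian size (a plain integer in v2.3, not syncsafe) and 2 flag bytes. The bytes are copied from the hex dump above; because the dump stops at offset 0xbf, the walk stops there too. `walk_frames` is a hypothetical name, not part of any tagging library.

```python
import struct

# First 192 bytes of the broken japanesefairytales_10_ozaki.mp3, copied from the dump
DUMP = bytes.fromhex(
    "4944 3303 0000 0000 1131 5441 4c42 0000 "
    "002f 0000 01ff fefe ff00 4a00 6100 7000 "
    "6100 6e00 6500 7300 6500 2000 4600 6100 "
    "6900 7200 7900 2000 5400 6100 6c00 6500 "
    "7300 0054 5045 3100 0000 2b00 0001 fffe "
    "feff 0059 0065 0069 0020 0054 0068 0065 "
    "006f 0064 006f 0072 0061 0020 004f 007a "
    "0061 006b 0069 0000 5452 434b 0000 0006 "
    "0000 0031 312f 3232 5443 4f4e 0000 0006 "
    "0000 0028 3130 3129 5449 5432 0000 001d "
    "0000 0031 3020 2d20 5468 6520 4d69 7272 "
    "6f72 206f 6620 4d61 7973 7579 616d 6100"
)


def syncsafe_to_int(raw: bytes) -> int:
    """Decode an ID3v2 syncsafe integer (7 significant bits per byte)."""
    value = 0
    for byte in raw:
        value = (value << 7) | (byte & 0x7F)
    return value


def walk_frames(tag: bytes) -> list:
    """Hypothetical helper: list (frame_id, size, body) for each ID3v2.3 frame."""
    if tag[:3] != b"ID3" or tag[3] != 3:
        raise ValueError("not an ID3v2.3 tag")
    # Stop at the declared tag end, or at the end of the bytes we actually have
    end = min(10 + syncsafe_to_int(tag[6:10]), len(tag))
    frames, pos = [], 10
    while pos + 10 <= end:
        frame_id = tag[pos:pos + 4]
        if frame_id[0] == 0:  # zero bytes mean padding: no more frames
            break
        size, _flags = struct.unpack(">IH", tag[pos + 4:pos + 10])
        frames.append((frame_id.decode("ascii"), size, tag[pos + 10:pos + 10 + size]))
        pos += 10 + size
    return frames


for frame_id, size, body in walk_frames(DUMP):
    print(frame_id, size, body[:5].hex(" "))
# TALB 47 01 ff fe fe ff
# TPE1 43 01 ff fe fe ff
# TRCK 6 00 31 31 2f 32
# TCON 6 00 28 31 30 31
# TIT2 29 00 31 30 20 2d
```

The first five body bytes make the problem visible at a glance: both UTF-16 frames (TALB and TPE1) start with the encoding byte 01 followed by two BOMs, while the ISO-8859-1 frames (encoding byte 00) are unaffected.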

Instead you get what looks like another byte order mark (fe ff), which would indicate UTF-16 big-endian, followed by the album title in big-endian encoding (00 4a 00 61 00 70 ... = "Jap"...).

Thus the correct data is there, in UTF-16-BE, but it's preceded by a bogus UTF-16-LE byte order mark, which causes the contents of the field to be misinterpreted.
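To make the diagnosis concrete, here is a minimal Python sketch of how a reader that follows the specification decodes a text frame with encoding byte 01, and how a more tolerant reader could recover the intended text by skipping the redundant first BOM. The frame body is rebuilt to match the dump (01, ff fe, fe ff, the album name in UTF-16-BE, a two-byte terminator). `decode_text_frame` and its `skip_extra_bom` option are hypothetical, not the behaviour of any particular player or tagging library.

```python
BOMS = (b"\xff\xfe", b"\xfe\xff")


def decode_text_frame(body: bytes, skip_extra_bom: bool = False) -> str:
    """Hypothetical helper: decode an ID3v2.3 text frame body (encoding 0 or 1)."""
    encoding, raw = body[0], body[1:]
    if encoding == 0:  # ISO-8859-1, possibly null-terminated
        return raw.rstrip(b"\x00").decode("latin-1")
    if encoding != 1:
        raise ValueError(f"unexpected text encoding byte {encoding:#04x}")
    # Encoding 1: UTF-16 with a BOM; drop the 2-byte null terminator if present
    if len(raw) % 2 == 0 and raw.endswith(b"\x00\x00"):
        raw = raw[:-2]
    # Repair: if a second BOM directly follows the first, trust the second one
    if skip_extra_bom and raw[:2] in BOMS and raw[2:4] in BOMS:
        raw = raw[2:]
    # Python's "utf-16" codec reads the leading BOM, picks the byte order, drops it
    return raw.decode("utf-16")


# TALB body as found in the broken file: 01, ff fe, fe ff, UTF-16-BE text, 00 00
TALB_BODY = b"\x01\xff\xfe\xfe\xff" + "Japanese Fairy Tales".encode("utf-16-be") + b"\x00\x00"

print(decode_text_frame(TALB_BODY, skip_extra_bom=True))  # Japanese Fairy Tales
strict = decode_text_frame(TALB_BODY)
print([f"U+{ord(c):04X}" for c in strict[:3]])  # ['U+FFFE', 'U+4A00', 'U+6100']
```

Skipping a second BOM is a workaround for this particular corruption, not something the ID3v2.3 specification asks readers to do; the real fix is to rewrite the frame with a single BOM.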

Specifically, the next four bytes (fe ff 00 4a) are interpreted as the little-endian code units U+FFFE and U+4A00, which corresponds to what I was seeing displayed in place of the album name: a glyph representing the BOM, followed by a string of random Chinese characters, starting with 䨀.
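A quick check of that reading in Python: decoding the four bytes fe ff 00 4a as UTF-16-LE (which is what the genuine ff fe mark at the start of the field tells a reader to do) gives the code points U+FFFE and U+4A00, and the standard `unicodedata` module confirms that the second is a CJK ideograph. This only reproduces the byte arithmetic; how a given player renders U+FFFE is up to that player.

```python
import unicodedata

# The bytes that follow the genuine little-endian BOM (ff fe) in the TALB frame
chunk = bytes.fromhex("fe ff 00 4a")
misread = chunk.decode("utf-16-le")  # each pair read low byte first: 0xFFFE, 0x4A00

print([f"U+{ord(c):04X}" for c in misread])  # ['U+FFFE', 'U+4A00']
print(unicodedata.name(misread[1]))           # CJK UNIFIED IDEOGRAPH-4A00
```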

I haven't looked at the other MP3 files, but I imagine the cause is the same there too.
Peter Why
Posts: 5815
Joined: November 24th, 2005, 3:54 am
Location: Chigwell (North-East London, U.K.)

Post by Peter Why »

I'm glad that there's a sensible explanation for the Chinese ID3 display on archive.org. It's fairly common. The opening screen for my recording of Alice has it, too, and I've seen others: https://archive.org/details/alice_wonderland_0711_librivox

Peter
"I think, therefore I am, I think." Solomon Cohen, in Terry Pratchett's Dodger
annise
LibriVox Admin Team
Posts: 38572
Joined: April 3rd, 2008, 3:55 am
Location: Melbourne,Australia

Post by annise »

It is only on some projects catalogued before our software update; we add the ID tags in the validator, so the different character sets used do not get to Archive.

Anne