I am also not (yet) interested in tag cleaning, mass renaming or categorisation software; I first have to do the aforementioned normalisation step.

You want Ex Falso, the tag editor included in the Quod Libet project. Picard (the MusicBrainz tagger) may use the same tagging library, but QL originated it.

In particular, you want the Mutagen tagging library, which supports ID3v2.4 (and by "support" I mean "enforce"). It is also excellent with character encodings, and includes a basic scriptable command-line tagger (mid3v2). As far as your normalisation step goes, Mutagen only saves tags in ID3v2.4. It is certainly capable of converting all text into UTF-8, but you may need to script that yourself (I believe that the mid3v2 tool's defaults are to keep the current encoding where possible, and I don't know if it can be told to save everything in a particular encoding). Mutagen is written in Python.

Ex Falso is a nice, clean GUI, and supports most of the major retag-multiple-files features you'd expect. I don't think it does much in the way of internet lookups, and I don't know how it is with album artwork. Quod Libet may support that; Ex Falso can do it with a plugin, should one exist, though one might not exist. I've never needed that functionality: I use EF and mid3v2 in concert to handle my retagging needs.

I don't think you're going to find a standalone application that will fix up your particular selection of incorrectly-tagged encodings. Having a mixture of cp1252, UTF-16 and GB18030 is quite unusual and I don't think existing software will be able to solve that automatically. So I'd download Mutagen and write a custom Python script to automate your own decisions about how to fix up unknown encodings. For example:

    musicroot= ur'C:\music\wonky'
    ...
    if value.encoding!=3 and isinstance(getattr(value, 'text', ), unicode):
        bytes= '\n'.join(value.text).encode('iso-8859-1')
    ...
    raise ValueError('None of the tryencodings work for %r key %r' % (path, key))

The above script makes a few assumptions:

1. Only the tags marked as being in encoding 0 are wrong. (Ostensibly encoding 0 is ISO-8859-1, but in practice it is often a Windows default code page.)

2. If a tag is marked as being in UTF-8 or a UTF-16 encoding it's assumed to be correct, and simply converted to UTF-8 if it isn't already. Personally I haven't seen ID3s marked as UTF (encodings 1-3) in error before. Luckily encoding 0 is easy to recover into its original bytes, since ISO-8859-1 is a 1-to-1 direct mapping of the ordinal byte values.

3. When an encoding 0 tag is met, the script attempts to recast it as GB18030 first, then if it's not valid falls back to code page 1252. Single-byte encodings like cp1252 will tend to match most byte sequences, so it's best to put them at the end of the list of encodings to try.
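The recovery step described above (turn encoding 0 text back into its original bytes, then try GB18030 before cp1252) can be sketched in Python 3 using only the standard library. This is a minimal sketch rather than the script from the answer: `recover_text` and `TRY_ENCODINGS` are hypothetical names, and the input is assumed to be a string that was produced by decoding the tag's raw bytes as ISO-8859-1.

```python
# Hypothetical names; candidate encodings in the order suggested above:
# the stricter multi-byte encoding first, the permissive single-byte one last.
TRY_ENCODINGS = ("gb18030", "cp1252")


def recover_text(misdecoded, try_encodings=TRY_ENCODINGS):
    """Return (text, encoding) for a string wrongly decoded as ISO-8859-1."""
    # ISO-8859-1 maps code points 0-255 straight onto byte values,
    # so encoding back to it restores the original bytes exactly.
    raw = misdecoded.encode("iso-8859-1")
    for encoding in try_encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeError:
            continue  # not valid in this encoding, try the next candidate
    raise ValueError("None of the try_encodings work for %r" % (misdecoded,))


if __name__ == "__main__":
    # Simulate a GB18030 title that was stored under encoding 0.
    wrong = "中文".encode("gb18030").decode("iso-8859-1")
    print(recover_text(wrong))

    # A cp1252 title ending in an accented letter is not valid GB18030,
    # so the fallback encoding is used instead.
    print(recover_text("café"))
```

With Mutagen, the recovered strings would presumably be written back to the frame's `text` and the frame's `encoding` set to 3 (UTF-8) before saving. Note that the guess can still be wrong: some cp1252 byte sequences also happen to be valid GB18030, which is why the assumptions listed above matter.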