If you use NewsFire to export OPML, you may run into some trouble because there are a few things wrong with the export. In particular, even though the XML header indicates that the character encoding is ISO-8859-1:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Generated by NewsFire 67 -->
<!-- http://www.NewsFireRSS.com/ -->
<opml version="1.1">
...
the file is, in fact, encoded as UTF-16 (or some other two-byte encoding). You can see this in the output of od -a on the OPML file:
0000000 ff fe < nul ? nul x nul m nul l nul sp nul v nul
0000020 e nul r nul s nul i nul o nul n nul = nul " nul
0000040 1 nul . nul 0 nul " nul sp nul e nul n nul c nul
The two bytes "ff fe" at the start of the file are the BOM, or byte order mark; in that order they indicate little-endian UTF-16. That's the first clue that this is a two-byte encoding. Next, you'll see that every other byte is a NUL. That's because UTF-16 stores each ASCII character in two bytes and keeps a NUL in the high byte. Anyway, all of this doesn't mean much other than that this file is not, as previously indicated, ISO-8859-1, which is a one-byte encoding.
To fix it, make use of the lovely utility "iconv", which comes standard on most Unixes (and the Mac). "-f" means "from this encoding" and "-t" means "to this encoding".
stechert@kirin:~/Desktop [1040] $ iconv -f UTF-16 -t UTF-8 My\ NewsFire\ Feeds.opml > My\ NewsFire\ Feeds2.opml
stechert@kirin:~/Desktop [1041] $ od -a My\ NewsFire\ Feeds2.opml
0000000 < ? x m l sp v e r s i o n = " 1
0000020 . 0 " sp e n c o d i n g = " I S
0000040 O - 8 8 5 9 - 1 " ? > nl < ! - -
The bytes are now UTF-8, but the header still claims ISO-8859-1. Changing the indicated encoding by replacing the string "ISO-8859-1" in the header with "UTF-8" gets us a properly encoded XML file.
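If you'd rather do both steps (the re-encoding and the header fix) in one go, something like the following Python sketch should work. It is not what I ran above, just an illustration: the function name reencode_opml is made up, and it assumes that a file without a UTF-16 BOM really is ISO-8859-1 as declared.

```python
import codecs
import re


def reencode_opml(raw: bytes) -> bytes:
    """Return the OPML as UTF-8 bytes with a matching XML declaration.

    Hypothetical helper: input with a UTF-16 BOM is decoded as UTF-16;
    anything else is ASSUMED to be ISO-8859-1, as the header claims.
    """
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        # The generic "utf-16" codec reads the BOM to pick the byte
        # order and strips it from the decoded text.
        text = raw.decode("utf-16")
    else:
        text = raw.decode("iso-8859-1")

    # Rewrite only the encoding attribute of the XML declaration.
    text = re.sub(
        r'(<\?xml[^>]*encoding=")[^"]*(")',
        r"\g<1>UTF-8\g<2>",
        text,
        count=1,
    )
    return text.encode("utf-8")
```

To use it, read the exported file in binary mode, pass the bytes to reencode_opml, and write the result to a new file, also in binary mode.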
Now the only remaining problem is that OPML requires a head element within the OPML tag, so go add that. The head of your file should now look like this:
<?xml version="1.0" encoding="UTF-8"?>
<!-- Generated by NewsFire 67 -->
<!-- http://www.NewsFireRSS.com/ -->
<opml version="1.1">
<head/>
<body>
...
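Adding the head element by hand is easy enough, but here is a rough Python sketch of the same fix using the standard library's xml.etree.ElementTree. The function name ensure_head is invented for illustration, and the sketch assumes the file has already been re-encoded and its header corrected as described above.

```python
import xml.etree.ElementTree as ET


def ensure_head(opml_bytes: bytes) -> bytes:
    """Insert an empty <head/> as the first child of <opml> if missing.

    Hypothetical helper; assumes the input is well-formed XML whose
    declared encoding matches its actual bytes.
    """
    root = ET.fromstring(opml_bytes)
    if root.tag != "opml":
        raise ValueError("root element is not <opml>")
    if root.find("head") is None:
        # OPML expects <head> to come before <body>.
        root.insert(0, ET.Element("head"))
    return ET.tostring(root, encoding="utf-8", xml_declaration=True)
```

One caveat: ElementTree discards the comments that precede the root element, so the "Generated by NewsFire" lines will not survive the round trip. If you want to keep them, edit the file by hand instead.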
Having come this far, your OPML file should now validate and you could, e.g., use it to upload a news filter to TailRank, if you wanted to give it a try (instead of getting error messages about how brokenly formatted your OPML is). It's annoying that David Watanabe gets this wrong. And it was a missed opportunity-to-impress that the burtonator's code didn't handle this kind of stuff automatically. But then again, neither does Google Reader or Bloglines.
Thanks a bunch for providing a quick solution to my problem and teaching how it's done in the process. Great post!
Posted by: twitter.com/banton | September 24, 2009 at 12:33 PM