Re: TECH: UTF-8 and schcompile
From: Benct Philip Jonsson <bpj@...>
Date: Monday, April 24, 2006, 9:16
Paul Bennett skrev:
> Apparently, it ain't as simple as all that.
>
> I tried it with a UTF-8 sch file.
>
> The error was:
> Error: The first line should read something like '#!...sch...'
>
> so I erased the UTF-8 BOM at the start of the file.
>
> I also edited the "open" line to:
> open (F, '<:utf8', "$file") or error "While trying to read '$file': $!";
That's the expected thing, IIUC. Unfortunately you have to use that
'<:utf8' when opening a UTF-8 file, or perl won't know that your
input is UTF-8. See
<http://perldoc.perl.org/perluniintro.html#Unicode-I%2fO>
# Reading in a file that you know happens to be encoded in
# one of the Unicode or legacy encodings does not
# magically turn the data into Unicode in Perl's eyes. To
# do that, specify the appropriate layer when opening
# files
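To make the point of that passage concrete, here is a minimal sketch of what the layer changes. The helper name first_line is made up for illustration and is not part of schcompile; it just reads the first line of a file through whichever mode you hand it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of schcompile): read the first line of
# $file using the given open mode, e.g. '<' for raw bytes or '<:utf8'
# for decoded characters.
sub first_line {
    my ($mode, $file) = @_;
    open(my $fh, $mode, $file) or die "While trying to read '$file': $!";
    my $line = <$fh>;
    close $fh;
    chomp $line if defined $line;
    return $line;
}
```

For a file whose first line is a single "é" stored as the UTF-8 bytes C3 A9, length(first_line('<', $file)) comes out as 2, while length(first_line('<:utf8', $file)) comes out as 1, because only the second call tells perl to decode the bytes into characters.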
>
> since without the BOM, there's no defined way to identify a Unicode
> file (absent some external data).
>
> This made it kinda work, though I apparently have syntax errors out the
> wazzoo in my sch file. It works on a single-step single-rule file,
> though, so that's something.
>
> NOTE to Windows users: Notepad, Wordpad, and every standard Windows tool
> I have tried all fail to show the BOM or any sign of its existence.
> While this is technically correct behavior, it's not very helpful in
> this case. I used Cygwin VIM to "repair" the file, but there's a
> learning curve associated with it, and you have to carry out the repair
> steps at every iteration.
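Can't you use a Perl s/// statement to remove the BOM? I would imagine
it is just a matter of knowing what it looks like in the encoding you
are using, and removing it if it appears at the beginning of the file?
(As raw UTF-8 bytes it is EF BB BF; once the file is read through
'<:utf8' it should be the single character U+FEFF.)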
See <http://www.unicode.org/faq/utf_bom.html#25>
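A minimal sketch of that idea, assuming the file is opened with the '<:utf8' layer as above. The sub name read_utf8_lines is hypothetical; schcompile's own reading code would have to be adapted along these lines rather than replaced by it.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of schcompile): return the lines of a
# UTF-8 file, with a leading BOM removed from the first line if present.
sub read_utf8_lines {
    my ($file) = @_;
    open(my $fh, '<:utf8', $file) or die "While trying to read '$file': $!";
    my @lines = <$fh>;
    close $fh;
    # After decoding through :utf8, the BOM is the one character U+FEFF.
    $lines[0] =~ s/^\x{FEFF}// if @lines;
    return @lines;
}
```

With the BOM stripped this way, a check that the first line starts with '#!' should see the '#!' itself, so the file would not need to be repaired by hand in an editor at every iteration.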
> How have y'all managed to produce and use UTF-8 sch files in Windows?
Haven't tried yet. I'll try with UTF-8 *output*, and the BOM problem
shouldn't appear there (though perhaps I should put in the BOM?)
Apparently one should open the output file like this:
open OUT, ">:utf8", "file";
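If a BOM turns out to be wanted on output, something like the following sketch should do it. The sub name write_utf8 and its $with_bom flag are made up for illustration; nothing like them exists in schcompile as far as I know.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of schcompile): write @lines to $file
# as UTF-8, optionally preceded by a BOM.
sub write_utf8 {
    my ($file, $with_bom, @lines) = @_;
    open(my $out, '>:utf8', $file) or die "While trying to write '$file': $!";
    # Printed through the :utf8 layer, U+FEFF becomes the bytes EF BB BF.
    print {$out} "\x{FEFF}" if $with_bom;
    print {$out} @lines;
    close $out or die "While trying to close '$file': $!";
}
```

Calling write_utf8("file", 1, @lines) would put the BOM in for Windows tools that look for it, and write_utf8("file", 0, @lines) would leave it out.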
--
/BP 8^)>
--
Benct Philip Jonsson -- melroch at melroch dot se
"Maybe" is a strange word. When mum or dad says it
it means "yes", but when my big brothers say it it
means "no"!
(Philip Jonsson jr, age 7)