Re: TECH: UTF-8 and schcompile
From: Henrik Theiling <theiling@...>
Date: Monday, April 24, 2006, 1:07
Hi!
Paul Bennett <paul-bennett@...> writes:
> Apparently, it ain't as simple as all that.
>
> I tried it with a UTF-8 sch file.
>
> The error was:
> Error: The first line should read something like '#!...sch...'
>
> so I erased the UTF-8 BOM at the start of the file.
I suspect I could skip the BOM, but the idea is that #! is at the beginning
of the file, of course, just like scripts under Unix.
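A minimal sketch of what skipping the BOM before the #! check might look
like, written in Python for illustration rather than in Perl. The function
names and the exact header test are assumptions modeled on the error
message quoted above, not the actual schcompile code.

```python
# The UTF-8 encoding of U+FEFF, which some editors write at the start of a file.
UTF8_BOM = b"\xef\xbb\xbf"

def strip_bom(first_line: bytes) -> bytes:
    """Return first_line without a leading UTF-8 BOM, if one is present."""
    if first_line.startswith(UTF8_BOM):
        return first_line[len(UTF8_BOM):]
    return first_line

def looks_like_sch_header(first_line: bytes) -> bool:
    """Hypothetical check: the line starts with '#!' and mentions 'sch'."""
    line = strip_bom(first_line)
    return line.startswith(b"#!") and b"sch" in line
```

With this, a first line of BOM followed by '#!...sch...' would be accepted,
while a file that does not start with #! is still rejected.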
> I also edited the "open" line to:
> open (F, '<:utf8', "$file") or error "While trying to read '$file': $!";
My idea was that it should work in most cases when read in normal,
UTF-8-unaware 8-bit mode, provided both the .sch file and the input
files are read this way. UTF-8 is unambiguous in matching, so any
multichar phoneme you define should just match the same multichar
sequence in the input string. The only problem would be if you used a
single . to match one character: this would not work, since in 8-bit
mode it would match a single byte, which is only part of a multi-byte
UTF-8 sequence.
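To illustrate the point, here is a small sketch in Python (not Perl, and
not taken from schcompile) that treats UTF-8 text as plain bytes. The
sample string and variable names are made up for the example.

```python
import re

# Sample input read in 8-bit mode: 'þa' is three bytes in UTF-8.
text = "þa".encode("utf-8")        # b'\xc3\xbea'
phoneme = "þ".encode("utf-8")      # b'\xc3\xbe', a two-byte sequence

# A literal phoneme still matches: it is just a fixed byte sequence.
literal_match = re.match(re.escape(phoneme), text).group(0)

# A single '.' in byte mode consumes only one byte, i.e. half of 'þ'.
dot_in_bytes = re.match(rb".", text).group(0)

# The same '.' on decoded text consumes the whole character.
dot_in_chars = re.match(r".", text.decode("utf-8")).group(0)
```

Here literal_match is the full two-byte phoneme, dot_in_bytes is only the
first byte of it, and dot_in_chars is the complete character.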
Maybe I'll try to quick-fix the BOM problem tomorrow if I find the time.
**Henrik