Re: TECH: UTF-8 and schcompile

From:	Paul Bennett <paul-bennett@...>
Date:	Monday, April 24, 2006, 1:30


On Sun, 23 Apr 2006 21:06:57 -0400, Henrik Theiling <theiling@...>
wrote:

> Hi!
>
> Paul Bennett <paul-bennett@...> writes:
>> Apparently, it ain't as simple as all that.
>>
>> I tried it with a UTF-8 sch file.
>>
>> The error was:
>> Error: The first line should read something like '#!...sch...'
>>
>> so I erased the UTF-8 BOM at the start of the file.
>
> I suspect I could skip the BOM, but the idea is that #! is at the
> beginning of the file, of course, just like scripts under Unix.
I don't know whether, or how, the various Unices skip the BOM. It makes
sense never to ignore it in data files, but it makes a certain kind of
DWIMmish sense for shells, script interpreters, and other languages to
detect it and either work around it or even make use of it. I may have a
play with a few OSes tomorrow and report back.
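
For anyone who wants to try the skip-the-BOM idea in the meantime, here is a
minimal sketch, assuming the file is read in plain byte mode. strip_bom is a
hypothetical helper of mine, not anything in schcompile, and both the
'#!/usr/bin/sch' line and the regex are only guesses at the kind of check the
script makes on its first line.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of schcompile): remove a leading UTF-8 BOM
# from a line that was read in plain byte mode.
sub strip_bom {
    my ($line) = @_;
    $line =~ s/\A\xEF\xBB\xBF//;    # the UTF-8 BOM is the bytes EF BB BF
    return $line;
}

# Assumed example input: a first line with a BOM in front of the '#!'.
my $first = strip_bom("\xEF\xBB\xBF#!/usr/bin/sch\n");

# Assumed check, modelled loosely on the error message quoted above.
print "first line looks like a sch file\n" if $first =~ /\A#!.*sch/;
```

A line without a BOM passes through strip_bom unchanged, so the helper is
safe to call on every file.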

I suspect Perl's default will match the underlying OS's default. My
problem may stem from my using Cygwin Perl (which thinks it's running on
top of Linux) on top of Windows (which is not Linux)...

>> I also edited the "open" line to:
>> open (F, '<:utf8', "$file") or error "While trying to read '$file': $!";
>
> My idea was it should work in most cases when read in normal
> UTF8-unaware 8-bit mode, provided both .sch and input files are read
> this way.  UTF8 is unambiguous in matching, so any multichar phoneme
> you define should just match a multichar sequence in the input string.
Yes.

> The only problem would be if you used a single . to match one
> character -- this would not work, since it would match parts of UTF8
> sequences.
Indeed.
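
A small sketch of both points, assuming the two-byte UTF-8 encoding of "æ"
as the sample phoneme (my choice of example, not something from a real .sch
file). first_dot is a hypothetical helper that returns whatever a single .
matches at the start of a string.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: return what a single '.' matches at the start of $s.
sub first_dot {
    my ($s) = @_;
    my ($m) = $s =~ /\A(.)/s;
    return $m;
}

my $bytes = "\xC3\xA6x";                # UTF-8 bytes for "æ" followed by "x"
my $chars = decode('UTF-8', $bytes);    # the same text as decoded characters

# A literal multi-byte phoneme still matches in byte mode.
print "literal match works in byte mode\n" if $bytes =~ /\A\xC3\xA6/;

# In byte mode '.' matches only the lead byte 0xC3 of the sequence.
printf "byte mode '.' matched 0x%02X\n", ord(first_dot($bytes));

# After decoding, '.' matches the whole character U+00E6.
printf "character mode '.' matched U+%04X\n", ord(first_dot($chars));
```

So literal phoneme strings are fine either way, and only patterns that rely
on . meaning "one character" need the decoded form.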

> Maybe I'll try to quick-fix the BOM problem tomorrow if I find the time.
Don't strain yourself on my behalf. I'll gladly use a workaround, though a
better workaround than VIM may be needed for some people. We're a smart
bunch, though, so we should be able to come up with something.

Paul
