
Re: TECH: UTF-8 and schcompile

From: Benct Philip Jonsson <bpj@...>
Date: Monday, April 24, 2006, 9:16
Paul Bennett skrev:
> Apparently, it ain't as simple as all that.
>
> I tried it with a UTF-8 sch file.
>
> The error was:
> Error: The first line should read something like '#!...sch...'
>
> so I erased the UTF-8 BOM at the start of the file.
>
> I also edited the "open" line to:
> open (F, '<:utf8', "$file") or error "While trying to read '$file': $!";
That's the expected thing, IIUC. Unfortunately you have to use that '<:utf8' layer when opening a UTF-8 file, or Perl won't know that your input is UTF-8. See <http://perldoc.perl.org/perluniintro.html#Unicode-I%2fO>:

# Reading in a file that you know happens to be encoded in
# one of the Unicode or legacy encodings does not
# magically turn the data into Unicode in Perl's eyes. To
# do that, specify the appropriate layer when opening
# files
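
To make the effect of the layer concrete, here is a minimal sketch. The sample text and the temporary file are made up for illustration, and it uses ':encoding(UTF-8)', the stricter relative of the ':utf8' layer quoted above. It writes a short string through the layer, then reads it back once without a layer and once with it:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical sample text: "e-acute" is one character but two bytes in UTF-8.
my $text = "caf\x{e9}";

# Temporary file used only for this demonstration.
my ($tmp, $path) = tempfile(UNLINK => 1);
close $tmp;

# Write through an encoding layer so the characters are stored as UTF-8.
open(my $out, '>:encoding(UTF-8)', $path) or die "Cannot write '$path': $!";
print {$out} $text;
close $out;

# Without a layer, Perl hands back the raw bytes.
open(my $raw, '<', $path) or die "Cannot read '$path': $!";
my $bytes = <$raw>;
close $raw;

# With the layer, the bytes are decoded into characters.
open(my $in, '<:encoding(UTF-8)', $path) or die "Cannot read '$path': $!";
my $chars = <$in>;
close $in;

printf "without layer: %d, with layer: %d\n", length $bytes, length $chars;
```

Assuming no default I/O layers have been set elsewhere (for example via PERL_UNICODE), the first length counts bytes and the second counts characters, which is exactly the difference the perldoc passage is describing.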
>
> since without the BOM, there's no defined way to identify a Unicode
> file (absent some external data).
>
> This made it kinda work, though I apparently have syntax errors out the
> wazzoo in my sch file. It works on a single-step single-rule file,
> though, so that's something.
>
> NOTE to Windows users: Notepad, Wordpad, and every standard Windows tool
> I have tried all fail to show the BOM or any sign of its existence.
> While this is technically correct behavior, it's not very helpful in
> this case. I used Cygwin VIM to "repair" the file, but there's a
> learning curve associated with it, and you have to carry out the repair
> steps at every iteration.
Can't you use a Perl s/// statement to remove the BOM? I would imagine it is just a matter of knowing what it looks like in the encoding you are using, and removing it if it appears at the beginning of the file. See <http://www.unicode.org/faq/utf_bom.html#25>
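
A rough sketch of what such an s/// might look like, untested against schcompile itself. The helper names strip_bom and strip_bom_bytes and the sample first line are made up for illustration. Once the input has been decoded through a UTF-8 layer the BOM is the single character U+FEFF; in raw, undecoded UTF-8 it is the three bytes EF BB BF:

```perl
use strict;
use warnings;

# Hypothetical helper: drop a leading BOM from an already-decoded string.
sub strip_bom {
    my ($s) = @_;
    $s =~ s/\A\x{FEFF}//;    # only at the very start of the string
    return $s;
}

# Hypothetical helper: the same for raw, undecoded UTF-8 bytes.
sub strip_bom_bytes {
    my ($s) = @_;
    $s =~ s/\A\xEF\xBB\xBF//;
    return $s;
}

# Made-up first line, standing in for the '#!...sch...' line of a sch file.
my $first_line = "\x{FEFF}#!sch";
my $cleaned    = strip_bom($first_line);

print $cleaned eq "#!sch" ? "BOM removed\n" : "BOM still present\n";
```

Applied to the first line read from the file, before the '#!' check, this would presumably save having to repair the file by hand at every iteration.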
> How have y'all managed to produce and use UTF-8 sch files in Windows?
Haven't tried yet. I'll try with UTF-8 *output*, and the BOM problem shouldn't appear there (though perhaps I should put in the BOM?). Apparently one should open the output file like this:

open OUT, ">:utf8", "file";

--
/BP 8^)>
--
Benct Philip Jonsson -- melroch at melroch dot se

"Maybe" is a strange word. When mum or dad says it it means "yes", but when my big brothers say it it means "no"! (Philip Jonsson jr, age 7)
