
Re: TECH: schcompile (Was: More Þrjótran)

From: Philip Newton <philip.newton@...>
Date: Sunday, April 23, 2006, 13:06
On 4/23/06, Benct Philip Jonsson <bpj@...> wrote:
> Henrik Theiling skrev:
>
> > Because the file is translated to Perl with the literal strings left
> > as is, you can use whatever your Perl installation accepts. I.e.,
> > basically anything.
>
> OK, but how do I make Perl know that an input file is in UTF-8?
One question is: do you need Perl to know that an input file is in UTF-8? If it's just replacing bytes with other bytes, it doesn't really matter much whether Perl thinks of a string as "öå" (island stream?) or as "Ã¶Ã¥" -- it's all just bytes to Perl. So if the rule file is in UTF-8 and the input text is in UTF-8, too, it should just work, and the output should be in UTF-8, too, even if Perl is unaware of that fact.
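To illustrate the "just bytes" point, here is a minimal sketch, not schcompile's actual code. The rule (replace "ö" with "oe") and the variable names are made up for the example, and the strings are spelled out as raw UTF-8 bytes so that the result does not depend on how this script itself is saved.

```perl
use strict;
use warnings;

# Hypothetical rule: replace "ö" with "oe" (not taken from any real rule file).
my $from = "\xC3\xB6";          # the two UTF-8 bytes of "ö"
my $to   = "oe";

my $text = "\xC3\xB6\xC3\xA5";  # the four UTF-8 bytes of "öå"

# Literal byte-for-byte substitution; Perl never learns this is UTF-8.
(my $out = $text) =~ s/\Q$from\E/$to/g;

# $out is "oe" followed by the two bytes of "å", which is still valid UTF-8.
printf "%d bytes in, %d bytes out\n", length($text), length($out);
```

This only holds for literal replacements. A character class such as [öå] or a "." in a pattern would match single bytes rather than whole characters, and in that case Perl probably does need to know the encoding.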
> I tried with a simple program:
>
> while(<INFILE>){
>     chomp;
>     print length($_) . "\n";
> }
>
> Which printed 12 when the line in the input file was really
> six UTF-8 characters! So I guess there must be some way of
> telling Perl in what encoding INFILE is.
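The 12 is a byte count: each of those six characters presumably takes two bytes in UTF-8. Below is a small sketch that reproduces the effect and shows what changes when the file is read through a ':utf8' I/O layer. It assumes a perl with PerlIO layers (5.8.x should do); the sample line "öåöåöå" and the temporary file are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a sample line as raw UTF-8 bytes: "öåöåöå" (six characters, twelve bytes).
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode($tmp);
print {$tmp} "\xC3\xB6\xC3\xA5" x 3, "\n";
close $tmp or die "close: $!";

# Without a layer, Perl sees bytes, so length() counts bytes.
open my $raw, '<', $filename or die "open: $!";
chomp(my $bytes = <$raw>);
close $raw;

# With the :utf8 layer, the bytes are treated as UTF-8 and length() counts characters.
open my $in, '<:utf8', $filename or die "open: $!";
chomp(my $chars = <$in>);
close $in;

print length($bytes), " vs ", length($chars), "\n";   # prints "12 vs 6"
```

If the decoded text is printed again, the output handle wants a matching layer too, otherwise perl may warn about wide characters or write Latin-1 bytes.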
With newer perls (>= 5.6.3 or so, I think; 5.8.x should all be fine), I *think* that this could work:

    open INFILE, '<:utf8', 'filename';

and equivalently

    open OUTFILE, '>:utf8', 'otherfilename';

Alternatively,

    binmode(INFILE, ':utf8');

may help, as may

    use open ':utf8';

Running 'perldoc perluniintro' may help, and may provide some pointers to further documentation.

Lycka till,
-- 
Philip Newton <philip.newton@...>