Re: TECH: schcompile (Was: More Þrjótran)
From: Philip Newton <philip.newton@...>
Date: Sunday, April 23, 2006, 13:06
On 4/23/06, Benct Philip Jonsson <bpj@...> wrote:
> Henrik Theiling skrev:
> > Because the file is translated to Perl with the literal strings left
> > as is, you can use whatever your Perl installation accepts. I.e.,
> > basically anything.
> OK, but how do I make Perl know that an input file is in UTF-8?
One question is: do you need Perl to know that an input file is in UTF-8?
If it's just replacing bytes with other bytes, it doesn't really
matter much whether Perl thinks of a string as "öå" (island stream?)
or as "Ã¶Ã¥" -- it's all just bytes to Perl. So if the rule file is in
UTF-8 and the input text is in UTF-8, too, it should just work, and
the output should be in UTF-8, too, even if Perl is unaware of that.
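
To illustrate the "it's all just bytes" point, here is a minimal sketch. The sample text and the rule (replace "ö" with "oe") are made up for the example and have nothing to do with schcompile's actual rule format; the UTF-8 bytes are written as escapes so the snippet does not depend on how this message itself is encoded.

```perl
use strict;
use warnings;

# No "use utf8" and no I/O layers: every string here is a plain byte string.
my $o_uml  = "\xC3\xB6";   # UTF-8 bytes for "ö"
my $a_ring = "\xC3\xA5";   # UTF-8 bytes for "å"

# Hypothetical input line, already UTF-8 encoded.
my $text = "l${o_uml}v och bl${a_ring}b${a_ring}r";

# Hypothetical rule: replace "ö" with "oe". Perl only compares bytes here.
(my $out = $text) =~ s/\Q$o_uml\E/oe/g;

# The "å" bytes pass through untouched, so the output is still valid UTF-8.
print "$out\n";
```

This only holds as long as the rules are literal strings. Anything that counts or matches "one character" (length, a regex ".", a character class) will see single bytes instead, which is where the undecoded approach stops being safe.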
> I tried with a simple program:
> print length($_) . "\n";
> Which printed 12 when the line in the input file was really
> six UTF-8 characters! So I guess there must be some way of
> telling Perl in what encoding INFILE is.
With newer perls (>= 5.6.3 or so, I think; 5.8.x should all be fine),
I *think* that this could work:
open INFILE, '<:utf8', 'filename';
open OUTFILE, '>:utf8', 'otherfilename';
Alternatively, calling binmode(INFILE, ':utf8'); may help, as may the
pragma "use open ':utf8';". Running 'perldoc perluniintro' may also
help, and may provide some pointers to further documentation.
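
Here is a small sketch of the difference the layer makes to length(), assuming a perl with PerlIO layers (5.8.x). The temporary file and its contents are invented for the example: it holds "öåöåöå", i.e. six characters of two bytes each, with no trailing newline.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway file holding six two-byte UTF-8 characters,
# written as raw bytes.
my ($tmp, $filename) = tempfile(UNLINK => 1);
binmode $tmp;
print {$tmp} "\xC3\xB6\xC3\xA5" x 3;
close $tmp or die "close: $!";

# Without a layer, Perl reads bytes.
open my $raw, '<', $filename or die "open: $!";
my $bytes = <$raw>;
close $raw;

# With the :utf8 layer, Perl decodes to characters on input.
open my $in, '<:utf8', $filename or die "open: $!";
my $chars = <$in>;
close $in;

print length($bytes), "\n";   # 12
print length($chars), "\n";   # 6
```

If malformed input is a concern, the layer ':encoding(UTF-8)' can be used in place of ':utf8'; as far as I know it validates what it reads, whereas ':utf8' just trusts the data.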
Philip Newton <philip.newton@...>