The following introduction is taken from the 'Unicode' section of perluniintro:
Unicode is a character set standard which plans to codify all of the writing systems of the world, plus many other symbols.
Unicode and ISO/IEC 10646 are coordinated standards that provide code points for characters in almost all modern character set standards, covering more than 30 writing systems and hundreds of languages, including all commercially-important modern languages.
All characters in the largest Chinese, Japanese, and Korean dictionaries are also encoded. The standards will eventually cover almost all characters in more than 250 writing systems and thousands of languages. Unicode 1.0 was released in October 1991, and 4.0 in April 2003.
A Unicode character is an abstract entity. It is not bound to any particular integer width, especially not to the C language char.
Unicode is language-neutral and display-neutral: it does not encode the language of the text and it does not generally define fonts or other graphical layout details. Unicode operates on characters and on text built from those characters.
Unicode defines characters like LATIN CAPITAL LETTER A or GREEK SMALL LETTER ALPHA and unique numbers for the characters, in this case 0x0041 and 0x03B1, respectively. These unique numbers are called code points.
The Unicode standard prefers using hexadecimal notation for the code points.
The Unicode standard uses the notation U+0041 LATIN CAPITAL LETTER A, to give the hexadecimal code point and the normative name of the character.
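As a small illustration of this notation, the following sketch formats a code point as "U+XXXX NAME". It assumes only the core charnames module; the helper name u_notation is made up for this example.

```perl
use strict;
use warnings;
use charnames ();    # only charnames::viacode is needed, nothing is imported

# Hypothetical helper: format a code point in the "U+XXXX NAME" notation.
sub u_notation {
    my ($code_point) = @_;
    my $name = charnames::viacode($code_point);
    $name = '<no name>' unless defined $name;    # unassigned code points have no name
    return sprintf 'U+%04X %s', $code_point, $name;
}

print u_notation($_), "\n" for 0x0041, 0x03B1;
```

For the two code points mentioned above this prints U+0041 LATIN CAPITAL LETTER A and U+03B1 GREEK SMALL LETTER ALPHA.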
Unicode also defines various Unicode properties for the characters, like uppercase or lowercase, decimal digit, or punctuation; these properties are independent of the names of the characters.
Furthermore, various operations on the characters like uppercasing, lowercasing, and collating (sorting) are defined.
A Unicode character consists either of a single code point, or a base character (like LATIN CAPITAL LETTER A), followed by one or more modifiers (like COMBINING ACUTE ACCENT). This sequence of base character and modifiers is called a combining character sequence.
Whether to call these combining character sequences "characters" depends on your point of view. If you are a programmer, you probably would tend towards seeing each element in the sequences as one unit, or "character". The whole sequence could be seen as one "character", however, from the user's point of view, since that's probably what it looks like in the context of the user's language.
With this "whole sequence" view of characters, the total number of characters is open-ended. But in the programmer's "one unit is one character" point of view, the concept of "characters" is more deterministic.
In this document, we take that second point of view: one "character" is one Unicode code point, be it a base character or a combining character.
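The two points of view can be compared in Perl. The sketch below is only an illustration: length counts code points, while the regular expression escape \X matches one extended grapheme cluster, which is close to the "whole sequence" view. The helper names are invented for the example.

```perl
use strict;
use warnings;

# U+0041 LATIN CAPITAL LETTER A followed by U+0301 COMBINING ACUTE ACCENT
my $sequence = "\x{41}\x{301}";

# Programmer's view: one character is one code point.
sub count_code_points { return length $_[0] }

# User's view: one character is one base character plus its modifiers.
sub count_clusters {
    my @clusters = $_[0] =~ /(\X)/g;
    return scalar @clusters;
}

printf "code points: %d, user-perceived characters: %d\n",
    count_code_points($sequence), count_clusters($sequence);
```

This prints "code points: 2, user-perceived characters: 1".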
For some combinations, there are precomposed characters. LATIN CAPITAL LETTER A WITH ACUTE, for example, is defined as a single code point.
These precomposed characters are, however, only available for some combinations, and are mainly meant to support round-trip conversions between Unicode and legacy standards (like the ISO 8859).
In the general case, the composing method is more extensible. To support conversion between different compositions of the characters, various normalization forms to standardize representations are also defined.
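A minimal sketch of what normalization buys us, assuming the core module Unicode::Normalize: the precomposed and the combining spellings of the same letter differ as raw strings but compare equal once both are brought to the same normalization form. The helper same_text is hypothetical.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD);

my $precomposed = "\x{C1}";           # LATIN CAPITAL LETTER A WITH ACUTE
my $combining   = "\x{41}\x{301}";    # A followed by COMBINING ACUTE ACCENT

# Hypothetical helper: compare two strings after composing both (form NFC).
sub same_text {
    my ($left, $right) = @_;
    return NFC($left) eq NFC($right) ? 1 : 0;
}

printf "raw comparison:   %s\n", $precomposed eq $combining ? 'equal' : 'different';
printf "after NFC:        %s\n", same_text($precomposed, $combining) ? 'equal' : 'different';
printf "length after NFD: %d\n", length NFD($precomposed);
```

The raw comparison reports "different", the normalized one "equal", and decomposing the precomposed character (NFD) yields two code points.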
Because of backward compatibility with legacy encodings, the "a unique number for every character" idea breaks down a bit: instead, there is "at least one number for every character".
The same character could be represented differently in several legacy encodings.
The converse is also not true: some code points do not have an assigned character.
A common myth about Unicode is that it would be "16-bit", that is, Unicode is only represented as 0x10000 (or 65536) characters from 0x0000 to 0xFFFF. This is untrue. Since Unicode 2.0 (July 1996), Unicode has been defined all the way up to 21 bits (0x10FFFF), and since Unicode 3.1 (March 2001), characters have been defined beyond 0xFFFF. The first 0x10000 characters are called the Plane 0, or the Basic Multilingual Plane (BMP). With Unicode 3.1, 17 (yes, seventeen) planes in all were defined, but they are nowhere near full of defined characters, yet.
Another myth is that the 256-character blocks have something to do with languages, that each block would define the characters used by a language or a set of languages. This is also untrue. The division into blocks exists, but it is almost completely accidental, an artifact of how the characters have been and still are allocated. Instead, there is a concept called scripts, which is more useful: there is Latin script, Greek script, and so on. Scripts usually span varied parts of several blocks. For further information see Unicode::UCD:
    pl@nereida:~/Lperltesting$ perl5.10.1 -wdE 0
    main::(-e:1):   0
      DB<1> use Unicode::UCD qw{charinfo charscripts}
      DB<2> x charinfo(0x41)
    0  HASH(0xc69a88)
       'bidi' => 'L'
       'block' => 'Basic Latin'
       'category' => 'Lu'
       'code' => 0041
       'combining' => 0
       'comment' => ''
       'decimal' => ''
       'decomposition' => ''
       'digit' => ''
       'lower' => 0061
       'mirrored' => 'N'
       'name' => 'LATIN CAPITAL LETTER A'
       'numeric' => ''
       'script' => 'Latin'
       'title' => ''
       'unicode10' => ''
       'upper' => ''
      DB<3> x @{charscripts()->{Greek}}[0..3]
    0  ARRAY(0xd676a8)
       0  880
       1  883
       2  'Greek'
    1  ARRAY(0xd86300)
       0  885
       1  885
       2  'Greek'
    2  ARRAY(0xd6c718)
       0  886
       1  887
       2  'Greek'
    3  ARRAY(0xd6c790)
       0  890
       1  890
       2  'Greek'
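The same information can be obtained outside the debugger. The following sketch assumes the charscript and charblock functions of the core Unicode::UCD module and prints the script and the block of a few code points; the block names printed depend on the Unicode version shipped with the Perl in use.

```perl
use strict;
use warnings;
use Unicode::UCD qw(charscript charblock);

# A Latin letter, a Greek letter and a Katakana letter
for my $code_point (0x0041, 0x03B1, 0x30A7) {
    printf "U+%04X  script=%s  block=%s\n",
        $code_point, charscript($code_point), charblock($code_point);
}
```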
The Unicode code points are just abstract numbers. To input and output these abstract numbers, the numbers must be encoded or serialised somehow. Unicode defines several character encoding forms, of which UTF-8 is perhaps the most popular. UTF-8 is a variable length encoding that encodes Unicode characters as 1 to 6 bytes (only 4 with the currently defined characters). Other encodings include UTF-16 and UTF-32 and their big- and little-endian variants (UTF-8 is byte-order independent). The ISO/IEC 10646 defines the UCS-2 and UCS-4 encoding forms.
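To make the variable length concrete, this sketch uses the core Encode module to serialise a few code points and count the resulting bytes. The helper encoded_length and the choice of sample code points are assumptions of the example.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: number of bytes needed for one code point in an encoding.
sub encoded_length {
    my ($encoding, $code_point) = @_;
    return length encode($encoding, chr $code_point);
}

# A, a-acute, euro sign, and one code point beyond the BMP
for my $code_point (0x41, 0xE1, 0x20AC, 0x1F600) {
    printf "U+%04X  UTF-8: %d  UTF-16BE: %d  UTF-32BE: %d bytes\n",
        $code_point, map { encoded_length($_, $code_point) } qw(UTF-8 UTF-16BE UTF-32BE);
}
```

UTF-8 needs 1, 2, 3 and 4 bytes respectively; UTF-16BE needs 2 bytes inside the BMP and 4 beyond it; UTF-32BE always needs 4.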
Consider the following program:

    lhp@nereida:~/Lperl/src/testing$ cat useutf8_1.pl
    #!/usr/local/bin/perl -w
    use strict;

    my $x = 'áéíóúñ€';
    print "$x\n";
    print length($x)."\n";

Running it produces the output:

    lhp@nereida:~/Lperl/src/testing$ useutf8_1.pl
    áéíóúñ€
    15

Perl has two modes for processing data: byte mode and character mode. The default is byte mode. This mode is convenient when working with binary files (e.g. a JPEG image) and with text in an encoding that needs only one byte per character, such as Latin 1.

Indeed, the string 'áéíóúñ€', which is a Unicode string encoded in UTF-8, is 15 bytes long: two bytes for each of the six accented letters and three for the euro sign. The point is that length in bytes and length in characters are not the same once we leave ASCII and Latin 1. If we want length to return the length in characters, we use utf8:
    lhp@nereida:~/Lperl/src/testing$ cat useutf8_2.pl
    #!/usr/local/bin/perl -w
    use strict;
    use utf8;

    my $x = 'áéíóúñ€';
    print "$x\n";
    print length($x)."\n";

Running it gives the length in characters:

    lhp@nereida:~/Lperl/src/testing$ useutf8_2.pl
    Wide character in print at ./useutf8_2.pl line 6.
    áéíóúñ€
    7

Now length returns the length in characters.
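The difference between the two lengths can also be shown without relying on how the source file is saved. The following sketch, which assumes the core Encode module, writes the same seven characters with \x{...} escapes and converts explicitly between characters and UTF-8 bytes.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# 'áéíóúñ€' written with escapes, so the sketch does not depend on the file encoding
my $chars = "\x{E1}\x{E9}\x{ED}\x{F3}\x{FA}\x{F1}\x{20AC}";
my $bytes = encode('UTF-8', $chars);    # what is actually stored in a UTF-8 file

printf "length in characters: %d\n", length $chars;
printf "length in bytes:      %d\n", length $bytes;
printf "after decoding again: %d\n", length decode('UTF-8', $bytes);
```

It prints 7, 15 and 7, matching the outputs of the two programs above.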
Note the warning message. To make sure that output to STDOUT works properly with UTF-8 encoded characters, we must call binmode on STDOUT with the ':utf8' layer:

    lhp@nereida:~/Lperl/src/testing$ cat useutf8_3.pl
    #!/usr/local/bin/perl -w
    use strict;
    use utf8;
    binmode(STDOUT, ':utf8');

    my $x = 'áéíóúñ€';
    print "$x\n";
    print length($x)."\n";

The warning message disappears:

    lhp@nereida:~/Lperl/src/testing$ useutf8_3.pl
    áéíóúñ€
    7

The -C option of the Perl interpreter achieves the same result:

    lhp@nereida:~/Lperl/src/testing$ perl useutf8_1.pl
    áéíóúñ€
    15
    lhp@nereida:~/Lperl/src/testing$ perl -Mutf8 -COE useutf8_1.pl
    áéíóúñ€
    7

See perldoc perlrun for more information about these options:
As of 5.8.1, the "-C" can be followed either by a number or a list of option letters. The letters, their numeric values, and effects are as follows; listing the letters is equal to summing the numbers.

    I     1    STDIN is assumed to be in UTF-8
    O     2    STDOUT will be in UTF-8
    E     4    STDERR will be in UTF-8
    S     7    I + O + E
    i     8    UTF-8 is the default PerlIO layer for input streams
    o    16    UTF-8 is the default PerlIO layer for output streams
    D    24    i + o
    A    32    the @ARGV elements are expected to be strings encoded in UTF-8
    L    64    normally the "IOEioA" are unconditional, the L makes them
               conditional on the locale environment variables (the LC_ALL,
               LC_CTYPE, and LANG, in the order of decreasing precedence) --
               if the variables indicate UTF-8, then the selected "IOEioA"
               are in effect
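The "listing the letters is equal to summing the numbers" rule can be checked with a few lines of Perl. This is only a sketch: the helper c_option_letters is hypothetical, and the last line relies on the special variable ${^UNICODE}, which perlvar documents as reflecting the -C setting of the running interpreter.

```perl
use strict;
use warnings;

# Letter/value pairs taken from the perlrun table quoted above
# (S and D are omitted because they are just sums of other letters).
my @FLAGS = ([I => 1], [O => 2], [E => 4], [i => 8], [o => 16], [A => 32], [L => 64]);

# Hypothetical helper: the option letters whose bits are set in a -C number.
sub c_option_letters {
    my ($number) = @_;
    return join '', map { $_->[0] } grep { $number & $_->[1] } @FLAGS;
}

print "-C7  is the same as -C", c_option_letters(7),  "\n";
print "-C24 is the same as -C", c_option_letters(24), "\n";

# Settings of the current process; empty when perl was started without -C
printf "this process runs with -C letters: '%s'\n", c_option_letters(${^UNICODE});
```

The first two lines print -CIOE and -Cio; the third depends on how the script is invoked.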
In Perl, strings carry a flag that indicates whether the internal representation of the string is UTF-8. The is_utf8 function of utf8 tells us whether a string is stored internally as UTF-8:

    pl@nereida:~/Lperltesting$ cat is_utf8.pl
    #!/usr/local/lib/perl/5.10.1/bin/perl5.10.1 -w -COE
    use strict;
    use utf8;

    my $x = 'áéíóúñ€';
    my $y = 'abc';
    my $z = 'αβγδη';
    print "$x is utf8\n" if utf8::is_utf8($x);
    print "$y is utf8\n" if utf8::is_utf8($y);
    print "$z is utf8\n" if utf8::is_utf8($z);

Running it produces the output:

    pl@nereida:~/Lperltesting$ ./is_utf8.pl
    áéíóúñ€ is utf8
    αβγδη is utf8

The string 'abc' is not reported: it contains only ASCII characters, so Perl has no need to store it internally as UTF-8.
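The flag describes the internal storage, not the content of the string. The following sketch illustrates this with utf8::upgrade, which changes only the internal representation; the helper flag is made up for the example, and the utf8:: functions are available without loading the utf8 pragma.

```perl
use strict;
use warnings;

my $ascii = 'abc';
my $greek = "\x{3B1}\x{3B2}";    # alpha, beta: cannot be stored one byte per character

# Hypothetical helper: report the state of the internal UTF-8 flag.
sub flag { return utf8::is_utf8($_[0]) ? 'on' : 'off' }

print "ascii:          ", flag($ascii), "\n";
print "greek:          ", flag($greek), "\n";

my $copy = $ascii;
utf8::upgrade($copy);            # same characters, different internal storage
print "ascii upgraded: ", flag($copy), "\n";
print "still equal:    ", ($copy eq $ascii ? 'yes' : 'no'), "\n";
```

It prints off, on, on and yes: the upgraded copy has the flag set but is still the same string as far as Perl code is concerned.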
The vim documentation on multi-byte support has this to say about Unicode:

Useful commands:

    ga    shows the decimal, hexadecimal and octal value of the character under
          the cursor. If there are composing characters these are shown too. (If the
          message is truncated, use ":messages").
    g8    shows the bytes used in a UTF-8 character, also the composing
          characters, as hex numbers.
    :set encoding=utf-8 fileencodings=
          forces using UTF-8 for all files. The
          default is to use the current locale for 'encoding' and set 'fileencodings'
          to automatically detect the encoding of a file.

....

If your current locale is in an utf-8 encoding, Vim will automatically start in utf-8 mode.
If you are using another locale:

    set encoding=utf-8
In our case, the locale settings use UTF-8:

    casiano@millo:~/Lperltesting$ locale
    LANG=es_ES.UTF-8
    LC_CTYPE="es_ES.UTF-8"
    LC_NUMERIC="es_ES.UTF-8"
    LC_TIME="es_ES.UTF-8"
    LC_COLLATE="es_ES.UTF-8"
    LC_MONETARY="es_ES.UTF-8"
    LC_MESSAGES="es_ES.UTF-8"
    LC_PAPER="es_ES.UTF-8"
    LC_NAME="es_ES.UTF-8"
    LC_ADDRESS="es_ES.UTF-8"
    LC_TELEPHONE="es_ES.UTF-8"
    LC_MEASUREMENT="es_ES.UTF-8"
    LC_IDENTIFICATION="es_ES.UTF-8"
    LC_ALL=
There are several ways to create Unicode files with vim in languages outside the Latin 1 range. The Unicode characters on line 3 of the following file were generated in vim by inserting them by code point with the sequence CTRL-V u hexcode.

    lhp@nereida:~/Lperl/src/testing$ cat -n utf8file.txt
         1  áéíóúñÑ
         2  àèìòùÇç
         3  ェッニは大き

Specifically, I believe the code points were: 30a7, 30c3, 30cb, 306f, 5927 and 304d.

    pl@nereida:~/Lperltesting$ perl5.10.1 -C7 -E 'say chr($_) for (0x30a7, 0x30c3, 0x30cb, 0x306f, 0x5927, 0x304d)'
    ェ
    ッ
    ニ
    は
    大
    き

A more convenient way to insert Unicode characters in vim is to use keymaps:
    :echo globpath(&rtp, "keymap/*.vim")

To understand the command above, keep in mind that:

    &rtp
          is the value of the 'runtimepath' option.
    globpath({path}, {expr} [, {flag}])
          performs a glob of {expr} on the list of directories in {path}.
    :echo {expr1} ..
          displays the values of {expr1}, .. separated by spaces.
This will display something like:

    /usr/share/vim/vim70/keymap/accents.vim
    /usr/share/vim/vim70/keymap/arabic.vim
    /usr/share/vim/vim70/keymap/arabic_utf-8.vim
    /usr/share/vim/vim70/keymap/bulgarian.vim
    /usr/share/vim/vim70/keymap/canfr-win.vim
    /usr/share/vim/vim70/keymap/czech.vim
    /usr/share/vim/vim70/keymap/czech_utf-8.vim
    /usr/share/vim/vim70/keymap/esperanto.vim
    /usr/share/vim/vim70/keymap/esperanto_utf-8.vim
    /usr/share/vim/vim70/keymap/greek.vim
    /usr/share/vim/vim70/keymap/greek_cp1253.vim
    /usr/share/vim/vim70/keymap/greek_cp737.vim
    /usr/share/vim/vim70/keymap/greek_iso-8859-7.vim
    /usr/share/vim/vim70/keymap/greek_utf-8.vim
    ....

As can be seen, the naming convention for keymaps is:

    <language>_<encoding>.vim

Here is an example of a keymap file:

    $ cat /usr/share/vim/vim70/keymap/greek_utf-8.vim
    " Vim Keymap file for greek
    " Maintainer: Panagiotis Louridas <louridas@acm.org>
    " Last Updated: Thu Mar 23 23:45:02 EET 2006

    .......................................................................
    let b:keymap_name = "grk"
    loadkeymap
    " PUNCTUATION MARKS - SYMBOLS (GREEK SPECIFIC)
    "
    E$ <char-0x20AC> " EURO SIGN
    ............................................................................
    "
    " GREEK LETTERS
    "
    A <char-0x0391> " GREEK CAPITAL LETTER ALPHA
    B <char-0x0392> " GREEK CAPITAL LETTER BETA
    G <char-0x0393> " GREEK CAPITAL LETTER GAMMA
    D <char-0x0394> " GREEK CAPITAL LETTER DELTA
    E <char-0x0395> " GREEK CAPITAL LETTER EPSILON
    Z <char-0x0396> " GREEK CAPITAL LETTER ZETA
A keymap is loaded with:

    :set keymap=greek

While in insert mode we can switch between the two keymaps by typing CTRL-^ or CTRL-6.

In vim, :set encoding without a value shows the current encoding:

    :set encoding
      encoding=utf-8

It is possible to change the encoding used while editing:

    :set encoding=latin1

This does not change the encoding of the file.
For more information see, inside vim:

    :help mbyte.txt
    :help mbyte-keymap
Use the three-argument form of open and specify the :utf8 layer so that input/output on that file is processed by that layer. For example:

    lhp@nereida:~/Lperl/src/testing$ cat abreutf8.pl
    #!/usr/local/bin/perl -w
    use strict;
    binmode(STDOUT, ':utf8');
    my $file = shift;
    # The ':utf8' layer decodes the file contents into characters as they are read
    open(my $f, '<:utf8', $file) or die "Cannot open $file: $!";
    my @a = <$f>;
    chomp(@a);
    print "$_ tiene longitud ".length($_)."\n" for @a;

Running it produces output like this:

    lhp@nereida:~/Lperl/src/testing$ abreutf8.pl tutu
    ジジェッニgfは大好あき tiene longitud 14
    αβγεφγη tiene longitud 7
    νμοπ;ρ^αβψδε tiene longitud 12
    & ασηφδξδξδη tiene longitud 12
    abc tiene longitud 3
    αβγδ&αβψ tiene longitud 8
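The effect of the layer can be seen by reading the same file twice, once as raw bytes and once through a decoding layer. The sketch below is self-contained: it writes a temporary file with the core File::Temp module. It uses ':encoding(UTF-8)', the stricter relative of ':utf8' that also validates the input, instead of the layer used in the program above; the helper first_line_length is hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write one line containing alpha, beta, gamma to a temporary file, encoded as UTF-8
my ($out, $path) = tempfile(UNLINK => 1);
binmode($out, ':encoding(UTF-8)');
print {$out} "\x{3B1}\x{3B2}\x{3B3}\n";
close $out or die "Cannot close $path: $!";

# Hypothetical helper: length of the first line when read through a given layer.
sub first_line_length {
    my ($layer) = @_;
    open(my $in, "<$layer", $path) or die "Cannot open $path: $!";
    my $line = <$in>;
    close $in;
    chomp $line;
    return length $line;
}

print "read as raw bytes: ", first_line_length(':raw'), "\n";
print "read as UTF-8:     ", first_line_length(':encoding(UTF-8)'), "\n";
```

The first read reports 6 (bytes) and the second 3 (characters).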
The charnames module makes it easier to enter Unicode characters:

    lhp@nereida:~/Lperl/src/testing$ cat alfabeta.pl
    #!/usr/local/bin/perl -w
    use strict;
    use charnames qw{:full greek hebrew katakana};
    binmode(STDOUT, ':utf8');

    print "\N{alpha}+\N{beta} = \N{pi}\n";
    print "\N{alef} es la primera letra del alfabeto hebreo\n";
    print "Un poco de Katakana: \N{sa}\N{i}\N{n}\N{mo}\n";

    # Using the full name defined in the Unicode Standard
    print "Hello \N{WHITE SMILING FACE}\n";

When run, it produces output like:

    lhp@nereida:~/Lperl/src/testing$ alfabeta.pl
    α+β = ピ
    א es la primera letra del alfabeto hebreo
    Un poco de Katakana: サインモ
    Hello ☺

Note that the output for \N{pi} does not show the Greek letter π but the corresponding Katakana symbol ピ: watch out for collisions between alphabets.
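One way to sidestep such collisions, sketched here under the assumption that typing the longer names is acceptable, is to use the full Unicode names, which are unique across scripts:

```perl
use strict;
use warnings;
use charnames ':full';
binmode(STDOUT, ':encoding(UTF-8)');

# Full names are unambiguous, whatever scripts are loaded
my $greek_pi    = "\N{GREEK SMALL LETTER PI}";
my $katakana_pi = "\N{KATAKANA LETTER PI}";

printf "%s  U+%04X\n", $greek_pi,    ord $greek_pi;
printf "%s  U+%04X\n", $katakana_pi, ord $katakana_pi;
```

The two lines show U+03C0 for the Greek letter and U+30D4 for the Katakana one.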
The functions viacode and vianame are inverses of each other and give the mapping between the name and the code of a character:

    pl@nereida:~/Lperltesting$ perl5.10.1 -COE -Mutf8 -dE 0
    main::(-e:1):   0
      DB<1> use charnames ':full'
      DB<2> print charnames::viacode(0x2722)
    FOUR TEARDROP-SPOKED ASTERISK
      DB<3> printf "%04X", charnames::vianame("FOUR TEARDROP-SPOKED ASTERISK")
    2722
With utf8 it is possible to use operators such as tr and regular expressions on UTF-8 strings:

    lhp@nereida:~/Lperl/src/testing$ cat useutf8.pl
    #!/usr/local/bin/perl -w
    use strict;
    use utf8;
    binmode(STDOUT, ':utf8');

    my $x = 'áéíóúñ€';
    print "$x\n";
    print length($x)."\n";

    my $y = $x;
    $y =~ tr/áéíóúñ€/aeioun$/;
    print "$y\n";

    $y = $x;
    $y =~ m/áéíóúñ(€)/;
    print "$1\n";

When run, this program produces the output:

    lhp@nereida:~/Lperl/src/testing$ useutf8.pl
    áéíóúñ€
    7
    aeioun$
    €
Character classes such as \d have been generalized. The Devanagari digits have code points from 2406 (0x966) to 2415 (0x96F):

    lhp@nereida:~/Lperl/src/testing$ unicode -x 966..96f | egrep '096|\.0'
          .0 .1 .2 .3 .4 .5 .6 .7 .8 .9 .A .B .C .D .E .F
    096.  ॠ ॡ । ॥ ० १ २ ३ ४ ५ ६ ७ ८ ९

The following example shows that regular expressions such as \d+ recognize the Devanagari digits:

    lhp@nereida:~/Lperl/src/testing$ cat regexputf8.pl
    #!/usr/local/bin/perl -w
    use strict;
    binmode(STDOUT, ':utf8');
    use utf8;

    # Devanagari digits from 0 to 9
    my @dd = map { chr } (2406..2415);
    my $x = join '+', @dd;
    print "La interpolación ocurre: $x\n";
    my @number = $x =~ m{(\d+)}g;
    print "Las expresiones regulares funcionan: @number\n";
    print "Sin embargo la conversión numérica no es automática: ".($number[0]+$number[1])."\n";

As the program itself points out, digits from other character sets are not automatically converted to numbers. See the run:

    lhp@nereida:~/Lperl/src/testing$ regexputf8.pl
    La interpolación ocurre: ०+१+२+३+४+५+६+७+८+९
    Las expresiones regulares funcionan: ० १ २ ३ ४ ५ ६ ७ ८ ९
    Argument "\x{967}" isn't numeric in addition (+) at ./regexputf8.pl line 12.
    Argument "\x{966}" isn't numeric in addition (+) at ./regexputf8.pl line 12.
    Sin embargo la conversión numérica no es automática: 0
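If the numeric value is needed, the digits have to be converted explicitly. One possible approach, shown here only as a sketch with a hypothetical helper name, is to map the Devanagari range onto the ASCII digits with tr before using the string as a number:

```perl
use strict;
use warnings;

# Hypothetical helper: map Devanagari digits U+0966..U+096F onto ASCII 0..9
# and return the result as a number.
sub devanagari_to_number {
    my ($text) = @_;
    (my $ascii = $text) =~ tr/\x{966}-\x{96F}/0-9/;
    return $ascii + 0;
}

my @digits = map { chr } 0x966 .. 0x96F;

my $sum = devanagari_to_number($digits[1]) + devanagari_to_number($digits[2]);
print "1 + 2 = $sum\n";
print "two-digit number: ", devanagari_to_number("$digits[4]$digits[2]"), "\n";
```

This prints "1 + 2 = 3" and "two-digit number: 42", with no "isn't numeric" warnings.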
The same happens with the \w character class:

    lhp@nereida:~/Lperl/src/testing$ cat words_utf8.pl
    #!/usr/local/bin/perl -w
    use strict;
    use utf8;
    use charnames qw{greek};
    binmode(STDOUT, ':utf8');

    my $x = 'áéíóúñ€αβγδη';
    my @w = $x =~ /(\w)/g;
    print "@w\n";
    lhp@nereida:~/Lperl/src/testing$ words_utf8.pl
    á é í ó ú ñ α β γ δ η
When processing UTF-8 encoded data, the dot matches one character, however many bytes it occupies. The \C escape can be used to match a single byte. Note that \C was deprecated in Perl 5.20 and removed in Perl 5.24, so this example only runs on older interpreters such as the 5.10.1 used here:

    pl@nereida:~/Lperltesting$ cat dot_utf8.pl
    #!/usr/local/lib/perl/5.10.1/bin/perl5.10.1 -w -COE
    use v5.10;
    use strict;
    use utf8;

    my $x = 'αβγδεφ';
    my @w = $x =~ /(.)/g;
    say "@w";

    my @v = map { ord } $x =~ /(\C)/g;
    say "@v";
    pl@nereida:~/Lperltesting$ ./dot_utf8.pl
    α β γ δ ε φ
    206 177 206 178 206 179 206 180 206 181 207 134
The same effect as \C can be achieved with the use bytes pragma, which switches from character semantics to byte semantics:

    lhp@nereida:~/Lperl/src/testing$ cat dot_utf8_2.pl
    #!/usr/local/bin/perl -w
    use strict;
    use utf8;
    use charnames qw{greek};

    binmode(STDOUT, ':utf8');

    my $x = 'αβγδεφ';

    my @w = $x =~ /(.)/g;
    print "@w\n";

    {
        use bytes;
        my @v = map { ord } $x =~ /(.)/g;
        print "@v\n";
    }
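A third way to look at the individual bytes, which does not depend on \C or on the bytes pragma, is to encode the string explicitly and unpack the result. This is a sketch using the core Encode module rather than either technique shown above:

```perl
use strict;
use warnings;
use Encode qw(encode);

# alpha beta gamma delta epsilon phi, written with escapes
my $x = "\x{3B1}\x{3B2}\x{3B3}\x{3B4}\x{3B5}\x{3C6}";

my @code_points = map { ord } split //, $x;             # one number per character
my @utf8_bytes  = unpack 'C*', encode('UTF-8', $x);     # one number per byte

print "@code_points\n";
print "@utf8_bytes\n";
```

The first line prints 945 946 947 948 949 966 and the second 206 177 206 178 206 179 206 180 206 181 207 134, the same byte values obtained with \C.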
The following example illustrates the case-changing functions (such as uc, lc, lcfirst and ucfirst) as well as the use of reverse with Unicode strings:

    lhp@nereida:~/Lperl/src/testing$ cat alfabeta1.pl
    #!/usr/local/bin/perl -w
    use strict;
    use utf8;
    use charnames qw{greek};
    binmode(STDOUT, ':utf8');

    my $x = "\N{alpha}+\N{beta} = \N{pi}";
    print uc($x)."\n";
    print scalar(reverse($x))."\n";

    my $y = "áéíóúñ";
    print uc($y)."\n";
    print scalar(reverse($y))."\n";

When run, the program produces the output:

    lhp@nereida:~/Lperl/src/testing$ alfabeta1.pl
    Α+Β = Π
    π = β+α
    ÁÉÍÓÚÑ
    ñúóíéá
The Unicode standard states that particular sets of characters can have particular properties, and a regular expression can match on those properties using the \p{...} notation:

    lhp@nereida:~/Lperl/src/testing$ cat properties.pl
    #!/usr/local/bin/perl -w
    use strict;
    use utf8;
    use charnames qw{greek};
    binmode(STDOUT, ':utf8');

    my @a = ('$', 'az', '£', 'α', '€', '¥');
    my $x = "@a\n";

    print /\p{CurrencySymbol}/? "$_ = Dinero!!\n" : "$_ : No hay dinero\n" for @a;
    print /\p{Greek}/? "$_ = Griego\n" : "$_ : No es griego\n" for @a;

Running this script we get:

    lhp@nereida:~/Lperl/src/testing$ properties.pl
    $ = Dinero!!
    az : No hay dinero
    £ = Dinero!!
    α : No hay dinero
    € = Dinero!!
    ¥ = Dinero!!
    $ : No es griego
    az : No es griego
    £ : No es griego
    α = Griego
    € : No es griego
    ¥ : No es griego
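Several properties can be tested on the same character. The sketch below uses the short general category names (Sc for currency symbol, Nd for decimal digit, Lu for uppercase letter) together with the Greek script property; the helper describe and the list of tested properties are choices made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: list which of a few Unicode properties a character has.
sub describe {
    my ($char) = @_;
    my @found;
    push @found, 'currency symbol'  if $char =~ /\p{Sc}/;
    push @found, 'Greek script'     if $char =~ /\p{Greek}/;
    push @found, 'decimal digit'    if $char =~ /\p{Nd}/;
    push @found, 'uppercase letter' if $char =~ /\p{Lu}/;
    return @found ? join(', ', @found) : 'none of the tested properties';
}

# dollar, euro, small alpha, capital alpha, Devanagari digit one
printf "U+%04X: %s\n", $_, describe(chr $_) for 0x24, 0x20AC, 0x3B1, 0x391, 0x967;
```

Capital alpha, for instance, is reported both as Greek script and as an uppercase letter, which shows that properties are independent of each other.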
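The Unicode::Properties module reports the properties of a character:

    casiano@millo:~$ echo $PERL5LIB
    /soft/perl5lib/perl5_10_1/lib/:/soft/perl5lib/perl5_10_1/lib/perl5:/soft/perl5lib/perl5_10_1/share/perl/5.8.8/
    casiano@millo:~$ perl5.10.1 -COE -Mutf8 -dE 0
    main::(-e:1):   0
      DB<1> use Unicode::Properties 'uniprops'
      DB<2> x uniprops ('☺'); # Unicode smiley face
    0  'Alphabetic'
    1  'Any'
    2  'Assigned'
    3  'IDContinue'
    4  'IDStart'
    5  'InLatin1Supplement'
    6  'Latin'
    7  'Lowercase'

Note that the properties listed (Latin, Lowercase, InLatin1Supplement) are not those of U+263A WHITE SMILING FACE. They are what one would expect for U+00E2 (â), whose code equals the first byte of the UTF-8 encoding of the smiley (E2 98 BA). This suggests that in this debugger session the argument was not decoded as UTF-8 before it reached uniprops.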
There are a good number of conversion utilities:

    lhp@nereida:~/Lbook$ unicode 'hebrew letter alef'
    U+05D0 HEBREW LETTER ALEF
    UTF-8: d7 90 UTF-16BE: 05d0 Decimal: &#1488;
    א
    Category: Lo (Letter, Other)
    Bidi: R (Right-to-Left)

    U+FB2E HEBREW LETTER ALEF WITH PATAH
    UTF-8: ef ac ae UTF-16BE: fb2e Decimal: &#64302;
    אַ
    Category: Lo (Letter, Other)
    Bidi: R (Right-to-Left)
    Decomposition: 05D0 05B7
    ...
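iconv can be used to convert Latin 1 files to UTF-8. The following script converts each file named on the command line from ISO 8859-15 to UTF-8, keeping a copy of the original:

    #!/usr/local/bin/perl -w
    use strict;
    use warnings;
    for my $file (@ARGV) {
        # Keep the original bytes under a name that records their encoding
        my $ifile = "$file.ISO_8859-15";

        system('cp', $file, $ifile) == 0 or die "cp $file failed\n";
        # The shell is needed for the redirection; file names must not contain shell metacharacters
        system("iconv -f ISO_8859-15 -t UTF-8 $ifile > $file") == 0 or die "iconv $file failed\n";
    }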
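The same conversion can be done without leaving Perl. The following sketch, which assumes the core Encode module and a hypothetical helper name, decodes ISO 8859-15 bytes into characters and encodes them again as UTF-8; it works on a string in memory rather than on files, so it complements the script above rather than replacing it.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: re-encode a byte string from ISO 8859-15 to UTF-8.
sub latin9_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('iso-8859-15', $bytes));
}

my $latin9 = "a\xF1o 5\xA4";    # "año 5€" as ISO 8859-15 bytes (0xA4 is the euro sign)
my $utf8   = latin9_to_utf8($latin9);

printf "%d bytes in, %d bytes out\n", length $latin9, length $utf8;
```

The six input bytes become nine: ñ takes two bytes in UTF-8 and the euro sign three.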
    casiano@nereida:~/Lwiley_book_tracer/Coordinado$ paps --help
    Usage:
      paps [OPTION...] [text file]

    Help Options:
      -?, --help              Show help options

    Application Options:
      --landscape             Landscape output. (Default: portrait)
      --columns=NUM           Number of columns output. (Default: 1)
      --font_scale=NUM        Font scaling. (Default: 12)
      --family=FAMILY         Pango FT2 font family. (Default: Monospace)
      --rtl                   Do rtl layout.
      --justify               Do justify the lines.
      --paper=PAPER           Choose paper size. Known paper sizes are legal, letter, a4. (Default: a4)
      --bottom-margin=NUM     Set bottom margin. (Default: 36)
      --top-margin=NUM        Set top margin. (Default: 36)
      --right-margin=NUM      Set right margin. (Default: 36)
      --left-margin=NUM       Set left margin. (Default: 36)
      --header                Draw page header for each page.
perldoc perluniintro