
Locale::Recode(3pm)   User Contributed Perl Documentation   Locale::Recode(3pm)

NAME
       Locale::Recode - Object-Oriented Portable Charset Conversion

SYNOPSIS
         use Locale::Recode;

         $cd = Locale::Recode->new (from => 'UTF-8',
                                    to   => 'ISO-8859-1');

         die $cd->getError if $cd->getError;

         $cd->recode ($text) or die $cd->getError;

         $mime_name = Locale::Recode->resolveAlias ('latin-1');

         $supported = Locale::Recode->getSupported;

         $complete = Locale::Recode->getCharsets;

DESCRIPTION
       This module provides routines that convert textual data from one codeset
       to another in a portable way.  The module was started before
       Encode(3) was written.  Its main purpose today is to provide charset
       conversion even when Encode(3) is not available on the system.  It
       should also work for older Perl versions without Unicode support.

       Internally Locale::Recode(3) will use Encode(3) whenever possible, to
       allow for a faster conversion and for a wider range of supported
       charsets, and will only fall back to the Perl implementation when
       Encode(3) is not available or does not support a particular charset that
       Locale::Recode(3) does.

       Locale::Recode(3) is part of libintl-perl, and its main purpose is
       actually to implement a portable charset conversion framework for the
       message translation facilities described in Locale::TextDomain(3).

CONSTRUCTOR
       The constructor new() requires two named arguments:

       from
           The encoding of the original data.  Case doesn't matter, aliases are
           resolved.

       to  The  target  encoding.   Again, case doesn't matter, and aliases are
           resolved.

       The constructor will never fail.  In case  of  an  error,  the  object's
       internal  state  is set to bad and it will refuse to do any conversions.
       You can inquire the reason for the failure with the method getError().
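
       Because new() always returns an object, the return value cannot be
       used to detect a failure.  The following is a minimal sketch of the
       check described above; the charset name 'NO-SUCH-CHARSET' is made up
       for illustration and is assumed to be unknown to the module.

```perl
use strict;
use warnings;
use Locale::Recode;

# A converter that should be usable on most systems.
my $good = Locale::Recode->new (from => 'UTF-8', to => 'ISO-8859-1');

# 'NO-SUCH-CHARSET' is a hypothetical name, chosen to provoke an error.
my $bad = Locale::Recode->new (from => 'UTF-8', to => 'NO-SUCH-CHARSET');

for my $cd ($good, $bad) {
    # getError() returns false for a healthy object, else a message.
    if (my $error = $cd->getError) {
        print "unusable converter: $error\n";
    } else {
        print "converter is ready\n";
    }
}
```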

OBJECT METHODS
       The following object methods are available.

       recode (STRING)
           Converts STRING  from  the  source  encoding  into  the  destination
           encoding.   In  case  of  success,  a truth value is returned, false
           otherwise.  You can inquire the reason  for  the  failure  with  the
           method getError().

       getError
           Returns  either  false  if the object is not in an error state or an
           error message.
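
       As the SYNOPSIS suggests, recode() converts its argument in place
       rather than returning the converted data.  A small sketch, assuming
       that both ISO-8859-1 and UTF-8 are available on your system, which
       converts a string to UTF-8 and back again:

```perl
use strict;
use warnings;
use Locale::Recode;

my $to_utf8   = Locale::Recode->new (from => 'ISO-8859-1', to => 'UTF-8');
my $to_latin1 = Locale::Recode->new (from => 'UTF-8', to => 'ISO-8859-1');
for my $cd ($to_utf8, $to_latin1) {
    die $cd->getError if $cd->getError;
}

my $original = "caf\xe9";   # "cafe" with e-acute, as ISO-8859-1 bytes
my $text     = $original;

# recode() overwrites $text and returns a truth value on success.
$to_utf8->recode ($text) or die $to_utf8->getError;

# Convert the UTF-8 data back to the original encoding.
$to_latin1->recode ($text) or die $to_latin1->getError;

print $text eq $original
    ? "round trip ok\n"
    : "round trip changed the data\n";
```

       Keep a copy of the input if you still need it after the conversion.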

CLASS METHODS
       The class provides some additional class methods:

       getSupported
           Returns a reference to a list of all supported charsets.   This  may
           implicitly    load    additional    Encode(3)    conversions    like
           Encode::HanExtra(3) which may  produce  considerable  load  on  your
           system.

           The method is therefore not intended for regular use but rather for
           retrieving or displaying a list of available encodings once.

           The members of the list are all converted to uppercase!

       getCharsets
           Like getSupported() but also returns all available aliases.
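
       Since both methods can be expensive, it may be worth calling them once
       and keeping the result.  The sketch below assumes that getCharsets(),
       like getSupported(), returns an array reference; the variable names
       are illustrative only.

```perl
use strict;
use warnings;
use Locale::Recode;

# Fetch the lists once; both calls may load additional conversions.
my $supported = Locale::Recode->getSupported;
my $charsets  = Locale::Recode->getCharsets;   # assumed: array reference

printf "%d charsets, %d names including aliases\n",
    scalar @$supported, scalar @$charsets;

# The names are returned in uppercase, so uppercase the name to look up.
my %is_supported = map { $_ => 1 } @$supported;
my $wanted = uc 'iso-8859-1';
print "$wanted is ", ($is_supported{$wanted} ? '' : 'not '), "listed\n";
```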

SUPPORTED CHARSETS
       The range of supported  charsets  is  system-dependent.   The  following
       somewhat special charsets are always available:

       UTF-8
           UTF-8 is available independently of your Perl version.  For Perl 5.6
           or better, or in the presence of Encode(3), conversions are not done
           in Perl but with the interfaces provided by these facilities, which
           are written in C and hence much faster.

           Encoding  data  into  UTF-8  is  fast,  even  if it is done in Perl.
           Decoding it in Perl may become quite slow.  If you  frequently  have
           to  decode  UTF-8 with Locale::Recode you will probably want to make
           sure that you do that with Perl 5.6 or better, or install Encode(3)
           to speed things up.

       INTERNAL
           UTF-8  is  fast  to  write but hard to read for applications.  It is
           therefore not the worst for internal string representation  but  not
           far  from  that.   Locale::Recode(3)  stores strings internally as a
           reference to an  array  of  integer  values  like  most  programming
           languages (Perl is an exception) do, trading memory for performance.

           The  integer  values  are  the UCS-4 codes of the characters in host
           byte order.

           The encoding INTERNAL is directly  available  via  Locale::Recode(3)
           but of course you should not really use it for data exchange, unless
           you know what you are doing.
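
       To illustrate the INTERNAL representation, the following sketch
       converts two ISO-8859-1 bytes and prints the resulting code points.
       It assumes that, after a conversion to INTERNAL, the converted
       variable holds the array reference described above.

```perl
use strict;
use warnings;
use Locale::Recode;

my $cd = Locale::Recode->new (from => 'ISO-8859-1', to => 'INTERNAL');
die $cd->getError if $cd->getError;

my $data = "A\xe9";   # 'A' and e-acute as ISO-8859-1 bytes
$cd->recode ($data) or die $cd->getError;

# Assumption: $data is now a reference to an array of integer code points.
my @codes = @$data;
print join (' ', map { sprintf 'U+%04X', $_ } @codes), "\n";
```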

       Locale::Recode(3)  has native support for a plethora of other encodings,
       most of them 8 bit encodings that are fast  to  decode,  including  most
       encodings   used  on  popular  micros  like  the  ISO-8859-*  series  of
       encodings, most Windows-* encodings  (also  known  as  CP*),  Macintosh,
       Atari, etc.

NAMES AND ALIASES
       Each charset or encoding is available internally under a unique name.
       Whenever  the  information  was  available, the preferred MIME name (see
       <http://www.iana.org/assignments/character-sets/>)  was  chosen  as  the
       internal name.

       Alias  handling  is quite strict.  The module does not make wild guesses
       at what you mean ("What's the meaning of the acronym  JIS"  is  a  valid
       alias  for  "7bit-jis"  in  Encode(3) ....) but aims at providing common
       aliases only.  The same applies to so-called  aliases  that  are  really
       mistakes, like "utf8" for UTF-8.

       The module knows all aliases that are listed with the IANA character set
       registry (<http://www.iana.org/assignments/character-sets/>), plus those
       known to libiconv version 1.8, and a bunch of additional ones.
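
       The class method resolveAlias() shown in the SYNOPSIS can be used to
       look up the internal name for an alias.  A short sketch, assuming that
       'latin1' and 'ISO_8859-1' are among the known aliases (both are listed
       in the IANA registry); the exact return values are not shown because
       they depend on the installed version.

```perl
use strict;
use warnings;
use Locale::Recode;

# Look up the internal name for a few aliases.
my %resolved;
for my $alias ('latin1', 'ISO_8859-1') {
    $resolved{$alias} = Locale::Recode->resolveAlias ($alias);
    printf "%-12s => %s\n", $alias,
        defined $resolved{$alias} ? $resolved{$alias} : '(undefined)';
}
```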

CONVERSION TABLES
       The conversion tables have been taken from official sources like the
       IANA or the Unicode Consortium, from Bruno Haible's libiconv, or from
       the sources of the GNU libc.  The regression tests for libintl-perl
       check for conformance with these sources.  For some encodings this
       data differs from Encode(3)'s data, which would cause these tests to
       fail.  In these cases, the module will not invoke the Encode(3)
       methods, but will fall back to the internal implementation for the
       sake of consistency.

       The few encodings that are affected are so  simple  that  you  will  not
       experience  any real performance penalty unless you convert large chunks
       of data.  But the package is not really intended for  such  use  anyway,
       and  since  Encode(3)  is  relatively  new,  I  rather  think  that  the
       differences are bugs in Encode which will be fixed soon.

BUGS
       The module should provide fallback conversions for other Unicode
       encoding schemes like UCS-2, UCS-4 (big- and little-endian).

       The  pure  Perl  UTF-8  decoder  will  not  always  handle corrupt UTF-8
       correctly, especially at the end and at the  beginning  of  the  string.
       This  is  not likely to be fixed, since the module's intention is not to
       be a consistency checker for UTF-8 data.

AUTHOR
       Copyright  (C)  2002-2017  Guido   Flohr   <http://www.guido-flohr.net/>
       (<mailto:guido.flohr@cantanea.com>),   all  rights  reserved.   See  the
       source code for details!

SEE ALSO
       Encode(3), iconv(3), iconv(1), recode(1), perl(1)

perl v5.40.0                       2025-02-15               Locale::Recode(3pm)
