dwww Home | Manual pages | Find package

MIME::Charset(3pm)    User Contributed Perl Documentation    MIME::Charset(3pm)

NAME
       MIME::Charset - Charset Information for MIME

SYNOPSIS
           use MIME::Charset:

           $charset = MIME::Charset->new("euc-jp");

       Getting charset information:

           $benc = $charset->body_encoding; # e.g. "Q"
           $cset = $charset->as_string; # e.g. "US-ASCII"
           $henc = $charset->header_encoding; # e.g. "S"
           $cset = $charset->output_charset; # e.g. "ISO-2022-JP"

       Translating text data:

           ($text, $charset, $encoding) =
               $charset->header_encode(
                  "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
                  "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef",
                  Charset => 'euc-jp');
           # ...returns e.g. (<converted>, "ISO-2022-JP", "B").

           ($text, $charset, $encoding) =
               $charset->body_encode(
                   "Collectioneur path\xe9tiquement ".
                   "\xe9clectique de d\xe9chets",
                   Charset => 'latin1');
           # ...returns e.g. (<original>, "ISO-8859-1", "QUOTED-PRINTABLE").

           $len = $charset->encoded_header_len(
               "Perl\xe8\xa8\x80\xe8\xaa\x9e",
               Charset => 'utf-8',
               Encoding => "b");
           # ...returns e.g. 28.

       Manipulating module defaults:

           MIME::Charset::alias("csEUCKR", "euc-kr");
           MIME::Charset::default("iso-8859-1");
           MIME::Charset::fallback("us-ascii");

       Non-OO functions (may be deprecated in near future):

           use MIME::Charset qw(:info);

           $benc = body_encoding("iso-8859-2"); # "Q"
           $cset = canonical_charset("ANSI X3.4-1968"); # "US-ASCII"
           $henc = header_encoding("utf-8"); # "S"
           $cset = output_charset("shift_jis"); # "ISO-2022-JP"

           use MIME::Charset qw(:trans);

           ($text, $charset, $encoding) =
               header_encode(
                  "\xc9\xc2\xc5\xaa\xc0\xde\xc3\xef\xc5\xaa".
                  "\xc7\xd1\xca\xaa\xbd\xd0\xce\xcf\xb4\xef",
                  "euc-jp");
           # ...returns (<converted>, "ISO-2022-JP", "B");

           ($text, $charset, $encoding) =
               body_encode(
                   "Collectioneur path\xe9tiquement ".
                   "\xe9clectique de d\xe9chets",
                   "latin1");
           # ...returns (<original>, "ISO-8859-1", "QUOTED-PRINTABLE");

           $len = encoded_header_len(
               "Perl\xe8\xa8\x80\xe8\xaa\x9e", "b", "utf-8"); # 28

DESCRIPTION
       MIME::Charset provides information about character sets used for MIME
       messages on Internet.

   Definitions
       The charset is ``character set'' used in MIME to refer to a method of
       converting a sequence of octets into a sequence of characters.  It
       includes both concepts of ``coded character set'' (CCS) and ``character
       encoding scheme'' (CES) of ISO/IEC.

       The encoding is that used in MIME to refer to a method of representing a
       body part or a header body as sequence(s) of printable US-ASCII
       characters.

   Constructor
       $charset = MIME::Charset->new([CHARSET [, OPTS]])
           Create charset object.

           OPTS    may   accept   following   key-value   pair.    NOTE:   When
           Unicode/multibyte support is disabled (see "USE_ENCODE"), conversion
           will not be performed.  So this option do not have any effects.

           Mapping => MAPTYPE
               Whether to extend mappings actually used for  charset  names  or
               not.    "EXTENDED"  uses  extended  mappings.   "STANDARD"  uses
               standardized strict mappings.  Default is "EXTENDED".

   Getting Information of Charsets
       $charset->body_encoding
       body_encoding CHARSET
           Get recommended transfer-encoding of CHARSET for message body.

           Returned value will be one of "B" (BASE64), "Q"  (QUOTED-PRINTABLE),
           "S"  (shorter  one  of  either)  or  "undef" (might not be transfer-
           encoded; either 7BIT or 8BIT).  This may not be same as encoding for
           message header.

       $charset->as_string
       canonical_charset CHARSET
           Get canonical name for charset.

       $charset->decoder
           Get "Encode::Encoding"  object  to  decode  strings  to  Unicode  by
           charset.   If  charset is not specified or not known by this module,
           undef will be returned.

       $charset->dup
           Get a copy of charset object.

       $charset->encoder([CHARSET])
           Get  "Encode::Encoding"  object  to  encode  Unicode  string   using
           compatible charset recommended to be used for messages on Internet.

           If  optional  CHARSET  is  specified,  replace  encoder  (and output
           charset name) of $charset object with those of  CHARSET,  therefore,
           $charset object will be a converter between original charset and new
           CHARSET.

       $charset->header_encoding
       header_encoding CHARSET
           Get recommended encoding scheme of CHARSET for message header.

           Returned  value will be one of "B", "Q", "S" (shorter one of either)
           or "undef" (might not be encoded).  This may not be same as encoding
           for message body.

       $charset->output_charset
       output_charset CHARSET
           Get a  charset  which  is  compatible  with  given  CHARSET  and  is
           recommended to be used for MIME messages on Internet (if it is known
           by this module).

           When  Unicode/multibyte support is disabled (see "USE_ENCODE"), this
           function will simply return the result of "canonical_charset".

   Translating Text Data
       $charset->body_encode(STRING [, OPTS])
       body_encode STRING, CHARSET [, OPTS]
           Get converted (if needed) data of STRING and  recommended  transfer-
           encoding  of  that data for message body.  CHARSET is the charset by
           which STRING is encoded.

           OPTS   may   accept   following   key-value   pairs.    NOTE:   When
           Unicode/multibyte support is disabled (see "USE_ENCODE"), conversion
           will not be performed.  So these options do not have any effects.

           Detect7bit => YESNO
               Try  auto-detecting  7-bit  charset  when  CHARSET is not given.
               Default is "YES".

           Replacement => REPLACEMENT
               Specifies error handling scheme.  See "Error Handling".

           3-item list of (converted  string,  charset  for  output,  transfer-
           encoding)  will  be  returned.   Transfer-encoding  will  be  either
           "BASE64", "QUOTED-PRINTABLE", "7BIT"  or  "8BIT".   If  charset  for
           output  could  not  be determined and converted string contains non-
           ASCII byte(s), charset for output  will  be  "undef"  and  transfer-
           encoding will be "BASE64".  Charset for output will be "US-ASCII" if
           and only if string does not contain any non-ASCII bytes.

       $charset->decode(STRING [,CHECK])
           Decode STRING to Unicode.

           Note: When Unicode/multibyte support is disabled (see "USE_ENCODE"),
           this function will die.

       detect_7bit_charset STRING
           Guess  7-bit  charset  that  may  encode a string STRING.  If STRING
           contains any 8-bit bytes,  "undef"  will  be  returned.   Otherwise,
           Default Charset will be returned for unknown charset.

       $charset->encode(STRING [, CHECK])
           Encode  STRING  (Unicode  or  non-Unicode)  using compatible charset
           recommended to be used for messages  on  Internet  (if  this  module
           knows it).  Note that string will be decoded to Unicode then encoded
           even if compatible charset was equal to original charset.

           Note: When Unicode/multibyte support is disabled (see "USE_ENCODE"),
           this function will die.

       $charset->encoded_header_len(STRING [, ENCODING])
       encoded_header_len STRING, ENCODING, CHARSET
           Get length of encoded STRING for message header (without folding).

           ENCODING may be one of "B", "Q" or "S" (shorter one of either "B" or
           "Q").

       $charset->header_encode(STRING [, OPTS])
       header_encode STRING, CHARSET [, OPTS]
           Get  converted  (if  needed) data of STRING and recommended encoding
           scheme of that data for message headers.  CHARSET is the charset  by
           which STRING is encoded.

           OPTS   may   accept   following   key-value   pairs.    NOTE:   When
           Unicode/multibyte support is disabled (see "USE_ENCODE"), conversion
           will not be performed.  So these options do not have any effects.

           Detect7bit => YESNO
               Try auto-detecting 7-bit charset  when  CHARSET  is  not  given.
               Default is "YES".

           Replacement => REPLACEMENT
               Specifies error handling scheme.  See "Error Handling".

           3-item  list  of  (converted  string,  charset  for output, encoding
           scheme) will be returned.  Encoding scheme will be either  "B",  "Q"
           or  "undef" (might not be encoded).  If charset for output could not
           be determined  and  converted  string  contains  non-ASCII  byte(s),
           charset  for  output  will be "8BIT" (this is not charset name but a
           special value to represent unencodable  data)  and  encoding  scheme
           will be "undef" (should not be encoded).  Charset for output will be
           "US-ASCII"  if  and  only  if  string does not contain any non-ASCII
           bytes.

       $charset->undecode(STRING [,CHECK])
           Encode Unicode string STRING to byte  string  by  input  charset  of
           $charset.  This is equivalent to "$charset->decoder->encode()".

           Note: When Unicode/multibyte support is disabled (see "USE_ENCODE"),
           this function will die.

   Manipulating Module Defaults
       alias ALIAS [, CHARSET]
           Get/set   charset   alias   for   canonical   names   determined  by
           "canonical_charset".

           If CHARSET is given and isn't false, ALIAS will be  assigned  as  an
           alias  of  CHARSET.   Otherwise,  alias  won't  be changed.  In both
           cases, current charset name that ALIAS is assigned will be returned.

       default [CHARSET]
           Get/set default charset.

           Default charset is used by  this  module  when  charset  context  is
           unknown.   Modules  using  this  module  are recommended to use this
           charset when charset context  is  unknown  or  implicit  default  is
           expected.  By default, it is "US-ASCII".

           If  CHARSET  is  given  and  isn't  false, it will be set to default
           charset.  Otherwise, default charset  won't  be  changed.   In  both
           cases, current default charset will be returned.

           NOTE: Default charset should not be changed.

       fallback [CHARSET]
           Get/set fallback charset.

           Fallback  charset  is  used  by this module when conversion by given
           charset is failed and "FALLBACK" error handling scheme is specified.
           Modules using this module may use this charset  as  last  resort  of
           charset for conversion.  By default, it is "UTF-8".

           If  CHARSET  is  given  and  isn't false, it will be set to fallback
           charset.  If CHARSET is "NONE", fallback charset will be  undefined.
           Otherwise, fallback charset won't be changed.  In any cases, current
           fallback charset will be returned.

           NOTE: It is useful that "US-ASCII" is specified as fallback charset,
           since   result  of  conversion  will  be  readable  without  charset
           information.

       recommended CHARSET [, HEADERENC, BODYENC [, ENCCHARSET]]
           Get/set charset profiles.

           If optional arguments are given and  any  of  them  are  not  false,
           profiles  for  CHARSET  will  be set by those arguments.  Otherwise,
           profiles won't be changed.  In  both  cases,  current  profiles  for
           CHARSET  will  be  returned  as  3-item list of (HEADERENC, BODYENC,
           ENCCHARSET).

           HEADERENC is recommended encoding scheme for message header.  It may
           be one of "B", "Q", "S" (shorter one of either)  or  "undef"  (might
           not be encoded).

           BODYENC  is  recommended transfer-encoding for message body.  It may
           be one of "B", "Q", "S" (shorter one of either)  or  "undef"  (might
           not be transfer-encoded).

           ENCCHARSET  is  a charset which is compatible with given CHARSET and
           is recommended to  be  used  for  MIME  messages  on  Internet.   If
           conversion  is  not  needed (or this module doesn't know appropriate
           charset), ENCCHARSET is "undef".

           NOTE: This function in the future releases can accept more  optional
           arguments  (for example, properties to handle character widths, line
           folding behavior, ...).  So format of returned value may probably be
           changed.  Use "header_encoding", "body_encoding" or "output_charset"
           to get particular profile.

   Constants
       USE_ENCODE
           Unicode/multibyte support flag.  Non-empty string will be  set  when
           Unicode and multibyte support is enabled.  Currently, this flag will
           be  non-empty  on  Perl  5.7.3  or later and empty string on earlier
           versions of Perl.

   Error Handling
       "body_encode"  and  "header_encode"   accept   following   "Replacement"
       options:

       "DEFAULT"
           Put a substitution character in place of a malformed character.  For
           UCM-based encodings, <subchar> will be used.

       "FALLBACK"
           Try  "DEFAULT" scheme using fallback charset (see "fallback").  When
           fallback charset is undefined and conversion causes error, code will
           die on error with an error message.

       "CROAK"
           Code  will  die  on  error  immediately  with  an   error   message.
           Therefore,  you  should  trap the fatal error with eval{} unless you
           really want to let it die on error.  Synonym is "STRICT".

       "PERLQQ"
       "HTMLCREF"
       "XMLCREF"
           Use "FB_PERLQQ", "FB_HTMLCREF" or  "FB_XMLCREF"  scheme  defined  by
           Encode module.

       numeric values
           Numeric  values  are  also  allowed.  For more details see "Handling
           Malformed Data" in Encode.

       If  error  handling  scheme  is  not  specified  or  unknown  scheme  is
       specified, "DEFAULT" will be assumed.

   Configuration File
       Built-in   defaults   for   option   parameters  can  be  overridden  by
       configuration file: MIME/Charset/Defaults.pm.   For  more  details  read
       MIME/Charset/Defaults.pm.sample.

VERSION
       Consult $VERSION variable.

       Development    versions    of    this    module    may   be   found   at
       <http://hatuka.nezumi.nu/repos/MIME-Charset/>.

   Incompatible Changes
       Release 1.001
           •   new() method returns an object  when  CHARSET  argument  is  not
               specified.

       Release 1.005
           •   Restrict  characters  in  encoded-word  according  to  RFC  2047
               section  5   (3).    This   also   affects   return   value   of
               encoded_header_len() method.

       Release 1.008.2
           •   body_encoding() method may also returns "S".

           •   Return  value  of  body_encode()  method  for  UTF-8 may include
               "QUOTED-PRINTABLE" encoding item that in  earlier  versions  was
               fixed to "BASE64".

SEE ALSO
       Multipurpose Internet Mail Extensions (MIME).

AUTHOR
       Hatuka*nezumi - IKEDA Soji <hatuka(at)nezumi.nu>

COPYRIGHT
       Copyright  (C)  2006-2017  Hatuka*nezumi  - IKEDA Soji.  This program is
       free software; you can redistribute it and/or modify it under  the  same
       terms as Perl itself.

perl v5.34.0                       2022-10-13                MIME::Charset(3pm)

Generated by dwww version 1.16 on Tue Dec 16 07:21:57 CET 2025.