preconv(1)                  General Commands Manual                  preconv(1)

Name
       preconv - prepare files for typesetting with groff

Synopsis
       preconv [-dr] [-D fallback-encoding] [-e encoding] [file ...]

       preconv -h
       preconv --help

       preconv -v
       preconv --version

Description
       preconv  reads  each  file,  converts  its  encoded characters to a form
       troff(1) can interpret, and sends the  result  to  the  standard  output
       stream.   Currently,  this means that code points in the range 0–127 (in
       US-ASCII, ISO 8859, or Unicode) remain as-is and the remainder are  con-
       verted  to  the groff special character form “\[uXXXX]”, where XXXX is a
       hexadecimal number of four to six digits corresponding to a Unicode code
       point.  By default, preconv also inserts a roff .lf request at  the  be-
       ginning of each file, identifying it for the benefit of later processing
       (including diagnostic messages); the -r option suppresses this behavior.
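
       The conversion rule described above can be illustrated with a short
       Python sketch.  It is not preconv's implementation: the function name
       to_groff is hypothetical, the use of uppercase hexadecimal digits is an
       assumption, and the .lf request is left out.

```python
def to_groff(text: str) -> str:
    """Replace every code point above 127 with a groff \\[uXXXX] escape."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            # Code points 0-127 pass through unchanged.
            out.append(ch)
        else:
            # At least four hex digits; code points beyond U+FFFF
            # naturally take five or six.  Uppercase is an assumption.
            out.append("\\[u%04X]" % cp)
    return "".join(out)


print(to_groff("café"))
```

       The input is assumed to be already decoded; choosing the right
       encoding for that step is the subject of the detection procedure.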

       In typical usage scenarios, preconv need not be run directly; instead it
       should  be  invoked  with  the  -k  or  -K options of groff.  If no file
       operands are given on the command line, or if file is “-”, the  standard
       input stream is read.

       preconv  tries  to find the input encoding with the following algorithm,
       stopping at the first success.

       1.  If the input encoding has been explicitly specified with option  -e,
           use it.

       2.  If  the  input  starts with a Unicode Byte Order Mark, determine the
           encoding as UTF-8, UTF-16, or UTF-32 accordingly.

       3.  If the input stream is seekable, check the first  and  second  input
           lines for a recognized GNU Emacs file-local variable identifying the
           character  encoding,  here  referred  to  as  the  “coding  tag” for
           brevity.  If found, use it.

       4.  If the input stream is seekable, and  if  the  uchardet  library  is
           available  on the system, use it to try to infer the encoding of the
           file.

       5.  If the -D option specifies an encoding, use it.

       6.  Use the encoding specified by the current locale (LC_CTYPE),  unless
           the  locale  is “C”, “POSIX”, or empty, in which case assume Latin-1
           (ISO 8859-1).
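
       The precedence of the six steps above can be sketched in Python as
       follows.  This is an illustration only, not preconv's code: the
       function pick_encoding and its parameters are hypothetical, and the
       results of coding tag lookup, uchardet, and locale inspection are
       assumed to be computed elsewhere and passed in (None meaning "no
       result").

```python
import codecs

# Byte order marks.  In this sketch UTF-32 is tested before UTF-16
# because the UTF-32LE mark begins with the same bytes as the UTF-16LE mark.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def pick_encoding(data, explicit=None, seekable=True, coding_tag=None,
                  detected=None, fallback=None, locale_charset=None):
    """Return (encoding, reason); the first step that succeeds wins."""
    if explicit:                          # step 1: -e option
        return explicit, "-e option"
    for bom, name in BOMS:                # step 2: byte order mark
        if data.startswith(bom):
            return name, "BOM"
    if seekable and coding_tag:           # step 3: coding tag
        return coding_tag, "coding tag"
    if seekable and detected:             # step 4: uchardet result
        return detected, "uchardet"
    if fallback:                          # step 5: -D option
        return fallback, "-D option"
    if locale_charset:                    # step 6: locale (LC_CTYPE)
        return locale_charset, "locale"
    # Locale is "C", "POSIX", or empty: assume Latin-1.
    return "iso-8859-1", "default"


print(pick_encoding(b"plain text", seekable=False,
                    coding_tag="utf-8", fallback="koi8-r"))
```

       In the usage line, the coding tag is ignored because the stream is
       not seekable, so the -D fallback is chosen.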

       The coding tag and uchardet methods in the above procedure rely  upon  a
       seekable input stream; when preconv reads from a pipe, the stream is not
       seekable,  and these detection methods are skipped.  If character encod-
       ing detection of your input files is unreliable, arrange for one of  the
       other methods to succeed by using preconv's -D or -e options, or by con-
       figuring   your   locale   appropriately.    groff   also   supports   a
       GROFF_ENCODING environment variable, which can be overridden by  its  -K
       option.  Valid values for (or parameters to) all of these are enumerated
       in  the  lists of recognized coding tags in the next subsection, and are
       further influenced by iconv library support.
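
       The difference between a seekable file and a pipe can be observed
       from any language.  The following Python sketch (the helper
       is_seekable is hypothetical and has no connection to preconv's
       internals) asks the operating system about each kind of stream.

```python
import os
import tempfile


def is_seekable(stream):
    """Report whether the stream supports random access."""
    return stream.seekable()


read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb") as pipe_end, \
        os.fdopen(write_fd, "wb") as pipe_writer, \
        tempfile.TemporaryFile() as regular_file:
    # A pipe cannot be rewound; a regular file can.
    pipe_seekable = is_seekable(pipe_end)
    file_seekable = is_seekable(regular_file)

print("pipe:", pipe_seekable, "file:", file_seekable)
```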

   Coding tags
       Text editors that support more than a  single  character  encoding  need
       tags  within  the  input files to mark the file's encoding.  While it is
       possible to guess the right input encoding with the help  of  heuristics
       that  are  reliable  for a preponderance of natural language texts, they
       are not absolutely reliable.  Heuristics can fail on inputs that are too
       short or don't represent a natural language.

       Consequently,  preconv  supports  the  coding  tag  convention  used  by
       GNU  Emacs (with some restrictions).  This notation appears in specially
       marked regions of an input file designated for “file-local variables”.

       preconv interprets the following syntax if it occurs in a  roff  comment
       in  the first or second line of the input file.  Both “\"” and “\#” com-
       ment forms are recognized, but the control (or no-break control) charac-
       ter must be the default and must begin the line.  Similarly, the  escape
       character must be the default.
              -*- [...;] coding: encoding[; ...] -*-

       The  only  variable  preconv  interprets is “coding”, which can take the
       values listed below.

       The following list comprises all MIME “charset” parameter values  recog-
       nized, case-insensitively, by preconv.
              big5,  cp1047,  euc-jp,  euc-kr,  gb2312, iso-8859-1, iso-8859-2,
              iso-8859-5,  iso-8859-7,  iso-8859-9,  iso-8859-13,  iso-8859-15,
              koi8-r, us-ascii, utf-8, utf-16, utf-16be, utf-16le

       In addition, the following list of other coding tags is recognized, each
       of which is mapped to an appropriate value from the list above.
              ascii,   chinese-big5,  chinese-euc,  chinese-iso-8bit,  cn-big5,
              cn-gb,     cn-gb-2312,     cp878,      csascii,      csisolatin1,
              cyrillic-iso-8bit,  cyrillic-koi8,  euc-china, euc-cn, euc-japan,
              euc-japan-1990,   euc-korea,   greek-iso-8bit,    iso-10646/utf8,
              iso-10646/utf-8,     iso-latin-1,    iso-latin-2,    iso-latin-5,
              iso-latin-7, iso-latin-9, japanese-euc, japanese-iso-8bit,  jis8,
              koi8,  korean-euc,  korean-iso-8bit,  latin-0,  latin1,  latin-1,
              latin-2,  latin-5,  latin-7,  latin-9,  mule-utf-8,  mule-utf-16,
              mule-utf-16be,    mule-utf-16-be,   mule-utf-16be-with-signature,
              mule-utf-16le,   mule-utf-16-le,    mule-utf-16le-with-signature,
              utf8,             utf-16-be,            utf-16-be-with-signature,
              utf-16be-with-signature,   utf-16-le,   utf-16-le-with-signature,
              utf-16le-with-signature

       Trailing  “-dos”, “-unix”, and “-mac” suffixes on coding tags (which in-
       dicate the end-of-line convention used in the file) are disregarded  for
       the purpose of comparison with the above tags.
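
       The rules of this subsection can be combined into a small Python
       sketch.  It is an approximation, not preconv's parser: the function
       coding_tag is hypothetical, the alias table is deliberately partial,
       and the alias targets shown are this sketch's assumptions, since the
       page does not spell out the mapping.

```python
import re

# Partial alias table; the targets are assumptions of this sketch.
ALIASES = {
    "ascii": "us-ascii",
    "latin1": "iso-8859-1",
    "latin-1": "iso-8859-1",
    "iso-latin-1": "iso-8859-1",
    "utf8": "utf-8",
}

# Default control (.) or no-break control (') character at the start of
# the line, then a \" or \# comment using the default escape character.
COMMENT = re.compile(r"""^[.']\\["#]""")
REGION = re.compile(r"-\*-(.*?)-\*-")
EOL_SUFFIX = re.compile(r"-(dos|unix|mac)$")


def coding_tag(text):
    """Return the normalized coding tag from the first two lines, or None."""
    for line in text.splitlines()[:2]:
        if not COMMENT.match(line):
            continue
        region = REGION.search(line)
        if not region:
            continue
        # File-local variables are separated by semicolons.
        for field in region.group(1).split(";"):
            name, sep, value = field.partition(":")
            if sep and name.strip() == "coding":
                # Compare case-insensitively, ignoring end-of-line suffixes.
                tag = EOL_SUFFIX.sub("", value.strip().lower())
                return ALIASES.get(tag, tag)
    return None


sample = '.\\" -*- mode: nroff; coding: Latin-1-unix -*-\n.TH FOO 1\n'
print(coding_tag(sample))
```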

   iconv support
       While  preconv recognizes all of the coding tags listed above, it is ca-
       pable on its own of interpreting only  three  encodings:  Latin-1,  code
       page  1047,  and  UTF-8.  If iconv support is configured at compile time
       and available at run time, all others are passed to iconv library  func-
       tions,  which  may recognize many additional encoding strings.  The com-
       mand “preconv -v” discloses whether iconv support is configured.

       The use of iconv means that characters in the input that encode  invalid
       code  points  for that encoding may be dropped from the output stream or
       mapped to the Unicode replacement character (U+FFFD).  Compare the  fol-
       lowing  examples  using the input “café” (note the “e” with an acute ac-
       cent), which due to its short length challenges inference of the  encod-
       ing used.
              printf 'caf\351\n' | LC_ALL=en_US.UTF-8 preconv
              printf 'caf\351\n' | preconv -e us-ascii
              printf 'caf\351\n' | preconv -e latin-1
       The  fate  of  the  accented  “e”  differs  in each case.  In the first,
       uchardet fails to detect an encoding (though the library on your  system
       may  behave  differently) and preconv falls back to the locale settings,
       where octal 351 starts an incomplete UTF-8 sequence and results  in  the
       Unicode replacement character.  In the second, it is not a representable
       character in the declared input encoding of US-ASCII and is discarded by
       iconv.  In the last, the declared encoding matches the input, and the
       character is mapped to “\[u00E9]”.
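
       Python's codec error handlers offer a rough analogy to the three
       outcomes.  In this sketch, errors="replace" and errors="ignore" stand
       in for the substitution and the discarding described above; they are
       assumptions chosen for illustration and do not describe how iconv or
       preconv works internally.

```python
raw = b"caf\xe9\n"  # "café" encoded as Latin-1; octal 351 is hex E9

# Read as UTF-8: E9 opens a multi-byte sequence that the newline cannot
# continue, so the byte is replaced with U+FFFD.
as_utf8 = raw.decode("utf-8", errors="replace")

# Read as US-ASCII: E9 is outside the encoding and is dropped.
as_ascii = raw.decode("ascii", errors="ignore")

# Read as Latin-1: E9 is the code point of "e" with an acute accent.
as_latin1 = raw.decode("latin-1")

for label, text in (("utf-8", as_utf8), ("us-ascii", as_ascii),
                    ("latin-1", as_latin1)):
    print(label, [hex(ord(ch)) for ch in text])
```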

   Limitations
       preconv  cannot  perform any transformation on input that it cannot see.
       Examples include files that are interpolated by preprocessors  that  run
       subsequently,  including  soelim(1);  files  included  by  troff  itself
       through “so” and similar requests;  and  string  definitions  passed  to
       troff through its -d command-line option.

       preconv  assumes  that  its  input  uses the default escape character, a
       backslash \, and writes special character escape sequences accordingly.

Options
       -h and --help display a usage message, while -v and --version show  ver-
       sion information; all exit afterward.

       -d     Emit debugging messages to the standard error stream.

       -D fallback-encoding
              Use fallback-encoding if all detection methods fail.

       -e encoding
              Skip detection and assume encoding; see groff's -K option.

       -r     Write files “raw”; do not add .lf requests.

See also
       groff(1), iconv(3), locale(7)

groff 1.23.0                      3 June 2025                        preconv(1)
