PATGEN(1)                   General Commands Manual                   PATGEN(1)

NAME
       patgen - generate patterns for TeX hyphenation

SYNOPSIS
       patgen dictionary_file pattern_file patout_file translate_file

DESCRIPTION
       This  manual page is not meant to be exhaustive.  See also the Info file
       or manual Web2C: A TeX implementation available as part of the TeX  Live
       distribution or at http://tug.org/web2c.

       The  patgen  program  reads the dictionary_file containing a list of hy-
       phenated words and the pattern_file containing previously-generated pat-
       terns (if any) for a particular language  (not  a  complete  TeX  source
       file;  see  below),  and produces the patout_file with (previously- plus
       newly-generated) hyphenation patterns for that language. The
       translate_file defines language-specific values for the parameters
       left_hyphen_min and right_hyphen_min used by TeX's hyphenation
       algorithm, as well as the external representation of the lower and
       upper case version(s) of all `letters' of that language. Further
       details of the pattern generation process, such as hyphenation
       levels and pattern lengths, are requested interactively from the
       user's terminal. Optionally, patgen creates a new dictionary file
       pattmp.n showing the good and bad hyphens found by the generated
       patterns, where n is the highest hyphenation level.

       The  patterns  generated  by patgen can be read by initex for use in hy-
       phenating words. For a real-life example of patgen's output, see $TEXMF-
       MAIN/tex/generic/hyphen/hyphen.tex, which contains the patterns TeX uses
       for English by default.  At some sites, patterns for (many)  other  lan-
       guages  may  be available, and the local tex programs may have them pre-
       loaded.

       All filenames must be complete; patgen neither adds default
       extensions nor does any path searching.

FILE FORMATS
       Letters
           When initex digests hyphenation patterns, TeX first expands
           macros, and the result must consist entirely of digits
           (hyphenation levels), dots (`.', edge of a word), and letters.
           In pattern files for non-English languages, letters are often
           represented by macros or other expandable constructs. For the
           purposes of patgen these are just character sequences, subject
           to the condition that no such sequence is a prefix of another
           one.
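
           The prefix condition can be checked before running patgen. The
           following is a minimal sketch, not part of patgen; the function
           name find_prefix_conflicts and the sample representations are
           hypothetical.

```python
def find_prefix_conflicts(representations):
    """Return pairs (a, b) where sequence a is a proper prefix of sequence b."""
    items = sorted(set(representations))
    conflicts = []
    for i, a in enumerate(items):
        for b in items[i + 1:]:
            # After sorting, any sequence that extends a comes later in the list.
            if b.startswith(a):
                conflicts.append((a, b))
    return conflicts


# Hypothetical letter representations, some written as TeX-style sequences.
letters = ["a", '"a', "b", "\\ss", "\\s"]
print(find_prefix_conflicts(letters))
```

           An empty result means the set of representations satisfies the
           condition stated above.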

       Dictionary file
           A dictionary file contains a weighted list of hyphenated words,
           one word per line, starting in column 1. A digit in column 1
           sets a global word weight (initially 1) that applies to all
           following words up to the next global word weight. A digit at
           some intercharacter position sets a weight for that position
           only.

           The hyphens in a word are indicated by `-', `*', or  `.'  (or  their
           replacements as defined in the translate file) for hyphens yet to be
           found,  `good'  hyphens (correctly found by the patterns), and `bad'
           hyphens (erroneously found by the patterns) respectively; when read-
           ing a dictionary file `*' is treated like `-' and `.' is ignored.
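
           The reading rule for hyphen characters can be sketched as
           follows. This is an illustration only, assuming the default
           characters `-', `*', and `.' and single-character letters; the
           function name split_dictionary_word is hypothetical, and word
           weights (digits) are skipped rather than modeled.

```python
def split_dictionary_word(line, hyphen_chars="-*", ignored_chars="."):
    """Return (word, positions) for one dictionary line.

    Each position is an index into word before which a hyphen is marked.
    """
    letters = []
    positions = []
    for ch in line.strip():
        if ch in hyphen_chars:
            # `*' is treated like `-': both mark a wanted hyphen.
            positions.append(len(letters))
        elif ch in ignored_chars or ch.isdigit():
            # `.' (a bad hyphen) is ignored; weights are not modeled here.
            continue
        else:
            letters.append(ch)
    return "".join(letters), positions


print(split_dictionary_word("hy-phen-a-tion"))
print(split_dictionary_word("hy*phen.a-tion"))
```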

       Pattern file
           A pattern file contains only patterns in  the  format  above,  e.g.,
           from  a previous run of patgen.  It may not contain any TeX comments
           or control sequences.  For instance, this is  not  a  valid  pattern
           file:

           % this is a pattern file read by TeX.
           \patterns{%
            ...
           }

           It can only contain the actual patterns, i.e., the `...'.
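
           A bare pattern list can be recovered from a simple TeX pattern
           source along these lines. This is a rough sketch under strong
           assumptions: a single \patterns{...} group without nested
           braces, and `%' comments only. The function name
           extract_patterns and the sample patterns are illustrative.

```python
import re


def extract_patterns(tex_source):
    """Return the whitespace-separated patterns inside \\patterns{...}."""
    # Drop TeX comments: an unescaped % up to the end of the line.
    no_comments = re.sub(r"(?<!\\)%.*", "", tex_source)
    # Assumes one \patterns group that contains no nested braces.
    match = re.search(r"\\patterns\s*\{([^{}]*)\}", no_comments)
    if match is None:
        return []
    return match.group(1).split()


sample = "% this is a pattern file read by TeX.\n\\patterns{%\n.ach4\n1ba\n}\n"
print("\n".join(extract_patterns(sample)))
```

           Files that represent letters by macros or carry several groups
           need more careful handling than this sketch provides.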

       Translate file
           A translate file starts with a line containing the value of
           left_hyphen_min in columns 1-2, the value of right_hyphen_min
           in columns 3-4, and, in columns 5, 6, and 7, either a blank or
           the replacement for the respective "hyphen" character `-', `*',
           or `.'. (Input lines are padded with blanks as for many TeX
           related programs.)

           Each  following  line  defines  one `letter': an arbitrary delimiter
           character in column 1, followed by one or more external  representa-
           tions  of  that  character (first the `lower' case one used for out-
           put), each one terminated by the delimiter and  the  whole  sequence
           terminated by another delimiter.

           If  the  translate  file  is  empty,  the  values left_hyphen_min=2,
           right_hyphen_min=3, and the 26 lower case letters a...z  with  their
           upper case representations A...Z are assumed.
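
           A translate file that spells out these defaults explicitly
           could be generated as sketched below. The layout follows the
           description above; right-justifying each value in its
           two-column field and using a blank as the delimiter are
           assumptions of this sketch, and the function name
           default_translate_lines is hypothetical.

```python
import string


def default_translate_lines(left_hyphen_min=2, right_hyphen_min=3, delim=" "):
    """Build the lines of a translate file for the letters a...z."""
    # Columns 1-2 and 3-4 hold the two minima (right-justified: assumption).
    lines = [f"{left_hyphen_min:2d}{right_hyphen_min:2d}"]
    for lower, upper in zip(string.ascii_lowercase, string.ascii_uppercase):
        # Delimiter, lower case form, delimiter, upper case form, delimiter,
        # then one more delimiter to close the whole sequence.
        lines.append(f"{delim}{lower}{delim}{upper}{delim}{delim}")
    return lines


lines = default_translate_lines()
print("\n".join(lines[:3]))
```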

       Terminal input
           After  reading  the translate_file and any previously-generated pat-
           terns from pattern_file, patgen requests input from the user's  ter-
           minal.

           First, it asks for the integer values of hyph_start and
           hyph_finish, the lowest and highest hyphenation levels for
           which patterns are to be generated. The value of hyph_start
           should be larger than any hyphenation level already present in
           pattern_file.

           Then, for each hyphenation level, it asks for the integer
           values of pat_start and pat_finish, the smallest and largest
           pattern lengths to be analyzed, as well as good weight, bad
           weight, and threshold: the weights for good and bad hyphens and
           a weight threshold for useful patterns.

           Finally, it asks whether or not to produce a hyphenated word
           list (`y' or `Y' means yes; anything else means no).
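
           For scripted runs, the answers can be prepared in the order
           described above. The sketch below only builds the answer text;
           it assumes that patgen reads these values from standard input
           and accepts them grouped one prompt per line, neither of which
           is stated here. The function name build_patgen_answers and the
           sample values are hypothetical.

```python
def build_patgen_answers(hyph_start, levels, make_word_list=False):
    """Build the text of the terminal answers, one group of values per line.

    levels holds one tuple per hyphenation level:
    (pat_start, pat_finish, good_weight, bad_weight, threshold).
    """
    hyph_finish = hyph_start + len(levels) - 1
    lines = [f"{hyph_start} {hyph_finish}"]
    for pat_start, pat_finish, good, bad, threshold in levels:
        lines.append(f"{pat_start} {pat_finish}")
        lines.append(f"{good} {bad} {threshold}")
    # `y' requests the hyphenated word list; anything else declines it.
    lines.append("y" if make_word_list else "n")
    return "\n".join(lines) + "\n"


answers = build_patgen_answers(1, [(2, 4, 1, 1, 1), (2, 4, 1, 2, 1)], True)
print(answers, end="")
```

           Under the same assumptions, the resulting text could be fed to
           patgen on standard input, for example through the input
           argument of Python's subprocess.run.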

FILES
       $TEXMFMAIN/tex/generic/hyphen/hyphen.tex
           The  original  hyphenation patterns for English, by Donald Knuth and
           Frank Liang.

       http://www.ctan.org/pkg/ushyph
           Additional hyphenation patterns  for  English,  extended  by  Gerard
           Kuiken.

       http://www.ctan.org/pkg/hyph-utf8
           Collected hyphenation patterns for many languages in many formats.

       http://www.ctan.org/tex-archive/language/
           General  CTAN directory for patterns and support for many other lan-
           guages.

SEE ALSO
       Frank Liang and Peter Breitenlohner, patgen.web.

       Frank Liang, Word hy-phen-a-tion by com-puter, STAN-CS-83-977,  Stanford
       University Ph.D. thesis, 1983, http://tug.org/docs/liang.

       Donald  E. Knuth, The TeXbook, Addison-Wesley, 1986, ISBN 0-201-13447-0,
       Appendix H.

AUTHORS
       Frank Liang wrote the first version of  this  program.   Peter  Breiten-
       lohner made a substantial revision in 1991 for TeX 3.  The first version
       was  published  as  the appendix to the TeXware technical report. Howard
       Trickey originally ported it to Unix.

Web2C 2025/dev                    16 June 2015                        PATGEN(1)