
WWW(3)                      Library Functions Manual                     WWW(3)

NAME
       WWW - World Wide Web Package

SYNOPSIS
       extract_description( FILE )
       extract_meta( FILE, NAME )
       hyperlink( LIST )

DESCRIPTION
       This package provides utility functions for the World Wide Web that
       extract descriptions or meta information from files and add hyperlinks
       to text.

SUBROUTINES
       The following Perl subroutines are defined and available:

       extract_description( FILE )
              Extracts  a  description from an HTML or plain text file given by
              the FILE name; FILE should be an absolute path.  The  first  $de-
              scription::chars  (default:  2048)  characters  are read.  If the
              file ends in one of the extensions htm, html,  or  shtml,  it  is
              presumed  to be an HTML file; if the file ends in txt, it is pre-
              sumed to be a plain text file.  Other extensions are  not  recog-
              nized and no description is returned for them.

              For HTML files, if a <META NAME="description" CONTENT="..."> or a
              <META NAME="DC.description" CONTENT="..."> (Dublin Core) element
              is found, the words specified as the value of its CONTENT
              attribute are returned as the description.

              Otherwise, all HTML comments, text between <SCRIPT>, <STYLE>, and
              <TITLE> tags, and all other HTML tags are stripped.  If <AREA ...
              ALT="..."> or <IMG ... ALT="..."> elements are  found,  then  the
              words specified as the value of the ALT attributes are extracted.

              Finally, for either HTML or plain text files, at most
              $description::words (default: 50) words are returned.
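
              The steps above can be approximated as follows.  This is a
              minimal sketch, not the package's actual code: the name
              sketch_description and its parameters are hypothetical, it works
              on text that has already been read, and it assumes that NAME
              precedes CONTENT, that attribute values are double-quoted, and
              that a META description is returned without applying the word
              limit.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the WWW package: approximates the
# documented steps for text that has already been read from a file.
sub sketch_description {
    my ( $file_name, $text, $max_chars, $max_words ) = @_;
    $max_chars //= 2048;    # mirrors the documented default
    $max_words //= 50;      # mirrors the documented default

    # Only the documented extensions are recognized (case-sensitive here,
    # which is an assumption).
    my $is_html = $file_name =~ /\.(?:htm|html|shtml)$/;
    my $is_text = $file_name =~ /\.txt$/;
    return unless $is_html || $is_text;

    # Look only at the first $max_chars characters.
    $text = substr( $text, 0, $max_chars );

    if ( $is_html ) {
        # A META description (plain or Dublin Core) takes precedence.
        # Assumes NAME precedes CONTENT and both are double-quoted.
        if ( $text =~ /<meta\s+name="(?:DC\.)?description"\s+content="([^"]*)"/i ) {
            return $1;
        }
        $text =~ s/<!--.*?-->//gs;                                      # comments
        $text =~ s/<(script|style|title)\b[^>]*>.*?<\/\1\s*>//gis;      # element bodies
        $text =~ s/<(?:area|img)\b[^>]*?\balt="([^"]*)"[^>]*>/ $1 /gi;  # keep ALT text
        $text =~ s/<[^>]*>/ /g;                                         # remaining tags
    }

    # Return at most $max_words whitespace-separated words.
    my @words = split ' ', $text;
    $#words = $max_words - 1 if @words > $max_words;
    return join ' ', @words;
}
```

              A regular-expression approach like this one is fragile on
              malformed HTML; it is meant only to make the order of the steps
              concrete.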

       extract_meta( FILE, NAME )
              Extracts the value of the CONTENT attribute from a  META  element
              having  the  given  NAME attribute from an HTML file given by the
              FILE name; FILE should be an absolute path.  The file must end in
              one of the extensions htm, html, or shtml  to  be  considered  an
              HTML file.  The first $description::chars (default: 2048) charac-
              ters  are  read.   The  characters are cached between consecutive
              calls using the same filename.
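
              The caching behaviour can be pictured with the sketch below.  It
              is an illustration under stated assumptions rather than the
              package's implementation: sketch_meta, $cached_name,
              $cached_text, and $reads are hypothetical names, and the pattern
              assumes that NAME precedes CONTENT and that both values are
              double-quoted.

```perl
use strict;
use warnings;

# State kept between calls (hypothetical names).
my ( $cached_name, $cached_text ) = ( '', '' );
my $reads = 0;    # counts real file reads, for illustration only

# Hypothetical helper, not part of the WWW package.
sub sketch_meta {
    my ( $file_name, $meta_name, $max_chars ) = @_;
    $max_chars //= 2048;    # mirrors the documented default

    # Only files with an HTML extension are considered.
    return unless $file_name =~ /\.(?:htm|html|shtml)$/;

    # Re-read the file only when the name differs from the previous call.
    if ( $file_name ne $cached_name ) {
        open my $fh, '<', $file_name or return;
        my $buffer = '';
        read $fh, $buffer, $max_chars;
        close $fh;
        ( $cached_name, $cached_text ) = ( $file_name, $buffer );
        ++$reads;
    }

    # Assumes NAME precedes CONTENT and both values are double-quoted.
    return $cached_text =~ /<meta\s+name="\Q$meta_name\E"\s+content="([^"]*)"/i
        ? $1 : undef;
}
```

              Calling sketch_meta twice in a row with the same file name and
              different NAME values reads the file once; a call with another
              file name replaces the cached text.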

       hyperlink( LIST )
              Adds hyperlinks to strings: substrings that are valid URLs
              (according to RFC 1630) have the appropriate HTML tags
              ``wrapped'' around them so that they will be selectable when
              displayed in a browser.  The ftp, gopher, http, https, mailto,
              news, telnet, and wais URL schemes are recognized.  Example:

                 Read all about it at
                 http://www.usatoday.com/

              becomes:

                 Read all about it at
                 <A HREF="http://www.usatoday.com/">http://www.usatoday.com/</A>
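
              The transformation shown above can be sketched with a single
              substitution.  This is a simplified illustration, not the
              package's code: sketch_hyperlink is a hypothetical name, it
              returns linked copies of its arguments (the real subroutine's
              calling convention is not stated here), and its pattern is far
              looser than the RFC 1630 grammar, so trailing punctuation is
              swept into the link.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the WWW package: returns copies of its
# arguments with recognized URLs wrapped in anchor tags.
sub sketch_hyperlink {
    my @strings = @_;
    for ( @strings ) {
        # Scheme list taken from the text above; everything up to the next
        # whitespace, quote, or angle bracket is treated as the URL.
        s{\b((?:(?:ftp|gopher|https?|telnet|wais)://|(?:mailto|news):)[^\s<>"]+)}
         {<A HREF="$1">$1</A>}g;
    }
    return @strings;
}

print sketch_hyperlink( "Read all about it at\nhttp://www.usatoday.com/\n" );
```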

SEE ALSO
       perl(1)

       Tim Berners-Lee.  ``Universal Resource Identifiers in WWW,'' Request for
       Comments 1630, Network Working Group of the  Internet  Engineering  Task
       Force, June 1994.

       Tim  Berners-Lee, Larry Masinter, and Mark McCahill.  ``Uniform Resource
       Locators (URL),'' Request for  Comments  1738,  Network  Working  Group,
       1994.

       Dave Raggett, Arnaud Le Hors, and Ian Jacobs.  ``Notes on helping search
       engines  index your Web site,'' HTML 4.0 Specification, Appendix B: Per-
       formance, Implementation, and Design Notes, World Wide  Web  Consortium,
       April 1998.

       --.   ``Objects,  Images,  and Applets: How to specify alternate text,''
       HTML 4.0 Specification, §13.8, World Wide Web Consortium, April 1998.

       Dublin Core Directorate.  ``The Dublin Core: A Simple  Content  Descrip-
       tion Model for Electronic Resources.''

       Larry  Wall,  et  al.  Programming Perl, 3rd ed., O'Reilly & Associates,
       Inc., Sebastopol, CA, 2000.

AUTHOR
       Paul J. Lucas <pauljlucas@mac.com>

WWW                            February 12, 2000                         WWW(3)
