WWW(3) Library Functions Manual WWW(3)
NAME
WWW - World Wide Web Package
SYNOPSIS
extract_description( FILE )
extract_meta( FILE, NAME )
hyperlink( LIST )
DESCRIPTION
This package provides utility functions for the World Wide Web that
extract descriptions or meta information from files, and that hyperlink
text.
SUBROUTINES
The following Perl subroutines are defined and available:
extract_description( FILE )
Extracts a description from the HTML or plain text file named by
FILE; FILE should be an absolute path. The first
$description::chars (default: 2048) characters are read. If the
file name ends in one of the extensions htm, html, or shtml, the
file is presumed to be HTML; if it ends in txt, the file is
presumed to be plain text. Other extensions are not recognized
and no description is returned for them.
For HTML files, if a <META NAME="description" CONTENT="..."> or a
<META NAME="DC.description" CONTENT="..."> (Dublin Core) element
is found, the words given as the value of the CONTENT attribute
are returned as the description. Otherwise, all HTML comments,
the text between <SCRIPT>, <STYLE>, and <TITLE> tags, and all
other HTML tags are stripped. If <AREA ... ALT="..."> or
<IMG ... ALT="..."> elements are found, the words given as the
values of their ALT attributes are extracted. Finally, for either
HTML or plain text files, at most $description::words (default:
50) words are returned.
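The steps above can be sketched in Perl as follows. This is an
illustration only, not the package's code: describe_html_sketch and
limit_words are hypothetical names, the sketch works on a string rather
than a file, and it assumes double-quoted attribute values with NAME
appearing before CONTENT.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the package: derive a description
# from an HTML string by following the documented steps.
sub describe_html_sketch {
    my ( $html, $max_words ) = @_;
    $max_words = 50 unless defined $max_words;   # mirrors $description::words

    # Step 1: prefer a META description (plain or Dublin Core).
    if ( $html =~ /<meta\s+name\s*=\s*"(?:DC\.)?description"\s+content\s*=\s*"([^"]*)"/i ) {
        return limit_words( $1, $max_words );
    }

    # Step 2: drop comments and the contents of SCRIPT, STYLE, and TITLE.
    $html =~ s/<!--.*?-->//gs;
    $html =~ s/<(script|style|title)\b[^>]*>.*?<\/\1\s*>//gis;

    # Step 3: keep the ALT text of AREA and IMG elements.
    $html =~ s/<(?:area|img)\b[^>]*?\balt\s*=\s*"([^"]*)"[^>]*>/ $1 /gi;

    # Step 4: strip every remaining tag.
    $html =~ s/<[^>]*>/ /g;

    return limit_words( $html, $max_words );
}

# Collapse runs of whitespace and keep at most $max words.
sub limit_words {
    my ( $text, $max ) = @_;
    my @words = split ' ', $text;
    $#words = $max - 1 if @words > $max;
    return join ' ', @words;
}
```

A regular-expression approach like this is fragile on malformed
markup; it is meant only to make the order of the steps concrete.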
extract_meta( FILE, NAME )
Extracts the value of the CONTENT attribute from the META element
having the given NAME attribute in the HTML file named by FILE;
FILE should be an absolute path. The file name must end in one of
the extensions htm, html, or shtml for the file to be considered
HTML. The first $description::chars (default: 2048) characters
are read. The characters read are cached between consecutive
calls that use the same file name.
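The read-and-cache behavior described above might look roughly like
the following sketch. It is not the package's implementation:
extract_meta_sketch, $CHARS, and the two cache variables are
hypothetical names, and the case-insensitive match, the double-quoted
attribute values, and the undef return for a missing element are all
assumptions.

```perl
use strict;
use warnings;

my $CHARS = 2048;    # mirrors the $description::chars default
my ( $cached_file, $cached_text ) = ( '', '' );

# Hypothetical helper, not the package's code: return the CONTENT value
# of the META element named $name, or undef if none is found.
sub extract_meta_sketch {
    my ( $file, $name ) = @_;

    # Only htm, html, and shtml files are treated as HTML.
    return unless $file =~ /\.(?:htm|html|shtml)$/i;

    # Re-read the file only when the file name changes.
    if ( $file ne $cached_file ) {
        open my $fh, '<', $file or return;
        $cached_text = '';
        read $fh, $cached_text, $CHARS;
        close $fh;
        $cached_file = $file;
    }

    return $cached_text =~
        /<meta\s+name\s*=\s*"\Q$name\E"\s+content\s*=\s*"([^"]*)"/i
        ? $1
        : undef;
}
```

Caching on the file name alone means a file that changes on disk
between two consecutive calls is not re-read in this sketch.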
hyperlink( LIST )
Adds hyperlinks to strings: substrings that are valid URLs
(according to RFC 1630) have the appropriate HTML tags
``wrapped'' around them so that they are selectable when
displayed in a browser. The ftp, gopher, http, https, mailto,
news, telnet, and wais URL schemes are recognized. Example:
Read all about it at
http://www.usatoday.com/
becomes:
Read all about it at
<A HREF="http://www.usatoday.com/">http://www.usatoday.com/</A>
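The transformation shown in the example can be approximated with a
single substitution. This is a minimal sketch, not the package's
implementation: hyperlink_sketch is a hypothetical name, its pattern is
far looser than the RFC 1630 grammar, and returning the modified copies
(the first one in scalar context) is an assumption about the interface.

```perl
use strict;
use warnings;

# Hypothetical helper, not the package's implementation: wrap each
# URL-like substring in an <A HREF="..."> element.
sub hyperlink_sketch {
    my @strings = @_;    # work on copies, leaving the caller's data alone
    for (@strings) {
        s{
            \b(
                (?: (?:ftp|gopher|https?|telnet|wais) :// | (?:mailto|news) : )
                [^\s<>"]+
            )
        }{<A HREF="$1">$1</A>}gix;
    }
    return wantarray ? @strings : $strings[0];
}
```

Because the pattern takes every character up to the next whitespace,
quote, or angle bracket, punctuation that directly follows a URL (such
as a sentence-ending period) ends up inside the link.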
SEE ALSO
perl(1)
Tim Berners-Lee. ``Universal Resource Identifiers in WWW,'' Request for
Comments 1630, Network Working Group of the Internet Engineering Task
Force, June 1994.
Tim Berners-Lee, Larry Masinter, and Mark McCahill. ``Uniform Resource
Locators (URL),'' Request for Comments 1738, Network Working Group,
1994.
Dave Raggett, Arnaud Le Hors, and Ian Jacobs. ``Notes on helping search
engines index your Web site,'' HTML 4.0 Specification, Appendix B: Per-
formance, Implementation, and Design Notes, World Wide Web Consortium,
April 1998.
--. ``Objects, Images, and Applets: How to specify alternate text,''
HTML 4.0 Specification, §13.8, World Wide Web Consortium, April 1998.
Dublin Core Directorate. ``The Dublin Core: A Simple Content Descrip-
tion Model for Electronic Resources.''
Larry Wall, et al. Programming Perl, 3rd ed., O'Reilly & Associates,
Inc., Sebastopol, CA, 2000.
AUTHOR
Paul J. Lucas <pauljlucas@mac.com>
WWW February 12, 2000 WWW(3)