.\" @(#)pep.1 2.8 95/08/11 [gh]
.\" Usage:
.\"    nroff -man pep.1
.TH PEP MANEXT "1995 August 11" "Version 2.8"
.SH NAME
pep \- a file detergent
.SH SYNOPSIS
.B pep
[
.B \-a
]
[
.B \-b
]
[
.B \-c
[
.I size
]]
[
.B \-d + | \-
]
.if n .ti +5
[
.B "\-e [ 0 | 1 | 2"
]]
[
.B \-g
[
.I file
]]
[
.B \-h
]
[
.B \-i + | \-
]
.if n .ti +5
[
.B \-k + | \-
]
[
.B \-l
[n][
.I size
]]
[
.B \-m + | \-
]
[
.B \-o
[
.B b
]]
.if n .ti +5
[
.B \-p
]
[
.B \-s
[
.I size
]]
[
.B \-t
[
.I size
]]
[
.B \-u
.I terminator
]
.if n .ti +5
[
.B \-v
]
[
.B \-w + | \-
]
[
.B \-x
]
[
.B \-z
]
[
.I filename
.B .\|.\|.
]
.SH DESCRIPTION
.LP
.B Pep
is a filter program to "clean" files.  It is named after a
popular Norwegian detergent.
.PP
.B Pep
may be used to remove control characters, strip parity bits,
interpret ANSI escape sequences, compress tabulation,
extract strings and convert character sets.
.PP
.B Pep
is a filter.  Its default operation is to read from standard input
(the keyboard) and write on standard output (the terminal).
.PP
You may also specify the name of one or more files as the last
argument on the command line.  Most versions of
.B pep
(not the version compiled for the DEC VMS operating system)
allow redirection and ambiguous filename arguments.
.PP
Instead of using
.B pep
as a filter, you may instruct
.B pep
to write the result back onto the original input file with the
.B \-o
option.  If you use this option, the original file will be lost.
This is the default behaviour on operating systems that do not
support redirection (e.g. DEC VMS).
.PP
To get a brief summary of the command line syntax and all the options,
you need to specify the
.B \-h
option.  Just type the command:
.sp 0.5
.RS
.B pep \-h
.RE
.PP
followed by the RETURN key.  Note that just
.B pep
will not give you this summary.  The command:
.sp 0.5
.RS
.B pep
.RE
.PP
will start
.B pep
as a filter, and it will just echo back whatever you type, until you
type the end of file character (usually CTRL-D or CTRL-Z).
.PP
When
.B pep
is running as a filter, it reads from the standard input and
writes to the standard output.  In this state,
.B pep
will be much less verbose than it usually is.  It will still
print error messages, but very little else.  Note that while:
.sp 0.5
.RS
.nf
.B pep < foobar.in > foobar.out
.B pep \-ob foobar.txt
.fi
.RE
.PP
will do more or less the same job, the first will do it quietly,
in the tradition of Unix filters; the latter will print the
copyright notice, a detailed list of the things it will do,
and finally a list and line count
of all the files it processes as it plods along.
.PP
.B Pep
will remove some "noise" from files, even if no options are specified.
The following is the default behaviour:
.RS
.TP 3
\(bu
remove trailing spaces;
.TP 3
\(bu
terminate each line with the canonical line terminator (usually LF, CR or both);
.TP 3
\(bu
remove underlining intended for backspacing printers;
.TP 3
\(bu
remove control characters (character codes < 32) except the canonical line
terminator, FF and TAB;
.TP 3
\(bu
break the line before the FF if a line contains an FF anywhere except in the
first column.
.RE
.PP
If you want to check what
.B pep
actually intends to do to your file before it does it, you may make it
pause with the
.B \-p
option.  For example:
.sp 0.5
.RS
.B pep \-p foobar.txt
.RE
.PP
will make
.B pep
stop after displaying a list of the conversions it will apply to the
file.  The user is prompted and may choose to proceed
(hitting the RETURN key), or abort
the program without doing anything (hitting CTRL-C).
.PP
The user may want other conversions than the default action described
above.  A number of conversion functions may be selected by specifying one or
more options on the command line.
.PP
Some of the options require an additional argument switch, and must be
followed by a "+" or a "\-"; other options
require a number or a filename argument.
Most of the options may be combined with other options, but a few are
mutually exclusive.  If the user specifies invalid options or option
arguments, then
.B pep
will abort with an error message and return an error exit code on
operating systems that support exit codes.
.SH OPTIONS
.TP
.B \-a
Write out information about
.B
pep.
.TP
.B \-b
Remove all characters not in the original 7-bit character set (ISO 646).
I.e. remove the characters which are encoded from 128 to 255.
(If this option is combined with the
.B \-x
option, it will print the codes for these characters in hexadecimal
instead of removing them.)
The
.B \-b
option is powerful, and may remove a lot of bytes if you use it
on the wrong file.  Only use it if you know exactly how the eighth bit is
used in the file you intend to filter.  Also note that the options
.B \-i, \-d, \-k, \-g, \-m, \-w
or
.B \-z
are in most cases better suited to
processing files where the eighth bit is set.
.TP
\fB\-c \fR[ \fIsize \fR]
Compress space into tabulation.  I.e. insert TAB characters when
replacing a run of two or more SPACE characters would produce a
smaller output file.
This function is the opposite of the function invoked with the
.B \-t
option.
.IP
The default tabulation size is 8,
but you may specify any other tabulation with the optional numeric
argument.
.TP
.B \-d + | \-
Convert to or from the ISO 8859/1 8 bit character set and the Norwegian
version of the ISO 646 7 bit character set.  If the argument is "+",
the file is converted
.I to
ISO 8859/1.  If the argument is "\-",
the file is converted
.I from
ISO 8859/1.  The ISO 8859/1 character set is closely related to
(though not identical with) the "DEC Multinational Character Set".
.TP
\fB\-e \fR[ \fB0 | 1 | 2 \fR]
Interpret ANSI screen control sequences (also known as ANSI ESCAPE
sequences).  This function makes
.B pep
emulate cursor positioning and other functions on an ANSI-terminal.
.IP
.B Pep
will complain about "strange" (i.e. implementation dependent) use of
ANSI escape sequences.
.IP
.B Pep
will normally save a screen image on the output file when one of
two events occurs:  1) when the screen is full and scrolls up;
or 2) just before a screen image is erased with the "erase screen"
ANSI screen control sequence.  In some cases important fields
on the screen will be overwritten or erased.  There
is no good solution to this
problem, but
.B pep
provides the user with some opportunity to guard against overwriting
and erasure.  This is done by specifying an additional numeric argument
to the
.B \-e
option.  This number indicates the level of protection
and is interpreted as follows:
.sp 0.5
.RS
.RS
.TP 3
0:
no protection \(em fields may be erased and overwritten
(this is the default);
.TP
1:
sequences that erase fields are ignored;
.TP
2:
sequences that erase or overwrite fields are ignored.
.RE
.RE
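.IP
As a sketch, assuming a captured terminal session stored in a file
called session.log (a hypothetical name), the following command should
interpret the ANSI sequences while ignoring those that erase fields:
.sp 0.5
.RS
.RS
.\" session.log and session.txt are placeholder names, not part of pep.
.B "pep \-e1 < session.log > session.txt"
.RE
.RE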
.TP
\fB\-g \fR[ \fIfile \fR]
Read the conversion table from a file.  The name of the file must be
appended as the argument to this option.
.IP
The file itself is a standard ASCII text file where each line should
contain two decimal numbers.  The first number is the character code
to convert
.I from,
and the second number is the character code to convert
.I to.
A "#" character and all the following characters up to a NEWLINE are
considered a comment, and are ignored.  Comments are, however, echoed
on the screen along with the other comments
.B pep
makes, unless the comment line starts with a "##".
.IP
Below is an example of how such a conversion file may look:
.sp 0.5
.PP
.ft B
.nf
.RS
.RS
# Convert from Macintosh to IBM-PC
##This line is not echoed on the screen.
# MAC IBM
174 146
175 157
129 143
190 145
191 155
140 134
# EOF
.RE
.RE
.fi
.ft R
.IP
If the name of the file is omitted,
.B pep
will write out a list of the directories it searches for these files.
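.IP
For illustration, suppose the example table above has been saved as
mac2ibm.tbl (a hypothetical name) in the current directory.  It could
then be applied as follows; note that the file name is appended
directly to the option, without an intervening space:
.sp 0.5
.RS
.RS
.\" mac2ibm.tbl, mac.txt and ibm.txt are placeholder names.
.B "pep \-gmac2ibm.tbl < mac.txt > ibm.txt"
.RE
.RE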
.TP
.B \-h
Write a brief summary of
.B pep
options, and exit.
.TP
.B \-i + | \-
Convert to or from the IBM 8 bit character set (Code Page 850 Multilingual)
and the Norwegian
version of the ISO 646 7 bit character set.  If the argument is "+",
the file is converted
.I to
CP 850.  If the argument is "\-",
the file is converted
.I from
CP 850.  The CP 850 character set (or a subset of it)
is what is used in the IBM PC, AT, and PS/2 series of
computers and their clones.  Note that some machines with
American PROMs have a yen and a cent character in
the positions rightfully belonging to the upper and lower case
versions of the Norwegian character
written as an "o" with a slash across it (often referred to as
.IR oslash ).
.TP
.B \-k + | \-
Convert to or from an 8 bit character set and the
ISO 646 7 bit character set.  This is a modified version
of the
.B \-i
function, hacked to preserve both the
.I backslash
character and the upper case
.I oslash
character as required by, among others, the "KnowledgeMan" package.  These
characters share the same code (92 decimal) in 7 bit ISO 646,
but use different codes (92 is backslash, 157 is oslash) in
8 bit CP 850.  To get around this, two backslashes in ISO 646
will be converted to the upper case oslash character in CP 850, while
a single backslash will be preserved \(em and vice versa.
.IP
If this option is combined with the
.B \-d
or
.B \-m
option, the DEC/ISO or the Macintosh character set is used as the base
instead of CP 850.
.TP
\fB\-l \fR[ [ \fBn \fR] \fIsize \fR]
Split long lines into lines of maximum length given by the
.I size
argument.
.IP
This option will also make sure that there will be at least one
blank line between each paragraph, unless the optional argument
.B n
is specified.
.IP
If size is not specified, a default value of 72 characters
is used.
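.IP
Going by the synopsis, and assuming placeholder file names, the first
command below should split lines at 60 characters and keep a blank line
between paragraphs, while the second should split the lines without
enforcing that blank line:
.sp 0.5
.RS
.RS
.nf
.\" foobar.in and foobar.out are placeholder names.
.B "pep \-l60 < foobar.in > foobar.out"
.B "pep \-ln60 < foobar.in > foobar.out"
.fi
.RE
.RE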
.TP
.B \-m + | \-
Convert to or from the Apple Macintosh 8 bit character set and the Norwegian
version of the ISO 646 7 bit character set.  If the argument is "+",
the file is converted
.I to
the Macintosh character set; if the argument is "\-",
the file is converted
.I from
the Macintosh character set.
See the description of the
.B \-v
option below and the
note in the BUGS section about the treatment of "end-of-line" and
"end-of-paragraph".
.TP
\fB\-o \fR[ \fBb \fR]
.B Pep
will usually write the result of conversions on the standard output
.I (stdout).
This option instead instructs
.B pep
to replace each named input file with a file containing the result
of filtering the file through
.B pep.
If the option is augmented with the argument
.B b
(i.e.
.BR \-ob ),
then
.B pep
will create a backup copy of the original input file in a file
with extension .BAK.  If you just specify
.B \-o
the original file is deleted.
.IP
The VMS version of
.B pep
will always run as if this option was specified.  This is because
VMS does not support useful redirection or pipes.  Therefore, it is never
necessary to specify the
.B \-o
option under VMS, but users should still specify
.B \-ob
if they want a backup copy of the original input file.
.TP
.B \-p
Write out a brief description of the conversion functions that
will be activated by the current
set of options, and pause.  The user may review the list of
conversion functions and abort (by hitting CTRL-C) if they do not have
the intended effect.
.TP
\fB\-s \fR[ \fIsize \fR]
Find strings in extremely "noisy" files.
.IP
.BR Pep 's
concept of a string is that it is a sequence of "printable" characters
of a certain length.  The default minimum length of this sequence is
4, but this may be changed by the user by supplying an optional
numeric argument that becomes the minimum length of the sequence.
.IP
The default definition of a "printable" character is a symbol with
encoding above 31 decimal (i.e. 32 to 255) plus certain
common control characters (TAB, CR and LF).  This definition
is almost always too liberal, and will include a lot of "noise" in
the output.  One or more of the options
.B \-b, \-d, \-i, \-m
or
.B \-z
should be specified in addition to
.B \-s
in order to narrow the definition and the search space.
In my experience, the
.B \-b
option is a particularly
useful additional filter when searching for strings.
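.IP
A possible invocation, assuming a binary file named image.bin (a
placeholder name), that looks for runs of at least eight printable
7-bit characters:
.sp 0.5
.RS
.RS
.\" image.bin and image.txt are placeholder names.
.B "pep \-s8 \-b < image.bin > image.txt"
.RE
.RE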
.TP
\fB\-t \fR[ \fIsize \fR]
Expand tabulation, replacing the TAB character with a suitable number
of spaces.  The default tabulation size is 8, but the optional
numeric argument
.I size
may be used to set tabulation to any desired size.
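.IP
As an example with placeholder file names, the first command below
should expand tabs using a tabulation size of 4, and the second
should perform the opposite conversion with the
.B \-c
option:
.sp 0.5
.RS
.RS
.nf
.\" foobar.in and foobar.out are placeholder names.
.B "pep \-t4 < foobar.in > foobar.out"
.B "pep \-c4 < foobar.in > foobar.out"
.fi
.RE
.RE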
.TP
\fB\-u r | n | s | \- | # | \fInumber \fR
.BR Pep 's
default behaviour is to terminate lines with whatever is the
canonical line terminator (the standard way to terminate
a text line) on the assumed target system for the output file.
This means CR/LF on a microcomputer system, LF on a UNIX system,
and CR if the target is a Macintosh.  The assumed target system
is usually the system
.B pep
is running on, unless you request folding to the character set
of another computer system.  Then, that computer system becomes
the assumed target.
.IP
The
.B \-u
option allows you to override this assumption.
You do this by specifying explicitly (in decimal) the numeric ASCII
value of the end of line character you want in your output file.
For example, to make sure
lines are terminated by LF (the standard for UNIX text files),
you may use
.BR \-u10 ,
because 10 is the ASCII value of the newline (LF) control character.
Instead of a numeric argument, you may specify
.BR r ,
for carriage return (CR),
.BR n ,
for newline (LF),
.BR s ,
for record separator (RS), the symbol
.BR \- ,
for no line terminator, or the symbol
.B #
to get carriage return followed by a newline (CR/LF).
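.IP
For instance, assuming placeholder file names, the first command below
should write LF-terminated lines and the second CR/LF-terminated lines,
whatever system
.B pep
is running on:
.sp 0.5
.RS
.RS
.nf
.\" foobar.in and foobar.out are placeholder names.
.B "pep \-un < foobar.in > foobar.out"
.B "pep \-u# < foobar.in > foobar.out"
.fi
.RE
.RE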
.TP
.B \-v
Normally,
.B pep
will terminate each line with the canonical line terminator.
Some typesetting programs and word processors, however, require
that no hard line terminator is present within a paragraph, and
that only paragraphs are hard terminated.  If you want to
export a file to such a typesetting program or word processor,
you may instruct
.B pep
to terminate paragraphs
.I only
with this option.
.IP
See the note in the BUGS section below about the treatment of "end-of-line" and
"end-of-paragraph".
.TP
.B \-w + | \-
This slightly obsolete option converts files to and from the
WordStar version 3.2 "document" mode.  If the argument is "+",
the file is converted
.I to
WordStar document mode; if the argument is "\-",
the file is converted
.I from
WordStar document mode into plain ASCII text.
.TP
.B \-x
Expand unprintable characters.  This option
will make
.B pep
expand the characters it would otherwise remove from the file by
printing the character encoding of these characters in
hexadecimal between angle brackets.
.TP
.B \-z
Zero the eighth bit (a.k.a. the parity bit) on all characters in the file.
.SH ENVIRONMENT
.PP
.B Pep
knows a single environment variable:
.BR PEP ,
which may be
used to indicate the lookup path for files with conversion
tables.  Below are some examples of how to set this in some
operating systems:
.sp 0.5
.RS
.nf
\fBset PEP=C:\eMISC\eLIB       \fR(MS-DOS)
\fBsetenv PEP /home/george/lib      \fR(UNIX)
\fBdefine PEP "DISK_USR:<GEORGE.LIB>"    \fR(VMS)
.fi
.RE
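.PP
The UNIX example above uses C shell syntax.  In a Bourne-compatible
shell (the kind that reads \fB.profile\fR), the equivalent setting,
using the same example directory, would be:
.sp 0.5
.RS
.nf
.\" Same illustrative path as in the C shell example above.
\fBPEP=/home/george/lib; export PEP      \fR(UNIX, Bourne shell)
.fi
.RE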
.PP
The command to set this environment variable should usually be
part of the command file that is read during login (this may
be named
.B "AUTOEXEC.BAT, LOGIN.COM, .profile"
or
.B .login
depending upon your choice of operating system).
.SH DIAGNOSTICS
.PP
If you specify an option that
.B pep
does not recognize, then
.B pep
will
write a summary of usage and abort.  Other errors on the
command line will result in
.B pep
writing an error message
before aborting.
.PP
On operating systems that support exit codes,
.B pep
will return an exit code upon termination.
.PP
If
.B pep
is interpreting ANSI escape sequences and notices
syntactic or semantic errors in the way they are used, a
warning is printed on the screen, prefixed with the string
"ansi:".  This means that it is also possible to use
.B pep
to check if programs use ANSI sequences in a portable way.
.SH FILES
.PP
The directory
.I LIBDIR
should contain a set of standard filters for use with the
.B \-g
option.
.SH AUTHOR
.PP
Copyright (c) 1987-1995 Gisle Hannemyr.
.PP
This program is free software;  you can redistribute it and/or modify
it under the terms of the GNU General Public License, as published by
the Free Software Foundation. See the file "copying.txt" for details.
.PP
Bug reports, comments and suggestions to:
.ti +0.2i
gisle@hannemyr.no
.SH ACKNOWLEDGMENTS
.PP
Thanks to Robert Andersson, for the SYS-V
.I "rename"
function; and to
Knut Borge, Bjorn Larsen, Knut Omang and Geir-Harald Strand,
for elucidation of the unspeakable horrors of VMS.
.PP
Several people have contributed character tables, ideas and/or bug
reports.  In addition to those mentioned above:
Inge Arnesen,
Nils-Eivind Naas,
Ola Garstad,
Ottar Grimstad,
Tor Sjowall,
Jens-Henrik Sorensen
and Bjorn Asle Valde,
should be mentioned.  My apologies if anyone is forgotten.
.SH SEE ALSO
.LP
.BR dd (1),
.BR convert (VMS),
.BR expand (1),
.BR od (1V),
.BR sed (1),
.BR strings (1),
.BR tr (1),
.BR unexpand (1).
.PP
Those marked VMS are standard VMS utilities.
The others are standard UNIX utilities.
.SH BUGS
.PP
There is a very strong Norwegian bias in
.B pep.
In particular,
there exist several national versions of the ISO 646 7-bit
character set; but all built-in functions to convert between this
and various 8-bit character sets (i.e.
.B \-d, \-i, \-k
and
.BR \-m )
bluntly assume the standard Norwegian version of ISO 646.  For
.B pep
to work with other national 7-bit character sets, the
compiled-in conversion tables (type FOLDMATRIX for those who read the
source code) need to be extended.
.PP
The VMS version of
.B pep
runs with the
.B \-o
option permanently enabled.  This is because VMS does not support a
useful I/O redirection or pipe mechanism.
.PP
The VMS Record Management Service (RMS) knows of several record formats.
You can see what record format a file has by using the VMS DCL command
.I "DIRECTORY/FULL"
and examining the field "Record format".
On VMS systems,
.B Pep
will always generate output files with record format set to "Stream_LF",
but some programs may require that the output file is in other
formats.  To fix this, it might be necessary to run the output of
.B pep
through the VMS
.B CONVERT
utility.  Please see the DEC VMS manuals for details.
.PP
The Macintosh "text only" format uses the carriage return (CR) character
(ASCII 13) as terminator.  Most text processors (e.g. MacWrite)
seem capable of handling two conventions:
One is to use CR to terminate each line (and two or more
consecutive CR's between paragraphs); the other is to use CR between
paragraphs only.
.B Pep
is also capable of handling both conventions.  The default behaviour
is to terminate each line, but the
.B \-v
option may be used to terminate paragraphs only.
Please note that
.B pep
uses a rather simplistic heuristic to identify the end of a paragraph:
it bluntly assumes that paragraphs are separated by blank lines.
.PP
If you use the
.B \-o
option, then the original input file will
be overwritten.  Before you are familiar with
.B pep,
you may
find that it sometimes removes more material than you expect
from a file.  It may be a good idea to always make a copy
of the original file before you start experimenting with
.B pep,
or you may add the
.B
"b"
argument to the
.B
\-o
option
.B
(\-ob).
.PP
The built-in IBM-PC, DEC and Macintosh conversion tables
convert to and from the Norwegian version of 7-bit "ASCII"
characters.  You should use the
.B \-g
option and "general" conversion tables for all other purposes.
.PP
.B Pep
only knows the ANSI sequences implemented in the
standard MS-DOS console driver
.I
ANSI.SYS.
.PP
There cannot be a space character between an option and the
option's argument (e.g. you'll have to use
.B
"\-gfoo.bar",
not
.B
"\-g foo.bar").
.PP
Pep will only filter "regular" files.  It will skip directories, sockets
and "special" files.
.PP
Links are the GOTOs of file systems.  If you run a hard linked file
through pep using the
.B \-o
option, the link will not be preserved.  Pep will just skip soft
linked files.
.PP
.B Pep
searches for the conversion tables requested with the
.B
\-g
option in the following order: first the current directory,
then the directory of the file
.I PEP.EXE
(MS-DOS only), then the directory pointed to by the
.B PEP
environment
variable, and finally the directory
.I LIBDIR.
.PP
.B Pep
knows nothing about the COFF-format and the
.B \-s
option is
primitive compared to the UNIX command
.IR strings (1).
.\" EOF
