Top Document: PDP-8 Frequently Asked Questions (posted every other month)


What character sets does the PDP-8 support?


From the beginning, PDP-8 software has generally assumed that textual
I/O would be in 7 bit ASCII.  Most early PDP-8 systems used teletypes
as console terminals; as sold by DEC, these were configured for mark
parity, so most older software assumes 7 bit ASCII, upper case only,
with the 8th bit set to 1.  On output, lines are generally terminated
with both CR and LF; on input, CR is typically (but not always) the
line terminator and LF is typically ignored.  In addition, the tab
character (HT) is generally allowed, but software support for output of
text containing tabs varies.
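
As an illustration of the output convention described above, here is a
minimal Python sketch (not PDP-8 code) that produces mark-parity
character codes.  The function name to_mark_parity is hypothetical, and
the choice to fold the text to upper case and to treat "\n" as the line
break in the input are assumptions made for the example.

```python
def to_mark_parity(text: str) -> bytes:
    """Encode text as upper-case 7 bit ASCII with the 8th bit set to 1."""
    out = bytearray()
    for ch in text.upper():          # older software expects upper case only
        if ch == "\n":
            # Lines are terminated with both CR and LF on output.
            out += bytes([0o015 | 0o200, 0o012 | 0o200])
            continue
        code = ord(ch)
        if code > 0o177:
            raise ValueError(f"not a 7 bit ASCII character: {ch!r}")
        out.append(code | 0o200)     # mark parity: force the 8th bit to 1
    return bytes(out)
```

For example, the letter A (octal 101) becomes octal 301, and a line
break becomes the pair octal 215, 212.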

One difficulty with much PDP-8 software is that it bypasses the device
handlers provided by the operating system and goes directly to the
device.  This results in very irregular device support, so that, for
example, control-S and control-Q work to stop and restart output under
OS/8, but the OS/8 PAL assembler ignores them when reporting errors.

Better engineered PDP-8 software generally folds upper and lower case
on input and ignores the setting of the 8th bit.  Older PDP-8 software
will generally fail when presented with lower case textual input (this
includes essentially all OS/8 products prior to OS/278 V1).
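
The input handling described above can be sketched in a few lines of
Python.  This is only an illustration of the idea, not code taken from
any PDP-8 program, and the name normalize_input is hypothetical.

```python
def normalize_input(code: int) -> int:
    """Return an upper-case 7 bit ASCII code for a raw input character."""
    code &= 0o177                    # ignore the setting of the 8th bit
    if 0o141 <= code <= 0o172:       # lower case a through z
        code -= 0o40                 # fold to upper case
    return code
```

With this in place, a lower case a arriving with mark parity (octal
341) and an upper case A with the 8th bit clear (octal 101) are both
seen by the program as octal 101.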

Internally, PDP-8 programmers are free to use other character sets, but
the "X notation provided by the assembler encourages use of 7 bit ASCII
with the 8th bit set to 1, and the TEXT pseudo-operation encourages the
6 bit character set called "stripped ASCII".  To map from upper-case-only
ASCII to stripped ASCII, each 8 bit character is ANDed with octal 77 and
then packed 2 characters per 12 bit word, left to right.  Many programs use a
semi-standard scheme for packing mixed upper and lower case into 6 bit
TEXT form; this uses ^ to flip from upper to lower case or lower to
upper case, % to encode CR-LF pairs, and @ (octal 00) to mark end of
string.  Note that this scheme makes no provision for encoding the %,
^ and @ characters, nor does it allow control characters other than the
CR-LF pair.
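
The following Python sketch illustrates the packing and the
semi-standard mixed case scheme described above.  The names pack_sixbit
and decode_mixed are hypothetical.  Two details are assumptions rather
than facts from the text: an odd-length string is padded with a 00
code, and decoding starts in upper case.

```python
def pack_sixbit(text: str) -> list[int]:
    """Pack upper-case ASCII into 12 bit words, 2 characters per word."""
    codes = [ord(ch) & 0o77 for ch in text]   # AND each character with 77
    if len(codes) % 2:
        codes.append(0)                       # assumed padding (00 is @)
    # The first character of each pair goes in the left (high) 6 bits.
    return [(codes[i] << 6) | codes[i + 1] for i in range(0, len(codes), 2)]


def decode_mixed(words: list[int]) -> str:
    """Decode 6 bit text using ^ for case shift, % for CR-LF, @ for end."""
    out = []
    lower = False                             # assumed initial state
    for word in words:
        for code in ((word >> 6) & 0o77, word & 0o77):
            if code == 0:                     # @ marks end of string
                return "".join(out)
            # Codes 01-37 are octal 101-137; codes 40-77 are unchanged.
            ch = chr(code + 0o100) if code < 0o40 else chr(code)
            if ch == "^":
                lower = not lower             # flip between the two cases
            elif ch == "%":
                out.append("\r\n")            # CR-LF pair
            else:
                out.append(ch.lower() if lower else ch)
    return "".join(out)
```

For example, pack_sixbit("AB") gives the single word octal 0102, and
decode_mixed(pack_sixbit("H^ELLO^%")) gives "Hello" followed by CR-LF.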

The P?S/8 operating system supports a similar 6 bit text file format,
where upper and lower case are folded together, tabs are stored as _
(underline), end-of-line is represented by 00, padded with any
nonzero filler to a word boundary, and end of file is 0000.

Files under the widely used OS/8 system consist of sequences of 256 word
blocks.  When used for text, each block holds 384 bytes, packed 3 bytes
per pair of words as follows:

		aaaaaaaa		ccccaaaaaaaa
		bbbbbbbb		CCCCbbbbbbbb
		ccccCCCC

Control Z is used as an end of file marker.  Because most of the PDP-8
system software was originally developed for paper tape, binary object
code is typically stored in paper-tape image form using the above packing
scheme.
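
The packing shown in the diagram above can be sketched in Python as
follows.  The names pack_os8 and unpack_os8 are hypothetical, and
padding a short final group with zero bytes is an assumption made to
keep the example simple.

```python
def pack_os8(data: bytes) -> list[int]:
    """Pack 8 bit bytes into 12 bit words, 3 bytes per pair of words."""
    padded = data + bytes(-len(data) % 3)     # assumed zero padding
    words = []
    for i in range(0, len(padded), 3):
        a, b, c = padded[i:i + 3]
        words.append(((c >> 4) << 8) | a)     # ccccaaaaaaaa
        words.append(((c & 0o17) << 8) | b)   # CCCCbbbbbbbb
    return words


def unpack_os8(words: list[int]) -> bytes:
    """Recover the 8 bit bytes from a sequence of 12 bit words."""
    out = bytearray()
    for i in range(0, len(words) - 1, 2):
        w1, w2 = words[i], words[i + 1]
        out.append(w1 & 0o377)                # first byte
        out.append(w2 & 0o377)                # second byte
        # The third byte is rebuilt from the high 4 bits of each word.
        out.append((((w1 >> 8) & 0o17) << 4) | ((w2 >> 8) & 0o17))
    return bytes(out)
```

Unpacking one 256 word block with unpack_os8 yields 384 bytes, which
matches the block capacity given above.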




Send corrections/additions to the FAQ Maintainer:
jones@cs.uiowa.edu (Douglas W. Jones)

Last Update July 09 2008 @ 00:13 AM

© 2008 FAQS.ORG. All rights reserved.