american standard code for information interchange

from The Free On-line Dictionary of Computing (8 July 2008)
American Standard Code for Information Interchange
ASCII

   The basis of character sets used in almost all present-day
   computers.  {US-ASCII} uses only the lower seven {bits}
   ({character points} 0 to 127) to convey some {control codes},
   space, digits, most basic punctuation, and unaccented letters
   a-z and A-Z.  More modern coded character sets (e.g.,
   {Latin-1}, {Unicode}) define extensions to ASCII for values
   above 127 for conveying special Latin characters (like
   accented characters, or the German ess-zett), characters from
   non-Latin writing systems (e.g., Cyrillic, or {Han
   characters}), and such desirable {glyphs} as distinct open-
   and close-quotation marks.  ASCII displaced other codes such
   as {EBCDIC} (eight bits per character) and {Baudot} (five
   bits), which were each {broken} in their own way.
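
   As a rough illustration of the seven-bit range and of how later
   character sets extend it, here is a minimal Python sketch.  The
   function name and the sample word are assumptions made for this
   example, not part of any standard.

```python
# Sketch: US-ASCII covers code points 0-127; Latin-1 and Unicode keep
# those values unchanged and add characters above 127.

def is_us_ascii(text: str) -> bool:
    """Return True if every code point fits in seven bits (0-127)."""
    return all(ord(ch) <= 0x7F for ch in text)

# Hypothetical sample: "Strasse" written with the German ess-zett
# (code point 223, which lies outside the US-ASCII range).
sample = "Stra\u00dfe"

plain_ok = is_us_ascii("Strasse")    # only unaccented letters
sample_ok = is_us_ascii(sample)      # contains a value above 127

latin1_bytes = sample.encode("latin-1")  # one byte per character
utf8_bytes = sample.encode("utf-8")      # ess-zett needs two bytes
```

   Text that stays within 0-127 encodes to the same bytes in ASCII,
   Latin-1 and UTF-8, which is the sense in which the later sets are
   extensions of ASCII.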

   Computers are much pickier about spelling than humans; thus,
   hackers need to be very precise when talking about characters,
   and have developed a considerable amount of verbal shorthand
   for them.  Every character has one or more names - some
   formal, some concise, some silly.

   Individual characters are listed in this dictionary with
   alternative names from revision 2.3 of the {Usenet} ASCII
   pronunciation guide in rough order of popularity, including
   their official {ITU-T} names and the particularly silly names
   introduced by {INTERCAL}.

   See {V}, {ampersand}, {asterisk}, {back quote}, {backslash},
   {caret}, {colon}, {comma}, {commercial at}, {control-C},
   {dollar}, {dot}, {double quote}, {equals}, {exclamation mark},
   {greater than}, {hash}, {left bracket}, {left parenthesis},
   {less than}, {minus}, {oblique stroke}, {parentheses},
   {percent}, {plus}, {question mark}, {right brace}, {right
   bracket}, {right parenthesis}, {semicolon}, {single quote},
   {space}, {tilde}, {underscore}, {vertical bar}, {zero}.

   Some other common usages cause odd overlaps.  The "#", "$",
   ">", and "&" characters, for example, are all pronounced "hex"
   in different communities because various assemblers use them
   as a prefix tag for {hexadecimal} constants (in particular,
   "#" in many assembler-programming cultures, "$" in the {6502}
   world, ">" at {Texas Instruments}, and "&" on the {BBC Micro},
   {Acorn Archimedes}, {Sinclair}, and some {Zilog Z80}
   machines).  See also {splat}.
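
   The prefix convention can be sketched in a few lines of Python.
   The helper below is hypothetical: it does not model any particular
   assembler, and simply treats each of the four characters named
   above as a tag introducing a hexadecimal constant.

```python
import string

# Prefix characters taken from the examples in this entry.
HEX_PREFIXES = ("#", "$", ">", "&")

def parse_hex_constant(token: str) -> int:
    """Parse a constant such as "$FF" or "&C000" into an integer."""
    if len(token) < 2 or token[0] not in HEX_PREFIXES:
        raise ValueError(f"not a prefixed hex constant: {token!r}")
    digits = token[1:]
    # Reject anything that is not a plain run of hexadecimal digits.
    if not all(ch in string.hexdigits for ch in digits):
        raise ValueError(f"invalid hex digits in {token!r}")
    return int(digits, 16)

values = [parse_hex_constant(t) for t in ("$FF", "#10", ">1A", "&C000")]
```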

   The inability of {US-ASCII} to correctly represent nearly any
   language other than English became an obvious and intolerable
   {misfeature} as computer use outside the US and UK became the
   rule rather than the exception (see {software rot}).  And so
   national extensions to US-ASCII were developed, such as
   Latin-1.

   Hardware and software from the US still tend to embody the
   assumption that US-ASCII is the universal character set and
   that words of text consist entirely of byte values 65-90 and
   97-122 (A-Z and a-z); this is a major irritant to people who
   want to use a character set suited to their own languages.
   Perversely, though, efforts to solve this problem by
   proliferating sets of national characters produced an
   evolutionary pressure (especially in protocol design, e.g.,
   the {URL} standard) to stick to {US-ASCII} as a subset common
   to all those in use, and therefore to stick to English as the
   language encodable with the common subset of all the ASCII
   dialects.  This basic problem with having a multiplicity of
   national character sets ended up being a prime justification
   for {Unicode}, which was designed, ostensibly, to be the *one*
   ASCII extension anyone will need.
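
   The assumption that words consist only of byte values 65-90 and
   97-122 is easy to reproduce.  The following Python sketch is
   illustrative only; the function names and the sample text are
   invented for this example.

```python
import re

# The US-ASCII assumption: a "word" is a run of A-Z or a-z.
ASCII_WORD = re.compile(r"[A-Za-z]+")

def ascii_words(text: str) -> list:
    """Split text into words using only the 52 unaccented letters."""
    return ASCII_WORD.findall(text)

def unicode_words(text: str) -> list:
    """Split text into words, accepting letters from any script."""
    words, current = [], []
    for ch in text:
        if ch.isalpha():          # true for accented letters too
            current.append(ch)
        elif current:
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))
    return words

# Hypothetical sample containing i-diaeresis and e-acute.
text = "na\u00efve caf\u00e9"
broken = ascii_words(text)     # accented letters split or drop out
intact = unicode_words(text)   # both words survive whole
```

   With the ASCII-only pattern the two words come back as three
   fragments, which is the kind of irritant described above.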

   A system is described as "{eight-bit clean}" if it doesn't
   mangle text with byte values above 127, as some older systems
   did.
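
   One way such mangling can happen is for a channel to clear the
   top bit of every byte.  The Python sketch below simulates that
   case; the function names are assumptions for illustration, and
   real systems may damage eight-bit data in other ways.

```python
def strip_high_bit(data: bytes) -> bytes:
    """Simulate a seven-bit channel that clears bit 7 of each byte."""
    return bytes(b & 0x7F for b in data)

def passes_unchanged(channel, data: bytes) -> bool:
    """Check whether a channel returns the data exactly as given."""
    return channel(data) == data

# Hypothetical sample: "cafe" with e-acute, encoded as Latin-1 so the
# last byte is 0xE9 (above 127).
original = "caf\u00e9".encode("latin-1")
mangled = strip_high_bit(original)   # 0xE9 & 0x7F == 0x69, i.e. "i"

clean_for_ascii = passes_unchanged(strip_high_bit, b"cafe")
clean_for_latin1 = passes_unchanged(strip_high_bit, original)
```

   Pure US-ASCII text passes through such a channel untouched, which
   is why the damage often went unnoticed by English-only users.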

   See also {ASCII character table}, {Yu-Shiang Whole Fish}.

   (1995-03-06)