Why are the letters not consecutive in EBCDIC?

The ASCII standard was adopted in 1963, and today hardly anyone uses an encoding whose first 128 characters differ from ASCII. Until the end of the last century, however, EBCDIC was in active use: it was the standard encoding for IBM mainframes and for their Soviet clones, the ES computers. EBCDIC remains the default encoding in z/OS, the standard operating system for modern IBM Z mainframes.

What immediately catches the eye in EBCDIC is that the letters are not contiguous: unused positions were left between I and J and between R and S (on ES computers, Cyrillic characters were assigned to these gaps). Who would think of encoding letters with unequal gaps between adjacent letters?
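The gaps are easy to check. The following is a minimal sketch that assumes Python's built-in `cp037` codec as a stand-in for "EBCDIC" (it is one of many EBCDIC code pages, and in it the skipped positions are occupied by other characters rather than left unused). The helper names `code` and `gaps` are made up for illustration.

```python
# Compare letter positions in ASCII and in one EBCDIC code page.
# Assumption: the "cp037" codec stands in for EBCDIC here.

def code(ch, encoding):
    """Return the single-byte code of ch in the given encoding."""
    return ch.encode(encoding)[0]

def gaps(encoding):
    """Count the codes skipped between each pair of adjacent letters."""
    letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    skipped = {}
    for a, b in zip(letters, letters[1:]):
        n = code(b, encoding) - code(a, encoding) - 1
        if n:
            skipped[a + b] = n
    return skipped

print(gaps("ascii"))   # {}
print(gaps("cp037"))   # {'IJ': 7, 'RS': 8}
```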


The very name EBCDIC ("Extended BCDIC") hints that this encoding, unlike ASCII, was not created from scratch but was based on the six-bit BCDIC encoding, in use since the IBM 704 (1954).


There is no direct backward compatibility: a convenient feature of BCDIC, lost in the transition to EBCDIC, was that the digits 0-9 had the codes 0-9. However, the gaps of seven codes between I and J and eight codes between R and S were already present in BCDIC. Where did they come from?

The history of (E)BCDIC begins together with the history of IBM, long before electronic computers. IBM was formed by the merger of four companies, the most technologically advanced of which was the Tabulating Machine Company, founded in 1896 by Herman Hollerith, the inventor of the tabulator. The first tabulators simply counted the number of cards punched in a particular position; but in 1905 Hollerith began producing decimal tabulators. Each card for a decimal tabulator consisted of fields of arbitrary length, and the numbers recorded in these fields in ordinary decimal form were summed over the whole deck. The division of the card into fields was set by connecting wires on the tabulator's patch panel. For example, a Hollerith punch card held in the Library of Congress apparently has the number 23456789012345678 punched in it; how it is divided into fields is unknown.


The most attentive readers may notice that Hollerith's card has 12 rows for holes, although ten would suffice for digits; and that in BCDIC, only 12 of the 16 possible codes are used for each value of the two most significant bits.

Of course, this is no coincidence. Hollerith originally intended the extra rows for "special marks" that were not summed but simply counted, as in the very first tabulators. (Today we would call them "bit fields".) In addition, some of the "special marks" could serve as group indicators: if tabulation required not only final totals but also subtotals, the tabulator would stop when it detected a change in any of the group indicators, and the operator had to copy the subtotals from the counters onto paper, reset the counters, and resume tabulation. For example, when balance sheets were calculated, a group of cards could correspond to one date or one counterparty.

By 1920, when Hollerith had already retired, "printing tabulators" came into use; connected to a teletype, they could print subtotals themselves without operator intervention. The difficulty now was determining what each printed number referred to. In 1931, IBM decided to designate letters using the "special marks": a mark in row 12 denoted a letter from A to I, a mark in row 11 a letter from J to R, and a mark in row 0 a letter from S to Z. The new "alphabetic tabulator" could print the name of each group of cards along with its subtotals; an unpunched column became a space between characters. Note that S is denoted by the hole combination 0+2; the combination 0+1 was originally not used, for fear that two adjacent holes in the same column could cause mechanical problems in the reader.
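The 1931 letter scheme can be written down as a small function. This is only a sketch: the name `letter_punches` is made up for illustration, and it covers just the 26 uppercase Latin letters.

```python
def letter_punches(letter):
    """Return (zone_row, digit_row) punched for one uppercase Latin letter."""
    if len(letter) != 1:
        raise ValueError("expected a single character")
    if "A" <= letter <= "I":
        return 12, ord(letter) - ord("A") + 1   # 12+1 .. 12+9
    if "J" <= letter <= "R":
        return 11, ord(letter) - ord("J") + 1   # 11+1 .. 11+9
    if "S" <= letter <= "Z":
        return 0, ord(letter) - ord("S") + 2    # 0+2 .. 0+9; 0+1 is skipped
    raise ValueError("not an uppercase Latin letter: %r" % letter)

print(letter_punches("A"))   # (12, 1)
print(letter_punches("R"))   # (11, 9)
print(letter_punches("S"))   # (0, 2)
```

Because row 0 starts at digit 2 rather than 1, eight letters (S to Z) fit in the last group instead of nine.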


Now you can look at the BCDIC table from a slightly different angle:


Except that 0 and space are swapped, the upper two bits identify the "special mark" that has been punched into cards for the corresponding character since 1931, and the lower four bits give the digit punched in the main part of the card. Support for the symbols &, - and / was added to IBM tabulators in the 1930s, and the BCDIC encoding of these characters matches the hole combinations punched for them. When support for even more characters was needed, row 8 was punched as an additional "special mark", so a single column could contain up to three holes. This punched card format remained virtually unchanged until the end of the century. In the USSR, the IBM encodings for Latin letters and punctuation were retained, while Cyrillic letters were punched with several "special marks" at once in rows 12, 11 and 0, without the limit of three holes per column.
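The rule just described fits in a few lines. The sketch below handles a column with at most one zone punch and one digit punch. The assignment of zone bits (01 for row 12, 10 for row 11, 11 for row 0) is an assumption inferred from the gap sizes quoted earlier, not taken from an IBM manual, and the name `bcdic_code` is hypothetical.

```python
# Sketch of the card-to-BCDIC rule for a single column.
# Assumption: zone bits 01, 10, 11 stand for rows 12, 11, 0. This order is
# inferred from the gap sizes quoted in the article, not from an IBM manual.
ZONE_BITS = {None: 0b00, 12: 0b01, 11: 0b10, 0: 0b11}

def bcdic_code(zone=None, digit=0):
    """Six-bit code for a column with an optional zone punch (12, 11 or 0)
    and an optional digit punch (1-9; 0 means no digit punch)."""
    code = (ZONE_BITS[zone] << 4) | digit
    # The one exception: a lone 0 punch (the digit zero) and a blank column
    # (space) trade places, so that the digit 0 gets code 0.
    if code == 0b110000:
        return 0b000000
    if code == 0b000000:
        return 0b110000
    return code

i, j = bcdic_code(12, 9), bcdic_code(11, 1)     # letters I and J
r, s = bcdic_code(11, 9), bcdic_code(0, 2)      # letters R and S
print(j - i - 1, s - r - 1)                     # 7 8
print(bcdic_code(zone=0), bcdic_code(digit=7))  # 0 7
```

Under this assumption the seven-code and eight-code gaps fall out directly: each zone leaves digit values 10 to 15 and 0 unused by letters, and the row 0 group additionally skips 0+1.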

When the IBM 704 computer was created, its designers did not deliberate long over a character encoding: they took the encoding already used on punched cards and merely "put 0 in its place". In 1964, in the move from BCDIC to EBCDIC, the low four bits of each character were left unchanged, although the high bits were shuffled somewhat. Thus the punched card format chosen by Hollerith at the beginning of the last century influenced the architecture of all IBM computers, up to and including IBM Z.
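For letters and digits, the "shuffle" of the high bits can be illustrated as a four-entry lookup. This is a sketch under two assumptions: the BCDIC sample codes use the zone-bit order 01, 10, 11 for rows 12, 11, 0 (inferred from the gap sizes, not from an IBM manual), and the EBCDIC side is checked against Python's `cp037` codec. The table `EBCDIC_HIGH` and the function `bcdic_to_ebcdic` are hypothetical names and do not cover punctuation or space.

```python
# Assumption: BCDIC zone bits 00/01/10/11 become EBCDIC high nibbles F/C/D/E.
# The table is read off the digit and letter positions in code page 037.
EBCDIC_HIGH = {0b00: 0xF0, 0b01: 0xC0, 0b10: 0xD0, 0b11: 0xE0}

def bcdic_to_ebcdic(code):
    """Map a six-bit BCDIC letter or digit code to its EBCDIC byte:
    the low four bits are kept, only the zone bits are replaced."""
    return EBCDIC_HIGH[code >> 4] | (code & 0x0F)

# Assumed BCDIC codes for 0, 9, A, I, J, R, S, Z (see the assumptions above).
samples = [0x00, 0x09, 0x11, 0x19, 0x21, 0x29, 0x32, 0x39]
decoded = bytes(bcdic_to_ebcdic(c) for c in samples).decode("cp037")
print(decoded)   # 09AIJRSZ
```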

Source: habr.com
