no, spellchecker, please don't auto-correct "ebcdic" to "bodice"
WHAT EVEN IS THE DIFFERENCE?
EBCDIC was devised by IBM in the early 60s; it's a family of character encodings used on their System/360 mainframes and descendants. It's based on earlier punched-card encodings and is notable for being one of the only encodings whose alphabet isn't contiguous
because having ABC at 0xC1-0xC3 is fine, but to go from I to J, you have to jump from 0xC9 to 0xD1.
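(if you want to see those gaps yourself: Python happens to ship a cp037 codec, which is one common EBCDIC code page. Picking that particular variant is my assumption, other EBCDIC pages shuffle things around a bit:)

```python
# print the EBCDIC code for each letter, using Python's built-in cp037
# codec (one common EBCDIC code page; other variants differ in spots)
for letter in "ABCDEFGHIJKLMNOPQRSTUVWXYZ":
    code = letter.encode("cp037")[0]
    print(f"{letter} = 0x{code:02X}")
# A-I land at 0xC1-0xC9, then J jumps to 0xD1, and S jumps again to 0xE2
```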
The interesting thing about the timing of EBCDIC was that it was designed at the same time as ASCII.
EBCDIC didn't come out until 1963, but work on ASCII started in 1960, and IBM had four staff on the 21-member committee that designed ASCII.
but ASCII wasn't finalized until 1963, and IBM didn't have time to be releasing new hardware in 1963 based on a standard that was only being finished that same year.
So they built their own, separate, encoding.
using the insanity that is EBCDIC didn't seem to stop the System/360 from being one of the most successful mainframe systems of all time, being heavily used by minor government projects like NASA's Apollo program.
Yes, we landed people on the moon using a character encoding that didn't even understand how to lay out the alphabet.
in fairness to EBCDIC, the amazing success of the System/360 is probably directly responsible for the still-going-strong-after-57-years hacker hate for EBCDIC. That system was so popular and so long-lasting that we'll never be free of EBCDIC, even in a world that has moved on
after all, IBM is still selling servers TODAY that you can buy, load an old System/360 program onto, and it'll still work.
It'll probably be running inside like 4 layers of VMs, but it'll run.
and if you, like everyone, are wondering "why the heck is it laid out like that?" the answer has to do with the "BCD" part of "EBCDIC": Extended Binary Coded Decimal Interchange Code
so Binary Coded Decimal is a way to represent numbers in computers that doesn't use the traditional binary method: instead, it puts a single decimal digit into each 4-bit nibble.
the big difference is when you go above 9.
Instead of just using the binary for 10 (1010), you go to two BCD digits:
0001 and 0000.
which would be "16" if you treated it as an 8-bit binary number, but since you're treating it as separate 4-bit nibbles, it's a 1 and a 0, so it looks like "10"
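(here's a minimal sketch of that packing in Python, assuming plain unsigned packed BCD with one digit per nibble; to_bcd is just a name I made up:)

```python
def to_bcd(n: int) -> int:
    """Pack a non-negative decimal number into BCD, one digit per nibble."""
    result, shift = 0, 0
    while True:
        n, digit = divmod(n, 10)
        result |= digit << shift  # drop this decimal digit into its nibble
        shift += 4
        if n == 0:
            return result

print(f"{to_bcd(10):08b}")  # 00010000 -- a 1 nibble followed by a 0 nibble
print(hex(to_bcd(1959)))    # 0x1959 -- BCD reads like decimal when dumped in hex
```

that last line is the whole appeal: dump BCD in hex and it reads exactly like the decimal number.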
basically BCD was used (and is still used, somewhat) for numbers that have to interact with humans: it's very easy to convert decimal numbers to BCD, and vice versa.
high scores in video games, for example, are a common use for BCD, which is why you see them maxing out at 999 instead of 255.
So, BCD is a sort of middle-ground between how computers think of numbers (in binary) and how humans think of numbers (in decimal): so naturally BCD is going to show up in the input/output parts of computers... as that's where the computer/human interface exists.
So, for example, the IBM 704, from 1954... You programmed these with punch cards, but naturally you have to encode text and numbers into those punch cards so the computer knows what the heck you mean.
well, the IBM 704 used a character set laid out like this.
So the BCD 0-9 are just 00 through 09, then for the alphabet, you have A as 11, B as 12, C as 13, etc...
the weirdness of course is when you get to I at 19. in hex your next number would be "1A", but instead it jumps to 21.
20 is skipped because of... reasons. zero is kinda weird on a punch card, so they wanted to avoid it, but it also makes sense if you think of these letters as being laid out on a 3x9 grid.
imagine you have this grid before you.
A is row 1, column 1: 11. B is row 1, column 2: 12.
J is row 2, column 1: 21.
and so on.
basically this encoding makes sense if you ever have to punch it out manually: you can't expect everyone who does that to understand binary (at least not more than 4-bit binary), or 0-based counting, and things like that.
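(a toy model of that grid scheme, to make the "row digit, column digit" idea concrete. rows 1 and 2 are straight from the description above; what actually sat in the first slot of row 3 in the real tables is a detail I'm glossing over, so treat that as a placeholder:)

```python
# the character code is literally "row digit, then column digit", so a
# human punching cards can read it straight off the grid
ROWS = ["ABCDEFGHI", "JKLMNOPQR", "?STUVWXYZ"]  # '?' = placeholder for row 3, col 1

def grid_code(ch: str) -> int:
    for row, letters in enumerate(ROWS, start=1):
        col = letters.find(ch) + 1  # columns are 1-based, no zeros anywhere
        if col:
            return row * 10 + col
    raise ValueError(f"{ch!r} is not on the grid")

print(grid_code("A"))  # 11
print(grid_code("I"))  # 19
print(grid_code("J"))  # 21 -- straight to the next row, no 20, no "1A"
```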
so since the late 20s, IBM had been using BCD-encoded schemes like this.
(This one is from the 50s, and there are good reasons I started there...)
EBCDIC just took those existing encodings they'd been using for decades at that point, and embedded them inside a more modern 8-bit character encoding.
So while it seems deeply weird and unusual, it's rather sensible for the time. It has a lot of history behind it.
It just happens that it was designed at the exact moment when ASCII was designed, and basically everything since then jumped on the ASCII train
so it's a weird remnant of how things were designed in the pre-ASCII era, and it hung around unexpectedly long while pretty much everything else instead built on ASCII and its descendants (like 8-bit codepages, shift-jis, and unicode)
But now that I've explained the more sensible IBM 704 encoding, let's go back to the 48-character BCDIC.
This grew out of a 37-character version from 1928, just adding new characters.
But you'll notice a fun thing about the layout: IT'S BACKWARDS BY ROWS!
even EBCDIC isn't so mad that the code for A (31) comes AFTER the code for R (29)
anyway the moral of the story is that if you ever get mad at unicode for not being consistently encoded (UTF-8? UTF-16? which UTF-16? is it really UCS-2? Which one?), just remember, it could always be worse
UTF-8: it could always be worse (UCS-2)
UCS-2: it could always be worse (8-bit codepages)
8-bit codepages: it could always be worse (EBCDIC)
EBCDIC: it could always be worse (48-character BCDIC)
and I guess that can always be worse, because the original baudot codes (and yes, there are multiple mutually incompatible variants!) weren't laid out in any kind of binary order at all: they were keyed in directly by hand, and the layout was chosen to minimize hand fatigue
and it's just gonna be used by trained operators with special keyboards, so WHY NOT make them memorize all these arbitrary bit-patterns?
And you know what we call that?
I mean, besides being unnecessarily obtuse and complicated?

JOB SECURITY!
BTW, just to prove that this was fractally insane? those 5 baudot keys have numbers, 1-5.

They're laid out with keys 1-3 under the right hand and keys 4-5 under the left, which is why a real baudot signalling guide shows 1-3 on the right and 4-5 on the left.
A guide written in plain 1-5 order is way harder to read and translate into actual key presses.
Also note that it's modal, like ISO-2022-JP: a given symbol (01000) doesn't mean anything unless you know if you're in figure mode or letter mode.
With figure mode, it's a 2. With letter mode, it's an E.
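(a tiny sketch of what modal decoding means, using only the one code point from above; the two shift codes are made-up placeholders, since the real shift patterns differ between baudot variants:)

```python
LETTERS, FIGURES = "letters", "figures"
TABLE = {
    0b01000: {LETTERS: "E", FIGURES: "2"},  # the one symbol from above
}
SHIFT_TO = {0b11111: LETTERS, 0b11011: FIGURES}  # placeholder shift codes

def decode(codes):
    mode = LETTERS  # you have to assume SOME starting mode
    out = []
    for code in codes:
        if code in SHIFT_TO:
            mode = SHIFT_TO[code]  # mode changes, no character emitted
        else:
            out.append(TABLE[code][mode])
    return "".join(out)

print(decode([0b01000, 0b11011, 0b01000]))  # 'E2' -- same bits, two meanings
```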
I really need to take apart a cheap kid's electronic piano and build my own baudot keyboard.

It's not really any sillier than my binary keyboard or my unary keyboard or my analog dial keyboard