
Decoding the Encodings (01) — The Beginning!

In software engineering, we often encounter terms like Unicode, UTF-8, Base64, ISO-8859-1, and ASCII. Somehow, it's tough to find a clear explanation of them.

In this four-post series, I'll try to provide a "simplified" explanation of what these terms are and an over-simplified history of how we got to where we are today.

Background

Humans rely on graphical characters to represent data and operations. For example, language letters (A, a, B, b, C, c, etc.), numbers (1, 2, 3, etc.), operations (+, -, ÷, etc.), actions (carriage return, etc.), and let's not forget the emojis (😀, 🥸, etc.), and much more.

However, computers can’t understand these characters. Computer processors are made up of logic gates having only two states, ON and OFF.

Coincidentally (not really), the binary number system consists of only two symbols, 1 and 0, which can be interpreted as the ON and OFF states of logic gates. Therefore, it's easy to feed data and instructions to computers as binary numbers.

But the problem is that humans are more comfortable working with decimal than with binary (our biology, ten fingers, made base 10 the natural choice). Thanks to maths, any binary number can be converted to decimal, octal, hexadecimal, or any other base.
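To make the base-conversion point concrete, here is a minimal Python sketch showing the same value written in the four common bases:

```python
# One value, four notations: decimal, binary, octal, hexadecimal.
n = 65                          # decimal, how humans usually write it
assert bin(n) == "0b1000001"    # binary, how the logic gates "see" it
assert oct(n) == "0o101"        # octal
assert hex(n) == "0x41"         # hexadecimal

# And back again: int() can parse a string in any base.
assert int("1000001", 2) == 65
```

Nothing is lost in any direction; the bases are just different spellings of the same number.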

With all this in mind, early computer scientists created the concept of the Character Set. They mapped human-understandable graphical characters (and some actions) to predefined decimal values, which in turn are converted to binary for computers.

Earlier Encodings and limited Characters

America & computers

In the early-to-mid 20th century, computers were used in very few nations. Initial advancements in computing happened in Europe, mostly in Germany and the UK. But during the distressful times of the world wars, America was far from the warzones and not far behind technologically. So, many great scientists and scholars moved there. As a result, the majority of the foundational work on computers happened in America.

Latin-script-alphabets

English is the de facto language of America. It uses the Latin-script alphabet, and so do many, if not all, other western European languages. Therefore, the earliest character sets were developed only for Latin-script alphabets, because until the last quarter of the 20th century, the non-European, non-American world was considered irrelevant.

  • Let's skip the earlier history. We do not want hundreds of pages in this post.
  • EBCDIC was one of the earliest relevant 8-bit character encodings, developed by IBM and mainly used on IBM mainframes.
  • In the early days, different manufacturers were accustomed to using their own character sets and encodings, which resulted in trouble communicating among different devices.
  • To avoid this, in 1963 the American Standards Association formalized the American Standard Code for Information Interchange (ASCII). It uses 7 bits to map the numbers 0 to 127 to 95 printable characters and 33 control codes, all shown in the picture below.

Fun Fact —
ASCII was formalized in 1963, but it wasn't widely adopted until IBM incorporated it in 1981 with the release of its first personal computer.

(ASCII table, from commons.wikimedia.org)
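For a quick taste of that table without the picture, here is a minimal Python sketch poking at a few ASCII code points:

```python
# A few entries from the ASCII table: printable characters and control codes.
assert ord('A') == 65 and chr(65) == 'A'   # code point 65 is 'A', and back
assert ord('a') == 97      # lowercase letters start 32 positions later
assert ord('0') == 48      # the digits occupy code points 48-57
assert ord('\n') == 10     # "line feed", one of the 33 control codes

# Every ASCII value fits in 7 bits, i.e. is below 128.
assert all(ord(c) < 128 for c in "Hello, ASCII!")
```

The 32-position gap between 'A' and 'a' is why flipping a single bit toggles the case of an ASCII letter.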

Note from Forgotten History — 
Soviet engineer Vladimir Lukyanov built the world's first computer that could solve partial differential equations, as early as 1936. But due to Stalin's hostile attitude toward computing, there were few inventions in the Stalin era, and the Soviet Union was left behind in the computer race. Still, GOST-10859 was a Soviet character set used in those early days.

Confusion in terminology

Encoding vs CharacterSet

The confusion around these terms arises because ASCII, which was used for a long time, was both a character set and an encoding scheme, since it has a one-to-one mapping from binary values to characters/control codes.
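A minimal Python sketch of that one-to-one property: with ASCII, each character maps to exactly one byte whose value is the character's code point, so the character set and the encoding coincide.

```python
text = "Hi!"
encoded = text.encode("ascii")   # character set + encoding in one step

# Each character became exactly one byte, holding its ASCII code point.
assert list(encoded) == [72, 105, 33]
assert encoded[0] == ord('H')

# And decoding reverses it losslessly.
assert encoded.decode("ascii") == text
```

With larger character sets this identity breaks down (one character may need several bytes), which is exactly where "character set" and "encoding" part ways.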

When character sets grew larger, these two concepts, character set and encoding (or decoding, which is also referred to as encoding, go figure), became two different mechanisms. More on that in a later post in this series.

That was all about the motivation behind character sets, early charsets, encodings, and ASCII. In the next post, we'll see what happened when computers reached a bigger audience.
