Decoding the Encodings (2) — Bringing the world together!

In the previous post, we saw how ASCII evolved and how useful it was for communication among different devices. With ASCII, the same decimal numbers 0-127 were mapped to the same printable characters and control codes, across all computers and peripherals, creating uniformity.

The Problem

ASCII was powerful, but it was limited to English. It would have been enough if Britain had been the only colonial power in the world.

Unfortunately, western greed was prevalent in early modern history. Many other European countries, such as France, Spain, Portugal, the Netherlands, Germany, and a few more, claimed other parts of the world as colonies. As a result, French, Spanish, Portuguese, and a few other European languages became the lingua franca in large parts of the world.

On top of that, some countries, such as China, Japan, Iran, Russia and a few more, were either never colonized or never completely influenced by the West, and kept using their own languages: Mandarin, Japanese, Arabic, the Slavic languages, and so on.

So, once the whole world had been invited to the computer party, ASCII was no longer enough to communicate in all of its languages.

Earlier solutions

ASCII Variants

When computers spread into the world, standardization institutions of various regions or countries created their own variations of 7-bit ASCII to support their non-English languages.

Take the currency symbol, for example. The only currency character available in standard ASCII is “$” (36). The dollar was the world’s currency, so the symbol was needed everywhere, but other regions had currency symbols of their own. So regions started trading lesser-used characters for their currency symbols. The UK replaced “#” with “£”, Japan replaced “\” with “¥”, and Korea replaced “\” with “₩”, along with a few other changes, and the story went on.

On top of that, various private corporations also created their own versions of 7-bit ASCII to support their devices.

Remedying the pro-English bias created compatibility problems, because we were still dealing with a 7-bit character set and everyone was replacing lesser-used characters based on their own region or use case.
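To make the compatibility problem concrete, here is a minimal sketch of the currency swaps described above. The table `NATIONAL_SWAPS` and the helper `decode_7bit` are hypothetical names made up for this illustration, and the table models only the three swaps mentioned in this post (real national variants changed more positions than this).

```python
# Simplified, hypothetical tables: only the swaps mentioned in the post
# are modelled. Real national 7-bit variants differed in more positions.
NATIONAL_SWAPS = {
    "US": {},               # standard ASCII, nothing replaced
    "UK": {0x23: "£"},      # "#" (35) traded for the pound sign
    "JP": {0x5C: "¥"},      # "\" (92) traded for the yen sign
    "KR": {0x5C: "₩"},      # "\" (92) traded for the won sign
}


def decode_7bit(data: bytes, region: str) -> str:
    """Decode 7-bit codes using the (simplified) table of one region."""
    swaps = NATIONAL_SWAPS[region]
    return "".join(swaps.get(code, chr(code)) for code in data)


# The very same three codes (35, 92, 36) read on four different machines.
message = bytes([0x23, 0x5C, 0x24])
for region in NATIONAL_SWAPS:
    print(region, decode_7bit(message, region))
```

The bytes never change; only the table on the receiving side does. That is why a price typed in one country could show up with a different symbol in another.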

A better opportunity

Birth of Extended-ASCII

In the 1970s, computers and peripherals standardized on 8-bit bytes. Each byte could now represent an 8-digit binary number, ranging from decimal 0 to 255. Since each number can be mapped to one character, computers could now handle text that uses 256-character sets.

ASCII used only 7 bits, equivalent to decimal numbers 0 to 127, so the remaining 128 slots from 128 to 255 became available for mapping to new characters.

So unlike the earlier solutions, institutions and corporations no longer had to replace existing ASCII mappings. Preserving the existing character set, everyone added their own characters in the remaining 128 available slots. This was called Extended ASCII.
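The arithmetic is easy to check. The sketch below counts the slots that the eighth bit opened up, and confirms that a few 8-bit character sets shipped with Python (`latin-1`, `cp437` and `cp1252`, picked here only as examples) leave the ASCII half untouched.

```python
ASCII_SLOTS = 2 ** 7   # 7 bits -> codes 0..127
BYTE_SLOTS = 2 ** 8    # 8 bits -> codes 0..255

# The codes that became available once the eighth bit was usable.
extra_slots = range(ASCII_SLOTS, BYTE_SLOTS)
print(len(extra_slots), extra_slots[0], extra_slots[-1])

# In these 8-bit sets, the lower half decodes exactly like plain ASCII.
ascii_bytes = bytes(range(ASCII_SLOTS))
ascii_preserved = {
    codec: ascii_bytes.decode(codec) == ascii_bytes.decode("ascii")
    for codec in ("latin-1", "cp437", "cp1252")
}
print(ascii_preserved)
```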

NOTE: Some regions went even further and defined character sets of their own, such as CCCII, developed in Taiwan for the Chinese language, and Japanese character sets like JIS X 0201.

Still a chaos

Too many versions

This opened the wrong door. Each region started coming up with its own extended character set to fill the remaining 128 slots from 128 to 255. This resulted in many different Extended-ASCII character sets.

Extended ASCII couldn’t solve the interoperability or compatibility problems, because it became challenging to keep track of the variants. There were simply too many Extended-ASCII sets, and nothing limited the creation of new ones.
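Here is a small sketch of what that looked like in practice. A sender writes “café” using one extended set, and receivers who assume a different set read the same bytes. The four code pages used here (`cp1252`, `cp437`, `mac_roman`, `koi8_r`) are just examples that ship with Python, not a complete list of what was in use.

```python
text = "café"

# The sender encodes with one extended character set...
sent = text.encode("cp1252")

# ...and each receiver decodes the same bytes with its own assumption.
received = {
    codec: sent.decode(codec)
    for codec in ("cp1252", "cp437", "mac_roman", "koi8_r")
}

for codec, result in received.items():
    status = "ok" if result == text else "garbled"
    print(f"{codec:10} {result!r:10} {status}")
```

The first three letters survive everywhere because they live in the shared ASCII half. Only the last byte, which sits above 127, depends on which table the receiver picked.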

Curbing the chaos

ISO 8859-N: Extended ASCII

Eventually, in the 1980s, the International Organization for Standardization (ISO) stepped in and did what it always does. It created a few standard eight-bit ASCII extensions for non-English languages, called ISO 8859-N.

Locations 128 to 159 are mapped to control characters in every ISO 8859-N set. The remaining 96 locations, from 160 to 255, hold different characters in different ISO 8859-N versions.
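This layout can be checked with Python’s `iso8859_1` codec and the `unicodedata` module. This is only a sketch; `category_of` is a helper name made up for this example. The category `"Cc"` is the Unicode general category for control characters.

```python
import unicodedata


def category_of(byte_value: int) -> str:
    """Unicode general category of one byte decoded as ISO-8859-1."""
    return unicodedata.category(bytes([byte_value]).decode("iso8859_1"))


control_range = range(128, 160)   # 32 locations
variable_range = range(160, 256)  # 96 locations

# Every location in 128..159 should decode to a control character,
# and none of the locations in 160..255 should.
all_controls = all(category_of(b) == "Cc" for b in control_range)
no_controls = all(category_of(b) != "Cc" for b in variable_range)

print(len(control_range), all_controls)
print(len(variable_range), no_controls)
```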

The first one of the bunch is ISO-8859-1. It contains enough characters for most Western European languages.

Over the course of the next 12 years, ISO tried to cover all known languages by creating more character sets. For example:

  • ISO-8859-2 for Central European languages
  • ISO-8859-3 for Southern European languages
  • ISO-8859-4 for North European languages
  • ISO-8859-5 for Cyrillic languages
  • ISO-8859-6 for Arabic
  • And so on until ISO-8859-16

Beginning of the end

Rich languages & End of ISO 8859-N

ISO 8859-N maps only 96 positions, from 160 to 255. Many languages in the world have more symbols than that. On top of that, a few East Asian languages with ideographic writing systems, such as Mandarin, Cantonese, Japanese and (historically) Vietnamese, require many thousands of symbols.

So even after agreeing on the compromise of switching between the 15 published ISO-8859-N character sets (numbered up to 16, with part 12 never completed), we were still unable to cover all major languages under the ISO standards scheme.
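The “switching” compromise, and where it breaks down, can be sketched with the ISO-8859 codecs that ship with Python. The helper `parts_that_can_encode` and the sample words are made up for this illustration.

```python
# Python ships codecs for the published parts; there is no part 12.
ISO_8859_PARTS = [
    f"iso8859_{n}"
    for n in (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 13, 14, 15, 16)
]


def parts_that_can_encode(text: str) -> list[str]:
    """Return every ISO-8859 part able to represent all of `text`."""
    usable = []
    for codec in ISO_8859_PARTS:
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character has no slot in this part
        usable.append(codec)
    return usable


for sample in ("café", "Привет", "مرحبا", "中文"):
    print(sample, parts_that_can_encode(sample))
```

A Russian word fits only the Cyrillic part, an Arabic word fits only the Arabic part, and the Chinese sample fits none of them. A single document mixing these scripts has no ISO-8859-N set to call home.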

Due to this, a better system for character sets was created (mentioned in the next paragraph). Some of the ongoing character-set development (like ISO-8859-12 for Devanagari) was abandoned and restarted in this new system.

Birth of the ultimate solution

Then came an invention from a company with a long history of inventions that helped shape the course of humanity: the one and only Xerox PARC. This time, the invention was UNICODE. More on it in the next post.

This was all about how character sets evolved from English-speaking America to the rest of the world. In the next post, we will discuss UNICODE, the ultimate solution for character sets.
