This article explores the basic concepts behind character encoding and then takes a dive deeper into the technical details of encoding systems.
If you have just a basic knowledge of character encoding and want to better understand the essentials, the differences between encoding systems, why we sometimes end up with nonsense text, and the principles behind different encoding system architecture, then read on.
Getting to understand character encoding in detail requires some extensive reading and a good chunk of time. I’ve tried to save you some of that effort by bringing it all together in one place while providing what I believe to be a pretty thorough background of the topic.
I’m going to go over how single-byte encodings (ASCII, Windows-1251 etc.) work, the history of how Unicode came to be, the Unicode-based encodings UTF-8, UTF-16 and how they differ, the specific features, compatibility, and lack thereof among various encodings, character encoding principles, and a practical guide to how characters are encoded and decoded.