Binary file primer


What are binary files?
All the files on your computer, programs as well as documents, are really large "lists" of numbers. Imagine a numbered list like this one:
0) 24
1) 33
2) 9
3) 0
4) 115
The numbering starts at zero. At each list index there is a number, for example the number 9 is at index "2". The index is usually called "offset". Translhextion displays the current offset in the lower left corner of it's window. Large files contain many numbers, small files not so many. The numbers in the list can have a value between 0 and 255, for a special reason: Each number is a "byte".

What are bits and bytes?
"Bit" is an abbreviation for "BInary digiT". It can have a value of either 0 or 1. One bit alone is not very useful, because already the number 2 can not be described with it. So you take many bits to describe large numbers. This is how it works:
The normal decimal system you know, with digits from 0 to 9, describes numbers larger than 9 by arranging several digits in such a way that each position in the number means something different.

For example the number "245" means really "2 times 100 + 4 times 10 + 5 times 1". The further left a digit's position in the number, the higher a value it contributes to the number. Every time you go one step left in the number, you multiply the digit there with 1, 10, 100, 1000 and so on and add that value to what you already have. The values you have to multiply the digit *with* can be obtained if you start out with 1 on the far right, then multiply with 10 every time you go left. So you get to multiply with 1, then with 1*10=10, then with 10*10=100, then 10*100=1000 and so on. This is our "base 10" number system.

Now the *binary system* is a "base 2"-system. A binary number might looks like this: 1011
This is not "one thousand and eleven", as it would be in the decimal system. In the decimal system we multiplied the digits at the various positions with a 1, 10, 10*10=100, 10*100=1000 and so on, but this time, in the binary system (also known as "dual system"), we multiply with 1, 2, 2*2=4, 2*4=8, 2*8=16 and so on, starting at the right and doing with a "2" what we did with a "10" before. So the value of the binary number above is "1 times 1 +1 times 2 + 0 times 4 + 1 times 8", giving us the number "11" in our decimal system.

Computers use bits to calculate everything, because they can be modeled as "current on" or "current off" easily. With a packet of 4 bits (called a "nibble") you can describe the numbers from 0 to 15. A packet of 8 bits is called a "byte", and *these* are what the file contains. Each byte can describe a number from 0 to 255.
So, a file is a list (or "array") of "bytes", with each byte at a certain index or "offset". These bytes are what Translhextion displays in the middle of the screen, but in yet another different number system: the "hexadecimal" system, which is a system with the base 16. So there are 16 digits, the normal ones from 0 to 9, and additionally the letters "a" through "f" with following values:

a = 10
b = 11
c = 12
d = 13
e = 14
f = 15
So the "hex" (short for "hexadecimal") number "f14a" means: "10 times 1 + 4 times 16 + 1 time 16*16(=256) + 15 times 16*256(=4096)", giving a value of 61770 in decimal. The point of using hex numbers is that values of up to 255 (maximum value of a byte) can be described by employing up to 2 hexadecimal digits, which gives a neat and ordered output on hex editors such as Translhextion.
On the right side of the Translhextion window there are "normal" letters, punctuation symbols and so on. This is because each byte can be seen as an index to a symbol in a "character set". This means that the computer stores for example the letter "a" as a number with the value 97 ("61" in hex notation), the comma "," as a number 44 and so on. So if you write a text on the computer and save it to disk, then the file will contain a number of bytes, each byte signifying a letter, other symbol, or a special code number for "End of line" or "Tabulator" and such. But there are several character sets in use (for example Windows uses either the ANSI or the OEM character set): In one character set the number 97 might mean the letter "a", but in another it could be used for the apostrophe.

Primer provided by Raihan Kibria