Data Representation For Computing

From Computing Concepts

Representing Data

Are computers mystery machines? We have all seen computers do seemingly miraculous things with all kinds of sounds, pictures, graphics, numbers, and text. It seems we can build a replica of parts of our world inside the computer. You might think that this amazing machine is also amazingly complicated, but it really is not.

In fact, all of the wonderful multimedia that we see on modern computers is constructed from simple ON/OFF switches - millions of them - but really nothing much more complicated than a switch. The trick is to take all of the real-world sound, picture, number, etc. data that we want in the computer and convert it into the kind of data that can be represented in switches, as shown in Figure 3.

Figure 3: Representing Real-World Data In The Computer


Computers Are Electronic Machines. The computer uses electricity, not mechanical parts, for its data processing and storage. Electricity is plentiful and moves very fast through wires, and electrical parts fail much less frequently than mechanical parts. The computer does have some mechanical parts, like its disk drive (which are often the source of computer failures), but the internal data processing and storage is electronic, which is fast and reliable (as long as the computer is plugged in).


Electricity can flow through switches: if the switch is closed, the electricity flows; if the switch is open, the electricity does not flow. To process real-world data in the computer, we need a way to represent the data in switches. Computers do this representation using a binary coding system.

Binary and Switches

Binary is a mathematical number system: a way of counting.

We have all learned to count using ten digits: 0-9. One probable reason is that we have ten fingers to represent numbers. The computer has switches to represent data, and switches have only two states: ON and OFF. Binary has two digits to do the counting: 0 and 1 - a natural fit to the two states of a switch (0 = OFF, 1 = ON). As you can read about in the part of this course on the history of computers, the evolution of how switches were built made computers faster, cheaper, and smaller. Originally, a switch was a vacuum tube, about the size of a human thumb. In 1947 the transistor was invented (and later won its inventors a Nobel Prize).

Figure 4: Silicon Chip
The transistor allowed a switch to be the size of a human fingernail. The development of integrated circuits, beginning around 1960, eventually allowed millions of transistors (shown in Figure 5) to be fabricated on a single silicon chip, as shown in Figure 4 - which put millions of switches on something the size of a fingernail.
Figure 5: A group of Transistors

Bits and Bytes

One binary digit (0 or 1) is referred to as a bit, which is short for binary digit. Thus, one bit can be implemented by one switch, as shown in Figure 6.

Bits can be grouped together into larger chunks to represent data. A 0 or 1 is one bit, 0110 is four bits, and 01101011 is eight bits. For several reasons that we do not go into here, computer designers use eight-bit chunks called bytes as the basic unit of data. A byte is implemented with eight switches, as shown in Figure 6.

Figure 6: Implementing (making) a Byte


Computer manufacturers express the capacity of memory and storage in terms of the number of bytes it can hold. The number of bytes can be expressed as kilobytes. Here, kilo represents 2 to the tenth power, or 1024. Kilobyte is abbreviated KB, or simply K. (Sometimes K is used casually to mean 1000, as in "I earned $30K last year.") A kilobyte is 1024 bytes. Thus, the size of a 640K file is 640 x 1024, or 655,360 bytes. Larger files may be expressed in terms of megabytes (1024 x 1024 bytes). One megabyte, abbreviated MB, means roughly one million bytes. With storage devices, manufacturers sometimes express capacity in terms of gigabytes (abbreviated GB); a gigabyte is roughly a billion bytes. Modern computer hard disks hold gigabytes (e.g. 300GB).
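The arithmetic above can be checked with a short sketch. It assumes the 1024-based meaning of kilo used in this section; the names KB, MB, GB, and kilobytes_to_bytes are illustrative only.

```python
# Storage units using the 1024-based convention described in the text.
KB = 2 ** 10        # 1024 bytes
MB = KB * KB        # 1024 x 1024 bytes
GB = KB * MB        # 1024 x 1024 x 1024 bytes


def kilobytes_to_bytes(kilobytes):
    """Return the number of bytes in the given number of kilobytes."""
    return kilobytes * KB


print(kilobytes_to_bytes(640))  # 655360
print(MB)                       # 1048576 (roughly one million)
print(GB)                       # 1073741824 (roughly one billion)
```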


Figure 7: Byte, Kilobyte, Megabyte, Gigabyte and Terabyte

Representing Data In Bytes

Here is an important thing to keep in mind: A single byte can represent many different kinds of data. What data it actually represents depends on how the computer uses the byte. For instance, the byte: 01000011 can represent

1. the integer 67, or

2. the character 'C', or

3. the 67th decibel level for a part of a sound, or

4. the 67th level of darkness for a dot in a picture, or

5. an instruction to the computer like "move to memory",

6. and other kinds of data too.

Integers

Since computers cannot think complexly (yet!), integer numbers are represented in a computer’s “mind” by counting in a number system called Binary. Binary looks like a series of the digits “1” and “0.” We saw above that four bits can look like “0110.”

Pause and think for a minute how humans count in decimal:

Figure 8: Decimal Counting

1. We start with zero and every new thing we count, we go to the next decimal digit - 0,1,2,3,4,5,6,7,8,9.

2. When we reach the end of the decimal digits, we use two digits to count by putting a digit in the "tens place" and then starting over again using our 10 digits.

3. Thus, the decimal number 10 is a 1 in the "tens place" and a zero in the "ones place".

4. Eleven is a 1 in the "tens place" and a 1 in the "ones place". And so on. If we need three digits, like 158, we use a third digit in the "hundred's place".


We do a similar thing to count in binary - except now we only have two digits, 0 and 1, instead of ten.

Figure 9: Binary Counting

1. We start with 0, then 1, then we run out of digits, right?

2. So we need to use the same two digits to keep counting.

3. We do this by putting a 1 in the "two's place" and then using our two digits again in the "one's place".

4. Thus, the number “two” appears as “10” in binary: a 1 in the "two's place" and a 0 in the "one's place".

5. The number “three” appears as “11” in binary: a 1 in the "two's place" and a 1 in the "one's place".

6. But wait! We ran out of digits again!

7. Thus, the number “four” appears in binary as “100”: a 1 in the "four's place", a 0 in the "two's place", and a 0 in the "one's place".

What "places" we use depends on the counting system. In the decimal system humans use, which we call Base 10, we use powers of 10.

Ten to the zero power is 1, so the counting starts in the "one's place".

Ten to the first power is 10, so the counting continues in the "ten's place".

Ten to the second power (10 squared) is 100, so we continue in the "hundred's place". And so on.

The Binary system computers use is called Base 2.

Thus, the "places" are two to the zero power ("one's place"), two to the first power ("two's place"), two to the second power ("four's place"), two to the third power ("eight's place"), and so on.

When you look at a byte, the rightmost bit is the "one's place". The next bit is the "two's place", then the "four's place", then the "eight's place", and so on.

So, when we said that the byte “01000011” in binary represents the decimal integer 67, we got that by adding up

a 1 in the "one's place",

a 1 in the "two's place", and

a 1 in the "64's place" (two to the sixth power is 64).

Add them up: 1+2+64 = 67.
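The same adding-up of place values can be written as a short sketch. The function name binary_to_decimal is illustrative, and the input is assumed to be a string containing only the characters 0 and 1.

```python
def binary_to_decimal(bits):
    """Add up the place value of every 1 bit in a string such as '01000011'."""
    total = 0
    place = 1  # the rightmost bit is the one's place
    for bit in reversed(bits):
        if bit == "1":
            total += place
        place *= 2  # each place to the left is worth twice as much
    return total


print(binary_to_decimal("01000011"))  # 67
print(binary_to_decimal("100"))       # 4
```

Python's built-in int("01000011", 2) gives the same answer; the loop is spelled out here only to mirror the steps in the text.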


The largest integer that can be represented in one byte is: 11111111 which is 128+64+32+16+8+4+2+1 = 255.

Thus, the largest decimal integer you can store in one byte is 255.

Computers use several bytes together to store larger integers.
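As a rough illustration of why more bytes allow larger integers, the sketch below computes the largest value that fits when every bit is 1. It assumes plain non-negative (unsigned) integers; the function name largest_unsigned is hypothetical.

```python
def largest_unsigned(num_bytes):
    """Largest non-negative integer that fits in num_bytes bytes (all bits set to 1)."""
    return 2 ** (8 * num_bytes) - 1


print(largest_unsigned(1))  # 255
print(largest_unsigned(2))  # 65535
print(largest_unsigned(4))  # 4294967295
```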

The following table shows some binary counting:

Decimal -- Binary

0 -- 0

1 -- 1

2 -- 10

3 -- 11

4 -- 100

5 -- 101

6 -- 110

7 -- 111

8 -- 1000
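A table like this one can be generated rather than typed by hand. The following is a minimal sketch using Python's built-in binary formatting; the function name counting_table is illustrative.

```python
def counting_table(limit):
    """Return lines of the form 'decimal -- binary' for 0 through limit."""
    return [f"{n} -- {n:b}" for n in range(limit + 1)]


for line in counting_table(8):
    print(line)
```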

For some exercises and more detail on binary numbers, try the exercises at http://www.mathsisfun.com/binary-number-system.html.

Hexadecimal (Hex)

Just as binary is a number system of Base 2, hexadecimal (or hex) is a number system of Base 16.

Figure 10: Hexadecimal symbols

Hexadecimal uses sixteen distinct symbols: most often the symbols 0–9 to represent values zero to nine,

and A, B, C, D, E, F (or alternatively a-f) to represent values ten to fifteen.

For example, the hexadecimal number "2AF3" is equal, in Binary, to 10101011110011, or in Decimal 10995.

Each hexadecimal digit represents four binary digits, and thus is used in computing to abstract four bits to a single unit - the hex digit.

Figure 11: Hexadecimal Table

For example, byte values can range from 0 to 255 (decimal), but may be more conveniently represented as two hexadecimal digits in the range 00 to FF.

Hex editors, which are used to examine and change binary bits in computing devices, typically use this abstraction to make examination of binary data more reasonable for people.
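The relationship between hex digits and four-bit groups can be sketched as follows. The helper names hex_to_bit_groups and byte_to_hex are illustrative; int and format are Python built-ins.

```python
def hex_to_bit_groups(hex_string):
    """Expand each hex digit into its four-bit binary group."""
    return " ".join(format(int(digit, 16), "04b") for digit in hex_string)


def byte_to_hex(byte_value):
    """Write a byte value (0-255) as two hexadecimal digits."""
    return format(byte_value, "02X")


print(int("2AF3", 16))            # 10995
print(hex_to_bit_groups("2AF3"))  # 0010 1010 1111 0011
print(byte_to_hex(255))           # FF
```

Dropping the two leading zeros from the grouped output gives 10101011110011, the binary form quoted in the example above.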

Using 0-9 & A-F

Characters

The computer also uses a single byte to represent a single character. But just what particular set of bits is equivalent to which character? In theory we could each make up our own definitions, declaring certain bit patterns to represent certain characters. Needless to say, this would be about as practical as each person speaking his or her own special language. Since we need to communicate with the computer and with each other, it is appropriate that we use a common scheme for data representation. That is, there must be agreement on which groups of bits represent which characters.

The code called ASCII (pronounced "AS-key"), which stands for American Standard Code for Information Interchange, uses 7 bits for each character.

Since there are exactly 128 unique combinations of 7 bits, this 7-bit code can represent only 128 characters. Find the character "Escape" below.

Notice the Decimal is 27 and the Hexadecimal is 1B.


Figure 12: ASCII Table


A more common version is ASCII-8, also called extended ASCII, which uses 8 bits per character and can represent 256 different characters.

For example, the letter A is represented by 01000001.

The ASCII representation has been adopted as a standard by the U.S. government and is found in a variety of computers, particularly minicomputers and microcomputers. The following table shows part of the ASCII-8 code.

Note that the byte: 01000011 does represent the character 'C'.

Thus, when you type a 'C' on the keyboard, circuitry on the keyboard and in the computer converts the 'C' to the byte: 01000011 and stores the letter in the computer's memory as well as instructing the monitor to display it. If the person typed the word "CAB", it would be represented by the following three bytes in the computer's memory (think of it as three rows of eight switches in memory being ON or OFF):

01000011 C

01000001 A

01000010 B.
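The three bytes for "CAB" can be reproduced with a short sketch, assuming the text contains only ASCII characters. The function name text_to_bytes is illustrative.

```python
def text_to_bytes(text):
    """Return the eight-bit pattern for each ASCII character in text."""
    return [format(code, "08b") for code in text.encode("ascii")]


for letter, pattern in zip("CAB", text_to_bytes("CAB")):
    print(pattern, letter)
# 01000011 C
# 01000001 A
# 01000010 B
```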


Picture and Graphic Data

You have probably seen photographs that have been greatly enlarged, or shown close up. If so, you know that a photograph is a big grid of colored dots.

Figure 13: Pixels

A grid of pixels can represent

1. computer graphic data like pictures,

2. frames of a movie,

3. drawings,

4. or frames of an animation.

"Pixel" is short for picture element. In simple graphics (those without many colors), a byte can represent a single pixel. In a graphic representation called grayscale, each pixel is a shade of gray from black at one extreme to white at the other.

Since eight bits (one byte) can hold 256 different integers (0-255 as described a few paragraphs ago), a pixel in one byte can be one of 256 shades of gray (in this text, 0 is white and 255 is black, matching the "level of darkness" idea used earlier; many image formats use the opposite convention, with 0 as black).

Modern video games and colorful graphics use several bytes for each pixel (for example, three bytes, one each for red, green, and blue, give more than 16 million possible colors). See color wheel in Figure 14.

Figure 14: Color wheel

We saw that computer manufacturers got together and agreed how characters will be represented (the ASCII code). For graphics, there are several similar standards or formats.

One of the original formats still in use is called a bitmap (.bmp).

Bitmaps store every pixel of the image and thus result in files with large numbers of bytes. A simple bitmap drawing can easily exceed several megabytes. These files gobble up lots of storage space! File compression, discussed below, doesn't store every pixel, but instead stores patterns of pixels. For instance, if 20,000 pixels in a region of the image were all black, instead of storing 20,000 black pixels, a compression algorithm might store just a few bytes that mean "repeat black 20,000 times". Two common compressed graphics formats used on the Internet are JPEG and GIF.

This makes JPEG and GIF much better suited to storing on the small flash cards of digital cameras or for downloading over the relatively slow Internet.
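The "repeat black 20,000 times" idea is known as run-length encoding. The sketch below shows that idea only; it is not how JPEG or GIF actually compress images, and the function names and the use of color names as pixel values are assumptions made for illustration.

```python
def run_length_encode(pixels):
    """Collapse each run of identical pixels into a (value, count) pair."""
    runs = []
    for pixel in pixels:
        if runs and runs[-1][0] == pixel:
            runs[-1][1] += 1          # same as the previous pixel: extend the run
        else:
            runs.append([pixel, 1])   # a different pixel: start a new run
    return [(value, count) for value, count in runs]


def run_length_decode(runs):
    """Rebuild the original list of pixels from (value, count) pairs."""
    pixels = []
    for value, count in runs:
        pixels.extend([value] * count)
    return pixels


row = ["black"] * 20000 + ["white"] * 3
encoded = run_length_encode(row)
print(encoded)  # [('black', 20000), ('white', 3)]
```

Two pairs stand in for 20,003 pixels, and decoding the pairs gives back exactly the original row.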

File Size Matters. The size of each image becomes especially important when designing a Web page, sending digital photographs through email, downloading pictures over the Internet, and storing photographs on small flash cards of digital cameras or any other secondary storage device. The primary goal of using these compressed formats such as JPEG and GIF is to shrink the file size to as few bytes as possible without negatively altering the image quality.


Image Quality Affects Size! When considering Web page graphics, the compression ratio is commonly adjusted to make the file size of a graphic smaller. The following images will provide you with an example of the effect that different compression ratios can have on the quality of an image.


Each of the images is the same size in pixels, 400 x 336. The original image is in the bitmap format; as previously discussed, this format stores every pixel of the image and results in the largest file size.

Digital images can be created by generating pixel patterns using software such as a drawing program like Paint or an original program you write that draws shapes; by manipulating existing digital images; or by combining images. Digital effects and animations can be created with existing software such as video editors, or with software that you modify or write yourself to implement them.

The following JPEG images are significantly smaller than the bitmap image and are displayed beginning with 20% compression and decreasing in file size up to 95% compression.


JPEG format, 400x336 pixels, 20% compressed, 37 kilobytes
JPEG format, 400x336 pixels, 40% compressed, 25 kilobytes
JPEG format, 400x336 pixels, 60% compressed, 19 kilobytes
JPEG format, 400x336 pixels, 80% compressed, 12 kilobytes
JPEG format, 400x336 pixels, 90% compressed, 7 kilobytes
JPEG format, 400x336 pixels, 95% compressed, 4 kilobytes

As you can see, minor degradation is becoming apparent when the image is about 60% compressed; the quality of the image gradually worsens up to 90% compression.

At 95% compression the image above is badly pixelated, right?

A preferable value for this image would be about 40-50% compression with an image size of 20-25 kilobytes.

Why is this important?

Decreasing the image size in pixels can also reduce the size of a file.

The example below exhibits the same image in a 200x168 format.

JPEG format, 200x168 pixels, 40% compressed, 12 kilobytes
JPEG format, 200x168 pixels, 60% compressed, 9 kilobytes

Once again the image is greatly reduced in file size. The original bitmap image is 155 times larger than its 95% compressed counterpart!

Programs such as Paint Shop Pro and Adobe Photoshop allow you to significantly shrink the size of a file. Simply open the image in such a program and re-save it as either JPEG or GIF at a different compression ratio.

Sound Data As Bytes

Sound occurs naturally as an analog wave, as shown in Figure 15.


Figure 15: Sound Data In Bytes

Most current electronic speakers, the means that we use to electronically reproduce sound, also produce analog waves. However, as we have seen, all data in the computer is digital and must be processed in bytes.

The process of taking analog data, such as sound, and making it digital is called analog to digital conversion.

Many music CDs were made from old analog recordings on tape that were converted to digital so they could be placed on a CD (a CD is digital; to simplify, it is just a collection of bits, with a small hole burned in the CD representing a 1 and no hole representing a 0).

Current music CDs have the analog to digital conversion done in the recording equipment itself, which produces better conversion. To convert an analog wave into digital, converters use a process called sampling.

They sample the height of the sound wave at regular intervals of time, often small fractions of a second. If one byte is used to hold a single sample of an analog wave, then the wave can be one of 256 different heights (0 being the lowest height and 255 being the highest).

These heights represent the decibel level of the sound. Thus a spoken word might occupy several hundred bytes - each being a sample of the sound wave of the voice at a small fraction of a second. If these bytes were sent to a computer's speaker, the spoken word would be reproduced.
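Sampling can be sketched in a few lines. This example assumes a pure sine tone stands in for the sound wave and maps each measured height onto the 256 levels of one byte; the function name sample_wave and the chosen frequency and sample rate are illustrative, not taken from any real recording format.

```python
import math


def sample_wave(frequency_hz, sample_rate_hz, num_samples):
    """Measure a sine wave at regular time intervals, storing each height as 0-255."""
    samples = []
    for n in range(num_samples):
        t = n / sample_rate_hz                               # time of this sample in seconds
        height = math.sin(2 * math.pi * frequency_hz * t)    # between -1.0 and 1.0
        samples.append(round((height + 1) / 2 * 255))        # rescale to one byte
    return samples


# Eight samples of a 440 Hz tone taken 8000 times per second.
samples = sample_wave(440, 8000, 8)
print(samples)
print(bytes(samples))  # the same samples packed into eight bytes
```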

Like ASCII for characters and GIF and JPEG for pictures, sound has several agreed-upon formats for representing samples in bytes. WAV is a common format in which every sample is stored, similar to the way that a bitmap stores every pixel of an image.

A more common sound format is MP3. MP3 is a compressed format, just as JPEG and GIF are compressed formats for images. MP3 does not store every sample; instead, it discards parts of the sound that the human ear is unlikely to notice and then condenses what remains into patterns. It is these patterns that are stored so that another computer, or MP3 player, can read them and reproduce the sound.

Digital audio and music can be created by synthesizing sounds, by sampling existing audio and music, and by recording and manipulating sounds, including layering (imposing audio tracks on each other such as having music in the background as you narrate a video), and looping (e.g. having an audio clip play over and over again to form the audio track behind a PowerPoint presentation).

Program Data as Bytes

Think of software programs as a list of instructions telling a computer what to do. When you buy a piece of software on a CD or diskette, you are getting a collection of instructions that someone wrote to tell the computer to perform the task that the software is meant to do.

Each instruction is a byte, or a small collection of bytes. If a computer used one byte for each instruction, it could have up to 256 different instructions.
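To make the idea of "an instruction is a byte" concrete, here is a sketch of a made-up machine. The opcodes LOAD, ADD, and HALT and their byte values are hypothetical and do not correspond to any real processor; each LOAD or ADD byte is assumed to be followed by one data byte.

```python
# Hypothetical one-byte instruction codes for an imaginary machine.
HALT = 0b00000000
LOAD = 0b00000001
ADD = 0b00000010


def run(program):
    """Step through a sequence of bytes, treating them as instructions and data."""
    accumulator = 0
    position = 0
    while position < len(program):
        opcode = program[position]
        if opcode == HALT:
            break
        if opcode not in (LOAD, ADD):
            raise ValueError(f"unknown instruction byte {opcode}")
        operand = program[position + 1]   # the byte after the instruction is its data
        if opcode == LOAD:
            accumulator = operand
        else:
            accumulator += operand
        position += 2
    return accumulator


program = bytes([LOAD, 60, ADD, 7, HALT])
print(run(program))  # 67
```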

Figure 16: Compiler

Later we will look at what these instructions are, but for now, you should realize that a byte could also be a computer's instruction. The conversion of instructions to bytes is shown in the figure above.

The programming process allows humans to write instructions in an English-like way.

Figure 17: Computer Instructions

A software program called a compiler then transforms the English-like text into the bytes for instructions that the computer understands. This is shown in Figure 16.

Luckily we can write programs to do this work for us!

Like all other kinds of data, there are agreed-upon formats for computer instructions too, and different processor families use different ones. For example, one reason that programs written for older Macintosh computers did not run natively on PC-compatible (Intel-based) computers is that those Macintoshes and Intel PCs used different formats for coding instructions in bytes.

How Does The Computer Know What a Byte Represents?

We have seen that the byte: 01000011 can represent

the integer 67,

the character 'C',

a pixel with darkness level 67,

a sample of a sound with decibel level 67,

or an instruction.

There are other types of data that a byte can represent too.

If that same byte can be all of those different types of data, how does the computer know what type it is?

The answer is: the context in which the computer uses the byte.

If it sends the byte to a speaker, the 67th level of sound is produced. If it sends the byte to a monitor or printer, a pixel with the 67th level of darkness is produced, etc.

More accurately, if the byte were coded with a standard coding technique, like ASCII for characters, GIF for pictures, and WAV for sounds, then when the computer sends the byte to a device, the data corresponding to that coding is produced by the device.
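The role of context can be sketched in code. In this example the program, not the byte, decides what the value means; the function name interpret and the context labels are hypothetical.

```python
def interpret(byte_value, context):
    """Return what the same byte means under a given (illustrative) context."""
    if context == "integer":
        return byte_value
    if context == "character":
        return chr(byte_value)            # look the value up as a character code
    if context == "pixel":
        return f"gray level {byte_value} of 255"
    raise ValueError(f"unknown context: {context}")


byte_value = 0b01000011
print(interpret(byte_value, "integer"))    # 67
print(interpret(byte_value, "character"))  # C
print(interpret(byte_value, "pixel"))      # gray level 67 of 255
```

The bits never change; only the code path that handles them does.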

Summary: Digital Data and Abstraction

Digital data is represented by abstractions at different levels.

At the lowest level, bits represent all digital data. At a higher level, bits are grouped into units such as hex digits and bytes to represent abstractions including but not limited to numbers, characters, and color.

Number bases, including binary, decimal, and hexadecimal, are used to represent and investigate digital data.

At one of the lowest levels of abstraction, digital data is represented in binary (base 2) using only combinations of the digits zero and one.

Figure 18: Hexadecimal symbols

Hexadecimal (base 16) is used to represent digital data because the hexadecimal representation uses fewer digits than binary.

Numbers can be converted from any base to any other base (for example, among Base 2, Base 10, and Base 16).
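One common way to convert a number into another base is repeated division, collecting the remainders as digits. The sketch below assumes a non-negative integer and a base from 2 to 16; the names DIGITS and to_base are illustrative.

```python
DIGITS = "0123456789ABCDEF"


def to_base(number, base):
    """Write a non-negative integer in the given base (2 to 16) by repeated division."""
    if number == 0:
        return "0"
    digits = []
    while number > 0:
        number, remainder = divmod(number, base)
        digits.append(DIGITS[remainder])   # remainders come out right-to-left
    return "".join(reversed(digits))


print(to_base(67, 2))      # 1000011
print(to_base(10995, 16))  # 2AF3
print(int("2AF3", 16))     # 10995, converting back with the built-in int
```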

In a computing device, a finite representation is used to model the infinite mathematical concept of a number.

In many programming languages, the fixed number of bits used to represent characters or integers limits the range of integer values and mathematical operations; this limitation can result in overflow or other errors.
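Python's own integers grow as needed, so overflow has to be simulated. The sketch below keeps only the lowest eight bits of a sum to imitate a one-byte unsigned integer; the function name add_8bit is hypothetical, and real languages differ in how they handle overflow.

```python
def add_8bit(a, b):
    """Add two values but keep only the lowest eight bits, as a one-byte register would."""
    return (a + b) & 0xFF


print(add_8bit(67, 1))     # 68, fits in one byte
print(add_8bit(255, 1))    # 0, the result wrapped around
print(add_8bit(200, 100))  # 44, because 300 does not fit in eight bits
```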

Figure 19

A sequence of bits may represent instructions telling a computer what to do - or data. A sequence of bits may represent different types of data in different contexts. Humans write programs that enable computers to interpret binary sequences and determine how each is to be used. A number? A shade of color? A letter?

If you understand how information is represented in computer language, computers are not miraculous machines after all. If you learn computer languages, you can program them to do what you tell them!

References

Parts of this page are based on information from: Wikipedia: The Free Encyclopedia