1 Computer Architecture: Introduction

Dr A. P. Shanthi

 

Introduction

 

The objectives of this module are to understand the importance of studying Computer Architecture, describe the basic components and working of the traditional von Neumann architecture, discuss the different types of computer systems present today, and look at the different types of parallelism that programs exhibit and how architectures exploit these various types of parallelism.

 

The first and foremost reason is that Computer Architecture is an exciting subject. You will discover many interesting facts about the machine you use, and you will find it a very interesting course. Any computer engineer or scientist should know the underlying details of the machine he or she is going to use. Whether you are an application programmer, a compiler writer or any other software designer, only if you know the underlying architecture will you be able to use the machine effectively and improve the performance of your programs. To become an expert on computer hardware, you need to know the underlying concepts of computer architecture. Even if you are only looking at becoming a software designer, you need to understand the internals of the machine in order to improve code performance. Also, to explore new opportunities, you need to stay updated on the latest technological improvements; only then will you be able to apply them to your advantage. This subject has an impact on all fields of engineering and science, because computers are used predominantly everywhere, and whatever field of engineering or science you are in, the study of computer architecture will help you use your machine more effectively.

 

A computer, by definition, is a sophisticated electronic calculating machine that accepts input information, processes the information according to a list of stored instructions and finally produces the resulting output information. Based on the functions performed by a computer, we can identify the components of a digital computer as the input unit that takes in information, the processing unit that processes the information, the memory unit that stores the information and the output unit that outputs the data. The datapath is the path through which information flows. It includes the arithmetic and logic unit (ALU), which contains functional units like adders, subtractors, multipliers and shifters, and the registers, which are used as storage media within the processor because the data has to be stored somewhere for processing. Registers are built-in storage elements within the processor, and the ALU performs all arithmetic and logical operations. The control path coordinates the activities of the various units: it determines when data flows from one point to another, when an addition has to take place, when a subtraction has to take place, and so on. The datapath and the control path put together are called the central processing unit, popularly abbreviated as the CPU.

The data storage consists of the memory unit, which stores all the information required for processing: the data as well as the program. The program is nothing but a list of instructions. Computers are only dumb machines that work according to the instructions given to them; if you instruct a computer to add, it will add. Initially the program is stored in memory; the processor fetches instructions from there, executes them and outputs the results to the outside world through devices like a monitor or printer. Apart from these classical components, every machine typically also has a network component for communicating with other machines. We do not operate computers only as stand-alone machines; we need to communicate from one machine to another, either within a very short distance or across the globe.

 

Computer architecture comprises computer organization and the Instruction Set Architecture (ISA). The ISA gives a logical view of what a computer is capable of doing, while computer organization deals with how the ISA is implemented. Both of these put together are normally called computer architecture, and in this course we try to cover both the computer organization part and the ISA part.

 

To give a basic idea of what an instruction is, we will look at some sample instructions. Instructions specify commands to the processor, such as transferring information from one point to another within a computer, say, from one register to another register, from a memory location to a register, or to or from an input/output device. Specific instructions say: transfer the information from this source to this destination. Instructions also direct the computer to perform arithmetic and logical operations, such as multiplying two numbers. In addition, you need some instructions to control the flow of the program. For example, suppose I am adding two numbers, and if the result is greater than some value I want to take one course of action, while if it is less I want to take a different course of action. Such instructions allow you to control the flow of the program. Jump instructions make control transfer to a different point in the program. You may also have a subroutine call or a function call: in modular programming, while executing something, you need to go and execute a function, get the result and then continue with the main program. These are examples of control flow instructions.

 

A sequence of instructions that performs a particular task is called a program, and it is stored in memory. For example, suppose I have to add two numbers that are stored in memory. The numbers have to be brought from memory to the adder unit and added. So we need data transfer instructions to move the data from memory to the processor and an add instruction to do the addition, as shown in the sketch below. The processor fetches the instructions that make up a program from memory and performs the operations stated in those instructions exactly in that order. However, if there is a control flow instruction in between that says do not execute the next instruction but jump to some other location and execute the instruction there, control is transferred to that point.
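As a concrete illustration, here is what such a program might look like, sketched in Python for a purely hypothetical load/store-style machine; the mnemonics (LOAD, ADD, STORE), register names and memory addresses are invented for this example and do not come from any real instruction set.

# A hypothetical instruction sequence for adding two numbers stored in memory.
# The mnemonics, register names and addresses are purely illustrative.
program = [
    ("LOAD",  "R1", 100),          # data transfer: memory location 100 -> R1
    ("LOAD",  "R2", 104),          # data transfer: memory location 104 -> R2
    ("ADD",   "R3", "R1", "R2"),   # arithmetic: R3 <- R1 + R2
    ("STORE", "R3", 108),          # data transfer: R3 -> memory location 108
]

for instruction in program:        # the processor fetches and executes these in order
    print(instruction)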

 

Once we have some idea of what these instructions are, we also need to know the data on which they operate. The data could be numbers, expressed in decimal, binary or octal form, or encoded characters. The memory unit stores both instructions and data as sequences of bits. A group of bits that is stored, retrieved and processed as a unit is normally called a word. The word length depends on the processor you are looking at: an 8-bit processor has a word length of eight bits, and a 64-bit processor has a word length of 64 bits. The small sketch below illustrates how the same bits can be interpreted in different ways.
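This is a minimal sketch; the particular bit pattern is chosen only for illustration.

# The same group of bits can be interpreted in different ways.
bits = "01000001"               # an 8-bit pattern
value = int(bits, 2)            # interpreted as an unsigned binary number: 65
character = chr(value)          # interpreted as an ASCII-encoded character: 'A'
print(value, character)

# An n-bit word can take 2**n distinct values.
print(2 ** 8)                   # 256 possible patterns for an 8-bit word
print(2 ** 64)                  # 18446744073709551616 patterns for a 64-bit word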

 

In order to read from and write to the memory, we should know how to access it. The memory consists of a number of memory locations; for example, a 1K memory has 1024 locations. Just as we have unique addresses to identify our houses, each memory location has a unique address, which in this case is 10 bits long, as the calculation below shows. To access a memory location, we need to know its unique address, and the processor reads from or writes to memory using this address. A random access memory provides a fixed access time, independent of the location of the word. Memory access time is defined as the time that elapses between the initiation of a request and the satisfaction of the request. For example, if I issue a memory read request, the time between placing the request and the arrival of the data is the memory access time. The access time depends on the speed of the memory unit: a slow memory has a larger access time and a fast memory has a smaller access time.
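As a quick check of this arithmetic, here is a minimal sketch computing the number of address bits needed for a memory of a given size.

import math

# A memory with 2**n locations needs an n-bit address.
locations = 1024                          # the 1K memory from the text
address_bits = int(math.log2(locations))
print(address_bits)                       # 10 bits, addresses 0 to 1023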

 

When we look at memory, we need it to be fast, large enough to accommodate voluminous data, and affordable. All of this does not come together in a single technology, so instead of a flat memory system we use a hierarchical memory system. The processor and the memory have to communicate with each other in order to read and write information. To keep up with the processor speed and reduce the communication time, a small amount of fast RAM, normally known as the cache, is tightly coupled with the processor, and modern computers have multiple levels of caches. Then we have the main memory, and beyond that the secondary storage. The fastest memory, closest to the processor, satisfies the speed requirements, and the farthest memory satisfies the capacity requirements. The cost also decreases as we move away from the innermost level. Even though main memory capacities are quite large these days, main memory is obviously not enough to store all your programs and data, so we also need secondary storage, capable of storing large amounts of data; examples are magnetic disks and tapes and optical discs such as CDs. Access to data in secondary storage is definitely slower, but we take advantage of the fact that the most frequently accessed data is placed closer to the processor, as the rough calculation below illustrates.
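The following is only a back-of-the-envelope sketch with hypothetical access times and a hypothetical hit fraction; the point is simply that when most accesses are satisfied by the small, fast memory, the average access time stays close to the cache speed.

# Rough average access time with assumed numbers (not from any real machine).
cache_time = 1       # ns, small fast memory close to the processor (assumed)
memory_time = 100    # ns, larger but slower main memory (assumed)
hit_fraction = 0.95  # fraction of accesses satisfied by the cache (assumed)

average_time = hit_fraction * cache_time + (1 - hit_fraction) * memory_time
print(average_time)  # 5.95 ns, much closer to cache speed than to main memory speed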

 

Having looked at the basic components of a digital computer, we also need some means of connecting these components and communicating between them. This connection is provided by a group of wires called a bus. A bus is nothing but an interconnection of wires capable of carrying bits of information. The functional units are connected by a group of parallel wires; each wire in the bus can transfer one bit of information, and the number of parallel wires on the bus is normally equal to the word length of the computer. When we talk about a processor with a word length of, say, 64 bits, the processor typically operates on 64 bits of data at a time, so it is only reasonable to have a bus that can also transfer 64 bits of data from one point of the computer to another.

 

You know that the information handled by a computer can be either instructions or data. Instructions, or machine instructions, are explicit commands that govern the transfer of information within a computer, as well as between the computer and its memory and I/O devices, and specify the arithmetic and logic operations to be performed. A list of instructions that performs a task is called a program. The program is usually stored in memory, and the processor fetches these instructions one after the other and executes them. The earliest computing machines had fixed programs. Some very simple computers still use this design, either for simplicity or for training purposes. For example, a desk calculator is a fixed-program computer: it can do basic mathematics, but it cannot be used as a word processor or to run video games. To change the program of such a machine, you have to rewire or redesign the machine. Reprogramming, when it was possible at all, was a very manual process, starting with flowcharts and paper notes, followed by detailed engineering designs, and then the often arduous process of implementing the physical changes.

 

The idea of the stored-program computer changed all that. By creating an instruction set architecture and detailing the computation as a series of instructions (the program), the machine becomes much more flexible. By treating those instructions in the same way as data, a stored-program machine can easily change the program, and can do so under program control. The terms “von Neumann architecture” and “stored-program computer” are generally used interchangeably. Instructions, as well as data, are stored in memory as sequences of zeros and ones; the processor executes these instructions sequentially, and program flow is governed by the type of instructions and other factors like interrupts. The fetch-execute cycle is repeated continuously: an instruction is fetched from memory and executed, and then the next instruction is fetched from memory. Each instruction is fetched from memory using its unique address, decoded and then executed. An instruction is, after all, a sequence of zeros and ones, and the processor needs to determine what is to be done with those zeros and ones: whether an addition or some other operation is to be performed, where the operands are available, and so on. Once this information is available, the operands are fetched, the operation is executed and the result is finally stored. The advantages of the stored-program concept are that programs can be shipped simply as files of binary numbers that maintain binary compatibility, and that computers can inherit ready-made software, provided they are compatible with the existing ISA.
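The following is a minimal sketch of the fetch-decode-execute cycle, written in Python for the same hypothetical instruction set used earlier; the mnemonics, registers and memory layout are invented for illustration and are not from any real processor.

# A toy stored-program machine: program and data live in storage, and the
# processor repeatedly fetches, decodes and executes instructions.
memory = {100: 7, 104: 5, 108: 0}    # data memory: two operands and a result slot
registers = {"R1": 0, "R2": 0, "R3": 0}

program = [
    ("LOAD",  "R1", 100),
    ("LOAD",  "R2", 104),
    ("ADD",   "R3", "R1", "R2"),
    ("STORE", "R3", 108),
    ("HALT",),
]

pc = 0                                # program counter: index of the next instruction
while True:
    instruction = program[pc]         # fetch
    opcode = instruction[0]           # decode: identify the operation
    pc += 1                           # move to the next instruction by default
    if opcode == "LOAD":              # execute: memory -> register
        _, reg, addr = instruction
        registers[reg] = memory[addr]
    elif opcode == "ADD":             # execute: register + register -> register
        _, dst, src1, src2 = instruction
        registers[dst] = registers[src1] + registers[src2]
    elif opcode == "STORE":           # execute: register -> memory
        _, reg, addr = instruction
        memory[addr] = registers[reg]
    elif opcode == "HALT":            # stop the fetch-execute loop
        break

print(memory[108])                    # prints 12, the sum written back to memory

Running this prints 12, the result of the add-two-numbers program discussed earlier, left in memory just as a real processor would leave it.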

 

Computer organization, as we pointed out earlier, is the realization of the instruction set architecture. We have to look at the characteristics of the principal components that make up the computer system, the ways in which these components are interconnected, and how information flows between them.

 

There have been tremendous technological improvements since 1951: from vacuum tubes we moved on to transistors, integrated circuits, VLSI, ultra large scale integration, and so on. Processor transistor counts have increased by about 32 to 40% every year, thanks to Moore’s Law. Moore’s Law was proposed by Gordon Moore of Intel in 1965; he predicted that transistor densities would double every 18 to 24 months, and that prediction has largely held good. Memory capacity has also grown, at about 60% per year. All these technological advancements make room for better or new applications: the applications demand more and more, the processors become better and better, and this becomes a self-reinforcing cycle. Performance improved greatly from 1978 to 2005. After 2005, performance growth actually slowed down, due to what are called the power wall and the memory wall.
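As a small arithmetic aside, a doubling period can be converted into an equivalent annual growth rate, assuming smooth exponential growth; the sketch below does this for the 18 to 24 month range quoted above.

# Annual growth rate implied by doubling every m months, assuming smooth
# exponential growth: rate = 2**(12/m) - 1.
for months in (18, 24):
    rate = 2 ** (12 / months) - 1
    print(months, "months:", f"{rate:.0%} per year")   # about 59% and 41%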

 

There are different classes or types of computer systems available today. The first is desktop and notebook computers, perhaps the most familiar class. These are aimed at general-purpose use, where you plan to run a wide range of applications, and the main constraint is the cost-performance trade-off. The next category is server systems, which need high capacity and for which performance is very important; for servers, reliability and availability are critical, and throughput needs to be high. We also have embedded systems, where the computer is hidden as part of a larger system. For example, when you look at a mobile phone, you may not realize that it is a computer system, yet there are many processors inside it; a washing machine is another simple example of an embedded system. Embedded computers have stringent power and performance requirements and stringent cost constraints, and they are meant for a particular application. Unlike a desktop processor, which runs a range of applications, an embedded processor is expected to perform well only with respect to its particular application. This class of computer systems covers a wide range of applications, from a very small toy car to a sophisticated diagnostic system or a surveillance mechanism, and the requirements change accordingly. We also have personal mobile devices (PMDs), which are very predominant today; here cost and energy are important, and media performance becomes very important. Personal mobile devices also have to place a lot of emphasis on responsiveness: once you put in a request to a PMD, you expect to get an answer immediately. Finally, clusters and warehouse-scale computers are becoming very popular these days. A large number of computers put together is called a cluster. Here again, price-performance and throughput are very important: the number of transactions done per unit time, or the number of web requests serviced, matters a lot, much as it does for servers, and energy proportionality also gains a lot of importance in this type of computer system.

 

The main driving forces of computer system design are energy and cost: everybody today is striving to design computer systems that minimize energy and cost. We also have to look at the different types of parallelism that applications exhibit and try to exploit that parallelism in the computer systems we design. The types of parallelism that programs may exhibit are data-level parallelism and task-level parallelism, and we need to design systems that exploit them. Processors use different techniques to exploit parallelism. Even in sequential execution, there are techniques for exploiting instruction-level parallelism (ILP), that is, executing independent instructions in parallel. When data-level parallelism is available in programs, vector processors and SIMD-style architectures try to exploit it. Processors also support multiple threads of execution: thread-level parallelism exploits parallelism mostly in the form of task-level parallelism, and when this is done in a more loosely coupled architecture we call it request-level parallelism. So applications exhibit different types of parallelism, and the computer hardware we design should try to exploit that parallelism to give better performance.
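Here is a small conceptual sketch contrasting data-level and task-level parallelism, written in Python purely for illustration; the task function and data are made up, and in practice this parallelism is exploited by vector/SIMD hardware, multi-threaded cores and distributed systems rather than by this kind of script.

from concurrent.futures import ThreadPoolExecutor

# Data-level parallelism: the same operation applied independently to many
# data elements; a vector or SIMD unit could perform several at once.
a = [1, 2, 3, 4]
b = [10, 20, 30, 40]
c = [x + y for x, y in zip(a, b)]          # independent element-wise additions

# Task/thread-level parallelism: independent tasks that can run concurrently.
def word_count(text):                      # an illustrative, made-up task
    return len(text.split())

documents = ["the quick brown fox", "jumps over", "the lazy dog"]
with ThreadPoolExecutor() as pool:
    counts = list(pool.map(word_count, documents))

print(c)        # [11, 22, 33, 44]
print(counts)   # [4, 2, 3]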

 

To summarize, in this module we pointed out why you need to study computer architecture (the motivation for the course) and what you are going to study in this course. We then described the functional units of a digital computer, how they are interconnected, and what is meant by the traditional von Neumann architecture. Last of all, we pointed out the different classes of computer systems and the driving forces that push us to come up with better and better computer architectures, in order to exploit the parallelism available in various applications and also bring down energy and cost.

 

Web Links / Supporting Materials

 

  • http://en.wikipedia.org/wiki/Computer_architecture
  • Computer Architecture – A Quantitative Approach, John L. Hennessy and David A. Patterson, Fifth Edition, Morgan Kaufmann, 2011.
  • Computer Organization and Architecture – Designing for Performance, William Stallings, Eighth Edition, Pearson, 2010.