THE CENTRAL PROCESSING UNIT (PROCESSOR) - HOW IT WORKS AND PROCESSES DATA

Introduction

The Central Processing Unit (CPU), also called a processing unit, microprocessor, or simply a processor, is an integrated circuit: a chip that combines billions of tiny electronic parts, arranges them into circuits, and fits them all into a small, compact package.

Bottom side of an Intel 80486DX2 processor, showing its pins

Image Credit: CC BY-SA 2.0


The retrieval and execution of data and instructions in a computer are conducted inside the CPU. The processor is essentially the brain of a computing device: it processes and controls all the activity inside the computer, whether that activity originates in software or in a hardware component. The components inside the processor fall into three groups: the arithmetic and logic unit (ALU), the control unit, and various registers.

The ALU performs arithmetic, logic, and related operations according to the program's instructions. The control unit directs all CPU operations, including the ALU's operations, the movement of data within the CPU, and the exchange of data and control signals across external interfaces through the system bus. Registers (of which accumulators are one well-known kind) are high-speed internal storage units located within the CPU.

Simplified diagram of a microprocessor: a basic skeleton of computer structure. Black lines indicate data flow, whereas red lines indicate control flow; arrows indicate flow directions.

Image Credit: By User:Lambtron - CC BY-SA 4.0


DATA PROCESSING OPERATION INSIDE MICROPROCESSOR

The central processing unit interacts with program instructions and the other parts of a computer using a specific language, much as we humans communicate among ourselves in a particular language. The only language the CPU understands is binary - a language of 1s and 0s only. Any type of data in a computer, be it text, images, or sound, must be represented as 1s and 0s before the processor can execute or manipulate it.

With this, we're ready to jump inside the central processing unit and explore how it handles and executes data, and how it controls the other hardware and software in the computer. Let's start by observing how data is handled inside the registers, the storage devices located in the CPU.

Very few among us can do complex math operations in our heads. Even for something as simple as adding rows of numbers, we need a pencil and paper to keep track of which numbers go where. Microprocessors are not all that different in this regard. Although they are capable of performing intricate math involving thousands or millions of numbers, processors, too, need notepads to keep track of their calculations. These CPU notepads are what we call registers, and their pencils are pulses of electricity.

A register is a reserved section of the fastest memory inside the microprocessor, built from transistors; the processor has several registers, and each has a specific role.

Inside the CPU, the processor's arithmetic logic unit (ALU), in charge of carrying out math instructions, and the control unit, which coordinates the flow of instructions and data through the processor, both have quick access to the registers. The size of the registers determines how much data the processor can work with at one time. Most personal computers have registers that hold 32 or 64 bits of data.

The processor's control unit directs the fetching and execution of program instructions. It uses an electrical signal to fetch each instruction, decodes it - interprets the machine-language pattern of 0s and 1s - and sends another control signal to the arithmetic logic unit telling the ALU what operation to carry out.

With each clock cycle - the small unit of time in which the different components of a computer can each do one thing, typically a tiny fraction of a second - the processor writes or reads the value of each bit by sending or withholding a pulse of electricity. Each chunk of binary digits is only that: the bits carry no labels identifying them as instructions, data, values going into a computation, or the products of executed instructions. What the values represent depends on which registers the control unit uses to store them.

Address registers hold the addresses of locations in RAM or in the processor's on-board cache, where data has been prefetched in anticipation that it will be needed.

When the processor fetches the contents of a location in memory, it tells the data bus to place those values into a memory data register. When the processor wants to write data to memory, it places the values in the memory data register, where the bus retrieves them to transfer to RAM.

A program counter register holds the memory address of the next value the processor will fetch. As soon as a value is retrieved, the processor increments the program counter’s contents by 1, so it points to the next program data location.
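The fetch, increment, and execute rhythm described above can be sketched in a few lines of Python. This is a toy model, not any real instruction set: the opcodes (LOAD, ADD, HALT), the single accumulator, and the instruction layout are all invented for illustration.

```python
# Toy model of the fetch/decode/execute cycle: the program counter holds
# the address of the next instruction and is incremented after each fetch.
# The instruction set (LOAD/ADD/HALT) is purely illustrative.

memory = [
    ("LOAD", 6),    # put 6 in the accumulator
    ("ADD", 3),     # add 3 to the accumulator
    ("HALT", None), # stop execution
]

program_counter = 0  # address of the next instruction to fetch
accumulator = 0      # register that collects the results of operations

while True:
    opcode, operand = memory[program_counter]  # fetch
    program_counter += 1                       # point at the next location
    if opcode == "LOAD":                       # decode + execute
        accumulator = operand
    elif opcode == "ADD":
        accumulator += operand
    elif opcode == "HALT":
        break

print(accumulator)  # 9
```

Note how the program counter is bumped immediately after the fetch, mirroring the description above: by the time an instruction executes, the counter already points to the next program location.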

The processor puts the results of executing an operation into several accumulation registers, where they await the results of other operations still executing, before all the results are assembled into yet another accumulator.


How CPU Works

Video credit: Fireship!


HANDLING MATH OPERATIONS INSIDE THE PROCESSOR

As we already know, all the information inside the processor - words and graphics as well as numbers and sounds - is stored and manipulated in the form of binary numbers. In the binary numeral system there are only two digits, 0 and 1. All numbers, words, and graphics are formed from different combinations of these digits.

DECIMAL NUMBER    BINARY NUMBER
0                 0
1                 1
2                 10
3                 11
4                 100
5                 101
6                 110
7                 111
8                 1000
9                 1001
10                1010

Equivalence of decimal numbers in the binary number system
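The table above can be reproduced in Python using the built-in format() function, which renders an integer in base 2:

```python
# Print decimal numbers 0 through 10 alongside their binary
# representations, matching the conversion table above.
for n in range(11):
    print(n, format(n, "b"))
```

Running this prints the same decimal/binary pairs as the table, e.g. `10 1010` on the last line.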


Transistor switches are used to manipulate binary numbers because a switch has only two possible states - open (off) or closed (on) - which nicely match the two binary digits. An open transistor, through which no current is flowing, represents a 0. A closed transistor, which allows a pulse of electricity regulated by the PC's clock to pass through, represents a 1. The computer's clock regulates how fast the computer works: the faster the clock ticks, producing pulses of electricity, the faster the computer works. Clock speeds are measured in megahertz or gigahertz - millions or billions of ticks per second.

Current passing through one transistor can be used to control another transistor, in effect turning the switch on and off to change what the second transistor represents. Such an arrangement is called a gate because, like a fence gate, the transistor can be open or closed, allowing or stopping current flowing through it.

Transistor: Open (OFF)


Transistor: Closed (ON)


The simplest operation that can be performed with a transistor is the NOT logic gate, made up of only a single transistor. The NOT gate is designed to take one input from the clock and one from another transistor, and it produces a single output - always the opposite of the input from the transistor. When current from another transistor representing a 1 is sent to a NOT gate, the gate's own transistor switches open so that the pulse from the clock can't flow through it, which makes the NOT gate's output 0. A 0 input closes the NOT gate's transistor so that the clock pulse passes through, producing the opposite of the input as its output: a 1.

INPUT FROM OTHER TRANSISTOR    NOT LOGIC GATE OUTPUT
1                              0
0                              1

NOT logic gate
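The NOT gate's behavior can be modeled directly in code, treating the transistor's input as a single bit:

```python
# Model of the NOT gate described above: the output is always the
# opposite of the one-bit input from the other transistor.
def not_gate(bit: int) -> int:
    return 1 - bit  # 1 -> 0, 0 -> 1

for bit in (1, 0):
    print(bit, "->", not_gate(bit))  # 1 -> 0, then 0 -> 1
```

The two printed lines reproduce the NOT-gate truth table above.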


NOT gates can be strung together in different combinations to create other logic gates, each of which has a line to receive pulses from the clock and two other input lines for pulses from other logic gates. With different combinations of logic gates, a computer performs the math that is the foundation of all its operations. This is accomplished with gate designs called half-adders and full-adders. A half-adder consists of an XOR gate and an AND gate, both of which receive the same pair of one-bit inputs. A full-adder consists of half-adders and other switches.

XOR logic gate


AND logic gate


A combination of half-adders and full-adders handles larger binary numbers and can generate results that involve carrying over digits. To add the decimal numbers 6 and 3 (110 and 11 in binary), the half-adder first processes the rightmost digits through both the XOR and AND gates.
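The half-adder and full-adder described above can be sketched with Python's bitwise operators, where `^` plays the role of the XOR gate and `&` the AND gate. The ripple_add helper, which chains full-adders bit by bit, is our own illustrative construction:

```python
# Half-adder: XOR produces the sum bit, AND produces the carry bit.
def half_adder(a, b):
    return a ^ b, a & b          # (sum, carry)

# Full-adder: two half-adders plus an OR to merge the two carries.
def full_adder(a, b, carry_in):
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2           # (sum, carry_out)

def ripple_add(x_bits, y_bits):
    """Add two equal-length bit lists, least-significant bit first."""
    result, carry = [], 0
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    result.append(carry)         # final carry becomes the top bit
    return result

# 6 = 110 and 3 = 011, stored least-significant bit first:
print(ripple_add([0, 1, 1], [1, 1, 0]))  # [1, 0, 0, 1] -> binary 1001 = 9
```

Adding 6 and 3 this way yields 1001 in binary, i.e. 9, with the final carry supplying the leftmost digit - exactly the carrying-over behavior the adder circuits exist to handle.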

MOVEMENT OF DATA IN THE CPU

There are billions of transistors inside the microprocessor of a modern computer; tracing a path through them could easily get a person hopelessly lost. Old or new, however, how a processor performs its most basic functions hasn't changed. A microprocessor may have several processor cores bundled into it - a design called multi-core - and multiple caches, but like the old Pentium III processor, they all face the same problem of how to move data quickly and without a hitch (Ron White, 2015).

A processor and its integrated cache share the same interface to the computer's information. Program code, and the data manipulated by that code, move in and out of the processor chip at the computer's maximum bus speed. Much of a computer's architecture is structured to alleviate this bus bottleneck by minimizing the number of times a clock cycle - the smallest time in which a computer can do anything - ticks away without the processor completing an operation.

When information enters the processor through the bus interface unit (BIU), the BIU duplicates the information and sends the copies to the CPU's closest data caches, housed directly within the processor core. The BIU sends program code to the Level 1 instruction cache (I-cache) and sends the data to be used by that code to another part of the L1 cache called the data cache (D-cache).

While the fetch/decode unit - the unit responsible for getting data and instructions from RAM or from cache memory - is pulling in instructions from the I-cache, the branch target buffer (BTB) compares each instruction with a record in a separate set-aside buffer to see whether that instruction has been used before. The BTB looks in particular for instructions that involve branching, a situation in which the program's execution could follow one of two paths. If the BTB finds a branch instruction, it predicts, based on past experience, which path the program will take.

As the fetch/decode unit pulls in instructions in the order predicted by the BTB, three decoders working in parallel break the more complex instructions into smaller micro-operations, or mops. The dispatch/execute unit can then process several mops faster than it could process a single higher-level instruction.

The decode unit sends all mops to the instruction pool, also called the reorder buffer, which feeds two arithmetic logic units (ALUs) that handle all calculations involving integers. The ALUs work from a circular buffer, with a head and a tail, that holds the mops in the order in which the BTB predicted they would be needed.

The dispatch/execute unit checks each mop in the buffer to see whether it has all the information needed to process it, and when it finds a mop ready to process, the unit executes it, stores the result in the micro-op itself, and marks it as done. If a mop needs data from memory, the execute unit skips it, and the processor looks for the information first in the nearby L1 cache. If the data isn’t there, the processor checks the next cache level, L2 in this case.

Instead of sitting idle while that information is fetched, the execute unit continues inspecting each mop in the buffer for those it can execute. This is called speculative execution because the order of mops in the circular buffer is based on the BTB's branch predictions. The unit executes up to five mops simultaneously. When it reaches the end of the buffer, it starts at the head again, rechecking all the mops to see whether any have finally received the data they need to be executed.
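The execute unit's sweep over the buffer - run whatever is ready, skip whatever is still waiting on memory, then come back around - can be caricatured in a few lines. This is a drastic simplification of real out-of-order execution: the mop format, the two opcodes, and the way the delayed operand "arrives" are all invented for illustration.

```python
# Toy sketch of the execute unit scanning a buffer of micro-ops:
# execute any mop whose operands are ready, skip the rest, and keep
# sweeping until everything is done. (Not a model of any real CPU.)

mops = [
    {"op": "add", "args": (2, 3), "done": False, "result": None},
    {"op": "add", "args": None, "done": False, "result": None},  # waiting on memory
    {"op": "mul", "args": (4, 5), "done": False, "result": None},
]

def execute(mop):
    a, b = mop["args"]
    mop["result"] = a + b if mop["op"] == "add" else a * b
    mop["done"] = True

while not all(m["done"] for m in mops):
    for m in mops:
        if not m["done"] and m["args"] is not None:
            execute(m)  # operands ready: execute, store result, mark done
    # simulate the delayed operand finally arriving from the cache/RAM
    if mops[1]["args"] is None:
        mops[1]["args"] = (1, 1)

print([m["result"] for m in mops])  # [5, 2, 20]
```

On the first sweep the first and third mops execute while the second is skipped; once its data "arrives," the next sweep completes it - the same keep-busy behavior the paragraph above describes.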

If an operation involves floating-point numbers, such as 7.34 or 0.23333, the ALUs hand the job off to the floating-point math unit, which contains processing tools designed to manipulate such numbers quickly.

When a mop that had been delayed is finally processed, the execute unit compares the results with those predicted by the BTB. Where the prediction fails, a component called the jump execution unit (JEU) moves the end marker from the last mop in line to the mop that was predicted incorrectly. This signals that all mops behind the end marker should be ignored and can be overwritten by new mops. The BTB is told that its prediction was incorrect, and that information becomes part of its future predictions.

Meanwhile, the retirement unit is also inspecting the circular buffer. It first checks whether the mop at the head of the buffer has been executed; if it hasn't, the retirement unit keeps checking until it has been. Then it checks the second and third mops. If they have already been executed, the unit sends all three results - its maximum - to the store buffer. There, the prediction unit checks them one last time before they are sent to their proper places in system RAM.

Multi-Core Processors and How They Work

With the billions of transistors in any recently manufactured microprocessor, we might think that number would more than satisfy the requirements of the most intense software we can push through the chip. But in reality, the word "enough" doesn't exist in the computing world. So if it's too hard to put more transistors on a processor, there's another solution: put more processors into a single device - what's called a multi-core processor. Multi-core processors are like a couple of computers joined into a single unit, sharing the same memory, power, and input/output devices. Machines with two or more processor cores are standard issue, and that number will only get bigger (Ron White, 2015).

For the explanation that follows of how a multi-core processor works, let's use a quad-core processor as the example: one that has four execution cores assembled onto a single chip.

Specific designs vary, but a typical quad-core processor combines four execution cores onto a single die, or silicon chip. Other designs spread their cores across two dies. Regardless, these identical cores are the heart of any microprocessor and the part that does the heavy work of executing instructions from software.

To gain the speed and other advantages possible with a multi-core computer, the operating system (OS) running on it must be designed to recognize that the machine has multiple processor cores, be able to distinguish them, and know how to handle operations such as hyperthreading. Similarly, software applications, games, and utilities need to be written to use the multiple cores; such software is referred to as threaded, or multi-threaded, software. An app adding, say, a column of four-place numbers could divide the job into four threads - adding the 1s-place digits, the 10s-place digits, the 100s-place digits, and the 1000s-place digits - with each subtask directed to a different core.

When the subtasks exit the cores, the operating system combines the threads' results into a single addition operation and sends it to one of the cores for execution.
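The place-by-place division of labor described above can be sketched with Python's standard ThreadPoolExecutor. (Because of Python's global interpreter lock, threads won't actually speed up pure arithmetic; the point here is how the job splits into per-place subtasks whose results combine into the final sum. The place_sum helper is our own illustrative construction.)

```python
# Sketch of splitting one addition job into per-decimal-place subtasks
# that run on separate threads, then combining the partial results.
from concurrent.futures import ThreadPoolExecutor

numbers = [1234, 5678, 9012, 3456]

def place_sum(place):
    """Sum one decimal place (0 = ones, 1 = tens, ...) across all numbers,
    scaled back to its true value."""
    return sum((n // 10**place) % 10 for n in numbers) * 10**place

# One thread per decimal place, like one core per subtask.
with ThreadPoolExecutor(max_workers=4) as pool:
    partial_sums = list(pool.map(place_sum, range(4)))

# Combining the per-place results reproduces the ordinary sum.
print(sum(partial_sums))  # 19380
```

Combining the four partial sums gives the same total as adding the column directly, which is exactly the recombination step the OS performs when the threads finish.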

If the application software isn't equipped to work with multiple cores, the operating system can still take advantage of them. It picks one of the cores to run the software and creates an affinity between that core and the program. It then creates affinities between the remaining cores and various tasks. A second core may handle background operations, such as disk optimizing; a third core might supervise a download; and the fourth could render a video that's streaming from the Internet. Neither the operations nor their finish times are affected by the processing going on in the other cores.

The OS puts each operation into a time-staggered queue along with requests going to other cores. Each operation enters its respective core on a different tick of the computer's clock, so the operations are less likely to run into each other or cause a traffic jam in the areas to which they have mutual access. The processor cores aren't completely distinct, either: they might share access to resources such as an on-die graphics processor, memory caches, and more. The operating system can determine how each core shares access to these resources; if only one core is active, for example, the OS dynamically allocates more of the shared cache to that core.



The central processor, or microprocessor, is one of the most critical components of every computing device. It serves as the brain of the computer, executing and coordinating all the tasks we run on the machine.

All recent operating systems take advantage of multi-core processors to handle and execute tasks within a very short period of time - a fraction of a second - by dividing those tasks into groups called threads and handing them to the available cores within the processor chip for execution.

The kernel, the core service that an operating system is built on, is responsible for controlling how the OS interacts with all the cores present inside the CPU chip, and how data is accessed from RAM. You may wish to read about the different operating system kernels to fully understand how each OS interacts with the software and hardware installed on the machine.


Don't forget to share the article on your social media handles by clicking the Share button so that others can also benefit!



