National Semiconductor's IPC-16A PACE, short for "Processing and Control Element", was the first commercial single-chip 16-bit microprocessor, announced in late 1974.[1] It was a single-chip implementation of their early 1973 five-chip IMP-16 architecture, which in turn had been inspired by the Data General Nova minicomputer. To the basic IMP-16, PACE added a new operational mode, "byte mode", which was useful for working with 8-bit data like ASCII text.
Implemented in pMOS, as was common for the era, PACE required three supply voltages and an external clock with enough signal to drive the internal logic. This was normally supplied by the STE chip. Most PACE systems also required the BTE chip to convert the higher internal voltage signals to TTL levels used by the rest of the system. Its multiplexed address and data pins also required additional logic.
Although National Semiconductor had second source agreements with Signetics and Rockwell Semiconductor, neither company produced the PACE design. The PACE was followed by the INS8900, which had the same architecture but was implemented in nMOS. This version made electrical interfacing easier and also fixed several bugs in the PACE logic and increased the speed about 50%. By the time it was available, higher-performance 16-bit CPUs were appearing, and the company began to deemphasize sales of the line.
The PACE was packaged in a 40-pin dual in-line package (DIP), originally in ceramic. As it was based on pMOS logic, the PACE series required three supply voltages, +5V (VSS, pin 20), +8V (VBB, pin 23) and -12V as the ground level (VGG, pin 29). The +8V level was normally supplied using simple electronics fed by the +5V line, thus reducing the complexity of the power supply.
The chip was normally driven using an external 750 nanosecond clock (1.33 MHz) using the System Timing Element, STE, chip to produce signals of the required signal strength. As these signals were also used by external devices, the clock signals were at TTL levels, +5V, in contrast to most pins which were at +8V.
As the external signals were presented at +8V, interfacing the system with common devices working at TTL levels was not trivial. For this reason, systems using the PACE normally included a Bidirectional Transceiver Element, BTE. This worked in conjunction with the PACE to produce a complete set of bus signals at TTL voltages that could then be used to interface easily with most contemporary devices like SRAM.
In order to fit 16-bit addresses and data onto a 40-pin DIP, the same set of 16 pins was multiplexed between presenting an address and reading and writing data on separate cycles. This required the external devices, like the main memory, to latch the address between cycles.
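The latching requirement can be illustrated with a minimal sketch. The class and method names below are hypothetical and model only the idea that one set of 16 lines carries the address on one cycle and the data on the next; they do not reproduce the PACE's actual bus control signals or timing.

```python
class LatchedMemory:
    """Toy memory that latches an address on one cycle and moves data on the next."""

    def __init__(self, size=1 << 16):
        self.cells = [0] * size
        self.latched_address = None  # nothing latched until an address cycle occurs

    def address_cycle(self, bus_value):
        # The 16 shared lines carry an address; hold it for the data cycle.
        self.latched_address = bus_value & 0xFFFF

    def write_cycle(self, bus_value):
        # The same 16 lines now carry data; store it at the latched address.
        self.cells[self.latched_address] = bus_value & 0xFFFF

    def read_cycle(self):
        # Return the data at the latched address, as if driving the shared lines.
        return self.cells[self.latched_address]


mem = LatchedMemory()
mem.address_cycle(0x1234)   # cycle 1: address on the shared lines
mem.write_cycle(0xBEEF)     # cycle 2: data on the same lines
mem.address_cycle(0x1234)
value = mem.read_cycle()
```

Without the latch, the memory would have no record of the address by the time the data appeared on the shared lines.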
National Semiconductor's IMP-16 had been inspired by the Data General Nova but had a number of minor differences in its ISA. Among these was the handling of the four user-accessible 16-bit processor registers. In the Nova, the first two registers were general-purpose accumulators and used for most basic arithmetic and logic operations, while the second two could be used as operands or used as index registers. The IMP-16 followed this model, but the PACE changed a number of instructions so that they operated only on the first accumulator, AC0.
The original Nova did not implement a stack in hardware, although this was added in the later Nova 3 models starting in 1975. PACE implemented a different style of stack using a hidden stack pointer which is automatically incremented and decremented when push and pull instructions are encountered. The Program Counter (PC) is automatically pushed to or pulled from the stack during subroutine calls and returns. Additional instructions allow the four registers and the Status and Control Flag Register to be pushed and pulled as well.
PACE has ten 16-bit internal locations that hold the topmost stack values. A unique feature of the PACE is that an interrupt is generated when another push is attempted while the stack is almost full, or when the stack is empty after a pull is attempted. This is normally used to call interrupt handler code that copies some or all of the values between the internal stack and main memory. This allows the internal stack registers to be used like a cache of a larger memory-based stack.
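The cache-like behaviour can be sketched as follows. This is a simplified model, not the PACE's actual mechanism: the class name, the policy of moving five entries at a time, and raising the interrupt exactly when the on-chip stack is full or empty (rather than "almost full") are all assumptions made for illustration.

```python
STACK_DEPTH = 10  # on-chip stack entries, per the text


class CachedStack:
    """Ten-entry on-chip stack backed by a larger stack in main memory."""

    def __init__(self, spill_count=5):
        self.on_chip = []               # top of stack is the end of the list
        self.in_memory = []             # memory-based stack kept by the handler
        self.spill_count = spill_count  # assumed handler policy
        self.interrupts = 0

    def push(self, value):
        if len(self.on_chip) == STACK_DEPTH:
            self._stack_interrupt(full=True)
        self.on_chip.append(value & 0xFFFF)

    def pull(self):
        if not self.on_chip:
            self._stack_interrupt(full=False)
        return self.on_chip.pop()  # raises IndexError if both stacks are empty

    def _stack_interrupt(self, full):
        # Stands in for the interrupt handler described in the text.
        self.interrupts += 1
        if full:
            # Move the oldest on-chip entries out to main memory.
            self.in_memory.extend(self.on_chip[:self.spill_count])
            del self.on_chip[:self.spill_count]
        else:
            # Bring the most recently spilled entries back on chip.
            count = min(self.spill_count, len(self.in_memory))
            if count:
                self.on_chip = self.in_memory[-count:]
                del self.in_memory[-count:]


stack = CachedStack()
for n in range(25):
    stack.push(n)
pulled = [stack.pull() for _ in range(25)]  # values come back newest first
```

From the program's point of view the stack appears deeper than ten entries; the cost is the handler time spent on each spill or refill.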
The Status and Control Flag (SCF) register is also 16 bits wide. Bits 0 and 15 are both set to 1 and are normally unused, while the remaining fourteen are actively used. These include common flags like CRY to indicate an addition resulted in a carry, OVF if it overflowed, and LINK, which holds the bit shifted in or out during shift and rotate instructions. LINK is normally handled using the carry flag in most microcomputer CPU designs, but having two separate flags is more common in minicomputers where there are enough available status bits, as it allows the two to be tracked separately during a series of shift/rotate and add instructions, which is a common sequence. The IN EN flag, normally 1, allows interrupts to be enabled or disabled. One unique feature of the PACE, not present in the IMP-16, is the BYTE flag. When this is turned on, data is accessed in 8-bit units instead of 16-bit words. This allows for easier processing of 8-bit data like ASCII text.
The rest of the bits in the SCF are mostly mapped directly onto pins on the outside of the chip. Bits 1 through 5 are the IE1 through IE5 flags, which are used to control interrupts in a priority fashion. IE1 is set only in the case of a stack overflow/underflow. The other four can be used to disable individual interrupt lines, or more commonly, produce a binary value from 0 to 15 that external devices use to determine whether or not they should perform an interrupt. For instance, if the value in these flags adds up to 5, any device with an interrupt value of 5 or lower (1 is the highest priority) can raise it, while a device wishing to raise a lower-priority interrupt, say 7, is being instructed to hold it.
Similarly, SCF flags F11 through F14 are used as outputs to provide direct control over external devices. For instance, they might be used to indicate that device 6 should present data on the bus, which it might do by mapping 128 bytes of internal buffer onto the split base page described below.
In contrast to most microcomputer designs of the era, the PACE did not use variable-length instructions; all instructions used 16 bits. The 16-bit words were broken into a series of bit fields for the instruction format. The top six bits, 10 through 15, held the opcode, while bits 8 (R for Relative) and 9 (X for indeX) indicated the addressing mode. The remaining eight bits in the instruction normally held an 8-bit address. This meant that an arbitrary memory location could not be specified directly; several different systems were used to build the required 16-bit address from the 8-bit value. There were 43 instructions and 45 opcodes, with two opcodes each for the load and store instructions (see below).
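The field layout described above can be expressed as a short decoding sketch. The function and field names are hypothetical, and the example word is an arbitrary bit pattern chosen to exercise the fields, not a real PACE opcode.

```python
from typing import NamedTuple


class Fields(NamedTuple):
    opcode: int  # bits 15..10
    x: int       # bit 9, index flag
    r: int       # bit 8, relative flag
    disp: int    # bits 7..0, the 8-bit address field


def decode(word):
    """Split a 16-bit instruction word into the fields described in the text."""
    word &= 0xFFFF
    return Fields(
        opcode=(word >> 10) & 0x3F,
        x=(word >> 9) & 1,
        r=(word >> 8) & 1,
        disp=word & 0xFF,
    )


# Arbitrary pattern: opcode 110000, X=1, R=0, address field 0x12.
example = decode(0b110000_1_0_00010010)
```

Because every instruction is a single word, decoding never needs to look ahead at following words, at the cost of only eight bits being available for the address.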
When X was zero, the address bits represented a direct address in memory. With R also set to zero, the address was within the base page, normally the first 256 bytes of memory. Setting R to 1 and X to 0 used the eight address bits as an offset from the PC. Setting the X bit to 1 turned on indexing, which added the eight bits to the value in one of the index registers: with R at 0 the value in AC2 was used, and with R at 1, AC3 was used instead.
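The four combinations of X and R can be summarized in a small sketch. Two points here are assumptions rather than statements from the text: the 8-bit field is treated as a signed offset in the PC-relative and indexed modes, and the caller supplies whatever PC value the hardware would use as the base. The function names are hypothetical.

```python
def sign_extend8(value):
    """Interpret an 8-bit field as a signed offset (assumed for non-base-page modes)."""
    value &= 0xFF
    return value - 0x100 if value & 0x80 else value


def effective_address(x, r, disp, pc=0, ac2=0, ac3=0):
    """Build a 16-bit address from the X and R bits and the 8-bit field."""
    if x == 0 and r == 0:
        return disp & 0xFF           # base page: the field is the address itself
    offset = sign_extend8(disp)
    if x == 0:
        base = pc                    # X=0, R=1: relative to the program counter
    elif r == 0:
        base = ac2                   # X=1, R=0: indexed by AC2
    else:
        base = ac3                   # X=1, R=1: indexed by AC3
    return (base + offset) & 0xFFFF


base_page = effective_address(0, 0, 0x40)
pc_relative = effective_address(0, 1, 0xFE, pc=0x1000)   # offset of -2
indexed_ac2 = effective_address(1, 0, 0x10, ac2=0x2000)
indexed_ac3 = effective_address(1, 1, 0x80, ac3=0x3000)  # offset of -128
```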
Normally the base page was the first 256 bytes of memory, but when a base-page control pin was asserted, the base page was instead split between the first and last 128 bytes. The idea was that external devices would be mapped onto these high memory locations, and could easily watch for writes and reads by examining the address on the bus and seeing if the top nine bits were all 1s. Oddly, there is no instruction to change the setting of this pin; instead, most systems connected it to one of the status pins and then used the status-changing instructions to control it.
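A minimal sketch of the split base page follows. It assumes that, when the split is enabled, 8-bit addresses with the top bit set map onto the last 128 locations of the 16-bit address space (0xFF80 to 0xFFFF), which is consistent with the "top nine bits all 1s" test in the text. The function names are hypothetical.

```python
def base_page_address(disp, split):
    """Map an 8-bit base-page address to a 16-bit address."""
    disp &= 0xFF
    if split and disp & 0x80:
        # Upper half of the field lands in the last 128 locations (assumed mapping).
        return 0xFF00 | disp
    return disp


def is_device_region(address):
    """External-device check: are the top nine address bits all 1s?"""
    return (address & 0xFF80) == 0xFF80


low = base_page_address(0x10, split=True)
high = base_page_address(0x90, split=True)
unsplit = base_page_address(0x90, split=False)
```

A device only has to decode nine address lines to recognize that an access falls in its region, which keeps the external logic small.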
Indirect addressing in the PACE was limited, supported primarily by the load and store instructions, which loaded and saved values between the registers and memory. Indirect addressing was indicated using separate opcodes, as opposed to using the addressing indication bits. When used, the address was constructed as normal, adding the eight address bits to the base page or PC. It would then read the 16-bit value in that memory location and then load or store from that address. When combined with the X flag, the 8-bit offset is first added to or subtracted from the indicated index register.
Other users of indirect addressing were the increment-and-skip and decrement-and-skip instructions. These incremented or decremented a value in memory and were commonly used to implement loops, thus indirect addressing was common as the control variable for the loop might be located outside the code block. Another interesting feature of these instructions was that (in any addressing mode) if the value was changed to zero, the next instruction was skipped. This allowed loops to be exited without any additional tests; typically the last instruction in the loop would be a jump back to the top of the loop, but when the value reached 0 the processor would automatically skip past that and continue.
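The skip-on-zero loop pattern can be sketched with a tiny three-slot "program". Everything here is a hypothetical model: the slot numbers, the counter address 0x20, and the helper names are illustrative and do not correspond to PACE mnemonics or encodings.

```python
def decrement_and_skip_if_zero(memory, address):
    """Decrement a 16-bit memory word; report whether the next instruction is skipped."""
    memory[address] = (memory[address] - 1) & 0xFFFF
    return memory[address] == 0


def run_loop(count):
    """Run a loop body `count` times using decrement-and-skip plus a jump."""
    memory = {0x20: count}   # loop counter held in memory
    iterations = 0
    pc = 0
    while pc != 3:           # slot 3 is the first instruction after the loop
        if pc == 0:          # slot 0: loop body
            iterations += 1
            pc = 1
        elif pc == 1:        # slot 1: decrement counter, skip slot 2 when it hits zero
            pc = 3 if decrement_and_skip_if_zero(memory, 0x20) else 2
        else:                # slot 2: jump back to the top of the loop
            pc = 0
    return iterations


five_passes = run_loop(5)
```

The loop needs no separate compare instruction: the decrement itself decides whether the jump at the bottom is executed or skipped.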
This style of looping control is common in minicomputers, but not so in microcomputer designs. In dedicated micros, this sort of operation is normally accomplished with several instructions: one that compares the loop index with a given value (in this case, zero), and another that branches back to the top if the condition is not met. The PACE's inherent skip-on-zero was a common feature of minis that sped loop performance by avoiding a separate test. When not appropriate, the increment or decrement could set the offset to zero to avoid triggering this feature.
Continued improvement in semiconductor fabrication in the early 1970s led to the introduction of the NMOS logic concept, or nMOS. This type of logic has the significant advantage that its internal transistors do not require a large voltage on the substrate layer, as pMOS does. In practical terms, this means an nMOS processor can operate with only two input voltages rather than three, and the positive supply can be set to +5V, making interfacing with TTL circuits trivially easy.[2]
National Semiconductor took advantage of this technique with a redesign of the PACE in nMOS to create the INS8900. The new version retained much of the original chip layout, although, unsurprisingly, some of the power supply pins changed their inputs; the original +5V VSS was now ground (GND), VBB changed from +8V to -8V, and the former -12V VGG became the +12V VDD. For reasons unknown, two other pins did not change function but did change name; CLK became VCC and NCLK became CLKX.
The most important change in terms of usage was that the various signal pins now worked at TTL voltages, allowing them to communicate directly with external systems like memory. This change did not address the issue of having to latch the address on the shared data/address bus, but it did make such latching much easier. Instead of requiring the relatively complex BTE chip, this task could now be performed by common TTL components, although National Semiconductor suggested their own INS8208 and INS8212 for this purpose. The bus could now be implemented by a single INS8208 buffering the control signals that indicated whether the bus was in address or data mode (among other things), two more INS8208s each buffering 8 bits of data, and two INS8212s each latching 8 bits of the address.
Another change made possible by the lower loads in nMOS was that clock signals no longer required as much power. This eliminated the need for the STE, which could be replaced by a suitable crystal and a single 7404 inverter, available from many manufacturers. As the external clock was no longer high-power, only one clock input was needed, the former NCLK, now renamed CLKX. The former second phase was now generated onboard the CPU. These changes also allowed the system to run at a higher speed; a 2 MHz crystal was recommended, a significant increase from the PACE's 1.33 MHz. This improved instruction times to 8 to 20 microseconds.
Other changes included a number of fixes to problems found in the PACE. Notable among these was a problem with the interrupt that was triggered when the stack filled. In the PACE this did not work properly; if the interrupt arrived at exactly the same time as a NIR3 or NIR5, the wrong interrupt code would be called from location 0 rather than 2. National Semiconductor suggested either not using this feature, or placing the same address in both locations so they would always call the same code, which would then determine what had actually occurred. There were similar problems when a level-0 interrupt occurred within 12 cycles of other interrupts, causing the wrong code to be called. All of these problems were solved in the 8900.
Although the PACE ran at a relatively fast clock speed for the era, the instruction set architecture (ISA) was implemented using microcode and the multiplexed bus required two cycles for each memory access. As a result, a typical instruction took about 12 to 30 microseconds to complete, making it about the same speed as contemporary 8-bit processors like the Intel 8080. This still provided an advantage when working with larger data, for instance in a floating point library, as a single instruction could process twice as much data in a single operation.
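The timing figures quoted for the two chips are consistent with simple clock scaling, as the following arithmetic sketch shows. It assumes that instruction times scale inversely with clock frequency, that is, that the INS8900 uses the same number of clock cycles per instruction as the PACE; the variable names are illustrative.

```python
PACE_PERIOD_US = 0.750                    # 750 ns clock period
pace_clock_mhz = 1 / PACE_PERIOD_US       # about 1.33 MHz
ins8900_clock_mhz = 2.0                   # recommended crystal

speedup = ins8900_clock_mhz / pace_clock_mhz   # about 1.5, i.e. roughly 50% faster

pace_times_us = (12, 30)                  # typical PACE instruction times
ins8900_times_us = tuple(t / speedup for t in pace_times_us)  # about (8, 20)

print(f"clock: {pace_clock_mhz:.2f} MHz -> {ins8900_clock_mhz:.2f} MHz")
print(f"speedup: {speedup:.2f}x")
print("INS8900 instruction times (us):", [round(t, 1) for t in ins8900_times_us])
```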