Monday, August 17, 2015

Addressing modes, part 1: Instruction decoding

It's been more than a week since my last post, and I've been pretty busy.  Between work and kids' birthday parties, I haven't had a whole lot of free time - and what I've had I honestly haven't wanted to spend writing code.  Nevertheless I've been making my way through DEC's Introduction to Programming and have done a number of small test programs to work things out.  As I said in my earlier post, PDP-8 machine language is an odd beast, but I feel like I've got a decent handle on the major topics, and so I thought I'd try to put some thoughts down on virtual paper.

The next few posts won't be a primer on PDP-8 assembly language by any means.  For that I'd refer you to the aforementioned Introduction to Programming.  Instead I'll just talk about the topics that stood out to me as different from other machines.
 
The first topic I wanted to tackle is the two addressing modes, because I think they're sufficiently different from other machines that they'd cause some confusion, but before we can tackle that, I think it's important to understand how instructions are encoded.

Unlike most other CPUs, the PDP-8 encodes every instruction within a single 12-bit word (remembering, of course, that the PDP-8 is a 12-bit architecture, not 8-bit).  Most machines that I'm familiar with use a variable number of bytes for a single instruction.  Take, for example, the LDA instruction for the 6502 architecture, which loads a value into the accumulator.  Used with an immediate value (a hard-coded number), it requires 2 bytes to encode.  Used with absolute addressing (a 2-byte memory address), it takes up 3 bytes.  On the PDP-8, however, the whole instruction, operand included, has to fit in one 12-bit word.  So how does this work?
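To make the size difference concrete, here is a small Python sketch (not from the DEC manual).  The operand values $10 and $1234 are arbitrary examples chosen for illustration, and the helper name fits_in_pdp8_word is made up; the 6502 opcode bytes $A9 and $AD are the standard encodings for LDA immediate and LDA absolute.

```python
# 6502: the instruction length depends on the addressing mode.
LDA_IMMEDIATE = bytes([0xA9, 0x10])        # LDA #$10  -> opcode + 1-byte value
LDA_ABSOLUTE = bytes([0xAD, 0x34, 0x12])   # LDA $1234 -> opcode + 2-byte address, low byte first

# PDP-8: every instruction is exactly one 12-bit word.
PDP8_WORD_BITS = 12
PDP8_WORD_MASK = (1 << PDP8_WORD_BITS) - 1  # 0o7777, the largest 12-bit value


def fits_in_pdp8_word(value: int) -> bool:
    """Return True if value can be stored in a single 12-bit word."""
    return 0 <= value <= PDP8_WORD_MASK


if __name__ == "__main__":
    print(len(LDA_IMMEDIATE), "bytes for LDA immediate")
    print(len(LDA_ABSOLUTE), "bytes for LDA absolute")
    print(oct(PDP8_WORD_MASK), "is the largest PDP-8 word")
```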

The figure above, taken from the Introduction to Programming manual, lays out how the 12 bits are used.  (Note that this is only for MRI [Memory Reference Instructions], the instructions that read or write memory.  There are other operations, but these are the most important ones to start with.  We'll cover the others in future posts.)  The first three bits (Operation Code) indicate the specific operation: AND, TAD, DCA, JMP, ISZ, or JMS.  I won't get into what each of these does, since I think they would be pretty familiar to any semi-knowledgeable programmer, except for some of the weird mnemonics.  Think of them like this:

  • TAD = ADD
  • DCA = STO or MOV
  • JMS = JSR or CALL
  • ISZ = BEQ or JE
  • JMP and AND are pretty universal mnemonics, I think
These aren't exact translations (ISZ, which increments a memory word and skips the next instruction if the result is zero, is definitely a little weird compared to BEQ or JE), but they're close enough for this discussion.  And these are pretty much the core instructions you'd expect of any assembly language.  Of course, these aren't the only instructions you'd need for a full language.  Now the clever reader might have already figured out that with only 3 bits to designate the operation, we get 8 possible codes, which leaves us only 2 more after these six.  There are actually quite a few more operations, but they work very differently and don't access memory, so we'll come back to those in a future post.  Right now we're focused on addressing modes and memory.
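The arithmetic on the 3-bit operation code can be sketched in a few lines of Python.  The numeric opcode assignments below (AND = 0 through JMP = 5) are the standard PDP-8 values rather than something spelled out in this post, and the names MRI_OPCODES, opcode_of and mnemonic are just illustrative.

```python
# Standard PDP-8 operation codes for the six memory reference instructions.
MRI_OPCODES = {0: "AND", 1: "TAD", 2: "ISZ", 3: "DCA", 4: "JMS", 5: "JMP"}


def opcode_of(word: int) -> int:
    """Extract the top 3 bits of a 12-bit instruction word."""
    return (word >> 9) & 0o7


def mnemonic(word: int) -> str:
    """Name the MRI for a word, or report that it is not an MRI (opcodes 6 and 7)."""
    return MRI_OPCODES.get(opcode_of(word), "non-MRI")


if __name__ == "__main__":
    total_codes = 2 ** 3                      # 3 bits give 8 possible codes
    print(total_codes - len(MRI_OPCODES), "codes left over for everything else")
    for word in (0o1377, 0o5200, 0o7200):
        print(oct(word), mnemonic(word))
```

Octal is the natural way to write these words: each octal digit is 3 bits, so the first digit of a 4-digit octal word is the operation code.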

So with 3 of our 12 bits used, that leaves only 9 bits for the operand.  Remember, we can't use a second 12-bit word as our operand; we have to contain the entire instruction in these 12 bits.  So limited to only 9 bits, we could access a maximum of 512 addresses.  But it's even worse than that.  As you see from the diagram, bit 3 indicates whether you're doing direct addressing or indirect addressing, and bit 4 indicates whether you're accessing the current page or page 0.  That leaves only 7 bits for an actual address, which means that we can only reach 128 words directly.  Obviously, since the base PDP-8 had 4K words of memory and was expandable to 32K, there's more to it than just this, but understanding the 7-bit limitation is important to understanding why the addressing modes work the way they do.
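Here is a minimal Python sketch of pulling those fields out of an instruction word.  It assumes DEC's bit numbering, where bit 0 is the leftmost (most significant) bit, and the usual convention that bit 3 set means indirect and bit 4 set means current page (clear means page 0).  The names MRIFields and decode_mri are invented for this example.

```python
from typing import NamedTuple


class MRIFields(NamedTuple):
    opcode: int         # bits 0-2: operation code
    indirect: bool      # bit 3: True = indirect addressing, False = direct
    current_page: bool  # bit 4: True = current page, False = page 0
    offset: int         # bits 5-11: 7-bit address within the page (0-127)


def decode_mri(word: int) -> MRIFields:
    """Split a 12-bit memory reference instruction into its four fields."""
    if not 0 <= word <= 0o7777:
        raise ValueError("a PDP-8 word is 12 bits")
    return MRIFields(
        opcode=(word >> 9) & 0o7,
        indirect=bool(word & 0o400),      # bit 3 from the left has value 0o400
        current_page=bool(word & 0o200),  # bit 4 from the left has value 0o200
        offset=word & 0o177,              # low 7 bits
    )


if __name__ == "__main__":
    # 0o1377 = 001 0 1 1111111: opcode 1, direct, current page, offset 0o177
    print(decode_mri(0o1377))
    # 0o3410 = 011 1 0 0001000: opcode 3, indirect, page 0, offset 0o10
    print(decode_mri(0o3410))
```

The largest possible offset is 0o177 (127), which is where the 128-word limit comes from.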

In the next post (hopefully in a day or two), we'll cover memory paging and page 0 - basically what bit 4 is for.
