Saturday, August 22, 2015

Addressing modes, part 2: Memory paging and Direct addressing

First, a brief reminder: octal is the standard numbering system for the PDP-8, but for most of us decimal is generally easier to understand.  As a general rule, I will use octal when referring to memory addresses or encoded data, but decimal for quantities and most other things.  To make it clear which one I'm using, I will adopt the C language practice of preceding octal numbers with a leading 0.  If there is no leading zero, it is a decimal number.
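For anyone who wants to check the numbers in this post, here is a minimal Python sketch of that convention.  The helper name parse_number is my own invention, and note that Python itself writes octal literals as 0o200 rather than 0200.

```python
def parse_number(text: str) -> int:
    """Read a number written in this blog's convention.

    A leading 0 means octal (as in C); anything else is decimal.
    parse_number is a hypothetical helper, not part of any PDP-8 tool.
    """
    if len(text) > 1 and text.startswith("0"):
        return int(text, 8)
    return int(text, 10)


print(parse_number("0200"))  # 128
print(parse_number("128"))   # 128
print(oct(128))              # 0o200
```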

So as I discussed in the last post, all operations are encoded into a single 12-bit word.  The operation code and the operand are both contained within that one word, unlike most other architectures where the operand is a second or even a third byte after the first.  And of these 12 bits, only 7 are used for the memory address.  7 bits can represent only 128 different values, but even a basic PDP-8 has 4k words of RAM.  So how does this work?

The PDP-8 breaks its entire bank of RAM into 128-word pages.  The first starts at address 0, the second at 0200, the third at 0400, and so on.  There is no actual physical break in RAM, of course; it's just a paging scheme to support the instruction set.
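The page arithmetic is easy to sketch in Python.  The names PAGE_SIZE, page_number and page_start are mine, chosen just for illustration; the only facts used are the 128-word page size and the 4k-word memory described above.

```python
PAGE_SIZE = 0o200  # 128 words per page


def page_number(address: int) -> int:
    """Which 128-word page an address falls in (page 0 is the first)."""
    return address // PAGE_SIZE


def page_start(address: int) -> int:
    """First address of the page that contains the given address."""
    return address - (address % PAGE_SIZE)


for addr in (0o0, 0o177, 0o200, 0o250, 0o400, 0o7777):
    print(f"{addr:04o}: page {page_number(addr):2d}, starts at {page_start(addr):04o}")

# A 4k machine has 4096 words, which works out to 32 pages.
print(4096 // PAGE_SIZE)  # 32
```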

To better understand this, let's examine a single assembler instruction: TAD 0250, which for the purposes of this example is in location 0200 (as we'll see, the location of the instruction is important for Direct addressing).  As mentioned in the last post, TAD is the mnemonic for two's complement addition to the accumulator.  It will take whatever value is in address 0250 and add it to the accumulator.  Now you might be wondering how 0250 is encoded in 7 bits since, as we've already observed, 7 bits can hold only 128 values, the largest being 127 or 0177.  The answer, of course, is that 0250 isn't encoded in full.  Since 0250 is found within the same page as the instruction itself (the page goes from 0200 to 0377), only the address relative to the start of the page is encoded.  Thus the operand for the instruction is encoded as 050, not 0250.
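In code, "relative to the start of the page" just means keeping the low 7 bits of the address.  Here is a small Python sketch; the mask and function names are my own.

```python
OFFSET_MASK = 0o177   # low 7 bits: position within a page
PAGE_MASK = 0o7600    # high 5 bits: which page


def same_page(instruction_address: int, target_address: int) -> bool:
    """True when both addresses share the same 128-word page."""
    return (instruction_address & PAGE_MASK) == (target_address & PAGE_MASK)


def page_offset(target_address: int) -> int:
    """The 7-bit value that actually ends up in the instruction word."""
    return target_address & OFFSET_MASK


# TAD 0250 sitting at location 0200
print(same_page(0o200, 0o250))       # True
print(f"{page_offset(0o250):03o}")   # 050
```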

So why does the assembler make us enter 0250 if it's just going to convert it to 050?  Well, this is where bit 4 comes in, the bit that says whether we're working with the current page or page 0.  Page 0 is the first page, covering addresses 0 to 0177, and it is the only page that is accessible from all other pages.  In the above instruction, TAD 0250 would be encoded with bit 4 set to 1.  But set bit 4 to 0 and now it's accessing page 0.  So that's why the instruction is written TAD 0250 instead of TAD 050; it tells the assembler you want the current page's 050, not page 0's 050.  Of course, in a real program, you'd likely use labels instead of hard-coding addresses, and so much of this would be taken care of automatically, but it's still important to understand as we try to understand the two addressing modes.  The Direct mode is what I have been discussing in this post.  The Indirect mode is the other mode, and it will be the topic of the next post.
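To make the role of bit 4 concrete, here is a rough Python sketch of what an assembler has to do for a Direct-mode instruction.  This is only my illustration, not how PAL8 is actually implemented.  The function name encode_direct is made up, and the opcode numbers come from the standard PDP-8 instruction set (AND=0, TAD=1, ISZ=2, DCA=3, JMS=4, JMP=5) rather than from anything covered in these posts so far.

```python
# Standard PDP-8 opcode numbers for the memory reference instructions.
OPCODES = {"AND": 0, "TAD": 1, "ISZ": 2, "DCA": 3, "JMS": 4, "JMP": 5}

OFFSET_MASK = 0o177
PAGE_MASK = 0o7600


def encode_direct(mnemonic: str, target: int, location: int) -> int:
    """Encode a Direct-mode instruction stored at `location`.

    Bit 4 (the page bit) is 0 for page 0 and 1 for the current page.
    Anything else cannot be reached with Direct addressing.
    """
    if (target & PAGE_MASK) == 0:
        page_bit = 0
    elif (target & PAGE_MASK) == (location & PAGE_MASK):
        page_bit = 1
    else:
        raise ValueError("target is on neither page 0 nor the current page")
    # Bit 3 (indirect) stays 0 because this is Direct mode.
    return (OPCODES[mnemonic] << 9) | (page_bit << 7) | (target & OFFSET_MASK)


print(f"{encode_direct('TAD', 0o250, 0o200):04o}")  # 1250 (current page)
print(f"{encode_direct('TAD', 0o050, 0o200):04o}")  # 1050 (page 0)
```

The two results differ only in the page bit, which is exactly the difference between writing TAD 0250 and TAD 050.  Asking for an address such as 0450 from location 0200 raises an error in this sketch, since it is on neither page.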

One final note for 6502 aficionados.  Page 0 may sound a lot like the 6502's Zero Page, and there certainly are some similarities, but there are also important differences.  Most notably they differ in that, as far as I can tell, there is no real performance benefit to be had by using page 0, other than how it may simplify your code overall.  That is to say, the Zero Page on the 6502 is used for oft-accessed data because it's just faster and requires fewer bytes to encode the instruction, but this isn't the case on the PDP-8.  An instruction fetching data from page 0 takes 12 bits, and so does one fetching from the current page.  And the number of cycles each requires appears to be the same.  So it's certainly a good idea to use page 0 when necessary, but there should be no performance-related compulsion to do so.

Monday, August 17, 2015

Addressing modes, part 1: Instruction decoding

It's been more than a week since my last post, and I've been pretty busy.  Between work and kids' birthday parties, I haven't had a whole lot of free time - and what I've had I honestly haven't wanted to spend writing code.  Nevertheless I've been making my way through DEC's Introduction to Programming and have done a number of small test programs to work things out.  As I said in my earlier post, PDP-8 machine language is an odd beast, but I feel like I've got a decent handle on the major topics, and so I thought I'd try to put some thoughts down on virtual paper.

The next few posts won't be a primer on PDP-8 assembly language by any means.  For that I'd refer you to the aforementioned Introduction to Programming.  Instead I'll just talk about the topics that stood out to me as different from other machines.
 
The first topic I wanted to tackle is the two addressing modes, because I think they're sufficiently different from other machines that they'd cause some confusion, but before we can tackle that, I think it's important to understand how instructions are encoded.

Unlike most other CPUs, all instructions for the PDP-8 are encoded within a single 12-bit word (remembering, of course, that the PDP-8 is a 12-bit architecture, not 8-bit).  Most machines that I'm familiar with use a variable number of bytes for a single instruction.  Take, for example, the LDA instruction for the 6502 architecture, which loads a value into the accumulator.  Used with an immediate value (a hard-coded number), it requires 2 bytes to encode.  Used with absolute addressing (2-byte memory address), it takes up 3 bytes.  On the PDP-8, however, all instructions are encoded within only one 12-bit word.  So how does this work?
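To put the two side by side, here is a small Python sketch.  The 6502 bytes are the standard LDA encodings ($A9 for immediate, $AD for absolute); the operand values and the PDP-8 word (TAD 0250 assembled at location 0200, which comes out as 01250) are just examples I picked.

```python
# 6502 machine code: opcode byte first, then the operand bytes.
LDA_IMMEDIATE = bytes([0xA9, 0x10])        # LDA #$10
LDA_ABSOLUTE = bytes([0xAD, 0x34, 0x12])   # LDA $1234 (low byte first)

# PDP-8 machine code: one 12-bit word holds the whole instruction.
TAD_DIRECT = [0o1250]                      # TAD 0250, assembled at 0200


def fits_in_12_bits(word: int) -> bool:
    """True when a value can be stored in one PDP-8 word."""
    return 0 <= word <= 0o7777


print(len(LDA_IMMEDIATE), "bytes")   # 2 bytes
print(len(LDA_ABSOLUTE), "bytes")    # 3 bytes
print(len(TAD_DIRECT), "word")       # 1 word
print(fits_in_12_bits(TAD_DIRECT[0]))  # True
```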

The figure above, taken from the Introduction to Programming manual, lays out how the 12 bits are used.  (Note that this is only for MRI [Memory Reference Instructions], instructions that read or write memory.  There are other kinds of operations, but these are clearly very important.  We'll cover the others in future posts.)  The first three bits (Operation Code) indicate the specific operation: AND, TAD, DCA, JMP, ISZ, and JMS.  I won't get into what each of these does, since I think they would be pretty familiar to any semi-knowledgeable programmer, except for some of the weird mnemonics.  Think of them like this:

  • TAD = ADD
  • DCA = STO or MOV
  • JMS = JSR or CALL
  • ISZ = BEQ or JE
  • JMP and AND are pretty universal mnemonics, I think

These aren't exact translations (ISZ is definitely a little weird compared to BEQ or JE), but they're close enough for this discussion.  And these are pretty much the core instructions you'd expect of any assembly language.  Of course, these aren't the only instructions you'd need for a full language.  Now the clever reader might have already figured out that with only 3 bits to designate the operation, that leaves us only 2 more operation codes after these six.  There are actually quite a few more operations, but they work very differently and don't access memory, so we'll come back to those in a future post.  Right now we're focused on addressing modes and memory.
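The counting argument is easy to verify in Python.  The opcode numbers below are the standard PDP-8 assignments, which I haven't covered in this post, and the variable names are my own.

```python
# Standard PDP-8 opcode numbers for the six memory reference instructions.
MRI_OPCODES = {0: "AND", 1: "TAD", 2: "ISZ", 3: "DCA", 4: "JMS", 5: "JMP"}

OPCODE_BITS = 3
total_codes = 2 ** OPCODE_BITS

# Whatever the six MRI instructions don't use is left for everything else.
remaining = [code for code in range(total_codes) if code not in MRI_OPCODES]

print(total_codes)  # 8
print(remaining)    # [6, 7]
```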

So with 3 of our 12 bits used, that leaves only 9 bits for the operand.  Remember, we can't use a second 12-bit word as our operand; we have to contain the entire instruction in these 12 bits.  So limited to only 9 bits, we can only access a maximum of 512 addresses.  But it's even worse than that.  As you see from the diagram, bit 3 indicates whether you're doing direct addressing or indirect addressing, and bit 4 indicates whether you're accessing the current page or page 0.  That leaves only 7 bits for an actual address, which means that we can only access 128 words.  Obviously since the base PDP-8 had 4k words and was expandable to 32k, there's more to it than just this, but understanding the 7-bit limitation is important to understanding why the addressing modes work the way they do.
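Here is a minimal Python sketch of pulling those fields out of a 12-bit word.  It assumes DEC's numbering, where bit 0 is the leftmost (most significant) bit and bit 11 the rightmost.  The function name decode_mri and the example word 01250 are mine, not from the manual.

```python
def decode_mri(word: int) -> dict:
    """Split a 12-bit memory reference instruction into its fields.

    Bits are numbered 0 (leftmost) to 11 (rightmost):
      bits 0-2  operation code
      bit  3    indirect addressing flag
      bit  4    page flag (0 = page 0, 1 = current page)
      bits 5-11 7-bit address within the page
    """
    return {
        "opcode": (word >> 9) & 0o7,
        "indirect": (word >> 8) & 1,
        "current_page": (word >> 7) & 1,
        "offset": word & 0o177,
    }


fields = decode_mri(0o1250)
print(fields)  # {'opcode': 1, 'indirect': 0, 'current_page': 1, 'offset': 40}

# 3 + 1 + 1 + 7 bits account for the whole word.
print(2 ** 9, 2 ** 7)  # 512 128
```

Note that the offset prints in decimal here; 40 is 050 in octal.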

In the next post (hopefully in a day or two), we'll cover memory paging and page 0 - basically what bit 4 is for.

Saturday, August 8, 2015

A new PDP-8 project, new stuff to learn

It's been over a week since my last post.  I've been pretty busy at work, with an important deadline looming, but I have to admit I've sort of been taking a break from retrocomputing for a few days.  But as I mentioned in my last post wrapping up the Retrochallenge, I planned to continue working with the PDP-8.  So here I am back at it, with a new project.

I've had a lot of ideas about what to do next, but what I've finally settled on is that I'd like to write a cross-assembler, something that will generate PDP-8 machine code in a BIN file that can be read by the paper tape reader in simh.

But this is a bit of a long-term goal, not an immediate one.  The one downside to my Retrochallenge project, as I mentioned in my previous post, is that learning Fortran shielded me from many of the lower level details of the PDP-8.  So I have to learn that, starting with the PAL8 assembly language itself.  So my next few posts will almost certainly be about what I've learned and how to write code in PAL8 assembly language.

Toward that end, I've started reading the "Introduction to Programming", put out by DEC in 1969.  It's a gentler introduction to PDP-8 assembly language than the OS/8 Handbook that I've been relying on so far.  It seems to take a lot less for granted in terms of knowledge on the part of the reader, and it does a much better job of explaining things.  This is both good and bad, of course.  For example, the first chapter covers numbering systems such as binary and octal (the standard for PDP-8 code), how to perform arithmetic operations on such numbers, etc.  As an experienced programmer, though, I found this mostly just review, so I actually ended up skipping large chunks of it.  But the second chapter starts to get into the meat of the topic, so I'm looking forward to making my way through that.

Just a quick note before going further: My experience with assembly is completely limited to microcomputers, and even then a small set.  Although I've done some x86 assembler in the past, it's been quite some time, and now it's really 6502 assembly that I'm most familiar with.  So naturally as I'm working my way through PDP-8's assembly language, I can't help but compare it to the 6502, and I'm sure that will come out in my posts.  If you're not familiar with the 6502 and its assembly language idioms, then I apologize.  I can only hope that this will not prove to be too much of an obstacle.