PDP-8 Challenge

Saturday, August 22, 2015

Addressing modes, part 2: Memory paging and Direct addressing

A brief preface to remind readers that octal is the standard numbering system for the PDP-8, but for most of us decimal is generally easier to understand. As a general rule, I will use octal when referring to memory addresses or encoded data, but decimal for quantities and most other things. To make it clear when I'm using one or the other, I will adopt the C language practice of preceding octal numbers with a leading 0. If there is no leading zero, it will be a decimal number.

So as I discussed in the last post, all operations are encoded into a single 12-bit word. The operation code and the operand are both contained within that one word, unlike most other architectures where the operand is a second or even a third byte after the first. And of these 12 bits, only 7 are used for the memory address. 7 bits can represent only 128 numbers, but even a basic PDP-8 has 4k words of RAM. So how does this work?

The PDP-8 breaks its entire bank of RAM into 128-word pages. The first starts at address 0, the second at 0200, the third at 0400, and so on. There is no actual physical break in RAM, of course; it's just a paging scheme to support the instruction set.

To better understand this, let's examine a single assembler instruction: TAD 0250, which for the purposes of this example is in location 0200 (as we'll see, the location of the operation is important for Direct addressing). As mentioned in the last post, TAD is the mnemonic for two's complement addition to the accumulator. It will take whatever value is in address 0250 and add it to the accumulator. Now you might be wondering how 0250 is encoded in 7 bits since, as we've already observed, the largest possible number is 127, or 0177. The answer, of course, is that 0250 isn't encoded. Since 0250 is found within the same page as the instruction itself (the page goes from 0200 to 0377), only the address relative to the start of the page is encoded. Thus the operand for the instruction is encoded as 050, not 0250.
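The split between page and in-page offset can be sketched in a few lines of Python. This is only an illustration of the arithmetic described above; the name split_address is made up for this example, and note that Python writes octal with a 0o prefix rather than the bare leading 0 used in this post.

```python
PAGE_SIZE = 0o200  # 128 words per page, as described above


def split_address(addr):
    """Split a 12-bit address into (page number, offset within the page)."""
    if not 0 <= addr <= 0o7777:
        raise ValueError("address must fit in 12 bits")
    return addr // PAGE_SIZE, addr % PAGE_SIZE


page, offset = split_address(0o250)
print(page, oct(offset))  # 1 0o50
```

So address 0250 sits on page 1 (the page running from 0200 to 0377), and only the offset 050 needs to fit in the 7-bit field.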

So why does the assembler make us enter 0250 if it's just going to convert it to 050? Well, this is where bit 4 comes in, the bit that says whether we're working with the current page or page 0. Page 0 is the first page, in the range of 0 to 0177, and it is the only page that is accessible from all other pages. In the above instruction, TAD 0250 would be encoded with bit 4 set to 1. But set bit 4 to 0 and now it's accessing page 0. So that's why the instruction is written TAD 0250 instead of TAD 050; it tells the assembler you want the current page's 050, not page 0's 050. Of course, in a real program, you'd likely use labels instead of hard-coding addresses, and so much of this would be taken care of automatically, but it's still important to understand as we try to understand the two addressing modes. The Direct mode is what I have been discussing in this post. The Indirect mode is the other mode, and it will be the topic of the next post.
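To make the role of bit 4 concrete, here is a rough Python sketch of how an assembler might pick the page bit for a Direct-mode instruction. The function name encode_direct is hypothetical, the opcode numbers are the standard PDP-8 assignments rather than anything stated in this post, and a real assembler such as PAL8 may well handle an out-of-reach target differently from the simple error raised here.

```python
# Standard PDP-8 opcode numbers for the memory reference instructions
# (not given in the post itself).
OPCODES = {"AND": 0, "TAD": 1, "ISZ": 2, "DCA": 3, "JMS": 4, "JMP": 5}


def encode_direct(mnemonic, target, location):
    """Encode a Direct-mode instruction stored at `location` that refers to `target`."""
    target_page = target >> 7      # 128 words per page
    current_page = location >> 7
    offset = target & 0o177        # the 7-bit in-page address
    if target_page == 0:
        page_bit = 0               # bit 4 = 0: page 0
    elif target_page == current_page:
        page_bit = 1               # bit 4 = 1: current page
    else:
        raise ValueError("target is not reachable with direct addressing")
    # Bit 3 (indirect) stays 0 because this sketch covers Direct mode only.
    return (OPCODES[mnemonic] << 9) | (page_bit << 7) | offset


print(oct(encode_direct("TAD", 0o250, 0o200)))  # 0o1250
print(oct(encode_direct("TAD", 0o050, 0o200)))  # 0o1050
```

The two results differ only in bit 4: the same 050 offset means the current page in the first word and page 0 in the second.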

One final note for 6502 aficionados. Page 0 may sound a lot like the 6502's Zero Page, and there certainly are some similarities, but there are also important differences. Most notably they differ in that, as far as I can tell, there is no real performance benefit to be had by using page 0, other than how it may simplify your code overall. That is to say, the Zero Page on the 6502 is used for oft-accessed data because it's just faster and requires fewer bytes to encode the instruction, but this isn't the case on the PDP-8. An instruction fetching data from page 0 takes 12 bits, and so does one fetching from the current page. And the number of cycles each requires appears to be the same. So it's certainly a good idea to use page 0 when necessary, but there should be no performance-related compulsion to do so.

Monday, August 17, 2015

Addressing modes, part 1: Instruction decoding

It's been more than a week since my last post, and I've been pretty busy. Between work and kids' birthday parties, I haven't had a whole lot of free time - and what I've had I honestly haven't wanted to spend writing code. Nevertheless I've been making my way through DEC's Introduction to Programming and have done a number of small test programs to work things out. As I said in my earlier post, PDP-8 machine language is an odd beast, but I feel like I've got a decent handle on the major topics, and so I thought I'd try to put some thoughts down on virtual paper.

The next few posts won't be a primer on PDP-8 assembly language by any means. For that I'd refer you to the aforementioned Introduction to Programming. Instead I'll just talk about the topics that stood out to me as different from other machines.

The first topic I wanted to tackle is the two addressing modes, because I think they're sufficiently different from other machines that they'd cause some confusion, but before we can tackle that, I think it's important to understand how instructions are encoded.

Unlike most other CPUs, all instructions for the PDP-8 are encoded within a single 12-bit word (remembering, of course, that the PDP-8 is a 12-bit architecture, not 8-bit). Most machines that I'm familiar with use a variable number of bytes for a single instruction. Take, for example, the LDA instruction for the 6502 architecture, which loads a value into the accumulator. Used with an immediate value (a hard-coded number), it requires 2 bytes to encode. Used with absolute addressing (2-byte memory address), it takes up 3 bytes. On the PDP-8, however, all instructions are encoded within only one 12-bit word. So how does this work?

The figure above, taken from the Introduction to Programming manual, lays out how the 12 bits are used. (Note that this is only for MRI [Memory Reference Instructions], instructions that read or write memory. There are other operations, but these are obviously very important. We'll cover other operations in future posts.) The first three bits (Operation Code) indicate the specific operation: AND, TAD, DCA, JMP, ISZ, and JMS. I won't get into what each of these does, since I think these would be pretty familiar to any semi-knowledgeable programmer, except for some of the weird mnemonics. Think of them like this:

TAD = ADD
DCA = STO or MOV
JMS = JSR or CALL
ISZ = BEQ or JE
JMP and AND are pretty universal mnemonics, I think

These aren't exact translations (ISZ is definitely a little weird compared to BEQ or JE), but they're close enough for this discussion. And these are pretty much the core instructions you'd expect of any assembly language. Of course, these aren't the only instructions you'd need for a full language. Now the clever reader might have already figured out that with only 3 bits to designate the operation, that leaves us only 2 more operations after these six. There are actually quite a few more, but they operate very differently and don't access memory, so we'll come back to those in a future post. Right now we're focused on addressing modes and memory.

So with 3 of our 12 bits used, that leaves only 9 bits for the operand. Remember, we can't use a second 12-bit word as our operand; we have to contain the entire instruction in these 12 bits. So limited to only 9 bits, we can only access a maximum of 512 addresses. But it's even worse than that. As you see from the diagram, bit 3 indicates whether you're doing direct addressing or indirect addressing, and bit 4 indicates whether you're accessing the current page or page 0. That leaves only 7 bits for an actual address, which means that we can only access 128 words. Obviously since the base PDP-8 had 4k and was expandable to 32k, there's more to it than just this, but understanding the 7-bit limitation is important to understanding why the addressing modes work the way they do.
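The field layout just described can be sketched as a small Python decoder. The name decode_mri and the dictionary keys are invented for this example; the bit positions follow the numbering used above, where bit 0 is the leftmost (most significant) bit of the word. Python writes octal with a 0o prefix.

```python
def decode_mri(word):
    """Split a 12-bit memory reference instruction into its four fields."""
    if not 0 <= word <= 0o7777:
        raise ValueError("word must fit in 12 bits")
    opcode = (word >> 9) & 0o7            # bits 0-2: operation code
    if opcode > 5:
        # Only six of the eight codes are memory reference instructions.
        raise ValueError("not a memory reference instruction")
    return {
        "opcode": opcode,
        "indirect": (word >> 8) & 1,      # bit 3: 0 = direct, 1 = indirect
        "page": (word >> 7) & 1,          # bit 4: 0 = page 0, 1 = current page
        "offset": word & 0o177,           # bits 5-11: 7-bit address
    }


print(decode_mri(0o1250))
# {'opcode': 1, 'indirect': 0, 'page': 1, 'offset': 40}
```

The offset prints in decimal here; 40 is 050 in octal, one of the 128 values the 7-bit field can hold.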

In the next post (hopefully in a day or two), we'll cover memory paging and page 0 - basically what bit 4 is for.

Saturday, August 8, 2015

A new PDP-8 project, new stuff to learn

It's been over a week since my last post. I've been pretty busy at work, with an important deadline looming, but I have to admit I've sort of been taking a break from retrocomputing for a few days. But as I mentioned in my last post wrapping up the Retrochallenge, I planned to continue working with the PDP-8. So here I am back at it, with a new project.

I've had a lot of ideas about what to do next, but what I've finally settled on is that I'd like to write a cross-assembler, something that will generate PDP-8 machine code in a BIN file that can be read by the paper tape reader in simh.

But this is a bit of a long-term goal, not an immediate one. The one downside to my Retrochallenge project, as I mentioned in my previous post, is that learning Fortran shielded me from many of the lower level details of the PDP-8. So I have to learn that, starting with the PAL8 assembly language itself. So my next few posts will almost certainly be about what I've learned and how to write code in PAL8 assembly language.

Toward that end, I've started reading the "Introduction to Programming", put out by DEC in 1969. It's a gentler introduction to PDP-8 assembly language than the OS/8 Handbook that I've been relying on so far. It seems to take a lot less for granted in terms of knowledge on the part of the reader, and it does a much better job of explaining things. This is both good and bad, of course. For example, the first chapter covers numbering systems such as binary and octal (the standard for PDP-8 code), how to perform arithmetic operations on such numbers, etc. Being an experienced programmer, though, this was mostly just review, so I actually ended up skipping large chunks of this. But the second chapter starts to get into the meat of the topic, so I'm looking forward to making my way through that.

Just a quick note before going further: My experience with assembly is completely limited to microcomputers, and even then a small set. Although I've done some x86 assembler in the past, it's been quite some time, and now it's really 6502 assembly that I'm most familiar with. So naturally as I'm working my way through PDP-8's assembly language, I can't help but compare it to the 6502, and I'm sure that will come out in my posts. If you're not familiar with the 6502 and its assembly language idioms, then I apologize. I can only hope that this will not prove to be too much of an obstacle.

Thursday, July 30, 2015

Challenge wrap-up and retrospective

Well, to no one's surprise at least among those who have been following this blog, I am not going to complete my project. I did actually get some coding done on Hammurabi, but honestly I didn't get very far. It's not a big game, and I think I'm finally well positioned to get some serious work done on it, so it would still be possible to knock it out today if I had the time, but I just don't. So I'll just accept that it's not going to get done.

But by no means do I consider this project a failure. To refresh our memory, my project was to develop a text-based game (Hammurabi) in Fortran on a PDP-8 running OS/8 using the simh simulator instead of a real machine (since I don't have one and don't have several thousand in spare cash to get one). But the real point of the project, from my standpoint at least, was really to learn about a system and language that I knew nothing about. And in that I think I succeeded.

When I started at the beginning of July, all I knew of the PDP-8 was that it was a minicomputer from 1965 that had been an important step in computer history. I had actually seen several at VCF SE a few months ago, but hadn't touched or worked on any. I hadn't even seen the front panel light up - and I wouldn't have known what the lights meant if it had!

Similarly, I had never worked in - nor knew anything about - Fortran. In fact, when I first contemplated this project, I actually wanted to use COBOL, but I couldn't find a COBOL for the PDP-8. Then I thought about ALGOL, given it seems to have been an influential language, but I finally settled on Fortran only because the OS/8 Handbook had two large sections on Fortran (though, as it turned out, it was for two different versions of Fortran, so really only one section was of use to me).

By the end of this retrochallenge, I feel I'm pretty comfortable with OS/8, TECO, and Fortran. I still have a lot to learn, but I can get around the system and do common things pretty easily. My only regret in this regard is that by choosing a high level language like Fortran, I really didn't gain as much knowledge of the PDP-8 architecture as I might have if I'd done a project in PAL8 assembly language. But still I'm very happy with what I accomplished and where I ended up, even if I didn't actually finish the project.

Having spent a month with the PDP-8 and OS/8, I can say that I really like the system - better in some ways than most microcomputers, in fact. Most microcomputers use BASIC as their operating environment, and while I used BASIC extensively back in the day, I really don't like it much now. When I first got into vintage computing, I did a few BASIC projects for the sake of nostalgia and discovered that, unlike the microcomputer systems themselves, BASIC just hasn't aged well - for me, at least. So I like that OS/8 is an operating system that just includes BASIC as another utility that you can run, but not something that you're forced to use. So I fully intend to do more with the PDP-8. I'm still looking forward to getting my PiDP-8 in the second run in October, and between then and now I want to continue to sharpen my PDP-8 skills.

I actually have several PDP-8 projects in mind. I'd like to attempt to build a full-screen editor, possibly with VT100 support, though that's not really a small project. I'd also like to write something to be able to transfer files between OS/8 disk images and the host filesystem. I don't like writing code that I can't put up on Github, and I certainly don't intend to port git to OS/8! Also I really like what Cat's Eye Technologies did for their challenge and thought about doing something similar for the PDP-8.

On the other hand, while I don't hate Fortran, I can't say I'm overly fond of it either. I could see using it as a glue language to bring assembly modules together - that's what Adventure did. But honestly I doubt I'll use it at all again. That doesn't mean I regret learning it, or that I think it's a horrible language. It actually has some cool features, though ultimately for me nothing that will compel me to use it. It probably falls into the class of languages that I would use if there was a specific purpose, but I won't go out of my way to use it.

So where to from here? Well, as I said, I will continue to work with the PDP-8. My two top projects will be to develop a file transfer utility, and to learn PAL8. There is an open source PAL-8 cross-assembler that I could use, if I get the file transfer utility done, but I probably will just stick with doing assembler on the PDP-8 itself for the moment. I plan to continue to blog about my PDP-8 experiences, using this very blog, so if you have any interest, feel free to check back.

Finally, with regards to the Retrochallenge itself, this is my first time participating and I can say without any hesitation that I love it! I'm not a particularly focused person, and I often drift from project to project, but this was great for keeping me on target. And watching the other projects as they developed over the past month was very inspirational. My only regret is that the Retrochallenge only occurs twice a year! Already thinking about what to do for January...

Wednesday, July 29, 2015

The BATCH utility

In my last post, I talked in some depth about how to build a multi-module Fortran program, and then at the end lamented a little bit about the lack of any way to automate builds. Well, as it turns out, I was wrong! There is an OS/8 utility called BATCH that allows you to do this. In my first pass through the OS/8 Handbook, I came across this utility, but since they really describe it as a batch processing utility, which isn't anything I'm really doing, I skipped over it and kind of forgot about it.

But after writing my post yesterday, I recalled that it was there and wondered if it could be used to automate builds. And, sure enough, it can - quite easily, in fact. It's not without its own quirks and pitfalls, but it is a nice little utility that pretty much allows you to execute any set of commands.

To use this, you have to create a text file with your commands. There are a few tricks, but it's generally straightforward. First, the file has to be on a permanent device. So a hard disk (RK device) is acceptable, but a floppy (RX) is not. But on my system, the second partition, RKB0, wasn't acceptable either. It had to be on RKA0. Since I'm developing my source on a floppy image, it's a little annoying that the equivalent of my make file has to be on the main hard drive, but c'est la vie. There may be a way around this, but for now it's acceptable.

Using the BATCH utility is pretty simple. First, come up with your build script (I named mine simply BUILD), such as this one for my little test program:

$JOB
/ CLEAN EXISTING FILES
.DEL RXA0:TEST.RL
.DEL RXA0:TEST.LD
/ COMPILE
.R F4
*RXA0:TEST<RXA0:TEST
/ LOAD
.R LOAD
*RXA0:TEST<RXA0:TEST
*$
/ EXECUTE
.R FRTS
*<RXA0:TEST$
$END

If you're already familiar with OS/8, or you read my last post, then much of the above is already familiar. Also, I assume $JOB and $END are pretty self-explanatory. As you might also infer from the above, lines that start with a slash are comments. Everything else is pretty much the same OS/8 commands that you should recognize. The only two things to note are these. First, you must precede each line with a '.' or '*', as the Keyboard Monitor or Command Decoder would do if you were entering the commands interactively. So when you run a program, it is ".R <program>", not just "R <program>", and the parameters are preceded by a "*". Second, when you need an ALTMODE character (ESC on modern keyboards), you just use a $, i.e., Shift-4.

Calling this file is even easier. The CCL command, SUBMIT, calls the BATCH utility, so you can execute your batch file by:

.SUBMIT BUILD

Of course, this still isn't a proper build system as we're used to now. There's no conditional compilation based on what's changed and no error checking. So if the compile fails, for example, it will still perform the LOAD and FRTS commands. But this is still much better than the alternative, typing in each command by hand every time.
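If you keep several small test programs around, the script text itself is easy to generate on the host side. The following is just a Python sketch that reproduces the BUILD file shown above for a given device and program name; the function name build_script is made up, and getting the resulting text onto RKA0 inside the simulator is a separate problem that this does not address.

```python
def build_script(device, name):
    """Return BATCH script text that cleans, compiles, loads and runs one Fortran program."""
    target = f"{device}:{name}"
    lines = [
        "$JOB",
        "/ CLEAN EXISTING FILES",
        f".DEL {target}.RL",
        f".DEL {target}.LD",
        "/ COMPILE",
        ".R F4",
        f"*{target}<{target}",
        "/ LOAD",
        ".R LOAD",
        f"*{target}<{target}",
        "*$",                  # $ stands in for ALTMODE
        "/ EXECUTE",
        ".R FRTS",
        f"*<{target}$",
        "$END",
    ]
    return "\n".join(lines) + "\n"


print(build_script("RXA0", "TEST"), end="")
```

Called with "RXA0" and "TEST", it prints the same fifteen lines as the script above.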

Tuesday, July 28, 2015

Some answers to old problems and why development shortcuts can be bad

This is a pretty long post, so for that I apologize, but two big topics to cover.

If you've been following this blog, in the last post I indicated that I was having a problem loading the Fortran standard library. The FORLIB.RL seemed to be missing from my system. I'm happy to report that I've gotten past this issue. I found a floppy disk image that had a complete OS/8 Fortran IV system, I copied the FORLIB.RL to my main partition, and it works.

Interestingly, it seems to work without me having to load it. As you may recall, to load all the modules for an application, you use the LOAD utility and specify all the modules, the order in which they are loaded (for purposes of overlaying code), and what the main program file name will be. In reading through this section in the Handbook, I didn't see anything about loading the standard library, so I assumed that it just happened automatically. Then, as I discussed in yesterday's blog, I ran into problems and noticed that in their sample LOAD session, they actually loaded the standard library explicitly but using a different file name (LIB.RL instead of FORLIB.RL). It gives no explanation why it does this, so then I assumed that you had to explicitly load the standard library.

So I wrote a little test program and tried to load the FORLIB.RL in LOAD, but kept getting a BAD INPUT FILE error. But then I noticed something: When I failed to load in the standard library, my little test program still ran correctly. Apparently it does load in the standard library automatically if it finds FORLIB.RL. This puzzled me because I wondered why would they load it in manually in the sample LOAD session, and why was it giving me the BAD INPUT FILE error?

After some thought, I think I have at least part of an answer. I have no clue about the BAD INPUT FILE, but I think I understand why they were loading it in manually in LOAD. This utility does two things. Most obviously it allows the developer to load all the modules. But it also allows for modules to be defined as overlays. Such a module would not be loaded into memory when the program first starts, but as the program executes it can be swapped into memory as needed. This allows a Fortran program to be as large as 300k even though there's only a maximum of 32k memory. The standard library is quite large, so I assume by loading the FORLIB.RL manually, the sample session was showing how to turn it into an overlay module so that it's not eating up memory for the whole life of the program. I'm still not sure why it was named LIB.RL though, but since I'm not really wanting to concern myself with overlays, I'm happy to ignore this issue.

But in looking at this issue, I did actually end up resolving an earlier issue concerning subroutines and functions. I talk about the issue in two of my prior blogs: Random Linking Woes and Still learning, still haven't gotten anywhere. Basically to do a subroutine or function, each has to be defined as a separate module. But I couldn't figure out how to load these modules.

The problem really comes from the fact that I was using the COMPILE and EXECUTE commands. These are nice commands sort of built in to OS/8. They allow you to build and execute simple programs without knowing what you're doing. In essence, they're development shortcuts. Now I'm not opposed to such shortcuts, but as a professional developer now for more than two decades, I've always followed the rule that you should only use such shortcuts when you know what they're doing. If you don't, then when things screw up as they inevitably do, you don't know how to fix it. Unfortunately I didn't follow that rule with this project, so when I ran into issues with subroutines, I didn't really know how to get around it.

But this issue forced me to really break down and understand in more depth about how to build a Fortran app, and that's led me back to the solution for subroutines and functions. As I mentioned in yesterday's blog, there are 3 steps to building an application:

Compile - The F4 utility
Loading - The LOAD utility
Execution - The FRTS utility

When you use the COMPILE command, it does the first step, but not the second or third. But the EXECUTE command does all three, and if successful, your app just "magically" runs without your really being aware of the steps involved. Unfortunately, EXECUTE doesn't seem to be a very intelligent command, so it doesn't seem to really know how to load separate modules. So it's perfectly good for simple applications, but not really useful for anything of any real size or complexity.

So let's assume you have a program with a main module PROG1.FT, a subroutine module SUB1.FT, and a function FUNC1.FT. We're going to assume that this will all be part of the main application segment; we'll keep things simple and not use any overlays. When building a Fortran application, you first use the F4 utility to build each module, like so:

.R F4

*PROG1<PROG1 <-- In OS/8, extensions are assumed, so you need them only if they are not standard

.R F4

*SUB1<SUB1

.R F4

*FUNC1<FUNC1

This produces an RL for each module. The next step is to load all the RL files:

.R LOAD

*PROG1<PROG1

*<SUB1,FUNC1

*$ <-- This is the ALTMODE key, which echoes as $ and is ESC in simh

Use TECO and you will become very familiar with this.

While I did the load in two lines above, it would have been equally valid to do them in one line with commas, or even in three separate lines. It doesn't make a difference.

Finally, you can execute PROG1.LD by:

.R FRTS

*PROG1

And there you have it, all the steps to build a Fortran IV program on OS/8 complete with subroutines, functions, and whatever modules you want. To build in overlay support, as far as I know, all you have to do is do some more steps in LOAD. At that point, you could write a 300k Fortran program. But that's really beyond the scope of this blog - and of this project.

As a quick final note, the one issue I see with the above is that almost all of these steps have to be repeated every time you build the application. The only part you can skip is that you don't have to recompile a module if it hasn't changed, but otherwise everything above has to be done each time. Unfortunately, as far as I know, OS/8 doesn't have any way to automate this.

Monday, July 27, 2015

Time to write some code! Ok, maybe not quite yet...

So after nearly two weeks, I'm finally back to working on my project. First, I was waylaid for nearly a week with the flu, then I found myself tied up with real life (my least-favorite retro project ever). I will confess that I did also blow a few days on non-retro computer gaming, but now I'm back to see if I can wrap this up before the weekend. Unfortunately, with the way it's been going, I'm not optimistic.

As I indicated with my last post, I wanted to tackle the random number generator first. This seemed to be the biggest remaining hurdle, as I figured the rest of the coding would be pretty straightforward, but I really didn't expect it to take long. After doing a little research, I found that a linear congruential generator was probably my best bet. Mathematically, it's pretty simple.

Shamelessly stolen from the Wikipedia entry for Linear Congruential Generator, entry linked above

Just multiplication, addition, and modular arithmetic. Now Fortran IV doesn't have a modulo operator (which surprised me a little, since math is supposed to be its strong point), but it does have a MOD function in its standard library that takes two integers and returns a single integer. Excellent! Or so I thought...
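For reference, the recurrence is next = (a * current + c) mod m. Here is a minimal Python sketch of it, not the Fortran I was writing. The constants a=5, c=1, m=4096 are purely illustrative choices on my part (picked so the values fit in 12 bits and the sequence cycles through every value before repeating); they are not taken from the Wikipedia entry or from any real program.

```python
def lcg(seed, a=5, c=1, m=4096):
    """Yield an endless stream of values from x -> (a * x + c) mod m."""
    x = seed
    while True:
        x = (a * x + c) % m   # multiplication, addition, modular arithmetic
        yield x


gen = lcg(0)
print([next(gen) for _ in range(4)])  # [1, 6, 31, 156]
```

The low bits of a generator this small are not very random, so for a game you would probably scale the result down to the range you need rather than take it modulo a small number.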

When I tried to use it, I kept getting a USER ERROR MAIN 0004 error. This isn't the most obvious error message ever, and I couldn't find specific information on that error, but I did finally track it down to the MOD function. Apparently it couldn't find the MOD function. Somewhat annoyingly, this doesn't occur in the compilation phase.

To understand, you have to appreciate how any but the most simple of computer applications are compiled. Whether you're dealing with a 50-year-old PDP-8 or the latest computer, the basic steps are still the same. There are three basic phases: Compilation, linking/loading, and execution. For OS/8's Fortran IV system, they have F4 as their compiler, a utility called LOAD for loading, and then the FRTS (Fortran Run-Time System) program for linking and execution. Based on what I was seeing, I think they're using what we would today call Late Binding, because I got the error from the last phase, the FRTS utility. I don't recall seeing this term used in the Handbook, so I'm not sure what they would have called it in 1974 when it was written, but it basically means that external functions and subroutines are linked together at execution time rather than compile time (which would be called, not surprisingly, Early Binding).

So even though linking is done in the third phase, actual loading is done in the second phase with the LOAD utility. When doing so, you actually specify your output program, and then specify all the modules that go into it. Looking at a sample LOAD session in the Handbook, one of the modules that you include is the standard library. In the book, it says the standard library is contained in a file named FORLIB.RL, but in the example session they use LIB.RL instead. They have a comment explaining that they're using this name instead of FORLIB.RL, but give no indication why. In any case, after a quick search, neither file appears on the OS/8 hard disk image that I'm using. So I guess this is a slightly misconfigured installation of Fortran.

This leaves me with two options. First, I could try and locate FORLIB.RL and copy it to my system. Or, second, I could just write my own version of mod. Obviously, this isn't difficult, but I would like to be able to use the standard library. I don't really anticipate using it in any of the rest of my program, but it annoys me that I can't so I will at least take a stab at fixing the problem. But I don't really have that much time left, so if I don't fix it soon, I'll just write it myself.
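Writing my own mod really is just a divide, a multiply, and a subtract. Here is a Python sketch of the idea; the name fortran_mod is made up, and I'm assuming the replacement should behave like standard Fortran's MOD, where integer division truncates toward zero, which is not what Python's own % operator does for negative numbers.

```python
def fortran_mod(a, b):
    """Remainder of a / b using division that truncates toward zero."""
    quotient = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        quotient = -quotient      # restore the sign lost by abs()
    return a - quotient * b


print(fortran_mod(7, 3), fortran_mod(-7, 3), -7 % 3)  # 1 -1 2
```

For the random number generator only non-negative values are involved, so the difference never shows up there, but it is worth knowing about before reusing the routine elsewhere.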