Monday, January 25, 2016

6502 - 24 Bit Math and a little BCD (RC2016/1)

I decided that another experiment/lesson to do on my way to making my calculator app was to learn how to do multibyte math and possibly experiment with BCD/Decimal vs Hexadecimal. (Or, as Mark Watney calls them, "Hexidecimals".)

I kinda like breaking down this project into multiple "lessons" as it were.  It makes me feel like I'm following along lesson plans in a book.  Perhaps I should go the other way around and actually make the book I would be following if I were following a book to make this thing.

The code for this can be found in my github repository.

I broke down the application, a Fibonacci sequence generator, into a few main steps:

  1. display the last result
  2. add together the two previous results
  3. store that sum into a result variable
  4. repeat
When broken down further, we see that we also need some method for "kickstarting" it, as it were, since the first two numbers do not follow the standard Fibonacci rule. (Quick reminder: each value in the Fibonacci sequence is the previous two values added together. Very simple.)  For the first two values there is no "previous two", so they are just hardcoded as "0, 1".

Computation of the sequence can be described as:
  1. hardcoded "0"
  2. hardcoded "1"
  3. use algorithm to sum previous two values
  4. same as 3
  5. etc
For doing the math, I wanted to have variables that mimicked the 3 bytes we are able to display on the KIM, so I use 24 bits (3 bytes) to store them.  I broke down the math functions to be generic in that they operate on two variables ("I" and "J") and store their result in a third variable, "RESULT".  From there, additional functions were created to move the values around between them.  For example, we need to "roll" the values through if we want to make this repeatable. So the computation sequence can be described as:
  1. RESULT, I and J all set to '0'
  2. refresh display RESULT "0"
  3. RESULT gets "1"
  4. refresh display RESULT "1"
  5. shift the values through:
    1. J gets I's value
    2. I gets RESULT's value
  6. add:  RESULT gets the value from adding I and J
  7. repeat at step "4"
And this is basically the procedure as seen in the source.  
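
To make the "shift the values through" step concrete, here is a minimal ca65 sketch. The labels I0..I2, J0..J2 and RESULT0..RESULT2 (byte 0 being the least significant) and their zero-page addresses are assumptions for illustration; the code in the repository may be laid out differently.

```asm
; Hypothetical zero-page layout; byte 0 is the least significant.
J0      = $10
J1      = $11
J2      = $12
I0      = $13
I1      = $14
I2      = $15
RESULT0 = $16
RESULT1 = $17
RESULT2 = $18

; Step 5: J gets I's value, then I gets RESULT's value.
rollValues:
        ldx     #$02            ; three bytes, indexed 2, 1, 0
:       lda     I0,x            ; A = byte x of I
        sta     J0,x            ; byte x of J = A
        lda     RESULT0,x       ; A = byte x of RESULT
        sta     I0,x            ; byte x of I = A
        dex                     ; x = x - 1
        bpl     :-              ; loop until x goes below 0
        rts
```

Both copies can share one loop because each byte of I is saved into J before that same byte of I is overwritten with RESULT.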

The multibyte addition was actually a lot simpler than I thought it was going to be. My first thought was "how could this possibly work if I were to add like 100 to 100... you end up with '2' for the carry instead of '1'."  Obviously, you can see the error here (the biggest possible single-byte sum, $FF + $FF plus an incoming carry of 1, is $1FF, so the carry out is never more than 1), but for some reason this got stuck in my head and suddenly all of the multibyte (16+ bit) math seemed near impossible to deal with.  I think it was the multiplication that seemed hard, but when you break it down as multistep additions instead of multiplications, it all makes sense.  I blame this on the cold and fuzzy head I have right now.  I'm just not thinking right... also extra time at work... sure... and um... an ARP storm.  All contributing factors to not thinking clearly. ;)

The basic procedure for doing multibyte math is to observe the carry bit.  The carry bit is set when math on two 8 bit values exceeds the 8 bit container.  If you think of it in decimal, when you add 1 to 9, you get "0" with a carry of "1" which ends up in the next digit space, resulting in a "10".  So if you were to add 99 and 04, you end up with 03 with a carry of 1, resulting in "103".  Math on the 6502 is no different, other than we're (probably) using hex where the value can go from 0-9,a-f rather than just 0-9 for each digit.  The math for addition is basically:
  1. for each byte (starting from the least significant on the right)
    1. add one byte to the other, with the carry bit from the previous byte
    2. store that result in the RESULT
Or, more precisely
  1. clear "Carry" (Carry = 0)
  2. register A gets I0 (A = I0)
  3. add J0 to A  (A = A + J0 + Carry)
  4. store the result in RESULT0
  5. A = I1
  6. A = A + J1 + Carry
  7. RESULT1 = A
  8. A = I2
  9. A = A + J2 + Carry
  10. RESULT2 = A 
I think you can see that this can be carried out indefinitely for multiple bytes.
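
Written out as ca65 assembly, those ten steps might look like the sketch below. The label names follow the list above, but their zero-page addresses are assumptions chosen only so that the snippet assembles on its own.

```asm
; Hypothetical zero-page layout; byte 0 is the least significant.
J0      = $10
J1      = $11
J2      = $12
I0      = $13
I1      = $14
I2      = $15
RESULT0 = $16
RESULT1 = $17
RESULT2 = $18

; RESULT = I + J, 24 bits wide.
add24:
        clc                     ; Carry = 0 before the lowest byte
        lda     I0              ; A = I0
        adc     J0              ; A = A + J0 + Carry
        sta     RESULT0         ; RESULT0 = A
        lda     I1              ; lda/sta leave Carry alone, so the
        adc     J1              ; carry from byte 0 arrives here
        sta     RESULT1
        lda     I2
        adc     J2              ; A = A + J2 + Carry
        sta     RESULT2
        rts                     ; Carry set here means the sum needed a 25th bit
```

For example, with I = $0000FF and J = $000001, the first adc produces $00 with the carry set, the second produces $01, and the third $00, leaving $000100 in RESULT.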

The "display the result" was pretty straightforward as well.  The "RESULT" bytes were stored into INH, POINTL and POINTH, and then the SCANDS function is called. This refreshes those three values out to the KIM's LED display.  Then a call to GETKEY stores the current key press value into the accumulator register.  If nothing is pressed, this fills A with $15, or KEY_NONE as I have it defined.  Then it just sits in a tight loop refreshing the display and waiting for any key to be pressed.
  1. refresh display
  2. check for key press
  3. no key press? repeat at 1
  4. return
So the end result is a program that advances to the next sequence number each time you press a key.
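
A minimal sketch of that display-and-wait loop follows. SCANDS, GETKEY, INH, POINTL and POINTH are at their standard KIM-1 monitor addresses; the RESULT addresses and the routine name are assumptions, and no debouncing is attempted.

```asm
SCANDS     = $1F1F              ; KIM-1 ROM: refresh the six LED digits
GETKEY     = $1F6A              ; KIM-1 ROM: return the key code in A
KIM_INH    = $F9                ; rightmost digit pair
KIM_POINTL = $FA                ; middle digit pair
KIM_POINTH = $FB                ; leftmost digit pair
KEY_NONE   = $15                ; GETKEY's "nothing pressed" value
RESULT0    = $16                ; hypothetical addresses, byte 0 = low byte
RESULT1    = $17
RESULT2    = $18

showResultAndWait:
        lda     RESULT0
        sta     KIM_INH
        lda     RESULT1
        sta     KIM_POINTL
        lda     RESULT2
        sta     KIM_POINTH
:       jsr     SCANDS          ; 1. refresh display
        jsr     GETKEY          ; 2. check for key press
        cmp     #KEY_NONE
        beq     :-              ; 3. no key press? repeat at 1
        rts                     ; 4. return, key code still in A
```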

When it fills all the digits, that is, when we get a "carry" out of the third byte while doing the math, I display "EEEEEE" as a cheesy error display and wait for a key press.  When a key is pressed, it resets and starts all over again.
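
One way to sketch that overflow check in ca65 is shown below. The routine name is hypothetical, and it assumes it is called immediately after the 24-bit add, while the carry flag from the last adc is still intact.

```asm
KIM_INH    = $F9                ; KIM-1 display bytes
KIM_POINTL = $FA
KIM_POINTH = $FB

; Call right after the 24-bit add, before anything changes Carry.
checkOverflow:
        bcc     :+              ; Carry clear: the sum fit in 24 bits
        lda     #$EE            ; two "E" digits per display byte
        sta     KIM_INH
        sta     KIM_POINTL
        sta     KIM_POINTH      ; display buffer now reads "EEEEEE"
:       rts                     ; Carry is unchanged for the caller to test
```

Since lda, sta and rts do not touch the carry flag, the caller can still branch on it afterwards to decide whether to reset the sequence.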


As for BCD, I basically ran the code both in BCD (decimal) mode and hex mode, just to see how it works out.  Turns out I was worried for nothing.  It all 'just worked' fine in both modes.
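
A likely reason it "just worked": in decimal mode adc sets the carry when a byte goes past 99 instead of past $FF, so the same clc/adc chain carries correctly from byte to byte. Here is a tiny ca65 sketch of the difference; SUM and its address are hypothetical.

```asm
SUM     = $19                   ; hypothetical scratch byte

bcdDemo:
        sed                     ; decimal (BCD) mode on
        clc
        lda     #$09
        adc     #$01            ; decimal mode: A = $10 (binary mode would give $0A)
        sta     SUM
        cld                     ; back to binary mode for everything else
        rts
```

Clearing the decimal flag again afterwards is a precaution, since any other routine that uses adc or sbc would otherwise also run in decimal mode.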

So yeah.  My throat is sore, and I'd love to just go to sleep right now.

Thursday, January 21, 2016

6502 - RLE Image Renderer (RC2016/1)


I finished up my RLE (Run-Length Encoded) image renderer last night.  It would have been much simpler but there were a few things that I wanted to deal with to have proper full support for sprite placement and large image rendering.

The basic concept of RLE is that instead of storing just a series of pixel colors, we also store the number of times each pixel is repeated.  As described in the previous post, we know that this hardware uses the lower nibble of each byte to store the color number.  We will use the upper nibble to indicate repetitions as well as other commands, which we'll get to later...

Using '0' for the number of repetitions makes no sense, so it will never be used when the image is encoded. (repeat "red" pixels 0 times? nope.) So we'll use '0' in the top nibble to indicate commands.  A few commands that we will need are:

$00 - End of image (stop rendering, return)
$0F - End of line (no more pixels on this line, start over vertically down one pixel from the start of this line)

Which leaves $01 through $0E, which we will use as a "skip".  Advance the screen position, but do not draw any pixels to the screen. We can use this to allow images to have "transparency".
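
As an illustration of the encoding, here is a tiny hand-made example as ca65 data. The sprite and its colors are invented, and it assumes that the low nibble of a skip byte is the number of pixels to skip, which matches the description above but is not confirmed by it.

```asm
; Hypothetical 4x3 sprite. High nibble = repeat count, low nibble = color.
sampleSprite:
        .byte   $42, $0F            ; row 0: 4 pixels of color 2, end of line
        .byte   $01, $27, $0F       ; row 1: skip 1, 2 pixels of color 7, end of line
        .byte   $12, $02, $12, $0F  ; row 2: 1 pixel of color 2, skip 2, 1 pixel of color 2
        .byte   $00                 ; end of image
```

The skipped positions keep whatever was already on the screen, which is what gives the sprite its "transparent" pixels.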

One thing to deal with was that after 256 bytes (at most), the referencing will go into another bank (a 256-byte page).  If everything fits in one bank, that's fine, but the screen itself is 4 banks, so this was something that needed to be addressed.  (HA! Addressed! I'm hilarious!)  If this isn't dealt with, and we only are incrementing the lower byte of a two byte address, we'll just keep reading (or writing) forever inside of one bank. $41FE, $41FF, $4100, etc  rather than $41FE, $41FF, $4200, $4201 ...

So basically instead of just incrementing the screen pointer by one, indirectly using
    inc IMGPTR    ; will wrap around inside a bank. bad.
I instead had to add a '1' to it, then add the carry bit onto the high byte of the value.  I need to take a step back here.  The 6502 only really has grasp of 8 bit (one byte) values.  It can use 16 bit values for addresses, stored as two bytes, but all math functions happen on the one-byte scale.
    clc          ; clear the carry bit  (Carry = 0)
    lda IMGPTR   ; A = *IMGPTR
    adc #$01     ; A = A + 1 + C
    sta IMGPTR   ; *IMGPTR = A
      ; at this point, the carry bit is either set or not,
      ; so we will add 0 into the next byte with carry
    lda IMGPTR+1 ; A = *IMGPTR+1
    adc #$00     ; A = A + 0 + C
    sta IMGPTR+1 ; *IMGPTR+1 = A
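
A common shorter alternative, shown here only as a sketch and not as what the renderer uses, relies on inc setting the zero flag when the low byte wraps from $FF to $00. The address chosen for IMGPTR is an assumption.

```asm
IMGPTR  = $20                   ; hypothetical zero-page pointer: low byte, then high byte

incImgPtr:
        inc     IMGPTR          ; low byte + 1; Z is set if it wrapped to $00
        bne     :+              ; no wrap: the high byte stays as it is
        inc     IMGPTR+1        ; wrapped: bump the high byte
:       rts
```

This leaves A and the carry flag untouched, at the cost of only working for increments of exactly one.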

Why use RLE? A couple of reasons.  First of all, it will save ROM space.  The RLE encoded (compressed) images should take a bit less space inside the ROM.  An alternative would be to store color data in both nibbles of each byte, then just shift them out to the screen.  We would lose the ability for transparency, but that approach guarantees a 50% space saving over the one-byte-per-pixel layout we have here.

The full source code for this project is over at github.

The image shown at the top of this post shows three sprites stored in the ROM.  They were hand-encoded from graph paper sketches of various sources.  The rainbow was just coded from scratch to test out everything.

The red ghost is obviously borrowed from Namco's "Pac-Man" arcade game.  The mouse is borrowed from Nintendo's "Goonies" arcade game. Both are used for educational/demonstrative purposes here.

Wednesday, January 20, 2016

6502 Learning (RC2016/1)... Video buffer sidetrack...

I got a little sidetracked while working on the KIM-Uno calculator, playing with video buffers. I had added a video buffer to the desktop Kim Uno Remix project. Ultimately, I want to make a compressed image decoder and viewer.. to draw sprites to the screen or full-screen images.

I've written stuff like this before back on the Z80 for Pac-Man hardware, so I thought it would be a fun exercise to see how something like this would be implemented on 6502. It gives me a good chance to learn addressing modes and methods for this architecture... which is very different than Z80's.

You can attempt to play along by using this web-based assembler system. I based the video buffer in KIM Uno Remix on this system.  There are a few differences though...

6502asm.com:

  • 32x32 pixels
  • 16 colors
  • Commodore 64 palette
  • starts at $0200, continues horizontally then down, starting top left
  • one byte per pixel
  • bottom nibble indicates the color ($00..$0F)
  • top nibble is ignored
  • code starts at $0600

KIM Uno Remix:

  • 32x32 pixels
  • 16 colors
  • Modified Deluxe Paint palette
  • starts at $4000, continues horizontally then down, starting top left
  • one byte per pixel
  • bottom nibble indicates the color ($00..$0F)
  • top nibble is ignored
  • code starts at $0200
Here's the output from a small program (shown below) that shows off the palettes of the systems. The KIM Uno is on the left, and shows the very reasonable "rainbow" palette.  The one on the right shows the more convoluted "Commodore 64 palette" of the web tool.


The colored sections are 8 rows of 32 pixels across. Since there's only 16 colors, the color stripes get repeated twice along the horizontal of the screen.

The code to run the above was essentially identical on both systems but there are some tweaks to accommodate the addresses and some minor differences between the CC65 tools that I use and the web-based tool.


Here's the source code listing used for CC65, which generated the image on the left above.  You can see that it writes to two of the four banks of memory space, at $4000 and $4200, while not doing anything with the $4100 and $4300 banks, which is why we see two segments of stripes, and two segments of black in the above image. It is a very simple program that simply increments "X" and writes it to videobuffer[x].
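
A minimal sketch of a program matching that description (not the original listing) might look like this in ca65 syntax. The label names and the endless loop at the end are assumptions.

```asm
VIDEO   = $4000                 ; start of the KIM Uno Remix video buffer

palette:
        ldx     #$00            ; X = 0
:       txa                     ; A = X; only the low nibble picks the color
        sta     VIDEO,x         ; first bank:  $4000..$40FF
        sta     VIDEO+$200,x    ; third bank:  $4200..$42FF
        inx                     ; X = X + 1
        bne     :-              ; X wraps to 0 after 256 writes, then fall through
done:   jmp     done            ; park here when finished
```

Each bank holds 256 bytes, which is 8 rows of 32 pixels, and the low nibble of X runs through 0..15 twice per row.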


And here's the source for the web-based tool.  I colorized it to match the above CC65/KIM Uno Remix listing.  Notice that the program is the same, although it uses $0200 and $0400 for the screen memory, skipping the $0300 and $0500 sections.  I also switched the "unnamed label" from the above code to be a label named "loop" for this one.  The web-based tool apparently doesn't support unnamed labels.

Saturday, January 9, 2016

KIM Calculator Update: Learning 6502 and reducing code size

One of the functions of this KIM-Uno project involves functionality similar to the stock KIM monitor.  You press buttons 0-9, A-F to enter a number (a nibble, a half byte), and it scrolls in from the right side of the display.  Usually you can only enter the address bytes (first 4 digits) or the data byte (rightmost 2 digits).  I wanted to use all 6 digits for the values entered so I needed to write my own handler for this. I decided to kinda glance at the KIM Monitor code, but I wanted to make it all on my own.

Version 1 sketch...

Version 2 sketch.

Both of the above (which didn't quite work) were basically the same as the working version (v3) which I did implement and had in the code for a week or so.  It was 46 bytes long, plus a sub function which got called 3 times that was 20 bytes long.  Not very small.  The basic procedure was this:


  1. Get the key from the user (0x00 - 0x0F), store it aside
  2. for each of the three display bytes (eg 0xMN)
    1. Store it in X (input byte)
    2. shift it to the left by 4 bits (one nibble, one display digit)
    3. store aside this value 0xN0
    4. restore the display byte (0xMN)
    5. shift it to the right by 4 bits (one nibble, one display digit)
    6. store aside this value now (0x0M) this is the "carry out"
    7. Add the user input key with the first stored value  0xNK and put in "X" (output byte)
    8. restore the "carry out" to "A"

That's basically it. Nothing particularly wrong with it, it works fine, but once you understand more of the opcodes available in the system, we can drastically reduce this.

I was working on implementing one of the calculator functions, "Shift left one bit", when I learned/remembered about ROL (rotate left) and ROR (rotate right), which shift the bits around, using the carry flag both for the bit coming in and for the bit going out.  The implementation of this function was super easy using this:

clc          ; clear carry bit (shift in a '0')
rol DIGIT3   ; shift data in memory location DIGIT3 one bit
rol DIGIT2   ; these take the bit shifted out and store it
rol DIGIT1   ; in the carry bit, then shift that bit in
jsr DISPLAY  ; and display it

Then it hit me, I could leverage off of this carry bit for the above process, since it basically is:

  • Shift all three bytes to the left by one bit four times (one nibble) 
  • Shove the key in to the lower nibble of DIGIT3
This change of shifting everything by one bit four times, rather than the above where I was shifting four bits three times, worked out perfectly with the available opcodes.  Here's the final code for this routine:  (INH is the third digit pair, POINTL and POINTH are the second and first digit pairs of the display respectively)

We set up a loop using "X" as the counter register; notice we set it to '4' first.  We are also doing something that I just learned about, which is unnamed labels. That is why you see ":" starting a line, then a "bne :-" (branch, i.e. goto, to the previous unnamed label if the result was not equal to zero).

keyShiftIntoDisplay:
        ldx     #$04            ; 4 bits to shift
:       clc                     ; rol pulls from carry, so clear it
        rol     KIM_INH         ; shift this byte by 1 
        rol     KIM_POINTL      ; shift this one, carry from INH
        rol     KIM_POINTH      ; shift this one, carry from POINTL
        dex                     ; x = x - 1
        txa                     ; a = x 
        cmp     #$00            ; a == 0?
        bne     :-              ; not 0, repeat loop

        ; now shove the content in
        lda     KEYBAK          ; restore key 00 .. 0F to A
        ora     KIM_INH         ; A = A | INH
        sta     KIM_INH         ; INH = A
        jsr     SCANDS          ; and display it to the screen
        jmp     keyinput        ; next!

Which works out substantially smaller.  It has one chunk of 13 bytes that gets run four times, for the entire code block of 27 bytes.  Quite a lot less!  I'm really enjoying learning 6502 asm for this!
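
The loop can probably be trimmed a little further. This is an untested sketch rather than the code in the repository: dex already sets the zero flag, so the txa/cmp pair is not needed, and asl shifts in a '0' by itself, which replaces the clc/rol pair on the first byte. The routine name is hypothetical; the display addresses are the standard KIM-1 ones.

```asm
KIM_INH    = $F9                ; rightmost digit pair
KIM_POINTL = $FA                ; middle digit pair
KIM_POINTH = $FB                ; leftmost digit pair

shiftDisplayLeftNibble:
        ldx     #$04            ; 4 bits to shift
:       asl     KIM_INH         ; shift in a 0, bit 7 goes to Carry
        rol     KIM_POINTL      ; shift this one, carry from INH
        rol     KIM_POINTH      ; shift this one, carry from POINTL
        dex                     ; x = x - 1, sets Z when x reaches 0
        bne     :-              ; not 0, repeat loop
        rts
```

Assuming zero-page operands, the loop body drops from 13 bytes to 9.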

The code for this project can be found on github at The LlamaCalc directory of my Projects5502 repository.  This requires the cc65 toolset to build.

Friday, January 1, 2016

Retrocomputing Challenge 2016-1: Learning 6502, KIM-Uno stuff

Starting today, I'm going to attempt to better learn 6502 asm in my copious amounts of free time for the RC2016/01 Retrocomputing Competition.  To prepare for this, over the past year I've gotten into working with Oscar Vermeulen's awesome KIM Uno kit, as well as pushing out my own updated firmware for it in the form of my Kim Uno Remix project on github.

Part of that project was to make it more portable and make it available on other platforms.  I have a preliminary iOS build of it, as well as a Qt-based desktop build of it checked in which builds on Mac, Windows and Linux.  Source at github will build for all of these, if you have Qt Creator installed (along with support compilers for your system of course.) Binaries will be available eventually for all platforms.



In the above screenshot you can see some 6502 asm in the center window. I have a makefile which uses the industry(homebrew) standard(?) cc65 compiler/assembler  to assemble it into a .lst "listing file".  This file contains the original ASM as well as the machine language bytes and the addresses they sit at.  This is a lot better, imo for distribution as it can be easily trimmed (ref: unix 'cut') to the original asm, or it provides the necessary information to hand-enter it into a KIM.

The KIM Uno emulator can be seen on the leftmost window. It looks like you'd expect a KIM on a desktop to look.  There are two other windows here though.  The Video Display shows a virtual framebuffer which sits in KIM memory at $4000 (0x4000).  It is 32x32 pixels, one byte per pixel.  Only the bottom nibble is currently used in the byte, to signify one of 16 colors ($00, $01... $0E, $0F).  At compile time you can use the Commodore 64 palette, or the Amiga palette, which is based on the default colors from Deluxe Paint.  This feature is heavily influenced (copied/borrowed) from the very awesome virtual machine/programming interface available at 6502asm.com.  Theirs sits at $0200, which collides with KIM stuff, so I moved it to $4000.  Otherwise it behaves the same.

There are a couple windows not shown, including a serial terminal emulator that connects to an emulated UART in the KIM, so that you can run the chess application or what have you.  Also available is a memory browser that lets you look through the entire 64k memory space of the 6502. It allows you to have it update automatically so that you can see changes as they occur.  Very handy for debugging.

Now here's where things get neat...

The window in the bottom right is for a feature I call "Code Drop".  You can take one of the .lst files mentioned above (generate one by running "ca65 project.asm -l project.lst") and drag and drop it to that window. Or you can click "Browse..." and pick it from your filesystem.  Now, when you hit "Load to RAM", it will load in that .lst file, and drop the bytes in the appropriate place in RAM, while the emulation is still running.

The "Auto ADDR seek" feature will then auto type-in for you the first address specified in the LST.  The "Auto GO" feature will do the seek, then press "GO" for you as well.

The application is also sensitive to the SIGUSR1 signal, which does the same as pressing the "Load To RAM" button.

So here's what you (I) do...

The desktop application is set to the appropriate .lst file for the project I'm working on.  It is set for "auto GO" as seen above.  Now in the makefile for the project,  it will build the .lst file, then send the SIGUSR1 signal to the application.  When I type 'make', it assembles the file, builds the lst, then triggers the emulator to reload and restart the code, essentially integrating it into my build process.

.oOo.

For the challenge, I want to use this system to make a simple integer programmer's calculator which I can run on the KIM Uno itself.  Press keys to shift in the nibbles, then switch it into a mode where I can affect the data.  Convert hex to decimal, do bitshifts, add, multiply, etc.