/ Low Level

MC6000 in hardware - Part 1: The assembler

SHENZHEN-IO is an interactive circuit building and programming puzzle game with a programmable microcontroller called the MC6000, it has an extremely simple instruction set and no memory besides 2 registers that can only store numbers from -999 to 999.

Each instruction consists of a label, condition, instruction, and comment:

foo: +mov 50 x2 # puts 50 to XBus 2

Conditions can either be +, -, or blank and control if the instruction should execute after a comparison. Labels are optional and used to tell where the jmp instruction should jump to, this is just sugar to make things easier to keep track of. Registers consist of acc, bak, and 6 virtual registers coresponding to the 6 I/O ports on the MC6000. The game comes with a more in-depth manual with a language specification here: https://u.pxtst.com/QAgvo8UJR6fah.pdf

The first step to implementing this in actual hardware is to lay out the machine code:

Registers

000acc 001dat 010p0 011p1
100x0 101x1 110x2 111x3

You may notice the lack of the register null which does nothing when you write to it and returns 0 when you read from it. I did not include it because it can simply be replaced with the literal 0 except when writing to it with MOV, because of this I added the flag E to the instruction SLX which when 1 will eat a value from the bus and do nothing with it.

Condition codes

00always execute
01only execute on - flag
10only execute on + flag
11only execute once

Internally there are two execution flags, + and - which are set by the test instructions TEQ, TGT, TLT, TCP. Every instruction but TCP sets the flags in a differential as in only either + or - can be true at the same time but TCP will disable both flags when both operands are equal.

Because I want to reduce instruction size I've made it so only the first operand of test instructions can be immediate values, because of this things have to be shifted around when assembling:

TGT acc 69 -> TLT 69 acc
TEQ 69 69 -> TST 1 0
TCP acc 42 -> TPC 42 acc

You may notice I've added 2 extra test instructions: TST and TPC.
TST takes 2 operands, + and - and sets the coresponding flags directly, this is always emitted when both operands of a test instruction are immediates.
TPC is the same as TCP but with the operands reversed, this happens when the second operand is an immediate.

Register/Immediate values

109876543210
10000000 reg Regsiter
immediate Immediate

Immediate values can range from -999 to 999 and are encoded with two's complement and because it doesn't completely fill the 11 bits (0 - 2048) we can store a register number without adding an extra bit to signal whether or not it's a register. To easily tell the difference between the two you simply check if the first 8 bits are 10000000 which means it's <= -1017.

Register/Digit values

43210
1 reg Regsiter
0 immediate Digit

Register/Digit values work similarly to Register/Immediate values but store single digits from 0 to 9 but requires a flag to differentiate registers from immediates. If the digit is out of bounds it should be encoded as 0b1111 which will be interpreted as a no-op.

Instructions

1817161514131211109876543210
cond 000 R/I arg reg MOV
cond 010 00 line JMP
cond 010 01 R/I sleep amount SLP
cond 010 10 Expin SLX
cond 010 11 R/I value ADD
cond 011 00 reg SUB
cond 011 01 R/I value MUL
cond 011 110 NOT
cond 011 100 R/D selection DGT
cond 011 101 R/D value R/D selection DST
cond 100 R/I arg reg TEQ
cond 101 R/I arg reg TGT
cond 110 R/I arg reg TLT
cond 111 R/I arg reg TCP
cond 001 R/I arg reg TPC
cond 011 111 +- TST

Using this layout I made an assembler and disassembler in Dart:

main() {
  try {
    var rom = assemble(
      "beg:teq x2 -1\n"
      "- slp 1\n"
      "- jmp beg\n"
      "  mov -1 x1\n"
      "  mov p0 x3\n"
      "  mov p1 x3\n"
      "  mov x1 acc\n"
      "  add x1\n"
      "  mov acc x3\n"
    );
    print("Assembled ROM:");
    print(rom);
    print("Disassembled:");
    print(dissasemble(rom));
  } catch(e, bt) {
    print(e);
    print(bt);
  }
}

abstract class RI { int encode(); }
class RegRI extends RI {
  RegRI(this.id);
  int id;
  encode() => (1 << 10) | id;
}

class ImmRI extends RI {
  ImmRI(this.x);
  int x;
  encode() => x.toUnsigned(11);
}

abstract class RD { int encode(); }

class RegRD extends RD {
  RegRD(this.id);
  int id;
  encode() => 0x10 | id;
}

class DigRD extends RD {
  DigRD(this.x);
  int x;
  encode() => x;
}

const regNames = const ["acc", "dat", "p0", "p1", "x0", "x1", "x2", "x3"];

T maybeAt<T>(List<T> l, int i) => i >= l.length || i < 0 ? null : l[i];

List<int> assemble(String text) {
  var lines = text.split("\n").map((e) {
    if (e.contains("#")) e = e.substring(0, e.indexOf("#"));
    return e;
  }).map((e) => e.toLowerCase().trim()).where((e) => e != "").toList();

  var labels = <String, int>{};
  for (var i = 0; i < lines.length; i++) {
    var labelns = [];
    while (i < lines.length) {
      var m = new RegExp("^([a-zA-Z_][a-zA-Z_0-9]*):(.*)\$").matchAsPrefix(lines[i]);
      if (m == null) break;
      labelns.add(m.group(1));
      lines[i] = m.group(2);
      while (lines.isNotEmpty && lines[i].trim().isEmpty) lines.removeAt(i);
      if (i >= 0 && lines[i].trim().isNotEmpty) break;
    }

    if (i < lines.length) {
      for (var nm in labelns) labels[nm] = i;
    }
  }

  RI parseRI(String text) {
    if (text == null) throw "Expected register or number";
    text = text.toLowerCase();
    if (text == "null") return new ImmRI(0);
    if (regNames.contains(text)) return new RegRI(regNames.indexOf(text));
    var i = int.parse(text, onError: (i) => throw "Invalid register");
    if (i > 999) throw "Number too large";
    if (i < -999) throw "Number too small";
    return new ImmRI(i);
  }

  int parseReg(String text) {
    if (text == null) throw "Expected register";
    var x = parseRI(text);
    if (x is RegRI) return x.id;
    throw "Register expected";
  }

  RD parseRD(String text) {
    var x = parseRI(text);
    if (x == null) return new DigRD(0);
    if (x is ImmRI)
      return new DigRD(x.x > 9 || x.x < 0 ? 0xF : x.x);
    else return new RegRD((x as RegRI).id);
  }

  encodeMOV(RI src, int dest) =>
      src.encode() << 3 | dest;

  encodeSLX(int xbus, bool eat) =>
      0x0A << 12 | (eat ? 1 : 0) << 2 | xbus;

  encodeADD(RI val) =>
      0x0B << 12 | val.encode();

  encodeTST(bool plus, bool minus) =>
      0x1F << 11 | (plus ? 2 : 0) | (minus ? 1 : 0);

  var o = <int>[];
  for (var line in lines) {
    // if (o.length == 14) throw "Too many lines";
    var params = line.split(" ");
    int cond = 0;
    if (params[0].startsWith("+") || params[0].startsWith("-")) {
      cond = params[0].startsWith("+") ? 1 << 18 : 1 << 17;
      params[0] = params[0].substring(1);
      while (params[0].trim() == "") params.removeAt(0);
    }
    var cmd = params[0].toUpperCase();
    if (cmd == "NOP") {
      if (params.length > 1) throw "Too many operands";
      o.add(cond | encodeMOV(new RegRI(0), 0));
    } else if (cmd == "MOV") {
      if (params.length > 3) throw "Too many operands";
      var x = parseRI(maybeAt(params, 1));
      if (maybeAt(params, 2) == "null") {
        if (x is ImmRI || (x is RegRI && x.id < 4)) // MOV n null
          o.add(cond | encodeMOV(new RegRI(0), 0)); // NOP
        else if (x is RegRI)
          o.add(cond | encodeSLX(x.id, true)); // eating SLX p
      } else {
        o.add(cond | encodeMOV(x, parseReg(maybeAt(params, 2))));
      }
    } else if (cmd == "JMP") {
      if (params.length > 2) throw "Too many operands";
      if (params.length < 2) throw "Expected label";
      if (!labels.containsKey(params[1])) throw "Label not defined";
      o.add(cond | 1 << 15 | labels[params[1]]);
    } else if (cmd == "SLP") {
      if (params.length > 21) throw "Too many operands";
      o.add(cond | 9 << 12 | parseRI(maybeAt(params, 1)).encode());
    } else if (cmd == "SLX") {
      var r = parseReg(maybeAt(params, 1));
      if (r < 4) throw "Must be an XBus pin";
      o.add(cond | encodeSLX(r & 3, false));
    } else if (cmd == "ADD") {
      if (params.length > 2) throw "Too many operands";
      o.add(cond | encodeADD(parseRI(maybeAt(params, 1))));
    } else if (cmd == "SUB") {
      if (params.length > 2) throw "Too many operands";
      var r = parseRI(maybeAt(params, 1));
      if (r is ImmRI)
        o.add(cond | encodeADD(new ImmRI(-r.x)));
      else if (r is RegRI)
        o.add(cond | 0x0C << 12 | r.id);
    } else if (cmd == "MUL") {
      if (params.length > 2) throw "Too many operands";
      var r = parseRI(maybeAt(params, 1));
      o.add(cond | 0x0D << 12 | r.encode());
    } else if (cmd == "NOT") {
      if (params.length > 1) throw "Too many operands";
      o.add(cond | 0x0F << 12);
    } else if (cmd == "DGT") {
      if (params.length > 2) throw "Too many operands";
      o.add(cond | 0xE << 12 | parseRD(maybeAt(params, 1)).encode());
    } else if (cmd == "DST") {
      if (params.length > 3) throw "Too many operands";
      o.add(
        cond | 0xE << 12 | 1 << 11 |
        parseRD(maybeAt(params, 2)).encode() << 5 |
        parseRD(maybeAt(params, 1)).encode()
      );
    } else if (cmd == "TEQ" || cmd == "TGT" || cmd == "TLT" || cmd == "TCP") {
      if (params.length > 3) throw "Too many operands";
      var x = parseRI(maybeAt(params, 1));
      var y = parseRI(maybeAt(params, 2));
      if (x is ImmRI && y is ImmRI) {
        o.add(cond | encodeTST(
          cmd == "TEQ" ? x.x == y.x :
          cmd == "TGT" ? x.x > y.x :
          cmd == "TLT" ? x.x < y.x :
          (x.x > y.x) && x.x != y.x,
          cmd == "TEQ" ? x.x != y.x :
          cmd == "TGT" ? x.x <= y.x :
          cmd == "TLT" ? x.x >= y.x :
          (x.x < y.x) && x.x != y.x
        ));
      } else {
        bool inv = false;
        if (y is ImmRI) {
          inv = true;
          var t = x;
          x = y;
          y = t;
        }
        o.add(cond | (
          cmd == "TEQ" ? 0x4 :
          cmd == "TGT" ? (inv ? 0x6 : 0x5) :
          cmd == "TLT" ? (inv ? 0x5 : 0x6) :
          inv ? 0x1 : 0x7
        ) << 14 | x.encode() << 3 | (y as RegRI).id);
      }
    } else {
      throw "Unknown opcode \"${cmd}\"";
    }
  }

  while (o.length < 14) o.add(0x7FFFF);
  return o;
}

const labelNames = const [
  "a", "b", "c", "d", "e", "f", "g",
  "h", "i", "j", "k", "l", "m", "n"
];

String dsRI(int x) {
  x = x & 0x7FF;
  if (x >> 3 == 0x80)
    return regNames[x & 7];
  else
    return x.toSigned(11).toString();
}

String dsReg(int x) => regNames[x & 7];

String dsRD(int x) {
  x = x & 0x1F;
  if (x >> 4 == 1) {
    return regNames[x & 7];
  } else {
    return x == 0xF ? "10" : x.toString();
  }
}

String dissasemble(List<int> rom) {
  Map<int, List<int>> jumps = {};
  List<String> cond = [];
  List<String> o = [];
  for (var i in rom) {
    if ((i >> 17) & 3 == 3) break;
    cond.add((i >> 18) & 1 == 1 ? "+" : (i >> 17) & 1 == 1 ? "-" : "");
    if ((i >> 14) & 7 == 0) {
      o.add("MOV ${dsRI(i >> 3)} ${dsReg(i)}");
    } else if ((i >> 14) & 7 == 1) {
      o.add("TCP ${dsReg(i)} ${dsRI(i >> 3)}");
    } else if ((i >> 14) & 7 == 2) {
      if ((i >> 12) & 3 == 0) {
        jumps.putIfAbsent(i & 0xF, () => []).add(o.length);
        o.add("JMP ");
      } else if ((i >> 12) & 3 == 1) {
        o.add("SLP ${dsRI(i)}");
      } else if ((i >> 12) & 3 == 2) {
        if ((i >> 2) & 1 == 1) {
          o.add("MOV ${dsReg((i & 3) | 0x4)} null");
        } else {
          o.add("SLX ${dsReg((i & 3) | 0x4)}");
        }
      } else if ((i >> 12) & 3 == 3) {
        o.add("ADD ${dsRI(i)}");
      }
    } else if ((i >> 14) & 7 == 3) {
      if ((i >> 12) & 3 == 0) {
        o.add("SUB ${dsReg(i)}");
      } else if ((i >> 12) & 3 == 1) {
        o.add("MUL ${dsRI(i)}");
      } else if ((i >> 12) & 3 == 2) {
        if ((i >> 11) & 1 == 0) {
          o.add("DGT ${dsRD(i)}");
        } else {
          o.add("DST ${dsRD(i)} ${dsRD(i >> 5)}");
        }
      } else if ((i >> 12) & 3 == 3) {
        if ((i >> 11) & 1 == 0) {
          o.add("NOT");
        } else {
          if (i & 1 == 1) {
            o.add("TEQ 1 0");
          } else if (i & 2 == 2) {
            o.add("TEQ 1 1");
          } else {
            o.add("TCP 1 1");
          }
        }
      }
    } else if ((i >> 14) & 7 == 4) {
      o.add("TEQ ${dsRI(i >> 3)} ${dsReg(i)}");
    } else if ((i >> 14) & 7 == 5) {
      o.add("TGT ${dsRI(i >> 3)} ${dsReg(i)}");
    } else if ((i >> 14) & 7 == 6) {
      o.add("TLT ${dsRI(i >> 3)} ${dsReg(i)}");
    } else if ((i >> 14) & 7 == 7) {
      o.add("TCP ${dsRI(i >> 3)} ${dsReg(i)}");
    }
  }

  for (int k = 0; k < o.length; k++) {
    o[k] = (
        jumps.containsKey(k) ? (cond[k] != "" ? "${labelNames[k]}:${cond[k]}" :
          "${labelNames[k]}: ${cond[k]}") : "${cond[k].padRight(2)}"
    ) + o[k];
    if (jumps.containsKey(k)) {
      for (var j in jumps[k]) {
        o[j] += labelNames[k];
      }
    }
  }

  return o.join("\n");
}
MC6000 in hardware - Part 1: The assembler
Share this