Getting to Grips with Assembly Language

Have you pushed Basic to its limits? Do you want to write routines that are really fast, or go beyond what Basic can do? If so, Martyn Fox will help you on your way


The example programs referred to in this article are provided in an associated archive, as is a StrongHelp manual (and a copy of StrongHelp itself) which is referred to in the final section.


You may be wondering, "What is assembly language and why bother to learn it?"

Assembly language is a convenient way of writing machine code: the actual instructions used directly by the ARM or StrongARM processor in your computer. A well-written assembly language program can be the most efficient type of program; it should run at least as fast as code produced by other methods, and take up less memory.

It's also the most laborious to write and debug, though, so why bother? It may be that you want to move on from Basic but can't be bothered to learn C or C++, the other common programming options. It may be that you want to write something which will run faster than a Basic program (BBC Basic on a RISC OS machine is actually pretty fast, but not as fast as machine code), or you may wish to write a module (which certainly can't be written in Basic). Alternatively, you may not want someone else to be able to dissect your programming, which they can do fairly easily with a Basic program.

You've nothing to lose by having a dabble, so read on.

Hex dumps, bytes and words

The first thing to do is to see what machine code actually looks like.

You can load a machine code file into a text editor of the type which can show you a hexadecimal listing of its contents or, alternatively, you could start up Basic and load the file into spare memory with the *Load command, then look at it with the *Memory command. Either way, you'll see something like this:

Address  :     7 6 5 4     B A 9 8     F E D C     3 2 1 0 :    ASCII Data
00009054 :    E92D4000    EB000011    68BD8000    E3A00006 : .@-....h..
00009064 :    E3A03C01    EF02001E    68BD8000    E58C2000 : .<....h. 
00009074 :    E1A0C002    EA000031    E92D4000    EB000007 : .1...@-...
00009084 :    68BD8000    E3A00006    E3A03C01    EF02001E : .h...<...
00009094 :    68BD8000    E58C2000    E1A0C002    EA000027 : .h. .'..
000090A4 :    E24F20AC    E2422000    E28F3060    E2833000 :  O. B`0.0
000090B4 :    E59F4084    E4920004    E4931004    E0211004 : @......!
000090C4 :    E1500001    128F0010    139EF201    E28F0068 : ..P.....h.
000090D4 :    E1500003    1AFFFFF6    E1A0F00E    00000000 : ..P.....

Everything in this listing is in hexadecimal notation. Each line shows the contents of 16 bytes of memory, split into groups of four bytes or 32 bits. Each number in the left-hand column is the address of the first byte in the first group on the line, and the numbers along the top line are the lowest digits of the addresses of the bytes below. Each 32-bit group, incidentally, is called a word. It's important to understand that every machine code instruction consists of one word and that the address of the first byte of the word must be divisible by four. The four bytes are then said to be word-aligned.
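As a rough illustration of word alignment, the following Python sketch (Python is used here purely as a convenient calculator; it is not part of the original article, and the function names are made up for the example) works out the word addresses on the first row of the dump and checks that each one is divisible by four.

```python
def is_word_aligned(address):
    """True when the address is divisible by four."""
    return address % 4 == 0


def word_addresses(row_address):
    """Addresses of the four words shown on one 16-byte row of the dump."""
    return [row_address + 4 * i for i in range(4)]


# First row of the hex dump above starts at &9054.
row = word_addresses(0x9054)
print([f"{a:08X}" for a in row])           # the four word addresses
print(all(is_word_aligned(a) for a in row))
print(is_word_aligned(0x9055))             # one byte further on is not aligned
```

The four addresses produced are the same ones that appear down the left-hand side when the same memory is viewed one instruction per line.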

You'll no doubt agree that this listing is pretty well incomprehensible. The processor knows what the contents of the words mean, but to us poor humans it conveys nothing. Even the right-hand section, which shows the result of treating the individual bytes as ASCII code, tells us nothing.

We need a simpler way of looking at the code; one which will tell us what each instruction means.

Disassembly

We could make the text editor treat the file as machine code or look at it in memory using the *MemoryI command instead of *Memory, in which case we would see something like this:

00009054 : .@- : E92D4000 : STMDB   R13!,{r14 }
00009058 : ... : EB000011 : BL      &000090A4
0000905C : .h : 68BD8000 : LDMVSIA R13!,{pc }
00009060 : .. : E3A00006 : MOV     R0,#6
00009064 : .< : E3A03C01 : MOV     R3,#&0100
00009068 : ... : EF02001E : SWI     XOS_Module
0000906C : .h : 68BD8000 : LDMVSIA R13!,{pc }

You can see from the addresses on the left-hand side that each line now contains just one four-byte word, corresponding to one machine code instruction. The second column shows what the four bytes would represent if they were ASCII code and the third column shows the bytes themselves: still as incomprehensible as before.

When we look at the fourth and fifth columns, though, things start to get a little clearer. Each line in the fourth column contains a mnemonic (a brief description of what the instruction does), followed in the fifth column by which numbers or which of the processor's registers are involved.

Clearly, the simplest way to write machine code is by writing the mnemonics and getting them converted. The software which produced the above listing is called a disassembler. A package which does the job in reverse, turning mnemonics into machine code, is called an assembler, and a program consisting of mnemonics is said to be written in assembly language.

Comments

You may have your own ideas about how many REM statements it's worth adding to your own Basic programs. You may also occasionally look at a program you wrote some time ago and wish you had added more REMs!

The equivalent to the REM in assembly language is the comment. A comment starts with a semi-colon (;) and, like a REM statement, can either occupy a line on its own or be added onto the end of an instruction.

It's a good idea to put plenty of comments in your assembly language programs for two reasons. Firstly, assembly language is a lot more inscrutable than Basic, and it may be difficult to comprehend the workings of a program that you wrote several months ago without them. Secondly, they do not inflate the 'final product' of your programming, which is the machine code itself. You can put in as many comments as you like, but none of them will appear in the assembled code.

ARM Registers

Before we start assembling instructions, we must first understand something of the internal architecture of the processor that is going to use them.

Within the ARM and StrongARM processors there are sixteen number-stores called registers, each of which can hold a 32-bit number. These are referred to as R0, R1, R2 etc. through to R15. In fact, there are more than sixteen, because the processor actually has four modes of operation and some of the higher number registers are replaced by alternative ones when the processor is switched into a different mode, but that need not concern us in this article.

Register R15 is the program counter (PC). This always holds the address of the next instruction to be read from memory. Usually, this is increased by four (to point to the next word) each time an instruction is read. Sometimes, though, the instruction may be to branch, i.e. jump, to a different address. When this happens, the number in R15 is replaced by the address where the program jumps to.

Register R14 is called the link register. Sometimes, we may wish to call a subroutine, a bit like calling a procedure in Basic. When the subroutine has finished, we will want the program to jump back to the point at which the subroutine was called and continue from there.

This is achieved with a branch linking instruction; you can see one at address &9058 in the listing above, with the mnemonic BL. Before the jump to the subroutine, the number in the program counter (the return address) is copied into the link register. To return just involves copying the link register back into the program counter.

The other higher-numbered registers do not have specific functions within the processor but are often given special jobs by the software. Register R13, for example, is normally used as the stack pointer, containing an address within a section of memory used for temporary storage. Most of the number-crunching is done by the lower-numbered registers, R0 to R6.

Some of the assembly language instructions are concerned with copying numbers from one register to another, or loading a number into a register. The instruction at &9060, for example, moves the number 6 into register R0.

The Basic Assembler

As we saw earlier, in order to write assembly language you need an assembler. Fortunately, you already have one! It's built into the BBC Basic interpreter included in every RISC OS computer. Take a look at the short file Assem01 to see how to use it:

   10 REM > Assem01
   20 REM simple assembly language program
   30 ON ERROR REPORT:PRINT " at line ";ERL:END
   40 DIM code% 12
   50 P%=code%
   60 [OPT 3
   70 MOV R0,#7
   80 SWI "OS_WriteC"
   90 MOV PC,R14
  100 ]

The assembler is turned on and off by the square brackets, [ and ], in lines 60 and 100, and the assembly language instructions are put between them. Before we get to that bit, line 40 sets up a block of memory to hold the machine code instructions (12 bytes are enough for the three instructions in this simple program) and line 50 sets up the resident integer variable P%. This variable is used by the assembler to represent the program counter and determines (in this case) where in memory each instruction is put.

The OPT instruction in line 60 controls the way the assembler operates. More about that later; for now, leave it set to OPT 3.

We'll look at the rest of the program later. In the meantime, try running it. Observe what is on the screen, then press space or click the mouse to get rid of it and get back to here.

You should have seen a command window showing something like the following:

00008FB8                    OPT 3
00008FB8 E3A00007           MOV R0,#7
00008FBC EF000000           SWI "OS_WriteC"
00008FC0 E1A0F00E           MOV PC,R14

The program started up the assembler and assembled each instruction at the address pointed to by P%, putting what it was doing on the screen each time. The first instruction was placed at address &8FB8, which happened to be the value of code% and the start of the memory block set up by the DIM instruction. After each address comes the hexadecimal machine code instruction followed by the original mnemonic which created it.

After each instruction has been assembled, P% is incremented by four so that the next instruction is assembled four bytes further on. Don't worry if your address numbers were different; it's not important, but note that the addresses are always word-aligned; they all end in 0, 4, 8 or C because they're divisible by four.

All that this program does is assemble a bit of machine code; it doesn't run it. To do that, prepare to run file Assem02, which is identical to Assem01 except that it has two extra lines on the end:

  110 REPEAT UNTIL GET
  120 CALL code%

This time, when you run the program, you should see the same assembled listing as before, though with different address numbers because the Basic program is longer. The cursor will be flashing away underneath. The program has paused at the loop in line 110. When you press a key it will move on to line 120, which is an instruction to run the machine code, starting at address code%.

Now run the file and see what happens; but first, a word of warning. Machine code does not have the error-trapping capabilities of Basic. If there is an error in your assembly language program (as there is bound to be at some point), it is highly likely that your computer will crash and have to be reset. If you're following an on-screen guide such as this, make sure that you can get back to where you were in it. Also make sure that you don't have any unsaved work on your desktop.

You should have found that, when you pressed a key, the machine beeped, then asked you to press Space or click the mouse to return to the desktop. It's time to examine the three instructions to find out what they're doing:

The first instruction, the MOV command, is one of the simplest of all assembly language instructions: a command to move something into a register. If the instruction had been:

MOV R0,R1

then this would have been a command to copy the contents of register R1 into register R0. The hash symbol (#), though, means that '7' is an immediate constant: it specifies the actual number to be placed in the register. Without it, the assembler would assume it meant register R7 (the 'R' is optional, but makes things clearer). This instruction results in the number seven being placed in register R0.

There is a limitation to the number that can be moved into a register in this way. The machine code instruction consists of 32 bits, and only eight of them are available to hold the number, so in general the number itself must fit into eight bits. For example, the instruction

MOV R0,#255

is permitted but

MOV R0,#257

is not.
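The eight-bit rule has one refinement that the text above does not go into: the ARM instruction also carries a four-bit field that rotates the eight-bit number right by an even number of places, which is how MOV R3,#&0100 can appear in the disassembly earlier on. The Python sketch below (an illustration only, not something from the original article; the function names are invented) tests whether a given number can be expressed that way.

```python
def rotate_left32(value, bits):
    """Rotate a 32-bit value left by the given number of bits."""
    value &= 0xFFFFFFFF
    bits %= 32
    return ((value << bits) | (value >> (32 - bits))) & 0xFFFFFFFF


def is_arm_immediate(value):
    """True if value is an 8-bit number rotated right by an even amount."""
    value &= 0xFFFFFFFF
    for rotation in range(0, 32, 2):
        # Undo a rotate-right of `rotation` bits by rotating left.
        if rotate_left32(value, rotation) <= 0xFF:
            return True
    return False


for number in (7, 255, 0x100, 257):
    print(number, is_arm_immediate(number))
```

Numbers such as 7 and 255 fit directly, &100 fits after rotation, and 257 (&101) does not fit because its set bits span nine bit positions.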

The second instruction is, of course, a software interrupt (SWI), used to call the operating system. In Basic, a SWI may be called using the SYS command. As you may know, a command of the form:

SYS "SWI_Name",a%,b%,c% TO x%,y%,z%

puts the values of a%, b% and c% into R0, R1 and R2 respectively, calls the SWI whose name is given, and, on return, puts the values in R0, R1 and R2 into Basic variables x%, y% and z%.

The first two instructions are the equivalent of the Basic command:

SYS "OS_WriteC",7

which is the equivalent of VDU 7, a command to make the machine beep.

If you were to copy Assem02 from the CD onto your hard disc so that you could modify it and change the number after the hash from 7 to 65, you should find that, instead of beeping, a letter 'A' appears on the screen. The number 65 is, of course, the ASCII code for A, and the two instructions are now the equivalent of VDU 65.

Getting back to Basic

The third instruction is the most important of all. It brings the program to an end by making the processor jump back to the Basic interpreter. Without it, the processor would load whatever was in the memory after the last instruction and try to execute it as a machine code instruction: definitely a way of crashing the computer!

When Basic executed the CALL command, it treated your program as a subroutine, getting to it with a Branch Linking or BL instruction. This caused the address in the program counter (i.e. the first instruction to be executed after your program had finished) to be placed into R14 (this is often known as the return address). To get back to this address, we simply have to copy the number in R14 back into R15 which, you will recall, is the program counter.

The assembler recognises "PC" as referring to R15, so we use it to remind ourselves that we are talking about the program counter. The instruction MOV PC,R14 simply copies the contents of R14 into the program counter, which is all that is required to pass control back to Basic.

Labels and loops

In our next experiment, we're going to make use of Basic variables within the assembly language part of the program.

Take a look at file Assem03:

   10 REM > Assem03
   20 REM simple assembly language program
   30 ON ERROR REPORT:PRINT " at line ";ERL:END
   40 DIM code% 24
   50 P%=code%
   60 [OPT 3
   70 MOV R0,#ASC("A")
   80 .loop
   90 SWI "OS_WriteC"
  100 ADD R0,R0,#1
  110 CMP R0,#ASC("Z")+1
  120 BNE loop
  130 MOV PC,R14
  140 ]
  150 REPEAT UNTIL GET
  160 CALL code%

This program works in the same way as the previous one, but we've added some more instructions. The memory block created by the DIM command in line 40 has been enlarged to 24 bytes for this reason. In fact, there's no harm in creating a large block, perhaps of several thousand bytes, while you're experimenting, provided you have the RAM to spare; it avoids the risk of running out of space.

We saw how the previous program could be modified by changing the immediate constant in the MOV instruction from 7 to 65 so that it printed a letter A. Line 70 does the same thing, but the simple number '65' has been replaced by ASC("A"), which means 'the ASCII code for A', to make the listing more readable. The result is just the same, but it makes it easier to follow what the program is doing.

The word loop with a dot in front of it in line 80 is a label. This is a way of marking the point in the program where something occurs so that we could refer to it at some other place. We might use a label in one of two ways:

  1. To mark a point we might wish to branch (i.e. jump) to;
  2. To mark some data we might wish to load into a register, or a place in memory where we might wish to store some data.

In this program, the label is being used as part of a repeated loop, to mark the point where the program jumps back to.

A label with a dot may either occupy a line on its own, as in this case, or be placed in front of an instruction, separated from it by a space.

The label is, in fact, a Basic variable which is created and given the current value of the program counter, P%. Putting a dot in front of it is similar to typing:

loop=P%

(except that you can't do that between the square brackets where the assembler is operating.)

Line 90 operates in the same way as in the previous program, calling the SWI "OS_WriteC" to print a letter A on the screen.

Line 100 introduces a new instruction, ADD, which, not surprisingly, adds two numbers together. It has to be followed by three parameters, referred to as the destination, operand one and operand two, such as:

ADD R0,R1,R2

This instruction would mean, "Take the numbers stored in registers R1 and R2, add them together and store the result in R0."

The destination (R0) and operand one (R1) must be registers. Operand two could be either a register or an immediate constant.

The actual instruction in line 100 has an immediate constant for operand two and the register where the answer is stored is the same as the one where the other number is taken from. There is nothing wrong with this. The instruction means, "Take the number in R0, add 1 to it and put the result back into R0," in other words, "increment the number in R0 by 1."

Comparisons and processor flags

The next instruction, CMP, is a comparison and is followed by two operands. It subtracts operand 2 from operand 1, but doesn't put the result anywhere. Instead, it sets or clears one or more of the processor flags. These are single bits which are actually bits in R15 that are not used by the program counter. There are four of them, each usually referred to by a letter:

Negative (N):   Set if the result of the subtraction is negative, as it is when operand 2 is greater than operand 1
Zero (Z):   Set if operand 2 is equal to operand 1 (i.e. the result of the subtraction is zero)
Carry (C):   Set if operand 1 is greater than or equal to operand 2, treating them as unsigned numbers
Overflow (V):   Set if a mathematical overflow occurred, i.e. the result does not fit into a signed 32-bit number

In fact, various mathematical instructions can set these flags, but they only do so if they have the suffix S on the end of their mnemonic. The CMP instruction doesn't need the S suffix because its only purpose is to set the flags.

When the CMP instruction is executed for the first time, operand 1 (R0) is the ASCII code for 'A' (65) and it subtracts 91, a number one greater than the ASCII code for 'Z', from it. The result is clearly negative, so the N flag would be set. Each time round the loop, however, R0 has been increased by 1, and eventually reaches the value 91. When this happens, the result of the comparison becomes zero, the N flag is clear and the Z flag is set.
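The flag settings described above can be checked with a small model. This Python sketch is only an illustration (the function name cmp_flags is invented, and Python is not part of the original article); it performs the 32-bit subtraction that CMP does and reports the four flags for the first and last trips round the loop.

```python
MASK = 0xFFFFFFFF


def cmp_flags(op1, op2):
    """Return the N, Z, C and V flags that CMP op1,op2 would set."""
    op1 &= MASK
    op2 &= MASK
    result = (op1 - op2) & MASK
    n = bool(result & 0x80000000)      # bit 31 of the result
    z = result == 0
    c = op1 >= op2                     # set when no borrow is needed (unsigned)
    # Signed overflow: the operands differ in sign and the result's sign
    # differs from that of operand 1.
    v = bool((op1 ^ op2) & (op1 ^ result) & 0x80000000)
    return {"N": n, "Z": z, "C": c, "V": v}


print(cmp_flags(65, 91))   # R0 holds ASC("A"): result is negative
print(cmp_flags(91, 91))   # R0 has reached ASC("Z")+1: result is zero
```

Note that the carry flag is set in the second case: a subtraction of equal numbers needs no borrow.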

Branches and conditional execution

Now we get to the instruction in line 120 where a decision is made whether to go back for a repeat of the loop or to plough straight on. This instruction is a branch; a jump forwards or backwards, specified not in terms of the destination address but as the distance moved from where we are now. The number in the program counter is increased or decreased by the number contained in the instruction so that execution continues at a different point. This need not greatly concern us, though, because the assembler works everything out for us. We just have to tell it the address to branch to, which is the value of the label loop.

If we wanted the program to branch every time it reached this point, the instruction would be:

B loop

The other two letters, NE, mean that this instruction is our first example of conditional execution.

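Any instruction can be executed conditionally, the condition being determined by the suffix. There are sixteen possibilities:

EQ:   Equal (Z set)
NE:   Not equal (Z clear)
CS:   Carry set (C set)
CC:   Carry clear (C clear)
MI:   Minus, i.e. negative (N set)
PL:   Plus, i.e. positive or zero (N clear)
VS:   Overflow set (V set)
VC:   Overflow clear (V clear)
HI:   Higher, unsigned (C set and Z clear)
LS:   Lower or same, unsigned (C clear or Z set)
GE:   Greater than or equal, signed (N and V the same)
LT:   Less than, signed (N and V different)
GT:   Greater than, signed (Z clear, and N and V the same)
LE:   Less than or equal, signed (Z set, or N and V different)
AL:   Always
NV:   Never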

Obviously, you never need to use the AL suffix because unconditional execution doesn't need a suffix. The use of NV is also frowned upon because the bit-pattern it sets up might be used for some other condition some day.
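The condition is held in the top four bits of every instruction word, which is why nearly every word in the hex dump at the start of this article begins with the digit E (the code for AL). The Python sketch below is an illustration only, with invented names and using the standard ARM ordering of the sixteen codes; it picks the condition out of a few words taken from that dump.

```python
# The sixteen condition codes in the order of their 4-bit values (0 to 15).
CONDITIONS = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL", "NV"]


def condition_of(word):
    """Return the condition suffix encoded in bits 28 to 31 of a word."""
    return CONDITIONS[(word >> 28) & 0xF]


for word in (0xE92D4000, 0x68BD8000, 0x1AFFFFF6):
    print(f"{word:08X}", condition_of(word))
```

The second word is the one the disassembler showed as LDMVSIA, and the sketch duly reports VS for it.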

The BNE instruction in line 120 means 'Branch if not equal'. As long as the CMP instruction in the previous line is comparing two different numbers, it will keep the Z flag clear. When the value in R0 reaches ASC("Z")+1, i.e. 91, the instruction will be comparing identical numbers, so the Z flag will be set. Under this condition, the branch instruction will not be executed and the program finishes.

We could have achieved the same result with the instruction:

BMI loop

In this case, the loop repeats as long as the N flag is set. When operand 1 reaches the same value as operand 2, not only are they equal but the result of the subtraction is no longer negative. Doing it this way would be safer if R0 was not always incremented by 1 each time. If for some reason R0 skipped the value in operand 2, the BNE instruction would fail to stop the loop because the situation where both values were equal would be missed. The BMI instruction, though, would still catch it.
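To see the difference between the two tests, here is a small Python model of the loop (an illustration only; the function and its parameters are invented, and a cap on the number of iterations stands in for "never stops"). It counts how many times the loop body runs when R0 goes up by one each time, and again when it goes up by three and so steps over 91.

```python
def count_iterations(start, limit, step, use_bmi, max_iterations=1000):
    """Count trips round the loop; None if it has not stopped by the cap."""
    r0 = start
    for iteration in range(1, max_iterations + 1):
        r0 += step                      # ADD R0,R0,#step
        difference = r0 - limit         # CMP R0,#limit
        if use_bmi:
            branch = difference < 0     # BMI: branch while N is set
        else:
            branch = difference != 0    # BNE: branch while Z is clear
        if not branch:
            return iteration
    return None


print(count_iterations(65, 91, 1, use_bmi=False))   # BNE, step of 1
print(count_iterations(65, 91, 1, use_bmi=True))    # BMI, step of 1
print(count_iterations(65, 91, 3, use_bmi=False))   # BNE, step of 3
print(count_iterations(65, 91, 3, use_bmi=True))    # BMI, step of 3
```

With a step of one, both versions stop after 26 trips. With a step of three, the BMI version stops as soon as R0 passes 91, while the BNE version is still going when the cap is reached.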

You've probably worked out by now what this program does, even if you haven't run it to have a look. Each time round the loop, a character is printed whose ASCII code is one higher than the one before. It starts at 'A' and ends with 'Z'. In other words, it prints the alphabet.

Loading and storing memory

So far we've seen how to move immediate numbers into registers and move numbers between registers, but we haven't loaded from or stored to the main memory.

Take a look at file Assem04. It works in a similar manner to the first program, except that the ASCII code used to call the SWI is loaded from a memory location instead of being moved into a register as an immediate constant.

   10 REM > Assem04
   20 REM simple assembly language program
   30 ON ERROR REPORT:PRINT " at line ";ERL:END
   40 DIM code% 100
   50 P%=code%
   60 [OPT 3
   70 .data
   80 EQUD &41
   90 .start
  100 ADR R1,data
  110 LDR R0,[R1]
  120 SWI "OS_WriteC"
  130 MOV PC,R14
  140 ]
  150 REPEAT UNTIL GET
  160 CALL start

The code byte is stored at a location pointed to by the label data, and is put there by the EQUD command. This is one of several assembler commands which put data into memory rather than assemble machine code instructions. The full list is:

EQUB:   stores one byte,   e.g. EQUB &41
EQUW:   stores two bytes,   e.g. EQUW &0D0A
EQUD:   stores four bytes (one word),   e.g. EQUD &56F4D31A
EQUS:   stores a string,   e.g. EQUS "This is a string"

A word of explanation here. These terms were originally devised for the 8-bit 6502 assembler built into the Basic interpreter in the BBC Microcomputer in the early 1980s. In those days, the term 'word' was used to mean 16 bits or two bytes; hence the term EQUW for two bytes. The expression EQUD meant 'double word', or four bytes. When the 32-bit ARM processor was developed, it was decided that it would be better for 'word' to mean four bytes, or 32 bits. The old expressions, however, have been retained for compatibility.

Keeping things aligned

It will have occurred to you that line 80 appears to use EQUD to store one byte. There is a reason for doing it this way. The address pointed to by label data is word-aligned, i.e. divisible by four. If we used the EQUB command to store one byte, the program counter, P%, would be incremented by one, not four, and so would no longer be word-aligned. This is important because it would mean that the machine code instruction in line 100 (and all the subsequent instructions) were not word-aligned and the processor would not cope with this.

Line 80 could have been written as:

EQUB &41
ALIGN

The ALIGN command means, "If P% is not divisible by four, then increment it until it is." It is very important to use it if you add something to the memory which doesn't consist of a multiple of four bytes. We might, for example, wish to put in a string, terminated by a zero:

EQUS "This is a string"
EQUB 0
ALIGN

The string itself contains 16 bytes: the zero increases this to 17. The ALIGN command then adds a further three bytes to restore word-alignment.
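The arithmetic of ALIGN can be sketched in a couple of lines of Python (an illustration only, not part of the original article; the starting address is a made-up word-aligned value and the function name is invented). It uses the 16-character example string from the EQUS entry above, followed by a zero byte.

```python
def align(p):
    """Round p up to the next multiple of four, as ALIGN does to P%."""
    return (p + 3) & ~3


start = 0x8FB8                                   # hypothetical aligned address
after_string = start + len("This is a string") + 1   # string plus zero byte
padding = align(after_string) - after_string

print(after_string - start)    # bytes occupied by the string and its zero
print(padding)                 # bytes skipped by ALIGN
print(align(start) == start)   # an aligned address is left alone
```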

Coming back to the original form of line 80, the expression EQUD &41 is effectively the same as EQUD &00000041. Four-byte numbers are always stored in memory with the least significant byte in the lowest of the four addresses. The byte pointed to by label data will contain &41, and the next three bytes will contain zeros.
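The byte order can be demonstrated with Python's struct module, where the format "<I" means a 32-bit unsigned number stored least significant byte first. This is only an illustration in a different language (the helper name equd is borrowed from the assembler directive for readability, not a real API).

```python
import struct


def equd(value):
    """Return the four bytes that EQUD would store, lowest address first."""
    return struct.pack("<I", value & 0xFFFFFFFF)


print(list(equd(0x41)))          # &41 followed by three zero bytes
print(equd(0x56F4D31A).hex())    # the EQUD example from the list above
```

The first byte of the result is the one at the address of the label, so a byte load from that address picks up &41.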

Memory addressing

The LDR instruction in line 110 loads the contents of a memory location into register R0. It's not possible to put the entire 32-bit address of label data into the instruction; the entire instruction only consists of 32 bits, so we have to do it some other way.

The address of the location to be loaded from is contained within the square brackets. In this case, it's pointed to by R1. It's also possible to add an offset (either a second register or an immediate constant) which comes after the register number but is still within the brackets. We'll see how that is used later.

The form of the instruction in line 110 tells the processor to load from the address pointed to by R1, and to put the contents into R0.

We set up R1 in line 100 with an ADR instruction. You may be wondering how, if we can't get a complete 32-bit address into the LDR instruction, we're able to do it with ADR. The answer is that ADR is actually a pseudo-instruction: one which the assembler effectively creates out of another instruction. In this case, it calculates the difference between the required address and the current value of the program counter and sets up an instruction to add or subtract this difference to or from the PC. In other words, the address is stated relative to the PC, not in absolute terms.

Pipelining

All this is transparent to the programmer when using the ADR instruction except for one thing. There is a limit to how far away from the instruction the target address can be because the offset number can only have eight bits. It's actually possible to refer to an address a little further forward from the ADR instruction than behind it because of a feature called pipelining.

When the processor fetches an instruction from memory, it decodes it while it's fetching the next one and executes it while fetching the one after that. What this means is that, while an instruction is being executed, the program counter has already moved on eight bytes to fetch the instruction after next. If the instruction does something relative to the PC, such as a branch or ADR, it is doing it relative not to its own address but to an address eight bytes further on. The assembler always takes this into account when setting up such instructions, but it's best to bear it in mind.
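The eight-byte allowance can be checked against the hex dump at the start of this article. A branch instruction holds its distance as a number of words in its bottom 24 bits. The Python sketch below (an illustration only, with invented function names) recomputes that field for the BL at &9058, which the disassembler showed as a branch to &90A4, and works out the PC-relative offsets that ADR needs in Assem04 and Assem05.

```python
def branch_offset_field(instruction_address, target_address):
    """Bottom 24 bits of a B or BL instruction: the distance in words."""
    pc = instruction_address + 8        # pipelining: PC is two words ahead
    return ((target_address - pc) >> 2) & 0xFFFFFF


def adr_offset(instruction_address, target_address):
    """Number of bytes that ADR must add to (or subtract from) the PC."""
    pc = instruction_address + 8
    return target_address - pc


# BL at &9058 to &90A4: the dump shows the word EB000011.
print(f"{branch_offset_field(0x9058, 0x90A4):06X}")

# A backward branch gives a negative distance, stored in 24 bits.
print(f"{branch_offset_field(0x90D8, 0x90B8):06X}")

# Assem04: ADR is 4 bytes after the data word, so data is 12 bytes behind PC.
print(adr_offset(4, 0))
# Assem05: the data word is 16 bytes after ADR, so 8 bytes ahead of PC.
print(adr_offset(0, 16))
```

The first result, 000011, matches the bottom six digits of the word EB000011 in the dump, and the second matches the word 1AFFFFF6 stored at &90D8.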

Loading bytes and words

Returning to our program, the LDR instruction always loads a complete word which, in this case, is &00000041. This doesn't matter much in this program because the OS_WriteC SWI only acts on the bottom byte of R0. If we wanted to load a single byte (for example, if we had a string of ASCII codes), we could add a B suffix to the LDR instruction, making it:

LDRB R0,[R1]

Only the bottom eight bits would be loaded, with the rest of the register set to zero. This instruction would work just as well in line 110 as the LDR instruction.

Note, by the way, that the B suffix goes after any condition code on the instruction. If, for example, the above instruction was only to be executed if the zero flag was clear, its mnemonic would be:

LDRNEB R0,[R1]

It should be apparent now how this program works. The address of the data word is set up in R1 and is used to load an ASCII code into R0. The SWI is then called to print a character on the screen and the program then exits.

A problem

The last line of the Basic part of the program has been changed. The part to be executed no longer starts at the beginning of the block at code%, but four bytes further on at the label called start, so that is the address which we CALL. This may be inconvenient. It may seem preferable to put the data label and the ASCII code at the end of the code, as in the file Assem05:

   60 [OPT 3
   70 .start
   80 ADR R1,data
   90 LDR R0,[R1]
  100 SWI "OS_WriteC"
  110 MOV PC,R14
  120 .data
  130 EQUD &41
  140 ]

There is certainly nothing wrong with the assembly language instructions here, but you will find if you try to run Assem05 that you get an error message saying, "Unknown or missing variable at line 80".

It's easy to work out what's going wrong. In line 80, the program has to do something with the value of label data which, you will recall, is a Basic variable. In the previous listing, this variable was created in line 70, given the current value of P% and used in line 100. By the time the program reached line 100, it already knew the value of variable data.

In this program, though, the variable data is used in line 80 but not created until the program reaches line 120. How do we get round this problem?

Two-pass assembly

The answer is to assemble the code twice. On the first 'pass' we create all the instructions but ignore any references to labels we don't know about. The instruction will occupy its four bytes but the numbers it contains may be wrong.

By the time we get to the end of the first pass, we should have met all the labels. We can then go back and assemble the code again, exactly as it was before, except that this time all the references to labels should work (provided, of course, that we don't include a reference to a label that doesn't exist!).
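The idea of the two passes can be modelled in a few lines of Python. This is only a sketch of the principle, not of how the Basic assembler is actually written: the data layout, the function name, the start address of &8000 and the placeholder used for an unknown label on the first pass are all assumptions made for the example.

```python
def assemble(source, origin, strict, labels):
    """Run one pass. `labels` persists between passes, like Basic variables."""
    p = origin
    listing = []
    for item in source:
        if item[0] == "label":
            labels[item[1]] = p          # like .name setting name=P%
            continue
        _, text, ref = item
        if ref is None:
            resolved = None
        elif ref in labels:
            resolved = labels[ref]
        elif strict:
            raise NameError(f"Unknown or missing variable: {ref}")
        else:
            resolved = p                 # assumed placeholder on the first pass
        listing.append((p, text, resolved))
        p += 4                           # every item here occupies one word
    return listing


SOURCE = [
    ("label", "start"),
    ("instr", "ADR R1,data", "data"),    # forward reference
    ("instr", "LDR R0,[R1]", None),
    ("instr", 'SWI "OS_WriteC"', None),
    ("instr", "MOV PC,R14", None),
    ("label", "data"),
    ("instr", "EQUD &41", None),
]

labels = {}
first_pass = assemble(SOURCE, 0x8000, False, labels)   # unknown labels ignored
second_pass = assemble(SOURCE, 0x8000, True, labels)   # unknown labels are errors

print(f"{first_pass[0][2]:X}")    # placeholder value used for data
print(f"{second_pass[0][2]:X}")   # real address of data
```

On the first pass the reference to data is filled with a dummy value; by the second pass the label has been met, so the same line picks up its real address.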

The easiest way to run a piece of code twice in Basic is with a FOR ... NEXT loop, and this is what we do in Assem06:

   10 REM > Assem06
   20 REM simple assembly language program
   30 ON ERROR REPORT:PRINT " at line ";ERL:END
   40 DIM code% 100
   50 FOR pass%=0 TO 3 STEP 3
   60   P%=code%
   70   [OPT pass%
   80   .start
   90   ADR R1,data
  100   LDR R0,[R1]
  110   SWI "OS_WriteC"
  120   MOV PC,R14
  130   .data
  140   EQUD &41
  150   ]
  160 NEXT
  170 REPEAT UNTIL GET
  180 CALL start

Making use of OPT

The assembler has to work differently on the two passes. The way it behaves is controlled by the OPT statement at the start of the assembler section. In all the examples up to now, we've left this set to 3.

The individual bits of the value of OPT control different aspects of the assembler:

Bit 0:   If clear, the assembled listing is not shown on the screen; if set, it is shown.
Bit 1:   If clear, unknown labels are ignored; if set, they cause an error.
Bit 2:   If clear, P% acts as both the program counter and a pointer to where the machine code is assembled. If set, offset assembly is used. P% then acts as the program counter but O% controls where in memory the instruction is placed. Both variables are normally incremented together.
Bit 3:   If set, a range check is applied to ensure that we don't try to assemble more code than will fit into the data block which we created to hold it. We can set L% to the upper limit and assembly will stop if P% (or O%) exceeds it.

In this program, the FOR ... NEXT loop creates two passes, the first with OPT set to zero and the second with it set to 3. On the first pass, the listing is not shown on the screen (we don't want to see it twice!) and unknown labels are ignored. On the second pass, the listing is shown and any references to non-existent labels cause an error.
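The meaning of a particular OPT value can be read off by testing its bits one at a time. The Python sketch below does this for the values used in this article (an illustration only; the function name and the dictionary keys are invented labels for the four bits described above).

```python
def decode_opt(value):
    """Split an OPT value into the four settings described in the text."""
    return {
        "listing_shown": bool(value & 1),             # bit 0
        "unknown_labels_are_errors": bool(value & 2), # bit 1
        "offset_assembly": bool(value & 4),           # bit 2
        "range_check": bool(value & 8),               # bit 3
    }


for opt in (0, 3, 4, 7):
    print(opt, decode_opt(opt))
```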

Note that P% is set to code% inside the loop, so that it is reset at the start of the second pass. It is important for both passes to start in the same place.

This time, the program will work.

If we didn't want to see the assembled listing on either pass, we could, of course, change line 50 to read:

   50 FOR pass%=0 TO 2 STEP 2

Both passes will now have bit 0 clear.
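The other bits combine in the same way. As a rough sketch (this is not one of the supplied files), the fragment below switches on the range check described under bit 3. The pass values 8 and 11 are simply 0 and 3 with bit 3 added, and setting L% to code%+100 assumes the 100-byte block dimensioned in line 40.

```bbcbasic
   40 DIM code% 100
   50 FOR pass%=8 TO 11 STEP 3
   60   P%=code%:L%=code%+100
   70   [OPT pass%
   80   .start
   90   MOV R0,#ASC"A"
  100   SWI "OS_WriteC"
  110   MOV PC,R14
  120   ]
  130 NEXT
  140 CALL start
```

With only three instructions, the code fits comfortably. If it ever grew past L%, assembly should stop with an error instead of running on beyond the end of the block.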

Offset assembly

All the code assembled up to now has been run in the memory buffer where it was assembled. Suppose, though, we wanted to save the assembled code as an Absolute file (such as the !RunImage file of an application). Suppose also that the code contained references to addresses within it in absolute terms, rather than relative to the program counter.

An Absolute file is loaded into memory starting at &8000 and run from there. We would have to assemble the code in the data block, but its contents would have to be as though it started at &8000.

We can do this using offset assembly, with bit 2 of OPT set on both passes, as in Assem07:

   10 REM > Assem07
   20 REM simple assembly language program with offset assembly
   30 ON ERROR REPORT:PRINT " at line ";ERL:END
   40 DIM code% 100
   50 FOR pass%=4 TO 7 STEP 3
   60   P%=&8000:O%=code%
   70   [OPT pass%
   80   .start
   90   ADR R1,data
  100   LDR R0,[R1]
  110   SWI "OS_WriteC"
  120   SWI "OS_Exit"
  130   .data
  140   EQUD &41
  150   ]
  160 NEXT
  170 REPEAT UNTIL GET
  180 OSCLI ("Save MyFile "+STR$~code%+" "+STR$~O%)
  190 *SetType MyFile Absolute

This time, we set P% to &8000 and O% to code%. Watch the assembled listing on the screen as you run the program: instead of the numbers on the left-hand side referring to addresses within Basic's variable workspace, they now start at &8000.

Instead of calling the code and displaying the letter A on the screen, the last part of the program saves the code as a file, after you've pressed a key (you may wish to dispense with line 170). The OSCLI command sets up a command line string of the form:

   Save MyFile xxxx yyyy

where xxxx is the start address (code% in hex form) and yyyy is the address following the end of the program (O% after assembly has finished).

The file will be saved in your currently selected directory and will run if you double-click on it.

There is, incidentally, an important difference between the assembly language in this file and that in the previous one. Because the program is not CALLed from Basic, but run as an Absolute file, it doesn't have a return address passed to it in R14, so it can't finish with MOV PC,R14. Instead, it calls SWI OS_Exit, which passes control straight back to the operating system.

PC-relative addressing

Getting a number into R0 in the previous example involved using a second register, R1. This can be a bit cumbersome; you may not have a register to spare for this job if your program is complex, and it takes two instructions to get a number from memory into a register.

If you've finished with the address once you've loaded the data from it, there's nothing wrong with the following:

   ADR R0,data
   LDR R0,[R0]

R0 is first set up to point to the address. The address is then overwritten by the data itself. This gets rid of the extra register but it still takes two instructions, the first of which sets up the address by referring to the program counter.

We can combine the two instructions into one, which looks like this:

   LDR R0,data

This is really a pseudo-instruction, like ADR. The assembler turns it into an instruction to load from an address relative to the program counter, which involves using indexed addressing; something we shall look at next.

Pre-indexed addressing

As we saw earlier, it's possible to put two parameters between the square brackets in an LDR instruction. The first one (the base) has to be a register, but the second (the offset) may be another register or an immediate constant. The processor adds them together to get the address to load from.

An example:

   LDR R0,[R1,R2]

R0 is loaded from the address obtained by adding the contents of R1 and R2.

We might have a label called data which points to the start of eight bytes of data. We want to load the first four-byte word into R0 and the second into R1 (using R2, say, as the base register):

   ADR R2,data
   LDR R0,[R2]
   LDR R1,[R2,#4]

This is especially useful if we want to load repeatedly from successive addresses, using a loop.

Look at file Assem08:

   70 [OPT pass%
   80 ;set up R1 to point to text string
   90 ADR R1,string
  100 .loop
  110 LDRB R0,[R1];load one character
  120 ADD R1,R1,#1;increment R1 ready for next character
  130 CMP R0,#0;check for terminating zero
  140 ;next two instructions executed only if end of string not yet reached
  150 SWINE "OS_WriteC"
  160 BNE loop
  170 MOV PC,R14
  180 .string
  190 EQUB &0A:EQUS "This is a string":EQUB &0A:EQUB 0:ALIGN
  200 ]

From now on, we'll only show the part of the program between the square brackets which turn the assembler on and off, except where necessary. This is because the Basic parts remain the same as before. One change which has been made starting with Assem08, though, is that the REPEAT UNTIL GET loop has been removed: the program assembles the code and executes it immediately.

You'll also notice that we've started adding comments because the code is getting more complicated.

Printing a string

Returning to our latest listing, this is one of several programs which print a string one character at a time, using a call to SWI OS_WriteC for each character. It's not actually necessary to do this in practice; one call to OS_Write0 will achieve the same result.
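As an aside, here is a sketch of what the single-call version might look like. It is not one of the supplied files; it assumes the same string and label as Assem08 and relies on OS_Write0 expecting R0 to point to a string terminated by a zero byte.

```bbcbasic
   70 [OPT pass%
   80 ADR R0,string ;R0 points to start of text string
   90 SWI "OS_Write0" ;print everything up to the terminating zero
  100 MOV PC,R14
  110 .string
  120 EQUB &0A:EQUS "This is a string":EQUB &0A:EQUB 0:ALIGN
  130 ]
```

The character-by-character version in Assem08 is longer, but it makes a better vehicle for exploring the addressing modes which follow.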

Let's take a look at the string first. This is contained in several statements in line 190, starting with a LF character to create a blank line. The text of the string is in the EQUS statement (you could change it to anything you like!) and is followed by another LF. After this comes a null character (zero) which marks the end of the string. Last of all, we have an ALIGN instruction to ensure that whatever comes next is word-aligned. It's not actually necessary in this case, because nothing follows the string, but it's a good habit to get into.

This version of the program works in the simplest possible way. Register R1 is set up to point to the first character, which is loaded into R0. Note that we use LDRB, not LDR, as we are only loading one eight-bit ASCII character, which goes into the bottom byte of R0. After loading, we increment R1 by one to point to the next character.

We check the character we've just loaded, using the CMP instruction, to see if it is zero. If it is not, we print it and branch back. The instructions in lines 150 and 160 are executed only if the character is not zero, because of the NE suffix on their mnemonics. Once the terminating zero has been loaded, both are skipped, we get to line 170 and the program exits.

This isn't actually indexed addressing; we're just using an address in R1 and incrementing it each time we want to read another character. It's possible that we might want to keep R1 pointing to the start of the string, perhaps so that we can load it again. To see how we could do this, look at file Assem09:

   70 [OPT pass%
   80 ;set up R1 to point to start of text string
   90 ADR R1,string
  100 MOV R2,#0;set up R2 to index first character of string
  110 .loop
  120 LDRB R0,[R1,R2];load one character
  130 ADD R2,R2,#1;increment R2 ready for next character
  140 CMP R0,#0;check for terminating zero
  150 ;next two instructions executed only if end of string not yet reached
  160 SWINE "OS_WriteC"
  170 BNE loop
  180 MOV PC,R14
  190 .string
  200 EQUB &0A:EQUS "This is a string":EQUB &0A:EQUB 0:ALIGN
  210 ]

This time, we set up R1 to point to the start of the string and R2 to select an individual character within the string; we say that R2 indexes a character, starting with the one that's zero bytes in (i.e. the first one).

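In line 120, we load a byte from the address obtained by adding the values of R1 and R2. We then increment R2, ready for the next character, before checking whether the byte we loaded was the terminating zero. R1 is never altered, so it still points to the start of the string when the loop finishes.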

Write back

We can streamline the loop a little by combining the loading and incrementing instructions, using a facility called write back, as shown in listing Assem10:

   70 [OPT pass%
   80 ;set up R1 to point to one byte before start of text string
   90 ADR R1,string-1
  100 .loop
  110 LDRB R0,[R1,#1]!;load one character
  120 CMP R0,#0;check for terminating zero
  130 ;next two instructions executed only if end of string not yet reached
  140 SWINE "OS_WriteC"
  150 BNE loop
  160 MOV PC,R14
  170 .string
  180 EQUB &0A:EQUS "This is a string":EQUB &0A:EQUB 0:ALIGN
  190 ]

Note the pling (!) on the end of the LDRB instruction in line 110. We derive the address to load from by adding the immediate constant (1 in this case) to the value of R1. Because of the pling, this new address (not the data we loaded) is also written back into R1. The effect of this is that R1 is incremented each time the instruction is executed.

Because R1 + 1 points to the next character to be loaded, R1 has to be set up initially to point to one byte before the string starts.

Post-indexed addressing

The examples we've just been looking at used pre-indexed addressing. This means that the two parameters in the LDR (or LDRB) instruction are added together before data is loaded from memory.

An alternative technique is post-indexed addressing, which is used in listing Assem11:

   70 [OPT pass%
   80 ;set up R1 to point to start of text string
   90 ADR R1,string
  100 .loop
  110 LDRB R0,[R1],#1;load one character, then increment R1
  120 CMP R0,#0;check for terminating zero
  130 ;next two instructions executed only if end of string not yet reached
  140 SWINE "OS_WriteC"
  150 BNE loop
  160 MOV PC,R14
  170 .string
  180 EQUB &0A:EQUS "This is a string":EQUB &0A:EQUB 0:ALIGN
  190 ]

The instruction in line 110 still has two parameters in its operand, but one of them is now outside the square brackets. In this case, the data to be loaded is pointed to by R1 on its own, and R1 is incremented by having the second parameter added to it after the loading has been done. There is no pling suffix because write back is implicit in post-indexed addressing.

Storing

We've only seen examples of data being loaded from memory so far. Storing a number is just the same, and is done with an STR instruction to store a word, or STRB to store one byte. The various forms of addressing that we've seen all work in the same way.
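To illustrate, here is a sketch (not one of the supplied files) which copies a string one byte at a time, pairing LDRB with STRB and using post-indexed addressing for both. The label copy and the size of its buffer are assumptions made for this example; five words give 20 bytes, which is enough for the 17 bytes of the string and its terminating zero.

```bbcbasic
   70 [OPT pass%
   80 ADR R1,string ;R1 points to source string
   90 ADR R2,copy ;R2 points to destination buffer
  100 .loop
  110 LDRB R0,[R1],#1 ;load one character, then increment R1
  120 STRB R0,[R2],#1 ;store it, then increment R2
  130 CMP R0,#0 ;was that the terminating zero?
  140 BNE loop ;if not, go round again
  150 MOV PC,R14
  160 .string
  170 EQUS "This is a string":EQUB 0:ALIGN
  180 .copy
  190 EQUD 0:EQUD 0:EQUD 0:EQUD 0:EQUD 0
  200 ]
```

The terminating zero is stored before it is tested, so the copy ends up properly terminated. After CALLing the code, something like SYS "OS_Write0",copy from Basic should print the copied text.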

Subroutines and the stack

If you've programmed in Basic, or any other high-level language, you'll be aware of the advantages of structured programming and the way programs can be broken down into smaller units by using functions and procedures.

The equivalent in assembly language is the subroutine, called using a BL instruction.

Take a look at the following listing. You won't find it as a file to be run from the CD, for a reason which will become apparent shortly.

   10 REM > Assem12a
   20 REM use of subroutine to multiply by six
   30 ON ERROR REPORT:PRINT " at line ";ERL:END
   40 DIM code% 100
   50 FOR pass%=0 TO 3 STEP 3
   60   P%=code%
   70   [OPT pass%
   80   LDR R0,buf;get number passed from Basic via buffer
   90   BL times_six;call subroutine to multiply
  100   STR R0,buf;deposit answer in buffer for Basic to find
  110   MOV PC,R14
  120   ;
  130   ;subroutine to multiply value of R0 by six
  140   .times_six
  150   ADD R0,R0,R0,LSL #1;multiply by three
  160   MOV R0,R0,LSL #1;multiply by two
  170   MOV PC,R14
  180   ;
  190   .buf EQUD 0
  200   ]
  210 NEXT
  220 REPEAT
  230   INPUT "Give me a number "a%
  240   !buf=a%
  250   CALL code%
  260   PRINT !buf
  270 UNTIL FALSE

This is a program to multiply a number, entered by the user, by six and print it on the screen. To avoid writing a long and complicated assembly language program, most of the work is done by Basic and the machine code part just does the multiplication.

The assembled code includes a one-word buffer, pointed to by label buf. Because buf is a Basic variable, it can be used in the Basic part of the program which follows the assembly. After assembling the code, the loop is entered. The number INPUTted is put into the buffer for the machine code to find when it is called. The machine code multiplies the number by six (we'll see how later) and puts the answer back into the buffer for the Basic part to find and print. The program is terminated by pressing Escape.

Looking now at the assembly language, in line 80 the number is loaded into R0 from the buffer, using PC-relative addressing. The following instruction calls the subroutine. You will recall from earlier that the BL instruction causes the address of the next instruction to be copied into R14, known as the link register, so that the program can return to the right point when the subroutine has finished.

Don't worry for now how the subroutine works; we'll look at it later. For now, think of it as a 'black box' which returns a value in R0 six times the original.

When the program returns to line 100, the new value of R0 is stored in the buffer and the machine code part exits, back to Basic.

Have you spotted a problem here? The reason that this particular listing is shown in this article but isn't included in the files to be run is that, if you did run it, your computer would crash.

As far as Basic is concerned, the whole machine code part of the program is a subroutine. When it got to the CALL command, it branched to the assembled code with a BL instruction, putting its return address in R14. We give control back to Basic by copying R14 back into the program counter.

Unfortunately, we've used a BL instruction ourselves in line 90, putting the address of the following STR instruction into R14 and thereby overwriting the return address previously put there by Basic. When we get to the final exit point at line 110, the address of the line 100 instruction will still be there. Instead of handing control back to Basic, the program will jump back to the instruction at line 100 and go into an infinite loop, possibly requiring you to reset your computer.

In effect, we have nested subroutines. We may wish to go further and have subroutines which call other subroutines and so on, so we need a way to store R14 and recover it later. If a subroutine uses other registers, we may wish to store them as well and recover their values when the subroutine finishes.

What we need is a stack: an area of memory used for temporary storage.

What's a stack?

A good analogy of a stack is a column of building blocks with numbers on them. Whenever we want to store a number, we put its block on top of the column. When we take blocks off the top, we take them in the reverse order to the order in which they were put on. If, for example, we placed number x, then y, then z on the column, the first number we retrieved would be z, followed by y, followed by x. For obvious reasons, this is known as a 'LIFO' (Last In, First Out) stack. The height of the column is limited only by the space available to contain it.

In software, our column of blocks is, of course, a section of memory. The stack can grow either upwards from the bottom or downwards from the top. In fact, the analogy starts to break down here because most stacks grow downwards, and if our column of blocks were like a computer stack, it would have to be hanging from the ceiling with new blocks being stuck on the bottom!

The address where new data can be stored on the stack is contained in the stack pointer. It is customary to use R13 for this purpose. It could point to either the first free address or the last address that was used.

If we are running our program from Basic with the CALL command, we can use part of Basic's stack. There should be plenty of it to spare, unless your machine is short on memory and the program only just fits, as it can occupy the space between HIMEM and the top of the variables. If you are writing a program to be run as an Absolute file, like our earlier listing, Assem07, you will have to set up your own block of memory to act as the stack and set R13 to point to it.

The LDM and STM instructions

The LDM and STM instructions load and store multiple registers, using the address in one register as a basis for where to put the data. There is a range of options for how these instructions will work; for example:

   STMIA R4!,{R0-R3}

stores the contents of R0, R1, R2 and R3 in that order, starting with R0 at the address pointed to by R4. The I suffix means that the address is incremented as each register is stored, and the A suffix that it is incremented after each store, rather than before. The final address is written back into R4 because of the !, so R4 ends up pointing to the address one word above where R3 is stored.

To read the data out again, you could use:

   LDMDB R4!,{R0-R3}

This time, the D suffix means that the address is decremented as each register is loaded and the B suffix that this happens before the load.

The list of registers between the curly brackets can include individual registers separated by commas, e.g. {R0,R3,R5}, or a continuous range, using a hyphen as in the example above, or a combination of the two.

Foolproof stacks

There is considerable scope for getting the above instructions wrong! Fortunately, there is a set of pseudo-instructions which can be used in place of them to implement a LIFO stack. Registers can be pushed onto the stack and pulled off again in reverse order.

As we saw earlier, a stack can either start at the bottom of the memory block and work upwards (an ascending stack) or at the top and work downwards (a descending stack). It can be full, where the stack pointer (usually R13) points to the address where the last register was stored; or empty, where R13 points to the first free location.

You can implement these options with the following pseudo-instructions:

STMEA, LDMEA    empty ascending stack
STMED, LDMED    empty descending stack
STMFA, LDMFA    full ascending stack
STMFD, LDMFD    full descending stack (the most commonly used type of stack)

The great advantage of these pseudo-instructions is that the same suffix is used for pushing something onto the stack and for pulling it off again, with just the first two letters of the mnemonic changing. For example, you can push all the registers except the program counter and stack pointer onto the stack with:

   STMFD R13!,{R0-R12,R14}

and pull them off again with:

   LDMFD R13!,{R0-R12,R14}

Now look at file Assem12:

   10 REM > Assem12
   20 REM use of subroutine to multiply by six and use of stack
   30 ON ERROR REPORT:PRINT " at line ";ERL:END
   40 DIM code% 100
   50 FOR pass%=0 TO 3 STEP 3
   60   P%=code%
   70   [OPT pass%
   80   STMFD R13!,{R14};save R14 on stack
   90   LDR R0,buf;get number passed from Basic via buffer
  100   BL times_six;call subroutine to multiply
  110   STR R0,buf;deposit answer in buffer for Basic to find
  120   LDMFD R13!,{R14};restore R14 from stack
  130   MOV PC,R14
  140   ;
  150   ;subroutine to multiply value of R0 by six
  160   .times_six
  170   ADD R0,R0,R0,LSL #1;multiply by three
  180   MOV R0,R0,LSL #1;multiply by two
  190   MOV PC,R14
  200   ;
  210   .buf EQUD 0
  220   ]
  230 NEXT
  240 REPEAT
  250   INPUT "Give me a number "a%
  260   !buf=a%
  270   CALL code%
  280   PRINT !buf
  290 UNTIL FALSE

As you can see, two extra instructions have been added, at lines 80 and 120. The value of R14, passed to our program by Basic, is now protected by being stored on the stack, so it doesn't matter if we use R14 when calling the subroutine, or make any other use of it, for that matter, provided we pull it off the stack again when we've finished.

Although we're only storing one register on the stack, it is still worth using the 'store multiple' and 'load multiple' instructions to do so because they make it easier to control the stack pointer register, R13. Before R14 is pushed onto the stack, R13 is decremented to point to an unused address for it to go into. We might wish to push more registers onto the stack before we pull R14 off again. They would go into addresses below the one where we just pushed R14, and should be pulled off again (in reverse order) before we pull R14 off at line 120.

In this example, we restore the value of R14, then transfer it to the program counter in line 130 to return to Basic. It's not really necessary to do this as two separate steps; there is no reason why we couldn't pull the return address off the stack and put it straight into the program counter, instead of going via R14. Lines 120 and 130 could therefore be combined into:

   LDMFD R13!,{PC}

Repeating the earlier example, we can save all the registers except the program counter and stack pointer at the start of a subroutine with:

   STMFD R13!,{R0-R12,R14}

and restore them and return to the main program with:

   LDMFD R13!,{R0-R12,PC}
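Earlier, we said that a program run as an Absolute file has to provide its own stack. The following is a minimal sketch of how that might be done, combined with a subroutine which calls another subroutine. It is not one of the supplied files: the name StackDemo, the labels and the eight-word stack size are all assumptions chosen for illustration, and the layout follows Assem07.

```bbcbasic
   10 REM > StackDemo (hypothetical example, not on the CD)
   20 DIM code% 256
   30 FOR pass%=4 TO 7 STEP 3
   40   P%=&8000:O%=code%
   50   [OPT pass%
   60   .start
   70   ADR R13,stack_top ;full descending stack, so start at the top
   80   MOV R0,#ASC"A"
   90   BL print_twice ;call outer subroutine
  100   SWI "OS_Exit"
  110   ;
  120   ;subroutine to print character in R0 twice
  130   .print_twice
  140   STMFD R13!,{R14} ;save return address before using BL again
  150   BL print_once
  160   BL print_once
  170   LDMFD R13!,{PC} ;pull return address straight into PC
  180   ;
  190   ;inner subroutine, calls nothing so R14 is safe
  200   .print_once
  210   SWI "OS_WriteC"
  220   MOV PC,R14
  230   ;
  240   ;eight words reserved for the stack
  250   .stack_base
  260   EQUD 0:EQUD 0:EQUD 0:EQUD 0
  270   EQUD 0:EQUD 0:EQUD 0:EQUD 0
  280   .stack_top
  290   ]
  300 NEXT
  310 OSCLI ("Save MyFile "+STR$~code%+" "+STR$~O%)
  320 *SetType MyFile Absolute
```

Because the stack is full and descending, R13 starts at the address just above the reserved block and is decremented before anything is stored. Only print_twice needs to save R14, as it is the only routine which uses BL itself.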

Shifting bits sideways

Now we'll examine the subroutine in listing Assem12, which multiplies the number in R0 by six.

All the work is done in lines 170 and 180. The first line looks very elaborate:

   ADD R0,R0,R0,LSL #1

Basically, this is an instruction to add R0 (as operand one) to R0 (as operand two) and put the result in R0 (as the destination register); in other words, to double the value of R0. There's an extra bit on the end of the instruction, though: the LSL #1 part.

LSL stands for 'Logical Shift Left', and means that all the bits of operand two are shifted to the left, in this case by one place, replacing the lowest bit with zero. The equivalent of this instruction in Basic would be:

   R0=R0+(R0<<1)

The effect of shifting all the bits in operand two by one place to the left in binary arithmetic is, of course, to double the number. Adding it to operand one has the overall effect of multiplying the value in R0 by three.

The second line is a bit simpler:

   MOV R0,R0,LSL #1

This instruction simply replaces the value in R0 by itself, but shifted one place to the left and thus doubled, as in:

   R0=R0<<1

in Basic.

So the effect of the two instructions is to multiply the value of R0 by six.

There are other types of shift:

LSR ('Logical Shift Right'): all the bits are shifted to the right, the highest bit(s) being replaced by zeros.

ASR ('Arithmetic Shift Right'): like LSR except that the highest bit is replaced by whatever was there before (0 or 1). This is to preserve the sign of signed numbers.

In the shifts listed so far, the last bit which 'falls out of the end' of the register can be moved into the carry flag; this happens when the instruction is told to update the flags, as with MOVS.

ROR ('ROtate Right'): the bits are shifted to the right, and bit 0 is copied into bit 31.

RRX ('Rotate Right eXtended'): the same as ROR except that it moves the bits by one place only and the carry flag acts as an extra bit.

The shift instruction may be followed by either an immediate constant, as in listing Assem12, or a register which contains the number of positions to be shifted.
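The same technique works for other small multipliers. As a sketch (not one of the supplied files), this fragment multiplies by ten instead of six. It assumes the same buf label and the same surrounding Basic as Assem12, and as it calls no subroutine it has no need of the stack.

```bbcbasic
   70 [OPT pass%
   80 LDR R0,buf ;get number passed from Basic via buffer
   90 ADD R0,R0,R0,LSL #2 ;R0 plus four times R0, i.e. multiply by five
  100 MOV R0,R0,LSL #1 ;multiply by two, making ten in all
  110 STR R0,buf ;deposit answer in buffer for Basic to find
  120 MOV PC,R14
  130 .buf EQUD 0
  140 ]
```

Entering 7, for example, gives 7 + 28 = 35 after line 90 and 70 after line 100. To shift by a variable amount, the constant can be replaced by a register, as in MOV R0,R0,LSL R1.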

Further reading

There are 25 different ARM instructions and we've covered roughly ten so far. The remainder consist chiefly of arithmetic instructions; as well as ADD, there are ADC (add with carry), SUB (subtract), MUL (multiply) and several others. There are also several bit-manipulation instructions (AND, ORR, EOR etc.) which are similar to their Basic equivalents.

A full list of these instructions can be found in Guttorm Vik's StrongHelp assembly language manual, which is an excellent reference source for this subject.

In a future article, we'll be looking at assembly language in action by examining the source code for the IClear module. This module enables the text in a writable icon to be cleared and replaced by new text by double-clicking on the icon and typing a new character. We'll be looking at an upgraded version of IClear which will be published here for the first time.


StrongHelp assembly language manual

If you have not already installed StrongHelp on your system, then you will need to do so in order to access the manual provided here. Use the icons to the right either to access StrongHelp for installation onto your computer, or to run a copy directly from the CD.

The icons to the right access the StrongHelp Assembly manual. If your system has already 'seen' a copy of StrongHelp, clicking the 'Run' icon will launch the manual into it; otherwise, clicking the icon will produce an error message.