RISCOS.com

www.riscos.com Technical Support:
Acorn C/C++

 

Portability


The C programming language has gained a reputation for being portable across machines, while still providing capabilities at a machine-specific level. The fact that a program is written in C by no means indicates the effort required to port software from one machine to another, or indeed from one compiler to another. Obviously the most time-consuming task is porting between two entirely different hardware environments, running different operating systems with different compilers. Since many users of the Acorn C compiler will find themselves in this situation, this chapter deals with a number of issues you should be aware of when porting software to or from our environment. The chapter covers the following:

  • general portability considerations
  • major differences between ANSI C and the well-known 'K&R' C as defined in the book The C Programming Language (first edition) by Kernighan and Ritchie
  • using the Acorn C compiler in 'pcc' compatibility mode
  • environmental aspects of portability.

General portability considerations

If you intend your code to be used on a variety of different systems, there are certain aspects which you should bear in mind in order to make porting an easy and relatively error-free process. It is essential to single out items which may make software system-specific, and to employ techniques to avoid non-portable use of such items. In this section, we describe general portability issues for C programs.

Fundamental data types

The size of fundamental data types such as char, int, long int, short int and float will depend mainly on the underlying architecture of the machine on which the C program is to run. Compiler writers usually implement these types in a manner which best fits the architectures of machines for which their compilers are targeted. For example, Release 5 of the Microsoft C Compiler has int, short int and long int occupying 2, 2 and 4 bytes respectively, where the Acorn C Compiler uses 4, 2 and 4 bytes. Certain relations are guaranteed by the ANSI C Standard (such as the fact that the size of long int is at least that of short int), but code which makes any assumptions regarding implementation-defined issues such as whether int and long int are the same size will not be maximally portable.

A common non-portable assumption is embedded in the use of hexadecimal constant values. For example:

 int i;
 i = i & 0xfffffff8; /* set bottom 3 bits to zero, assuming 32-bit int */

Such non-portability can be avoided by using:

 int i;
 i = i & ~0x07; /* set bottom 3 bits to zero, whatever sizeof(int) */

If you find that some size assumptions are inevitable, then at least use a series of assert calls when the program starts up, to indicate any conditions under which successful operation is not guaranteed. Alternatively, write macros for frequently-used operations so that size assumptions are localised and can be altered locally.
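
Here is a minimal sketch of both suggestions. It assumes a hypothetical program that needs 8-bit bytes and ints of at least 32 bits. check_size_assumptions(), INT_BITS and CLEAR_LOW_BITS are illustrative names, not library facilities. CLEAR_LOW_BITS generalises the ~0x07 idiom above to any number of bits. Bear in mind that assert() expands to nothing when NDEBUG is defined.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical macro localising a size assumption: bits in an int,
   computed from the implementation's own limits. */
#define INT_BITS ((int)(sizeof(int) * CHAR_BIT))

/* Hypothetical macro: clear the bottom n bits of x, whatever sizeof(int).
   CLEAR_LOW_BITS(i, 3) is equivalent to i & ~0x07. */
#define CLEAR_LOW_BITS(x, n) ((x) & ~((1 << (n)) - 1))

/* Call once at start-up. Each assert documents one assumption that the
   rest of this (hypothetical) program relies on. */
void check_size_assumptions(void)
{
    assert(CHAR_BIT == 8);              /* 8-bit bytes */
    assert(sizeof(int) >= 4);           /* int holds at least 32 bits */
    assert(sizeof(long) >= sizeof(int));
    assert(INT_BITS >= 32);
}
```

Calling check_size_assumptions() at the start of main() makes an unsuitable target fail loudly instead of misbehaving later.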

Byte ordering

A highly non-portable feature of many C programs is the implicit or explicit exploitation of byte ordering within a word of store. Such assumptions tend to arise when copying objects word by word (rather than byte by byte), when inputting and outputting binary values, and when extracting bytes from or inserting bytes into words using a mix of shift-and-mask and byte addressing. A contrived example is the following code. It copies individual bytes from the int variable w into the int variable a, through the char pointer p, and stops after it has copied a null byte. The code assumes that w does contain a null byte; otherwise it writes beyond the end of a.

int a;
char *p = (char *)&a;
int w = AN_ARBITRARY_VALUE;

for (;;)
{
  if ((*p++ = w) == 0) break;
  w >>= 8;
}

This code only works as intended on a machine with little-endian byte-sex (one that stores the least significant byte of a word at the lowest address), so it is not portable. The best solution to such problems is either to write code which does not rely on byte-sex, or to write separate code for each byte-sex and compile the correct variant conditionally, depending on your target machine architecture.
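
One way to avoid relying on byte-sex is to build and take apart multi-byte values with shifts and masks only, and never through a char pointer into an int. The sketch below fixes an external little-endian layout for 32-bit values; the function names are hypothetical. It behaves identically on either kind of machine, so it could also serve as a portable external representation for binary files. host_is_little_endian() is included for cases where byte-sex-specific variants really are needed.

```c
/* Store the low 32 bits of v into buf[0..3], least significant byte
   first, whatever the byte order of the host machine. */
void put_le32(unsigned char *buf, unsigned long v)
{
    buf[0] = (unsigned char)(v & 0xff);
    buf[1] = (unsigned char)((v >> 8) & 0xff);
    buf[2] = (unsigned char)((v >> 16) & 0xff);
    buf[3] = (unsigned char)((v >> 24) & 0xff);
}

/* Reassemble a value written by put_le32(). */
unsigned long get_le32(const unsigned char *buf)
{
    return (unsigned long)buf[0]
         | ((unsigned long)buf[1] << 8)
         | ((unsigned long)buf[2] << 16)
         | ((unsigned long)buf[3] << 24);
}

/* Run-time test of the host's byte-sex: examines the first byte of an
   unsigned int holding 1. Returns 1 on little-endian hosts, else 0. */
int host_is_little_endian(void)
{
    unsigned int one = 1;
    return *(unsigned char *)&one == 1;
}
```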

Store alignment

The only guarantee given in the ANSI C Standard regarding alignment of members of a struct is that a 'hole' (caused by padding) cannot exist at the beginning of the struct. The values of 'holes' created by alignment restrictions are unspecified, and you should not make assumptions about these values. In particular, two structures with identical members, each having identical values, are only guaranteed to compare equal if a field-by-field comparison is used; a byte-by-byte or word-by-word comparison may not indicate equality.
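
The sketch below shows the difference using a hypothetical struct rec, where padding is likely (but not guaranteed) to follow the char member. rec_equal() compares member by member and is portable. rec_equal_bytes() uses memcmp() and may report two records with equal members as different, because the padding bytes hold unspecified values.

```c
#include <string.h>

struct rec {
    char tag;    /* padding bytes may follow this member */
    int  value;
};

/* Portable: compares the members only, so padding is ignored. */
int rec_equal(const struct rec *a, const struct rec *b)
{
    return a->tag == b->tag && a->value == b->value;
}

/* Not portable: also compares any padding bytes, whose values are
   unspecified, so records with equal members may compare unequal. */
int rec_equal_bytes(const struct rec *a, const struct rec *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}
```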

This may also have implications for the size requirements of large arrays of structs. Given the following declarations:

#define ARRSIZE 10000
typedef struct
{
  int i;
  short s;
} ELEM;
ELEM arr[ARRSIZE];

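this may require significantly different amounts of store under, say, a compiler which aligns ints on even (2-byte) boundaries, as opposed to one which aligns them on word (4-byte) boundaries. With 4-byte ints and 2-byte shorts, each ELEM typically occupies 6 bytes in the first case and 8 bytes in the second, so arr needs 60000 or 80000 bytes.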
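
Code can let sizeof do this arithmetic instead of assuming a particular layout. The sketch below repeats the declarations above; elem_array_bytes() is a hypothetical helper. sizeof(ELEM) includes any padding added to keep the members aligned, so the result is correct under either alignment rule, even though its value differs between compilers.

```c
#include <stddef.h>

#define ARRSIZE 10000

typedef struct
{
    int   i;
    short s;
} ELEM;

/* Store needed for ARRSIZE elements. sizeof(ELEM) already includes any
   trailing padding, so no alignment rule is hard-coded here. */
size_t elem_array_bytes(void)
{
    return sizeof(ELEM) * ARRSIZE;
}
```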

Pointers and pointer arithmetic

A deficiency of the original definition of C, and of its subsequent use, has been the relatively unrestrained interchange of pointers to different data types, and of pointers and integers (int or long). Much existing code assumes that a pointer can safely be held in a long int or int variable. While such an assumption may indeed be true in many implementations on many machines, it is a highly non-portable feature on which to rely.

This problem is compounded when taking the difference of two pointers by subtraction. If the result is stored in an int or long, a large difference may not fit. For this purpose, ANSI C defines ptrdiff_t (in <stddef.h>), the signed integral type of the result of subtracting two pointers of the same type. Such a subtraction is only meaningful when both pointers point into the same array (or one past its end).
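
Here is a short sketch that keeps the difference between two pointers into the same array in a ptrdiff_t, rather than in an int or long. find_index is a hypothetical helper.

```c
#include <stddef.h>

/* Returns the index of the first element equal to key in arr[0..n-1],
   or -1 if there is none. p - arr has type ptrdiff_t. */
ptrdiff_t find_index(const int *arr, size_t n, int key)
{
    const int *p;

    for (p = arr; p < arr + n; p++)
        if (*p == key)
            return p - arr;   /* both pointers point into the same array */
    return -1;
}
```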

Function argument evaluation

Whilst the operands of operators such as && and || are defined to be evaluated strictly left-to-right (including all side-effects), the same does not apply to function arguments. For example, in the function call f(i, i++);, the Standard does not say whether the post-increment of i happens before or after i is read for the first argument. Because i is both read and modified without an intervening sequence point, the behaviour is in fact undefined. In any case, this is an unwise form of statement, since it may be decided later to implement f as a macro instead of a function.
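
Suppose the intended meaning of f(i, i++) is "pass the old value of i twice, then increment i". The call can then be written so that every step happens in a known order. In this sketch f is a hypothetical stand-in for any two-argument function, and call_f_portably is an illustrative name. The rewrite stays correct if f later becomes a macro.

```c
/* Hypothetical two-argument function standing in for f. */
int f(int a, int b)
{
    return a * 10 + b;
}

/* Each argument is computed in its own statement, so the order of the
   read and the increment is fixed. */
int call_f_portably(int *ip)
{
    int first = *ip;        /* old value of i for the first argument */
    int second = (*ip)++;   /* old value again; i is incremented here */
    return f(first, second);
}
```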

System-specific code

The direct use of operating system calls is, as you would expect, non-portable. If you use code which is obviously targeted at a particular environment, document it clearly as such. Preferably, isolate it into a system-specific module, which is then the only part that needs to be modified when porting to a new machine or operating system. Pathnames of system files should be #defined and not hard-coded into the program, and, as far as possible, all processing of filenames should be made easy to modify. Many file operations can be written in terms of the ANSI input/output library functions, which will make an application more portable. Binary data files are inherently non-portable, and the only solution to this problem may be the use of some portable external representation.
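
Here is a minimal sketch of isolating a system-specific pathname. It assumes a build-time macro TARGET_RISCOS, which is a hypothetical name and is not predefined by any compiler. The file locations are also hypothetical. Only this module needs to change on a new system, and it opens the file with the ANSI fopen() rather than an operating system call.

```c
#include <stdio.h>

/* TARGET_RISCOS is assumed to be defined by the build (e.g. on the
   compiler command line); it is not a predefined macro. */
#ifdef TARGET_RISCOS
#define CONFIG_FILE "<MyApp$Dir>.Config"   /* hypothetical RISC OS path */
#else
#define CONFIG_FILE "/etc/myapp.conf"      /* hypothetical UNIX path */
#endif

/* The only function that knows where the configuration lives.
   Returns NULL if the file cannot be opened. */
FILE *open_config(void)
{
    return fopen(CONFIG_FILE, "r");
}
```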

ANSI C vs K&R C

The ANSI C Standard has succeeded in tightening up many of the vague areas of K&R C. This results in a much clearer definition of a correct C program. However, if programs have been written to exploit particular vague features of K&R C, then their authors may find surprises when porting to an ANSI C environment. In the following sections, we present a list of what we consider to be the major differences between ANSI and K&R C. These differences are at the language level, and we defer discussion of library differences until a later section. The order of this list approximately follows the relevant parts of the ANSI C Standard.

Lexical elements

The ordering of phases of translation is well-defined. Of special note is the preprocessor, which is conceptually token-based and so does not yield the same results as might naively be expected from pure text manipulation.

The following new keywords and other changes have been introduced:

  • The type qualifier volatile which means that the object may be modified in ways unknown to the implementation, or have other unknown side effects. Examples of objects correctly described as volatile include device registers, semaphores and flags shared with asynchronous signal handlers. In general, expressions involving volatile objects cannot be optimised by the compiler.
  • The type qualifier const which indicates that a variable's value should not be changed.
  • The type specifier void to indicate a non-existent value for an expression.
  • The type void *, a generic pointer type: any pointer to an object can be assigned to a void * and back again without loss of information.
  • The type specifier signed, which makes any integral type explicitly signed (for example, signed char).
  • Each struct and union has its own distinct name space for its members.
  • There is a new floating-point type long double.
  • The K&R C practice of using long float to denote double is now outlawed in ANSI C.
  • Suffixes U and L (or u and l), can be used to explicitly denote unsigned and long constants (eg. 32L, 64U, 1024UL etc).
  • The digits 8 and 9 can no longer be used in octal constants (previously, for example, 08 and 09 were treated as octal 10 and 11 respectively).
  • Literal strings are to be considered as read-only, and identical strings may be stored as one shared version (as indeed they are, in the Acorn C Compiler). For example, given:

       char *p1 = "hello";
       char *p2 = "hello";

    p1 and p2 will point at the same store location, where the string hello is held. Programs should not therefore modify literal strings.

  • Variadic functions (ie those which take a variable number of arguments) are declared explicitly using an ellipsis (...). For example, int printf(const char *fmt, ...);
  • Empty comments /**/ are replaced by a single space (use the preprocessing operator ## to do token-pasting if you previously used /**/ for this).

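
To illustrate the ellipsis notation, here is a minimal variadic function using the ANSI <stdarg.h> macros va_start, va_arg and va_end. sum_ints is a hypothetical example. It relies on the caller passing count followed by exactly that many int arguments.

```c
#include <stdarg.h>

/* Returns the sum of the count int arguments that follow count. */
int sum_ints(int count, ...)
{
    va_list ap;
    int total = 0;

    va_start(ap, count);   /* start after the last named parameter */
    while (count-- > 0)
        total += va_arg(ap, int);
    va_end(ap);
    return total;
}
```
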
Conversions

ANSI C uses value-preserving rules for arithmetic conversions (whereas K&R C implementations tend to use unsigned-preserving rules). Thus, for example:

int f(int x, unsigned char y)
{
  return (x+y)/2;
}

does signed division, where unsigned-preserving implementations would do unsigned division.
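
The sketch below makes the difference visible. f is the function above. f_unsigned_preserving is a hypothetical function that applies explicitly the conversions an unsigned-preserving compiler would perform. With x = -3 and y = 1, f returns -1. The unsigned version returns a large positive value, because -3 has been converted to unsigned int.

```c
/* ANSI (value-preserving): y is promoted to int, so the sum and the
   division are signed. */
int f(int x, unsigned char y)
{
    return (x + y) / 2;
}

/* What an unsigned-preserving compiler would compute: y becomes
   unsigned int, so x is converted to unsigned int as well. */
unsigned int f_unsigned_preserving(int x, unsigned char y)
{
    return ((unsigned int)x + (unsigned int)y) / 2u;
}
```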

Aside from value-preserving rules, arithmetic conversions follow those of K&R C, with additional rules for long double and unsigned long int. It is now also possible to perform float arithmetic without widening to double. Floating-point values truncate towards zero when they are converted to integral types.

It is illegal to assign function pointers to data pointers and vice versa, and the Standard does not define such conversions even when explicit casts are used. The only exception is the null pointer constant 0, as in:

int (*pfi)();
pfi = 0;

Assignment compatibility between structs and unions is now stricter. For example, consider the following:

struct {char a; int b;} v1;
struct {char a; int b;} v2;
v1 = v2; /* illegal because v1 and v2 
            strictly have different types*/
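
If the two variables are meant to be assignable, declare them with a common tag. This gives them the same type. struct pair is a hypothetical name:

```c
/* Both variables have type struct pair, so assignment is legal. */
struct pair { char a; int b; };

struct pair v1, v2;

void copy_pair(void)
{
    v1 = v2;   /* copies every member of v2 into v1 */
}
```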

Expressions
  • structs and unions may be passed by value as arguments to functions.
  • Given a pointer to function declared as, say, int (*pfi)();, then the function to which it points can be called either by pfi(); or (*pfi)();.
  • Due to the use of distinct name spaces for struct and union members, absolute machine addresses must be explicitly cast before being used as struct and union pointers. For example:

    ((struct io_space *)0x00ff)->io_buf;

Declarations

Perhaps the greatest impact of the ANSI Standard on C has been the adoption of function prototypes. A function prototype declares the return type and argument types of a function. For example, int f(int, float); declares a function returning int with one int and one float argument. A function's argument types are therefore part of the type of that function, which allows stricter argument type-checking, especially across source files. A new-style function definition (which also serves as a prototype) is similar, except that identifiers must be given for the arguments and a body follows. For example: int f(int i, float x) { /* body */ }. It is still possible to use 'old style' function declarations and definitions, but you are advised to convert to the 'new style'. It is also possible to mix old and new styles of function declaration. If the function declaration in scope is an old style one, the integral promotions are performed on integral arguments, and floats are converted to double. If the function declaration in scope is a new style one, arguments are converted as in normal assignment statements.
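
Here is a sketch of a prototype and a matching new-style definition; scale is a hypothetical function. Because the prototype is in scope, a call such as scale(3, 1) converts the int argument 1 to float as if by assignment. A call with the wrong number of arguments is diagnosed at compile time.

```c
/* Prototype, as it might appear in a header: the argument types are
   part of the function's type. */
int scale(int n, float factor);

/* New-style definition; it must agree with the prototype. */
int scale(int n, float factor)
{
    return (int)(n * factor);   /* conversion to int truncates towards zero */
}
```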

Empty declarations are now illegal.

Arrays cannot be defined to have zero or negative size.

Statements
  • ANSI has defined the minimum attributes of control statements (eg the minimum number of case limbs which must be supported by a compiler). These values are almost invariably greater than those supported by PCCs, and so should not present a problem.
  • A value returned from main() is guaranteed to be used as the program's exit code.
  • Values used in the controlling expression and case labels of a switch can be of any integral type.

Preprocessor
  • Preprocessor directives cannot be redefined.
  • There is a new ## operator for token-pasting, used in macro replacement lists.
  • The # operator, applied to a macro parameter in a replacement list, produces a string literal from the corresponding argument. This is useful where you previously relied on macro arguments being replaced inside string literals, which ANSI C does not do.
  • The order of the phases of translation is well defined; the preprocessing phases are as follows:
    1. Map source file characters to the source character set (this includes replacing trigraphs).
    2. Delete each backslash that is immediately followed by a newline, together with that newline, so that the two physical lines are joined.
    3. Divide the source file into preprocessing tokens and sequences of white space characters (comments are replaced by a single space).
    4. Execute preprocessing directives and expand macros.

    Any #include files are passed through steps 1-4 recursively.

    The macro __STDC__ is #defined to 1 in ANSI-conforming compilers.
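
The sketch below shows the # and ## operators and a test of __STDC__. All macro names are hypothetical. XSTR uses a second level of macro so that its argument is expanded before it is turned into a string.

```c
#include <stdio.h>

/* # turns a macro argument into a string literal. */
#define SHOW_INT(expr) printf("%s = %d\n", #expr, (expr))

/* ## pastes two tokens into one, replacing the K&R empty-comment trick. */
#define MAKE_NAME(prefix, n) prefix##n

#define STR(x)  #x
/* The argument of XSTR is macro-expanded first, then stringized. */
#define XSTR(x) STR(x)

#if defined(__STDC__) && __STDC__ == 1
#define COMPILER_IS_ANSI 1
#else
#define COMPILER_IS_ANSI 0
#endif
```

For example, SHOW_INT(2 + 3); prints the text of the expression followed by its value, and int MAKE_NAME(var, 1) = 5; declares a variable called var1.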

The ToPCC and ToANSI tools

The desktop tools ToPCC and ToANSI help you to translate C programs and headers between the ANSI and PCC dialects of C. For more details of their use and capabilities see the earlier chapters ToANSI and ToPCC.

pcc compatibility mode

This section discusses the differences apparent when the compiler is used in 'PCC' mode. When the UNIX pcc setup option is enabled, the C compiler will accept (Berkeley) UNIX-compatible C, as defined by the implementation of the Portable C Compiler and subject to the restrictions which are noted below.

In essence, PCC-style C is K&R C, as defined by B Kernighan and D Ritchie in their book The C Programming Language, with a small number of extensions and clarifications of language features that the book leaves undefined.

Language and preprocessor compatibility

In UNIX pcc mode, the Acorn C compiler accepts K&R C, but it does not accept many of the old-style compatibility features, the use of which has been deprecated and warned against for many years. Differences are listed briefly below:

  • Compound assignment operators where the = sign comes first are accepted (with a warning) by some PCCs. An example is =+ instead of +=. Acorn C does not allow this ordering of the characters in the token.
  • The = sign before a static initialiser was not required by some very old C compilers. Acorn C does not support this syntax.
  • The following very peculiar usage is found in some UNIX tools pre-dating UNIX Version 7:

     struct {int a, b;};
     double d;
    
     d.a = 0;
     d.b = 0x....;

    This is accepted by some UNIX PCCs and may cause problems when porting old (and badly written) code.

  • enums are less strongly typed than is usual under PCCs. enum is a non-K&R extension to C which has been standardised by ANSI somewhat differently from the usual PCC implementation.
  • chars are signed by default in UNIX pcc mode.
  • In UNIX pcc mode, the compiler permits the use of the ANSI '...' notation which signifies that a variable number of formal arguments follow.
  • In order to cater for PCC-style use of variadic functions, a version of the PCC header file varargs.h is supplied with the release.
  • With the exception of enums, the compiler's type checking is generally stricter than PCC's - much more akin to lint's, in fact. In writing the Acorn C compiler, we have attempted to strike a balance between generating too many warnings when compiling known, working code, and warning of poor or non-portable programming practices. Many PCCs silently compile code which has no chance of executing in just a slightly different environment. We have tried to be helpful to those who need to port C among machines in which the following varies:
    • the order of bytes within a word (eg little-endian ARM, VAX, Intel versus big-endian Motorola, IBM370)
    • the default size of int (four bytes versus two bytes in many PC implementations)
    • the default size of pointers (not always the same as int)
    • whether values of type char default to signed or unsigned char
    • the default handling of undefined and implementation-defined aspects of the C language.

If the verbosity of CC in UNIX pcc mode is found undesirable, all warnings and/or errors can be turned off using the Suppress warnings and/or Suppress errors setup options.

  • The compiler's preprocessor is believed to be equivalent to UNIX's cpp, except for the points listed below. Unfortunately, cpp is only defined by its implementation, and although equivalence has been tested over a large body of UNIX source code, completely identical behaviour cannot be guaranteed. Some of the points listed below only apply when the Preprocess only option is used with the CC tool.
    • There is a different treatment of whitespace sequences (benign).
    • A backslash-newline sequence is processed (the two lines are joined) by CC with Preprocess only enabled, but is passed through unchanged by cpp (so CC's output lines may be longer than expected).
    • Cpp breaks long lines at a token boundary; CC with Preprocess only enabled doesn't (this may break line-size constraints when the source is later consumed by another program).
    • The handling of unrecognised # directives is different (this is mostly benign).

Standard headers and libraries

Use of the compiler in UNIX pcc mode precludes neither the use of the standard ANSI headers built in to the compiler nor the use of the run-time library supplied with the C compiler. Of course, the ANSI library does not contain the whole of the UNIX C library, but it does contain almost all the commonly used functions. However, look out for functions with different names, or a slightly different definition, or those in different 'standard' places. Unless the user directs otherwise using Default path, the C compiler will attempt to satisfy references to, say, <stdio.h> from its in-store filing system.

Listed below are a number of differences between the ANSI C Library, and the BSD UNIX library. They are placed under headings corresponding to the ANSI header files:

ctype.h

There are no isascii() and toascii() functions, since ANSI C is not character-set specific.

errno.h

On BSD systems, sys_nerr and sys_errlist[] are defined to give error messages corresponding to error numbers. ANSI C does not have these, but provides similar functionality via perror(const char *s), which displays the string pointed to by s followed by a system error message corresponding to the current value of errno.

There is also char *strerror(int errnum) which, when given a purported value of errno, returns its textual equivalent.
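
Here is a sketch of reporting errors in the ANSI way, instead of indexing sys_errlist[] with errno. parse_long and error_text are hypothetical helpers. They are built on strtol(), which sets errno to ERANGE on overflow, together with perror() and strerror().

```c
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Parse a whole string as a decimal long. Returns 0 on success, or -1
   on a range error (reported with perror) or if s is not a number. */
int parse_long(const char *s, long *out)
{
    char *end;

    errno = 0;
    *out = strtol(s, &end, 10);
    if (errno != 0) {
        perror(s);   /* writes s, ": " and the errno message to stderr */
        return -1;
    }
    return (end != s && *end == '\0') ? 0 : -1;
}

/* The same message text as a string, e.g. for writing to a log file. */
const char *error_text(int errnum)
{
    return strerror(errnum);
}
```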

math.h

The #defined value HUGE, found in BSD libraries, is called HUGE_VAL in ANSI C. ANSI C does not have asinh(), acosh(), atanh().

signal.h

In ANSI C the signal() function's prototype is:

extern void (*signal(int, void(*func)(int)))(int);

signal() therefore expects its second argument to be a pointer to a function returning void with one int argument. In BSD-style programs it is common to use a function returning int as a signal handler. The PCC-style function definitions shown below will therefore produce a compiler warning about an implicit cast between different function pointers (since f() defaults to int f()). This is just a warning, and correct code will be generated anyway.

f(signo)
int signo;
{ 
.........
}

main()
{
extern f();
signal(SIGINT, f);
}
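
An ANSI-style version of the handler avoids the warning. The sketch below declares the handler as returning void; the names are hypothetical. It records the signal in an object of type volatile sig_atomic_t, which is the only kind of object the Standard allows a signal handler to assign to portably.

```c
#include <signal.h>

/* Set by the handler, read by the main program. */
static volatile sig_atomic_t interrupted = 0;

/* ANSI-style handler: returns void and takes the signal number. */
static void on_interrupt(int signo)
{
    (void)signo;
    interrupted = 1;
}

/* Install the handler; returns 0 on success, -1 if signal() failed. */
int install_interrupt_handler(void)
{
    return (signal(SIGINT, on_interrupt) == SIG_ERR) ? -1 : 0;
}
```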

stdio.h

sprintf() now returns the number of characters 'printed' (following UNIX System V), whereas the BSD sprintf() returns a pointer to the start of the character buffer.

The BSD functions ecvt(), fcvt() and gcvt() are not included in ANSI C, since their functionality is provided by sprintf().
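
Because ANSI sprintf() returns a character count, successive calls can append to one buffer. Formats such as %e, %f and %g also cover much of what ecvt(), fcvt() and gcvt() were used for, although not their exact interfaces. The sketch below uses hypothetical helper names. The caller must supply a buffer large enough for the output, since sprintf() does no bounds checking.

```c
#include <stdio.h>

/* Writes "(x, y)" into buf and returns its length. Each sprintf()
   result says where the next piece of text should start. */
size_t format_point(char *buf, int x, int y)
{
    int n = sprintf(buf, "(%d, ", x);
    n += sprintf(buf + n, "%d)", y);
    return (size_t)n;
}

/* Exponent form with prec digits after the point, in the spirit of
   ecvt(). Returns the number of characters written. */
int format_exp(char *buf, double v, int prec)
{
    return sprintf(buf, "%.*e", prec, v);
}
```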

string.h

On BSD systems, string manipulation functions are found in strings.h, whereas ANSI C places them in <string.h>. The Acorn C Compiler also has strings.h for PCC-compatibility.

The BSD functions index() and rindex() are replaced by the ANSI functions strchr() and strrchr() respectively.
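
Here is a sketch of a typical rindex() use rewritten with strrchr(); leaf_name is a hypothetical helper. The separator is passed as a parameter because it is '/' under UNIX but '.' under RISC OS.

```c
#include <string.h>

/* Returns the text after the last occurrence of sep in path, or the
   whole of path if sep does not occur. */
const char *leaf_name(const char *path, int sep)
{
    const char *p = strrchr(path, sep);   /* was rindex(path, sep) */
    return (p != NULL) ? p + 1 : path;
}
```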

Functions which refer to string lengths (and other sizes) now use the ANSI type size_t, which in our implementation is unsigned int.

stdlib.h

malloc() returns void *, rather than the char * of the BSD malloc().

float.h

A new header added by ANSI giving details of floating-point precision, range, etc.

limits.h

A new header added by ANSI to give maximum and minimum limit values for data types.

locale.h

A new header added by ANSI to provide locale-specific features, such as the decimal-point character used by the formatted input/output functions.

Environmental aspects

When porting an application, the most extensive changes will probably need to be made at the operating system interface level. The following is a brief description of aspects of RISC OS and Acorn C which differ from systems such as UNIX and MS-DOS.

The most apparent interface between a C program and its environment is via the arguments to main(). The ANSI Standard declares that main() is a function defined as the program entry point with either no arguments or two arguments. The first, commonly called int argc, gives a count of command line arguments. The second, commonly called char *argv[], is an array of pointers to the text of the arguments themselves, after removal of input/output redirection. As discussed under Environment (A.6.3.2), Acorn C supports the style of input/output redirection used by UNIX BSD4.3, but does not support filename wildcarding. Further parameters to main() are not supported.

Under UNIX and MS-DOS, it is common to use a third parameter, normally called char *environ[] under UNIX and char *envp[] under Microsoft C for MS-DOS, to give access to environment variables. The same effect can be achieved in our system by using getenv() to request system variable values explicitly; the names of these variables are as they appear from a RISC OS *Show command. The string pointed at by argv[0] is the program name (similar to UNIX and MS-DOS, except the name is exactly that typed on invocation, so if a full pathname is used to invoke the program, this is what appears in argv[0]).
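
Here is a minimal sketch of replacing a third argument to main() with getenv(); env_or_default is a hypothetical helper. Under RISC OS the name would be a system variable, such as a hypothetical MyApp$Dir, spelt as *Show lists it.

```c
#include <stdlib.h>

/* Returns the value of the named variable, or fallback if it is not
   set. */
const char *env_or_default(const char *name, const char *fallback)
{
    const char *value = getenv(name);
    return (value != NULL) ? value : fallback;
}
```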

File naming is one of the least portable aspects in any programming environment. RISC OS uses a full stop (.) as a separator in pathnames and does not support filename extensions (nor does UNIX, but existing UNIX tools make assumptions about file naming conventions). The best way to simulate extensions is to create a directory whose name corresponds to the required extension (in a manner similar to the use of c and h directories for C source and header files). RISC OS filename components are limited to 10 characters.
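
Porting tools often need to map UNIX-style names onto this convention. The sketch below turns a relative name such as src/main.c into src.c.main, treating the extension as a directory; unix_to_riscos is a hypothetical helper. It does not check the 10-character limit. Names with '.' or '..' components, absolute paths, and directory names containing full stops are outside its scope.

```c
#include <string.h>

/* Converts a relative UNIX-style name into RISC OS form ("dir/name.ext"
   becomes "dir.ext.name"). The result has the same length as the input.
   Returns 0 on success, or -1 if out (outsize bytes) is too small. */
int unix_to_riscos(const char *unix_name, char *out, size_t outsize)
{
    const char *slash = strrchr(unix_name, '/');
    const char *leaf = (slash != NULL) ? slash + 1 : unix_name;
    const char *dot = strrchr(leaf, '.');
    size_t dirlen = (size_t)(leaf - unix_name);   /* includes final '/' */
    size_t i, n = 0;

    if (strlen(unix_name) + 1 > outsize)
        return -1;

    /* Directory part: each '/' separator becomes '.' */
    for (i = 0; i < dirlen; i++)
        out[n++] = (unix_name[i] == '/') ? '.' : unix_name[i];

    if (dot != NULL && dot != leaf && dot[1] != '\0') {
        /* "name.ext" becomes "ext.name" */
        size_t namelen = (size_t)(dot - leaf);
        size_t extlen = strlen(dot + 1);
        memcpy(out + n, dot + 1, extlen);
        n += extlen;
        out[n++] = '.';
        memcpy(out + n, leaf, namelen);
        n += namelen;
    } else {
        /* No extension: copy the leaf name unchanged */
        size_t leaflen = strlen(leaf);
        memcpy(out + n, leaf, leaflen);
        n += leaflen;
    }
    out[n] = '\0';
    return 0;
}
```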

The Acorn C compiler has support for making Software Interrupt (SWI) calls to RISC OS routines, which can be used to replace any system calls which you make under UNIX or MS-DOS. The include file kernel.h has function prototypes and appropriate typedefs for issuing SWIs. Briefly, the type _kernel_swi_regs allows values to be placed in registers R0-R9, and _kernel_swi() can then be used to issue the SWI; a list of SWI numbers can be found in the include file swis.h. File information, for example, can be obtained in a way similar to stat() under UNIX, by making an OS_GBPB SWI with R0 set to the reason code 11 (full file information). Most of the UNIX/MS-DOS low-level I/O can be simulated in this way, but the ANSI C run-time library provides sufficient support for most applications to be written in a portable style.

You'll find some more information on kernel.h in comments within the header file itself.

RISC OS does not support different memory models as in MS-DOS, so programs which have been written to exploit this will need modification; this should only require the removal of Microsoft C keywords such as near, far and huge, if the program has otherwise been written with portability in mind.

© 3QD Developments Ltd 2013