
A  Porting C code to Cyclone

Though Cyclone resembles and shares a lot with C, porting is not always straightforward. Furthermore, it's rare that you actually port an entire application to Cyclone. You may decide to leave certain libraries or modules in C and port the rest to Cyclone. In this chapter, we want to share with you the tips and tricks that we have developed for porting C code to Cyclone and interfacing Cyclone code against legacy C code.

A.1  Semi-Automatic Porting

The Cyclone compiler includes a simple porting mode which you can use to try to move your C code closer to Cyclone. The porting tool is not perfect, but it's a start and we hope to develop it more in the future.

When porting a file, say foo.c, you'll first need to copy the file to foo.cyc and then edit it to add __cyclone_port_on__; and __cyclone_port_off__; around the code that you want Cyclone to port. For example, if after copying foo.c, the file foo.cyc contains the following:
1.   #include <stdio.h>
2. 
3.   void foo(char *s) {
4.     printf(s);
5.   }
6. 
7.   int main(int argc, char **argv) {
8.     argv++;
9.     for (argc--; argc >= 0; argc--, argv++)
10.      foo(*argv);
11.  }
then you'll want to insert __cyclone_port_on__; at line 2 and __cyclone_port_off__; after line 11. You do not want to port standard include files such as stdio, hence the need for the delimiters.

Next compile the file with the -port flag:
  cyclone -port foo.cyc > rewrites.txt
and pipe the output to a file, in this case rewrites.txt. If you edit the output file, you will see that the compiler has emitted a list of edits such as the following:
  foo.cyc(5:14-5:15): insert `?' for `*'
  foo.cyc(9:24-9:25): insert `?' for `*'
  foo.cyc(9:25-9:26): insert `?' for `*'
You can apply these edits by running the rewrite program on the edits:
  rewrite -port foo.cyc > rewrites.txt
(The rewrite program is written in Cyclone and included in the tools sub-directory.) This will produce a new file called foo_new.cyc which should look like this:
#include <stdio.h>

__cyclone_port_on__;

void foo(char ?s) { 
  printf(s);
}

int main(int argc, char ??argv) {
  argv++;
  for (argc--; argc >= 0; argc--, argv++) 
    foo(*argv);
}
__cyclone_port_off__;
Notice that the porting system has changed the pointers from thin pointers to fat pointers (?) to support the pointer arithmetic that is done in main, and that this constraint has flowed to procedures that are called (e.g., foo).

You'll need to strip out the port-on and port-off directives and then try to compile the file with the Cyclone compiler. In this case, the rewritten code in foo_new.cyc compiles with a warning that main might not return an integer value. In general, you'll find that the porting tool doesn't always produce valid Cyclone code. Usually, you'll have to go in and modify the code substantially to get it to compile. Nonetheless, the porting tool can take care of lots of little details for you.

A.2  Manually Translating C to Cyclone

To a first approximation, you can port a simple program from C to Cyclone by following the steps below, each of which is detailed under its own heading.

Even when you follow these suggestions, you'll still need to test and debug your code carefully. By far the most common run-time errors you will get are uncaught exceptions for null-pointer dereferences or array out-of-bounds accesses. Under Linux, an uncaught exception should produce a stack backtrace, which will help narrow down where and why the exception occurred. On other architectures, you can use gdb to find the problem. The most effective way to do this is to set a breakpoint on the routines _throw_null() and _throw_arraybounds(), which are defined in the runtime and called whenever a null check or array-bounds check fails. Then you can use gdb's backtrace facility to see where the problem occurred. Of course, you'll be debugging at the C level, so you'll want to use the -save-c and -g options when compiling your code.
Change pointer types to fat pointer types where necessary.
Ideally, you should examine the code and use thin pointers (e.g., int* or better int*@notnull) wherever possible as these require fewer run-time checks and less storage. However, recall that thin pointers do not support pointer arithmetic. In those situations, you'll need to use fat pointers (e.g., int*@fat which can also be written as int?). A particularly simple strategy when porting C code is to just change all pointers to fat pointers. The code is then more likely to compile, but will have greater overhead. After changing to use all fat pointers, you may wish to profile or reexamine your code and figure out where you can profitably use thin pointers.

Be careful with char pointers. By default, a char ? is treated as zero-terminated, i.e. a char * @fat @zeroterm. If you are using the char pointer as a buffer of bytes, then you may actually wish to change it to be a char ? @nozeroterm instead. Along these lines, when you use arrays that get promoted to pointers, be careful to declare the array size so that it accounts for the zero terminator. For example, say your original C code was
    char line[MAXLINELEN];
    while (fgets(line, MAXLINELEN, stdin)) ...
If you want your pointer to be zero-terminated, you would have to do the following:
    char line[MAXLINELEN+1] @zeroterm;
    while (fgets(line, MAXLINELEN, stdin)) ...
The @zeroterm qualifier is needed since char arrays are not zero-terminated by default. Adding the +1 makes space for the extra zero terminator that Cyclone includes, ensuring that it won't be overwritten by fgets. If you don't do this, you could well get an array bounds exception at runtime. If you don't want your char array to be zero-terminated, you can simply leave the original C code as is.
Use comprehensions to heap-allocate arrays.
Cyclone provides limited support for malloc and separated initialization but this really only works for bits-only objects. To heap- or region-allocate and initialize an array that might contain pointers, use new or rnew in conjunction with array comprehensions. For example, to copy a vector of integers s, one might write:
  int *@fat t = new {for i < numelts(s) : s[i]};
Use tagged unions for unions with pointers.
Cyclone only lets you read members of unions that contain ``bits'' (i.e., ints; chars; shorts; floats; doubles; or tuples, structs, unions, or arrays of bits). So if you have a C union with a pointer type in it, you'll have to code around it. One way is to simply use a @tagged union. Note that this adds a hidden tag and associated checks to ensure safety.
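To see what the hidden tag buys you, here is a rough C sketch of the bookkeeping that a @tagged union takes off your hands. All names (int_or_ptr, TAG_INT, make_ptr, get_ptr) are hypothetical, and this is not the code the Cyclone compiler emits; it only illustrates the idea of checking a tag before reading a member.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hand-rolled tagged union: the tag records which
   member was written last. */
enum iop_tag { TAG_INT, TAG_PTR };

struct int_or_ptr {
  enum iop_tag tag;
  union {
    int i;
    const char *p;
  } u;
};

struct int_or_ptr make_int(int i) {
  struct int_or_ptr v;
  v.tag = TAG_INT;
  v.u.i = i;
  return v;
}

struct int_or_ptr make_ptr(const char *p) {
  struct int_or_ptr v;
  v.tag = TAG_PTR;
  v.u.p = p;
  return v;
}

/* Reads the pointer member only if it is the active one.  Returns 1
   on success and 0 on a tag mismatch, which is the case the hidden
   checks of a tagged union guard against. */
int get_ptr(const struct int_or_ptr *v, const char **out) {
  assert(v != NULL && out != NULL);
  if (v->tag != TAG_PTR)
    return 0;
  *out = v->u.p;
  return 1;
}
```

In plain C nothing forces callers to go through get_ptr, which is why the pointer member of an untagged union cannot be read safely.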
Initialize variables.
Top-level variables must be initialized in Cyclone, and in many situations, local variables must be initialized. Sometimes, this will force you to change the type of the variable so that you can construct an appropriate initial value. For instance, suppose you have the following declarations at top-level:
struct DICT; 
struct DICT *@notnull new_dict();
struct DICT *@notnull d;
void init() {
  d = new_dict();
}
Here, we have an abstract type for dictionaries (struct DICT), a constructor function (new_dict()) which returns a pointer to a new dictionary, and a top-level variable (d) which is meant to hold a pointer to a dictionary. The init function ensures that d is initialized. However, Cyclone would complain that d is not initialized because init may not be called, or it may only be called after d is already used. Furthermore, the only way to initialize d is to call the constructor, and such an expression is not a valid top-level initializer. The solution is to declare d as a ``possibly-null'' pointer to a dictionary and initialize it with NULL:
struct DICT;
struct DICT *@notnull new_dict();
struct DICT *d = NULL;
void init() {
  d = new_dict();
}
Of course, now whenever you use d, either you or the compiler will have to check that it is not NULL.
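One way to keep those checks in one place is an accessor that tests d and initializes it on first use. The C sketch below shows the pattern; the body of struct DICT, the stand-in constructor, and the name get_dict are all assumptions made for illustration.

```c
#include <assert.h>
#include <stdlib.h>

struct DICT { int size; };      /* stand-in body; the real type is abstract */

/* Stand-in constructor (hypothetical implementation). */
static struct DICT *new_dict(void) {
  struct DICT *fresh = malloc(sizeof *fresh);
  assert(fresh != NULL);
  fresh->size = 0;
  return fresh;
}

static struct DICT *d = NULL;   /* possibly-null, so it has a valid initializer */

/* Performs the null check once per access and initializes d on
   first use, so callers never see a NULL dictionary. */
struct DICT *get_dict(void) {
  if (d == NULL)
    d = new_dict();
  return d;
}
```

Code that goes through get_dict() no longer depends on init having been called first.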
Put breaks or fallthrus in switch cases.
Cyclone requires that you either break, return, continue, throw an exception, or explicitly fallthru in each case of a switch.
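As an illustration of what to look for, the C function below (a made-up example named steps_remaining) relies on implicit fall-through in two cases. This is legal C, but each marked case would need an explicit fallthru, or a break if the fall-through was a mistake, before Cyclone accepts it.

```c
#include <assert.h>

/* Hypothetical example: a job at `stage` still has to pass through
   every later stage, so the cases deliberately fall through. */
int steps_remaining(int stage) {
  int steps = 0;
  switch (stage) {
  case 0:
    steps++;   /* implicit fall-through: must be made explicit when porting */
  case 1:
    steps++;   /* implicit fall-through: must be made explicit when porting */
  case 2:
    steps++;
    break;
  default:
    assert(stage > 2 || stage < 0);
    break;
  }
  return steps;
}
```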
Replace one temporary with multiple temporaries.
Consider the following code:
void foo(char * x, char * y) {
  char * temp;
  temp = x;
  bar(temp);
  temp = y;
  bar(temp);
}
When compiled, Cyclone generates an error message like this:
type mismatch: char *@zeroterm #0  != char *@zeroterm #1 
The problem is that Cyclone thinks that x and y might point into different regions (which it named #0 and #1 respectively), and the variable temp is assigned both the value of x and the value of y. Thus, there is no single region that we can say temp points into. The solution in this case is to use two different temporaries for the two different purposes:
void foo(char * x, char * y) {
  char * temp1;
  char * temp2;
  temp1 = x;
  bar(temp1);
  temp2 = y;
  bar(temp2);
}
Now Cyclone can figure out that temp1 is a pointer into the region #0 whereas temp2 is a pointer into region #1.
Connect argument and result pointers with the same region.
Remember that Cyclone assumes that pointer inputs to a function might point into distinct regions, and that output pointers, by default, point into the heap. Obviously, this won't always be the case. Consider the following code:
int *foo(int *x, int *y, int b) {
  if (b)
    return x;
  else
    return y;
}
Cyclone complains when we compile this code:
foo.cyc:3: returns value of type int *`GR0 but requires int *
  `GR0 and `H are not compatible. 
foo.cyc:5: returns value of type int *`GR1 but requires int *
  `GR1 and `H are not compatible. 
The problem is that neither x nor y is a pointer into the heap. You can fix this problem by putting in explicit regions to connect the arguments and the result. For instance, we might write:
int *`r foo(int *`r x, int *`r y, int b) {
  if (b)
    return x;
  else
    return y;
}
and then the code will compile. Of course, any caller to this function must now ensure that the arguments are in the same region.
Insert type information to direct the type-checker.
Cyclone is usually good about inferring types. But sometimes, it has too many options and picks the wrong type. A good example is the following:
void foo(int b) {
  printf("b is %s", b ? "true" : "false");
} 
When compiled, Cyclone warns:
(2:39-2:40): implicit cast to shorter array
The problem is that the string "true" is assigned the type const char ?{5} whereas the string "false" is assigned the type const char ?{6}. (Remember that string constants have an implicit 0 at the end.) The type-checker needs to find a single type for both since we don't know whether b will come out true or false and conditional expressions require the same type for either case. There are at least two ways that the types of the strings can be promoted to a unifying type. One way is to promote both to const char ?, which would be ideal. Unfortunately, Cyclone has chosen another way, and promoted the longer string ("false") to a shorter string type, namely const char ?{5}. This makes the two types the same, but is not at all what we want, for when the procedure is called with false, the routine will print
b is fals
Fortunately, the warning indicates that there might be a problem. The solution in this case is to explicitly cast at least one of the two values to const char ?:
void foo(int b) {
  printf("b is %s", b ? ((const char ?)"true") : "false");
} 
Alternatively, you can declare a temp with the right type and use it:
void foo(int b) {
  const char ? t = b ? "true" : "false";
  printf("b is %s", t);
} 
The point is that by giving Cyclone more type information, you can get it to do the right sorts of promotions. Other sorts of type information you might provide include region annotations (as outlined above), pointer qualifiers, and casts.
Copy ``const'' code or values to make it non-const.
Cyclone takes const seriously. C does not. Occasionally, this will bite you, but more often than not, it will save you from a core dump. For instance, the following code will seg fault on most machines:
void foo() {
  char ?x = "howdy";
  x[0] = 'a';
}
The problem is that the string "howdy" will be placed in the read-only text segment, and thus trying to write to it will cause a fault. Fortunately, Cyclone complains that you're trying to initialize a non-const variable with a const value, so this problem doesn't occur in Cyclone. If you really want to initialize x with this value, then you'll need to copy the string, say using the strdup function from the string library:
void foo() {
  char ?x = strdup("howdy");
  x[0] = 'a';
}
Now consider the following call to the strtoul code in the standard library:
extern unsigned long strtoul(const char ?`r n, 
                             const char ?`r*`r2 endptr,
                             int base);
unsigned long foo() {
  char ?x = strdup("howdy");
  char ?*e = NULL;
  return strtoul(x,e,0);
}
Here, the problem is that we're passing non-const values to the library function, even though it demands const values. Usually, that's okay, as const char ? is a super-type of char ?. But in this case, we're passing as the endptr a pointer to a char ?, and it is not the case that const char ?* is a super-type of char ?*. In this case, you have two options: Either make x and e const, or copy the code for strtoul and make a version that doesn't have const in the prototype.
Get rid of calls to free, calloc etc.
There are many standard functions that Cyclone can't support and still maintain type-safety. An obvious one is free() which releases memory. Let the garbage collector free the object for you, or use region-allocation if you're scared of the collector. Other operations, such as memset, memcpy, and realloc are supported, but in a limited fashion in order to preserve type safety.
Use polymorphism or tagged unions to get rid of void*.
Often you'll find C code that uses void* to simulate polymorphism. A typical example is something like swap:
void swap(void **x, void **y) {
  void *t = *x;
  *x = *y;
  *y = t;
}
In Cyclone, this code should type-check but you won't be able to use it in many cases. The reason is that while void* is a super-type of just about any pointer type, it's not the case that void** is a super-type of a pointer to a pointer type. In this case, the solution is to use Cyclone's polymorphism:
void swap(`a *x, `a *y) {
  `a t = *x;
  *x = *y;
  *y = t;
}
Now the code can (safely) be called with any two (compatible) pointer types. This trick works well as long as you only need to ``cast up'' from a fixed type to an abstract one. It doesn't work when you need to ``cast down'' again. For example, consider the following:
int foo(int x, void *y) {
  if (x)
   return *((int *)y);
  else {
    printf("%s\n",(char *)y);
    return -1;
  }
}
The coder intends for y to either be an int pointer or a string, depending upon the value of x. If x is true, then y is supposed to be an int pointer, and otherwise, it's supposed to be a string. In either case, you have to put in a cast from void* to the appropriate type, and obviously, there's nothing preventing someone from passing in bogus combinations of x and y. The solution in Cyclone is to use a tagged union to represent the dependency and get rid of the variable x:
@tagged union IntOrString { 
  int Int;
  const char *@fat String;
};
typedef union IntOrString i_or_s;
int foo(i_or_s y) {
  switch (y) {
  case {.Int = i}:  return i;
  case {.String = s}:  
    printf("%s\n",s);
    return -1;
  }
}
Rewrite the bodies of vararg functions.
See the section on varargs for more details.
Use exceptions instead of setjmp.
Many uses of setjmp/longjmp can be replaced with a try-block and a throw. Of course, you can't do this for things like a user-level threads package, but rather, only for those situations where you're trying to ``pop-out'' of a deeply nested set of function calls.
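The kind of setjmp use that maps cleanly onto exceptions looks like the following C sketch (the function names are hypothetical). The setjmp call plays the role of the try-block and its handler, and the longjmp plays the role of the throw.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>

static jmp_buf on_error;   /* where control resumes after an error */

/* Recurses once per element and bails out of the whole call chain
   with longjmp if it meets a negative value. */
static int descend(const int *vals, int depth) {
  if (depth == 0)
    return 0;
  if (vals[0] < 0)
    longjmp(on_error, 1);            /* would become a throw */
  return vals[0] + descend(vals + 1, depth - 1);
}

/* Returns the sum of vals[0..n-1], or -1 if any element is negative. */
int checked_sum(const int *vals, int n) {
  assert(vals != NULL && n >= 0);
  if (setjmp(on_error) != 0)         /* would become the handler of a try-block */
    return -1;
  return descend(vals, n);
}
```

Because the jump only unwinds nested calls back to an enclosing frame, this pattern is a candidate for replacement; uses that switch between independent stacks are not.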

A.3  Interfacing to C

When porting any large code from C to Cyclone, or even when writing a Cyclone program from scratch, you'll want to be able to access legacy libraries. To do so, you must understand how Cyclone represents data structures, how it compiles certain features, and how to write wrappers to make up for representation mismatches.

A.3.1  Extern ``C''

Sometimes, interfacing to C code is as simple as writing an appropriate interface. For instance, if you want to call the acos function which is defined in the C Math library, you can simply write the following:
  extern "C" double acos(double);
The extern "C" scope declares that the function is defined externally by C code. As such, its name is not prefixed with any namespace information by the compiler. Note that you can still embed the function within a Cyclone namespace; the namespace is simply ignored by the time you get down to C code. If you have a whole group of functions, you can wrap them with a single extern "C" { ... }, as in:
  extern "C" {
    double acos(double);
    float  acosf(float);
    double acosh(double);
    float  acoshf(float);
    double asin(double);
  }
You must be careful that the type you declare for the C function is its real type. Misdeclaring the type could result in a runtime error. Note that you can add Cyclonisms to the type that refine the meaning of the original C. For example, you could declare:
  extern "C" int strlen(const char * @notnull str);
Here we have refined the type of strlen to require that a non-NULL pointer is passed to it. Because this type is representation-compatible with the C type (that is, it has the same storage requirements and semantics), this is legal. However, the following would be incorrect:
  extern "C" int strlen(const char * @fat str);
Giving the function this type would probably lead to an error because Cyclone fat pointers are represented as three words, but the standard C library function expects a single pointer (one word).

The extern "C" approach works well enough that it covers many of the cases that you'll encounter. However, the situation is not so simple when you run into more complicated interfaces. Sometimes you will need to write so-called wrapper code to convert from Cyclone's representations to C's and back.
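To make the representation mismatch concrete, here is a C sketch of a three-word fat pointer together with the sort of wrapper needed to hand its contents to strlen. The struct layout and every name in it (fat_char_ptr, base, curr, limit, fat_strlen) are assumptions for illustration only; they are not the definitions used by the Cyclone runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fat pointer: the current position plus the bounds of
   the underlying buffer, i.e. three words instead of one. */
struct fat_char_ptr {
  const char *base;    /* start of the buffer */
  const char *curr;    /* position the pointer currently denotes */
  const char *limit;   /* one past the end of the buffer */
};

/* Wrapper: verify that a terminator exists within bounds, then pass
   the one-word C pointer to the real strlen.  Returns (size_t)-1 if
   the pointer is null, out of bounds, or unterminated. */
size_t fat_strlen(struct fat_char_ptr p) {
  if (p.base == NULL || p.curr == NULL)
    return (size_t)-1;
  assert(p.base <= p.limit);         /* otherwise the fat pointer is malformed */
  if (p.curr < p.base || p.curr >= p.limit)
    return (size_t)-1;
  if (memchr(p.curr, '\0', (size_t)(p.limit - p.curr)) == NULL)
    return (size_t)-1;
  return strlen(p.curr);
}
```

The C library function only ever sees p.curr; the bounds stay on the wrapper's side of the call, which is where the conversion between the two representations happens.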

A.3.2  Extern ``C include''

Another useful tool is the extern "C include" mechanism. It allows you to write C definitions within a Cyclone file. Here is a simple example:
extern "C include" {
  char peek(unsigned int i) {
    return *((char *)i);
  }

  void poke(unsigned int i, char c) {
    *((char *)i) = c;
  }
} export {
  peek, poke;
}
In this example, we've defined two C functions, peek and poke. Cyclone will not compile or type-check their code, but rather pass them on to the C compiler. The export clause indicates which function and variable definitions should be exported to the Cyclone code. If we only wanted to export the peek function, then we would leave the poke function out of the export list. All other definitions, like typedefs, structs, etc., not to mention #defines and other preprocessor effects, are exported by default (but this may change in a later release).

Any top-level types you mention in the extern "C include" are interpreted by the Cyclone code that uses them as Cyclone types. If they are actually C types (as would be the case if you #included some header in the C block), this will be safe, but possibly undesirable, since they may not communicate the right information to the Cyclone code. There are two ways around this. In many cases, you can actually declare Cyclone types within the C code, and they will be treated as such. For example, in lib/core.cyc, we have something like:
extern "C include" {
  ... Cyc_Core_mkthin(`a ?`r dyn, sizeof_t<`a> sz) {
    unsigned bd = _get_dyneither_size(dyn,sz);
    return Cyc_Core_mktuple(dyn.curr,bd);
  } 
} export {
  Cyc_Core_mkthin
}
In this case, we are able to include a ? notation directly in the C type, but then manipulate it using the runtime system functions for fat pointers (see cyc_include.h for details).

In the case that you are #includeing a C header file, you may not be able to change its definitions to have a proper Cyclone type, or it may be that the Cyclone definitions will not parse for some reason. In this case, you can declare a block to override the definitions with Cyclone compatible versions. For example, we could change the above code to be instead:
extern "C include" {
  struct foo { int x; int y; };
  struct foo *cast_charbuf(char *buf, unsigned int n) {
    if (n >= sizeof(struct foo))
      return (struct foo *)buf;
    else
      return (void *)0;
  }
} cyclone_override {
  struct foo *cast_charbuf
    (char * @numelts(valueof(`n)) @nozeroterm buf,tag_t<`n> n);
} export {
  cast_charbuf
}
Now we have given cast_charbuf its original C type, but then provided the Cyclone type in the override block. The Cyclone type ensures the value of n correctly represents the length of the buffer, by using Cyclone's dependent types (see Section 3). Note that top-level struct and other type definitions can basically be entirely Cyclone syntax. If you try to declare a Cyclone overriding type that is representation-incompatible with the C version, the compiler will complain.

Here is another example using an external header:
extern "C include" {  /* tell Cyclone that <pcre.h> is C code */
#include <pcre/pcre.h>
} cyclone_override {
  pcre *`U pcre_compile(const char @pattern, int options,
                        const char *`H *errptr, int *erroffset,
                        const unsigned char *tableptr);
  int pcre_exec(const pcre @code, const pcre_extra *extra, 
                const char *subject, int length,
                int startoffset, int options,
                int *ovector, int ovecsize);
} export { pcre_compile, pcre_exec; }
In this case, we have included the Perl regular expression library C header, and then exported two of its functions, pcre_compile and pcre_exec. Moreover, we have given these functions Cyclone types that are more expressive than the original C types. We would probably still want to write wrappers around these functions to check other invariants of the arguments (e.g., that the length passed to pcre_exec is indeed the length of the subject). Take a look at tests/pcredemo.cyc for more information on this example. Another example that shows how you can override things is in tests/cinclude.cyc.

The goal of this example is to show how you can safely suck in a large C interface (in this case, the Perl Compatible Regular Expression interface), write wrappers around some of the functions to convert representations and check properties, and then safely export these wrappers to Cyclone.

One word of warning: when you #include something within an extern "C include", it will follow the normal include path, which is to say that it will look for Cyclone versions of the headers first. This means that if you do something like:
extern "C include" {
#include <string.h>
} export { ... }
It will actually include the Cyclone version of the string.h header! The easiest way around this is to use an absolute path, as in
extern "C include" {
#include "/usr/include/string.h"
} export { ... }
Even worse is when a C header you wish to include itself includes a header for which there exists a Cyclone version. In the pcre.h example above, this actually occurs in that pcre.h includes stdlib.h, and gets the Cyclone version. To avoid this, the pcredemo.cyc program includes the Cyclone versions of these headers first. Ultimately we will probably change the compiler so that header processing within extern "C include" searches the C header path but not the Cyclone one.