Kiddinglife C/C++/PYTHON ENGINEER

SWIG-3: Wrap ANSI C

2016-03-19

5.2 Wrapping Simple C Declarations

For example, consider the following interface file:

%module example
%inline %{
extern double sin(double x);
extern int strcmp(const char *, const char *);
extern int Foo;
%}
#define STATUS 50
#define VERSION "1.1"

When SWIG creates an extension module, these declarations are accessible as scripting language functions, variables, and constants respectively. For example, in Python:

>>> example.sin(3)
0.1411200080598672
>>> example.strcmp('Dave', 'Mike')
-1
>>> print example.cvar.Foo
42
>>> print example.STATUS
50
>>> print example.VERSION
1.1

Whenever possible, SWIG creates an interface that closely matches the underlying C/C++ code. However, due to subtle differences between languages, run-time environments, and semantics, it is not always possible to do so.

5.2.1 Basic Type Handling

In order to build an interface, SWIG has to convert C/C++ datatypes to equivalent types in the target language:

5.2.1.1 Integral

When an integral value is converted from C, a cast is used to convert it to the representation in the target language. Thus, a 16 bit short in C may be promoted to a 32 bit integer. When integers are converted in the other direction, the value is cast back into the original C type. If the value is too large to fit, it is silently truncated.

The bool data type is cast to and from an integer value of 0 or 1 unless the target language provides a special boolean type.

signed char and unsigned char are special cases that are handled as small 8-bit integers. A plain char, by contrast, is normally mapped to a one-character ASCII string.

As a rule of thumb, the int datatype and all variations of char and short datatypes are safe to use. For unsigned int and long data types, you will need to carefully check the correct operation of your program after it has been wrapped with SWIG.
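The silent truncation mentioned above can be illustrated in plain C. This is only a sketch of what a narrowing cast does, not the code SWIG actually emits; the names to_unsigned_char and fits_unsigned_char are made up for illustration, and an 8-bit unsigned char is assumed.

```c
#include <limits.h>

/* Hypothetical stand-in for the cast applied when a scripting-language
   integer (held here as a long) is handed to a C parameter of type
   unsigned char. Conversion to an unsigned type is reduction modulo
   2^N, so the high bits are simply dropped. */
unsigned char to_unsigned_char(long value)
{
    return (unsigned char) value;
}

/* Hypothetical range check that a careful interface could run first. */
int fits_unsigned_char(long value)
{
    return value >= 0 && value <= UCHAR_MAX;
}
```

With an 8-bit unsigned char, to_unsigned_char(300) yields 44 (300 modulo 256), while fits_unsigned_char(300) reports that the value does not fit.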

5.2.1.2 Float

SWIG recognizes the following floating point types: float and double. Floating point numbers are mapped to and from the natural representation of floats in the target language. This is almost always a C double.

5.2.1.3 Unicode

For those scripting languages that provide Unicode support, Unicode strings are often available in an 8-bit representation such as UTF-8 that can be mapped to the char * type (in which case the SWIG interface will probably work).

5.2.2 Global Variables

Whenever possible, SWIG maps C/C++ global variables into scripting language variables.

Whenever the scripting language variable is used, the underlying C global variable is accessed. However, the way global variables are accessed differs from one scripting language to another.

Finally, if a global variable has been declared as const, it only supports read-only access.

For example:

%module example
double foo;

results in a scripting language variable like this:

# Tcl
set foo 3.5                     ;# Set foo to 3.5
puts $foo                       ;# Print the value of foo
# Python
cvar.foo = 3.5                  # Set foo to 3.5
print cvar.foo                  # Print value of foo
# Perl
$foo = 3.5;                     # Set foo to 3.5
print $foo, "\n";               # Print value of foo
# Ruby
Module.foo = 3.5               # Set foo to 3.5
print Module.foo, "\n"         # Print value of foo

5.2.3 Constants

Constants can be created using #define, enumerations, or a special %constant directive. The following interface file shows a few valid constant declarations :

#define I_CONST       5               // An integer constant
#define PI            3.14159         // A Floating point constant
#define S_CONST       "hello world"   // A string constant
#define NEWLINE       '\n'            // Character constant
enum boolean {NO=0, YES=1};
enum months {JAN, FEB, MAR, APR, MAY, JUN, JUL, AUG,
             SEP, OCT, NOV, DEC};
%constant double BLAH = 42.37;
#define PI_4 PI/4
#define FLAGS 0x04 | 0x08 | 0x40

5.2.3.1 Special cases

1. SWIG will not create constants for macros unless the value can be completely determined by the preprocessor. For example:

#define EXTERN extern
EXTERN void foo();

Here, what would the value of a constant called EXTERN be?

2. For the same conservative reasons, even a constant with a simple cast is ignored, such as:

#define F_CONST (double) 5 // A floating point constant with cast

3. For enumerations, it is critical that the original enum definition be included somewhere in the interface file (either in a header file or in the %{ %} block), because SWIG needs the original enumeration declaration in order to get the correct enum values as assigned by the C compiler.

5.2.3.2 A brief word about const

Typically, %constant is only used when you want to add constants to the scripting language interface that are not defined in the original header file.

If the right-most const occurs after all other type modifiers (such as pointers), then the variable is const. Otherwise, it is not. Here are some examples of const declarations:

const char a;           // A constant character
char const b;           // A constant character (the same)
char *const c;          // A constant pointer to a character
const char *const d;    // A constant pointer to a constant character

Here is an example of a declaration that is not const:

const char *e; 

In this case, the pointer e can change — it’s only the value being pointed to that is read-only.
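The distinction can be checked with a small piece of plain C. This is a minimal sketch: the accessor names (e_set, e_get, c_poke, c_get) and the backing buffer are assumptions made for illustration and are not generated by SWIG.

```c
#include <stddef.h>

static char buffer[] = "abc";
static const char *e = "first";   /* pointer may change, pointed-to data may not */
static char *const c = buffer;    /* pointer is fixed, pointed-to data may change */

/* Legal: e itself is not const, so it can be pointed somewhere else. */
void e_set(const char *value) { e = value; }
const char *e_get(void) { return e; }

/* Legal: c always points at buffer, but the characters are writable.
   Writing "c = other;" here would be rejected by the compiler. */
void c_poke(size_t i, char ch) { c[i] = ch; }
const char *c_get(void) { return c; }
```

A variable like e is therefore writable from the scripting language's point of view, whereas c is a true constant.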

5.2.5 A cautionary tale of char *

When strings are passed from a scripting language to a C char *, the pointer usually points to string data stored inside the interpreter. It is almost always a bad idea for a C function to modify this data. For example, if you wrap a function like this:

char *strcat(char *s, const char *t)

and call it from the scripting language, your application may crash with a segmentation fault or another memory-related problem. This is because s refers to internal data in the target language, data that you should not be touching.
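One way to avoid the strcat problem is to wrap a helper that never writes into its arguments and returns freshly allocated memory instead. The sketch below is an assumption of how such a helper might look; the name safe_concat is hypothetical and does not come from SWIG.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: returns a newly malloc'ed string holding s
   followed by t. Neither input is modified. The caller owns the
   result and must free() it. Returns NULL if allocation fails. */
char *safe_concat(const char *s, const char *t)
{
    size_t ls = strlen(s);
    size_t lt = strlen(t);
    char *out = (char *) malloc(ls + lt + 1);
    if (out == NULL)
        return NULL;
    memcpy(out, s, ls);
    memcpy(out + ls, t, lt + 1);   /* the +1 copies the terminating '\0' */
    return out;
}
```

Because the result is heap-allocated, the interface also has to arrange for it to be freed; in SWIG this is what the %newobject directive is meant for.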

5.3 Pointers and complex objects

5.3.1 Simple pointers

Pointers to primitive C data types such as:

int* p_int;
double*** ppp_double;
char** pp_char;
int* null_ptr=NULL;

are fully supported by SWIG.

Rather than trying to convert the data being pointed to into a scripting representation, SWIG simply encodes the pointer itself into a representation that contains the actual value of the pointer and a type-tag.

Thus, the SWIG representation of the above pointers (in Tcl), might look like this:

_10081012_p_int
_1008e124_ppp_double
_f8ac_pp_char
_NULL_p_int

All pointers are treated as opaque objects by SWIG. Thus, a pointer may be returned by a function and passed around to other C functions as needed. (? This part is still unclear to me and I need to go over it again in the future.)

The scripting language representation of a pointer value should never be manipulated directly. For example, the numbers used may differ from the actual machine address (on little-endian machines, the digits may appear in reverse order).
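To make the idea of "pointer value plus type-tag" concrete, here is a rough sketch of how such a string could be built and checked. It is not SWIG's actual runtime code: the function names encode_pointer and has_type_tag are hypothetical, and the exact digit layout SWIG uses may differ.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder: writes "_<hex address>_<type tag>" into out,
   or "_NULL_<type tag>" for a null pointer. Returns what snprintf
   returns. */
int encode_pointer(char *out, size_t n, const void *p, const char *type_tag)
{
    if (p == NULL)
        return snprintf(out, n, "_NULL_%s", type_tag);
    return snprintf(out, n, "_%llx_%s",
                    (unsigned long long) (uintptr_t) p, type_tag);
}

/* Hypothetical check: returns 1 if the encoded string carries exactly
   the expected type tag, 0 otherwise. */
int has_type_tag(const char *encoded, const char *type_tag)
{
    const char *sep;
    if (encoded == NULL || encoded[0] != '_')
        return 0;
    sep = strchr(encoded + 1, '_');   /* underscore that ends the address */
    return sep != NULL && strcmp(sep + 1, type_tag) == 0;
}
```

A wrapper expecting a p_int argument could then reject a string tagged p_double before ever touching the address.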

5.3.2 Run time pointer type checking

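SWIG encodes a type signature into every pointer value and uses it for run-time type checking. NULL pointers, however, can be passed to any function that expects a pointer. Although this has the potential to cause a crash, NULL pointers are also sometimes used as sentinel values or to denote a missing/empty value. Therefore, SWIG leaves NULL pointer checking up to the application.
(? This part was unclear to me when I wrote this post and I need to go over it again in the future.)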

5.3.3 Derived types, structs, and classes

For everything else (structs, classes, arrays, etc…) SWIG applies a very simple rule :

Everything else is a pointer

Suppose you have an interface file like this :

%module fileio
FILE *fopen(char *, char *);
int fclose(FILE *);
unsigned fread(void *ptr, unsigned size, unsigned nobj, FILE *);
unsigned fwrite(void *ptr, unsigned size, unsigned nobj, FILE *);
void *malloc(int nbytes);
void free(void *);

In this file, SWIG doesn’t know what a FILE is, but since it is only used as a pointer, it doesn’t really matter what it is. If you wrap this module into Python, you can use the functions just as you would expect:

# Copy a file
def filecopy(source, target):
  f1 = fopen(source, "r")
  f2 = fopen(target, "w")
  buffer = malloc(8192)
  nbytes = fread(buffer, 8192, 1, f1)
  while (nbytes > 0):
    fwrite(buffer, 8192, 1, f2)
    nbytes = fread(buffer, 8192, 1, f1)
  free(buffer)
  fclose(f1)                    # Release the FILE handles as well
  fclose(f2)

In this case f1, f2, and buffer are all opaque objects containing C pointers. It doesn’t matter what value they contain–our program works just fine without this knowledge.

When SWIG encounters an undeclared data type, it automatically assumes that it is a structure or class.

5.3.5 Typedef

Like C, typedef can be used to define new type names in SWIG. For example:

typedef unsigned int size_t;

typedef definitions appearing in a SWIG interface are not propagated to the generated wrapper code.

Therefore, they either need to be defined in an included header file or placed in the declarations section like this:

%{
 /* Include in the generated wrapper file */  
typedef unsigned int size_t; 
%}  
/* Tell SWIG about it */  
typedef unsigned int size_t;  

or

%inline %{ typedef unsigned int size_t; %}

SWIG tracks typedef declarations and uses this information for run-time type checking.

For instance, if you use the above typedef and had the following function declaration:

void foo(unsigned int *ptr);

The corresponding wrapper function will accept arguments of type unsigned int * or size_t *.

5.4.1 Passing structures by value

Sometimes a C function takes structure parameters that are passed by value. For example, consider the following function:

double dot_product(Vector a, Vector b);

To deal with this, SWIG transforms the function to use pointers by creating a wrapper equivalent to the following:

double wrap_dot_product(Vector *a, Vector *b) 
{
    Vector x = *a;
    Vector y = *b;
    return dot_product(x, y);
}

In the target language, the dot_product() function now accepts pointers to Vectors instead of Vectors. For the most part, this transformation is transparent so you might not notice.

5.4.2 Return by value

C functions that return structures by value are more difficult to handle. For example, consider the following function:

Vector cross_product(Vector v1, Vector v2);

This function wants to return Vector, but SWIG only really supports pointers. As a result, SWIG creates a wrapper like this:

Vector *wrap_cross_product(Vector *v1, Vector *v2) 
{
        Vector x = *v1;
        Vector y = *v2;
        Vector *result;
        result = (Vector *) malloc(sizeof(Vector));
        *(result) = cross_product(x, y);
        return result;
}

Or if SWIG was run with the -c++ option:

Vector *wrap_cross_product(Vector *v1, Vector *v2) 
{
        Vector x = *v1;
        Vector y = *v2;
        Vector *result = new Vector(cross_product(x, y)); // Uses default copy constructor
        return result;
}

In both cases, SWIG allocates a new object and returns a reference to it. It is up to the user to delete the returned object when it is no longer in use.

Clearly, this will leak memory if you are unaware of the implicit memory allocation and don’t take steps to free the result.
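The ownership rule is easier to see with the C-mode wrapper written out in full. The following is a self-contained sketch; the Vector layout and the body of cross_product are assumptions, since the text above only shows the declaration.

```c
#include <stdlib.h>

/* Assumed layout; the original text only uses the name Vector. */
typedef struct Vector {
    double x, y, z;
} Vector;

/* Assumed implementation of the function being wrapped. */
Vector cross_product(Vector a, Vector b)
{
    Vector r;
    r.x = a.y * b.z - a.z * b.y;
    r.y = a.z * b.x - a.x * b.z;
    r.z = a.x * b.y - a.y * b.x;
    return r;
}

/* Wrapper in the style shown above: the result lives on the heap,
   so whoever receives the pointer must eventually free() it. */
Vector *wrap_cross_product(Vector *v1, Vector *v2)
{
    Vector *result = (Vector *) malloc(sizeof(Vector));
    if (result != NULL)
        *result = cross_product(*v1, *v2);
    return result;
}
```

Every call to wrap_cross_product must be matched by exactly one free() of the returned pointer; from a scripting language that usually means calling the generated delete function for Vector.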

It should also be noted that the handling of pass/return by value in C++ has some special cases.

For example, the above code fragments don’t work correctly if Vector doesn’t define a default constructor. The section on SWIG and C++ has more information about this case.

5.4.3 Linking to global structure variables

When global variables involving structures are encountered, SWIG handles them as pointers. For example, a global variable like this:

Vector unit_i;

gets mapped to an underlying pair of set/get functions like this :

Vector *unit_i_get()
{
    return &unit_i;
}
void unit_i_set(Vector *value)
{
    unit_i = *value;
}

A global variable created in this manner will show up as a pointer in the target scripting language. It would be an extremely bad idea to free or destroy such a pointer.

Also, C++ classes must supply a properly defined copy constructor in order for assignment to work correctly.

5.4.4 Linking to char*

When a global variable of type char * appears, SWIG uses malloc() or new to allocate memory for the new value.

Specifically, if you have a variable like this

char *foo;

SWIG generates the following code:

/* C mode */
void foo_set(char *value)
{
    if (foo) free(foo);
    foo = (char *) malloc(strlen(value)+1);
    strcpy(foo, value);
}
/* C++ mode.  When -c++ option is used */
void foo_set(char *value)
{
    if (foo) delete [] foo;
    foo = new char[strlen(value)+1];
    strcpy(foo, value);
}

If this is not the behavior that you want, consider making the variable read-only using the %immutable directive.

Alternatively, you might write a short assist-function to set the value exactly like you want. For example:

%inline %{
    void set_foo(char *value)
    {
        strncpy(foo, value, 50);
    }
%}

Note: If you write an assist function like this, you will have to call it as a function from the target scripting language (it does not work like a variable). For example, in Python you will have to write:

>>> set_foo("Hello World")

A common mistake with char * variables is to link to a variable declared like this:

char *VERSION = "1.0";

In this case, the variable will be readable, but any attempt to change the value results in a segmentation or general protection fault.

This is due to the fact that SWIG is trying to release the old value using free or delete when the string literal value currently assigned to the variable wasn’t allocated using malloc() or new.

To fix this behavior, you can either:

  1. mark the variable as read-only,
  2. write a typemap (as described in Chapter 6),
  3. write a special set function as shown.
  4. declare the variable as an array: char VERSION[32] = "1.0";
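Options 3 and 4 can be combined. The sketch below declares the variable as an array and pairs it with a bounded setter; the name VERSION_set and the 32-byte size are assumptions chosen to match the list above, not code produced by SWIG.

```c
#include <string.h>

/* Array storage: nothing here was allocated with malloc(), and
   nothing will ever be passed to free(). */
char VERSION[32] = "1.0";

/* Hypothetical setter: copies at most 31 characters and always
   terminates the string, so over-long input is truncated rather than
   overflowing the array. */
void VERSION_set(const char *value)
{
    strncpy(VERSION, value, sizeof VERSION - 1);
    VERSION[sizeof VERSION - 1] = '\0';
}
```

Since the storage is a fixed array, assigning a new value never involves free() or delete, which removes the crash described above.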

5.4.4 Linking to const char*

For a variable declared as const char *, SWIG still generates functions for setting and getting the value. However, the default behavior does not release the previous contents (resulting in a possible memory leak).

In fact, you may get a warning message such as this when wrapping such a variable:

example.i:20. Typemap warning. Setting const char * variable may leak memory

The reason for this behavior is that const char * variables are often used to point to string literals. For example:

const char *foo = "Hello World\n";

On the other hand, it is legal to change the pointer to point to some other value.

When setting a variable of this type, SWIG allocates a new string (using malloc or new) and changes the pointer to point to the new value.

However, repeated modifications of the value will result in a memory leak, since the old value is not released by default.

In short, it is safest to treat both const char * and char * variables as read-only.

5.4.5 Arrays

Arrays are fully supported by SWIG, but they are always handled as pointers instead of mapping them to a special array object or list in the target language.

Thus, the following declarations :

int foobar(int a[40]);
void grok(char *argv[]);
void transpose(double a[20][20]);

are processed as if they were really declared like this:

int foobar(int *a);
void grok(char **argv);
void transpose(double (*a)[20]);

Like C, SWIG does not perform array bounds checking. It is up to the user to make sure the pointer points to a suitably allocated region of memory.

Multi-dimensional arrays are transformed into a pointer to an array of one less dimension. For example:

int [10];         // Maps to int *
int [10][20];     // Maps to int (*)[20]
int [10][20][30]; // Maps to int (*)[20][30]

Array variables are supported, but are read-only by default. For example:

int   a[100][200];

In this case, reading the variable ‘a’ returns a pointer of type int (*)[200] that points to the first element of the array &a[0][0] .

Trying to modify ‘a’ results in an error. This is because SWIG does not know how to copy data from the target language into the array.

To work around this limitation, you may want to write a few simple assist functions like this:

%inline %{
        void a_set(int i, int j, int val) 
        {
            a[i][j] = val;
        }
        int a_get(int i, int j) 
        {
            return a[i][j];
        }
%}

In the target language, the array cannot be read or written like a normal variable. Instead, use the assist functions like this:

a_set(0,0,1) # set a[0][0] to 1
val = a_get(0,0) # get a[0][0]

To dynamically create arrays of various sizes and shapes, it may be useful to write some helper functions in your interface. For example:

// Some array helpers
%inline %{
    /* Create any sort of [size] array */
    int *int_array(int size) 
    {
        return (int *) malloc(size*sizeof(int));
    }
    /* Create a two-dimension array [size][10] */
    int (*int_array_10(int size))[10] 
    {
        return (int (*)[10]) malloc(size*10*sizeof(int));
    }
%}
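Since neither C nor SWIG checks array bounds, one option is to keep the length next to the data and validate every index in the helpers. This is only a sketch of that idea; the IntArray type and all function names are hypothetical.

```c
#include <stdlib.h>

/* Hypothetical array wrapper that remembers its own length. */
typedef struct IntArray {
    int *data;
    int  size;
} IntArray;

/* Allocate a zero-filled array of the given size, or NULL on failure. */
IntArray *IntArray_new(int size)
{
    IntArray *a;
    if (size <= 0)
        return NULL;
    a = (IntArray *) malloc(sizeof(IntArray));
    if (a == NULL)
        return NULL;
    a->data = (int *) calloc((size_t) size, sizeof(int));
    if (a->data == NULL) {
        free(a);
        return NULL;
    }
    a->size = size;
    return a;
}

/* Store val at index i. Returns 0 on success, -1 if i is out of range. */
int IntArray_set(IntArray *a, int i, int val)
{
    if (a == NULL || i < 0 || i >= a->size)
        return -1;
    a->data[i] = val;
    return 0;
}

/* Read index i. Returns 0 if i is out of range. */
int IntArray_get(const IntArray *a, int i)
{
    if (a == NULL || i < 0 || i >= a->size)
        return 0;
    return a->data[i];
}

/* Release both the element storage and the wrapper itself. */
void IntArray_delete(IntArray *a)
{
    if (a != NULL) {
        free(a->data);
        free(a);
    }
}
```

Placed inside an %inline block, helpers like these would let a script create, fill, and release an array without being able to write past its end.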

Arrays of char are handled as a special case by SWIG. In this case, strings in the target language can be stored in the array.

For example, if you have a declaration like this,

char pathname[256];

SWIG generates functions for both getting and setting the value that are equivalent to the following code:

char *pathname_get() 
{
    return pathname;
}
void pathname_set(char *value) 
{
    strncpy(pathname, value, 256);
}

The difference is that in the target language, the value can be set/get like a normal variable.

cvar.pathname = 'python' # set
print cvar.pathname # get

5.4.8 Default/optional arguments

SWIG supports default arguments in function declarations. For example:

int plot(double x, double y, int color=WHITE);

SWIG generates wrapper code where the default arguments are optional in the target language:

plot -3.4 7.5    # Use default value
plot -3.4 7.5 10 # set color to 10 instead

5.4.9 Pointers to functions and callbacks

SWIG provides full support for function pointers provided that the callback functions are defined in C and not in the target language.

For example, consider a function like this:

int binary_op(int a, int b, int (*op)(int, int));

When you first wrap something like this into an extension module, you may find the function to be impossible to use. For instance, in Python:

>>> def add(x, y):
...     return x+y
...
>>> binary_op(3, 4, add)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: Type error. Expected _p_f_int_int__int
>>>

The reason for this error is that SWIG doesn’t know how to map a scripting language function into a C callback.

However, existing C functions can be used as arguments provided you install them as constants. One way to do this is to use the %constant directive like this:

/* Function with a callback */
int binary_op(int a, int b, int (*op)(int, int));
/* Some callback functions */
%constant int add(int, int);
%constant int sub(int, int);
%constant int mul(int, int);

In this case, add, sub, and mul become function pointer constants in the target scripting language. This allows you to use them as follows:

>>> binary_op(3, 4, add)
7
>>> binary_op(3, 4, mul)
12
>>>

Unfortunately, by declaring the callback functions as constants, they are no longer accessible as functions. For example:

>>> add(3, 4)
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: object is not callable: '_ff020efc_p_f_int_int__int'
>>>

If you want to make a function available as both a callback function and a function, you can use the %callback and %nocallback directives like this:

/* Function with a callback */
int binary_op(int a, int b, int (*op)(int, int));
/* Some callback functions */
%callback("%s_cb");
int add(int, int);
int sub(int, int);
int mul(int, int);
%nocallback;

%s gets replaced by the function name. The callback mode remains in effect until it is explicitly disabled using %nocallback. When you do this, the interface now works as follows:

>>> binary_op(3, 4, add_cb)
7
>>> binary_op(3, 4, mul_cb)
12
>>> add(3, 4)
7
>>> mul(3, 4)
12

Notice that when the function is used as a callback, special names such as add_cb are used instead. To call the function normally, just use the original function name such as add() .

Although SWIG does not normally allow callback functions to be written in the target language, this can be accomplished with the use of type maps and other advanced SWIG features.

See the Typemaps chapter for more about typemaps and individual target language chapters for more on callbacks and the ‘director’ feature.
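The interface files in this section only declare binary_op, add, sub, and mul. For reference, C definitions consistent with the results shown above (7 and 12) might look like the following; the function bodies are an assumption, since the original text never shows them.

```c
/* Assumed implementation: apply the callback to the two operands. */
int binary_op(int a, int b, int (*op)(int, int))
{
    return op(a, b);
}

/* Assumed callback implementations. */
int add(int a, int b) { return a + b; }
int sub(int a, int b) { return a - b; }
int mul(int a, int b) { return a * b; }
```

On the C side a callback is just the address of a compiled function, which is why a Python def cannot be passed without extra glue such as typemaps or directors.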

5.5 Structures and unions

If SWIG encounters the definition of a C structure or union, it creates a set of accessor functions. For example:

struct Vector 
{
    double x, y, z;
};

gets transformed into the following set of accessor functions :

double Vector_x_get(struct Vector *obj) 
{
    return obj->x;
}
double Vector_y_get(struct Vector *obj) 
{ 
    return obj->y;
}
double Vector_z_get(struct Vector *obj) 
{ 
    return obj->z;
}
void Vector_x_set(struct Vector *obj, double value) 
{
    obj->x = value;
}
void Vector_y_set(struct Vector *obj, double value) 
{
    obj->y = value;
}
void Vector_z_set(struct Vector *obj, double value) 
{
    obj->z = value;
}

In addition, SWIG creates default constructor and destructor functions if none are defined in the interface. For example:

struct Vector *new_Vector() 
{
    return (struct Vector *) calloc(1, sizeof(struct Vector));
}
void delete_Vector(struct Vector *obj) 
{
    free(obj);
}

Using these low-level accessor functions, an object can be minimally manipulated from the target language using code like this:

v = new_Vector()
Vector_x_set(v, 2)
Vector_y_set(v, 10)
Vector_z_set(v, -5)
delete_Vector(v)

However, most of SWIG’s language modules also provide a high-level interface that is more convenient. Keep reading.

5.5.2 Char* strings members

Structures involving character strings require some care.

SWIG assumes that all members of type char * have been dynamically allocated using malloc() and that they are NULL-terminated ASCII strings. For example :

%module mymodule
...
struct Foo {
  char *name;
  ...
};

This results in the following accessor functions :

// Note: If the -c++ option is used, new and delete are used to perform memory allocation.
char *Foo_name_get(Foo *obj) 
{
    return obj->name;
}
char *Foo_name_set(Foo *obj, char *c) 
{
    if (obj->name)
        free(obj->name);
    obj->name = (char *) malloc(strlen(c)+1);
    strcpy(obj->name, c);
    return obj->name;
}

If this behavior differs from what you need in your applications, the SWIG “memberin” typemap can be used to change it. See the typemaps chapter for further details.

5.5.3 Array members

Arrays may appear as the members of structures, but they will be read-only by default. SWIG may generate a warning message:

interface.i:116. Warning. Array member will be read-only

To eliminate the warning message, typemaps can be used, but this is discussed in a later chapter. In many cases, the warning message is harmless.

5.5.4 Structure data members

Occasionally, a structure will contain data members that are themselves structures. For example:

typedef struct Foo 
{
    int x;
} Foo;

typedef struct Bar 
{
    int y;
    Foo f;  /* Structure data member */
} Bar;

When a structure member is wrapped, it is handled as a pointer, unless the %naturalvar directive is used where it is handled more like a C++ reference (see C++ Member data).

The accessors to the member variable as a pointer are effectively wrapped as follows:

Foo *Bar_f_get(Bar *b) 
{
    return &b->f;
}
void Bar_f_set(Bar *b, Foo *value) 
{
    b->f = *value;
}

The reason for this has to do with the problem of modifying and accessing data inside the data member. For example, suppose you wanted to modify f.x in C like this:

Bar *b;
b->f.x = 37;

Translating them to function calls in C (as would be used inside the scripting language interface) results in the following code:

Bar *b;
Foo_x_set(Bar_f_get(b), 37);

In this code, if the Bar_f_get() function were to return a Foo instead of a Foo*, then the resulting modification would be applied to a copy of f and not the data member f itself. Clearly that’s not what you want!
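The difference between returning a pointer and returning a copy can be demonstrated directly. In this sketch, Bar_f_get and Foo_x_set follow the accessors shown above, while Bar_f_get_copy is a hypothetical by-value variant added only for comparison.

```c
typedef struct Foo {
    int x;
} Foo;

typedef struct Bar {
    int y;
    Foo f;  /* Structure data member */
} Bar;

/* Pointer-returning accessor, as described above. */
Foo *Bar_f_get(Bar *b)
{
    return &b->f;
}

/* Hypothetical by-value accessor: the caller receives a copy. */
Foo Bar_f_get_copy(Bar *b)
{
    return b->f;
}

void Foo_x_set(Foo *f, int value)
{
    f->x = value;
}
```

Calling Foo_x_set(Bar_f_get(&b), 37) changes b.f.x, whereas setting x on the result of Bar_f_get_copy(&b) only changes a temporary copy and leaves b untouched.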

It should be noted that this transformation to pointers only occurs if SWIG knows that a data member is a structure or class. For instance, if you had a structure like this,

struct Foo 
{
    WORD   w;
};

and nothing was known about WORD, then SWIG will generate more normal accessor functions like this:

WORD Foo_w_get(Foo *f) 
{
    return f->w;
}
void Foo_w_set(Foo *f, WORD value) 
{
    f->w = value;
}

Compatibility Note: if you need to tell SWIG that an undeclared datatype is really a struct, simply use a forward struct declaration such as “struct Foo;”.

5.5.5 C constructors and destructors

When wrapping structures, it is generally useful to have a mechanism for creating and destroying objects, because constructing a structure may allocate memory that should be released when the structure is destroyed.

SWIG will by default automatically generate functions for creating and destroying objects using malloc() and free() when SWIG is used on C code, with C++ being handled differently.
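A default constructor that only allocates the structure itself is not enough when the structure owns further allocations. The sketch below shows a hand-written create/destroy pair for such a case; the Buffer type is hypothetical, and the new_/delete_ names simply mirror the naming pattern used for Vector in this post.

```c
#include <stdlib.h>

/* Hypothetical structure that owns a second heap allocation. */
typedef struct Buffer {
    char  *data;
    size_t len;
} Buffer;

/* Allocate the structure and its zero-filled payload.
   Returns NULL if either allocation fails. */
Buffer *new_Buffer(size_t len)
{
    Buffer *b = (Buffer *) malloc(sizeof(Buffer));
    if (b == NULL)
        return NULL;
    b->data = (char *) calloc(len ? len : 1, 1);
    if (b->data == NULL) {
        free(b);
        return NULL;
    }
    b->len = len;
    return b;
}

/* Release the payload first, then the structure. NULL is accepted. */
void delete_Buffer(Buffer *b)
{
    if (b != NULL) {
        free(b->data);
        free(b);
    }
}
```

A plain free() of the structure would leak data, which is why a matching destructor matters as much as the constructor.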

If you don’t want SWIG to generate default constructors for your interfaces, you can use the %nodefaultctor directive or the -nodefaultctor command line option. For example:

swig -nodefaultctor example.i

or

%module foo
...
%nodefaultctor;        // Don't create default constructors
... declarations ...
%clearnodefaultctor;   // Re-enable default constructors

If you need more precise control, %nodefaultctor can selectively target individual structure definitions. For example:

%nodefaultctor Foo; // No default constructor for Foo
...
struct Foo 
{             
    // No default constructor generated.
};
struct Bar 
{             
    // Default constructor generated.
};

Since ignoring the implicit or default destructors most of the time produces memory leaks, SWIG will always try to generate them.

If needed, however, you can selectively disable the generation of the default/implicit destructor by using %nodefaultdtor:

%nodefaultdtor Foo; // No default/implicit destructor for Foo
...
struct Foo 
{              
    // No default destructor is generated.
};
struct Bar 
{              
    // Default destructor generated.
};

Note: There are also the -nodefault option and %nodefault directive, which disable generation of both the default constructor and the implicit destructor. This could lead to memory leaks across the target languages, and it is highly recommended that you don’t use them.

5.5.6 Adding member functions to C structures

SWIG provides a special %extend directive that makes it possible to attach methods to C structures for purposes of building an object oriented interface. Suppose you have a C header file with the following declaration :

/* file : vector.h */
...
typedef struct Vector 
{
    double x, y, z;
} Vector;

You can make a Vector look a lot like a class by writing a SWIG interface like this:

// file : vector.i
%module mymodule
%{
        #include "vector.h"
%}

%include "vector.h"          // Just grab original C header file
%extend Vector 
{             
    // Attach these functions to struct Vector.
    // The newly constructed object must be returned as if the constructor 
    // declaration had a return value, a Vector * in this case.
    Vector(double x, double y, double z) 
    {
        Vector *v;
        v = (Vector *) malloc(sizeof(Vector));
        v->x = x;
        v->y = y;
        v->z = z;
        return v;
    }
    ~Vector()
    {
        free($self);
    }
    double magnitude() 
    {
        return sqrt($self->x*$self->x+$self->y*$self->y+$self->z*$self->z);
    }
    void print() 
    {
        printf("Vector [%g, %g, %g]\n", $self->x, $self->y, $self->z);
    }
};

Note the usage of the $self special variable. Its usage is identical to a C++ ‘this’ pointer and should be used whenever access to the struct instance is required.

Now, when used with proxy classes in Python, you can do things like this :

v = Vector(3, 4, 0)        # Create a new vector
print v.magnitude()        # Print magnitude
# 5.0
v.print()                  # Print it out
# Vector [3, 4, 0]
del v                      # Destroy it

The %extend directive can also be used inside the definition of the Vector structure. For example:

// file : vector.i
%module mymodule
%{
        #include "vector.h"
%}

typedef struct Vector {
  double x, y, z;
  %extend {
    Vector(double x, double y, double z) { ... }
    ~Vector() { ... }
    ...
  }
} Vector;

Note that %extend can be used to access externally written functions provided they follow the naming convention used in this example :

/* File : vector.c */
/* Vector methods */

#include "vector.h"

Vector *new_Vector(double x, double y, double z) {
  Vector *v;
  v = (Vector *) malloc(sizeof(Vector));
  v->x = x;
  v->y = y;
  v->z = z;
  return v;
}

void delete_Vector(Vector *v) {
  free(v);
}

double Vector_magnitude(Vector *v) {
  return sqrt(v->x*v->x+v->y*v->y+v->z*v->z);
}

// File : vector.i
// Interface file
%module mymodule
%{
#include "vector.h"
%}

typedef struct Vector {
  double x, y, z;
  %extend {
    Vector(double, double, double); // This calls new_Vector()
    ~Vector();           // This calls delete_Vector()
    double magnitude();  // This will call Vector_magnitude()
    ...
  }
} Vector;

The name used for %extend should be the name of the struct and not the name of any typedef to the struct. For example:

typedef struct Integer {
  int value;
} Int;
%extend Integer { ...  } /* Correct name */
%extend Int { ...  } /* Incorrect name */

struct Float {
  float value;
};
typedef struct Float FloatValue;
%extend Float { ...  } /* Correct name */
%extend FloatValue { ...  } /* Incorrect name */

There is one exception to this rule and that is when the struct is anonymously named such as:

typedef struct {
  double value;
} Double;
%extend Double { ...  } /* Okay */

A little known feature of the %extend directive is that it can also be used to add synthesized attributes or to modify the behavior of existing data attributes.

For example, suppose you wanted to make magnitude a read-only attribute of Vector instead of a method. To do this, you might write some code like this:

// Add a new attribute to Vector
%extend Vector {
    const double magnitude;
}
// Now supply the implementation of the Vector_magnitude_get function
%{
const double Vector_magnitude_get(Vector *v) {
  return (const double) sqrt(v->x*v->x+v->y*v->y+v->z*v->z);
}
%}

Now, for all practical purposes, magnitude will appear like an attribute of the object.

A similar technique can also be used to work with data members that you want to process. For example, consider this interface:

typedef struct Person {
  char name[50];
  ...
} Person;

Say you wanted to ensure that name is always upper case. You can rewrite the interface as follows so that this happens whenever a name is read or written:

typedef struct Person {
  %extend {
    char name[50];
  }
  ...
} Person;

%{
#include <string.h>
#include <ctype.h>
void make_upper(char *name) {
  char *c;
  for (c = name; *c; ++c)
    *c = (char)toupper((int)*c);
}

/* Specific implementation of set/get functions forcing capitalization */
char *Person_name_get(Person *p) {
  make_upper(p->name);
  return p->name;
}
void Person_name_set(Person *p, char *val) {
  strncpy(p->name, val, 49);
  p->name[49] = '\0';  /* strncpy does not terminate over-long input */
  make_upper(p->name);
}
%}

Note: it should be stressed that even though %extend can be used to add new data members, these new members can not require the allocation of additional storage in the object (e.g., their values must be entirely synthesized from existing attributes of the structure or obtained elsewhere).

5.5.7 Nested structures

Occasionally, a C program will involve structures like this :

typedef struct Object {
  int objtype;
  union {
    int ivalue;
    double dvalue;
    char *strvalue;
    void *ptrvalue;
  } intRep;
} Object;

When SWIG encounters this, it performs a structure splitting operation that transforms the declaration into the equivalent of the following:

typedef union {
  int ivalue;
  double dvalue;
  char *strvalue;
  void *ptrvalue;
} Object_intRep;

typedef struct Object {
  int objtype;
  Object_intRep intRep;
} Object;

SWIG will then create an Object_intRep structure for use inside the interface file. Accessor functions will be created for both structures. In this case, functions like this would be created :

Object_intRep *Object_intRep_get(Object *o) 
{
    return (Object_intRep *) &o->intRep;
}
int Object_intRep_ivalue_get(Object_intRep *o) 
{
    return o->ivalue;
}
int Object_intRep_ivalue_set(Object_intRep *o, int value) 
{
    return (o->ivalue = value);
}
double Object_intRep_dvalue_get(Object_intRep *o) 
{
    return o->dvalue;
}
... etc ...

Although this process is a little involved, it works as you would expect in the target scripting language, especially when proxy classes are used. For instance, in Perl:

# Perl5 script for accessing nested member
$o = CreateObject();                    # Create an object somehow
$o->{intRep}->{ivalue} = 7;             # Change value of o.intRep.ivalue

Note: if you have a lot of nested structure declarations, it is advisable to double-check them after running SWIG. Nesting is handled differently in C++ mode, see Nested classes.
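
The split form can be tried directly in C. The sketch below reuses the declarations and accessor names shown above; set_nested_ivalue() is a hypothetical helper added here to mirror what the Perl statement does through the generated accessors.

```c
/* The union hoisted out of Object, as in the split form above. */
typedef union {
  int ivalue;
  double dvalue;
  char *strvalue;
  void *ptrvalue;
} Object_intRep;

typedef struct Object {
  int objtype;
  Object_intRep intRep;
} Object;

/* Accessor for the nested member: returns a pointer into the parent. */
Object_intRep *Object_intRep_get(Object *o) {
  return &o->intRep;
}

int Object_intRep_ivalue_get(Object_intRep *r) {
  return r->ivalue;
}

int Object_intRep_ivalue_set(Object_intRep *r, int value) {
  return (r->ivalue = value);
}

/* Hypothetical helper: the C equivalent of $o->{intRep}->{ivalue} = value */
int set_nested_ivalue(Object *o, int value) {
  return Object_intRep_ivalue_set(Object_intRep_get(o), value);
}
```

Because Object_intRep_get() returns a pointer into the parent object rather than a copy, a write through the nested accessor changes the original Object.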

5.6 Code Insertion

Sometimes it is necessary to insert special code into the resulting wrapper file generated by SWIG. For example, you may want to include additional C code to perform initialization or other operations.

There are four common ways to insert code, but it’s useful to know how the output of SWIG is structured first.

5.6.1 The output of SWIG

When SWIG creates its output C/C++ file, it is broken up into five sections:

  • Begin section
    A placeholder for users to put code at the beginning of the C/C++ wrapper file. This is most often used to define preprocessor macros that are used in later sections.
  • Runtime code
    This code is internal to SWIG and is used to include type-checking and other support functions that are used by the rest of the module.
  • Header section
    This is user-defined support code that has been included by the %{ … %} directive. Usually this consists of header files and other helper functions.
  • Wrapper code
    These are the wrappers generated automatically by SWIG.
  • Module initialization
    The function generated by SWIG to initialize the module upon loading.

5.6.2 Code insertion blocks

The %insert directive enables inserting blocks of code into a given section of the generated code. It can be used in one of two ways:

%insert("section") "filename"
%insert("section") %{ ... %}

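The first form copies the contents of the named file into the given section; the second inserts the code between the delimiters. Each section also has a directive named after it, and that directive is normally used instead of %insert. For example, %runtime is used instead of %insert("runtime"). The valid sections, in the order they appear in the generated C/C++ wrapper file, are as shown: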

%begin %{
  ... code in begin section ...
%}
%runtime %{
  ... code in runtime section ...
%}
%header %{
  ... code in header section ...
%}
%wrapper %{
  ... code in wrapper section ...
%}
%init %{
  ... code in init section ...
%}

Note: the bare %{ … %} directive is a shortcut that is the same as %header %{ ... %}.

A common use for code blocks is to write “helper” functions. These are functions that are used specifically for the purpose of building an interface, but which are generally not visible to the normal C program. For example:

%{
   /* Create a new vector */
   static Vector *new_Vector() 
   {
      return (Vector *) malloc(sizeof(Vector));
   }
%}

// Now tell SWIG to wrap it and expose it to the scripting language
Vector *new_Vector();
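
A slightly fuller helper pair might look like the sketch below. The layout of Vector is an assumption (the text never defines it), and delete_Vector() is a hypothetical companion added so that scripts have a way to release what new_Vector() allocates.

```c
#include <stdlib.h>

/* Hypothetical layout; the original text does not define Vector. */
typedef struct Vector {
  double x, y, z;
} Vector;

/* Create a new vector. calloc() zero-fills the memory, which reads back
   as 0.0 for doubles on typical platforms. Returns NULL on failure. */
Vector *new_Vector(void) {
  return (Vector *) calloc(1, sizeof(Vector));
}

/* Release a vector obtained from new_Vector(). free(NULL) is a no-op. */
void delete_Vector(Vector *v) {
  free(v);
}
```

Both functions would go inside the %{ ... %} block, with matching declarations listed afterwards so SWIG wraps them.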

5.6.3 Inlined header code blocks

Since the process of writing helper functions is fairly common, there is a special inlined form of code block that is used as follows:

%inline 
%{
   // Create a new vector and at the same time
   // tell SWIG to wrap it and expose it to the scripting language
   Vector *new_Vector() 
   {
      return (Vector *) malloc(sizeof(Vector));
   }
%}

The %inline directive inserts all of the code that follows verbatim into the header portion of an interface file.

The code is then parsed by both the SWIG preprocessor and parser. Thus, the above example creates a new command new_Vector using only one declaration.

5.6.4 Initialization blocks

When code is included in the %init section, it is copied directly into the module initialization function. For example, if you needed to perform some extra initialization on module loading, you could write this:

%init %{init_variables();%}
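
The text does not say what init_variables() does, so the following is only a sketch of the kind of one-time setup such a function might perform. The lookup table and the names TABLE_SIZE, variables_ready() and square_of() are all made up for illustration.

```c
#include <stddef.h>

#define TABLE_SIZE 8            /* hypothetical table size */

static int squares[TABLE_SIZE]; /* hypothetical module-level state */
static int initialized = 0;

/* One-time setup; the %init block above would call this on module load. */
void init_variables(void) {
  size_t i;
  for (i = 0; i < TABLE_SIZE; ++i)
    squares[i] = (int)(i * i);
  initialized = 1;
}

/* Reports whether init_variables() has run. */
int variables_ready(void) {
  return initialized;
}

/* Returns n*n from the table, or -1 if uninitialized or out of range. */
int square_of(int n) {
  if (!initialized || n < 0 || n >= TABLE_SIZE)
    return -1;
  return squares[n];
}
```

Putting the call in the %init section means scripts never see the uninitialized state, since the table is filled before any wrapped function can be called.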

5.7 Interface Building Strategy

This section describes the general approach for building interfaces with SWIG. The specifics related to a particular scripting language are found in later chapters.

5.7.1 Preparing a C program for SWIG

Here’s a series of steps you can follow to make an interface for a C program:

  1. Identify the functions that you want to wrap.
    It’s probably not necessary to access every single function of a C program, so a little forethought can dramatically simplify the resulting scripting language interface. C header files are a particularly good source for finding things to wrap.
  2. Create a new interface file to describe the scripting language interface to your program.
  3. Copy the appropriate declarations into the interface file or use SWIG’s %include directive to process an entire C source/header file.
  4. Make sure everything in the interface file uses ANSI C/C++ syntax.
  5. Most importantly, define a type before it is used!
    Make sure all necessary typedef declarations and type information are available in the interface file. If type information is not specified correctly, the wrappers can be sub-optimal and can even result in uncompilable C/C++ code.
  6. If your program has a main() function, you may need to rename it (read on).
  7. Run SWIG and compile.
    Although this may sound complicated, the process turns out to be fairly easy once you get the hang of it.

Note: In the process of building an interface, SWIG may encounter syntax errors or other problems. The best way to deal with this is to simply copy the offending code into a separate interface file and edit it.

5.7.2 SWIG interface file

The preferred method of using SWIG is to generate a separate interface file. Suppose you have the following C header file:

/* File : header.h */
#include <stdio.h>
#include <math.h>
extern int foo(double);
extern double bar(int, int);
extern void dump(FILE *f);

A typical SWIG interface file for this header file would look like the following:

/* File : interface.i */
%module mymodule
%{
    #include "header.h"
%}
extern int foo(double);
extern double bar(int, int);
extern void dump(FILE *f);

Of course, in this case, our header file is pretty simple so we could use a simpler approach and use an interface file like this:

/* File : interface.i */
%module mymodule
%{
    #include "header.h"
%}
%include "header.h"

The main advantage of this approach is that the interface file needs little or no maintenance when the header file changes in the future.

Note: in more complex projects, an interface file containing numerous %include and #include statements like this is one of the most common approaches to interface file design due to lower maintenance overhead.
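
To actually build and try mymodule, the three functions declared in header.h need definitions somewhere. The bodies below are purely hypothetical stand-ins (the text never says what foo, bar or dump do); they only give the wrapper something to link against.

```c
#include <stdio.h>

/* Hypothetical body: truncate a double toward zero. */
int foo(double x) {
  return (int) x;
}

/* Hypothetical body: floating-point quotient, 0.0 when b is zero. */
double bar(int a, int b) {
  return b != 0 ? (double) a / (double) b : 0.0;
}

/* Hypothetical body: write a short report to an already-open stream. */
void dump(FILE *f) {
  fprintf(f, "foo(2.5) = %d, bar(1, 2) = %g\n", foo(2.5), bar(1, 2));
}
```

One thing to keep in mind: dump() takes a FILE *, which a script can only obtain from another wrapped C function, so in practice you would also expose a small helper that opens a file.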

5.7.3 Why use separate interface files?

Although SWIG can parse many header files, it is more common to write a special .i file defining the interface to a package. There are several reasons why you might want to do this:

  1. It is rarely necessary to access every single function in a large package. Many C functions might have little or no use in a scripted environment. Therefore, why wrap them?
  2. Separate interface files provide an opportunity to provide more precise rules about how an interface is to be constructed.
  3. Interface files can provide more structure and organization.
  4. SWIG can’t parse certain definitions that appear in header files. Having a separate file allows you to eliminate or work around these problems.
  5. Interface files provide a more precise definition of what the interface is. Users wanting to extend the system can go to the interface file and immediately see what is available without having to dig it out of header files.

5.7.4 Getting the right header files

Note: Sometimes the code generated by SWIG needs certain header files in order to compile properly. Include those header files in a %{ %} block like this:

%module graphics
%{
   #include <GL/gl.h>
   #include <GL/glu.h>
%}
// Put the rest of the declarations here
...

5.7.5 What to do with main()

If your program defines a main() function, you may need to get rid of it or rename it in order to use a scripting language, because most scripting languages define their own main() procedure that is called instead.

main() also makes no sense when working with dynamic loading. There are a few approaches to solving the main() conflict:

  • Get rid of main() entirely.
  • Rename main() to something else. You can do this by compiling your C program with an option like -Dmain=oldmain.
  • Use conditional compilation to only include main() when not using a scripting language.

Note: getting rid of main() may leave your program without the initialization it used to perform. To handle this problem, you may consider writing a special function called program_init() that initializes your program upon startup.

This function could then be called either from the scripting language as the first operation, or when the SWIG generated module is loaded. In many cases, the old main() program can be completely replaced by a Perl, Python, or Tcl script.
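
The conditional-compilation approach and program_init() can be combined, roughly as in the sketch below. STANDALONE_BUILD is a macro name made up for this example, and program_ready() is a hypothetical helper for checking the state; only program_init() comes from the text above.

```c
#include <stdio.h>

static int initialized = 0;

/* Startup work that used to live at the top of main(). */
void program_init(void) {
  initialized = 1;
}

/* Hypothetical helper: lets callers check that program_init() has run. */
int program_ready(void) {
  return initialized;
}

/* main() is compiled only for the standalone executable, e.g. with
   -DSTANDALONE_BUILD. The scripting-language module is built without it. */
#ifdef STANDALONE_BUILD
int main(void) {
  program_init();
  printf("ready: %d\n", program_ready());
  return 0;
}
#endif
```

When building the extension module, leave STANDALONE_BUILD undefined and call program_init() either from a %init block or as the first call made by the script.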

