# README.EXT - -*- RDoc -*- created at: Mon Aug 7 16:45:54 JST 1995
This document explains how to make extension libraries for Ruby.
In C, variables have types and data do not have types. In contrast, Ruby variables do not have a static type, and data themselves have types, so data will need to be converted between the languages.
Data in Ruby are represented by the C type ‘VALUE’. Each VALUE data has its data-type.
To retrieve C data from a VALUE, you need to:
- Identify the VALUE’s data type
- Convert the VALUE into C data
Converting to the wrong data type may cause serious problems.
The Ruby interpreter has the following data types:
- T_NIL: nil
- T_OBJECT: ordinary object
- T_CLASS: class
- T_MODULE: module
- T_FLOAT: floating point number
- T_STRING: string
- T_REGEXP: regular expression
- T_ARRAY: array
- T_HASH: associative array
- T_STRUCT: (Ruby) structure
- T_BIGNUM: multi precision integer
- T_FIXNUM: Fixnum (31bit or 63bit integer)
- T_COMPLEX: complex number
- T_RATIONAL: rational number
- T_FILE: IO
- T_TRUE: true
- T_FALSE: false
- T_DATA: data
- T_SYMBOL: symbol
In addition, there are several other types used internally:
- T_ICLASS: included module
- T_MATCH: MatchData object
- T_UNDEF: undefined
- T_NODE: syntax tree node
- T_ZOMBIE: object awaiting finalization
Most of the types are represented by C structures.
The macro TYPE() defined in ruby.h shows the data type of the VALUE. TYPE() returns the constant number T_XXXX described above. To handle data types, your code will look something like this:
switch (TYPE(obj)) {
  case T_FIXNUM:
    /* process Fixnum */
    break;
  case T_STRING:
    /* process String */
    break;
  case T_ARRAY:
    /* process Array */
    break;
  default:
    /* raise exception */
    rb_raise(rb_eTypeError, "not valid value");
    break;
}
There is also a data-type check function
void Check_Type(VALUE value, int type)
which raises an exception if the VALUE does not have the type specified.
There are also faster check macros for fixnums and nil.
FIXNUM_P(obj)
NIL_P(obj)
The data for types T_NIL, T_FALSE, and T_TRUE are nil, false, and true, respectively. They are singletons for the data type. The equivalent C constants are Qnil, Qfalse, and Qtrue. Note that Qfalse is also false in C (i.e. 0), but Qnil is not.
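The pitfall in the last sentence can be seen in a small stand-alone model. The sketch below does not include ruby.h: model_value, MODEL_QFALSE, MODEL_QNIL and both functions are hypothetical names, and the numeric value chosen for the nil stand-in is an assumption made only for illustration (the real value of Qnil is an implementation detail). It shows why a plain C truth test does not match Ruby's notion of truth.

```c
#include <stdint.h>

/* Stand-in for Ruby's VALUE; not the real definition. */
typedef uintptr_t model_value;

/* Per the text, false is 0 in C. */
#define MODEL_QFALSE ((model_value)0)
/* Assumption: an arbitrary nonzero bit pattern, chosen for illustration. */
#define MODEL_QNIL   ((model_value)8)

/* Naive C test: treats every nonzero value as true, including nil. */
int naive_truthy(model_value v)
{
    return v != 0;
}

/* Ruby semantics: only false and nil count as false. */
int ruby_truthy(model_value v)
{
    return v != MODEL_QFALSE && v != MODEL_QNIL;
}
```

With these definitions, naive_truthy(MODEL_QNIL) is 1 while ruby_truthy(MODEL_QNIL) is 0. In a real extension, the RTEST() and NIL_P() macros from ruby.h serve this purpose.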
The T_FIXNUM data is a fixed-length integer of 31 or 63 bits. The size depends on the size of long: if long is 32 bits wide, T_FIXNUM is 31 bits; if long is 64 bits wide, T_FIXNUM is 63 bits. T_FIXNUM can be converted to a C integer by using the FIX2INT() or FIX2LONG() macro. You have to check that the data is really a FIXNUM before using them, but they are faster than the checked conversions described next. FIX2LONG() never raises exceptions, but FIX2INT() raises RangeError if the result does not fit in an int. There are also NUM2INT() and NUM2LONG(), which convert any Ruby number into a C integer. These macros include a type check, so an exception will be raised if the conversion fails. NUM2DBL() can be used to retrieve the double float value in the same way.
You can use the macros StringValue() and StringValuePtr() to get a char* from a VALUE. StringValue(var) replaces var’s value with the result of “var.to_str()”. StringValuePtr(var) does the same replacement and returns the char* representation of var. These macros skip the replacement if var is already a String. Note that the macros accept only an lvalue as their argument, because they change the value of var in place.
You can also use the macro named StringValueCStr(). This is just like StringValuePtr(), but it always adds a NUL character at the end of the result. If the result contains a NUL character, this macro raises an ArgumentError exception. StringValuePtr() doesn’t guarantee the existence of a NUL at the end of the result, and the result may contain NUL.
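The reason for the embedded-NUL check can be sketched without ruby.h. A Ruby string carries an explicit length, while C string functions stop at the first NUL byte. The helper below is hypothetical (it is not part of the Ruby C API) and only models the condition under which StringValueCStr() is described as raising ArgumentError.

```c
#include <string.h>

/*
 * Returns 1 if the len bytes at ptr contain no NUL byte, so the data can
 * be treated as a C string without being cut short. Returns 0 in the case
 * where the text says StringValueCStr() raises ArgumentError.
 * Hypothetical helper for illustration only.
 */
int model_cstr_safe(const char *ptr, long len)
{
    return memchr(ptr, '\0', (size_t)len) == NULL;
}
```

For the three bytes "a\0c", model_cstr_safe() returns 0, and strlen() on the same data reports 1 rather than 3, which is the silent truncation the check guards against.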
Other data types have corresponding C structures, e.g. struct RArray for T_ARRAY etc. The VALUE of the type which has the corresponding structure can be cast to retrieve the pointer to the struct. The casting macro will be of the form RXXXX for each data type; for instance, RARRAY(obj). See “ruby.h”.
There are some accessing macros for structure members, for example ‘RSTRING_LEN(str)’ to get the size of the Ruby String object. The allocated region can be accessed by ‘RSTRING_PTR(str)’. For arrays, use ‘RARRAY_LEN(ary)’ and ‘RARRAY_PTR(ary)’ respectively.
Notice: Do not change the value of the structure directly, unless you are responsible for the result. This ends up being the cause of interesting bugs.
To convert C data to Ruby values:
- FIXNUM: left shift 1 bit, and turn on LSB.
- Other pointer values: cast to VALUE.
You can determine whether a VALUE is a pointer or not by checking its LSB.
Note that Ruby does not allow arbitrary pointer values to be a VALUE. They should be pointers to the structures which Ruby knows about. The known structures are defined in <ruby.h>.
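The two conversion rules and the LSB check can be tried out in a stand-alone model. This is a simplified sketch, not the interpreter's code: model_value and the model_* functions are hypothetical names, and it assumes two's complement integers. In a real extension, use INT2FIX(), FIX2LONG() and FIXNUM_P() from ruby.h instead.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Stand-in for VALUE; not the real definition. */
typedef uintptr_t model_value;

/* One bit is spent on the tag, so the usable range is half that of long. */
#define MODEL_FIXNUM_MAX (LONG_MAX / 2)
#define MODEL_FIXNUM_MIN (LONG_MIN / 2)

/* Encode: shift left by one bit and turn on the least significant bit. */
model_value model_int2fix(long n)
{
    assert(n >= MODEL_FIXNUM_MIN && n <= MODEL_FIXNUM_MAX);
    return ((model_value)n << 1) | 1;
}

/* Decode: remove the tag bit and halve, keeping the sign. */
long model_fix2long(model_value v)
{
    intptr_t s = (intptr_t)v;   /* assumes two's complement conversion */
    return (long)((s - 1) / 2); /* s is odd, so the division is exact */
}

/* A set least significant bit marks a Fixnum in this model. */
int model_fixnum_p(model_value v)
{
    return (int)(v & 1);
}
```

For example, model_int2fix(5) yields 11 (binary 1011), and decoding it gives 5 again. Pointers to structures are aligned, so their least significant bit is 0 and they are never mistaken for a Fixnum.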
To convert C numbers to Ruby values, use these macros.
- INT2FIX(): for integers within 31 bits.
- INT2NUM(): for arbitrary sized integers.
INT2NUM() converts an integer into a Bignum if it is out of the FIXNUM range, but is a bit slower.
As I already mentioned, it is not recommended to modify an object’s internal structure. To manipulate objects, use the functions supplied by the Ruby interpreter. Some (not all) of the useful functions are listed below:
- rb_str_new(const char *ptr, long len)
  Creates a new Ruby string.
- rb_str_new2(const char *ptr)
- rb_str_new_cstr(const char *ptr)
  Creates a new Ruby string from a C string. This is equivalent to rb_str_new(ptr, strlen(ptr)).
- rb_tainted_str_new(const char *ptr, long len)
  Creates a new tainted Ruby string. Strings from external data sources should be tainted.
- rb_tainted_str_new2(const char *ptr)
- rb_tainted_str_new_cstr(const char *ptr)
  Creates a new tainted Ruby string from a C string.
- rb_sprintf(const char *format, ...)
- rb_vsprintf(const char *format, va_list ap)
  Creates a new Ruby string with printf(3) format.
- rb_str_cat(VALUE str, const char *ptr, long len)
  Appends len bytes of data from ptr to the Ruby string.
- rb_str_cat2(VALUE str, const char* ptr)
  Appends C string ptr to Ruby string str. This function is equivalent to rb_str_cat(str, ptr, strlen(ptr)).
- rb_str_catf(VALUE str, const char* format, ...)
- rb_str_vcatf(VALUE str, const char* format, va_list ap)
  Appends the C string format and successive arguments to the Ruby string str according to a printf-like format. These functions are equivalent to rb_str_append(str, rb_sprintf(format, ...)) and rb_str_append(str, rb_vsprintf(format, ap)), respectively.
- rb_enc_str_new(const char *ptr, long len, rb_encoding *enc)
  Creates a new Ruby string with the specified encoding.
- rb_usascii_str_new(const char *ptr, long len)
- rb_usascii_str_new_cstr(const char *ptr)
  Creates a new Ruby string with encoding US-ASCII.
- rb_str_resize(VALUE str, long len)
  Resizes a Ruby string to len bytes. If str is not modifiable, this function raises an exception. The length of str must be set in advance. If len is less than the old length, the content beyond len bytes is discarded; if len is greater than the old length, the bytes beyond the old length are not initialized and will be garbage. Note that RSTRING_PTR(str) may change by calling this function.
- rb_str_set_len(VALUE str, long len)
  Sets the length of a Ruby string. If str is not modifiable, this function raises an exception. This function preserves the content up to len bytes, regardless of RSTRING_LEN(str). len must not exceed the capacity of str.
- rb_ary_new()
  Creates an array with no elements.
- rb_ary_new2(long len)
  Creates an array with no elements, allocating an internal buffer for len elements.
- rb_ary_new3(long n, ...)
  Creates an n-element array from the arguments.
- rb_ary_new4(long n, VALUE *elts)
  Creates an n-element array from a C array.
- rb_ary_to_ary(VALUE obj)
  Converts the object into an array. Equivalent to Object#to_ary.
There are many functions to operate on an array. They may dump core if other types are given.
- rb_ary_aref(int argc, VALUE *argv, VALUE ary)
  Equivalent to Array#[].
- rb_ary_entry(VALUE ary, long offset)
  ary[offset]
- rb_ary_subseq(VALUE ary, long beg, long len)
  ary[beg, len]
- rb_ary_push(VALUE ary, VALUE val)
- rb_ary_pop(VALUE ary)
- rb_ary_shift(VALUE ary)
- rb_ary_unshift(VALUE ary, VALUE val)
  Equivalent to Array#push, Array#pop, Array#shift, and Array#unshift, respectively.
- rb_ary_cat(VALUE ary, const VALUE *ptr, long len)
  Appends len elements of objects from ptr to the array.
You can add new features (classes, methods, etc.) to the Ruby interpreter. Ruby provides APIs for defining the following things:
- Classes, Modules
- Methods, Singleton Methods
- Constants
To define a class or module, use the functions below:
VALUE rb_define_class(const char *name, VALUE super)
VALUE rb_define_module(const char *name)
These functions return the newly created class or module. You may want to save this reference into a variable to use later.
To define nested classes or modules, use the functions below:
VALUE rb_define_class_under(VALUE outer, const char *name, VALUE super)
VALUE rb_define_module_under(VALUE outer, const char *name)
To define methods or singleton methods, use these functions:
void rb_define_method(VALUE klass, const char *name,
                      VALUE (*func)(), int argc)
void rb_define_singleton_method(VALUE object, const char *name,
                                VALUE (*func)(), int argc)
The ‘argc’ represents the number of arguments to the C function, which must be less than 17. But I doubt you’ll need that many.
If ‘argc’ is negative, it specifies the calling sequence, not the number of arguments.
If argc is -1, the function will be called as:
VALUE func(int argc, VALUE *argv, VALUE obj)
where argc is the actual number of arguments, argv is the C array of the arguments, and obj is the receiver.
If argc is -2, the arguments are passed in a Ruby array. The function will be called like:
VALUE func(VALUE obj, VALUE args)
where obj is the receiver, and args is the Ruby array containing actual arguments.
There are some more functions to define methods. One takes an ID as the name of the method to be defined. See also ID or Symbol below.
void rb_define_method_id(VALUE klass, ID name, VALUE (*func)(ANYARGS), int argc)
There are two functions to define private/protected methods:
void rb_define_private_method(VALUE klass, const char *name,
                              VALUE (*func)(), int argc)
void rb_define_protected_method(VALUE klass, const char *name,
                                VALUE (*func)(), int argc)
Finally, rb_define_module_function defines module functions, which are private AND singleton methods of the module. For example, sqrt is a module function defined in the Math module. It can be called in the following way:
Math.sqrt(4)
or
include Math
sqrt(4)
To define module functions, use:
void rb_define_module_function(VALUE module, const char *name,
                               VALUE (*func)(), int argc)
In addition, function-like methods, which are private methods defined in the Kernel module, can be defined using:
void rb_define_global_function(const char *name, VALUE (*func)(), int argc)
To define an alias for the method,
void rb_define_alias(VALUE module, const char* new, const char* old);
To define a reader/writer for an attribute,
void rb_define_attr(VALUE klass, const char *name, int read, int write)
To define and undefine the ‘allocate’ class method,
void rb_define_alloc_func(VALUE klass, VALUE (*func)(VALUE klass));
void rb_undef_alloc_func(VALUE klass);
func has to take the klass as the argument and return a newly allocated instance. This instance should be as empty as possible, without any expensive (including external) resources.
We have 2 functions to define constants:
void rb_define_const(VALUE klass, const char *name, VALUE val)
void rb_define_global_const(const char *name, VALUE val)
The former is to define a constant under specified class/module. The latter is to define a global constant.
There are several ways to invoke Ruby’s features from C code.
The easiest way to use Ruby’s functionality from a C program is to evaluate a string as a Ruby program. This function will do the job:
VALUE rb_eval_string(const char *str)
Evaluation is done under the current context, thus current local variables of the innermost method (which is defined by Ruby) can be accessed.
Note that the evaluation can raise an exception. There is a safer function:
VALUE rb_eval_string_protect(const char *str, int *state)
It returns nil when an error occurs. Moreover, *state is zero if str was successfully evaluated, or nonzero otherwise.
You can invoke methods directly, without parsing a string. First I need to explain IDs. An ID is an integer that represents one of Ruby’s identifiers, such as a variable name. The Ruby data type corresponding to ID is Symbol. It can be accessed from Ruby in the form:
:Identifier
or
:"any kind of string"
You can get the ID value from a string within C code by using
rb_intern(const char *name)
rb_intern_str(VALUE name)
You can retrieve an ID from a Ruby object (Symbol or String) given as an argument by using
rb_to_id(VALUE symbol)
rb_check_id(volatile VALUE *name)
rb_check_id_cstr(const char *name, long len, rb_encoding *enc)
These functions try to convert the argument to a String if it is neither a Symbol nor a String. The second function stores the converted result into *name, and returns 0 if the string is not a known symbol. If it returns a non-zero value, *name is always a Symbol or a String; if it returns 0, *name is a String. The third function takes a NUL-terminated C string, not a Ruby VALUE.
You can convert a C ID to a Ruby Symbol by using
VALUE ID2SYM(ID id)
and to convert a Ruby Symbol object to an ID, use
ID SYM2ID(VALUE symbol)
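The idea of an ID as an integer that stands for an identifier name can be illustrated with a toy intern table. This is only a sketch of the concept and says nothing about how the interpreter implements rb_intern(): model_id, model_intern(), model_id2name() and the fixed table size are all hypothetical.

```c
#include <string.h>

#define MODEL_MAX_IDS 64

typedef unsigned long model_id;   /* stand-in for ID */

static const char *model_names[MODEL_MAX_IDS];
static model_id model_count = 0;

/*
 * Returns the integer that stands for name, registering the name on first
 * use. IDs start at 1 so that 0 can mean "table full" in this sketch.
 * Only the pointer is stored, so the caller must keep name alive.
 */
model_id model_intern(const char *name)
{
    model_id i;
    for (i = 0; i < model_count; i++) {
        if (strcmp(model_names[i], name) == 0)
            return i + 1;          /* same name, same ID */
    }
    if (model_count == MODEL_MAX_IDS)
        return 0;
    model_names[model_count] = name;
    return ++model_count;
}

/* Reverse lookup from an ID back to its name; NULL if the ID is unknown. */
const char *model_id2name(model_id id)
{
    if (id == 0 || id > model_count)
        return NULL;
    return model_names[id - 1];
}
```

The property that matters is that interning the same name twice gives the same integer, so identifiers can be compared by number instead of by string.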
To invoke methods directly, you can use the function below
VALUE rb_funcall(VALUE recv, ID mid, int argc, ...)
This function invokes a method on the recv, with the method name specified by the symbol mid.
You can access class variables and instance variables using access functions. Also, global variables can be shared between both environments. There’s no way to access Ruby’s local variables.
The functions to access/modify instance variables are below:
VALUE rb_ivar_get(VALUE obj, ID id)
VALUE rb_ivar_set(VALUE obj, ID id, VALUE val)
id must be the symbol, which can be retrieved by rb_intern().
To access the constants of the class/module:
VALUE rb_const_get(VALUE obj, ID id)
See also Constant Definition above.
As stated in section 1.3, the following Ruby constants can be referred to from C.
- Qtrue, Qfalse: Boolean values. Qfalse is false in C also (i.e. 0).
- Qnil: Ruby nil in C scope.