4.9 Memory Structures

4.9.1 Introduction

Writing Euphoria code to interact with the operating system or external libraries often requires communicating via data structures stored in memory. In addition to using peeks and pokes to read and write to memory locations, Euphoria programmers can also define structures that can be used to more easily read and write values from and into memory.

The conventions used are similar to those found in the C programming language, since that's the way the most commonly encountered structures are defined and meant to be used. This is meant to provide a familiar syntax to those who already know C, and also to make it easy to define and use memory structures.

4.9.2 Basic Syntax

There are two keywords for defining memory structures: memstruct and memunion. They are similar, except a memstruct is a way to define a data structure that may contain many different, distinct elements, while a memunion (just like a union in C) is a way to refer to the same locations in memory in different ways (e.g., either as an integer or as a floating point number).

Within a memstruct or memunion, different members are defined using names for data types along with some data type modifiers.

It is also possible to declare fixed-length arrays of a member by adding the element count inside square brackets after the member name.

4.9.2.1 Assigning memstruct values

Assigning a value to a memstruct is an alternative to using one of the built-in poke procedures. The big advantage of using a memstruct over poke is that Euphoria handles the data conversion and the calculation of offsets.

The syntax for assigning a value to memory is the pointer to the memory, followed by a dot, then the name of the memstruct, optionally followed by a dot and then a member of the memstruct. If that member is an array and you wish to reference a specific array element, then a 1-based index inside square brackets is also required.

memstruct POINT
	int x
	int y
end memstruct

memstruct BUFFER
    char name[12]  -- 'name' uses 12 bytes of space.
    int  next_element
    int  element[200] -- element uses 200 * 4 = 800 bytes of space.
end memstruct
   
atom point = allocate( sizeof( POINT ) )
point.POINT.x = 1
point.POINT.y = 2

atom buf = allocate( sizeof( BUFFER ) )

buf.BUFFER.name = "testname" -- assigns the string to the first 8 bytes and zeros the remainder.
buf.BUFFER.element[1] = 543
buf.BUFFER.element[2] = 210
buf.BUFFER.element[3] = 987
buf.BUFFER.next_element = 4
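
For comparison, here is roughly what the two POINT assignments above look like when written by hand with poke4. This is only a sketch: the constants POINT_X, POINT_Y and SIZEOF_POINT are hypothetical names introduced for illustration, and the layout assumes a 4-byte int with no padding. That bookkeeping is what a memstruct takes over.

```euphoria
include std/machine.e

-- Hypothetical hand-computed layout of POINT: two 4-byte ints, no padding.
constant POINT_X = 0, POINT_Y = 4, SIZEOF_POINT = 8

atom point = allocate( SIZEOF_POINT )

poke4( point + POINT_X, 1 )   -- same effect as point.POINT.x = 1
poke4( point + POINT_Y, 2 )   -- same effect as point.POINT.y = 2

? peek4s( point + POINT_X )   -- prints 1
? peek4s( point + POINT_Y )   -- prints 2

free( point )
```

With the memstruct version, a change to the member types or their order needs no change to the code that reads and writes them.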

When a member is itself a memstruct, the dot notation may be continued to access the nested members. Continuing the POINT example from above:

memstruct RECT
	POINT upper_left
	POINT lower_right
end memstruct

atom rect = allocate( sizeof( RECT ) )
rect.RECT.upper_left.x  = 0
rect.RECT.lower_right.x = 5

Additionally, you can assign to multiple members of a memstruct at a time. When the last referenced member of the left hand side is itself a memstruct, the right hand side will be assigned to the respective members of the memstruct:

point.POINT = { 3, 6 }
rect.RECT.upper_left = { 0, 0 }

If an atom is passed on the right hand side, it is treated as a single element sequence. If the sequence has more elements than the memstruct has members, the extra elements are ignored. If the sequence has fewer elements than the memstruct has members, then the additional members are set to zero. This provides for a simple way to zero out all elements:

point.POINT = {3}  -- x = 3, y = 0
point.POINT = {}   -- x = 0, y = 0

Note that when a sequence is assigned to a memunion in this way, the memunion is treated as unsigned char data.

4.9.2.2 Reading memstruct values

Reading from a memstruct is an alternative to using one of the built-in peek functions. The big advantage of using a memstruct over peek is that Euphoria handles the data conversion and the calculation of offsets.

The syntax is the same as for assigning memstruct values, except that the memstruct reference appears wherever an expression is expected, such as on the right hand side of an assignment.
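
As an illustration, the following sketch reads back the members of the POINT memstruct defined earlier. It assumes a Euphoria build with memstruct support and uses only the syntax described in this section; the variable names x and total are arbitrary.

```euphoria
include std/machine.e

memstruct POINT
	int x
	int y
end memstruct

atom point = allocate( sizeof( POINT ) )
point.POINT = { 3, 6 }

-- A member reference can be used like any other expression.
atom x = point.POINT.x
atom total = point.POINT.x + point.POINT.y

? x      -- prints 3
? total  -- prints 9

free( point )
```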

4.9.2.3 Reading and assigning with pointers

A memstruct member that is itself a pointer has an additional way to be used. The normal assignment and reading operations deal with the value of the pointer itself. To access the value to which the pointer points, use an additional dot, then an asterisk:

memstruct PTR_TO_INT
    pointer int a
end memstruct

atom ptr = allocate( sizeof( PTR_TO_INT ) )
ptr.PTR_TO_INT.a = allocate( sizeof( int ) )

ptr.PTR_TO_INT.a.* = 5
ptr.PTR_TO_INT.a.* += 5

? ptr.PTR_TO_INT.a.* -- prints 10

4.9.2.4 Data member size and alignment

In general, Euphoria memstructs are sized and aligned the same way C compilers do it on the respective platform. Data elements are usually aligned on their own size or on the size of a pointer. There are exceptions, such as double floating point values on 32-bit Linux, which are 4-byte aligned.

In some cases, a different alignment may be required, especially on Windows. C compilers use special pragma directives to pack structures differently than normal. Euphoria uses a with pack directive in the memstruct declaration to define non-standard packing:

memstruct PACK1 with pack 1
	char a
	int b
end memstruct

In this example, an int would normally be aligned on a 4-byte boundary and would have an offset of 4, leaving 3 bytes of padding between a and b. However, since with pack 1 is used in the declaration, b comes immediately after a with no extra bytes, and the size of the memstruct is 5. Without packing, extra padding would also be added to the end of the memstruct to ensure that its alignment was based on the largest member's size.

memstruct PACK2 with pack 2
	char a
	int b
end memstruct

In this example, b is placed at offset 2, and the overall size of the memstruct is 6.
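
The effect of packing can be checked with sizeof() and offsetof(). The sketch below assumes a 4-byte int and adds a hypothetical unpacked memstruct, UNPACKED, for comparison. The values in the comments follow from the description above, not from a test on a particular platform.

```euphoria
-- UNPACKED is a hypothetical memstruct that uses the default alignment.
memstruct UNPACKED
	char a
	int b
end memstruct

memstruct PACK1 with pack 1
	char a
	int b
end memstruct

memstruct PACK2 with pack 2
	char a
	int b
end memstruct

? offsetof( UNPACKED.b )  -- 4, after 3 bytes of padding
? sizeof( UNPACKED )      -- 8
? offsetof( PACK1.b )     -- 1, immediately after a
? sizeof( PACK1 )         -- 5
? offsetof( PACK2.b )     -- 2
? sizeof( PACK2 )         -- 6
```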

memstruct

memstruct is used to declare a memory based structure to be used by a Euphoria program. The format is similar to other declarations:

memstruct foo
    int a
    unsigned int b
    pointer int c
end memstruct

Normal scope rules apply to memstruct definitions, and so they can be local, export, public or global. The members are laid out in memory sequentially, though based on their sizes, Euphoria may add some space in between members, just like a C compiler would.

The size of a memstruct may be determined at runtime using sizeof(). It is the sum of the sizes of its members plus any padding added for alignment.

memstructs may contain other memstructs or pointers to other memstructs.

memunion

A memunion is like a memstruct, except that the various data members of a memunion are all located at offset zero, which means that they all start at the same RAM address. A memunion, like a union in C, provides different ways to interpret the same location in memory.

memunion conversion
	int i
	float f
	double d
end memunion

The size of a memunion may be determined at runtime using sizeof(). It is the size of the largest member.
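
A memunion can be used to look at the bit pattern of a floating point value. The sketch below assumes that memunion members are accessed with the same dot notation as memstruct members, which this section does not show explicitly. As a cross-check it builds the same value with atom_to_float32 and bytes_to_int from std/convert.e.

```euphoria
include std/machine.e
include std/convert.e

memunion conversion
	int i
	float f
	double d
end memunion

atom u = allocate( sizeof( conversion ) )

-- Store a 32-bit float, then read the same bytes back as an int.
u.conversion.f = 1.0
? u.conversion.i                          -- prints 1065353216 (#3F800000)

-- The same bit pattern, produced without a memunion.
? bytes_to_int( atom_to_float32( 1.0 ) )  -- prints 1065353216

free( u )
```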

memtype

A memtype is an alias of another type that can be used in a memstruct or a memunion. They are used similarly to how typedefs are used in C, and can make porting C structs easier. Especially on Windows, many common struct declarations use typedef names for their members. The Euphoria programmer can therefore create a memtype and use the same terminology as the native structs.

memtype object as HANDLE

memtype int as BOOL
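
Once declared, a memtype name can be used wherever the underlying type could be. The memstruct below is a made-up example and not a real Windows structure: WINDOW_STATE and its members are hypothetical names, and the sketch assumes a build with memstruct support.

```euphoria
include std/machine.e

memtype object as HANDLE
memtype int as BOOL

-- Hypothetical structure that reads like its C counterpart would.
memstruct WINDOW_STATE
	HANDLE window
	BOOL   visible
end memstruct

atom state = allocate( sizeof( WINDOW_STATE ) )
state.WINDOW_STATE = {}            -- zero all members
state.WINDOW_STATE.visible = 1

? state.WINDOW_STATE.visible       -- prints 1

free( state )
```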

4.9.2.5 char

A char is a data type that is 1 byte long. Elements of type char are considered to be signed by default. The range of a signed char is -128 to 127. An unsigned char has a range of 0 - 255. They can only be declared inside of a memstruct or memunion.

memstruct char_types
	char          c   -- signed by default, -128 - 127
	unsigned char uc  -- 0 - 255
	signed char   sc  -- -128 - 127
end memstruct

4.9.2.6 short

A short is a data type that is 2 bytes long. Elements of type short are considered to be signed by default. The range of a signed short is −32,768 to 32,767. An unsigned short has a range of 0 - 65,535. They can only be declared inside of a memstruct or memunion.

memstruct short_types
	short           s  -- signed by default, −32,768 to 32,767
	unsigned short us  -- 0 - 65,535
	signed short   ss  -- −32,768 to 32,767
end memstruct

4.9.2.7 int

An int is a data type that is 4 bytes long. Elements of type int are considered to be signed by default. The range of a signed int is −2,147,483,648 to 2,147,483,647. An unsigned int has a range of 0 - 4,294,967,295. They can only be declared inside of a memstruct or memunion.

memstruct int_types
	int           i  -- signed by default, −2,147,483,648 to 2,147,483,647
	unsigned int ui  -- 0 - 4,294,967,295
	signed int   si  --  −2,147,483,648 to 2,147,483,647
end memstruct

4.9.2.8 long

A long (can also be long int) varies in size based on the platform. On Windows and 32-bit Unix-like operating systems, a long is 4 bytes, or the same size as an int. On 64-bit Unix-like operating systems, a long is 8 bytes (64 bits), or the same size as a long long.

4.9.2.9 long long

A long long (can also be long long int) is an integer that is 8 bytes (64 bits) in size. By default, it is signed (−9,223,372,036,854,775,808 to 9,223,372,036,854,775,807). An unsigned long long ranges from 0 to 18,446,744,073,709,551,615.

4.9.2.10 object (memstruct)

An object, in a memstruct or memunion, is an integer the same size as a Euphoria object, and is also the same size as a pointer. By default, it is signed, but can be declared unsigned.

4.9.2.11 float

A float is a 32-bit floating point number, just like those used by atom_to_float32 and float32_to_atom.

4.9.2.12 double

A double is a 64-bit floating point number, just like those used by atom_to_float64 and float64_to_atom. This is the size of floating point numbers used by 32-bit Euphoria.
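
The relationship to the conversion routines can be seen without any memstruct at all. This short sketch uses atom_to_float32 and atom_to_float64 from std/convert.e to show the 4-byte and 8-byte encodings that a float and a double member occupy in memory. The value 0.5 is chosen because it is exactly representable in both formats.

```euphoria
include std/convert.e

sequence f32 = atom_to_float32( 0.5 )
sequence f64 = atom_to_float64( 0.5 )

? length( f32 )           -- prints 4
? length( f64 )           -- prints 8
? float32_to_atom( f32 )  -- prints 0.5
? float64_to_atom( f64 )  -- prints 0.5
```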

4.9.2.13 long double

A long double is an 80-bit floating point number, just like those used by atom_to_float80 and float80_to_atom. This is the size of floating point numbers used by 64-bit Euphoria. Although they only use 80 bits, in memstructs they require 16 bytes of storage for alignment purposes.

4.9.2.14 eudouble

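A eudouble is a floating point data type that always matches the floating point size used by the Euphoria build in use, so the same declaration works on every platform. On 32-bit Euphoria, a eudouble is the same size (64 bits) as a double. On 64-bit Euphoria, it is 80 bits, the same size as a long double.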

4.9.2.15 pointer

Data members may have the pointer modifier prepended to their declaration. This signifies that the memstruct contains a pointer to that type of element, rather than the element itself.

4.9.2.16 signed

Integer data types may be signed or unsigned. The default is to be signed, but this can be made explicit by using the signed modifier.

4.9.2.17 unsigned

Integer data types may be signed or unsigned. The default is to be signed, but to use an unsigned integer type, use the unsigned modifier.

4.9.3 Using memstructs

Using a memstruct requires a pointer to the memory where the structure is stored. This can be created by allocate(), or as the result of a call to an external library. No type information is ever stored with the pointer. Instead, the memory may be manipulated using a dot notation, where the name of the memstruct follows the pointer, and the names of data elements follow the name of the memstruct.

Example:

include std/machine.e

memstruct point
    int x
    int y
end memstruct

memstruct rect
    point upper_left
    point lower_right
end memstruct

atom my_rect = allocate( sizeof( rect ) )

my_rect.rect.upper_left.x = 50
my_rect.rect.upper_left.y = 100

my_rect.rect.lower_right.x = 125
my_rect.rect.lower_right.y = 150

? my_rect.rect.lower_right.x  -- outputs 125

It is possible to abuse the memstruct functionality because there is no type information associated with the address pointer. For example, memory allocated for a rect can also be accessed as a point:

include std/machine.e

memstruct point
    int x
    int y
end memstruct

memstruct rect
    point upper_left
    point lower_right
end memstruct

atom my_rect = allocate( sizeof( rect ) )

my_rect.rect.upper_left.x = 20
my_rect.rect.upper_left.y = 200
? my_rect.rect.upper_left.x  -- outputs 20

my_rect.point.x = 50
my_rect.point.y = 100

? my_rect.rect.upper_left.x  -- outputs 50

4.9.3.1 addressof

addressof is a function that returns the address of a memstruct member. You must supply the variable that contains the starting RAM address, the memstruct name, and the member name using the dot notation described above.

Example:

include std/machine.e

memstruct point
    int x
    int y
end memstruct

memstruct rect
    point upper_left
    point lower_right
end memstruct

atom pa = allocate( sizeof( point ) )

? addressof(pa.point.x) = pa      -- outputs 1 (true)
? addressof(pa.point.y) = pa + 4  -- outputs 1 (true)

? addressof(pa.rect.lower_right.y) = pa + 12  -- outputs 1 (true)
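
Because addressof returns an ordinary address, its result can be passed to the built-in peek and poke routines. A small sketch, assuming a build with memstruct support and a 4-byte int:

```euphoria
include std/machine.e

memstruct point
    int x
    int y
end memstruct

atom pa = allocate( sizeof( point ) )
pa.point = { 7, 9 }

-- Read y through its raw address.
? peek4s( addressof( pa.point.y ) )   -- prints 9

-- Write x through its raw address, then read it back as a member.
poke4( addressof( pa.point.x ), 42 )
? pa.point.x                          -- prints 42

free( pa )
```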

4.9.3.2 offsetof

offsetof returns the offset in bytes of a member inside of its memstruct. The address pointer variable is optional when calling it; only the memstruct name and member are required.

Example:

include std/machine.e

memstruct point
    int x
    int y
end memstruct

memstruct rect
    point upper_left
    point lower_right
end memstruct

atom pa = allocate( sizeof( point ) )

? offsetof(point.x)    -- outputs 0
? offsetof(pa.point.x) -- outputs 0
? offsetof(point.y)    -- outputs 4
? offsetof(pa.point.y) -- outputs 4

? offsetof(rect.lower_right.y) -- outputs 12