動態類型語言數組的類型是什麼?


3

例如,在JavaScript中,我可以執行以下操作:

var arr = [1, "two", /three/, [4]];

在C語言中無法做到這一點!除非使用void*,否則這不是有效/安全的方法。

這是實現方式嗎?到處都在使用void*

6

Basically, yes. Each element of the array has to take the same amount of space, so some representation that can represent any value must be used for all elements. A simple option is making everything heap allocated and always (variables, array elements, object attributes, etc.) use pointers to boxed objects. Another option is to store values of a few types directly and others as pointers. There are a variety of clever schemes for this, but of course they're all limited in that they only support a few types without indirection.

Finally, some implementations automatically recognize when the values stored in an array are homogenous, despite the lanugage not guaranteeing that, and use an optimized representation for those cases. PyPy calls this list strategies.


7

Everything is like a void*, and yes it's more inefficient than values that are stored directly in the array.

But for example in V8 an array can have three type modes: int, double, anything and only in anything-mode everything is stored in the generic format. In int or double modes, the array directly stores those values.

In V8 your array is in generic mode and is represented like:

(SMi means a pointer that actually embeds a direct integer rather than pointing at valid memory)

[MapPointer, Properties[]Pointer, Elements[]Pointer, LengthSMi]

Then at the Elements[] (the actual storage array) array you have:

[MapPointer, LengthSMi, SMiForOne, PointerAtString, PointerAtRegexpObject, PointerAtArrayObject, PointerAtHole, PointerAtHole...]

The holes special objects that mark a reserved slot.

And you don't have to use any struct at all for them (V8 doesn't) - you can and should manage everything yourself with raw memory.


2

void* is not enough if you want dynamic typing. You need a type information for each value as well. Something like:

struct {
  void * data;
  int type_id;
};

5

There is no way to do such a thing in C! Except by using a void*, which is not an efficient/safe way.

Why do you believe that using a (void*) is not an efficient/safe way?

If you want to hide implementation details in C, it is a common technique to expose it as typedef to (void*) in a header and provide functions to manipulate it.

Here is an exaple header for a dynamic type holding integers and strings:

/* Dynamic types in C */

#define DYNAMIC_INT 0
#define DYNAMIC_STRING  1

struct dynamic_phantom;

typedef struct dynamic_phantom* dynamic;

dynamic dynamic_of_int(int c);
dynamic dynamic_of_string(char *s);
int dynamic_classify(dynamic v);
int dynamic_get_int(dynamic v, int *c);
int dynanic_get_string(dynamic v, char **s);
void dynamic_free(dynamic v);

I hope that the names of the functions and their protoypes are enough to get what they are for. There is a lot of options to implement this. Let me outline two popular choices:

UNIONS

In your implementation file, you implement the dynamic as a unionlike this:

union dynamic_value {
  int value_int;
  char* value_string;
};

struct dynamic_cell {
  int cell_type;
  union dynamic_value cell_value;
};

typedef struct dynamic_cell *dynamic;

and implement the functions I enumerated above is straightforward. (Of course, you can refine this in several ways, defining error control procedures, add control bits if you wish to avoid duplicating string contents, and so on.)

SPECIALISED MEMORY POOLS

For each available dynamic type, you allocate a memory pool, i.e. a large array of such values. A dynamic value is then implemented as a pointer to some value in these memory pools casted to void*. The type of the value is then recovered from the pointer range.

An advantage of this strategy over the previous one is that it natively deals with values of different sizes while the union approach takes the biggest size of the possible value types. It is however slightly more complicated to implement, especially because of memory management—it needs to reimplement efficiently the logic of malloc if you want to deal with a significant number of values.


0

It is overly naive to think that typing an array to void** will make your code less efficient, then say, int*. This is because the compiler today are often smart enough to manipulate the data behind the scenes in ways a programmer may not anticipate. Of course, some compilers are smarter then others, but it's getting incrementally better, and we are nowhere near the climax of art of compiler writing. We are probably somewhere in the very beginning of the road.

There also may be many other factors in play, such as how exactly the runtime allocates its objects (does it fragment them or not, how does it align them etc.). In what part of the memory do these objects land (memory may have all sorts of structures, hierarchies, different devices may have their own memory etc.) and most important of all: how your code handles them.

Now, from the perspective of type theory, "dynamically typed" is a misconception. There are untyped languages (even better named "languages where all data are of the same type") and there are languages where types are checked at various stages of the program creation cycle. The types can be often checked during compilation (if this is at all a step in the program livecycle), upon loading the program, "dynamically" (that is applying some heuristic after loading the program, based no the program's usage).

However, "dynamic language" is a terminology often found in special literature, different researchers assign different meanings to it, relating to different aspects of the program. By some standards, Java may be called a dynamic language (because it needs to take certain amount of decisions required for method dispatch at run time, for example). But some would not call Common Lisp a dynamic language because it verifies types during the compilation.

So, if anything, I'd avoid using "dynamic language" in an academic setting until further notice :)