Now we've looked at the most common types of scalar, (there are a few complications, which we'll cover in Section 4.3) let's examine array and hash structures. These, too, are build on top of the basic SV structure, with reference counts and flags, and structures hung off sv_any.
Arrays are known in the core as AVs. Their structure can be found in av.h:
struct xpvav { char* xav_array; /* pointer to first array element */ SSize_t xav_fill; /* Index of last element present */ SSize_t xav_max; /* max index for which array has space */ IV xof_off; /* ptr is incremented by offset */ NV xnv_nv; /* numeric value, if any */ MAGIC* xmg_magic; /* magic for scalar array */ HV* xmg_stash; /* class package */ SV** xav_alloc; /* pointer to malloced string */ SV* xav_arylen; U8 xav_flags; };
We're going to skip over xmg_magic and xmg_stash for now, and come back to them in Section 4.3.
Let's use Devel::Peek as before to examine the AV, but we must remember that we can only give one argument to Devel::Peek::Dump; hence, we must pass it a reference to the AV:
% perl -MDevel::Peek -e '@a=(1,2,3); Dump(\@a)' SV = RV(0x8106ce8) at 0x80fb380 REFCNT = 1 FLAGS = (TEMP,ROK) RV = 0x8105824 SV = PVAV(0x8106cb4) at 0x8105824 REFCNT = 2 FLAGS = () IV = 0 NV = 0 ARRAY = 0x80f7de8 FILL = 2 MAX = 3 ARYLEN = 0x0 FLAGS = (REAL) Elt No. 0 SV = IV(0x80fc1f4) at 0x80f1460 REFCNT = 1 FLAGS = (IOK,pIOK,IsUV) UV = 1 Elt No. 1 SV = IV(0x80fc1f8) at 0x80f1574 REFCNT = 1 FLAGS = (IOK,pIOK,IsUV) UV = 2 Elt No. 2 SV = IV(0x80fc1fc) at 0x80f1370 REFCNT = 1 FLAGS = (IOK,pIOK,IsUV) UV = 3
Something similar to the offset hack is performed on AVs to enable efficient shifting and splicing off the beginning of the array; while AvARRAY (xav_array in the structure) points to the first element in the array that is visible from Perl, AvALLOC (xav_alloc) points to the real start of the C array. These are usually the same, but a shift operation can be carried out by increasing AvARRAY by one and decreasing AvFILL and AvLEN. Again, the location of the real start of the C array only comes into play when freeing the array. See av_shift in av.c.
Hashes are represented in the core as, you guessed it, HVs. Before we look at how this is implemented, we'll first see what a hash actually is...
A hash is actually quite a clever data structure: it's a combination of an array and a linked list. Here's how it works:
The hash key undergoes a transformation to turn it into a number called, confusingly, the hash value. For Perl, the C code that does the transformation looks like this: (from hv.h)
register const char *s_PeRlHaSh = str; register I32 i_PeRlHaSh = len; register U32 hash_PeRlHaSh = 0; while (i_PeRlHaSh--) { hash_PeRlHaSh += *s_PeRlHaSh++; hash_PeRlHaSh += (hash_PeRlHaSh << 10); hash_PeRlHaSh ^= (hash_PeRlHaSh >> 6); } hash_PeRlHaSh += (hash_PeRlHaSh << 3); hash_PeRlHaSh ^= (hash_PeRlHaSh >> 11); (hash) = (hash_PeRlHaSh += (hash_PeRlHaSh << 15));Converting that to Perl and tidying it up:
sub hash { my $string = shift; my $hash; for (map {ord $_} split //, $string) { $hash += $_; $hash += $hash << 10; $hash ^= $hash >> 6; } $hash += $hash << 3; $hash ^= $hash >> 1; return ($hash + $hash << 15); }
This hash is distributed across an array using the modulo operator. For instance, if our array has 8 elements, ("Hash buckets") we'll use $hash_array[$hash % 8]
Each bucket contains a linked list; adding a new entry to the hash appends an element to the linked list. So, for instance, $hash{"red"}="rouge" is implemented similar to
push @{$hash->[hash("red") % 8]}, { key => "red", value => "rouge", hash => hash("red") };
Why do we store the key as well as the hash value in the linked list? The hashing function may not be perfect - that is to say, it may generate the same value for "red" as it would for, say, "blue". This is called a hash collision, and, while it is rare in practice, it explains why we can't depend on the hash value alone.
Hashes come in two parts: the HV is the actual array containing the linked lists, and is very similar to an AV; the things that make up the linked lists are hash entry structures, or HEs. From hv.h:
/* entry in hash value chain */ struct he { HE *hent_next; /* next entry in chain */ HEK *hent_hek; /* hash key */ SV *hent_val; /* scalar value that was hashed */ }; /* hash key -- defined separately for use as shared pointer */ struct hek { U32 hek_hash; /* hash of key */ I32 hek_len; /* length of hash key */ char hek_key[1]; /* variable-length hash key */ };
As you can see from the above, we simplified slightly when we put the hash key in the buckets above: the key and the hash value are stored in a separate structure, a HEK.
The HEK stored inside a hash entry represents the key: it contains the hash value and the key itself. It's stored separately so that Perl can share identical keys between different hashes - this saves memory and also saves time calcu.llating the hash value. You can use the macros HeHASH(he) and HeKEY(he) to retrieve the hash value and the key from a HE.
Now to turn to the HVs themselves, the arrays which hold the linked lists of HEs. As we mentioned, these are not too dissimilar from AVs.
% perl -MDevel::Peek -e '%a = (red => "rouge", blue => "bleu"); Dump(\%a);' SV = RV(0x8106c80) at 0x80f1370 REFCNT = 1 FLAGS = (TEMP,ROK) RV = 0x81057a0 SV = PVHV(0x8108328) at 0x81057a0 REFCNT = 2 FLAGS = (SHAREKEYS) IV = 2 NV = 0 ARRAY = 0x80f7748 (0:6, 1:2) hash quality = 150.0% KEYS = 2 FILL = 2 MAX = 7 RITER = -1 EITER = 0x0 Elt "blue" HASH = 0x8a5573ea SV = PV(0x80f17b0) at 0x80f1574 REFCNT = 1 FLAGS = (POK,pPOK) PV = 0x80f5288 "bleu"\0 CUR = 4 LEN = 5 Elt "red" HASH = 0x201ed SV = PV(0x80f172c) at 0x80f1460 REFCNT = 1 FLAGS = (POK,pPOK) PV = 0x80ff370 "rouge"\0 CUR = 5 LEN = 6