Arrays (Learning the Korn Shell, 2nd Edition)

6.4.1. Indexed Arrays

The Korn shell provides an indexed array facility that, while useful, is much more limited than analogous features in conventional programming languages. In particular, indexed arrays can be only one-dimensional (i.e., no arrays of arrays), and they are limited to 4096 elements.[86] Indices start at 0, so the maximum index value is 4095. Furthermore, an index may be any arithmetic expression: ksh automatically evaluates the expression to yield the actual index.

[86] 4096 is a minimum value in ksh93. Recent releases allow up to 64K elements.

There are three ways to assign values to elements of an array. The first is the most intuitive: you can use the standard shell variable assignment syntax with the array index in brackets ([]). For example:

nicknames[2]=bob
nicknames[3]=ed

These assignments put the values bob and ed into the elements of the array nicknames with indices 2 and 3, respectively. As with regular shell variables, values assigned to array elements are treated as character strings unless the assignment is preceded by let, or the array was declared to be numeric with one of the typeset options -i, -ui, -E, or -F. (Strictly speaking, the value assigned with let is still a string; it's just that with let, the shell evaluates the arithmetic expression being assigned to produce that string.)
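
As a brief illustration of these points, the following sketch assigns through an arithmetic-expression index and contrasts a plain string assignment with let and with an array declared typeset -i. The names base, vals, and counts are hypothetical, and printf is used in place of print only so that the fragment also runs in shells that lack print.

```shell
# Hypothetical names (base, vals, counts) chosen for illustration.
nicknames[2]=bob
base=1
nicknames[base+2]=ed        # index is an arithmetic expression; evaluates to 3

vals[0]=2+3                 # plain assignment: stored as the string "2+3"
let "vals[1] = 2 + 3"       # let evaluates the expression first: stores 5

typeset -i counts           # declare the array numeric
counts[0]=2+3               # evaluated because of -i: stores 5

printf '%s\n' "${nicknames[3]}" "${vals[0]}" "${vals[1]}" "${counts[0]}"
```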

The second way to assign values to an array is with a variant of the set statement, which we saw in Chapter 3. The statement:

set -A aname val1 val2 val3 ...

creates the array aname (if it doesn't already exist) and assigns val1 to aname[0], val2 to aname[1], etc. As you would guess, this is more convenient for loading up an array with an initial set of values.

The third (recommended) way is to use the compound assignment form:

aname=(val1 val2 val3)

Starting with ksh93j, you may use the += operator to add values to an array:

aname+=(val4 val5 val6)
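
To see the two forms working together, here is a small sketch. The array name fruits and its values are made up for the example, and printf stands in for print so that the fragment is not tied to ksh.

```shell
# Hypothetical array name and values, for illustration only.
fruits=(apple banana cherry)      # compound assignment: indices 0, 1, 2
fruits+=(date elderberry)         # appended at indices 3 and 4 (ksh93j or later)

printf '%s\n' "${fruits[0]}"      # first element
printf '%s\n' "${fruits[3]}"      # first appended element
printf '%s\n' "${#fruits[*]}"     # number of elements
```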

To extract a value from an array, use the syntax ${aname[i]}. For example, ${nicknames[2]} has the value "bob". The index i can be an arithmetic expression -- see above. If you use * or @ in place of the index, the value will be all elements, separated by spaces. Omitting the index ($nicknames) is the same as specifying index 0 (${nicknames[0]}).

Now we come to the somewhat unusual aspect of Korn shell arrays. Assume that the only values assigned to nicknames are the two we saw above. If you type print "${nicknames[*]}", you will see the output:

bob ed

In other words, nicknames[0] and nicknames[1] don't exist. Furthermore, if you were to type:

nicknames[9]=pete
nicknames[31]=ralph

and then type print "${nicknames[*]}", the output would look like this:

bob ed pete ralph

This is why we said "the elements of nicknames with indices 2 and 3" earlier, instead of "the 2nd and 3rd elements of nicknames". Any array elements with unassigned values just don't exist; if you try to access their values, you get null strings.
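
The sparse behavior just described can be checked with a short sketch. It borrows the ${!aname[*]} operator from Table 6-5 to list the subscripts that actually exist, on the assumption that your ksh93 accepts it for indexed arrays as well as associative ones; printf is used instead of print only for portability.

```shell
nicknames[2]=bob
nicknames[3]=ed
nicknames[9]=pete
nicknames[31]=ralph

printf 'values:  %s\n' "${nicknames[*]}"        # bob ed pete ralph
printf 'indices: %s\n' "${!nicknames[*]}"       # 2 3 9 31
printf 'element 5 is [%s]\n' "${nicknames[5]}"  # unassigned, so a null string
```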

You can preserve whatever whitespace you put in your array elements by using "${aname[@]}" (with the double quotes) instead of ${aname[*]}, just as you can with "$@" instead of $* or "$*".
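
One way to see the difference is to count how many words each form produces. In this sketch the array names and the helper function count_args are hypothetical, and the default value of IFS is assumed.

```shell
# Hypothetical array whose second element contains an embedded space.
names=("bob" "ed smith" "pete")

# Helper (hypothetical): prints how many arguments it received.
count_args() { printf '%s\n' "$#"; }

count_args ${names[*]}      # unquoted: split on whitespace into 4 words
count_args "${names[*]}"    # quoted *: all elements joined into 1 word
count_args "${names[@]}"    # quoted @: 3 words, one per element
```

Only the last form hands "ed smith" to the function as a single argument.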

The shell provides an operator that tells you how many elements an array has defined: ${#aname[*]}. Thus ${#nicknames[*]} has the value 4. Note that you need the [*] because the name of the array alone is interpreted as the 0th element. This means, for example, that ${#nicknames} equals the length of nicknames[0] (see Chapter 4). Since nicknames[0] doesn't exist, the value of ${#nicknames} is 0, the length of the null string.
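
A quick sketch of the three length forms, using the same four assignments to nicknames as above (printf replaces print only for portability):

```shell
nicknames[2]=bob
nicknames[3]=ed
nicknames[9]=pete
nicknames[31]=ralph

printf '%s\n' "${#nicknames[*]}"   # 4: number of elements that exist
printf '%s\n' "${#nicknames}"      # 0: length of nicknames[0], which is unset
printf '%s\n' "${#nicknames[2]}"   # 3: length of the string "bob"
```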

If you think of an array as a mapping from integers to values (i.e., put in a number, get out a value), you can see why arrays are "number-dominated" data structures. Because shell programming tasks are much more often oriented towards character strings and text than towards numbers, the shell's indexed array facility isn't as broadly useful as it might first appear.

Nevertheless, we can find useful things to do with indexed arrays. Here is a cleaner solution to Task 5-4, in which a user can select his or her terminal type (TERM environment variable) at login time. Recall that the "user-friendly" version of this code used select and a case statement:

print 'Select your terminal type:'
PS3='terminal? '
select term in \
    'Givalt GL35a' \
    'Tsoris T-2000' \
    'Shande 531' \
    'Vey VT99'
do
    case $REPLY in
        1 ) TERM=gl35a ;;
        2 ) TERM=t2000 ;;
        3 ) TERM=s531 ;;
        4 ) TERM=vt99 ;;
        * ) print "invalid." ;;
    esac
    if [[ -n $term ]]; then
        print "TERM is $TERM"
        export TERM
        break
    fi
done

We can eliminate the entire case construct by taking advantage of the fact that the select construct stores the user's numeric choice in the variable REPLY. We just need a line of code that stores all of the possibilities for TERM in an array, in an order that corresponds to the items in the select menu. Then we can use $REPLY to index the array. The resulting code is:

set -A termnames gl35a t2000 s531 vt99
print 'Select your terminal type:'
PS3='terminal? '
select term in \
    'Givalt GL35a' \
    'Tsoris T-2000' \
    'Shande 531' \
    'Vey VT99'
do
    if [[ -n $term ]]; then
        TERM=${termnames[REPLY-1]}
        print "TERM is $TERM"
        export TERM
        break
    fi
done

This code sets up the array termnames so that ${termnames[0]} is "gl35a", ${termnames[1]} is "t2000", etc. The line TERM=${termnames[REPLY-1]} essentially replaces the entire case construct by using REPLY to index the array.

Notice that the shell knows to interpret the text in an array index as an arithmetic expression, as if it were enclosed in (( and )), which in turn means that the variable need not be preceded by a dollar sign ($). We have to subtract 1 from the value of REPLY because array indices start at 0, while select menu item numbers start at 1.
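
The index arithmetic can be tried without an interactive menu. In the sketch below REPLY is set by hand to stand in for what select would store, and the result goes into a hypothetical variable chosen rather than TERM, so that running it doesn't disturb your terminal settings.

```shell
# Non-interactive sketch: REPLY is set by hand here, whereas select
# would set it from the user's input.
termnames=(gl35a t2000 s531 vt99)
REPLY=3                          # pretend the user picked menu item 3

chosen=${termnames[REPLY-1]}     # no $ needed; the index is evaluated arithmetically
printf '%s\n' "$chosen"          # s531, the third entry (index 2)
```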

Think about how you might use arrays to maintain the directory stack for pushd and popd. The arithmetic for loop might come in handy too.

6.4.2. Associative Arrays

As mentioned in the previous section, shell programming tasks are usually string- or text-oriented, instead of number-oriented. ksh93 introduced associative arrays into the shell to improve the shell's programmability. Associative arrays are a mainstay of programming in languages such as awk, Perl, and Python.

An associative array is an array indexed by string values. It provides an association between the string index and the value of the array at that index, making programming certain kinds of tasks work much more naturally. You tell the shell that an array is associative by using typeset -A:

typeset -A person
person[firstname]="frank"
person[lastname]="jones"
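
Reading the values back works just as it does for indexed arrays, except that the subscripts are strings. A minimal sketch, with printf standing in for print:

```shell
typeset -A person
person[firstname]="frank"
person[lastname]="jones"

printf '%s %s\n' "${person[firstname]}" "${person[lastname]}"   # frank jones
printf '%s\n' "${#person[*]}"                                   # 2 elements
```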

We can rewrite our terminal example from the previous section using associative arrays:

typeset -A termnames                  # termnames is associative
termnames=([Givalt GL35a]=gl35a       # Fill in the values
           [Tsoris T-2000]=t2000
           [Shande 531]=s531
           [Vey VT99]=vt99)
print 'Select your terminal type:'
PS3='terminal? '
select term in "${!termnames[@]}"     # Present menu of array indices
do
    if [[ -n $term ]]; then
        TERM=${termnames[$term]}      # Use string to index array
        print "TERM is $TERM"
        break
    fi
done

Note that the square brackets in the compound assignment act like double quotes; while it's OK to quote the string indices, it's not necessary. Also note the "${!termnames[@]}" construct. It's a bit of a mouthful, but it gives us all the array indices as separate quoted strings that preserve any embedded whitespace, just like "$@". (See the next section.)

Starting with ksh93j, as for regular arrays, you may use the += operator to add values to an associative array:

termnames+=([Boosha B-27]=boo27 [Cherpah C-42]=chc42)
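
Here is a small, self-contained sketch of appending to an associative array and then looking up one of the new entries. The subscripts are quoted, which the text above notes is permitted; that also keeps the fragment acceptable to shells that are stricter than ksh about embedded spaces. The variable key is hypothetical.

```shell
typeset -A termnames
termnames=(["Givalt GL35a"]=gl35a ["Vey VT99"]=vt99)
termnames+=(["Boosha B-27"]=boo27 ["Cherpah C-42"]=chc42)   # ksh93j or later

key='Boosha B-27'                      # hypothetical lookup key
printf '%s\n' "${#termnames[@]}"       # 4 entries after the append
printf '%s\n' "${termnames[$key]}"     # boo27
```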

As a side note, if you apply typeset -A to a previously existing nonarray variable, that variable's current value will be placed in index 0. The reason is that the shell treats $x as equivalent to ${x[0]}, so that if you do:

x=fred
typeset -A x
print $x

you will still get fred.

6.4.3. Array Name Operators

In Chapter 4 we saw that the shell provides a large number of ways to access and manipulate the values of shell variables. In particular, we saw operators that work with shell variable names. Several additional operators apply to arrays. They are described in Table 6-5.

Table 6-5. Array name-related operators

Operator	Meaning
${!array[subscript]}	Return the actual subscript used to index the array. Subscripts can come from arithmetic expressions or from the values of shell variables.
${!array[*]}	List of all subscripts in the associative array.
${!array[@]}	List of all subscripts in the associative array, but expands to separate words when used inside double quotes.

You can think of the ${!...} construct, when used to produce the actual array subscript, as being conceptually similar to its use with nameref variables. There, it indicates the actual variable that a nameref refers to. With arrays, it yields the actual subscript used to access a particular element. This is valuable because subscripts can be generated dynamically, e.g., as arithmetic expressions, or via the various string operations available in the shell. Here is a simple example:

$ set -A letters a b c d e f g h i j k l m n o p q r s t u v w x y z
$ print ${letters[20+2+1]}
x
$ print ${!letters[20+2+1]}
23

To loop over all elements of an indexed array whose indices are contiguous, you could easily use an arithmetic for loop that runs from 0 up to, but not including, ${#letters[*]} (the number of elements in letters). Associative arrays are different: there are no lower or upper bounds on the indices of the array, since they're all strings. The latter two operators in Table 6-5 make it easy to loop through an associative array:

typeset -A bob                             # Create associative array
...                                        # Fill it in
for index in "${!bob[@]}"; do              # For all bob's subscripts
    print "bob[$index] is ${bob[$index]}"  # Print each element
    ...
done

Analogous to the difference between $* and "$@", it is best to use the @ version of the operator, inside double quotes, to preserve the original string values. (We used "${!var[@]}" with select in the last example in the earlier section on associative arrays.)
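
The two looping styles can be put side by side in a short sketch. The arrays letters3 and capital, and the counters, are hypothetical names filled with sample data; printf is used instead of print for portability. The order in which an associative array's subscripts are visited is not something to rely on, so the sketch only counts them.

```shell
# Indexed array with contiguous indices: an arithmetic for loop works.
letters3=(a b c)
joined=""
for (( i = 0; i < ${#letters3[*]}; i++ )); do
    joined="$joined${letters3[i]}"
done
printf '%s\n' "$joined"

# Associative array: loop over the subscripts themselves.
typeset -A capital
capital=(["New York"]=Albany ["Ohio"]=Columbus)

n=0
for state in "${!capital[@]}"; do          # quoted @ keeps "New York" as one word
    printf 'capital[%s] is %s\n' "$state" "${capital[$state]}"
    n=$(( n + 1 ))
done
printf 'visited %s subscripts\n' "$n"
```

Had we written ${!capital[*]} without quotes, "New York" would have been split into two words and the loop would have run three times, with two lookups that find nothing.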

