>>  <<  Usr  Pri  JfC  LJ  Phr  Dic  Voc  !:  Help  Learning J

Chapter 30: Sparse Arrays

30.1 Introduction

The sparse array facility of J allows a large array to be stored in the computer in a moderate amount of memory if many of the array's elements are all the same. In this case a value which occurs many times need be stored only once.

For an example, sparse representation might be considered for a connection matrix describing a network. In this chapter we will look at the J machinery for handling sparse arrays.

Suppose that D is a matrix with most of its elements the same:

   ] D =: 2 3 4 (2 2; 3 6; 4 4) } 16 16 $ 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 2 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 3 0 0 0 0 0 0 0 0 0
0 0 0 0 4 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
   
This array can be stored in a compact form, called a "sparse array", where only its non-zero elements occupy storage. An ordinary array which is not sparse may be called a "dense" array.

There is a built-in function, $. (dollar dot) to compute a sparse array from a dense.

   S =: $. D
For many purposes dense matrix D and sparse matrix S are equivalent: S matches D, and therefore it has the same dimensions, and gives the same result on indexing:

S -: D ($S) -: ($D) ((< 0 0){ S) -: (<0 0) { D
1 1 1

30.2 Sparse Array is Compact

Compared to matrix D, matrix S is economical in storage because the value which occurs many times in D is stored only once in S. This value is known as the "sparse element" of S, or the "zero" element of S. It happens to be 0 in the case of S, but need not be 0 always.

We can measure the size of the storage occupied by an array with the built-in 7!:5. We see that the size of S (which the sparse form of D) is smaller than the size D itself:

7!:5 <'S' 7!:5 <'D'
384 2048

30.3 Inspecting A Sparse Array

There is a useful function datatype in the standard library. It shows the type of its argument.

datatype D datatype S
integer sparse integer

Recall that the built verb 3!:0 also gives the type of its argument. For a sparse array, the possible types reported by 3!:0 are

1024 sparse boolean
2048 sparse character
4096 sparse integer
8192 sparse floating point
16384 sparse complex
32768 sparse boxed
If we display S in the usual way , we see, not the familiar representation of a matrix, but instead a list of index-value pairs, one pair for each (in this example) non-zero element.

   S 
2 2 | 2
3 6 | 3
4 4 | 4
   
This display does not show that the sparse element of S is in fact integer zero. To show this, we can extract the sparse element with the verb 3 & $. .

se =: 3 $. S datatype se
0 integer

If we now compute a new matrix from S

   T =: S + 5
   
we see that T is sparse, and the sparse element of T is not zero but 5

T 3 $. T
2 2 | 7
3 6 | 8
4 4 | 9
5

Another way to view a sparse array is simply to convert it to dense with 0 & $.

   view =: 0 & $.   
   

T view T
2 2 | 7
3 6 | 8
4 4 | 9
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 7 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 8 5 5 5 5 5 5 5 5 5
5 5 5 5 9 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 5

30.4 Computing with Sparse Arrays

Computations with sparse arrays are pretty much the same as with dense arrays, except that they tend to produce sparse arrays as results. We saw this with S+5 above. Here is another example. Summing over T produces a vector of column-sums which is sparse
   ] V =: +/ T
2 | 82
4 | 84
6 | 83
   
but the "zero" element of V is the sum of a column of "zero" elements of T
   3 $. V 
80
    
At the time of writing, there are still some limitations on what can be done with sparse arrays compared with dense arrays. See the Dictionary under $. for more information.

30.5 Constructing A Sparse Array

At this point it will be helpful to define a few terms. First note that, according to context, the numerals 0 or 1 or 0.0 or 1.0 could be valid as boolean or integer or real. However in the absence of any context the J system takes them all to be in fact boolean.

datatype 0 datatype 1 datatype 0.0 datatype 1.0
boolean boolean boolean boolean

It will be useful to define some values of unambiguous type.

INTEGERZERO =: 3 - 3 datatype INTEGERZERO
0 integer

INTEGERONE =: 3 - 2 datatype INTEGERONE
1 integer

REALZERO =: 0.0*0.1 datatype REALZERO
0 floating

REALONE =: ^ 0 datatype REALONE
1 floating

Returning now to sparse arrays, the recommended method of constructing them is to begin by making an empty array of the required shape and type, but with no actual data.

An empty array is built by evaluating the expression

        1 $. shape;axes;zero
where
  • shape specifies the dimensions
  • axes specifies which of those dimensions will be sparse, as a list of axis-numbers. For example, with 2 dimensions both sparse the list would be 0 1

    So far, in the examples of sparse arrays, all axes have been sparse but we will see below mixed sparse and dense axes.

  • zero specifies the value of the "zero" element, and hence the type of the array as a whole. An unambiguous value is evidently needed.

If zero is omitted the default is REALZERO. If both axes and zero are omitted, the default is all axes sparse and REALZERO.

So to build a 6 by 6 matrix, sparse in all dimensions (that is, on axis 0 and axis 1), of type integer with "zero" element of 0 we can write:

   U =: 1 $. 6 6 ; 0 1; INTEGERZERO 
At this point, U is empty, that is, all "zero", so displays as nothing:
   U
Populate it by inserting a few non-zero elements into it
   U =: 4 5 6 7 ( 0 0 ; 1 1; 2 2; 3 3) } U
   
and check that U is what we expect by viewing it:
   view U
4 0 0 0 0 0
0 5 0 0 0 0
0 0 6 0 0 0
0 0 0 7 0 0
0 0 0 0 0 0
0 0 0 0 0 0
   

30.6 Sparse and Dense Axes

An array may be sparse on some axes and dense on others. In the following example W is sparse on its first axis and dense on its second, because its list of sparse axes is just 0
   saw   =: ,0   NB. sparse axes for W
   
   W =: 1 $. 3 5; saw ; INTEGERZERO  
   
   W =: 4 5 6 (0 1; 0 2; 1 3) } W
   
It looks as expected:
   view W
0 4 5 0 0
0 0 0 6 0
0 0 0 0 0
   
but we see that it is stored as two dense rows only:
   W
0 | 0 4 5 0 0
1 | 0 0 0 6 0
Compare with an array sparse on second axis axis only, because its list of sparse axes is 1
   saz=: ,1   NB. sparse axes for Z
   Z =: 1 $. 3 5; saz; INTEGERZERO   
   Z =: 4 5 6 (0 1; 0 2; 1 3) } Z
Z looks just like W
   view Z
0 4 5 0 0
0 0 0 6 0
0 0 0 0 0
   
but we see it is stored as three dense colums.
   Z
1 | 4 0 0
2 | 5 0 0
3 | 0 6 0
   
   
   

30.7 Deconstructing a Sparse Array

As we noted above, if we display U itself, we see, not the familiar representation of a matrix, but instead a list of index-value pairs, one pair for each non-zero element.
   U
0 0 | 4
1 1 | 5
2 2 | 6
3 3 | 7
We can extract the index from each pair to get what is called the index-matrix of U. This is an ordinary dense array
   4 $. U
0 0
1 1
2 2
3 3
To extract the value from each pair
    5 $. U
4 5 6 7
As we noted above, 0 & $. will produce a dense array from a sparse:
   0 $. U
4 0 0 0 0 0
0 5 0 0 0 0
0 0 6 0 0 0
0 0 0 7 0 0
0 0 0 0 0 0
0 0 0 0 0 0
   
   
   

30.8 Sparse Array From Relation

Next we look at representing data as a sparse array as an alternative to representing data as a relation (that is, a table).

The point is that the sparse array may be more convenient than the relation for some computations with the data. Thus we are interested in converting between sparse arrays and relations.

For example, suppose that a given relation R represents sales of various commodities in various cities

   'Pa Qu Ro Sy' =: s: ' Paris Quebec Rome Sydney'
   'Ap Ba Ch Da' =: s: ' Apples Bananas Cherries Damsons'
   
   
   R =: (". ;. _2) 0 : 0
Ap ; Pa; 99
Ap ; Qu ; 50
Ba ; Qu ; 10
Ch ; Ro ; 19
Da ; Sy ; 110
Da ; Pa ; 88
)
   
   R
+---------+-------+---+
|`Apples  |`Paris |99 |
+---------+-------+---+
|`Apples  |`Quebec|50 |
+---------+-------+---+
|`Bananas |`Quebec|10 |
+---------+-------+---+
|`Cherries|`Rome  |19 |
+---------+-------+---+
|`Damsons |`Sydney|110|
+---------+-------+---+
|`Damsons |`Paris |88 |
+---------+-------+---+
We can convert the relation R to a sparse array as follows.

Firstly, we need to establish the domain -the set of all possible values - of the first column. It can be computed from R :

   ] Fru =:  > ~. 0 { |: R
`Apples `Bananas `Cherries `Damsons
Similarly for the domain of the second column:
   ] Cit =: > ~. 1 { |: R
`Paris `Quebec `Rome `Sydney
Now the first column converted to indices into its domain:
   ] r =: Fru i. > 0 { |: R
0 0 1 2 3 3
   
Similarly for the second column:
   ] c =: Cit i. > 1 { |: R
0 1 1 2 3 0
   
and the values from the third
   ] v =: > 2 { |: R 
99 50 10 19 110 88
   
Now we build an empty sparse array of dimensions #Fru by #Cit . By default the sparse axes will be 0 and 1 and the "zero" element will be REALZERO . The function 1&$. produces the empty array.
   A =: (1 & $.) (#Fru) , (#Cit)
Insert the values by amending in the ordinary way:
   A =: v (<"1 r,.c) } A
   
and check we have what we expect:
   view A
99 50  0   0
 0 10  0   0
 0  0 19   0
88  0  0 110
   
To display A with labelling of rows and columns, the list of row-labels is Fru computed above, and the list of column-labels is Cit :
   (a:, <"0 Cit), (<"0 Fru) ,. (<"0 view A)      
+---------+------+-------+-----+-------+
|         |`Paris|`Quebec|`Rome|`Sydney|
+---------+------+-------+-----+-------+
|`Apples  |99    |50     |0    |0      |
+---------+------+-------+-----+-------+
|`Bananas |0     |10     |0    |0      |
+---------+------+-------+-----+-------+
|`Cherries|0     |0      |19   |0      |
+---------+------+-------+-----+-------+
|`Damsons |88    |0      |0    |110    |
+---------+------+-------+-----+-------+
   
Now we have finished producing the sparse array from the original relation, so we can can compute with our data as an array.

For example, total value of sales for each city is given by:

   +/ A	
0 | 187
1 |  60
2 |  19
3 | 110
   
This is sparse, so taking the usual view :
   view +/ A
187 60 19 110
   	

30.9 Relation from Sparse Array

To complete the circle, we look next at how to produce a relation from a sparse array, A for example.
   A
0 0 |  99
0 1 |  50
1 1 |  10
2 2 |  19
3 0 |  88
3 3 | 110
The first step is to get the index-matrix for the non-zero elements.
   ] INDS =: 4 $. A   
0 0
0 1
1 1
2 2
3 0
3 3
and next the values.
   ] VALS =: 5 $. A  
99 50 10 19 88 110
The first column of the relation we produce by indexing the domain Fru which we computed above. The second column is produced similarly from Cit.
   ] c0 =: (0 { |: INDS) { Fru
`Apples `Apples `Bananas `Cherries `Damsons `Damsons
   ] c1 =: (1 { |: INDS) { Cit
`Paris `Quebec `Quebec `Rome `Paris `Sydney
   
So finally we see that the relation recovered from the sparse array is
   (<"0 c0) ,. (<"0 c1) ,. (<"0 VALS)
+---------+-------+---+
|`Apples  |`Paris |99 |
+---------+-------+---+
|`Apples  |`Quebec|50 |
+---------+-------+---+
|`Bananas |`Quebec|10 |
+---------+-------+---+
|`Cherries|`Rome  |19 |
+---------+-------+---+
|`Damsons |`Paris |88 |
+---------+-------+---+
|`Damsons |`Sydney|110|
+---------+-------+---+
This is the end of Chapter 30

NEXT
Table of Contents
Index


The examples in this chapter were executed using J version 701. This chapter last updated 21 Mar 2013
Copyright © Roger Stokes 2012. This material may be freely reproduced, provided that this copyright notice is also reproduced.


>>  <<  Usr  Pri  JfC  LJ  Phr  Dic  Voc  !:  Help  Learning J