Galet

Matrix multiplication memory schedule generator

 [Presentation] [Papers] [Code] [Contacts]

[Presentation]
 [Papers] [Code] [Contacts]

Galet is a brute force search algorithm, similar to the pebble game of Huss-Lederman et al. 1996, for memory schedules of matrix multiplication, and in particular of Strassen-Winograd's algorithms.

A sequence of computations is represented as a directed graph.

A node represents a program variable. The nodes can be classified as initials (when they correspond to inputs), temporaries (for intermediate computations) or finals (results or nodes that we want to keep, such as ready-only inputs).

The arcs represent the operations; they point from the operands to the result.

A pebble represents an allocated memory. We can put pebbles on any nodes, move or remove them according to a set of simple rules shown below.
When a pebble arrives on a node, the computation at the associated variable starts, and can be ``partially'' or ``fully'' executed.
If not specified, it is assumed that the computation is fully executed.

Arcs can be removed, when the corresponding operation has been computed.

These two rules are especially useful for accumulation operations: for example, it is possible to try schedule the multiplication separately from the addition in an otherwise recursive AB+C call; the multiplication-involved arcs would then be removed first and the accumulated part later.
It is also useful if we do not want to fix the way some additions are performed: if U_3 = P_1 + P_6  + P_7 the associativity allows different ways of computing the sum and we let the program explore these possibilities.

At the beginning of the exploration, each initial node has a pebble and we may have a few extra available pebbles.
The program then tries to apply the following rules, in order, on each node. The program stops when every final node has a pebble or when there is no more allowed move:
1. Computing a result/removing arcs. If a node has a pebble and parents with pebbles, then the operation can be performed and the corresponding arcs removed. The node is then at least partially computed.
2. Freeing some memory/removing a pebble. If a node is isolated and not final, its pebble is freed. This means that we can reclaim the memory here because this node has been fully computed (no arcs pointing to it) and is no longer in use as an operand (no arcs initiating from it).
3. Computing in place/moving a pebble. If a node P has a full pebble and a single empty child node S and if other parents of S have pebbles on them, then the pebble on P may be transferred  to S (corresponding arcs are removed). This means an operation has been made in place in the parent P's pebble.
4. Using more memory/adding a pebble. If parents of an empty node N have pebbles and a free pebble is available, then N can be assigned this pebble and the corresponding arcs are removed. This means that the operation is computed in a new memory location.
5. Copying some memory/duplicating a pebble. A computed node having a pebble can be duplicated. The arcs pointed to or from the original node are then rearranged between them. This means that a temporary result has been copied into some free place to allow more flexibility.

[Papers]
 [Presentation] [Code] [Contacts]

[Source Code]
 [Presentation] [Papers] [Contacts]

•  0.1-alpha ( Last update: Friday, June 1st 2011 )

• The input schedule file format should follow the following specification:
```--------------------------------------------------------------------
|#operation                                                        |
|name arity prop_res prop_op1 prop_op2 |                           |
|...                                                               |
|#algo                                                             |
|tmp0 name_op tmp1 tmp2                                            |
|...                                                               |
|#noeuds                                                           |
|name size_type [prop]                                             |
|...                                                               |
--------------------------------------------------------------------
```
• "#operation" introduces the properties for the operations.
• name: Name of operation.
• arity: Number of parameters (-1 if not fixed).
• prop_res: Property of the result (0 for now).
• prop_opi: Property (between 0 and 15, see below) of paremeter i .
• 0 is the lower possible choice for a parameter property ;
• add one if the operation can be made in place here ;
• add 2 if the operation overwrites this operand ;
• add 4 if this parameter overwrites the pebble on the result node ;
• add 8 if this parameter needs to be computed first.
• Two operations with different properties can share the same name. The properties are then separated by a '|'.
• Each line ends with a '|' symbol.

• "#algo" introduces the algorithm (order of tasks is irrelevant).
• add: "C = A + B" is represented by "C add A B".
• mul: "C = A * B" is represented by "C mul A B".
• acc: "C = A*B+D" is represented by "C acc A B D".

• "#nodes" lists special nodes.
• size_type: variables with the same number have the same size.
• prop: 2 if variable is overwritable, leave blank otherwise.

• a comment is a line starting with ##.

[Contacts]  [Presentation] [Papers] [Code]

 , Laboratoire Jean Kuntzmann B.P. 53 -- 51, av. des Mathématiques, 38041 Grenoble, France Laboratoire LIG Antenne de Montbonnot, 51, avenue Jean Kuntzmann F38330 Montbonnot, France. School of Computer Science University of Waterloo Waterloo, ON, N2B 3G1, Canada. http://ljk.imag.fr/membres/Jean-Guillaume.Dumas http://membres-liglab.imag.fr/pernet http://www.scg.uwaterloo.ca/people/w2zhou.shtml