Application of Efficient Matrix Inversion to the Decomposition of Hierarchical Matrices

On this page, you can find my Bachelor's Thesis, supplementary material and errata.

Keywords: hierarchical matrices, linear systems, matrix inversion, nested dissection, numerical math, optimization, parallel computing.



(Copied from § 1 of the written thesis.)

Solving linear systems – that is, given M ∈ ℝ^(n×n) and b^(1), …, b^(k) ∈ ℝ^n for n, k ∈ ℕ, finding x^(1), …, x^(k) ∈ ℝ^n such that M x^(l) = b^(l) for l ∈ {1, …, k} – is one of the fundamental tasks of numerical mathematics.

We study systems whose coefficient matrix M is symmetric and positive definite and has a special hierarchical structure, obtained by a procedure known as nested dissection, that looks like the following:

[Figure: schematic layout of an H-matrix obtained via nested dissection.]

where the blank regions are known to be zero and the shaded blocks may contain anything. Such hierarchical matrices (H-matrices) are stored as a tree of sub-matrices with ordinary matrix representations at the leaves. Many operations on such matrices can be formulated elegantly using recursion.
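To make the tree representation concrete, here is a minimal C++ sketch of such a structure: leaves hold dense blocks, inner nodes hold children with their offsets, and a matrix-vector product recurses over the tree. All names (`HMatrix`, `Child`, `matvec`) are illustrative and not taken from the actual implementation.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Sketch of an H-matrix node: a leaf stores a dense block,
// an inner node stores the sub-blocks produced by nested dissection.
struct HMatrix {
    // Leaf payload: dense row-major block (empty for inner nodes).
    std::vector<double> dense;
    int rows = 0, cols = 0;

    // Inner node: children with their (row, col) offsets inside this block.
    // Blank regions are simply absent -- they contribute nothing.
    struct Child { std::unique_ptr<HMatrix> node; int row0, col0; };
    std::vector<Child> children;

    bool is_leaf() const { return children.empty(); }

    // y += A x, computed recursively over the tree.
    void matvec(const double* x, double* y) const {
        if (is_leaf()) {
            for (int i = 0; i < rows; ++i)
                for (int j = 0; j < cols; ++j)
                    y[i] += dense[i * cols + j] * x[j];
            return;
        }
        for (const Child& c : children)
            c.node->matvec(x + c.col0, y + c.row0);
    }
};
```

Because zero regions are never stored, both memory and work scale with the shaded blocks only.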

Previous Work

Maurer and Wieners have adapted block LU decomposition for such matrices and presented an implementation closely related to finite-element computations for distributed memory systems.

Meanwhile, Sanders, Speck and Steffen have presented an algorithm that combines Newton and Strassen inversion to obtain a work-efficient matrix inversion algorithm (NeSt) for symmetric and positive definite matrices with poly-logarithmic time complexity.
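The Newton half of that combination is the classical Newton (Newton–Schulz) iteration X_(k+1) = X_k (2I − A X_k), which converges quadratically to A^(−1) once ‖I − A X_0‖ < 1. The following plain C++ sketch shows only this Newton part with naive triple-loop products; NeSt additionally combines it with Strassen inversion, which is omitted here, and all names are illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

using Mat = std::vector<double>;  // row-major n x n

// Naive matrix product C = A B (NeSt would use fast multiplication here).
Mat matmul(const Mat& a, const Mat& b, int n) {
    Mat c(n * n, 0.0);
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                c[i * n + j] += a[i * n + k] * b[k * n + j];
    return c;
}

// Newton (Newton-Schulz) iteration for A^{-1}:
//   X_{k+1} = X_k (2I - A X_k).
// The classical start X_0 = A^T / (||A||_1 ||A||_inf) guarantees
// convergence for any nonsingular A.
Mat newton_inverse(const Mat& a, int n, int iterations) {
    double norm1 = 0.0, norminf = 0.0;
    for (int j = 0; j < n; ++j) {      // column sums -> 1-norm
        double s = 0.0;
        for (int i = 0; i < n; ++i) s += std::fabs(a[i * n + j]);
        norm1 = std::max(norm1, s);
    }
    for (int i = 0; i < n; ++i) {      // row sums -> inf-norm
        double s = 0.0;
        for (int j = 0; j < n; ++j) s += std::fabs(a[i * n + j]);
        norminf = std::max(norminf, s);
    }
    Mat x(n * n);
    for (int i = 0; i < n; ++i)        // X_0 = A^T / (||A||_1 ||A||_inf)
        for (int j = 0; j < n; ++j)
            x[i * n + j] = a[j * n + i] / (norm1 * norminf);
    for (int it = 0; it < iterations; ++it) {
        Mat ax = matmul(a, x, n);
        Mat t(n * n);                  // t = 2I - A X
        for (int i = 0; i < n; ++i)
            for (int j = 0; j < n; ++j)
                t[i * n + j] = (i == j ? 2.0 : 0.0) - ax[i * n + j];
        x = matmul(x, t, n);           // X <- X (2I - A X)
    }
    return x;
}
```

Note that each iteration consists of nothing but matrix-matrix products, which is exactly why this approach parallelizes so well.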

Our Contribution

We have combined the block LU decomposition of H-matrices with nested dissection structure – which, for symmetric and positive definite matrices, is a block LLᵀ decomposition – with matrix inversion to obtain a new scheme that we call block LDLᵀ decomposition. It computes a decomposition into a lower triangular block matrix and a block diagonal matrix, where the latter is stored in inverted form.
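For a single 2 × 2 block step, such a decomposition can be written as follows (a sketch; S denotes the Schur complement, and A⁻¹ and S⁻¹ are the inverted diagonal blocks that get stored):

```latex
M =
\begin{pmatrix} A & B^{\mathsf T} \\ B & C \end{pmatrix}
=
\underbrace{\begin{pmatrix} I & 0 \\ B A^{-1} & I \end{pmatrix}}_{L}
\underbrace{\begin{pmatrix} A & 0 \\ 0 & S \end{pmatrix}}_{D}
\underbrace{\begin{pmatrix} I & A^{-1} B^{\mathsf T} \\ 0 & I \end{pmatrix}}_{L^{\mathsf T}},
\qquad S = C - B A^{-1} B^{\mathsf T}.
```

Multiplying out confirms the factorization: L D = (A, 0; B, S), and L D Lᵀ = (A, Bᵀ; B, B A⁻¹ Bᵀ + S) = M.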

Using NeSt for matrix inversion, the work of this algorithm is dominated almost exclusively by matrix-matrix products, for which highly optimized off-the-shelf libraries are available. Furthermore, matrix-matrix multiplication has a logarithmic critical path when computed in parallel, while Cholesky factorization and substitution – both needed for the block LLᵀ version – have a linear critical path. Our variant is therefore expected to get more out of the hardware and to scale better to highly parallel systems. On the downside, the LDLᵀ variant has to perform more work for the decomposition.

Once the decomposition is computed, both algorithms can solve systems for the various right-hand sides with equal work. However, the LDLᵀ variant again has a shorter critical path and needs only matrix-vector products, which are likewise readily available in highly optimized versions.
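For the single 2 × 2 block case, that solve phase can be sketched in C++ as follows. This is an illustration, not the thesis code: the names (`ldlt_solve`, `W`, `Ainv`, `Sinv`) are hypothetical, and it is assumed that the off-diagonal factor W = B A⁻¹ and the inverted diagonal blocks A⁻¹ and S⁻¹ are available from the decomposition.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

using Mat = std::vector<double>;  // row-major
using Vec = std::vector<double>;

// y = A x for a dense rows x cols block.
Vec matvec(const Mat& a, const Vec& x, int rows, int cols) {
    Vec y(rows, 0.0);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            y[i] += a[i * cols + j] * x[j];
    return y;
}

// y = A^T x for a dense rows x cols block.
Vec matvec_t(const Mat& a, const Vec& x, int rows, int cols) {
    Vec y(cols, 0.0);
    for (int i = 0; i < rows; ++i)
        for (int j = 0; j < cols; ++j)
            y[j] += a[i * cols + j] * x[i];
    return y;
}

// Solve M [x1; x2] = [b1; b2] for M = [[A, B^T], [B, C]] given the
// factors W = B A^{-1}, Ainv = A^{-1}, Sinv = (C - B A^{-1} B^T)^{-1}.
// Only matrix-vector products are needed -- no substitution.
void ldlt_solve(const Mat& W, const Mat& Ainv, const Mat& Sinv,
                int n1, int n2, const Vec& b1, const Vec& b2,
                Vec& x1, Vec& x2) {
    // Forward pass, L [y1; y2] = [b1; b2]:  y1 = b1,  y2 = b2 - W b1.
    Vec y2 = matvec(W, b1, n2, n1);
    for (int i = 0; i < n2; ++i) y2[i] = b2[i] - y2[i];
    // Diagonal pass, z = D^{-1} y (D is stored inverted; y1 = b1).
    Vec z1 = matvec(Ainv, b1, n1, n1);
    Vec z2 = matvec(Sinv, y2, n2, n2);
    // Backward pass, L^T [x1; x2] = [z1; z2]:  x2 = z2,  x1 = z1 - W^T x2.
    x2 = z2;
    Vec w = matvec_t(W, x2, n2, n1);
    x1 = z1;
    for (int i = 0; i < n1; ++i) x1[i] -= w[i];
}
```

Every step is a matrix-vector product (or a vector subtraction), so the critical path per level is constant rather than linear as in triangular substitution.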

We have implemented block LLᵀ and LDLᵀ decomposition for shared-memory systems in the C++ programming language. The source code is available as free software.

Tests on a machine with 32 CPUs and 500 GiB of memory show that the LDLᵀ variant achieves a rate of effective floating point operations per unit time that is better by roughly a factor of two. This sometimes compensates for the additional work, so with regard to overall execution time there are problems where either algorithm outperforms the other. The advantage of block LDLᵀ is not significant, however, and for smaller systems the LLᵀ version is much faster.

On the other hand, once a decomposition is computed, we could show a significant improvement in execution time for solving the individual linear systems.

Contrary to what one might expect, the additional errors introduced by computing the inverse matrix via an iterative procedure – as opposed to a direct Cholesky factorization – are measurable but very moderate (≈ 10 %), at least for the well-conditioned systems we have investigated.


File (signature) – Description

- thesis_desktop.pdf (thesis_desktop.pdf.sig) – Written thesis with colorized embedded hyperlinks. Good for reading on-screen.
- thesis_print.pdf (thesis_print.pdf.sig) – Written thesis without colorized embedded hyperlinks; the graphics still use color. Good for printing.
- raw_data.sqlite (raw_data.sqlite.sig) – Raw experimental data in an SQLite database file (schema.sql).
- report-1.0.tar.gz (report-1.0.tar.gz.sig) – Source code for the report (includes experimental data).
- code-1.0.tar.gz (code-1.0.tar.gz.sig) – Source code for the implementation of the algorithms.
- blas-benchmarks-1.0.tar.gz (blas-benchmarks-1.0.tar.gz.sig) – Source code for the examples and benchmarks we have written for comparing some linear algebra libraries.


Listed here are errata found in the printed thesis. Included are only errors that might affect understanding – not mere spelling and grammar mistakes (there are just too many of them!), and not hindsight insights that things might better have been done differently.
