Matrix Decomposition Python Project: Matrix Decomposition

代做Matrix Decomposition作业、代写algorithm design作业、代写Python编程作业、代做Python程序留学生作业
Python Project: Matrix Decomposition
Instructor: Qi Deng
Deadline 10/21/2018 11:59am (Noon)
Update: 10/14/2018
Background
In this project, you will develop efficient algorithms for matrix decomposition and direct solver for linear
system . Walking through this project, you will be familiar with algorithm design in the world of
numerical computing.
For many applications of numerical linear algebra, we would rather keep in the decomposed
form. Here are some "simpler" matrices retaining certain structures. For example, we can view
where and are lower and upper triangular matrices, respectively. If we want to solve
, the LU decomposition allows us to solve two easier problems:
This is precisely the idea underlying Gaussian elimination we learned in the first year, but in a different
view. LU decomposition has some other advantages. Once we obtain the decomposition, the two
triangular systems only cost , which is significantly better than . However, we don't get
reduced complexity for free, as it could take to compute LU in the first place. That being said, LU is
still a great choice for practical use. Suppose now we want to solve a sequence of linear systems for
vectors while using the same . Then we only need to perform one LU while still
solving cheap triangular system each time.
In this project you will implement for general matrix and a variant called Cholesky factorization for
symmetric matrix. Note that there are many other decompositions for various purposes. For instance, QR
decomposition lets us write where is orthogonal and is upper-triangular. can be very
useful in Krylov subspace methods and eigen decomposition. One who is interested in this can find more
in courses on numerical computing 1 . The pseudocode of LU or Cholesky can be found in the book as
well.
Rules
1. You can exchange ideas with other teams but you are not allowed to exchange code and any
writings.
2. For your algorithm implementation (in src/), you are not allowed to use any external matrix
factorization such as Scipy.linalg.cholesky. However, you are allowed to examine the
correctness of your algorithm by comparing with output of those libraries (in tests/).
3. Submission after deadline will have 20 points penalty each passing day.
4. Compress your project in a single zip file for submission, remember again, don't include any
datasets.
5. Submission by one team meber is enough, if 张三 is your team leader, name the zip file
zhangsan.zip. But do include the names and IDs of all your members in the report.
Questions （95 points）
1. LU (20 points).
Implement LU decomposition for non-degenerate square matrix. We attempt to write our algorithm in an
OOP style. In the following, we provide a base class for any matrix decomposition algorithm. You will
need to complete the code for LU class. Althouth there are infinite ways to decompose into , in
your code, make sure the diagonal of is 1. (Refer to 3.2.8 of book 1 for implementations)
# decompose.py
class Decompose(object):
""" Matrix in decomposed form
"""
def __init__(self):
pass

def factorize(self, A):
# Abstract methods for matrix factorization
pass

# lu.py

class LU(Decompose):
def factorize(self, A):
# A is a non-degenerate matrix
# Return P, L, U, here P is a permutation;
# it will be the permutation list in the pivoted LU
L = np.zeros(A.shape)
U = np.zeros(A.shape)
# Todo
P = list(range(A.shape[0]))
You can use the following code to generate synthetic matrices to test your LU.
2. Cholesky. (20 points)
The above LU can be applied to any non-degenerate square matrix. If we further assume that matrix is
symmetric positive definite, it immediately follows that . The advantage of exploiting symmetry is
that now we can obtain for free. In such case, we can obtain by Cholesky decomposition.
（Refer to 4.2.5 of book 1 ).
3. Pivoting (20 points).
Let us walk through an example to see the importance of pivoting. Even though this is an artificial
example, it may still help us to understand the importance of pivoting better. Consider solving the
following equations where
return P, L, U

def testcase(p=100):
L = np.tril(np.random.normal(size=[p, p]))
np.fill_diagonal(L, np.abs(L.diagonal()))
U = np.tril(np.random.normal(size=[p, p])).T
np.fill_diagonal(U, np.abs(U.diagonal()))
A = L.dot(U)
return A
# chol.py
class Chol(Decompose):
def factorize(self, A):
""" A is a SPD matrix
return P, L, here P is a permutation.
Find L such that A = LL'
"""
L = np.zeros(A.shape)
# TODO
P = list(range(A.shape[0]))
return P, L
A = np.asarray([[1e-10, 1], [1, 0]])
b=np.array([1, 1e-7])
If we do LU without pivoting, make sure your code will generate:
In Python, we can solve the triangular system by function scipy.linalg.solve_triangular. Thus the
following two lines should give us the solution of .
This will show
The solution is wrong in sense that the second equation gets the wrong answer.
Now switch the rows of A and b in th following way, use the triangular solver to find , and x again, do
you get the same answer or not? And what is your observation?
You probably see why we need large pivots, the point is that good pivoting rule can stablize and ,
which means stronger stability in solving the linear systems. Now implement a simple pivoting rule
in LU to realize this idea. That is to say you will need to permutate rows of when you find small
number in the diagonal. Consequently, you will generate such that
L = array([[1.e+00, 0.e+00],
[1.e+10, 1.e+00]])
U = array([[ 1.e-10, 1.e+00],
[ 0.e+00, -1.e+10]])
y = solve_triangular(L, b, lower=True)
x = solve_triangular(U, y, lower=False)
print(x)
[0. 1.]
A = np.asarray([[1, 0], [1e-10, 1]])
b=np.array([ 1e-7, 1])
# pivoted lu.py
def __init__(self, permute_row=False):
"""Todo
self.permute_row = permute_rowd
permute_row is a boolean variable, if true, compute PA=LU;
otherwise compute A = LU;
"""
pass
def factorize(self, A):
# Return additional P matrix where P is a permutation matrix,
# P should be stored in a a permutation list (list or 1-dim array),
# Todo
return P, L, U
In general, a permutation matrix 2 is a sparse matrix of 0 and 1, and each column and row sums up to 1.
For example,
In practice, it is more efficient to store it as a permutation list: . In your implementation,
make sure that the returned permutation is stored in a list. Please check wikipedia 2 for more
information.
4. Sparsity (35 points).
In many applications (in fact, most of the intersting ones), is an extremely sparse matrix, their nnzs
grows sublinearly with the dimension of the matrix ( , often ).
Part a) (15 points) An important question we mentioned is how to perform pivoting and reduce "fill in"
in matrix decomposition. That is to say, we want to avoid adding new nonzeros to the factorized matrix.
Suppose we generate in the following way. Clearly it is a sparse matrix. Apply your cholesky
decomposition to to obtain . Run the following code and how many nonzeros are there in ? Is
still as sparse as ? Here np.count_nonzero counts the number of nonzeros in ndarray.
Suppose we permute to , apply your Chol code to obtain and how many number of nonzeros are
there in , how is the sparsity compared with ?
Now you may have an impression that the ordering of rows and columns affects the sparsity pattern. In
practice we would like to have decomposition as sparse as possible. Try to implement some pivoting
heuristics so that you can reduce the "fill in" . Don't worry about efficiency of ndarray in this part.
p = 100
A = np.zeros((p, p), dtype=float)
A[:, 0] = 0.1
A[0, :] = 0.1
A[np.arange(p-1), np.arange(p-1) + 1] = 0.1
A[np.arange(p-1) + 1, np.arange(p-1)] = 0.1
np.fill_diagonal(A, 2)
mychol = Chol()
La = mychol.factorize(A)
print(np.count_nonzero(La))
B = A[::-1, ::-1]
part b) (20 points) It is prohibitive to perform an algorithm for large . Try to improve the
efficiency of Chol by replacing ndarray with sparse format. Even better, use C level programming in
Cython which may significantly improve the speed. In your code, implement one or few advanced direct
methods for sparse linear system 3 . A simple strategy of your code can be that:
1. Observe the nnz pattern of ; build adjacency graph
2. Find good ordering by heurstics
3. Predict the nonzeros of (Symbolic)
4. Perform factorization
Our TA Shengyu provides the following code to test some benchmarks by calling sksparse, which is the
Python interface for SuiteSparse. You can see how the state of the art solver performs. The data are in
mm format.
You can carry out a your test in the same way, make sure that the reconstructed matrix is as close to the
original one as possible: . We will consider two perspectives when evaluating the code
performance.
1. Correctness in solving , you will solve triangular systems and , you can
class Chol(Decompose):

def __init__(self, pivoting):
"""Todo: pivoting (bool) is a flag for enable pivoting
"""
def factorize(self, A):
""" Todo: Find P, L such that PAP' = LL'
"""
return P, L
# sparsechol.py
class SparseChol(Decompose):

def __init__(self, pivoting):
"""Todo: pivoting (bool) is a flag for enable pivoting
"""
pass
def factorize(self, A):
"""A is a sparse matrix, csc, csr or coo."""
# Todo, perform sparse cholesky factorization,
# You may use it as an interface and leave the implementation details in external
pyx code
return P, L
find the corresponding functions in scipy.sparse.
2. If your code is correct, then we will check efficiency of factorization. How long does it take to run
the factorize method, and does it generate a reasonably sparse ?
About Cython
1. Here I provide a simple setup file, you don't have to use the same as mine. Suppose you have two
cython file fast_chol and cy_utils.cy under src/. You need the Extension class to wrap up
your cython configuration. In this example I need C interface of numpy and blas which is installed
in openblas. This may not be necessary for you.
import numpy as np
from scipy.io import mmread
from sksparse.cholmod import cholesky
from scipy.sparse.linalg import norm as spnorm, spsolve_triangular as spsolve
from numpy.linalg import norm
import time
def test_norm_diff(A, L, P, tol=1e-8):
permA = A[P[:, np.newaxis], P[np.newaxis, :]]
assert spnorm(permA - L.dot(L.T)) / spnorm(permA) < tol
def test_normal_equation(A, L, P, tol=1e-8):
b = np.random.randn(A.shape[0])
PT = np.zeros(P.shape, dtype=int)
PT[P] = np.arange(P.shape[0])
y = spsolve(L.tocsr(), b[P])
z = spsolve(L.T, y, lower=False)
x = z[PT]
assert norm(A.dot(x) - b) / norm(b) < tol
mat = mmread("LFAT5/LFAT5.mtx").tocsc()
tstrt = time.time()
factor = cholesky(mat)
print("time cost: {:.2e} secs".format(time.time() - tstrt))
test_norm_diff(mat, factor.L(), factor.P())
test_normal_equation(mat, factor.L(), factor.P())
"""
Setup cython file
python setup.py build_ext --inplace
"""
from distutils.core import setup
2. Use Cython interface of numpy.ndarray. Also check memoryview and C raw pointer in the Cython
book if you want to squeeze even more running time.
Structures （5 points)
For submission, you should have your project organized, please stick to the following structure.
Readme.md: let the TA knows how to run your code.
report.ipynb or report.md . Leave your algorithm description, discussion and conclusion here.
setup.py and Makefile are for configuring your code, they are optional
src/ contains all the source code
test/ test scripts to run your code
datasets/ You can put downloaded data or generated data here but do not include them for
submission!!! (Don't lose 5 points for this.)
You can make any auxiliary folder or file if necessary.
from distutils.extension import Extension
from Cython.Build import cythonize
import numpy as np
from numpy.distutils.system_info import get_info
from sklearn._build_utils import get_blas_info
blas_include = "/usr/local/opt/openblas/include"
extensions = [
Extension("src.fast_chol", sources=['src/fast_chol.pyx'],
include_dirs=[np.get_include(), blas_include]),
Extension("src.cy_utils", sources=['src/cy_utils.pyx'],
include_dirs=[np.get_include(), blas_include]),
]
setup(
name='spchol',
ext_modules=cythonize(extensions),
)
yourproject/
Readme.md
report.ipynb
setup.py
Makefile
src/
__init__.py
decompose.py
lu.py
chol.py
spchol.py
my_secret_code.pyx
1. Golub, Gene H., and Charles F. Van Loan. 1996. “Matrix Computations (3rd Ed.).”
2. Permutation matrix. https://en.wikipedia.org/wiki/Permutation_matrix
3. Davis, Timothy A., Sivasankaran Rajamanickam, and Wissam M. Sid-Lakhdar. 2016. “A Survey of Direct Methods for Sparse Linear Systems.” Acta
Here I provide a simple script to load data in .mat format. I only test it on a few datasets and it seems
working.
Testing dataset
We will consider two types of datasets
1. Dense random data.
2. Sparse data from SuiteSparse matrix collection. They provide a convenient Java GUI for
downloading data.
Reference

http://www.5daixie.com/contents/9/1741.html

因为专业，所以值得信赖。如有需要，请加QQ：99515681 或邮箱：[email protected]

微信：codinghelp

Matrix Decomposition Python Project: Matrix Decomposition

猜你喜欢