Matrix Operations
Yu Jiangsheng
Institute of Computational Linguistics
Peking University
September 26, 2002
Topics
1. Matrix Multiplication
2. Solving Systems of Linear Equations
3. Inverting Matrixes
4. Symmetric Positive-definite Matrixes and
Least-squares Approximation
5. Winograd Theorem and AHU Theorem
Matrix
A matrix A is, in fact, a rectangular array of numbers:

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix} = (a_{ij})_{m \times n}$$
Note. From the viewpoint of transformations, $Ax$ describes a rotation of $x$ about the origin combined with linear stretching.
Singular Matrix
linearly independent → row rank and column rank → rank → full rank ↔ existence of inverse (nonsingular)

Definition 1. A null vector for a matrix $A_{m \times n}$ is a nonzero vector $x$ such that $Ax = 0$.

Homework 1. A matrix $A$ has full column rank iff it has no null vector.

Homework 2. A square matrix is singular iff it has a null vector.
Determinant of Matrix
Definition 2. The $ij$th minor of matrix $A_{n \times n}$ is the $(n-1) \times (n-1)$ matrix $A_{[ij]}$ obtained by deleting the $i$th row and the $j$th column of $A$.

Definition 3. The determinant of $A_{n \times n}$ is defined recursively in terms of minors by

$$\det(A) = \begin{cases} a_{11} & \text{if } n = 1 \\ \sum_{j=1}^{n} (-1)^{1+j} a_{1j} \det(A_{[1j]}) & \text{if } n > 1 \end{cases} \quad (1)$$

Property 1. $A_{n \times n}$ is singular iff $\det(A) = 0$.
MatLab for Numerical Computing
• In the 1970s, Dr. Moler developed two FORTRAN subroutine libraries, EISPACK for matrix eigenvalue problems and LINPACK for linear equations, which evolved into MatLab (Matrix Laboratory).
• In 1983, John Little, Cleve Moler and Steve Bangert developed MatLab in C.
• In 1984, J. Little and C. Moler founded the MathWorks Company.
• The most recent version of MatLab (as of this writing) is 6.1.
• Software for numerical computing: MatLab, XMath, Gauss, SAS, S-Plus, Origin, etc. Software for symbolic processing: Mathematica, Maple, etc.
A Demo of MatLab Program
Matrix Multiplication
MATRIX-MULTIPLICATION(A, B)
   n ← rows[A]
   let C be an n × n matrix
   for i ← 1 to n
       do for j ← 1 to n
              do c_ij ← 0
                 for k ← 1 to n
                     do c_ij ← c_ij + a_ik · b_kj
   return C

The time cost of MATRIX-MULTIPLICATION(A, B) is $\Theta(n^3)$, where $A$, $B$ are two $n \times n$ matrixes.
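For comparison with the pseudocode, a direct Python transcription (a sketch, with matrixes represented as nested lists):

def matrix_multiply(A, B):
    """Naive Theta(n^3) multiplication of square matrixes."""
    n = len(A)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                C[i][j] += A[i][k] * B[k][j]
    return C

print(matrix_multiply([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]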
Method of Block Matrixes
Let $A$, $B$ be two $n \times n$ matrixes, where $n$ is an exact power of 2. Then $C = AB$ can be written in block form:

$$\begin{pmatrix} C_{11} & C_{12} \\ C_{21} & C_{22} \end{pmatrix} = \begin{pmatrix} A_{11} & A_{12} \\ A_{21} & A_{22} \end{pmatrix} \begin{pmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{pmatrix} \quad (2)$$

where $A_{ij}$, $B_{ij}$, $C_{ij}$ are $\frac{n}{2} \times \frac{n}{2}$ matrixes. Thus, the blocks of $C$ are

$$\begin{aligned}
C_{11} &= A_{11}B_{11} + A_{12}B_{21} \\
C_{12} &= A_{11}B_{12} + A_{12}B_{22} \\
C_{21} &= A_{21}B_{11} + A_{22}B_{21} \\
C_{22} &= A_{21}B_{12} + A_{22}B_{22}
\end{aligned} \quad (3)$$
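A short recursive sketch of (2)-(3) in Python with numpy (our own helper name, assuming $n$ is an exact power of 2); it performs the eight half-size multiplications explicitly:

import numpy as np

def block_multiply(A, B):
    """Divide-and-conquer multiplication via (2)-(3); n must be a power of 2."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    C = np.empty((n, n))
    C[:h, :h] = block_multiply(A11, B11) + block_multiply(A12, B21)
    C[:h, h:] = block_multiply(A11, B12) + block_multiply(A12, B22)
    C[h:, :h] = block_multiply(A21, B11) + block_multiply(A22, B21)
    C[h:, h:] = block_multiply(A21, B12) + block_multiply(A22, B22)
    return C

A = np.random.rand(4, 4); B = np.random.rand(4, 4)
assert np.allclose(block_multiply(A, B), A @ B)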
Time Complexity
The method of block matrixes divides the multiplication into eight multiplications and four additions of $\frac{n}{2} \times \frac{n}{2}$ matrixes. Therefore, the recurrence is

$$T(n) = 8T(n/2) + \Theta(n^2) \quad (4)$$

By the Master Theorem, $T(n) = \Theta(n^3)$. That is, the method does not decrease the time cost.

Strassen turned recurrence (4) into

$$T(n) = 7T(n/2) + \Theta(n^2) \quad (5)$$

whose solution is $T(n) = \Theta(n^{\lg 7}) = O(n^{2.81})$.
Strassen’s Method
In 1969, Strassen found the seven products:

$$\begin{aligned}
X_1 &= (A_{11} + A_{22})(B_{11} + B_{22}) \\
X_2 &= (A_{21} + A_{22})B_{11} \\
X_3 &= A_{11}(B_{12} - B_{22}) \\
X_4 &= A_{22}(B_{21} - B_{11}) \\
X_5 &= (A_{11} + A_{12})B_{22} \\
X_6 &= (A_{21} - A_{11})(B_{11} + B_{12}) \\
X_7 &= (A_{12} - A_{22})(B_{21} + B_{22})
\end{aligned}$$

Then (3) turns into:

$$\begin{aligned}
C_{11} &= X_1 + X_4 - X_5 + X_7 \\
C_{12} &= X_3 + X_5 \\
C_{21} &= X_2 + X_4 \\
C_{22} &= X_1 + X_3 - X_2 + X_6
\end{aligned} \quad (6)$$
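A numpy sketch of Strassen's scheme (again assuming $n$ is an exact power of 2); in practice one would switch to the naive method below some cutoff size rather than recurse down to 1 × 1:

import numpy as np

def strassen(A, B):
    """Strassen's method via the products X1..X7 of (6); n a power of 2."""
    n = A.shape[0]
    if n == 1:
        return A * B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    X1 = strassen(A11 + A22, B11 + B22)
    X2 = strassen(A21 + A22, B11)
    X3 = strassen(A11, B12 - B22)
    X4 = strassen(A22, B21 - B11)
    X5 = strassen(A11 + A12, B22)
    X6 = strassen(A21 - A11, B11 + B12)
    X7 = strassen(A12 - A22, B21 + B22)
    C = np.empty((n, n))
    C[:h, :h] = X1 + X4 - X5 + X7       # seven recursive multiplications
    C[:h, h:] = X3 + X5                 # instead of eight
    C[h:, :h] = X2 + X4
    C[h:, h:] = X1 + X3 - X2 + X6
    return C

A = np.random.rand(4, 4); B = np.random.rand(4, 4)
assert np.allclose(strassen(A, B), A @ B)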
Complexity of Multiplication
It is obvious that the best lower bound for $n \times n$ matrix multiplication is $\Omega(n^2)$, because we have to fill in the $n^2$ elements of the product matrix. The current best upper bound is approximately $O(n^{2.376})$.

We do not yet know the exact complexity of matrix multiplication.
Why not Strassen’s Algorithm?
In practice, Strassen's algorithm is often not a good choice. The reasons are summarized as follows:

1. The constant factor hidden in the running time of Strassen's algorithm is larger than that of the naive $\Theta(n^3)$ method.
2. When the matrixes are sparse, there are faster methods.
3. Strassen's algorithm is not quite as numerically stable as the naive method.
4. The submatrixes formed at the levels of recursion consume space.
Linear Equation
$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\vdots \\
a_{n1}x_1 + a_{n2}x_2 + \cdots + a_{nn}x_n &= b_n
\end{aligned} \quad (7)$$

$$\begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix} \quad (8)$$

$$Ax = b \quad (9)$$

Computing $x = A^{-1}b$ suffers in practice from numerical instability. A faster algorithm, LUP decomposition, is numerically stable.
LUP Decomposition
The idea of LUP decomposition is to find three $n \times n$ matrixes $L$, $U$ and $P$ such that

$$PA = LU \quad (10)$$

where
• $L$ is a unit lower-triangular matrix,
• $U$ is an upper-triangular matrix, and
• $P$ is a permutation matrix.
Permutation Matrix
Definition 4. A permutation matrix $P = (p_{ij})_{n \times n}$ has exactly one 1 in each row and each column, and 0's elsewhere.

Example 1. $P$ is a permutation matrix:

$$P = \begin{pmatrix} 0 & 0 & 1 & 0 & 0 \\ 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 \\ 0 & 0 & 0 & 1 & 0 \end{pmatrix}$$

Denote the corresponding permutation by $\pi$: $p_{i\pi[i]} = 1$ and $p_{ij} = 0$ for $j \neq \pi[i]$.

Given $x = (x_1, x_2, x_3, x_4, x_5)^T$, the permuted vector is $Px = (x_3, x_1, x_2, x_5, x_4)^T$.
Why LUP Decomposition?
Theorem 1. If $A$ is a nonsingular $n \times n$ matrix, then there exists a unique LUP decomposition of $A$.

$PAx = Pb \Rightarrow LUx = Pb$, i.e. $L(Ux) = Pb$. Let $y = Ux$; then Equation (9) turns into two steps:

1. solve $Ly = Pb$ for the unknown vector $y$ by the method of forward substitution, and
2. solve $Ux = y$ for the unknown vector $x$ by the method of backward substitution.
Forward Substitution
$$\begin{aligned}
y_1 &= b_{\pi[1]} \\
l_{21}y_1 + y_2 &= b_{\pi[2]} \\
l_{31}y_1 + l_{32}y_2 + y_3 &= b_{\pi[3]} \\
&\vdots \\
l_{n1}y_1 + l_{n2}y_2 + l_{n3}y_3 + \cdots + y_n &= b_{\pi[n]}
\end{aligned}$$

The solution is:

$$\begin{aligned}
y_1 &= b_{\pi[1]} \\
y_2 &= b_{\pi[2]} - l_{21}y_1 \\
y_3 &= b_{\pi[3]} - (l_{31}y_1 + l_{32}y_2) \\
&\vdots \\
y_i &= b_{\pi[i]} - \sum_{j=1}^{i-1} l_{ij}y_j
\end{aligned} \quad (11)$$

Therefore, the time cost of Forward Substitution is $\Theta(n^2)$.
Backward Substitution
$$\begin{aligned}
u_{11}x_1 + u_{12}x_2 + \cdots + u_{1,n-1}x_{n-1} + u_{1n}x_n &= y_1 \\
u_{22}x_2 + \cdots + u_{2,n-1}x_{n-1} + u_{2n}x_n &= y_2 \\
&\vdots \\
u_{n-1,n-1}x_{n-1} + u_{n-1,n}x_n &= y_{n-1} \\
u_{n,n}x_n &= y_n
\end{aligned}$$

The solution is:

$$\begin{aligned}
x_n &= y_n / u_{n,n} \\
x_{n-1} &= (y_{n-1} - u_{n-1,n}x_n) / u_{n-1,n-1} \\
&\vdots \\
x_i &= \Big( y_i - \sum_{j=i+1}^{n} u_{ij}x_j \Big) \Big/ u_{ii}
\end{aligned} \quad (12)$$

Therefore, the time cost of Backward Substitution is $\Theta(n^2)$.
LUP Algorithm
LUP-SOLVE(L, U, π, b)
   n ← rows[L]
   for i ← 1 to n
       do y_i ← b_π[i] − Σ_{j=1}^{i−1} l_ij y_j
   for i ← n downto 1
       do x_i ← ( y_i − Σ_{j=i+1}^{n} u_ij x_j ) / u_ii
   return x

The complexity of LUP-SOLVE is $\Theta(n^2)$.
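A direct Python transcription of LUP-SOLVE (a sketch; the name lup_solve and the 0-indexed permutation list pi are ours, and the test data comes from Example 2 on the next slide):

def lup_solve(L, U, pi, b):
    """Solve Ax = b given PA = LU; pi[i] is the row of A feeding row i of PA (0-indexed)."""
    n = len(L)
    y = [0.0] * n
    x = [0.0] * n
    for i in range(n):                      # forward substitution, (11)
        y[i] = b[pi[i]] - sum(L[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):            # backward substitution, (12)
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

# Data of Example 2 below; pi = [2, 0, 1] encodes π = (3, 1, 2).
L = [[1, 0, 0], [0.2, 1, 0], [0.6, 0.5, 1]]
U = [[5, 6, 3], [0, 0.8, -0.6], [0, 0, 2.5]]
print(lup_solve(L, U, [2, 0, 1], [3, 7, 8]))  # ≈ [-1.4, 2.2, 0.6]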
Example of LUP Algorithm
Example 2. Given $Ax = b$, where

$$A = \begin{pmatrix} 1 & 2 & 0 \\ 3 & 4 & 4 \\ 5 & 6 & 3 \end{pmatrix}, \qquad b = \begin{pmatrix} 3 \\ 7 \\ 8 \end{pmatrix}$$

The LUP decomposition of $A$ is

$$L = \begin{pmatrix} 1 & 0 & 0 \\ 0.2 & 1 & 0 \\ 0.6 & 0.5 & 1 \end{pmatrix}, \quad U = \begin{pmatrix} 5 & 6 & 3 \\ 0 & 0.8 & -0.6 \\ 0 & 0 & 2.5 \end{pmatrix}, \quad P = \begin{pmatrix} 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$$

Then by the LUP algorithm, we have

$$\begin{pmatrix} 1 & 0 & 0 \\ 0.2 & 1 & 0 \\ 0.6 & 0.5 & 1 \end{pmatrix} \begin{pmatrix} y_1 \\ y_2 \\ y_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 3 \\ 7 \end{pmatrix} \;\Rightarrow\; y = \begin{pmatrix} 8 \\ 1.4 \\ 1.5 \end{pmatrix}$$

$$\begin{pmatrix} 5 & 6 & 3 \\ 0 & 0.8 & -0.6 \\ 0 & 0 & 2.5 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 8 \\ 1.4 \\ 1.5 \end{pmatrix} \;\Rightarrow\; x = \begin{pmatrix} -1.4 \\ 2.2 \\ 0.6 \end{pmatrix}$$
LU Decomposition
Definition 5. LU decomposition is the special case of LUP decomposition in which $P = I_n$.

(Diagram: solving $Ax = b$ via LUP decomposition, with LU as the special case.)
Gauss — Forever Genius
1777-1855
Gaussian Elimination
$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{pmatrix} = \begin{pmatrix} a_{11} & w^T \\ v & A' \end{pmatrix}$$

then we have

$$\begin{aligned}
A &= \begin{pmatrix} 1 & 0 \\ v/a_{11} & I_{n-1} \end{pmatrix} \begin{pmatrix} a_{11} & w^T \\ 0 & A' - vw^T/a_{11} \end{pmatrix} \\
  &= \begin{pmatrix} 1 & 0 \\ v/a_{11} & I_{n-1} \end{pmatrix} \begin{pmatrix} a_{11} & w^T \\ 0 & L'U' \end{pmatrix} \\
  &= \begin{pmatrix} 1 & 0 \\ v/a_{11} & L' \end{pmatrix} \begin{pmatrix} a_{11} & w^T \\ 0 & U' \end{pmatrix} = LU
\end{aligned}$$

where $A' - vw^T/a_{11}$ is called the Schur complement of $A$ with respect to $a_{11}$.
SOS for $a_{11} = 0$

(Diagram: when $a_{11} = 0$, LU decomposition gives way to LUP.)

Note. If $a_{11} = 0$, or the upper leftmost entry of the Schur complement $A' - vw^T/a_{11}$ is 0, then Gaussian elimination fails.
P Saves LU
Definition 6. The elements by which we divide during LU decomposition are called pivots, and they occupy the diagonal of $U$.

The reason $P$ is added to LU decomposition is to avoid dividing by 0.

Definition 7. Using permutation to avoid division by 0 (or by a small element) is called pivoting. However, a symmetric positive-definite matrix does not need pivoting (we'll prove it later).
LU Algorithm
LU-DECOMPOSITION(A)
   n ← rows[A]
   for k ← 1 to n
       do u_kk ← a_kk                      // determining the pivot
          for i ← k + 1 to n
              do l_ik ← a_ik / u_kk        // l_ik holds v_i
                 u_ki ← a_ki               // u_ki holds w_i^T
          for i ← k + 1 to n
              do for j ← k + 1 to n
                     do a_ij ← a_ij − l_ik u_kj
   return L and U

Property 2. The complexity of the LU algorithm is $\Theta(n^3)$, as is that of LUP decomposition.
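A Python sketch of LU-DECOMPOSITION (the helper name lu_decomposition is ours); running it on the matrix of Example 3 below reproduces the $L$ and $U$ shown there:

def lu_decomposition(A):
    """LU decomposition without pivoting; fails if a zero pivot appears."""
    n = len(A)
    A = [row[:] for row in A]               # work on a copy, updated in place
    L = [[float(i == j) for j in range(n)] for i in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        U[k][k] = A[k][k]                   # the pivot
        for i in range(k + 1, n):
            L[i][k] = A[i][k] / U[k][k]     # multiplier l_ik
            U[k][i] = A[k][i]               # row w^T of U
        for i in range(k + 1, n):           # update the Schur complement
            for j in range(k + 1, n):
                A[i][j] -= L[i][k] * U[k][j]
    return L, U

L, U = lu_decomposition([[2, 3, 1, 5], [6, 13, 5, 19], [2, 19, 10, 23], [4, 10, 11, 31]])
# last row of L is [2, 1, 7, 1]; last row of U is [0, 0, 0, 3], as in Example 3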
Example of LU Decomposition
Example 3. The steps of LU decomposition (the computed multipliers $l_{ik}$ are stored below the diagonal):

$$\begin{pmatrix} 2 & 3 & 1 & 5 \\ 6 & 13 & 5 & 19 \\ 2 & 19 & 10 & 23 \\ 4 & 10 & 11 & 31 \end{pmatrix}
\Rightarrow
\begin{pmatrix} 2 & 3 & 1 & 5 \\ 3 & 4 & 2 & 4 \\ 1 & 16 & 9 & 18 \\ 2 & 4 & 9 & 21 \end{pmatrix}
\Rightarrow
\begin{pmatrix} 2 & 3 & 1 & 5 \\ 3 & 4 & 2 & 4 \\ 1 & 4 & 1 & 2 \\ 2 & 1 & 7 & 17 \end{pmatrix}
\Rightarrow
\begin{pmatrix} 2 & 3 & 1 & 5 \\ 3 & 4 & 2 & 4 \\ 1 & 4 & 1 & 2 \\ 2 & 1 & 7 & 3 \end{pmatrix}$$

$$\begin{pmatrix} 2 & 3 & 1 & 5 \\ 6 & 13 & 5 & 19 \\ 2 & 19 & 10 & 23 \\ 4 & 10 & 11 & 31 \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 3 & 1 & 0 & 0 \\ 1 & 4 & 1 & 0 \\ 2 & 1 & 7 & 1 \end{pmatrix} \begin{pmatrix} 2 & 3 & 1 & 5 \\ 0 & 4 & 2 & 4 \\ 0 & 0 & 1 & 2 \\ 0 & 0 & 0 & 3 \end{pmatrix}$$

$$A = L\,U$$
How to Get P ?
Find a permutation matrix $Q$ such that

$$QA = \begin{pmatrix} a_{k1} & w^T \\ v & A' \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ v/a_{k1} & I_{n-1} \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & A' - vw^T/a_{k1} \end{pmatrix} \quad (13)$$

where $a_{k1} \neq 0$, $v = (a_{21}, a_{31}, \cdots, a_{n1})^T$ except that $a_{11}$ replaces $a_{k1}$, and $w^T = (a_{k2}, a_{k3}, \cdots, a_{kn})$.

The LUP decomposition of $A' - vw^T/a_{k1}$ is

$$P'(A' - vw^T/a_{k1}) = L'U' \quad (14)$$

The permutation matrix $P$ is defined by

$$P = \begin{pmatrix} 1 & 0 \\ 0 & P' \end{pmatrix} Q \quad (15)$$

which is itself a permutation matrix, since it is a product of two permutation matrixes (homework).
How to Get L and U ?
$$\begin{aligned}
PA &= \begin{pmatrix} 1 & 0 \\ 0 & P' \end{pmatrix} QA
    = \begin{pmatrix} 1 & 0 \\ 0 & P' \end{pmatrix} \begin{pmatrix} 1 & 0 \\ v/a_{k1} & I_{n-1} \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & A' - vw^T/a_{k1} \end{pmatrix} \\
   &= \begin{pmatrix} 1 & 0 \\ P'v/a_{k1} & P' \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & A' - vw^T/a_{k1} \end{pmatrix} \\
   &= \begin{pmatrix} 1 & 0 \\ P'v/a_{k1} & I_{n-1} \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & P'(A' - vw^T/a_{k1}) \end{pmatrix} \\
   &= \begin{pmatrix} 1 & 0 \\ P'v/a_{k1} & I_{n-1} \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & L'U' \end{pmatrix} \\
   &= \begin{pmatrix} 1 & 0 \\ P'v/a_{k1} & L' \end{pmatrix} \begin{pmatrix} a_{k1} & w^T \\ 0 & U' \end{pmatrix} = LU
\end{aligned}$$
Example of LUP Decomposition
Each step below shows the working array stored in place: the multipliers sit below the diagonal, and the entry of largest absolute value in the current column is chosen as the pivot.

$$\begin{pmatrix} 2 & 0 & 2 & 0.6 \\ 3 & 3 & 4 & -2 \\ 5 & 5 & 4 & 2 \\ -1 & -2 & 3.4 & -1 \end{pmatrix}
\Rightarrow
\begin{pmatrix} 5 & 5 & 4 & 2 \\ 3 & 3 & 4 & -2 \\ 2 & 0 & 2 & 0.6 \\ -1 & -2 & 3.4 & -1 \end{pmatrix}
\Rightarrow
\begin{pmatrix} 5 & 5 & 4 & 2 \\ 0.6 & 0 & 1.6 & -3.2 \\ 0.4 & -2 & 0.4 & -0.2 \\ -0.2 & -1 & 4.2 & -0.6 \end{pmatrix}$$

$$\Rightarrow
\begin{pmatrix} 5 & 5 & 4 & 2 \\ 0.4 & -2 & 0.4 & -0.2 \\ 0.6 & 0 & 1.6 & -3.2 \\ -0.2 & -1 & 4.2 & -0.6 \end{pmatrix}
\Rightarrow
\begin{pmatrix} 5 & 5 & 4 & 2 \\ 0.4 & -2 & 0.4 & -0.2 \\ 0.6 & 0 & 1.6 & -3.2 \\ -0.2 & 0.5 & 4 & -0.5 \end{pmatrix}$$

$$\Rightarrow
\begin{pmatrix} 5 & 5 & 4 & 2 \\ 0.4 & -2 & 0.4 & -0.2 \\ -0.2 & 0.5 & 4 & -0.5 \\ 0.6 & 0 & 1.6 & -3.2 \end{pmatrix}
\Rightarrow
\begin{pmatrix} 5 & 5 & 4 & 2 \\ 0.4 & -2 & 0.4 & -0.2 \\ -0.2 & 0.5 & 4 & -0.5 \\ 0.6 & 0 & 0.4 & -3 \end{pmatrix}$$

The permutation vector evolves as $(1,2,3,4) \to (3,2,1,4) \to (3,1,2,4) \to (3,1,4,2)$.
So, the permutation is
$$\pi = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 1 & 4 & 2 \end{pmatrix}$$

$\pi[i] = j$ indicates that the $i$th row of $P$ contains its 1 at column $j$. The permutation matrix is

$$P = \begin{pmatrix} 0 & 0 & 1 & 0 \\ 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 1 & 0 & 0 \end{pmatrix}$$

The LUP decomposition is

$$PA = LU = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0.4 & 1 & 0 & 0 \\ -0.2 & 0.5 & 1 & 0 \\ 0.6 & 0 & 0.4 & 1 \end{pmatrix} \begin{pmatrix} 5 & 5 & 4 & 2 \\ 0 & -2 & 0.4 & -0.2 \\ 0 & 0 & 4 & -0.5 \\ 0 & 0 & 0 & -3 \end{pmatrix}$$
LUP Decomposition Algorithm
LUP-DECOMPOSITION(A)
   n ← rows[A]
   for i ← 1 to n
       do π[i] ← i
   for k ← 1 to n
       do p ← 0
          for i ← k to n
              do if |a_ik| > p
                    then p ← |a_ik|
                         k′ ← i
          if p = 0
             then error "singular matrix"
          exchange π[k] ↔ π[k′]
          for i ← 1 to n
              do exchange a_ki ↔ a_k′i     // swap whole rows k and k′
          for i ← k + 1 to n
              do a_ik ← a_ik / a_kk
                 for j ← k + 1 to n
                     do a_ij ← a_ij − a_ik a_kj
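A Python sketch of LUP-DECOMPOSITION with partial pivoting (the name lup_decomposition is ours; the input is copied rather than overwritten). On the 4 × 4 matrix of the preceding example it reproduces π = (3, 1, 4, 2):

def lup_decomposition(A):
    """In-place LUP decomposition with partial pivoting.
    Returns (A, pi) with PA = LU; L (unit diagonal) and U are both stored in A."""
    n = len(A)
    A = [row[:] for row in A]
    pi = list(range(n))
    for k in range(n):
        # choose the largest |a[i][k]|, i >= k, as the pivot
        k2 = max(range(k, n), key=lambda i: abs(A[i][k]))
        if A[k2][k] == 0:
            raise ValueError("singular matrix")
        pi[k], pi[k2] = pi[k2], pi[k]
        A[k], A[k2] = A[k2], A[k]           # exchange whole rows
        for i in range(k + 1, n):
            A[i][k] /= A[k][k]              # store the multiplier l_ik
            for j in range(k + 1, n):
                A[i][j] -= A[i][k] * A[k][j]
    return A, pi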
Computing Matrix Inverse
Problem 1. The problem of computing the inverse of $A_{n \times n}$ is to find $X_{n \times n}$ such that

$$AX = I_n \quad (16)$$

We will prove that

Theorem 2. Matrix multiplication and computing the inverse of a matrix are equally hard problems.
Inverse by LUP Decomposition
Solving $AX = I_n$ is equivalent to solving the following $n$ sets of linear equations:

$$AX_1 = b_1, \quad AX_2 = b_2, \quad \cdots, \quad AX_n = b_n$$

where $b_i = (0, \cdots, 0, 1, 0, \cdots, 0)^T$ has its single 1 in position $i$.

Algorithm: By LUP decomposition (computed once) followed by $n$ calls to LUP-SOLVE, we get the column vectors $X_1, X_2, \cdots, X_n$, and $X = (X_1\ X_2\ \cdots\ X_n)$ is the inverse of $A$.
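A sketch combining the earlier helpers (lup_decomposition and lup_solve, both our own names): one decomposition, then $n$ solves, one per column of $I_n$:

def lup_inverse(A):
    """Invert A by one LUP decomposition plus n LUP solves."""
    n = len(A)
    LU, pi = lup_decomposition(A)           # PA = LU, stored in one array
    L = [[LU[i][j] if j < i else float(i == j) for j in range(n)] for i in range(n)]
    U = [[LU[i][j] if j >= i else 0.0 for j in range(n)] for i in range(n)]
    # column j of the inverse solves A X_j = b_j, the j-th unit vector
    cols = [lup_solve(L, U, pi, [float(i == j) for i in range(n)]) for j in range(n)]
    return [[cols[j][i] for j in range(n)] for i in range(n)]  # columns into X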
Positive-definite Matrix
Definition 8. A matrix $A_{n \times n}$ is called positive-definite (PD) if $x^T A x > 0$ for all size-$n$ vectors $x \neq 0$.

Example 4. The identity matrix $I_n$ is positive-definite because $x^T I_n x = x^T x = \sum_{i=1}^{n} x_i^2$.

Homework 3. For any matrix $A_{m \times n}$ with full column rank, the matrix $A^T A$ is positive-definite.
Properties
Property 3. Any positive-definite matrix is nonsingular.

Proof. If not, there exists $x \neq 0$ such that $Ax = 0$; then $x^T A x = 0$, which contradicts the assumption.

Property 4. If $A$ is a symmetric positive-definite (SPD) matrix, then every leading submatrix of $A$ is symmetric positive-definite.

Proof. It is obvious that the leading submatrix $A_k$ is symmetric. If $A_k$ were not positive-definite, there would exist $x_k \neq 0$ such that $x_k^T A_k x_k \leq 0$. Let $x = (x_k^T\ 0)^T$ be a size-$n$ vector; then

$$x^T A x = (x_k^T\ 0) \begin{pmatrix} A_k & B^T \\ B & C \end{pmatrix} \begin{pmatrix} x_k \\ 0 \end{pmatrix} = x_k^T A_k x_k \leq 0$$

which contradicts $A$ being positive-definite.
Schur Complement Lemma
Definition 9. Let $A$ be a symmetric positive-definite (SPD) matrix, and let $A_k$ be the leading submatrix of $A$. Partition $A$ as

$$A = \begin{pmatrix} A_k & B^T \\ B & C \end{pmatrix} \quad (17)$$

The Schur complement of $A$ with respect to $A_k$ is defined by

$$S = C - B A_k^{-1} B^T \quad (18)$$

Lemma 1. The matrix (18) is symmetric and positive-definite.
Proof of Schur’s Lemma
Because $A$ is symmetric, so is the submatrix $C$; and $B A_k^{-1} B^T$ is also symmetric (homework). Therefore, $S$ is symmetric.

For any $x = (y^T\ z^T)^T \neq 0$,

$$\begin{aligned}
0 < x^T A x &= (y^T\ z^T) \begin{pmatrix} A_k & B^T \\ B & C \end{pmatrix} \begin{pmatrix} y \\ z \end{pmatrix} \\
&= y^T A_k y + y^T B^T z + z^T B y + z^T C z \\
&= (y + A_k^{-1} B^T z)^T A_k (y + A_k^{-1} B^T z) + z^T (C - B A_k^{-1} B^T) z
\end{aligned}$$

For any $z \neq 0$, choose $y = -A_k^{-1} B^T z$; then

$$z^T (C - B A_k^{-1} B^T) z = z^T S z > 0$$
Why Without Pivoting?
Corollary 1. LU decomposition of a symmetric positive-definite matrix never causes a division by 0.

Proof. If $A$ is symmetric positive-definite, then $a_{11} = e_1^T A e_1 > 0$, where $e_1$ is the first unit vector. Then the Schur complement of $A$ with respect to $A_1 = (a_{11})$ is also symmetric positive-definite, so the LU decomposition continues without pivoting.
LU Method without Pivoting
Because $Ax = b \Leftrightarrow A^T A x = A^T b$, and $A^T A$ is a symmetric positive-definite matrix (when $A$ is nonsingular), the LU method for $(A^T A)x = A^T b$ does not need pivoting. But, in practice, LUP-DECOMPOSITION still works better.

Homework 4. Find out why LUP-DECOMPOSITION still works better than the LU method without pivoting.
Winograd Theorem
Theorem 3. If we can invert an $n \times n$ matrix in time $I(n) = \Omega(n^2)$, where $I(n)$ satisfies the regularity condition* $I(3n) = O(I(n))$, then we can multiply two $n \times n$ matrixes in time $O(I(n))$.

Proof. Given $A_{n \times n}$ and $B_{n \times n}$, construct the $3n \times 3n$ matrix

$$D = \begin{pmatrix} I_n & A & 0 \\ 0 & I_n & B \\ 0 & 0 & I_n \end{pmatrix}$$

Then the inverse of $D$ is

$$D^{-1} = \begin{pmatrix} I_n & -A & AB \\ 0 & I_n & -B \\ 0 & 0 & I_n \end{pmatrix}$$

so the product $AB$ can be read off the upper-right block of $D^{-1}$, which is computable in time $I(3n) = O(I(n))$.

*$I(n)$ satisfies the regularity condition whenever $I(n) = \Theta(n^c \log^d n)$ for constants $c > 0$, $d \geq 0$.
Aho-Hopcroft-Ullman Theorem
Theorem 4. Suppose we can multiply two $n \times n$ matrixes in time $M(n) = \Omega(n^2)$, where $M(n)$ satisfies the two regularity conditions:

1. $M(n + k) = O(M(n))$ for $0 \leq k \leq n$, and
2. $M(n/2) \leq cM(n)$ for some constant $c < 1/2$.

Then we can compute the inverse of any real nonsingular $n \times n$ matrix in time $O(M(n))$.

Note. For convenience, Aho-Hopcroft-Ullman is abbreviated as AHU.
Proof of AHU Theorem
Step 1: Let $A_{n \times n}$ be a symmetric positive-definite matrix, where $n$ is an exact power of 2. Partition $A$ into four $\frac{n}{2} \times \frac{n}{2}$ matrixes as in (17), writing

$$A = \begin{pmatrix} B & C^T \\ C & D \end{pmatrix}$$

where $B = A_{n/2}$ is the leading submatrix and $S = D - CB^{-1}C^T$ is its Schur complement. Then

$$A^{-1} = \begin{pmatrix} B^{-1} + B^{-1}C^T S^{-1} C B^{-1} & -B^{-1}C^T S^{-1} \\ -S^{-1}CB^{-1} & S^{-1} \end{pmatrix} \quad (19)$$

Besides the two recursive inversions of $B$ and $S$, we need just four multiplications of $\frac{n}{2} \times \frac{n}{2}$ matrixes:
(1) $K = CB^{-1}$, (2) $CB^{-1}C^T = KC^T$, (3) $Y = S^{-1}CB^{-1} = S^{-1}K$, (4) $(CB^{-1})^T(S^{-1}CB^{-1}) = K^T Y$
(note $B^{-1}C^T S^{-1} = Y^T$, since $B$, $S$ and their inverses are symmetric). Thus, we have the recurrence

$$I(n) \leq 2I(n/2) + 4M(n/2) + O(n^2) = 2I(n/2) + \Theta(M(n)) = O(M(n))$$

because $M(n) = \Omega(n^2)$ and by case 3 of the Master Theorem.

Step 2: For any given invertible $A$, $A^{-1} = (A^T A)^{-1} A^T$, and $A^T A$ is symmetric positive-definite (Homework 3), so Step 1 applies.
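A numpy sketch of Step 1 (the helper name spd_inverse is ours; $n$ must be a power of 2, and the input must be SPD so that Property 4 and Lemma 1 keep the recursion valid):

import numpy as np

def spd_inverse(A):
    """Recursive inversion of a symmetric positive-definite matrix via (19)."""
    n = A.shape[0]
    if n == 1:
        return np.array([[1.0 / A[0, 0]]])
    h = n // 2
    B, Ct = A[:h, :h], A[:h, h:]
    C, D = A[h:, :h], A[h:, h:]
    Binv = spd_inverse(B)                  # first recursive inversion
    K = C @ Binv                           # (1) K = C B^{-1}
    S = D - K @ Ct                         # Schur complement, using (2) K C^T
    Sinv = spd_inverse(S)                  # second recursive inversion
    Y = Sinv @ K                           # (3) Y = S^{-1} C B^{-1}
    top_left = Binv + K.T @ Y              # (4) K^T Y = B^{-1}C^T S^{-1}CB^{-1}
    return np.block([[top_left, -Y.T],
                     [-Y, Sinv]])

M = np.random.rand(4, 4)
A = M.T @ M + 4 * np.eye(4)                # a random SPD matrix
assert np.allclose(spd_inverse(A) @ A, np.eye(4))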
Application of SPD Matrix
One important application of SPD matrixes is the method of least-squares approximation.

Problem 2. Given a sample $\{(x_1, y_1), (x_2, y_2), \cdots, (x_m, y_m)\}$, estimate the $c_i$'s in $F(x) = \sum_{i=1}^{n} c_i f_i(x)$ such that

$$\sum_{i=1}^{m} (F(x_i) - y_i)^2 = \sum_{i=1}^{m} \eta_i^2 \quad (20)$$

is minimized, where $\eta_i$ is called the approximation error.
Regression Analysis
Representation in Matrix
Definition 10. Define the $m \times n$ matrix $A$, the size-$n$ vector $c$, and the size-$m$ vector $\eta$:

$$A = \begin{pmatrix} f_1(x_1) & f_2(x_1) & \cdots & f_n(x_1) \\ f_1(x_2) & f_2(x_2) & \cdots & f_n(x_2) \\ \vdots & \vdots & & \vdots \\ f_1(x_m) & f_2(x_m) & \cdots & f_n(x_m) \end{pmatrix}, \quad c = \begin{pmatrix} c_1 \\ c_2 \\ \vdots \\ c_n \end{pmatrix}, \quad \eta = \begin{pmatrix} \eta_1 \\ \eta_2 \\ \vdots \\ \eta_m \end{pmatrix}$$

Then $\eta = Ac - y$.

The problem (20) is just to minimize

$$\|\eta\|^2 = \|Ac - y\|^2 = \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} a_{ij}c_j - y_i \Big)^2 \quad (21)$$
Pseudoinverse
Definition 11. For any matrix $A_{m \times n}$ (with full column rank), $A^+ = (A^T A)^{-1} A^T$ is called the pseudoinverse of $A$.

Property 5. If $A$ is an $m \times n$ matrix, then its pseudoinverse is an $n \times m$ matrix.

Note. The pseudoinverse is a generalization of the inverse to the case when $A$ is not square.

Why do we need the pseudoinverse?
Least-squares Approximation
Because, at the minimum,

$$\frac{\partial \|\eta\|^2}{\partial c_k} = \sum_{i=1}^{m} 2 \Big( \sum_{j=1}^{n} a_{ij}c_j - y_i \Big) a_{ik} = 0$$

$c$ is the solution of

$$(Ac - y)^T A = 0 \quad\text{or}\quad A^T(Ac - y) = 0$$

Therefore, $c = ((A^T A)^{-1} A^T) y = A^+ y$.

Theorem 5. The solution of the least-squares approximation is $c = A^+ y$.

Note. In practice, one first computes $A^T y$, then finds an LU decomposition of $A^T A$ to solve $(A^T A)c = A^T y$.
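A numpy sketch of this recipe, using the data of Example 5 on the next slide (solving the normal equations rather than forming $A^+$ explicitly):

import numpy as np

# Least-squares fit via the normal equations (A^T A) c = A^T y,
# for the data of Example 5: F(x) = c1 + c2*x + c3*x^2
x = np.array([-1, 1, 2, 3, 5], dtype=float)
y = np.array([2, 1, 1, 0, 3], dtype=float)
A = np.column_stack([np.ones_like(x), x, x ** 2])   # a_ij = f_j(x_i)

c = np.linalg.solve(A.T @ A, A.T @ y)
print(c)   # ≈ [ 1.2   -0.757  0.214]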
Example of Least Square
Example 5. Let $(x_1, y_1) = (-1, 2)$, $(x_2, y_2) = (1, 1)$, $(x_3, y_3) = (2, 1)$, $(x_4, y_4) = (3, 0)$, $(x_5, y_5) = (5, 3)$, and let the approximation function be $F(x) = c_1 + c_2 x + c_3 x^2$. Then

$$A = \begin{pmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \\ 1 & x_4 & x_4^2 \\ 1 & x_5 & x_5^2 \end{pmatrix} = \begin{pmatrix} 1 & -1 & 1 \\ 1 & 1 & 1 \\ 1 & 2 & 4 \\ 1 & 3 & 9 \\ 1 & 5 & 25 \end{pmatrix}$$

The pseudoinverse of $A$ is

$$A^+ = \begin{pmatrix} 0.500 & 0.300 & 0.200 & 0.100 & -0.100 \\ -0.388 & 0.093 & 0.190 & 0.193 & -0.088 \\ 0.060 & -0.036 & -0.048 & -0.036 & 0.060 \end{pmatrix}$$

Multiplying $y$ by $A^+$, we get $c = \begin{pmatrix} 1.200 \\ -0.757 \\ 0.214 \end{pmatrix}$, so the approximation function is $F(x) = 1.200 - 0.757x + 0.214x^2$.
Homework
Homework 5. Find the function of the form

$$F(x) = c_1 + c_2\, x\log x + c_3\, e^x$$

that is the best least-squares fit to the data points $(1, 2), (2, 1), (3, 3), (4, 8)$.

Homework 6. Solve the following equation by LUP decomposition:

$$\begin{pmatrix} 1 & 5 & 4 \\ 2 & 0 & 3 \\ 5 & 8 & 2 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} = \begin{pmatrix} 12 \\ 9 \\ 5 \end{pmatrix}$$
Conclusion
1. The properties of symmetric positive-definite matrixes are quite interesting.
2. The exact complexity of matrix multiplication and of computing the inverse of a matrix remains unknown.
3. The Singular Value Decomposition (SVD) method is also important in matrix computation.
References
1. T. H. Cormen, C. E. Leiserson, R. L. Rivest and C. Stein (2001), Introduction to Algorithms (second edition). The MIT Press.
2. G. H. Golub and C. F. Van Loan (1996), Matrix Computations. The Johns Hopkins University Press.
3. H. S. Wilf (1994), Algorithms and Complexity. Draft of the Internet edition at http://www.cis.upenn.edu/wilf
Thank you
for your attention!