
Computational Linear Algebra Techniques

Created by Lily Vittayarukskul for the SVAI research community. Open to collaborators!



Most of this knowledge is derived from this source.

Introduction

This section assumes you know foundational linear algebra; we'll just emphasize a few key concepts crucial to the intuition behind (mostly) deep learning techniques.

Measuring Size of Vector/Matrix: Norms

The L2 norm is used so frequently in machine learning that it is often denoted simply as ||x||, with the subscript 2 omitted. For computational convenience, we often work with the squared L2 norm, which can be calculated simply as x_transpose*x. However, a trade-off to recognize is that the squared L2 norm increases very slowly near the origin, making it difficult to discriminate between elements that are exactly zero and elements that are small but nonzero. When this discrimination is important, we turn to the L1 norm.

The L1 norm, in contrast, grows at the same rate at all locations: every time an element of x moves away from 0 by ε, the L1 norm increases by ε.

Sometimes we may also wish to measure the size of a matrix. In the context of deep learning, the most common way to do this is with the Frobenius norm, which is analogous to the L2 norm of a vector.
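As a quick illustration, here is a minimal NumPy sketch (the example vector and matrix are arbitrary) computing the norms discussed above:

```python
import numpy as np

x = np.array([3.0, -4.0, 0.0])        # example vector
A = np.array([[1.0, 2.0],
              [3.0, 4.0]])            # example matrix

l2 = np.linalg.norm(x)                # L2 norm: sqrt of the sum of squared entries = 5.0
l2_squared = x.T @ x                  # squared L2 norm, computed as x_transpose*x = 25.0
l1 = np.linalg.norm(x, ord=1)         # L1 norm: sum of absolute values = 7.0
frobenius = np.linalg.norm(A, 'fro')  # Frobenius norm of the matrix

print(l2, l2_squared, l1, frobenius)
```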

Special Properties: Orthogonality

A vector x and a vector y are orthogonal to each other if x_transpose*y = 0. This is a way of saying that the two vectors are independent of each other. If these vectors also have unit norm, we call them orthonormal. A square matrix whose rows and columns are mutually orthonormal is called an orthogonal matrix; we tend to favor orthogonal matrices because their inverse is simply their transpose, which is very cheap to compute.
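A small NumPy sketch (with made-up vectors and a standard 2D rotation matrix) showing orthogonality and why orthogonal matrices are cheap to invert:

```python
import numpy as np

x = np.array([1.0, 1.0, 0.0])
y = np.array([1.0, -1.0, 2.0])
print(x.T @ y)                             # 0.0, so x and y are orthogonal

# A 2D rotation matrix is a classic orthogonal matrix: its columns are orthonormal.
theta = np.pi / 4
Q = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

print(np.allclose(Q.T @ Q, np.eye(2)))     # True: Q_transpose*Q = I
print(np.allclose(np.linalg.inv(Q), Q.T))  # True: the inverse is just the transpose
```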

Eigen-decomposition

We can also decompose matrices in ways that show us information about their functional properties that is not obvious from the representation of the matrix as an array of elements. One of the most widely used kinds of matrix decomposition is called eigendecomposition, in which we decompose a matrix into a set of eigenvectors and eigenvalues.

To explore this possibility, let's start with a foundational property: an eigenvector of a square matrix A is a nonzero vector v such that multiplication by A alters only the scale of v:

A*v = λ*v

The scalar λ is known as the eigenvalue corresponding to this eigenvector.

If we find a set of n linearly independent eigenvectors of A (with corresponding eigenvalues λ), we can concatenate these eigenvectors as the columns of a square matrix V, and the eigendecomposition of A is then given by:

A = V*diag(λ)*V_inverse

where diag(λ) is the diagonal matrix with the eigenvalues along its diagonal.
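A minimal NumPy sketch of eigendecomposition, using an arbitrary symmetric example matrix so that the eigenvalues are real:

```python
import numpy as np

# A real symmetric matrix always has a real eigendecomposition.
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])

eigenvalues, V = np.linalg.eig(A)     # columns of V are the eigenvectors
print(eigenvalues)                    # [3. 1.]

# Reconstruct A as V * diag(λ) * V_inverse
A_rebuilt = V @ np.diag(eigenvalues) @ np.linalg.inv(V)
print(np.allclose(A, A_rebuilt))      # True
```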

This is great: we've decomposed a matrix A. However, an important fact is that not every real matrix A has an eigendecomposition, while every real matrix does have a singular value decomposition (SVD). SVD is really similar to eigendecomposition, but this time we write A as a product of three matrices:

A = U*D*V_transpose

If A is an m × n matrix, then U is defined to be an m × m matrix, D to be an m × n matrix, and V to be an n × n matrix. Each of these matrices is defined to have a special structure: U and V are both defined to be orthogonal matrices, and D is defined to be a diagonal matrix (note that D is not necessarily square). The entries along the diagonal of D are known as the singular values of A.

Building off of SVD, we can find what's called a pseudoinverse:

A+ = V*D+*U_transpose

where U, D, and V come from the singular value decomposition of A, and the pseudoinverse D+ of the diagonal matrix D is obtained by taking the reciprocal of its nonzero elements and then taking the transpose of the resulting matrix.

We can now state some really powerful practical properties (see the sketch after this list):

  • When A has more columns than rows, solving the linear equation Ax = y using the pseudoinverse provides the solution x = A+*y with minimal Euclidean norm ||x||_2 among all possible solutions.

  • When A has more rows than columns, it is possible for there to be no solution. In this case, the pseudoinverse gives us the x for which Ax is as close as possible to y in terms of the Euclidean norm ||Ax − y||_2.
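A minimal NumPy sketch of both cases, using np.linalg.pinv (which NumPy computes from the SVD) on small made-up systems:

```python
import numpy as np

# More columns than rows: infinitely many solutions to Ax = y;
# the pseudoinverse returns the one with minimal ||x||_2.
A = np.array([[1.0, 2.0, 3.0]])
y = np.array([6.0])
x = np.linalg.pinv(A) @ y
print(x, A @ x)                       # A @ x recovers y exactly

# More rows than columns: generally no exact solution;
# the pseudoinverse gives the x minimizing ||Ax - y||_2 (a least-squares fit).
A2 = np.array([[1.0], [1.0], [1.0]])
y2 = np.array([1.0, 2.0, 4.0])
x2 = np.linalg.pinv(A2) @ y2
print(x2)                             # [2.333...], the mean of y2

# The underlying SVD itself:
U, s, Vt = np.linalg.svd(A2)          # s holds the singular values of A2
print(s)                              # [1.732...] = sqrt(3)
```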

Linear Dimensionality Reduction

The determinant maps a square matrix to a real-valued scalar and gives us insight into how much multiplication by the matrix expands or contracts space. If the determinant is 0, then space is contracted completely along at least one dimension, causing it to lose all of its volume. If the determinant is 1, then the transformation preserves volume.

The determinant can ultimately be used to find the eigenvalues of a matrix (via the characteristic equation). We can use the eigenvectors and eigenvalues of a matrix (specifically, of the data's covariance matrix) to perform principal component analysis (PCA), which is a linear dimensionality reduction technique: the data are projected onto the eigenvectors with the largest eigenvalues, keeping the directions of greatest variance.
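A minimal PCA sketch via eigendecomposition of the covariance matrix, on made-up data (real pipelines would typically use a library implementation such as scikit-learn's PCA):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))         # toy data: 100 samples, 3 features
X = X - X.mean(axis=0)                # center the data

cov = (X.T @ X) / (X.shape[0] - 1)    # 3 x 3 covariance matrix
eigenvalues, eigenvectors = np.linalg.eigh(cov)   # eigh: for symmetric matrices

# Sort the principal components by descending eigenvalue (variance explained).
order = np.argsort(eigenvalues)[::-1]
components = eigenvectors[:, order[:2]]   # keep the top 2 components

X_reduced = X @ components            # project the data down to 2 dimensions
print(X_reduced.shape)                # (100, 2)
```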

Matrix Decomposition/Matrix Factorization

SVD

Nonnegative matrix factorization (NMF)

Truncated SVD

Truncated SVD saves a lot of computational time by keeping only the subset of columns of interest: the singular vectors corresponding to the largest singular values.
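A minimal sketch of a rank-k truncated SVD with NumPy on a made-up matrix; in practice, libraries such as scikit-learn's TruncatedSVD can use randomized algorithms so the full decomposition never has to be computed:

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(500, 100))       # made-up data matrix

k = 10                                # number of singular vectors to keep
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the k columns/rows corresponding to the largest singular values
# (np.linalg.svd already returns them sorted in descending order).
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# A_k is the best rank-k approximation of A in the least-squares sense.
print(np.linalg.norm(A - A_k, 'fro'))
```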

Stochastic Gradient Descent (SGD)

Applying SGD to NMF

