Does anyone know how to compute a correlation matrix from a very large sparse matrix in Python? Basically, I am looking for something like `numpy.corrcoef` that will work on a scipy sparse matrix.

You can compute the correlation coefficients fairly straightforwardly from the covariance matrix like this:

```
import numpy as np
from scipy import sparse


def sparse_corrcoef(A, B=None):
    # rows are variables, columns are observations (as in np.corrcoef)
    if B is not None:
        A = sparse.vstack((A, B), format='csr')

    A = A.astype(np.float64)

    # compute the covariance matrix
    # (see http://stackoverflow.com/questions/16062804/)
    # note: subtracting the row means returns a dense matrix
    A = A - A.mean(1)
    norm = A.shape[1] - 1.
    C = np.dot(A, A.T.conjugate()) / norm

    # the correlation coefficients are given by
    # C_{i,j} / sqrt(C_{ii} * C_{jj})
    d = np.diag(C)
    coeffs = C / np.sqrt(np.outer(d, d))

    return coeffs
```

Check that it works OK:

```
# some smallish sparse random matrices
>>> a = sparse.rand(100, 100000, density=0.1, format='csr')
>>> b = sparse.rand(100, 100000, density=0.1, format='csr')
>>> coeffs1 = sparse_corrcoef(a, b)
>>> coeffs2 = np.corrcoef(a.todense(), b.todense())
>>> print(np.allclose(coeffs1, coeffs2))
True
```
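One caveat with this approach is memory: subtracting the row means from a sparse matrix gives a dense result with the same shape as the input. The sketch below illustrates that on a small made-up matrix (the sizes and density are arbitrary assumptions, chosen only to keep the example quick):

```python
import numpy as np
from scipy import sparse

# Build a small, mostly-zero matrix (illustrative sizes only).
rng = np.random.default_rng(0)
dense = rng.random((4, 1000)) * (rng.random((4, 1000)) < 0.01)
A = sparse.csr_matrix(dense)

# Same centering step as in sparse_corrcoef: sparse minus a dense column.
centered = A - A.mean(1)

print(sparse.issparse(A))         # the input is sparse
print(sparse.issparse(centered))  # the centered result is not
print(A.nnz, centered.size)       # stored values before vs. elements after
```

For a matrix that only just fits in memory in sparse form, that dense intermediate may be the limiting factor.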

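Alternatively, expand the covariance algebraically so that the mean is never subtracted from the sparse matrix itself. Here `A` is a scipy sparse matrix whose columns are the variables and whose `N` rows are the observations:

```
import numpy as np
# A is a scipy sparse matrix: N observations (rows) by variables (columns)
N = A.shape[0]
s = np.asarray(A.sum(axis=0)).ravel()  # column sums
C = ((A.T @ A).toarray() - np.outer(s, s) / N) / (N - 1)  # covariance
d = np.diag(C)
V = np.sqrt(np.outer(d, d))
corr = C / (V + 1e-119)  # tiny offset avoids division by zero
```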
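As a rough check of this second approach, here is a minimal sketch that wraps it in a function and compares the result with `np.corrcoef`. The function name `sparse_corrcoef_columns` and the test data are made up for illustration. Unlike the snippet above, it leaves `nan` for constant columns (as `np.corrcoef` does) instead of adding a tiny offset.

```python
import numpy as np
from scipy import sparse


def sparse_corrcoef_columns(A):
    """Pearson correlation between the columns of a sparse matrix A.

    Hypothetical helper: rows are observations, columns are variables.
    Only the (n_vars x n_vars) result is ever made dense.
    """
    A = sparse.csr_matrix(A, dtype=np.float64)
    n = A.shape[0]
    col_sums = np.asarray(A.sum(axis=0)).ravel()
    # (sum_k a_ki * a_kj - col_sum_i * col_sum_j / n) / (n - 1)
    cov = ((A.T @ A).toarray() - np.outer(col_sums, col_sums) / n) / (n - 1)
    std = np.sqrt(np.diag(cov))
    # Constant columns have zero variance; let them become nan quietly.
    with np.errstate(divide="ignore", invalid="ignore"):
        corr = cov / np.outer(std, std)
    return corr


# Small random test matrix: 500 observations of 20 variables, ~20% non-zero.
rng = np.random.default_rng(0)
dense = rng.random((500, 20)) * (rng.random((500, 20)) < 0.2)
X = sparse.csr_matrix(dense)

corr = sparse_corrcoef_columns(X)
reference = np.corrcoef(X.toarray(), rowvar=False)
print(np.allclose(corr, reference))
```

Note that the result is a dense array with one row and column per variable, so this only helps when the number of variables is small enough for that square matrix to fit in memory.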
