Does anyone know how to compute a correlation matrix from a very large sparse matrix in Python? Basically, I am looking for something like `numpy.corrcoef` that will work on a scipy sparse matrix.

You can compute the correlation coefficients fairly straightforwardly from the covariance matrix like this:

```
import numpy as np
from scipy import sparse


def sparse_corrcoef(A, B=None):

    if B is not None:
        A = sparse.vstack((A, B), format='csr')

    A = A.astype(np.float64)

    # compute the covariance matrix
    # (see http://stackoverflow.com/questions/16062804/)
    A = A - A.mean(1)
    norm = A.shape[1] - 1.
    C = np.dot(A, A.T.conjugate()) / norm

    # the correlation coefficients are given by
    # C_{i,j} / sqrt(C_{ii} * C_{jj})
    d = np.diag(C)
    coeffs = C / np.sqrt(np.outer(d, d))

    return coeffs
```

Check that it works OK:

```
# some smallish sparse random matrices
>>> a = sparse.rand(100, 100000, density=0.1, format='csr')
>>> b = sparse.rand(100, 100000, density=0.1, format='csr')
>>> coeffs1 = sparse_corrcoef(a, b)
>>> coeffs2 = np.corrcoef(a.todense(), b.todense())
>>> print(np.allclose(coeffs1, coeffs2))
# True
```
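
One caveat with `sparse_corrcoef`: the line `A = A - A.mean(1)` turns `A` into a dense matrix, which may not fit in memory for a very large input. A possible workaround, sketched here under the assumption that the data are real-valued, is to expand the centered product so that only the sparse product `A @ A.T` is needed. The name `sparse_corrcoef_lowmem` is made up for this example, and the expanded formula can lose precision when the row means are large relative to the variances.

```python
import numpy as np
from scipy import sparse


def sparse_corrcoef_lowmem(A, B=None):
    """Row-wise correlation coefficients without densifying the input.

    Hypothetical variant: rows are variables, columns are observations,
    and the values are assumed to be real.
    """
    if B is not None:
        A = sparse.vstack((A, B), format='csr')
    A = sparse.csr_matrix(A, dtype=np.float64)

    n = A.shape[1]
    # row means as a flat array of length m
    mu = np.asarray(A.mean(axis=1)).ravel()

    # sum_k (a_ik - mu_i) * (a_jk - mu_j) = (A A^T)_ij - n * mu_i * mu_j
    C = ((A @ A.T).toarray() - n * np.outer(mu, mu)) / (n - 1)

    # C_{i,j} / sqrt(C_{ii} * C_{jj})
    d = np.diag(C)
    return C / np.sqrt(np.outer(d, d))


# small example input
a = sparse.random(5, 200, density=0.1, format='csr', random_state=0)
b = sparse.random(4, 200, density=0.1, format='csr', random_state=1)
coeffs = sparse_corrcoef_lowmem(a, b)
```

The result is still a dense `m x m` array for `m` rows, so this only helps when the number of variables is moderate compared to the number of observations.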

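A more compact alternative, which treats the columns of the sparse matrix `A` as variables and its `N` rows as observations: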

```
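import numpy as np
# A is a scipy sparse matrix: rows are observations, columns are variables
N = A.shape[0]
s = np.asarray(A.sum(axis=0)).ravel()   # column sums
C = ((A.T @ A).toarray() - np.outer(s, s) / N) / (N - 1)   # covariance matrix
d = np.sqrt(np.diag(C))                 # column standard deviations
corr = C / (np.outer(d, d) + 1e-119)    # tiny offset avoids 0/0 for zero-variance columns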
```
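
The `1e-119` offset in the snippet above looks like a guard against division by zero when a column has zero variance (that is an interpretation, the answer does not say so). Here is a small sketch of the effect, with the formula wrapped in a hypothetical helper `column_corrcoef` whose `eps` parameter is introduced only for this illustration:

```python
import numpy as np
from scipy import sparse


def column_corrcoef(A, eps=1e-119):
    """Correlation between the columns of a sparse matrix A (rows = observations).

    Hypothetical helper; `eps` is added to the denominator as in the snippet above.
    """
    N = A.shape[0]
    s = np.asarray(A.sum(axis=0)).ravel()                     # column sums
    C = ((A.T @ A).toarray() - np.outer(s, s) / N) / (N - 1)  # covariance
    d = np.sqrt(np.diag(C))                                   # standard deviations
    return C / (np.outer(d, d) + eps)


# the third column is all zeros, so its variance is exactly zero
A = sparse.csr_matrix(np.array([[1., 2., 0.],
                                [2., 1., 0.],
                                [3., 5., 0.],
                                [4., 3., 0.]]))

R = column_corrcoef(A)
print(np.isnan(R).any())          # False: entries involving the zero column are 0

with np.errstate(invalid='ignore'):
    R_no_eps = column_corrcoef(A, eps=0.0)
print(np.isnan(R_no_eps).any())   # True: 0/0 gives nan without the offset
```

Note that with the offset the zero-variance column gets a correlation of 0 everywhere, including with itself, whereas `np.corrcoef` would report `nan` there.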

