I've got a collection of O(N) NxN `scipy.sparse.csr_matrix` objects, and each sparse matrix has on the order of N elements set. I want to add all these matrices together to get a regular NxN numpy array. (N is on the order of 1000.) The arrangement of non-zero elements within the matrices is such that the resulting sum certainly isn't sparse (virtually no zero elements left, in fact).

At the moment I'm just doing

```
reduce(lambda x,y: x+y,[m.toarray() for m in my_sparse_matrices])
```

which works but is a bit slow: of course the sheer amount of pointless processing of zeros which is going on there is absolutely horrific.

Is there a better way ? There's nothing obvious to me in the docs.

**Update:** as per user545424's suggestion, I tried the alternative scheme of summing the sparse matrices, and also summing sparse matrices onto a dense matrix. The code below shows that all approaches run in comparable time (Python 2.6.6 on amd64 Debian/Squeeze on a quad-core i7):

```
import numpy as np
import numpy.random
import scipy
import scipy.sparse
import time

N=768
S=768
D=3

def mkrandomsparse():
    m=np.zeros((S,S),dtype=np.float32)
    r=np.random.random_integers(0,S-1,D*S)
    c=np.random.random_integers(0,S-1,D*S)
    for e in zip(r,c):
        m[e[0],e[1]]=1.0
    return scipy.sparse.csr_matrix(m)

M=[mkrandomsparse() for i in xrange(N)]

def plus_dense():
    return reduce(lambda x,y: x+y,[m.toarray() for m in M])

def plus_sparse():
    return reduce(lambda x,y: x+y,M).toarray()

def sum_dense():
    return sum([m.toarray() for m in M])

def sum_sparse():
    return sum(M[1:],M[0]).toarray()

def sum_combo():  # Sum the sparse matrices 'onto' a dense matrix?
    return sum(M,np.zeros((S,S),dtype=np.float32))

def benchmark(fn):
    t0=time.time()
    fn()
    t1=time.time()
    print "{0:16}: {1:.3f}s".format(fn.__name__,t1-t0)

for i in xrange(4):
    benchmark(plus_dense)
    benchmark(plus_sparse)
    benchmark(sum_dense)
    benchmark(sum_sparse)
    benchmark(sum_combo)
    print
```

and logs output like:

```
plus_dense      : 1.368s
plus_sparse     : 1.405s
sum_dense       : 1.368s
sum_sparse      : 1.406s
sum_combo       : 1.039s
```

although you can get one approach or another to come out ahead by a factor of 2 or so by tweaking the N, S, D parameters... but nothing like the order-of-magnitude improvement you'd hope for, given the number of zero adds it should be possible to skip.
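For what it's worth, the "combo" approach can also be written as an explicit in-place loop, which reuses a single accumulator instead of allocating a fresh NxN sum at every step. A sketch (Python 3 here, with `scipy.sparse.random` standing in for the real matrices):

```python
import numpy as np
import scipy.sparse

S = 768
# hypothetical stand-ins for my_sparse_matrices: ~3 entries per row
M = [scipy.sparse.random(S, S, density=3.0 / S, format="csr", dtype=np.float32)
     for _ in range(20)]

acc = np.zeros((S, S), dtype=np.float32)
for m in M:
    acc += m.toarray()  # one dense temporary per term, but acc itself is reused
```

This still densifies each term, so it only saves the repeated allocation of intermediate sums, not the zero adds themselves.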

Can't you just add them together before converting to a dense matrix?

```
>>> sum(my_sparse_matrices[1:],my_sparse_matrices[0]).todense()
```

This is not a complete answer (and I too would like to see a more detailed response), but you can gain an easy factor of two or more improvement by not creating intermediate results:

```
def sum_dense():
    return sum([m.toarray() for m in M])

def sum_dense2():
    return sum((m.toarray() for m in M))

On my machine (YMMV), this makes `sum_dense2` the fastest of these computations. By writing the summand as a generator expression (`()` rather than `[]`), we produce the dense arrays one at a time instead of constructing the whole list before the summation.
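A minimal sketch (Python 3, with random matrices standing in for `M`) confirming the two forms agree; the generator version only keeps one dense copy in flight at a time instead of materializing all of them up front:

```python
import numpy as np
import scipy.sparse

S = 100
M = [scipy.sparse.random(S, S, density=0.03, format="csr") for _ in range(50)]

dense_list = sum([m.toarray() for m in M])  # builds all 50 dense arrays first
dense_gen = sum(m.toarray() for m in M)     # one dense array alive at a time

assert np.allclose(dense_list, dense_gen)
```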

I think I've found a way to speed it up by a factor of ~10 if your matrices are very sparse.

```
In [1]: import numpy as np; from scipy.sparse import csr_matrix

In [2]: def sum_sparse(m):
   ...:     x = np.zeros(m[0].shape)
   ...:     for a in m:
   ...:         ri = np.repeat(np.arange(a.shape[0]),np.diff(a.indptr))
   ...:         x[ri,a.indices] += a.data
   ...:     return x
   ...:

In [6]: m = [np.zeros((100,100)) for i in range(1000)]

In [7]: for x in m:
   ...:     x.ravel()[np.random.randint(0,x.size,10)] = 1.0
   ...:

In [8]: m = [csr_matrix(x) for x in m]

In [17]: (sum(m[1:],m[0]).todense() == sum_sparse(m)).all()
Out[17]: True

In [18]: %timeit sum(m[1:],m[0]).todense()
10 loops, best of 3: 145 ms per loop

In [19]: %timeit sum_sparse(m)
100 loops, best of 3: 18.5 ms per loop
```
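The `np.repeat(np.arange(...), np.diff(a.indptr))` line is the key trick: CSR stores a compressed row pointer, `np.diff(a.indptr)` recovers how many stored entries each row has, and `np.repeat` expands that into one row index per entry. A tiny worked example (values chosen arbitrarily):

```python
import numpy as np
from scipy.sparse import csr_matrix

a = csr_matrix(np.array([[0, 5, 0],
                         [0, 0, 0],
                         [7, 0, 9]]))

print(a.indptr)   # [0 1 1 3] -> cumulative counts: row 0 has 1 entry, row 1 has 0, row 2 has 2
ri = np.repeat(np.arange(a.shape[0]), np.diff(a.indptr))
print(ri)         # [0 2 2]   -> the row index of each stored entry
print(a.indices)  # [1 0 2]   -> the column index of each stored entry
print(a.data)     # [5 7 9]
```

So `x[ri, a.indices] += a.data` touches exactly the stored entries and nothing else.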

@user545424 has already posted what will likely be the fastest solution. Something in the same spirit that is more readable and ~same speed... **nonzero()** has all kinds of useful applications.

```
def sum_sparse(m):
    x = np.zeros(m[0].shape, m[0].dtype)
    for a in m:
        # old lines
        #ri = np.repeat(np.arange(a.shape[0]),np.diff(a.indptr))
        #x[ri,a.indices] += a.data
        # new line
        x[a.nonzero()] += a.data
    return x
```
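One caveat worth hedging: `csr_matrix.nonzero()` filters out explicitly stored zeros, so if a matrix carries explicit zero entries its index arrays can fall out of sync with `a.data`; calling `a.eliminate_zeros()` first avoids this. A quick Python 3 sanity check with random stand-in matrices:

```python
import numpy as np
from scipy.sparse import csr_matrix

def sum_sparse(m):
    x = np.zeros(m[0].shape, m[0].dtype)
    for a in m:
        a.eliminate_zeros()       # keep nonzero() aligned with a.data
        x[a.nonzero()] += a.data
    return x

rng = np.random.default_rng(0)
# ~5% of entries non-zero, per matrix
m = [csr_matrix(rng.random((50, 50)) * (rng.random((50, 50)) < 0.05))
     for _ in range(20)]

dense = sum_sparse(m)
assert np.allclose(dense, sum(a.toarray() for a in m))
```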
