I'm working on a project that involves a lot of matrix computation.

I'm looking for a smart way to speed up my code. In my project I'm dealing with a sparse matrix of size 100M x 1M with around 10M non-zero values. The example below is just to illustrate my point.

Let's say I have:

- A vector v of size (2)
- A vector c of size (3)
- A sparse matrix X of size (2,3)

```
import numpy as np
from scipy.sparse import coo_matrix

v = np.asarray([10, 20])
c = np.asarray([2, 3, 4])
data = np.array([1, 1, 1, 1])
row = np.array([0, 0, 1, 1])
col = np.array([1, 2, 0, 2])
X = coo_matrix((data, (row, col)), shape=(2, 3))
X.todense()
# matrix([[0, 1, 1],
#         [1, 0, 1]])
```

Currently I'm doing:

```
import numpy as np
import scipy.sparse

# float accumulator: tmp's entries are float, and in-place += of floats
# into an integer array is rejected by recent NumPy versions
result = np.zeros(v.shape[0])
d = scipy.sparse.lil_matrix((v.shape[0], v.shape[0]))
d.setdiag(v)
tmp = d * X
print(tmp.todense())
# matrix([[  0., 10., 10.],
#         [ 20.,  0., 20.]])
# At this point tmp is a csr sparse matrix
for i in range(tmp.shape[0]):
    x_i = tmp.getrow(i)
    # I only want to do the subtraction on the non-zero elements
    result += x_i.data * (c[x_i.indices] - x_i.data)
print(result)
# [-430. -380.]
```

My problem is the for loop, and especially the subtraction. I would like to find a way to vectorize this operation by subtracting only on the non-zero elements.

I want something that directly gives the sparse matrix of the subtraction:

```
matrix([[ 0., -7., -6.],
[ -18., 0., -16.]])
```

Is there a way to do this smartly?

You don't need to loop over the rows to do what you are already doing. And you can use a similar trick to perform the multiplication of the rows by the first vector:

```
import numpy as np
import scipy.sparse as sps

# the indptr/indices tricks below assume CSR format
X = X.tocsr()
# number of nonzero entries per row of X
nnz_per_row = np.diff(X.indptr)
# multiply every row by the corresponding entry of v
# You could do this in-place as:
# X.data *= np.repeat(v, nnz_per_row)
Y = sps.csr_matrix((X.data * np.repeat(v, nnz_per_row), X.indices, X.indptr),
                   shape=X.shape)
# subtract from the non-zero entries the corresponding column value in c...
Y.data -= np.take(c, Y.indices)
# ...and multiply by -1 to get the value you are after
Y.data *= -1
```
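As a sanity check, here is the same recipe applied to the toy data from your question (just a sketch; X is converted to CSR first so that `indptr` and `indices` are available):

```python
import numpy as np
import scipy.sparse as sps

# toy data from the question
v = np.asarray([10, 20])
c = np.asarray([2, 3, 4])
data = np.array([1, 1, 1, 1])
row = np.array([0, 0, 1, 1])
col = np.array([1, 2, 0, 2])
X = sps.coo_matrix((data, (row, col)), shape=(2, 3)).tocsr()

# scale each row by the corresponding entry of v
nnz_per_row = np.diff(X.indptr)
Y = sps.csr_matrix((X.data * np.repeat(v, nnz_per_row), X.indices, X.indptr),
                   shape=X.shape)
# subtract the column values of c on the non-zeros only, then flip the sign
Y.data -= np.take(c, Y.indices)
Y.data *= -1

print(Y.toarray())
# the target matrix from the question: [[0, -7, -6], [-18, 0, -16]]
```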

To see that it works, set up some dummy data

```
rows, cols = 3, 5
v = np.random.rand(rows)
c = np.random.rand(cols)
X = sps.rand(rows, cols, density=0.5, format='csr')
```

and, after running the code above:

```
>>> x = X.toarray()
>>> mask = x == 0
>>> x *= v[:, np.newaxis]
>>> x = c - x
>>> x[mask] = 0
>>> x
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
>>> Y.toarray()
array([[ 0.79935123, 0. , 0. , -0.0097763 , 0.59901243],
[ 0.7522559 , 0. , 0.67510109, 0. , 0.36240006],
[ 0. , 0. , 0.72370725, 0. , 0. ]])
```

The way you are accumulating your result requires every row to have the same number of non-zero entries, which seems a pretty weird thing to do. Are you sure that is what you are after? If it is, note that your loop also multiplies each term by the scaled entry, so you would need something like:

```
scaled = X.data * np.repeat(v, nnz_per_row)  # the entries of diag(v) * X
result = np.sum((scaled * Y.data).reshape(Y.shape[0], -1), axis=0)
```

but I have trouble believing that is really what you are after...
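If what you actually want is one accumulated value per row, that does not require equal numbers of non-zeros: the per-row sums of `x_ij * (c_j - x_ij)` can be computed with `np.add.reduceat`. This is only a sketch, and it assumes no row of X is entirely empty (`reduceat` mishandles empty slices):

```python
import numpy as np
import scipy.sparse as sps

# toy data from the question, in CSR format
v = np.asarray([10, 20])
c = np.asarray([2, 3, 4])
X = sps.coo_matrix((np.array([1, 1, 1, 1]),
                    (np.array([0, 0, 1, 1]), np.array([1, 2, 0, 2]))),
                   shape=(2, 3)).tocsr()

scaled = X.data * np.repeat(v, np.diff(X.indptr))  # entries of diag(v) * X
terms = scaled * (np.take(c, X.indices) - scaled)  # x_ij * (c_j - x_ij)
# sum the terms belonging to each row; X.indptr marks the row boundaries
per_row = np.add.reduceat(terms, X.indptr[:-1])
print(per_row)
```

For the toy data this gives one total per row, i.e. `[-70 + -60, -360 + -320]`.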
