I wanted to repeat the rows of a scipy csr sparse matrix, but when I tried calling numpy's repeat method, it simply treats the sparse matrix as an object and would only repeat it as an object inside an ndarray. I looked through the documentation, but I couldn't find any utility that repeats the rows of a scipy csr sparse matrix.

I wrote the following code, which operates on the internal data and seems to work

```
import numpy as np
from scipy import sparse

def csr_repeat(csr, repeats):
    if isinstance(repeats, int):
        repeats = np.repeat(repeats, csr.shape[0])
    repeats = np.asarray(repeats)
    rnnz = np.diff(csr.indptr)          # nonzeros per input row
    ndata = rnnz.dot(repeats)           # nonzeros in the result
    if ndata == 0:
        return sparse.csr_matrix((np.sum(repeats), csr.shape[1]),
                                 dtype=csr.dtype)
    indmap = np.ones(ndata, dtype=int)
    indmap[0] = 0
    rnnz_ = np.repeat(rnnz, repeats)    # nonzeros per output row
    indptr_ = rnnz_.cumsum()
    mask = indptr_ < ndata
    indmap -= np.int_(np.bincount(indptr_[mask],
                                  weights=rnnz_[mask],
                                  minlength=ndata))
    jumps = (rnnz * repeats).cumsum()
    mask = jumps < ndata
    indmap += np.int_(np.bincount(jumps[mask],
                                  weights=rnnz[mask],
                                  minlength=ndata))
    indmap = indmap.cumsum()
    return sparse.csr_matrix((csr.data[indmap],
                              csr.indices[indmap],
                              np.r_[0, indptr_]),
                             shape=(np.sum(repeats), csr.shape[1]))
```

and to be reasonably efficient, but I'd rather not monkey patch the class. Is there a better way to do this?
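For reference, here is a quick self-contained sanity check of the approach against numpy's dense `repeat` (the function definition is repeated so the snippet stands alone, with the deprecated `np.int` replaced by plain `int`):

```python
import numpy as np
from scipy import sparse

def csr_repeat(csr, repeats):
    if isinstance(repeats, int):
        repeats = np.repeat(repeats, csr.shape[0])
    repeats = np.asarray(repeats)
    rnnz = np.diff(csr.indptr)          # nonzeros per input row
    ndata = rnnz.dot(repeats)           # nonzeros in the result
    if ndata == 0:
        return sparse.csr_matrix((np.sum(repeats), csr.shape[1]),
                                 dtype=csr.dtype)
    indmap = np.ones(ndata, dtype=int)
    indmap[0] = 0
    rnnz_ = np.repeat(rnnz, repeats)    # nonzeros per output row
    indptr_ = rnnz_.cumsum()
    mask = indptr_ < ndata
    indmap -= np.int_(np.bincount(indptr_[mask], weights=rnnz_[mask],
                                  minlength=ndata))
    jumps = (rnnz * repeats).cumsum()
    mask = jumps < ndata
    indmap += np.int_(np.bincount(jumps[mask], weights=rnnz[mask],
                                  minlength=ndata))
    indmap = indmap.cumsum()
    return sparse.csr_matrix((csr.data[indmap],
                              csr.indices[indmap],
                              np.r_[0, indptr_]),
                             shape=(np.sum(repeats), csr.shape[1]))

# compare against the dense equivalent on a small example
S = sparse.csr_matrix([[0, 1, 2], [0, 0, 0], [1, 0, 0]])
R = csr_repeat(S, [1, 2, 3])
print(np.array_equal(R.toarray(), S.toarray().repeat([1, 2, 3], axis=0)))
```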

It's not surprising that `np.repeat` does not work. It delegates the action to the hardcoded `a.repeat` method, and failing that, first turns `a` into an array (object dtype if needed).

In the linear algebra world where the sparse code was developed, most of the assembly work was done on the `row`, `col`, and `data` arrays BEFORE creating the sparse matrix. The focus was on efficient math operations, and not so much on adding/deleting/indexing rows and elements.

I haven't worked through your code, but I'm not surprised that a `csr` format matrix requires that much work.

I worked out a similar function for the `lil` format (working from `lil.copy`):

```
def lil_repeat(S, repeat):
    # row repeat for a lil sparse matrix
    # (test for lil type and/or convert as needed)
    shape = list(S.shape)
    if isinstance(repeat, int):
        shape[0] = shape[0] * repeat
    else:
        shape[0] = sum(repeat)
    shape = tuple(shape)
    new = sparse.lil_matrix(shape, dtype=S.dtype)
    new.data = S.data.repeat(repeat)  # flat repeat of the row lists
    new.rows = S.rows.repeat(repeat)
    return new
```

But it is also possible to repeat using indices. Both `lil` and `csr` support indexing that is close to that of regular numpy arrays (at least in new enough scipy versions). Thus:

```
S = sparse.lil_matrix([[0, 1, 2], [0, 0, 0], [1, 0, 0]])
print(S.A.repeat([1, 2, 3], axis=0))
print(S.A[(0, 1, 1, 2, 2, 2), :])
print(lil_repeat(S, [1, 2, 3]).A)
print(S[(0, 1, 1, 2, 2, 2), :].A)
```

all give the same result.

And best of all:

```
print(S[np.arange(3).repeat([1, 2, 3]), :].A)
```
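The same indexing trick applies directly to the `csr` matrix in the question, with no helper function at all (Python 3 syntax, small example matrix of my own):

```python
import numpy as np
from scipy import sparse

S = sparse.csr_matrix([[0, 1, 2], [0, 0, 0], [1, 0, 0]])
repeats = [1, 2, 3]
# fancy row indexing on the csr matrix does the repeat in one line
R = S[np.arange(S.shape[0]).repeat(repeats), :]
print(R.toarray())
```

The result matches `S.toarray().repeat(repeats, axis=0)`, and scipy handles the `data`/`indices`/`indptr` bookkeeping internally.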
