I have a sparse logical matrix, which is quite large. I would like to draw random non-zero elements from it without storing all of its non-zero elements in a separate vector (e.g. by using the `find` command). Is there an easy way to do this?

Currently I am implementing rejection sampling, that is, drawing a random element and checking whether it is non-zero. But this is not efficient when the fraction of non-zero elements is small.
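A back-of-the-envelope calculation shows why the cost grows as the matrix gets sparser. If each draw is uniform over all `numel(S)` positions, it lands on a non-zero with probability p, so the number of draws per hit is geometric with mean 1/p, and collecting k hits takes k/p draws on average:

```latex
p = \frac{\mathrm{nnz}(S)}{\mathrm{numel}(S)}, \qquad
\mathbb{E}[\text{draws per hit}] = \frac{1}{p}, \qquad
\mathbb{E}[\text{draws for } k \text{ hits}] = \frac{k}{p}
```

For example, at a density of p = 10^-4 that is about 10,000 draws for every non-zero element found.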

`find` is the standard interface for getting the non-zero elements of a sparse matrix. Have a look here: http://www.mathworks.se/help/techdoc/math/f6-9182.html#f6-13040

```
[i,j,s] = find(S)
```

*find returns the row indices of nonzero values in vector i, the column indices in vector j, and the nonzero values themselves in the vector s.*

There is no need to get s. Just pick one random index k and use it for both vectors, i.e. i(k) and j(k).
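A minimal sketch of this, assuming a small hypothetical logical sparse matrix `S` built only for illustration:

```matlab
% Hypothetical example: a 6-by-6 logical sparse matrix with 3 true entries
S = logical(sparse([4 1 3], [2 5 5], 1, 6, 6));
[i, j] = find(S);           % row and column indices of the non-zeros
k = randi(numel(i));        % one random position in the list, uniform over 1..nnz(S)
r = i(k);                   % row of the chosen non-zero element
c = j(k);                   % column of the chosen non-zero element
```

For several draws with replacement, `k = randi(numel(i), 4, 1)` returns four positions at once, and `i(k)`, `j(k)` then give four (row, column) pairs.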

A sparse logical matrix is not a very practical representation of your data if you want to pick random locations. Rejection sampling and `find` are the only two ways that make sense to me. Here's how you can do them efficiently (assuming you want to get 4 random locations):

```
%# using find
idx = find(S);
%# draw 4 without replacement
fourRandomIdx = idx(randperm(numel(idx),4));
%# draw 4 with replacement
fourRandomIdx = idx(randi(numel(idx),4,1));
%# get row, column values
[row,col] = ind2sub(size(S),fourRandomIdx);
%# using rejection sampling
density = nnz(S)/numel(S);
%# on average you need 4/density draws to get 4 hits,
%# so draw 2 (or 3) times that many
n = ceil(4/density) * 2;
%# random linear indices w/ replacement
randIdx = randi(numel(S),n,1);
%# keep the first four draws that hit a non-zero element
%# (with bad luck you can get fewer than four; check numel(hits))
hits = randIdx(find(S(randIdx),4,'first'));
[row,col] = ind2sub(size(S),hits);
```
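The one-shot rejection step above can occasionally come up short, because `n` only covers a multiple of the expected number of draws. A sketch of a variant that keeps drawing batches until it has exactly `k` hits. It samples with replacement, assumes `nnz(S) > 0` (otherwise the loop never ends), and uses a hypothetical example matrix:

```matlab
% Hypothetical example: a 10-by-10 logical sparse matrix with 4 true entries
S = logical(sparse([2 7 5 9], [3 3 8 1], 1, 10, 10));
k = 4;                      % number of non-zero locations wanted
N = numel(S);
hits = zeros(0, 1);         % linear indices of accepted draws
while numel(hits) < k
    need = k - numel(hits);
    % batch size: twice the expected number of draws still needed
    batch = randi(N, ceil(2 * need * N / nnz(S)), 1);
    % accept only the candidates that land on a non-zero element
    accepted = batch(full(S(batch)));
    % keep at most as many as are still needed
    hits = [hits; accepted(1:min(need, end))];
end
[row, col] = ind2sub(size(S), hits);
```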

An n x m sparse matrix with nnz non-zero elements requires nnz + m + 1 integers to store the locations of its non-zero entries (MATLAB stores sparse matrices column by column: one row index per non-zero plus m + 1 column pointers). For a logical matrix there is no need to store the values of the non-zero entries: they are all true. Correspondingly, you would do best to convert your logical sparse matrix into a list of the linear indices of its non-zero entries, together with n and m, which requires only nnz + 2 integers of storage. From these (and `ind2sub`) you can readily reconstruct the subscripts of any non-zero entry that you choose randomly using `randi` over the range 1..nnz.
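A sketch of that storage scheme, assuming a hypothetical example matrix: keep only the linear indices and the size, drop the matrix, and map a random pick back to subscripts with `ind2sub`.

```matlab
% Hypothetical example matrix; replace with your own logical sparse S
S = logical(sparse([4 1 3 6], [2 5 5 6], 1, 6, 6));
sz  = size(S);              % the two integers n and m
idx = find(S);              % nnz linear indices of the true entries
clear S                     % the index list and the size are all that is needed now
k = randi(numel(idx));      % uniform over 1..nnz
[r, c] = ind2sub(sz, idx(k));
```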

By representing the entries in a 3-column format, aka a coordinate list (i, j, value), you can simply select items from the list. To get this, you can either keep whatever you originally used to create the sparse matrix (i.e. the inputs to `sparse()`), or use the `find` command, a la `[i,j,s] = find(S);`.

If you don't need the values, and it seems you don't, you can just extract `i` and `j`.
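If you built `S` yourself from index vectors, here is a sketch of sampling straight from them, without any call to `find` (the vectors `i`, `j` and the sizes are hypothetical). One caveat: this is uniform over the list, so if the list contains repeated (i, j) pairs, which `sparse` merges into a single entry, those locations would be over-sampled; deduplicate first, e.g. with `unique([i j], 'rows')`.

```matlab
% Hypothetical coordinate list used to build the matrix in the first place
n = 6; m = 6;
i = [4; 1; 3; 6];           % row indices of the true entries (no duplicates)
j = [2; 5; 5; 6];           % column indices of the true entries
S = logical(sparse(i, j, 1, n, m));
% Keep i and j around and sample from them directly.
% Here, 2 locations are drawn without replacement.
pick = randperm(numel(i), 2);
rows = i(pick);
cols = j(pick);
```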

If, for some reason, your matrix is massive and your RAM limitations are severe, you can divide the matrix into regions, select a sub-matrix with probability proportional to the number of non-zero elements it contains (using `nnz`), and then pick uniformly among the non-zeros inside it. You could go so far as to divide the matrix into individual columns, and the rest of the calculation is trivial. NB: applying `sum` to the matrix gives you the per-column counts (assuming your entries are just 1s).
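A sketch of the per-column version, assuming a hypothetical example matrix. Drawing an integer t uniformly from 1..nnz(S) and locating the first column whose running count reaches t selects the column with probability proportional to its count and, via the offset within that column, the t-th non-zero in column-major order. Only one column at a time is ever expanded by `find`.

```matlab
% Hypothetical example matrix
S = logical(sparse([1 3 2 5], [1 1 4 6], 1, 5, 6));
counts = full(sum(S, 1));   % non-zeros per column (1-by-m, dense)
cdf = cumsum(counts);       % running total; cdf(end) == nnz(S)
k = 4;                      % number of draws, with replacement
rows = zeros(k, 1);
cols = zeros(k, 1);
for s = 1:k
    t = randi(cdf(end));                  % uniform over 1..nnz(S)
    c = find(cdf >= t, 1, 'first');       % column holding the t-th non-zero
    r = find(S(:, c));                    % rows of the non-zeros in that column
    offset = t - (cdf(c) - counts(c));    % position of the t-th non-zero within column c
    rows(s) = r(offset);
    cols(s) = c;
end
```

Each non-zero is returned with probability 1/nnz(S), and the only dense storage is the 1-by-m vector of column counts.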

In this way, you need not even bother with rejection sampling (which seems pointless to me in this case, since Matlab knows where all of the non-zero entries are).
