I am looking for a solution to store about 10 million floating point (double precision) numbers from a sparse matrix. The matrix is actually a two-dimensional triangular matrix of 1 million by 1 million elements. The element `(i,j)` is the actual score measure `score(i,j)` between element `i` and element `j`. The storage method must allow very fast access to this information, perhaps by memory mapping the file containing the matrix. I certainly don't want to load the whole file into memory.

```
from tables import IsDescription, UInt32Col, FloatCol

class Score(IsDescription):
    grid_i = UInt32Col()
    grid_j = UInt32Col()
    score = FloatCol()
```

I've tried `pytables` using the `Score` class shown above, but I cannot directly access element `(i,j)` without scanning all the rows. Any suggestion?

You should use `scipy.sparse`. Here's some more info about the formats and their usage.
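For example, here is a minimal sketch, assuming your scores are available as three parallel arrays of `i` indices, `j` indices and values (the array names and the tiny stand-in data below are just for illustration), of how you could build the matrix once and then read individual elements directly:

```
import numpy as np
from scipy import sparse

# Hypothetical input: three parallel arrays holding the ~10 million non-zero
# entries as (i, j, score) triples; tiny stand-in values shown here.
rows = np.array([0, 2, 5], dtype=np.int32)
cols = np.array([1, 2, 9], dtype=np.int32)
vals = np.array([0.5, 1.2, 3.7], dtype=np.float64)

n = 10**6  # the matrix is 1 million x 1 million

# Build once in COO form, then convert to CSR for fast random element access.
scores = sparse.coo_matrix((vals, (rows, cols)), shape=(n, n)).tocsr()

# Direct access to score(i, j) -- no scan over all rows needed.
print scores[2, 2]
print scores[0, 1]
```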

10 million double precision floats take up 80 MB of memory. If you store them in a 1 million x 1 million sparse matrix in CSR or CSC format, you will need an additional 11 million int32s, for a total of around 125 MB. That's probably less than 7% of the physical memory on your system. And in my experience, on a system with 4 GB of RAM running a 32-bit version of Python, you rarely run into trouble allocating arrays until you try to grab about ten times that.

Run the following code on your computer:

```
import itertools
import numpy as np

# Keep allocating ever-larger uint8 arrays (roughly j MB each) until it fails.
for j in itertools.count(100):
    try:
        a = np.empty((j * 10**6,), dtype='uint8')
        print 'Allocated {0} MB of memory!'.format(j)
        del a
    except MemoryError:
        print 'Failed to allocate {0} MB of memory!'.format(j)
        break
```

And unless it fails to get you at least 4 times the amount calculated above, don't hesitate to stick the whole thing in memory using a `scipy.sparse` format.

I have no experience with pytables, nor much with numpy's `memmap` arrays. But it seems to me that either of those would involve you writing the logic to handle the sparsity yourself, something I would try to avoid unless it's absolutely necessary.
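That said, if you also want the scores to live in a file between runs, one possible sketch (the `save_csr`/`load_csr` helpers below are hypothetical names, not part of scipy) is to persist the three arrays that back the CSR matrix with numpy and rebuild it on load; `np.load`'s `mmap_mode='r'` then gives you memory-mapped access so the files are not read in full up front:

```
import numpy as np
from scipy import sparse

def save_csr(prefix, m):
    # Write the three arrays that define a CSR matrix to separate .npy files.
    np.save(prefix + '_data.npy', m.data)
    np.save(prefix + '_indices.npy', m.indices)
    np.save(prefix + '_indptr.npy', m.indptr)

def load_csr(prefix, shape):
    # mmap_mode='r' memory-maps the files instead of loading them eagerly.
    data = np.load(prefix + '_data.npy', mmap_mode='r')
    indices = np.load(prefix + '_indices.npy', mmap_mode='r')
    indptr = np.load(prefix + '_indptr.npy', mmap_mode='r')
    return sparse.csr_matrix((data, indices, indptr), shape=shape)
```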
