I have a fairly large sparse matrix that, I reckon, would occupy about 1 GB when loaded into memory.

I don't need access to the whole matrix at all times, so some kind of memory mapping would work; it doesn't seem to be possible, however, to memory-map a sparse matrix using numpy or scipy (the tools I'm familiar with).

It can easily fit into memory, but it would be a pain if I had to load it every time I run the program. Maybe some way to keep it in memory between runs?

So, what do you suggest: 1. find a way to memory-map a sparse matrix; 2. just load the whole thing into memory every time; 3. something else?

Scipy supports several kinds of sparse matrices, but you'd have to write a routine to read your data into one of them. Which type you should use depends on what you want to do with the matrix.
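If the real pain point is just re-parsing the matrix on every run, note that newer SciPy releases (0.19 and later) ship `scipy.sparse.save_npz` and `load_npz`, which serialize a sparse matrix to disk in one call. A minimal sketch (the filename and the random test matrix are stand-ins):

```python
import scipy.sparse as sps

# Stand-in for the real matrix: 1000x1000 CSR with ~1% non-zeros.
m = sps.rand(1000, 1000, density=0.01, format='csr')

# Persist once; subsequent runs only need load_npz.
sps.save_npz('matrix.npz', m)
m2 = sps.load_npz('matrix.npz')
```

This loads the whole matrix back into memory (option 2 in the question), but it is a single fast binary read rather than a custom parsing routine.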

If your matrix is very sparse, you could save `(row, column, value)` tuples to disk as binary data using the `struct` module. That would make the on-disk data smaller and make it easier to load, assuming portability is not an issue.

You could then read the data like this:

```
import struct
from functools import partial

fmt = 'IId'  # two unsigned ints (row, col) and a double (value)
size = struct.calcsize(fmt)
with open('sparse.dat', 'rb') as infile:
    f = partial(infile.read, size)
    for chunk in iter(f, b''):
        row, col, value = struct.unpack(fmt, chunk)
        # put it in your matrix here
```

The following may work as a general concept, but you are going to have to figure out a lot of the details. You should first make yourself familiar with the CSR format, in which all the information for an array is stored in three arrays: two with length equal to the number of non-zero entries, and one with length equal to the number of rows plus one:

```
>>> import scipy.sparse as sps
>>> a = sps.rand(5, 5, density=0.2, format='csr')
>>> a.toarray()
array([[ 0.        ,  0.46531486,  0.03849468,  0.51743202,  0.        ],
       [ 0.        ,  0.67028033,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.9967058 ],
       [ 0.        ,  0.        ,  0.        ,  0.        ,  0.        ]])
>>> a.data
array([ 0.46531486,  0.03849468,  0.51743202,  0.67028033,  0.9967058 ])
>>> a.indices
array([1, 2, 3, 1, 4])
>>> a.indptr
array([0, 3, 4, 4, 5, 5])
```

So `a.data` has the non-zero entries in row-major order, `a.indices` has the corresponding column indices of the non-zero entries, and `a.indptr` has the starting indices into the other two arrays where the data for each row begins. For example, `a.indptr[3] = 4` and `a.indptr[3+1] = 5`, so the non-zero entries of the fourth row are `a.data[4:5]`, and their column indices are `a.indices[4:5]`.

So you could store these three arrays on disk, access them as memmaps, and then retrieve rows m through n as follows:

```
# indptr, data and indices are the three memmapped CSR arrays
ip = indptr[m:n+1].copy()  # copy so the shift below doesn't touch the file
d = data[ip[0]:ip[-1]]
i = indices[ip[0]:ip[-1]]
ip -= ip[0]
rows = sps.csr_matrix((d, i, ip))
```
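Putting this together as a runnable sketch, assuming the three arrays are dumped to raw files with `ndarray.tofile` and reopened with `np.memmap` (the filenames and the random sample matrix are placeholders; passing `shape=` to `csr_matrix` avoids losing trailing empty rows):

```python
import numpy as np
import scipy.sparse as sps

# Build a sample CSR matrix and dump its three arrays as raw binary files.
a = sps.rand(1000, 10, density=0.5, format='csr')
a.data.tofile('data.bin')
a.indices.tofile('indices.bin')
a.indptr.tofile('indptr.bin')

# Reopen as memmaps; only the pages actually touched get read from disk.
data = np.memmap('data.bin', dtype=a.data.dtype, mode='r')
indices = np.memmap('indices.bin', dtype=a.indices.dtype, mode='r')
indptr = np.memmap('indptr.bin', dtype=a.indptr.dtype, mode='r')

def get_rows(m, n):
    """Return rows m through n-1 as a small in-memory csr_matrix."""
    ip = indptr[m:n + 1].copy()  # copy so the shift doesn't touch the file
    d = data[ip[0]:ip[-1]]
    i = indices[ip[0]:ip[-1]]
    ip -= ip[0]
    return sps.csr_matrix((d, i, ip), shape=(n - m, a.shape[1]))

rows = get_rows(20, 25)
```

In a real program you would of course also persist the dtypes and the matrix shape (e.g. in a small sidecar file), since raw `tofile` output carries neither.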

As a general proof of concept:

```
>>> c = sps.rand(1000, 10, density=0.5, format='csr')
>>> ip = c.indptr[20:25+1].copy()
>>> d = c.data[ip[0]:ip[-1]]
>>> i = c.indices[ip[0]:ip[-1]]
>>> ip -= ip[0]
>>> rows = sps.csr_matrix((d, i, ip))
>>> rows.toarray()
array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.55683501,
         0.61426248,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.67789204,  0.        ,  0.71821363,
         0.01409666,  0.        ,  0.        ,  0.58965142,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.1575835 ,  0.08172986,
         0.41741147,  0.72044269,  0.        ,  0.72148343,  0.        ],
       [ 0.        ,  0.73040998,  0.81507086,  0.13405909,  0.        ,
         0.        ,  0.82930945,  0.71799358,  0.8813616 ,  0.51874795],
       [ 0.43353831,  0.00658204,  0.        ,  0.        ,  0.        ,
         0.10863725,  0.        ,  0.        ,  0.        ,  0.57231074]])
>>> c[20:25].toarray()
array([[ 0.        ,  0.        ,  0.        ,  0.        ,  0.55683501,
         0.61426248,  0.        ,  0.        ,  0.        ,  0.        ],
       [ 0.        ,  0.        ,  0.67789204,  0.        ,  0.71821363,
         0.01409666,  0.        ,  0.        ,  0.58965142,  0.        ],
       [ 0.        ,  0.        ,  0.        ,  0.1575835 ,  0.08172986,
         0.41741147,  0.72044269,  0.        ,  0.72148343,  0.        ],
       [ 0.        ,  0.73040998,  0.81507086,  0.13405909,  0.        ,
         0.        ,  0.82930945,  0.71799358,  0.8813616 ,  0.51874795],
       [ 0.43353831,  0.00658204,  0.        ,  0.        ,  0.        ,
         0.10863725,  0.        ,  0.        ,  0.        ,  0.57231074]])
```
