I am trying to cPickle a large scipy sparse matrix for later use. I am getting this error:

```
File "tfidf_scikit.py", line 44, in <module>
pickle.dump([trainID, trainX, trainY], fout, protocol=-1)
SystemError: error return without exception set
```

`trainX` is the large sparse matrix; the other two are lists of 6 million elements each.

```
In [1]: trainX
Out[1]:
<6034195x755258 sparse matrix of type '<type 'numpy.float64'>'
with 286674296 stored elements in Compressed Sparse Row format>
```

At this point, Python RAM usage is 4.6GB and I have 16GB of RAM on my laptop.

I think I'm running into a known cPickle memory bug where it fails on objects that are too big. I tried `marshal` as well, but I don't think it works for scipy matrices. Can someone offer a solution, and preferably an example of how to save and load this?

Python 2.7.5

Mac OS 10.9

Thank you.

I had this problem with a multi-gigabyte NumPy matrix (Ubuntu 12.04 with Python 2.7.3; it seems to be this issue: https://github.com/numpy/numpy/issues/2396 ).

I solved it using `numpy.savetxt()` / `numpy.loadtxt()`. The matrix is compressed automatically when you add a `.gz` extension to the filename on saving.

Since I, too, had just a single matrix, I did not investigate the use of HDF5.
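A minimal sketch of the save/load round trip (the filename `matrix.txt.gz` is just an example; note this only works for dense arrays, not scipy sparse matrices):

```python
import numpy as np

# A small dense matrix as a stand-in for the large one.
mat = np.arange(12, dtype=np.float64).reshape(3, 4)

# The .gz suffix makes savetxt gzip-compress the output transparently.
np.savetxt("matrix.txt.gz", mat)

# loadtxt decompresses .gz files transparently as well.
loaded = np.loadtxt("matrix.txt.gz")
assert np.array_equal(mat, loaded)
```

The default `savetxt` format (`%.18e`) carries enough digits to round-trip float64 values exactly, but text output is much larger and slower than a binary format for matrices of this size.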

Both `numpy.savetxt` (only for arrays, not sparse matrices) and `sklearn.externals.joblib.dump` (pickling; slow as hell and blew up memory usage) didn't work for me on Python 2.7.

Instead, I used `scipy.sparse.save_npz` and it worked just fine. Keep in mind that it only works for `csc`, `csr`, `bsr`, `dia`, or `coo` matrices.
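A short sketch of the `save_npz` / `load_npz` round trip (the filename `matrix.npz` is just an example; these functions require SciPy >= 0.19):

```python
import numpy as np
import scipy.sparse

# A small CSR matrix as a stand-in for the large tf-idf matrix.
mat = scipy.sparse.csr_matrix(np.eye(3))

# save_npz writes the matrix's underlying arrays into one .npz archive.
scipy.sparse.save_npz("matrix.npz", mat)

# load_npz reconstructs a sparse matrix of the same format (CSR here).
loaded = scipy.sparse.load_npz("matrix.npz")
assert np.array_equal(mat.toarray(), loaded.toarray())
```

Because it stores the raw `data`/`indices`/`indptr` arrays in binary, this avoids the pickling path entirely and stays memory-efficient even for very large matrices.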
