I am using SciPy to construct a large, sparse (250k x 250k) co-occurrence matrix using `scipy.sparse.lil_matrix`. Co-occurrence matrices are symmetric; that is, M[i,j] == M[j,i]. Since it would be highly inefficient (and in my case, impossible) to store all the data twice, I'm currently storing only the upper triangle: data lives at the coordinate (i,j) where i is never larger than j. In other words, I have a value stored at (2,3) and no value stored at (3,2), even though (3,2) in my model should be equal to (2,3). (See the matrix below for an example.)

My problem is that I need to be able to randomly extract the data corresponding to a given index, but, at least the way I'm currently doing it, half the data is in the row and half is in the column, like so:

```
M =
[1 2 3 4
0 5 6 7
0 0 8 9
0 0 0 10]
```

So, given the above matrix, I want to be able to do a query like `M[1]` and get back `[2,5,6,7]`. I have two questions:

1) Is there a more efficient (preferably built-in) way to do this than first querying the row, and then the column, and then concatenating the two? This is bad because whether I use CSC (column-based) or CSR (row-based) internal representation, one of the two queries is highly inefficient.

2) Am I even using the right part of SciPy? I have seen a few functions in the SciPy library that mention triangular matrices, but they seem to revolve around getting triangular matrices from a full matrix. In my case, (I think) I already have a triangular matrix, and want to manipulate it.
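For concreteness, here is a minimal sketch of the two-step query described in 1), using the small example matrix above in CSR format. The helper name `symmetric_row` is made up for illustration; it is not part of SciPy.

```python
import numpy as np
from scipy import sparse

# Upper-triangular storage of the 4x4 example matrix.
M = sparse.csr_matrix(np.array([
    [1, 2, 3, 4],
    [0, 5, 6, 7],
    [0, 0, 8, 9],
    [0, 0, 0, 10],
]))

def symmetric_row(matrix, k):
    """Hypothetical helper: rebuild logical row k of the symmetric matrix.

    Entries (0..k-1, k) come from column k (the slow direction for CSR),
    entries (k, k..n-1) come from row k (the fast direction for CSR).
    """
    col_part = matrix[:k, k].toarray().ravel()
    row_part = matrix[k, k:].toarray().ravel()
    return np.concatenate([col_part, row_part])

result = symmetric_row(M, 1)
print(result)
```

The column slice is the part that hurts in CSR; with CSC the situation is simply reversed.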

Many thanks.

I would say that you can't have your cake and eat it too: if you want efficient storage, you cannot store full rows (as you say); if you want efficient row access, I'd say that you have to store full rows.

While actual performance depends on your application, you could check whether the following approach works for you:

You use SciPy's sparse matrices for efficient storage.

You automatically symmetrize your matrix (there is a small recipe on Stack Overflow that works at least on regular matrices).

You can then access its rows (or columns); whether this is efficient depends on the implementation of sparse matrices…
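As a rough sketch of these steps (not the Stack Overflow recipe itself), one way to symmetrize an upper-triangular sparse matrix is to add the transpose of its strict upper triangle, obtained with `scipy.sparse.triu`. The variable names are illustrative, and the small matrix from the question stands in for the real data.

```python
import numpy as np
from scipy import sparse

# Upper-triangular storage, as in the question's example.
upper = sparse.csr_matrix(np.array([
    [1, 2, 3, 4],
    [0, 5, 6, 7],
    [0, 0, 8, 9],
    [0, 0, 0, 10],
]))

# Mirror the strict upper triangle (k=1 skips the diagonal) below the
# diagonal, so diagonal entries are not counted twice.
full = (upper + sparse.triu(upper, k=1).T).tocsr()

# With full rows stored in CSR, a single row slice returns everything.
row = full[1].toarray().ravel()
print(row)
```

Note that this roughly doubles the number of stored off-diagonal entries, which is exactly the storage versus row-access trade-off described above, so it only helps if the symmetrized matrix still fits in memory.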
