I want to test some of the newer sparse linear solvers and I want to know if there is a fast way of filling in the matrix. The format I'm interested in is CSR (http://goo.gl/hLXYd). Let's say the matrix, in CSR format, is given by:

```
values(num non-zero elements)
columns(num non-zero elements)
rowIndex(num rows + 1)
```

The sparse matrix under consideration derives from networks. I have thousands of nodes, and some of them are connected to each other by lines, so the matrix is structurally symmetric. Each connection (i,j) adds something to the diagonal terms (i,i) and (j,j) and to the off-diagonal terms (i,j) and (j,i). I could have several connections between the same nodes, (i,j,1), (i,j,2)..., so I might need to visit the 2 diagonal and 2 off-diagonal elements more than once.

I know I can get the beginning of the row by doing rowIndex(i). Then I would have to run through the elements columns(rowIndex(i):rowIndex(i+1)-1) to find where j is located.
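To be explicit, the lookup I am describing looks roughly like this (a sketch in C with 0-based indexing, just for illustration; the function name is made up, the arrays are the ones above):

```
/* Sketch of the per-element lookup described above: linear search over the
 * column indices of row i. Returns a pointer to the stored value for (i,j),
 * or NULL if (i,j) is not in the sparsity pattern. 0-based indexing. */
double *csr_find(int i, int j,
                 const int *rowIndex, const int *columns, double *values)
{
    for (int k = rowIndex[i]; k < rowIndex[i + 1]; ++k) {
        if (columns[k] == j)
            return &values[k];
    }
    return NULL;
}
```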

**The question:**

Is there a way of accessing the elements faster, while in CSR format, without having to do a search every time I want to update an element?

Some clarifications: I just need to fill in the matrix from scratch. The matrix is structurally symmetric, but not actually symmetric. The values stored have to do with network data (impedances, resistances etc.); they are real-valued, and in general Value(i,j) <> Value(j,i). I have tuples of the form (name1,i1,j1,value1), (name2,i2,j2,value2) etc. These tuples are not sorted, and 2 tuples can refer to the same (i,j) pair, in which case their values need to be added.

Thanks in advance!

Your question seems to confuse 2 rather different questions:

- What is a fast way of creating a matrix in CSR form?
- Is there a faster way of reading values from a matrix already stored in CSR form? (Faster, that is, than the straightforward approach you describe.)

So here are 2 answers:

- In general, read the network data from whatever form it is in into something like a dictionary of keys (other intermediate forms are available and may be more appealing to you for speed or other reasons); then turn that intermediate structure into the CSR form of the matrix. More on this below.
- I don't believe so, not with a matrix stored in CSR form. This relative slowness of access is part of the price you pay for saving space. You've traded time for space, or space for time, depending on your point of view.

Your description of your input data suggests that you should consider devising your own intermediate form into which to marshal the raw data. Since your adjacency matrix is symmetric you only need to store, in any form, half of it. Further, you probably don't need to store the elements along the main diagonal -- I'm guessing that node `i` is either always connected to node `i` or never, so that the nature of the network determines the value stored at `(i,i)`. I'm a little uncertain of the information you want to store at each element of the matrix; is it the number of connections between `i` and `j` or something else?

What you have is the so-called triplet sparse format. Creation of CRS, including removing duplicate entries and summing their values, can be implemented very efficiently. Before programming it yourself, have a look at the SuiteSparse library. It is written in C, but I'm sure you will understand the principle. What interests you is the `cholmod_triplet.c` file, which implements the functionality you need.

Essentially, the conversion is performed using a two-phase bucket sort on your row and column indices. This algorithm has linear complexity, which is important if you are interested in processing large data sets.
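As a rough illustration of the idea only (this is not the CHOLMOD code; the sketch below neither orders the columns within a row nor merges duplicates, which `cholmod_triplet.c` handles with an additional pass), a bucket/counting sort over the row indices looks like this:

```
#include <stdlib.h>

/* Sketch: scatter unsorted triplets (rows[k], cols[k], vals[k]) into CSR
 * arrays with one counting-sort pass over the row indices. Duplicate (i,j)
 * entries are NOT merged and columns within a row are NOT sorted here. */
void triplet_to_csr(int nrows, int nnz,
                    const int *rows, const int *cols, const double *vals,
                    int *rowIndex, int *columns, double *values)
{
    int *next = malloc(nrows * sizeof(int));   /* next free slot per row */

    /* count entries per row, then prefix-sum into rowIndex */
    for (int i = 0; i <= nrows; ++i) rowIndex[i] = 0;
    for (int k = 0; k < nnz; ++k) rowIndex[rows[k] + 1]++;
    for (int i = 0; i < nrows; ++i) rowIndex[i + 1] += rowIndex[i];

    /* scatter every triplet into its row's bucket */
    for (int i = 0; i < nrows; ++i) next[i] = rowIndex[i];
    for (int k = 0; k < nnz; ++k) {
        int p = next[rows[k]]++;
        columns[p] = cols[k];
        values[p]  = vals[k];
    }
    free(next);
}
```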

**Edit** If you want to skip explicit creation of the triplet format altogether, you can do that by generating the `(row, col)` connectivities on the fly and adding them to a dynamic sparse structure. I usually do it using insertion sort and sorted lists, which is in practice the fastest. It is also faster than triplet to CRS conversion, and uses much less memory. The method goes as follows:

If you know approximately how many non-zero entries there are in every row, for every row you pre-allocate an array of (empty) column indices, and a separate array for the values (not a linked list, but a simple array) of that size. Something like

```
static_lists_cols[row] = malloc(sizeof(int)*expected_number_of_non_zeros);
static_lists_vals[row] = malloc(sizeof(double)*expected_number_of_non_zeros);
```

If you do not know that, you choose an initial size and reallocate as needed (using some block size large enough to avoid reallocation overhead) when the row lists are full.

- For every `(row, col)` pair you insert the `col` into the sorted list corresponding to `row`, using insertion sort. For small numbers of non-zeros per row (up to a few hundred) a linear search is the fastest; for larger numbers you can use bisection to locate the correct place to insert the `col` index. `col` is inserted into the `row`-th sorted list by shifting the entries with a higher column index. This is cache-friendly, since the rows are in practice small enough to fit into any cache nowadays.
- After you finish, you need to assemble the individual sorted lists into a valid CRS structure by copying the individual row lists into the final `columns` array. The same goes for the values. (A sketch of both steps follows below.)
- You could actually avoid the last step by pre-allocating a static 'array of lists', if you are OK with some of the rows containing zero entries. You will hence have a constant number of entries per row, some of which might be zero. Sometimes that is OK.
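A minimal sketch of the insertion and assembly steps described above, under my own naming (none of this is taken from `mutils` or any other library); duplicate `(row, col)` pairs are summed in place, and each row list grows by a fixed block when full:

```
#include <stdlib.h>
#include <string.h>

typedef struct {
    int    *cols;  /* sorted column indices of this row */
    double *vals;  /* matching values */
    int     len;   /* entries currently stored */
    int     cap;   /* allocated size */
} row_list;

/* add value v at (row, col); the column indices of the row stay sorted */
static void row_insert(row_list *r, int col, double v)
{
    int k = 0;
    while (k < r->len && r->cols[k] < col)   /* linear search; use bisection */
        k++;                                 /* for rows with many entries   */

    if (k < r->len && r->cols[k] == col) {   /* duplicate (row, col): sum */
        r->vals[k] += v;
        return;
    }
    if (r->len == r->cap) {                  /* row list full: grow by a block */
        r->cap += 16;
        r->cols = realloc(r->cols, r->cap * sizeof(int));
        r->vals = realloc(r->vals, r->cap * sizeof(double));
    }
    /* shift the tail one slot to the right, then insert */
    memmove(&r->cols[k + 1], &r->cols[k], (r->len - k) * sizeof(int));
    memmove(&r->vals[k + 1], &r->vals[k], (r->len - k) * sizeof(double));
    r->cols[k] = col;
    r->vals[k] = v;
    r->len++;
}

/* final step: copy the per-row lists into contiguous CRS arrays */
static void assemble_csr(int nrows, const row_list *rows,
                         int *rowIndex, int *columns, double *values)
{
    rowIndex[0] = 0;
    for (int i = 0; i < nrows; ++i) {
        memcpy(&columns[rowIndex[i]], rows[i].cols, rows[i].len * sizeof(int));
        memcpy(&values[rowIndex[i]],  rows[i].vals, rows[i].len * sizeof(double));
        rowIndex[i + 1] = rowIndex[i] + rows[i].len;
    }
}
```

In an assembly loop for your network data you would call `row_insert` four times per connection, for (i,i), (j,j), (i,j) and (j,i), and `assemble_csr` once at the end.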

This method is faster than using triplet to sparse conversion, at least for FEM models, for which I use it. The general reason is that *memory bandwidth* is the bottleneck here, and the above scheme uses much less memory:

- creating the triplet format takes time, and you need to write the triplets to memory
- conversion to CRS requires reading and writing the triplets at least once to sort them (actually a bit more than once, if you look at the algorithm: you sort twice, and you need auxiliary data structures)
- depending on the connectivity structure, you may end up having a large number of `(row, col)` duplicates in the triplet format, which are removed during the assembly by adding the corresponding values. This overhead does not exist in the method above -- if the `col` already exists in the row list, you simply update the corresponding value.
- updating the sorted lists can be done in parallel if you assign row ranges to individual workers. No communication or synchronization is needed. Assuring load balancing is another story...

Have a look at a performance comparison of using those two methods (Figure 1) for triangular elements in 2D. Note that the performance difference depends on the *ratio of the number of entries in the triplet to assembled sparse matrix format* (Table 2). But in general, the method is never worse than triplet to CRS conversion, **and the triplets need to be created in the first place**. You can also download a MATLAB MEX function `sparse_create`, which is part of the `mutils` package (see the downloads section).
