I have the following sparse matrix A.

```
2 3 0 0 0
3 0 4 0 6
0 -1 -3 2 0
0 0 1 0 0
0 4 2 0 1
```

Then I would like to capture the following information from there:

cumulative count of entries, as matrix is scanned columnwise. Yielding:

Ap = [ 0, 2, 5, 9, 10, 12 ];

row indices of entries, as matrix is scanned columnwise. Yielding:

Ai = [0, 1, 0, 2, 4, 1, 2, 3, 4, 2, 1, 4 ];

Non-zero matrix entries, as matrix is scanned columnwise. Yielding:

Ax = [2, 3, 3, -1, 4, 4, -3, 1, 2, 2, 6, 1];

Since the actual matrix A is potentially very2 large, is there any efficient way in Perl that can capture those elements? Especially without slurping all matrix A into RAM.

I am stuck with the following code. Which doesn't give what I want.

```
use strict;
use warnings;
my (@Ax, @Ai, @Ap) = ();
while (<>) {
chomp;
my @elements = split /\s+/;
my $i = 0;
my $new_line = 1;
while (defined(my $element = shift @elements)) {
$i++;
if ($element) {
push @Ax, 0 + $element;
if ($new_line) {
push @Ai, scalar @Ax;
$new_line = 0;
}
push @Ap, $i;
}
}
}
push @Ai, 1 + @Ax;
print('@Ax = [', join(" ", @Ax), "]\n");
print('@Ai = [', join(" ", @Ai), "]\n");
print('@Ap = [', join(" ", @Ap), "]\n");
```

Ap is easy: simply start with zeroes and increment each time you meet a nonzero number. I don't see you trying to write anything into @Ap, so it's no surprise it doesn't end up as you wish.

Ai and Ax are trickier: you want a columnwise ordering while you're scanning rowwise. You won't be able to do anything in-place since you don't know yet how many elements the columns will yield, so you can't know in advance the elements' position.

Obviously, it would be a hell lot easier if you could just alter the requirement to have a rowwise ordering instead. Failing that, you could get complex and collect (i, j, x) triplets. While collecting, they'd naturally be ordered by (i, j). Post-collection, you'd just want to sort them by (j, i).

The code you provided works on a row-by-row basis. To get results sequential by columns you have to accumulate your values into separate arrays, one for each column:

```
# will look like ([], [], [] ...), one [] for each column.
my @columns;
while (<MATRIX>) {
my @row = split qr'\s+';
for (my $col = 0; $col < @row; $col++) {
# push each non-zero value into its column
push @{$columns[$col]}, $row[$col] if $row[$col] > 0;
}
}
# now you only need to flatten it to get the desired kind of output:
use List::Flatten;
@non_zero = flat @columns;
```

See also List::Flatten.

This is what you are looking for, I guess:

```
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper::Simple;
my @matrix;
# Populate @matrix
while (<>) {
push @matrix, [ split /\s+/ ];
}
my $columns = @{ $matrix[0] };
my $rows = @matrix;
my ( @Ap, @Ai, @Ax );
my $ap = 0;
for ( my $j = 0 ; $j <= $rows ; $j++ ) {
for ( my $i = 0 ; $i <= $columns ; $i++ ) {
if ( $matrix[$i]->[$j] ) {
$ap++;
push @Ai, $i;
push @Ax, $matrix[$i]->[$j];
}
}
push @Ap, $ap;
}
print Dumper @Ap;
print Dumper @Ai;
print Dumper @Ax;
```

A common strategy for storing sparse data is to drop the values you don't care about (the zeroes) and to store the row and column indexes with each value that you do care about, thus preserving their positional information:

```
[VALUE, ROW, COLUMN]
```

In your case, you can economize further since all of your needs can be met by processing the data column-by-column, which means we don't have to repeat COLUMN for every value.

```
use strict;
use warnings;
use Data::Dumper;
my ($r, $c, @dataC, @Ap, @Ai, @Ax, $cumul);
# Read data row by row, storing non-zero values by column.
# $dataC[COLUMN] = [
# [VALUE, ROW],
# [VALUE, ROW],
# etc.
# ]
$r = -1;
while (<DATA>) {
chomp;
$r ++;
$c = -1;
for my $v ( split '\s+', $_ ){
$c ++;
push @{$dataC[$c]}, [$v, $r] if $v;
}
}
# Iterate through the data column by column
# to compute the three result arrays.
$cumul = 0;
@Ap = ($cumul);
$c = -1;
for my $column (@dataC){
$c ++;
$cumul += @$column;
push @Ap, $cumul;
for my $value (@$column){
push @Ax, $value->[0];
push @Ai, $value->[1];
}
}
__DATA__
2 3 0 0 0
3 0 4 0 6
0 -1 -3 2 0
0 0 1 0 0
0 4 2 0 1
```

**Updated** based on FM's comment. If you do not want to store any of the original data:

```
#!/usr/bin/perl
use strict;
use warnings;
my %matrix_info;
while ( <DATA> ) {
chomp;
last unless /[0-9]/;
my @v = map {0 + $_ } split;
for (my $i = 0; $i < @v; ++$i) {
if ( $v[$i] ) {
push @{ $matrix_info{$i}->{indices} }, $. - 1;
push @{ $matrix_info{$i}->{nonzero} }, $v[$i];
}
}
}
my @cum_count = (0);
my @row_indices;
my @nonzero;
for my $i ( sort {$a <=> $b } keys %matrix_info ) {
my $mi = $matrix_info{$i};
push @nonzero, @{ $mi->{nonzero} };
my @i = @{ $mi->{indices} };
push @cum_count, $cum_count[-1] + @i;
push @row_indices, @i;
}
print(
"\@Ap = [@cum_count]\n",
"\@Ai = [@row_indices]\n",
"\@Ax = [@nonzero]\n",
);
__DATA__
2 3 0 0 0
3 0 4 0 6
0 -1 -3 2 0
0 0 1 0 0
0 4 2 0 1
```

**Output:**

C:\Temp> m @Ap = [0 2 5 9 10 12] @Ai = [0 1 0 2 4 1 2 3 4 2 1 4] @Ax = [2 3 3 -1 4 4 -3 1 2 2 6 1]

Similar Questions

I'm making a little program to make a representation of sparse matrixes (a matrix with a lot of elements equal to zero). Represented like this page 108 (I think watching at the figure is enough to und

I was wondering if there's any way to have a scipy.sparse.csc_matrix format for mlpy in python. I have worked with mlpy before and have always dealt with non sparse matrices. For instance if I have 5

I'm trying to use large 10^5x10^5 sparse matrices but seem to be running up against scipy: n = 10 ** 5 x = scipy.sparse.rand(n, n, .001) gets ValueError: Trying to generate a random sparse matrix su

I have n documents in MongoDB containing a scipy sparse vector, stored as a pickle object and initially created with scipy.sparse.lil. The vectors are all of the same size, say p x 1. What I need to

Is there a sparse matrix library that copies the functionality of dense BLAS? I'd want at least: efficient SYR and SYRK (rank-k update) with sparse input (and possibly dense output), option for spars

I need a code that receives an arbitrary matrix and find the nonzero values the harder approach is needed not simple commands like nnz ! i tried this m = input( ' Enter row elements of a matrix ' )

for class I have to write my own linear equation solver for sparse matrices. I am free to use any type of data structure for sparse matrices and I have to implement several solves, including conjuguat

If you have a sparse matrix X: >> X = csr_matrix([[0,2,0,2],[0,2,0,1]]) >> print type(X) >> print X.todense() <class 'scipy.sparse.csr.csr_matrix'> [[0 2 0 2] [0 2 0 1]] And a

I have a very large sparse matrix (each row is several thousand elements--most of the elements are 0's). In addition, I have a vector of row indexes, and I need to perform the following operation on e

I have a very large sparse matrix in Octave and I want to get the variance of each row. If I use std(A,1); it crashes because memory is exhausted. Why is this? The variance should be very easy to calc

I have this huge sparse matrix A of size 2 million by 10 thousand. I want to index particular 1000 rows (index) from this matrix. If I do B = A(index,:); it takes some time. Is there a better quick

I'm trying to put a matrix in sparse form, and so far I'm looping through the rows and columns checking every element to see if it's non zero. However, this seems to be order n^2, where n is the numbe

I have a sparse matrix that I obtained by using Sklearn's TfidfVectorizer object: vect = TfidfVectorizer(sublinear_tf=True, max_df=0.5, analyzer='word', vocabulary=my_vocab, stop_words='english') tfid

This Sparse Matrix and its 3-Tuple representation is not getting into my head... Either its bit tricky or my resources from where I am studying are really not that good... here is the URI Sparse Matri

Could anyone recommend set of tools to perform standard NMF application onto sparse input data [ matrix of size 50kx50k ], thanks!

I need to store word co-occurrence counts in several 14000x10000 matrices. Since I know the matrices will be sparse and I do not have enough RAM to store all of them as dense matrices, I am storing th

I have encountered a difference in how slicing a scipy sparse matrix works in 0.10.0 and 0.10.1. Consider the following piece of code: from numpy import array, ravel from scipy.sparse import csr_matri

I am trying to compute nearest neighbour clustering on a Scipy sparse matrix returned from scikit-learn's DictVectorizer. However, when I try to compute the distance matrix with scikit-learn I get an

Given an image of size [hh,ww], I would like to create efficiently a sparse matrix of size [hh*ww, hh*ww]. For each 4- or 8-neighbor of a given pixel, the sparse matrix should be filled with a constan

I am using cusp for sparse matrix multiplication. From the resultant matrix i need the max value without copying the matrix from device memory to host memory. I am planning to wrap the resultant matri

I'm writing a lexer generator as a spare time project, and I'm wondering about how to go about table compression. The tables in question are 2D arrays of short and very sparse. They are always 256 cha

I did a sparse qr decomposition with Matrix package in R like a <- Matrix(runif(20), nrow = 5, sparse = T) a[3:5,] <- 0 #now **a** is a 5X4 matrix b <- qr.R(qr(a), complete = T) #but now **

Let A be a sparse matrix in coordinate format [row(int) col(int) val(float)]. If a upper triangular sparse matrix of A is needed, then the same can be obtained using logical indexing like: A = A(A(:,1

I need to pass a scipy.sparse CSR matrix to a cython function. How do I specify the type, as one would for a numpy array?

I would like to get the minimum nonzero values per row in a sparse matrix. Solutions I found for dense matrices suggested masking out the zero values by setting them to NaN or Inf. However, this obvio

preface: As the matlab guiderules state, Usually, when one wants to efficiently populate a sparse matrix in matlab, he should create a vector of indexes into the matrix and a vector of values he wants

It seems to me that in SAGE the only difference between creating a dense matrix and a sparse matrix is by the flag passed to the constructor (sparse = True). In particular, this means that if I want

I originally asked a related question here, but didn't really seem to get anywhere. Perhaps if I rephrase part of it more specifically it might help.... I have files stored using Matlab's sparse forma

I have a big sparse matrix. I want to take log4 for all element in that sparse matrix. I try to use numpy.log() but it doesn't work with matrices. I can also take logarithm row by row. Then I crush

Is there a well-vectorized way to take the product of all the nonzero elements in each column of a sparse matrix in octave (or matlab) (returning a row-vector of products)?

I have a very big and sparse matrix, represented as a CSV file (67 GB). Is it possible to load and work with this matrix in Matlab? I can use a 64bit version on a MAC OS computer, 8GB RAM. I have read

I am attempting to cluster a set of data points that are represented as a sparse scipy matrix, X. That is, >>> print type(X) <class 'scipy.sparse.csr.csr_matrix'> >>> print X.s

I am using the C++ Eigen 3 library in my program. In particular, I need to multiply two Eigen sparse matrix and store the result into another Eigen sparse matrix. However, I noticed that if the some e

I have built a small code that I want to use for solving eigenvalue problems involving large sparse matrices. It's working fine, all I want to do now is to set some elements in the sparse matrix to ze

I have a matrix of factors in R and want to convert it to a matrix of dummy variables 0-1 for all possible levels of each factors. However this dummy matrix is very large (91690x16593) and very spar

I need to perform a set of operations on a scipy sparse matrix in a Cython method. To efficiently apply these I need access to lil_matrix representation. The lil (linked-list sparse matrix) data repre

What's the best way to find the maximum value and their corresponding row and column indices in a Scipy sparse lil_matrix object ? I can loop through the nonzero entries using itertools.izip, but is t

I really couldn't google it. How to transform sparse matrix to ndarray? Assume, I have sparse matrix t of zeros. Then g = t.todense() g[:10] matrix([[0], [0], [0], [0], [0], [0], [0], [0], [0], [0]])

I am using Scipy to construct a large, sparse (250k X 250k) co-occurrence matrix using scipy.sparse.lil_matrix. Co-occurrence matrices are triangular; that is, M[i,j] == M[j,i]. Since it would be high

What am I doing wrong here? I want to element-wise multiply two sparse matrices using Colt. Here's an example of how I'm attempting to do this: DoubleMatrix2D A = new SparseDoubleMatrix2D(2, 2); A.se

I'm trying to optimize a piece of code that solves a large sparse nonlinear system using an interior point method. During the update step, this involves computing the Hessian matrix H, the gradient g,

I try to create a compressed_matrix using a coordinate_matrix as a builder: #include <boost/numeric/ublas/io.hpp> #include <boost/numeric/ublas/matrix_sparse.hpp> using namespace boost::nu

I would like to multiply single rows of a csr matrix with a scalar. In numpy I would do matrix[indices,:] = x * matrix[indices,:] For csr this raises an exception in scipy. Is there a way to do this

So I have this matrix here, and it is of size 13 x 8198. (I have called it 'blah'). This is a sparse matrix, in that, most of its entries are 0. When I do an imagesc(blah), I get the following image:

I have a large scipy sparse matrix, which is taking up >90% of my total system memory. I would like to save it to disk, as it takes hours to build the matrix... I tried cPickle, but that leads to a

I know there are quite a few good ways to store a sparse matrix without taking up much memory. But I'm wondering whether there is a good way to store a sparse matrix during the construction of it? Her

I have a list of 100k items and each item has a list of indices. I am trying to put this into a boolean sparse matrix for vector multiplication. My code isn't running as fast as I would like, so I am

If I have a matrix like this A = [1 2; 3 4]; I can use interp2 to interpolate it like this newA = interp2(A,2); and I get a 5x5 interpolated matrix. But what if I have a matrix like this: B = zeros(

I'm writing a program that will convert a sparse matrix to Blocked Compressed Row Storage BCRS. I know how to acquire Rowptr, Colind(although not in the code) and A_f. Code: p = 0; for (i = 0; i <

Let's say I have A = [3 0 2; ... 0 0 1; ... 1 1 0] A = sparse(A); The output of which is below: ans = (1,1) 3 (3,1) 1 (3,2) 1 (1,3) 2 (2,3) 1 Question: is there an easy command to generate the fol