I need to store word co-occurrence counts in several 14000x10000 matrices. Since I know the matrices will be sparse and I do not have enough RAM to store all of them as dense matrices, I am storing them as scipy.sparse matrices.

I have found the most efficient way to gather the counts to be using Counter objects. Now I need to transfer the counts from the Counter objects to the sparse matrices, but this takes too long. It currently takes on the order of 18 hours to populate the matrices.

The code I'm using is roughly as follows:

```
# word_counts: scipy.sparse.lil_matrix; word_counters: dict of Counter objects
for word_ind1 in range(len(wordlist1)):
    for word_ind2 in range(len(wordlist2)):
        word_counts[word_ind2, word_ind1] = word_counters[wordlist1[word_ind1]][wordlist2[word_ind2]]
```

Here `word_counts` is a `scipy.sparse.lil_matrix` object, `word_counters` is a dictionary of counters, and `wordlist1` and `wordlist2` are lists of strings.

Is there any way to do this more efficiently?

You're using LIL matrices, which (unfortunately) have a linear-time insertion algorithm. Therefore, constructing them this way takes quadratic time. Try a DOK matrix instead; DOK matrices use hash tables for storage.
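The DOK suggestion can be sketched as follows. This is a minimal sketch, assuming `word_counters` maps each word in `wordlist1` to a `collections.Counter` keyed by words from `wordlist2`, as the question describes; the helper name `counters_to_sparse` and the toy data are hypothetical. As an extra choice not stated in the answer, it loops only over the entries each Counter actually holds, so zero cells are never visited, and it converts to CSR at the end.

```python
from collections import Counter

import numpy as np
from scipy.sparse import dok_matrix


def counters_to_sparse(word_counters, wordlist1, wordlist2):
    """Copy Counter contents into a (len(wordlist2), len(wordlist1)) sparse matrix.

    Hypothetical helper. Row index comes from wordlist2 and column index from
    wordlist1, matching the layout of the question's loop. Assumes wordlist2
    contains no duplicate words.
    """
    # Map each word to its row index once, instead of searching the list repeatedly.
    index2 = {w: i for i, w in enumerate(wordlist2)}

    # DOK stores entries in a hash table, so each assignment is O(1) on average.
    word_counts = dok_matrix((len(wordlist2), len(wordlist1)), dtype=np.int64)

    for word_ind1, word1 in enumerate(wordlist1):
        counter = word_counters.get(word1, Counter())
        # Visit only the counts actually stored in this Counter.
        for word2, count in counter.items():
            word_ind2 = index2.get(word2)
            if word_ind2 is not None and count:
                word_counts[word_ind2, word_ind1] = count

    # CSR is generally better suited than DOK for arithmetic and row slicing.
    return word_counts.tocsr()


# Hypothetical toy data in the shape described by the question.
wordlist1 = ["cat", "dog"]
wordlist2 = ["sat", "ran", "mat"]
word_counters = {
    "cat": Counter({"sat": 3, "mat": 1}),
    "dog": Counter({"ran": 2}),
}
M = counters_to_sparse(word_counters, wordlist1, wordlist2)
print(M.toarray())  # [[3 0] [0 2] [1 0]]
```

Skipping absent pairs matters at this scale: a 14000x10000 double loop visits 140 million cells, while the Counters only hold the pairs that actually occurred.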

However, if you're interested in boolean term occurrences, computing the co-occurrence matrix is much faster if you have a sparse term-document matrix. Let `A` be such a matrix of shape `(n_documents, n_terms)`; then the co-occurrence matrix is

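```
# '@' is matrix multiplication for both scipy.sparse matrices and sparse arrays
# ('*' is elementwise for the newer sparse array classes)
A.T @ A
```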
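A small sketch of the term-document approach follows. The documents, the vocabulary handling, and the helper name `cooccurrence_from_docs` are hypothetical. It builds a binary `(n_documents, n_terms)` CSR matrix `A` with `scipy.sparse.csr_matrix` and computes `A.T @ A`. Entry `(i, j)` then counts the documents that contain both term `i` and term `j`, and the diagonal holds each term's document frequency.

```python
import numpy as np
from scipy.sparse import csr_matrix


def cooccurrence_from_docs(docs, vocabulary):
    """Return the (n_terms, n_terms) document co-occurrence matrix A.T @ A.

    Hypothetical helper: docs is a list of token lists, vocabulary a list of terms.
    """
    term_index = {t: i for i, t in enumerate(vocabulary)}
    rows, cols = [], []
    for doc_id, tokens in enumerate(docs):
        # set() makes occurrences boolean: a term counts once per document.
        for term in set(tokens):
            if term in term_index:
                rows.append(doc_id)
                cols.append(term_index[term])
    data = np.ones(len(rows), dtype=np.int64)
    A = csr_matrix((data, (rows, cols)), shape=(len(docs), len(vocabulary)))
    # Sparse-sparse product; only nonzero pairs are ever materialized.
    return A.T @ A


docs = [["cat", "sat", "mat"], ["dog", "sat"], ["cat", "dog", "cat"]]
vocab = ["cat", "dog", "sat", "mat"]
C = cooccurrence_from_docs(docs, vocab)
print(C.toarray())
```

The result is symmetric, so if memory is tight you could keep only one triangle (for example with `scipy.sparse.triu`).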
