I need to perform massive similarity computations between vectors stored in a sparse matrix. Which is currently the better tool for this task: `scipy.sparse` or `pandas`?

After some research I found that both pandas and SciPy have structures to represent sparse matrices efficiently in memory. However, neither has out-of-the-box support for computing similarities between vectors, such as cosine, adjusted cosine, or Euclidean distance. SciPy supports these on *dense* matrices only. For sparse matrices, SciPy supports dot products and other basic linear-algebra operations.
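That said, cosine similarity can be built directly from the sparse operations SciPy *does* provide: L2-normalize each row, then take the sparse dot product of the matrix with its transpose. Here is a minimal sketch of that approach (note that `sklearn.metrics.pairwise.cosine_similarity` implements exactly this and accepts sparse input directly, so in practice you may not need to roll your own):

```python
import numpy as np
import scipy.sparse as sp

def sparse_cosine_similarity(m):
    """All-pairs cosine similarity between the rows of a sparse matrix,
    computed with sparse dot products only (no densification)."""
    m = sp.csr_matrix(m, dtype=np.float64)
    # Row-wise L2 norms; guard against division by zero for empty rows.
    norms = np.asarray(np.sqrt(m.multiply(m).sum(axis=1))).ravel()
    norms[norms == 0] = 1.0
    # Scale each row by 1/norm via a sparse diagonal matrix.
    normalized = sp.diags(1.0 / norms) @ m
    # The Gram matrix of the normalized rows is the cosine similarity.
    return normalized @ normalized.T  # still sparse; densify only if small

rows = sp.csr_matrix(np.array([[1.0, 0.0, 2.0],
                               [2.0, 0.0, 4.0],
                               [0.0, 3.0, 0.0]]))
sim = sparse_cosine_similarity(rows).toarray()
# Rows 0 and 1 are parallel (similarity 1); rows 0 and 2 are orthogonal (0).
```

The key point is that the result stays sparse throughout, so this scales to matrices where a dense pairwise-similarity computation would not fit in memory.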
