I've read several topics, but I'm still lost. I'm quite new to this. I want to store a huge sparse matrix; I have several ideas but can't choose between them. Here are my needs:

- Adjacency matrix of approx. 50 million vertices.
- Maximum number of neighbors per vertex: approx. 10,000.
- Average number of neighbors per vertex: approx. 200-300.
- Fast row queries - a vector will be multiplied by this matrix.
- O(1) complexity for adding an edge.
- Most probably, edges will never be deleted.
- Enumeration of the vertices adjacent to v - as fast as possible.
- Portability - there must be a way to transfer the database from one computer to another.
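For reference, the complexity requirements above map naturally onto a plain adjacency-list structure. A minimal in-memory Python sketch (purely illustrative - at 50 million vertices with ~250 neighbors each you would need one of the on-disk variants below):

```python
from collections import defaultdict

# Minimal in-memory adjacency list: O(1) amortized edge insertion,
# O(deg(v)) neighbor enumeration, O(deg(v)) per-row multiplication.
class Graph:
    def __init__(self):
        self.adj = defaultdict(list)  # vertex -> list of neighbors

    def add_edge(self, u, v):
        self.adj[u].append(v)  # O(1) amortized

    def neighbors(self, v):
        return self.adj[v]  # O(1) to obtain, O(deg(v)) to scan

    def row_times_vector(self, v, x):
        # One row of an unweighted adjacency matrix times a vector:
        # the sum of x over the neighbors of v.
        return sum(x[u] for u in self.adj[v])

g = Graph()
g.add_edge(0, 1)
g.add_edge(0, 2)
g.add_edge(1, 2)
x = [1.0, 2.0, 3.0]
print(g.neighbors(0))            # [1, 2]
print(g.row_times_vector(0, x))  # 5.0
```

The whole question is essentially how to get these costs while also getting persistence and portability.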

So, here are my ideas:

- A huge table of (row, col) pairs. Very simple, but enumerating the neighbors of a vertex takes at least O(log N), where N is the size of the table - quite slow, I think. The table also has to be indexed. Any RDBMS would work for this.
- An enormous number of lists: one list per vertex. Very fast enumeration, but wouldn't it take a lot of resources to store them? Also, I'm not sure which DBMS to use in this case: maybe some NoSQL?
- A huge table (row | set of cols). A combination of the two above. I'm not sure whether any RDBMS supports arbitrary sets. Do you know one? Maybe NoSQL would be useful here?
- A collection of adjacency lists. Any RDBMS would be suitable for that, and the costs in terms of complexity are good, but performance can be killed by making multiple DB requests per vertex.
- HDF5 - I think it will be slow due to I/O.
- Neo4j - as far as I understand, it stores data in doubly linked lists, so it would be practically the same as option 4, am I right?
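Idea 1 is easy to prototype. Here is a hedged sketch using Python's built-in sqlite3 as a stand-in for whatever RDBMS you would actually pick; the schema, index, and neighbor query carry over almost verbatim, and the index is what turns neighbor enumeration into a range scan instead of a full table scan:

```python
import sqlite3

# Idea 1: a (row, col) pair table with an index on row.
# sqlite3 is used here only for illustration; any RDBMS works the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edges (row INTEGER, col INTEGER)")
# The index makes "neighbors of v" an O(log N + deg(v)) range scan.
conn.execute("CREATE INDEX idx_edges_row ON edges (row)")

conn.executemany("INSERT INTO edges VALUES (?, ?)",
                 [(0, 1), (0, 2), (1, 2), (2, 0)])

# Enumerate the neighbors of vertex 0.
neighbors = [c for (c,) in conn.execute(
    "SELECT col FROM edges WHERE row = ? ORDER BY col", (0,))]
print(neighbors)  # [1, 2]
```

Inserting an edge is a single indexed INSERT, which is as close to O(1) as a B-tree gets (O(log N) in the worst case).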

Please help me choose, or suggest a better solution.

If any of my estimates are off, please correct me.

A hybrid Neo4j / HBase approach may work well, in which Neo4j handles the graph-processing aspects while HBase does the heavy lifting scalability-wise - e.g., storing lots of extra attributes.

Neo4j holds the nodes and relationships. It may well be enough scalability-wise on its own: independent (non-Neo4j) sources I found on the web claim up to several billion nodes/relationships on a single machine, with traversal performance a couple of orders of magnitude better than an RDBMS.

But in case more scalability is needed, you can bring in the HBase big iron to store extra attributes beyond the node/relationship identifiers. Then simply add the HBase row key to the Neo4j node's properties so the application can look the attributes up when needed.

In the end, I implemented solution number one.

I used PostgreSQL with two tables: one for edges, with two columns (start/end vertex), and one for vertices, with a unique serial as the vertex number and some columns for the vertex description.

I implemented an upsert based on pg_advisory_xact_lock. It was a bit slow, but it was enough for me.
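The exact SQL isn't shown above, so here is a hedged sketch of the usual shape of such an upsert ("return the id for a label, inserting it if missing"). sqlite3 stands in for PostgreSQL, table and column names are illustrative, and the advisory-lock part is only described in comments - in sqlite the single-writer model makes it unnecessary:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE vertices (
    id INTEGER PRIMARY KEY AUTOINCREMENT,  -- plays the role of a Postgres SERIAL
    label TEXT UNIQUE,
    description TEXT)""")

def upsert_vertex(conn, label):
    # In PostgreSQL, this select/insert pair would run inside a transaction
    # that first takes pg_advisory_xact_lock(hash(label)), so two concurrent
    # callers cannot both insert the same label.
    row = conn.execute("SELECT id FROM vertices WHERE label = ?",
                       (label,)).fetchone()
    if row:
        return row[0]
    cur = conn.execute("INSERT INTO vertices (label) VALUES (?)", (label,))
    return cur.lastrowid

a = upsert_vertex(conn, "alice")
b = upsert_vertex(conn, "bob")
again = upsert_vertex(conn, "alice")
print(a, b, again)  # 1 2 1 -- repeated upserts return the same id
```

The advisory lock is what makes the naive select-then-insert safe under concurrency; the lock is released automatically when the transaction ends.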

Also, deleting a vertex is a pain in this configuration.

To speed up the multiplication, I exported the edges table to a file. On an x64 machine it can even be kept entirely in RAM.
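The multiply-from-a-flat-file step can be sketched as follows. The file format (one "start end" pair per line) is an assumption - any flat dump of the edges table works the same way:

```python
import io

# io.StringIO stands in for the exported edges file.
edge_file = io.StringIO("0 1\n0 2\n1 2\n2 0\n")

edges = [tuple(map(int, line.split())) for line in edge_file]

def matvec(edges, x, n):
    """y = A @ x for an unweighted adjacency matrix given as edge pairs."""
    y = [0.0] * n
    for u, v in edges:
        y[u] += x[v]
    return y

x = [1.0, 2.0, 3.0]
print(matvec(edges, x, 3))  # [5.0, 3.0, 1.0]
```

One linear pass over the edge list per multiplication, which is exactly the O(nnz) cost you'd get from a CSR/COO sparse-matrix library.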

To be fair, there was less data than I expected: instead of 50 million vertices with an average of 200-300 edges each, there were only 7 million vertices and 160 million edges in total.
