I am looking for a way to store about 10 million double-precision floating-point numbers from a sparse matrix. The matrix is a two-dimensional triangular matrix of 1 million by 1 million elements. The element `(i,j)` is the score measure `score(i,j)` between element `i` and element `j`. The storage method must allow very fast access to this information, perhaps by memory-mapping the file containing the matrix. I certainly don't want to load the whole file into memory.

```
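from tables import IsDescription, UInt32Col, FloatCol

class Score(IsDescription):
    grid_i = UInt32Col()  # row index i
    grid_j = UInt32Col()  # column index j
    score = FloatCol()    # score(i, j), double precision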
```

I've tried `pytables` using the `Score` class shown above, but I cannot directly access the element `i,j` without scanning all the rows. Any suggestions?

You should use `scipy.sparse`. Here's some more info about the formats and usage.
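As a minimal sketch of what that could look like, assuming the scores are available as three parallel arrays like the `grid_i`, `grid_j` and `score` columns in the question: build a COO matrix from the triples, convert it to CSR, and index it directly. The small shape and the `get_score` helper are illustrative assumptions, not part of the question.

```python
import numpy as np
from scipy import sparse

n = 1000  # stand-in for the real dimension of 1 million

# Hypothetical sample data: only the upper triangle (i < j) is stored.
grid_i = np.array([0, 0, 2, 5])
grid_j = np.array([1, 7, 3, 9])
score = np.array([0.5, 1.25, 2.0, 0.75])

# COO is convenient for construction; CSR gives fast row-based lookups.
scores = sparse.coo_matrix((score, (grid_i, grid_j)), shape=(n, n)).tocsr()


def get_score(i, j):
    """Look up score(i, j), mapping (i, j) onto the stored upper triangle."""
    if i > j:
        i, j = j, i
    return float(scores[i, j])


print(get_score(0, 7))  # a stored pair
print(get_score(5, 2))  # a pair that was never stored reads as 0.0
```

Pairs that were never stored come back as `0.0`, so if a real score can be zero you would need some other way to tell "missing" from "zero".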

10 million double precision floats take up 80 MB of memory. If you store them in a 1 million x 1 million sparse matrix, in CSR or CSC format, you will need an additional 11 million int32s (10 million indices plus about 1 million row or column pointers, roughly 44 MB), for a total of around 125 MB. That's probably less than 7% of the physical memory in your system. And in my experience, on a system with 4 GB running a 32-bit version of Python, you rarely start having trouble allocating arrays until you try to get hold of ten times that.
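The arithmetic behind that estimate can be spelled out as follows. This assumes 8-byte values, 4-byte indices, and the usual CSR layout of one index per stored value plus one pointer per row and one extra; the variable names are just for illustration.

```python
nnz = 10 * 10**6   # stored scores
n_rows = 10**6     # matrix dimension

data_bytes = nnz * 8             # float64 values
indices_bytes = nnz * 4          # int32 column index per stored value
indptr_bytes = (n_rows + 1) * 4  # int32 row pointers

total_bytes = data_bytes + indices_bytes + indptr_bytes
print('{0:.1f} MB'.format(total_bytes / 1e6))
```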

Run the following code on your computer:

```
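import itertools
import numpy as np

# Try ever larger allocations, 1 MB at a time starting at 100 MB, until one fails.
for j in itertools.count(100):
    try:
        a = np.empty((j * 10**6,), dtype='uint8')
        print('Allocated {0} MB of memory!'.format(j))
        del a
    except MemoryError:
        print('Failed to allocate {0} MB of memory!'.format(j))
        break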
```

And unless it fails to get you at least 4 times the amount calculated above, don't hesitate to keep the whole thing in memory using a `scipy.sparse` format.

I have no experience with pytables, nor much with numpy's `memmap` arrays. But it seems to me that either of those would require you to code the logic for handling the sparsity yourself, something I would avoid unless there is no alternative.
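To give a rough idea of what that hand-written logic might involve, here is a sketch that stores the three CSR arrays in separate `.npy` files, memory-maps them with `np.load(..., mmap_mode="r")`, and does a binary search within one row. The file layout and the `save_csr`, `open_csr` and `lookup` helpers are assumptions made up for this illustration, not an established recipe.

```python
import os
import tempfile

import numpy as np
from scipy import sparse


def save_csr(matrix, prefix):
    """Write the three CSR arrays to separate .npy files (hypothetical layout)."""
    csr = sparse.csr_matrix(matrix)
    csr.sort_indices()  # the binary search in lookup() needs sorted columns per row
    np.save(prefix + "_indptr.npy", csr.indptr)
    np.save(prefix + "_indices.npy", csr.indices)
    np.save(prefix + "_data.npy", csr.data)


def open_csr(prefix):
    """Memory-map the three arrays read-only instead of loading them."""
    return tuple(
        np.load(prefix + "_" + name + ".npy", mmap_mode="r")
        for name in ("indptr", "indices", "data")
    )


def lookup(arrays, i, j):
    """Return the stored value at (i, j), or 0.0 if nothing is stored there."""
    indptr, indices, data = arrays
    start, stop = int(indptr[i]), int(indptr[i + 1])
    pos = start + int(np.searchsorted(indices[start:stop], j))
    if pos < stop and int(indices[pos]) == j:
        return float(data[pos])
    return 0.0


# Tiny demo matrix standing in for the real 1 million x 1 million one.
demo = sparse.coo_matrix(
    (np.array([0.5, 1.25, 2.0]), (np.array([0, 0, 2]), np.array([1, 7, 3]))),
    shape=(10, 10),
)
prefix = os.path.join(tempfile.mkdtemp(), "scores")
save_csr(demo, prefix)
arrays = open_csr(prefix)
print(lookup(arrays, 0, 7))  # a stored element
print(lookup(arrays, 4, 4))  # an element that is not stored
```

Even this small version has to deal with index sorting, empty rows and missing entries by hand, which is the kind of bookkeeping `scipy.sparse` already does for you.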

