Assume two sets (unordered, no duplicate elements):

```
A = set(["z", "x", "c"])
B = set(["x", "z", "d", "e"])
```

These sets share two elements, "z" and "x", and each has set-specific elements: "c" in A, and "d" and "e" in B.

How can you compute a score for the two sets, analogous to a string distance, while:

- disregarding the ordering of elements, and
- imposing the no-duplicate constraint on each individual set?

As you can see in the example, the size of each set can be different.

The non-critical requirements for this algorithm are:

- Insertion > Deletion if possible (a set that lacks an element should cost more than one that has an element too many); otherwise just INS = DEL
- Swap: 0 (no cost, since ordering has no effect on distance)

For now I have been calculating a set distance score:

```
from math import sqrt
common = A & B  # set intersection
score_A = float(len(common)) / len(A)
score_B = float(len(common)) / len(B)
quadratic_score = sqrt(score_A * score_B)  # geometric mean of the two ratios
```

How would you recommend approaching this problem or improving my solution?

Are there any algorithms that allow specification of costs?

Right now I am about to define a simple algebra for set modification:

```
def calculate_distance(a, b, insertion_cost=1, deletion_cost=1):
    """
    Virtually, a programmer-friendly set-minus.
    @return the distance from A to B, mind that this is not
    a commutative operation.
    """
    score = 0
    for e in a:
        if e not in b:  # implies deletion from A
            score += deletion_cost
    for e in b:
        if e not in a:  # implies insertion into A
            score += insertion_cost
    return score
```

How can I normalize this value and against what?

How about the size of the set intersection over the size of the larger set? So:

```
float(len(A.intersection(B)))/max(len(A),len(B))
```

This gives you a number in the range 0.0 to 1.0, which is often desirable: 1.0 represents full equality and 0.0 represents nothing in common.


Assuming the OP is asking for a "distance", I think it should be **0** when the two sets are identical, in line with the general requirements of a distance function.

It would also be good for it to satisfy *symmetry* and the *triangle inequality*.

*Symmetry* is intuitive (d(A,B) = d(B,A)), and the *triangle inequality* means d(A,C) ≤ d(A,B) + d(B,C).

I suggest something like:

```
from math import sqrt
C = A.intersection(B)
Distance = sqrt(len(A-C)*2 + len(B-C)*2)
```

However, I don't know how to prove the *triangle inequality* for it yet.
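For what it's worth, taken literally (with `*2`, not a square), the expression simplifies to sqrt(2 · |A Δ B|), where A Δ B is the symmetric difference. The size of the symmetric difference satisfies the triangle inequality, since any element of A Δ C must lie in A Δ B or in B Δ C, and the square root is subadditive, so the inequality carries over. The brute-force check below over all subsets of a small universe is only a sanity check, not a proof; the helper names `distance` and `all_subsets` are made up for illustration.

```python
from itertools import combinations, product
from math import sqrt


def distance(a, b):
    """The suggested distance, taken literally: sqrt(2 * |a symmetric-difference b|)."""
    c = a & b
    return sqrt(len(a - c) * 2 + len(b - c) * 2)


def all_subsets(universe):
    """Return every subset of `universe` as a frozenset."""
    items = list(universe)
    return [frozenset(combo)
            for r in range(len(items) + 1)
            for combo in combinations(items, r)]


subsets = all_subsets("zxcd")  # all 16 subsets of a 4-element universe

# Collect every triple that breaks d(A, C) <= d(A, B) + d(B, C).
# The small tolerance guards against floating-point rounding.
violations = [
    (a, b, c)
    for a, b, c in product(subsets, repeat=3)
    if distance(a, c) > distance(a, b) + distance(b, c) + 1e-12
]
print(len(violations))
```

An empty `violations` list on a small universe supports the argument above, but the argument itself is what establishes the inequality in general.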

To normalize the result of the OP's updated function, just do `score = float(score) / (len(a) + len(b))`, which (with the default costs of 1) will give you 1 when `a` doesn't intersect `b`, and 0 when `a == b`.
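With non-unit costs, one option (an assumption here, not something stated in the question) is to divide by the largest score the function can return for the given set sizes. That maximum is reached when the sets are disjoint: `deletion_cost * len(a) + insertion_cost * len(b)`. A minimal sketch, using the hypothetical name `normalized_distance` and assuming positive costs:

```python
def normalized_distance(a, b, insertion_cost=1.0, deletion_cost=1.0):
    """Cost-weighted set distance scaled to [0, 1] (hypothetical helper).

    Assumes positive costs. 0.0 means a == b; 1.0 means a and b are disjoint.
    """
    a, b = set(a), set(b)
    # Elements only in a must be deleted; elements only in b must be inserted.
    score = deletion_cost * len(a - b) + insertion_cost * len(b - a)
    # Worst case: nothing is shared, so every element is deleted or inserted.
    worst = deletion_cost * len(a) + insertion_cost * len(b)
    # Two empty sets are treated as identical (a convention assumed here).
    return float(score) / worst if worst else 0.0


A = set(["z", "x", "c"])
B = set(["x", "z", "d", "e"])
print(normalized_distance(A, B))                      # 3 / 7
print(normalized_distance(A, B, insertion_cost=2.0))  # 5 / 11
```

Like the original function, this is not symmetric once the two costs differ, so swapping the arguments can change the result.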

This answer is of course out of date with respect to the question, but hopefully will be picked up by any future visitors.

Use the Jaccard distance: the cardinality (size) of the symmetric difference between the two sets, divided by the cardinality of their union. In other words, union minus intersection, all divided by union.

This assumes that the elements can be compared in a discrete fashion, i.e. they are equal or not. A desirable property is that the Jaccard distance is a metric.
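As a minimal sketch of that definition, applied to the sets from the question: the function name `jaccard_distance` is made up here, and returning 0.0 for two empty sets is an assumed convention, since the ratio is undefined in that case.

```python
def jaccard_distance(a, b):
    """Jaccard distance: |a symmetric-difference b| / |a union b|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty: treated as identical (assumed convention).
        return 0.0
    return len(a ^ b) / float(len(union))


A = set(["z", "x", "c"])
B = set(["x", "z", "d", "e"])
print(jaccard_distance(A, B))  # 3 differing elements out of 5 in the union
```

The result is 0.0 for identical sets and 1.0 for disjoint non-empty sets, and it is symmetric in its arguments, so it does not distinguish insertions from deletions.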
