In Ruby, what is the most efficient way to calculate the bit difference between two unsigned integers (i.e. the Hamming distance)?

E.g., I have integers a = 2323409845 and b = 1782647144.

Their binary representations are:

```
a = 10001010011111000110101110110101
b = 01101010010000010000100101101000
```

The bit difference between a and b is 17.

I can do a bitwise XOR on them, but that gives me another integer (not 17). I would then have to iterate through the binary representation of the result and tally the number of 1s.

What is the most efficient way to calculate the bit difference?

Now, does the answer change for calculating the bit difference of sequences of many ints? E.g. given 2 sequences of unsigned integers:

```
x = {2323409845,641760420,509499086....}
y = {uint,uint,uint...}
```

What is the most efficient way to calculate the bit difference between the two sequences?

Would you iterate through the sequence, or is there a faster way to calculate the difference across the entire sequence at once?

An algorithm of Wegner:

```
def hamm_dist(a, b)
  dist = 0
  val = a ^ b # bits that differ between a and b
  until val.zero?
    dist += 1
    val &= val - 1 # clear the lowest set bit
  end
  dist
end
p hamm_dist(2323409845, 1782647144) # => 17
```
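To see why the loop above runs once per differing bit, here is a small trace of the `val &= val - 1` step. The helper name `clear_lowest_bit_steps` is made up for this illustration; it is only a sketch of the same loop that records each intermediate value.

```ruby
# Hypothetical helper: records val (in binary) before each `val &= val - 1`.
# Each iteration clears exactly one set bit, so the number of recorded
# steps equals the number of 1 bits in the starting value.
def clear_lowest_bit_steps(val)
  steps = []
  until val.zero?
    steps << val.to_s(2)
    val &= val - 1 # subtracting 1 flips the lowest set bit and everything below it
  end
  steps
end

p clear_lowest_bit_steps(0b101100) # => ["101100", "101000", "100000"]
```

Three set bits give three iterations, so the running time depends on the number of differing bits rather than on the width of the integers.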

You can use Ruby's optimized String functions to do the bit counting instead of pure arithmetic. In some quick benchmarking it turned out to be about 6 times faster.

```
def h2(a, b)
  (a ^ b).to_s(2).count("1")
end
```
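For the sequence part of the question, one straightforward option is to pair the elements up and sum the per-pair distances. The sketch below assumes both sequences are Ruby arrays of non-negative integers of equal length; `hamming_seq` is a name chosen here for illustration, and nothing in this thread shows that this is the fastest possible approach.

```ruby
# Hypothetical helper: total bit difference between two equal-length arrays
# of non-negative integers, using the same string-counting trick per pair.
def hamming_seq(xs, ys)
  raise ArgumentError, "sequences must have the same length" unless xs.length == ys.length

  # zip pairs up corresponding elements; inject accumulates the running total
  xs.zip(ys).inject(0) { |total, (a, b)| total + (a ^ b).to_s(2).count("1") }
end

p hamming_seq([2323409845, 0], [1782647144, 7]) # => 20
```

The first pair contributes 17 and the second (0 vs 7, binary 111) contributes 3.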

In the benchmark below, h1 is the usual arithmetic approach, while h2 converts the XOR result to a binary string and counts the "1" characters.

Benchmark:

```
ruby-1.9.2-p180:001:0>> def h1(a, b)
ruby-1.9.2-p180:002:1*> ret = 0
ruby-1.9.2-p180:003:1*> xor = a ^ b
ruby-1.9.2-p180:004:1*> until xor == 0
ruby-1.9.2-p180:005:2*> ret += 1
ruby-1.9.2-p180:006:2*> xor &= xor - 1
ruby-1.9.2-p180:007:2*> end
ruby-1.9.2-p180:008:1*> ret
ruby-1.9.2-p180:009:1*> end
# => nil
ruby-1.9.2-p180:010:0>> def h2(a, b)
ruby-1.9.2-p180:011:1*> (a^b).to_s(2).count("1")
ruby-1.9.2-p180:012:1*> end
# => nil
ruby-1.9.2-p180:013:0>> h1(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:014:0>> h2(2323409845, 1782647144)
# => 17
ruby-1.9.2-p180:015:0>> quickbench(10**5) { h1(2323409845, 1782647144) }
Rehearsal ------------------------------------
2.060000 0.000000 2.060000 ( 1.944690)
--------------------------- total: 2.060000sec
user system total real
1.990000 0.000000 1.990000 ( 1.958056)
# => nil
ruby-1.9.2-p180:016:0>> quickbench(10**5) { h2(2323409845, 1782647144) }
Rehearsal ------------------------------------
0.340000 0.000000 0.340000 ( 0.333673)
--------------------------- total: 0.340000sec
user system total real
0.320000 0.000000 0.320000 ( 0.326854)
# => nil
ruby-1.9.2-p180:017:0>>
```
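The `quickbench` helper used in the session is not shown. Its output looks like that of the standard library's `Benchmark.bmbm`, so a stand-in along the following lines should let you repeat the comparison. This definition is an assumption, not the original helper, and the timings will differ by machine and Ruby version.

```ruby
require 'benchmark'

# Arithmetic version: one loop iteration per set bit in a ^ b.
def h1(a, b)
  ret = 0
  xor = a ^ b
  until xor == 0
    ret += 1
    xor &= xor - 1
  end
  ret
end

# String version: count "1" characters in the binary representation.
def h2(a, b)
  (a ^ b).to_s(2).count("1")
end

# Hypothetical stand-in for `quickbench`: runs the block n times inside
# Benchmark.bmbm, which does a rehearsal pass before the measured pass.
def quickbench(n, &block)
  Benchmark.bmbm do |bm|
    bm.report { n.times(&block) }
  end
end

quickbench(10**4) { h1(2323409845, 1782647144) }
quickbench(10**4) { h2(2323409845, 1782647144) }
```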

Per the suggestion of mu is too short, I wrote a simple C extension that uses __builtin_popcount, and verified with benchmark that it is at least 3x faster than Ruby's optimized string functions.

I looked at the following two tutorials:

In my program:

```
require './FastPopcount/fastpopcount.so'
include FastPopcount

def hamming(a, b)
  popcount(a ^ b)
end
```

Then, in the directory containing my program, I create a folder "FastPopcount" (the name used in the require path) with the following files.

extconf.rb:

```
# Loads mkmf which is used to make makefiles for Ruby extensions
require 'mkmf'
# Give it a name
extension_name = 'fastpopcount'
# The destination
dir_config(extension_name)
# Do the work
create_makefile(extension_name)
```

popcount.c:

```
// Include the Ruby headers and goodies
#include "ruby.h"
// Defining a space for information and references about the module to be stored internally
VALUE FastPopcount = Qnil;
// Prototype for the initialization method - Ruby calls this, not you
void Init_fastpopcount(void);
// Prototype for our method 'popcount' - methods are prefixed by 'method_' here
VALUE method_popcount(VALUE self, VALUE num);
// The initialization method for this module
void Init_fastpopcount(void) {
  FastPopcount = rb_define_module("FastPopcount");
  // Arity 1: Ruby calls the function with the receiver plus one argument
  rb_define_method(FastPopcount, "popcount", method_popcount, 1);
}
// Our 'popcount' method: uses the GCC/Clang builtin, which counts bits in a 32-bit unsigned int
VALUE method_popcount(VALUE self, VALUE num) {
  return INT2NUM(__builtin_popcount(NUM2UINT(num)));
}
```

Then, in that directory, run:

ruby extconf.rb && make

Then run the program, and there you have it: the fastest way I found to do Hamming distance in Ruby.
