Let's Solve: Pairing Points to Minimize Distance

Allen Webster

#20536

February 19, 2019

So I have been thinking about a particular problem and thought it was probably easier to ask here then to search, as the description of the problem doesn't turn up any names that I can use to search more. There's a decent chance that this problem is already theoretically "solved" or shown to be NP-Hard or something, so if you just point me to relevant information that's fine. If you try to offer your own solutions or thoughts, that's great too.

The Problem:
Given two sets of points A and B where the cardinality (size) of set A and B are equal, find a set S of pairs of points meeting the requirements defined below.

In the abstract "points" can be anything with a distance metric d, that is to say we can take two "points" p and q and d(p,q) == 0 if and only if p == q; d(p,q) == d(q,p); and triangle inequality.

Let the "cost" of a single pair (p,q) be d(p,q), that is let the "cost" be defined by the distance metric.

Let the "total cost" of the set S be the sum of the cost of all pairs in S.

1. Each pair in S is made up of one point from A and one point from B.
2. Each point in A is a member of exactly one pair in S.
3. Each point in B is a member of exactly one pair in S.
4. S minimizes the total cost.

A Concrete Example:
To make the problem concrete I'll simply define a skeleton, in C++, of a function that solves the problem:

struct Vec3{
    float x;
    float y;
    float z;
};
struct Pair{
    int i;
    int j;
};
float distance(Vec3 p, Vec3 q){
    p.x -= q.x;
    p.y -= q.y;
    p.z -= q.z;
    return(sqrtf(p.x*p.x + p.y*p.y + p.z*p.z));
}
Pair *minimal_distance_pairing(Vec3 *a, Vec3 *b, int set_count){
    Pair *result = (Pair*)malloc(set_count*sizeof(Pair));
    // TODO: solve the problem
    return(result);
}

In this example we are given two sets of points in 3-dimensional euclidean space, with a euclidean distance metric, and want to pair off all the points while minimizing the total distance of the pairing.

I am interested both in both the euclidean version, and the fully abstract version, so whatever thoughts anyone wants to offer on either variant are welcome!

Edited by Allen Webster on February 20, 2019, 12:32am

Ben Visness

#20537

February 19, 2019

Gut reaction: this sounds a lot like the traveling salesman problem.

Mārtiņš Možeiko

#20538

February 19, 2019

Does rules allow one point to belong to multiple pairs?
For example, if points are positioned like this (on 2D plane):

*1*  *2*
*3*



     *4*
*5*  *6*

And A is {1,3,5}, and B is {2,4,6}, then S should be {1,2} {3,2} {5,4} {5,6}. Is this allowed?

Btw, because distance function is always nonnegative there's really no reason for sqrtf. You can compare squared distance between points and comparison result will be same as actual comparison of distance.

Edited by Mārtiņš Možeiko on February 19, 2019, 11:29pm

Asaf Gartner

#20540

February 20, 2019

Would this work: https://brilliant.org/wiki/hungarian-matching/ | https://en.wikipedia.org/wiki/Hungarian_algorithm ?

Allen Webster

#20541

February 20, 2019

Ben Visness
Gut reaction: this sounds a lot like the traveling salesman problem.

I agree to a certain extent. Basically the set of possible solutions in both cases are permutations of N elements, and they each are minimization problems. But unlike TSP the order of elements doesn't quite matter the same way here. In TSP a pair p,q has the same cost no matter where it is placed. In this problem the cost of p is determined by where in the output permutation it is placed, and not at all by it's neighbor.

So I can start to imagine an optimization where we form an NxN table showing the cost of each possible pairing, then the problem is one of selecting N spots in the matrix with no repeated columns or rows that minimizes the total cost of the N selections.

In the TSP the NxN matrix shows the cost of each possible travel path. You are still traveling into and out of each node once, and therefore still picking N spots in the matrix with no repeated columns or rows (for example, if the ith column encodes the cost of traveling out of node i for each possible destination node j, then travling into and out of each node once is the same as selecting one element from each row and column).

So I can see that both problems easily reduce to this matrix selection minimization problem:
TSP <= MATRIX-SELECTION
PAIRS <= MATIX-SELECTION

But that doesn't really prove anything useful about how hard "PAIRS" is. It may be that the metric version of TSP can reduce to this "PAIRS" problem but I haven't found it so far.

mmozeiko
Does rules allow one point to belong to multiple pairs?
For example, if points are positioned like this (on 2D plane):
1
2
3
4
5
6
7
*1*  *2*
*3*



     *4*
*5*  *6*
And A is {1,3,5}, and B is {2,4,6}, then S should be {1,2} {3,2} {5,4} {5,6}. Is this allowed?

Btw, because distance function is always nonnegative there's really no reason for sqrtf. You can compare squared distance between points and comparison result will be same as actual comparison of distance.

Ahh good question, I did not explain that is clearly as I inteded to. To make it clear I've adjusted rules 2 and 3 to reflect that each point is a member of exactly one pair. Another equivalent rule would be to simply state that the size of S is equal to the size of A and B.

Edited by Allen Webster on February 20, 2019, 12:40am

Allen Webster

#20542

February 20, 2019

AsafG
Would this work: https://brilliant.org/wiki/hungarian-matching/ | https://en.wikipedia.org/wiki/Hungarian_algorithm ?

Aha! That appears to be a generalized solution to a non-metric version of this problem, which makes it a good starting point for me. I'll bet that by taking advantage of the metric property we could possibly do better than O(n^3) and I'm almost certain we can do better than O(n^3) in the euclidean case. Regardless though this is super useful progress, thanks!

Edited by Allen Webster on February 20, 2019, 12:40am

Allen Webster

#20543

February 20, 2019

Okay so I walked away for a few minutes and then something hit me that I just had to post because I made a mistake earlier.

I realized that if my reduction of TSP to the MATRIX-SELECTION problem worked, and the Hungarian Matching algorithm worked, which solves what I was calling the MATRIX-SELECTION problem, then we would have a solution for TSP that runs in O(n^3). Now, we're all smart, but not that smart, so I had to figure out what was wrong with the most suspicious link in the chain my reduction of TSP to what I should have actually called the Hungarian Matching problem.

I said:

In the TSP the NxN matrix shows the cost of each possible travel path. You are still traveling into and out of each node once, and therefore still picking N spots in the matrix with no repeated columns or rows (for example, if the ith column encodes the cost of traveling out of node i for each possible destination node j, then travling into and out of each node once is the same as selecting one element from each row and column).

The problem with this is that it leaves out a special constraint which is that there be only one single cycle through the entire graph. Or in other words that if the jth row is selected in the ith column, the ith row cannot be selected in the jth column, nor can the ith row be selected in the kth column if the kth row is selected in the jth column (assuming N > 3) etc.

In my PAIRS problem and the generalized Hungarian Matching problem any selection in the matrix is legal so long as each row and column are used only once, and that freedom is what makes it easier to solve.

Edited by Allen Webster on February 20, 2019, 1:01am

Jesse

#20577

February 22, 2019

This sounds reducible to the Subset Sums problem where the OPT function is the distance minimization constraint. O(nD), so pseudo polynomial if the reduction isn't in my imagination alone.

On second thought -- is this different than a 'find the closest two points' problem? In Algorithm Design (Wesley 2005), chapter 13.7 offers a randomized algorithm that finds the two closest points in O(n) expected time. Or maybe you wanting this particular problem solved n times -- like a sort on distance?

Edited by Jesse on February 22, 2019, 9:30pm