MetaChat is an informal place for MeFites to touch base and post, discuss and
chatter about topics that may not belong on MetaFilter. Questions? Check the FAQ. Please note: This is important.
30 December 2005
I need help with math or statistics or something→[More:]
I have several entities. I need to group them, such that the entities that have the greatest "affinity" are "closest" together in "space".
Each entity has a quantitative "affinity" for each other entity. These "affinities" are not symmetric. e.g:
A sends 50% of its "love" to B, 25% to C, and 25% to D,
B sends 10% of its "love" to A, 10% to itself, 50% to C, and the remaining 30% to D, etc.
What's the algorithm or heuristic? It smells NP-complete to me, but then I'm not real bright.
Is there a decent statistical way to "show" the groups by "affinity"?
I assume you want to do this in 2D, in which case it's generally impossible to make a perfect representation. (At the other extreme, if you allow yourself enough dimensions to place the points, it's trivial). So the problem is usually one of minimizing error, rather than finding an exact solution, and you can't speak about algorithmic complexity until you are precise about the quantity to minimize.
Anyway, one way to approach the problem is to define an "unhappiness" function that measures how bad a configuration is. You think of it as a function of many variables (the coordinates of all the points you're trying to position). Initially, you just throw all the dots down wherever, or make some kind of a naive guess at a good start. Then you move stepwise against the gradient until you come to a local minimum.
Another way of modelling it is to, again, plop all the points down in some initial configuration - either randomly, or a reasonable guess. Then let each point exert a force on all the other points proportional to it's "affinity". Let each point move a bit in the direction of the net force acting on it, then recalculate and repeat. There's a popular music-matching applet out there somewhere that takes this approach. Eventually, you detect that the net forces on each object are sufficiently close to zero, and stop the algorithm. Your exit condition also has to allow for the possibility of orbits developing.
This is the point of view taken by many programs that try to draw graphs based on an adjacency matrix, which is a simplified version of your problem.
Clumping is harder than spatial arrangement. You might say that "figure out which of these belong together" is (one of) the holy grail(s) of computer perception.
I think at the moment your problem is not sufficiently well-defined to permit an answer. If you were working by hand, how would you clump the following sets:
1. $n$ objects, all with equal affinities for one another.
2. $n$ objects with a circular ordering; each object has great affinity for its succesor, but no affinity for any other object.
3. $n$ objects with a linear ordering; each object has great affinity for its neighbor(s), but no affinity for any other objects.
Until you can define a measure of what constitutes a "good" clumping, there won't be any progress. If you can devise a suitable measurement, then it's possible to derive a dynamical approach to maximizing the goodness.