Euclidean distance between two objects

Markus G. :

First of all I know what the Euclidean distance is and what it does or calculates between two vectors.

But my question is about how to calculate the distance between two class objects, for example in Java or any other OOP language. I have read quite a lot about machine learning and have already written a classifier using libraries, but I want to know how the Euclidean distance is calculated when I have, for example, this object:

class Object{
    String name;
    Color color;
    int price;
    int anotherProperty;
    double something;
    List<AnotherObject> another;
}

What I already know (if I am not wrong!) is that I have to convert this object to a vector/array representing its properties, or 'features' (as they are called in machine learning?).

But how can I do this? This is the one piece of the puzzle I am still missing.

Do I have to collect all possible values for a property to convert it to a number and write it in the array/vector?

Example:

I guess the above object would be represented by a 6-dimensional array, or a smaller one depending on which 'features' are needed for the calculation. Let's say color, name and price are the necessary features; the array/vector would be based on the following data:

  • color: green (Let's say an enum with 5 possible values where green is the third one)
  • name: "foo" (I would not know how to convert this one; maybe by adding up the ASCII codes?)
  • price: 14 (Just take the integer?)

would look like this?

[3,324,14]

And if I do this with every Object from the same class I am able to calculate the Euclidean distance. Am I right or did I misunderstand something, or is it completely wrong?

rghome :

For each data type you need to choose an appropriate method of determining the distance. In many cases a single field may itself have to be treated as a vector.

For colour, for example, you could express the colour as an RGB value and then take the Euclidean distance (take the 3 differences, square them, sum them and then take the square root). You might want to choose a different colour space than RGB (e.g., HSI). See here: Colour Difference.
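A minimal sketch of the RGB variant, assuming each colour is given as an {r, g, b} triple of ints in the range 0..255. The class and method names are made up for illustration and do not come from any library:

```java
// Hypothetical helper (not a library class): Euclidean distance in RGB space.
final class ColourDistance {

    /** Each argument is an {r, g, b} triple with components in 0..255. */
    static double rgbDistance(int[] a, int[] b) {
        double sum = 0.0;
        for (int i = 0; i < 3; i++) {
            double diff = a[i] - b[i]; // difference per channel
            sum += diff * diff;        // square and accumulate
        }
        return Math.sqrt(sum);         // square root of the sum
    }

    public static void main(String[] args) {
        int[] green = {0, 255, 0};
        int[] red = {255, 0, 0};
        System.out.println(rgbDistance(green, red));
    }
}
```

The same loop works for any other three-component colour space, as long as both colours are expressed in it.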

Comparing two strings is easier: a common method is the Levenshtein distance. There is a method for it in the Apache Commons StringUtils class.
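If you would rather not pull in a library, the Levenshtein distance can be sketched in plain Java as below. This is the usual dynamic-programming formulation with two rows; the class name is an assumption for this example and it is not the Apache Commons implementation:

```java
// Hypothetical helper (not the Apache Commons code): Levenshtein edit distance.
final class Levenshtein {

    /** Minimum number of single-character insertions, deletions and substitutions. */
    static int distance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];

        // Turning the empty prefix of s into the first j characters of t takes j insertions.
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }

        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i; // deleting the first i characters of s
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                int insertion = curr[j - 1] + 1;
                int deletion = prev[j] + 1;
                int substitution = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insertion, deletion), substitution);
            }
            // Reuse the two rows instead of allocating a full matrix.
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    public static void main(String[] args) {
        System.out.println(distance("kitten", "sitting")); // prints 3
    }
}
```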

Numbers - just take the difference.

Every type will require some consideration of the best way of either generating a distance directly or calculating a numeric value that can then be subtracted to give a "distance".

Once you have a vector of all of the "values" of all of the fields for each object you can calculate the Euclidean distance (square the differences, sum them and take the square root of the sum).

In your case, if you have:

object 1: [3,324,14]
object 2: [5,123,10]

The Euclidean distance is:

sqrt( (3-5)^2 + (324-123)^2 + (14-10)^2 )
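The same calculation as a small Java sketch, assuming both objects have already been converted to double arrays of equal length (the class and method names are made up for this example):

```java
// Hypothetical helper: Euclidean distance between two feature vectors.
final class VectorDistance {

    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("vectors must have the same length");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    public static void main(String[] args) {
        double[] object1 = {3, 324, 14};
        double[] object2 = {5, 123, 10};
        // sqrt(4 + 40401 + 16) = sqrt(40421), roughly 201.05
        System.out.println(euclidean(object1, object2));
    }
}
```

Note that the name component (324 vs 123) contributes 40401 of the 40421 under the square root, so with this encoding it decides almost the whole result.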

But in the case of comparing strings, the Levenshtein algorithm gives you the distance directly without intermediate numbers for the fields.
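Putting the pieces together, one possible way to do this is to compute a distance per field and then combine the per-field distances like the components of a vector. The sketch below assumes a hypothetical Item class with a name, an RGB colour and a price; none of these names come from the question or from a library, and the fields are left unscaled, which you may well want to change:

```java
// Hypothetical class standing in for the object from the question.
final class Item {
    final String name;
    final int[] rgb; // {r, g, b}, each component in 0..255
    final int price;

    Item(String name, int[] rgb, int price) {
        this.name = name;
        this.rgb = rgb;
        this.price = price;
    }
}

// Hypothetical helper: one distance per field, combined Euclidean-style.
final class FieldwiseDistance {

    // Euclidean distance between two RGB triples.
    static double colourDistance(int[] a, int[] b) {
        double sum = 0.0;
        for (int i = 0; i < 3; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Levenshtein distance (two-row dynamic programming).
    static int nameDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    // Square each per-field distance, sum them and take the square root.
    static double distance(Item a, Item b) {
        double dColour = colourDistance(a.rgb, b.rgb);
        double dName = nameDistance(a.name, b.name);
        double dPrice = a.price - b.price;
        return Math.sqrt(dColour * dColour + dName * dName + dPrice * dPrice);
    }
}
```

For example, two items that differ only in price (14 vs 10) end up at distance 4.0, and two items that differ only by one letter in the name end up at distance 1.0.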
