Feedback

Hi Ricky,

I did not respond to Deadpoolz2 :P
I guess I kept my page a little informal to keep things interesting and the readers engaged. Don't know if that worked well with everyone, but that was my thinking behind it.
So the questions are the attributes, and the answers to those questions are the values between which I am calculating the distance. Since the attributes are categorical, my Euclidean distance (in K-means) works like this: if they have the same answer, the distance is 0; if they have different answers, it is 1.
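To make that concrete, here is a rough sketch of a distance over categorical answers. It uses the usual mismatch convention (0 for the same answer, 1 for a different one); flipping the two values would give a similarity score rather than a distance. The function name and the sample answers are made up for illustration and are not taken from my actual experiment code.

```python
from math import sqrt

def mismatch_distance(answers_a, answers_b):
    """Euclidean distance over per-question 0/1 mismatches (illustrative helper)."""
    if len(answers_a) != len(answers_b):
        raise ValueError("both people must have answered the same questions")
    # Each question contributes 0 if the answers match and 1 if they differ.
    mismatches = [0 if a == b else 1 for a, b in zip(answers_a, answers_b)]
    return sqrt(sum(m * m for m in mismatches))

# Hypothetical answers to four questions (assumed values, not real data).
user_a = ["yes", "no", "often", "city"]
user_b = ["yes", "yes", "often", "rural"]

# The two people differ on two of the four questions.
d = mismatch_distance(user_a, user_b)
```

Because every term is 0 or 1, the squared distance is just the number of questions on which the two people disagree.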
I am not defining the error myself; it comes from how OkCupid finds your true percentage. The error refers to finding the true percentage, not to my experimental error.
So initially I am clustering the data and then checking whether the rest of the data actually falls into the clusters. I also tried with just the training data; that gave me the same clusters (so that's positive).
In my first set of experiments, I used just four of the most popular attributes (which everyone had answered, i.e. no missing data). I have extended it to incorporate more attributes (as I did in my presentation and my wikipage now).
With better granularity in the data, and with more attributes that have higher variance, we can get better clusters, since in the experiment with four attributes those questions were answered almost the same way by everyone.

Ritika