Feedback

Hi Ricky,

  • I did not respond to Deadpoolz2 :P
  • I guess I kept my page a little informal to keep things interesting and the readers engaged. Don't know if that worked well with everyone, but that was my thinking behind it.
  • So the questions are the attributes and the answers to these questions are my values between which I am calculating the distance. Since the attributes are categorical, my Euclidean distance(in K-means) works like: if they have the same answer, distance is 1; 0 if they have different answers.
  • I am not defining the error, this is how OkCupid finds your true percentage. The error is with regards to finding the true percentage, not with my experimental error.
  • So initially I am clustering the data and then checking if the rest of data is actually falling in the clusters. I tried with just training as well; they gave me the same clusters (so that's positive)
  • In my first set of experiments, I used just four of the most popular attributes (which everyone had answered, i.e. no missing data). I have extended it to incorporate more attributes (as I did in my presentation and my wikipage now).
  • With better granularity on the data, with more attributes with higher variance we can get better clusters. Since in the experiment with 4 attributes those questions were answered almost same by everyone.

Ritika

RitikaJain (talk)06:55, 23 April 2016