Feedback
Hey Ritika,
- I see Deadpoolz2 sent you a message. Did you ever respond? ;)
- Seeing how you go about understanding OkCupid is actually quite interesting haha. (Though this seems more like a blog post than a wiki page.)
- I get that your features are the answers to four questions, but how are you defining the distance measure between people?
- How are you defining error? I don't see why clustering would require an error function.
- Why is there a need for a test set if the model doesn't have a method for validation? Am I missing something?
- Why did you choose to only use four questions? Would be more interesting to use more, no?
- From your results, it actually seems like k-means is incapable of extracting clusters, since all it does is basically output one giant cluster.
TianQiChen (talk)
Hi Ricky,
- I did not respond to Deadpoolz2 :P
- I guess I kept my page a little informal to keep things interesting and the readers engaged. Don't know if that worked well with everyone, but that was my thinking behind it.
- So the questions are the attributes, and the answers to these questions are the values between which I am calculating the distance. Since the attributes are categorical, my Euclidean distance (in k-means) works like this: if they have the same answer, the distance is 1; if they have different answers, it is 0.
- I am not defining the error; this is how OkCupid finds your true percentage. The error is with regard to finding the true percentage, not my experimental error.
- So initially I am clustering the data and then checking whether the rest of the data actually falls into the clusters. I tried with just the training data as well; it gave me the same clusters (so that's positive).
- In my first set of experiments, I used just four of the most popular attributes (which everyone had answered, i.e. no missing data). I have extended it to incorporate more attributes (as I did in my presentation and my wikipage now).
- With finer granularity in the data and more attributes with higher variance, we can get better clusters. In the experiment with 4 attributes, those questions were answered almost the same way by everyone, which is why the result was essentially one big cluster.
Ritika
RitikaJain (talk)
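
For readers following the distance question above, here is a minimal sketch of a per-question 0/1 comparison on categorical answers. It assumes the common mismatch convention (same answer contributes 0, different answer contributes 1), so that two identical profiles end up at distance zero; the reply above describes the indicator the other way round, which would measure agreement rather than distance. The function name `mismatch_distance` and the sample answers are hypothetical and are not taken from the actual experiment or from OkCupid data.

```python
from typing import Sequence


def mismatch_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Count the questions on which two users gave different answers."""
    if len(a) != len(b):
        raise ValueError("both users must answer the same set of questions")
    # Each question contributes 0 if the answers match and 1 if they differ.
    return sum(x != y for x, y in zip(a, b))


# Hypothetical answers to four questions (made-up values for illustration).
user_a = ["yes", "no", "sometimes", "yes"]
user_b = ["yes", "yes", "sometimes", "no"]
user_c = ["yes", "no", "sometimes", "yes"]

d_ab = mismatch_distance(user_a, user_b)  # users differ on two questions
d_ac = mismatch_distance(user_a, user_c)  # identical answers
```

With only four questions that almost everyone answers the same way, most pairs of users sit at distance 0 or 1 under this measure, which is consistent with the single large cluster discussed above. Standard k-means also averages coordinates, so with categorical answers the centroid step is usually replaced by a per-question mode (the k-modes variant); whether that was done in the experiment is not stated here.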