Course talk:CPSC522/Analyzing online dating trends with Weka

From UBC Wiki

Contents

Thread title | Replies | Last modified
Critique | 1 | 06:59, 23 April 2016
Comments | 1 | 06:57, 23 April 2016
Feedback | 1 | 06:55, 23 April 2016
Suggestions | 2 | 01:18, 23 April 2016

Hi Ritika, a very nice piece of work indeed. The only part that took some time to understand was the figures; I think they need more explanation to make them clear.

TanujKrAasawat (talk)02:49, 21 April 2016

Hi Tanuj,
I'm glad you liked my page. Thank you so much for your feedback. I have tried to elaborate on the figures to make them easier for the reader to understand.
Thank you again for your input.
Ritika

RitikaJain (talk)06:59, 23 April 2016
 

Nice page!

1. There are a lot of interesting figures, but some of them are not easy to understand for people without any knowledge of AutoWeka.

2. Could you summarize your results more clearly?

YanZhao (talk)02:16, 21 April 2016

Hi Yan Zhao,
Thanks so much for your feedback. I have tried to elaborate upon the figures for users not familiar with Weka. And I've also worked on my results.
Thanks again,
Ritika

RitikaJain (talk)06:57, 23 April 2016
 

Hey Ritika,

  • I see Deadpoolz2 sent you a message. Did you ever respond? ;)
  • Seeing how you go about understanding OkCupid is actually quite interesting haha. (Though this seems more like a blog post than a wiki page.)
  • I get that your features are the answers to four questions, but how are you defining the distance measure between people?
  • How are you defining error? I don't see why clustering would require an error function.
  • Why is there a need for a test set if the model doesn't have a method for validation? Am I missing something?
  • Why did you choose to only use four questions? Would be more interesting to use more.. no?
  • From your results, it actually seems like k-means is incapable of extracting clusters, since all it does is basically output one giant cluster.
TianQiChen (talk)04:42, 21 April 2016

Hi Ricky,

  • I did not respond to Deadpoolz2 :P
  • I guess I kept my page a little informal to keep things interesting and the readers engaged. Don't know if that worked well with everyone, but that was my thinking behind it.
  • So the questions are the attributes, and the answers to these questions are the values between which I am calculating the distance. Since the attributes are categorical, my Euclidean distance (in K-means) works like this: for each question, the difference is 0 if two people gave the same answer and 1 if they gave different answers.
  • I am not defining the error; this is how OkCupid finds your true percentage. The error is with regard to finding the true percentage, not my experimental error.
  • So initially I am clustering the data and then checking whether the rest of the data actually falls into the clusters. I tried with just the training data as well; it gave me the same clusters (so that's positive).
  • In my first set of experiments, I used just four of the most popular attributes (which everyone had answered, i.e. no missing data). I have extended it to incorporate more attributes (as I did in my presentation and my wikipage now).
  • With better granularity in the data, and with more attributes that have higher variance, we can get better clusters. In the experiment with 4 attributes, those questions were answered almost the same way by everyone.
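
A minimal sketch of the categorical distance discussed above, assuming the common convention that matching answers contribute 0 and differing answers contribute 1 for each question. The function names and the sample answers are hypothetical and only for illustration; this is not Weka's code.

```python
import math


def nominal_difference(a, b):
    """Per-question difference for categorical answers: 0 if equal, 1 otherwise."""
    return 0.0 if a == b else 1.0


def categorical_euclidean(x, y):
    """Euclidean-style distance over two equal-length lists of categorical answers."""
    if len(x) != len(y):
        raise ValueError("both users must have answered the same questions")
    return math.sqrt(sum(nominal_difference(a, b) ** 2 for a, b in zip(x, y)))


# Hypothetical answers to four questions (made-up values, not from the dataset).
user_a = ["yes", "no", "often", "never"]
user_b = ["yes", "yes", "often", "sometimes"]

# The two users differ on two of the four questions.
distance_ab = categorical_euclidean(user_a, user_b)
```

With this convention, if almost everyone gives the same answers, nearly all pairwise distances are 0, which fits with k-means returning one large cluster.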

Ritika

RitikaJain (talk)06:55, 23 April 2016
 

Suggestions

Edited by author.
Last edit: 05:29, 20 April 2016

Hi Ritika
Awesome page! It was a really fun (especially the almost-fake OkCupid account :P) and interesting read. I think you have done a wonderful job, and I did not have any problem reading and understanding the page. Just a few suggestions:

  • In the Hypothesis section, it would be better to attach the link (http://www.wired.com/2014/01/how-to-hack-okcupid/) to a word instead of keeping it as plain text.
  • In the training section you mention the four most popular attributes (q34113, q85419, q416235, q20062). This is a bit confusing at that point, since you only explain them later, in the Cluster Visualization section.
  • You might consider adding some explanation to some of the figures: Preprocessing the data, Setting K-means clustering parameters. People who are not familiar with Weka might get confused.
  • I think the result section needs a bit of formatting. Maybe put some things in bold and set the formulas (true match percentage, reasonable margin of error) in the algorithm format. These are trivial things and just suggestions from my end.
SamprityKashyap (talk)03:46, 20 April 2016

Thanks a lot for your suggestions Samprity. I'm glad you enjoyed going through my page.
I shall make the suggested changes and get back to you.
Thanks, Ritika

RitikaJain (talk)05:27, 20 April 2016

Hi Samprity,
I have made the suggested changes in my page. Thanks so much for your feedback! Let me know if there is anything else that needs modification.
Thanks,
Ritika

RitikaJain (talk)01:18, 23 April 2016