Suggestions
Hi Jordon,
Another great page! I enjoy reading your pages. Some suggestions and questions:
- Abstract: It is concise and precise. You have mentioned "Hypothesis considered is that the cosine similarity of vector representations", but I could not understand this part of the abstract. A more layman-friendly explanation would be better.
- In the Linked Sentences section, my understanding is that a post cannot refer to a sibling post. So in figure 2a, post 5 cannot refer to post 4. Is this correct?
- In figure 2b, {orange, gray, purple} correspond to {positive, neutral, negative} and {dashed, solid, dotted} correspond to {in_favour, impartial, against}. Don't they represent the same thing, i.e. positive: in_favour, neutral: impartial, negative: against? Why do we need different representations? (Maybe I have missed something while reading.)
- I did not quite understand the difference between the similarity measures discussed. Both of them seemed to use some form of vector representation for finding similarity between texts.
- In the gathering data section, how does the annotation work? Is it manual, or is there some kind of tool/script to get annotated comments?
- In the procedure section, you have mentioned: "Pairs of linked sentences were extracted". Is it like the com_sen and art_sen pair that you mentioned in the background section?
- You may consider adding a line about the ROC curve.
- How were the threshold and error margin in table 1 chosen?
Hi Samprity, thanks for the feedback! I may not have time to address it all in the page, but I'll do as much as I can.
- Abstract: It is concise and precise. You have mentioned "Hypothesis considered is that the cosine similarity of vector representations", but I could not understand this part of the abstract. A more layman-friendly explanation would be better.
- Perhaps I'll modify it to talk more about performance than about the specifics of the classification method.
- In the Linked Sentences section, my understanding is that a post cannot refer to a sibling post. So in figure 2a, post 5 cannot refer to post 4. Is this correct?
- Correct, though I'm assuming that people will follow the proper reply-to format. That doesn't always happen.
- In figure 2b, {orange, gray, purple} correspond to {positive, neutral, negative} and {dashed, solid, dotted} correspond to {in_favour, impartial, against}. Don't they represent the same thing, i.e. positive: in_favour, neutral: impartial, negative: against? Why do we need different representations? (Maybe I have missed something while reading.)
- Usually they're closely related, but there are occasions where someone can have negative sentiment about something but be in favour of it. For example, in an article about capital punishment, someone could comment something like "The death penalty is a necessary evil; it's barbaric, but justice must be met." This person would have negative sentiment about the death penalty but still be in favour of it. I didn't want to get into it too much because it's only tangentially related to the page.
- I did not quite understand the difference between the similarity measures discussed. Both of them seemed to use some form of vector representation for finding similarity between texts.
- The difference would be in calculating, for example, the similarity between the sentences "I love pie!" and "Cake is my favourite!" These sentences have no common words, and so their cosine similarity under the word-count measure would be zero. The dimensions of word embeddings represent semantic features (in a poorly understood way), and so the two sentences would score fairly high under that vector representation. If time permits and if it helps, I might include this example in the page.
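A rough sketch of what that example might look like (not the page's actual code; the tokenisation and the tiny made-up embedding values below are purely for illustration, and real embeddings like word2vec or GloVe would have hundreds of dimensions):

```python
# Contrast: bag-of-words cosine similarity vs. averaged-word-embedding cosine similarity.
import numpy as np

def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu, nv = np.linalg.norm(u), np.linalg.norm(v)
    return float(u @ v / (nu * nv)) if nu and nv else 0.0

def tokens(sentence):
    return sentence.lower().replace("!", "").split()

def bow_vector(sentence, vocab):
    """Bag-of-words count vector over a fixed vocabulary."""
    words = tokens(sentence)
    return np.array([words.count(w) for w in vocab], dtype=float)

def avg_embedding(sentence, emb):
    """Average of the word embeddings of the words in the sentence."""
    return np.mean([emb[w] for w in tokens(sentence) if w in emb], axis=0)

s1, s2 = "I love pie!", "Cake is my favourite!"

# Bag-of-words: the sentences share no words, so the cosine similarity is exactly zero.
vocab = sorted(set(tokens(s1)) | set(tokens(s2)))
print(cosine(bow_vector(s1, vocab), bow_vector(s2, vocab)))  # 0.0

# Word embeddings: "pie"/"cake" and "love"/"favourite" get similar (made-up) vectors,
# so the averaged sentence vectors end up close together.
emb = {
    "i":         np.array([0.1, 0.0, 0.0]),
    "love":      np.array([0.0, 0.9, 0.1]),
    "pie":       np.array([0.8, 0.1, 0.1]),
    "cake":      np.array([0.7, 0.2, 0.1]),
    "is":        np.array([0.1, 0.0, 0.1]),
    "my":        np.array([0.1, 0.1, 0.0]),
    "favourite": np.array([0.1, 0.8, 0.2]),
}
print(cosine(avg_embedding(s1, emb), avg_embedding(s2, emb)))  # close to 1
```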
- In the gathering data section, how does the annotation work? Is it manual, or is there some kind of tool/script to get annotated comments?
- It was different for the two datasets; in the OnForumS dataset it was more like validation of system output than actual annotation, whereas the BC3 dataset was human-annotated.
- In the procedure section, you have mentioned: "Pairs of linked sentences were extracted". Is it like the com_sen and art_sen pair that you mentioned in the background section?
- Precisely.
- You may consider adding a line about the ROC curve.
- I'll see how much time I have.
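If it does make it in, it would be along these lines (a minimal sketch using scikit-learn; the labels and scores below are placeholders, not results from the page):

```python
# Minimal ROC sketch: y_true are gold linked/not-linked labels, y_score are the
# classifier's similarity scores. Both lists here are placeholders.
import matplotlib.pyplot as plt
from sklearn.metrics import auc, roc_curve

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.3, 0.7, 0.6, 0.4, 0.2, 0.8, 0.5]

fpr, tpr, _ = roc_curve(y_true, y_score)
plt.plot(fpr, tpr, label=f"AUC = {auc(fpr, tpr):.2f}")
plt.plot([0, 1], [0, 1], linestyle="--", label="chance")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```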
- How were the threshold and error margin in table 1 chosen?
- The values are the mean accuracy ((TP+TN)/all) and standard deviation over three runs; odd that I didn't include that in the table caption...
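For reference, the calculation is just the following (the per-run confusion counts below are placeholders, not the actual numbers behind table 1):

```python
# Mean accuracy and standard deviation over three runs; accuracy = (TP+TN)/all.
# The confusion counts here are illustrative only.
import numpy as np

def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

runs = [(40, 35, 10, 15), (42, 33, 12, 13), (38, 37, 9, 16)]  # (TP, TN, FP, FN) per run
accs = [accuracy(*counts) for counts in runs]
print(f"accuracy = {np.mean(accs):.3f} +/- {np.std(accs):.3f}")
```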