Course talk:CPSC522/Linking Sentences in Asynchronous Conversations

From UBC Wiki

Contents

Thread title | Replies | Last modified
response | 2 | 05:56, 23 April 2016
Feedback on Linking Sentences in Asynchronous Conversations | 1 | 19:19, 22 April 2016
Suggestions | 2 | 05:07, 20 April 2016
Suggestions | 3 | 04:33, 20 April 2016

response

Great page, Jordon,

One general query: can the same idea be extended to cases where the same subject is talked about more than once in the same sentence? More precisely, how would cosine similarity work in linking the right subjects with the right expressions inside a sentence?

PrithuBanerjee (talk)07:00, 21 April 2016

Hi Prithu,

I'm not sure I understand the question correctly, but I believe you're asking about looking at applications at a smaller granularity than the sentence level. I suppose there would be two chief considerations:

  • What is the smallest granularity at which the idea of a "topic" is meaningful? I suspect it would be at the clause level, where each clause represents a complete thought. For example, if we take the sentence "I love pie, and my son plays Minecraft all the time," there are two thoughts there, and each of those thoughts is about at least one topic (favourite foods, favourite video game, how we spend our time, etc.). So, if we were to attempt to break down sentences for more detailed subject agreement, I suspect we'd have to stop at the clause level; but that leads me to the next consideration:
  • How do we deal with aggregation? For example, consider a conversation with two topics: dogs and cats. If I say "My dog and my cat are both very good with children," then in a way I've aggregated the topics into one clause; and it might be worthwhile to split the aggregation and deal with that sentence as if it were two sentences ("My dog is good with kids" and "My cat is good with kids"). From a cosine similarity point of view, the aggregate clause would score relatively well with sentences about each of the two topics, but the split pair of sentences would likely score higher with their respective topics.

Of course, I don't have any actual data to back this up, but it seems reasonable to me. Did I actually address what it was you were asking, or did I misunderstand the question completely?
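As a rough illustration of the aggregation point, here is a toy bag-of-words sketch (not the page's actual method; the sentences and the `bow` helper are made up for this example):

```python
from collections import Counter
from math import sqrt

def bow(sentence):
    """Toy bag-of-words vector: lowercase word counts, punctuation stripped."""
    return Counter(sentence.lower().replace(",", "").replace(".", "").split())

def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[w] * b[w] for w in set(a) & set(b))
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

aggregate = bow("My dog and my cat are both very good with children")
dog_sentence = bow("My dog is good with kids")
cat_sentence = bow("My cat is good with kids")

# The aggregate clause scores moderately against a sentence about either topic;
# splitting it into two single-topic sentences would let each half match
# its own topic more strongly.
print(cosine(aggregate, dog_sentence))
print(cosine(aggregate, cat_sentence))
```

With real data one would also handle stopwords and normalization, but the shape of the comparison is the same.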

JordonJohnson (talk)19:34, 22 April 2016

Yes, granularity was my query. What you said makes perfect sense. I encountered this granularity barrier in my project as well. For example, given "I wont the place was bad. If you have anything but taco in mind then you are safe", it's challenging to work out that the taco was bad at the restaurant while the other items were alright.

PrithuBanerjee (talk)05:56, 23 April 2016
 
 

Feedback on Linking Sentences in Asynchronous Conversations

Hi Jordon, An informative page with a good level of detail. The writing, in my opinion, was concise and to the point. I believe the following points would be helpful as you build the page.

  • The language in some places [e.g. Abstract and Hypothesis] is written in complex sentences and is a bit convoluted to follow. Breaking it down into smaller chunks would be helpful, in my opinion.
  • I am curious to know your thoughts on a matter you mentioned in the discussion section regarding how various random sentences can be linked. Is there any way that random sentences in disjoint asynchronous conversations can be used? This might be beyond your page, but some insight regarding that would be appreciated.

I applaud your hard work and contribution to the development of the wiki.

MDAbedRahman (talk)07:22, 21 April 2016

Hi Abed, thank you for the feedback!

I agree that I try to cram too much into individual sentences at times; I'll take a quick pass over the areas you mentioned and see what I can do.

Regarding your other point, I'm not quite sure if I understand your meaning, but my reasoning went like this:

  • Given an asynchronous conversation, let's call the set of all its sentences S, and the set of all its linked pairs of sentences L. If I randomly choose pairs of sentences from S, it is possible for some of the pairs to be in L.
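A minimal sketch of that setup (the sentence labels and links here are hypothetical placeholders, not real data):

```python
import random

# Toy conversation: S is the set of sentences, L the annotated linked pairs
# (each pair is a later sentence and the earlier sentence it refers to).
S = ["s1", "s2", "s3", "s4", "s5", "s6"]
L = {("s3", "s1"), ("s4", "s2"), ("s6", "s3")}

random.seed(42)
# Randomly chosen ordered pairs of distinct sentences from S;
# by chance, some of them may also be in L.
random_pairs = {tuple(random.sample(S, 2)) for _ in range(10)}
accidentally_linked = random_pairs & L
```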

If you're asking if there would be some use in looking at sentence pairs where each sentence is from a different conversation, I'm not sure that there would be too much use there. The end goal is to determine which pairs of sentences in an asynchronous conversation are linked, which means figuring out how to distinguish between linked sentence pairs and not-linked sentence pairs in the same conversation. If I had had a single dataset with annotations for linked sentences, not-linked sentences, and topic information, that would have been ideal.

JordonJohnson (talk)19:19, 22 April 2016
 

Suggestions

Hi Jordon,
Another great page! I enjoy reading your pages. Some suggestions and questions:

  • Abstract: It is concise and precise. You have mentioned "Hypothesis considered is that the cosine similarity of vector representations" - I could not understand this in the abstract. A layman's explanation would be better.
  • In the Linked Sentences section, my understanding is that a post cannot refer to a sibling post. So in figure 2a 5 cannot refer to 4. Is this correct?
  • In figure 2b, {orange, gray, purple} correspond to {positive, neutral, negative} and {dashed, solid, dotted} correspond to {in_favour, impartial, against}. Don't they represent the same thing, i.e. positive: in_favour, neutral: impartial, negative: against? Why do we need different representations? (Maybe I have missed something while reading.)
  • I did not quite understand the difference between the similarity measures discussed. Both of them seemed to use some form of vector representation for finding similarity between texts.
  • In the Gathering Data section, how does the annotation work? Is it manual, or is there some kind of tool/script to get annotated comments?
  • In the procedure section, you have mentioned: "Pairs of linked sentences were extracted". Is it like the com_sen and art_sen pair that you mentioned in the background section?
  • You may consider adding a line about the ROC curve.
  • How were threshold and error margin chosen in table 1?
SamprityKashyap (talk)23:58, 19 April 2016

Hi Samprity, thanks for the feedback! I may not have time to address it all in the page, but I'll do as much as I can.

  • Abstract: It is concise and precise. You have mentioned "Hypothesis considered is that the cosine similarity of vector representations" - I could not understand this in the abstract. A layman's explanation would be better.
    • Perhaps I'll modify it to talk more about performance than about the specifics of the classification method.
  • In the Linked Sentences section, my understanding is that a post cannot refer to a sibling post. So in figure 2a 5 cannot refer to 4. Is this correct?
    • Correct, though I'm assuming that people will follow the proper reply-to format. That doesn't always happen.
  • In figure 2b, {orange, gray, purple} correspond to {positive, neutral, negative} and {dashed, solid, dotted} correspond to {in_favour, impartial, against}. Don't they represent the same thing, i.e. positive: in_favour, neutral: impartial, negative: against? Why do we need different representations? (Maybe I have missed something while reading.)
    • Usually they're closely related, but there are occasions where someone can have negative sentiment about something but be in favour of it. For example, in an article about capital punishment, someone could comment something like "The death penalty is a necessary evil; it's barbaric, but justice must be met." This person would have negative sentiment about the death penalty but still be in favour of it. I didn't want to get into it too much because it's only tangentially related to the page.
  • I did not quite understand the difference between the similarity measures discussed. Both of them seemed to use some form of vector representation for finding similarity between texts.
    • The difference would be in calculating, for example, the similarity between the sentences "I love pie!" and "Cake is my favourite!" These sentences have no common words, and so their cosine similarity under one measure would be zero. The dimensions of word embeddings represent semantic features (in a poorly-understood way), and so the two sentences would score fairly high using that vector representation. If time permits and if it helps, I might include this example in the page.
  • In the Gathering Data section, how does the annotation work? Is it manual, or is there some kind of tool/script to get annotated comments?
    • It was different for the two datasets; in the OnForumS dataset it was more like validation of system output than actual annotation, whereas the BC3 dataset was human-annotated.
  • In the procedure section, you have mentioned: "Pairs of linked sentences were extracted". Is it like the com_sen and art_sen pair that you mentioned in the background section?
    • Precisely.
  • You may consider adding a line about the ROC curve.
    • I'll see how much time I have.
  • How were threshold and error margin chosen in table 1?
    • Mean accuracy ((TP+TN)/all) and standard deviation over three runs; odd that I didn't include that in the table caption...
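The pie/cake contrast described above can be sketched numerically. This is a toy illustration, not the page's actual setup: the 3-dimensional "embeddings" are invented values standing in for real word vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(x * x for x in v))
    return dot / norm if norm else 0.0

# Word-overlap view: one dimension per vocabulary word.
# "I love pie" and "Cake is my favourite" share no words, so similarity is 0.
vocab = ["i", "love", "pie", "cake", "is", "my", "favourite"]
pie_bow = [1, 1, 1, 0, 0, 0, 0]
cake_bow = [0, 0, 0, 1, 1, 1, 1]

# Embedding view: toy 3-d vectors (invented values) whose dimensions loosely
# stand in for semantic features such as "food-ness" and "positive affect".
emb = {
    "pie":       [0.9, 0.1, 0.0],
    "cake":      [0.8, 0.2, 0.1],
    "love":      [0.1, 0.9, 0.3],
    "favourite": [0.2, 0.7, 0.2],
}

def avg(vectors):
    """Average word vectors into a crude sentence vector."""
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

pie_emb = avg([emb["love"], emb["pie"]])
cake_emb = avg([emb["cake"], emb["favourite"]])

print(cosine(pie_bow, cake_bow))   # 0.0: no shared words
print(cosine(pie_emb, cake_emb))   # high, despite zero word overlap
```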
JordonJohnson (talk)04:47, 20 April 2016

Thank you for the clarifications!

SamprityKashyap (talk)05:07, 20 April 2016
 
 

Suggestions

Hi Jordon,

Good job!

Just a few suggestions to make things easier to understand:


>Linking Sentences in Asynchronous Conversations

The sentence “Linking sentences in an asynchronous conversation is the process of determining, given a sentence in the conversation, the earlier sentence to which it refers.” seems a bit complicated to me. Non-native speakers might have trouble understanding it.

>Abstract

It might be better not to put the list of contents at the very start of the abstract. Instead, you could briefly walk through the contents.

>Asynchronous Conversations

It should be explained more.

“Asynchronous conversations, however, can fracture into tree structures”. How can they fracture into tree structures? Does figure 1 indicate the tree structure? If it does, I cannot map the figure to a tree structure.

“a participant (or poster) can reply to any existing post”

Where is an example of a reply in this picture?

You may want to add a simple synchronous conversation to make things easier to understand.

“Due to their continuing proliferation online, the exploration and analysis of asynchronous conversations are of significant interest to researchers.”

I did not get why they are important to study.

You may want to separate figures 2a and 2b or draw a line between them. (One other suggestion could be to add the “a” in the figure below the corresponding part.)

It might be a good idea to add some background about “sentiment analysis”. I’m not familiar with NLP, so going through different links to get the background is not ideal. “Word embedding” has not been explained in the background either.

>Procedure

It is usually better to add the figure after the explanation about that. Figure 3 has been added before the explanation. You may want to add it after starting to talk about that (maybe in the result section).

BahareFatemi (talk)02:47, 19 April 2016

Hi Bahare, thanks for the feedback! I doubt I'll have time to enact changes based on all your suggestions, but they're appreciated in any case.

>Linking Sentences in Asynchronous Conversations The sentence “Linking sentences in an asynchronous conversation is the process of determining, given a sentence in the conversation, the earlier sentence to which it refers.” seems a bit complicated to me. Non-native speakers might have trouble understanding it.

  • I'll split it into a couple of sentences to make it more readable.

>Abstract It might be better not to put the list of contents at the very start of the abstract. Instead, you could briefly walk through the contents.

  • I'm not quite sure what you mean here; could you please clarify?

>Asynchronous Conversations It should be explained more.

  • You gave a couple of examples of things you feel need clarification; is there anything else you feel is insufficiently explained in this section?

“Asynchronous conversations, however, can fracture into tree structures”. How can they fracture into tree structures? Does figure 1 indicate the tree structure? If it does, I cannot map the figure to a tree structure.

  • Figure 1 does indicate tree structure; the indentation indicates that each post is a child of the post it is indented under. Common visualizations of file-system directory trees work the same way. I'll try to be more explicit in the figure caption, and I'll revisit the explanation in the text.

“a participant (or poster) can reply to any existing post” Where is an example of a reply in this picture?

  • Sentence 6 appears above sentence 5 rather than below it because it is in reply to the post containing sentences 3 and 4. I'll try to make this more explicit in the figure caption.
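The reply structure being described can be sketched as a small tree (the post contents below are a hypothetical reconstruction of the figure, not taken from it directly):

```python
from dataclasses import dataclass, field

@dataclass
class Post:
    sentences: list
    replies: list = field(default_factory=list)

def render(post, depth=0):
    """Return the thread as indented lines, one indent level per reply depth."""
    lines = ["  " * depth + str(post.sentences)]
    for reply in post.replies:
        lines.extend(render(reply, depth + 1))
    return lines

# Root post; a reply containing sentences 3-4; a reply to *that* reply
# containing sentence 6; and a later sibling reply containing sentence 5.
root = Post(sentences=[1, 2])
child = Post(sentences=[3, 4], replies=[Post(sentences=[6])])
root.replies = [child, Post(sentences=[5])]

# Sentence 6 renders above sentence 5 because it nests under the 3-4 post.
print("\n".join(render(root)))
```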

You may want to add a simple synchronous conversation to make things easier to understand.

  • I had assumed that since all face-to-face conversations are synchronous, that wouldn't be necessary; but I'll consider it if time permits.

“Due to their continuing proliferation online, the exploration and analysis of asynchronous conversations are of significant interest to researchers.” I did not get why they are important to study.

  • I'll be more explicit about this in the text. The main issue is that methods to analyse synchronous conversations don't always work well with asynchronous conversations due to the structural differences, and so new/modified techniques need to be discovered.

You may want to separate figures 2a and 2b or draw a line between them. (One other suggestion could be to add the “a” in the figure below the corresponding part.)

  • I'll consider this if time permits.

It might be a good idea to add some background about “sentiment analysis”. I’m not familiar with NLP, so going through different links to get the background is not ideal. “Word embedding” has not been explained in the background either.

  • Sentiment analysis is only tangentially related to this page, as an interesting application of links; since it isn't directly related to the hypothesis, I'm hesitant to spend extra time on it. Word embeddings are more directly related, but I doubt I'll have time to add much about them.

>Procedure It is usually better to add the figure after the explanation about that. Figure 3 has been added before the explanation. You may want to add it after starting to talk about that (maybe in the result section).

  • Agreed, but when I tried it, it did some annoying things due to its proximity to Figure 4. If time permits I may fiddle with it again.
JordonJohnson (talk)19:50, 19 April 2016

The page is awesome. Even if you don't have time to go through these issues, it is still great.

It's maybe because I don't know much NLP. Others might understand these things easily.

Cheers,

BahareFatemi (talk)03:01, 20 April 2016

You raised some excellent points, and I'll be sure to account for as many of them as I can; thank you again!  :)

JordonJohnson (talk)04:33, 20 April 2016