Critique

Hi Samprity,

Interesting work on a challenging topic! It's especially good that you touched on so many of the challenges involved in sentiment analysis.

Section-specific feedback:

Abstract

  • Related pages section - I expected this to list related pages on the course wiki, but all the links are external. These links might be better suited to a "See Also" section at the end near the references.

Hypothesis

  • The hypothesis doesn't feel very precise to me. What modifications are you testing? What relationships are you trying to find? About which data?
  • It may improve the overall flow of the page if the hypothesis goes after the background. Or perhaps not; it's something to consider in any case.

What is Sentiment Analysis?

  • The figure doesn't seem to contribute anything to the page. It might be more useful to have a sample text with positive and negative polarity words highlighted; the rough sketch below shows the kind of thing I mean.
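
To illustrate, here is a rough Python sketch (the word lists are my own invention, not your lexicon) that marks polarity words in a sample sentence:

    # Toy illustration: mark positive/negative words in a sample sentence.
    # The word lists are hypothetical, not the lexicon used on the page.
    POSITIVE = {"amazing", "great", "enjoyable", "best"}
    NEGATIVE = {"boring", "awful", "dull", "worst"}

    def highlight(text):
        """Wrap positive words in [+ ] and negative words in [- ]."""
        marked = []
        for token in text.split():
            word = token.strip(".,!?").lower()
            if word in POSITIVE:
                marked.append("[+" + token + "]")
            elif word in NEGATIVE:
                marked.append("[-" + token + "]")
            else:
                marked.append(token)
        return " ".join(marked)

    print(highlight("The plot was boring, but the CGI was amazing."))
    # -> The plot was [-boring,] but the CGI was [+amazing.]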

Examples of Sentiment Analysis

  • I'd like to see a citation about the Obama administration's use of sentiment analysis.

Methodology

  • There are more grammar issues here than in the previous sections of the page.
  • You use both "we" and "I"; you should pick one (probably "we" since there are collaborators) and use it consistently.

Training

  • I have some concerns about negation handling as described (fair enough, though, since negation is a non-trivial problem anyway). Two examples, followed by a rough sketch of how a typical approach misses them:
    • "This was far from the best movie I've ever seen" - a type of negation not handled by your regex.
    • "This movie didn't have very good characterization, but the CGI was amazing." - do you check for words such as "but" after negation?

Evaluation and results

  • Extra proofreading would be good here as well.
  • From reading the page up to this point, I thought probabilities were being determined for the polarity of individual words and then aggregated somehow into an overall label for each sentence. The results, however, are reported for entire movie reviews, so I'm a little confused about what granularity you're aiming for and what aggregation method you use (a sketch of what I assumed appears after this list).
  • I don't see Neutral labels in the movie review results (or in the training data, for that matter). I might have missed something while reading, but is it assumed that all sentences in movie reviews will have some sentiment polarity?
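
For reference, here is the kind of aggregation I had assumed while reading: per-word conditional probabilities combined into one document label by summing log-probabilities, Naive Bayes style. The words and numbers are invented purely for illustration; if the library does something different, it would be worth spelling that out on the page.

    import math

    # Toy Naive Bayes aggregation: per-word probabilities are combined into a
    # single document label by summing log-probabilities. All numbers are made up.
    WORD_PROBS = {
        # word: (P(word | positive), P(word | negative))
        "amazing": (0.020, 0.002),
        "boring":  (0.001, 0.015),
        "plot":    (0.010, 0.010),
    }
    PRIORS = {"positive": 0.5, "negative": 0.5}

    def classify(tokens):
        scores = {}
        for i, label in enumerate(("positive", "negative")):
            score = math.log(PRIORS[label])
            for tok in tokens:
                if tok in WORD_PROBS:  # unknown words are simply skipped here
                    score += math.log(WORD_PROBS[tok][i])
            scores[label] = score
        return max(scores, key=scores.get)

    print(classify("the plot was boring".split()))   # -> negative
    print(classify("the plot was amazing".split()))  # -> positive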

Discussion and Future Work

  • I guess this is going back to the hypothesis, but were you going for 100%? Were you going for "better than random"? (See the baseline sketch below.)
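
For comparison, here is a quick sketch of the two baselines I'd measure against (the label counts are hypothetical, not your test set); the majority-class baseline is usually the stricter of the two:

    import random
    from collections import Counter

    # Hypothetical test labels, just to show the comparison.
    test_labels = ["positive"] * 60 + ["negative"] * 40

    # Random baseline: guess a label uniformly at random for each review.
    random.seed(0)
    guesses = [random.choice(["positive", "negative"]) for _ in test_labels]
    random_acc = sum(g == t for g, t in zip(guesses, test_labels)) / len(test_labels)

    # Majority-class baseline: always predict the most frequent label
    # (computed here from the same toy labels).
    majority = Counter(test_labels).most_common(1)[0][0]
    majority_acc = sum(majority == t for t in test_labels) / len(test_labels)

    print(f"random baseline:   {random_acc:.2f}")   # around 0.50 for two classes
    print(f"majority baseline: {majority_acc:.2f}") # 0.60 on this toy split
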
JordonJohnson (talk) 20:56, 18 April 2016

Thanks for the great feedback!

  1. I will move the Related Pages section to a See Also section.
  2. I initially did have the hypothesis after the background. I will try to change the hypothesis to accommodate your suggestions.
  3. Yes, neither "This was far from the best movie I've ever seen" nor "This movie didn't have very good characterization, but the CGI was amazing" is handled by the regex. (I am a newbie to ML.)
  4. I was mostly focusing on the tokenization part. The aggregation was taken care of by the library. I can look through the library and add some explanation.
  5. Yes, neutral labels were not present in the dataset.
  6. I wanted to improve the accuracy, basically to do better than random!
SamprityKashyap (talk) 21:17, 18 April 2016

As far as I know, negation is nowhere close to being a solved problem, so no worries there as long as it's acknowledged in the page.

As for the position of the hypothesis in the page, it might be wise to wait for the other critiques as well, since that's more a stylistic decision than one with a correct or incorrect answer.

I think David really wants our hypotheses to be as precise as possible, so he's going to want to see "better than random" somewhere in your hypothesis section.

Thanks for the quick reply!

JordonJohnson (talk) 22:07, 18 April 2016

Thanks a lot for the constructive suggestions! I will modify the hypothesis to make it more specific.

SamprityKashyap (talk) 22:17, 18 April 2016