Critique

I am happy to see this page on transformers. It is a fundamental topic and a must-read for everyone in this course. The author has provided a clear walk-through and has also covered the BERT/GPT variants. Here are a few suggestions/clarifications.

  • Could you expand a little more on positional encodings? I understand this may be better suited to a separate foundational page, but in the meantime you could link to some external references for clarity (a minimal sketch follows this list).
  • I agree with Mathew: I think you should stick to the notation from the paper while explaining self-attention (the scaled dot-product form is reproduced after this list for reference).
  • Please add links to pages such as RNN/GRU/CNN/LSTM in the “Builds On” section.
  • Figure 3 is missing a link to its source.
  • I think it would also be nice to add links to terms such as ReLU, Adam optimizer, cross-entropy, etc.
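On the positional encodings point, even a short snippet could anchor the discussion before a dedicated page exists. Below is a minimal NumPy sketch of the sinusoidal encoding from Vaswani et al. (2017); the function name and parameters are my own choices for illustration, and it assumes an even d_model.

```python
import numpy as np

def sinusoidal_positional_encoding(max_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encodings from Vaswani et al. (2017).

    PE(pos, 2i)   = sin(pos / 10000^(2i / d_model))
    PE(pos, 2i+1) = cos(pos / 10000^(2i / d_model))

    Assumes d_model is even.
    """
    positions = np.arange(max_len)[:, np.newaxis]      # shape (max_len, 1)
    dims = np.arange(0, d_model, 2)[np.newaxis, :]     # the 2i terms, shape (1, d_model/2)
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((max_len, d_model))
    pe[:, 0::2] = np.sin(angles)                       # even dimensions
    pe[:, 1::2] = np.cos(angles)                       # odd dimensions
    return pe

# Example: encodings for a sequence of length 50 with model dimension 512
pe = sinusoidal_positional_encoding(50, 512)
```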
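On the notation point, the form from the paper that I have in mind is scaled dot-product attention:

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$

where $Q$, $K$, and $V$ are the query, key, and value matrices and $d_k$ is the key dimension. Keeping these symbols consistent throughout the walk-through would make it easier for readers to cross-reference the paper.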
NIKHILSHENOY (talk) 20:22, 19 March 2023