Critique
From Talk:Transformers
I am happy to see this page on transformers. It is a fundamental topic and a must-read for everyone in this course. The author provides a clear walk-through and also covers the BERT and GPT variants. Here are a few suggestions/clarifications.
- Could you expand a little more on positional encodings? I understand this could be added as a separate foundational page; in the meantime, you could point readers to a few links for clarity. (The sinusoidal form from the original paper is sketched below for reference.)
- I agree with Mathew; I think you should stick to the notation from the paper when explaining self-attention (see the sketch after this list).
- Please add links to pages such as RNN/GRU/CNN/LSTM in the “Builds On” section.
- Figure 3 is missing a link to its source.
- I think it would also be nice to add links to terms such as ReLU, the Adam optimizer, cross entropy, etc.
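For reference, this is the notation from the original paper (Vaswani et al., 2017) that I have in mind; a minimal sketch using the paper's own symbols (Q, K, V, d_k, d_model, pos, i):

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V

PE_{(pos,\,2i)} = \sin\!\left(pos / 10000^{2i/d_{model}}\right), \qquad PE_{(pos,\,2i+1)} = \cos\!\left(pos / 10000^{2i/d_{model}}\right)

Here Q, K, V are the query, key, and value matrices, d_k is the key dimension used for scaling, and the positional encoding alternates sine and cosine across the embedding dimensions.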
NIKHILSHENOY (talk)