October assignment feedback 2

Grammar

Existing text → Suggestion
"that combines Transformer and" → "that combines the Transformer and"
"For example, Compressive Transformer[10] adds" → "For example, a compressive transformer[10] adds"
"The starting group of memory tokens acts as a read memory" → "The starting group of memory tokens act as a read memory" (is this "a read memory" or just "read memory"?)
"The choice of how many previous segments to backpropagate is a hyperparameter, with BPTT unroll varying from 0 to 4 previous segments." → Sentence is unclear.
"It can handle sequences over 1 million tokens on a single GPU" → A bit awkward.

Style

  1. Math like 'SG' doesn't use math font.
  2. ◦ is really small and hard to read; consider using it in-line as well.
  3. I suggest getting rid of the Content header and upgrading your others. I think it would make the sections feel distinct and more bite-sized.
  4. There are inconsistencies like with the channel-mixing sub-block formulas and vs and .

Content

  1. Can "ameliorates" be defined or a more common word be used?
  2. Can things like "internal attention" be defined?
  3. The Recurrent Memory Transformer header seems like it should be on the same heading level as Transformer-XL, based on the way it's written.
  4. The caption on the self-attention diagram doesn't feel like it explains what is in the image.
  5. Similarly, the EMAT diagram could indicate which pieces of the diagram are doing what (left side versus right side).
  6. The outputs for segment are somewhat hard to parse as side-by-side equations. Putting them vertically might help readability.
  7. The paragraph starting with "The RWKV architecture, named after its four fundamental elements" feels like an intro and could benefit from being sooner in the section.
  8. For the sections with , would be used instead to be consistent with RWKV?
  9. The WKV computation explanation is clear. I think it's a good example of breaking down presented information
  10. Is transformer a proper noun? Sometimes it is capitalized and sometimes not. I would assume not.

General

Some terms are left undefined and without a hyperlink, like "differentiable external memory". Image captions should focus more on the image itself than on the concept the image presents. Shorter captions would be nicer, with the longer parts moved into the body and the image referenced like "Figure 1". Some styling is inconsistent; for example, math sometimes uses quotes. Explaining an equation in plain language can really improve readability.

I enjoyed reading this.

ClairRoss (talk) 22:13, 16 October 2023

Thank you for your feedback. I've updated the page based on your suggestions, focusing in particular on the grammatical and stylistic issues. Regarding the inconsistencies in the channel-mixing sub-block formulas, I've kept the original naming as it appears in the paper to maintain accuracy.

In response to Dr. Poole's feedback, and to improve the article's clarity, I've removed the section on EMAT. As for reordering WKV, it involves matrix multiplication, where altering the order of the vectors changes the output, making the reordering infeasible. Apart from these points, I've revised the remaining sections per your comments.
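To illustrate the point about order sensitivity: matrix products do not commute in general, so reordering the factors in a computation like WKV generally changes the result. A minimal NumPy sketch (illustrative only, not the article's actual WKV computation; the matrices here are arbitrary random stand-ins):

```python
import numpy as np

# Two arbitrary stand-in matrices (not the actual W, K, V from RWKV).
rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3))
B = rng.standard_normal((3, 3))

# Matrix multiplication is not commutative: A @ B and B @ A differ
# for generic A and B, so swapping the factors changes the output.
print(np.allclose(A @ B, B @ A))  # False for generic A and B
```

This is why the naming order in the formulas cannot simply be swapped without changing what is computed.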

Regarding your suggestion to use shorter captions for figures: while that's common practice in research papers, it's less common on wiki pages, especially since the body text cannot link directly to the images.

AmirhosseinAbaskohi (talk) 07:12, 20 October 2023