File:RMT Peformance.png

From UBC Wiki

Original file(1,351 × 381 pixels, file size: 101 KB, MIME type: image/png)

Summary

Description
English: This figure compares the performance of three models, Baseline, Transformer-XL, and RMT, on copy and reverse tasks. In the single-segment setting all models perform well, since the entire sequence is accessible without recurrence. As the number of segments increases, however, the non-recurrent Baseline model struggles to solve the tasks, while both memory models, Transformer-XL and RMT, can retain crucial information from previous segments in memory. Notably, RMT surpasses Transformer-XL as the number of segments grows, as shown in the panels reporting per-character accuracy on the copy, reverse, and associative retrieval tasks, each with its own source/target sequence lengths; memory/cache size equals the segment length for both models. Note that RMT does not pass gradients between segments in this experiment, yet it still achieves results distinct from those of the Baseline model.
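To make the evaluation concrete, here is a minimal sketch of the copy and reverse tasks and the per-character accuracy metric reported in the figure's panels. The function and variable names are illustrative only; they are not taken from the paper's code.

```python
# Sketch of the copy/reverse tasks and per-character accuracy.
# All names here are hypothetical, not from the paper's implementation.

def per_char_accuracy(pred: str, target: str) -> float:
    """Fraction of positions where the prediction matches the target."""
    assert len(pred) == len(target), "sequences must be the same length"
    matches = sum(p == t for p, t in zip(pred, target))
    return matches / len(target)

src = "abcdef"

# Copy task: the target is the source sequence itself.
copy_target = src

# Reverse task: the target is the source sequence reversed.
reverse_target = src[::-1]

# A hypothetical model output with one wrong character.
pred = "abcdeX"
print(per_char_accuracy(pred, copy_target))  # 5 of 6 characters correct
```

In the figure, each panel plots this metric as the source sequence is split into more segments, which is where the memory models' ability to carry information across segment boundaries becomes decisive.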
Date 2022
File source https://arxiv.org/pdf/2207.06881.pdf
Author Aydar Bulatov, Yuri Kuratov, Mikhail S. Burtsev

Licensing

Some rights reserved
Permission is granted to copy, distribute and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 4.0 License. The full text of this license may be found here: CC BY-SA 4.0
Attribution-ShareAlike

File history

Click on a date/time to view the file as it appeared at that time.

Date/Time: 04:16, 11 October 2023 (current version)
Dimensions: 1,351 × 381 (101 KB)
User: AmirhosseinAbaskohi (talk | contribs)
Comment: Uploaded a work by Aydar Bulatov, Yuri Kuratov, Mikhail S. Burtsev from https://arxiv.org/pdf/2207.06881.pdf with UploadWizard

The following page uses this file: