File:Overall pre-training and fine-tuning procedures for BERT. Apart from output layers, the same architectures are used in both pre-training and fine-tuning.png
Original file (1,198 × 494 pixels, file size: 124 KB, MIME type: image/png)
Summary
Description (English): Overall pre-training and fine-tuning procedures for BERT. Apart from output layers, the same architectures are used in both pre-training and fine-tuning. The same pre-trained model parameters are used to initialize models for different downstream tasks. During fine-tuning, all parameters are fine-tuned. [CLS] is a special symbol added in front of every input example, and [SEP] is a special separator token (e.g. separating questions/answers).
Date: 14 March 2023
Source: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Author: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
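The input format named in the description can be sketched in a few lines of Python. This is a minimal illustration, not part of the file page or the paper's code: the helper name `build_bert_input` is hypothetical, and real BERT applies WordPiece subword tokenization, which is replaced here by whitespace splitting for brevity.

```python
def build_bert_input(text_a, text_b=None):
    """Assemble a BERT-style token sequence.

    [CLS] is prepended to every example; [SEP] closes each segment,
    so a sentence pair becomes: [CLS] A [SEP] B [SEP].
    NOTE: real BERT uses WordPiece tokenization; whitespace splitting
    here is a simplification for illustration only.
    """
    tokens = ["[CLS]"] + text_a.split() + ["[SEP]"]
    # Segment IDs tell the model which tokens belong to which sentence.
    segment_ids = [0] * len(tokens)
    if text_b is not None:
        b_tokens = text_b.split() + ["[SEP]"]
        tokens += b_tokens
        segment_ids += [1] * len(b_tokens)
    return tokens, segment_ids

# Example: a question/answer pair, as in the caption.
tokens, segments = build_bert_input("who wrote BERT ?", "Devlin et al .")
print(tokens)    # ['[CLS]', 'who', 'wrote', 'BERT', '?', '[SEP]', 'Devlin', 'et', 'al', '.', '[SEP]']
print(segments)  # [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
```

In practice, libraries such as Hugging Face's `transformers` perform this assembly (including the subword splitting) automatically when a tokenizer is called on a text pair.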
Licensing
File history
Date/Time | Dimensions | User | Comment
---|---|---|---
18:41, 17 March 2023 (current) | 1,198 × 494 pixels (124 KB) | MEHARBHATIA | Uploaded a work by Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova from BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding with UploadWizard