Hi @dilawn, have you figured this out yet, by any chance? I just posted a similar question about not finding causal masks in the RobertaForCausalLM and BertGeneration classes. I am wondering whether I misunderstand the concept or am missing the line of code that applies these causal masks.