The following columns in the training set don't have a corresponding argument

Hello,

I am trying to fine-tune a T5 model on the xsum dataset, but I am getting the following warning. Should I just change the names “summary”, “document” and “id” to something else, or is it caused by something else?

The following columns in the training set don’t have a corresponding argument in T5ForConditionalGeneration.forward and have been ignored: summary, document, id.

Many thanks!

Hi,

The Trainer automatically ignores columns in your dataset that aren’t used by the model. For T5, for instance, the model expects input_ids, attention_mask, labels, etc., but not “summary”, “document”, or “id”. As long as input_ids and the other expected inputs are in your dataset, it’s fine.

The warning is just telling you that those columns aren’t used.
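
To make the check behind this message concrete, here is a minimal sketch of the idea, assuming the Trainer compares the dataset's column names against the parameter names of the model's forward method. The `forward` function below is a hypothetical stand-in (not the real T5 signature), and `split_columns` and its `label_names` default are illustrative names, not part of any library.

```python
import inspect


def forward(input_ids=None, attention_mask=None, decoder_input_ids=None, labels=None):
    """Hypothetical stand-in for a model's forward method (not the real T5 signature)."""
    return None


def split_columns(forward_fn, column_names, label_names=("labels",)):
    """Split dataset columns into those forward_fn accepts and those it would ignore."""
    # Parameter names of the forward function are the columns the model can consume.
    accepted = set(inspect.signature(forward_fn).parameters)
    # Label columns are kept as well; "label" and "label_ids" are assumed defaults here.
    accepted |= {"label", "label_ids", *label_names}
    used = [name for name in column_names if name in accepted]
    ignored = [name for name in column_names if name not in accepted]
    return used, ignored


# Column names of a tokenized xsum-style dataset (raw text columns plus tokenizer outputs).
columns = ["document", "summary", "id", "input_ids", "attention_mask", "labels"]
used, ignored = split_columns(forward, columns)
print("used:", used)
print("ignored:", ignored)
```

In this sketch the raw text columns end up in `ignored`, which mirrors the columns listed in the warning. If the message is distracting, one option is to drop those columns yourself when tokenizing, for example through the `remove_columns` argument of `Dataset.map`.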


No, it is not working for me. Please help!
import pandas as pd
from datasets import Dataset, DatasetDict, load_dataset

# Load the original dataset
data = load_dataset("jcordon5/cybersecurity-rules")

# Convert the training dataset to a pandas DataFrame
df = pd.DataFrame(data["train"])

# Create a new Dataset from the DataFrame, ensuring only the required columns
df = df[['instruction', 'output']]  # Keep only 'instruction' and 'output'

# Convert the DataFrame back into a Dataset
train_dataset = Dataset.from_pandas(df)

# Create a DatasetDict
data = DatasetDict({
    'train': train_dataset
})

# Verify the structure of the new dataset dictionary
data

# Tokenize the instructions (tokenizer is assumed to be loaded earlier; it is not shown in this post)
data = data.map(
    lambda samples: tokenizer(
        samples["instruction"], padding="max_length",
        truncation=True
    ),
    batched=True
)
data

# Formatting function (ensure full output is included)
def formatting_func(example):
    text = f"Output: {example['output']}"  # Return entire output
    return [text]