This work presents a novel approach to enzyme-substrate pair prediction, based on a customized, task-specific version of the ESM-1b transformer model. The model contains an additional learned 1280-dimensional token, trained end-to-end to store enzyme-related information salient to the downstream prediction task. Training used a dataset of ~18,000 experimentally confirmed positive enzyme-substrate pairs; negative training examples were created through data augmentation, by randomly sampling small molecules similar to the substrates in the confirmed pairs. All small molecules were represented numerically with task-specific fingerprints produced by graph neural networks. The resulting model outperforms previously published enzyme family-specific prediction models.
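The extra token can be thought of as a learnable embedding vector prepended to the protein's residue embeddings before the transformer layers; its final hidden state then serves as the enzyme representation. A minimal sketch of this idea, with names, shapes, and initialization chosen as illustrative assumptions rather than the authors' implementation:

```python
import numpy as np

EMB_DIM = 1280  # ESM-1b hidden size
rng = np.random.default_rng(0)

# Learnable extra token, initialized randomly; in the real model it is
# updated end-to-end together with the rest of the network weights.
extra_token = rng.normal(scale=0.02, size=(1, EMB_DIM))

def prepend_extra_token(seq_embeddings: np.ndarray) -> np.ndarray:
    """Prepend the task-specific token to a (seq_len, 1280) embedding matrix."""
    return np.concatenate([extra_token, seq_embeddings], axis=0)

# Toy protein of 5 residues already embedded into 1280 dimensions.
protein = rng.normal(size=(5, EMB_DIM))
augmented = prepend_extra_token(protein)
print(augmented.shape)  # (6, 1280)
```

After the transformer forward pass, the hidden state at this token's position would be read out as the enzyme's task-specific representation.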
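The negative-sampling augmentation can be sketched as follows: for each confirmed enzyme-substrate pair, draw molecules from a candidate pool that are structurally similar to the true substrate but not identical to it, and label them as non-substrates. A pure-Python sketch using Jaccard (Tanimoto) similarity over binary fingerprint bit sets; the toy fingerprints, similarity threshold, and candidate pool are illustrative assumptions:

```python
import random

def jaccard(fp_a: set, fp_b: set) -> float:
    """Jaccard (Tanimoto) similarity between two fingerprint bit sets."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)

def sample_negatives(true_fp, candidate_pool, n_neg=3, min_sim=0.3,
                     rng=random.Random(42)):
    """Randomly sample molecules similar (but not identical) to the substrate."""
    similar = [name for name, fp in candidate_pool.items()
               if fp != true_fp and jaccard(fp, true_fp) >= min_sim]
    return rng.sample(similar, min(n_neg, len(similar)))

# Toy fingerprints: sets of "on" bit indices (illustrative only).
pool = {
    "glucose":   {1, 2, 3, 4},
    "fructose":  {1, 2, 3, 5},
    "galactose": {1, 2, 4, 5},
    "heme":      {9, 10, 11},
}
negatives = sample_negatives(pool["glucose"], pool, n_neg=2)
print(negatives)
```

Sampling similar molecules, rather than arbitrary ones, makes the negatives hard to distinguish from true substrates and discourages the model from learning trivial shortcuts.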
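The graph-neural-network fingerprints follow the general message-passing pattern: atoms exchange features along bonds for a few rounds, and node states are pooled into a fixed-size vector. A minimal sketch of that pattern under stated assumptions (the aggregation rule, feature dimensions, and readout here are illustrative, not the paper's architecture):

```python
import numpy as np

def gnn_fingerprint(adj: np.ndarray, feats: np.ndarray, n_rounds: int = 2) -> np.ndarray:
    """Tiny message-passing sketch: average neighbour features into each
    atom's state, then sum-pool all atom states into one fingerprint."""
    h = feats.astype(float)
    deg = adj.sum(axis=1, keepdims=True).clip(min=1)
    for _ in range(n_rounds):
        h = np.tanh(adj @ h / deg + h)  # neighbour aggregation + self state
    return h.sum(axis=0)  # permutation-invariant readout

# Toy molecule: 3 atoms in a chain, 4-dimensional atom features.
adj = np.array([[0, 1, 0],
                [1, 0, 1],
                [0, 1, 0]])
feats = np.eye(3, 4)
fp = gnn_fingerprint(adj, feats)
print(fp.shape)  # (4,)
```

Because the fingerprint network is trained jointly with the prediction task, the resulting molecular representations are task-specific rather than generic precomputed fingerprints.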