PhosBoost Phosphosite Predictions tracks for IWGSC v2, Morex v3 and Sang
Phosphosites were predicted using PhosBoost, a novel machine learning approach (Poretsky et al., under review) that combines gradient boosting trees with pretrained protein language models to predict protein phosphorylation. A model trained on the complete protein phosphorylation data from the qPTMplants database was used to generate genome-wide phosphosite predictions in plants. In addition, phosphosites were inferred from known qPTMplants phosphosites by sequence similarity, using DIAMOND pairwise sequence alignment. For each gene, phosphosites were predicted in a single representative gene model.
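The similarity-based inference step can be illustrated with a minimal sketch. It is not the PhosBoost or DIAMOND code; it only shows how known phosphosite positions on a reference protein might be carried over to a query protein through a gapped pairwise alignment. The function name `transfer_phosphosites`, its parameters, and the rule that a site is transferred whenever the aligned query residue is S, T or Y are all assumptions made for illustration, and the aligned sequences are assumed to have been extracted from the aligner output beforehand.

```python
# Hypothetical helper, not part of PhosBoost or DIAMOND.
PHOSPHO_RESIDUES = {"S", "T", "Y"}  # assumed set of phosphorylatable residues


def transfer_phosphosites(aligned_query, aligned_subject, known_sites,
                          query_start=1, subject_start=1):
    """Map known 1-based subject phosphosites onto query positions.

    aligned_query / aligned_subject: equal-length strings with '-' for gaps.
    known_sites: 1-based positions of known phosphosites on the subject.
    query_start / subject_start: 1-based position of the first aligned residue.
    Returns a list of (query_position, query_residue) tuples.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")

    known = set(known_sites)
    q_pos = query_start - 1
    s_pos = subject_start - 1
    inferred = []

    for q_res, s_res in zip(aligned_query, aligned_subject):
        # Advance each coordinate only when a real residue is consumed.
        if q_res != "-":
            q_pos += 1
        if s_res != "-":
            s_pos += 1
        # A column with a gap on either side cannot transfer a site.
        if q_res == "-" or s_res == "-":
            continue
        # Assumed rule: transfer if the query residue is phosphorylatable.
        if s_pos in known and q_res in PHOSPHO_RESIDUES:
            inferred.append((q_pos, q_res))

    return inferred


# Toy alignment: the subject has known phosphosites at positions 3 and 6.
example = transfer_phosphosites("MAS-PTKY", "MASGPSK-", known_sites=[3, 6])
print(example)
```

In this toy alignment the subject serine at position 6 aligns with a query threonine at position 5, so the site is still transferred under the assumed rule. A stricter variant could require the aligned residues to be identical or apply a minimum alignment identity threshold.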