Supplemental Tables for “Substring Selection for Biomedical Document Classification”

Supplement Table 1. Summary of the five PTM datasets

PTM types         Positives   Negatives
Acetylation       55          868
Glycosylation     41          711
Methylation       27          171
Phosphorylation   79          389
Hydroxylation     27          133

Supplement Table 2. The Unlabeled Glycosylation Abstracts

Attributes Used for Ranking

Stemmed Word

Substring

Full Dataset

Top50  

Top50

Bottom50  

Bottom50