SIFTing the Human Variation Databases

This page contains supplemental information to "Accounting for Human Polymorphisms Predicted to Affect Protein Function" (Genome Research 12:436-446).

Download Predictions on Databases

About the files you are downloading (after decompressing and extracting the files)

*.seq contains the reference protein sequence used for prediction.
*.alignedfasta contains the alignment used for SIFT prediction. SIFT chose the sequences by searching SWISS-PROT/TrEMBL with a median conservation cutoff of 2.75.
*.prediction contains the predictions for a protein sequence. Only predictions with a median conservation <= 3.25 were included.

A line in XP_001290.subst reads:

P1395L INTOLERANT 0.03 3.17 2.06 2.862 1.200 15 28

1st field: Substitution from P at position 1395 to L.
2nd field: Predicted to be intolerant.
3rd field: SIFT score of 0.03.
4th field: Conservation score at this position was 3.17.
5th field: Conservation of the neighboring positions (two on either side of the substitution) is 2.06.
6th field: The sequences that have an amino acid represented at this position have a median conservation of 2.862. This is the value that the median conservation cutoff applies to, so it should be under 3.25, our cutoff.
7th field: Ignore.
8th field: Number of sequences that have an amino acid represented at this position (15 in this example).
9th field: Total number of sequences in the alignment (28 in this example).
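
The nine fields described above can be read programmatically. The following is a minimal Python sketch, not part of the original distribution. It assumes the fields are separated by whitespace and that the substitution uses one-letter amino acid codes, as in the example line. The names SubstPrediction and parse_subst_line are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple


class SubstPrediction(NamedTuple):
    """One line of a prediction file (hypothetical record type)."""
    ref_aa: str                   # 1st field: original amino acid
    position: int                 # 1st field: position of the substitution
    alt_aa: str                   # 1st field: substituted amino acid
    prediction: str               # 2nd field: e.g. INTOLERANT
    sift_score: float             # 3rd field
    position_conservation: float  # 4th field
    neighbor_conservation: float  # 5th field
    median_conservation: float    # 6th field
    ignored: str                  # 7th field: kept verbatim, not interpreted
    seqs_at_position: int         # 8th field
    total_seqs: int               # 9th field


def parse_subst_line(line: str) -> SubstPrediction:
    # Assumption: fields are whitespace-separated, as in the example line.
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}")
    sub = fields[0]
    # Assumption: one-letter amino acid codes around an integer position.
    return SubstPrediction(
        ref_aa=sub[0],
        position=int(sub[1:-1]),
        alt_aa=sub[-1],
        prediction=fields[1],
        sift_score=float(fields[2]),
        position_conservation=float(fields[3]),
        neighbor_conservation=float(fields[4]),
        median_conservation=float(fields[5]),
        ignored=fields[6],
        seqs_at_position=int(fields[7]),
        total_seqs=int(fields[8]),
    )


record = parse_subst_line("P1395L INTOLERANT 0.03 3.17 2.06 2.862 1.200 15 28")
```

For the example line, record.position is 1395, record.sift_score is 0.03, and record.median_conservation is 2.862, which is under the 3.25 cutoff.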

Supplementary information

Variants detected from sequencing error (6 could be real polymorphisms)

Biased nsSNPs predicted to be damaging from Sunyaev et al. paper

Removing sequences > 90%, 95%, and 99% identical to the query gives similar results

November 2001
