SIFTing the Human Variation Databases

This page contains supplemental information to "Accounting for Human Polymorphisms Predicted to Affect Protein Function" (Genome Research 12:436-446).

Download Predictions on Databases

About the files you are downloading (after decompressing and extracting the files)

*.seq contains the reference protein sequence used for prediction.
*.alignedfasta contains the alignment used for SIFT prediction. SIFT chose the sequences by searching SWISS-PROT/TrEMBL and selecting on conservation, with a median conservation cutoff of 2.75.
*.prediction contains the predictions for a protein sequence. Only predictions with a median conservation <= 3.25 were included.

A line in XP_001290.subst reads:

P1395L INTOLERANT 0.03 3.17 2.06 2.862 1.200 15 28

1st field: Substitution from P at position 1395 to L.
2nd field: Predicted to be intolerant.
3rd field: SIFT score of 0.03.
4th field: Conservation score at this position was 3.17.
5th field: Conservation of the neighboring positions (two on either side of the substitution) is 2.06.
6th field: The sequences that have an amino acid represented at this position have a median conservation of 2.862. This is the value compared against the median conservation cutoff, so it should be under 3.25, our cutoff.
7th field: Ignore.
8th field: Number of sequences that have an amino acid represented at this position (15 in this example).
9th field: Total number of sequences in alignment (28 in this example).
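
The nine fields described above can be read into a small record for downstream filtering. The following is a minimal sketch, not part of the SIFT distribution. It assumes the fields are separated by whitespace, as in the example line, and that the substitution is written as one reference residue, a position, and one substituted residue. The names SubstPrediction and parse_prediction_line, and all attribute names, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Cutoff stated on this page for included predictions.
MEDIAN_CONSERVATION_CUTOFF = 3.25


@dataclass
class SubstPrediction:
    """One prediction line; attribute names are illustrative, not SIFT's."""
    ref_residue: str              # 1st field, e.g. "P"
    position: int                 # 1st field, e.g. 1395
    alt_residue: str              # 1st field, e.g. "L"
    prediction: str               # 2nd field, e.g. "INTOLERANT"
    sift_score: float             # 3rd field
    position_conservation: float  # 4th field
    neighbor_conservation: float  # 5th field
    median_conservation: float    # 6th field
    ignored: str                  # 7th field, kept as raw text
    n_seqs_at_position: int       # 8th field
    n_seqs_total: int             # 9th field


def parse_prediction_line(line: str) -> SubstPrediction:
    """Split one whitespace-separated line into its nine fields."""
    fields = line.split()
    if len(fields) != 9:
        raise ValueError(f"expected 9 fields, got {len(fields)}: {line!r}")
    subst = fields[0]
    return SubstPrediction(
        ref_residue=subst[0],
        position=int(subst[1:-1]),
        alt_residue=subst[-1],
        prediction=fields[1],
        sift_score=float(fields[2]),
        position_conservation=float(fields[3]),
        neighbor_conservation=float(fields[4]),
        median_conservation=float(fields[5]),
        ignored=fields[6],
        n_seqs_at_position=int(fields[7]),
        n_seqs_total=int(fields[8]),
    )


record = parse_prediction_line(
    "P1395L INTOLERANT 0.03 3.17 2.06 2.862 1.200 15 28"
)
# True when the 6th field is within the cutoff described above.
within_cutoff = record.median_conservation <= MEDIAN_CONSERVATION_CUTOFF
```

For the example line, record.position is 1395 and within_cutoff is True, since 2.862 is below 3.25. The 7th field is stored unparsed because its meaning is not documented here.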

