
In Formula 1, p is the query protein submitted for prediction, and s(p) and f(p) are its calculated score and assigned family, respectively. f(p) may also be "non-transporter", which is decided by the threshold A. s(p) is the minimum similarity score over all transporters qij in the TCDB database, where i is the index of family Fi, in the range 1 ... m, and j is the index of the member within Fi, in the range 1 ... ki. The similarity score between a transporter qij and the protein p falls into one of three cases. In the first two cases, the TC family Fi to which qij belongs has at least k members, so the TMS of the family are checked against the TMS of the protein p (tms(p)), and p is discarded if the two conflict. The first case applies when the BLAST score between qij and p is within the BLAST threshold t; a weighted score is used in this case. The second case applies when the BLAST score is beyond the threshold t, both for qij and for every other member of Fi; the HMM score of p against the family then substitutes for the scores of all members of the family. In the last case, the family Fi has fewer than k members and only the BLAST score is checked; p is discarded if the BLAST score is beyond the threshold. Among the components of s(p), MDLi denotes the HMM model of family Fi, if one exists, pi is the expected number of TMS in Fi, and si is the standard deviation of the TMS number in the family.
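The three cases can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details that the description leaves open are filled with labeled assumptions: scores are treated as E-value-like (lower is better, so "within the threshold" means at most t and "non-transporter" means s(p) greater than A); a TMS conflict is taken to mean that tms(p) lies more than `tol` standard deviations from the expected TMS number of the family; the weighting in the first case is passed in as a function and defaults to the identity; and each family is summarized by its best-scoring member, which gives the same minimum as scoring every qij only if the weighting is monotone. The names `Family`, `tms_conflict`, `family_score` and `classify` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

DISCARD = math.inf  # a discarded comparison can never be the minimum


@dataclass
class Family:
    name: str
    blast_scores: List[float]   # BLAST score of p against each member q_ij
    hmm_score: Optional[float]  # score of p against MDL_i, None if no model exists
    tms_mean: float             # expected TMS number of the family (p_i in the text)
    tms_std: float              # standard deviation of the TMS number (s_i in the text)


def tms_conflict(tms_p: int, fam: Family, tol: float = 2.0) -> bool:
    # Assumed rule: conflict if tms(p) is more than `tol` deviations from the mean.
    return abs(tms_p - fam.tms_mean) > tol * fam.tms_std


def family_score(fam: Family, tms_p: int, k: int, t: float,
                 weight: Callable[[float, Family], float]) -> float:
    best_blast = min(fam.blast_scores)  # assumes at least one member per family
    if len(fam.blast_scores) >= k:
        if tms_conflict(tms_p, fam):
            return DISCARD
        if best_blast <= t:
            return weight(best_blast, fam)   # case 1: weighted BLAST score
        # case 2: every member is beyond t, so fall back to the family HMM
        return fam.hmm_score if fam.hmm_score is not None else DISCARD
    # case 3: small family, BLAST score only
    return best_blast if best_blast <= t else DISCARD


def classify(families: List[Family], tms_p: int, k: int, t: float, a: float,
             weight: Callable[[float, Family], float] = lambda s, fam: s
             ) -> Tuple[float, str]:
    scored = [(family_score(f, tms_p, k, t, weight), f.name) for f in families]
    s_p, name = min(scored, default=(DISCARD, ""))
    if s_p > a:
        return s_p, "non-transporter"
    return s_p, name
```

For example, with k = 3 and t = A = 1e-3, a family of three members whose best BLAST score is 1e-30 and whose expected TMS number matches tms(p) is scored through the first case, whereas the same family with all BLAST scores above t is scored by its HMM score through the second case.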