Window length 2n+1.

Posted: 24-05-16 07:44

To investigate the position-specific amino acid composition around the carboxylation sites, a positional weighted matrix (PWM) is determined from the non-homologous positive set of training data [32]. The PWM specifies the frequency with which each amino acid occurs at each position of a fragment. A matrix of m × (2n+1) elements, where 2n+1 is the window length and m comprises 21 elements representing the 20 amino acids and one terminal signal, is used to encode each fragment sequence in the training data.

Amino acids that undergo post-translational modification have been reported to be exposed on the surface of a protein [19]. Thus, the accessible surface area (ASA) surrounding the carboxylation sites is considered. Because almost all of the experimentally verified carboxylated proteins lack a corresponding tertiary structure in PDB, RVP-Net [21,22] is used to calculate the ASA value, which is the percentage of the solvent-accessible area of each amino acid in a protein sequence. RVP-Net is a tool that predicts residue ASA values from neighborhood information and yields a mean absolute difference of 18.0-19.5 between the predicted and experimental ASA values [22]. In this work, the full-length sequences of carboxylated proteins are submitted to RVP-Net to calculate the ASA value of every amino acid. The ASA values of the amino acids surrounding the carboxylation site are normalized to the range zero to one.

To investigate the secondary structures surrounding carboxylation sites, PSIPRED [23] is used to predict the secondary structure from a given protein sequence. PSIPRED applies two feed-forward neural networks to predict secondary structure using the results of PSI-BLAST (Position-Specific Iterated BLAST) [33].
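The window extraction, PWM, and ASA normalization described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the use of '-' as the terminal signal, and min-max scaling as the normalization are hypothetical choices that the text does not specify.

```python
from collections import Counter

# 20 amino acids plus one terminal signal; '-' as that signal is an assumption.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"


def extract_window(sequence, site, n):
    """Return the fragment of length 2n+1 centred on the 0-based index `site`.

    Positions that fall beyond either terminus are filled with '-'.
    """
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else "-"
        for i in range(site - n, site + n + 1)
    )


def build_pwm(fragments):
    """Return one dict per window position mapping each symbol to its frequency.

    Symbols outside ALPHABET (for example 'X') are ignored, so a column may
    sum to less than 1 when such symbols are present.
    """
    width = len(fragments[0])
    pwm = []
    for pos in range(width):
        counts = Counter(frag[pos] for frag in fragments)
        pwm.append({aa: counts.get(aa, 0) / len(fragments) for aa in ALPHABET})
    return pwm


def min_max_normalize(values):
    """Scale values linearly to [0, 1]; min-max scaling is an assumption."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

With n = 2, for example, `extract_window("MKEEAL", 0, 2)` pads the two positions before the N-terminus with the terminal signal. The resulting PWM has 2n+1 columns of 21 frequencies each, matching the m × (2n+1) layout in the text.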
PSIPRED 2.0 has been reported as the top-performing of 20 evaluated methods, achieving a mean Q3 score of 80.6 on a test set of 40 domains that have no significant similarity to PDB structures [34]. The full-length sequences of carboxylated proteins are submitted to PSIPRED to obtain the secondary structure of all amino acids. The PSIPRED output is encoded as "H" for helix, "E" for sheet, and "C" for coil. To transform these three symbols into numeric vectors, a three-dimensional binary vector is applied: helix (H) is encoded as "100," sheet (E) as "010," and coil (C) as "001".

Table 1 Statistics of experimentally verified carboxylation sites in training data and independent testing data

Data set | Number of carboxylated proteins | Number of carboxylated glutamate residues | Number of non-carboxylated glutamate residues
All data (UniProt release 15.0 and HPRD 8.0) | 134 | 463 | 854
Training data (nonhomologous) | 79 | 302 | 567
Independent testing data (nonhomologous) | 14 | 60 |

Lee et al. BMC Bioinformatics 2011, 12(Suppl 13):S10 http://www.biomedcentral.com/1471-2105/12/S13/S

Model learning and evaluation

In this study, a support vector machine (SVM) is employed to build predictive models that use the explored features. For binary classification, an SVM uses a kernel function to map samples into a higher-dimensional space and then determines a hyperplane that separates the two classes of samples with maximum margin and minimum error. LibSVM [35] is employed to generate a binary prediction model from the positive and negative training sets. The kernel function of SVM.
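The three-state binary encoding of secondary structure described earlier can be sketched as below. The function and dictionary names are illustrative assumptions; only the mapping of H, E, and C to 100, 010, and 001 comes from the text.

```python
# Mapping taken from the text: helix -> 100, sheet -> 010, coil -> 001.
SS_CODES = {"H": (1, 0, 0), "E": (0, 1, 0), "C": (0, 0, 1)}


def encode_secondary_structure(ss_string):
    """Flatten an H/E/C string into a binary vector, three bits per residue.

    Raises KeyError for any symbol other than H, E, or C.
    """
    vector = []
    for state in ss_string:
        vector.extend(SS_CODES[state])
    return vector
```

A window of 2n+1 residues therefore yields a vector of 3(2n+1) binary values, which could be concatenated with the other features before training.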
