Bioinformatic determination of dominant amino acid sequence and mutation hotspots in hepatitis B virus X protein
WANG Wei¹, HE Yonggang¹, PU Yonglan², PAN Shaokun¹,³, XIE Youhua¹
1. Key Laboratory of Medical Molecular Virology of Ministries of Education and Health, School of Basic Medical Sciences, Fudan University, Shanghai 200032, China; 2. Department of Infectious Diseases, Taicang First People’s Hospital, Taicang 215400, China; 3. Shanghai Keybiomed Technology Co., Ltd., Shanghai 201206, China
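Abstract: Hepatitis B virus X protein (HBx) is 154 amino acids long and is closely associated with the development of liver cancer. To determine the dominant amino acid sequence and mutation hotspots of HBx, all 13 950 HBx amino acid sequences in GenBank were downloaded. Sequences with insertions or deletions, or whose start codon did not encode methionine, were excluded, leaving 7 126 sequences. From these 7 126 sequences, the amino acid distribution at each HBx position was calculated. The most frequent amino acid at a position was defined as the dominant amino acid at that position, and all other amino acids were regarded as mutant amino acids. The dominant amino acids at the 154 positions together form the HBx dominant amino acid sequence. Thirty-two hotspot positions had a mutation rate >10.0%. Among them, positions 36, 42, 44, 87, 88 and 127 each carried more than four mutant forms (each with a mutation rate >1.0%) and were therefore highly polymorphic. The K130M/V131I double mutation, which is closely associated with liver cancer, had a frequency of 34.7%. Based on the homology of each of the 7 126 HBx sequences to the dominant sequence, 50 sequences were randomly selected (2 with <75% homology to the dominant sequence and 48 with 76%-99% homology) and used, together with 23 reference sequences and the dominant sequence, to construct a phylogenetic tree. The results showed that the dominant HBx sequence belonged to genotype C, consistent with genotype C being the predominant genotype worldwide. This study is the first systematic analysis of the dominant HBx sequence in GenBank. It identified 32 HBx mutation hotspots and 6 highly polymorphic positions, laying a foundation for basic and applied research based on HBx mutations.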
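The per-position procedure described above (filter, count residues at each position, take the most frequent residue as dominant, treat the rest as mutants) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: insertion/deletion mutants are approximated as any sequence whose length is not 154, a "non-methionine start" means the first residue is not `M`, ties between equally frequent residues go to the residue seen first (the source does not say how ties are handled), and "more than four mutant forms" is read as strictly greater than four. All function names are hypothetical.

```python
from collections import Counter

HBX_LENGTH = 154  # full-length HBx, as stated in the abstract


def filter_sequences(seqs):
    """Keep sequences that are exactly 154 residues long and start with Met.

    Assumption: indel mutants are detected only by a length other than 154.
    """
    return [s for s in seqs if len(s) == HBX_LENGTH and s.startswith("M")]


def position_profiles(seqs):
    """Count residues column by column; ungapped equal-length sequences line up by index."""
    return [Counter(column) for column in zip(*seqs)]


def dominant_sequence(profiles):
    """Join the most frequent residue at each position (ties: first residue seen)."""
    return "".join(p.most_common(1)[0][0] for p in profiles)


def mutation_rates(profiles):
    """Mutation rate = fraction of sequences NOT carrying the dominant residue."""
    rates = []
    for p in profiles:
        total = sum(p.values())
        rates.append(1 - p.most_common(1)[0][1] / total)
    return rates


def hotspots(rates, threshold=0.10):
    """1-based positions whose mutation rate exceeds the threshold (10.0% here)."""
    return [i + 1 for i, r in enumerate(rates) if r > threshold]


def polymorphic_positions(profiles, min_forms=4, min_rate=0.01):
    """1-based positions with more than `min_forms` mutant residues, each above `min_rate`."""
    result = []
    for i, p in enumerate(profiles):
        total = sum(p.values())
        dominant = p.most_common(1)[0][0]
        forms = [aa for aa, n in p.items() if aa != dominant and n / total > min_rate]
        if len(forms) > min_forms:
            result.append(i + 1)
    return result
```

Applied to the 7 126 filtered sequences, `dominant_sequence` would yield a 154-residue string and `hotspots` the list of positions above the 10.0% cutoff; the thresholds are exposed as parameters so that alternative readings of the cutoffs can be checked.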
Abstract: Hepatitis B virus X protein (HBx), a 154-amino-acid protein encoded by hepatitis B virus (HBV), has been implicated in the development of hepatocellular carcinoma (HCC). The aim of the present study was to establish a complete, dominant peptide sequence of HBx. A total of 13 950 HBx protein sequences were retrieved from GenBank. After excluding sequences that harbored insertions or deletions or did not begin with methionine, 7 126 sequences were included in the study. The occurrence frequency of each of the 20 amino acids at every HBx position was calculated. The amino acid with the highest frequency at a position was designated the dominant amino acid for that position, and the dominant residues at all 154 positions together constituted the complete dominant peptide sequence of HBx. All non-dominant amino acids were regarded as mutants. Thirty-two hotspots with a mutation rate above 10.0% were detected. Positions 36, 42, 44, 87, 88, and 127 showed higher polymorphism, each having more than four kinds of mutations with a rate above 1.0%. The K130M/V131I double mutation, regarded as a high-risk factor for HBV-related HCC, occurred at a frequency of 34.7% in this study. The homology between the dominant sequence and each of the 7 126 HBx sequences was calculated, and 50 sequences, comprising 2 with less than 75% homology to the dominant sequence and 48 with 76%-99% homology, were chosen for phylogenetic tree construction. A phylogenetic tree built from the dominant HBx sequence, the 50 selected HBx sequences, and 23 reference sequences showed that the dominant HBx sequence belonged to genotype C. This study systematically constructed a calculated HBx peptide sequence with the dominant amino acid at each position, laying the foundation for basic and applied research on HBx.
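The double-mutation frequency and the homology-based selection of sequences for the phylogenetic tree reduce to simple per-sequence checks. The sketch below is a hedged illustration, not the authors' method: "homology" is approximated as plain percent identity over the ungapped 154-residue comparison, the two selection bins are taken literally as below 75% and 76% to 99%, and a fixed random seed is used so the draw is reproducible. Tree construction itself is not shown, and all function names are hypothetical.

```python
import random


def double_mutation_frequency(seqs, mutations=((130, "M"), (131, "I"))):
    """Fraction of sequences carrying every listed mutant residue.

    Positions are 1-based; the default encodes K130M/V131I.
    """
    def carries_all(seq):
        return all(seq[pos - 1] == aa for pos, aa in mutations)

    return sum(carries_all(s) for s in seqs) / len(seqs)


def percent_identity(reference, seq):
    """Percent identity over an ungapped, equal-length comparison (assumed proxy for homology)."""
    if len(reference) != len(seq):
        raise ValueError("sequences must have equal length")
    matches = sum(a == b for a, b in zip(reference, seq))
    return 100.0 * matches / len(reference)


def select_for_tree(reference, seqs, n_low=2, n_high=48, seed=0):
    """Randomly pick n_low sequences below 75% identity and n_high between 76% and 99%."""
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    scored = [(percent_identity(reference, s), s) for s in seqs]
    low = [s for pid, s in scored if pid < 75.0]
    high = [s for pid, s in scored if 76.0 <= pid <= 99.0]
    # random.sample raises ValueError if a bin holds fewer sequences than requested
    return rng.sample(low, n_low) + rng.sample(high, n_high)
```

Here `reference` would be the dominant sequence; the selected sequences, the dominant sequence, and genotype reference sequences would then be passed to a separate alignment and tree-building tool.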
WANG Wei, HE Yonggang, PU Yonglan, PAN Shaokun, XIE Youhua. Bioinformatic determination of dominant amino acid sequence and mutation hotspots in hepatitis B virus X protein. Journal of Microbes and Infections, 2016, 11(6): 338-346.