CN102268485A - Method for carrying out interaction between virus and human protein by text mining - Google Patents
- Publication number
- CN102268485A
- Authority
- CN
- China
- Prior art keywords
- virus
- human
- protein
- transcription factor
- human protein
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)
Abstract
The invention provides a method for studying the interaction between a virus and human proteins using literature mining. The method comprises the following main steps: step 1, constructing a database of interactions between the virus and human proteins by literature mining; step 2, analyzing the interactions between the virus and human proteins using the database constructed in step 1, and extracting transcription factor genes; step 3, performing protein prediction for the transcription factors to predict the regulated proteins and the human genes being regulated; step 4, experimentally verifying the expression regulation relationship between the virus and the predicted human proteins. The method is characterized by using literature mining to screen for human transcription factor genes regulated by viral proteins, and by experimentally verifying the indirect expression regulation relationship between the virus and human proteins.
Description
Technical field
The invention belongs to the field of biotechnology and relates to a method for studying the interaction between viruses and human proteins using literature mining.
Background technology
A virus is an acellular life form consisting of a long nucleic acid chain and a protein coat. It has no metabolic machinery and no enzyme system of its own. Once it leaves the host cell, a virus is therefore merely a chemical substance that shows no life activity and cannot reproduce independently. After entering a host cell, it uses the cell's matter and energy, together with the cell's capacity for replication, transcription, and translation, to produce a new generation of viruses identical to itself according to the genetic information contained in its nucleic acid.
One notable characteristic of viruses is their capacity to cause tumors. Some viruses induce benign tumors, such as rabbit fibroma virus and molluscum contagiosum virus of the Poxviridae and the papillomaviruses of the Papovaviridae; others induce malignant tumors and are divided by nucleic acid type into DNA tumor viruses and RNA tumor viruses. A virus by itself is metabolically inactive but infectious, and it is absolutely dependent on cells: outside the cell, it carries its genetic material and retains infectivity; inside a host cell, it replicates, translates, and expresses its own nucleic acid and infects the cell. For the host cell, viral invasion often changes the protein expression pattern and suppresses the expression of host genes. This suppression affects the normal physiological function of the host cell and determines the course and outcome of viral pathogenesis.
Current research shows that viruses suppress and regulate human proteins in many different ways. The present invention proposes one specific hypothesis: a virus, through proteins expressed from its own genes, regulates the expression of human transcription factors, and these specifically expressed transcription factors in turn regulate the expression of human proteins. This is one form of viral protein regulation.
Summary of the invention
The method of the present invention addresses the interaction between viruses and human proteins, and uses literature mining to identify the mechanisms by which a virus regulates human proteins. We therefore propose a hypothesis of viral protein regulation: rather than regulating a human target protein directly, a virus may regulate the expression of the target protein indirectly by regulating the relevant transcription factors. To test this hypothesis, we designed the following procedure:
1. Use literature mining to build an interaction database of the virus and human genes.
2. Use this database to analyze the interactions between the virus and human proteins, and extract the transcription factor genes.
3. Predict the proteins regulated by the transcription factors; the predicted regulated proteins are considered to interact indirectly with the virus.
4. Experimentally verify the expression regulation relationship between the virus and the regulated human genes.
Description of drawings
Fig. 1 is a flow chart of the method of the invention.
Fig. 2 shows the interactions between HBV and human protein genes identified by literature mining.
Fig. 3 shows the western blot result verifying the expression regulation relationship between HBV and the human protein gene IFNAR2.
Embodiment
The method of the invention is illustrated below using the interaction between HBV (hepatitis B virus) and human proteins as an example.
Step 1: build the interaction database of HBV and human genes. Literature on the interactions between HBV and human genes is downloaded from the PubMed abstract database and used to build the database. The sub-steps are:
1) Document searching & formatting: search the literature by keyword and convert the retrieved documents to XML format.
2) Sentence tokenization using LingPipe: use the sentence tokenization tool in the LingPipe toolkit to split each abstract into individual sentences. All subsequent analysis takes the sentence as its basic unit.
3) Human gene mention tagging using ABNER: use the ABNER software to locate the mentions of human genes in the text and extract them.
4) Conjunction resolution: resolve conjoined gene mentions among the extracted mentions; for example, "STAT3/5 gene" is resolved into "STAT3 gene" and "STAT5 gene".
5) Gene name normalization based on the Entrez database: because gene naming in free text is inconsistent, every gene mention in an article is mapped to its official gene symbol to simplify analysis and comparison. Official gene symbols follow the NCBI Entrez Gene database.
6) Verb tagging using LingPipe and an in-house protein-protein interaction verb dictionary: first build a dictionary of protein-protein interaction verbs, including repress, regulate, inhibit, interact, phosphorylate, downregulate, upregulate, and their variants. The dictionary is drawn from the BioNLP project (http://bionlp.sourceforge.net/). Then use the LingPipe toolkit to identify the interaction verbs in each sentence. The literature database is built from the results.
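Sub-steps 2), 4), 5) and 6) can be illustrated with a minimal Python sketch. It is not the LingPipe or ABNER pipeline described above: the regular expressions are naive stand-ins for those tools, and the alias table, the fallback rule in `normalize_symbol`, and all function names are assumptions made for illustration only.

```python
import re

# Hypothetical alias table; the patent normalizes against NCBI Entrez Gene.
ALIAS_TO_SYMBOL = {"p53": "TP53", "tp53": "TP53", "stat3": "STAT3"}

# Stems of the interaction verbs listed in sub-step 6).
VERB_STEMS = ("repress", "regulat", "inhibit", "interact",
              "phosphorylat", "downregulat", "upregulat")
VERB_RE = re.compile(r"\b(?:" + "|".join(VERB_STEMS) + r")\w*", re.IGNORECASE)


def split_sentences(abstract):
    """Naive stand-in for a sentence tokenizer: split after ., ! or ?
    when followed by whitespace and an uppercase letter or digit."""
    parts = re.split(r"(?<=[.!?])\s+(?=[A-Z0-9])", abstract)
    return [p.strip() for p in parts if p.strip()]


def resolve_conjunction(mention):
    """Expand a slash-conjoined mention, e.g. 'STAT3/5' -> STAT3, STAT5."""
    match = re.fullmatch(r"([A-Za-z-]+?)(\d+(?:/\d+)+)", mention)
    if not match:
        return [mention]
    stem, numbers = match.groups()
    return [stem + number for number in numbers.split("/")]


def normalize_symbol(mention):
    """Map a mention to an official symbol; the uppercase fallback is an
    assumption of this sketch, not part of the described method."""
    return ALIAS_TO_SYMBOL.get(mention.lower(), mention.upper())


def tag_verbs(sentence):
    """Return interaction verbs found in a sentence. Stem matching is crude:
    it would also match nouns such as 'interaction'."""
    return [m.group(0).lower() for m in VERB_RE.finditer(sentence)]


if __name__ == "__main__":
    text = "HBx inhibits p53. It also upregulates STAT3/5 expression."
    for sentence in split_sentences(text):
        print(sentence, tag_verbs(sentence))
    print([normalize_symbol(g) for g in resolve_conjunction("STAT3/5")])
```

A production pipeline would replace each function with the corresponding tool (sentence model, gene mention tagger, Entrez Gene lookup) while keeping the same order of operations.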
Step 2: use the database to analyze the interactions between HBV and human transcription factors. First, compile an HBV protein synonym dictionary (compiled from the Entrez Gene database); then use the LingPipe toolkit to parse the sentences and obtain the mentions of HBV proteins; finally, perform a statistical analysis of the database. The analysis shows that HBV can up-regulate the activity of AP-1, NF-kappaB, and other factors, and suppress the activity of p53, E2F1, and other factors.
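The statistical analysis in step 2 might look like the following sketch, which counts, per transcription factor, how many sentences report up- or down-regulation by an HBV protein. The synonym entries, the grouping of verbs into "up" and "down", the record layout, and the toy sentences are all assumptions for illustration; they are not data from the patent.

```python
from collections import Counter

# Hypothetical synonym dictionary; the patent compiles one from Entrez Gene.
HBV_SYNONYMS = {"hbx": "HBx", "hbv x protein": "HBx", "hbsag": "HBsAg"}

# Assumed grouping of the interaction verbs by direction of regulation.
UP_STEMS = ("upregulat",)
DOWN_STEMS = ("downregulat", "repress", "inhibit")


def find_hbv_protein(sentence):
    """Return the canonical HBV protein mentioned in the sentence, if any."""
    lowered = sentence.lower()
    for synonym, canonical in HBV_SYNONYMS.items():
        if synonym in lowered:
            return canonical
    return None


def direction(verb):
    """Classify an interaction verb as 'up', 'down', or None (undirected)."""
    lowered = verb.lower()
    if lowered.startswith(UP_STEMS):
        return "up"
    if lowered.startswith(DOWN_STEMS):
        return "down"
    return None


def summarize(records, transcription_factors):
    """Count (gene, direction) pairs over (sentence, verb, gene) records,
    keeping only sentences that mention an HBV protein."""
    counts = Counter()
    for sentence, verb, gene in records:
        verb_direction = direction(verb)
        if (find_hbv_protein(sentence) and verb_direction
                and gene in transcription_factors):
            counts[(gene, verb_direction)] += 1
    return counts


if __name__ == "__main__":
    # Toy records, invented for this example.
    toy_records = [
        ("HBx upregulates AP-1 activity.", "upregulates", "AP-1"),
        ("HBx inhibits p53.", "inhibits", "TP53"),
        ("HBx represses p53-dependent transcription.", "represses", "TP53"),
        ("Insulin upregulates AP-1.", "upregulates", "AP-1"),
    ]
    print(summarize(toy_records, {"AP-1", "TP53"}))
```

Counting at the sentence level mirrors the choice in step 1 to treat the sentence as the basic unit of analysis.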
Step 3: predict the proteins regulated by the transcription factors obtained in step 2. For every human gene, the 3000 bp (base pair) sequence upstream of the gene is extracted, and the genes regulated by these transcription factors are predicted with a position weight matrix (PWM) algorithm using the m2transfac 1.0 software (http://www.gene-regulation.com/pub/programs.html).
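As an illustration of PWM-based scanning, the sketch below converts a matrix of base counts into log-odds scores and slides it along an upstream sequence. It is a generic textbook formulation, not the m2transfac implementation; the pseudocount, the uniform background, the threshold, and the toy motif are assumptions, and only the forward strand is scanned.

```python
import math

BASES = "ACGT"


def log_odds_pwm(counts, pseudocount=1.0, background=0.25):
    """Convert per-position base counts into log2 odds against a uniform
    background. `counts` is a list of dicts such as {"A": 8, "C": 0, ...}."""
    pwm = []
    for column in counts:
        total = sum(column.get(base, 0) for base in BASES) + 4 * pseudocount
        pwm.append({
            base: math.log2(
                ((column.get(base, 0) + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every forward-strand window whose
    summed log-odds score reaches the threshold."""
    width = len(pwm)
    sequence = sequence.upper()
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in BASES for base in window):
            continue  # skip windows containing N or other ambiguity codes
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


if __name__ == "__main__":
    # Toy 4-column count matrix favouring the motif TGAC (invented example).
    toy_counts = [{"T": 8}, {"G": 8}, {"A": 8}, {"C": 8}]
    pwm = log_odds_pwm(toy_counts)
    for start, window, score in scan("AATGACGG", pwm, threshold=4.0):
        print(start, window, round(score, 2))
```

In the setting described above, `sequence` would be the 3000 bp upstream region of one human gene, and a gene with at least one hit would be reported as a candidate target of the transcription factor.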
Step 4: experimental verification. From the predicted regulated human proteins, the protein expressed by the gene IFNAR2 was selected for western blot verification of its expression regulation relationship with HBV. The result shows an indirect regulatory relationship between HBV and the human protein IFNAR2.
The above describes the invention and does not limit it; other embodiments based on the inventive concept all fall within the scope of protection of the invention.
Claims (1)
1. A method for studying the interaction between a virus and human proteins using literature mining, characterized in that it comprises:
Step 1: building an interaction database of the virus and human genes using literature mining;
Step 2: analyzing the interactions between the virus and human proteins using the database built in step 1, and extracting transcription factor genes;
Step 3: performing protein prediction for the transcription factors to predict the regulated proteins and the regulated human genes;
Step 4: experimentally verifying the expression regulation relationship between the virus and the predicted human proteins.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010101931303A CN102268485A (en) | 2010-06-04 | 2010-06-04 | Method for carrying out interaction between virus and human protein by text mining |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102268485A (en) | 2011-12-07 |
Family
ID=45050938
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2010101931303A Pending CN102268485A (en) | 2010-06-04 | 2010-06-04 | Method for carrying out interaction between virus and human protein by text mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102268485A (en) |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1660890A (en) * | 2000-03-10 | 2005-08-31 | Daiichi Pharmaceutical Co., Ltd. | Method for predicting protein-protein interaction |
Non-Patent Citations (4)
Title |
---|
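Foreign Medical Sciences (Microbiology Section), 2000-06-30, Cheng Li et al., "Effect of the human T-cell leukemia virus type I TAX protein on the NF-kappaB transcription factor", abstract; page 12, right column, last paragraph; 1; no. 6 * |
B CHASSEY ET AL: "Hepatitis C virus infection protein network", MOLECULAR SYSTEMS BIOLOGY, no. 230, 4 November 2008 (2008-11-04) * |
Sun Jingchun et al.: "Analysis and application of large-scale protein-protein interaction data", Chinese Science Bulletin, no. 19, 31 October 2005 (2005-10-31), pages 2055 - 2060 * |
Cheng Li et al.: "Effect of the human T-cell leukemia virus type I TAX protein on the NF-κB transcription factor", Foreign Medical Sciences (Microbiology Section), no. 6, 30 June 2000 (2000-06-30), pages 12 - 1 * |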
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Zhao et al. | IRESbase: a comprehensive database of experimentally validated internal ribosome entry sites | |
Sun et al. | CircCode: a powerful tool for identifying circRNA coding ability | |
Sacco et al. | Recent insights and novel bioinformatics tools to understand the role of microRNAs binding to 5'untranslated region | |
Petsalaki et al. | PredSL: a tool for the N-terminal sequence-based prediction of protein subcellular localization | |
Ausländer et al. | Programmable single-cell mammalian biocomputers | |
Zhan et al. | Accurate prediction of ncRNA-protein interactions from the integration of sequence and evolutionary information | |
Farahmand et al. | ModEx: A text mining system for extracting mode of regulation of transcription factor-gene regulatory interaction | |
Seoud et al. | TMT-HCC: A tool for text mining the biomedical literature for hepatocellular carcinoma (HCC) biomarkers identification | |
Saha et al. | VGIchan: prediction and classification of voltage-gated ion channels | |
CN102268485A (en) | Method for carrying out interaction between virus and human protein by text mining | |
Dong et al. | Antimicrobial Peptides Prediction method based on sequence multidimensional feature embedding | |
Ji | Rfoot: Transcriptome‐scale identification of RNA‐protein complexes from ribosome profiling data | |
Washietl et al. | Identifying structural noncoding RNAs using RNAz | |
Saha et al. | An overview of bioinformatics and computational genomics in modern plant science | |
de Pretis et al. | Computational and experimental methods to decipher the epigenetic code | |
Li et al. | Feature selection for the prediction of translation initiation sites | |
Stalder et al. | SWATH-MS co-expression profiles reveal paralogue interference in protein complex evolution | |
Liang et al. | LncRNAs: Current understanding, future directions, and challenges | |
Asim | An Efficient Automated Machine Learning Framework for Genomics and Proteomics Sequence Analysis | |
Xiao et al. | Using cellular automata to simulate domain evolution in proteins | |
Hu et al. | LncSTPred: a predictive model of lncRNA subcellular localization and decipherment of the biological determinants influencing localization | |
Kongburan et al. | Metabolite named entity recognition: A hybrid approach | |
Djordjevic et al. | Scoring targets of transcription in bacteria rather than focusing on individual binding sites | |
Sokolov | Biomarkers for depression: genetic, epigenetic, and expression evidence | |
Tao | Studies on the network biology impact in pharmacology and toxicology |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20111207 |