CN102268485A

CN102268485A - Method for studying interactions between viruses and human proteins by text mining

Info

Publication number: CN102268485A
Application number: CN2010101931303A
Authority: CN
Inventors: 曾华宗
Original assignee: SHANGHAI CLUSTER BIOTECH CO Ltd
Current assignee: SHANGHAI CLUSTER BIOTECH CO Ltd
Priority date: 2010-06-04
Filing date: 2010-06-04
Publication date: 2011-12-07

Abstract

The invention provides a method for studying the interaction between a virus and human proteins by literature mining. The method comprises the following main steps: step 1, constructing a database of interactions between a virus and human proteins by literature mining; step 2, analyzing the interactions between the virus and the human proteins using the database constructed in step 1, and extracting the transcription factor genes; step 3, carrying out protein prediction on the transcription factors to predict the regulated proteins and the human genes being regulated; step 4, verifying by experiment the expression-regulation relationship between the virus and the predicted human proteins. The method is characterized by introducing literature mining to screen the human transcription factor genes regulated by viral proteins, and by verifying through experiments the indirect expression-regulation relationship between the virus and human proteins.

Description

A method for studying the interaction between viruses and human proteins using text mining

Technical field

The invention belongs to the field of biotechnology and relates to a method for studying the interaction between viruses and human proteins using literature mining.

Background art

A virus is an acellular life form consisting of a long nucleic acid chain and a protein coat. A virus has no metabolic machinery of its own and no enzyme system. Once it leaves the host cell, it therefore becomes a chemical substance without any vital activity that cannot replicate independently. Once it enters a host cell, it can use the matter and energy in the cell, together with the cell's capacity for replication, transcription, and translation, to produce a new generation of viruses identical to itself according to the genetic information contained in its nucleic acid.

One of the principal characteristics of viruses is oncogenicity. Some viruses can induce benign tumors, such as rabbit fibroma virus and molluscum contagiosum virus of the Poxviridae and the papillomaviruses of the Papovaviridae; others can induce malignant tumors and can be divided by nucleic acid type into DNA tumor viruses and RNA tumor viruses. A virus shows no metabolic activity of its own, is infectious, and depends absolutely on cells. This manifests as follows: a virus outside the cell carries its genetic activity but only retains its infectivity; once the virus is inside the host cell, it replicates, translates, and expresses its own nucleic acid and infects cells. For the host cell, viral invasion tends to change the host's protein expression pattern and to suppress the expression of host genes. This suppression affects the normal physiological function of the host cell and determines the course and outcome of the viral disease.

Current research shows that viruses suppress and regulate human proteins in several ways. The present invention specifically proposes one hypothesis: a virus expresses proteins from its own genes, these proteins regulate the expression of human transcription factors, and the specifically expressed transcription factors in turn regulate the expression of human proteins. This is one form of viral regulation of proteins.

Summary of the invention

The problem mainly studied by the method of the present invention is the interaction between viruses and human proteins, and the method uses literature mining to find the mechanism by which a virus regulates human proteins. We therefore propose a hypothesis of viral protein regulation: a virus need not regulate a human target protein directly, but can instead regulate the expression of the target protein indirectly by regulating the relevant transcription factors. To this end we designed the following flow:

1. Use literature mining to build an interaction database of the virus and human genes.

2. Use this database to analyze the interactions between the virus and human proteins, and extract the transcription factor genes from the results.

3. Carry out regulated-protein prediction for the transcription factors; the predicted regulated proteins are considered to interact indirectly with the virus.

4. Verify by experiment the expression-regulation relationship between the virus and the human regulated genes.
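The hypothesis behind this flow can be read as the composition of two relations: virus protein to transcription factor (from steps 1 and 2) and transcription factor to regulated gene (from step 3). The following is a minimal sketch of that composition only; the function name and the placeholder identifiers in the toy data are assumptions made for illustration and do not come from the invention.

```python
from collections import defaultdict


def infer_indirect_targets(virus_to_tf, tf_to_gene):
    """Compose virus->TF and TF->gene relations into indirect virus->gene links.

    Returns a dict mapping each gene to the set of (viral_protein, tf) pairs
    through which the virus could regulate it indirectly.
    """
    indirect = defaultdict(set)
    for viral_protein, tfs in virus_to_tf.items():
        for tf in tfs:
            for gene in tf_to_gene.get(tf, ()):
                indirect[gene].add((viral_protein, tf))
    return dict(indirect)


# Hypothetical placeholder data, for illustration only (not real findings).
virus_to_tf = {"VIRAL_PROTEIN_X": {"TF_A", "TF_B"}}
tf_to_gene = {"TF_A": {"GENE_1"}, "TF_B": {"GENE_1", "GENE_2"}}

candidates = infer_indirect_targets(virus_to_tf, tf_to_gene)
for gene, routes in sorted(candidates.items()):
    print(gene, sorted(routes))
```

Each candidate produced this way is only a prediction; in the flow above it still has to be confirmed by the experiment in step 4.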

Description of drawings

Fig. 1 is a flow chart of the implementation of the method of the invention.

Fig. 2 shows the interactions between HBV and human protein genes found using literature mining.

Fig. 3 shows the result of the western blot experiment verifying the expression-regulation relationship between HBV and the human protein gene IFNAR2.

Embodiment

The specific implementation of the method of the invention is described below, taking the interaction between HBV (hepatitis B virus) and human proteins as an example.

Step 1: construct the interaction database of HBV and human genes. Literature on the interactions between HBV and human genes is downloaded from the PubMed abstract database and used to build the database. The steps are:

1) Document searching & formatting: the literature is searched by keyword and the retrieved documents are organized into XML format.

2) Sentence tokenization using LingPipe: the sentence tokenization tool in the LingPipe toolkit is used to split the abstract text into single sentences. All subsequent analysis takes the sentence as its basic unit.

3) Human gene mention tagging using ABNER: the ABNER software is used to locate mentions of human genes in the text, and the human genes are extracted.

4) Conjunction resolution: conjoined forms in the extracted gene mentions are resolved; for example, "STAT3/5 gene" is resolved into "STAT3 gene" and "STAT5 gene".

5) Gene name normalization based on the Entrez database: because gene naming in free text is rather inconsistent, the gene mentions in the articles need to be unified into official gene symbols to make analysis and comparison easier. The gene symbols follow the Entrez Gene database of NCBI.

6) Verb tagging using LingPipe and an in-house protein-protein interaction verb dictionary: a verb dictionary of protein-protein interactions is built first, containing verbs such as repress, regulate, inhibit, interact, phosphorylate, downregulate, and upregulate, together with their variant forms. The dictionary is drawn from the BioNLP project (http://bionlp.sourceforge.net/). The LingPipe toolkit is then used to pick out the protein-interaction verbs in each sentence, and the literature database is built from these results.
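Sub-steps 2), 4), 5), and 6) above can be sketched as follows. This is a simplified illustration only: plain regular expressions stand in for the LingPipe tools, ABNER mention tagging is not reproduced, and the alias table, the stem list, and all function names are assumptions made for this sketch rather than part of the described implementation. The example sentences are invented to exercise the code and are not findings.

```python
import re

# Hypothetical alias table standing in for an Entrez Gene symbol lookup.
ALIAS_TO_SYMBOL = {"p53": "TP53", "tp53": "TP53", "stat3": "STAT3"}

# Stems for the interaction verbs listed in sub-step 6), mapped to base forms.
VERB_STEMS = {
    "repress": "repress", "regulat": "regulate", "inhibit": "inhibit",
    "interact": "interact", "phosphorylat": "phosphorylate",
    "downregulat": "downregulate", "upregulat": "upregulate",
}
VERB_RE = re.compile(r"\b(" + "|".join(VERB_STEMS) + r")\w*", re.IGNORECASE)
CONJ_RE = re.compile(r"^([A-Za-z]+)(\d+(?:/\d+)+)(\s+\w+)?$")


def split_sentences(text):
    """Naive sentence splitter (a stand-in for LingPipe, not its algorithm)."""
    return re.split(r"(?<=[.!?])\s+(?=[A-Z])", text.strip())


def resolve_conjunction(mention):
    """Expand a mention such as 'STAT3/5 gene' into one mention per member."""
    m = CONJ_RE.match(mention)
    if not m:
        return [mention]
    stem, numbers, suffix = m.group(1), m.group(2), m.group(3) or ""
    return [f"{stem}{n}{suffix}" for n in numbers.split("/")]


def normalize(mention):
    """Map a free-text mention to an official symbol, or None if unknown."""
    key = re.sub(r"\s+(gene|protein)$", "", mention.strip(), flags=re.IGNORECASE)
    return ALIAS_TO_SYMBOL.get(key.lower())


def tag_verbs(sentence):
    """Return the base forms of the interaction verbs found in a sentence."""
    return [VERB_STEMS[m.group(1).lower()] for m in VERB_RE.finditer(sentence)]


# Invented example text, for illustration only.
text = "HBV inhibits p53 gene expression. HBV upregulates STAT3/5 gene transcription."
for sentence in split_sentences(text):
    print(sentence, tag_verbs(sentence))
print(resolve_conjunction("STAT3/5 gene"))
print(normalize("p53 gene"))
```

Stem matching keeps the dictionary short but also accepts nominal forms such as "regulation"; a part-of-speech filter would be needed to restrict hits to true verbs.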

Step 2: use the above database to analyze the interactions between HBV and human transcription factors. First, an HBV protein synonym dictionary is compiled (from the Entrez Gene database). The LingPipe toolkit is then used to parse the sentences and obtain the mentions of HBV proteins, and a statistical analysis is carried out on the database. The analysis shows that HBV can up-regulate the activity of AP-1, NF-kappaB, and others, and suppress the activity of p53, E2F1, and others.
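The statistical analysis in step 2 can be pictured as counting, over sentences that mention HBV, how often each transcription factor co-occurs with each interaction verb. The sketch below assumes a record layout (sentence text, normalized gene symbols, verb base forms) and a synonym set that are hypothetical; the source does not specify the statistic actually used. The toy records merely echo the up-regulation of AP-1 and suppression of p53 reported above.

```python
from collections import Counter

# Hypothetical synonym set; the source compiles its dictionary from Entrez Gene.
HBV_SYNONYMS = {"hbv", "hepatitis b virus"}


def count_interactions(records, synonyms, transcription_factors):
    """Count (transcription factor, verb) pairs in sentences mentioning the virus."""
    counts = Counter()
    for rec in records:
        text = rec["sentence"].lower()
        if not any(s in text for s in synonyms):
            continue  # sentence does not mention the virus
        for gene in rec["genes"]:
            if gene not in transcription_factors:
                continue  # keep transcription factor genes only
            for verb in rec["verbs"]:
                counts[(gene, verb)] += 1
    return counts


# Toy records in an assumed layout, for illustration only.
records = [
    {"sentence": "HBV inhibits p53 activity.", "genes": ["TP53"], "verbs": ["inhibit"]},
    {"sentence": "Hepatitis B virus inhibits p53.", "genes": ["TP53"], "verbs": ["inhibit"]},
    {"sentence": "HBV upregulates AP-1.", "genes": ["AP-1"], "verbs": ["upregulate"]},
    {"sentence": "Another factor regulates p53.", "genes": ["TP53"], "verbs": ["regulate"]},
]

counts = count_interactions(records, HBV_SYNONYMS, {"TP53", "AP-1", "E2F1"})
for (gene, verb), n in counts.most_common():
    print(gene, verb, n)
```

Plain substring matching on synonyms is a simplification; short synonyms can match inside unrelated words, so a real dictionary lookup would work on token boundaries.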

Step 3: regulated-protein prediction is carried out for the transcription factors obtained in step 2. The main approach is to extract, for every human gene, the 3000 bp (base pair) sequence upstream of the gene and to predict the genes regulated by these transcription factors with the PWM (position weight matrix) algorithm, using the m2transfac 1.0 software (http://www.gene-regulation.com/pub/programs.html).
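A position weight matrix scan of an upstream sequence can be sketched as follows. This is a generic log-odds PWM scan written for illustration; it is not the m2transfac implementation, and the motif counts, pseudocount, uniform background, threshold, and function names are all assumptions. The 4-position motif is a toy matrix, not a real transcription factor matrix.

```python
import math

BASES = "ACGT"


def upstream(sequence, tss, length=3000):
    """Return up to `length` bases upstream of position `tss` (plus strand only)."""
    return sequence[max(0, tss - length):tss]


def log_odds_matrix(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores against a uniform background."""
    matrix = []
    for column in counts:
        total = sum(column[b] for b in BASES) + 4 * pseudocount
        matrix.append({
            b: math.log2(((column[b] + pseudocount) / total) / background)
            for b in BASES
        })
    return matrix


def scan(sequence, pwm, threshold):
    """Slide the matrix along the sequence and report windows scoring >= threshold."""
    hits = []
    width = len(pwm)
    seq = sequence.upper()
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        if any(ch not in BASES for ch in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][ch] for i, ch in enumerate(window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Toy count matrix for an assumed consensus "TGAC" (illustration only).
toy_counts = [
    {"A": 0, "C": 0, "G": 0, "T": 8},
    {"A": 0, "C": 0, "G": 8, "T": 0},
    {"A": 8, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 8, "G": 0, "T": 0},
]
pwm = log_odds_matrix(toy_counts)
hits = scan("AATGACGGTGACTT", pwm, threshold=6.0)
for start, window, score in hits:
    print(start, window, round(score, 2))
```

In this sketch a gene would be predicted as regulated by a transcription factor when its upstream sequence contains at least one window above the threshold; the choice of threshold controls the trade-off between missed sites and false predictions.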

Step 4: experimental verification. From the predicted human regulated proteins, the protein expressed by the gene IFNAR2 is selected for western blot verification of its expression-regulation relationship with HBV. The results show that an indirect regulatory relationship exists between HBV and the human protein IFNAR2.

The above describes the invention and does not limit it; other embodiments based on the inventive concept all fall within the scope of protection of the invention.

Claims

1. A method for studying the interaction between viruses and human proteins using literature mining, characterized mainly by:

Step 1, using literature mining to build an interaction database of the virus and human genes;

Step 2, using the database built in step 1 to analyze the interactions between the virus and human proteins, and extracting the transcription factor genes;

Step 3, carrying out protein prediction on the transcription factors to predict the regulated proteins and the regulated human genes;

Step 4, verifying by experiment the expression-regulation relationship between the virus and the predicted human proteins.