CN102268485A - Method for carrying out interaction between virus and human protein by text mining - Google Patents

Method for carrying out interaction between virus and human protein by text mining

Info

Publication number
CN102268485A
Authority
CN
China
Prior art keywords
virus
human
protein
transcription factor
human protein
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010101931303A
Other languages
Chinese (zh)
Inventor
曾华宗
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI CLUSTER BIOTECH CO Ltd
Original Assignee
SHANGHAI CLUSTER BIOTECH CO Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by SHANGHAI CLUSTER BIOTECH CO Ltd
Priority to CN2010101931303A
Publication of CN102268485A

Landscapes

  • Measuring Or Testing Involving Enzymes Or Micro-Organisms (AREA)

Abstract

The invention provides a method for studying the interactions between a virus and human proteins using literature mining. The method comprises four main steps. Step 1: construct a database of interactions between the virus and human proteins by literature mining. Step 2: use the database constructed in step 1 to analyze the interactions between the virus and human proteins, and extract the transcription factor genes. Step 3: carry out a protein prediction on the transcription factors, predicting the regulated proteins and the human genes being regulated. Step 4: verify by experiment the expression-regulation relationship between the virus and the predicted human proteins. The method is characterized by its use of literature mining to screen for human transcription factor genes regulated by viral proteins, and by experimental verification of the indirect expression-regulation relationship between the virus and human proteins.

Description

A method for studying virus-human protein interactions using text mining
Technical field
The invention belongs to the field of biotechnology and relates to a method for studying the interactions between a virus and human proteins using literature mining.
Background
A virus is an acellular life form consisting of a long nucleic acid chain and a protein coat. It has no metabolic machinery and no enzyme system of its own. Outside a host cell, a virus is therefore a chemical substance with no vital activity that cannot reproduce independently. Once inside a host cell, it uses the cell's matter and energy, together with the cell's capacity for replication, transcription and translation, to produce a new generation of identical viruses according to the genetic information carried in its nucleic acid.
One of the principal characteristics of viruses is oncogenicity. Some viruses induce benign tumours, such as rabbit fibroma virus and molluscum contagiosum virus of the Poxviridae and the papillomaviruses of the Papovaviridae; others induce malignant tumours and are divided by nucleic acid type into DNA tumour viruses and RNA tumour viruses. A virus on its own is metabolically inactive yet infectious, and it depends absolutely on cells: outside the cell, a virus shows no genetic activity but retains its infectivity; inside a host cell, it replicates and expresses its own nucleic acid within the infected cell. For the host cell, viral invasion often changes the host's protein expression pattern and suppresses the expression of host protein genes. This suppression affects the normal physiological function of the host cell and determines the course and outcome of viral pathogenesis.
Current research shows that viruses suppress and regulate human proteins in several ways. The present invention proposes one specific hypothesis: a virus expresses proteins from its own genes that regulate the expression of human transcription factors, and these specifically expressed transcription factors in turn regulate the expression of human proteins. This is one form of viral protein regulation.
Summary of the invention
The method of the invention addresses the interactions between viruses and human proteins, using literature mining to identify the mechanisms by which a virus regulates human proteins. We therefore propose a hypothesis of viral protein regulation: a virus need not regulate a human target protein directly, but can regulate the target protein's expression indirectly by regulating the relevant transcription factors. To this end we designed the following workflow:
1. Use literature mining to build a database of interactions between the virus and human genes.
2. Use this database to analyze the interactions between the virus and human proteins, and extract the transcription factor genes from them.
3. Run a regulated-protein prediction for each transcription factor; the predicted regulated proteins are considered to interact indirectly with the virus.
4. Verify by experiment the expression-regulation relationship between the virus and the human regulated genes.
Description of drawings
Fig. 1 is a flowchart of the method of the invention.
Fig. 2 shows the interactions between HBV and human protein genes found by literature mining.
Fig. 3 shows the western blot result verifying the expression-regulation relationship between HBV and the human protein gene IFNAR2.
Embodiment
The interaction between HBV (hepatitis B virus) and human proteins is used as an example to describe a specific embodiment of the invention.
Step 1: build the database of interactions between HBV and human genes. Literature on HBV-human gene interactions is downloaded from the PubMed abstract database and used to build the database. The sub-steps are:
1) Document searching & formatting: search for documents by keyword and convert them to XML format.
2) Sentence tokenization using LingPipe: use the sentence tokenization tool in the LingPipe toolkit to split the abstract text into individual sentences. The sentence is the basic unit of all subsequent analysis.
3) Human gene mention tagging using ABNER: use the ABNER software to locate mentions of human genes and extract them.
4) Conjunction resolution: resolve conjoined mentions among the extracted genes; for example, "STAT3/5 gene" is resolved into STAT3 gene and STAT5 gene.
5) Gene name normalization based on the Entrez database: because gene naming in free text is inconsistent, gene mentions in the articles are unified to official gene symbols to ease analysis and comparison. Gene symbols follow the NCBI Entrez Gene database.
6) Verb tagging using LingPipe and an in-house protein-protein interaction verb dictionary: first build a dictionary of protein-protein interaction verbs, including repress, regulate, inhibit, interact, phosphorylate, downregulate, upregulate and their variants. The dictionary is drawn from the BioNLP project (http://bionlp.sourceforge.net/). Then use the LingPipe toolkit to extract the protein-protein interaction verbs from each sentence, and build the literature database from the results.
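Sub-steps 2), 4), 5) and 6) can be illustrated with a minimal Python sketch. It is not the LingPipe or ABNER tooling named above: the regex sentence splitter, the tiny synonym table GENE_SYNONYMS and the verb stem list VERB_STEMS are hypothetical stand-ins chosen only to show the shape of each operation.

```python
import re

# Hypothetical mini-dictionaries for illustration only. The method described
# above uses the NCBI Entrez Gene database and a BioNLP-derived verb dictionary.
GENE_SYNONYMS = {"p53": "TP53", "tp53": "TP53", "ifnar2": "IFNAR2", "e2f1": "E2F1"}
VERB_STEMS = ["repress", "regulat", "inhibit", "interact",
              "phosphorylat", "downregulat", "upregulat"]
VERB_RE = re.compile(r"\b(?:" + "|".join(VERB_STEMS) + r")\w*", re.IGNORECASE)


def split_sentences(abstract):
    """Naive sentence splitter (a stand-in for a trained sentence model)."""
    return re.split(r"(?<=[.!?])\s+", abstract.strip())


def resolve_conjunction(mention):
    """Expand a slash-conjoined mention such as 'STAT3/5' into its members."""
    match = re.match(r"^([A-Za-z]+)(\d+(?:/\d+)+)$", mention)
    if not match:
        return [mention]
    prefix, numbers = match.groups()
    return [prefix + n for n in numbers.split("/")]


def normalize_gene(name):
    """Map a free-text gene name to an official symbol, or None if unknown."""
    return GENE_SYNONYMS.get(name.lower())


def tag_verbs(sentence):
    """Return the interaction verbs (including inflected forms) in a sentence."""
    return [m.group(0).lower() for m in VERB_RE.finditer(sentence)]


if __name__ == "__main__":
    text = "HBx inhibits p53. HBV upregulates STAT3/5 gene expression."
    for sentence in split_sentences(text):
        print(sentence, tag_verbs(sentence))
    print(resolve_conjunction("STAT3/5"), normalize_gene("p53"))
```

Matching on verb stems is one simple way to cover the inflected variants the dictionary is meant to include; a real dictionary would list the forms explicitly.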
Step 2: use the database to analyze the interactions between HBV and human transcription factors. First, an HBV protein synonym dictionary is compiled (from the Entrez Gene database); the LingPipe toolkit is then used to split the sentences and obtain the mentions of HBV proteins, and the database is analyzed statistically. The analysis shows that HBV up-regulates the activity of AP-1, NF-kappaB and others, and suppresses the activity of p53, E2F1 and others.
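The statistical analysis is not specified further in the text. One plausible reading is a tally of how often each transcription factor appears with an activating or a suppressing verb. The sketch below assumes that reading; the records, the names V1, V2, TF_A and TF_B, and the verb groupings UP_STEMS and DOWN_STEMS are all hypothetical and are not data from the patent.

```python
from collections import Counter, defaultdict

# Hypothetical verb groupings; the actual dictionary is not given in the text.
UP_STEMS = ("upregulat", "activat", "induc")
DOWN_STEMS = ("downregulat", "inhibit", "repress", "suppress")


def direction(verb):
    """Classify an interaction verb as 'up', 'down' or 'other' by its stem."""
    v = verb.lower()
    if v.startswith(UP_STEMS):
        return "up"
    if v.startswith(DOWN_STEMS):
        return "down"
    return "other"


def summarize(records):
    """For each human gene, return the most frequent regulation direction.

    records: iterable of (viral_protein, verb, human_gene) triples, one per
    sentence in which a viral protein and a human gene co-occur with a verb.
    """
    counts = defaultdict(Counter)
    for _viral_protein, verb, gene in records:
        counts[gene][direction(verb)] += 1
    return {gene: c.most_common(1)[0][0] for gene, c in counts.items()}


if __name__ == "__main__":
    # Made-up records with placeholder names, for illustration only.
    records = [
        ("V1", "activates", "TF_A"),
        ("V1", "upregulates", "TF_A"),
        ("V2", "inhibits", "TF_A"),
        ("V1", "represses", "TF_B"),
    ]
    print(summarize(records))
```

A majority vote is only one option; a stricter analysis might require a minimum number of supporting sentences before calling a direction.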
Step 3: run a regulated-protein prediction for the transcription factors obtained in step 2. For every human gene, the 3000 bp (base pair) sequence upstream of the gene is extracted, and the genes regulated by these transcription factors are predicted with the PWM (position weight matrix) algorithm using the m2transfac 1.0 software (http://www.gene-regulation.com/pub/programs.html).
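Reading PWM as position weight matrix, the usual meaning in transcription-factor binding-site prediction, the scan of an upstream sequence can be sketched as follows. This is a generic log-odds scan and not the m2transfac implementation; the 4-bp count matrix, the pseudocount, the uniform background and the threshold are hypothetical, and only the forward strand is scanned.

```python
import math


def build_pwm(counts, pseudocount=1.0, background=0.25):
    """Turn per-position base counts into log2-odds scores.

    counts: list of dicts, one per motif position, mapping A/C/G/T to counts.
    """
    pwm = []
    for column in counts:
        total = sum(column.values()) + 4 * pseudocount
        pwm.append({
            base: math.log2((column[base] + pseudocount) / total / background)
            for base in "ACGT"
        })
    return pwm


def scan(sequence, pwm, threshold):
    """Return (position, score) for every window scoring at or above threshold."""
    sequence = sequence.upper()
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        if any(base not in "ACGT" for base in window):
            continue  # skip windows containing ambiguous bases such as N
        score = sum(pwm[i][base] for i, base in enumerate(window))
        if score >= threshold:
            hits.append((start, score))
    return hits


if __name__ == "__main__":
    # Toy motif with consensus TGAC; a real matrix would come from a
    # curated collection of transcription factor binding sites.
    toy_counts = [
        {"A": 0, "C": 0, "G": 0, "T": 10},
        {"A": 0, "C": 0, "G": 10, "T": 0},
        {"A": 10, "C": 0, "G": 0, "T": 0},
        {"A": 0, "C": 10, "G": 0, "T": 0},
    ]
    pwm = build_pwm(toy_counts)
    upstream = "AATGACGG"  # stands in for a 3000 bp upstream region
    print(scan(upstream, pwm, threshold=5.0))
```

In this reading, a gene whose upstream region contains at least one hit for a transcription factor's matrix would be reported as a predicted target of that factor.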
Step 4: experimental verification. From the predicted human regulated proteins, the protein expressed by the gene IFNAR2 is selected for western blot verification of its expression-regulation relationship with HBV. The result shows an indirect regulatory relationship between HBV and the human protein IFNAR2.
The above describes the invention and does not limit it; other embodiments based on the inventive concept all fall within the scope of protection of the invention.

Claims (1)

1. A method for studying the interactions between a virus and human proteins using literature mining, characterized mainly by:
Step 1: building a database of interactions between the virus and human genes using literature mining;
Step 2: using the database built in step 1 to analyze the interactions between the virus and human proteins, and extracting the transcription factor genes;
Step 3: carrying out a protein prediction on the transcription factors to predict the regulated proteins and the regulated human genes;
Step 4: verifying by experiment the expression-regulation relationship between the virus and the predicted human proteins.
CN2010101931303A 2010-06-04 2010-06-04 Method for carrying out interaction between virus and human protein by text mining Pending CN102268485A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010101931303A CN102268485A (en) 2010-06-04 2010-06-04 Method for carrying out interaction between virus and human protein by text mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010101931303A CN102268485A (en) 2010-06-04 2010-06-04 Method for carrying out interaction between virus and human protein by text mining

Publications (1)

Publication Number Publication Date
CN102268485A true CN102268485A (en) 2011-12-07

Family

ID=45050938

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010101931303A Pending CN102268485A (en) 2010-06-04 2010-06-04 Method for carrying out interaction between virus and human protein by text mining

Country Status (1)

Country Link
CN (1) CN102268485A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1660890A (en) * 2000-03-10 2005-08-31 第一制药株式会社 Method for predicting protein-protein interaction

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1660890A (en) * 2000-03-10 2005-08-31 第一制药株式会社 Method for predicting protein-protein interaction

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
《国外医学(微生物分册)》 20000630 程立等 人类嗜T细胞白血病I型病毒TAX蛋白对NF-kappaB转录因子的作用 摘要、第12页右栏最后1段 1 , 第6期 *
B CHASSEY ET AL: "Hepatitis C virus infection protein network", 《MOLECULAR SYSTEMS BIOLOGY》, no. 230, 4 November 2008 (2008-11-04) *
孙景春等: "大规模蛋白质相互作用数据的分析与应用", 《科学通报》, no. 19, 31 October 2005 (2005-10-31), pages 2055 - 2060 *
程立等: "人类嗜T细胞白血病I型病毒TAX蛋白对NF-κB转录因子的作用", 《国外医学(微生物分册)》, no. 6, 30 June 2000 (2000-06-30), pages 12 - 1 *

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20111207