Chaperonin GroEL (Cpn60) requires cofactor GroES (Cpn10) for protein refolding in bacteria that possess single groEL and groES genes in a bicistronic groESL operon. Among 4,861 completely sequenced prokaryotic genomes, 884 possess duplicate groEL genes and 770 possess groEL genes with no neighboring groES. It is unclear whether stand-alone groEL requires groES in order to function and, if so, how duplicate groEL genes and unequal numbers of groES genes balance their expression. In Myxococcus xanthus DK1622, we determined that, while either of the duplicate groEL genes could be deleted individually, the single groES that clusters with groEL1 was essential for cell survival. GroEL1 and GroEL2 each required interactions with GroES for in vitro and in vivo functions. Deletion of groEL1 or groEL2 resulted in decreased expression of both groEL and groES, and ectopic complementation of groEL restored not only groEL but also groES expression. The addition of an extra groES gene upstream of groEL2 to form a bicistronic operon had almost no influence on groES expression or the cell survival rate, whereas over-expression of groES using a self-replicating plasmid simultaneously increased groEL expression. The results indicated that M. xanthus DK1622 cells coordinate expression of the duplicate groEL and single groES genes for synergistic functions of GroELs and GroES. We proposed a potential regulatory mechanism for this expression coordination.
Hindersson, P; Høiby, N; Bangsborg, Jette Marie
shock expression signals were identified upstream of the L. micdadei groEL gene. Further upstream, a poly-T region, also a feature of the sigma 32-regulated Escherichia coli groELS heat shock operon, was found. Despite the high degree of homology of the expression signals in E. coli and L. micdadei...
Akad, F; Eybishtz, A; Edelbaum, D; Gorovits, R; Dar-Issa, O; Iraki, N; Czosnek, H
Some (perhaps all) plant viruses transmitted in a circulative manner by their insect vectors avoid destruction in the haemolymph by interacting with GroEL homologues, ensuring transmission. We have previously shown that the phloem-limited begomovirus tomato yellow leaf curl virus (TYLCV) interacts in vivo and in vitro with GroEL produced by the whitefly vector Bemisia tabaci. In this study, we have exploited this phenomenon to generate transgenic tomato plants expressing the whitefly GroEL in their phloem. We postulated that, following inoculation, TYLCV particles would be trapped by GroEL in the plant phloem, inhibiting virus replication and movement and thereby rendering the plants resistant. A whitefly GroEL gene was cloned in an Agrobacterium vector under the control of an Arabidopsis phloem-specific promoter, which was used to transform two tomato genotypes. During three consecutive generations, plants expressing GroEL exhibited mild or no disease symptoms upon whitefly-mediated inoculation of TYLCV. In vitro assays indicated that the sap of resistant plants contained GroEL-TYLCV complexes. Infected resistant plants served as virus source for whitefly-mediated transmission as effectively as infected non-transgenic tomato. Non-transgenic susceptible tomato plants grafted on resistant GroEL-transgenic scions remained susceptible, although GroEL translocated into the grafted plant and GroEL-TYLCV complexes were detected in the grafted tissues.
Hossain, Muhammad Tofazzal; Kim, Young-Ok; Kong, In-Soo
Vibrio parahaemolyticus is a significant cause of human gastrointestinal disorders worldwide, transmitted primarily by ingestion of raw or undercooked contaminated seafood. In this study, a multiplex PCR assay for the detection and differentiation of V. parahaemolyticus strains was developed using primer sets for a species-specific marker, groEL, and two virulence markers, tdh and trh. Multiplex PCR conditions were standardised, and extracted genomic DNA of 70 V. parahaemolyticus strains was used for identification. The sensitivity and efficacy of this method were validated using artificially inoculated shellfish and seawater. The expected sizes of amplicons were 510 bp, 382 bp, and 171 bp for groEL, tdh and trh, respectively. PCR products were sufficiently different in size, and the detection limits of the multiplex PCR for groEL, tdh and trh were each 200 pg DNA. Specific detection and differentiation of virulent from non-virulent strains in shellfish homogenates and seawater was also possible after artificial inoculation with various V. parahaemolyticus strains. This newly developed multiplex PCR is a rapid assay for detection and differentiation of pathogenic V. parahaemolyticus strains, and could be used to prevent disease outbreaks and protect public health by helping the seafood industry maintain a safe shellfish supply.
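The amplicon sizes reported in this abstract lend themselves to a simple band-calling sketch. The Python snippet below is only an illustration of how a lane could be interpreted from the three expected sizes; the `tolerance` margin and the function names are assumptions introduced here, not part of the published assay.

```python
# Expected amplicon sizes (bp) as reported in the abstract.
EXPECTED_AMPLICONS = {"groEL": 510, "tdh": 382, "trh": 171}


def call_markers(band_sizes, tolerance=15):
    """Return the set of markers whose expected size matches an observed band.

    `tolerance` (bp) is a hypothetical gel-reading margin, not a value
    taken from the study.
    """
    found = set()
    for marker, expected in EXPECTED_AMPLICONS.items():
        if any(abs(size - expected) <= tolerance for size in band_sizes):
            found.add(marker)
    return found


def interpret(band_sizes, tolerance=15):
    """Interpret one lane: species marker first, then virulence markers."""
    markers = call_markers(band_sizes, tolerance)
    if "groEL" not in markers:
        return "not identified as V. parahaemolyticus"
    virulence = sorted(markers & {"tdh", "trh"})
    if virulence:
        return "V. parahaemolyticus, virulence markers: " + ", ".join(virulence)
    return "V. parahaemolyticus, no virulence marker detected"
```

Because the three expected products differ by more than 100 bp, a modest tolerance does not cause one band to be assigned to two markers in this sketch.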
Kinetic analyses of GroE-assisted folding provide a dynamic sequence of molecular events that underlie chaperonin function. We used stopped-flow analysis of various fluorescent GroEL mutants to obtain details regarding the sequence of events that transpire immediately after ATP binding to GroEL and GroEL with prebound unfolded proteins. Characterization of GroEL CP86, a circularly permuted GroEL with the polypeptide ends relocated to the vicinity of the ATP binding site, showed that GroES binding and protection of unfolded protein from solution is achieved surprisingly early in the functional cycle, and in spite of greatly reduced apical domain movement. Analysis of fluorescent GroEL SR-1 and GroEL D398A variants suggested that among other factors, the presence of two GroEL rings and a specific conformational rearrangement of Helix M in GroEL contribute significantly to the rapid release of unfolded protein from the GroEL apical domain.
Li, Leilei; Wieme, Anneleen; Spitaels, Freek; Balzarini, Tom; Nunes, Olga C; Manaia, Célia M; Van Landschoot, Anita; De Vuyst, Luc; Cleenwerck, Ilse; Vandamme, Peter
Five acetic acid bacteria isolates, awK9_3, awK9_4 ( = LMG 27543), awK9_5 ( = LMG 28092), awK9_6 and awK9_9, obtained during a study of micro-organisms present in traditionally produced kefir, were grouped on the basis of their MALDI-TOF MS profile with LMG 1530 and LMG 1531(T), two strains currently classified as members of the genus Acetobacter. Phylogenetic analysis based on nearly complete 16S rRNA gene sequences as well as on concatenated partial sequences of the housekeeping genes dnaK, groEL and rpoB indicated that these isolates were representatives of a single novel species together with LMG 1530 and LMG 1531(T) in the genus Acetobacter, with Acetobacter aceti, Acetobacter nitrogenifigens, Acetobacter oeni and Acetobacter estunensis as nearest phylogenetic neighbours. Pairwise similarity of 16S rRNA gene sequences between LMG 1531(T) and the type strains of the above-mentioned species were 99.7%, 99.1%, 98.4% and 98.2%, respectively. DNA-DNA hybridizations confirmed that status, while amplified fragment length polymorphism (AFLP) and random amplified polymorphic DNA (RAPD) data indicated that LMG 1531(T), LMG 1530, LMG 27543 and LMG 28092 represent at least two different strains of the novel species. The major fatty acid of LMG 1531(T) and LMG 27543 was C18 : 1ω7c. The major ubiquinone present was Q-9 and the DNA G+C contents of LMG 1531(T) and LMG 27543 were 58.3 and 56.7 mol%, respectively. The strains were able to grow on D-fructose and D-sorbitol as a single carbon source. They were also able to grow on yeast extract with 30% D-glucose and on standard medium with pH 3.6 or containing 1% NaCl. They had a weak ability to produce acid from d-arabinose. These features enabled their differentiation from their nearest phylogenetic neighbours. The name Acetobacter sicerae sp. nov. is proposed with LMG 1531(T) ( = NCIMB 8941(T)) as the type strain. © 2014 IUMS.
Jung, Y-J; Choi, Y-J; An, S-J; Lee, H-R; Jun, H-K; Choi, B-K
Tannerella forsythia is a major periodontal pathogen, and T. forsythia GroEL is a molecular chaperone homologous to human heat-shock protein 60. Interleukin-17 (IL-17) has been implicated in the pathogenesis of periodontitis and several systemic diseases. This study investigated the potential of T. forsythia GroEL to induce inflammatory bone resorption and examined the cooperative effect of IL-17 and T. forsythia GroEL on inflammatory responses. Human gingival fibroblasts (HGFs) and periodontal ligament (PDL) fibroblasts were stimulated with T. forsythia GroEL and/or IL-17. Gene expression of IL-6, IL-8, and cyclooxygenase-2 (COX-2) and concentrations of IL-6, IL-8, and prostaglandin E2 (PGE2 ) were measured by real-time reverse transcription polymerase chain reaction and enzyme-linked immunosorbent assays, respectively. After stimulation of MG63 cells with T. forsythia GroEL and/or IL-17, gene expression of osteoprotegerin (OPG) was examined. After subcutaneous injection of T. forsythia GroEL and/or IL-17 above the calvaria of BALB/c mice, calvarial bone resorption was assessed by micro-computed tomography and histological examination. Tannerella forsythia GroEL induced IL-6 and IL-8 production in HGFs and PDL cells, and IL-17 further promoted IL-6 and IL-8 production. Both T. forsythia GroEL and IL-17 synergistically increased PGE2 production and inhibited OPG gene expression. Calvarial bone resorption was induced by T. forsythia GroEL injection, and simultaneous injection of T. forsythia GroEL and IL-17 further increased bone resorption. These results suggest that T. forsythia GroEL is a novel virulence factor that can contribute to inflammatory bone resorption caused by T. forsythia and synergizes with IL-17 to exacerbate inflammation and bone resorption.
Potnis, Akhilesh A; Rajaram, Hema; Apte, Shree K
The nitrogen-fixing cyanobacterium, Anabaena L-31 has two Hsp60 proteins, 59 kDa GroEL coded by the second gene of groESL operon and 61 kDa Cpn60 coded by cpn60 gene. Anabaena GroEL formed stable higher oligomer (>12-mer) in the presence of K(+) and prevented thermal aggregation of malate dehydrogenase (MDH). Using three protein substrates (MDH, All1541 and green fluorescent protein), it was found that the refolding activity of Anabaena GroEL was lower than that of Escherichia coli GroEL, but independent of both GroES and ATP. This correlated with in vivo data. GroEL exhibited ATPase activity which was enhanced in the presence of GroES and absence of a denatured protein, contrary to that observed for bacterial GroEL. However, a significant role for ATP could not be ascertained during in vitro folding assays. The monomeric Cpn60 exhibited much lower refolding activity than GroEL, unaffected by GroES and ATP. In vitro studies revealed inhibition of the refolding activity of Anabaena GroEL by Cpn60, which could be due to their different oligomeric status. The role of GroES and ATP may have been added during the course of evolution from the ancient cyanobacteria to modern day bacteria enhancing the refolding ability and ensuring wider scope of substrates for GroEL.
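Objective: To clone the two GroEL genes carried in the Mycobacterium tuberculosis genome, GroEL1 and GroEL2, which encode the heat-shock proteins HSP60 (cpn60.1) and HSP65 (cpn60.2), respectively, and to construct eukaryotic expression plasmids. Methods: Using specific primers carrying EcoR I and Xba I restriction sites, the two GroEL genes were amplified by PCR from M. tuberculosis strain H37Rv; the recombinant eukaryotic expression plasmids pCDNA3.1-GroEL1 and pCDNA3.1-GroEL2 were constructed and verified by restriction digestion and sequencing. Results: The two GroEL genes of M. tuberculosis were successfully cloned. Restriction digestion confirmed that pCDNA3.1-GroEL1 and pCDNA3.1-GroEL2 had been constructed successfully, and DNA sequencing confirmed that the sequences inserted into the vector were identical to those published in GenBank. Conclusion: The GroEL1 and GroEL2 genes of M. tuberculosis were obtained, and eukaryotic expression plasmids containing GroEL1 and GroEL2 were successfully constructed, providing material for studying their immunological role as nucleic acid vaccines.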
GroEL chaperonin is well known to interact with a wide variety of polypeptide chains. Here we show the data related to our previous work (http://dx.doi.org/10.1016/j.pep.2015.11.020), concerning the interaction of GroEL with native (lysozyme, α-lactalbumin) and denatured (lysozyme, α-lactalbumin and pepsin) proteins in solution. The use of affinity chromatography based on denatured pepsin for GroEL purification from fluorescent impurities is presented as well.
Background: Sequence and transcriptional variability within and between individuals are typically studied independently. The joint analysis of sequence and gene expression variation (genetical genomics) provides insight into the role of linked sequence variation in the regulation of gene expression. We investigated the role of sequence variation in cis on gene expression (cis sequence effects) in a group of genes commonly studied in cancer research in lymphoblastoid cell lines. We estimated the proportion of genes exhibiting cis sequence effects and the proportion of gene expression variation explained by cis sequence effects using three different analytical approaches, and compared our results to the literature. Results: We generated gene expression profiling data at N = 697 candidate genes from N = 30 lymphoblastoid cell lines for this study and used available candidate gene resequencing data at N = 552 candidate genes to identify N = 30 candidate genes with sufficient variance in both datasets for the investigation of cis sequence effects. We used two additive models and the haplotype phylogeny scanning approach of Templeton (Tree Scanning) to evaluate association between individual SNPs, all SNPs at a gene, and diplotypes, with log-transformed gene expression. SNPs and diplotypes at eight candidate genes exhibited statistically significant (p cis sequence effects in our study, respectively. Conclusion: Based on analysis of our results and the extant literature, one in four genes exhibits significant cis sequence effects, and for these genes, about 30% of gene expression variation is accounted for by cis sequence variation. Despite diverse experimental approaches, the presence or absence of significant cis sequence effects is largely supported by previously published studies.
Gilissen, C.; Hoischen, A.; Brunner, H.G.; Veltman, J.A.
Next generation sequencing can be used to search for Mendelian disease genes in an unbiased manner by sequencing the entire protein-coding sequence, known as the exome, or even the entire human genome. Identifying the pathogenic mutation amongst thousands to millions of genomic variants is a major c
EVERS, ME; LANGER, T; HARDER, W; HARTL, FU; VEENHUIS, M; Hartl, Franz-Ulrich
We have studied the use of yeast peroxisomal alcohol oxidase (AO) as a model protein for in vitro binding by GroEL. Dilution of denatured AO in neutral buffer leads to aggregation of the protein, which is prevented by the addition of GroEL. Formation of complexes between GroEL and denatured AO was d
Background: Synaptotagmins exist as a large gene family in mammals. There is much interest in the function of certain family members which act crucially in the regulated synaptic vesicle exocytosis required for efficient neurotransmission. Knowledge of the functions of other family members is relatively poor and the presence of Synaptotagmin genes in plants indicates a role for the family as a whole which is wider than neurotransmission. Identification of the Synaptotagmin genes within completely sequenced genomes can provide the entire Synaptotagmin gene complement of each sequenced organism. Defining the detailed structures of all the Synaptotagmin genes and their encoded products can provide a useful resource for functional studies and a deeper understanding of the evolution of the gene family. The current rapid increase in the number of sequenced genomes from different branches of the tree of life, together with the public deposition of evolutionarily diverse transcript sequences, makes such studies worthwhile. Results: I have compiled a detailed list of the Synaptotagmin genes of Caenorhabditis, Anopheles, Drosophila, Ciona, Danio, Fugu, Mus, Homo, Arabidopsis and Oryza by examining genomic and transcript sequences from public sequence databases together with some transcript sequences obtained by cDNA library screening and RT-PCR. I have compared all of the genes and investigated the relationship between plant Synaptotagmins and their non-Synaptotagmin counterparts. Conclusions: I have identified and compared 98 Synaptotagmin genes from 10 sequenced genomes. Detailed comparison of transcript sequences reveals abundant and complex variation in Synaptotagmin gene expression and indicates the presence of Synaptotagmin genes in all animals and land plants. Amino acid sequence comparisons indicate patterns of conservation and diversity in function. Phylogenetic analysis shows the origin of Synaptotagmins in multicellular eukaryotes and their
The abundance of expressed gene and protein sequences available in the biological information databases facilitates comparison of protein homologies. A high degree of sequence similarity typically implies homology regarding structure and function and may provide clues to antibody cross-react...
WEI Fang-ping; LI Sheng; MA Hong-ru
A network of 3719 tRNA gene sequences was constructed using simplest alignment. Its topology, degree distribution and clustering coefficient were studied. The behaviors of the network shift from fluctuated distribution to scale-free distribution when the similarity degree of the tRNA gene sequences increases. The tRNA gene sequences with the same anticodon identity are more self-organized than those with different anticodon identities and form local clusters in the network. Some vertices of the local cluster have a high connection with other local clusters, and the probable reason was given. Moreover, a network constructed by the same number of random tRNA sequences was used to make comparisons. The relationships between the properties of the tRNA similarity network and the characters of tRNA evolutionary history were discussed.
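A similarity network of this kind can be sketched in a few lines. The Python example below is a minimal illustration, assuming that "simplest alignment" means an ungapped, position-by-position comparison and that two sequences are linked when their identity fraction reaches a chosen threshold; both assumptions, and all function names, are introduced here and are not taken from the paper.

```python
from itertools import combinations


def similarity(a, b):
    """Fraction of matching positions in an ungapped, left-aligned comparison."""
    length = max(len(a), len(b))
    if length == 0:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / length


def build_network(seqs, threshold):
    """Link two sequences when their similarity reaches the threshold."""
    neighbors = {i: set() for i in range(len(seqs))}
    for i, j in combinations(range(len(seqs)), 2):
        if similarity(seqs[i], seqs[j]) >= threshold:
            neighbors[i].add(j)
            neighbors[j].add(i)
    return neighbors


def degree_distribution(neighbors):
    """Map each degree value to the number of vertices having that degree."""
    counts = {}
    for nbrs in neighbors.values():
        counts[len(nbrs)] = counts.get(len(nbrs), 0) + 1
    return counts


def clustering_coefficient(neighbors, node):
    """Local clustering: realized links among a vertex's neighbors / possible links."""
    nbrs = neighbors[node]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in neighbors[u])
    return 2 * links / (k * (k - 1))
```

Raising `threshold` removes weak links, which is the knob the abstract describes when it reports a shift in the degree distribution as the required similarity increases.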
Whelan, Nathan V; Kocot, Kevin M; Santos, Scott R; Halanych, Kenneth M
Nemerteans are one of few animal groups that have evolved the ability to utilize toxins for both defense and subduing prey, but little is known about specific nemertean toxins. In particular, no study has identified specific toxin genes even though peptide toxins are known from some nemertean species. Information about toxin genes is needed to better understand evolution of toxins across animals and possibly provide novel targets for pharmaceutical and industrial applications. We sequenced and annotated transcriptomes of two free-living and one commensal nemertean and annotated an additional six publicly available nemertean transcriptomes to identify putative toxin genes. Approximately 63-74% of predicted open reading frames in each transcriptome were annotated with gene names, and all species had similar percentages of transcripts annotated with each higher-level GO term. Every nemertean analyzed possessed genes with high sequence similarities to known animal toxins including those from stonefish, cephalopods, and sea anemones. One toxin-like gene found in all nemerteans analyzed had high sequence similarity to Plancitoxin-1, a DNase II hepatotoxin that may function well at low pH, which suggests that the acidic body walls of some nemerteans could work to enhance the efficacy of protein toxins. The highest number of toxin-like genes found in any one species was seven and the lowest was three. The diversity of toxin-like nemertean genes found here is greater than previously documented, and these animals are likely an ideal system for exploring toxin evolution and industrial applications of toxins. © The Author(s) 2014. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Leishmania Homologue of receptors for Activated C Kinase (LACK) antigen is a 36-kDa protein that provokes a very early immune response against Leishmania infection. There are several reports on the expression of LACK through different life-cycle stages of the genus Leishmania, but only a few of them have focused on L. tropica. The present study provides details of the cloning, DNA sequencing and gene expression of LACK in this parasite species. First, several local isolates of Leishmania parasites were typed in our laboratory by PCR to verify the parasite species. After that, the LACK gene was amplified and cloned into a vector for sequencing. Finally, the expression of this molecule in logarithmic and stationary growth phase promastigotes, as well as in amastigotes, was evaluated by reverse transcription PCR (RT-PCR). The typing result confirmed that all our local isolates belong to L. tropica. The LACK gene sequence was determined and high similarity was observed with the sequences of other Leishmania species. Furthermore, the expression of the LACK gene in both promastigote and amastigote forms was confirmed. Overall, the data set the stage for future studies of the properties and immune role of LACK gene products.
Wiborg, O; Hyldig-Nielsen, J J; Jensen, E O
We present the complete nucleotide sequences of two leghemoglobin genes isolated from soybean DNA. Both genes contain three intervening sequences in identical positions. Comparison of the coding sequences with known amino-acid sequences of soybean leghemoglobins suggest that the two genes...
SHI Juan-zi; LIU Yan-fang; YAO Yuan-qing; YAN Wei; ZHU Feng; ZHAO Zhong-liang
Objective: To clone genes specifically expressed in the placenta of patients with preeclampsia, and to explain the mechanism in the etiopathology of preeclampsia. Methods: The placentae of preeclamptic and normotensive pregnant subjects were used as models; a cDNA library was constructed and 20 differentially expressed fragments were cloned after a new version of PCR-based subtractive hybridization. False-positive clones were identified by reverse dot blot analysis. With one of the obtained genes taken as the probe, the placentae of 10 normal pregnant women and 10 preeclamptic patients were studied by dot hybridization. Results: Six false-positive clones were identified by reverse dot blot, and the remaining 14 clones were identified as preeclampsia-related genes. These clones were sequenced and analyzed with BLAST. Eleven of the 14 clones were known genes, one of which belongs to the necdin family; the remaining 3 were identified as novel genes. These 3 genes were accepted by GenBank under the accession numbers AF232216, AF232217 and AF233648. The results of dot hybridization using the necdin gene as probe were as follows: (1) this mRNA was present in the placental tissues of both normal and preeclamptic pregnancies; (2) the transcription level of this mRNA in preeclamptic placental tissue was significantly higher than in normal pregnancy (P<0.05). Conclusions: This study reports for the first time this group of genes, especially the necdin gene, which are related to the etiopathology of preeclampsia. In addition, overtranscription of the necdin gene was found in preeclampsia. These findings should help further studies of the etiology of preeclampsia.
Although much of the information regarding gene expression is encoded in the genome, deciphering such information has been very challenging. We reexamined Beer and Tavazoie's (BT) approach to predict mRNA expression patterns of 2,587 genes in Saccharomyces cerevisiae from the information in their respective promoter sequences. Instead of fitting complex Bayesian network models, we trained naïve Bayes classifiers using only the sequence-motif matching scores provided by BT. Our simple models correctly predict expression patterns for 79% of the genes, based on the same criterion and the same cross-validation (CV) procedure as BT, which compares favorably to the 73% accuracy of BT. The fact that our approach did not use position and orientation information of the predicted binding sites but achieved a higher prediction accuracy motivated us to investigate a few biological predictions made by BT. We found that some of their predictions, especially those related to motif orientations and positions, are at best circumstantial. For example, the combinatorial rules suggested by BT for the PAC and RRPE motifs are not unique to the cluster of genes from which the predictive model was inferred, and there are simpler rules that are statistically more significant than BT's. We also show that the CV procedure used by BT to estimate their method's prediction accuracy is inappropriate and may have overestimated the prediction accuracy by about 10%.
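To make the modelling idea concrete, the following Python sketch trains a naive Bayes classifier and estimates its accuracy by k-fold cross-validation. It is a generic illustration, not the authors' pipeline: it assumes the motif matching scores have been binarized into motif-present / motif-absent features (a simplification introduced here), uses a Bernoulli likelihood with Laplace smoothing, and all names are hypothetical.

```python
import math


def train_naive_bayes(rows, labels):
    """Fit a Bernoulli naive Bayes model with Laplace smoothing.

    rows: list of 0/1 feature tuples (hypothetical motif present/absent encoding).
    labels: list of class labels (e.g. expression-pattern ids).
    """
    n_features = len(rows[0])
    model = {}
    for c in sorted(set(labels)):
        members = [r for r, y in zip(rows, labels) if y == c]
        prior = len(members) / len(rows)
        # P(feature = 1 | class), smoothed so it is never exactly 0 or 1.
        probs = [(sum(r[j] for r in members) + 1) / (len(members) + 2)
                 for j in range(n_features)]
        model[c] = (prior, probs)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one feature tuple."""
    best, best_score = None, -math.inf
    for c, (prior, probs) in model.items():
        score = math.log(prior)
        for x, p in zip(row, probs):
            score += math.log(p if x else 1 - p)
        if score > best_score:
            best, best_score = c, score
    return best


def cross_validate(rows, labels, k=5):
    """k-fold CV accuracy; fold i holds out every k-th example starting at i."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(rows)) if i % k != fold]
        test_idx = [i for i in range(len(rows)) if i % k == fold]
        model = train_naive_bayes([rows[i] for i in train_idx],
                                  [labels[i] for i in train_idx])
        correct += sum(predict(model, rows[i]) == labels[i] for i in test_idx)
    return correct / len(rows)
```

The point relevant to the abstract is that the held-out examples never enter `train_naive_bayes`; an accuracy estimate is only trustworthy when every step that looks at the labels is confined to the training folds.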
Charbon, Godefroid; Wang, Jiangyun; Brustad, Eric
The molecular chaperone GroEL is required for bacterial growth under all conditions, mediating folding assistance, via its central cavity, to a diverse set of cytosolic proteins; yet the subcellular localization of GroEL remains unresolved. An earlier study, using antibody probing of fixed Escherichia coli cells, indicated colocalization with the cell division protein FtsZ at the cleavage furrow, while a second E. coli study of fixed cells indicated more even distribution throughout the cytoplasm. Here, for the first time, we have examined the spatial distribution of GroEL in living cells using incorporation of a fluorescent unnatural amino acid into the chaperone. Fluorescence microscopy indicated that GroEL is diffusely distributed, both under normal and stress conditions. Importantly, the present procedure uses a small, fluorescent unnatural amino acid to visualize GroEL in vivo, avoiding...
Fan, H; Mark, AE
Bacterial chaperonin, GroEL, together with its co-chaperonin, GroES, facilitates the folding of a variety of polypeptides. Experiments suggest that GroEL stimulates protein folding by multiple cycles of binding and release. Misfolded proteins first bind to an exposed hydrophobic surface on GroEL. Gr
How and when was the first DNA sequence of a gene determined? In 1977, F. Sanger came up with an innovative technology to sequence DNA by using chain terminators, and determined the entire DNA sequence of the 5375-base genome of bacteriophage φX174 (Sanger et al., 1977). While Sanger's achievement has been recognized as the first DNA sequencing of genes, we had determined the DNA sequence of a gene, albeit a partial sequence, 11 years before Sanger's DNA sequence (Okada et al., 1966).
Kilstrup, Mogens; Jacobsen, Susanne; Hammer, Karin
laboratory strain MG1363, which was originally derived from a dairy strain. After two-dimensional separation of proteins, the DnaK, GroEL, and GroES heat shock proteins, the HrcA (Orf1) heat shock repressor, and the glycolytic enzymes pyruvate kinase, glyceraldehyde-3-phosphate dehydrogenase, and phosphoglycerate kinase were identified by a combination of Western blotting and direct N-terminal amino acid sequencing of proteins from the gels. Of 400 to 500 visible proteins, 17 were induced more than twofold during heat stress. Two classes of heat stress proteins were identified from their temporal induction... proteins exhibited a gradual increase in the rate of synthesis after the onset of stress. Unlike in other organisms, all salt stress-induced proteins in L. lactis were also subject to heat stress induction. DnaK, GroEL, and GroES showed similar temporal patterns of induction during salt stress, resembling...
Hyatt, Philip Douglas [ORNL; LoCascio, Philip F [ORNL; Hauser, Loren John [ORNL; Uberbacher, Edward C [ORNL
Gene prediction in metagenomic sequences remains a difficult problem. Current sequencing technologies do not achieve sufficient coverage to assemble the individual genomes in a typical sample; consequently, sequencing runs produce a large number of short sequences whose exact origin is unknown. Since these sequences are usually smaller than the average length of a gene, algorithms must make predictions based on very little data. We present MetaProdigal, a metagenomic version of the gene prediction program Prodigal, that can identify genes in short, anonymous coding sequences with a high degree of accuracy. The novel value of the method consists of enhanced translation initiation site identification, ability to identify sequences that use alternate genetic codes and confidence values for each gene call. We compare the results of MetaProdigal with other methods and conclude with a discussion of future improvements.
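For orientation, the sketch below shows the most basic ingredient of gene finding in a short fragment: scanning both strands and all three frames for ATG-to-stop open reading frames, with the stop-codon set passed in so that an alternate genetic code can be tried. This is a deliberately naive illustration and not MetaProdigal's algorithm, which the abstract describes as adding translation initiation site scoring and per-gene confidence values; the function names and the `min_codons` cutoff are assumptions introduced here.

```python
STANDARD_STOPS = frozenset({"TAA", "TAG", "TGA"})
# Translation table 4 (e.g. Mycoplasma) reads TGA as tryptophan, not stop.
TABLE4_STOPS = frozenset({"TAA", "TAG"})

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_orfs(seq, stops=STANDARD_STOPS, min_codons=3):
    """Return (strand, start, end) for each ATG..stop ORF.

    Coordinates are 0-based, end-exclusive, and refer to the scanned strand.
    Length is counted in codons including the start and stop codons.
    ORFs that run off the end of the fragment are ignored in this sketch.
    """
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in stops:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None
    return orfs
```

Running the same fragment with `STANDARD_STOPS` and `TABLE4_STOPS` shows why the genetic code matters: an in-frame TGA truncates the call under the standard code but is read through under table 4.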
佟晓冬; 杨征; 董晓燕; 孙彦
Expanded bed adsorption (EBA) is an integrative downstream processing technique for the purification of biological substances directly from unclarified feedstock. In this study, molecular chaperone GroEL, an important protein folding helper both in vivo and in vitro, was purified by the single-step EBA technique from the unclarified homogenate of recombinant E. coli cells. Compared with packed bed adsorption, the EBA technique provided a single-step approach that yielded electrophoretically pure GroEL. After the homogenate loading and column washing in the expanded bed mode, the GroEL protein was recovered by stepwise salt-gradient elution in either packed-bed or expanded-bed mode. The expanded-bed elution mode was found to be as efficient as the packed-bed mode in the purification of GroEL from cell disruptate.
BACKGROUND: The Escherichia coli chaperonin GroEL subunit consists of three domains linked via two hinge regions, and each domain is responsible for a specific role in the functional mechanism. Here, we have used circular permutation to study the structural and functional characteristics of the GroEL subunit. METHODOLOGY/PRINCIPAL FINDINGS: Three soluble, partially active mutants with polypeptide ends relocated into various positions of the apical domain of GroEL were isolated and studied. The basic functional hallmarks of GroEL (ATPase and chaperoning activities) were retained in all three mutants. Certain functional characteristics, such as basal ATPase activity and ATPase inhibition by the cochaperonin GroES, differed in the mutants, while the ability to facilitate the refolding of rhodanese was roughly equal. Stopped-flow fluorescence experiments using a fluorescent variant of the circularly permuted GroEL CP376 revealed that a specific kinetic transition that reflects movements of the apical domain was missing in this mutant. This mutant also displayed several characteristics that suggested that the apical domains were behaving in an uncoordinated fashion. CONCLUSIONS/SIGNIFICANCE: The loss of apical domain coordination and a concomitant decrease in functional ability highlight the importance of certain conformational signals that are relayed through domain interlinks in GroEL. We propose that circular permutation is a very versatile tool to probe chaperonin structure and function.
Amemiya, Kei; Meyers, Jennifer L; Deshazer, David; Riggins, Renaldo N; Halasohoris, Stephanie; England, Marilyn; Ribot, Wilson; Norris, Sarah L; Waag, David M
We examined, by enzyme-linked immunosorbent assay and Western blot analysis, the host immune response to 2 heat-shock proteins (hsps) in a patient and mice previously infected with Burkholderia mallei. The patient was the first reported human glanders case in 50 years in the United States. The expression of the groEL and dnaK operons appeared to be dependent upon a sigma(32) RNA polymerase, as suggested by conserved heat-shock promoter sequences, and the groESL operon may be negatively regulated by a controlling inverted repeat of chaperone expression (CIRCE) site. In the antisera, the GroEL protein was found to be more immunoreactive than the DnaK protein in both the human patient and mice previously infected with B. mallei. Examination of the supernatant of a growing culture of B. mallei showed that more GroEL protein than DnaK protein was released from the cell. This may occur similarly within an infected host, causing an elevated host immune response to the B. mallei hsps.
Chibon, P.Y.F.R.P.; Schoof, H.; Visser, R.G.F.; Finkers, H.J.
Marker2sequence (M2S) aims at mining quantitative trait loci (QTLs) for candidate genes. For each gene, within the QTL region, M2S uses data integration technology to integrate putative gene function with associated gene ontology terms, proteins, pathways and literature. As a typical QTL region
Have, Christian Theil; Mørk, Søren
We introduce a new type of probabilistic sequence model that models the sequential composition of reading frames of genes in a genome. Our approach extends gene finders with a model of the sequential composition of genes at the genome level, effectively producing a sequential genome annotation as output. The model can be used to obtain the most probable genome annotation based on a combination of (i) a gene finder score of each gene candidate and (ii) the sequence of the reading frames of gene candidates through a genome. The model, as well as a higher-order variant, is developed and tested... and are evaluated by the effect on prediction performance. Since bacterial gene finding to a large extent is a solved problem, it forms an ideal proving ground for evaluating the explicit modeling of larger-scale gene sequence composition of genomes. We conclude that the sequential composition of gene reading frames...
Rana, Vipin Singh; Singh, Shalini Thakur; Priya, Natarajan Gayatri; Kumar, Jitendra; Rajagopal, Raman
Cotton leaf curl virus (CLCuV) (Geminiviridae: Begomovirus) is the causative agent of leaf curl disease in cotton plants (Gossypium hirsutum). CLCuV is exclusively transmitted by the whitefly species B. tabaci (Gennadius) (Hemiptera: Aleyrodidae). B. tabaci contains several biotypes which harbor dissimilar bacterial endosymbiotic communities. It is reported that these bacterial endosymbionts produce a 63 kDa chaperone GroEL protein which binds to geminivirus particles and protects them from rapid degradation in gut and haemolymph. In biotype B, GroEL protein of Hamiltonella has been shown to interact with Tomato yellow leaf curl virus (TYLCV). The present study was initiated to find out whether endosymbionts of B. tabaci are similarly involved in CLCuV transmission in Sriganganagar (Rajasthan), an area endemic with cotton leaf curl disease. Biotype and endosymbiont diversity of B. tabaci were identified using MtCO1 and 16S rDNA genes respectively. Analysis of our results indicated that the collected B. tabaci population belongs to the AsiaII genetic group and harbors the primary endosymbiont Portiera and the secondary endosymbiont Arsenophonus. The GroEL proteins of Portiera and Arsenophonus were purified and in-vitro interaction studies were carried out using pull-down and co-immunoprecipitation assays. In-vivo interaction was confirmed using the yeast two-hybrid system. In both in-vitro and in-vivo studies, the GroEL protein of Arsenophonus was found to interact with the CLCuV coat protein. Further, we also localized Arsenophonus in the salivary glands and the midgut of B. tabaci besides the already reported bacteriocytes. These results suggest the involvement of Arsenophonus in the transmission of CLCuV in the AsiaII genetic group of B. tabaci.
Background: The advent of high throughput sequencing technology has enabled the 1000 Genomes Project Pilot 3 to generate complete sequence data for more than 906 genes and 8,140 exons representing 697 subjects. The 1000 Genomes database provides a critical opportunity for further interpreting disease associations with single nucleotide polymorphisms (SNPs) discovered from genetic association studies. Currently, direct sequencing of candidate genes or regions on a large number of subjects remains both cost- and time-prohibitive. Results: To accelerate the translation from discovery to functional studies, we propose an in silico gene sequencing method (ISS), which predicts phased sequences of intragenic regions, using SNPs. The key underlying idea of our method is to infer diploid sequences (a pair of phased sequences/alleles) at every functional locus utilizing the deep sequencing data from the 1000 Genomes Project and SNP data from the HapMap Project, and to build prediction models using flanking SNPs. Using this method, we have developed a database of prediction models for 611 known genes. Sequence prediction accuracy for these genes is 96.26% on average (ranges 79%-100%). This database of prediction models can be enhanced and scaled up to include new genes as the 1000 Genomes Project sequences additional genes on additional individuals. Applying our predictive model for the KCNJ11 gene to the Wellcome Trust Case Control Consortium (WTCCC) Type 2 diabetes cohort, we demonstrate how the prediction of phased sequences inferred from GWAS SNP genotype data can be used to facilitate interpretation and identify a probable functional mechanism such as protein changes. Conclusions: Prior to the general availability of routine sequencing of all subjects, the ISS method proposed here provides a time- and cost-effective approach to broadening the characterization of disease associated SNPs and regions, and facilitating the prioritization of candidate
dos Santos, Marcelo Bertalan Quintanilha; Sicheritz-Pontén, Thomas; Nielsen, Henrik Bjørn
To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence, from faecal samples of 124 European individuals. The gene set, ~150 times larger than the human gene complement, contains an overwhelming majority of the prevalent (more frequent) microbial genes of the cohort and probably includes a large proportion of the prevalent human intestinal microbial genes...
Lewin, Astrid; Tran, Thi Tuyen; Jacob, Daniela; Mayer, Martin; Freytag, Barbara; Appel, Bernd
DNA transfer between pro- and eukaryotes occurs either during natural horizontal gene transfer or as a result of the employment of gene technology. We analysed the capacity of DNA sequences from a eukaryotic donor organism (Saccharomyces cerevisiae) to serve as promoter region in a prokaryotic recipient (Escherichia coli) by creating fusions between promoterless luxAB genes from Vibrio harveyi and random DNA sequences from S. cerevisiae and measuring the luminescence of transformed E. coli. Fifty-four out of 100 randomly analysed S. cerevisiae DNA sequences caused considerable gene expression in E. coli. Determination of transcription start sites within six selected yeast sequences in E. coli confirmed the existence of bacterial -10 and -35 consensus sequences at appropriate distances upstream from transcription initiation sites. Our results demonstrate that the probability of transcription of transferred eukaryotic DNA in bacteria is extremely high and does not require the insertion of the transferred DNA behind a promoter of the recipient genome.
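The check for bacterial -10 and -35 elements mentioned above can be illustrated with a small scan. This is only a minimal sketch and not the authors' procedure: it assumes the textbook E. coli sigma-70 consensus hexamers TTGACA (-35) and TATAAT (-10) separated by a 16 to 18 bp spacer, and the function names and the mismatch threshold are hypothetical choices made for illustration.

```python
# Assumed textbook sigma-70 consensus hexamers (not taken from the abstract).
MINUS_35 = "TTGACA"
MINUS_10 = "TATAAT"


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def find_promoter_like_sites(seq, max_mismatch=2, spacers=(16, 17, 18)):
    """Return (pos_minus35, pos_minus10, spacer) for promoter-like hexamer pairs.

    Positions are 0-based offsets of the first base of each hexamer.
    max_mismatch is applied to each hexamer separately.
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - len(MINUS_35) + 1):
        if hamming(seq[i:i + 6], MINUS_35) > max_mismatch:
            continue
        for spacer in spacers:
            j = i + 6 + spacer
            box = seq[j:j + 6]
            # Skip windows that run off the end of the sequence.
            if len(box) == 6 and hamming(box, MINUS_10) <= max_mismatch:
                hits.append((i, j, spacer))
    return hits


if __name__ == "__main__":
    # Synthetic example: perfect hexamers with a 17 bp spacer.
    demo = "GG" + MINUS_35 + "A" * 17 + MINUS_10 + "GGGG"
    print(find_promoter_like_sites(demo, max_mismatch=0))
```

A scan like this tolerates a few mismatches by design, which is one intuitive reason AT-rich eukaryotic DNA could contain promoter-like sites by chance; the abstract itself does not state the matching criteria that were used.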
Anthonius Y.P.B.C. Widyatmoko
Sequence polymorphisms among and within four Acacia species, A. aulacocarpa, A. auriculiformis, A. crassicarpa, and A. mangium, were investigated using four chloroplast DNA genes (atpA, petA, rbcL, and rpoA). The phylogenetic relationship among these species is discussed in light of the results of the sequence information. No intraspecific sequence variation was found in the four genes of the four species, and a conservative rate of mutation of the chloroplast DNA genes was also confirmed in the Acacia species. In the atpA and petA of the four genes, all four species possessed identical sequences, and no sequence variation was found among the four Acacia species. In the rbcL and rpoA genes, however, sequence polymorphisms were revealed among these species. Acacia aulacocarpa and A. crassicarpa shared an identical sequence, and A. auriculiformis and A. mangium also showed no sequence variation. The fact that A. mangium and A. auriculiformis shared identical sequences as did A. aulacocarpa and A. crassicarpa indicated that the two respective species were extremely closely related. Although a putative natural hybrid of A. aulacocarpa and A. auriculiformis has been reported, our results suggested that natural hybridization should be further verified using molecular markers.
Shamblott, M J; Litman, G W
Antibody to Heterodontus francisci (horned shark) immunoglobulin light chain was used to screen a spleen cDNA expression library, and recombinant clones encoding light chain genes were isolated. The complete sequences of the mature coding regions of two light chain genes in this phylogenetically distant vertebrate have been determined and are reported here. Comparisons of the sequences are consistent with the presence of mammalian-like framework and complementarity-determining regions. The predicted amino acid sequences of the genes are more related to mammalian lambda than to kappa light chains. The nucleotide sequences of the genes are most related to mammalian T-cell antigen receptor beta chain. Heterodontus light chain genes may reflect characteristics of the common ancestor of immunoglobulin and T-cell antigen receptors before its evolutionary diversification.
Ueno, Taro; Taguchi, Hideki; Tadakuma, Hisashi; Yoshida, Masasuke; Funatsu, Takashi
GroEL encapsulates nonnative substrate proteins in a central cavity capped by GroES, providing a safe folding cage. Conventional models assume that a single timer lasting approximately 8 s governs the ATP hydrolysis-driven GroEL chaperonin cycle. We examine single molecule imaging of GFP folding within the cavity, binding release dynamics of GroEL-GroES, ensemble measurements of GroEL/substrate FRET, and the initial kinetics of GroEL ATPase activity. We conclude that the cycle consists of two successive timers of approximately 3 s and approximately 5 s duration. During the first timer, GroEL is bound to ATP, substrate protein, and GroES. When the first timer ends, the substrate protein is released into the central cavity and folding begins. ATP hydrolysis and phosphate release immediately follow this transition. ADP, GroES, and substrate depart GroEL after the second timer is complete. This mechanism explains how GroES binding to a GroEL-substrate complex encapsulates the substrate rather than allowing it to escape into solution.
Varuna P. Panicker
Aim: An attempt has been made to identify and study the nucleotide sequence variability in the exon 5 - exon 6 regions of the guinea fowl Tapasin gene. Materials and Methods: Blood samples were collected from randomly selected birds (12 guinea fowl) and the Tapasin gene was amplified using chicken-specific primers designed from sequences submitted to GenBank. Polymerase chain reaction conditions were standardized so as to get only single amplicons. The products obtained were then cloned and sequenced, and the sequences were analyzed using suitable software. Results: The amplicon size of the Tapasin gene in guinea fowl was the same as that reported in chicken, with areas of transitions and transversions. The sequence variations reported in these coding sequences might influence the protein structure, which may be correlated with the increased immune status of the bird when compared with chicken breeds. Conclusion: Tapasin is an immunologically important gene that plays an important role in the immune status of the bird, so sequence variations in the gene can be correlated with the altered immune status of the bird.
Ferrin Thomas E
Background: Gene knockouts in a model organism such as mouse provide a valuable resource for the study of basic biology and human disease. Determining which gene has been inactivated by an untargeted gene trapping event poses a challenging annotation problem because gene trap sequence tags, which represent sequence near the vector insertion site of a trapped gene, are typically short and often contain unresolved residues. To understand better the localization of these sequences on the mouse genome, we compared stand-alone versions of the alignment programs BLAT, SSAHA, and MegaBLAST. A set of 3,369 sequence tags was aligned to build 34 of the mouse genome using default parameters for each algorithm. Known genome coordinates for the cognate set of full-length genes (1,659 sequences) were used to evaluate localization results. Results: In general, all three programs performed well in terms of localizing sequences to a general region of the genome, with only relatively subtle errors identified for a small proportion of the sequence tags. However, large differences in performance were noted with regard to correctly identifying exon boundaries. BLAT correctly identified the vast majority of exon boundaries, while SSAHA and MegaBLAST missed the majority of exon boundaries. SSAHA consistently reported the fewest false positives and is the fastest algorithm. MegaBLAST was comparable to BLAT in speed, but was the most susceptible to localizing sequence tags incorrectly to pseudogenes. Conclusion: The differences in performance for sequence tags and full-length reference sequences were surprisingly small. Characteristic variations in localization results for each program were noted that affect the localization of sequence at exon boundaries, in particular.
Hutsko, Stephanie L; Lilburn, Michael S; Wick, Macdonald
We successfully designed and validated degenerate primers for the turkey genes MUC2, RPS13, TBP and TFF2 based on chicken sequences in order to use gene transcription analysis to evaluate (quantify) the mucin transcription response to probiotic supplementation in turkeys. Primers were designed for the genes MUC2, TFF2, RPS13 and TBP using a degenerate primer design method based on the available Gallus gallus sequences. Products from all primer sets, each of which produced a single PCR amplicon of the expected size, were cloned into the TOPO® vector and then transformed into TOP10® competent cells. Plasmid DNA was isolated from the TOP10® cell culture and sent for sequencing. Sequences were analyzed using NCBI BLAST. All genes sequenced had over 90% homology with both the chicken and predicted turkey sequences. The sequences were used to design new 100% homologous primer sets for the genes of interest. © 2016 Poultry Science Association Inc.
Hood, Elizabeth; Teoh, Thomas
This invention is in the field of plant biology and agriculture and relates to novel seed-specific promoter regions. The present invention further provides methods of producing proteins and other products of interest and methods of controlling expression of nucleic acid sequences of interest using the seed-specific promoter regions.
Antonov, Ivan; Baranov, Pavel; Borodovsky, Mark
Database annotations of prokaryotic genomes and eukaryotic mRNA sequences pay relatively low attention to frame transitions that disrupt protein-coding genes. Frame transitions (frameshifts) could be caused by sequencing errors or indel mutations inside protein-coding regions. Other observed frameshifts are related to recoding events (that evolved to control expression of some genes). Earlier, we have developed an algorithm and software program GeneTack for ab initio frameshift finding in intronless genes. Here, we describe a database (freely available at http://topaz.gatech.edu/GeneTack/db.html) containing genes with frameshifts (fs-genes) predicted by GeneTack. The database includes 206 991 fs-genes from 1106 complete prokaryotic genomes and 45 295 frameshifts predicted in mRNA sequences from 100 eukaryotic genomes. The whole set of fs-genes was grouped into clusters based on sequence similarity between fs-proteins (conceptually translated fs-genes), conservation of the frameshift position and frameshift direction (-1, +1). The fs-genes can be retrieved by similarity search to a given query sequence via a web interface, by fs-gene cluster browsing, etc. Clusters of fs-genes are characterized with respect to their likely origin, such as pseudogenization, phase variation, etc. The largest clusters contain fs-genes with programed frameshifts (related to recoding events).
The present investigation was aimed at understanding the molecular mechanism of gene amplification. The interplay of fragile sites in promoting gene amplification was also elucidated. The amplification promoting sequences (APS) were chosen from the Saccharomyces cerevisiae ARS, 5S rRNA regions of Plantago ovata and P. lagopus, proposed sites of replication pausing at the Ste20 gene locus of S. cerevisiae, and the bent DNA sequences within fragile site FRA11A in humans. The gene amplification assays showed that plasmids bearing APS from yeast and human beings led to enhanced protein concentration as compared to the wild type. Both the in silico and in vitro analyses pointed to the strong bending potential of these APS. In addition, high mitotic stability and the presence of TTTT repeats and SAR amongst these sequences encourage gene amplification. Phylogenetic analysis of S. cerevisiae ARS was also conducted. The combinatorial power of different aspects of APS analyzed in the present investigation was harnessed to reach a consensus about the factors which stimulate gene expression in the presence of these sequences. It was concluded that the mechanism of gene amplification is that AT-rich tracts present in fragile sites of yeast serve as binding sites for MAR/SAR and DNA unwinding elements. The DNA-protein interactions necessary for ORC activation are facilitated by DNA bending. These specific bindings at ORC promote repeated rounds of DNA replication leading to gene amplification.
Davidsen, T.; Rodland, E.A.; Lagesen, K.
Repeated sequence signatures are characteristic features of all genomic DNA. We have made a rigorous search for repeat genomic sequences in the human pathogens Neisseria meningitidis, Neisseria gonorrhoeae and Haemophilus influenzae and found that by far the most frequent 9-10mers residing within coding regions are the DNA uptake sequences (DUS) required for natural genetic transformation. More importantly, we found a significantly higher density of DUS within genes involved in DNA repair, recombination, restriction-modification and replication than in any other annotated gene group...
Porphyromonas gingivalis is a major periodontal pathogen that contains a variety of virulence factors. The antibody titer to P. gingivalis GroEL, a homologue of HSP60, is significantly higher in periodontitis patients than in healthy control subjects, suggesting that P. gingivalis GroEL is a potential stimulator of periodontal disease. However, the specific role of GroEL in periodontal disease remains unclear. Here, we investigated the effect of P. gingivalis GroEL on human periodontal ligament (PDL) cells in vitro, as well as its effect on alveolar bone resorption in rats in vivo. First, we found that stimulation of PDL cells with recombinant GroEL increased the secretion of the bone resorption-associated cytokines interleukin (IL)-6 and IL-8, potentially via NF-κB activation. Furthermore, GroEL could effectively stimulate PDL cell migration, possibly through activation of integrin α1 and α2 mRNA expression as well as cytoskeletal reorganization. Additionally, GroEL may be involved in osteoclastogenesis via receptor activator of nuclear factor κ-B ligand (RANKL) activation and alkaline phosphatase (ALP) mRNA inhibition in PDL cells. Finally, we inoculated GroEL into rat gingiva, and the results of microcomputed tomography (micro-CT) and histomorphometric assays indicated that the administration of GroEL significantly increased inflammation and bone loss. In conclusion, P. gingivalis GroEL may act as a potent virulence factor, contributing to osteoclastogenesis of PDL cells and resulting in periodontal disease with alveolar bone resorption.
The number and function of endothelial progenitor cells (EPCs) are sensitive to hyperglycemia, hypertension, and smoking in humans, which are also associated with the development of atherosclerosis. GroEL1 from Chlamydia pneumoniae has been found in atherosclerotic lesions and is related to atherosclerotic pathogenesis. However, the actual effects of GroEL1 on EPC function are unclear. In this study, we investigate the EPC function in GroEL1-administered hind limb-ischemic C57BL/B6 and C57BL/10ScNJ (a toll-like receptor 4 (TLR4) mutation) mice and human EPCs. In mice, laser Doppler imaging, flow cytometry, and immunohistochemistry were used to evaluate the degree of neo-vasculogenesis, circulating level of EPCs, and expression of CD34, vWF, and endothelial nitric oxide synthase (eNOS) in vessels. Blood flow in the ischemic limb was significantly impaired in C57BL/B6 but not C57BL/10ScNJ mice treated with GroEL1. Circulating EPCs were also decreased after GroEL1 administration in C57BL/B6 mice. Additionally, GroEL1 inhibited the expression of CD34 and eNOS in C57BL/B6 ischemic muscle. In vitro, GroEL1 impaired the capacity of differentiation, mobilization, tube formation, and migration of EPCs. GroEL1 increased senescence, which was mediated by caspases, p38 MAPK, and ERK1/2 signaling in EPCs. Furthermore, GroEL1 decreased integrin and E-selectin expression and induced inflammatory responses in EPCs. In conclusion, these findings suggest that TLR4 and impaired NO-related mechanisms could contribute to the reduced number and functional activity of EPCs in the presence of GroEL1 from C. pneumoniae.
Recent advances in sequencing technology have enabled comprehensive profiling of genetic alterations in cancer. We have established a targeted sequencing platform using next-generation sequencing (NGS) technology for clinical use, which can provide mutation and copy number variation data. NGS was performed with a paired-end library enriched for exons of 183 cancer-related genes. Normal and tumor tissue pairs of 60 colorectal adenocarcinomas were used to test feasibility. Somatic mutations and copy number alterations were analyzed. A total of 526 somatic non-synonymous sequence variations were found in 113 genes. Among these, 278 single nucleotide variations (SNVs) represented 232 different somatic point mutations, 216 SNVs corresponded to 79 known single nucleotide polymorphisms in dbSNP, and 32 indels represented 28 different indel mutations. The median number of mutated genes per tumor was 4 (range 0-23). Copy number gain (>2-fold) was found in 65 genes in 40 patients, whereas copy number loss (
Noonan, James P; Grimwood, Jane; Danke, Joshua; Schmutz, Jeremy; Dickson, Mark; Amemiya, Chris T; Myers, Richard M
The coelacanth is one of the nearest living relatives of tetrapods. However, a teleost species such as zebrafish or Fugu is typically used as the outgroup in current tetrapod comparative sequence analyses. Such studies are complicated by the fact that teleost genomes have undergone a whole-genome duplication event, as well as individual gene-duplication events. Here, we demonstrate the value of coelacanth genome sequence by complete sequencing and analysis of the protocadherin gene cluster of the Indonesian coelacanth, Latimeria menadoensis. We found that coelacanth has 49 protocadherin cluster genes organized in the same three ordered subclusters, alpha, beta, and gamma, as the 54 protocadherin cluster genes in human. In contrast, whole-genome and tandem duplications have generated two zebrafish protocadherin clusters comprised of at least 97 genes. Additionally, zebrafish protocadherins are far more prone to homogenizing gene conversion events than coelacanth protocadherins, suggesting that recombination- and duplication-driven plasticity may be a feature of teleost genomes. Our results indicate that coelacanth provides the ideal outgroup sequence against which tetrapod genomes can be measured. We therefore present L. menadoensis as a candidate for whole-genome sequencing.
Ingestion of staphylococcal enterotoxins preformed by Staphylococcus aureus in food leads to staphylococcal food poisoning, the most prevalent foodborne intoxication worldwide. There are five major staphylococcal enterotoxins: SEA, SEB, SEC, SED, and SEE. While variants of these toxins have been described and were linked to specific hosts or levels of enterotoxin production, data on sequence variation are still limited. In this study, we aim to extend the knowledge on promoter and gene variants of the major enterotoxins SEB, SEC, and SED. To this end, we determined seb, sec, and sed promoter and gene sequences of a well-characterized set of enterotoxigenic Staphylococcus aureus strains originating from foodborne outbreaks, human infections, human nasal colonization, rabbits, and cattle. New nucleotide sequence variants were detected for all three enterotoxins and a novel amino acid sequence variant of SED was detected in a strain associated with human nasal colonization. While the seb promoter and gene sequences exhibited a high degree of variability, the sec and sed promoter and gene were more conserved. Interestingly, a truncated variant of sed was detected in all tested sed-harboring rabbit strains. The generated data represent a further step towards improved understanding of strain-specific differences in enterotoxin expression and host-specific variation in enterotoxin sequences.
For the vast majority of species, including many economically or ecologically important organisms, progress in biological research is hampered by the lack of a reference genome sequence. Despite recent advances in sequencing technologies, several factors still limit the availability of such a critical resource. At the same time, many research groups and international consortia have already produced BAC libraries and physical maps and now are in a position to proceed with the development of whole-genome sequences organized around a physical map anchored to a genetic map. We propose a BAC-by-BAC sequencing protocol that combines combinatorial pooling design and second-generation sequencing technology to efficiently approach de novo selective genome sequencing. We show that combinatorial pooling is a cost-effective and practical alternative to exhaustive DNA barcoding when preparing sequencing libraries for hundreds or thousands of DNA samples, such as in this case gene-bearing minimum-tiling-path BAC clones. The novelty of the protocol hinges on the computational ability to efficiently compare hundreds of millions of short reads and assign them to the correct BAC clones (deconvolution) so that the assembly can be carried out clone-by-clone. Experimental results on simulated data for the rice genome show that the deconvolution is very accurate, and the resulting BAC assemblies have high quality. Results on real data for a gene-rich subset of the barley genome confirm that the deconvolution is accurate and the BAC assemblies have good quality. While our method cannot provide the level of completeness that one would achieve with a comprehensive whole-genome sequencing project, we show that it is quite successful in reconstructing the gene sequences within BACs. In the case of plants such as barley, this level of sequence knowledge is sufficient to support critical end-point objectives such as map-based cloning and marker-assisted breeding.
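The idea of pooling followed by deconvolution can be sketched in a few lines. This is a deliberately simplified illustration and not the pooling design the authors used, which the abstract does not specify: each clone is assumed to be placed in a distinct subset of pools (its signature), and a read or k-mer is assigned to the clone whose signature is contained in the set of pools where it was observed. All names and parameters are hypothetical.

```python
from itertools import combinations
from math import comb


def make_signatures(clones, n_pools, pools_per_clone):
    """Give every clone a distinct subset of pool indices (its signature)."""
    if comb(n_pools, pools_per_clone) < len(clones):
        raise ValueError("not enough distinct pool subsets for all clones")
    subsets = combinations(range(n_pools), pools_per_clone)
    return {clone: frozenset(s) for clone, s in zip(clones, subsets)}


def deconvolve(observed_pools, signatures):
    """Return the clones compatible with the pools in which a read was seen.

    A single result is an unambiguous assignment; several results mean the
    read is shared (or the observation is noisy) and cannot be resolved by
    this simple containment rule alone.
    """
    observed = frozenset(observed_pools)
    return [clone for clone, sig in signatures.items() if sig <= observed]


if __name__ == "__main__":
    sigs = make_signatures(["bac1", "bac2", "bac3"], n_pools=4, pools_per_clone=2)
    print(sigs)
    print(deconvolve({0, 2}, sigs))     # read seen in pools 0 and 2
    print(deconvolve({0, 1, 2}, sigs))  # ambiguous: two clones are compatible
```

The appeal of such a scheme is that the number of pools grows much more slowly than the number of clones, which is the cost argument the abstract makes against exhaustive barcoding. Real designs also need tolerance to sequencing errors and to reads shared between overlapping clones, which this sketch ignores.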
Yu Gao; Lumei Chi; Yinshi Jin; Guangxian Nan
PCR amplification and sequencing of whole blood DNA from an individual with hereditary spastic paraplegia, as well as family members, revealed a fragment of proteolipid protein 1 (PLP1) gene exon 1, which excluded the possibility of isomer 1 expression for this family. The fragment sequences of exon 3 and exon 5 were consistent with the proteolipid protein 1 sequence at NCBI. In the proband samples, a PLP1 point mutation in exon 4 was detected at base position 844 (T→C), resulting in a phenylalanine→leucine substitution. In samples from a male cousin of the proband, the base at position 844 was C, but the sequencing traces showed mixed T and C signals, indicating a possible mutation at this locus. Results demonstrated that changes in PLP1 exon 4 amino acids were associated with onset of hereditary spastic paraplegia.
Porteous David J
Background: Regions of interest identified through genetic linkage studies regularly exceed 30 centimorgans in size and can contain hundreds of genes. Traditionally this number is reduced by matching functional annotation to knowledge of the disease or phenotype in question. However, here we show that disease genes share patterns of sequence-based features that can provide a good basis for automatic prioritization of candidates by machine learning. Results: We examined a variety of sequence-based features and found that for many of them there are significant differences between the sets of genes known to be involved in human hereditary disease and those not known to be involved in disease. We have created an automatic classifier called PROSPECTR based on those features using the alternating decision tree algorithm which ranks genes in the order of likelihood of involvement in disease. On average, PROSPECTR enriches lists for disease genes two-fold 77% of the time, five-fold 37% of the time and twenty-fold 11% of the time. Conclusion: PROSPECTR is a simple and effective way to identify genes involved in Mendelian and oligogenic disorders. It performs markedly better than the single existing sequence-based classifier on novel data. PROSPECTR could save investigators looking at large regions of interest time and effort by prioritizing positional candidate genes for mutation detection and case-control association studies.
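One common way to read a statement such as "enriches lists two-fold" is as a ratio of disease-gene density in the top of a ranked list to the density in the whole list. The abstract does not state how PROSPECTR's enrichment was computed, so the sketch below is an assumed definition with hypothetical names, given only to make the arithmetic concrete.

```python
def fold_enrichment(ranked_genes, disease_genes, top_n):
    """Density of disease genes in the top_n ranked genes relative to the whole list.

    ranked_genes: gene identifiers ordered from most to least likely.
    disease_genes: set of identifiers known to be disease genes.
    """
    if top_n <= 0 or top_n > len(ranked_genes):
        raise ValueError("top_n must be between 1 and the list length")
    hits_all = sum(g in disease_genes for g in ranked_genes)
    if hits_all == 0:
        raise ValueError("no disease genes in the list; enrichment undefined")
    hits_top = sum(g in disease_genes for g in ranked_genes[:top_n])
    return (hits_top / top_n) / (hits_all / len(ranked_genes))


if __name__ == "__main__":
    ranked = [f"g{i}" for i in range(1, 11)]  # g1 is ranked most likely
    # Both disease genes fall in the top half: density 2/5 versus 2/10.
    print(fold_enrichment(ranked, {"g1", "g2"}, top_n=5))
    # One disease gene in the top half: density 1/5 versus 2/10.
    print(fold_enrichment(ranked, {"g1", "g7"}, top_n=5))
```

Under this definition a value of 1 means the ranking does no better than random ordering, and larger values mean fewer candidates need to be screened before a disease gene is reached.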
dos Santos, Marcelo Bertalan Quintanilha; Sicheritz-Pontén, Thomas; Nielsen, Henrik Bjørn;
To understand the impact of gut microbes on human health and well-being it is crucial to assess their genetic potential. Here we describe the Illumina-based metagenomic sequencing, assembly and characterization of 3.3 million non-redundant microbial genes, derived from 576.7 gigabases of sequence...... gut metagenome and the minimal gut bacterial genome in terms of functions present in all individuals and most bacteria, respectively....
High throughput sequencing (HTS) yields tens of thousands to millions of sequences that require a large amount of pre-processing work to clean various artifacts. Such cleaning cannot be performed manually. Existing programs are not suitable for immunoglobulin (Ig) genes, which are variable and often highly mutated. This paper describes Ig-HTS-Cleaner (Ig High Throughput Sequencing Cleaner), a program containing a simple cleaning procedure that successfully deals with pre-processing of Ig sequences derived from HTS, and Ig-Indel-Identifier (Ig Insertion – Deletion Identifier), a program for identifying legitimate and artifact insertions and/or deletions (indels). Our programs were designed for analyzing Ig gene sequences obtained by 454 sequencing, but they are applicable to all types of sequences and sequencing platforms. Ig-HTS-Cleaner and Ig-Indel-Identifier have been implemented in Java and saved as executable JAR files, supported on Linux and MS Windows. No special requirements are needed in order to run the programs, except for correctly constructing the input files as explained in the text. The programs' performance has been tested and validated on real and simulated data sets.
Herzog, Michel; Maroteaux, Luc
We present the complete sequence of the nuclear-encoded small-ribosomal-subunit RNA inferred from the cloned gene sequence of the dinoflagellate Prorocentrum micans. The dinoflagellate 17S rRNA sequence of 1798 nucleotides is contained in a family of 200 tandemly repeated genes per haploid genome. A tentative model of the secondary structure of P. micans 17S rRNA is presented. This sequence is compared with the small-ribosomal-subunit rRNA of Xenopus laevis (Animalia), Saccharomyces cerevisiae (Fungi), Zea mays (Planta), Dictyostelium discoideum (Protoctista), and Halobacterium volcanii (Monera). Although the secondary structure of the dinoflagellate 17S rRNA presents most of the eukaryotic characteristics, it contains sufficient archaeobacterial-like structural features to reinforce the view that dinoflagellates branch off very early from the eukaryotic lineage. PMID:16578795
The ank gene of the agent of human granulocytic ehrlichiosis (HGE) codes for a protein with a predicted molecular size of 131.2 kDa that is recognized by serum from both dogs and humans infected with granulocytic ehrlichiae. As part of an effort to assess the phylogenetic relatedness of granulocytic ehrlichiae from different geographic regions and in different host species, the ank gene was PCR amplified and sequenced from a variety of sources. These included 10 blood specimens from patients ...
Harikrishnan, Srilakshmy L; Pucholt, Pascal; Berlin, Sofia
Whole genome duplications (WGD) have had strong impacts on species diversification by triggering evolutionary novelties; however, relatively little is known about the balance between gene loss and forces involved in the retention of duplicated genes originating from a WGD. We analyzed putative Salicoid duplicates in willows, originating from the Salicoid WGD, which took place more than 45 Mya. Contigs were constructed by de novo assembly of RNA-seq data derived from leaves and roots from two genotypes. Among the 48,508 contigs, 3,778 pairs were, based on fourfold synonymous third-codon transversion rates and syntenic positions, predicted to be Salicoid duplicates. Both copies were in most cases expressed in both tissues and 74% were significantly differentially expressed. Mean Ka/Ks was 0.23, suggesting that the Salicoid duplicates are evolving by purifying selection. Gene Ontology enrichment analyses showed that functions related to DNA- and nucleic acid binding were over-represented among the non-differentially expressed Salicoid duplicates, while functions related to biosynthesis and metabolism were over-represented among the differentially expressed Salicoid duplicates. We propose that the differentially expressed Salicoid duplicates are regulatory neo- and/or subfunctionalized, while the non-differentially expressed are dose sensitive and hence functionally conserved. Multiple evolutionary processes thus drive the retention of Salicoid duplicates in willows.
Montie, S. Kadis, and S. I. Ajl (ed.), Microbial toxins, vol. 3. Academic Press, Inc., New York. 23. Little, S. F., and G. B. Knudson. 1986...Takkinen, and L. Kaariainen. 1981. Nucleotide sequence of the promoter and NH2-terminal signal peptide region of the α-amylase gene from Bacillus
Wang, Shuqiang; Shen, Yanyan; Hu, Jinxing
Quantitative models of gene regulatory activity have the potential to improve our mechanistic understanding of transcriptional regulation. However, the few models available today have been based on simplistic assumptions about the sequences being modeled or heuristic approximations of the underlying regulatory mechanisms. In this work, we have developed a thermodynamics-based model to predict gene expression driven by any DNA sequence. The proposed model relies on a continuous time, differential equation description of transcriptional dynamics. The sequence features of the promoter are exploited to derive the binding affinity, which is based on statistical molecular thermodynamics. Experimental results show that the proposed model can effectively identify the activity levels of transcription factors and the regulatory parameters. Compared with previous models, the proposed model is more biologically interpretable.
Md Abul Hassan Samee
Modeling a gene's expression from its intergenic locus and trans-regulatory context is a fundamental goal in computational biology. Owing to the distributed nature of cis-regulatory information and the poorly understood mechanisms that integrate such information, gene locus modeling is a more challenging task than modeling individual enhancers. Here we report the first quantitative model of a gene's expression pattern as a function of its locus. We model the expression readout of a locus in two tiers: (1) combinatorial regulation by transcription factors bound to each enhancer is predicted by a thermodynamics-based model and (2) independent contributions from multiple enhancers are linearly combined to fit the gene expression pattern. The model does not require any prior knowledge about enhancers contributing toward a gene's expression. We demonstrate that the model captures the complex multi-domain expression patterns of anterior-posterior patterning genes in the early Drosophila embryo. Altogether, we model the expression patterns of 27 genes; these include several gap genes, pair-rule genes, and anterior, posterior, trunk, and terminal genes. We find that the model-selected enhancers for each gene overlap strongly with its experimentally characterized enhancers. Our findings also suggest the presence of sequence-segments in the locus that would contribute ectopic expression patterns and hence were "shut down" by the model. We applied our model to identify the transcription factors responsible for forming the stripe boundaries of the studied genes. The resulting network of regulatory interactions exhibits a high level of agreement with known regulatory influences on the target genes. Finally, we analyzed whether and why our assumption of enhancer independence was necessary for the genes we studied. We found a deterioration of expression when binding sites in one enhancer were allowed to influence the readout of another enhancer. Thus, interference
Mark A Chapman
Analyses aimed at identifying genes that have been targeted by past selection provide a powerful means for investigating the molecular basis of adaptive differentiation. In the case of crop plants, such studies have the potential to not only shed light on important evolutionary processes, but also to identify genes of agronomic interest. In this study, we test for evidence of positive selection at the DNA sequence level in a set of candidate genes previously identified in a genome-wide scan for genotypic evidence of selection during the evolution of cultivated sunflower. In the majority of cases, we were able to confirm the effects of selection in shaping diversity at these loci. Notably, the genes that were found to be under selection via our sequence-based analyses were devoid of variation in the cultivated sunflower gene pool. This result confirms a possible strategy for streamlining the search for adaptively important loci by pre-screening the derived population to identify the strongest candidates before sequencing them in the ancestral population.
Marrs, C F; Schoolnik, G; Koomey, J M; Hardy, J; Rothbard, J; Falkow, S
Moraxella bovis pili have been shown to play a major role in both infectivity and protective immunity of bovine infectious keratoconjunctivitis. Sonicated M. bovis DNA from the piliated strain EPP63 was inserted into the vector lambda gt11 with EcoRI linkers. Recombinant phage were screened with an oligonucleotide probe based on the amino-terminal portion of the DNA sequence of a Neisseria gonorrhoeae pilin gene. Two candidate phages produced a protein that comigrated with EPP63 beta pilin in sodium dodecyl sulfate-polyacrylamide gels and bound anti-pilus antisera. The 1.9-kilobase insert from one of these, lambda gt11M182, was subcloned in both orientations into pBR322, forming the plasmids pMxB7 and pMxB9, both of which produced beta pilin, as did pMxB12, a HindIII deletion derivative of pMxB7. In HB101(pMxB12), the M. bovis pilin protein was shown to be primarily localized in the inner membrane. The entire 939-base-pair insert of pMxB12 was sequenced, revealing a ribosome binding site just upstream of the coding region and an AT-rich region further upstream containing some potential RNA polymerase recognition sites. The translation of the sequence predicts a six-amino-acid leader sequence preceding the phenylalanine that begins the mature protein. Codon usage analysis of the M. bovis beta pilin gene revealed greater use of the CUA codon for leucine than usual for a well-expressed Escherichia coli gene. Comparisons of the M. bovis EPP63 beta pilin protein sequence with other pilin gene sequences are presented.
Background: Genomic imprinting is an evolutionarily conserved mechanism of epigenetic gene regulation in placental mammals that results in silencing of one of the parental alleles. In order to decipher interactions between allele-specific DNA methylation of imprinted genes and evolutionary conservation, we performed a genome-wide comparative investigation of genomic sequences and highly conserved elements of imprinted genes in human and mouse. Results: Evolutionarily conserved elements in imprinted regions differ from those associated with autosomal genes in various ways. Whereas for maternally expressed genes strong divergence of protein-encoding sequences is most prominent, paternally expressed genes exhibit substantial conservation of coding and noncoding sequences. Conserved elements in imprinted regions are marked by enrichment of CpG dinucleotides, and low (TpG+CpA)/(2·CpG) ratios indicate reduced CpG deamination. Interestingly, paternally and maternally expressed genes can be distinguished by differences in G+C and CpG contents that might be associated with unusual epigenetic features. Especially noncoding conserved elements of paternally expressed genes are exceptionally G+C and CpG rich. In addition, we confirmed a frequent occurrence of intronic CpG islands and observed a decelerated degeneration of ancient LINE-1 repeats. We also found a moderate enrichment of YY1 and CTCF binding sites in imprinted regions and identified several short sequence motifs in highly conserved elements that might act as additional regulatory elements. Conclusions: We discovered several novel conserved DNA features that might be related to allele-specific DNA methylation. Our results hint at reduced CpG deamination rates in imprinted regions, which affects mostly noncoding conserved elements of paternally expressed genes. Pronounced differences between maternally and paternally expressed genes imply specific modes of evolution as a result of differences in
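The deamination ratio named in this abstract, (TpG+CpA)/(2·CpG), is straightforward to compute from dinucleotide counts. Deamination of a methylated CpG yields TpG on one strand and CpA on the other, so a low ratio suggests that fewer CpGs have decayed. The following is a minimal sketch on a single strand with overlapping dinucleotide counts; the function name and the example sequence are hypothetical, and the authors' exact counting conventions are not given in the abstract.

```python
def deamination_ratio(seq):
    """Return (TpG + CpA) / (2 * CpG) for a DNA string, or None if it has no CpG."""
    seq = seq.upper()
    counts = {"TG": 0, "CA": 0, "CG": 0}
    # Slide a 2-nt window over the sequence so overlapping dinucleotides are counted.
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
    if counts["CG"] == 0:
        return None  # ratio is undefined without any CpG
    return (counts["TG"] + counts["CA"]) / (2 * counts["CG"])


# Hypothetical example: 1 TpG, 1 CpA and 3 CpG dinucleotides.
print(deamination_ratio("ACGTGCACGCG"))
```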
Leung, Carl; Palmer, Richard E [Nanoscale Physics Research Laboratory, School of Physics and Astronomy, University of Birmingham, Edgbaston, Birmingham B15 2TT (United Kingdom)]
Understanding and controlling protein adsorption on surfaces is fundamental to many biological processes ranging from cell adhesion to the fabrication of protein biochips. In general, proteins need to retain their 3D conformation to perform their intended functions. However, when they are presented with a solid surface, complex interactions ranging from weak non-covalent binding to strong covalent bonding may occur, which can potentially induce conformational changes within the adsorbed protein. To investigate the surface adsorption process and its effects on a model protein, the chaperonin GroEL, we have applied contact mode atomic force microscopy in buffer solution to probe the interactions between single proteins and surfaces in real space. We will discuss the adsorption of GroEL molecules on planar surfaces (mica, graphite and gold) and specifically tailored nanostructured surfaces, which present structural features on the size scale of individual biological molecules. (topical review)
Koike-Takeshita, Ayumi; Mitsuoka, Kaoru; Taguchi, Hideki
The Escherichia coli chaperonin GroEL is a double-ring chaperone that assists protein folding with the aid of GroES and ATP. Asp-398 in GroEL is known as one of the critical residues on ATP hydrolysis because GroEL(D398A) mutant is deficient in ATP hydrolysis (Asp-52 in the corresponding E. coli GroEL, in addition to Asp-398 is also important for ATP hydrolysis. We investigated the role of Asp-52 in GroEL and found that ATPase activity of GroEL(D52A) and GroEL(D52A/D398A) mutants was ∼ 20% and Asp-52 in E. coli GroEL is also involved in the ATP hydrolysis. GroEL(D52A/D398A) formed a symmetric football-shaped GroEL-GroES complex in the presence of ATP, again confirming the importance of the symmetric complex during the GroEL ATPase cycle. Notably, the symmetric complex of GroEL(D52A/D398A) was extremely stable, with a half-time of ∼ 150 h (∼ 6 days), providing a good model to characterize the football-shaped complex. © 2014 by The American Society for Biochemistry and Molecular Biology, Inc.
Ge, Y; Chang, Y; Xu, W L; Cui, C S; Qu, S P
Seeded pumpkins are important economic crops; the seeds contain various unsaturated fatty acids, such as oleic acid and linoleic acid, which are crucial for human and animal nutrition. The fatty acid desaturase-2 (FAD2) gene encodes delta-12 desaturase, which converts oleic acid to linoleic acid. However, little is known about sequence variations in FAD2 in seeded pumpkins. Twenty-seven FAD2 clones from 27 accessions of Cucurbita moschata, Cucurbita maxima, Cucurbita pepo, and Cucurbita ficifolia were obtained (1152 bp in total; a single gene without introns). More than 90% nucleotide identity was detected among the 27 FAD2 clones. Nucleotide substitution, rather than nucleotide insertion and deletion, led to sequence polymorphism in the 27 FAD2 clones. Furthermore, the 27 selected FAD2 clones all encoded the FAD2 enzyme (delta-12 desaturase), with amino acid sequence identities from 91.7 to 100% over 384 amino acids. The same main functional domain, between amino acids 47 and 329, was identified. The four species clustered separately based on differences in the sequences that were identified using the unweighted pair group method with arithmetic mean. Geographic origin and species were found to be closely related to sequence variation in FAD2.
Trifonov, E. N.
Only about 1/20 of the DNA of higher organisms codes for proteins, by means of the classical triplet code. The rest of the DNA sequences is largely silent, with unclear functions, if any. The triplet code is not the only code (message) carried by the sequences. There are three levels of molecular communication, where the same sequence "talks" to various biomolecules, while having, respectively, three different appearances: DNA, RNA and protein. Since the molecular structures and, hence, sequence-specific preferences of these are substantially different, the original DNA sequence has to carry simultaneously three types of sequence patterns (codes, messages), thus being a composite structure in which one and the same letter (nucleotide) is frequently involved in several overlapping codes of different nature. This multiplicity and overlapping of the codes is a unique feature of Gnomic, the language of genetic sequences. The coexisting codes have to be degenerate in various degrees to allow an optimal and concerted performance of all the encoded functions. There is an obvious conflict between the best possible performance of a given function and the necessity to compromise the quality of a given sequence pattern in favor of other patterns. It appears that the major role of various changes in the sequences on their "ontogenetic" way from DNA to RNA to protein, like RNA editing and splicing, or protein post-translational modifications, is to resolve such conflicts. New data are presented strongly indicating that gene splicing is such a device to resolve the conflict between the code of DNA folding in chromatin and the triplet code for protein synthesis.
Edberg Jeffrey C
Background: Copy number variations (CNVs) of the gene CC chemokine ligand 3-like1 (CCL3L1) have been implicated in HIV-1 susceptibility, but the association has been inconsistent. CCL3L1 shares homology with a cluster of genes localized to chromosome 17q12, namely CCL3, CCL3L2, and CCL3L3. These genes are involved in host defense and inflammatory processes. Several CNV assays have been developed for the CCL3L1 gene. Findings: Through pairwise and multiple alignments of these genes, we have shown that the homology between these genes ranges from 50% to 99% in complete gene sequences and from 70% to 100% in the exonic regions, with CCL3L1 and CCL3L3 being identical. By use of MEGA 4 and BioEdit, we aligned sense primers, anti-sense primers, and probes used in several previously described assays against pre-multiple alignments of all four chemokine genes. Each set of probes and primers aligned and matched with overlapping sequences in at least two of the four genes, indicating that previously utilized RT-PCR based CNV assays are not specific for only CCL3L1. The four available assays measured median copies of 2 and 3-4 in European and African American, respectively. The concordance between the assays ranged from 0.44-0.83, suggesting individual discordant calls and inconsistencies with the assays from the expected gene coverage from the known sequence. Conclusions: This indicates that some of the inconsistencies in the association studies could be due to assays that provide heterogeneous results. Sequence information to determine CNV of the three genes separately would make it possible to test whether their association with the pathogenesis of a human disease or phenotype is affected by an individual gene or by a combination of these genes.
WANG Qigui; ZHANG Lei; HU Xiaoxiang; FAN Baoliang; LI Ning; LI Hui; WU Changxin
The duck prion gene was cloned and sequenced. Like mammalian prion protein (PrP), duck PrP is encoded by a single exon of a single-copy gene, which was confirmed by Southern blot analysis. All of the structural features of mammalian PrP were also identified in the duck PrP. Compared with mammalian PrP, it exhibited an overall similarity of 30%. Compared with chicken PrP, it showed a higher homology of 97%. A phylogenetic tree was constructed to trace the evolution of the prion gene in animals.
Wu, Jia Qian; Shteynberg, David; Arumugam, Manimozhiyan
The publication of a draft sequence of a third mammalian genome--that of the rat--suggests a need to rethink genome annotation. New mammalian sequences will not receive the kind of labor-intensive annotation efforts that are currently being devoted to human. In this paper, we demonstrate...... an alternative approach: reverse transcription-polymerase chain reaction (RT-PCR) and direct sequencing based on dual-genome de novo predictions from TWINSCAN. We tested 444 TWINSCAN-predicted rat genes that showed significant homology to known human genes implicated in disease but that were partially...
Spider silk includes seven protein-based fibers and glue-like substances produced by glands in the spider's abdomen. Minor ampullate silk, which is used to make the auxiliary spiral of the orb-web and also for wrapping prey, has a high tensile strength and does not supercontract in water. So far, only partial cDNA sequences have been obtained for minor ampullate spidroins (MiSps). Here we describe the first MiSp full-length gene sequence from the spider species Araneus ventricosus, using a multidimensional PCR approach. Comparative analysis of the sequence reveals regulatory elements, as well as unique spidroin gene and protein architecture including the presence of an unusually large intron. The spliced full-length transcript of the MiSp gene is 5440 bp in size and encodes 1766 amino acid residues organized into conserved nonrepetitive N- and C-terminal domains and a central predominantly repetitive region composed of four units that are iterated in a non-regular manner. The repeats are more conserved within A. ventricosus MiSp than compared to repeats from homologous proteins, and are interrupted by two nonrepetitive spacer regions, which have 100% identity even at the nucleotide level.
Patra, Monobesh; Roy, Sourav Singha; Dasgupta, Rakhi; Basu, Tarakdas
The stability of heat-shock transcription factor σ(32) in Escherichia coli has long been known to be modulated only by its own transcribed chaperone DnaK. Very few reports suggest a role for another heat-shock chaperone, GroEL, for maintenance of cellular σ(32) level. The present study demonstrates in vivo physical association between GroEL and σ(32) in E. coli at physiological temperature. This study further reveals that neither DnaK nor GroEL singly can modulate σ(32) stability in vivo; there is an ordered network between them, where GroEL acts upstream of DnaK.
Marcelo Bento Soares
In previous years, with support from the U.S. Department of Energy, we developed methods for construction of normalized and subtracted cDNA libraries, and constructed hundreds of high-quality libraries for production of Expressed Sequence Tags (ESTs). Our clones were made widely available to the scientific community through the IMAGE Consortium, and millions of ESTs were produced from our libraries either by collaborators or by our own sequencing laboratory at the University of Iowa. During this grant period, we focused on (1) the development of a method for preferential cloning of tissue-specific and/or rare transcripts, (2) its utilization to expedite EST-based gene discovery for the NIH Mouse Brain Molecular Anatomy Project, (3) further development and optimization of a method for construction of full-length-enriched cDNA libraries, and (4) modification of a plasmid vector to maximize efficiency of full-length cDNA sequencing by the transposon-mediated approach. It is noteworthy that the technology developed for preferential cloning of rare mRNAs enabled identification of over 2,000 mouse transcripts differentially expressed in the hippocampus. In addition, the method that we optimized for construction of full-length-enriched cDNA libraries was successfully utilized for the production of approximately fifty libraries from the developing mouse nervous system, from which over 2,500 full-ORF-containing cDNAs have been identified and accurately sequenced in their entirety either by our group or by the NIH-Mammalian Gene Collection Program Sequencing Team.
Yin-Long QIU; Zhi-Duan CHEN; Libo LI; Bin WANG; Jia-Yu XUE; Tory A. HENDRY; Rui-Qi LI; Joseph W. BROWN; Yang LIU; Geordan T. HUDSON
An angiosperm phylogeny was reconstructed in a maximum likelihood analysis of sequences of four mitochondrial genes, atp1, matR, nad5, and rps3, from 380 species that represent 376 genera and 296 families of seed plants. It is largely congruent with the phylogeny of angiosperms reconstructed from chloroplast genes atpB, matK, and rbcL, and nuclear 18S rDNA. The basalmost lineage consists of Amborella and Nymphaeales (including Hydatellaceae). Austrobaileyales follow this clade and are sister to the mesangiosperms, which include Chloranthaceae, Ceratophyllum, magnoliids, monocots, and eudicots. With the exception of Chloranthaceae being sister to Ceratophyllum, relationships among these five lineages are not well supported. In eudicots, Ranunculales, Sabiales, Proteales, Trochodendrales, Buxales, Gunnerales, Saxifragales, Vitales, Berberidopsidales, and Dilleniales form a basal grade of lines that diverged before the diversification of rosids and asterids. Within rosids, the COM (Celastrales-Oxalidales-Malpighiales) clade is sister to malvids (or rosid Ⅱ), instead of to the nitrogen-fixing clade as found in all previous large-scale molecular analyses of angiosperms. Santalales and Caryophyllales are members of an expanded asterid clade. This study shows that the mitochondrial genes are informative markers for resolving relationships among genera, families, or higher rank taxa across angiosperms. The low substitution rates and low homoplasy levels of the mitochondrial genes relative to the chloroplast genes, as found in this study, make them particularly useful for reconstructing ancient phylogenetic relationships. A mitochondrial gene-based angiosperm phylogeny provides an independent and essential reference for comparison with hypotheses of angiosperm phylogeny based on chloroplast genes, nuclear genes, and non-molecular data to reconstruct the underlying organismal phylogeny.
M. Ananda Chitra
Background: Staphylococcus pseudintermedius (SP) is the major pathogenic species of dogs involved in a wide variety of skin and soft tissue infections. The accessory gene regulator (agr) locus of Staphylococcus aureus has been extensively studied, and it influences the expression of many virulence genes. It encodes a two-component signal transduction system that leads to down-regulation of surface proteins and up-regulation of secreted proteins during in vitro growth of S. aureus. The objective of this study was to detect agrA, B, and D of SP isolated from canine skin infections and to analyze their sequences. Materials and Methods: In this study, we have isolated and identified SP from canine pyoderma and otitis cases by polymerase chain reaction (PCR) and confirmed by PCR-restriction fragment length polymorphism. Primers for SP agrA and agrBD genes were designed using online primer designing software and checked for specificity by BLAST search. Amplification of the agr genes was carried out for 53 isolates of SP by PCR, and sequencing of agrA, B, and D was carried out for five isolates and analyzed using DNAstar and Mega5.2 software. Results: A total of 53 (59%) SP isolates were obtained from 90 samples. 15 isolates (28%) were confirmed to be methicillin-resistant SP (MRSP) with the detection of the mecA gene. Accessory gene regulator A, B, and D genes were detected in all the SP isolates. Complete nucleotide sequences of the above three genes for five isolates were submitted to GenBank, and their accession numbers are from KJ133557 to KJ133571. AgrA amino acid sequence analysis showed that it is mainly made of alpha-helices and is hydrophilic in nature. AgrB is a transmembrane protein, and AgrD encodes the precursor of the autoinducing peptide (AIP). Sequencing of the agrD gene revealed that the 5 canine SP strains tested could be divided into three Agr specificity groups (RIPTSTGFF, KIPTSTGFF, and RIPISTGFF) based on the putative AIP produced by each strain
Poinar, Hendrik; Kuch, Melanie; McDonald, Gregory; Martin, Paul; Pääbo, Svante
The determination of nuclear DNA sequences from ancient remains would open many novel opportunities such as the resolution of phylogenies, the sexing of hominid and animal remains, and the characterization of genes involved in phenotypic traits. However, to date, single-copy nuclear DNA sequences from fossils have been determined only from bones and teeth of woolly mammoths preserved in the permafrost. Since the best preserved ancient nucleic acids tend to stem from cold environments, this has led to the assumption that nuclear DNA would be retrievable only from frozen remains. We have previously shown that Pleistocene coprolites stemming from the extinct Shasta sloth (Nothrotheriops shastensis, Megatheriidae) contain mitochondrial (mt) DNA from the animal that produced them as well as chloroplast (cp) DNA from the ingested plants. Recent attempts to resolve the phylogeny of two families of extinct sloths by using strictly mitochondrial DNA have been inconclusive. We have prepared DNA extracts from a ground sloth coprolite from Gypsum Cave, Nevada, and quantitated the number of mtDNA copies for three different fragment lengths by using real-time PCR. We amplified one multicopy and three single-copy nuclear gene fragments and used the concatenated sequence to resolve the phylogeny. These results show that ancient single-copy nuclear DNA can be recovered from warm, arid climates. Thus, nuclear DNA preservation is not restricted to cold climates.
ZHAO Yan; WANG Jun-wei; MA Bo; ZHAO Xiao-yan
In this paper, a 1,860 bp sequence in the IRs region of duck enteritis virus (DEV) was amplified by single oligonucleotide nested PCR with a single primer designed according to a partial sequence of US1, and then a pair of primers designed according to the 3' UTR of the US8 gene and the 5' end of the newly obtained sequence was used to amplify a 2,426 bp sequence toward the TRs region. Sequence analysis revealed that both sequences contained an identical 990 bp open reading frame of the DEV US1 gene. The two ORFs were in opposite transcription orientations. Comparison of the nucleotide sequence and the deduced amino acid sequence of the US1 gene showed relatively high identity to Mardivirus. Phylogenetic tree analysis showed that the eleven herpesviruses were classified into three groups, and that duck enteritis virus was most closely related to Mardivirus.
Naik, Subhashchandra; Haque, Inamul; Degner, Nick; Kornilayev, Boris; Bomhoff, Gregory; Hodges, Jacob; Khorassani, Ara-Azad; Katayama, Hiroo; Morris, Jill; Kelly, Jeffery; Seed, John; Fisher, Mark T.
Over the past five years, it has become increasingly apparent to researchers that the initial promise and excitement of using gene replacement therapies to ameliorate folding diseases are still far from being broadly or easily applicable. Because a large number of human diseases are protein folding diseases (~30 to 50%), many researchers now realize that more directed approaches to target and reverse the fundamental misfolding reactions preceding disease are highly feasible and offer the potential of developing more targeted drug therapies. This is also true with a large number of so called “orphan protein folding diseases”. The development of a broad-based general screening array method using the chaperonin as a detection platform will enable us to screen large chemical combinatorial libraries for specific ligands against the elusive transient, primary reactions that often lead to protein misfolding. This development will provide a highly desirable tool for the pharmaceutical, academic and medical professions. PMID:19802819
Xu, Shi Xia; Ren, Wen Hua; Li, Shu Zhen; Wei, Fu Wen; Zhou, Kai Ya; Yang, Guang
Sequence variability at three major histocompatibility complex (MHC) genes (DQB, DRA, and MHC-I) of cetaceans was investigated in order to get an overall understanding of cetacean MHC evolution. Little sequence variation was detected at the DRA locus, while extensive variability was found at the MHC-I and DQB loci. Phylogenetic reconstruction and sequence comparison revealed extensive sharing of identical MHC alleles among different species at the three MHC loci examined. Comparisons of phylogenetic trees for these MHC loci with the trees reconstructed only from non-PBR sites revealed that allelic similarity/identity possibly reflected common ancestry and was not due to adaptive convergence. At the same time, trans-species evolution was also evident, in that the allelic diversity of the three MHC loci clearly pre-dated species divergence events according to the relaxed molecular clock. Balancing selection may be the force maintaining the high sequence variability and the identical alleles in a trans-specific manner at the MHC-I and DQB loci.
The mining of periodic patterns in dengue databases is an interesting research problem, and its results can be used for predicting the future evolution of dengue viruses. In this paper, we propose an algorithm called Recurrence Finder (RECFIN) that uses a suffix tree to detect periodic patterns in dengue gene sequences. RECFIN also detects palindromes, which indicate the possible formation of proteins. Further, this paper computes the periodicity of nucleic acid and amino acid sequences of any length. Periodicity-based association rules are used to diagnose the type of dengue. The time complexity of the proposed algorithm is O(n^2). We demonstrate the effectiveness of the proposed approach by comparing experimental results on a dataset of dengue virus serotypes with those of the NCBI-BLAST algorithm.
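The two notions named in the abstract above, periodicity and palindromes, can be illustrated with a small brute-force sketch. This is not RECFIN and it does not use a suffix tree. The function names are hypothetical, and a "palindrome" is assumed here to mean a DNA stretch equal to its own reverse complement, which the abstract does not specify.

```python
from typing import List, Tuple

# Watson-Crick complements for an unambiguous DNA alphabet (assumption: no N or IUPAC codes).
_COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq))


def find_palindromes(seq: str, min_len: int = 4) -> List[Tuple[int, str]]:
    """Brute-force scan: report (start, substring) for every substring of
    length >= min_len that equals its own reverse complement."""
    hits = []
    n = len(seq)
    for start in range(n):
        for end in range(start + min_len, n + 1):
            sub = seq[start:end]
            if sub == reverse_complement(sub):
                hits.append((start, sub))
    return hits


def smallest_period(seq: str) -> int:
    """Return the smallest p such that seq[i] == seq[i + p] for every valid i.
    A sequence with no shorter repeat has period len(seq)."""
    n = len(seq)
    for p in range(1, n + 1):
        if all(seq[i] == seq[i + p] for i in range(n - p)):
            return p
    return n
```

For example, `find_palindromes("GAATTC")` reports the EcoRI site itself and the inner `AATT`, and `smallest_period("ACGACGACG")` returns 3. A suffix-tree approach such as the one the abstract describes would avoid rescanning substrings.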
Olson, Nathan D; Lund, Steven P; Zook, Justin M; Rojas-Cornejo, Fabiola; Beck, Brian; Foy, Carole; Huggett, Jim; Whale, Alexandra S; Sui, Zhiwei; Baoutina, Anna; Dobeson, Michael; Partis, Lina; Morrow, Jayne B
This study presents the results from an interlaboratory sequencing study for which we developed a novel high-resolution method for comparing data from different sequencing platforms for a multi-copy, paralogous gene. The combination of PCR amplification and 16S ribosomal RNA gene (16S rRNA) sequencing has revolutionized bacteriology by enabling rapid identification, frequently without the need for culture. To assess variability between laboratories in sequencing 16S rRNA, six laboratories sequenced the gene encoding the 16S rRNA from Escherichia coli O157:H7 strain EDL933 and Listeria monocytogenes serovar 4b strain NCTC11994. Participants performed sequencing methods and protocols available in their laboratories: Sanger sequencing, Roche 454 pyrosequencing(®), or Ion Torrent PGM(®). The sequencing data were evaluated on three levels: (1) identity of biologically conserved position, (2) ratio of 16S rRNA gene copies featuring identified variants, and (3) the collection of variant combinations in a set of 16S rRNA gene copies. The same set of biologically conserved positions was identified for each sequencing method. Analytical methods using Bayesian and maximum likelihood statistics were developed to estimate variant copy ratios, which describe the ratio of nucleotides at each identified biologically variable position, as well as the likely set of variant combinations present in 16S rRNA gene copies. Our results indicate that estimated variant copy ratios at biologically variable positions were only reproducible for high throughput sequencing methods. Furthermore, the likely variant combination set was only reproducible with increased sequencing depth and longer read lengths. We also demonstrate novel methods for evaluating variable positions when comparing multi-copy gene sequence data from multiple laboratories generated using multiple sequencing technologies.
Yagi, Ryoichi; Miyamoto, Ryosuke; Morino, Hiroyuki; Izumi, Yuishin; Kuramochi, Masahito; Kurashige, Takashi; Maruyama, Hirofumi; Mizuno, Noriyoshi; Kurihara, Hidemi; Kawakami, Hideshi
Alzheimer's disease (AD) is the most common form of dementia. To date, several genes have been identified as the cause of AD, including PSEN1, PSEN2, and APP. The association between APOE and late-onset AD has also been reported. We here used a bench top next-generation sequencer, which uses an integrated semiconductor device, detects hydrogen ions, and operates at a high-speed using nonoptical technology. We examined 45 Japanese AD patients with positive family histories, and 29 sporadic patients with early onset (useful for detecting genetic variations in familial AD.
Nierychlo, Marta; Larsen, Poul; Jørgensen, Mads Koustrup
16S rRNA gene amplicon sequencing has been developed over the past few years and is now ready to use for more comprehensive studies related to plant operation and optimization thanks to short analysis time, low cost, high throughput, and high taxonomic resolution. In this study we show how 16S r...... to the presence of filamentous microorganisms was monitored weekly over 4 months. Microthrix was identified as a causative filament and suitable control measures were introduced. The level of Microthrix was reduced after 1-2 months but a number of other filamentous species were still present, with most of them...
Background: An important consideration when analyzing both microarray and quantitative PCR expression data is the selection of appropriate genes as endogenous controls or reference genes. This step is especially critical when identifying genes differentially expressed between datasets. Moreover, reference genes suitable in one context (e.g. lung cancer) may not be suitable in another (e.g. breast cancer). Currently, the main approach to identify reference genes involves the mining of expression microarray data for highly expressed and relatively constant transcripts across a sample set. A caveat here is the requirement for transcript normalization prior to analysis, and measurements obtained are relative, not absolute. Alternatively, as sequencing-based technologies provide digital quantitative output, absolute quantification ensues, and reference gene identification becomes more accurate. Methods: Serial analysis of gene expression (SAGE) profiles of non-malignant and malignant lung samples were compared using a permutation test to identify the most stably expressed genes across all samples. Subsequently, the specificity of the reference genes was evaluated across multiple tissue types, their constancy of expression was assessed using quantitative RT-PCR (qPCR), and their impact on differential expression analysis of microarray data was evaluated. Results: We show that (i) conventional reference genes such as ACTB and GAPDH are highly variable between cancerous and non-cancerous samples, (ii) reference genes identified for lung cancer do not perform well for other cancer types (breast and brain), (iii) reference genes identified through SAGE show low variability using qPCR in a different cohort of samples, and (iv) normalization of a lung cancer gene expression microarray dataset with or without our reference genes yields different results for differential gene expression and subsequent analyses. Specifically, key established pathways in lung
Júlio C. Borges
Some newly synthesized proteins require the assistance of molecular chaperones for their correct folding. Chaperones are also involved in the dissolution of protein aggregates, which makes their study significant for both biotechnology and medicine; identifying chaperones and stress-related protein sequences in different organisms is therefore an important task. We used bioinformatic tools to investigate the information generated by the Sugarcane Expressed Sequence Tag (SUCEST) genome project in order to identify and annotate molecular chaperones. We considered that SUCEST sequences belonged to this category of proteins when their E-values were lower than 1.0e-05. Our annotation shows that 4,164 of the 5’ expressed sequence tag (EST) sequences were homologous to molecular chaperones, nearly 1.8% of all the 5’ ESTs sequenced during the SUCEST project. About 43% of the chaperones we found were Hsp70 chaperones and their co-chaperones, 10% were Hsp90 chaperones and 13% were peptidyl-prolyl cis-trans isomerases. Based on the annotation results we predicted 156 different chaperone gene subclasses in the sugarcane genome. Taken together, our results indicate that genes encoding chaperones are diverse and abundantly expressed in sugarcane cells, which emphasizes their biological importance.
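The selection rule stated in the abstract above (a sequence counts as a chaperone homolog when its E-value is lower than 1.0e-05) can be sketched as a simple filter. The `Hit` record, the function names, and the sample identifiers are hypothetical and stand in for whatever similarity-search output the authors actually parsed; only the cutoff value comes from the abstract.

```python
from typing import Iterable, List, NamedTuple


class Hit(NamedTuple):
    """One similarity-search hit (hypothetical record layout)."""
    query_id: str    # EST identifier
    subject_id: str  # matched chaperone sequence
    evalue: float    # expectation value reported by the search tool


E_VALUE_CUTOFF = 1.0e-05  # threshold quoted in the abstract


def select_homologs(hits: Iterable[Hit], cutoff: float = E_VALUE_CUTOFF) -> List[Hit]:
    """Keep only hits whose E-value is strictly lower than the cutoff."""
    return [hit for hit in hits if hit.evalue < cutoff]


def fraction_annotated(hits: Iterable[Hit], total_queries: int,
                       cutoff: float = E_VALUE_CUTOFF) -> float:
    """Fraction of all query sequences with at least one hit below the cutoff."""
    annotated = {hit.query_id for hit in select_homologs(hits, cutoff)}
    return len(annotated) / total_queries
```

Counting distinct query identifiers rather than hits keeps an EST with several chaperone matches from being counted more than once, which is the kind of figure the abstract's "nearly 1.8% of all the 5’ ESTs" describes.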
Andrew M. Lynn; Chakresh Kumar Jain; K. Kosalai; Pranjan Barman; Nupur Thakur; Harish Batra; Alok Bhattacharya
Genomic sequence data are often available well before the annotated sequence is published. We present a method for analysis of genomic DNA to identify coding sequences using the GeneScan algorithm and characterize these resultant sequences by BLAST. The routines are used to develop a system for automated annotation of genome DNA sequences.
Xue, Yuan; Ankala, Arunkanth; Wilcox, William R; Hegde, Madhuri R
Next-generation sequencing is changing the paradigm of clinical genetic testing. Today there are numerous molecular tests available, including single-gene tests, gene panels, and exome sequencing or genome sequencing. As a result, ordering physicians face the conundrum of selecting the best diagnostic tool for their patients with genetic conditions. Single-gene testing is often most appropriate for conditions with distinctive clinical features and minimal locus heterogeneity. Next-generation sequencing-based gene panel testing, which can be complemented with array comparative genomic hybridization and other ancillary methods, provides a comprehensive and feasible approach for heterogeneous disorders. Exome sequencing and genome sequencing have the advantage of being unbiased regarding what set of genes is analyzed, enabling parallel interrogation of most of the genes in the human genome. However, current limitations of next-generation sequencing technology and our variant interpretation capabilities caution us against offering exome sequencing or genome sequencing as either stand-alone or first-choice diagnostic approaches. A growing interest in personalized medicine calls for the application of genome sequencing in clinical diagnostics, but major challenges must be addressed before its full potential can be realized. Here, we propose a testing algorithm to help clinicians opt for the most appropriate molecular diagnostic tool for each scenario.
Zhang, Yijuan; Akintola, Oluwafemi S; Liu, Ken J A; Sun, Bingyun
Microarray (MA) and high-throughput sequencing are two commonly used detection systems for global gene expression profiling. Although these two systems are frequently used in parallel, the differences in their final results have not been examined thoroughly. Transcriptomic analysis of housekeeping (HK) genes provides a unique opportunity to reliably examine the technical difference between these two systems. We investigated here the structure, genome location, expression quantity, microarray probe coverage, as well as biological functions of differentially identified human HK genes by 9 MA and 6 sequencing studies. These in-depth analyses allowed us to discover, for the first time, a subset of transcripts encoding membrane, cell surface and nuclear proteins that were prone to differential identification by the two platforms. We hope that the discovery can aid the future development of these technologies for comprehensive transcriptomic studies. Copyright © 2015 Elsevier B.V. All rights reserved.
Background: Although the extent of horizontal gene transfer (HGT) in complete genomes has been widely studied, its influence in the evolution of natural communities of prokaryotes remains unknown. The availability of metagenomic sequences allows us to address the study of global patterns of prokaryotic evolution in samples from natural communities. However, the methods that have been commonly used for the study of HGT are not suitable for metagenomic samples. Therefore it is important to develop new methods or to adapt existing ones to be used with metagenomic sequences. Results: We have created two different methods that are suitable for the study of HGT in metagenomic samples. The methods are based on phylogenetic and DNA compositional approaches, and have allowed us to assess the extent of possible HGT events in metagenomes for the first time. The methods are shown to be compatible and quite precise, although they probably underestimate the number of possible events. Our results show that the phylogenetic method detects HGT in between 0.8% and 1.5% of the sequences, while DNA compositional methods identify putative HGT in between 2% and 8% of the sequences. These ranges are very similar to those found in complete genomes by related approaches. The two methods differ in sensitivity, since they probably target HGT events of different ages: the compositional method mostly identifies recent transfers, while the phylogenetic method is more suitable for detecting older events. Nevertheless, the study of the number of HGT events in metagenomic sequences from different communities shows a consistent trend for both methods: the lowest amount is found for the sequences of the Sargasso Sea metagenome, while the highest is found in the whale fall metagenome from the bottom of the ocean. The significance of these observations is discussed. Conclusion: The computational approaches that are used to find possible HGT events in complete
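The idea behind a DNA compositional approach, as mentioned in the abstract above, is that recently transferred DNA may still carry the base composition of its donor. The sketch below illustrates that idea with the simplest possible signal, GC content, and a z-score cutoff. It is not the authors' method: the function names, the choice of GC content, and the cutoff of 2.0 are all assumptions made for illustration.

```python
from statistics import mean, pstdev
from typing import Dict, List


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def flag_atypical(seqs: Dict[str, str], z_cutoff: float = 2.0) -> List[str]:
    """Return the names of sequences whose GC content lies more than
    z_cutoff population standard deviations from the mean of the set."""
    values = {name: gc_content(seq) for name, seq in seqs.items()}
    observed = list(values.values())
    mu = mean(observed)
    sigma = pstdev(observed)
    if sigma == 0:
        return []  # every sequence has the same composition; nothing stands out
    return [name for name, gc in values.items() if abs(gc - mu) / sigma > z_cutoff]
```

Real compositional methods typically compare richer signatures such as oligonucleotide frequencies or codon usage, and in a metagenome the background must be defined per taxon rather than over the whole sample, so flagged sequences here would only be candidates for closer inspection.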
Yilmaz, Pelin; Kottmann, Renzo; Field, Dawn
Here we present a standard developed by the Genomic Standards Consortium (GSC) for reporting marker gene sequences--the minimum information about a marker gene sequence (MIMARKS). We also introduce a system for describing the environment from which a biological sample originates. The 'environmental...... present the minimum information about any (x) sequence (MIxS). Adoption of MIxS will enhance our ability to analyze natural genetic diversity documented by massive DNA sequencing efforts from myriad ecosystems in our ever-changing biosphere....
Ferrario, Chiara; Borgo, Francesca; de Las Rivas, Blanca; Muñoz, Rosario; Ricci, Giovanni; Fortina, Maria Grazia
The histidine decarboxylase gene cluster of Morganella morganii DSM30146(T) was sequenced, and four open reading frames, named hdcT1, hdc, hdcT2, and hisRS, were identified. Two putative histidine/histamine antiporters (hdcT1 and hdcT2) were located upstream and downstream of the hdc gene, which encodes a pyridoxal-P-dependent histidine decarboxylase, and were followed by the hisRS gene encoding a histidyl-tRNA synthetase. This organization was comparable to the gene cluster of other known Gram-negative bacteria, particularly that of Klebsiella oxytoca. Recombinant Escherichia coli strains harboring plasmids carrying the M. morganii hdc gene were shown to overproduce histidine decarboxylase after IPTG induction at 37 °C for 4 h. Quantitative RT-PCR experiments revealed that the hdc and hisRS genes were highly induced under acidic and histidine-rich conditions. This work represents the first description and identification of the hdc-related genes in M. morganii. The results support the hypothesis that the histidine decarboxylation reaction in this prolific histamine-producing species may play a role in acid survival. Knowledge of the role and regulation of the genes involved in histidine decarboxylation should improve the design of rational strategies to avoid toxic histamine production in foods.
Barbier, P; Ishihama, A
Variation in the DNA sequence of the 10 kDa prolamin gene family within the wild rice species Oryza rufipogon was probed using the direct sequencing of PCR-amplified genes. A comparison of the nucleotide and deduced amino-acid sequences of eight Asian strains of O. rufipogon and one strain of the related African species O. longistaminata is presented.
Hemoglobin-y gene of channel catfish, Ictalurus punctatus, was cloned and sequenced. Total RNA from head kidneys was isolated, reverse transcribed and amplified. The sequence of the channel catfish hemoglobin-y gene consists of 600 nucleotides. Analysis of the nucleotide sequence reveals one o...
Ji, Yan-shan; Johnson, Betty H; Webb, M Scott; Thompson, E Brad
DBD* is a novel gene encoding an 89 amino acid peptide that is constitutively lethal to leukemic cells. DBD* was derived from the DNA binding domain of the human glucocorticoid receptor by a frameshift that replaces the final 21 C-terminal amino acids of the domain. Previous studies suggested that DBD* no longer acted as the natural DNA binding domain. To confirm and extend these results, we mutated DBD* in 29 single amino acid positions, critical for the function in the native domain or of possible functional significance in the novel 21 amino acid C-terminal sequence. Steroid-resistant leukemic ICR-27-4 cells were transiently transfected by electroporation with each of the 29 mutants. Cell kill was evaluated by trypan blue dye exclusion, a WST-1 tetrazolium-based assay for cell respiration, propidium iodide exclusion, and Hoechst 33258 staining of chromatin. Eleven of the 29 point mutants increased, whereas four decreased antileukemic activity. The remainder had no effect on activity. The nonconcordances between these effects and native DNA binding domain function strongly suggest that the lethality of DBD* is distinct from that of the glucocorticoid receptor. Transfections of fragments of DBD* showed that optimal activity localized to the sequence for its C-terminal 32 amino acids.
Rudd, K E
The EcoGene database provides a set of gene and protein sequences derived from the genome sequence of Escherichia coli K-12. EcoGene is a source of re-annotated sequences for the SWISS-PROT and Colibri databases. EcoGene is used for genetic and physical map compilations in collaboration with the Coli Genetic Stock Center. The EcoGene12 release includes 4293 genes. EcoGene12 differs from the GenBank annotation of the complete genome sequence in several ways, including (i) the revision of 706 predicted or confirmed gene start sites, (ii) the correction or hypothetical reconstruction of 61 frame-shifts caused by either sequence error or mutation, (iii) the reconstruction of 14 protein sequences interrupted by the insertion of IS elements, and (iv) predictions that 92 genes are partially deleted gene fragments. A literature survey identified 717 proteins whose N-terminal amino acids have been verified by sequencing. 12 446 cross-references to 6835 literature citations and s are provided. EcoGene is accessible at a new website: http://bmb.med.miami.edu/EcoGene/EcoWeb. Users can search and retrieve individual EcoGene GenePages or they can download large datasets for incorporation into database management systems, facilitating various genome-scale computational and functional analyses.
Schloss, Patrick D; Jenior, Matthew L; Koumpouras, Charles C; Westcott, Sarah L; Highlander, Sarah K
Over the past 10 years, microbial ecologists have largely abandoned sequencing 16S rRNA genes by the Sanger sequencing method and have instead adopted highly parallelized sequencing platforms. These new platforms, such as 454 and Illumina's MiSeq, have allowed researchers to obtain millions of high quality but short sequences. The result of the added sequencing depth has been significant improvements in experimental design. The tradeoff has been the decline in the number of full-length reference sequences that are deposited into databases. To overcome this problem, we tested the ability of the PacBio Single Molecule, Real-Time (SMRT) DNA sequencing platform to generate sequence reads from the 16S rRNA gene. We generated sequencing data from the V4, V3-V5, V1-V3, V1-V5, V1-V6, and V1-V9 variable regions from within the 16S rRNA gene using DNA from a synthetic mock community and natural samples collected from human feces, mouse feces, and soil. The mock community allowed us to assess the actual sequencing error rate and how that error rate changed when different curation methods were applied. We developed a simple method based on sequence characteristics and quality scores to reduce the observed error rate for the V1-V9 region from 0.69 to 0.027%. This error rate is comparable to what has been observed for the shorter reads generated by 454 and Illumina's MiSeq sequencing platforms. Although the per base sequencing cost is still significantly more than that of MiSeq, the prospect of supplementing reference databases with full-length sequences from organisms below the limit of detection from the Sanger approach is exciting.
Calò, Valentina; Bruno, Loredana; Paglia, Laura La; Perez, Marco; Margarese, Naomi [Department of Surgery and Oncology, Regional Reference Center for the Biomolecular Characterization and Genetic Screening of Hereditary Tumors, University of Palermo, Via del Vespro 127, 90127 Palermo (Italy); Gaudio, Francesca Di [Department of Medical Biotechnologies and Legal Medicine, University of Palermo, Palermo (Italy); Russo, Antonio, E-mail: email@example.com [Department of Surgery and Oncology, Regional Reference Center for the Biomolecular Characterization and Genetic Screening of Hereditary Tumors, University of Palermo, Via del Vespro 127, 90127 Palermo (Italy)
Germline mutations in BRCA1/2 genes are responsible for a large proportion of hereditary breast and/or ovarian cancers. Many highly penetrant predisposition alleles have been identified and include frameshift or nonsense mutations that lead to the translation of a truncated protein. Other alleles contain missense mutations, which result in amino acid substitution and intronic variants with splicing effect. The discovery of variants of uncertain/unclassified significance (VUS) is a result that can complicate rather than improve the risk assessment process. VUSs are mainly missense mutations, but also include a number of intronic variants and in-frame deletions and insertions. Over 2,000 unique BRCA1 and BRCA2 missense variants have been identified, located throughout the whole gene (Breast Cancer Information Core Database (BIC database)). Up to 10–20% of the BRCA tests report the identification of a variant of uncertain significance. There are many methods to discriminate deleterious/high-risk from neutral/low-risk unclassified variants (i.e., analysis of the cosegregation in families of the VUS, measure of the influence of the VUSs on the wild-type protein activity, comparison of sequence conservation across multiple species), but only an integrated analysis of these methods can contribute to a real interpretation of the functional and clinical role of the discussed variants. The aim of our manuscript is to review the studies on BRCA VUS in order to clarify their clinical relevance.
Degefu, Y.; Paulin, L.; Lübeck, Peter Stephensen
A gene encoding an endoxylanase from the phytopathogenic fungus Helminthosporium turcicum Pass. was cloned and sequenced. The entire nucleotide sequence of a 1991 bp genomic fragment containing an endoxylanase gene was determined. The xylanase gene of 795 bp, interrupted by two introns of 52 and ...
Brahmachari Samir K
Background: Poly purine.pyrimidine sequences have the potential to adopt intramolecular triplex structures and are overrepresented upstream of genes in eukaryotes. These sequences may regulate gene expression by modulating the interaction of transcription factors with DNA sequences upstream of genes. Results: A poly purine.pyrimidine sequence with the potential to adopt an intramolecular triplex DNA structure was designed. The sequence was inserted within a nucleosome positioned upstream of the β-galactosidase gene in yeast, Saccharomyces cerevisiae, between the cyc1 promoter and the gal10 Upstream Activating Sequences (UASg). Upon derepression with galactose, β-galactosidase gene expression is reduced 12-fold in cells carrying single-copy poly purine.pyrimidine sequences. This reduction in expression is correlated with reduced transcription. Furthermore, we show that plasmids carrying a poly purine.pyrimidine sequence are not specifically lost from yeast cells. Conclusion: We propose that a poly purine.pyrimidine sequence upstream of a gene affects transcription. Plasmids carrying this sequence are not specifically lost from cells and thus no additional effort is needed for the replication of these sequences in eukaryotic cells.
Moses M Muraya
A major goal of maize genomic research is to identify sequence polymorphisms responsible for phenotypic variation in traits of economic importance. Large-scale detection of sequence variation is critical for linking genes, or genomic regions, to phenotypes. However, due to its size and complexity, it remains expensive to generate whole genome sequences of sufficient coverage for divergent maize lines, even with access to next generation sequencing (NGS) technology. Because methods involving reduction of genome complexity, such as genotyping-by-sequencing (GBS), assess only a limited fraction of sequence variation, targeted sequencing of selected genomic loci offers an attractive alternative. We therefore designed a sequence capture assay to target 29 Mb of genomic regions and surveyed a total of 4,648 genes possibly affecting biomass production in 21 diverse inbred maize lines (7 flints, 14 dents). Captured and enriched genomic DNA was sequenced using the 454 NGS platform to 19.6-fold average depth coverage, and a broad evaluation of read alignment and variant calling methods was performed to select optimal procedures for variant discovery. Sequence alignment with the B73 reference and de novo assembly identified 383,145 putative single nucleotide polymorphisms (SNPs), of which 42,685 were non-synonymous alterations and 7,139 caused frameshifts. Presence/absence variation (PAV) of genes was also detected. We found that substantial sequence variation exists among genomic regions targeted in this study, which was particularly evident within coding regions. This diversification has the potential to broaden functional diversity and generate phenotypic variation that may lead to new adaptations and the modification of important agronomic traits. Further, annotated SNPs identified here will serve as useful genetic tools and as candidates in searches for phenotype-altering DNA variation. In summary, we demonstrated that sequencing of captured DNA is a powerful
Mizushima, T.; Matsuo, M.; Sekimizu, K
Various fluoroquinolones (norfloxacin, enoxacin, ofloxacin, levofloxacin, and sparfloxacin) induce DnaK and GroEL heat shock proteins in Escherichia coli. The induction is transient, consistent with the kinetics of cellular DNA relaxation. The concentrations of fluoroquinolones required for induction are similar to those required for DNA relaxation and much higher than those required for cell death.
Libich, David S.; Tugarinov, Vitali; Clore, G. Marius
The prototypical chaperonin GroEL assists protein folding through an ATP-dependent encapsulation mechanism. The details of how GroEL folds proteins remain elusive, particularly because encapsulation is not an absolute requirement for successful re/folding. Here we make use of a metastable model protein substrate, comprising a triple mutant of Fyn SH3, to directly demonstrate, by simultaneous analysis of three complementary NMR-based relaxation experiments (lifetime line broadening, dark state exchange saturation transfer, and Carr–Purcell–Meiboom–Gill relaxation dispersion), that apo GroEL accelerates the overall interconversion rate between the native state and a well-defined folding intermediate by about 20-fold, under conditions where the “invisible” GroEL-bound states have occupancies below 1%. This is largely achieved through a 500-fold acceleration in the folded-to-intermediate transition of the protein substrate. Catalysis is modulated by a kinetic deuterium isotope effect that reduces the overall interconversion rate between the GroEL-bound species by about 3-fold, indicative of a significant hydrophobic contribution. The location of the GroEL binding site on the folding intermediate, mapped from 15N, 1HN, and 13Cmethyl relaxation dispersion experiments, is composed of a prominent, surface-exposed hydrophobic patch. PMID:26124125
Grauers, Anna; Wang, Jingwen; Einarsdottir, Elisabet; Simony, Ane; Danielsson, Aina; Åkesson, Kristina; Ohlin, Acke; Halldin, Klas; Grabowski, Pawel; Tenne, Max; Laivuori, Hannele; Dahlman, Ingrid; Andersen, Mikkel; Christensen, Steen Bach; Karlsson, Magnus K; Jiao, Hong; Kere, Juha; Gerdhem, Paul
Idiopathic scoliosis is a spinal deformity affecting approximately 3% of otherwise healthy children or adolescents. The etiology is still largely unknown but has an important genetic component. Genome-wide association studies have identified a number of common genetic variants that are significantly associated with idiopathic scoliosis in Asian and Caucasian populations, rs11190870 close to the LBX1 gene being the most replicated finding. The aim of the present study was to investigate the genetics of idiopathic scoliosis in a Scandinavian cohort by performing a candidate gene study of four variants previously shown to be associated with idiopathic scoliosis and exome sequencing of idiopathic scoliosis patients with a severe phenotype to identify possible novel scoliosis risk variants. This was a case control study. A total of 1,739 patients with idiopathic scoliosis and 1,812 controls were included. The outcome measure was idiopathic scoliosis. The variants rs10510181, rs11190870, rs12946942, and rs6570507 were genotyped in 1,739 patients with idiopathic scoliosis and 1,812 controls. Exome sequencing was performed on pooled samples from 100 surgically treated idiopathic scoliosis patients. Novel or rare missense, nonsense, or splice site variants were selected for individual genotyping in the 1,739 cases and 1,812 controls. In addition, the 5'UTR, noncoding exon and promoter regions of LBX1, not covered by exome sequencing, were Sanger sequenced in the 100 pooled samples. Of the four candidate genes, an intergenic variant, rs11190870, downstream of the LBX1 gene, showed a highly significant association to idiopathic scoliosis in 1,739 cases and 1,812 controls (p = 7.0×10^-18). We identified 20 novel variants by exome sequencing after filtration and an initial genotyping validation. However, we could not verify any association to idiopathic scoliosis in the large cohort of 1,739 cases and 1,812 controls. We did not find any variants in the 5'UTR, noncoding exon and
Bull, Peter C; Buckee, Caroline O; Kyes, Sue; Kortok, Moses M; Thathy, Vandana; Guyah, Bernard; Stoute, José A; Newbold, Chris I; Marsh, Kevin
Plasmodium falciparum erythrocyte membrane protein 1 (PfEMP1) is a potentially important family of immune targets, encoded by an extremely diverse gene family called var. Understanding of the genetic organization of var genes is hampered by sequence mosaicism that results from a long history of non-homologous recombination. Here we have used software designed to analyse social networks to visualize the relationships between large collections of short var sequence tags sampled from clinical parasite isolates. In this approach, two sequences are connected if they share one or more highly polymorphic sequence blocks. The results show that the majority of analysed sequences, including several var-like sequences from the chimpanzee parasite Plasmodium reichenowi, can be either directly or indirectly linked together in a single unbroken network. However, the network is highly structured and contains putative subgroups of recombining sequences. The major subgroup contains the previously described group A var genes, previously proposed to be genetically distinct. Another subgroup contains sequences found to be associated with rosetting, a parasite virulence phenotype. The mosaic structure of the sequences and their division into subgroups may reflect the conflicting problems of maximizing antigenic diversity and minimizing epitope sharing between variants while maintaining their host cell binding functions.
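The linking rule described in this abstract (two sequences are connected if they share at least one polymorphic block) can be illustrated with a minimal Python sketch. This is not the social-network software used in the study; the function names `build_network` and `connected_components`, and the idea of representing each sequence tag as a set of block identifiers, are assumptions made only for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_network(tag_blocks):
    """Build an undirected graph from {sequence name: set of block ids}.

    Two sequences get an edge when they share one or more blocks.
    (Hypothetical data layout, assumed for this sketch.)
    """
    members_by_block = defaultdict(set)
    for name, blocks in tag_blocks.items():
        for block in blocks:
            members_by_block[block].add(name)

    adjacency = {name: set() for name in tag_blocks}
    for members in members_by_block.values():
        for a, b in combinations(sorted(members), 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def connected_components(adjacency):
    """Return the groups of sequences that are directly or indirectly linked."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components
```

With toy input such as `{"s1": {"A", "B"}, "s2": {"B", "C"}, "s3": {"C"}, "s4": {"D"}}`, sequences s1 and s3 share no block directly but end up in the same component through s2, which is the "indirectly linked" case the abstract mentions.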
Hitte, C; Madeoy, J; Kirkness, EF; Priat, C; Lorentzen, TD; Senger, F; Thomas, D; Derrien, T; Ramirez, C; Scott, C; Evanno, G; Pullar, B; Cadieu, E; Oza; Lourgant, K; Jaffe, DB; Tacher, S; Dreano, S; Berkova, N; Andre, C; Deloukas, P; Fraser, C; Lindblad-Toh, K; Ostrander, EA; Galibert, F
Accurate and comprehensive sequence coverage for large genomes has been restricted to only a few species of specific interest. Lower sequence coverage (survey sequencing) of related species can yield a wealth of information about gene content and putative regulatory elements. But survey sequences la
Zhao, Jinying; Zhu, Yun; Xiong, Momiao
The critical barrier in interaction analysis for next-generation sequencing (NGS) data is that the traditional pairwise interaction analysis that is suitable for common variants is difficult to apply to rare variants because of their prohibitive computational time, large number of tests and low power. The great challenges for successful detection of interactions with NGS data are (1) the demands in the paradigm of changes in interaction analysis; (2) severe multiple testing; and (3) heavy computations. To meet these challenges, we shift the paradigm of interaction analysis between two SNPs to interaction analysis between two genomic regions. In other words, we take a gene as a unit of analysis and use functional data analysis techniques as dimensional reduction tools to develop a novel statistic to collectively test interaction between all possible pairs of SNPs within two genome regions. By intensive simulations, we demonstrate that the functional logistic regression for interaction analysis has the correct type 1 error rates and higher power to detect interaction than the currently used methods. The proposed method was applied to a coronary artery disease dataset from the Wellcome Trust Case Control Consortium (WTCCC) study and the Framingham Heart Study (FHS) dataset, and the early-onset myocardial infarction (EOMI) exome sequence datasets with European origin from the NHLBI's Exome Sequencing Project. We discovered that 6 of 27 pairs of significantly interacted genes in the FHS were replicated in the independent WTCCC study and 24 pairs of significantly interacted genes after applying Bonferroni correction in the EOMI study.
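The multiple-testing burden mentioned in this abstract can be made concrete with a small Python sketch of the Bonferroni correction, which the abstract names for the EOMI analysis. The helper names `n_gene_pairs` and `bonferroni_significant`, the default `alpha=0.05`, and the dictionary layout of p-values are assumptions for illustration; the functional logistic regression statistic itself is not reproduced here.

```python
from math import comb


def n_gene_pairs(n_genes):
    """Number of unordered gene pairs, i.e. the number of pairwise tests."""
    return comb(n_genes, 2)


def bonferroni_significant(p_values, alpha=0.05):
    """Keep the tests whose p-value passes the Bonferroni threshold alpha / m.

    p_values maps a test label (here a gene pair) to its raw p-value;
    m is the number of tests performed.
    """
    m = len(p_values)
    if m == 0:
        return {}
    threshold = alpha / m
    return {label: p for label, p in p_values.items() if p <= threshold}
```

Testing gene pairs rather than SNP pairs shrinks m considerably, which is one way to read the abstract's argument for moving from SNP-level to region-level interaction tests.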
Among gene families it is the Hox genes and among metazoan animals it is the insects (Hexapoda) that have attracted particular attention for studying the evolution of development. Surprisingly though, no Hox genes have been isolated from 26 out of 35 insect orders yet, and the existing sequences derive mainly from only two orders (61% from Hymenoptera and 22% from Diptera). We have designed insect specific primers and isolated 37 new partial homeobox sequences of Hox cluster genes (lab, pb, Hox3, ftz, Antp, Scr, abd-a, Abd-B, Dfd, and Ubx) from six insect orders, which are crucial to insect phylogenetics. These new gene sequences provide a first step towards comparative Hox gene studies in insects. Furthermore, comparative distance analyses of homeobox sequences reveal a correlation between gene divergence rate and species radiation success with insects showing the highest rate of homeobox sequence evolution.
Deeley, M C; Yanofsky, C
The tryptophanase structural gene, tnaA, of Escherichia coli K-12 was cloned and sequenced. The size, amino acid composition, and sequence of the protein predicted from the nucleotide sequence agree with protein structure data previously acquired by others for the tryptophanase of E. coli B. Physiological data indicated that the region controlling expression of tnaA was present in the cloned segment. Sequence data suggested that a second structural gene of unknown function was located distal ...
Jackson Andrew P
BACKGROUND: The genome sequence of the protistan parasite Trypanosoma brucei contains many tandem gene arrays. Gene duplicates are created through tandem duplication and are expressed through polycistronic transcription, suggesting that the primary purpose of long, tandem arrays is to increase gene dosage in an environment where individual gene promoters are absent. This report presents the first account of the tandem gene arrays in the T. brucei genome, employing several related genome sequences to establish how variation is created and removed. RESULTS: A systematic survey of tandem gene arrays showed that substantial sequence variation existed across the genome; variation from different regions of an array often produced inconsistent phylogenetic affinities. Phylogenetic relationships of gene duplicates were consistent with concerted evolution being a widespread homogenising force. However, tandem duplicates were not usually identical; therefore, any homogenising effect was coincident with divergence among duplicates. Allelic gene conversion was detected using various criteria and was apparently able to both remove and introduce sequence variation. Tandem arrays containing structural heterogeneity demonstrated how sequence homogenisation and differentiation can occur within a single locus. CONCLUSION: The use of multiple genome sequences in a comparative analysis of tandem gene arrays identified substantial sequence variation among gene duplicates. The distribution of sequence variation is determined by a dynamic balance of conservative and innovative evolutionary forces. Gene trees from various species showed that intraspecific duplicates evolve in concert, perhaps through frequent gene conversion, although this does not prevent sequence divergence, especially where structural heterogeneity physically separates a duplicate from its neighbours. In describing dynamics of sequence variation that have consequences beyond gene dosage, this
Thompson, Cristiane C; Thompson, Fabiano L; Vicente, Ana Carolina P; Swings, Jean
We investigated the use of atpA gene sequences as alternative phylogenetic and identification markers for vibrios. A fragment of 1322 bp (corresponding to approximately 88% of the coding region) was analysed in 151 strains of vibrios. The relationships observed were in agreement with the phylogeny inferred from 16S rRNA gene sequence analysis. For instance, the Vibrio cholerae, Vibrio halioticoli, Vibrio harveyi and Vibrio splendidus species groups appeared in the atpA gene phylogenetic analyses, suggesting that these groups may be considered as separate genera within the current Vibrio genus. Overall, atpA gene sequences appeared to be more discriminatory for species differentiation than 16S rRNA gene sequences. 16S rRNA gene sequence similarities above 97% corresponded to atpA gene sequences similarities above 80%. The intraspecies variation in the atpA gene sequence was about 99% sequence similarity. The results showed clearly that atpA gene sequences are a suitable alternative for the identification and phylogenetic study of vibrios.
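The similarity thresholds quoted in this abstract (for example 97% for 16S rRNA and 80% for atpA) are percentages of matching positions between aligned sequences. A minimal Python sketch of such a calculation follows. It assumes the two sequences are already aligned to equal length with `-` as the gap character, and it counts a gap opposite a base as a mismatch; the study's actual alignment and scoring settings are not stated in the abstract, so both choices are assumptions.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored; a gap opposite a
    base counts as a mismatch (an assumption made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")

    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1

    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared
```

Applied to every pair of strains, a function like this yields the similarity matrix from which species-level cut-offs of the kind reported above can be read.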
Trobos, Margarita; Christensen, Henrik; Sunde, Marianne
The sul2 gene encodes sulphonamide resistance (Sul(R)) and is commonly found in Escherichia coli from different hosts. We typed E. coli isolates by multilocus sequence typing (MLST) and compared the results to sequence variation of sul2, in order to investigate the relation to host origin of pathogenic and commensal E. coli strains and to investigate whether transfer of sul2 into different genomic lineages has happened multiple times. Sixty-eight E. coli isolated in Denmark and Norway from different hosts and years were MLST typed and sul2 PCR products were sequenced and compared. PFGE...
BACKGROUND: The intra- and inter-species genetic diversity of bacteria and the absence of 'reference', or the most representative, sequences of individual species present a significant challenge for sequence-based identification. The aims of this study were to determine the utility, and compare the performance of several clustering and classification algorithms to identify the species of 364 sequences of 16S rRNA gene with a defined species in GenBank, and 110 sequences of 16S rRNA gene with no defined species, all within the genus Nocardia. METHODS: A total of 364 16S rRNA gene sequences of Nocardia species were studied. In addition, 110 16S rRNA gene sequences assigned only to the Nocardia genus level at the time of submission to GenBank were used for machine learning classification experiments. Different clustering algorithms were compared with a novel algorithm or the linear mapping (LM) of the distance matrix. Principal Components Analysis was used for the dimensionality reduction and visualization. RESULTS: The LM algorithm achieved the highest performance and classified the set of 364 16S rRNA sequences into 80 clusters, the majority of which (83.52%) corresponded with the original species. The most representative 16S rRNA sequences for individual Nocardia species have been identified as 'centroids' in respective clusters from which the distances to all other sequences were minimized; 110 16S rRNA gene sequences with identifications recorded only at the genus level were classified using machine learning methods. Simple kNN machine learning demonstrated the highest performance and classified Nocardia species sequences with an accuracy of 92.7% and a mean frequency of 0.578. CONCLUSION: The identification of centroids of 16S rRNA gene sequence clusters using novel distance matrix clustering enables the identification of the most representative sequences for each individual species of Nocardia and allows the quantitation of inter- and intra
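Two ideas from this abstract lend themselves to a short Python sketch: picking the most representative sequence of a cluster as the member with the smallest total distance to the others, and assigning an unlabelled sequence by a k-nearest-neighbour vote. The code below is an illustration only. It does not implement the linear mapping (LM) algorithm, and the names `medoid` and `knn_classify`, the default `k=3`, and the plain nested-list distance matrix are all assumptions.

```python
from collections import Counter


def medoid(member_indices, dist):
    """Return the cluster member with the smallest summed distance to the rest.

    dist is a symmetric matrix (list of lists) of pairwise distances.
    This plays the role of the 'centroid' sequence described above.
    """
    return min(member_indices, key=lambda i: sum(dist[i][j] for j in member_indices))


def knn_classify(query_distances, labels, k=3):
    """Label a query sequence by majority vote among its k nearest references.

    query_distances[i] is the distance from the query to reference i,
    and labels[i] is the species label of reference i.
    """
    nearest = sorted(range(len(labels)), key=lambda i: query_distances[i])[:k]
    votes = Counter(labels[i] for i in nearest)
    return votes.most_common(1)[0][0]
```

Strictly speaking, a member chosen this way is a medoid rather than a geometric centroid, since it is always one of the observed sequences; that matches the abstract's goal of nominating real reference sequences.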
Joshi, Mohan Chandra; Sharma, Animesh; Kant, Sashi; Birah, Ajanta; Gupta, Gorakh Prasad; Khan, Sharik R; Bhatnagar, Rakesh; Banerjee, Nirupama
Xenorhabdus nematophila secretes insecticidal proteins to kill its larval prey. We have isolated an approximately 58-kDa GroEL homolog, secreted in the culture medium through outer membrane vesicles. The protein was orally insecticidal to the major crop pest Helicoverpa armigera with an LC50 of approximately 3.6 microg/g diet. For optimal insecticidal activity all three domains of the protein, apical, intermediate, and equatorial, were necessary. The apical domain alone was able to bind to the larval gut membranes and manifest low level insecticidal activity. At equimolar concentrations, the apical domain contained approximately one-third and the apical-intermediate domain approximately one-half bioactivity of that of the full-length protein. Interaction of the protein with the larval gut membrane was specifically inhibited by N-acetylglucosamine and chito-oligosaccharides. Treatment of the larval gut membranes with chitinase abolished protein binding. Based on the three-dimensional structural model, mutational analysis demonstrated that surface-exposed residues Thr-347 and Ser-356 in the apical domain were crucial for both binding to the gut epithelium and insecticidal activity. Double mutant T347A,S356A was 80% less toxic (p activity with Kd approximately 0.64 microm and Bmax approximately 4.68 micromol/g chitin. The variation in chitin binding activity of the mutant proteins was in good agreement with membrane binding characteristics and insecticidal activity. The less toxic double mutant XnGroEL showed an approximately 8-fold increase of Kd in chitin binding assay. Our results demonstrate that X. nematophila secretes an insecticidal GroEL protein with chitin binding activity.
Martin, Dorrelyn P.; Miya, Jharna; Reeser, Julie W.; Roychowdhury, Sameek
RNA sequencing (RNAseq) is a versatile method that can be utilized to detect and characterize gene expression, mutations, gene fusions, and noncoding RNAs. Standard RNAseq requires 30 – 100 million sequencing reads and can include multiple RNA products such as mRNA and noncoding RNAs. We demonstrate how targeted RNAseq (capture) permits a focused study on selected RNA products using a desktop sequencer. RNAseq capture can characterize unannotated, low, or transiently expressed transcripts that may otherwise be missed using traditional RNAseq methods. Here we describe the extraction of RNA from cell lines, ribosomal RNA depletion, cDNA synthesis, preparation of barcoded libraries, hybridization and capture of targeted transcripts and multiplex sequencing on a desktop sequencer. We also outline the computational analysis pipeline, which includes quality control assessment, alignment, fusion detection, gene expression quantification and identification of single nucleotide variants. This assay allows for targeted transcript sequencing to characterize gene expression, gene fusions, and mutations. PMID:27585245
YU Quan; ZHANG Guo-quan; FUKUSHI Hideto; YAMAGUCHI Tsuyoshi; HIRAI Katsuya
Objective: To characterize genetic features of Chinese isolates of Coxiella burnetii by comparing com1 gene sequences. Methods: The com1 gene sequences of Chinese isolates were amplified, sequenced, and compared with previously published data. Results: Three different com1 sequences were identified in 7 Chinese isolates. Sequence comparison indicated that the isolates harboring the QpRS plasmid could be defined as a new group and, in addition, that isolates carrying the same plasmid type showed similar com1 gene sequences. Conclusion: The study suggests that grouping based on the com1 gene sequence is highly associated with the plasmid type of the isolates but is little related to disease forms and geographical origins of the isolates.
Jeevan Thavanathan; Nay Ming Huang; Kwai Lin Thong (Low Dimension Material Research Center, Department of Physics, and Institute of Biological Sciences, Faculty of Science, University of Malaya, Kuala Lumpur, Malaysia)
We have developed a colorimetric biosensor using a dual platform of gold nanoparticles and graphene oxide sheets for the detection of Salmonella enterica. The presence of the invA gene in S. enterica causes a change in color of the biosensor from its original pinkish-red to a light purplish solution. This occurs through the aggregation of the primary gold nanoparticles–conjugated DNA probe onto the surface of the secondary graphene oxide–conjugated DNA probe through DNA hybridization with the targeted DNA sequence. Spectrophotometry analysis showed a shift in wavelength from 525 nm to 600 nm with 1 µM of DNA target. Specificity testing revealed that the biosensor was able to detect various serovars of S. enterica while no color change was observed with the other bacterial species. Sensitivity testing revealed the limit of detection was at 1 nM of DNA target. This proves the effectiveness of the biosensor in the detection of S. enterica through DNA hybridization. Keywords: biosensor, DNA hybridization, DNA probe, gold nanoparticles, graphene oxide, Salmonella enterica
Taciana de Carvalho Coutinho
Cotton fibers are tubular cells which develop from the differentiation of the ovule epidermis. In addition to being one of the most important natural fibers of the textile group, cotton fibers afford an excellent experimental system for studying the cell wall. The aim of this work was to isolate and characterise the genes expressed in cotton fiber (Gossypium hirsutum L.) to be used in future work in cotton breeding. Fibers of the cotton cultivar CNPA ITA 90 II were used to extract RNA for the subsequent generation of a cDNA library. Seventeen sequences were obtained, of which 14 were already described in the NCBI database (National Centre for Biotechnology Information), such as those encoding the lipid transfer proteins (LTPs) and arabinogalactans (AGP). However, other cDNAs such as the B05 clone, which displays homology with the glycosyltransferases, have still not been described for this crop. Nevertheless, results showed that several clones obtained in this study are associated with cell wall proteins, wall-modifying enzymes and lipid transfer proteins directly involved in fiber development.
BACKGROUND: In understanding the evolutionary process of vertebrates, cyclostomes (hagfishes and lampreys) occupy crucial positions. Resolving molecular phylogenetic relationships of cyclostome genes with gnathostome (jawed vertebrate) genes is indispensable in deciphering both the species tree and gene trees. However, molecular phylogenetic analyses, especially those including lamprey genes, have produced highly discordant results between gene families. To efficiently scrutinize this problem using partial genome assemblies of early vertebrates, we focused on the potassium voltage-gated channel, shaker-related (KCNA) family, whose members are mostly single-exon. RESULTS: Seven sea lamprey KCNA genes as well as six elephant shark genes were identified, and their orthologies to bony vertebrate subgroups were assessed. In contrast to robustly supported orthology of the elephant shark genes to gnathostome subgroups, clear orthology of any sea lamprey gene could not be established. Notably, sea lamprey KCNA sequences displayed unique codon usage pattern and amino acid composition, probably associated with exceptionally high GC-content in their coding regions. This lamprey-specific property of coding sequences was also observed generally for genes outside this gene family. CONCLUSIONS: Our results suggest that secondary modifications of sequence properties unique to the lamprey lineage may be one of the factors preventing robust orthology assessments of lamprey genes, which deserves further genome-wide validation. The lamprey lineage-specific alteration of protein-coding sequence properties needs to be taken into consideration in tackling the key questions about early vertebrate evolution.
Khan Shafiq A
BACKGROUND: Complete genome annotation will likely be achieved through a combination of computer-based analysis of available genome sequences combined with direct experimental characterization of expressed regions of individual genomes. We have utilized a comparative genomics approach involving the sequencing of randomly selected hamster testis cDNAs to begin to identify genes not previously annotated on the human, mouse, rat and Fugu (pufferfish) genomes. RESULTS: 735 distinct sequences were analyzed for their relatedness to known sequences in public databases. Eight of these sequences were derived from previously unidentified genes and expression of these genes in testis was confirmed by Northern blotting. The genomic locations of each sequence were mapped in human, mouse, rat and pufferfish, where applicable, and the structure of their cognate genes was derived using computer-based predictions, genomic comparisons and analysis of uncharacterized cDNA sequences from human and macaque. CONCLUSION: The use of a comparative genomics approach resulted in the identification of eight cDNAs that correspond to previously uncharacterized genes in the human genome. The proteins encoded by these genes included a new member of the kinesin superfamily, a SET/MYND-domain protein, and six proteins for which no specific function could be predicted. Each gene was expressed primarily in testis, suggesting that they may play roles in the development and/or function of testicular cells.
Grauers, Anna; Wang, Jingwen; Einarsdottir, Elisabet
BACKGROUND CONTEXT: Idiopathic scoliosis is a spinal deformity affecting approximately 3% of otherwise healthy children or adolescents. The etiology is still largely unknown but has an important genetic component. Genome-wide association studies have identified a number of common genetic variants that are significantly associated with idiopathic scoliosis in Asian and Caucasian populations, rs11190870 close to the LBX1 gene being the most replicated finding. PURPOSE: The aim of the present study was to investigate the genetics of idiopathic scoliosis in a Scandinavian cohort by performing a candidate gene study of four variants previously shown to be associated with idiopathic scoliosis and exome sequencing of idiopathic scoliosis patients with a severe phenotype to identify possible novel scoliosis risk variants. STUDY DESIGN: This was a case control study. PATIENT SAMPLE: A total of 1,739 patients...
王艳霞; 王红玲; 王敏; 骆健美
Arthrobacter simplex, a strain with steroid C1,2 dehydrogenation ability, has been widely used in the steroid industry; ethanol is usually added to the transformation system to help dissolve hydrophobic substrates. Our prior work using LC-MS/MS and bioinformatics analysis found that the heat shock protein (HSP) GroEL was closely correlated with the adaptive behavior of the cells under ethanol stress. On this basis, the groEL gene was cloned by PCR using the A. simplex TCCC 11037 genome as a template. The nucleotide and amino acid sequences of GroEL in this strain showed the highest homology with those from Nocardioides sp. JS614, at 89% and 95%, respectively. The gene was then heterologously expressed in Escherichia coli BL21(DE3) with the vector pET-22b(+). The recombinant strain showed significantly increased survival under shock with high concentrations of methanol, hypertonicity and strong acid, as well as improved growth under moderate levels of these stresses. This study is the first report on the cloning and expression of the groEL gene from A. simplex, and it provides fundamental data for exploring the biological function of the HSP GroEL from A. simplex.
马巍; 吴玲; 王德利; 刘淼; 任惠民; 杨广笑; 王全颖
Objective: Molecular cloning and sequencing of the mature fragment of the human nerve growth factor (NGF) gene. Methods: Human genomic DNA extracted from white blood cells was used as the template, and the NGF gene was cloned by PCR and T-vector cloning. Positive clones were screened and identified by restriction enzyme digestion, and the cloned amplified fragment was then sequenced and analyzed. Results: Comparison of the cloned NGF gene with the GenBank sequence (V01511) showed that the two sequences were identical and 354 bp in length. Conclusion: Cloning the NGF gene from human genomic DNA paves the way for further study on gene therapy of nervous system injury.
Wang, Y.F.; Yu, M.; Pas, te M.F.W.; Yerle, M.; Liu, B.; Fan, B.; Xiong, T.; Li, K.
The full-length cDNAs of the porcine genes (PSME1 and PSME2) encoding the proteasome activator PA28 α- and β-subunits were obtained by the rapid amplification of cDNA ends (RACE). The nucleotide sequences and the predicted protein sequences share high sequence identity with their mammalian counterparts. The
Spicker, J.S.; Wikman, F.; Lu, M.L.;
We have trained an artificial neural network to predict the sequence of the human TP53 tumor suppressor gene based on a p53 GeneChip. The trained neural network uses as input the fluorescence intensities of DNA hybridized to oligonucleotides on the surface of the chip and makes between zero and four errors in the predicted 1300 bp sequence when tested on wild-type TP53 sequence....
王焱; 王征宇; 周鸣南; 蔡菊娥; 孙兰英; 刘新垣; B.L.Daugherty; S.Pestka
A new murine alpha interferon gene (mIFN-αB) was found by a primer-based sequencing method in a murine genomic DNA library. The gene was cloned and its sequence was determined. It was expressed in Escherichia coli under the control of the PL promoter, which resulted in antiviral activity on mouse L-cells. The sequence of mIFN-αB has been accepted by GenBank.
Lund, T; Rehfeld, J F; Olsen, Jørgen
In order to deduce the primary structure of bovine preprogastrin we therefore sequenced a gastrin DNA clone isolated from a bovine liver cosmid library. Bovine preprogastrin comprises 104 amino acids and consists of a signal peptide, a 37 amino acid spacer-sequence, the gastrin-34 sequence followed by an amidation-site (Gly-Arg-Arg), and a C-terminal nonapeptide. Comparison with human, porcine, and rat cDNA sequences revealed extensive homology in the coding region as well as in short noncoding structures....
The full-length cDNA sequence of a porcine gene, MOSPD2, was amplified using the rapid amplification of cDNA ends method based on a pig expressed sequence tag sequence which was highly homologous to the coding sequence of the human MOSPD2 gene. Sequence prediction analysis revealed that the open reading frame of this gene encodes a protein of 491 amino acids that has high homology with the motile sperm domain-containing protein 2 (MOSPD2) of five species: horse (89%), human (90%), chimpanzee (89%), rhesus monkey (89%) and mouse (85%); thus, it could be defined as a porcine MOSPD2 gene. This novel porcine gene was assigned GeneID: 100153601. This gene is structured in 15 exons and 14 introns as revealed by computer-assisted analysis. The phylogenetic analysis revealed that the porcine MOSPD2 gene has a closer genetic relationship with the MOSPD2 gene of horse. Tissue expression analysis indicated that the porcine MOSPD2 gene is generally and differentially expressed in the spleen, muscle, skin, kidney, lung, liver, fat and heart. Our experiment is the first to establish the primary foundation for further research on the porcine MOSPD2 gene.
Have, Christian Theil; Mørk, Søren
using the probabilistic logic programming language and machine learning system PRISM - a fast and efficient model prototyping environment, using bacterial gene finding performance as a benchmark of signal strength. The model is used to prune a set of gene predictions from an underlying gene finder...
Guang-Rong Li; Tao Lang; En-Nian Yang; Cheng Liu; Zu-Jun Yang
Although the unique properties of the wheat α-gliadin gene family are well characterized, little is known about the evolution and genomic divergence of the α-gliadin gene family within the Triticeae. We isolated a total of 203 α-gliadin gene sequences from 11 representative diploid and polyploid Triticeae species, and found 108 sequences putatively functional. Our results indicate that α-gliadin genes may have possibly originated from wild Secale species, where the sequences contain the shortest repetitive domains and display minimum variation. A miniature inverted-repeat transposable element insertion is reported for the first time in an α-gliadin gene sequence of Thinopyrum intermedium in this study, indicating that the transposable element might have contributed to the diversification of the α-gliadin gene family among Triticeae genomes. The phylogenetic analyses revealed that the α-gliadin gene sequences of Dasypyrum, Australopyrum, Lophopyrum, Eremopyrum and Pseudoroegneria species have amplified several times. A search for four typical toxic epitopes for celiac disease within the Triticeae α-gliadin gene sequences showed that the α-gliadins of wild Secale, Australopyrum and Agropyron genomes lack all four epitopes, while other Triticeae species have accumulated these epitopes, suggesting that the evolution of these toxic epitope sequences occurred during the course of speciation, domestication or polyploidization of Triticeae.
Shamblott, M J; Litman, G W
The genomic organization and sequence of immunoglobulin light chain genes in Heterodontus francisci (horned shark), a phylogenetically primitive vertebrate, have been characterized. Light chain variable (VL) and joining (JL) segments are separated by 380 nucleotides and together with the single constant region exon (CL), occupy less than 2.7 kb, the closest linkage described thus far for a rearranging gene system. The VL segment is flanked by a characteristic recombination signal sequence possessing a 12 nucleotide spacer; the recombination signal sequence flanking the JL segment is 23 nucleotides. The VL genes, unlike heavy chain genes, possess a typical upstream regulatory octamer as well as conserved enhancer core sequences in the intervening sequence separating JL and CL. Restriction mapping and genomic Southern blotting are consistent with the presence of multiple light chain gene clusters. There appear to be considerably fewer light than heavy chain genes. Heavy and light chain clusters show no evidence of genomic linkage using field inversion gel electrophoresis. The findings of major differences in the organization and functional rearrangement properties of immunoglobulin genes in species representing different levels of vertebrate evolution, but consistent similarity in the organization of heavy and light chain genes within a species, suggest that these systems may be coevolving. PMID:2511000
Waterhouse, Robert M.; Zdobnov, Evgeny M.; Kriventseva, Evgenia V.
Delineating ancestral gene relations among a large set of sequenced eukaryotic genomes allowed us to rigorously examine links between evolutionary and functional traits. We classified 86% of over 1.36 million protein-coding genes from 40 vertebrates, 23 arthropods, and 32 fungi into orthologous groups and linked over 90% of them to Gene Ontology or InterPro annotations. Quantifying properties of ortholog phyletic retention, copy-number variation, and sequence conservation, we examined correlations with gene essentiality and functional traits. More than half of vertebrate, arthropod, and fungal orthologs are universally present across each lineage. These universal orthologs are preferentially distributed in groups with almost all single-copy or all multicopy genes, and sequence evolution of the predominantly single-copy orthologous groups is markedly more constrained. Essential genes from representative model organisms, Mus musculus, Drosophila melanogaster, and Saccharomyces cerevisiae, are significantly enriched in universal orthologs within each lineage, and essential-gene-containing groups consistently exhibit greater sequence conservation than those without. This study of eukaryotic gene repertoire evolution identifies shared fundamental principles and highlights lineage-specific features; it also confirms that essential genes are highly retained and conclusively supports the “knockout-rate prediction” of stronger constraints on essential gene sequence evolution. However, the distinction between sequence conservation of single- versus multicopy orthologs is quantitatively more prominent than between orthologous groups with and without essential genes. The previously underappreciated difference in the tolerance of gene duplications and contrasting evolutionary modes of “single-copy control” versus “multicopy license” may reflect a major evolutionary mechanism that allows extended exploration of gene sequence space. PMID:21148284
Kitamoto, N; Yamagata, H; Kato, T.; Tsukagoshi, N; Udaka, S
A gene coding for thermophilic beta-amylase of Clostridium thermosulfurogenes was cloned into Bacillus subtilis, and its nucleotide sequence was determined. The nucleotide sequence suggested that the thermophilic beta-amylase is translated from monocistronic mRNA as a secretory precursor with a signal peptide of 32 amino acid residues. The deduced amino acid sequence of the mature beta-amylase contained 519 residues with a molecular weight of 57,167. The amino acid sequence of the C. thermosu...
Muñoz-Amatriaín, María; Lonardi, Stefano; Luo, MingCheng; Madishetty, Kavitha; Svensson, Jan T; Moscou, Matthew J; Wanamaker, Steve; Jiang, Tao; Kleinhofs, Andris; Muehlbauer, Gary J; Wise, Roger P; Stein, Nils; Ma, Yaqin; Rodriguez, Edmundo; Kudrna, Dave; Bhat, Prasanna R; Chao, Shiaoman; Condamine, Pascal; Heinen, Shane; Resnik, Josh; Wing, Rod; Witt, Heather N; Alpert, Matthew; Beccuti, Marco; Bozdag, Serdar; Cordero, Francesca; Mirebrahim, Hamid; Ounit, Rachid; Wu, Yonghui; You, Frank; Zheng, Jie; Simková, Hana; Dolezel, Jaroslav; Grimwood, Jane; Schmutz, Jeremy; Duma, Denisa; Altschmied, Lothar; Blake, Tom; Bregitzer, Phil; Cooper, Laurel; Dilbirligi, Muharrem; Falk, Anders; Feiz, Leila; Graner, Andreas; Gustafson, Perry; Hayes, Patrick M; Lemaux, Peggy; Mammadov, Jafar; Close, Timothy J
Barley (Hordeum vulgare L.) possesses a large and highly repetitive genome of 5.1 Gb that has hindered the development of a complete sequence. In 2012, the International Barley Sequencing Consortium released a resource integrating whole-genome shotgun sequences with a physical and genetic framework. However, because only 6278 bacterial artificial chromosomes (BACs) in the physical map were sequenced, fine structure was limited. To gain access to the gene-containing portion of the barley genome at high resolution, we identified and sequenced 15 622 BACs representing the minimal tiling path of 72 052 physical-mapped gene-bearing BACs. This generated ~1.7 Gb of genomic sequence containing an estimated 2/3 of all Morex barley genes. Exploration of these sequenced BACs revealed that although distal ends of chromosomes contain most of the gene-enriched BACs and are characterized by high recombination rates, there are also gene-dense regions with suppressed recombination. We made use of published map-anchored sequence data from Aegilops tauschii to develop a synteny viewer between barley and the ancestor of the wheat D-genome. Except for some notable inversions, there is a high level of collinearity between the two species. The software HarvEST:Barley provides facile access to BAC sequences and their annotations, along with the barley-Ae. tauschii synteny viewer. These BAC sequences constitute a resource to improve the efficiency of marker development, map-based cloning, and comparative genomics in barley and related crops. Additional knowledge about regions of the barley genome that are gene-dense but low in recombination is particularly relevant.
Kaas, Rolf Sommer; Rundsten, Carsten Friis; Ussery, David
more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness...
Ladefoged, Søren; Christiansen, Gunna
of which showed similarity to that which encodes the LicA protein of Haemophilus influenzae. The organization of the genes in the region showed no resemblance to that in the corresponding regions of other bacteria sequenced so far. The gyrA gene was mapped 35 kb downstream from the gyrB gene....
Chaillou, S.; Lokman, B.C.; Leer, R.J.; Posthuma, C.; Postma, P.W.; Pouwels, P.H.
Two genes, xylP and xylQ, from the xylose regulon of Lactobacillus pentosus were cloned and sequenced. Together with the repressor gene of the regulon, xylR, the xylPQ genes form an operon which is inducible by xylose and which is transcribed from a promoter located 145 bp upstream of xylP. A putati
Élida ML Rabelo
Continuing the Schistosoma mansoni Genome Project, 363 new templates were sequenced, generating 205 more ESTs corresponding to 91 genes. Seventy-four of these genes (81%) had not previously been described in S. mansoni. Among the newly discovered genes there are several of significant biological interest such as synaptophysin, NIFs-like and rho-GDP dissociation inhibitor
Doyle, Christine J; Mazins, Adam; Graham, Rikki M A; Fang, Ning-Xia; Smith, Helen V; Jennison, Amy V
By conducting a molecular characterization of Corynebacterium diphtheriae strains in Australia, we identified novel sequences, nonfunctional toxin genes, and 5 recent cases of toxigenic cutaneous diphtheria. These findings highlight the importance of extrapharyngeal infections for toxin gene-bearing (functional or not) and non-toxin gene-bearing C. diphtheriae strains. Continued surveillance is recommended.
Aim: To analyze the promoter sequences of toll-like receptor (TLR) genes in Vechur cattle, an indigenous breed of Kerala, against the sequence of Bos taurus and assess the differences that could be attributed to innate immune responses against bovine mastitis. Materials and Methods: Blood samples were collected from the jugular vein of Vechur cattle maintained at the Vechur cattle conservation center of Kerala Veterinary and Animal Sciences University, using an acid-citrate-dextrose anticoagulant. The genomic DNA was extracted, and polymerase chain reaction was carried out to amplify the promoter region of TLRs. The amplified products of the TLR2, 4, and 9 promoter regions were sequenced by the Sanger enzymatic DNA sequencing technique. Results: The sequence of the promoter region of TLR2 of Vechur cattle showed 98% similarity with the B. taurus sequence present in GenBank and revealed variants for four sequence motifs. The sequence of the promoter region of TLR4 of Vechur cattle revealed 99% similarity with the B. taurus sequence but did not reveal significant variants in motif regions. However, two heterozygous loci were observed from the chromatogram. The promoter sequence of the TLR9 gene also showed 99% similarity to the B. taurus sequence and revealed variants for four sequence motifs. Conclusion: The results of this study indicate significant variation in the promoters of the TLR2 and 9 genes in the Vechur cattle breed, which may potentially influence the innate immune response against mastitis.
Jensen Roderick V
Background: Sinorhizobium meliloti is an agriculturally important model symbiont. There is an ongoing need to update and improve its genome annotation. In this study, we used a high-throughput pyrosequencing approach to sequence the transcriptome of S. meliloti and search for new bacterial genes missed in the previous genome annotation. This is the first report of sequencing a bacterial transcriptome using the pyrosequencing technology. Results: Our pilot sequencing run generated 19,005 reads with an average length of 136 nucleotides per read. From these data, we identified 20 new genes. These new gene transcripts were confirmed by RT-PCR and their possible functions were analyzed. Conclusion: Our results indicate that high-throughput sequence analysis of bacterial transcriptomes is feasible and that next-generation sequencing technologies will greatly facilitate the discovery of new genes and improve genome annotation.
Background: Previous studies have reported the presence of Murine Mammary Tumor Virus (MMTV)-like gene sequences in human cancer tissue specimens. Here, we search for MMTV-like gene sequences in lung disease specimens, including carcinomas, from a Mexican population. This study was based on our previous study reporting that the INER51 lung cancer cell line, from a pleural effusion of a Mexican patient, contains MMTV-like env gene sequences. Results: MMTV-like env gene sequences were detected in three out of 18 specimens studied, by PCR using a specific set of MMTV-like primers. The three identified MMTV-like gene sequences, designated INER6, HZ101, and HZ14, were 99%, 98%, and 97% homologous, respectively, to GenBank sequence accession number AY161347. The INER6 and HZ-101 samples were isolated from lung cancer specimens, and the HZ-14 was isolated from an acute inflammatory lung infiltrate sample. Two of the env sequences exhibited disruption of the reading frame due to mutations. Conclusion: In summary, we identified MMTV-like gene sequences in 2 out of 11 (18%) of the lung carcinomas and 1 out of 7 (14%) of the acute inflammatory lung infiltrate specimens studied from a Mexican population.
Kilstrup, Mogens; Jacobsen, Susanne; Hammer, Karin;
The bacterium Lactococcus lactis has become a model organism in studies of growth physiology and membrane transport, as a result of its simple fermentative metabolism. It is also used as a model for studying the importance of specific genes and functions during life in excess nutrients, by comparison of prototrophic wild-type strains and auxotrophic domesticated (dairy) strains. In a study of the capacity of domesticated strains to perform directed responses toward various stress conditions, we have analyzed the heat and salt stress response in the established L. lactis subsp. cremoris laboratory strain MG1363, which was originally derived from a dairy strain. After two-dimensional separation of proteins, the DnaK, GroEL, and GroES heat shock proteins, the HrcA (Orf1) heat shock repressor, and the glycolytic enzymes pyruvate kinase, glyceraldehyde-3-phosphate dehydrogenase...
Saroj K. Dangi
Aim: Blackleg disease is caused by Clostridium chauvoei in ruminants. Although virulence factors such as C. chauvoei toxin A, sialidase, and flagellin are well characterized, hyaluronidases of C. chauvoei are not characterized. The present study was aimed at cloning and sequence analysis of the hyaluronoglucosaminidase (nagH) gene of C. chauvoei. Materials and Methods: C. chauvoei strain ATCC 10092 was grown in ATCC 2107 media and confirmed by polymerase chain reaction (PCR) using primers specific for the 16-23S rDNA spacer region. The nagH gene of C. chauvoei was amplified and cloned into the pRham-SUMO vector and transformed into Escherichia cloni 10G cells. The construct was then transformed into E. cloni cells. Colony PCR was carried out to screen the colonies, followed by sequencing of the nagH gene in the construct. Results: PCR amplification yielded a nagH gene product of 1143 bp, which was cloned in a prokaryotic expression system. Colony PCR, as well as sequencing of the nagH gene, confirmed the presence of the insert. The sequence was then subjected to NCBI BLAST analysis, which confirmed that it was indeed the nagH gene of C. chauvoei. Phylogenetic analysis of the sequence showed that it is closely related to those of Clostridium perfringens and Clostridium paraputrificum. Conclusion: The gene for the virulence factor nagH was cloned into a prokaryotic expression vector and confirmed by sequencing.
FANG XiaoMin; XU NingYing; REN ShouWen
The CACNA1S gene encodes the α1 subunit of the calcium channel. Mutation of the CACNA1S gene can cause hypokalemic periodic paralysis (HypoKPP) and malignant hyperthermia syndrome (MHS) in human beings. Research on CACNA1S to date has mainly been in humans and model animals, and rarely in livestock and poultry. In this study, Yorkshire pigs (23), Pietrain pigs (30), Jinhua pigs (115) and the second generation (126) of a Jinhua × Pietrain cross were used. Primers were designed according to the sequence of the human CACNA1S gene, and PCR was carried out using pig genomic DNA. PCR products were sequenced and compared with the human sequence, and then single nucleotide polymorphisms (SNPs) were investigated by PCR-SSCP, while PCR-RFLP tests were performed to validate the mutations. Results indicated: (1) 5211 bp of DNA fragments of the porcine CACNA1S gene were acquired (GenBank accession number: DQ767693) and the identity of the exon region was 82.6% between human and pig; (2) fifty-seven mutations were found within the cloned sequences, among which 24 were in the exon region; (3) the results of PCR-RFLP were in accordance with those of PCR-SSCP. According to the ESTs of the porcine CACNA1S gene published in GenBank (Bx914582, Bx666997), 8 of the 11 SNPs identified in the present study were consistent with the base differences between the two EST fragments.
ZHOU Yanhong; JING Hui; LI Yanen; LIU Huailan
Expressed sequence tags (ESTs), which have accumulated considerably so far, provide a valuable resource for finding new genes and disease-relevant genes, and for recognizing alternative splicing variants, SNP sites, etc. The prerequisite for carrying out this research is to correctly ascertain the gene-sequence-related ESTs. Based on analysis of the alignment results between some known gene sequences and ESTs in public databases, several measures including Identity Check, Gap Check, Inclusion Check and Length Check have been introduced to judge whether an EST alignment is related to a gene sequence or not. A computational program, EDSAc1.0, has been developed to identify true EST alignments and exon regions of query gene sequences. When tested with human gene sequences in the standard dataset HMR195 and evaluated with the standard measures of gene prediction performance, EDSAc1.0 can identify protein-coding regions with a specificity of 0.997 and a sensitivity of 0.88 at the nucleotide level, outperforming its counterpart TAP. A web server of EDSAc1.0 is available at http://infosci.hust.edu.cn.
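The nucleotide-level measures quoted above can be illustrated with a minimal sketch. It assumes the convention common in gene-prediction benchmarks, where sensitivity is TP/(TP+FN) and "specificity" is TP/(TP+FP); the abstract does not state its formulas, and the function name and toy data are hypothetical, not part of EDSAc1.0.

```python
def nucleotide_accuracy(annotated, predicted):
    """Return (sensitivity, specificity) over per-nucleotide coding flags.

    Assumed convention (not confirmed by the abstract):
      sensitivity = TP / (TP + FN)
      specificity = TP / (TP + FP)
    """
    if len(annotated) != len(predicted):
        raise ValueError("both tracks must cover the same positions")
    # Count coding positions found, missed, and wrongly predicted.
    tp = sum(1 for a, p in zip(annotated, predicted) if a and p)
    fn = sum(1 for a, p in zip(annotated, predicted) if a and not p)
    fp = sum(1 for a, p in zip(annotated, predicted) if p and not a)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tp / (tp + fp) if (tp + fp) else 0.0
    return sensitivity, specificity


# Toy 10-nucleotide example: 1 = coding, 0 = non-coding.
annotated = [0, 1, 1, 1, 1, 0, 0, 1, 1, 0]
predicted = [0, 1, 1, 1, 0, 0, 0, 1, 1, 1]
sn, sp = nucleotide_accuracy(annotated, predicted)
```

In the toy data, five coding positions are predicted correctly, one is missed and one is predicted wrongly, so both measures equal 5/6.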
Harrison Paul M
Pseudogenes arise from the decay of gene copies following either RNA-mediated duplication (processed pseudogenes) or DNA-mediated duplication (nonprocessed pseudogenes). Here, we show that long protein-coding genes tend to produce more nonprocessed pseudogenes than short genes, whereas the opposite is true for processed pseudogenes. Protein-coding genes longer than 3000 bp are 6 times more likely to produce nonprocessed pseudogenes than processed ones. Reviewers: This article was reviewed by Dr. Dan Graur and Dr. Craig Nelson (nominated by Dr. J Peter Gogarten).
Zhang Dong-jie; Liu Di; Wang Liang; He Xin-miao; Wang Wen-tao
To study the Min pig Y-box binding protein (YB-1) gene, its complete coding sequence was cloned by RT-PCR, and the sequence features were analyzed using software and online tools. The results showed that the complete CDS of Min pig YB-1 is 975 bp long, encoding 324 amino acids. It contained a conserved cold shock domain and several phosphorylation sites but no transmembrane domains, consistent with a protein found in the cytoplasm. The Min pig YB-1 nucleotide sequence shared high similarity (61.37%-97.66%) with those of other mammals.
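The relation between the reported CDS length and protein length can be checked with a short sketch. It assumes the 975 bp figure includes the terminal stop codon, which the abstract does not state explicitly; the function name is hypothetical.

```python
def protein_length_from_cds(cds_length_bp, includes_stop_codon=True):
    """Number of amino acids encoded by a CDS of the given length.

    Assumption: when includes_stop_codon is True, the last codon is a
    stop codon and contributes no amino acid.
    """
    if cds_length_bp % 3 != 0:
        raise ValueError("a complete CDS should be a multiple of 3 bp")
    codons = cds_length_bp // 3
    return codons - 1 if includes_stop_codon else codons


# 975 bp -> 325 codons -> 324 amino acids plus one stop codon.
yb1_length = protein_length_from_cds(975)
```

Under that assumption, 975 bp corresponds to 325 codons, matching the 324 amino acids reported.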
Songsivilai, S; Bye, J M; Marks, J D; Hughes-Jones, N C
Universal oligonucleotide primers, designed for amplifying and sequencing genes encoding the rearranged human lambda immunoglobulin variable region, were validated by amplification of the lambda light chain genes from four human heterohybridoma cell lines and in the generation of a cDNA library of human V lambda sequences from Epstein-Barr virus-transformed human peripheral blood lymphocytes. This technique allows rapid cloning and sequencing of human immunoglobulin genes, and has potential applications in the rescue of unstable human antibody-producing cell lines and in the production of human monoclonal antibodies.
Kuprina, N Iu; Krol', E S; Koriabin, M Iu; Shestopalov, B V; Bliskovskiĭ, V V; Bannikov, V M; Gizatullin, R Z; Kirillov, A V; Kravtsov, V Iu; Zakhar'ev, V M
We have analyzed the CHL15 gene, earlier identified in a screen for yeast mutants with increased loss of chromosome III and artificial circular and linear chromosomes in mitosis. Mutations in the CHL15 gene lead to a 100-fold increase in the rate of chromosome III loss per cell division and a 200-fold increase in the rate of marker homozygosis on this chromosome by mitotic recombination. Analysis of segregation of artificial circular minichromosome and artificially generated nonessential marker chromosome fragment indicated that sister chromatid loss (1:0 segregation) is a main reason of chromosome destabilization in the chl15-1 mutant. A genomic clone of CHL15 was isolated and used to map its physical position on chromosome XVI. Nucleotide sequence analysis of CHL15 revealed a 2.8-kb open reading frame with a 105-kD predicted protein sequence. At the N-terminal region of the protein sequences potentially able to form DNA-binding domains defined as zinc-fingers were found. The C-terminal region of the predicted protein displayed a similarity to sequence of regulatory proteins known as the helix-loop-helix (HLH) proteins. Data on partial deletion analysis suggest that the HLH domain is essential for the function of the CHL15 gene product. Analysis of the upstream untranslated region of CHL15 revealed the presence of the hexamer element, ACGCGT (an MluI restriction site) controlling both the periodic expression and coordinate regulation of the DNA synthesis genes in budding yeast. Deletion in the RAD52 gene, the product of which is involved in double-strand break/recombination repair and replication, leads to a considerable decrease in the growth rate of the chl15 mutant. We suggest that CHL15 is a new DNA synthesis gene in the yeast Saccharomyces cerevisiae.
LI Ji-chang; TONG Guang-zhi; QIU Hua-ji; ZHOU Yan-jun; XUE Qiang
By means of PCR, the gene encoding gD of bovine herpesvirus-1 (BHV-1) strain Luojing was amplified, cloned and sequenced. The nucleotide sequence of this gD gene was 1 251 bp, encoding 417 amino acids. Compared with the published P8-2 strain, the homology of the nucleotide sequence is 99.92%, and that of the deduced amino acid sequence is 100%. The results indicated that gD of BHV-1 was highly conserved.
Identification of pathways involved in the structural transitions of biomolecular systems is often complicated by the transient nature of the conformations visited across energy barriers and the multiplicity of paths accessible in the multidimensional energy landscape. This task becomes even more challenging in exploring molecular systems on the order of megadaltons. Coarse-grained models that lend themselves to analytical solutions appear to be the only possible means of approaching such cases. Motivated by the utility of elastic network models for describing the collective dynamics of biomolecular systems and by the growing theoretical and experimental evidence in support of the intrinsic accessibility of functional substates, we introduce a new method, adaptive anisotropic network model (aANM), for exploring functional transitions. Application to bacterial chaperonin GroEL and comparisons with experimental data, results from action minimization algorithm, and previous simulations support the utility of aANM as a computationally efficient, yet physically plausible, tool for unraveling potential transition pathways sampled by large complexes/assemblies. An important outcome is the assessment of the critical inter-residue interactions formed/broken near the transition state(s), most of which involve conserved residues.
Several vanadium compounds are known for their hypoglycemic and anticancer effects. However, the mechanisms of their pharmacological and toxicological effects are not clear. In this work, we investigated the potential targets of vanadium in mitochondria. Vanadyl ions were found to bind to mitochondria from rat liver with a stoichiometry of 244±58 nmol/mg protein and an apparent dissociation constant (Kd) of (2.0±0.8)×10⁻¹⁶ mol/L. Using size exclusion chromatography, a vanadium-binding protein was isolated and identified as the 60-kDa heat shock protein (HSP60) by mass spectrometry analysis and immunoassays. Additionally, binding of vanadyl ions was found to result in depolymerization of homo-oligomeric HSP60 (GroEL). HSP60 is an indispensable molecular chaperone and is involved in the pathogenesis of many inflammatory and autoimmune diseases, e.g. type 1 diabetes. Our results suggest that HSP60 could be a novel and important target involved in the biological and/or toxicological effects of vanadium compounds.
J. Molenaar (Jan); J. Koster (Jan); D. Zwijnenburg (Danny); P. van Sluis (Peter); L.J. Valentijn (Linda); I. van der Ploeg (Ida); M. Hamdi (Mohamed); J. van Nes (Johan); B.A. Westerman (Bart); J. van Arkel (Jennemiek); M.E. Ebus; F. Haneveld (Franciska); A. Lakeman (Arjan); L. Schild (Linda); P. Molenaar (Piet); P. Stroeken (Peter); M.M. van Noesel (Max); I. Øra (Ingrid); J.P. di Santo (James); H.N. Caron (Huib); E.M. Westerhout (Ellen); R. Versteeg (Rogier)
Neuroblastoma is a childhood tumour of the peripheral sympathetic nervous system. The pathogenesis has for a long time been quite enigmatic, as only very few gene defects were identified in this often lethal tumour. Frequently detected gene alterations are limited to MYCN amplification (
Dumas, Laura; Dickens, C Michael; Anderson, Nathan; Davis, Jonathan; Bennett, Beth; Radcliffe, Richard A; Sikela, James M
It has been well documented that genetic factors can influence predisposition to develop alcoholism. While the underlying genomic changes may be of several types, two of the most common and disease associated are copy number variations (CNVs) and sequence alterations of protein coding regions. The goal of this study was to identify CNVs and single-nucleotide polymorphisms that occur in gene coding regions that may play a role in influencing the risk of an individual developing alcoholism. Toward this end, two mouse strains were used that have been selectively bred based on their differential sensitivity to alcohol: the Inbred long sleep (ILS) and Inbred short sleep (ISS) mouse strains. Differences in initial response to alcohol have been linked to risk for alcoholism, and the ILS/ISS strains are used to investigate the genetics of initial sensitivity to alcohol. Array comparative genomic hybridization (arrayCGH) and exome sequencing were conducted to identify CNVs and gene coding sequence differences, respectively, between ILS and ISS mice. Mouse arrayCGH was performed using catalog Agilent 1 × 244 k mouse arrays. Subsequently, exome sequencing was carried out using an Illumina HiSeq 2000 instrument. ArrayCGH detected 74 CNVs that were strain-specific (38 ILS/36 ISS), including several ISS-specific deletions that contained genes implicated in brain function and neurotransmitter release. Among several interesting coding variations detected by exome sequencing was the gain of a premature stop codon in the alpha-amylase 2B (AMY2B) gene specifically in the ILS strain. In total, exome sequencing detected 2,597 and 1,768 strain-specific exonic gene variants in the ILS and ISS mice, respectively. This study represents the most comprehensive and detailed genomic comparison of ILS and ISS mouse strains to date. The two complementary genome-wide approaches identified strain-specific CNVs and gene coding sequence variations that should provide strong candidates to
Molecular diagnosis for hereditary breast and ovarian cancer (HBOC) involves systematic DNA sequencing of predisposition genes like BRCA1 or BRCA2. Deleterious mutations within such genes are responsible for developing the disease, but other sequence variants can also be identified. Common single nucleotide polymorphisms (SNPs) are usually present in the human genome, defining alleles whose frequencies vary widely in different populations. Whether intragenic or intronic, silent or generating amino acid substitutions, SNPs cannot by themselves be assigned a predisposition status. However, prevalent SNPs can be used to define gene haplotypes, which also occur at various frequencies. Since some mutations can easily be assigned to haplotypes (such is the case for the BRCA1 gene), SNPs can therefore provide useful information in interpreting the effects of gene mutations on hereditary predisposition to cancer. Here we describe 10 BRCA2 SNPs identified by complete gene sequencing
De La Torre, Amanda R; Lin, Yao-Cheng; Van de Peer, Yves; Ingvarsson, Pär K
The recent sequencing of several gymnosperm genomes has greatly facilitated studying the evolution of their genes and gene families. In this study, we examine the evidence for expression-mediated selection in the first two fully sequenced representatives of the gymnosperm plant clade (Picea abies and Picea glauca). We use genome-wide estimates of gene expression (>50,000 expressed genes) to study the relationship between gene expression, codon bias, rates of sequence divergence, protein length, and gene duplication. We found that gene expression is correlated with rates of sequence divergence and codon bias, suggesting that natural selection is acting on Picea protein-coding genes for translational efficiency. Gene expression, rates of sequence divergence, and codon bias are correlated with the size of gene families, with large multicopy gene families having, on average, a lower expression level and breadth, lower codon bias, and higher rates of sequence divergence than single-copy gene families. Tissue-specific patterns of gene expression were more common in large gene families with large gene expression divergence than in single-copy families. Recent family expansions combined with large gene expression variation in paralogs and increased rates of sequence evolution suggest that some Picea gene families are rapidly evolving to cope with biotic and abiotic stress. Our study highlights the importance of gene expression and natural selection in shaping the evolution of protein-coding genes in Picea species, and sets the ground for further studies investigating the evolution of individual gene families in gymnosperms.
Hu, Jiaqing; Yang, Dandan; Chen, Wei; Li, Chuanhao; Wang, Yandong; Zeng, Yongqing; Wang, Hui
There is little genomic information regarding gene expression differences at the whole blood transcriptome level of different pig breeds at the neonatal stage. To solve this, we characterized differentially expressed genes (DEGs) in the whole blood of Dapulian (DPL) and Landrace piglets using RNA-seq (RNA-sequencing) technology. In this study, 83 DEGs were identified between the two breeds. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses identified immun...
Richards, Stephen; Liu, Yue; Bettencourt, Brian R.; Hradecky, Pavel; Letovsky, Stan; Nielsen, Rasmus; Thornton, Kevin; Todd, Melissa J.; Chen, Rui; Meisel, Richard P.; Couronne, Olivier; Hua, Sujun; Smith, Mark A.; Bussemaker, Harmen J.; van Batenburg, Marinus F.; Howells, Sally L.; Scherer, Steven E.; Sodergren, Erica; Matthews, Beverly B.; Crosby, Madeline A.; Schroeder, Andrew J.; Ortiz-Barrientos, Daniel; Rives, Catherine M.; Metzker, Michael L.; Muzny, Donna M.; Scott, Graham; Steffen, David; Wheeler, David A.; Worley, Kim C.; Havlak, Paul; Durbin, K. James; Egan, Amy; Gill, Rachel; Hume, Jennifer; Morgan, Margaret B.; Miner, George; Hamilton, Cerissa; Huang, Yanmei; Waldron, Lenee; Verduzco, Daniel; Blankenburg, Kerstin P.; Dubchak, Inna; Noor, Mohamed A.F.; Anderson, Wyatt; White, Kevin P.; Clark, Andrew G.; Schaeffer, Stephen W.; Gelbart, William; Weinstock, George M.; Gibbs, Richard A.
The genome sequence of a second fruit fly, D. pseudoobscura, presents an opportunity for comparative analysis of a primary model organism D. melanogaster. The vast majority of Drosophila genes have remained on the same arm, but within each arm gene order has been extensively reshuffled leading to the identification of approximately 1300 syntenic blocks. A repetitive sequence is found in the D. pseudoobscura genome at many junctions between adjacent syntenic blocks. Analysis of this novel repetitive element family suggests that recombination between offset elements may have given rise to many paracentric inversions, thereby contributing to the shuffling of gene order in the D. pseudoobscura lineage. Based on sequence similarity and synteny, 10,516 putative orthologs have been identified as a core gene set conserved over 35 My since divergence. Genes expressed in the testes had higher amino acid sequence divergence than the genome wide average consistent with the rapid evolution of sex-specific proteins. Cis-regulatory sequences are more conserved than control sequences between the species but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible. Overall, a picture of repeat mediated chromosomal rearrangement, and high co-adaptation of both male genes and cis-regulatory sequences emerges as important themes of genome divergence between these species of Drosophila.
S. A. Wani
Aim: The present study was carried out to sequence the CXCR2 gene of Murrah buffalo and perform a phylogenetic analysis. Materials and Methods: For the present investigation, from a group of forty-eight Murrah buffaloes (Bubalus bubalis), blood samples were collected randomly from eight animals, of which four were healthy and four were mastitic. Results: The amplification of the Interleukin-8B (IL-8B) receptor gene target sequence was carried out using the primer pair in an optimized polymerase chain reaction. Partial sequencing of the IL-8B receptor gene of Bubalus bubalis (Murrah) was done successfully. The sequences of the IL-8B receptor gene showed 99% homology to that of Bos indicus × Bos taurus, 98% to that of Bos taurus, 97% to that of Ovis aries, 93% to that of Sus scrofa, 92% to that of Equus caballus and 90% to that of Felis catus. Conclusion: From the present study it can be concluded that the PCR amplification procedure for the target region of the IL-8B receptor gene, yielding a 459 bp product, has been standardized and gives consistent and specific amplification. Amplification of the partial IL-8B receptor gene (exon 2, 459 bp) using self-designed primers specific for the cattle ortholog sequence signifies that the locus is conserved in cattle and buffaloes. In the phylogenetic tree, the target sequence of the IL-8B receptor gene of Bubalus bubalis was found to be more closely related to Bos indicus × Bos taurus and Bos taurus than to Ovis aries and Sus scrofa.
Suzuki, Akinori; Komata, Hidero; Iwashita, Shogo; Seto, Shotaro; Ikeya, Hironobu; Tabata, Mitsutoshi; Kitano, Takashi
In vertebrates, there are four major genes in the RH (Rhesus) gene family, RH, RHAG, RHBG, and RHCG. These genes are thought to have been formed by the two rounds of whole-genome duplication (2R-WGD) in the common ancestor of all vertebrates. In our previous work, where we analyzed details of the gene duplication process of this gene family, three nucleotide sequences belonging to this family were identified in Far Eastern brook lamprey (Lethenteron reissneri), and the phylogenetic positions of the genes were determined. Lampreys, along with hagfishes, are cyclostomata (jawless fishes), which is a sister group of gnathostomata (jawed vertebrates). Although those results suggested that one gene was orthologous to the gnathostome RHCG genes, we did not identify clear orthologues for other genes. In this study, therefore, we identified three novel cDNA sequences that belong to the RH gene family using de novo transcriptome analysis of another cyclostome: the brown hagfish (Eptatretus atami). We also determined the nucleotide sequences for the RHBG and RHCG genes in a red stingray (Dasyatis akajei), which belongs to the cartilaginous fishes. The phylogenetic tree showed that two brown hagfish genes, which were probably duplicated in the cyclostome lineage, formed a cluster with the gnathostome RHAG genes, whereas another brown hagfish gene formed a cluster with the gnathostome RHCG genes. We estimated that the RH genes had a higher evolutionary rate than the RHAG, RHBG, and RHCG genes. Interestingly, in the RHBG genes, only the bird lineage showed a higher rate of nonsynonymous substitutions. It is likely that this higher rate was caused by a state of relaxed functional constraints rather than by positive selection or pseudogenization.
Wang, Qingguo; Xia, Junfeng; Jia, Peilin; Pao, William; Zhao, Zhongming
Gene fusions are important genomic events in human cancer because their fusion gene products can drive the development of cancer and thus are potential prognostic tools or therapeutic targets in anti-cancer treatment. Major advancements have been made in computational approaches for fusion gene discovery over the past 3 years due to improvements and widespread applications of high-throughput next generation sequencing (NGS) technologies. To identify fusions from NGS data, existing methods typically leverage the strengths of both sequencing technologies and computational strategies. In this article, we review the NGS and computational features of existing methods for fusion gene detection and suggest directions for future development.
Isaki, L; Kawakami, M; Beers, R; Hom, R; Wu, H.C.
In Escherichia coli, prolipoprotein signal peptidase is encoded by the lsp gene, which is organized into an operon consisting of ileS, lsp, and three open reading frames, designated genes x, orf-149, and orf-316. The Enterobacter aerogenes lsp gene was cloned and expressed in E. coli. The nucleotide sequence of the Enterobacter aerogenes lsp gene and a part of its flanking sequences were determined. A high degree of homology was found between the E. coli ileS-lsp operon and the corresponding ...
Gene conversion (GCV) as a mechanism of immunoglobulin diversification is well established in a few species. However, definitive evidence of GCV-like events in human immunoglobulin genes is scarce. GCV is mediated by activation-induced cytidine deaminase (AID). The lack of evidence of GCV in human rearranged immunoglobulin gene sequences is puzzling given the presence of highly similar germline donors and all the enzymatic machinery required for GCV. In this study, we undertook a computational analysis of rearranged IGHV3-23*01 gene sequences from common variable immunodeficiency (CVID) patients and healthy individuals to survey ‘GCV-like’ activities. Our search identified strong evidence of GCV-like patterns. Germline VH sequences were identified as potential donors for clustered mutations in rearranged IGHV3-23*01 gene sequences. We identified minimum and maximum sequence identities between donor and recipient sequences that can serve as targets for GCV, and our findings are consistent with those reported in the literature. We observed that GCV-like tracts are flanked by AID hotspot motifs. Structural modeling of the IGHV3-23*01 gene sequence revealed that hypermutable bases flanking GCV-like tracts are in the single stranded DNA (ssDNA) of stable stem-loop structures (SLSs). SsDNA is inherently fragile and also an optimal target for AID. We speculate that GCV could have been initiated by the targeting of hypermutable bases in the ssDNA state in stable SLSs, plausibly by AID. We have observed that the frequency of GCV-like events is significantly higher in rearranged IGHV3-23*01 sequences from healthy individuals compared to that of CVID patients. GCV, unlike SHM, can result in multiple base substitutions that can alter many amino acids. The extensive changes in antibody affinity by GCV-like events, as identified in this study would be instrumental in protecting humans against pathogens that diversify their genome by
Kothapalli, Naga Rama; Fugmann, Sebastian D
Activation-induced cytidine deaminase (AID) is a key enzyme for antibody-mediated immune responses. Antibodies are encoded by the immunoglobulin genes, and AID acts as a transcription-dependent DNA mutator on these genes to improve antibody affinity and effector functions. An emerging theme in the field is that many transcribed genes are potential targets of AID, presenting an obvious danger to genomic integrity. Thus there are mechanisms in place to ensure that mutagenic outcomes of AID activity are specifically restricted to the immunoglobulin loci. Cis-regulatory targeting elements mediate this effect, and their mode of action is probably a combination of immunoglobulin gene-specific activation of AID and a perversion of faithful DNA repair towards error-prone outcomes.
Oct 18, 2007 ... 1Department of Environmental Biotechnology, Bharathidasan University, ... script and UA cloning vector (QIAGEN PCR Cloning Kit) was used to clone ..... Expression of a Arabidopsis sucrose synthase gene indicates a role.
Recently, a novel species, Mycobacterium yongonense (DSM 45126(T)), was introduced. While it is phylogenetically related to Mycobacterium intracellulare, it has a distinct RNA polymerase β-subunit gene (rpoB) sequence that is identical to that of Mycobacterium parascrofulaceum, a distantly related scotochromogen, which suggests the acquisition of the rpoB gene via a potential lateral gene transfer (LGT) event. The aim of this study is to prove the presence of the LGT event in the rpoB gene of the M. yongonense strains via multilocus sequence analysis (MLSA). In order to determine the potential of an LGT event in the rpoB gene of M. yongonense, MLSA based on full rpoB sequences (3447 or 3450 bp) and on partial sequences of five other targets [16S rRNA (1383 or 1395 bp), hsp65 (603 bp), dnaJ (192 bp), recA (1053 bp), and sodA (501 bp)] was conducted. Incongruences between the phylogenetic analysis of the full rpoB and the five other genes in a total of three M. yongonense strains [two clinical strains (MOTT-12 and MOTT-27) and one type strain (DSM 45126(T))] were observed, suggesting that the rpoB gene of the three M. yongonense strains may have been acquired very recently via an LGT event from M. parascrofulaceum.
Feb 16, 2012 ... insecticidal toxin coding gene was cloned from an Egyptian isolate of B. thuringiensis. Sequence analyses of the cryIC ... insecticidal proteins into crop plants to protect them from ..... eggplant, cotton, cucumber and tomato, etc.
Pérez del Molino Bernal, IC; Cano García, María Eliecer; García de la Fuente, C; Martínez Martínez, Luis; López, Monica; Fernández Mazarrasa, Carlos; Agüero Balbín, Jesús
Recurrent bloodstream infections caused by a Gram-positive bacterium affected an immunocompromised child. Tsukamurella pulmonis was the microorganism identified by secA1 gene sequencing. Antibiotic treatment in combination with removal of the subcutaneous port healed the patient.
Background: Salamanders are unique among vertebrates in their ability to completely regenerate amputated limbs through the mediation of blastema cells located at the stump ends. This regeneration is nerve-dependent because blastema formation and regeneration do not occur after limb denervation. To obtain the genomic information of blastema tissues, de novo transcriptomes from both blastema tissues and denervated stump ends of Ambystoma mexicanum (axolotls) 14 days post-amputation were sequenced and compared using Solexa DNA sequencing. Results: The sequencing done for this study produced 40,688,892 reads that were assembled into 307,345 transcribed sequences. The N50 of transcribed sequence length was 562 bases. A similarity search with known proteins identified 39,200 different genes to be expressed during limb regeneration with a cut-off E-value exceeding 10^-5. We annotated assembled sequences by using gene descriptions, gene ontology, and clusters of orthologous group terms. Targeted searches using these annotations showed that the majority of the genes were in the categories of essential metabolic pathways, transcription factors and conserved signaling pathways, and novel candidate genes for regenerative processes. We discovered and confirmed numerous sequences of the candidate genes by using quantitative polymerase chain reaction and in situ hybridization. Conclusion: The results of this study demonstrate that de novo transcriptome sequencing allows gene expression analysis in a species lacking genome information and provides the most comprehensive mRNA sequence resources for axolotls. The characterization of the axolotl transcriptome can help elucidate the molecular mechanisms underlying blastema formation during limb regeneration. PMID:23815514
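The N50 figure quoted in this abstract is a standard assembly summary statistic: the length L such that sequences of length L or longer account for at least half of all assembled bases. A minimal Python sketch of that calculation follows. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the study.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L together
    contain at least half of the total number of bases.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("n50() needs at least one sequence length")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy example (hypothetical lengths, not data from the abstract)
print(n50([2, 3, 4, 5, 6]))
```

Sorting from longest to shortest and stopping at the first length that pushes the running total past half the assembly is the usual way to compute the statistic, so a few very long transcripts can raise N50 even when most sequences are short.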
Kanda, Kojun; Pflug, James M; Sproul, John S; Dasenko, Mark A; Maddison, David R
In this paper we explore high-throughput Illumina sequencing of nuclear protein-coding, ribosomal, and mitochondrial genes in small, dried insects stored in natural history collections. We sequenced one tenebrionid beetle and 12 carabid beetles ranging in size from 3.7 to 9.7 mm in length that have been stored in various museums for 4 to 84 years. Although we chose a number of old, small specimens for which we expected low sequence recovery, we successfully recovered at least some low-copy nuclear protein-coding genes from all specimens. For example, in one 56-year-old beetle, 4.4 mm in length, our de novo assembly recovered about 63% of approximately 41,900 nucleotides in a target suite of 67 nuclear protein-coding gene fragments, and 70% using a reference-based assembly. Even in the least successfully sequenced carabid specimen, reference-based assembly yielded fragments that were at least 50% of the target length for 34 of 67 nuclear protein-coding gene fragments. Exploration of alternative references for reference-based assembly revealed few signs of bias created by the reference. For all specimens we recovered almost complete copies of ribosomal and mitochondrial genes. We verified the general accuracy of the sequences through comparisons with sequences obtained from PCR and Sanger sequencing, including of conspecific, fresh specimens, and through phylogenetic analysis that tested the placement of sequences in predicted regions. A few possible inaccuracies in the sequences were detected, but these rarely affected the phylogenetic placement of the samples. Although our sample sizes are low, an exploratory regression study suggests that the dominant factor in predicting success at recovering nuclear protein-coding genes is a high number of Illumina reads, with success at PCR of COI and killing by immersion in ethanol being secondary factors; in analyses of only high-read samples, the primary significant explanatory variable was body length, with small beetles
Fredrick, Kurt; Ibba, Michael
Sixty-one codons specify 20 amino acids, offering cells many options for encoding a polypeptide sequence. Two new studies (Cannarozzi et al., 2010; Tuller et al., 2010) now foster the idea that patterns of codon usage can control ribosome speed, fine-tuning translation to increase the efficiency...
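The counts in the opening sentence can be checked directly against the standard genetic code. The Python sketch below builds the standard codon table from its usual compact encoding (bases ordered T, C, A, G, with `*` marking stop codons) and counts sense codons and amino acids. The variable and function names are illustrative assumptions. It says nothing about the codon usage patterns the two cited studies analyze.

```python
from itertools import product

# Standard genetic code in the conventional compact form:
# codons enumerated with bases in T, C, A, G order, '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# Sense codons are those that encode an amino acid rather than a stop.
SENSE_CODONS = {codon: aa for codon, aa in CODON_TABLE.items() if aa != "*"}


def synonymous_codons(amino_acid):
    """Return every codon that encodes the given one-letter amino acid."""
    return sorted(c for c, aa in SENSE_CODONS.items() if aa == amino_acid)


print(len(SENSE_CODONS), "sense codons")
print(len(set(SENSE_CODONS.values())), "amino acids")
print("Leucine codons:", synonymous_codons("L"))
```

The redundancy is uneven: leucine has six synonymous codons while methionine has one, which is why there is room for codon choice to vary along a coding sequence.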
Jagielska, Elżbieta; Płucienniczak, Andrzej; Dąbrowska, Magdalena; Dowierciał, Anna; Rode, Wojciech
Thymidylate synthase is a housekeeping gene, designated ancient due to its role in DNA synthesis and ubiquitous phyletic distribution. The genomic sequences were characterized coding for thymidylate synthase in two species of the genus Trichinella, an encapsulating T. spiralis and a non-encapsulating T. pseudospiralis. Based on the sequence of parasitic nematode Trichinella spiralis thymidylate synthase cDNA, PCR techniques were employed. Each of the respective gene structures encompassed 6 exons and 5 introns located in conserved sites. Comparison with the corresponding gene structures of other eukaryotic species revealed lack of common introns that would be shared among selected fungi, nematodes, mammals and plants. The two deduced amino acid sequences were 96% identical. In addition to the thymidylate synthase gene, the intron-less retrocopy, i.e. a processed pseudogene, with sequence identical to the T. spiralis gene coding region, was found to be present within the T. pseudospiralis genome. This pseudogene, instead of the gene, was confirmed by RT-PCR to be expressed in the parasite muscle larvae. Intron load, as well as distribution of exon and intron phases in thymidylate synthase genes from various sources, point against the theory of gene assembly by the primordial exon shuffling and support the theory of evolutionary late intron insertion into spliceosomal genes. Thymidylate synthase pseudogene expressed in T. pseudospiralis muscle larvae is designated a retrogene.
Naser, S; Thompson, F L; Hoste, B; Gevers, D; Vandemeulebroecke, K; Cleenwerck, I; Thompson, C C; Vancanneyt, M; Swings, J
The relatedness among 91 Enterococcus strains representing all validly described species was investigated by comparing a 1,102-bp fragment of atpA, the gene encoding the alpha subunit of ATP synthase. The relationships observed were in agreement with the phylogeny inferred from 16S rRNA gene sequence analysis. However, atpA gene sequences were much more discriminatory than 16S rRNA for species differentiation. All species were differentiated on the basis of atpA sequences with, at a maximum, 92% similarity. Six members of the Enterococcus faecium species group (E. faecium, E. hirae, E. durans, E. villorum, E. mundtii, and E. ratti) showed > 99% 16S rRNA gene sequence similarity, but the highest value of atpA gene sequence similarity was only 89.9%. The intraspecies atpA sequence similarities for all species except E. faecium strains varied from 98.6 to 100%; the E. faecium strains had a lower atpA sequence similarity of 96.3%. Our data clearly show that atpA provides an alternative tool for the phylogenetic study and identification of enterococci.
Zhao, Ya-e; Wang, Zheng-hang; Xu, Yang; Xu, Ji-ru; Liu, Wen-yan; Wei, Meng; Wang, Chu-ying
To our knowledge, few reports on Demodex studied at the molecular level are available at present. In this study our group, for the first time, cloned, sequenced and analyzed the chitin synthase (CHS) gene fragments of Demodex folliculorum, Demodex brevis, and Demodex canis (three isolates from each species) from Xi'an China, by designing specific primers based on the only partial sequence of the CHS gene of D. canis from Japan, retrieved from GenBank. Results show that amplification was successful only in three D. canis isolates and one D. brevis isolate out of the nine Demodex isolates. The obtained fragments were sequenced to be 339 bp for D. canis and 338 bp for D. brevis. The CHS gene sequence similarities between the three Xi'an D. canis isolates and one Japanese D. canis isolate ranged from 99.7% to 100.0%, and those between four D. canis isolates and one D. brevis isolate were 99.1%-99.4%. Phylogenetic trees based on maximum parsimony (MP) and maximum likelihood (ML) methods shared the same clusters, according with the traditional classification. Two open reading frames (ORFs) were identified in each CHS gene sequenced, and their corresponding amino acid sequences were located at the catalytic domain. The relatively conserved sequences could be deduced to be a CHS class A gene, which is associated with chitin synthesis in the integument of Demodex mites.
Leptin is a 16 kD protein, synthesized by adipose tissue, that is involved in regulation of feed intake, energy balance, fertility and immune functions. The present study was undertaken with the objectives of sequence characterization and studying the nucleotide variation in the leptin gene in Murrah buffalo. The leptin gene consists of three exons and two introns and spans about 18.9 kb; the first exon is not translated into protein. In buffaloes, the leptin gene is located on chromosome eight and maps to BBU 8q32. The leptin gene was amplified by PCR using oligonucleotide primers to obtain a 289 bp fragment comprising exon 2 and a 405 bp fragment containing exon 3 of the leptin gene. The amplicons were sequenced to identify variation at the nucleotide level. Sequence comparison of buffalo with cattle reveals variation at five nucleotide sequences at positions 983, 1083, 1147, 1152, 1221, and all the SNPs are synonymous, resulting in no change in amino acids. Three of these eight nucleotide variations have been reported for the first time in buffalo. The results indicate conservation of DNA sequence between cattle and buffalo. Nucleotide sequence variations observed at the leptin gene between Bubalus bubalis and Bos taurus species revealed 97% nucleotide identity.
J-P. Julien (Jean-Pierre); F. Cote; L. Beaudet (Lucille); M. Sidky (Malak); D. Flavell (David); F.G. Grosveld (Frank); W. Mushynski (Walter)
We have determined the complete nucleotide sequence of the mouse gene encoding the neurofilament NF-H protein. The C-terminal domain of NF-H is very rich in charged amino acids (aa) and contains a 3-aa sequence, Lys-Ser-Pro, that is repeated 51 times within a stretch of 368 aa. The
Uil, T.G.; Haisma, H.J.; Rots, Marianne
Designer molecules that can specifically target pre-determined DNA sequences provide a means to modulate endogenous gene function. Different classes of sequence-specific DNA-binding agents have been developed, including triplex-forming molecules, synthetic polyamides and designer zinc finger protein
Wang, Cai-Yun; Ma, Yun-Han; He, Da-Jian; Yang, Shi-Hua
In this paper, partial sequences of the tree shrew (Tupaia belangeri) Klf4, Sox2, and c-Myc genes were cloned and sequenced; they were 382, 612, and 485 bp in length and encoded 127, 204, and 161 amino acids, respectively. Their cDNA sequence identities with those of human were 89%, 98%, and 89%, respectively. The phylogenetic trees indicated different topologies and suggested individual evolutionary pathways. These results can facilitate further functional studies.
Yang, Ching-chia; Sakai, Hiroaki; Numa, Hisataka; Itoh, Takeshi
Although a large number of genes are expected to correctly solve a phylogenetic relationship, inconsistent gene tree topologies have been observed. This conflicting evidence in gene tree topologies, known as gene tree discordance, becomes increasingly important as advanced sequencing technologies produce an enormous amount of sequence information for phylogenomic studies among closely related species. Here, we aim to characterize the gene tree discordance of the Asian cultivated rice Oryza sativa and its progenitor, O. rufipogon, which will be an ideal case study of gene tree discordance. Using genome and cDNA sequences of O. sativa and O. rufipogon, we have conducted the first in-depth analyses of gene tree discordance in Asian rice. Our comparison of full-length cDNA sequences of O. rufipogon with the genome sequences of the japonica and indica cultivars of O. sativa revealed that 60% of the gene trees showed a topology consistent with the expected one, whereas the remaining genes supported significantly different topologies. Moreover, the proportions of the topologies deviated significantly from expectation, suggesting at least one hybridization event between the two subgroups of O. sativa, japonica and indica. In fact, a genome-wide alignment between japonica and indica indicated that significant portions of the indica genome are derived from japonica. In addition, literature concerning the pedigree of the indica cultivar strongly supported the hybridization hypothesis. Our molecular evolutionary analyses deciphered complicated evolutionary processes in closely related species. They also demonstrated the importance of gene tree discordance in the era of high-speed DNA sequencing.
Xiufeng Zhong; Yongping Li; Shuqi Huang; Bo Ning; Chunyan Zhang; Jianliang Zheng; Guanguang Feng
Purpose: To clone the variable region gene of the light chain of a monoclonal antibody against human retinoblastoma and to characterize its nucleotide and amino acid sequences. Methods: Total RNA was extracted from 3C6 hybridoma cells secreting a specific monoclonal antibody (McAb) against human retinoblastoma (RB), then reverse transcribed into cDNA with oligo-dT primers. The variable region of the light chain (VL) gene fragments was amplified using polymerase chain reaction (PCR) and further cloned into the pGEM(R)-T Easy vector. Then, 3C6 VL cDNA was sequenced by Sanger's method. Homology analysis was done by NCBI BLAST. Results: The complete nucleotide sequence of 3C6 VL cDNA consisted of 321 bp encoding 107 amino acid residues, containing four framework regions (FRs) and three complementarity-determining regions (CDRs) as well as the typical structure of two Cys residues. The sequence is most homologous to a member of the Vk9 gene family, and its chain utilizes the Jk1 gene segment. Conclusion: The light chain variable region gene of the McAb against human RB was amplified successfully; it belongs to the Vk9 gene family and utilizes Vk-Jk1 gene rearrangement. This study lays a good basis for constructing a recombinant antibody and for making new targeted therapeutic agents against retinoblastoma.
The objective of this project is the development of practical software to automate the identification of genes in anonymous DNA sequences from the human and other higher eukaryotic genomes. A software system for automated sequence analysis, gm (gene modeler), has been designed, implemented, tested, and distributed to several dozen laboratories worldwide. A significantly faster, more robust, and more flexible version of this software, gm 2.0, has now been completed and is being tested by operational use to analyze human cosmid sequence data. A range of efforts to further understand the features of eukaryotic gene sequences are also underway. This progress report also contains papers coming out of the project, including the following: gm: a Tool for Exploratory Analysis of DNA Sequence Data; The Human THE-LTR(O) and MstII Interspersed Repeats are subfamilies of a single widely distributed highly variable repeat family; Information contents and dinucleotide compositions of plant intron sequences vary with evolutionary origin; Splicing signals in Drosophila: intron size, information content, and consensus sequences; Integration of automated sequence analysis into mapping and sequencing projects; Software for the C. elegans genome project.
Lu, Cairui; Zou, Changsong; Song, Guoli
Traditional gene mapping using forward genetic approaches is conducted primarily through construction of a genetic linkage map, a process that is tedious and time-consuming and often results in low mapping accuracy and large mapping intervals. With the rapid development of high-throughput sequencing technology and the decreasing cost of sequencing, a variety of simple and quick methods of gene mapping through sequencing have been developed, including direct sequencing of the mutant genome, sequencing of selective mutant DNA pools, genetic map construction through sequencing of individuals in a population, and sequencing of the transcriptome and partial genome. These methods can be used to identify mutations at the nucleotide level and have been applied in complex genetic backgrounds. Recent reports have shown that mapping by sequencing can even be done without a reference genome sequence, hybridization, or genetic linkage information, which makes it possible to perform forward genetic studies in many non-model species. In this review, we summarize these new technologies and their application in gene mapping.
Petrovski, Slavé; Gussow, Ayal B; Wang, Quanli; Halvorsen, Matt; Han, Yujun; Weir, William H; Allen, Andrew S; Goldstein, David B
Noncoding sequence contains pathogenic mutations. Yet, compared with mutations in protein-coding sequence, pathogenic regulatory mutations are notoriously difficult to recognize. Most fundamentally, we are not yet adept at recognizing the sequence stretches in the human genome that are most important in regulating the expression of genes. For this reason, it is difficult to apply to the regulatory regions the same kinds of analytical paradigms that are being successfully applied to identify mutations among protein-coding regions that influence risk. To determine whether dosage sensitive genes have distinct patterns among their noncoding sequence, we present two primary approaches that focus solely on a gene's proximal noncoding regulatory sequence. The first approach is a regulatory sequence analogue of the recently introduced residual variation intolerance score (RVIS), termed noncoding RVIS, or ncRVIS. The ncRVIS compares observed and predicted levels of standing variation in the regulatory sequence of human genes. The second approach, termed ncGERP, reflects the phylogenetic conservation of a gene's regulatory sequence using GERP++. We assess how well these two approaches correlate with four gene lists that use different ways to identify genes known or likely to cause disease through changes in expression: 1) genes that are known to cause disease through haploinsufficiency, 2) genes curated as dosage sensitive in ClinGen's Genome Dosage Map, 3) genes judged likely to be under purifying selection for mutations that change expression levels because they are statistically depleted of loss-of-function variants in the general population, and 4) genes judged unlikely to cause disease based on the presence of copy number variants in the general population. We find that both noncoding scores are highly predictive of dosage sensitivity using any of these criteria. In a similar way to ncGERP, we assess two ensemble-based predictors of regional noncoding importance, nc
Boccia, Angelo; Petrillo, Mauro; di Bernardo, Diego; Guffanti, Alessandro; Mignone, Flavio; Confalonieri, Stefano; Luzi, Lucilla; Pesole, Graziano; Paolella, Giovanni; Ballabio, Andrea; Banfi, Sandro
The identification and study of evolutionarily conserved genomic sequences that surround disease-related genes is a valuable tool to gain insight into the functional role of these genes and to better elucidate the pathogenetic mechanisms of disease. We created the DG-CST (Disease Gene Conserved Sequence Tags) database for the identification and detailed annotation of human–mouse conserved genomic sequences that are localized within or in the vicinity of human disease-related genes. CSTs are defined as sequences that show at least 70% identity between human and mouse over a length of at least 100 bp. The database contains CST data relative to over 1088 genes responsible for monogenetic human genetic diseases or involved in the susceptibility to multifactorial/polygenic diseases. DG-CST is accessible via the internet at http://dgcst.ceinge.unina.it/ and may be searched using both simple and complex queries. A graphic browser allows direct visualization of the CSTs and related annotations within the context of the relative gene and its transcripts. PMID:15608249
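The CST definition above (at least 70% identity between human and mouse over at least 100 bp) can be expressed as a simple filter on an aligned segment. The Python sketch below is illustrative only: the function names, the choice to compute identity over alignment columns, and the treatment of gap characters as mismatches are assumptions, not the DG-CST pipeline's actual rules.

```python
def percent_identity(seq_a, seq_b):
    """Percent of alignment columns where both sequences carry the same base.

    Both inputs are rows of one pairwise alignment, so they must have equal
    length; '-' marks a gap and never counts as a match (an assumption).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    if not seq_a:
        return 0.0
    matches = sum(
        1
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a == b and a != "-"
    )
    return 100.0 * matches / len(seq_a)


def is_cst(human_aln, mouse_aln, min_identity=70.0, min_length=100):
    """Apply the two thresholds quoted in the abstract to one aligned segment."""
    return (
        len(human_aln) >= min_length
        and percent_identity(human_aln, mouse_aln) >= min_identity
    )


# Toy segment (hypothetical): 100 columns, 70 of them identical
human = "A" * 100
mouse = "A" * 70 + "C" * 30
print(percent_identity(human, mouse), is_cst(human, mouse))
```

Keeping the thresholds as keyword arguments makes it easy to see how the candidate set would change under a stricter or looser definition.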
2. Department of Pure and Applied Zoology, Federal University of Agriculture, Abeokuta, Nigeria. 3. ... Keywords: Internal transcribed spacer genes, phylogenetic, genetic ... ization of fungi by polymerase chain reaction (PCR) am- .... Basic Local Alignment search Tool (BLAST) to establish ..... Population structure and.
Davidsen, T.; Rodland, E.A.; Lagesen, K.
in these organisms. Pasteurella multocida also displayed high frequencies of a putative DUS identical to that previously identified in H. influenzae and with a skewed distribution towards genome maintenance genes, indicating that this bacterium might be transformation competent under certain conditions....
De Leo, A.A.; Wheeler, D.; Lefevre, C.; Cheng, Jan-Fang; Hope, R.; Kuliwaba, J.; Nicholas, K.R.; Westermanc, M.; Graves, J.A.M.
Comparing globin genes and their flanking sequences across many species has allowed globin gene evolution to be reconstructed in great detail. Marsupial globin sequences have proved to be of exceptional significance. A previous finding of a beta-like omega gene in the alpha cluster in the tammar wallaby suggested that the alpha and beta cluster evolved via genome duplication and loss rather than tandem duplication. To confirm and extend this important finding we isolated and sequenced BACs containing the alpha and beta loci from the distantly related Australian marsupial Sminthopsis macroura. We report that the alpha gene lies in the same BAC as the beta-like omega gene, implying that the alpha-omega juxtaposition is likely to be conserved in all marsupials. The LUC7L gene was found 3' of the S. macroura alpha locus, a gene order shared with humans but not mouse, chicken or fugu. Sequencing a BAC contig that contained the S. macroura beta globin and epsilon globin loci showed that the globin cluster is flanked by olfactory genes, demonstrating a gene arrangement conserved for over 180 MY. Analysis of the region 5' to the S. macroura epsilon globin gene revealed a region similar to the eutherian LCR, containing sequences and potential transcription factor binding sites with homology to eutherian hypersensitive sites 1 to 5. FISH mapping of BACs containing S. macroura alpha and beta globin genes located the beta globin cluster on chromosome 3q and the alpha locus close to the centromere on 1q, resolving contradictory map locations obtained by previous radioactive in situ hybridization.
The CACNA1S gene encodes the α1 subunit of the calcium channel. Mutation of the CACNA1S gene can cause hypokalemic periodic paralysis (HypoKPP) and malignant hyperthermia syndrome (MHS) in human beings. Current research on CACNA1S has been mainly in humans and model animals, but rarely in livestock and poultry. In this study, Yorkshire pigs (23), Pietrain pigs (30), Jinhua pigs (115) and the second generation (126) of Jinhua and Pietrain crossbreds were used. Primers were designed according to the sequence of the human CACNA1S gene, and PCR was carried out using pig genomic DNA. PCR products were sequenced and compared with the human sequence, and then single nucleotide polymorphisms (SNPs) were investigated by PCR-SSCP, while PCR-RFLP tests were performed to validate the mutations. Results indicated: (1) the 5211 bp DNA fragments of the porcine CACNA1S gene were acquired (GenBank accession number: DQ767693) and the identity of the exon region was 82.6% between human and pig; (2) fifty-seven mutations were found within the cloned sequences, among which 24 were in exon regions; (3) the results of PCR-RFLP were in accordance with those of PCR-SSCP. According to the ESTs of the porcine CACNA1S gene published in GenBank (Bx914582, Bx666997), 8 of the 11 SNPs identified in the present study were consistent with the base differences between the two EST fragments.
Kehrenberg, Corinna; Aarestrup, Frank Møller; Schwarz, Stefan
exporter gene lsa(B) and the gene cfr for combined resistance to phenicols, lincosamides, oxazolidinones, pleuromutilins, and streptogramin A antibiotics bracketed by IS21-558 insertion sequences orientated in the same direction. A 6-bp target site duplication was detected at the integration site within...
The codons for some conserved amino acids are found to be the same between homologous genes from different species when the statistics of codon usage would suggest that they should be different. I examine whether this 'coincidence' of codon usage could be due to genetic mechanisms homogenising the DNA around specific sites. This paper describes the further analysis of the coincident codons in 19 genes (a total of 96 homologues) for slippage. Coincident codons arise in contexts of increased sequence simplicity, and have a high chance of occurring within sequences similar to the recombination-prone minisatellite 'core' sequence. This suggests a role of genetic homogenisation in their generation.
Dapunt, Ulrike; Gaida, Matthias M; Meyle, Eva; Prior, Birgit; Hänsch, Gertrud M
The recognition and phagocytosis of free-swimming (planktonic) bacteria by polymorphonuclear neutrophils have been investigated in depth. However, less is known about the neutrophil response towards bacterial biofilms. Our previous work demonstrated that neutrophils recognize activating entities within the extracellular polymeric substance (EPS) of biofilms (the bacterial heat shock protein GroEL) and that this process does not require opsonization. The aim of this study was to evaluate the release of DNA by neutrophils in response to biofilms, as well as the release of the inflammatory cytokine MRP-14. Neutrophils were stimulated with Staphylococcus epidermidis biofilms, planktonic bacteria, extracted EPS and GroEL. Release of DNA and of MRP-14 was evaluated. Furthermore, tissue samples from patients suffering from biofilm infections were collected and evaluated by histology. MRP-14 concentration in blood samples was measured. We were able to show that biofilms, the EPS and GroEL induce DNA release. MRP-14 was only released after stimulation with EPS, not GroEL. Histology of tissue samples revealed MRP-14 positive cells in association with neutrophil infiltration, and MRP-14 concentration was elevated in blood samples of patients suffering from biofilm infections. Our data demonstrate that neutrophil-activating entities are present in the EPS and that GroEL induces DNA release by neutrophils.
Medema, Marnix H.; Takano, Eriko; Breitling, Rainer; Nowick, Katja
The genes encoding many biomolecular systems and pathways are genomically organized in operons or gene clusters. With MultiGeneBlast, we provide a user-friendly and effective tool to perform homology searches with operons or gene clusters as basic units, instead of single genes. The contextualizatio
My project focuses on characterizing different cyanobacteria in thrombolitic mats found on the island of Highborn Cay, Bahamas. Thrombolites are interesting ecosystems because of the ability of bacteria in these mats to remove carbon dioxide from the atmosphere and mineralize it as calcium carbonate. In the future they may be used as models to develop carbon sequestration technologies, which could be used as part of regenerative life systems in space. These thrombolitic communities are also significant because of their similarities to early communities of life on Earth. I targeted two cyanobacteria in my research, Dichothrix spp. and whatever black is, since they are believed to be important to carbon sequestration in these thrombolitic mats. The goal of my summer research project was to molecularly identify these two cyanobacteria. DNA was isolated from each organism through mat dissections and DNA extractions. I ran Polymerase Chain Reactions (PCR) to amplify the 16S ribosomal RNA (rRNA) gene in each cyanobacteria. This specific gene is found in almost all bacteria and is highly conserved, meaning any changes in the sequence are most likely due to evolution. As a result, the 16S rRNA gene can be used for bacterial identification of different species based on the sequence of their 16S rRNA gene. Since the exact sequence of the Dichothrix gene was unknown, I designed different primers that flanked the gene based on the known sequences from other taxonomically similar cyanobacteria. Once the 16S rRNA gene was amplified, I cloned the gene into specialized Escherichia coli cells and sent the gene products for sequencing. Once the sequence is obtained, it will be added to a genetic database for future reference to and classification of other Dichothrix sp.
Christiansen, S.K.; Justesen, A.F.; Giese, H.
to be similar for all four genes. The results of the codon-usage analysis suggest that Egh is more flexible than other fungi in the choice of nucleotides at the wobble position. Codon-usage preferences in Egh and barley genes indicate a level of difference which may be exploited to discriminate between fungal...... and plant genes in sequence mixtures. The Egh gpd promoter appears to be superior to that of the Egh beta-tubulin gene (tub2) for driving the E. coli beta-glucuronidase (GUS) gene in transformation experiments....
徐来祥; 孔繁华; 华育平
The growth hormone gene (GH) of Rhinopithecus roxellanae was amplified by PCR for the first time, based on the sequences of reported mammalian growth hormone genes. The amplified fragment was about 1.8 kb. It was cloned and its upstream region was sequenced. The sequenced region consists of a 5′ flanking regulatory region, exon I, intron I and part of exon II of the growth hormone gene. Comparing the corresponding growth hormone gene sequences of Rhinopithecus roxellanae and pig, we concluded that the homology reached 81% in the region, and that the 5′ flanking sequence was highly conserved. About 90% of the amino acids encoded by exon I and exon II were the same as those in pig. Many mutations occurred at the degenerate sites of the triplet code. The nucleotides of intron I showed only 72% homology with those in pig. This suggests that introns and the 3′ flanking sequence may play an important part in growth hormone gene regulation in different animals.
Kano, R; Aihara, S; Nakamura, Y; Watanabe, S; Hasegawa, A
Chitin synthase 1 (Chs1) genes from Microsporum equinum and Trichophyton equinum were compared with those of the other dermatophytes. The Chs1 nucleotide sequences of these dermatophytes from horses showed more than 80% similarity to those of Arthroderma benhamiae, A. fulvum, A. grubyi, A. gypseum, A. incruvatum, A. otae, A. simii, A. vanbreuseghemii, Epidermophyton floccosum, T. mentagrophytes var. interdigitale (T. interdigitale), T. rubrum and T. violaceum. Especially high degree of nucleotide sequence similarity of more than 99% was noted between the Chs1 gene fragments of M. equinum and A. otae, and those of T. equinum, T. interdigitale and A. vanbreuseghemii, respectively. The phylogenetic analysis of their sequences revealed that M. equinum was genetically very close to A. otae and T. equinum to A. vanbreuseghemii. A molecular analysis of Chs1 genes will provide useful information for the genetic relatedness of M. equinum and T. equinum and confirm the value of DNA sequencing in identification of these two dermatophytes.
Saville Barry J
Background: Ustilago maydis is the basidiomycete fungus responsible for common smut of corn and is a model organism for the study of fungal phytopathogenesis. To aid in the annotation of the genome sequence of this organism, several expressed sequence tag (EST) libraries were generated from a variety of U. maydis cell types. In addition to utility in the context of gene identification and structure annotation, the ESTs were analyzed to identify differentially abundant transcripts and to detect evidence of alternative splicing and anti-sense transcription. Results: Four cDNA libraries were constructed using RNA isolated from U. maydis diploid teliospores (U. maydis strains 518 × 521) and haploid cells of strain 521 grown under nutrient rich, carbon starved, and nitrogen starved conditions. Using the genome sequence as a scaffold, the 15,901 ESTs were assembled into 6,101 contiguous expressed sequences (contigs); among these, 5,482 corresponded to predicted genes in the MUMDB (MIPS Ustilago maydis database), while 619 aligned to regions of the genome not yet designated as genes in MUMDB. A comparison of EST abundance identified numerous genes that may be regulated in a cell type or starvation-specific manner. The transcriptional response to nitrogen starvation was assessed using RT-qPCR. The results of this suggest that there may be cross-talk between the nitrogen and carbon signalling pathways in U. maydis. Bioinformatic analysis identified numerous examples of alternative splicing and anti-sense transcription. While intron retention was the predominant form of alternative splicing in U. maydis, other varieties were also evident (e.g. exon skipping). Selected instances of both alternative splicing and anti-sense transcription were independently confirmed using RT-PCR. Conclusion: Through this work: 1) substantial sequence information has been provided for U. maydis genome annotation; 2) new genes were identified through the discovery of 619
Jin, Haining; Zhao, Qing; Gonzalez de Valdivia, Ernesto I;
The Shine-Dalgarno (SD+: 5'-AAGGAGG-3') sequence anchors the mRNA by base pairing to the 16S rRNA in the small ribosomal subunit during translation initiation. We have here compared how an SD+ sequence influences gene expression, if located upstream or downstream of an initiation codon....... The positive effect of an upstream SD+ is confirmed. A downstream SD+ gives decreased gene expression. This effect is also valid for appropriately modified natural Escherichia coli genes. If an SD+ is placed between two potential initiation codons, initiation takes place predominantly at the second start site...
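As a rough illustration of the upstream/downstream distinction drawn in this abstract, the following Python sketch locates SD+ motifs in a sequence and labels each one relative to a given initiation codon. The function name `classify_sd_sites`, the example sequence, and the rule for labelling motifs that overlap the start codon are assumptions made for illustration; they are not taken from the study.

```python
SD_PLUS = "AAGGAGG"  # SD+ motif as written in the abstract (5'-AAGGAGG-3')


def classify_sd_sites(mrna: str, start_index: int) -> list[tuple[int, str]]:
    """Label each SD+ occurrence as upstream or downstream of a start codon.

    mrna        -- sequence in DNA or RNA letters (U is treated as T)
    start_index -- 0-based index of the first base of the initiation codon
    """
    seq = mrna.upper().replace("U", "T")
    sites = []
    pos = seq.find(SD_PLUS)
    while pos != -1:
        if pos + len(SD_PLUS) <= start_index:
            sites.append((pos, "upstream"))
        elif pos >= start_index + 3:
            sites.append((pos, "downstream"))
        else:
            # Assumption: a motif overlapping the codon is reported separately.
            sites.append((pos, "overlapping"))
        pos = seq.find(SD_PLUS, pos + 1)
    return sites


# Hypothetical toy sequence with one SD+ on each side of the ATG at index 15.
example = "GGAAGGAGGTTTTTTATGAAAAAGGAGGCC"
for position, label in classify_sd_sites(example, start_index=15):
    print(position, label)
```

The sketch only reports motif positions; it says nothing about expression levels, which the study measured experimentally.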
Background: Broad-spectrum fluoroquinolone antibiotics are central in modern health care and are used to treat and prevent a wide range of bacterial infections. The recently discovered qnr genes provide a mechanism of resistance with the potential to rapidly spread between bacteria using horizontal gene transfer. As for many antibiotic resistance genes present in pathogens today, qnr genes are hypothesized to originate from environmental bacteria. The vast amount of data generated by shotgun metagenomics can therefore be used to explore the diversity of qnr genes in more detail. Results: In this paper we describe a new method to identify qnr genes in nucleotide sequence data. We show, using cross-validation, that the method has a high statistical power of correctly classifying sequences from novel classes of qnr genes, even for fragments as short as 100 nucleotides. Based on sequences from public repositories, the method was able to identify all previously reported plasmid-mediated qnr genes. In addition, several fragments from novel putative qnr genes were identified in metagenomes. The method was also able to annotate 39 chromosomal variants of which 11 have previously not been reported in literature. Conclusions: The method described in this paper significantly improves the sensitivity and specificity of identification and annotation of qnr genes in nucleotide sequence data. The predicted novel putative qnr genes in the metagenomic data support the hypothesis of a large and uncharacterized diversity within this family of resistance genes in environmental bacterial communities. An implementation of the method is freely available at http://bioinformatics.math.chalmers.se/qnr/.
time. Further characterization of the plasmid was carried out by restriction mapping of pIB485 was performed. pIB485 DNA was digested with EcoRI...the interruption of the gene by insertion of the phage DNA. To characterize this unique regulation of gene expression, we sequenced the lipase gene...in a solution of 0.1% carboxymethylcellulose (added to stabilize the emulsion) by sonication for 7 - 10 minutes at 50w. This suspension was used to
RNA-sequencing and tailored bioinformatic methodologies have paved the way for identification of expressed fusion genes from the chaotic genomes of solid tumors. We have recently successfully exploited RNA-sequencing for the discovery of 24 novel fusion genes in breast cancer. Here, we demonstrate the importance of continuous optimization of the bioinformatic methodology for this purpose, and report the discovery and experimental validation of 13 additional fusion genes from the same samples. Integration of copy number profiling with the RNA-sequencing results revealed that the majority of the gene fusions were promoter-donating events that occurred at copy number transition points or involved high-level DNA-amplifications. Sequencing of genomic fusion break points confirmed that DNA-level rearrangements underlie selected fusion transcripts. Furthermore, a significant portion (>60%) of the fusion genes were alternatively spliced. This illustrates the importance of reanalyzing sequencing data as gene definitions change and bioinformatic methods improve, and highlights the previously unforeseen isoform diversity among fusion transcripts.
刘俊君; 彭学贤; 莽克强
cDNA of soybean mosaic virus (Beijing isolate, SMV-BJ) has been synthesized, using viral genomic RNA as template and random hexanucleotides as primers. Based on the sequences of the SMV-BJ coat protein (CP) gene as well as SMV- and WMV-II-related regions, oligonucleotides were made as primers for polymerase chain reaction (PCR). The NIb gene of SMV-BJ was amplified by PCR and cloned into pBluescript SK. The complete sequence was determined. Comparison of the NIb genes of SMV-BJ and WMV-II (USA) shows that the similarity reaches 80.3% for the nucleotide sequence and 91.3% for the deduced amino acid sequence. In consideration of the high identities between them in the CP gene and the 3'-non-coding region, WMV-II might be considered a watermelon strain of SMV. Besides, some unexpected sequences were found in the 3'-region of 2 NIb gene clones. Following modification and splicing, a binary vector of NIb gene has been constructed for its expression in higher plant for the purpose of studying the possible repl
The Chinese giant salamander, Andrias davidianus, is an important species in the course of evolution; however, there is insufficient genomic data in public databases for understanding its immunologic mechanisms. High-throughput transcriptome sequencing is necessary to generate an enormous number of transcript sequences from A. davidianus for gene discovery. In this study, we generated more than 40 million reads from samples of spleen and skin tissue using the Illumina paired-end sequencing technology. De novo assembly yielded 87,297 transcripts with a mean length of 734 base pairs (bp). Based on sequence similarity searches against known proteins, 38,916 genes were identified. Gene enrichment analysis determined that 981 transcripts were assigned to the immune system. Tissue-specific expression analysis indicated that 443 transcripts were specifically expressed in the spleen and skin. Among these transcripts, 147 were found to be involved in immune responses and inflammatory reactions, such as fucolectin, β-defensins and lymphotoxin beta. Eight tissue-specific genes were selected for validation using real time reverse transcription quantitative PCR (qRT-PCR). The results showed that these genes were significantly more expressed in spleen and skin than in other tissues, suggesting that these genes have vital roles in the immune response. This work provides a comprehensive genomic sequence resource for A. davidianus and lays the foundation for future research on the immunologic and disease resistance mechanisms of A. davidianus and other amphibians.
Gaurav Batra; Vineeta Singh Chauhan; Amanjot Singh; Neelam K Sarkar; Anil Grover
Elucidation of genome sequence provides an excellent platform to understand detailed complexity of the various gene families. Hsp100 is an important family of chaperones in diverse living systems. There are eight putative gene loci encoding for Hsp100 proteins in Arabidopsis genome. In rice, two full-length Hsp100 cDNAs have been isolated and sequenced so far. Analysis of rice genomic sequence by in silico approach showed that two isolated rice Hsp100 cDNAs correspond to Os05g44340 and Os02g32520 genes in the rice genome database. There appears to be three additional proteins (encoded by Os03g31300, Os04g32560 and Os04g33210 gene loci) that are variably homologous to Os05g44340 and Os02g32520 throughout the entire amino acid sequence. The above five rice Hsp100 genes show significant similarities in the signature sequences known to be conserved among Hsp100 proteins. While Os05g44340 encodes cytoplasmic Hsp100 protein, those encoded by the other four genes are predicted to have chloroplast transit peptides.
The adiponectin gene (ADIPOQ) plays an important role in energy homeostasis. In this study five separate regions (regions 1 to 5) of ovine ADIPOQ were analysed using PCR-SSCP. Four different PCR-SSCP patterns (A1-D1, A2-D2) were detected in region-1 and region-2, respectively, with seven and six SNPs being revealed. In region-3, three different patterns (A3-C3) and three SNPs were observed. Two patterns (A4-B4, A5-B5) and two and one SNPs were observed in region-4 and region-5, respectively. In total, nineteen SNPs were detected, with five of them in the coding region and two (c.46T/C and c.515G/A) putatively resulting in amino acid changes (p.Tyr16His and p.Lys172Arg). In region-1, -2 and -3 of 316 sheep from eight New Zealand breeds, variants A1, A2 and A3 were the most common, although variant frequencies differed in the eight breeds. Across region-1 and region-3, nine haplotypes were identified and haplotypes A1-A3, A1-C3, B1-A3 and B1-C3 were most common. These results indicate that the ADIPOQ gene is polymorphic and suggest that further analysis is required to see if the variation in the gene is associated with animal production traits.
Inês C Conceição
BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: 1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and 2) the high
Worning, Peder; Jensen, Lars Juhl; Nelson, K. E.
The recently published complete DNA sequence of the bacterium Thermotoga maritima provides evidence, based on protein sequence conservation, for lateral gene transfer between Archaea and Bacteria. We introduce a new method of periodicity analysis of DNA sequences, based on structural parameters......, which brings independent evidence for the lateral gene transfer in the genome of T. maritima. The structural analysis relates the Archaea-like DNA sequences to the genome of Pyrococcus horikoshii. Analysis of 24 complete genomic DNA sequences shows different periodicity patterns for organisms...... of different origin. The typical genomic periodicity for Bacteria is 11 bp whilst it is 10 bp for Archaea. Eukaryotes have more complex spectra but the dominant period in the yeast Saccharomyces cerevisiae is 10.2 bp. These periodicities are most likely reflective of differences in chromatin structure....
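To make the idea of a sequence periodicity concrete, here is a minimal Python sketch that scores how strongly the occurrences of a short motif recur at a candidate period, using a plain Fourier sum. This is a simplification assumed for illustration: the abstract's method is based on structural parameters that are not specified here, so the motif-indicator signal, the function names `periodicity_power` and `dominant_period`, and the toy sequence are all hypothetical stand-ins.

```python
import cmath


def periodicity_power(seq: str, motif: str, period: float) -> float:
    """Fourier power of motif start positions at one period (in bp).

    The value is normalized by the number of occurrences, so a perfectly
    periodic motif with n occurrences scores n.
    """
    width = len(motif)
    positions = [i for i in range(len(seq) - width + 1)
                 if seq[i:i + width] == motif]
    if not positions:
        return 0.0
    total = sum(cmath.exp(-2j * cmath.pi * p / period) for p in positions)
    return abs(total) ** 2 / len(positions)


def dominant_period(seq: str, motif: str, candidates: list[float]) -> float:
    """Return the candidate period with the highest power."""
    return max(candidates, key=lambda p: periodicity_power(seq, motif, p))


# Hypothetical toy sequence: an "AA" step placed exactly every 10 bp.
toy = ("AA" + "CGTCGTCG") * 30
print(dominant_period(toy, "AA", [9.0, 10.0, 10.2, 11.0]))
```

On real genomes one would scan a fine grid of periods and compare against shuffled sequences; that step is left out to keep the sketch short.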
YANG Hui; Zhang YaPing
Two multigene superfamilies, named V1R and V2R, encoding seven-transmembrane-domain G-protein coupled receptors (GPCRs) have been identified as pheromone receptors in mammals. Three V2R gene families have been described in mouse and rat. Here we screened the updated mouse genome sequence database and retrieved 63 putative functional V2R genes, including three newly identified genes that form an additional family. We describe the genomic organization of these genes and characterize the conservation of mouse V2R protein sequences. This genomic and sequence information is useful as part of the evidence for inferring the functional domains of V2Rs and should aid future functional studies.
Anna M. Roschanski
Premise of the study: We present a protocol for the annotation of transcriptome sequence data and the identification of candidate genes therein using the example of the nonmodel conifer Abies alba. Methods and Results: A normalized cDNA library was built from an A. alba seedling. The sequencing on a 454 platform yielded more than 1.5 million reads that were de novo assembled into 25 149 contigs. Two complementary approaches were applied to annotate gene fragments that code for (1) well-known proteins and (2) proteins that are potentially adaptively relevant. Primer development and testing yielded 88 amplicons that could successfully be resequenced from genomic DNA. Conclusions: The annotation workflow offers an efficient way to identify potential adaptively relevant genes from the large quantity of transcriptome sequence data. The primer set presented should be prioritized for single-nucleotide polymorphism detection in adaptively relevant genes in A. alba.
Krakowetz, Chantel N; Chilton, Neil B
The complete DNA sequences and secondary structure of the mitochondrial (mt) 16S ribosomal (r) RNA gene were determined for six Ixodes scapularis adults. There were 44 variable nucleotide positions in the 1252 bp sequence alignment. Most (95%) nucleotide alterations did not affect the integrity of the secondary structure of the gene because they either occurred at unpaired positions or represented compensatory changes that maintained the base pairing in helices. A large proportion (75%) of the intraspecific variation in DNA sequence occurred within Domains I, II and VI of the 16S gene. Therefore, several regions within this gene may be highly informative for studies of the population genetics and phylogeography of I. scapularis, a major vector of pathogens of humans and domestic animals in North America.
Magare V N
Corbicula regularis is a freshwater mussel found in the Indian sub-continent. In the present study, phylogenetic characterization of this important bivalve was attempted using 18S ribosomal RNA gene markers. Genomic DNA was extracted and 18S rRNA gene was amplified by universal primers. The amplification product was sequenced and compared with the nucleotide databases available online to evaluate phylogenetic relationship of the animal under study. Results indicated that 18S rRNA gene sequences of C. regularis showed high degree of similarity to another freshwater mussel, C. fluminea. This work constitutes the first ever sequence deposition of the C. regularis in the nucleotide databases highlighting the usefulness of 18S ribosomal gene markers for phylogenetic analysis.
Ram, Jeffrey L; Karim, Aos S; Sendler, Edward D; Kato, Ikuko
Understanding the identity and changes of organisms in the urogenital and other microbiomes of the human body may be key to discovering causes and new treatments of many ailments, such as vaginosis. High-throughput sequencing technologies have recently enabled discovery of the great diversity of the human microbiome. The cost per base of many of these sequencing platforms remains high (thousands of dollars per sample); however, the Illumina Genome Analyzer (IGA) is estimated to have a cost per base less than one-fifth of its nearest competitor. The main disadvantage of the IGA for sequencing PCR-amplified 16S rRNA genes is that the maximum read-length of the IGA is only 100 bases; whereas, at least 300 bases are needed to obtain phylogenetically informative data down to the genus and species level. In this paper we describe and conduct a pilot test of a multiplex sequencing strategy suitable for achieving total reads of > 300 bases per extracted DNA molecule on the IGA. Results show that all proposed primers produce products of the expected size and that correct sequences can be obtained, with all proposed forward primers. Various bioinformatic optimization of the Illumina Bustard analysis pipeline proved necessary to extract the correct sequence from IGA image data, and these modifications of the data files indicate that further optimization of the analysis pipeline may improve the quality rankings of the data and enable more sequence to be correctly analyzed. The successful application of this method could result in an unprecedentedly deep description (800,000 taxonomic identifications per sample) of the urogenital and other microbiomes in a large number of samples at a reasonable cost per sample.
Han, Yingxin; Zhang, Yinxin; Mei, Yanhua; Wang, Yuqi; Liu, Tao; Guan, Yanfang; Tan, Deming; Liang, Yu; Yang, Ling; Yi, Xin
Drug resistance to nucleoside analogs is a serious problem worldwide. Both drug resistance gene mutation detection and HBV genotyping are helpful for guiding clinical treatment. Total HBV DNA from 395 patients who were treated with single or multiple drugs including Lamivudine, Adefovir, Entecavir, Telbivudine, Tenofovir and Emtricitabine were sequenced using the HiSeq 2000 sequencing system and validated using the 3730 sequencing system. In addition, a mixed sample of HBV plasmid DNA was used to determine the cutoff value for HiSeq-sequencing, and 52 of the 395 samples were sequenced three times to evaluate the repeatability and stability of this technology. Of the 395 samples sequenced using both HiSeq and 3730 sequencing, the results from 346 were consistent, and the results from 49 were inconsistent. Among the 49 inconsistent results, 13 samples were detected as drug-resistance-positive using HiSeq but negative using 3730, and the other 36 samples showed a higher number of drug-resistance-positive gene mutations using HiSeq 2000 than using 3730. Gene mutations had an apparent frequency of 1% as assessed by the plasmid testing. Therefore, a 1% cutoff value was adopted. Furthermore, the experiment was repeated three times, and the same results were obtained in 49/52 samples using the HiSeq sequencing system. HiSeq sequencing can be used to analyze HBV gene mutations with high sensitivity, high fidelity, high throughput and automation and is a potential method for hepatitis B virus gene mutation detection and genotyping.
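The counts reported in this abstract can be checked with a few lines of arithmetic. The Python sketch below recomputes the agreement between the two platforms and the repeatability figure, and shows one way a 1% frequency cutoff might be applied to a variant call. The helper names `percent` and `passes_cutoff`, and the choice of "greater than or equal to" at the cutoff, are assumptions for illustration; the abstract does not state how the boundary case is handled.

```python
def percent(part: int, whole: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# Counts taken from the abstract.
TOTAL_SAMPLES = 395
CONSISTENT = 346
INCONSISTENT = TOTAL_SAMPLES - CONSISTENT
HISEQ_ONLY_POSITIVE = 13      # positive by HiSeq, negative by 3730
HISEQ_MORE_MUTATIONS = 36     # more resistance mutations by HiSeq than by 3730
REPEATED_SAMPLES = 52
REPEATED_IDENTICAL = 49

concordance = percent(CONSISTENT, TOTAL_SAMPLES)
repeatability = percent(REPEATED_IDENTICAL, REPEATED_SAMPLES)


def passes_cutoff(variant_frequency: float, cutoff: float = 0.01) -> bool:
    """Keep a variant call only if its read frequency reaches the cutoff.

    Assumption: the boundary value itself (exactly 1%) is kept.
    """
    return variant_frequency >= cutoff


print(f"platform concordance: {concordance}%")
print(f"triplicate repeatability: {repeatability}%")
```

The two discordant groups (13 and 36) sum to the 49 inconsistent samples, which is a quick consistency check on the reported figures.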
Xing, Wen-Rui; Hou, Bei-Wei; Guan, Jing-Jiao; Luo, Jing; Ding, Xiao-Yu
The LEAFY (LFY) homologous gene of Dendrobium moniliforme (L.) Sw. was cloned using new primers designed from the conserved region of known orchid LEAFY gene sequences. A partial LFY homologous gene was cloned by conventional PCR, and the complete LFY homologous gene, DenLFY, was then obtained by TAIL-PCR. The complete sequence of the DenLFY gene was 3 575 bp and contained three exons and two introns. Comparison of the exons of LFY homologous genes using BLAST indicated that the DenLFY gene had high identity with orchid LFY homologs, including the related fragment of PhalLFY (84%) in a Phalaenopsis hybrid cultivar, the LFY homologous gene in Oncidium (90%) and those in other orchids (over 80%). In MP analysis, Dendrobium was found to be sister to Oncidium and Phalaenopsis. Homology analysis demonstrated that the C-terminal amino acids were highly conserved. When the exons and introns were considered separately, the exons and the amino acid sequence were good markers for functional research on the DenLFY gene. The second intron can be used in authentication research of Dendrobium based on the length polymorphism between Dendrobium moniliforme and Dendrobium officinale.
BACKGROUND: Glucokinase plays important tissue-specific roles in human physiology, where it acts as a sensor of blood glucose levels in the pancreas, and a few other cells of the gut and brain, and as the rate-limiting step in glucose metabolism in the liver. Liver-specific expression is driven by one of the two tissue-specific promoters, and has an absolute requirement for insulin. The sequences that mediate regulation by insulin are incompletely understood. METHODOLOGY/PRINCIPAL FINDINGS: To better understand the liver-specific expression of the human glucokinase gene we compared the structures of this gene from diverse mammals. Much of the sequence located between the 5' pancreatic beta-cell-specific and downstream liver-specific promoters of the glucokinase genes is composed of repetitive DNA elements that were inserted in parallel on different mammalian lineages. The transcriptional activity of the liver-specific promoter 5' flanking sequences was tested with and without downstream intronic sequences in two human liver cell lines, HepG2 and L-02. While glucokinase liver-specific 5' flanking sequences support expression in liver cell lines, a sequence located about 2000 bases 3' to the liver-specific mRNA start site represses gene expression. Enhanced reporter gene expression was observed in both cell lines when cells were treated with fetal calf serum, but only in the L-02 cells was expression enhanced by insulin. CONCLUSIONS/SIGNIFICANCE: Our results suggest that the normal liver L-02 cell line may be a better model to understand the regulation of the liver-specific expression of the human glucokinase gene. Our results also suggest that sequences downstream of the liver-specific mRNA start site have important roles in the regulation of liver-specific glucokinase gene expression.
Dalquen, Daniel A; Altenhoff, Adrian M; Gonnet, Gaston H; Dessimoz, Christophe
The identification of orthologous genes, a prerequisite for numerous analyses in comparative and functional genomics, is commonly performed computationally from protein sequences. Several previous studies have compared the accuracy of orthology inference methods, but simulated data has not typically been considered in cross-method assessment studies. Yet, while dependent on model assumptions, simulation-based benchmarking offers unique advantages: contrary to empirical data, all aspects of simulated data are known with certainty. Furthermore, the flexibility of simulation makes it possible to investigate performance factors in isolation of one another. Here, we use simulated data to dissect the performance of six methods for orthology inference available as standalone software packages (Inparanoid, OMA, OrthoInspector, OrthoMCL, QuartetS, SPIMAP) as well as two generic approaches (bidirectional best hit and reciprocal smallest distance). We investigate the impact of various evolutionary forces (gene duplication, insertion, deletion, and lateral gene transfer) and technological artefacts (ambiguous sequences) on orthology inference. We show that while gene duplication/loss and insertion/deletion are well handled by most methods (albeit for different trade-offs of precision and recall), lateral gene transfer disrupts all methods. As for ambiguous sequences, which might result from poor sequencing, assembly, or genome annotation, we show that they affect alignment score-based orthology methods more strongly than their distance-based counterparts.
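Of the generic approaches named in this abstract, bidirectional best hit is simple enough to sketch in a few lines. The Python example below assumes that pairwise alignment scores between two genomes are already available as dictionaries; the function names, the gene labels, and the scores are hypothetical, and ties are broken arbitrarily by keeping the first best hit seen. It is not the implementation benchmarked in the study.

```python
def best_hits(scores: dict[tuple[str, str], float]) -> dict[str, str]:
    """Map each query gene to its highest-scoring target gene."""
    best: dict[str, tuple[str, float]] = {}
    for (query, target), score in scores.items():
        # Strict '>' keeps the first hit seen when scores are tied.
        if query not in best or score > best[query][1]:
            best[query] = (target, score)
    return {query: target for query, (target, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a) -> list[tuple[str, str]]:
    """Return gene pairs that are each other's best hit in both directions."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical scores: genome A searched against B, and B against A.
a_vs_b = {("a1", "b1"): 90, ("a1", "b2"): 50, ("a2", "b2"): 80, ("a3", "b2"): 85}
b_vs_a = {("b1", "a1"): 88, ("b2", "a3"): 85, ("b2", "a2"): 80}

print(bidirectional_best_hits(a_vs_b, b_vs_a))
```

In the toy data, a2 and a3 both pick b2 as their best hit, but b2 picks only a3, so a2 is left without an ortholog call. This is the kind of case where gene duplication forces a trade-off between precision and recall.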
Su, Y.; Zhang, H.; Madrid, R. [Univ. of California, San Francisco, CA (United States)] [and others]
Charcot-Marie-Tooth disease (CMT), the most common genetic neuropathy, affects about 1 in 2600 people in Norway and is found worldwide. CMT Type 1 (CMT1) has slow nerve conduction with demyelinated Schwann cells. Autosomal dominant CMT Type 1B (CMT1B) results from mutations in the myelin protein zero gene which directs the synthesis of more than half of all Schwann cell protein. This gene was mapped to the chromosome 1q22-1q23.1 borderline by fluorescence in situ hybridization. The first 7 of 7 reported CMT1B mutations are unique. Thus the most effective means to identify CMT1B mutations in at-risk family members and fetuses is to sequence the entire coding sequence in dominant or sporadic CMT patients without the CMT1A duplication. Of the 19 primers used in 16 pairs to uniquely amplify the entire MPZ coding sequence, 6 primer pairs were used to amplify and sequence the 6 exons. The DyeDeoxy Terminator cycle sequencing method used with four different color fluorescent labels was superior to manual sequencing because it sequences more bases unambiguously from extracted genomic DNA samples within 24 hours. This protocol was used to test 28 CMT and Dejerine-Sottas patients without CMT1A gene duplication. Sequencing MPZ gene-specific amplified fragments identified 9 polymorphic sites within the 6 exons that encode the 248 amino acid MPZ protein. The large number of major CMT1B mutations identified by single strand sequencing are being verified by reverse strand sequencing and, when possible, by restriction enzyme analysis. This protocol can be used to distinguish CMT1B patients from other CMT phenotypes and to determine the CMT1B status of relatives both presymptomatically and prenatally.
YALAN FENG; YONGYING ZHAO; KETAO WANG; YONG CHUN LI; XIANG WANG; JUN YIN
This study aimed to identify vernalization responsive genes in the winter wheat cultivar Jing841 by comparing the transcriptome data with that of a spring wheat cultivar Liaochun10. For each cultivar, seedlings before and after the vernalization treatment were sequenced by Solexa/Illumina sequencing. Genes differentially expressed after and before vernalization were identified as differentially expressed genes (DEGs) using false discovery rate (FDR) ≤ 0.001 and |log2 (fold change)| > 1 as cutoffs. The Jing841-specific DEGs were screened and subjected to functional annotation using the gene ontology (GO) database. Vernalization responsive genes among the specific genes were selected for validation by quantitative reverse transcription polymerase chain reaction (qRT-PCR), and the expression change over time was investigated for the top 11 genes with the most significant expression differences. A total of 138,062 unigenes were obtained. Overall, 636 DEGs were identified as vernalization responsive genes including some known genes such as VRN-1 and COR14a, and some unknown contigs. The qRT-PCR validated changes in the expression of 18 DEGs that were detected by RNA-seq. Among them, 11 genes displayed four different types of expression patterns over time during the 30-day-long vernalization treatment. Genes or contigs such as VRN-A1, COR14a, IRIP, unigene1806 and Cl18953.Contig2 probably have critical roles in vernalization.
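The two cutoffs stated in this abstract (FDR ≤ 0.001 and |log2 (fold change)| > 1) can be applied as a simple filter. The sketch below is only an illustration of that rule: the gene names and values are invented, and it assumes that FDR values and positive fold changes have already been computed upstream by whatever statistical pipeline the authors used.

```python
import math


def is_deg(fdr, fold_change, fdr_cutoff=0.001, lfc_cutoff=1.0):
    """Apply the two cutoffs: FDR <= cutoff and |log2(fold change)| > cutoff.

    `fold_change` is the after/before expression ratio and must be positive.
    """
    return fdr <= fdr_cutoff and abs(math.log2(fold_change)) > lfc_cutoff


# Hypothetical records: (gene name, FDR, fold change after vs. before).
records = [
    ("geneA", 0.0005, 4.0),   # strongly up-regulated, significant
    ("geneB", 0.0005, 1.5),   # significant but small change
    ("geneC", 0.01, 8.0),     # large change but FDR too high
    ("geneD", 0.001, 0.25),   # down-regulated, FDR exactly at the cutoff
    ("geneE", 0.0001, 2.0),   # log2 fold change equals 1, not above it
]

degs = [name for name, fdr, fc in records if is_deg(fdr, fc)]
print(degs)
```

Note that the fold-change cutoff is strict while the FDR cutoff is inclusive, which is why geneD passes and geneE does not.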
BACKGROUND: Horizontal gene transfer (HGT) is recognized as one of the major forces for bacterial genome evolution. Many clinically important bacteria may acquire virulence factors and antibiotic resistance through HGT. The comparative genomic analysis has become an important tool for identifying HGT in emerging pathogens. In this study, the Serine-Aspartate Repeat (Sdr) family has been compared among different sources of Staphylococcus aureus (S. aureus) to discover sequence diversities within their genomes. METHODOLOGY/PRINCIPAL FINDINGS: Four sdr genes were analyzed for 21 different S. aureus strains and 218 mastitis-associated S. aureus isolates from Canada. Comparative genomic analyses revealed that S. aureus strains from bovine mastitis (RF122 and mastitis isolates in this study), ovine mastitis (ED133), pig (ST398), chicken (ED98), and human methicillin-resistant S. aureus (MRSA) (TCH130, MRSA252, Mu3, Mu50, N315, 04-02981, JH1 and JH9) were highly associated with one another, presumably due to HGT. In addition, several types of insertion and deletion were found in sdr genes of many isolates. A new insertion sequence was found in mastitis isolates, which was presumably responsible for the HGT of sdrC gene among different strains. Moreover, the sdr genes could be used to type S. aureus. Regional difference of sdr genes distribution was also indicated among the tested S. aureus isolates. Finally, certain associations were found between sdr genes and subclinical or clinical mastitis isolates. CONCLUSIONS: Certain sdr gene sequences were shared in S. aureus strains and isolates from different species presumably due to HGT. Our results also suggest that the distributional assay of virulence factors should detect the full sequences or full functional regions of these factors. The traditional assay using short conserved regions may not be accurate or credible. These findings have important implications with regard to animal husbandry practices that may
Kaas Rolf S
Background: Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. Results: We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment, and a pan-genome tree, based on gene presence/absence, map the relatedness of the 186 sequenced E. coli genomes. The core-gene tree displays high confidence and divides the E. coli strains into the observed MLST type clades and also separates defined phylotypes. Conclusion: The results of comparing a large and diverse E. coli dataset support the theory that reliable and good resolution phylogenies can be inferred from the core-genome. The results further suggest that the resolution at the isolate level may subsequently be improved by targeting more variable genes. The use of whole genome sequencing will make it possible to eliminate, or at least reduce, the need for several typing steps used in traditional epidemiology.
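The distinction between the strict core (clusters in 100% of genomes), the 'soft core' (clusters in at least 95% of genomes), and the pan-genome can be sketched from a presence/absence table. The following is an assumed, simplified illustration with made-up cluster and genome names; it does not reproduce the clustering step that the study used to define gene families.

```python
def clusters_at_threshold(presence, n_genomes, min_percent):
    """Return clusters found in at least `min_percent` percent of genomes.

    `presence` maps a cluster name to the set of genomes that carry it.
    Integer arithmetic avoids floating-point edge cases at the threshold.
    """
    return sorted(
        cluster
        for cluster, genomes in presence.items()
        if len(genomes) * 100 >= min_percent * n_genomes
    )


# Hypothetical data set of 20 genomes and three gene clusters.
genomes = [f"g{i}" for i in range(20)]
presence = {
    "clusterA": set(genomes),       # present in all 20 genomes
    "clusterB": set(genomes[:19]),  # present in 19 of 20 (95%)
    "clusterC": set(genomes[:10]),  # present in half of the genomes
}

strict_core = clusters_at_threshold(presence, len(genomes), 100)
soft_core = clusters_at_threshold(presence, len(genomes), 95)
pan_genome_size = len(presence)
print(strict_core, soft_core, pan_genome_size)
```

A cluster missing from a single draft-quality assembly drops out of the strict core but stays in the soft core, which is the reason the abstract gives for considering the soft core more biologically relevant.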
Rauch, U; Meyer, H; Brakebusch, C
Brevican is a brain-specific proteoglycan belonging to the aggrecan family. Phage clones containing the complete mouse brevican open reading frame of 2649 bp and the complete 3'-untranslated region of 341 bp were isolated from a mouse brain cDNA library, and cosmid clones containing the mouse bre...... to an alternative brevican cDNA, coding for a GPI-linked isoform. Single strand conformation polymorphism analysis mapped the brevican gene (Bcan) to chromosome 3 between the microsatellite markers D3Mit22 and D3Mit11....
LU Xin-xin; YUAN Liang; WAN Xiao-hua; GENG Jia-jing
Background: Accurate identification of bacterial isolates is an essential task in clinical microbiology. This study compared culturing to analyzing 16S rRNA gene sequences as methods to identify bacteria in clinical samples. We developed a key technique to directly identify bacteria in clinical samples via nucleic acid sequences, thus improving the ability to confirm pathogens. Methods: We obtained 225 samples from Beijing Tongran Hospital and examined them by conventional culture and 16S rDNA sequencing to identify pathogens. This study made use of a modified sample pre-treatment technique which came from our laboratory to extract DNA. 16S rDNA was amplified by PCR. The amplified product was sequenced on a CEQ8000 capillary sequencer. Sequences were uploaded to the GenBank BLAST database for comparison. Results: Among the positively cultivated bacterial strains, seven strains were identified differently by Vitek32 and by 16S rDNA sequencing. Twelve samples that were negative by standard culturing were determined to have pathogens by sequence analysis. Conclusion: The use of 16S rRNA gene sequencing can improve clinical microbiology by providing better identification of unidentified bacteria or providing reference identification of unusual strains.
Background: One of the most common and efficient methods for detecting mutations in genes is PCR amplification followed by direct sequencing. Until recently, the process of designing PCR assays has been to focus on individual assay parameters rather than concentrating on matching conditions for a set of assays. Primers for each individual assay were selected based on location and sequence concerns. The two primer sequences were then iteratively adjusted to make the individual assays work properly. This generally resulted in groups of assays with different annealing temperatures that required the use of multiple thermal cyclers or multiple passes in a single thermal cycler making diagnostic testing time-consuming, laborious and expensive. These factors have severely hampered diagnostic testing services, leaving many families without an answer for the exact cause of a familial genetic disease. A search of GeneTests for sequencing analysis of the entire coding sequence for genes that are known to cause muscular dystrophies returns only a small list of laboratories that perform comprehensive gene panels. The hypothesis for the study was that a complete set of universal assays can be designed to amplify and sequence any gene or family of genes using computer aided design tools. If true, this would allow automation and optimization of the mutation detection process resulting in reduced cost and increased throughput. Results: An automated process has been developed for the detection of deletions, duplications/insertions and point mutations in any gene or family of genes and has been applied to ten genes known to bear mutations that cause muscular dystrophy: DMD; CAV3; CAPN3; FKRP; TRIM32; LMNA; SGCA; SGCB; SGCG; SGCD. Using this process, mutations have been found in five DMD patients and four LGMD patients (one in the FKRP gene, one in the CAV3 gene, and two likely causative heterozygous pairs of variations in the CAPN3 gene of two other
Fang, X. M.; Brennan, M. D.
Drosophila affinidisjuncta and Drosophila hawaiiensis are closely related species that display distinct tissue-specific expression patterns for their homologous alcohol dehydrogenase genes (Adh genes). In Drosophila melanogaster transformants, both genes are expressed at high levels in the larval and adult fat bodies, but the D. affinidisjuncta gene is expressed 10-50-fold more strongly in the larval and adult midguts and Malpighian tubules. The present study reports the mapping of cis-acting sequences contributing to the regulatory differences between these two genes in transformants. Chimeric genes were constructed and introduced into the germ line of D. melanogaster. Stage- and tissue-specific expression patterns were determined by measuring steady-state RNA levels in larvae and adults. Three portions of the promoter region make distinct contributions to the tissue-specific regulatory differences between the native genes. Sequences immediately upstream of the distal promoter have a strong effect in the adult Malpighian tubules, while sequences between the two promoters are relatively important in the larval Malpighian tubules. A third gene segment, immediately upstream of the proximal promoter, influences levels of the proximal Adh transcript in all tissues and developmental stages examined, and largely accounts for the regulatory difference in the larval and adult midguts. However, these as well as other sequences make smaller contributions to various aspects of the tissue-specific regulatory differences. In addition, some chimeric genes display aberrant RNA levels for the whole organism, suggesting close physical association between sequences involved in tissue-specific regulatory differences and those important for Adh expression in the larval and adult fat bodies. PMID:1644276
Background: Genome scans are becoming an increasingly popular approach to study the genetic basis of adaptation and speciation, but on their own, they are often helpless at identifying the specific gene(s) or mutation(s) targeted by selection. This shortcoming is hopefully bound to disappear in the near future, thanks to the wealth of new genomic resources that are currently being developed for many species. In this article, we provide a foretaste of this exciting new era by conducting a genome scan in the mosquito Aedes aegypti with the aim to look for candidate genes involved in resistance to Bacillus thuringiensis subsp. israelensis (Bti) insecticidal toxins. Results: The genomes of a Bti-resistant and a Bti-susceptible strain were surveyed using about 500 MITE-based molecular markers, and the loci showing the highest inter-strain genetic differentiation were sequenced and mapped on the Aedes aegypti genome sequence. Several good candidate genes for Bti-resistance were identified in the vicinity of these highly differentiated markers. Two of them, coding for a cadherin and a leucine aminopeptidase, were further examined at the sequence and gene expression levels. In the resistant strain, the cadherin gene displayed patterns of nucleotide polymorphisms consistent with the action of positive selection (e.g. an excess of high compared to intermediate frequency mutations), as well as a significant under-expression compared to the susceptible strain. Conclusion: Both sequence and gene expression analyses agree to suggest a role for positive selection in the evolution of this cadherin gene in the resistant strain. However, it is unlikely that resistance to Bti is conferred by this gene alone, and further investigation will be needed to characterize other genes significantly associated with Bti resistance in Ae. aegypti. Beyond these results, this article illustrates how genome scans can build on the body of new genomic information (here, full
The bacterial genus Bartonella is classified in the alpha-2 Proteobacteria on the basis of 16S rDNA sequence comparison. The Bartonella two-component system feuPQ is found in nearly all bacterial species. We investigated the usefulness of the response regulator feuP gene sequence in the classification of 18 well characterized Bartonella species. Phylogenetic relationships were inferred using parsimony, neighbour-joining and maximum-likelihood methods. Reliable classifications of most of the studied species were obtained. Bartonella were divided into two supported clades containing two supported clusters each. These results were similar to our previous data obtained with groEL, ftsZ and ribC gene sequences. The wide range of feuP DNA sequence similarity (78.6 to 96.5%) among Bartonella species makes it a promising candidate for multi-locus sequence typing (MLST) of clinical isolates. This is the first report proving the usefulness of feuP sequences in bartonellae classification at the species level.
The growth hormone protein of Salmo trutta caspius was investigated and characterized. The growth hormone gene in Salmo trutta caspius has six exons over its full length, with a reported molecular weight (kDa) of 64.98 for ssDNA and 129.6 for dsDNA. The protein has 210 amino acid residues. The assembled full-length DNA contains the open reading frame of the growth hormone gene, which contains 15 sequences in the full length. The average GC content is 47% and the AT content is 53%. Multiple alignment of the protein showed that this peptide is 100% identical to the homologous growth hormone proteins of Salmo salar (accession number AAA49558.1) and rainbow trout (Salmo trutta) (accession number AAA49555.1). The protein sequence was deposited in GenBank under accession number AEK70940. We also analyzed the secondary and tertiary structures of the sequences reported in GenBank. The results show homology in secondary structure among the three sequences: Salmo trutta caspius, Salmo salar and rainbow trout. Regarding tertiary structure, Salmo trutta caspius and Salmo salar are of the same type, whereas rainbow trout differs from both. However, three parallel α-helices were observed in the sequences, and the secondary structures contained almost the same percentage of β-sheet.
Genis, Alma D; Mosqueda, Juan J; Borgonio, Verónica M; Falcón, Alfonso; Alvarez, Antonio; Camacho, Minerva; de Lourdes Muñoz, Maria; Figueroa, Julio V
Variable merozoite surface antigens of Babesia bovis are exposed glycoproteins having a role in erythrocyte invasion. Members of this gene family include msa-1 and msa-2 (msa-2c, msa-2a(1), msa-2a(2), and msa-2b). Small subunit ribosomal (ssr)RNA gene is subject to evolutive pressure and has been used in phylogenetic studies. To determine the phylogenetic relationship among B. bovis Mexican isolates using different genetic markers, PCR amplicons, corresponding to msa-1, msa-2c, msa-2b, and ssrRNA genes, were cloned and plasmids carrying the corresponding inserts were sequenced. Comparative analysis of nucleotide and deduced amino acid sequences revealed distinct degrees of variability and identity among the coding gene sequences obtained from 12 geographically different B. bovis isolates and a reference strain. Overall sequence identities of 47.7%, 72.3%, 87.7%, and 94% were determined for msa-1, msa-2b, msa-2c, and ssrRNA, respectively. A robust phylogenetic tree was obtained with msa-2b sequences. The phylogenetic analysis suggests that Mexican B. bovis isolates group in clades not concordant with the Mexican geography. However, the Mexican isolates group together in an American clade separated from the Australian clade. Sequence heterogeneity in msa-1, msa-2b, and msa-2c coding regions of Mexican B. bovis isolates present in different geographical regions can be a result of either differential evolutive pressure or cattle movement from commercial trade.
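Sequence identity figures such as those quoted above depend on how aligned columns are counted. The sketch below shows one common convention for a pair of pre-aligned sequences; it is an assumption for illustration only, since the abstract does not state which alignment software or identity definition the authors used, and the example sequences are invented.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity between two pre-aligned sequences of equal length.

    Columns where both sequences have a gap are ignored. A column with a
    gap in only one sequence counts as a mismatch (one convention of many).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical aligned fragments: five of six columns match.
identity = percent_identity("ATGC-A", "ATGCTA")
print(round(identity, 1))
```

Excluding gapped columns from the denominator instead would raise the reported identity, so values from different studies are only comparable when the same convention is used.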
Background: The mammalian neurohypophysial hormones, vasopressin and oxytocin, are involved in osmoregulation and uterine smooth muscle contraction respectively. All jawed vertebrates contain at least one homolog each of vasopressin and oxytocin whereas jawless vertebrates contain a single neurohypophysial hormone called vasotocin. The vasopressin homolog in non-mammalian vertebrates is vasotocin; and the oxytocin homolog is mesotocin in non-eutherian tetrapods, mesotocin and [Phe2]mesotocin in lungfishes, and isotocin in ray-finned fishes. The genes encoding vasopressin and oxytocin are closely linked in the human and rodent genomes in a tail-to-tail orientation. In contrast, their pufferfish homologs (vasotocin and isotocin) are located on the same strand of DNA with the isotocin gene located upstream of the vasotocin gene, separated by five genes, suggesting that this locus has experienced rearrangements in either the mammalian or the ray-finned fish lineage, or in both lineages. The coelacanths occupy a unique phylogenetic position close to the divergence of the mammalian and ray-finned fish lineages. Results: We have sequenced a coelacanth (Latimeria menadoensis) BAC clone encompassing the neurohypophysial hormone genes and investigated the evolutionary history of the vertebrate neurohypophysial hormone gene locus within a comparative genomics framework. The coelacanth contains vasotocin and mesotocin genes like non-mammalian tetrapods. The coelacanth genes are present on the same strand of DNA with no intervening genes, with the vasotocin gene located upstream of the mesotocin gene. Nucleotide sequences of the second exons of the two genes are under purifying selection implying a regulatory function. We have also analyzed the neurohypophysial hormone gene locus in the genomes of opossum, chicken and Xenopus tropicalis. The opossum contains two tandem copies of vasopressin and mesotocin genes. The vasotocin and mesotocin genes in chicken and
Fromont-Racine, M; Bucchini, D; Madsen, O;
Expression of the human insulin gene was examined in transgenic mouse lines carrying the gene with various lengths of DNA sequences 5' to the transcription start site (+1). Expression of the transgene was demonstrated by 1) the presence of human C-peptide in urine, 2) the presence of specific tra...... of the transgene was observed in cell types other than beta-islet cells....... , and -168 allowed correct initiation of the transcripts and cell specificity of expression, while quantitative expression gradually decreased. Deletion to -58 completely abolished the expression of the gene. The amount of human product that in mice harboring the longest fragment contributes up to 50...
Wang, Y; Li, R
Great homology existed between IPNS genes from sulfur-containing beta-lactam antibiotic producers, including procaryotes and eucaryotes. A homologous DNA band was confirmed in S. cattleya by Southern blot analysis using the IPNS gene from S. lipmanii as a probe. A recombinant plasmid containing the cyclase gene involved in thienamycin biosynthesis and the IPNS gene was obtained by complementary cloning with a mutant from S. cattleya. DNA sequencing revealed that the IPNS gene of S. cattleya consists of 963 bp encoding a protein of 321 amino acids, with ATG as start codon and TGA as stop codon. Pairwise comparison of the predicted amino acid sequences showed 56% and 64% similarity with IPNSs of S. clavuligerus and S. lipmanii, respectively.
Shu, Xianghua; Liu, Yonggang; Yang, Liangyu; Song, Chunlian; Hou, Jiafa
The complete coding sequences of 3 porcine genes - ASPA, NAGA, and HEXA - were amplified by the reverse transcriptase polymerase chain reaction (RT-PCR) based on the conserved sequence information of the mouse or other mammals and referenced pig ESTs. These 3 novel porcine genes were then deposited in the NCBI database and assigned GeneIDs: 100142661, 100142664 and 100142667. The phylogenetic tree analysis revealed that the porcine ASPA, NAGA, and HEXA all have closer genetic relationships with the ASPA, NAGA, and HEXA of cattle. Tissue expression profile analysis was also carried out and results revealed that swine ASPA, NAGA, and HEXA genes were differentially expressed in various organs, including skeletal muscle, the heart, liver, fat, kidney, lung, and small and large intestines. Our experiment is the first one to establish the foundation for further research on these 3 swine genes.
Mygind, Tina; Jacobsen, Iben Søgaard; Melkova, Renata
The gap gene encodes the glycolytic enzyme glyceraldehyde 3-phosphate dehydrogenase (GAPDH). The gene was cloned and sequenced from the Mycoplasma hominis type strain PG21(T). The intraspecies variability was investigated by inspection of restriction fragment length polymorphism (RFLP) patterns...... after polymerase chain reaction (PCR) amplification of the gap gene from 15 strains and furthermore by sequencing of part of the gene in eight strains. The M. hominis gap gene was found to vary more than the Escherichia coli counterpart, but the variation at nucleotide level gave rise to only a few...... to a 104-kDa band in addition to the expected 36-kDa band. The protein reacting at 104 kDa is a M. hominis protein with either an epitope similar to one on GAPDH, or it is an immunoglobulin binding protein...
Fouts, D L; Ruef, B J; Ridley, P T; Wrightsman, R A; Peterson, D S; Manning, J E
In previous studies we identified a 500-bp segment of the gene, TSA-1, which encodes an 85-kDa trypomastigote-specific surface antigen of the Peru strain of Trypanosoma cruzi. TSA-1 was shown to be located at a telomeric site and to contain a 27-bp tandem repeat unit within the coding region. This repeat unit defines a discrete subset of a multigene family and places the TSA-1 gene within this subset. In this study, we present the complete nucleotide sequence of the TSA-1 gene from the Peru strain. By homology matrix analysis, fragments of two other trypomastigote specific surface antigen genes, pTt34 and SA85-1.1, are shown to have extensive sequence homology with TSA-1 indicating that these genes are members of the same gene family as TSA-1. The TSA-1 subfamily was also found to be active in two other strains of T. cruzi, one of which contains multiple telomeric members and one of which contains a single non-telomeric member, suggesting that transcription is not necessarily dependent on the gene being located at a telomeric site. Also, while some of the sequences found in this gene family are present in 2 size classes of poly(A)+ RNA, others appear to be restricted to only 1 of the 2 RNA classes.
Polygalacturonase-inhibiting proteins (PGIP) play important roles in plant defense against pathogens, especially fungi. A pair of degenerate primers is designed based on the conserved sequence of 20 other known pgip genes and used to amplify a Gossypium barbadense cultivar 7124 cDNA library by touch-down PCR. A 561 bp internal fragment of the pgip gene is obtained and used to design the primers for rapid amplification of cDNA ends. A composite pgip gene sequence is constructed from the products of 5′ and 3′ RACE, which are 666 bp and 906 bp respectively. Analysis of the nucleic acid sequence shows 69.2% and 68.7% similarity to Citrus and Poncirus pgip genes, respectively. The open reading frame of the gene encodes a polypeptide of 330 amino acids, in which 10 leucine-rich repeats are arranged in tandem. A new set of primers is designed to the 5′ and 3′ ends of the gene, which allows amplification of the full-length gene from the cotton cDNA library. Genomic DNA analysis reveals that this gene has no intron.
Smith, M P; Feiss, M
Phage 21 is a temperate lambdoid coliphage, and its head-encoding genes, as well as those of phage lambda, are descended from a common ancestral phage. The head protein-encoding genes of phage 21 have been sequenced, confirming earlier genetic studies indicating that the head-encoding genes of 21 and lambda are analogous in location, size, and function. The phage 21 head-encoding genes identified (and their lambda analogues) include: 3(W), 4(B), 5(C), 6(Nu3), shp (D), 7(E), and 8(FII), respectively. An open reading frame, orf1, is analogous in position and shares some sequence identity with FI, a phage lambda gene involved in DNA packaging. The phage 21 major head protein, gp7, is predicted to have strong sequence identity (65%) with the lambda major capsid protein, gpE, including amino acids known to be important for capsid form determination. The nested genes 5/6 of phage 21 and C/Nu3 of lambda differ by several rearrangements including deletions and a triplication. The possibility that lambda genes C/Nu3 evolved from ancestral nested genes containing a triplication is discussed.
Chen, Siqi; Dong, Cheng; Wang, Qi; Zhong, Zhen; Qi, Yu; Ke, Xiaomei; Liu, Yuhe
We attempted to identify the genetic epidemiology of hereditary hearing loss among the Chinese Han population using next-generation sequencing (NGS). The entire length of the genes GJB2, SLC26A4, and GJB3, as well as exons of 57 additional candidate genes were sequenced from 116 individuals suffering from hearing loss. Thirty potentially causative mutations from these 60 genes were identified as the likely etiologies of hearing loss in 67 of the cases. In our study, SLC26A4 and GJB2 were the most frequently affected genes among the Chinese Han population with hearing loss. Collectively, they account for 52.8% of the cases, followed by MTRNR1, PCDH15, and TECTA. These data also illustrate that NGS can be used to identify rare alleles responsible for hereditary hearing loss: 22 of the 30 (73.3%) genes identified with mutations are rarely mutated in hereditary hearing loss and only account for 21.5% (42/195) of the total mutation frequency, explaining no more than 2% for each gene. These rarely mutated genes would be missed by conventional diagnostic sequencing approaches. NGS can be used effectively to identify both the common and rare genes causing hereditary hearing loss.
Soo-Dong Woo; Jae Young Choi; Yeon Ho Je; Byung Rae Jin
A local strain of Helicoverpa assulta nucleopolyhedrovirus (HasNPV) was isolated from infected H. assulta larvae in Korea. Restriction endonuclease fragment analysis, using 4 restriction enzymes, estimated that the total genome size of HasNPV is about 138 kb. A degenerate polymerase chain reaction (PCR) primer set for the polyhedrin gene successfully amplified the partial polyhedrin gene of HasNPV. The sequencing results showed that the approximately 430 bp PCR product was a fragment of the corresponding polyhedrin gene. Using the HasNPV partial predicted polyhedrin to probe the Southern blots, we identified the location of the polyhedrin gene within the 6 kb EcoRI, 15 kb NcoI, 20 kb XhoI, 17 kb BglII and 3 kb ClaI fragments, respectively. The 3 kb ClaI fragment was cloned and the nucleotide sequences of the polyhedrin coding region and its flanking regions were determined. Nucleotide sequence analysis indicated the presence of an open reading frame of 735 nucleotides which could encode 245 amino acids with a predicted molecular mass of 29 kDa. The nucleotide sequences within the coding region of HasNPV polyhedrin shared 73.7% identity with the polyhedrin gene from Autographa californica NPV but were most closely related to Helicoverpa and Heliothis species NPVs with over 99% sequence identity.
Wagner, Andreas; de la Chaux, Nicole
Horizontal gene transfer in prokaryotes is rampant on short and intermediate evolutionary time scales. It poses a fundamental problem to our ability to reconstruct the evolutionary tree of life. Is it also frequent over long evolutionary distances? To address this question, we analyzed the evolution of 2,091 insertion sequences from all 20 major families in 438 completely sequenced prokaryotic genomes. Specifically, we mapped insertion sequence occurrence on a 16S rDNA tree of the genomes we analyzed, and we also constructed phylogenetic trees of the insertion sequence transposase coding sequences. We found only 30 cases of likely horizontal transfer among distantly related prokaryotic clades. Most of these horizontal transfer events are ancient. Only seven events are recent. Almost all of these transfer events occur between pairs of human pathogens or commensals. If true also for other, non-mobile DNA, the rarity of distant horizontal transfer increases the odds of reliable phylogenetic inference from sequence data.
Glória R. Franco
We have initiated a gene discovery program in Schistosoma mansoni based on the technique of Expressed Sequence Tags (ESTs), i.e. partial sequences of cDNAs obtained from single passes in automatic DNA sequencers. ESTs can be used to identify genes on the basis of their homology with sequences from other species deposited in DNA or protein databases. Transcripts with sequences without matches in the databases may represent novel parasite-specific genes. This approach has proven to be very efficient, and in less than two years a broad range of novel genes has already been ascertained, more than doubling the number of known S. mansoni genes.
Wang, Jun Ling; Yang, Xu; Xia, Kun; Hu, Zheng Mao; Weng, Ling; Jin, Xin; Jiang, Hong; Zhang, Peng; Shen, Lu; Guo, Ji Feng; Li, Nan; Li, Ying Rui; Lei, Li Fang; Zhou, Jie; Du, Juan; Zhou, Ya Fang; Pan, Qian; Wang, Jian; Wang, Jun; Li, Rui Qiang; Tang, Bei Sha
Autosomal-dominant spinocerebellar ataxias constitute a large, heterogeneous group of progressive neurodegenerative diseases with multiple types. To date, classical genetic studies have revealed 31 distinct genetic forms of spinocerebellar ataxias and identified 19 causative genes. Traditional positional cloning strategies, however, have limitations for finding causative genes of rare Mendelian disorders. Here, we used a combined strategy of exome sequencing and linkage analysis to identify a novel spinocerebellar ataxia causative gene, TGM6. We sequenced the whole exome of four patients in a Chinese four-generation spinocerebellar ataxia family and identified a missense mutation, c.1550T-G transition (L517W), in exon 10 of TGM6. This change is at a highly conserved position, is predicted to have a functional impact, and completely cosegregated with the phenotype. The exome results were validated using linkage analysis. The mutation we identified using exome sequencing was located in the same region (20p13-12.2) as that identified by linkage analysis, which cross-validated TGM6 as the causative spinocerebellar ataxia gene in this family. We also showed that the causative gene could be mapped by a combined method of linkage analysis and sequencing of one sample from the family. We further confirmed our finding by identifying another missense mutation c.980A-G transition (D327G) in exon seven of TGM6 in an additional spinocerebellar ataxia family, which also cosegregated with the phenotype. Both mutations were absent in 500 normal unaffected individuals of matched geographical ancestry. The finding of TGM6 as a novel causative gene of spinocerebellar ataxia illustrates whole-exome sequencing of affected individuals from one family as an effective and cost efficient method for mapping genes of rare Mendelian disorders and the use of linkage analysis and exome sequencing for further improving efficiency.
Tercilio Calsa Jr.
Plastid-related sequences, derived from putative nuclear or plastome genes, were searched in a large collection of expressed sequence tags (ESTs) and genomic sequences from the Citrus Biotechnology initiative in Brazil. The identified putative Citrus chloroplast gene sequences were compared to those from Arabidopsis, Eucalyptus and Pinus. Differential expression profiling for plastid-directed nuclear-encoded proteins and photosynthesis-related gene expression variation between Citrus sinensis and Citrus reticulata, when inoculated or not with Xylella fastidiosa, were also analyzed. Presumed Citrus plastome regions were more similar to Eucalyptus. Some putative genes appeared to be preferentially expressed in vegetative tissues (leaves and bark) or in reproductive organs (flowers and fruits). Genes preferentially expressed in fruit and flower may be associated with hypothetical physiological functions. Expression pattern clustering analysis suggested that photosynthesis- and carbon fixation-related genes appeared to be up- or down-regulated in a resistant or susceptible Citrus species after Xylella inoculation in comparison to non-infected controls, generating novel information which may be helpful to develop novel genetic manipulation strategies to control Citrus variegated chlorosis (CVC).
Lisa A McGraw
The prairie vole (Microtus ochrogaster) is an important model organism for the study of social behavior, yet our ability to correlate genes and behavior in this species has been limited due to a lack of genetic and genomic resources. Here we report the BAC-based targeted sequencing of behaviorally-relevant genes and flanking regions in the prairie vole. A total of 6.4 Mb of non-redundant or haplotype-specific sequence assemblies were generated that span the partial or complete sequence of 21 behaviorally-relevant genes as well as an additional 55 flanking genes. Estimates of nucleotide diversity from 13 loci based on alignments of 1.7 Mb of haplotype-specific assemblies revealed an average pair-wise heterozygosity (8.4×10^-3). Comparative analyses of the prairie vole proteins encoded by the behaviorally-relevant genes identified >100 substitutions specific to the prairie vole lineage. Finally, our sequencing data indicate that a duplication of the prairie vole AVPR1A locus likely originated from a recent segmental duplication spanning a minimum of 105 kb. In summary, the results of our study provide the genomic resources necessary for the molecular and genetic characterization of a high-priority set of candidate genes for regulating social behavior in the prairie vole.
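As an illustration of what an average pair-wise heterozygosity estimate measures, the sketch below computes the mean fraction of differing sites over all pairs of aligned haplotype sequences. It is a minimal toy version, assuming gap-free alignments of equal length; the function name and the example sequences are hypothetical and are not the study's data or method.

```python
from itertools import combinations


def pairwise_diversity(aligned_seqs):
    """Mean per-site difference over all pairs of aligned, gap-free sequences."""
    if len(aligned_seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(aligned_seqs[0])
    if length == 0 or any(len(s) != length for s in aligned_seqs):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(aligned_seqs, 2))
    # Count mismatching positions for every pair of sequences.
    differences = sum(
        sum(1 for x, y in zip(a, b) if x != y) for a, b in pairs
    )
    return differences / (len(pairs) * length)


# Hypothetical toy haplotypes, 8 sites each.
toy = ["ACGTACGT", "ACGTACGA", "ACGAACGT"]
print(pairwise_diversity(toy))
```

Real estimates usually also handle gaps, ambiguous bases and per-locus averaging, which this sketch leaves out.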
Voronovsky, Andriy Y; Abbas, Charles A; Dmytruk, Kostyantyn V; Ishchuk, Olena P; Kshanovska, Barbara V; Sybirna, Kateryna A; Gaillardin, Claude; Sibirny, Andriy A
Previously cloned Candida famata (Debaryomyces hansenii) strain VKM Y-9 genomic DNA fragments containing genes RIB1 (codes for GTP cyclohydrolase II), RIB2 (encodes specific reductase), RIB5 (codes for dimethylribityllumazine synthase), RIB6 (encodes dihydroxybutanone phosphate synthase) and RIB7 (codes for riboflavin synthase) were sequenced. The derived amino acid sequences of C. famata RIB genes showed extensive homology to the corresponding sequences of riboflavin synthesis enzymes of other yeast species. The highest identity was observed to homologues of D. hansenii CBS767, as C. famata is the anamorph of this hemiascomycetous yeast. The D. hansenii CBS767 RIB3 gene encoding specific deaminase was cloned. This gene successfully complemented riboflavin auxotrophy of the rib3 mutant of flavinogenic yeast, Pichia guilliermondii. Putative iron-responsive elements (potential sites for binding of the transcription factors Fep1p or Aft1p and Aft2p) were found in the upstream regions of some C. famata and D. hansenii RIB genes. The sequences of C. famata RIB genes have been submitted to the EMBL data library under Accession Nos AJ810169-AJ810173.
Wei, Hongqing; Ma, Hongyu; Ma, Chunyan; Zhang, Fengying; Wang, Wei; Chen, Wei; Ma, Lingbo
The complete mitochondrial genome plays an important role in studies of genome-level characteristics and phylogenetic relationships. Here we determined the complete mitogenome sequence of Tridentiger trigonocephalus (Perciformes, Gobiidae), and discovered its phylogenetic relationship. This circular genome was 16 662 bp in length, and consisted of 37 typical genes, including 13 protein-coding genes, 22 tRNA genes, and two rRNA genes. The gene order of T. trigonocephalus mitochondrial genome was identical to those observed in most other vertebrates. Of 37 genes, 28 were encoded by heavy strand, while the others were encoded by light strand. The phylogenetic tree constructed by 13 concatenated protein-coding genes showed that T. trigonocephalus was closest to T. bifasciatus, and then to T. barbatus among the 20 species within suborder Gobioidei. This work should facilitate the studies on population genetic diversity, and molecular evolution in Gobioidei fishes.
Proteins secreted to the extracellular environment or to the periphery of the cell envelope, the secretome, play essential roles in foraging, antagonistic and mutualistic interactions. We hypothesize that arms races, genetic conflicts and varying selective pressures should lead to the rapid change of sequences and gene repertoires of the secretome. The analysis of 42 bacterial pan-genomes shows that secreted, and especially extracellular proteins, are predominantly encoded in the accessory genome, i.e. among genes not ubiquitous within the clade. Genes encoding outer membrane proteins might engage more frequently in intra-chromosomal gene conversion because they are more often in multi-genic families. The gene sequences encoding the secretome evolve faster than the rest of the genome and in particular at non-synonymous positions. Cell wall proteins in Firmicutes evolve particularly fast when compared with outer membrane proteins of Proteobacteria. Virulence factors are over-represented in the secretome, notably in outer membrane proteins, but cell localization explains more of the variance in substitution rates and gene repertoires than sequence homology to known virulence factors. Accordingly, the repertoires and sequences of the genes encoding the secretome change fast in the clades of obligatory and facultative pathogens and also in the clades of mutualists and free-living bacteria. Our study shows that cell localization shapes genome evolution. In agreement with our hypothesis, the repertoires and the sequences of genes encoding secreted proteins evolve fast. The particularly rapid change of extracellular proteins suggests that these public goods are key players in bacterial adaptation.
Venkatesan, S; Gershowitz, A; Moss, B
The proximal part of the 10,000-base pair (bp) inverted terminal repetition of vaccinia virus DNA encodes at least three early mRNAs. A 2,236-bp segment of the repetition was sequenced to characterize two of the genes. This task was facilitated by constructing a series of recombinants containing overlapping deletions; oligonucleotide linkers with synthetic restriction sites provided points for radioactive labeling before sequencing by the chemical degradation method of Maxam and Gilbert (Methods Enzymol. 65:499-560, 1980). The ends of the transcripts were mapped by hybridizing labeled DNA fragments to early viral RNA and resolving nuclease S1-protected fragments in sequencing gels, by sequencing cDNA clones, and from the lengths of the RNAs. The nucleotide sequences for at least 60 bp upstream of both transcriptional initiation sites are more than 80% adenine . thymine rich and contain long runs of adenines and thymines with some homology to procaryotic and eucaryotic consensus sequences. The gene transcribed in the rightward direction encodes an RNA of approximately 530 nucleotides with a single open reading frame of 420 nucleotides. Preceding the first AUG, there is a heptanucleotide that can hybridize to the 3' end of 18S rRNA with only one mismatch. The derived amino acid sequence of the protein indicated a molecular weight of 15,500. The gene transcribed in the leftward direction encodes an RNA 1,000 to 1,100 nucleotides long with an open reading frame of 996 nucleotides and a leader sequence of only 5 to 6 nucleotides. The derived amino acid sequence of this protein indicated a molecular weight of 38,500. The 3' ends of the two transcripts were located within 100 bp of each other. Although there are adenine . thymine-rich clusters near the putative transcriptional termination sites, specific AATAAA polyadenylic acid signal sequences are absent.
Lau, Yun-Fai; Kan, Yuet Wai
We have developed a series of cosmids that can be used as vectors for genomic recombinant DNA library preparations, as expression vectors in mammalian cells for both transient and stable transformations, and as shuttle vectors between bacteria and mammalian cells. These cosmids were constructed by inserting one of the SV2-derived selectable gene markers-SV2-gpt, SV2-DHFR, and SV2-neo-in cosmid pJB8. High efficiency of genomic cloning was obtained with these cosmids and the size of the inserts was 30-42 kilobases. We isolated recombinant cosmids containing the human α -globin gene cluster from these genomic libraries. The simian virus 40 DNA in these selectable gene markers provides the origin of replication and enhancer sequences necessary for replication in permissive cells such as COS 7 cells and thereby allows transient expression of α -globin genes in these cells. These cosmids and their recombinants could also be stably transformed into mammalian cells by using the respective selection systems. Both of the adult α -globin genes were more actively expressed than the embryonic zeta -globin genes in these transformed cell lines. Because of the presence of the cohesive ends of the Charon 4A phage in the cosmids, the transforming DNA sequences could readily be rescued from these stably transformed cells into bacteria by in vitro packaging of total cellular DNA. Thus, these cosmid vectors are potentially useful for direct isolation of structural genes.
Ben-Ari Fuchs, Shani; Lieder, Iris; Stelzer, Gil; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit
Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from "data-to-knowledge-to-innovation," a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ ( geneanalytics.genecards.org ), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®--the human gene database; the MalaCards-the human diseases database; and the PathCards--the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®--the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene-tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell "cards" in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics, pharmacogenomics, vaccinomics
D'Souza, T.M.; Boominathan, K.; Reddy, C.A. [Michigan State Univ., East Lansing, MI (United States)]
Degenerate primers corresponding to the consensus sequences of the copper-binding regions in the N-terminal domains of known basidiomycete laccases were used to isolate laccase gene-specific sequences from strains representing nine genera of wood rot fungi. All except three gave the expected PCR product of about 200 bp. Computer searches of the databases identified the sequence of each of the PCR products analyzed as a laccase gene sequence, suggesting the specificity of the primers. PCR products of the white rot fungi Ganoderma lucidum, Phlebia brevispora, and Trametes versicolor showed 65 to 74% nucleotide sequence similarity to each other; the similarity in deduced amino acid sequences was 83 to 91%. The PCR products of Lentinula edodes and Lentinus tigrinus, on the other hand, showed relatively low nucleotide and amino acid similarities (58 to 64 and 62 to 81%, respectively); however, these similarities were still much higher than when compared with the corresponding regions in the laccases of the ascomycete fungi Aspergillus nidulans and Neurospora crassa. A few of the white rot fungi, as well as Gloeophyllum trabeum, a brown rot fungus, gave a 144-bp PCR fragment which had a nucleotide sequence similarity of 60 to 71%. Demonstration of laccase activity in G. trabeum and several other brown rot fungi was of particular interest because these organisms were not previously shown to produce laccases. 36 refs., 6 figs., 2 tabs.
Sato, Kengo; Kuroki, Yoko; Kumita, Wakako; Fujiyama, Asao; Toyoda, Atsushi; Kawai, Jun; Iriki, Atsushi; Sasaki, Erika; Okano, Hideyuki; Sakakibara, Yasubumi
The first draft of the common marmoset (Callithrix jacchus) genome was published by the Marmoset Genome Sequencing and Analysis Consortium. The draft was based on whole-genome shotgun sequencing, and the current assembly version is Callithrix_jacches-3.2.1, but there still exist 187,214 undetermined gap regions and supercontigs and relatively short contigs that are unmapped to chromosomes in the draft genome. We performed resequencing and assembly of the genome of common marmoset by deep sequencing with high-throughput sequencing technology. Several different sequence runs using Illumina sequencing platforms were executed, and 181 Gbp of high-quality bases including mate-pairs with long insert lengths of 3, 8, 20, and 40 Kbp were obtained, that is, approximately 60× coverage. The resequencing significantly improved the MGSAC draft genome sequence. The N50 of the contigs, which is a statistical measure used to evaluate assembly quality, doubled. As a result, 51% of the contigs (total length: 299 Mbp) that were unmapped to chromosomes in the MGSAC draft were merged with chromosomal contigs, and the improved genome sequence helped to detect 5,288 new genes that are homologous to human cDNAs and the gaps in 5,187 transcripts of the Ensembl gene annotations were completely filled.
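The N50 statistic mentioned above can be computed as follows: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the total assembly length. This is a minimal sketch of the standard definition, not the authors' pipeline; the function name and the toy lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths."""
    if not lengths:
        raise ValueError("need at least one contig length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that pushes the running sum to half the total defines N50.
        if running >= half_total:
            return length


# Hypothetical toy assembly (arbitrary units).
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because N50 depends only on the length distribution, merging short contigs into longer ones raises it, which is why it is used to compare assembly contiguity.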
A 605 bp section of the mitochondrial 16S rRNA gene from Paralichthys olivaceus, Pseudorhombus cinnamomeus, Psetta maxima and Kareius bicoloratus, which represent 3 families of the order Pleuronectiformes, was amplified by PCR and sequenced to examine the molecular systematics of Pleuronectiformes, in comparison with the corresponding gene sequences of 6 other flatfish downloaded from GenBank. Phylogenetic analysis based on genetic distances among the sequences of the 10 flatfish showed that this method is well suited to exploring the relationships between species, genera and families. Phylogenetic trees built with the neighbor-joining, maximum parsimony and maximum likelihood methods agreed with the generally accepted pattern of Pleuronectiformes evolution, but they also produced some inconsistencies. In contrast to data from morphological characters, P. olivaceus clustered with K. bicoloratus, whereas P. cinnamomeus did not cluster with P. olivaceus, which is worth further study.
Tajima, K; Nakajima, K; Yamashita, H; Shiba, T; Munekata, M; Takai, M
The beta-glucosidase gene (bglxA) was cloned from the genomic DNA of Acetobacter xylinum ATCC 23769 and its nucleotide sequence (2200 bp) was determined. This bglxA gene was present downstream of the cellulose synthase operon and coded for a polypeptide of molecular mass 79 kDa. The overexpression of the beta-glucosidase in A. xylinum caused a tenfold increase in activity compared to the wild-type strain. In addition, the action pattern of the enzyme was identified as G3ase activity. The deduced amino acid sequence of the bglxA gene showed 72.3%, 49.6%, and 45.1% identity with the beta-glucosidases from A. xylinum subsp. sucrofermentans, Cellvibrio gilvus, and Mycobacterium tuberculosis, respectively. Based on amino acid sequence similarities, the beta-glucosidase (BglxA) was assigned to family 3 of the glycosyl hydrolases.
Translation initiation is governed by a limited number of mRNA sequence motifs within the translation initiation region (TIR). In bacteria and bacteriophages, one of the most important determinants is a Shine-Dalgarno (SD) sequence that base pairs with the anti-SD sequence GAUCACCUCCUUA located at the 3' end of 16S rRNA. This work assesses the diversity of TIR features in phage T4, focusing on the SD sequence, its spacing from the start codon, and its relationship to gene expression and essentiality patterns. The analysis shows that GAGG is the predominant core SD motif in T4 and its related phages, particularly in early genes. The possible implication of RegB activity is discussed.
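The SD features discussed above (motif identity and spacing to the start codon) can be illustrated with a small scan. The sketch below checks whether a motif is the Watson-Crick reverse complement of part of the anti-SD sequence quoted in the abstract, and reports where the motif occurs in the region immediately upstream of a start codon. The function names, the 15-nucleotide spacing cutoff, and the example upstream sequence are all hypothetical; G-U wobble pairing is ignored.

```python
ANTI_SD = "GAUCACCUCCUUA"  # anti-SD sequence as quoted in the abstract, 5'->3'

_COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq):
    """Upper-case a sequence and write it in the RNA alphabet."""
    return seq.upper().replace("T", "U")


def reverse_complement(seq):
    """Watson-Crick reverse complement in the RNA alphabet."""
    return "".join(_COMPLEMENT[base] for base in reversed(to_rna(seq)))


def pairs_with_anti_sd(motif):
    """True if the motif can pair contiguously with part of the anti-SD."""
    return reverse_complement(motif) in ANTI_SD


def find_sd(upstream, motif="GAGG", max_spacing=15):
    """Find motif hits in the sequence directly 5' of a start codon.

    Returns (position, spacing) tuples, where spacing is the number of
    nucleotides between the end of the motif and the start codon.
    """
    upstream, motif = to_rna(upstream), to_rna(motif)
    hits = []
    start = upstream.find(motif)
    while start != -1:
        spacing = len(upstream) - (start + len(motif))
        if spacing <= max_spacing:
            hits.append((start, spacing))
        start = upstream.find(motif, start + 1)
    return hits


# Hypothetical upstream region (not a real T4 sequence).
print(pairs_with_anti_sd("GAGG"), find_sd("TTAAGAGGAAATTACC"))
```

Applied gene by gene, a scan of this kind would give the motif and spacing distributions that an analysis like the one described could then compare between early and late genes.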
High-throughput next-generation sequencing (NGS) technology produces a tremendous amount of raw sequence data. The challenges for researchers are to process the raw data, to map the sequences to the genome, to discover variants that are different from the reference genome, and to prioritize/rank the variants for the question of interest. The recent development of many computational algorithms and programs has vastly improved the ability to translate sequence data into valuable information for disease gene identification. However, NGS data analysis is complex and could be overwhelming for researchers who are not familiar with the process. Here, we outline the analysis pipeline and describe some of the most commonly used principles and tools for analyzing NGS data for disease gene identification.
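The final step named above, prioritizing variants, often comes down to filtering and ranking. The sketch below is a toy illustration only: the record fields, the effect ranking, and the thresholds are hypothetical choices made for this example and do not correspond to any specific tool or to the pipeline the authors describe.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str               # gene symbol (placeholder names below)
    quality: float          # variant call quality score
    population_freq: float  # allele frequency in a reference population
    effect: str             # predicted effect on the protein


# Hypothetical severity ranking; higher means more likely to be damaging.
EFFECT_RANK = {"stop_gained": 3, "missense": 2, "synonymous": 0}


def prioritize(variants, min_quality=30.0, max_freq=0.01):
    """Keep confident, rare, protein-altering variants and rank them."""
    kept = [
        v for v in variants
        if v.quality >= min_quality
        and v.population_freq <= max_freq
        and EFFECT_RANK.get(v.effect, 0) > 0
    ]
    # Most severe effect first; among equals, the rarest variant first.
    return sorted(kept, key=lambda v: (-EFFECT_RANK[v.effect], v.population_freq))


candidates = [
    Variant("GENE_A", 50.0, 0.001, "missense"),
    Variant("GENE_B", 10.0, 0.0001, "stop_gained"),  # fails the quality filter
    Variant("GENE_C", 60.0, 0.2, "missense"),        # too common
    Variant("GENE_D", 45.0, 0.0005, "stop_gained"),
]
print([v.gene for v in prioritize(candidates)])
```

In practice the thresholds depend on the disease model (for example, dominant versus recessive inheritance), so they are left as parameters here.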
Background: Plants from temperate regions are able to withstand freezing temperatures due to a process known as cold acclimation, which is a prior exposure to low, but non-freezing temperatures. During acclimation, a large number of genes are induced, bringing about biochemical changes in the plant, thought to be responsible for the subsequent increase in freezing tolerance. Key regulatory proteins in this process are the CBF1, 2 and 3 transcription factors which control the expression of a set of target genes referred to as the "CBF regulon". Results: To assess the role of the CBF genes in cold acclimation and freezing tolerance of Arabidopsis thaliana, the CBF genes and their promoters were sequenced in the Versailles core collection, a set of 48 accessions that maximizes the naturally-occurring genetic diversity, as well as in the commonly used accessions Col-0 and WS. Extensive polymorphism was found in all three genes. Freezing tolerance was measured in all accessions to assess the variability in acclimated freezing tolerance. The effect of sequence polymorphism was investigated by evaluating the kinetics of CBF gene expression, as well as that of a subset of the target COR genes, in a set of eight accessions with contrasting freezing tolerance. Our data indicate that CBF genes as well as the selected COR genes are cold induced in all accessions, irrespective of their freezing tolerance. Although we observed different levels of expression in different accessions, CBF or COR gene expression was not closely correlated with freezing tolerance. Conclusion: Our results indicate that the Versailles core collection contains significant natural variation with respect to freezing tolerance, polymorphism in the CBF genes and CBF and COR gene expression. Although there tends to be more CBF and COR gene expression in tolerant accessions, there are exceptions, reinforcing the idea that a complex network of genes is involved in freezing tolerance.
She, Q; Sandal, N N; Stougaard, J
The soybean leghemoglobin lba gene promoter sequence was determined and aligned with the promoter sequence of the soybean lbc3 gene from the same gene family. Five highly conserved regions were found. There are two large conserved regions, one of which overlaps the basic promoter while the other ...
The concept of utilizing putative and unique gene sequences for the design of species specific probes was tested. The abundance profile of assigned functions within the Lactobacillus plantarum genome was used for the identification of the putative and unique gene sequence, csh. The targeted gene (cs...
Background: The synaptic cell adhesion molecules, protocadherins, are a vertebrate innovation that accompanied the emergence of the neural tube and the elaborate central nervous system. In mammals, the protocadherins are encoded by three closely-linked clusters (α, β and γ) of tandem genes and are hypothesized to provide a molecular code for specifying the remarkably-diverse neural connections in the central nervous system. Like mammals, the coelacanth, a lobe-finned fish, contains a single protocadherin locus, also arranged into α, β and γ clusters. Zebrafish, however, possesses two protocadherin loci that contain more than twice the number of genes as the coelacanth, but arranged only into α and γ clusters. To gain further insight into the evolutionary history of protocadherin clusters, we have sequenced and analyzed protocadherin clusters from the compact genome of the pufferfish, Fugu rubripes. Results: Fugu contains two unlinked protocadherin loci, Pcdh1 and Pcdh2, that collectively consist of at least 77 genes. The fugu Pcdh1 locus has been subject to extensive degeneration, resulting in the complete loss of Pcdh1γ cluster. The fugu Pcdh genes have undergone lineage-specific regional gene conversion processes that have resulted in a remarkable regional sequence homogenization among paralogs in the same subcluster. Phylogenetic analyses show that most protocadherin genes are orthologous between fugu and zebrafish either individually or as paralog groups. Based on the inferred phylogenetic relationships of fugu and zebrafish genes, we have reconstructed the evolutionary history of protocadherin clusters in the teleost fish lineage. Conclusion: Our results demonstrate the exceptional evolutionary dynamism of protocadherin genes in vertebrates in general, and in teleost fishes in particular. Besides the 'fish-specific' whole genome duplication, the evolution of protocadherin genes in teleost fishes is influenced by lineage
Lee, Chao-Zong; Liou, Guey-Yuh; Yuan, Gwo-Fang
Aflatoxins are polyketide-derived secondary metabolites produced by Aspergillus parasiticus, Aspergillus flavus, Aspergillus nomius and a few other species. The toxic effects of aflatoxins have adverse consequences for human health and agricultural economics. The aflR gene, a regulatory gene for aflatoxin biosynthesis, encodes a protein containing a zinc-finger DNA-binding motif. Although Aspergillus oryzae and Aspergillus sojae, which are used in fermented foods and in ingredient manufacture, have no record of producing aflatoxin, they have been shown to possess an aflR gene. This study examined 34 strains of Aspergillus section Flavi. The aflR gene of 23 of these strains was successfully amplified and sequenced. No aflR PCR products were found in five A. sojae strains or six strains of A. oryzae. These PCR results suggested that the aflR gene is absent or significantly different in some A. sojae and A. oryzae strains. The sequenced aflR genes from the 23 positive strains had greater than 96.6 % similarity, which was particularly conserved in the zinc-finger DNA-binding domain. The aflR gene of A. sojae has two obvious characteristics: an extra CTCATG sequence fragment and a C to T transition that causes premature termination of AFLR protein synthesis. Differences between A. parasiticus/A. sojae and A. flavus/A. oryzae aflR genes were also identified. Some strains of A. flavus as well as A. flavus var. viridis, A. oryzae var. viridis and A. oryzae var. effuses have an A. oryzae-type aflR gene. For all strains with the A. oryzae-type aflR gene, there was no evidence of aflatoxin production. It is suggested that for safety reasons, the aflR gene could be examined to assess possible aflatoxin production by Aspergillus section Flavi strains.
廖威明; 张春林; 李佛保; 曾炳芳; 曾益新
Background: Mutation and expression change of p21WAF1/CIP1 may play a role in the growth of osteosarcoma. This study was to investigate the expression of the p21WAF1/CIP1 gene in human osteosarcoma, p21WAF1/CIP1 gene DNA sequence change and their relationships with the phenotype and clinical prognosis. Methods: The p21WAF1/CIP1 gene in 10 normal people and the tumours of 45 osteosarcoma patients were examined using polymerase chain reaction-single strand conformation polymorphism (PCR-SSCP) with silver staining. The PCR product with an abnormal strand was sequenced directly. The p21WAF1/CIP1 gene mRNA and P21 protein of 45 cases of osteosarcoma were investigated by using in situ hybridization and immunohistochemistry, respectively. Results: The occurrence of P21 protein in osteosarcoma was 17.78% (8/45), and p21WAF1/CIP1 mRNA expression in osteosarcoma was 42.22% (19/45). DNA sequencing of the amplified p21WAF1/CIP1 products showed that in p21WAF1/CIP1 gene exon 3 of 36 cases of human osteosarcoma, there were 17 cases (47.22%) with C→T at position 609; DNA sequence analysis of 10 normal blood samples yielded 8 cases (80.00%) with C→T at the same position. Conclusions: Along with the increase of malignancy, the expression of p21WAF1/CIP1 mRNA and P21 protein in osteosarcoma tends to decrease. It is uncommon for the p21WAF1/CIP1 gene mutation to occur in human osteosarcoma. As a result, the possible existence of tumour subtypes of p21WAF1/CIP1 gene mutation should be investigated. Our research leads to the location of p21WAF1/CIP1 gene polymorphism of Chinese osteosarcoma patients, which can provide a basis for further research.
Shearman, C; Underwood, H; Jury, K; Gasson, M
A gene for the lysin of Lactococcus lactis bacteriophage phi vML3 was cloned using an Escherichia coli/bacteriophage lambda host-vector system. The gene was detected by its expression of antimicrobial activity against L. lactis cells in a bioassay. The cloned fragment was analysed by sub-cloning onto E. coli plasmid vectors and by restriction endonuclease and deletion mapping. Its entire DNA sequence was determined and an open reading frame for the lysin structural gene was identified. The sequenced lysin gene would express a protein of 187 amino acids with a molecular weight of 21,090, which is in good agreement with that of a protein detected after in vitro transcription and translation of DNA encoding the gene. Expression of the lysin gene in E. coli and B. subtilis from an adjacent bacteriophage promoter was readily detected but in L. lactis expression of lysin was found to be lethal. The bacteriophage phi vML3 lysin had sequence homology with protein 15 of B. subtilis bacteriophage PZA. This protein is involved in DNA packaging during bacteriophage maturation rather than in host cell lysis. The cloning and analysis of the phi vML3 lysin gene is of importance in further understanding lactic streptococcal bacteriophages, for the development of positive selection vectors and for biotechnological applications of relevance to the dairy industry.
Mesarich, Carl H; Griffiths, Scott A; van der Burgt, Ate; Okmen, Bilal; Beenen, Henriek G; Etalo, Desalegn W; Joosten, Matthieu H A J; de Wit, Pierre J G M
The Cf-5 gene of tomato confers resistance to strains of the fungal pathogen Cladosporium fulvum carrying the avirulence gene Avr5. Although Cf-5 has been cloned, Avr5 has remained elusive. We report the cloning of Avr5 using a combined bioinformatic and transcriptome sequencing approach. RNA-Seq was performed on the sequenced race 0 strain (0WU; carrying Avr5), as well as a race 5 strain (IPO 1979; lacking a functional Avr5 gene) during infection of susceptible tomato. Forty-four in planta-induced C. fulvum candidate effector (CfCE) genes of 0WU were identified that putatively encode a secreted, small cysteine-rich protein. An expressed transcript sequence comparison between strains revealed two polymorphic CfCE genes in IPO 1979. One of these conferred avirulence to IPO 1979 on Cf-5 tomato following complementation with the corresponding 0WU allele, confirming identification of Avr5. Complementation also led to increased fungal biomass during infection of susceptible tomato, signifying a role for Avr5 in virulence. Seven of eight race 5 strains investigated escape Cf-5-mediated resistance through deletion of the Avr5 gene. Avr5 is heavily flanked by repetitive elements, suggesting that repeat instability, in combination with Cf-5-mediated selection pressure, has led to the emergence of race 5 strains deleted for the Avr5 gene.
Background: Acanthamoeba keratitis is caused by pathogenic Acanthamoeba such as A. palestinensis. Indeed, this species is one of the known causative agents of amoebic keratitis in Iran. Mannose Binding Protein (MBP) is the main pathogenicity factor in the development of this sight-threatening disease. We aimed to characterize the MBP gene in pathogenic Acanthamoeba isolates such as A. palestinensis. Methods: This experimental research was performed in the School of Public Health, Tehran University of Medical Sciences, Tehran, Iran during 2007-2008. A. palestinensis was grown on 2% non-nutrient agar overlaid with Escherichia coli. DNA extraction was performed using the phenol-chloroform method. PCR amplification was done using specific primer pairs for MBP. The amplified fragment was purified and sequenced. Finally, the obtained fragment was deposited in the gene data bank. Results: A 900 bp PCR product was recovered after the PCR reaction. Sequence analysis of the purified PCR product revealed a gene with 943 nucleotides. Homology analysis of the obtained sequence showed 81% similarity with the available MBP gene in the gene data bank. The fragment was deposited in the gene data bank under accession number EU678895. Conclusion: MBP is known as the most important factor in the Acanthamoeba pathogenesis cascade. Therefore, characterization of this gene can aid in developing better therapeutic agents and even immunization of high-risk people.
Ya-e ZHAO; Zheng-hang WANG; Yang XU; Ji-ru XU; Wen-yan LIU; Meng WEI; Chu-ying WANG
To our knowledge, few reports on Demodex studied at the molecular level are available at present. In this study our group, for the first time, cloned, sequenced and analyzed the chitin synthase (CHS) gene fragments of Demodex folliculorum, Demodex brevis, and Demodex canis (three isolates from each species) from Xi'an, China, by designing specific primers based on the only partial sequence of the CHS gene of D. canis from Japan, retrieved from GenBank. Results show that amplification was successful only in three D. canis isolates and one D. brevis isolate out of the nine Demodex isolates. The obtained fragments were sequenced to be 339 bp for D. canis and 338 bp for D. brevis. The CHS gene sequence similarities between the three Xi'an D. canis isolates and one Japanese D. canis isolate ranged from 99.7% to 100.0%, and those between four D. canis isolates and one D. brevis isolate were 99.1%-99.4%. Phylogenetic trees based on maximum parsimony (MP) and maximum likelihood (ML) methods shared the same clusters, according with the traditional classification. Two open reading frames (ORFs) were identified in each CHS gene sequenced, and their corresponding amino acid sequences were located at the catalytic domain. The relatively conserved sequences could be deduced to be a CHS class A gene, which is associated with chitin synthesis in the integument of Demodex mites.
Zhang, Bochao; Meng, Wenzhao; Prak, Eline T Luning; Hershberg, Uri
Immune repertoires are collections of lymphocytes that express diverse antigen receptor gene rearrangements consisting of Variable (V), (Diversity (D) in the case of heavy chains) and Joining (J) gene segments. Clonally related cells typically share the same germline gene segments and have highly similar junctional sequences within their third complementarity determining regions. Identifying clonal relatedness of sequences is a key step in the analysis of immune repertoires. The V gene is the most important for clone identification because it has the longest sequence and the greatest number of sequence variants. However, accurate identification of a clone's germline V gene source is challenging because there is a high degree of similarity between different germline V genes. This difficulty is compounded in antibodies, which can undergo somatic hypermutation. Furthermore, high-throughput sequencing experiments often generate partial sequences and have significant error rates. To address these issues, we describe a novel method to estimate which germline V genes (or alleles) cannot be discriminated under different conditions (read lengths, sequencing errors or somatic hypermutation frequencies). Starting with any set of germline V genes, this method measures their similarity using different sequencing lengths and calculates their likelihood of unambiguous assignment under different levels of mutation. Hence, one can identify, under different experimental and biological conditions, the germline V genes (or alleles) that cannot be uniquely identified and bundle them together into groups of specific V genes with highly similar sequences.
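The bundling idea in the abstract above can be illustrated with a much simplified sketch. It is not the authors' likelihood-based method: it assumes pre-aligned, equal-length germline sequences, keeps only the last `read_len` bases of each (which end a read covers is an assumption), and bundles any two genes that differ at fewer than `min_diff` positions. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def bundle_indistinguishable(genes: dict, read_len: int, min_diff: int) -> list:
    """Group genes whose last `read_len` bases differ at fewer than `min_diff` sites.

    `min_diff` is a crude stand-in for the mutation/error tolerance; the
    published method estimates a likelihood instead (simplification).
    """
    parent = {name: name for name in genes}

    def find(x: str) -> str:
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(genes, 2):
        if hamming(genes[a][-read_len:], genes[b][-read_len:]) < min_diff:
            parent[find(a)] = find(b)

    groups = {}
    for name in genes:
        groups.setdefault(find(name), set()).add(name)
    return list(groups.values())


# Toy, made-up "germline" sequences (not real V genes).
toy = {"V1": "AAAACCCCGG", "V2": "TTAACCCCGG", "V3": "AAAACCCCTT"}
full = bundle_indistinguishable(toy, read_len=10, min_diff=1)
short = bundle_indistinguishable(toy, read_len=6, min_diff=1)
```

With full-length reads the three toy genes stay separate, while six-base reads can no longer tell V1 from V2, so the two are bundled together.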
En Li; Xiaoqiang Li; Xiaobing Wu; Ge Feng; Man Zhang; Haitao Shi; Lijun Wang; Jianping Jiang
In this study, the complete nucleotide sequence (18,321 bp) of the mitochondrial (mt) genome of the round-tongued floating frog, Occidozyga martensii was determined. Although, the base composition and codon usage of O. martensii conformed to the typical vertebrate patterns, this mt genome contained 23 tRNAs (a tandem duplication of tRNA-Met gene). The LTPF tRNA-gene cluster, and the derived position of the ND5 gene downstream of the control region, were present in this mitogenome. Moreover, we found that in the WANCY tRNA-gene cluster, the tRNA-Asn gene was located between the tRNA-Tyr and COI genes instead of between the tRNA-Ala and tRNA-Cys genes, which is a novel mtDNA gene rearrangement in vertebrates. Based on the concatenated nucleotide sequences of the 13 protein-coding genes, phylogenetic analysis (BI, ML, MP) was performed to further clarify the phylogenetic relations of this species within anurans.
Dean, Rebecca; Wright, Alison E; Marsh-Rollo, Susan E; Nugent, Bridget M; Alonzo, Suzanne H; Mank, Judith E
Gene expression differences between males and females often underlie sexually dimorphic phenotypes, and the expression levels of genes that are differentially expressed between the sexes are thought to respond to sexual selection. Most studies on the transcriptomic response to sexual selection treat sexual selection as a single force, but postmating sexual selection in particular is expected to specifically target gonadal tissue. The three male morphs of the ocellated wrasse (Symphodus ocellatus) make it possible to test the role of postmating sexual selection in shaping the gonadal transcriptome. Nesting males hold territories and have the highest reproductive success, yet we detected feminization of their gonadal gene expression compared to satellite males. Satellite males are less brightly coloured and experience more intense sperm competition than nesting males. In line with postmating sexual selection affecting gonadal gene expression, we detected a more masculinized expression profile in satellites. Sneakers are the lowest quality males and showed both de-masculinization and de-feminization of gene expression. We also detected higher rates of gene sequence evolution of male-biased genes compared to unbiased genes, which could at least in part be explained by positive selection. Together, these results reveal the potential for postmating sexual selection to drive higher rates of gene sequence evolution and shape the gonadal transcriptome profile.
Functional gene ecological analyses using amplicon sequencing can be challenging as translated sequences are often burdened with shifted reading frames. The aim of this work was to evaluate several bioinformatics tools designed to correct errors which arise during sequencing in an effort to reduce the number of frame-shifts (FS). Genes encoding alpha subunits of biphenyl (bphA) and benzoate (benA) dioxygenases were used as model sequences. FrameBot, a FS correction tool, was able to reduce the number of detected FS to zero. However, up to 43.1% of sequences were discarded by FrameBot as non-specific targets. Therefore, we proposed a de novo mode of FrameBot for FS correction, which works on a similar basis as common chimera identifying platforms and is not dependent on reference sequences. By nature of the FrameBot de novo design, it is crucial to provide it with data as error free as possible. We tested the ability of several publicly available correction tools to decrease the number of errors in the data sets. The combination of Maximum Expected Error (MEE) filtering and single linkage pre-clustering (SLP) proved the most efficient read processing procedure. Applying FrameBot de novo on the processed data enabled analysis of BphA sequences with minimal losses of potentially functional sequences not homologous to those previously known. This experiment also demonstrated the extensive diversity of dioxygenases in soil. A script which performs FrameBot de novo is presented in the supplementary material to the study and the tool was implemented into FunGene Pipeline available at http://fungene.cme.msu.edu/FunGenePipeline/ and https://github.com/rdpstaff/Framebot.
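Expected-error filtering of the kind named above is commonly computed by summing per-base error probabilities derived from Phred scores, E = Σ 10^(-Q/10). The following is a minimal sketch under stated assumptions: qualities are Phred+33 encoded ASCII strings, and the cutoff `max_ee=1.0` is an illustrative value, not one reported in the abstract. The function names are hypothetical.

```python
def expected_errors(quality: str, offset: int = 33) -> float:
    """Sum of per-base error probabilities for a Phred-encoded quality string."""
    return sum(10 ** (-(ord(ch) - offset) / 10) for ch in quality)


def mee_filter(reads, max_ee: float = 1.0):
    """Keep (sequence, quality) pairs whose expected error count is <= max_ee."""
    return [(seq, qual) for seq, qual in reads if expected_errors(qual) <= max_ee]


# Toy reads: 'I' encodes Q40 (error probability 1e-4), '!' encodes Q0 (probability 1).
reads = [("ACGT", "IIII"), ("ACGT", "!!!!")]
kept = mee_filter(reads)
```

Only the high-quality read survives here, because four Q0 bases give four expected errors against a cutoff of one.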
Background Mungbean is an important economic crop in Asia. However, genomic research has lagged behind other crop species due to the lack of polymorphic DNA markers found in this crop. The objective of this work is to develop and characterize microsatellite or simple sequence repeat (SSR) markers from genome shotgun sequencing of mungbean. Result We have generated and characterized a total of 470,024 genome shotgun sequences covering 100.5 Mb of the mungbean (Vigna radiata (L.) Wilczek) genome using 454 sequencing technology. We identified 1,493 SSR motifs that could be used as potential molecular markers. Among 192 tested primer pairs in 17 mungbean accessions, 60 loci revealed polymorphism with polymorphic information content (PIC) values ranging from 0.0555 to 0.6907 with an average of 0.2594. The majority of microsatellite markers were transferable in Vigna species, whereas transferability rates were only 22.90% and 24.43% in Phaseolus vulgaris and Glycine max, respectively. We also used 16 SSR loci to evaluate the phylogenetic relationship of 35 genotypes of the Asian Vigna group. The genome survey sequences were further analyzed to search for gene content. The evidence suggested that 1,542 gene fragments have been sequence tagged, which fell within intersected existing gene models and shared sequence homology with other proteins in the database. Furthermore, potential microRNAs that could regulate developmental stages and environmental responses were discovered from this dataset. Conclusion In this report, we provided evidence of generating remarkable levels of diverse microsatellite markers and gene content from high throughput genome shotgun sequencing of the mungbean genomic DNA. The markers could be used in germplasm analysis, accessing genetic diversity and linkage mapping of mungbean.
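For readers unfamiliar with PIC, one widely used definition (Botstein et al., 1980) is PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j², where p_i is the frequency of allele i at a locus. The abstract does not say which formula was used, so the sketch below is an assumption, and the allele labels are made up.

```python
from collections import Counter
from itertools import combinations


def pic(alleles) -> float:
    """Polymorphic information content of one locus from a list of observed alleles.

    Uses the Botstein et al. (1980) form; assumed, not confirmed by the abstract.
    """
    counts = Counter(alleles)
    total = sum(counts.values())
    freqs = [c / total for c in counts.values()]
    homozygosity = sum(p * p for p in freqs)
    cross_term = sum(2 * p * p * q * q for p, q in combinations(freqs, 2))
    return 1 - homozygosity - cross_term


# Hypothetical allele calls at one SSR locus across eight accessions.
two_equal_alleles = pic(["a", "b"] * 4)
monomorphic = pic(["a"] * 5)
```

Two equally frequent alleles give a PIC of 0.375, and a monomorphic locus gives 0, which is why only loci with more than one allele count as informative markers.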
Ben-Ari Fuchs, Shani; Lieder, Iris; Mazor, Yaron; Buzhor, Ella; Kaplan, Sergey; Bogoch, Yoel; Plaschkes, Inbar; Shitrit, Alina; Rappaport, Noa; Kohn, Asher; Edgar, Ron; Shenhav, Liraz; Safran, Marilyn; Lancet, Doron; Guan-Golan, Yaron; Warshawsky, David; Shtrichman, Ronit
Abstract Postgenomics data are produced in large volumes by life sciences and clinical applications of novel omics diagnostics and therapeutics for precision medicine. To move from “data-to-knowledge-to-innovation,” a crucial missing step in the current era is, however, our limited understanding of biological and clinical contexts associated with data. Prominent among the emerging remedies to this challenge are the gene set enrichment tools. This study reports on GeneAnalytics™ (geneanalytics.genecards.org), a comprehensive and easy-to-apply gene set analysis tool for rapid contextualization of expression patterns and functional signatures embedded in the postgenomics Big Data domains, such as Next Generation Sequencing (NGS), RNAseq, and microarray experiments. GeneAnalytics' differentiating features include in-depth evidence-based scoring algorithms, an intuitive user interface and proprietary unified data. GeneAnalytics employs the LifeMap Science's GeneCards suite, including the GeneCards®—the human gene database; the MalaCards—the human diseases database; and the PathCards—the biological pathways database. Expression-based analysis in GeneAnalytics relies on the LifeMap Discovery®—the embryonic development and stem cells database, which includes manually curated expression data for normal and diseased tissues, enabling advanced matching algorithm for gene–tissue association. This assists in evaluating differentiation protocols and discovering biomarkers for tissues and cells. Results are directly linked to gene, disease, or cell “cards” in the GeneCards suite. Future developments aim to enhance the GeneAnalytics algorithm as well as visualizations, employing varied graphical display items. Such attributes make GeneAnalytics a broadly applicable postgenomics data analyses and interpretation tool for translation of data to knowledge-based innovation in various Big Data fields such as precision medicine, ecogenomics, nutrigenomics
It is known that the fragrant trait in rice (Oryza sativa L.) is largely controlled by the fgr gene on chromosome 8, and that an 8 bp deletion and three single nucleotide polymorphisms (SNPs) in exon 7 affect this trait. In this study, sequence alignment analysis of fgr exon 7 on chromosome 8 for 11 fragrant and non-fragrant cultivars revealed that 5 aromatic rice cultivars carried the 3 SNPs and the 8 bp deletion in exon 7, which terminates prematurely at a TAA stop codon. However, 5 of the non-aromatic cultivars showed a sequence identical to the published sequence of Nipponbare, a non-fragrant Japonica variety. An exception was Bejar, which had the 8 bp deletion and 3 SNPs but was non-aromatic. Sequencing can determine the nucleotide sequence of a gene and give useful information about gene function. In silico prediction showed that the aligned protein sequences of the fgr gene differed between the Khazar and Domsiah genotypes. The non-fragrant Khazar genotype has the complete betaine aldehyde dehydrogenase enzyme, with a full length of 503 amino acids, while the fragrant Domsiah genotype has a non-functional BADH2 enzyme of 251 amino acids, which results in accumulation of 2-acetyl-1-pyrroline (2AP) and produces aroma in fragrant genotypes.
He, Y L; Wu, Y H; Quan, F S; Liu, Y G; Zhang, Y
To better understand the function of the myostatin gene and its promoter region in bovine, we amplified and sequenced the myostatin gene and promoter from the blood of Qinchuan and Red Angus cattle by using polymerase chain reaction. The sequences of Qinchuan and Red Angus cattle were compared with those of other cattle breeds available in GenBank. Exon splice sites were confirmed by mRNA sequencing. Compared to the published sequence (GenBank accession No. AF320998), 69 single nucleotide polymorphisms (SNPs) were identified in the Qinchuan myostatin gene, only one of which was an insertion mutation in Qinchuan cattle. There was a 16-bp insertion in the first 705-bp intron in 3 Qinchuan cattle. A total of 7 SNPs were identified in exon 3, in which the mutation occurred in the third base of the codon and was synonymous. On comparing the Qinchuan myostatin gene sequence to that of Red Angus cattle, a total of 50 SNPs were identified in the first and third exons. In addition, there were 18 SNPs identified in the Qinchuan cattle promoter region compared with those of other cattle breeds (GenBank accession No. AF348479), but only 14 SNPs when compared to the Red Angus cattle myostatin promoter region.
张海予; 林敏; 萧凤回; 朱新生; 方宣钧; 尤崇杓; 朱玉贤
Total DNA of Alcaligenes faecalis was probed with both the nifH and nifHD sequences from K. pneumoniae. One positive band of about 4.6 kb was discovered. This nifH homologous fragment was cloned into the vector pBluescript SK to construct the recombinant plasmid pBZl. The inserted fragment in pBZl was analyzed by physical mapping and was further subcloned for sequencing. It was found that this A. faecalis nifHDK homology possessed a typical σ54-dependent promoter region with upstream activator sequence (UAS) and A-T rich region. The nifH and nifD ORFs were 888 and 1,476 bp long respectively. The GC contents of these two genes were about 61.6% and 60.0%. The intergenic regions of nifH-nifD and nifD-nifK were 101 and 105 bp respectively. There were separate SD sequences upstream of all the three genes. The deduced amino acid sequences of the nifH gene product (the Fe-protein) and the nifD gene product (the Mo-Fe-protein) were also highly homologous to other nitrogen-fixing bacteria, especially in th
Sakoyama, Y.; Hong, K.J.; Byun, S.M.; Hisajima, H.; Ueda, S.; Yaoita, Y.; Hayashida, H.; Miyata, T.; Honjo, T.
To determine the phylogenetic relationships among hominoids and the dates of their divergence, the complete nucleotide sequences of the constant region of the immunoglobulin eta-chain (C_eta1) genes from chimpanzee and orangutan have been determined. These sequences were compared with the human eta-chain constant-region sequence. A molecular clock (silent molecular clock), measured by the degree of sequence divergence at the synonymous (silent) positions of protein-encoding regions, was introduced for the present study. From the comparison of nucleotide sequences of α1-antitrypsin and β- and delta-globulin genes between humans and Old World monkeys, the silent molecular clock was calibrated: the mean evolutionary rate of silent substitution was determined to be 1.56 × 10^-9 substitutions per site per year. Using the silent molecular clock, the mean divergence dates of chimpanzee and orangutan from the human lineage were estimated as 6.4 ± 2.6 million years and 17.3 ± 4.5 million years, respectively. It was also shown that the evolutionary rate of primate genes is considerably slower than those of other mammalian genes.
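A molecular clock of this kind is usually applied as T = K / (2r), where K is the pairwise divergence per silent site and r is the substitution rate per site per year along each lineage. The factor of two (both lineages accumulate changes since the split) is an assumption about how the rate was defined; the abstract does not spell it out. The value K = 0.02 below is purely illustrative and is not taken from the study.

```python
def divergence_time_years(pairwise_divergence: float, rate_per_site_per_year: float) -> float:
    """Estimate time since divergence as T = K / (2 * r).

    Assumes the rate applies to each lineage separately (hence the factor 2)
    and that K is already corrected for multiple substitutions.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    return pairwise_divergence / (2 * rate_per_site_per_year)


SILENT_RATE = 1.56e-9  # substitutions per site per year, as reported in the abstract
illustrative_k = 0.02  # hypothetical silent-site divergence, for illustration only
t_years = divergence_time_years(illustrative_k, SILENT_RATE)
```

Under these assumptions a silent-site divergence of about 2% corresponds to roughly 6.4 million years, which shows the order of magnitude of divergence involved in the estimates quoted above.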
Sekhon, Rajandeep S.; Briskine, Roman; Hirsch, Candice N.; Myers, Chad L.; Springer, Nathan M.; Buell, C. Robin; de Leon, Natalia; Kaeppler, Shawn M.
Transcriptome analysis is a valuable tool for identification and characterization of genes and pathways underlying plant growth and development. We previously published a microarray-based maize gene atlas from the analysis of 60 unique spatially and temporally separated tissues from 11 maize organs. To enhance the coverage and resolution of the maize gene atlas, we have analyzed 18 selected tissues representing five organs using RNA sequencing (RNA-Seq). For a direct comparison of the two methodologies, the same RNA samples originally used for our microarray-based atlas were evaluated using RNA-Seq. Both technologies produced similar transcriptome profiles as evident from high Pearson's correlation statistics ranging from 0.70 to 0.83, and from nearly identical clustering of the tissues. RNA-Seq provided enhanced coverage of the transcriptome, with 82.1% of the filtered maize genes detected as expressed in at least one tissue by RNA-Seq compared to only 56.5% detected by microarrays. Further, from the set of 465 maize genes that have been historically well characterized by mutant analysis, 427 show significant expression in at least one tissue by RNA-Seq compared to 390 by microarray analysis. RNA-Seq provided higher resolution for identifying tissue-specific expression as well as for distinguishing the expression profiles of closely related paralogs as compared to microarray-derived profiles. Co-expression analysis derived from the microarray and RNA-Seq data revealed that broadly similar networks result from both platforms, and that co-expression estimates are stable even when constructed from mixed data including both RNA-Seq and microarray expression data. The RNA-Seq information provides a useful complement to the microarray-based maize gene atlas and helps to further understand the dynamics of transcription during maize development. PMID:23637782
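The platform comparison above rests on Pearson's correlation between expression values measured for the same genes by the two technologies. A minimal sketch follows; the expression vectors are made-up toy numbers, and any normalisation or log transformation applied in the study is not reproduced here.

```python
import math


def pearson(x, y) -> float:
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical expression values for the same three genes in one tissue.
microarray = [1.0, 2.0, 3.0]
rnaseq = [2.0, 4.0, 6.0]
r_same_trend = pearson(microarray, rnaseq)
r_opposite = pearson(microarray, [3.0, 2.0, 1.0])
```

A coefficient near 1 means the two platforms rank and scale the genes alike; values such as the 0.70 to 0.83 reported above indicate broad but imperfect agreement.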
Hao, Huijing; Liang, Junrong; Duan, Ran; Chen, Yuhuang; Liu, Chang; Xiao, Yuchun; Li, Xu; Su, Mingming; Jing, Huaiqi; Wang, Xin
The API 20E strip test, the standard for Enterobacteriaceae identification, is not sufficient to discriminate some Yersinia species because of unstable biochemical reactions and identical biochemical profiles in some species, e.g. Yersinia frederiksenii and Yersinia intermedia, which need a variety of molecular biology methods as auxiliaries for identification. The 16S rRNA gene is considered a valuable tool for assigning bacterial strains to species. However, the resolution of the 16S rRNA gene may be insufficient for discrimination because of the high similarity of sequences between some species and heterogeneity within copies at the intra-genomic level. In this study, for each strain we randomly selected five 16S rRNA gene clones from 768 Yersinia strains, and collected 3,840 sequences of the 16S rRNA gene from 10 species, which were divided into 439 patterns. The similarity among the five clones of the 16S rRNA gene is over 99% for most strains. Identical sequences were found in strains of different species. A phylogenetic tree was constructed using the five 16S rRNA gene sequences for each strain, where the phylogenetic classifications are consistent with biochemical tests, and species that are difficult to identify by biochemical phenotype can be differentiated. Most Yersinia strains form distinct groups within each species. However, Yersinia kristensenii, a heterogeneous species, clusters with some Yersinia enterocolitica and Yersinia frederiksenii/intermedia strains, while not affecting the overall efficiency of this species classification. In conclusion, through analysis derived from integrated information from multiple 16S rRNA gene sequences, the discrimination ability of Yersinia species is improved using our method.
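Two quantities in this abstract, the similarity among the clones of one strain and the number of distinct sequence patterns, can be sketched as follows. This is an illustration only: it assumes pre-aligned, equal-length sequences and treats a "pattern" as an exactly identical sequence, which may differ from the authors' definition. The strain names and sequences are toy values.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of matching positions between two aligned, equal-length sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def count_patterns(clones_by_strain: dict) -> int:
    """Number of distinct sequences across all clones of all strains."""
    return len({seq for clones in clones_by_strain.values() for seq in clones})


# Toy data: two strains with two clones each (the study used five clones per strain).
toy_clones = {"strain1": ["ACGT", "ACGT"], "strain2": ["ACGT", "ACGA"]}
n_patterns = count_patterns(toy_clones)
identity = percent_identity("ACGT", "ACGA")
```

Here four clones collapse into two patterns, and one of them is shared between strains, the same situation the abstract describes when identical sequences turn up in different species.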
Kong, F; Ma, Z; James, G; Gordon, S; Gilbert, G L
Ureaplasma urealyticum has been divided into 14 serovars. Recently, subdivision of U. urealyticum into two species has been proposed: U. parvum (previously U. urealyticum parvo biovar), comprising four serovars (1, 3, 6, 14) and U. urealyticum (previously U. urealyticum T-960 biovar), 10 serovars (2, 4, 5, 7-13). The multiple-banded antigen (MBA) genes of these species contain both species and serovar/subtype specific sequences. Based on whole sequences of the 5'-ends of MBA genes of U. parvum serovars and partial sequences of the 5'-ends of MBA genes of U. urealyticum serovars, we previously divided each of these species into three MBA genotypes. To further elucidate the relationships between serovars, we sequenced the whole 5'-ends of MBA genes of all 10 U. urealyticum serovars and partial repetitive regions of these genes from all serovars of U. parvum and U. urealyticum. For the first time, all four serovars of U. parvum were clearly differentiated from each other. In addition, the 10 serovars of U. urealyticum were divided into five MBA genotypes, as follows: MBA genotype A comprises serovars 2, 5, 8; MBA genotype B, serovar 10 only; MBA genotype C, serovars 4, 12, 13; MBA genotype D, serovar 9 only; and MBA genotype E comprises serovars 7 and 11. There were no sequence differences between members within each MBA genotype. Further work is required to identify other genes or other regions of the MBA genes that may be used to differentiate U. urealyticum serovars within MBA genotypes A, C and E. A better understanding of the molecular basis of serotype differentiation will help to improve subtyping methods for use in studies of the pathogenesis and epidemiology of these organisms.
Wang, Tao; Kong, Fanrong; Chen, Sharon; Xiao, Meng; Sorrell, Tania; Wang, Xiaoyan; Wang, Shuo; Sintchenko, Vitali
The identification of fastidious aerobic Actinomycetes such as Gordonia, Rhodococcus, and Tsukamurella has remained a challenge, leading to clinically significant misclassifications. This study examined the feasibility of partial 5'-end 16S rRNA gene sequencing for the identification of Gordonia, Rhodococcus, and Tsukamurella, and defined potential reference sequences for species from each of these genera. The 16S rRNA gene sequence based identification algorithm for species identification was used and enhanced by aligning test sequences with reference sequences from the List of Prokaryotic Names with Standing in Nomenclature. Conventional PCR based 16S rRNA gene sequencing and the alignment of the isolate 16S rRNA gene sequence with reference sequences accurately identified 100% of clinical strains of aerobic Actinomycetes. While partial 16S rRNA gene sequences of reference type strains matched the 16S rRNA gene sequences of 19 isolates in our data set, another 13 strains demonstrated a degree of polymorphism with a 1-4 bp difference in the regions of difference. 5'-end 606 bp 16S rRNA gene sequencing, coupled with the assignment of well defined reference sequences to clinically relevant species of bacteria, can be a useful strategy for improving the identification of clinically relevant aerobic Actinomycetes.
Allsopp, M; Visser, E S; du Plessis, J L; Vogel, S W; Allsopp, B A
Cowdria ruminantium is a rickettsial parasite which causes heartwater, an economically important disease of domestic and wild ruminants in tropical and subtropical Africa and parts of the Caribbean. Because existing diagnostic methods are unreliable, we investigated the small-subunit ribosomal RNA (srRNA) gene from heartwater-infected material to characterise the organisms present and to develop specific oligonucleotide probes for polymerase chain reaction (PCR) based diagnosis. DNA was obtained from ticks and ruminants from heartwater-free and heartwater-endemic areas and from Cowdria in tissue culture. PCR was carried out using primers designed to amplify only rickettsial srRNA genes, the target region being the highly variable V1 loop. Amplicons were cloned and sequenced; 51% were C. ruminantium sequences corresponding to four genotypes, two of which were identical to previously reported C. ruminantium sequences while the other two were new. The four different Cowdria genotypes can be correlated with different phenotypes. Tissue-culture samples yielded only Cowdria genotype sequences, but an extraordinary heterogeneity of 16S sequences was obtained from field samples. In addition to Cowdria genotypes we found sequences from previously unknown Ehrlichia spp., sequences showing homology to other Rickettsiales and a variety of Pseudomonadaceae. One Ehrlichia sequence was phylogenetically closely related to Ehrlichia platys (Group II Ehrlichia) and one to Ehrlichia canis (Group III Ehrlichia). This latter sequence was from an isolate (Germishuys) made from a naturally infected sheep which, from brain smear examination and pathology, appeared to be suffering from heartwater; nevertheless no Cowdria genotype sequences were found in this isolate. In addition no Cowdria sequences were obtained from uninfected ticks. Complete 16S rRNA gene sequences were determined for two C. ruminantium genotypes and for two previously uncharacterised heartwater-associated Ehrlichia spp
Schreiber, Fabian; Wörheide, Gert; Morgenstern, Burkhard
In the absence of whole genome sequences for many organisms, the use of expressed sequence tags (EST) offers an affordable approach for researchers conducting phylogenetic analyses to gain insight about the evolutionary history of organisms. Reliable alignments for phylogenomic analyses are based on orthologous gene sequences from different taxa. So far, researchers have not sufficiently tackled the problem of the completely automated construction of such datasets. Existing software tools are either semi-automated, covering only part of the necessary data processing, or implemented as a pipeline, requiring the installation and configuration of a cascade of external tools, which may be time-consuming and hard to manage. To simplify data set construction for phylogenomic studies, we set up a web server that uses our recently developed OrthoSelect approach. To the best of our knowledge, our web server is the first web-based EST analysis pipeline that allows the detection of orthologous gene sequences in EST libraries and outputs orthologous gene alignments. Additionally, OrthoSelect provides the user with an extensive results section that lists and visualizes all important results, such as annotations, data matrices for each gene/taxon and orthologous gene alignments. The web server is available at http://orthoselect.gobics.de.
Ke-Xia Wang; Xue-Feng Wang
AIM: To clone and sequence the cagA gene fragment of Helicobacter pylori (H pylori) with coccoid form. METHODS: H pylori strain NCTC11637 was transformed to coccoid form by exposure to antibiotics in subinhibitory concentrations. The coccoid H pylori was collected. The cagA gene of the coccoid H pylori strain was amplified by PCR. After purification, the target fragment was cloned into plasmid pMD-18T. The recombinant plasmid pMD-18T-cagA was transformed into E. coli JM109. Positive clones were screened and identified by PCR and digestion with restriction endonucleases. The sequence of the inserted fragment was then analysed. RESULTS: A cagA gene of 3,444 bp was obtained from the coccoid H pylori genome DNA. The recombinant plasmid pMD-18T-cagA was constructed, then it was digested by BamH Ⅰ + Sac Ⅰ, and the product of digestion was identical with the predicted one. Sequence analysis showed that the homology between the coccoid H pylori sequence and the reported original H pylori sequence was 99.7%. CONCLUSION: The recombinant plasmid containing the cagA gene from coccoid H pylori has been constructed successfully. The coccoid H pylori contains a complete cagA gene, which may be related to its pathogenicity.
Read, T D; Satola, S W; Farley, M M
Haemophilus influenzae pili are surface structures that promote attachment to human epithelial cells. The five genes that encode pili, hifABCDE, are found inserted in genomes either between pmbA and hpt (hif-1) or between purE and pepN (hif-2). We determined the sequence between the ends of the pilus clusters and bordering genes in a number of H. influenzae strains. The junctions of the hif-1 cluster (limited to biogroup aegyptius isolates) are structurally simple. In contrast, hif-2 junctions are highly diverse, complex assemblies of conserved intergenic sequences (including genes hicA and hicB) with evidence of frequent recombination. Variation at hif-2 junctions seems to be tied to multiple copies of a 23-bp Haemophilus intergenic dyad sequence. The hif-1 cluster appears to have originated in biogroup aegyptius strains from invasion of the hpt-pmbA region by a DNA template containing the hif-2 genes with termini in the hairpin loop of flanking intergenic dyad sequences. The pilus gene clusters are an interesting model of a mobile "pathogenicity island" not associated with a phage, transposon, or insertion element.
Igor R. Costa
Essential amino acids (EAA) consist of a group of nine amino acids that animals are unable to synthesize via de novo pathways. Recently, it has been found that most metazoans lack the same set of enzymes responsible for the de novo EAA biosynthesis. Here we investigate the sequence conservation and evolution of all the remaining metazoan genes for EAA pathways. Initially, the set of all 49 enzymes responsible for the EAA de novo biosynthesis in yeast was retrieved. These enzymes were used as BLAST queries to search for similar sequences in a database containing 10 complete metazoan genomes. Eight enzymes typically attributed to EAA pathways were found to be ubiquitous in metazoan genomes, suggesting a conserved functional role. In this study, we address the question of how these genes evolved after losing their pathway partners. To do this, we compared metazoan genes with their fungal and plant orthologs. Using phylogenetic analysis with maximum likelihood, we found that acetolactate synthase (ALS) and betaine-homocysteine S-methyltransferase (BHMT) diverged from the expected Tree of Life (ToL) relationships. High sequence conservation in the paraphyletic group Plant-Fungi was identified for these two genes using a newly developed Python algorithm. Selective pressure analysis of ALS and BHMT protein sequences showed higher non-synonymous mutation ratios in comparisons between metazoans/fungi and metazoans/plants, supporting the hypothesis that these two genes have undergone non-ToL evolution in animals.
Pua, Chee Jian; Bhalshankar, Jaydutt; Miao, Kui; Walsh, Roddy; John, Shibu; Lim, Shi Qi; Chow, Kingsley; Buchan, Rachel; Soh, Bee Yong; Lio, Pei Min; Lim, Jaclyn; Schafer, Sebastian; Lim, Jing Quan; Tan, Patrick; Whiffin, Nicola; Barton, Paul J; Ware, James S; Cook, Stuart A
Inherited cardiac conditions (ICCs) are characterised by marked genetic and allelic heterogeneity and require extensive sequencing for genetic characterisation. We iteratively optimised a targeted gene capture panel for ICCs that includes disease-causing, putatively pathogenic, research and phenocopy genes (n = 174 genes). We achieved high coverage of the target region on both MiSeq (>99.8% at ≥ 20× read depth, n = 12) and NextSeq (>99.9% at ≥ 20×, n = 48) platforms with 100% sensitivity and precision for single nucleotide variants and indels across the protein-coding target on the MiSeq. In the final assay, 40 out of 43 established ICC genes informative in clinical practice achieved complete coverage (100 % at ≥ 20×). By comparison, whole exome sequencing (WES; ∼ 80×), deep WES (∼ 500×) and whole genome sequencing (WGS; ∼ 70×) had poorer performance (88.1, 99.2 and 99.3% respectively at ≥ 20×) across the ICC target. The assay described here delivers highly accurate and affordable sequencing of ICC genes, complemented by accessible cloud-based computation and informatics. See Editorial in this issue (DOI: 10.1007/s12265-015-9667-8 ).
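The coverage figures quoted above (for example ">99.8% at ≥ 20× read depth") are fractions of target bases whose read depth meets a threshold. A minimal sketch of that calculation follows; the function name and the per-base depths are hypothetical and are not taken from the study.

```python
def fraction_covered(depths, min_depth=20):
    """Return the fraction of target bases with read depth >= min_depth."""
    if not depths:
        raise ValueError("depths must not be empty")
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths)


# Hypothetical per-base depths for a tiny 10-base target (illustration only).
example_depths = [35, 42, 18, 60, 25, 20, 19, 88, 31, 27]
print(f"{fraction_covered(example_depths):.1%} of bases at >= 20x")
```

In practice the depths would come from a per-base depth report over the capture target; that step is outside this sketch.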
Shyam S. Dahiya
The Toll-like receptor 2 (TLR2) gene of old world camels (Camelus dromedarius and Camelus bactrianus) was cloned and sequenced. The TLR2 gene of the dromedary camel had the highest nucleotide and amino acid identity with pig, i.e., 66.8% and 59.6%, respectively. Similarly, the TLR2 gene of the Bactrian camel also had the highest nucleotide and amino acid identity with pig, i.e., 85.7% and 81.4%, respectively. Dromedary and Bactrian camels shared 77.9% nucleotide and 73.6% amino acid identity with each other. Interestingly, the amidation motif is present in camel (Dromedary and Bactrian) TLR2 only, and the TIR domain is absent in Dromedary camel TLR2. This is the first report of the TLR2 gene sequence of Dromedary and Bactrian camels.
JIN Wen-lin; Yamaguchi Hirofumi; Isigami Matiko; Yasuda Kentaro
This study describes variation in intron 3 of the α-amylase gene among 156 breeds of adzuki beans using SSCP (single-strand conformation polymorphism) analysis. Based on the α-amylase gene structure and sequence, a pair of PCR primers, F (CCTACATTCTAACACACCCT) and R (GCATATTGTGCCAGTACAAT), was designed to amplify intron 3 fragments of the α-amylase gene. Fourteen variant types were detected, including 13, 9, 10 and 4 variant types in the wild, weed, locally cultivated and modern bred adzuki beans, respectively; 9, 8 and 7 variant types in the wild adzuki beans from Japan, China and Korea, respectively; and some other variant types in the local adzuki beans from China and Bhutan. In the experiment, 60% of the cultivated accessions were found to be of the EE type. In addition, sequence analysis of intron 3 of the α-amylase gene from 8 variant types reveals the evolutionary process of the various variant types in adzuki beans.
Xinli, Xiao; Lei, Peng
The complete mRNA sequence of watermelon Rab18 gene was amplified through the rapid amplification of cDNA ends (RACE) method. The full-length mRNA was 1010 bp containing a 645 bp open reading frame, which encodes a protein of 214 amino acids. Sequence analysis revealed that watermelon Rab18 protein shares high homology with the Rab18 of cucumber (99%), muskmelon (98%), Morus notabilis (90%), tomato (89%), wine grape (89%) and potato (88%). Phylogenetic analysis revealed that watermelon Rab18 gene has a closer genetic relationship with Rab18 gene of cucumber and muskmelon. Tissue expression profile analysis indicated that watermelon Rab18 gene was highly expressed in root, stem and leaf, moderately expressed in flower and weakly expressed in fruit.
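The reported numbers are consistent with each other if the 645 bp open reading frame is counted with its stop codon: 645 / 3 = 215 codons, of which 214 encode amino acids. A small check, assuming that convention (the abstract does not state it explicitly):

```python
def protein_length_from_orf(orf_bp, includes_stop=True):
    """Number of amino acids encoded by an ORF of orf_bp nucleotides.

    Assumption: when includes_stop is True, the final codon is a stop
    codon and does not contribute an amino acid.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


print(protein_length_from_orf(645))  # 215 codons minus the stop codon
```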
LIU Jinxian; GAO Tianxiang; WANG Yujiang; ZHANG Yaping
Sequence variation of partial cytochrome b genes between two Coilia species, C. ectenes and C. mystus, was investigated. Of the 402 nucleotides, twenty-seven (6.72%) are polymorphic and all are synonymous substitutions. At the third codon positions of the cytochrome b gene, the two species show an extreme anti-G bias (<4%) and a pronounced bias towards A and C (>68%). There is no amino acid sequence divergence between the partial cytochrome b genes of the two species, indicating a close genetic relationship between them. The Kimura 2-parameter (K2P) genetic distance of the partial cytochrome b segment between the two species is 0.072, suggesting that the species separated 3.6 Ma ago, in the middle Pliocene. Our results reveal that the cytochrome b gene is an appropriate marker for studies of population genetic structures and phylogeographic patterns of the two species.
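The distance and the divergence date above can be related with a short sketch. The Kimura 2-parameter formula is standard; the split of the 27 differences into transitions and transversions is hypothetical (the abstract does not report it), and the pairwise rate of 2% per million years is an assumption inferred from the two reported numbers (0.072 and 3.6 Ma), not a value stated in the abstract.

```python
import math


def k2p_distance(transitions, transversions, sites):
    """Kimura 2-parameter distance from counts of differing sites."""
    p = transitions / sites      # proportion of transitional differences
    q = transversions / sites    # proportion of transversional differences
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


def divergence_time_ma(distance, pairwise_rate_per_ma):
    """Divergence time in Ma for a pairwise divergence rate per site per Ma."""
    return distance / pairwise_rate_per_ma


# Hypothetical split of the 27 polymorphic sites (22 transitions, 5 transversions).
d_example = k2p_distance(22, 5, 402)
print(f"K2P distance for the hypothetical split: {d_example:.3f}")

# Reported distance with an assumed 2% per Ma pairwise rate.
print(f"Divergence time: {divergence_time_ma(0.072, 0.02):.1f} Ma")
```

Under the assumed rate, the reported distance of 0.072 corresponds to the 3.6 Ma quoted in the abstract; a different clock calibration would shift the date proportionally.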
Hunt, D.M.; Cowing, J.A.; Patel, R. [Univ. of London (United Kingdom)] [and others]
The sequences of the blue cone photopigments in the talapoin monkey (Miopithecus talapoin), an Old World primate, and in the marmoset (Callithrix jacchus), a New World monkey, are presented. Both genes are composed of 5 exons separated by 4 introns. In this respect, they are identical to the human blue gene, and intron sizes are also similar. Based on the level of amino acid identity, both monkey pigments are members of the S branch of pigments. Alignment of these sequences with the human gene requires the insertion/deletion of two separate codons in exon 1. The silent site divergence between these primate blue genes indicates a separation of the Old and New World primate lineages around 43 million years ago. 41 refs., 1 fig., 3 tabs.
Wanling Yang; Dingge Ying; Yu-Lung Lau
procedures may allow detection of many expression features for less abundant gene variants. With the reduction of sequencing cost and the emergence of new-generation sequencing technology, in-depth sequencing of cDNA pools or libraries may represent a better and more powerful tool in gene expression profiling and cancer biomarker detection. We also propose using sequence-specific subtraction to remove hundreds of the most abundant housekeeping genes to increase sequencing depth without affecting the relative expression ratio of other genes, as transcripts from as few as 300 of the most abundantly expressed genes constitute about 20% of the total transcriptome. In-depth sequencing also offers the unique advantage of detecting unknown forms of transcripts, such as alternative splicing variants, fusion genes, and regulatory RNAs, as well as detecting mutations and polymorphisms that may play important roles in disease pathogenesis.
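The benefit of the proposed subtraction can be estimated with simple arithmetic. The sketch below assumes a fixed total number of reads and complete removal of the targeted transcripts, neither of which is claimed in the abstract; the function name is hypothetical.

```python
def depth_gain_after_subtraction(removed_fraction):
    """Fold increase in reads available to the remaining transcripts.

    Assumes a fixed read budget and complete removal of the targeted
    transcripts, so the remaining genes share all reads.
    """
    if not 0 <= removed_fraction < 1:
        raise ValueError("removed_fraction must be in [0, 1)")
    return 1.0 / (1.0 - removed_fraction)


# Removing transcripts that make up about 20% of the transcriptome.
print(f"{depth_gain_after_subtraction(0.20):.2f}-fold depth on remaining genes")
```

Because every remaining gene gains by the same factor, the relative expression ratios among them are unchanged under these assumptions.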
Li, Shou-Dong; Song, Li-Rong; Liu, Yong-Ding; Zhao, Jin-Dong
The structural gene for ferredoxin, petFI, from Anabaena siamensis has been amplified by polymerase chain reaction (PCR) and cloned into the cloning vector pGEM-3zf(+). The nucleotide sequence of petFI has been determined with the silver staining sequencing method. There is 96.8% homology between the coding region of petFI from A. siamensis and that of petFI from Anabaena sp. 7120. The amino acid sequences of seven strains of blue-green algae are compared.
LIU Ji-mei; CHENG Zai-quan; YANG Ming-zhi; WU Cheng-jun; WANG Ling-xian; SUN Yi-ding; HUANG Xing-qi
Two sets of degenerate oligonucleotide primers were designed according to conserved amino acid regions of reported plant disease resistance genes that encode proteins containing a nucleotide-binding site and leucine-rich repeats (NBS-LRR), and of plant disease resistance genes that encode serine/threonine protein kinase (STK). By polymerase chain reaction (PCR), disease resistance gene analogues were amplified from three wild rice species in Yunnan Province, China. The amplified DNA fragments were cloned into the pGEM-T vector. Sequencing of the DNA fragments indicated that 7 classes, 2 classes and 6 classes of NBS-LRR disease resistance gene analogues were obtained from Oryza rufipogon Griff., Oryza officinalis Wall. and Oryza meyeriana Baill., respectively. The two representative fragments, TO12 from Oryza officinalis Wall. and TR19 from Oryza rufipogon Griff., belong to the same class and share 100% sequence homology. This result shows that the sequences of disease resistance gene analogues of the same class do not differ among different species of wild rice. Five classes of STK disease resistance gene analogues were also obtained, of which 4 classes came from Oryza rufipogon Griff. and 1 class from Oryza officinalis Wall. By comparative analysis of amino acid sequences, we found that the obtained disease resistance gene analogues have very low identity (as low as 25%) with reported disease resistance genes such as L6, N, Bs2, Prf, Pto, Lr10 and Xa21. This finding suggests that the obtained disease resistance gene analogues are analogues of putative disease resistance genes that have not been isolated so far.
Background: The rate of emergence of human pathogens is steadily increasing; most of these novel agents originate in wildlife. Bats, remarkably, are the natural reservoirs of many of the most pathogenic viruses in humans. There are two bat genome projects currently underway, a circumstance that promises to speed the discovery of host factors important in the coevolution of bats with their viruses. These genomes, however, are not yet assembled and one of them will provide only low coverage, making the inference of most genes of immunological interest error-prone. Many more wildlife genome projects are underway and intend to provide only shallow coverage. Results: We have developed a statistical method for the assembly of gene families from partial genomes. The method takes full advantage of the quality scores generated by base-calling software, incorporating them into a complete probabilistic error model, to overcome the limitation inherent in the inference of gene family members from partial sequence information. We validated the method by inferring the human IFNA genes from the genome trace archives, and used it to infer 61 type-I interferon genes, and single type-II interferon genes in the bats Pteropus vampyrus and Myotis lucifugus. We confirmed our inferences by direct cloning and sequencing of IFNA, IFNB, IFND, and IFNK in P. vampyrus, and by demonstrating transcription of some of the inferred genes by known interferon-inducing stimuli. Conclusion: The statistical trace assembler described here provides a reliable method for extracting information from the many available and forthcoming partial or shallow genome sequencing projects, thereby facilitating the study of a wider variety of organisms with ecological and biomedical significance to humans than would otherwise be possible.
Cahn, S B
Two synthetic gene circuits -- the genetic toggle switch and the repressilator -- are analyzed quantitatively and discussed in the context of an educational module on gene circuits and feedback that constitutes the final topic of a year-long introductory physics sequence, aimed at biology and premedical undergraduate students. The genetic toggle switch consists of two genes, each of whose protein product represses the other's expression, while the repressilator consists of three genes, each of whose protein product represses the next gene's expression. Analytic, numerical, and electronic treatments of the genetic toggle switch show that this gene circuit realizes bistability. A simplified treatment of the repressilator reveals that this circuit can realize sustained oscillations. In both cases, a "phase diagram" is obtained that specifies the region of parameter space in which bistability or oscillatory behavior, respectively, occurs.
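The bistability of the toggle switch can be illustrated numerically. The sketch below uses a commonly used dimensionless two-repressor model, du/dt = α/(1 + v^n) − u and dv/dt = α/(1 + u^n) − v, integrated with a simple forward Euler step. The model form, the parameter values (α = 10, n = 2) and the function name are assumptions chosen for illustration; they are not taken from the abstract.

```python
def simulate_toggle(u0, v0, alpha=10.0, n=2.0, dt=0.01, steps=5000):
    """Integrate a symmetric two-repressor toggle switch with forward Euler.

    u and v are dimensionless repressor concentrations; each represses
    synthesis of the other through a Hill function with exponent n.
    """
    u, v = u0, v0
    for _ in range(steps):
        du = alpha / (1.0 + v ** n) - u   # synthesis repressed by v, linear decay
        dv = alpha / (1.0 + u ** n) - v   # synthesis repressed by u, linear decay
        u += dt * du
        v += dt * dv
    return u, v


# Two different initial conditions settle into two different stable states.
print(simulate_toggle(5.0, 1.0))  # ends with u high, v low
print(simulate_toggle(1.0, 5.0))  # ends with v high, u low
```

Which stable state is reached depends only on the initial condition, which is the defining feature of bistability; lowering α or n far enough in this model removes the second stable state.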
Background: Duplicated genes frequently experience asymmetric rates of sequence evolution. Relaxed selective constraints and positive selection have both been invoked to explain the observation that one paralog within a gene-duplicate pair exhibits an accelerated rate of sequence evolution. In the majority of studies where asymmetric divergence has been established, there is no indication as to which gene copy, ancestral or derived, is evolving more rapidly. In this study we investigated the effect of local synteny (gene-neighborhood conservation) and codon usage on the sequence evolution of gene duplicates in the S. cerevisiae genome. We further distinguish the gene duplicates into those that originated from a whole-genome duplication (WGD) event (ohnologs) versus small-scale duplications (SSD) to determine if there exist any differences in their patterns of sequence evolution. Results: For SSD pairs, the derived copy evolves faster than the ancestral copy. However, there is no relationship between rate asymmetry and synteny conservation (ancestral-like versus derived-like) in ohnologs. mRNA abundance and optimal codon usage as measured by the CAI are lower in the derived SSD copies relative to ancestral paralogs. Moreover, in the case of ohnologs, the faster-evolving copy has lower CAI and lowered expression. Conclusions: Together, these results suggest that relaxation of selection for codon usage and gene expression contributes to rate asymmetry in the evolution of duplicated genes and that, in SSD pairs, the relaxation of selection stems from the loss of ancestral regulatory information in the derived copy.
Ovunc, Bugsu; Otto, Edgar A.; Vega-Warner, Virginia; Saisawat, Pawaree; Ashraf, Shazia; Ramaswami, Gokul; Fathy, Hanan M.; Schoeb, Dominik; Chernin, Gil; Lyons, Robert H.; Engin YILMAZ; Hildebrandt, Friedhelm
In two siblings of consanguineous parents with intermittent nephrotic-range proteinuria, we identified a homozygous deleterious frameshift mutation in the gene CUBN, which encodes cubilin, using exome capture and massively parallel re-sequencing. The mutation segregated with affected members of this family and was absent from 92 healthy individuals, thereby identifying a recessive mutation in CUBN as the single-gene cause of proteinuria in this sibship. Cubilin mutations cause a hereditary fo...
Tong, Yong-Qing; Liu, Bei; Fu, Chao-Hong; Zheng, Hong-Yun; Gu, Jian; Liu, Hang; Luo, Hong-Bo; Li, Yan
PKHD1 gene mutations are responsible for autosomal recessive polycystic kidney disease (ARPKD). However, it is inconvenient to detect the mutations by common polymerase chain reaction (PCR) because the open reading frame of PKHD1 is very long. Recently, long-range (LR) PCR followed by direct sequencing has been demonstrated to be a more sensitive mutation screening method for PKHD1. In this study, the entire PKHD1 coding region was amplified in 29 LR PCR reactions, which generated products of 1 to 7 kb and avoided the separate PCR amplification of individual exons. This method was compared with standard direct sequencing of each individual exon of the gene, performed by a reference laboratory, in 15 patients with ARPKD. A total of 37 genetic changes were detected with LR PCR sequencing, including the 33 variations identified by the reference laboratory with standard direct sequencing. LR PCR sequencing had 100% sensitivity, 96% specificity, and 97.0% accuracy, which were higher than those of the standard direct sequencing method. In conclusion, LR PCR sequencing is a reliable method with high sensitivity, specificity and accuracy for detecting genetic variations. It also has more intronic coverage and lower cost, and is an applicable clinical method for complex genetic analyses.
Sawada, Ryusuke; Mitaku, Shigeki
The exon-intron structure of eukaryotic genes raises a question about the distribution of transmembrane regions in membrane proteins. Were exons that encode transmembrane regions formed simply by inserting introns into preexisting genes or by some kind of exon shuffling? To answer this question, the exon-per-gene distribution was analyzed for all genes in 40 eukaryotic genomes with a particular focus on exons encoding transmembrane segments. In 21 higher multicellular eukaryotes, the percentage of multi-exon genes (those containing at least one intron) within all genes in a genome was high (>70%) and with a mean of 87%. When genes were grouped by the number of exons per gene in higher eukaryotes, good exponential distributions were obtained not only for all genes but also for the exons encoding transmembrane segments, leading to a constant ratio of membrane proteins independent of the exon-per-gene number. The positional distribution of transmembrane regions in single-pass membrane proteins showed that they are generally located in the amino or carboxyl terminal regions. This nonrandom distribution of transmembrane regions explains the constant ratio of membrane proteins to the exon-per-gene numbers because there are always two terminal (i.e., the amino and carboxyl) regions - independent of the length of sequences.
Keinänen, R A; Wallén, M J; Kristo, P A; Laukkanen, M O; Toimela, T A; Helenius, M A; Kulomaa, M S
Using avidin cDNA as a hybridisation probe, we detected a gene family whose putative products are related to the chicken egg-white avidin. Two overlapping genomic clones were found to contain five genes (avidin-related genes 1-5, avr1-avr5), which have been cloned, characterized and sequenced. All of the genes have a four-exon structure with an overall identity with the avidin cDNA of 88-92%. The genes appear to have no pseudogenic features and, in fact, two of these genes have been shown to be transcribed. The putative proteins share a sequence identity of 68-78% with avidin. The amino acid residues responsible for the biotin-binding activity of avidin and the bacterial biotin-binding protein, streptavidin, are highly conserved. Since avidin is induced in both a progesterone-specific manner and in connection with inflammation, these genes offer a valuable tool to study complex gene regulation in vivo.
LIU Di; YANG Xiu-qin; YANG Jia-fang
Myostatin, whose gene is highly conserved among breeds, is a negative regulator of muscle growth. The 3' coding regions of wild boar and crossbred pig myostatin were cloned by RT-PCR and sequenced. The homology of the nucleotide sequence between wild boar and crossbred pig was 100%, and there was no difference in this region compared with the pig myostatin gene in GenBank. This indicates that the gene sequence in this region did not change during evolution.
Background: Obtaining the gene structure for a given protein-encoding gene is an important step in many analyses. Software suited for this task should be readily accessible, accurate, easy to handle and should provide the user with a coherent representation of the most probable gene structure. It should be rigorous enough to optimise features on the level of single bases and at the same time flexible enough to allow for cross-species searches. Results: WebScipio, a web interface to the Scipio software, allows a user to obtain the coding sequence structure corresponding to a given query protein sequence that belongs to an already assembled eukaryotic genome. The resulting gene structure is presented in various human-readable formats, such as a schematic representation and a detailed alignment of the query and the target sequence highlighting any discrepancies. WebScipio can also be used to identify and characterise the gene structures of homologs in related organisms. In addition, it offers a web service for integration with other programs. Conclusion: WebScipio is a tool that allows users to get a high-quality gene structure prediction from a protein query. It offers more than 250 eukaryotic genomes that can be searched and produces predictions that are close to what can be achieved by manual annotation, for in-species and cross-species searches alike. WebScipio is freely accessible at http://www.webscipio.org.
Kim, Y S; Lee, S G; Park, S H; Song, K
The human DDX3 gene (GenBank accession No. U50553) is the human homologue of the mouse Ddx3 gene and is a member of the gene family that contains DEAD motifs. Previously, we mapped the gene to Xp11.3-11.23. In this report, we describe the structural organization of the human DDX3 gene. It consists of 17 exons that span approximately 16 kb. An Alu element is present in intron 13. Its organization is the same as that of the human DBY gene, a closely related sequence present on the Y chromosome. We also identified two processed pseudogenes (DDX3) whose sequences are highly homologous to those of DDX3 cDNAs but contain a translation termination codon within the open reading frame. The pseudogenes map to human chromosomes 4 and X, respectively. In this paper, we discuss the relationships between DDX3 and its related sequences that have been isolated.
López Ferber, M; Argaud, O; Croizier, L; Croizier, G
Genetic differences between strains of a baculovirus are often limited to some restriction sites, short DNA deletions or the absence of some nonessential genes. The recently coined bro gene family represents a new major source of intraspecific variability. A comparison between the bro gene sets of two Bombyx mori nucleopolyhedrovirus (NPV) strains, T3 and SC7, shows that bro genes are distributed in three regions in both strains. In BmNPV T3, five bro genes are distributed in three genome locations, whereas the BmNPV SC7 strain possesses a single bro copy in each region. In addition, each of the BmNPV SC7 bro genes belongs to one of the three subfamilies present in BmNPV T3. Analysis of bro copy sequences and of adjacent sequences suggests an active redistribution of sequences due to intraspecific recombination. The maintenance of one allele of each subfamily suggests that they play different roles in the viral cycle, and that they are essential.
Fischer, Iris; Steige, Kim A; Stephan, Wolfgang; Mboup, Mamadou
The wild tomato species Solanum chilense and S. peruvianum are a valuable non-model system for studying plant adaptation since they grow in diverse environments facing many abiotic constraints. Here we investigate the sequence evolution of regulatory regions of drought and cold responsive genes and their expression regulation. The coding regions of these genes were previously shown to exhibit signatures of positive selection. Expression profiles and sequence evolution of regulatory regions of members of the Asr (ABA/water stress/ripening induced) gene family and the dehydrin gene pLC30-15 were analyzed in wild tomato populations from contrasting environments. For S. chilense, we found that Asr4 and pLC30-15 appear to respond much faster to drought conditions in accessions from very dry environments than accessions from more mesic locations. Sequence analysis suggests that the promoter of Asr2 and the downstream region of pLC30-15 are under positive selection in some local populations of S. chilense. By investigating gene expression differences at the population level we provide further support of our previous conclusions that Asr2, Asr4, and pLC30-15 are promising candidates for functional studies of adaptation. Our analysis also demonstrates the power of the candidate gene approach in evolutionary biology research and highlights the importance of wild Solanum species as a genetic resource for their cultivated relatives.
The melanocortin 4 receptor (MC4R) gene is involved in sympathetic nerve activity and adrenal and thyroid functions, and mediates the role of leptin in regulating energy balance and homeostasis. The aim of this research was to perform genetic analysis of MC4R gene sequences from Bligon goats. Forty blood samples of Bligon does were used for DNA extraction. The primers were designed after alignment of 12 DNA sequences of the MC4R gene from goat, sheep, and cattle, and were based on the Capra hircus MC4R gene sequence from GenBank (accession No. NM_001285591). Two DNA polymorphisms of MC4R were revealed in the exon region (g.998 A/G and g.1079 C/T). The SNP g.998 A/G was a non-synonymous polymorphism, i.e., it changed the amino acid from methionine (Met) to isoleucine (Ile). The SNP g.1079 C/T was a synonymous polymorphism. Restriction enzyme mapping of the Bligon goat MC4R gene revealed three restriction enzymes, RsaI (GT’AC), Acc65I (G’GTAC_C), and KpnI (G_GTAC’C), which can recognize the SNP at g.1079 C/T. These restriction enzymes may be used for genotyping of the target gene using the PCR-RFLP method in future research.
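The idea behind PCR-RFLP genotyping is that one allele of a SNP creates or destroys an enzyme recognition site, so digestion patterns differ between alleles. The sketch below searches two short sequences for the recognition sites named above (GTAC for RsaI, GGTACC for Acc65I and KpnI). Both allele sequences are invented for illustration and are not the goat MC4R sequence; which allele actually carries the site is likewise an assumption.

```python
# Recognition sequences of the three enzymes named in the abstract.
RECOGNITION_SITES = {"RsaI": "GTAC", "Acc65I": "GGTACC", "KpnI": "GGTACC"}


def find_sites(seq, site):
    """Return 0-based start positions of every occurrence of site in seq."""
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1)
            if seq[i:i + len(site)] == site]


def cutters(seq):
    """Return the sorted names of enzymes with at least one site in seq."""
    return sorted(name for name, site in RECOGNITION_SITES.items()
                  if find_sites(seq, site))


# Hypothetical amplicon fragments differing at a single C/T position.
allele_c = "ATGCGGTACCTTAGC"
allele_t = "ATGCGGTATCTTAGC"

print("C allele cut by:", cutters(allele_c))
print("T allele cut by:", cutters(allele_t))
```

In this toy case the C allele is cut by all three enzymes and the T allele by none, so a digested PCR product would show different band patterns for the two alleles.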
Li, Li; Brunk, Brian P.; Kissinger, Jessica C.; Pape, Deana; Tang, Keliang; Cole, Robert H.; Martin, John; Wylie, Todd; Dante, Mike; Fogarty, Steven J.; Howe, Daniel K.; Liberator, Paul; Diaz, Carmen; Anderson, Jennifer; White, Michael; Jerome, Maria E.; Johnson, Emily A.; Radke, Jay A.; Stoeckert, Christian J.; Waterston, Robert H.; Clifton, Sandra W.; Roos, David S.; Sibley, L. David
Large-scale EST sequencing projects for several important parasites within the phylum Apicomplexa were undertaken for the purpose of gene discovery. Included were several parasites of medical importance (Plasmodium falciparum, Toxoplasma gondii) and others of veterinary importance (Eimeria tenella, Sarcocystis neurona, and Neospora caninum). A total of 55,192 ESTs, deposited into dbEST/GenBank, were included in the analyses. The resulting sequences have been clustered into nonredundant gene assemblies and deposited into a relational database that supports a variety of sequence and text searches. This database has been used to compare the gene assemblies using BLAST similarity comparisons to the public protein databases to identify putative genes. Of these new entries, ∼15%–20% represent putative homologs with a conservative cutoff of p [...] PMID:12618375
BACKGROUND: Aromatic essential oils extracted from fresh fruits of Litsea cubeba (Lour.) Pers. have diverse medical and economic values. The dominant components in these essential oils are monoterpenes and sesquiterpenes. Understanding the molecular mechanisms of terpenoid biosynthesis is essential for improving the yield and quality of terpenes. However, the 40 available L. cubeba nucleotide sequences in the public databases are insufficient for studying the molecular mechanisms. Thus, high-throughput transcriptome sequencing of L. cubeba is necessary to generate large quantities of transcript sequences for the purpose of gene discovery, especially terpenoid biosynthesis related genes. RESULTS: Using Illumina paired-end sequencing, approximately 23.5 million high-quality reads were generated. De novo assembly yielded 68,648 unigenes with an average length of 834 bp. A total of 38,439 (56%) unigenes were annotated for their functions, and 35,732 and 25,806 unigenes could be aligned to the GO and COG databases, respectively. By searching against the Kyoto Encyclopedia of Genes and Genomes Pathway database (KEGG), 16,130 unigenes were assigned to 297 KEGG pathways, and 61 unigenes, which contained the mevalonate and 2-C-methyl-D-erythritol 4-phosphate pathways, could be related to terpenoid backbone biosynthesis. Of the 12,963 unigenes, 285 were annotated to the terpenoid pathways using the PlantCyc database. Additionally, 14 terpene synthase genes were identified from the transcriptome. The expression patterns of the 16 genes related to terpenoid biosynthesis were analyzed by RT-qPCR to explore their putative functions. CONCLUSION: RNA sequencing was effective in identifying a large quantity of sequence information. To our knowledge, this study is the first exploration of the L. cubeba transcriptome, and the substantial amount of transcripts obtained will accelerate the understanding of the molecular mechanisms of essential oils biosynthesis. The
By using a DNA fragment primarily encoding the reverse transcriptase (pol) region of the Syrian hamster intracisternal A particle (IAP; type A retrovirus) gene as a probe, human endogenous retrovirus genes, tentatively termed HERV-K genes, were cloned from a fetal human liver gene library. Typical HERV-K genes were 9.1 or 9.4 kilobases in length, having long terminal repeats (LTRs) of ca. 970 base pairs. Many structural features commonly observed on the retrovirus LTRs, such as the TATAA box, polyadenylation signal, and terminal inverted repeats, were present on each LTR, and a lysine (K) tRNA having a CUU anticodon was identified as a presumed primer tRNA. The HERV-K LTR, however, had little sequence homology to either the IAP LTR or other typical oncovirus LTRs. By filter hybridization, the number of HERV-K genes was estimated to be ca. 50 copies per haploid human genome. The cloned mouse mammary tumor virus (type B) gene was found to hybridize with both the HERV-K and IAP genes to essentially the same extent.
Chen, S W; Liu, T; Gao, Y; Zhang, C; Peng, S D; Bai, M B; Li, S J; Xu, L; Zhou, X Y; Lin, L B
Clubroot significantly affects plants of the Brassicaceae family and is one of the main diseases causing serious losses in B. napus yield. Few studies have investigated the clubroot-resistance mechanism in B. napus. Identification of clubroot-resistant genes may be used in clubroot-resistant breeding, as well as to elucidate the molecular mechanism behind B. napus clubroot-resistance. We used three B. napus transcriptome samples to construct a transcriptome sequencing library by using Illumina HiSeq™ 2000 sequencing and bioinformatic analysis. In total, 171 million high-quality reads were obtained, containing 96,149 unigenes of N50-value. We aligned the obtained unigenes with the Nr, Swiss-Prot, Clusters of Orthologous Groups, and Gene Ontology databases and annotated their functions. In the Kyoto Encyclopedia of Genes and Genomes database, 25,033 unigenes (26.04%) were assigned to 124 pathways. Many genes, including broad-spectrum disease-resistance genes, specific clubroot-resistant genes, and genes related to indole-3-acetic acid (IAA) signal transduction, cytokinin synthesis, and myrosinase synthesis in the Huashuang 3 variety of B. napus were found to be related to clubroot-resistance. The effective clubroot-resistance observed in this variety may be due to the induced increased expression of these disease-resistant genes and strong inhibition of the IAA signal transduction, cytokinin synthesis, and myrosinase synthesis. Unigenes 0048482 and 0061770 shared 94% nucleotide similarity with the Crr1 gene. Furthermore, unigene 0061770 could have originated from an inversion of the Crr1 5'-end sequence.
A. V. Vinogradov
Aim: to estimate the frequency of DNMT3A gene exons 18–26 point mutations in acute myeloid leukemia (AML) patients (pts) using a targeted automatic sequencing technique. Material and Methods. Bone marrow and peripheral blood samples were obtained from 34 AML pts aged 21 to 64 who were treated at the Sverdlovsk Regional Hematological Centre (Ekaterinburg) during the period 2012–2014. Distribution of the pts according to FAB-classification was as follows: AML M0 – 3, M1 – 1, M2 – 12, M3 – 3, M4 – 10, M5 – 2, M6 – 1, M7 – 1, blastic plasmacytoid dendritic cell neoplasm – 1. Total RNA was extracted from leukemic cells and subjected to reverse transcription. DNMT3A gene exons 18–26 were amplified by PCR. Mutations in the DNMT3A gene were detected by direct sequencing on an ABI Prism 310 automatic genetic analyzer. Results. The average frequency of functionally significant point mutations in DNMT3A gene exons 18–26 among the treated AML pts was 5.9%. They were detected in morphological subgroups M2 and M4 (according to WHO classification). The average frequency of DNMT3A gene exons 18–26 point mutations among the AML M2 and M4 pts without chromosomal aberrations and TP53 gene point mutations was 14.3%. In both cases there were samples in which DNMT3A gene mutations were accompanied by molecular lesions of NPM1, KRAS and WT1 genes. AML pts with DNMT3A gene exons 18–26 point mutations were characterized by poor response to standard chemotherapeutic regimens and unfavorable prognosis.
Jachiet, Pierre-Alain; Pogorelcnik, Romain; Berry, Anne; Lopez, Philippe; Bapteste, Eric
Gene fusion is an important evolutionary process. It can yield valuable information to infer the interactions and functions of proteins. Fused genes have been identified as non-transitive patterns of similarity in triplets of genes. To be computationally tractable, this approach usually imposes an a priori distinction between a dataset in which fused genes are searched for, and a dataset that may have provided genetic material for fusion. This reduces the 'genetic space' in which fusion can be discovered, as only a subset of triplets of genes is investigated. Moreover, this approach may have a high-false-positive rate, and it does not identify gene families descending from a common fusion event. We represent similarities between sequences as a network. This leads to an efficient formulation of previous methods of fused gene identification, which we implemented in the Python program FusedTriplets. Furthermore, we propose a new characterization of families of fused genes, as clique minimal separators of the sequence similarity network. This well-studied graph topology provides a robust and fast method of detection, well suited for automatic analyses of big datasets. We implemented this method in the C++ program MosaicFinder, which additionally uses local alignments to discard false-positive candidates and indicates potential fusion points. The grouping into families will help distinguish sequencing or prediction errors from real biological fusions, and it will yield additional insight into the function and history of fused genes. FusedTriplets and MosaicFinder are published under the GPL license and are freely available with their source code at this address: http://sourceforge.net/projects/mosaicfinder. Supplementary data are available at Bioinformatics online.
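As a rough illustration of the triplet criterion described in this abstract, the following Python sketch scans a sequence similarity network for non-transitive triplets: a candidate gene C that is similar to two genes A and B which are not similar to each other. The function name, the toy gene identifiers, and the edge-list input are assumptions made for this example. It is not the FusedTriplets implementation, and it omits the local-alignment checks that MosaicFinder is said to use to discard false positives.

```python
from collections import defaultdict
from itertools import combinations


def non_transitive_triplets(edges):
    """Return (a, c, b) triplets where c is similar to both a and b,
    but a and b are not similar to each other."""
    # Build an undirected adjacency map from the similarity edges.
    neighbours = defaultdict(set)
    for u, v in edges:
        neighbours[u].add(v)
        neighbours[v].add(u)

    triplets = []
    for c in sorted(neighbours):
        # Every pair of neighbours of c that lacks a direct edge
        # makes c a candidate fused gene.
        for a, b in combinations(sorted(neighbours[c]), 2):
            if b not in neighbours[a]:
                triplets.append((a, c, b))
    return triplets


# Toy network with hypothetical identifiers: "fused" resembles two
# unrelated families (geneA/geneA2 and geneB).
toy_edges = [
    ("geneA", "fused"),
    ("geneA2", "fused"),
    ("geneB", "fused"),
    ("geneA", "geneA2"),
]
for triplet in non_transitive_triplets(toy_edges):
    print(triplet)
```

In this toy network only "fused" is reported as the middle gene, once for each member of the geneA family paired with geneB. A real analysis would also need to confirm that the two partners match non-overlapping regions of the candidate.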
Hansen, E B; Hansen, F G; von Meyenburg, K
The nucleotide sequence of the dnaA gene and the first 10% of the dnaN gene was determined. From the nucleotide sequence the amino acid sequence of the dnaA gene product was derived. It is a basic protein of 467 amino acid residues with a molecular weight of 52.5 kD. The dnaA gene is expressed in the counterclockwise direction, like the dnaN gene, for which potential start sites were found.
Scolnick, Jonathan A; Dimon, Michelle; Wang, I-Ching; Huelga, Stephanie C; Amorese, Douglas A
Fusion genes are known to be key drivers of tumor growth in several types of cancer. Traditionally, detecting fusion genes has been a difficult task based on fluorescent in situ hybridization to detect chromosomal abnormalities. More recently, RNA sequencing has enabled an increased pace of fusion gene identification. However, RNA-Seq is inefficient for the identification of fusion genes due to the high number of sequencing reads needed to detect the small number of fusion transcripts present in cells of interest. Here we describe a method, Single Primer Enrichment Technology (SPET), for targeted RNA sequencing that is customizable to any target genes, is simple to use, and efficiently detects gene fusions. Using SPET to target 5701 exons of 401 known cancer fusion genes for sequencing, we were able to identify known and previously unreported gene fusions from both fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue RNA in both normal tissue and cancer cells.
Eikenella corrodens normally inhabits the human respiratory and gastrointestinal tracts but is frequently the cause of abscesses at various sites. Using the N-terminal portion of the Moraxella nonliquefaciens pilin gene as a hybridization probe, we cloned two tandemly located pilin genes of E. corrodens 31745, ecpC and ecpD, and expressed the two pilin genes separately in Escherichia coli. A comparison of the predicted amino acid sequences of E. corrodens 31745 EcpC and EcpD revealed consider...
Hallin, Peter Fischer; Stærfeldt, Hans Henrik; Rotenberg, Eva;
We present an interactive web application for visualizing genomic data of prokaryotic chromosomes. The tool (GeneWiz browser) allows users to carry out various analyses such as mapping alignments of homologous genes to other genomes, mapping of short sequencing reads to a reference chromosome, and calculating DNA properties such as curvature or stacking energy along the chromosome. The GeneWiz browser produces an interactive graphic that enables zooming from a global scale down to single nucleotides, without changing the size of the plot. Its ability to disproportionally zoom provides optimal...
Fromont-Racine, M; Bucchini, D; Madsen, O
Expression of the human insulin gene was examined in transgenic mouse lines carrying the gene with various lengths of DNA sequences 5' to the transcription start site (+1). Expression of the transgene was demonstrated by 1) the presence of human C-peptide in urine, 2) the presence of specific......, and -168 allowed correct initiation of the transcripts and cell specificity of expression, while quantitative expression gradually decreased. Deletion to -58 completely abolished the expression of the gene. The amount of human product that in mice harboring the longest fragment contributes up to 50...... of the transgene was observed in cell types other than beta-islet cells....
Barghi, Neda; Concepcion, Gisela P; Olivera, Baldomero M; Lluisma, Arturo O
The evolvability of venom components (in particular, the gene-encoded peptide toxins) in venomous species serves as an adaptive strategy allowing them to target new prey types or respond to changes in the prey field. The structure, organization, and expression of the venom peptide genes may provide insights into the molecular mechanisms that drive the evolution of such genes. Conus is a particularly interesting group given the high chemical diversity of their venom peptides, and the rapid evolution of the conopeptide-encoding genes. Conus genomes, however, are large and characterized by a high proportion of repetitive sequences. As a result, the structure and organization of conopeptide genes have remained poorly known. In this study, a survey of the genome of Conus tribblei was undertaken to address this gap. A partial assembly of the C. tribblei genome was generated; the assembly, though consisting of a large number of fragments, accounted for 2160.5 Mb of sequence. A large number of repetitive genomic elements consisting of 642.6 Mb of retrotransposable elements, simple repeats, and novel interspersed repeats were observed. We characterized the structural organization and distribution of conotoxin genes in the genome. A significant number of conopeptide genes (estimated to be between 148 and 193) belonging to different superfamilies with complete or nearly complete exon regions were observed, ~60% of which were expressed. The unexpressed conopeptide genes represent hidden but significant conotoxin diversity. The conotoxin genes also differed in the frequency and length of the introns. The interruption of exons by long introns in the conopeptide genes and the presence of repeats in the introns may indicate the importance of introns in facilitating recombination, evolution and diversification of conotoxins. These findings advance our understanding of the structural framework that promotes the gene-level molecular evolution of venom peptides.
Oryza officinalis Wall ex Watt is one of the most important wild relatives of cultivated rice and exhibits high resistance to many diseases. It has been used as a source of genes for introgression into cultivated rice. However, there are limited genomic resources and little genetic information publicly reported for this species. To better understand the pathways and factors involved in disease resistance and to accelerate the process of rice breeding, we carried out de novo transcriptome sequencing of O. officinalis. In this research, 137,229 contigs ranging from 200 to 19,214 bp, with an N50 of 2331 bp, were obtained through de novo assembly of leaves, stems and roots of O. officinalis using an Illumina HiSeq 2000 platform. Based on sequence similarity searches against a non-redundant protein database, a total of 88,249 contigs were annotated with gene descriptions and 75,589 transcripts were further assigned to GO terms. Candidate genes for plant–pathogen interaction and plant hormone regulation pathways involved in disease resistance were identified. Further analyses of gene expression profiles showed that the majority of genes related to disease resistance were expressed in all three tissues. In addition, there are two kinds of rice bacterial blight-resistance genes in O. officinalis, including two Xa1 genes and three Xa26 genes. Both Xa1 genes showed their highest expression level in the stem, whereas one Xa26 gene was expressed predominantly in the leaf and the other two Xa26 genes displayed low expression levels in all three tissues. This transcriptomic database provides an opportunity for identifying the genes involved in disease resistance and will provide a basis for studying functional genomics of O. officinalis and genetic improvement of cultivated rice in the future.
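The N50 statistic quoted in this abstract can be illustrated with a short Python sketch. It uses the common definition of N50 as the length of the shortest contig in the smallest set of longest contigs that together hold at least half of the assembled bases. The function name and the toy lengths are assumptions for illustration, and the assembler used in the study may handle ties or minimum-length filters differently.

```python
def n50(lengths):
    """Return the N50 of a collection of contig lengths.

    Contigs are taken from longest to shortest until the running total
    reaches half of all assembled bases; the length of the contig that
    crosses that threshold is the N50.
    """
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Toy example: 20 bases in total, so the threshold is 10 bases.
# 6 + 5 = 11 reaches it, and the N50 is 5.
print(n50([2, 3, 4, 5, 6]))
```

Because the statistic is weighted by length, a few long contigs can give a high N50 even when most contigs are short, which is why it is usually reported alongside the contig count and length range.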
Brad S. Coates
Feeding damage caused by the western corn rootworm, Diabrotica virgifera virgifera, is destructive to corn plants in North America and Europe where control remains challenging due to evolution of resistance to chemical and transgenic toxins. A BAC library, DvvBAC1, containing 109,486 clones with 104±34.5 kb inserts was created, which has an ~4.56X genome coverage based upon a 2.58 Gb (2.80 pg) flow cytometry-estimated haploid genome size. Paired end sequencing of 1037 BAC inserts produced 1.17 Mb of data (~0.05% genome coverage) and indicated that ~9.4 and 16.0% of reads encode, respectively, endogenous genes and transposable elements (TEs). Sequencing genes within BAC full inserts demonstrated that TE densities are high within intergenic and intron regions and contribute to the increased gene size. Comparison of homologous genome regions cloned within different BAC clones indicated that TE movement may cause haplotype variation within the inbred strain. The data presented here indicate that the D. virgifera virgifera genome is large in size and contains a high proportion of repetitive sequence. These BAC sequencing methods, which are applicable for characterization of genomes prior to sequencing, are likely to be valuable resources for genome annotation as well as scaffolding.
De Zoysa, P A; Connerton, I F; Watson, D C; Johnston, J R
The cloned NADP-specific glutamate dehydrogenase (GDH) genes of Aspergillus nidulans (gdhA) and Neurospora crassa (am) have been shown to hybridize under reduced stringency conditions to genomic sequences of the yeast Schwanniomyces occidentalis. Using 5' and 3' gene-specific probes, a unique 5.1 kb BclI restriction fragment that encompasses the entire Schwanniomyces sequence has been identified. A recombinant clone bearing the unique BclI fragment has been isolated from a pool of enriched clones in the yeast/E. coli shuttle vector pWH5 by colony hybridization. The identity of the plasmid clone was confirmed by functional complementation of the Saccharomyces cerevisiae gdh-1 mutation. The nucleotide sequence of the Schw. occidentalis GDH gene, which consists of 1380 nucleotides in a continuous reading frame of 459 amino acids, has been determined. The predicted amino acid sequence shows considerable homology with GDH proteins from other fungi and significant homology with all other available GDH sequences.
Li, Shi-Ping; Chang, Hong; Ma, Guo-Long; Chen, Hong-Yu; Ji, De-Jun; Geng, Rong-Qing
The gayal (Bos frontalis) is a very rare, semi-wild and semi-domestic bovine species. The gayal's origin and phylogenetic status remain controversial. The complete cytochrome b (Cyt b) gene sequences (1,140 bp) of 11 gayals were sequenced and analyzed. Combined with other complete bovine Cyt b sequences from GenBank, phylogenetic trees of the genus Bos were reconstructed by neighbor-joining (NJ) and maximum parsimony (MP) methods with Bubalus bubalis as the outgroup. Sequence analysis showed that, among the 1,140 sites of the complete Cyt b sequences of the 11 gayals, 95 variable sites (8.33% of all sites) and 6 haplotypes were found, showing abundant genetic diversity in the mitochondrial Cyt b gene of the gayals. Both NJ and MP trees demonstrated that the gayals in this study were clearly divided into three branches: one clustering with Bos taurus, another clustering with Bos indicus, and the third clustering with Bos gaurus. The phylogenetic analysis suggested that the gayal might be the domesticated form of the gaur (Bos gaurus), and that a large proportion of the gayal lineage has been introgressed by other bovine species.
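The two summary figures in this abstract, variable sites and haplotypes, can be computed from an alignment as in the Python sketch below. The function names and the toy sequences are assumptions for illustration. The sketch assumes equal-length, gap-free aligned sequences and does not treat ambiguity codes specially, which population-genetics software normally does.

```python
def variable_sites(aligned):
    """Count alignment columns that contain more than one distinct base."""
    return sum(1 for column in zip(*aligned) if len(set(column)) > 1)


def haplotype_count(aligned):
    """Count distinct sequences (haplotypes) in the alignment."""
    return len(set(aligned))


# Toy alignment of four hypothetical 5-bp sequences.
toy = ["ATGCA", "ATGCA", "ATGTA", "CTGTA"]
print(variable_sites(toy), haplotype_count(toy))

# The percentage reported in the abstract follows directly from its counts:
# 95 variable sites out of 1,140 aligned sites.
print(round(95 / 1140 * 100, 2))
```

The toy alignment has two variable columns and three haplotypes, and the last line reproduces the 8.33% figure from the counts given in the abstract.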
DONG Yuzhi; YANG Chuanping; ZHANG Daoyuan; WANG Yucheng
Plant aquaporins are water-selective channels in plants and are involved in seed germination, cell elongation, stomatal movement, fertilization and so on. Some plant aquaporins also play an important role in the drought stress response. In this paper, the gene encoding the Tamarix albiflonum aquaporin (AQP) was amplified by 5' rapid amplification of cDNA ends (RACE) on the basis of the sequence information obtained from the expressed sequence tag of the subtractive hybridization library constructed under PEG6000 stress. The cDNA of the T. albiflonum AQP gene is 1,043 bp long and encodes a protein of 287 amino acids with a predicted molecular mass of 30.9 kDa and a theoretical isoelectric point of 8.84. The protein has 6 transmembrane regions and possesses the major intrinsic protein (MIP) family signal consensus sequence SGXHXNPAVT and the higher plant plasma membrane intrinsic protein (PIP) highly conserved sequences GGGANXXXXGY and TGI/TNPARSL /FGAA I/VI/VF/YN. A comparative analysis of the nucleotide sequence against the National Center for Biotechnology Information (NCBI) databases showed that it shared 95% homology with the gene of Arabidopsis thaliana (MIP-C).
Xue, L Y; Qian, K X
The Hoxa-11 gene is essential for the development of fish fins and tetrapod limbs. Based on the published nucleotide sequences of human and mouse Hoxa-11 genes, two degenerate primers were designed. A Latimeria Hoxa-11 gene fragment was amplified by PCR, cloned and sequenced. The acquired Hox gene fragment, which encodes 204 amino acids, comprises 2,065 bp, including most of exon 1, the intron, and part of exon 2. The homology of the Latimeria Hoxa-11 protein is 66.0% to human, 67.6% to mouse, 74.4% to chick, 72.8% to frog, and 59.7% to zebrafish. The exon 2 region, including the homeobox, and the splice site are highly conserved. However, the exon 1 region has increased in size by 16% from Latimeria to human. Sequence analysis further revealed that exon 1 of Latimeria Hoxa-11 could be divided into four regions: two highly conserved regions, a moderately conserved region, and a variable region adjacent to the intron. The size variation is primarily caused by the accumulation of alanine repeats and of flanking segments rich in glycine and serine in the variable region. This implies that the variable region might be related to the acquisition of new functions in the fin-limb transition and vertebrate evolution. Besides the homeobox, two highly conserved regions in exon 1 and two phylogenetic footprints in the intron were found. The strong sequence conservation suggests an important functional role of these regions.
Butyrate is a nutritional element with strong epigenetic regulatory activity as an inhibitor of histone deacetylases (HDACs). Based on the analysis of differentially expressed genes induced by butyrate in the bovine epithelial cell using deep RNA-sequencing technology (RNA-seq), a set of unique gen...
Callegari, M.L.; Riboli, B.; Sanders, J.W; Cocconcelli, P.S.; Kok, J.; Venema, G; Morelli, L.
Lactobacillus helveticus CNRZ 892 contains a surface layer (S-layer) composed of protein monomers of 43 kDa organized in regular arrays. The gene encoding this protein (slpH) has been cloned in Escherichia coli and sequenced. slpH consists of 440 codons and is preceded by a ribosome-binding site (RB
The family Rutaceae encompasses several genera including the economically important genus Citrus. In this study, we selected 22 citrus relatives belonging to the various sub groups of Rutaceae and compared the sequences of three gene fragments. The accessions selected belong to the subfamily Rutoide...
Short, Cindy M; Suttle, Curtis A
Primers were designed to amplify a 592-bp region within a conserved structural gene (g20) found in some cyanophages. The goal was to use this gene as a proxy to infer genetic richness in natural cyanophage communities and to determine if sequences were more similar in similar environments. Gene products were amplified from samples from the Gulf of Mexico, the Arctic, Southern, and Northeast and Southeast Pacific Oceans, an Arctic cyanobacterial mat, a catfish production pond, lakes in Canada and Germany, and a depth of ca. 3,246 m in the Chukchi Sea. Amplicons were separated by denaturing gradient gel electrophoresis, and selected bands were sequenced. Phylogenetic analysis revealed four previously unknown groups of g20 clusters, two of which were entirely found in freshwater. Also, sequences with >99% identities were recovered from environments that differed greatly in temperature and salinity. For example, nearly identical sequences were recovered from the Gulf of Mexico, the Southern Pacific Ocean, an Arctic freshwater cyanobacterial mat, and Lake Constance, Germany. These results imply that closely related hosts and the viruses infecting them are distributed widely across environments or that horizontal gene exchange occurs among phage communities from very different environments. Moreover, the amplification of g20 products from deep in the cyanobacterium-sparse Chukchi Sea suggests that this primer set targets bacteriophages other than those infecting cyanobacteria.
Delétoile, Alexis; Decré, Dominique; Courant, Stéphanie; Passet, Virginie; Audo, Jennifer; Grimont, Patrick; Arlet, Guillaume; Brisse, Sylvain
Pantoea agglomerans and other Pantoea species cause infections in humans and are also pathogenic to plants, but the diversity of Pantoea strains and their possible association with hosts and disease remain poorly known, and identification of Pantoea species is difficult. We characterized 36 Pantoea strains, including 28 strains of diverse origins initially identified as P. agglomerans, by multilocus gene sequencing based on six protein-coding genes, by biochemical tests, and by antimicrobial susceptibility testing. Phylogenetic analysis and comparison with other species of Enterobacteriaceae revealed that the genus Pantoea is highly diverse. Most strains initially identified as P. agglomerans by use of API 20E strips belonged to a compact sequence cluster together with the type strain, but other strains belonged to diverse phylogenetic branches corresponding to other species of Pantoea or Enterobacteriaceae and to probable novel species. Biochemical characteristics such as fosfomycin resistance and utilization of d-tartrate could differentiate P. agglomerans from other Pantoea species. All 20 strains of P. agglomerans could be distinguished by multilocus sequence typing, revealing the very high discrimination power of this method for strain typing and population structure in this species, which is subdivided into two phylogenetic groups. PCR detection of the repA gene, associated with pathogenicity in plants, was positive in all clinical strains of P. agglomerans, suggesting that clinical and plant-associated strains do not form distinct populations. We provide a multilocus gene sequencing method that is a powerful tool for Pantoea species delineation and identification and for strain tracking.
I. Jansen (Iris); Ye, H. (Hui); Heetveld, S. (Sasja); Lechler, M.C. (Marie C.); Michels, H. (Helen); Seinstra, R.I. (Renée I.); Lubbe, S.J. (Steven J.); Drouet, V. (Valérie); S. Lesage (Suzanne); E. Majounie (Elisa); Gibbs, J.R. (J.Raphael); M.A. Nalls (Michael); M. Ryten (Mina); Botia, J.A. (Juan A.); J. Vandrovcova (Jana); J. Simón-Sánchez (Javier); Castillo-Lizardo, M. (Melissa); P. Rizzu (Patrizia); Blauwendraat, C. (Cornelis); Chouhan, A.K. (Amit K.); Li, Y. (Yarong); Yogi, P. (Puja); N. Amin (Najaf); C.M. van Duijn (Cock); Morris, H.R. (Huw R.); Brice, A. (Alexis); A. Singleton (Andrew); David, D.C. (Della C.); Nollen, E.A. (Ellen A.); A. Jain (Ashok); J.M. Shulman; P. Heutink (Peter); D.G. Hernandez (Dena); S. Arepalli (Sampath); J. Brooks (Janet); Price, R. (Ryan); Nicolas, A. (Aude); S. Chong (Sean); M.R. Cookson (Mark); A. Dillman (Allissa); M. Moore (Matt); B.J. Traynor (Bryan); A. Singleton (Andrew); V. Plagnol (Vincent); Nicholas W Wood,; U.-M. Sheerin (Una-Marie); Jose M Bras,; K. Charlesworth (Kate); M. Gardner (Mac); R. Guerreiro (Rita); D. Trabzuni (Danyah); Hardy, J. (John); M. Sharma; M. Saad (Mohamad); Javier Simón-Sánchez,; C. Schulte (Claudia); J.C. Corvol (Jean-Christophe); Dürr, A. (Alexandra); M. Vidailhet (M.); S. Sveinbjörnsdóttir (Sigurlaug); R.A. Barker (Roger); Caroline H Williams-Gray,; Y. Ben-Shlomo; H.W. Berendse (Henk W.); K.D. van Dijk (Karin); D. Berg (Daniela); K. Brockmann; K.D. Wurster (Kathrin); Mätzler, W. (Walter); Gasser, T. (Thomas); M. Martinez (Maria); R.M.A. de Bie (Rob); A. Biffi (Alessandro); D. Velseboer (Daan); B.R. Bloem (Bastiaan); B. Post (Bart); M. Wickremaratchi (Mirdhu); B. van de Warrenburg (Bart); Z. Bochdanovits (Zoltan); M. von Bonin (Malte); H. Pétursson (Hjörvar); O. Riess (Olaf); D.J. Burn (David); Lubbe, S. (Steven); Cooper, J.M. (J Mark); N.H. McNeill (Nathan); Schapira, A. (Anthony); Lungu, C. (Codrin); Chen, H. (Honglei); Dong, J. (Jing); Chinnery, P.F. (Patrick F.); G. Hudson (Gavin); Clarke, C.E. 
(Carl E.); C. Moorby (Catriona); C. Counsell (Carl); P. Damier (Philippe); J.-F. Dartigues; P. Deloukas (Panagiotis); E. Gray (Emma); T. Edkins (Ted); Hunt, S.E. (Sarah E.); S.C. Potter (Simon); A. Tashakkori-Ghanbaria (Avazeh); G. Deuschl (Günther); D. Lorenz (Delia); D.T. Dexter (David); F. Durif (Frank); J. Evans (Jonathan Mark); Langford, C. (Cordelia); T. Foltynie (Thomas); A.M. Goate (Alison); C. Harris (Clare); J.J. van Hilten (Jacobus); A. Hofman (Albert); J.R. Hollenbeck (John R.); J.L. Holton (Janice); Hu, M. (Michele); X. Huang (Xiaohong); Illig, T. (Thomas); P.V. Jónsson (Pálmi); J.-C. Lambert; S.S. O'Sullivan (Sean); T. Revesz (Tamas); K. Shaw (Karen); A.J. Lees (Andrew); P. Lichtner (Peter); P. Limousin (Patricia); G. Lopez; Escott-Price, V. (Valentina); J. Pearson (Justin); N. Williams (Nigel); E. Mudanohwo (Ese); J.S. Perlmutter (Joel); Pollak, P. (Pierre); F. Rivadeneira Ramirez (Fernando); A.G. Uitterlinden (André); S.J. Sawcer (Stephen); H. Scheffer (Hans); I. Shoulson (Ira); L. Shulman (Lee); Smith, C. (Colin); R. Walker (Robert); C.C.A. Spencer (Chris C.); A. Strange (Amy); H. Stefansson (Hreinn); F. Bettella (Francesco); J-A. Zwart (John-Anker); Stockton, J.D. (Joanna D.); D. Talbot; C.M. Tanner (Carlie); F. Tison (François); S. Winder-Rhodes (Sophie); K.P. Bhatia (Kailash)
Background: Whole-exome sequencing (WES) has been successful in identifying genes that cause familial Parkinson's disease (PD). However, until now this approach has not been deployed to study large cohorts of unrelated participants. To discover rare PD susceptibility variants, we perform
Dhawan, B; Sebastian, S; Malhotra, R; Kapil, A; Gautam, D
We report the first case of prosthetic joint infection caused by Lysobacter thermophilus, which was identified by 16S rRNA gene sequencing. Removal of the prosthesis followed by antibiotic treatment resulted in a good clinical outcome. This case illustrates the use of molecular diagnostics to detect uncommon organisms in suspected prosthetic infections.
WANG Han-yan; WEI Yu-ming; ZE Hong-yan; ZHENG You-liang
Three coding sequences of gliadin genes, designated Gli2_Du1, Gli2_Du2 and Gli2_Du3, were isolated from the genomic DNA of Triticum durum accession CItr5083. Gli2_Du1 and Gli2_Du2 contain 945 and 864 bp, encoding mature proteins with 314 and 287 amino acid residues, respectively. Gli2_Du3 is recognized as a pseudogene due to a stop codon occurring in the coding region. The pseudogenes, which commonly occur in the gliadin family, are attributed to the single base change C → T. The amino acid sequences deduced from these gene sequences had the typical structure of α-gliadin proteins, including the toxic sequences (PSQQQP). The peptide fraction PF(Y)PP(Q) is thought to be an extra unit of the repetitive domain, slightly diverging from the previous report. Six cysteine residues were observed within two unique domains. Phylogenetic analysis showed that Gli2_Du2 and Gli2_Du3 were closely related to the genes on chromosome 6A, whereas Gli2_Du1 seems to be more homologous with the genes on chromosome 6B.
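The sequence lengths and the C → T pseudogene mechanism mentioned in this abstract can be checked with a little codon arithmetic. The Python sketch below assumes that each stated length includes exactly one terminal stop codon, which is an inference from the numbers rather than something the abstract states. It then lists every sense codon that a single C → T substitution converts into a stop codon in the standard genetic code. The function names are hypothetical.

```python
from itertools import product

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def residues_encoded(cds_bp):
    """Residues encoded by a coding sequence whose length includes one stop codon."""
    if cds_bp % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    return cds_bp // 3 - 1


def c_to_t_stop_precursors():
    """List (codon, mutated) pairs where one C -> T change creates a stop codon."""
    hits = []
    for codon in map("".join, product("ACGT", repeat=3)):
        if codon in STOP_CODONS:
            continue
        for i, base in enumerate(codon):
            if base == "C":
                mutated = codon[:i] + "T" + codon[i + 1:]
                if mutated in STOP_CODONS:
                    hits.append((codon, mutated))
    return hits


print(residues_encoded(945), residues_encoded(864))  # 314 287
print(c_to_t_stop_precursors())
```

Under that assumption the two lengths give 314 and 287 residues, matching the abstract. The only codons the second function returns are CAA and CAG (glutamine) and CGA (arginine). Whether the stop codon in Gli2_Du3 arose at a glutamine codon is not stated in the abstract.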
Pérez Del Molino Bernal, Inmaculada C; Cano, María E; García de la Fuente, Celia; Martínez-Martínez, Luis; López, Mónica; Fernández-Mazarrasa, Carlos; Agüero, Jesús
Recurrent bloodstream infections caused by a Gram-positive bacterium affected an immunocompromised child. Tsukamurella pulmonis was the microorganism identified by secA1 gene sequencing. Antibiotic treatment in combination with removal of the subcutaneous port healed the patient. Copyright © 2015, American Society for Microbiology. All Rights Reserved.
Haroon, Attiya; Koide, Michio; Higa, Futoshi; Tateyama, Masao; Fujita, Jiro
The virulence factor known as the macrophage infectivity potentiator (mip) is responsible for the intracellular survival of Legionella species. In this study, we investigated the potential of the mip gene sequence to differentiate isolates of different species of Legionella and different serogroups of Legionella pneumophila. We used 35 clinical L. pneumophila isolates and one clinical isolate each of Legionella micdadei, Legionella longbeachae, and Legionella dumoffii (collected from hospitals all over Japan between 1980 and 2007). We used 19 environmental Legionella anisa isolates (collected in the Okinawa, Nara, Osaka, and Hyogo prefectures between 1987 and 2007) and two Legionella type strains. We extracted bacterial genomic DNA and amplified the mip gene by PCR. PCR products were purified by agarose gel electrophoresis and the mip gene was then sequenced. The L. pneumophila isolates could be divided into two groups: one group was very similar to the type strain and was composed of serogroup (SG) 1 isolates only; the second group had more sequence variations and was composed of SG1 isolates as well as SG2, SG3, SG5, and SG10 isolates. Phylogenetic analysis displayed one cluster for L. anisa isolates, while other Legionella species were present at discrete levels. Our findings show that mip gene sequencing is an effective technique for differentiating L. pneumophila strains from other Legionella species.
I. Jansen (Iris); Ye, H. (Hui); Heetveld, S. (Sasja); Lechler, M.C. (Marie C.); Michels, H. (Helen); Seinstra, R.I. (Renée I.); Lubbe, S.J. (Steven J.); Drouet, V. (Valérie); S. Lesage (Suzanne); E. Majounie (Elisa); Gibbs, J.R. (J.Raphael); M.A. Nalls (Michael); M. Ryten (Mina); Botia, J.A. (Juan A.); J. Vandrovcova (Jana); J. Simón-Sánchez (Javier); Castillo-Lizardo, M. (Melissa); P. Rizzu (Patrizia); Blauwendraat, C. (Cornelis); Chouhan, A.K. (Amit K.); Li, Y. (Yarong); Yogi, P. (Puja); N. Amin (Najaf); C.M. van Duijn (Cock); Morris, H.R. (Huw R.); Brice, A. (Alexis); A. Singleton (Andrew); David, D.C. (Della C.); Nollen, E.A. (Ellen A.); A. Jain (Ashok); J.M. Shulman; P. Heutink (Peter); D.G. Hernandez (Dena); S. Arepalli (Sampath); J. Brooks (Janet); Price, R. (Ryan); Nicolas, A. (Aude); S. Chong (Sean); M.R. Cookson (Mark); A. Dillman (Allissa); M. Moore (Matt); B.J. Traynor (Bryan); A. Singleton (Andrew); V. Plagnol (Vincent); Nicholas W Wood,; U.-M. Sheerin (Una-Marie); Jose M Bras,; K. Charlesworth (Kate); M. Gardner (Mac); R. Guerreiro (Rita); D. Trabzuni (Danyah); Hardy, J. (John); M. Sharma; M. Saad (Mohamad); Javier Simón-Sánchez,; C. Schulte (Claudia); J.C. Corvol (Jean-Christophe); Dürr, A. (Alexandra); M. Vidailhet (M.); S. Sveinbjörnsdóttir (Sigurlaug); R.A. Barker (Roger); Caroline H Williams-Gray,; Y. Ben-Shlomo; H.W. Berendse (Henk W.); K.D. van Dijk (Karin); D. Berg (Daniela); K. Brockmann; K.D. Wurster (Kathrin); Mätzler, W. (Walter); Gasser, T. (Thomas); M. Martinez (Maria); R.M.A. de Bie (Rob); A. Biffi (Alessandro); D. Velseboer (Daan); B.R. Bloem (Bastiaan); B. Post (Bart); M. Wickremaratchi (Mirdhu); B. van de Warrenburg (Bart); Z. Bochdanovits (Zoltan); M. von Bonin (Malte); H. Pétursson (Hjörvar); O. Riess (Olaf); D.J. Burn (David); Lubbe, S. (Steven); Cooper, J.M. (J Mark); N.H. McNeill (Nathan); Schapira, A. (Anthony); Lungu, C. (Codrin); Chen, H. (Honglei); Dong, J. (Jing); Chinnery, P.F. (Patrick F.); G. Hudson (Gavin); Clarke, C.E. 
(Carl E.); C. Moorby (Catriona); C. Counsell (Carl); P. Damier (Philippe); J.-F. Dartigues; P. Deloukas (Panagiotis); E. Gray (Emma); T. Edkins (Ted); Hunt, S.E. (Sarah E.); S.C. Potter (Simon); A. Tashakkori-Ghanbaria (Avazeh); G. Deuschl (Günther); D. Lorenz (Delia); D.T. Dexter (David); F. Durif (Frank); J. Evans (Jonathan Mark); Langford, C. (Cordelia); T. Foltynie (Thomas); A.M. Goate (Alison); C. Harris (Clare); J.J. van Hilten (Jacobus); A. Hofman (Albert); J.R. Hollenbeck (John R.); J.L. Holton (Janice); Hu, M. (Michele); X. Huang (Xiaohong); Illig, T. (Thomas); P.V. Jónsson (Pálmi); J.-C. Lambert; S.S. O'Sullivan (Sean); T. Revesz (Tamas); K. Shaw (Karen); A.J. Lees (Andrew); P. Lichtner (Peter); P. Limousin (Patricia); G. Lopez; Escott-Price, V. (Valentina); J. Pearson (Justin); N. Williams (Nigel); E. Mudanohwo (Ese); J.S. Perlmutter (Joel); Pollak, P. (Pierre); F. Rivadeneira Ramirez (Fernando); A.G. Uitterlinden (André); S.J. Sawcer (Stephen); H. Scheffer (Hans); I. Shoulson (Ira); L. Shulman (Lee); Smith, C. (Colin); R. Walker (Robert); C.C.A. Spencer (Chris C.); A. Strange (Amy); H. Stefansson (Hreinn); F. Bettella (Francesco); J-A. Zwart (John-Anker); Stockton, J.D. (Joanna D.); D. Talbot; C.M. Tanner (Carlie); F. Tison (François); S. Winder-Rhodes (Sophie); K.P. Bhatia (Kailash)
Background: Whole-exome sequencing (WES) has been successful in identifying genes that cause familial Parkinson's disease (PD). However, until now this approach has not been deployed to study large cohorts of unrelated participants. To discover rare PD susceptibility variants, we perform
DAVISON, J; BRUNEL, F; PHANOPOULOS, A; PROZZI, D; TERPSTRA, P
The nucleotide sequences of two genes involved in sodium dodecyl sulfate (SDS) degradation by Pseudomonas have been determined. One of these, sdsA, codes for an alkyl sulfatase (58,957 Da) and has similarity (31.8% identity over a 201-amino acid stretch) to the N terminus of a predicted protein of
Hoang, Van L T; Innes, David J; Shaw, P Nicholas; Monteith, Gregory R; Gidley, Michael J; Dietzgen, Ralf G
Mango fruits contain a broad spectrum of phenolic compounds which impart potential health benefits; their biosynthesis is catalysed by enzymes in the phenylpropanoid-flavonoid (PF) pathway. The aim of this study was to reveal the variability in genes involved in the PF pathway in three different mango varieties Mangifera indica L., a member of the family Anacardiaceae: Kensington Pride (KP), Irwin (IW) and Nam Doc Mai (NDM) and to determine associations with gene expression and mango flavonoid profiles. A close evolutionary relationship between mango genes and those from the woody species poplar of the Salicaceae family (Populus trichocarpa) and grape of the Vitaceae family (Vitis vinifera), was revealed through phylogenetic analysis of PF pathway genes. We discovered 145 SNPs in total within coding sequences with an average frequency of one SNP every 316 bp. Variety IW had the highest SNP frequency (one SNP every 258 bp) while KP and NDM had similar frequencies (one SNP every 369 bp and 360 bp, respectively). The position in the PF pathway appeared to influence the extent of genetic diversity of the encoded enzymes. The entry point enzymes phenylalanine lyase (PAL), cinnamate 4-mono-oxygenase (C4H) and chalcone synthase (CHS) had low levels of SNP diversity in their coding sequences, whereas anthocyanidin reductase (ANR) showed the highest SNP frequency followed by flavonoid 3'-hydroxylase (F3'H). Quantitative PCR revealed characteristic patterns of gene expression that differed between mango peel and flesh, and between varieties. The combination of mango expressed sequence tags and availability of well-established reference PF biosynthetic genes from other plant species allowed the identification of coding sequences of genes that may lead to the formation of important flavonoid compounds in mango fruits and facilitated characterisation of single nucleotide polymorphisms between varieties. We discovered an association between the extent of sequence variation and
Ono, M; Toh, H; Miyata, T; Awaya, T
We determined the complete nucleotide sequence of the intracisternal A-particle gene, IAP-H18, cloned from the normal Syrian hamster liver DNA. IAP-H18 was 7,951 base pairs in length with two identical long terminal repeats of 376 base pairs at both ends. On the coding strand, imperfect open reading frames corresponding to gag and pol of the retrovirus genome were observed, whereas many stop codons were present in the region corresponding to env. The putative H18 gag gene (809 amino acids) had a sequence homologous to the N-terminal half of the mouse mammary tumor virus gag gene and locally to the Rous sarcoma virus gag gene. The putative H18 pol gene (900 residues) was homologous to the Rous sarcoma virus pol gene almost throughout the entire region. Two conserved regions among the retrovirus pol genes have been reported. One presumably corresponds to the DNA polymerase and the RNase H domain, and the other corresponds to the DNA endonuclease domain of the multifunctional protein pol. By the comparison of the deduced amino acid sequences of the putative endonuclease domain of six representative oncovirus genomes, a phylogenetic tree of the oncovirus genomes was constructed, and the intracisternal A-particle (type A) genome was found to be more closely related to the mouse mammary tumor virus (type B) and squirrel monkey retrovirus (type D) genomes.
赛吉庆; 康良仪; 黄忠; 史春霖; 田波; 谢友菊
The 3’-terminal 1,279-nucleotide sequence of the maize dwarf mosaic virus (MDMV) genome has been determined. This sequence contains an open reading frame of 1,023 nucleotides and a 3’-non-coding region of 256 nucleotides. The open reading frame includes the entire coding region for the viral capsid protein (CP) and part of that for the viral nuclear inclusion protein (Nib). The predicted viral CP consists of 313 amino acid residues with a calculated molecular weight of 35,400. The amino acid sequence of the viral CP derived from MDMV cDNA shows about 47%-54% homology to that of 4 other potyviruses. The viral CP gene was constructed in frame with the lacZ gene in the pUC19 plasmid and expressed in E. coli cells. The fusion polypeptide reacted positively in Western blot with an antiserum prepared against the native viral CP.
Ullrich, A; Dull, T J; Gray, A; Philips, J A; Peter, S
The nucleotide sequence of a highly repetitive sequence region upstream from the human insulin gene is reported. The length of this region varies between alleles in the population, and appears to be stably transmitted to the next generation in a Mendelian fashion. There is no significant correlation between the length of this sequence and two types of diabetes mellitus. We observe variation in the cleavability of a BglI recognition site downstream from the human insulin gene, which is probably due to variable nucleotide modification. This presumed modification state appears not to be inherited, and varies between tissues within an individual and between individuals for a given tissue. Both alleles in a given tissue DNA sample are modified to the same extent.
Liu, Zhan-Lin; Zhang, Da-Ming; Wang, Xiao-Ru
In higher plants the primary and the secondary structures of the 5S ribosomal RNA gene are considered highly conserved. Little is known about the 5S rRNA gene structure, organization and variation in gymnosperms. In this study we analyzed sequence and structure variation of the 5S rRNA gene in Pinus through cloning and sequencing multiple copies of 5S rDNA repeats from individual trees of five pines, P. bungeana, P. tabulaeformis, P. yunnanensis, P. massoniana and P. densata. Pinus bungeana is from the subgenus Strobus while the other four are from the subgenus Pinus (diploxylon pines). Our results revealed variations in both primary and secondary structure among copies of 5S rDNA within individual genomes and between species. The 5S rRNA gene in Pinus is 120 bp long in most of the 122 clones we sequenced except for one or two deletions in three clones. Among these clones 50 unique sequences were identified and they were shared by different pine species. Our sequences were compared to 13 sequences each representing a different gymnosperm species, and to six sequences representing both angiosperm monocots and dicots. Average sequence similarity was 97.1% among Pinus species and 94.3% between Pinus and other gymnosperms. Between gymnosperms and angiosperms the sequence similarity decreased to 88.1%. Similar to other molecular data, significant sequence divergence was found between the two Pinus subgenera. The 5S gene tree (neighbor-joining tree) grouped the four diploxylon pines together and separated them distinctly from P. bungeana. Comparison of sequence divergence within individuals and between species suggested that concerted evolution has been very weak, especially after the divergence of the four diploxylon pines. The phylogenetic information contained in the 5S rRNA gene is limited due to its short length, and the difficulties in identifying orthologous and paralogous copies of the rDNA multigene family further complicate its phylogenetic application. Pinus densata is a
GUO BaoCheng; LI JunBing; TONG ChaoBo; HE ShunPing
A PCR survey for Sox genes in a young tetraploid fish, Tor douronensis (Teleostei: Cyprinidae), was performed to assess the evolutionary fates of important functional genes after genome duplication caused by a polyploidization event. In total, 13 Sox genes were obtained in Tor douronensis, representing the SoxB, SoxC and SoxE groups. Phylogenetic analysis of Sox genes in Tor douronensis provided evidence for fish-specific genome duplication, and suggested that Sox19 might be a teleost-specific Sox gene member. Sequence analysis revealed that most of the nucleotide substitutions between duplicated copies of Sox genes caused by the tetraploidization event, or between them and their orthologues in other species, are silent substitutions. It would appear that the sequences are under purifying selective pressure, strongly suggesting that they represent functional genes and supporting selection against all null alleles at either of the two duplicated loci of Sox4a, Sox9a and Sox9b. Surprising variations in intron length and similarities between the two duplicated copies of Sox9a and Sox9b suggest that Tor douronensis might be an allotetraploid.
Elizabeth A. Robb
The chicken coloboma mutation exhibits features similar to human congenital developmental malformations such as ocular coloboma, cleft palate, dwarfism, and polydactyly. The coloboma-associated region and encoded genes were investigated using advanced genomic, genetic, and gene expression technologies. Initially, the mutation was linked to a 990 kb region encoding 11 genes; the application of the genetic and genomic tools led to a reduction of the linked region to 176 kb and the elimination of 7 genes. Furthermore, bioinformatics analyses of capture array-next generation sequence data identified genetic elements including SNPs, insertions, deletions, gaps, chromosomal rearrangements, and miRNA binding sites within the introgressed causative region relative to the reference genome sequence. Coloboma-specific variants within exons, UTRs, and splice sites were studied for their contribution to the mutant phenotype. Our compiled results suggest three genes for future studies. The three candidate genes, SLC30A5 (a zinc transporter), CENPH (a centromere protein), and CDK7 (a cyclin-dependent kinase), are differentially expressed (compared to normal embryos) at stages and in tissues affected by the coloboma mutation. Of these genes, two (SLC30A5 and CENPH) are considered high-priority candidates based upon studies in other vertebrate model systems.
Zhu, Ying; Mang, Hyung-gon; Sun, Qi; Qian, Jun; Hipps, Ashley; Hua, Jian
Next-generation sequencing technologies are accelerating gene discovery by combining multiple steps of mapping and cloning used in the traditional map-based approach into one step using DNA sequence polymorphisms existing between two different accessions/strains/backgrounds of the same species. The existing next-generation sequencing method, like the traditional one, requires the use of a segregating population from a cross of a mutant organism in one accession with a wild-type (WT) organism in a different accession. It therefore could potentially be limited by modification of mutant phenotypes in different accessions and/or by the lengthy process required to construct a particular mapping parent in a second accession. Here we present mapping and cloning of an enhancer mutation with next-generation sequencing on bulked segregants in the same accession using sequence polymorphisms induced by a chemical mutagen. This method complements the conventional cloning approach and makes forward genetics more feasible and powerful in molecularly dissecting biological processes in any organism. The pipeline developed in this study can be used to clone causal genes in the background of single or higher-order mutants and in species with or without sequence information on multiple accessions.
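The bulked-segregant idea described above can be illustrated with a minimal sketch. This is not the authors' pipeline: it only assumes that, in a pool of mutant segregants, mutagen-induced SNPs linked to the causal mutation approach a mutant-allele frequency of 1, while unlinked SNPs segregate at lower frequencies. The record fields (`pos`, `alt`, `depth`), the thresholds, and the read counts are all hypothetical.

```python
def mutant_allele_frequency(alt_reads: int, depth: int) -> float:
    """Fraction of reads at a site that carry the mutagen-induced allele."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    return alt_reads / depth


def candidate_snps(snps, min_freq=0.9, min_depth=10):
    """Keep well-covered SNPs whose mutant-allele frequency is near fixation.

    min_freq and min_depth are illustrative thresholds, not published values.
    """
    return [
        s for s in snps
        if s["depth"] >= min_depth
        and mutant_allele_frequency(s["alt"], s["depth"]) >= min_freq
    ]


# Hypothetical read counts from a pool of mutant segregants.
bulk = [
    {"pos": 1_200_000, "alt": 12, "depth": 40},  # unlinked: frequency 0.30
    {"pos": 5_430_000, "alt": 38, "depth": 40},  # linked: frequency 0.95
    {"pos": 5_610_000, "alt": 30, "depth": 30},  # linked: frequency 1.00
    {"pos": 9_050_000, "alt": 5, "depth": 6},    # too few reads to trust
]

hits = candidate_snps(bulk)
print([s["pos"] for s in hits])
```

In this toy data set only the two neighbouring SNPs pass both filters, which is the kind of clustering one would look for when narrowing a candidate interval.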
Chénard, C; Suttle, C A
Many cyanophage isolates which infect the marine cyanobacteria Synechococcus spp. and Prochlorococcus spp. contain a gene homologous to psbA, which codes for the D1 protein involved in photosynthesis. In the present study, cyanophage psbA gene fragments were readily amplified from freshwater and marine samples, confirming their widespread occurrence in aquatic communities. Phylogenetic analyses demonstrated that sequences from freshwaters have an evolutionary history that is distinct from that of their marine counterparts. Similarly, sequences from cyanophages infecting Prochlorococcus and Synechococcus spp. were readily discriminated, as were sequences from podoviruses and myoviruses. Viral psbA sequences from the same geographic origins clustered within different clades. For example, cyanophage psbA sequences from the Arctic Ocean fell within the Synechococcus as well as Prochlorococcus phage groups. Moreover, as psbA sequences are not confined to a single family of phages, they provide an additional genetic marker that can be used to explore the diversity and evolutionary history of cyanophages in aquatic environments.
Ramy Karam Aziz
Phages are the most abundant biological entities on Earth and play major ecological roles, yet the current sequenced phage genomes do not adequately represent their diversity, and little is known about the abundance and distribution of these sequenced genomes in nature. Although the study of phage ecology has benefited tremendously from the emergence of metagenomic sequencing, a systematic survey of phage genes and genomes in various ecosystems is still lacking, and fundamental questions about phage biology, lifestyle, and ecology remain unanswered. To address these questions and improve comparative analysis of phages in different metagenomes, we screened a core set of publicly available metagenomic samples for sequences related to completely sequenced phages using the web tool, Phage Eco-Locator. We then adopted and deployed an array of mathematical and statistical metrics for a multidimensional estimation of the abundance and distribution of phage genes and genomes in various ecosystems. Experiments using those metrics individually showed their usefulness in emphasizing the pervasive, yet uneven, distribution of known phage sequences in environmental metagenomes. Using these metrics in combination allowed us to resolve phage genomes into clusters that correlated with their genotypes and taxonomic classes as well as their ecological properties. We propose adding this set of metrics to current metaviromic analysis pipelines, where they can provide insight regarding phage mosaicism, habitat specificity, and evolution.
Background: The basidiomycete fungus Microbotryum violaceum is responsible for the anther-smut disease in many plants of the Caryophyllaceae family and is a model in genetics and evolutionary biology. Infection is initiated by dikaryotic hyphae produced after the conjugation of two haploid sporidia of opposite mating type. This study describes M. violaceum ESTs corresponding to nuclear genes expressed during conjugation and early hyphal production. Results: A normalized cDNA library generated 24,128 sequences, which were assembled into 7,765 unique genes; 25.2% of them displayed significant similarity to annotated proteins from other organisms, 74.3% a weak similarity to the same set of known proteins, and 0.5% were orphans. We identified putative pheromone receptors and genes that in other fungi are involved in the mating process. We also identified many sequences similar to genes known to be involved in pathogenicity in other fungi. The M. violaceum EST database, MICROBASE, is available on the Web and provides access to the sequences, assembled contigs, annotations and programs to compare similarities against MICROBASE. Conclusion: This study provides a basis for cloning the mating type locus, for further investigation of pathogenicity genes in the anther smut fungi, and for comparative genomics.
Iridoviruses are known as agents that cause serious systemic disease in freshwater and marine fishes. Mortality of up to 100% in orange-spotted grouper (Epinephelus coioides) due to iridovirus infection has been reported in Indonesia. The gene encoding the iridovirus capsid protein is thought to be conserved and has potential for the development of control methods. The objectives of this study were to clone the gene encoding the iridovirus capsid protein and to analyze its sequence. Spleen tissues of orange-spotted grouper were collected and their DNA was extracted. DNA fragments of the iridovirus capsid protein gene were amplified by PCR using designed primers with the extracted DNA as template. The amplified DNA fragments were cloned in pBSKSII and sequenced. The genes encoding the capsid protein of iridovirus from Jepara and Bali were successfully amplified and cloned. The Jepara clone (IJP03) contained the complete open reading frame (ORF) of the gene, composed of 1362 bp encoding 453 amino acids. The Jepara and Bali (IGD01) clones shared 99.8% similarity at the nucleotide level and 99.4% at the amino acid level. Based on these sequences, the Indonesian iridovirus belongs to the genus Megalocytivirus and shares 99.6-99.9% similarity at the nucleotide level with DGIV, ISKNV, MCIV, and ALIV.
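The ORF length and protein length reported for the Jepara clone are consistent with each other if the 1362 bp figure includes the terminal stop codon. That inclusion is an assumption made here, not something the abstract states. A small sketch of the arithmetic, with a hypothetical helper name:

```python
def protein_length(orf_bp: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by an ORF of the given length in bp.

    Assumes the ORF is a whole number of codons; if includes_stop is True,
    the final codon is treated as a stop codon and not counted.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# 1362 bp = 454 codons; minus one stop codon gives 453 residues.
print(protein_length(1362))
```

Under the same assumption, an ORF reported without its stop codon would need `includes_stop=False`.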
The objective of this research was to identify diversity in the exon 5 fragment of the UTMP gene in Bali cattle using direct sequencing. A total of 60 blood samples of Bali cattle from BPTU Bali on Bali island (20 head), BPTU Serading on Sumbawa island (20 head) and the Village Breeding Center (VBC) in Barru District, South Sulawesi (20 head) were used to evaluate genetic diversity at exon 5 of the UTMP gene. The forward and reverse sequence data were analyzed using the BioEdit program, and alignment analysis was carried out using the MEGA5 program. Haplotype analysis was performed with the DnaSP v5 program. The results showed that partial sequences of exon 5 of the UTMP gene had 16 haplotypes, with the highest number of haplotypes found in VBC Barru District, South Sulawesi (8 haplotypes). Moreover, the highest average haplotype (h) and nucleotide (π) diversity were found in VBC Barru District, South Sulawesi, at 0.7949 and 0.0016, respectively. In addition, a minisatellite insertion consisting of 5'-CCA GTC ATG AAG AAG GCA GAG GTC GTC GTG CCG GCG AAA-3' was found in the exon 5 UTMP gene fragment in Bali cattle. According to our results, haplotype and minisatellite variation in the exon 5 UTMP gene fragment can be used as a candidate genetic marker specific for reproductive traits in Bali cattle and for its future breeding strategy.
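For readers unfamiliar with the two diversity statistics reported above, the following sketch computes them from aligned sequences using the standard definitions attributed to Nei: haplotype diversity h = n/(n-1) × (1 − Σ pᵢ²) and nucleotide diversity as the mean number of pairwise differences per site. It is an illustration only and does not reproduce the DnaSP implementation (for example, it does not handle gaps or missing data); the toy sequences are invented.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(haplotypes):
    """Haplotype diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two sequences")
    freqs = [count / n for count in Counter(haplotypes).values()]
    return n / (n - 1) * (1 - sum(f * f for f in freqs))


def nucleotide_diversity(seqs):
    """Mean pairwise differences per site across equal-length aligned sequences."""
    n = len(seqs)
    if n < 2 or len({len(s) for s in seqs}) != 1:
        raise ValueError("need at least two sequences of equal length")
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    diffs = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return diffs / (len(pairs) * length)


# Invented four-sequence alignment, not data from the study.
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(round(haplotype_diversity(toy), 4))
print(round(nucleotide_diversity(toy), 4))
```

Each whole sequence is treated as one haplotype, so identical sequences count toward the same haplotype frequency.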
Hedgehog (Hh) genes play major roles in animal development, and studies of their evolution, expression and function point to major differences among chordates. Here we focused on Hh genes in lampreys in order to characterize the evolution of Hh signalling at the emergence of vertebrates. Screening of a cosmid library of the river lamprey Lampetra fluviatilis and searching the preliminary genome assembly of the sea lamprey Petromyzon marinus indicate that lampreys have two Hh genes, named Hha and Hhb. Phylogenetic analyses suggest that Hha and Hhb are lamprey-specific paralogs closely related to Sonic/Indian Hh genes. Expression analysis indicates that Hha and Hhb are expressed in a Sonic Hh-like pattern. The two transcripts are expressed in largely overlapping but not identical domains in the lamprey embryonic brain, including a newly described expression domain in the nasohypophyseal placode. Global alignments of genomic sequences and local alignment with known gnathostome regulatory motifs show that lamprey Hhs share conserved non-coding elements (CNEs) with gnathostome Hhs, albeit with sequences that have significantly diverged and dispersed. Functional assays using zebrafish embryos demonstrate gnathostome-like midline enhancer activity for CNEs contained in intron 2. We conclude that lamprey Hh genes are gnathostome Shh-like in terms of expression and regulation. In addition, they show some lamprey-specific features, including duplication and structural (but not functional) changes in the intronic/regulatory sequences.
Geremek, Maciej; Schoenmaker, Frederieke; Zietkiewicz, Ewa; Pogorzelski, Andrzej; Diehl, Scott; Wijmenga, Cisca; Witt, Michal
Primary ciliary dyskinesia (PCD) is a rare genetic disorder, which shows extensive genetic heterogeneity and is mostly inherited in an autosomal recessive fashion. There are four genes with a proven pathogenetic role in PCD. DNAH5 and DNAI1 are involved in 28 and 10% of PCD cases, respectively, while two other genes, DNAH11 and TXNDC3, have been identified as causal in one PCD family each. We have previously identified a 3.5 cM (2.82 Mb) region on chromosome 15q linked to Kartagener syndrome (KS), a subtype of PCD characterized by the randomization of body organ positioning. We have now refined the KS candidate region to a 1.8 Mb segment containing 18 known genes. The coding regions of these genes and three neighboring genes were subjected to sequence analysis in seven KS probands, and we were able to identify 60 single nucleotide sequence variants, 35 of which resided in mRNA coding sequences. However, none of the variations alone could explain the occurrence of the disease in these patients.
Soh, Y Q Shirleen; Alföldi, Jessica; Pyntikova, Tatyana; Brown, Laura G; Graves, Tina; Minx, Patrick J; Fulton, Robert S; Kremitzki, Colin; Koutseva, Natalia; Mueller, Jacob L; Rozen, Steve; Hughes, Jennifer F; Owens, Elaine; Womack, James E; Murphy, William J; Cao, Qing; de Jong, Pieter; Warren, Wesley C; Wilson, Richard K; Skaletsky, Helen; Page, David C
We sequenced the MSY (male-specific region of the Y chromosome) of the C57BL/6J strain of the laboratory mouse Mus musculus. In contrast to theories that Y chromosomes are heterochromatic and gene poor, the mouse MSY is 99.9% euchromatic and contains about 700 protein-coding genes. Only 2% of the MSY derives from the ancestral autosomes that gave rise to the mammalian sex chromosomes. Instead, all but 45 of the MSY's genes belong to three acquired, massively amplified gene families that have no homologs on primate MSYs but do have acquired, amplified homologs on the mouse X chromosome. The complete mouse MSY sequence brings to light dramatic forces in sex chromosome evolution: lineage-specific convergent acquisition and amplification of X-Y gene families, possibly fueled by antagonism between acquired X-Y homologs. The mouse MSY sequence presents opportunities for experimental studies of a sex-specific chromosome in its entirety, in a genetically tractable model organism.
Cui, Yanhua; Tu, Ran; Guan, Yue; Ma, Luyan; Chen, Sanfeng
The fhuE gene of Escherichia coli encodes the FhuE protein, which is a receptor protein in the coprogen-mediated siderophore iron-transport system. A fhuE gene homologue from Azospirillum brasilense, a nitrogen-fixing soil bacterium that lives in association with the roots of cereal grasses, was cloned, sequenced, and characterized. The A. brasilense fhuE encodes a protein of 802 amino acids with a predicted molecular weight of approximately 87 kDa. The deduced amino-acid sequence showed a high level of homology to the sequences of all the known fhuE gene products. The fhuE mutant was sensitive to iron starvation and defective in coprogen-mediated iron uptake. The mutant failed to express one membrane protein of approximately 78 kDa that was induced by iron starvation in the wild type. Complementation studies showed that the A. brasilense fhuE gene, when present on a low-copy number plasmid, could restore the functions of the mutant. Mutation in fhuE gene did not affect nitrogen fixation.
Pinheiro Daniel G
Background: High-throughput molecular approaches for gene expression profiling, such as Serial Analysis of Gene Expression (SAGE), Massively Parallel Signature Sequencing (MPSS) or Sequencing-by-Synthesis (SBS), represent powerful techniques that provide global transcription profiles of different cell types through sequencing of short fragments of transcripts, termed sequence tags. These techniques have improved our understanding of the relationships between these expression profiles and cellular phenotypes. Despite this, more reliable datasets are still necessary. In this work, we present a web-based tool named S3T: Score System for Sequence Tags, to index sequenced tags in accordance with their reliability. This is done through a series of evaluations based on a defined rule set. S3T allows the identification/selection of tags considered more reliable for further gene expression analysis. Results: This methodology was applied to a public SAGE dataset. In order to compare data before and after filtering, a hierarchical clustering analysis was performed on samples from the same type of tissue, in distinct biological conditions, using these two datasets. Our results provide evidence suggesting that it is possible to find more congruous clusters after using the S3T scoring system. Conclusion: These results substantiate the proposed application to generate more reliable data. This is a significant contribution to the determination of global gene expression profiles. The library analysis with S3T is freely available at http://gdm.fmrp.usp.br/s3t/. S3T source code and datasets can also be downloaded from the aforementioned website.
Foltz, David W
A novel repeat sequence with a conserved secondary structure is described from two nonadjacent introns of the ATP synthase beta-subunit gene in sea stars of the order Forcipulatida (Echinodermata: Asteroidea). The repeat is present in both introns of all forcipulate sea stars examined, which suggests that it is an ancient feature of this gene (with an approximate age of 200 Mya). Both stem and loop regions show high levels of sequence constraint when compared to flanking nonrepetitive intronic regions. The repeat was also detected in (1) the family Pterasteridae, order Velatida and (2) the family Korethrasteridae, order Velatida. The repeat was not detected in (1) the family Echinasteridae, order Spinulosida, (2) the family Astropectinidae, order Paxillosida, (3) the family Solasteridae, order Velatida, or (4) the family Goniasteridae, order Valvatida. The repeat lacks similarity to published sequences in unrestricted GenBank searches, and there are no significant open reading frames in the repeat or in the flanking intron sequences. Comparison via parametric bootstrapping to a published phylogeny based on 4.2 kb of nuclear and mitochondrial sequence for a subset of these species allowed the null hypothesis of a congruent phylogeny to be rejected for each repeat, when compared separately to the published phylogeny. In contrast, the flanking nonrepetitive sequences in each intron yielded separate phylogenies that were each congruent with the published phylogeny. In four species, the repeat in one or both introns has apparently experienced gene conversion. The two introns also show a correlated pattern of nucleotide substitutions, even after excluding the putative cases of gene conversion.
Philip, Philge; Pettersson, Fredrik; Stenberg, Per
In Drosophila melanogaster, the dosage-compensation system that equalizes X-linked gene expression between males and females, thereby assuring that an appropriate balance is maintained between the expression of genes on the X chromosome(s) and the autosomes, is at least partially mediated by the Male-Specific Lethal (MSL) complex. This complex binds to genes with a preference for exons on the male X chromosome with a 3' bias, and it targets most expressed genes on the X chromosome. However, a number of genes are expressed but not targeted by the complex. High affinity sites seem to be responsible for initial recruitment of the complex to the X chromosome, but the targeting to and within individual genes is poorly understood. We have extensively examined X chromosome sequence variation within five types of gene features (promoters, 5' UTRs, coding sequences, introns, 3' UTRs) and intergenic sequences, and assessed its potential involvement in dosage compensation. Presented results show that: the X chromosome has a distinct sequence composition within its gene features; some of the detected variation correlates with genes targeted by the MSL-complex; the insulator protein BEAF-32 preferentially binds upstream of MSL-bound genes; BEAF-32 and MOF co-localize in promoters; and that bound genes have a distinct sequence composition that shows a 3' bias within coding sequence. Although many strongly bound genes are close to a high affinity site (HAS), neither our promoter motif nor our coding sequence signatures show any correlation to HAS. Based on the results presented here, we believe that there are sequences in the promoters and coding sequences of targeted genes that have the potential to direct the secondary spreading of the MSL-complex to nearby genes.
Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M
X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males were previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein functions as indicated by electrophysiological studies and altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and intellectual disability in particular. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.
Tannous, Joanna; El Khoury, Rhoda; Snini, Selma P; Lippi, Yannick; El Khoury, André; Atoui, Ali; Lteif, Roger; Oswald, Isabelle P; Puel, Olivier
Patulin is a polyketide-derived mycotoxin produced by numerous filamentous fungi. Among them, Penicillium expansum is by far the most problematic species. This fungus is a destructive phytopathogen capable of growing on fruit, provoking the blue mold decay of apples and producing significant amounts of patulin. The biosynthetic pathway of this mycotoxin is chemically well-characterized, but its genetic bases remain largely unknown with only few characterized genes in less economic relevant species. The present study consisted of the identification and positional organization of the patulin gene cluster in P. expansum strain NRRL 35695. Several amplification reactions were performed with degenerative primers that were designed based on sequences from the orthologous genes available in other species. An improved genome Walking approach was used in order to sequence the remaining adjacent genes of the cluster. RACE-PCR was also carried out from mRNAs to determine the start and stop codons of the coding sequences. The patulin gene cluster in P. expansum consists of 15 genes in the following order: patH, patG, patF, patE, patD, patC, patB, patA, patM, patN, patO, patL, patI, patJ, and patK. These genes share 60-70% of identity with orthologous genes grouped differently, within a putative patulin cluster described in a non-producing strain of Aspergillus clavatus. The kinetics of patulin cluster genes expression was studied under patulin-permissive conditions (natural apple-based medium) and patulin-restrictive conditions (Eagle's minimal essential medium), and demonstrated a significant association between gene expression and patulin production. In conclusion, the sequence of the patulin cluster in P. expansum constitutes a key step for a better understanding of the mechanisms leading to patulin production in this fungus. It will allow the role of each gene to be elucidated, and help to define strategies to reduce patulin production in apple-based products.
Dollive, Serena; Peterfreund, Gregory L; Sherrill-Mix, Scott; Bittinger, Kyle; Sinha, Rohini; Hoffmann, Christian; Nabel, Christopher S; Hill, David A; Artis, David; Bachman, Michael A; Custers-Allen, Rebecca; Grunberg, Stephanie; Wu, Gary D; Lewis, James D; Bushman, Frederic D
Eukaryotic microorganisms are important but understudied components of the human microbiome. Here we present a pipeline for analysis of deep sequencing data on single cell eukaryotes. We designed a new 18S rRNA gene-specific PCR primer set and compared a published rRNA gene internal transcribed spacer (ITS) gene primer set. Amplicons were tested against 24 specimens from defined eukaryotes and eight well-characterized human stool samples. A software pipeline https://sourceforge.net/projects/brocc/ was developed for taxonomic attribution, validated against simulated data, and tested on pyrosequence data. This study provides a well-characterized tool kit for sequence-based enumeration of eukaryotic organisms in human microbiome samples.
Ovunc, Bugsu; Otto, Edgar A.; Vega-Warner, Virginia; Saisawat, Pawaree; Ashraf, Shazia; Ramaswami, Gokul; Fathy, Hanan M.; Schoeb, Dominik; Chernin, Gil; Lyons, Robert H.; Yilmaz, Engin
In two siblings of consanguineous parents with intermittent nephrotic-range proteinuria, we identified a homozygous deleterious frameshift mutation in the gene CUBN, which encodes cubilin, using exome capture and massively parallel re-sequencing. The mutation segregated with affected members of this family and was absent from 92 healthy individuals, thereby identifying a recessive mutation in CUBN as the single-gene cause of proteinuria in this sibship. Cubilin mutations cause a hereditary form of megaloblastic anemia secondary to vitamin B12 deficiency, and proteinuria occurs in 50% of cases since cubilin is a coreceptor for both the intestinal vitamin B12-intrinsic factor complex and the tubular reabsorption of protein in the proximal tubule. In summary, we report successful use of exome capture and massively parallel re-sequencing to identify a rare, single-gene cause of nephropathy. PMID:21903995
Ovunc, Bugsu; Otto, Edgar A; Vega-Warner, Virginia; Saisawat, Pawaree; Ashraf, Shazia; Ramaswami, Gokul; Fathy, Hanan M; Schoeb, Dominik; Chernin, Gil; Lyons, Robert H; Yilmaz, Engin; Hildebrandt, Friedhelm
In two siblings of consanguineous parents with intermittent nephrotic-range proteinuria, we identified a homozygous deleterious frameshift mutation in the gene CUBN, which encodes cubilin, using exome capture and massively parallel re-sequencing. The mutation segregated with affected members of this family and was absent from 92 healthy individuals, thereby identifying a recessive mutation in CUBN as the single-gene cause of proteinuria in this sibship. Cubilin mutations cause a hereditary form of megaloblastic anemia secondary to vitamin B12 deficiency, and proteinuria occurs in 50% of cases since cubilin is a coreceptor for both the intestinal vitamin B12-intrinsic factor complex and the tubular reabsorption of protein in the proximal tubule. In summary, we report successful use of exome capture and massively parallel re-sequencing to identify a rare, single-gene cause of nephropathy.
Background: Cowpea, Vigna unguiculata (L.) Walp., is one of the most important food and forage legumes in the semi-arid tropics because of its drought tolerance and ability to grow on poor quality soils. Approximately 80% of cowpea production takes place in the dry savannahs of tropical West and Central Africa, mostly by poor subsistence farmers. Despite its economic and social importance in the developing world, cowpea remains to a large extent an underexploited crop. Among the major goals of cowpea breeding and improvement programs is the stacking of desirable agronomic traits, such as disease and pest resistance and response to abiotic stresses. Implementation of marker-assisted selection and breeding programs is severely limited by a paucity of trait-linked markers and a general lack of information on gene structure and organization. With a nuclear genome size estimated at ~620 Mb, the cowpea genome is an ideal target for reduced representation sequencing. Results: We report here the sequencing and analysis of the gene-rich, hypomethylated portion of the cowpea genome selectively cloned by methylation filtration (MF) technology. Over 250,000 gene-space sequence reads (GSRs) with an average length of 610 bp were generated, yielding ~160 Mb of sequence information. The GSRs were assembled, annotated by BLAST homology searches of four public protein annotation databases and four plant proteomes (A. thaliana, M. truncatula, O. sativa, and P. trichocarpa), and analyzed using various domain and gene modeling tools. A total of 41,260 GSR assemblies and singletons were annotated, of which 19,786 have unique GenBank accession numbers. Within the GSR dataset, 29% of the sequences were annotated using the Arabidopsis Gene Ontology (GO), with the largest categories of assigned function being catalytic activity and metabolic processes, groups that include the majority of cellular enzymes and components of amino acid, carbohydrate and lipid metabolism. A
Maria H Chahrour
Although autism has a clear genetic component, the high genetic heterogeneity of the disorder has been a challenge for the identification of causative genes. We used homozygosity analysis to identify probands from nonconsanguineous families that showed evidence of distant shared ancestry, suggesting potentially recessive mutations. Whole-exome sequencing of 16 probands revealed validated homozygous, potentially pathogenic recessive mutations that segregated perfectly with disease in 4/16 families. The candidate genes (UBE3B, CLTCL1, NCKAP5L, ZNF18) encode proteins involved in proteolysis, GTPase-mediated signaling, cytoskeletal organization, and other pathways. Furthermore, neuronal depolarization regulated the transcription of these genes, suggesting potential activity-dependent roles in neurons. We present a multidimensional strategy for filtering whole-exome sequence data to find candidate recessive mutations in autism, which may have broader applicability to other complex, heterogeneous disorders.
The nucleotide sequence of the coding and regulatory regions of the alpha-amylase gene (aml) of Streptomyces limosus was determined. High-resolution S1 mapping was used to locate the 5' end of the transcript and demonstrated that the gene is transcribed from a unique promoter. The predicted amino acid sequence has considerable identity to mammalian and invertebrate alpha-amylases, but not to those of plant, fungal, or eubacterial origin. Consistent with this is the susceptibility of the enzym...
Comparative genomic sequencing is shedding new light on bacterial identification, taxonomy and phylogeny. An in silico assessment of a core gene set necessary for cellular functioning was made to determine a consensus set of genes that would be useful for the identification, taxonomy and phylogeny of the species belonging to the subclass Actinobacteridae, which contains two orders, Actinomycetales and Bifidobacteriales. The subclass Actinobacteridae comprised about 85% of the actinobacteria families. The following recommended criteria were used to establish a comprehensive gene set: the gene should (i) be long enough to contain phylogenetically useful information, (ii) not be subject to horizontal gene transfer, (iii) be a single copy, (iv) have at least two regions sufficiently conserved to allow the design of amplification and sequencing primers and (v) predict whole-genome relationships. We applied these constraints to 50 different Actinobacteridae genomes and made 1,224 pairwise comparisons of the genome conserved regions and gene fragments obtained by using the Sequence VARiability Analysis Program (SVARAP), which allows the primers to be designed. Following a comparative statistical modeling phase, 3 gene fragments were selected, ychF, rpoB, and secY, with R² > 0.85. Selected sets of broad range primers were tested from the 3 gene fragments and were demonstrated to be useful for amplification and sequencing of 25 species belonging to 9 genera of Actinobacteridae. The intraspecies similarities were 96.3-100% for ychF, 97.8-100% for rpoB and 96.9-100% for secY among 73 strains belonging to 15 species of the subclass Actinobacteridae, compared to 99.4-100% for 16S rRNA. The phylogenetic topology obtained from the combined datasets ychF+rpoB+secY was globally similar to that inferred from the 16S rRNA but with higher confidence. It was concluded that multi-locus sequence analysis using core gene set might represent the first consensus and valid approach for
Leonhardt, E.A.; Kapp, L.N.; Young, B.R.; Murnane, J.P. (Univ. of California, San Francisco, CA (United States))
A radioresistant cell clone (1B3) was previously isolated after transfection of an ataxia-telangiectasia (AT) group D cell line with a human cosmid library. A cosmid rescued from the integration site in 1B3 contained human DNA from chromosome position 11q23, the same region shown by both genetic linkage and chromosome transfer to contain the genes for AT complementation groups A/B, C, and D. A gene within the cosmid (ATDC) was found to produce mRNAs of different sizes. A cDNA for one of the most abundant mRNAs (3.0 kb) was isolated from a HeLa cell library. In the present study, the authors sequenced the 3.0-kb cDNA and the surrounding intron DNA in the cosmids. They used polymerase chain reaction, with primers in the introns, to confirm the number of exons and to analyze DNA from AT group D cells for mutations within this gene. Although no mutations were found, they do not rule out the possibility that mutations may be present within the regulatory sequences or coding sequences found in other mRNAs specific for this gene. From the sequence analysis, they found that the ATDC gene product is one of a group of proteins that share multiple zinc finger motifs and an adjacent leucine zipper motif. These proteins have been proposed to form homo- or hetero-dimers involved in nucleic acid binding, consistent with the fact that many of these proteins appear to be transcriptional regulatory factors involved in carcinogenesis and/or differentiation. The likelihood that the ATDC gene product is involved in transcriptional regulation could explain the pleiomorphic characteristics of AT, including abnormal cell cycle regulation. 36 refs., 5 figs., 2 tabs.
Zhanji Liu; Suping Feng; Manish K. Pandey; Xiaoping Chen; Albert K. Culbreath; Rajeev K. Varshney; Baozhu Guo
Low genetic diversity makes peanut (Arachis hypogaea L.) very vulnerable to plant pathogens, causing severe yield loss and reduced seed quality. Several hundred partial genomic DNA sequences as nucleotide-binding-site leucine-rich repeat (NBS-LRR) resistance genes (R) have been identified, but only a small portion with expressed transcripts has been found. We aimed to identify resistance gene analogs (RGAs) from peanut expressed sequence tags (ESTs) and to develop polymorphic markers. The protein sequences of 54 known R genes were used to identify homologs from peanut ESTs from public databases. A total of 1,053 ESTs corresponding to six different classes of known R genes were recovered, and assembled into 156 contigs and 229 singletons as peanut-expressed RGAs. There were 69 that encoded NBS-LRR proteins, 191 that encoded protein kinases, 82 that encoded LRR-PK/transmembrane proteins, 28 that encoded toxin reductases, 11 that encoded LRR-domain containing proteins and four that encoded TM-domain containing proteins. Twenty-eight simple sequence repeats (SSRs) were identified from 25 peanut expressed RGAs. One SSR polymorphic marker (RGA121) was identified. Two polymerase chain reaction-based markers (Ahsw-1 and Ahsw-2) developed from RGA013 were homologous to the Tomato Spotted Wilt Virus (TSWV) resistance gene. All three markers were mapped on the same linkage group AhlV. These expressed RGAs are the source for RGA-tagged marker development and identification of peanut resistance genes.
White mold, caused by the necrotrophic fungus Sclerotinia sclerotiorum (Lib.) de Bary, is a major disease of common bean (Phaseolus vulgaris L.). WM7.1 and WM8.3 are two quantitative trait loci (QTL) with major effects on tolerance to the pathogen. Advanced backcross populations segregating individually for either of the two QTL, and a recombinant inbred (RI) population segregating for both QTL, were used to fine map and confirm the genetic location of the QTL. The QTL intervals were physically mapped using the reference common bean genome sequence, and the physical intervals for each QTL were further confirmed by sequence-based introgression mapping. Using whole-genome sequence data from susceptible and tolerant DNA pools, introgressed regions were identified as those with significantly higher numbers of single-nucleotide polymorphisms (SNPs) relative to the whole genome. By combining the QTL and SNP data, WM7.1 was located to a 660-kb region that contained 41 gene models on the proximal end of chromosome Pv07, while the WM8.3 introgression was narrowed to a 1.36-Mb region containing 70 gene models. The most polymorphic candidate gene in the WM7.1 region encodes a BEACH-domain protein associated with apoptosis. Within the WM8.3 interval, a receptor-like protein with the potential to recognize pathogen effectors was the most polymorphic gene. The use of gene and sequence-based mapping identified two candidate genes whose putative functions are consistent with the current model of pathogenicity.
Takihara, Hayato; Ogihara, Jun; Yoshida, Takao; Okuda, Shujiro; Nakajima, Mutsuyasu; Iwabuchi, Noriyuki; Sunairi, Michio
We previously reported that R. erythropolis PR4 translocated from the aqueous to the alkane phase, and then grew in two phase cultures to which long-chain alkanes had been added. This was considered to be beneficial for bioremediation. In the present study, we investigated the proteins involved in the translocation of R. erythropolis PR4. The results of our proteogenomic analysis suggested that GroEL2 was upregulated more in cells that translocated inside of the pristane (C19) phase than in those located at the aqueous-alkane interface attached to the n-dodecane (C12) surface. PR4 (pK4-EL2-1) and PR4 (pK4-ΔEL2-1) strains were constructed to confirm the effects of the upregulation of GroEL2 in translocated cells. The expression of GroEL2 in PR4 (pK4-EL2-1) was 15.5-fold higher than that in PR4 (pK4-ΔEL2-1) in two phase cultures containing C12. The growth and cell surface lipophilicity of PR4 were enhanced by the introduction of pK4-EL2-1. These results suggested that the plasmid overexpression of groEL2 in PR4 (pK4-EL2-1) led to changes in cell localization, enhanced growth, and increased cell surface lipophilicity. Thus, we concluded that the overexpression of GroEL2 may play an important role in increasing the organic solvent tolerance of R. erythropolis PR4 in aqueous-alkane two phase cultures.
Fukura, K; Yamamoto, A; Hashimoto, T; Goto, N
The small subunit ribosomal RNA (SrRNA) gene of Trichomonas tenax ATCC30207 was amplified by PCR and the 1.55-kb product was cloned into plasmid vector pUC18. Four clones were isolated and sequenced. The insert DNAs were 1,552 bp long and their G+C contents were 48.1%; three of them had exactly the same DNA sequences and one had only one nucleotide change. A representative SrRNA sequence was analyzed and a phylogenetic tree was estimated by the neighbor-joining (NJ) method. Among the protists examined, T. tenax was placed as the closest relative of Tritrichomonas foetus, as expected from the traditional taxonomy. The total homology between the two SrRNA sequences was 89.2%.
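The two summary statistics quoted in this abstract, G+C content and percent homology between two sequences, are straightforward to compute once the sequences are in hand. The following is a minimal Python sketch and not the authors' procedure: the function names are hypothetical, the input is assumed to be plain nucleotide strings, and percent identity is computed over a pair of sequences that have already been aligned to equal length (with '-' marking gaps).

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(1 for base in seq if base in "GC")
    return 100.0 * gc / len(seq)


def percent_identity(aln_a: str, aln_b: str) -> float:
    """Percent identity of two pre-aligned sequences ('-' marks a gap).

    Columns where both sequences have a gap are ignored; a column with a
    gap in only one sequence counts as a mismatch.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b)
        for a, b in zip(aln_a.upper(), aln_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("no comparable columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

How gaps are counted is a design choice; different tools report different identity values for the same alignment, so a figure such as 89.2% is only comparable when the same convention is used.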
Turakainen, Hilkka; Korhola, Matti
We have cloned by complementation in Saccharomyces cerevisiae and sequenced a LEU2 gene from the sour dough yeast Candida milleri CBS 8195 and studied its chromosomal location. The LEU2 coding sequence was 1092 nt long encoding a putative beta-isopropylmalate dehydrogenase protein of 363 amino acids. The nucleotide sequence in the coding region had 71.6% identity to S. cerevisiae LEU2 sequence. On the protein level, the identity of C. milleri Leu2p to S. cerevisiae Leu2p was 84.1%. The CmLEU2 DNA probe hybridized to one to three chromosomal bands and two or three BamHI restriction fragments in C. milleri but did not give any signal to chromosomes or restriction fragments of C. albicans, S. cerevisiae, S. exiguus or Torulaspora delbrueckii. Using CmLEU2 probe for DNA hybridization makes it easy to quickly identify C. milleri among other sour dough yeasts.
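The reported lengths are mutually consistent if the 1092-nt coding sequence is taken to include the stop codon: 1092 / 3 = 364 codons, one of which terminates translation, leaving 363 amino acids. A small Python check of that arithmetic follows; the function name and the `includes_stop` flag are illustrative assumptions, not part of the original work.

```python
def protein_length_from_cds(cds_nt: int, includes_stop: bool = True) -> int:
    """Number of amino acids encoded by a coding sequence of cds_nt nucleotides."""
    if cds_nt <= 0 or cds_nt % 3 != 0:
        raise ValueError("CDS length must be a positive multiple of 3")
    codons = cds_nt // 3
    # The stop codon occupies one codon but adds no residue.
    return codons - 1 if includes_stop else codons


# LEU2 coding sequence length reported in the abstract above
leu2_residues = protein_length_from_cds(1092)
```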
Willis, D; Foglesong, D; Granoff, A
We have used "gene walking" with synthetic oligonucleotides and M13 dideoxynucleotide sequencing techniques to obtain the complete coding and flanking sequences of the gene encoding a major immediate-early RNA (molecular weight, 169,000) of frog virus 3. R-loop mapping of the cloned XbaI K fragment of frog virus 3 DNA with immediate-early RNA from infected cells showed that an RNA of approximately 500 to 600 nucleotides (the right size to code for the immediate-early viral 18-kilodalton protein of unknown function) hybridized to a region within 100 base pairs of one end of the XbaI K fragment; no evidence for splicing was observed in the electron microscope or by single-strand nuclease analysis. Further restriction mapping narrowed the location of the gene to the XbaI end of a 2-kilobase-pair XbaI-BglII fragment, which was bidirectionally subcloned into the bacteriophage pair mp10 and mp11 for sequencing. Mung bean nuclease mapping was used to identify both the 5' and the 3' ends of the mRNA. The 5' end mapped within an AT-rich region 19 base pairs upstream from two in-phase AUG start codons that were immediately followed by an open reading frame of 157 amino acids. Another AT-rich sequence was found at -29 base pairs from the 5' end of the mRNA start site; this sequence may function as a TATA box. The 3' end of the message displayed considerable microheterogeneity, but clearly terminated within a third AT-rich region 50 to 60 base pairs from the translation stop codon. The eucaryotic polyadenylic acid addition signal (AATAAA) was not present, a finding to be expected since frog virus 3 mRNA is not polyadenylated. Both the single-stranded mp10 clone of the XbaI-BglII fragment and a 15-base oligonucleotide complementary to the region flanking the two AUG translation start codons inhibited translation of the immediate-early 18-kilodalton protein in vitro, confirming the identity of the sequenced gene. As the regulatory sequences of this gene did not resemble those of
M. Ananda Chitra; Jayanthy, C.; Nagarajan, B.
Background: Staphylococcus pseudintermedius (SP) is the major pathogenic species of dogs involved in a wide variety of skin and soft tissue infections. The accessory gene regulator (agr) locus of Staphylococcus aureus has been extensively studied, and it influences the expression of many virulence genes. It encodes a two-component signal transduction system that leads to down-regulation of surface proteins and up-regulation of secreted proteins during in vitro growth of S. aureus. The objecti...
Morrill, Benson H; Rickords, Lee F; Schafstall, Heather J
Sequence length polymorphisms between the amelogenin (AMELX) and the amelogenin-like (AMELY) genes both within and between several mammalian species have been identified and utilized for sex determination, species identification, and to elucidate evolutionary relationships. Sex determination via polymerase chain reaction (PCR) assays of the AMELX and AMELY genes has been successful in greater apes, prosimians, and two species of old world monkeys. To date, no sex determination PCR assay using AMELX and AMELY has been developed for new world monkeys. In this study, we present partial AMELX and AMELY sequences for five old world monkey species (Mandrillus sphinx, Macaca nemestrina, Macaca fuscata, Macaca mulatta, and Macaca fascicularis) along with primer sets that can be used for sex determination of these five species. In addition, we compare the sequences we generated with other primate AMELX and AMELY sequences available on GenBank and discuss sequence length polymorphisms and their usefulness in sex determination within primates. The mandrill and four species of macaque all share two similar deletion regions with each other, the human, and the chimpanzee in the region sequenced. These two deletion regions are 176-181 and 8 nucleotides in length. In analyzing existing primate sequences on GenBank, we also discovered that a separate six-nucleotide polymorphism located approximately 300 nucleotides upstream of the 177 nucleotide polymorphism in sequences of humans and chimps was also present in two species of new world monkeys (Saimiri boliviensis and Saimiri sciureus). We designed primers that incorporate this polymorphism, creating the first AMELX and AMELY PCR primer set that has been used successfully to generate two bands in a new world monkey species.
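The logic behind an amelogenin-based sex assay can be sketched briefly: because AMELX and AMELY amplicons differ in length, a male (XY) sample is expected to give two bands and a female (XX) sample one. The Python sketch below only illustrates that interpretation step. The function names, the 5-nt tolerance, and the fragment lengths in the usage lines are hypothetical and are not taken from the study; real gels or capillary traces would need assay-specific thresholds.

```python
def distinct_bands(fragment_lengths, tolerance=5):
    """Collapse fragment lengths that lie within `tolerance` nt into single bands."""
    bands = []
    for length in sorted(fragment_lengths):
        # Start a new band only if this fragment is clearly longer than the last band.
        if not bands or length - bands[-1] > tolerance:
            bands.append(length)
    return bands


def call_sex(fragment_lengths, tolerance=5):
    """Interpret AMELX/AMELY amplicon bands: two bands = male, one band = female."""
    n_bands = len(distinct_bands(fragment_lengths, tolerance))
    if n_bands == 2:
        return "male"
    if n_bands == 1:
        return "female"
    return "undetermined"


# Hypothetical fragment lengths, for illustration only
example_male = call_sex([480, 300])
example_female = call_sex([480, 482])
```

Returning "undetermined" for zero or more than two bands keeps failed or contaminated reactions from being silently scored.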
Rodriguez Jose M
Background: The potential adaptive significance of transposable elements (TEs) to the host genomes in which they reside is a topic that has been hotly debated by molecular evolutionists for more than two decades. Recent genomic analyses have demonstrated that TE fragments are associated with functional genes in plants and animals. These findings suggest that TEs may contribute significantly to gene evolution. Results: We have analyzed two transposable elements associated with genes in the sequenced Drosophila melanogaster y; cn bw sp strain. A fragment of the Antonia long terminal repeat (LTR) retrotransposon is present in the intron of Chitinase 3 (Cht3), a gene located within the constitutive heterochromatin of chromosome 2L. Within the euchromatin of chromosome 2R a full-length Burdock LTR retrotransposon is located immediately 3' to cathD, a gene encoding cathepsin D. We tested for the presence of these two TE/gene associations in strains representing 12 geographically diverse populations of D. melanogaster. While the cathD insertion variant was detected only in the sequenced y; cn bw sp strain, the insertion variant present in the heterochromatic Cht3 gene was found to be fixed throughout twelve D. melanogaster populations and in a D. mauritiana strain, suggesting that it may be of adaptive significance. To further test this hypothesis, we sequenced a 685 bp region spanning the LTR fragment in the intron of Cht3 in strains representative of the two sibling species D. melanogaster and D. mauritiana (~2.7 million years divergent). The level of sequence divergence between the two species within this region was significantly lower than expected from the neutral substitution rate and lower than the divergence observed between a randomly selected intron of the Drosophila Alcohol dehydrogenase gene (Adh). Conclusions: Our results suggest that a 359 bp fragment of an Antonia retrotransposon (complete LTR is 659 bp) located within the intron of the
Background: High-throughput re-sequencing is rapidly becoming the method of choice for studies of neutral and adaptive processes in natural populations across taxa. As re-sequencing the genome of large numbers of samples is still cost-prohibitive in many cases, methods for genome complexity reduction have been developed in attempts to capture most ecologically-relevant genetic variation. One of these approaches is sequence capture, in which oligonucleotide baits specific to genomic regions of interest are synthesized and used to retrieve and sequence those regions. Results: We used sequence capture to re-sequence most predicted exons, their upstream regulatory regions, as well as numerous random genomic intervals in a panel of 48 genotypes of the angiosperm tree Populus trichocarpa (black cottonwood, or ‘poplar’). A total of 20.76 Mb (5% of the poplar genome) was targeted, corresponding to 173,040 baits. With 12 indexed samples run in each of four lanes on an Illumina HiSeq instrument (2x100 paired-end), 86.8% of the bait regions were on average sequenced at a depth ≥10X. Few off-target regions (>250 bp away from any bait) were present in the data, but on average ~80 bp on either side of the baits were captured and sequenced to an acceptable depth (≥10X) to call heterozygous SNPs. Nucleotide diversity estimates within and adjacent to protein-coding genes were similar to those previously reported in Populus spp., while intergenic regions had higher values consistent with a relaxation of selection. Conclusions: Our results illustrate the efficiency and utility of sequence capture for re-sequencing highly heterozygous tree genomes, and suggest design considerations to optimize the use of baits in future studies.
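The design figures in this abstract can be cross-checked with simple arithmetic. The Python sketch below uses only numbers stated above; the derived quantities (mean target per bait, implied genome size) are back-of-the-envelope estimates that assume non-overlapping baits tiling the target exactly and that "5%" is not heavily rounded. The abstract states neither a bait length nor a genome size, so these values are inferences, not reported results.

```python
# Figures stated in the abstract
target_bp = 20.76e6          # total targeted sequence, in bp
n_baits = 173_040            # number of capture baits
target_fraction = 0.05       # share of the genome targeted
samples_per_lane = 12
lanes = 4

# Derived quantities (assumption: baits do not overlap and cover the target exactly)
mean_bp_per_bait = target_bp / n_baits                   # roughly 120 bp per bait
implied_genome_mb = target_bp / target_fraction / 1e6    # roughly 415 Mb
total_samples = samples_per_lane * lanes                 # matches the 48-genotype panel

print(f"mean target per bait: {mean_bp_per_bait:.1f} bp")
print(f"implied genome size: {implied_genome_mb:.0f} Mb")
print(f"samples sequenced: {total_samples}")
```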
Liu, Zhanji; Crampton, Mollee; Todd, Antonette; Kalavacharla, Venu
Common bean (Phaseolus vulgaris L.) is one of the most important legumes in the world. Several diseases severely reduce bean production and quality; therefore, it is very important to better understand disease resistance in common bean in order to prevent these losses. More than 70 resistance (R) genes which confer resistance against various pathogens have been cloned from diverse plant species. Most R genes share highly conserved domains which facilitates the identification of new candidate R genes from the same species or other species. The goals of this study were to isolate expressed R gene-like sequences (RGLs) from 454-derived transcriptomic sequences and expressed sequence tags (ESTs) of common bean, and to develop RGL-tagged molecular markers. A data-mining approach was used to identify tentative P. vulgaris R gene-like sequences from approximately 1.69 million 454-derived sequences and 116,716 ESTs deposited in GenBank. A total of 365 non-redundant sequences were identified and named as common bean (P. vulgaris = Pv) resistance gene-like sequences (PvRGLs). Among the identified PvRGLs, about 60% (218 PvRGLs) were from 454-derived sequences. Reverse transcriptase-polymerase chain reaction (RT-PCR) analysis confirmed that PvRGLs were actually expressed in the leaves of common bean. Upon comparison to P. vulgaris genomic sequences, 105 (28.77%) of the 365 tentative PvRGLs could be integrated into the existing common bean physical map. Based on the syntenic blocks between common bean and soybean, 237 (64.93%) PvRGLs were anchored on the P. vulgaris genetic map and will need to be mapped to determine order. In addition, 11 sequence-tagged-site (STS) and 19 cleaved amplified polymorphic sequence (CAPS) molecular markers were developed for 25 unique PvRGLs. In total, 365 PvRGLs were successfully identified from 454-derived transcriptomic sequences and ESTs available in GenBank and about 65% of PvRGLs were integrated into the common bean genetic map. A total of 30
Zhang, Chengjun; Wang, Jun; Marowsky, Nicholas C; Long, Manyuan; Wing, Rod A; Fan, Chuanzhu
In an effort to identify newly evolved genes in rice, we searched the genomes of Asian-cultivated rice Oryza sativa ssp. japonica and its wild progenitors, looking for lineage-specific genes. Using genome pairwise comparison of approximately 20-Mb DNA sequences from the chromosome 3 short arm (Chr3s) in six rice species, O. sativa, O. nivara, O. rufipogon, O. glaberrima, O. barthii, and O. punctata, combined with synonymous substitution rate tests and other evidence, we were able to identify potential recently duplicated genes, which evolved within the last 1 Myr. We identified 28 functional O. sativa genes, which likely originated after O. sativa diverged from O. glaberrima. These genes account for around 1% (28/3,176) of all annotated genes on O. sativa's Chr3s. Among the 28 new genes, two recently duplicated segments contained eight genes. Fourteen of the 28 new genes consist of chimeric gene structure derived from one or multiple parental genes and flanking targeting sequences. Although the majority of these 28 new genes were formed by single or segmental DNA-based gene duplication and recombination, we found two genes that were likely originated partially through exon shuffling. Sequence divergence tests between new genes and their putative progenitors indicated that new genes were most likely evolving under natural selection. We showed all 28 new genes appeared to be functional, as suggested by Ka/Ks analysis and the presence of RNA-seq, cDNA, expressed sequence tag, massively parallel signature sequencing, and/or small RNA data. The high rate of new gene origination and of chimeric gene formation in rice may demonstrate rice's broad diversification, domestication, its environmental adaptation, and the role of new genes in rice speciation.
Schorn, Michelle A; Alanjary, Mohammad M; Aguinaldo, Kristen; Korobeynikov, Anton; Podell, Sheila; Patin, Nastassia; Lincecum, Tommie; Jensen, Paul R; Ziemert, Nadine; Moore, Bradley S
Traditional natural product discovery methods have nearly exhausted the accessible diversity of microbial chemicals, making new sources and techniques paramount in the search for new molecules. Marine actinomycete bacteria have recently come into the spotlight as fruitful producers of structurally diverse secondary metabolites, and remain relatively untapped. In this study, we sequenced 21 marine-derived actinomycete strains, rarely studied for their secondary metabolite potential and under-represented in current genomic databases. We found that genome size and phylogeny were good predictors of biosynthetic gene cluster diversity, with larger genomes rivalling the well-known marine producers in the Streptomyces and Salinispora genera. Genomes in the Micrococcineae suborder, however, had consistently the lowest number of biosynthetic gene clusters. By networking individual gene clusters into gene cluster families, we were able to computationally estimate the degree of novelty each genus contributed to the current sequence databases. Based on the similarity measures between all actinobacteria in the Joint Genome Institute's Atlas of Biosynthetic gene Clusters database, rare marine genera show a high degree of novelty and diversity, with Corynebacterium, Gordonia, Nocardiopsis, Saccharomonospora and Pseudonocardia genera representing the highest gene cluster diversity. This research validates that rare marine actinomycetes are important candidates for exploration, as they are relatively unstudied, and their relatives are historically rich in secondary metabolites.
Chang, Perng-Kuang; Horn, Bruce W; Dorner, Joe W
Aspergillus flavus populations are genetically diverse. Isolates that produce either, neither, or both aflatoxins and cyclopiazonic acid (CPA) are present in the field. We investigated defects in the aflatoxin gene cluster in 38 nonaflatoxigenic A. flavus isolates collected from southern United States. PCR assays using aflatoxin-gene-specific primers grouped these isolates into eight (A-H) deletion patterns. Patterns C, E, G, and H, which contain 40 kb deletions, were examined for their sequence breakpoints. Pattern C has one breakpoint in the cypA 3' untranslated region (UTR) and another in the verA coding region. Pattern E has a breakpoint in the amdA coding region and another in the ver1 5'UTR. Pattern G contains a deletion identical to the one found in pattern C and has another deletion that extends from the cypA coding region to one end of the chromosome as suggested by the presence of telomeric sequence repeats, CCCTAATGTTGA. Pattern H has a deletion of the entire aflatoxin gene cluster from the hexA coding region in the sugar utilization gene cluster to the telomeric region. Thus, deletions in the aflatoxin gene cluster among A. flavus isolates are not rare, and the patterns appear to be diverse. Genetic drift may be a driving force that is responsible for the loss of the entire aflatoxin gene cluster in nonaflatoxigenic A. flavus isolates when aflatoxins have lost their adaptive value in nature.
Peroxisome proliferator-activated receptor gamma (PPARγ) is one of the most extensively studied ligand-inducible transcription factors (TFs), able to modulate its transcriptional activity through conformational changes. It is of particular interest because of its pleiotropic functions: it plays a crucial role in the expression of key genes involved in adipogenesis, lipid and glucid metabolism, atherosclerosis, inflammation, and cancer. Its protein isoforms and the large number of PPARγ target genes, ligands, and coregulators all contribute to the complexity of its function. In addition, genetic variants are likely to affect expression levels of target genes, although the impact of PPARG gene variations on the expression of target genes is not fully understood. The introduction of massively parallel sequencing platforms, in the Next Generation Sequencing (NGS) era, has revolutionized the investigation of the genetic causes of inherited diseases. In this context, DNA-Seq for identifying novel nucleotide variations and haplotypes associated with human diseases within both coding and regulatory regions of the PPARG gene, ChIP-Seq for defining a PPARγ binding map, and RNA-Seq for unraveling the wide and intricate gene pathways regulated by PPARG represent major steps toward the understanding of PPARγ in health and disease.
Kitamoto, N; Yamagata, H; Kato, T; Tsukagoshi, N; Udaka, S
A gene coding for thermophilic beta-amylase of Clostridium thermosulfurogenes was cloned into Bacillus subtilis, and its nucleotide sequence was determined. The nucleotide sequence suggested that the thermophilic beta-amylase is translated from monocistronic mRNA as a secretory precursor with a signal peptide of 32 amino acid residues. The deduced amino acid sequence of the mature beta-amylase contained 519 residues with a molecular weight of 57,167. The amino acid sequence of the C. thermosulfurogenes beta-amylase showed 54, 32, and 32% homology with those of the Bacillus polymyxa, soybean, and barley beta-amylases, respectively. Twelve well-conserved regions were found among the amino acid sequences of the four beta-amylases. To elucidate the mechanism rendering the C. thermosulfurogenes beta-amylase thermophilic, its amino acid sequence was compared with that of the B. polymyxa beta-amylase. The C. thermosulfurogenes beta-amylase contained more Cys residues and fewer hydrophilic amino acid residues than the B. polymyxa beta-amylase did. Several regions were found in the amino acid sequence of the C. thermosulfurogenes beta-amylase where the hydrophobicity was remarkably high compared with that of the corresponding regions of the B. polymyxa beta-amylase. PMID:2461360
TANGGUOQING; YONGYANBAI; et al.
Chitinase, which catalyzes the hydrolysis of the β-1,4-acetyl-D-glucosamine linkages of the fungal cell wall polymer chitin, is involved in the inducible plant defense system. By constructing a cabbage (Brassica oleracea var. capitata) genomic library and screening the library with pRCH8, a probe of a rice chitinase gene fragment, a chitinase genomic sequence was isolated. The complete nucleotide sequence of the putative cabbage chitinase gene (cabch29) was determined, with its longest open reading frame (ORF) encoding a polypeptide of 413 aa. This polypeptide consists of a 21 aa N-terminal signal peptide, two chitin-binding domains different from those of other classes of plant chitinases, and a catalytic domain. Homology analysis showed that the cabch29 gene has 58.8% identity at the nucleotide level with the pRCH8 ORF probe and 50% identity at the amino acid level with the catalytic domains of chitinases from bean, maize and sugar beet. In addition, several kinds of cis-elements, such as the TATA box, CAAT box, GATA motif, ASF-1 binding site, wound-response elements and AATAAA, were found in the flanking region of the cabch29 gene.
N. H. Birkebæk
We report on a boy with diabetes mellitus and a phenotype indicating glucokinase (GCK) insufficiency, but with a normal GCK gene examination by direct gene sequencing. The boy was referred for diabetes mellitus at 7.5 years old. His father, grandfather and great-grandfather had type 2 diabetes mellitus. Several blood glucose (BG) profiles showed BG of 6.5–10 mmol/L. After three years on neutral protamine Hagedorn (NPH) insulin at a dose of 0.3 IU/kg/day, haemoglobin A1c (HbA1c) was 6.8%. Treatment was changed to sulphonylurea 750 mg a day, and after 4 years HbA1c was 7%. At that time a multiplex ligation-dependent amplification gene dosage assay (MLPA) was done, revealing a whole GCK gene deletion. Medical treatment was ceased, and after one year HbA1c was 6.8%. This case underscores the importance of an MLPA examination if the phenotype of a patient is strongly indicative of GCK insufficiency and no mutation is identified using direct sequencing.
Abbady, A Q; Al-Daoude, A; Al-Mariri, A; Zarkawi, M; Muyldermans, S
The deployment of today's antibodies that are able to distinguish Brucella from closely similar pathogens, such as Yersinia, is still considered a great challenge since both pathogens share identical LPS (lipopolysaccharide) O-ring epitopes. In addition, because of the great impact of Brucella on health and economy in many countries including Syria, much effort is going into the development of next-generation vaccines, mainly into the identification of new immunogenic proteins of this pathogen. In this context, Brucella-specific nanobodies (Nbs), genetically engineered camel heavy-chain antibody fragments, could be of great value. Previously, a large Nb library was constructed from a camel immunized with heat-killed Brucella. Phage display panning of this 'immune' library with Brucella total lysate resulted in a remarkably fast enrichment for an Nb referred to as NbBruc02. In the present work, we investigated the main characteristics of this Nb, which can efficiently distinguish Brucella from other bacteria, including Yersinia, under well-defined conditions. NbBruc02 showed a strong and specific interaction with its antigen within the crude lysate as tested by a surface plasmon resonance (SPR) biosensor, and it was also able to pull down its cognate antigen from such lysate by immuno-capturing. Using matrix-assisted laser desorption ionization time-of-flight mass spectrometry (MALDI-TOF-MS), the NbBruc02-specific antigen was identified as chaperonin GroEL, also known as heat shock protein of 60 kDa (HSP-60), which represents a Brucella immunodominant antigen responsible for maintaining protein folding during stress conditions. Interestingly, the antigen recognition by NbBruc02 was found to be affected by the state of GroEL folding. Thus, the Nb technology applied in the field of infectious diseases, e.g. brucellosis, yields two outcomes: (1) it generates specific binders that can be used for diagnosis, and perhaps treatment, and (2) it identifies the immunogenic candidate
Zacher, Angela; Kaulich, Kerstin; Stepanow, Stefanie; Wolter, Marietta; Köhrer, Karl; Felsberg, Jörg; Malzkorn, Bastian; Reifenberger, Guido
Current classification of gliomas is based on histological criteria according to the World Health Organization (WHO) classification of tumors of the central nervous system. Over the past years, characteristic genetic profiles have been identified in various glioma types. These can refine tumor diagnostics and provide important prognostic and predictive information. We report on the establishment and validation of gene panel next generation sequencing (NGS) for the molecular diagnostics of gliomas. We designed a glioma-tailored gene panel covering 660 amplicons derived from 20 genes frequently aberrant in different glioma types. Sensitivity and specificity of glioma gene panel NGS for detection of DNA sequence variants and copy number changes were validated by single gene analyses. NGS-based mutation detection was optimized for application on formalin-fixed paraffin-embedded tissue specimens including small stereotactic biopsy samples. NGS data obtained in a retrospective analysis of 121 gliomas allowed for their molecular classification into distinct biological groups, including (i) isocitrate dehydrogenase gene (IDH) 1 or 2 mutant astrocytic gliomas with frequent α-thalassemia/mental retardation syndrome X-linked (ATRX) and tumor protein p53 (TP53) gene mutations, (ii) IDH mutant oligodendroglial tumors with 1p/19q codeletion, telomerase reverse transcriptase (TERT) promoter mutation and frequent Drosophila homolog of capicua (CIC) gene mutation, as well as (iii) IDH wildtype glioblastomas with frequent TERT promoter mutation, phosphatase and tensin homolog (PTEN) mutation and/or epidermal growth factor receptor (EGFR) amplification. Oligoastrocytic gliomas were genetically assigned to either of these groups. Our findings implicate gene panel NGS as a promising diagnostic technique that may facilitate integrated histological and molecular glioma classification.
Background: Usher syndrome (USH) combines sensorineural deafness with blindness. It is inherited in an autosomal recessive mode. Early diagnosis is critical for adapted educational and patient management choices, and for genetic counseling. To date, nine causative genes have been identified for the three clinical subtypes (USH1, USH2 and USH3). Current diagnostic strategies make use of a genotyping microarray that is based on the previously reported mutations. The purpose of this study was to design a more accurate molecular diagnosis tool. Methods: We sequenced the 366 coding exons and flanking regions of the nine known USH genes in 54 USH patients (27 USH1, 21 USH2 and 6 USH3). Results: Biallelic mutations were detected in 39 patients (72%) and monoallelic mutations in an additional 10 patients (18.5%). In addition to biallelic mutations in one of the USH genes, presumably pathogenic mutations in another USH gene were detected in seven patients (13%), and another patient carried monoallelic mutations in three different USH genes. Notably, none of the USH3 patients carried detectable mutations in the only known USH3 gene, whereas they all carried mutations in USH2 genes. Most importantly, the currently used microarray would have detected only 30 of the 81 different mutations that we found, of which 39 (48%) were novel. Conclusions: Based on these results, complete exon sequencing of the currently known USH genes stands as a definite improvement for molecular diagnosis of this disease, which is of utmost importance in the perspective of gene therapy.
Kris Genelyn B. Dimasuay
Trichomonads are obligate anaerobes generally found in the digestive and genitourinary tract of domestic animals. In this study, four trichomonad isolates were obtained from carabao, dog, and pig hosts using rectal swabs. Genomic DNA was extracted using the Chelex method, and the 18S rRNA gene was successfully amplified with novel sets of primers and subjected to DNA sequencing. Aligned isolate sequences together with retrieved 18S rRNA gene sequences of known trichomonads were used to generate phylogenetic trees using maximum likelihood and neighbor-joining analyses. Two isolates from carabao were identified as Simplicimonas similis, while the isolates from dog and pig were identified as Pentatrichomonas hominis and Trichomitus batrachorum, respectively. This is the first report of S. similis in carabao and of the identification of T. batrachorum in pig using 18S rRNA gene sequence analysis. The generated phylogenetic tree yielded three distinct groups, mostly with relatively moderate to high bootstrap support and in agreement with the most recent classification. The pathogenic potential of the trichomonads in these hosts still needs further investigation.
Abdel-Moneim, Ahmed S; Al-Malky, Mater I R; Alsulaimani, Adnan A A; Abuelsaad, Abdelaziz S A; Mohamed, Imad; Ismail, Ayman K
Group A rotavirus is responsible for inducing severe diarrhea in young children worldwide. Rotavirus vaccines are used to control the disease in many countries. In the current study, the sequences of human rotavirus G and P types in Saudi Arabia are reported and compared to different relevant published sequences. In addition, the VP4 and VP7 genes of the G1P strains are compared to different antigenic epitopes of the rotavirus vaccines. Stool samples were collected from children under 2 years suffering from severe diarrhea. Screening of the rotavirus-positive samples was performed with rapid antigen detection kit. RNA was amplified from rotavirus-positive samples by reverse transcriptase polymerase chain reaction assay for both VP4 and VP7 genes. Direct sequencing of the VP4 and VP7 genes was conducted and the obtained sequences were compared to each other and to the rotavirus vaccines. Both G1P G1P genotypes were detected. Phylogenetic analysis revealed that the detected strains belong to G1 lineage 1 and 2, P lineage 3, and to P lineage 5. Multiple amino acid substitutions were detected between the Saudi RVA strains and the commonly used vaccines. The current findings emphasize the importance of the continuous surveillance of the circulating rotavirus strains, which is crucial for monitoring virus evolution and helping in predicting the protection level afforded by rotavirus vaccines.
McGraw, N J; Bailey, J N; Cleaves, G R; Dembinski, D R; Gocke, C R; Joliffe, L K; MacWright, R S; McAllister, W T
The RNA polymerases encoded by bacteriophages T3 and T7 have similar structures, but exhibit nearly exclusive template specificities. We have determined the nucleotide sequence of the region of T3 DNA that encodes the T3 RNA polymerase (the gene 1.0 region), and have compared this sequence with the corresponding region of T7 DNA. The predicted amino acid sequence of the T3 RNA polymerase exhibits very few changes when compared to the T7 enzyme (82% of the residues are identical). Significant differences appear to cluster in three distinct regions in the amino-terminal half of the protein. Analysis of the data from both enzymes suggests features that may be important for polymerase function. In particular, a region that differs between the T3 and T7 enzymes exhibits significant homology to the bi-helical domain that is common to many sequence-specific DNA binding proteins. The region that flanks the structural gene contains a number of regulatory elements including: a promoter for the E. coli RNA polymerase, a potential processing site for RNase III and a promoter for the T3 polymerase. The promoter for the T3 RNA polymerase is located only 12 base pairs distal to the stop codon for the structural gene. PMID:3903658
Bidichadani, S.I.; Lanyon, W.G.; Connor, J.M. [Glasgow Univ. (United Kingdom)] [and others]
Hemophilia A is a common X-linked recessive disorder of bleeding caused by deleterious mutations in the gene for clotting factor VIII. The large size of the factor VIII gene, the high frequency of de novo mutations and its tissue-specific expression complicate the detection of mutations. We have used a combination of RT-PCR of ectopic factor VIII transcripts and genomic DNA-PCRs to amplify the entire essential sequence of the factor VIII gene. This is followed by chemical mismatch cleavage analysis and direct sequencing in order to facilitate a comprehensive search for mutations. We describe the characterization of nine potentially pathogenic mutations, six of which are novel. In each case, a correlation of the genotype with the observed phenotype is presented. In order to evaluate the pathogenicity of the five missense mutations detected, we have analyzed them for evolutionary sequence conservation and for their involvement of sequence motifs catalogued in the PROSITE database of protein sites and patterns.
Cowan, Graeme; Weston-Bell, Nicola J; Bryant, Dean; Seckinger, Anja; Hose, Dirk; Zojer, Niklas; Sahota, Surinder S
Human multiple myeloma (MM) is characterized by accumulation of malignant terminally differentiated plasma cells (PCs) in the bone marrow (BM), raising the question when during maturation neoplastic transformation begins. Immunoglobulin IGHV genes carry imprints of clonal tumor history, delineating somatic hypermutation (SHM) events that generally occur in the germinal center (GC). Here, we examine MM-derived IGHV genes using massive parallel deep sequencing, comparing them with profiles in normal BM PCs. In 4/4 presentation IgG MM, monoclonal tumor-derived IGHV sequences revealed significant evidence for intraclonal variation (ICV) in mutation patterns. IGHV sequences of 2/2 normal PC IgG populations revealed dominant oligoclonal expansions, each expansion also displaying mutational ICV. Clonal expansions in MM and in normal BM PCs reveal common IGHV features. In such MM, the data fit a model of tumor origins in which neoplastic transformation is initiated in a GC B-cell committed to terminal differentiation but still targeted by on-going SHM. Strikingly, the data parallel IGHV clonal sequences in some monoclonal gammopathy of undetermined significance (MGUS) known to display on-going SHM imprints. Since MGUS generally precedes MM, these data suggest origins of MGUS and MM with IGHV gene mutational ICV from the same GC B-cell, arising via a distinctive pathway.
Kawakami, K; Toma, C; Honma, Y
A gene (apk) encoding the extracellular protease of Aeromonas caviae Ae6 has been cloned and sequenced. To clone the gene, a genomic DNA library was screened using skim milk LB agar. One clone harboring plasmid pKK3 was selected for sequencing. Nucleotide sequencing of the 3.5 kb region of pKK3 revealed a single open reading frame (ORF) of 1,785 bp encoding 595 amino acids. The deduced polypeptide contained a putative 16-amino acid signal peptide followed by a large propeptide. The N-terminal amino acid sequence of purified recombinant protein (APK) was consistent with the DNA sequence. This result suggested a mature protein of 412 amino acids with a molecular mass of 44 kDa. However, the molecular mass of purified recombinant APK was 34 kDa by SDS-PAGE, suggesting that further processing at the C-terminal region took place. The two deduced zinc-binding site motifs are highly conserved in APK as well as in other zinc metalloproteases, including Vibrio proteolyticus neutral protease, Emp V from Vibrio vulnificus, HA/P from Vibrio cholerae, and Pseudomonas aeruginosa elastase. Proteolytic activity was inhibited by EDTA, Zincov, 1,10-phenanthroline and tetraethylenepentamine, while it was unaffected by the other inhibitors tested. The protease showed maximum activity at pH 7.0 and was inactivated by heating at 80 °C for 15 min. Together, these results suggest that APK belongs to the thermolysin family of metalloendopeptidases.
Weijie Wang; Xiaobang Hu; Donghui Zhang; Jianhua Jiao; Yan Sun; Lei Ma; Changliang Zhu
Objective: To obtain the complete β-actin gene from Aedes albopictus. Methods: Total RNA was extracted from C6/36 cells. Degenerate primers were designed based on the β-actin sequences of An. gambiae, Ae. aegypti, Cx. pipiens pallens and D. melanogaster. The product was amplified by RT-PCR, purified, cloned into the pGT vector and sequenced. The β-actin sequence was aligned and phylogenetically analyzed with the BLAST and CLUSTAL W programs. Results: A sequence of 1132 bp including an open reading frame of 1131 bp was obtained (GenBank DQ657949). The deduced protein had 376 amino acids. Aligned to SWISS-PROT, it exhibited a high level of identity with β-actins from Anopheles, Drosophila and Culex at the amino acid sequence level. Phylogenetic analysis indicated that Ae. albopictus β-actin was much more homologous with invertebrate β-actin than with vertebrate β-actin. Conclusion: The gene may be used as the internal control in experiments on Ae. albopictus.
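The relationship between the reported 1131 bp open reading frame and the 376-residue deduced protein can be checked with a short calculation. The sketch below is only an illustration, not part of the study: it assumes the ORF length includes the terminal stop codon, and the function name `protein_length_from_orf` is hypothetical.

```python
def protein_length_from_orf(orf_bp: int, includes_stop: bool = True) -> int:
    """Return the number of amino acids encoded by an ORF of orf_bp nucleotides.

    Assumption (not stated in the abstract): the ORF length counts the
    terminal stop codon, which does not encode an amino acid.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    codons = orf_bp // 3
    return codons - 1 if includes_stop else codons


# 1131 bp = 377 codons; removing the stop codon leaves 376 amino acids.
print(protein_length_from_orf(1131))  # 376
```

Under that assumption, the 1131 bp ORF and the 376-amino-acid protein reported above are consistent with each other.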
Yogesh S Shouche; Milind S Patole
Mosquitoes are vectors for the transmission of many human pathogens that include viruses, nematodes and protozoa. For the understanding of their vectorial capacity, identification of disease carrying and refractory strains is essential. Recently, molecular taxonomic techniques have been utilized for this purpose. Sequence analysis of the mitochondrial 16S rRNA gene has been used for molecular taxonomy in many insects. In this paper, we have analysed a 450 bp hypervariable region of the mitochondrial 16S rRNA gene in three major genera of mosquitoes, Aedes, Anopheles and Culex. The sequence was found to be unusually A + T rich and in substitutions the rate of transversions was higher than the transition rate. A phylogenetic tree was constructed with these sequences. An interesting feature of the sequences was a stretch of Ts that distinguished between Aedes and Culex on the one hand, and Anopheles on the other. This is the first report of mitochondrial rRNA sequences from these medically important genera of mosquitoes.
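The comparison of transition and transversion rates mentioned above can be illustrated with a small counting routine. This is a minimal sketch assuming two pre-aligned DNA sequences of equal length; the function name `count_ts_tv` and the example sequences are hypothetical and are not taken from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Count transitions and transversions between two aligned DNA sequences.

    Transitions are purine<->purine or pyrimidine<->pyrimidine changes;
    transversions are purine<->pyrimidine changes. Positions with gaps or
    ambiguous bases are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        pair = {a, b}
        if pair <= PURINES or pair <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    return transitions, transversions


# Made-up A+T-rich example: two A<->T changes (transversions), one C<->T (transition).
ts, tv = count_ts_tv("ATTTAGCA", "TTTAAGTA")
print(ts, tv)  # 1 2
```

In A+T-rich sequences such as those described, many substitutions are A<->T changes, which are transversions; this is one way a transversion count can exceed the transition count.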
Cai, Hongsheng; Tian, Shan; Dong, Hansong
The MYB proteins constitute one of the largest transcription factor families in plants. Much research has been performed to determine their structures, functions, and evolution, especially in the model plants, Arabidopsis, and rice. However, this transcription factor family has been much less studied in wheat (Triticum aestivum), for which no genome sequence is yet available. Despite this, expressed sequence tags are an important resource that permits opportunities for large scale gene identification. In this study, a total of 218 sequences from wheat were identified and confirmed to be putative MYB proteins, including 1RMYB, R2R3-type MYB, 3RMYB, and 4RMYB types. A total of 36 R2R3-type MYB genes with complete open reading frames were obtained. The putative orthologs were assigned in rice and Arabidopsis based on the phylogenetic tree. Tissue-specific expression pattern analyses confirmed the predicted orthologs, and this meant that gene information could be inferred from the Arabidopsis genes. Moreover, the motifs flanking the MYB domain were analyzed using the MEME web server. The distribution of motifs among wheat MYB proteins was investigated and this facilitated subfamily classification.
Kouzarides, T.; Bankier, A.T.; Satchwell, S.C.; Weston, K.; Tomlinson, P.; Barrell, B.G.
DNA sequence analysis has revealed that the gene coding for the human cytomegalovirus (HCMV) DNA polymerase is present within the long unique region of the virus genome. Identification is based on extensive amino acid homology between the predicted HCMV open reading frame HFLF2 and the DNA polymerase of herpes simplex virus type 1. The authors present here a 5280 base-pair DNA sequence containing the HCMV pol gene, along with the analysis of transcripts encoded within this region. Since HCMV pol also shows homology to the predicted Epstein-Barr virus pol, they were able to analyze the extent of homology between the DNA polymerases of three distantly related herpes viruses, HCMV, Epstein-Barr virus, and herpes simplex virus. The comparison shows that these DNA polymerases exhibit considerable amino acid homology and highlights a number of highly conserved regions; two such regions show homology to sequences within the adenovirus type 2 DNA polymerase. The HCMV pol gene is flanked by open reading frames with homology to those of other herpes viruses; upstream, there is a reading frame homologous to the glycoprotein B gene of herpes simplex virus type I and Epstein-Barr virus, and downstream there is a reading frame homologous to BFLF2 of Epstein-Barr virus.
Boldbaatar, Bazartseren; Bazartseren, Tsevel; Koba, Ryota; Murakami, Hironobu; Oguma, Keisuke; Murakami, Kenji; Sentsui, Hiroshi
In the current study, primers described previously and modified versions of these primers were evaluated for amplification of full-length gag genes from different equine infectious anemia virus (EIAV) strains from several countries, including the USA, Germany and Japan. Each strain was inoculated into a primary horse leukocyte culture, and the full-length gag gene was amplified by reverse transcription polymerase chain reaction. Each amplified gag gene was cloned into a plasmid vector for sequencing, and the detectable copy numbers of target DNA were determined. Use of a mixture of two forward primers and one reverse primer in the polymerase chain reaction enabled the amplification of all EIAV strains used in this study. However, further study is required to confirm these primers as universal for all EIAV strains. The nucleotide sequence of gag is considered highly conserved, as evidenced by the use of gag-encoded capsid proteins as a common antigen for the detection of EIAV in serological tests. However, significant sequence variation in the gag genes of different EIAV strains was found in the current study.
Background: Oceans cover more than 70% of the earth's surface and are critical for the homeostasis of the environment. Among the components of the ocean ecosystem, zooplankton play vital roles in energy and matter transfer through the system. Despite their importance, understanding of zooplankton biodiversity is limited because of their fragile nature, small body size, and the large number of species from various taxonomic phyla. Here we present the results of single-gene zooplankton community analysis using a method that determines a large number of mitochondrial COI gene sequences from a bulk zooplankton sample. This approach will enable us to estimate the species richness of almost the entire zooplankton community. Results: A sample was collected from a depth of 721 m to the surface in the western equatorial Pacific off Pohnpei Island, Micronesia, with a plankton net equipped with a 2-m² mouth opening. A total of 1,336 mitochondrial COI gene sequences were determined from the cDNA library made from the sample. From the determined sequences, the occurrence of 189 species of zooplankton was estimated. BLASTN search results showed high degrees of similarity (>98%) between the query and database for 10 species, including holozooplankton and merozooplankton. Conclusion: In conjunction with the Census of Marine Zooplankton and Barcode of Life projects, single-gene zooplankton community analysis will be a powerful tool for estimating the species richness of zooplankton communities.
Background: Mimivirus, a giant dsDNA virus infecting Acanthamoeba, is the prototype of the Mimiviridae family, the latest addition to the family of the nucleocytoplasmic large DNA viruses (NCLDVs). Its 1.2 Mb genome was initially predicted to encode 917 genes. A subsequent RNA-Seq analysis precisely mapped many transcript boundaries and identified 75 new genes. Findings: We now report a much deeper analysis using the SOLiD™ technology, combining RNA-Seq of the Mimivirus transcriptome during the infectious cycle (202.4 million reads) and a complete genome re-sequencing (45.3 million reads). This study corrected the genome sequence and identified several single nucleotide polymorphisms. Our results also provided clear evidence of previously overlooked transcription units, including an important RNA polymerase subunit distantly related to Euryarchea homologues. The total Mimivirus gene count is now 1018, 11% greater than the original annotation. Conclusions: This study highlights the huge progress brought about by ultra-deep sequencing for the comprehensive annotation of virus genomes, opening the door to a complete one-nucleotide resolution level description of their transcriptional activity, and to the realistic modeling of the viral genome expression at the ultimate molecular level. This work also illustrates the need to go beyond bioinformatics-only approaches for the annotation of short protein and non-coding genes in viral genomes.
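A similarity threshold such as the >98% cut-off used for the BLASTN matches can be applied programmatically to search results. The sketch below is an illustration under stated assumptions, not the authors' pipeline: it assumes BLAST results saved in the standard 12-column tabular format (query ID, subject ID, percent identity, ...), and the function name `high_identity_queries` and the sample records are hypothetical.

```python
def high_identity_queries(tabular_text: str, min_identity: float = 98.0) -> dict:
    """Return the best hit per query whose percent identity exceeds min_identity.

    Assumes tab-separated BLAST tabular output in which column 1 is the query
    ID, column 2 the subject ID, and column 3 the percent identity.
    """
    best = {}
    for line in tabular_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject, pident = fields[0], fields[1], float(fields[2])
        previous = best.get(query)
        if pident > min_identity and (previous is None or pident > previous[1]):
            best[query] = (subject, pident)
    return best


# Hypothetical records, for illustration only.
example = "\n".join([
    "seq001\trefA\t99.2\t650\t5\t0\t1\t650\t1\t650\t0.0\t1170",
    "seq001\trefB\t98.5\t650\t10\t0\t1\t650\t1\t650\t0.0\t1150",
    "seq002\trefC\t91.0\t640\t58\t0\t1\t640\t1\t640\t0.0\t900",
])
print(high_identity_queries(example))  # {'seq001': ('refA', 99.2)}
```

Percent identity alone does not account for alignment length, so a real analysis would likely also filter on coverage; that refinement is left out of this sketch.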
Campbell, H.D.; Tucker, W.Q.J.; Hort, Y.; Martinson, M.E.; Mayo, G.; Clutterbuck, E.J.; Sanderson, C.J.; Young, I.G.
The human eosinophil differentiation factor (EDF) gene was cloned from a genomic library in lambda phage EMBL3A by using a murine EDF cDNA clone as a probe. The DNA sequence of a 3.2-kilobase BamHI fragment spanning the gene was determined. The gene contains three introns. The predicted amino acid sequence of 134 amino acids is identical with that recently reported for human interleukin 5 but shows no significant homology with other known hemopoietic growth regulators. The amino acid sequence shows strong homology (approx. 70% identity) with that of murine EDF. Recombinant human EDF, expressed from the human EDF gene after transfection into monkey COS cells, stimulated the production of eosinophils and eosinophil colonies from normal human bone marrow but had no effect on the production of neutrophils or mononuclear cells (monocytes and lymphoid cells). The apparent specificity of human EDF for the eosinophil lineage in myeloid hemopoiesis contrasts with the properties of human interleukin 3 and granulocyte/macrophage and granulocyte colony-stimulating factors but is directly analogous to the biological properties of murine EDF. Human EDF therefore represents a distinct hemopoietic growth factor that could play a central role in the regulation of eosinophilia.
Calame, K; Rogers, J; Early, P; Davis, M; Livant, D; Wall, R; Hood, L
The IgM molecule is composed of subunits made up of two light chain and two heavy chain (mu) polypeptides. The mu chain is encoded by several gene segments: variable (V), joining (J) and constant (Cmu). The Cmu gene segment is of particular interest for several reasons. First, the mu chain must exist in two very different environments: as an integral membrane protein in receptor IgM molecules (mum) and as a soluble serum protein in IgM molecules secreted into the blood (mus). Second, the Cmu region in mus is composed of four homology units or domains (Cmu1, Cmu2, Cmu3 and Cmu4) of approximately 110 amino acid residues plus a C-terminal tail of 19 residues. We asked two questions concerning the organisation of the Cmu gene segment. (1) Are the homology units separated by intervening DNA sequences as has been reported for alpha (ref. 5), gamma 1 (ref. 6) and gamma 2b (ref. 7) heavy chain genes? (2) Is the C-terminal tail separated from the Cmu4 domain by an intervening DNA sequence? If so, DNA rearrangements or RNA splicing could generate hydrophilic and hydrophobic C-terminal tails for the mus and mum polypeptides, respectively. We demonstrate here that intervening DNA sequences separate each of the four coding regions for Cmu domains, and that the coding regions for the Cmu4 domains and the C-terminal tail are directly contiguous.
Naveed, Muhammad; Mubeen, Samavia; Khan, SamiUllah; Ahmed, Iftikhar; Khalid, Nauman; Suleria, Hafiz Ansar Rasul; Bano, Asghari; Mumtaz, Abdul Samad
In the present study, samples of rhizosphere and root nodules were collected from different areas of Pakistan to isolate plant growth promoting rhizobacteria. Bacterial isolates were identified by 16S rRNA gene sequence analysis, with taxonomic confirmation on the EzTaxon Server. The identified bacterial strains belonged to 5 genera, i.e. Ensifer, Bacillus, Pseudomonas, Leclercia and Rhizobium. Phylogenetic analysis inferred from 16S rRNA gene sequences showed the evolutionary relationship of the bacterial strains with their respective genera. Based on phylogenetic analysis, some candidate novel species were also identified. The bacterial strains were also characterized by morphological, physiological and biochemical tests and for the glucose dehydrogenase (gdh) gene, which is involved in phosphate solubilization using the cofactor pyrroloquinoline quinone (PQQ). Seven rhizospheric and 3 root-nodulating strains were positive for the gdh gene. Furthermore, this study confirms a novel association between microbes and their hosts, such as field-grown crops and leguminous and non-leguminous plants. It was concluded that a diverse bacterial population exists in the rhizosphere and root nodules that might be useful in evaluating the mechanisms behind plant-microbe interactions, and that strains QAU-63 and QAU-68, with sequence similarities of 97 and 95%, might be declared novel after further taxonomic characterization.
Nicholas R Polato
BACKGROUND: Cnidarians, including corals and anemones, offer unique insights into metazoan evolution because they harbor genetic similarities with vertebrates beyond that found in model invertebrates and retain genes known only from non-metazoans. Cataloging genes expressed in Acropora palmata, a foundation species of reefs in the Caribbean and western Atlantic, will advance our understanding of the genetic basis of ecologically important traits in corals and comes at a time when sequencing efforts in other cnidarians allow for multi-species comparisons. RESULTS: A cDNA library from a sample enriched for symbiont-free larval tissue was sequenced on the 454 GS-FLX platform. Over 960,000 reads were obtained and assembled into 42,630 contigs. Annotation data were acquired for 57% of the assembled sequences. Analysis of the assembled sequences indicated that 83-100% of all A. palmata transcripts were tagged, and provided a rough estimate of the total number of genes expressed in our samples (~18,000-20,000). The coral annotation data contained many of the same molecular components as in the Bilateria, particularly in pathways associated with oxidative stress and DNA damage repair, and provided evidence that homologs of p53, a key player in DNA repair pathways, have experienced selection along the branch separating Cnidaria and Bilateria. Transcriptome-wide screens of paralog groups and transition/transversion ratios highlighted genes including green fluorescent proteins, carbonic anhydrase, and oxidative stress proteins, and functional groups involved in protein and nucleic acid metabolism and the formation of structural molecules. These results provide a starting point for study of adaptive evolution in corals. CONCLUSIONS: Currently available transcriptome data now make comparative studies of the mechanisms underlying coral's evolutionary success possible. Here we identified candidate genes that enable corals to maintain genomic integrity despite
Background: Myostatin (MSTN) is a member of the transforming growth factor-β superfamily that negatively regulates growth of skeletal muscle tissue. The gene encoding the MSTN peptide is a well-established candidate for the enhancement of productivity in terrestrial livestock. This gene potentially represents an important target for growth improvement of cultured finfish. Results: Here we report molecular characterization, tissue expression and sequence variability of the barramundi (Lates calcarifer) MSTN-1 gene. The barramundi MSTN-1 was encoded by three exons 379, 371 and 381 bp in length and translated into a 376-amino acid peptide. Introns 1 and 2 were 412 and 819 bp in length and presented typical GT...AG splicing sites. The upstream region contained cis-regulatory elements such as a TATA-box and E-boxes. A first assessment of sequence variability suggested that higher mutation rates are found in the 5' flanking region, with several SNPs present in this species. A putative microRNA target site has also been observed in the 3' UTR (untranslated region) and is highly conserved across teleost fish. The deduced amino acid sequence was conserved across vertebrates and exhibited characteristic conserved putative functional residues including a proteolytic cleavage motif (RXXR), nine cysteines and two glycosylation sites. A qualitative analysis of the barramundi MSTN-1 expression pattern revealed that, in adult fish, transcripts are differentially expressed in various tissues other than skeletal muscles including gill, heart, kidney, intestine, liver, spleen, eye, gonad and brain. Conclusion: Our findings provide valuable insights such as sequence variation and genomic information which will aid the further investigation of the barramundi MSTN-1 gene in association with growth. The finding for the first time in finfish MSTN of a miRNA target site in the 3'UTR provides an opportunity for the identification of regulatory mutations on the
Background: The identification and study of proteins from metagenomic datasets can shed light on the roles and interactions of the source organisms in their communities. However, metagenomic datasets are characterized by the presence of organisms with varying GC composition, codon usage biases etc., and consequently gene identification is challenging. The vast amount of sequence data also requires faster protein family classification tools. Results: We present a computational improvement to a sequence clustering approach that we developed previously to identify and classify protein coding genes in large microbial metagenomic datasets. The clustering approach can be used to identify protein coding genes in prokaryotes, viruses, and intron-less eukaryotes. The computational improvement is based on an incremental clustering method that does not require the expensive all-against-all computation that was required by the original approach, while still preserving the remote homology detection capabilities. We present evaluations of the clustering approach in protein-coding gene identification and classification, and also present the results of updating the protein clusters from our previous work with recent genomic and metagenomic sequences. The clustering results are available via CAMERA (http://camera.calit2.net). Conclusion: The clustering paradigm is shown to be a very useful tool in the analysis of microbial metagenomic data. The incremental clustering method is shown to be much faster than the original approach in identifying genes, grouping sequences into existing protein families, and also identifying novel families that have multiple members in a metagenomic dataset. These clusters provide a basis for further studies of protein families.
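The idea of incremental clustering described in this abstract can be illustrated with a minimal sketch. The abstract does not state the similarity measure or thresholds the authors used, so everything below is an assumption made for illustration: the k-mer Jaccard similarity, the `threshold` value, and the function names are hypothetical stand-ins, not the published method. The point shown is only that each new sequence is compared against one representative per existing cluster rather than against every other sequence.

```python
from typing import List, Set


def kmers(seq: str, k: int = 3) -> Set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets (assumed stand-in for an alignment score)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def incremental_cluster(
    clusters: List[List[str]],
    new_seqs: List[str],
    threshold: float = 0.5,  # hypothetical cutoff, not taken from the abstract
    k: int = 3,
) -> List[List[str]]:
    """Add new sequences to existing clusters without all-against-all comparison.

    The first member of each cluster acts as its representative. A new sequence
    joins the best-matching cluster if similarity >= threshold; otherwise it
    founds a new cluster (a candidate novel family).
    """
    result = [list(c) for c in clusters]          # do not mutate the caller's data
    reps = [kmers(c[0], k) for c in result]       # one k-mer profile per cluster
    for seq in new_seqs:
        profile = kmers(seq, k)
        best_idx, best_sim = -1, 0.0
        for idx, rep in enumerate(reps):
            sim = jaccard(profile, rep)
            if sim > best_sim:
                best_idx, best_sim = idx, sim
        if best_idx >= 0 and best_sim >= threshold:
            result[best_idx].append(seq)
        else:
            result.append([seq])
            reps.append(profile)
    return result


existing = [["MKTAYIAKQR"]]
updated = incremental_cluster(existing, ["MKTAYIAKQL", "GGSGGSGGSG"])
```

With n existing clusters and m new sequences, this performs about n·m comparisons plus comparisons against any newly founded clusters, instead of comparing every pair of sequences in the combined dataset.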
Leach, Lindsey J
BACKGROUND: Bread wheat (Triticum aestivum) has a large, complex and hexaploid genome consisting of A, B and D homoeologous chromosome sets. Therefore each wheat gene potentially exists as a trio of A, B and D homoeoloci, each of which may contribute differentially to wheat phenotypes. We describe a novel approach combining wheat cytogenetic resources (chromosome substitution 'nullisomic-tetrasomic' lines) with next generation deep sequencing of gene transcripts (RNA-Seq), to directly and accurately identify homoeologue-specific single nucleotide variants and quantify the relative contribution of individual homoeoloci to gene expression. RESULTS: We discover, based on a sample comprising ~5-10% of the total wheat gene content, that at least 45% of wheat genes are expressed from all three distinct homoeoloci. Most of these genes show strikingly biased expression patterns in which expression is dominated by a single homoeolocus. The remaining ~55% of wheat genes are expressed from either one or two homoeoloci only, through a combination of extensive transcriptional silencing and homoeolocus loss. CONCLUSIONS: We conclude that wheat is tending towards functional diploidy, through a variety of mechanisms causing single homoeoloci to become the predominant source of gene transcripts. This discovery has profound consequences for wheat breeding and our understanding of wheat evolution.
Qiao, X; Wu, J H; Wu, R B; Su, R; Li, C; Zhang, Y J; Wang, R J; Zhao, Y H; Fan, Y X; Zhang, W G; Li, J Q
The mammalian hair follicle (HF) is a unique, highly regenerative organ with a distinct developmental cycle. Cashmere goat (Capra hircus) HFs can be divided into two categories based on structure and development time: primary and secondary follicles. To identify differentially expressed genes (DEGs) in the primary and secondary HFs of cashmere goats, the RNA sequencing of six individuals from Arbas, Inner Mongolia, was performed. A total of 617 DEGs were identified; 297 were upregulated while 320 were downregulated. Gene ontology analysis revealed that the main functions of the upregulated genes were electron transport, respiratory electron transport, mitochondrial electron transport, and gene expression. The downregulated genes were mainly involved in cell autophagy, protein complexes, neutrophil aggregation, and bacterial fungal defense reactions. According to the Kyoto Encyclopedia of Genes and Genomes database, these genes are mainly involved in the metabolism of cysteine and methionine, RNA polymerization, and the MAPK signaling pathway, and were enriched in primary follicles. A microRNA-target network revealed that secondary follicles are involved in several important biological processes, such as the synthesis of keratin-associated proteins and enzymes involved in amino acid biosynthesis. In summary, these findings will increase our understanding of the complex molecular mechanisms of HF development and cycling, and provide a basis for the further study of the genes and functions of HF development.
Ajay K. Singh
Neurotoxin complex (NTC) genes are arranged in two known clusters, hemagglutinin (HA) and open reading frame X (ORFX). NTC genes have been analyzed in four serotypes (A, B, E and F) of Clostridium botulinum that cause human botulism. Analysis of amino acid sequences of NT genes demonstrated significant differences among subtypes and the four serotypes. The phylogram tree of NT genes reveals that serotypes A1 and B1 are much closer to each other than to serotypes E1 and F1. However, the non-toxic non-hemagglutinin (NTNH) gene is highly conserved among the four serotypes. Analysis of the phylogram tree of the NTNH gene reveals that serotypes A and F are more closely related to each other than to serotypes B and E. Additionally, sequences of HA and ORFX genes are very divergent, but these genes are specific to subtypes and serotypes of Clostridium botulinum. Information derived from sequence analyses of the NTC has direct implications for the development of detection tools and therapeutic countermeasures for botulism.
Liu, Pengfei; Jiang, Wenhua; Zhou, Shiyong; Gao, Jun; Zhang, Huilai
Breast cancer is a common malignancy in women and contributes substantially to cancer-related death. The purpose of this study was to confirm the roles of GATA3 and identify potential biomarkers of breast cancer. Chromatin immunoprecipitation combined with high-throughput sequencing (ChIP-Seq) data (GSM1642515) and gene expression profiles (GSE24249) were downloaded from the Gene Expression Omnibus (GEO) database. Bowtie2 and MACS2 were used for the mapping and peak calling of the ChIP-Seq data, respectively. ChIPseeker, an R/Bioconductor package, was adopted for the annotation of the enriched peaks. For the gene expression profiles, we used the affy and limma packages for normalization and differential expression analysis. The genes with fold change >2 and adjusted P-Value ChIP-Seq and gene expression profiles. The Database for Annotation, Visualization and Integrated Discovery (DAVID) was used for the functional enrichment analysis of overlapping genes between the target genes and differentially expressed genes (DEGs). In addition, the protein-protein interaction (PPI) network of the overlapping genes was obtained through the Human Protein Reference Database (HPRD). A total of 46,487 peaks were identified for GATA3, of which 3256 were found to be located at -3000 ~ 0 bp from the transcription start sites (TSS) of their nearby genes. A total of 236 down- and 343 up-regulated genes were screened out in GATA3-overexpressing breast cancer samples compared with controls. The combined analysis of the ChIP-Seq and gene expression datasets showed that GATA3 acts as a repressor in breast cancer. In addition, 68 overlaps were obtained between the DEGs and genes included in peaks located at -3000 ~ 0 bp from the TSS. Gene Ontology (GO) terms and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways related to cancer progression and gene regulation were found to be enriched in those overlaps. In the PPI network, NDRG1, JUP and others were found to directly interact with large number of
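The overlap step described in this abstract (genes with a peak at -3000 ~ 0 bp from the TSS intersected with DEGs) can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: the gene names and values are invented, the adjusted P-value cutoff is not recoverable from the text above so `max_adj_p` is a hypothetical parameter, and treating "fold change >2" as a ratio applied in both directions is also an assumption.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class Peak:
    gene: str             # nearest gene assigned by peak annotation
    distance_to_tss: int  # bp relative to the TSS; negative means upstream


def promoter_genes(peaks: List[Peak], upstream: int = -3000, downstream: int = 0) -> Set[str]:
    """Genes with at least one peak inside the [-3000, 0] bp window around the TSS."""
    return {p.gene for p in peaks if upstream <= p.distance_to_tss <= downstream}


def differential_genes(
    table: Dict[str, Tuple[float, float]],  # gene -> (fold change ratio, adjusted P)
    min_fold_change: float = 2.0,            # "fold change >2" from the abstract
    max_adj_p: float = 0.05,                 # hypothetical: cutoff is missing in the source
) -> Tuple[Set[str], Set[str]]:
    """Split genes into up- and down-regulated sets using the two cutoffs."""
    up, down = set(), set()
    for gene, (fold_change, adj_p) in table.items():
        if adj_p >= max_adj_p:
            continue
        if fold_change > min_fold_change:
            up.add(gene)
        elif fold_change < 1.0 / min_fold_change:
            down.add(gene)
    return up, down


# Invented example data, for illustration only.
peaks = [Peak("GENE_A", -1200), Peak("GENE_B", -4500), Peak("GENE_C", -10), Peak("GENE_D", 250)]
expression = {
    "GENE_A": (3.0, 0.01),
    "GENE_B": (4.0, 0.001),
    "GENE_C": (0.4, 0.02),
    "GENE_D": (2.5, 0.01),
    "GENE_E": (1.2, 0.5),
}

targets = promoter_genes(peaks)
up_genes, down_genes = differential_genes(expression)
overlap = targets & (up_genes | down_genes)
```

In this toy input only GENE_A and GENE_C have both a promoter-window peak and a significant expression change, so they form the overlap set that would be passed on to enrichment and PPI analysis.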
Urodele amphibians such as Japanese common newts have a remarkable ability to regenerate their injured neural retina, even as adults. We found that the hematological- and neurological-expressed sequence 1 (Hn1) gene was induced in depigmented retinal pigment epithelial (RPE) cells, and its expression was maintained at later stages of newt retinal regeneration. In this study, we investigated the distribution of the HN1 protein, the product of the Hn1 gene, in the developing retinas. Our immunohistochemical analyses suggested that the HN1 protein was highly expressed in the immature retina, and that its subcellular localization changed during retinogenesis as observed in newt retinal regeneration. We also found that the expression of the Hn1 gene was not induced in mouse after retinal removal. Our results showed that the Hn1 gene can be useful for detection of undifferentiated and dedifferentiated cells during both newt retinal development and regeneration.
Though pleiotropy, the phenomenon of a gene affecting multiple traits, has long played a central role in genetics, development, and evolution, estimating the number of pleiotropy components remains difficult. In this paper, we report a newly developed software package, Genepleio, to estimate the effective gene pleiotropy from phylogenetic analysis of protein sequences. Since this estimate can be interpreted as the minimum pleiotropy of a gene, it can serve as a reference for many empirical pleiotropy measures. This work should facilitate our understanding of how gene pleiotropy affects the pattern of the genotype-phenotype map and the consequences for organismal evolution.
Halligan, B D; Desiderio, S V
Extracts of nuclei from B- and T-lymphoid cells contain a protein that binds specifically to the conserved nonamer DNA sequence within the recombinational signals of immunoglobulin genes. Complexes with DNA fragments from four kappa light-chain joining (J) segments have the same electrophoretic mobility. Nonamer-containing DNA fragments from heavy-chain and light-chain genes compete for binding. Within the 5'-flanking DNA of the J kappa 4 gene segment, the binding site has been localized to a 27-base-pair interval spanning the nonamer region. The binding activity is recovered as a single peak after ion-exchange chromatography. The site of binding of the protein and its presence in nuclei of lymphoid cells suggest that it may function in the assembly of immunoglobulin genes.
SHEN Xiquan; YANG Guanpin; LIU Yongjian
The 18S ribosomal DNA gene (18S rDNA) sequences (approximately 1300 bp in length) were amplified with nematode-specific primers from DNA extracted in bulk from free-living marine nematodes collected from the inter-tidal sediment of the Qingdao coast. The PCR products were cloned, re-amplified, digested with RsaI and Hin6I restriction endonucleases and separated in agarose gel. Among 17 restriction fragment length types, types 1, 2 and 6 covered 61.2%, 14.4% and 9.3% of the clones analyzed, respectively, while the remaining 14 only covered 21 clones, which accounted for 15.1% of the total. Twenty-four representative clones were sequenced and phylogenetically analyzed by referring to those currently available in RDP and GenBank databases. Although it was hard to assign these sequences to known species or genera due to the lack of the 18S rDNA sequence data of known marine free-living nematodes, the obtained sequences were assigned to the nematodes of Adenophorea. Among them, twelve sequences were close to Pontonema vulgare and Adoncholaimus sp., four to Daptonema procerus and two (identical) to Enoplus brevis. Our results showed that free-living marine nematode diversities could be determined by PCR retrieving and analysis of the 18S rDNA sequences and an 18S rDNA sequence could be assigned to a species or a genus only if the 18S rDNA sequences of the free-living marine nematodes were accumulated to some extent.
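The clone percentages reported in this abstract can be cross-checked with a short calculation. The abstract gives only one absolute count (21 clones for the remaining 14 types, stated as 15.1%), so the total and per-type clone numbers below are estimates back-calculated from rounded percentages, not figures from the paper; the function name is hypothetical.

```python
def estimate_total_clones(count: int, percent: float) -> int:
    """Back-calculate the library size from one count and its reported percentage."""
    return round(count / (percent / 100.0))


# Reported shares of the analyzed clones, in percent.
shares = {
    "type 1": 61.2,
    "type 2": 14.4,
    "type 6": 9.3,
    "remaining 14 types": 15.1,
}

# 21 clones are stated to be 15.1% of the total.
total = estimate_total_clones(21, 15.1)

# Approximate clone counts per group implied by the rounded percentages.
approx_counts = {name: round(total * pct / 100.0) for name, pct in shares.items()}

share_sum = sum(shares.values())
```

The four shares add up to 100%, and the estimated total is 139 clones, which is consistent with the three dominant restriction types accounting for roughly 85% of the library.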
Background: Artificial transcription factors (ATFs) are composed of DNA-binding and functional domains. These domains can be fused together to create proteins that can bind a chosen DNA sequence. To construct a valid ATF, it is necessary to design suitable DNA-binding and functional domains. The Cys2-His2 zinc finger motif is the ideal structural scaffold on which to construct a sequence-specific protein. A20 is a cytoplasmic zinc finger protein that inhibits nuclear factor kappa-B activity and tumor necrosis factor (TNF)-mediated programmed cell death. A20 has been shown to prevent TNF-induced cytotoxicity in a variety of cell types including fibroblasts, B lymphocytes, WEHI 164 cells, NIH 3T3 cells and endothelial cells. Results: In order to design a zinc finger protein (ZFP) structural domain that binds specific target sequences in the A20 gene promoter region, the structure and sequence composition of this promoter were analyzed by bioinformatics methods. The target sequences in the A20 promoter were submitted to the on-line ZF Tools server of the Barbas Laboratory, Scripps Research Institute (TSRI), to obtain a specific 18 bp target sequence and also the amino acid sequence of a ZFP that would bind to it. Sequence characterization and structural modeling of the predicted ZFP were performed by bioinformatics methods. The optimized DNA sequence of this artificial ZFP was recombined into the eukaryotic expression vector pIRES2-EGFP to construct pIRES2-EGFP/ZFP-flag recombinants, and the expression and biological activity of the ZFP were analyzed by RT-PCR, western blotting and EMSA, respectively. The ZFP was designed successfully and exhibited biological activity. Conclusion: It is feasible to design specific zinc finger proteins by bioinformatics methods.
Background: After 10 years of use of AFLP (Amplified Fragment Length Polymorphism) technology for DNA fingerprinting and mRNA profiling, large repertories of genome- and transcriptome-derived sequences are available in public databases for model, crop and tree species. AFLP marker systems have been and are being extensively exploited for genome scanning and gene mapping, as well as cDNA-AFLP for transcriptome profiling and differentially expressed gene cloning. The evaluation, annotation and classification of genomic markers and expressed transcripts would be of great utility for both functional genomics and systems biology research in plants. This may be achieved by means of the Gene Ontology (GO), consisting of three structured vocabularies (i.e. ontologies) describing genes, transcripts and proteins of any organism in terms of their associated cellular component, biological process and molecular function in a species-independent manner. In this paper, the functional annotation of about 8,000 AFLP-derived ESTs retrieved in the NCBI databases was carried out by using GO terminology. Results: Descriptive statistics on the type, size and nature of gene sequences obtained by means of AFLP technology were calculated. The gene products associated with mRNA transcripts were then classified according to the three main GO vocabularies. A comparison of the functional content of cDNA-AFLP records was also performed by splitting the sequence dataset into monocots and dicots and by comparing them to all annotated ESTs of Arabidopsis and rice, respectively. On the whole, the statistical parameters adopted for the in silico AFLP-derived transcriptome-anchored sequence analysis proved to be critical for obtaining reliable GO results. Such an exhaustive annotation may offer a suitable platform for functional genomics, particularly useful in non-model species. Conclusion: Reliable GO annotations of AFLP-derived sequences can be gathered through the optimization
Background: Leptospirosis is a zoonotic disease in humans and animals, caused by the bacterium Leptospira interrogans. The gene encoding LipL21 is one of the genes identified in the bacterium, existing only in the pathogenic strains. The aim of this study was to clone and analyze the sequence of the gene encoding the surface lipoprotein LipL21 in five vaccinal Leptospira serovars in Iran. Material and Methods: Pathogenic Leptospira interrogans serovars were cultured in EMJH medium with 10% rabbit serum. After genomic DNA extraction, PCR with specific primers was performed, and the resulting product was inserted into a vector and then transferred into E. coli DH5α. The recombinant plasmids were finally sent for sequencing. Results: The analysis of the lipL21 gene in domestic vaccinal serovars and its comparison with other serovars in the GenBank database revealed that three vaccinal serovars, serjo hardjo, canicola and pomona, had 100% similarity with each other, and that the grippotyphosa serovar differed most from the other vaccinal serovars. In general, the results showed that this gene is highly conserved among the domestic vaccinal serovars and the serovars in the GenBank database, with more than 95.7 percent similarity. Conclusion: These results showed that the lipL21 gene is highly conserved in the vaccinal serovars (similarities > 96.4%). Therefore, the gene encoding the surface protein LipL21 may be useful for a serologic test with high specificity and sensitivity for diagnosis of leptospirosis in clinical samples and, in future, as an effective subunit vaccine candidate.
Background: Readthrough fusions across adjacent genes in the genome, or transcription-induced chimeras (TICs), have been estimated using expressed sequence tag (EST) libraries to involve 4-6% of all genes. Deep transcriptional sequencing (RNA-Seq) now makes it possible to study the occurrence and expression levels of TICs in individual samples across the genome. Methods: We performed single-end RNA-Seq on three human prostate adenocarcinoma samples and their corresponding normal tissues, as well as brain and universal reference samples. We developed two bioinformatics methods to specifically identify TIC events: a targeted alignment method using artificial exon-exon junctions within 200,000 bp from adjacent genes, and genomic alignment allowing splicing within individual reads. We performed further experimental verification and characterization of selected TIC and fusion events using quantitative RT-PCR and comparative genomic hybridization microarrays. Results: Targeted alignment against artificial exon-exon junctions yielded 339 distinct TIC events, including 32 gene pairs with multiple isoforms. The false discovery rate was estimated to be 1.5%. Spliced alignment to the genome was less sensitive, finding only 18% of those found by targeted alignment in 33-nt reads and 59% of those in 50-nt reads. However, spliced alignment revealed 30 cases of TICs with intervening exons, in addition to distant inversions, scrambled genes, and translocations. Our findings increase the catalog of observed TIC gene pairs by 66%. We verified 6 of 6 predicted TICs in all prostate samples, and 2 of 5 predicted novel distant gene fusions, both private events among 54 prostate tumor samples tested. Expression of TICs correlates with that of the upstream gene, which can explain the prostate-specific pattern of some TIC events and the restriction of the SLC45A3-ELK4 e4-e2 TIC to ERG-negative prostate samples, as confirmed in 20 matched prostate tumor and normal
Gobionine species belonging to the genera Pseudorasbora, Pseudopungtungia, and Pungtungia (Teleostei; Cypriniformes; Cyprinidae) have been heavily studied because of problems of taxonomy, threats of extinction, invasion, and human health. Nucleotide sequences of three nuclear genes, that is, recombination activating protein gene 1 (rag1), recombination activating gene 2 (rag2), and early growth response 1 gene (egr1), from Pseudorasbora, Pseudopungtungia, and Pungtungia species residing in China, Japan, and Korea, were analyzed to elucidate their intergeneric and interspecific phylogenetic relationships. In the phylogenetic tree inferred from their multiple gene sequences, Pseudorasbora, Pseudopungtungia and Pungtungia species ramified into three phylogenetically distinct clades: the “tenuicorpa” clade composed of Pseudopungtungia tenuicorpa, the “parva” clade composed of all Pseudorasbora species/subspecies, and the “herzi” clade composed of Pseudopungtungia nigra and Pungtungia herzi. The genus Pseudorasbora was recovered as monophyletic, while the genus Pseudopungtungia was recovered as polyphyletic. Our phylogenetic result implies the unstable taxonomic status of the genus Pseudopungtungia.
Taniyama, Daisuke; Abe, Yoshihiko; Sakai, Tetsuya; Kikuchi, Takahide; Takahashi, Takashi
Streptococcus canis (Sc) is a zoonotic pathogen that is transferred mainly from companion animals to humans. One of the major virulence factors in Sc is the M-like protein encoded by the scm gene, which is involved in anti-phagocytic activities, as well as the recruitment of plasminogen to the bacterial surface in cooperation with enolase, and the consequent enhancement of bacterial transmigration and survival. This is the first reported human case of uncomplicated bacteremia following a dog bite, caused by Streptococcus canis harboring the scm gene. The similarity of the 16S rRNA from the infecting species to that of the Sc type strain, as well as the amplification of the species-specific cfg gene, encoding a co-hemolysin, was used to confirm the species identity. Furthermore, the isolate was confirmed as sequence type 9. The partial scm gene sequence harbored by the isolate was closely related to those of two other Sc strains. Although this isolate did not possess the macrolide/lincosamide resistance genes erm(A), erm(B), or mef(A), it was not fully susceptible to azithromycin: its susceptibility was intermediate. Even though human Sc bacteremia is rare, clinicians should be aware of this microorganism, as well as Pasteurella sp., Prevotella sp., and Capnocytophaga sp., when examining and treating patients with fever who maintain close contact with companion animals.
Jung, Jaejoon; Jeong, Haeyoung; Kim, Hyun Ju; Lee, Dong-Woo; Lee, Sang Jun
Ocean sediments are commonly polluted by various heavy metals. Intracellular heavy metal concentrations in marine microorganisms should be kept within allowable concentrations. Here, we report redundant heavy metal resistance-related genes encoding heavy metal-sensing transcriptional regulators (i.e. cadC), heavy metal efflux pumps, and detoxifying enzymes in the complete genome sequence of Bacillus oceanisediminis 2691. By comparing CadC sequences of strain 2691 with those from other bacterial genomes, we demonstrated that each cadC gene located in the chromosome or plasmid of 2691 cells is similar to those of various near or distant microbes, which might shed light on evolutionary trajectories of redundant heavy metal resistance genes. In terms of applications, these diverse heavy metal-sensing genes can be harnessed as synthetic biological parts, modules, and devices for the development of heavy metal-specific biosensors. Heavy metal bioremediation technologies or platform cells can also be developed based on the marine genomic information of heavy metal resistance and/or detoxification genes in a bacterial isolate from ocean sediments.
Masamura, Noriya; McCallum, John; Khrustaleva, Ludmila; Kenel, Fernand; Pither-Joyce, Meegham; Shono, Jinji; Suzuki, Go; Mukai, Yasuhiko; Yamauchi, Naoki; Shigyo, Masayoshi
Lachrymatory factor synthase (LFS) catalyzes the formation of lachrymatory factor, one of the most distinctive traits of bulb onion (Allium cepa L.). Therefore, we used LFS as a model for a functional gene in a huge genome, and we examined the chromosomal organization of LFS in A. cepa by multiple approaches. The first-level analysis completed the chromosomal assignment of LFS gene to chromosome 5 of A. cepa via the use of a complete set of A. fistulosum-shallot (A. cepa L. Aggregatum group) monosomic addition lines. Subsequent use of an F(2) mapping population from the interspecific cross A. cepa × A. roylei confirmed the assignment of an LFS locus to this chromosome. Sequence comparison of two BAC clones bearing LFS genes, LFS amplicons from diverse germplasm, and expressed sequences from a doubled haploid line revealed variation consistent with duplicated LFS genes. Furthermore, the BAC-FISH study using the two BAC clones as a probe showed that LFS genes are localized in the proximal region of the long arm of the chromosome. These results suggested that LFS in A. cepa is transcribed from at least two loci and that they are localized on chromosome 5.
Braidotti, G; Borthwick, I A; May, B K
The housekeeping enzyme 5-aminolevulinate synthase (ALAS) regulates the supply of heme for respiratory cytochromes. Here we report on the isolation of a genomic clone for the rat ALAS gene. The 5'-flanking region was fused to the chloramphenicol acetyltransferase gene and transient expression analysis revealed the presence of both positive and negative cis-acting sequences. Expression was substantially increased by the inclusion of the first intron located in the 5'-untranslated region. Sequence analysis of the promoter identified two elements at positions -59 and -88 bp with strong similarity to the binding site for nuclear respiratory factor 1 (NRF-1). Gel shift analysis revealed that both NRF-1 elements formed nucleoprotein complexes which could be abolished by an authentic NRF-1 oligomer. Mutagenesis of each NRF-1 motif in the ALAS promoter gave substantially lowered levels of chloramphenicol acetyltransferase expression, whereas mutagenesis of both NRF-1 motifs resulted in the almost complete loss of expression. These results establish that the NRF-1 motifs in the ALAS promoter are critical for promoter activity. NRF-1 binding sites have been identified in the promoters of several nuclear genes encoding mitochondrial proteins concerned with oxidative phosphorylation. The present studies suggest that NRF-1 may co-ordinate the supply of mitochondrial heme with the synthesis of respiratory cytochromes by regulating expression of ALAS. In erythroid cells, NRF-1 may be less important for controlling heme levels since an erythroid ALAS gene is strongly expressed and the promoter for this gene apparently lacks NRF-1 binding sites.
Jia-Wei Liu; Asan; Jun Sun; Sergio Vano-Galvan; Feng-Xia Liu; Xiu-Xiu Wei; Dong-Lai Ma
Background: The dyschromatoses are a group of disorders characterized by simultaneous hyperpigmented macules together with hypopigmented macules. Dyschromatosis universalis hereditaria (DUH) and dyschromatosis symmetrica hereditaria are two major types. While clinical and histological presentations are similar in these two diseases, genetic diagnosis is critical in the differential diagnosis of these entities. Methods: Three patients initially diagnosed with DUH were included. The gene test was carried out by targeted gene sequencing. All mutations detected on ADAR1 and ABCB6 genes were analyzed according to the frequency in control database, the mutation types, and the published evidence to determine the pathogenicity. Results: Family pedigree and clinical presentations were reported in 3 patients from two Chinese families. All patients have prominent cutaneous dyschromatoses involving the whole body without systemic complications. Different pathogenic genes in these patients with similar phenotype were identified: One novel mutation on ADAR1 (c.1325C>G) and one recurrent mutation in ABCB6 (c.1270T>C), which successfully distinguished two diseases with the similar phenotype. Conclusion: Targeted gene sequencing is an effective tool for genetic diagnosis in pigmentary skin diseases.
JEN BEVILACQUA; ANDREW HESSE; BRIAN CORMIER; JENNIFER DAVEY; DEVANSHI PATEL; KRITIKA SHANKAR; HONEY V. REDDI
Epilepsy is one of the most common neurological disorders with about 500 genes thought to be involved across the phenotypic spectrum (Busch et al. 2014; Ran et al. 2014), which includes monogenic, multigenic, epistatic and pleiotropic phenotype manifestations (Busch et al. 2014; Thomas et al. 2014), driving the need for a comprehensive diagnostic test. Next-generation sequencing (NGS) allows for the simultaneous investigation of a large number of genes, making it a very attractive option for a condition as diverse as epilepsy at a low cost compared to traditional Sanger sequencing (Lemke et al. 2012; Németh et al. 2013). Our 377 gene epilepsy NGS test was developed to include genes known to cause or have published association with epilepsy and seizure-related disorders. Given the scale of information that is generated, the efficacy of an NGS panel depends on a number of factors, including the genes present on the panel, prebioinformatic and postbioinformatic analysis protocols, as well as reporting criteria, prompting the current study, a retrospective analysis of 305 cases tested for the epilepsy panel.
HAN Xian-jie; WANG Jun-wei; MA Bo
The glycoprotein H (gH) gene homologue of duck plague virus (DPV) was cloned by degenerate polymerase chain reaction (PCR) and sequenced. It was located immediately downstream from the thymidine kinase gene (TK). In addition, the 3'-end of the gene homologue to herpesvirus UL21 was located downstream from the gH gene. The DPV gH gene open reading frame (ORF) was 2,505 bp in length and its primary translation product was a polypeptide of 834 amino acids. It possessed several characteristics of membrane glycoproteins, including an N-terminal hydrophobic signal sequence, an external domain containing eight putative N-linked glycosylation sites, a C-terminal transmembrane domain, and a charged cytoplasmic tail. Comparison with other herpesviruses revealed identities of 20.2, 25.1, 23.0, 23.0, 26.5 and 26.0% with the gH counterparts of human herpesvirus 1 (HSV1), equine herpesvirus 4 (EHV4), bovine herpesvirus 1 (BHV1), pseudorabies virus (PRV), gallid herpesvirus 2 (GHV2) and gallid herpesvirus 3 (GHV3), respectively.
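The reported ORF and polypeptide lengths can be cross-checked with simple codon arithmetic. The following Python sketch assumes that the stated ORF length includes one terminal stop codon; the function name is hypothetical and is not taken from the study.

```python
def protein_length_from_orf(orf_bp: int) -> int:
    """Return the number of residues encoded by an ORF.

    Assumption: orf_bp counts the start codon through the stop codon,
    so one codon (the stop) does not contribute an amino acid.
    """
    if orf_bp <= 0 or orf_bp % 3 != 0:
        raise ValueError("ORF length must be a positive multiple of 3")
    return orf_bp // 3 - 1  # drop the stop codon


if __name__ == "__main__":
    # ORF length quoted in the abstract above
    print(protein_length_from_orf(2505))
```

Under this assumption, a 2,505 bp ORF corresponds to 835 codons, one of which is the stop codon, which agrees with the 834-residue product described in the abstract.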
Wang, Yimin; Liu, Xuejuan; Li, Yulin
Microsatellite instability (MSI) is detected in a wide variety of tumors. It is thought that mismatch repair gene mutation or inactivation is the major cause of MSI. Microsatellite sequences are predominantly distributed in intergenic or intronic DNA. However, MSI is found in the exonic sequences of some genes, causing their inactivation. In this report, we searched GenBank for candidate genes containing potential MSI sequences in exonic regions. Twenty-seven target genes were selected for MSI analysis. Instability was found in 70% of these genes (14/20) in head and neck squamous cell carcinoma (HNSCC). Interestingly, no instability was detected in mononucleotide repeats in genes or in intergenic sequences. We conclude that instability of mononucleotide repeats is a rare event in HNSCC. High MSI phenotype in young HNSCC patients is limited to noncoding regions only. MSI percentage in HNSCC tumors is closely related to the repeat type, repeat location and patient's age.
Li, Zhao; Hu, Guanghui; Liu, Xiangfeng; Zhou, Yao; Li, Yu; Zhang, Xu; Yuan, Xiaohui; Zhang, Qian; Yang, Deguang; Wang, Tianyu; Zhang, Zhiwu
Originating in a tropical climate, maize has faced great challenges as cultivation has expanded to the majority of the world's temperate zones. In these zones, frost and cold temperatures are major factors that prevent maize from reaching its full yield potential. Among 30 elite maize inbred lines adapted to northern China, we identified two lines with extreme, but opposite, freezing tolerance levels: highly tolerant and highly sensitive. During the seedling stage of these two lines, we used RNA-seq to measure changes in the maize whole-genome transcriptome before and after freezing treatment. In total, 19,794 genes were expressed, of which 4550 exhibited differential expression due to either treatment (before or after freezing) or line type (tolerant or sensitive). Of the 4550 differentially expressed genes, 948 exhibited differential expression due to treatment within line or lines under freezing condition. Analysis of gene ontology found that these 948 genes were significantly enriched for binding functions (DNA binding, ATP binding, and metal ion binding), protein kinase activity, and peptidase activity. Based on their enrichment, literature support, and significant levels of differential expression, 30 of these 948 genes were selected for quantitative real-time PCR (qRT-PCR) validation. The validation confirmed our RNA-Seq-based findings, with squared correlation coefficients of 80% and 50% in the tolerant and sensitive lines, respectively. This study provided valuable resources for further studies to enhance understanding of the molecular mechanisms underlying maize early freezing response and enable targeted breeding strategies for developing varieties with superior frost resistance to achieve yield potential. PMID:27774095
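One common way to obtain a squared correlation coefficient of this kind is to compare per-gene fold changes measured on the two platforms. The sketch below is a minimal Python illustration under that assumption; the function name and the idea of pairing log2 fold changes are illustrative and do not describe the authors' actual pipeline.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences.

    x and y are assumed to hold paired measurements for the same genes,
    for example log2 fold changes from RNA-seq and from qRT-PCR.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return (sxy * sxy) / (sxx * syy)
```

A value of 0.80 from such a function would correspond to the 80% figure quoted for the tolerant line, if the study used this definition.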
La Scola, Bernard; Zeaiter, Zaher; Khamis, Atieh; Raoult, Didier
The definition of new species is currently based on polyphasic classification that includes both determination of phenotypic characteristics and DNA-DNA homology. However, none of these techniques is convenient for the rapid characterization of fastidious or non-culturable bacteria. Using sequences available in the GenBank database, we compared the similarities of gene fragments among the currently recognized Bartonella species. This comparison led to both the definition of similarity values that discriminated Bartonella at the species level and assessment of the relative discriminatory power of each gene examined. In this perspective, rpoB and gltA were found to be the most potent.
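A similarity value between two gene fragments is often expressed as percent identity over an alignment. The following Python sketch shows one simple way to compute it, assuming the two sequences have already been aligned to equal length with "-" as the gap character; the function name and the gap-handling rule are assumptions for illustration, not the method used in the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns (an assumption of this sketch)
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared
```

Comparing such values for each gene across all pairs of recognized species would indicate which genes separate species most clearly, which is the kind of ranking the abstract reports for rpoB and gltA.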
Fulks, K A; Marrs, C F; Stevens, S P; Green, M R
Moraxella bovis EPP63 is able to produce two antigenically distinct pili called Q and I pili (previously called beta and alpha pili). Hybridization studies have shown that the transition between the types is due to inversion of a 2.1-kilobase segment of chromosomal DNA. We present the sequence of a 4.1-kilobase region of cloned DNA spanning the entire inversion region in orientation 1 (Q pilin expressed). Comparison of this sequence with the sequence of the polymerase chain reaction-amplified genomic DNA from orientation 2 (I pilin expressed) allows the site-specific region of recombination to be localized to a 26-base-pair region in which sequence similarity to the left inverted repeat of the Salmonella typhimurium hin system was previously noted. In addition, 50% sequence similarity was seen in a 60-base-pair segment of our sequence to the recombinational enhancer of bacteriophage P1, an inversion system related to the hin system of S. typhimurium. Finally, two open reading frames representing potential genes were identified.
Yan Zhou; Lin Ye; Li Lin; Jun Li; Xuegang Wang; Hao Xu; Yibin Pan; Wei Lin; Wei Tian; Jing Liu; Liping Wei; Jiabin Tang; Siqi Liu; Huanming Yang; Jun Yu; Jian Wang; Michael G. Walker; Xiuqing Zhang; Jun Wang; Songnian Hu; Huayong Xu; Yajun Deng; Jianhai Dong
Expressed Sequence Tag (EST) analysis has pioneered genome-wide gene discovery and expression profiling. In order to establish a gene expression index in the rice cultivar indica, we sequenced and analyzed 86,136 ESTs from nine rice cDNA libraries from the super hybrid cultivar LYP9 and its parental cultivars. We assembled these ESTs into 13,232 contigs, leaving 8,976 singletons. Overall, 7,497 sequences were found to be similar to existing sequences in GenBank and 14,711 are novel. These sequences are classified by molecular function, biological process and pathways according to the Gene Ontology. We compared our sequenced ESTs with the publicly available 95,000 ESTs from japonica, and found little sequence variation, despite the large difference between genome sequences. We then assembled the combined 173,000 rice ESTs for further analysis. Using the pooled ESTs, we compared gene expression in metabolic pathways between rice and Arabidopsis according to KEGG. We further profiled gene expression patterns in different tissues, developmental stages, and in a conditional sterile mutant, after checking that the libraries are comparable by means of sequence coverage. We also identified some possible library-specific genes and a number of enzymes and transcription factors that contribute to rice development.
Martins, C; Fontes, L R; Bueno, O C; Martins, V G
The Asian subterranean termite, Coptotermes gestroi, originally from northeast India through Burma, Thailand, Malaysia, and the Indonesian archipelago, is a major termite pest introduced in several countries around the world, including Brazil. We sequenced the mitochondrial COII gene from individuals representing 23 populations. Phylogenetic analysis of COII gene sequences from this and other studies resulted in two main groups: (1) populations of Cleveland (USA) and four populations of Malaysia and (2) populations of Brazil, four populations of Malaysia, and one population from each of Thailand, Puerto Rico, and Key West (USA). Three new localities are reported here, considerably enlarging the distribution of C. gestroi in Brazil: Campo Grande (state of Mato Grosso do Sul), Itajaí (state of Santa Catarina), and Porto Alegre (state of Rio Grande do Sul).
ZHU Shan-ying; WANG Wen-bing; ZHU Jiang
Clanis bilineata nucleopolyhedrovirus (CbNPV) was purified from Clanis bilineata larvae. To obtain molecular information on the virus, the genomic DNA of CbNPV was extracted, and a DNA fragment library of the virus was constructed using a shotgun approach. The positive clones were then sequenced and analyzed. An open reading frame (ORF) that has high identity with the gp41 gene of most NPVs was found in the library. The gp41 gene of CbNPV is 933 base pairs long and encodes a protein of 310 amino acids. The amino acid sequence analysis showed that the CbNPV gp41 has 53-61% and 56-73% identities with the gp41 proteins of Group Ⅰ and Group Ⅱ NPVs, respectively. The result indicates that the isolated CbNPV is a novel baculovirus, and that CbNPV shares a much closer relationship with Group Ⅱ NPVs.
Markus H Antwerpen
The zoonotic disease tularemia is caused by the bacterium Francisella tularensis. This pathogen is considered a category A select agent with potential to be misused in bioterrorism. Molecular typing based on DNA sequence, such as canSNP typing or MLVA, has become the accepted standard for this organism. Due to the organism's highly clonal nature, the current typing methods have reached their limit of discrimination for classifying closely related subpopulations within the subspecies F. tularensis ssp. holarctica. We introduce a new gene-by-gene approach, MLST+, based on whole genome data of 15 sequenced F. tularensis ssp. holarctica strains and apply this approach to investigate an epidemic of lethal tularemia among non-human primates in two animal facilities in Germany. Due to the high resolution of MLST+ we are able to demonstrate that three independent clones of this highly infectious pathogen were responsible for these spatially and temporally restricted outbreaks.
Djerbi, Soraya; Lindskog, Mats; Arvestad, Lars; Sterky, Fredrik; Teeri, Tuula T
The genome sequence of Populus trichocarpa was screened for genes encoding cellulose synthases by using full-length cDNA sequences and ESTs previously identified in the tissue specific cDNA libraries of other poplars. The data obtained revealed 18 distinct CesA gene sequences in P. trichocarpa. The identified genes were grouped in seven gene pairs, one group of three sequences and one single gene. Evidence from gene expression studies of hybrid aspen suggests that both copies of at least one pair, CesA3-1 and CesA3-2, are actively transcribed. No sequences corresponding to the gene pair, CesA6-1 and CesA6-2, were found in Arabidopsis or hybrid aspen, while one homologous gene has been identified in the rice genome and an active transcript in Populus tremuloides. A phylogenetic analysis suggests that the CesA genes previously associated with secondary cell wall synthesis originate from a single ancestor gene and group in three distinct subgroups. The newly identified copies of CesA genes in P. trichocarpa give rise to a number of new questions concerning the mechanism of cellulose synthesis in trees.
Silvestri, Valentina; Zelli, Veronica; Valentini, Virginia; Rizzolo, Piera; Navazio, Anna Sara; Coppa, Anna; Agata, Simona; Oliani, Cristina; Barana, Daniela; Castrignanò, Tiziana; Viel, Alessandra; Russo, Antonio; Tibiletti, Maria Grazia; Zanna, Ines; Masala, Giovanna; Cortesi, Laura; Manoukian, Siranoush; Azzollini, Jacopo; Peissel, Bernard; Bonanni, Bernardo; Peterlongo, Paolo; Radice, Paolo; Palli, Domenico; Giannini, Giuseppe; Chillemi, Giovanni; Montagna, Marco; Ottini, Laura
Male breast cancer (MBC) is a rare disease whose etiology appears to be largely associated with genetic factors. BRCA1 and BRCA2 mutations account for about 10% of all MBC cases. Thus, a fraction of MBC cases are expected to be due to genetic factors not yet identified. To further explain the genetic susceptibility for MBC, whole-exome sequencing (WES) and targeted gene sequencing were applied to high-risk, BRCA1/2 mutation-negative MBC cases. Germ-line DNA of 1 male and 2 female BRCA1/2 mutation-negative breast cancer (BC) cases from a pedigree showing a first-degree family history of MBC was analyzed with WES. Targeted gene sequencing for the validation of WES results was performed for 48 high-risk, BRCA1/2 mutation-negative MBC cases from an Italian multicenter study of MBC. A case-control series of 433 BRCA1/2 mutation-negative MBC and female breast cancer (FBC) cases and 849 male and female controls was included in the study. WES in the family identified the partner and localizer of BRCA2 (PALB2) c.419delA truncating mutation carried by the proband, her father, and her paternal uncle (all affected with BC) and the N-acetyltransferase 1 (NAT1) c.97C>T nonsense mutation carried by the proband's maternal aunt. Targeted PALB2 sequencing detected the c.1984A>T nonsense mutation in 1 of the 48 BRCA1/2 mutation-negative MBC cases. NAT1 c.97C>T was not found in the case-control series. These results add strength to the evidence showing that PALB2 is involved in BC risk for both sexes and indicate that consideration should be given to clinical testing of PALB2 for BRCA1/2 mutation-negative families with multiple MBC and FBC cases. Cancer 2017;123:210-218. © 2016 American Cancer Society.
Goubin, Gerard; Goldman, Debra S.; Luce, Judith; Neiman, Paul E.; Cooper, Geoffrey M.
A transforming gene detected by transfection of chicken B-cell lymphoma DNA has been isolated by molecular cloning. It is homologous to a conserved family of sequences present in normal chicken and human DNAs but is not related to transforming genes of acutely transforming retroviruses. The nucleotide sequence of the cloned transforming gene suggests that it encodes a protein that is partially homologous to the amino terminus of transferrin and related proteins although only about one tenth the size of transferrin.
This Report concludes the DOE Human Genome Program project, "Identification of Genes in Anonymous DNA Sequence." The central goals of this project have been (1) understanding the problem of identifying genes in anonymous sequences, and (2) development of tools, primarily the automated identification system gm, for identifying genes. The activities supported under the previous award are summarized here to provide a single complete report on the activities supported as part of the project from its inception to its completion.
Bangsborg, Jette Marie; Cianciotto, N P; Hindersson, P
After the demonstration of analogs of the Legionella pneumophila macrophage infectivity potentiator (Mip) protein in other Legionella species, the Legionella micdadei mip gene was cloned and expressed in Escherichia coli. DNA sequence analysis of the L. micdadei mip gene contained in the plasmid p...... homology with the mip-like genes of several Legionella species. Furthermore, amino acid sequence comparisons revealed significant homology to two eukaryotic proteins with isomerase activity (FK506-binding proteins)....
Background: Eucalyptus species are among the most planted hardwoods in the world because of their rapid growth, adaptability and valuable wood properties. The development and integration of genomic resources into breeding practice will be increasingly important in the decades to come. Bacterial artificial chromosome (BAC) libraries are key genomic tools that enable positional cloning of important traits, synteny evaluation, and the development of genome framework physical maps for genetic linkage and genome sequencing. Results: We describe the construction and characterization of two deep-coverage BAC libraries, EG_Ba and EG_Bb, obtained from nuclear DNA fragments of E. grandis (clone BRASUZ1) digested with HindIII and BstYI, respectively. Genome coverages of 17 and 15 haploid genome equivalents were estimated for EG_Ba and EG_Bb, respectively. Both libraries contained large inserts, with average sizes ranging from 135 kb (EG_Bb) to 157 kb (EG_Ba), and very low extra-nuclear genome contamination, providing a probability of finding a single-copy gene of ≥ 99.99%. Libraries were screened for the presence of several genes of interest via hybridizations to high-density BAC filters followed by PCR validation. Five selected BAC clones were sequenced and assembled using the Roche GS FLX technology, providing the whole sequence of the E. grandis chloroplast genome and complete genomic sequences of important lignin biosynthesis genes. Conclusions: The two E. grandis BAC libraries described in this study represent an important milestone for the advancement of Eucalyptus genomics and forest tree research. These BAC resources have a highly redundant genome coverage (> 15×), contain large average inserts and have a very low percentage of clones with organellar DNA or empty vectors. These publicly available BAC libraries are thus suitable for a broad range of applications in genetic and genomic research in Eucalyptus and possibly in related species of Myrtaceae.
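The link between genome coverage and the probability of recovering a single-copy gene can be illustrated with the standard Clarke-Carbon style approximation, P ≈ 1 − e^(−c), where c is the library coverage in haploid genome equivalents. The abstract does not state how its probability was calculated, so the Python sketch below is an assumption made for illustration, and the function name is hypothetical.

```python
import math


def probability_of_recovery(coverage: float) -> float:
    """Approximate probability that a given single-copy locus is present
    at least once in a library with the given genome coverage.

    Uses P = 1 - exp(-coverage), which treats clones as randomly and
    independently sampled from the genome (an idealizing assumption).
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return -math.expm1(-coverage)  # numerically stable 1 - exp(-coverage)


if __name__ == "__main__":
    # Coverage estimates quoted in the abstract for the two libraries
    for name, cov in (("EG_Ba", 17), ("EG_Bb", 15)):
        print(name, probability_of_recovery(cov))
```

With this approximation, both quoted coverages give probabilities above the 99.99% threshold mentioned in the abstract, although real libraries deviate from random sampling because of cloning bias.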
Butler Margaret I
Background: Inteins are self-splicing protein elements. They are translated as inserts within host proteins that excise themselves and ligate the flanking portions of the host protein (exteins) with a peptide bond. They are encoded as in-frame insertions within the genes for the host proteins. Inteins are found in all three domains of life and in viruses, but have a very sporadic distribution. Only a small number of intein coding sequences have been identified in eukaryotic nuclear genes, and all of these are from ascomycete or basidiomycete fungi. Results: We identified seven intein coding sequences within nuclear genes coding for the second largest subunits of RNA polymerase. These sequences were found in diverse eukaryotes: one is in the second largest subunit of RNA polymerase I (RPA2) from the ascomycete fungus Phaeosphaeria nodorum, one is in the RNA polymerase III (RPC2) of the slime mould Dictyostelium discoideum and four intein coding sequences are in RNA polymerase II genes (RPB2), one each from the green alga Chlamydomonas reinhardtii, the zygomycete fungus Spiromyces aspiralis and the chytrid fungi Batrachochytrium dendrobatidis and Coelomomyces stegomyiae. The remaining intein coding sequence is in a viral relic embedded within the genome of the oomycete Phytophthora ramorum. The Chlamydomonas and Dictyostelium inteins are the first nuclear-encoded inteins found outside of the fungi. These new inteins represent a unique dataset: they are found in homologous proteins that form a paralogous group. Although these paralogues diverged early in eukaryotic evolution, their sequences can be aligned over most of their length. The inteins are inserted at multiple distinct sites, each of which corresponds to a highly conserved region of RNA polymerase. This dataset supports earlier work suggesting that inteins preferentially occur in highly conserved regions of their host proteins. Conclusion: The identification of these new inteins
Hu, Jiaqing; Yang, Dandan; Chen, Wei; Li, Chuanhao; Wang, Yandong; Zeng, Yongqing; Wang, Hui
There is little genomic information regarding gene expression differences at the whole blood transcriptome level of different pig breeds at the neonatal stage. To solve this, we characterized differentially expressed genes (DEGs) in the whole blood of Dapulian (DPL) and Landrace piglets using RNA-seq (RNA-sequencing) technology. In this study, 83 DEGs were identified between the two breeds. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses identified immune response and metabolism as the most commonly enriched terms and pathways in the DEGs. Genes related to immunity and lipid metabolism were more highly expressed in the DPL piglets, while genes related to body growth were more highly expressed in the Landrace piglets. Additionally, the DPL piglets had twofold more single nucleotide polymorphisms (SNPs) and alternative splicing (AS) than the Landrace piglets. These results expand our knowledge of the genes transcribed in the piglet whole blood of two breeds and provide a basis for future research of the molecular mechanisms underlying the piglet differences.
Hutcherson, J A; Gogeneni, H; Yoder-Himes, D; Hendrickson, E L; Hackett, M; Whiteley, M; Lamont, R J; Scott, D A
Porphyromonas gingivalis is a Gram-negative anaerobe and keystone periodontal pathogen. A mariner transposon insertion mutant library has recently been used to define 463 genes as putatively essential for the in vitro growth of P. gingivalis ATCC 33277 in planktonic culture (Library 1). We have independently generated a transposon insertion mutant library (Library 2) for the same P. gingivalis strain and herein compare genes that are putatively essential for in vitro growth in complex media, as defined by both libraries. In all, 281 genes (61%) identified by Library 1 were common to Library 2. Many of these common genes are involved in fundamentally important metabolic pathways, notably pyrimidine cycling as well as lipopolysaccharide, peptidoglycan, pantothenate and coenzyme A biosynthesis, and nicotinate and nicotinamide metabolism. Also in common are genes encoding heat-shock protein homologues, sigma factors, enzymes with proteolytic activity, and the majority of sec-related protein export genes. In addition to facilitating a better understanding of critical physiological processes, transposon-sequencing technology has the potential to identify novel strategies for the control of P. gingivalis infections. Those genes defined as essential by two independently generated TnSeq mutant libraries are likely to represent particularly attractive therapeutic targets.
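The comparison of the two libraries amounts to a set intersection of putatively essential genes. The Python sketch below shows the idea under the assumption that each library is represented as a set of gene identifiers; the function name and any identifiers passed to it are hypothetical, and the sketch does not reproduce the study's own essentiality calling.

```python
def shared_essential(reference: set, other: set):
    """Return (number of shared genes, percent of the reference set shared)."""
    if not reference:
        raise ValueError("reference set must not be empty")
    common = reference & other
    return len(common), 100.0 * len(common) / len(reference)


if __name__ == "__main__":
    # Counts quoted in the abstract: 281 of the 463 Library 1 genes are shared
    print(round(100 * 281 / 463))
```

Expressing the overlap relative to Library 1, as here, matches how the abstract reports its percentage.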
There is little genomic information regarding gene expression differences at the whole blood transcriptome level of different pig breeds at the neonatal stage. To solve this, we characterized differentially expressed genes (DEGs) in the whole blood of Dapulian (DPL) and Landrace piglets using RNA-seq (RNA-sequencing) technology. In this study, 83 DEGs were identified between the two breeds. Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analyses identified immune response and metabolism as the most commonly enriched terms and pathways in the DEGs. Genes related to immunity and lipid metabolism were more highly expressed in the DPL piglets, while genes related to body growth were more highly expressed in the Landrace piglets. Additionally, the DPL piglets had twofold more single nucleotide polymorphisms (SNPs) and alternative splicing (AS) than the Landrace piglets. These results expand our knowledge of the genes transcribed in the piglet whole blood of two breeds and provide a basis for future research of the molecular mechanisms underlying the piglet differences.
Hyun, Tae Kyung; Rim, Yeonggil; Jang, Hui-Jeong; Kim, Cheol Hong; Park, Jongsun; Kumar, Ritesh; Lee, Sunghoon; Kim, Byung Chul; Bhak, Jong; Nguyen-Quoc, Binh; Kim, Seon-Won; Lee, Sang Yeol; Kim, Jae-Yean
The ripe fruit of Momordica cochinchinensis Spreng, known as gac, is characterized by a very high carotenoid content. Although this plant might be a good resource for carotenoid metabolic engineering, the genes involved in the carotenoid metabolic pathways in gac had so far not been identified, owing to the lack of genomic information in public databases. In order to expedite the process of gene discovery, we have undertaken Illumina deep sequencing of mRNA prepared from the aril of gac fruit. From 51,446,670 high-quality reads, we obtained 81,404 assembled unigenes with an average length of 388 base pairs. At the protein level, gac aril transcripts showed about 81.5% similarity with cucumber proteomes. In addition, 17,104 unigenes have been assigned to specific metabolic pathways in the Kyoto Encyclopedia of Genes and Genomes, and all of the known enzymes involved in the terpenoid backbone biosynthetic and carotenoid biosynthetic pathways were also identified in our library. To analyze the relationship between putative carotenoid biosynthesis genes and alteration of carotenoid content during fruit ripening, digital gene expression analysis was performed on three different ripening stages of aril. This study has revealed that putative phytoene synthase, 15-cis-phytoene desaturase, zeta-carotene desaturase, carotenoid isomerase and lycopene epsilon cyclase might be key factors for controlling carotenoid contents during aril ripening. Taken together, this study has also made a large gene database available. This unique information for gac gene discovery would be helpful to facilitate functional studies for improving carotenoid quantities.
Ceccarelli, A; Zhukovskaya, N; Kawata, T; Bozzaro, S; Williams, J
The ecmB gene of Dictyostelium is expressed at culmination both in the prestalk cells that enter the stalk tube and in ancillary stalk cell structures such as the basal disc. Stalk tube-specific expression is regulated by sequence elements within the cap-site proximal part of the promoter, the stalk tube (ST) promoter region. Dd-STATa, a member of the STAT transcription factor family, binds to elements present in the ST promoter region and represses transcription prior to entry into the stalk tube. We have characterised an activatory DNA sequence element that lies distal to the repressor elements and that is both necessary and sufficient for expression within the stalk tube. We have mapped this activator to a 28 nucleotide region (the 28-mer) within which we have identified a GA-containing sequence element that is required for efficient gene transcription. The Dd-STATa protein binds to the 28-mer in an in vitro binding assay, and binding is dependent upon the GA-containing sequence. However, the ecmB gene is expressed in a Dd-STATa null mutant; therefore Dd-STATa cannot be responsible for activating the 28-mer in vivo. Instead, we identified a distinct 28-mer binding activity in nuclear extracts from the Dd-STATa null mutant, this GA-binding activity being largely masked in wild type extracts by the high affinity binding of the Dd-STATa protein. We suggest that, in addition to the long range repression exerted by binding to the two known repressor sites, Dd-STATa inhibits transcription by direct competition with this putative activator for binding to the GA sequence.
Francis Sheila E
Background: Most members of the serpin family of proteins are potent, irreversible inhibitors of specific serine or cysteine proteinases. Inhibitory serpins are distinguished from members of other families of proteinase inhibitors by their metastable structure and unique suicide-substrate mechanism. Animal serpins exert control over a remarkable diversity of physiological processes including blood coagulation, fibrinolysis, innate immunity and aspects of development. Relatively little is known about the complement of serpin genes in plant genomes and the biological functions of plant serpins. Results: A structurally refined amino-acid sequence alignment of the 14 full-length serpins encoded in the genome of the japonica rice Oryza sativa cv. Nipponbare (a monocot) showed a diversity of reactive-centre sequences (which largely determine inhibitory specificity) and a low degree of identity with those of serpins in Arabidopsis (a eudicot). A new, convenient and functionally informative nomenclature for plant serpins, in which the reactive-centre sequence is incorporated into the serpin name, was developed and applied to the rice serpins. A phylogenetic analysis of the rice serpins provided evidence for two main clades and a number of relatively recent gene duplications. Transcriptional analysis showed vastly different levels of basal expression among eight selected rice serpin genes in callus tissue, during seedling development, among vegetative tissues of mature plants and throughout seed development. The gene OsSRP-LRS (Os03g41419), encoding a putative orthologue of Arabidopsis AtSerpin1 (At1g47710), was expressed ubiquitously and at high levels. The second most highly expressed serpin gene was OsSRP-PLP (Os11g11500), encoding a non-inhibitory serpin with a surprisingly well-conserved reactive-centre loop (RCL) sequence among putative orthologues in other grass species. Conclusions: The diversity of reactive-centre sequences among the putatively
This issue includes six articles that develop and apply statistical methods for the analysis of gene sequencing data of different types. The methods are tailored to the different data types and, in each case, lead to biological insights not readily identified without the use of statistical methods. A common feature in all articles is the development of methods for analyzing simultaneously data of different types (e.g., genotype, phenotype, pedigree, etc.); that is, using data of one type to i...
Internal symmetry is commonly observed in the majority of fundamental protein folds. Meanwhile, sufficient evidence suggests that nascent polypeptide chains have the potential to start the co-translational folding process, and that this process allows mRNA to contain additional information on protein structure. In this paper, we study the relationship between gene sequences and protein structures from the viewpoint of symmetry to explore how gene sequences code for structural symmetry in proteins. We found that, for a set of two-fold symmetric proteins from the left-handed beta-helix fold, intragenic symmetry always exists in their corresponding gene sequences. Meanwhile, codon usage bias and local mRNA structure might be involved in modulating translation speed for the fo