Mohanty, Sujata; Khanna, Radhika
Comparative analysis of multiple genomes of closely or distantly related Drosophila species allows evolutionary biologists to explore genomic changes from an ecological and evolutionary perspective. We present the de novo assembled whole genome sequences of four Drosophila species of Indian origin, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta, generated using Next Generation Sequencing technology on an Illumina platform, along with their detailed assembly statistics. Comparative genomic analyses, including gene prediction and annotation, functional and orthogroup analysis of coding sequences, and genome-wide SNP distribution, were performed. The whole genome of Zaprionus indianus of Indian origin, published earlier by us, and the genome sequences of 12 previously sequenced Drosophila species available in the NCBI database were included in the analysis. The present work is part of our ongoing genomics project on Indian Drosophila species.
Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M
The genus Mycobacterium comprises more than 120 species, including important human pathogens that cause major public health problems. With more than 100 genome sequences available from this genus, comparative genome analysis can provide new insights into the evolutionary events of these species and help improve drugs, vaccines, and diagnostic tools for controlling mycobacterial diseases. In the present study we outline a comparative genome analysis of fourteen mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. The genomes were compared on the basis of genome length, GC content, and gene counts from different sources (GenBank, RefSeq, and Prodigal). A BLAST matrix of these genomes was constructed to summarize the similarity between species in a single scheme. From the multiple genome analysis, the pan- and core genomes were defined for twelve mycobacterial species. We also present the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria and understand their evolutionary history, a phylogenetic tree was constructed from the 16S rRNA gene of tuberculous and non-tuberculous mycobacteria.
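The pan- and core genome mentioned in this abstract can be illustrated with plain set operations: the pan-genome is the union of gene families over all genomes, and the core genome is their intersection. The sketch below is only an illustration under stated assumptions; the genome labels and gene-family names are hypothetical toy data, and the abstract does not say how the authors clustered genes into families.

```python
def pan_and_core(genomes):
    """Return (pan, core) gene-family sets for a dict of genome -> set of families.

    pan  = families present in at least one genome (union)
    core = families present in every genome (intersection)
    """
    family_sets = [set(families) for families in genomes.values()]
    pan = set().union(*family_sets)
    core = set.intersection(*family_sets)
    return pan, core


# Hypothetical toy input: three genomes with made-up gene-family labels.
toy_genomes = {
    "genome_1": {"famA", "famB", "famC"},
    "genome_2": {"famA", "famB"},
    "genome_3": {"famA", "famD"},
}

pan, core = pan_and_core(toy_genomes)
print(len(pan), sorted(core))
```

Adding genomes can only enlarge the pan-genome and shrink the core genome, which is why both numbers depend on the set of strains included.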
Czaban, Adrian; Byrne, Stephen; Sharma, Sapna
, winter hardiness, drought tolerance and resistance to grazing. In this study we have sequenced and assembled the low copy fraction of the genomes of Lolium westerwoldicum, Lolium multiflorum, Festuca pratensis and Lolium temulentum. We have also generated de-novo transcriptome assemblies for each species......, and these have aided in the annotation of the genomic sequence. Using this data we were able to generate annotated assemblies of the gene rich regions of the four species to complement the already sequenced Lolium perenne genome. Using these gene models we have identified orthologous genes between the species...
Tembrock, Luke R.; Zheng, Shaoyu; Wu, Zhiqiang
Qat (Catha edulis, Celastraceae) is a woody evergreen species with great economic and cultural importance. It is cultivated for its stimulant alkaloids cathine and cathinone in East Africa and southwest Arabia. However, genome information, especially DNA sequence resources, for C. edulis is limited, hindering studies of interspecific and intraspecific relationships. Herein, the complete chloroplast (cp) genome of Catha edulis is reported. This genome is 157,960 bp in length with 37% GC content and is structurally arranged into two 26,577 bp inverted repeats and two single-copy regions. The small single-copy and large single-copy regions are 18,491 bp and 86,315 bp long, respectively. The C. edulis cp genome consists of 129 coding genes including 37 transfer RNA (tRNA) genes, 8 ribosomal RNA (rRNA) genes, and 84 protein-coding genes. Of those genes, 112 are single-copy and 17 are duplicated in the two inverted repeat regions (seven tRNAs, four rRNAs, and six protein-coding genes). The phylogenetic relationships resolved from the cp genome of qat and 32 other species confirm the monophyly of Celastraceae. The cp genomes of C. edulis, Euonymus japonicus and seven Celastraceae species lack the rps16 intron, which indicates that an intron loss took place in an ancestor of this family. The cp genome of C. edulis provides a highly valuable genetic resource for further phylogenomic research, barcoding and cp transformation in Celastraceae. PMID:29425128
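The figures reported for the C. edulis plastome are internally consistent, which can be checked with simple arithmetic: a quadripartite chloroplast genome is the sum of the large single-copy region, the small single-copy region and two inverted repeat copies. The sketch below uses only numbers stated in the abstract; the function and variable names are illustrative.

```python
# Region lengths in bp, taken from the abstract.
IR_LENGTH = 26_577    # each of the two inverted repeats
SSC_LENGTH = 18_491   # small single-copy region
LSC_LENGTH = 86_315   # large single-copy region


def plastome_length(lsc, ssc, ir):
    """Total length of a quadripartite plastome: LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total = plastome_length(LSC_LENGTH, SSC_LENGTH, IR_LENGTH)
print(total)  # 157960, matching the reported genome length

# Gene tally from the abstract: the three classes should sum to 129.
gene_counts = {"tRNA": 37, "rRNA": 8, "protein_coding": 84}
total_genes = sum(gene_counts.values())
print(total_genes)  # 129
```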
Background: Hookworms, infecting over one billion people, are the major human parasites most closely related to the model nematode Caenorhabditis elegans. Applying genomics techniques to these species, we analyzed 3,840 and 3,149 genes from Ancylostoma caninum and A. ceylanicum. Results: Transcripts originated from libraries representing infective L3 larva, stimulated L3, arrested L3, and adults. Most genes are represented in single stages, including abundant transcripts like hsp-20 in infective L3 and vit-3 in adults. Over 80% of the genes have homologs in C. elegans, and nearly 30% of these have observable RNA interference phenotypes. Homologies were identified to nematode-specific and clade V specific gene families. To study the evolution of hookworm genes, 574 A. caninum / A. ceylanicum orthologs were identified, all of which were found to be under purifying selection, with distribution ratios of nonsynonymous to synonymous amino acid substitutions similar to that reported for C. elegans / C. briggsae orthologs. The phylogenetic distance between A. caninum and A. ceylanicum is almost identical to that for C. elegans / C. briggsae. Conclusion: The genes discovered should substantially accelerate research toward better understanding of the parasites' basic biology as well as new therapies including vaccines and novel anthelmintics.
Yu, Jingyin; Tehrim, Sadia; Zhang, Fengqi; Tong, Chaobo; Huang, Junyan; Cheng, Xiaohui; Dong, Caihua; Zhou, Yanqiu; Qin, Rui; Hua, Wei; Liu, Shengyi
Plant disease resistance (R) genes with the nucleotide binding site (NBS) play an important role in offering resistance to pathogens. The availability of complete genome sequences of Brassica oleracea and Brassica rapa provides an important opportunity for researchers to identify and characterize NBS-encoding R genes in Brassica species and to compare them with analogues in Arabidopsis thaliana using a comparative genomics approach. However, little is known about the evolutionary fate of NBS-encoding genes in the Brassica lineage after the split from A. thaliana. Here we present a genome-wide analysis of NBS-encoding genes in B. oleracea, B. rapa and A. thaliana. Using HMM search and manual curation, we identified 157, 206 and 167 NBS-encoding genes in the B. oleracea, B. rapa and A. thaliana genomes, respectively. Phylogenetic analysis among the 3 species classified NBS-encoding genes into 6 subgroups. Tandem duplication and whole genome triplication (WGT) analyses revealed that after WGT of the Brassica ancestor, NBS-encoding homologous gene pairs on triplicated regions in the Brassica ancestor were deleted or lost quickly, but NBS-encoding genes in Brassica species experienced species-specific gene amplification by tandem duplication after divergence of B. rapa and B. oleracea. Expression profiling of NBS-encoding orthologous gene pairs indicated differential expression patterns of retained orthologous gene copies in B. oleracea and B. rapa. Furthermore, evolutionary analysis of CNL type NBS-encoding orthologous gene pairs among the 3 species suggested that orthologous genes in B. rapa have undergone stronger negative selection than those in B. oleracea. For the TNL type, however, there are no significant differences in the orthologous gene pairs between the two species. This study is the first identification and characterization of NBS-encoding genes in B. rapa and B. oleracea based on whole genome sequences. Through tandem duplication and whole genome
Meng, Jing; Li, Xuepei; Li, Hongtao; Yang, Junbo; Wang, Hong; He, Jun
Aconitum (Ranunculaceae) consists of approximately 400 species distributed in the temperate regions of the northern hemisphere. Many species are well-known herbs, mainly used for analgesia and anti-inflammatory purposes. This genus is well represented in China and has gained widespread attention for its toxicity and detoxification properties. In southwestern China, several Aconitum species, called ‘Dula’ in the Yi Nationality, were often used to control the poisonous effects of other Aconitum plants. In this study, the complete chloroplast (cp) genomes of these species were determined for the first time through Illumina paired-end sequencing. Our results indicate that their cp genomes ranged from 151,214 bp (A. episcopale) to 155,769 bp (A. delavayi) in length. A total of 111–112 unique genes were identified, including 85 protein-coding genes, 36–37 tRNA genes and eight ribosomal RNA (rRNA) genes. We also analyzed codon usage, IR expansion or contraction and simple sequence repeats in the cp genomes. Eight variable regions were identified and these may potentially be useful as specific DNA barcodes for species identification of Aconitum. Phylogenetic analysis revealed that all five studied species formed a new clade and were resolved with 100% bootstrap support. This study will provide genomic resources and potential plastid markers for DNA barcoding, further taxonomy and germplasm exploration of Aconitum.
Asaf, Sajjad; Khan, Abdul Latif; Khan, Abdur Rahim; Waqas, Muhammad; Kang, Sang-Mo; Khan, Muhammad Aaqil; Shahzad, Raheem; Seo, Chang-Woo; Shin, Jae-Ho; Lee, In-Jung
Oryza minuta (Poaceae family) is a tetraploid wild relative of cultivated rice with a BBCC genome. O. minuta has the potential to resist various diseases and pests such as bacterial blight (BB), white backed planthopper (WBPH) and brown plant hopper (BPH). Here, we sequenced and annotated the complete mitochondrial genome of O. minuta. The mtDNA genome is 515,022 bp, containing 60 protein-coding genes, 31 tRNA genes and two rRNA genes. The mitochondrial genome organization and the gene content at the nucleotide level are highly similar (89%) to those of O. rufipogon. Comparison with other related species revealed that most of the genes with known function are conserved among the Poaceae members. Similarly, the O. minuta mt genome shared 24 protein-coding genes, 15 tRNA genes and 1 ribosomal RNA gene with other rice species (indica and japonica). The evolutionary relationship and phylogenetic analysis revealed that O. minuta is more closely related to O. rufipogon than to any other related species. Such studies are essential to understand the evolutionary divergence among species and analyze common gene pools to combat risks in the current scenario of a changing environment.
Crkvenjakov, R.; Drmanac, R.
The subject of this paper is the definition of species based on the assumption that the genome is the fundamental level for the origin and maintenance of biological diversity. For this view to be logically consistent, it is necessary to assume the existence and operation of a new law, which we call the genome law. For this reason the genome law is included in the explanation of the species phenomenon presented here, even if its precise formulation and elaboration are left for the future. The intellectual underpinnings of this definition can be traced to Goldschmidt. We wish to explore some philosophical aspects of the definition of species in terms of the genome. The point of proposing the definition on these grounds is that any real advance in evolutionary theory has to be correct in both its philosophy and its science.
Wang, Liyuan; Xing, Huixian; Yuan, Yanchao; Wang, Xianlin; Saeed, Muhammad; Tao, Jincai; Feng, Wei; Zhang, Guihua; Song, Xianliang; Sun, Xuezhen
Codon usage bias (CUB) is an important evolutionary feature of a genome that provides information for studying organism evolution, gene function and exogenous gene expression. The CUB and its shaping factors in the nuclear genomes of four sequenced cotton species, G. arboreum (A2), G. raimondii (D5), G. hirsutum (AD1) and G. barbadense (AD2), were analyzed in the present study. The effective number of codons (ENC) analysis showed that the CUB was weak in these four species and in the four subgenomes of the two tetraploids. Codon composition analysis revealed that these four species used pyrimidine-rich codons more frequently than purine-rich codons. Correlation analysis indicated that the base content at the third position of codons affects the degree of codon preference. PR2-bias plot and ENC-plot analyses revealed that the CUB patterns in these genomes and subgenomes were influenced by the combined effects of translational selection, directional mutation and other factors. The translational selection (P2) analysis results, together with the non-significant correlation between GC12 and GC3, further revealed that translational selection played the dominant role over mutation pressure in the codon usage bias. Through relative synonymous codon usage (RSCU) analysis, we detected 25 high-frequency codons that preferentially end with T or A, and 31 low-frequency codons that tend to end with C or G, in these four species and four subgenomes. Finally, 19 to 26 optimal codons, with 19 common ones, were determined for each species and subgenome, and these preferentially end with A or T. We concluded that the codon usage bias was weak and that translational selection was the main shaping factor in nuclear genes of these four cotton genomes and four subgenomes.
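For readers unfamiliar with RSCU, the measure is the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally; values above 1 indicate a preferred codon. The following is a minimal sketch using the standard genetic code, not the pipeline used by the authors (which the abstract does not describe). The input sequence is a hypothetical toy example.

```python
from collections import Counter
from itertools import product

# Standard genetic code in the conventional TCAG ordering ("*" marks stop codons).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Relative synonymous codon usage for one in-frame coding sequence.

    RSCU(codon) = observed count / (family total / number of synonymous codons).
    Amino acids that do not occur in the sequence are left out of the result.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group sense codons into synonymous families by amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        family_total = sum(counts[c] for c in codons)
        if family_total == 0:
            continue
        expected = family_total / len(codons)
        for c in codons:
            values[c] = counts[c] / expected
    return values


# Hypothetical toy sequence: four alanine codons, three GCT and one GCA.
toy = rscu("GCTGCTGCTGCA")
print(toy["GCT"], toy["GCA"], toy["GCC"], toy["GCG"])  # 3.0 1.0 0.0 0.0
```

Within each amino acid the RSCU values sum to the number of synonymous codons, so a fourfold-degenerate family always sums to 4.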
Johri, Parul; Krenek, Sascha; Marinov, Georgi K; Doak, Thomas G; Berendonk, Thomas U; Lynch, Michael
Population-genomic analyses are essential to understanding factors shaping genomic variation and lineage-specific sequence constraints. The dearth of such analyses for unicellular eukaryotes prompted us to assess genomic variation in Paramecium, one of the most well-studied ciliate genera. The Paramecium aurelia complex consists of ∼15 morphologically indistinguishable species that diverged subsequent to two rounds of whole-genome duplications (WGDs, as long as 320 MYA) and possess extremely streamlined genomes. We examine patterns of both nuclear and mitochondrial polymorphism, by sequencing whole genomes of 10-13 worldwide isolates of each of three species belonging to the P. aurelia complex: P. tetraurelia, P. biaurelia, P. sexaurelia, as well as two outgroup species that do not share the WGDs: P. caudatum and P. multimicronucleatum. An apparent absence of global geographic population structure suggests continuous or recent dispersal of Paramecium over long distances. Intergenic regions are highly constrained relative to coding sequences, especially in P. caudatum and P. multimicronucleatum that have shorter intergenic distances. Sequence diversity and divergence are reduced up to ∼100-150 bp both upstream and downstream of genes, suggesting strong constraints imposed by the presence of densely packed regulatory modules. In addition, comparison of sequence variation at non-synonymous and synonymous sites suggests similar recent selective pressures on paralogs within and orthologs across the deeply diverging species. This study presents the first genome-wide population-genomic analysis in ciliates and provides a valuable resource for future studies in evolutionary and functional genetics in Paramecium. © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Insect mitochondrial genomes (mitogenomes) are of great interest in exploring molecular evolution, phylogenetics and population genetics. Only two mitogenomes have been previously released in the insect group Aphididae, which consists of about 5,000 known species including some agricultural, forestry and horticultural pests. Here we report the complete 16,317 bp mitogenome of Cavariella salicicola and two nearly complete mitogenomes of Aphis glycines and Pterocomma pilosum. We also present a first comparative analysis of mitochondrial genomes of aphids. Results showed that aphid mitogenomes share conserved genomic organization, nucleotide and amino acid composition, and codon usage features. All 37 genes usually present in animal mitogenomes were sequenced and annotated. The analysis of gene evolutionary rate revealed the lowest and highest rates for COI and ATP8, respectively. A unique repeat region found exclusively in aphid mitogenomes, which included variable numbers of tandem repeats in a lineage-specific manner, was highlighted for the first time. This region may function as another origin of replication. Phylogenetic reconstructions based on protein-coding genes and the stem-loop structures of control regions confirmed a sister relationship between Cavariella and pterocommatines. Current evidence suggests that pterocommatines could be formally transferred into Macrosiphini. Our paper also offers methodological instructions for obtaining other Aphididae mitochondrial genomes.
Evandro Vagner Tambarussi
The availability of chloroplast genome (cpDNA) sequences of Atropa belladonna, Nicotiana sylvestris, N. tabacum, N. tomentosiformis, Solanum bulbocastanum, S. lycopersicum and S. tuberosum, which are Solanaceae species, allowed us to analyze the organization of cpSSRs in their genic and intergenic regions. In general, the number of cpSSRs in cpDNA ranged from 161 in S. tuberosum to 226 in N. tabacum, and the number of intergenic cpSSRs was higher than that of genic cpSSRs. Mononucleotide repeats were the most frequent in the studied species, but we also identified di-, tri-, tetra-, penta- and hexanucleotide repeats. Multiple alignments of all cpSSR sequences from the Solanaceae species made the identification of nucleotide variability possible, and the phylogeny was estimated by maximum parsimony. Our study showed that the plastome database can be exploited for phylogenetic analysis and biotechnological approaches.
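Simple sequence repeats of the kind counted in this abstract (mono- to hexanucleotide motifs) can be located with a back-referencing regular expression. The sketch below is an illustration only: the minimum repeat counts in `MIN_REPEATS` are assumed values chosen for the example, since the abstract does not state the thresholds or the software the authors used, and the test sequence is hypothetical.

```python
import re

# Assumed thresholds (motif length -> minimum number of tandem copies).
# These are illustrative; the abstract does not report the values used.
MIN_REPEATS = {1: 8, 2: 4, 3: 4, 4: 3, 5: 3, 6: 3}


def find_ssrs(seq, min_repeats=MIN_REPEATS):
    """Return sorted (start, motif, copies) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for unit_len, min_rep in sorted(min_repeats.items()):
        # A motif of unit_len bases followed by at least (min_rep - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            # Skip motifs that are themselves repeats of a shorter unit
            # (e.g. "AA" or "ATAT"), which are reported at the shorter length.
            if any(
                unit == unit[:k] * (unit_len // k)
                for k in range(1, unit_len)
                if unit_len % k == 0
            ):
                continue
            hits.append((match.start(), unit, len(match.group(0)) // unit_len))
    return sorted(hits)


# Hypothetical toy sequence with one A(10) run and one (AT)5 repeat.
toy_seq = "GGC" + "A" * 10 + "CGC" + "AT" * 5 + "GCC"
ssrs = find_ssrs(toy_seq)
print(ssrs)  # [(3, 'A', 10), (16, 'AT', 5)]
```

Because the scan runs left to right, a dinucleotide repeat is reported under whichever phase of the motif is encountered first (for example "TA" rather than "AT" if a T precedes the run), so motifs should be normalized before counts are compared across genomes.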
Canals, Rocio; McClelland, Michael; Santiviago, Carlos A.; Andrews-Polymenis, Helene
Progress in the study of Salmonella survival, colonization, and virulence has increased rapidly with the advent of complete genome sequencing and higher capacity assays for transcriptomic and proteomic analysis. Although many of these techniques have yet to be used to directly assay Salmonella growth on foods, these assays are currently in use to determine Salmonella factors necessary for growth in animal models including livestock animals and in in vitro conditions that mimic many different environments. As sequencing of the Salmonella genome and microarray analysis have revolutionized genomics and transcriptomics of salmonellae over the last decade, so are new high-throughput sequencing technologies currently accelerating the pace of our studies and allowing us to approach complex problems that were not previously experimentally tractable.
Gilchrist, Anthony Stuart; Shearman, Deborah C A; Frommer, Marianne; Raphael, Kathryn A; Deshpande, Nandan P; Wilkins, Marc R; Sherwin, William B; Sved, John A
The tephritid fruit flies include a number of economically important pests of horticulture, with a large accumulated body of research on their biology and control. Amongst the Tephritidae, the genus Bactrocera, containing over 400 species, presents various species groups of potential utility for genetic studies of speciation, behaviour or pest control. In Australia, there exists a triad of closely-related, sympatric Bactrocera species which do not mate in the wild but which, despite distinct morphologies and behaviours, can be force-mated in the laboratory to produce fertile hybrid offspring. To exploit the opportunities offered by genomics, such as the efficient identification of genetic loci central to pest behaviour and to the earliest stages of speciation, investigators require genomic resources for future investigations. We produced a draft de novo genome assembly of Australia's major tephritid pest species, Bactrocera tryoni. The male genome (650-700 Mbp) includes approximately 150 Mb of interspersed repetitive DNA sequences and 60 Mb of satellite DNA. Assessment using conserved core eukaryotic sequences indicated 98% completeness. Over 16,000 MAKER-derived gene models showed a large degree of overlap with other Dipteran reference genomes. The sequence of the ribosomal RNA transcribed unit was also determined. Unscaffolded assemblies of B. neohumeralis and B. jarvisi were then produced; comparison with B. tryoni showed that the species are more closely related than any Drosophila species pair. The similarity of the genomes was exploited to identify 4924 potentially diagnostic indels between the species, all of which occur in non-coding regions. This first draft B. tryoni genome resembles other dipteran genomes in terms of size and putative coding sequences. For all three species included in this study, we have identified a comprehensive set of non-redundant repetitive sequences, including the ribosomal RNA unit, and have quantified the major satellite DNA
BACKGROUND: The genus Camellia, belonging to the family Theaceae, is an economically important group of flowering plants. Frequent interspecific hybridization together with polyploidization has made them taxonomically "difficult taxa". DNA content is often used to measure genome size variation and has greatly advanced our understanding of plant evolution and genome variation. The goals of this study were to investigate patterns of interspecific and intraspecific variation in DNA content and to further explore genome size evolution in a phylogenetic context of the genus. METHODOLOGY/PRINCIPAL FINDINGS: The DNA amount in the genus was determined by propidium iodide flow cytometry analysis for a total of 139 individual plants representing almost all sections of the two subgenera, Camellia and Thea. An improved WPB buffer proved suitable for the Camellia species; it counteracted the negative effects of secondary metabolites and generated high-quality results with low coefficient of variation values (CV <5%). Our results showed that tissue type (flowers, leaves and buds) and cytosolic compounds had only trivial effects on the estimation of DNA amount. The DNA content of C. sinensis var. assamica was estimated to be 1C = 3.01 pg by flow cytometric analysis, which is equal to a genome size of about 2940 Mb. CONCLUSION: Intraspecific and interspecific variations were observed in the genus Camellia, and as expected, the latter was larger than the former. Our study suggests a directional trend of increasing genome size in the genus Camellia, probably owing to frequent polyploidization events.
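The conversion from DNA mass to genome size in this abstract can be reproduced with the widely used factor of 1 pg ≈ 978 Mbp. The abstract does not state which factor the authors applied, so the constant below is an assumption; with it, 3.01 pg gives roughly 2944 Mb, in line with the reported "about 2940 Mb".

```python
# Commonly used conversion factor (assumed here, not stated in the abstract):
# 1 pg of double-stranded DNA corresponds to about 978 Mbp.
MBP_PER_PG = 978.0


def pg_to_mbp(picograms):
    """Convert a 1C DNA content in picograms to genome size in megabase pairs."""
    return picograms * MBP_PER_PG


genome_size_mbp = pg_to_mbp(3.01)
print(round(genome_size_mbp))  # 2944
```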
Fouts, Derrick E; Matthias, Michael A; Adhikarla, Haritha; Adler, Ben; Amorim-Santos, Luciane; Berg, Douglas E; Bulach, Dieter; Buschiazzo, Alejandro; Chang, Yung-Fu; Galloway, Renee L; Haake, David A; Haft, Daniel H; Hartskeerl, Rudy; Ko, Albert I; Levett, Paul N; Matsunaga, James; Mechaly, Ariel E; Monk, Jonathan M; Nascimento, Ana L T; Nelson, Karen E; Palsson, Bernhard; Peacock, Sharon J; Picardeau, Mathieu; Ricaldi, Jessica N; Thaipandungpanit, Janjira; Wunder, Elsio A; Yang, X Frank; Zhang, Jun-Jie; Vinetz, Joseph M
Leptospirosis, caused by spirochetes of the genus Leptospira, is a globally widespread, neglected and emerging zoonotic disease. While whole genome analysis of individual pathogenic, intermediately pathogenic and saprophytic Leptospira species has been reported, comprehensive cross-species genomic comparison of all known species of infectious and non-infectious Leptospira, with the goal of identifying genes related to pathogenesis and mammalian host adaptation, remains a key gap in the field. Infectious Leptospira, comprised of pathogenic and intermediately pathogenic Leptospira, evolutionarily diverged from non-infectious, saprophytic Leptospira, as demonstrated by the following computational biology analyses: 1) the definitive taxonomy and evolutionary relatedness among all known Leptospira species; 2) genomically-predicted metabolic reconstructions that indicate novel adaptation of infectious Leptospira to mammals, including sialic acid biosynthesis, pathogen-specific porphyrin metabolism and the first-time demonstration of cobalamin (B12) autotrophy as a bacterial virulence factor; 3) CRISPR/Cas systems demonstrated only to be present in pathogenic Leptospira, suggesting a potential mechanism for this clade's refractoriness to gene targeting; 4) finding Leptospira pathogen-specific specialized protein secretion systems; 5) novel virulence-related genes/gene families such as the Virulence Modifying (VM) (PF07598 paralogs) proteins and pathogen-specific adhesins; 6) discovery of novel, pathogen-specific protein modification and secretion mechanisms including unique lipoprotein signal peptide motifs, Sec-independent twin arginine protein secretion motifs, and the absence of certain canonical signal recognition particle proteins from all Leptospira; and 7) demonstration of infectious Leptospira-specific signal-responsive gene expression, motility and chemotaxis systems. By identifying large scale changes in infectious (pathogenic and intermediately pathogenic
Park, Inkyu; Kim, Wook-Jin; Yang, Sungyu; Yeo, Sang-Min; Li, Hulin; Moon, Byeong Cheol
Aconitum species (belonging to the Ranunculaceae) are well known herbaceous medicinal ingredients and have great economic value in Asian countries. However, there are still limited genomic resources available for Aconitum species. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms, and were similar to those of other Aconitum species. Comparison of the cp genome structure and gene order with that of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single copy boundary regions. Divergent regions were also identified. In phylogenetic analysis, the position of Aconitum species among the Ranunculaceae was determined using cp genomes of other families in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC-trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for discrimination of Aconitum species.
Kumagai, Masahiko; Tanaka, Tsuyoshi; Ohyanagi, Hajime; Hsing, Yue-Ie C.; Itoh, Takeshi
This chapter summarizes recent data obtained from genome sequencing, annotation projects, and studies on the genome diversity of Oryza sativa and related Oryza species. O. sativa, commonly known as Asian rice, is the first monocot species whose complete genome sequence was deciphered based on physical mapping by an international collaborative effort. This genome, along with its accurate and comprehensive annotation, has become an indispensable foundation for crop genomics and breeding. With the development of innovative sequencing technologies, genomic studies of O. sativa have dramatically increased; in particular, a large number of cultivars and wild accessions have been sequenced and compared with the reference rice genome. Since de novo genome sequencing has become cost-effective, the genome of African cultivated rice, O. glaberrima, has also been determined. Comparative genomic studies have highlighted the independent domestication processes of different rice species, but it also turned out that Asian and African rice share a common gene set that has experienced similar artificial selection. An international project aimed at constructing reference genomes and examining the genome diversity of wild Oryza species is currently underway, and the genomes of some species are publicly available. This project provides a platform for investigations such as the evolution, development, polyploidization, and improvement of crops. Studies on the genomic diversity of Oryza species, including wild species, should provide new insights to solve the problem of growing food demands in the face of rapid climatic changes.
Naushad, Sohail; Barkema, Herman W.; Luby, Christopher; Condas, Larissa A. Z.; Nobrega, Diego B.; Carson, Domonique A.; De Buck, Jeroen
Non-aureus staphylococci (NAS), a heterogeneous group of a large number of species and subspecies, are the most frequently isolated pathogens from intramammary infections in dairy cattle. Phylogenetic relationships among bovine NAS species are controversial and have mostly been determined based on single-gene trees. Herein, we analyzed phylogeny of bovine NAS species using whole-genome sequencing (WGS) of 441 distinct isolates. In addition, evolutionary relationships among bovine NAS were estimated from multilocus data of 16S rRNA, hsp60, rpoB, sodA, and tuf genes and sequences from these and numerous other single genes/proteins. All phylogenies were created with FastTree, Maximum-Likelihood, Maximum-Parsimony, and Neighbor-Joining methods. Regardless of methodology, WGS-trees clearly separated bovine NAS species into five monophyletic coherent clades. Furthermore, there were consistent interspecies relationships within clades in all WGS phylogenetic reconstructions. Except for the Maximum-Parsimony tree, multilocus data analysis similarly produced five clades. There were large variations in determining clades and interspecies relationships in single gene/protein trees, under different methods of tree constructions, highlighting limitations of using single genes for determining bovine NAS phylogeny. However, based on WGS data, we established a robust phylogeny of bovine NAS species, unaffected by method or model of evolutionary reconstructions. Therefore, it is now possible to determine associations between phylogeny and many biological traits, such as virulence, antimicrobial resistance, environmental niche, geographical distribution, and host specificity. PMID:28066335
Soares, René Arderius; Passaglia, Luciane Maria Pereira
Bradyrhizobium elkanii is successfully used in the formulation of commercial inoculants and, together with B. japonicum, it fully supplies the plant nitrogen demands. Despite the similarity between B. japonicum and B. elkanii species, several works have demonstrated genetic and physiological differences between them. In this work, Representational Difference Analysis (RDA) was used for genomic comparison between B. elkanii SEMIA 587, a crop inoculant strain, and B. japonicum USDA 110, a reference strain. Two hundred sequences were obtained. Of these, 46 sequences belonged exclusively to the genome of the B. elkanii strain, and 154 showed similarity to sequences from the B. japonicum genome. Of the 46 sequences with no similarity to sequences from B. japonicum, 39 showed no similarity to sequences in public databases and seven showed similarity to sequences of genes coding for known proteins. These seven sequences were divided into three groups: similar to sequences from other Bradyrhizobium strains, similar to sequences from other nitrogen-fixing bacteria, and similar to sequences from non-nitrogen-fixing bacteria. These new sequences could be used as DNA markers to investigate the rates of genetic material gain and loss in natural Bradyrhizobium strains.
Cornelius M. Kyalo
Streptocarpus teitensis (Gesneriaceae) is an endemic species listed as critically endangered in the International Union for Conservation of Nature (IUCN) Red List of Threatened Species. However, the sequence and genome information available for this species remains limited. In this article, we present the complete chloroplast genome structure of Streptocarpus teitensis and its evolution inferred through comparative studies with other related species. S. teitensis displayed a chloroplast genome size of 153,207 bp, comprising a pair of inverted repeats (IR) of 25,402 bp each, separated by small and large single-copy (SSC and LSC) regions of 18,300 and 84,103 bp, respectively. The chloroplast genome was observed to contain 116 unique genes, of which 80 are protein-coding, 32 are transfer RNAs, and four are ribosomal RNAs. In addition, a total of 196 SSR markers were detected in the chloroplast genome of Streptocarpus teitensis, with mononucleotides (57.1%) being the majority, followed by trinucleotides (33.2%), dinucleotides and tetranucleotides (both 4.1%), and pentanucleotides being the least frequent (1.5%). Genome alignment indicated that this genome was comparable to other sequenced members of order Lamiales. The phylogenetic analysis suggested that Streptocarpus teitensis is closely related to Lysionotus pauciflorus and Dorcoceras hygrometricum.
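The figures reported in the abstract above can be cross-checked with simple arithmetic. The sketch below is only an illustration, assuming the usual quadripartite plastome layout (one LSC, one SSC and two IR copies); the variable and function names are hypothetical and not taken from the study.

```python
# Values copied from the abstract above; all names here are illustrative.
LSC_BP = 84_103          # large single-copy region
SSC_BP = 18_300          # small single-copy region
IR_BP = 25_402           # one inverted-repeat copy
REPORTED_TOTAL_BP = 153_207


def quadripartite_length(lsc: int, ssc: int, ir: int) -> int:
    """Length of a circular plastome made of LSC + SSC + two IR copies."""
    return lsc + ssc + 2 * ir


total_bp = quadripartite_length(LSC_BP, SSC_BP, IR_BP)

# Unique gene counts reported: protein-coding, tRNA, rRNA.
gene_total = 80 + 32 + 4

# SSR class shares (percent) as reported for the 196 SSR markers.
ssr_percent = {"mono": 57.1, "tri": 33.2, "di": 4.1, "tetra": 4.1, "penta": 1.5}
ssr_total = round(sum(ssr_percent.values()), 1)

print(total_bp == REPORTED_TOTAL_BP, gene_total, ssr_total)
```

The region lengths sum to the reported genome size, the gene classes sum to the 116 unique genes, and the SSR shares sum to 100%.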
Background: The Rickettsia genus includes 25 validated species, 17 of which are proven human pathogens. Among these, the pathogenicity varies greatly, from the highly virulent R. prowazekii, which causes epidemic typhus and kills its arthropod host, to the mild pathogen R. africae, the agent of African tick-bite fever, which does not affect the fitness of its tick vector. Results: We evaluated the clonality of R. africae in 70 patients and 155 ticks, and determined its genome sequence, which comprises a circular chromosome of 1,278,540 bp including a tra operon and an unstable 12,377-bp plasmid. To study the genetic characteristics associated with virulence, we compared this species to R. prowazekii, R. rickettsii and R. conorii. R. africae and R. prowazekii have, respectively, the least and most decayed genomes. Eighteen genes are present only in R. africae, including one with a putative protease domain upregulated at 37°C. Conclusion: Based on these data, we speculate that a loss of regulatory genes causes an increase of virulence of rickettsial species in ticks and mammals. We also speculate that in Rickettsia species virulence is mostly associated with gene loss. The genome sequence was deposited in GenBank under accession number [GenBank: NZ_AAUY01000001].
Tilia is an ecologically and economically important genus in the family Malvaceae. However, no complete plastid genome of Tilia has been sequenced to date, and the taxonomy of Tilia is difficult owing to frequent hybridization and polyploidization. Well-supported interspecific relationships within this genus are not available because the commonly used molecular markers provide few informative sites. We report here the complete plastid genome sequences of four Tilia species determined with Illumina technology. The Tilia plastid genome is 162,653 bp to 162,796 bp in length, encoding 113 unique genes and a total of 130 genes. The gene order and organization of the Tilia plastid genome exhibit the general structure of angiosperms and are very similar to other published plastid genomes of Malvaceae. As in other long-lived tree genera, the sequence divergence among the four Tilia plastid genomes is very low. We also analyzed the nucleotide substitution patterns and the evolution of insertions and deletions in the Tilia plastid genomes. Finally, we built a phylogeny of the four sampled Tilia species with high support using plastid phylogenomics, suggesting that it is an efficient way to resolve the phylogenetic relationships of this genus.
Fu, Peng-Cheng; Zhang, Yan-Zhao; Geng, Hui-Min; Chen, Shi-Long
The chloroplast (cp) genome is useful in plant systematics, genetic diversity analysis, molecular identification and divergence dating. The genus Gentiana contains 362 species, but there are only two valuable complete cp genomes. The purpose of this study is to report the characterization of the complete cp genome of G. lawrencei var. farreri, which is endemic to the Qinghai-Tibetan Plateau (QTP). Using high throughput sequencing technology, we obtained the complete nucleotide sequence of the G. lawrencei var. farreri cp genome. A comparative analysis, including genome differences and gene divergence, was performed with its congeneric species G. straminea. The simple sequence repeats (SSRs) and phylogenetics were studied as well. The cp genome of G. lawrencei var. farreri is a circular molecule of 138,750 bp, containing a pair of 24,653 bp inverted repeats which are separated by small and large single-copy regions of 11,365 and 78,082 bp, respectively. The cp genome contains 130 known genes, including 85 protein coding genes (PCGs), eight ribosomal RNA genes and 37 tRNA genes. Comparative analyses indicated that the G. lawrencei var. farreri cp genome is 10,241 bp shorter than that of its congeneric species G. straminea. Four large gaps were detected that are responsible for 85% of the total sequence loss. Further detailed analyses revealed that 10 PCGs were included in the four gaps that encode nine NADH dehydrogenase subunits. The cp gene content, order and orientation are similar to those of its congeneric species, but with some variation among the PCGs. Three genes, ndhB, ndhF and clpP, have high nonsynonymous to synonymous values. There are 34 SSRs in the G. lawrencei var. farreri cp genome, of which 25 are mononucleotide repeats; no dinucleotide repeats were detected. Comparison with the G. straminea cp genome indicated that five SSRs have length polymorphisms and 23 SSRs are species-specific. The phylogenetic analysis of 48 PCGs from 12 Gentianales taxa cp genomes clearly identified
Ransangan, Julian; Manin, Benny Obrain
Betanodavirus is the causative agent of viral nervous necrosis (VNN), also known as viral encephalopathy and retinopathy, in marine fish. This disease is responsible for most of the mass mortalities that have occurred in marine fish hatcheries in Malaysia. The genome of this virus consists of two positive-sense RNA molecules, RNA1 and RNA2. The RNA1 molecule contains the RdRp gene, which encodes the RNA-dependent RNA polymerase, and the RNA2 molecule contains the Cp gene, which encodes the viral coat protein. In this study, total RNAs were extracted from 32 fish specimens representing the four most cultured marine fish species in Malaysia. The fish specimens were collected from different hatcheries and aquaculture farms in Malaysia. The RNA1 was successfully amplified using three pairs of overlapping PCR primers, whereas the RNA2 was amplified using a single pair of primers. The nucleotide analysis of the RdRp gene revealed that the Betanodavirus isolates in Malaysia were 94.5-99.7% similar to the RGNNV genotype, 79.8-82.1% similar to the SJNNV genotype, 81.5-82.4% similar to the BFNNV genotype and 79.8-80.7% similar to the TPNNV genotype. However, they showed lower similarities to FHV (9.4-14.2%) and BBV (7.2-15.7%). Similarly, the Cp gene analysis revealed that the viruses showed high nucleotide similarity to RGNNV (95.9-99.8%), SJNNV (72.2-77.4%), BFNNV (80.9-83.5%), TPNNV (77.2-78.1%) and TNV (75.1-76.5%). However, as with the RdRp gene, the coat protein gene was highly dissimilar to FHV (3.0%) and BBV (2.6-4.1%). Based on the genome analysis, the Betanodavirus infecting cultured marine fish species in Malaysia belongs to the RGNNV genotype. However, the phylogenetic analysis of the genes revealed that the viruses can be further divided into nine sub-groups. This is to be expected, since various marine fish species of different origins are cultured in Malaysia. Copyright © 2011 Elsevier B.V. All rights reserved.
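The abstract above reports nucleotide similarities as percentages but does not say how they were computed. As a rough sketch, one common definition is percent identity over the aligned, ungapped columns of a pairwise alignment. The function name and the toy sequences below are hypothetical and are not data from the study.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of identical bases over columns where neither sequence has a gap.

    Both inputs must already be aligned (same length, '-' marks a gap).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # skip gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy, made-up sequences: one mismatch in twelve aligned columns.
identity = percent_identity("ATGGCTAGCTAA", "ATGGCTAGTTAA")
print(round(identity, 1))
```

Other conventions exist (for example, counting gaps as mismatches), and they can shift the reported percentages, so values computed this way are only comparable with figures that use the same definition.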
Javed Iqbal Wattoo
Background: DNA barcoding is a novel method of species identification based on nucleotide diversity of conserved sequences. The establishment and refining of plant DNA barcoding systems is more challenging due to high genetic diversity among different species. Therefore, targeting the conserved nuclear transcribed regions would be more reliable for plant scientists to reveal genetic diversity, species discrimination and phylogeny. Methods: In this study, we amplified and sequenced the chloroplast DNA regions (matK+rbcL) of Solanum nigrum, Euphorbia helioscopia and Dalbergia sissoo to study the functional annotation, homology modeling and sequence analysis to allow a more efficient utilization of these sequences among different plant species. These three species represent three families: Solanaceae, Euphorbiaceae and Fabaceae, respectively. Biological sequence homology and divergence of amplified sequences were studied using the Basic Local Alignment Search Tool (BLAST). Results: Both primers (matK+rbcL) showed good amplification in the three species. The sequenced regions revealed conserved genome information for future identification of different medicinal plants belonging to these species. The amplified conserved barcodes revealed different levels of biological homology after sequence analysis. The results clearly showed that the use of these conserved DNA sequences as barcode primers would be an accurate way for species identification and discrimination. Conclusion: The amplification and sequencing of conserved genome regions identified a novel sequence of matK in native species of Solanum nigrum. The findings of the study would be applicable in the medicinal industry to establish DNA-based identification of different medicinal plant species to monitor adulteration.
Cao, Yunpeng; Meng, Dandan; Abdullah, Muhammad; Jin, Qing; Lin, Yi; Cai, Yongping
The VQ motif-containing gene, a member of the plant-specific genes, is involved in the plant developmental process and various stress responses. The VQ motif-containing gene family has been studied in several plants, such as rice (Oryza sativa), maize (Zea mays), and Arabidopsis (Arabidopsis thaliana). However, no systematic study has been performed in Pyrus species, which have important economic value. In our study, we identified 41 and 28 VQ motif-containing genes in Pyrus bretschneideri and Pyrus communis, respectively. Phylogenetic trees were calculated using A. thaliana and O. sativa VQ motif-containing genes as a template, allowing us to categorize these genes into nine subfamilies. Thirty-two and eight paralogs of VQ motif-containing genes were found in P. bretschneideri and P. communis, respectively, showing that the VQ motif-containing genes had a more remarkable expansion in P. bretschneideri than in P. communis. A total of 31 orthologous pairs were identified from the P. bretschneideri and P. communis VQ motif-containing genes. Additionally, among the paralogs, we found that these duplicated gene pairs probably derived from segmental duplication/whole-genome duplication (WGD) events in the genomes of P. bretschneideri and P. communis. The gene expression profiles in both P. bretschneideri and P. communis fruits suggested functional redundancy for some orthologous gene pairs derived from a common ancestry, and sub-functionalization or neo-functionalization for some of them. Our study provided the first systematic evolutionary analysis of the VQ motif-containing genes in Pyrus, and highlighted the diversification and duplication of VQ motif-containing genes in both P. bretschneideri and P. communis.
Thao Thi Nguyen
Type VI secretion system (T6SS) has been discovered in a variety of gram-negative bacteria as a versatile weapon to stimulate the killing of eukaryotic cells or prokaryotic competitors. Type VI secretion effectors (T6SEs) are well known as key virulence factors for important pathogenic bacteria. In many Burkholderia species, T6SS has evolved as the most complicated secretion pathway with distinguished types to translocate diverse T6SEs, suggesting their essential roles in this genus. Here we attempted to detect and characterize T6SSs and potential T6SEs in target genomes of plant-associated and environmental Burkholderia species based on computational analyses. In total, 66 potential functional T6SS clusters were found in 30 target Burkholderia bacterial genomes, of which 33% possess three or four clusters. The core proteins in each cluster were specified and phylogenetic trees of three components (i.e., TssC, TssD, TssL) were constructed to elucidate the relationship among the identified T6SS clusters. Next, we identified 322 potential T6SEs in the target genomes based on homology searches and explored the important domains conserved in effector candidates. In addition, using the screening approach based on the profile hidden Markov model (pHMM) of T6SEs that possess markers for type VI effectors (MIX motif) (MIX T6SEs), 57 revealed proteins that were not included in training datasets were recognized as novel MIX T6SE candidates from the Burkholderia species. This approach could be useful to identify potential T6SEs from other bacterial genomes.
Plant type III polyketide synthase (PKS) can catalyse the formation of a series of secondary metabolites with different structures and different biological functions; the enzyme plays an important role in plant growth, development and resistance to stress. At present, the PKS gene has been identified and studied in a variety of plants. Here, we identified 11 PKS genes from upland cotton (Gossypium hirsutum) and compared them with 41 PKS genes in Populus tremula, Vitis vinifera, Malus domestica and Arabidopsis thaliana. According to the phylogenetic tree, a total of 52 PKS genes can be divided into four subfamilies (I–IV). The analysis of gene structures and conserved motifs revealed that most of the PKS genes were composed of two exons and one intron and that there are two characteristic conserved domains (Chal_sti_synt_N and Chal_sti_synt_C) of the PKS gene family. In our study of the five species, gene duplication was found in addition to Arabidopsis thaliana and we determined that purifying selection has been of great significance in maintaining the function of the PKS gene family. From qRT-PCR analysis combined with the role of the accumulation of proanthocyanidins (PAs) in brown cotton fibers, we concluded that five PKS genes are candidate genes involved in brown cotton fiber pigment synthesis. These results are important for the further study of brown cotton PKS genes. They not only reveal the relationship between the PKS gene family and pigment in brown cotton, but also create conditions for improving the quality of brown cotton fiber.
Hornett, Emily A
Background: How well does RNA-Seq data perform for quantitative whole gene expression analysis in the absence of a genome? This is one unanswered question facing the rapidly growing number of researchers studying non-model species. Using Homo sapiens data and resources, we compared the direct mapping of sequencing reads to predicted genes from the genome with mapping to de novo transcriptomes assembled from RNA-Seq data. Gene coverage and expression analysis was further investigated in the non-model context by using increasingly divergent genomic reference species to group assembled contigs by unique genes. Results: Eight transcriptome sets, composed of varying amounts of Illumina and 454 data, were assembled and assessed. Hybrid 454/Illumina assemblies had the highest transcriptome and individual gene coverage. Quantitative whole gene expression levels were highly similar between using a de novo hybrid assembly and the predicted genes as a scaffold, although mapping to the de novo transcriptome assembly provided data on fewer genes. Using non-target species as reference scaffolds does result in some loss of sequence and expression data, and bias and error increase with evolutionary distance. However, within a 100 million year window these effect sizes are relatively small. Conclusions: Predicted gene sets from sequenced genomes of related species can provide a powerful method for grouping RNA-Seq reads and annotating contigs. Gene expression results can be produced that are similar to results obtained using gene models derived from a high quality genome, though biased towards conserved genes. Our results demonstrate the power and limitations of conducting RNA-Seq in non-model species.
Yasuike, Motoshige; Nishiki, Issei; Iwasaki, Yuki; Nakamura, Yoji; Fujiwara, Atushi; Shimahara, Yoshiko; Kamaishi, Takashi; Yoshida, Terutoyo; Nagai, Satoshi; Kobayashi, Takanori; Katoh, Masaya
Nocardiosis caused by Nocardia seriolae is one of the major threats in the aquaculture of Seriola species (yellowtail, S. quinqueradiata; amberjack, S. dumerili; and kingfish, S. lalandi) in Japan. Here, we report the complete nucleotide genome sequence of N. seriolae UTF1, isolated from a cultured yellowtail. The genome is a circular chromosome of 8,121,733 bp with a G+C content of 68.1% that encodes 7,697 predicted proteins. In the N. seriolae UTF1 predicted genes, we found orthologs of virulence factors of pathogenic mycobacteria and human clinical Nocardia isolates involved in host cell invasion, modulation of phagocyte function and survival inside the macrophages. The virulence factor candidates provide an essential basis for the fish nocardiosis research community to understand their pathogenic mechanisms at the molecular level in future studies. We also found many potential antibiotic resistance genes on the N. seriolae UTF1 chromosome. Comparative analysis with the four existing complete genomes, N. farcinica IFM 10152, N. brasiliensis HUJEG-1, N. cyriacigeorgica GUH-2 and N. nova SH22a, revealed that 2,745 orthologous genes were present in all five Nocardia genomes (core genes) and 1,982 genes were unique to N. seriolae UTF1. In particular, the N. seriolae UTF1 genome contains a greater number of mobile elements and genes of unknown function that comprise the differences in structure and gene content from the other Nocardia genomes. In addition, many of the N. seriolae UTF1-specific genes were assigned to the ABC transport system. Because of limited resources in ocean environments, these N. seriolae UTF1-specific ABC transporters might facilitate adaptation strategies essential for marine environment survival. Thus, the availability of the complete N. seriolae UTF1 genome sequence will provide a valuable resource for comparative genomic studies of N. seriolae isolates, as well as provide new insights into the ecological and functional diversity of
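The notions of "core genes" (orthologs present in every genome) and genes "unique" to one genome, as used in the abstract above, correspond to plain set operations once genes have been grouped into ortholog clusters. The sketch below is only an illustration of that idea: the ortholog-group identifiers and genome contents are invented toy values, the function name is hypothetical, and the abstract does not describe the pipeline the authors actually used.

```python
def core_and_unique(genomes: dict[str, set[str]], focal: str) -> tuple[set[str], set[str]]:
    """Return (core, unique) ortholog groups.

    core   : groups present in every genome
    unique : groups present in the focal genome and in no other genome
    """
    core = set.intersection(*genomes.values())
    others = set().union(*(groups for name, groups in genomes.items() if name != focal))
    unique = genomes[focal] - others
    return core, unique


# Toy input: ortholog-group IDs per genome (made up for illustration).
toy_genomes = {
    "N_seriolae": {"og1", "og2", "og3", "og7"},
    "N_farcinica": {"og1", "og2", "og4"},
    "N_nova": {"og1", "og2", "og3", "og5"},
}

core, unique = core_and_unique(toy_genomes, focal="N_seriolae")
print(sorted(core), sorted(unique))
```

In practice the hard part is the upstream ortholog clustering; the set arithmetic here only summarises its result.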
Voloch, Carolina M; Capellão, Renata T; Mello, Beatriz; Schrago, Carlos G
Lyssavirus is a diverse genus of viruses that infect a variety of mammalian hosts, typically causing encephalitis. The evolution of this lineage, particularly the rabies virus, has been a focus of research because of the extensive occurrence of cross-species transmission, and the distinctive geographical patterns present throughout the diversification of these viruses. Although numerous studies have examined pattern-related questions concerning Lyssavirus evolution, analyses of the evolutionary processes acting on Lyssavirus diversification are scarce. To clarify the relevance of positive natural selection in Lyssavirus diversification, we conducted a comprehensive scan for episodic diversifying selection across all lineages and codon sites of the five coding regions in lyssavirus genomes. Although the genomes of these viruses are generally conserved, the glycoprotein (G), RNA-dependent RNA polymerase (L) and polymerase (P) genes were frequently targets of adaptive evolution during the diversification of the genus. Adaptive evolution is particularly manifest in the glycoprotein gene, which was inferred to have experienced the highest density of positively selected codon sites along branches. Substitutions in the L gene were found to be associated with the early diversification of phylogroups. A comparison between the number of positively selected sites inferred along RABV population branches and Lyssavirus interspecies branches suggested that the occurrence of positive selection was similar on the five coding regions of the genome in both groups.
Vongsangnak, Wanwipa; Klanchui, Amornpan; Tawornsamretkit, Iyarest; Tatiyaborwornchai, Witthawin; Laoteng, Kobkul; Meechai, Asawin
We present a novel genome-scale metabolic model iWV1213 of Mucor circinelloides, which is an oleaginous fungus for industrial applications. The model contains 1213 genes, 1413 metabolites and 1326 metabolic reactions across different compartments. We demonstrate that iWV1213 is able to accurately predict the growth rates of M. circinelloides on various nutrient sources and culture conditions using Flux Balance Analysis and Phenotypic Phase Plane analysis. Comparative analysis of three oleaginous genome-scale models, including M. circinelloides (iWV1213), Mortierella alpina (iCY1106) and Yarrowia lipolytica (iYL619_PCP) revealed that iWV1213 possesses a higher number of genes involved in carbohydrate, amino acid, and lipid metabolisms that might contribute to its versatility in nutrient utilization. Moreover, the identification of unique and common active reactions among the Zygomycetes oleaginous models using Flux Variability Analysis unveiled a set of gene/enzyme candidates as metabolic engineering targets for cellular improvement. Thus, iWV1213 offers a powerful metabolic engineering tool for multi-level omics analysis, enabling strain optimization as a cell factory platform of lipid-based production. Copyright © 2016 Elsevier B.V. All rights reserved.
Voolstra, Christian R.; Li, Yong; Liew, Yi Jin; Baumgarten, Sebastian; Zoccola, Didier; Flot, Jean-François; Tambutté, Sylvie; Allemand, Denis; Aranda, Manuel
Stony corals form the foundation of coral reef ecosystems. Their phylogeny is characterized by a deep evolutionary divergence that separates corals into a robust and complex clade dating back to at least 245 mya. However, the genomic consequences and clade-specific evolution remain unexplored. In this study we have produced the genome of a robust coral, Stylophora pistillata, and compared it to the available genome of a complex coral, Acropora digitifera. We conducted a fine-scale gene-based analysis focusing on ortholog groups. Among the core set of conserved proteins, we found an emphasis on processes related to the cnidarian-dinoflagellate symbiosis. Genes associated with the algal symbiosis were also independently expanded in both species, but both corals diverged on the identity of ortholog groups expanded, and we found uneven expansions in genes associated with innate immunity and stress response. Our analyses demonstrate that coral genomes can be surprisingly disparate. Future analyses incorporating more genomic data should be able to determine whether the patterns elucidated here are not only characteristic of the differences between S. pistillata and A. digitifera but also representative of corals from the robust and complex clade at large.
Campanaro, Stefano; Treu, Laura; Vendramin, Veronica
species after 30 days of incubation and made it possible to identify those species that are able to grow in that extreme environment. The genome sequence of Lactobacillus fabifermentans, one of the dominant species identified, was then analyzed using shotgun sequencing and comparative genomics....... The results revealed that it is one of the largest genomes among the Lactobacillus sequenced and is characterized by a large number of genes involved in carbohydrate utilization and in the regulation of gene expression. The genome was shaped through a large number of gene duplication events, while lateral...... gene transfer contributed to a lesser extent with respect to other Lactobacillus species. According to genomic analysis, its carbohydrate utilization pattern and ability to form biofilm are the main genetic traits linked to the adaptation the species underwent permitting it to grow in fermenting grape...
Morin, Phillip A; Archer, Frederick I.; Foote, Andrew David
Killer whales (Orcinus orca) currently comprise a single, cosmopolitan species with a diverse diet. However, studies over the last 30 yr have revealed populations of sympatric "ecotypes" with discrete prey preferences, morphology, and behaviors. Although these ecotypes avoid social interactions...... and are not known to interbreed, genetic studies to date have found extremely low levels of diversity in the mitochondrial control region, and few clear phylogeographic patterns worldwide. This low level of diversity is likely due to low mitochondrial mutation rates that are common to cetaceans. Using killer whales...... as a case study, we have developed a method to readily sequence, assemble, and analyze complete mitochondrial genomes from large numbers of samples to more accurately assess phylogeography and estimate divergence times. This represents an important tool for wildlife management, not only for killer whales...
Some mammals breed throughout the year, while others breed only at certain times of year. These differences in reproductive behavior can be explained by evolution. We identified positively-selected genes in two sets of species with different degrees of relatedness including seasonal and non-seasonal breeding species, using branch-site models. After stringent filtering by sum of pairs scoring, we revealed that more genes underwent positive selection in seasonal compared with non-seasonal breeding species. Positively-selected genes were verified by cDNA mapping of the positive sites with the corresponding cDNA sequences. The design of the evolutionary analysis can effectively lower the false-positive rate and thus identify valid positive genes. Validated, positively-selected genes, including CGA, DNAH1, INVS, and CD151, were related to reproductive behaviors such as spermatogenesis and cell proliferation in non-seasonal breeding species. Genes in seasonal breeding species, including THRAP3, TH1L, and CMTM6, may be related to the evolution of sperm and the circadian rhythm system. Identification of these positively-selected genes might help to identify the molecular mechanisms underlying seasonal and non-seasonal reproductive behaviors.
Imoto, Junichi M; Saitoh, Kenji; Sasaki, Takeshi; Yonezawa, Takahiro; Adachi, Jun; Kartavtsev, Yuri P; Miya, Masaki; Nishida, Mutsumi; Hanzawa, Naoto
The distribution of freshwater taxa is a good biogeographic model for studying the patterns and processes of vicariance and dispersal. The subfamily Leuciscinae (Cyprinidae, Teleostei) consists of many species distributed widely in Eurasia and North America. Leuciscinae have been divided into two phyletic groups, leuciscin and phoxinin. The phylogenetic relationships between major clades within the subfamily are poorly understood, largely because of the overwhelming diversity of the group. The origin of the Far Eastern phoxinin is an interesting question regarding the evolutionary history of Leuciscinae. Here we present a phylogenetic analysis of 31 species of Leuciscinae and outgroups based on complete mitochondrial genome sequences to clarify the phylogenetic relationships and to infer the evolutionary history of the subfamily. Phylogenetic analysis suggests that the Far Eastern phoxinin species comprised the monophyletic clades Tribolodon, Pseudaspius, Oreoleuciscus and Far Eastern Phoxinus. The Far Eastern phoxinin clade was independent of other Leuciscinae lineages and was closer to North American phoxinins than to European leuciscins. All of our analyses also suggested that leuciscins and phoxinins each constituted monophyletic groups. Divergence time estimation suggested that Leuciscinae diverged from outgroups such as Tincinae 83.3 million years ago (Mya) in the Late Cretaceous, and that leuciscins and phoxinins shared a common ancestor 70.7 Mya. Radiation of Leuciscinae lineages occurred during the Late Cretaceous to Paleocene. This period also witnessed the radiation of tetrapods. Reconstruction of ancestral areas indicates that Leuciscinae species originated within Europe. Leuciscin species evolved in Europe and the ancestor of phoxinin was distributed in North America. The Far Eastern phoxinins would have dispersed from North America to the Far East across the Beringia land bridge. The present study suggests important roles for the continental rearrangements during the
Background: Reconstructing the evolutionary history of organisms using traditional phylogenetic methods may suffer from inaccurate sequence alignment. An alternative approach, particularly effective when whole genome sequences are available, is to employ methods that don’t use explicit sequence alignments. We extend a novel phylogenetic method based on Singular Value Decomposition (SVD) to reconstruct the phylogeny of 12 sequenced Drosophila species. SVD analysis provides accurate comparisons for a high fraction of sequences within whole genomes without the prior identification of orthologs or homologous sites. With this method all protein sequences are converted to peptide frequency vectors within a matrix that is decomposed to provide simplified vector representations for each protein of the genome in a reduced dimensional space. These vectors are summed together to provide a vector representation for each species, and the angle between these vectors provides distance measures that are used to construct species trees. Results: An unfiltered whole genome analysis (193,622 predicted proteins) strongly supports the currently accepted phylogeny for 12 Drosophila species at higher dimensions except for the generally accepted but difficult to discern sister relationship between D. erecta and D. yakuba. Also, in accordance with previous studies, many sequences appear to support alternative phylogenies. In this case, we observed grouping of D. erecta with D. sechellia when approximately 55% to 95% of the proteins were removed using a filter based on projection values or by reducing resolution by using fewer dimensions. Similar results were obtained when just the melanogaster subgroup was analyzed. Conclusions: These results indicate that using our novel phylogenetic method, it is possible to consult and interpret all predicted protein sequences within multiple whole genomes to produce accurate phylogenetic estimations of relatedness between
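The vector steps described in this record can be illustrated with a minimal Python sketch. It is not the authors' implementation: it deliberately omits the SVD dimensionality-reduction step (which would need a linear algebra library) and works directly on raw peptide counts. The function names, the choice of dipeptides (k=2), and the toy sequences are all assumptions made for illustration.

```python
import math
from collections import Counter
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def peptide_vector(protein, k=2):
    """Count overlapping k-peptides in one protein (hypothetical helper)."""
    counts = Counter(protein[i:i + k] for i in range(len(protein) - k + 1))
    # Fixed ordering over all possible k-peptides so vectors are comparable.
    return [counts.get("".join(p), 0) for p in product(AMINO_ACIDS, repeat=k)]

def species_vector(proteins, k=2):
    """Sum the per-protein vectors to get one vector per species."""
    total = [0] * (len(AMINO_ACIDS) ** k)
    for protein in proteins:
        for i, value in enumerate(peptide_vector(protein, k)):
            total[i] += value
    return total

def angle_between(u, v):
    """Angle in radians between two non-zero vectors, used as a distance."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cosine = max(-1.0, min(1.0, dot / norm))  # guard against rounding error
    return math.acos(cosine)

# Toy proteomes (made up); real input would be all predicted proteins.
species_a = species_vector(["MKVLAA", "MKVLGA"])
species_b = species_vector(["MKVLAA", "MKILGA"])
distance = angle_between(species_a, species_b)
```

A pairwise matrix of such angles could then be passed to any distance-based tree-building method; in the method described above, the vectors would first be projected into the reduced space obtained from the SVD.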
Varghese, Neha J; Mukherjee, Supratim; Ivanova, Natalia; Konstantinidis, Konstantinos T; Mavrommatis, Kostas; Kyrpides, Nikos C; Pati, Amrita
Increased sequencing of microbial genomes has revealed that prevailing prokaryotic species assignments can be inconsistent with whole genome information for a significant number of species. The long-standing need for a systematic and scalable species assignment technique can be met by the genome-wide Average Nucleotide Identity (gANI) metric, which is widely acknowledged as a robust measure of genomic relatedness. In this work, we demonstrate that the combination of gANI and the alignment fraction (AF) between two genomes accurately reflects their genomic relatedness. We introduce an efficient implementation of AF,gANI and discuss its successful application to 86.5M genome pairs between 13,151 prokaryotic genomes assigned to 3032 species. Subsequently, by comparing the genome clusters obtained from complete linkage clustering of these pairs to existing taxonomy, we observed that nearly 18% of all prokaryotic species suffer from anomalies in species definition. Our results can be used to explore central questions such as whether microorganisms form a continuum of genetic diversity or distinct species represented by distinct genetic signatures. We propose that this precise and objective AF,gANI-based species definition: the MiSI (Microbial Species Identifier) method, be used to address previous inconsistencies in species classification and as the primary guide for new taxonomic species assignment, supplemented by the traditional polyphasic approach, as required. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.
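As a rough illustration of how the two quantities in this record fit together, the Python sketch below computes an alignment fraction and a length-weighted average identity from a list of bidirectional best hits. The formulas are a common formulation and an assumption here, since the abstract does not spell them out; the function names and the cut-off values passed to `same_species` are placeholders, not values taken from the abstract.

```python
def af_and_gani(bbh_hits, total_gene_length):
    """Return (AF, gANI) for one genome compared against another.

    bbh_hits: list of (percent_identity, aligned_length) tuples, one per
              bidirectional best hit (hypothetical input format).
    total_gene_length: summed length of all genes in the query genome.
    """
    aligned = sum(length for _, length in bbh_hits)
    if aligned == 0:
        return 0.0, 0.0
    # gANI: identity averaged over hits, weighted by aligned length.
    gani = sum(identity * length for identity, length in bbh_hits) / aligned
    # AF: share of the genome's gene content that takes part in hits.
    af = aligned / total_gene_length
    return af, gani

def same_species(af, gani, min_af=0.6, min_gani=96.5):
    """Apply both cut-offs together; the default values are illustrative."""
    return af >= min_af and gani >= min_gani

hits = [(98.0, 900), (96.0, 300)]  # toy data
af, gani = af_and_gani(hits, total_gene_length=1500)
```

Requiring both conditions reflects the point made in the abstract that identity alone is not enough: two genomes can share a few highly similar genes while most of their content does not align.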
Crkvenjakov, R.; Drmanac, R.
A genome is the sum total of the DNA sequences in the cells of an individual organism. The common usage that species possess genomes comes naturally to biochemists, who have shown that all protein and nucleic acid molecules are at the same time species- and individual-specific, with minor individual variations being superimposed on a consensus sequence that is constant for a species. By extension, this property is attributed to the common features of DNA in the chromosomes of members of a given species and is called species genome. Our proposal for the definition of a biological species is as follows: A species comprises a group of actual and potential biological organisms built according to a unique genome program that is recorded, and at least in part expressed, in the structures of their genomic nucleic acid molecule(s), having intragroup sequence differences which can be fully interconverted in the process of organismal reproduction.
Zhong, Hua-Ming; Zhang, Hong-Hai; Sha, Wei-Lai; Zhang, Cheng-De; Chen, Yu-Cai
The whole mitochondrial genome sequence of the red fox (Vulpes vulpes) was determined. It had a total length of 16 723 bp. As in most mammalian mitochondrial genomes, it contained 13 protein-coding genes, two ribosomal RNA genes, 22 transfer RNA genes and one control region. The base composition was 31.3% A, 26.1% C, 14.8% G and 27.8% T. The codon usage of red fox, arctic fox, gray wolf, domestic dog and coyote followed the same pattern except for an unusual ATT start codon, which initiates the NADH dehydrogenase subunit 3 gene in the red fox. A long tandem repeat rich in AC was found between conserved sequence blocks 1 and 2 in the control region. In order to confirm the phylogenetic relationships of red fox to other canids, phylogenetic trees were reconstructed by neighbor-joining and maximum parsimony methods using 12 concatenated heavy-strand protein-coding genes. The result indicated that arctic fox was the sister group of red fox and that both belong to the red fox-like clade in the family Canidae, while gray wolf, domestic dog and coyote belong to the wolf-like clade. The result was in accordance with existing phylogenetic results.
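Base composition figures like those reported in this record (which sum to 100.0%) can be reproduced from a sequence with a few lines of Python. This is a minimal sketch; the function name and the toy sequence are assumptions, and a real analysis would read the genome from a FASTA or GenBank file.

```python
from collections import Counter

def base_composition(sequence):
    """Percentage of A, C, G and T among unambiguous bases."""
    counts = Counter(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}

composition = base_composition("AACGTTnA")  # toy input; 'n' is ignored
```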
VNTR numbers that occurred over the course of one year. Conclusions The comparative genomic analysis of the SAG clarifies the phylogenetics of these bacteria and supports the distinct species classification. Numerous potential virulence determinants were identified and provide a foundation for further studies into SAG pathogenesis. Furthermore, the data may be used to enable the development of rapid diagnostic assays and therapeutics for these pathogens. PMID:24341328
Luis E. Rojas
Methods for extracting genomic DNA are usually laborious, and their use of organic solvents (chloroform and phenol) makes them hazardous to human health and the environment. In this work a protocol for in situ extraction of genomic DNA by alkaline lysis is validated. It was used to amplify regions of DNA in four species of plants and fungi by polymerase chain reaction (PCR). From plant material of Saccharum officinarum L., Carica papaya L. and Digitalis purpurea L. it was possible to amplify different regions of the genome by PCR. Furthermore, it was possible to amplify a fragment of the avr-4 gene from DNA purified from lyophilized mycelium of Mycosphaerella fijiensis. Additionally, it was possible to amplify a region of the ap24 transgene inserted into the genome of banana cv. 'Grande naine' (Musa AAA). Key words: alkaline lysis, Carica papaya L., Digitalis purpurea L., Musa, Saccharum officinarum L.
Kim, Jae-Heup; Antunes, Agostinho; Luo, Shu-Jin; Menninger, Joan; Nash, William G; O'Brien, Stephen J; Johnson, Warren E
Translocation of cymtDNA into the nuclear genome, also referred to as numt, has been reported in many species, including several closely related to the domestic cat (Felis catus). We describe the recent transposition of 12,536 bp of the 17 kb mitochondrial genome into the nucleus of the common ancestor of the five Panthera genus species: tiger, P. tigris; snow leopard, P. uncia; jaguar, P. onca; leopard, P. pardus; and lion, P. leo. This nuclear integration, representing 74% of the mitochondrial genome, is one of the largest to be reported in eukaryotes. The Panthera genus numt differs from the numt previously described in the Felis genus in: (1) chromosomal location (F2-telomeric region vs. D2-centromeric region), (2) gene make up (from the ND5 to the ATP8 vs. from the CR to the COII), (3) size (12.5 vs. 7.9 kb), and (4) structure (single monomer vs. tandemly repeated in Felis). These distinctions indicate that the origin of this large numt fragment in the nuclear genome of the Panthera species is an independent insertion from that of the domestic cat lineage, which has been further supported by phylogenetic analyses. The tiger cymtDNA shared around 90% sequence identity with the homologous numt sequence, suggesting an origin for the Panthera numt at around 3.5 million years ago, prior to the radiation of the five extant Panthera species.
Crkvenjakov, R.; Drmanac, R.
A genome is the sum total of the DNA sequences in the cells of an individual organism. The common usage that species possess genomes comes naturally to biochemists, who have shown that all protein and nucleic acid molecules are at the same time species- and individual-specific, with minor individual variations being superimposed on a consensus sequence that is constant for a species. By extension, this property is attributed to the common features of DNA in the chromosomes of members of a given species and is called (species) genome. The definition of species based on chromosomes, genes, or genome common to its member organisms has been implied or mentioned in passing numerous times. Some population biologists think that members of species have similar "homeostatic genotypes," which are to a degree resistant to mutation or environmental change in the production of a basic phenotype.
Wu, Jian; Peng, Zhen; Liu, Songyu; He, Yanjun; Cheng, Lin; Kong, Fuling; Wang, Jie; Lu, Gang
Auxin plays key roles in a wide variety of plant activities, including embryo development, leaf formation, phototropism, fruit development and root initiation and development. Auxin/indoleacetic acid (Aux/IAA) genes, encoding short-lived nuclear proteins, are key regulators in the auxin transduction pathway. But how they work is still unknown. In order to conduct a systematic analysis of this gene family in Solanaceae species, a genome-wide search for the homologues of auxin response genes was carried out. Here, 26 and 27 non-redundant AUX/IAAs were identified in tomato and potato, respectively. Using tomato as a model, a comprehensive overview of the SlIAA gene family is presented, including the gene structures, phylogeny, chromosome locations, conserved motifs and cis-elements in promoter sequences. A phylogenetic tree generated from alignments of the predicted protein sequences of 31 OsIAAs, 29 AtIAAs, 31 ZmIAAs, and 26 SlIAAs revealed that these IAAs were clustered into three major groups and ten subgroups. Among them, seven subgroups were present in both monocot and dicot species, which indicated that the major functional diversification within the IAA family predated the monocot/dicot divergence. In contrast, group C and some other subgroups seemed to be species-specific. Quantitative real-time PCR (qRT-PCR) analysis showed that 19 of the 26 SlIAA genes could be detected in all tomato organs/tissues; however, seven of them were specifically expressed in some tomato tissues. The transcript abundance of 17 SlIAA genes was increased within a few hours when the seedlings were treated with exogenous IAA. However, that of six other SlIAAs was decreased. The results of stress treatments showed that most SlIAA family genes responded to at least one of the three stress treatments; however, they exhibited diverse expression levels under different abiotic stress conditions in tomato seedlings. SlIAA20, SlIAA21 and SlIAA22 were not significantly influenced by stress
Díez-Del-Molino, David; Sánchez-Barreiro, Fatima; Barnes, Ian; Gilbert, M Thomas P; Dalén, Love
Many species have undergone dramatic population size declines over the past centuries. Although stochastic genetic processes during and after such declines are thought to elevate the risk of extinction, comparative analyses of genomic data from several endangered species suggest little concordance between genome-wide diversity and current population sizes. This is likely because species-specific life-history traits and ancient bottlenecks overshadow the genetic effect of recent demographic declines. Therefore, we advocate that temporal sampling of genomic data provides a more accurate approach to quantify genetic threats in endangered species. Specifically, genomic data from predecline museum specimens will provide valuable baseline data that enable accurate estimation of recent decreases in genome-wide diversity, increases in inbreeding levels, and accumulation of deleterious genetic variation. Copyright © 2017 Elsevier Ltd. All rights reserved.
Nishibori, M; Tsudzuki, M; Hayashi, T; Yamamoto, Y; Yasue, H
Coturnix chinensis (blue-breasted quail) has been classically grouped in Galliformes Phasianidae Coturnix, based on morphologic features and biochemical evidence. Since the blue-breasted quail has the smallest body size among the species of Galliformes, in addition to a short generation time and an excellent reproductive performance, it is a possible model fowl for breeding and physiological studies of Coturnix japonica (Japanese quail) and Gallus gallus domesticus (chicken), which are classified in the same family as blue-breasted quail. However, since its phylogenetic position in the family Phasianidae has not been determined conclusively, the sequence of the entire blue-breasted quail mitochondrial (mt) genome was obtained to provide genetic information for phylogenetic analysis in the present study. The blue-breasted quail mtDNA was found to be a circular DNA of 16,687 base pairs (bp) with the same genomic structure as the mtDNAs of Japanese quail and chicken, though it is smaller than Japanese quail and chicken mtDNAs by 10 bp and 88 bp, respectively. The sequence identity of all mitochondrial genes, including those for 12S and 16S ribosomal RNAs, between blue-breasted quail and Japanese quail ranged from 84.5% to 93.5%; between blue-breasted quail and chicken, sequence identity ranged from 78.0% to 89.6%. In order to obtain information on the phylogenetic position of blue-breasted quail in Galliformes Phasianidae, the 2,184 bp sequence comprising NADH dehydrogenase subunit 2 and cytochrome b genes available for eight species in Galliformes [Japanese quail, chicken, Gallus varius (green junglefowl), Bambusicola thoracica (Chinese bamboo partridge), Pavo cristatus (Indian peafowl), Perdix perdix (gray partridge), Phasianus colchicus (ring-neck pheasant), and Tympanuchus phasianellus (sharp-tailed grouse)] together with that of Aythya americana (redhead) were examined using a maximum likelihood (ML) method. The ML analyses on the first/second codon positions
Francisco J. Roig
Vibrio vulnificus (Vv) is a multi-host pathogenic species currently subdivided into three biotypes (Bts). The three Bts are human-pathogens, but only Bt2 is also a fish-pathogen, an ability that is conferred by a transferable virulence-plasmid (pVvbt2). Here we present a phylogenomic analysis from the core genome of 80 Vv strains belonging to the three Bts recovered from a wide range of geographical and ecological sources. We have identified five well-supported phylogenetic groups or lineages (L). L1 comprises a mixture of clinical and environmental Bt1 strains, most of them involved in human clinical cases related to raw seafood ingestion. L2 is formed by a mixture of Bt1 and Bt2 strains from various sources, including diseased fish, and is related to the aquaculture industry. L3 is also linked to the aquaculture industry and includes Bt3 strains exclusively, mostly related to wound infections or secondary septicemia after farmed-fish handling. Lastly, L4 and L5 include a few strains of Bt1 associated with specific geographical areas. The phylogenetic trees for ChrI and II are not congruent to one another, which suggests that inter- and/or intra-chromosomal rearrangements have been produced along Vv evolution. Further, the phylogenetic trees for each chromosome and the virulence plasmid were also not congruent, which also suggests that pVvbt2 has been acquired independently by different clones, probably in fish farms. From all these clones, the one with zoonotic capabilities (Bt2-Serovar E) has successfully spread worldwide. Based on these results, we propose a new updated classification of the species based on phylogenetic lineages rather than on Bts, as well as the inclusion of all Bt2 strains in a pathovar with the particular ability to cause fish vibriosis, for which we suggest the name “piscis.”
The species in the genus Oryza, encompassing nine genome types and 23 species, are a rich genetic resource and may have applications in deeper genomic analyses aiming to understand the evolution of plant genomes. With the advancement of next-generation sequencing (NGS) technology, a flood of Oryza species reference genomes and genomic variation information has become available in recent years. This genomic information, combined with the comprehensive phenotypic information that we are accumulating in our Oryzabase, can serve as an excellent genotype-phenotype association resource for analyzing rice functional and structural evolution, and the associated diversity of the Oryza genus. Here we integrate our previous and future phenotypic/habitat information and newly determined genotype information into a united repository, named OryzaGenome, providing the variant information with hyperlinks to Oryzabase. The current version of OryzaGenome includes genotype information of 446 O. rufipogon accessions derived by imputation and of 17 accessions derived by imputation-free deep sequencing. Two variant viewers are implemented: SNP Viewer as a conventional genome browser interface and Variant Table as a text-based browser for precise inspection of each variant one by one. Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all of the closely and distantly related wild Oryza species from the NIG Wild Rice Collection will be available in future releases. All of the resources can be accessed through http://viewer.shigen.info/oryzagenome/.
Lakhwani, Deepika; Pandey, Ashutosh; Dhar, Yogeshwar Vikram; Bag, Sumit Kumar; Trivedi, Prabodh Kumar; Asif, Mehar Hasan
AP2/ERF domain containing transcription factor super family is one of the important regulators in the plant kingdom. The involvement of AP2/ERF family members has been elucidated in various processes associated with plant growth, development as well as in response to hormones, biotic and abiotic stresses. In this study, we carried out genome-wide analysis to identify members of AP2/ERF family in Musa acuminata (A genome) and Musa balbisiana (B genome) and changes leading to neofunctionalisation of genes. Analysis identified 265 and 318 AP2/ERF encoding genes in M. acuminata and M. balbisiana respectively which were further classified into ERF, DREB, AP2, RAV and Soloist groups. Comparative analysis indicated that AP2/ERF family has undergone duplication, loss and divergence during evolution and speciation of the Musa A and B genomes. We identified nine genes which are up-regulated during fruit ripening and might be components of the regulatory machinery operating during ethylene-dependent ripening in banana. Tissue-specific expression analysis of the genes suggests that different regulatory mechanisms might be involved in peel and pulp ripening process through recruiting specific ERFs in these tissues. Analysis also suggests that MaRAV-6 and MaERF026 have structurally diverged from their M. balbisiana counterparts and have attained new functions during ripening.
Zhang, Yanzhen; Ma, Ji; Yang, Bingxian; Li, Ruyi; Zhu, Wei; Sun, Lianli; Tian, Jingkui; Zhang, Lin
Taxus chinensis var. mairei (Taxaceae) is a domestic variety of yew in China. This plant is one of the sources of paclitaxel, which has been a promising antineoplastic chemotherapy drug during the last decade. We have sequenced the complete nucleotide sequence of the chloroplast (cp) genome of T. chinensis var. mairei. The T. chinensis var. mairei cp genome is 129,513 bp in length, with 113 single copy genes and two duplicated genes (trnI-CAU, trnQ-UUG). Among the 113 single copy genes, 9 are intron-containing. Compared to other land plant cp genomes, the T. chinensis var. mairei cp genome has lost one of the large inverted repeats (IRs) found in angiosperms, ferns, liverworts, and gymnosperms such as Cycas revoluta and Ginkgo biloba L. Compared to related species, the gene order of T. chinensis var. mairei has a large inversion of ~110 kb including 91 genes (from rps18 to accD) with gene contents unarranged. Repeat analysis identified 48 direct and 2 inverted repeats 30 bp long or longer with a sequence identity greater than 90%. Repeated short segments were found in genes rps18, rps19 and clpP. Analysis also revealed 22 simple sequence repeat (SSR) loci, almost all of which are composed of A or T. Copyright © 2014 Elsevier B.V. All rights reserved.
Forest health issues are on the rise in the United States, resulting from introduction of alien pests and diseases, coupled with abiotic stresses related to climate change. Increasingly, forest scientists are finding genetic/genomic resources valuable in addressing forest health issues. For a set of ten ecologically and economically important native hardwood tree species representing a broad phylogenetic spectrum, we used low coverage whole genome sequencing from multiplex Illumina paired ends to economically profile their genomic content. For six species, the genome content was further analyzed by flow cytometry in order to determine the nuclear genome size. Sequencing yielded a depth of 0.8X to 7.5X, from which in silico analysis yielded preliminary estimates of gene and repetitive sequence content in the genome for each species. Thousands of genomic SSRs were identified, with a clear predisposition toward dinucleotide repeats and AT-rich repeat motifs. Flanking primers were designed for SSR loci for all ten species, ranging from 891 loci in sugar maple to 18,167 in redbay. In summary, we have demonstrated that useful preliminary genome information including repeat content, gene content and useful SSR markers can be obtained at low cost and time input from a single lane of Illumina multiplex sequence.
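SSR discovery of the kind described in this record is often done with a simple pattern search. The Python sketch below finds perfect tandem repeats with a regular expression. It is only an illustration: the record does not say which tool or thresholds were used, so the function name, the minimum repeat count, and the maximum motif length are all assumptions.

```python
import re

def find_ssrs(sequence, min_repeats=5, max_motif=6):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The lazy quantifier makes the regex try the shortest motif first, so
    'ATATATATAT' is reported as (AT)5 rather than as a longer motif.
    """
    seq = sequence.upper()
    pattern = re.compile(r"([ACGT]{1,%d}?)\1{%d,}" % (max_motif, min_repeats - 1))
    ssrs = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        ssrs.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return ssrs

toy_read = "G" + "AT" * 5 + "C"  # made-up sequence with one (AT)5 repeat
found = find_ssrs(toy_read)
```

Counting the reported motifs by length would give the kind of dinucleotide versus other-motif breakdown mentioned above; primer design around each locus would be a separate step.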
Haase, Max A B; Kominek, Jacek; Langdon, Quinn K; Kurtzman, Cletus P; Hittinger, Chris Todd
Xylose fermentation is a rare trait that is immensely important to the cellulosic biofuel industry, and Candida tenuis is one of the few yeasts that has been reported with this trait. Here we report the isolation of two strains representing a candidate sister species to C. tenuis. Integrated analysis of genome sequence and physiology suggested the genetic basis of a number of traits, including variation between the novel species and C. tenuis in lactose metabolism due to the loss of genes encoding lactose permease and β-galactosidase in the former. Surprisingly, physiological characterization revealed that neither the type strain of C. tenuis nor this novel species fermented xylose in traditional assays. We reexamined three xylose-fermenting strains previously identified as C. tenuis and found that these strains belong to the genus Scheffersomyces and are not C. tenuis. We propose Yamadazyma laniorum f.a. sp. nov. to accommodate our new strains and designate its type strain as yHMH7 (=CBS 14780 = NRRL Y-63967T). Furthermore, we propose the transfer of Candida tenuis to the genus Yamadazyma as Yamadazyma tenuis comb. nov. This approach provides a roadmap for how integrated genome sequence and physiological analysis can yield insight into the mechanisms that generate yeast biodiversity. Published by Oxford University Press on behalf of FEMS 2017. This work is written by (a) US Government employee(s) and is in the public domain in the US.
Boffelli, Dario; Weer, Claire V.; Weng, Li; Lewis, Keith D.; Shoukry, Malak I.; Pachter, Lior; Keys, David N.; Rubin, Edward M.
Analysis of sequence variation among members of a single species offers a potential approach to identify functional DNA elements responsible for biological features unique to that species. Due to its high rate of allelic polymorphism and ease of genetic manipulability, we chose the sea squirt, Ciona intestinalis, to explore intra-species sequence comparisons for genome annotation. A large number of C. intestinalis specimens were collected from four continents and a set of genomic intervals amplified, resequenced and analyzed to determine the mutation rates at each nucleotide in the sequence. We found that regions with low mutation rates efficiently demarcated functionally constrained sequences: these include a set of noncoding elements, which we showed in C. intestinalis transgenic assays to act as tissue-specific enhancers, as well as the location of coding sequences. This illustrates that comparisons of multiple members of a species can be used for genome annotation, suggesting a path for the annotation of the sequenced genomes of organisms occupying uncharacterized phylogenetic branches of the animal kingdom, and raises the possibility that the resequencing of a large number of Homo sapiens individuals might be used to annotate the human genome and identify sequences defining traits unique to our species. The sequence data from this study have been submitted to GenBank under accession nos. AY667278-AY667407.
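The idea of locating constrained regions from within-species variation can be sketched in Python as follows. This is a simplified proxy, not the study's statistical model: it scores each alignment column by the fraction of individuals that differ from the majority base, then reports windows with low average variation. The function names, window size, threshold, and toy alignment are assumptions for illustration.

```python
def site_diversity(alignment):
    """Fraction of sequences differing from the majority base per column."""
    n = len(alignment)
    diversity = []
    for column in zip(*alignment):
        most_common = max(set(column), key=column.count)
        diversity.append(1.0 - column.count(most_common) / n)
    return diversity

def constrained_windows(diversity, window=5, threshold=0.05):
    """Start indices of windows whose mean diversity is <= threshold."""
    starts = []
    for i in range(len(diversity) - window + 1):
        if sum(diversity[i:i + window]) / window <= threshold:
            starts.append(i)
    return starts

# Toy alignment of four individuals: columns 0-7 invariant, 8-9 variable.
alignment = ["ACGTACGTAA", "ACGTACGTCC", "ACGTACGTGG", "ACGTACGTTT"]
per_site = site_diversity(alignment)
candidates = constrained_windows(per_site)
```

Overlapping low-variation windows would normally be merged into intervals before being treated as candidate functional elements.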
Moyle, Leonie C
Advances in genomics have rapidly accelerated research into the genetics of species differences, reproductive isolating barriers, and hybrid incompatibility. Recent genomic analyses in Drosophila species suggest that modified olfactory cues are involved in discrimination that is reinforced by natural selection.
Moolhuijzen, P; Cakir, M; Hunter, A; Schibeci, D; Macgregor, A; Smith, C; Francki, M; Jones, M G K; Appels, R; Bellgard, M
The identification of markers in legume pasture crops, which can be associated with traits such as protein and lipid production, disease resistance, and reduced pod shattering, is generally accepted as an important strategy for improving the agronomic performance of these crops. It has been demonstrated that many quantitative trait loci (QTLs) identified in one species can be found in other plant species. Detailed legume comparative genomic analyses can characterize the genome organization between model legume species (e.g., Medicago truncatula, Lotus japonicus) and economically important crops such as soybean (Glycine max), pea (Pisum sativum), chickpea (Cicer arietinum), and lupin (Lupinus angustifolius), thereby identifying candidate gene markers that can be used to track QTLs in lupin and pasture legume breeding. LegumeDB is a Web-based bioinformatics resource for legume researchers. LegumeDB analysis of Medicago truncatula expressed sequence tags (ESTs) has identified novel simple sequence repeat (SSR) markers (16 tested), some of which have been putatively linked to symbiosome membrane proteins in root nodules and cell-wall proteins important in plant-pathogen defence mechanisms. These novel markers by preliminary PCR assays have been detected in Medicago truncatula and detected in at least one other legume species, Lotus japonicus, Glycine max, Cicer arietinum, and (or) Lupinus angustifolius (15/16 tested). Ongoing research has validated some of these markers to map them in a range of legume species that can then be used to compile composite genetic and physical maps. In this paper, we outline the features and capabilities of LegumeDB as an interactive application that provides legume genetic and physical comparative maps, and the efficient feature identification and annotation of the vast tracks of model legume sequences for convenient data integration and visualization. LegumeDB has been used to identify potential novel cross-genera polymorphic legume
Wang, Dan; Zhang, Lin; Hu, JunFeng; Gao, Dianshuai; Liu, Xin; Sha, Yan
Lipases are physiologically important and ubiquitous enzymes that share a conserved domain and are classified into eight different families based on their amino acid sequences and fundamental biological properties. The Lipase3 family of lipases was reported to possess a canonical fold typical of α/β hydrolases and a typical catalytic triad, suggesting a distinct evolutionary origin for this family. Genes in the Lipase3 family do not have the same functions, but maintain the conserved Lipase3 domain. There have been extensive studies of Lipase3 structures and functions, but little is known about their evolutionary histories. In this study, all lipases within five plant species were identified, and their phylogenetic relationships and genetic properties were analyzed and used to group them into distinct evolutionary families. Each identified lipase family contained at least one dicot and monocot Lipase3 protein, indicating that the gene family was established before the split of dicots and monocots. Similar intron/exon numbers and predicted protein sequence lengths were found within individual groups. Twenty-four tandem Lipase3 gene duplications were identified, implying that the distinctive function of Lipase3 genes appears to be a consequence of translocation and neofunctionalization after gene duplication. The functional genes EDS1, PAD4, and SAG101 that are reportedly involved in pathogen response were all located in the same group. The nucleotide diversity (Dxy) and the ratio of nonsynonymous to synonymous nucleotide substitutions rates (Ka/Ks) of the three genes were significantly greater than the average across the genomes. We further observed evidence for selection maintaining diversity on three genes in the Toll-Interleukin-1 receptor type of nucleotide binding/leucine-rich repeat immune receptor (TIR-NBS LRR) immunity-response signaling pathway, indicating that they could be vulnerable to pathogen effectors.
Trees contribute to enormous plant oil reserves because many trees contain 50%-80% of oil (triacylglycerols, TAGs) in the fruits and kernels. TAGs accumulate in subcellular structures called oil bodies/droplets, in which TAGs are covered by low-molecular-mass hydrophobic proteins called oleosins (OLEs). The OLEs/TAGs ratio determines the size and shape of intracellular oil bodies. There is a lack of comprehensive sequence analysis and structural information of OLEs among diverse trees. The objectives of this study were to identify OLEs from 22 tree species (e.g., tung tree, tea-oil tree, castor bean), perform genome-wide analysis of OLEs, classify OLEs, identify conserved sequence motifs and amino acid residues, and predict secondary and three-dimensional structures in tree OLEs and OLE subfamilies. Data mining identified 65 OLEs with perfect conservation of the "proline knot" motif (PX5SPX3P) from 19 trees. These OLEs contained >40% hydrophobic amino acid residues. They displayed similar properties and amino acid composition. Genome-wide phylogenetic analysis and multiple sequence alignment demonstrated that these proteins could be classified into five OLE subfamilies. There were distinct patterns of sequence conservation among the OLE subfamilies and within individual tree species. Computational modeling indicated that OLEs were composed of at least three α-helixes connected with short coils without any β-strand and that they exhibited distinct 3D structures and ligand binding sites. These analyses provide fundamental information in the similarity and specificity of diverse OLE isoforms within the same subfamily and among the different species, which should facilitate studying the structure-function relationship and identify critical amino acid residues in OLEs for metabolic engineering of tree TAGs.
Omotayo Opemipo Oyedara
Bdellovibrio spp. are predatory bacteria with great potential as antimicrobial agents. Studies have shown that members of the genus Bdellovibrio exhibit peculiar characteristics that influence their ecological adaptations. In this study, whole genomes of two different Bdellovibrio spp. designated SKB1291214 and SSB218315 isolated from soil were sequenced. A total of 795 core genes were shared by all the Bdellovibrio spp. considered in the pangenome analysis, including the epibiotic B. exovorus. The number of unique genes identified in Bdellovibrio spp. SKB1291214, SSB218315, W, and B. exovorus JJS was 1343, 113, 857, and 1572, respectively. These unique genes encode hydrolytic, chemotaxis, and transporter proteins which might be useful for predation in the Bdellovibrio strains. Furthermore, the two Bdellovibrio strains differed in % GC content, amino acid identity, and 16S rRNA gene sequence. The 16S rRNA gene sequence of Bdellovibrio sp. SKB1291214 shared 99% identity with that of an uncultured Bdellovibrio sp. clone 12L 106 (a pairwise distance of 0.008) and 95–97% identity (a pairwise distance of 0.043) with that of other culturable terrestrial Bdellovibrio spp., including strain SSB218315. In Bdellovibrio sp. SKB1291214, a 174 bp sequence was inserted at the host interaction (hit) locus region usually attributed to prey attachment, invasion, and development of host-independent Bdellovibrio phenotypes. Also, a gene equivalent to Bd0108 in B. bacteriovorus HD100 was not conserved in Bdellovibrio sp. SKB1291214. The results of this study provide information on the genetic characteristics and diversity of the genus Bdellovibrio that can contribute to their successful application as biocontrol agents.
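The core and unique gene counts reported in a pangenome analysis of this kind reduce to set operations once genes have been clustered into families. The sketch below is a minimal illustration only: the strain names, gene-family identifiers and function names are hypothetical, and the clustering step (the hard part in practice) is assumed to have been done already.

```python
from collections import Counter

# Hypothetical toy data: strain name -> set of gene-family identifiers.
gene_families = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g5"},
    "strain_C": {"g1", "g2", "g3", "g6", "g7"},
}

def core_genome(families):
    """Gene families present in every genome."""
    return set.intersection(*families.values())

def pan_genome(families):
    """Gene families present in at least one genome."""
    return set.union(*families.values())

def unique_genes(families):
    """For each genome, the gene families found in that genome only."""
    counts = Counter(g for fams in families.values() for g in fams)
    return {name: {g for g in fams if counts[g] == 1}
            for name, fams in families.items()}

core = core_genome(gene_families)
pan = pan_genome(gene_families)
unique = unique_genes(gene_families)
print(sorted(core), len(pan), {k: sorted(v) for k, v in unique.items()})
```

Families shared by some but not all genomes (here `g3`) belong to neither the core nor the unique sets; they form the accessory part of the pangenome.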
Zhou, Hongsheng; Qi, Kaijie; Liu, Xing; Yin, Hao; Wang, Peng; Chen, Jianqing; Wu, Juyou; Zhang, Shaoling
The monovalent cation proton antiporters (CPAs) play essential roles in plant nutrition, development, and signal transduction by regulating ion and pH homeostasis of the cell. The CPAs of plants include the Na(+)/H(+) exchanger, K(+) efflux antiporter, and cation/H(+) exchanger families. However, little is currently known about the CPA genes in Rosaceae species. In this study, 220 CPA genes were identified from five Rosaceae species (Pyrus bretschneideri, Malus domestica, Prunus persica, Fragaria vesca, and Prunus mume), 53 of which came from P. bretschneideri. Phylogenetic, structure, collinearity, and gene expression analyses were conducted on all the CPA genes of pear. Gene expression data showed that 35 and 37 CPA genes were expressed in pear fruit and pollen tubes, respectively. Transcript analysis of some CPA genes under abiotic stress conditions revealed that CPAs may play an important role in pollen tube growth. The results presented here will improve understanding of the complexity of the CPA gene family and will promote functional characterization in future studies.
Qiao, Xin; Li, Meng; Li, Leiting; Yin, Hao; Wu, Juyou; Zhang, Shaoling
Heat shock transcription factors (Hsfs), which act as important transcriptional regulatory proteins in eukaryotes, play a central role in controlling the expression of heat-responsive genes. At present, the genomes of Chinese white pear ('Dangshansuli') and five other Rosaceae fruit crops have been fully sequenced. However, information about the Hsfs gene family in these Rosaceae species is limited, and the evolutionary history of the Hsfs gene family also remains unresolved. In this study, 137 Hsf genes were identified from six Rosaceae species (Pyrus bretschneideri, Malus × domestica, Prunus persica, Fragaria vesca, Prunus mume, and Pyrus communis), 29 of which came from Chinese white pear, designated as PbHsf. Based on the structural characteristics and phylogenetic analysis of these sequences, the Hsf family genes could be classified into three main groups (classes A, B, and C). Segmental and dispersed duplications were the primary forces underlying Hsf gene family expansion in the Rosaceae. Most of the PbHsf duplicated gene pairs were dated back to the recent whole-genome duplication (WGD, 30-45 million years ago (MYA)). Purifying selection also played a critical role in the evolution of Hsf genes. Transcriptome data demonstrated that the expression levels of the PbHsf genes were widely different. Six PbHsf genes were upregulated in fruit under naturally increased temperature. A comprehensive analysis of Hsf genes was performed in six Rosaceae species, and 137 full length Hsf genes were identified. The results presented here will undoubtedly be useful for better understanding the complexity of the Hsf gene family and will facilitate functional characterization in future studies.
Kersey, Paul J; Staines, Daniel M; Lawson, Daniel; Kulesha, Eugene; Derwent, Paul; Humphrey, Jay C; Hughes, Daniel S T; Keenan, Stephan; Kerhornou, Arnaud; Koscielny, Gautier; Langridge, Nicholas; McDowall, Mark D; Megy, Karine; Maheswari, Uma; Nuhn, Michael; Paulini, Michael; Pedro, Helder; Toneva, Iliana; Wilson, Derek; Yates, Andrew; Birney, Ewan
Ensembl Genomes (http://www.ensemblgenomes.org) is an integrative resource for genome-scale data from non-vertebrate species. The project exploits and extends technology (for genome annotation, analysis and dissemination) developed in the context of the (vertebrate-focused) Ensembl project and provides a complementary set of resources for non-vertebrate species through a consistent set of programmatic and interactive interfaces. These provide access to data including reference sequence, gene models, transcriptional data, polymorphisms and comparative analysis. Since its launch in 2009, Ensembl Genomes has undergone rapid expansion, with the goal of providing coverage of all major experimental organisms, and additionally including taxonomic reference points to provide the evolutionary context in which genes can be understood. Against the backdrop of a continuing increase in genome sequencing activities in all parts of the tree of life, we seek to work, wherever possible, with the communities actively generating and using data, and are participants in a growing range of collaborations involved in the annotation and analysis of genomes.
Nardeli, Sarah Muniz; Artico, Sinara; Aoyagi, Gustavo Mitsunori; de Moura, Stéfanie Menezes; da Franca Silva, Tatiane; Grossi-de-Sa, Maria Fatima; Romanel, Elisson; Alves-Ferreira, Marcio
The MADS-box gene family encodes transcription factors that share a highly conserved domain known to bind to DNA. Members of this family control various processes of development in plants, from root formation to fruit ripening. In this work, a survey of diploid (Gossypium raimondii and Gossypium arboreum) and tetraploid (Gossypium hirsutum) cotton genomes found a total of 147, 133 and 207 MADS-box genes, respectively, distributed in the MIKC, Mα, Mβ, Mγ, and Mδ subclades. A comparative phylogenetic analysis among cotton species, Arabidopsis, poplar and grapevine MADS-box homologous genes allowed us to evaluate the evolution of each MADS-box lineage in cotton plants and identify sequences within well-established subfamilies. Chromosomal localization and phylogenetic analysis revealed that G. raimondii and G. arboreum showed a conserved evolution of the MIKC subclade and a distinct pattern of duplication events in the Mα, Mγ and Mδ subclades. Additionally, G. hirsutum showed a combination of its parental subgenomes followed by a distinct evolutionary history including gene gain and loss in each subclade. qPCR analysis revealed the expression patterns of putative homologs in the AP1, AP3, AGL6, SEP4, AGL15, AG, AGL17, TM8, SVP, SOC and TT16 subfamilies of G. hirsutum. The identification of putative cotton orthologs is discussed in the light of evolution and gene expression data from other plants. This analysis of the MADS-box genes in Gossypium species opens an avenue to understanding the origin and evolution of each gene subfamily within diploid and polyploid species and paves the way for functional studies in cotton species. Copyright © 2018 Elsevier Masson SAS. All rights reserved.
Liu, Wing-Yee; Wong, Chi-Fat; Chung, Karl Ming-Kar; Jiang, Jing-Wei; Leung, Frederick Chi-Ching
The Enterobacter cloacae species includes an extremely diverse group of bacteria that are associated with plants, soil and humans. Publication of the complete genome sequence of the plant growth-promoting endophytic E. cloacae subsp. cloacae ENHKU01 provided an opportunity to perform the first comparative genome analysis between strains of this dynamic species. Examination of the pan-genome of E. cloacae showed that the conserved core genome retains the general physiological and survival genes of the species, while genomic factors in plasmids and variable regions determine the virulence of the human pathogenic E. cloacae strain; additionally, the diversity of fimbriae contributes to variation in colonization and host determination of different E. cloacae strains. Comparative genome analysis further illustrated that E. cloacae strains possess multiple mechanisms for antagonistic action against other microorganisms, which involve the production of siderophores and various antimicrobial compounds, such as bacteriocins, chitinases and antibiotic resistance proteins. The presence of Type VI secretion systems is expected to provide further fitness advantages for E. cloacae in microbial competition, thus allowing it to survive in different environments. Competition assays were performed to support our observations in genomic analysis, where E. cloacae subsp. cloacae ENHKU01 demonstrated antagonistic activities against a wide range of plant pathogenic fungal and bacterial species. PMID:24069314
Lactobacillus paracasei is a member of the normal human and animal gut microbiota and is used extensively in the food industry in starter cultures for dairy products or as probiotics. With the development of low-cost, high-throughput sequencing techniques it has become feasible to sequence many different strains of one species and to determine its "pan-genome". We have sequenced the genomes of 34 different L. paracasei strains, and performed a comparative genomics analysis. We analysed genome synteny and content, focussing on the pan-genome, core genome and variable genome. Each genome was shown to contain around 2800-3100 protein-coding genes, and comparative analysis identified over 4200 ortholog groups that comprise the pan-genome of this species, of which about 1800 ortholog groups make up the conserved core. Several factors previously associated with host-microbe interactions such as pili, cell-envelope proteinase, hydrolases p40 and p75 or the capacity to produce short branched-chain fatty acids (bkd operon) are part of the L. paracasei core genome present in all analysed strains. The variome consists mainly of hypothetical proteins, phages, plasmids, transposon/conjugative elements, and known functions such as sugar metabolism, cell-surface proteins, transporters, CRISPR-associated proteins, and EPS biosynthesis proteins. An enormous variety and variability of sugar utilization gene cassettes were identified, with each strain harbouring between 25 and 53 cassettes, reflecting the high adaptability of L. paracasei to different niches. A phylogenomic tree was constructed based on total genome contents, and together with an analysis of horizontal gene transfer events we conclude that evolution of these L. paracasei strains is complex and not always related to niche adaptation. The results of this genome content comparison were used, together with high-throughput growth experiments on various carbohydrates, to perform gene-trait matching analysis
Susilo; Setyaningsih, M.
Solanum melongena (eggplant) is a member of the diverse genus Solanum that is widely grown and distributed in Indonesia and widely used by the community. This research explored the genetic diversity of four local Indonesian eggplant species, namely leunca, tekokak, gelatik and kopek, using RAPD (Random Amplified Polymorphic DNA). The samples were obtained from the Agricultural Technology Assessment Institute (BPTP) Bogor, Indonesia. The resulting DNA profiles of the Solanum melongena plants were analyzed descriptively and quantitatively. Thirty DNA bands (28 polymorphic and 2 monomorphic) were successfully scored using four primers (OPF-01, OPF-02, OPF-03, and OPF-04). The primers used were able to amplify all four eggplant samples. PCR-RAPD visualization produced bands of 300-1500 bp. Cluster analysis showed three clusters (A, B, and C). Cluster A (similarity coefficient of 49%) consisted of gelatik, cluster B (similarity coefficient of 65%) consisted of TPU (Kopek) and TK (Tekokak), and cluster C (similarity coefficient of 55%) consisted of LC (Leunca). These results indicated that the closest relationship was between samples TK (Tekokak) and TPU (Kopek).
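The band counts in this abstract give a polymorphism rate directly, and RAPD clustering typically starts from a pairwise similarity between presence/absence band profiles. The sketch below is illustrative only: the abstract does not state which similarity coefficient was used, so Jaccard is an assumption here, and the two band profiles are hypothetical.

```python
# Band counts reported in the abstract.
total_bands = 30
polymorphic_bands = 28

# Percentage of scored bands that are polymorphic.
polymorphism_pct = 100 * polymorphic_bands / total_bands

def jaccard(profile_a, profile_b):
    """Jaccard similarity between two 0/1 band profiles of equal length."""
    shared = sum(1 for x, y in zip(profile_a, profile_b) if x and y)
    either = sum(1 for x, y in zip(profile_a, profile_b) if x or y)
    return shared / either if either else 0.0

# Hypothetical presence (1) / absence (0) profiles for two samples.
profile_1 = [1, 1, 0, 1, 0, 1]
profile_2 = [1, 0, 0, 1, 1, 1]

similarity = jaccard(profile_1, profile_2)
print(f"{polymorphism_pct:.1f}% polymorphic, similarity {similarity:.2f}")
```

A matrix of such pairwise similarities is what a clustering method would take as input to produce a dendrogram like the one summarized above.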
Signor, Sarah; Seher, Thaddeus; Kopp, Artyom
The development of genomic resources in non-model taxa is essential for understanding the genetic basis of biological diversity. Although the genomes of many Drosophila species have been sequenced, most of the phenotypic diversity in this genus remains to be explored. To facilitate the genetic analysis of interspecific and intraspecific variation, we have generated new genomic resources for seven species and subspecies in the D. ananassae species subgroup. We have generated large amounts of transcriptome sequence data for D. ercepeae, D. merina, D. bipectinata, D. malerkotliana malerkotliana, D. m. pallens, D. pseudoananassae pseudoananassae, and D. p. nigrens. de novo assembly resulted in contigs covering more than half of the predicted transcriptome and matching an average of 59% of annotated genes in the complete genome of D. ananassae. Most contigs, corresponding to an average of 49% of D. ananassae genes, contain sequence polymorphisms that can be used as genetic markers. Subsets of these markers were validated by genotyping the progeny of inter- and intraspecific crosses. The ananassae subgroup is an excellent model system for examining the molecular basis of speciation and phenotypic evolution. The new genomic resources will facilitate the genetic analysis of inter- and intraspecific differences in this lineage. Transcriptome sequencing provides a simple and cost-effective way to identify molecular markers at nearly single-gene density, and is equally applicable to any non-model taxa.
Amoeba-associated microorganisms (AAMs) are frequently isolated from water networks. In this paper, we report the isolation and characterization of Protochlamydia massiliensis, an obligate intracellular Gram-negative bacterium belonging to the Parachlamydiaceae family in the Chlamydiales order, from a cooling water tower. This bacterium was isolated on Vermamoeba vermiformis. It has a range of multiple hosts among amoebae and shows the typical replication cycle of Chlamydiae, with one particularity recently shown in some chlamydiae: the absence of inclusion vacuoles in the V. vermiformis host. This adds a new member to the Chlamydiae that undergo developmental cycle changes in the newly adapted host V. vermiformis. Draft genome sequencing revealed a chromosome of 2.86 Mb consisting of four contigs and a plasmid of 92 kb.
Komaki, Hisayuki; Ichikawa, Natsuko; Hosoyama, Akira; Takahashi-Nakaguchi, Azusa; Matsuzawa, Tetsuhiro; Suzuki, Ken-ichiro; Fujita, Nobuyuki; Gonoi, Tohru
Actinobacteria of the genus Nocardia usually live in soil or water and play saprophytic roles, but they also opportunistically infect the respiratory system, skin, and other organs of humans and animals. Primarily because of the clinical importance of the strains, some Nocardia genomes have been sequenced, and genome sequences have accumulated. Genome sizes of Nocardia strains are similar to those of Streptomyces strains, the producers of most antibiotics. In the present work, we compared secondary metabolite biosynthesis gene clusters of type-I polyketide synthase (PKS-I) and nonribosomal peptide synthetase (NRPS) among genomes of representative Nocardia species/strains based on domain organization and amino acid sequence homology. Draft genome sequences of Nocardia asteroides NBRC 15531(T), Nocardia otitidiscaviarum IFM 11049, Nocardia brasiliensis NBRC 14402(T), and N. brasiliensis IFM 10847 were read and compared with published complete genome sequences of Nocardia farcinica IFM 10152, Nocardia cyriacigeorgica GUH-2, and N. brasiliensis HUJEG-1. Genome sizes are as follows: N. farcinica, 6.0 Mb; N. cyriacigeorgica, 6.2 Mb; N. asteroides, 7.0 Mb; N. otitidiscaviarum, 7.8 Mb; and N. brasiliensis, 8.9 - 9.4 Mb. Predicted numbers of PKS-I, NRPS, and PKS-I/NRPS hybrid clusters ranged between 4-11, 7-13, and 1-6, respectively, depending on strains, and tended to increase with increasing genome size. Domain and module structures of representative or unique clusters are discussed in the text. We conclude the following: 1) genomes of Nocardia strains carry as many PKS-I and NRPS gene clusters as those of Streptomyces strains, 2) the number of PKS-I and NRPS gene clusters in Nocardia strains varies substantially depending on species, and N. brasiliensis strains carry the largest numbers of clusters among the species studied, 3) the seven Nocardia strains studied in the present work have seven common PKS-I and/or NRPS clusters, some of whose products are yet to be studied
Luis E. Rojas; Maritza Reyes; Naivy Pérez-Alonso; María I. Olóriz; Laisyn Posada-Pérez; Bárbara Ocaña; Orelvis Portal; Borys Chong-Pérez; Jorge L. Pérez Pérez
The extraction methods of genomic DNA are usually laborious and hazardous to human health and the environment by the use of organic solvents (chloroform and phenol). In this work a protocol for in situ extraction of genomic DNA by alkaline lysis is validated. It was used in order to amplify regions of DNA in four species of plants and fungi by polymerase chain reaction (PCR). From plant material of Saccharum officinarum L., Carica papaya L. and Digitalis purpurea L. it was possible to extend ...
The complete and fully assembled genome sequence of Aeromonas salmonicida subsp. pectinolytica and its comparative analysis with other Aeromonas species: investigation of the mobilome in environmental and pathogenic strains.
Pfeiffer, Friedhelm; Zamora-Lagos, Maria-Antonia; Blettinger, Martin; Yeroslaviz, Assa; Dahl, Andreas; Gruber, Stephan; Habermann, Bianca H
Due to the predominant usage of short-read sequencing to date, most bacterial genome sequences reported in the last years remain at the draft level. This precludes certain types of analyses, such as the in-depth analysis of genome plasticity. Here we report the finalized genome sequence of the environmental strain Aeromonas salmonicida subsp. pectinolytica 34mel, for which only a draft genome with 253 contigs is currently available. Successful completion of the transposon-rich genome critically depended on the PacBio long read sequencing technology. Using finalized genome sequences of A. salmonicida subsp. pectinolytica and other Aeromonads, we report the detailed analysis of the transposon composition of these bacterial species. Mobilome evolution is exemplified by a complex transposon, which has shifted from pathogenicity-related to environmental-related gene content in A. salmonicida subsp. pectinolytica 34mel. Obtaining the complete, circular genome of A. salmonicida subsp. pectinolytica allowed us to perform an in-depth analysis of its mobilome. We demonstrate the mobilome-dependent evolution of this strain's genetic profile from pathogenic to environmental.
Background: Public databases now contain a multitude of complete bacterial genomes, including several genomes of the same species. The available data offer new opportunities to address questions about bacterial genome evolution, a task that requires reliable fine comparison data of closely related genomes. Recent analyses have shown, using pairwise whole genome alignments, that it is possible to segment bacterial genomes into a common conserved backbone and strain-specific sequences called loops. Results: Here, we generalize this approach and propose a strategy that allows systematic and non-biased genome segmentation based on multiple genome alignments. Segmentation analyses, as applied to 13 different bacterial species, confirmed the feasibility of our approach to discern the 'mosaic' organization of bacterial genomes. Segmentation results are available through a Web interface permitting functional analysis, extraction and visualization of the backbone/loops structure of documented genomes. To illustrate the potential of this approach, we performed a precise analysis of the mosaic organization of three E. coli strains and functional characterization of the loops. Conclusion: The segmentation results, including the backbone/loops structure of the genomes of 13 bacterial species, are new and available for use by the scientific community at the URL: http://genome.jouy.inra.fr/mosaic.
Armijos-Jaramillo, Vinicio; Santander-Gordón, Daniela; Soria, Rosa; Pazmiño-Betancourth, Mauro; Echeverría, María Cristina
Streptomyces scabies is a common soil bacterium that causes scab symptoms in potatoes. Strong evidence indicates horizontal gene transfer (HGT) among bacteria has influenced the evolution of this plant pathogen and other Streptomyces spp. To extend the study of the HGT to the Streptomyces genus, we explored the effects of the inter-domain HGT in the S. scabies genome. We employed a semi-automatic pipeline based on BLASTp searches and phylogenetic reconstruction. The data show low impact of inter-domain HGT in the S. scabies genome; however, we found a putative plant pathogenesis related 1 (PR1) sequence in the genome of S. scabies and other species of the genus. It is possible that this gene could be used by S. scabies to out-compete other soil organisms. Copyright © 2016 Elsevier Inc. All rights reserved.
Gao, Xiao-Yang; Zhi, Xiao-Yang; Li, Hong-Wei; Klenk, Hans-Peter; Li, Wen-Jun
Members of the genus Streptococcus within the phylum Firmicutes are among the most diverse and significant zoonotic pathogens. This genus has gone through considerable taxonomic revision due to increasing improvements of chemotaxonomic approaches, DNA hybridization and 16S rRNA gene sequencing. It is proposed to place the majority of streptococci into “species groups”. However, the evolutionary implications of species groups are not clear presently. We use comparative genomic approaches to yield a better understanding of the evolution of Streptococcus through genome dynamics, population structure, phylogenies and virulence factor distribution of species groups. Genome dynamics analyses indicate that the pan-genome size increases with the addition of newly sequenced strains, while the core genome size decreases with sequential addition at the genus level and species group level. Population structure analysis reveals two distinct lineages, one including Pyogenic, Bovis, Mutans and Salivarius groups, and the other including Mitis, Anginosus and Unknown groups. Phylogenetic dendrograms show that species within the same species group cluster together, and infer two main clades in accordance with population structure analysis. Distribution of streptococcal virulence factors has no obvious patterns among the species groups; however, the evolution of some common virulence factors is congruous with the evolution of species groups, according to phylogenetic inference. We suggest that the proposed streptococcal species groups are reasonable from the viewpoints of comparative genomics; evolution of the genus is congruent with the individual evolutionary trajectories of different species groups. PMID:24977706
Ohyanagi, Hajime; Ebata, Toshinobu; Huang, Xuehui; Gong, Hao; Fujita, Masahiro; Mochizuki, Takako; Toyoda, Atsushi; Fujiyama, Asao; Kaminuma, Eli; Nakamura, Yasukazu; Feng, Qi; Wang, Zi Xuan; Han, Bin; Kurata, Nori
Portable VCF (variant call format) file or tab-delimited file download is also available. Following these SNP (single nucleotide polymorphism) data, reference pseudomolecules/scaffolds/contigs and genome-wide variation information for almost all
Salzberg, Steven L; Dunning Hotopp, Julie C; Delcher, Arthur L; Pop, Mihai; Smith, Douglas R; Eisen, Michael B; Nelson, William C
The Trace Archive is a repository for the raw, unanalyzed data generated by large-scale genome sequencing projects. The existence of this data offers scientists the possibility of discovering additional genomic sequences beyond those originally sequenced. In particular, if the source DNA for a sequencing project came from a species that was colonized by another organism, then the project may yield substantial amounts of genomic DNA, including near-complete genomes, from the symbiotic or parasitic organism. By searching the publicly available repository of DNA sequencing trace data, we discovered three new species of the bacterial endosymbiont Wolbachia pipientis in three different species of fruit fly: Drosophila ananassae, D. simulans, and D. mojavensis. We extracted all sequences with partial matches to a previously sequenced Wolbachia strain and assembled those sequences using customized software. For one of the three new species, the data recovered were sufficient to produce an assembly that covers more than 95% of the genome; for a second species the data produce the equivalent of a 'light shotgun' sampling of the genome, covering an estimated 75-80% of the genome; and for the third species the data cover approximately 6-7% of the genome. The results of this study reveal an unexpected benefit of depositing raw data in a central genome sequence repository: new species can be discovered within this data. The differences between these three new Wolbachia genomes and the previously sequenced strain revealed numerous rearrangements and insertions within each lineage and hundreds of novel genes. The three new genomes, with annotation, have been deposited in GenBank.
Chloroplast genomes of plants are highly conserved in both gene order and gene content. Analysis of the whole chloroplast genome is known to provide many more informative DNA sites and thus generates high resolution for plant phylogenies. Here, we report the complete chloroplast genomes of three Salix species in the family Salicaceae. The phylogeny of Salicaceae inferred from complete chloroplast genomes is generally consistent with previous studies but is resolved with higher statistical support. Incongruences of phylogeny, however, are observed in the genus Populus, which most likely result from homoplasy. By comparing the three Salix chloroplast genomes with the published chloroplast genomes of other Salicaceae species, we demonstrate that the synteny and length of chloroplast genomes in Salicaceae are highly conserved but experienced dynamic evolution among species. We identify seven positively selected chloroplast genes in Salicaceae, which might be related to the adaptive evolution of Salicaceae species. Comparative chloroplast genome analysis within the family also indicates that some chloroplast genes were lost or became pseudogenes, suggesting that these chloroplast genes were horizontally transferred to the nuclear genome. Based on the complete nuclear genome sequences of two Salicaceae species, we find, remarkably, that the entire chloroplast genome was indeed transferred and integrated into the nuclear genome at least once in the individual used for the P. trichocarpa reference genome. This observation, along with the presence of large nuclear plastid DNA (NUPTs) and NUPTs containing multiple chloroplast genes in their original chloroplast-genome order, favors the DNA-mediated hypothesis of organelle-to-nucleus DNA transfer. Overall, the phylogenomic analysis using complete chloroplast genomes clearly elucidates the phylogeny of Salicaceae. The identification of positively selected chloroplast genes and dynamic chloroplast-to-nucleus gene transfers in
Eukaryotic genome size data are important both as the basis for comparative research into genome evolution and as estimators of the cost and difficulty of genome sequencing programs for non-model organisms. In this study, the genome sizes of 14 species of fireflies (Lampyridae; two genera in Lampyrinae, three genera in Luciolinae, and one genus in subfamily incertae sedis) were estimated by propidium iodide (PI)-based flow cytometry. The haploid genome sizes of Lampyridae ranged from 0.42 to 1.31 pg, a 3.1-fold span. Genome sizes of the fireflies varied within the tested subfamilies and genera. Lamprigera and Pyrocoelia species had large and small genome sizes, respectively. No correlation was found between genome size and morphological traits such as body length, body width, eye width, and antennal length. Our data provide additional information on genome size estimation of the firefly family Lampyridae. Furthermore, this study will help clarify the cost and difficulty of genome sequencing programs for non-model organisms and will help promote studies on firefly genome evolution.
Zhang, Yanjun; Du, Liuwen; Liu, Ao; Chen, Jianjun; Wu, Li; Hu, Weiming; Zhang, Wei; Kim, Kyunghee; Lee, Sang-Choon; Yang, Tae-Jin; Wang, Ying
Epimedium L. is a phylogenetically and economically important genus in the family Berberidaceae. Here we sequenced the complete chloroplast (cp) genomes of four Epimedium species using Illumina sequencing technology via a combination of de novo and reference-guided assembly. Combined with the previously reported cp genome sequence of E. koreanum, this is the first comprehensive cp genome analysis of Epimedium. The five Epimedium cp genomes exhibited a typical quadripartite, circular structure that was rather conserved in genomic structure and in the synteny of gene order. However, these cp genomes showed obvious variations at the boundaries of the four regions because of the expansion and contraction of the inverted repeat (IR) region and the single-copy (SC) boundary regions. The trnQ-UUG duplication occurred in the five Epimedium cp genomes but was not found in the other basal eudicotyledons. Rapidly evolving cp genome regions were detected among the five cp genomes, and differences in simple sequence repeats (SSRs) and repeat sequences were identified. Phylogenetic relationships among the five Epimedium species based on their cp genomes were, on the whole, in accordance with the updated system of the genus, but indicated that the evolutionary relationships and the divisions of the genus need further investigation using more evidence. The availability of these cp genomes provides valuable genetic information for accurate species identification, taxonomy, phylogenetic resolution and the study of the evolution of Epimedium, and will assist in the exploration and utilization of Epimedium plants. PMID:27014326
Asaf, Sajjad; Waqas, Muhammad; Khan, Abdul L; Khan, Muhammad A; Kang, Sang-Mo; Imran, Qari M; Shahzad, Raheem; Bilal, Saqib; Yun, Byung-Wook; Lee, In-Jung
Oryza minuta, a tetraploid wild relative of cultivated rice (family Poaceae), possesses a BBCC genome and contains genes that confer resistance to bacterial blight (BB) and to white-backed (WBPH) and brown (BPH) plant hoppers. Based on the importance of this wild species, this study aimed to understand the phylogenetic relationships of O. minuta with other Oryza species through an in-depth analysis of the composition and diversity of the chloroplast (cp) genome. The analysis revealed a cp genome size of 135,094 bp with a typical quadripartite structure consisting of a pair of inverted repeats separated by small and large single-copy regions, 139 representative genes, and 419 randomly distributed microsatellites. The genomic organization, gene order, GC content and codon usage are similar to those of typical angiosperm cp genomes. Approximately 30 forward, 28 tandem and 20 palindromic repeats were detected in the O. minuta cp genome. Comparison of the complete O. minuta cp genome with those of another eleven Oryza species showed a high degree of sequence similarity and relatively high divergence of intergenic spacers. Phylogenetic analyses based on the complete genome sequence, 65 shared genes and the matK gene showed the same topologies, and O. minuta forms a single clade with parental O. punctata. Thus, the complete O. minuta cp genome provides interesting insights and valuable information that can be used to identify related species and reconstruct its phylogeny.
Cao, Minh Duc; Allison, Lloyd; Dix, Trevor
Phylogenetic analyses of species based on single genes or parts of genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes, which is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not feasible because of their intensive computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as the evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene content of the organisms is similar. This study presents an information theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group, whose genomes are known to contain many horizontally transferred genes.
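The idea of a compression-based distance can be illustrated with a minimal sketch. This is not the expert model compressor used in the study: it swaps in the general-purpose zlib compressor and the normalized compression distance (NCD), and the function names are hypothetical. It only shows the general principle that two sequences sharing information compress better together than two unrelated ones.

```python
import zlib


def compressed_size(seq: str) -> int:
    """Size in bytes of the zlib-compressed sequence (stand-in compressor)."""
    return len(zlib.compress(seq.encode("ascii"), 9))


def ncd(x: str, y: str) -> float:
    """Normalized compression distance between two sequences.

    Values near 0 suggest the sequences share most of their information;
    values near 1 suggest they share little.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)
```

A matrix of such pairwise distances could then be passed to any distance-based tree method, such as neighbor joining. A general-purpose compressor like zlib has a small window (32 KB) and would not be adequate for whole genomes; it is used here only to keep the sketch self-contained.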
Caimi, K; Repetto, S A; Varni, V; Ruybal, P
Leptospirosis is a zoonotic disease whose global burden is increasing, often in relation to climate change. Hundreds of whole genome sequences from worldwide isolates of Leptospira spp. are now available, together with online tools that make it possible to assign MLST sequence types (STs) directly from raw sequence data. In this work we have applied R7L-MLST to nearly 500 genomes and a globally distributed strain collection. All 10 pathogenic species, as well as intermediate species, were typed using this MLST scheme. The correlation between STs and serogroups observed in our previous work still holds with this larger dataset, supporting the implementation of MLST as a complementary approach to assist serological classification. Bayesian phylogenetic analysis of concatenated sequences from R7-MLST loci allowed us to resolve taxonomic inconsistencies but also showed that events such as recombination, gene conversion or lateral gene transfer played an important role in the evolution of the genus Leptospira. Whole genome sequencing allows us to contribute epidemiologic information that is useful in the design of control strategies and of diagnostic methods for this illness. Copyright © 2017 Elsevier B.V. All rights reserved.
Umemura, Myco; Koike, Hideaki; Yamane, Noriko; Koyama, Yoshinori; Satou, Yuki; Kikuzato, Ikuya; Teruya, Morimi; Tsukahara, Masatoshi; Imada, Yumi; Wachi, Youji; Miwa, Yukino; Yano, Shuichi; Tamano, Koichi; Kawarabayasi, Yutaka; Fujimori, Kazuhiro E.; Machida, Masayuki; Hirano, Takashi
Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and compared the data with those of A. oryzae RIB40, a wild-type strain sequenced in 2005. The aim of this study was to evaluate the mutation pressure on the non-syntenic blocks (NSBs) of the genome, which were previously identified through comparative genomic analysis of A. oryzae, Aspergillus fumigatus, and Aspergillus nidulans. We found that genes within the NSBs of RIB326 accumulate mutations more frequently than those within the SBs, regardless of their distance from the telomeres or of their expression level. Our findings suggest that the high mutation frequency of NSBs might contribute to maintaining the diversity of the A. oryzae genome. PMID:22912434
Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X
PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.
This thesis describes a collection of bioinformatic analyses of complete genome sequence data. We have studied the evolution of gene content and find that vertical inheritance dominates over horizontal gene transfer, even to the extent that we can use gene content to make genome phylogenies.
Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor
Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37% of the described fungi, and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs, including the majority of wood decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum we compared the genomes of 35 basidiomycetes, including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48% of basidiomycete proteins are unique to the phylum, with nearly half of those (22%) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on phylogenetically-informed PCA analysis of wood decay genes, we predict that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.
Lu, Yan; Zhang, Chenglin; Wu, Xiaobing; Han, Haitang; Zhao, Yaofeng; Ren, Liming
Crocodilians are evolutionarily distinct reptiles that are distantly related to lizards and are thought to be the closest relatives of birds. Compared with birds and mammals, few studies have investigated the Ig light chain of crocodilians. Here, employing an Alligator sinensis genomic bacterial artificial chromosome (BAC) library and available genome data, we characterized the genomic organization of the Alligator sinensis IgL gene loci. The Alligator sinensis has two IgL isotypes, λ and κ, the same as Anolis carolinensis. The Igλ locus contains 6 Cλ genes, each preceded by a Jλ gene, and 86 potentially functional Vλ genes upstream of (Jλ-Cλ)n. The Igκ locus contains a single Cκ gene, 6 Jκs and 62 functional Vκs. All VL genes are classified into a total of 31 families: 19 Vλ families and 12 Vκ families. Based on an analysis of the chromosomal location of the light chain genes among mammals, birds, lizards and frogs, the data further confirm that there are two IgL isotypes in the Alligator sinensis: Igλ and Igκ. By analyzing the cloned Igλ/κ cDNA, we identified a biased usage pattern of V families in the expressed Vλ and Vκ. An analysis of the junctions of the recombined VJ revealed the presence of N and P nucleotides in both expressed λ and κ sequences. Phylogenetic analysis of the V genes revealed V families shared by mammals, birds, reptiles and Xenopus, suggesting that these conserved V families are orthologous and have been retained during the evolution of IgL. Our data suggest that the Alligator sinensis IgL gene repertoire is highly diverse and complex and provide insight into immunoglobulin gene evolution in vertebrates. PMID:26901135
Kim, Na-Rae; Kim, Kyunghee; Lee, Sang-Choon; Lee, Jung-Hoon; Cho, Seong-Hyun; Yu, Yeisoo; Kim, Young-Dong; Yang, Tae-Jin
Wisteria floribunda and Wisteria sinensis are ornamental woody vines in the Fabaceae. The complete chloroplast genome sequences of the two species were generated by de novo assembly using whole genome next generation sequences. The chloroplast genomes of W. floribunda and W. sinensis were 130 960 bp and 130 561 bp long, respectively, and showed inverted repeat (IR)-lacking structures like those reported for the inverted repeat-lacking clade (IRLC) in the Fabaceae. The chloroplast genomes of both species contained the same number of protein-coding sequences (77), tRNA genes (30), and rRNA genes (4). The phylogenetic analysis with the reported chloroplast genomes confirmed the close taxonomical relationship of W. floribunda and W. sinensis.
Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping
Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that frequently undergo intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant worldwide. Sequencing the cotton mitochondrial (mt) genome could be helpful for research on the evolution of plant mt genomes. We used 454 sequencing technology, combined with screening of a fosmid library of the Gossypium hirsutum mt genome and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The completed G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein genes, four rRNA genes and 29 tRNA genes. Five gene clusters are found conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the location of cp-like migration and found several fragments closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characters of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that evolution of mt genomes is consistent with plant taxonomy but independent among different species.
Rep, M.; Kistler, H.C.
Comparative genomics is a powerful tool to infer the molecular basis of fungal pathogenicity and its evolution by identifying differences in gene content and genomic organization between fungi with different hosts or modes of infection. Through comparative analysis, pathogenicity-related chromosomes
Yu, Jie; Song, Yuqin; Ren, Yan; Qing, Yanting; Liu, Wenjun; Sun, Zhihong
The genomic diversity of different species within the genus Lactococcus and the relationships between genomic differentiation and environmental factors remain unclear. In this study, type isolates of ten Lactococcus species/subspecies were sequenced to assess their genomic characteristics, metabolic diversity, and phylogenetic relationships. The total genome sizes varied between 1.99 (Lactococcus plantarum) and 2.46 megabases (Mb; L. lactis subsp. lactis), and the G + C content ranged from 34.81 (L. lactis subsp. hordniae) to 39.67% (L. raffinolactis) with an average value of 37.02%. Analysis of genome dynamics indicated that the genus Lactococcus has an open pan-genome, while the core genome size decreased with sequential addition at the genus and species group levels. A phylogenetic dendrogram based on the concatenated amino acid sequences of 643 core genes was largely consistent with the phylogenetic tree obtained by 16S ribosomal RNA (rRNA) genes, but it provided a more robust phylogenetic resolution than the 16S rRNA gene-based analysis. Comparative genomics indicated that species in the genus Lactococcus had high degrees of diversity in genome size, gene content, and carbohydrate metabolism. This may be important for the specific adaptations that allow different Lactococcus species to survive in different environments. These results provide a quantitative basis for understanding the genomic and metabolic diversity within the genus Lactococcus, laying the foundation for future studies on taxonomy and functional genomics.
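For readers unfamiliar with the G + C content statistic reported above, a minimal sketch of how it is commonly computed from a nucleotide sequence follows. The function name and the choice to ignore ambiguous bases such as N are assumptions made for illustration, not details taken from the study.

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / counted
```

For a whole genome the same ratio would be taken over all contigs or replicons together, so that long sequences weigh more than short ones.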
Hamzah, Haider Mousa
In the microbial fuel cell (MFC) project, power generation from Shewanella oneidensis MR-1 was analyzed looking for a novel system for both energy generation and sustainability. The results suggest the possibility of generating electricity from different organic substances, which include agricultural and industrial by-products. Shewanella oneidensis MR-1 generates usable electrons at 30°C using both submerged and solid state cultures. In the MFC biocathode experiment, most of the CO2 generated at the anodic chamber was converted into bicarbonate due to the activity of carbonic anhydrase (CA) of the Gluconobacter sp.33 strain. These findings demonstrate the possibility of generation of electricity while at the same time allowing the biomimetic sequestration of CO2 using bacterial CA. In the mitochondrial genomes project, the filamentous fungal species Fusarium oxysporum was used as a model. This species causes wilt of several important agricultural crops. A previous study revealed that a highly variable region (HVR) in the mitochondrial DNA (mtDNA) of three species of Fusarium contained a large, variable unidentified open reading frame (LV-uORF). Using specific primers for two regions of the LV-uORF, six strains were found to contain the ORF by PCR and database searches identified 18 other strains outside of the Fusarium oxysporum species complex. The LV-uORF was also identified in three isolates of the F. oxysporum species complex. Interestingly, several F. oxysporum isolates lack the LV-uORF and instead contain 13 ORFs in the HVR, nine of which are unidentified. The high GC content and codon usage of the LV-uORF indicate that it did not co-evolve with other mt genes and was horizontally acquired and was introduced to the Fusarium lineage prior to speciation. The nonsynonymous/synonymous (dN/dS) ratio of the LV-uORFs (0.43) suggests it is under purifying selection and the putative polypeptide is predicted to be located in the mitochondrial membrane. Growth assays
Agassiz's desert tortoise (Gopherus agassizii) is a long-lived species native to the Mojave Desert and is listed as threatened under the US Endangered Species Act. To aid conservation efforts for preserving the genetic diversity of this species, we generated a whole genome reference sequence with an annotation based on deep transcriptome sequences of adult skeletal muscle, lung, brain, and blood. The draft genome assembly for G. agassizii has a scaffold N50 length of 252 kbp and a total length of 2.4 Gbp. Genome annotation reveals 20,172 protein-coding genes in the G. agassizii assembly, and that gene structure is more similar to chicken than other turtles. We provide a series of comparative analyses demonstrating (1) that turtles are among the slowest-evolving genome-enabled reptiles, (2) amino acid changes in genes controlling desert tortoise traits such as shell development, longevity and osmoregulation, and (3) fixed variants across the Gopherus species complex in genes related to desert adaptations, including circadian rhythm and innate immune response. This G. agassizii genome reference and annotation is the first such resource for any tortoise, and will serve as a foundation for future analysis of the genetic basis of adaptations to the desert environment, allow for investigation into genomic factors affecting tortoise health, disease and longevity, and serve as a valuable resource for additional studies in this species complex.
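The scaffold N50 quoted above is a standard assembly statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the assembly. A minimal sketch of that calculation follows; the function name is hypothetical and the example lengths in the note below are toy values, not data from the tortoise assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the assembly is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise ValueError("lengths must be non-empty")
```

For example, for toy scaffold lengths 2 through 10 (total 54), the three longest scaffolds sum to 27, so the N50 is 8.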
Elizabeth B Liu
In November of 2007 a human adenovirus (HAdV) was isolated from a bronchoalveolar lavage (BAL) sample recovered from a biopsy of an AIDS patient who presented with fever, cough, tachycardia, and expiratory wheezes. To better understand the isolated virus, the genome was sequenced and analyzed using bioinformatic and phylogenomic analysis. The results suggest that this novel virus, which is provisionally named HAdV-D59, may have been created from multiple recombination events. Specifically, the penton, hexon, and fiber genes have high nucleotide identity to HAdV-D19C, HAdV-D25, and HAdV-D56, respectively. Serological results demonstrated that HAdV-D59 has a neutralization profile that is similar yet not identical to that of HAdV-D25. Furthermore, we observed a two-fold difference between the ability of HAdV-D15 and HAdV-D25 to be neutralized by reciprocal antiserum indicating that the two hexon proteins may be more similar in epitopic conformation than previously assumed. In contrast, hexon loops 1 and 2 of HAdV-D15 and HAdV-D25 share 79.13 and 92.56 percent nucleotide identity, respectively. These data suggest that serology and genomics do not always correlate.
Checcucci, Alice; Mengoni, Alessio
Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that provides information and support for the annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE) Joint Genome Institute (JGI). The IMG platform contains both draft and complete genomes, including those sequenced by the Joint Genome Institute and other publicly available genomes. Genomes of strains belonging to the Archaea, Bacteria, and Eukarya domains are present, as well as those of viruses and plasmids. Here, we describe some essential features of the IMG system and a case study for pangenome analysis.
Yu, Xiang-Qin; Drew, Bryan T; Yang, Jun-Bo; Gao, Lian-Ming; Li, De-Zhu
Schima is an ecologically and economically important woody genus in the tea family (Theaceae). Unresolved species delimitations and phylogenetic relationships within Schima limit our understanding of the genus and hinder utilization of the genus for economic purposes. In the present study, we conducted a comparative analysis of the complete chloroplast (cp) genomes of 11 Schima species. Our results indicate that Schima cp genomes possess a typical quadripartite structure, with conserved genomic structure and gene order. The size of the Schima cp genome is about 157 kilo base pairs (kb). They consistently encode 114 unique genes, including 80 protein-coding genes, 30 tRNAs, and 4 rRNAs, with 17 duplicated in the inverted repeat (IR). These cp genomes are highly conserved and do not show obvious expansion or contraction of the IR region. The percent variability of the 68 coding and 93 noncoding (>150 bp) fragments is consistently less than 3%. The seven most widely touted DNA barcode regions as well as one promising barcode candidate showed low sequence divergence. Eight mutational hotspots were identified from the 11 cp genomes. These hotspots may potentially be useful as specific DNA barcodes for species identification of Schima. The 58 cpSSR loci reported here are complementary to the microsatellite markers identified from the nuclear genome, and will be leveraged for further population-level studies. Phylogenetic relationships among the 11 Schima species were resolved with strong support based on the cp genome data set, which corresponds well with the species distribution pattern. The data presented here will serve as a foundation to facilitate species identification, DNA barcoding and phylogenetic reconstructions for future exploration of Schima.
Min, Xueyang; Zhang, Zhengshe; Liu, Yisong; Wei, Xingyi; Liu, Zhipeng; Wang, Yanrong; Liu, Wenxian
Microsatellite (simple sequence repeat, SSR) markers are among the most widely used markers in marker-assisted breeding. As one type of functional marker, microRNA-based SSR (miRNA-SSR) markers have been exploited mainly in animals, but the development and characterization of miRNA-SSR markers in plants are still limited. In the present study, miRNA-SSR markers for Medicago truncatula (M. truncatula) were developed and their cross-species transferability in six leguminous species was evaluated. A total of 169 primer pairs were successfully designed from 130 M. truncatula miRNA genes, the majority of which were mononucleotide repeats (70.41%), followed by dinucleotide repeats (14.20%), compound repeats (11.24%) and trinucleotide repeats (4.14%). Functional classification of SSR-containing miRNA genes showed that all targets could be grouped into three Gene Ontology (GO) categories: 17 in biological process, 11 in molecular function, and 14 in cellular component. The miRNA-SSR markers showed high transferability in the other six leguminous species, ranging from 74.56% to 90.53%. Furthermore, 25 Mt-miRNA-SSR markers were used to evaluate polymorphisms in 20 alfalfa accessions; the polymorphism information content (PIC) values ranged from 0.39 to 0.89 with an average of 0.71, and the allele number per marker varied from 3 to 18 with an average of 7.88, indicating a high level of informativeness. The present study is the first to develop and characterize M. truncatula miRNA-SSRs and to demonstrate their transferability; these novel markers will be valuable for genetic diversity analysis, marker-assisted selection and genotyping in leguminous species.
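The polymorphism information content reported above is computed from the allele frequencies at each marker. The abstract does not say which estimator was used; the sketch below assumes the widely used Botstein et al. (1980) form, PIC = 1 - sum(p_i^2) - sum over i<j of 2 * p_i^2 * p_j^2. The function name and the example frequencies are made up for illustration.

```python
from itertools import combinations


def pic(freqs):
    """Polymorphism information content for one marker.

    freqs: allele frequencies at the marker (must sum to 1).
    Assumes the Botstein et al. (1980) formula; the abstract does not
    state which estimator the authors applied.
    """
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity: sum of squared allele frequencies.
    homozygosity = sum(p * p for p in freqs)
    # Correction term over every unordered pair of distinct alleles.
    cross = sum(2 * (p * p) * (q * q) for p, q in combinations(freqs, 2))
    return 1.0 - homozygosity - cross


# Hypothetical marker with three alleles.
print(round(pic([0.5, 0.3, 0.2]), 4))  # 0.5478
```

A marker with a single allele gives PIC = 0, and PIC rises toward 1 as the number of equally frequent alleles grows.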
Because it is suspected that gene content may partly explain host adaptation and ecology of pathogenic bacteria, it is important to study factors affecting genome composition and its evolution. While recent genomic advances have revealed extremely large pan-genomes for some bacterial species, it remains difficult to predict to what extent the gene pool is accessible within or transferable between populations. As genomes bear imprints of the history of the organisms, gene distribution pattern analyses should provide insights into the forces and factors at play in the shaping and maintaining of bacterial genomes. In this study, we revisited the data obtained from a previous CGH microarray analysis in order to assess the genomic plasticity of the R. solanacearum species complex. Gene distribution analyses demonstrated the remarkably dispersed genome of R. solanacearum, with more than half of the genes being accessory. From the reconstruction of the ancestral genome compositions, we were able to infer the number of gene gain and loss events along the phylogeny. Analyses of gene movement patterns reveal that factors associated with gene function, genomic localization and ecology delineate gene flow patterns. While the chromosome displayed lower rates of movement, the megaplasmid was clearly associated with hot-spots of gene gain and loss. Gene function was also confirmed to be an essential factor in gene gain and loss dynamics, with significant differences in movement patterns between different COG categories. Finally, analyses of gene distribution highlighted possible highways of horizontal gene transfer. Due to sampling and design bias, we can only speculate on factors at play in this gene movement dynamic. Further studies examining precise conditions that favor gene transfer would provide invaluable insights into the fate of bacteria, species delineation and the emergence of successful pathogens.
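The distinction between core and accessory genes used above can be made concrete with a presence/absence table. The following is a minimal sketch under simple assumptions: each strain is reduced to the set of genes detected in it, a gene is "core" if it is present in every strain, and everything else in the pan-genome is "accessory". Strain and gene names are hypothetical, and this is not the CGH-based procedure of the study.

```python
def split_core_accessory(presence):
    """Split a pan-genome into core and accessory genes.

    presence: dict mapping strain name -> set of genes detected in it.
    Returns (pan, core, accessory) as sets.
    """
    gene_sets = list(presence.values())
    if not gene_sets:
        raise ValueError("need at least one strain")
    pan = set().union(*gene_sets)          # every gene seen in any strain
    core = set.intersection(*gene_sets)    # genes shared by all strains
    accessory = pan - core                 # genes missing from at least one strain
    return pan, core, accessory


# Hypothetical presence/absence calls for three strains.
calls = {
    "strain_A": {"g1", "g2", "g3"},
    "strain_B": {"g1", "g2", "g4"},
    "strain_C": {"g1", "g5"},
}
pan, core, accessory = split_core_accessory(calls)
print(len(pan), sorted(core), len(accessory) / len(pan))
```

In this toy table four of the five genes are accessory, which is the kind of proportion the phrase "more than half of the genes being accessory" refers to.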
Dixit, Purushottam D; Pang, Tin Yau; Maslov, Sergei
While bacteria divide clonally, horizontal gene transfer followed by homologous recombination is now recognized as an important contributor to their evolution. However, the details of how the competition between clonality and recombination shapes genome diversity remain poorly understood. Using a computational model, we find two principal regimes in bacterial evolution and identify two composite parameters that dictate the evolutionary fate of bacterial species. In the divergent regime, characterized by either a low recombination frequency or strict barriers to recombination, cohesion due to recombination is not sufficient to overcome the mutational drift. As a consequence, the divergence between pairs of genomes in the population steadily increases in the course of their evolution. The species lacks genetic coherence, with sexually isolated clonal subpopulations continuously formed and dissolved. In contrast, in the metastable regime, characterized by a high recombination frequency combined with low barriers to recombination, genomes continuously recombine with the rest of the population. The population remains genetically cohesive and temporally stable. Notably, the transition between these two regimes can be affected by relatively small changes in evolutionary parameters. Using Multi Locus Sequence Typing (MLST) data, we classify a number of bacterial species as either the divergent or the metastable type. Generalizations of our framework to include selection, ecologically structured populations, and horizontal gene transfer of nonhomologous regions are discussed as well. Copyright © 2017 by the Genetics Society of America.
Ma, Qiuyue; Li, Shuxian; Bi, Changwei; Hao, Zhaodong; Sun, Congrui; Ye, Ning
Ziziphus jujuba is an important woody plant with high economic and medicinal value. Here, we analyzed and characterized the complete chloroplast (cp) genome of Z. jujuba, the first member of the Rhamnaceae family for which the chloroplast genome sequence has been reported. We also built a web browser for navigating the cp genome of Z. jujuba (http://bio.njfu.edu.cn/gb2/gbrowse/Ziziphus_jujuba_cp/). Sequence analysis showed that this cp genome is 161,466 bp long and has a typical quadripartite structure of large (LSC, 89,120 bp) and small (SSC, 19,348 bp) single-copy regions separated by a pair of inverted repeats (IRs, 26,499 bp). The sequence contained 112 unique genes, including 78 protein-coding genes, 30 transfer RNAs, and four ribosomal RNAs. The genome structure, gene order, GC content, and codon usage are similar to those of other typical angiosperm cp genomes. A total of 38 tandem repeats, two forward repeats, and three palindromic repeats were detected in the Z. jujuba cp genome. Simple sequence repeat (SSR) analysis revealed that most SSRs were AT-rich. The homopolymer regions in the cp genome of Z. jujuba were verified and manually corrected by Sanger sequencing. One-third of mononucleotide repeats were found to be erroneously sequenced by 454 pyrosequencing, which yielded sequences 1-4 bases shorter than those obtained by Sanger sequencing. Analyzing the cp genome of Z. jujuba revealed that the IR contraction and expansion events resulted in ycf1 and rps19 pseudogenes. A phylogenetic analysis based on 64 protein-coding genes showed that Z. jujuba was closely related to members of the Elaeagnaceae family, which will be helpful for phylogenetic studies of other Rosales species. The complete cp genome sequence of Z. jujuba will facilitate population, phylogenetic, and cp genetic engineering studies of this economic plant.
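In a quadripartite plastome the total length is the large single-copy region plus the small single-copy region plus two copies of the inverted repeat. The short check below uses only the region sizes quoted in the abstract and confirms that they add up to the reported genome length; the variable names are illustrative.

```python
# Region sizes in bp, as quoted in the abstract.
lsc = 89_120   # large single-copy region
ssc = 19_348   # small single-copy region
ir = 26_499    # one inverted repeat copy (the genome carries two)

total = lsc + ssc + 2 * ir
print(total)  # 161466

# The sum matches the reported genome length of 161,466 bp.
assert total == 161_466

# Share of the genome occupied by the two inverted repeat copies.
ir_share = 2 * ir / total
print(f"{ir_share:.1%}")
```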
Genome wide analysis of orthologous clusters is an important component of comparative genomics studies. Identifying the overlap among orthologous clusters can enable us to elucidate the function and evolution of proteins across multiple species. Here, we report a web platform named OrthoVenn that i...
Mattagajasingh, Ilwola; Mukherjee, Arup Kumar; Das, Premananda
Thirty-one species of Mammillaria were selected to study the molecular phylogeny using random amplified polymorphic DNA (RAPD) markers. High amount of mucilage (gelling polysaccharides) present in Mammillaria was a major obstacle in isolating good quality genomic DNA. The CTAB (cetyl trimethyl ammonium bromide) method was modified to obtain good quality genomic DNA. Twenty-two random decamer primers resulted in 621 bands, all of which were polymorphic. The similarity matrix value varied from 0.109 to 0.622 indicating wide variability among the studied species. The dendrogram obtained from the unweighted pair group method using arithmetic averages (UPGMA) analysis revealed that some of the species did not follow the conventional classification. The present work shows the usefulness of RAPD markers for genetic characterization to establish phylogenetic relations among Mammillaria species.
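RAPD bands are scored as present (1) or absent (0) for each species, and a pairwise similarity matrix is then built from those profiles before clustering. The abstract does not name the similarity coefficient; the sketch below assumes Jaccard's coefficient (shared bands divided by bands present in either species), and the three band profiles are invented for illustration.

```python
def jaccard(a, b):
    """Jaccard similarity between two equal-length 0/1 band profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must score the same set of bands")
    shared = sum(1 for x, y in zip(a, b) if x and y)   # band present in both
    either = sum(1 for x, y in zip(a, b) if x or y)    # band present in at least one
    return shared / either if either else 0.0


def similarity_matrix(profiles):
    """Return {(name1, name2): similarity} for every unordered pair."""
    names = sorted(profiles)
    return {
        (n1, n2): jaccard(profiles[n1], profiles[n2])
        for i, n1 in enumerate(names)
        for n2 in names[i + 1:]
    }


# Hypothetical band scores for three species across five band positions.
bands = {
    "sp1": [1, 1, 0, 1, 0],
    "sp2": [1, 0, 0, 1, 1],
    "sp3": [0, 1, 1, 0, 0],
}
for pair, sim in similarity_matrix(bands).items():
    print(pair, sim)
```

A matrix like this (or one minus it, as a distance) is what a UPGMA routine would take as input to build the dendrogram.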
Kelly, William J.; Altermann, Eric; Lambie, Suzanne C.; Leahy, Sinead C.
Phages of the P335 species infect Lactococcus lactis and have been particularly studied because of their association with strains of L. lactis subsp. cremoris used as dairy starter cultures. Unlike other lactococcal phages, those of the P335 species may have a temperate or lytic lifestyle, and are believed to originate from the starter cultures themselves. We have sequenced the genome of L. lactis subsp. cremoris KW2 isolated from fermented corn and found that it contains an integrated P335 species prophage. This 41 kb prophage (Φ KW2) has a mosaic structure with functional modules that are highly similar to several other phages of the P335 species associated with dairy starter cultures. Comparison of the genomes of 26 phages of the P335 species, with either a lytic or temperate lifestyle, shows that they can be divided into three groups and that the morphogenesis gene region is the most conserved. Analysis of these phage genomes in conjunction with the genomes of several L. lactis strains shows that prophage insertion is site specific and occurs at seven different chromosomal locations. Exactly how induced or lytic phages of the P335 species interact with carbohydrate cell surface receptors in the host cell envelope remains to be determined. Genes for the biosynthesis of a variable cell surface polysaccharide and for lipoteichoic acids (LTAs) are found in L. lactis and are the main candidates for phage receptors, as the genes for other cell surface carbohydrates have been lost from dairy starter strains. Overall, phages of the P335 species appear to have had only a minor role in the adaptation of L. lactis subsp. cremoris strains to the dairy environment, and instead they appear to be an integral part of the L. lactis chromosome. There remains a great deal to be discovered about their role, and their contribution to the evolution of the bacterial genome. PMID:24009606
Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France
Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often makes it possible to characterize pathogens. There is therefore a strong interest in understanding the variability of this trait and its possible correlations with lifestyle. Two kinds of events affect genome organization: on the one hand, translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but alter the gene neighborhoods by breaking the syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. We developed an approach in which we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In the first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence in time of backbone stability. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species made it possible to identify genomes whose gene order diverges from that of their conspecifics. The inter-species analysis indicates that pathogens are more often unstable than non-pathogens. In the second part of the text, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. By studying species with multiple sequenced genomes available, we were
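The idea of a backbone gene order can be illustrated with a very simple conservation score. The sketch below is not the authors' estimator: it assumes linear chromosomes with each gene appearing once, keeps only the genes shared by both genomes (the backbone), and reports the fraction of neighbouring gene pairs in one backbone that are also neighbours in the other. Gene names and orders are hypothetical.

```python
def backbone(order, shared):
    """Keep only shared genes, preserving their order along the chromosome."""
    return [g for g in order if g in shared]


def adjacencies(order):
    """Unordered neighbouring gene pairs along a linear gene order."""
    return {frozenset(pair) for pair in zip(order, order[1:])}


def adjacency_conservation(order_a, order_b):
    """Fraction of backbone adjacencies of genome A also found in genome B."""
    shared = set(order_a) & set(order_b)
    adj_a = adjacencies(backbone(order_a, shared))
    adj_b = adjacencies(backbone(order_b, shared))
    if not adj_a:
        return 0.0
    return len(adj_a & adj_b) / len(adj_a)


# Hypothetical gene orders: 'x' and 'y' are genome-specific insertions,
# and genes 'c' and 'd' have swapped places in genome B.
genome_a = ["a", "b", "c", "x", "d", "e"]
genome_b = ["a", "b", "d", "c", "y", "e"]
print(adjacency_conservation(genome_a, genome_b))  # 0.5
```

Because insertions are dropped before adjacencies are counted, this score reflects rearrangements of the backbone only; comparing it with the same score computed on the full gene orders would expose the separate effect of insertions and deletions on gene neighbourhoods.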
Fu, Chao-Nan; Li, Hong-Tao; Milne, Richard; Zhang, Ting; Ma, Peng-Fei; Yang, Jing; Li, De-Zhu; Gao, Lian-Ming
The Cornales is the basal lineage of the asterids, the largest angiosperm clade. Phylogenetic relationships within the order were previously not fully resolved. Fifteen plastid genomes representing 14 species, ten genera and seven families of Cornales were newly sequenced for comparative analyses of genome features, evolution, and phylogenomics based on different partitioning schemes and filtering strategies. All plastomes of the 14 Cornales species had the typical quadripartite structure with a genome size ranging from 156,567 bp to 158,715 bp, which included two inverted repeats (25,859-26,451 bp) separated by a large single-copy region (86,089-87,835 bp) and a small single-copy region (18,250-18,856 bp). These plastomes encoded the same set of 114 unique genes including 31 transfer RNA, 4 ribosomal RNA and 79 coding genes, with an identical gene order across all examined Cornales species. Two genes (rpl22 and ycf15) contained premature stop codons in seven and five species respectively. The phylogenetic relationships among all sampled species were fully resolved with maximum support. Different filtering strategies (none, light and strict) of sequence alignment did not have an effect on these relationships. The topology recovered from coding and noncoding data sets was the same as for the whole plastome, regardless of filtering strategy. Moreover, mutational hotspots and highly informative regions were identified. Phylogenetic relationships among families and intergeneric relationships within families of Cornales were well resolved. Different filtering strategies and partitioning schemes do not influence the relationships. Plastid genomes have great potential to resolve deep phylogenetic relationships of plants.
Gaynor, Kaitlyn M; Solomon, Joseph W; Siller, Stefanie; Jessell, Linnet; Duffy, J Emmett; Rubenstein, Dustin R
Molecular markers are powerful tools for studying patterns of relatedness and parentage within populations and for making inferences about social evolution. However, the development of molecular markers for simultaneous study of multiple species presents challenges, particularly when species exhibit genome duplication or polyploidy. We developed microsatellite markers for Synalpheus shrimp, a genus in which species exhibit not only great variation in social organization, but also interspecific variation in genome size and partial genome duplication. From the four primary clades within Synalpheus, we identified microsatellites in the genomes of four species and in the consensus transcriptome of two species. Ultimately, we designed and tested primers for 143 microsatellite markers across 25 species. Although the majority of markers were disomic, many markers were polysomic for certain species. Surprisingly, we found no relationship between genome size and the number of polysomic markers. As expected, markers developed for a given species amplified better for closely related species than for more distant relatives. Finally, the markers developed from the transcriptome were more likely to work successfully and to be disomic than those developed from the genome, suggesting that consensus transcriptomes are likely to be conserved across species. Our findings suggest that the transcriptome, particularly consensus sequences from multiple species, can be a valuable source of molecular markers for taxa with complex, duplicated genomes. © 2017 John Wiley & Sons Ltd.
OhEigeartaigh, Sean S
Background: In standard BLAST searches, no information other than the sequences of the query and the database entries is considered. However, in situations where two genes from different species have only borderline similarity in a BLAST search, the discovery that the genes are located within a region of conserved gene order (synteny) can provide additional evidence that they are orthologs. Thus, for interpreting borderline search results, it would be useful to know whether the syntenic context of a database hit is similar to that of the query. This principle has often been used in investigations of particular genes or genomic regions, but to our knowledge it has never been implemented systematically. Results: We made use of the synteny information contained in the Yeast Gene Order Browser database for 11 yeast species to carry out a systematic search for protein-coding genes that were overlooked in the original annotations of one or more yeast genomes but which are syntenic with their orthologs. Such genes tend to have been overlooked because they are short, highly divergent, or contain introns. The key features of our software - called SearchDOGS - are that the database entries are classified into sets of genomic segments that are already known to be orthologous, and that very weak BLAST hits are retained for further analysis if their genomic location is similar to that of the query. Using SearchDOGS we identified 595 additional protein-coding genes among the 11 yeast species, including two new genes in Saccharomyces cerevisiae. We found additional genes for the mating pheromone a-factor in six species including Kluyveromyces lactis. Conclusions: SearchDOGS has proven highly successful for identifying overlooked genes in the yeast genomes. We anticipate that our approach can be adapted for study of further groups of species, such as bacterial genomes. More generally, the concept of doing sequence similarity searches against databases to which external
Wu, L.; Liu, X.; Fields, M.W.; Thompson, D.K.; Bagwell, C.E.; Tiedje, J. M.; Hazen, T.C.; Zhou, J.
The definition and delineation of microbial species are of great importance and challenge due to the extent of evolution and diversity. Whole-genome DNA-DNA hybridization is the cornerstone for defining procaryotic species relatedness, but obtaining pairwise DNA-DNA reassociation values for a comprehensive phylogenetic analysis of procaryotes is tedious and time consuming. A previously described microarray format containing whole-genomic DNA (the community genome array or CGA) was rigorously evaluated as a high-throughput alternative to the traditional DNA-DNA reassociation approach for delineating procaryotic species relationships. DNA similarities for multiple bacterial strains obtained with the CGA-based hybridization were comparable to those obtained with various traditional whole-genome hybridization methods (r=0.87, P<0.01). Significant linear relationships were also observed between the CGA-based genome similarities and those derived from small subunit (SSU) rRNA gene sequences (r=0.79, P<0.0001), gyrB sequences (r=0.95, P<0.0001) or REP- and BOX-PCR fingerprinting profiles (r=0.82, P<0.0001). The CGA hybridization-revealed species relationships in several representative genera, including Pseudomonas, Azoarcus and Shewanella, were largely congruent with previous classifications based on various conventional whole-genome DNA-DNA reassociation, SSU rRNA and/or gyrB analyses. These results suggest that CGA-based DNA-DNA hybridization could serve as a powerful, high-throughput format for determining species relatedness among microorganisms.
[Phylogenetic relationships of the species of Oxytropis DC. subg. Oxytropis and Phacoxytropis (Fabaceae) from Asian Russia inferred from the nucleotide sequence analysis of the intergenic spacers of the chloroplast genome].
Kholina, A B; Kozyrenko, M M; Artyukova, E V; Sandanov, D V; Andrianova, E A
The nucleotide sequence analysis of trnH–psbA, trnL–trnF, and trnS–trnG intergenic spacer regions of chloroplast DNA performed in the representatives of the genus Oxytropis from Asian Russia provided clarification of the phylogenetic relationships of some species and sections in the subgenera Oxytropis and Phacoxytropis and in the genus Oxytropis as a whole. Only the section Mesogaea corresponds to the subgenus Phacoxytropis, while the section Janthina of the same subgenus groups together with the sections of the subgenus Oxytropis. The sections Chrysantha and Ortholoma of the subgenus Oxytropis are not only closely related to each other, but together with the section Mesogaea, they are grouped into the subgenus Phacoxytropis. It seems likely that the sections Chrysantha and Ortholoma should be assigned to the subgenus Phacoxytropis, and the section Janthina should be assigned to the subgenus Oxytropis. The molecular differences were identified between O. coerulea and O. mandshurica from the section Janthina that were indicative of considerable divergence of their chloroplast genomes and the species independence of the taxa. The species independence of O. czukotica belonging to the section Arctobia was also confirmed.
Kirsten Maren Ellegaard
The importance of host-specialization to speciation processes in obligate host-associated bacteria is well known, as is the ability of recombination to generate cohesion in bacterial populations. However, whether divergent strains of highly recombining intracellular bacteria, such as Wolbachia, can maintain their genetic distinctness when infecting the same host is not known. We first developed a protocol for the genome sequencing of uncultivable endosymbionts. Using this method, we have sequenced the complete genomes of the Wolbachia strains wHa and wNo, which occur as natural double infections in Drosophila simulans populations on the Seychelles and in New Caledonia. Taxonomically, wHa belongs to supergroup A and wNo to supergroup B. A comparative genomics study including additional strains supported the supergroup classification scheme and revealed 24 and 33 group-specific genes, putatively involved in host-adaptation processes. Recombination frequencies were high for strains of the same supergroup despite different host-preference patterns, leading to genomic cohesion. The inferred recombination fragments for strains of different supergroups were short, and the genomes of the co-infecting Wolbachia strains wHa and wNo were not more similar to each other and did not share more genes than other A- and B-group strains that infect different hosts. We conclude that Wolbachia strains of supergroups A and B represent genetically distinct clades, and that strains of different supergroups can co-exist in the same arthropod host without converging into the same species. This suggests that the supergroups are irreversibly separated and that barriers other than host-specialization are able to maintain distinct clades in recombining endosymbiont populations. Acquiring a good knowledge of the barriers to genetic exchange in Wolbachia will advance our understanding of how endosymbiont communities are constructed from vertically and horizontally
Rehmannia is a non-parasitic genus in Orobanchaceae including six species mainly distributed in central and north China. Its phylogenetic position and infrageneric relationships remain uncertain due to potential hybridization and polyploidization. In this study, we sequenced and compared the complete chloroplast genomes of six Rehmannia species using Illumina sequencing technology to elucidate the interspecific variations. Rehmannia plastomes exhibited typical quadripartite and circular structures with good synteny of gene order. The complete genomes ranged from 153,622 bp to 154,055 bp in length, including 133 genes encoding 88 proteins, 37 tRNAs, and 8 rRNAs. Three genes (rpoA, rpoC2, accD) have potentially experienced positive selection. Plastome size variation of Rehmannia was mainly ascribed to the expansion and contraction of the border regions between the inverted repeat (IR) region and the single-copy (SC) regions. Despite the conserved structure of Rehmannia plastomes, sequence variations provide useful phylogenetic information. Phylogenetic trees of 23 Lamiales species reconstructed with the complete plastomes suggested that Rehmannia was monophyletic and sister to the clade of Lindenbergia and the parasitic taxa in Orobanchaceae. The interspecific relationships within Rehmannia were completely different from those in previous studies. In the future, population phylogenomic work based on plastomes is urgently needed to clarify the evolutionary history of Rehmannia.
Varghese, Neha; Mukherjee, Supratim; Ivanova, Natalia; Mavromatis, Konstantinos; Konstantinidis, Konstantinos; Kyrpides, Nikos; Pati, Amrita
For a long time prokaryotic species definition has been under debate and a constant source of turmoil in microbiology. This has recently prompted the ASM to call for a scalable and reproducible technique, which uses meaningful commonalities to cluster microorganisms into groups corresponding to prokaryotic species. Whole-genome Average Nucleotide Identity (gANI) was previously suggested as a measure of genetic distance that generally agrees with prokaryotic species assignments based on the accepted best practices (DNA-DNA hybridization and 16S rDNA similarity). In this work, we prove that gANI is indeed the meaningful commonality based on which microorganisms can be grouped into the aforementioned clusters. By analyzing 1.76 million pairs of genomes we find that identification of the closest relatives of an organism via gANI is precise, scalable, reproducible, and reflects the evolutionary dynamics of microbes. We model the previously unexplored statistical properties of gANI using 6,000 microbial genomes and apply species-specific gANI cutoffs to reveal anomalies in the current taxonomic species definitions for almost 50% of the species with multiple genome sequences. We also provide evidence of speciation events and genetic continuums in 17.8% of those species. We consider disagreements between gANI-based groupings and named species and demonstrate that the former have all the desired features to serve as the much-needed natural groups for moving forward with taxonomy. Further, the groupings identified are presented in detail at http://ani.jgi-psf.org to facilitate comprehensive downstream analysis for researchers across different disciplines.
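An average nucleotide identity score is, in essence, a length-weighted mean of the percent identities of matched genes between two genomes. The sketch below shows that weighting step only, under stated assumptions: the input is a list of (percent identity, aligned length) pairs for genes already paired as reciprocal best hits, and the function name, input layout and numbers are hypothetical. It is not the pipeline described in the abstract, which also involves choosing and filtering the hits.

```python
def weighted_identity(hits):
    """Length-weighted mean percent identity over paired genes.

    hits: iterable of (percent_identity, aligned_length_bp) tuples,
    assumed to come from reciprocal best hits between two genomes.
    """
    hits = list(hits)
    total_length = sum(length for _, length in hits)
    if total_length == 0:
        raise ValueError("no aligned bases to average over")
    # Longer alignments contribute proportionally more to the mean.
    weighted_sum = sum(identity * length for identity, length in hits)
    return weighted_sum / total_length


# Hypothetical reciprocal best hits between two genomes.
example_hits = [(98.0, 900), (95.0, 1200), (99.0, 900)]
print(round(weighted_identity(example_hits), 2))  # 97.1
```

Weighting by aligned length keeps short, highly similar genes from dominating the score, which is why a plain mean of the three identities (97.33) differs from the weighted value.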
Li, Peirong; Zhang, Shujiang; Li, Fei; Zhang, Shifan; Zhang, Hui; Wang, Xiaowu; Sun, Rifei; Bonnema, Guusje; Borm, Theo J.A.
The Brassica genus comprises many economically important worldwide cultivated crops. The well-established model of the Brassica genus, U’s triangle, consists of three basic diploid plant species (Brassica rapa, Brassica oleracea, and Brassica nigra) and three amphidiploid species (Brassica napus,
Umemura, Myco; Koike, Hideaki; Yamane, Noriko; Koyama, Yoshinori; Satou, Yuki; Kikuzato, Ikuya; Teruya, Morimi; Tsukahara, Masatoshi; Imada, Yumi; Wachi, Youji; Miwa, Yukino; Yano, Shuichi; Tamano, Koichi; Kawarabayasi, Yutaka; Fujimori, Kazuhiro E.
Aspergillus oryzae has been utilized for over 1000 years in Japan for the production of various traditional foods, and a large number of A. oryzae strains have been isolated and/or selected for the effective fermentation of food ingredients. Characteristics of genetic alterations among the strains used are of particular interest in studies of A. oryzae. Here, we have sequenced the whole genome of an industrial fungal isolate, A. oryzae RIB326, by using a next-generation sequencing system and ...
Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome.
Caulerpa species are marine green algae, which often act as invasive species with rapid clonal proliferation when growing outside their native biogeographical borders. Despite many publications on the genetics and ecology of Caulerpa species, their life history and ploidy levels are still to be resolved and are the subject of large controversy. While some authors claimed that the thallus found in nature has a haplodiplobiontic life cycle with heteromorphic alternation of generations, other authors claimed a diploid or haploid life cycle with only one generation involved. DAPI-staining with image analysis and microspectrophotometry were used to estimate relative nuclear DNA contents in three species of Caulerpa from the Mediterranean, at individual, population and species levels. Results show that ploidy levels and genome size vary in these three Caulerpa species, with a reduction in genome size for the invasive ones. Caulerpa species in the Mediterranean are polyploids in different life history phases; all sampled C. taxifolia and C. racemosa var. cylindracea were in haplophasic phase, but in C. prolifera, the native species, individuals were found in both diplophasic and haplophasic phases. Different levels of endopolyploidy were found in both C. prolifera and C. racemosa var. cylindracea. Life history is elucidated for the Mediterranean C. prolifera and it is hypothesized that haplophasic dominance in C. racemosa var. cylindracea and C. taxifolia is a beneficial trait for their invasive strategies.
Brucella species are the most important zoonotic pathogens worldwide and cause considerable harm to humans and animals. In this study, we present the complete genome of B. suis 019 isolated from sheep (ovine) with epididymitis. B. suis 019 has a rough phenotype and can infect sheep, rhesus monkeys and possibly humans. The comparative genome analysis demonstrated that B. suis 019 is closest to the vaccine strain B. suis bv. 1 str. S2. Further analysis associated the rsh gene with the pathogenicity of B. suis 019, and the WbkA gene with the rough phenotype of B. suis 019. The 019 complete genome data was deposited in the GenBank database with ID PRJNA308608.
Chabannes, Matthieu; Baurens, Franc-Christophe; Duroy, Pierre-Olivier; Bocs, Stéphanie; Vernerey, Marie-Stéphanie; Rodier-Goud, Marguerite; Barbe, Valérie; Gayral, Philippe
Plant pararetroviruses integrate serendipitously into their host genomes. The banana genome harbors integrated copies of banana streak virus (BSV) named endogenous BSV (eBSV) that are able to release infectious pararetrovirus. In this investigation, we characterized integrants of three BSV species—Goldfinger (eBSGFV), Imove (eBSImV), and Obino l'Ewai (eBSOLV)—in the seedy Musa balbisiana Pisang klutuk wulung (PKW) by studying their molecular structure, genomic organization, genomic landscape, and infectious capacity. All eBSVs exhibit extensive viral genome duplications and rearrangements. eBSV segregation analysis on an F1 population of PKW combined with fluorescent in situ hybridization analysis showed that eBSImV, eBSOLV, and eBSGFV are each present at a single locus. eBSOLV and eBSGFV contain two distinct alleles, whereas eBSImV has two structurally identical alleles. Genotyping of both eBSV and viral particles expressed in the progeny demonstrated that only one allele for each species is infectious. The infectious allele of eBSImV could not be identified since the two alleles are identical. Finally, we demonstrate that eBSGFV and eBSOLV are located on chromosome 1 and eBSImV is located on chromosome 2 of the reference Musa genome published recently. The structure and evolution of eBSVs suggest sequential integration into the plant genome, and haplotype divergence analysis confirms that the three loci display differential evolution. Based on our data, we propose a model for BSV integration and eBSV evolution in the Musa balbisiana genome. The mutual benefits of this unique host-pathogen association are also discussed. PMID:23720724
Jolly Emmitt R
Background: A major challenge in computational genomics is the development of methodologies that allow accurate genome-wide prediction of the regulatory targets of a transcription factor. We present a method for target identification that combines experimental characterization of binding requirements with computational genomic analysis. Results: Our method identified potential target genes of the transcription factor Ndt80, a key transcriptional regulator involved in yeast sporulation, using the combined information of binding affinity, positional distribution, and conservation of the binding sites across multiple species. We have also developed a mathematical approach to compute the false positive rate and the total number of targets in the genome based on the multiple selection criteria. Conclusion: We have shown that combining biochemical characterization and computational genomic analysis leads to accurate identification of the genome-wide targets of a transcription factor. The method can be extended to other transcription factors and can complement other genomic approaches to transcriptional regulation.
Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu
The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, where the users are allowed to create an ortholog table among any specified set of organisms. Because of the rapid increase in microbial genome data owing to the next-generation sequencing technology, it becomes increasingly challenging to maintain high-quality orthology relationships while allowing the users to incorporate the latest genomic data available into an analysis. Because many of the recently accumulating genomic data are draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows the users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated into an existing ortholog table created only from the complete genome data in an incremental manner to prevent low-quality draft data from affecting clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program for improving domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Vamathevan, Jessica J. [Computational Biology, Quantitative Sciences, GlaxoSmithKline, Stevenage (United Kingdom)]; Hall, Matthew D.; Hasan, Samiul; Woollard, Peter M. [Computational Biology, Quantitative Sciences, GlaxoSmithKline, Stevenage (United Kingdom)]; Xu, Meng; Yang, Yulan; Li, Xin; Wang, Xiaoli [BGI-Shenzhen, Shenzhen (China)]; Kenny, Steve [Safety Assessment, PTS, GlaxoSmithKline, Ware (United Kingdom)]; Brown, James R. [Computational Biology, Quantitative Sciences, GlaxoSmithKline, Collegeville, PA (United States)]; Huxley-Jones, Julie [UK Platform Technology Sciences (PTS) Operations and Planning, PTS, GlaxoSmithKline, Stevenage (United Kingdom)]; Lyon, Jon; Haselden, John [Safety Assessment, PTS, GlaxoSmithKline, Ware (United Kingdom)]; Min, Jiumeng [BGI-Shenzhen, Shenzhen (China)]; Sanseau, Philippe [Computational Biology, Quantitative Sciences, GlaxoSmithKline, Stevenage (United Kingdom)]
Improving drug attrition remains a challenge in pharmaceutical discovery and development. A major cause of early attrition is the demonstration of safety signals which can negate any therapeutic index previously established. Safety attrition needs to be put in context of clinical translation (i.e. human relevance) and is negatively impacted by differences between animal models and human. In order to minimize such an impact, an earlier assessment of pharmacological target homology across animal model species will enhance understanding of the context of animal safety signals and aid species selection during later regulatory toxicology studies. Here we sequenced the genomes of the Sus scrofa Göttingen minipig and the Canis familiaris beagle, two widely used animal species in regulatory safety studies. Comparative analyses of these new genomes with other key model organisms, namely mouse, rat, cynomolgus macaque, rhesus macaque, two related breeds (S. scrofa Duroc and C. familiaris boxer) and human reveal considerable variation in gene content. Key genes in toxicology and metabolism studies, such as the UGT2 family, CYP2D6, and SLCO1A2, displayed unique duplication patterns. Comparisons of 317 known human drug targets revealed surprising variation such as species-specific positive selection, duplication and higher occurrences of pseudogenized targets in beagle (41 genes) relative to minipig (19 genes). These data will facilitate the more effective use of animals in biomedical research. - Highlights: • Genomes of the minipig and beagle dog, two species used in pharmaceutical studies. • First systematic comparative genome analysis of human and six experimental animals. • Key drug toxicology genes display unique duplication patterns across species. • Comparison of 317 drug targets show species-specific evolutionary patterns.
Mel'nyk, V M; Spiridonova, K V; Andrieiev, I O; Strashniuk, N M; Kunakh, V A
A comparative study of the genomes of intact plants representing several species of the genus Gentiana L., as well as of cultured cells of G. lutea and G. punctata, was performed using restriction analysis. Restriction fragment patterns were found to be species-specific for the studied representatives of this genus. Differences were found between the electrophoretic patterns of digested DNA purified from rhizomes and leaves of G. lutea and G. punctata. Changes were detected in the genomes of G. lutea and G. punctata cells cultured in vitro compared with the genomes of intact plants. The data obtained indicate that some of these changes may be nonrandom.
Zhang, Yingxiao; Iaffaldano, Brian J; Zhuang, Xiaofeng; Cardina, John; Cornish, Katrina
Rubber dandelion (Taraxacum kok-saghyz, TK) is being developed as a domestic source of natural rubber to meet increasing global demand. However, the domestication of TK is complicated by its colocation with two weedy dandelion species, Taraxacum brevicorniculatum (TB) and the common dandelion (Taraxacum officinale, TO). TB is often present as a seed contaminant within TK accessions, while TO is a pandemic weed, which may have the potential to hybridize with TK. To discriminate these species at the molecular level, and facilitate gene flow studies between the potential rubber crop, TK, and its weedy relatives, we generated genomic and marker resources for these three dandelion species. Complete chloroplast genome sequences of TK (151,338 bp), TO (151,299 bp), and TB (151,282 bp) were obtained using the Illumina GAII and MiSeq platforms. Chloroplast sequences were analyzed and annotated for all the three species. Phylogenetic analysis within Asteraceae showed that TK has a closer genetic distance to TB than to TO and Taraxacum species were most closely related to lettuce (Lactuca sativa). By sequencing multiple genotypes for each species and testing variants using gel-based methods, four chloroplast Single Nucleotide Polymorphism (SNP) variants were found to be fixed between TK and TO in large populations, and between TB and TO. Additionally, Expressed Sequence Tag (EST) resources developed for TO and TK permitted the identification of five nuclear species-specific SNP markers. The availability of chloroplast genomes of these three dandelion species, as well as chloroplast and nuclear molecular markers, will provide a powerful genetic resource for germplasm differentiation and purification, and the study of potential gene flow among Taraxacum species.
Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M
The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome is comprised of 16,345 bp, and contains the expected 37 genes and control region. The majority of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeats region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), which is the largest number of tandem repeats among all the known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that the pattern of nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea.
Troyer, Jennifer L; Pecon-Slattery, Jill; Roelke, Melody E; Johnson, Warren; VandeWoude, Sue; Vazquez-Salat, Nuria; Brown, Meredith; Frank, Laurence; Woodroffe, Rosie; Winterbach, Christiaan; Winterbach, Hanlie; Hemson, Graham; Bush, Mitch; Alexander, Kathleen A; Revilla, Eloy; O'Brien, Stephen J
Feline immunodeficiency virus (FIV) infects numerous wild and domestic feline species and is closely related to human immunodeficiency virus (HIV) and simian immunodeficiency virus (SIV). Species-specific strains of FIV have been described for domestic cat (Felis catus), puma (Puma concolor), lion (Panthera leo), leopard (Panthera pardus), and Pallas' cat (Otocolobus manul). Here, we employ a three-antigen Western blot screening (domestic cat, puma, and lion FIV antigens) and PCR analysis to survey worldwide prevalence, distribution, and genomic differentiation of FIV based on 3,055 specimens from 35 Felidae and 3 Hyaenidae species. Although FIV infects a wide variety of host species, it is confirmed to be endemic in free-ranging populations of nine Felidae and one Hyaenidae species. These include the large African carnivores (lion, leopard, cheetah, and spotted hyena), where FIV is widely distributed in multiple populations; most of the South American felids (puma, jaguar, ocelot, margay, Geoffroy's cat, and tigrina), which maintain a lower FIV-positive level throughout their range; and two Asian species, the Pallas' cat, which has a species-specific strain of FIV, and the leopard cat, which has a domestic cat FIV strain in one population. Phylogenetic analysis of FIV proviral sequence demonstrates that most species for which FIV is endemic harbor monophyletic, genetically distinct species-specific FIV strains, suggesting that FIV transfer between cat species has occurred in the past but is quite infrequent today.
Nicholas R Thomson
The human enteropathogen, Yersinia enterocolitica, is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype 0:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost, or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen, Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common
Andersson Jan O
Background: Giardia intestinalis is a protozoan parasite that causes diarrhea in a wide range of mammalian species. To further understand the genetic diversity within the Giardia intestinalis species, we have performed genome sequencing and analysis of a wild-type Giardia intestinalis sample from the assemblage E group, isolated from a pig. Results: We identified 5012 protein coding genes, the majority of which are conserved compared to the previously sequenced genomes of the WB and GS strains in terms of microsynteny and sequence identity. Despite this, there is an unexpectedly large number of chromosomal rearrangements and several smaller structural changes that are present in all chromosomes. Novel members of the VSP, NEK Kinase and HCMP gene families were identified, which may reveal possible mechanisms for host specificity and new avenues for antigenic variation. We used comparative genomics of the three diverse Giardia intestinalis isolates P15, GS and WB to define a core proteome for this species complex and to identify lineage-specific genes. Extensive analyses of polymorphisms in the core proteome of Giardia revealed differential rates of divergence among cellular processes. Conclusions: Our results indicate that despite a well conserved core of genes there is significant genome variation between Giardia isolates in terms of gene content, gene polymorphisms, structural chromosomal variations and surface molecule repertoires. This study improves the annotation of the Giardia genomes and enables the identification of functionally important variation.
Prabha, Ratna; Singh, Dhananjaya P; Sinha, Swati; Ahmad, Khurshid; Rai, Anil
With the increasing accumulation of genomic sequence information of prokaryotes, the study of codon usage bias has gained renewed attention. The purpose of this study was to examine codon selection patterns within and across cyanobacterial species belonging to diverse taxonomic orders and habitats. We performed a detailed comparative analysis of cyanobacterial genomes with respect to codon bias. Our analysis shows that in cyanobacterial genomes, A- and/or T-ending codons were used predominantly in the genes whereas G- and/or C-ending codons were largely avoided. Variation in the codon context usage of cyanobacterial genes corresponded to the clustering of cyanobacteria as per their GC content. Analysis of codon adaptation index (CAI) and synonymous codon usage order (SCUO) revealed that the majority of genes are associated with low codon bias. Codon selection patterns in cyanobacterial genomes reflected compositional constraints as a major influencing factor. We also found that, although mutational constraint may play some role in affecting codon usage bias in cyanobacteria, compositional constraint in terms of genomic GC composition, coupled with environmental factors, affected codon selection patterns in cyanobacterial genomes. Copyright © 2016 Elsevier B.V. All rights reserved.
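The contrast between A/T-ending and G/C-ending codons described above can be illustrated with a minimal sketch. This is not the authors' pipeline; the function name third_position_usage and the toy sequence are hypothetical, and the code only tallies third codon positions for a single in-frame coding sequence.

```python
from collections import Counter


def third_position_usage(cds: str) -> dict:
    """Return the fraction of codons ending in A/T versus G/C.

    Assumes `cds` is an in-frame coding sequence; codons containing
    ambiguous bases (anything other than A, C, G, T) are skipped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter()
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if set(codon) <= set("ACGT"):
            counts["AT" if codon[2] in "AT" else "GC"] += 1
    total = sum(counts.values())
    if total == 0:
        return {"AT_ending": 0.0, "GC_ending": 0.0}
    return {
        "AT_ending": counts["AT"] / total,
        "GC_ending": counts["GC"] / total,
    }


# Toy example (hypothetical sequence): codons ATG, GCT, AAA, TTT
usage = third_position_usage("ATGGCTAAATTT")
print(usage)
```

Indices such as CAI or SCUO need reference codon tables or per-amino-acid entropy calculations and are deliberately left out of this sketch.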
Monica R Sanchez
Evolutionary outcomes depend not only on the selective forces acting upon a species, but also on the genetic background. However, large timescales and uncertain historical selection pressures can make it difficult to discern such important background differences between species. Experimental evolution is one tool to compare the evolutionary potential of known genotypes in a controlled environment. Here we utilized a highly reproducible evolutionary adaptation in Saccharomyces cerevisiae to investigate whether experimental evolution of other yeast species would select for similar adaptive mutations. We evolved populations of S. cerevisiae, S. paradoxus, S. mikatae, S. uvarum, and interspecific hybrids between S. uvarum and S. cerevisiae for ~200-500 generations in sulfate-limited continuous culture. Wild-type S. cerevisiae cultures invariably amplify the high-affinity sulfate transporter gene, SUL1. However, while amplification of the SUL1 locus was detected in S. paradoxus and S. mikatae populations, S. uvarum cultures instead selected for amplification of the paralog, SUL2. We measured the relative fitness of strains bearing deletions and amplifications of both SUL genes from different species, confirming that, in contrast to S. cerevisiae, S. uvarum SUL2 contributes more to fitness in sulfate limitation than S. uvarum SUL1. By measuring the fitness and gene expression of chimeric promoter-ORF constructs, we were able to attribute this differential fitness effect primarily to the promoter of S. uvarum SUL1. Our data show evidence of differential sub-functionalization among the sulfate transporters across Saccharomyces species through recent changes in noncoding sequence. Furthermore, these results show a clear example of how such background differences due to paralog divergence can drive changes in genome evolution.
Karyotype studies were performed in 18 populations of eight Helichrysum species in Iran. These species showed chromosome numbers of 2n = 2x = 14; 2n = 4x = 24, 28 and 32; 2n = 6x = 36; 2n = 7x = 42; 2n = 8x = 48; 2n = 9x = 54; and 2n = 10x = 60. The chromosome numbers of H. davisianum, H. globiferum, H. leucocephalum and H. oocephalum are reported here for the first time. New ploidy levels are reported for H. oligocephalum (2n = 4x = 24) and H. plicatum (2n = 4x = 32). The chromosomes were metacentric and submetacentric. An ANOVA among H. globiferum and H. leucocephalum populations showed significant differences for the coefficient of variation for chromosome size, total form percentage and the asymmetry indices, indicating that changes in the chromosome structure of Helichrysum species occurred during their diversification. Significant positive correlations among the species and populations studied, in terms of the total chromosome length, lengths of the short arms and lengths of the long arms, indicate that these karyotypic features change simultaneously during speciation events. The genome sizes of Helichrysum species are reported here for the first time. The 2C DNA content ranged from 8.13 pg (in H. rubicundum) to 18.4 pg (in H. leucocephalum and H. davisianum). We found that C-value correlated significantly with ploidy level, total chromosome length, lengths of the long arms and lengths of the short arms (p<0.05), indicating that changes in chromosome structure are accompanied by changes in DNA content.
Grijseels, Sietske; Nielsen, Jens Christian; Randelovic, Milica; Nielsen, Jens; Nielsen, Kristian Fog; Workman, Mhairi; Frisvad, Jens Christian
A new soil-borne species belonging to the Penicillium section Canescentia is described, Penicillium arizonense sp. nov. (type strain CBS 141311T = IBT 12289T). The genome was sequenced and assembled into 33.7 Mb containing 12,502 predicted genes. A phylogenetic assessment based on marker genes confirmed the grouping of P. arizonense within section Canescentia. Compared to related species, P. arizonense proved to encode a high number of proteins involved in carbohydrate metabolism, in particular hemicellulases. Mining the genome for genes involved in secondary metabolite biosynthesis resulted in the identification of 62 putative biosynthetic gene clusters. Extracts of P. arizonense were analysed for secondary metabolites and austalides, pyripyropenes, tryptoquivalines, fumagillin, pseurotin A, curvulinic acid and xanthoepocin were detected. A comparative analysis against known pathways enabled the proposal of biosynthetic gene clusters in P. arizonense responsible for the synthesis of all detected compounds except curvulinic acid. The capacity to produce biomass degrading enzymes and the identification of a high chemical diversity in secreted bioactive secondary metabolites, offers a broad range of potential industrial applications for the new species P. arizonense. The description and availability of the genome sequence of P. arizonense, further provides the basis for biotechnological exploitation of this species. PMID:27739446
Schmoll, Monika; Dattenböck, Christoph; Carreras-Villaseñor, Nohemí; Mendoza-Mendoza, Artemio; Tisch, Doris; Alemán, Mario Ivan; Baker, Scott E.; Brown, Christopher; Cervantes-Badillo, Mayte Guadalupe; Cetz-Chel, José; Cristobal-Mondragon, Gema Rosa; Delaye, Luis; Esquivel-Naranjo, Edgardo Ulises; Frischmann, Alexa; Gallardo-Negrete, Jose de Jesus; García-Esquivel, Monica; Gomez-Rodriguez, Elida Yazmin; Greenwood, David R.; Hernández-Oñate, Miguel; Kruszewska, Joanna S.; Lawry, Robert; Mora-Montes, Hector M.; Muñoz-Centeno, Tania; Nieto-Jacobo, Maria Fernanda; Nogueira Lopez, Guillermo; Olmedo-Monfil, Vianey; Osorio-Concepcion, Macario; Piłsyk, Sebastian; Pomraning, Kyle R.; Rodriguez-Iglesias, Aroa; Rosales-Saavedra, Maria Teresa; Sánchez-Arreguín, J. Alejandro; Seidl-Seiboth, Verena; Stewart, Alison; Uresti-Rivera, Edith Elena; Wang, Chih-Li; Wang, Ting-Fang; Zeilinger, Susanne; Casas-Flores, Sergio; Herrera-Estrella, Alfredo
Fabricio B.M. Arraes
The annotation and comparative analyses of the genomes of Mycoplasma synoviae and Mycoplasma hyopneumoniae, as well as of other Mollicutes (a group of bacteria devoid of a rigid cell wall), have set the grounds for a global understanding of their metabolism and infection mechanisms. According to the annotation data, M. synoviae and M. hyopneumoniae are able to perform glycolytic metabolism, but do not possess the enzymatic machinery for citrate and glyoxylate cycles, gluconeogenesis and the pentose phosphate pathway. Both can synthesize ATP by lactic fermentation, but only M. synoviae can convert acetaldehyde to acetate. Also, our genome analysis revealed that M. synoviae and M. hyopneumoniae are not expected to synthesize polysaccharides, but they can take up a variety of carbohydrates via the phosphoenolpyruvate-dependent phosphotransferase system (PEP-PTS). Our data showed that these two organisms are unable to synthesize purines and pyrimidines de novo, since they only possess the sequences which encode salvage pathway enzymes. Comparative analyses of M. synoviae and M. hyopneumoniae with other Mollicutes have revealed differential genes in the former two genomes coding for enzymes that participate in carbohydrate, amino acid and nucleotide metabolism and host-pathogen interaction. The identification of these metabolic pathways will provide a better understanding of the biology and pathogenicity of these organisms.
Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh Vr; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh
Feng, Shiqian; Yang, Qianqian; Li, Hu; Song, Fan; Stejskal, Václav; Opit, George P; Cai, Wanzhi; Li, Zhihong; Shao, Renfu
The booklouse, Liposcelis bostrychophila, is an important storage pest worldwide. The mitochondrial (mt) genome of an asexual strain (Beibei, China) of L. bostrychophila comprises two chromosomes; each chromosome contains approximately half of the 37 genes typically found in bilateral animals. The mt genomes of two sexual strains of L. bostrychophila, however, comprise five and seven chromosomes, respectively; each chromosome contains one to six genes. To understand mt genome evolution in L. bostrychophila, and whether L. bostrychophila is a cryptic species, we sequenced the mt genomes of six strains of asexual L. bostrychophila collected from different locations in China, Croatia, and the United States. The mt genomes of all six asexual strains of L. bostrychophila have two chromosomes. Phylogenetic analysis of mt genome sequences divided nine strains of L. bostrychophila into four groups. Each group has a distinct mt genome organization and substantial sequence divergence (48.7-87.4%) from other groups. Furthermore, the seven asexual strains of L. bostrychophila, including the published Beibei strain, are more closely related to two other species of booklice, L. paeta and L. sculptilimacula, than to the sexual strains of L. bostrychophila. Our results revealed highly divergent mt genomes in the booklouse, L. bostrychophila, and indicate that L. bostrychophila is a cryptic species. Copyright © 2018 Feng et al.
Background: With the rapid growth in the availability of genome sequence data, the automated identification of orthologous genes between species (orthologs) is of fundamental importance to facilitate functional annotation and studies on comparative and evolutionary genomics. Genes with no apparent orthologs between the bovine and human genome may be responsible for major differences between the species; however, such genes are often neglected in functional genomics studies. Results: A BLAST-based method was exploited to explore the current annotation and orthology predictions in Ensembl. Genes with no orthologs between the two genomes were classified into groups based on alignments, ontology, manual curation and publicly available information. Starting from a high quality and specific set of orthology predictions, as provided by Ensembl, hidden relationships between genes and genomes of different mammalian species were unveiled using a highly sensitive approach, based on sequence similarity and genomic comparison. Conclusions: The analysis identified 3,801 bovine genes with no orthologs in human and 1,010 human genes with no orthologs in cow, among which 411 and 43 genes, respectively, had no match at all in the other species. Most of the apparently non-orthologous genes may potentially have orthologs which were missed in the annotation process, despite having a high percentage of identity, because of differences in gene length and structure. The comparative analysis reported here identified gene variants, new genes and species-specific features and gave an overview of the other side of orthology which may help to improve the annotation of the bovine genome and the knowledge of structural differences between species.
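The distinction drawn above between genes with no ortholog and genes with no match at all can be sketched with a reciprocal-best-hit check. This is a commonly used heuristic swapped in for illustration; the abstract does not state that the study used it. The function names, the (query, subject, bitscore) tuple layout and the gene identifiers below are all hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-scoring subject.

    `hits` is an iterable of (query, subject, bitscore) tuples, e.g.
    parsed from tabular similarity-search output (layout assumed here).
    """
    best = {}
    for query, subject, bitscore in hits:
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def classify_genes(genes_a, a_vs_b, b_vs_a):
    """Label each gene of species A by its relationship to species B."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    labels = {}
    for gene in genes_a:
        if gene not in forward:
            labels[gene] = "no_match"  # nothing similar in species B
        elif reverse.get(forward[gene]) == gene:
            labels[gene] = "reciprocal_best_hit"  # ortholog candidate
        else:
            labels[gene] = "match_without_ortholog"
    return labels


# Toy data with made-up identifiers and scores
cow_genes = ["cowA", "cowB", "cowC"]
cow_vs_human = [("cowA", "humX", 200.0), ("cowA", "humY", 150.0),
                ("cowB", "humX", 90.0)]
human_vs_cow = [("humX", "cowA", 210.0), ("humY", "cowA", 140.0)]
print(classify_genes(cow_genes, cow_vs_human, human_vs_cow))
```

Genes labelled match_without_ortholog are the ones where differences in gene length or structure might hide a real ortholog, so they would be the natural candidates for manual curation.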
Strains of Bifidobacterium longum, Bifidobacterium breve, and Bifidobacterium animalis are widely used as probiotics in the food industry. Although numerous studies have revealed the properties and functionality of these strains, it is uncertain whether these characteristics are species common or strain specific. To address this issue, we performed a comparative genomic analysis of 49 strains belonging to these three bifidobacterial species to describe their genetic diversity and to evaluate species-level differences. There were 166 common clusters between strains of B. breve and B. longum, whereas there were nine common clusters between strains of B. animalis and B. longum and four common clusters between strains of B. animalis and B. breve. Further analysis focused on carbohydrate metabolism revealed the existence of certain strain-dependent genes, such as those encoding enzymes for host glycan utilisation or certain membrane transporters, and many genes commonly distributed at the species level, as was previously reported in studies with limited strains. As B. longum and B. breve are human-residential bifidobacteria (HRB), whereas B. animalis is a non-HRB species, several of the differences in these species’ gene distributions might be the result of their adaptations to the nutrient environment. This information may aid both in selecting probiotic candidates and in understanding their potential function as probiotics.
Liu, Guo-Hua; Tian, Si-Qin; Cui, Ping; Fang, Su-Fang; Wang, Chun-Ren; Zhu, Xing-Quan
Rabbit coccidiosis caused by members of the genus Eimeria can have an enormous economic impact worldwide, but the genetics, epidemiology and biology of these parasites remain poorly understood. In the present study, we sequenced and annotated the complete mitochondrial (mt) genomes of five Eimeria species that commonly infect domestic rabbits. The complete mt genomes of Eimeria intestinalis, Eimeria flavescens, Eimeria media, Eimeria vejdovskyi and Eimeria irresidua were 6261 bp, 6258 bp, 6168 bp, 6254 bp and 6259 bp in length, respectively. All of the mt genomes consist of 3 genes for proteins (cytb, cox1, and cox3), 14 gene fragments for the large subunit (LSU) rRNA and 11 gene fragments for the small subunit (SSU) rRNA, but no transfer RNA (tRNA) genes. The gene order of the mt genomes is similar to that of Plasmodium, but distinct from Haemosporida and Theileria. Phylogenetic analyses based on full nucleotide sequences using Bayesian analysis revealed that the monophyly of the Eimeria of rabbits was strongly supported by Bayesian posterior probabilities. These data provide novel mtDNA markers for studying the population genetics and molecular epidemiology of the Eimeria species, and should have implications for the molecular diagnosis, prevention and control of coccidiosis in rabbits. Copyright © 2015 Elsevier Inc. All rights reserved.
J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.
Since the human genome draft sequence was first made public in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples of large-scale studies of human genome variation: 1) HapMap Data (1,417 individuals) (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-08_phaseII+III/forward/), 2) HGDP (Human Genome Diversity Project) Data (940 individuals) (http://www.hagsc.org/hgdp/files.html), 3) 1000 genomes Data (2,504 individuals) (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/). If we can integrate all three data sets into a single volume of data, we should be able to conduct a more detailed analysis of human genome variation for a total of 4,861 individuals (= 1,417+940+2,504 individuals). In fact, we successfully integrated these three data sets by use of information on the reference human genome sequence, and we conducted the big data analysis. In particular, we constructed a phylogenetic tree of about 5,000 human individuals at the genome level. As a result, we were able to identify clusters of ethnic groups, with detectable admixture, that could not be identified by an analysis of each of the three data sets alone. Here, we report the outcome of this kind of big data analysis and discuss the evolutionary significance of human genomic variation. Note that the present study was conducted in collaboration with Katsuhiko Mineta and Kosuke Goto at KAUST.
Joardar, Vinita; Abrams, Natalie F; Hostetler, Jessica; Paukstelis, Paul J; Pakala, Suchitra; Pakala, Suman B; Zafar, Nikhat; Abolude, Olukemi O; Payne, Gary; Andrianopoulos, Alex; Denning, David W; Nierman, William C
The genera Aspergillus and Penicillium include some of the most beneficial as well as the most harmful fungal species such as the penicillin-producer Penicillium chrysogenum and the human pathogen Aspergillus fumigatus, respectively. Their mitochondrial genomic sequences may hold vital clues into the mechanisms of their evolution, population genetics, and biology, yet only a handful of these genomes have been fully sequenced and annotated. Here we report the complete sequence and annotation of the mitochondrial genomes of six Aspergillus and three Penicillium species: A. fumigatus, A. clavatus, A. oryzae, A. flavus, Neosartorya fischeri (A. fischerianus), A. terreus, P. chrysogenum, P. marneffei, and Talaromyces stipitatus (P. stipitatum). The accompanying comparative analysis of these and related publicly available mitochondrial genomes reveals wide variation in size (25-36 Kb) among these closely related fungi. The sources of genome expansion include group I introns and accessory genes encoding putative homing endonucleases, DNA and RNA polymerases (presumed to be of plasmid origin) and hypothetical proteins. The two smallest sequenced genomes (A. terreus and P. chrysogenum) do not contain introns in protein-coding genes, whereas the largest genome (T. stipitatus), contains a total of eleven introns. All of the sequenced genomes have a group I intron in the large ribosomal subunit RNA gene, suggesting that this intron is fixed in these species. Subsequent analysis of several A. fumigatus strains showed low intraspecies variation. This study also includes a phylogenetic analysis based on 14 concatenated core mitochondrial proteins. The phylogenetic tree has a different topology from published multilocus trees, highlighting the challenges still facing the Aspergillus systematics. The study expands the genomic resources available to fungal biologists by providing mitochondrial genomes with consistent annotations for future genetic, evolutionary and population
Wang, Kai; Wang, Yuyu; Yang, Ding
The complete mitochondrial (mt) genome of a stonefly species, Togoperla sp. (Plecoptera: Perlidae), was sequenced. The 15,723 bp long genome has the standard metazoan complement of 37 genes and an A+T-rich region, which is the same as the insect ancestral genome arrangement.
Galperin, Michael Y; Kristensen, David M; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V
For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.
Bouyioukos, Costas; Elati, Mohamed; Képès, François
Genome layout and gene regulation appear to be interdependent. Understanding this interdependence is key to exploring the dynamic nature of chromosome conformation and to engineering functional genomes. Evidence for non-random genome layout, defined as the relative positioning of either co-functional or co-regulated genes, stems from two main approaches. Firstly, the analysis of contiguous genome segments across species, has highlighted the conservation of gene arrangement (synteny) along chromosomal regions. Secondly, the study of long-range interactions along a chromosome has emphasised regularities in the positioning of microbial genes that are co-regulated, co-expressed or evolutionarily correlated. While one-dimensional pattern analysis is a mature field, it is often powerless on biological datasets which tend to be incomplete, and partly incorrect. Moreover, there is a lack of comprehensive, user-friendly tools to systematically analyse, visualise, integrate and exploit regularities along genomes. Here we present the Genome REgulatory and Architecture Tools SCAN (GREAT:SCAN) software for the systematic study of the interplay between genome layout and gene expression regulation. SCAN is a collection of related and interconnected applications currently able to perform systematic analyses of genome regularities as well as to improve transcription factor binding sites (TFBS) and gene regulatory network predictions based on gene positional information. We demonstrate the capabilities of these tools by studying on one hand the regular patterns of genome layout in the major regulons of the bacterium Escherichia coli. On the other hand, we demonstrate the capabilities to improve TFBS prediction in microbes. Finally, we highlight, by visualisation of multivariate techniques, the interplay between position and sequence information for effective transcription regulation.
Chen, Peng; Han, Yuqing; Zhu, Chaoying; Gao, Bin; Ruan, Luzhang
The complete mitochondrial genome sequences of Porzana fusca and Porzana pusilla were determined. The two avian species share a high degree of homology in terms of mitochondrial genome organization and gene arrangement. Their mitochondrial genomes are 16,935 and 16,978 bp long, respectively, and consist of 37 genes and a control region. Their PCGs are both 11,365 bp long and have a similar structure. Their tRNA gene sequences could be folded into the canonical cloverleaf secondary structure, except for tRNA Ser (AGY), which has lost its "DHU" arm. Based on the concatenated nucleotide sequences of the complete mitochondrial DNA genes of 16 Rallidae species, reconstruction of phylogenetic trees and molecular clock analysis indicated that P. fusca and P. pusilla form a sister group, which in turn is sister to Rallina eurizonoides. The genus Gallirallus is a sister group to genus Lewinia, and these groups in turn are sister groups to genus Porphyrio. Moreover, molecular clock analyses suggested that the basal divergence of Rallidae could be traced back to 40.47 (41.46‒39.45) million years ago (Mya), and the divergence of Porzana occurred approximately 5.80 (15.16‒0.79) Mya.
Chen, Hao; Wang, Xiangfeng
In plants and animals, chromosomal breakage and fusion events based on conserved syntenic genomic blocks lead to conserved patterns of karyotype evolution among species of the same family. However, karyotype information has not been well utilized in genomic comparison studies. We present CrusView, a Java-based bioinformatic application utilizing Standard Widget Toolkit/Swing graphics libraries and a SQLite database for performing visualized analyses of comparative genomics data in Brassicaceae (crucifer) plants. Compared with similar software and databases, one of the unique features of CrusView is its integration of karyotype information when comparing two genomes. This feature allows users to perform karyotype-based genome assembly and karyotype-assisted genome synteny analyses with preset karyotype patterns of the Brassicaceae genomes. Additionally, CrusView is a local program, which gives its users high flexibility when analyzing unpublished genomes and allows users to upload self-defined genomic information so that they can visually study the associations between genome structural variations and genetic elements, including chromosomal rearrangements, genomic macrosynteny, gene families, high-frequency recombination sites, and tandem and segmental duplications between related species. This tool will greatly facilitate karyotype, chromosome, and genome evolution studies using visualized comparative genomics approaches in Brassicaceae species. CrusView is freely available at http://www.cmbb.arizona.edu/CrusView/.
Cao, Yinhe; Tung, Wen-Wen; Gao, J B
With the completion of the genomes of the human and a few model organisms, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools that can easily identify structures in, and extract features from, DNA sequences. Among the more important structures in a DNA sequence are repeat-related ones. Often they have to be masked before protein-coding regions along a DNA sequence are identified or redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence time based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features in a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics, which has the salient features of being largely species-independent and working well on very short sequences. Efficient codon indices are key elements of successful gene finding algorithms, and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, our method is very efficient. It allows us to carry out analysis of whole genomes on a PC.
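The idea of recurrence times can be illustrated with a minimal sketch. It assumes the simplest possible definition, the gap between successive occurrences of the same k-mer, which may differ from the statistic the authors actually use; the function name and the toy sequence are hypothetical.

```python
from collections import Counter

def recurrence_times(seq: str, k: int) -> Counter:
    """Histogram of gaps between successive occurrences of each k-mer.

    Assumption: a 'recurrence time' is taken here to be the distance
    between one occurrence of a k-mer and its next occurrence.
    """
    last_seen = {}      # k-mer -> position of its most recent occurrence
    gaps = Counter()    # gap length -> number of times it was observed
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in last_seen:
            gaps[i - last_seen[word]] += 1
        last_seen[word] = i
    return gaps

# Toy sequence: a period-3 tandem repeat followed by a short unique tail.
hist = recurrence_times("ACG" * 10 + "TTGACCA", k=3)
dominant_period = hist.most_common(1)[0][0]
print(dominant_period)
```

A sharp peak in the histogram at one gap length points to a periodicity of that length, which is how a tandem repeat shows up under this simplified definition.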
Santoni, Daniele; Romano-Spica, Vincenzo
Whole genome analysis provides new perspectives to determine phylogenetic relationships among microorganisms. The availability of whole nucleotide sequences allows different levels of comparison among genomes by several approaches. In this work, self-attraction rates were considered for each cluster of orthologous groups of proteins (COGs) class in order to analyse gene aggregation levels in physical maps. Phylogenetic relationships among microorganisms were obtained by comparing self-attraction coefficients. Eighteen-dimensional vectors were computed for a set of 168 completely sequenced microbial genomes (19 archaea, 149 bacteria). The components of each vector represent the aggregation rate of the genes belonging to each of the 18 COG classes. Genes involved in nonessential functions or related to environmental conditions showed the highest aggregation rates. In contrast, genes involved in basic cellular tasks showed a more uniform distribution along the genome, except for translation genes. The self-attraction clustering approach allowed classification of Proteobacteria, Bacilli and other species belonging to Firmicutes. Rearrangement and lateral gene transfer events may influence divergences from classical taxonomy. Each set of COG classes' aggregation values represents an intrinsic property of the microbial genome. This novel approach provides a new point of view for whole genome analysis and bacterial characterization.
The shift from outcrossing to self-fertilization is among the most common evolutionary transitions in flowering plants. Until recently, however, a genome-wide view of this transition has been obscured by both a dearth of appropriate data and the lack of appropriate population genomic methods to interpret such data. Here, we present a novel population genomic analysis detailing the origin of the selfing species, Capsella rubella, which recently split from its outcrossing sister, Capsella grandiflora. Due to the recency of the split, much of the variation within C. rubella is also found within C. grandiflora. We can therefore identify genomic regions where two C. rubella individuals have inherited the same or different segments of ancestral diversity (i.e. founding haplotypes present in C. rubella's founder(s)). Based on this analysis, we show that C. rubella was founded by multiple individuals drawn from a diverse ancestral population closely related to extant C. grandiflora, that drift and selection have rapidly homogenized most of this ancestral variation since C. rubella's founding, and that little novel variation has accumulated within this time. Despite the extensive loss of ancestral variation, the approximately 25% of the genome for which two C. rubella individuals have inherited different founding haplotypes makes up roughly 90% of the genetic variation between them. To extend these findings, we develop a coalescent model that utilizes the inferred frequency of founding haplotypes and variation within founding haplotypes to estimate that C. rubella was founded by a potentially large number of individuals between 50 and 100 kya, and has subsequently experienced a twenty-fold reduction in its effective population size. As population genomic data from an increasing number of outcrossing/selfing pairs are generated, analyses like the one developed here will facilitate a fine-scaled view of the evolutionary and demographic impact of the
Feltus Frank A
Background: Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for the development of genomic tools. Establishment of integrated physical and genetic maps for switchgrass will accelerate the mapping of value-added traits useful to breeding programs and the isolation of important target genes using map-based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). As in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results: A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times based on a genome size of 3.2 Gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled to respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of the switchgrass genome, but also improve efforts toward an accurate genome sequencing strategy. Conclusions: The construction of the first switchgrass BAC library and comparative analysis of
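The reported coverage of roughly 10 times can be checked with a short calculation. This is a sketch under one assumption not stated in the abstract: that the 95% insert rate measured on the 234 sampled BACs applies to the whole library.

```python
# Figures taken from the abstract.
clones = 147_456
insert_fraction = 0.95        # assumption: sampled rate holds library-wide
mean_insert_kb = 120
effective_genome_gb = 1.6

# Total cloned DNA in gigabases (1 Gb = 1e6 kb).
total_insert_gb = clones * insert_fraction * mean_insert_kb / 1e6
coverage = total_insert_gb / effective_genome_gb
print(f"{total_insert_gb:.2f} Gb of inserts, ~{coverage:.1f}x coverage")
```

Under that assumption the library holds about 16.8 Gb of inserts, or about 10.5 effective genome equivalents, in line with the stated figure.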
Kraus, R.H.S.; Kerstens, H.H.D.; Hooft, van W.F.; Megens, H.J.W.C.; Elmberg, J.; Tsvey, A.; Sartakov, D.; Soloviev, S.A.; Crooijmans, R.P.M.A.; Groenen, M.A.M.; Ydenberg, R.C.; Prins, H.H.T.
The study of speciation and maintenance of species barriers is at the core of evolutionary biology. During speciation the genome of one population becomes separated from other populations of the same species, which may lead to genomic incompatibility with time. This separation is complete when no
Changes in nucleotide sequences, or mutations, accumulate from generation to generation in the genomes of all living organisms. The mutations can be advantageous, deleterious, or neutral. The goal of this project is to determine the number of advantageous mutations it takes to get human (Homo sapiens) DNA from the DNA of genetically distinct organisms. We do this by collecting the genomic data of such organisms, and estimating the number of mutations it takes to transform yeast (Saccharomyces cerevisiae) DNA into the DNA of a human. We calculate the typical number of mutations occurring annually from the organism's average life span and the average mutation rate. This allows us to determine the total number of mutations as well as the probability of advantageous mutations. Not surprisingly, this probability proves to be fairly small. A more precise estimate can be determined by accounting for the differences in chromosomal structure and phenomena like horizontal gene transfer.
Chen, Tsute; Gajare, Prasad; Olsen, Ingar; Dewhirst, Floyd E.
The advent of next generation sequencing is producing more genomic sequences for various strains of many human oral microbial species and allows for insightful functional comparisons at both intra- and inter-species levels. This study performed in-silico functional comparisons for currently available genomic sequences of major species associated with periodontitis including Aggregatibacter actinomycetemcomitans (AA), Porphyromonas gingivalis (PG), Treponema denticola (TD), and Tannerella forsythia (TF), as well as several cariogenic and commensal streptococcal species. Complete or draft sequences were annotated with RAST to infer structured functional subsystems for each genome. The subsystem profiles were clustered into groups of functions with similar patterns. Functional enrichment and depletion were evaluated based on the hypergeometric distribution to identify subsystems that are unique or missing between two groups of genomes. Unique or missing metabolic pathways and biological functions were identified in different species. For example, components involved in flagellar motility were found only in the motile species TD, as expected, with few exceptions scattered in several streptococcal species, likely associated with chemotaxis. Transposable elements were only found in the two Bacteroidales species PG and TF, and half of the AA genomes. Genes involved in CRISPR were prevalent in most oral species. Furthermore, prophage related subsystems were also commonly found in most species except for PG and Streptococcus mutans, in which very few genomes contain prophage components. Comparisons between pathogenic (P) and nonpathogenic (NP) genomes also identified genes potentially important for virulence. Two such comparisons were performed between AA (P) and several A. aphrophilus (NP) strains, and between S. mutans + S. sobrinus (P) and other oral streptococcal species (NP). This comparative genomics approach can be readily used to identify functions unique to
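An enrichment test based on the hypergeometric distribution can be sketched as follows. This is an illustration of the general technique only; the abstract does not give the exact test setup, and the function name and the genome counts below are hypothetical.

```python
from math import comb

def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when n genomes are drawn without replacement from N,
    of which K carry the subsystem (upper tail = enrichment p-value)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

# Hypothetical counts: 40 genomes in total, 10 carry a subsystem,
# and 6 of the 8 genomes in the group of interest carry it.
p_enriched = hypergeom_sf(6, N=40, K=10, n=8)
# Depletion is the lower tail: P(X <= k) = 1 - P(X >= k + 1).
p_depleted = 1.0 - hypergeom_sf(7, N=40, K=10, n=8)
print(p_enriched, p_depleted)
```

A small upper-tail probability suggests the subsystem is over-represented in the group, and a small lower-tail probability suggests it is under-represented. In practice such p-values would also need correction for multiple testing across subsystems.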
Jun, Se-Ran; Wassenaar, Trudy M; Nookaew, Intawat; Hauser, Loren; Wanchai, Visanu; Land, Miriam; Timm, Collin M; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A; Ussery, David W
The Pseudomonas genus contains a metabolically versatile group of organisms that are known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together, and these were clearly distinct from other Pseudomonas species groups on the basis of pangenome and core genome analyses. In contrast, the genomes of Pseudomonas fluorescens were organized into 20 distinct genomic clusters, representing enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 contain hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study justifies the need to sequence multiple isolates, especially from P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains for use to promote growth and increase disease resistance in plants. Copyright © 2015 Jun et al.
Zhu, Huayu; Song, Pengyao; Koo, Dal-Hoe; Guo, Luqin; Li, Yanman; Sun, Shouru; Weng, Yiqun; Yang, Luming
Microsatellite markers are among the most informative and versatile DNA-based markers used in plant genetic research, but their development has traditionally been difficult and costly. Whole genome sequencing with next-generation sequencing (NGS) technologies provides large amounts of sequence data for developing numerous microsatellite markers at the whole genome scale. SSR markers have a great advantage in cross-species comparisons and allow investigation of karyotype and genome evolution through highly efficient computational approaches such as in silico PCR. Here we describe the genome-wide development and characterization of SSR markers in the watermelon (Citrullus lanatus) genome, which were then used in comparative analysis with two other important crop species in the Cucurbitaceae family: cucumber (Cucumis sativus L.) and melon (Cucumis melo L.). We further applied these markers in evaluating the genetic diversity and population structure in watermelon germplasm collections. A total of 39,523 microsatellite loci were identified from the watermelon draft genome with an overall density of 111 SSRs/Mbp, and 32,869 SSR primers were designed with suitable flanking sequences. The dinucleotide SSRs were the most common type, representing 34.09% of the total SSR loci, and the AT-rich motifs were the most abundant in all nucleotide repeat types. In silico PCR analysis identified 832 and 925 SSR markers with each having a single amplicon in the cucumber and melon draft genome, respectively. Comparative analysis with these cross-species SSR markers revealed complicated mosaic patterns of syntenic blocks among the genomes of the three species. In addition, genetic diversity analysis of 134 watermelon accessions with 32 highly informative SSR loci placed these lines into two groups with all accessions of C. lanatus var. citroides and three accessions of C. colocynthis clustered in one group and all accessions of C. lanatus var. lanatus and the remaining accessions of C. colocynthis
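Genome-wide SSR discovery of this kind is commonly done by scanning for short motifs repeated in tandem. The sketch below shows one simple regex-based way to do that; it is not the tool used in the study, and the minimum repeat counts are assumed thresholds, since the abstract does not state the search criteria.

```python
import re

# Assumed thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 5, 4: 4}

def is_primitive(unit: str) -> bool:
    """True if the motif is not itself a repeat of a shorter motif."""
    return (unit + unit).find(unit, 1) == len(unit)

def find_ssrs(seq: str):
    """Return sorted (start, motif, copies) tuples for perfect SSRs."""
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least min_rep - 1 copies.
        pattern = re.compile(r"(([ACGT]{%d})\2{%d,})" % (unit_len, min_rep - 1))
        for m in pattern.finditer(seq):
            unit = m.group(2)
            if is_primitive(unit):  # skip e.g. 'AA' or 'ATAT'
                hits.append((m.start(), unit, len(m.group(1)) // unit_len))
    return sorted(hits)

demo = "GG" + "AT" * 6 + "CC" + "GAA" * 5 + "T"
print(find_ssrs(demo))
```

As a consistency check on the reported figures, 39,523 loci at 111 SSRs/Mbp implies roughly 356 Mbp of scanned sequence (39,523 / 111 ≈ 356).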
Stelzer, Claus-Peter; Riss, Simone; Stadler, Peter
Studies on genome size variation in animals are rarely done at lower taxonomic levels, e.g., slightly above/below the species level. Yet, such variation might provide important clues on the tempo and mode of genome size evolution. In this study we used the flow-cytometry method to study the evolution of genome size in the rotifer Brachionus plicatilis, a cryptic species complex consisting of at least 14 closely related species. We found an unexpectedly high variation in this species complex, with genome sizes ranging approximately seven-fold (haploid '1C' genome sizes: 0.056-0.416 pg). Most of this variation (67%) could be ascribed to the major clades of the species complex, i.e. clades that are well separated according to most species definitions. However, we also found substantial variation (32%) at lower taxonomic levels--within and among genealogical species--and, interestingly, among species pairs that are not completely reproductively isolated. In one genealogical species, called B. 'Austria', we found greatly enlarged genome sizes that could roughly be approximated as multiples of the genomes of its closest relatives, which suggests that whole-genome duplications have occurred early during separation of this lineage. Overall, genome size was significantly correlated to egg size and body size, even though the latter became non-significant after controlling for phylogenetic non-independence. Our study suggests that substantial genome size variation can build up early during speciation, potentially even among isolated populations. An alternative, but not mutually exclusive interpretation might be that reproductive isolation tends to build up unusually slow in this species complex.
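The reported range can be put into base pairs with a short calculation. It assumes the widely used conversion of 1 pg of DNA to about 978 Mbp, which is not stated in the abstract; the function name is hypothetical.

```python
MBP_PER_PG = 978.0  # assumed conversion factor: 1 pg DNA ~ 978 Mbp

def pg_to_mbp(picograms: float) -> float:
    """Convert a haploid (1C) genome size from picograms to megabase pairs."""
    return picograms * MBP_PER_PG

smallest_pg, largest_pg = 0.056, 0.416   # range given in the abstract
fold = largest_pg / smallest_pg
print(f"{pg_to_mbp(smallest_pg):.0f}-{pg_to_mbp(largest_pg):.0f} Mbp, "
      f"{fold:.1f}-fold range")
```

Under this conversion the complex spans roughly 55 to 407 Mbp, and the ratio of about 7.4 matches the "approximately seven-fold" variation described.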
Kamanu, Frederick Kinyua
Funk, Helena T; Berg, Sabine; Krupinska, Karin; Maier, Uwe G; Krause, Kirsten
The holoparasitic plant genus Cuscuta comprises species with photosynthetic capacity and functional chloroplasts as well as achlorophyllous and intermediate forms with restricted photosynthetic activity and degenerated chloroplasts. Previous data indicated significant differences with respect to the plastid genome coding capacity in different Cuscuta species that could correlate with their photosynthetic activity. In order to shed light on the molecular changes accompanying the parasitic lifestyle, we sequenced the plastid chromosomes of the two species Cuscuta reflexa and Cuscuta gronovii. Both species are capable of performing photosynthesis, albeit with varying efficiencies. Together with the plastid genome of Epifagus virginiana, an achlorophyllous parasitic plant whose plastid genome has been sequenced, these species represent a series of progression towards total dependency on the host plant, ranging from reduced levels of photosynthesis in C. reflexa to a restricted photosynthetic activity and degenerated chloroplasts in C. gronovii to an achlorophyllous state in E. virginiana. The newly sequenced plastid genomes of C. reflexa and C. gronovii reveal that the chromosome structures are generally very similar to that of non-parasitic plants, although a number of species-specific insertions, deletions (indels) and sequence inversions were identified. However, we observed a gradual adaptation of the plastid genome to the different degrees of parasitism. The changes are particularly evident in C. gronovii and include (a) the parallel losses of genes for the subunits of the plastid-encoded RNA polymerase and the corresponding promoters from the plastid genome, (b) the first documented loss of the gene for a putative splicing factor, MatK, from the plastid genome and (c) a significant reduction of RNA editing. Overall, the comparative genomic analysis of plastid DNA from parasitic plants indicates a bias towards a simplification of the plastid gene expression
Brown, D W; Butchko, R A E; Proctor, R H
Fusarium verticillioides (teleomorph Gibberella moniliformis) can be either an endophyte of maize, causing no visible disease, or a pathogen causing disease of ears, stalks, roots and seedlings. At any stage, this fungus can synthesize fumonisins, a family of mycotoxins structurally similar to the sphingolipid sphinganine. Ingestion of fumonisin-contaminated maize has been associated with a number of animal diseases, including cancer in rodents, and exposure has been correlated with human oesophageal cancer in some regions of the world. Some evidence also suggests that fumonisins are a risk factor for neural tube defects. A primary goal of the authors' laboratory is to eliminate fumonisin contamination of maize and maize products. Understanding how and why these toxins are made, and understanding the F. verticillioides-maize disease process, will allow one to develop novel strategies to limit tissue destruction (rot) and fumonisin production. To meet this goal, genomic sequence data, expressed sequence tags (ESTs) and microarrays are being used to identify F. verticillioides genes involved in the biosynthesis of toxins and plant pathogenesis. This paper describes the current status of F. verticillioides genomic resources and three approaches being used to mine microarray data from a wild-type strain cultured in liquid fumonisin production medium for 12, 24, 48, 72, 96 and 120 h. Taken together, these approaches demonstrate the power of microarray technology to provide information on different biological processes.
Zhao, Wenming; Wang, Jing; He, Ximiao
Rice is a major food staple for the world's population and serves as a model species in cereal genome research. The Beijing Genomics Institute (BGI) has long been devoted to sequencing, information analysis and biological research on rice and other crop genomes. In order to facilitate....... Designed as a basic platform, BGI-RIS presents the sequenced genomes and related information in systematic and graphical ways for the convenience of in-depth comparative studies (http://rise.genomics.org.cn/). Publication date: 2004-Jan-1...
Wang, Long; Shi, Qinghua; Su, Handong; Wang, Yi; Sha, Lina; Fan, Xing; Kang, Houyang; Zhang, Haiqin; Zhou, Yonghong
The St genome is one of the most fundamental genomes in Triticeae. Repetitive sequences are widely used to distinguish different genomes or species. The primary objectives of this study were to (i) screen for a new sequence that could easily distinguish the chromosomes of the St genome from those of other genomes by fluorescence in situ hybridization (FISH) and (ii) investigate the genome constitution of some species that remains uncertain and controversial. We used degenerate oligonucleotide-primed PCR (DOP-PCR), dot-blot, and FISH to screen for a new marker of the St genome and to test the efficiency of this marker in the detection of St chromosomes at different ploidy levels. Signals produced by a new FISH marker (denoted St2-80) were present on the entire arm of chromosomes of the St genome, except in the centromeric region. In contrast, St2-80 signals were present in the terminal region of chromosomes of the E, H, P, and Y genomes. No signal was detected in the A and B genomes, and only weak signals were detected in the terminal region of chromosomes of the D genome. St2-80 signals were obvious and stable in chromosomes of different genomes, whether diploid or polyploid. Therefore, St2-80 is a potentially useful FISH marker that can be used to distinguish the St genome from other genomes in Triticeae.
Mycoplasma, the smallest self-replicating organism with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we identified the set of conserved essential genes for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacterial survival, especially over long periods of natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in pathogen adaptation to their hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in corresponding lineages.
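The core-genome idea in the abstract above can be illustrated with plain set operations: if each genome is reduced to the set of homologous groups it contains, the core genome is the intersection of those sets and the pan-genome is their union. The sketch below is only a toy illustration under that assumption; the strain names, the gene labels and the helper `pan_and_core` are invented placeholders, not data or methods from the study.

```python
def pan_and_core(genomes):
    """Return (pan, core) for a mapping of genome name -> iterable of group IDs.

    pan  = groups present in at least one genome (union)
    core = groups present in every genome (intersection)
    """
    families = [set(groups) for groups in genomes.values()]
    pan = set().union(*families)
    core = set.intersection(*families) if families else set()
    return pan, core


# Hypothetical toy input: three strains, each with a handful of groups.
toy_genomes = {
    "strain_A": {"dnaA", "gyrB", "recA"},
    "strain_B": {"dnaA", "gyrB", "uvrA"},
    "strain_C": {"dnaA", "gyrB", "recA", "polC"},
}

pan, core = pan_and_core(toy_genomes)
print("pan-genome size:", len(pan))
print("core genome:", sorted(core))
```

In practice the hard part is assigning genes to homologous groups in the first place; the set arithmetic only becomes meaningful once that clustering is available.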
Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species provide over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembling and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% of repetitive elements with GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, which demonstrates the high similarity between the zebrafish and common carp genomes and indicates the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to zebrafish genome and established over 3,100 microsyntenies, covering over 50% of
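The sequencing figures quoted in this abstract can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated above (65,720 clean BES, 42,522,168 bp, 2.5% of the genome); the genome size it prints is merely what those figures imply, not a value reported in the abstract.

```python
# Figures quoted in the abstract.
clean_bes = 65_720            # number of clean BAC end sequences
total_bp = 42_522_168         # total bases in those sequences
genome_fraction = 0.025       # "2.5% of the common carp genome"

# Average read length implied by the totals.
mean_read_length = total_bp / clean_bes

# Genome size implied by the stated fraction (an inference, not a reported value).
implied_genome_size = total_bp / genome_fraction

print(f"mean read length: {mean_read_length:.0f} bp")
print(f"implied genome size: {implied_genome_size / 1e9:.2f} Gb")
```

The computed mean agrees with the stated average read length of 647 bp, and the stated fraction corresponds to a genome of roughly 1.7 Gb.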
Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich
One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.
Fujimori, A.; Abe, M.
The gene encoding the catalytic subunit of DNA-dependent protein kinase (DNA-PKcs) is the gene responsible for the defect in SCID mice. The molecule plays a critical role in non-homologous end joining, including V(D)J recombination. A contribution of the molecule to differences in radiosensitivity and susceptibility to cancer has been suggested. Here we show the entire nucleotide sequences of the approximately 193 kbp and 84 kbp genomic regions encoding the entire DNA-PKcs gene in the mouse and chicken, respectively. A retroposon was found in intron 51 of the mouse genomic DNA-PKcs gene but not in human or chicken. Comparative analysis of these two species strongly suggested that only two genes, DNA-PKcs and MCM4, exist in the region of both species. Several conserved sequences and cis elements, however, were predicted. Recently, the orthologous region for the human DNA-PKcs locus was completed. The results of further comparative study will be discussed.
Cabal, Adriana; Jun, Se-Ran; Jenjaroenpun, Piroon; Wanchai, Visanu; Nookaew, Intawat; Wongsurawat, Thidathip; Burgess, Mary J; Kothari, Atul; Wassenaar, Trudy M; Ussery, David W
Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole genome sequences from clinical samples contain a lot of information, but it needs to be analyzed and compared in such a way that the outcome is useful for clinicians or epidemiologists. Here, we compare 663 publicly available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes produce four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, whether based on partial or full gene length, results in biased estimates of genetic differences and does not capture the true degree of similarity or difference between complete genomes. Presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Out of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters only, and only one cluster lacked the toxin A/B genes consistently. A total of 310 genomes contained ToxA/B without CDT (47%). Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test if metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same
Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; Van Der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah; Siame, Kabengele Keith; Gey Van Pittius, Nicolaas Claudius; Van Helden, Paul David; Pain, Arnab; Warren, Robin Mark
Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.
Since the sequencing of the honey bee genome, proteomics by mass spectrometry has become increasingly popular for biological analyses of this insect; but we have observed that the number of honey bee protein identifications is consistently low compared to other organisms. In this dataset, we use nanoelectrospray ionization-coupled liquid chromatography–tandem mass spectrometry (nLC–MS/MS) to systematically investigate the root cause of low honey bee proteome coverage. To this end, we present here data from three key experiments: a controlled, cross-species analysis of samples from Apis mellifera, Drosophila melanogaster, Caenorhabditis elegans, Saccharomyces cerevisiae, Mus musculus and Homo sapiens; a proteomic analysis of an individual honey bee whose genome was also sequenced; and a cross-tissue honey bee proteome comparison. The cross-species dataset was interrogated to determine relative proteome coverages between species, and the other two datasets were used to search for polymorphic sequences and to compare protein cleavage profiles, respectively.
Background High-throughput SNP genotyping has become an essential requirement for molecular breeding and population genomics studies in plant species. Large scale SNP developments have been reported for several mainstream crops. A growing interest now exists to expand the speed and resolution of genetic analysis to outbred species with highly heterozygous genomes. When nucleotide diversity is high, a refined diagnosis of the target SNP sequence context is needed to convert queried SNPs into high-quality genotypes using the Golden Gate Genotyping Technology (GGGT). This issue becomes exacerbated when attempting to transfer SNPs across species, a scarcely explored topic in plants, and likely to become significant for population genomics and inter specific breeding applications in less domesticated and less funded plant genera. Results We have successfully developed the first set of 768 SNPs assayed by the GGGT for the highly heterozygous genome of Eucalyptus from a mixed Sanger/454 database with 1,164,695 ESTs and the preliminary 4.5X draft genome sequence for E. grandis. A systematic assessment of in silico SNP filtering requirements showed that stringent constraints on the SNP surrounding sequences have a significant impact on SNP genotyping performance and polymorphism. SNP assay success was high for the 288 SNPs selected with more rigorous in silico constraints; 93% of them provided high quality genotype calls and 71% of them were polymorphic in a diverse panel of 96 individuals of five different species. SNP reliability was high across nine Eucalyptus species belonging to three sections within subgenus Symphomyrtus and still satisfactory across species of two additional subgenera, although polymorphism declined as phylogenetic distance increased. Conclusions This study indicates that the GGGT performs well both within and across species of Eucalyptus notwithstanding its nucleotide diversity ≥2%. The development of a much larger array of informative SNPs across
Wang, Richard R-C; Jensen, Kevin B
The genome constitution of tetraploid Roegneria alashanica Keng has been in question for a long time. Most scientific studies have suggested that R. alashanica has two versions of the St genome, St1St2, similar to that of Pseudoroegneria elytrigioides (C. Yen & J.L. Yang) B.R. Lu. One study, however, concluded that R. alashanica has the StY genome formula typical of tetraploid species of Roegneria. For the present study, R. alashanica, Elymus longearistatus (Boiss.) Tzvelev (StY genomes), Pseudoroegneria strigosa (M. Bieb.) Á. Löve (St), Pseudoroegneria libanotica (Hackel) D.R. Dewey (St), and Pseudoroegneria spicata (Pursh) Á. Löve (St) were screened for the Y-genome-specific marker B14(F+R)269. All E. longearistatus plants expressed intense bands specific to the Y genome. Only 6 of 10 R. alashanica plants exhibited relatively faint bands for the STS marker. Previously, the genome in species of Pseudoroegneria exhibiting such a faint Y-genome-specific marker was designated St^Y. Based on these results, R. alashanica lacks the Y genome found in E. longearistatus but likely possesses two remotely related St genomes, St and St^Y. According to its genome constitution, R. alashanica should be classified in the genus Pseudoroegneria and given the new name Pseudoroegneria alashanica (Keng) R.R.-C. Wang and K.B. Jensen.
Schmoll, Monika; Dattenböck, Christoph; Carreras-Villaseñor, Nohemí; Mendoza-Mendoza, Artemio; Tisch, Doris; Alemán, Mario Ivan; Baker, Scott E; Brown, Christopher; Cervantes-Badillo, Mayte Guadalupe; Cetz-Chel, José; Cristobal-Mondragon, Gema Rosa; Delaye, Luis; Esquivel-Naranjo, Edgardo Ulises; Frischmann, Alexa; Gallardo-Negrete, Jose de Jesus; García-Esquivel, Monica; Gomez-Rodriguez, Elida Yazmin; Greenwood, David R; Hernández-Oñate, Miguel; Kruszewska, Joanna S; Lawry, Robert; Mora-Montes, Hector M; Muñoz-Centeno, Tania; Nieto-Jacobo, Maria Fernanda; Nogueira Lopez, Guillermo; Olmedo-Monfil, Vianey; Osorio-Concepcion, Macario; Piłsyk, Sebastian; Pomraning, Kyle R; Rodriguez-Iglesias, Aroa; Rosales-Saavedra, Maria Teresa; Sánchez-Arreguín, J Alejandro; Seidl-Seiboth, Verena; Stewart, Alison; Uresti-Rivera, Edith Elena; Wang, Chih-Li; Wang, Ting-Fang; Zeilinger, Susanne; Casas-Flores, Sergio; Herrera-Estrella, Alfredo
The genus Trichoderma contains fungi with high relevance for humans, with applications in enzyme production for plant cell wall degradation and use in biocontrol. Here, we provide a broad, comprehensive overview of the genomic content of these species for "hot topic" research aspects, including CAZymes, transport, transcription factors, and development, along with a detailed analysis and annotation of less-studied topics, such as signal transduction, genome integrity, chromatin, photobiology, or lipid, sulfur, and nitrogen metabolism in T. reesei, T. atroviride, and T. virens, and we open up new perspectives to those topics discussed previously. In total, we covered more than 2,000 of the predicted 9,000 to 11,000 genes of each Trichoderma species discussed, which is >20% of the respective gene content. Additionally, we considered available transcriptome data for the annotated genes. Highlights of our analyses include overall carbohydrate cleavage preferences due to the different genomic contents and regulation of the respective genes. We found light regulation of many sulfur metabolic genes. Additionally, a new Golgi 1,2-mannosidase likely involved in N-linked glycosylation was detected, as were indications for the ability of Trichoderma spp. to generate hybrid galactose-containing N-linked glycans. The genomic inventory of effector proteins revealed numerous compounds unique to Trichoderma, and these warrant further investigation. We found interesting expansions in the Trichoderma genus in several signaling pathways, such as G-protein-coupled receptors, RAS GTPases, and casein kinases. A particularly interesting feature absolutely unique to T. atroviride is the duplication of the alternative sulfur amino acid synthesis pathway. Copyright © 2016, American Society for Microbiology. All Rights Reserved.
Background Microsatellites or single sequence repeats (SSRs) are a powerful choice of marker in the study of Phytophthora population biology, epidemiology, ecology, genetics and evolution. A strategy was tested in which the publicly available unigene datasets extracted from genome sequences of P. infestans, P. sojae and P. ramorum were mined for candidate SSR markers that could be applied to a wide range of Phytophthora species. Results A first approach, aimed at the identification of polymorphic SSR loci common to many Phytophthora species, yielded 171 reliable sequences containing 211 SSRs. Microsatellites were identified from 16 target species representing the breadth of diversity across the genus. Repeat number ranged from 3 to 16, with most having seven repeats or fewer and four being the most commonly found. Trinucleotide repeats such as (AAG)n, (AGG)n and (AGC)n were the most common, followed by pentanucleotide, tetranucleotide and dinucleotide repeats. A second approach was aimed at the identification of useful loci common to a restricted number of species more closely related to P. sojae (P. alni, P. cambivora, P. europaea and P. fragariae). This analysis yielded 10 trinucleotide and 2 tetranucleotide SSRs which were repeated 4, 5 or 6 times. Conclusion Key studies on inter- and intra-specific variation of selected microsatellites remain. Despite the screening of conserved gene coding regions, the sequence diversity between species was high and the identification of useful SSR loci applicable to anything other than the most closely related pairs of Phytophthora species was challenging. That said, many novel SSR loci for species other than the three 'source species' (P. infestans, P. sojae and P. ramorum) are reported, offering great potential for the investigation of Phytophthora populations. In addition to the presence of microsatellites, many of the amplified regions may represent useful molecular marker regions for other studies as
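Mining sequences for SSR candidates, as described above, amounts to scanning for short motifs repeated in tandem. The following is a minimal sketch of that idea using a regular expression with a backreference; it finds perfect repeats only. The function name `find_ssrs`, the default thresholds and the example sequence are assumptions made for illustration and do not reproduce the software or parameters used in the study.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=5, min_repeats=3):
    """Return (start, unit, repeat_count) for each perfect tandem repeat.

    The lazy quantifier makes the shortest repeating unit win, so
    ATATATAT is reported as (AT)4 rather than (ATAT)2.
    """
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_unit, max_unit, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq.upper()):
        unit = match.group(1)
        hits.append((match.start(), unit, len(match.group(0)) // len(unit)))
    return hits


# Made-up example: an (AAG)4 trinucleotide repeat flanked by other bases.
example = "CCTAAGAAGAAGAAGCTT"
print(find_ssrs(example))
```

Because the scan is left to right, a repeat may be reported under a rotated form of its motif (for instance GAA instead of AAG) when the preceding bases happen to fit; dedicated SSR tools usually normalise motifs to avoid this.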
Leichty, Aaron R; Brisson, Dustin
Population genomic analyses have demonstrated power to address major questions in evolutionary and molecular microbiology. Collecting populations of genomes is hindered in many microbial species by the absence of a cost-effective and practical method to collect ample quantities of sufficiently pure genomic DNA for next-generation sequencing. Here we present a simple method to amplify genomes of a target microbial species present in a complex, natural sample. The selective whole genome amplification (SWGA) technique amplifies target genomes using nucleotide sequence motifs that are common in the target microbe genome, but rare in the background genomes, to prime the highly processive phi29 polymerase. SWGA thus selectively amplifies the target genome from samples in which it originally represented a minor fraction of the total DNA. The post-SWGA samples are enriched in target genomic DNA, which is ideal for population resequencing. We demonstrate the efficacy of SWGA using both laboratory-prepared mixtures of cultured microbes and a natural host-microbe association. Targeted amplification of Borrelia burgdorferi mixed with Escherichia coli at genome ratios of 1:2000 resulted in >10^5-fold amplification of the target genomes, and SWGA of genomic extracts from Wolbachia pipientis-infected Drosophila melanogaster resulted in up to 70% of high-throughput resequencing reads mapping to the W. pipientis genome. By contrast, 2-9% of sequencing reads were derived from W. pipientis without prior amplification. The SWGA technique results in high sequencing coverage at a fraction of the sequencing effort, thus allowing population genomic studies at affordable costs. Copyright © 2014 by the Genetics Society of America.
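The motif-selection principle stated above (motifs common in the target genome but rare in the background) can be sketched by counting k-mers in both sequences and ranking them by the ratio of their frequencies. The code below is a simplified illustration, not the authors' pipeline: the function names, the pseudocount of 1, the value of k and the toy sequences are all assumptions, and it ignores the reverse strand and any primer chemistry.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping k-mer on the given strand."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def rank_selective_motifs(target, background, k=8, top=5):
    """Rank k-mers by (frequency in target) / (frequency in background).

    A pseudocount of 1 is added to background counts so that motifs
    absent from the background do not cause division by zero.
    """
    t_counts = kmer_counts(target, k)
    b_counts = kmer_counts(background, k)
    t_total = max(len(target) - k + 1, 1)
    b_total = max(len(background) - k + 1, 1)
    scored = [
        (motif, (count / t_total) / ((b_counts[motif] + 1) / b_total))
        for motif, count in t_counts.items()
    ]
    # Highest ratio first; ties broken alphabetically for reproducibility.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top]


# Toy sequences: TTTT is frequent in the "target" and absent from the "background".
ranked = rank_selective_motifs("ACGTACGTTTTT", "ACGTACGTACGT", k=4, top=3)
print(ranked)
```

With real genomes the same ranking would be run over whole chromosomes, and the shortlisted motifs would still need to be checked for even spacing along the target before being used as primers.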
Gadau, Jürgen; Helmkampf, Martin; Nygaard, Sanne
Ants (Hymenoptera, Formicidae) represent one of the most successful eusocial taxa in terms of both their geographic distribution and species number. The publication of seven ant genomes within the past year was a quantum leap for socio- and ant genomics. The diversity of social organization in ants makes them excellent model organisms to study the evolution of social systems. Comparing the ant genomes with those of the honeybee, a lineage that evolved eusociality independently from ants, and solitary insects suggests that there are significant differences in key aspects of genome organization between social and solitary insects, as well as among ant species. Altogether, these seven ant genomes open exciting new research avenues and opportunities for understanding the genetic basis and regulation of social species, and adaptive complex systems in general.
Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi
The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1,050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures, excluding those in transposable elements, were deduced. Gene coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research of carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
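Two of the assembly statistics quoted above are easy to reproduce in principle. The sketch below checks the stated genome coverage from the abstract's own figures and shows the usual definition of N50 (the length L such that sequences of length L or longer contain at least half of the assembled bases). The individual scaffold lengths are not given in the abstract, so the `n50` helper is demonstrated on an invented toy list only.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("lengths must not be empty")


# Coverage check using the figures quoted in the abstract.
assembled_bp = 568_887_315
estimated_genome_bp = 622_000_000   # "622 Mb ... estimated by k-mer analysis"
coverage_pct = 100 * assembled_bp / estimated_genome_bp
print(f"assembly covers {coverage_pct:.1f}% of the estimated genome")

# Toy scaffold lengths (invented) to illustrate the N50 definition.
print("toy N50:", n50([2, 3, 4, 5, 6]))
```

The computed coverage rounds to the 91% stated in the abstract.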
Voigt, Anja; Schöfl, Gerhard; Saluz, Hans Peter
Chlamydiaceae are a family of obligate intracellular pathogens causing a wide range of diseases in animals and humans, and facing unique evolutionary constraints not encountered by free-living prokaryotes. To investigate genomic aspects of infection, virulence and host preference we have sequenced Chlamydia psittaci, the pathogenic agent of ornithosis. A comparison of the genome of the avian Chlamydia psittaci isolate 6BC with the genomes of other chlamydial species, C. trachomatis, C. muridarum, C. pneumoniae, C. abortus, C. felis and C. caviae, revealed a high level of sequence conservation and synteny across taxa, with the major exception of the human pathogen C. trachomatis. Important differences manifest in the polymorphic membrane protein family specific for the Chlamydiae and in the highly variable chlamydial plasticity zone. We identified a number of psittaci-specific polymorphic membrane proteins of the G family that may be related to differences in host-range and/or virulence as compared to closely related Chlamydiaceae. We calculated non-synonymous to synonymous substitution rate ratios for pairs of orthologous genes to identify putative targets of adaptive evolution and predicted type III secreted effector proteins. This study is the first detailed analysis of the Chlamydia psittaci genome sequence. It provides insights in the genome architecture of C. psittaci and proposes a number of novel candidate genes mostly of yet unknown function that may be important for pathogen-host interactions.
Background: Musa species (Zingiberaceae, Zingiberales), including bananas and plantains, are collectively the fourth most important crop in developing countries. Knowledge concerning Musa genome structure and the origin of distinct cultivars has greatly increased over the last few years. Until now, however, no large-scale analyses of Musa genomic sequence have been conducted. This study compares genomic sequence in two Musa species with orthologous regions in the rice genome. Results: We produced 1.4 Mb of Musa sequence from 13 BAC clones, annotated and analyzed them along with 4 previously sequenced BACs. The 443 predicted genes revealed that Zingiberales genes share GC content and distribution characteristics with eudicot and Poaceae genomes. Comparison with rice revealed microsynteny regions that have persisted since the divergence of the Commelinid orders Poales and Zingiberales at least 117 Mya. The previously hypothesized large-scale duplication event in the common ancestor of major cereal lineages within the Poaceae was verified. The divergence time distributions for Musa-Zingiber (Zingiberaceae, Zingiberales) orthologs and paralogs provide strong evidence for a large-scale duplication event in the Musa lineage after its divergence from the Zingiberaceae approximately 61 Mya. Comparisons of genomic regions from M. acuminata and M. balbisiana revealed highly conserved genome structure, and indicated that these genomes diverged circa 4.6 Mya. Conclusion: These results point to the utility of comparative analyses between distantly related monocot species such as rice and Musa for improving our understanding of monocot genome evolution. Sequencing the genome of M. acuminata would provide a strong foundation for comparative genomics in the monocots. In addition, a genome sequence would aid genomic and genetic analyses of cultivated Musa polyploid genotypes in research aimed at localizing and cloning genes controlling important agronomic
Singh, Sangeeta; Chand, Suresh; Singh, N. K.; Sharma, Tilak Raj
The resistance (R) genes and defense response (DR) genes have become very important resources for the development of disease-resistant cultivars. In the present investigation, genome-wide identification, expression, phylogenetic and synteny analyses were performed for R- and DR-genes across three species of rice, viz. Oryza sativa ssp. indica cv. 93-11, Oryza sativa ssp. japonica and the wild rice species Oryza brachyantha. We used an in silico approach to identify and map 786 R-genes and 167 DR-genes, 672 R-genes and 142 DR-genes, and 251 R-genes and 86 DR-genes in the japonica, indica and O. brachyantha genomes, respectively. Our analysis showed that 60.5% and 55.6% of the R-genes are tandemly repeated within clusters and distributed over all the rice chromosomes in the indica and japonica genomes, respectively. The phylogenetic analysis along with motif distribution shows a high degree of conservation of R- and DR-genes in clusters. In silico expression analysis of R-genes and DR-genes showed that more than 85% were expressed genes with corresponding EST matches in the databases. This study gave special emphasis to mechanisms of gene evolution and duplication for R- and DR-genes across species. Analysis of paralogs across rice species indicated 17% and 4.38% R-gene duplication and 29% and 11.63% DR-gene duplication in indica and Oryza brachyantha, as compared to 20% and 26% duplication of R-genes and DR-genes in japonica, respectively. We found that during the course of duplication only 9.5% of R- and DR-genes changed their function and the rest of the genes have maintained their identity. Syntenic relationships across the three genomes indicated that more orthology is shared between the indica and japonica genomes than with the brachyantha genome. Genome-wide identification of R-genes and DR-genes in the rice genome will help in allele mining and functional validation of these genes, and in understanding the molecular mechanism of disease resistance and its evolution in rice and related species. PMID:25902056
Inter-specific hybridization occurs frequently in higher plants, and represents a driving force of evolution and speciation. Inter-specific hybridization often induces genetic and epigenetic instabilities in the resultant homoploid hybrids or allopolyploids, a phenomenon known as genome shock. Although genetic and epigenetic consequences of hybridizations between rice subspecies (e.g., japonica and indica) and closely related species sharing the same AA genome have been extensively investigated, those of inter-specific hybridizations between more remote species with different genomes in the rice genus, Oryza, remain largely unknown. We investigated the immediate chromosomal and molecular genetic/epigenetic instability of three triploid F1 hybrids produced by inter-specific crossing between species with divergent genomes of Oryza by genomic in situ hybridization (GISH) and molecular marker analysis. Transcriptional and transpositional activity of several transposable elements (TEs) and methylation stability of their flanking regions were also assessed. We made the following principal findings: (i) all three triploid hybrids are stable in both chromosome number and gross structure; (ii) stochastic changes in both DNA sequence and methylation occurred in individual plants of all three triploid hybrids, but in general methylation changes occurred at lower frequencies than genetic changes; (iii) alteration in DNA methylation occurred to a greater extent in genomic loci flanking potentially active TEs than in randomly sampled loci; (iv) transcriptional activation of several TEs commonly occurred in all three hybrids but transpositional events were detected in a genetic context-dependent manner. Artificially constructed inter-specific hybrids of remotely related species with divergent genomes in genus Oryza are chromosomally stable but show immediate and highly stochastic genetic and epigenetic instabilities at the molecular level. These novel hybrids might
Wang, Xiaowu; Wang, Hanzhong; Wang, Jun
We report the annotation and analysis of the draft genome sequence of Brassica rapa accession Chiifu-401-42, a Chinese cabbage. We modeled 41,174 protein coding genes in the B. rapa genome, which has undergone genome triplication. We used Arabidopsis thaliana as an outgroup for investigating...... of Brassica oil and vegetable crops....
The complex process of allopolyploid speciation includes various mechanisms ranging from species crosses and hybrid genome doubling to genome alterations and the establishment of new allopolyploids as persisting natural entities. Currently, little is known about the genetic mechanisms that underlie hybrid genome doubling, despite the fact that natural allopolyploid formation is highly dependent on this phenomenon. We examined the genetic basis for the spontaneous genome doubling of triploid F1 hybrids between the direct ancestors of allohexaploid common wheat (Triticum aestivum L., AABBDD genome), namely Triticum turgidum L. (AABB genome) and Aegilops tauschii Coss. (DD genome). An Ae. tauschii intraspecific lineage that is closely related to the D genome of common wheat was identified by population-based analysis. Two representative accessions, one that produces a high-genome-doubling-frequency hybrid when crossed with a T. turgidum cultivar and the other that produces a low-genome-doubling-frequency hybrid with the same cultivar, were chosen from that lineage for further analyses. A series of investigations including fertility analysis, immunostaining, and quantitative trait locus (QTL) analysis showed that (1) production of functional unreduced gametes through nonreductional meiosis is an early step key to successful hybrid genome doubling, (2) first division restitution is one of the cytological mechanisms that cause meiotic nonreduction during the production of functional male unreduced gametes, and (3) six QTLs in the Ae. tauschii genome, most of which likely regulate nonreductional meiosis and its subsequent gamete production processes, are involved in hybrid genome doubling. Interlineage comparisons of Ae. tauschii's ability to cause hybrid genome doubling suggested an evolutionary model for the natural variation pattern of the trait in which non-deleterious mutations in six QTLs may have important roles. The findings of this study demonstrated
Brankovics, Balázs; van Dam, Peter; Rep, Martijn; de Hoog, G Sybren; J van der Lee, Theo A; Waalwijk, Cees; van Diepeningen, Anne D
The Fusarium oxysporum species complex (FOSC) contains several phylogenetic lineages. Phylogenetic studies identified two to three major clades within the FOSC. The mitochondrial sequences are highly informative phylogenetic markers, but have been mostly neglected due to technical difficulties. A total of 61 complete mitogenomes of FOSC strains were de novo assembled and annotated. Length variations and intron patterns support the separation of three phylogenetic species. The variable region of the mitogenome that is typical for the genus Fusarium shows two new variants in the FOSC. The variant typical for Fusarium is found in members of all three clades, while variant 2 is found in clades 2 and 3 and variant 3 only in clade 2. The extended set of loci, analyzed using a new implementation of the genealogical concordance species recognition method, supports the identification of three phylogenetic species within the FOSC. Comparative analysis of the mitogenomes in the FOSC revealed ongoing mitochondrial recombination within, but not between, phylogenetic species. The recombination indicates the presence of a parasexual cycle in F. oxysporum. The obstacles hindering the usage of the mitogenomes are resolved by using next generation sequencing and selective genome assemblers, such as GRAbB. Complete mitogenome sequences offer a stable basis and reference point for phylogenetic and population genetic studies.
Lassiter, Erica S; Russ, Carsten; Nusbaum, Chad; Zeng, Qiandong; Saville, Amanda C; Olarte, Rodrigo A; Carbone, Ignazio; Hu, Chia-Hui; Seguin-Orlando, Andaine; Samaniego, Jose A; Thorne, Jeffrey L; Ristaino, Jean B
Phytophthora infestans is one of the most destructive plant pathogens of potato and tomato globally. The pathogen is closely related to four other Phytophthora species in the 1c clade including P. phaseoli, P. ipomoeae, P. mirabilis and P. andina that are important pathogens of other wild and domesticated hosts. P. andina is an interspecific hybrid between P. infestans and an unknown Phytophthora species. We have sequenced mitochondrial genomes of the sister species of P. infestans and examined the evolutionary relationships within the clade. Phylogenetic analysis indicates that the P. phaseoli mitochondrial lineage is basal within the clade. P. mirabilis and P. ipomoeae are sister lineages and share a common ancestor with the Ic mitochondrial lineage of P. andina. These lineages in turn are sister to the P. infestans and P. andina Ia mitochondrial lineages. The P. andina Ic lineage diverged much earlier than the P. andina Ia mitochondrial lineage and P. infestans. The presence of two mitochondrial lineages in P. andina supports the hybrid nature of this species. The ancestral state of the P. andina Ic lineage in the tree and its occurrence only in the Andean regions of Ecuador, Colombia and Peru suggests that the origin of this species hybrid in nature may occur there.
Chattaway, Marie A; Schaefer, Ulf; Tewolde, Rediat; Dallman, Timothy J; Jenkins, Claire
Escherichia coli and Shigella species are closely related and genetically constitute the same species. Differentiating between these two pathogens and accurately identifying the four species of Shigella are therefore challenging. The organism-specific bioinformatics whole-genome sequencing (WGS) typing pipelines at Public Health England are dependent on the initial identification of the bacterial species by use of a kmer-based approach. Of the 1,982 Escherichia coli and Shigella sp. isolates analyzed in this study, 1,957 (98.4%) had concordant results by both traditional biochemistry and serology (TB&S) and the kmer identification (ID) derived from the WGS data. Of the 25 mismatches identified, 10 were enteroinvasive E. coli isolates that were misidentified as Shigella flexneri or S. boydii by the kmer ID, and 8 were S. flexneri isolates misidentified by TB&S as S. boydii due to nonfunctional S. flexneri O antigen biosynthesis genes. Analysis of the population structure based on multilocus sequence typing (MLST) data derived from the WGS data showed that the remaining discrepant results belonged to clonal complex 288 (CC288), comprising both S. boydii and S. dysenteriae strains. Mismatches between the TB&S and kmer ID results were explained by the close phylogenetic relationship between the two species and were resolved with reference to the MLST data. Shigella can be differentiated from E. coli and accurately identified to the species level by use of kmer comparisons and MLST. Analysis of the WGS data provided explanations for the discordant results between TB&S and WGS data, revealed the true phylogenetic relationships between different species of Shigella, and identified emerging pathoadapted lineages. © Crown copyright 2017.
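The kmer-based identification step mentioned in the abstract above can be illustrated with a small sketch: break each sequence into overlapping substrings of length k, then pick the reference whose k-mer set is most similar to the query. This is only a toy illustration using Jaccard similarity, which is an assumption here; the abstract does not describe how the Public Health England pipeline scores matches. The function names, the value of k and the sequences are all hypothetical.

```python
def kmers(seq, k):
    """Return the set of overlapping substrings of length k in seq."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: shared items over all items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def closest_reference(query, references, k=5):
    """Score the query against each reference and return the best match.

    `references` maps a reference name to its sequence; both the names
    and the scoring scheme are illustrative assumptions.
    """
    query_kmers = kmers(query, k)
    scores = {
        name: jaccard(query_kmers, kmers(seq, k))
        for name, seq in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores


# Toy sequences, not real E. coli or Shigella data.
toy_references = {
    "ref_a": "ACGTACGTACGTTTGACC",
    "ref_b": "GGGCCCAAATTTGGGCCC",
}
best_match, all_scores = closest_reference("ACGTACGTACGTTTGACA", toy_references)
```

A set-based score like this treats every k-mer equally, which is one reason closely related organisms can be hard to separate and why the study falls back on MLST data to resolve mismatches.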
Krahulcová, Anna; Trávnícek, Pavel; Krahulec, František; Rejmánek, Marcel
Aesculus L. (horse chestnut, buckeye) is a genus of 12-19 extant woody species native to the temperate Northern Hemisphere. This genus is known for unusually large seeds among angiosperms. While chromosome counts are available for many Aesculus species, only one has had its genome size measured. The aim of this study is to provide more genome size data and analyse the relationship between genome size and seed mass in this genus. Chromosome numbers in root tip cuttings were confirmed for four species and reported for the first time for three additional species. Flow cytometric measurements of 2C nuclear DNA values were conducted on eight species, and mean seed mass values were estimated for the same taxa. The same chromosome number, 2n = 40, was determined in all investigated taxa. Original measurements of 2C values for seven Aesculus species (eight taxa), added to just one reliable datum for A. hippocastanum, confirmed the notion that the genome size in this genus with relatively large seeds is surprisingly low, ranging from 0.955 pg 2C⁻¹ in A. parviflora to 1.275 pg 2C⁻¹ in A. glabra var. glabra. The chromosome number of 2n = 40 seems to be conclusively the universal 2n number for non-hybrid species in this genus. Aesculus genome sizes are relatively small, not only within its own family, Sapindaceae, but also within woody angiosperms. The genome sizes seem to be distinct and non-overlapping among the four major Aesculus clades. These results provide extra support for the most recent reconstruction of Aesculus phylogeny. The correlation between the 2C values and seed masses in examined Aesculus species is slightly negative and not significant. However, when the four major clades are treated separately, there is a consistent positive association between larger genome size and larger seed mass within individual lineages. © The Author 2017. Published by Oxford University Press on behalf of the Annals of Botany Company. All rights reserved.
Irizarry, Kristopher J L; Bryant, Doug; Kalish, Jordan; Eng, Curtis; Schmidt, Peggy L; Barrett, Gini; Barr, Margaret C
Many endangered captive populations exhibit reduced genetic diversity resulting in health issues that impact reproductive fitness and quality of life. Numerous cost effective genomic sequencing and genotyping technologies provide unparalleled opportunity for incorporating genomics knowledge in management of endangered species. Genomic data, such as sequence data, transcriptome data, and genotyping data, provide critical information about a captive population that, when leveraged correctly, can be utilized to maximize population genetic variation while simultaneously reducing unintended introduction or propagation of undesirable phenotypes. Current approaches aimed at managing endangered captive populations utilize species survival plans (SSPs) that rely upon mean kinship estimates to maximize genetic diversity while simultaneously avoiding artificial selection in the breeding program. However, as genomic resources increase for each endangered species, the potential knowledge available for management also increases. Unlike model organisms in which considerable scientific resources are used to experimentally validate genotype-phenotype relationships, endangered species typically lack the necessary sample sizes and economic resources required for such studies. Even so, in the absence of experimentally verified genetic discoveries, genomics data still provides value. In fact, bioinformatics and comparative genomics approaches offer mechanisms for translating these raw genomics data sets into integrated knowledge that enable an informed approach to endangered species management.
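The mean kinship estimates referred to in the abstract above can be sketched briefly. Under the usual definition, an individual's mean kinship is the average of its kinship coefficients with every member of the population, itself included, and animals with the lowest values are the least represented genetically. The sketch below assumes a precomputed, symmetric kinship matrix; the matrix values and function names are hypothetical and not taken from any species survival plan.

```python
def mean_kinship(kinship):
    """Return each individual's mean kinship from a square kinship matrix.

    kinship[i][j] is the kinship coefficient between individuals i and j,
    with self-kinship on the diagonal (illustrative convention).
    """
    n = len(kinship)
    if any(len(row) != n for row in kinship):
        raise ValueError("kinship matrix must be square")
    return [sum(row) / n for row in kinship]


def breeding_priority(kinship):
    """Return individual indices ordered from lowest to highest mean kinship."""
    mk = mean_kinship(kinship)
    return sorted(range(len(mk)), key=lambda i: mk[i])


# Toy matrix: individuals 0 and 1 are related, individual 2 is unrelated.
toy_kinship = [
    [0.5, 0.25, 0.0],
    [0.25, 0.5, 0.0],
    [0.0, 0.0, 0.5],
]
toy_mk = mean_kinship(toy_kinship)
toy_order = breeding_priority(toy_kinship)
```

In the toy matrix the unrelated individual has the lowest mean kinship and is ranked first, which mirrors the goal of maximizing retained genetic diversity.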
West, Claire; James, Stephen A; Davey, Robert P; Dicks, Jo; Roberts, Ian N
The ribosomal RNA encapsulates a wealth of evolutionary information, including genetic variation that can be used to discriminate between organisms at a wide range of taxonomic levels. For example, the prokaryotic 16S rDNA sequence is very widely used both in phylogenetic studies and as a marker in metagenomic surveys and the internal transcribed spacer region, frequently used in plant phylogenetics, is now recognized as a fungal DNA barcode. However, this widespread use does not escape criticism, principally due to issues such as difficulties in classification of paralogous versus orthologous rDNA units and intragenomic variation, both of which may be significant barriers to accurate phylogenetic inference. We recently analyzed data sets from the Saccharomyces Genome Resequencing Project, characterizing rDNA sequence variation within multiple strains of the baker's yeast Saccharomyces cerevisiae and its nearest wild relative Saccharomyces paradoxus in unprecedented detail. Notably, both species possess single locus rDNA systems. Here, we use these new variation datasets to assess whether a more detailed characterization of the rDNA locus can alleviate the second of these phylogenetic issues, sequence heterogeneity, while controlling for the first. We demonstrate that a strong phylogenetic signal exists within both datasets and illustrate how they can be used, with existing methodology, to estimate intraspecies phylogenies of yeast strains consistent with those derived from whole-genome approaches. We also describe the use of partial Single Nucleotide Polymorphisms, a type of sequence variation found only in repetitive genomic regions, in identifying key evolutionary features such as genome hybridization events and show their consistency with whole-genome Structure analyses. We conclude that our approach can transform rDNA sequence heterogeneity from a problem to a useful source of evolutionary information, enabling the estimation of highly accurate phylogenies of
Papaver rhoeas L. and P. orientale L., which belong to the family Papaveraceae, are used as ornamental and medicinal plants. The chloroplast genome has been used for molecular markers, evolutionary biology, and barcoding identification. In this study, the complete chloroplast genome sequences of P. rhoeas and P. orientale are reported. Results show that the complete chloroplast genomes of P. rhoeas and P. orientale have typical quadripartite structures, which are comprised of circular 152,905 and 152,799-bp-long molecules, respectively. A total of 130 genes were identified in each genome, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Sequence divergence analysis of four species from Papaveraceae indicated that the most divergent regions are found in the non-coding spacers with minimal differences among three Papaver species. These differences include the ycf1 gene and intergenic regions, such as rpoB-trnC, trnD-trnT, petA-psbJ, psbE-petL, and ccsA-ndhD. These regions are hypervariable regions, which can be used as specific DNA barcodes. This finding suggested that the chloroplast genome could be used as a powerful tool to resolve the phylogenetic positions and relationships of Papaveraceae. These results offer valuable information for future research in the identification of Papaver species and will benefit further investigations of these species.
Genome sequence analysis of five Canadian isolates of strawberry mottle virus reveals extensive intra-species diversity and a longer RNA2 with increased coding capacity compared to a previously characterized European isolate.
Bhagwat, Basdeo; Dickison, Virginia; Ding, Xinlun; Walker, Melanie; Bernardy, Michael; Bouthillier, Michel; Creelman, Alexa; DeYoung, Robyn; Li, Yinzi; Nie, Xianzhou; Wang, Aiming; Xiang, Yu; Sanfaçon, Hélène
In this study, we report the genome sequence of five isolates of strawberry mottle virus (family Secoviridae, order Picornavirales) from strawberry field samples with decline symptoms collected in Eastern Canada. The Canadian isolates differed from the previously characterized European isolate 1134 in that they had a longer RNA2, resulting in a 239-amino-acid extension of the C-terminal region of the polyprotein. Sequence analysis suggests that reassortment and recombination occurred among the isolates. Phylogenetic analysis revealed that the Canadian isolates are diverse, grouping in two separate branches along with isolates from Europe and the Americas.
Niche adaptation has long been recognized to drive intra-species differentiation and speciation, yet knowledge about its relatedness with hereditary variation of microbial genomes is relatively limited. Using Leptospirillum ferriphilum species as a case study, we present a detailed analysis of genomic features of five recognized strains. Genome-to-genome distance calculation preliminarily determined the roles of spatial distance and environmental heterogeneity that potentially contribute to intra-species variation within L. ferriphilum species at the genome level. Mathematical models were further constructed to extrapolate the expansion of L. ferriphilum genomes (an ‘open’ pan-genome), indicating the emergence of novel genes with new sequenced genomes. The identification of diverse mobile genetic elements (MGEs), such as transposases, integrases, and phage-associated genes, revealed the prevalence of horizontal gene transfer events, which is an important evolutionary mechanism that provides avenues for the recruitment of novel functionalities and further for the genetic divergence of microbial genomes. Comprehensive analysis also demonstrated that the genome reduction by gene loss in a broad sense might contribute to the observed diversification. We thus inferred a plausible explanation to address this observation: the community-dependent adaptation that potentially economizes the limiting resources of the entire community. Now that the introduction of new genes is accompanied by a parallel abandonment of some other ones, our results provide snapshots on the biological fitness cost of environmental adaptation within the L. ferriphilum genomes. In short, our genome-wide analyses bridge the relation between genetic variation of L. ferriphilum with its evolutionary adaptation.
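The 'open' pan-genome described in the abstract above rests on two simple set operations: the pan-genome is the union of gene families found in any strain, and the core genome is the intersection shared by all strains. A pan-genome is called open when its size keeps growing as genomes are added. The sketch below is a toy illustration with made-up gene family labels; it is not the mathematical model fitted by the authors, and all names are assumptions.

```python
def pan_and_core(gene_sets):
    """Return (pan, core) gene sets for a mapping of strain -> gene families."""
    sets = [set(genes) for genes in gene_sets.values()]
    if not sets:
        raise ValueError("need at least one strain")
    pan = set().union(*sets)          # present in at least one strain
    core = set.intersection(*sets)    # present in every strain
    return pan, core


def pan_genome_growth(gene_sets):
    """Return the cumulative pan-genome size as strains are added in order."""
    seen = set()
    sizes = []
    for genes in gene_sets.values():
        seen |= set(genes)
        sizes.append(len(seen))
    return sizes


# Hypothetical strains and gene family labels.
toy_strains = {
    "strain_1": {"geneA", "geneB", "geneC"},
    "strain_2": {"geneA", "geneB", "geneD"},
    "strain_3": {"geneA", "geneE"},
}
toy_pan, toy_core = pan_and_core(toy_strains)
toy_growth = pan_genome_growth(toy_strains)
```

Here every added strain contributes a new gene family, so the cumulative size rises from 3 to 4 to 5, the toy analogue of an open pan-genome, while the core shrinks to the single shared family.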
Magalhães, Diogo M; Scholte, Larissa L S; Silva, Nicholas V; Oliveira, Guilherme C; Zipfel, Cyril; Takita, Marco A; De Souza, Alessandra A
Leucine-rich repeat receptor-like kinases (LRR-RLKs) represent the largest subfamily of plant RLKs. The functions of most LRR-RLKs have remained undiscovered, and a few that have been experimentally characterized have been shown to have important roles in growth and development as well as in defense responses. Although RLK subfamilies have been previously studied in many plants, no comprehensive study has been performed on this gene family in Citrus species, which have high economic importance and are frequent targets for emerging pathogens. In this study, we performed in silico analysis to identify and classify LRR-RLK homologues in the predicted proteomes of Citrus clementina (clementine) and Citrus sinensis (sweet orange). In addition, we used large-scale phylogenetic approaches to elucidate the evolutionary relationships of the LRR-RLKs and further narrowed the analysis to the LRR-XII group, which contains several previously described cell surface immune receptors. We built integrative protein signature databases for Citrus clementina and Citrus sinensis using all predicted protein sequences obtained from whole genomes. A total of 300 and 297 proteins were identified as LRR-RLKs in C. clementina and C. sinensis, respectively. Maximum-likelihood phylogenetic trees were estimated using Arabidopsis LRR-RLK as a template and they allowed us to classify Citrus LRR-RLKs into 16 groups. The LRR-XII group showed a remarkable expansion, containing approximately 150 paralogs encoded in each Citrus genome. Phylogenetic analysis also demonstrated the existence of two distinct LRR-XII clades, each one constituted mainly by RD and non-RD kinases. We identified 68 orthologous pairs from the C. clementina and C. sinensis LRR-XII genes. In addition, among the paralogs, we identified a subset of 78 and 62 clustered genes probably derived from tandem duplication events in the genomes of C. clementina and C. sinensis, respectively. This work provided the first comprehensive
Wang, Xiaobo; Wu, Jian; Liang, Jianli; Cheng, Feng; Wang, Xiaowu
The Brassica database (BRAD) was built initially to assist users apply Brassica rapa and Arabidopsis thaliana genomic data efficiently to their research. However, many Brassicaceae genomes have been sequenced and released after its construction. These genomes are rich resources for comparative genomics, gene annotation and functional evolutionary studies of Brassica crops. Therefore, we have updated BRAD to version 2.0 (V2.0). In BRAD V2.0, 11 more Brassicaceae genomes have been integrated into the database, namely those of Arabidopsis lyrata, Aethionema arabicum, Brassica oleracea, Brassica napus, Camelina sativa, Capsella rubella, Leavenworthia alabamica, Sisymbrium irio and three extremophiles Schrenkiella parvula, Thellungiella halophila and Thellungiella salsuginea. BRAD V2.0 provides plots of syntenic genomic fragments between pairs of Brassicaceae species, from the level of chromosomes to genomic blocks. The Generic Synteny Browser (GBrowse_syn), a module of the Genome Browser (GBrowse), is used to show syntenic relationships between multiple genomes. Search functions for retrieving syntenic and non-syntenic orthologs, as well as their annotation and sequences are also provided. Furthermore, genome and annotation information have been imported into GBrowse so that all functional elements can be visualized in one frame. We plan to continually update BRAD by integrating more Brassicaceae genomes into the database. Database URL: http://brassicadb.org/brad/. © The Author(s) 2015. Published by Oxford University Press.
Fungi are the causal agents of many of the world's most serious plant diseases, with disastrous consequences for large-scale agricultural production. The genomic basis of pathogenicity is complex in fungi, which are multicellular eukaryotic pathogens. Here, we report the genome sequence of C. sojina, and comparative genome analysis with plant pathogen members of the genus Mycosphaerella (Zymoseptoria tritici (synonym M. graminicola), M. pini, M. populorum and M. fijiensis; pathogens of wheat, pine, poplar and banana, respectively). Synteny or collinearity was limited between genomes of major Mycosphaerella pathogens. Comparative analysis with these related pathogen genomes indicated distinct genome-wide repeat organization features, suggesting that repetitive elements might be responsible for considerable evolutionary genomic changes. These results reveal the background of genomic differences and similarities between Dothideomycete species. Wide diversity as well as conservation of genome features forms the potential genomic basis of pathogen specialization, such as pathogenicity to woody vs. herbaceous hosts. Through comparative genome analysis among five Dothideomycete species, our results shed light on the genome features of these related fungal species and provide insight into the genomic basis of fungal pathogenicity and disease resistance in the crop hosts.
Schaeffer, Stephen W; Bhutkar, Arjun; McAllister, Bryant F; Matsuda, Muneo; Matzkin, Luciano M; O'Grady, Patrick M; Rohde, Claudia; Valente, Vera L S; Aguadé, Montserrat; Anderson, Wyatt W; Edwards, Kevin; Garcia, Ana C L; Goodman, Josh; Hartigan, James; Kataoka, Eiko; Lapoint, Richard T; Lozovsky, Elena R; Machado, Carlos A; Noor, Mohamed A F; Papaceit, Montserrat; Reed, Laura K; Richards, Stephen; Rieger, Tania T; Russo, Susan M; Sato, Hajime; Segarra, Carmen; Smith, Douglas R; Smith, Temple F; Strelets, Victor; Tobari, Yoshiko N; Tomimura, Yoshihiko; Wasserman, Marvin; Watts, Thomas; Wilson, Robert; Yoshida, Kiyohito; Markow, Therese A; Gelbart, William M; Kaufman, Thomas C
The sequencing of the 12 genomes of members of the genus Drosophila was taken as an opportunity to reevaluate the genetic and physical maps for 11 of the species, in part to aid in the mapping of assembled scaffolds. Here, we present an overview of the importance of cytogenetic maps to Drosophila biology and to the concepts of chromosomal evolution. Physical and genetic markers were used to anchor the genome assembly scaffolds to the polytene chromosomal maps for each species. In addition, a computational approach was used to anchor smaller scaffolds on the basis of the analysis of syntenic blocks. We present the chromosomal map data from each of the 11 sequenced non-Drosophila melanogaster species as a series of sections. Each section reviews the history of the polytene chromosome maps for each species, presents the new polytene chromosome maps, and anchors the genomic scaffolds to the cytological maps using genetic and physical markers. The mapping data agree with Muller's idea that the majority of Drosophila genes are syntenic. Despite the conservation of genes within homologous chromosome arms across species, the karyotypes of these species have changed through the fusion of chromosomal arms followed by subsequent rearrangement events.
Lim, K Yoong; Kovarik, Ales; Matyasek, Roman; Chase, Mark W; Knapp, Sandra; McCarthy, Elizabeth; Clarkson, James J; Leitch, Andrew R
Combining phylogenetic reconstructions of species relationships with comparative genomic approaches is a powerful way to decipher evolutionary events associated with genome divergence. Here, we reconstruct the history of karyotype and tandem repeat evolution in species of diploid Nicotiana section Alatae. By analysis of plastid DNA, we resolved two clades with high bootstrap support, one containing N. alata, N. langsdorffii, N. forgetiana and N. bonariensis (called the n = 9 group) and another containing N. plumbaginifolia and N. longiflora (called the n = 10 group). Despite little plastid DNA sequence divergence, we observed, via fluorescent in situ hybridization, substantial chromosomal repatterning, including altered chromosome numbers, structure and distribution of repeats. Effort was focussed on 35S and 5S nuclear ribosomal DNA (rDNA) and the HRS60 satellite family of tandem repeats comprising the elements HRS60, NP3R and NP4R. We compared divergence of these repeats in diploids and polyploids of Nicotiana. There are dramatic shifts in the distribution of the satellite repeats and complete replacement of intergenic spacers (IGSs) of 35S rDNA associated with divergence of the species in section Alatae. We suggest that sequence homogenization has replaced HRS60 family repeats at sub-telomeric regions, but that this process may not occur, or occurs more slowly, when the repeats are found at intercalary locations. Sequence homogenization acts more rapidly (at least two orders of magnitude) on 35S rDNA than 5S rDNA and sub-telomeric satellite sequences. This rapid rate of divergence is analogous to that found in polyploid species, and is therefore, in plants, not only associated with polyploidy.
Kim, Seungill; Park, Minkyu; Yeom, Seon-In; Kim, Yong-Min; Lee, Je Min; Lee, Hyun-Ah; Seo, Eunyoung; Choi, Jaeyoung; Cheong, Kyeongchae; Kim, Ki-Tae; Jung, Kyongyong; Lee, Gir-Won; Oh, Sang-Keun; Bae, Chungyun; Kim, Saet-Byul; Lee, Hye-Young; Kim, Shin-Young; Kim, Myung-Shin; Kang, Byoung-Cheorl; Jo, Yeong Deuk; Yang, Hee-Bum; Jeong, Hee-Jin; Kang, Won-Hee; Kwon, Jin-Kyung; Shin, Chanseok; Lim, Jae Yun; Park, June Hyun; Huh, Jin Hoe; Kim, June-Sik; Kim, Byung-Dong; Cohen, Oded; Paran, Ilan; Suh, Mi Chung; Lee, Saet Buyl; Kim, Yeon-Ki; Shin, Younhee; Noh, Seung-Jae; Park, Junhyung; Seo, Young Sam; Kwon, Suk-Yoon; Kim, Hyun A; Park, Jeong Mee; Kim, Hyun-Jin; Choi, Sang-Bong; Bosland, Paul W; Reeves, Gregory; Jo, Sung-Hwan; Lee, Bong-Woo; Cho, Hyung-Taeg; Choi, Hee-Seung; Lee, Min-Soo; Yu, Yeisoo; Do Choi, Yang; Park, Beom-Seok; van Deynze, Allen; Ashrafi, Hamid; Hill, Theresa; Kim, Woo Taek; Pai, Hyun-Sook; Ahn, Hee Kyung; Yeam, Inhwa; Giovannoni, James J; Rose, Jocelyn K C; Sørensen, Iben; Lee, Sang-Jik; Kim, Ryan W; Choi, Ik-Young; Choi, Beom-Soon; Lim, Jong-Sung; Lee, Yong-Hwan; Choi, Doil
Hot pepper (Capsicum annuum), one of the oldest domesticated crops in the Americas, is the most widely grown spice crop in the world. We report whole-genome sequencing and assembly of the hot pepper (Mexican landrace of Capsicum annuum cv. CM334) at 186.6× coverage. We also report resequencing of two cultivated peppers and de novo sequencing of the wild species Capsicum chinense. The genome size of the hot pepper was approximately fourfold larger than that of its close relative tomato, and the genome showed an accumulation of Gypsy and Caulimoviridae family elements. Integrative genomic and transcriptomic analyses suggested that change in gene expression and neofunctionalization of capsaicin synthase have shaped capsaicinoid biosynthesis. We found differential molecular patterns of ripening regulators and ethylene synthesis in hot pepper and tomato. The reference genome will serve as a platform for improving the nutritional and medicinal values of Capsicum species.
Oyster aquaculture is an important sector of world food production. As such, it is imperative to develop a high quality reference genome for the eastern oyster, Crassostrea virginica, to assist in the elucidation of the genomic basis of commercially important traits. All genetic, gene expression and...
Genome sequencing efforts in the past decade were aimed at generating draft sequences of many prokaryotic and eukaryotic model organisms. Successful completion of the genomes of unicellular eukaryotes, worm, fly and human has opened up the new field of molecular biology and function...
Teeling, Emma C; Vernes, Sonja C; Dávalos, Liliana M; Ray, David A; Gilbert, M Thomas P; Myers, Eugene
Bats are unique among mammals, possessing some of the rarest mammalian adaptations, including true self-powered flight, laryngeal echolocation, exceptional longevity, unique immunity, contracted genomes, and vocal learning. They provide key ecosystem services, pollinating tropical plants, dispersing seeds, and controlling insect pest populations, thus driving healthy ecosystems. They account for more than 20% of all living mammalian diversity, and their crown-group evolutionary history dates back to the Eocene. Despite their great numbers and diversity, many species are threatened and endangered. Here we announce Bat1K, an initiative to sequence the genomes of all living bat species (n∼1,300) to chromosome-level assembly. The Bat1K genome consortium unites bat biologists (>148 members as of writing), computational scientists, conservation organizations, genome technologists, and any interested individuals committed to a better understanding of the genetic and evolutionary mechanisms that underlie the unique adaptations of bats. Our aim is to catalog the unique genetic diversity present in all living bats to better understand the molecular basis of their unique adaptations; uncover their evolutionary history; link genotype with phenotype; and ultimately better understand, promote, and conserve bats. Here we review the unique adaptations of bats and highlight how chromosome-level genome assemblies can uncover the molecular basis of these traits. We present a novel sequencing and assembly strategy and review the striking societal and scientific benefits that will result from the Bat1K initiative.
Lieschen De Vos
The Gibberella fujikuroi complex includes many Fusarium species that cause significant losses in yield and quality of agricultural and forestry crops. Due to their economic importance, whole-genome sequence information has rapidly become available for species including Fusarium circinatum, Fusarium fujikuroi and Fusarium verticillioides, each of which represents one of the three main clades known in this complex. However, no previous studies have explored the genomic commonalities and differences among these fungi. In this study, a previously completed genetic linkage map for an interspecific cross between Fusarium temperatum and F. circinatum, together with genomic sequence data, was used to assess the level of synteny between the three Fusarium genomes. Regions that are homologous amongst the Fusarium genomes examined were identified using in silico and pyrosequenced amplified fragment length polymorphism (AFLP) fragment analyses. Homology was determined using BLAST analysis of the sequences, with 777 homologous regions aligned to F. fujikuroi and F. verticillioides. This also made it possible to assign the linkage groups from the interspecific cross to their corresponding chromosomes in F. verticillioides and F. fujikuroi, as well as to assign two previously unmapped supercontigs of F. verticillioides to probable chromosomal locations. We further found evidence of a reciprocal translocation between the distal ends of chromosomes 8 and 11, which apparently originated before the divergence of F. circinatum and F. temperatum. Overall, a remarkable level of macrosynteny was observed among the three Fusarium genomes when comparing AFLP fragments. This study not only demonstrates how in silico AFLPs can aid in the integration of a genetic linkage map with the physical genome, but also highlights the benefits of using this tool to study genomic synteny and architecture.
Background: Brachyspira spp. colonize the intestines of some mammalian and avian species and show different degrees of enteropathogenicity. Brachyspira intermedia can cause production losses in chickens and strain PWS/AT now becomes the fourth genome to be completed in the genus Brachyspira. Results: 15 classes of unique and shared genes were analyzed in B. intermedia, B. murdochii, B. hyodysenteriae and B. pilosicoli. The largest number of unique genes was found in B. intermedia and B. murdochii. This indicates the presence of larger pan-genomes. In general, hypothetical protein annotations are overrepresented among the unique genes. A 3.2 kb plasmid was found in B. intermedia strain PWS/AT. The plasmid was also present in the B. murdochii strain but not in nine other Brachyspira isolates. Within the Brachyspira genomes, genes had been translocated and also frequently switched between leading and lagging strands, a process that can be followed by different AT-skews in the third positions of synonymous codons. We also found evidence that bacteriophages were being remodeled and genes incorporated into them. Conclusions: The accessory gene pool shapes species-specific traits. It is also influenced by reductive genome evolution and horizontal gene transfer. Gene-transfer events can cross both species and genus boundaries and bacteriophages appear to play an important role in this process. A mechanism for horizontal gene transfer appears to be gene translocations leading to remodeling of bacteriophages in combination with broad tropism. PMID:21816042
Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu
Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.
Rokas, Antonis; Payne, Gary; Federova, Natalie D.; Baker, Scott E.; Machida, Masa; Yu, Jiujiang; Georgianna, D. R.; Dean, Ralph A.; Bhatnagar, Deepak; Cleveland, T. E.; Wortman, Jennifer R.; Maiti, R.; Joardar, V.; Amedeo, Paolo; Denning, David W.; Nierman, William C.
Understanding the nature of species' boundaries is a fundamental question in evolutionary biology. The availability of genomes from several species of the genus Aspergillus allows us for the first time to examine the demarcation of fungal species at the whole-genome level. Here, we examine four case studies, two of which involve intraspecific comparisons, whereas the other two deal with interspecific genomic comparisons between closely related species. These four comparisons reveal significant variation in the nature of species boundaries across Aspergillus. For example, comparisons between A. fumigatus and Neosartorya fischeri (the teleomorph of A. fischerianus) and between A. oryzae and A. flavus suggest that measures of sequence similarity and species-specific genes are significantly higher for the A. fumigatus - N. fischeri pair. Importantly, the values obtained from the comparison between A. oryzae and A. flavus are remarkably similar to those obtained from an intra-specific comparison of A. fumigatus strains, giving support to the proposal that A. oryzae represents a distinct ecotype of A. flavus and not a distinct species. We argue that genomic data can aid Aspergillus taxonomy by serving as a source of novel and unprecedented amounts of comparative data, as a resource for the development of additional diagnostic tools, and finally as a knowledge database about the biological differences between strains and species.
Su, Fei; Tao, Fei; Tang, Hongzhi; Xu, Ping
Here we announce a 3.0-Mb assembly of the Bacillus coagulans Hammer strain, which is the type strain of the species within the genus Bacillus. Genomic analyses based on the sequence may provide insights into the phylogeny of the species and help to elucidate characteristics of the poorly studied strains of Bacillus coagulans.
Lang Chunting, Chunting
The legume Trifolium pratense (red clover) is an important fodder crop and produces important secondary metabolites. This makes red clover an interesting species. In this thesis, the red clover genome is compared to the legume model species Medicago truncatula, of which the
Filippidou, Sevasti; Jaussi, Marion; Junier, Thomas; Wunderlin, Tina; Roussel-Delif, Ludovic; Jeanneret, Nicole; Vieth-Hillebrand, Andrea; Vetter, Alexandra; Regenspurg, Simona; McMurry, Kim; Gleasner, Cheryl D.; Lo, Chien-Chi; Li, Paul; Vuyisich, Momchilo; Chain, Patrick S.
Anoxybacillus geothermalis strain GSsed3 is an endospore-forming thermophilic bacterium isolated from filter deposits in a geothermal site. This novel species has a larger genome size (7.2 Mb) than that of any other Anoxybacillus species, and it possesses genes that support its phenotypic metabolic characterization and suggest an intriguing link to metals. PMID:26067952
Tettelin, H; Masignani, V; Cieslewicz, MJ; Donati, C; Medini, D; Ward, NL; Angiuoli, SV; Crabtree, J; Jones, AL; Durkin, AS; DeBoy, RT; Davidsen, TM; Mora, M; Scarselli, M; Ros, IMY; Peterson, JD; Hauser, CR; Sundaram, JP; Nelson, WC; Madupu, R; Brinkac, LM; Dodson, RJ; Rosovitz, MJ; Sullivan, SA; Daugherty, SC; Haft, DH; Selengut, J; Gwinn, ML; Zhou, LW; Zafar, N; Khouri, H; Radune, D; Dimitrov, G; Watkins, K; O'Connor, KJB; Smith, S; Utterback, TR; White, O; Rubens, CE; Grandi, G; Madoff, LC; Kasper, DL; Telford, JL; Wessels, MR; Rappuoli, R; Fraser, CM
The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and
Jung, Chol-Hee; Wong, Chui E.; Singh, Mohan B.; Bhalla, Prem L.
Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant, Arabidopsis. PMID:22679494
Beck, James B; Semple, John C
The ability to conduct species delimitation and phylogeny reconstruction with genomic data sets obtained exclusively from herbarium specimens would rapidly enhance our knowledge of large, taxonomically contentious plant genera. In this study, the utility of genotyping by sequencing is assessed in the notoriously difficult genus Solidago (Asteraceae) by attempting to obtain an informative single-nucleotide polymorphism data set from a set of specimens collected between 1970 and 2010. Reduced representation libraries were prepared and Illumina-sequenced from 95 Solidago herbarium specimen DNAs, and resulting reads were processed with the nonreference Universal Network-Enabled Analysis Kit (UNEAK) pipeline. Multidimensional clustering was used to assess the correspondence between genetic groups and morphologically defined species. Library construction and sequencing were successful in 93 of 95 samples. The UNEAK pipeline identified 8470 single-nucleotide polymorphisms, and a filtered data set was analyzed for each of three Solidago subsections. Although results varied, clustering identified genomic groups that often corresponded to currently recognized species or groups of closely related species. These results suggest that genotyping by sequencing is broadly applicable to DNAs obtained from herbarium specimens. The data obtained and their biological signal suggest that pairing genomics with large-scale herbarium sampling is a promising strategy in species-rich plant groups.
Schmidt, Johanna; Jezberová, Jitka; Koll, Ulrike; Hahn, Martin W.
Microdiversification of a planktonic freshwater bacterium was studied by comparing 37 Polynucleobacter asymbioticus strains obtained from three geographically separated sites in the Austrian Alps. Genome comparison of nine strains revealed a core genome of 1.8 Mb, representing 81% of the average genome size. Seventy-five percent of the remaining flexible genome is clustered in genomic islands (GIs). Twenty-four genomic positions could be identified where GIs are potentially located. These positions are occupied strain specifically from a set of 28 GI variants, classified according to similarities in their gene content. One variant, present in 62% of the isolates, encodes a pathway for the degradation of aromatic compounds, and another, found in 78% of the strains, contains an operon for nitrate assimilation. Both variants were shown in ecophysiological tests to be functional, thus providing the potential for microniche partitioning. In addition, detected interspecific horizontal exchange of GIs indicates a large gene pool accessible to Polynucleobacter species. In contrast to core genes, GIs are spread more successfully across spatially separated freshwater habitats. The mobility and functional diversity of GIs allow for rapid evolution, which may be a key aspect for the ubiquitous occurrence of Polynucleobacter bacteria. Importance: Assessing the ecological relevance of bacterial diversity is a key challenge for current microbial ecology. The polyphasic approach which was applied in this study, including targeted isolation of strains, genome analysis, and ecophysiological tests, is crucial for the linkage of genetic and ecological knowledge. Particularly great importance is attached to the high number of closely related strains which were investigated, represented by genome-wide average nucleotide identities (ANI) larger than 97%. The extent of functional diversification found on this narrow phylogenetic scale is compelling. Moreover, the transfer of
Leisner, Courtney P; Hamilton, John P; Crisovan, Emily; Manrique-Carpintero, Norma C; Marand, Alexandre P; Newton, Linsey; Pham, Gina M; Jiang, Jiming; Douches, David S; Jansky, Shelley H; Buell, C Robin
Cultivated potato (Solanum tuberosum L.) is a highly heterozygous autotetraploid that presents challenges in genome analyses and breeding. Wild potato species serve as a resource for the introgression of important agronomic traits into cultivated potato. One key species is Solanum chacoense and the diploid, inbred clone M6, which is self-compatible and has desirable tuber market quality and disease resistance traits. Sequencing and assembly of the genome of the M6 clone of S. chacoense generated an assembly of 825 767 562 bp in 8260 scaffolds with an N50 scaffold size of 713 602 bp. Pseudomolecule construction anchored 508 Mb of the genome assembly into 12 chromosomes. Genome annotation yielded 49 124 high-confidence gene models representing 37 740 genes. Comparative analyses of the M6 genome with six other Solanaceae species revealed a core set of 158 367 Solanaceae genes and 1897 genes unique to three potato species. Analysis of single nucleotide polymorphisms across the M6 genome revealed enhanced residual heterozygosity on chromosomes 4, 8 and 9 relative to the other chromosomes. Access to the M6 genome provides a resource for identification of key genes for important agronomic traits and aids in genome-enabled development of inbred diploid potatoes with the potential to accelerate potato breeding. © 2018 The Authors The Plant Journal © 2018 John Wiley & Sons Ltd.
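The N50 scaffold size quoted in the abstract above is a standard assembly statistic: the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly. As a minimal sketch of how such a value could be computed, assuming scaffold lengths are available as a plain list of integers (the function name `n50` and the toy lengths are illustrative and are not taken from the M6 assembly):

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    Sort lengths from longest to shortest and walk down the list until the
    running total reaches at least half of the summed length; the length at
    which that happens is the N50.
    """
    if not lengths:
        raise ValueError("at least one sequence length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy scaffold lengths (hypothetical, for illustration only).
toy_scaffolds = [100, 80, 60, 40, 20]
print(n50(toy_scaffolds))  # total is 300; 100 + 80 = 180 >= 150, so N50 is 80
```

A higher N50 for the same total length generally indicates a more contiguous assembly, which is why it is reported alongside the scaffold count.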
Meghana Deepak Shirke
Blast disease caused by Magnaporthe species is a major factor affecting the productivity of rice, wheat and millets. This study was aimed at generating genomic information for rice and non-rice Magnaporthe isolates to understand the extent of genetic variation. We have sequenced the whole genomes of Magnaporthe isolates infecting rice (leaf and neck), finger millet (leaf and neck), foxtail millet (leaf) and buffel grass (leaf). Rice and finger millet isolates infecting both leaf and neck tissues were sequenced, since the damage and yield loss caused by neck blast are much higher than those caused by leaf blast. A genome-wide comparison was carried out to study the variability in gene content, candidate effectors, repeat element distribution, genes involved in carbohydrate metabolism and SNPs. The analysis of repeat element footprints revealed some genes such as naringenin, 2-oxoglutarate 3-dioxygenase being targeted by Pot2 and Occan in isolates from different host species. Some repeat insertions were host-specific while other insertions were randomly shared between isolates. The distributions of repeat elements, secretory proteins, CAZymes and SNPs showed significant variation across host-specific lineages of Magnaporthe, indicating an independent genome evolution orchestrated by multiple genomic factors.
Huguet-Tapia, Jose C.; Lefebure, Tristan; Badger, Jonathan H.; Guan, Dongli; Stanhope, Michael J.
Streptomyces spp. are highly differentiated actinomycetes with large, linear chromosomes that encode an arsenal of biologically active molecules and catabolic enzymes. Members of this genus are well equipped for life in nutrient-limited environments and are common soil saprophytes. Out of the hundreds of species in the genus Streptomyces, a small group has evolved the ability to infect plants. The recent availability of Streptomyces genome sequences, including four genomes of pathogenic species, provided an opportunity to characterize the gene content specific to these pathogens and to study phylogenetic relationships among them. Genome sequencing, comparative genomics, and phylogenetic analysis enabled us to discriminate pathogenic from saprophytic Streptomyces strains; moreover, we calculated that the pathogen-specific genome contains 4,662 orthologs. Phylogenetic reconstruction suggested that Streptomyces scabies and S. ipomoeae share an ancestor but that their biosynthetic clusters encoding the required virulence factor thaxtomin have diverged. In contrast, S. turgidiscabies and S. acidiscabies, two relatively unrelated pathogens, possess highly similar thaxtomin biosynthesis clusters, which suggests that the acquisition of these genes was through lateral gene transfer. PMID:26826232
Liu, Ruijuan; Wang, Richard R-C; Yu, Feng; Lu, Xingwang; Dou, Quanwen
Genomes of ten species of Elymus, either presumed or known to be tetraploid StY, were characterized using fluorescence in situ hybridization (FISH) and genomic in situ hybridization (GISH). These tetraploid species could be grouped into three categories. Type I included species reported to carry the StY genome (Roegneria pendulina, R. nutans, R. glaberrima, R. ciliaris, and Elymus nevskii) and species presumed to carry it (R. sinica, R. breviglumis, and R. dura), whose genomes could be separated into two sets based on different GISH intensities. The Type I genome constitution was deemed to be putative StY. The St genome was mainly characterized by intense hybridization with pAs1, fewer AAG sites, and linked distribution of 5S rDNA and 18S-26S rDNA, while the Y genome was characterized by less intense hybridization with pAs1, more varied AAG sites, and isolated distribution of 5S rDNA and 18S-26S rDNA. Nevertheless, further genomic variations were detected among the different StY species. Type II included E. alashanicus, whose genome could be easily separated based on GISH pattern. FISH and GISH patterns suggested that E. alashanicus comprised a modified St genome and an unknown genome. Type III included E. longearistatus, whose genome could not be separated by GISH and was designated as St(l)Y(l). Notably, a close relationship between the S(l) and Y(l) genomes was observed.
ElHefnawi, Mahmoud; Jeon, Sungwon; Bhak, Youngjune; ElFiky, Asmaa; Horaiz, Ahmed; Jun, JeHoon; Kim, Hyunho; Bhak, Jong
We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~30× depth. EGP1 had 4.7 million variants, of which 198,877 were novel, while EGP2 had 209,109 novel variants out of 4.8 million. The mitochondrial haplogroups of the two individuals were identified as H7b1 and L2a1c, respectively. We also identified the Y haplogroups of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 had a mutation in the NADH gene of the mitochondrial genome ND4 (m.11778 G > A) that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with increased levels of cholesterol and triglycerides, probably related to obesity in Egyptians. Comparison of these genomes with African and Western-Asian genomes can provide insights into Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and functional classification of variants as well as human migration and evolution across Africa and Western Asia. Copyright © 2017. Published by Elsevier B.V.
Background: At least three species of Borrelia burgdorferi sensu lato (Bbsl) cause tick-borne Lyme disease. Previous work including the genome analysis of B. burgdorferi B31 and B. garinii PBi suggested a highly variable plasmid part. The frequent occurrence of duplicated sequence stretches, the observed plasmid redundancy, as well as the mainly unknown function and variability of plasmid-encoded genes rendered the relationships between plasmids within and between species largely unresolvable. Results: To gain further insight into Borreliae genome properties we completed the plasmid sequences of B. garinii PBi, added the genome of a further species, B. afzelii PKo, to our analysis, and compared for both species the genomes of pathogenic and apathogenic strains. The core of all Bbsl genomes consists of the chromosome and two plasmids collinear between all species. We also found additional groups of plasmids, which share large parts of their sequences. This makes it very likely that these plasmids are relatively stable and share common ancestors before the diversification of Borrelia species. The analysis of the differences between B. garinii PBi and B. afzelii PKo genomes of low and high passages revealed that the loss of infectivity is accompanied in both species by a loss of similar genetic material. Whereas B. garinii PBi suffered only from the break-off of a plasmid end, B. afzelii PKo lost more material, probably an entire plasmid. In both cases the vls gene locus encoding variable surface proteins is affected. Conclusion: The complete genome sequences of a B. garinii and a B. afzelii strain facilitate further comparative studies within the genus Borrelia. Our study shows that loss of infectivity can be traced back to only one single event in B. garinii PBi: the loss of the vls cassettes, possibly due to error-prone gene conversion. Similar albeit extended losses in B. afzelii PKo support the hypothesis that infectivity of Borrelia
Matt C. Estep
Generation of ∼2200 Sanger sequence reads or ∼10,000 454 reads for seven Lour. DNA samples (five species) allowed identification of the highly repetitive DNA content in these genomes. The 14 most abundant repeats in these species were identified and partially assembled. Annotation indicated that they represent nine long terminal repeat (LTR) retrotransposon families, three tandem satellite repeats, one long interspersed element (LINE) retroelement, and one DNA transposon. All of these repeats are most closely related to repetitive elements in other closely related plants and are not products of horizontal transfer from their host species. These repeats were differentially abundant in each species, with the LTR retrotransposons and satellite repeats most responsible for variation in genome size. Each species had some repetitive elements that were more abundant and some less abundant than the other species examined, indicating that no single element or any unilateral growth or decrease trend in genome behavior was responsible for variation in genome size and composition. Genome sizes were determined by flow sorting, and the values of 615 Mb [ (L.) Kuntze], 1330 Mb [ (Willd.) Vatke], 1425 Mb [ (Delile) Benth.] and 2460 Mb ( Benth.) suggest a ploidy series, a prediction supported by repetitive DNA sequence analysis. Phylogenetic analysis using six chloroplast loci indicated the ancestral relationships of the five most agriculturally important species, with the unexpected result that the one parasite of dicotyledonous plants ( was found to be more closely related to some of the grass parasites than many of the grass parasites are to each other.
Wang, Xin; Drillon, Guénola; Ryu, Taewoo; Voolstra, Christian R.; Aranda, Manuel
Scleractinian corals are the foundation species of the coral-reef ecosystem. Their calcium carbonate skeletons form extensive structures that are home to millions of species, making coral reefs one of the most diverse ecosystems of our planet. However, our understanding of how reef-building corals have evolved the ability to calcify and become the ecosystem builders they are today is hampered by uncertain relationships within their subclass Hexacorallia. Corallimorpharians have been proposed to originate from a complex scleractinian ancestor that lost the ability to calcify in response to increasing ocean acidification, suggesting the possibility for corals to lose and gain the ability to calcify in response to increasing ocean acidification. Here we employed a phylogenomic approach using whole-genome data from six hexacorallian species to resolve the evolutionary relationship between reef-building corals and their non-calcifying relatives. Phylogenetic analysis based on 1,421 single-copy orthologs, as well as gene presence/absence and synteny information, converged on the same topologies, showing strong support for scleractinian monophyly and a corallimorpharian sister clade. Our broad phylogenomic approach using sequence-based and sequence-independent analyses provides unambiguous evidence for the monophyly of scleractinian corals and the rejection of corallimorpharians as descendants of a complex coral ancestor.
Anna A Vanyushkina
We present a systematic study of three bacterial species that belong to the class Mollicutes, the smallest and simplest bacteria: Spiroplasma melliferum, Mycoplasma gallisepticum, and Acholeplasma laidlawii. To understand the difference in the basic principles of metabolism regulation and adaptation to environmental conditions in the three species, we analyzed the metabolome of these bacteria. Metabolic pathways were reconstructed using the proteogenomic annotation data provided by our lab. The results of metabolome, proteome and genome profiling suggest a fundamental difference in the adaptation of the three closely related Mollicute species to stress conditions. As transaldolase is not annotated in Mollicutes, we propose variants of the pentose phosphate pathway catalyzed by annotated enzymes for the three species. For metabolite detection we employed high performance liquid chromatography coupled with mass spectrometry. We used hydrophilic interaction chromatography with a silica column as the liquid chromatography method, as it effectively separates highly polar cellular metabolites prior to their detection by the mass spectrometer.
Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-uk; Luo, Shu-Jin; Johnson, Warren E.; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A.; Marker, Laurie; Harper, Cindy; Miller, Susan M.; Jacobs, Wilhelm; Bertola, Laura D.; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O’Brien, Stephen J.; Wang, Jun; Bhak, Jong
Tigers and their close relatives (Panthera) are some of the world’s most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats’ hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species. PMID:24045858
Kim, Yongkyu; Blasche, Sonja; Patil, Kiran R
We report here the genome sequences of strains of three novel bacterial species (Bacillus kefirresidentii Opo, Rothia kefirresidentii KRP, and Streptococcus kefirresidentii YK) isolated from kefir grains collected in Germany. The draft genomes of these isolates were remarkably dissimilar (average nucleotide identities, 77.80%, 89.01%, and 92.10%, respectively) to those of the previously sequenced strains. Copyright © 2017 Kim et al.
Background: Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and heavily relies on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results: Most of the software tools available are tailored for one specific task. In contrast, we have developed a novel system, CoCoNUT (Computational Comparative geNomics Utility Toolkit), that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion: CoCoNUT is competitive with other software tools w.r.t. the quality of the results. The use of state of the art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large scale studies in comparative genomics.
Su, Fei; Xu, Ping
Microbial strains with high substrate efficiency and excellent environmental tolerance are urgently needed for the production of platform bio-chemicals. Bacillus coagulans has these merits; however, little genetic information is available about this species. Here, we determined the genome sequences of five B. coagulans strains, and used a comparative genomic approach to reconstruct the central carbon metabolism of this species to explain their fermentation features. A novel xylose isomerase in the xylose utilization pathway was identified in these strains. Based on a genome-wide positive selection scan, the selection pressure on amino acid metabolism may have played a significant role in the thermal adaptation. We also researched the immune systems of B. coagulans strains, which provide them with acquired resistance to phages and mobile genetic elements. Our genomic analysis provides comprehensive insights into the genetic characteristics of B. coagulans and paves the way for improving and extending the uses of this species.
Cotton is one of the most important economic crops; it is the primary source of natural fiber and an important protein source for animal feed. The complete nuclear and chloroplast (cp) genome sequences of G. raimondii are already available, but the mitochondrial genome sequence is not. Here, we assembled the complete mitochondrial (mt) DNA sequence of G. raimondii into a circular genome 676,078 bp in length and performed comparative analyses with other higher plants. The genome contains 39 protein-coding genes, 6 rRNA genes, and 25 tRNA genes. We also identified four larger repeats (63.9 kb, 10.6 kb, 9.1 kb, and 2.5 kb) in this mt genome, which may be active in intramolecular recombination in the evolution of cotton. Strikingly, nearly all of the G. raimondii mt genome has been transferred to the nucleus on Chr1, and the transfer event must be very recent. Phylogenetic analysis reveals that G. raimondii, as a member of Malvaceae, is much closer to another cotton (G. barbadense) than to other rosids, and the clade formed by the two Gossypium species is sister to Brassicales. The G. raimondii mt genome may provide a crucial foundation for evolutionary analysis, molecular biology, and the study of cytoplasmic male sterility in cotton and other higher plants.
Inês C Conceição
BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite the more than 140 MY divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: (1) the Alcohol dehydrogenase expansion with higher similarity between the five tandemly-repeated B. anynana paralogs than with the corresponding B. mori orthologs, and (2) the high
Gil Ana I
Background: Vibrio parahaemolyticus is a common cause of foodborne disease. Beginning in 1996, a more virulent strain having serotype O3:K6 caused major outbreaks in India and other parts of the world, resulting in the emergence of a pandemic. Other serovariants of this strain emerged during its dissemination and together with the original O3:K6 were termed strains of the pandemic clone. Two genomes, one of this virulent strain and one of a pre-pandemic strain, have been sequenced. We sequenced four additional genomes of V. parahaemolyticus in this study that were isolated from different geographical regions and time points. Comparative genomic analyses of six strains of V. parahaemolyticus isolated from Asia and Peru were performed in order to advance knowledge concerning the evolution of V. parahaemolyticus; specifically, the genetic changes contributing to serotype conversion and virulence. Two pre-pandemic strains and three pandemic strains, isolated from different geographical regions, were serotype O3:K6 and had toxin profiles of either (tdh+, trh-) or (tdh-, trh+). The sixth pandemic strain sequenced in this study was serotype O4:K68. Results: Genomic analyses revealed that the trh+ and tdh+ strains had different types of pathogenicity islands and mobile elements as well as major structural differences between the tdh pathogenicity islands of the pre-pandemic and pandemic strains. In addition, the results of single nucleotide polymorphism (SNP) analysis showed that 94% of the SNPs between O3:K6 and O4:K68 pandemic isolates were within a 141 kb region surrounding the O- and K-antigen-encoding gene clusters. The "core" genes of V. parahaemolyticus were also compared to those of V. cholerae and V. vulnificus, in order to delineate differences between these three pathogenic species. Approximately one-half (49-59%) of each species' core genes were conserved in all three species, and 14-24% of the core genes were species-specific and in different
Shi, Jiaqin; Huang, Shunmou; Zhan, Jiepeng; Yu, Jingyin; Wang, Xinfa; Hua, Wei; Liu, Shengyi; Liu, Guihua; Wang, Hanzhong
Although much research has been conducted, the pattern of microsatellite distribution has remained ambiguous, and the development/utilization of microsatellite markers has still been limited/inefficient in Brassica, due to the lack of genome sequences. In view of this, we conducted genome-wide microsatellite characterization and marker development in three recently sequenced Brassica crops: Brassica rapa, Brassica oleracea and Brassica napus. The analysed microsatellite characteristics of these Brassica species were highly similar or almost identical, which suggests that the pattern of microsatellite distribution is likely conservative in Brassica. The genomic distribution of microsatellites was highly non-uniform and positively or negatively correlated with genes or transposable elements, respectively. Of the total of 115 869, 185 662 and 356 522 simple sequence repeat (SSR) markers developed with high frequencies (408.2, 343.8 and 356.2 per Mb or one every 2.45, 2.91 and 2.81 kb, respectively), most represented new SSR markers, the majority had determined physical positions, and a large number were genic or putative single-locus SSR markers. We also constructed a comprehensive database for the newly developed SSR markers, which was integrated with public Brassica SSR markers and annotated genome components. The genome-wide SSR markers developed in this study provide a useful tool to extend the annotated genome resources of sequenced Brassica species to genetic study/breeding in different Brassica species.
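The marker frequencies and the "one every N kb" spacings quoted in the abstract above are two views of the same quantity, which a short calculation can confirm. This is only an illustrative sketch: the function name `spacing_kb` is hypothetical and not part of the study, and it assumes the spacing is simply the reciprocal of the per-Mb density (1 Mb = 1000 kb).

```python
def spacing_kb(markers_per_mb: float) -> float:
    """Average distance between markers in kb, assuming 1 Mb = 1000 kb."""
    return 1000.0 / markers_per_mb


# SSR marker densities (per Mb) as quoted in the abstract, in the order given there.
densities = [408.2, 343.8, 356.2]

# Rounded to two decimals, to compare against the quoted spacings.
spacings = [round(spacing_kb(d), 2) for d in densities]
print(spacings)
```

Under that assumption the computed values agree with the 2.45, 2.91 and 2.81 kb spacings given in the abstract.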
Ogedengbe, Mosun E; El-Sherry, Shiem; Whale, Julia; Barta, John R
Clinical and subclinical coccidiosis is cosmopolitan and inflicts significant losses to the poultry industry globally. Seven named Eimeria species are responsible for coccidiosis in turkeys: Eimeria dispersa; Eimeria meleagrimitis; Eimeria gallopavonis; Eimeria meleagridis; Eimeria adenoeides; Eimeria innocua; and Eimeria subrotunda. Although attempts have been made to characterize these parasites molecularly at the nuclear 18S rDNA and ITS loci, the maternally-derived and mitotically replicating mitochondrial genome may be more suited for species level molecular work; however, only limited sequence data are available for Eimeria spp. infecting turkeys. The purpose of this study was to sequence and annotate the complete mitochondrial genomes from 5 Eimeria species that commonly infect the domestic turkey (Meleagris gallopavo). Six single-oocyst derived cultures of five Eimeria species infecting turkeys were PCR-amplified and sequenced completely prior to detailed annotation. Resulting sequences were aligned and used in phylogenetic analyses (BI, ML, and MP) that included complete mitochondrial genomes from 16 Eimeria species or concatenated CDS sequences from each genome. Complete mitochondrial genome sequences were obtained for Eimeria adenoeides Guelph, 6211 bp; Eimeria dispersa Briston, 6238 bp; Eimeria meleagridis USAR97-01, 6212 bp; Eimeria meleagrimitis USMN08-01, 6165 bp; Eimeria gallopavonis Weybridge, 6215 bp; and Eimeria gallopavonis USKS06-01, 6215 bp. The order, orientation and CDS lengths of the three protein coding genes (COI, COIII and CytB) as well as rDNA fragments encoding ribosomal large and small subunit rRNA were conserved among all sequences. Pairwise sequence identities between species ranged from 88.1% to 98.2%; sequence variability was concentrated within CDS or between rDNA fragments (where indels were common). No phylogenetic reconstruction supported monophyly of Eimeria species infecting turkeys; Eimeria dispersa may have arisen
Background: Comparative mapping is a powerful tool to transfer genomic information from sequenced genomes to closely related species for which whole genome sequence data are not yet available. However, such an approach is still very limited in catfish, the most important aquaculture species in the United States. This project was initiated to generate additional BAC end sequences and demonstrate their applications in comparative mapping in catfish. Results: We report the generation of 43,000 BAC end sequences and their applications for comparative genome analysis in catfish. Using these and the additional 20,000 existing BAC end sequences as a resource along with linkage mapping and the existing physical map, conserved syntenic regions were identified between the catfish and zebrafish genomes. A total of 10,943 catfish BAC end sequences (17.3%) had significant BLAST hits to the zebrafish genome (cutoff value ≤ e-5), of which 3,221 were unique gene hits, providing a platform for comparative mapping based on locations of these genes in catfish and zebrafish. Genetic linkage mapping of microsatellites associated with contigs allowed identification of large conserved genomic segments and construction of super scaffolds. Conclusion: BAC end sequences and their associated polymorphic markers are great resources for comparative genome analysis in catfish. Highly conserved chromosomal regions were identified between catfish and zebrafish. However, it appears that the level of conservation at local genomic regions is high, while a high level of chromosomal shuffling and rearrangement exists between the catfish and zebrafish genomes. Orthologous regions established through comparative analysis should facilitate both structural and functional genome analysis in catfish.
Leong-Škorničková, Jana; Šída, Otakar; Jarolímová, Vlasta; Sabu, Mamyil; Fér, Tomáš; Trávníček, Pavel; Suda, Jan
Background and Aims Genome size and chromosome numbers are important cytological characters that significantly influence various organismal traits. However, geographical representation of these data is seriously unbalanced, with tropical and subtropical regions being largely neglected. In the present study, an investigation was made of chromosomal and genome size variation in the majority of Curcuma species from the Indian subcontinent, and an assessment was made of the value of these data for taxonomic purposes. Methods Genome size of 161 homogeneously cultivated plant samples classified into 51 taxonomic entities was determined by propidium iodide flow cytometry. Chromosome numbers were counted in actively growing root tips using conventional rapid squash techniques. Key Results Six different chromosome counts (2n = 22, 42, 63, >70, 77 and 105) were found, the last two representing new generic records. The 2C-values varied from 1·66 pg in C. vamana to 4·76 pg in C. oligantha, representing a 2·87-fold range. Three groups of taxa with significantly different homoploid genome sizes (Cx-values) and distinct geographical distribution were identified. Five species exhibited intraspecific variation in nuclear DNA content, reaching up to 15·1 % in cultivated C. longa. Chromosome counts and genome sizes of three Curcuma-like species (Hitchenia caulina, Kaempferia scaposa and Paracautleya bhatii) corresponded well with typical hexaploid (2n = 6x = 42) Curcuma spp. Conclusions The basic chromosome number in the majority of Indian taxa (belonging to subgenus Curcuma) is x = 7; published counts correspond to 6x, 9x, 11x, 12x and 15x ploidy levels. Only a few species-specific C-values were found, but karyological and/or flow cytometric data may support taxonomic decisions in some species alliances with morphological similarities. Close evolutionary relationships among some cytotypes are suggested based on the similarity in homoploid genome sizes and geographical grouping
Tang, Chaorong; Yang, Meng; Fang, Yongjun; Luo, Yingfeng; Gao, Shenghan; Xiao, Xiaohu; An, Zewei; Zhou, Binhui; Zhang, Bing; Tan, Xinyu; Yeang, Hoong Yeet; Qin, Yunxia; Yang, Jianghua; Lin, Qiang; Mei, Hailiang
The Para rubber tree (Hevea brasiliensis) is an economically important tropical tree species that produces natural rubber, an essential industrial raw material. Here we present a high-quality genome assembly of this species (1.37 Gb, scaffold N50 = 1.28 Mb) that covers 93.8% of the genome (1.47 Gb) and harbours 43,792 predicted protein-coding genes. A striking expansion of the REF/SRPP (rubber elongation factor/small rubber particle protein) gene family and its divergence into several laticif...
Wiens, Julia; Ho, Ryan; Fernando, Dinesh; Kumar, Ayush; Loewen, Peter C; Brassinga, Ann Karen C; Anderson, W Gary
We report here a genome sequence for Rhodococcus sp. isolate UM008 isolated from the renal/interrenal tissue of the winter skate Leucoraja ocellata. Genome sequence analysis suggests that Rhodococcus bacteria may act in a novel mutualistic relationship with their elasmobranch host, serving as biocatalysts in the steroidogenic pathway of 1α-hydroxycorticosterone. Copyright © 2016 Wiens et al.
Diouf, Fatou; Diouf, Diegane; Klonowska, Agnieszka; Le Queré, Antoine; Bakhoum, Niokhor; Fall, Dioumacor; Neyra, Marc; Parrinello, Hugues; Diouf, Mayecor; Ndoye, Ibrahima; Moulin, Lionel
Acacia senegal (L.) Willd. and Acacia seyal Del. are highly nitrogen-fixing and moderately salt tolerant species. In this study we focused on the genetic and genomic diversity of Acacia mesorhizobia symbionts from diverse origins in Senegal and investigated possible correlations between the genetic diversity of the strains, their soil of origin, and their tolerance to salinity. We first performed a multi-locus sequence analysis (MLSA) on five marker gene fragments on a collection of 47 mesorhizobia strains of A. senegal and A. seyal from 8 localities. Most of the strains (60%) clustered with the M. plurifarium type strain ORS 1032T, while the others formed four new clades (MSP1 to MSP4). We sequenced and assembled seven draft genomes: four in the M. plurifarium clade (ORS3356, ORS3365, STM8773 and ORS1032T), and one each in MSP1 (STM8789), MSP2 (ORS3359) and MSP3 (ORS3324). The average nucleotide identities between these genomes together with the MLSA analysis reveal three new species of Mesorhizobium. A great variability of salt tolerance was found among the strains, with a lack of correlation between the genetic diversity of mesorhizobia, their salt tolerance and the soil sample characteristics. A putative geographical pattern of A. senegal symbionts between the dryland north part and the center of Senegal was found, reflecting adaptations to specific local conditions such as the water regime. However, the presence of salt does not seem to be an important structuring factor of Mesorhizobium species.
Development of novel simple sequence repeat markers in bitter gourd (Momordica charantia L.) through enriched genomic libraries and their utilization in analysis of genetic diversity and cross-species transferability.
Saxena, Swati; Singh, Archana; Archak, Sunil; Behera, Tushar K; John, Joseph K; Meshram, Sudhir U; Gaikwad, Ambika B
Microsatellite or simple sequence repeat (SSR) markers are the preferred markers for genetic analyses of crop plants. The availability of a limited number of such markers in bitter gourd (Momordica charantia L.) necessitates the development and characterization of more SSR markers. These were developed from genomic libraries enriched for three dinucleotide, five trinucleotide, and two tetranucleotide core repeat motifs. Employing the strategy of polymerase chain reaction-based screening, the number of clones to be sequenced was reduced by 81 %, and 93.7 % of the sequenced clones contained microsatellite repeats. Unique primer-pairs were designed for 160 microsatellite loci, and amplicons of expected length were obtained for 151 loci (94.4 %). Evaluation of diversity in 54 bitter gourd accessions at 51 loci indicated that 20 % of the loci were polymorphic, with the polymorphic information content values ranging from 0.13 to 0.77. Fifteen Indian varieties were clearly distinguished, indicating the usefulness of the developed markers. Markers at 40 loci (78.4 %) were transferable to six species, viz. Momordica cymbalaria, Momordica subangulata subsp. renigera, Momordica balsamina, Momordica dioica, Momordica cochinchinensis, and Momordica sahyadrica. The microsatellite markers reported will be useful in various genetic and molecular genetic studies in bitter gourd, a cucurbit of immense nutritive, medicinal, and economic importance.
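Polymorphic information content (PIC) values such as those quoted in the abstract above are conventionally computed per locus from allele frequencies. The sketch below uses the widely cited Botstein et al. form, PIC = 1 - Σ p_i² - Σ_{i<j} 2 p_i² p_j². That the study used exactly this form is an assumption, and the function name `pic` and the example frequencies are hypothetical, chosen only for illustration.

```python
import math
from typing import Sequence


def pic(freqs: Sequence[float]) -> float:
    """Polymorphic information content for one locus (Botstein et al. form).

    freqs: allele frequencies at the locus; they must sum to 1.
    """
    if not math.isclose(sum(freqs), 1.0, abs_tol=1e-9):
        raise ValueError("allele frequencies must sum to 1")
    # Expected homozygosity term: sum of squared frequencies.
    homozygosity = sum(p * p for p in freqs)
    # Correction for uninformative matings: sum over allele pairs i < j.
    cross = sum(
        2 * freqs[i] ** 2 * freqs[j] ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return 1.0 - homozygosity - cross


# Hypothetical example loci (not data from the study).
two_equal_alleles = pic([0.5, 0.5])   # 1 - 0.5 - 0.125
monomorphic = pic([1.0])              # a single allele carries no information
```

A locus with two equally frequent alleles gives 0.375 under this formula, and a monomorphic locus gives 0, so values spread between roughly 0.1 and 0.8 correspond to loci ranging from nearly uninformative to highly multi-allelic.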
Numerous hybrid and polyploid species are found within the Triticeae. It has been suggested that the H subgenome of allopolyploid Elymus (wheatgrass) species originated from diploid Hordeum (barley) species, but the role of hybridization between polyploid Elymus and Hordeum has not been studied. It is not clear whether gene flow across polyploid Hordeum and Elymus species has occurred following polyploid speciation. Answering these questions will provide new insights into the formation of these polyploid species, and the potential role of gene flow among polyploid species during polyploid evolution. In order to address these questions, disrupted meiotic cDNA1 (DMC1) data from the allopolyploid StH Elymus are analyzed together with diploid and polyploid Hordeum species. Phylogenetic analysis revealed that the H copies of the DMC1 sequence in some Elymus are very close to the H copies of the DMC1 sequence in some polyploid Hordeum species, indicating either that the H genome in these Elymus and polyploid Hordeum species originated from the same diploid donor or that gene flow has occurred among them. Our analysis also suggested that the H genomes in Elymus species originated from a limited gene pool, while the H genomes in Hordeum polyploids originated from broad gene pools. Nucleotide diversity (π) of the DMC1 sequences on the H genome from polyploid species (π = 0.02083 in Elymus, π = 0.01680 in polyploid Hordeum) is higher than that in diploid Hordeum (π = 0.01488). The estimates of Tajima's D departed significantly from the equilibrium neutral model at this locus in diploid Hordeum species (P<0.05), suggesting an excess of rare variants in diploid species which may not contribute to the origination of polyploids. Nucleotide diversity (π) of the DMC1 sequences in Elymus polyploid species (π = 0.02083) is higher than that in polyploid Hordeum (π = 0.01680), suggesting that the degree of relationships between two parents of a polyploid might be a factor
Yang, J; Liu, G; Zhao, N; Chen, S; Liu, D; Ma, W; Hu, Z; Zhang, M
The genus Brassica has many species that are important for oil, vegetable and other food products. Three mitochondrial genome types (mitotypes) originated from its common ancestor. In this paper, a B. nigra mitochondrial main circle genome with 232,407 bp was generated through de novo assembly. Synteny analysis showed that the mitochondrial genomes of B. rapa and B. oleracea had a better syntenic relationship with each other than with B. nigra. Principal components analysis and development of a phylogenetic tree indicated the maternal ancestors of the three allotetraploid species in U's triangle of Brassica. Diversified mitotypes were found in allotetraploid B. napus, in which napus-type B. napus was derived from B. oleracea, while polima-type B. napus was inherited from B. rapa. In addition, the mitochondrial genome of napus-type B. napus was closer to botrytis-type than capitata-type B. oleracea. The sub-stoichiometric shifting of several mitochondrial genes suggested that mitochondrial genome rearrangement underwent evolutionary selection during domestication and/or plant breeding. Our findings clarify the role of diploid species in the maternal origin of allotetraploid species in Brassica and suggest the possibility of breeding selection of the mitochondrial genome. © 2015 German Botanical Society and The Royal Botanical Society of the Netherlands.
Sex chromosomes turn over rapidly in some taxonomic groups, where closely related species have different sex chromosomes. Although there are many examples of sex chromosome turnover, we know little about the functional roles of sex chromosome turnover in phenotypic diversification and genomic evolution. The sympatric pair of Japanese threespine stickleback (Gasterosteus aculeatus) provides an excellent system to address these questions: the Japan Sea species has a neo-sex chromosome system resulting from a fusion between an ancestral Y chromosome and an autosome, while the sympatric Pacific Ocean species has a simple XY sex chromosome system. Furthermore, previous quantitative trait locus (QTL) mapping demonstrated that the Japan Sea neo-X chromosome contributes to phenotypic divergence and reproductive isolation between these sympatric species. To investigate the genomic basis for the accumulation of genes important for speciation on the neo-X chromosome, we conducted whole genome sequencing of males and females of both the Japan Sea and the Pacific Ocean species. No substantial degeneration has yet occurred on the neo-Y chromosome, but the nucleotide sequence of the neo-X and the neo-Y has started to diverge, particularly at regions near the fusion. The neo-sex chromosomes also harbor an excess of genes with sex-biased expression. Furthermore, genes on the neo-X chromosome showed higher non-synonymous substitution rates than autosomal genes in the Japan Sea lineage. Genomic regions of higher sequence divergence between species, genes with divergent expression between species, and QTL for inter-species phenotypic differences were found not only at the regions near the fusion site, but also at other regions along the neo-X chromosome. Neo-sex chromosomes can therefore accumulate substitutions causing species differences even in the absence of substantial neo-Y degeneration.
Yi, Jin Wook; Park, Ji Yeon; Sung, Ji-Youn; Kwak, Sang Hyuk; Yu, Jihan; Chang, Ji Hyun; Kim, Jo-Heon; Ha, Sang Yun; Paik, Eun Kyung; Lee, Woo Seung; Kim, Su-Jin; Lee, Kyu Eun; Kim, Ju Han
Elevated levels of reactive oxygen species (ROS) have been proposed as a risk factor for the development of papillary thyroid carcinoma (PTC) in patients with Hashimoto thyroiditis (HT). However, it has yet to be proven that the total levels of ROS are sufficiently increased to contribute to carcinogenesis. We hypothesized that if the ROS levels were increased in HT, ROS-related genes would also be differently expressed in PTC with HT. To find differentially expressed genes (DEGs), we analyzed RNA sequencing gene expression data from The Cancer Genome Atlas: 33 from normal thyroid tissue, 232 from PTC without HT, and 60 from PTC with HT. We prepared 402 ROS-related genes from three gene sets by genomic database searching. We also analyzed a public microarray dataset to validate our results. Thirty-three ROS-related genes were up-regulated in PTC with HT, whereas only nine genes were up-regulated in PTC without HT (Chi-square p-value < 0.001). The mean log2 fold change of up-regulated genes was 0.562 in the HT group and 0.252 in the PTC without HT group (t-test p-value = 0.001). In microarray data analysis, 12 of 32 ROS-related genes showed the same differential expression pattern with statistical significance. In gene ontology analysis, up-regulated ROS-related genes were related to ROS metabolism and apoptosis. Immune function-related and carcinogenesis-related gene sets were enriched only in the HT group in Gene Set Enrichment Analysis. Our results suggested that ROS levels may be increased in PTC with HT. Increased levels of ROS may contribute to PTC development in patients with HT.
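The chi-square comparison of up-regulated gene counts can be sketched as follows. The abstract does not say how its contingency table was built, so this sketch assumes both counts (33 and 9) are out of the same 402 ROS-related genes and uses the Pearson statistic without continuity correction; the function name is hypothetical.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Assumption: each group is scored against the same 402 ROS-related genes.
TOTAL_ROS_GENES = 402
up_with_ht = 33      # up-regulated in PTC with HT (from the abstract)
up_without_ht = 9    # up-regulated in PTC without HT (from the abstract)

stat = chi_square_2x2(
    up_with_ht, TOTAL_ROS_GENES - up_with_ht,
    up_without_ht, TOTAL_ROS_GENES - up_without_ht,
)

# Critical value of the chi-square distribution with 1 degree of freedom at p = 0.001.
CRITICAL_P001_DF1 = 10.828
significant = stat > CRITICAL_P001_DF1
```

Under that assumption the statistic is roughly 14.5, above the 0.001 critical value, which is consistent with the reported p-value < 0.001.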
Penicillium capsulatum is a rare Penicillium species used in paper manufacturing, but recently it has been reported to cause invasive infection. To research the pathogenicity of the clinical Penicillium strain, we sequenced the genomes and transcriptomes of the clinical and environmental strains of P. capsulatum. Comparative analyses of these two P. capsulatum strains and closely related strains belonging to Eurotiales were performed. The assembled genomes of P. capsulatum are approximately 34.4 Mbp in length and encode 11,080 predicted genes. The different isolates of P. capsulatum are highly similar, with the exception of several unique genes, indels or SNPs in the genes coding for glycosyl hydrolases, amino acid transporters and circumsporozoite protein. A phylogenomic analysis was performed based on the whole genome data of 38 strains belonging to Eurotiales. By comparing the whole genome sequences and the virulence-related genes from 20 important related species, including fungal pathogens and non-human pathogens belonging to Eurotiales, we found meaningful pathogenicity characteristics between P. capsulatum and its closely related species. Our research indicated that P. capsulatum may be a neglected opportunistic pathogen. This study is beneficial for mycologists, geneticists and epidemiologists to achieve a deeper understanding of the genetic basis of the role of P. capsulatum as a newly reported fungal pathogen.
Schoville, Sean D; Chen, Yolanda H; Andersson, Martin N; Benoit, Joshua B; Bhandari, Anita; Bowsher, Julia H; Brevik, Kristian; Cappelle, Kaat; Chen, Mei-Ju M; Childers, Anna K; Childers, Christopher; Christiaens, Olivier; Clements, Justin; Didion, Elise M; Elpidina, Elena N; Engsontia, Patamarerk; Friedrich, Markus; García-Robles, Inmaculada; Gibbs, Richard A; Goswami, Chandan; Grapputo, Alessandro; Gruden, Kristina; Grynberg, Marcin; Henrissat, Bernard; Jennings, Emily C; Jones, Jeffery W; Kalsi, Megha; Khan, Sher A; Kumar, Abhishek; Li, Fei; Lombard, Vincent; Ma, Xingzhou; Martynov, Alexander; Miller, Nicholas J; Mitchell, Robert F; Munoz-Torres, Monica; Muszewska, Anna; Oppert, Brenda; Palli, Subba Reddy; Panfilio, Kristen A; Pauchet, Yannick; Perkin, Lindsey C; Petek, Marko; Poelchau, Monica F; Record, Éric; Rinehart, Joseph P; Robertson, Hugh M; Rosendale, Andrew J; Ruiz-Arroyo, Victor M; Smagghe, Guy; Szendrei, Zsofia; Thomas, Gregg W C; Torson, Alex S; Vargas Jentzsch, Iris M; Weirauch, Matthew T; Yates, Ashley D; Yocum, George D; Yoon, June-Sun; Richards, Stephen
The Colorado potato beetle is one of the most challenging agricultural pests to manage. It has shown a spectacular ability to adapt to a variety of solanaceous plants and variable climates during its global invasion, and, notably, to rapidly evolve insecticide resistance. To examine evidence of rapid evolutionary change, and to understand the genetic basis of herbivory and insecticide resistance, we tested for structural and functional genomic changes relative to other arthropod species using genome sequencing, transcriptomics, and community annotation. Two factors that might facilitate rapid evolutionary change include transposable elements, which comprise at least 17% of the genome and are rapidly evolving compared to other Coleoptera, and high levels of nucleotide diversity in rapidly growing pest populations. Adaptations to plant feeding are evident in gene expansions and differential expression of digestive enzymes in gut tissues, as well as expansions of gustatory receptors for bitter tasting. Surprisingly, the suite of genes involved in insecticide resistance is similar to other beetles. Finally, duplications in the RNAi pathway might explain why Leptinotarsa decemlineata has high sensitivity to dsRNA. The L. decemlineata genome provides opportunities to investigate a broad range of phenotypes and to develop sustainable methods to control this widely successful pest.
Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.
The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp). PMID:22194640
Ahrens, Collin W; Supple, Megan A; Aitken, Nicola C; Cantrill, David J; Borevitz, Justin O; James, Elizabeth A
Species are often used as the unit for conservation, but may not be suitable for species complexes where taxa are difficult to distinguish. Under such circumstances, it may be more appropriate to consider species groups or populations as evolutionarily significant units (ESUs). A population genomic approach was employed to investigate the diversity within and among closely related species to create a more robust, lineage-specific conservation strategy for a nationally endangered terrestrial orchid and its relatives from south-eastern Australia. Four putative species were sampled from a total of 16 populations in the Victorian Volcanic Plain (VVP) bioregion and one population of a sub-alpine outgroup in south-eastern Australia. Morphological measurements were taken in situ along with leaf material for genotyping by sequencing (GBS) and microsatellite analyses. Species could not be differentiated using morphological measurements. Microsatellite and GBS markers confirmed the outgroup as distinct, but only GBS markers provided resolution of population genetic structure. The nationally endangered Diuris basaltica was indistinguishable from two related species (D. chryseopsis and D. behrii), while the state-protected D. gregaria showed genomic differentiation. Genomic diversity identified among the four Diuris species suggests that conservation of this taxonomically complex group will be best served by considering them as one ESU rather than separately aligned with species as currently recognized. This approach will maximize evolutionary potential among all species during increased isolation and environmental change. The methods used here can be applied generally to conserve evolutionary processes for groups where taxonomic uncertainty hinders the use of species as conservation units. © The Author 2017. Published by Oxford University Press on behalf of the Annals of Botany Company.
Khan, Suliman; Nabi, Ghulam; Ullah, Muhammad Wajid; Yousaf, Muhammad; Manan, Sehrish; Siddique, Rabeea; Hou, Hongwei
In the recent era, due to tremendous advancement in industrialization, pollution and other anthropogenic activities have created a serious scenario for biota survival. It has been reported that present biota is entering a "sixth" mass extinction, because of chronic exposure to anthropogenic activities. Various ex situ and in situ measures have been adopted for conservation of threatened and endangered plant and animal species; however, these have been limited due to various discrepancies associated with them. Current advancement in molecular technologies, especially genomics, is playing a very crucial role in biodiversity conservation. Advanced genomics helps in identifying the segments of the genome responsible for adaptation. It can also improve our understanding of microevolution through a better understanding of selection, mutation, assortative mating, and recombination. Advanced genomics helps in identifying genes that are essential for fitness and ultimately in developing modern and fast monitoring tools for endangered biodiversity. This review article focuses on the applications of advanced genomics, mainly demographic, adaptive genetic variations, inbreeding, hybridization and introgression, and disease susceptibilities, in the conservation of threatened biota. In short, it provides the fundamentals for novice readers and advancement in genomics for the experts working for the conservation of endangered plant and animal species.
Watanabe, Takayasu; Shibasaki, Masaki; Maruyama, Fumito; Sekizaki, Tsutomu; Nakagawa, Ichiro
The oral bacterial species Porphyromonas gingivalis, a periodontal pathogen, has plastic genomes that may be driven by homologous recombination with exogenous deoxyribonucleic acid (DNA) that is incorporated by natural transformation and conjugation. However, bacteriophages and plasmids, both of which are main resources of exogenous DNA, do not exist in the known P. gingivalis genomes. This could be associated with an adaptive immunity system conferred by clustered regularly interspaced short palindromic repeat (CRISPR) and CRISPR-associated (cas) genes in P. gingivalis as well as innate immune systems such as a restriction-modification system. In a previous study, few immune targets were predicted for P. gingivalis CRISPR/Cas. In this paper, we analyzed 51 P. gingivalis genomes, which were newly sequenced, and publicly available genomes of 13 P. gingivalis and 46 other Porphyromonas species. We detected 6 CRISPR/Cas types (classified by sequence similarity of repeat) in P. gingivalis and 12 other types in the remaining species. The Porphyromonas CRISPR spacers with potential targets in the genus Porphyromonas were approximately 23 times more abundant than those with potential targets in other genus taxa (1,720/6,896 spacers vs. 74/6,896 spacers). Porphyromonas CRISPR/Cas may be involved in genome plasticity by exhibiting selective interference against intra- and interspecies nucleic acids.
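The "approximately 23 times" figure follows directly from the spacer counts quoted in the abstract. A small worked check, with variable names chosen here for illustration:

```python
# Spacer counts as quoted in the abstract.
TOTAL_SPACERS = 6896
within_genus_targets = 1720   # spacers with potential targets in Porphyromonas
other_genera_targets = 74     # spacers with potential targets in other genera

# Share of all spacers in each class, and the ratio between the two classes.
within_fraction = within_genus_targets / TOTAL_SPACERS
other_fraction = other_genera_targets / TOTAL_SPACERS
fold_difference = within_genus_targets / other_genera_targets
```

Because both counts share the same denominator, the fold difference depends only on the two numerators.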
Giardia intestinalis is a major cause of diarrheal disease worldwide and two major Giardia genotypes, assemblages A and B, infect humans. The genome of assemblage A parasite WB was recently sequenced, and the structurally compact 11.7 Mbp genome contains simplified basic cellular machineries and metabolism. We here performed 454 sequencing to 16x coverage of the assemblage B isolate GS, the only Giardia isolate successfully used to experimentally infect animals and humans. The two genomes show 77% nucleotide and 78% amino-acid identity in protein coding regions. Comparative analysis identified 28 unique GS and 3 unique WB protein coding genes, and the variable surface protein (VSP) repertoires of the two isolates are completely different. The promoters of several enzymes involved in the synthesis of the cyst-wall lack binding sites for encystation-specific transcription factors in GS. Several synteny-breaks were detected and verified. The tetraploid GS genome shows higher levels of overall allelic sequence polymorphism (0.5 versus <0.01% in WB). The genomic differences between WB and GS may explain some of the observed biological and clinical differences between the two isolates, and suggest that assemblage A and B Giardia can be two different species.
Introduction: It is rather difficult to delimit recently diverged species and construct their interspecific relationships because of insufficient informative variations of sampled DNA fragments (Schluter, 2000; Arnold, 2006). The genome-scale sequence variations were found to increase the phylogenetic resolutions of both high- and low-taxonomic groups (e.g., Yoder et al., 2013; Lamichhaney et al., 2015). It is still expensive to collect nuclear genome variations between species for most non-model genera without a reference genome. However, chloroplast genomes (plastomes) are relatively easy to assemble for examining interspecific relationships in phylogenetic analyses, especially in addressing unresolved relationships at low taxonomic levels (Wu et al., 2010; Nock et al., 2011; Yang et al., 2013; Huang et al., 2014; Carbonell-Caballero et al., 2015). Plastomes are haploid with maternal inheritance in most angiosperms (Corriveau and Coleman, 1988; Zhang and Liu, 2003; Hagemann, 2004) and are highly conserved in gene order and genome structure with rare recombination (Jansen et al., 2007; Moore et al., 2010). In this study, we aimed to examine species delimitation and interspecific relationships in Orychophragmus through assembling chloroplast genomes of multiple individuals of tentatively delimited species (Hu et al., 2015a). Orychophragmus is a small genus in the mustard family (Brassicaceae, Cruciferae) distributed in northern, central, and southeastern China (Zhou et al., 2001). Its plants have been widely cultivated as ornamentals, vegetables, or a source of seed oil (Sun et al., 2011). Despite controversial species delimitations in the genus (Zhou et al., 1987; Tan et al., 1998; Wu and Zhao, 2003; Al-Shehbaz and Yang, 2000; Zhou et al., 2001; Sun et al., 2012), our recent study based on nuclear (nr) ITS sequence variations suggested the recognition of seven species (Hu et al., 2015a). Orychophragmus is sister to Sinalliaria, which is a genus endemic
Background: Studying mitochondrial (mt) genomics has important implications for various fundamental areas, including mt biochemistry, physiology and molecular biology. In addition, mt genome sequences have provided useful markers for investigating population genetic structures, systematics and phylogenetics of organisms. Toxocara canis, Toxocara cati and Toxocara malaysiensis cause significant health problems in animals and humans. Although they are of importance in human and animal health, no information on the mt genomes for any of the Toxocara species is available. Results: The sizes of the entire mt genome are 14,322 bp for T. canis, 14,029 bp for T. cati and 14,266 bp for T. malaysiensis, respectively. These circular genomes are amongst the largest reported to date for all secernentean nematodes. Their relatively large sizes relate mainly to an increased length in the AT-rich region. The mt genomes of the three Toxocara species all encode 12 proteins, two ribosomal RNAs and 22 transfer RNA genes, but lack the ATP synthetase subunit 8 gene, which is consistent with all other nematode species studied to date, with the exception of Trichinella spiralis. All genes are transcribed in the same direction and have a nucleotide composition high in A and T, but low in G and C. The contents of A+T of the complete genomes are 68.57% for T. canis, 69.95% for T. cati and 68.86% for T. malaysiensis, among which the A+T for T. canis is the lowest among all nematodes studied to date. The AT bias had a significant effect on both the codon usage pattern and amino acid composition of proteins. The mt genome structures for the three Toxocara species, including genes and non-coding regions, are in the same order as for Ascaris suum and Anisakis simplex, but differ from Ancylostoma duodenale, Necator americanus and Caenorhabditis elegans only in the location of the AT-rich region, whereas there are substantial differences when compared with Onchocerca volvulus
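The A+T percentages quoted above are simple base-composition statistics. A minimal sketch of that calculation, assuming a plain nucleotide string in which ambiguous bases such as N are excluded from the denominator; the function name and the example strings are hypothetical and are not Toxocara sequence data:

```python
def at_content(sequence):
    """Percentage of A and T among unambiguous bases (A, C, G, T) in a sequence."""
    bases = [base for base in sequence.upper() if base in "ACGT"]
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    at_count = sum(base in "AT" for base in bases)
    return 100.0 * at_count / len(bases)


# Illustrative strings only.
example = at_content("AATTGC")        # 4 of 6 bases are A or T
with_ambiguity = at_content("aatn")   # the N is ignored
```

Whether ambiguous bases are excluded or counted in the denominator is a design choice; for complete, finished mt genomes the two conventions give the same value.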
Since the human genome draft sequence was first made public in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples of large-scale studies of human
Kim, Chang-Kug; Seol, Young-Joo; Perumal, Sampath; Lee, Jonghoon; Waminal, Nomar Espinosa; Jayakodi, Murukarthick; Lee, Sang-Choon; Jin, Seungwoo; Choi, Beom-Soon; Yu, Yeisoo; Ko, Ho-Cheol; Choi, Ji-Weon; Ryu, Kyoung-Yul; Sohn, Seong-Han; Parkin, Isobel; Yang, Tae-Jin
The concept of U's triangle, which revealed the importance of polyploidization in plant genome evolution, described natural allopolyploidization events in Brassica using three diploids [B. rapa (A genome), B. nigra (B), and B. oleracea (C)] and derived allotetraploids [B. juncea (AB genome), B. napus (AC), and B. carinata (BC)]. However, comprehensive understanding of Brassica genome evolution has not been fully achieved. Here, we performed low-coverage (2-6×) whole-genome sequencing of 28 accessions of Brassica as well as of Raphanus sativus [R genome] to explore the evolution of six Brassica species based on chloroplast genome and ribosomal DNA variations. Our phylogenomic analyses led to two main conclusions. (1) Intra-species-level chloroplast genome variations are low in the three allotetraploids (2~7 SNPs), but rich and variable in each diploid species (7~193 SNPs). (2) Three allotetraploids maintain two 45S nrDNA types derived from both ancestral species with maternal dominance. Furthermore, this study sheds light on the maternal origin of the AC chloroplast genome. Overall, this study clarifies the genetic relationships of U's triangle species based on a comprehensive genomics approach and provides important genomic resources for correlative and evolutionary studies.
Supple, Megan Ann; Bragg, Jason G; Broadhurst, Linda M; Nicotra, Adrienne B; Byrne, Margaret; Andrew, Rose L; Widdup, Abigail; Aitken, Nicola C; Borevitz, Justin O
As species face rapid environmental change, we can build resilient populations through restoration projects that incorporate predicted future climates into seed sourcing decisions. Eucalyptus melliodora is a foundation species of a critically endangered community in Australia that is a target for restoration. We examined genomic and phenotypic variation to make empirically based recommendations for seed sourcing. We examined isolation by distance and isolation by environment, determining high levels of gene flow extending for 500 km and correlations with climate and soil variables. Growth experiments revealed extensive phenotypic variation both within and among sampling sites, but no site-specific differentiation in phenotypic plasticity. Model predictions suggest that seed can be sourced broadly across the landscape, providing ample diversity for adaptation to environmental change. Application of our landscape genomic model to E. melliodora restoration projects can identify genomic variation suitable for predicted future climates, thereby increasing the long-term probability of successful restoration. © 2018, Supple et al.
Background: Studies of the Xenopus organizer have laid the foundation for our understanding of the conserved signaling pathways that pattern vertebrate embryos during gastrulation. The two primary activities of the organizer, BMP and Wnt inhibition, can regulate a spectrum of genes that pattern essentially all aspects of the embryo during gastrulation. As our knowledge of organizer signaling grows, it is imperative that we begin knitting together our gene-level knowledge into genome-level signaling models. The goal of this paper was to identify complete lists of genes regulated by different aspects of organizer signaling, thereby providing a deeper understanding of the genomic mechanisms that underlie these complex and fundamental signaling events. Results: To this end, we ectopically overexpress Noggin and Dkk-1, inhibitors of the BMP and Wnt pathways, respectively, within ventral tissues. After isolating embryonic ventral halves at early and late gastrulation, we analyze the transcriptional response to these molecules within the generated ectopic organizers using oligonucleotide microarrays. An efficient statistical analysis scheme, combined with a new Gene Ontology biological process annotation of the Xenopus genome, allows reliable and faithful clustering of molecules based upon their roles during gastrulation. From these data, we identify new organizer-related expression patterns for 19 genes. Moreover, our data sub-divide organizer genes into separate head and trunk organizing groups, which each show distinct responses to Noggin and Dkk-1 activity during gastrulation. Conclusion: Our data provide a genomic view of the cohorts of genes that respond to Noggin and Dkk-1 activity, allowing us to separate the role of each in organizer function. These patterns demonstrate a model where BMP inhibition plays a largely inductive role during early developmental stages, thereby initiating the suites of genes needed to pattern dorsal tissues
Piffanelli, P.; Ciampi, A.Y.; Silva, F.R.; Santos, C.R.; Dhont, A.; Vilarinhos, A.; Pappas, G.; Souza, M.T.; Milller, R.N.G.
Musa species (Zingiberaceae, Zingiberales) including bananas and plantains are collectively the fourth most important crop in developing countries. Knowledge concerning Musa genome structure and the origin of distinct cultivars has greatly increased over the last few years. Until now, however, no
Kaminuma, Eli; Fujisawa, Takatomo; Tanizawa, Yasuhiro; Sakamoto, Naoko; Kurata, Nori; Shimizu, Tokurou; Nakamura, Yasukazu
H2DB (http://tga.nig.ac.jp/h2db/), an annotation database of genetic heritability estimates for humans and other species, has been developed as a knowledge database to connect trait-associated genomic loci. Heritability estimates have been investigated for individual species, particularly in human twin studies and plant/animal breeding studies. However, there appears to be no comprehensive heritability database for both humans and other species. Here, we introduce an annotation database for genetic heritabilities of various species that was annotated by manually curating online public resources in PUBMED abstracts and journal contents. The proposed heritability database contains attribute information for trait descriptions, experimental conditions, trait-associated genomic loci and broad- and narrow-sense heritability specifications. Annotated trait-associated genomic loci, for which most are single-nucleotide polymorphisms derived from genome-wide association studies, may be valuable resources for experimental scientists. In addition, we assigned phenotype ontologies to the annotated traits for the purposes of discussing heritability distributions based on phenotypic classifications.
Chen, Hao; Wang, Xiangfeng
In plants and animals, chromosomal breakage and fusion events based on conserved syntenic genomic blocks lead to conserved patterns of karyotype evolution among species of the same family. However, karyotype information has not been well utilized in genomic comparison studies. We present CrusView, a Java-based bioinformatic application utilizing Standard Widget Toolkit/Swing graphics libraries and a SQLite database for performing visualized analyses of comparative genomics data in Brassicaceae (crucifer) plants. Compared with similar software and databases, one of the unique features of CrusView is its integration of karyotype information when comparing two genomes. This feature allows users to perform karyotype-based genome assembly and karyotype-assisted genome synteny analyses with preset karyotype patterns of the Brassicaceae genomes. Additionally, CrusView is a local program, which gives its users high flexibility when analyzing unpublished genomes and allows users to upload self-defined genomic information so that they can visually study the associations between genome structural variations and genetic elements, including chromosomal rearrangements, genomic macrosynteny, gene families, high-frequency recombination sites, and tandem and segmental duplications between related species. This tool will greatly facilitate karyotype, chromosome, and genome evolution studies using visualized comparative genomics approaches in Brassicaceae species. CrusView is freely available at http://www.cmbb.arizona.edu/CrusView/. PMID:23898041
Valles, Iliana Espinoza; Vora, Gary J; Lin, Baochuan
Vibrio harveyi CAIM 1792 is a marine bacterial strain that causes mortality in farmed shrimp in north-west Mexico, and the identification of virulence genes in this strain is important for understanding its pathogenicity. The aim of this work was to compare the V. harveyi CAIM 1792 genome....... The proteome of CAIM 1792 had higher similarity to those of other V. harveyi strains (78 %) than to those of the other closely related species Vibrio owensii (67 %), Vibrio rotiferianus (63 %) and Vibrio campbellii (59 %). Pan-genome ORFans trees showed the best fit with the accepted phylogeny based on DNA......-DNA hybridization and multi-locus sequence analysis of 11 concatenated housekeeping genes. SNP analysis clustered 34/38 genomes within their accepted species. The pangenomic and SNP trees showed that V. harveyi is the most conserved of the four species studied and V. campbellii may be divided into at least three...
Kües, Ursula; Nelson, David R; Liu, Chang; Yu, Guo-Jun; Zhang, Jianhui; Li, Jianqin; Wang, Xin-Cun; Sun, Hui
Ganoderma is a fungal genus belonging to the Ganodermataceae family and Polyporales order. Plant-pathogenic species in this genus can cause severe diseases (stem, butt, and root rot) in economically important trees and perennial crops, especially in tropical countries. Ganoderma species are white rot fungi and have ecological importance in the breakdown of woody plants for nutrient mobilization. They possess effective machineries of lignocellulose-decomposing enzymes useful for bioenergy production and bioremediation. In addition, the genus contains many important species that produce pharmacologically active compounds used in health food and medicine. With the rapid adoption of next-generation DNA sequencing technologies, whole genome sequencing and systematic transcriptome analyses become affordable approaches to identify an organism's genes. In the last few years, numerous projects have been initiated to identify the genetic contents of several Ganoderma species, particularly in different strains of Ganoderma lucidum. In November 2013, eleven whole genome sequencing projects for Ganoderma species were registered in international databases, three of which were already completed with genomes being assembled to high quality. In addition to the nuclear genome, two mitochondrial genomes for Ganoderma species have also been reported. Complementing genome analysis, four transcriptome studies on various developmental stages of Ganoderma species have been performed. Information obtained from these studies has laid the foundation for the identification of genes involved in biological pathways that are critical for understanding the biology of Ganoderma, such as the mechanism of pathogenesis, the biosynthesis of active components, life cycle and cellular development, etc. With abundant genetic information becoming available, a few centralized resources have been established to disseminate the knowledge and integrate relevant data to support comparative genomic analyses of
Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M
Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.
Asaf, Sajjad; Khan, Abdul Latif; Khan, Muhammad Aaqil; Waqas, Muhammad; Kang, Sang-Mo; Yun, Byung-Wook; Lee, In-Jung
We investigated the complete chloroplast (cp) genomes of non-model Arabidopsis halleri ssp. gemmifera and Arabidopsis lyrata ssp. petraea using Illumina paired-end sequencing to understand their genetic organization and structure. Detailed bioinformatics analysis revealed genome sizes for the two subspecies of 154.4~154.5 kbp, with a large single-copy region (84,197~84,158 bp), a small single-copy region (17,738~17,813 bp) and a pair of inverted repeats (IRa/IRb; 26,264~26,259 bp). Both cp genomes encode 130 genes, including 85 protein-coding genes, eight ribosomal RNA genes and 37 transfer RNA genes. Whole cp genome comparison of A. halleri ssp. gemmifera and A. lyrata ssp. petraea, along with ten other Arabidopsis species, showed an overall high degree of sequence similarity, with divergence among some intergenic spacers. The location and distribution of repeat sequences were determined, and sequence divergences of shared genes were calculated among related species. Comparative phylogenetic analysis of the entire genomic data set and 70 shared genes between both cp genomes confirmed the previous phylogeny and generated phylogenetic trees with the same topologies. The sister species of A. halleri ssp. gemmifera is A. umezawana, whereas the closest relative of A. lyrata ssp. petraea is A. arenicola.
Kang, Keunsoo; Kim, Joomyeong; Chung, Jae Hoon; Lee, Daeyoup
The identification of genome-wide cis-regulatory modules (CRMs) and characterization of their associated epigenetic features are fundamental steps toward the understanding of gene regulatory networks. Although integrative analysis of available genome-wide information can provide new biological insights, the lack of novel methodologies has become a major bottleneck. Here, we present a comprehensive analysis tool called combinatorial CRM decoder (CCD), which utilizes the publicly available information to identify and characterize genome-wide CRMs in a species of interest. CCD first defines a set of the epigenetic features which is significantly associated with a set of known CRMs as a code called 'trace code', and subsequently uses the trace code to pinpoint putative CRMs throughout the genome. Using 61 genome-wide data sets obtained from 17 independent mouse studies, CCD successfully catalogued ∼12 600 CRMs (five distinct classes) including polycomb repressive complex 2 target sites as well as imprinting control regions. Interestingly, we discovered that ∼4% of the identified CRMs belong to at least two different classes named 'multi-functional CRM', suggesting their functional importance for regulating spatiotemporal gene expression. From these examples, we show that CCD can be applied to any potential genome-wide datasets and therefore will shed light on unveiling genome-wide CRMs in various species.
Behura, Susanta K
BACKGROUND: Codon bias is the phenomenon of non-uniform usage of codons, whereas codon context generally refers to a sequential pair of codons in a gene. Although genome sequencing of multiple species of dipteran and hymenopteran insects has been completed, only a few of these species have been analyzed for codon usage bias. METHODS AND PRINCIPAL FINDINGS: Here, we use bioinformatics approaches to analyze codon usage bias and codon context patterns in a genome-wide manner among 15 dipteran and 7 hymenopteran insect species. Results show that GAA is the most frequent codon in the dipteran species whereas GAG is the most frequent codon in the hymenopteran species. The data reveal that codons ending with C or G are frequently used in the dipteran genomes whereas codons ending with A or T are frequently used in the hymenopteran genomes. Synonymous codon usage orders (SCUO) vary within genomes in a pattern that seems to be distinct for each species. Based on comparison of 30 one-to-one orthologous genes among 17 species, the fruit fly Drosophila willistoni shows the least codon usage bias whereas the honey bee (Apis mellifera) shows the highest bias. Analysis of codon context patterns of these insects shows that specific codons are frequently used as the 3'- and 5'-context of start and stop codons, respectively. CONCLUSIONS: Codon bias pattern is distinct between dipteran and hymenopteran insects. While codon bias is favored by high GC content of dipteran genomes, high AT content of genes favors biased usage of synonymous codons in the hymenopteran insects. Also, codon context patterns vary among these species largely according to their phylogeny.
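The two quantities this abstract reports, the most frequent codon and the share of codons ending in G or C, can be illustrated with a minimal Python sketch. The function names and the toy sequence are hypothetical and are not taken from the study; a genome-wide analysis would run over all annotated coding sequences and would also need to handle ambiguous bases.

```python
from collections import Counter


def codon_counts(cds: str) -> Counter:
    """Count codons in an in-frame coding sequence (a trailing partial codon is ignored)."""
    seq = cds.upper()
    usable = len(seq) - len(seq) % 3
    return Counter(seq[i:i + 3] for i in range(0, usable, 3))


def gc3_fraction(counts: Counter) -> float:
    """Fraction of codons whose third base is G or C."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    gc3 = sum(n for codon, n in counts.items() if codon[2] in "GC")
    return gc3 / total


# Hypothetical toy sequence (assumption, for illustration only).
toy_cds = "ATGGAAGAGGAAGCCTTTTAA"
counts = codon_counts(toy_cds)
most_common_codon, _ = counts.most_common(1)[0]
```

Comparing `gc3_fraction` between genomes is one simple way to express the C/G-ending versus A/T-ending contrast described above; the study's SCUO measure is a different, entropy-based statistic that this sketch does not implement.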
Rasmussen, Jane Lind Nybo; Vesth, Tammi Camilla; Theobald, Sebastian
In the era of high-throughput sequencing, comparative genomics can be applied for evaluating species diversity. In this project we aim to compare the genomes of 300 species of filamentous fungi from the Aspergillus genus, a complex task. To be able to define species, clade, and core features......, this project uses BLAST on the amino acid level to discover orthologs. With a potential of 300 Aspergillus species each having ~12,000 annotated genes, traditional clustering will demand supercomputing. Instead, our approach reduces the search space by identifying isoenzymes within each genome creating...... intragenomic protein families (iPFs), and then connecting iPFs across all genomes. The initial findings in a set of 31 species show that ~48% of the annotated genes are core genes (genes shared between all species) and 2-24% of the genes are defining the individual species. The methods presented here...
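The notions of core genes (shared by all species) and species-defining genes used in this abstract reduce to set operations once every gene has been assigned to a family. The Python sketch below assumes that assignment already exists; the genome names and family identifiers are hypothetical, and the iPF construction itself is not reproduced here.

```python
from typing import Dict, Set, Tuple


def core_and_unique(
    families_by_genome: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return (core families, per-genome unique families).

    Core families occur in every genome; unique families occur in exactly one.
    """
    if not families_by_genome:
        return set(), {}
    genomes = list(families_by_genome)
    core = set.intersection(*(families_by_genome[g] for g in genomes))
    unique = {}
    for g in genomes:
        # Union of the families found in all other genomes.
        others = set().union(*(families_by_genome[o] for o in genomes if o != g))
        unique[g] = families_by_genome[g] - others
    return core, unique


# Hypothetical toy input (assumption): family IDs per genome.
toy = {
    "sp_A": {"f1", "f2", "f3"},
    "sp_B": {"f1", "f2", "f4"},
    "sp_C": {"f1", "f2", "f5", "f6"},
}
core, unique = core_and_unique(toy)
```

The expensive step in practice is building the families, which is what the abstract's search-space reduction addresses; the set arithmetic shown here is cheap by comparison.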
microorganisms and eventually speed up the diagnosis of foodborne illnesses. This genomic data can give biologists many possibilities to improve knowledge of organismal evolution and complex genetic systems. The general interest of this PhD thesis is how to obtain relevant information from growing amounts...... groups or genomic structures; and to use the information of a specific proteome to predict which species it might belong to. Two different algorithms, BLAST and profile Hidden Markov Models (HMMs), are used to determine similarity between sequences and to address the questions in this thesis. The first...... the application of PanFunPro to a set of more than 2000 genomes; this paper aims to define a set of protein families that are conserved among all the genomes. Paper V demonstrates comparative genomics analysis of proteomes belonging to the Vibrio genus. In the last project, described in Chapter 5, both BLAST...
Background: Due to recent advances in whole genome shotgun sequencing and assembly technologies, the financial cost of decoding an organism's DNA has been drastically reduced, resulting in a recent explosion of genomic sequencing projects. This increase in related genomic data will allow for in-depth studies of evolution in closely related species through multiple whole genome comparisons. Results: To facilitate such comparisons, we present an interactive multiple genome comparison and alignment tool, M-GCAT, that can efficiently construct multiple genome comparison frameworks in closely related species. M-GCAT is able to compare and identify highly conserved regions in up to 20 closely related bacterial species in minutes on a standard computer, and as many as 90 (containing 75 cloned genomes from a set of 15 published enterobacterial genomes) in an hour. M-GCAT also incorporates a novel comparative genomics data visualization interface allowing the user to globally and locally examine and inspect the conserved regions and gene annotations. Conclusion: M-GCAT is an interactive comparative genomics tool well suited for quickly generating multiple genome comparison frameworks and alignments among closely related species. M-GCAT is freely available for download for academic and non-commercial use at: http://alggen.lsi.upc.es/recerca/align/mgcat/intro-mgcat.html.
Jang, Jun Hyeong; Kim, Sun-Joong; Yoon, Bo Hyun; Ryu, Jee-Hoon; Gu, Man Bock; Chang, Hyo-Ihl
This study describes a method using a DNA microarray chip to rapidly and simultaneously detect Alicyclobacillus species in orange juice based on the hybridization of genomic DNA with random probes. Three food spoilage bacteria were used in this study: Alicyclobacillus acidocaldarius, Alicyclobacillus acidoterrestris, and Alicyclobacillus cycloheptanicus. The three Alicyclobacillus species were adjusted to 2 × 10^3 CFU/ml and inoculated into pasteurized 100% pure orange juice. Cy5-dCTP labeling was used for reference signals, and Cy3-dCTP was used to label target genomic DNA. A 1:1 molar ratio of Cy3-dCTP to Cy5-dCTP was used. DNA microarray chips were fabricated using randomly fragmented DNA of Alicyclobacillus spp. and were hybridized with genomic DNA extracted from Bacillus spp. Genomic DNA extracted from Alicyclobacillus spp. showed a significantly higher hybridization rate compared with DNA of Bacillus spp., thereby distinguishing Alicyclobacillus spp. from Bacillus spp. The results showed that the microarray DNA chip containing randomly fragmented genomic DNA was specific and clearly identified specific food spoilage bacteria. This microarray system is a good tool for rapid and specific detection of thermophilic spoilage bacteria, mainly Alicyclobacillus spp., and is useful and applicable to the fruit juice industry.
Kijas, J M; Fowler, J C; Thomas, M R
Microsatellites, also called sequence tagged microsatellite sites (STMSs), have become important markers for genome analysis but are currently little studied in plants. To assess the value of STMSs for analysis within the Citrus plant species, two example STMSs were isolated from an intergeneric cross between rangpur lime (Citrus x limonia Osbeck) and trifoliate orange (Poncirus trifoliata (L.) Raf.). Unique flanking primers were constructed for polymerase chain reaction amplification both within the test cross and across a broad range of citrus and related species. Both loci showed length variation between test cross parents with alleles segregating in a Mendelian fashion to progeny. Amplification across species showed the STMS flanking primers to be conserved in every genome tested. The traits of polymorphism, inheritance, and conservation across species mean that STMS markers are ideal for genome mapping within Citrus, which contains high levels of genetic variability.
Microsporidia have attracted considerable attention because they infect a wide range of hosts, from invertebrates to vertebrates, and cause serious human diseases and major economic losses in the livestock industry. There are no prospective drugs to counteract this pathogen. Eukaryotic protein kinases (ePKs) play a central role in regulating many essential cellular processes and are therefore potential drug targets. In this study, a comprehensive summary and comparative analysis of the protein kinases in four microsporidia (Enterocytozoon bieneusi, Encephalitozoon cuniculi, Nosema bombycis and Nosema ceranae) was performed. The results show that there are 34 ePKs and 4 atypical protein kinases (aPKs) in E. bieneusi, 29 ePKs and 6 aPKs in E. cuniculi, 41 ePKs and 5 aPKs in N. bombycis, and 27 ePKs and 4 aPKs in N. ceranae. These data support the previous conclusion that the microsporidian kinome is the smallest eukaryotic kinome. Microsporidian kinomes contain only serine-threonine kinases and do not contain receptor-like and tyrosine kinases. Many of the kinases related to nutrient and energy signaling and the stress response have been lost in microsporidian kinomes. However, cell cycle-, development- and growth-related kinases, which are important to parasites, are well conserved. This reduction of the microsporidian kinome is in good agreement with genome compaction, but kinome density is negatively correlated with proteome size. Furthermore, the protein kinases in each microsporidian genome are under strong purifying selection pressure. No remarkable differences in kinase family classification, domain features, gain and/or loss, and selective pressure were observed in these four species. Although microsporidia adapt to different host types, the coevolution of microsporidia and their hosts was not clearly reflected in the protein kinases. Overall, this study enriches and updates the microsporidian protein kinase database and may provide
Miller, Joshua M; Moore, Stephen S; Stothard, Paul; Liao, Xiaoping; Coltman, David W
Whole genome sequences (WGS) have proliferated as sequencing technology continues to improve and costs decline. While many WGS of model or domestic organisms have been produced, a growing number of non-model species are also being sequenced. In the absence of a reference, construction of a genome sequence necessitates de novo assembly, which may be beyond the ability of many labs due to the large volumes of raw sequence data and extensive bioinformatics required. In contrast, the presence of a reference WGS allows for alignment, which is more tractable than assembly. Recent work has highlighted that the reference need not come from the same species, potentially enabling a wide array of species WGS to be constructed using cross-species alignment. Here we report on the creation of a draft WGS from a single bighorn sheep (Ovis canadensis) using alignment to the closely related domestic sheep (Ovis aries). Two sequencing libraries on SOLiD platforms yielded over 865 million reads, and combined alignment to the domestic sheep reference resulted in a nearly complete sequence (95% coverage of the reference) at an average of 12x read depth (104 SD). From this we discovered over 15 million variants and annotated them relative to the domestic sheep reference. We then conducted an enrichment analysis of those SNPs showing fixed differences between the reference and sequenced individual and found significant differences in a number of gene ontology (GO) terms, including those associated with reproduction, muscle properties, and bone deposition. Our results demonstrate that cross-species alignment enables the creation of novel WGS for non-model organisms. The bighorn sheep WGS will provide a resource for future resequencing studies or comparative genomics.
Davies, Jonathan J
Background: The prevalence of high resolution profiling of genomes has created a need for the integrative analysis of information generated from multiple methodologies and platforms. Although the majority of data in the public domain are gene expression profiles, and expression analysis software is available, the increase of array CGH studies has enabled integration of high throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross platform integrative analysis of cancer genomes. Results: We have created a user-friendly Java application to facilitate sophisticated visualization and analysis such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data representing Affymetrix SNP chip, Stanford cDNA arrays and whole genome tiling path array platforms for cross comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types. Conclusion: In this study we have developed an application for the visualization and analysis of data from high resolution array CGH platforms that can be adapted for analysis of multiple types of high throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA) of cancer genomes, can be accessed at http://sigma.bccrc.ca.
Rasmusen, L. H.; Dargis, R.; Iversen, Katrine Højholt
observed in single gene analyses. Species identification based on single gene analysis showed their limitations when more strains were included. In contrast, analyses incorporating more sequence data, like MLSA, SNPs and core-genome analyses, provided more distinct clustering. The core-genome tree showed......Identification of Mitis group streptococci (MGS) to the species level is challenging for routine microbiology laboratories. Correct identification is crucial for the diagnosis of infective endocarditis, identification of treatment failure, and/or infection relapse. Eighty MGS from Danish patients...
Straub, Daniel; Wenkel, Stephan
Protein concept beyond transcription factors to other protein families. Here, we reveal potential microProtein candidates in several plant and animal reference genomes. A large number of these microProteins are species-specific while others evolved early and are evolutionary highly conserved. Most known micro...... act in plant transcriptional regulation, signal transduction and anatomical structure development. MiPFinder is freely available to find microProteins in any genome and will aid in the identification of novel microProteins in plants and animals....
We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have demonstrated that this visual capability has made some challenging genome analysis problems relatively easy to solve. We have applied this capability to a number of challenging problems, including (a) identification of horizontally transferred genes, (b) identification of genomic islands with special properties and (c) binning of metagenomic sequences, and achieved highly encouraging results. These application results inspired us to develop this barcode-based genome analysis server for public service, which supports the following capabilities: (a) calculation of the k-mer based barcode image for a provided DNA sequence; (b) detection of sequence fragments in a given genome with distinct barcodes from those of the majority of the genome; (c) clustering of provided DNA sequences into groups having similar barcodes; and (d) homology-based search using Blast against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides a job management capability, allowing processing of a large number of analysis jobs for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode.
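The abstract does not spell out how a k-mer based barcode is computed, so the following Python sketch shows only the general idea under stated assumptions: the sequence is cut into non-overlapping windows, and each window becomes one row of k-mer frequencies (one column per k-mer), which could then be rendered as an image. The values of `k` and `window`, the function name, and the absence of any strand handling are assumptions of this sketch, not details of the published server.

```python
from itertools import product
from typing import List, Tuple


def kmer_barcode(seq: str, k: int = 4, window: int = 1000) -> Tuple[List[str], List[List[float]]]:
    """Return (k-mer labels, matrix) with one row of k-mer frequencies per window."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    rows = []
    # Non-overlapping windows; a trailing fragment shorter than `window` is dropped.
    for start in range(0, len(seq) - window + 1, window):
        fragment = seq[start:start + window]
        counts = [0] * len(kmers)
        n = 0
        for i in range(len(fragment) - k + 1):
            j = index.get(fragment[i:i + k])  # k-mers with non-ACGT bases are skipped
            if j is not None:
                counts[j] += 1
                n += 1
        rows.append([c / n if n else 0.0 for c in counts])
    return kmers, rows


# Hypothetical toy input (assumption): two windows with very different composition.
labels, matrix = kmer_barcode("AAAACCCC", k=2, window=4)
```

Rows whose frequency profile differs strongly from most other rows correspond to the "fragments with distinct barcodes" mentioned in capability (b); how the server scores that difference is not stated in the abstract.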
Background: While genes that are conserved between related bacterial species are usually thought to have evolved along with the species, phylogenetic trees reconstructed for individual genes may contradict this picture and indicate horizontal gene transfer. Individual trees are often not resolved with high confidence, however, and in that case alternative trees are generally not considered as contradicting the species tree, although not confirming it either. Here we conduct an in-depth analysis of 401 protein phylogenetic trees inferred with varying levels of confidence for three lactobacilli from the acidophilus complex. At present the relationship between these bacteria, isolated from environments as diverse as the gastrointestinal tract (Lactobacillus acidophilus and Lactobacillus johnsonii) and yogurt (Lactobacillus delbrueckii ssp. bulgaricus), is ambiguous due to contradictory phenotypical and 16S rRNA based classifications. Results: Among the 401 phylogenetic trees, those that could be reconstructed with high confidence support the 16S-rRNA tree or one alternative topology in an astonishing 3:2 ratio, while the third possible topology is practically absent. Lowering the confidence threshold for trees to be taken into consideration does not significantly affect this ratio, and therefore suggests that gene transfer may have affected as much as 40% of the core genome genes. Gene function bias suggests that the 16S rRNA phylogeny of the acidophilus complex, which indicates that L. acidophilus and L. delbrueckii ssp. bulgaricus are the most closely related of these three species, is correct. A novel approach of comparing interspecies protein divergence data employed in this study allowed us to determine that gene transfer most likely took place between the lineages of the two species found in the gastrointestinal tract. Conclusion: This case-study reports an unprecedented level of phylogenetic incongruence, presumably resulting from extensive
Antarctic soils represent a unique environment characterised by extremes of temperature, salinity, elevated UV radiation, low nutrient and low water content. Despite the harshness of this environment, members of 15 bacterial phyla have been identified in soils of the Ross Sea Region (RSR). However, the survival mechanisms and ecological roles of these phyla are largely unknown. The aim of this study was to investigate whether strains of Paenibacillus darwinianus owe their resilience to substantial genomic changes. For this, genome-based comparative analyses were performed on three P. darwinianus strains, isolated from gamma-irradiated RSR soils, together with nine temperate, soil-dwelling Paenibacillus spp. The genome of each strain was sequenced to over 1,000-fold coverage, then assembled into contigs totalling approximately 3 Mbp per genome. Based on the occurrence of essential, single-copy genes, genome completeness was estimated at approximately 88%. Genome analysis revealed between 3,043 and 3,091 protein-coding sequences (CDSs), primarily associated with two-component systems, sigma factors, transporters, sporulation and genes induced by cold-shock, oxidative and osmotic stresses. These comparative analyses provide an insight into the metabolic potential of P. darwinianus, revealing potential adaptive mechanisms for survival in Antarctic soils. However, a large proportion of these mechanisms were also identified in temperate Paenibacillus spp., suggesting that these mechanisms are beneficial for growth and survival in a range of soil environments. These analyses have also revealed that the P. darwinianus genomes contain significantly fewer CDSs and have a lower paralogous content. Notwithstanding the incompleteness of the assemblies, the large differences in genome sizes, determined by the number of genes in paralogous clusters and the CDS content, are indicative of genome content scaling. Finally, these sequences are a resource for further
Genomes of solid tumors are often highly rearranged and these rearrangements promote cancer progression through disruption of genes mediating immortality, survival, metastasis, and resistance to therapy...
Background: Bituminaria bituminosa is a perennial legume species from the Canary Islands and Mediterranean region that has potential as a drought-tolerant pasture species and as a source of pharmaceutical compounds. Three botanical varieties have previously been identified in this species: albomarginata, bituminosa and crassiuscula. B. bituminosa can be considered a genomic 'orphan' species with very few genomic resources available. New DNA sequencing technologies provide an opportunity to develop high quality molecular markers for such orphan species. Results: 432,306 mRNA molecules were sampled from a leaf transcriptome of a single B. bituminosa plant using Roche 454 pyrosequencing, resulting in an average read length of 345 bp (149.1 Mbp in total). Sequences were assembled into 3,838 isotigs/contigs representing putatively unique gene transcripts. Gene ontology descriptors were identified for 3,419 sequences. Raw sequence reads containing simple sequence repeat (SSR) motifs were identified, and 240 primer pairs flanking these motifs were designed. Of 87 primer pairs developed this way, 75 (86.2%) successfully amplified primarily single fragments by PCR. Fragment analysis using 20 primer pairs in 79 accessions of B. bituminosa detected 130 alleles at 21 SSR loci. Genetic diversity analyses confirmed that variation at these SSR loci accurately reflected known taxonomic relationships in original collections of B. bituminosa and provided additional evidence that a division of the botanical variety bituminosa into two according to geographical origin (Mediterranean region and Canary Islands) may be appropriate. Evidence of cross-pollination was also found between botanical varieties within a B. bituminosa breeding programme. Conclusions: B. bituminosa can no longer be considered a genomic orphan species, having now a large (albeit incomplete) repertoire of expressed gene sequences that can serve as a resource for future genetic studies. This
Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T
SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction and reproductive tissue development. The SWEET gene family is predominantly studied in Arabidopsis and members of the family are being investigated in rice. To date, no transcriptome or genomics analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolutionary aspect of the SWEET gene family in diverse plant species, from primitive single-cell algae to angiosperms, with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes that were mapped to 15 chromosomes in the soybean genome as tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real time PCR (qRT-PCR) showed that a majority of the GmSWEET genes were confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons and haplotypes) were identified in the GmSWEET genes using whole genome re-sequencing data analysis of 106 soybean genotypes. A significant association was observed between SNP-haplogroup and seed sucrose content in three gene clusters on chromosome 6. The present investigation utilized comparative genomics, transcriptome profiling and whole genome re-sequencing approaches, provided a systematic description of soybean SWEET genes and identified putative candidates with probable roles in reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will serve as an important resource for the soybean research
Sclerotinia sclerotiorum and Botrytis cinerea are closely related necrotrophic plant pathogenic fungi notable for their wide host ranges and environmental persistence. These attributes have made these species models for understanding the complexity of necrotrophic, broad host-range pathogenicity. Despite their similarities, the two species differ in mating behaviour and the ability to produce asexual spores. We have sequenced the genomes of one strain of S. sclerotiorum and two strains of B. cinerea. The comparative analysis of these genomes relative to one another and to other sequenced fungal genomes is provided here. Their 38-39 Mb genomes include 11,860-14,270 predicted genes, which share 83% amino acid identity on average between the two species. We have mapped the S. sclerotiorum assembly to 16 chromosomes and found large-scale co-linearity with the B. cinerea genomes. Seven percent of the S. sclerotiorum genome comprises transposable elements compared to <1% of B. cinerea. The arsenal of genes associated with necrotrophic processes is similar between the species, including genes involved in plant cell wall degradation and oxalic acid production. Analysis of secondary metabolism gene clusters revealed an expansion in number and diversity of B. cinerea-specific secondary metabolites relative to S. sclerotiorum. The potential diversity in secondary metabolism might be involved in adaptation to specific ecological niches. Comparative genome analysis revealed the basis of differing sexual mating compatibility systems between S. sclerotiorum and B. cinerea. The organization of the mating-type loci differs, and their structures provide evidence for the evolution of heterothallism from homothallism. These data shed light on the evolutionary and mechanistic bases of the genetically complex traits of necrotrophic pathogenicity and sexual mating. This resource should facilitate the functional studies designed to better understand what makes these
The majority of the 38 known cat species are classified as small, and they inhabit five of the seven continents. They survive in a vast range of habitats, yet 12 of the 18 threatened felids are small cats. However, there has not been enough progress in the field of small cat research, as small cats generally get overshadowed by the charismatic big cats. Here we attempt to create a resource for small cat research, especially of the genus Felis, which has six species, of which two are classified as vulnerable by IUCN and at least one more is at risk. We collected tissue samples of four Felis chaus (jungle cat) from central India and used available whole genome sequences of nine individuals from four other Felis species, two individuals of Prionailurus bengalensis and an Otocolobus manul. These whole genome sequences were filtered and aligned with the already published domestic cat (Felis catus) genome assembly. Felids are closely related species, and reads from all species in our study aligned with the domestic cat genome at a rate of at least 93%. We estimated the existing genomic variation by calculating the heterozygous SNP encounter rate. So far, it seems that all wild cats have more genetic variation than Felis catus. This can be attributed to the inbreeding in these cats. Among the wild cats, Felis silvestris seems to have the highest level of genetic variation. To understand the reasons behind the distribution of genetic variation in small cats, we estimated the demographic histories of each of the species using PSMC. This method can only detect demographic changes more than 1000 generations ago. We observe that roughly all species share a parallel history in terms of population increase. The most interesting and important feature might be that all wild small cat population sizes increased exponentially around twenty thousand years ago, as opposed to the domestic cat and big cats, which declined around this time. Another interesting feature of
The complete genome sequencing of Prevotella intermedia strain OMA14 and a subsequent fine-scale, intra-species genomic comparison reveal an unusual amplification of conjugative and mobile transposons and identify a novel Prevotella-lineage-specific repeat.
Naito, Mariko; Ogura, Yoshitoshi; Itoh, Takehiko; Shoji, Mikio; Okamoto, Masaaki; Hayashi, Tetsuya; Nakayama, Koji
Prevotella intermedia is a pathogenic bacterium involved in periodontal diseases. Here, we present the complete genome sequence of a clinical strain, OMA14, of this bacterium along with the results of comparative genome analysis with strain 17 of the same species whose genome has also been sequenced, but not fully analysed yet. The genomes of both strains consist of two circular chromosomes: the larger chromosomes are similar in size and exhibit a high overall linearity of gene organizations, whereas the smaller chromosomes show a significant size variation and have undergone remarkable genome rearrangements. Unique features of the Pre. intermedia genomes are the presence of a remarkable number of essential genes on the second chromosomes and the abundance of conjugative and mobilizable transposons (CTns and MTns). The CTns/MTns are particularly abundant in the second chromosomes, involved in its extensive genome rearrangement, and have introduced a number of strain-specific genes into each strain. We also found a novel 188-bp repeat sequence that has been highly amplified in Pre. intermedia and are specifically distributed among the Pre. intermedia-related species. These findings expand our understanding of the genetic features of Pre. intermedia and the roles of CTns and MTns in the evolution of bacteria. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
Mořkovský, Libor; Janoušek, Václav; Reif, Jiří; Rídl, Jakub; Pačes, Jan; Choleva, Lukáš; Janko, Karel; Nachman, Michael W; Reifová, Radka
Hybrid sterility is a common first step in the evolution of postzygotic reproductive isolation. According to Haldane's Rule, it affects predominantly the heterogametic sex. While the genetic basis of hybrid male sterility in organisms with heterogametic males has been studied for decades, the genetic basis of hybrid female sterility in organisms with heterogametic females has received much less attention. We investigated the genetic basis of reproductive isolation in two closely related avian species, the common nightingale (Luscinia megarhynchos) and the thrush nightingale (L. luscinia), that hybridize in a secondary contact zone and produce viable hybrid progeny. In accordance with Haldane's Rule, hybrid females are sterile, while hybrid males are fertile, allowing gene flow to occur between the species. Using transcriptomic data from multiple individuals of both nightingale species, we identified genomic islands of high differentiation (FST) and of high divergence (Dxy), and we analysed gene content and patterns of molecular evolution within these islands. Interestingly, we found that these islands were enriched for genes related to female meiosis and metabolism. The islands of high differentiation and divergence were also characterized by higher levels of linkage disequilibrium than the rest of the genome in both species, indicating that they might be situated in genomic regions of low recombination. This study provides one of the first insights into the genetic basis of hybrid female sterility in organisms with heterogametic females. © 2018 John Wiley & Sons Ltd.
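The two statistics named in the abstract above, differentiation (FST) and absolute divergence (Dxy), can be illustrated on toy data. The following is a minimal sketch, assuming aligned haplotype sequences of equal length per species; the function names are hypothetical, and the FST formula used here (1 minus within-population diversity over between-population divergence) is one common sequence-based estimator, not necessarily the one the authors applied.

```python
from itertools import combinations, product


def per_site_differences(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def dxy(pop_x, pop_y):
    """Mean per-site difference over all between-population sequence pairs."""
    pairs = list(product(pop_x, pop_y))
    return sum(per_site_differences(a, b) for a, b in pairs) / len(pairs)


def pi(pop):
    """Mean per-site difference over all within-population sequence pairs."""
    pairs = list(combinations(pop, 2))
    return sum(per_site_differences(a, b) for a, b in pairs) / len(pairs)


def fst(pop_x, pop_y):
    """Assumed estimator: 1 - (mean within-population pi) / Dxy."""
    within = (pi(pop_x) + pi(pop_y)) / 2
    return 1 - within / dxy(pop_x, pop_y)


# Toy alignment (invented data): two haplotypes per species, four sites.
species_x = ["ACGT", "ACGA"]
species_y = ["TCGT", "TCGA"]
print(dxy(species_x, species_y), fst(species_x, species_y))
```

In a genome scan these quantities would be computed per window or per gene, and the windows in the upper tail of both distributions would be the candidate "islands".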
Arakaki, Atsushi; Shibusawa, Mie; Hosokawa, Masahito; Matsunaga, Tadashi
Magnetotactic bacteria comprise a phylogenetically diverse group that is capable of synthesizing intracellular magnetic particles. Although various morphotypes of magnetotactic bacteria have been observed in the environment, bacterial strains available in pure culture are currently limited to a few genera due to difficulties in their enrichment and cultivation. In order to obtain genetic information from uncultured magnetotactic bacteria, a genome preparation method that involves magnetic separation of cells, flow cytometry, and multiple displacement amplification (MDA) using phi29 polymerase was used in this study. The conditions for the MDA reaction using samples containing 1 to 100 cells were evaluated using a pure-culture magnetotactic bacterium, "Magnetospirillum magneticum AMB-1," whose complete genome sequence is available. Uniform gene amplification was confirmed by quantitative PCR (Q-PCR) when 100 cells were used as a template. This method was then applied for genome preparation of uncultured magnetotactic bacteria from complex bacterial communities in an aquatic environment. A sample containing 100 cells of the uncultured magnetotactic coccus was prepared by magnetic cell separation and flow cytometry and used as an MDA template. 16S rRNA sequence analysis of the MDA product from these 100 cells revealed that the amplified genomic DNA was from a single species of magnetotactic bacterium that was phylogenetically affiliated with magnetotactic cocci in the Alphaproteobacteria. The combined use of magnetic separation, flow cytometry, and MDA provides a new strategy to access individual genetic information from magnetotactic bacteria in environmental samples.
Background Comparative genomic hybridization (CGH) constitutes a powerful tool for identification and characterization of bacterial strains. In this study we have applied this technique for the characterization of a number of Lactobacillus strains isolated from the intestinal content of rats fed with a diet supplemented with sorbitol. Results Phylogenetic analysis based on 16S rRNA gene, recA, pheS, pyrG and tuf sequences identified five bacterial strains isolated from the intestinal content of rats as belonging to the recently described Lactobacillus taiwanensis species. DNA-DNA hybridization experiments confirmed that these five strains are distinct from but closely related to Lactobacillus johnsonii and Lactobacillus gasseri. A whole genome DNA microarray designed for the probiotic L. johnsonii strain NCC533 was used for CGH analysis of L. johnsonii ATCC 33200T, L. johnsonii BL261, L. gasseri ATCC 33323T and L. taiwanensis BL263. In these experiments, the fluorescence ratio distributions obtained with L. taiwanensis and L. gasseri showed characteristic inter-species profiles. The percentage of conserved L. johnsonii NCC533 genes was about 83% in the L. johnsonii strains comparisons and decreased to 51% and 47% for L. taiwanensis and L. gasseri, respectively. These results confirmed the separate status of L. taiwanensis from L. johnsonii at the level of species, and also that L. taiwanensis is closer to L. johnsonii than L. gasseri is to L. johnsonii. Conclusion Conventional taxonomic analyses and microarray-based CGH analysis have been used for the identification and characterization of the new species L. taiwanensis. The microarray-based CGH technology has been shown to be a remarkable tool for the identification and fine discrimination between phylogenetically close species, and additionally provided insight into the adaptation of the strain L. taiwanensis BL263 to its ecological niche.
Legionella pneumophila and L. longbeachae are two species of a large genus of bacteria that are ubiquitous in nature. L. pneumophila is mainly found in natural and artificial water circuits while L. longbeachae is mainly present in soil. Under the appropriate conditions both species are human pathogens, capable of causing a severe form of pneumonia termed Legionnaires' disease. Here we report the sequencing and analysis of four L. longbeachae genomes: one complete genome sequence of L. longbeachae strain NSW150 serogroup (Sg) 1, and three draft genome sequences, one belonging to Sg1 and two to Sg2. The genome organization and gene content of the four L. longbeachae genomes are highly conserved, indicating strong pressure for niche adaptation. Analysis and comparison of L. longbeachae strain NSW150 with L. pneumophila revealed common but also unexpected features specific to this pathogen. The interaction with host cells shows distinct features from L. pneumophila, as L. longbeachae possesses a unique repertoire of putative Dot/Icm type IV secretion system substrates, eukaryotic-like and eukaryotic domain proteins, and encodes additional secretion systems. However, analysis of the ability of a dotA mutant of L. longbeachae NSW150 to replicate in Acanthamoeba castellanii and in a mouse lung infection model showed that the Dot/Icm type IV secretion system is also essential for the virulence of L. longbeachae. In contrast to L. pneumophila, L. longbeachae does not encode flagella, thereby providing a possible explanation for differences in mouse susceptibility to infection between the two pathogens. Furthermore, transcriptome analysis revealed that L. longbeachae has a less pronounced biphasic life cycle as compared to L. pneumophila, and genome analysis and electron microscopy suggested that L. longbeachae is encapsulated. These species-specific differences may account for the different environmental niches and disease epidemiology of these
Kang, Sang-Ho; Lee, Jeong-Hoon; Lee, Hyun Oh; Ahn, Byoung Ohg; Won, So Youn; Sohn, Seong-Han; Kim, Jung Sun
Glycyrrhiza uralensis and G. glabra, members of the Fabaceae, are medicinally important species that are native to Asia and Europe. Extracts from these plants are widely used as natural sweeteners because of their much greater sweetness than sucrose. In this study, the three complete chloroplast genomes and five 45S nuclear ribosomal (nr)DNA sequences of these two licorice species and an interspecific hybrid are presented. The chloroplast genomes of G. glabra, G. uralensis and G. glabra × G. uralensis were 127,895 bp, 127,716 bp and 127,939 bp, respectively. The three chloroplast genomes harbored 110 annotated genes, including 76 protein-coding genes, 30 tRNA genes and 4 rRNA genes. The 45S nrDNA sequences were either 5,947 or 5,948 bp in length. Glycyrrhiza glabra and G. glabra × G. uralensis showed two types of nrDNA, while G. uralensis contained a single type. The complete 45S nrDNA sequence unit contains 18S rRNA, ITS1, 5.8S rRNA, ITS2 and 26S rRNA. We identified simple sequence repeat and tandem repeat sequences. We also developed four reliable markers for analysis of Glycyrrhiza diversity authentication.
Kubicek, Christian P.; Herrera-Estrella, Alfredo; Seidl-Seiboth, Verena; Martinez, Diego A.; Druzhinina, Irina S.; Thon, Michael; Zeilinger, Susanne; Casas-Flores, Sergio; Horwitz, Benjamin A.; Mukherjee, Prasun K.; Mukherjee, Mala; Kredics, László; Alcaraz, Luis D.; Aerts, Andrea; Antal, Zsuzsanna
Background Mycoparasitism, a lifestyle where one fungus is parasitic on another fungus, has special relevance when the prey is a plant pathogen, providing a strategy for biological control of pests for plant protection. Probably, the most studied biocontrol agents are species of the genus Hypocrea/Trichoderma. Results Here we report an analysis of the genome sequences of the two biocontrol species Trichoderma atroviride (teleomorph Hypocrea atroviridis) and Trichoderma virens (formerly Gliocl...
Creixell, Pau; Reimand, Jueri; Haider, Syed
Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been...
The genetic material of every cell in an organism is stored inside DNA in the form of genes, which together form the genome. The information stored in the DNA is translated to RNA and subsequently to proteins, which form complex biological systems. The availability of whole genome sequences has
Burkholderia cepacia is an important organism in bioremediation of environmental pollutants and it is also of increasing interest as a human pathogen. The genomic organization of B. cepacia is being studied in order to better understand its unusual adaptive capacity and genome pl...
Beulé, Thierry; Agbessi, Mawussé Dt; Dussert, Stephane; Jaligot, Estelle; Guyot, Romain
The oil palm (Elaeis guineensis Jacq.) is a major cultivated crop and the world's largest source of edible vegetable oil. The genus Elaeis comprises two species E. guineensis, the commercial African oil palm and E. oleifera, which is used in oil palm genetic breeding. The recent publication of both the African oil palm genome assembly and the first draft sequence of its Latin American relative now allows us to tackle the challenge of understanding the genome composition, structure and evolution of these palm genomes through the annotation of their repeated sequences. In this study, we identified, annotated and compared Transposable Elements (TE) from the African and Latin American oil palms. In a first step, Transposable Element databases were built through de novo detection in both genome sequences then the TE content of both genomes was estimated. Then putative full-length retrotransposons with Long Terminal Repeats (LTRs) were further identified in the E. guineensis genome for characterization of their structural diversity, copy number and chromosomal distribution. Finally, their relative expression in several tissues was determined through in silico analysis of publicly available transcriptome data. Our results reveal a congruence in the transpositional history of LTR retrotransposons between E. oleifera and E. guineensis, especially the Sto-4 family. Also, we have identified and described 583 full-length LTR-retrotransposons in the Elaeis guineensis genome. Our work shows that these elements are most likely no longer mobile and that no recent insertion event has occurred. Moreover, the analysis of chromosomal distribution suggests a preferential insertion of Copia elements in gene-rich regions, whereas Gypsy elements appear to be evenly distributed throughout the genome. Considering the high proportion of LTR retrotransposon in the oil palm genome, our work will contribute to a greater understanding of their impact on genome organization and evolution
This paper reports detailed FISH-based karyotypes for three diploid wheatgrass species, Agropyron cristatum (L.) Beauv., Thinopyrum bessarabicum (Savul. & Rayss) A. Löve and Pseudoroegneria spicata (Pursh) A. Löve, the supposed ancestors of hexaploid Thinopyrum intermedium (Host) Barkworth & D.R. Dewey, compiled using DNA repeats and comparative genome analysis based on COS markers. Fluorescence in situ hybridization (FISH) with repetitive DNA probes proved suitable for the identification of individual chromosomes in the diploid JJ, StSt and PP genomes. Of the seven microsatellite markers tested only the (GAA)n trinucleotide sequence was appropriate for use as a single chromosome marker for the P. spicata AS chromosome. Based on COS marker analysis, the phylogenetic relationship between diploid wheatgrasses and the hexaploid bread wheat genomes was established. These findings confirmed that the J and E genomes are in neighbouring clusters.
Brucellosis is the most frequent zoonotic disease worldwide, with over 500,000 new human infections every year. Brucella melitensis, the most virulent species in humans, primarily affects goats and the zoonotic transmission occurs by ingestion of unpasteurized milk products or through direct contact with fetal tissues. Brucellosis is endemic in India but no information is available on population structure and genetic diversity of Brucella spp. in India. We performed multilocus sequence typing of four B. melitensis strains isolated from naturally infected goats from India. For more detailed genetic characterization, we carried out whole genome sequencing and comparative genome analysis of one of the B. melitensis isolates, Bm IND1. Genome analysis identified 141 unique SNPs, 78 VNTRs, 51 Indels, and 2 putative prophage integrations in the Bm IND1 genome. Our data may help to develop improved epidemiological typing tools and efficient preventive strategies to control brucellosis.
Rey, Michael W; Ramaiya, Preethi; Nelson, Beth A; Brody-Karpin, Shari D; Zaretsky, Elizabeth J; Tang, Maria; de Leon, Alfredo Lopez; Xiang, Henry; Gusti, Veronica; Clausen, Ib Groth; Olsen, Peter B; Rasmussen, Michael D; Andersen, Jens T; Jørgensen, Per L; Larsen, Thomas S; Sorokin, Alexei; Bolotin, Alexander; Lapidus, Alla; Galleron, Nathalie; Ehrlich, S Dusko; Berka, Randy M
Background Bacillus licheniformis is a Gram-positive, spore-forming soil bacterium that is used in the biotechnology industry to manufacture enzymes, antibiotics, biochemicals and consumer products. This species is closely related to the well studied model organism Bacillus subtilis, and produces an assortment of extracellular enzymes that may contribute to nutrient cycling in nature. Results We determined the complete nucleotide sequence of the B. licheniformis ATCC 14580 genome which comprises a circular chromosome of 4,222,336 base-pairs (bp) containing 4,208 predicted protein-coding genes with an average size of 873 bp, seven rRNA operons, and 72 tRNA genes. The B. licheniformis chromosome contains large regions that are colinear with the genomes of B. subtilis and Bacillus halodurans, and approximately 80% of the predicted B. licheniformis coding sequences have B. subtilis orthologs. Conclusions Despite the unmistakable organizational similarities between the B. licheniformis and B. subtilis genomes, there are notable differences in the numbers and locations of prophages, transposable elements and a number of extracellular enzymes and secondary metabolic pathway operons that distinguish these species. Differences include a region of more than 80 kilobases (kb) that comprises a cluster of polyketide synthase genes and a second operon of 38 kb encoding plipastatin synthase enzymes that are absent in the B. licheniformis genome. The availability of a completed genome sequence for B. licheniformis should facilitate the design and construction of improved industrial strains and allow for comparative genomics and evolutionary studies within this group of Bacillaceae. PMID:15461803
Gostinčar, Cene; Ohm, Robin A; Kogej, Tina; Sonjak, Silva; Turk, Martina; Zajc, Janja; Zalar, Polona; Grube, Martin; Sun, Hui; Han, James; Sharma, Aditi; Chiniquy, Jennifer; Ngan, Chew Yee; Lipzen, Anna; Barry, Kerrie; Grigoriev, Igor V; Gunde-Cimerman, Nina
Aureobasidium pullulans is a black-yeast-like fungus used for production of the polysaccharide pullulan and the antimycotic aureobasidin A, and as a biocontrol agent in agriculture. It can cause opportunistic human infections, and it inhabits various extreme environments. To promote the understanding of these traits, we performed de-novo genome sequencing of the four varieties of A. pullulans. The 25.43-29.62 Mb genomes of these four varieties of A. pullulans encode between 10266 and 11866 predicted proteins. Their genomes encode most of the enzyme families involved in degradation of plant material and many sugar transporters, and they have genes possibly associated with degradation of plastic and aromatic compounds. Proteins believed to be involved in the synthesis of pullulan and siderophores, but not of aureobasidin A, are predicted. Putative stress-tolerance genes include several aquaporins and aquaglyceroporins, large numbers of alkali-metal cation transporters, genes for the synthesis of compatible solutes and melanin, all of the components of the high-osmolarity glycerol pathway, and bacteriorhodopsin-like proteins. All of these genomes contain a homothallic mating-type locus. The differences between these four varieties of A. pullulans are large enough to justify their redefinition as separate species: A. pullulans, A. melanogenum, A. subglaciale and A. namibiae. The redundancy observed in several gene families can be linked to the nutritional versatility of these species and their particular stress tolerance. The availability of the genome sequences of the four Aureobasidium species should improve their biotechnological exploitation and promote our understanding of their stress-tolerance mechanisms, diverse lifestyles, and pathogenic potential.
Ghatak, Sandeep; Blom, Jochen; Das, Samir; Sanjukta, Rajkumari; Puro, Kekungu; Mawlong, Michael; Shakuntala, Ingudam; Sen, Arnab; Goesmann, Alexander; Kumar, Ashok; Ngachan, S V
Aeromonas species are important pathogens of fishes and aquatic animals capable of infecting humans and other animals via food. Due to the paucity of pan-genomic studies on aeromonads, the present study was undertaken to analyse the pan-genome of three clinically important Aeromonas species (A. hydrophila, A. veronii, A. caviae). Results of pan-genome analysis revealed an open pan-genome for all three species with pan-genome sizes of 9181, 7214 and 6884 genes for A. hydrophila, A. veronii and A. caviae, respectively. Core-genome: pan-genome ratio (RCP) indicated greater genomic diversity for A. hydrophila and interestingly RCP emerged as an effective indicator to gauge genomic diversity which could possibly be extended to other organisms too. Phylogenomic network analysis highlighted the influence of homologous recombination and lateral gene transfer in the evolution of Aeromonas spp. Prediction of virulence factors indicated no significant difference among the three species though analysis of pathogenic potential and acquired antimicrobial resistance genes revealed greater hazards from A. hydrophila. In conclusion, the present study highlighted the usefulness of whole genome analyses to infer evolutionary cues for Aeromonas species which indicated considerable phylogenomic diversity for A. hydrophila and hitherto unknown genomic evidence for pathogenic potential of A. hydrophila compared to A. veronii and A. caviae.
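The core-genome to pan-genome ratio (RCP) mentioned in the abstract above reduces to simple set arithmetic once each genome is represented as a set of gene families. A minimal sketch, assuming gene families have already been clustered and labelled (the strain names and family identifiers below are invented, and the function name is hypothetical):

```python
def pan_core_summary(genomes):
    """Summarise pan- and core-genome sizes.

    genomes: dict mapping strain name -> set of gene-family identifiers.
    Returns pan size, core size and the core:pan ratio (RCP).
    """
    gene_sets = list(genomes.values())
    pan = set().union(*gene_sets)          # families seen in any strain
    core = set.intersection(*gene_sets)    # families shared by every strain
    return {"pan": len(pan), "core": len(core), "rcp": len(core) / len(pan)}


# Toy input (invented): three strains with partially overlapping families.
toy = {
    "strain_1": {"famA", "famB", "famC"},
    "strain_2": {"famA", "famB", "famD"},
    "strain_3": {"famA", "famB", "famE", "famF"},
}
print(pan_core_summary(toy))
```

A smaller ratio means that fewer families are shared relative to the total repertoire, which is the sense in which the ratio can serve as a rough gauge of genomic diversity within a species.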
DNA methylation plays a central role in regulating many aspects of growth and development in mammals through regulating gene expression. The development of next generation sequencing technologies has paved the way for genome-wide, high resolution analysis of DNA methylation landscapes using methodology known as reduced representation bisulfite sequencing (RRBS). While RRBS has proven to be effective in understanding DNA methylation landscapes in humans, mice, and rats, to date, few studies have utilised this powerful method for investigating DNA methylation in agricultural animals. Here we describe the utilisation of RRBS to investigate DNA methylation in sheep Longissimus dorsi muscles. RRBS analysis of ∼1% of the genome from Longissimus dorsi muscles provided data of suitably high precision and accuracy for DNA methylation analysis, at all levels of resolution from genome-wide to individual nucleotides. Combining RRBS data with mRNAseq data allowed the sheep Longissimus dorsi muscle methylome to be compared with methylomes from other species. While some species differences were identified, many similarities were observed between DNA methylation patterns in sheep and other more commonly studied species. The RRBS data presented here highlight the complexity of epigenetic regulation of genes. However, the similarities observed across species are promising, in that knowledge gained from epigenetic studies in humans and mice may be applied, with caution, to agricultural species. The ability to accurately measure DNA methylation in agricultural animals will contribute an additional layer of information to the genetic analyses currently being used to maximise production gains in these species.
Armero, Alix; Baudouin, Luc; Bocs, Stéphanie; This, Dominique
The palms are a family of tropical origin and one of the main constituents of the ecosystems of these regions around the world. The two main species of palm represent different challenges: coconut (Cocos nucifera L.) is a source of multiple goods and services in tropical communities, while oil palm (Elaeis guineensis Jacq) is the main protagonist of the oil market. In this study, we present a workflow that exploits the comparative genomics between a target species (coconut) and a reference species (oil palm) to improve the transcriptomic data, providing a proteome useful to answer functional or evolutionary questions. This workflow reduces redundancy and fragmentation, two inherent problems of transcriptomic data, while preserving the functional representation of the target species. Our approach was validated in Arabidopsis thaliana using Arabidopsis lyrata and Capsella rubella as references species. This analysis showed the high sensitivity and specificity of our strategy, relatively independent of the reference proteome. The workflow increased the length of proteins products in A. thaliana by 13%, allowing, often, to recover 100% of the protein sequence length. In addition redundancy was reduced by a factor greater than 3. In coconut, the approach generated 29,366 proteins, 1,246 of these proteins deriving from new contigs obtained with the BRANCH software. The coconut proteome presented a functional profile similar to that observed in rice and an important number of metabolic pathways related to secondary metabolism. The new sequences found with BRANCH software were enriched in functions related to biotic stress. Our strategy can be used as a complementary step to de novo transcriptome assembly to get a representative proteome of a target species. The results of the current analysis are available on the website PalmComparomics (http://palm-comparomics.southgreen.fr/).
Background Members of the Anopheles punctulatus group (AP group) are the primary vectors of human malaria in Papua New Guinea. The AP group includes 13 sibling species, most of them morphologically indistinguishable. Understanding why only certain species are able to transmit malaria requires a better comprehension of their evolutionary history. In particular, understanding relationships and divergence times among Anopheles species may enable assessing how malaria-related traits (e.g. blood feeding behaviours, vector competence) have evolved. Methods Fourteen mitochondrial (mt) genomes from five AP sibling species and two species of the Anopheles dirus complex of Southeast Asia were sequenced. DNA sequences from all concatenated protein coding genes (10,770 bp) were then analysed using a Bayesian approach to reconstruct phylogenetic relationships and date the divergence of the AP sibling species. Results Phylogenetic reconstruction using the concatenated DNA sequence of all mitochondrial protein coding genes indicates that the ancestors of the AP group arrived in Papua New Guinea 25 to 54 million years ago and rapidly diverged to form the current sibling species. Conclusion Through evaluation of newly described mt genome sequences, this study has revealed a divergence among members of the AP group in Papua New Guinea that would significantly predate the arrival of humans in this region, 50 thousand years ago. The divergence observed among the mtDNA sequences studied here may have resulted from reproductive isolation during historical changes in sea-level through glacial minima and maxima. This leads to a hypothesis that the AP sibling species have evolved independently for potentially thousands of generations. This suggests that the evolution of many phenotypes, such as insecticide resistance, will arise independently in each of the AP sibling species studied here.
Grijseels, Sietske; Nielsen, Jens Christian; Randelovic, Milica
A new soil-borne species belonging to the Penicillium section Canescentia is described, Penicillium arizonense sp. nov. (type strain CBS 141311T = IBT 12289T). The genome was sequenced and assembled into 33.7 Mb containing 12,502 predicted genes. A phylogenetic assessment based on marker genes...... confirmed the grouping of P. arizonense within section Canescentia. Compared to related species, P. arizonense proved to encode a high number of proteins involved in carbohydrate metabolism, in particular hemicellulases. Mining the genome for genes involved in secondary metabolite biosynthesis resulted...... of biosynthetic gene clusters in P. arizonense responsible for the synthesis of all detected compounds except curvulinic acid. The capacity to produce biomass degrading enzymes and the identification of a high chemical diversity in secreted bioactive secondary metabolites, offers a broad range of potential...
As more and more prokaryotic sequencing takes place, a method to quickly and accurately analyze these data is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single genome and metagenome sequencing files, quickly and with low error rates. GenomePeek uses a sequence assembly approach in which reads matching a set of conserved genes are extracted, assembled and then aligned against a highly specific reference database. GenomePeek was found to be faster than traditional approaches while still keeping error rates low, as well as offering unique data visualization options.
Buske Orion J
Background As genome-wide experiments and annotations become more prevalent, researchers increasingly require tools to help interpret data at this scale. Many functional genomics experiments involve partitioning the genome into labeled segments, such that segments sharing the same label exhibit one or more biochemical or functional traits. For example, a collection of ChIP-seq experiments yields a compendium of peaks, each labeled with one or more associated DNA-binding proteins. Similarly, manually or automatically generated annotations of functional genomic elements, including cis-regulatory modules and protein-coding or RNA genes, can also be summarized as genomic segmentations. Results We present a software toolkit called Segtools that simplifies and automates the exploration of genomic segmentations. The software operates as a series of interacting tools, each of which provides one mode of summarization. These various tools can be pipelined and summarized in a single HTML page. We describe the Segtools toolkit and demonstrate its use in interpreting a collection of human histone modification data sets and Plasmodium falciparum local chromatin structure data sets. Conclusions Segtools provides a convenient, powerful means of interpreting a genomic segmentation.
Bitter gourd (Momordica charantia) is widely cultivated as a vegetable and medicinal herb in many Asian and African countries. After the sequencing of the cucumber (Cucumis sativus), watermelon (Citrullus lanatus), and melon (Cucumis melo) genomes, bitter gourd became the fourth cucurbit species whose whole genome was sequenced. However, a comprehensive analysis of simple sequence repeats (SSRs) in bitter gourd, including a comparison with the three aforementioned cucurbit species, has not yet been published. Here, we identified a total of 188,091 and 167,160 SSR motifs in the genomes of the bitter gourd lines ‘Dali-11’ and ‘OHB3-1,’ respectively. Subsequently, the SSR content, motif lengths, and classified motif types were characterized for the bitter gourd genomes and compared among all the cucurbit genomes. Lastly, a large set of 138,727 unique in silico SSR primer pairs were designed for bitter gourd. Among these, 71 primers were selected, all of which successfully amplified SSRs from the two bitter gourd lines ‘Dali-11’ and ‘K44’. To further examine the utilization of unique SSR primers, 21 SSR markers were used to genotype a collection of 211 bitter gourd lines from all over the world. A model-based clustering method and phylogenetic analysis indicated a clear separation among the geographic groups. The genomic SSR markers developed in this study have considerable potential value in advancing bitter gourd research.
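As a rough illustration of how SSR motifs can be located in a genome sequence, the sketch below scans for perfect tandem repeats with a regular expression. The function name `find_ssrs`, the motif length range (2 to 6 bp) and the minimum of four repeats are assumptions chosen for illustration; the abstract does not state the search criteria or the tool used in the study.

```python
import re

def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=4):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    The thresholds are illustrative defaults, not the study's settings.
    """
    seq = seq.upper()
    # Lazy motif capture so the shortest repeating unit is reported,
    # followed by at least (min_repeats - 1) further copies of it.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for m in pattern.finditer(seq):
        motif = m.group(1)
        hits.append((m.start(), motif, len(m.group(0)) // len(motif)))
    return hits

# Toy sequence with an (AT)6 and an (AGC)5 repeat.
example = "GGG" + "AT" * 6 + "TTT" + "AGC" * 5 + "T"
print(find_ssrs(example))
```

Real SSR surveys usually also handle mononucleotide runs, imperfect repeats and motif-specific thresholds, none of which this sketch attempts.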
Chen, Zhi-Teng; Wu, Hai-Yan; Du, Yu-Zhou
We report the nearly complete mitochondrial genome of a stonefly species, Styloperla sp. (Plecoptera: Styloperlidae), which is a circular molecule of 15,416 bp in length and consists of 13 protein-coding genes, 2 ribosomal RNAs, 20 transfer RNAs and a partial control region (645 bp). Using the 13 protein-coding genes of 8 stoneflies and 3 other related species, we constructed a phylogenetic tree to verify the accuracy of the newly determined mitogenome sequences. Our results provide basic data for further study of phylogeny in Plecoptera.
Bohlin, J; Snipen, L; Hardy, S.P.
Bacterial genomes possess varying GC content (total guanines (Gs) and cytosines (Cs) per total of the four bases within the genome) but within a given genome, GC content can vary locally along the chromosome, with some regions significantly more or less GC rich than on average. We have examined how the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity GCVAR, the intra-genomic GC content variability with respect to the average GC content... both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exists when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content...
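The excerpt does not give the formula for GCVAR, so the following is only a minimal sketch of the general idea of measuring local GC variability around the genome-wide mean. The function names, the non-overlapping 100 bp windows and the root-mean-square deviation are all assumptions made for illustration and should not be read as the authors' definition.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def gc_variability(genome, window=100):
    """Root-mean-square deviation of windowed GC from the genome mean.

    Hypothetical stand-in for a GC variability measure; window size and
    the RMS statistic are illustrative choices.
    """
    if len(genome) < window:
        raise ValueError("genome shorter than one window")
    mean_gc = gc_fraction(genome)
    # Non-overlapping windows; a trailing partial window is ignored.
    windows = [genome[i:i + window]
               for i in range(0, len(genome) - window + 1, window)]
    squared = [(gc_fraction(w) - mean_gc) ** 2 for w in windows]
    return (sum(squared) / len(squared)) ** 0.5

# A homogeneous toy genome versus one with a GC-rich and an AT-rich half.
print(gc_variability("ACGT" * 50))
print(gc_variability("GC" * 50 + "AT" * 50))
```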
Matthew S. Zinkgraf; K. Haiby; M.C. Lieberman; L. Comai; I.M. Henry; Andrew Groover
Establishing efficient functional genomic systems for creating and characterizing genetic variation in forest trees is challenging. Here we describe protocols for creating novel gene-dosage variation in Populus through gamma-irradiation of pollen, followed by genomic analysis to identify chromosomal regions that have been deleted or inserted in...
Buyuktimkin, Bora; Saier, Milton H
Select species of the bacterial genus Leptospira are causative agents of leptospirosis, an emerging global zoonosis affecting nearly one million people worldwide annually. We examined two Leptospira pathogens, Leptospira interrogans serovar Lai str. 56601 and Leptospira borgpetersenii serovar Hardjo-bovis str. L550, as well as the free-living leptospiral saprophyte, Leptospira biflexa serovar Patoc str. 'Patoc 1 (Ames)'. The transport proteins of these leptospires were identified and compared using bioinformatics to gain an appreciation for which proteins may be related to pathogenesis and saprophytism. L. biflexa possesses a disproportionately high number of secondary carriers for metabolite uptake and environmental adaptability as well as an increased number of inorganic cation transporters providing ionic homeostasis and effective osmoregulation in a rapidly changing environment. L. interrogans and L. borgpetersenii possess far fewer transporters, but those that they have are remarkably similar, with near-equivalent representation in most transporter families. These two Leptospira pathogens also possess intact sphingomyelinases, holins, and virulence-related outer membrane porins. These virulence-related factors, in conjunction with decreased transporter substrate versatility, indicate that pathogenicity was accompanied by progressively narrowing ecological niches and the emergence of a limited set of proteins responsible for host invasion. The variability of host tropism and mortality rates by infectious leptospires suggests that small differences in individual sets of proteins play important physiological and pathological roles. Copyright © 2015 Elsevier Ltd. All rights reserved.
Le Quéré, Antoine; Tak, Nisha; Gehlot, Hukam Singh; Lavire, Celine; Meyer, Thibault; Chapulliot, David; Rathi, Sonam; Sakrouhi, Ilham; Rocha, Guadalupe; Rohmer, Marine; Severac, Dany; Filali-Maltouf, Abdelkarim; Munive, Jose-Antonio
Nitrogen-fixing bacteria isolated from hot arid areas in Asia, Africa and America, but from diverse leguminous plants, have recently been identified as belonging to a possible new species of Ensifer (Sinorhizobium). In this study, 6 strains belonging to this new clade were compared with Ensifer species at the genome-wide level. Their capacities to utilize various carbon sources and to establish a symbiotic interaction with several leguminous plants were examined. Draft genomes of selected strains isolated from Morocco (Merzouga desert), Mexico (Baja California) as well as from India (Thar desert) were produced. Genome-based species delineation tools demonstrated that they belong to a new species of Ensifer. Comparison of its core genome with those of E. meliloti, E. medicae and E. fredii enabled the identification of a species-conserved gene set. Predicted functions of associated proteins and pathway reconstruction revealed notably the presence of transport systems for octopine/nopaline and inositol phosphates. Phenotypic characterization of this new desert rhizobium species showed that it was capable of utilizing malonate and of growing at 48 °C or under high pH, while NaCl tolerance levels were comparable to other Ensifer species. Analysis of accessory genomes and plasmid profiling demonstrated the presence of large plasmids that varied in size from strain to strain. As symbiotic functions were found in the accessory genomes, the differences in symbiotic interactions between strains may well be related to the difference in plasmid content, which could explain the different legumes with which they can develop the symbiosis. The genomic analysis performed here confirms that the selected rhizobial strains isolated from desert regions in three continents belong to a new species. As it has so far been recovered only from such harsh environments, we propose to name it Ensifer aridi. The presented genomic data offers a good basis to explore adaptations and functionalities that enable them
Hasbún, Rodrigo; Valledor, Luís; Rodríguez, José L; Santamaria, Estrella; Ríos, Darcy; Sanchez, Manuel; Cañal, María J; Rodríguez, Roberto
Quantification of deoxynucleosides using micellar high-performance capillary electrophoresis (HPCE) is an efficient, fast and inexpensive method for evaluating genomic DNA methylation. This approach has been demonstrated to be more sensitive and specific than other methods for the quantification of DNA methylation content. However, effective detection and quantification of 5-methyl-2'-deoxycytidine depend on the sample characteristics. Previous works have revealed that in most woody species, the quality and quantity of extracted RNA-free DNA suitable for analysis by HPCE vary among species of the same genus, among tissues taken from the same tree, and within the same tissue depending on the season of the year. The aim of this work is to establish a quantification method of genomic DNA methylation that lends itself to use in different Castanea sativa Mill. materials, and in other angiosperm and gymnosperm woody species. Using a DNA extraction kit based on a silica membrane has increased the resolving capacity of the method. Under these conditions, different organs or tissues of angiosperms and gymnosperms can be analyzed, regardless of their state of development. We emphasize the importance of samples free of nucleosides, although, in the contrary case, the method ensures the effective separation of deoxynucleosides and identification of 5-methyl-2'-deoxycytidine.
The genetic material of every cell in an organism is stored inside DNA in the form of genes, which together form the genome. The information stored in the DNA is translated to RNA and subsequently to proteins, which form complex biological systems. The availability of whole genome sequences has given rise to the parallel development of other high-throughput approaches such as determining mRNA expression level changes, gene-deletion phenotypes, chromosomal location of DNA binding proteins, cel...
Fu, Jianmin; Liu, Huimin; Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng
Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length, presenting a typical quadripartite structure with two inverted repeats each separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 single sequence repeats were identified. Four hypervariable regions, namely, intergenic region of trnQ_rps16, trnV_ndhC, and psbD_trnT, and intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is proposed for the first time. Further, these analyses together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.
Chai Zhifang; Mao Xueying; Wang Yuqi; Sun Jingxin; Qian Qingfang; Hou Xiaolin; Zhang Peiqun; Chen Chunying; Feng Weiyu; Ding Wenjun; Li Xiaolin; Li Chunsheng; Dai Xiongxin
Molecular Activation Analysis (MAA) mainly refers to an activation analysis method that is able to provide information about the chemical species of elements in systems of interest, although its exact definition has yet to be established. Its development is strongly stimulated by the urgent need to know the chemical species of elements, because the bulk contents or concentrations are often insufficient for judging the biological, environmental or geochemical effects of elements. In this paper, the features, methodology and limitations of MAA are outlined. Further, the up-to-date MAA progress made in our laboratory is introduced as well. (author)
Keeling Patrick J
Background. Dinoflagellates comprise an ecologically significant and diverse eukaryotic phylum that is sister to the phylum containing apicomplexan endoparasites. The mitochondrial genome of apicomplexans is uniquely reduced in gene content and size, encoding only three proteins and two ribosomal RNAs (rRNAs) within a highly compacted 6 kb DNA. Dinoflagellate mitochondrial genomes have been comparatively poorly studied: limited available data suggest some similarities with apicomplexan mitochondrial genomes but an even more radical type of genomic organization. Here, we investigate structure, content and expression of dinoflagellate mitochondrial genomes. Results. From two dinoflagellates, Crypthecodinium cohnii and Karlodinium micrum, we generated over 42 kb of mitochondrial genomic data that indicate a reduced gene content paralleling that of mitochondrial genomes in apicomplexans, i.e., only three protein-encoding genes and at least eight conserved components of the highly fragmented large and small subunit rRNAs. Unlike in apicomplexans, dinoflagellate mitochondrial genes occur in multiple copies, often as gene fragments, and in numerous genomic contexts. Analysis of cDNAs suggests several novel aspects of dinoflagellate mitochondrial gene expression. Polycistronic transcripts were found, standard start codons are absent, and oligoadenylation occurs upstream of stop codons, resulting in the absence of termination codons. Transcripts of at least one gene, cox3, are apparently trans-spliced to generate full-length mRNAs. RNA substitutional editing, a process previously identified for mRNAs in dinoflagellate mitochondria, is also implicated in rRNA expression. Conclusion. The dinoflagellate mitochondrial genome shares the same gene complement and fragmentation of rRNA genes with its apicomplexan counterpart. However, it also exhibits several unique characteristics. Most notable are the expansion of gene copy numbers and their arrangements
Meganathan, P R; Pagan, Heidi J T; McCulloch, Eve S; Stevens, Richard D; Ray, David A
Order Chiroptera is a unique group of mammals whose members have attained self-powered flight as their main mode of locomotion. Much speculation persists regarding bat evolution; however, lack of sufficient molecular data hampers evolutionary and conservation studies. Of ~1200 species, complete mitochondrial genome sequences are available for only eleven. Additional sequences should be generated if we are to resolve many questions concerning these fascinating mammals. Herein, we describe the complete mitochondrial genomes of three bats: Corynorhinus rafinesquii, Lasiurus borealis and Artibeus lituratus. We also compare the currently available mitochondrial genomes and analyze codon usage in Chiroptera. C. rafinesquii, L. borealis and A. lituratus mitochondrial genomes are 16438 bp, 17048 bp and 16709 bp, respectively. Genome organization and gene arrangements are similar to other bats. Phylogenetic analyses using complete mitochondrial genome sequences support previously established phylogenetic relationships and suggest utility in future studies focusing on the evolutionary aspects of these species. Comprehensive analyses of available bat mitochondrial genomes reveal distinct nucleotide patterns and synonymous codon preferences corresponding to different chiropteran families. These patterns suggest that mutational and selection forces are acting to different extents within Chiroptera and shape their mitochondrial genomes. Copyright © 2011 Elsevier B.V. All rights reserved.
Background. Despite the recent success of genome-wide association studies in identifying novel loci contributing effects to complex human traits, such as type 2 diabetes and obesity, much of the genetic component of variation in these phenotypes remains unexplained. One way to improve power to detect further novel loci is through meta-analysis of studies from the same population, increasing the sample size over any individual study. Although statistical software analysis packages incorporate routines for meta-analysis, they are ill equipped to meet the challenges of the scale and complexity of data generated in genome-wide association studies. Results. We have developed flexible, open-source software for the meta-analysis of genome-wide association studies. The software incorporates a variety of error trapping facilities, and provides a range of meta-analysis summary statistics. The software is distributed with scripts that allow simple formatting of files containing the results of each association study and generate graphical summaries of genome-wide meta-analysis results. Conclusions. The GWAMA (Genome-Wide Association Meta-Analysis) software has been developed to perform meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. Software with source files, documentation and example data files is freely available online at http://www.well.ox.ac.uk/GWAMA.
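To make the idea of combining per-study summary statistics concrete, here is a minimal sketch of a standard fixed-effects, inverse-variance weighted meta-analysis for a single variant. The abstract does not say which models the software implements, so this is a generic textbook calculation rather than a description of GWAMA itself; the function name and the input layout (a list of effect estimate and standard error pairs) are assumptions.

```python
import math

def inverse_variance_meta(effects):
    """Combine (beta, standard_error) pairs from several studies.

    Returns the pooled effect, its standard error and the z statistic
    under a fixed-effects model.
    """
    # Each study is weighted by the inverse of its sampling variance.
    weights = [1.0 / se ** 2 for _, se in effects]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, effects)) / total
    se = math.sqrt(1.0 / total)
    return beta, se, beta / se

# Two hypothetical studies reporting the same variant.
pooled_beta, pooled_se, z = inverse_variance_meta([(0.2, 0.1), (0.4, 0.1)])
print(pooled_beta, pooled_se, z)
```

The pooled standard error is smaller than that of either study, which is the gain in power the abstract refers to when sample sizes are combined.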
Kim, Hyun-Joong; Ryu, Ji-Oh; Song, Ji-Yeon; Kim, Hae-Yeong
In the detection of Shigella species using molecular biological methods, previously known genetic markers for Shigella species were not sufficient to discriminate between Shigella species and diarrheagenic Escherichia coli. The purposes of this study were to screen for genetic markers of the Shigella genus and four Shigella species through comparative genomics and develop a multiplex polymerase chain reaction (PCR) for the detection of shigellae and Shigella species. A total of seven genomic DNA sequences from Shigella species were subjected to comparative genomics for the screening of genetic markers of shigellae and each Shigella species. The primer sets were designed from the screened genetic markers and evaluated using PCR with genomic DNAs from Shigella and other bacterial strains in Enterobacteriaceae. A novel Shigella quintuplex PCR, designed for the detection of Shigella genus, S. dysenteriae, S. boydii, S. flexneri, and S. sonnei, was developed from the evaluated primer sets, and its performance was demonstrated with specifically amplified results from each Shigella species. This Shigella multiplex PCR is the first to be reported with novel genetic markers developed through comparative genomics and may be a useful tool for the accurate detection of the Shigella genus and species from closely related bacteria in clinical microbiology and food safety.
Yoo, Ran Hee; Lee, Seung-Won; Lim, Seungmo; Zhao, Fumei; Igori, Davaajargal; Baek, Dasom; Hong, Jin-Sung; Lee, Su-Heon; Moon, Jae Sun
Two novel viruses, isolated in Bonghwa, Republic of Korea, from an Ixeridium dentatum plant with yellowing mottle symptoms, have been provisionally named Ixeridium yellow mottle-associated virus 1 (IxYMaV-1) and Ixeridium yellow mottle-associated virus 2 (IxYMaV-2). IxYMaV-1 has a genome of 6,017 nucleotides sharing a 56.4% sequence identity with that of cucurbit aphid-borne yellows virus (genus Polerovirus). The IxYMaV-2 genome of 4,196 nucleotides has a sequence identity of less than 48.3% with other species classified within the genus Umbravirus. Genome properties and phylogenetic analysis suggested that IxYMaV-1 and -2 are representative isolates of new species classifiable within the genera Polerovirus and Umbravirus, respectively.
Price, Erin P; Sarovich, Derek S; Webb, Jessica R; Hall, Carina M; Jaramillo, Sierra A; Sahl, Jason W; Kaestli, Mirjam; Mayo, Mark; Harrington, Glenda; Baker, Anthony L; Sidak-Loftis, Lindsay C; Settles, Erik W; Lummis, Madeline; Schupp, James M; Gillece, John D; Tuanyok, Apichai; Warner, Jeffrey; Busch, Joseph D; Keim, Paul; Currie, Bart J; Wagner, David M
The bacterium Burkholderia ubonensis is commonly co-isolated from environmental specimens harbouring the melioidosis pathogen, Burkholderia pseudomallei. B. ubonensis has been reported in northern Australia and Thailand but not North America, suggesting similar geographic distribution to B. pseudomallei. Unlike most other Burkholderia cepacia complex (Bcc) species, B. ubonensis is considered non-pathogenic, although its virulence potential has not been tested. Antibiotic resistance in B. ubonensis, particularly towards drugs used to treat the most severe B. pseudomallei infections, has also been poorly characterised. This study examined the population biology of B. ubonensis, and includes the first reported isolates from the Caribbean. Phylogenomic analysis of 264 B. ubonensis genomes identified distinct clades that corresponded with geographic origin, similar to B. pseudomallei. A small proportion (4%) of strains lacked the 920kb chromosome III replicon, with discordance of presence/absence amongst genetically highly related strains, demonstrating that the third chromosome of B. ubonensis, like other Bcc species, probably encodes for a nonessential pC3 megaplasmid. Multilocus sequence typing using the B. pseudomallei scheme revealed that one-third of strains lack the "housekeeping" narK locus. In comparison, all strains could be genotyped using the Bcc scheme. Several strains possessed high-level meropenem resistance (≥32 μg/mL), a concern due to potential transmission of this phenotype to B. pseudomallei. In silico analysis uncovered a high degree of heterogeneity among the lipopolysaccharide O-antigen cluster loci, with at least 35 different variants identified. Finally, we show that Asian B. ubonensis isolate RF23-BP41 is avirulent in the BALB/c mouse model via a subcutaneous route of infection. Our results provide several new insights into the biology of this understudied species.
Lemos, Leandro N.; Pereira, Roberta V.; Quaggio, Ronaldo B.; Martins, Layla F.; Moura, Livia M. S.; da Silva, Amanda R.; Antunes, Luciana P.; da Silva, Aline M.; Setubal, João C.
Microbial consortia selected from complex lignocellulolytic microbial communities are promising alternatives to deconstruct plant waste, since synergistic action of different enzymes is required for full degradation of plant biomass in biorefining applications. Culture enrichment also facilitates the study of interactions among consortium members, and can be a good source of novel microbial species. Here, we used a sample from a plant waste composting operation in the São Paulo Zoo (Brazil) as inoculum to obtain a thermophilic aerobic consortium enriched through multiple passages at 60°C in carboxymethylcellulose as sole carbon source. The microbial community composition of this consortium was investigated by shotgun metagenomics and genome-centric analysis. Six near-complete (over 90%) genomes were reconstructed. Similarity and phylogenetic analyses show that four of these six genomes are novel, with the following hypothesized identifications: a new Thermobacillus species; the first Bacillus thermozeamaize genome (for which currently only 16S sequences are available) or else the first representative of a new family in the Bacillales order; the first representative of a new genus in the Paenibacillaceae family; and the first representative of a new deep-branching family in the Clostridia class. The reconstructed genomes from known species were identified as Geobacillus thermoglucosidasius and Caldibacillus debilis. COG and CAZy analyses of the metabolic potential of these recovered genomes show that they encode several glycoside hydrolases (GHs) as well as other genes related to lignocellulose breakdown. The new Thermobacillus species stands out for being the richest in diversity and abundance of GHs, possessing the greatest potential for biomass degradation among the six recovered genomes. We also investigated the presence and activity of the organisms corresponding to these genomes in the composting operation from which the consortium was built
Li, Shengbin; Li, Bo; Cheng, Cheng
sequences of multiple crested ibis individuals, its thriving co-habitant, the little egret, Egretta garzetta, and the recently sequenced genomes of 41 other avian species that are under various degrees of survival threats, including the bald eagle, we carry out comparative analyses for genomic signatures...
Over the past 30 years, genomic and bioinformatic analysis of human adenoviruses has been achieved using a variety of DNA sequencing methods, initially with the use of restriction enzymes and more recently with the use of the GS FLX pyrosequencing technology. Following the conception of DNA sequencing in the 1970s, analysis of adenoviruses has evolved from 100 base pair mRNA fragments to entire genomes. Comparative genomics of adenoviruses made its debut in 1984 when nucleotides and amino acids of coding sequences within the hexon genes of two human adenoviruses (HAdV), HAdV-C2 and HAdV-C5, were compared and analyzed. It was determined that there were three different zones (1-393, 394-1410, 1411-2910) within the hexon gene, of which HAdV-C2 and HAdV-C5 shared zones 1 and 3 with 95% and 89.5% nucleotide identity, respectively. In 1992, HAdV-C5 became the first adenovirus genome to be fully sequenced using the Sanger method. Over the next seven years, whole genome analysis and characterization was completed using bioinformatic tools such as blastn, tblastx, ClustalV and FASTA, in order to determine key proteins in species HAdV-A through HAdV-F. The bioinformatic revolution was initiated with the introduction of a novel species, HAdV-G, that was typed and named by the use of whole genome sequencing and phylogenetics as opposed to traditional serology. HAdV bioinformatics will continue to advance as the latest sequencing technology enables scientists to add to and expand the resource databases. As a result of these advancements, how novel HAdVs are typed has changed. Bioinformatic analysis has become the revolutionary tool that has significantly accelerated the in-depth study of HAdV microevolution through comparative genomics.
Background. The honey bee (Apis mellifera) is the most important pollinator in agriculture worldwide. However, the number of honey bees has fallen significantly since 2006, becoming a huge ecological problem nowadays. The principal cause is CCD, or Colony Collapse Disorder, characterized by the seemingly spontaneous abandonment of hives by their workers. One of the characteristics of CCD in honey bees is the alteration of the bacterial communities in their gastrointestinal tract, mainly due to the decrease of Firmicutes populations, such as the Lactobacilli. At this time, the causes of these alterations remain unknown. We recently isolated a strain of Lactobacillus kunkeei (L. kunkeei strain MP2) from the gut of Chilean honey bees. L. kunkeei is one of the most commonly isolated bacteria from the honey bee gut and is highly versatile in different ecological niches. In this study, we aimed to elucidate the genetic background of L. kunkeei in detail and perform a comparative genome analysis with other Lactobacillus species. Methods. L. kunkeei MP2 was originally isolated from the guts of Chilean A. mellifera individuals. Genome sequencing was done using Pacific Biosciences single-molecule real-time sequencing technology. De novo assembly was performed using Celera assembler. The genome was annotated using Prokka, and functional information was added using the EggNOG 3.1 database. In addition, genomic islands were predicted using IslandViewer, and pro-phage sequences using PHAST. Comparisons of L. kunkeei MP2 with other L. kunkeei and Lactobacillus strains were done using Roary. Results. The complete genome of L. kunkeei MP2 comprises one circular chromosome of 1,614,522 nt with a GC content of 36.9%. Pangenome analysis with 16 L. kunkeei strains identified 113 unique genes, most of them related to phage insertions. A large and unique region of L. kunkeei MP2 genome contains several genes that encode for phage structural protein and
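The notions of pan-genome, core genome and strain-unique genes used in such comparisons can be sketched with plain set operations. The example below is not how Roary works internally; it assumes gene families have already been clustered and are given as one set of family identifiers per strain. The function name, strain labels and gene identifiers are all hypothetical.

```python
def summarize_pangenome(gene_sets):
    """Compute pan-genome, core genome and per-strain unique genes.

    gene_sets maps a strain name to the set of gene-family identifiers
    found in that strain (assumed to be pre-clustered).
    """
    all_sets = list(gene_sets.values())
    pan = set().union(*all_sets)          # present in at least one strain
    core = set.intersection(*all_sets)    # present in every strain
    unique = {}
    for strain, genes in gene_sets.items():
        others = set().union(
            *(g for s, g in gene_sets.items() if s != strain)
        )
        unique[strain] = genes - others   # found in this strain only
    return pan, core, unique

# Hypothetical toy input with three strains.
toy = {
    "MP2": {"a", "b", "c", "phage1"},
    "S1": {"a", "b", "d"},
    "S2": {"a", "b", "c"},
}
pan, core, unique = summarize_pangenome(toy)
print(sorted(pan), sorted(core), unique)
```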
Feb 22, 2017 ... Genome-based exome-sequencing analysis identifies GYG1, DIS3L, DDRGK1 genes ... Cardiology Division, Department of Internal Medicine, Severance .... with p values of <0.05 by analyzing differences in allele distribution.
Dec 20, 2006 ... progestins, as well as lipids, cholesterol metabolites, and. Genome ... Gene structure analysis shows strong conservation of exon structures among orthologues. ..... earlier subfamily classification of NRs (Nuclear Receptors.
This issue collects the papers presented at the 25th NIRS symposium on Human, Mouse Genome Analysis and Radiation Biology. Fourteen of the presented papers are indexed individually. (J.P.N.)
Illa, Eudald; Sargent, Daniel J; Lopez Girona, Elena; Bushakra, Jill; Cestaro, Alessandro; Crowhurst, Ross; Pindo, Massimo; Cabrera, Antonio; van der Knaap, Esther; Iezzoni, Amy; Gardiner, Susan; Velasco, Riccardo; Arús, Pere; Chagné, David; Troggio, Michela
Comparative genome mapping studies in Rosaceae have been conducted until now by aligning genetic maps within the same genus, or closely related genera, and using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. The markers in common between Malus and Prunus and between Malus and Fragaria numbered 784 and 148, respectively. The correspondence between marker positions was high and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified between Malus and Prunus was much higher than in any reported genome comparison in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae.
Marcelletti, Simone; Scortichini, Marco
A total of 21 Xylella fastidiosa strains were assessed by comparing their genomes to infer their taxonomic relationships. Whole-genome-based average nucleotide identity and tetranucleotide frequency correlation coefficient analyses were performed. In addition, a consensus tree based on comparisons of 956 core gene families, a genome-wide phylogenetic tree and a Neighbor-net network were constructed with 820,088 nucleotides (i.e., approximately 30-33 % of the entire X. fastidiosa genome). All approaches revealed the occurrence of three well-demarcated genetic clusters that represent X. fastidiosa subspecies fastidiosa, multiplex and pauca, with the latter appearing to diverge. We suggest that the proposed but never formally described subspecies 'sandyi' and 'morus' are instead members of the subspecies fastidiosa. These analyses support the view that the Xylella strain isolated from Pyrus pyrifolia in Taiwan is likely to be a new species. A widely used multilocus sequence typing analysis yielded conflicting results.
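The tetranucleotide frequency correlation mentioned in this abstract can be illustrated with a small sketch. This is not the authors' pipeline: it is a simplified version that correlates raw 4-mer frequencies on one strand, whereas published tools typically correlate z-scores of over- and under-representation. The function names `tetra_freqs` and `pearson` are hypothetical.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]

def tetra_freqs(seq):
    """Relative frequency of each overlapping 4-mer (simplified: raw counts, one strand)."""
    seq = seq.upper()
    counts = dict.fromkeys(KMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skips windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no valid tetranucleotides")
    return [counts[k] / total for k in KMERS]

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

# Toy sequences standing in for two genomes (illustrative only).
genome_a = "ACGTTGCAAGGCTTAACCGG" * 3
genome_b = "ACGTTGCAAGGCTTAACCGA" * 3
similarity = pearson(tetra_freqs(genome_a), tetra_freqs(genome_b))
```

Values close to 1 indicate similar oligonucleotide usage; real analyses operate on whole genomes rather than toy strings.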
Zheng, Wenning; Tan, Tze King; Paterson, Ian C; Mutha, Naresh V R; Siow, Cheuk Chuen; Tan, Shi Yang; Old, Lesley A; Jakubovics, Nicholas S; Choo, Siew Woh
The oral streptococci are spherical Gram-positive bacteria categorized under the phylum Firmicutes which are among the most common causative agents of bacterial infective endocarditis (IE) and are also important agents in septicaemia in neutropenic patients. The Streptococcus mitis group is comprised of 13 species including some of the most common human oral colonizers such as S. mitis, S. oralis, S. sanguinis and S. gordonii as well as species such as S. tigurinus, S. oligofermentans and S. australis that have only recently been classified and are poorly understood at present. We present StreptoBase, which provides a specialized free resource focusing on the genomic analyses of oral species from the mitis group. It currently hosts 104 S. mitis group genomes including 27 novel mitis group strains that we sequenced using the high throughput Illumina HiSeq technology platform, and provides a comprehensive set of genome sequences for analyses, particularly comparative analyses and visualization of both cross-species and cross-strain characteristics of S. mitis group bacteria. StreptoBase incorporates sophisticated in-house designed bioinformatics web tools such as Pairwise Genome Comparison (PGC) tool and Pathogenomic Profiling Tool (PathoProT), which facilitate comparative pathogenomics analysis of Streptococcus strains. Examples are provided to demonstrate how StreptoBase can be employed to compare genome structure of different S. mitis group bacteria and putative virulence genes profile across multiple streptococcal strains. In conclusion, StreptoBase offers access to a range of streptococci genomic resources as well as analysis tools and will be an invaluable platform to accelerate research in streptococci. Database URL: http://streptococcus.um.edu.my.
It was also shown that the profile generated by taking all dinucleotides together ... Keywords. genome signature; DRAP; HIV-1; chaos game representation. Journal of .... be used to quantify low levels of variation as are observed within species ..... Dayton A.I., Sodroski J.G., Rosen C.A., Goh W.C. and Haseltine. W.A. 1986 ...
Wang, Hongxia; Walla, James A; Zhong, Shaobin; Huang, Danqiong; Dai, Wenhao
Chokecherry (Prunus virginiana L.) (2n = 4x = 32) is a unique Prunus species for both genetics and disease-resistance research due to its tetraploid nature and X-disease resistance. However, no genetic and genomic information on chokecherry is available. A partial chokecherry genome was sequenced using Roche 454 sequencing technology. A total of 145,094 reads covering 4.8 Mbp of the chokecherry genome were generated and 15,113 contigs were assembled, of which 11,675 contigs were larger than 100 bp in size. A total of 481 SSR loci were identified from 234 (out of 11,675) contigs and 246 polymerase chain reaction (PCR) primer pairs were designed. Of the 246 primers, 212 (86.2 %) effectively produced amplification from the genomic DNA of chokecherry. All 212 amplifiable chokecherry primers were used to amplify genomic DNA from 11 other rosaceous species (sour cherry, sweet cherry, black cherry, peach, apricot, plum, apple, crabapple, pear, juneberry, and raspberry). Thus, chokecherry SSR primers are transferable across Prunus species and other rosaceous species. An average of 63.2 and 58.7 % of amplifiable chokecherry primers amplified DNA from cherry and other Prunus species, respectively, while 47.2 % of amplifiable chokecherry primers amplified DNA from other rosaceous species. Using random genome sequence data generated from next-generation sequencing technology to identify microsatellite loci appears to be rapid and cost-efficient, particularly for species with no sequence information available. Sequence information and confirmed transferability of the identified chokecherry SSRs among species will be valuable for genetic research in Prunus and other rosaceous species. Key message: A total of 246 SSR primers were identified from chokecherry genome sequences, of which 212 were confirmed amplifiable in both chokecherry and 11 other rosaceous species.
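As a rough illustration of how SSR loci can be located in assembled contigs, the sketch below scans a sequence for perfect tandem repeats with a regular expression and recomputes the amplification rate quoted in the abstract. The abstract does not name the SSR-detection software, so the function `find_ssrs` and its thresholds (motifs of 2 to 6 bp, at least 5 repeats) are assumptions for illustration only.

```python
import re

def find_ssrs(seq, min_motif=2, max_motif=6, min_repeats=5):
    """Return (start, motif, repeat_count) for perfect tandem repeats.

    Thresholds are illustrative assumptions, not the study's settings.
    """
    seq = seq.upper()
    # A motif of min_motif..max_motif bases followed by at least
    # (min_repeats - 1) further exact copies of itself.
    pattern = re.compile(
        r"([ACGT]{%d,%d}?)\1{%d,}" % (min_motif, max_motif, min_repeats - 1)
    )
    hits = []
    for match in pattern.finditer(seq):
        motif = match.group(1)
        if len(set(motif)) == 1:
            continue  # skip homopolymer runs reported as longer motifs
        hits.append((match.start(), motif, len(match.group(0)) // len(motif)))
    return hits

# Toy contig with one (AC)6 microsatellite.
hits = find_ssrs("TTG" + "AC" * 6 + "GGT")

# Amplification rate from the abstract: 212 of 246 primer pairs.
amplification_rate = round(100 * 212 / 246, 1)
```

The same percentage arithmetic applies to the cross-species transferability figures, given the underlying primer counts.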
Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remains lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic function of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these
Yu, Jihyun; Ahn, Sojin; Kim, Kwondo; Caetano-Anolles, Kelsey; Lee, Chanho; Kang, Jungsun; Cho, Kyungjin; Yoon, Sook Hee; Kang, Dae-Kyung; Kim, Heebal
As probiotics play an important role in maintaining a healthy gut flora environment through antitoxin activity and inhibition of pathogen colonization, they have long been of interest to the medical research community. Probiotic bacteria such as Lactobacillus plantarum, which can be found in fermented food, are of particular interest given their easy accessibility. We performed whole-genome sequencing and genomic analysis on a GB-LP1 strain of L. plantarum isolated from Korean traditional fermented food; this strain is well known for its functions in immune response, suppression of pathogen growth, and antitoxin effects. The complete genome sequence of GB-LP1 is a single chromosome of 3,040,388 bp with 2,899 predicted open reading frames. Genomic analysis of GB-LP1 revealed two CRISPR regions and genes showing accelerated evolution, which may have antibiotic and antitoxin functions. The aim of the present study was to predict strain-specific genomic characteristics and assess the potential of this new strain as lactic acid bacteria at the genomic level using in silico analysis. These results provide insight into the L. plantarum species as well as confirm the possibility of its utility as a candidate probiotic.
Liu, Guoqiang; Kong, Yingying; Fan, Yajing; Geng, Ce; Peng, Donghai; Sun, Ming
The data presented in this article are related to the published article entitled "Whole-genome sequencing of Bacillus velezensis LS69, a strain with a broad inhibitory spectrum against pathogenic bacteria" (Liu et al., 2017). Genome analysis revealed that B. velezensis LS69 has good potential for biocontrol and plant growth promotion. This article provides an extended analysis of the genetic islands, core genes and amylolysin loci of B. velezensis LS69.
Permutation Multivariate Analysis of Variance (PerMANOVA). We used PerMANOVA to test the null-hypothesis of no... permutation-based version of the multivariate analysis of variance (MANOVA). PerMANOVA uses the distances between samples to partition variance and...coli. Antibiotics, bacteria, community analysis, diabetes, pyrosequencing, wound, wound therapy, 16S rRNA gene Genomic Analysis of Complex
Joseph, Sandeep J.; Marti, Hanna; Didelot, Xavier; Castillo-Ramirez, Santiago; Read, Timothy D.; Dean, Deborah
Chlamydiaceae are obligate intracellular bacteria that cause a diversity of severe infections among humans and livestock on a global scale. Identification of new species since 1989 and emergence of zoonotic infections, including abortion in women, underscore the need for genome sequencing of multiple strains of each species to advance our knowledge of evolutionary dynamics across Chlamydiaceae. Here, we genome sequenced isolates from avian, lower mammalian and human hosts. Based on core gene ...
that may help to explain why one species is pathogenic and the other is considered to be commensal. However, it was not possible to identify specific virulence determinant factors that could explain the differences in the pathogenicity of the analyzed species. The M. hyorhinis genome contains differences in some components involved in metabolism and evasion of the host’s immune system that may contribute to its growth aggressiveness. Several horizontal gene transfer events were identified. The phylogenomic analysis places M. hyopneumoniae, M. flocculare and M. hyorhinis in the hyopneumoniae clade. PMID:23497205
Mascagni, Flavia; Giordani, Tommaso; Ceccarelli, Marilena; Cavallini, Andrea; Natali, Lucia
Genome divergence by mobile element activity and recombination is a continuous process that plays a key role in the evolution of species. Nevertheless, knowledge of retrotransposon-related variability among species belonging to the same genus is still limited. Considering the importance of the genus Helianthus, a model system for studying the ecological genetics of speciation and adaptation, we performed a comparative analysis of the repetitive genome fraction across ten species and one subspecies of sunflower, focusing on long terminal repeat retrotransposons at superfamily, lineage and sublineage levels. After determining the relative genome size of each species, genomic DNA was isolated and subjected to Illumina sequencing. Then, different assembling and clustering approaches allowed us to explore the repetitive component of all genomes. On average, repetitive DNA in Helianthus species represented more than 75% of the genome and was composed mostly of long terminal repeat retrotransposons. Also, the prevalence of the Gypsy over the Copia superfamily was observed and, among lineages, Chromovirus was by far the most represented. Although nearly all the same sublineages are present in all species, we found considerable variability in the abundance of diverse retrotransposon lineages and sublineages, especially between annual and perennial species. This large variability suggests that different events of amplification or loss related to these elements occurred following species separation and may have been involved in species differentiation. Our data allowed us to infer the extent of interspecific repetitive DNA variation related to LTR-RE abundance, to investigate the relationship between changes of LTR-RE abundance and the evolution of the genus, and to determine the degree of coevolution of different LTR-RE lineages or sublineages between and within species. Moreover, the data suggested that LTR-RE abundance in a species was affected by the annual or perennial
Ziya Motalebipour, Elmira; Kafkas, Salih; Khodaeiaminjan, Mortaza; Çoban, Nergiz; Gözel, Hatice
Pistachio (Pistacia vera L.) is one of the most important nut crops in the world. There are about 11 wild species in the genus Pistacia, and they are important as rootstock seed sources for cultivated P. vera and as forest trees. Published information on the pistachio genome is limited. Therefore, a genome survey by next generation sequencing is necessary to obtain knowledge of the genome structure of pistachio. Simple sequence repeat (SSR) markers are useful tools for germplasm characterization, genetic diversity analysis, and genetic linkage mapping, and may help to elucidate genetic relationships among pistachio cultivars and species. To explore the genome structure of pistachio, a genome survey was performed using the Illumina platform at approximately 40× coverage depth in the P. vera cv. Siirt. The K-mer analysis indicated that pistachio has a genome that is about 600 Mb in size and is highly heterozygous. The assembly of 26.77 Gb Illumina data produced 27,069 scaffolds at N50 = 3.4 kb with a total of 513.5 Mb. A total of 59,280 SSR motifs were detected with a frequency of one per 8.67 kb. A total of 206 SSRs were used to characterize 24 P. vera cultivars and 20 wild Pistacia genotypes (four genotypes from each of five wild Pistacia species) belonging to P. atlantica, P. integerrima, P. chinensis, P. terebinthus, and P. lentiscus. Overall, 135 SSR loci amplified in all 44 cultivars and genotypes, and 41 were polymorphic in six Pistacia species. The novel SSR loci developed from cultivated pistachio were highly transferable to wild Pistacia species. The results from a genome survey of pistachio suggest that the genome size of pistachio is about 600 Mb with a high heterozygosity rate. This information will help to design whole genome sequencing strategies for pistachio. The novel polymorphic SSRs developed in this study may help germplasm characterization, genetic diversity, and genetic linkage mapping studies in the genus Pistacia.
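The N50 statistic quoted for this assembly can be computed from a list of scaffold lengths. The sketch below uses the common definition (the length of the shortest scaffold in the smallest set of longest scaffolds that together cover at least half of the total assembly); the function name `n50` and the toy lengths are illustrative assumptions, not data from the study.

```python
def n50(lengths):
    """Length L such that scaffolds of length >= L cover at least half the assembly."""
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold downwards until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("no scaffold lengths supplied")

# Toy example: total = 20, the two longest scaffolds (6 + 5) cover >= 10.
toy_n50 = n50([2, 3, 4, 5, 6])
```

A low N50 relative to the estimated genome size, as reported here, indicates a fragmented survey assembly rather than a finished genome.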
Hiraki, Hideaki; Kagoshima, Hiroshi; Kraus, Christopher; Schiffer, Philipp H; Ueta, Yumiko; Kroiher, Michael; Schierenberg, Einhard; Kohara, Yuji
Sexual reproduction involving the fusion of egg and sperm prevails among eukaryotes. In contrast, the nematode Diploscapter coronatus, a close relative of the model Caenorhabditis elegans, reproduces parthenogenetically. Neither males nor sperm have been observed and some steps of meiosis are apparently skipped in this species. To uncover the genomic changes associated with the evolution of parthenogenesis in this nematode, we carried out a genome analysis. We obtained a 170 Mbp draft genome in only 511 scaffolds with an N50 length of 1 Mbp. Nearly 90% of these scaffolds constitute homologous pairs with a 5.7% heterozygosity on average and inversions and translocations, meaning that the 170 Mbp sequences correspond to the diploid genome. Fluorescent staining shows that the D. coronatus genome consists of two chromosomes (2n = 2). In our genome annotation, we found orthologs of 59% of the C. elegans genes. However, a number of genes were missing or very divergent. These include genes involved in sex determination (e.g. xol-1, tra-2) and meiosis (e.g. the kleisins rec-8 and coh-3/4), giving a possible explanation for the absence of males and the second meiotic division. The high degree of heterozygosity allowed us to analyze the expression level of individual alleles. Most of the homologous pairs show very similar expression levels but others exhibit a 2-5-fold difference. Our high-quality draft genome of D. coronatus reveals the peculiarities of the genome of parthenogenesis and provides some clues to the genetic basis for parthenogenetic reproduction. This draft genome should be the basis to elucidate fundamental questions related to parthenogenesis such as its origin and mechanisms through comparative analyses with other nematodes. Furthermore, being the closest outgroup to the genus Caenorhabditis, the draft genome will help to disclose many idiosyncrasies of the model C. elegans and its congeners in future studies.
Ding, Mingquan; Chen, Jiadong; Jiang, Yurong; Lin, Lifeng; Cao, YueFen; Wang, Minhua; Zhang, Yuting; Rong, Junkang; Ye, Wuwei
WRKY transcription factors play important roles in various stress responses in diverse plant species. In cotton, this family has not been well studied, especially in relation to fiber development. Here, the genomes and transcriptomes of Gossypium raimondii and Gossypium arboreum were investigated to identify fiber development related WRKY genes. This represents the first comprehensive comparative study of WRKY transcription factors in both diploid A and D cotton species. In total, 112 G. raimondii and 109 G. arboreum WRKY genes were identified. No significant gene structure or domain alterations were detected between the two species, but many SNPs were distributed unequally in exon and intron regions. Physical mapping revealed that the WRKY genes in G. arboreum were not located in the corresponding chromosomes of G. raimondii, suggesting great chromosome rearrangement in the diploid cotton genomes. The cotton WRKY genes, especially subgroups I and II, have expanded through multiple whole genome duplications and tandem duplications compared with other plant species. Sequence comparison showed many functionally divergent sites between WRKY subgroups, while the genes within each group are under strong purifying selection. Transcriptome analysis suggested that many WRKY genes participate in specific fiber development processes such as fiber initiation, elongation and maturation, with different expression patterns between species. Complex WRKY gene expression, such as differential Dt and At allelic gene expression in G. hirsutum and alternative splicing events, was also observed in both diploid and tetraploid cottons during the fiber development process. In conclusion, this study provides important information on the evolution and function of the WRKY gene family in cotton species.
Studholme, D.J.; McDougal, R.L.; Sambles, C.; Hansen, E.; Hardy, G.; Grant, M.; Ganley, R.J.; Williams, N.M.
In New Zealand there has been a long association of Phytophthora diseases with forests, nurseries, remnant plantings and horticultural crops. However, new Phytophthora diseases of trees have recently emerged. Genome sequencing has been performed for 12 Phytophthora isolates, from six species: Phytophthora pluvialis, Phytophthora kernoviae, Phytophthora cinnamomi, Phytophthora agathidicida, Phytophthora multivora and Phytophthora taxon Totara. These sequences will enable comparative analyses to identify potential virulence strategies and ultimately facilitate better control strategies. These Whole Genome Shotgun data have been deposited in DDBJ/ENA/GenBank under the accession numbers LGTT00000000, LGTU00000000, JPWV00000000, JPWU00000000, LGSK00000000, LGSJ00000000, LGTR00000000, LGTS00000000, LGSM00000000, LGSL00000000, LGSO00000000, and LGSN00000000. PMID:26981359
Johnson, Robert A.; Wright, Karen D.; Poppleton, Helen; Mohankumar, Kumarasamypet M.; Finkelstein, David; Pounds, Stanley B.; Rand, Vikki; Leary, Sarah E.S.; White, Elsie; Eden, Christopher; Hogg, Twala; Northcott, Paul; Mack, Stephen; Neale, Geoffrey; Wang, Yong-Dong; Coyle, Beth; Atkinson, Jennifer; DeWire, Mariko; Kranenburg, Tanya A.; Gillespie, Yancey; Allen, Jeffrey C.; Merchant, Thomas; Boop, Fredrick A.; Sanford, Robert. A.; Gajjar, Amar; Ellison, David W.; Taylor, Michael D.; Grundy, Richard G.; Gilbertson, Richard J.
Understanding the biology that underlies histologically similar but molecularly distinct subgroups of cancer has proven difficult since their defining genetic alterations are often numerous, and the cellular origins of most cancers remain unknown. We sought to decipher this heterogeneity by integrating matched genetic alterations and candidate cells of origin to generate accurate disease models. First, we identified subgroups of human ependymoma, a form of neural tumor that arises throughout the central nervous system (CNS). Subgroup specific alterations included amplifications and homozygous deletions of genes not yet implicated in ependymoma. To select cellular compartments most likely to give rise to subgroups of ependymoma, we matched the transcriptomes of human tumors to those of mouse neural stem cells (NSCs), isolated from different regions of the CNS at different developmental stages, with an intact or deleted Ink4a/Arf locus. The transcriptome of human cerebral ependymomas with amplified EPHB2 and deleted INK4A/ARF matched only that of embryonic cerebral Ink4a/Arf−/− NSCs. Remarkably, activation of Ephb2 signaling in these, but not other NSCs, generated the first mouse model of ependymoma, which is highly penetrant and accurately models the histology and transcriptome of one subgroup of human cerebral tumor. Further comparative analysis of matched mouse and human tumors revealed selective deregulation in the expression and copy number of genes that control synaptogenesis, pinpointing disruption of this pathway as a critical event in the production of this ependymoma subgroup. Our data demonstrate the power of cross-species genomics to meticulously match subgroup specific driver mutations with cellular compartments to model and interrogate cancer subgroups. PMID:20639864
Helena Sanches Marcon
Endogenous viral elements (EVEs) are the result of heritable horizontal gene transfer from viruses to hosts. In the last years, several EVE integration events were reported in plants owing to the exponential availability of sequenced genomes. Eucalyptus grandis is a forest tree species with a sequenced genome that is poorly studied in terms of evolution and mobile genetic elements composition. Here we report the characterization of E. grandis endogenous viral element 1 (EgEVE_1), a transcriptionally active EVE with a size of 5,664 bp. Phylogenetic analysis and genomic distribution demonstrated that EgEVE_1 is a newly described member of the Caulimoviridae family, distinct from the recently characterized plant Florendoviruses. Genomic distribution of EgEVE_1 and Florendovirus is also distinct. EgEVE_1 qPCR quantification in Eucalyptus urophylla suggests that this genome has more EgEVE_1 copies than E. grandis. EgEVE_1 transcriptional activity was demonstrated by RT-qPCR in five Eucalyptus species and one intrageneric hybrid. We also identified that Eucalyptus EVEs can generate small RNAs (sRNAs) that might be involved in de novo DNA methylation and virus resistance. Our data suggest that EVE families in Eucalyptus have distinct properties, and we provide the first comparative analysis of EVEs in Eucalyptus genomes.
agricultural and biological importance. Its capacity to form symbiotic relationships with rhizobia and mycorrhizal fungi has fascinated researchers for years. Lotus has a small genome of approximately 470 Mb and a short life cycle of 2 to 3 months, which has made Lotus a model legume plant for many molecular...
Homologous sequences were used in the definition of synteny relationships and subsequent identification of the shared disease response genes. The homologous genes within the human genome were then identified and aligned to the bovine radiation hybrid map in order to identify the mouse/bovine homologous regions.
Sahl, Jason W.; Gillece, John D.; Schupp, James M.; Waddell, Victor G.; Driebe, Elizabeth M.; Engelthaler, David M.; Keim, Paul
Acinetobacter baumannii is an emergent and global nosocomial pathogen. In addition to A. baumannii, other Acinetobacter species, especially those in the Acinetobacter calcoaceticus-baumannii (Acb) complex, have also been associated with serious human infection. Although mechanisms of attachment, persistence on abiotic surfaces, and pathogenesis in A. baumannii have been identified, the genetic mechanisms that explain the emergence of A. baumannii as the most widespread and virulent Acinetobacter species are not fully understood. Recent whole genome sequencing has provided insight into the phylogenetic structure of the genus Acinetobacter. However, a global comparison of genomic features between Acinetobacter spp. has not been described in the literature. In this study, 136 Acinetobacter genomes, including 67 sequenced in this study, were compared to identify the acquisition and loss of genes in the expansion of the Acinetobacter genus. A whole genome phylogeny confirmed that A. baumannii is a monophyletic clade and that the larger Acb complex is also a well-supported monophyletic group. The whole genome phylogeny provided the framework for a global genomic comparison based on a blast score ratio (BSR) analysis. The BSR analysis demonstrated that specific genes have been both lost and acquired in the evolution of A. baumannii. In addition, several genes associated with A. baumannii pathogenesis were found to be more conserved in the Acb complex, and especially in A. baumannii, than in other Acinetobacter genomes; until recently, a global analysis of the distribution and conservation of virulence factors across the genus was not possible. The results demonstrate that the acquisition of specific virulence factors has likely contributed to the widespread persistence and virulence of A. baumannii. The identification of novel features associated with transcriptional regulation and acquired by clades in the Acb complex presents targets for better understanding the
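The blast score ratio (BSR) analysis named in this abstract is commonly defined as the bit score of a gene's best hit in a target genome divided by the bit score of the gene aligned against itself, giving a value between 0 (absent) and 1 (identical). The sketch below shows only that arithmetic on made-up scores. The function name `bsr_matrix`, the gene and genome labels, and the 0.8 conservation cutoff are hypothetical and not taken from the study.

```python
def bsr_matrix(self_scores, hit_scores):
    """Compute BSR per genome and gene.

    self_scores: gene -> bit score of the gene aligned to itself
    hit_scores:  genome -> {gene: best bit score of that gene in the genome}
    A gene with no hit in a genome is given a BSR of 0.0.
    """
    matrix = {}
    for genome, hits in hit_scores.items():
        matrix[genome] = {
            gene: hits.get(gene, 0.0) / self_score
            for gene, self_score in self_scores.items()
        }
    return matrix

# Made-up bit scores for two genes in two genomes (illustrative only).
self_scores = {"geneA": 200.0, "geneB": 100.0}
hit_scores = {
    "genome1": {"geneA": 190.0, "geneB": 30.0},
    "genome2": {"geneA": 200.0},  # geneB has no hit in genome2
}
bsr = bsr_matrix(self_scores, hit_scores)

# Assumed cutoff: treat BSR >= 0.8 as "conserved" in a genome.
conserved_everywhere = [
    gene for gene in self_scores
    if all(bsr[genome][gene] >= 0.8 for genome in bsr)
]
```

Comparing such a matrix across clades of a phylogeny is one way gene gain and loss of the kind described above could be tabulated.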
PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods. Data name: Genome analysis methods. DOI: 10.18908/lsdba.nbdc01194-01-005. Description of data contents: The current status and related information of the genomic analysis about each organism (March, 2014). In the case of organisms carried out genomic analysis, the d...e File name: pgdbj_dna_marker_linkage_map_genome_analysis_methods_en.zip File URL: ftp://ftp.biosciencedbc.j
De Oliveira Martins, Leonardo; Mallo, Diego; Posada, David
Current phylogenomic data sets highlight the need for species tree methods able to deal with several sources of gene tree/species tree incongruence. At the same time, we need to make most use of all available data. Most species tree methods deal with single processes of phylogenetic discordance, namely, gene duplication and loss, incomplete lineage sorting (ILS) or horizontal gene transfer. In this manuscript, we address the problem of species tree inference from multilocus, genome-wide data sets regardless of the presence of gene duplication and loss and ILS therefore without the need to identify orthologs or to use a single individual per species. We do this by extending the idea of Maximum Likelihood (ML) supertrees to a hierarchical Bayesian model where several sources of gene tree/species tree disagreement can be accounted for in a modular manner. We implemented this model in a computer program called guenomu whose inputs are posterior distributions of unrooted gene tree topologies for multiple gene families, and whose output is the posterior distribution of rooted species tree topologies. We conducted extensive simulations to evaluate the performance of our approach in comparison with other species tree approaches able to deal with more than one leaf from the same species. Our method ranked best under simulated data sets, in spite of ignoring branch lengths, and performed well on empirical data, as well as being fast enough to analyze relatively large data sets. Our Bayesian supertree method was also very successful in obtaining better estimates of gene trees, by reducing the uncertainty in their distributions. In addition, our results show that under complex simulation scenarios, gene tree parsimony is also a competitive approach once we consider its speed, in contrast to more sophisticated models. PMID:25281847
Settepani, V; Schou, M F; Greve, M; Grinsted, L; Bechsgaard, J; Bilde, T
Across several animal taxa, the evolution of sociality involves a suite of characteristics, a "social syndrome," that includes cooperative breeding, reproductive skew, primary female-biased sex ratio, and the transition from outcrossing to inbreeding mating system, factors that are expected to reduce effective population size (Ne). This social syndrome may be favoured by short-term benefits but come with long-term costs, because the reduction in Ne amplifies loss of genetic diversity by genetic drift, ultimately restricting the potential of populations to respond to environmental change. To investigate the consequences of this social life form on genetic diversity, we used a comparative RAD-sequencing approach to estimate genomewide diversity in spider species that differ in level of sociality, reproductive skew and mating system. We analysed multiple populations of three independent sister-species pairs of social inbreeding and subsocial outcrossing Stegodyphus spiders, and a subsocial outgroup. Heterozygosity and within-population diversity were sixfold to 10-fold lower in social compared to subsocial species, and demographic modelling revealed a tenfold reduction in Ne of social populations. Species-wide genetic diversity depends on population divergence and the viability of genetic lineages. Population genomic patterns were consistent with high lineage turnover, which homogenizes the genetic structure that builds up between inbreeding populations, ultimately depleting genetic diversity at the species level. Indeed, species-wide genetic diversity of social species was 5-8 times lower than that of subsocial species. The repeated evolution of species with this social syndrome is associated with severe loss of genomewide diversity, likely to limit their evolutionary potential. © 2017 John Wiley & Sons Ltd.
Serres, Margrethe H.
The bacterial genus Shewanella includes a group of highly versatile organisms that have successfully adapted to life in many environments ranging from aquatic (fresh and marine) to sedimentary (lake and marine sediments, subsurface sediments, sea vent). A unique respiratory capability of the Shewanellas, initially observed for Shewanella oneidensis MR-1, is the ability to use metals and metalloids, including radioactive compounds, as electron acceptors. Members of the Shewanella genus have also been shown to degrade environmental pollutants such as halogenated compounds, making this group highly applicable for the DOE mission. S. oneidensis MR-1 has in addition been found to utilize a diverse set of nutrients and to have a large set of genes dedicated to regulation and to sensing of the environment. The sequencing of the S. oneidensis MR-1 genome facilitated experimental and bioinformatics analyses by a group of collaborating researchers, the Shewanella Federation. Through the joint effort and with support from the Department of Energy, S. oneidensis MR-1 has become a model organism of study. Our work has been a functional analysis of S. oneidensis MR-1, both by itself and as part of a comparative study. We have improved the annotation of gene products, assigned metabolic functions, and analyzed protein families present in S. oneidensis MR-1. The data have been applied to analysis of experimental data (i.e. gene expression, proteome) generated for S. oneidensis MR-1. Further, this work has formed the basis for a comparative study of over 20 members of the Shewanella genus. The species and strains selected for genome sequencing represented an evolutionary gradient of DNA relatedness, ranging from close to intermediate, and to distant. The organisms selected have also adapted to a variety of ecological niches. Through our work we have been able to detect and interpret genome similarities and differences between members of the genus. We have in this way contributed to the
Nepal, Madhav P; Andersen, Ethan J; Neupane, Surendra; Benson, Benjamin V
Disease resistance genes (R genes), as part of the plant defense system, have coevolved with corresponding pathogen molecules. The main objectives of this project were to identify non-Toll interleukin receptor, nucleotide-binding site, leucine-rich repeat (nTNL) genes and elucidate their evolutionary divergence across six plant genomes. Using reference sequences from Arabidopsis, we investigated nTNL orthologs in the genomes of common bean, Medicago, soybean, poplar, and rice. We used Hidden Markov Models for sequence identification, performed model-based phylogenetic analyses, visualized chromosomal positioning, inferred gene clustering, and assessed gene expression profiles. We analyzed 908 nTNL R genes in the genomes of the six plant species, and classified them into 12 subgroups based on the presence of coiled-coil (CC), nucleotide binding site (NBS), leucine rich repeat (LRR), resistance to Powdery mildew 8 (RPW8), and BED type zinc finger domains. Traditionally classified CC-NBS-LRR (CNL) genes were nested into four clades (CNL A-D) often with abundant, well-supported homogeneous subclades of Type-II R genes. CNL-D members were absent in rice, indicating a unique R gene retention pattern in the rice genome. Genomes from Arabidopsis, common bean, poplar and soybean had one chromosome without any CNL R genes. Medicago and Arabidopsis had the highest and lowest number of gene clusters, respectively. Gene expression analyses suggested unique patterns of expression for each of the CNL clades. Differential gene expression patterns of the nTNL genes were often found to correlate with number of introns and GC content, suggesting structural and functional divergence.
Zhang, Weipeng; Tian, Ren-Mao; Sun, Jin; Bougouffa, Salim; Ding, Wei; Cai, Lin; Lan, Yi; Tong, Haoya; Li, Yongxin; Jamieson, Alan J; Bajic, Vladimir B; Drazen, Jeffrey C; Bartlett, Douglas; Qian, Pei-Yuan
Amphipods are the dominant scavenging metazoan species in the Mariana Trench, the deepest known point in Earth's oceans. Here the gut microbiota of the amphipod Hirondellea gigas collected from the Challenger and Sirena Deeps of the Mariana Trench were investigated. The 11 amphipod individuals included for analyses were dominated by Psychromonas, of which a nearly complete genome was successfully recovered (designated CDP1). Compared with previously reported free-living Psychromonas strains, CDP1 has a highly reduced genome. Genome alignment showed deletion of the trimethylamine N-oxide (TMAO) reducing gene cluster in CDP1, suggesting that the "piezolyte" function of TMAO is more important than its function in respiration, which may lead to TMAO accumulation. In terms of nutrient utilization, the bacterium retains its central carbohydrate metabolism but lacks most of the extended carbohydrate utilization pathways, suggesting the confinement of Psychromonas to the host gut and sequestration from more variable environmental conditions. Moreover, CDP1 contains a complete formate hydrogenlyase complex, which might be involved in energy production. The genomic analyses imply that CDP1 may have developed adaptive strategies for a lifestyle within the gut of the hadal amphipod H. gigas. IMPORTANCE As a unique but poorly investigated habitat within marine ecosystems, hadal trenches have received interest in recent years. This study explores the gut microbial composition and function in hadal amphipods, which are among the dominant carrion feeders in hadal habitats. Further analyses of a dominant strain revealed genomic features that may contribute to its adaptation to the amphipod gut environment. Our findings provide new insights into animal-associated bacteria in the hadal biosphere.
Fang, Shu; Yukilevich, Roman; Chen, Ying; Turissini, David A; Zeng, Kai; Boussy, Ian A; Wu, Chung-I
The extent and nature of genetic incompatibilities between incipient races and sibling species is of fundamental importance to our view of speciation. However, with the exception of hybrid inviability and sterility factors, little is known about the extent of other, more subtle genetic incompatibilities between incipient species. Here we experimentally demonstrate the prevalence of such genetic incompatibilities between two young allopatric sibling species, Drosophila simulans and D. sechellia. Our experiments took advantage of 12 introgression lines that carried random introgressed D. sechellia segments in different parts of the D. simulans genome. First, we found that these introgression lines did not show any measurable sterility or inviability effects. To study if these sechellia introgressions in a simulans background contained other fitness consequences, we competed and genetically tracked the marked alleles within each introgression against the wild-type alleles for 20 generations. Strikingly, all marked D. sechellia introgression alleles rapidly decreased in frequency in only 6 to 7 generations. We then developed computer simulations to model our competition results. These simulations indicated that selection against D. sechellia introgression alleles was high (average s = 0.43) and that the marker alleles and the incompatible alleles did not separate in 78% of the introgressions. The latter result likely implies that most introgressions contain multiple genetic incompatibilities. Thus, this study reveals that, even at early stages of speciation, many parts of the genome diverge to a point where introducing foreign elements has detrimental fitness consequences, but which cannot be seen using standard sterility and inviability assays.
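As a rough illustration of the selection strength reported in the preceding abstract, the following Python sketch tracks an allele frequency under a deliberately simplified, deterministic haploid selection model. It is not the authors' simulation: the function name `allele_trajectory`, the starting frequency of 0.5, and the haploid recursion are assumptions made for illustration, and only s = 0.43 is taken from the abstract.

```python
def allele_trajectory(p0, s, generations):
    """Return allele frequencies over time under simple haploid selection.

    Each generation the selected allele has relative fitness 1 - s,
    so the recursion is p' = p * (1 - s) / (1 - s * p).
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        p = p * (1.0 - s) / (1.0 - s * p)
        freqs.append(p)
    return freqs


# Assumed starting frequency of 0.5; s = 0.43 is the average reported above.
trajectory = allele_trajectory(p0=0.5, s=0.43, generations=7)
for generation, freq in enumerate(trajectory):
    print(f"generation {generation}: frequency {freq:.3f}")
```

Under these assumptions the allele drops below 5% within seven generations, which is qualitatively in line with the rapid decline described in the abstract. A model with diploid genotypes, drift, and recombination between marker and incompatibility loci would be needed to reproduce the published analysis.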
Szczecińska, Monika; Sawicki, Jakub
The European continent is presently colonized by nine species of the genus Pulsatilla, five of which are encountered only in mountainous regions of southwest and south-central Europe. The remaining four species inhabit lowlands in the north-central and eastern parts of the continent. Most plants of the genus Pulsatilla are rare and endangered, which is why most research efforts have focused on their biology, ecology and hybridization. The objective of this study was to develop genomic resources, including complete plastid genomes and nuclear rRNA clusters, for three sympatric Pulsatilla species that are most commonly found in Central Europe. The results will supply valuable information about genetic variation, which can be used in the process of designing primers for population studies and conservation genetics research. The complete plastid genomes together with the nuclear rRNA cluster can serve as a useful tool in hybridization studies. Six complete plastid genomes and nuclear rRNA clusters were sequenced from three species of Pulsatilla using the Illumina sequencing technology. Four junctions between single copy regions and inverted repeats and junctions between the identified locally-collinear blocks (LCB) were confirmed by Sanger sequencing. The Pulsatilla plastid genomes had a total length of approximately 161-162 kb and contained 120 unique genes, 21 of which were duplicated in the inverted repeats (IR) region. Comparison of the plastid genomes of newly-sequenced Pulsatilla with the previously-identified plastomes of Aconitum and Ranunculus species belonging to the family Ranunculaceae revealed several variations in the structure of the genome, but the gene content remained constant. The nuclear rRNA cluster (18S-ITS1-5.8S-ITS2-26S) of studied Pulsatilla species is 5795 bp long. Among five analyzed regions of the rRNA cluster, only Internal Transcribed Spacer 2 (ITS2) enabled the molecular delimitation of closely-related Pulsatilla patens and Pulsatilla vernalis. The determination of complete
O'Flynn, Ciaran; Deusch, Oliver; Darling, Aaron E; Eisen, Jonathan A; Wallis, Corrin; Davis, Ian J; Harris, Stephen J
Porphyromonads play an important role in human periodontal disease and recently have been shown to be highly prevalent in canine mouths. Porphyromonas cangingivalis is the most prevalent canine oral bacterial species in both plaque from healthy gingiva and plaque from dogs with early periodontitis. The ability of P. cangingivalis to flourish in the different environmental conditions characterized by these two states suggests a degree of metabolic flexibility. To characterize the genes responsible for this, the genomes of 32 isolates (including 18 newly sequenced and assembled) from 18 Porphyromonad species from dogs, humans, and other mammals were compared. Phylogenetic trees inferred using core genes largely matched previous findings; however, comparative genomic analysis identified several genes and pathways relating to heme synthesis that were present in P. cangingivalis but not in other Porphyromonads. Porphyromonas cangingivalis has a complete protoporphyrin IX synthesis pathway potentially allowing it to synthesize its own heme unlike pathogenic Porphyromonads such as Porphyromonas gingivalis that acquire heme predominantly from blood. Other pathway differences such as the ability to synthesize siroheme and vitamin B12 point to enhanced metabolic flexibility for P. cangingivalis, which may underlie its prevalence in the canine oral cavity. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
Ventura, Marco; Canchaya, Carlos; Bernini, Valentina; Altermann, Eric; Barrangou, Rodolphe; McGrath, Stephen; Claesson, Marcus J.; Li, Yin; Leahy, Sinead; Walker, Carey D.; Zink, Ralf; Neviani, Erasmo; Steele, Jim; Broadbent, Jeff; Klaenhammer, Todd R.; Fitzgerald, Gerald F.; O'Toole, Paul W.; van Sinderen, Douwe
Lactobacillus gasseri ATCC 33323, Lactobacillus salivarius subsp. salivarius UCC 118, and Lactobacillus casei ATCC 334 contain one (LgaI), four (Sal1, Sal2, Sal3, Sal4), and one (Lca1) distinguishable prophage sequences, respectively. Sequence analysis revealed that LgaI, Lca1, Sal1, and Sal2 prophages belong to the group of Sfi11-like pac site and cos site Siphoviridae, respectively. Phylogenetic investigation of these newly described prophage sequences revealed that they have not followed an evolutionary development similar to that of their bacterial hosts and that they show a high degree of diversity, even within a species. The attachment sites were determined for all these prophage elements; LgaI as well as Sal1 integrates in tRNA genes, while prophage Sal2 integrates in a predicted arginino-succinate lyase-encoding gene. In contrast, Lca1 and the Sal3 and Sal4 prophage remnants are integrated in noncoding regions in the L. casei ATCC 334 and L. salivarius UCC 118 genomes. Northern analysis showed that large parts of the prophage genomes are transcriptionally silent and that transcription is limited to genome segments located near the attachment site. Finally, pulsed-field gel electrophoresis followed by Southern blot hybridization with specific prophage probes indicates that these prophage sequences are narrowly distributed within lactobacilli. PMID:16672450
Seo, Young-Su; Lim, Jae Yun; Park, Jungwook; Kim, Sunyoung; Lee, Hyun-Hee; Cheong, Hoon; Kim, Sang-Mok; Moon, Jae Sun; Hwang, Ingyu
In addition to human and animal diseases, bacteria of the genus Burkholderia can cause plant diseases. The representative species of rice-pathogenic Burkholderia are Burkholderia glumae, B. gladioli, and B. plantarii, which primarily cause grain rot, sheath rot, and seedling blight, respectively, resulting in severe reductions in rice production. Though Burkholderia rice pathogens cause problems in rice-growing countries, comprehensive studies of these rice-pathogenic species aiming to control Burkholderia-mediated diseases are only in the early stages. We first sequenced the complete genome of B. plantarii ATCC 43733T. Second, we conducted comparative analysis of the newly sequenced B. plantarii ATCC 43733T genome with eleven complete or draft genomes of B. glumae and B. gladioli strains. Furthermore, we compared the genomes of the three rice Burkholderia pathogens with those of other Burkholderia species such as those found in environmental habitats and those known as animal/human pathogens. These B. glumae, B. gladioli, and B. plantarii strains have unique genes involved in toxoflavin or tropolone toxin production and the clustered regularly interspaced short palindromic repeats (CRISPR)-mediated bacterial immune system. Although the genome of B. plantarii ATCC 43733T has many common features with those of B. glumae and B. gladioli, this B. plantarii strain has several unique features, including quorum sensing and CRISPR/CRISPR-associated protein (Cas) systems. The complete genome sequence of B. plantarii ATCC 43733T and publicly available genomes of B. glumae BGR1 and B. gladioli BSR3 enabled comprehensive comparative genome analyses among three rice-pathogenic Burkholderia species responsible for tissue rotting and seedling blight. Our results suggest that B. glumae has evolved rapidly, or has undergone rapid genome rearrangements or deletions, in response to the hosts. It also clarifies the unique features of rice pathogenic Burkholderia species relative to other
Susanta K Behura
Genome sequencing projects have presented the opportunity for analysis of developmental genes in three vector mosquito species: Aedes aegypti, Culex quinquefasciatus, and Anopheles gambiae. A comparative genomic analysis of developmental genes in Drosophila melanogaster and these three important vectors of human disease was performed in this investigation. While the study was comprehensive, special emphasis centered on genes that (1) are components of developmental signaling pathways, (2) regulate fundamental developmental processes, (3) are critical for the development of tissues of vector importance, (4) function in developmental processes known to have diverged within insects, and (5) encode microRNAs (miRNAs) that regulate developmental transcripts in Drosophila. While most fruit fly developmental genes are conserved in the three vector mosquito species, several genes known to be critical for Drosophila development were not identified in one or more mosquito genomes. In other cases, mosquito lineage-specific gene gains with respect to D. melanogaster were noted. Sequence analyses also revealed that numerous repetitive sequences are a common structural feature of Drosophila and mosquito developmental genes. Finally, analysis of predicted miRNA binding sites in fruit fly and mosquito developmental genes suggests that the repertoire of developmental genes targeted by miRNAs is species-specific. The results of this study provide insight into the evolution of developmental genes and processes in dipterans and other arthropods, serve as a resource for those pursuing analysis of mosquito development, and will promote the design and refinement of functional analysis experiments.
Arquez, Moises; Uribe, Juan Esteban; Castro, Lyda Raquel
In this work we present a comparative analysis of the mitochondrial genomes of gastropods. Nucleotide and amino acid composition was calculated and a comparative visual analysis of the start and termination codons was performed. The organization of the genome was compared by calculating the number of intergenic sequences, the location of the genes and the number of reorganized genes (breakpoints) in comparison with the sequence that is presumed to be ancestral for the group. In order to calculate variations in the rates of molecular evolution within the group, the relative rate test was performed. In spite of the differences in the size of the genomes, the number of amino acids is conserved. The nucleotide and amino acid composition is similar among Vetigastropoda, Caenogastropoda and Neritimorpha in comparison to Heterobranchia and Patellogastropoda. The mitochondrial genomes of the group are very compact with few intergenic sequences; the only exception is the genome of Patellogastropoda with 26,828 bp. Start codons of the Heterobranchia and Patellogastropoda are very variable and there is also an increase in genome rearrangements for these two groups. Generally, the hypothesis of constant rates of molecular evolution between the groups is rejected, except when the genomes of Caenogastropoda and Vetigastropoda are compared.
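The nucleotide composition calculation mentioned in the preceding abstract can be sketched in a few lines of Python. This is a minimal illustration only: the function names `nucleotide_composition` and `at_content` and the toy sequence are hypothetical, and the abstract does not state which software the authors actually used.

```python
from collections import Counter


def nucleotide_composition(sequence):
    """Return the relative frequency of A, C, G and T in a DNA sequence.

    Characters other than A, C, G and T (for example N or gaps) are ignored.
    """
    seq = sequence.upper()
    counts = Counter(base for base in seq if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: counts[base] / total for base in "ACGT"}


def at_content(sequence):
    """Return the combined frequency of A and T."""
    composition = nucleotide_composition(sequence)
    return composition["A"] + composition["T"]


# Hypothetical fragment, used only to show the call pattern.
toy_sequence = "ATGATTAATCGTTGA"
print(nucleotide_composition(toy_sequence))
print(f"A+T content: {at_content(toy_sequence):.2f}")
```

Running the same function over each genome and tabulating the results by clade would give the kind of between-group comparison the abstract describes.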
Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A
The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).
Maeder, Dennis L.; Anderson, Iain; Brettin, Thomas S.; Bruce, David C.; Gilna, Paul; Han, Cliff S.; Lapidus, Alla; Metcalf, William W.; Saunders, Elizabeth; Tapia, Roxanne; Sowers, Kevin R.
We report here a comparative analysis of the genome sequence of Methanosarcina barkeri with those of Methanosarcina acetivorans and Methanosarcina mazei. All three genomes share a conserved double origin of replication and many gene clusters. M. barkeri is distinguished by having an organization that is well conserved with respect to the other Methanosarcinae in the region proximal to the origin of replication, with interspecies gene similarities as high as 95%. However, it is disordered and marked by increased transposase frequency and decreased gene synteny and gene density in the distal semi-genome. Of the 3680 open reading frames in M. barkeri, 678 had paralogs with better than 80% similarity to both M. acetivorans and M. mazei, while 128 nonhypothetical ORFs were unique (non-paralogous) amongst these species, including a complete formate dehydrogenase operon, two genes required for N-acetylmuramic acid synthesis, a 14 gene gas vesicle cluster and a bacterial P450-specific ferredoxin reductase cluster not previously observed or characterized in this genus. A cryptic 36 kbp plasmid sequence was detected in M. barkeri that contains an orc1 gene flanked by a presumptive origin of replication consisting of 38 tandem repeats of a 143 nt motif. Three-way comparison of these genomes reveals differing mechanisms for the accrual of changes. Elongation of the large M. acetivorans genome is the result of multiple gene-scale insertions and duplications uniformly distributed in that genome, while M. barkeri is characterized by localized inversions associated with the loss of gene content. In contrast, the relatively short M. mazei genome most closely approximates the ancestral organizational state.
Pocrnic, Ivan; Lourenco, Daniela A L; Masuda, Yutaka; Misztal, Ignacy
A genomic relationship matrix (GRM) can be inverted efficiently with the Algorithm for Proven and Young (APY) through recursion on a small number of core animals. The number of core animals is theoretically linked to effective population size (Ne). In a simulation study, the optimal number of core animals was equal to the number of largest eigenvalues of GRM that explained 98% of its variation. The purpose of this study was to find the optimal number of core animals and estimate Ne for different species. Datasets included phenotypes, pedigrees, and genotypes for populations of Holstein, Jersey, and Angus cattle, pigs, and broiler chickens. The number of genotyped animals varied from 15,000 for broiler chickens to 77,000 for Holsteins, and the number of single-nucleotide polymorphisms used for genomic prediction varied from 37,000 to 61,000. Eigenvalue decomposition of the GRM for each population determined numbers of largest eigenvalues corresponding to 90, 95, 98, and 99% of variation. The number of eigenvalues corresponding to 90% (98%) of variation was 4527 (14,026) for Holstein, 3325 (11,500) for Jersey, 3654 (10,605) for Angus, 1239 (4103) for pig, and 1655 (4171) for broiler chicken. Each trait in each species was analyzed using the APY inverse of the GRM with randomly selected core animals, and their number was equal to the number of largest eigenvalues. Realized accuracies peaked with the number of core animals corresponding to 98% of variation for Holstein and Jersey and closer to 99% for other breed/species. Ne was estimated based on comparisons of eigenvalue decomposition in a simulation study. Assuming a genome length of 30 Morgan, Ne was equal to 149 for Holsteins, 101 for Jerseys, 113 for Angus, 32 for pigs, and 44 for broilers. Eigenvalue profiles of GRM for common species are similar to those in simulation studies although they are affected by number of genotyped animals and genotyping quality. For all investigated species, the APY required
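The eigenvalue criterion described in the preceding abstract (the number of largest eigenvalues of the GRM that together explain a given share of its variation) can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions: the function name `n_largest_eigenvalues` is hypothetical, and the toy GRM is built from random genotypes with a simple centered cross-product, which is not necessarily how the authors constructed theirs.

```python
import numpy as np


def n_largest_eigenvalues(grm, fraction):
    """Smallest number of largest eigenvalues whose sum reaches
    `fraction` of the total eigenvalue sum of a symmetric matrix."""
    eigvals = np.linalg.eigvalsh(grm)[::-1]      # sort in descending order
    eigvals = np.clip(eigvals, 0.0, None)        # drop tiny negative round-off
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    count = int(np.searchsorted(cumulative, fraction)) + 1
    return min(count, len(eigvals))


# Toy data (assumption): 50 animals, 200 markers coded 0/1/2.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(50, 200)).astype(float)
centered = genotypes - genotypes.mean(axis=0)
toy_grm = centered @ centered.T / genotypes.shape[1]

for fraction in (0.90, 0.95, 0.98, 0.99):
    print(fraction, n_largest_eigenvalues(toy_grm, fraction))
```

In the APY setting described above, the count returned for a chosen threshold (98% in the simulation study) would be used as the number of core animals. For matrices with tens of thousands of animals a full dense decomposition becomes expensive, so a production analysis would likely need a more scalable approach.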
Luisa Z Moreno
Leptospira kirschneri is one of the pathogenic species of the Leptospira genus. Human and animal infection from L. kirschneri gained further attention over the last few decades. Here we present the isolation and characterisation of Brazilian L. kirschneri serogroup Pomona serovar Mozdok strain M36/05 and the comparative genomic analysis with Brazilian human strain 61H. The M36/05 strain caused pulmonary hemorrhagic lesions in the hamster model, showing high virulence. The studied genomes presented high symmetrical identity and the in silico multilocus sequence typing analysis resulted in a new allelic profile (ST101) that so far has only been associated with the Brazilian L. kirschneri serogroup Pomona serovar Mozdok strains. Considering the environmental conditions and high genomic similarity observed between strains, we suggest the existence of a Brazilian L. kirschneri serogroup Pomona serovar Mozdok lineage that could represent a high public health risk; further studies are necessary to confirm the lineage significance and distribution.
Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y
In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Those genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas those copies of the gene that are responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. We intended, building on the attribution of gene sequences to the clusters of orthologous groups of proteins (COGs), to provide a method for visualization and comparative analysis of genomic neighborhoods of evolutionary related genes, as well as a respective web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database, as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex that is formed by paralog(s) of one of the membrane subunits of the NADH:quinone oxidoreductase of type 1 (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.
Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G
The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.
Phytoremediation is an emerging technology that uses plants in order to cleanup pollutants including xenobiotics and heavy metals from soil, water and air. Inoculation of plants with plant growth promoting endophytic and rhizospheric bacteria can enhance efficiency of phytoremediation. Genomic analysis of four plant-associated strains belonging to the Stenotrophomonas maltophilia species revealed the presence of genes encoding proteins involved in plant growth promotion, biocontrol of phytopathogens, biodegradation of xenobiotics, heavy metals resistance and plant-bacteria-environment interaction. The results of this analysis suggest great potential of bacteria belonging to Stenotrophomonas maltophilia species in enhancing phytoremediation efficiency.
Genome size evolution is a complex process influenced by polyploidization, satellite DNA accumulation, and expansion of retroelements. How this process could be affected by different reproductive strategies is still poorly understood. We analyzed differences in the number and distribution of major repetitive DNA elements in two closely related species, Silene latifolia and S. vulgaris. Both species are diploid and possess the same chromosome number (2n = 24), but differ in their genome size and mode of reproduction. The dioecious S. latifolia (1C = 2.70 pg DNA) possesses sex chromosomes and its genome is 2.5× larger than that of the gynodioecious S. vulgaris (1C = 1.13 pg DNA), which does not possess sex chromosomes. We discovered that the genome of S. latifolia is larger mainly due to the expansion of Ogre retrotransposons. Surprisingly, the centromeric STAR-C and TR1 tandem repeats were found to be more abundant in S. vulgaris, the species with the smaller genome. We further examined the distribution of major repetitive sequences in related species in the Caryophyllaceae family. The results of FISH (fluorescence in situ hybridization) on mitotic chromosomes with the Retand element indicate that large rearrangements occurred during the evolution of the Caryophyllaceae family. Our data demonstrate that the evolution of genome size in the genus Silene is accompanied by the expansion of different repetitive elements with specific patterns in the dioecious species possessing the sex chromosomes.
Wang, Jing; Street, Nathaniel R; Scofield, Douglas G; Ingvarsson, Pär K
A central aim of evolutionary genomics is to identify the relative roles that various evolutionary forces have played in generating and shaping genetic variation within and among species. Here we use whole-genome resequencing data to characterize and compare genome-wide patterns of nucleotide polymorphism, site frequency spectrum, and population-scaled recombination rates in three species of Populus: Populus tremula, P. tremuloides, and P. trichocarpa. We find that P. tremuloides has the highest level of genome-wide variation, skewed allele frequencies, and population-scaled recombination rates, whereas P. trichocarpa harbors the lowest. Our findings highlight multiple lines of evidence suggesting that natural selection, due to both purifying and positive selection, has widely shaped patterns of nucleotide polymorphism at linked neutral sites in all three species. Differences in effective population sizes and rates of recombination largely explain the disparate magnitudes and signatures of linked selection that we observe among species. The present work provides the first phylogenetic comparative study on a genome-wide scale in forest trees. This information will also improve our ability to understand how various evolutionary forces have interacted to influence genome evolution among related species. Copyright © 2016 by the Genetics Society of America.
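Nucleotide polymorphism of the kind compared in the preceding abstract is commonly summarized as nucleotide diversity (π), the average number of pairwise differences per site. The sketch below implements that textbook definition for a small set of aligned sequences. It is an illustration only: the function name `nucleotide_diversity` and the toy alignment are hypothetical, and the abstract does not specify the estimator or pipeline the authors applied to their resequencing data.

```python
from itertools import combinations


def nucleotide_diversity(sequences):
    """Average proportion of differing sites over all pairs of
    equal-length aligned sequences."""
    if len(sequences) < 2:
        raise ValueError("need at least two sequences")
    length = len(sequences[0])
    if length == 0 or any(len(seq) != length for seq in sequences):
        raise ValueError("sequences must be non-empty and of equal length")
    pairs = list(combinations(sequences, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(first, second)) for first, second in pairs
    )
    return total_differences / (len(pairs) * length)


# Hypothetical alignment of three haplotypes over four sites.
toy_alignment = ["AAAA", "AAAT", "AATT"]
print(f"pi = {nucleotide_diversity(toy_alignment):.4f}")
```

Real resequencing data add complications this sketch ignores, such as missing genotypes, unphased diploid calls, and uneven coverage, so it should be read as a definition made concrete rather than as an analysis method.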
Arnold, Jason W; Simpson, Joshua B; Roach, Jeffrey; Kwintkiewicz, Jakub; Azcarate-Peril, M Andrea
Large-scale microbiome studies have established that most of the diversity contained in the gastrointestinal tract is represented at the strain level; however, exhaustive genomic and physiological characterization of human isolates is still lacking. With increased use of probiotics as interventions for gastrointestinal disorders, genomic and functional characterization of novel microorganisms becomes essential. In this study, we explored the impact of strain-level genomic variability on bacterial physiology of two novel human Lactobacillus rhamnosus strains (AMC143 and AMC010) of probiotic potential in relation to stress resistance. The strains showed differences with known probiotic strains (L. rhamnosus GG, Lc705, and HN001) at the genomic level, including nucleotide polymorphisms, mutations in non-coding regulatory regions, and rearrangements of genomic architecture. Transcriptomics analysis revealed that gene expression profiles differed between strains when exposed to simulated gastrointestinal stresses, suggesting the presence of unique regulatory systems in each strain. In vitro physiological assays to test resistance to conditions mimicking the gut environment (acid, alkali, and bile stress) showed that growth of L. rhamnosus AMC143 was inhibited upon exposure to alkaline pH, while AMC010 and control strain LGG were unaffected. AMC143 also showed a significant survival advantage compared to the other strains upon bile exposure. Reverse transcription qPCR targeting the bile salt hydrolase gene (bsh) revealed that AMC143 expressed bsh poorly (a consequence of a deletion in the bsh promoter and truncation of the bsh gene in AMC143), while AMC010 had significantly higher expression levels than AMC143 or LGG. Insertional inactivation of the bsh gene in AMC010 suggested that bsh could be detrimental to bacterial survival during bile stress. Together, these findings show that coupling of classical microbiology with functional genomics methods for the
Jessica N Ricaldi
The whole genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggest that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010(T) and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae comprised of a 6-gene cluster that is unexpectedly short compared with L. interrogans in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are present, also suggestive of antecedent LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for
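The distinction drawn above between a core genome and genes conserved only in infectious species reduces to set operations once each genome is represented as a set of orthologous gene family identifiers. The following is a minimal sketch under that assumption; the genome names, family identifiers and function name are hypothetical, and the ortholog clustering that precedes this step is not shown.

```python
def core_and_group_restricted(gene_sets, group):
    """gene_sets maps genome name -> set of gene family IDs.
    group lists the genomes of interest (e.g. the infectious species).
    Returns (core, restricted): families present in every genome, and
    families present in every group member but in no genome outside it."""
    group = set(group)
    others = set(gene_sets) - group
    core = set.intersection(*(gene_sets[g] for g in gene_sets))
    shared_in_group = set.intersection(*(gene_sets[g] for g in group))
    present_elsewhere = set().union(*(gene_sets[g] for g in others))
    return core, shared_in_group - present_elsewhere


# Hypothetical toy input: four genomes, six gene families.
toy_gene_sets = {
    "pathogen_A": {"f1", "f2", "f3", "f4"},
    "pathogen_B": {"f1", "f2", "f3", "f5"},
    "intermediate_C": {"f1", "f2", "f3"},
    "saprophyte_D": {"f1", "f2", "f6"},
}
infectious = ["pathogen_A", "pathogen_B", "intermediate_C"]

core, restricted = core_and_group_restricted(toy_gene_sets, infectious)
print(sorted(core), sorted(restricted))
```

In the toy input, f1 and f2 form the core, while f3 is the only family conserved across the infectious group and absent from the saprophyte.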
Salmonella enterica serovar Typhimurium, a gram-negative facultative rod-shaped bacterium causing salmonellosis and foodborne disease, is one of the most commonly isolated Salmonella serovars in both developed and developing nations. Several S. Typhimurium genomes have been completed and many more genome-sequencing projects are underway. Comparative genome analysis of the multiple strains leads to a better understanding of the evolution of S. Typhimurium and its pathogenesis. S. Typhimurium strain UK-1 (belongs to phage type 1) is highly virulent when orally administered to mice and chickens and efficiently colonizes lymphoid tissues of these species. These characteristics make this strain a good choice for use in vaccine development. In fact, UK-1 has been used as the parent strain for a number of nonrecombinant and recombinant vaccine strains, including several commercial vaccines for poultry. In this study, we conducted a thorough comparative genome analysis of the UK-1 strain with other S. Typhimurium strains and examined the phenotypic impact of several genomic differences. Whole genomic comparison highlights an extremely close relationship between the UK-1 strain and other S. Typhimurium strains; however, many interesting genetic and genomic variations specific to UK-1 were explored. In particular, deletion of a UK-1-specific gene that is highly similar to the gene encoding the T3SS effector protein NleC resulted in a significant decrease in oral virulence in BALB/c mice. The complete genetic complements in UK-1, especially those elements that contribute to virulence or aid in determining the diversity within bacterial species, provide key information in evaluating the functional characterization of important genetic determinants and for development of vaccines.
Complete mitochondrial genomes and nuclear ribosomal RNA operons of two species of Diplostomum (Platyhelminthes: Trematoda): a molecular resource for taxonomy and molecular epidemiology of important fish pathogens.
Brabec, Jan; Kostadinova, Aneta; Scholz, Tomáš; Littlewood, D Timothy J
The genus Diplostomum (Platyhelminthes: Trematoda: Diplostomidae) is a diverse group of freshwater parasites with complex life-cycles and global distribution. The larval stages are important pathogens causing eye fluke disease implicated in substantial impacts on natural fish populations and losses in aquaculture. However, the problematic species delimitation and difficulties in the identification of larval stages hamper the assessment of the distributional and host ranges of Diplostomum spp. and their transmission ecology. Total genomic DNA was isolated from adult worms and shotgun sequenced using Illumina MiSeq technology. Mitochondrial (mt) genomes and nuclear ribosomal RNA (rRNA) operons were assembled using established bioinformatic tools and fully annotated. Mt protein-coding genes and nuclear rRNA genes were subjected to phylogenetic analysis by maximum likelihood and the resulting topologies compared. We characterised novel complete mt genomes and nuclear rRNA operons of two closely related species, Diplostomum spathaceum and D. pseudospathaceum. Comparative mt genome assessment revealed that the cox1 gene and its 'barcode' region used for molecular identification are the most conserved regions; instead, nad4 and nad5 genes were identified as most promising molecular diagnostic markers. Using the novel data, we provide the first genome wide estimation of the phylogenetic relationships of the order Diplostomida, one of the two fundamental lineages of the Digenea. Analyses of the mitogenomic data invariably recovered the Diplostomidae as a sister lineage of the order Plagiorchiida rather than as a basal lineage of the Diplostomida as inferred in rDNA phylogenies; this was concordant with the mt gene order of Diplostomum spp. exhibiting closer match to the conserved gene order of the Plagiorchiida. Complete sequences of the mt genome and rRNA operon of two species of Diplostomum provide a valuable resource for novel genetic markers for species delineation and
Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen
Actinidia chinensis is an important economic plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes for Actinidia chinensis and A. chinensis var deliciosa obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy region (LSC) of 87,984 to 88,337 bp and a small single copy region (SSC) of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders were extended into the 5' portion of the psbA gene and IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia, and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16-taxon angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.
In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed that there are three approximately evenly numbered groups: one group contains the Toll-Interleukin receptor (TIR) domain and two different Non-TIR groups in which most proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes are from different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found most clades include NBS genes from all three Citrus genomes. This suggests that three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. Furthermore, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.
Wang, Daxi; Young, Neil D; Koehler, Anson V; Tan, Patrick; Sohn, Woon-Mok; Korhonen, Pasi K; Gasser, Robin B
Clonorchiasis is a neglected tropical disease that affects >35 million people mainly in China, Vietnam, South Korea and some parts of Russia. The disease-causing agent, Clonorchis sinensis, is a liver fluke of humans and other piscivorous animals, and has a complex aquatic life cycle involving snails and fish intermediate hosts. Chronic infection in humans causes liver disease and associated complications including malignant bile duct cancer. Central to control and to understanding the epidemiology of this disease is knowledge of the specific identity of the causative agent as well as genetic variation within and among populations of this parasite. Although most published molecular studies seem to suggest that C. sinensis represents a single species and that genetic variation within the species is limited, karyotypic variation within C. sinensis among China, Korea (2n=56) and Russian Far East (2n=14) suggests that this taxon might contain sibling species. Here, we assessed and applied a deep sequencing-bioinformatic approach to sequence and define a reference mitochondrial (mt) genome for a particular isolate of C. sinensis from Korea (Cs-k2), to confirm its specific identity, and compared this mt genome with homologous data sets available for this species. Comparative analyses revealed consistency in the number and structure of genes as well as in the lengths of protein-coding genes, and limited genetic variation among isolates of C. sinensis. Phylogenetic analyses of amino acid sequences predicted from mt genes showed that representatives of C. sinensis clustered together, with absolute nodal support, to the exclusion of other liver fluke representatives, but sub-structuring within C. sinensis was not well supported. The plan now is to proceed with the sequencing, assembly and annotation of a high quality draft nuclear genome of this defined isolate (Cs-k2) as a basis for a detailed investigation of molecular variation within C. sinensis from disparate
Zheng, Wenning; Mutha, Naresh V R; Heydari, Hamed; Dutta, Avirup; Siow, Cheuk Chuen; Jakubovics, Nicholas S; Wee, Wei Yee; Tan, Shi Yang; Ang, Mia Yang; Wong, Guat Jah; Choo, Siew Woh
Leonard, Guy; Stevens, Jamie R.; Richards, Thomas A.
The phylogenetic analysis of nucleotide sequences and increasingly that of amino acid sequences is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time consuming and laborious task. Here we report the availability of an online resource for the management of gene sequences recovered from public access genome databases such as GenBank. These web utilities include the facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. Post phylogenetic analysis, these webpages can then be used to rename every label in the subsequent tree files (with a user-defined combination of species name and/or database accession number). Together these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and generation of species and accession number lists for use in supplementary materials or figure legends. PMID:19812722
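The relabelling workflow described in this abstract can be sketched offline as follows. This is not the authors' web utility: the function names, the label format, the accession numbers and the assumption that each FASTA header starts with an accession followed by free text are all hypothetical and chosen for illustration only.

```python
import re


def relabel_fasta(fasta_text, species_by_accession, fmt="{species}_{accession}"):
    """Rewrite every FASTA header as a label built from species name and
    accession. Returns the new FASTA text and an accession -> label map."""
    out, mapping = [], {}
    for line in fasta_text.splitlines():
        if line.startswith(">"):
            accession = line[1:].split()[0]  # assumed: accession comes first
            species = species_by_accession.get(accession, "unknown")
            label = fmt.format(species=species, accession=accession)
            label = re.sub(r"[^A-Za-z0-9_.]", "_", label)  # tree-safe characters
            mapping[accession] = label
            out.append(">" + label)
        else:
            out.append(line)
    return "\n".join(out) + "\n", mapping


def relabel_newick(newick, mapping):
    """Replace leaf labels in a Newick string using the given mapping.
    Only labels that follow '(' or ',' (leaves) are touched."""
    def swap(match):
        return mapping.get(match.group(1), match.group(1))
    return re.sub(r"(?<=[(,])([^(),:;]+)", swap, newick)


# Hypothetical records; the accessions below are made up.
fasta = ">AB000001.1 Homo sapiens gene X\nATGC\n>XY123456.2 Mus musculus gene X\nATGA\n"
species = {"AB000001.1": "Homo sapiens", "XY123456.2": "Mus musculus"}

new_fasta, label_map = relabel_fasta(fasta, species)
new_tree = relabel_newick("(AB000001.1:0.1,XY123456.2:0.2);", label_map)
print(new_fasta)
print(new_tree)
```

Keeping the accession-to-label map is what allows the same labels to be applied to tree files after the analysis, mirroring the two-stage use described above.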
Although great progress in genome-wide association studies (GWAS) has been made, the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible source of finding heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet challenges raised by genome-wide interactional analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic for testing interaction are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001
Background: The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. Results: We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. Conclusions: eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
Severin, Jessica; Beal, Kathryn; Vilella, Albert J; Fitzgerald, Stephen; Schuster, Michael; Gordon, Leo; Ureta-Vidal, Abel; Flicek, Paul; Herrero, Javier
The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
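The blackboard pattern and the quadratic growth mentioned above can be illustrated with a minimal sketch. It is not eHive's schema or API: an in-memory SQLite table stands in for the MySQL blackboard, a Python function stands in for the Perl agent, and the table layout, status values and function names are all assumptions made for illustration.

```python
import sqlite3
from itertools import combinations


def build_blackboard(species):
    """Create a job table holding one pairwise-alignment job per species pair."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job (id INTEGER PRIMARY KEY, a TEXT, b TEXT, "
        "status TEXT DEFAULT 'READY')"
    )
    # n species give n * (n - 1) / 2 pairs, hence the quadratic growth.
    db.executemany("INSERT INTO job (a, b) VALUES (?, ?)", combinations(species, 2))
    db.commit()
    return db


def run_worker(db, analyse):
    """Agent loop: claim a READY job, run it, record the outcome, repeat.
    A failing job is marked FAILED and does not stop the worker."""
    processed = 0
    while True:
        row = db.execute(
            "SELECT id, a, b FROM job WHERE status = 'READY' LIMIT 1"
        ).fetchone()
        if row is None:
            return processed
        job_id, a, b = row
        db.execute("UPDATE job SET status = 'CLAIMED' WHERE id = ?", (job_id,))
        try:
            analyse(a, b)
            status = "DONE"
        except Exception:
            status = "FAILED"
        db.execute("UPDATE job SET status = ? WHERE id = ?", (status, job_id))
        db.commit()
        processed += 1


species = [f"species_{i}" for i in range(50)]  # placeholder names
board = build_blackboard(species)
n_jobs = board.execute("SELECT COUNT(*) FROM job").fetchone()[0]
n_processed = run_worker(board, lambda a, b: None)  # no-op stand-in analysis
print(n_jobs, n_processed)
```

With 50 species the table holds 50 × 49 / 2 = 1225 pairwise jobs. A production system with several concurrent agents would need an atomic claim step, which this single-worker sketch omits.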
Willow is a widely used dioecious woody plant of the Salicaceae family in China. Due to their high biomass yields, willows are promising bioenergy crops. In this study, we assembled the complete mitochondrial (mt) genome sequence of S. suchowensis, with a length of 644,437 bp, using Roche-454 GS FLX Titanium sequencing technology. The base composition of the S. suchowensis mt genome is A (27.43%), T (27.59%), C (22.34%), and G (22.64%), giving a GC content similar to that of other angiosperms. This long circular mt genome encodes 58 unique genes (32 protein-coding genes, 23 tRNA genes and 3 rRNA genes), and 9 of the 32 protein-coding genes contain 17 introns. Phylogenetic analysis of 35 species based on 23 protein-coding genes supports Salix as sister to Populus. With the detailed phylogenetic information and the identified phylogenetic position, some ribosomal protein genes and succinate dehydrogenase genes are found to have been frequently lost during evolution. For a native shrub willow species, the S. suchowensis mt genome provides information for better understanding genomic breeding and the missing pieces of the evolution of sex determination.
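The base composition quoted above implies a GC content of 22.34% + 22.64% = 44.98%. As a minimal sketch of how such figures are obtained from a sequence, the following assumes a plain nucleotide string and ignores ambiguous bases; the function names and the toy sequence are hypothetical.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases of seq."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def gc_content(seq):
    """GC content in percent, computed from the base composition."""
    composition = base_composition(seq)
    return composition["G"] + composition["C"]


toy_sequence = "ATGCGCNNAT"  # hypothetical; the two N are ignored
composition = base_composition(toy_sequence)
gc = gc_content(toy_sequence)
reported_gc = 22.34 + 22.64  # C and G percentages quoted in the abstract
print(composition, gc, round(reported_gc, 2))
```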
Background The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. Results We present eHive, a new fault tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system a