WorldWideScience

Sample records for analysis full genome

  1. Genomic analysis identifies masqueraders of full-term cerebral palsy.

    Science.gov (United States)

    Takezawa, Yusuke; Kikuchi, Atsuo; Haginoya, Kazuhiro; Niihori, Tetsuya; Numata-Uematsu, Yurika; Inui, Takehiko; Yamamura-Suzuki, Saeko; Miyabayashi, Takuya; Anzai, Mai; Suzuki-Muromoto, Sato; Okubo, Yukimune; Endo, Wakaba; Togashi, Noriko; Kobayashi, Yasuko; Onuma, Akira; Funayama, Ryo; Shirota, Matsuyuki; Nakayama, Keiko; Aoki, Yoko; Kure, Shigeo

    2018-05-01

    Cerebral palsy is a common, heterogeneous neurodevelopmental disorder that causes movement and postural disabilities. Recent studies have suggested that genetic diseases can be misdiagnosed as cerebral palsy. We hypothesized that two simple criteria, namely full-term birth and nonspecific brain MRI findings, are key to identifying masqueraders among cerebral palsy cases for two reasons: (1) preterm infants are susceptible to multiple environmental factors and therefore have an increased risk of cerebral palsy, and (2) brain MRI assessment is essential for excluding environmental causes and other particular disorders. A total of 107 patients (all full-term births) without specific findings on brain MRI were identified among 897 patients diagnosed with cerebral palsy who were followed at our center. DNA samples were available for 17 of the 107 cases for trio whole-exome sequencing and array comparative genomic hybridization. We prioritized variants in genes known to be relevant in neurodevelopmental diseases and evaluated their pathogenicity according to the American College of Medical Genetics guidelines. Pathogenic/likely pathogenic candidate variants were identified in 9 of 17 cases (52.9%) within eight genes: CTNNB1, CYP2U1, SPAST, GNAO1, CACNA1A, AMPD2, STXBP1, and SCN2A. Five of the identified variants had previously been reported. No pathogenic copy number variations were identified. The AMPD2 missense variant and the splice-site variants in CTNNB1 and AMPD2 were validated by in vitro functional experiments. The high rate of detecting causative genetic variants (52.9%) suggests that patients diagnosed with cerebral palsy who were born at full term and lack specific MRI findings may include genetic diseases masquerading as cerebral palsy.

  2. Full-length genomic analysis of Korean porcine sapelovirus strains

    DEFF Research Database (Denmark)

    Son, Kyu-Yeol; Kim, Deok-Song; Kwon, Joseph

    2014-01-01

    ... the typical picornavirus genome organization: 5' untranslated region (UTR)-L-VP4-VP2-VP3-VP1-2A-2B-2C-3A-3B-3C-3D-3'UTR. Three distinct cis-active RNA elements, the internal ribosome entry site (IRES) in the 5'UTR, a cis-replication element (CRE) in the 2C coding region, and the 3'UTR, were identified ... and their structures were predicted. Interestingly, the structural features of the CRE and 3'UTR differed between PSV strains. The availability of these first complete genome sequences for PSV strains will facilitate future investigations of the molecular pathogenesis and evolutionary characteristics of PSV ...

  3. Comparative analysis of full genomic sequences among different genotypes of dengue virus type 3

    Directory of Open Access Journals (Sweden)

    Lin Ting-Hsiang

    2008-05-01

    Background: Although a previous study demonstrated that the envelope protein of dengue viruses is under purifying selection pressure, little is known about the genetic differences among full-length DENV-3 genomes. In our study, complete genome sequences of DENV-3 strains collected from different geographical locations and isolation years were determined, and sequence diversity as well as selection pressure sites in the DENV genome outside the E gene were also analyzed. Results: Using maximum likelihood and Bayesian approaches, our phylogenetic analysis revealed that Taiwan's indigenous DENV-3 strains isolated from the 1994 and 1998 dengue/DHF epidemics and one 1999 sporadic case belonged to three different genotypes (I, II, and III), each associated with DENV-3 circulating in Indonesia, Thailand and Sri Lanka, respectively. Sequence diversity and selection pressure of different genomic regions among DENV-3 genotypes were further examined to understand global DENV-3 evolution. The highest nucleotide sequence diversity among the fully sequenced DENV-3 strains was found in the nonstructural protein 2A (mean ± SD: 5.84 ± 0.54) and envelope protein (mean ± SD: 5.04 ± 0.32) gene regions. Further analysis found that positive selection pressure on DENV-3 may occur in the nonstructural protein 1 gene region, and a positive selection site was detected at position 178 of the NS1 gene. Conclusion: Our study confirmed that the envelope protein is under purifying selection pressure although it presented higher sequence diversity. The detection of positive selection pressure in the nonstructural protein of genotype II indicates that DENV-3 strains originating from Southeast Asia need to be monitored for the emergence of strains with epidemic potential, for better epidemic prevention and vaccine development.

  4. Full genome analysis of enterovirus D-68 strains circulating in Alberta, Canada.

    Science.gov (United States)

    Pabbaraju, Kanti; Wong, Sallene; Drews, Steven J; Tipples, Graham; Tellier, Raymond

    2016-07-01

    A widespread outbreak of enterovirus (EV)-D68 that started in the summer of 2014 has been reported in the USA and Canada. During the course of this outbreak, EV-D68 was identified as a possible cause of acute, unexplained severe respiratory illness and a temporal association was observed between acute flaccid paralysis with anterior myelitis and EV-D68 detection in the upper respiratory tract. In this study, four nasopharyngeal samples collected from patients in Alberta, Canada with a laboratory diagnosis of EV-D68 were used to determine the near full-length genome sequence directly from the specimens. Phylogenetic analysis was performed to study the genotypes and pathogenesis of the circulating strains. Our results support the contention that mutations in the VP1 gene and other regions of the genome causing altered antigenicity, as well as lack of immunity in the younger population, may be responsible for the increased severe respiratory disease outbreaks of EV-D68 worldwide. © 2015 Wiley Periodicals, Inc.

  5. Full genome analysis of a novel adenovirus from the South Polar skua (Catharacta maccormicki) in Antarctica.

    Science.gov (United States)

    Park, Yon Mi; Kim, Jeong-Hoon; Gu, Se Hun; Lee, Sook Young; Lee, Min-Goo; Kang, Yoon Kyoo; Kang, Sung-Ho; Kim, Hak Jun; Song, Jin-Won

    2012-01-05

    Adenoviruses have been identified in humans and a wide range of vertebrate animals, but not previously from the polar region. Here, we report the entire 26,340-bp genome of a novel adenovirus, detected by PCR, in tissues of six of nine South Polar skuas (Catharacta maccormicki), collected in Lake King Sejong, King George Island, Antarctica, from 2007 to 2009. The DNA polymerase, penton base, hexon and fiber genes of the South Polar skua adenovirus (SPSAdV) exhibited 68.3%, 75.4%, 74.9% and 48.0% nucleotide sequence similarity with their counterparts in turkey hemorrhagic enteritis virus. Phylogenetic analysis based on the entire genome revealed that SPSAdV belonged to the genus Siadenovirus, family Adenoviridae. This is the first evidence of a novel adenovirus, SPSAdV, from a large polar seabird (family Stercorariidae) in Antarctica. Copyright © 2011 Elsevier Inc. All rights reserved.

  6. Full-genome analysis of a canine pneumovirus causing acute respiratory disease in dogs, Italy.

    Directory of Open Access Journals (Sweden)

    Nicola Decaro

    An outbreak of canine infectious respiratory disease (CIRD) associated with canine pneumovirus (CnPnV) infection is reported. The outbreak occurred in a shelter in the Apulia region and involved 37 of 350 dogs, which displayed cough and/or nasal discharge with no evidence of fever. Full-genome characterisation showed that the causative agent (strain Bari/100-12) was closely related to CnPnVs recently isolated in the USA, as well as to murine pneumovirus, which is responsible for respiratory disease in mice. The present study represents a useful contribution to the knowledge of the pathogenic potential of CnPnV and its association with CIRD in dogs. Further studies will elucidate the pathogenicity and epidemiology of this novel pneumovirus and address the possible need for specific vaccines.

  7. Full-Genome Analysis of Avian Influenza A(H5N1) Virus from a Human, North America, 2013

    Science.gov (United States)

    Pabbaraju, Kanti; Tellier, Raymond; Wong, Sallene; Li, Yan; Bastien, Nathalie; Tang, Julian W.; Drews, Steven J.; Jang, Yunho; Davis, C. Todd; Tipples, Graham A.

    2014-01-01

    Full-genome analysis was conducted on the first isolate of a highly pathogenic avian influenza A(H5N1) virus from a human in North America. The virus has a hemagglutinin gene of clade 2.3.2.1c and is a reassortant with an H9N2 subtype lineage polymerase basic 2 gene. No mutations conferring resistance to adamantanes or neuraminidase inhibitors were found. PMID:24755439

  8. A high HIV-1 strain variability in London, UK, revealed by full-genome analysis: Results from the ICONIC project

    Science.gov (United States)

    Frampton, Dan; Gallo Cassarino, Tiziano; Raffle, Jade; Hubb, Jonathan; Ferns, R. Bridget; Waters, Laura; Tong, C. Y. William; Kozlakidis, Zisis; Hayward, Andrew; Kellam, Paul; Pillay, Deenan; Clark, Duncan; Nastouli, Eleni; Leigh Brown, Andrew J.

    2018-01-01

    The initial analysis of genome sequences detected substantial hidden variability in the London HIV epidemic. Analysing full genome sequences, as opposed to only PR+RT, identified previously undetected recombinants. It provided a more reliable description of CRFs (that would be otherwise misclassified) and transmission clusters. PMID:29389981

  9. Comparative analysis of the full genome of Helicobacter pylori isolate Sahul64 identifies genes of high divergence.

    Science.gov (United States)

    Lu, Wei; Wise, Michael J; Tay, Chin Yen; Windsor, Helen M; Marshall, Barry J; Peacock, Christopher; Perkins, Tim

    2014-03-01

    Isolates of Helicobacter pylori can be classified phylogeographically. High genetic diversity and rapid microevolution are a hallmark of H. pylori genomes, a phenomenon that is proposed to play a functional role in persistence and colonization of diverse human populations. To provide further genomic evidence in the lineage of H. pylori and to further characterize diverse strains of this pathogen in different human populations, we report the finished genome sequence of Sahul64, an H. pylori strain isolated from an indigenous Australian. Our analysis identified genes that were highly divergent compared to the 38 publicly available genomes, which include genes involved in the biosynthesis and modification of lipopolysaccharide, putative prophage genes, restriction modification components, and hypothetical genes. Furthermore, the virulence-associated vacA locus is a pseudogene and the cag pathogenicity island (cagPAI) is not present. However, the genome does contain a gene cluster associated with pathogenicity, including dupA. Our analysis found that with the addition of Sahul64 to the 38 genomes, the core genome content of H. pylori is reduced by approximately 14% (∼170 genes) and the pan-genome has expanded from 2,070 to 2,238 genes. We have identified three putative horizontally acquired regions, including one that is likely to have been acquired from the closely related Helicobacter cetorum prior to speciation. Our results suggest that Sahul64, with the absence of cagPAI, highly divergent cell envelope proteins, and a predicted nontransportable VacA protein, could be more highly adapted to ancient indigenous Australian people but with lower virulence potential compared to other sequenced and cagPAI-positive H. pylori strains.
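    The core-genome and pan-genome figures in the abstract above rest on two set operations over per-genome gene content: the core genome is the set of genes shared by every genome, and the pan-genome is their union. The minimal Python sketch below illustrates why adding one divergent genome can shrink the core and grow the pan-genome at the same time. It assumes each genome has already been reduced to a set of orthologous gene-family identifiers. The identifiers and strain names are hypothetical toy data, not values from the study, and the authors' actual orthology pipeline is not described in the abstract.

```python
from functools import reduce


def core_and_pan(genomes):
    """Return (core, pan) gene sets for a dict of genome name -> set of gene-family IDs."""
    gene_sets = list(genomes.values())
    core = reduce(set.intersection, gene_sets)  # genes present in every genome
    pan = reduce(set.union, gene_sets)          # genes present in at least one genome
    return core, pan


# Hypothetical toy data: gene-family IDs per genome (not from the study).
reference_genomes = {
    "strain_A": {"g1", "g2", "g3", "g4"},
    "strain_B": {"g1", "g2", "g3", "g5"},
}
core_before, pan_before = core_and_pan(reference_genomes)

# Add a hypothetical divergent genome that lacks g3 and carries novel genes g6 and g7.
with_new = dict(reference_genomes, divergent_strain={"g1", "g2", "g6", "g7"})
core_after, pan_after = core_and_pan(with_new)

print(len(core_before), len(pan_before))  # 3 5
print(len(core_after), len(pan_after))    # 2 7
```

    Because an intersection can only lose members and a union can only gain them, each newly added genome can never enlarge the core or shrink the pan-genome.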

  10. Full closure strategic analysis.

    Science.gov (United States)

    2014-07-01

    The full closure strategic analysis was conducted to create a decision process whereby full roadway closures for construction and maintenance activities can be evaluated and approved or denied by CDOT Traffic personnel. The study reviewed current...

  11. Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics

    Directory of Open Access Journals (Sweden)

    Kikuchi Mari

    2010-03-01

    Background: The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom background, such as ESTs and mutagenized lines, have been established by an international alliance. Results: To accelerate progress in tomato genomics, we developed a collection of 13,227 fully sequenced Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants except rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. Conclusion: The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. The nrFLcDNA sequences will help annotation of the ...

  12. Large-scale analysis of full-length cDNAs from the tomato (Solanum lycopersicum) cultivar Micro-Tom, a reference system for the Solanaceae genomics.

    Science.gov (United States)

    Aoki, Koh; Yano, Kentaro; Suzuki, Ayako; Kawamura, Shingo; Sakurai, Nozomu; Suda, Kunihiro; Kurabayashi, Atsushi; Suzuki, Tatsuya; Tsugane, Taneaki; Watanabe, Manabu; Ooga, Kazuhide; Torii, Maiko; Narita, Takanori; Shin-I, Tadasu; Kohara, Yuji; Yamamoto, Naoki; Takahashi, Hideki; Watanabe, Yuichiro; Egusa, Mayumi; Kodama, Motoichiro; Ichinose, Yuki; Kikuchi, Mari; Fukushima, Sumire; Okabe, Akiko; Arie, Tsutomu; Sato, Yuko; Yazawa, Katsumi; Satoh, Shinobu; Omura, Toshikazu; Ezura, Hiroshi; Shibata, Daisuke

    2010-03-30

    The Solanaceae family includes several economically important vegetable crops. The tomato (Solanum lycopersicum) is regarded as a model plant of the Solanaceae family. Recently, a number of tomato resources have been developed in parallel with the ongoing tomato genome sequencing project. In particular, a miniature cultivar, Micro-Tom, is regarded as a model system in tomato genomics, and a number of genomics resources in the Micro-Tom background, such as ESTs and mutagenized lines, have been established by an international alliance. To accelerate progress in tomato genomics, we developed a collection of 13,227 fully sequenced Micro-Tom full-length cDNAs. By checking redundant sequences, coding sequences, and chimeric sequences, a set of 11,502 non-redundant full-length cDNAs (nrFLcDNAs) was generated. Analysis of untranslated regions demonstrated that tomato has longer 5'- and 3'-untranslated regions than most other plants except rice. Classification of functions of proteins predicted from the coding sequences demonstrated that nrFLcDNAs covered a broad range of functions. A comparison of nrFLcDNAs with genes of sixteen plants facilitated the identification of tomato genes that are not found in other plants, most of which did not have known protein domains. Mapping of the nrFLcDNAs onto currently available tomato genome sequences facilitated prediction of exon-intron structure. Introns of tomato genes were longer than those of Arabidopsis and rice. According to a comparison of exon sequences between the nrFLcDNAs and the tomato genome sequences, the frequency of nucleotide mismatch in exons between Micro-Tom and the genome-sequencing cultivar (Heinz 1706) was estimated to be 0.061%. The collection of Micro-Tom nrFLcDNAs generated in this study will serve as a valuable genomic tool for plant biologists to bridge the gap between basic and applied studies. The nrFLcDNA sequences will help annotation of the tomato whole-genome sequence and aid in tomato functional ...

  13. Full-length genome sequence analysis of four subgroup J avian leukosis virus strains isolated from chickens with clinical hemangioma.

    Science.gov (United States)

    Lin, Lulu; Wang, Peikun; Yang, Yongli; Li, Haijuan; Huang, Teng; Wei, Ping

    2017-12-01

    Since 2014, cases of hemangioma associated with avian leukosis virus subgroup J (ALV-J) have been emerging in commercial chickens in Guangxi. In this study, four ALV-J strains, named GX14HG01, GX14HG04, GX14LT07, and GX14ZS14, were isolated in 2014 from chickens with clinical hemangioma by DF-1 cell culture. They were then identified by ELISA detection of the ALV group-specific antigen p27, subtype-specific PCR, and indirect immunofluorescence assay (IFA) with an ALV-J-specific monoclonal antibody. The complete genomes of the isolates were sequenced, revealing that gag and pol were relatively conserved, while env was variable, especially the gp85 gene. Homology analysis of the env gene sequences showed that the env genes of all four isolates were more similar to the hemangioma (HE)-type reference strains than to the myeloid leukosis (ML)-type strains. Moreover, the HE-type-specific 205-bp deletion covering the rTM and DR1 in the 3'UTR was also found in the four isolates. Further analysis of the sequences of the env gene subunits revealed an interesting finding. The gp85 of isolates GX14ZS14 and GX14HG04 was more similar to HPRS-103 and much less similar to the HE-type reference strains, so GX14ZS14, GX14HG04, and HPRS-103 clustered in the same branch. In contrast, their gp37 was more similar to the HE-type reference strains than to HPRS-103, so GX14ZS14, GX14HG04, and the HE-type reference strains clustered in the same branch. The results suggest that isolates GX14ZS14 and GX14HG04 may be recombinants of the foreign strain HPRS-103 and the local epidemic HE-type ALV-J strains.

  14. Full genome analysis of rotavirus G9P[8] strains identified in acute gastroenteritis cases reveals genetic diversity: Pune, western India.

    Science.gov (United States)

    Tatte, Vaishali S; Chaphekar, Deepa; Gopalkrishna, Varanasi

    2017-08-01

    Group A rotaviruses (RVA) are the major enteric etiological agents of severe acute gastroenteritis among children globally. As G9 RVA now represents one of the major human RVA genotypes, full-genome studies of this genotype are being carried out worldwide. So far, no such studies of G9P[8] RVAs have been reported from Pune, in western India. This study was therefore undertaken to understand the degree of genetic diversity of the commonly circulating G9P[8] RVA strains. Rotavirus surveillance carried out earlier, during 2009-2011, showed an increase in the prevalence of G9P[8] RVAs. Representative G9P[8] RVA strains from 2009, 2010, and 2011 were selected for the study. In general, all the G9 RVA strains clustered in the globally circulating sublineage of the VP7 gene and showed nucleotide/amino acid identities of 96.8-99.7%/96.9-99.8% with global G9 RV strains. Full-genome analysis of all three RVAs in this study indicated the Wa-like genotype constellation G9-P[8]-I1-R1-C1-M1-A1-N1-T1-E1-H1. Among the strains, nucleotide/amino acid divergence of 0.1-3.4%/0.0-4.1% was noted across all RVA structural and non-structural genes. In conclusion, the present study highlights intra-genotypic variation throughout the RVA genome. The study further emphasizes the need for surveillance and whole-genome constellation analysis of the commonly circulating RVA strains in other regions of the country, to better understand the impact of the rotavirus vaccination recently introduced in India. © 2017 Wiley Periodicals, Inc.
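    Pairwise nucleotide divergence ranges such as the 0.1-3.4% reported above are commonly expressed as an uncorrected p-distance, meaning the fraction of aligned positions at which two sequences differ. The study's exact distance method and any substitution-model correction are not stated in the abstract. The minimal Python sketch below therefore assumes equal-length pre-aligned sequences and skips gap columns. The sequences are short hypothetical fragments, not rotavirus data, so their percentages are much larger than the study's.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences, skipping gap columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # ignore alignment gaps in either sequence
        compared += 1
        if a != b:
            differences += 1
    return differences / compared if compared else 0.0


def divergence_range(aligned):
    """Min and max pairwise p-distance, as percentages, over a dict of aligned sequences."""
    values = [100 * p_distance(aligned[x], aligned[y]) for x, y in combinations(aligned, 2)]
    return min(values), max(values)


# Hypothetical aligned fragments (not real rotavirus sequences).
aligned = {
    "strain_1": "ATGGCTAGCTAGGATC",
    "strain_2": "ATGGCTAGCTAGGATT",
    "strain_3": "ATGACTAGCT-GGATT",
}
low, high = divergence_range(aligned)
print(f"pairwise divergence: {low:.2f}-{high:.2f}%")
```

    Amino acid divergence can be computed the same way on aligned protein sequences. Published analyses often apply a substitution-model correction (for example Kimura 2-parameter) rather than the raw p-distance shown here.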

  15. Full Genomic Characterization of a Saffold Virus Isolated in Peru

    Directory of Open Access Journals (Sweden)

    Mariana Leguia

    2015-11-01

    While studying respiratory infections of unknown etiology we detected Saffold virus in an oropharyngeal swab collected from a two-year-old female suffering from diarrhea and respiratory illness. The full viral genome recovered by deep sequencing showed 98% identity to a previously described Saffold strain isolated in Japan. Phylogenetic analysis confirmed the Peruvian Saffold strain belongs to genotype 3 and is most closely related to strains that have circulated in Asia. This is the first documented case report of Saffold virus in Peru and the only complete genomic characterization of a Saffold-3 isolate from the Americas.

  16. Genome based analysis of a novel Chloroflexi in full-scale anaerobic digesters treating waste activated sludge

    DEFF Research Database (Denmark)

    McIlroy, Simon Jon; Kirkegaard, Rasmus Hansen; Albertsen, Mads

    Key to optimised design and operation of full-scale anaerobic digesters is an understanding of the organisms responsible. As one of the most abundant phyla in these systems, the Chloroflexi likely make a substantial contribution to system function. Here we apply state-of-the-art molecular methods t...

  17. Full-length genome analysis of Čalovo strains of Batai orthobunyavirus (Bunyamwera serogroup): Implications to taxonomy

    Czech Academy of Sciences Publication Activity Database

    Dufková, L.; Pachler, K.; Kilian, Patrik; Chrudimský, Tomáš; Danielová, V.; Růžek, Daniel; Nowotny, N.

    2014-01-01

    Vol. 27, Oct 2014 (2014), pp. 96-104. ISSN 1567-1348. R&D Projects: GA ČR GAP502/11/2116; GA ČR(CZ) GA14-29256S. Institutional support: RVO:60077344. Keywords: Batai virus * Orthobunyavirus * Phylogenetic analysis * Calovo virus. Subject RIV: EE - Microbiology, Virology. Impact factor: 3.015, year: 2014

  18. Generation of recombinant pestiviruses using a full genome amplification strategy

    DEFF Research Database (Denmark)

    Rasmussen, Thomas Bruun; Reimann, Ilona; Uttenthal, Åse

    Aim: Complete genome amplification of viral RNA provides a new tool for the generation of modified pestiviruses. We have recently reported a full genome amplification strategy for direct recovery of infectious pestivirus (Rasmussen et al., 2008). This comprised rescue of BDV strain “Gifhorn” from a full-length RT-PCR amplicon, demonstrating that long RT-PCR can be used for direct generation of an infectious pestivirus. The strategy is not limited to amplification of BDV “Gifhorn”, but can be further utilized for amplification of a diverse selection of pestivirus strains and for the generation of modified ... was reverse transcribed to cDNA at 50 °C for 90 minutes using SuperScript III reverse transcriptase (Invitrogen). Full-length PCR amplification was performed using primers specific for the extreme 5’- and 3’-ends of the viral genomes. A T7 promoter was incorporated in the 5’-primers for direct in vitro ...

  19. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system.

    Science.gov (United States)

    Speth, Daan R; In 't Zandt, Michiel H; Guerrero-Cruz, Simon; Dutilh, Bas E; Jetten, Mike S M

    2016-03-31

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete draft genomes that together represent the majority of the microbial community. We assign these genomes to distinct anaerobic and aerobic microbial communities. In the aerobic community, nitrifying organisms and heterotrophs predominate. In the anaerobic community, widespread potential for partial denitrification suggests a nitrite loop increases treatment efficiency. Of our genomes, 19 have no previously cultivated or sequenced close relatives and six belong to bacterial phyla without any cultivated members, including the most complete Omnitrophica (formerly OP3) genome to date.

  20. Phylogenetic analysis of Puumala virus strains from Central Europe highlights the need for a full-genome perspective on hantavirus evolution.

    Science.gov (United States)

    Szabó, Róbert; Radosa, Lukáš; Ličková, Martina; Sláviková, Monika; Heroldová, Marta; Stanko, Michal; Pejčoch, Milan; Osterberg, Anja; Laenen, Lies; Schex, Susanne; Ulrich, Rainer G; Essbauer, Sandra; Maes, Piet; Klempa, Boris

    2017-12-01

    Puumala virus (PUUV), carried by bank voles (Myodes glareolus), is the medically most important hantavirus in Central and Western Europe. In this study, a total of 523 bank voles (408 from Germany, 72 from Slovakia, and 43 from the Czech Republic) collected between 2007 and 2012 were analyzed for the presence of hantavirus RNA. Partial PUUV genome segment sequences were obtained from 51 voles. Phylogenetic analyses of all three genome segments showed that the newfound strains cluster with other Central and Western European PUUV strains. The new sequences from Šumava (Bohemian Forest), Czech Republic, are most closely related to the strains from the neighboring Bavarian Forest, a known hantavirus disease outbreak region. Interestingly, the Slovak strains clustered with the sequences from Bohemian and Bavarian Forests only in the M but not S segment analyses. This well-supported topological incongruence suggests a segment reassortment event or, as we analyzed only partial sequences, homologous recombination. Our data highlight the necessity of sequencing all three hantavirus genome segments and of broader bank vole screening, not only in recognized endemic foci but also in regions with no reported human hantavirus disease cases.

  1. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system

    OpenAIRE

    Speth, D.R.; Zandt, M.H. in 't; Guerrero Cruz, S.; Dutilh, B.E.; Jetten, M.S.M.

    2016-01-01

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is used to seed reactors in wastewater treatment plants around the world; however, the role of most of its microbial community in ammonium removal remains unknown. Our analysis yielded 23 near-complete d...

  2. Full-length genome sequences of porcine epidemic diarrhoea virus strain CV777; Use of NGS to analyse genomic and sub-genomic RNAs

    DEFF Research Database (Denmark)

    Rasmussen, Thomas Bruun; Boniotti, Maria Beatrice; Papetti, Alice

    2018-01-01

    Porcine epidemic diarrhoea virus (PEDV), strain CV777, was initially characterized in 1978 as the causative agent of a disease first identified in the UK in 1971. This coronavirus has been widely distributed among laboratories and has been passaged both within pigs and in cell culture. To determine ... the variability between different stocks of the PEDV strain CV777, the full-length genome (ca. 28 kb) has been sequenced in 6 different laboratories, using different protocols. Not surprisingly, each of the full genome sequences was distinct from the others and from the reference sequence ... the analysis of sub-genomic mRNAs from infected cells. It is clearly important to know the features of the specific sample of CV777 being used for experimental studies.

  3. Comparative analysis of the full genome sequence of European bat lyssavirus type 1 and type 2 with other lyssaviruses and evidence for a conserved transcription termination and polyadenylation motif in the G-L 3' non-translated region.

    Science.gov (United States)

    Marston, D A; McElhinney, L M; Johnson, N; Müller, T; Conzelmann, K K; Tordo, N; Fooks, A R

    2007-04-01

    We report the first full-length genomic sequences for European bat lyssavirus type-1 (EBLV-1) and type-2 (EBLV-2). The EBLV-1 genomic sequence was derived from a virus isolated from a serotine bat in Hamburg, Germany, in 1968 and the EBLV-2 sequence was derived from a virus isolate from a human case of rabies that occurred in Scotland in 2002. A long-distance PCR strategy was used to amplify the open reading frames (ORFs), followed by standard and modified RACE (rapid amplification of cDNA ends) techniques to amplify the 3' and 5' ends. The lengths of the complete viral genomes for EBLV-1 and EBLV-2 were 11,966 and 11,930 base pairs, respectively, and follow the standard rhabdovirus genome organization of five viral proteins. Comparison with other lyssavirus sequences demonstrates variation in degrees of homology, with the genomic termini showing a high degree of complementarity. The nucleoprotein was the most conserved, both intra- and intergenotypically, followed by the polymerase (L), matrix and glycoproteins, with the phosphoprotein being the most variable. In addition, we have shown that the two EBLVs utilize a conserved transcription termination and polyadenylation (TTP) motif, approximately 50 nt upstream of the L gene start codon. All available lyssavirus sequences to date, with the exception of Pasteur virus (PV) and PV-derived isolates, use the second TTP site. This observation may explain differences in pathogenicity between lyssavirus strains, dependent on the length of the untranslated region, which might affect transcriptional activity and RNA stability.

  4. Comparative Genome Analysis and Genome Evolution

    NARCIS (Netherlands)

    Snel, Berend

    2002-01-01

    This thesis describes a collection of bioinformatic analyses on complete genome sequence data. We have studied the evolution of gene content and find that vertical inheritance dominates over horizontal gene transfer, even to the extent that we can use the gene content to make genome phylogenies.

  5. Genome-based microbial ecology of anammox granules in a full-scale wastewater treatment system

    NARCIS (Netherlands)

    Speth, D.R.; Zandt, M.H. in 't; Guerrero Cruz, S.; Dutilh, B.E.; Jetten, M.S.M.

    2016-01-01

    Partial-nitritation anammox (PNA) is a novel wastewater treatment procedure for energy-efficient ammonium removal. Here we use genome-resolved metagenomics to build a genome-based ecological model of the microbial community in a full-scale PNA reactor. Sludge from the bioreactor examined here is

  6. Resolution analysis in full waveform inversion

    NARCIS (Netherlands)

    Fichtner, A.; Trampert, J.

    2011-01-01

    We propose a new method for the quantitative resolution analysis in full seismic waveform inversion that overcomes the limitations of classical synthetic inversions while being computationally more efficient and applicable to any misfit measure. The method rests on (1) the local quadratic

  7. Evidence for a Complex Mosaic Genome Pattern in a Full-length Hepatitis C Virus Sequence

    Directory of Open Access Journals (Sweden)

    R.S. Ross

    2008-01-01

    The genome of the hepatitis C virus (HCV) exhibits high genetic variability. This remarkable heterogeneity is mainly attributed to the gradual accumulation of mutational changes, whereas the contribution of recombination events to the evolution of HCV remains controversial. While performing phylogenetic analyses including a large number of sequences deposited in GenBank, we encountered a full-length HCV sequence (AY651061) that showed evidence of inter-subtype recombination and was, therefore, subjected to a detailed analysis of its molecular structure. The results indicated that AY651061 does not represent a “simple” HCV 1c isolate, but a complex 1a/1c mosaic genome, showing five putative breakpoints in the core to NS3 regions. To our knowledge, this is the first report of a mosaic HCV full-length sequence with multiple breakpoints. The molecular structure of AY651061 is reminiscent of complex homologous recombinant variants occurring among other members of the Flaviviridae family, e.g. GB virus C, dengue virus, and Japanese encephalitis virus. Our finding of a mosaic HCV sequence may have important implications for many fields of current HCV research, which merit careful consideration.
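    The abstract does not say which recombination-detection method revealed the breakpoints in AY651061. A common, simplified idea behind such analyses is a sliding-window comparison: for each window of an alignment, ask which candidate parent (here, hypothetically, a 1a and a 1c reference) the query resembles more, and flag positions where that answer switches. The sketch below assumes the three sequences are already aligned to equal length. It uses plain percent identity, not bootscanning or a phylogenetic test, and all names are hypothetical.

```python
from typing import List, Tuple


def identity(a: str, b: str) -> float:
    """Proportion of identical positions between two aligned strings, ignoring gaps."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x == y for x, y in pairs) / len(pairs)


def closest_reference_by_window(query: str, ref_a: str, ref_b: str,
                                window: int, step: int) -> List[Tuple[int, str]]:
    """Label each window of the query with the more similar reference ("A", "B" or "tie")."""
    if not (len(query) == len(ref_a) == len(ref_b)):
        raise ValueError("sequences must be pre-aligned to equal length")
    calls = []
    for start in range(0, len(query) - window + 1, step):
        end = start + window
        id_a = identity(query[start:end], ref_a[start:end])
        id_b = identity(query[start:end], ref_b[start:end])
        label = "A" if id_a > id_b else "B" if id_b > id_a else "tie"
        calls.append((start, label))
    return calls


def candidate_breakpoints(calls: List[Tuple[int, str]]) -> List[int]:
    """Window start positions where the closest reference switches (ties are skipped)."""
    breaks, previous = [], None
    for start, label in calls:
        if label == "tie":
            continue
        if previous is not None and label != previous:
            breaks.append(start)
        previous = label
    return breaks


# Toy alignment: the query matches reference A in its first half and B in its second.
calls = closest_reference_by_window("AAAAACCCCC", "AAAAAAAAAA", "CCCCCCCCCC", window=2, step=1)
print(candidate_breakpoints(calls))  # [5]
```

    Each reported position is the start of the first window assigned to the other reference, so breakpoint locations are only approximate; their precision depends on the window and step sizes.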

  8. Improvement of genome assembly completeness and identification of novel full-length protein-coding genes by RNA-seq in the giant panda genome.

    Science.gov (United States)

    Chen, Meili; Hu, Yibo; Liu, Jingxing; Wu, Qi; Zhang, Chenglin; Yu, Jun; Xiao, Jingfa; Wei, Fuwen; Wu, Jiayan

    2015-12-11

    High-quality and complete gene models are the basis of whole-genome analyses. The giant panda (Ailuropoda melanoleuca) genome was the first genome sequenced solely from short reads, but its annotation lacked support from transcriptomic evidence. In this study, we applied RNA-seq to globally improve genome assembly completeness and to detect novel expressed transcripts in 12 tissues from giant pandas, using a transcriptome reconstruction strategy that combined reference-based and de novo methods. Several aspects of genome assembly completeness in the transcribed regions were effectively improved by the de novo assembled transcripts, including genome scaffolding, the detection of small assembly errors, the extension of scaffold/contig boundaries, and gap closure. Through expression and homology validation, we detected three groups of novel full-length protein-coding genes. A total of 12.62% of the novel protein-coding genes were validated by proteomic data. GO annotation analysis showed that some of the novel protein-coding genes were involved in pigmentation, anatomical structure formation and reproduction, which might be related to the development and evolution of the black-and-white pelage, pseudo-thumb and delayed embryonic implantation of giant pandas. The updated genome annotation will help further giant panda studies from both structural and functional perspectives.

  9. Full spectrum analysis in environmental monitoring

    International Nuclear Information System (INIS)

    Reinhardt, Sascha; Hartmann, Soeren; Pimpl, Richard

    2015-01-01

    In environmental monitoring, spectroscopic gamma detectors are frequently used because of their higher sensitivity and the specific spectral information they provide. Analyses often use the photopeaks of the gamma spectrum to identify nuclides. These methods are very reliable, robust and well developed, but using only the photopeaks means using only a fraction of the available information. A full spectrum analysis, based on principal components obtained from NASVD to describe the radiation background together with adjustment calculations, is a possible analysis method that may offer advantages over a peak-based analysis for continuous environmental monitoring. An analysis example is shown and discussed using a measured time series of gamma spectra obtained from a SARA IGS710 spectroscopic gamma detector with a NaI(Tl) scintillator, as used in environmental monitoring.

  10. Full spectrum analysis in environmental monitoring

    International Nuclear Information System (INIS)

    Reinhardt, S.; Hartmann, S.; Pimpl, R.

    2014-01-01

    In environmental monitoring, spectroscopic gamma detectors are frequently used because of their higher sensitivity and the specific spectral information they provide. Analyses often use the photopeaks of the gamma spectrum to identify nuclides. These methods are very reliable, robust and well developed, but using only the photopeaks means using only a fraction of the available information. A full spectrum analysis, based on principal components obtained from NASVD to describe the radiation background together with adjustment calculations, is a possible analysis method that may offer advantages over a peak-based analysis for continuous environmental monitoring. An analysis example is shown and discussed using a measured time series of gamma spectra obtained from a SARA IGS710 spectroscopic gamma detector with a NaI(Tl) scintillator, as used in environmental monitoring. (authors)
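    Both records describe full spectrum analysis built on principal components from NASVD (noise-adjusted singular value decomposition). The sketch below shows the general idea under stated assumptions: spectra are stacked into a matrix, each energy channel is scaled by the square root of the mean spectrum to roughly equalise Poisson counting noise, and only the leading SVD components are kept. It is not the authors' implementation; the scaling choice, the function name and the toy spectra are assumptions.

```python
import numpy as np


def nasvd_reconstruct(spectra: np.ndarray, n_components: int) -> np.ndarray:
    """Rebuild gamma spectra from their leading noise-adjusted SVD components.

    spectra: array of shape (n_spectra, n_channels) holding raw counts.
    The sqrt(mean spectrum) scaling is an assumed, commonly used noise adjustment.
    """
    mean_spectrum = spectra.mean(axis=0)
    # Counting noise is roughly Poisson, so its variance scales with the expected
    # count; dividing each channel by sqrt(mean) approximately equalises it.
    scale = np.sqrt(np.where(mean_spectrum > 0, mean_spectrum, 1.0))
    adjusted = spectra / scale
    u, s, vt = np.linalg.svd(adjusted, full_matrices=False)
    k = n_components
    # Keep the k strongest components; the remainder is treated as noise.
    smoothed = (u[:, :k] * s[:k]) @ vt[:k, :]
    return smoothed * scale


rng = np.random.default_rng(0)
shape = np.array([5.0, 40.0, 120.0, 60.0, 15.0, 8.0])   # hypothetical background shape
intensity = rng.uniform(0.5, 1.5, size=(30, 1))          # varying background level
measured = rng.poisson(intensity * shape).astype(float)
print(nasvd_reconstruct(measured, n_components=1).shape)  # (30, 6)
```

    With all components kept, the input is reproduced exactly; keeping fewer gives a smoothed background description that a full spectrum analysis could then use as its background model.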

  11. The integrated microbial genome resource of analysis.

    Science.gov (United States)

    Checcucci, Alice; Mengoni, Alessio

    2015-01-01

    Integrated Microbial Genomes and Metagenomes (IMG) is a biocomputational system that provides information and support for the annotation and comparative analysis of microbial genomes and metagenomes. IMG has been developed by the US Department of Energy (DOE) Joint Genome Institute (JGI). The IMG platform contains both draft and complete genomes, sequenced by the JGI, as well as other publicly available genomes. Genomes of strains belonging to the Archaea, Bacteria, and Eukarya domains are present, as well as those of viruses and plasmids. Here, we describe some essential features of the IMG system and present a case study of pangenome analysis.

  12. Identification of a contemporary human parechovirus type 1 by VIDISCA and characterisation of its full genome

    Directory of Open Access Journals (Sweden)

    Drexler Jan

    2008-02-01

    Background: Enteritis is caused by a spectrum of viruses that is most likely not fully characterised. When stool samples are tested by cell culture, virus isolates are sometimes obtained that cannot be typed by current methods. In this study we applied VIDISCA, a virus identification method that has not yet been widely used, to such an untyped virus isolate. Results: We found a human parechovirus (HPeV) type 1 (strain designation: BNI-788st). Because genomes of contemporary HPeV1 were not available, we determined its complete genome sequence. We found that the novel strain was likely the result of recombination between the structural protein genes of an ancestor of contemporary HPeV1 strains and non-structural protein genes from an unknown ancestor, most closely related to HPeV3. In contrast to the non-structural protein genes of other HPeV prototype strains, the non-structural protein genes of BNI-788st and HPeV3 prototype strains did not co-segregate in bootscan analysis with those of other prototype strains. Conclusion: HPeV3 non-structural protein genes may form a distinct element in a pool of circulating HPeV non-structural protein genes. More research into the complex evolution of HPeV is required to connect virus ecology with disease patterns in humans.

  13. Mathematical Analysis of Genomic Evolution

    Directory of Open Access Journals (Sweden)

    Cedric Green

    2011-01-01

    Changes in nucleotide sequences, or mutations, accumulate from generation to generation in the genomes of all living organisms. Mutations can be advantageous, deleterious, or neutral. The goal of this project is to determine the number of advantageous mutations required to arrive at human (Homo sapiens) DNA from the DNA of genetically distinct organisms. We do this by collecting the genomic data of such organisms and estimating the number of mutations it takes to transform yeast (Saccharomyces cerevisiae) DNA into human DNA. We calculate the typical number of mutations occurring annually from the organism's average life span and average mutation rate. This allows us to determine the total number of mutations as well as the probability of advantageous mutations. Not surprisingly, this probability proves to be fairly small. A more precise estimate can be obtained by accounting for differences in chromosomal structure and phenomena such as horizontal gene transfer.

  14. Comparative genome analysis of Basidiomycete fungi

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert; Salamov, Asaf; Henrissat, Bernard; Nagy, Laszlo; Brown, Daren; Held, Benjamin; Baker, Scott; Blanchette, Robert; Boussau, Bastien; Doty, Sharon L.; Fagnan, Kirsten; Floudas, Dimitris; Levasseur, Anthony; Manning, Gerard; Martin, Francis; Morin, Emmanuelle; Otillar, Robert; Pisabarro, Antonio; Walton, Jonathan; Wolfe, Ken; Hibbett, David; Grigoriev, Igor

    2013-08-07

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37% of the described fungi and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes symbionts, pathogens, and saprotrophs, including the majority of wood-decaying and ectomycorrhizal species. To better understand the genetic diversity of this phylum, we compared the genomes of 35 basidiomycetes, including 6 newly sequenced genomes. These genomes span extremes of genome size, gene number, and repeat content. Analysis of core genes reveals that some 48% of basidiomycete proteins are unique to the phylum, with nearly half of those (22%) found in only one organism. Correlations between lifestyle and certain gene families are evident. Phylogenetic patterns of plant biomass-degrading genes in Agaricomycotina suggest a continuum rather than a dichotomy between the white rot and brown rot modes of wood decay. Based on a phylogenetically informed principal component analysis (PCA) of wood decay genes, we predict that Botryobasidium botryosum and Jaapia argillacea have properties similar to white rot species, although neither has the typical ligninolytic class II fungal peroxidases (PODs). This prediction is supported by growth assays in which both fungi exhibit wood decay with white rot-like characteristics. Based on this, we suggest that the white/brown rot dichotomy may be inadequate to describe the full range of wood-decaying fungi. Analysis of the rate of discovery of proteins with no or few homologs suggests the value of continued sequencing of basidiomycete fungi.

  15. Genome analysis methods - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods. Data name: Genome analysis methods. DOI: 10.18908/lsdba.nbdc01194-01-005. Description of data contents: the current status of, and related information on, the genomic analysis of each organism (March 2014). In the case of organisms for which genomic analysis has been carried out, the d... File name: pgdbj_dna_marker_linkage_map_genome_analysis_methods_en.zip. File URL: ftp://ftp.biosciencedbc.j

  16. Generation of recombinant pestiviruses using a full-genome amplification strategy

    DEFF Research Database (Denmark)

    Rasmussen, Thomas Bruun; Reimann, I.; Uttenthal, Åse

    2010-01-01

    ...-Gifhorn genome was generated by long RT-PCR, and RNA transcripts derived from this amplicon were then used to rescue infectious virus. Here, we have used this full-genome amplification strategy for efficient and robust amplification of three additional pestivirus strains: the vaccine strain C and the virulent ... Paderborn strain of Classical swine fever virus, plus the CP7 strain of Bovine viral diarrhoea virus. The amplicons were cloned directly into a stable single-copy bacterial artificial chromosome, generating full-length pestivirus DNAs from which infectious RNA transcripts could also be derived.

  17. Rapid CRISPR/Cas9-Mediated Cloning of Full-Length Epstein-Barr Virus Genomes from Latently Infected Cells

    Directory of Open Access Journals (Sweden)

    Misako Yajima

    2018-04-01

    Herpesviruses have relatively large DNA genomes of more than 150 kb that are difficult to clone and sequence. Bacterial artificial chromosome (BAC) cloning of herpesvirus genomes is a powerful technique that greatly facilitates whole viral genome sequencing as well as functional characterization of reconstituted viruses. We describe recently developed technologies for rapid BAC cloning of herpesvirus genomes using CRISPR/Cas9-mediated homology-directed repair. We focus on recent BAC cloning techniques for Epstein-Barr virus (EBV) genomes and discuss the possible advantages of a CRISPR/Cas9-mediated strategy compared with previous EBV-BAC cloning strategies. We also describe the design decisions behind this technology, as well as possible pitfalls and points to be improved in the future. The obtained EBV-BAC clones are subjected to long-read sequencing analysis to determine the complete EBV genome sequence, including repetitive regions. Rapid cloning and sequence determination of various EBV strains will greatly contribute to understanding their global geographical distribution. This technology can also be used to clone disease-associated EBV strains and test the hypothesis that they have special features that distinguish them from strains that infect asymptomatically.

  18. Phylogenomic Analysis and Dynamic Evolution of Chloroplast Genomes in Salicaceae

    Directory of Open Access Journals (Sweden)

    Yuan Huang

    2017-06-01

    Chloroplast genomes of plants are highly conserved in both gene order and gene content. Analysis of the whole chloroplast genome is known to provide many more informative DNA sites and thus high resolution for plant phylogenies. Here, we report the complete chloroplast genomes of three Salix species in the family Salicaceae. The phylogeny of Salicaceae inferred from complete chloroplast genomes is generally consistent with previous studies but resolved with higher statistical support. Phylogenetic incongruences, however, are observed in the genus Populus, most likely resulting from homoplasy. By comparing the three Salix chloroplast genomes with the published chloroplast genomes of other Salicaceae species, we demonstrate that the synteny and length of chloroplast genomes in Salicaceae are highly conserved but have experienced dynamic evolution among species. We identify seven positively selected chloroplast genes in Salicaceae, which might be related to the adaptive evolution of Salicaceae species. Comparative chloroplast genome analysis within the family also indicates that some chloroplast genes have been lost or have become pseudogenes, suggesting that these genes were horizontally transferred to the nuclear genome. Based on the complete nuclear genome sequences of two Salicaceae species, we find that the entire chloroplast genome has been transferred to and integrated into the nuclear genome at least once in the individual used for the P. trichocarpa reference genome. This observation, along with the presence of large nuclear plastid DNA segments (NUPTs), including NUPTs containing multiple chloroplast genes in their original chloroplast order, favors the DNA-mediated hypothesis of organelle-to-nucleus DNA transfer. Overall, the phylogenomic analysis using complete chloroplast genomes clearly elucidates the phylogeny of Salicaceae. The identification of positively selected chloroplast genes and dynamic chloroplast-to-nucleus gene transfers in

  19. Full Genome Sequence and sfRNA Interferon Antagonist Activity of Zika Virus from Recife, Brazil.

    Directory of Open Access Journals (Sweden)

    Claire L Donald

    2016-10-01

    The outbreak of Zika virus (ZIKV) in the Americas has transformed a previously obscure mosquito-transmitted arbovirus of the Flaviviridae family into a major public health concern. Little is currently known about the evolution and biology of ZIKV and the factors that contribute to the associated pathogenesis. Determining the genomic sequences of clinical viral isolates and characterizing the elements within them is an important prerequisite for advancing our understanding of viral replicative processes and virus-host interactions. We obtained a ZIKV isolate from a patient who presented with classical ZIKV-associated symptoms, and used high-throughput sequencing and other molecular biology approaches to determine its full genome sequence, including non-coding regions. Genome regions were characterized and compared to the sequences of other isolates where available. Furthermore, we identified a subgenomic flavivirus RNA (sfRNA) in ZIKV-infected cells that has antagonist activity against RIG-I-induced type I interferon induction, with a lesser effect on MDA-5-mediated action. The full-length genome sequence, including non-coding regions, of a South American ZIKV isolate from a patient with classical symptoms will support efforts to develop genetic tools for this virus. Detection of an sfRNA that counteracts interferon responses is likely to be important for further understanding of pathogenesis and virus-host interactions.

  20. Barcode server: a visualization-based genome analysis system.

    Directory of Open Access Journals (Sweden)

    Fenglou Mao

    We have previously developed a computational method for representing a genome as a barcode image, which makes various genomic features visually apparent. We have demonstrated that this visual capability makes some challenging genome analysis problems relatively easy to solve. We have applied this capability to a number of challenging problems, including (a) identification of horizontally transferred genes, (b) identification of genomic islands with special properties and (c) binning of metagenomic sequences, and achieved highly encouraging results. These results inspired us to develop this barcode-based genome analysis server as a public service, which supports the following capabilities: (a) calculation of the k-mer based barcode image for a provided DNA sequence; (b) detection of sequence fragments in a given genome whose barcodes are distinct from those of the majority of the genome; (c) clustering of provided DNA sequences into groups having similar barcodes; and (d) homology-based search using BLAST against a genome database for any selected genomic regions deemed to have interesting barcodes. The barcode server provides a job management capability, allowing processing of a large number of analysis jobs for barcode-based comparative genome analyses. The barcode server is accessible at http://csbl1.bmb.uga.edu/Barcode.
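    The abstract does not specify how the server encodes a barcode. The sketch below makes two assumptions: each barcode row is the k-mer frequency profile of one fixed-length window, and a window counts as "distinct" when its profile lies farther than a chosen Euclidean distance from the genome-average profile. The function names, window size and threshold are hypothetical.

```python
import math
from itertools import product
from typing import List


def kmer_profile(seq: str, k: int) -> List[float]:
    """Relative frequencies of all 4**k DNA k-mers in seq (k-mers with other symbols are skipped)."""
    index = {"".join(p): i for i, p in enumerate(product("ACGT", repeat=k))}
    counts = [0] * len(index)
    for i in range(len(seq) - k + 1):
        idx = index.get(seq[i:i + k])
        if idx is not None:
            counts[idx] += 1
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def barcode(seq: str, k: int, window: int) -> List[List[float]]:
    """One k-mer profile per non-overlapping window: one row of the hypothetical barcode."""
    seq = seq.upper()
    return [kmer_profile(seq[s:s + window], k)
            for s in range(0, len(seq) - window + 1, window)]


def atypical_windows(rows: List[List[float]], threshold: float) -> List[int]:
    """Indices of windows whose profile lies far (Euclidean) from the average profile."""
    mean = [sum(col) / len(rows) for col in zip(*rows)]
    return [i for i, row in enumerate(rows) if math.dist(row, mean) > threshold]


genome = "AT" * 50 + "GC" * 25 + "AT" * 50   # toy sequence with a GC-rich insert
print(atypical_windows(barcode(genome, k=2, window=50), threshold=0.5))  # [2]
```

    Here the GC-rich insert (window index 2) stands out against the AT background, which is the kind of compositional signal used to flag candidate horizontally transferred fragments.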

  1. Analysis of ORF5 and Full-Length Genome Sequences of Porcine Reproductive and Respiratory Syndrome Virus Isolates of Genotypes 1 and 2 Retrieved Worldwide Provides Evidence that Recombination Is a Common Phenomenon and May Produce Mosaic Isolates

    DEFF Research Database (Denmark)

    Martín-Valls, G. E.; Kvisgaard, Lise Kirstine; Tello, M.

    2014-01-01

    Recombination is currently recognized as a factor for high genetic diversity, but the frequency of such recombination events and the genome segments involved are not well known. In the present study, we initially focused on the detection of recombinant porcine reproductive and respiratory syndrom...

  2. Comparison of two Next Generation sequencing platforms for full genome sequencing of Classical Swine Fever Virus

    DEFF Research Database (Denmark)

    Fahnøe, Ulrik; Pedersen, Anders Gorm; Höper, Dirk

    2013-01-01

    Next Generation Sequencing (NGS) is being increasingly adopted in viral research and will be the preferred technology in the years to come. We have recently sequenced several strains of Classical Swine Fever Virus (CSFV) by NGS on both the Genome Sequencer FLX (GS FLX) and Ion Torrent PGM platforms... to the consensus sequence. Additionally, we obtained an average sequence depth across the genome of 4000 for the Ion Torrent PGM and 400 for the FLX platform, making the mapping suitable for single nucleotide variant (SNV) detection. The analysis revealed a single non-silent SNV, A10665G, leading to the amino acid change D...
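    The record reports average depths of 4000 (Ion Torrent PGM) and 400 (GS FLX) and names one variant, A10665G, in the reference-base/position/alternate-base convention. The sketch below shows how an SNV might be called from the bases observed at a single position. The thresholds `min_depth` and `min_freq` are hypothetical, not the authors' settings, and `observed_bases` is a plain string of read bases rather than a real pileup format.

```python
from collections import Counter
from typing import Optional


def call_snv(position: int, ref_base: str, observed_bases: str,
             min_depth: int = 100, min_freq: float = 0.05) -> Optional[dict]:
    """Call the most frequent non-reference base at one position, if well supported.

    observed_bases: read bases covering the position (simplified, hypothetical input).
    min_depth, min_freq: illustrative thresholds only.
    """
    ref_base = ref_base.upper()
    counts = Counter(b for b in observed_bases.upper() if b in "ACGT")
    depth = sum(counts.values())
    if depth < min_depth:
        return None  # too little coverage to call anything
    alt, alt_count = max(((b, c) for b, c in counts.items() if b != ref_base),
                         key=lambda bc: bc[1], default=(None, 0))
    if alt is None or alt_count / depth < min_freq:
        return None
    return {"name": f"{ref_base}{position}{alt}", "depth": depth,
            "alt_freq": alt_count / depth}


print(call_snv(10665, "A", "A" * 300 + "G" * 100))
# {'name': 'A10665G', 'depth': 400, 'alt_freq': 0.25}
```

    The depth guard matters because at low coverage a few sequencing errors can push a spurious base over the frequency threshold.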

  3. SIGMA: A System for Integrative Genomic Microarray Analysis of Cancer Genomes

    Directory of Open Access Journals (Sweden)

    Davies Jonathan J

    2006-12-01

    Background: The prevalence of high-resolution profiling of genomes has created a need for integrative analysis of information generated by multiple methodologies and platforms. Although the majority of data in the public domain are gene expression profiles, and expression analysis software is available, the increase in array CGH studies has enabled integration of high-throughput genomic and gene expression datasets. However, tools for direct mining and analysis of array CGH data are limited. Hence, there is a great need for analytical and display software tailored to cross-platform integrative analysis of cancer genomes. Results: We have created a user-friendly Java application to facilitate sophisticated visualization and analysis, such as cross-tumor and cross-platform comparisons. To demonstrate the utility of this software, we assembled array CGH data representing Affymetrix SNP chip, Stanford cDNA array and whole-genome tiling path array platforms for cross-comparison. This cancer genome database contains 267 profiles from commonly used cancer cell lines representing 14 different tissue types. Conclusion: In this study we have developed an application for the visualization and analysis of data from high-resolution array CGH platforms that can be adapted for analysis of multiple types of high-throughput genomic datasets. Furthermore, we invite researchers using array CGH technology to deposit both their raw and processed data, as this will be a continually expanding database of cancer genomes. This publicly available resource, the System for Integrative Genomic Microarray Analysis (SIGMA) of cancer genomes, can be accessed at http://sigma.bccrc.ca.

  4. Data on genome analysis of Bacillus velezensis LS69

    Directory of Open Access Journals (Sweden)

    Guoqiang Liu

    2017-08-01

    The data presented in this article are related to the published article entitled “Whole-genome sequencing of Bacillus velezensis LS69, a strain with a broad inhibitory spectrum against pathogenic bacteria” (Liu et al., 2017 [1]). Genome analysis revealed that B. velezensis LS69 has good potential for biocontrol and plant growth promotion. This article provides an extended analysis of the genetic islands, core genes and amylolysin loci of B. velezensis LS69.

  5. GenomePeek—an online tool for prokaryotic genome and metagenome analysis

    Directory of Open Access Journals (Sweden)

    Katelyn McNair

    2015-06-01

    As more and more prokaryotic sequencing takes place, a method to analyze these data quickly and accurately is needed. Previous tools are mainly designed for metagenomic analysis and have limitations, such as long runtimes and significant false-positive error rates. The online tool GenomePeek (edwards.sdsu.edu/GenomePeek) was developed to analyze both single-genome and metagenome sequencing files quickly and with low error rates. GenomePeek uses a sequence assembly approach in which reads matching a set of conserved genes are extracted, assembled and then aligned against a highly specific reference database. GenomePeek was found to be faster than traditional approaches while keeping error rates low, and it offers unique data visualization options.

  6. Use of Dried Blood Spots to Elucidate Full-Length Transmitted/Founder HIV-1 Genomes

    Directory of Open Access Journals (Sweden)

    Jesus F. Salazar-Gonzalez

    2016-07-01

    Background: Identification of HIV-1 genomes responsible for establishing clinical infection in newly infected individuals is fundamental to prevention and pathogenesis research. Processing, storage, and transportation of the clinical samples required to perform these virologic assays in resource-limited settings require challenging venipuncture and cold chain logistics. Here, we validate the use of dried blood spots (DBS) as a simple and convenient alternative to collecting and storing frozen plasma. Methods: We performed parallel nucleic acid extraction, single genome amplification (SGA), next generation sequencing (NGS), and phylogenetic analyses on plasma and DBS. Results: We demonstrated the capacity to extract viral RNA from DBS and perform SGA to infer the complete nucleotide sequence of the transmitted/founder (TF) HIV-1 envelope gene and full-length genome in two acutely infected individuals. Using both SGA and NGS methodologies, we showed that sequences generated from DBS and plasma display comparable phylogenetic patterns in both acute and chronic infection. SGA was successful on samples with a range of plasma viremia, including samples with as few as 1,700 copies/ml and an estimated ~50 viral copies per blood spot. Further, we demonstrated reproducible efficiency in gp160 env sequencing in DBS stored at ambient temperature for up to three weeks or at −20 °C for up to five months. Conclusions: These findings support the use of DBS as a practical and cost-effective alternative to frozen plasma for clinical trials and translational research conducted in resource-limited settings.

  7. Full-length genomic characterization and molecular evolution of canine parvovirus in China.

    Science.gov (United States)

    Zhou, Ling; Tang, Qinghai; Shi, Lijun; Kong, Miaomiao; Liang, Lin; Mao, Qianqian; Bu, Bin; Yao, Lunguang; Zhao, Kai; Cui, Shangjin; Leal, Élcio

    2016-06-01

    Canine parvovirus type 2 (CPV-2) can cause acute haemorrhagic enteritis in dogs and myocarditis in puppies. This disease has become one of the most serious infectious diseases of dogs. During 2014 in China, there were many cases of acute infectious diarrhoea in dogs. Some faecal samples were negative for the CPV-2 antigen based on a colloidal gold test strip but were positive based on PCR, and a viral strain was isolated from one such sample. The cytopathic effect on susceptible cells and the results of the immunoperoxidase monolayer assay, PCR, and sequencing indicated that the pathogen was CPV-2. The strain was named CPV-NY-14, and the full-length genome was sequenced and analysed. A maximum likelihood tree was constructed using the full-length genome and all available CPV-2 genomes. New strains have replaced the original strain in Taiwan and Italy, although the CPV-2a strain is still predominant there. However, CPV-2a still causes many cases of acute infectious diarrhoea in dogs in China.

  8. Genetic characterization of human herpesvirus type 1: Full-length genome sequence of strain obtained from an encephalitis case from India

    Directory of Open Access Journals (Sweden)

    Vijay P Bondre

    2016-01-01

    Interpretation & conclusions: Our results showed that the full-length genome sequence generated from an Indian HSV-1 isolate shared close genetic relationship with the American KOS and Chinese CR38 strains which belonged to the Asian genetic lineage. Recombination analysis of Indian isolate demonstrated multiple recombination crossover points throughout the genome. This full-length genome sequence amplified from the Indian isolate would be helpful to study HSV evolution, genetic basis of differential pathogenesis, host-virus interactions and viral factors contributing towards differential clinical outcome in human infections.

  9. Full-length genomic sequence of hepatitis B virus genotype C2 isolated from a native Brazilian patient

    Directory of Open Access Journals (Sweden)

    Mónica Viviana Alvarado-Mora

    2011-06-01

    The hepatitis B virus (HBV) is among the leading causes of chronic hepatitis, cirrhosis and hepatocellular carcinoma. In Brazil, genotype A is the most frequent, followed by genotypes D and F. Genotypes B and C are found in Brazil exclusively among Asian patients and their descendants. The aim of this study was to sequence the entire HBV genome of a Caucasian patient infected with HBV/C2 and to infer the origin of the virus based on sequence analysis. The sequence of this Brazilian isolate clustered with four other sequences described in China. The sequence from this patient is the first complete HBV/C2 genome reported in Brazil.

  10. Full-length genome sequence analysis of an avian leukosis virus subgroup J (ALV-J) as contaminant in live poultry vaccine: The commercial live vaccines might be a potential route for ALV-J transmission.

    Science.gov (United States)

    Wang, P; Lin, L; Li, H; Shi, M; Gu, Z; Wei, P

    2018-02-25

    One avian leukosis virus subgroup J (ALV-J) strain, GX14YYA1, was isolated from 67 commercial live poultry vaccines produced by various manufacturers during 2013-2016 in China. The complete genome of the isolate was sequenced, and the gag and pol genes of the strain were found to be relatively conserved, while the gp85 gene of GX14YYA1 had the highest similarity to a field strain, GX14ZS14, which was isolated from chickens on a farm that had once used the same vaccine as the one found to be contaminated with GX14YYA1. This is the first report of an ALV-J contaminant in a live poultry vaccine in China. Our finding demonstrates that vaccination with commercial live vaccines might be a potential new route for ALV-J transmission in chickens and highlights the need for more extensive monitoring of commercial live vaccines in China. © 2018 Blackwell Verlag GmbH.

  11. Broad genomic and transcriptional analysis reveals a highly derived genome in dinoflagellate mitochondria

    Directory of Open Access Journals (Sweden)

    Keeling Patrick J

    2007-09-01

    Background: Dinoflagellates comprise an ecologically significant and diverse eukaryotic phylum that is sister to the phylum containing apicomplexan endoparasites. The mitochondrial genome of apicomplexans is uniquely reduced in gene content and size, encoding only three proteins and two ribosomal RNAs (rRNAs) within a highly compacted 6 kb DNA. Dinoflagellate mitochondrial genomes have been comparatively poorly studied: limited available data suggest some similarities with apicomplexan mitochondrial genomes but an even more radical type of genomic organization. Here, we investigate the structure, content and expression of dinoflagellate mitochondrial genomes. Results: From two dinoflagellates, Crypthecodinium cohnii and Karlodinium micrum, we generated over 42 kb of mitochondrial genomic data that indicate a reduced gene content paralleling that of mitochondrial genomes in apicomplexans, i.e., only three protein-encoding genes and at least eight conserved components of the highly fragmented large and small subunit rRNAs. Unlike in apicomplexans, dinoflagellate mitochondrial genes occur in multiple copies, often as gene fragments, and in numerous genomic contexts. Analysis of cDNAs suggests several novel aspects of dinoflagellate mitochondrial gene expression. Polycistronic transcripts were found, standard start codons are absent, and oligoadenylation occurs upstream of stop codons, resulting in the absence of termination codons. Transcripts of at least one gene, cox3, are apparently trans-spliced to generate full-length mRNAs. RNA substitutional editing, a process previously identified for mRNAs in dinoflagellate mitochondria, is also implicated in rRNA expression. Conclusion: The dinoflagellate mitochondrial genome shares the same gene complement and fragmentation of rRNA genes with its apicomplexan counterpart. However, it also exhibits several unique characteristics. Most notable are the expansion of gene copy numbers and their arrangements

  12. GWAMA: software for genome-wide association meta-analysis

    Directory of Open Access Journals (Sweden)

    Mägi Reedik

    2010-05-01

    Full Text Available Abstract Background Despite the recent success of genome-wide association studies in identifying novel loci that contribute to complex human traits, such as type 2 diabetes and obesity, much of the genetic component of variation in these phenotypes remains unexplained. One way to improve power to detect further novel loci is through meta-analysis of studies from the same population, increasing the sample size over any individual study. Although statistical analysis software packages incorporate routines for meta-analysis, they are ill equipped to meet the challenges of the scale and complexity of data generated in genome-wide association studies. Results We have developed flexible, open-source software for the meta-analysis of genome-wide association studies. The software incorporates a variety of error-trapping facilities and provides a range of meta-analysis summary statistics. The software is distributed with scripts that allow simple formatting of files containing the results of each association study and generate graphical summaries of genome-wide meta-analysis results. Conclusions The GWAMA (Genome-Wide Association Meta-Analysis) software has been developed to perform meta-analysis of summary statistics generated from genome-wide association studies of dichotomous phenotypes or quantitative traits. The software, with source files, documentation and example data files, is freely available online at http://www.well.ox.ac.uk/GWAMA.
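
    The abstract does not spell out GWAMA's formulas. As a minimal sketch, assuming the common inverse-variance fixed-effects model (a standard way to combine per-study summary statistics, not confirmed here as GWAMA's exact implementation), the pooled estimate for one SNP weights each study's effect by the inverse of its squared standard error. The function names and example numbers are hypothetical.

```python
import math


def fixed_effects_meta(betas, ses):
    """Inverse-variance fixed-effects meta-analysis for a single SNP.

    betas: per-study effect estimates (e.g. log odds ratios or regression slopes)
    ses:   matching per-study standard errors
    Returns (pooled_beta, pooled_se, z_score).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    weights = [1.0 / (se * se) for se in ses]  # w_i = 1 / se_i^2
    total_w = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    return pooled_beta, pooled_se, pooled_beta / pooled_se


def cochran_q(betas, ses, pooled_beta):
    """Cochran's Q heterogeneity statistic: sum_i w_i * (beta_i - pooled_beta)^2."""
    return sum((b - pooled_beta) ** 2 / (se * se) for b, se in zip(betas, ses))


if __name__ == "__main__":
    # Hypothetical summary statistics for one SNP from three studies.
    betas = [0.10, 0.14, 0.08]
    ses = [0.05, 0.04, 0.06]
    beta, se, z = fixed_effects_meta(betas, ses)
    q = cochran_q(betas, ses, beta)
    print(f"beta={beta:.4f} se={se:.4f} z={z:.2f} Q={q:.3f}")
```

    Studies with smaller standard errors (typically larger samples) dominate the pooled estimate, which is how combining studies gains power over any single one; a large Q flags SNPs whose effects disagree across studies.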

  13. Full genomic analysis of an influenza A (H1N2) virus identified during 2009 pandemic in Eastern India: evidence of reassortment event between co-circulating A(H1N1)pdm09 and A/Brisbane/10/2007-like H3N2 strains

    Directory of Open Access Journals (Sweden)

    Mukherjee Tapasi Roy

    2012-10-01

    Full Text Available Abstract Background During the pandemic [Influenza A(H1N1)pdm09] period in 2009-2010, an influenza A (Inf-A) virus with H1N2 subtype (designated as A/Eastern India/N-1289/2009) was detected in a 25-year-old male from Mizoram (North-eastern India). Objective To characterize the full genome of the H1N2 influenza virus. Methods For initial detection of influenza viruses, the matrix protein (M) gene of Inf-A and B viruses was amplified by real-time RT-PCR. Influenza A-positive viruses were then further subtyped with HA- and NA-gene-specific primers. Sequencing and phylogenetic analysis were performed for the H1N2 strain to understand its origin. Results The outcome of this full genome study revealed a unique reassortment event in which the N-1289 virus acquired its HA gene from a 2009 pandemic H1N1 virus of swine origin and the other genes from H3N2-like viruses of human origin. Conclusions This study provides information on the possibility of reassortment events occurring during the influenza season, when infectivity is high and two different subtypes of Inf-A viruses co-circulate in the same geographical location.

  14. Genome Sequencing and Analysis Conference IV

    Energy Technology Data Exchange (ETDEWEB)

    1993-12-31

    J. Craig Venter and C. Thomas Caskey co-chaired Genome Sequencing and Analysis Conference IV, held at Hilton Head, South Carolina, from September 26-30, 1992. Venter opened the conference by noting that approximately 400 researchers from 16 nations were present, four times as many participants as at Genome Sequencing Conference I in 1989. Venter also introduced the Data Fair, a new component of the conference allowing exchange and on-site computer analysis of unpublished sequence data.

  15. Big Data Analysis of Human Genome Variations

    KAUST Repository

    Gojobori, Takashi

    2016-01-25

    Since the draft human genome sequence was first made public in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples of large-scale studies of human genome variations: 1) HapMap Data (1,417 individuals) (http://hapmap.ncbi.nlm.nih.gov/downloads/genotypes/2010-08_phaseII+III/forward/), 2) HGDP (Human Genome Diversity Project) Data (940 individuals) (http://www.hagsc.org/hgdp/files.html), 3) 1000 Genomes Data (2,504 individuals) (http://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20130502/). If we can integrate all three data sets into a single volume of data, we should be able to conduct a more detailed analysis of human genome variations for a total of 4,861 individuals (= 1,417 + 940 + 2,504 individuals). In fact, we successfully integrated these three data sets using information on the reference human genome sequence, and we conducted the big data analysis. In particular, we constructed a phylogenetic tree of about 5,000 human individuals at the genome level. As a result, we were able to identify clusters of ethnic groups, with detectable admixture, that could not be identified by analyzing each of the three data sets separately. Here, we report the outcome of this kind of big data analysis and discuss the evolutionary significance of human genomic variations. Note that the present study was conducted in collaboration with Katsuhiko Mineta and Kosuke Goto at KAUST.

  16. Full-Genome Characterization and Genetic Evolution of West African Isolates of Bagaza Virus

    Directory of Open Access Journals (Sweden)

    Martin Faye

    2018-04-01

    Full Text Available Bagaza virus is a mosquito-borne flavivirus, first isolated in 1966 in the Central African Republic. It has so far been identified in mosquito pools collected in the field in West and Central Africa. Emergence in wild birds in Europe and serological evidence in encephalitis patients in India raise questions about its genetic evolution and the diversity of isolates circulating in Africa. To better understand the genetic diversity and evolution of Bagaza virus, we describe the full-genome characterization of 11 West African isolates, sampled from 1988 to 2014. Parameters such as genetic distances, N-glycosylation patterns, recombination events, selective pressures, and its codon adaptation to human genes are assessed. Our study is noteworthy for the observation of N-glycosylation and recombination in Bagaza virus and provides insight into its Indian origin in the 13th century. Interestingly, Bagaza virus also shows greater codon adaptation to human housekeeping genes than other flaviviruses well known for causing human infections. Genetic variation in the genome of West African Bagaza virus could play an important role in generating diversity and may promote Bagaza virus adaptation to other vertebrates, potentially making it an important threat to human health.

  17. Genome-wide identification of the regulatory targets of a transcription factor using biochemical characterization and computational genomic analysis

    Directory of Open Access Journals (Sweden)

    Jolly Emmitt R

    2005-11-01

    Full Text Available Abstract Background A major challenge in computational genomics is the development of methodologies that allow accurate genome-wide prediction of the regulatory targets of a transcription factor. We present a method for target identification that combines experimental characterization of binding requirements with computational genomic analysis. Results Our method identified potential target genes of the transcription factor Ndt80, a key transcriptional regulator involved in yeast sporulation, using the combined information of binding affinity, positional distribution, and conservation of the binding sites across multiple species. We have also developed a mathematical approach to compute the false positive rate and the total number of targets in the genome based on the multiple selection criteria. Conclusion We have shown that combining biochemical characterization and computational genomic analysis leads to accurate identification of the genome-wide targets of a transcription factor. The method can be extended to other transcription factors and can complement other genomic approaches to transcriptional regulation.

  18. Comparative Genome Analysis of Enterobacter cloacae

    Science.gov (United States)

    Liu, Wing-Yee; Wong, Chi-Fat; Chung, Karl Ming-Kar; Jiang, Jing-Wei; Leung, Frederick Chi-Ching

    2013-01-01

    The Enterobacter cloacae species includes an extremely diverse group of bacteria that are associated with plants, soil and humans. Publication of the complete genome sequence of the plant growth-promoting endophytic E. cloacae subsp. cloacae ENHKU01 provided an opportunity to perform the first comparative genome analysis between strains of this dynamic species. Examination of the pan-genome of E. cloacae showed that the conserved core genome retains the general physiological and survival genes of the species, while genomic factors in plasmids and variable regions determine the virulence of the human pathogenic E. cloacae strain; additionally, the diversity of fimbriae contributes to variation in colonization and host determination of different E. cloacae strains. Comparative genome analysis further illustrated that E. cloacae strains possess multiple mechanisms for antagonistic action against other microorganisms, which involve the production of siderophores and various antimicrobial compounds, such as bacteriocins, chitinases and antibiotic resistance proteins. The presence of Type VI secretion systems is expected to provide further fitness advantages for E. cloacae in microbial competition, thus allowing it to survive in different environments. Competition assays were performed to support our observations in genomic analysis, where E. cloacae subsp. cloacae ENHKU01 demonstrated antagonistic activities against a wide range of plant pathogenic fungal and bacterial species. PMID:24069314

  19. Microbial genome analysis: the COG approach.

    Science.gov (United States)

    Galperin, Michael Y; Kristensen, David M; Makarova, Kira S; Wolf, Yuri I; Koonin, Eugene V

    2017-09-14

    For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis. Published by Oxford University Press 2017. This work is written by US Government employees and is in the public domain in the US.

  20. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family

    Directory of Open Access Journals (Sweden)

    Velasco Riccardo

    2011-01-01

    Full Text Available Abstract Background Until now, comparative genome mapping studies in Rosaceae have been conducted by aligning genetic maps within the same genus, or closely related genera, using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. Results We generated a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. There were 784 markers in common between Malus and Prunus, and 148 between Malus and Fragaria. The correspondence between marker positions was high, and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. Conclusions A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified between Malus and Prunus was much higher than in any previously reported genome comparison in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae.

  1. Full genomic analysis of an influenza A (H1N2) virus identified during 2009 pandemic in Eastern India: evidence of reassortment event between co-circulating A(H1N1)pdm09 and A/Brisbane/10/2007-like H3N2 strains.

    Science.gov (United States)

    Mukherjee, Tapasi Roy; Agrawal, Anurodh S; Chakrabarti, Sekhar; Chawla-Sarkar, Mamta

    2012-10-11

    During the pandemic [Influenza A(H1N1)pdm09] period in 2009-2010, an influenza A (Inf-A) virus with H1N2 subtype (designated as A/Eastern India/N-1289/2009) was detected in a 25-year-old male from Mizoram (North-eastern India). The aim was to characterize the full genome of the H1N2 influenza virus. For initial detection of influenza viruses, the matrix protein (M) gene of Inf-A and B viruses was amplified by real-time RT-PCR. Influenza A-positive viruses were then further subtyped with HA- and NA-gene-specific primers. Sequencing and phylogenetic analysis were performed for the H1N2 strain to understand its origin. The outcome of this full genome study revealed a unique reassortment event in which the N-1289 virus acquired its HA gene from a 2009 pandemic H1N1 virus of swine origin and the other genes from H3N2-like viruses of human origin. This study provides information on the possibility of reassortment events occurring during the influenza season, when infectivity is high and two different subtypes of Inf-A viruses co-circulate in the same geographical location.

  2. Full genome sequences are key to disclose RHDV2 emergence in the Macaronesian islands.

    Science.gov (United States)

    Lopes, Ana M; Blanco-Aguiar, Jose; Martín-Alonso, Aaron; Leitão, Manuel; Foronda, Pilar; Mendes, Marco; Gonçalves, David; Abrantes, Joana; Esteves, Pedro J

    2018-02-01

    A recent publication by Carvalho et al. in "Virus Genes" (June 2017) reported the presence of the new variant of rabbit hemorrhagic disease virus (RHDV2) in the two larger islands of the archipelago of Madeira. Based on the capsid protein sequence, the authors suggested that the high sequence identity, along with the short time span between outbreaks, points to dissemination from Porto Santo to Madeira. By including information on the full RHDV2 genomes of strains from Azores, Madeira, and the Canary Islands, we confirm the results obtained by Carvalho et al., but further show that several subtypes of RHDV2 circulate in these islands: non-recombinant RHDV2 in the Canary Islands, G1/RHDV2 in Azores, Porto Santo and Madeira, and NP/RHDV2 also in Madeira. Here we conclude that RHDV2 has been independently introduced in these archipelagos, and that in Madeira at least two independent introductions must have occurred. We provide additional information on the dynamics of RHDV2 in the Macaronesian archipelagos of Azores, Madeira, and the Canary Islands and highlight the importance of analyzing the complete RHDV2 genome.

  3. Universal internucleotide statistics in full genomes: a footprint of the DNA structure and packaging?

    Directory of Open Access Journals (Sweden)

    Mikhail I Bogachev

    Full Text Available Uncovering the fundamental laws that govern the complex DNA structural organization remains challenging and is largely based upon reconstructions from the primary nucleotide sequences. Here we investigate the distributions of the internucleotide intervals and their persistence properties in complete genomes of various organisms from Archaea and Bacteria to H. sapiens, aiming to reveal the manifestation of the universal DNA architecture. We find that in all considered organisms the internucleotide interval distributions exhibit the same [Formula: see text]-exponential form. While in prokaryotes a single [Formula: see text]-exponential function makes the best fit, in eukaryotes the PDF additionally contains a second [Formula: see text]-exponential, which in the human genome makes a perfect approximation over nearly 10 decades. We suggest that this functional form is a footprint of the heterogeneous DNA structure, where the first [Formula: see text]-exponential reflects the universal helical pitch that appears both in pro- and eukaryotic DNA, while the second [Formula: see text]-exponential is a specific marker of the large-scale eukaryotic DNA organization.
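
    The abstract does not define its interval measure in detail. As a minimal sketch, assuming an internucleotide interval is the distance between consecutive occurrences of the same nucleotide, the empirical interval distribution could be collected as below before any curve fitting (the fitting step is not shown). The function names are hypothetical.

```python
from collections import Counter


def internucleotide_intervals(seq, base):
    """Distances between consecutive occurrences of `base` in `seq` (case-insensitive)."""
    seq = seq.upper()
    base = base.upper()
    positions = [i for i, ch in enumerate(seq) if ch == base]
    return [b - a for a, b in zip(positions, positions[1:])]


def empirical_pdf(intervals):
    """Relative frequency of each interval length, keyed in increasing length order."""
    counts = Counter(intervals)
    total = sum(counts.values())
    return {length: counts[length] / total for length in sorted(counts)}


if __name__ == "__main__":
    seq = "ACGTAAGACCTGA"  # toy sequence, not real genomic data
    gaps = internucleotide_intervals(seq, "A")
    print(gaps, empirical_pdf(gaps))
```

    For whole chromosomes one would stream the sequence (for example from a FASTA file) instead of holding every position in a list; the toy version keeps everything in memory for clarity.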

  4. Full Mitochondrial Genome Sequence of the Sugar Beet Wireworm Limonius californicus (Coleoptera: Elateridae), a Common Agricultural Pest.

    Science.gov (United States)

    Gerritsen, Alida T; New, Daniel D; Robison, Barrie D; Rashed, Arash; Hohenlohe, Paul; Forney, Larry; Rashidi, Mahnaz; Wilson, Cathy M; Settles, Matthew L

    2016-01-21

    We report here the full mitochondrial genome sequence of Limonius californicus, a species of click beetle that is an agricultural pest in its larval form. The circular genome is 16.5 kb and contains 13 protein-coding genes, 2 rRNA genes, and 22 tRNA genes. Copyright © 2016 Gerritsen et al.

  5. Full-length genome sequences of five hepatitis C virus isolates representing subtypes 3g, 3h, 3i and 3k, and a unique genotype 3 variant.

    Science.gov (United States)

    Lu, Ling; Li, Chunhua; Yuan, Jie; Lu, Teng; Okamoto, Hiroaki; Murphy, Donald G

    2013-03-01

    We characterized the full-length genomes of five distinct hepatitis C virus (HCV)-3 isolates. These represent the first complete genomes for subtypes 3g and 3h, the second such genomes for 3k and 3i, and of one novel variant presently not assigned to a subtype. Each genome was determined from 18-25 overlapping fragments. They had lengths of 9579-9660 nt and each contained a single ORF encoding 3020-3025 aa. They were isolated from five patients residing in Canada; four were of Asian origin and one was of Somali origin. Phylogenetic analysis using 64 partial NS5B sequences differentiated 10 assigned subtypes, 3a-3i and 3k, and two additional lineages within genotype 3. From the data of this study, HCV-3 full-length sequences are now available for six of the assigned subtypes and one unassigned. Our findings should add insights to HCV evolutionary studies and clinical applications.

  6. Exploratory analysis of genomic segmentations with Segtools

    Directory of Open Access Journals (Sweden)

    Buske Orion J

    2011-10-01

    Full Text Available Abstract Background As genome-wide experiments and annotations become more prevalent, researchers increasingly require tools to help interpret data at this scale. Many functional genomics experiments involve partitioning the genome into labeled segments, such that segments sharing the same label exhibit one or more biochemical or functional traits. For example, a collection of ChIP-seq experiments yields a compendium of peaks, each labeled with one or more associated DNA-binding proteins. Similarly, manually or automatically generated annotations of functional genomic elements, including cis-regulatory modules and protein-coding or RNA genes, can also be summarized as genomic segmentations. Results We present a software toolkit called Segtools that simplifies and automates the exploration of genomic segmentations. The software operates as a series of interacting tools, each of which provides one mode of summarization. These various tools can be pipelined and summarized in a single HTML page. We describe the Segtools toolkit and demonstrate its use in interpreting a collection of human histone modification data sets and Plasmodium falciparum local chromatin structure data sets. Conclusions Segtools provides a convenient, powerful means of interpreting a genomic segmentation.
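
    The abstract does not describe Segtools' individual commands. As a hypothetical illustration of one mode of summarization (not Segtools' API), the sketch below tallies, for each segment label, the number of segments, the bases they cover and their share of all segmented bases, assuming BED-style 0-based, half-open coordinates.

```python
from collections import defaultdict


def summarize_segments(segments):
    """Per-label segment counts and covered bases (hypothetical helper, not Segtools).

    segments: iterable of (chrom, start, end, label), 0-based half-open (BED-style).
    Returns {label: (n_segments, total_bp, fraction_of_all_segmented_bp)}.
    """
    counts = defaultdict(int)
    bases = defaultdict(int)
    for chrom, start, end, label in segments:
        if end <= start:
            raise ValueError(f"empty or inverted segment {chrom}:{start}-{end}")
        counts[label] += 1
        bases[label] += end - start
    grand_total = sum(bases.values())
    return {
        label: (counts[label], bases[label], bases[label] / grand_total)
        for label in sorted(counts)
    }


if __name__ == "__main__":
    # Hypothetical segmentation with two labels.
    segs = [
        ("chr1", 0, 100, "active"),
        ("chr1", 100, 150, "quiescent"),
        ("chr2", 10, 60, "active"),
    ]
    for label, (n, bp, frac) in summarize_segments(segs).items():
        print(f"{label}\t{n}\t{bp}\t{frac:.2f}")
```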

  7. Genomic analysis of Xenopus organizer function

    Directory of Open Access Journals (Sweden)

    Suhai Sándor

    2006-06-01

    Full Text Available Abstract Background Studies of the Xenopus organizer have laid the foundation for our understanding of the conserved signaling pathways that pattern vertebrate embryos during gastrulation. The two primary activities of the organizer, BMP and Wnt inhibition, can regulate a spectrum of genes that pattern essentially all aspects of the embryo during gastrulation. As our knowledge of organizer signaling grows, it is imperative that we begin knitting together our gene-level knowledge into genome-level signaling models. The goal of this paper was to identify complete lists of genes regulated by different aspects of organizer signaling, thereby providing a deeper understanding of the genomic mechanisms that underlie these complex and fundamental signaling events. Results To this end, we ectopically overexpress Noggin and Dkk-1, inhibitors of the BMP and Wnt pathways, respectively, within ventral tissues. After isolating embryonic ventral halves at early and late gastrulation, we analyze the transcriptional response to these molecules within the generated ectopic organizers using oligonucleotide microarrays. An efficient statistical analysis scheme, combined with a new Gene Ontology biological process annotation of the Xenopus genome, allows reliable and faithful clustering of molecules based upon their roles during gastrulation. From this data, we identify new organizer-related expression patterns for 19 genes. Moreover, our data sub-divides organizer genes into separate head and trunk organizing groups, which each show distinct responses to Noggin and Dkk-1 activity during gastrulation. Conclusion Our data provides a genomic view of the cohorts of genes that respond to Noggin and Dkk-1 activity, allowing us to separate the role of each in organizer function. These patterns demonstrate a model where BMP inhibition plays a largely inductive role during early developmental stages, thereby initiating the suites of genes needed to pattern dorsal tissues

  8. Comparative Pan-Genome Analysis of Piscirickettsia salmonis Reveals Genomic Divergences within Genogroups

    Directory of Open Access Journals (Sweden)

    Guillermo Nourdin-Galindo

    2017-10-01

    Full Text Available Piscirickettsia salmonis is the etiological agent of salmonid rickettsial septicemia, a disease that seriously affects the salmonid industry. Despite efforts to genomically characterize P. salmonis, functional information on the life cycle, pathogenesis mechanisms, diagnosis, treatment, and control of this fish pathogen remains lacking. To address this knowledge gap, the present study conducted an in silico pan-genome analysis of 19 P. salmonis strains from distinct geographic locations and genogroups. Results revealed an expected open pan-genome of 3,463 genes and a core-genome of 1,732 genes. Two marked genogroups were identified, as confirmed by phylogenetic and phylogenomic relationships to the LF-89 and EM-90 reference strains, as well as by assessments of genomic structures. Different structural configurations were found for the six identified copies of the ribosomal operon in the P. salmonis genome, indicating translocation throughout the genetic material. Chromosomal divergences in genomic localization and quantity of genetic cassettes were also found for the Dot/Icm type IVB secretion system. To determine divergences between core-genomes, additional pan-genome descriptions were compiled for the so-termed LF and EM genogroups. Open pan-genomes composed of 2,924 and 2,778 genes and core-genomes composed of 2,170 and 2,228 genes were respectively found for the LF and EM genogroups. The core-genomes were functionally annotated using the Gene Ontology, KEGG, and Virulence Factor databases, revealing the presence of several shared groups of genes related to basic functions of intracellular survival and bacterial pathogenesis. Additionally, the specific pan-genomes for the LF and EM genogroups were defined, resulting in the identification of 148 and 273 exclusive proteins, respectively. Notably, specific virulence factors linked to adherence, colonization, invasion factors, and endotoxins were established. The obtained data suggest that these
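
    The abstract reports pan- and core-genome sizes but not how they were computed. As a minimal sketch, assuming genes have already been clustered into orthologous families and each strain is reduced to the set of family IDs it carries (the clustering itself is the hard part and is not shown), the pan-genome, core genome, accessory genes and strain-specific genes reduce to set operations. Strain and family names here are hypothetical.

```python
def pan_core(gene_sets):
    """Pan/core-genome partition from per-strain gene-family sets.

    gene_sets: {strain_name: set_of_gene_family_ids}
    Returns (pan, core, accessory, specific), where `specific` maps each
    strain to the families found in no other strain.
    """
    if not gene_sets:
        raise ValueError("need at least one strain")
    sets = list(gene_sets.values())
    pan = set().union(*sets)        # present in at least one strain
    core = set.intersection(*sets)  # present in every strain
    accessory = pan - core          # present in some but not all strains
    specific = {}
    for strain, genes in gene_sets.items():
        others = set().union(*(g for s, g in gene_sets.items() if s != strain))
        specific[strain] = genes - others
    return pan, core, accessory, specific


if __name__ == "__main__":
    strains = {
        "strain_A": {"f1", "f2", "f3"},
        "strain_B": {"f1", "f2", "f4"},
        "strain_C": {"f1", "f2", "f3", "f5"},
    }
    pan, core, accessory, specific = pan_core(strains)
    print(len(pan), len(core), sorted(accessory), specific)
```

    Genogroup-level figures of the kind reported for LF and EM would come from running the same partition on each genogroup's strains separately, with "exclusive" families being those in one group's pan-genome but absent from the other's.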

  9. A Distance Measure for Genome Phylogenetic Analysis

    Science.gov (United States)

    Cao, Minh Duc; Allison, Lloyd; Dix, Trevor

    Phylogenetic analyses of species based on single genes or parts of the genomes are often inconsistent because of factors such as variable rates of evolution and horizontal gene transfer. The availability of more and more sequenced genomes allows phylogeny construction from complete genomes that is less sensitive to such inconsistency. For such long sequences, construction methods like maximum parsimony and maximum likelihood are often not possible due to their intensive computational requirements. Another class of tree construction methods, namely distance-based methods, requires a measure of distance between any two genomes. Some measures, such as evolutionary edit distance of gene order and gene content, are computationally expensive or do not perform well when the gene contents of the organisms are similar. This study presents an information-theoretic measure of genetic distances between genomes based on the biological compression algorithm expert model. We demonstrate that our distance measure can be applied to reconstruct the consensus phylogenetic tree of a number of Plasmodium parasites from their genomes, the statistical bias of which would mislead conventional analysis methods. Our approach is also used to successfully construct a plausible evolutionary tree for the γ-Proteobacteria group, whose genomes are known to contain many horizontally transferred genes.
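
    The expert-model compressor described above is not a standard library component. To illustrate only the general idea of compression-based genome distances, the sketch below substitutes a general-purpose compressor (Python's lzma) and the normalized compression distance (NCD); this is a swapped-in technique, not the authors' measure, and it does not model DNA-specific features such as reverse-complement repeats. The resulting matrix could feed any distance-based tree method (not shown).

```python
import lzma


def compressed_size(data: bytes) -> int:
    """Length in bytes of `data` after lzma compression."""
    return len(lzma.compress(data))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: near 0 for near-identical inputs, about 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(genomes):
    """Pairwise NCD matrix for {name: sequence_bytes}; rows/columns in sorted-name order."""
    names = sorted(genomes)
    matrix = [
        [0.0 if a == b else ncd(genomes[a], genomes[b]) for b in names]
        for a in names
    ]
    return names, matrix


if __name__ == "__main__":
    import random

    rng = random.Random(0)  # synthetic sequences, not real genomes
    base = "".join(rng.choice("ACGT") for _ in range(5000)).encode()
    mutant = bytearray(base)
    for i in range(0, len(mutant), 100):  # one substitution every 100 bp
        mutant[i] = ord("C") if mutant[i] != ord("C") else ord("G")
    unrelated = "".join(rng.choice("ACGT") for _ in range(5000)).encode()
    names, m = distance_matrix({"base": base, "mutant": bytes(mutant), "unrelated": unrelated})
    for name, row in zip(names, m):
        print(name, [round(d, 3) for d in row])
```

    NCD is not exactly symmetric because concatenation order affects the compressed size; a common practical fix is to average the two orders before tree building.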

  10. Survey sequencing and comparative analysis of the elephant shark (Callorhinchus milii) genome.

    Directory of Open Access Journals (Sweden)

    Byrappa Venkatesh

    2007-04-01

    Full Text Available Owing to their phylogenetic position, cartilaginous fishes (sharks, rays, skates, and chimaeras) provide a critical reference for our understanding of vertebrate genome evolution. The relatively small genome of the elephant shark, Callorhinchus milii, a chimaera, makes it an attractive model cartilaginous fish genome for whole-genome sequencing and comparative analysis. Here, the authors describe survey sequencing (1.4x coverage) and comparative analysis of the elephant shark genome, one of the first cartilaginous fish genomes to be sequenced to this depth. Repetitive sequences, represented mainly by a novel family of short interspersed element-like and long interspersed element-like sequences, account for about 28% of the elephant shark genome. Fragments of approximately 15,000 elephant shark genes reveal specific examples of genes that have been lost differentially during the evolution of tetrapod and teleost fish lineages. Interestingly, the degree of conserved synteny and conserved sequences between the human and elephant shark genomes is higher than that between human and teleost fish genomes. The elephant shark genome contains four putative Hox clusters, indicating that, unlike teleost fish genomes, it has not experienced an additional whole-genome duplication. These findings underscore the importance of the elephant shark as a critical reference vertebrate genome for comparative analysis of the human and other vertebrate genomes. This study also demonstrates that a survey-sequencing approach can be applied productively for comparative analysis of distantly related vertebrate genomes.

  11. Genomic analysis of Fusarium verticillioides.

    Science.gov (United States)

    Brown, D W; Butchko, R A E; Proctor, R H

    2008-09-01

    Fusarium verticillioides (teleomorph Gibberella moniliformis) can be either an endophyte of maize, causing no visible disease, or a pathogen causing disease of ears, stalks, roots and seedlings. At any stage, this fungus can synthesize fumonisins, a family of mycotoxins structurally similar to the sphingolipid sphinganine. Ingestion of fumonisin-contaminated maize has been associated with a number of animal diseases, including cancer in rodents, and exposure has been correlated with human oesophageal cancer in some regions of the world; some evidence also suggests that fumonisins are a risk factor for neural tube defects. A primary goal of the authors' laboratory is to eliminate fumonisin contamination of maize and maize products. Understanding how and why these toxins are made and the F. verticillioides-maize disease process will allow one to develop novel strategies to limit tissue destruction (rot) and fumonisin production. To meet this goal, genomic sequence data, expressed sequence tags (ESTs) and microarrays are being used to identify F. verticillioides genes involved in the biosynthesis of toxins and plant pathogenesis. This paper describes the current status of F. verticillioides genomic resources and three approaches being used to mine microarray data from a wild-type strain cultured in liquid fumonisin production medium for 12, 24, 48, 72, 96 and 120 h. Taken together, these approaches demonstrate the power of microarray technology to provide information on different biological processes.

  12. Metagenomics as a tool to obtain full genomes of process-critical bacteria in engineered systems

    DEFF Research Database (Denmark)

    Albertsen, Mads; Hugenholtz, Philip; Tyson, Gene W.

    Bacteria play a pivotal role in engineered systems such as wastewater treatment plants. Obtaining genomes of the bacteria provides the genetic potential of the system and also allows studies of in situ functions through transcriptomics and proteomics. Hence, it enables correlations of operational ... , the sequencing of bulk genomic DNA from environmental samples, has the potential to provide genomes of this uncultured majority. However, so far only a few bacterial genomes have been obtained from metagenomic data. In this study we present a new approach to obtain individual genomes from metagenomes. We deeply ... of the community. The assembled genomes include many of the process-critical bacteria involved in wastewater treatment, such as Competibacter, Tetrasphaera and TM7. The approach is not limited to different extraction methods, but can be applied to any treatment that results in different relative abundance ...

  13. A novel statistic for genome-wide interaction analysis.

    Directory of Open Access Journals (Sweden)

    Xuesen Wu

    2010-09-01

    Full Text Available Although great progress has been made in genome-wide association studies (GWAS), the significant SNP associations identified by GWAS account for only a few percent of the genetic variance, leading many to question where and how we can find the missing heritability. There is increasing interest in genome-wide interaction analysis as a possible source of heritability unexplained by current GWAS. However, the existing statistics for testing interaction have low power for genome-wide interaction analysis. To meet the challenges raised by genome-wide interaction analysis, we have developed a novel statistic for testing interaction between two loci (either linked or unlinked). The null distribution and the type I error rates of the new statistic are validated using simulations. Extensive power studies show that the developed statistic has much higher power to detect interaction than classical logistic regression. The results identified 44 and 211 pairs of SNPs showing significant evidence of interactions with FDR<0.001 and 0.001 [...] genome-wide interaction analysis is a valuable tool for finding remaining missing heritability unexplained by the current GWAS, and the developed novel statistic is able to search for significant interactions between SNPs across the genome. Real data analysis showed that the results of genome-wide interaction analysis can be replicated in two independent studies.

  14. Integrated analysis of whole genome and transcriptome sequencing reveals diverse transcriptomic aberrations driven by somatic genomic changes in liver cancers.

    Directory of Open Access Journals (Sweden)

    Yuichi Shiraishi

    Recent studies applying high-throughput sequencing technologies have identified several recurrently mutated genes and pathways in multiple cancer genomes. However, the transcriptional consequences of these genomic alterations in cancer genomes remain unclear. In this study, we performed integrated and comparative analyses of the whole genomes and transcriptomes of 22 hepatitis B virus (HBV)-related hepatocellular carcinomas (HCCs) and their matched controls. Comparison of whole genome sequencing (WGS) and RNA-Seq data revealed extensive evidence that various types of genomic mutations triggered diverse transcriptional changes. Not only splice-site mutations but also silent mutations in coding regions, deep intronic mutations and structural changes caused splicing aberrations. HBV integrations generated diverse patterns of virus-human fusion transcripts depending on the affected gene, such as TERT, CDK15, FN1 and MLL4. Structural variations could drive over-expression of genes such as WNT ligands, with or without creating gene fusions. Furthermore, by taking into account genomic mutations that cause transcriptional aberrations, we improved the sensitivity of deleterious mutation detection in known cancer driver genes (TP53, AXIN1, ARID2, RPS6KA3) and identified recurrent disruptions in putative cancer driver genes such as HNF4A, CPS1, TSC1 and THRAP3 in HCCs. These findings indicate that genomic alterations in cancer genomes have diverse transcriptomic effects, and that integrated analysis of WGS and RNA-Seq can facilitate the interpretation of the large number of genomic alterations detected in cancer genomes.

  15. Full-length RNA structure prediction of the HIV-1 genome reveals a conserved core domain

    DEFF Research Database (Denmark)

    Sükösd, Zsuzsanna; Andersen, Ebbe Sloth; Seemann, Ernst Stefan

    2015-01-01

    of the HIV-1 genome is highly variable in most regions, with a limited number of stable and conserved RNA secondary structures. Most interestingly, a set of long-distance interactions forms a core organizing structure (COS) that organizes the genome into three major structural domains. Despite overlapping...

  16. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan; Parsons, Sven David Charles; Sampson, Samantha Leigh; Van Der Merwe, Ruben Gerhard; Drewe, Julian Ashley; Abdallah, Abdallah; Siame, Kabengele Keith; Gey Van Pittius, Nicolaas Claudius; Van Helden, Paul David; Pain, Arnab; Warren, Robin Mark

    2015-01-01

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  17. Whole genome sequence analysis of Mycobacterium suricattae

    KAUST Repository

    Dippenaar, Anzaan

    2015-10-21

    Tuberculosis occurs in various mammalian hosts and is caused by a range of different lineages of the Mycobacterium tuberculosis complex (MTBC). A recently described member, Mycobacterium suricattae, causes tuberculosis in meerkats (Suricata suricatta) in Southern Africa and preliminary genetic analysis showed this organism to be closely related to an MTBC pathogen of rock hyraxes (Procavia capensis), the dassie bacillus. Here we make use of whole genome sequencing to describe the evolution of the genome of M. suricattae, including known and novel regions of difference, SNPs and IS6110 insertion sites. We used genome-wide phylogenetic analysis to show that M. suricattae clusters with the chimpanzee bacillus, previously isolated from a chimpanzee (Pan troglodytes) in West Africa. We propose an evolutionary scenario for the Mycobacterium africanum lineage 6 complex, showing the evolutionary relationship of M. africanum and chimpanzee bacillus, and the closely related members M. suricattae, dassie bacillus and Mycobacterium mungi.

  18. First experiences of full-profile analysis with GUISDAP

    Directory of Open Access Journals (Sweden)

    M. S. Lehtinen

    In this paper we summarize the theory behind full-profile analysis of incoherent scatter (IS) measurements and report first practical experiences with GUISDAP (Grand Unified Incoherent Scatter Design and Analysis Package), a system designed to perform full-profile analysis of any IS measurements efficiently. By fitting whole plasma parameter profiles over the ionosphere, instead of point values of parameters assumed to be approximately constant over small range intervals, full-profile analysis is free of underlying assumptions about the slow variation of the plasma parameters as a function of range. We formulate full-profile analysis as a mathematical inversion problem and explain how it differs from the traditional gated analysis. Moreover, we study the bias introduced into traditional analysis results using realistic model ionospheres. By applying the full-profile method to data generated from the model ionospheres, we demonstrate that full-profile analysis is free from this kind of bias. Lastly, an example of real data analysed by both full-profile and gated analyses is shown.
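
The contrast between gated and full-profile analysis can be illustrated with a deliberately simplified linear toy model. It is an assumption for illustration only, not GUISDAP's actual incoherent-scatter forward model (which is nonlinear in the plasma parameters): each synthetic measurement is the average of a smooth quadratic profile over a range window, the gated estimate reads each measurement as the profile value at the gate centre, and the full-profile estimate inverts the forward model for the whole profile at once with NumPy least squares. The coefficients, gate spacing and window half-width are hypothetical.

```python
import numpy as np

# Toy forward model (illustrative assumption, not GUISDAP's IS theory):
# each measurement is the average of x(r) = a + b*r + c*r**2 over [r - w, r + w].

def forward_matrix(r, w):
    """Window-averaged basis functions [1, r, r**2] evaluated at gate centres r."""
    # The average of r'**2 over [r - w, r + w] equals r**2 + w**2 / 3.
    return np.column_stack([np.ones_like(r), r, r**2 + w**2 / 3.0])

def profile(r, coeffs):
    """Evaluate the quadratic profile at ranges r."""
    a, b, c = coeffs
    return a + b * r + c * r**2

true_coeffs = np.array([1.0, 0.5, -0.02])  # hypothetical profile coefficients
ranges = np.linspace(0.0, 20.0, 11)        # hypothetical gate centres
w = 2.0                                    # hypothetical half-width of range smearing

A = forward_matrix(ranges, w)
measurements = A @ true_coeffs             # noise-free synthetic data
truth = profile(ranges, true_coeffs)

# Gated analysis: treat each measurement as the profile value at its gate centre,
# implicitly assuming the profile is constant (or linear) across the window.
gated_estimate = measurements

# Full-profile analysis: solve the inverse problem for the whole profile jointly.
fitted_coeffs, *_ = np.linalg.lstsq(A, measurements, rcond=None)
full_profile_estimate = profile(ranges, fitted_coeffs)

gated_bias = np.max(np.abs(gated_estimate - truth))
full_profile_bias = np.max(np.abs(full_profile_estimate - truth))
print(f"gated bias:        {gated_bias:.5f}")
print(f"full-profile bias: {full_profile_bias:.2e}")
```

In this toy setting the gated estimate is offset by |c|·w²/3 wherever the profile is curved, whereas the joint inversion recovers the noise-free profile. With real, noisy data the inversion would also need regularisation and error propagation, which the sketch omits.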

  19. First experiences of full-profile analysis with GUISDAP

    Directory of Open Access Journals (Sweden)

    M. S. Lehtinen

    1996-12-01

    In this paper we summarize the theory behind full-profile analysis of incoherent scatter (IS) measurements and report first practical experiences with GUISDAP (Grand Unified Incoherent Scatter Design and Analysis Package), a system designed to perform full-profile analysis of any IS measurements efficiently. By fitting whole plasma parameter profiles over the ionosphere, instead of point values of parameters assumed to be approximately constant over small range intervals, full-profile analysis is free of underlying assumptions about the slow variation of the plasma parameters as a function of range. We formulate full-profile analysis as a mathematical inversion problem and explain how it differs from the traditional gated analysis. Moreover, we study the bias introduced into traditional analysis results using realistic model ionospheres. By applying the full-profile method to data generated from the model ionospheres, we demonstrate that full-profile analysis is free from this kind of bias. Lastly, an example of real data analysed by both full-profile and gated analyses is shown.

  20. Differential DNA Methylation Analysis without a Reference Genome

    Directory of Open Access Journals (Sweden)

    Johanna Klughammer

    2015-12-01

    Genome-wide DNA methylation mapping uncovers epigenetic changes associated with animal development, environmental adaptation, and species evolution. To address the lack of high-throughput methods for DNA methylation analysis in non-model organisms, we developed an integrated approach for studying DNA methylation differences independent of a reference genome. Experimentally, our method relies on an optimized 96-well protocol for reduced representation bisulfite sequencing (RRBS), which we have validated in nine species (human, mouse, rat, cow, dog, chicken, carp, sea bass, and zebrafish). Bioinformatically, we developed the RefFreeDMA software to deduce ad hoc genomes directly from RRBS reads and to pinpoint differentially methylated regions between samples or groups of individuals (http://RefFreeDMA.computational-epigenetics.org). The identified regions are interpreted using motif enrichment analysis and/or cross-mapping to annotated genomes. We validated our method by reference-free analysis of cell-type-specific DNA methylation in the blood of human, cow, and carp. In summary, we present a cost-effective method for epigenome analysis in ecology and evolution, which enables epigenome-wide association studies in natural populations and species without a reference genome.
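
As a rough illustration of the final comparison step only, the sketch below computes per-sample methylation levels (methylated reads divided by total reads at a CpG) and flags sites whose mean level differs between two groups. It is not RefFreeDMA's procedure: the read counts, site names, coverage cutoff and difference threshold are all hypothetical, and a simple threshold stands in for a proper statistical test.

```python
from statistics import mean

# Hypothetical read counts per CpG site: (methylated_reads, total_reads) per sample.
group_a = {
    "site1": [(18, 20), (27, 30), (9, 10)],
    "site2": [(5, 20), (6, 25), (2, 10)],
}
group_b = {
    "site1": [(4, 20), (3, 15), (2, 10)],
    "site2": [(6, 24), (5, 20), (3, 12)],
}

MIN_COVERAGE = 10  # hypothetical minimum reads per sample at a site
MIN_DIFF = 0.3     # hypothetical minimum absolute difference in mean methylation

def sample_levels(counts):
    """Methylation level (methylated / total) for each sample passing the coverage cutoff."""
    return [m / t for m, t in counts if t >= MIN_COVERAGE]

def differential_sites(a, b):
    """Return {site: mean(a) - mean(b)} for sites shared by both groups that exceed MIN_DIFF."""
    hits = {}
    for site in a.keys() & b.keys():
        levels_a, levels_b = sample_levels(a[site]), sample_levels(b[site])
        if not levels_a or not levels_b:
            continue  # too little coverage in one of the groups
        diff = mean(levels_a) - mean(levels_b)
        if abs(diff) >= MIN_DIFF:
            hits[site] = round(diff, 3)
    return hits

print(differential_sites(group_a, group_b))
```

Real analyses would typically aggregate CpGs into regions and test differences statistically with multiple-testing correction rather than apply a fixed cutoff.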

  1. The complete genome sequence and comparative genome analysis of the high pathogenicity Yersinia enterocolitica strain 8081.

    Directory of Open Access Journals (Sweden)

    Nicholas R Thomson

    2006-12-01

    The human enteropathogen Yersinia enterocolitica is a significant link in the range of Yersinia pathologies extending from mild gastroenteritis to bubonic plague. Comparison at the genomic level is a key step in our understanding of the genetic basis for this pathogenicity spectrum. Here we report the genome of Y. enterocolitica strain 8081 (serotype O:8; biotype 1B) and extensive microarray data relating to the genetic diversity of the Y. enterocolitica species. Our analysis reveals that the genome of Y. enterocolitica strain 8081 is a patchwork of horizontally acquired genetic loci, including a plasticity zone of 199 kb containing an extraordinarily high density of virulence genes. Microarray analysis has provided insights into species-specific Y. enterocolitica gene functions and the intraspecies differences between the high, low, and nonpathogenic Y. enterocolitica biotypes. Through comparative genome sequence analysis we provide new information on the evolution of the Yersinia. We identify numerous loci that represent ancestral clusters of genes potentially important in enteric survival and pathogenesis, which have been lost, or are in the process of being lost, in the other sequenced Yersinia lineages. Our analysis also highlights large metabolic operons in Y. enterocolitica that are absent in the related enteropathogen Yersinia pseudotuberculosis, indicating major differences in niche and nutrients used within the mammalian gut. These include clusters directing the production of hydrogenases, tetrathionate respiration, cobalamin synthesis, and propanediol utilisation. Along with ancestral gene clusters, the genome of Y. enterocolitica has revealed species-specific and enteropathogen-specific loci. This has provided important insights into the pathology of this bacterium and, more broadly, into the evolution of the genus. Moreover, wider investigations looking at the patterns of gene loss and gain in the Yersinia have highlighted common

  2. A simple method for the parallel deep sequencing of full influenza A genomes

    DEFF Research Database (Denmark)

    Kampmann, Marie-Louise; Fordyce, Sarah Louise; Avila Arcos, Maria del Carmen

    2011-01-01

    Given the major threat of influenza A to human and animal health, and its ability to evolve rapidly through mutation and reassortment, tools that enable its timely characterization are necessary to help monitor its evolution and spread. For this purpose, deep sequencing can be a very valuable tool ... This study reports a comprehensive method that enables deep sequencing of the complete genomes of influenza A subtypes using the Illumina Genome Analyzer IIx (GAIIx). Using this method, the complete genomes of nine viruses were sequenced in parallel, representing the 2009 pandemic H1N1 virus, H5N1 virus...

  3. Quantitative high-resolution genomic analysis of single cancer cells.

    Directory of Open Access Journals (Sweden)

    Juliane Hannemann

    During cancer progression, specific genomic aberrations arise that can determine the scope of the disease and can be used as predictive or prognostic markers. The detection of specific gene amplifications or deletions in single blood-borne or disseminated tumour cells that may give rise to the development of metastases is of great clinical interest but technically challenging. In this study, we present a method for quantitative high-resolution genomic analysis of single cells. Cells were isolated under permanent microscopic control, followed by high-fidelity whole genome amplification and subsequent analyses by fine tiling array-CGH and qPCR. The assay was applied to single breast cancer cells to analyze the chromosomal region centred on the therapeutically relevant EGFR gene. This method allows precise quantitative analysis of copy number variations in single-cell diagnostics.
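
The abstract does not say how qPCR results were converted to copy numbers. One common approach, assumed here purely for illustration, is the comparative ΔΔCt method: the target gene (for example EGFR) is normalised to a reference locus in both the test cell and a calibrator of known copy number, and perfect doubling per PCR cycle is assumed unless another amplification factor is supplied. The function name and Ct values are hypothetical.

```python
def relative_copy_number(ct_target_sample, ct_ref_sample,
                         ct_target_calibrator, ct_ref_calibrator,
                         calibrator_copies=2.0, amplification_factor=2.0):
    """Estimate target copy number with the comparative delta-delta-Ct method.

    Assumes equal amplification efficiency for target and reference loci;
    amplification_factor=2.0 means perfect doubling per cycle.
    """
    # Normalise the target to the reference locus within each sample.
    delta_sample = ct_target_sample - ct_ref_sample
    delta_calibrator = ct_target_calibrator - ct_ref_calibrator
    # A lower Ct means more template, hence the negative exponent.
    delta_delta = delta_sample - delta_calibrator
    return calibrator_copies * amplification_factor ** (-delta_delta)

# Hypothetical Ct values: the target crosses threshold 2 cycles earlier than
# expected relative to a diploid calibrator, suggesting ~4x the calibrator level.
print(relative_copy_number(22.0, 24.0, 25.0, 25.0))
```

Whole genome amplification of a single cell can distort locus ratios, so such estimates would normally be cross-checked against the array-CGH profile.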

  4. First full-length genome sequence of the polerovirus luffa aphid-borne yellows virus (LABYV) reveals the presence of at least two consensus sequences in an isolate from Thailand.

    Science.gov (United States)

    Knierim, Dennis; Maiss, Edgar; Kenyon, Lawrence; Winter, Stephan; Menzel, Wulf

    2015-10-01

    Luffa aphid-borne yellows virus (LABYV) was proposed as the name for a previously undescribed polerovirus based on partial genome sequences obtained from samples of cucurbit plants collected in Thailand between 2008 and 2013. In this study, we determined the first full-length genome sequence of LABYV. Based on phylogenetic analysis and genome properties, it is clear that this virus represents a distinct species in the genus Polerovirus. Analysis of sequences from sample TH24, which was collected in 2010 from a luffa plant in Thailand, reveals the presence of two different full-length genome consensus sequences.

  5. Full genome sequence of a Danish isolate of Mycobacterium avium subspecies paratuberculosis, strain Ejlskov2007

    DEFF Research Database (Denmark)

    Afzal, Mamuna; Abidi, Soad; Mikkelsen, Heidi

    We have sequenced a Danish isolate of Mycobacterium avium subspecies paratuberculosis, strain Ejlskov2007. The strain was isolated from faecal material of a 48-month-old second-parity Danish Holstein cow with clinical symptoms of chronic diarrhoea and emaciation. The cultures were grown on Löwen ... , consisting of 4317 unique gene families. Comparison with M. avium paratuberculosis strain K10 revealed only 3436 genes in common (~70%). We have used GenomeAtlases to show conserved (and unique) regions along the Ejlskov2007 chromosome, compared to 2 other sequenced Mycobacterium avium genomes. Pan-genome analyses of the sequenced Mycobacterium genomes reveal a surprisingly open and diverse set of genes for this bacterial genus.

  6. Genome analysis and comparative genomics of a Giardia intestinalis assemblage E isolate

    Directory of Open Access Journals (Sweden)

    Andersson Jan O

    2010-10-01

    Background: Giardia intestinalis is a protozoan parasite that causes diarrhea in a wide range of mammalian species. To further understand the genetic diversity within the Giardia intestinalis species complex, we have performed genome sequencing and analysis of a wild-type Giardia intestinalis sample from assemblage E, isolated from a pig. Results: We identified 5012 protein-coding genes, the majority of which are conserved compared to the previously sequenced genomes of the WB and GS strains in terms of microsynteny and sequence identity. Despite this, there is an unexpectedly large number of chromosomal rearrangements and several smaller structural changes present in all chromosomes. Novel members of the VSP, NEK kinase and HCMP gene families were identified, which may reveal possible mechanisms for host specificity and new avenues for antigenic variation. We used comparative genomics of the three diverse Giardia intestinalis isolates P15, GS and WB to define a core proteome for this species complex and to identify lineage-specific genes. Extensive analyses of polymorphisms in the core proteome of Giardia revealed differential rates of divergence among cellular processes. Conclusions: Our results indicate that despite a well-conserved core of genes, there is significant genome variation between Giardia isolates in terms of gene content, gene polymorphisms, structural chromosomal variations and surface molecule repertoires. This study improves the annotation of the Giardia genomes and enables the identification of functionally important variation.
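
Once orthology has been assigned, defining a core proteome and lineage-specific genes reduces to set operations. The sketch below assumes each isolate is already represented by a set of ortholog-group identifiers; the identifiers are hypothetical, and the orthology assignment itself (the hard part) is not shown.

```python
from functools import reduce

def core_and_specific(gene_sets):
    """Return (core, specific) for a mapping of isolate -> set of ortholog-group IDs.

    core: groups present in every isolate.
    specific[name]: groups found in that isolate and in no other.
    """
    core = reduce(set.intersection, (set(s) for s in gene_sets.values()))
    specific = {}
    for name, genes in gene_sets.items():
        # Union of all groups found in the other isolates.
        others = set().union(*(set(g) for n, g in gene_sets.items() if n != name))
        specific[name] = set(genes) - others
    return core, specific

# Hypothetical ortholog-group memberships for the three isolates.
isolates = {
    "P15": {"og1", "og2", "og3", "og7"},
    "GS": {"og1", "og2", "og4", "og5"},
    "WB": {"og1", "og2", "og3", "og6"},
}
core, specific = core_and_specific(isolates)
print(sorted(core), {k: sorted(v) for k, v in specific.items()})
```

Groups shared by some but not all isolates (here og3) fall into neither category, which is why comparative studies often report a "dispensable" set as well.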

  7. New genomic resources for switchgrass: a BAC library and comparative analysis of homoeologous genomic regions harboring bioenergy traits

    Directory of Open Access Journals (Sweden)

    Feltus Frank A

    2011-07-01

    Background: Switchgrass, a C4 species and a warm-season grass native to the prairies of North America, has been targeted for development into an herbaceous biomass fuel crop. Genetic improvement of switchgrass feedstock traits through marker-assisted breeding and biotechnology approaches calls for the development of genomic tools. Establishment of integrated physical and genetic maps for switchgrass will accelerate the mapping of value-added traits useful to breeding programs and the isolation of important target genes by map-based cloning. The reported polyploidy series in switchgrass ranges from diploid (2X = 18) to duodecaploid (12X = 108). As in other large, repeat-rich plant genomes, this genomic complexity will hinder whole genome sequencing efforts. An extensive physical map providing enough information to resolve the homoeologous genomes would provide the necessary framework for accurate assembly of the switchgrass genome. Results: A switchgrass BAC library constructed by partial digestion of nuclear DNA with EcoRI contains 147,456 clones covering the effective genome approximately 10 times, based on a genome size of 3.2 gigabases (~1.6 Gb effective). Restriction digestion and PFGE analysis of 234 randomly chosen BACs indicated that 95% of the clones contained inserts, ranging from 60 to 180 kb with an average of 120 kb. Comparative sequence analysis of two homoeologous genomic regions harboring orthologs of the rice OsBRI1 locus, a low-copy gene encoding a putative protein kinase and associated with biomass, revealed that orthologous clones from homoeologous chromosomes can be unambiguously distinguished from each other and correctly assembled into their respective fingerprint contigs. Thus, the data obtained not only provide genomic resources for further analysis of the switchgrass genome, but also support efforts toward an accurate genome sequencing strategy. Conclusions: The construction of the first switchgrass BAC library and comparative analysis of
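
The stated coverage can be checked with simple arithmetic from the numbers given: clones × mean insert size ÷ effective genome size. The sketch also applies the Clarke-Carbon formula, P = 1 - (1 - I/G)^N, for the probability that a given locus is present in at least one clone. Treating the 120 kb average as every clone's insert size is a simplifying assumption, since inserts range from 60 to 180 kb.

```python
def library_coverage(n_clones, mean_insert_bp, genome_bp, insert_fraction=1.0):
    """Genome equivalents represented by a clone library.

    insert_fraction discounts clones without inserts (0.95 = 95% carry inserts).
    """
    return n_clones * insert_fraction * mean_insert_bp / genome_bp

def prob_locus_present(n_clones, mean_insert_bp, genome_bp):
    """Clarke-Carbon probability that a given locus lies in at least one clone."""
    f = mean_insert_bp / genome_bp  # fraction of the genome in one insert
    return 1.0 - (1.0 - f) ** n_clones

n_clones = 147_456
insert_bp = 120_000          # average insert size
effective_genome_bp = 1.6e9  # ~1.6 Gb effective genome

cov = library_coverage(n_clones, insert_bp, effective_genome_bp)
cov_with_inserts = library_coverage(n_clones, insert_bp, effective_genome_bp,
                                    insert_fraction=0.95)
p = prob_locus_present(n_clones, insert_bp, effective_genome_bp)
print(f"{cov:.2f}x raw, {cov_with_inserts:.2f}x after empty clones, P = {p:.5f}")
```

This gives about 11 genome equivalents before, and about 10.5 after, discounting empty clones, in line with the stated coverage of approximately 10 times.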

  8. Genome-Based Comparison of Clostridioides difficile: Average Amino Acid Identity Analysis of Core Genomes.

    Science.gov (United States)

    Cabal, Adriana; Jun, Se-Ran; Jenjaroenpun, Piroon; Wanchai, Visanu; Nookaew, Intawat; Wongsurawat, Thidathip; Burgess, Mary J; Kothari, Atul; Wassenaar, Trudy M; Ussery, David W

    2018-02-14

    Infections due to Clostridioides difficile (previously known as Clostridium difficile) are a major problem in hospitals, where cases can be caused by community-acquired strains as well as by nosocomial spread. Whole genome sequences from clinical samples contain a wealth of information, but it needs to be analyzed and compared in such a way that the outcome is useful for clinicians or epidemiologists. Here, we compare 663 publicly available complete genome sequences of C. difficile using average amino acid identity (AAI) scores. This analysis revealed that most of these genomes (640, 96.5%) clearly belong to the same species, while the remaining 23 genomes form four distinct clusters within the Clostridioides genus. The main C. difficile cluster can be further divided into sub-clusters, depending on the chosen cutoff. We demonstrate that MLST, whether based on partial or full gene length, results in biased estimates of genetic differences and does not capture the true degree of similarity or difference between complete genomes. The presence of genes coding for C. difficile toxins A and B (ToxA/B), as well as the binary C. difficile toxin (CDT), was deduced from their unique PfamA domain architectures. Out of the 663 C. difficile genomes, 535 (80.7%) contained at least one copy of ToxA or ToxB, while these genes were missing from 128 genomes. Although some clusters were enriched for toxin presence, these genes are variably present in a given genetic background. The CDT genes were found in 191 genomes, which were restricted to a few clusters only, and only one cluster lacked the toxin A/B genes consistently. A total of 310 genomes contained ToxA/B without CDT (47%). Further, published metagenomic data from stools were used to assess the presence of C. difficile sequences in blinded cases of C. difficile infection (CDI) and controls, to test if metagenomic analysis is sensitive enough to detect the pathogen, and to establish strain relationships between cases from the same
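
AAI between two genomes is the mean percent identity over their shared orthologous proteins. A minimal sketch, assuming the protein pairs have already been identified and aligned (in practice this is typically done with a protein aligner using reciprocal best hits plus length and identity filters, none of which is shown): alignment columns with a gap in either sequence are skipped, and identity is averaged across pairs. The sequences are hypothetical toy strings.

```python
def percent_identity(aln_a, aln_b):
    """Percent identity of two equal-length aligned protein strings ('-' = gap)."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aln_a, aln_b) if x != "-" and y != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)

def average_aai(aligned_pairs):
    """Mean percent identity over a list of (aligned_a, aligned_b) protein pairs."""
    if not aligned_pairs:
        raise ValueError("need at least one aligned protein pair")
    scores = [percent_identity(a, b) for a, b in aligned_pairs]
    return sum(scores) / len(scores)

# Hypothetical aligned orthologous pairs from two genomes.
pairs = [("MKTAYIAK", "MKTAYIAK"), ("MKTAYIAK", "MKTAYLAR")]
print(average_aai(pairs))
```

Clustering genomes then amounts to computing AAI for every genome pair and grouping them at a chosen cutoff, which is why the sub-cluster structure depends on that cutoff.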

  9. Genomes

    National Research Council Canada - National Science Library

    Brown, T. A. (Terence A.)

    2002-01-01

    ... of genome expression and replication processes, and transcriptomics and proteomics. This text is richly illustrated with clear, easy-to-follow, full color diagrams, which are downloadable from the book's website...

  10. Unstructured Navier-Stokes Analysis of Full TCA Configuration

    Science.gov (United States)

    Frink, Neal T.; Pirzadeh, Shahyar Z.

    1999-01-01

    This paper presents an Unstructured Navier-Stokes Analysis of Full TCA (Technology Concept Airplane) Configuration. The topics include: 1) Motivation; 2) Milestone and approach; 3) Overview of the unstructured-grid system; 4) Results on full TCA W/B/N/D/E configuration; 5) Concluding remarks; and 6) Future directions.

  11. Comparative genomics of Mycoplasma: analysis of conserved essential genes and diversity of the pan-genome.

    Directory of Open Access Journals (Sweden)

    Wei Liu

    Mycoplasma, the smallest self-replicating organism, with a minimal metabolism and little genomic redundancy, is expected to be a close approximation to the minimal set of genes needed to sustain bacterial life. This study employs comparative evolutionary analysis of twenty Mycoplasma genomes to gain an improved understanding of essential genes. By analyzing the core genome of mycoplasmas, we identified the set of conserved essential genes for mycoplasma survival. Further analysis showed that the core genome set has many characteristics in common with experimentally identified essential genes. Several key genes, which are related to DNA replication and repair and can be disrupted in transposon mutagenesis studies, may be critical for bacterial survival, especially over long periods of natural selection. Phylogenomic reconstructions based on 3,355 homologous groups allowed robust estimation of phylogenetic relatedness among mycoplasma strains. To obtain deeper insight into the relative roles of molecular evolution in pathogen adaptation to their hosts, we also analyzed the positive selection pressures on particular sites and lineages. There appears to be an approximate correlation between the divergence of species and the level of positive selection detected in corresponding lineages.

  12. Full Dynamic Analysis of Mooring Solution Candidates - First Iteration

    DEFF Research Database (Denmark)

    Thomsen, Jonas Bjerg; Ferri, Francesco

    This report covers an initial full dynamic analysis of the mooring solutions for the four wave energy converters in the project “Mooring Solutions for Large Wave Energy Converters”. The analysis aims to provide a first understanding of the layouts and a discussion of which parameters...

  13. Full-Genome Sequence of a Novel Varicella-Zoster Virus Clade Isolated in Mexico.

    Science.gov (United States)

    Garcés-Ayala, Fabiola; Rodríguez-Castillo, Araceli; Ortiz-Alcántara, Joanna María; Gonzalez-Durán, Elizabeth; Segura-Candelas, José Miguel; Pérez-Agüeros, Sandra Ivette; Escobar-Escamilla, Noé; Méndez-Tenorio, Alfonso; Diaz-Quiñonez, José Alberto; Ramirez-González, José Ernesto

    2015-07-09

    Varicella-zoster virus (VZV) is a member of the Herpesviridae family, which causes varicella (chicken pox) and herpes zoster (shingles) in humans. Here, we report the complete genome sequence of varicella-zoster virus, isolated from a vesicular fluid sample, revealing the circulation of VZV clade VIII in Mexico. Copyright © 2015 Garcés-Ayala et al.

  14. Full-Genome Sequence of a Novel Varicella-Zoster Virus Clade Isolated in Mexico

    OpenAIRE

    Garcés-Ayala, Fabiola; Rodríguez-Castillo, Araceli; Ortiz-Alcántara, Joanna María; Gonzalez-Durán, Elizabeth; Segura-Candelas, José Miguel; Pérez-Agüeros, Sandra Ivette; Escobar-Escamilla, Noé; Méndez-Tenorio, Alfonso; Diaz-Quiñonez, José Alberto; Ramirez-González, José Ernesto

    2015-01-01

    Varicella-zoster virus (VZV) is a member of the Herpesviridae family, which causes varicella (chicken pox) and herpes zoster (shingles) in humans. Here, we report the complete genome sequence of varicella-zoster virus, isolated from a vesicular fluid sample, revealing the circulation of VZV clade VIII in Mexico.

  15. Full-Genome Sequence of a Novel Varicella-Zoster Virus Clade Isolated in Mexico

    Science.gov (United States)

    Rodríguez-Castillo, Araceli; Ortiz-Alcántara, Joanna María; Gonzalez-Durán, Elizabeth; Segura-Candelas, José Miguel; Pérez-Agüeros, Sandra Ivette; Escobar-Escamilla, Noé; Méndez-Tenorio, Alfonso; Diaz-Quiñonez, José Alberto

    2015-01-01

    Varicella-zoster virus (VZV) is a member of the Herpesviridae family, which causes varicella (chicken pox) and herpes zoster (shingles) in humans. Here, we report the complete genome sequence of varicella-zoster virus, isolated from a vesicular fluid sample, revealing the circulation of VZV clade VIII in Mexico. PMID:26159533

  16. FGWAS: Functional genome wide association analysis.

    Science.gov (United States)

    Huang, Chao; Thompson, Paul; Wang, Yalin; Yu, Yang; Zhang, Jingwen; Kong, Dehan; Colen, Rivka R; Knickmeyer, Rebecca C; Zhu, Hongtu

    2017-10-01

    Functional phenotypes (e.g., subcortical surface representation), which commonly arise in imaging genetic studies, have been used to detect putative genes for complexly inherited neuropsychiatric and neurodegenerative disorders. However, existing statistical methods largely ignore the functional features (e.g., functional smoothness and correlation). The aim of this paper is to develop a functional genome-wide association analysis (FGWAS) framework to efficiently carry out whole-genome analyses of functional phenotypes. FGWAS consists of three components: a multivariate varying coefficient model, a global sure independence screening procedure, and a test procedure. Compared with the standard multivariate regression model, the multivariate varying coefficient model explicitly models the functional features of functional phenotypes through the integration of smooth coefficient functions and functional principal component analysis. Statistically, compared with existing methods for genome-wide association studies (GWAS), FGWAS can substantially boost the detection power for discovering important genetic variants influencing brain structure and function. Simulation studies show that FGWAS outperforms existing GWAS methods for searching sparse signals in an extremely large search space, while controlling for the family-wise error rate. We have successfully applied FGWAS to large-scale analysis of data from the Alzheimer's Disease Neuroimaging Initiative for 708 subjects, 30,000 vertices on the left and right hippocampal surfaces, and 501,584 SNPs. Copyright © 2017 Elsevier Inc. All rights reserved.

  17. Rapid CRISPR/Cas9-Mediated Cloning of Full-Length Epstein-Barr Virus Genomes from Latently Infected Cells.

    Science.gov (United States)

    Yajima, Misako; Ikuta, Kazufumi; Kanda, Teru

    2018-04-03

    Herpesviruses have relatively large DNA genomes of more than 150 kb that are difficult to clone and sequence. Bacterial artificial chromosome (BAC) cloning of herpesvirus genomes is a powerful technique that greatly facilitates whole viral genome sequencing as well as functional characterization of reconstituted viruses. We describe recently developed technologies for rapid BAC cloning of herpesvirus genomes using CRISPR/Cas9-mediated homology-directed repair. We focus on recent BAC cloning techniques for Epstein-Barr virus (EBV) genomes and discuss the possible advantages of a CRISPR/Cas9-mediated strategy compared with previous EBV-BAC cloning strategies. We also describe the design decisions behind this technology, as well as possible pitfalls and points to be improved in the future. The obtained EBV-BAC clones are subjected to long-read sequencing analysis to determine the complete EBV genome sequence, including repetitive regions. Rapid cloning and sequence determination of various EBV strains will greatly contribute to the understanding of their global geographical distribution. This technology can also be used to clone disease-associated EBV strains and test the hypothesis that they have special features that distinguish them from strains that infect asymptomatically.

  18. Near-Full Genome Characterisation of Two Natural Intergenotypic 2k/1b Recombinant Hepatitis C Virus Isolates

    Directory of Open Access Journals (Sweden)

    Victoria L. Demetriou

    2011-01-01

    Few natural intergenotypic hepatitis C virus (HCV) recombinants have been characterised, and only RF1_2k/1b has demonstrated widespread transmission. The near-full-length genome sequences of two 2k/1b recombinants (CYHCV037 and CYHCV093) sampled in Cyprus were obtained using strain-specific RT-PCR amplification and sequencing protocols. Sequence analysis confirmed their similarity to the original RF1_2k/1b strain from St. Petersburg, N687. These two isolates significantly contribute to the sequence data available on this recombinant and confirm its increasing spread among individuals from Eastern Europe and its association with transmission through intravenous drug use. Phylogenetic analyses reveal clustering of the sequences 3′ to the recombination point that is not seen in the topology of the 5′ sequences, implying a more complicated evolutionary history than previously thought. The increasing number of HCV recombinant strains underlines the need to incorporate them into the standardised rules of HCV classification and nomenclature, molecular epidemiology, diagnosis, and treatment.

  19. Molecular cloning and expression of full-length DNA copies of the genomic RNAs of cowpea mosaic virus

    NARCIS (Netherlands)

    Vos, P.

    1987-01-01

    The experiments described in this thesis were designed to unravel various aspects of the mechanism of gene expression of cowpea mosaic virus (CPMV). For this purpose, full-length DNA copies of both genomic RNAs of CPMV were constructed. Using powerful in vitro

  20. Comparative Genomic Analysis of Soybean Flowering Genes

    Science.gov (United States)

    Jung, Chol-Hee; Wong, Chui E.; Singh, Mohan B.; Bhalla, Prem L.

    2012-01-01

    Flowering is an important agronomic trait that determines crop yield. Soybean is a major oilseed legume crop used for human and animal feed. Legumes have unique vegetative and floral complexities. Our understanding of the molecular basis of flower initiation and development in legumes is limited. Here, we address this by using a computational approach to examine flowering regulatory genes in the soybean genome in comparison to the most studied model plant, Arabidopsis. For this comparison, a genome-wide analysis of orthologue groups was performed, followed by an in silico gene expression analysis of the identified soybean flowering genes. Phylogenetic analyses of the gene families highlighted the evolutionary relationships among these candidates. Our study identified key flowering genes in soybean and indicates that the vernalisation and the ambient-temperature pathways seem to be the most variant in soybean. A comparison of the orthologue groups containing flowering genes indicated that, on average, each Arabidopsis flowering gene has 2-3 orthologous copies in soybean. Our analysis highlighted that the CDF3, VRN1, SVP, AP3 and PIF3 genes are paralogue-rich genes in soybean. Furthermore, the genome mapping of the soybean flowering genes showed that these genes are scattered randomly across the genome. A paralogue comparison indicated that the soybean genes comprising the largest orthologue group are clustered in a 1.4 Mb region on chromosome 16 of soybean. Furthermore, a comparison with the undomesticated soybean (Glycine soja) revealed that there are hundreds of SNPs that are associated with putative soybean flowering genes and that there are structural variants that may affect the genes of the light-signalling and ambient-temperature pathways in soybean. Our study provides a framework for the soybean flowering pathway and insights into the relationship and evolution of flowering genes between a short-day soybean and the long-day plant, Arabidopsis. 
PMID:22679494

  1. Generation and Analysis of Full-length cDNA Sequences from Elephant Shark (Callorhinchus milii)

    KAUST Repository

    Kodzius, Rimantas

    2009-03-17

    Cartilaginous fishes are the oldest living group of jawed vertebrates and are therefore an important group for understanding the evolution of vertebrate genomes, including the human genome. Our laboratory has proposed the elephant shark (C. milii) as a model cartilaginous fish genome because of its relatively small genome size (910 Mb). The whole genome of C. milii is being sequenced (the first cartilaginous fish genome to be sequenced completely). To characterize the transcriptome of C. milii and to assist in annotating exon-intron boundaries, transcriptional start sites and alternatively spliced transcripts, we are generating full-length cDNA sequences from C. milii.

  2. The Chlamydia psittaci genome: a comparative analysis of intracellular pathogens.

    Directory of Open Access Journals (Sweden)

    Anja Voigt

    Chlamydiaceae are a family of obligate intracellular pathogens that cause a wide range of diseases in animals and humans and face unique evolutionary constraints not encountered by free-living prokaryotes. To investigate genomic aspects of infection, virulence and host preference, we have sequenced Chlamydia psittaci, the pathogenic agent of ornithosis. A comparison of the genome of the avian Chlamydia psittaci isolate 6BC with the genomes of other chlamydial species, C. trachomatis, C. muridarum, C. pneumoniae, C. abortus, C. felis and C. caviae, revealed a high level of sequence conservation and synteny across taxa, with the major exception of the human pathogen C. trachomatis. Important differences manifest in the polymorphic membrane protein family specific for the Chlamydiae and in the highly variable chlamydial plasticity zone. We identified a number of psittaci-specific polymorphic membrane proteins of the G family that may be related to differences in host range and/or virulence compared to closely related Chlamydiaceae. We calculated non-synonymous to synonymous substitution rate ratios for pairs of orthologous genes to identify putative targets of adaptive evolution, and predicted type III secreted effector proteins. This study is the first detailed analysis of the Chlamydia psittaci genome sequence. It provides insights into the genome architecture of C. psittaci and proposes a number of novel candidate genes, mostly of as yet unknown function, that may be important for pathogen-host interactions.

  3. PGSB/MIPS Plant Genome Information Resources and Concepts for the Analysis of Complex Grass Genomes.

    Science.gov (United States)

    Spannagl, Manuel; Bader, Kai; Pfeifer, Matthias; Nussbaumer, Thomas; Mayer, Klaus F X

    2016-01-01

    PGSB (Plant Genome and Systems Biology; formerly MIPS-Munich Institute for Protein Sequences) has been involved in developing, implementing and maintaining plant genome databases for more than a decade. Genome databases and analysis resources have focused on individual genomes and aim to provide flexible and maintainable datasets for model plant genomes as a backbone against which experimental data, e.g., from high-throughput functional genomics, can be organized and analyzed. In addition, genomes from both model and crop plants form a scaffold for comparative genomics, assisted by specialized tools such as the CrowsNest viewer to explore conserved gene order (synteny) between related species on macro- and micro-levels. The genomes of many economically important Triticeae plants such as wheat, barley, and rye present a great challenge for sequence assembly and bioinformatic analysis due to their enormous complexity and large genome size. Novel concepts and strategies have been developed to deal with these difficulties and have been applied to the genomes of wheat, barley, rye, and other cereals. This includes the GenomeZipper concept, reference-guided exome assembly, and "chromosome genomics" based on flow cytometry sorted chromosomes.

  4. Universal Internucleotide Statistics in Full Genomes: A Footprint of the DNA Structure and Packaging?

    OpenAIRE

    Bogachev, Mikhail I.; Kayumov, Airat R.; Bunde, Armin

    2014-01-01

    Uncovering the fundamental laws that govern the complex DNA structural organization remains challenging and is largely based upon reconstructions from the primary nucleotide sequences. Here we investigate the distributions of the internucleotide intervals and their persistence properties in complete genomes of various organisms from Archaea and Bacteria to H. sapiens, aiming to reveal the manifestation of the universal DNA architecture. We find that in all considered organisms the internucleot...

  5. Molecular characterisation of the full-length genome of olive latent virus 1 isolated from tomato.

    Science.gov (United States)

    Hasiów-Jaroszewska, Beata; Borodynko, Natasza; Pospieszny, Henryk

    2011-05-01

    Olive latent virus 1 (OLV-1) is a species of the genus Necrovirus. So far, it has been reported to infect olive, citrus trees and tulip. Here, we determined and analysed the complete genomic sequence of an isolate designated CM1, which was collected from a tomato plant in the Wielkopolska region of Poland and represents the prevalent isolate of OLV-1. The CM1 genome is a monopartite, single-stranded, positive-sense RNA of 3,699 nt with five open reading frames (ORFs) and small intercistronic regions. ORF1 encodes a polypeptide with a molecular weight of 23 kDa, and read-through (RT) of its amber stop codon yields ORF1 RT, which encodes the viral RNA-dependent RNA polymerase. ORF2 and ORF3 encode two peptides of 8 kDa and 6 kDa, respectively, which appear to be involved in cell-to-cell movement. ORF4 is located at the 3' terminus and encodes a 30-kDa protein identified as the viral coat protein (CP). Differences were observed in the CP region of four OLV-1 isolates whose sequences have been deposited in GenBank. Nucleotide sequence identities of the CP of the tomato CM1 isolate with those of the olive, citrus and tulip isolates were 91.8%, 89.5% and 92.5%, respectively. In contrast to other OLV-1 isolates, CM1 induced necrotic spots on tomato plants and elicited necrotic local lesions on Nicotiana benthamiana, followed by systemic infection. This is the third complete genomic sequence of OLV-1 reported and the first from tomato.

  6. Whole genome sequencing and bioinformatics analysis of two Egyptian genomes.

    Science.gov (United States)

    ElHefnawi, Mahmoud; Jeon, Sungwon; Bhak, Youngjune; ElFiky, Asmaa; Horaiz, Ahmed; Jun, JeHoon; Kim, Hyunho; Bhak, Jong

    2018-05-15

    We report two Egyptian male genomes (EGP1 and EGP2) sequenced at ~30× depth. EGP1 had 4.7 million variants, of which 198,877 were novel, while EGP2 had 209,109 novel variants out of 4.8 million. The mitochondrial haplogroups of the two individuals were identified as H7b1 and L2a1c, respectively. We also identified the Y haplogroups of EGP1 (R1b) and EGP2 (J1a2a1a2 > P58 > FGC11). EGP1 carried a mutation in the mitochondrial NADH dehydrogenase gene ND4 (m.11778 G > A) that causes Leber's hereditary optic neuropathy. Some SNPs shared by the two genomes were associated with increased cholesterol and triglyceride levels, probably related to obesity among Egyptians. Comparison of these genomes with African and Western Asian genomes can provide insights into Egyptian ancestry and genetic history. This resource can be used to further understand genomic diversity and the functional classification of variants, as well as human migration and evolution across Africa and Western Asia. Copyright © 2017. Published by Elsevier B.V.

  7. Performance analysis of a full-field and full-range swept-source OCT system

    Science.gov (United States)

    Krauter, J.; Boettcher, T.; Körner, K.; Gronle, M.; Osten, W.; Passilly, N.; Froehly, L.; Perrin, S.; Gorecki, C.

    2015-09-01

    In recent years, optical coherence tomography (OCT) has gained importance in medical disciplines such as ophthalmology as a noninvasive optical imaging technique with micrometer resolution and short measurement times. It enables, e.g., the measurement and visualization of the depth structure of the retina. In other medical disciplines such as dermatology, histopathological analysis is still the gold standard for skin cancer diagnosis. The EU-funded project VIAMOS (Vertically Integrated Array-type Mirau-based OCT System) proposes a new type of OCT system combined with micro-technologies to provide a hand-held, low-cost and miniaturized OCT system. The concept combines full-field and full-range swept-source OCT (SS-OCT) detection in a multi-channel sensor based on a micro-optical Mirau-interferometer array manufactured using wafer fabrication processes. This paper presents the study of an experimental proof-of-concept OCT system as a one-channel sensor with bulk optics. This sensor is a Linnik-type interferometer with optical parameters similar to those of the Mirau-interferometer array. A commercial wavelength-tunable light source with a center wavelength of 845 nm and a spectral bandwidth of 50 nm is used with a camera for parallel OCT A-scan detection. In addition, the reference microscope objective lens of the Linnik interferometer is mounted on a piezo-actuated phase shifter. Phase-shifting interferometry (PSI) techniques are applied to resolve the complex conjugate artifact and consequently increase image quality and depth range. A suppression ratio of 36 dB for the complex conjugate term is shown, and a system sensitivity greater than 96 dB was measured.

  8. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Directory of Open Access Journals (Sweden)

    Wang Jintu

    2011-04-01

    Background: Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species account for over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembly and scaffolding. Result: To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% repetitive elements, with a GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, demonstrating the high similarity between the zebrafish and common carp genomes and indicating the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion: BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to the zebrafish genome and established over 3

  9. Genome-wide comparative analysis of four Indian Drosophila species.

    Science.gov (United States)

    Mohanty, Sujata; Khanna, Radhika

    2017-12-01

    Comparative analysis of multiple genomes of closely or distantly related Drosophila species undoubtedly excites evolutionary biologists interested in exploring genomic changes from an ecological and evolutionary perspective. We present herewith the de novo assembled whole genome sequences of four Drosophila species of Indian origin, D. bipectinata, D. takahashii, D. biarmipes and D. nasuta, sequenced using next-generation sequencing technology on an Illumina platform, along with their detailed assembly statistics. Comparative genomics analyses, e.g. gene prediction and annotation, functional and orthogroup analysis of coding sequences, and genome-wide SNP distribution, were performed. The whole genome of Zaprionus indianus of Indian origin, published earlier by us, and the genome sequences of 12 previously sequenced Drosophila species available in the NCBI database were included in the analysis. The present work is part of our ongoing genomics project on Indian Drosophila species.

  10. Registered plant list - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods ... the Plant DB link list in simple search page) Genome analysis methods: presence or absence of Genome analysis methods information in this DB (link to the Genome analysis methods information ...

  11. Analysis of radiation-induced genome alterations in Vigna unguiculata

    Directory of Open Access Journals (Sweden)

    van der Vyver C

    2011-09-01

    Christell van der Vyver,1 B Juan Vorster,2 Karl J Kunert,3 Christopher A Cullis4 1Institute for Plant Biotechnology, Department of Genetics, University of Stellenbosch, Stellenbosch, South Africa; 2Department of Plant Production and Soil Science, and 3Department of Plant Science, Forestry and Agricultural Biotechnology Institute, University of Pretoria, Pretoria, South Africa; 4Case Western Reserve University, Department of Biology, Cleveland, OH, USA Abstract: Seeds from an inbred Vigna unguiculata (cowpea) cultivar were gamma-irradiated with a dose of 180 Gy in order to identify and characterize possible mutations. Three techniques, i.e., random amplified polymorphic DNA, microsatellites, and representational difference analysis, were used to characterize possible DNA variation among the mutants and nonirradiated control plants both immediately after irradiation and in subsequent generations. A large portion of putative radiation-induced genome changes had significant similarity to chloroplast sequences. The frequency of mutation at three of these isolated polymorphic regions with chloroplast similarity was further determined by polymerase chain reaction screening of a large number of individual parental, M1, and M2 plants. Analysis of these sequences indicated that the rate at which various regions of the genome are mutated in irradiation experiments differs significantly and that mutations have variable “repair” rates. Furthermore, regions of the nuclear DNA derived from the chloroplast genome are highly susceptible to modification by radiation treatment. Overall, the data provide detailed information on the effects of gamma irradiation on the cowpea genome and on the ability of the plant to repair these genome changes in subsequent generations. Keywords: mutation breeding, gamma radiation, genetic mutations, cowpea, representational difference analysis

  12. Validation study of core analysis methods for full MOX BWR

    International Nuclear Information System (INIS)

    2013-01-01

    JNES has been developing a technical database for use in reviewing the validation of core analysis methods for LWRs on upcoming occasions: (1) confirming the core safety parameters from the initial core (one-third MOX core) through a full MOX core in the Oma Nuclear Power Plant, which is under construction, (2) licensing high-burnup MOX cores in the future and (3) reviewing topical reports on core analysis codes for safety design and evaluation. Based on the technical database, JNES will issue a guide for reviewing the core analysis methods used for safety design and evaluation of LWRs. The database will also be used for validating and improving the core analysis codes developed by JNES. JNES has made progress on the following projects: (1) improving a Doppler reactivity analysis model in the Monte Carlo calculation code MVP, (2) a sensitivity study of nuclear cross-section data for reactivity calculations of experimental cores composed of UO₂ and MOX fuel rods, (3) analysis of isotopic composition data for UO₂ and MOX fuels and (4) the guide for reviewing the core analysis codes, among others. (author)

  13. Validation study of core analysis methods for full MOX BWR

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    2013-08-15

    JNES has been developing a technical database for use in reviewing the validation of core analysis methods for LWRs on upcoming occasions: (1) confirming the core safety parameters from the initial core (one-third MOX core) through a full MOX core in the Oma Nuclear Power Plant, which is under construction, (2) licensing high-burnup MOX cores in the future and (3) reviewing topical reports on core analysis codes for safety design and evaluation. Based on the technical database, JNES will issue a guide for reviewing the core analysis methods used for safety design and evaluation of LWRs. The database will also be used for validating and improving the core analysis codes developed by JNES. JNES has made progress on the following projects: (1) improving a Doppler reactivity analysis model in the Monte Carlo calculation code MVP, (2) a sensitivity study of nuclear cross-section data for reactivity calculations of experimental cores composed of UO₂ and MOX fuel rods, (3) analysis of isotopic composition data for UO₂ and MOX fuels and (4) the guide for reviewing the core analysis codes, among others. (author)

  14. Synonymous Codon Usage Analysis of Thirty Two Mycobacteriophage Genomes

    Directory of Open Access Journals (Sweden)

    Sameer Hassan

    2009-01-01

    Synonymous codon usage of the protein-coding genes of thirty-two completely sequenced mycobacteriophage genomes was studied using multivariate statistical analysis. Compositional bias is identified as one of the major factors influencing codon usage. Codons ending in C or G are preferred in highly expressed genes, and among these, C-ending codons are strongly preferred over G-ending codons. A strong negative correlation between the effective number of codons (Nc) and GC3s content was also observed, showing that codon usage is affected by gene nucleotide composition. Translational selection, operating at the level of translational accuracy, is also identified as playing a role in shaping codon usage. A high level of heterogeneity is seen within and between the genomes. Gene length is also identified as influencing codon usage in 11 of the 32 phage genomes. Mycobacteriophage Cooper is identified as the most highly biased genome, with better translation efficiency that compares well with the host-specific tRNA genes.

  15. The sequence and analysis of a Chinese pig genome

    Directory of Open Access Journals (Sweden)

    Fang Xiaodong

    2012-11-01

    Background: The pig is an economically important food source, amounting to approximately 40% of all meat consumed worldwide. Pigs also serve as an important model organism because of their similarity to humans at the anatomical, physiological and genetic levels, making them very useful for studying a variety of human diseases. A pig strain of particular interest is the miniature pig, specifically the Wuzhishan pig (WZSP), as it has been extensively inbred. Its high level of homozygosity makes selective breeding for specific traits easier and the genetic changes that underlie its biological characteristics more straightforward to understand. WZSP is also promising for applications in surgery, tissue engineering, and xenotransplantation. Here, we report the sequencing and analysis of an inbred WZSP genome. Results: Our results reveal some unique genomic features, including a relatively high level of homozygosity in the diploid genome, an unusual distribution of heterozygosity, an over-representation of tRNA-derived transposable elements, a small amount of porcine endogenous retrovirus, and a lack of type C retroviruses. In addition, we carried out systematic research on gene evolution, together with a detailed investigation of the counterparts of human drug target genes. Conclusion: Our results provide the opportunity to more clearly define the genomic character of the pig, which could enhance our ability to create more useful pig models.

  16. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Directory of Open Access Journals (Sweden)

    Seyhan Yazar

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E. coli and 53.5% (95% CI: 34.4-72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with costs on EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) higher for the E. coli and human assemblies, respectively. Thus, GCE was found to outperform EMR in terms of both cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for the analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we present ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.

  17. IMG: the integrated microbial genomes database and comparative analysis system

    Science.gov (United States)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Jacob, Biju; Huang, Jinghua; Williams, Peter; Huntemann, Marcel; Anderson, Iain; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2012-01-01

    The Integrated Microbial Genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG integrates publicly available draft and complete genomes from all three domains of life with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. IMG's data content and analytical capabilities have been continuously extended through regular updates since its first release in March 2005. IMG is available at http://img.jgi.doe.gov. Companion IMG systems provide support for expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er), teaching courses and training in microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu) and analysis of genomes related to the Human Microbiome Project (IMG/HMP: http://www.hmpdacc-resources.org/img_hmp). PMID:22194640

  18. Full genome sequences and molecular characterization of tick-borne encephalitis virus strains isolated from human patients

    Czech Academy of Sciences Publication Activity Database

    Formanová, P.; Černý, Jiří; Černá Bolfíková, B.; Valdés, James J.; Kozlová, I.; Dzhioev, Y.; Růžek, Daniel

    2015-01-01

    Vol. 6, no. 1 (2015), pp. 38-46. ISSN 1877-959X. R&D Projects: GA ČR GAP502/11/2116; GA ČR GAP302/12/2490. Institutional support: RVO:60077344. Keywords: tick-borne encephalitis virus; tick-borne encephalitis; genome analysis; human patients. Subject RIV: EE - Microbiology, Virology. Impact factor: 2.690, year: 2015

  19. Comparative genomic analysis of multidrug-resistant Streptococcus pneumoniae isolates

    Directory of Open Access Journals (Sweden)

    Pan F

    2018-05-01

    Fen Pan,1 Hong Zhang,1 Xiaoyan Dong,2 Weixing Ye,3 Ping He,4 Shulin Zhang,4 Jeff Xianchao Zhu,5 Nanbert Zhong1,2,6 1Department of Clinical Laboratory, Shanghai Children’s Hospital, Shanghai Jiaotong University, Shanghai, China; 2Department of Respiratory, Shanghai Children’s Hospital, Shanghai Jiaotong University, Shanghai, China; 3Shanghai Personal Biotechnology Co., Ltd, Shanghai, China; 4Department of Medical Microbiology and Immunology, Shanghai Jiao Tong University School of Medicine, Shanghai, China; 5Zhejiang Bioruida Biotechnology Co., Ltd, Zhejiang, China; 6New York State Institute for Basic Research in Developmental Disabilities, Staten Island, NY, USA Introduction: Multidrug resistance in Streptococcus pneumoniae has emerged as a serious public health problem. A better understanding of the genetic diversity of antibiotic-resistant S. pneumoniae isolates is needed. Methods: We conducted whole-genome resequencing of 25 pneumococcal strains with different antimicrobial resistance profiles isolated from children. A comparative analysis focusing on the detection of single-nucleotide polymorphisms (SNPs) and insertions and deletions (indels) was conducted. Moreover, phylogenetic analysis was applied to investigate the genetic relationships among these strains. Results: The genome size of the isolates was ~2.1 Mbp, covering >90% of the total estimated size of the reference genome. The overall G+C content was ~39.5%, and there were 2,200–2,400 open reading frames. All isolates with different drug resistance profiles harbored many indels (range 131–171) and SNPs (range 16,103–28,128). Genetic diversity analysis showed that variation in different genes was associated with specific antibiotic resistance. Known antibiotic resistance genes (pbps, murMN, ciaH, rplD, sulA, and dpr) were identified, and new genes (regR, argH, trkH, and PTS-EII) closely related to antibiotic resistance were found, although these genes were primarily annotated

  20. Comparative dynamic analysis of the full Grossman model.

    Science.gov (United States)

    Ried, W

    1998-08-01

    The paper applies the method of comparative dynamic analysis to the full Grossman model. For a particular class of solutions, it derives the equations implicitly defining the complete trajectories of the endogenous variables. Relying on the concept of Frisch decision functions, the impact of any parametric change on an endogenous variable can be decomposed into a direct and an indirect effect. The focus of the paper is on marginal changes in the rate of health capital depreciation. It also analyses the impact of either initial financial wealth or the initial stock of health capital. While the direction of most effects remains ambiguous in the full model, the assumption of a zero consumption benefit of health is sufficient to obtain a definite sign for any direct or indirect effect.

  1. Big Data Analysis of Human Genome Variations

    KAUST Repository

    Gojobori, Takashi

    2016-01-01

    Since the human genome draft sequence was first made public in 2000, genomic analyses have been intensively extended to the population level. The following three international projects are good examples of large-scale studies of human

  2. Full mitochondrial genome sequences of two endemic Philippine hornbill species (Aves: Bucerotidae) provide evidence for pervasive mitochondrial DNA recombination

    Directory of Open Access Journals (Sweden)

    Bleidorn Christoph

    2011-01-01

    Background: Although it is now broadly accepted that mitochondrial DNA (mtDNA) may undergo recombination, the frequency of such recombination remains controversial. Its estimation is not straightforward, as recombination under homoplasmy (i.e., among identical mt genomes) is likely to be overlooked. In species with tandem duplications of large mtDNA fragments, the detection of recombination can be facilitated, as it can lead to gene conversion among duplicates. Although the mechanisms of concerted evolution in mtDNA are not yet fully understood, recombination rates have been estimated from "one per speciation event" down to 850 years or even "during every replication cycle". Results: Here we present the first complete mt genomes of the avian family Bucerotidae, i.e., those of two Philippine hornbills, Aceros waldeni and Penelopides panini. The mt genomes are characterized by a tandemly duplicated region encompassing part of cytochrome b, 3 tRNAs, NADH6, and the control region. The duplicated fragments are identical to each other except for a short section in domain I and for the length of repeat motifs in domain III of the control region. Due to heteroplasmy with regard to the number of these repeat motifs, there is some size variation in both genomes; at around 21,657 bp (A. waldeni) and 22,737 bp (P. panini), they significantly exceed the hitherto longest known avian mt genomes, those of the albatrosses. We discovered concerted evolution between the duplicated fragments within individuals. The existence of differences between individuals in coding genes as well as in the control region, which are maintained between duplicates, indicates that recombination apparently occurs frequently, i.e., in every generation. Conclusions: The homogenised duplicates are interspersed by a short fragment which shows no sign of recombination. We hypothesize that this region corresponds to the so-called Replication Fork Barrier (RFB), which has been

  3. Genomic analysis of murine DNA-dependent protein kinase

    International Nuclear Information System (INIS)

    Fujimori, A.; Abe, M.

    2003-01-01

    The gene encoding the catalytic subunit of DNA-dependent protein kinase (DNA-PKcs) is the gene responsible for SCID in mice. The molecule plays a critical role in non-homologous end joining, including V(D)J recombination. A contribution of the molecule to differences in radiosensitivity and in susceptibility to cancer has been suggested. Here we show the entire nucleotide sequences of the approximately 193 kbp and 84 kbp genomic regions encoding the entire DNA-PKcs gene in the mouse and chicken, respectively. A retroposon was found in intron 51 of the mouse genomic DNA-PKcs gene but not in the human or chicken gene. Comparative analysis of these two species strongly suggested that only two genes, DNA-PKcs and MCM4, exist in the region in both species. Several conserved sequences and cis elements, however, were predicted. Recently, the orthologous region for the human DNA-PKcs locus was completed. The results of further comparative study will be discussed.

  4. SIDEKICK: Genomic data driven analysis and decision-making framework

    Directory of Open Access Journals (Sweden)

    Yoon Kihoon

    2010-12-01

    Background: Scientists striving to unlock mysteries within complex biological systems face myriad barriers in effectively integrating available information to enhance their understanding. While experimental techniques and available data sources are rapidly evolving, useful information is dispersed across a variety of sources, and sources of the same information often do not use the same format or nomenclature. To harness these expanding resources, scientists need tools that bridge nomenclature differences and allow them to integrate, organize, and evaluate the quality of information without extensive computation. Results: Sidekick, a genomic data-driven analysis and decision-making framework, is a web-based tool that provides a user-friendly, intuitive solution to the problem of information inaccessibility. Sidekick enables scientists without training in computation and data management to pursue answers to research questions like "What are the mechanisms for disease X?" or "Does the set of genes associated with disease X also influence other diseases?" Sidekick enables the process of combining heterogeneous data, finding and maintaining the most up-to-date data, evaluating data sources, quantifying confidence in results based on evidence, and managing the multi-step research tasks needed to answer these questions. We demonstrate Sidekick's effectiveness by showing how to accomplish a complex published analysis in a fraction of the original time with no computational effort using Sidekick. Conclusions: Sidekick is an easy-to-use web-based tool that organizes and facilitates complex genomic research, allowing scientists to explore genomic relationships and formulate hypotheses without computational effort. Possible analysis steps include gene list discovery, gene-pair list discovery, various enrichments for both types of lists, and convenient list manipulation. Further, Sidekick's ability to characterize pairs of genes offers new ways to

  5. Full scale experimental analysis of wind direction changes (EOD)

    DEFF Research Database (Denmark)

    Hansen, Kurt Schaldemose

    2007-01-01

    A coherent wind speed and wind direction change (ECD) load case is defined in the wind turbine standard. This load case is an essential extreme load case that may, e.g., be design-driving for flap deflection of active stall controlled wind turbines. The present analysis identifies statistically ... the magnitudes of a joint gust event defined by a simultaneous wind speed and direction change in order to obtain an indication of the validity of the magnitudes specified in the IEC code. The analysis relates to pre-specified recurrence periods and is based on full-scale wind field measurements. The wind ... wind direction gust amplitudes associated with the investigated European sites are low compared to the recommended IEC values. However, these values, as a function of the mean wind speed, are difficult to validate thoroughly due to the limited number of fully correlated measurements.

  6. Specialty and full-service hospitals: a comparative cost analysis.

    Science.gov (United States)

    Carey, Kathleen; Burgess, James F; Young, Gary J

    2008-10-01

    To compare the costs of physician-owned cardiac, orthopedic, and surgical single specialty hospitals with those of full-service hospital competitors. The primary data sources are the Medicare Cost Reports for 1998-2004 and hospital inpatient discharge data for three of the states where single specialty hospitals are most prevalent: Texas, California, and Arizona. The latter were obtained from the Texas Department of State Health Services, the California Office of Statewide Health Planning and Development, and the Agency for Healthcare Research and Quality Healthcare Cost and Utilization Project. Additional data come from the American Hospital Association Annual Survey Database. We identified all physician-owned cardiac, orthopedic, and surgical specialty hospitals in these three states as well as all full-service acute care hospitals serving the same market areas, defined using Dartmouth Hospital Referral Regions. We estimated a hospital cost function using stochastic frontier regression analysis and generated hospital-specific inefficiency measures. We then applied t-tests of significance to compare the inefficiency measures of specialty hospitals with those of full-service hospitals, to make general comparisons between these classes of hospitals. Results do not provide evidence that specialty hospitals are more efficient than the full-service hospitals with which they compete. In particular, orthopedic and surgical specialty hospitals appear to have significantly higher levels of cost inefficiency. Cardiac hospitals, however, do not appear to differ from their competitors in this respect. Policymakers should not embrace the assumption that physician-owned specialty hospitals produce patient care more efficiently than their full-service hospital competitors.

  7. Dirofilaria immitis JYD-34 isolate: whole genome analysis

    Directory of Open Access Journals (Sweden)

    Catherine Bourguinat

    2017-11-01

    Background: Macrocyclic lactone (ML) anthelmintics are used for chemoprophylaxis of heartworm infection in dogs and cats. Cases of dogs becoming infected with heartworms despite apparent compliance with recommended chemoprophylaxis using approved preventives have led to such cases being considered as suspected lack of efficacy (LOE). Recently, microfilariae collected from a small number of LOE isolates were used as a source of infection of new host dogs and confirmed to have reduced susceptibility to ML in controlled efficacy studies using L3 challenge in dogs. A specific Dirofilaria immitis laboratory isolate named JYD-34 has also been confirmed to have less than 100% susceptibility to ML-based preventives. For preventive claims against heartworm disease, evidence of 100% efficacy is required by FDA-CVM. It was therefore of interest to determine whether JYD-34 has a genetic profile similar to other documented LOE and confirmed reduced-susceptibility isolates or to known ML-susceptible isolates. Methods: In this study, the 90 Mbp whole genome of the JYD-34 strain was sequenced. This genome was compared using bioinformatics tools to the pooled whole genomes of four well-characterized susceptible D. immitis populations, one susceptible Missouri laboratory isolate, and the pooled whole genomes of four LOE D. immitis populations. Fixation indexes (FST), which allow the genetic structure of each population (isolate) to be compared at the level of single nucleotide polymorphisms (SNP) across the genome, were calculated. Forty-one previously reported SNP that appeared to differentiate between susceptible and LOE and confirmed reduced-susceptibility isolates were also investigated in the JYD-34 isolate. Results: The FST analysis, and the analysis of the 41 SNP that appeared to differentiate reduced-susceptibility from fully susceptible isolates, confirmed that the JYD-34 isolate has a genome similar to previously
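    The fixation index compares within-population and total genetic diversity at each SNP. As a minimal sketch, assuming biallelic SNPs and known allele frequencies for each population (or pool), Wright's F_ST can be computed per SNP as (H_T - H_S) / H_T, where H = 2p(1 - p) is the expected heterozygosity. This is a textbook estimator, not necessarily the one used in the study; pooled-sequencing analyses usually also correct for pool size and read depth, which this sketch ignores. The function names are illustrative.

```python
def expected_heterozygosity(p):
    """Expected heterozygosity 2p(1 - p) of a biallelic SNP with allele frequency p."""
    return 2.0 * p * (1.0 - p)


def fst_per_snp(freqs):
    """F_ST = (H_T - H_S) / H_T for one SNP, with equally weighted populations.

    freqs: frequency of the same allele in each population (or pool).
    """
    if len(freqs) < 2:
        raise ValueError("F_ST needs at least two populations")
    p_bar = sum(freqs) / len(freqs)
    h_t = expected_heterozygosity(p_bar)  # heterozygosity of the combined population
    if h_t == 0.0:
        return 0.0  # monomorphic in every population: no differentiation
    # Mean within-population heterozygosity.
    h_s = sum(expected_heterozygosity(p) for p in freqs) / len(freqs)
    return (h_t - h_s) / h_t


def mean_fst(snp_freqs):
    """Unweighted mean of per-SNP F_ST values across a list of SNPs."""
    values = [fst_per_snp(f) for f in snp_freqs]
    return sum(values) / len(values)
```

    For example, allele frequencies of 0.2 and 0.8 in two populations give F_ST = 0.36, identical frequencies give 0, and fixation of different alleles gives 1.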

  8. The complete mitochondrial genome of Gossypium hirsutum and evolutionary analysis of higher plant mitochondrial genomes.

    Science.gov (United States)

    Liu, Guozheng; Cao, Dandan; Li, Shuangshuang; Su, Aiguo; Geng, Jianing; Grover, Corrinne E; Hu, Songnian; Hua, Jinping

    2013-01-01

    Mitochondria are the main manufacturers of cellular ATP in eukaryotes. The plant mitochondrial genome contains a large amount of foreign DNA and repeated sequences that frequently undergo intramolecular recombination. Upland cotton (Gossypium hirsutum L.) is one of the main natural fiber crops and also an important oil-producing plant in the world. Sequencing of the cotton mitochondrial (mt) genome could aid research on the evolution of plant mt genomes. We sequenced the G. hirsutum mt genome using 454 technology combined with screening of a Fosmid library and sequencing of positive clones, and conducted a series of evolutionary analyses on the mt genomes of Cycas taitungensis and 24 angiosperms. After data assembly and contig joining, the complete mitochondrial genome sequence of G. hirsutum was obtained. The complete G. hirsutum mt genome is 621,884 bp in length and contains 68 genes, including 35 protein-coding genes, four rRNA genes and 29 tRNA genes. Five gene clusters are conserved in all plant mt genomes; one and four clusters are specifically conserved in monocots and dicots, respectively. Homologous sequences are distributed along the plant mt genomes, and closely related species share the most homologous sequences. For species that have both mt and chloroplast genome sequences available, we checked the locations of cp-like migrations and found several fragments closely linked with mitochondrial genes. The G. hirsutum mt genome possesses most of the common characteristics of higher plant mt genomes. The existence of syntenic gene clusters, as well as the conservation of some intergenic sequences and genic content among the plant mt genomes, suggests that the evolution of mt genomes is consistent with plant taxonomy but independent among different species.

  9. Spectral analysis of full field digital mammography data

    International Nuclear Information System (INIS)

    Heine, John J.; Velthuizen, Robert P.

    2002-01-01

    The spectral content of mammograms acquired using a full field digital mammography (FFDM) system is analyzed. Fourier methods are used to show that the FFDM image power spectra obey an inverse power law; in an average sense, the images may be considered as 1/f fields. Two data representations are analyzed and compared: (1) the raw data and (2) the logarithm of the raw data. Two methods are employed to analyze the power spectra: (1) a previously developed technique based on integrating the Fourier plane with octave ring sectioning and (2) an approach based on integrating the Fourier plane using rings of constant width, developed for this work. Both methods allow theoretical modeling. Numerical analysis indicates that the effects due to the transformation influence the power spectra measurements in a statistically significant manner in the high frequency range. However, this effect has little influence on the inverse power law estimation for a given image regardless of the data representation or the theoretical analysis approach. The analysis is presented from two points of view: (1) each image is treated independently, with the results presented as distributions, and (2) for a given representation, the entire image collection is treated as an ensemble, with the results presented as expected values. In general, the constant ring width analysis forms the foundation for a spectral comparison method for finding spectral differences, in an image-distribution sense, after applying a nonlinear transformation to the data. The work also shows that power law estimation may be influenced by the presence of noise in the higher frequency range, which is consistent with the known attributes of the detector efficiency. The spectral modeling and inverse power law determinations obtained here are in agreement with those obtained from the analysis of digitized film-screen images presented previously. The form of the power spectrum for a given image is approximately 1/f².
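    The constant-width ring approach can be illustrated with NumPy: average the 2-D power spectrum over one-pixel-wide rings around zero frequency, then fit a straight line in log-log space, whose negative slope estimates the power-law exponent (about 2 for a 1/f² spectrum). This is a minimal sketch, not the authors' implementation: the one-pixel ring width, the fitted radius range (`r_min`), and the synthetic test field are assumptions made for illustration, and no detector-noise correction is applied.

```python
import numpy as np


def radial_power_spectrum(image):
    """Mean 2-D power in rings of constant (one-pixel) width around zero frequency."""
    n_rows, n_cols = image.shape
    # Remove the mean so the zero-frequency term does not dominate.
    spectrum = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    power = np.abs(spectrum) ** 2
    # Ring index = rounded distance of each frequency sample from the centred DC term.
    y, x = np.indices(power.shape)
    ring = np.rint(np.hypot(x - n_cols // 2, y - n_rows // 2)).astype(int)
    # Sum the power and count the samples that fall in each ring.
    totals = np.bincount(ring.ravel(), weights=power.ravel())
    counts = np.bincount(ring.ravel())
    return totals / np.maximum(counts, 1)


def power_law_exponent(image, r_min=2):
    """Least-squares fit of log P(r) = c - beta * log r; returns beta."""
    profile = radial_power_spectrum(image)
    r_max = min(image.shape) // 2  # use only rings that lie fully inside the image
    radii = np.arange(r_min, r_max)
    slope, _intercept = np.polyfit(np.log(radii), np.log(profile[radii]), 1)
    return -slope


def synthetic_field(n=256, beta=2.0, seed=0):
    """Real test image whose expected power spectrum falls off as 1/f**beta."""
    rng = np.random.default_rng(seed)
    noise = np.fft.fft2(rng.standard_normal((n, n)))
    k = np.hypot(np.fft.fftfreq(n)[:, None], np.fft.fftfreq(n)[None, :])
    k[0, 0] = 1.0  # avoid dividing by zero at DC
    # The radial filter is symmetric in k, so the inverse transform stays real.
    return np.real(np.fft.ifft2(noise * k ** (-beta / 2)))
```

    On a synthetic 1/f² field the fitted exponent should come out near 2. With real images the upper end of the fitted radius range matters, since noise in the higher frequency range can bias the estimate.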

  10. Comparative analysis of Acinetobacters: three genomes for three lifestyles.

    Directory of Open Access Journals (Sweden)

    David Vallenet

    Acinetobacter baumannii is the source of numerous nosocomial infections in humans and therefore deserves close attention, as multidrug- or even pandrug-resistant strains are increasingly being identified worldwide. Here we report the comparison of two newly sequenced genomes of A. baumannii. The human isolate A. baumannii AYE is multidrug resistant, whereas strain SDF, which was isolated from body lice, is antibiotic susceptible. As a reference for comparison in this analysis, the genome of the soil-living bacterium A. baylyi strain ADP1 was used. The most interesting dissimilarities we observed were that (i) whereas the strain AYE and A. baylyi genomes harbored very few Insertion Sequence elements, which could promote expression of downstream genes, the strain SDF sequence contains several hundred of them that have played a crucial role in its genome reduction (gene disruptions and simple DNA loss); (ii) strain SDF has low catabolic capacities compared to strain AYE. Interestingly, the latter has even higher catabolic capacities than A. baylyi, which has already been reported as a very nutritionally versatile organism. This metabolic performance could explain the persistence of A. baumannii nosocomial strains in environments where nutrients are scarce; (iii) several processes known to play a key role during host infection (biofilm formation, iron uptake, quorum sensing, virulence factors) were either different or absent, the best example of which is iron uptake. Indeed, strain AYE and A. baylyi use siderophore-based systems to scavenge iron from the environment, whereas strain SDF uses an alternate system similar to the Haem Acquisition System (HAS). Taken together, all these observations suggest that the genome contents of the three Acinetobacters compared are partly shaped by life in distinct ecological niches: human (and, more broadly, hospital environment), louse, and soil.

  11. Genome sequencing and analysis of BCG vaccine strains.

    Directory of Open Access Journals (Sweden)

    Wen Zhang

    BACKGROUND: Although the Bacillus Calmette-Guérin (BCG) vaccine against tuberculosis (TB) has been available for more than 75 years, one third of the world's population is still infected with Mycobacterium tuberculosis and approximately 2 million people die of TB every year. To reduce this immense TB burden, a clearer understanding of the functional genes underlying the action of BCG and the development of new vaccines are urgently needed. METHODS AND FINDINGS: Comparative genomic analysis of 19 M. tuberculosis complex strains showed that BCG strains underwent repeated human manipulation, had higher region-of-deletion rates than natural M. tuberculosis strains, and lost several essential components such as T-cell epitopes. A total of 188 BCG strain T-cell epitopes were lost to various degrees. The non-virulent BCG Tokyo strain, which has the largest number of T-cell epitopes (359), lost 124. Here we propose that the variability in protection among BCG strains results from differences in their epitopes. This study is the first to present BCG as a model organism for genetics research. BCG strains have a very well-documented history and now detailed genome information. Genome comparison revealed the selection process of BCG strains under human manipulation (1908-1966). CONCLUSIONS: Our results revealed the cause of BCG vaccine strain protection variability at the genome level and supported the hypothesis that the restoration of lost BCG Tokyo epitopes is a useful future vaccine development strategy. Furthermore, these detailed BCG vaccine genome investigation results will be useful in microbial genetics, microbial engineering and other research fields.

  12. Millstone: software for multiplex microbial genome analysis and engineering.

    Science.gov (United States)

    Goodman, Daniel B; Kuznetsov, Gleb; Lajoie, Marc J; Ahern, Brian W; Napolitano, Michael G; Chen, Kevin Y; Chen, Changping; Church, George M

    2017-05-25

    Inexpensive DNA sequencing and advances in genome editing have made computational analysis a major rate-limiting step in adaptive laboratory evolution and microbial genome engineering. We describe Millstone, a web-based platform that automates genotype comparison and visualization for projects with up to hundreds of genomic samples. To enable iterative genome engineering, Millstone allows users to design oligonucleotide libraries and create successive versions of reference genomes. Millstone is open source and easily deployable to a cloud platform, local cluster, or desktop, making it a scalable solution for any lab.

  13. CoCoNUT: an efficient system for the comparison and analysis of genomes

    Directory of Open Access Journals (Sweden)

    Kurtz Stefan

    2008-11-01

    Background: Comparative genomics is the analysis and comparison of genomes from different species. This area of research is driven by the large number of sequenced genomes and relies heavily on efficient algorithms and software to perform pairwise and multiple genome comparisons. Results: Most of the available software tools are tailored for one specific task. In contrast, we have developed a novel system, CoCoNUT (Computational Comparative geNomics Utility Toolkit), that allows solving several different tasks in a unified framework: (1) finding regions of high similarity among multiple genomic sequences and aligning them, (2) comparing two draft or multi-chromosomal genomes, (3) locating large segmental duplications in large genomic sequences, and (4) mapping cDNA/EST to genomic sequences. Conclusion: CoCoNUT is competitive with other software tools with respect to the quality of the results. The use of state-of-the-art algorithms and data structures allows CoCoNUT to solve comparative genomics tasks more efficiently than previous tools. With the improved user interface (including an interactive visualization component), CoCoNUT provides a unified, versatile, and easy-to-use software tool for large-scale studies in comparative genomics.

  14. Preliminary analysis of knee stress in Full Extension Landing

    Directory of Open Access Journals (Sweden)

    Majid Davoodi Makinejad

    2013-09-01

    OBJECTIVE: This study provides an experimental and finite element analysis of knee-joint structure during extended-knee landing based on the extracted impact force, and it numerically identifies the contact pressure, stress distribution and possibility of bone-to-bone contact when a subject lands from a safe height. METHODS: The impact time and loads were measured via inverse dynamic analysis of free landing without knee flexion from three different heights (25, 50 and 75 cm), using five subjects with an average body mass index of 18.8. Three-dimensional data were developed from computed tomography scans and were reprocessed with modeling software before being imported and analyzed by finite element analysis software. The whole leg was considered to be a fixed middle-hinged structure, while impact loads were applied to the femur in an upward direction. RESULTS: Straight landing exerted an enormous amount of pressure on the knee joint as a result of the body's inability to utilize the lower extremity muscles, thereby maximizing the threat of injury when the load exceeds the height-safety threshold. CONCLUSIONS: The researchers conclude that extended-knee landing results in serious deformation of the meniscus and cartilage and increases the risk of bone-to-bone contact and serious knee injury when the load exceeds the threshold safety height. This risk is considerably greater than the risk of injury associated with walking downhill or flexion landing activities.

  15. Full-motion video analysis for improved gender classification

    Science.gov (United States)

    Flora, Jeffrey B.; Lochtefeld, Darrell F.; Iftekharuddin, Khan M.

    2014-06-01

    The ability of computer systems to perform gender classification using the dynamic motion of the human subject has important applications in medicine, human factors, and human-computer interface systems. Previous works in motion analysis have used data from sensors (including gyroscopes, accelerometers, and force plates), radar signatures, and video. However, full-motion video, motion capture, and range data provide higher-resolution temporal and spatial datasets for the analysis of dynamic motion. Works using motion capture data have been limited by small datasets in a controlled environment. In this paper, we apply machine learning techniques to a new dataset that has a larger number of subjects. Additionally, these subjects move unrestricted through a capture volume, representing a more realistic, less controlled environment. We conclude that existing linear classification methods are insufficient for gender classification of a larger dataset captured in a relatively uncontrolled environment. A method based on a nonlinear support vector machine classifier is proposed to obtain gender classification for the larger dataset. In experimental testing with a dataset consisting of 98 trials (49 subjects, 2 trials per subject), classification rates using leave-one-out cross-validation are improved from 73% using linear discriminant analysis to 88% using the nonlinear support vector machine classifier.

  16. Analysis of the Complete Mitochondrial Genome Sequence of the Diploid Cotton Gossypium raimondii by Comparative Genomics Approaches

    Directory of Open Access Journals (Sweden)

    Changwei Bi

    2016-01-01

    Cotton is one of the most important economic crops, the primary source of natural fiber, and an important protein source for animal feed. The complete nuclear and chloroplast (cp) genome sequences of G. raimondii are already available, but its mitochondrial genome sequence is not. Here, we assembled the complete mitochondrial (mt) DNA sequence of G. raimondii into a circular genome of 676,078 bp and performed comparative analyses with other higher plants. The genome contains 39 protein-coding genes, 6 rRNA genes, and 25 tRNA genes. We also identified four large repeats (63.9 kb, 10.6 kb, 9.1 kb, and 2.5 kb) in this mt genome, which may be active in intramolecular recombination in the evolution of cotton. Strikingly, nearly all of the G. raimondii mt genome has been transferred to the nucleus on Chr1, and the transfer event must be very recent. Phylogenetic analysis reveals that G. raimondii, as a member of Malvaceae, is much closer to another cotton species (G. barbadense) than to other rosids, and the clade formed by the two Gossypium species is sister to Brassicales. The G. raimondii mt genome may provide a crucial foundation for evolutionary analysis, molecular biology, and cytoplasmic male sterility research in cotton and other higher plants.

  17. Recombination analysis based on the complete genome of bocavirus

    Directory of Open Access Journals (Sweden)

    Chen Shengxia

    2011-04-01

    Bocaviruses include bovine parvovirus, minute virus of canines, porcine bocavirus, gorilla bocavirus, and human bocaviruses 1-4 (HBoVs). Although recent reports showed that recombination occurs in bocaviruses, no systematic study has investigated bocavirus recombination. The present study performed phylogenetic and recombination analyses of bocaviruses using the complete genomes available in GenBank. The results confirmed that recombination exists among bocaviruses, including likely inter-genotype recombination between HBoV1 and HBoV4 and intra-genotype recombination among HBoV2 variants. Moreover, this is the first report of recombination between minute viruses of canines.

  18. Greenhouse gas emission inventory based on full energy chain analysis

    International Nuclear Information System (INIS)

    Dones, R.; Hirschberg, S.; Knoepfel, I.

    1996-01-01

    Methodology, characteristics, features and results obtained for greenhouse gases within the recent Swiss LCA study 'Environmental Life-Cycle Inventories of Energy Systems' are presented. The focus of the study is on existing average Full Energy Chains (FENCHs) in the electricity generation mixes in Europe and in Switzerland. The systems, including coal (hard coal and lignite), oil, natural gas, nuclear and hydro, are discussed one by one as well as part of the electricity mixes. Photovoltaic systems are covered separately since they are not included in the electricity mixes. A sensitivity analysis on methane leakage during long-range transport via pipeline is shown. Whilst within the current study emissions are not attributed to specific countries, the main sectors contributing to the total GHGs emissions calculated for the various FENCHs are specified. (author). 10 refs, 10 figs, 9 tabs

  19. Greenhouse gas emission inventory based on full energy chain analysis

    Energy Technology Data Exchange (ETDEWEB)

    Dones, R; Hirschberg, S [Paul Scherrer Inst. (PSI), Villigen (Switzerland); Knoepfel, I [Federal Inst. of Technology Zurich, Zurich (Switzerland)

    1996-07-01

    Methodology, characteristics, features and results obtained for greenhouse gases within the recent Swiss LCA study 'Environmental Life-Cycle Inventories of Energy Systems' are presented. The focus of the study is on existing average Full Energy Chains (FENCHs) in the electricity generation mixes in Europe and in Switzerland. The systems, including coal (hard coal and lignite), oil, natural gas, nuclear and hydro, are discussed one by one as well as part of the electricity mixes. Photovoltaic systems are covered separately since they are not included in the electricity mixes. A sensitivity analysis on methane leakage during long-range transport via pipeline is shown. Whilst within the current study emissions are not attributed to specific countries, the main sectors contributing to the total GHGs emissions calculated for the various FENCHs are specified. (author). 10 refs, 10 figs, 9 tabs.

  20. Full genome sequences and molecular characterization of tick-borne encephalitis virus strains isolated from human patients.

    Science.gov (United States)

    Formanová, Petra; Černý, Jiří; Bolfíková, Barbora Černá; Valdés, James J; Kozlova, Irina; Dzhioev, Yuri; Růžek, Daniel

    2015-02-01

    Tick-borne encephalitis virus (TBEV) causes tick-borne encephalitis (TBE), one of the most important human neuroinfections across Eurasia. To date, only three full genome sequences of human European TBEV isolates are available, mostly due to difficulties with isolation of the virus from human patients. Here we present full genome characterization of an additional five low-passage TBEV strains isolated from human patients with severe forms of TBE. These strains were isolated in 1953 within Central Bohemia in the former Czechoslovakia and belong to the historically oldest human TBEV isolates in Europe. We demonstrate here that all analyzed isolates are distantly phylogenetically related, indicating that the emergence of TBE in Central Europe was not caused by one predominant strain, but rather by a pool of distantly related TBEV strains. Nucleotide identity between individual sequenced TBEV strains ranged from 97.5% to 99.6%, and all strains shared large deletions in the 3' non-coding region, which has recently been suggested to be an important determinant of virulence. The number of unique amino acid substitutions varied from 3 to 9 in individual isolates, but no amino acid substitution characteristic exclusively of all human TBEV isolates was identified when compared to the isolates from ticks. We did, however, correlate that the exploration of the TBEV envelope glycoprotein by specific antibodies were in close proximity to these unique amino acid substitutions. Taken together, we report here the largest number of patient-derived European TBEV full genome sequences to date and provide a platform for further studies on the evolution of TBEV since the first emergence of human TBE in Europe. Copyright © 2014 Elsevier GmbH. All rights reserved.
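    Pairwise nucleotide identity, as in the 97.5% to 99.6% range reported above, is usually computed over a multiple sequence alignment. A minimal sketch, assuming the genomes are already aligned (gaps written as `-`) and that gapped columns are simply skipped; the abstract does not say how the deletions in the 3' non-coding region were handled, so this convention is an assumption, and the function names are hypothetical.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip indel columns (e.g. large deletions)
        compared += 1
        matches += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


def identity_range(alignment):
    """Lowest and highest pairwise identity for a dict of name -> aligned sequence."""
    scores = [
        percent_identity(alignment[x], alignment[y])
        for x, y in combinations(alignment, 2)
    ]
    return min(scores), max(scores)
```

    Counting gaps as mismatches instead would lower the identity of strains that carry deletions, so the convention should be reported alongside the numbers.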

  1. Bioinformatics analysis of SARS coronavirus genome polymorphism

    Directory of Open Access Journals (Sweden)

    Pavlović-Lažetić Gordana M

    2004-05-01

    Background: We have compared 38 isolates of the SARS-CoV complete genome. The main goal was twofold: first, to analyze and compare nucleotide sequences and to identify positions of single nucleotide polymorphisms (SNP), insertions and deletions, and second, to group the isolates according to sequence similarity, eventually pointing to the phylogeny of SARS-CoV isolates. The comparison is based on genome polymorphisms such as insertions or deletions and the number and positions of SNPs. Results: The nucleotide structure of all 38 isolates is presented. Based on insertions and deletions and dissimilarity due to SNPs, the dataset of all the isolates has been qualitatively classified into three groups, each having its own subgroups. These are the A-group with "regular" isolates (no insertions/deletions except at the 5' and 3' ends), the B-group of isolates with "long insertions", and the C-group of isolates with "many individual" insertions and deletions. The isolate with the smallest average number of SNPs compared to other isolates has been identified (TWH). The density distribution of SNPs, insertions and deletions for each group or subgroup, as well as cumulatively for all the isolates, is also presented, along with the gene map for TWH. Since individual SNPs may have occurred at random, positions corresponding to multiple SNPs (occurring in two or more isolates) are identified and presented. This result revises some previous results of a similar type. Amino acid changes caused by multiple SNPs are also identified (for the annotated sequences), as well as presupposed amino acid changes for non-annotated ones. Exact SNP positions for the isolates in each group or subgroup are presented. Finally, a phylogenetic tree for the SARS-CoV isolates has been produced using the CLUSTALW program, showing high compatibility with the former qualitative classification. Conclusions: The comparative study of SARS-CoV isolates provides essential information for genome
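    The distinction between individual SNPs and "multiple SNPs" (positions where two or more isolates differ) can be sketched as follows. This assumes all genomes are already aligned to one chosen reference sequence, counts only base substitutions (gapped columns are left to separate indel handling), and treats a position as a multiple SNP regardless of which alternative base each isolate carries. These choices and the function names are illustrative, not the published procedure.

```python
from collections import defaultdict


def snp_positions(reference, isolates, gap="-"):
    """Map each isolate to the 1-based columns where it has a base substitution vs. the reference."""
    result = {}
    for name, seq in isolates.items():
        if len(seq) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference")
        result[name] = [
            i + 1
            for i, (r, s) in enumerate(zip(reference, seq))
            if r != s and gap not in (r, s)  # substitutions only; indels handled elsewhere
        ]
    return result


def multiple_snps(snps_by_isolate, min_isolates=2):
    """Sorted positions at which at least `min_isolates` isolates carry a SNP."""
    counts = defaultdict(int)
    for positions in snps_by_isolate.values():
        for pos in positions:
            counts[pos] += 1
    return sorted(pos for pos, n in counts.items() if n >= min_isolates)
```

    Filtering to positions shared by several isolates is what separates likely informative sites from SNPs that may have arisen at random in a single isolate.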

  2. Full-Range Public Health Leadership, Part 1: Quantitative Analysis

    Directory of Open Access Journals (Sweden)

    Erik L. Carlton

    2015-04-01

    Background. Workforce and leadership development are central to the future of public health. However, public health has been slow to translate and apply leadership models from other professions and to incorporate local perspectives in understanding public health leadership. Purpose. This study used the full-range leadership model to examine public health leadership. Specifically, it sought to measure leadership styles among local health department directors and to understand the context of leadership in local health departments. Methods. Leadership styles among local health department directors (n = 13) were examined using survey methodology. Quantitative analysis methods included descriptive statistics, boxplots, and Pearson bivariate correlations using SPSS v18.0. Findings. Self-reported leadership styles were highly correlated with leadership outcomes at the organizational level. However, they were not related to county health rankings. Results suggest the preeminence of leader behaviors and providing individual consideration to staff as compared to idealized attributes of leaders, intellectual stimulation, or inspirational motivation. Implications. Holistic leadership assessment instruments, such as the Multifactor Leadership Questionnaire (MLQ), can be useful in assessing public health leaders' approaches and outcomes. Comprehensive, 360-degree reviews may be especially helpful. Further research is needed to examine the effectiveness of public health leadership development models, as well as the extent to which public health leadership impacts public health outcomes.

  3. Full-Genome Sequence of a Reassortant H1N2 Influenza A Virus Isolated from Pigs in Brazil.

    Science.gov (United States)

    Schmidt, Candice; Cibulski, Samuel Paulo; Muterle Varela, Ana Paula; Mengue Scheffer, Camila; Wendlant, Adrieli; Quoos Mayer, Fabiana; Lopes de Almeida, Laura; Franco, Ana Cláudia; Roehe, Paulo Michel

    2014-12-18

    In this study, the full-genome sequence of a reassortant H1N2 swine influenza virus is reported. The isolate has the hemagglutinin (HA) and neuraminidase (NA) genes from human lineage (H1-δ cluster and N2), and the internal genes (polymerase basic 1 [PB1], polymerase basic 2 [PB2], polymerase acidic [PA], nucleoprotein [NP], matrix [M], and nonstructural [NS]) are derived from human 2009 pandemic H1N1 (H1N1pdm09) virus. Copyright © 2014 Schmidt et al.

  4. VP1u phospholipase activity is critical for infectivity of full-length parvovirus B19 genomic clones

    OpenAIRE

    Filippone, Claudia; Zhi, Ning; Wong, Susan; Lu, Jun; Kajigaya, Sachiko; Gallinella, Giorgio; Kakkola, Laura; Söderlund-Venermo, Maria; Young, Neal S.; Brown, Kevin E.

    2008-01-01

    Three full-length genomic clones (pB19-M20, pB19-FL and pB19-HG1) of parvovirus B19 were produced in different laboratories. pB19-M20 was shown to produce infectious virus. To determine the differences in infectivity, all three plasmids were tested by transfection and infection assays. All three clones were similar in viral DNA replication, RNA transcription, and viral capsid protein production. However, only pB19-M20 and pB19-HG1 produced infectious virus. Comparison of viral sequences showe...

  5. CGUG: in silico proteome and genome parsing tool for the determination of "core" and unique genes in the analysis of genomes up to ca. 1.9 Mb

    Directory of Open Access Journals (Sweden)

    Mahadevan Padmanabhan

    2009-08-01

    Background: Viruses and small-genome bacteria (~2 megabases and smaller) comprise a considerable population in the biosphere and are of interest to many researchers. These genomes are now sequenced at an unprecedented rate and require complementary computational tools for analysis. "CoreGenesUniqueGenes" (CGUG) is an in silico genome data mining tool that determines a "core" set of genes from two to five organisms with genomes in this size range. Core and unique genes may reflect similar niches and needs, and may be used in classifying organisms. Findings: CGUG is available at http://binf.gmu.edu/geneorder.html as a web-based, on-the-fly tool that performs iterative BLASTP analyses using a reference genome and up to four query genomes to provide a table of genes common to these genomes. The result is an in silico display of genomes and their proteomes, allowing for further analysis. CGUG can be used for "genome annotation by homology", as demonstrated with Chlamydophila and Francisella genomes. Conclusion: CGUG is used to reanalyze the ICTV-based classifications of bacteriophages, to reconfirm long-standing relationships and to explore new classifications. These genomes have been problematic in the past, due largely to horizontal gene transfers. CGUG is validated as a tool for reannotating small-genome bacteria using more up-to-date annotations by similarity or homology. These serve as an entry point for wet-bench experiments to confirm the functions of these "hypothetical" and "unknown" proteins.
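    Once the BLASTP searches are done, a core/unique split reduces to a set operation over homology hits. The sketch below assumes those searches have already been summarized as a hypothetical dictionary mapping each reference gene to its best score in each query genome; a reference gene is called "core" if every query genome has a hit at or above `min_score`, and "unique" if none does. The score threshold, the input format, and this definition of "unique" are assumptions for illustration, not CGUG's documented behaviour.

```python
def core_and_unique(hits, query_genomes, min_score):
    """Split reference genes into 'core' and 'unique' lists.

    hits: hypothetical precomputed input mapping
          reference gene -> {query genome name: best BLASTP score}.
    Genes with qualifying hits in only some query genomes are in neither list.
    """
    core, unique = [], []
    for gene, scores in hits.items():
        found = [q for q in query_genomes if scores.get(q, 0.0) >= min_score]
        if len(found) == len(query_genomes):
            core.append(gene)  # homolog in every query genome
        elif not found:
            unique.append(gene)  # no qualifying homolog in any query genome
    return sorted(core), sorted(unique)
```

    Raising `min_score` shrinks the core set and grows the unique set, so any comparison between genome sets should use one fixed threshold.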

  6. Data for constructing insect genome content matrices for phylogenetic analysis and functional annotation

    Directory of Open Access Journals (Sweden)

    Jeffrey Rosenfeld

    2016-03-01

    Full Text Available Twenty-one fully sequenced and well-annotated insect genomes were used to construct genome content matrices for phylogenetic analysis and functional annotation of insect genomes. To examine the role of the e-value cutoff in ortholog determination, we used scaled e-value cutoffs and a single-linkage clustering approach. The present communication includes (1) a list of the genomes used to construct the genome content phylogenetic matrices, (2) a nexus file with the data matrices used in phylogenetic analysis, (3) a nexus file with the Newick trees generated by phylogenetic analysis, (4) an Excel file listing the Core (CORE) genes and Unique (UNI) genes found in five insect groups, and (5) a figure showing a plot of consistency index (CI) versus percent of unannotated genes that are apomorphies in the data set for gene losses and gains, and bar plots of gains and losses for four consistency index (CI) cutoffs.
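Genome content matrices of this kind rest on grouping genes into ortholog clusters under an e-value cutoff and then scoring presence or absence per genome. The sketch below illustrates that idea with single-linkage clustering (union-find) under one fixed cutoff; the input format, function names, and example data are hypothetical, and the scaled e-value cutoffs used for the dataset are not reproduced.

```python
from typing import Dict, List, Tuple


def single_linkage_clusters(
    genes: List[str], hits: List[Tuple[str, str, float]], cutoff: float
) -> List[List[str]]:
    """Group genes connected by any pairwise hit with e-value <= cutoff."""
    parent = {g: g for g in genes}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b, evalue in hits:
        if evalue <= cutoff:
            ra, rb = find(a), find(b)
            if ra != rb:
                parent[rb] = ra  # a single linking hit is enough to merge
    groups: Dict[str, List[str]] = {}
    for g in genes:
        groups.setdefault(find(g), []).append(g)
    return sorted(sorted(c) for c in groups.values())


def content_matrix(
    clusters: List[List[str]], genome_of: Dict[str, str]
) -> Dict[str, List[int]]:
    """Presence (1) / absence (0) of each cluster in each genome."""
    genomes = sorted(set(genome_of.values()))
    return {
        gn: [int(any(genome_of[g] == gn for g in c)) for c in clusters]
        for gn in genomes
    }


if __name__ == "__main__":
    genome_of = {"a1": "A", "a2": "A", "b1": "B", "c1": "C"}
    hits = [("a1", "b1", 1e-30), ("b1", "c1", 1e-3)]
    clusters = single_linkage_clusters(list(genome_of), hits, cutoff=1e-10)
    print(clusters)  # [['a1', 'b1'], ['a2'], ['c1']]
    print(content_matrix(clusters, genome_of))
    # {'A': [1, 1, 0], 'B': [1, 0, 0], 'C': [0, 0, 1]}
```

Loosening the cutoff (for example to 1e-2) would merge c1 into the first cluster, which is exactly the sensitivity to the e-value cutoff that the dataset was built to examine.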

  7. Full text clustering and relationship network analysis of biomedical publications.

    Directory of Open Access Journals (Sweden)

    Renchu Guan

    Full Text Available Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current approaches to text clustering, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analysis of complete biomedical article texts. To reduce dimensionality, the Cosine Coefficient is computed on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. A strategy and algorithm for Semi-supervised Affinity Propagation (SSAP) is then introduced to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. By constructing a directed relationship network and distribution matrix for the clustering results, overlaps in scope and interests among BioMed publications can be easily identified, providing a valuable analytical tool for editors, authors and readers.
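As a rough illustration of computing a cosine coefficient on the sub-space spanned by just two documents (only the terms occurring in either text), the sketch below uses raw whitespace-token counts. The tokenization and weighting are simplifying assumptions rather than the paper's feature extraction, and SSAP itself is not shown.

```python
import math
from collections import Counter


def cosine_coefficient(doc_a: str, doc_b: str) -> float:
    """Cosine of term-count vectors restricted to terms in either document."""
    va = Counter(doc_a.lower().split())
    vb = Counter(doc_b.lower().split())
    # Only terms shared by both documents contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document has no direction
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(round(cosine_coefficient("gene expression analysis", "gene expression"), 4))
    # 0.8165
```

Because each comparison touches only the terms of the two documents involved, no global high-dimensional vector is ever materialized.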

  8. Virtual Northern analysis of the human genome.

    Directory of Open Access Journals (Sweden)

    Evan H Hurowitz

    2007-05-01

    Full Text Available We applied the Virtual Northern technique to human brain mRNA to systematically measure human mRNA transcript lengths on a genome-wide scale. We used separation by gel electrophoresis followed by hybridization to cDNA microarrays to measure 8,774 mRNA transcript lengths representing at least 6,238 genes at high (>90%) confidence. By comparing these transcript lengths to the Refseq and H-Invitational full-length cDNA databases, we found that nearly half of our measurements appeared to represent novel transcript variants. Comparison of length measurements determined by hybridization to different cDNAs derived from the same gene identified clones that potentially correspond to alternative transcript variants. We observed a close linear relationship between ORF and mRNA lengths in human mRNAs, identical in form to the relationship we had previously identified in yeast. Some functional classes of protein are encoded by mRNAs whose untranslated regions (UTRs) tend to be longer or shorter than average; these functional classes were similar in both human and yeast. Human transcript diversity is extensive and largely unannotated. Our length dataset can be used as a new criterion for judging the completeness of cDNAs and annotating mRNA sequences. Similar relationships between the lengths of the UTRs in human and yeast mRNAs and the functions of the proteins they encode suggest that UTR sequences serve an important regulatory role among eukaryotes.

  9. A Genomics Approach to Tumor Genome Analysis

    National Research Council Canada - National Science Library

    Collins, Colin

    2002-01-01

    Genomes of solid tumors are often highly rearranged and these rearrangements promote cancer progression through disruption of genes mediating immortality, survival, metastasis, and resistance to therapy...

  10. E2FM: an encrypted and compressed full-text index for collections of genomic sequences.

    Science.gov (United States)

    Montecuollo, Ferdinando; Schmid, Giovanni; Tagliaferri, Roberto

    2017-09-15

    Next Generation Sequencing (NGS) platforms and, more generally, high-throughput technologies are giving rise to an exponential growth in the size of nucleotide sequence databases. Moreover, many emerging applications of nucleotide datasets, such as those related to personalized medicine, require compliance with regulations about the storage and processing of sensitive data. We have designed and carefully engineered E2FM-index, a new full-text index in minute space, optimized for compressing and encrypting nucleotide sequence collections in FASTA format and for performing fast pattern-search queries. E2FM-index can build self-indexes that occupy as little as 1/20 of the storage required by the input FASTA file, saving about 95% of storage when indexing collections of highly similar sequences; moreover, it can exactly search the built indexes for patterns in times ranging from a few milliseconds to a few hundred milliseconds, depending on pattern length. Source code is available at https://github.com/montecuollo/E2FM . Contact: ferdinando.montecuollo@unicampania.it. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
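E2FM-index belongs to the FM-index family. The sketch below shows only the core idea of that family: a Burrows-Wheeler transform plus backward search to count exact pattern occurrences, with naive rotation sorting and full, uncompressed occurrence tables. It has none of E2FM's compression or encryption and is not its implementation. It assumes the text contains no '$' and uses characters that sort after '$' (such as uppercase nucleotides).

```python
from typing import Dict, List


def bwt(text: str) -> str:
    """Burrows-Wheeler transform via sorted rotations ('$' is the sentinel)."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(r[-1] for r in rotations)


class FMIndex:
    """Uncompressed FM-index supporting exact occurrence counting."""

    def __init__(self, text: str) -> None:
        self.last = bwt(text)
        counts: Dict[str, int] = {}
        for ch in self.last:
            counts[ch] = counts.get(ch, 0) + 1
        # first[c]: number of characters in the text smaller than c.
        self.first: Dict[str, int] = {}
        total = 0
        for ch in sorted(counts):
            self.first[ch] = total
            total += counts[ch]
        # occ[c][i]: occurrences of c in last[:i] (a full table, for clarity).
        self.occ: Dict[str, List[int]] = {ch: [0] for ch in counts}
        for ch_last in self.last:
            for ch in counts:
                self.occ[ch].append(self.occ[ch][-1] + (ch == ch_last))

    def count(self, pattern: str) -> int:
        """Backward search: narrow the row range one character at a time."""
        lo, hi = 0, len(self.last)
        for ch in reversed(pattern):
            if ch not in self.first:
                return 0
            lo = self.first[ch] + self.occ[ch][lo]
            hi = self.first[ch] + self.occ[ch][hi]
            if lo >= hi:
                return 0
        return hi - lo


if __name__ == "__main__":
    index = FMIndex("ACGTACGA")
    print(index.count("ACG"), index.count("A"), index.count("TT"))  # 2 3 0
```

Production FM-indexes replace the full `occ` table with sampled rank structures over a compressed BWT, which is where the large space savings reported above come from.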

  11. Cinteny: flexible analysis and visualization of synteny and genome rearrangements in multiple organisms

    Directory of Open Access Journals (Sweden)

    Meller Jaroslaw

    2007-03-01

    Full Text Available Abstract Background Identifying syntenic regions, i.e., blocks of genes or other markers with evolutionarily conserved order, and quantifying evolutionary relatedness between genomes in terms of chromosomal rearrangements are central goals in comparative genomics. However, the analysis of synteny and the resulting assessment of genome rearrangements are sensitive to the choice of a number of arbitrary parameters that affect the detection of synteny blocks. In particular, the choice of a set of markers and the effect of different aggregation strategies, which enable coarse graining of synteny blocks and exclusion of micro-rearrangements, need to be assessed. Therefore, existing tools and resources that facilitate identification, visualization and analysis of synteny need to be further improved to provide a flexible platform for such analysis, especially in the context of multiple genomes. Results We present a new tool, Cinteny, for fast identification and analysis of synteny with different sets of markers and various levels of coarse graining of syntenic blocks. Using the Hannenhalli-Pevzner approach and its extensions, Cinteny also enables interactive determination of evolutionary relationships between genomes in terms of the number of rearrangements (the reversal distance). In particular, Cinteny provides: (i) integration of synteny browsing with assessment of evolutionary distances for multiple genomes; (ii) flexibility to adjust the parameters and re-compute the results on-the-fly; (iii) the ability to work with user-provided data, such as orthologous genes, sequence tags or other conserved markers. In addition, Cinteny provides many annotated mammalian, invertebrate and fungal genomes that are pre-loaded and available for analysis at http://cinteny.cchmc.org. Conclusion Cinteny allows one to automatically compare multiple genomes and perform sensitivity analysis for synteny block detection and for the subsequent computation of reversal distances.
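The Hannenhalli-Pevzner algorithm computes the exact reversal distance between signed gene orders and is too involved for a short example. As a simpler, swapped-in illustration, the sketch below counts breakpoints in a signed permutation. Because one reversal can remove at most two breakpoints, half the breakpoint count (rounded up) is a lower bound on the reversal distance, not the distance itself. The function names are hypothetical.

```python
import math
from typing import List


def breakpoints(perm: List[int]) -> int:
    """Count breakpoints in a signed permutation of 1..n.

    The order is framed by 0 and n + 1; a neighbouring pair (a, b) is an
    adjacency when b - a == 1 (e.g. (2, 3) or (-3, -2)), else a breakpoint.
    """
    framed = [0] + list(perm) + [len(perm) + 1]
    return sum(1 for a, b in zip(framed, framed[1:]) if b - a != 1)


def reversal_distance_lower_bound(perm: List[int]) -> int:
    # A single reversal changes only the two adjacencies at its ends,
    # so it removes at most two breakpoints.
    return math.ceil(breakpoints(perm) / 2)


if __name__ == "__main__":
    order = [1, -3, -2, 4]  # reversing the middle block restores 1 2 3 4
    print(breakpoints(order), reversal_distance_lower_bound(order))  # 2 1
```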

  12. Analysis of high-identity segmental duplications in the grapevine genome

    Directory of Open Access Journals (Sweden)

    Carelli Francesco N

    2011-08-01

    Full Text Available Abstract Background Segmental duplications (SDs) are blocks of genomic sequence of 1-200 kb that map to different loci in a genome and share a sequence identity > 90%. At the sequence level, SDs show the same characteristics as other regions of the human genome: they contain both high-copy repeats and gene sequences. SDs play an important role in genome plasticity by creating new genes and modeling genome structure. Although data are plentiful for mammals, little was known about the representation of SDs in plant genomes. To address this, we performed a genome-wide analysis of high-identity SDs on the sequenced grapevine (Vitis vinifera) genome (PN40024). Results We demonstrate that recent SDs (> 94% identity and ≥ 10 kb in size) are a relevant component of the grapevine genome (85 Mb, 17% of the genome sequence). We detected mitochondrial and plastid DNA and genes (10% of gene annotation) in segmentally duplicated regions of the nuclear genome. In particular, the nine highest copy number genes have a copy in either or both organelle genomes. Further, we showed that several duplicated genes take part in the biosynthesis of compounds involved in plant response to environmental stress. Conclusions These data show the great influence of SDs and organelle DNA transfers in modeling the Vitis vinifera nuclear DNA structure, as well as the contribution of SDs to the adaptive capacity of grapevine and the nutritional content of grape products through genome variation. This study represents a step forward in the full characterization of duplicated genes important for grapevine cultural needs and human health.
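The thresholds quoted above for recent SDs (> 94% identity, ≥ 10 kb) can be applied as a simple filter. The sketch below assumes a hypothetical list of SD alignments given as (chromosome, start, end, percent identity) with half-open coordinates; it keeps the recent SDs and merges overlapping intervals so that bases covered by more than one alignment are counted once. It is not the authors' pipeline.

```python
from typing import List, Optional, Tuple

# (chromosome, start, end, percent identity); half-open [start, end)
Alignment = Tuple[str, int, int, float]


def recent_sd_bases(
    alignments: List[Alignment],
    min_identity: float = 94.0,
    min_length: int = 10_000,
) -> int:
    """Bases covered by SDs with identity > min_identity and length >= min_length."""
    kept = sorted(
        (chrom, start, end)
        for chrom, start, end, ident in alignments
        if ident > min_identity and end - start >= min_length
    )
    covered = 0
    current: Optional[Tuple[str, int, int]] = None  # interval being merged
    for chrom, start, end in kept:
        if current and chrom == current[0] and start <= current[2]:
            # Overlapping or touching interval: extend instead of double counting.
            current = (chrom, current[1], max(current[2], end))
        else:
            if current:
                covered += current[2] - current[1]
            current = (chrom, start, end)
    if current:
        covered += current[2] - current[1]
    return covered


if __name__ == "__main__":
    sds = [
        ("chr1", 0, 20_000, 96.0),
        ("chr1", 15_000, 30_000, 95.0),
        ("chr1", 50_000, 55_000, 99.0),  # too short
        ("chr2", 0, 12_000, 93.0),  # identity too low
    ]
    print(recent_sd_bases(sds))  # 30000
```

Dividing the result by the assembly length gives the kind of genome fraction reported above.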

  13. Full-length genomic and molecular characterization of Canine parvovirus in dogs from North of Brazil.

    Science.gov (United States)

    Silva, S P; Silva, L N P P; Rodrigues, E D L; Cardoso, J F; Tavares, F N; Souza, W M; Santos, C M P; Martins, F M S; Jesus, I S; Brito, T C; Moura, T P C; Nunes, M R T; Casseb, L M N; Silva Filho, E; Casseb, A R

    2017-09-21

    To characterize Canine parvovirus (CPV) in suspected fecal samples from dogs collected at the Veterinarian Hospital in Belém city, samples were screened by PCR assay; five were positive, and an updated molecular characterization of CPV-2 circulating in Belém is provided. By sequencing the complete NS1, NS2, VP1, and VP2 genes, the CPV-2 strain circulating in Belém was identified as CPV-2b (Asn426Asp). A CPV-2b strain carrying an additional substitution, Tyr324Leu, was detected in all samples assessed and is thus reported here for the first time. Phylogenetic analysis indicated that the Belém CPV-2b and CPV-2a strains are related to a cluster of samples from after the 1990s, suggesting that CPV-2b in Belém originated from CPV-2a circulating in Brazil after the 1990s. Potential recombination events were analyzed using RDP4 and SplitsTree4; the results suggest that the CPV-2 sequences described here are not recombinants. Continuous monitoring and molecular characterization of CPV-2 samples are needed not only to identify possible genetic and antigenic changes that may interfere with the effectiveness of vaccines but also to better understand the mechanisms that drive the evolution of CPV-2 in Brazil.

  14. Isolation, preliminary characterization, and full-genome analyses of tick-borne encephalitis virus from Mongolia.

    Science.gov (United States)

    Frey, Stefan; Mossbrugger, Ilona; Altantuul, Damdin; Battsetseg, Jigjav; Davaadorj, Rendoo; Tserennorov, Damdindorj; Buyanjargal, Tsoodol; Otgonbaatar, Dashdavaa; Zöller, Lothar; Speck, Stephanie; Wölfel, Roman; Dobler, Gerhard; Essbauer, Sandra

    2012-12-01

    Tick-borne encephalitis virus (TBEV) causes one of the most important inflammatory diseases of the central nervous system, namely severe encephalitis in Europe and Asia. Tick-borne encephalitis has been known in Mongolia since the 1980s, with increasing numbers of human cases reported in recent years. So far, however, data on TBEV strains are still sparse. We herein report the isolation of a TBEV strain from Ixodes persulcatus ticks collected in Mongolia in 2010. Phylogenetic analysis of the E gene classified this isolate as the Siberian subtype of TBEV. The Mongolian TBEV strain showed differences in virus titers, plaque sizes, and growth properties in two human neuronal cell lines. In addition, the 10,242-nucleotide open reading frame and the corresponding polyprotein sequence were determined. The isolate grouped in the genetic subclade of the Siberian subtype. Strains Zausaev (AF527415) and Vasilchenko (AF069066) had 97% and 94% identity, respectively, at the nucleotide level. In summary, we herein describe the first detailed data regarding TBEV from Mongolia. Further investigations of TBEV in Mongolia and adjacent areas are needed to understand the intricate dispersal of this virus.

  15. First identification of a recombinant form of hepatitis C virus in Austrian patients by full-genome next generation sequencing.

    Directory of Open Access Journals (Sweden)

    Evelyn Stelzl

    Full Text Available Hepatitis C virus (HCV) intergenotypic recombinant forms have been reported for various HCV genotypes/subtypes in several countries worldwide. In a recent study, four patients living in Austria had been identified as possibly infected with a recombinant HCV strain. To clarify results and determine the point of recombination, full-genome next-generation sequencing using the Illumina MiSeq v2 300 cycle kit (Illumina, San Diego, CA, USA) was performed in the present study. Samples from all of the patients contained the recombinant HCV strain 2k/1b. The point of recombination was found to be within the HCV NS2 gene between nucleotide positions 3189-3200 based on H77 numbering. While three of the four patients were male and had a migration background from Chechnya (n = 2) and Azerbaijan (n = 1), the fourth patient was a female born in Austria. Three of the four patients, including the female, had intravenous drug abuse as a risk factor for HCV transmission. As sequencing techniques are limited to a few specialized laboratories, a genotyping assay that uses both ends of the HCV genome should be employed to identify patients infected with a recombinant HCV strain. The correct identification of recombinant strains also has implications for the tailored choice of anti-HCV treatment.

  16. Comparative genomics analysis of rice and pineapple contributes to understand the chromosome number reduction and genomic changes in grasses

    Directory of Open Access Journals (Sweden)

    Jinpeng Wang

    2016-10-01

    Full Text Available Rice is one of the most researched model plants and has a genome structure most closely resembling that of the grass common ancestor after a grass-common tetraploidization ~100 million years ago. There has been a long-standing controversy over whether there were 5 or 7 basic chromosomes before the tetraploidization; the question had been tackled but could not be well resolved for lack of a sequenced and assembled outgroup plant with a conserved genome structure. Recently, the availability of the pineapple genome, which has not been subjected to the grass-common tetraploidization, provides a precious opportunity to resolve this controversy and to study genome changes in rice and other grasses. Here, we performed a comparative genomics analysis of pineapple and rice and found solid evidence that the grass common ancestor had 2n = 2x = 14 basic chromosomes before the tetraploidization, which doubled to 2n = 4x = 28 after the event. Moreover, we propose that the large number of genes missing from duplicated regions in rice should be explained by an allotetraploid produced from prominently divergent parental lines, rather than by gene losses after their divergence. This means that genome fractionation might have occurred before the formation of the allotetraploid grass ancestor.

  17. Pathway and network analysis of cancer genomes

    DEFF Research Database (Denmark)

    Creixell, Pau; Reimand, Jueri; Haider, Syed

    2015-01-01

    Genomic information on tumors from 50 cancer types cataloged by the International Cancer Genome Consortium (ICGC) shows that only a few well-studied driver genes are frequently mutated, in contrast to many infrequently mutated genes that may also contribute to tumor biology. Hence there has been...

  18. Analysis of Genome-Scale Data

    NARCIS (Netherlands)

    Kemmeren, P.P.C.W.

    2005-01-01

    The genetic material of every cell in an organism is stored inside DNA in the form of genes, which together form the genome. The information stored in the DNA is translated to RNA and subsequently to proteins, which form complex biological systems. The availability of whole genome sequences has

  19. GENOME ANALYSIS OF BURKHOLDERIA CEPACIA AC1100

    Science.gov (United States)

    Burkholderia cepacia is an important organism in bioremediation of environmental pollutants and it is also of increasing interest as a human pathogen. The genomic organization of B. cepacia is being studied in order to better understand its unusual adaptive capacity and genome pl...

  20. License - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods ... © Satoshi ...

  1. Virtual Northern analysis of the human genome.

    Science.gov (United States)

    Hurowitz, Evan H; Drori, Iddo; Stodden, Victoria C; Donoho, David L; Brown, Patrick O

    2007-05-23

    We applied the Virtual Northern technique to human brain mRNA to systematically measure human mRNA transcript lengths on a genome-wide scale. We used separation by gel electrophoresis followed by hybridization to cDNA microarrays to measure 8,774 mRNA transcript lengths representing at least 6,238 genes at high (>90%) confidence. By comparing these transcript lengths to the Refseq and H-Invitational full-length cDNA databases, we found that nearly half of our measurements appeared to represent novel transcript variants. Comparison of length measurements determined by hybridization to different cDNAs derived from the same gene identified clones that potentially correspond to alternative transcript variants. We observed a close linear relationship between ORF and mRNA lengths in human mRNAs, identical in form to the relationship we had previously identified in yeast. Some functional classes of protein are encoded by mRNAs whose untranslated regions (UTRs) tend to be longer or shorter than average; these functional classes were similar in both human and yeast. Human transcript diversity is extensive and largely unannotated. Our length dataset can be used as a new criterion for judging the completeness of cDNAs and annotating mRNA sequences. Similar relationships between the lengths of the UTRs in human and yeast mRNAs and the functions of the proteins they encode suggest that UTR sequences serve an important regulatory role among eukaryotes.

  2. Functional genomic analysis of C. elegans molting.

    Directory of Open Access Journals (Sweden)

    Alison R Frand

    2005-10-01

    Full Text Available Although the molting cycle is a hallmark of insects and nematodes, neither the endocrine control of molting via size, stage, and nutritional inputs nor the enzymatic mechanism for synthesis and release of the exoskeleton is well understood. Here, we identify endocrine and enzymatic regulators of molting in C. elegans through a genome-wide RNA-interference screen. Products of the 159 genes discovered include annotated transcription factors, secreted peptides, transmembrane proteins, and extracellular matrix enzymes essential for molting. Fusions between several genes and green fluorescent protein show a pulse of expression before each molt in epithelial cells that synthesize the exoskeleton, indicating that the corresponding proteins are made at the correct time and place to regulate molting. We show further that inactivation of particular genes abrogates expression of the green fluorescent protein reporter genes, revealing regulatory networks that might couple the expression of genes essential for molting to endocrine cues. Many molting genes are conserved in parasitic nematodes responsible for human disease, and thus represent attractive targets for pesticide and pharmaceutical development.

  3. Be-Breeder – an application for analysis of genomic data in plant breeding

    Directory of Open Access Journals (Sweden)

    Filipe Inácio Matias

    2016-12-01

    Full Text Available Be-Breeder is an application for the genetic breeding of plants, developed with the Shiny package of the R software, that allows different phenotypic and molecular (marker) analyses to be undertaken. The molecular data section of Be-Breeder makes it possible to perform quality control of genotyping data, to obtain genomic kinship matrices, and to analyze genomic selection, genome association, and genetic diversity in a simple manner online. The application is available for online use through the website of the Allogamous Plant Breeding Laboratory of ESALQ-USP (http://www.genetica.esalq.usp.br/alogamas/R.html).
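The abstract does not say which kinship estimator Be-Breeder implements. As an assumption, the sketch below computes one widely used genomic relationship matrix, VanRaden's first method, from genotypes coded 0/1/2 (copies of one allele per marker); the function name and coding are illustrative only.

```python
from typing import List


def vanraden_g(genotypes: List[List[int]]) -> List[List[float]]:
    """Genomic relationship matrix G = Z Z' / (2 * sum_j p_j (1 - p_j)).

    genotypes[i][j] is the allele count (0, 1 or 2) of individual i at marker j.
    Assumes at least one polymorphic marker, otherwise the denominator is zero.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Frequency of the counted allele at each marker.
    p = [sum(row[j] for row in genotypes) / (2.0 * n) for j in range(m)]
    # Center each genotype by its expected value 2p.
    z = [[row[j] - 2.0 * p[j] for j in range(m)] for row in genotypes]
    scale = 2.0 * sum(pj * (1.0 - pj) for pj in p)
    return [
        [sum(z[i][k] * z[l][k] for k in range(m)) / scale for l in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    print(vanraden_g([[0, 2], [2, 0]]))  # [[2.0, -2.0], [-2.0, 2.0]]
```

Frequencies here are estimated from the sample itself; tools often allow base-population frequencies instead, which changes the centering.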

  4. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources.

    Directory of Open Access Journals (Sweden)

    Cassidy L Klima

    Full Text Available Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target for antimicrobial therapy and at risk for associated antimicrobial resistance development. The role of M. haemolytica in pathogenesis is linked to serotype, with serotypes 1 (S1) and 6 (S6) isolated from pneumonic lesions and serotype 2 (S2) found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes, and performed comparative genomics analysis to identify genetic features that may contribute to pathogenesis. Possible virulence-associated genes were identified within 14 distinct prophages, including a periplasmic chaperone, a lipoprotein, a peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2 to 8 per genome, but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain, with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phage, suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome, targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyl transferase group 1 gene, are present in S1 and S6 strains only, indicating that these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. The findings here identify key features that are likely contributing to serotype-related pathogenesis and specific targets for vaccine design.

  5. A comparative genome analysis of Cercospora sojina with other members of the pathogen genus Mycosphaerella on different plant hosts

    Directory of Open Access Journals (Sweden)

    Fanchang Zeng

    2017-09-01

    Full Text Available Fungi are the causal agents of many of the world's most serious plant diseases, with disastrous consequences for large-scale agricultural production. The genomic basis of pathogenicity is complex in fungi, which are multicellular eukaryotic pathogens. Here, we report the genome sequence of C. sojina and a comparative genome analysis with plant pathogen members of the genus Mycosphaerella (Zymoseptoria tritici (synonym M. graminicola), M. pini, M. populorum and M. fijiensis), pathogens of wheat, pine, poplar and banana, respectively. Synteny or collinearity was limited between genomes of major Mycosphaerella pathogens. Comparative analysis with these related pathogen genomes indicated distinct genome-wide repeat organization features, suggesting that repetitive elements might be responsible for considerable evolutionary genomic changes. These results reveal the background of genomic differences and similarities between Dothideomycete species. Wide diversity as well as conservation of genome features forms the potential genomic basis of pathogen specialization, such as pathogenicity to woody vs. herbaceous hosts. Through comparative genome analysis among five Dothideomycete species, our results shed light on the genome features of these related fungal species and provide insight for understanding the genomic basis of fungal pathogenicity and disease resistance in the crop hosts.

  6. Download - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods ...t_db_link_en.zip (36.3 KB) - 6 Genome analysis methods pgdbj_dna_marker_linkage_map_genome_analysis_methods_... ...

  7. Update History of This Database - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    Full Text Available PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods ...B link & Genome analysis methods English archive site is opened. 2012/08/08 PGDBj... Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods is opened. ...

  8. Comparative analysis of catfish BAC end sequences with the zebrafish genome

    Directory of Open Access Journals (Sweden)

    Abernathy Jason

    2009-12-01

    Full Text Available Abstract Background Comparative mapping is a powerful tool to transfer genomic information from sequenced genomes to closely related species for which whole genome sequence data are not yet available. However, such an approach is still very limited in catfish, the most important aquaculture species in the United States. This project was initiated to generate additional BAC end sequences and demonstrate their applications in comparative mapping in catfish. Results We report the generation of 43,000 BAC end sequences and their applications for comparative genome analysis in catfish. Using these and the additional 20,000 existing BAC end sequences as a resource, along with linkage mapping and the existing physical map, conserved syntenic regions were identified between the catfish and zebrafish genomes. A total of 10,943 catfish BAC end sequences (17.3%) had significant BLAST hits to the zebrafish genome (cutoff value ≤ e-5), of which 3,221 were unique gene hits, providing a platform for comparative mapping based on locations of these genes in catfish and zebrafish. Genetic linkage mapping of microsatellites associated with contigs allowed identification of large conserved genomic segments and construction of super scaffolds. Conclusion BAC end sequences and their associated polymorphic markers are great resources for comparative genome analysis in catfish. Highly conserved chromosomal regions were identified between catfish and zebrafish. However, it appears that while the level of conservation in local genomic regions is high, extensive chromosomal shuffling and rearrangement exist between the catfish and zebrafish genomes. Orthologous regions established through comparative analysis should facilitate both structural and functional genome analysis in catfish.

  9. Full-Duplex MIMO Small-Cell Networks: Performance Analysis

    OpenAIRE

    Atzeni, Italo; Kountouris, Marios

    2015-01-01

    Full-duplex small-cell relays with multiple antennas constitute a core element of the envisioned 5G network architecture. In this paper, we use stochastic geometry to analyze the performance of wireless networks with full-duplex multiple-antenna small cells, with particular emphasis on the probability of successful transmission. To achieve this goal, we additionally characterize the distribution of the self-interference power of the full-duplex nodes. The proposed framework reveals useful ins...

  10. First identification of a recombinant form of hepatitis C virus in Austrian patients by full-genome next generation sequencing.

    Science.gov (United States)

    Stelzl, Evelyn; Haas, Bernhard; Bauer, Bernd; Zhang, Sherry; Fiss, Ellen H; Hillman, Grantland; Hamilton, Aaron T; Mehta, Rochak; Heil, Marintha L; Marins, Ed G; Santner, Brigitte I; Kessler, Harald H

    2017-01-01

    Hepatitis C virus (HCV) intergenotypic recombinant forms have been reported for various HCV genotypes/subtypes in several countries worldwide. In a recent study, four patients living in Austria had been identified as possibly infected with a recombinant HCV strain. To clarify results and determine the point of recombination, full-genome next-generation sequencing using the Illumina MiSeq v2 300 cycle kit (Illumina, San Diego, CA, USA) was performed in the present study. Samples from all of the patients contained the recombinant HCV strain 2k/1b. The point of recombination was found to be within the HCV NS2 gene between nucleotide positions 3189-3200 based on H77 numbering. While three of the four patients were male and had a migration background from Chechnya (n = 2) and Azerbaijan (n = 1), the fourth patient was a female born in Austria. Three of the four patients, including the female, had intravenous drug abuse as a risk factor for HCV transmission. As sequencing techniques are limited to a few specialized laboratories, a genotyping assay that uses both ends of the HCV genome should be employed to identify patients infected with a recombinant HCV strain. The correct identification of recombinant strains also has implications for the tailored choice of anti-HCV treatment.

  11. Assumption-free estimation of heritability from genome-wide identity-by-descent sharing between full siblings.

    Directory of Open Access Journals (Sweden)

    2006-03-01

    Full Text Available The study of continuously varying, quantitative traits is important in evolutionary biology, agriculture, and medicine. Variation in such traits is attributable to many, possibly interacting, genes whose expression may be sensitive to the environment, which makes their dissection into underlying causative factors difficult. An important population parameter for quantitative traits is heritability, the proportion of total variance that is due to genetic factors. Response to artificial and natural selection and the degree of resemblance between relatives are all a function of this parameter. Following the classic paper by R. A. Fisher in 1918, the estimation of additive and dominance genetic variance and heritability in populations is based upon the expected proportion of genes shared between different types of relatives, and upon explicit, often controversial and untestable models of genetic and non-genetic causes of family resemblance. With genome-wide coverage of genetic markers, it is now possible to estimate such parameters solely within families using the actual degree of identity-by-descent sharing between relatives. Using genome scans on 4,401 quasi-independent sib pairs, of which 3,375 pairs had phenotypes, we estimated the heritability of height from empirical genome-wide identity-by-descent sharing, which varied from 0.374 to 0.617 (mean 0.498, standard deviation 0.036). The variance in identity-by-descent sharing per chromosome and per genome was consistent with theory. The maximum likelihood estimate of the heritability for height was 0.80, with no evidence for non-genetic causes of sib resemblance, consistent with results from independent twin and family studies but using an entirely separate source of information. Our application shows that it is feasible to estimate genetic variance solely from within-family segregation and provides an independent validation of previously untestable assumptions. Given sufficient data, our new paradigm will
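    The record reports a maximum likelihood estimate. A simpler, related way to see why IBD sharing identifies heritability is Haseman-Elston regression, sketched below as an illustration only, not as the authors' method. The sketch assumes standardized phenotypes, purely additive genetic variance and a shared-environment term c, so the expected sib covariance is pi*h2 + c and E[(y1 - y2)^2] = 2 - 2(pi*h2 + c); the slope of the squared sib difference on pi is then -2*h2. The function name `haseman_elston_h2` is hypothetical.

```python
def haseman_elston_h2(ibd, sq_diff):
    """Estimate heritability h2 by regressing squared sib-pair differences on IBD sharing.

    Assumes standardized phenotypes (variance 1) and purely additive genetic
    variance, so E[(y1 - y2)**2] = 2 - 2 * (pi * h2 + c) and the slope is -2 * h2.
    """
    n = len(ibd)
    mean_x = sum(ibd) / n
    mean_y = sum(sq_diff) / n
    # Ordinary least-squares slope of sq_diff on ibd
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(ibd, sq_diff))
    sxx = sum((x - mean_x) ** 2 for x in ibd)
    return -(sxy / sxx) / 2


# Noise-free illustration (not study data): IBD values inside the reported
# range, true h2 = 0.8 and c = 0
ibd = [0.374, 0.450, 0.498, 0.550, 0.617]
sq_diff = [2 - 2 * (p * 0.8) for p in ibd]
print(round(haseman_elston_h2(ibd, sq_diff), 6))  # 0.8
```

    Because genome-wide sharing varies only slightly around 0.5 (SD 0.036 here), the slope is estimated imprecisely from noisy real data, which is one reason thousands of sib pairs are needed.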

  12. Analysis of intra-genomic GC content homogeneity within prokaryotes

    DEFF Research Database (Denmark)

    Bohlin, J; Snipen, L; Hardy, S.P.

    2010-01-01

    Bacterial genomes possess varying GC content (total guanines (Gs) and cytosines (Cs) per total of the four bases within the genome), but within a given genome, GC content can vary locally along the chromosome, with some regions significantly more or less GC-rich than on average. We have examined how the GC content varies within microbial genomes to assess whether this property can be associated with certain biological functions related to the organism's environment and phylogeny. We utilize a new quantity, GCVAR, the intra-genomic GC content variability with respect to the average GC content ... both aerobic and facultative microbes. Although an association has previously been found between mean genomic GC content and oxygen requirement, our analysis suggests that no such association exists when phylogenetic bias is accounted for. A significant association between GCVAR and mean GC content...
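    GC content as defined above, and its local variation along a chromosome, can be sketched with fixed-size windows. The window and step sizes and the `gc_spread` score (standard deviation of window GC divided by genome-wide GC) are assumptions for illustration; this score is a hypothetical stand-in, not the published GCVAR formula.

```python
from statistics import pstdev


def gc_content(seq):
    """Fraction of G + C among unambiguous bases (A, C, G, T); symbols such as N are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(base) for base in "ACGT")
    return (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def window_gc(seq, window, step):
    """GC content of every full-length window, sliding by `step` bases."""
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq) - window + 1, step)]


def gc_spread(seq, window=1000, step=1000):
    """Hypothetical variability score: SD of window GC relative to genome-wide GC.

    Illustrative stand-in only; this is NOT the published GCVAR definition.
    """
    genome_gc = gc_content(seq)
    windows = window_gc(seq, window, step)
    if not windows or genome_gc == 0:
        return 0.0
    return pstdev(windows) / genome_gc


print(window_gc("GGGGAAAA", 4, 4))  # [1.0, 0.0]
print(gc_spread("GGGGAAAA", 4, 4))  # 1.0
```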

  13. Creation and genomic analysis of irradiation hybrids in Populus

    Science.gov (United States)

    Matthew S. Zinkgraf; K. Haiby; M.C. Lieberman; L. Comai; I.M. Henry; Andrew Groover

    2016-01-01

    Establishing efficient functional genomic systems for creating and characterizing genetic variation in forest trees is challenging. Here we describe protocols for creating novel gene-dosage variation in Populus through gamma-irradiation of pollen, followed by genomic analysis to identify chromosomal regions that have been deleted or inserted in...

  14. Genome-wide analysis of LTR-retrotransposons in oil palm.

    Science.gov (United States)

    Beulé, Thierry; Agbessi, Mawussé Dt; Dussert, Stephane; Jaligot, Estelle; Guyot, Romain

    2015-10-15

    The oil palm (Elaeis guineensis Jacq.) is a major cultivated crop and the world's largest source of edible vegetable oil. The genus Elaeis comprises two species: E. guineensis, the commercial African oil palm, and E. oleifera, which is used in oil palm genetic breeding. The recent publication of both the African oil palm genome assembly and the first draft sequence of its Latin American relative now allows us to tackle the challenge of understanding the genome composition, structure and evolution of these palm genomes through the annotation of their repeated sequences. In this study, we identified, annotated and compared Transposable Elements (TEs) from the African and Latin American oil palms. First, TE databases were built through de novo detection in both genome sequences, and the TE content of both genomes was estimated. Then, putative full-length retrotransposons with Long Terminal Repeats (LTRs) were identified in the E. guineensis genome to characterize their structural diversity, copy number and chromosomal distribution. Finally, their relative expression in several tissues was determined through in silico analysis of publicly available transcriptome data. Our results reveal a congruent transpositional history of LTR retrotransposons in E. oleifera and E. guineensis, especially for the Sto-4 family. We also identified and described 583 full-length LTR retrotransposons in the Elaeis guineensis genome. Our work shows that these elements are most likely no longer mobile and that no recent insertion event has occurred. Moreover, the analysis of chromosomal distribution suggests a preferential insertion of Copia elements in gene-rich regions, whereas Gypsy elements appear to be evenly distributed throughout the genome. Considering the high proportion of LTR retrotransposons in the oil palm genome, our work will contribute to a greater understanding of their impact on genome organization and evolution.

  15. Pan-Genome Analysis Links the Hereditary Variation of Leptospirillum ferriphilum With Its Evolutionary Adaptation

    Directory of Open Access Journals (Sweden)

    Xian Zhang

    2018-03-01

    Full Text Available Niche adaptation has long been recognized to drive intra-species differentiation and speciation, yet knowledge about its relationship with hereditary variation of microbial genomes is relatively limited. Using Leptospirillum ferriphilum as a case study, we present a detailed analysis of the genomic features of five recognized strains. Genome-to-genome distance calculation preliminarily determined the roles of spatial distance and environmental heterogeneity that potentially contribute to intra-species variation within L. ferriphilum at the genome level. Mathematical models were further constructed to extrapolate the expansion of L. ferriphilum genomes (an ‘open’ pan-genome), indicating the emergence of novel genes with newly sequenced genomes. The identification of diverse mobile genetic elements (MGEs), such as transposases, integrases, and phage-associated genes, revealed the prevalence of horizontal gene transfer events, an important evolutionary mechanism that provides avenues for the recruitment of novel functionalities and, further, for the genetic divergence of microbial genomes. Comprehensive analysis also demonstrated that genome reduction by gene loss in a broad sense might contribute to the observed diversification. We thus inferred a plausible explanation for this observation: community-dependent adaptation that potentially economizes the limiting resources of the entire community. Because the introduction of new genes is accompanied by a parallel abandonment of others, our results provide snapshots of the biological fitness cost of environmental adaptation within L. ferriphilum genomes. In short, our genome-wide analyses bridge the genetic variation of L. ferriphilum and its evolutionary adaptation.
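    One common way to decide whether a pan-genome is "open" is to fit a power law (Heaps' law), pan size = kappa * N**gamma, to the number of gene families observed as genomes are added; a positive exponent is usually read as an open pan-genome. The record does not say which models the authors used, so this is an assumed model, and `fit_heaps` is a hypothetical name. The example curve is synthetic, not data from the study.

```python
from math import exp, log


def fit_heaps(n_genomes, pan_sizes):
    """Fit pan_size = kappa * N**gamma by least squares on log-transformed data.

    Returns (kappa, gamma); gamma > 0 is conventionally taken to mean an open pan-genome.
    """
    xs = [log(n) for n in n_genomes]
    ys = [log(p) for p in pan_sizes]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    gamma = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    kappa = exp(my - gamma * mx)
    return kappa, gamma


# Synthetic curve generated from kappa = 2000, gamma = 0.25
ns = [1, 2, 3, 4, 5]
sizes = [2000 * n ** 0.25 for n in ns]
kappa, gamma = fit_heaps(ns, sizes)
print(round(kappa), round(gamma, 3), "open" if gamma > 0 else "closed")  # 2000 0.25 open
```

    In practice the pan-genome sizes are usually averaged over many random orderings of the genomes before fitting, since the curve depends on the order in which genomes are added.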

  16. Analysis of Genome-Scale Data

    OpenAIRE

    Kemmeren, P.P.C.W.

    2005-01-01

    The genetic material of every cell in an organism is stored inside DNA in the form of genes, which together form the genome. The information stored in the DNA is translated to RNA and subsequently to proteins, which form complex biological systems. The availability of whole genome sequences has given rise to the parallel development of other high-throughput approaches such as determining mRNA expression level changes, gene-deletion phenotypes, chromosomal location of DNA binding proteins, cel...

  17. Use of application containers and workflows for genomic data analysis

    Directory of Open Access Journals (Sweden)

    Wade L Schulz

    2016-01-01

    Full Text Available Background: The rapid acquisition of biological data and development of computationally intensive analyses has led to a need for novel approaches to software deployment. In particular, the complexity of common analytic tools for genomics makes them difficult to deploy and decreases the reproducibility of computational experiments. Methods: Recent technologies that allow for application virtualization, such as Docker, allow developers and bioinformaticians to isolate these applications and deploy secure, scalable platforms that have the potential to dramatically increase the efficiency of big data processing. Results: While limitations exist, this study demonstrates a successful implementation of a pipeline with several discrete software applications for the analysis of next-generation sequencing (NGS) data. Conclusions: With this approach, we significantly reduced the amount of time needed to perform clonal analysis from NGS data in acute myeloid leukemia.

  18. Analysis of expressed sequence tags generated from full-length enriched cDNA libraries of melon

    Directory of Open Access Journals (Sweden)

    Bendahmane Abdelhafid

    2011-05-01

    Full Text Available Abstract Background Melon (Cucumis melo), an economically important vegetable crop, belongs to the Cucurbitaceae family, which includes several other important crops such as watermelon, cucumber, and pumpkin. It has served as a model system for sex determination and vascular biology studies. However, genomic resources currently available for melon are limited. Result We constructed eleven full-length enriched and four standard cDNA libraries from fruits, flowers, leaves, roots, cotyledons, and calluses of four different melon genotypes, and generated 71,577 and 22,179 ESTs from full-length enriched and standard cDNA libraries, respectively. These ESTs, together with ~35,000 ESTs available in public domains, were assembled into 24,444 unigenes, which were extensively annotated by comparing their sequences to different protein and functional domain databases, assigning them Gene Ontology (GO) terms, and mapping them onto metabolic pathways. Comparative analysis of melon unigenes and other plant genomes revealed that 75% to 85% of melon unigenes had homologs in other dicot plants, while approximately 70% had homologs in monocot plants. The analysis also identified 6,972 gene families that were conserved across dicot and monocot plants, and 181, 1,192, and 220 gene families specific to fleshy fruit-bearing plants, the Cucurbitaceae family, and melon, respectively. Digital expression analysis identified a total of 175 tissue-specific genes, which provides a valuable gene sequence resource for future genomics and functional studies. Furthermore, we identified 4,068 simple sequence repeats (SSRs) and 3,073 single nucleotide polymorphisms (SNPs) in the melon EST collection. Finally, we obtained a total of 1,382 melon full-length transcripts through the analysis of full-length enriched cDNA clones that were sequenced from both ends. Analysis of these full-length transcripts indicated that sizes of melon 5' and 3' UTRs were similar to those of tomato, but

  19. Integrated Genomic Analysis of the Ubiquitin Pathway across Cancer Types

    Directory of Open Access Journals (Sweden)

    Zhongqi Ge

    2018-04-01

    Full Text Available Summary: Protein ubiquitination is a dynamic and reversible process of adding single ubiquitin molecules or various ubiquitin chains to target proteins. Here, using multidimensional omic data of 9,125 tumor samples across 33 cancer types from The Cancer Genome Atlas, we perform comprehensive molecular characterization of 929 ubiquitin-related genes and 95 deubiquitinase genes. Among them, we systematically identify top somatic driver candidates, including mutated FBXW7 with cancer-type-specific patterns and amplified MDM2 showing a mutually exclusive pattern with BRAF mutations. Ubiquitin pathway genes tend to be upregulated in cancer, mediated by diverse mechanisms. By integrating pan-cancer multiomic data, we identify a group of tumor samples that exhibit worse prognosis. These samples are consistently associated with the upregulation of cell-cycle and DNA repair pathways, characterized by mutated TP53, MYC/TERT amplification, and APC/PTEN deletion. Our analysis highlights the importance of the ubiquitin pathway in cancer development and lays a foundation for developing relevant therapeutic strategies. Ge et al. analyze a cohort of 9,125 TCGA samples across 33 cancer types to provide a comprehensive characterization of the ubiquitin pathway. They detect somatic driver candidates in the ubiquitin pathway and identify a cluster of patients with poor survival, highlighting the importance of this pathway in cancer development. Keywords: ubiquitin pathway, pan-cancer analysis, The Cancer Genome Atlas, tumor subtype, cancer prognosis, therapeutic targets, biomarker, FBXW7

  20. The Complete Chloroplast Genome of Catha edulis: A Comparative Analysis of Genome Features with Related Species

    Directory of Open Access Journals (Sweden)

    Cuihua Gu

    2018-02-01

    Full Text Available Qat (Catha edulis, Celastraceae) is a woody evergreen species of great economic and cultural importance. It is cultivated in East Africa and southwest Arabia for its stimulant alkaloids cathine and cathinone. However, genome information, especially DNA sequence resources, for C. edulis is limited, hindering studies of interspecific and intraspecific relationships. Herein, the complete chloroplast (cp) genome of Catha edulis is reported. This genome is 157,960 bp in length with 37% GC content and is structurally arranged into two 26,577 bp inverted repeats and two single-copy regions. The small single-copy and large single-copy regions are 18,491 bp and 86,315 bp long, respectively. The C. edulis cp genome contains 129 genes, including 37 transfer RNA (tRNA) genes, 8 ribosomal RNA (rRNA) genes, and 84 protein-coding genes. Of these, 112 are single-copy genes and 17 are duplicated in the two inverted repeats (seven tRNA, four rRNA, and six protein-coding genes). The phylogenetic relationships resolved from the cp genomes of qat and 32 other species confirm the monophyly of Celastraceae. The cp genomes of C. edulis, Euonymus japonicus and seven Celastraceae species lack the rps16 intron, which indicates that an intron loss occurred in an ancestor of this family. The cp genome of C. edulis provides a highly valuable genetic resource for further phylogenomic research, barcoding and cp transformation in Celastraceae.
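    The reported region sizes and gene-class counts are internally consistent. Counting the inverted repeat twice and adding both single-copy regions recovers the total length, and the class counts sum to the stated totals:

```latex
\begin{aligned}
\text{genome length} &= 2 \times 26{,}577 + 18{,}491 + 86{,}315 = 157{,}960\ \text{bp} \\
\text{genes} &= 37_{\text{tRNA}} + 8_{\text{rRNA}} + 84_{\text{protein-coding}} = 129 \\
\text{duplicated in the IRs} &= 7_{\text{tRNA}} + 4_{\text{rRNA}} + 6_{\text{protein-coding}} = 17
\end{aligned}
```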

  1. Complete Genome Analysis of Thermus parvatiensis and Comparative Genomics of Thermus spp. Provide Insights into Genetic Variability and Evolution of Natural Competence as Strategic Survival Attributes

    Directory of Open Access Journals (Sweden)

    Charu Tripathi

    2017-07-01

    Full Text Available Thermophilic environments represent an interesting niche. Among thermophiles, Thermus is one of the most studied genera. In this study, we sequenced the genome of Thermus parvatiensis strain RL, a thermophile isolated from Himalayan hot water springs (temperature >96°C), using the PacBio RSII SMRT technique. The small genome (2.01 Mbp) comprises a chromosome (1.87 Mbp) and a plasmid (143 Kbp), designated in this study as pTP143. Annotation revealed a high number of repair genes and a compact genome, together with a highly plastic plasmid carrying transposases, integrases, mobile elements and hypothetical proteins (44%). We performed a comparative genomic study of the genus Thermus with the aim of analysing phylogenetic relatedness as well as niche-specific attributes prevalent within the group. We compared the reference genome RL with 16 Thermus genomes to assess their phylogenetic relationships based on 16S rRNA gene sequences, average nucleotide identity (ANI), conserved marker genes (31 and 400), pan genome and tetranucleotide frequency. The core genome of the analyzed genomes contained 1,177 genes, and many singleton genes were detected in individual genomes, reflecting a conserved core but an adaptive pan repertoire. We demonstrated the presence of metagenomic islands (chromosome: 5, plasmid: 5) by recruiting raw metagenomic data (from the same niche) against the genomic replicons of T. parvatiensis. We also dissected the CRISPR loci across all genomes and found widespread presence of this system in Thermus genomes. Additionally, we performed a comparative analysis of competence loci across Thermus genomes and found evidence for recent horizontal acquisition of the locus and its continued dispersal among members, suggesting that natural competence is a beneficial survival trait among Thermus members and that its acquisition reflects ongoing evolution toward optimal fitness.
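    Tetranucleotide frequency, one of the relatedness measures listed above, can be sketched by counting overlapping 4-mers and correlating the resulting 256-dimensional profiles of two genomes. This is a simplified assumption: widely used tools typically correlate z-scores derived from expected frequencies rather than raw frequencies, and `tetra_freq` and `pearson` are hypothetical helper names.

```python
from itertools import product
from math import sqrt

# All 256 possible tetranucleotides in a fixed order
TETRAMERS = ["".join(p) for p in product("ACGT", repeat=4)]


def tetra_freq(seq):
    """Relative frequency of each tetranucleotide over overlapping 4-base windows."""
    seq = seq.upper()
    counts = dict.fromkeys(TETRAMERS, 0)
    for i in range(len(seq) - 3):
        kmer = seq[i:i + 4]
        if kmer in counts:  # skip windows containing N or other symbols
            counts[kmer] += 1
    total = sum(counts.values())
    return [counts[k] / total if total else 0.0 for k in TETRAMERS]


def pearson(x, y):
    """Pearson correlation between two equal-length frequency profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


genome_a = "ACGTACGTTTGCAAGGCTTACG"
genome_b = "ACGTACGATTGCAAGGCTAACG"
print(pearson(tetra_freq(genome_a), tetra_freq(genome_b)))
```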

  2. Complete Chloroplast Genomes of Papaver rhoeas and Papaver orientale: Molecular Structures, Comparative Analysis, and Phylogenetic Analysis

    Directory of Open Access Journals (Sweden)

    Jianguo Zhou

    2018-02-01

    Full Text Available Papaver rhoeas L. and P. orientale L., which belong to the family Papaveraceae, are used as ornamental and medicinal plants. The chloroplast genome has been used for developing molecular markers, in evolutionary biology, and for barcoding identification. In this study, the complete chloroplast genome sequences of P. rhoeas and P. orientale are reported. Results show that the complete chloroplast genomes of P. rhoeas and P. orientale have typical quadripartite structures, comprising circular molecules 152,905 bp and 152,799 bp long, respectively. A total of 130 genes were identified in each genome, including 85 protein-coding genes, 37 tRNA genes, and 8 rRNA genes. Sequence divergence analysis of four species from Papaveraceae indicated that the most divergent regions are found in the non-coding spacers, with minimal differences among the three Papaver species. These differences include the ycf1 gene and intergenic regions such as rpoB-trnC, trnD-trnT, petA-psbJ, psbE-petL, and ccsA-ndhD. These hypervariable regions can be used as specific DNA barcodes. This finding suggests that the chloroplast genome could be used as a powerful tool to resolve the phylogenetic positions and relationships of Papaveraceae. These results offer valuable information for future research in the identification of Papaver species and will benefit further investigations of these species.

  3. Genome-wide comparative analysis of codon usage bias and codon context patterns among cyanobacterial genomes.

    Science.gov (United States)

    Prabha, Ratna; Singh, Dhananjaya P; Sinha, Swati; Ahmad, Khurshid; Rai, Anil

    2017-04-01

    With the increasing accumulation of genomic sequence information from prokaryotes, the study of codon usage bias has gained renewed attention. The purpose of this study was to examine codon selection patterns within and across cyanobacterial species belonging to diverse taxonomic orders and habitats. We performed a detailed comparative analysis of cyanobacterial genomes with respect to codon bias. Our analysis shows that in cyanobacterial genomes, A- and/or T-ending codons were used predominantly, whereas G- and/or C-ending codons were largely avoided. Variation in the codon context usage of cyanobacterial genes corresponded to the clustering of cyanobacteria by GC content. Analysis of the codon adaptation index (CAI) and synonymous codon usage order (SCUO) revealed that the majority of genes are associated with low codon bias. Codon selection patterns in cyanobacterial genomes reflected compositional constraint as the major influencing factor. We also found that, although mutational constraint may play some role in codon usage bias in cyanobacteria, compositional constraint in terms of genomic GC composition, coupled with environmental factors, affected codon selection patterns in cyanobacterial genomes. Copyright © 2016 Elsevier B.V. All rights reserved.
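    The A/T- versus G/C-ending pattern described above can be quantified with GC3, the fraction of codons whose third position is G or C. GC3 is not one of the indices named in the record (CAI, SCUO); it is used here only as a simple, assumed illustration, and `gc3` is a hypothetical name.

```python
def gc3(cds):
    """Fraction of complete codons whose third position is G or C.

    Assumes an in-frame coding sequence starting at position 0; a trailing
    partial codon is ignored. Stop codons are counted like any other codon.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    if not codons:
        return 0.0
    return sum(codon[2] in "GC" for codon in codons) / len(codons)


# Codons: ATG AAA TTT GCA GCC TAA -> two of six end in G or C
print(round(gc3("ATGAAATTTGCAGCCTAA"), 3))  # 0.333
```

    A genome dominated by A/T-ending codons, as reported for cyanobacteria here, would show GC3 values well below 0.5 for most genes.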

  4. A comparative phylogenetic analysis of full-length mariner elements

    Indian Academy of Sciences (India)

    Mariner-like elements (MLEs) are widely distributed class II transposons with an open reading frame (ORF) for transposase. We studied comparative phylogenetic evolution and inverted terminal repeat (ITR) conservation of MLEs from the Indian saturniid silkmoth, Antheraea mylitta, with other full-length MLEs submitted in the ...

  5. RESEARCH NOTE Genome-based exome-sequencing analysis ...

    Indian Academy of Sciences (India)

    Navya

    2017-02-22

    ... Genome-based exome-sequencing analysis identifies GYG1, DIS3L, DDRGK1 genes ... Cardiology Division, Department of Internal Medicine, Severance ... with p values of <0.05 by analyzing differences in allele distribution.

  6. Genome inventory and analysis of nuclear hormone receptors in ...

    Indian Academy of Sciences (India)

    Prakash

    2006-12-20

    ... progestins, as well as lipids, cholesterol metabolites, and ... Genome ... Gene structure analysis shows strong conservation of exon structures among orthologues. ... earlier subfamily classification of NRs (Nuclear Receptors

  7. Human · mouse genome analysis and radiation biology. Proceedings

    International Nuclear Information System (INIS)

    Hori, Tada-aki

    1994-03-01

    This issue collects the papers presented at the 25th NIRS symposium on Human, Mouse Genome Analysis and Radiation Biology. Fourteen of the presented papers are indexed individually. (J.P.N.)

  8. Comparative analysis of rosaceous genomes and the reconstruction of a putative ancestral genome for the family.

    Science.gov (United States)

    Illa, Eudald; Sargent, Daniel J; Lopez Girona, Elena; Bushakra, Jill; Cestaro, Alessandro; Crowhurst, Ross; Pindo, Massimo; Cabrera, Antonio; van der Knaap, Esther; Iezzoni, Amy; Gardiner, Susan; Velasco, Riccardo; Arús, Pere; Chagné, David; Troggio, Michela

    2011-01-12

    Comparative genome mapping studies in Rosaceae have until now been conducted by aligning genetic maps within the same genus, or closely related genera, using a limited number of common markers. The growing body of genomics resources and sequence data for both Prunus and Fragaria permits detailed comparisons between these genera and the recently released Malus × domestica genome sequence. We performed a comparative analysis using 806 molecular markers that are anchored genetically to the Prunus and/or Fragaria reference maps, and physically to the Malus genome sequence. Malus and Prunus shared 784 markers, and Malus and Fragaria shared 148. The correspondence between marker positions was high, and conserved syntenic blocks were identified among the three genera in the Rosaceae. We reconstructed a proposed ancestral genome for the Rosaceae. A genome containing nine chromosomes is the most likely candidate for the ancestral Rosaceae progenitor. The number of chromosomal translocations observed between the three genera investigated was low. However, the number of inversions identified between Malus and Prunus was much higher than in any previously reported genome comparison in plants, suggesting that small inversions have played an important role in the evolution of these two genera or of the Rosaceae.

  9. Sequencing and comparative genome analysis of two pathogenic Streptococcus gallolyticus subspecies: genome plasticity, adaptation and virulence.

    Directory of Open Access Journals (Sweden)

    I-Hsuan Lin

    Full Text Available Streptococcus gallolyticus infections in humans are often associated with bacteremia, infective endocarditis and colon cancers. The disease manifestations differ depending on the subspecies of S. gallolyticus causing the infection. Here, we present the complete genomes of S. gallolyticus ATCC 43143 (biotype I) and S. pasteurianus ATCC 43144 (biotype II.2). The genomic differences between the two biotypes were characterized with comparative genomic analyses. The chromosomes of ATCC 43143 and ATCC 43144 are 2.36 and 2.10 Mb in length and encode 2,246 and 1,869 CDSs, respectively. The organization and genomic contents of both genomes were most similar to the recently published S. gallolyticus UCN34, in which 2,073 (92%) and 1,607 (86%) of the ATCC 43143 and ATCC 43144 CDSs, respectively, were conserved. Around 600 CDSs are conserved in all Streptococcus genomes, indicating that the Streptococcus genus has a small core genome (around 30% of total CDSs) and substantial evolutionary plasticity. We identified eight and five regions of genome plasticity in ATCC 43143 and ATCC 43144, respectively. Within these regions, several proteins were recognized to contribute to the fitness and virulence of each of the two subspecies. We have also predicted putative cell-surface associated proteins that could play a role in adherence to host tissues, leading to persistent infections causing sub-acute and chronic diseases in humans. This study showed evidence that S. gallolyticus still possesses genes making it suitable for a rumen environment, whereas the ability of S. pasteurianus to live in the rumen is reduced. The genome heterogeneity and genetic diversity between the two biotypes, especially in membrane proteins and lipoproteins, most likely contribute to the differences in the pathogenesis of the two S. gallolyticus biotypes and the type of disease an infected patient eventually develops.

  10. Short and long-term genome stability analysis of prokaryotic genomes.

    Science.gov (United States)

    Brilli, Matteo; Liò, Pietro; Lacroix, Vincent; Sagot, Marie-France

    2013-05-08

    Gene organization dynamics is actively studied because it provides useful evolutionary information, makes functional annotation easier and often helps characterize pathogens. There is therefore a strong interest in understanding the variability of this trait and its possible correlations with lifestyle. Two kinds of events affect genome organization: on the one hand, translocations and recombinations change the relative position of genes shared by two genomes (i.e. the backbone gene order); on the other, insertions and deletions leave the backbone gene order unchanged but alter gene neighborhoods by breaking syntenic regions. A complete picture of genome organization evolution therefore requires accounting for both kinds of events. We developed an approach in which we model chromosomes as graphs on which we compute different stability estimators; we consider genome rearrangements as well as the effect of gene insertions and deletions. In the first part of the paper, we fit a measure of backbone gene order conservation (hereinafter called backbone stability) against phylogenetic distance for over 3000 genome comparisons, improving existing models for the divergence of backbone stability over time. Intra- and inter-specific comparisons were treated separately to focus on different time-scales. The use of multiple genomes of the same species allowed us to identify genomes whose gene order diverges from that of their conspecifics. The inter-species analysis indicates that pathogens are more often unstable than non-pathogens. In the second part, we show that in pathogens, gene content dynamics (insertions and deletions) have a much more dramatic effect on genome organization stability than backbone rearrangements. In this work, we studied genome organization divergence taking into account the contribution of both genome order rearrangements and genome content dynamics. By studying species with multiple sequenced genomes available, we were
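    The distinction between backbone rearrangements and insertions/deletions can be illustrated with a simple breakpoint count restricted to shared genes. This is a generic, assumed measure, not the graph-based stability estimators the authors developed; it treats chromosomes as linear and ignores gene orientation, and `breakpoints` is a hypothetical name.

```python
def breakpoints(order_a, order_b):
    """Count adjacencies of genome A's shared-gene backbone that are absent in genome B.

    Only genes present in both genomes are kept (the 'backbone'), so pure
    insertions and deletions do not create breakpoints. Orientation is ignored
    and chromosomes are treated as linear.
    """
    shared = set(order_a) & set(order_b)
    a = [g for g in order_a if g in shared]
    b = [g for g in order_b if g in shared]
    adj_b = {frozenset(pair) for pair in zip(b, b[1:])}
    return sum(frozenset(pair) not in adj_b for pair in zip(a, a[1:]))


print(breakpoints(list("ABCDE"), list("ADCBE")))  # 2  (inversion of B..D)
print(breakpoints(list("ABC"), list("AXBC")))     # 0  (insertion of X only)
```

    The second call shows why a backbone measure alone misses gene content dynamics: the inserted gene X changes the neighborhood of A and B but leaves the shared-gene order intact.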

  11. Data on genome analysis of Bacillus velezensis LS69.

    Science.gov (United States)

    Liu, Guoqiang; Kong, Yingying; Fan, Yajing; Geng, Ce; Peng, Donghai; Sun, Ming

    2017-08-01

    The data presented in this article are related to the published article entitled "Whole-genome sequencing of Bacillus velezensis LS69, a strain with a broad inhibitory spectrum against pathogenic bacteria" (Liu et al., 2017) [1]. Genome analysis revealed that B. velezensis LS69 has good potential for biocontrol and plant growth promotion. This article provides an extended analysis of the genetic islands, core genes and amylolysin loci of B. velezensis LS69.

  12. Data on genome analysis of Bacillus velezensis LS69

    OpenAIRE

    Liu, Guoqiang; Kong, Yingying; Fan, Yajing; Geng, Ce; Peng, Donghai; Sun, Ming

    2017-01-01

    The data presented in this article are related to the published article entitled “Whole-genome sequencing of Bacillus velezensis LS69, a strain with a broad inhibitory spectrum against pathogenic bacteria” (Liu et al., 2017) [1]. Genome analysis revealed that B. velezensis LS69 has good potential for biocontrol and plant growth promotion. This article provides an extended analysis of the genetic islands, core genes and amylolysin loci of B. velezensis LS69.

  13. Genomic Analysis of Complex Microbial Communities in Wounds

    Science.gov (United States)

    2012-01-01

    ... Permutation Multivariate Analysis of Variance (PerMANOVA). We used PerMANOVA to test the null hypothesis of no ... permutation-based version of the multivariate analysis of variance (MANOVA). PerMANOVA uses the distances between samples to partition variance and ... coli. Antibiotics, bacteria, community analysis, diabetes, pyrosequencing, wound, wound therapy, 16S rRNA gene
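    The PerMANOVA idea described above (partitioning variance using between-sample distances, then testing by permutation) can be sketched with the pseudo-F statistic of the standard distance-based formulation. This is a minimal pure-Python illustration, not the report's implementation; `pseudo_f` and `permanova` are hypothetical names, and established implementations such as scikit-bio's `permanova` are preferable for real data.

```python
import random


def pseudo_f(dist, groups):
    """PERMANOVA pseudo-F for a square distance matrix and one group label per sample."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    # Total sum of squares: squared distances over all pairs, divided by N
    ss_total = sum(dist[i][j] ** 2 for i in range(n) for j in range(i + 1, n)) / n
    # Within-group sum of squares: per group, squared distances divided by group size
    ss_within = 0.0
    for g in labels:
        idx = [i for i in range(n) if groups[i] == g]
        pairs = sum(dist[i][j] ** 2 for k, i in enumerate(idx) for j in idx[k + 1:])
        ss_within += pairs / len(idx)
    ss_among = ss_total - ss_within
    return (ss_among / (a - 1)) / (ss_within / (n - a))


def permanova(dist, groups, permutations=999, seed=0):
    """Return (pseudo-F, permutation p-value) obtained by shuffling group labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, groups)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    return observed, (hits + 1) / (permutations + 1)


# Toy example: six samples on a line forming two well-separated groups
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
groups = ["a", "a", "a", "b", "b", "b"]
f_stat, p_value = permanova(dist, groups)
print(f_stat, p_value)
```

    With only six samples there are 20 ways to assign the labels, so even perfectly separated groups cannot reach a very small p-value; this is a property of permutation tests on small designs.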

  14. Mycobacterial species as case-study of comparative genome analysis.

    Science.gov (United States)

    Zakham, F; Belayachi, L; Ussery, D; Akrim, M; Benjouad, A; El Aouad, R; Ennaji, M M

    2011-02-08

    The genus Mycobacterium comprises more than 120 species, including important human pathogens that cause major public health problems. Furthermore, with more than 100 genome sequences available from this genus, comparative genome analysis can provide new insights into the evolutionary events of these species and help improve drugs, vaccines, and diagnostic tools for controlling mycobacterial diseases. In the present study, we outline a comparative genome analysis of fourteen mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. For this purpose, the genomes were compared on the basis of genome length, GC content, and number of genes according to GenBank, RefSeq, and Prodigal. A BLAST matrix of these genomes was constructed to summarize the similarity between species in a simple scheme. From the multiple genome analysis, the pan- and core genome were defined for twelve mycobacterial species. We also present the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria, a phylogenetic tree was constructed from the 16S rRNA gene of tuberculous and nontuberculous mycobacteria to understand the evolutionary events of these species.
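    Once genes have been clustered into families, the pan- and core genome reduce to set operations: the pan-genome is the union of families across genomes, the core genome is their intersection, and singletons are families found in only one genome. The sketch below assumes that clustering (for example by sequence similarity) has already been done; `pan_core` and the toy family IDs are hypothetical.

```python
def pan_core(genomes):
    """Return (pan, core, singletons) for a dict mapping genome name -> set of family IDs."""
    families = list(genomes.values())
    pan = set().union(*families)          # families present in any genome
    core = set.intersection(*families)    # families present in every genome
    singletons = {
        name: fams - set().union(*(f for other, f in genomes.items() if other != name))
        for name, fams in genomes.items()
    }
    return pan, core, singletons


genomes = {
    "strain_A": {"fam1", "fam2", "fam3"},
    "strain_B": {"fam2", "fam3", "fam4"},
    "strain_C": {"fam2", "fam3", "fam5"},
}
pan, core, singletons = pan_core(genomes)
print(sorted(pan))   # ['fam1', 'fam2', 'fam3', 'fam4', 'fam5']
print(sorted(core))  # ['fam2', 'fam3']
print({k: sorted(v) for k, v in singletons.items()})
```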

  15. Genomic sequence around butterfly wing development genes: annotation and comparative analysis.

    Directory of Open Access Journals (Sweden)

    Inês C Conceição

    Full Text Available BACKGROUND: Analysis of genomic sequence allows characterization of genome content and organization, and access beyond gene-coding regions for identification of functional elements. BAC libraries, where relatively large genomic regions are made readily available, are especially useful for species without a fully sequenced genome and can increase genomic coverage of phylogenetic and biological diversity. For example, no butterfly genome is yet available despite the unique genetic and biological properties of this group, such as diversified wing color patterns. The evolution and development of these patterns is being studied in a few target species, including Bicyclus anynana, where a whole-genome BAC library allows targeted access to large genomic regions. METHODOLOGY/PRINCIPAL FINDINGS: We characterize ∼1.3 Mb of genomic sequence around 11 selected genes expressed in B. anynana developing wings. Extensive manual curation of in silico predictions, also making use of a large dataset of expressed genes for this species, identified repetitive elements and protein-coding sequence, and highlighted an expansion of Alcohol dehydrogenase genes. Comparative analysis with orthologous regions of the lepidopteran reference genome allowed assessment of conservation of fine-scale synteny (with detection of new inversions and translocations) and of DNA sequence (with detection of high levels of conservation of non-coding regions around some, but not all, developmental genes). CONCLUSIONS: The general properties and organization of the available B. anynana genomic sequence are similar to the lepidopteran reference, despite more than 140 million years of divergence. Our results lay the groundwork for further studies of new interesting findings in relation to both coding and non-coding sequence: (1) the Alcohol dehydrogenase expansion, with higher similarity between the five tandemly repeated B. anynana paralogs than with the corresponding B. mori orthologs, and (2) the high

  16. Analysis of full scale impact into an abutment

    International Nuclear Information System (INIS)

    Fullard, K.; Dowler, H.J.; Soanes, T.P.T.

    1985-01-01

    A 60 mph impact into a tunnel abutment by a flask on a railway flatrol with following vehicles is shown to be a much less severe event for the flask than a 9 metre drop test to IAEA regulations. This involves the use of mathematical models of the full-scale event of the same type as were employed in studying the behaviour of quarter-scale models. The latter were subjected to actual impact testing as part of the validation process. (author)
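    For scale, a free-fall calculation (neglecting air drag) gives the impact speed in the IAEA 9 metre drop test. This comparison is added for illustration only and is not part of the authors' analysis:

$$
v = \sqrt{2gh} = \sqrt{2 \times 9.81\,\mathrm{m\,s^{-2}} \times 9\,\mathrm{m}} \approx 13.3\,\mathrm{m\,s^{-1}} \approx 30\,\mathrm{mph}
$$

    The 60 mph (about 26.8 m/s) rail impact is therefore roughly twice as fast, with about four times the kinetic energy per unit mass, so the finding that it is less severe for the flask presumably rests on how the impact energy is shared with the vehicles and the abutment in the authors' models, not on impact speed.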

  17. Detection and full genome characterization of two beta CoV viruses related to Middle East respiratory syndrome from bats in Italy.

    Science.gov (United States)

    Moreno, Ana; Lelli, Davide; de Sabato, Luca; Zaccaria, Guendalina; Boni, Arianna; Sozzi, Enrica; Prosperi, Alice; Lavazza, Antonio; Cella, Eleonora; Castrucci, Maria Rita; Ciccozzi, Massimo; Vaccari, Gabriele

    2017-12-19

    Middle East respiratory syndrome coronavirus (MERS-CoV), which belongs to the beta group of coronaviruses, can infect multiple host species and causes severe disease in humans. Multiple surveillance and phylogenetic studies suggest a bat origin. In this study, we describe the detection and full genome characterization of two CoVs closely related to MERS-CoV from two Italian bats, Pipistrellus kuhlii and Hypsugo savii. Pooled viscera were tested by a pan-coronavirus RT-PCR. Virus isolation was attempted by inoculation in different cell lines. Full genome sequencing was performed using the Ion Torrent platform, and phylogenetic trees were built using IQ-TREE software. Similarity plots of CoV clade c genomes were generated using SSE v1.2. The three-dimensional macromolecular structure (3DMMS) of the receptor binding domain (RBD) in the S protein was predicted by a sequence-homology method using the Protein Data Bank (PDB). Both samples were positive in the pan-coronavirus RT-PCR (IT-batCoVs), and their genome organization showed the same pattern as MERS-CoV. Phylogenetic analysis showed a monophyletic group placed in the Beta2c clade, formed by MERS-CoV sequences originating from humans and camels and bat-related sequences from Africa, Italy and China. Comparison of the secondary structure and 3DMMS of the IT-batCoV RBD with MERS, HKU4 and HKU5 bat sequences showed two amino acid deletions in the IT-batCoV and HKU5 RBDs, located in a region corresponding to the external subdomain of the MERS-RBD. This study reported two beta CoVs closely related to MERS that were obtained from two bats belonging to two commonly recorded species in Italy (P. kuhlii and H. savii). The analysis of the RBD showed that the IT-batCoV structure resembles that of HKU5 rather than that of HKU4. Since the RBD of HKU4, but not that of HKU5, can bind the human DPP4 receptor for MERS-CoV, IT-batCoVs may likewise lack DPP4-binding potential. More surveillance studies are needed to better
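    The similarity plots mentioned above were produced with SSE v1.2. As a rough illustration of what such a plot computes, the sketch below slides a window along two pre-aligned genomes and reports percent identity per window. It is a generic sliding-window identity calculation, not SSE's implementation; the function name, the window and step sizes, and the gap handling (columns with a gap in either sequence are skipped) are assumptions.

```python
def sliding_identity(seq_a: str, seq_b: str, window: int = 200,
                     step: int = 20) -> list[tuple[int, float]]:
    """Percent identity of two pre-aligned sequences in sliding windows.

    Columns where either sequence has a gap ('-') are skipped.
    Returns (window_midpoint, identity) pairs, identity in [0, 1].
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(seq_a) - window + 1, step):
        columns = zip(seq_a[start:start + window], seq_b[start:start + window])
        compared = [(x, y) for x, y in columns if x != "-" and y != "-"]
        if not compared:
            continue  # window is all gaps: no identity value
        same = sum(1 for x, y in compared if x.upper() == y.upper())
        points.append((start + window // 2, same / len(compared)))
    return points
```

    Plotting identity against window midpoint for each bat genome aligned to a MERS-CoV reference would give a similarity-plot style curve.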

  18. Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

    Directory of Open Access Journals (Sweden)

    Rodrigo Pessôa

    BACKGROUND: Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 adult T-cell leukemia/lymphoma (ATLL) patients, using an Illumina paired-end protocol. METHODS: Blood samples were collected from 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG from two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against the HTLV-1 genome with sufficient genetic resemblance and utilized for further phylogenetic analysis. RESULTS: A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14) and FLG (n = 76) data from the HTLV-1 strains, respectively. The results based on the phylogenetic trees of consensus sequences from partial and FLGs revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA) and that 4 individuals (4.5%) were infected with the Japanese sub-subtypes (aB). A comparison of the nucleotide and amino acids of the FLG between the three clinical settings yielded no correlation between the sequenced genotype and clinical outcomes. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequence, and the results are consistent with the hypothesis that there were multiple introductions of the transcontinental subtype in Brazil. CONCLUSIONS: This study has increased the number of subtype aA full-length genomes from 8 to 81 and HTLV-1 aB from 2 to 5 sequences. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. It is hoped that this valuable genomic data

  19. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalyncinus, from a Comparative Genomics Perspective.

    Directory of Open Access Journals (Sweden)

    Gurusamy Raman

    Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, the chloroplast (cp) genome of D. superbus was compared with those of closely related species such as Lychnis chalcedonica (Caryophyllaceae) and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variation at the boundary between the inverted repeat region A (IRA) and the LSC region. The IRs underwent both expansion and contraction during evolution of the Caryophyllaceae family; however, no major variations were identified. The ribosomal protein subunit S19 pseudogene (rps19) was identified at the IRA/LSC junction but was not present in the cp genomes of other Caryophyllaceae family members. The translation initiation factor IF-1 (infA) and ribosomal protein subunit L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of the rosids, the Solanales of the asterids, and Lychnis of the Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in the Caryophyllales. The cp genomes of Dianthus and Spinacia have two introns in the gene for the proteolytic subunit of ATP-dependent protease (clpP), whereas Lychnis has lost its clpP introns. Furthermore, phylogenetic analysis of the individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the Dianthus cp genome. Molecular phylogenetic analysis based on 78 protein-coding sequences also demonstrated a sister relationship between Dianthus and Lychnis. The results presented here will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant D. superbus var. longicalycinus.

  20. Full core reactor analysis: Running Denovo on Jaguar

    Energy Technology Data Exchange (ETDEWEB)

    Jarrell, J. J.; Godfrey, A. T.; Evans, T. M.; Davidson, G. G. [Oak Ridge National Laboratory, PO Box 2008, Oak Ridge, TN 37831 (United States)]

    2012-07-01

    Fully consistent, full-core, 3D, deterministic neutron transport simulations using the orthogonal mesh code Denovo were run on the massively parallel computing architecture Jaguar XT5. Using energy and spatial parallelization schemes, Denovo scaled efficiently to more than 160,000 processors. Cell-homogenized cross sections were used with step-characteristics, linear-discontinuous finite element, and trilinear-discontinuous finite element spatial methods. The finite element methods gave considerably more accurate eigenvalue solutions than step-characteristics on large-aspect-ratio meshes. (authors)

  1. Applied bioinformatics: Genome annotation and transcriptome analysis

    DEFF Research Database (Denmark)

    Gupta, Vikas

    agricultural and biological importance. Its capacity to form symbiotic relationships with rhizobia and mycorrhizal fungi has fascinated researchers for years. Lotus has a small genome of approximately 470 Mb and a short life cycle of 2 to 3 months, which has made Lotus a model legume plant for many molecular...

  2. Comparative genome analysis of trypanotolerance QTL | Nganga ...

    African Journals Online (AJOL)

    Homologous sequences were used in the definition of synteny relationships and subsequent identification of the shared disease response genes. The homologous genes within the human genome were then identified and aligned to the bovine radiation hybrid map in order to identify the mouse/bovine homologous regions.

  3. Genome-wide identification of specific oligonucleotides using artificial neural network and computational genomic analysis

    Directory of Open Access Journals (Sweden)

    Chen Jiun-Ching

    2007-05-01

    BACKGROUND: Genome-wide identification of specific oligonucleotides (oligos) is a computationally intensive task and is a requirement for designing microarray probes, primers, and siRNAs. An artificial neural network (ANN) is a machine learning technique that can effectively process complex, high-noise data. Here, ANNs are applied to the distribution of unique subsequences to predict specific oligos. RESULTS: We present a novel and efficient algorithm, named the integration of ANN and BLAST (IAB) algorithm, to identify specific oligos. We establish a unique marker database for the human and rat gene index databases using a hash table algorithm. We then create input vectors from the unique marker database to train and test the ANN. The trained ANN predicted the specific oligos with high efficiency, and these oligos were subsequently verified by BLAST. To improve prediction performance, ANN over-fitting was avoided by early stopping at the best observed error, and k-fold validation was also applied. The IAB algorithm was about 5.2, 7.1, and 6.7 times faster than a BLAST search without the ANN for 70-mer, 50-mer, and 25-mer specific oligos, respectively. In addition, polymerase chain reaction results showed that the primers predicted by the IAB algorithm could specifically amplify the corresponding genes. The IAB algorithm has been integrated into a previously published comprehensive web server that supports microarray analysis and genome-wide iterative enrichment analysis, through which users can identify a group of desired genes and then discover the specific oligos of these genes. CONCLUSION: The IAB algorithm has been developed to construct SpecificDB, a web server that provides a specific and valid oligo database for probe, siRNA, and primer design for the human genome. 
We also demonstrate the ability of the IAB algorithm to predict specific oligos through
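    The abstract says the unique marker database was built with a hash table. A minimal sketch of that idea, assuming exact-match k-mers serve as the "markers" and a Python dict serves as the hash table, is shown below. The function name and the choice of k are hypothetical, and neither the ANN scoring nor the BLAST verification step is reproduced.

```python
from collections import defaultdict


def unique_kmers(genes: dict[str, str], k: int) -> dict[str, set[str]]:
    """Map each gene to the k-mers that occur in no other gene (exact matches only)."""
    owners: dict[str, set[str]] = defaultdict(set)  # hash table: k-mer -> genes containing it
    for name, seq in genes.items():
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            owners[seq[i:i + k]].add(name)
    markers: dict[str, set[str]] = {name: set() for name in genes}
    for kmer, names in owners.items():
        if len(names) == 1:  # present in exactly one gene: candidate specific oligo
            (only,) = names
            markers[only].add(kmer)
    return markers
```

    With the toy input `{"geneA": "ACGTAC", "geneB": "ACGTTT"}` and k = 4, the shared k-mer ACGT is discarded, and CGTA and GTAC are assigned to geneA. Exact uniqueness is only a first filter; near-matches elsewhere in the genome are why a BLAST-style check is still needed.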

  4. Genome-Wide Analysis of Grain Yield Stability and Environmental Interactions in a Multiparental Soybean Population

    Directory of Open Access Journals (Sweden)

    Alencar Xavier

    2018-02-01

    Genetic improvement toward optimized and stable agronomic performance of soybean genotypes is desirable for food security. Understanding how genotypes perform under different environmental conditions helps breeders develop sustainable cultivars adapted to target regions. Important complex traits are known to be controlled by a large number of genomic regions with small effects whose magnitude and direction are modulated by environmental factors. Understanding the constraints and undesirable effects that result from genotype-by-environment interactions is a key objective in improving selection procedures in soybean breeding programs. In this study, the genetic basis of soybean grain yield responsiveness to environmental factors was examined in a large soybean nested association mapping population. To this end, genome-wide association analysis was applied to performance-stability estimates generated by Finlay-Wilkinson analysis, and interactions between marker genotypes and environmental factors were included. Genomic footprints were investigated by analysis and meta-analysis using a recently published multiparent model. The results indicated that specific soybean genomic regions were associated with stability, and that multiplicative interactions were present between environments and genetic background. Seven genomic regions on six chromosomes were identified as associated with genotype-by-environment interactions. This study provides insight for genomics-assisted breeding aimed at more stable agronomic performance in soybean, and documents opportunities to exploit genomic regions specifically associated with interactions involving environments and subpopulations.
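    Finlay-Wilkinson analysis regresses each genotype's performance on an environmental index, usually the mean performance of all genotypes in that environment, and uses the slope as a stability estimate: a slope near 1 indicates average responsiveness, and a smaller slope indicates a more stable, less responsive genotype. The sketch below is a plain least-squares version on raw yields, assuming a complete genotype × environment table; the function name and input layout are hypothetical, and it does not reproduce the multiparent model or the marker × environment association analysis used in the study.

```python
from statistics import mean


def finlay_wilkinson(yields: dict[str, dict[str, float]]) -> dict[str, tuple[float, float]]:
    """Least-squares Finlay-Wilkinson regression.

    yields[genotype][environment] is the observed yield; every genotype must be
    observed in every environment. Returns genotype -> (intercept, slope), where
    the slope is the stability estimate.
    """
    genotypes = list(yields)
    envs = list(yields[genotypes[0]])
    # Environmental index: environment mean over all genotypes, centred on the grand mean.
    env_means = {e: mean(yields[g][e] for g in genotypes) for e in envs}
    grand_mean = mean(env_means.values())
    index = {e: env_means[e] - grand_mean for e in envs}
    sxx = sum(index[e] ** 2 for e in envs)
    if sxx == 0:
        raise ValueError("environments must differ in mean yield")
    fits = {}
    for g in genotypes:
        g_mean = mean(yields[g][e] for e in envs)
        sxy = sum(index[e] * (yields[g][e] - g_mean) for e in envs)
        # The index is centred on zero, so the intercept equals the genotype mean.
        fits[g] = (g_mean, sxy / sxx)
    return fits
```

    In a GWAS on stability, slopes estimated this way would serve as the phenotype for each line.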

  5. A performance analysis in AF full duplex relay selection network

    Science.gov (United States)

    Ngoc, Long Nguyen; Hong, Nhu Nguyen; Loan, Nguyen Thi Phuong; Kieu, Tam Nguyen; Voznak, Miroslav; Zdralek, Jaroslav

    2018-04-01

    This paper studies relay selection in amplify-and-forward (AF) cooperative communication with full-duplex (FD) operation. Several relay selection schemes that assume the availability of different instantaneous information are investigated. We examine an optimal relay selection that maximizes the instantaneous FD channel capacity and requires global channel state information (CSI), as well as selection based on partial CSI. To facilitate comparison, exact outage probability expressions and their asymptotic forms, which give the diversity order, are derived for these strategies. The results show that the number of relays, the noise factor, the transmittance coefficient and the transmit power all affect performance. In addition, the optimal relay selection (ORS) scheme outperforms the partial relay selection (PRS) scheme.
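    The difference between optimal and partial relay selection can be illustrated with a small Monte Carlo simulation. The sketch below is not the paper's model or its closed-form analysis; it assumes Rayleigh fading (unit-mean exponential channel gains), equal transmit SNR at source and relays, residual self-interference at each FD relay scaled by a hypothetical factor `k_si`, and the common AF end-to-end SINR gamma1*gamma2/(gamma1 + gamma2 + 1). ORS picks the relay with the best end-to-end SINR (global CSI); PRS picks the relay with the strongest source-relay channel (partial CSI).

```python
import random


def outage_ors_prs(n_relays: int, snr: float, k_si: float, rate: float,
                   trials: int = 20000, seed: int = 1) -> tuple[float, float]:
    """Monte Carlo outage probabilities (ORS, PRS) for AF full-duplex relaying.

    Assumed model: Rayleigh fading (unit-mean exponential power gains), the same
    transmit SNR at source and relays, and residual self-interference at each
    relay scaled by k_si.
    """
    rng = random.Random(seed)
    threshold = 2 ** rate - 1  # outage when log2(1 + SINR) < rate
    outages_ors = outages_prs = 0
    for _ in range(trials):
        best_e2e = -1.0                    # ORS: best end-to-end SINR (global CSI)
        best_sr_gain, prs_e2e = -1.0, 0.0  # PRS: relay with the strongest S->R channel
        for _ in range(n_relays):
            g_sr, g_rd, g_si = (rng.expovariate(1.0) for _ in range(3))
            hop1 = snr * g_sr / (k_si * snr * g_si + 1.0)  # SINR at the FD relay
            hop2 = snr * g_rd                              # SNR at the destination
            e2e = hop1 * hop2 / (hop1 + hop2 + 1.0)        # AF end-to-end SINR
            best_e2e = max(best_e2e, e2e)
            if g_sr > best_sr_gain:
                best_sr_gain, prs_e2e = g_sr, e2e
        outages_ors += best_e2e < threshold
        outages_prs += prs_e2e < threshold
    return outages_ors / trials, outages_prs / trials
```

    Because ORS always selects a relay whose end-to-end SINR is at least that of the PRS choice in the same channel draw, its estimated outage can never exceed that of PRS, consistent with the abstract's conclusion.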

  6. Full text clustering and relationship network analysis of biomedical publications.

    Science.gov (United States)

    Guan, Renchu; Yang, Chen; Marchese, Maurizio; Liang, Yanchun; Shi, Xiaohu

    2014-01-01

    Rapid developments in the biomedical sciences have increased the demand for automatic clustering of biomedical publications. In contrast to current text clustering approaches, which focus exclusively on the contents of abstracts, a novel method is proposed for clustering and analyzing complete biomedical article texts. To reduce dimensionality, the cosine coefficient is computed on a sub-space of only two vectors, instead of computing the Euclidean distance within the space of all vectors. A strategy and algorithm for Semi-supervised Affinity Propagation (SSAP) is then introduced to improve analysis efficiency, using biomedical journal names as an evaluation background. Experimental results show that, by avoiding high-dimensional sparse matrix computations, SSAP outperforms conventional k-means methods and improves upon the standard Affinity Propagation algorithm. Constructing a directed relationship network and a distribution matrix from the clustering results makes overlaps in scope and interests among BioMed publications easy to identify, providing a valuable analytical tool for editors, authors and readers.
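    One reading of "cosine coefficient on a sub-space of only two vectors" is that the similarity between two articles is computed only over the terms those two articles contain, rather than over the full corpus vocabulary. The sketch below follows that reading with raw term frequencies; the tokenisation and weighting (no TF-IDF) are assumptions, and the SSAP clustering step is not shown.

```python
import math
from collections import Counter


def cosine(doc_a: list[str], doc_b: list[str]) -> float:
    """Cosine similarity of two term-frequency vectors.

    Only terms occurring in these two documents are touched, i.e. the
    computation lives in the sub-space they span, not the corpus vocabulary.
    """
    tf_a, tf_b = Counter(doc_a), Counter(doc_b)
    dot = sum(tf_a[t] * tf_b[t] for t in tf_a.keys() & tf_b.keys())
    norm_a = math.sqrt(sum(v * v for v in tf_a.values()))
    norm_b = math.sqrt(sum(v * v for v in tf_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```

    The cost of each comparison grows with the lengths of the two documents, not with the vocabulary size, which is the point of avoiding high-dimensional sparse matrices.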

  7. Analysis of the Mannshan Unit 2 full load rejection transient

    International Nuclear Information System (INIS)

    Kang, J.C.; Pei, B.S.; Yu, G.P.; Yuann, R.Y.

    1987-01-01

    Mannshan Unit 2 is a Westinghouse three-loop pressurized water reactor with a rated core power of 2775 MW(thermal) and a rated core flow of 4702 kg/s. Before full-power operation, a planned net load rejection was performed during the startup test by opening the main transformer high-side breakers. The generator power rapidly decreased to station load. All 16 steam dump valves immediately opened, and the control bank D rods automatically stepped in when the temperature difference T_avg − T_ref reached the programmed 2.8 °C. Nuclear power decreased smoothly as the control rods were inserted into the core. The pressurizer pressure and liquid level also dropped. Neither safety injection nor reactor trip occurred during this transient. The test was done to verify that the whole system would function properly under such a transient, keeping the reactor from scramming while also protecting vessel integrity. In this study, which is the preliminary stage of RELAP5/MOD2 transient simulation of the Mannshan PWR plants, the system thermal-hydraulic response is tested first, isolated from neutronic effects. The core power versus time curve was extracted from the power test data to serve as a time-varying boundary condition. The comparison of the analytical results of four major parameters (pressurizer pressure, average core temperature, steam dump flow rate, and feedwater flow rate) from RELAP5/MOD2 and the power test data is illustrated

  8. Molecular cytogenetic (FISH) and genome analysis of diploid wheatgrasses and their phylogenetic relationship.

    Directory of Open Access Journals (Sweden)

    Gabriella Linc

    This paper reports detailed FISH-based karyotypes for three diploid wheatgrass species, Agropyron cristatum (L.) Beauv., Thinopyrum bessarabicum (Savul. & Rayss) A. Löve and Pseudoroegneria spicata (Pursh) A. Löve, the supposed ancestors of hexaploid Thinopyrum intermedium (Host) Barkworth & D.R. Dewey, compiled using DNA repeats and comparative genome analysis based on COS markers. Fluorescence in situ hybridization (FISH) with repetitive DNA probes proved suitable for identifying individual chromosomes in the diploid JJ, StSt and PP genomes. Of the seven microsatellite markers tested, only the (GAA)n trinucleotide sequence was suitable for use as a single-chromosome marker for the P. spicata AS chromosome. Based on COS marker analysis, the phylogenetic relationship between the diploid wheatgrasses and the hexaploid bread wheat genomes was established. These findings confirmed that the J and E genomes are in neighbouring clusters.

  9. Genetic Characterization and Comparative Genome Analysis of Brucella melitensis Isolates from India

    Directory of Open Access Journals (Sweden)

    Sarwar Azam

    2016-01-01

    Brucellosis is the most frequent zoonotic disease worldwide, with over 500,000 new human infections every year. Brucella melitensis, the most virulent species in humans, primarily affects goats, and zoonotic transmission occurs through ingestion of unpasteurized milk products or direct contact with fetal tissues. Brucellosis is endemic in India, but no information is available on the population structure and genetic diversity of Brucella spp. in India. We performed multilocus sequence typing of four B. melitensis strains isolated from naturally infected goats in India. For more detailed genetic characterization, we carried out whole-genome sequencing and comparative genome analysis of one of the B. melitensis isolates, Bm IND1. Genome analysis identified 141 unique SNPs, 78 VNTRs, 51 indels, and 2 putative prophage integrations in the Bm IND1 genome. Our data may help to develop improved epidemiological typing tools and efficient preventive strategies to control brucellosis.

  10. Comparative analysis of the mitochondrial genomes in gastropods

    International Nuclear Information System (INIS)

    Arquez, Moises; Uribe, Juan Esteban; Castro, Lyda Raquel

    2012-01-01

    In this work we present a comparative analysis of the mitochondrial genomes of gastropods. Nucleotide and amino acid composition was calculated, and a comparative visual analysis of the start and termination codons was performed. Genome organization was compared by calculating the number of intergenic sequences, the location of the genes, and the number of rearranged genes (breakpoints) relative to the sequence presumed to be ancestral for the group. To detect variation in the rates of molecular evolution within the group, the relative rate test was performed. Despite differences in genome size, the number of amino acids is conserved. Nucleotide and amino acid composition is similar among Vetigastropoda, Caenogastropoda and Neritimorpha, as compared with Heterobranchia and Patellogastropoda. The mitochondrial genomes of the group are very compact, with few intergenic sequences; the only exception is the Patellogastropoda genome, at 26,828 bp. Start codons of Heterobranchia and Patellogastropoda are highly variable, and these two groups also show more genome rearrangements. In general, the hypothesis of constant rates of molecular evolution between groups is rejected, except when the genomes of Caenogastropoda and Vetigastropoda are compared.
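    Counting rearranged genes (breakpoints) against a presumed ancestral order can be done by comparing gene adjacencies. The sketch below is a generic, unsigned breakpoint count for circular gene orders, assuming each gene label occurs once. It ignores strand, which a full analysis of mitochondrial gene order would usually take into account, and it is not necessarily the authors' exact procedure; the gene labels in the example are illustrative only.

```python
def breakpoints(reference: list[str], query: list[str], circular: bool = True) -> int:
    """Count adjacencies in `query` that are absent from `reference`.

    Unsigned version: orientation is ignored, so adjacency (a, b) matches (b, a).
    Both orders must contain the same set of (unique) gene labels.
    """
    if set(reference) != set(query):
        raise ValueError("gene orders must contain the same genes")

    def adjacencies(order: list[str]) -> set[frozenset[str]]:
        pairs = list(zip(order, order[1:]))
        if circular:  # mitochondrial genomes are circular: close the loop
            pairs.append((order[-1], order[0]))
        return {frozenset(p) for p in pairs}

    ref_adj = adjacencies(reference)
    return sum(1 for adj in adjacencies(query) if adj not in ref_adj)
```

    For example, moving atp8 between cox1 and cox2 in the order cox1, cox2, atp8, atp6, cox3 breaks two ancestral adjacencies, so the count is 2.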

  11. MIPS: analysis and annotation of proteins from whole genomes.

    Science.gov (United States)

    Mewes, H W; Amid, C; Arnold, R; Frishman, D; Güldener, U; Mannhaupt, G; Münsterkötter, M; Pagel, P; Strack, N; Stümpflen, V; Warfsmann, J; Ruepp, A

    2004-01-01

    The Munich Information Center for Protein Sequences (MIPS-GSF), Neuherberg, Germany, provides protein sequence-related information based on whole-genome analysis. The main focus of the work is directed toward the systematic organization of sequence-related attributes as gathered by a variety of algorithms, primary information from experimental data together with information compiled from the scientific literature. MIPS maintains automatically generated and manually annotated genome-specific databases, develops systematic classification schemes for the functional annotation of protein sequences and provides tools for the comprehensive analysis of protein sequences. This report updates the information on the yeast genome (CYGD), the Neurospora crassa genome (MNCDB), the database of complete cDNAs (German Human Genome Project, NGFN), the database of mammalian protein-protein interactions (MPPI), the database of FASTA homologies (SIMAP), and the interface for the fast retrieval of protein-associated information (QUIPOS). The Arabidopsis thaliana database, the rice database, the plant EST databases (MATDB, MOsDB, SPUTNIK), as well as the databases for the comprehensive set of genomes (PEDANT genomes) are described elsewhere in the 2003 and 2004 NAR database issues, respectively. All databases described, and the detailed descriptions of our projects can be accessed through the MIPS web server (http://mips.gsf.de).

  12. A Mitochondrial Genome of Rhyparochromidae (Hemiptera: Heteroptera) and a Comparative Analysis of Related Mitochondrial Genomes.

    Science.gov (United States)

    Li, Teng; Yang, Jie; Li, Yinwan; Cui, Ying; Xie, Qiang; Bu, Wenjun; Hillis, David M

    2016-10-19

    The Rhyparochromidae, the largest family of Lygaeoidea, encompasses more than 1,850 described species, but no mitochondrial genome has been sequenced to date. Here we describe the first mitochondrial genome for Rhyparochromidae: a complete mitochondrial genome of Panaorus albomaculatus (Scott, 1874). This mitochondrial genome comprises 16,345 bp and contains the expected 37 genes and control region. Most of the control region is made up of a large tandem-repeat region, which has a novel pattern not previously observed in other insects. The tandem-repeat region of P. albomaculatus consists of 53 tandem duplications (including one partial repeat), the largest number of tandem repeats among all known insect mitochondrial genomes. Slipped-strand mispairing during replication is likely to have generated this novel pattern of tandem repeats. Comparative analysis of tRNA gene families in sequenced Pentatomomorpha and Lygaeoidea species shows that nucleotide conservation is markedly higher on the J-strand. Phylogenetic reconstruction based on mitochondrial genomes suggests that Rhyparochromidae is not the sister group to all the remaining Lygaeoidea, and supports the monophyly of Lygaeoidea.
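    Counting full and partial copies of a repeat unit, as in the 53 tandem duplications (one partial) reported here, can be sketched as a scan for consecutive exact copies. This is an illustrative helper with a hypothetical name. Real control-region repeats often carry point differences, so dedicated repeat-finding tools tolerate mismatches; this exact-match sketch does not.

```python
def count_tandem_copies(seq: str, start: int, unit: str) -> tuple[int, int]:
    """Count consecutive exact copies of `unit` beginning at `start`.

    Returns (full_copies, partial_length), where partial_length is the length
    of the prefix of `unit` that follows the last full copy.
    """
    if not unit:
        raise ValueError("repeat unit must be non-empty")
    k = len(unit)
    pos, full = start, 0
    while seq.startswith(unit, pos):  # consume whole copies
        pos += k
        full += 1
    partial = 0
    while partial < k and pos + partial < len(seq) and seq[pos + partial] == unit[partial]:
        partial += 1  # trailing partial copy
    return full, partial
```

    For example, `count_tandem_copies("GGATTATTATTAT", 2, "ATT")` finds three full copies followed by a two-base partial copy.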

  13. Molecular characterization of human T-cell lymphotropic virus type 1 full and partial genomes by Illumina massively parallel sequencing technology.

    Science.gov (United States)

    Pessôa, Rodrigo; Watanabe, Jaqueline Tomoko; Nukui, Youko; Pereira, Juliana; Casseb, Jorge; Kasseb, Jorge; de Oliveira, Augusto César Penalva; Segurado, Aluisio Cotrim; Sanabani, Sabri Saeed

    2014-01-01

    Here, we report on the partial and full-length genomic (FLG) variability of HTLV-1 sequences from 90 well-characterized subjects, including 48 HTLV-1 asymptomatic carriers (ACs), 35 HTLV-1-associated myelopathy/tropical spastic paraparesis (HAM/TSP) and 7 adult T-cell leukemia/lymphoma (ATLL) patients, using an Illumina paired-end protocol. Blood samples were collected from 90 individuals, and DNA was extracted from the PBMCs to measure the proviral load and to amplify the HTLV-1 FLG from two overlapping fragments. The amplified PCR products were subjected to deep sequencing. The sequencing data were assembled, aligned, and mapped against the HTLV-1 genome with sufficient genetic resemblance and utilized for further phylogenetic analysis. A high-throughput sequencing-by-synthesis instrument was used to obtain an average of 3210- and 5200-fold coverage of the partial (n = 14) and FLG (n = 76) data from the HTLV-1 strains, respectively. The results based on the phylogenetic trees of consensus sequences from partial and FLGs revealed that 86 (95.5%) individuals were infected with the transcontinental sub-subtypes of the cosmopolitan subtype (aA) and that 4 individuals (4.5%) were infected with the Japanese sub-subtypes (aB). A comparison of the nucleotide and amino acids of the FLG between the three clinical settings yielded no correlation between the sequenced genotype and clinical outcomes. The evolutionary relationships among the HTLV sequences were inferred from nucleotide sequence, and the results are consistent with the hypothesis that there were multiple introductions of the transcontinental subtype in Brazil. This study has increased the number of subtype aA full-length genomes from 8 to 81 and HTLV-1 aB from 2 to 5 sequences. The overall data confirmed that the cosmopolitan transcontinental sub-subtypes were the most prevalent in the Brazilian population. It is hoped that this valuable genomic data will add to our current understanding of the
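    Fold coverage, as in the 3210- and 5200-fold averages above, is the total number of aligned bases divided by the length of the target sequence. A minimal sketch, assuming read alignments are available as half-open (start, end) intervals on the reference (the function name is hypothetical), is:

```python
def mean_depth(read_spans: list[tuple[int, int]], genome_length: int) -> float:
    """Mean fold coverage from aligned read spans given as half-open [start, end)."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    total = 0
    for start, end in read_spans:
        # Clip each alignment to the reference; spans outside it add nothing.
        total += max(0, min(end, genome_length) - max(start, 0))
    return total / genome_length
```

    Average depth says nothing about uniformity; a region can still be poorly covered when the mean is in the thousands.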

  14. COGNAT: a web server for comparative analysis of genomic neighborhoods.

    Science.gov (United States)

    Klimchuk, Olesya I; Konovalov, Kirill A; Perekhvatov, Vadim V; Skulachev, Konstantin V; Dibrova, Daria V; Mulkidjanian, Armen Y

    2017-11-22

    In prokaryotic genomes, functionally coupled genes can be organized in conserved gene clusters, enabling their coordinated regulation. Such clusters could contain one or several operons, which are groups of co-transcribed genes. Genes that evolved from a common ancestral gene by speciation (i.e. orthologs) are expected to have similar genomic neighborhoods in different organisms, whereas gene copies responsible for dissimilar functions (i.e. paralogs) could be found in dissimilar genomic contexts. Comparative analysis of genomic neighborhoods facilitates the prediction of co-regulated genes and helps to discern different functions in large protein families. Building on the assignment of gene sequences to clusters of orthologous groups of proteins (COGs), we aimed to provide a method for visualization and comparative analysis of the genomic neighborhoods of evolutionarily related genes, together with a corresponding web server. Here we introduce the COmparative Gene Neighborhoods Analysis Tool (COGNAT), a web server for comparative analysis of genomic neighborhoods. The tool is based on the COG database as well as the Pfam protein families database. As an example, we show the utility of COGNAT in identifying a new type of membrane protein complex formed by paralog(s) of one of the membrane subunits of the type 1 NADH:quinone oxidoreductase (COG1009) and a cytoplasmic protein of unknown function (COG3002). This article was reviewed by Drs. Igor Zhulin, Uri Gophna and Igor Rogozin.
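    A simple way to compare the genomic neighborhoods of two genes, once every gene has a COG assignment, is to collect the COGs within a fixed number of positions on each side and measure their overlap. The sketch below does this with a Jaccard index. The window radius, the input format and the Jaccard measure are assumptions for illustration and do not describe how COGNAT itself scores or draws neighborhoods; circular replicons and strand are ignored, and the labels in the example are placeholders.

```python
def neighborhood(genome: list[tuple[str, str]], gene_id: str, radius: int = 5) -> set[str]:
    """COG labels of genes within `radius` positions of `gene_id` (excluding it).

    `genome` is an ordered list of (gene_id, cog_id) pairs along one replicon.
    """
    idx = next((i for i, (gid, _) in enumerate(genome) if gid == gene_id), None)
    if idx is None:
        raise KeyError(gene_id)
    lo, hi = max(0, idx - radius), min(len(genome), idx + radius + 1)
    return {cog for i, (_, cog) in enumerate(genome[lo:hi], start=lo) if i != idx}


def neighborhood_similarity(a: set[str], b: set[str]) -> float:
    """Jaccard index of two neighborhoods' COG sets (1.0 = identical content)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0
```

    Under this measure, orthologs would be expected to score high and functionally diverged paralogs lower, matching the rationale given in the abstract.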

  15. Arabidopsis transcription factors: genome-wide comparative analysis among eukaryotes.

    Science.gov (United States)

    Riechmann, J L; Heard, J; Martin, G; Reuber, L; Jiang, C; Keddie, J; Adam, L; Pineda, O; Ratcliffe, O J; Samaha, R R; Creelman, R; Pilgrim, M; Broun, P; Zhang, J Z; Ghandehari, D; Sherman, B K; Yu, G

    2000-12-15

    The completion of the Arabidopsis thaliana genome sequence allows a comparative analysis of transcriptional regulators across the three eukaryotic kingdoms. Arabidopsis dedicates over 5% of its genome to code for more than 1500 transcription factors, about 45% of which are from families specific to plants. Arabidopsis transcription factors that belong to families common to all eukaryotes do not share significant similarity with those of the other kingdoms beyond the conserved DNA binding domains, many of which have been arranged in combinations specific to each lineage. The genome-wide comparison reveals the evolutionary generation of diversity in the regulation of transcription.

  16. Full-field Strain Analysis of a Ski Boot

    Science.gov (United States)

    Reiter, M.; Singer, G.; Major, Z.

    2010-06-01

    The quality of ski boots plays an extraordinarily important role in the performance and safety of skiers. In this study, the deformation behavior of a racing-class ski boot was characterized using the digital image correlation technique. The boot was gripped in the ski binding, and three types of skier motion, with the resulting boot deformations, were simulated by a professional skier in the laboratory. First, the buckles were closed in four stages and the resulting strains were measured. Next, the skier continuously shifted his balance forward, resulting in a high overall bending deformation of the boot. The leg of the skier acted as a bending arm and pushed the upper part of the boot forward. This loading situation was assumed to be quasi-static and was repeated several times. Finally, the skier jumped, and this dynamic movement was recorded using two high-speed cameras for 3D analysis. Special attention was given to measuring the deformation of the boot during contact of the ski with the laboratory floor. Both the displacement of the upper part and the local strain in selected areas of the boot were determined under quasi-static and dynamic test conditions and are discussed in the paper.
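    Digital image correlation tracks small subsets of a speckle pattern between a reference and a deformed image; strain is then obtained from the gradient of the measured displacement field. The sketch below shows the core matching step in one dimension with integer-pixel shifts, using zero-normalized cross-correlation (ZNCC) as the matching criterion. Real 2D/3D DIC, such as the stereo high-speed setup described above, also needs 2D subsets, subpixel interpolation and camera calibration, none of which is shown; the function names are hypothetical.

```python
import math


def zncc(a: list[float], b: list[float]) -> float:
    """Zero-normalised cross-correlation of two equal-length intensity subsets."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    da = [x - ma for x in a]
    db = [y - mb for y in b]
    denom = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / denom if denom else 0.0


def track_subset(ref: list[float], deformed: list[float], start: int, size: int,
                 max_shift: int) -> int:
    """Integer displacement of the subset ref[start:start+size] within `deformed`."""
    subset = ref[start:start + size]
    best_shift, best_score = 0, -2.0  # ZNCC lies in [-1, 1]
    for shift in range(-max_shift, max_shift + 1):
        s = start + shift
        if s < 0 or s + size > len(deformed):
            continue  # candidate subset would fall outside the image
        score = zncc(subset, deformed[s:s + size])
        if score > best_score:
            best_shift, best_score = shift, score
    return best_shift
```

    Tracking two neighbouring subsets gives displacements u1 and u2 at positions x1 and x2, and (u2 − u1)/(x2 − x1) then approximates the local normal strain.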

  17. VP1u phospholipase activity is critical for infectivity of full-length parvovirus B19 genomic clones.

    Science.gov (United States)

    Filippone, Claudia; Zhi, Ning; Wong, Susan; Lu, Jun; Kajigaya, Sachiko; Gallinella, Giorgio; Kakkola, Laura; Söderlund-Venermo, Maria; Young, Neal S; Brown, Kevin E

    2008-05-10

    Three full-length genomic clones (pB19-M20, pB19-FL and pB19-HG1) of parvovirus B19 were produced in different laboratories. pB19-M20 was shown to produce infectious virus. To determine the differences in infectivity, all three plasmids were tested by transfection and infection assays. All three clones were similar in viral DNA replication, RNA transcription, and viral capsid protein production. However, only pB19-M20 and pB19-HG1 produced infectious virus. Comparison of viral sequences showed no significant differences in ITR or NS regions. In the capsid region, there was a nucleotide sequence difference conferring an amino acid substitution (E176K) in the phospholipase A2-like motif of the VP1-unique (VP1u) region. The recombinant VP1u with the E176K mutation had no catalytic activity as compared with the wild-type. When this mutation was introduced into pB19-M20, infectivity was significantly attenuated, confirming the critical role of this motif. Investigation of the original serum from which pB19-FL was cloned confirmed that the phospholipase mutation was present in the native B19 virus.

  18. VP1u phospholipase activity is critical for infectivity of full-length parvovirus B19 genomic clones

    Science.gov (United States)

    Filippone, Claudia; Zhi, Ning; Wong, Susan; Lu, Jun; Kajigaya, Sachiko; Gallinella, Giorgio; Kakkola, Laura; Söderlund-Venermo, Maria; Young, Neal S.; Brown, Kevin E.

    2008-01-01

    PMID:18252260

  19. Multivariate analysis of full-term neonatal polysomnographic data.

    Science.gov (United States)

    Gerla, V; Paul, K; Lhotska, L; Krajca, V

    2009-01-01

    Polysomnography (PSG) is one of the most important noninvasive methods for studying the maturation of the child brain. Sleep in infants differs significantly from sleep in adults. This paper addresses the computer analysis of neonatal polygraphic signals. We applied methods designed to differentiate three important neonatal behavioral states: quiet sleep, active sleep, and wakefulness. In clinical practice, the proportion of these states is a significant indicator of newborn brain maturity. In this study, we used data provided by the Institute for Care of Mother and Child, Prague (12 newborn infants of similar postconceptional age). An experienced physician scored the data into four states (wake, quiet sleep, active sleep, movement artifact). Accurate classification required determining the most informative features. We applied a method based on power spectral density (PSD) to each EEG channel and also used features derived from the electrooculogram (EOG), electromyogram (EMG), ECG, and respiration [pneumogram (PNG)] signals. The most informative feature was a measure of respiratory regularity derived from the PNG signal. We designed an algorithm based on Markov models to interpret these characteristics. The automatically detected sleep states were compared with the visually determined "sleep profiles". We evaluated both the success rate and the true positive rate of the classification and found statistically significant agreement between the two scorings. Two learning/testing schemes were applied: learning from the data of all 12 newborns with tenfold cross-validation, and learning from the data of 11 newborns with testing on the data of the 12th. Our final classification used information from several biological signals (EEG, ECG, PNG, EMG, EOG) and reached a success rate of 82.5%. The true positive rate was 81.8% and the false

  20. Diversity of Pseudomonas Genomes, Including Populus-Associated Isolates, as Revealed by Comparative Genome Analysis.

    Science.gov (United States)

    Jun, Se-Ran; Wassenaar, Trudy M; Nookaew, Intawat; Hauser, Loren; Wanchai, Visanu; Land, Miriam; Timm, Collin M; Lu, Tse-Yuan S; Schadt, Christopher W; Doktycz, Mitchel J; Pelletier, Dale A; Ussery, David W

    2016-01-01

    The Pseudomonas genus contains a metabolically versatile group of organisms known to occupy numerous ecological niches, including the rhizosphere and endosphere of many plants. Their diversity influences the phylogenetic diversity and heterogeneity of these communities. On the basis of average amino acid identity, comparative genome analysis of >1,000 Pseudomonas genomes, including 21 Pseudomonas strains isolated from the roots of native Populus deltoides (eastern cottonwood) trees, resulted in consistent and robust genomic clusters with phylogenetic homogeneity. All Pseudomonas aeruginosa genomes clustered together and were clearly distinct from other Pseudomonas species groups in pangenome and core genome analyses. In contrast, the Pseudomonas fluorescens genomes were organized into 20 distinct genomic clusters, reflecting enormous diversity and heterogeneity. Most of our 21 Populus-associated isolates formed three distinct subgroups within the major P. fluorescens group, a grouping supported by pathway profile analysis, while two isolates were more closely related to Pseudomonas chlororaphis and Pseudomonas putida. Genes specific to the Populus-associated subgroups were identified. Genes specific to subgroup 1 include several sensory systems that act in two-component signal transduction, a TonB-dependent receptor, and a phosphorelay sensor. Genes specific to subgroup 2 include hypothetical genes, and genes specific to subgroup 3 were annotated with hydrolase activity. This study underscores the need to sequence multiple isolates, especially of P. fluorescens, which displays the most genetic variation, in order to study functional capabilities from a pangenomic perspective. This information will prove useful when choosing Pseudomonas strains to promote plant growth and increase disease resistance in plants. Copyright © 2015 Jun et al.

  1. Comparative genomic analysis of Vibrio parahaemolyticus: serotype conversion and virulence

    Directory of Open Access Journals (Sweden)

    Gil Ana I

    2011-06-01

    Abstract Background Vibrio parahaemolyticus is a common cause of foodborne disease. Beginning in 1996, a more virulent strain of serotype O3:K6 caused major outbreaks in India and other parts of the world, resulting in the emergence of a pandemic. Other serovariants of this strain emerged during its dissemination and, together with the original O3:K6, were termed strains of the pandemic clone. Two genomes, one from this virulent strain and one from a pre-pandemic strain, have been sequenced. In this study, we sequenced four additional V. parahaemolyticus genomes from isolates collected in different geographical regions and at different time points. Comparative genomic analyses of six V. parahaemolyticus strains isolated in Asia and Peru were performed to advance knowledge of the evolution of V. parahaemolyticus, specifically the genetic changes contributing to serotype conversion and virulence. Two pre-pandemic strains and three pandemic strains, isolated from different geographical regions, were serotype O3:K6 and had either the (tdh+, trh-) or the (tdh-, trh+) toxin profile. The sixth, a pandemic strain sequenced in this study, was serotype O4:K68. Results Genomic analyses revealed that the trh+ and tdh+ strains had different types of pathogenicity islands and mobile elements, as well as major structural differences between the tdh pathogenicity islands of the pre-pandemic and pandemic strains. In addition, single nucleotide polymorphism (SNP) analysis showed that 94% of the SNPs between the O3:K6 and O4:K68 pandemic isolates were within a 141 kb region surrounding the O- and K-antigen-encoding gene clusters. The "core" genes of V. parahaemolyticus were also compared with those of V. cholerae and V. vulnificus to delineate differences among these three pathogenic species. Approximately one-half (49-59%) of each species' core genes were conserved in all three species, and 14-24% of the core genes were species-specific and in different

  2. Full genome sequencing and genetic characterization of Eubenangee viruses identify Pata virus as a distinct species within the genus Orbivirus.

    Directory of Open Access Journals (Sweden)

    Manjunatha N Belaganahalli

    Eubenangee virus has previously been identified as the cause of Tammar sudden death syndrome (TSDS). Eubenangee virus (EUBV), Tilligery virus (TILV), Pata virus (PATAV) and Ngoupe virus (NGOV) are currently all classified within the Eubenangee virus species of the genus Orbivirus, family Reoviridae. Full genome sequencing confirmed that EUBV and TILV (both from Australia) show high levels of amino acid (aa) sequence identity (>92%) in the conserved polymerase VP1(Pol), sub-core VP3(T2) and outer core VP7(T13) proteins, and are therefore appropriately classified within the same virus species. However, they show much lower aa identity (<53%) in their larger outer-capsid protein VP2, consistent with membership of two different serotypes, EUBV-1 and EUBV-2 respectively. In contrast, PATAV showed significantly lower aa sequence identity with either EUBV or TILV (<71% in VP1(Pol) and VP3(T2), and <57% in VP7(T13)), consistent with membership of a distinct virus species. A proposal has therefore been sent to the Reoviridae Study Group of ICTV to recognise 'Pata virus' as a new Orbivirus species, with the PATAV isolate as serotype 1 (PATAV-1). Among the other orbiviruses, PATAV shows the closest relationship to Epizootic Haemorrhagic Disease virus (EHDV), with 80.7%, 72.4% and 66.9% aa identity in VP3(T2), VP1(Pol) and VP7(T13) respectively. Although Ngoupe virus was not available for these studies, like PATAV it was isolated in Central Africa, and it therefore seems likely also to belong to the new species, possibly as a distinct 'type'. The data presented will facilitate diagnostic assay design and the identification of additional isolates of these viruses.

  3. Analysis tools for the interplay between genome layout and regulation.

    Science.gov (United States)

    Bouyioukos, Costas; Elati, Mohamed; Képès, François

    2016-06-06

    Genome layout and gene regulation appear to be interdependent. Understanding this interdependence is key to exploring the dynamic nature of chromosome conformation and to engineering functional genomes. Evidence for non-random genome layout, defined as the relative positioning of either co-functional or co-regulated genes, stems from two main approaches. Firstly, the analysis of contiguous genome segments across species has highlighted the conservation of gene arrangement (synteny) along chromosomal regions. Secondly, the study of long-range interactions along a chromosome has emphasised regularities in the positioning of microbial genes that are co-regulated, co-expressed or evolutionarily correlated. Although one-dimensional pattern analysis is a mature field, it often performs poorly on biological datasets, which tend to be incomplete and partly incorrect. Moreover, comprehensive, user-friendly tools to systematically analyse, visualise, integrate and exploit regularities along genomes are lacking. Here we present the Genome REgulatory and Architecture Tools SCAN (GREAT:SCAN) software for the systematic study of the interplay between genome layout and gene expression regulation. SCAN is a collection of related, interconnected applications that can currently perform systematic analyses of genome regularities and improve transcription factor binding site (TFBS) and gene regulatory network predictions based on gene positional information. We demonstrate the capabilities of these tools by studying, on the one hand, the regular patterns of genome layout in the major regulons of the bacterium Escherichia coli and, on the other hand, the improvement of TFBS prediction in microbes. Finally, using visualisations from multivariate techniques, we highlight the interplay between positional and sequence information for effective transcription regulation.

  4. Effects of a diet high in monounsaturated fat and a full Mediterranean diet on PBMC whole genome gene expression and plasma proteins

    OpenAIRE

    van Dijk, Susan; Feskens, Edith; Bos, M.B.; de Groot, Lisette; de Vries, Jeanne; Muller, Michael; Afman, Lydia

    2012-01-01

    This study aimed to identify the effects of replacing saturated fat (SFA) with monounsaturated fat (MUFA) in a Western-type diet, and the effects of a full Mediterranean (MED) diet, on whole-genome PBMC gene expression and plasma protein profiles. Abdominally overweight subjects were randomized to an 8-wk completely controlled SFA-rich diet, a diet in which SFA was replaced by MUFA (MUFA diet), or a MED diet. Concentrations of 124 plasma proteins and whole-genome PBMC transcriptional profiles were assessed...

  5. Gene organization in rice revealed by full-length cDNA mapping and gene expression analysis through microarray.

    Directory of Open Access Journals (Sweden)

    Kouji Satoh

    Rice (Oryza sativa L.) is a model organism for the functional genomics of monocotyledonous plants because its genome is considerably smaller than those of other monocotyledonous plants. Although highly accurate genome sequences of indica and japonica rice are available, additional resources such as full-length complementary DNA (FL-cDNA) sequences are also indispensable for comprehensive analyses of gene structure and function. We cross-referenced 28.5K individual loci in the rice genome, defined by mapping 578K FL-cDNA clones, with the 56K loci predicted in the TIGR genome assembly. Based on annotation status and the presence of corresponding cDNA clones, genes were classified into 23K annotated expressed (AE) genes, 33K annotated non-expressed (ANE) genes, and 5.5K non-annotated expressed (NAE) genes. We developed a 60-mer oligo array to analyze gene expression from each locus. Analysis of gene structures and expression levels revealed that the general features of gene structure and expression of NAE and ANE genes differed considerably from those of AE genes. The results also suggested that the cloning efficiency of rice FL-cDNA is associated with the transcriptional activity of the corresponding genetic locus, although other factors may also contribute. Comparison of FL-cDNA coverage among gene families suggested that FL-cDNAs from genes encoding rice- or eukaryote-specific domains, and from genes involved in regulatory functions, were difficult to produce in bacterial cells. Collectively, these results indicate that rice genes can be divided into distinct groups based on transcriptional activity and gene structure, and that FL-cDNA clone coverage is biased owing to the incompatibility of certain eukaryotic genes with bacteria.

  6. Comparative analysis of prophages in Streptococcus mutans genomes

    Science.gov (United States)

    Fu, Tiwei; Fan, Xiangyu; Long, Quanxin; Deng, Wanyan; Song, Jinlin

    2017-01-01

    Prophages are considered genetic units intimately associated with novel phenotypic properties of their bacterial hosts, such as pathogenicity and genomic variation. Little is known about the genetic content of prophages in the genome of Streptococcus mutans, a major pathogen in human dental caries. In this study, we identified 35 prophage-like elements in S. mutans genomes and performed a comparative genomic analysis. Comparative genomic and phylogenetic analyses of the prophage sequences revealed that the prophages could be classified into three main clusters: Cluster A, Cluster B, and Cluster C. The S. mutans prophages within each cluster were compared. The genomic sequences of phismuN66-1, phismuNLML9-1, and phismu24-1 all shared similarities with the previously reported S. mutans phages M102, M102AD, and ϕAPCM01. The genomes were organized into seven major gene clusters according to the putative functions of the predicted open reading frames: packaging and structural modules, integrase, host lysis modules, DNA replication/recombination modules, transcriptional regulatory modules, other protein modules, and hypothetical protein modules. Moreover, an integrase gene was identified only in the phismuNLML9-1 prophages. PMID:29158986

  7. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    Directory of Open Access Journals (Sweden)

    Jianmin Fu

    Diospyros is the largest genus in Ebenaceae, comprising more than 500 species of remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes of D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. These are the first cp genomes reported in Ebenaceae. The Diospyros cp genome sequences ranged from 157,300 to 157,784 bp in length and showed a typical quadripartite structure, with two inverted repeats separated by one large and one small single-copy region. In each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 simple sequence repeats were identified. Four hypervariable regions, namely the intergenic regions trnQ_rps16, trnV_ndhC, and psbD_trnT and the intron of ndhA, were identified in the Diospyros genomes. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this is the first time this has been proposed. Further, these analyses, together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and maximum likelihood analyses of 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  8. Genome sequencing and comparative genomics analysis revealed pathogenic potential in Penicillium capsulatum as a novel fungal pathogen belonging to Eurotiales

    Directory of Open Access Journals (Sweden)

    Ying Yang

    2016-10-01

    Penicillium capsulatum is a rare Penicillium species used in paper manufacturing, but it has recently been reported to cause invasive infection. To investigate the pathogenicity of the clinical Penicillium strain, we sequenced the genomes and transcriptomes of clinical and environmental strains of P. capsulatum. Comparative analyses of these two P. capsulatum strains and closely related strains belonging to Eurotiales were performed. The assembled P. capsulatum genomes are approximately 34.4 Mbp in length and encode 11,080 predicted genes. The different isolates of P. capsulatum are highly similar, except for several unique genes, indels, or SNPs in the genes coding for glycosyl hydrolases, amino acid transporters, and circumsporozoite protein. A phylogenomic analysis was performed based on whole-genome data from 38 strains belonging to Eurotiales. By comparing the whole-genome sequences and virulence-related genes of 20 important related species, including fungal pathogens and non-human pathogens belonging to Eurotiales, we found meaningful pathogenicity-related characteristics between P. capsulatum and its closely related species. Our research indicates that P. capsulatum may be a neglected opportunistic pathogen. This study should help mycologists, geneticists, and epidemiologists gain a deeper understanding of the genetic basis of the role of P. capsulatum as a newly reported fungal pathogen.

  9. SOLiD sequencing of four Vibrio vulnificus genomes enables comparative genomic analysis and identification of candidate clade-specific virulence genes

    Directory of Open Access Journals (Sweden)

    Telonis-Scott Marina

    2010-09-01

    Abstract Background Vibrio vulnificus is the leading cause of reported deaths from seafood consumption in the United States. Despite several decades of research on molecular pathogenesis, much remains to be learned about the virulence mechanisms of this opportunistic bacterial pathogen. The two complete, annotated genomic DNA sequences of V. vulnificus belong to clade 2 strains, the predominant clade among clinical strains. Clade 2 strains generally have higher virulence potential in animal models of disease than clade 1 strains, which predominate among environmental isolates. SOLiD sequencing of four V. vulnificus strains representing different clades (1 and 2) and biotypes (1 and 2) was used for comparative genomic analysis. Results More than 4,100,000 bases were sequenced for each strain, yielding approximately 100-fold coverage for each of the four genomes. Although SOLiD read lengths were only 35 nt, we were able to draw significant conclusions about the unique and shared sequences among the genomes, including the identification of single nucleotide polymorphisms. Comparing the newly sequenced genomes with the existing reference genomes identified 3,459 core V. vulnificus genes shared among all six strains and 80 clade 2-specific genes. We identified 523,161 SNPs among the six genomes. Conclusions We were able to glean much information about the genomic content of each strain using next-generation sequencing. Flp pili, GGDEF proteins, and genomic island XII were identified as possible virulence factors because of their presence in the virulent sequenced strains. Genomic comparisons also point toward the involvement of sialic acid catabolism in pathogenesis.

  10. Proteomic and genomic analysis of cardiovascular disease

    National Research Council Canada - National Science Library

    Van Eyk, Jennifer; Dunn, M. J

    2003-01-01

    ... to cardiovascular disease. By exploring the various strategies and technical aspects of both, using examples from cardiac or vascular biology, the limitations and the potential of these methods can be clearly seen. The book is divided into three sections: the first focuses on genomics, the second on proteomics, and the third provides an overview of the importance of these two scientific disciplines in drug and diagnostic discovery. The goal of this book is the transfer of their hard-earned lessons to the growing num...

  11. MBGD update 2015: microbial genome database for flexible ortholog analysis utilizing a diverse set of genomic data.

    Science.gov (United States)

    Uchiyama, Ikuo; Mihara, Motohiro; Nishide, Hiroyo; Chiba, Hirokazu

    2015-01-01

    The microbial genome database for comparative analysis (MBGD) (available at http://mbgd.genome.ad.jp/) is a comprehensive ortholog database for flexible comparative analysis of microbial genomes, in which users can create an ortholog table for any specified set of organisms. Because microbial genome data are increasing rapidly owing to next-generation sequencing technology, it has become increasingly challenging to maintain high-quality orthology relationships while allowing users to incorporate the latest available genomic data into an analysis. Because much of the recently accumulating genomic data consists of draft genome sequences for which some complete genome sequences of the same or closely related species are available, MBGD now stores draft genome data and allows users to incorporate them into a user-specific ortholog database using the MyMBGD functionality. In this function, draft genome data are incorporated incrementally into an existing ortholog table created only from complete genome data, to prevent low-quality draft data from affecting the clustering results. In addition, to provide high-quality orthology relationships, the standard ortholog table containing all the representative genomes, which is first created by the rapid classification program DomClust, is now refined using DomRefine, a recently developed program that improves domain-level clustering using multiple sequence alignment information. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.

  12. Identification of conserved regulatory elements by comparative genome analysis

    Directory of Open Access Journals (Sweden)

    Jareborg Niclas

    2003-05-01

    Abstract Background For genes that have been successfully delineated within the human genome sequence, most regulatory sequences remain to be elucidated. The annotation and interpretation process requires additional data resources and significant improvements in computational methods for detecting regulatory regions. One increasingly popular approach, termed 'phylogenetic footprinting', is based on the preferential conservation of functional sequences by selective pressure over the course of evolution. Mutations are more likely to be disruptive if they occur in functional sites, resulting in a measurable difference in evolutionary rates between functional and non-functional genomic segments. Results We have devised a flexible suite of methods for identifying and visualizing conserved transcription-factor-binding sites. The system reports those putative transcription-factor-binding sites that are both situated in conserved regions and located as pairs of sites in equivalent positions in alignments between two orthologous sequences. An underlying collection of metazoan transcription-factor-binding profiles was assembled to facilitate the study. This approach significantly improves the detection of transcription-factor-binding sites by increasing the signal-to-noise ratio, as demonstrated with two sets of promoter sequences. The method is implemented as a graphical web application, ConSite, which is available to the scientific community at http://www.phylofoot.org/. Conclusions Phylogenetic footprinting dramatically improves the predictive selectivity of bioinformatic approaches to the analysis of promoter sequences. ConSite delivers unparalleled performance using a novel database of high-quality binding models for metazoan transcription factors. With a dynamic interface, this bioinformatics tool provides broad access to promoter analysis with phylogenetic footprinting.

  13. Recurrence time statistics: versatile tools for genomic DNA sequence analysis.

    Science.gov (United States)

    Cao, Yinhe; Tung, Wen-Wen; Gao, J B

    2004-01-01

    With the genomes of humans and a few model organisms completed, and the genomes of many other organisms waiting to be sequenced, it has become increasingly important to develop faster computational tools that can easily identify structures and extract features from DNA sequences. Among the more important structures in a DNA sequence are repeat-related ones, which often have to be masked before protein-coding regions along a DNA sequence can be identified or before redundant expressed sequence tags (ESTs) are sequenced. Here we report a novel recurrence-time-based method for sequence analysis. The method can conveniently study all kinds of periodicity and exhaustively find all repeat-related features in a genomic DNA sequence. An efficient codon index is also derived from the recurrence time statistics; it is largely species-independent and works well on very short sequences. Efficient codon indices are key elements of successful gene-finding algorithms and are particularly useful for determining whether a suspected EST belongs to a coding or non-coding region. We illustrate the power of the method by studying the genomes of E. coli, the yeast S. cerevisiae, the nematode worm C. elegans, and the human, Homo sapiens. Computationally, the method is very efficient, allowing analysis at the whole-genome scale to be carried out on a PC.

  14. Comparative genomic analysis by microbial COGs self-attraction rate.

    Science.gov (United States)

    Santoni, Daniele; Romano-Spica, Vincenzo

    2009-06-21

    Whole genome analysis provides new perspectives for determining phylogenetic relationships among microorganisms. The availability of whole nucleotide sequences allows genomes to be compared at different levels by several approaches. In this work, self-attraction rates were calculated for each cluster of orthologous groups of proteins (COGs) class in order to analyse gene aggregation levels in physical maps. Phylogenetic relationships among microorganisms were obtained by comparing self-attraction coefficients. Eighteen-dimensional vectors were computed for a set of 168 completely sequenced microbial genomes (19 archaea, 149 bacteria); each component of a vector represents the aggregation rate of the genes belonging to one of the 18 COG classes. Genes involved in nonessential functions or related to environmental conditions showed the highest aggregation rates. By contrast, genes involved in basic cellular tasks, except for translation genes, showed a more uniform distribution along the genome. The self-attraction clustering approach allowed classification of Proteobacteria, Bacilli and other species belonging to Firmicutes. Rearrangement and lateral gene transfer events may contribute to divergences from classical taxonomy. Each set of aggregation values across COG classes represents an intrinsic property of a microbial genome. This novel approach provides a new point of view for whole genome analysis and bacterial characterization.

  15. Mycobacterial species as case-study of comparative genome analysis

    DEFF Research Database (Denmark)

    Zakham, F.; Belayachi, L.; Ussery, David

    2011-01-01

    … the evolutionary events of these species and improving drugs, vaccines, and diagnostic tools for controlling Mycobacterial diseases. In the present study we aim to outline a comparative genome analysis of fourteen Mycobacterial genomes: M. avium subsp. paratuberculosis K-10, M. bovis AF2122/97, M. bovis BCG str. Pasteur 1173P2, M. leprae Br4923, M. marinum M, M. sp. KMS, M. sp. MCS, M. tuberculosis CDC1551, M. tuberculosis F11, M. tuberculosis H37Ra, M. tuberculosis H37Rv, M. tuberculosis KZN 1435, M. ulcerans Agy99, and M. vanbaalenii PYR-1. For this purpose a comparison was made based on their length … defined for twelve Mycobacterial species. We have also introduced the genome atlas of the reference strain M. tuberculosis H37Rv, which gives a good overview of this genome. To examine the phylogenetic relationships among these bacteria, a phylogenetic tree has been constructed from the 16S rRNA gene …

  16. Particle infectivity of HIV-1 full-length genome infectious molecular clones in a subtype C heterosexual transmission pair following high fidelity amplification and unbiased cloning

    Energy Technology Data Exchange (ETDEWEB)

    Deymier, Martin J., E-mail: mdeymie@emory.edu [Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road NE, Atlanta, GA 30329 (United States); Claiborne, Daniel T., E-mail: dclaibo@emory.edu [Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road NE, Atlanta, GA 30329 (United States); Ende, Zachary, E-mail: zende@emory.edu [Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road NE, Atlanta, GA 30329 (United States); Ratner, Hannah K., E-mail: hannah.ratner@emory.edu [Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road NE, Atlanta, GA 30329 (United States); Kilembe, William, E-mail: wkilembe@rzhrg-mail.org [Zambia-Emory HIV Research Project (ZEHRP), B22/737 Mwembelelo, Emmasdale Post Net 412, P/BagE891, Lusaka (Zambia); Allen, Susan, E-mail: sallen5@emory.edu [Zambia-Emory HIV Research Project (ZEHRP), B22/737 Mwembelelo, Emmasdale Post Net 412, P/BagE891, Lusaka (Zambia); Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA (United States); Hunter, Eric, E-mail: eric.hunter2@emory.edu [Emory Vaccine Center, Yerkes National Primate Research Center, 954 Gatewood Road NE, Atlanta, GA 30329 (United States); Department of Pathology and Laboratory Medicine, Emory University, Atlanta, GA (United States)

    2014-11-15

    The high genetic diversity of HIV-1 impedes high-throughput, large-scale sequencing and full-length genome cloning by common restriction-enzyme-based methods. Applying novel methods that employ a high-fidelity polymerase for amplification and an unbiased fusion-based cloning strategy, we have generated several HIV-1 full-length genome infectious molecular clones from an epidemiologically linked transmission pair. These clones represent the transmitted/founder virus and phylogenetically diverse non-transmitted variants from the chronically infected individual's diverse quasispecies near the time of transmission. We demonstrate that, using this approach, PCR-induced mutations in full-length clones derived from their cognate single-genome amplicons are rare. Furthermore, all eight non-transmitted genomes tested produced functional virus with a range of infectivities, belying the previous assumption that a majority of circulating viruses in chronic HIV-1 infection are defective. Thus, these methods provide important tools to update protocols in molecular biology that can be universally applied to the study of human viral pathogens. - Highlights: • Our novel methodology demonstrates accurate amplification and cloning of full-length HIV-1 genomes. • A majority of plasma-derived HIV variants from a chronically infected individual are infectious. • The transmitted/founder virus was more infectious than the majority of the variants from the chronically infected donor.

  17. Particle infectivity of HIV-1 full-length genome infectious molecular clones in a subtype C heterosexual transmission pair following high fidelity amplification and unbiased cloning

    International Nuclear Information System (INIS)

    Deymier, Martin J.; Claiborne, Daniel T.; Ende, Zachary; Ratner, Hannah K.; Kilembe, William; Allen, Susan; Hunter, Eric

    2014-01-01

    The high genetic diversity of HIV-1 impedes high-throughput, large-scale sequencing and full-length genome cloning by common restriction-enzyme-based methods. Applying novel methods that employ a high-fidelity polymerase for amplification and an unbiased fusion-based cloning strategy, we have generated several HIV-1 full-length genome infectious molecular clones from an epidemiologically linked transmission pair. These clones represent the transmitted/founder virus and phylogenetically diverse non-transmitted variants from the chronically infected individual's diverse quasispecies near the time of transmission. We demonstrate that, using this approach, PCR-induced mutations in full-length clones derived from their cognate single-genome amplicons are rare. Furthermore, all eight non-transmitted genomes tested produced functional virus with a range of infectivities, belying the previous assumption that a majority of circulating viruses in chronic HIV-1 infection are defective. Thus, these methods provide important tools to update protocols in molecular biology that can be universally applied to the study of human viral pathogens. - Highlights: • Our novel methodology demonstrates accurate amplification and cloning of full-length HIV-1 genomes. • A majority of plasma-derived HIV variants from a chronically infected individual are infectious. • The transmitted/founder virus was more infectious than the majority of the variants from the chronically infected donor.

  18. Savant Genome Browser 2: visualization and analysis for population-scale genomics.

    Science.gov (United States)

    Fiume, Marc; Smith, Eric J M; Brook, Andrew; Strbenac, Dario; Turner, Brian; Mezlini, Aziz M; Robinson, Mark D; Wodak, Shoshana J; Brudno, Michael

    2012-07-01

    High-throughput sequencing (HTS) technologies are providing an unprecedented capacity for data generation, and there is a corresponding need for efficient data exploration and analysis capabilities. Although most existing tools for HTS data analysis are developed for either automated (e.g. genotyping) or visualization (e.g. genome browsing) purposes, such tools are most powerful when combined. For example, integration of visualization and computation allows users to iteratively refine their analyses by updating computational parameters within the visual framework in real-time. Here we introduce the second version of the Savant Genome Browser, a standalone program for visual and computational analysis of HTS data. Savant substantially improves upon its predecessor and existing tools by introducing innovative visualization modes and navigation interfaces for several genomic datatypes, and synergizing visual and automated analyses in a way that is powerful yet easy even for non-expert users. We also present a number of plugins that were developed by the Savant Community, which demonstrate the power of integrating visual and automated analyses using Savant. The Savant Genome Browser is freely available (open source) at www.savantbrowser.com.

  19. Genomic analysis of mouse retinal development.

    Directory of Open Access Journals (Sweden)

    Seth Blackshaw

    2004-09-01

    The vertebrate retina is composed of seven major cell types that are generated in overlapping but well-defined intervals. To identify genes that might regulate retinal development, gene expression in the developing retina was profiled at multiple time points using serial analysis of gene expression (SAGE). The expression patterns of 1,051 genes that showed developmentally dynamic expression by SAGE were investigated using in situ hybridization. A molecular atlas of gene expression in the developing and mature retina was thereby constructed, along with a taxonomic classification of developmental gene expression patterns. Genes were identified that label both temporal and spatial subsets of mitotic progenitor cells. For each major retinal cell type, both developing and mature, genes selectively expressed in that cell type were identified. The gene expression profiles of retinal Müller glia and mitotic progenitor cells were found to be highly similar, suggesting that Müller glia might be able to produce multiple retinal cell types under the right conditions. In addition, multiple evolutionarily conserved transcripts that did not appear to encode open reading frames of more than 100 amino acids ("noncoding RNAs") were found to be dynamically and specifically expressed in developing and mature retinal cell types. Finally, many photoreceptor-enriched genes that mapped to chromosomal intervals containing retinal disease genes were identified. These data serve as a starting point for functional investigations of the roles of these genes in retinal development and physiology.
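
    As a rough illustration of how SAGE turns transcripts into countable tags, the sketch below assumes NlaIII (site CATG) as the anchoring enzyme and classic 10-bp tags read from the 3'-most anchoring site. It works on plain transcript strings rather than sequenced ditag concatemers, and the fold-change filter (the hypothetical `dynamic_tags` function and its `min_fold` parameter) is only a stand-in for "developmentally dynamic" expression; the abstract does not state the criteria actually used.

```python
from collections import Counter
from typing import Iterable, List, Optional

ANCHOR = "CATG"  # NlaIII recognition site, assumed here as the anchoring enzyme
TAG_LEN = 10     # classic SAGE tag length (assumption)


def extract_tag(transcript: str) -> Optional[str]:
    """Return the tag just 3' of the last anchoring site, or None if absent or truncated."""
    seq = transcript.upper()
    pos = seq.rfind(ANCHOR)
    if pos == -1:
        return None
    start = pos + len(ANCHOR)
    tag = seq[start:start + TAG_LEN]
    return tag if len(tag) == TAG_LEN else None


def tag_library(transcripts: Iterable[str]) -> Counter:
    """Count tags in one library (one developmental time point)."""
    return Counter(t for t in map(extract_tag, transcripts) if t is not None)


def dynamic_tags(libraries: List[Counter], min_fold: float = 2.0) -> dict:
    """Tags whose relative frequency varies at least min_fold across libraries (hypothetical filter)."""
    totals = [sum(lib.values()) or 1 for lib in libraries]
    result = {}
    for tag in set().union(*libraries):
        freqs = [lib[tag] / total for lib, total in zip(libraries, totals)]
        low, high = min(freqs), max(freqs)
        if high > 0 and (low == 0 or high / low >= min_fold):
            result[tag] = freqs
    return result
```

    Normalizing by library size before comparing time points matters because SAGE libraries are rarely sequenced to the same depth.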

  20. DivStat: a user-friendly tool for single nucleotide polymorphism analysis of genomic diversity.

    Directory of Open Access Journals (Sweden)

    Inês Soares

    Recent developments have led to an enormous increase in publicly available large genomic data sets, including complete genomes. The 1000 Genomes Project was a major contributor, releasing the results of sequencing a large number of individual genomes and enabling a myriad of large-scale studies on human genetic variation. However, the tools currently available are insufficient for some analyses of data sets spanning more than several hundred base pairs, and for analyses of haplotype sequences of single nucleotide polymorphisms (SNPs). Here, we present a new and powerful tool for large data sets that computes a variety of summary statistics of population genetic data and speeds up data analysis.
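
    To make "summary statistics of population genetic data" concrete, here is a minimal sketch of two standard ones for aligned SNP haplotypes: nucleotide diversity (mean pairwise differences per site) and Watterson's estimator (number of segregating sites S divided by a_n, the sum of 1/i for i = 1 to n-1). The abstract does not list which statistics DivStat computes, so these functions are illustrative only and are not DivStat's code.

```python
from itertools import combinations
from typing import Sequence


def _check(haplotypes: Sequence[str]) -> None:
    if len(haplotypes) < 2:
        raise ValueError("need at least two haplotypes")
    if len({len(h) for h in haplotypes}) != 1:
        raise ValueError("haplotypes must be aligned (equal length)")


def segregating_sites(haplotypes: Sequence[str]) -> int:
    """Number of positions carrying more than one allele."""
    _check(haplotypes)
    return sum(1 for column in zip(*haplotypes) if len(set(column)) > 1)


def nucleotide_diversity(haplotypes: Sequence[str]) -> float:
    """Mean number of pairwise differences per site (pi)."""
    _check(haplotypes)
    pairs = list(combinations(haplotypes, 2))
    diffs = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return diffs / len(pairs) / len(haplotypes[0])


def watterson_theta(haplotypes: Sequence[str]) -> float:
    """Watterson's estimator for the whole region: S / a_n."""
    n = len(haplotypes)
    a_n = sum(1.0 / i for i in range(1, n))
    return segregating_sites(haplotypes) / a_n
```

    Note that pi is reported per site here while Watterson's theta is per region; divide the latter by the sequence length to compare them directly.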

  1. Comparative Genomic Analysis of Clinical and Environmental Vibrio vulnificus Isolates Revealed Biotype 3 Evolutionary Relationships

    Directory of Open Access Journals (Sweden)

    Yael Kotton

    2015-01-01

    In 1996 a common-source outbreak of severe soft tissue and bloodstream infections erupted among Israeli fish farmers and fish consumers due to changes in fish marketing policies. The causative pathogen was a new strain of Vibrio vulnificus, named biotype 3, which displayed a unique biochemical and genotypic profile. Initial observations suggested that the pathogen emerged as a result of genetic recombination between two distinct populations. We applied a whole-genome shotgun sequencing approach to several V. vulnificus strains from Israel in order to study the pan-genome of V. vulnificus and determine the phylogenetic relationship of biotype 3 with existing populations. The core genome of V. vulnificus, based on 16 draft and complete genomes, consisted of 3068 genes, representing between 59% and 78% of the whole genome of the 16 strains. The accessory genome varied in size from 781 kbp to 2044 kbp. Phylogenetic analysis based on whole, core, and accessory genomes displayed similar clustering patterns, with two main clusters, clinical (C) and environmental (E); all biotype 3 strains formed a distinct group within the E cluster. Annotation of accessory genomic regions found in biotype 3 strains and absent from the core genome yielded 1732 genes, the vast majority of which encoded hypothetical proteins, phage-related proteins, and mobile element proteins. A total of 1916 proteins (including 713 hypothetical proteins) were present in all human pathogenic strains (both biotype 3 and non-biotype 3) and absent from the environmental strains. Clustering analysis of the non-hypothetical proteins revealed 148 protein clusters shared by all human pathogenic strains; these included transcriptional regulators, arylsulfatases, methyl-accepting chemotaxis proteins, acetyltransferases, GGDEF family proteins, transposases, type IV secretory system (T4SS) proteins, and integrases. Our study showed that V. vulnificus biotype 3 evolved from environmental populations and
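
    The core/accessory split and the "present in all human pathogenic strains and absent from the environmental strains" comparison are set operations once genes have been clustered into families. The sketch below assumes that clustering has already been done (the abstract does not say with which tool) and uses hypothetical strain names and family IDs; it is not the authors' pipeline.

```python
from typing import Dict, Iterable, Set, Tuple

Genomes = Dict[str, Set[str]]  # strain name -> set of gene-family IDs (assumed precomputed)


def pan_genome_partition(genomes: Genomes) -> Tuple[Set[str], Set[str], Dict[str, Set[str]]]:
    """Split families into core (in every strain), pan (in any strain) and per-strain accessory sets."""
    families = list(genomes.values())
    core = set.intersection(*families)
    pan = set.union(*families)
    accessory = {strain: genes - core for strain, genes in genomes.items()}
    return core, pan, accessory


def group_specific(genomes: Genomes, group: Iterable[str]) -> Set[str]:
    """Families present in every strain of `group` and absent from all other strains."""
    group = set(group)
    shared = set.intersection(*(genomes[s] for s in group))
    others = set().union(*(g for s, g in genomes.items() if s not in group))
    return shared - others
```

    For example, passing the clinical isolates as `group` returns candidate pathogen-associated families, analogous to the 1916-protein set described above.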

  2. Genome-wide Analysis of Gene Regulation

    DEFF Research Database (Denmark)

    Chen, Yun

    to protein: through epigenetic modifications, transcription regulators or post-transcriptional controls. The following papers concern several layers of gene regulation, with questions answered by different HTS approaches. Genome-wide screening of epigenetic changes by ChIP-seq allowed us to study both spatial...... and temporal alterations of histone modifications (Papers I and II). Coupling the data with machine learning approaches, we established a prediction framework to assess the most informative histone marks, as well as their most influential nucleosome positions, in predicting promoter usage. (Papers I...... they regulated or if the sites had globally elevated usage rates by multiple TFs. Using RNA-seq and 5’ end-seq in combination with depletion of 5’ exonuclease as well as nonsense-mediated decay (NMD) factors, we systematically analyzed NMD substrates as well as their degradation intermediates in human cells (Paper V...

  3. arrayCGHbase: an analysis platform for comparative genomic hybridization microarrays

    Directory of Open Access Journals (Sweden)

    Moreau Yves

    2005-05-01

    Background: The availability of the human genome sequence, as well as the large number of physically accessible oligonucleotides, cDNA, and BAC clones across the entire genome, has triggered and accelerated the use of several platforms for analysis of DNA copy number changes, among them microarray comparative genomic hybridization (arrayCGH). One of the challenges inherent to this new technology is the management and analysis of the large numbers of data points generated in each individual experiment. Results: We have developed arrayCGHbase, a comprehensive analysis platform for arrayCGH experiments consisting of a MIAME (Minimal Information About a Microarray Experiment) supportive database using MySQL underlying a data mining web tool, to store, analyze, interpret, compare, and visualize arrayCGH results in a uniform and user-friendly format. Owing to its flexible design, arrayCGHbase is compatible with all existing and forthcoming arrayCGH platforms. Data can be exported in a multitude of formats, including BED files to map copy number information on the genome using the Ensembl or UCSC genome browser. Conclusion: ArrayCGHbase is a web-based, platform-independent arrayCGH data analysis tool that allows users to access the analysis suite through the internet or a local intranet after installation on a private server. ArrayCGHbase is available at http://medgen.ugent.be/arrayCGHbase/.
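
    To show what a BED export of copy number calls involves, here is a minimal sketch, not arrayCGHbase's exporter. It assumes probe coordinates arrive 1-based and inclusive, uses hypothetical gain/loss log2-ratio thresholds of ±0.3, and writes four-column BED lines; the key detail is that BED intervals are 0-based and half-open, so only the start coordinate shifts.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Probe:
    chrom: str
    start: int          # 1-based, inclusive (assumed input convention)
    end: int            # 1-based, inclusive
    log2_ratio: float   # test/reference intensity ratio


def call_state(log2_ratio: float, gain: float = 0.3, loss: float = -0.3) -> Optional[str]:
    """Classify a probe as gain, loss or neutral (None) using hypothetical thresholds."""
    if log2_ratio >= gain:
        return "gain"
    if log2_ratio <= loss:
        return "loss"
    return None


def to_bed(probes: Iterable[Probe], gain: float = 0.3, loss: float = -0.3) -> str:
    """Emit BED4 lines (chrom, start, end, name) for probes called as gain or loss."""
    lines = []
    for p in probes:
        state = call_state(p.log2_ratio, gain, loss)
        if state is None:
            continue
        # BED intervals are 0-based and half-open, so only the start shifts.
        lines.append(f"{p.chrom}\t{p.start - 1}\t{p.end}\t{state}_{p.log2_ratio:.2f}")
    return "\n".join(lines)
```

    The resulting text can be uploaded as a custom track; a real export would usually merge adjacent probes into segments first.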

  4. First fungal genome sequence from Africa: A preliminary analysis

    Directory of Open Access Journals (Sweden)

    Rene Sutherland

    2012-01-01

    Some of the most significant breakthroughs in the biological sciences this century will emerge from the development of next-generation sequencing technologies. The ease of availability of DNA sequence made possible through these new technologies has given researchers opportunities to study organisms in a manner that was not possible with Sanger sequencing. Scientists will, therefore, need to embrace genomics, as well as develop and nurture the human capacity to sequence genomes and utilise the ‘tsunami’ of data that emerges from genome sequencing. In response to these challenges, we sequenced the genome of Fusarium circinatum, a fungal pathogen of pine that causes pitch canker, a disease of great concern to the South African forestry industry. The sequencing work was conducted in South Africa, making F. circinatum the first eukaryotic organism for which the complete genome has been sequenced locally. Here we report on the process that was followed to sequence, assemble and perform a preliminary characterisation of the genome. Furthermore, details of the computer annotation and manual curation of this genome are presented. The F. circinatum genome was found to be nearly 44 million bases in size, which is similar to that of four other Fusarium genomes that have been sequenced elsewhere. The genome contains just over 15 000 open reading frames, fewer than in the related species Fusarium oxysporum but more than in Fusarium verticillioides. Amongst the various putative gene clusters identified in F. circinatum, those encoding the secondary metabolites fumonisin and fusarin appeared to harbour evidence of gene translocation. It is anticipated that similar comparisons of other loci will provide insights into the genetic basis for pathogenicity of the pitch canker pathogen. Perhaps more importantly, this project has engaged a relatively large group of scientists

  5. Calculation of evolutionary correlation between individual genes and full-length genome: a method useful for choosing phylogenetic markers for molecular epidemiology.

    Directory of Open Access Journals (Sweden)

    Shuai Wang

    Individual genes or regions are still commonly used to estimate the phylogenetic relationships among viral isolates. Genomic regions whose assessments are consistent with those predicted from full-length genome sequences would be preferable as phylogenetic markers for molecular epidemiological studies of many viruses. Here we employed a statistical method to evaluate the evolutionary relationships between individual viral genes and full-length genomes without tree construction, as a way to determine which gene matches the genome well in phylogenetic analyses. The method calculates linear correlations between the genetic distance matrices of aligned individual gene sequences and of aligned genome sequences. We applied this method to the phylogenetic analyses of porcine circovirus 2 (PCV2), measles virus (MV), hepatitis E virus (HEV) and Japanese encephalitis virus (JEV). Phylogenetic trees were constructed for comparison, and possible factors affecting the accuracy of the method were also discussed. The results revealed that this method could produce results consistent with those of previous studies about the proper consensus sequences that could be successfully used as phylogenetic markers. Our results also suggested that these evolutionary correlations could provide useful information for identifying genes that can be used effectively to infer genetic relationships.
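
    The core of the method, correlating a gene's distance matrix with the genome's without building trees, can be sketched in a few lines. This sketch assumes uncorrected p-distances (gap columns skipped) and a Pearson correlation over the upper-triangle entries; the abstract does not specify the distance model, and significance testing (for example a Mantel permutation test) is not shown.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not compared:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in compared) / len(compared)


def pairwise_distances(alignment: Dict[str, str]) -> List[float]:
    """Upper triangle of the distance matrix, in a fixed (sorted-name) pair order."""
    names = sorted(alignment)
    return [p_distance(alignment[i], alignment[j]) for i, j in combinations(names, 2)]


def pearson(xs: List[float], ys: List[float]) -> float:
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def gene_genome_correlation(gene_aln: Dict[str, str], genome_aln: Dict[str, str]) -> float:
    """Correlation between gene-based and genome-based pairwise distances for the same isolates."""
    if set(gene_aln) != set(genome_aln):
        raise ValueError("alignments must cover the same isolates")
    return pearson(pairwise_distances(gene_aln), pairwise_distances(genome_aln))
```

    A gene whose correlation with the genome is close to 1 is, under this criterion, a good stand-in marker for full-length genome phylogeny.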

  6. Detection and analysis of ancient segmental duplications in mammalian genomes.

    Science.gov (United States)

    Pu, Lianrong; Lin, Yu; Pevzner, Pavel A

    2018-05-07

    Although segmental duplications (SDs) represent hotbeds for genomic rearrangements and the emergence of new genes, there are still no easy-to-use tools for identifying SDs. Moreover, while most previous studies focused on recently emerged SDs, detection of ancient SDs remains an open problem. We developed the SDquest algorithm for finding SDs and applied it to analyzing SDs in the human, gorilla, and mouse genomes. Our results demonstrate that previous studies missed many SDs in these genomes and show that SDs account for at least 6.05% of the human genome (version hg19), a 17% increase compared with the previous estimate. Moreover, SDquest classified 6.42% of the latest GRCh38 version of the human genome as SDs, a large increase compared with previous studies. We thus propose to re-evaluate the evolution of SDs based on their accurate representation across multiple genomes. Toward this goal, we analyzed the complex mosaic structure of SDs and decomposed mosaic SDs into elementary SDs, a prerequisite for follow-up evolutionary analysis. We also introduced the concept of the breakpoint graph of mosaic SDs, which revealed SD hotspots and suggested that some SDs may have originated from circular extrachromosomal DNA (ecDNA), not unlike the ecDNA that contributes to accelerated evolution in cancer. © 2018 Pu et al.; Published by Cold Spring Harbor Laboratory Press.

  7. Quantitative high-resolution genomic analysis of single cancer cells.

    Science.gov (United States)

    Hannemann, Juliane; Meyer-Staeckling, Sönke; Kemming, Dirk; Alpers, Iris; Joosse, Simon A; Pospisil, Heike; Kurtz, Stefan; Görndt, Jennifer; Püschel, Klaus; Riethdorf, Sabine; Pantel, Klaus; Brandt, Burkhard

    2011-01-01

    During cancer progression, specific genomic aberrations arise that can determine the scope of the disease and can be used as predictive or prognostic markers. The detection of specific gene amplifications or deletions in single blood-borne or disseminated tumour cells that may give rise to metastases is of great clinical interest but technically challenging. In this study, we present a method for quantitative high-resolution genomic analysis of single cells. Cells were isolated under permanent microscopic control, followed by high-fidelity whole genome amplification and subsequent analyses by fine tiling array-CGH and qPCR. The assay was applied to single breast cancer cells to analyze the chromosomal region centred on the therapeutically relevant EGFR gene. This method allows precise quantitative analysis of copy number variations in single-cell diagnostics.
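
    The abstract does not say how copy number was derived from the qPCR data. One common approach is the comparative Ct (delta-delta Ct) method, sketched below under loudly stated assumptions: an amplification factor of 2 per cycle (100% efficiency), a reference locus present at the same copy number in sample and calibrator, and a diploid calibrator. The function names are hypothetical.

```python
def relative_copy_number(ct_target: float, ct_reference: float,
                         ct_target_calibrator: float, ct_reference_calibrator: float,
                         amplification_factor: float = 2.0) -> float:
    """Target copy number relative to a calibrator via the delta-delta Ct method.

    amplification_factor = 2.0 assumes 100% PCR efficiency (product doubles each cycle).
    """
    delta_sample = ct_target - ct_reference
    delta_calibrator = ct_target_calibrator - ct_reference_calibrator
    return amplification_factor ** -(delta_sample - delta_calibrator)


def absolute_copies(relative: float, calibrator_copies: int = 2) -> float:
    """Scale to copies per cell, assuming the calibrator carries `calibrator_copies` (e.g. diploid DNA)."""
    return relative * calibrator_copies
```

    For instance, a target that crosses threshold three cycles earlier than in the calibrator (relative to the reference locus) gives a relative value of 8, i.e. about 16 copies against a diploid calibrator. With single-cell input, whole genome amplification bias can distort these values, which is why an orthogonal readout such as array-CGH is useful.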

  8. Genome-Wide Analysis of Simple Sequence Repeats in Bitter Gourd (Momordica charantia)

    Directory of Open Access Journals (Sweden)

    Junjie Cui

    2017-06-01

    Bitter gourd (Momordica charantia) is widely cultivated as a vegetable and medicinal herb in many Asian and African countries. After the sequencing of the cucumber (Cucumis sativus), watermelon (Citrullus lanatus), and melon (Cucumis melo) genomes, bitter gourd became the fourth cucurbit species whose whole genome was sequenced. However, a comprehensive analysis of simple sequence repeats (SSRs) in bitter gourd, including a comparison with the three aforementioned cucurbit species, has not yet been published. Here, we identified a total of 188,091 and 167,160 SSR motifs in the genomes of the bitter gourd lines ‘Dali-11’ and ‘OHB3-1’, respectively. Subsequently, the SSR content, motif lengths, and classified motif types were characterized for the bitter gourd genomes and compared among all the cucurbit genomes. Lastly, a large set of 138,727 unique in silico SSR primer pairs was designed for bitter gourd. Among these, 71 primers were selected, all of which successfully amplified SSRs from the two bitter gourd lines ‘Dali-11’ and ‘K44’. To further examine the utility of the unique SSR primers, 21 SSR markers were used to genotype a collection of 211 bitter gourd lines from all over the world. A model-based clustering method and phylogenetic analysis indicated a clear separation among the geographic groups. The genomic SSR markers developed in this study have considerable potential value in advancing bitter gourd research.
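
    SSR identification of this kind is usually done with dedicated tools whose settings the abstract does not give. As a minimal sketch of the idea, the code below scans for mono- to hexanucleotide tandem repeats with a regular expression, using hypothetical minimum repeat counts in `MIN_REPEATS`. It skips motifs that are themselves repeats of a shorter unit (such as "ATAT") so that a dinucleotide SSR is not also reported as a tetranucleotide one.

```python
import re
from typing import List, Tuple

# Hypothetical minimum repeat counts per motif length (mono- to hexanucleotide).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not a repeat of a shorter unit (e.g. 'ATAT' is not primitive)."""
    k = len(motif)
    return not any(k % d == 0 and motif[:d] * (k // d) == motif for d in range(1, k))


def find_ssrs(sequence: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for non-overlapping SSRs."""
    seq = sequence.upper()
    found = []
    covered = set()
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue
            span = range(m.start(), m.end())
            if covered.intersection(span):
                continue  # already reported under a shorter motif length
            covered.update(span)
            found.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(found)
```

    Flanking sequence around each hit would then be passed to a primer-design tool to obtain in silico SSR primer pairs.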

  9. Genome-wide Studies of Mycolic Acid Bacteria: Computational Identification and Analysis of a Minimal Genome

    KAUST Repository

    Kamanu, Frederick Kinyua

    2012-12-01

    The mycolic acid bacteria are a distinct suprageneric group of asporogenous Gram-positive, high-GC-content bacteria, distinguished by the presence of mycolic acids in their cell envelope. They exhibit great diversity in their cell morphology; although primarily non-pathogens, this group contains three major pathogens: Mycobacterium leprae, the Mycobacterium tuberculosis complex, and Corynebacterium diphtheriae. Although the mycolic acid bacteria are a clearly defined group of bacteria, the taxonomic relationships between its constituent genera and species are less well defined. Two approaches were tested for their suitability in describing the taxonomy of the group. First, a Multilocus Sequence Typing (MLST) experiment was assessed and found to be superior to the single-gene (16S small ribosomal subunit) approach in delineating a total of 52 mycolic acid bacterial species. Phylogenetic inference was performed using the neighbor-joining method. To further refine phylogenetic analysis and to take advantage of the widespread availability of bacterial genome data, a computational framework that simulates DNA-DNA hybridisation was developed and validated using multiscale bootstrap resampling. The tool classifies microbial genomes based on whole-genome DNA and was deployed as a web application using PHP and JavaScript. It is accessible online at http://cbrc.kaust.edu.sa/dna_hybridization/. A third study applied computational and statistical methods to the identification and analysis of a putative minimal mycolic acid bacterial genome, so as to better understand (1) the genomic requirements to encode a mycolic acid bacterial cell and (2) the role and type of genes and genetic elements that lead to the massive increase in genome size in environmental mycolic acid bacteria. Using a reciprocal comparison approach, a total of 690 orthologous gene clusters forming a putative minimal genome were identified across 24 mycolic acid bacterial species. In order to identify new potential drug
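
    The "reciprocal comparison approach" is not detailed in the abstract. A common version is the reciprocal best hit (RBH) criterion, sketched below under the assumption that all-against-all similarity scores (for example from BLAST) have already been computed. Genes of a reference species that have an RBH partner in every other species would then form candidate minimal-genome clusters. All names here are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

Scores = Dict[Tuple[str, str], float]  # (query gene, subject gene) -> similarity score


def best_hits(scores: Scores) -> Dict[str, str]:
    """Best-scoring subject for each query; ties keep the first pair seen."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Gene pairs that are each other's best hit in both search directions."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


def conserved_in_all(reference_genes: Iterable[str],
                     rbh_by_species: Dict[str, List[Tuple[str, str]]]) -> Set[str]:
    """Reference genes that have an RBH partner in every other species."""
    partner_maps = [dict(pairs) for pairs in rbh_by_species.values()]
    return {g for g in reference_genes if all(g in m for m in partner_maps)}
```

    RBH is strict (it misses many-to-many orthology after duplications), which suits the goal of a conservative minimal gene set.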

  10. Genome sequencing and analysis of the first complete genome of Lactobacillus kunkeei strain MP2, an Apis mellifera gut isolate

    Directory of Open Access Journals (Sweden)

    Freddy Asenjo

    2016-04-01

    Background. The honey bee (Apis mellifera) is the most important pollinator in agriculture worldwide. However, the number of honey bees has fallen significantly since 2006, which has become a major ecological problem. The principal cause is CCD, or Colony Collapse Disorder, characterized by the seemingly spontaneous abandonment of hives by their workers. One of the characteristics of CCD in honey bees is the alteration of the bacterial communities in their gastrointestinal tract, mainly due to the decrease of Firmicutes populations, such as the Lactobacilli. At this time, the causes of these alterations remain unknown. We recently isolated a strain of Lactobacillus kunkeei (L. kunkeei strain MP2) from the gut of Chilean honey bees. L. kunkeei is one of the bacteria most commonly isolated from the honey bee gut and is highly versatile in different ecological niches. In this study, we aimed to elucidate the L. kunkeei genetic background in detail and to perform a comparative genome analysis with other Lactobacillus species. Methods. L. kunkeei MP2 was originally isolated from the guts of Chilean A. mellifera individuals. Genome sequencing was done using Pacific Biosciences single-molecule real-time sequencing technology. De novo assembly was performed using the Celera assembler. The genome was annotated using Prokka, and functional information was added using the EggNOG 3.1 database. In addition, genomic islands were predicted using IslandViewer, and prophage sequences using PHAST. Comparisons between L. kunkeei MP2 and other L. kunkeei and Lactobacillus strains were done using Roary. Results. The complete genome of L. kunkeei MP2 comprises one circular chromosome of 1,614,522 nt with a GC content of 36.9%. Pangenome analysis with 16 L. kunkeei strains identified 113 unique genes, most of them related to phage insertions. A large and unique region of the L. kunkeei MP2 genome contains several genes that encode for phage structural protein and
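
    The reported GC content (36.9%) is simply the fraction of G and C among unambiguous bases. The sketch below computes it and adds a sliding-window scan that flags regions whose GC deviates from the genome-wide value, a classic but simplistic signal of horizontally acquired DNA such as prophages. This heuristic, its default window sizes and the `max_dev` cutoff are hypothetical illustrations; IslandViewer and PHAST use their own, more elaborate methods.

```python
from typing import List, Tuple


def gc_content(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T); other symbols such as N are ignored."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no unambiguous bases")
    return (s.count("G") + s.count("C")) / acgt


def windowed_gc(seq: str, window: int, step: int) -> List[Tuple[int, float]]:
    """GC content of each full-length window, keyed by 0-based start position."""
    return [(i, gc_content(seq[i:i + window]))
            for i in range(0, len(seq) - window + 1, step)]


def atypical_windows(seq: str, window: int = 5000, step: int = 1000,
                     max_dev: float = 0.05) -> List[Tuple[int, float]]:
    """Windows whose GC deviates from the genome-wide value by more than max_dev (hypothetical cutoff)."""
    genome_gc = gc_content(seq)
    return [(i, gc) for i, gc in windowed_gc(seq, window, step)
            if abs(gc - genome_gc) > max_dev]
```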

  11. Primer to analysis of genomic data using R

    CERN Document Server

    Gondro, Cedric

    2015-01-01

    Through this book, researchers and students will learn to use R for analysis of large-scale genomic data and how to create routines to automate analytical steps. The philosophy behind the book is to start with real world raw datasets and perform all the analytical steps needed to reach final results. Though theory plays an important role, this is a practical book for advanced undergraduate and graduate classes in bioinformatics, genomics and statistical genetics or for use in lab sessions. This book is also designed to be used by students in computer science and statistics who want to learn the practical aspects of genomic analysis without delving into algorithmic details. The datasets used throughout the book may be downloaded from the publisher’s website.  Chapters show how to handle and manage high-throughput genomic data, create automated workflows and speed up analyses in R. A wide range of R packages useful for working with genomic data are illustrated with practical examples. In recent years R has b...

  12. Full mitochondrial genome sequences of two endemic Philippine hornbill species (Aves: Bucerotidae) provide evidence for pervasive mitochondrial DNA recombination.

    Science.gov (United States)

    Sammler, Svenja; Bleidorn, Christoph; Tiedemann, Ralph

    2011-01-14

    Although it is now broadly accepted that mitochondrial DNA (mtDNA) may undergo recombination, the frequency of such recombination remains controversial. Its estimation is not straightforward, as recombination under homoplasmy (i.e., among identical mt genomes) is likely to be overlooked. In species with tandem duplications of large mtDNA fragments the detection of recombination can be facilitated, as it can lead to gene conversion among duplicates. Although the mechanisms for concerted evolution in mtDNA are not yet fully understood, recombination rates have been estimated from "one per speciation event" down to 850 years or even "during every replication cycle". Here we present the first complete mt genomes of the avian family Bucerotidae, namely those of two Philippine hornbills, Aceros waldeni and Penelopides panini. The mt genomes are characterized by a tandemly duplicated region encompassing part of cytochrome b, 3 tRNAs, NADH6, and the control region. The duplicated fragments are identical to each other except for a short section in domain I and for the length of repeat motifs in domain III of the control region. Due to heteroplasmy in the number of these repeat motifs, there is some size variation in both genomes; at around 21,657 bp (A. waldeni) and 22,737 bp (P. panini), they significantly exceed the hitherto longest known avian mt genomes, those of the albatrosses. We discovered concerted evolution between the duplicated fragments within individuals. The existence of differences between individuals in coding genes as well as in the control region, which are maintained between duplicates, indicates that recombination apparently occurs frequently, i.e., in every generation. The homogenised duplicates are interspersed by a short fragment which shows no sign of recombination. We hypothesize that this region corresponds to the so-called Replication Fork Barrier (RFB), which has been described from the chicken mitochondrial genome. As this RFB

  13. Detecting Genomic Signatures of Natural Selection with Principal Component Analysis: Application to the 1000 Genomes Data.

    Science.gov (United States)

    Duforet-Frebourg, Nicolas; Luu, Keurcien; Laval, Guillaume; Bazin, Eric; Blum, Michael G B

    2016-04-01

    To characterize natural selection, various analytical methods for detecting candidate genomic regions have been developed. We propose to perform genome-wide scans of natural selection using principal component analysis (PCA). We show that the common FST index of genetic differentiation between populations can be viewed as the proportion of variance explained by the principal components. Considering the correlations between genetic variants and each principal component provides a conceptual framework to detect genetic variants involved in local adaptation without any prior definition of populations. To validate the PCA-based approach, we consider the 1000 Genomes data (phase 1), comprising 850 individuals from Africa, Asia, and Europe. The number of genetic variants is on the order of 36 million, obtained with low-coverage sequencing (3×). The correlations between genetic variation and each principal component identify well-known targets of positive selection (EDAR, SLC24A5, SLC45A2, DARC) as well as new candidate genes (APPBPP2, TP1A1, RTTN, KCNMA, MYO5C) and noncoding RNAs. In addition to identifying genes involved in biological adaptation, we identify two biological pathways involved in polygenic adaptation that are related to the innate immune system (beta defensins) and to lipid metabolism (fatty acid omega oxidation). An additional analysis of European data shows that a genome scan based on PCA retrieves classical examples of local adaptation even when there are no well-defined populations. PCA-based statistics, implemented in the PCAdapt R package and the PCAdapt fast open-source software, retrieve well-known signals of human adaptation, which is encouraging for future whole-genome sequencing projects, especially when defining populations is difficult. © The Author(s) 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
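
    The idea of scoring SNPs by their correlation with principal components can be sketched with NumPy. This is a simplified illustration, not the PCAdapt implementation: it assumes a complete genotype matrix of 0/1/2 allele counts (individuals by SNPs, no missing data), standardizes each polymorphic SNP, takes the first k principal components by SVD, and sums the squared SNP-PC correlations so that large values flag candidate SNPs.

```python
import numpy as np


def pca_snp_correlations(genotypes, k):
    """Correlate each polymorphic SNP with the first k PCs.

    genotypes: (n_individuals, n_snps) array of 0/1/2 allele counts, no missing data.
    Returns (mask of SNPs kept, (n_kept_snps, k) correlation matrix).
    """
    g = np.asarray(genotypes, dtype=float)
    mean = g.mean(axis=0)
    std = g.std(axis=0)
    keep = std > 0                               # drop monomorphic SNPs
    z = (g[:, keep] - mean[keep]) / std[keep]    # standardized genotypes
    u, s, _ = np.linalg.svd(z, full_matrices=False)
    scores = u[:, :k] * s[:k]                    # individual scores on the first k PCs
    sc = (scores - scores.mean(axis=0)) / scores.std(axis=0)
    corr = z.T @ sc / z.shape[0]                 # Pearson r between each SNP and each PC
    return keep, corr


def squared_correlation_sum(corr):
    """Sum of squared SNP-PC correlations over the retained PCs (larger = more PC-associated)."""
    return (corr ** 2).sum(axis=1)
```

    The sign of each PC is arbitrary in an SVD, which is why the statistic uses squared correlations.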

  14. Genome-wide comparative analysis reveals similar types of NBS genes in hybrid Citrus sinensis genome and original Citrus clementina genome and provides new insights into non-TIR NBS genes.

    Directory of Open Access Journals (Sweden)

    Yunsheng Wang

    In this study, we identified and compared nucleotide-binding site (NBS) domain-containing genes from three Citrus genomes (C. clementina, C. sinensis from the USA and C. sinensis from China). Phylogenetic analysis of all Citrus NBS genes across these three genomes revealed three groups of roughly equal size: one group contains the Toll-Interleukin receptor (TIR) domain, and two different non-TIR groups in which most proteins contain the Coiled Coil (CC) domain. Motif analysis confirmed that the two groups of CC-containing NBS genes have different evolutionary origins. We partitioned NBS genes into clades using NBS domain sequence distances and found that most clades include NBS genes from all three Citrus genomes. This suggests that the three Citrus genomes have similar numbers and types of NBS genes. We also mapped the re-sequenced reads of three pomelo and three mandarin genomes onto the C. sinensis genome. We found that most NBS genes of the hybrid C. sinensis genome have corresponding homologous genes in both pomelo and mandarin genomes. The homologous NBS genes in pomelo and mandarin suggest that the parental species of C. sinensis may contain similar types of NBS genes. This explains why the hybrid C. sinensis and the original C. clementina have similar types of NBS genes in this study. Furthermore, we found that sequence variation amongst Citrus NBS genes was shaped by multiple independent and shared accelerated mutation accumulation events among different groups of NBS genes and in different Citrus genomes. Our comparative analyses yield valuable insight into the structure, organization and evolution of NBS genes in Citrus genomes. In addition, our comprehensive analysis showed that the non-TIR NBS genes can be divided into two groups that come from different evolutionary origins. This provides new insights into non-TIR genes, which have not received much attention.

  15. The complexity of Rhipicephalus (Boophilus) microplus genome characterised through detailed analysis of two BAC clones

    Directory of Open Access Journals (Sweden)

    Valle Manuel

    2011-07-01

    Background: Rhipicephalus (Boophilus) microplus (Rmi), a major cattle ectoparasite and tick-borne disease vector, impacts animal welfare and industry productivity. In arthropod research there is no complete genome from the Chelicerata, the subphylum that includes ticks, mites, spiders and scorpions. Model arthropod genomes such as Drosophila and Anopheles are too taxonomically distant to serve as a reference for tick genomic sequence analysis. This study focuses on the de novo assembly of two R. microplus BAC sequences from the understudied R. microplus genome. Based on available R. microplus sequence resources and comparative analysis, tick genomic structure and functional predictions identify complex gene structures and genomic targets expressed during tick-cattle interaction. Results: In our BAC analyses we assembled two challenging genomic regions using the correct positioning of BAC end sequences and transcript sequences. Comparison of Cot DNA fractions with the BAC sequences confirmed a highly repetitive BAC sequence, BM-012-E08, and a low-repetitive BAC sequence, BM-005-G14, which was gene rich and contained short interspersed elements (SINEs). Based directly on the BAC and Cot data comparisons, the genome-wide frequency of the SINE Ruka element was estimated. Using a conservative approach to the assembly of the highly repetitive BM-012-E08, the sequence was de-convoluted into three repeat units, each containing an 18S, 5.8S and 28S ribosomal RNA (rRNA)-encoding gene sequence (rDNA), the related internal transcribed spacer and a complex intergenic region. In the low-repetitive BM-005-G14, a novel gene complex involving two genes on the same strand was found. Nested in the second intron of a large 9 kb papilin gene was a helicase gene. This helicase overlapped the papilin in two exonic regions. Both genes were shown to be expressed in different tick life stages important in ectoparasite interaction with the host. Tick specific sequence

  16. Effects of a diet high in monounsaturated fat and a full Mediterranean diet on PBMC whole genome gene expression and plasma proteins

    NARCIS (Netherlands)

    Dijk, van Susan; Feskens, Edith; Bos, M.B.; Groot, de Lisette; Vries, de Jeanne; Muller, Michael; Afman, Lydia

    2012-01-01

    This study aimed to identify the effects of replacing saturated fat (SFA) with monounsaturated fat (MUFA) in a western-type diet, and the effects of a full Mediterranean (MED) diet, on whole-genome PBMC gene expression and plasma protein profiles. Abdominally overweight subjects were randomized to a

  17. Genome-wide identification, functional analysis and expression ...

    African Journals Online (AJOL)

    The plant pleiotropic drug resistance (PDR) family of ATP-binding cassette (ABC) transporters has been extensively researched in relation to the transport of antifungal agents and resistant pathogens. In our study, we analyzed the whole family of PDR genes present in the potato genome. This analysis ...

  18. Genome-wide analysis of the WRKY transcription factors in Aegilops tauschii.

    Science.gov (United States)

    Ma, Jianhui; Zhang, Daijing; Shao, Yun; Liu, Pei; Jiang, Lina; Li, Chunxi

    2014-01-01

    The WRKY transcription factors (TFs) play important roles in plant responses to abiotic and biotic stress. However, because the wheat genome sequence is unfinished, relatively few WRKY TFs with full-length coding sequences (CDSs) have been identified in wheat. The Aegilops tauschii genome, which is the D-genome progenitor of the hexaploid wheat genome, instead provides important resources for the discovery of new genes. In this study, we performed a bioinformatics analysis to identify WRKY TFs with full-length CDSs from the A. tauschii genome. A detailed evolutionary analysis of all these TFs was conducted, and quantitative real-time PCR was carried out to investigate the expression patterns of the abiotic stress-related WRKY TFs under different abiotic stress conditions in A. tauschii seedlings. A total of 93 WRKY TFs were identified from A. tauschii, 79 of which were newly discovered compared with wheat. The gene phylogeny, gene structure and chromosome location of the 93 WRKY TFs were fully analyzed. These studies provide a global view of the WRKY TFs from A. tauschii and a firm foundation for further investigations in both A. tauschii and wheat. © 2015 S. Karger AG, Basel.

  19. Analysis of CR1 Repeats in the Zebra Finch Genome

    Directory of Open Access Journals (Sweden)

    George E. Liu

    2013-06-01

    Most bird species have smaller genomes and fewer repeats than mammals. The Chicken Repeat 1 (CR1) repeat is one of the most abundant repeat families, ranging from ~133,000 to ~187,000 copies and accounting for ~50% to ~80% of the interspersed repeats in the zebra finch and chicken genomes, respectively. CR1 repeats are believed to have arisen from the retrotransposition of a small number of master elements, which gave rise to multiple CR1 subfamilies in the chicken. In this study, we performed a global assessment of the divergence distributions, phylogenies, and consensus sequences of CR1 repeats in the zebra finch genome. We identified and validated 34 CR1 subfamilies and further analyzed the correlation between these subfamilies. We also discovered 4 novel lineage-specific CR1 subfamilies in the zebra finch when compared to the chicken genome. We built various evolutionary trees of these subfamilies and concluded that CR1 repeats may play an important role in reshaping the structure of bird genomes.

  20. Bradyrhizobium elkanii nod regulon: insights through genomic analysis

    Directory of Open Access Journals (Sweden)

    Luciane M. P. Passaglia

    2017-07-01

    A successful symbiotic relationship between soybean [Glycine max (L.) Merr.] and Bradyrhizobium species requires expression of the bacterial structural nod genes that direct the synthesis of lipochitooligosaccharide nodulation signal molecules, known as Nod factors (NFs). Bradyrhizobium diazoefficiens USDA 110 possesses a wide nodulation gene repertoire that allows NF assembly and modification, with transcription of the nodYABCSUIJnolMNOnodZ operon depending on specific activators, i.e., products of regulatory nod genes that respond to signaling molecules such as flavonoid compounds exuded by host plant roots. Central to this regulatory circuit of nod gene expression are the NodD proteins, members of the LysR-type regulator family. In this study, publicly available Bradyrhizobium elkanii genome sequences were compared with the closely related B. diazoefficiens USDA 110 reference genome to determine the similarities between those genomes, especially with regard to the nod operon and nod regulon. Bioinformatics analyses revealed a correlation between functional mechanisms and key elements that play an essential role in the regulation of nod gene expression. These analyses also revealed genomic features that had not been clearly explored before, some of which were unique to certain B. elkanii genomes.

  1. The Revolution in Viral Genomics as Exemplified by the Bioinformatic Analysis of Human Adenoviruses

    Directory of Open Access Journals (Sweden)

    Sarah Torres

    2010-06-01

    Over the past 30 years, genomic and bioinformatic analysis of human adenoviruses has been achieved using a variety of DNA sequencing methods: initially with restriction enzymes and more recently with GS FLX pyrosequencing technology. Following the conception of DNA sequencing in the 1970s, analysis of adenoviruses has evolved from 100 base pair mRNA fragments to entire genomes. Comparative genomics of adenoviruses made its debut in 1984, when the nucleotides and amino acids of coding sequences within the hexon genes of two human adenoviruses (HAdV), HAdV-C2 and HAdV-C5, were compared and analyzed. Three different zones (1-393, 394-1410 and 1411-2910) were identified within the hexon gene, of which HAdV-C2 and HAdV-C5 shared zones 1 and 3 with 95% and 89.5% nucleotide identity, respectively. In 1992, HAdV-C5 became the first adenovirus genome to be fully sequenced, using the Sanger method. Over the next seven years, whole genome analysis and characterization was completed using bioinformatic tools such as blastn, tblastx, ClustalV and FASTA to determine key proteins in species HAdV-A through HAdV-F. The bioinformatic revolution was initiated with the introduction of a novel species, HAdV-G, which was typed and named using whole genome sequencing and phylogenetics rather than traditional serology. HAdV bioinformatics will continue to advance as the latest sequencing technology enables scientists to add to and expand the resource databases. As a result of these advancements, the way novel HAdVs are typed has changed. Bioinformatic analysis has become a revolutionary tool that has significantly accelerated the in-depth study of HAdV microevolution through comparative genomics.
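The zone-by-zone hexon comparison described above reduces to computing percent nucleotide identity over slices of an alignment. The sketch below is an illustration, not the 1984 authors' procedure: it assumes the two sequences are already aligned to equal length, treats zone boundaries as 1-based inclusive alignment coordinates, skips columns where both sequences have a gap, and counts a gap in only one sequence as a mismatch (one of several common conventions). The function names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two pre-aligned, equal-length sequences ('-' = gap)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = 0
    columns = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # column is empty in both sequences; ignore it
        columns += 1
        if a == b:  # both-gap columns were skipped, so this is a base match
            matches += 1
    return 100.0 * matches / columns if columns else 0.0


def zone_identities(seq_a: str, seq_b: str, zones):
    """Identity per zone; each zone is a 1-based inclusive (start, end) pair."""
    return [percent_identity(seq_a[start - 1:end], seq_b[start - 1:end])
            for start, end in zones]
```

With aligned hexon sequences in hypothetical variables `hexon_c2` and `hexon_c5`, `zone_identities(hexon_c2, hexon_c5, [(1, 393), (394, 1410), (1411, 2910)])` would return one identity value per zone.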

  2. Genome Assembly and Computational Analysis Pipelines for Bacterial Pathogens

    KAUST Repository

    Rangkuti, Farania Gama Ardhina

    2011-06-01

    Pathogens lie behind the deadliest pandemics in history. To date, the AIDS pandemic has resulted in more than 25 million deaths, while tuberculosis and malaria annually claim more than 2 million lives. Comparative genomic analyses are needed to gain insights into the molecular mechanisms of pathogens, but the abundance of biological data means that such studies cannot be performed without the assistance of computational approaches. This explains the significant need for computational pipelines for genome assembly and analysis. The aim of this research is to develop such pipelines. This work uses various bioinformatics approaches to analyze high-throughput genomic sequence data obtained from several strains of bacterial pathogens. A pipeline has been compiled for quality control of sequencing and assembly, and several protocols have been developed to detect contamination. Genomic data have been visualized in various formats, and alignment, homology detection and sequence variant detection have been performed. We have also implemented a metaheuristic algorithm that significantly improves bacterial genome assemblies compared to other known methods. Experiments on Mycobacterium tuberculosis H37Rv data showed that our method improved the N50 value by up to 9697% while consistently maintaining high accuracy, covering around 98% of the published reference genome. Other improvement efforts were also implemented, consisting of iterative local assemblies and iterative correction of contiguated bases. Our result expedites the genomic analysis of virulent genes up to single base pair resolution. It is also applicable to virtually every pathogenic microorganism, propelling further research in the control of and protection from pathogen-associated diseases.
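N50, the metric used above to quantify assembly improvement, can be computed from contig lengths as sketched below. The `n50` function follows the standard definition (the length L such that contigs of length L or longer hold at least half of all assembled bases). The `percent_improvement` helper assumes the reported figure is (new - old) / old x 100, which the abstract does not state explicitly. Both function names are hypothetical.

```python
def n50(lengths):
    """N50 of a collection of contig or scaffold lengths (bp)."""
    if not lengths:
        raise ValueError("no contig lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest contig down until half the bases are covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def percent_improvement(before, after):
    """Relative change (%) from `before` to `after` (assumed definition)."""
    return 100.0 * (after - before) / before


print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))  # 8
```

In the example the total is 54 bp; the three longest contigs (10 + 9 + 8 = 27 bp) reach half of that, so the N50 is 8.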

  3. Comparative genomic analysis of Brazilian Leptospira kirschneri serogroup Pomona serovar Mozdok

    Directory of Open Access Journals (Sweden)

    Luisa Z Moreno

    2016-08-01

    Leptospira kirschneri is one of the pathogenic species of the genus Leptospira. Human and animal infections with L. kirschneri have gained increasing attention over the last few decades. Here we present the isolation and characterisation of the Brazilian L. kirschneri serogroup Pomona serovar Mozdok strain M36/05 and its comparative genomic analysis with the Brazilian human strain 61H. The M36/05 strain caused pulmonary hemorrhagic lesions in the hamster model, showing high virulence. The studied genomes showed high symmetrical identity, and in silico multilocus sequence typing analysis yielded a new allelic profile (ST101) that so far has only been associated with Brazilian L. kirschneri serogroup Pomona serovar Mozdok strains. Considering the environmental conditions and the high genomic similarity observed between strains, we suggest the existence of a Brazilian L. kirschneri serogroup Pomona serovar Mozdok lineage that could represent a high public health risk; further studies are necessary to confirm the lineage's significance and distribution.

  4. Database Description - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

    PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods. Alternative name: -. DOI: 10.18908/lsdba.nbdc01194-01-000. ... and QTLs are curated manually from the published literature. The marker information includes marker sequences, genotyping methods ...

  5. Recombination events and variability among full-length genomes of co-circulating molluscum contagiosum virus subtypes 1 and 2.

    Science.gov (United States)

    López-Bueno, Alberto; Parras-Moltó, Marcos; López-Barrantes, Olivia; Belda, Sylvia; Alejo, Alí

    2017-05-01

    Molluscum contagiosum virus (MCV) is the sole member of the Molluscipoxvirus genus and causes a highly prevalent human skin disease characterized by the formation of a variable number of lesions that can persist for prolonged periods of time. Two major genotypes, subtype 1 and subtype 2, are recognized, although currently only a single complete genomic sequence, corresponding to MCV subtype 1, is available. Using next-generation sequencing techniques, we report the complete genomic sequences of four new MCV isolates, including the first derived from subtype 2. Comparisons suggest a relatively distant evolutionary split between the two MCV subtypes. Further, our data illustrate the concurrent circulation of distinct viruses within a population and reveal recombination events among them. These results help identify a set of MCV genes with potentially relevant roles in molluscum contagiosum epidemiology and pathogenesis.

  6. Sequencing and Analysis of Neanderthal Genomic DNA

    Energy Technology Data Exchange (ETDEWEB)

    Noonan, James P.; Coop, Graham; Kudaravalli, Sridhar; Smith, Doug; Krause, Johannes; Alessi, Joe; Chen, Feng; Platt, Darren; Paabo, Svante; Pritchard, Jonathan K.; Rubin, Edward M.

    2006-06-13

    Recovery and analysis of multiple Neanderthal autosomal sequences using a metagenomic approach reveals that modern humans and Neanderthals split ~400,000 years ago, without significant evidence of subsequent admixture.

  7. Phylogeny and comparative genome analysis of Basidiomycete fungi

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert W.; Salamov, Asaf; Grigoriev, Igor; Hibbett, David

    2011-03-14

    Fungi of the phylum Basidiomycota make up some 37 percent of the described fungi, and are important from the perspectives of forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, plant pathogenic rusts and smuts, and some human pathogens. To better understand these important fungi, we have undertaken a comparative genomic analysis of the Basidiomycetes with available sequenced genomes. We report a phylogeny that sheds light on previously unclear evolutionary relationships among the Basidiomycetes. We also define a 'core proteome' based on protein families conserved in all Basidiomycetes. We identify key expansions and contractions in protein families that may be responsible for the degradation of plant biomass such as cellulose, hemicellulose, and lignin. Finally, we speculate as to the genomic changes that drove such expansions and contractions.

  8. Sequence analysis of the genome of carnation (Dianthus caryophyllus L.).

    Science.gov (United States)

    Yagi, Masafumi; Kosugi, Shunichi; Hirakawa, Hideki; Ohmiya, Akemi; Tanase, Koji; Harada, Taro; Kishimoto, Kyutaro; Nakayama, Masayoshi; Ichimura, Kazuo; Onozaki, Takashi; Yamaguchi, Hiroyasu; Sasaki, Nobuhiro; Miyahara, Taira; Nishizaki, Yuzo; Ozeki, Yoshihiro; Nakamura, Noriko; Suzuki, Takamasa; Tanaka, Yoshikazu; Sato, Shusei; Shirasawa, Kenta; Isobe, Sachiko; Miyamura, Yoshinori; Watanabe, Akiko; Nakayama, Shinobu; Kishida, Yoshie; Kohara, Mitsuyo; Tabata, Satoshi

    2014-06-01

    The whole-genome sequence of carnation (Dianthus caryophyllus L.) cv. 'Francesco' was determined using a combination of different new-generation multiplex sequencing platforms. The total length of the non-redundant sequences was 568,887,315 bp, consisting of 45,088 scaffolds, which covered 91% of the 622 Mb carnation genome estimated by k-mer analysis. The N50 values of contigs and scaffolds were 16,644 bp and 60,737 bp, respectively, and the longest scaffold was 1,287,144 bp. The average GC content of the contig sequences was 36%. A total of 1,050, 13, 92 and 143 genes for tRNAs, rRNAs, snoRNAs and miRNAs, respectively, were identified in the assembled genomic sequences. For protein-encoding genes, 43,266 complete and partial gene structures, excluding those in transposable elements, were deduced. Gene coverage was ~98%, as deduced from the coverage of the core eukaryotic genes. Intensive characterization of the assigned carnation genes and comparison with those of other plant species revealed characteristic features of the carnation genome. The results of this study will serve as a valuable resource for fundamental and applied research on carnation, especially for breeding new carnation varieties. Further information on the genomic sequences is available at http://carnation.kazusa.or.jp. © The Author 2013. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
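The 91% coverage figure above is simple arithmetic on the reported numbers. The k-mer genome size it is compared against is commonly estimated by dividing the total number of k-mers by the depth of the main k-mer coverage peak. The sketch below reproduces the first calculation from the abstract's figures and illustrates the second with hypothetical inputs. It assumes 622 Mb means 622,000,000 bp; the abstract does not name the k-mer tool or parameters used, and the function names are hypothetical.

```python
def assembled_fraction(assembled_bp: int, estimated_genome_bp: int) -> float:
    """Fraction of the estimated genome size covered by the assembly."""
    return assembled_bp / estimated_genome_bp


def kmer_genome_size(total_kmers: int, peak_depth: float) -> float:
    """Basic k-mer genome size estimate: total k-mers / depth of the main peak.

    Ignores heterozygosity, repeats and sequencing errors, which dedicated
    k-mer profiling tools model explicitly.
    """
    if peak_depth <= 0:
        raise ValueError("peak depth must be positive")
    return total_kmers / peak_depth


# Figures taken from the carnation abstract.
fraction = assembled_fraction(568_887_315, 622_000_000)
print(f"{fraction:.1%}")  # 91.5%
```

The result (about 91.5%) agrees with the 91% reported once rounded to a whole percentage.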

  9. The Chlamydia psittaci genome: a comparative analysis of intracellular pathogens.

    Science.gov (United States)

    Voigt, Anja; Schöfl, Gerhard; Saluz, Hans Peter

    2012-01-01

    Chlamydiaceae are a family of obligate intracellular pathogens that cause a wide range of diseases in animals and humans and face unique evolutionary constraints not encountered by free-living prokaryotes. To investigate genomic aspects of infection, virulence and host preference, we have sequenced Chlamydia psittaci, the pathogenic agent of ornithosis. A comparison of the genome of the avian Chlamydia psittaci isolate 6BC with the genomes of other chlamydial species, C. trachomatis, C. muridarum, C. pneumoniae, C. abortus, C. felis and C. caviae, revealed a high level of sequence conservation and synteny across taxa, with the major exception of the human pathogen C. trachomatis. Important differences manifest in the polymorphic membrane protein family specific to the Chlamydiae and in the highly variable chlamydial plasticity zone. We identified a number of psittaci-specific polymorphic membrane proteins of the G family that may be related to differences in host range and/or virulence compared with closely related Chlamydiaceae. We calculated non-synonymous to synonymous substitution rate ratios for pairs of orthologous genes to identify putative targets of adaptive evolution, and predicted type III secreted effector proteins. This study is the first detailed analysis of the Chlamydia psittaci genome sequence. It provides insights into the genome architecture of C. psittaci and proposes a number of novel candidate genes, mostly of as yet unknown function, that may be important for pathogen-host interactions.
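The non-synonymous to synonymous substitution rate ratio mentioned above (dN/dS, often written omega) is, in its simplest counting form, the Jukes-Cantor-corrected proportion of non-synonymous differences per non-synonymous site divided by the corresponding synonymous quantity. The sketch below assumes the site and difference counts for an orthologous pair have already been obtained, for example by Nei-Gojobori counting, which is not shown. The abstract does not say which estimator the authors used; likelihood-based methods are common alternatives. The function names are hypothetical.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance from a proportion of differing sites."""
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def dn_ds(n_sites: float, s_sites: float, n_diffs: float, s_diffs: float) -> float:
    """dN/dS from precomputed site and difference counts (counting method)."""
    dn = jukes_cantor(n_diffs / n_sites)  # non-synonymous distance
    ds = jukes_cantor(s_diffs / s_sites)  # synonymous distance
    if ds == 0:
        # Ratio undefined without synonymous change; flag it explicitly.
        return math.inf if dn > 0 else math.nan
    return dn / ds
```

Under the usual interpretation, a ratio well below 1 suggests purifying selection and a ratio above 1 suggests positive selection, which is why such ratios are used to flag putative targets of adaptive evolution.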

  10. Genomics-enabled analysis of the emergent disease cotton bacterial blight.

    Directory of Open Access Journals (Sweden)

    Anne Z Phillips

    2017-09-01

    Cotton bacterial blight (CBB), an important disease of cotton (Gossypium hirsutum) in the early 20th century, had been controlled by resistant germplasm for over half a century. Recently, CBB re-emerged as an agronomic problem in the United States. Here, we report an analysis of cotton variety planting statistics that indicates a steady increase in the percentage of susceptible cotton varieties grown each year since 2009. Phylogenetic analysis revealed that strains from the current outbreak cluster with race 18 Xanthomonas citri pv. malvacearum (Xcm) strains. Illumina-based draft genomes were generated for thirteen Xcm isolates and analyzed along with four previously published Xcm genomes. These genomes encode 24 conserved and nine variable type III effectors. Strains in the race 18 clade contain three to five more effectors than other Xcm strains. SMRT sequencing of two geographically and temporally diverse strains of Xcm yielded circular chromosomes and accompanying plasmids. These genomes encode eight and thirteen distinct transcription activator-like effector genes, respectively. RNA sequencing revealed 52 genes induced within two cotton cultivars by both tested Xcm strains. This gene list includes a homeologous pair of genes with homology to the known susceptibility gene MLO. In contrast, the two Xcm strains induce different clade III SWEET sugar transporters. Subsequent genome-wide analysis revealed patterns in the overall expression of homeologous gene pairs in cotton after inoculation with Xcm. These data reveal important insights into the Xcm-G. hirsutum disease complex and strategies for future development of resistant cultivars.

  11. Genomic analysis of natural selection and phenotypic variation in high-altitude Mongolians.

    Directory of Open Access Journals (Sweden)

    Jinchuan Xing

    Deedu (DU) Mongolians, who migrated from the Mongolian steppes to the Qinghai-Tibetan Plateau approximately 500 years ago, are challenged by environmental conditions similar to those faced by native Tibetan highlanders. Identification of adaptive genetic factors in this population could provide insight into coordinated physiological responses to this environment. Here we examine genomic and phenotypic variation in this unique population and present the first complete analysis of a Mongolian whole-genome sequence. High-density SNP array data demonstrate that DU Mongolians share genetic ancestry with other Mongolian as well as Tibetan populations, specifically in genomic regions related to adaptation to high altitude. Several selection candidate genes identified in DU Mongolians are shared with other Asian groups (e.g., EDAR) and with neighboring Tibetan populations (including the high-altitude candidates EPAS1, PKLR, and CYP2E1), and they also include genes previously hypothesized to be associated with metabolic adaptation (e.g., PPARG). Hemoglobin concentration, a trait associated with high-altitude adaptation in Tibetans, is at an intermediate level in DU Mongolians compared to Tibetans and Han Chinese at comparable altitude. The whole-genome sequence of a DU Mongolian (Tianjiao1) shows that about 2% of the genomic variants, including more than 300 protein-coding changes, are specific to this individual. Our analyses of DU Mongolians and the first Mongolian genome provide valuable insight into genetic adaptation to extreme environments.

  12. Comparative genomic analysis of multiple strains of two unusual plant pathogens: Pseudomonas corrugata and Pseudomonas mediterranea

    Directory of Open Access Journals (Sweden)

    Emmanouil A Trantas

    2015-08-01

    The non-fluorescent pseudomonads Pseudomonas corrugata (Pcor) and P. mediterranea (Pmed) are closely related species that cause pith necrosis, a disease of tomato that causes severe crop losses. However, they also show strong antagonistic effects against economically important pathogens, demonstrating their potential for use as biological control agents. In addition, their metabolic versatility makes them attractive for the production of commercial biomolecules and for bioremediation. An extensive comparative genomics study is required to dissect the mechanisms that Pcor and Pmed employ to cause disease and to prevent disease caused by other pathogens, and to mine their genomes for commercially significant chemical pathways. Here, we present the draft genomes of nine Pcor and Pmed strains from different geographical locations. This analysis covered significant genetic heterogeneity and allowed in-depth genomic comparison. All examined strains were able to trigger symptoms in tomato plants, but not all induced a hypersensitive-like response in Nicotiana benthamiana. Genome mining revealed the absence of a type III secretion system and of known type III effectors from all examined Pcor and Pmed strains. The lack of a type III secretion system appears to be unique among the plant pathogenic pseudomonads. Several gene clusters coding for a type VI secretion system were detected in all genomes.

  13. Analysis of The Cancer Genome Atlas sequencing data reveals novel properties of the human papillomavirus 16 genome in head and neck squamous cell carcinoma.

    Science.gov (United States)

    Nulton, Tara J; Olex, Amy L; Dozmorov, Mikhail; Morgan, Iain M; Windle, Brad

    2017-03-14

    Human papillomavirus (HPV) DNA is detected in up to 80% of oropharyngeal carcinomas (OPC), and this HPV-positive disease has reached epidemic proportions. To increase our understanding of the disease, we investigated the status of the HPV16 genome in HPV-positive head and neck cancers (HNC). Raw RNA-Seq and whole genome sequence data from The Cancer Genome Atlas HNC samples were analyzed to gain a full understanding of the HPV genome status of these tumors. Several remarkable and novel observations were made following this analysis. Firstly, there are three main HPV genome states in these tumors, split relatively evenly: an episomal-only state, an integrated state, and a state in which the viral genome exists as a hybrid episome with human DNA. Secondly, none of the tumors expressed high levels of E6; E6*I is the dominant variant expressed in all tumors. The most striking conclusion from this study is that around three quarters of HPV16-positive HNC contain episomal versions of the viral genome that are likely replicating in an E1-E2 dependent manner. The clinical and therapeutic implications of these observations are discussed.

  14. In silico comparative genomic analysis of GABAA receptor transcriptional regulation

    Directory of Open Access Journals (Sweden)

    Joyce Christopher J

    2007-06-01

    Background: Subtypes of the GABAA receptor subunit exhibit diverse temporal and spatial expression patterns. In silico comparative analysis was used to predict transcriptional regulatory features in individual mammalian GABAA receptor subunit genes, and to identify potential transcriptional regulatory components involved in the coordinate regulation of the GABAA receptor gene clusters. Results: Previously unreported putative promoters were identified for the β2, γ1, γ3, ε, θ and π subunit genes. Putative core elements and proximal transcriptional factors were identified within these predicted promoters, and within the experimentally determined promoters of other subunit genes. Conserved intergenic regions of sequence in the mammalian GABAA receptor gene cluster comprising the α1, β2, γ2 and α6 subunits were identified as potential long range transcriptional regulatory components involved in the coordinate regulation of these genes. A region of predicted DNase I hypersensitive sites within the cluster may contain transcriptional regulatory features coordinating gene expression. A novel model is proposed for the coordinate control of the gene cluster and parallel expression of the α1 and β2 subunits, based upon the selective action of putative Scaffold/Matrix Attachment Regions (S/MARs). Conclusion: The putative regulatory features identified by genomic analysis of GABAA receptor genes were substantiated by cross-species comparative analysis and now require experimental verification. The proposed model for the coordinate regulation of genes in the cluster accounts for the head-to-head orientation and parallel expression of the α1 and β2 subunit genes, and for the disruption of transcription caused by insertion of a neomycin gene in the close vicinity of the α6 gene, which is proximal to a putative critical S/MAR.

  15. Detailed analysis of putative genes encoding small proteins in legume genomes

    Directory of Open Access Journals (Sweden)

    Gabriel Guillén

    2013-06-01

    Diverse plant genome sequencing projects coupled with powerful bioinformatics tools have facilitated massive data analysis to construct specialized databases classified according to cellular function. However, there are still a considerable number of genes encoding proteins whose function has not yet been characterized. Included in this category are small proteins (SPs; 30-150 amino acids) encoded by short open reading frames (sORFs). SPs play important roles in plant physiology, growth, and development. Unfortunately, protocols focused on the genome-wide identification and characterization of sORFs are scarce or remain poorly implemented. As a result, these genes are underrepresented in many genome annotations. In this work, we exploited publicly available genome sequences of Phaseolus vulgaris, Medicago truncatula, Glycine max and Lotus japonicus to analyze the abundance of annotated SPs in legumes. Our strategy to uncover bona fide sORFs at the genome level centered on bioinformatics analysis of characteristics such as evidence of expression (transcription), presence of known protein regions or domains, and identification of orthologous genes in the genomes explored. We collected 6,170, 10,461, 30,521, and 23,599 putative sORFs from the P. vulgaris, G. max, M. truncatula, and L. japonicus genomes, respectively. Expressed sequence tags (ESTs) available in the DFCI Gene Index database provided evidence that about one-third of the predicted legume sORFs are expressed. Most potential SPs have a counterpart in a different plant species and counterpart regions or domains in larger proteins. Potential functional sORFs were also classified according to a reduced set of GO categories, and the expression of 13 of them during P. vulgaris nodule ontogeny was confirmed by qPCR. This analysis provides a collection of sORFs that potentially encode meaningful SPs, and offers the possibility of their further functional evaluation.
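The first step of the strategy above, enumerating candidate sORFs in a genome sequence, can be sketched as a three-frame scan for ATG-initiated open reading frames whose length falls in the 30-150 amino acid window used in the study. This is a simplified illustration, not the authors' pipeline. It scans only the forward strand (a full search would also scan the reverse complement), takes the first in-frame ATG as the start, and does not apply the expression, domain or orthology filters described in the abstract. The function name is hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard nuclear genetic code


def find_sorfs(seq: str, min_aa: int = 30, max_aa: int = 150):
    """Forward-strand ATG...stop ORFs encoding min_aa..max_aa amino acids.

    Returns sorted (start, end, aa_length) tuples with 0-based, end-exclusive
    coordinates that include the stop codon; aa_length excludes the stop.
    """
    seq = seq.upper()
    hits = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                aa_len = (i - start) // 3
                if min_aa <= aa_len <= max_aa:
                    hits.append((start, i + 3, aa_len))
                start = None  # look for the next ATG in this frame
    return sorted(hits)
```

For example, a sequence consisting of ATG, 29 GCT codons and a TAA stop yields a single hit, `(0, 93, 30)`: a 30-residue ORF at the lower edge of the window.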

  16. Genomic analysis of the necrotrophic fungal pathogens Sclerotinia sclerotiorum and Botrytis cinerea.

    Directory of Open Access Journals (Sweden)

    Joelle Amselem

    2011-08-01

    Sclerotinia sclerotiorum and Botrytis cinerea are closely related necrotrophic plant pathogenic fungi notable for their wide host ranges and environmental persistence. These attributes have made these species models for understanding the complexity of necrotrophic, broad host-range pathogenicity. Despite their similarities, the two species differ in mating behaviour and the ability to produce asexual spores. We have sequenced the genomes of one strain of S. sclerotiorum and two strains of B. cinerea. The comparative analysis of these genomes relative to one another and to other sequenced fungal genomes is provided here. Their 38-39 Mb genomes include 11,860-14,270 predicted genes, which share 83% amino acid identity on average between the two species. We have mapped the S. sclerotiorum assembly to 16 chromosomes and found large-scale co-linearity with the B. cinerea genomes. Seven percent of the S. sclerotiorum genome comprises transposable elements compared to <1% of B. cinerea. The arsenal of genes associated with necrotrophic processes is similar between the species, including genes involved in plant cell wall degradation and oxalic acid production. Analysis of secondary metabolism gene clusters revealed an expansion in number and diversity of B. cinerea-specific secondary metabolites relative to S. sclerotiorum. The potential diversity in secondary metabolism might be involved in adaptation to specific ecological niches. Comparative genome analysis revealed the basis of differing sexual mating compatibility systems between S. sclerotiorum and B. cinerea. The organization of the mating-type loci differs, and their structures provide evidence for the evolution of heterothallism from homothallism. These data shed light on the evolutionary and mechanistic bases of the genetically complex traits of necrotrophic pathogenicity and sexual mating. This resource should facilitate the functional studies designed to better understand what makes these

  17. Comparative genome analysis: selection pressure on the Borrelia vls cassettes is essential for infectivity

    Directory of Open Access Journals (Sweden)

    Wilske Bettina

    2006-08-01

    Full Text Available Abstract Background At least three species of Borrelia burgdorferi sensu lato (Bbsl) cause tick-borne Lyme disease. Previous work, including the genome analysis of B. burgdorferi B31 and B. garinii PBi, suggested that the plasmid portion of the genome is highly variable. The frequent occurrence of duplicated sequence stretches, the observed plasmid redundancy, and the largely unknown function and variability of plasmid-encoded genes left the relationships between plasmids, within and between species, largely unresolved. Results To gain further insight into Borrelia genome properties, we completed the plasmid sequences of B. garinii PBi, added the genome of a further species, B. afzelii PKo, to our analysis, and compared the genomes of pathogenic and apathogenic strains of both species. The core of all Bbsl genomes consists of the chromosome and two plasmids that are collinear across all species. We also found additional groups of plasmids that share large parts of their sequences. This makes it very likely that these plasmids are relatively stable and share common ancestors that predate the diversification of Borrelia species. Comparing the low- and high-passage genomes of B. garinii PBi and B. afzelii PKo revealed that in both species the loss of infectivity is accompanied by the loss of similar genetic material. Whereas B. garinii PBi lost only the end of one plasmid, B. afzelii PKo lost more material, probably an entire plasmid. In both cases the vls locus encoding variable surface proteins is affected. Conclusion The complete genome sequences of a B. garinii and a B. afzelii strain facilitate further comparative studies within the genus Borrelia. Our study shows that the loss of infectivity in B. garinii PBi can be traced back to a single event: the loss of the vls cassettes, possibly due to error-prone gene conversion. Similar albeit more extensive losses in B. afzelii PKo support the hypothesis that infectivity of Borrelia

  18. Genomic Research Data Generation, Analysis and Sharing – Challenges in the African Setting

    Directory of Open Access Journals (Sweden)

    Nicola Mulder

    2017-11-01

    Full Text Available Genomics is the study of the genetic material that constitutes the genomes of organisms. This genetic material can be sequenced, providing a powerful tool for studying the evolutionary history and diseases of humans, plants and animals. Genomics research is becoming increasingly commonplace owing to significant advances in, and falling costs of, technologies such as sequencing. This has led to new challenges, including the increasing cost and complexity of data. There is, therefore, an increasing need for computing infrastructure and skills to manage, store, analyze and interpret the data. In addition, there is a significant cost associated with recruiting participants and collecting and processing biological samples, particularly for large human genetics studies on specific diseases. As a result, researchers are often reluctant to share data because of the effort and associated cost. In Africa, where researchers most commonly work at the study-recruitment, phenotyping and sample-collection end of the genomic research spectrum rather than at the data-generation end, data sharing without adequate safeguards for the interests of the primary data generators is a concern. There are substantial ethical considerations in the sharing of human genomics data. The broad consent for data sharing preferred by genomics researchers and funders does not necessarily align with the expectations of researchers, research participants, legal authorities and bioethicists. In Africa, this is complicated by concerns about the comprehension of genomics research studies, the quality of research ethics reviews, and the understanding of the implications of broad consent, secondary analyses of shared data, return of results and incidental findings. 
Additional challenges with genomics research in Africa include the inability to transfer, store, process and analyze large-scale genomics data on the continent, because this requires highly specialized skills

  19. Viral genome analysis and knowledge management.

    Science.gov (United States)

    Kuiken, Carla; Yoon, Hyejin; Abfalterer, Werner; Gaschen, Brian; Lo, Chienchi; Korber, Bette

    2013-01-01

    One of the challenges of genetic data analysis is to combine information from sources that are distributed around the world and accessible through a wide array of different methods and interfaces. The HIV database and the databases that followed in its footsteps, the hepatitis C virus (HCV) and hemorrhagic fever virus (HFV) databases, have made it their mission to make different data types easily available to their users. This involves a large amount of behind-the-scenes processing, including quality control and analysis of the sequences and their annotation. Gene and protein sequences are distilled from the sequences stored in GenBank; to this end, both submitter annotation and script-generated sequences are used. Alignments of both nucleotide and amino acid sequences are generated, manually curated, distilled into an alignment model, and regenerated in an iterative cycle that yields ever-better alignments. Epidemiological and clinical annotation is parsed, checked, and added to the database. User interfaces are updated, and new interfaces are added based on user requests. Vital to the databases' success, the staff are themselves heavy users of the system, which enables them to fix bugs and find opportunities for improvement. In this chapter we describe some of the infrastructure that has kept these heavily used analysis platforms alive and vital after nearly 25 years of use. The database/analysis platforms described in this chapter can be accessed at http://hiv.lanl.gov, http://hcv.lanl.gov and http://hfv.lanl.gov.

  20. Salmo salar and Esox lucius full-length cDNA sequences reveal changes in evolutionary pressures on a post-tetraploidization genome

    Directory of Open Access Journals (Sweden)

    Holt Robert A

    2010-04-01

    Full Text Available Abstract Background Salmonids are among the most intensely studied fish, in part because of their economic and environmental importance, and in part because of a recent whole-genome duplication in the common ancestor of salmonids. This duplication greatly impacts species diversification, functional specialization, and adaptation. Extensive new genomic resources have recently become available for Atlantic salmon (Salmo salar), but documentation of allelic versus duplicate reference genes remains a major uncertainty in the complete characterization of its genome and its evolution. Results From existing expressed sequence tag (EST) resources and three new full-length cDNA libraries, 9,057 reference-quality full-length gene insert clones were identified for Atlantic salmon. A further 1,365 reference full-length clones were annotated from 29,221 northern pike (Esox lucius) ESTs. Pairwise dN/dS comparisons within each of 408 sets of duplicated salmon genes, using northern pike as a diploid outgroup, show asymmetric relaxation of selection on salmon duplicates. Conclusions 9,057 full-length reference genes were characterized in S. salar and can be used to identify alleles and gene family members. Comparisons of duplicated genes show that while purifying selection is the predominant force acting on both duplicates, consistent with retention of functionality in both copies, some relaxation of pressure on gene duplicates can be identified. In addition, there is evidence that evolution has acted asymmetrically on paralogs, allowing one of the pair to diverge at a faster rate.

  1. Genome-Wide Detection and Analysis of Multifunctional Genes

    Science.gov (United States)

    Pritykin, Yuri; Ghersi, Dario; Singh, Mona

    2015-01-01

    Many genes can play a role in multiple biological processes or molecular functions. Identifying multifunctional genes at the genome-wide level and studying their properties can shed light upon the complexity of molecular events that underpin cellular functioning, thereby leading to a better understanding of the functional landscape of the cell. However, to date, genome-wide analysis of multifunctional genes (and the proteins they encode) has been limited. Here we introduce a computational approach that uses known functional annotations to extract genes playing a role in at least two distinct biological processes. We leverage functional genomics data sets for three organisms—H. sapiens, D. melanogaster, and S. cerevisiae—and show that, as compared to other annotated genes, genes involved in multiple biological processes possess distinct physicochemical properties, are more broadly expressed, tend to be more central in protein interaction networks, tend to be more evolutionarily conserved, and are more likely to be essential. We also find that multifunctional genes are significantly more likely to be involved in human disorders. These same features also hold when multifunctionality is defined with respect to molecular functions instead of biological processes. Our analysis uncovers key features about multifunctional genes, and is a step towards a better genome-wide understanding of gene multifunctionality. PMID:26436655
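    The abstract does not spell out how "at least two distinct biological processes" is decided. As a rough illustration only, the Python sketch below assumes a toy term hierarchy (the term names and genes are hypothetical) and treats two annotations as distinct when neither is an ancestor of the other; the published method's actual distinctness criterion is likely more elaborate.

```python
from typing import Dict, Optional, Set

# Hypothetical toy hierarchy: term -> parent term (None marks a root).
PARENT: Dict[str, Optional[str]] = {
    "metabolism": None,
    "development": None,
    "glycolysis": "metabolism",
    "lipid_synthesis": "metabolism",
    "axon_guidance": "development",
}

# Hypothetical gene -> annotated biological-process terms.
ANNOTATIONS: Dict[str, Set[str]] = {
    "geneA": {"glycolysis", "metabolism"},       # one process plus its own parent
    "geneB": {"glycolysis", "axon_guidance"},    # two unrelated processes
    "geneC": {"glycolysis", "lipid_synthesis"},  # two sibling processes
}


def ancestors(term: str, parent: Dict[str, Optional[str]]) -> Set[str]:
    """Walk up the hierarchy and collect every ancestor of `term`."""
    found: Set[str] = set()
    current = parent.get(term)
    while current is not None:
        found.add(current)
        current = parent.get(current)
    return found


def most_specific(terms: Set[str], parent: Dict[str, Optional[str]]) -> Set[str]:
    """Drop any term that is an ancestor of another annotated term."""
    redundant: Set[str] = set()
    for term in terms:
        redundant |= ancestors(term, parent)
    return terms - redundant


def multifunctional_genes(
    annotations: Dict[str, Set[str]],
    parent: Dict[str, Optional[str]],
    min_processes: int = 2,
) -> Set[str]:
    """Genes with at least `min_processes` non-nested process annotations."""
    return {
        gene
        for gene, terms in annotations.items()
        if len(most_specific(terms, parent)) >= min_processes
    }


print(sorted(multifunctional_genes(ANNOTATIONS, PARENT)))  # ['geneB', 'geneC']
```

    Under this simple rule geneC counts as multifunctional even though both of its processes fall under the same parent term; a stricter notion of distinctness would likely exclude it.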

  2. Genomic analysis and selected molecular pathways in rare cancers

    International Nuclear Information System (INIS)

    Liu, Stephen V; Lenkiewicz, Elizabeth; Evers, Lisa; Holley, Tara; Kiefer, Jeffrey; Demeure, Michael J; Ramanathan, Ramesh K; Von Hoff, Daniel D; Barrett, Michael T; Ruiz, Christian; Glatz, Katharina; Bubendorf, Lukas; Eng, Cathy

    2012-01-01

    It is widely accepted that many cancers arise as a result of an acquired genomic instability and the subsequent evolution of tumor cells with variable patterns of selected and background aberrations. The presence and behaviors of distinct neoplastic cell populations within a patient's tumor may underlie multiple clinical phenotypes in cancers. A goal of many current cancer genome studies is the identification of recurring selected driver events that can be advanced for the development of personalized therapies. Unfortunately, in the majority of rare tumors, this type of analysis can be particularly challenging. Large series of specimens for analysis are simply not available, allowing recurring patterns to remain hidden. In this paper, we highlight the use of DNA content-based flow sorting to identify and isolate DNA-diploid and DNA-aneuploid populations from tumor biopsies as a strategy to comprehensively study the genomic composition and behaviors of individual cancers in a series of rare solid tumors: intrahepatic cholangiocarcinoma, anal carcinoma, adrenal leiomyosarcoma, and pancreatic neuroendocrine tumors. We propose that the identification of highly selected genomic events in distinct tumor populations within each tumor can identify candidate driver events that can facilitate the development of novel, personalized treatment strategies for patients with cancer. (paper)

  3. A genomic background based method for association analysis in related individuals.

    Directory of Open Access Journals (Sweden)

    Najaf Amin

    Full Text Available BACKGROUND: The feasibility of genotyping hundreds of thousands of single nucleotide polymorphisms (SNPs) in thousands of study subjects has triggered the need for fast, powerful, and reliable methods for genome-wide association analysis. Here we consider the situation in which study participants are genetically related (e.g. due to systematic sampling of families or because a study was performed in a genetically isolated population). Of the available methods that account for relatedness, the Measured Genotype (MG) approach is considered the 'gold standard'. However, MG is not time-efficient for the analysis of genome-wide data. In this context we proposed a fast two-step method called Genome-wide Association using Mixed Model and Regression (GRAMMAR) for the analysis of pedigree-based quantitative traits. This method overcomes the time limitation of the MG approach, but at a cost in power. A major drawback of both MG and GRAMMAR is that they crucially depend on the availability of complete and correct pedigree data, which are rarely available. METHODOLOGY: In this study we first explore the type 1 error and relative power of the MG, GRAMMAR, and Genomic Control (GC) approaches for genetic association analysis. Second, we propose an extension to GRAMMAR, i.e. GRAMMAR-GC. Finally, we propose applying GRAMMAR-GC using the kinship matrix estimated from genomic marker data instead of (possibly missing and/or incorrect) genealogy. CONCLUSION: Through simulations we show that the MG approach maintains high power across a range of heritabilities and possible pedigree structures, and always outperforms other contemporary methods. We also show that the power of our proposed GRAMMAR-GC approaches that of the 'gold standard' MG for all models and pedigrees studied. 
We show that this method is both feasible and powerful and has correct type 1 error in the context of genome-wide association analysis
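    For readers unfamiliar with the Genomic Control (GC) approach mentioned above, a minimal Python sketch of the standard correction follows: the inflation factor lambda is estimated as the median of the observed 1-df association chi-square statistics divided by the median of a 1-df chi-square distribution (about 0.455), and every statistic is then divided by lambda. This is plain GC applied to a list of statistics, not the GRAMMAR-GC procedure itself; the function names are illustrative, and clamping lambda at 1 is a common convention assumed here.

```python
import statistics

# Median of a chi-square distribution with 1 df: the square of the
# standard normal 75th percentile (about 0.4549).
CHI2_1DF_MEDIAN = statistics.NormalDist().inv_cdf(0.75) ** 2


def inflation_factor(chi2_stats):
    """Genomic-control lambda: observed median over the expected null median."""
    return statistics.median(chi2_stats) / CHI2_1DF_MEDIAN


def genomic_control(chi2_stats):
    """Divide each 1-df chi-square statistic by lambda (clamped at 1)."""
    lam = max(1.0, inflation_factor(chi2_stats))
    return [s / lam for s in chi2_stats], lam


# Illustrative statistics whose median is 1.5 times the null median.
observed = [0.2, 1.5 * CHI2_1DF_MEDIAN, 3.0]
corrected, lam = genomic_control(observed)
print(round(lam, 3))  # 1.5
```

    Because relatedness among study participants tends to inflate association statistics genome-wide, a lambda noticeably above 1 is one symptom of the problem these methods address.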

  4. Analysis of the Legionella longbeachae genome and transcriptome uncovers unique strategies to cause Legionnaires' disease.

    Directory of Open Access Journals (Sweden)

    Christel Cazalet

    2010-02-01

    Full Text Available Legionella pneumophila and L. longbeachae are two species of a large genus of bacteria that are ubiquitous in nature. L. pneumophila is mainly found in natural and artificial water circuits, while L. longbeachae is mainly present in soil. Under appropriate conditions, both species are human pathogens capable of causing a severe form of pneumonia termed Legionnaires' disease. Here we report the sequencing and analysis of four L. longbeachae genomes: one complete genome sequence of L. longbeachae strain NSW150 serogroup (Sg) 1, and three draft genome sequences, one belonging to Sg1 and two to Sg2. The genome organization and gene content of the four L. longbeachae genomes are highly conserved, indicating strong pressure for niche adaptation. Analysis and comparison of L. longbeachae strain NSW150 with L. pneumophila revealed common but also unexpected features specific to this pathogen. Its interaction with host cells differs from that of L. pneumophila, as L. longbeachae possesses a unique repertoire of putative Dot/Icm type IV secretion system substrates and of eukaryotic-like and eukaryotic-domain proteins, and encodes additional secretion systems. However, analysis of the ability of a dotA mutant of L. longbeachae NSW150 to replicate in Acanthamoeba castellanii and in a mouse lung infection model showed that the Dot/Icm type IV secretion system is also essential for the virulence of L. longbeachae. In contrast to L. pneumophila, L. longbeachae does not encode flagella, which may explain the differences in mouse susceptibility to infection between the two pathogens. Furthermore, transcriptome analysis revealed that L. longbeachae has a less pronounced biphasic life cycle than L. pneumophila, and genome analysis and electron microscopy suggested that L. longbeachae is encapsulated. These species-specific differences may account for the different environmental niches and disease epidemiology of these

  5. An analysis of expressed sequence tags of developing castor endosperm using a full-length cDNA library

    Directory of Open Access Journals (Sweden)

    Wallis James G

    2007-07-01

    Full Text Available Abstract Background Castor seeds are a major source of ricinoleate, an important industrial raw material. Genomics studies of the castor plant will provide critical information for understanding seed metabolism, for effectively engineering ricinoleate production in transgenic oilseeds, and for genetically improving castor plants by eliminating toxic and allergenic proteins in seeds. Results Full-length cDNAs are useful resources for annotating genes and for the functional analysis of genes and their products. We constructed a full-length cDNA library from developing castor endosperm and obtained 4,720 ESTs from the 5' ends of the cDNA clones, representing 1,908 unique sequences. The most abundant transcripts are genes encoding storage proteins, ricin, agglutinin and oleosins. Several other sequences are also very numerous, including two acidic triacylglycerol lipases and the oleate hydroxylase (FAH12) gene that is responsible for ricinoleate biosynthesis. The role(s) of the lipases in developing castor seeds are not clear, and co-expression of a lipase with FAH12 did not result in significant changes in hydroxy fatty acid accumulation in transgenic Arabidopsis seeds. Only one oleate desaturase (FAD2) gene was identified in our cDNA sequences. Because the castor FAD2 had not been characterized previously, we carried out sequence and functional analyses of it. Overexpression of castor FAD2 in a FAH12-expressing Arabidopsis line resulted in decreased accumulation of hydroxy fatty acids in transgenic seeds. Conclusion Our results suggest that transcriptional regulation of the FAD2 and FAH12 genes may be one of the mechanisms that contribute to the high level of ricinoleate accumulation in castor endosperm. The full-length cDNA library will be used to search for additional genes that affect ricinoleate accumulation in seed oils. Our EST sequences will also be useful for annotating the castor genome, whose whole sequence is being generated by shotgun sequencing at

  6. Genome-association analysis of Korean Holstein milk traits using genomic estimated breeding value

    Directory of Open Access Journals (Sweden)

    Donghyun Shin

    2017-03-01

    Full Text Available Objective Holsteins are known as the world’s highest milk-producing dairy cattle. The purpose of this study was to identify genetic regions strongly associated with milk traits (milk production, fat, and protein) using Korean Holstein data. Methods This study was performed using single nucleotide polymorphism (SNP) chip data (Illumina BovineSNP50 BeadChip) from 911 Korean Holstein individuals. We inferred genomic estimated breeding values based on best linear unbiased prediction (BLUP) and ridge regression using BLUPF90 and R. We then performed a genome-wide association study and identified genetic regions related to milk traits. Results We identified 9, 6, and 17 significant genetic regions related to milk production, fat, and protein, respectively. These genes are newly reported as genetically associated with milk traits in Holsteins. Conclusion This study complements recent Holstein genome-wide association studies that identified other SNPs and genes as the most significant variants. These results will help to expand knowledge of the polygenic nature of milk production in Holsteins.

  7. Integrative Genomic Analysis of Complex traits

    DEFF Research Database (Denmark)

    Ehsani, Ali Reza

    In the last decade, rapid developments in biotechnology have made it possible to extract extensive information about practically all levels of biological organization. An ever-increasing number of studies are reporting multilayered datasets on the entire DNA sequence, transcription, protein...... expression, and metabolite abundance of more and more populations in a multitude of environments. However, a solid model for including all of this complex information in one analysis, to disentangle genetic variation and the underlying genetic architecture of complex traits and diseases, has not yet been...

  8. Genome-wide analysis of EgEVE_1, a transcriptionally active endogenous viral element associated to small RNAs in Eucalyptus genomes

    Directory of Open Access Journals (Sweden)

    Helena Sanches Marcon

    2017-02-01

    Full Text Available Abstract Endogenous viral elements (EVEs) are the result of heritable horizontal gene transfer from viruses to hosts. In recent years, several EVE integration events have been reported in plants, thanks to the exponential growth in the availability of sequenced genomes. Eucalyptus grandis is a forest tree species with a sequenced genome that has been poorly studied in terms of evolution and mobile genetic element composition. Here we report the characterization of E. grandis endogenous viral element 1 (EgEVE_1), a transcriptionally active EVE with a size of 5,664 bp. Phylogenetic analysis and genomic distribution demonstrated that EgEVE_1 is a newly described member of the Caulimoviridae family, distinct from the recently characterized plant Florendoviruses. The genomic distributions of EgEVE_1 and Florendovirus are also distinct. qPCR quantification of EgEVE_1 in Eucalyptus urophylla suggests that this genome has more EgEVE_1 copies than E. grandis. EgEVE_1 transcriptional activity was demonstrated by RT-qPCR in five Eucalyptus species and one intrageneric hybrid. We also found that Eucalyptus EVEs can generate small RNAs (sRNAs) that might be involved in de novo DNA methylation and virus resistance. Our data suggest that EVE families in Eucalyptus have distinct properties, and we provide the first comparative analysis of EVEs in Eucalyptus genomes.

  9. NeisseriaBase: a specialised Neisseria genomic resource and analysis platform

    Directory of Open Access Journals (Sweden)

    Wenning Zheng

    2016-03-01

    Full Text Available Background. The Gram-negative genus Neisseria is associated with two of the most potent human epidemic diseases: meningococcal meningitis and gonorrhoea. In both cases, disease is caused by bacteria colonizing human mucosal membrane surfaces. Overall, the genus shows great diversity and genetic variation, mainly due to its ability to acquire and incorporate genetic material from a diverse range of sources through horizontal gene transfer. Although a number of databases exist for Neisseria genomes, they mostly focus on the pathogenic species. In this study we present NeisseriaBase, a freely available database dedicated to the genus Neisseria, encompassing the complete and draft genomes of 15 pathogenic and commensal Neisseria species. Methods. The genomic data were retrieved from the National Center for Biotechnology Information (NCBI), annotated using the RAST server, and then stored in a MySQL database. The protein-coding genes were further analyzed with in-house Perl scripts to calculate GC content (%), predicted hydrophobicity and molecular weight (Da). The web application was developed following a secure four-tier web application architecture: (1) client workstation, (2) web server, (3) application server, and (4) database server. The web interface was constructed using PHP, JavaScript, jQuery, AJAX and CSS, utilizing the model-view-controller (MVC) framework. The in-house bioinformatics tools implemented in NeisseriaBase were developed using Python, Perl, BioPerl and R. Results. Currently, NeisseriaBase houses 603,500 coding sequences (CDSs), 16,071 RNAs and 13,119 tRNA genes from 227 Neisseria genomes. The database is equipped with interactive web interfaces. Incorporation of the JBrowse genome browser enables fast and smooth browsing of Neisseria genomes. 
NeisseriaBase includes the standard BLAST program to facilitate homology searching, and for Virulence
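    NeisseriaBase computes per-gene GC content, predicted hydrophobicity and molecular weight with in-house Perl scripts that are not described further. As an illustration only, the Python sketch below computes GC content for a nucleotide sequence and a mean hydropathy (GRAVY) score for a protein sequence. The Kyte-Doolittle scale is an assumption, since the abstract does not say which hydrophobicity measure the database uses, and molecular weight is omitted.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gc_content(dna: str) -> float:
    """Percentage of G and C among unambiguous bases (A, C, G, T)."""
    dna = dna.upper()
    acgt = sum(dna.count(base) for base in "ACGT")
    if acgt == 0:
        raise ValueError("sequence has no A/C/G/T bases")
    return 100.0 * (dna.count("G") + dna.count("C")) / acgt


def gravy(protein: str) -> float:
    """Mean Kyte-Doolittle value per standard residue; symbols such as '*' or 'X' are skipped."""
    residues = [aa for aa in protein.upper() if aa in KYTE_DOOLITTLE]
    if not residues:
        raise ValueError("sequence has no standard amino acids")
    return sum(KYTE_DOOLITTLE[aa] for aa in residues) / len(residues)


print(round(gc_content("ATGGCGTAA"), 2))  # 44.44
print(round(gravy("MKV*"), 3))  # 0.733
```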

  10. Online Genome Analysis Resources for Educators, a Comparative Review

    Directory of Open Access Journals (Sweden)

    Sarah Grace Prescott

    2012-08-01

    Full Text Available A comparative review of several companies that offer similar kits or services that allow students to isolate DNA (human and others), amplify it by PCR, and in some cases sequence the resulting sample. The companies include: Carolina® Biological Supply Company, Bio-Rad®, Edvotek® Inc., Hiram Genomics Store, and 23andMe.

  11. Genomic Characterization for Parasitic Weeds of the Genus Striga by Sample Sequence Analysis

    Directory of Open Access Journals (Sweden)

    Matt C. Estep

    2012-03-01

    Full Text Available Generation of ∼2200 Sanger sequence reads or ∼10,000 454 reads for seven Striga Lour. DNA samples (five species) allowed identification of the highly repetitive DNA content in these genomes. The 14 most abundant repeats in these species were identified and partially assembled. Annotation indicated that they represent nine long terminal repeat (LTR) retrotransposon families, three tandem satellite repeats, one long interspersed element (LINE) retroelement, and one DNA transposon. All of these repeats are most closely related to repetitive elements in other closely related plants and are not products of horizontal transfer from their host species. These repeats were differentially abundant in each species, with the LTR retrotransposons and satellite repeats most responsible for variation in genome size. Each species had some repetitive elements that were more abundant and some that were less abundant than in the other species examined, indicating that no single element, and no unilateral trend of genome growth or shrinkage, was responsible for variation in genome size and composition. Genome sizes were determined by flow sorting, and the values of 615 Mb [… (L.) Kuntze], 1330 Mb [… (Willd.) Vatke], 1425 Mb [… (Delile) Benth.] and 2460 Mb [… Benth.] suggest a ploidy series, a prediction supported by repetitive DNA sequence analysis. Phylogenetic analysis using six chloroplast loci indicated the ancestral relationships of the five most agriculturally important species, with the unexpected result that the one parasite of dicotyledonous plants (…) was found to be more closely related to some of the grass parasites than many of the grass parasites are to each other.
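    A quick normalization shows why the reported genome sizes suggest a ploidy series: dividing each value by the smallest genome (615 Mb) gives multiples of roughly 1, 2, 2 and 4. The middle two ratios are not exact integers, so this arithmetic is only suggestive; the abstract rests the ploidy inference on the repetitive DNA analysis as well, not on sizes alone.

$$
\frac{615}{615} = 1.00,\qquad \frac{1330}{615} \approx 2.16,\qquad \frac{1425}{615} \approx 2.32,\qquad \frac{2460}{615} = 4.00
$$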

  12. Quantitative analysis of polycomb response elements (PREs) at identical genomic locations distinguishes contributions of PRE sequence and genomic environment

    Directory of Open Access Journals (Sweden)

    Okulski Helena

    2011-03-01

    Full Text Available Abstract Background Polycomb/Trithorax response elements (PREs) are cis-regulatory elements essential for the regulation of several hundred developmentally important genes. However, the precise sequence requirements for PRE function are not fully understood, and it is also unclear whether these elements all function in a similar manner. Drosophila PRE reporter assays typically rely on random integration by P-element insertion, but PREs are extremely sensitive to genomic position. Results We adapted the ΦC31 site-specific integration tool to enable systematic quantitative comparison of PREs and sequence variants at identical genomic locations. In this adaptation, a miniwhite (mw) reporter in combination with eye-pigment analysis gives a quantitative readout of PRE function. We compared the Hox PRE Frontabdominal-7 (Fab-7) with a PRE from the vestigial (vg) gene at four landing sites. The analysis revealed that the Fab-7 and vg PREs have fundamentally different properties, both in their interaction with the genomic environment at each site and in their inherent silencing abilities. Furthermore, we used the ΦC31 tool to examine the effect of deletions and mutations in the vg PRE, identifying a 106 bp region containing a previously predicted motif (GTGT) that is essential for silencing. Conclusions This analysis showed that different PREs have quantifiably different properties, and that changes in as few as four base pairs have profound effects on PRE function, thus illustrating the power and sensitivity of ΦC31 site-specific integration as a tool for the rapid and quantitative dissection of elements of PRE design.

  13. Comparative Genome Analysis of Basidiomycete Fungi

    Energy Technology Data Exchange (ETDEWEB)

    Riley, Robert; Salamov, Asaf; Morin, Emmanuelle; Nagy, Laszlo; Manning, Gerard; Baker, Scott; Brown, Daren; Henrissat, Bernard; Levasseur, Anthony; Hibbett, David; Martin, Francis; Grigoriev, Igor

    2012-03-19

    Fungi of the phylum Basidiomycota (basidiomycetes) make up some 37% of the described fungi and are important in forestry, agriculture, medicine, and bioenergy. This diverse phylum includes the mushrooms, wood rots, symbionts, and plant and animal pathogens. To better understand the diversity of phenotypes in basidiomycetes, we performed a comparative analysis of 35 basidiomycete fungi spanning the diversity of the phylum. Phylogenetic patterns of lignocellulose-degrading genes suggest a continuum rather than a sharp dichotomy between the white rot and brown rot modes of wood decay. Patterns of secondary metabolic enzymes give additional insight into the broad array of phenotypes found in the basidiomycetes. We suggest that an organism's profile of lignocellulose-targeting genes can be used to predict its nutritional mode, and on this basis we predict that Dacryopinax sp. is a brown rot and that Botryobasidium botryosum and Jaapia argillacea are white rots.

  14. Comparative genome analysis of the high pathogenicity Salmonella Typhimurium strain UK-1.

    Directory of Open Access Journals (Sweden)

    Yingqin Luo

    Full Text Available Salmonella enterica serovar Typhimurium, a gram-negative, facultative, rod-shaped bacterium causing salmonellosis and foodborne disease, is one of the most commonly isolated Salmonella serovars in both developed and developing nations. Several S. Typhimurium genomes have been completed and many more genome-sequencing projects are underway. Comparative genome analysis of multiple strains leads to a better understanding of the evolution of S. Typhimurium and its pathogenesis. S. Typhimurium strain UK-1 (phage type 1) is highly virulent when orally administered to mice and chickens and efficiently colonizes the lymphoid tissues of these species. These characteristics make this strain a good choice for use in vaccine development. In fact, UK-1 has been used as the parent strain for a number of nonrecombinant and recombinant vaccine strains, including several commercial vaccines for poultry. In this study, we conducted a thorough comparative genome analysis of the UK-1 strain with other S. Typhimurium strains and examined the phenotypic impact of several genomic differences. Whole-genome comparison highlights an extremely close relationship between the UK-1 strain and other S. Typhimurium strains; however, many interesting genetic and genomic variations specific to UK-1 were explored. In particular, deletion of a UK-1-specific gene that is highly similar to the gene encoding the T3SS effector protein NleC significantly decreased oral virulence in BALB/c mice. The complete genetic complement of UK-1, especially the elements that contribute to virulence or help determine the diversity within bacterial species, provides key information for the functional characterization of important genetic determinants and for the development of vaccines.

  15. Isolation and complete genome analysis of neurotropic dengue virus serotype 3 from the cerebrospinal fluid of an encephalitis patient.

    Directory of Open Access Journals (Sweden)

    Rama Dhenni

    2018-01-01

    Full Text Available Although neurological manifestations associated with dengue virus (DENV) infection have been reported, there is very limited information on the genetic characteristics of neurotropic DENV. Here we describe the isolation and complete genome analysis of DENV serotype 3 (DENV-3) from the cerebrospinal fluid of a paediatric encephalitis patient in Jakarta, Indonesia. Next-generation sequencing was employed to deduce the complete genome of the neurotropic DENV-3 isolate. Complete genome analysis revealed two unique and nine uncommon amino acid changes in the protein-coding region of the virus. Phylogenetic and molecular clock analyses revealed that the neurotropic virus belongs to the Sumatran-Javan clade of DENV-3 genotype I and shared a common ancestor with other isolates from Jakarta around 1998. This is the first report of a complete genome analysis of neurotropic DENV-3, providing detailed information on the genetic characteristics of this virus.

  16. Identification and Characterization of Microsatellite Markers Derived from the Whole Genome Analysis of Taenia solium.

    Directory of Open Access Journals (Sweden)

    Mónica J Pajuelo

    2015-12-01

    Full Text Available Infections with Taenia solium are the most common cause of adult-acquired seizures worldwide and the leading cause of epilepsy in developing countries. A better understanding of the genetic diversity of T. solium will improve parasite diagnostics and our knowledge of transmission pathways in endemic areas, thereby facilitating the design of future control measures and interventions. Microsatellite markers are useful genome features that enable strain typing and identification in complex pathogen genomes. Here we describe microsatellite identification and characterization in T. solium, providing information that will assist global efforts to control this important pathogen. For genome sequencing, T. solium cysts and proglottids were collected from Huancayo and Puno in Peru, respectively. Using next-generation sequencing (NGS) and de novo assembly, we assembled two draft genomes and one hybrid genome. Microsatellite sequences were identified and 36 of them were selected for further analysis. Twenty T. solium isolates were collected from Tumbes in the northern region, and twenty from Puno in the southern region of Peru. The size polymorphism of the selected microsatellites was determined with multi-capillary electrophoresis. We analyzed the association between microsatellite polymorphism and the geographic origin of the samples. The predicted size of the hybrid T. solium genome (proglottid genome combined with cyst genome) was 111 Mb, with a GC content of 42.54%. A total of 7,979 contigs (>1,000 nt) were obtained. We identified 9,129 microsatellites in the Puno-proglottid genome and 9,936 in the Huancayo-cyst genome, with 5 or more repeats, ranging from mono- to hexanucleotide units. Seven microsatellites were polymorphic and 29 were monomorphic within the analyzed isolates. T. solium tapeworms were classified into two genetic groups that correlated with the north/south geographic origin of the parasites. The availability of draft genomes for T. solium represents a
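
A search for perfect microsatellites using the criteria quoted above (mono- to hexanucleotide units, 5 or more copies) can be sketched with regular expressions. This is a minimal illustration under our own assumptions, not the tool the authors used; the function name `find_microsatellites` and its defaults are hypothetical, and only uninterrupted repeats are reported.

```python
import re


def find_microsatellites(seq, min_repeats=5, max_unit=6):
    """Return sorted (start, unit, copies) tuples for perfect tandem repeats.

    Unit lengths 1..max_unit are scanned independently; a hit is kept only if
    its unit is not itself a repeat of a shorter unit (e.g. 'ATAT' -> 'AT').
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        # A k-mer followed by at least (min_repeats - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            unit = m.group(1)
            periodic = any(
                k % p == 0 and unit == unit[:p] * (k // p) for p in range(1, k)
            )
            if periodic:
                continue  # covered when scanning the shorter unit length
            hits.append((m.start(), unit, len(m.group(0)) // k))
    return sorted(hits)


print(find_microsatellites("TTCAGCAGCAGCAGCAGTT"))  # [(2, 'CAG', 5)]
```

Scanning each unit length separately keeps the regex simple; counting per unit length and per genome would give tallies comparable in kind to the per-genome totals above.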

  17. Marker list - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

  18. QTL list - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

  19. Plant DB link - PGDBj Registered plant list, Marker list, QTL list, Plant DB link & Genome analysis methods | LSDB Archive [Life Science Database Archive metadata]

    Lifescience Database Archive (English)

  20. Benchmarking undedicated cloud computing providers for analysis of genomic datasets.

    Science.gov (United States)

    Yazar, Seyhan; Gooden, George E C; Mackey, David A; Hewitt, Alex W

    2014-01-01

    A major bottleneck in biological discovery is now emerging at the computational level. Cloud computing offers a dynamic means whereby small and medium-sized laboratories can rapidly adjust their computational capacity. We benchmarked two established cloud computing services, Amazon Web Services Elastic MapReduce (EMR) on Amazon EC2 instances and Google Compute Engine (GCE), using publicly available genomic datasets (E. coli CC102 strain and a Han Chinese male genome) and a standard bioinformatic pipeline on a Hadoop-based platform. Wall-clock time for complete assembly differed by 52.9% (95% CI: 27.5-78.2) for E. coli and 53.5% (95% CI: 34.4-72.6) for the human genome, with GCE being more efficient than EMR. The cost of running this experiment on EMR and GCE differed significantly, with EMR being 257.3% (95% CI: 211.5-303.1) and 173.9% (95% CI: 134.6-213.1) more expensive for the E. coli and human assemblies, respectively. Thus, GCE outperformed EMR in terms of both cost and wall-clock time. Our findings confirm that cloud computing is an efficient and potentially cost-effective alternative for the analysis of large genomic datasets. In addition to releasing our cost-effectiveness comparison, we provide ready-to-use scripts for establishing Hadoop instances with Ganglia monitoring on EC2 or GCE.
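
The cost figures above are premiums ("X% more expensive"). Assuming they are expressed relative to the GCE cost (the abstract does not spell out the baseline), a premium converts to a plain cost ratio as 1 + X/100, so 257.3% means EMR cost roughly 3.57 times as much as GCE. The helper names below are illustrative only.

```python
def percent_more(cost_a, cost_b):
    """Percentage by which cost_a exceeds cost_b, relative to cost_b."""
    if cost_b <= 0:
        raise ValueError("baseline cost must be positive")
    return (cost_a - cost_b) / cost_b * 100.0


def cost_ratio_from_premium(premium_percent):
    """Convert an 'X% more expensive' figure into a plain cost ratio."""
    return 1.0 + premium_percent / 100.0


# Reported premiums for EMR over GCE (E. coli and human assemblies).
for label, premium in [("E. coli", 257.3), ("human", 173.9)]:
    print(f"{label}: EMR cost ~{cost_ratio_from_premium(premium):.2f}x GCE")
```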

  1. Data analysis in the post-genome-wide association study era

    Directory of Open Access Journals (Sweden)

    Qiao-Ling Wang

    2016-12-01

    Full Text Available Since the first report of a genome-wide association study (GWAS) on human age-related macular degeneration, GWAS has successfully been used to discover genetic variants for a variety of complex human diseases and/or traits, and thousands of associated loci have been identified. However, the underlying mechanisms for these loci remain largely unknown. To make these GWAS findings more useful, it is necessary to perform in-depth data mining. Data analysis in the post-GWAS era will include the following aspects: fine-mapping of susceptibility regions to identify susceptibility genes and elucidate their biological mechanism of action; joint analysis of susceptibility genes in different diseases; integration of GWAS, transcriptome, and epigenetic data to analyze expression and methylation quantitative trait loci at the whole-genome level and to find single-nucleotide polymorphisms that influence gene expression and DNA methylation; and genome-wide association analysis of disease-related DNA copy number variations. Applying these strategies and methods will strengthen GWAS data, enhance the utility and significance of GWAS in improving our understanding of the genetics of complex diseases or traits, and help translate these findings into clinical applications. Keywords: Genome-wide association study, Data mining, Integrative data analysis, Polymorphism, Copy number variation
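
One of the post-GWAS strategies listed above, testing whether a SNP influences gene expression (an eQTL), reduces in its simplest form to regressing expression on genotype dosage. The sketch below assumes additive 0/1/2 dosage coding and no covariates; real eQTL pipelines also adjust for covariates such as ancestry and correct for multiple testing. The function name is hypothetical.

```python
import math


def eqtl_test(dosages, expression):
    """Simple linear regression expression ~ dosage.

    dosages    -- genotype coded as 0, 1 or 2 copies of the alternative allele
    expression -- expression level of one gene in the same individuals
    Returns (beta, t_statistic); beta is the change in expression per allele.
    """
    n = len(dosages)
    if n != len(expression) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mean_x = sum(dosages) / n
    mean_y = sum(expression) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    if sxx == 0:
        raise ValueError("SNP is monomorphic in this sample")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, expression))
    beta = sxy / sxx
    alpha = mean_y - beta * mean_x
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, expression))
    se = math.sqrt(rss / (n - 2) / sxx)
    if se == 0:
        # Perfect fit: the t statistic is unbounded (or undefined if beta is 0).
        return beta, math.copysign(math.inf, beta) if beta else math.nan
    return beta, beta / se


beta, t = eqtl_test([0, 0, 1, 1, 2, 2], [1.0, 1.2, 2.0, 2.2, 3.0, 3.2])
print(f"beta = {beta:.2f}, t = {t:.1f}")
```

The t statistic would then be compared with a t distribution on n - 2 degrees of freedom and corrected for the number of SNP-gene pairs tested.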

  2. Group sparse canonical correlation analysis for genomic data integration.

    Science.gov (United States)

    Lin, Dongdong; Zhang, Jigang; Li, Jingyao; Calhoun, Vince D; Deng, Hong-Wen; Wang, Yu-Ping

    2013-08-12

    The emergence of high-throughput genomic datasets from different sources and platforms (e.g., gene expression, single nucleotide polymorphisms (SNPs), and copy number variation (CNV)) has greatly enhanced our understanding of the interplay of these genomic factors as well as their influence on complex diseases. It is challenging to explore the relationships between these different types of genomic data sets. In this paper, we focus on a multivariate statistical method, canonical correlation analysis (CCA), for this problem. Conventional CCA does not work effectively if the number of samples is significantly smaller than the number of biomarkers, which is typical of genomic data (e.g., SNPs). Sparse CCA (sCCA) methods were introduced to overcome this difficulty, mostly using penalization with the ℓ1 norm (CCA-l1) or a combination of the ℓ1 and ℓ2 norms (CCA-elastic net). However, they overlook the structural or group effects within genomic data, which often exist and are important (e.g., SNPs spanning a gene interact and work together as a group). We propose a new group sparse CCA method (CCA-sparse group) along with an effective numerical algorithm to study the mutual relationship between two different types of genomic data (i.e., SNP and gene expression). We then extend the model to a more general formulation that can include the existing sCCA models. We apply the model to feature/variable selection from two data sets and compare our group sparse CCA method with existing sCCA methods on both simulations and two real datasets (human gliomas data and NCI60 data). We use a graphical representation of the samples with a pair of canonical variates to demonstrate the discriminating characteristics of the selected features. Pathway analysis is further performed for biological interpretation of those features. The CCA-sparse group method incorporates group effects of features into the correlation analysis while performing individual feature
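
Sparse CCA methods typically update each canonical weight vector by applying a penalty-specific shrinkage step to a product such as X^T Y v and then renormalizing. For a combined ℓ1 plus group-ℓ2 penalty, a standard choice is the sparse-group-lasso proximal operator: soft-threshold each entry, then shrink each group as a whole. The sketch below shows that operator only; it is an assumption that it matches the authors' update, which may, for example, weight groups by their size. All function names are hypothetical.

```python
import math


def soft_threshold(x, lam):
    """Proximal operator of lam * |x| for a scalar x."""
    return math.copysign(max(abs(x) - lam, 0.0), x)


def sparse_group_shrink(z, groups, lam1, lam2):
    """Proximal operator of lam1 * ||u||_1 + lam2 * sum_g ||u_g||_2.

    z      -- list of floats, e.g. X^T Y v in an alternating sCCA update
    groups -- list of index lists that partition range(len(z)),
              e.g. the SNPs that fall within one gene
    """
    u = [soft_threshold(v, lam1) for v in z]  # element-wise (lasso) step
    for g in groups:
        norm = math.sqrt(sum(u[i] ** 2 for i in g))
        # Group step: the whole group is zeroed if its norm is at most lam2.
        scale = max(0.0, 1.0 - lam2 / norm) if norm > 0 else 0.0
        for i in g:
            u[i] *= scale
    return u


def unit_normalize(u):
    """Rescale to unit Euclidean norm (left unchanged if all zeros)."""
    norm = math.sqrt(sum(v * v for v in u))
    return [v / norm for v in u] if norm > 0 else u


u = sparse_group_shrink([3.0, -0.5, 0.2, 0.1], [[0, 1], [2, 3]], lam1=0.3, lam2=0.5)
print(unit_normalize(u))  # second group is zeroed out entirely
```

Setting lam2 = 0 reduces this to element-wise ℓ1 shrinkage, and lam1 = 0 gives pure group shrinkage, which is one way a single formulation can cover both kinds of existing sCCA models.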

  3. Characterization of a Full-Length Endogenous Beta-Retrovirus, EqERV-Beta1, in the Genome of the Horse (Equus caballus)

    Directory of Open Access Journals (Sweden)

    Antoinette C. van der Kuyl

    2011-06-01

    Full Text Available Information on endogenous retroviruses fixed in the horse (Equus caballus) genome is scarce. The recent availability of a draft sequence of the horse genome enables the detection of such integrated viruses by similarity search. Using translated nucleotide fragments from gamma-, beta-, and delta-retroviral genera for initial searches, a full-length beta-retrovirus genome was retrieved from a horse chromosome 5 contig. The provirus, tentatively named EqERV-beta1 (for the first equine endogenous beta-retrovirus), was 10,434 nucleotides (nt) in length with the usual retroviral genome structure of 5’LTR-gag-pro-pol-env-3’LTR. The LTRs were 1,361 nt long and differed by approximately 1% from each other, suggestive of a relatively recent integration. Coding sequences for gag, pro and pol were present in three different reading frames, as is common for beta-retroviruses, and the reading frames were completely open, except that the env gene was interrupted by a single stop codon. No reading frame was apparent downstream of the env gene, suggesting that EqERV-beta1 does not encode a superantigen like mouse mammary tumor virus (MMTV). A second proviral genome of EqERV-beta1, with no stop codon in env, is additionally integrated on chromosome 5 downstream of the first virus. Single EqERV-beta1 LTRs were abundantly present on all chromosomes except chromosome 24. Phylogenetically, EqERV-beta1 most closely resembles an unclassified retroviral sequence from cattle (Bos taurus) and the murine beta-retrovirus MMTV.
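
The inference that ~1% LTR divergence indicates a relatively recent integration rests on a standard argument: the two LTRs of a provirus are identical at integration and then accumulate substitutions independently, so their distance K grows at roughly twice the neutral rate r, giving an age of about T = K / (2r). The sketch below assumes a pre-aligned LTR pair and a user-supplied rate; the rate in the example is an illustrative placeholder, not a value from the study, and the abstract itself gives no age estimate.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences (gaps skipped)."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(a, b) for a, b in zip(seq_a.upper(), seq_b.upper())
             if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance K = -3/4 * ln(1 - 4p/3)."""
    if p >= 0.75:
        raise ValueError("p-distance too large for Jukes-Cantor correction")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def ltr_age_years(ltr5, ltr3, rate_per_site_per_year):
    """Approximate time since integration: T = K / (2 * r)."""
    k = jukes_cantor(p_distance(ltr5, ltr3))
    return k / (2.0 * rate_per_site_per_year)


# Toy 100-nt LTR pair with one difference (1%) and a placeholder rate.
age = ltr_age_years("A" * 100, "A" * 99 + "C", rate_per_site_per_year=2e-9)
print(f"~{age / 1e6:.1f} million years")
```

The printed age scales inversely with the assumed rate, so any real estimate depends on choosing a defensible host-specific neutral rate.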

  4. Whole-Genome Sequencing and Comparative Genome Analysis of Bacillus subtilis Strains Isolated from Non-Salted Fermented Soybean Foods.

    Directory of Open Access Journals (Sweden)

    Mayumi Kamada

    Full Text Available Bacillus subtilis is the main component in the fermentation of soybeans. To investigate the genetics of soybean-fermenting B. subtilis strains and their relationship with the productivity of extracellular poly-γ-glutamic acid (γPGA), we sequenced the whole genomes of eight B. subtilis strains isolated from non-salted fermented soybean foods in Southeast Asia. Assembled nucleotide sequences were compared with those of a natto (fermented soybean food) starter strain, B. subtilis BEST195, and the laboratory standard strain B. subtilis 168, which is incapable of γPGA production. Detected variants were investigated in terms of insertion sequences, biotin synthesis, production of subtilisin NAT, and regulatory genes for γPGA synthesis, which are related to the fermentation process. Comparing genome sequences, we found that the strains that produce γPGA have a deletion in a protein that constitutes the flagellar basal body, and this deletion was not found in the non-producing strains. We further identified diversity in variants of the bio operon, which is responsible for the biotin auxotrophy of the natto starter strains. Phylogenetic analysis using multilocus sequence typing revealed that the B. subtilis strains isolated from the non-salted fermented soybeans were not clustered together, while the natto-fermenting strains were tightly clustered; this analysis also suggested that the strain isolated from "Tua Nao" of Thailand has followed a different evolutionary path from the other strains.
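
Multilocus sequence typing, used above for the phylogenetic comparison, summarizes each strain by the allele numbers it carries at a fixed set of housekeeping loci; the resulting allelic profile maps to a sequence type (ST). The sketch below uses hypothetical locus names and toy allele tables rather than the real B. subtilis scheme, and it omits the tree building itself.

```python
def allele_profile(isolate_seqs, allele_db, loci):
    """Allele number per locus, or None where the sequence is a novel allele."""
    return tuple(allele_db[locus].get(isolate_seqs[locus].upper()) for locus in loci)


def sequence_type(profile, st_table):
    """ST for a complete profile; None if any allele is novel or the
    combination has not been assigned an ST yet."""
    if None in profile:
        return None
    return st_table.get(profile)


def profile_distance(p, q):
    """Number of loci at which two allelic profiles differ."""
    return sum(a != b for a, b in zip(p, q))


# Hypothetical two-locus scheme with toy allele sequences.
LOCI = ["locusA", "locusB"]
ALLELES = {"locusA": {"ACGT": 1, "ACGA": 2}, "locusB": {"TTTT": 1, "TTTA": 2}}
ST_TABLE = {(1, 1): 1, (2, 1): 5}

strain = {"locusA": "ACGA", "locusB": "TTTT"}
print(sequence_type(allele_profile(strain, ALLELES, LOCI), ST_TABLE))  # 5
```

For a phylogeny, the allele sequences of each strain are usually concatenated in locus order and passed to a tree-building program; profile distances give a quicker, coarser view of how tightly strains cluster.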

  5. Genomic insight into the common carp (Cyprinus carpio) genome by sequencing analysis of BAC-end sequences

    Science.gov (United States)

    2011-01-01

    Background Common carp is one of the most important aquaculture teleost fish in the world. Common carp and other closely related Cyprinidae species account for over 30% of aquaculture production in the world. However, common carp genomic resources are still relatively underdeveloped. BAC end sequences (BES) are important resources for genome research on BAC-anchored genetic marker development, linkage map and physical map integration, and whole genome sequence assembly and scaffolding. Result To develop such valuable resources in common carp (Cyprinus carpio), a total of 40,224 BAC clones were sequenced on both ends, generating 65,720 clean BES with an average read length of 647 bp after sequence processing, representing 42,522,168 bp or 2.5% of the common carp genome. The first survey of the common carp genome was conducted with various bioinformatics tools. The common carp genome contains over 17.3% repetitive elements, with a GC content of 36.8% and 518 transposon ORFs. To identify and develop BAC-anchored microsatellite markers, a total of 13,581 microsatellites were detected from 10,355 BES. The coding regions of 7,127 genes were recognized from 9,443 BES on 7,453 BACs, with 1,990 BACs having genes on both ends. To evaluate the similarity to the genome of the closely related zebrafish, BES of common carp were aligned against the zebrafish genome. A total of 39,335 BES of common carp have conserved homologs in the zebrafish genome, demonstrating the high similarity between the zebrafish and common carp genomes and indicating the feasibility of comparative mapping between zebrafish and common carp once a physical map of common carp is available. Conclusion BAC end sequences are great resources for the first genome-wide survey of common carp. The repetitive DNA was estimated to be approximately 28% of the common carp genome, indicating the higher complexity of the genome. Comparative analysis had mapped around 40,000 BES to the zebrafish genome and established over 3,100 microsyntenies, covering over 50% of

  6. Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

    KAUST Repository

    Brenner, Sydney; Kodzius, Rimantas; Tan, Yue Ying; Tay, Alice; Tay, Boon-Hui; Venkatesh, Byrappa

    2012-01-01

    Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for whole
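
The polyadenylation-signal analysis described above amounts to looking for the canonical AATAAA hexamer (or a variant such as ATTAAA) a short distance upstream of the poly(A) site. The sketch below is a minimal version under stated assumptions: a 50-nt search window, a single variant hexamer and a simple poly(A)-tail trimming rule, none of which come from the study.

```python
CANONICAL_PAS = "AATAAA"
VARIANT_PAS = ("ATTAAA",)  # most common non-canonical hexamer (assumed set)


def strip_poly_a(seq, min_tail=10):
    """Trim a trailing run of A's if it is at least min_tail long."""
    seq = seq.upper()
    trimmed = seq.rstrip("A")
    return trimmed if len(seq) - len(trimmed) >= min_tail else seq


def classify_pas(transcript, window=50):
    """'canonical', 'variant' or 'none', based on the last `window` nt before the tail."""
    region = strip_poly_a(transcript)[-window:]
    if CANONICAL_PAS in region:
        return "canonical"
    if any(v in region for v in VARIANT_PAS):
        return "variant"
    return "none"


def canonical_fraction(transcripts):
    """Share of transcripts with a detectable PAS that use the canonical hexamer."""
    calls = [classify_pas(t) for t in transcripts]
    with_pas = [c for c in calls if c != "none"]
    return with_pas.count("canonical") / len(with_pas) if with_pas else 0.0
```

Running `canonical_fraction` separately on coding and noncoding transcripts gives the kind of comparison reported above.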

  7. Sequencing and analysis of full-length cDNAs, 5'-ESTs and 3'-ESTs from a cartilaginous fish, the elephant shark (Callorhinchus milii).

    KAUST Repository

    Brenner, Sydney

    2012-10-08

    Cartilaginous fishes are the most ancient group of living jawed vertebrates (gnathostomes) and are, therefore, an important reference group for understanding the evolution of vertebrates. The elephant shark (Callorhinchus milii), a holocephalan cartilaginous fish, has been identified as a model cartilaginous fish genome because of its compact genome (∼910 Mb) and a genome project has been initiated to obtain its whole genome sequence. In this study, we have generated and sequenced full-length enriched cDNA libraries of the elephant shark using the 'oligo-capping' method and Sanger sequencing. A total of 6,778 full-length protein-coding cDNA and 10,701 full-length noncoding cDNA were sequenced from six tissues (gills, intestine, kidney, liver, spleen, and testis) of the elephant shark. Analysis of their polyadenylation signals showed that polyadenylation usage in elephant shark is similar to that in mammals. Furthermore, both coding and noncoding transcripts of the elephant shark use the same proportion of canonical polyadenylation sites. Besides BLASTX searches, protein-coding transcripts were annotated by Gene Ontology, InterPro domain, and KEGG pathway analyses. By comparing elephant shark genes to bony vertebrate genes, we identified several ancient genes present in elephant shark but differentially lost in tetrapods or teleosts. Only ∼6% of elephant shark noncoding cDNA showed similarity to known noncoding RNAs (ncRNAs). The rest are either highly divergent ncRNAs or novel ncRNAs. In addition to full-length transcripts, 30,375 5'-ESTs and 41,317 3'-ESTs were sequenced and annotated. The clones and transcripts generated in this study are valuable resources for annotating transcription start sites, exon-intron boundaries, and UTRs of genes in the elephant shark genome, and for the functional characterization of protein sequences. These resources will also be useful for annotating genes in other cartilaginous fishes whose genomes have been targeted for

  8. Comparative Genomic Analysis of Holospora spp., Intranuclear Symbionts of Paramecia

    Directory of Open Access Journals (Sweden)

    Sofya K. Garushyants

    2018-04-01

    Full Text Available While most endosymbiotic bacteria are transmitted only vertically, Holospora spp., alphaproteobacteria of the order Rickettsiales, can desert their host and invade a new one. All bacteria of the genus Holospora are intranuclear symbionts of ciliates (Paramecium spp.) with strict species and nuclear specificity. Comparative metabolic reconstruction based on the newly sequenced genome of Holospora curviuscula, a macronuclear symbiont of Paramecium bursaria, and known genomes of other Holospora species shows that even though all Holospora spp. can persist outside the host, they cannot synthesize most of the essential small molecules, such as amino acids, and lack some central energy metabolic pathways, including glycolysis and the citric acid cycle. As their main energy source, Holospora spp. likely rely on nucleotides pirated from the host. Holospora-specific genes absent from other Rickettsiales are possibly involved in the lifestyle switch from the infectious to the reproductive form and in cell invasion.

  9. Genome analysis of the anaerobic thermohalophilic bacterium Halothermothrix orenii.

    Directory of Open Access Journals (Sweden)

    Konstantinos Mavromatis

    Full Text Available Halothermothrix orenii is a strictly anaerobic thermohalophilic bacterium isolated from sediment of a Tunisian salt lake. It belongs to the order Halanaerobiales in the phylum Firmicutes. The complete sequence revealed that the genome consists of one circular chromosome of 2,578,146 bp encoding 2,451 predicted genes. This is the first genome sequence of an organism belonging to the Halanaerobiales. Features of both Gram-positive and Gram-negative bacteria were identified, the most prominent being the presence of both a sporulation mechanism typical of Firmicutes and a characteristic Gram-negative lipopolysaccharide. Protein sequence analyses and metabolic reconstruction reveal a unique combination of strategies for thermophilic and halophilic adaptation. H. orenii can serve as a model organism for the study of the evolution of the Gram-negative phenotype, of adaptation under thermohalophilic conditions, and of the development of biotechnological applications under conditions that require high temperatures and high salt concentrations.

  10. eHive: An Artificial Intelligence workflow system for genomic analysis

    Directory of Open Access Journals (Sweden)

    Gordon Leo

    2010-05-01

    Full Text Available Abstract Background The Ensembl project produces updates to its comparative genomics resources with each of its several releases per year. During each release cycle approximately two weeks are allocated to generate all the genomic alignments and the protein homology predictions. The number of calculations required for this task grows approximately quadratically with the number of species. We currently support 50 species in Ensembl and we expect the number to continue to grow in the future. Results We present eHive, a new fault-tolerant distributed processing system initially designed to support comparative genomic analysis, based on blackboard systems, network distributed autonomous agents, dataflow graphs and block-branch diagrams. In the eHive system a MySQL database serves as the central blackboard and the autonomous agent, a Perl script, queries the system and runs jobs as required. The system allows us to define dataflow and branching rules to suit all our production pipelines. We describe the implementation of three pipelines: (1) pairwise whole genome alignments, (2) multiple whole genome alignments and (3) gene trees with protein homology inference. Finally, we show the efficiency of the system in real case scenarios. Conclusions eHive allows us to produce computationally demanding results in a reliable and efficient way with minimal supervision and high throughput. Further documentation is available at: http://www.ensembl.org/info/docs/eHive/.
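
The blackboard pattern described for eHive can be illustrated with a toy: a shared job table, an agent that atomically claims READY jobs, and a dataflow rule that lets a finished job create new ones. The sketch below substitutes Python and an in-memory SQLite database for eHive's Perl agents and MySQL; the table, column and analysis names are hypothetical and do not reflect eHive's actual schema or API. The seed job fans out one pairwise job per species pair, which also shows the quadratic growth noted above: n species yield n(n-1)/2 pairwise alignments.

```python
import itertools
import sqlite3


def make_blackboard():
    """Create an in-memory job table standing in for the central database."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE job (job_id INTEGER PRIMARY KEY, analysis TEXT NOT NULL, "
        "input_id TEXT NOT NULL, status TEXT NOT NULL DEFAULT 'READY')"
    )
    return db


def add_job(db, analysis, input_id):
    db.execute("INSERT INTO job (analysis, input_id) VALUES (?, ?)", (analysis, input_id))


def claim_job(db):
    """Flip one READY job to CLAIMED in a transaction; None if nothing was claimed.

    A multi-agent version would retry instead of stopping when it loses a race.
    """
    with db:  # commits on exit
        row = db.execute(
            "SELECT job_id, analysis, input_id FROM job "
            "WHERE status = 'READY' ORDER BY job_id LIMIT 1"
        ).fetchone()
        if row is None:
            return None
        # The status guard ensures only one agent can claim a given job.
        cur = db.execute(
            "UPDATE job SET status = 'CLAIMED' WHERE job_id = ? AND status = 'READY'",
            (row[0],),
        )
        return row if cur.rowcount == 1 else None


def run_job(db, analysis, input_id):
    """Toy dataflow rule: a 'seed' job creates one pairwise job per species pair."""
    if analysis == "seed":
        for a, b in itertools.combinations(input_id.split(","), 2):
            add_job(db, "pairwise_align", f"{a}:{b}")
    # A real 'pairwise_align' job would launch an aligner here.


def agent_loop(db):
    """Keep claiming and running jobs until the blackboard has no READY work."""
    done = 0
    while (job := claim_job(db)) is not None:
        job_id, analysis, input_id = job
        run_job(db, analysis, input_id)
        db.execute("UPDATE job SET status = 'DONE' WHERE job_id = ?", (job_id,))
        db.commit()
        done += 1
    return done


db = make_blackboard()
add_job(db, "seed", "human,mouse,rat,dog,cow")
print(agent_loop(db))  # 11: one seed job plus 5*4/2 = 10 pairwise jobs
```

Keeping all state in the job table is what makes this style fault tolerant: an agent that dies leaves its job CLAIMED, and a supervisor can reset such jobs to READY for another agent.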

  11. Assembly and comparative analysis of complete mitochondrial genome sequence of an economic plant Salix suchowensis

    Directory of Open Access Journals (Sweden)

    Ning Ye

    2017-03-01

    Full Text Available Willow is a widely used dioecious woody plant of the Salicaceae family in China. Owing to their high biomass yields, willows are promising bioenergy crops. In this study, we assembled the complete mitochondrial (mt) genome sequence of S. suchowensis, 644,437 bp in length, using Roche-454 GS FLX Titanium sequencing technology. The base composition of the S. suchowensis mt genome is A (27.43%), T (27.59%), C (22.34%), and G (22.64%), giving a GC content consistent with that of other angiosperms. This long circular mt genome encodes 58 unique genes (32 protein-coding genes, 23 tRNA genes and 3 rRNA genes), and 9 of the 32 protein-coding genes contain 17 introns. Phylogenetic analysis of 35 species based on 23 protein-coding genes supports Salix as sister to Populus. Using the detailed phylogenetic information and the identified phylogenetic position, we found that some ribosomal protein genes and succinate dehydrogenase genes were frequently lost during evolution. As S. suchowensis is a native shrub willow species, its mt genome will provide valuable information for genomic breeding and for filling in missing pieces of the evolution of sex determination.
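
The base composition and GC content figures above are simple counts over the assembled sequence; for example, the reported C and G percentages sum to 22.34% + 22.64% = 44.98% GC. A minimal sketch, assuming ambiguous symbols such as N are excluded from the denominator (the abstract does not say how they were handled):

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, T, C and G; ambiguous symbols (e.g. N) are ignored."""
    counts = Counter(seq.upper())
    total = sum(counts[b] for b in "ATCG")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {b: 100.0 * counts[b] / total for b in "ATCG"}


def gc_content(seq):
    """GC content as a percentage of unambiguous bases."""
    comp = base_composition(seq)
    return comp["G"] + comp["C"]


print(base_composition("AATTCCGGN"))  # each base 25.0%, N ignored
```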

  12. Comparative genomic sequence analysis of strawberry and other rosids reveals significant microsynteny

    Directory of Open Access Journals (Sweden)

    Abbott Albert

    2010-06-01

    Full Text Available Abstract Background Fragaria belongs to the Rosaceae, an economically important family that includes a number of important fruit-producing genera such as Malus and Prunus. Using genomic sequences from 50 Fragaria fosmids, we have examined the microsynteny between Fragaria and other plant models. Results In more than half of the strawberry fosmids, we found syntenic regions that are conserved in Populus, Vitis, Medicago and/or Arabidopsis, with Populus containing the greatest number of syntenic regions with Fragaria. The longest syntenic region was between LG VIII of the poplar genome and the strawberry fosmid 72E18, where seven out of twelve predicted genes were collinear. We also observed an unexpectedly high level of conserved synteny between Fragaria (rosid I) and Vitis (basal rosid). One of the strawberry fosmids, 34E24, contained a cluster of R gene analogs (RGAs) with NBS and LRR domains. We detected clusters of RGAs with high sequence similarity to those in 34E24 in all the genomes compared. In the phylogenetic tree we generated, all the NBS-LRR genes grouped together with Arabidopsis CNL-A type NBS-LRR genes. The Fragaria RGA grouped together with those of Vitis and Populus in the phylogenetic tree. Conclusions Our analysis shows considerable microsynteny between Fragaria and other plant genomes such as Populus, Medicago, Vitis and, to a lesser degree, Arabidopsis. We also detected a cluster of NBS-LRR type genes that are conserved in all the genomes compared.
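
Collinearity of the kind reported for fosmid 72E18 (seven of twelve predicted genes in the same order in poplar) can be quantified as the longest chain of homologous gene pairs whose positions increase in both genomes. The sketch below assumes homology anchors have already been found (for example by BLAST) and extracts that chain with a longest-increasing-subsequence scan; it is an illustrative method, not necessarily the one the authors used.

```python
from bisect import bisect_left


def longest_collinear_chain(anchors):
    """Longest chain of (i, j) anchor pairs strictly increasing in both i and j.

    i is the gene order on the strawberry fosmid, j the gene order in the
    other genome. Sorting by i (ties broken by descending j) reduces the
    problem to a longest strictly increasing subsequence on j.
    """
    pts = sorted(anchors, key=lambda p: (p[0], -p[1]))
    tails = []     # tails[k]: smallest j ending a chain of length k + 1
    tail_idx = []  # tail_idx[k]: index into pts of that chain's last anchor
    parent = [-1] * len(pts)
    for n, (_, j) in enumerate(pts):
        k = bisect_left(tails, j)
        if k == len(tails):
            tails.append(j)
            tail_idx.append(n)
        else:
            tails[k] = j
            tail_idx[k] = n
        parent[n] = tail_idx[k - 1] if k > 0 else -1
    chain = []
    n = tail_idx[-1] if tail_idx else -1
    while n != -1:  # walk back through predecessors
        chain.append(pts[n])
        n = parent[n]
    return chain[::-1]


anchors = [(1, 10), (2, 11), (3, 5), (4, 12), (5, 13), (6, 6)]
print(longest_collinear_chain(anchors))  # [(1, 10), (2, 11), (4, 12), (5, 13)]
```

For an inverted (reverse-strand) syntenic block, negate the second coordinate of each anchor before calling the function.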

  13. Genomic Analysis of a Marine Bacterium: Bioinformatics for Comparison, Evaluation, and Interpretation of DNA Sequences

    Directory of Open Access Journals (Sweden)

    Bhagwan N. Rekadwad

    2016-01-01

    Full Text Available A total of five highly related strains of an unidentified marine bacterium were analyzed through their short genome sequences (AM260709–AM260713). The Genome-to-Genome Distance Calculator (GGDC) showed high similarity to Pseudoalteromonas haloplanktis (X67024). The unique Quick Response (QR) codes generated indicated no identity to other microbial species or gene sequences. Chaos Game Representation (CGR) showed where the bases were concentrated. Guanine residues were the most numerous, followed by cytosine. Frequency Chaos Game Representation (FCGR) indicated that CC and GG blocks occur at higher frequency in the sequences of the evaluated marine bacterium strains. The maximum GC content of the marine bacterium strains ranged from 53% to 54%. The use of QR codes, CGR, FCGR, and GC datasets helped in identifying and interpreting short genome sequences from specific isolates. A phylogenetic tree was constructed with the bootstrap test (1000 replicates) using MEGA6 software. Principal Component Analysis (PCA) was carried out using the EMBL-EBI MUSCLE program. Thus, the generated genomic data are of great assistance for hierarchical classification in bacterial systematics, which, combined with phenotypic features, represents a basic procedure in a polyphasic approach to unambiguous taxonomic classification of bacterial isolates.
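
Chaos Game Representation places a sequence in the unit square by starting at the centre and moving halfway towards the corner assigned to each successive base. Frequency CGR (FCGR) counts how many points fall in each cell of a 2^k x 2^k grid, which is equivalent to counting k-mers, so CC and GG blocks correspond to dinucleotide cells at k = 2. The corner assignment below (A bottom-left, C top-left, G top-right, T bottom-right) is one common convention and an assumption here; the function names are hypothetical.

```python
CORNERS = {"A": (0, 0), "C": (0, 1), "G": (1, 1), "T": (1, 0)}


def cgr_points(seq):
    """CGR coordinates: from (0.5, 0.5), move halfway towards each base's corner."""
    x, y = 0.5, 0.5
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue  # skip ambiguous symbols such as N
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2, (y + cy) / 2
        points.append((x, y))
    return points


def fcgr(seq, k):
    """k-mer counts on a 2^k x 2^k grid; grid[row][col], row 0 = bottom (y near 0).

    The last base of a k-mer sets the most significant bit of the cell index
    and earlier bases refine it, which matches the cell where the CGR walk
    lands after reading that k-mer.
    """
    size = 2 ** k
    grid = [[0] * size for _ in range(size)]
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if any(b not in CORNERS for b in kmer):
            continue
        col = sum(CORNERS[b][0] << pos for pos, b in enumerate(kmer))
        row = sum(CORNERS[b][1] << pos for pos, b in enumerate(kmer))
        grid[row][col] += 1
    return grid


for row in reversed(fcgr("CCGGCCGG", 2)):  # print top row first
    print(row)
```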

  14. YersiniaBase: a genomic resource and analysis platform for comparative analysis of Yersinia.

    Science.gov (United States)

    Tan, Shi Yang; Dutta, Avirup; Jakubovics, Nicholas S; Ang, Mia Yang; Siow, Cheuk Chuen; Mutha, Naresh Vr; Heydari, Hamed; Wee, Wei Yee; Wong, Guat Jah; Choo, Siew Woh

    2015-01-16

    Yersinia is a genus of Gram-negative bacteria that includes serious pathogens such as Yersinia pestis, which causes plague, Yersinia pseudotuberculosis and Yersinia enterocolitica. The remaining species are generally considered non-pathogenic to humans, although there is evidence that at least some of them can cause occasional infections using mechanisms distinct from those of the more pathogenic species. With advances in sequencing technologies, many Yersinia genomes have been sequenced. However, there is currently no specialized platform to hold the rapidly growing Yersinia genomic data and to provide analysis tools, particularly for comparative analyses, which are required to provide improved insights into their biology, evolution and pathogenicity. To facilitate ongoing and future research on Yersinia, especially on the species generally considered non-pathogenic, a well-defined repository and analysis platform is needed to hold the Yersinia genomic data and analysis tools for the Yersinia research community. Hence, we have developed YersiniaBase, a robust and user-friendly Yersinia resource and analysis platform for the analysis of Yersinia genomic data. YersiniaBase holds a total of twelve species and 232 genome sequences, the majority of which are Yersinia pestis. To streamline searching genomic data in a large database, we implemented an Asynchronous JavaScript and XML (AJAX)-based real-time search system in YersiniaBase. Besides incorporating existing tools, including the JavaScript-based genome browser (JBrowse) and the Basic Local Alignment Search Tool (BLAST), YersiniaBase also has in-house developed tools: (1) the Pairwise Genome Comparison tool (PGC) for comparing two user-selected genomes; (2) the Pathogenomics Profiling Tool (PathoProT) for comparative pathogenomics analysis of Yersinia genomes; and (3) YersiniaTree for constructing phylogenetic trees of Yersinia. We ran analyses based on the tools and genomic data in YersiniaBase and the

  15. Genomic analysis for managing small and endangered populations: A case study in Tyrol Grey cattle

    Directory of Open Access Journals (Sweden)

    Gábor Mészáros

    2015-05-01

    Full Text Available Analysis of genomic data is increasingly becoming part of the livestock industry. Therefore, the routine collection of genomic information would be an invaluable resource for the management of breeding programs in small, endangered populations. The objectives of this project were to analyse (1) linkage disequilibrium decay and the effective population size; (2) the inbreeding level and effective population size (NeROH) based on runs of homozygosity (ROH); and (3) the prediction of genomic breeding values (GEBV) within and across breeds. In addition, the use of genomic information for breed management is discussed. The study was based on all available genotypes of Tyrol Grey AI bulls. ROHs were derived for minimum lengths of 4 Mb, 8 Mb and 16 Mb, with corresponding mean inbreeding coefficients of 4.0%, 2.9% and 1.6%, respectively. The NeROH was 125 (NeROH>16Mb), 186 (NeROH>8Mb) and 370 (NeROH>4Mb), indicating strict avoidance of close inbreeding in the population. Genomic selection was developed for, and is working well in, large breeds. Contrary to expectations, the accuracies of GEBVs with very small within-breed reference populations were very high, ranging from 0.13 to 0.91 and from 0.12 to 0.63 when EBVs and dEBVs, respectively, were used as pseudo-phenotypes. Subsequent analyses confirmed that the high accuracies were heavily influenced by parent averages. Multi-breed and across-breed reference sets gave inconsistent and lower accuracies. Genomic information may have a crucial role in the management of small breeds. It allows relatedness between individuals and trends in inbreeding to be assessed, so that decisions can be taken accordingly. These decisions would be based on the real genome architecture rather than on conventional pedigree information, which can be missing or incomplete. We strongly suggest the routine genotyping of all individuals belonging to a small breed in order to facilitate the effective management of endangered livestock populations.
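
The ROH-based inbreeding coefficients above follow the usual definition F_ROH = (total length of ROH at or above a minimum length) / (length of the autosomal genome covered by the markers). Raising the minimum length keeps only the longer runs, which reflect more recent common ancestors, so F_ROH can only stay the same or fall as the threshold rises, as in the 4.0%, 2.9% and 1.6% values reported. The sketch assumes non-overlapping ROH segments given in base pairs; the segment lengths and genome length in the example are invented for illustration.

```python
def f_roh(roh_lengths_bp, genome_length_bp, min_length_bp):
    """Fraction of the autosomal genome in ROH of at least min_length_bp.

    Assumes the ROH segments are non-overlapping (one individual).
    """
    if genome_length_bp <= 0:
        raise ValueError("genome length must be positive")
    covered = sum(length for length in roh_lengths_bp if length >= min_length_bp)
    return covered / genome_length_bp


MB = 1_000_000
# Hypothetical individual: three ROH segments on a 2,500 Mb autosomal genome.
segments = [5 * MB, 10 * MB, 20 * MB]
for threshold in (4, 8, 16):
    print(f"F_ROH>{threshold}Mb = {f_roh(segments, 2_500 * MB, threshold * MB):.4f}")
```

Population-level values such as those above would be the mean of this per-animal quantity across all genotyped bulls.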

  16. Comparative analysis of the genomes of two field isolates of the rice blast fungus Magnaporthe oryzae.

    Directory of Open Access Journals (Sweden)

    Minfeng Xue

    Full Text Available Rice blast, caused by Magnaporthe oryzae, is one of the most destructive diseases of rice worldwide. The fungal pathogen is notorious for its ability to overcome host resistance. To better understand its genetic variation in nature, we sequenced the genomes of two field isolates, Y34 and P131. In comparison with the previously sequenced laboratory strain 70-15, both field isolates had a similar genome size but slightly more genes. Sequences from the field isolates were used to improve the genome assembly and gene prediction of 70-15. Although the overall genome structure is similar, a number of gene families that are likely involved in plant-fungal interactions are expanded in the field isolates. Genome-wide analysis of nonsynonymous to synonymous nucleotide substitution rates revealed that many infection-related genes underwent diversifying selection. The field isolates also have hundreds of isolate-specific genes and a number of isolate-specific gene duplication events. Functional characterization of randomly selected isolate-specific genes revealed that they play diverse roles, some of which affect virulence. Furthermore, each genome contains thousands of loci of transposon-like elements, but fewer than 30% of them are conserved among different isolates, suggesting active transposition events in M. oryzae. Approximately 200 genes were disrupted by transposable elements in these three strains. Interestingly, transposon-like elements tend to be associated with isolate-specific or duplicated sequences. Overall, our results indicate that gain or loss of unique genes, DNA duplication, gene family expansion, and frequent translocation of transposon-like elements are important factors in genome variation of the rice blast fungus.
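    The abstract does not name its estimator of nonsynonymous to synonymous substitution rates. One common approach counts differences per nonsynonymous and per synonymous site (as in the Nei-Gojobori method) and applies a Jukes-Cantor correction; the sketch below assumes the site and difference counts have already been computed from a codon alignment, and the counts in the example are hypothetical. A ratio above 1 is conventionally read as a signature of diversifying (positive) selection.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for an observed proportion of differing sites p."""
    if not 0 <= p < 0.75:
        raise ValueError("p must lie in [0, 0.75) for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def dn_ds(n_sites: float, s_sites: float, n_diffs: float, s_diffs: float) -> float:
    """dN/dS from precomputed site and difference counts (Nei-Gojobori style).

    n_sites, s_sites: numbers of nonsynonymous and synonymous sites in the alignment.
    n_diffs, s_diffs: observed nonsynonymous and synonymous differences.
    """
    d_n = jukes_cantor(n_diffs / n_sites)
    d_s = jukes_cantor(s_diffs / s_sites)
    if d_s == 0:
        raise ValueError("no synonymous differences: dN/dS is undefined")
    return d_n / d_s


# Hypothetical counts for one gene compared between two isolates.
print(round(dn_ds(n_sites=300, s_sites=100, n_diffs=30, s_diffs=5), 2))
```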

  17. St2-80: a new FISH marker for St genome and genome analysis in Triticeae.

    Science.gov (United States)

    Wang, Long; Shi, Qinghua; Su, Handong; Wang, Yi; Sha, Lina; Fan, Xing; Kang, Houyang; Zhang, Haiqin; Zhou, Yonghong

    2017-07-01

    The St genome is one of the most fundamental genomes in Triticeae. Repetitive sequences are widely used to distinguish different genomes or species. The primary objectives of this study were to (i) screen for a new sequence that could easily distinguish chromosomes of the St genome from those of other genomes by fluorescence in situ hybridization (FISH) and (ii) investigate the genome constitution of some species for which it remains uncertain and controversial. We used degenerate oligonucleotide-primed PCR (DOP-PCR), dot-blot, and FISH to screen for a new marker of the St genome and to test the efficiency of this marker in detecting St chromosomes at different ploidy levels. Signals produced by the new FISH marker (denoted St2-80) were present along the entire arms of St-genome chromosomes, except in the centromeric region. In contrast, St2-80 signals were present in the terminal regions of chromosomes of the E, H, P, and Y genomes. No signal was detected in the A and B genomes, and only weak signals were detected in the terminal regions of D-genome chromosomes. St2-80 signals were obvious and stable in chromosomes of different genomes, whether diploid or polyploid. Therefore, St2-80 is a potentially useful FISH marker for distinguishing the St genome from other genomes in Triticeae.

  18. Understanding intratumor heterogeneity by combining genome analysis and mathematical modeling.

    Science.gov (United States)

    Niida, Atsushi; Nagayama, Satoshi; Miyano, Satoru; Mimori, Koshi

    2018-04-01

    Cancer is composed of multiple cell populations with different genomes. This phenomenon, called intratumor heterogeneity (ITH), is thought to be a fundamental cause of therapeutic failure. A principled understanding of ITH is therefore a clinically important issue. To achieve this goal, an interdisciplinary approach combining genome analysis and mathematical modeling is essential. For example, we recently performed multiregion sequencing to unveil extensive ITH in colorectal cancer. Moreover, by employing mathematical modeling of cancer evolution, we demonstrated that this ITH could be generated by neutral evolution. In this review, we introduce recent advances in research on ITH and discuss strategies for exploiting novel findings on ITH in a clinical setting. © 2018 The Authors. Cancer Science published by John Wiley & Sons Australia, Ltd on behalf of Japanese Cancer Association.

  19. Genome-Centric Analysis of a Thermophilic and Cellulolytic Bacterial Consortium Derived from Composting

    Science.gov (United States)

    Lemos, Leandro N.; Pereira, Roberta V.; Quaggio, Ronaldo B.; Martins, Layla F.; Moura, Livia M. S.; da Silva, Amanda R.; Antunes, Luciana P.; da Silva, Aline M.; Setubal, João C.

    2017-01-01

    Microbial consortia selected from complex lignocellulolytic microbial communities are promising alternatives for deconstructing plant waste, since synergistic action of different enzymes is required for full degradation of plant biomass in biorefining applications. Culture enrichment also facilitates the study of interactions among consortium members and can be a good source of novel microbial species. Here, we used a sample from a plant waste composting operation in the São Paulo Zoo (Brazil) as inoculum to obtain a thermophilic aerobic consortium enriched through multiple passages at 60°C with carboxymethylcellulose as the sole carbon source. The microbial community composition of this consortium was investigated by shotgun metagenomics and genome-centric analysis. Six near-complete (over 90% complete) genomes were reconstructed. Similarity and phylogenetic analyses show that four of these six genomes are novel, with the following hypothesized identifications: a new Thermobacillus species; the first Bacillus thermozeamaize genome (for which currently only 16S sequences are available) or else the first representative of a new family in the order Bacillales; the first representative of a new genus in the family Paenibacillaceae; and the first representative of a new deep-branching family in the class Clostridia. The reconstructed genomes from known species were identified as Geobacillus thermoglucosidasius and Caldibacillus debilis. COG and CAZy analyses of the metabolic potential of these recovered genomes show that they encode several glycoside hydrolases (GHs) as well as other genes related to lignocellulose breakdown. The new Thermobacillus species stands out as the richest in diversity and abundance of GHs, possessing the greatest potential for biomass degradation among the six recovered genomes. We also investigated the presence and activity of the organisms corresponding to these genomes in the composting operation from which the consortium was built.

  20. An overview of the Phalaenopsis orchid genome through BAC end sequence analysis

    Directory of Open Access Journals (Sweden)

    Hsiao Yu-Yun

    2011-01-01

    Full Text Available Abstract Background Phalaenopsis orchids are popular floral crops, and development of new cultivars is economically important to floricultural industries worldwide. Analysis of orchid genes could facilitate orchid improvement. Bacterial artificial chromosome (BAC) end sequences (BESs) can provide the first glimpses into the sequence composition of a novel genome and can yield molecular markers for use in genetic mapping and breeding. Results We used two BAC libraries of Phalaenopsis equestris (constructed using the BamHI and HindIII restriction enzymes) to generate paired-end sequences from 2,920 BAC clones (71.4% and 28.6% from the BamHI and HindIII libraries, respectively), at a success rate of 95.7%. A total of 5,535 BESs were generated, representing 4.5 Mb, or about 0.3% of the Phalaenopsis genome. The trimmed sequences ranged from 123 to 1,397 base pairs (bp) in size, with an average edited read length of 821 bp. When these BESs were subjected to sequence homology searches, 641 (11.6%) were predicted to represent protein-encoding regions, whereas 1,272 (23.0%) contained repetitive DNA. Most of the repetitive DNA sequences were gypsy- and copia-like retrotransposons (41.9% and 12.8%, respectively), whereas only 10.8% were DNA transposons. Further, 950 potential simple sequence repeats (SSRs) were discovered. Dinucleotides were the most abundant repeat motifs; AT/TA dimer repeats were the most frequent SSRs, representing 253 (26.6%) of all identified SSRs. Microsynteny analysis revealed that more BESs mapped to the whole-genome sequence of poplar than to those of grape or Arabidopsis, and even fewer mapped to the rice genome. This work will facilitate analysis of the Phalaenopsis genome and will help clarify similarities and differences in genome composition between orchids and other plant species. 
Conclusion Using BES analysis, we obtained an overview of the Phalaenopsis genome in terms of gene abundance, the presence of repetitive

  1. Whole genome analysis of Leptospira licerasiae provides insight into leptospiral evolution and pathogenicity.

    Directory of Open Access Journals (Sweden)

    Jessica N Ricaldi

    Full Text Available The whole-genome analysis of two strains of the first intermediately pathogenic leptospiral species to be sequenced (Leptospira licerasiae strains VAR010 and MMD0835) provides insight into their pathogenic potential and deepens our understanding of leptospiral evolution. Comparative analysis of eight leptospiral genomes shows the existence of a core leptospiral genome comprising 1547 genes and 452 conserved genes restricted to infectious species (including L. licerasiae) that are likely to be pathogenicity-related. Comparisons of the functional content of the genomes suggest that L. licerasiae retains several proteins related to nitrogen, amino acid and carbohydrate metabolism, which might help to explain why these Leptospira grow well in artificial media compared with pathogenic species. L. licerasiae strains VAR010(T) and MMD0835 possess two prophage elements. While one element is circular and shares homology with LE1 of L. biflexa, the second is cryptic and homologous to a previously identified but unnamed region in L. interrogans serovars Copenhageni and Lai. We also report a unique O-antigen locus in L. licerasiae consisting of a 6-gene cluster that is unexpectedly short compared with L. interrogans, in which analogous regions may include >90 such genes. Sequence homology searches suggest that these genes were acquired by lateral gene transfer (LGT). Furthermore, seven putative genomic islands ranging in size from 5 to 36 kb are present, also suggestive of earlier LGT. How Leptospira become naturally competent remains to be determined, but considering the phylogenetic origins of the genes comprising the O-antigen cluster and other putative laterally transferred genes, L. licerasiae must be able to exchange genetic material with non-invasive environmental bacteria. The data presented here demonstrate that L. licerasiae is genetically more closely related to pathogenic than to saprophytic Leptospira and provide insight into the genomic bases for

  2. StreptoBase: An Oral Streptococcus mitis Group Genomic Resource and Analysis Platform.

    Directory of Open Access Journals (Sweden)

    Wenning Zheng

    Full Text Available The oral streptococci are spherical Gram-positive bacteria of the phylum Firmicutes that are among the most common causative agents of bacterial infective endocarditis (IE) and are also important agents of septicaemia in neutropenic patients. The Streptococcus mitis group comprises 13 species, including some of the most common human oral colonizers such as S. mitis, S. oralis, S. sanguinis and S. gordonii, as well as species such as S. tigurinus, S. oligofermentans and S. australis that have only recently been classified and are poorly understood at present. We present StreptoBase, a specialized free resource focusing on genomic analyses of oral species from the mitis group. It currently hosts 104 S. mitis group genomes, including 27 novel mitis group strains that we sequenced using the high-throughput Illumina HiSeq platform, and provides a comprehensive set of genome sequences for analyses, particularly comparative analyses and visualization of both cross-species and cross-strain characteristics of S. mitis group bacteria. StreptoBase incorporates sophisticated in-house bioinformatics web tools such as the Pairwise Genome Comparison (PGC) tool and the Pathogenomic Profiling Tool (PathoProT), which facilitate comparative pathogenomics analysis of Streptococcus strains. Examples are provided to demonstrate how StreptoBase can be employed to compare the genome structure of different S. mitis group bacteria and putative virulence gene profiles across multiple streptococcal strains. In conclusion, StreptoBase offers access to a range of streptococcal genomic resources as well as analysis tools and will be an invaluable platform to accelerate research in streptococci. Database URL: http://streptococcus.um.edu.my.

  3. General metabolism of Laribacter hongkongensis: a genome-wide analysis

    Directory of Open Access Journals (Sweden)

    Curreem Shirly O

    2011-04-01

    Full Text Available Abstract Background Laribacter hongkongensis is associated with community-acquired gastroenteritis and traveler's diarrhea. In this study, we performed an in-depth annotation of the genes and pathways of the general metabolism of L. hongkongensis and correlated them with its phenotypic characteristics. Results The L. hongkongensis genome possesses the pentose phosphate and gluconeogenesis pathways and the tricarboxylic acid and glyoxylate cycles, but incomplete Embden-Meyerhof-Parnas and Entner-Doudoroff pathways, in agreement with its asaccharolytic phenotype. It contains enzymes for the biosynthesis and β-oxidation of saturated fatty acids and for the biosynthesis of all 20 universal amino acids and selenocysteine, the latter not observed in Neisseria gonorrhoeae, Neisseria meningitidis and Chromobacterium violaceum. The genome contains a variety of dehydrogenases, enabling it to utilize different substrates as electron donors. It encodes three terminal cytochrome oxidases for respiration using oxygen as the electron acceptor under aerobic and microaerophilic conditions and four reductases for respiration with alternative electron acceptors under anaerobic conditions. The presence of a complete tetrathionate reductase operon may confer a survival advantage in the mammalian host in association with diarrhea. The genome contains CDSs for incorporating sulfur and nitrogen by sulfate assimilation, ammonia assimilation and nitrate reduction. The existence of both the glutamate dehydrogenase and the glutamine synthetase/glutamate synthase pathways suggests that ammonia metabolism is important in the environments it may encounter. Conclusions The L. hongkongensis genome possesses a variety of genes and pathways for carbohydrate, amino acid and lipid metabolism, respiratory chain and sulfur and nitrogen metabolism. These allow the bacterium to utilize various substrates for energy production and to survive in different environmental niches.

  4. In vitro analysis of integrated global high-resolution DNA methylation profiling with genomic imbalance and gene expression in osteosarcoma.

    Directory of Open Access Journals (Sweden)

    Bekim Sadikovic

    Full Text Available Genetic and epigenetic changes contribute to deregulation of gene expression and development of human cancer. Changes in DNA methylation are key epigenetic factors regulating gene expression and genomic stability. Recent progress in microarray technologies has led to the development of high-resolution platforms for profiling genetic, epigenetic and gene expression changes. Osteosarcoma (OS) is a pediatric bone tumor with a characteristically high level of numerical and structural chromosomal changes, yet little is known about DNA methylation changes in OS. Our objective was to develop an integrative approach for analysis of high-resolution epigenomic, genomic, and gene expression profiles in order to identify functional epi/genomic differences between OS cell lines and normal human osteoblasts. A combination of Affymetrix Promoter Tiling Arrays for DNA methylation, the Agilent array-CGH platform for genomic imbalance and the Affymetrix Gene 1.0 platform for gene expression analysis was used. As a result, an integrative high-resolution approach for interrogating genome-wide tumour-specific changes in DNA methylation was developed. This approach was used to provide the first genomic DNA methylation maps and to identify and validate genes with aberrant DNA methylation in OS cell lines. This first integrative analysis of global cancer-related changes in DNA methylation, genomic imbalance, and gene expression has provided comprehensive evidence of the cumulative roles of epigenetic and genetic mechanisms in the deregulation of gene expression networks.

  5. Characterization of near full-length genomes of HIV type 1 strains in Denmark: Basis for a universal therapeutic vaccine

    DEFF Research Database (Denmark)

    Andresen, Betina S.; Vinner, Lasse; Tang, Sheila Tuyet

    2007-01-01

    We report here the near full-length sequence characterization of 17 Danish clinical HIV-1 strains isolated from HLA-A02 patients not in need of ART, with relatively low viral loads and normal CD4 cell counts. Sequencing was performed directly on DNA extracted from short-term cocultures of PBMCs...... of a universal immunotherapeutic vaccine construct based on these epitopes....

  6. Comparative Genomic Analysis of Meningitis- and Bacteremia-Causing Pneumococci Identifies a Common Core Genome

    Science.gov (United States)

    Cornick, Jennifer E.; Chaguza, Chrispin; Yalcin, Feyruz; Harris, Simon R.; Gray, Katherine J.; Kiran, Anmol M.; Molyneux, Elizabeth; French, Neil; Faragher, Brian E.; Everett, Dean B.; Bentley, Stephen D.

    2015-01-01

    Streptococcus pneumoniae is a nasopharyngeal commensal that occasionally invades normally sterile sites to cause bloodstream infection and meningitis. Although the pneumococcal population structure and evolutionary genetics are well defined, it is not clear whether pneumococci that cause meningitis are genetically distinct from those that do not. Here, we used whole-genome sequencing of 140 isolates of S. pneumoniae recovered from bloodstream infection (n = 70) and meningitis (n = 70) to compare their genetic contents. By fitting a double-exponential decaying-function model, we show that these isolates share a core of 1,427 genes (95% confidence interval [CI], 1,425 to 1,435 genes) and that there is no difference in core genome or accessory gene content between these disease manifestations. Gene presence/absence alone therefore does not explain the virulence behavior of pneumococci that reach the meninges. Our analysis does, however, support the requirement of a range of previously described virulence factors and vaccine candidates for both meningitis- and bacteremia-causing pneumococci. This high-resolution view suggests that, despite considerable competence for genetic exchange, all pneumococci are under strong pressure to retain key components advantageous for colonization and transmission and that these components are essential for access to and survival in sterile sites. PMID:26259813
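    The core-genome estimate rests on a curve of how many genes are shared by all of the first n isolates as isolates are added in random order; the authors fitted a double-exponential decaying function to such data. A minimal sketch of computing that curve from gene presence/absence sets follows; the model-fitting step is omitted, and the isolate and gene names are hypothetical.

```python
import random
from typing import Dict, List, Set


def core_curve(genomes: Dict[str, Set[str]], permutations: int = 100, seed: int = 0) -> List[float]:
    """Mean number of genes shared by the first n genomes, for n = 1..len(genomes).

    Genomes are added in a random order in each permutation; entry n-1 of the
    result is the core size after n genomes, averaged over permutations.
    """
    rng = random.Random(seed)
    names = list(genomes)
    totals = [0] * len(names)
    for _ in range(permutations):
        rng.shuffle(names)
        core = set(genomes[names[0]])
        for i, name in enumerate(names):
            core &= genomes[name]  # keep only genes present in every genome so far
            totals[i] += len(core)
    return [total / permutations for total in totals]


# Hypothetical presence/absence data for four isolates.
isolates = {
    "iso1": {"gA", "gB", "gC", "gD"},
    "iso2": {"gA", "gB", "gC"},
    "iso3": {"gA", "gB", "gD"},
    "iso4": {"gA", "gB", "gC", "gE"},
}
print(core_curve(isolates))
```

    The curve is non-increasing by construction; a decaying function fitted to points like these levels off at an asymptote that serves as the core-genome size estimate.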

  7. Genome-wide analysis of Tol2 transposon reintegration in zebrafish

    Directory of Open Access Journals (Sweden)

    Parinov Sergey

    2009-09-01

    Full Text Available Abstract Background Tol2, a member of the hAT family of transposons, has become a useful tool for genetic manipulation of model animals, but information about its interactions with vertebrate genomes is still limited. Furthermore, published reports on Tol2 have mainly been based on random integration of the transposon system after co-injection of plasmid DNA harboring the transposon and a transposase mRNA. It is important to understand how Tol2 behaves upon activation after integration into the genome. Results We performed a large-scale enhancer trap (ET) screen and generated 338 insertions of the Tol2 transposon-based ET cassette into the zebrafish genome. These insertions were generated by remobilizing the transposon from two different donor sites in two transgenic lines. We found that 39% of Tol2 insertions occurred in transcription units, mostly in introns. Analysis of the transposon target sites revealed no strict specificity at the DNA sequence level. However, Tol2 was prone to target AT-rich regions with weak palindromic consensus sequences centered at the insertion site. Conclusion Our systematic analysis of sequential remobilizations of the Tol2 transposon from two independent sites within a vertebrate genome has revealed properties such as a tendency to integrate into transcription units and into AT-rich palindrome-like sequences. This information will influence the development of various applications involving DNA transposons and Tol2 in particular.
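    To illustrate the two target-site properties reported, AT richness and weak palindromic symmetry, the sketch below scores a sequence window by its AT fraction and by the fraction of positions at which it matches its own reverse complement. This scoring scheme and the example windows are assumptions for illustration, not the method used in the study.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an uppercase or lowercase DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def at_fraction(seq: str) -> float:
    """Proportion of A and T bases in the window."""
    seq = seq.upper()
    return (seq.count("A") + seq.count("T")) / len(seq)


def palindrome_score(seq: str) -> float:
    """Fraction of positions where seq agrees with its own reverse complement.

    1.0 means a perfect palindrome (e.g. GAATTC); weak palindromes score in between.
    """
    seq = seq.upper()
    rc = reverse_complement(seq)
    return sum(a == b for a, b in zip(seq, rc)) / len(seq)


# Hypothetical windows centred on insertion sites.
for window in ("TATTTAAATA", "GCGCATGCCG"):
    print(window, round(at_fraction(window), 2), round(palindrome_score(window), 2))
```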

  8. In Silico Genome Comparison and Distribution Analysis of Simple Sequences Repeats in Cassava

    Directory of Open Access Journals (Sweden)

    Andrea Vásquez

    2014-01-01

    Full Text Available We conducted an SSR density analysis in different cassava genomic regions. The information obtained was used to compare the genomic distribution of SSRs in cassava with those in poplar, flax, and Jatropha. In general, cassava has a low SSR density (~50 SSRs/Mbp) and a high proportion of pentanucleotides (24.2 SSRs/Mbp). Coding sequences have 15.5 SSRs/Mbp, introns 82.3 SSRs/Mbp, 5′ UTRs 196.1 SSRs/Mbp, and 3′ UTRs 50.5 SSRs/Mbp. Motif analysis of cassava genomic SSRs showed that the most abundant motif was AT/AT, whereas in introns and UTRs it was AG/CT. In coding sequences, the motif AAG/CTT occurred most frequently; in fact, AAG is the third most used codon in cassava. Sequences containing SSRs were classified according to their Gene Ontology functional annotation. The SSRs identified here may be a valuable addition for genetic mapping and for future studies in phylogenetic analysis and genomic evolution.
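    SSR densities such as those above are counts of detected repeats divided by sequence length in megabase pairs. The abstract does not give its detection criteria, so the sketch below assumes perfect repeats and MISA-like minimum copy numbers (10 for mononucleotides, 6 for dinucleotides, 5 for longer motifs); it does not merge complementary motifs (e.g. AG with CT) or handle compound repeats, and the example sequence is hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Assumed MISA-like thresholds: minimum tandem copies per motif length.
MIN_REPEATS: Dict[int, int] = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """False if the motif is itself a repeat of a shorter unit (e.g. 'AA', 'ATAT')."""
    return motif not in (motif + motif)[1:-1]


def find_ssrs(seq: str, min_repeats: Dict[int, int] = MIN_REPEATS) -> List[Tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k, n in min_repeats.items():
        # A k-mer followed by at least n-1 further copies of itself (backreference \1).
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, n - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


def ssr_density(seq: str) -> float:
    """Number of SSRs per megabase pair of sequence."""
    return len(find_ssrs(seq)) / (len(seq) / 1_000_000)


# Hypothetical sequence: an (AT)6 dinucleotide repeat and an A10 mononucleotide run.
seq = "GGC" + "AT" * 6 + "GGC" + "A" * 10 + "GCG"
print(find_ssrs(seq))
print(round(ssr_density(seq), 1))
```

    Skipping non-primitive motifs keeps a run such as (AT)6 from also being reported as (ATAT)3 or a homopolymer from being reported as (AA)n.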

  9. [Phylogenetic analysis of genomes of Vibrio cholerae strains isolated on the territory of Rostov region].

    Science.gov (United States)

    Kuleshov, K V; Markelov, M L; Dedkov, V G; Vodop'ianov, A S; Kermanov, A V; Pisanov, R V; Kruglikov, V D; Mazrukho, A B; Maleev, V V; Shipulin, G A

    2013-01-01

    The aim was to determine the origin of two Vibrio cholerae strains isolated in the Rostov region using full-genome sequencing data. The toxigenic strain 2011EL-301 (V. cholerae O1 El Tor Inaba No. 301; ctxAB+, tcpA+) and the nontoxigenic strain V. cholerae O1 Ogawa P-18785 (ctxAB-, tcpA+) were studied. Sequencing was carried out on the MiSeq platform. Phylogenetic analysis of the resulting genomes was based on comparison of the conserved portion of the studied genomes and 54 previously sequenced genomes. The 2011EL-301 genome was represented by 164 contigs with an average coverage of 100 and an N50 of 132 kb; the P-18785 genome was represented by 159 contigs with a coverage of 69 and an N50 of 83 kb. The contigs obtained for strain 2011EL-301 were deposited in the DDBJ/EMBL/GenBank databases under accession number AJFN02000000, and those for strain P-18785 under ANHS00000000. In total, 716 protein-coding orthologous genes were detected. Phylogenetic analysis placed strain P-18785 in the PG-1 subgroup (a group of predecessor strains of the 7th pandemic). Strain 2011EL-301 belongs to the group of 7th-pandemic strains and falls in the cluster of later isolates associated with cases of cholera in South Africa and cases of cholera imported to the USA from Pakistan. The data obtained make it possible to establish phylogenetic connections with V. cholerae strains isolated earlier.
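    N50, reported above for both assemblies, is the contig length L such that contigs of length L or longer contain at least half of the assembled bases. A minimal sketch using hypothetical contig lengths (not the study's data):

```python
from typing import Iterable


def n50(contig_lengths: Iterable[int]) -> int:
    """Length of the contig at which the running total, summed from the longest
    contig downwards, first reaches half of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("N50 is undefined for an empty assembly")


# Hypothetical contig lengths in kb.
print(n50([100, 80, 50, 30, 20]))
```

    Here the total is 280 kb, so the running sum must reach 140 kb; it does so at the 80 kb contig.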

  10. Neandertal admixture in Eurasia confirmed by maximum-likelihood analysis of three genomes.

    Science.gov (United States)

    Lohse, Konrad; Frantz, Laurent A F

    2014-04-01

    Although there has been much interest in estimating histories of divergence and admixture from genomic data, it has proved difficult to distinguish recent admixture from long-term structure in the ancestral population. Thus, recent genome-wide analyses based on summary statistics have sparked controversy about the possibility of interbreeding between Neandertals and modern humans in Eurasia. Here we derive the probability of full mutational configurations in nonrecombining sequence blocks under both admixture and ancestral structure scenarios. Dividing the genome into short blocks gives an efficient way to compute maximum-likelihood estimates of parameters. We apply this likelihood scheme to triplets of human and Neandertal genomes and compare the relative support for a model of admixture from Neandertals into Eurasian populations after their expansion out of Africa against a history of persistent structure in their common ancestral population in Africa. Our analysis allows us to conclusively reject a model of ancestral structure in Africa and instead reveals strong support for Neandertal admixture in Eurasia at a higher rate (3.4-7.3%) than suggested previously. Using analysis and simulations we show that our inference is more powerful than previous summary statistics and robust to realistic levels of recombination.

  11. Comparative analysis of mitochondrial genomes of five aphid species (Hemiptera: Aphididae) and phylogenetic implications.

    Directory of Open Access Journals (Sweden)

    Yuan Wang

    Full Text Available Insect mitochondrial genomes (mitogenomes) are of great interest in exploring molecular evolution, phylogenetics and population genetics. Only two mitogenomes had previously been released for the insect group Aphididae, which consists of about 5,000 known species, including some agricultural, forestry and horticultural pests. Here we report the complete 16,317 bp mitogenome of Cavariella salicicola and two nearly complete mitogenomes of Aphis glycines and Pterocomma pilosum. We also present the first comparative analysis of aphid mitochondrial genomes. The results showed that aphid mitogenomes share conserved genomic organization, nucleotide and amino acid composition, and codon usage features. All 37 genes usually present in animal mitogenomes were sequenced and annotated. Analysis of gene evolutionary rates revealed the lowest and highest rates for COI and ATP8, respectively. A repeat region found exclusively in aphid mitogenomes, containing variable numbers of tandem repeats in a lineage-specific manner, was highlighted for the first time. This region may function as an additional origin of replication. Phylogenetic reconstructions based on protein-coding genes and the stem-loop structures of control regions confirmed a sister relationship between Cavariella and pterocommatines. Current evidence suggests that pterocommatines could be formally transferred into Macrosiphini. Our paper also offers methodological instructions for obtaining other Aphididae mitochondrial genomes.

  12. Comparative Genomic Analysis of Mannheimia haemolytica from Bovine Sources.

    Science.gov (United States)

    Klima, Cassidy L; Cook, Shaun R; Zaheer, Rahat; Laing, Chad; Gannon, Vick P; Xu, Yong; Rasmussen, Jay; Potter, Andrew; Hendrick, Steve; Alexander, Trevor W; McAllister, Tim A

    2016-01-01

    Bovine respiratory disease is a common health problem in beef production. The primary bacterial agent involved, Mannheimia haemolytica, is a target of antimicrobial therapy and is at risk of developing associated antimicrobial resistance. The role of M. haemolytica in pathogenesis is linked to serotype: serotypes 1 (S1) and 6 (S6) are isolated from pneumonic lesions, whereas serotype 2 (S2) is found in the upper respiratory tract of healthy animals. Here, we sequenced the genomes of 11 strains of M. haemolytica, representing all three serotypes, and performed comparative genomic analysis to identify genetic features that may contribute to pathogenesis. Possible virulence-associated genes were identified within 14 distinct prophages, including a periplasmic chaperone, a lipoprotein, a peptidoglycan glycosyltransferase and a stress response protein. Prophage content ranged from 2 to 8 per genome but was higher in S1 and S6 strains. A type I-C CRISPR-Cas system was identified in each strain, with spacer diversity and organization conserved among serotypes. The majority of spacers occur in S1 and S6 strains and originate from phages, suggesting that serotypes 1 and 6 may be more resistant to phage predation. However, two spacers complementary to the host chromosome, targeting a UDP-N-acetylglucosamine 2-epimerase and a glycosyltransferase group 1 gene, are present only in S1 and S6 strains, indicating that these serotypes may employ CRISPR-Cas to regulate gene expression to avoid host immune responses or enhance adhesion during infection. Integrative conjugative elements are present in nine of the eleven genomes. Three of these harbor extensive multi-drug resistance cassettes encoding resistance against the majority of drugs used to combat infection in beef cattle, including macrolides and tetracyclines used in human medicine. 
The findings here identify key features that are likely contributing to serotype related pathogenesis and specific targets for vaccine design intended to reduce the

  13. Comparative genomic analysis of Drosophila melanogaster and vector mosquito developmental genes.

    Directory of Open Access Journals (Sweden)

    Susanta K Behura

    Full Text Available Genome sequencing projects have presented the opportunity to analyze developmental genes in three vector mosquito species: Aedes aegypti, Culex quinquefasciatus, and Anopheles gambiae. In this investigation, a comparative genomic analysis of developmental genes in Drosophila melanogaster and these three important vectors of human disease was performed. While the study was comprehensive, special emphasis centered on genes that (1) are components of developmental signaling pathways, (2) regulate fundamental developmental processes, (3) are critical for the development of tissues of vector importance, (4) function in developmental processes known to have diverged within insects, and (5) encode microRNAs (miRNAs) that regulate developmental transcripts in Drosophila. While most fruit fly developmental genes are conserved in the three vector mosquito species, several genes known to be critical for Drosophila development were not identified in one or more mosquito genomes. In other cases, mosquito lineage-specific gene gains with respect to D. melanogaster were noted. Sequence analyses also revealed that numerous repetitive sequences are a common structural feature of Drosophila and mosquito developmental genes. Finally, analysis of predicted miRNA binding sites in fruit fly and mosquito developmental genes suggests that the repertoire of developmental genes targeted by miRNAs is species-specific. The results of this study provide insight into the evolution of developmental genes and processes in dipterans and other arthropods, serve as a resource for those pursuing analysis of mosquito development, and will promote the design and refinement of functional analysis experiments.

  14. Genome-wide analysis of potential cross-reactive endogenous allergens in rice (Oryza sativa L.)

    Directory of Open Access Journals (Sweden)

    Fang Chao Zhu

    2015-01-01

    Full Text Available Food proteins are a common source of allergenic components for certain patients. Current lists of plant endogenous allergens are based on medical/clinical reports as well as laboratory results. Plant genome sequences make it possible to predict and characterize putative endogenous allergens genome-wide in rice (Oryza sativa L.). In this work, we identified and characterized 122 candidate rice allergens, including the 22 allergens in present databases. Conserved domain analysis also revealed 37 domains among rice allergens, including one novel domain (histidine kinase-, DNA gyrase B-, and HSP90-like ATPase; PF13589), a new addition to the allergen protein database. Phylogenetic analysis of the allergens revealed the diversity within the Prolamin superfamily and the DnaK protein family, respectively. Additionally, the clustering of some allergen proteins on rice chromosomes might suggest their molecular function during evolution.

  15. Genome-scale analysis of positional clustering of mouse testis-specific genes

    Directory of Open Access Journals (Sweden)

    Lee Bernett TK

    2005-01-01

    Background: Contrary to earlier assumptions, genes are not randomly distributed on chromosomes, even after removal of tandem repeats. The positional clustering of co-expressed genes is known in prokaryotes and has recently been reported in several eukaryotic organisms, such as Caenorhabditis elegans, Drosophila melanogaster, and Homo sapiens. To further investigate the mode of tissue-specific gene clustering in higher eukaryotes, we performed a genome-scale analysis of positional clustering of mouse testis-specific genes. Results: Our computational analysis shows that a large proportion of testis-specific genes are clustered in groups of 2 to 5 genes in the mouse genome. The number of clusters is much higher than expected by chance, even after removal of tandem repeats. Conclusion: Our results suggest that testis-specific genes tend to cluster on mouse chromosomes. This provides further evidence for the hypothesis that clusters of tissue-specific genes exist.
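    The "more clusters than expected by chance" comparison can be illustrated with a permutation test. This is a minimal sketch, assuming genes are given in chromosomal order as True/False flags (True = testis-specific) and that tandem repeats have already been removed. Here a "cluster" is a run of at least two adjacent flagged genes; the function names, this run definition and the shuffling scheme are assumptions for illustration, not necessarily the statistic the authors used.

```python
import random


def count_clusters(flags, min_size=2):
    """Count runs of >= min_size adjacent True flags (genes in chromosomal order)."""
    clusters = 0
    run = 0
    for flag in list(flags) + [False]:  # trailing False closes a final run
        if flag:
            run += 1
        else:
            if run >= min_size:
                clusters += 1
            run = 0
    return clusters


def permutation_test(flags, n_perm=999, seed=0):
    """Compare the observed cluster count with counts after shuffling gene order.

    Returns (observed_clusters, empirical one-sided p-value).
    """
    rng = random.Random(seed)
    observed = count_clusters(flags)
    shuffled = list(flags)
    as_extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if count_clusters(shuffled) >= observed:
            as_extreme += 1
    return observed, (as_extreme + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Toy chromosome: ten pairs of tissue-specific genes separated by other genes.
    genes = ([True, True] + [False] * 8) * 10
    print(permutation_test(genes))
```

    Shuffling the whole chromosome keeps the number of tissue-specific genes fixed, so the p-value reflects only their arrangement.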

  16. Genome and transcriptome analysis of the food-yeast Candida utilis.

    Directory of Open Access Journals (Sweden)

    Yasuyuki Tomita

    The industrially important food yeast Candida utilis is a Crabtree effect-negative yeast used to produce valuable chemicals and recombinant proteins. In the present study, we conducted whole-genome sequencing and phylogenetic analysis of C. utilis, which showed that this yeast diverged long before the formation of the CUG and Saccharomyces/Kluyveromyces clades. In addition, we performed comparative genome and transcriptome analyses using next-generation sequencing, which identified genes important for characteristic phenotypes of C. utilis, such as those involved in nitrate assimilation, as well as the gene encoding the functional hexose transporter. We also found that an antisense transcript of the alcohol dehydrogenase gene, which in silico analysis did not predict to be a functional gene, was transcribed in stationary phase, suggesting a novel system for repressing ethanol production. These findings should facilitate the development of more sophisticated systems for producing useful reagents using C. utilis.

  17. Significance of genomic instability in breast cancer in atomic bomb survivors: analysis of microarray-comparative genomic hybridization

    Directory of Open Access Journals (Sweden)

    Oikawa Masahiro

    2011-12-01

    Background: It has been postulated that ionizing radiation induces breast cancers among atomic bomb (A-bomb) survivors. We have reported a higher incidence of HER2 and C-MYC oncogene amplification in breast cancers from A-bomb survivors. The purpose of this study was to clarify the effect of A-bomb radiation exposure on genomic instability (GIN), an important hallmark of carcinogenesis, in archival formalin-fixed paraffin-embedded (FFPE) breast cancer tissues by using microarray-comparative genomic hybridization (aCGH). Methods: Tumor DNA was extracted from FFPE tissues of invasive ductal cancers from 15 survivors who were exposed at 1.5 km or less from the hypocenter and from 13 calendar-year-matched non-exposed patients, followed by aCGH analysis using a high-density oligonucleotide microarray. The total length of copy number aberrations (CNA) was used as an indicator of GIN, and its correlation with clinicopathological factors was statistically tested. Results: The mean derivative log ratio spread (DLRSpread), which estimates noise by calculating the spread of log ratio differences between consecutive probes across all chromosomes, was 0.54 (range, 0.26 to 1.05). The concordance between aCGH and fluorescence in situ hybridization (FISH) for HER2 gene amplification was 88%. The incidence of HER2 amplification and the histological grade were significantly higher in the A-bomb survivors than in the control group (P = 0.04 for each). The total length of CNA tended to be larger in the A-bomb survivors (P = 0.15). Correlation analysis of CNA and clinicopathological factors revealed that DLRSpread was significantly and negatively correlated with the total length of CNA (P = 0.034, r = -0.40). Multivariate analysis of covariance revealed that A-bomb exposure was a significant independent factor (P = 0.005) associated with a larger total length of CNA in breast cancers. Conclusions: Thus, archival FFPE tissues from A-bomb survivors are useful for
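    The DLRSpread noise metric can be sketched as follows, assuming one common robust formulation: take differences between consecutive probes within each chromosome, estimate their standard deviation from the interquartile range (IQR / 1.349 under normality), and divide by the square root of 2 because each difference combines the noise of two probes. Vendor software may define the statistic slightly differently; the function name and the input layout (log2 ratios per chromosome, in genomic order) are assumptions. Lower values indicate less noisy arrays.

```python
import math
from statistics import NormalDist


def _quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def dlr_spread(log_ratios_by_chrom):
    """Robust noise estimate from differences between consecutive probes.

    log_ratios_by_chrom: dict mapping chromosome -> log ratios ordered by position.
    Differences are never taken across chromosome boundaries.
    """
    diffs = []
    for ratios in log_ratios_by_chrom.values():
        diffs.extend(b - a for a, b in zip(ratios, ratios[1:]))
    if len(diffs) < 2:
        raise ValueError("need at least two probe-to-probe differences")
    diffs.sort()
    iqr = _quantile(diffs, 0.75) - _quantile(diffs, 0.25)
    # For normally distributed values, IQR = 2 * z(0.75) * sigma (about 1.349 sigma).
    sigma_of_diffs = iqr / (2 * NormalDist().inv_cdf(0.75))
    # A difference of two independent probes has sqrt(2) times the per-probe SD.
    return sigma_of_diffs / math.sqrt(2)


if __name__ == "__main__":
    array = {"1": [0.0, 0.1, -0.05, 0.02, 0.8, 0.85, 0.78], "2": [0.01, -0.02, 0.03]}
    print(round(dlr_spread(array), 3))
```

    Using differences of neighbouring probes makes the estimate largely insensitive to genuine copy number steps, since a breakpoint contributes only a single large difference.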

  18. Construction of an integrated database to support genomic sequence analysis

    Energy Technology Data Exchange (ETDEWEB)

    Gilbert, W.; Overbeek, R.

    1994-11-01

    The central goal of this project is to develop an integrated database to support comparative analysis of genomes, including DNA sequence data, protein sequence data, gene expression data and metabolism data. In developing the logic-based system GenoBase, a broader integration of available data was achieved thanks to assistance from collaborators. Current goals are to make it easy to include new forms of data as they become available and to navigate through the ensemble of objects described within the database. This report comments on progress made in these areas.

  19. Analysis Of Segmental Duplications In The Pig Genome Based On Next-Generation Sequencing

    DEFF Research Database (Denmark)

    Fadista, João; Bendixen, Christian

    Segmental duplications are >1 kb segments of duplicated DNA present in a genome with high sequence identity (>90%). They are associated with genomic rearrangements and provide a significant source of gene and genome evolution within mammalian genomes. Although segmental duplications have been ... extensively studied in other organisms, their analysis in pig has been hampered by the lack of a complete pig genome assembly. By measuring the depth of coverage of Illumina whole-genome shotgun sequencing reads of the Tabasco animal aligned to the latest pig genome assembly (Sus scrofa 10 – based also ... and their associated copy number alterations, focusing on the global organization of these segments and their possible functional significance in porcine phenotypes. This work provides insights into mammalian genome evolution and generates a valuable resource for porcine genomics research...
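    The depth-of-coverage idea can be sketched as follows, assuming read depth has already been averaged over consecutive fixed-size windows of one chromosome. Windows whose depth is well above the chromosome's median are merged into candidate duplicated regions, and copy number is estimated relative to a diploid baseline. The window size, the 1.5-fold threshold and the function name are illustrative assumptions; depth alone also cannot confirm the >90% sequence identity in the definition of a segmental duplication.

```python
from statistics import median


def find_duplicated_regions(depths, window_bp=1000, min_ratio=1.5, min_len_bp=1000):
    """Flag runs of windows whose read depth is >= min_ratio x the median depth.

    depths: mean read depth per consecutive, non-overlapping window of one chromosome.
    Returns (start_bp, end_bp, estimated_copy_number) tuples.
    """
    baseline = median(depths)
    if baseline <= 0:
        raise ValueError("median depth must be positive")
    regions = []
    start = None
    for i, depth in enumerate(list(depths) + [0.0]):  # sentinel closes a final run
        elevated = depth / baseline >= min_ratio
        if elevated and start is None:
            start = i
        elif not elevated and start is not None:
            if (i - start) * window_bp >= min_len_bp:
                mean_depth = sum(depths[start:i]) / (i - start)
                # Assumes a diploid genome: a median-depth window carries 2 copies.
                copy_number = round(2 * mean_depth / baseline, 2)
                regions.append((start * window_bp, i * window_bp, copy_number))
            start = None
    return regions


if __name__ == "__main__":
    windows = [30] * 5 + [60] * 3 + [30] * 4
    print(find_duplicated_regions(windows))
```

    Using the median rather than the mean as the baseline keeps the reference depth from being inflated by the duplicated windows themselves.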

  20. CHESS (CgHExpreSS): a comprehensive analysis tool for the analysis of genomic alterations and their effects on the expression profile of the genome.

    Science.gov (United States)

    Lee, Mikyung; Kim, Yangseok

    2009-12-16

    Genomic alterations occur frequently in many cancer patients and play important mechanistic roles in the pathogenesis of cancer. Furthermore, they can modify gene expression levels through altered copy number in the corresponding chromosomal region. An accumulating body of evidence supports the possibility that a strong genome-wide correlation exists between DNA content and gene expression. Therefore, more comprehensive analysis is needed to quantify the relationship between genomic alteration and gene expression, and a well-designed bioinformatics tool is essential to perform this kind of integrative analysis. A few programs for integrative analysis have already been introduced. However, published software is limited in its ability to perform comprehensive integrated analysis because of constraints in the implemented algorithms and visualization modules. To address this issue, we implemented the Java-based program CHESS to allow integrative analysis of two experimental data sets: genomic alteration and genome-wide expression profiles. CHESS is composed of a genomic alteration analysis module and an integrative analysis module. The genomic alteration analysis module detects genomic alterations by applying a threshold-based method or the SW-ARRAY algorithm and investigates whether the detected alterations are phenotype-specific. The integrative analysis module, in turn, measures the influence of genomic alteration on gene expression. It is divided into two parts. The first part calculates the overall correlation between comparative genomic hybridization ratio and gene expression level by applying three statistical methods: simple linear regression, Spearman rank correlation, and Pearson's correlation. In the second part, CHESS detects the genes that are differentially expressed according to the genomic alteration pattern using three alternative statistical approaches: Student's t-test, Fisher's exact test and chi-square
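    The statistics named for the integrative module can be sketched as below. CHESS itself is a Java program; this Python sketch only illustrates the named methods (Pearson's correlation, Spearman rank correlation with tied ranks averaged, simple linear regression, and a pooled-variance Student's t statistic) on per-sample CGH ratios and expression values for one gene. Function names and inputs are assumptions, and p-value calculation is omitted.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def linear_fit(x, y):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return slope, my - slope * mx


def student_t(group_a, group_b):
    """Two-sample Student's t statistic with pooled variance (equal-variance form)."""
    na, nb = len(group_a), len(group_b)
    ma, mb = sum(group_a) / na, sum(group_b) / nb
    ss = sum((v - ma) ** 2 for v in group_a) + sum((v - mb) ** 2 for v in group_b)
    pooled_var = ss / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))


if __name__ == "__main__":
    cgh_ratio = [0.1, 0.4, 0.9, 1.2, 1.5]  # hypothetical per-sample values
    expression = [2.0, 2.6, 3.9, 4.1, 5.2]
    print(pearson(cgh_ratio, expression), spearman(cgh_ratio, expression))
    print(linear_fit(cgh_ratio, expression))
    # Expression in samples without vs. with the alteration (hypothetical)
    print(student_t([2.0, 2.6, 2.4], [3.9, 4.1, 5.2]))
```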

  1. Genomic analysis of primordial dwarfism reveals novel disease genes.

    Science.gov (United States)

    Shaheen, Ranad; Faqeih, Eissa; Ansari, Shinu; Abdel-Salam, Ghada; Al-Hassnan, Zuhair N; Al-Shidi, Tarfa; Alomar, Rana; Sogaty, Sameera; Alkuraya, Fowzan S

    2014-02-01

    Primordial dwarfism (PD) is a disease in which severely impaired fetal growth persists throughout postnatal development and results in stunted adult size. The condition is highly heterogeneous clinically, but certain phenotypic aspects, such as head circumference and facial appearance, have proven helpful in defining clinical subgroups. In this study, we present the results of clinical and genomic characterization of 16 new patients in whom a broad definition of PD was used (e.g., 3M syndrome was included). We report a novel PD syndrome with distinct facies in two unrelated patients, each with a different homozygous truncating mutation in CRIPT. In addition to mutations in known PD disease genes, our analysis reveals the first instance of a biallelic truncating BRCA2 mutation causing PD with normal bone marrow analysis. We have also identified a novel locus for Seckel syndrome based on a consanguineous multiplex family, and identified a homozygous truncating mutation in DNA2 as the likely cause. An additional novel PD candidate gene, XRCC4, was identified by autozygome/exome analysis, and its knockout mouse phenotype is highly compatible with PD. Thus, we add a number of novel genes to the growing list of PD-linked genes, including one that we show to be linked to a novel PD syndrome with a distinct facial appearance. PD is extremely heterogeneous genetically and clinically, and genomic tools are often required to reach a molecular diagnosis.

  2. A fast and robust method for full genome sequencing of Porcine Reproductive and Respiratory Syndrome Virus (PRRSV) Type 1 and Type 2

    DEFF Research Database (Denmark)

    Kvisgaard, Lise Kirstine; Hjulsager, Charlotte Kristiane; Fahnøe, Ulrik

    2013-01-01

    ... In the present study, fast and robust methods for long-range RT-PCR amplification and subsequent next-generation sequencing (NGS) were developed and validated on nine Type 1 and nine Type 2 PRRSV viruses. The methods generated robust and reliable sequences from both primary material and cell-culture-adapted ... viruses, and the protocols performed well on all three NGS platforms tested (Roche 454 FLX, Illumina HiSeq2000, and Ion Torrent PGM™ Sequencer). These methods will greatly facilitate the generation of more full-genome PRRSV sequences globally.

  3. Comparative analysis of codon usage bias and codon context patterns between dipteran and hymenopteran sequenced genomes.

    Directory of Open Access Journals (Sweden)

    Susanta K Behura

    BACKGROUND: Codon bias is the non-uniform usage of synonymous codons, whereas codon context generally refers to sequential pairs of codons in a gene. Although genome sequencing of multiple dipteran and hymenopteran insect species has been completed, only a few of these species have been analyzed for codon usage bias. METHODS AND PRINCIPAL FINDINGS: Here, we use bioinformatics approaches to analyze codon usage bias and codon context patterns genome-wide among 15 dipteran and 7 hymenopteran insect species. Results show that GAG is the most frequent codon in the dipteran species, whereas GAA is the most frequent codon in the hymenopteran species. The data reveal that codons ending with C or G are frequently used in the dipteran genomes, whereas codons ending with A or T are frequently used in the hymenopteran genomes. Synonymous codon usage order (SCUO) varies within genomes in a pattern that seems to be distinct for each species. Based on a comparison of 30 one-to-one orthologous genes among 17 species, the fruit fly Drosophila willistoni shows the least codon usage bias, whereas the honey bee (Apis mellifera) shows the highest. Analysis of codon context patterns in these insects shows that specific codons are frequently used as the 3'- and 5'-context of start and stop codons, respectively. CONCLUSIONS: The codon bias pattern is distinct between dipteran and hymenopteran insects. While codon bias is favored by the high GC content of dipteran genomes, the high AT content of genes favors biased usage of synonymous codons in the hymenopteran insects. Codon context patterns also vary among these species, largely according to their phylogeny.
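    SCUO and third-position GC content (GC3) can be computed from a coding sequence as sketched here, assuming the standard genetic code and the usual entropy-based definition of SCUO: for each amino acid with more than one codon, the normalized entropy deficit of its codon usage is weighted by that amino acid's share of the counted codons (0 = uniform usage, 1 = a single codon per amino acid). Stop codons and single-codon amino acids (Met, Trp) are skipped; the function names are hypothetical and this is not the authors' code.

```python
import math
from collections import Counter, defaultdict

BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}
SYNONYMS = defaultdict(list)  # amino acid -> its synonymous codons
for codon, aa in CODE.items():
    SYNONYMS[aa].append(codon)


def codons(seq):
    """Split an in-frame coding sequence into complete codons."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def scuo(seq):
    """Synonymous codon usage order of one coding sequence (0..1)."""
    counts = Counter(c for c in codons(seq) if c in CODE)
    total, weighted = 0, 0.0
    for aa, family in SYNONYMS.items():
        if aa == "*" or len(family) < 2:
            continue  # skip stop codons and single-codon amino acids
        n = sum(counts[c] for c in family)
        if n == 0:
            continue
        h = -sum((counts[c] / n) * math.log2(counts[c] / n) for c in family if counts[c])
        h_max = math.log2(len(family))
        weighted += n * (h_max - h) / h_max  # n/total is this amino acid's weight
        total += n
    return weighted / total if total else 0.0


def gc3(seq):
    """Fraction of codons ending in G or C."""
    third = [c[2] for c in codons(seq)]
    return sum(b in "GC" for b in third) / len(third) if third else 0.0


if __name__ == "__main__":
    cds = "ATGGAAGAAGCTGCCGCAGCGTAA"
    print(scuo(cds), gc3(cds))
```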

  4. Comparative genomic analysis of the multispecies probiotic-marketed product VSL#3.

    Directory of Open Access Journals (Sweden)

    François P Douillard

    Several probiotic-marketed formulations available to consumers contain live lactic acid bacteria and/or bifidobacteria. The multispecies product commercialized as VSL#3 has been used to treat various gastrointestinal disorders. However, like many other products, the bacterial strains present in VSL#3 have been characterized only to a limited extent, and their efficacy and predicted mode of action remain unclear, preventing further applications or comparative studies. In this work, the genomes of all eight bacterial strains present in VSL#3 were sequenced and characterized, to advance insights into the possible mode of action of this product and to serve as a basis for future work and trials. Phylogenetic and genomic data analysis allowed us to identify the 7 species present in the VSL#3 product, as specified by the manufacturer. The 8 strains present belong to the species Streptococcus thermophilus, Lactobacillus acidophilus, Lactobacillus paracasei, Lactobacillus plantarum, Lactobacillus helveticus, Bifidobacterium breve and B. animalis subsp. lactis (two distinct strains). Comparative genomics revealed that the draft genomes of the S. thermophilus and L. helveticus strains were predicted to encode most of the defence systems, such as restriction-modification and CRISPR-Cas systems. Genes associated with a variety of potential probiotic functions were also identified. For example, in the three Bifidobacterium strains, gene clusters were predicted to encode tight adherence pili, known to promote bacteria-host interaction and intestinal barrier integrity and to impact host cell development. Various repertoires of putative signalling proteins were predicted to be encoded by the genomes of the Lactobacillus spp., namely surface layer proteins, LPXTG-containing proteins, or sortase-dependent pili that may interact with the intestinal mucosa and dendritic cells. Taken altogether, the individual genomic characterization of the strains

  5. Genome wide characterization of simple sequence repeats in watermelon genome and their application in comparative mapping and genetic diversity analysis.

    Science.gov (United States)

    Zhu, Huayu; Song, Pengyao; Koo, Dal-Hoe; Guo, Luqin; Li, Yanman; Sun, Shouru; Weng, Yiqun; Yang, Luming

    2016-08-05

    Microsatellite markers are among the most informative and versatile DNA-based markers used in plant genetic research, but their development has traditionally been difficult and costly. Whole-genome sequencing with next-generation sequencing (NGS) technologies provides large amounts of sequence data for developing numerous microsatellite markers at whole-genome scale. SSR markers have great advantages in cross-species comparisons and allow investigation of karyotype and genome evolution through highly efficient computational approaches such as in silico PCR. Here we describe the genome-wide development and characterization of SSR markers in the watermelon (Citrullus lanatus) genome, which were then used in comparative analysis with two other important crop species in the Cucurbitaceae family: cucumber (Cucumis sativus L.) and melon (Cucumis melo L.). We further applied these markers to evaluate the genetic diversity and population structure of watermelon germplasm collections. A total of 39,523 microsatellite loci were identified in the watermelon draft genome, with an overall density of 111 SSRs/Mbp, and 32,869 SSR primers were designed with suitable flanking sequences. Dinucleotide SSRs were the most common type, representing 34.09% of the total SSR loci, and AT-rich motifs were the most abundant in all nucleotide repeat types. In silico PCR analysis identified 832 and 925 SSR markers, each having a single amplicon, in the cucumber and melon draft genomes, respectively. Comparative analysis with these cross-species SSR markers revealed complicated mosaic patterns of syntenic blocks among the genomes of the three species. In addition, genetic diversity analysis of 134 watermelon accessions with 32 highly informative SSR loci placed these lines into two groups, with all accessions of C. lanatus var. citroides and three accessions of C. colocynthis clustered in one group and all accessions of C. lanatus var. lanatus and the remaining accessions of C. colocynthis
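    A minimal SSR finder, assuming MISA-style minimum repeat counts (10 for mononucleotides, 6 for dinucleotides, 5 for tri- to hexanucleotides). These thresholds, the greedy left-to-right scan and the function names are assumptions, not the study's settings. Density is simply loci per million base pairs; taken together, 39,523 loci at 111 SSRs/Mbp imply roughly 356 Mbp of scanned sequence.

```python
# Hypothetical sketch; thresholds are assumed MISA-style defaults.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def _is_primitive(motif):
    """False for motifs that are themselves repeats of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)


def find_ssrs(seq):
    """Return (start, motif, repeat_count) for perfect SSRs, scanning left to right."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        found = None
        for k in range(1, 7):  # try the shortest motif first
            motif = seq[i:i + k]
            if len(motif) < k or "N" in motif or not _is_primitive(motif):
                continue
            reps = 1
            while seq[i + reps * k:i + (reps + 1) * k] == motif:
                reps += 1
            if reps >= MIN_REPEATS[k]:
                found = (i, motif, reps)
                break
        if found:
            hits.append(found)
            i += len(found[1]) * found[2]  # skip past the repeat
        else:
            i += 1
    return hits


def ssr_density_per_mbp(n_ssrs, genome_length_bp):
    """SSR loci per million base pairs."""
    return n_ssrs / (genome_length_bp / 1_000_000)


if __name__ == "__main__":
    contig = "GG" + "AT" * 7 + "CCGTA" + "CAG" * 5 + "T"
    print(find_ssrs(contig))
```

    Checking that a motif is primitive prevents a run such as ATATAT... from also being reported as a tetranucleotide repeat.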

  6. Generation and Analysis of Full-length cDNA Sequences from Elephant Shark (Callorhinchus milii)

    KAUST Repository

    Kodzius, Rimantas; Tay, Boon-Hui; Tan, Yue Ying; Brenner, Sydney; Venkatesh, Byrappa

    2009-01-01

    Cartilaginous fishes are the oldest living group of jawed vertebrates and are therefore an important group for understanding the evolution of vertebrate genomes, including the human genome. Our laboratory has proposed elephant shark (C. milii) as a

  7. Microbial Genome Analysis and Comparisons: Web-based Protocols and Resources

    Science.gov (United States)

    Fully annotated genome sequences of many microorganisms are publicly available as a resource. However, in-depth analysis of these genomes using specialized tools is required to derive meaningful information. We describe here the utility of three powerful publicly available genome databases and ana...

  8. IMG 4 version of the integrated microbial genomes comparative analysis system

    Science.gov (United States)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Pillay, Manoj; Ratner, Anna; Huang, Jinghua; Woyke, Tanja; Huntemann, Marcel; Anderson, Iain; Billis, Konstantinos; Varghese, Neha; Mavromatis, Konstantinos; Pati, Amrita; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2014-01-01

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu). PMID:24165883

  9. IMG 4 version of the integrated microbial genomes comparative analysis system

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Chen, I-Min A. [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Palaniappan, Krishna [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Chu, Ken [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Szeto, Ernest [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Pillay, Manoj [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Ratner, Anna [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Huang, Jinghua [Lawrence Berkeley National Lab. (LBNL), Berkeley, CA (United States). Biological Data Management and Technology Center. Computational Research Division; Woyke, Tanja [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Huntemann, Marcel [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Anderson, Iain [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Billis, Konstantinos [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Varghese, Neha [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). 
Microbial Genome and Metagenome Program; Mavromatis, Konstantinos [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Pati, Amrita [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Ivanova, Natalia N. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program; Kyrpides, Nikos C. [USDOE Joint Genome Institute (JGI), Walnut Creek, CA (United States). Microbial Genome and Metagenome Program

    2013-10-27

    The Integrated Microbial Genomes (IMG) data warehouse integrates genomes from all three domains of life, as well as plasmids, viruses and genome fragments. IMG provides tools for analyzing and reviewing the structural and functional annotations of genomes in a comparative context. IMG’s data content and analytical capabilities have increased continuously since its first version released in 2005. Since the last report published in the 2012 NAR Database Issue, IMG’s annotation and data integration pipelines have evolved while new tools have been added for recording and analyzing single cell genomes, RNA Seq and biosynthetic cluster data. Finally, different IMG datamarts provide support for the analysis of publicly available genomes (IMG/W: http://img.jgi.doe.gov/w), expert review of genome annotations (IMG/ER: http://img.jgi.doe.gov/er) and teaching and training in the area of microbial genome analysis (IMG/EDU: http://img.jgi.doe.gov/edu).

  10. REFGEN and TREENAMER: Automated Sequence Data Handling for Phylogenetic Analysis in the Genomic Era

    Directory of Open Access Journals (Sweden)

    Guy Leonard

    2009-01-01

    Phylogenetic analysis of nucleotide sequences, and increasingly of amino acid sequences, is used to address a number of biological questions. Access to extensive datasets, including numerous genome projects, means that standard phylogenetic analyses can include many hundreds of sequences. Unfortunately, most phylogenetic analysis programs do not tolerate the sequence naming conventions of genome databases. Managing large numbers of sequences and standardizing sequence labels for use in phylogenetic analysis programs can be a time-consuming and laborious task. Here we report the availability of an online resource for managing gene sequences recovered from public-access genome databases such as GenBank. These web utilities include a facility for renaming every sequence in a FASTA alignment file, with each sequence label derived from a user-defined combination of the species name and/or database accession number. This facility enables the user to keep track of the branching order of the sequences/taxa during multiple tree calculations and re-optimisations. After phylogenetic analysis, these webpages can then be used to rename every label in the resulting tree files (with a user-defined combination of species name and/or database accession number). Together, these programs drastically reduce the time required for managing sequence alignments and labelling phylogenetic figures. Additional features of our platform include the automatic removal of identical accession numbers (recorded in the report file) and the generation of species and accession number lists for use in supplementary materials or figure legends.
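    The relabelling workflow can be sketched as follows, assuming each FASTA header starts with an accession number and that the user supplies an accession-to-species table. All names are hypothetical; this is not the REFGEN or TREENAMER code. Repeated accessions are dropped and reported, and a label mapping can later be applied to leaf labels in a Newick tree file.

```python
import re


def parse_fasta(text):
    """Return (accession, sequence) pairs; the accession is the first word of each header."""
    records, acc, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if acc is not None:
                records.append((acc, "".join(chunks)))
            acc, chunks = (line[1:].split() or ["unnamed"])[0], []
        else:
            chunks.append(line)
    if acc is not None:
        records.append((acc, "".join(chunks)))
    return records


def relabel(records, species_of, mode="species_acc"):
    """Build new labels; return (relabelled records, list of removed duplicate accessions)."""
    seen, removed, out = set(), [], []
    for acc, seq in records:
        if acc in seen:
            removed.append(acc)  # identical accession: drop and report
            continue
        seen.add(acc)
        # Replace spaces and punctuation that tree programs often reject.
        species = re.sub(r"\W+", "_", species_of.get(acc, "unknown")).strip("_")
        labels = {"species": species, "acc": acc, "species_acc": f"{species}_{acc}"}
        out.append((labels[mode], seq))
    return out, removed


def rename_tree_labels(newick, mapping):
    """Replace whole Newick tokens found in mapping; other tokens are left unchanged."""
    return re.sub(r"[^(),:;\s]+", lambda m: mapping.get(m.group(0), m.group(0)), newick)


if __name__ == "__main__":
    fasta = ">X1 first\nACGT\nAC\n>X2\nGG\n>X1 again\nTT\n"
    renamed, dropped = relabel(parse_fasta(fasta), {"X1": "Homo sapiens", "X2": "Mus musculus"})
    print(renamed, dropped)
    print(rename_tree_labels("((X1:0.1,X2:0.2),X3);", {"X1": "Homo_sapiens_X1"}))
```

    Note that the species-only mode can produce duplicate labels when several sequences come from one species, which is why combining species name and accession is the safer default here.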

  11. Prospects and limitations of full-text index structures in genome analysis

    Science.gov (United States)

    Vyverman, Michaël; De Baets, Bernard; Fack, Veerle; Dawyndt, Peter

    2012-01-01

    The combination of incessant advances in sequencing technology, which produce large amounts of data, and innovative bioinformatics approaches designed to cope with this data flood has led to interesting new results in the life sciences. Given the magnitude of sequence data to be processed, many bioinformatics tools rely on efficient solutions to a variety of complex string problems. These solutions include fast heuristic algorithms and advanced data structures, generally referred to as index structures. Although the importance of index structures is generally known to the bioinformatics community, the design and potency of these data structures, as well as their properties and limitations, are less well understood. Moreover, the last decade has seen a boom in the number of variant index structures featuring complex and diverse memory-time trade-offs. This article provides a comprehensive, state-of-the-art overview of the most popular index structures and their recently developed variants. Their features, interrelationships, and the trade-offs they impose, as well as their practical limitations, are explained and compared. PMID:22584621
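    A suffix array is one of the simplest full-text index structures: the sorted list of all suffix start positions, which allows every occurrence of a pattern to be found by binary search. The sketch below uses naive construction (sorting suffix slices, roughly O(n² log n) time with large transient memory), which is fine for illustration only; production tools use linear-time construction or compressed variants such as the FM-index, reflecting the kind of memory-time trade-off the article discusses. Function names are hypothetical.

```python
def build_suffix_array(text):
    """Start positions of all suffixes of text, in lexicographic order (naive)."""
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text, sa, pattern):
    """All start positions of pattern in text, via two binary searches over sa."""
    m = len(pattern)
    # Leftmost suffix whose first m characters are >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo
    # First suffix whose first m characters are > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid
    return sorted(sa[start:lo])


if __name__ == "__main__":
    genome = "ACGTACGTTACG"
    sa = build_suffix_array(genome)
    print(find_occurrences(genome, sa, "ACG"))
```

    The search works because length-m prefixes of lexicographically sorted suffixes are themselves sorted, so all matches occupy one contiguous block of the array.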

  12. Determination and analysis of the full-length chicken parvovirus genome.

    Science.gov (United States)

    Viral enteric disease in poultry is an ongoing problem in many parts of the world. Many enteric viruses have been identified in turkeys and chickens, including avian astroviruses, rotaviruses, reoviruses, and coronaviruses. Through the application of a molecular screening method targeting particle-a...

  13. Comparative Genomics and Transcriptomic Analysis of Mycobacterium Kansasii

    KAUST Repository

    Alzahid, Yara

    2014-04-01

    Mycobacteria are among the most intensively studied bacterial taxa, as they cause two historically important diseases known worldwide: leprosy and tuberculosis. Mycobacteria not belonging to the tuberculosis or leprosy complexes have been referred to as ‘environmental mycobacteria’ or ‘nontuberculous mycobacteria’ (NTM). Mycobacterium kansasii (M. kansasii) is one of the most frequent NTM pathogens, causing pulmonary disease in immunocompetent patients and both pulmonary and disseminated disease in patients with various immunodeficiencies. Five subtypes of this bacterium have been documented by different molecular typing methods: type I causes tuberculosis-like disease in healthy individuals, and type II in immunocompromised individuals. The remaining types are said to be environmental and therefore not to cause disease. The aim of this project was to conduct a comparative genomic study of M. kansasii types I-V and to investigate the gene expression levels of those types. Comparative genomic analyses provided evidence for why M. kansasii type I is considered pathogenic; these analyses focused on three key elements involved in the virulence of mycobacteria: the ESX secretion system, phospholipase C (plcB), and the mammalian cell entry (Mce) operons. The results showed the lack of the espA operon in types II-V, which renders the ESX-1 operon dysfunctional, as espA is one of the key factors controlling this secretion system. However, gene expression analysis showed this operon to be deleted in types II, III and IV. Furthermore, plcB was found to be truncated in types III and IV. Analysis of the Mce operons (1-4) shows that the mce-1 operon is duplicated, mce-2 is absent, and mce-3 and mce-4 are each present in one copy in M. kansasii types I-V. Gene expression profiles of types I-IV showed that the secreted proteins of ESX-1 were slightly upregulated in types II-IV compared with type I, and the secreted forms of ESX-5 were highly down

  14. Full Life Cycle of Data Analysis with Climate Model Diagnostic Analyzer (CMDA)

    Science.gov (United States)

    Lee, S.; Zhai, C.; Pan, L.; Tang, B.; Zhang, J.; Bao, Q.; Malarout, N.

    2017-12-01

    We have developed a system that supports the full life cycle of a data analysis process, from data discovery to data customization, analysis, reanalysis, publication, and reproduction. The system, called the Climate Model Diagnostic Analyzer (CMDA), is designed to demonstrate that the full life cycle of data analysis can be supported within one integrated system for climate model diagnostic evaluation with global observational and reanalysis datasets. CMDA has four highly integrated subsystems that support the analysis life cycle. The Data System manages the datasets used by CMDA analysis tools; the Analysis System manages the CMDA analysis tools, which are all web services; the Provenance System manages the metadata of CMDA datasets and the provenance of CMDA analysis history; and the Recommendation System extracts knowledge from CMDA usage history and recommends datasets and analysis tools to users. These four subsystems are not only highly integrated but also easily expandable. New datasets can easily be added to the Data System and scanned to be visible to the other subsystems. New analysis tools can easily be registered to be available in the Analysis System and Provenance System. With CMDA, a user can start a data analysis process by using the Recommendation System to discover datasets relevant to their research topic. The user can then customize the discovered datasets for their scientific use (e.g., anomaly calculation or regridding) and analyze them (e.g., conditional sampling, time averaging, or spatial averaging) with tools in the Analysis System. Next, the user can reanalyze the datasets based on the analysis provenance previously stored in the Provenance System. Users can also publish their analysis process and results to the Provenance System to share with others. Finally, any user can reproduce the published analysis process and results. By supporting the full life cycle of climate data analysis
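    The register, run and reproduce cycle can be illustrated with a small in-memory sketch. Everything here is hypothetical: CMDA's tools are web services, and its actual provenance schema is not described in this abstract. In the sketch, tools are registered by name, each run stores the tool, inputs, parameters and a SHA-256 digest of the output, and a stored run can be re-executed to check whether it reproduces.

```python
import hashlib
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


def _digest(value: Any) -> str:
    """Stable fingerprint of a JSON-serialisable result."""
    return hashlib.sha256(json.dumps(value, sort_keys=True).encode()).hexdigest()


TOOLS: Dict[str, Callable] = {}  # hypothetical stand-in for a tool registry


def register(name: str):
    """Decorator that makes a tool discoverable by name."""
    def decorator(fn):
        TOOLS[name] = fn
        return fn
    return decorator


@register("time_average")
def time_average(series, start, end):
    """Mean of series[start:end], e.g. a time window of a climate variable."""
    window = series[start:end]
    return sum(window) / len(window)


@dataclass
class ProvenanceStore:
    records: List[dict] = field(default_factory=list)

    def run(self, datasets, tool, inputs, **params):
        """Execute a registered tool and record how the result was produced."""
        output = TOOLS[tool](*(datasets[name] for name in inputs), **params)
        self.records.append({"tool": tool, "inputs": list(inputs),
                             "params": params, "output_sha256": _digest(output)})
        return len(self.records) - 1, output

    def reproduce(self, datasets, run_id):
        """Re-run a stored analysis; report whether the output matches the record."""
        rec = self.records[run_id]
        output = TOOLS[rec["tool"]](*(datasets[n] for n in rec["inputs"]), **rec["params"])
        return output, _digest(output) == rec["output_sha256"]


if __name__ == "__main__":
    store = ProvenanceStore()
    data = {"tas": [1.0, 2.0, 3.0, 4.0]}
    run_id, result = store.run(data, "time_average", ["tas"], start=0, end=2)
    print(result, store.reproduce(data, run_id))
```

    Storing a digest of the output, rather than the output itself, keeps the provenance record small while still detecting when an input dataset has changed.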

  15. GENOMIC ANALYSIS OF PLANT-ASSOCIATED BACTERIA AND THEIR POTENTIAL IN ENHANCING PHYTOREMEDIATION EFFICIENCY

    Directory of Open Access Journals (Sweden)

    Artur Piński

    2017-07-01

    Phytoremediation is an emerging technology that uses plants to clean up pollutants, including xenobiotics and heavy metals, from soil, water and air. Inoculating plants with plant growth-promoting endophytic and rhizospheric bacteria can enhance the efficiency of phytoremediation. Genomic analysis of four plant-associated strains belonging to the species Stenotrophomonas maltophilia revealed the presence of genes encoding proteins involved in plant growth promotion, biocontrol of phytopathogens, biodegradation of xenobiotics, heavy-metal resistance and plant-bacteria-environment interaction. The results of this analysis suggest that bacteria belonging to the species Stenotrophomonas maltophilia have great potential for enhancing phytoremediation efficiency.

  16. Main: Nucleotide Analysis [KOME]

    Lifescience Database Archive (English)

    Nucleotide Analysis: Japonica genome blast search result. Result of blastn search against japonica genome sequence: kome_japonica_genome_blast_search_result.zip kome_japonica_genome_blast_search_result ...

  17. Sequencing, mapping, and analysis of 27,455 maize full-length cDNAs.

    Directory of Open Access Journals (Sweden)

    Carol Soderlund

    2009-11-01

    Full-length cDNA (FLcDNA) sequencing establishes the precise primary structure of individual gene transcripts. From two libraries representing 27 B73 tissues and abiotic stress treatments, 27,455 high-quality FLcDNAs were sequenced. The average transcript length was 1.44 kb, including 5' and 3' UTRs of 218 and 321 bases, respectively, and 8.6% of the FLcDNAs encode predicted proteins of fewer than 100 amino acids. Approximately 94% of the FLcDNAs were stringently mapped to the maize genome. Although nearly two-thirds of this genome is composed of transposable elements (TEs), only 5.6% of the FLcDNAs contained TE sequences in coding or UTR regions. Approximately 7.2% of the FLcDNAs are putative transcription factors, suggesting that rare transcripts are well enriched in our FLcDNA set. Protein similarity searching identified 1,737 maize transcripts not present among the annotated genes of rice, sorghum, Arabidopsis, or poplar. A strict FLcDNA assembly generated 24,467 non-redundant sequences, of which 88% have non-maize protein matches. The FLcDNAs were also assembled with 41,759 FLcDNAs in GenBank from other projects, using semi-strict parameters to identify 13,368 potentially unique non-redundant sequences from this project. The libraries, ESTs, and FLcDNA sequences produced by this project are publicly available. The annotated EST and FLcDNA assemblies are available through the maize FLcDNA web resource (www.maizecdna.org).

  18. Comparative Analysis of the Complete Chloroplast Genomes of Four Aconitum Medicinal Species

    Directory of Open Access Journals (Sweden)

    Jing Meng

    2018-04-01

    Aconitum (Ranunculaceae) consists of approximately 400 species distributed in the temperate regions of the northern hemisphere. Many species are well-known herbs, mainly used for analgesic and anti-inflammatory purposes. The genus is well represented in China and has gained widespread attention for its toxicity and detoxification properties. In southwestern China, several Aconitum species, called ‘Dula’ by the Yi Nationality, have often been used to counteract the poisonous effects of other Aconitum plants. In this study, the complete chloroplast (cp) genomes of these species were determined for the first time using Illumina paired-end sequencing. Their cp genomes ranged in length from 151,214 bp (A. episcopale) to 155,769 bp (A. delavayi). A total of 111–112 unique genes were identified, including 85 protein-coding genes, 36–37 tRNA genes, and eight ribosomal RNA (rRNA) genes. We also analyzed codon usage, IR expansion and contraction, and simple sequence repeats in the cp genomes. Eight variable regions were identified that may be useful as specific DNA barcodes for identifying Aconitum species. Phylogenetic analysis revealed that all five studied species formed a new clade, resolved with 100% bootstrap support. This study provides genomic resources and potential plastid markers for DNA barcoding, further taxonomic study, and germplasm exploration of Aconitum.

  19. Phenotypic and genomic analysis of serotype 3 Sabin poliovirus vaccine produced in MRC-5 cell substrate.

    Science.gov (United States)

    Alirezaie, Behnam; Taqavian, Mohammad; Aghaiypour, Khosrow; Esna-Ashari, Fatemeh; Shafyi, Abbas

    2011-05-01

    The cell substrate plays a pivotal role in the production of live virus vaccines. It is necessary to evaluate the effects of the cell substrate on the properties of the propagated viruses, especially for genetically unstable viruses such as polioviruses, by monitoring the molecular and phenotypic characteristics of harvested viruses. To investigate the presence or absence of mutations, the near full-length genomic sequences of different harvests of the type 3 Sabin strain of poliovirus propagated in MRC-5 cells were determined. The sequences were compared with the genomic sequences of different virus seeds, vaccines, and OPV-like isolates. Nearly complete genomic sequencing revealed no detectable mutations throughout the genome of the RNA-plaque-purified (RSO)-derived monopool of type 3 OPV manufactured in MRC-5 cells. Thirty-six years of experience in OPV production, trend analysis, and vaccine surveillance also suggest that: (i) different monopools of serotype 3 OPV produced in MRC-5 cells retained their phenotypic characteristics (temperature sensitivity and neuroattenuation), (ii) MRC-5 cells support the production of acceptable virus yields, and (iii) OPV replicated in the MRC-5 cell substrate is a highly efficient and safe vaccine. These results confirm previous reports that MRC-5 is a desirable cell substrate for the production of OPV. Copyright © 2011 Wiley-Liss, Inc.

  20. Complete sequence and comparative analysis of the chloroplast genome of Plinia trunciflora

    Directory of Open Access Journals (Sweden)

    Maria Eguiluz

    2017-11-01

    Plinia trunciflora, also known as jaboticaba, is a fruit tree of the Myrtaceae family native to Brazil. The species has great potential for fruit production. Because its leaves have a high content of essential oils and its fruits a high content of anthocyanins, the pharmaceutical industry is also increasingly interested in it. Nevertheless, few studies have focused on its molecular biology and genetic characterization. We herein report the complete chloroplast (cp) genome of P. trunciflora, obtained using high-throughput sequencing, and compare it with other previously sequenced Myrtaceae genomes. The cp genome of P. trunciflora is 159,512 bp in size, comprising inverted repeats of 26,414 bp and single-copy regions of 88,097 bp (LSC) and 18,587 bp (SSC). The genome contains 111 single-copy genes (77 protein-coding, 30 tRNA, and four rRNA genes). Phylogenetic analysis using 57 cp protein-coding genes demonstrated that P. trunciflora, Eugenia uniflora, and Acca sellowiana form a cluster more closely related to Syzygium cumini than to Eucalyptus. The complete cp sequence reported here can be used in evolutionary and population genetics studies, helping to resolve the complex taxonomy of this species and to fill the gap in its genetic characterization.
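    As a quick consistency check on the region sizes above, a chloroplast genome with the usual quadripartite layout (one LSC, one SSC, and two copies of the IR) should have a total length of LSC + SSC + 2 × IR. The Python sketch below assumes that layout; the function name `cp_genome_length` is illustrative and not taken from the study.

```python
def cp_genome_length(lsc: int, ssc: int, ir: int) -> int:
    """Total length (bp) of a quadripartite chloroplast genome.

    Assumes the canonical LSC-IRb-SSC-IRa layout, so the inverted
    repeat (IR) length is counted twice.
    """
    return lsc + ssc + 2 * ir


# Region sizes reported for P. trunciflora.
total = cp_genome_length(lsc=88_097, ssc=18_587, ir=26_414)
print(total)  # 159512, matching the reported 159,512 bp
```

    The same check applies to any cp genome reported with its LSC, SSC, and IR sizes.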

  1. Genome size variation among and within Camellia species by using flow cytometric analysis.

    Directory of Open Access Journals (Sweden)

    Hui Huang

    BACKGROUND: The genus Camellia, belonging to the family Theaceae, is an economically important group of flowering plants. Frequent interspecific hybridization together with polyploidization has made its members taxonomically "difficult taxa". DNA content is often used to measure genome size variation and has greatly advanced our understanding of plant evolution and genome variation. The goals of this study were to investigate patterns of interspecific and intraspecific variation in DNA content and to explore genome size evolution in a phylogenetic context within the genus. METHODOLOGY/PRINCIPAL FINDINGS: The DNA amount in the genus was determined by propidium iodide flow cytometry for a total of 139 individual plants representing almost all sections of the two subgenera, Camellia and Thea. An improved WPB buffer proved suitable for Camellia species: it counteracted the negative effects of secondary metabolites and generated high-quality results with low coefficient of variation values (CV < 5%). Our results showed that the tissue used (flowers, leaves, or buds) and cytosolic compounds had only trivial effects on the estimated DNA amount. The DNA content of C. sinensis var. assamica was estimated by flow cytometry to be 1C = 3.01 pg, equivalent to a genome size of about 2940 Mb. CONCLUSION: Intraspecific and interspecific variation was observed in the genus Camellia, and, as expected, the latter was larger than the former. Our study suggests a directional trend of increasing genome size in the genus Camellia, probably owing to frequent polyploidization events.
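    The step from 1C = 3.01 pg to about 2940 Mb is a unit conversion. The abstract does not state which factor was used; the Python sketch below assumes the widely used value of roughly 978 Mb per picogram of DNA, and the names `MB_PER_PG` and `pg_to_mb` are illustrative.

```python
# Assumed conversion factor: about 978 Mb of DNA per picogram.
MB_PER_PG = 978


def pg_to_mb(c_value_pg: float) -> float:
    """Convert a 1C nuclear DNA content (pg) to genome size (Mb)."""
    return c_value_pg * MB_PER_PG


size_mb = pg_to_mb(3.01)  # C. sinensis var. assamica, 1C = 3.01 pg
print(f"{size_mb:.0f} Mb")  # 2944 Mb, i.e. "about 2940 Mb"
```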

  2. Inconsistencies of genome annotations in apicomplexan parasites revealed by 5'-end-one-pass and full-length sequences of oligo-capped cDNAs

    Directory of Open Access Journals (Sweden)

    Sugano Sumio

    2009-07-01

    Background: Apicomplexan parasites are the causative agents of various diseases, including malaria, and have been the targets of extensive genomic sequencing. We generated 5'-EST collections for six apicomplexan parasites using our full-length oligo-capping cDNA library method. To improve upon the current genome annotations, and to validate the importance of physical cDNA clone resources, we generated a large-scale collection of full-length cDNAs for several apicomplexan parasites. Results: In this study, we used a total of 61,056 5'-end single-pass cDNA sequences from Plasmodium falciparum, P. vivax, P. yoelii, P. berghei, Cryptosporidium parvum, and Toxoplasma gondii. We compared these partially sequenced cDNAs with the currently annotated gene models and observed significant inconsistencies between the two datasets. In particular, we found that on average 14% of the exons in the current gene models were not supported by any cDNA evidence, and that 16% of the current gene models may contain at least one mis-annotation and should be re-evaluated. We also identified a large number of previously unidentified transcripts. For 732 cDNAs in T. gondii, the entire sequences were determined in order to evaluate the annotated gene models at the level of complete full-length transcripts. We found that 41% of the T. gondii gene models contained at least one inconsistency. We also identified, and confirmed by RT-PCR, 140 previously unidentified transcripts in the intergenic regions of the current gene annotations. We show that the majority of these discrepancies are due to questionable predictions of one or two extra exons in the upstream or downstream regions of the genes. Conclusion: Our data indicate that the current gene models are likely still incomplete and have much room for improvement. Our unique full-length cDNA information is especially useful for further refinement of the annotations for the genomes of

  3. Genomic analysis suggests higher susceptibility of children to air pollution

    DEFF Research Database (Denmark)

    van Leeuwen, Danitsja M; Pedersen, Marie; Hendriksen, Peter J M

    2008-01-01

    Differences in biological responses to exposure to hazardous airborne substances between children and adults have been reported, suggesting that children are more susceptible. The aim of this study was to improve our understanding of differences in susceptibility to cancer risk associated with air pollution by comparing genome-wide gene expression profiles in peripheral blood of children and their parents. Gene expression analysis was performed in blood from children and parents living in two regions of the Czech Republic with different levels of air pollution. Data were analyzed by two ... modulated gene expression. In addition, gene expression in both children and adults was investigated for associations with micronucleus frequencies. Both analysis approaches returned considerably more genes or gene groups and pathways that differed significantly between children from the two regions than ...

  4. Use of application containers and workflows for genomic data analysis

    Science.gov (United States)

    Schulz, Wade L.; Durant, Thomas J. S.; Siddon, Alexa J.; Torres, Richard

    2016-01-01

    Background: The rapid acquisition of biological data and development of computationally intensive analyses has led to a need for novel approaches to software deployment. In particular, the complexity of common analytic tools for genomics makes them difficult to deploy and decreases the reproducibility of computational experiments. Methods: Recent technologies that allow for application virtualization, such as Docker, allow developers and bioinformaticians to isolate these applications and deploy secure, scalable platforms that have the potential to dramatically increase the efficiency of big data processing. Results: While limitations exist, this study demonstrates a successful implementation of a pipeline with several discrete software applications for the analysis of next-generation sequencing (NGS) data. Conclusions: With this approach, we significantly reduced the amount of time needed to perform clonal analysis from NGS data in acute myeloid leukemia. PMID:28163975

  5. Establishing a framework for comparative analysis of genome sequences

    Energy Technology Data Exchange (ETDEWEB)

    Bansal, A.K.

    1995-06-01

    This paper describes a framework and a high-level language toolkit for comparative analysis of genome sequence alignments. The framework integrates information derived from a multiple sequence alignment and a phylogenetic tree (a hypothetical tree of evolution) to derive new properties of sequences. Multiple sequence alignments are treated as an abstract data type. Abstract operations are described for manipulating a multiple sequence alignment and for deriving mutation-related information from a phylogenetic tree by superimposing parsimony analysis. The framework has been applied to protein alignments to derive constrained columns (in a multiple sequence alignment) that exhibit evolutionary pressure to preserve a common property despite mutation. A Prolog toolkit based on the framework has been implemented and demonstrated on alignments containing 3000 sequences and 3904 columns.
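    The idea of a constrained column can be illustrated with a small Python sketch: a column in which the residues differ across sequences but all fall into the same property class. This is a simplification under stated assumptions, not the Prolog toolkit described above: it ignores the phylogenetic tree and the parsimony step entirely, and the `PROPERTY` classes are a hypothetical, coarse grouping of amino acids.

```python
from typing import Dict, List

# Hypothetical, coarse property classes; a real analysis would choose its own
# classification (hydrophobicity, charge, size, ...).
PROPERTY: Dict[str, str] = {
    **{aa: "hydrophobic" for aa in "AVLIMFWC"},
    **{aa: "polar" for aa in "STNQY"},
    **{aa: "special" for aa in "GP"},
    **{aa: "positive" for aa in "KRH"},
    **{aa: "negative" for aa in "DE"},
}


def constrained_columns(alignment: List[str]) -> List[int]:
    """Indices of columns whose residues differ but share one property class.

    Gaps ('-') are ignored; columns containing an unclassified residue are skipped.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    hits = []
    for col in range(length):
        residues = {seq[col] for seq in alignment if seq[col] != "-"}
        if len(residues) < 2:
            continue  # identical residues or all gaps: no mutation in this column
        classes = {PROPERTY.get(r) for r in residues}
        if len(classes) == 1 and None not in classes:
            hits.append(col)  # residues changed, property preserved
    return hits


print(constrained_columns(["MKVLA", "MRILG", "MKVLD"]))  # [1, 2]
```

    In the example, column 1 (K/R) and column 2 (V/I) change residue but keep their class, while column 4 (A/G/D) changes class and is not reported.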

  6. Molecular and Biological Characterization of an Isolate of Cucumber mosaic virus from Glycine soja by Generating its Infectious Full-genome cDNA Clones

    Directory of Open Access Journals (Sweden)

    Mi Sa Vo Phan

    2014-06-01

    The molecular and biological characteristics of an isolate of Cucumber mosaic virus (CMV) from Glycine soja (wild soybean), named CMV-209, were examined in this study. Nucleotide sequence comparison and phylogenetic analyses of CMV-209 and other CMV strains revealed that CMV-209 belongs to CMV subgroup I. However, CMV-209 showed some genetic distance from the CMV strains assigned to subgroup IA or subgroup IB. Infectious full-genome cDNA clones of CMV-209 were generated under the control of the Cauliflower mosaic virus 35S promoter. The infectivity of the CMV-209 clones was evaluated in Nicotiana benthamiana and various legume species. Our assays revealed that CMV-209 could systemically infect Glycine soja (wild soybean) and Pisum sativum (pea) as well as N. benthamiana, but not the other legume species.

  7. Genome-wide DNA methylation patterns and transcription analysis in sheep muscle.

    Directory of Open Access Journals (Sweden)

    Christine Couldrey

    DNA methylation plays a central role in regulating many aspects of growth and development in mammals through the regulation of gene expression. The development of next-generation sequencing technologies has paved the way for genome-wide, high-resolution analysis of DNA methylation landscapes using a methodology known as reduced representation bisulfite sequencing (RRBS). While RRBS has proven effective for understanding DNA methylation landscapes in humans, mice, and rats, few studies to date have used this powerful method to investigate DNA methylation in agricultural animals. Here we describe the use of RRBS to investigate DNA methylation in sheep Longissimus dorsi muscle. RRBS analysis of ∼1% of the genome from Longissimus dorsi muscle provided data of suitably high precision and accuracy for DNA methylation analysis at all levels of resolution, from genome-wide to individual nucleotides. Combining RRBS data with mRNAseq data allowed the sheep Longissimus dorsi muscle methylome to be compared with methylomes from other species. While some species differences were identified, many similarities were observed between DNA methylation patterns in sheep and in other, more commonly studied species. The RRBS data presented here highlight the complexity of the epigenetic regulation of genes. However, the similarities observed across species are promising, in that knowledge gained from epigenetic studies in humans and mice may be applied, with caution, to agricultural species. The ability to accurately measure DNA methylation in agricultural animals will add a further layer of information to the genetic analyses currently used to maximise production gains in these species.

  8. Genome-scale metabolic network validation of Shewanella oneidensis using transposon insertion frequency analysis.

    Directory of Open Access Journals (Sweden)

    Hong Yang

    2014-09-01

    Transposon mutagenesis, in combination with parallel sequencing, is becoming a powerful tool for en masse mutant analysis. A probability generating function was used to explain observed miniHimar transposon insertion patterns, and gene essentiality calls were made by transposon insertion frequency analysis (TIFA). TIFA incorporated the observed genome and sequence-motif bias of the miniHimar transposon. The gene essentiality calls were compared with (1) previous genome-wide direct gene essentiality assignments and (2) flux balance analysis (FBA) predictions from an existing genome-scale metabolic model of Shewanella oneidensis MR-1. A three-way comparison among FBA, TIFA, and the direct essentiality calls was made to validate the TIFA approach. The refined interpretation of observed transposon insertions demonstrated that genes without insertions are not necessarily essential, and that genes containing insertions are not always nonessential. The TIFA calls were in reasonable agreement with the direct essentiality calls for S. oneidensis, but agreed more closely with E. coli essentiality calls for orthologs. The TIFA gene essentiality calls were in good agreement with the MR-1 FBA essentiality predictions, and the agreement between TIFA and FBA predictions was substantially better than that between FBA and the direct gene essentiality predictions.
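    The point that genes without insertions are not necessarily essential can be made concrete with a deliberately simple null model, which is not TIFA itself: assume each insertion lands independently and uniformly at one of the genome's target sites (TA dinucleotides for a mariner-family transposon such as miniHimar), ignoring the genome and motif biases that TIFA accounts for. Under that assumption, a nonessential gene containing k of T sites receives none of N insertions with probability (1 - k/T)^N. The function name and the numbers in the example are hypothetical.

```python
def prob_no_insertion(sites_in_gene: int, total_sites: int, insertions: int) -> float:
    """Probability that a nonessential gene receives zero insertions.

    Null model (assumption): each of `insertions` events lands independently
    and uniformly at one of `total_sites` target sites, `sites_in_gene` of
    which lie inside the gene.
    """
    if total_sites <= 0 or not 0 <= sites_in_gene <= total_sites:
        raise ValueError("require total_sites > 0 and 0 <= sites_in_gene <= total_sites")
    if insertions < 0:
        raise ValueError("insertions must be non-negative")
    return (1 - sites_in_gene / total_sites) ** insertions


# Hypothetical example: a short gene with 5 target sites, a genome with
# 200,000 sites, and 50,000 independent insertions.
p = prob_no_insertion(sites_in_gene=5, total_sites=200_000, insertions=50_000)
print(round(p, 2))  # 0.29
```

    Under these assumptions, even tens of thousands of insertions would miss a short nonessential gene roughly 29% of the time, so an empty gene alone is weak evidence of essentiality.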

  9. A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis

    Directory of Open Access Journals (Sweden)

    Stajich Jason E

    2006-11-01

    Background: To date, most fungal phylogenies have been derived from single-gene comparisons or from concatenated alignments of a small number of genes. The increase in fungal genome sequencing presents an opportunity to reconstruct evolutionary events using entire genomes. As a tool for future comparative, phylogenomic, and phylogenetic studies, we used both supertrees and concatenated alignments to infer relationships among 42 species of fungi for which complete genome sequences are available. Results: A dataset of 345,829 genes was extracted from 42 publicly available fungal genomes. Supertree methods were employed to derive phylogenies from 4,805 single-gene families. We found that the average consensus supertree method may suffer from long-branch attraction artifacts, while matrix representation with parsimony (MRP) appears to be immune to these. A genome phylogeny was also reconstructed from a concatenated alignment of 153 universally distributed orthologs. Our MRP supertree and concatenated phylogeny are highly congruent. Within the Ascomycota, the subphyla Pezizomycotina and Saccharomycotina were resolved. Both phylogenies infer that the Leotiomycetes are the closest sister group to the Sordariomycetes. There is some ambiguity regarding the placement of Stagonospora nodorum, the sole member of the class Dothideomycetes present in the dataset. Within the Saccharomycotina, a monophyletic clade containing organisms that translate CTG as serine instead of leucine is evident. There is also strong support for two groups within the CTG clade, one containing the fully sexual species Candida lusitaniae, Candida guilliermondii, and Debaryomyces hansenii, and the second containing Candida albicans, Candida dubliniensis, Candida tropicalis, Candida parapsilosis, and Lodderomyces elongisporus. The second major clade within the Saccharomycotina contains species whose genomes have undergone a whole-genome duplication (WGD), and their close

  10. Coevolution of aah: A dps-Like Gene with the Host Bacterium Revealed by Comparative Genomic Analysis

    Directory of Open Access Journals (Sweden)

    Liyan Ping

    2012-01-01

    A protein named AAH was isolated from the bacterium Microbacterium arborescens SE14, a gut commensal of lepidopteran larvae. It showed not only high sequence similarity to Dps-like proteins (DNA-binding proteins from starved cells) but also reversible hydrolase activity. A comparative genomic analysis was performed to gain more insight into its evolution. The GC profile of the aah gene indicated that it evolved from a low-GC ancestor. Its stop codon usage also differed from the general pattern of Actinobacterial genomes. The phylogeny of Dps-like proteins showed a strong correlation with the phylogeny of the host bacteria. A conserved genomic synteny was identified in some taxonomically related Actinobacteria, suggesting that the ancestral genes had been incorporated into the genome before the divergence of the Micrococcineae from other families. The aah gene has evolved a new function but still retains the typical dodecameric structure.

  11. Complete mitochondrial genome of the aluminum-tolerant fungus Rhodotorula taiwanensis RS1 and comparative analysis of Basidiomycota mitochondrial genomes.

    Science.gov (United States)

    Zhao, Xue Qiang; Aizawa, Tomoko; Schneider, Jessica; Wang, Chao; Shen, Ren Fang; Sunairi, Michio

    2013-04-01

    The complete mitochondrial genome of Rhodotorula taiwanensis RS1, an aluminum-tolerant Basidiomycota fungus, was determined and compared with the known mitochondrial genomes of 12 Basidiomycota species. The mitochondrial genome of R. taiwanensis RS1 is a circular DNA molecule of 40,392 bp and encodes the typical 15 mitochondrial proteins, 23 tRNAs, and small and large rRNAs as well as 10 intronic open reading frames. These genes are apparently transcribed in two directions and do not show syntenies in gene order with other investigated Basidiomycota species. The average G+C content (41%) of the mitochondrial genome of R. taiwanensis RS1 is the highest among the Basidiomycota species. Two introns were detected in the sequence of the atp9 gene of R. taiwanensis RS1, but not in that of other Basidiomycota species. Rhodotorula taiwanensis is the first species of the genus Rhodotorula whose full mitochondrial genome has been sequenced; and the data presented here supply valuable information for understanding the evolution of fungal mitochondrial genomes and researching the mechanism of aluminum tolerance in microorganisms. © 2013 The Authors. Published by Blackwell Publishing Ltd.

  12. Complete sequence and analysis of plastid genomes of two economically important red algae: Pyropia haitanensis and Pyropia yezoensis.

    Directory of Open Access Journals (Sweden)

    Li Wang

    Pyropia haitanensis and P. yezoensis are two economically important marine crops that are also considered research models for studying the physiological ecology of intertidal seaweed communities, the evolutionary biology of plastids, and the origins of sexual reproduction. This plastid genome information will facilitate studies of breeding, population genetics, and phylogenetics. Using next-generation sequencing, we fully sequenced the circular plastid genomes of P. haitanensis (195,597 bp) and P. yezoensis (191,975 bp), the largest of all plastid genomes of the red lineage sequenced to date. The organization and gene content of the two plastids were similar, with 211–213 protein-coding genes (including 29–31 ORFs of unknown function), 37 tRNA genes, and 6 ribosomal RNA genes, suggesting the largest coding capacity in the red lineage. In each genome, 14 protein genes overlapped and no interrupted genes were found, indicating a high degree of genomic condensation. Pyropia species maintain an ancient gene content and conserved gene clusters in their plastid genomes, containing nearly complete repertoires of the plastid genes known in photosynthetic eukaryotes. Similarity analysis based on whole plastid genome sequences showed that the distance between P. haitanensis and P. yezoensis (0.146) was much smaller than the distances between Porphyra purpurea and P. haitanensis (0.250) and P. yezoensis (0.251); this supports regrouping the two species in the resurrected genus Pyropia while maintaining P. purpurea in the genus Porphyra. Phylogenetic analysis supports a sister relationship between Bangiophyceae and Florideophyceae, though the precise phylogenetic relationships between multicellular red algae and chromists were not fully resolved. These results indicate that Pyropia species have compact plastid genomes. Large coding capacity and long intergenic regions contribute to the size of the largest plastid genomes reported for the red lineage. Possessing the largest coding capacity and ancient gene

  13. Genome analysis of multiple pathogenic isolates of Streptococcus agalactiae: Implications for the microbial "pan-genome"

    NARCIS (Netherlands)

    Tettelin, H; Masignani; Cieslewicz, MJ; Donati, C; Medini, D; Ward, NL; Angiuoli, SV; Crabtree, J; Jones, AL; Durkin, AS; DeBoy, RT; Davidsen, TM; Mora, M; Scarselli, M; Ros, IMY; Peterson, JD; Hauser, CR; Sundaram, JP; Nelson, WC; Madupu, R; Brinkac, LM; Dodson, RJ; Rosovitz, MJ; Sullivan, SA; Daugherty, SC; Haft, DH; Selengut, J; Gwinn, ML; Zhou, LW; Zafar, N; Khouri, H; Radune, D; Dimitrov, G; Watkins, K; O'Connor, KJB; Smith, S; Utterback, TR; White, O; Rubens, CE; Grandi, G; Madoff, LC; Kasper, DL; Telford, JL; Wessels, MR; Rappuoli, R; Fraser, CM

    2005-01-01

    The development of efficient and inexpensive genome sequencing methods has revolutionized the study of human bacterial pathogens and improved vaccine design. Unfortunately, the sequence of a single genome does not reflect how genetic variability drives pathogenesis within a bacterial species and

  14. Characterization and Comparative Analysis of the Complete Chloroplast Genome of the Critically Endangered Species Streptocarpus teitensis (Gesneriaceae)

    Directory of Open Access Journals (Sweden)

    Cornelius M. Kyalo

    2018-01-01

    Streptocarpus teitensis (Gesneriaceae) is an endemic species listed as critically endangered on the International Union for Conservation of Nature (IUCN) Red List of Threatened Species. However, sequence and genome information for this species remains limited. In this article, we present the complete chloroplast genome structure of Streptocarpus teitensis and its evolution, inferred through comparative studies with other related species. The S. teitensis chloroplast genome is 153,207 bp in size and contains a pair of inverted repeats (IRs) of 25,402 bp each, separated by small and large single-copy (SSC and LSC) regions of 18,300 and 84,103 bp, respectively. The chloroplast genome was observed to contain 116 unique genes, of which 80 are protein-coding, 32 are transfer RNAs, and four are ribosomal RNAs. In addition, a total of 196 SSR markers were detected in the chloroplast genome of Streptocarpus teitensis, with mononucleotides being the majority (57.1%), followed by trinucleotides (33.2%), dinucleotides and tetranucleotides (4.1% each), and pentanucleotides being the least frequent (1.5%). Genome alignment indicated that this genome was comparable to those of other sequenced members of the order Lamiales. The phylogenetic analysis suggested that Streptocarpus teitensis is closely related to Lysionotus pauciflorus and Dorcoceras hygrometricum.

  15. Multi-platform next-generation sequencing of the domestic turkey (Meleagris gallopavo): genome assembly and analysis.

    Directory of Open Access Journals (Sweden)

    Rami A Dalloul

    2010-09-01

    A synergistic combination of two next-generation sequencing platforms with a detailed comparative BAC physical contig map provided a cost-effective assembly of the genome sequence of the domestic turkey (Meleagris gallopavo). Heterozygosity of the sequenced source genome allowed the discovery of more than 600,000 high-quality single nucleotide variants. Despite this heterozygosity, the current genome assembly (∼1.1 Gb) includes 917 Mb of sequence assigned to specific turkey chromosomes. Annotation identified nearly 16,000 genes, with 15,093 recognized as protein-coding and 611 as non-coding RNA genes. Comparative analysis of the turkey, chicken, and zebra finch genomes, and comparison of avian with mammalian species, supports the characteristic stability of avian genomes and identifies genes unique to the avian lineage. Clear differences are seen in the number and variety of genes of the avian immune system, where expansions and novel genes are less frequent than examples of gene loss. The turkey genome sequence provides resources to further understand the evolution of vertebrate genomes and the genetic variation underlying economically important quantitative traits in poultry. This integrated approach may be a model for providing both gene- and chromosome-level assemblies of other species of agricultural, ecological, and evolutionary interest.

  16. The complete chloroplast genome sequence of Aconitum coreanum and Aconitum carmichaelii and comparative analysis with other Aconitum species.

    Directory of Open Access Journals (Sweden)

    Inkyu Park

    Aconitum species (belonging to the Ranunculaceae) are well-known herbaceous medicinal ingredients and have great economic value in Asian countries. However, genomic resources for Aconitum species remain limited. In this study, we sequenced the chloroplast (cp) genomes of two Aconitum species, A. coreanum and A. carmichaelii, using the MiSeq platform. The two Aconitum chloroplast genomes were 155,880 and 157,040 bp in length, respectively, and exhibited LSC and SSC regions separated by a pair of inverted repeat regions. Both cp genomes had 38% GC content and contained 131 unique functional genes, including 86 protein-coding genes, eight ribosomal RNA genes, and 37 transfer RNA genes. The gene order, content, and orientation of the two Aconitum cp genomes exhibited the general structure of angiosperms and were similar to those of other Aconitum species. Comparison of cp genome structure and gene order with those of other Aconitum species revealed general contraction and expansion of the inverted repeat regions and single-copy boundary regions. Divergent regions were also identified. In the phylogenetic analysis, the position of Aconitum species within the Ranunculaceae was determined using cp genomes from other families in the Ranunculales. We obtained a barcoding target sequence in a divergent region, ndhC-trnV, and successfully developed a SCAR (sequence characterized amplified region) marker for discrimination of A. coreanum. Our results provide useful genetic information and a specific barcode for the discrimination of Aconitum species.

  17. The tiger genome and comparative analysis with lion and snow leopard genomes.

    Science.gov (United States)

    Cho, Yun Sung; Hu, Li; Hou, Haolong; Lee, Hang; Xu, Jiaohui; Kwon, Soowhan; Oh, Sukhun; Kim, Hak-Min; Jho, Sungwoong; Kim, Sangsoo; Shin, Young-Ah; Kim, Byung Chul; Kim, Hyunmin; Kim, Chang-Uk; Luo, Shu-Jin; Johnson, Warren E; Koepfli, Klaus-Peter; Schmidt-Küntzel, Anne; Turner, Jason A; Marker, Laurie; Harper, Cindy; Miller, Susan M; Jacobs, Wilhelm; Bertola, Laura D; Kim, Tae Hyung; Lee, Sunghoon; Zhou, Qian; Jung, Hyun-Ju; Xu, Xiao; Gadhvi, Priyvrat; Xu, Pengwei; Xiong, Yingqi; Luo, Yadan; Pan, Shengkai; Gou, Caiyun; Chu, Xiuhui; Zhang, Jilin; Liu, Sanyang; He, Jing; Chen, Ying; Yang, Linfeng; Yang, Yulan; He, Jiaju; Liu, Sha; Wang, Junyi; Kim, Chul Hong; Kwak, Hwanjong; Kim, Jong-Soo; Hwang, Seungwoo; Ko, Junsu; Kim, Chang-Bae; Kim, Sangtae; Bayarlkhagva, Damdin; Paek, Woon Kee; Kim, Seong-Jin; O'Brien, Stephen J; Wang, Jun; Bhak, Jong

    2013-01-01

    Tigers and their close relatives (Panthera) are some of the world's most endangered species. Here we report the de novo assembly of an Amur tiger whole-genome sequence as well as the genomic sequences of a white Bengal tiger, African lion, white African lion and snow leopard. Through comparative genetic analyses of these genomes, we find genetic signatures that may reflect molecular adaptations consistent with the big cats' hypercarnivorous diet and muscle strength. We report a snow leopard-specific genetic determinant in EGLN1 (Met39>Lys39), which is likely to be associated with adaptation to high altitude. We also detect a TYR260G>A mutation likely responsible for the white lion coat colour. Tiger and cat genomes show similar repeat composition and an appreciably conserved synteny. Genomic data from the five big cats provide an invaluable resource for resolving easily identifiable phenotypes evident in very close, but distinct, species. PMID:24045858

  19. Genome sequence analysis of the model grass Brachypodium distachyon: insights into grass genome evolution

    Energy Technology Data Exchange (ETDEWEB)

    Schulman, Al

    2009-08-09

    Three subfamilies of grasses, the Ehrhartoideae (rice), the Panicoideae (maize, sorghum, sugar cane and millet), and the Pooideae (wheat, barley and cool season forage grasses), provide the basis of human nutrition and are poised to become major sources of renewable energy. Here we describe the complete genome sequence of the wild grass Brachypodium distachyon (Brachypodium), the first member of the Pooideae subfamily to be completely sequenced. Comparison of the Brachypodium, rice and sorghum genomes reveals a precise sequence-based history of genome evolution across a broad diversity of the grass family and identifies nested insertions of whole chromosomes into centromeric regions as a predominant mechanism driving chromosome evolution in the grasses. The relatively compact genome of Brachypodium is maintained by a balance of retroelement replication and loss. The complete genome sequence of Brachypodium, coupled to its exceptional promise as a model system for grass research, will support the development of new energy and food crops.

  20. Genome analysis of two type 6 echovirus (E6) strains recovered from sewage specimens in Greece in 2006.

    Science.gov (United States)

    Kyriakopoulou, Zaharoula; Pliaka, Vaia; Tsakogiannis, Dimitris; Ruether, Irina G A; Komiotis, Dimitris; Gartzonika, Constantina; Levidiotou-Stefanou, Stamatina; Markoulatos, Panayotis

    2012-04-01

    Echovirus 6 (E6) is one of the main enteroviral serotypes isolated from cases of aseptic meningitis and encephalitis in Greece in recent years. Two E6 strains (LR51A5 and LR61G3) were isolated from the sewage treatment plant in Larissa, Greece, in May 2006, one year before E6 strains were characterized from aseptic meningitis cases. The two isolates were initially found to be intra-serotypic recombinants in the VP1 genomic region, a finding that prompted a full genome sequence analysis. In the present study, nucleotide, amino acid, and phylogenetic analyses were conducted for all genomic regions. Recombination events were detected with SimPlot and Bootscan analyses. The close phylogenetic relationship of strains LR51A5 and LR61G3 with E30 isolated in France in 2002-2005 throughout the 2C-3D genomic region indicated that the two strains were recombinants. SimPlot and Bootscan analyses confirmed that LR51A5 and LR61G3 carry an inter-serotypic recombination in the 2C genomic region. The present study provides evidence that recombination events occurred in the VP1 (intra-serotypic) and non-capsid (inter-serotypic) regions during the evolution of LR51A5 and LR61G3, supporting the view that the genomes of circulating enteroviruses are mosaics of genomic regions from viral strains of the same or different serotypes. In conclusion, full genome sequence analysis of circulating enteroviral strains is a prerequisite for understanding the complexity of enterovirus evolution.
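
SimPlot-style analyses scan an alignment with a sliding window and plot the similarity of a query to each candidate parental sequence; a point where the query switches from resembling one parent to resembling another suggests a recombination breakpoint. The Python sketch below is a simplified stand-in under stated assumptions: it uses uncorrected percent identity on pre-aligned sequences with gap columns ignored, whereas SimPlot itself offers distance-model corrections and Bootscan relies on bootstrapped phylogenies. The function name and the default window and step sizes are illustrative.

```python
import math
from typing import List, Tuple


def window_identity(query: str, reference: str, window: int = 200,
                    step: int = 20) -> List[Tuple[int, float]]:
    """Return (window midpoint, proportion identical) along a pairwise alignment.

    Columns where either sequence has a gap ('-') are ignored; a window with
    no comparable columns yields NaN. This is plain p-identity, not a
    SimPlot distance model.
    """
    if len(query) != len(reference):
        raise ValueError("sequences must come from the same alignment (equal length)")
    if window <= 0 or step <= 0:
        raise ValueError("window and step must be positive")
    profile = []
    for start in range(0, len(query) - window + 1, step):
        q = query[start:start + window].upper()
        r = reference[start:start + window].upper()
        compared = [(a, b) for a, b in zip(q, r) if a != "-" and b != "-"]
        if compared:
            identity = sum(a == b for a, b in compared) / len(compared)
        else:
            identity = math.nan
        profile.append((start + window // 2, identity))
    return profile


if __name__ == "__main__":
    # Toy alignment: the query matches the reference only in its first half.
    print(window_identity("AAAAACCCCC", "AAAAAGGGGG", window=5, step=5))
```

Running `window_identity(query, parent_a)` and `window_identity(query, parent_b)` over the same alignment and plotting both profiles against the window midpoints gives the kind of similarity plot described above.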

  1. Complete genome sequencing and phylogenetic analysis of dengue type 1 virus isolated from Jeddah, Saudi Arabia.

    Science.gov (United States)

    Azhar, Esam I; Hashem, Anwar M; El-Kafrawy, Sherif A; Abol-Ela, Said; Abd-Alla, Adly M M; Sohrab, Sayed Sartaj; Farraj, Suha A; Othman, Norah A; Ben-Helaby, Huda G; Ashshi, Ahmed; Madani, Tariq A; Jamjoom, Ghazi

    2015-01-16

    Dengue viruses (DENVs) are mosquito-borne viruses that can cause disease ranging from mild fever to severe dengue infection. These viruses are endemic in several tropical and subtropical regions. Multiple outbreaks of DENV serotypes 1, 2 and 3 (DENV-1, DENV-2 and DENV-3) have been reported from the western region of Saudi Arabia since 1994. Strains from at least two genotypes of DENV-1 (Asia and America/Africa genotypes) were circulating in western Saudi Arabia until 2006. However, all previous studies from Saudi Arabia were based on partial sequencing of the envelope (E) gene, and no full genome sequences had been reported for any DENV serotype circulating in Saudi Arabia. Here, we report the isolation and the first complete genome sequence of a DENV-1 strain (DENV-1-Jeddah-1-2011) isolated from a patient in Jeddah, Saudi Arabia in 2011. Whole genome sequence alignment and phylogenetic analysis showed high similarity between the DENV-1-Jeddah-1-2011 strain and the D1/H/IMTSSA/98/606 isolate (Asian genotype) reported from Djibouti in 1998. Further analysis of the full envelope gene revealed a close relationship between DENV-1-Jeddah-1-2011 and isolates reported from Jeddah between 2004 and 2006, as well as recent isolates from Somalia, suggesting widespread circulation of the Asian genotype in this region. These data suggest that strains belonging to the Asian genotype may have been introduced into Saudi Arabia long before 2004, most probably by African pilgrims, and continued to circulate in western Saudi Arabia at least until 2011. Most importantly, these results indicate that pilgrims from dengue-endemic regions can play an important role in the spread of new DENVs in Saudi Arabia and the rest of the world. The availability of complete genome sequences will therefore serve as a reference for future epidemiological studies of DENV-1 viruses.

  2. Research study on analysis/use technologies of genome information; Genome joho kaidoku riyo gijutsu no chosa kenkyu

    Energy Technology Data Exchange (ETDEWEB)

    NONE

    1997-03-01

    To promote wide use of genome information in industry, the required R and D was surveyed from the standpoints of biology and information science. To clarify the present state and issues of international research on genome analysis, genome maps as well as sequence and function information are first surveyed. Current technologies for analyzing and using genome information are then analyzed, and the following are summarized: prediction and identification of gene regions in genome sequences, techniques for searching for and selecting useful genes, and techniques for predicting the expression of gene functions and the structures and functions of gene products. It is recommended that the R and D and the data collection/interpretation needed to clarify inter-gene interactions and information networks be promoted by integrating advanced Japanese know-how and technologies. As examples of the impact of the research results on industry and society, the present state and expected future effects are summarized for medicines, diagnosis/analysis instruments, chemicals, foods, agriculture, fishery, animal husbandry, electronics, environment and information. 278 refs., 42 figs., 5 tabs.

  3. Ethical considerations of research policy for personal genome analysis: the approach of the Genome Science Project in Japan.

    Science.gov (United States)

    Minari, Jusaku; Shirai, Tetsuya; Kato, Kazuto

    2014-12-01

    As exemplified by high-throughput sequencers, genomic technologies have recently undergone radical advances. These technologies enable comprehensive sequencing of personal genomes considerably more efficiently and less expensively than before. These developments challenge the conventional framework of biomedical ethics; under these changing circumstances, each research project has to develop a pragmatic research policy. Based on experience with a new large-scale project, the Genome Science Project, this article presents a novel approach to developing a specific policy for personal genome research in the Japanese context. In creating an original informed-consent form template for the project, we present a two-tiered process: drafting the template following an analysis of national and international policies, and refining the draft in conjunction with genome project researchers for practical application. Through practical use of the template, we have gained valuable experience in addressing challenges in the ethical review process, such as the importance of sharing details of the latest developments in genomics with members of research ethics committees. We discuss certain limitations of the conventional concept of informed consent and its governance system and suggest the potential of an alternative process using information technology.

  4. Genome-wide analysis of wild-type Epstein-Barr virus genomes derived from healthy individuals of the 1,000 Genomes Project.

    Science.gov (United States)

    Santpere, Gabriel; Darre, Fleur; Blanco, Soledad; Alcami, Antonio; Villoslada, Pablo; Mar Albà, M; Navarro, Arcadi

    2014-04-01

    Most people in the world (∼90%) are infected by the Epstein-Barr virus (EBV), which establishes itself permanently in B cells. Infection by EBV is related to a number of diseases including infectious mononucleosis, multiple sclerosis, and different types of cancer. So far, only seven complete EBV strains have been described, all of them coming from donors presenting EBV-related diseases. To perform a detailed comparative genomic analysis of EBV including, for the first time, EBV strains derived from healthy individuals, we reconstructed EBV sequences infecting lymphoblastoid cell lines (LCLs) from the 1000 Genomes Project. As strain B95-8 was used to transform B cells to obtain LCLs, it is always present, but a specific deletion in its genome sets it apart from natural EBV strains. After studying hundreds of individuals, we determined the presence of natural EBV in at least 10 of them and obtained a set of variants specific to wild-type EBV. By mapping the natural EBV reads onto the EBV reference genome (NC_007605), we constructed nearly complete wild-type viral genomes from three individuals. Adding them to the five disease-derived EBV genomic sequences available in the literature, we performed an in-depth comparative genomic analysis. We found that latency genes harbor more nucleotide diversity than lytic genes and that six out of nine latency-related genes, as well as other genes involved in viral attachment and entry into host cells, packaging, and the capsid, present the molecular signature of accelerated protein evolution rates, suggesting rapid host-parasite coevolution.

  5. BIGSdb: Scalable analysis of bacterial genome variation at the population level

    Directory of Open Access Journals (Sweden)

    Maiden Martin CJ

    2010-12-01

    Background: The opportunities for bacterial population genomics that are being realised by the application of parallel nucleotide sequencing require novel bioinformatics platforms. These must be capable of the storage, retrieval, and analysis of linked phenotypic and genotypic information in an accessible, scalable and computationally efficient manner. Results: The Bacterial Isolate Genome Sequence Database (BIGSdb) is a scalable, open source, web-accessible database system that meets these needs, enabling phenotype and sequence data, which can range from a single sequence read to whole genome data, to be efficiently linked for a limitless number of bacterial specimens. The system builds on the widely used mlstdbNet software, developed for the storage and distribution of multilocus sequence typing (MLST) data, and incorporates the capacity to define and identify any number of loci and genetic variants at those loci within the stored nucleotide sequences. These loci can be further organised into 'schemes' for isolate characterisation or for evolutionary or functional analyses. Isolates and loci can be indexed by multiple names and any number of alternative schemes can be accommodated, enabling cross-referencing of different studies and approaches. The LIMS functionality of the software enables linkage to and organisation of laboratory samples. The data are easily linked to external databases, and fine-grained access authentication permits multiple users to participate in community annotation by setting up or contributing to different schemes within the database. Some applications of BIGSdb are illustrated with the genera Neisseria and Streptococcus. The BIGSdb source code and documentation are available at http://pubmlst.org/software/database/bigsdb/. Conclusions: Genomic data can be used to characterise bacterial isolates in many different ways, but they can also be efficiently exploited for evolutionary or functional studies. BIGSdb
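
BIGSdb builds on the MLST idea of assigning an allele number to each distinct sequence at a locus and grouping loci into schemes whose allelic profiles define sequence types (STs). The toy, in-memory Python sketch below illustrates only that general idea; the class, method, and locus names are hypothetical, and it does not reflect BIGSdb's code, database schema, or web API. Real schemes are curated, so new alleles and STs are not normally assigned automatically as they are here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TypingScheme:
    """Hypothetical MLST-style scheme: loci -> allele numbers -> profile -> ST."""
    loci: List[str]
    # locus -> {sequence: allele number}
    alleles: Dict[str, Dict[str, int]] = field(default_factory=dict)
    # allelic profile (one allele number per locus, in scheme order) -> ST
    profiles: Dict[Tuple[int, ...], int] = field(default_factory=dict)

    def designate_allele(self, locus: str, seq: str) -> int:
        """Return the allele number for seq at locus, assigning the next one if unseen."""
        table = self.alleles.setdefault(locus, {})
        seq = seq.upper()
        if seq not in table:
            table[seq] = len(table) + 1
        return table[seq]

    def type_isolate(self, locus_seqs: Dict[str, str]) -> Tuple[Tuple[int, ...], int]:
        """Build the allelic profile for one isolate and return (profile, ST)."""
        missing = [locus for locus in self.loci if locus not in locus_seqs]
        if missing:
            raise KeyError(f"missing loci: {missing}")
        profile = tuple(self.designate_allele(locus, locus_seqs[locus]) for locus in self.loci)
        if profile not in self.profiles:
            self.profiles[profile] = len(self.profiles) + 1
        return profile, self.profiles[profile]


if __name__ == "__main__":
    scheme = TypingScheme(["locusA", "locusB"])
    print(scheme.type_isolate({"locusA": "ACGT", "locusB": "GGCC"}))
    print(scheme.type_isolate({"locusA": "ACGT", "locusB": "GGCA"}))
```

Keeping the profile as a tuple in scheme order makes it hashable, so the profile-to-ST lookup is a plain dictionary access.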

  6. Genome analysis of environmental and clinical P. aeruginosa isolates from sequence type-1146.

    Directory of Open Access Journals (Sweden)

    David Sánchez

    The genomes of four Pseudomonas aeruginosa isolates of the new sequence type ST-1146, three environmental (P37, P47 and P49) and one clinical (SD9), with differing antibiotic susceptibility profiles have been sequenced and analysed. The genomes were mapped against P. aeruginosa PAO1-UW and UCBPP-PA14. The allelic profiles showed that the highest numbers of differences were in the "Related to phage, transposon or plasmid" and "Secreted factors" categories. The clinical isolate had more exclusive alleles than the environmental isolates. The phage Pf1 region in isolate SD9 accumulated the highest number of nucleotide substitutions. ORF analysis of the four de novo assembled genomes indicated that the number of isolate-specific genes was higher in isolate SD9 (132 genes) than in isolates P37 (24 genes), P47 (16 genes) and P49 (21 genes). CRISPR elements were found in all isolates, and SD9 showed differences in the spacer region. Genes related to bacteriophages F116 and H66 were found only in isolate SD9. Genome comparisons indicated that the ST-1146 isolates are closely related and that most genes implicated in pathogenicity are highly conserved, suggesting that the environmental isolates have a genetic potential for infectivity similar to that of the clinical one. Phage-related genes are responsible for the main differences among the genomes of ST-1146 isolates. The role of bacteriophages should be considered in the adaptation of isolates to the host and in microevolution studies.

  7. BGI-RIS: an integrated information resource and comparative analysis workbench for rice genomics

    DEFF Research Database (Denmark)

    Zhao, Wenming; Wang, Jing; He, Ximiao

    2004-01-01

    Rice is a major food staple for the world's population and serves as a model species in cereal genome research. The Beijing Genomics Institute (BGI) has long been devoting itself to sequencing, information analysis and biological research of the rice and other crop genomes. In order to facilitate....... Designed as a basic platform, BGI-RIS presents the sequenced genomes and related information in systematic and graphical ways for the convenience of in-depth comparative studies (http://rise.genomics.org.cn/). Publication date: 2004-Jan-1...

  8. The Complete Chloroplast Genome of Catha edulis: A Comparative Analysis of Genome Features with Related Species

    Science.gov (United States)

    Tembrock, Luke R.; Zheng, Shaoyu; Wu, Zhiqiang

    2018-01-01

    Qat (Catha edulis, Celastraceae) is a woody evergreen species of great economic and cultural importance. It is cultivated in East Africa and southwest Arabia for its stimulant alkaloids cathine and cathinone. However, genome information, especially DNA sequence resources, for C. edulis is limited, hindering studies of interspecific and intraspecific relationships. Herein, the complete chloroplast (cp) genome of Catha edulis is reported. This genome is 157,960 bp in length with 37% GC content and is structurally arranged into two 26,577 bp inverted repeats and two single-copy regions. The small single-copy and large single-copy regions are 18,491 bp and 86,315 bp in length, respectively. The C. edulis cp genome contains 129 genes, including 37 transfer RNA (tRNA) genes, 8 ribosomal RNA (rRNA) genes, and 84 protein-coding genes. Of these, 112 are single-copy genes and 17 are duplicated in the two inverted repeats (seven tRNA, four rRNA, and six protein-coding genes). The phylogenetic relationships resolved from the cp genomes of qat and 32 other species confirm the monophyly of Celastraceae. The cp genomes of C. edulis, Euonymus japonicus and seven Celastraceae species lack the rps16 intron, indicating that the intron was lost in an ancestor of this family. The cp genome of C. edulis provides a highly valuable genetic resource for further phylogenomic research, barcoding and cp transformation in Celastraceae. PMID:29425128
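
The region sizes and gene counts reported for the C. edulis plastome are internally consistent with the quadripartite structure, as a quick arithmetic check shows (LSC and SSC denote the large and small single-copy regions, IR one inverted repeat copy):

```latex
\begin{aligned}
L_{\text{total}} &= L_{\text{LSC}} + L_{\text{SSC}} + 2\,L_{\text{IR}} \\
  &= 86{,}315 + 18{,}491 + 2 \times 26{,}577 \\
  &= 104{,}806 + 53{,}154 = 157{,}960~\text{bp} \\[4pt]
N_{\text{genes}} &= 84 + 8 + 37 = 129 = 112 + 17,
  \qquad 17 = 7~(\text{tRNA}) + 4~(\text{rRNA}) + 6~(\text{protein-coding})
\end{aligned}
```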

  9. STINGRAY: system for integrated genomic resources and analysis.

    Science.gov (United States)

    Wagner, Glauber; Jardim, Rodrigo; Tschoeke, Diogo A; Loureiro, Daniel R; Ocaña, Kary A C S; Ribeiro, Antonio C B; Emmel, Vanessa E; Probst, Christian M; Pitaluga, André N; Grisard, Edmundo C; Cavalcanti, Maria C; Campos, Maria L M; Mattoso, Marta; Dávila, Alberto M R

    2014-03-07

    The STINGRAY system was designed to ease the integration, analysis, annotation and presentation of genomic and expression data from Sanger and Next Generation Sequencing (NGS) platforms. STINGRAY includes: (a) a complete, integrated workflow (more than 20 bioinformatics tools) ranging from functional annotation to phylogeny; (b) a MySQL database schema suitable for data integration and user access control; and (c) a user-friendly, web-based graphical interface that makes the system intuitive and facilitates data analysis and annotation. STINGRAY proved to be an easy-to-use and complete system for analyzing sequencing data. Although both Sanger and NGS platforms are supported, the system may run faster with Sanger data, since large NGS datasets can slow down MySQL database operations. STINGRAY is available at http://stingray.biowebdb.org and the open source code at http://sourceforge.net/projects/stingray-biowebdb/.

  10. Genome-Wide Analysis of the Synonymous Codon Usage Patterns in Riemerella anatipestifer

    Directory of Open Access Journals (Sweden)

    Jibin Liu

    2016-08-01

    Riemerella anatipestifer (RA) belongs to the Flavobacteriaceae family and can cause septicemic disease in poultry. The synonymous codon usage patterns of bacteria reflect a series of evolutionary changes that enable them to tolerate diverse environments. We characterized the codon usage patterns of RA isolates from the 12 available sequenced genomes using multiple codon usage indices and statistical analyses. Nucleotide composition and relative synonymous codon usage (RSCU) analysis revealed that A- or U-ending codons predominate in RA. Neutrality analysis found no significant correlation between GC12 and GC3 (p > 0.05). Correspondence analysis and ENc-plot results showed that natural selection dominated over mutation in shaping codon usage bias. The cluster analysis tree based on RSCU was concordant with the neighbor-joining dendrogram based on genomic BLAST. Comparative analysis found about 50 highly expressed genes, orthologous across all 12 strains, among the top 5% of CAI values. Based on these CAI values, we infer that RA contains a number of predicted highly expressed coding sequences involved in transcriptional regulation and metabolism, reflecting a requirement for dealing with diverse environmental conditions. These results provide useful information on the mechanisms that contribute to codon usage bias and the evolution of RA.
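
RSCU is conventionally defined as the observed count of a codon divided by the count expected if all synonymous codons for its amino acid were used equally, so values above 1 mark preferred codons. The Python sketch below computes RSCU and per-gene GC12/GC3 (the quantities behind a neutrality plot) under stated assumptions: it uses the NCBI standard genetic code, excludes stop codons from RSCU, and, for brevity, does not drop Met, Trp or stop codons from the GC12/GC3 calculation as many published pipelines do. Function names are illustrative, not the authors' code.

```python
from collections import Counter, defaultdict
from itertools import product

# NCBI standard genetic code (table 1), codons ordered TCAG x TCAG x TCAG.
# Bacterial table 11 assigns the same amino acids; it differs only in start codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

# Group sense codons by amino acid; stop codons ('*') are excluded.
SYNONYMOUS = defaultdict(list)
for codon, aa in CODON_TABLE.items():
    if aa != "*":
        SYNONYMOUS[aa].append(codon)


def count_codons(cds_list):
    """Count in-frame codons across coding sequences, skipping ambiguous ones."""
    counts = Counter()
    for seq in cds_list:
        seq = seq.upper().replace("U", "T")
        for i in range(0, len(seq) - 2, 3):  # trailing partial codon ignored
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:
                counts[codon] += 1
    return counts


def rscu(cds_list):
    """RSCU_ij = X_ij / ((1/n_i) * sum_j X_ij) for codon j of amino acid i."""
    counts = count_codons(cds_list)
    values = {}
    for aa, codons in SYNONYMOUS.items():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent: RSCU undefined, so omitted
        expected = total / len(codons)  # count under uniform synonymous usage
        for c in codons:
            values[c] = counts[c] / expected
    return values


def gc12_gc3(cds):
    """GC content averaged over codon positions 1 and 2, and at position 3."""
    seq = cds.upper().replace("U", "T")
    n = len(seq) // 3 * 3

    def gc(s):
        return sum(b in "GC" for b in s) / len(s) if s else 0.0

    gc1, gc2, gc3 = gc(seq[0:n:3]), gc(seq[1:n:3]), gc(seq[2:n:3])
    return (gc1 + gc2) / 2, gc3


if __name__ == "__main__":
    toy = "ATGGCAGCAGCTTAA"  # ATG GCA GCA GCT TAA
    r = rscu([toy])
    print({c: round(r[c], 3) for c in ("GCT", "GCC", "GCA", "GCG", "ATG")})
    print(gc12_gc3(toy))
```

In a neutrality analysis, GC12 would be regressed on GC3 across genes; a weak or absent relationship, as reported above, is usually read as selection rather than mutation pressure dominating codon usage.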

  11. Genome Analysis and Characterisation of the Exopolysaccharide Produced by Bifidobacterium longum subsp. longum 35624™.

    Directory of Open Access Journals (Sweden)

    Friedrich Altmann

    The Bifidobacterium longum subsp. longum 35624™ strain (formerly named Bifidobacterium longum subsp. infantis) is a well-described probiotic with clinical efficacy in irritable bowel syndrome clinical trials that induces immunoregulatory effects in mice and in humans. This paper presents (a) the genome sequence of the organism, allowing its correct assignment to the subspecies longum; (b) a comparative genome assessment with other B. longum strains; and (c) the molecular structure of the 35624 exopolysaccharide (EPS624). Comparative genome analysis of the 35624 strain with other B. longum strains determined that the subspecies of the strain is longum and revealed the presence of a 35624-specific gene cluster, predicted to encode the biosynthetic machinery for EPS624. Following isolation and acid treatment of the EPS, its chemical structure was determined using gas and liquid chromatography for sugar constituent and linkage analysis, electrospray and matrix-assisted laser desorption ionization mass spectrometry for sequencing, and NMR. The EPS consists of a branched hexasaccharide repeating unit containing two galactose and two glucose moieties, galacturonic acid and the unusual sugar 6-deoxy-L-talose. These data demonstrate that the B. longum 35624 strain has specific genetic features, one of which leads to the generation of a characteristic exopolysaccharide.

  12. Functional Analysis of Shewanella, a cross genome comparison.

    Energy Technology Data Exchange (ETDEWEB)

    Serres, Margrethe H.

    2009-05-15

    The bacterial genus Shewanella includes a group of highly versatile organisms that have successfully adapted to life in many environments, ranging from aquatic (fresh and marine) to sedimentary (lake and marine sediments, subsurface sediments, sea vents). A unique respiratory capability of the Shewanellas, initially observed for Shewanella oneidensis MR-1, is the ability to use metals and metalloids, including radioactive compounds, as electron acceptors. Members of the Shewanella genus have also been shown to degrade environmental pollutants, e.g. halogenated compounds, making this group highly relevant to the DOE mission. S. oneidensis MR-1 has in addition been found to utilize a diverse set of nutrients and to have a large set of genes dedicated to regulation and to sensing the environment. The sequencing of the S. oneidensis MR-1 genome facilitated experimental and bioinformatics analyses by a group of collaborating researchers, the Shewanella Federation. Through this joint effort, and with support from the Department of Energy, S. oneidensis MR-1 has become a model organism of study. Our work has been a functional analysis of S. oneidensis MR-1, both by itself and as part of a comparative study. We have improved the annotation of gene products, assigned metabolic functions, and analyzed protein families present in S. oneidensis MR-1. These data have been applied to the analysis of experimental data (e.g. gene expression, proteome) generated for S. oneidensis MR-1. Further, this work has formed the basis for a comparative study of over 20 members of the Shewanella genus. The species and strains selected for genome sequencing represented an evolutionary gradient of DNA relatedness, ranging from close to intermediate to distant. The organisms selected have also adapted to a variety of ecological niches. Through our work we have been able to detect and interpret genome similarities and differences between members of the genus. We have in this way contributed to the

  13. Be-Breeder - an application for analysis of genomic data in plant breeding

    OpenAIRE

    Matias,Filipe Inácio; Granato,Italo Stefanine Correa; Dequigiovanni,Gabriel; Fritsche-Neto,Roberto

    2017-01-01

    Be-Breeder is an application for the genetic breeding of plants, developed with the Shiny package of the R software, that allows a range of phenotypic and molecular (marker) analyses to be undertaken. The molecular data section of the Be-Breeder application makes it possible to perform quality control of genotyping data, obtain genomic kinship matrices, and carry out genomic selection, genome-wide association, and genetic diversity analyses simply and online. ...

  14. Molecular comparisons of full length metapneumovirus (MPV) genomes, including newly determined French AMPV-C and -D isolates, further supports possible subclassification within the MPV Genus.

    Directory of Open Access Journals (Sweden)

    Paul A Brown

    Four avian metapneumovirus (AMPV) subgroups (A-D) have been reported previously based on genetic and antigenic differences. However, until now, full-length sequences of the only known isolates of European subgroup C and subgroup D viruses (of duck and turkey origin, respectively) have been unavailable. These full-length sequences were determined and compared with other previously reported full-length AMPV and human metapneumovirus (HMPV) sequences, using phylogenetics, comparisons of nucleotide and amino acid sequences, and study of codon usage bias. The results confirmed that subgroup C viruses were more closely related to HMPV than to the other AMPV subgroups in the study, consistent with previous findings based on partial genome sequences. Closer relationships among AMPV-A, B and D were also evident throughout the majority of results. Three metapneumovirus "clusters", HMPV, AMPV-C and AMPV-A, B and D, were further supported by codon bias and phylogenetics. The data presented here, together with previous studies describing antigenic relationships between AMPV-A, B and D and between AMPV-C and HMPV, may call for a subclassification of metapneumoviruses similar to that used for avian paramyxoviruses, grouping AMPV-A, B and D as type I metapneumoviruses and AMPV-C and HMPV as type II.

  15. EG-13. GENOME-WIDE METHYLATION ANALYSIS IDENTIFIES GENOMIC DNA DEMETHYLATION DURING MALIGNANT PROGRESSION OF GLIOMAS

    Science.gov (United States)

    Saito, Kuniaki; Mukasa, Akitake; Nagae, Genta; Aihara, Koki; Otani, Ryohei; Takayanagi, Shunsaku; Omata, Mayu; Tanaka, Shota; Shibahara, Junji; Takahashi, Miwako; Momose, Toshimitsu; Shimamura, Teppei; Miyano, Satoru; Narita, Yoshitaka; Ueki, Keisuke; Nishikawa, Ryo; Nagane, Motoo; Aburatani, Hiroyuki; Saito, Nobuhito

    2014-01-01

    Low-grade gliomas often undergo malignant progression, and these transformations are a leading cause of death in patients with low-grade gliomas. However, the molecular mechanisms underlying malignant tumor progression are still not well understood. Recent evidence indicates that epigenetic deregulation is an important cause of gliomagenesis; we therefore examined the impact of epigenetic changes during malignant progression of low-grade gliomas. Specifically, we used the Illumina Infinium Human Methylation 450K BeadChip to perform genome-wide DNA methylation analysis of 120 gliomas and four normal brains. The sample included 25 matched pairs of initial low-grade gliomas and recurrent tumors (temporal heterogeneity), 20 of which recurred with malignant progression, and one matched pair of a newly emerging malignant lesion and a pre-existing lesion (spatial heterogeneity). Analysis of methylation profiles demonstrated that most low-grade gliomas in our sample (43/51; 84%) had a glioma CpG island methylator phenotype (G-CIMP). Remarkably, approximately 50% of secondary glioblastomas that had progressed from G-CIMP low-grade tumors exhibited a characteristic partial demethylation of genomic DNA during malignant progression, whereas other recurrent gliomas showed no apparent change in DNA methylation pattern. Interestingly, most loci that were demethylated during malignant progression were located outside CpG islands. Histone modification patterns in normal human astrocytes and embryonic stem cells also showed that the proportion of active marks at sites corresponding to the loci demethylated in G-CIMP-demethylated tumors was significantly lower, indicating that most of these demethylated loci were likely transcriptionally inactive. A small number of upregulated genes with demethylated CpG islands were associated with cell cycle-related pathways. In

  16. Reconstruction of the Transmission History of RNA Virus Outbreaks Using Full Genome Sequences: Foot-and-Mouth Disease Virus in Bulgaria in 2011

    DEFF Research Database (Denmark)

    Valdazo-González, Begoña; Polihronova, Lilyana; Alexandrov, Tsviatko

    2012-01-01

    the origin and transmission history of the FMD outbreaks which occurred during 2011 in Burgas Province, Bulgaria, a country that had been previously FMD-free-without-vaccination since 1996. Nineteen full genome sequences (FGS) of FMD virus (FMDV) were generated and analysed, including eight representative...... identified in wild boar. The closest relative from outside of Bulgaria was a FMDV collected during 2010 in Bursa (Anatolia, Turkey). Within Bulgaria, two discrete genetic clusters were detected that corresponded to two episodes of outbreaks that occurred during January and March-April 2011. The number...... of nucleotide substitutions that were present between, and within, these separate clusters provided evidence that undetected FMDV infection had occurred. These conclusions are supported by laboratory data that subsequently identified three additional FMDV-infected livestock premises by serosurveillance, as well...

  17. GEnomes Management Application (GEM.app): a new software tool for large-scale collaborative genome analysis.

    Science.gov (United States)

    Gonzalez, Michael A; Lebrigio, Rafael F Acosta; Van Booven, Derek; Ulloa, Rick H; Powell, Eric; Speziani, Fiorella; Tekin, Mustafa; Schüle, Rebecca; Züchner, Stephan

    2013-06-01

    Novel genes are now identified at a rapid pace for many Mendelian disorders, and increasingly, for genetically complex phenotypes. However, new challenges have also become evident: (1) effectively managing larger exome and/or genome datasets, especially for smaller labs; (2) direct hands-on analysis and contextual interpretation of variant data in large genomic datasets; and (3) many small and medium-sized clinical and research-based investigative teams around the world are generating data that, if combined and shared, will significantly increase the opportunities for the entire community to identify new genes. To address these challenges, we have developed GEnomes Management Application (GEM.app), a software tool to annotate, manage, visualize, and analyze large genomic datasets (https://genomics.med.miami.edu/). GEM.app currently contains ∼1,600 whole exomes from 50 different phenotypes studied by 40 principal investigators from 15 different countries. The focus of GEM.app is on user-friendly analysis for nonbioinformaticians to make next-generation sequencing data directly accessible. Yet, GEM.app provides powerful and flexible filter options, including single family filtering, across family/phenotype queries, nested filtering, and evaluation of segregation in families. In addition, the system is fast, obtaining results within 4 sec across ∼1,200 exomes. We believe that this system will further enhance identification of genetic causes of human disease. © 2013 Wiley Periodicals, Inc.
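
    The abstract lists "evaluation of segregation in families" among GEM.app's filters but does not describe how GEM.app implements it. The sketch below is a hypothetical illustration of what such a filter checks. Under a simple, fully penetrant dominant model, it keeps variants carried by every affected relative and by no unaffected relative. The data layout, the sample names and the function `segregating_variants` are all assumptions, not GEM.app's API.

```python
from typing import Dict, List, Set


def segregating_variants(
    genotypes: Dict[str, Set[str]],
    affected: List[str],
    unaffected: List[str],
) -> Set[str]:
    """Return variant IDs carried by all affected and no unaffected members.

    `genotypes` maps a (hypothetical) sample ID to the set of variant IDs it
    carries. This models a fully penetrant dominant trait only; recessive or
    de novo models would need different rules.
    """
    if not affected:
        return set()
    # Variants shared by every affected individual (a new set, inputs untouched).
    shared = set.intersection(*(genotypes[s] for s in affected))
    # Discard anything also seen in an unaffected relative.
    for s in unaffected:
        shared -= genotypes[s]
    return shared


# Hypothetical family, for illustration only.
family = {
    "father": {"chr1:1000A>G", "chr2:500C>T"},
    "mother": {"chr3:42G>A"},
    "child1": {"chr1:1000A>G", "chr2:500C>T", "chr3:42G>A"},
    "child2": {"chr1:1000A>G", "chr3:42G>A"},
}
print(segregating_variants(family, affected=["father", "child1"],
                           unaffected=["mother", "child2"]))
```

    Here only `chr2:500C>T` survives, because `chr1:1000A>G` also occurs in the unaffected `child2`.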

  18. Genome-Wide Prediction and Analysis of 3D-Domain Swapped Proteins in the Human Genome from Sequence Information.

    Science.gov (United States)

    Upadhyay, Atul Kumar; Sowdhamini, Ramanathan

    2016-01-01

    3D-domain swapping is one of the mechanisms of protein oligomerization, and proteins exhibiting this phenomenon have many biological functions. These proteins have attracted much attention owing to their involvement in human diseases such as conformational diseases, amyloidosis, serpinopathies, and proteinopathies. Early identification of proteins across the human genome that tend to domain swap would aid many aspects of disease control and management. Predictive models were developed using machine learning approaches, with an average accuracy of 78% (85.6% sensitivity, 87.5% specificity, and an MCC of 0.72), to predict putative domain swapping in protein sequences. These models were applied to many complete genomes, with special emphasis on the human genome. Nearly 44% of the protein sequences in the human genome were predicted positive for domain swapping. Enrichment analysis was performed on the positively predicted human sequences for their domain distribution, disease association, and functional importance based on Gene Ontology (GO). Enrichment analysis was also performed to better understand the functional importance of these sequences. Finally, we developed a method to predict the hinge region in a given putative domain-swapped sequence using important physicochemical properties of amino acids.
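
    The abstract above reports accuracy, sensitivity, specificity and the Matthews correlation coefficient (MCC). As a reminder of how these are defined from a binary confusion matrix, here is a minimal sketch. The counts below are hypothetical and do not come from the study.

```python
import math


def binary_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Standard metrics from a binary confusion matrix."""
    sensitivity = tp / (tp + fn)  # true positive rate (recall)
    specificity = tn / (tn + fp)  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is conventionally set to 0 when any marginal total is zero.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": accuracy, "sensitivity": sensitivity,
            "specificity": specificity, "mcc": mcc}


# Hypothetical counts, for illustration only.
print(binary_metrics(tp=80, fp=10, tn=90, fn=20))
```

    Accuracy equals sensitivity × (positives / total) + specificity × (negatives / total), so on any single test set it lies between the sensitivity and the specificity.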

  19. Susceptibility to Childhood Pneumonia: A Genome-Wide Analysis.

    Science.gov (United States)

    Hayden, Lystra P; Cho, Michael H; McDonald, Merry-Lynn N; Crapo, James D; Beaty, Terri H; Silverman, Edwin K; Hersh, Craig P

    2017-01-01

    Previous studies have indicated that in adult smokers, a history of childhood pneumonia is associated with reduced lung function and chronic obstructive pulmonary disease. There have been few previous investigations using genome-wide association studies to investigate genetic predisposition to pneumonia. This study aims to identify the genetic variants associated with the development of pneumonia during childhood and over the course of the lifetime. Study subjects included current and former smokers with and without chronic obstructive pulmonary disease participating in the COPDGene Study. Pneumonia was defined by subject self-report, with childhood pneumonia categorized as having the first episode at pneumonia (843 cases, 9,091 control subjects) and lifetime pneumonia (3,766 cases, 5,659 control subjects) were performed separately in non-Hispanic whites and African Americans. Non-Hispanic white and African American populations were combined in the meta-analysis. Top genetic variants from childhood pneumonia were assessed in network analysis. No single-nucleotide polymorphisms reached genome-wide significance, although we identified potential regions of interest. In the childhood pneumonia analysis, this included variants in NGR1 (P = 6.3 × 10⁻⁸), PAK6 (P = 3.3 × 10⁻⁷), and near MATN1 (P = 2.8 × 10⁻⁷). In the lifetime pneumonia analysis, this included variants in LOC339862 (P = 8.7 × 10⁻⁷), RAPGEF2 (P = 8.4 × 10⁻⁷), PHACTR1 (P = 6.1 × 10⁻⁷), near PRR27 (P = 4.3 × 10⁻⁷), and near MCPH1 (P = 2.7 × 10⁻⁷). Network analysis of the genes associated with childhood pneumonia included top networks related to development, blood vessel morphogenesis, muscle contraction, WNT signaling, DNA damage, apoptosis, inflammation, and immune response (P ≤ 0.05). We have identified genes potentially associated with the risk of pneumonia. Further research will be required to confirm these

  20. High-throughput SHAPE analysis reveals structures in HIV-1 genomic RNA strongly conserved across distinct biological states.

    Directory of Open Access Journals (Sweden)

    Kevin A Wilkinson

    2008-04-01

    Full Text Available Replication and pathogenesis of the human immunodeficiency virus (HIV) are tightly linked to the structure of its RNA genome, but genome structure in infectious virions is poorly understood. We invent high-throughput SHAPE (selective 2'-hydroxyl acylation analyzed by primer extension) technology, which uses many of the same tools as DNA sequencing, to quantify RNA backbone flexibility at single-nucleotide resolution and from which robust structural information can be immediately derived. We analyze the structure of HIV-1 genomic RNA in four biologically instructive states, including the authentic viral genome inside native particles. Remarkably, given the large number of plausible local structures, the first 10% of the HIV-1 genome exists in a single, predominant conformation in all four states. We also discover that noncoding regions functioning in a regulatory role have significantly lower (p-value < 0.0001) SHAPE reactivities, and hence more structure, than do viral coding regions that function as the template for protein synthesis. By directly monitoring protein binding inside virions, we identify the RNA recognition motif for the viral nucleocapsid protein. Seven structurally homologous binding sites occur in a well-defined domain in the genome, consistent with a role in directing specific packaging of genomic RNA into nascent virions. In addition, we identify two distinct motifs that are targets for the duplex destabilizing activity of this same protein. The nucleocapsid protein destabilizes local HIV-1 RNA structure in ways likely to facilitate initial movement both of the retroviral reverse transcriptase from its tRNA primer and of the ribosome in coding regions. Each of the three nucleocapsid interaction motifs falls in a specific genome domain, indicating that local protein interactions can be organized by the long-range architecture of an RNA. High-throughput SHAPE reveals a comprehensive view of HIV-1 RNA genome structure, and further

  1. Comparative genome sequence analysis of Choristoneura occidentalis Freeman and C. rosaceana Harris (Lepidoptera: Tortricidae) alphabaculoviruses.

    Directory of Open Access Journals (Sweden)

    David K Thumbi

    Full Text Available The complete genome sequences of Choristoneura occidentalis and C. rosaceana nucleopolyhedroviruses (ChocNPV and ChroNPV, respectively) (Baculoviridae: Alphabaculovirus) were determined and compared with each other and with those of other baculoviruses, including the genome of the closely related C. fumiferana NPV (CfMNPV). The ChocNPV genome was 128,446 bp in length (1147 bp smaller than that of CfMNPV), had a G+C content of 50.1%, and contained 148 open reading frames (ORFs). In comparison, the ChroNPV genome was 129,052 bp in length, had a G+C content of 48.6% and contained 149 ORFs. ChocNPV and ChroNPV shared 144 ORFs in common, and had a 77% sequence identity with each other and 96.5% and 77.8% sequence identity, respectively, with CfMNPV. Five homologous regions (hrs), with sequence similarities to those of CfMNPV, were identified in ChocNPV, whereas the ChroNPV genome contained three hrs featuring up to 14 repeats. Both genomes encoded three inhibitors of apoptosis (IAP-1, IAP-2, and IAP-3), as reported for CfMNPV, and the ChocNPV IAP-3 gene represented the most divergent functional region of this genome relative to CfMNPV. Two ORFs were unique to ChocNPV, and four were unique to ChroNPV. ChroNPV ORF chronpv38 is a eukaryotic initiation factor 5 (eIF-5) homolog that has also been identified in the C. occidentalis granulovirus (ChocGV) and is believed to be the product of horizontal gene transfer from the host. Based on levels of sequence identity and phylogenetic analysis, both ChocNPV and ChroNPV fall within group I alphabaculoviruses, where ChocNPV appears to be more closely related to CfMNPV than does ChroNPV. Our analyses suggest that it may be appropriate to consider ChocNPV and CfMNPV as variants of the same virus species.
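
    The G+C contents quoted above (50.1% for ChocNPV, 48.6% for ChroNPV) are the percentage of G and C bases in each genome. The minimal sketch below computes that percentage for a toy sequence. It excludes ambiguous bases such as N from the denominator, which is an assumption; tools differ on this. It also derives the CfMNPV genome length implied by "1147 bp smaller than that of CfMNPV".

```python
def gc_content(seq: str) -> float:
    """Percentage of G and C among the unambiguous A/C/G/T bases of `seq`."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")  # ambiguous bases (e.g. N) ignored
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt


print(gc_content("ATGCGCGTAA"))  # toy sequence: 5 of 10 bases are G or C -> 50.0

# Genome length arithmetic from the abstract.
choc_npv_bp = 128_446
cfmnpv_bp = choc_npv_bp + 1_147  # ChocNPV is 1147 bp smaller than CfMNPV
print(cfmnpv_bp)  # 129593
```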

  2. Whole genome SNP discovery and analysis of genetic diversity in Turkey (Meleagris gallopavo)

    Directory of Open Access Journals (Sweden)

    Aslam Muhammad L

    2012-08-01

    Full Text Available Abstract Background The turkey (Meleagris gallopavo) is an important agricultural species and the second largest contributor to the world’s poultry meat production. Genetic improvement is attributed largely to selective breeding programs that rely on highly heritable phenotypic traits, such as body size and breast muscle development. Commercial breeding with small effective population sizes and epistasis can result in loss of genetic diversity, which in turn can lead to reduced individual fitness and reduced response to selection. The presence of genomic diversity in domestic livestock species is therefore of great importance and a prerequisite for rapid and accurate genetic improvement of selected breeds in various environments, as well as for rapid adaptation to potential changes in breeding goals. Genomic selection requires a large number of genetic markers, such as single nucleotide polymorphisms (SNPs), the most abundant source of genetic variation within the genome. Results Alignment of next-generation sequencing data from 32 individual turkeys from different populations was used to discover 5.49 million SNPs, which were subsequently used to analyze genetic diversity among the different populations. All of the commercial lines branched from a single node relative to the heritage varieties and the South Mexican turkey population. Heterozygosity of individuals from the different turkey populations ranged from 0.17 to 2.73 SNPs/kb, while heterozygosity of populations ranged from 0.73 to 1.64 SNPs/kb. The average frequency of heterozygous SNPs in individual turkeys was 1.07 SNPs/kb. Five genomic regions with very low nucleotide variation were identified in domestic turkeys that showed fixation of alleles different from the wild alleles. Conclusion The turkey genome is much less diverse, with a relatively low frequency of heterozygous SNPs compared to other livestock species such as chicken and pig. The

  3. Comparative genomic analysis of human fungal pathogens causing paracoccidioidomycosis.

    Directory of Open Access Journals (Sweden)

    Christopher A Desjardins

    2011-10-01

    Full Text Available Paracoccidioides is a fungal pathogen and the cause of paracoccidioidomycosis, a health-threatening human systemic mycosis endemic to Latin America. Infection by Paracoccidioides, a dimorphic fungus in the order Onygenales, is coupled with a thermally regulated transition from a soil-dwelling filamentous form to a yeast-like pathogenic form. To better understand the genetic basis of growth and pathogenicity in Paracoccidioides, we sequenced the genomes of two strains of Paracoccidioides brasiliensis (Pb03 and Pb18) and one strain of Paracoccidioides lutzii (Pb01). These genomes range in size from 29.1 Mb to 32.9 Mb and encode 7,610 to 8,130 genes. To enable genetic studies, we mapped 94% of the P. brasiliensis Pb18 assembly onto five chromosomes. We characterized gene family content across Onygenales and related fungi, and within Paracoccidioides we found expansions of the fungal-specific kinase family FunK1. Additionally, the Onygenales have lost many genes involved in carbohydrate metabolism and fewer genes involved in protein metabolism, resulting in a higher ratio of proteases to carbohydrate active enzymes in the Onygenales than their relatives. To determine if gene content correlated with growth on different substrates, we screened the non-pathogenic onygenale Uncinocarpus reesii, which has orthologs for 91% of Paracoccidioides metabolic genes, for growth on 190 carbon sources. U. reesii showed growth on a limited range of carbohydrates, primarily basic plant sugars and cell wall components; this suggests that Onygenales, including dimorphic fungi, can degrade cellulosic plant material in the soil. In addition, U. reesii grew on gelatin and a wide range of dipeptides and amino acids, indicating a preference for proteinaceous growth substrates over carbohydrates, which may enable these fungi to also degrade animal biomass. 
These capabilities for degrading plant and animal substrates suggest a duality in lifestyle that could enable pathogenic

  4. Physical mapping and BAC-end sequence analysis provide initial insights into the flax (Linum usitatissimum L.) genome

    Directory of Open Access Journals (Sweden)

    Cloutier Sylvie

    2011-05-01

    Full Text Available Abstract Background Flax (Linum usitatissimum L.) is an important source of oil rich in omega-3 fatty acids, which have proven health benefits and utility as an industrial raw material. Flax seeds also contain lignans, which are associated with a reduced risk of certain types of cancer. Its bast fibres have broad industrial applications. However, the genomic tools needed for molecular breeding were nonexistent. Hence, a project, Total Utilization Flax GENomics (TUFGEN), was initiated. We report here the first genome-wide physical map of flax and the generation and analysis of BAC-end sequences (BES) from 43,776 clones, providing initial insights into the genome. Results The physical map consists of 416 contigs spanning ~368 Mb, assembled from 32,025 fingerprints, representing roughly 54.5% to 99.4% of the estimated haploid genome (370-675 Mb). The N50 size of the contigs was estimated to be ~1,494 kb. The longest contig was ~5,562 kb, comprising 437 clones. There were 96 contigs containing more than 100 clones. Approximately 54.6 Mb, representing 8-14.8% of the genome, was obtained from 80,337 BES. Annotation revealed that a large part of the genome consists of ribosomal DNA (~13.8%), followed by known transposable elements at 6.1%. Furthermore, ~7.4% of sequence was identified as harbouring novel repeat elements. Homology searches against flax ESTs and NCBI ESTs suggested that ~5.6% of the transcriptome is unique to flax. A total of 4064 putative genomic SSRs were identified and are being developed as novel markers for use in molecular breeding. Conclusion The first genome-wide physical map of flax constructed with BAC clones provides a framework for accessing target loci of economic importance for marker development and positional cloning. Analysis of the BES has provided insights into the uniqueness of the flax genome. Compared to other plant genomes, the proportion of rDNA was found to be very high whereas the proportion of known transposable

  5. Complete genome analysis of three Acinetobacter baumannii clinical isolates in China for insight into the diversification of drug resistance elements.

    Directory of Open Access Journals (Sweden)

    Lingxiang Zhu

    Full Text Available The emergence and rapid spread of multidrug-resistant Acinetobacter baumannii strains has become a major health threat worldwide. To better understand the genetic recombination related to the acquisition of drug-resistance elements during bacterial infection, we performed complete genome analysis on three newly isolated multidrug-resistant A. baumannii strains from Beijing using next-generation sequencing technology. Whole-genome comparison revealed that all 3 strains share some common drug-resistance elements, including carbapenem-resistant bla OXA-23 and tetracycline (tet) resistance islands, but the genome structures are diversified among strains. Various genomic islands are interspersed on the genome with transposons and insertions, reflecting the recombination flexibility during the acquisition of the resistance elements. The blood-isolated BJAB07104 and ascites-isolated BJAB0868 exhibit high similarity in genome structure to most of the global clone II strains, suggesting these two strains belong to the dominant outbreak strains prevalent worldwide. A large resistance island (RI) of about 121 kb, carrying a cluster of resistance-related genes, was inserted into the ATPase gene in the BJAB07104 and BJAB0868 genomes. A 78-kb insertion element carrying a tra locus and the bla OXA-23 island can either be inserted into one of the tniB genes in the 121-kb RI on the chromosome or be transformed into a conjugative plasmid in the two BJAB strains. The third strain in this study, BJAB0715, which was isolated from spinal fluid, exhibits much more divergence than the two strains above. It harbors multiple drug-resistance elements, including a truncated AbaR-22-like RI on its genome. One of the unique features of this strain is that it carries both bla OXA-23 and bla OXA-58 genes on its genome. In addition, an Acinetobacter lwoffii adeABC efflux element was found inserted into the ATPase position in BJAB0715. Our comparative analysis on currently completed

  6. HBV genome analysis in the progression of HBV related chronic liver disease

    Directory of Open Access Journals (Sweden)

    Ruksana Raihan

    2017-12-01

    Full Text Available Although HBV is a non-cytopathic virus, alteration of the viral genome may also alter host immunity and may play a part in the pathogenesis of liver cirrhosis (LC) and hepatocellular carcinoma (HCC). During the last decade, various studies have shown that mutations in the HBV genome may play a role in HCC pathogenesis. Here, we analyzed the HBV genome from Bangladeshi patients who were asymptomatic HBV carriers (ASC) or had chronic hepatitis B (CHB), LC, or HCC. A total of 225 HBV-positive patients at different stages of chronic HBV infection were enrolled in this study. The extent of liver damage was assessed by measuring serum levels of alanine aminotransferase (ALT) and bilirubin and finally by abdominal ultrasonography and/or fine needle aspiration cytology. Where required, cancer markers such as alpha-fetoprotein (AFP) were assessed. HBV genotype was evaluated by immunoassays and sequencing. A total of 25 patients were ASC, 135 were CHB, and 65 were LC and HCC. Among ASC patients, 5, 7, and 13 had HBV genotypes A, C, and D, respectively. In contrast, HBV genotype C was most prevalent in CHB patients (about 42%), followed by genotype D (36%). About 69% of patients with LC and HCC also had genotype C. Full genomic analysis of sera from patients with progressive liver damage (LC and HCC) revealed mutations in the HBeAg promoter region in more than 80% of patients. However, mutations in this region were mostly absent in ASC and in patients with less progressive liver disease. The HBV genotype distribution in Bangladeshi patients was quite distinct and appears to be a mixture of those of the Indian and Asia-Pacific regions. This study also reveals that HBeAg promoter region mutations may have a role in the development of HBV-related LC and HCC.

  7. Genome wide analysis of stress responsive WRKY transcription factors in Arabidopsis thaliana

    Directory of Open Access Journals (Sweden)

    Shaiq Sultan

    2016-04-01

    Full Text Available WRKY transcription factors are a class of DNA-binding proteins that bind a specific sequence, (C/T)TGAC(T/C), known as the W-box, found in the promoters of genes regulated by these WRKYs. Previous studies identified 43 stress-responsive WRKY transcription factors in Arabidopsis thaliana and categorized them into three groups: abiotic, biotic, and both types of stress. In this study, a comprehensive genome-wide analysis of these WRKY genes, including chromosomal localization, gene structure analysis, multiple sequence alignment, phylogenetic analysis, and promoter analysis, was carried out to determine their functional homology in Arabidopsis. This analysis classified these WRKY family members into 3 major groups and subgroups and showed evolutionary relationships among these groups on the basis of their functional WRKY domain, chromosomal localization, and intron/exon structure. The proposed groups of these stress-responsive WRKY genes, and their annotation based on chromosomal position, can also be used to explore their functional homology in other plant species in relation to different stresses. The results of the present study provide essential genomic information on the stress-responsive WRKY transcription factors in Arabidopsis and will pave the way to explaining the precise roles of various AtWRKYs in plant growth and development under stress conditions.
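
    A regular expression can locate the W-box consensus in a promoter sequence. The minimal sketch below reads the abstract's "C/TTGACT/C" as (C/T)TGAC(T/C). It also scans the reverse-complement pattern, [AG]GTCA[AG], so that sites on the opposite strand are found. A lookahead is used so that overlapping sites are not missed. The promoter sequence and the function name are made up for illustration.

```python
import re
from typing import List, Tuple

# W-box core read from the abstract as (C/T)TGAC(T/C).
W_BOX = re.compile(r"(?=([CT]TGAC[CT]))")
# Reverse complement of the same motif, matching sites on the other strand.
W_BOX_RC = re.compile(r"(?=([AG]GTCA[AG]))")


def find_w_boxes(promoter: str) -> List[Tuple[int, str, str]]:
    """Return (0-based start, strand, site) for W-box hits on both strands."""
    seq = promoter.upper()
    hits = [(m.start(), "+", m.group(1)) for m in W_BOX.finditer(seq)]
    hits += [(m.start(), "-", m.group(1)) for m in W_BOX_RC.finditer(seq)]
    return sorted(hits)


print(find_w_boxes("aaTTGACCggGGTCAAtt"))  # hypothetical promoter fragment
```

    The two patterns can never match at the same start position, because the first base must be C/T in one and A/G in the other. A site is therefore never reported twice.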

  8. Comparing genomes: databases and computational tools for comparative analysis of prokaryotic genomes - DOI: 10.3395/reciis.v1i2.Sup.105en

    Directory of Open Access Journals (Sweden)

    Marcos Catanho

    2007-12-01

    Full Text Available Since the 1990s, the complete genome sequences of more than 600 living organisms have been determined, including bacteria, yeasts, protozoan parasites, invertebrates, vertebrates (including Homo sapiens), and plants. More than 2,000 other genome projects, representing medical, commercial, environmental, and industrial interests or comprising model organisms important for scientific research, are currently in progress. The availability of complete genome sequences for numerous species, combined with the tremendous progress in computation over the last few decades, has enabled new holistic approaches to the study of genome structure, organization, and evolution, as well as to gene prediction and functional classification. Numerous public and proprietary databases and computational tools have been created to optimize web access to this information. In this review, we present the main web resources available for comparative analysis of prokaryotic genomes. We concentrate on the mycobacteria, a group that contains important human and animal pathogens. The birth of Bioinformatics and Computational Biology and the contributions of these disciplines to the scientific development of this field are also discussed.

  9. Genome analysis and DNA marker-based characterisation of pathogenic trypanosomes

    NARCIS (Netherlands)

    Agbo, Edwin Chukwura

    2003-01-01

    The advances in genomics technologies and genome analysis methods that offer new leads for accelerating discovery of putative targets for developing overall control tools are reviewed in Chapter 1. In Chapter 2, a PCR typing method based on restriction fragment length polymorphism analysis of the

  10. Soybean (Glycine max) SWEET gene family: insights through comparative genomics, transcriptome profiling and whole genome re-sequence analysis.

    Science.gov (United States)

    Patil, Gunvant; Valliyodan, Babu; Deshmukh, Rupesh; Prince, Silvas; Nicander, Bjorn; Zhao, Mingzhe; Sonah, Humira; Song, Li; Lin, Li; Chaudhary, Juhi; Liu, Yang; Joshi, Trupti; Xu, Dong; Nguyen, Henry T

    2015-07-11

    SWEET (MtN3_saliva) domain proteins, a recently identified group of efflux transporters, play an indispensable role in sugar efflux, phloem loading, plant-pathogen interaction, and reproductive tissue development. The SWEET gene family has been studied predominantly in Arabidopsis, and members of the family are being investigated in rice. To date, no transcriptome or genomic analysis of soybean SWEET genes has been reported. In the present investigation, we explored the evolution of the SWEET gene family in diverse plant species, from primitive single-cell algae to angiosperms, with a major emphasis on Glycine max. Evolutionary features showed expansion and duplication of the SWEET gene family in land plants. Homology searches with BLAST tools and Hidden Markov Model-directed sequence alignments identified 52 SWEET genes that were mapped to 15 chromosomes in the soybean genome as tandem duplication events. Soybean SWEET (GmSWEET) genes showed a wide range of expression profiles in different tissues and developmental stages. Analysis of public transcriptome data and expression profiling using quantitative real-time PCR (qRT-PCR) showed that a majority of the GmSWEET genes were confined to reproductive tissue development. Several natural genetic variants (non-synonymous SNPs, premature stop codons, and haplotypes) were identified in the GmSWEET genes through analysis of whole-genome re-sequencing data from 106 soybean genotypes. A significant association was observed between SNP haplogroup and seed sucrose content in three gene clusters on chromosome 6. The present investigation used comparative genomics, transcriptome profiling, and whole-genome re-sequencing approaches to provide a systematic description of soybean SWEET genes and identified putative candidates with probable roles in reproductive tissue development. Gene expression profiling at different developmental stages and genomic variation data will serve as an important resource for the soybean research

  11. Molecular Comparisons of Full Length Metapneumovirus (MPV) Genomes, Including Newly Determined French AMPV-C and –D Isolates, Further Support Possible Subclassification within the MPV Genus

    Science.gov (United States)

    Brown, Paul A.; Lemaitre, Evelyne; Briand, François-Xavier; Courtillon, Céline; Guionie, Olivier; Allée, Chantal; Toquin, Didier; Bayon-Auboyer, Marie-Hélène; Jestin, Véronique; Eterradossi, Nicolas

    2014-01-01

    Four avian metapneumovirus (AMPV) subgroups (A–D) have been reported previously based on genetic and antigenic differences. However, until now, full-length sequences of the only known isolates of European subgroup C and subgroup D viruses (duck and turkey origin, respectively) have been unavailable. These full-length sequences were determined and compared with other previously reported full-length AMPV and human metapneumovirus (HMPV) sequences, using phylogenetics, comparisons of nucleic and amino acid sequences, and study of codon usage bias. Results confirmed that subgroup C viruses were more closely related to HMPV than to the other AMPV subgroups in the study. This was consistent with previous findings using partial genome sequences. Closer relationships among AMPV-A, B, and D were also evident throughout the majority of results. Three metapneumovirus “clusters” (HMPV, AMPV-C, and AMPV-A, B and D) were further supported by codon bias and phylogenetics. The data presented here, together with those of previous studies describing antigenic relationships between AMPV-A, B, and D and between AMPV-C and HMPV, may call for a subclassification of metapneumoviruses similar to that used for avian paramyxoviruses, grouping AMPV-A, B, and D as type I metapneumoviruses and AMPV-C and HMPV as type II. PMID:25036224
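
    The abstract does not say which codon usage statistic was used. Relative synonymous codon usage (RSCU) is one common choice: the observed count of a codon divided by the mean count across that amino acid's synonymous codons. A value of 1.0 means no bias. The minimal sketch below uses a deliberately partial codon table (three amino acids from the standard genetic code) and a toy coding sequence.

```python
from collections import Counter
from typing import Dict

# Partial subset of the standard genetic code, enough for the illustration.
SYNONYMOUS = {
    "Phe": ["TTT", "TTC"],
    "Lys": ["AAA", "AAG"],
    "Gly": ["GGT", "GGC", "GGA", "GGG"],
}


def rscu(cds: str) -> Dict[str, float]:
    """Relative synonymous codon usage for the codons listed in SYNONYMOUS.

    Reads the sequence in frame from position 0 and ignores a trailing partial
    codon. Amino acids absent from the sequence are skipped.
    """
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    result: Dict[str, float] = {}
    for codons in SYNONYMOUS.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue
        expected = total / len(codons)  # count expected with no bias
        for c in codons:
            result[c] = counts[c] / expected
    return result


print(rscu("TTTTTTTTCAAAGGTGGC"))  # toy coding sequence
```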

  12. CoryneCenter – An online resource for the integrated analysis of corynebacterial genome and transcriptome data

    Directory of Open Access Journals (Sweden)

    Hüser Andrea T

    2007-11-01

    Full Text Available Abstract Background The introduction of high-throughput genome sequencing and post-genome analysis technologies, e.g. DNA microarray approaches, has created the potential to unravel and scrutinize complex gene-regulatory networks on a large scale. The discovery of transcriptional regulatory interactions has become a major topic in modern functional genomics. Results To facilitate the analysis of gene-regulatory networks, we have developed CoryneCenter, a web-based resource for the systematic integration and analysis of genome, transcriptome, and gene regulatory information for prokaryotes, especially corynebacteria. For this purpose, we extended and combined the following systems into a common platform: (1) GenDB, an open source genome annotation system, (2) EMMA, a MAGE-compliant application for high-throughput transcriptome data storage and analysis, and (3) CoryneRegNet, an ontology-based data warehouse designed to facilitate the reconstruction and analysis of gene regulatory interactions. We demonstrate the potential of CoryneCenter by means of an application example. Using microarray hybridization data, we compare the gene expression of Corynebacterium glutamicum under acetate and glucose feeding conditions: known regulatory networks are confirmed, and CoryneCenter also points out additional regulatory interactions. Conclusion CoryneCenter provides more than the sum of its parts. Its novel analysis and visualization features significantly simplify the process of obtaining new biological insights into complex regulatory systems. Although the platform currently focuses on corynebacteria, the integrated tools are by no means restricted to these species, and the presented approach offers a general strategy for the analysis and verification of gene regulatory networks. CoryneCenter provides freely accessible projects with the underlying genome annotation, gene expression, and gene regulation data. 
The system is publicly available at http://www.CoryneCenter.de.
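
    Two-channel microarray comparisons, such as the acetate-versus-glucose experiment above, are commonly summarized per gene by two values. M = log2(R/G) is the log ratio, and A = ½·log2(R·G) is the average log intensity. The sketch below is a generic illustration, not CoryneCenter's or EMMA's implementation. The intensities, gene names and the |M| ≥ 1 (two-fold) cutoff are hypothetical.

```python
import math
from typing import Dict, Tuple


def ma_values(red: float, green: float) -> Tuple[float, float]:
    """Return (M, A) for one spot: M = log2(R/G), A = 0.5 * log2(R * G)."""
    m = math.log2(red / green)
    a = 0.5 * math.log2(red * green)
    return m, a


# Hypothetical background-corrected intensities: red = acetate, green = glucose.
spots: Dict[str, Tuple[float, float]] = {
    "geneA": (4000.0, 1000.0),
    "geneB": (1200.0, 1100.0),
    "geneC": (250.0, 1000.0),
}
for gene, (r, g) in spots.items():
    m, a = ma_values(r, g)
    # Hypothetical two-fold cutoff for calling a gene up- or down-regulated.
    flag = "up" if m >= 1 else "down" if m <= -1 else "-"
    print(f"{gene}: M={m:.2f} A={a:.2f} {flag}")
```

    Real analyses normalize M values (for example, with intensity-dependent loess) before applying a cutoff. That step is omitted here.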

  13. Genome-wide SNP discovery in tetraploid alfalfa using 454 sequencing and high resolution melting analysis

    Directory of Open Access Journals (Sweden)

    Zhao Patrick X

    2011-07-01

    Full Text Available Abstract Background Single nucleotide polymorphisms (SNPs) are the most common type of sequence variation among plants and are often functionally important. We describe the use of 454 technology and high resolution melting analysis (HRM) for high-throughput SNP discovery in tetraploid alfalfa (Medicago sativa L.), a species with high economic value but limited genomic resources. Results Alfalfa genotypes selected from M. sativa subsp. sativa var. 'Chilean' and M. sativa subsp. falcata var. 'Wisfal', which differ in water stress sensitivity, were used to prepare cDNA from tissue of clonally propagated plants grown under either well-watered or water-stressed conditions; the cDNA was then pooled for 454 sequencing. Based on 125.2 Mb of raw sequence, a total of 54,216 unique sequences were obtained, including 24,144 tentative consensus (TC) sequences and 30,072 singletons, ranging from 100 bp to 6,662 bp in length, with an average length of 541 bp. We identified 40,661 candidate SNPs distributed throughout the genome. A sample of candidate SNPs was evaluated and validated using HRM analysis. A total of 3,491 TCs harboring 20,270 candidate SNPs were located on the M. truncatula (MT 3.5.1) chromosomes. Gene Ontology assignments indicate that the sequences obtained cover a broad range of GO categories. Conclusions We describe an efficient method to identify thousands of SNPs distributed throughout the alfalfa genome covering a broad range of GO categories. Validated SNPs represent valuable molecular marker resources that can be used to enhance marker density in linkage maps, to identify potential factors involved in heterosis and genetic variation, and as tools for association mapping and genomic selection in alfalfa.
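
    The abstract does not give its SNP-calling criteria. The sketch below is a generic, hypothetical illustration of how candidate SNPs can be flagged from pooled reads aligned to a consensus sequence. It reports a position when a non-reference base reaches a minimum read count at sufficient depth. In a pooled tetraploid sample, minor alleles can appear at low fractions, so this sketch uses a simple count threshold rather than a genotype call. The thresholds, the gap-free alignment and the data are all assumptions.

```python
from collections import Counter
from typing import List, Tuple


def candidate_snps(reference: str, reads: List[str], min_alt_reads: int = 2,
                   min_depth: int = 4) -> List[Tuple[int, str, str, int, int]]:
    """Flag columns where a non-reference base has enough read support.

    Reads are assumed to be gap-free and aligned from position 0 of
    `reference`; '-' marks positions a read does not cover.
    Returns (position, ref_base, alt_base, alt_count, depth).
    """
    calls = []
    for pos, ref_base in enumerate(reference):
        column = Counter(r[pos] for r in reads if pos < len(r) and r[pos] != "-")
        depth = sum(column.values())
        if depth < min_depth:
            continue  # too little coverage to judge this position
        for base, n in column.most_common():
            if base != ref_base and n >= min_alt_reads:
                calls.append((pos, ref_base, base, n, depth))
    return calls


ref = "ACGTACGT"  # hypothetical consensus (TC) sequence
reads = ["ACGTACGT", "ACTTACGT", "ACTTACG-", "ACGTACGT", "--GTACGT"]
print(candidate_snps(ref, reads))
```

    As in the abstract's workflow, candidates flagged this way would still need independent validation, for example by HRM.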

  14. Complete genome and comparative analysis of Streptococcus gallolyticus subsp. gallolyticus, an emerging pathogen of infective endocarditis

    Directory of Open Access Journals (Sweden)

    Dreier Jens

    2011-08-01

    Full Text Available Abstract Background: Streptococcus gallolyticus subsp. gallolyticus is an important causative agent of infective endocarditis, although the pathogenicity of this species is largely unclear. To gain insight into the pathomechanisms and the genetic elements underlying lateral gene transfer, we sequenced the entire genome of this pathogen. Results: We sequenced the whole genome of S. gallolyticus subsp. gallolyticus strain ATCC BAA-2069, consisting of a 2,356,444 bp circular DNA molecule with a G+C content of 37.65%, and a novel 20,765 bp plasmid designated pSGG1. Bioinformatic analysis predicted 2,309 ORFs and the presence of 80 tRNAs and 21 rRNAs in the chromosome. Furthermore, 21 ORFs were detected on the plasmid pSGG1, including the tetracycline resistance genes tetL and tet(O/W/32/O). Screening of 41 S. gallolyticus subsp. gallolyticus isolates revealed one plasmid (pSGG2) homologous to pSGG1. We further predicted 21 surface proteins containing the cell wall-sorting motif LPxTG, which were shown to play a functional role in the adhesion of bacteria to host cells. In addition, we performed a whole genome comparison with the recently sequenced S. gallolyticus subsp. gallolyticus strain UCN34, revealing significant differences. Conclusions: The analysis of the whole genome sequence of S. gallolyticus subsp. gallolyticus promotes understanding of the genetic factors underlying the pathogenesis of this pathogen and its adhesion to the extracellular matrix (ECM). For the first time, we detected the presence of the mobilizable pSGG1 plasmid, which may play a functional role in lateral gene transfer and confer a selective advantage through tetracycline resistance.
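    The G+C content reported for the chromosome is the fraction of G and C bases among all bases. The minimal sketch below computes it. Excluding ambiguous symbols such as N from the denominator is an assumption, because tools differ on this convention. The toy sequence is illustrative only.

```python
def gc_content(seq: str) -> float:
    """Return the G+C percentage of a DNA sequence.

    Only A, C, G and T are counted; ambiguous symbols such as N are
    excluded from the denominator (an assumed convention).
    """
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * gc / acgt


# Toy sequence, not data from the study: 4 G/C out of 10 unambiguous bases.
print(gc_content("ATGCGCATNNAT"))
```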

  15. Microenvironmental Heterogeneity Parallels Breast Cancer Progression: A Histology-Genomic Integration Analysis.

    Directory of Open Access Journals (Sweden)

    Rachael Natrajan

    2016-02-01

    Full Text Available The intra-tumor diversity of cancer cells is under intense investigation; however, little is known about the heterogeneity of the tumor microenvironment, which is key to cancer progression and evolution. We aimed to assess the degree of microenvironmental heterogeneity in breast cancer and correlate this with genomic and clinical parameters. We developed a quantitative measure of microenvironmental heterogeneity along three spatial dimensions (3-D) in solid tumors, termed the tumor ecosystem diversity index (EDI), using fully automated histology image analysis coupled with statistical measures commonly used in ecology. This measure was compared with disease-specific survival, key mutations, genome-wide copy number, and expression profiling data in a retrospective study of 510 breast cancer patients as a test set and 516 breast cancer patients as an independent validation set. In high-grade (grade 3) breast cancers, we uncovered a striking link between high microenvironmental heterogeneity measured by EDI and a poor prognosis that cannot be explained by tumor size, genomics, or any other data types. However, this association was not observed in low-grade (grade 1 and 2) breast cancers. The prognostic value of EDI was superior to known prognostic factors and was enhanced with the addition of TP53 mutation status (multivariate analysis test set, p = 9 × 10⁻⁴, hazard ratio = 1.47, 95% CI 1.17-1.84; validation set, p = 0.0011, hazard ratio = 1.78, 95% CI 1.26-2.52). Integration with genome-wide profiling data identified losses of specific genes on 4p14 and 5q13 that were enriched in grade 3 tumors with high microenvironmental diversity and that also substratified patients into poor prognostic groups. Limitations of this study include the number of cell types included in the model, that EDI has prognostic value only in grade 3 tumors, and that our spatial heterogeneity measure was dependent on spatial scale and tumor size. To our knowledge, this is the first
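    The abstract says EDI is built from "statistical measures commonly used in ecology" but does not define the index itself. The sketch below is therefore not the EDI. It shows one such ecological measure, the Shannon diversity index H = -Σ pᵢ ln pᵢ, applied to hypothetical cell-type labels in individual image tiles. The per-tile summary function and the tile data are illustrative assumptions.

```python
import math
from collections import Counter


def shannon_index(labels):
    """Shannon diversity H = -sum(p_i * ln p_i) over label proportions."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Summing -p*ln(p) terms (rather than negating the sum) avoids returning -0.0.
    return sum(-(n / total) * math.log(n / total) for n in counts.values())


def regional_diversity(regions):
    """Hypothetical summary: Shannon index for each spatial region, plus the mean."""
    per_region = [shannon_index(cells) for cells in regions]
    mean = sum(per_region) / len(per_region) if per_region else 0.0
    return per_region, mean


# Toy cell-type labels for three image tiles (not data from the study).
tiles = [
    ["cancer"] * 8,
    ["cancer"] * 4 + ["lymphocyte"] * 4,
    ["cancer", "lymphocyte", "stromal", "stromal"],
]
per_tile, mean_h = regional_diversity(tiles)
print([round(h, 3) for h in per_tile], round(mean_h, 3))
```

    A tile containing a single cell type scores 0. Diversity rises as more cell types appear in more even proportions.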

  16. Five Complete Chloroplast Genome Sequences from Diospyros: Genome Organization and Comparative Analysis.

    Science.gov (United States)

    Fu, Jianmin; Liu, Huimin; Hu, Jingjing; Liang, Yuqin; Liang, Jinjun; Wuyun, Tana; Tan, Xiaofeng

    2016-01-01

    Diospyros is the largest genus in Ebenaceae, comprising more than 500 species with remarkable economic value, especially Diospyros kaki Thunb., which has traditionally been an important food resource in China, Korea, and Japan. Complete chloroplast (cp) genomes from D. kaki, D. lotus L., D. oleifera Cheng., D. glaucifolia Metc., and Diospyros 'Jinzaoshi' were sequenced using Illumina sequencing technology. This is the first cp genome reported in Ebenaceae. The cp genome sequences of Diospyros ranged from 157,300 to 157,784 bp in length and presented a typical quadripartite structure, with two inverted repeats separated by one large and one small single-copy region. For each cp genome, 134 genes were annotated, including 80 protein-coding, 31 tRNA, and 4 rRNA unique genes. In all, 179 repeats and 283 simple sequence repeats were identified. Four hypervariable regions were identified in the Diospyros genomes: the intergenic regions trnQ_rps16, trnV_ndhC, and psbD_trnT, and the ndhA intron. Phylogenetic analyses based on the whole cp genome, protein-coding, and intergenic and intron sequences indicated that D. oleifera is closely related to D. kaki and could be used as a model plant for future research on D. kaki; to our knowledge, this relationship is proposed here for the first time. Further, these analyses, together with two large deletions (301 and 140 bp) in the cp genome of D. 'Jinzaoshi', support its placement as a new species in Diospyros. Both maximum parsimony and likelihood analyses for 19 taxa indicated the basal position of Ericales in asterids and suggested that Ebenaceae is monophyletic in Ericales.

  17. Full cost accounting in the analysis of separated waste collection efficiency: A methodological proposal.

    Science.gov (United States)

    D'Onza, Giuseppe; Greco, Giulio; Allegrini, Marco

    2016-02-01

    Recycling implies additional costs for separated municipal solid waste (MSW) collection. The aim of the present study is to propose and implement a management tool, the full cost accounting (FCA) method, to calculate the full collection costs of different types of waste. Our analysis aims for a better understanding of the difficulties of putting FCA into practice in the MSW sector. We propose an FCA methodology that uses standard costs and actual quantities to calculate the collection costs of separated and undifferentiated waste. The methodology allows cost efficiency analysis and benchmarking, overcoming problems related to firm-specific accounting choices, earnings management policies and purchase policies. It also supports variance analysis, which can be used to identify the causes of off-standard performance and guide managers to deploy resources more efficiently. The methodology can be implemented by companies lacking a sophisticated management accounting system. Copyright © 2015 Elsevier Ltd. All rights reserved.
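    The core calculation is to value the actual quantities collected at standard unit costs, then compare the result with actual spending. The minimal sketch below shows that variance calculation. The waste streams, unit costs and tonnages are hypothetical, and the abstract does not spell out the authors' full cost model. A single total variance is shown here; the paper's methodology may decompose it further.

```python
from dataclasses import dataclass


@dataclass
class WasteStream:
    name: str
    standard_cost_per_tonne: float  # hypothetical benchmark unit cost (EUR/t)
    actual_tonnes: float            # actual quantity collected
    actual_cost: float              # EUR actually spent on collection


def cost_variance(stream: WasteStream) -> tuple[float, float]:
    """Return (standard cost at actual quantity, actual minus standard)."""
    standard_total = stream.standard_cost_per_tonne * stream.actual_tonnes
    return standard_total, stream.actual_cost - standard_total


streams = [
    WasteStream("paper", 120.0, 500.0, 64_000.0),
    WasteStream("glass", 80.0, 300.0, 22_500.0),
]
for s in streams:
    std, var = cost_variance(s)
    if var > 0:
        label = "unfavorable (over standard)"
    elif var < 0:
        label = "favorable (under standard)"
    else:
        label = "on standard"
    print(f"{s.name}: standard {std:.0f}, variance {var:+.0f} ({label})")
```

    Using actual quantities in the standard-cost figure keeps the comparison fair across operators of different sizes. This property is what makes benchmarking between collection companies possible.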

  18. Analysis of the Complete Chloroplast Genome of a Medicinal Plant, Dianthus superbus var. longicalycinus, from a Comparative Genomics Perspective.

    Science.gov (United States)

    Raman, Gurusamy; Park, SeonJoo

    2015-01-01

    Dianthus superbus var. longicalycinus is an economically important traditional Chinese medicinal plant that is also used for ornamental purposes. In this study, the D. superbus chloroplast (cp) genome was compared with the cp genomes of closely related taxa such as Lychnis chalcedonica and Spinacia oleracea. D. superbus had the longest large single copy (LSC) region (82,805 bp), with some variation at the inverted repeat A (IRA)/LSC junction. The IRs underwent both expansion and contraction during the evolution of the Caryophyllaceae; however, no major variations were identified. A ribosomal protein S19 (rps19) pseudogene was identified at the IRA/LSC junction but was not present in the cp genomes of other Caryophyllaceae members. The translation initiation factor IF-1 (infA) and ribosomal protein L23 (rpl23) genes were absent from the Dianthus cp genome. When the cp genome of Dianthus was compared with 31 other angiosperm lineages, the infA gene was found to have been lost in most members of the rosids, in the Solanales of the asterids, and in Lychnis of the Caryophyllales, whereas rpl23 gene loss or pseudogenization had occurred exclusively in the Caryophyllales. The cp genomes of Dianthus and Spinacia retain two introns in the gene for the proteolytic subunit of the ATP-dependent protease (clpP), whereas Lychnis has lost the clpP introns. Furthermore, phylogenetic analysis of the individual protein-coding genes infA and rpl23 revealed that gene loss or pseudogenization occurred independently in the cp genome of Dianthus. Molecular phylogenetic analysis based on 78 protein-coding sequences also demonstrated a sister relationship between Dianthus and Lychnis. The results presented herein will contribute to studies of the evolution, molecular biology and genetic engineering of the medicinal and ornamental plant D. superbus var. longicalycinus.

  19. Rice-arsenate interactions in hydroponics: whole genome transcriptional analysis.

    Science.gov (United States)

    Norton, Gareth J; Lou-Hing, Daniel E; Meharg, Andrew A; Price, Adam H

    2008-01-01

    Rice (Oryza sativa) varieties that are arsenate-tolerant (Bala) and -sensitive (Azucena) were used to conduct a transcriptome analysis of the response of rice seedlings to sodium arsenate (AsV) in hydroponic solution. RNA extracted from the roots of three replicate experiments of plants grown for 1 week in phosphate-free nutrient with or without 13.3 μM AsV was used to challenge the Affymetrix (52K) GeneChip Rice Genome array. A total of 576 probe sets were significantly up-regulated at least 2-fold in both varieties, whereas 622 were down-regulated. Ontological classification is presented. As expected, a large number of transcription factors, stress proteins, and transporters demonstrated differential expression. Striking is the lack of response of classic oxidative stress-responsive genes or phytochelatin synthases/synthatases. However, the large number of responses from genes involved in glutathione synthesis, metabolism, and transport suggests that glutathione conjugation and arsenate methylation may be important biochemical responses to arsenate challenge. In this report, no attempt is made to dissect differences in the response of the tolerant and sensitive variety, but analysis in a companion article will link gene expression to the known tolerance loci available in the Bala×Azucena mapping population.

  20. Rice–arsenate interactions in hydroponics: whole genome transcriptional analysis

    Science.gov (United States)

    Norton, Gareth J.; Lou-Hing, Daniel E.; Meharg, Andrew A.; Price, Adam H.

    2008-01-01

    Rice (Oryza sativa) varieties that are arsenate-tolerant (Bala) and -sensitive (Azucena) were used to conduct a transcriptome analysis of the response of rice seedlings to sodium arsenate (AsV) in hydroponic solution. RNA extracted from the roots of three replicate experiments of plants grown for 1 week in phosphate-free nutrient with or without 13.3 μM AsV was used to challenge the Affymetrix (52K) GeneChip Rice Genome array. A total of 576 probe sets were significantly up-regulated at least 2-fold in both varieties, whereas 622 were down-regulated. Ontological classification is presented. As expected, a large number of transcription factors, stress proteins, and transporters demonstrated differential expression. Striking is the lack of response of classic oxidative stress-responsive genes or phytochelatin synthases/synthatases. However, the large number of responses from genes involved in glutathione synthesis, metabolism, and transport suggests that glutathione conjugation and arsenate methylation may be important biochemical responses to arsenate challenge. In this report, no attempt is made to dissect differences in the response of the tolerant and sensitive variety, but analysis in a companion article will link gene expression to the known tolerance loci available in the Bala×Azucena mapping population. PMID:18453530
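    The selection rule "up-regulated at least 2-fold in both varieties" can be written as a log2 fold-change filter applied to every variety at once. The sketch below covers only the fold-change part. The significance testing the study also applied is omitted. The probe names and expression means are hypothetical.

```python
import math


def regulated_in_all(probes, min_fold=2.0):
    """Split probe sets into those up- or down-regulated by at least min_fold
    in every variety. Statistical significance testing is omitted here."""
    cutoff = math.log2(min_fold)
    up, down = [], []
    for probe_id, by_variety in probes.items():
        # by_variety maps variety -> (control mean, arsenate-treated mean)
        lfcs = [math.log2(treated / control) for control, treated in by_variety.values()]
        if all(lfc >= cutoff for lfc in lfcs):
            up.append(probe_id)
        elif all(lfc <= -cutoff for lfc in lfcs):
            down.append(probe_id)
    return up, down


# Toy expression means; probe names and values are hypothetical.
probes = {
    "probe_1": {"Bala": (100.0, 250.0), "Azucena": (80.0, 200.0)},
    "probe_2": {"Bala": (100.0, 40.0), "Azucena": (200.0, 90.0)},
    "probe_3": {"Bala": (100.0, 300.0), "Azucena": (100.0, 120.0)},
}
up, down = regulated_in_all(probes)
print("up:", up, "down:", down)
```

    `probe_3` is excluded because only one variety clears the 2-fold threshold. The rule counts only responses shared by both varieties.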

  1. Lignin degradation: microorganisms, enzymes involved, genomes analysis and evolution.

    Science.gov (United States)

    Janusz, Grzegorz; Pawlik, Anna; Sulej, Justyna; Swiderska-Burek, Urszula; Jarosz-Wilkolazka, Anna; Paszczynski, Andrzej

    2017-11-01

    Extensive research efforts have been dedicated to describing the degradation of wood. Wood degradation is a complex process, and microorganisms have evolved different enzymatic and non-enzymatic strategies to utilize this plentiful plant material. This review describes a number of fungal and bacterial organisms that have developed both competitive and mutualistic strategies for decomposing wood and for thriving in different ecological niches. Through analysis of the enzymatic machinery engaged in wood degradation, it was possible to elucidate different strategies of wood decomposition, which often depend on the ecological niche inhabited by a given organism. Moreover, a detailed description of low molecular weight compounds is presented; these compounds not only give the organisms an advantage in wood degradation but also appear to be a new evolutionary alternative to enzymatic combustion. Through analysis of genomic and secretomic data, it was possible to underline the probable importance of certain wood-degrading enzymes produced by different fungal organisms, potentially giving them an advantage in their ecological niches. The paper highlights different fungal strategies of wood degradation, which possibly correlate with the number of genes coding for secretory enzymes. Furthermore, the evolution of wood-degrading organisms is discussed. © FEMS 2017.

  2. Comparative sequence analysis of Sordaria macrospora and Neurospora crassa as a means to improve genome annotation.

    Science.gov (United States)

    Nowrousian, Minou; Würtz, Christian; Pöggeler, Stefanie; Kück, Ulrich

    2004-03-01

    One of the most challenging parts of large scale sequencing projects is the identification of functional elements encoded in a genome. Recently, studies of genomes of up to six different Saccharomyces species have demonstrated that a comparative analysis of genome sequences from closely related species is a powerful approach to identify open reading frames and other functional regions within genomes [Science 301 (2003) 71, Nature 423 (2003) 241]. Here, we present a comparison of selected sequences from Sordaria macrospora to their corresponding Neurospora crassa orthologous regions. Our analysis indicates that due to the high degree of sequence similarity and conservation of overall genomic organization, S. macrospora sequence information can be used to simplify the annotation of the N. crassa genome.

  3. Comparative genome analysis of Bacillus cereus group genomes with Bacillus subtilis

    Energy Technology Data Exchange (ETDEWEB)

    Anderson, Iain; Sorokin, Alexei; Kapatral, Vinayak; Reznik, Gary; Bhattacharya, Anamitra; Mikhailova, Natalia; Burd, Henry; Joukov, Victor; Kaznadzey, Denis; Walunas, Theresa; D'Souza, Mark; Larsen, Niels; Pusch, Gordon; Liolios, Konstantinos; Grechkin, Yuri; Lapidus, Alla; Goltsman, Eugene; Chu, Lien; Fonstein, Michael; Ehrlich, S. Dusko; Overbeek, Ross; Kyrpides, Nikos; Ivanova, Natalia

    2005-09-14

    Genome features of the Bacillus cereus group genomes (representative strains of Bacillus cereus, Bacillus anthracis and Bacillus thuringiensis subsp. israelensis) were analyzed and compared with the Bacillus subtilis genome. A core set of 1,381 protein families shared among the four Bacillus genomes was identified, along with an additional set of 933 families common to the B. cereus group. Differences in signal transduction pathways, membrane transporters, cell surface structures, cell wall, and S-layer proteins, suggesting differences in phenotype, were identified. The B. cereus group has signal transduction systems including a tyrosine kinase related to the two-component system histidine kinases of B. subtilis. A model for regulation of the stress-responsive sigma factor σB in the B. cereus group, different from the well-studied regulation in B. subtilis, has been proposed. Despite a high degree of chromosomal synteny among these genomes, significant differences have been identified in cell wall and spore coat proteins that contribute to survival and adaptation in specific hosts.
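    The "core" and "group-specific" family counts are set operations over the protein families assigned to each genome. The core set is the intersection across all four genomes. The group set contains families shared by every B. cereus group genome but absent from the core. The sketch below assumes family assignments are already available as sets of identifiers, and the family IDs are hypothetical.

```python
def core_and_group_families(families, group):
    """Return (families shared by all genomes,
    families shared by every genome in `group` but not by all genomes)."""
    core = set.intersection(*families.values())
    group_shared = set.intersection(*(families[g] for g in group))
    return core, group_shared - core


# Hypothetical family IDs per genome; not the study's annotations.
families = {
    "B. cereus": {"F1", "F2", "F3", "F4"},
    "B. anthracis": {"F1", "F2", "F3", "F5"},
    "B. thuringiensis": {"F1", "F2", "F3"},
    "B. subtilis": {"F1", "F6"},
}
group = ["B. cereus", "B. anthracis", "B. thuringiensis"]
core, group_only = core_and_group_families(families, group)
print(sorted(core), sorted(group_only))
```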

  4. K-mer Content, Correlation, and Position Analysis of Genome DNA Sequences for the Identification of Function and Evolutionary Features

    Directory of Open Access Journals (Sweden)

    Aaron Sievers

    2017-04-01

    Full Text Available In genome analysis, k-mer-based comparison methods have become standard tools. However, even though they are able to deliver reliable results, other algorithms seem to work better in some cases. To improve k-mer-based DNA sequence analysis and comparison, we checked whether adding positional resolution is beneficial for finding and/or comparing interesting organizational structures. A simple but efficient algorithm for extracting and saving local k-mer spectra (frequency distributions of k-mers) was developed and used. The results were analyzed by including positional information, based on visualizations as genomic maps, and by applying basic vector correlation methods. The analysis concentrated on small word lengths (1 ≤ k ≤ 4) in relatively small viral genomes of Papillomaviridae and Herpesviridae, while also checking its usability for larger sequences, namely human chromosome 2 and the homologous chromosomes (2A, 2B) of the chimpanzee. Using this alignment-free analysis, several regions with specific characteristics in Papillomaviridae and Herpesviridae previously identified by independent, mostly alignment-based methods were confirmed. Correlations between the k-mer content and several genes in these genomes were found, showing similarities between classified and unclassified viruses, which may be useful for further taxonomic research. Furthermore, previously unknown k-mer correlations in the genomes of human herpesviruses (HHVs), which are probably of major biological function, were found and described. Using the currently available chimpanzee and human chromosome sequences, identities between the species on every analyzed chromosome were reproduced. This demonstrates the feasibility of our approach for large data sets of complex genomes. Based on these results, we suggest k-mer analysis with positional resolution as a method for closing a gap between the effectiveness of alignment-based methods (like NCBI BLAST) and the
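    Local k-mer spectra add positional resolution by counting k-mers in sliding windows rather than over the whole sequence. Each window's spectrum can then be compared with a basic vector correlation. The sketch below illustrates the idea under stated assumptions. The window and step sizes are arbitrary, k-mers containing non-ACGT symbols are skipped, and Pearson correlation stands in for the unspecified "basic vector correlation methods". It is not the authors' algorithm.

```python
import math
from itertools import product


def local_kmer_spectra(seq, k, window, step):
    """Count every k-mer over A/C/G/T in sliding windows.

    Returns (window_start, counts) pairs; counts follow the lexicographic
    order of all 4**k possible k-mers.
    """
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(vocab)}
    seq = seq.upper()
    spectra = []
    for start in range(0, len(seq) - window + 1, step):
        chunk = seq[start:start + window]
        counts = [0] * len(vocab)
        for i in range(len(chunk) - k + 1):
            j = index.get(chunk[i:i + k])
            if j is not None:  # k-mers containing N or other symbols are skipped
                counts[j] += 1
        spectra.append((start, counts))
    return spectra


def pearson(x, y):
    """Pearson correlation of two equal-length count vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")
    return cov / (sx * sy)


# Toy sequence; correlate every window's dimer spectrum with the first window's.
spectra = local_kmer_spectra("ACGTACGTAAAATTTT", k=2, window=8, step=4)
for start, counts in spectra:
    print(start, round(pearson(spectra[0][1], counts), 3))
```

    With k ≤ 4 the spectrum has at most 256 entries, so even long chromosomes reduce to compact per-window vectors.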

  5. Analysis of high moderation full MOX BWR core physics experiments BASALA

    International Nuclear Information System (INIS)

    Ishii, Kazuya; Ando, Yoshihira; Takada, Naoyuki; Kan, Taro; Sasagawa, Masaru; Kikuchi, Tsukasa; Yamamoto, Toru; Kanda, Ryoji; Umano, Takuya

    2005-01-01

    Nuclear Power Engineering Corporation (NUPEC) has performed conceptual design studies of high moderation full MOX LWR cores that aim to increase the fissile Pu consumption rate and reduce residual Pu in discharged MOX fuel. As part of these studies, NUPEC, the French Atomic Energy Commission (CEA) and their industrial partners implemented the experimental program BASALA following MISTRAL. Both programs were devoted to measuring the core physics parameters of such advanced cores. The MISTRAL program consists of one reference UO₂ core, two homogeneous full MOX cores and one full MOX PWR mock-up core that have a higher moderation ratio than the conventional lattice. The MISTRAL analysis results were reported in April 2003. The BASALA program consists of two high moderation full MOX BWR mock-up cores for operating and cold stand-by conditions. NUPEC has analyzed the experimental results of BASALA with diffusion and transport calculations using the SRAC code system and with continuous-energy Monte Carlo calculations using the MVP code, all with the common nuclear data file JENDL-3.2. The calculation results reproduce the experimental data well, approximately within the experimental uncertainty. The analysis results of MISTRAL and BASALA indicate that the applied analysis methods have the same accuracy for the UO₂ and MOX cores, for MOX cores with different moderation, and for the homogeneous and mock-up MOX cores. (author)

  6. Full-Band Quasi-Harmonic Analysis and Synthesis of Musical Instrument Sounds with Adaptive Sinusoids

    Directory of Open Access Journals (Sweden)

    Marcelo Caetano

    2016-05-01

    Full Text Available Sinusoids are widely used to represent the oscillatory modes of musical instrument sounds in both analysis and synthesis. However, musical instrument sounds feature transients and instrumental noise that are poorly modeled with quasi-stationary sinusoids, requiring spectral decomposition and further dedicated modeling. In this work, we propose a full-band representation that fits sinusoids across the entire spectrum. We use the extended adaptive Quasi-Harmonic Model (eaQHM) to iteratively estimate amplitude- and frequency-modulated (AM–FM) sinusoids able to capture challenging features such as sharp attacks, transients, and instrumental noise. We use the signal-to-reconstruction-error ratio (SRER) as the objective measure for the analysis and synthesis of 89 musical instrument sounds from different instrumental families. We compare against quasi-stationary sinusoids and exponentially damped sinusoids. First, we show that the SRER increases with adaptation in eaQHM. Then, we show that full-band modeling with eaQHM captures partials at the higher frequency end of the spectrum that are neglected by spectral decomposition. Finally, we demonstrate that a frame size equal to three periods of the fundamental frequency results in the highest SRER with AM–FM sinusoids from eaQHM. A listening test confirmed that the musical instrument sounds resynthesized from full-band analysis with eaQHM are virtually perceptually indistinguishable from the original recordings.
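    SRER is commonly defined as 20·log10(σ_x / σ_(x - x̂)), where x is the original signal and x̂ the resynthesis. The abstract does not restate the formula, so that definition is an assumption here. The sketch computes it for a toy 440 Hz tone at an assumed sampling rate. A reconstruction with a uniform 10% amplitude error leaves a residual one tenth the size of the signal, which corresponds to 20 dB.

```python
import math
from statistics import pstdev


def srer_db(original, reconstruction):
    """Signal-to-reconstruction-error ratio in dB (assumed standard definition):
    20 * log10(std(x) / std(x - x_hat))."""
    if len(original) != len(reconstruction):
        raise ValueError("signals must have the same length")
    residual = [x - y for x, y in zip(original, reconstruction)]
    err = pstdev(residual)
    if err == 0:
        return math.inf  # perfect reconstruction
    return 20.0 * math.log10(pstdev(original) / err)


fs = 16000  # assumed sampling rate (Hz)
x = [math.sin(2 * math.pi * 440.0 * n / fs) for n in range(fs // 10)]
x_hat = [0.9 * s for s in x]  # toy reconstruction with a uniform 10% amplitude error
print(round(srer_db(x, x_hat), 2))
```

    Higher SRER means a smaller residual relative to the signal. This is why the measure rises as eaQHM adapts its sinusoids to the recording.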

  7. A full genomic characterization of the development of a stable Small Colony Variant cell-type by a clinical Staphylococcus aureus strain.

    Science.gov (United States)

    Bui, Long M G; Kidd, Stephen P

    2015-12-01

    A key to persistent and recurrent Staphylococcus aureus infections is its ability to adapt to diverse and toxic conditions. This ability includes a switch into a biofilm or to the quasi-dormant Small Colony Variant (SCV). The development and molecular attributes of SCVs have been difficult to study due to their rapid reversion to their parental cell-type. We recently described the unique induction of a matrix-embedded and stable SCV cell-type in a clinical S. aureus strain (WCH-SK2) by growing the cells under limiting conditions for a prolonged timeframe. Here we further study their characteristics. The stable SCVs showed increased viability in the presence of antibiotics compared to their non-SCV form. Their stability implied that there had been genetic changes; we therefore determined the genome sequences of both WCH-SK2 and its stable SCV form at single-base resolution, employing Single Molecule Real-Time (SMRT) sequencing, which also enabled the methylome to be determined. The genetic features of WCH-SK2 have been identified: the SCCmec type, the pathogenicity and genetic islands, and the virulence factors. The genetic changes that had occurred in the stable SCV form were identified, most notably in MgrA, a global regulator, and RsbU, a phosphoserine phosphatase within the regulatory pathway of the sigma factor SigB. There was a shift in the methylomes of the non-SCV and stable SCV forms. We have also shown a similar induction of this cell-type in other S. aureus strains and performed a genetic comparison to these and other S. aureus genomes. We additionally mapped RNA-seq data to the WCH-SK2 genome in a transcriptomic analysis of the parental, SCV and stable SCV cells. The results from this study represent the unique identification of a suite of epigenetic, genetic and transcriptional factors that are implicated in the switch in S. aureus to its persistent SCV form. Copyright © 2015 Elsevier B.V. All rights reserved.

  8. Comparative genomic in situ hybridization analysis on the ...

    African Journals Online (AJOL)

    The nucleolar organizing regions (NORs), a few telomeres, most centromeric regions and numerous interstitial sites were detected. The signals in small genomes were relatively sparse and unevenly distributed along chromosomes, whereas those in large genomes were dense and basically evenly distributed.

  9. Whole-genome sequence-based analysis of thyroid function

    DEFF Research Database (Denmark)

    Taylor, Peter N.; Porcu, Eleonora; Chew, Shelby

    2015-01-01

    Normal thyroid function is essential for health, but its genetic architecture remains poorly understood. Here, for the heritable thyroid traits thyrotropin (TSH) and free thyroxine (FT4), we analyse whole-genome sequence data from the UK10K project (N = 2,287). Using additional whole-genome seque...

  10. A bibliometric analysis of global research on genome sequencing ...

    African Journals Online (AJOL)

    The results show that disease- and protein-related studies were the leading research focuses, and that comparative genomics and evolution-related research had strong potential in the near future. Key words: Genome sequencing, research trend, scientometrics, science citation index expanded (SCI-Expanded), word cluster ...

  11. Mainstreaming sex and gender analysis in public health genomics

    NARCIS (Netherlands)

    Verdonk, P.; Klinge, I.

    2012-01-01

    Background: The integration of genome-based knowledge into public health or public health genomics (PHG) aims to contribute to disease prevention, health promotion, and risk reduction associated with genetic disease susceptibility. Men and women differ, for instance, in susceptibilities for heart

  12. Genomic Analysis of Caldithrix abyssi, the Thermophilic Anaerobic Bacterium of the Novel Bacterial Phylum Calditrichaeota

    OpenAIRE

    Kublanov, Ilya V.; Sigalova, Olga M.; Gavrilov, Sergey N.; Lebedinsky, Alexander V.; Rinke, Christian; Kovaleva, Olga; Chernyh, Nikolai A.; Ivanova, Natalia; Daum, Chris; Reddy, T.B.K.; Klenk, Hans-Peter; Spring, Stefan; Göker, Markus; Reva, Oleg N.; Miroshnichenko, Margarita L.

    2017-01-01

    © 2017 Kublanov, Sigalova, Gavrilov, Lebedinsky, Rinke, Kovaleva, Chernyh, Ivanova, Daum, Reddy, Klenk, Spring, Göker, Reva, Miroshnichenko, Kyrpides, Woyke, Gelfand, Bonch-Osmolovskaya. The genome of Caldithrix abyssi, the first cultivated representative of a phylum-level bacterial lineage, was sequenced within the framework of Genomic Encyclopedia of Bacteria and Archaea (GEBA) project. The genomic analysis revealed mechanisms allowing this anaerobic bacterium to ferment peptides or to impl...

  13. Intraspecific phylogenetic analysis of Siberian woolly mammoths using complete mitochondrial genomes

    DEFF Research Database (Denmark)

    Gilbert, M Thomas P; Drautz, Daniela I; Lesk, Arthur M

    2008-01-01

    We report five new complete mitochondrial DNA (mtDNA) genomes of Siberian woolly mammoth (Mammuthus primigenius), sequenced with up to 73-fold coverage from DNA extracted from hair shaft material. Three of the sequences present the first complete mtDNA genomes of mammoth clade II. Analysis...... to indicate any important functional difference between genomes belonging to the two clades, suggesting that the loss of clade II more likely is due to genetic drift than a selective sweep....

  14. CloVR-Comparative: automated, cloud-enabled comparative microbial genome sequence analysis pipeline

    OpenAIRE

    Agrawal, Sonia; Arze, Cesar; Adkins, Ricky S.; Crabtree, Jonathan; Riley, David; Vangala, Mahesh; Galens, Kevin; Fraser, Claire M.; Tettelin, Hervé; White, Owen; Angiuoli, Samuel V.; Mahurkar, Anup; Fricke, W. Florian

    2017-01-01

    Background The benefit of increasing genomic sequence data to the scientific community depends on easy-to-use, scalable bioinformatics support. CloVR-Comparative combines commonly used bioinformatics tools into an intuitive, automated, and cloud-enabled analysis pipeline for comparative microbial genomics. Results CloVR-Comparative runs on annotated complete or draft genome sequences that are uploaded by the user or selected via a taxonomic tree-based user interface and downloaded from NCBI. ...

  15. Genome Sequencing and Comparative Analysis of Stenotrophomonas acidaminiphila Reveal Evolutionary Insights Into Sulfamethoxazole Resistance

    Directory of Open Access Journals (Sweden)

    Yao-Ting Huang

    2018-05-01

    Full Text Available Stenotrophomonas acidaminiphila is an aerobic, glucose non-fermentative, Gram-negative bacterium that has been isolated from various environmental sources, particularly aquatic ecosystems. Although resistance to multiple antimicrobial agents has been reported in S. acidaminiphila, the mechanisms are largely unknown. Here, for the first time, we report the complete genome and antimicrobial resistome analysis of a clinical isolate, S. acidaminiphila SUNEO, which is resistant to sulfamethoxazole. Comparative analysis among closely related strains identified common and strain-specific genes. In particular, comparison with a sulfamethoxazole-sensitive strain identified a mutation within the sulfonamide-binding site of folP in SUNEO, which may reduce the binding affinity of sulfamethoxazole. Selection pressure analysis indicated that folP in SUNEO is under purifying selection, which may be owing to long-term administration of sulfonamide against Stenotrophomonas.

  16. The theoretical study of full spectrum analysis method for airborne gamma-ray spectrometric data

    International Nuclear Information System (INIS)

    Ni Weichong

    2011-01-01

    By analyzing the constitution of the radioactive sources in airborne gamma-ray spectrometric surveys and establishing models of gamma-ray measurement, airborne gamma-ray spectra were found to be a synthesis of the spectral components of the radioelement sources. The mathematical equation for analyzing airborne gamma-ray full-spectrum data can be expressed in matrix form, and related expansions were developed for mineral resources exploration, environmental radiation measurement, nuclear emergency monitoring, and so on. Theoretical study showed that atmospheric radon could be computed directly from airborne gamma-ray spectrometric data by full spectrum analysis, without the use of additional upward-looking detectors. (authors)
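    In matrix form, a measured spectrum m over channels is modeled as m ≈ A c. Each column of A is the unit spectrum of one source (K, U, Th, atmospheric radon) and c holds the source contributions. Estimating c by least squares is what lets radon be recovered alongside the ground sources. The sketch below solves the normal equations AᵀA c = Aᵀm in plain Python. It assumes noise-free data, hypothetical 6-channel component spectra and no background or statistical weighting, all simplifications relative to a real survey.

```python
def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("component spectra are not linearly independent")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def unmix(components, spectrum):
    """Least-squares weights c minimizing ||A c - m||^2 via the normal equations.

    components: one unit spectrum (list over channels) per source.
    """
    gram = [[sum(a * b for a, b in zip(ci, cj)) for cj in components]
            for ci in components]
    proj = [sum(a * m for a, m in zip(ci, spectrum)) for ci in components]
    return solve(gram, proj)


# Hypothetical 6-channel unit spectra (not calibrated detector responses).
sources = {
    "K":  [0.10, 0.70, 0.10, 0.05, 0.05, 0.00],
    "U":  [0.20, 0.10, 0.50, 0.10, 0.05, 0.05],
    "Th": [0.10, 0.05, 0.10, 0.20, 0.50, 0.05],
    "Rn": [0.30, 0.10, 0.10, 0.40, 0.05, 0.05],
}
true_weights = {"K": 2.0, "U": 0.5, "Th": 1.0, "Rn": 0.3}
measured = [sum(true_weights[s] * sources[s][ch] for s in sources) for ch in range(6)]
estimate = dict(zip(sources, unmix(list(sources.values()), measured)))
print({s: round(w, 6) for s, w in estimate.items()})
```

    The fit works only if the radon spectrum is not a linear combination of the ground-source spectra. In that degenerate case the solver raises an error rather than returning meaningless weights.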

  17. Whole genome analysis of Klebsiella pneumoniae T2-1-1 from human oral cavity

    Directory of Open Access Journals (Sweden)

    Kok-Gan Chan

    2016-03-01

    Full Text Available Klebsiella pneumoniae T2-1-1 was isolated from human tongue debris, subjected to whole genome sequencing on the HiSeq platform, and annotated with RAST. The nucleotide sequence of this genome was deposited into DDBJ/EMBL/GenBank under the accession JAQL00000000. Keywords: Human tongue surface, Oral cavity, Oral bacteria, Virulence

  18. The First Complete Chloroplast Genome Sequences in Actinidiaceae: Genome Structure and Comparative Analysis.

    Science.gov (United States)

    Yao, Xiaohong; Tang, Ping; Li, Zuozhou; Li, Dawei; Liu, Yifei; Huang, Hongwen

    2015-01-01

    Actinidia chinensis is an economically important plant belonging to the basal lineage of the asterids. Availability of a complete Actinidia chloroplast genome sequence is crucial to understanding phylogenetic relationships among major lineages of angiosperms and facilitates kiwifruit genetic improvement. We report here the complete nucleotide sequences of the chloroplast genomes of Actinidia chinensis and A. chinensis var. deliciosa, obtained through de novo assembly of Illumina paired-end reads produced by total DNA sequencing. The total genome size ranges from 155,446 to 157,557 bp, with an inverted repeat (IR) of 24,013 to 24,391 bp, a large single copy (LSC) region of 87,984 to 88,337 bp and a small single copy (SSC) region of 20,332 to 20,336 bp. The genome encodes 113 different genes, including 79 unique protein-coding genes, 30 tRNA genes and 4 ribosomal RNA genes, with 16 duplicated in the inverted repeats, and a tRNA gene (trnfM-CAU) duplicated once in the LSC region. Comparisons of IR boundaries among four asterid species showed that IR/LSC borders extended into the 5' portion of the psbA gene and that IR contraction occurred in Actinidia. The clpP gene has been lost from the chloroplast genome in Actinidia and may have been transferred to the nucleus during chloroplast evolution. Twenty-seven polymorphic simple sequence repeat (SSR) loci were identified in the Actinidia chloroplast genome. Maximum parsimony analyses of a 72-gene, 16-taxon angiosperm dataset strongly support the placement of Actinidiaceae in Ericales within the basal asterids.

  19. Advanced Whole-Genome Sequencing and Analysis of Fetal Genomes from Amniotic Fluid.

    Science.gov (United States)

    Mao, Qing; Chin, Robert; Xie, Weiwei; Deng, Yuqing; Zhang, Wenwei; Xu, Huixin; Zhang, Rebecca Yu; Shi, Quan; Peters, Erin E; Gulbahce, Natali; Li, Zhenyu; Chen, Fang; Drmanac, Radoje; Peters, Brock A

    2018-04-01

    Amniocentesis is a common procedure, the primary purpose of which is to collect cells from the fetus to allow testing for abnormal chromosomes, altered chromosomal copy number, or a small number of genes that have small single- to multibase defects. Here we demonstrate the feasibility of generating an accurate whole-genome sequence of a fetus from either the cellular or cell-free DNA (cfDNA) of an amniotic sample. cfDNA and DNA isolated from the cell pellet of 31 amniocenteses were sequenced to approximately 50× genome coverage by use of the Complete Genomics nanoarray platform. In a subset of the samples, long fragment read libraries were generated from DNA isolated from cells and sequenced to approximately 100× genome coverage. Concordance of variant calls between the 2 DNA sources and with parental libraries was >96%. Two fetal genomes were found to harbor potentially detrimental variants in chromodomain helicase DNA binding protein 8 ( CHD8 ) and LDL receptor-related protein 1 ( LRP1 ), variations of which have been associated with autism spectrum disorder and keratosis pilaris atrophicans, respectively. We also discovered drug sensitivities and carrier information of fetuses for a variety of diseases. We were able to elucidate the complete genome sequence of 31 fetuses from amniotic fluid and demonstrate that the cfDNA or DNA from the cell pellet can be analyzed with little difference in quality. We believe that current technologies could analyze this material in a highly accurate and complete manner and that analyses like these should be considered for addition to current amniocentesis procedures. © 2018 American Association for Clinical Chemistry.
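
    The abstract reports >96% concordance of variant calls but does not define the metric here. The sketch below assumes one simple definition: calls shared by both DNA sources divided by all distinct calls from either source. Variants are keyed by (chromosome, position, reference allele, alternate allele). The function name and the example calls are hypothetical.

```python
def call_concordance(calls_a: set, calls_b: set) -> float:
    """Shared calls divided by all distinct calls made from either DNA source.

    Each call is assumed to be a (chromosome, position, ref, alt) tuple; this
    definition of concordance is an illustrative assumption, not the paper's.
    """
    union = calls_a | calls_b
    if not union:
        return 1.0  # two empty call sets trivially agree
    return len(calls_a & calls_b) / len(union)


# Hypothetical calls from cell-free DNA and from the cell pellet.
cfdna_calls = {("chr1", 100, "A", "G"), ("chr1", 200, "C", "T"), ("chr2", 50, "G", "A")}
pellet_calls = cfdna_calls | {("chr3", 10, "T", "C")}
print(call_concordance(cfdna_calls, pellet_calls))  # 0.75
```

    Other definitions, such as the genotype agreement at positions called in both sources, would give different numbers on the same data.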

  20. Cloud Based Resource for Data Hosting, Visualization and Analysis Using UCSC Cancer Genomics Browser | Informatics Technology for Cancer Research (ITCR)

    Science.gov (United States)

    The Cancer Analysis Virtual Machine (CAVM) project will leverage cloud technology, the UCSC Cancer Genomics Browser, and the Galaxy analysis workflow system to provide investigators with a flexible, scalable platform for hosting, visualizing and analyzing their own genomic data.

  1. Comparative transcriptional and genomic analysis of Plasmodium falciparum field isolates.

    Directory of Open Access Journals (Sweden)

    Margaret J Mackinnon

    2009-10-01

    Mechanisms for differential regulation of gene expression may underlie much of the phenotypic variation and adaptability of malaria parasites. Here we describe transcriptional variation among culture-adapted field isolates of Plasmodium falciparum, the species responsible for most malarial disease. Genes coding for parasite protein export into the red cell cytosol and onto its surface, and genes coding for sexual stage proteins involved in parasite transmission, were found to be up-regulated in field isolates compared with long-term laboratory isolates. Much of this variability was associated with the loss of small or large chromosomal segments, or other forms of gene copy number variation that are prevalent in the P. falciparum genome (copy number variants, CNVs). Expression levels of genes inside these segments were correlated with those of genes outside and adjacent to the segment boundaries, and this association declined with distance from the CNV boundary. This observation could not be explained by copy number variation in these adjacent genes. This suggests a local-acting regulatory role for CNVs in transcription of neighboring genes and helps explain the chromosomal clustering that we observed here. Transcriptional co-regulation of physical clusters of adaptive genes may provide a way for the parasite to readily adapt to its highly heterogeneous and strongly selective environment.

  2. Genome-Wide Transcriptome Analysis of Cadmium Stress in Rice

    Directory of Open Access Journals (Sweden)

    Youko Oono

    2016-01-01

    Rice growth is severely affected by toxic concentrations of the nonessential heavy metal cadmium (Cd). To elucidate the molecular basis of the response to Cd stress, we performed mRNA sequencing of rice following our previous study on exposure to high concentrations of Cd (Oono et al., 2014). In this study, rice plants were hydroponically treated with low concentrations of Cd, and approximately 211 million sequence reads were mapped onto the IRGSP-1.0 reference rice genome sequence. Many genes, including some identified under high Cd exposure in our previous study, were found to be responsive to low Cd exposure, with an average of about 11,000 transcripts from each condition. However, genes expressed constitutively across the developmental course responded only slightly to low Cd concentrations, in contrast to their clear response to high Cd concentrations, which, judging by phenotypic changes, cause fatal damage to rice seedlings. The expression of metal ion transporter genes tended to correlate with Cd concentration, suggesting that the RNA-Seq strategy could reveal novel Cd-responsive transporters by analyzing gene expression under different Cd concentrations. This study could help in developing novel strategies for improving tolerance to Cd exposure in rice and other cereal crops.

  3. Investigating hookworm genomes by comparative analysis of two Ancylostoma species

    Directory of Open Access Journals (Sweden)

    Kapulkin Wadim

    2005-04-01

    Abstract Background Hookworms, which infect over one billion people, are the most closely related major human parasites to the model nematode Caenorhabditis elegans. Applying genomics techniques to these species, we analyzed 3,840 and 3,149 genes from Ancylostoma caninum and A. ceylanicum, respectively. Results Transcripts originated from libraries representing infective L3 larvae, stimulated L3, arrested L3, and adults. Most genes are represented in single stages, including abundant transcripts such as hsp-20 in infective L3 and vit-3 in adults. Over 80% of the genes have homologs in C. elegans, and nearly 30% of these have observable RNA interference phenotypes. Homologies were identified to nematode-specific and clade V-specific gene families. To study the evolution of hookworm genes, 574 A. caninum / A. ceylanicum orthologs were identified, all of which were found to be under purifying selection, with ratios of nonsynonymous to synonymous substitutions distributed similarly to those reported for C. elegans / C. briggsae orthologs. The phylogenetic distance between A. caninum and A. ceylanicum is almost identical to that between C. elegans and C. briggsae. Conclusion The genes discovered should substantially accelerate research toward a better understanding of the parasites' basic biology, as well as new therapies including vaccines and novel anthelmintics.

  4. Analysis of the genetic variation in Mycobacterium tuberculosis strains by multiple genome alignments

    Directory of Open Access Journals (Sweden)

    Morales Juan

    2008-11-01

    Abstract Background The recent determination of the complete nucleotide sequences of several Mycobacterium tuberculosis (MTB) genomes allows the use of comparative genomics as a tool for dissecting the nature and consequences of genetic variability within this species. The multiple alignment of the genomes of clinical strains (CDC1551, F11, Haarlem and C), along with the genomes of laboratory strains (H37Rv and H37Ra), provides new insights into the mechanisms of adaptation of this bacterium to the human host. Findings The genetic variation found in six M. tuberculosis strains does not involve significant genomic rearrangements. Most of the variation results from deletion and transposition events preferentially associated with insertion sequences and genes of the PE/PPE family, but not with genes implicated in virulence. Using islandsanalyser, Perl-based software that creates a representation of the genetic variation in the genome, we identified differences in the patterns of distribution and frequency of the polymorphisms across the genome. The identification of genes displaying strain-specific polymorphisms and the extrapolation of the number of strain-specific polymorphisms to an unlimited number of genomes indicate that the different strains contain a limited number of unique polymorphisms. Conclusion The comparison of multiple genomes demonstrates that the M. tuberculosis genome is currently undergoing an active process of gene decay, analogous to the adaptation process of obligate bacterial symbionts. This observation opens new perspectives on the evolution and pathogenesis of this bacterium.

  5. A SAS2H/KENO-V Methodology for 3D Full Core depletion analysis

    International Nuclear Information System (INIS)

    Milosevic, M.; Greenspan, E.; Vujic, J.; Petrovic, B.

    2003-04-01

    This paper describes the use of a SAS2H/KENO-V methodology for 3D full core depletion analysis and illustrates its capabilities by applying it to burnup analysis of the IRIS core benchmarks. This new SAS2H/KENO-V sequence combines a 3D Monte Carlo full core calculation of the node power distribution with a 1D Wigner-Seitz equivalent cell transport method for the independent depletion calculation of each node. Compared with the MOCUP code system, this approach reduces the time required to obtain comparable results by more than an order of magnitude. The SAS2H/KENO-V results for the asymmetric IRIS core benchmark are in good agreement with those of the ALPHA/PHOENIX/ANC code system. (author)

  6. Analysis of the critical and first full power operating cores for PARR using LEU oxide fuel

    International Nuclear Information System (INIS)

    Khan, L.A.; Qazi, M.K.; Bokhari, I.H.; Fazal, R.

    1989-10-01

    This paper explains the analysis for determining the first full power operating core for PARR using LEU oxide fuel. The core configuration selected for this first full power operation contains about 6.13 kg of U-235 distributed in 19 standard and five control fuel elements. The neutron flux level doubles when the core power is raised from 5 MW to 10 MW. The total nuclear power peaking factor of the core is 2.03. The analysis shows that the core can be operated safely at 5 MW with a flow rate of 520 m³/h and at 10 MW with a flow rate of 900 m³/h. (A.B.). 10 figs

  7. Thermal analysis of both ventilated and full disc brake rotors with frictional heat generation

    Directory of Open Access Journals (Sweden)

    Belhocine A.

    2014-06-01

    In automotive engineering, safety has been considered the number one priority in the development of a new vehicle. Each individual system has been studied and developed to meet safety requirements. Besides air bags, good suspension systems, good handling and safe cornering, one of the most critical systems in a vehicle is the brake system. The objective of this work is to investigate and analyze the temperature distribution of the rotor disc during braking using ANSYS Multiphysics. The work uses finite element analysis to predict the temperature distribution on full and ventilated brake discs and to identify the critical temperature of the rotor. The analysis also yields the heat flux distribution for the two discs.

  8. Comparative genomic and functional analysis of 100 Lactobacillus rhamnosus strains and their comparison with strain GG.

    Directory of Open Access Journals (Sweden)

    François P Douillard

    Lactobacillus rhamnosus is a lactic acid bacterium found in a large variety of ecological habitats, including artisanal and industrial dairy products, the oral cavity, the intestinal tract and the vagina. To gain insights into the genetic complexity and ecological versatility of the species L. rhamnosus, we examined the genomes and phenotypes of 100 L. rhamnosus strains isolated from diverse sources. The genomes of the 100 strains were mapped onto the L. rhamnosus GG reference genome. These strains were phenotypically characterized for a wide range of metabolic, antagonistic, signalling and functional properties. Phylogenomic analysis showed multiple groupings within the species that could partly be associated with their ecological niches. We identified 17 highly variable regions that encode functions related to lifestyle, i.e. carbohydrate transport and metabolism, production of mucus-binding pili, bile salt resistance, prophages and CRISPR adaptive immunity. Integration of the phenotypic and genomic data revealed that some L. rhamnosus strains possibly resided in multiple niches, illustrating the dynamics of bacterial habitats. The present study showed two distinct geno-phenotypes in the L. rhamnosus species. Geno-phenotype A suggests an adaptation to stable nutrient-rich niches, i.e. milk-derived products, reflected by the alteration or loss of biological functions associated with antimicrobial activity spectrum, stress resistance, adaptability and fitness to a distinctive range of habitats. In contrast, geno-phenotype B displays traits suited to a variable environment, such as the intestinal tract, in terms of nutrient resources, bacterial population density and host effects.

  9. eXframe: reusable framework for storage, analysis and visualization of genomics experiments

    Directory of Open Access Journals (Sweden)

    Sinha Amit U

    2011-11-01

    Abstract Background Genome-wide experiments are routinely conducted to measure gene expression, DNA-protein interactions and epigenetic status. Structured metadata for these experiments is imperative for a complete understanding of experimental conditions, to enable consistent data processing and to allow retrieval, comparison, and integration of experimental results. Although several repositories have been developed for genomics data, only a few provide annotation of samples and assays using controlled vocabularies. Moreover, many of them are tailored to a single type of technology or measurement and do not support the integration of multiple data types. Results We have developed eXframe, a reusable web-based framework for genomics experiments that provides (1) the ability to publish structured data compliant with accepted standards, (2) support for multiple data types including microarrays and next generation sequencing, and (3) query, analysis and visualization integration tools (enabled by consistent processing of the raw data and annotation of samples), and is available as open-source software. We present two case studies where this software is currently being used to build repositories of genomics experiments: one contains data from hematopoietic stem cells and another from Parkinson's disease patients. Conclusion The web-based framework eXframe offers structured annotation of experiments as well as uniform processing and storage of molecular data from microarray and next generation sequencing platforms. The framework allows users to query and integrate information across species, technologies, measurement types and experimental conditions. Our framework is reusable and freely modifiable: other groups or institutions can deploy their own custom web-based repositories based on this software. It is interoperable with the most important data formats in this domain. We hope that other groups will not only use eXframe, but also contribute their own

  10. Simultaneous analysis of all SNPs in genome-wide and re-sequencing association studies.

    Directory of Open Access Journals (Sweden)

    Clive J Hoggart

    2008-07-01

    Testing one SNP at a time does not fully realise the potential of genome-wide association studies to identify multiple causal variants, which is a plausible scenario for many complex diseases. We show that simultaneous analysis of the entire set of SNPs from a genome-wide study to identify the subset that best predicts disease outcome is now feasible, thanks to developments in stochastic search methods. We used a Bayesian-inspired penalised maximum likelihood approach in which every SNP can be considered for additive, dominant, and recessive contributions to disease risk. Posterior mode estimates were obtained for regression coefficients that were each assigned a prior with a sharp mode at zero. A non-zero coefficient estimate was interpreted as corresponding to a significant SNP. We investigated two prior distributions and show that the normal-exponential-gamma prior leads to improved SNP selection in comparison with single-SNP tests. We also derived an explicit approximation for type-I error that avoids the need to use permutation procedures. As well as genome-wide analyses, our method is well-suited to fine mapping with very dense SNP sets obtained from re-sequencing and/or imputation. It can accommodate quantitative as well as case-control phenotypes, covariate adjustment, and can be extended to search for interactions. Here, we demonstrate the power and empirical type-I error of our approach using simulated case-control data sets of up to 500K SNPs, a real genome-wide data set of 300K SNPs, and a sequence-based dataset, each of which can be analysed in a few hours on a desktop workstation.
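
    The abstract says every SNP can be considered for additive, dominant and recessive contributions. A minimal sketch of one conventional way to turn a genotype into those candidate predictors is shown below. The paper's exact parameterisation (for example, whether dominant and recessive terms are coded as deviations from the additive term) is not given here, so this coding and the helper names are assumptions. In a penalised regression, each resulting column would receive a coefficient whose prior is sharply peaked at zero.

```python
# Hypothetical helpers: encode a SNP genotype, given as the count (0, 1 or 2)
# of the risk allele, under the three genetic models named in the abstract.

MODELS = ("additive", "dominant", "recessive")


def encode_genotype(risk_allele_count: int, model: str) -> int:
    """Return the covariate value for one individual under one genetic model."""
    if risk_allele_count not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1 or 2 copies of the risk allele")
    if model == "additive":
        return risk_allele_count            # effect scales with allele dose
    if model == "dominant":
        return int(risk_allele_count >= 1)  # one copy is enough
    if model == "recessive":
        return int(risk_allele_count == 2)  # two copies are required
    raise ValueError(f"unknown model: {model}")


def snp_design_columns(genotypes):
    """Expand one SNP's genotype vector into one candidate column per model."""
    return {m: [encode_genotype(g, m) for g in genotypes] for m in MODELS}


print(snp_design_columns([0, 1, 2]))
# {'additive': [0, 1, 2], 'dominant': [0, 1, 1], 'recessive': [0, 0, 1]}
```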

  11. Using Whole Genome Analysis to Examine Recombination across Diverse Sequence Types of Staphylococcus aureus.

    Directory of Open Access Journals (Sweden)

    Elizabeth M Driebe

    Staphylococcus aureus is an important clinical pathogen worldwide, and understanding this organism's phylogeny, in particular the role of recombination, is important both to understand the overall spread of virulent lineages and to characterize outbreaks. To further elucidate the phylogeny of S. aureus, 35 diverse strains were sequenced using whole genome sequencing. In addition, 29 publicly available whole genome sequences were included to create a single nucleotide polymorphism (SNP)-based phylogenetic tree encompassing 11 distinct lineages. All strains of a particular sequence type fell into the same clade, with clear groupings of the major clonal complexes CC8, CC5, CC30, CC45 and CC1. Using a novel analysis method, we plotted the homoplasy density and SNP density across the whole genome and found evidence of recombination throughout the entire chromosome, but when we examined individual clonal lineages we found very little recombination. However, when we analyzed three branches of multiple lineages, we saw intermediate and differing levels of recombination between them. These data demonstrate that in S. aureus, recombination occurs across major lineages that subsequently expand in a clonal manner. Estimated mutation rates for the CC8 and CC5 lineages differed: while the CC8 lineage rate was similar to previous studies, the CC5 lineage rate was 100-fold greater. Fifty known virulence genes were screened in all genomes in silico to determine their distribution across major clades. Thirty-three genes were present variably across clades, most of which were not constrained by ancestry, indicating horizontal gene transfer or gene loss.
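
    The study's "novel analysis method" is not described in the abstract. As a generic illustration of plotting SNP or homoplasy density along a chromosome, the sketch below counts sites in fixed-size sliding windows and reports density per kilobase. The window size, step size, 0-based positions and toy data are all assumptions. The same function would apply to homoplastic SNP positions.

```python
from bisect import bisect_left

# Hypothetical helper: count sites (e.g. SNP positions, or homoplastic SNP
# positions) in fixed-size sliding windows along a chromosome and report the
# density per kilobase. Window and step sizes are illustrative assumptions.


def windowed_density(positions, genome_length, window, step):
    """Return (window_start, sites_per_kb) for 0-based windows [start, start + window)."""
    sorted_pos = sorted(positions)
    densities = []
    start = 0
    while start + window <= genome_length:
        lo = bisect_left(sorted_pos, start)
        hi = bisect_left(sorted_pos, start + window)
        densities.append((start, (hi - lo) * 1000 / window))
        start += step
    return densities


# Usage on a toy 2 kb "genome" with five variable sites.
print(windowed_density([100, 200, 1500, 1600, 1700], 2000, 1000, 500))
# [(0, 2.0), (500, 0.0), (1000, 3.0)]
```

    Overlapping windows (step smaller than window) smooth the profile. Regions where homoplasy density rises relative to overall SNP density are the kind of signal that would point to recombination.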

  12. Genome Sequence Analysis of the Naphthenic Acid Degrading and Metal Resistant Bacterium Cupriavidus gilardii CR3.

    Directory of Open Access Journals (Sweden)

    Xiaoyu Wang

    Cupriavidus spp. are generally heavy-metal-tolerant bacteria able to degrade a variety of aromatic hydrocarbon compounds, although their degradation pathways and substrate versatility remain largely unknown. Here we studied the bacterium Cupriavidus gilardii strain CR3, which was isolated from a natural asphalt deposit and shown to utilize naphthenic acids as a sole carbon source. Genome sequencing of C. gilardii CR3 was carried out to elucidate possible mechanisms of naphthenic acid biodegradation. The genome of C. gilardii CR3 is composed of two circular chromosomes, chr1 and chr2, of 3,539,530 bp and 2,039,213 bp, respectively. The genome of strain CR3 encodes 4,502 putative protein-coding genes, 59 tRNA genes, and many other non-coding genes. Many genes were associated with xenobiotic biodegradation and metal resistance functions. Pathway prediction for the degradation of cyclohexanecarboxylic acid, a representative naphthenic acid, suggested that naphthenic acid undergoes initial ring cleavage, after which the ring fission products can be degraded via several plausible pathways, including a mechanism similar to that used for fatty acid oxidation. The final metabolic products of these pathways are unstable or volatile compounds that were not toxic to CR3. Strain CR3 was also shown to tolerate at least 10 heavy metals, mainly through self-detoxification by ion efflux, metal complexation and metal reduction, together with a powerful DNA self-repair mechanism. Our genomic analysis suggests that CR3 is well adapted to survive the harsh environment of natural asphalts containing naphthenic acids and high concentrations of heavy metals.

  13. Characterization of partial and near full-length genomes of HIV-1 strains sampled from recently infected individuals in São Paulo, Brazil.

    Directory of Open Access Journals (Sweden)

    Sabri Saeed Sanabani

    BACKGROUND: Genetic variability is a major feature of human immunodeficiency virus type 1 (HIV-1) and is considered the key factor frustrating efforts to halt the HIV epidemic. A proper understanding of HIV-1 genomic diversity is a fundamental prerequisite for sound epidemiology, genetic diagnosis, and successful drug and vaccine design. Here, we report on the partial and near full-length genomic (NFLG) variability of HIV-1 isolates from a well-characterized cohort of recently infected patients in São Paulo, Brazil. METHODOLOGY: HIV-1 proviral DNA was extracted from the peripheral blood mononuclear cells of 113 participants. The NFLGs and partial fragments were determined by overlapping nested PCR and direct sequencing. The data were phylogenetically analyzed. RESULTS: Of the 113 samples studied (90.3% male; median age 31 years; 79.6% homosexual men), 77 (68.1%) NFLGs and 32 (29.3%) partial fragments were successfully subtyped. Of the successfully subtyped sequences, 88 (80.7%) were subtype B, 12 (11%) BF1 recombinants, 3 (2.8%) subtype C, 2 (1.8%) each BC recombinants and subclade F1, 1 (0.9%) CRF02_AG, and 1 (0.9%) CRF31_BC. Primary drug resistance mutations were observed in 14/101 (13.9%) samples, with 5.9% being resistant to protease inhibitors and nucleoside reverse transcriptase inhibitors (NRTIs) and 4.9% resistant to non-NRTIs. Viral tropism was predicted for 86 individuals. X4 or X4 dual- or mixed-tropic viruses (X4/DM) were seen in 26 (30.2%) subjects. X4 viruses were detected in 19/69 (27.5%) homosexual subjects. CONCLUSIONS: Our results confirm that various HIV-1 subtypes circulate in São Paulo and indicate that subtype B accounts for the majority of infections. Antiretroviral (ARV) drug resistance is relatively common among recently infected patients. The proportion of X4 viruses in homosexuals was significantly higher than the proportion seen in other study populations.

  14. A Cost-Effective Design and Analysis of Full Bridge LLC Resonant Converter

    OpenAIRE

    Kaibalya Prasad Panda; Sreyasee Rout

    2016-01-01

    The LLC (inductor-inductor-capacitor) resonant converter has several advantages over other types of resonant converters, including high efficiency, high reliability and high power density. This paper presents the design and analysis of a full bridge LLC resonant converter. In addition to the operating principle, the ZVS and ZCS conditions are explained along with the DC characteristics. Simulation of the LLC resonant converter is performed in MATLAB/Simulink and the practical prototype setu...

  15. Status of the ITER full-tungsten divertor shaping and heat load distribution analysis

    International Nuclear Information System (INIS)

    Carpentier-Chouchana, S; Hirai, T; Escourbiac, F; Durocher, A; Fedosov, A; Ferrand, L; Kocan, M; Kukushkin, A S; Jokinen, T; Komarov, V; Lehnen, M; Merola, M; Mitteau, R; Pitts, R A; Sugihara, M; Firdaouss, M; Stangeby, P C

    2014-01-01

    In September 2011, the ITER Organization (IO) proposed to begin operation with a full-tungsten (W) armoured divertor, with the objective of taking a decision on the final target material (carbon fibre composite or W) by the end of 2013. This two-year period would enable the development of a full-W divertor design compatible with nuclear operations, further investigation of several physics R&D aspects associated with the use of W targets, and completion of technology qualification. Beginning with a brief overview of the reference heat load specifications defined for the full-W engineering activity, this paper reports on the current status of the ITER divertor shaping and summarizes the results of related three-dimensional heat load distribution analysis performed as part of the design validation. (paper)

  16. Full factorial design analysis of carbon nanotube polymer-cement composites

    Directory of Open Access Journals (Sweden)

    Fábio de Paiva Cota

    2012-08-01

    The work described in this paper concerns the effect of adding carbon nanotubes (CNT) on the mechanical properties of polymer-cement composites. A full factorial design was performed on 160 samples to identify the contributions of the following factors: polymeric phase addition, CNT weight addition and water/cement ratio. The response parameters of the full factorial design were the bulk density, apparent porosity, compressive strength and elastic modulus of the polymer-cement-based nanocomposites. All the factors considered in this analysis significantly affected the bulk density and apparent porosity of the composites. The compressive strength and elastic modulus were affected primarily by the cross-interactions between the polymeric phase and CNT additions, and between the water/cement ratio and polymeric phase factors.
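
    How a full factorial design separates main effects from cross-interactions can be sketched with a two-level design in coded units. The three factor names mirror the abstract. The two-level coding, the absence of replicates and the toy response are assumptions and do not reproduce the study's 160-sample design or its data.

```python
from itertools import product

# Sketch of a two-level full factorial design in coded units (-1 = low, +1 = high).
# Factor names mirror the abstract; levels and the toy response are assumptions.
FACTORS = ("polymer_addition", "cnt_addition", "water_cement_ratio")


def full_factorial(n_factors):
    """Every combination of low/high levels: 2 ** n_factors runs."""
    return [list(run) for run in product((-1, 1), repeat=n_factors)]


def effect(design, response, columns):
    """Main effect (one column) or interaction effect (several columns).

    Each run's contrast sign is the product of its coded levels in `columns`;
    the effect is mean(response | sign = +1) - mean(response | sign = -1).
    """
    plus, minus = [], []
    for run, y in zip(design, response):
        sign = 1
        for c in columns:
            sign *= run[c]
        (plus if sign > 0 else minus).append(y)
    return sum(plus) / len(plus) - sum(minus) / len(minus)


design = full_factorial(len(FACTORS))
# Toy response with a polymer main effect, a water/cement main effect and a
# polymer x CNT interaction, but no CNT main effect.
response = [10 + 2 * p - w + 1.5 * p * c for p, c, w in design]
print(effect(design, response, [0]))     # polymer main effect: 4.0
print(effect(design, response, [1]))     # CNT main effect: 0.0
print(effect(design, response, [2]))     # water/cement main effect: -2.0
print(effect(design, response, [0, 1]))  # polymer x CNT interaction: 3.0
```

    Because every level combination is run, each main effect and interaction contrast is balanced over the other factors. This is why a factor can show no main effect yet still matter through an interaction, as the CNT factor does in this toy example.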

  17. Comparative Genome Analysis Reveals Divergent Genome Size Evolution in a Carnivorous Plant Genus

    Czech Academy of Sciences Publication Activity Database

    Vu, G.T.H.; Schmutzer, T.; Bull, F.; Cao, H.X.; Fuchs, J.; Tran, T.D.; Jovtchev, G.; Pistrick, K.; Stein, N.; Pečinka, A.; Neumann, Pavel; Novák, Petr; Macas, Jiří; Dear, P.H.; Blattner, F.R.; Scholz, U.; Schubert, I.

    2015-01-01

    Vol. 8, No. 3 (2015) ISSN 1940-3372 R&D Projects: GA ČR GBP501/12/G090 Institutional support: RVO:60077344 Keywords: Genlisea * genome * repetitive sequences Subject RIV: EB - Genetics; Molecular Biology Impact factor: 3.509, year: 2015

  18. Meta-analysis of genome-wide association from genomic prediction models

    Science.gov (United States)

    A limitation of many genome-wide association studies (GWA) in animal breeding is that there are many loci with small effect sizes; thus, larger sample sizes (N) are required to guarantee suitable power of detection. To increase sample size, results from different GWA can be combined in a meta-analys...

  19. Genome-Wide Analysis of the RNA Helicase Gene Family in Gossypium raimondii

    Directory of Open Access Journals (Sweden)

    Jie Chen

    2014-03-01

    The RNA helicases, which help to unwind stable RNA duplexes and have important roles in RNA metabolism, belong to a class of motor proteins that play important roles in plant development and responses to stress. Although this gene family has been systematically investigated in Arabidopsis, rice, and tomato, it has not yet been characterized in cotton. In this study, we identified 161 putative RNA helicase genes in the genome of the diploid cotton species Gossypium raimondii. We classified these genes into three subfamilies, based on the presence of a DEAD-box (51 genes), DEAH-box (52 genes), or DExD/H-box (58 genes) in their coding regions. Chromosome location analysis showed that the genes encoding RNA helicases are distributed across all 13 chromosomes of G. raimondii. Syntenic analysis revealed that 62 of the 161 G. raimondii helicase genes (38.5%) are within the identified syntenic blocks. Sixty-six (40.99%) helicase genes from G. raimondii have one or several putative orthologs in tomato. Additionally, GrDEADs have more conserved gene structures and simpler domains than GrDEAHs and GrDExD/Hs. Transcriptome sequencing data demonstrated that many of these helicases, especially GrDEADs, are highly expressed at the fiber initiation stage and in mature leaves. To our knowledge, this is the first report of a genome-wide analysis of the RNA helicase gene family in cotton.

  20. Comparative Genome Analysis of Lolium-Festuca Complex Species

    DEFF Research Database (Denmark)

    Czaban, Adrian; Byrne, Stephen; Sharma, Sapna

    2015-01-01

    , winter hardiness, drought tolerance and resistance to grazing. In this study we have sequenced and assembled the low copy fraction of the genomes of Lolium westerwoldicum, Lolium multiflorum, Festuca pratensis and Lolium temulentum. We have also generated de-novo transcriptome assemblies for each species......, and these have aided in the annotation of the genomic sequence. Using this data we were able to generate annotated assemblies of the gene rich regions of the four species to complement the already sequenced Lolium perenne genome. Using these gene models we have identified orthologous genes between the species...

  1. Genomic analysis of WCP30 Phage of Weissella cibaria for Dairy Fermented Foods.

    Science.gov (United States)

    Lee, Young-Duck; Park, Jong-Hyun

    2017-01-01

    In this study, we report the morphogenetic analysis and genome sequence of a new phage, WCP30, of Weissella cibaria, isolated from a fermented food. Based on its morphology, as observed by transmission electron microscopy, WCP30 phage belongs to the family Siphoviridae. Genomic analysis showed that WCP30 phage has a 33,697-bp double-stranded DNA genome with 41.2% G+C content. Bioinformatics analysis of the genome revealed 35 open reading frames. A BLASTN search showed that WCP30 phage had low sequence similarity to other phages infecting lactic acid bacteria. This is the first report of the morphological features and complete genome sequence of WCP30 phage, which may be useful for controlling the fermentation of dairy foods.
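
    A reported G+C content such as 41.2% is the percentage of G and C among the bases of the genome sequence. A minimal sketch of that calculation follows. Excluding ambiguity codes such as N from the denominator is a design choice assumed here, and the input sequence is made up.

```python
def gc_content(sequence: str) -> float:
    """Percentage of G + C among unambiguous bases (A, C, G, T).

    Ambiguity codes such as N are ignored rather than counted in the denominator.
    """
    seq = sequence.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    if gc + at == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / (gc + at)


print(round(gc_content("ATGCGCNN"), 1))  # 66.7
```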

  2. Ultrahigh-dimensional variable selection method for whole-genome gene-gene interaction analysis

    Directory of Open Access Journals (Sweden)

    Ueki Masao

    2012-05-01

    Full Text Available Abstract

    Background: Genome-wide gene-gene interaction analysis using single nucleotide polymorphisms (SNPs) is an attractive way to identify genetic components that confer susceptibility to complex human diseases. However, testing each SNP-SNP pair individually, as in a conventional genome-wide association study (GWAS), makes it difficult to set an overall p-value threshold because of the complicated correlation structure; this multiple-testing problem leads to an unacceptable number of false negatives. Moreover, the number of SNP-SNP pairs is far larger than the sample size (the so-called "large p, small n" problem), which precludes simultaneous analysis by multiple regression. A method that overcomes these issues is therefore needed.

    Results: We adopt a recent method for ultrahigh-dimensional variable selection, sure independence screening (SIS), to handle the very large number of SNP-SNP interactions by including them as predictor variables in logistic regression. We propose a ranking strategy using promising dummy-coding methods, followed by a variable selection procedure in the SIS framework suitably modified for gene-gene interaction analysis. We also implemented these procedures in a software program, EPISIS, using cost-effective GPGPU (general-purpose computing on graphics processing units) technology. EPISIS can complete an exhaustive search for SNP-SNP interactions in a standard GWAS dataset within several hours. The proposed method performs well in simulation experiments and in an application to real WTCCC (Wellcome Trust Case Control Consortium) data.

    Conclusions: Based on machine-learning principles, the proposed method provides a powerful and flexible genome-wide search for various patterns of gene-gene interaction.
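    The screening idea behind SIS (rank every candidate predictor by a marginal utility, then keep only the top d for a joint model) can be illustrated with a minimal Python sketch. This is not EPISIS and does not reproduce the authors' dummy codings, ranking strategy, or follow-up logistic-regression selection. It assumes genotypes coded as 0/1/2 minor-allele counts, a single multiplicative interaction feature per SNP pair, and the absolute Pearson correlation with a 0/1 case/control label as the ranking score. The names `pearson` and `sis_screen_pairs` are hypothetical.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / math.sqrt(sxx * syy)


def sis_screen_pairs(genotypes, phenotype, d):
    """Screening step only: rank all SNP pairs by |marginal correlation|, keep the top d.

    Assumptions of this sketch (not taken from the EPISIS paper):
    genotypes[k][j] is the 0/1/2 minor-allele count of SNP j in sample k,
    phenotype[k] is a 0/1 case/control label, and each pair (i, j) is
    represented by one product feature g_i * g_j.
    """
    n_snps = len(genotypes[0])
    scored = []
    for i, j in combinations(range(n_snps), 2):
        feature = [row[i] * row[j] for row in genotypes]  # hypothetical interaction coding
        scored.append((abs(pearson(feature, phenotype)), (i, j)))
    # Highest score first; the retained pairs would feed a joint selection step.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [pair for _, pair in scored[:d]]


genotypes = [
    [0, 1, 0],
    [1, 0, 1],
    [2, 1, 2],
    [0, 2, 1],
    [1, 1, 0],
    [2, 0, 2],
]
# Toy labels: a sample is a case iff SNP 0 and SNP 2 both carry a minor allele.
phenotype = [0, 1, 1, 0, 0, 1]
print(sis_screen_pairs(genotypes, phenotype, d=1))  # [(0, 2)]
```

    Because each pair is scored independently of all others, the loop parallelizes trivially, which fits the abstract's point that an exhaustive pairwise search is practical on GPUs; in a full SIS pipeline the surviving pairs, not all pairs, enter the regression model.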

  3. The Integrated Microbial Genomes (IMG) System: An Expanding Comparative Analysis Resource

    Energy Technology Data Exchange (ETDEWEB)

    Markowitz, Victor M.; Chen, I-Min A.; Palaniappan, Krishna; Chu, Ken; Szeto, Ernest; Grechkin, Yuri; Ratner, Anna; Anderson, Iain; Lykidis, Athanasios; Mavromatis, Konstantinos; Ivanova, Natalia N.; Kyrpides, Nikos C.

    2009-09-13

    The integrated microbial genomes (IMG) system serves as a community resource for comparative analysis of publicly available genomes in a comprehensive integrated context. IMG contains both draft and complete microbial genomes integrated with other publicly available genomes from all three domains of life, together with a large number of plasmids and viruses. IMG provides tools and viewers for analyzing and reviewing the annotations of genes and genomes in a comparative context. Since its first release in 2005, IMG's data content and analytical capabilities have been continually expanded through regular releases. Several companion IMG systems have been set up to serve domain-specific needs, such as expert review of genome annotations. IMG is available at .

  4. Analysis of the Genome and Chromium Metabolism-Related Genes of Serratia sp. S2.

    Science.gov (United States)

    Dong, Lanlan; Zhou, Simin; He, Yuan; Jia, Yan; Bai, Qunhua; Deng, Peng; Gao, Jieying; Li, Yingli; Xiao, Hong

    2018-05-01

    This study investigated the genome sequence of Serratia sp. S2. The genomic DNA of Serratia sp. S2 was extracted and a sequencing library was constructed. Sequencing was carried out on the Illumina 2000 platform, and complete genomic sequences were obtained. Gene function annotation and bioinformatic analysis were performed by comparison with known databases. The genome size of Serratia sp. S2 was 5,604,115 bp and the G+C content was 57.61%. The genome contained 5373 protein-coding genes, of which 3732, 3614, and 3942 were annotated in the GO, KEGG, and COG databases, respectively. Twelve genes related to chromium metabolism were identified in the Serratia sp. S2 genome. The whole-genome sequence of Serratia sp. S2 has been submitted to GenBank under accession number LNRP00000000. Our findings may provide a theoretical basis for developing new biotechnologies to remediate environmental chromium pollution.

  5. Comparative genomics and functional analysis of the 936 group of lactococcal Siphoviridae phages

    NARCIS (Netherlands)

    Murphy, James; Bottacini, Francesca; Mahony, Jennifer; Kelleher, Philip; Neve, Horst; Zomer, Aldert; Nauta, Arjen; van Sinderen, Douwe

    2016-01-01

    Genome sequencing and comparative analysis of bacteriophage collections has greatly enhanced our understanding regarding their prevalence, phage-host interactions as well as the overall biodiversity of their genomes. This knowledge is very relevant to phages infecting Lactoc