WorldWideScience

Sample records for identify sequence variation

  1. SoftSearch: integration of multiple sequence features to identify breakpoints of structural variations.

    Directory of Open Access Journals (Sweden)

    Steven N Hart

    Full Text Available BACKGROUND: Structural variation (SV) represents a significant, yet poorly understood, contribution to an individual's genetic makeup. Advanced next-generation sequencing technologies are widely used to discover such variations, but no single detection tool is considered a community standard. In an attempt to fulfil this need, we developed an algorithm, SoftSearch, for discovering structural variant breakpoints in Illumina paired-end next-generation sequencing data. SoftSearch combines multiple strategies for detecting SVs, including split-read, discordant read-pair, and unmated-pair evidence. Co-localized split reads and discordant read pairs are used to refine the breakpoints. RESULTS: We developed and validated SoftSearch using real and synthetic datasets. SoftSearch's key features are that it (1) does not require secondary (or exhaustive primary) alignment, (2) is portable into established sequencing workflows, and (3) is applicable to any DNA-sequencing experiment (e.g., whole genome, exome, custom capture). SoftSearch identifies breakpoints from a small number of soft-clipped bases in split reads and a few discordant read pairs, which on their own would not be sufficient to make an SV call. CONCLUSIONS: We show that SoftSearch can identify more true SVs by combining multiple sequence features. SoftSearch called clinically relevant SVs in the BRCA2 gene that other tools did not report, while offering significantly improved overall performance.
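    The idea of pooling individually weak signals can be illustrated with a short Python sketch. This is not SoftSearch's actual algorithm: the function name `call_breakpoints`, the 50 bp window, and the minimum-support threshold of 4 are hypothetical choices for illustration. The sketch piles up soft-clip positions, counts discordant read pairs landing nearby, and calls a breakpoint only when the combined count reaches the threshold.

```python
from collections import Counter


def call_breakpoints(split_read_clips, discordant_pairs, window=50, min_support=4):
    """Return (position, n_clipped, n_discordant) for candidate breakpoints.

    split_read_clips: clip positions of soft-clipped reads on one chromosome.
    discordant_pairs: positions of discordant read-pair ends on the same chromosome.
    The input format, window, and threshold are illustrative assumptions.
    """
    # Pile up soft-clip positions: reads clipped at the same base suggest a shared breakpoint.
    clip_counts = Counter(split_read_clips)
    calls = []
    for pos, n_clipped in sorted(clip_counts.items()):
        # Count discordant pairs whose mapped end falls within `window` bp of this clip point.
        n_discordant = sum(1 for d in discordant_pairs if abs(d - pos) <= window)
        # Call only when the pooled evidence reaches the threshold.
        if n_clipped + n_discordant >= min_support:
            calls.append((pos, n_clipped, n_discordant))
    return calls


print(call_breakpoints([1000, 1000, 1003], [980, 1020, 5000]))  # [(1000, 2, 2)]
```

    Here two reads clipped at position 1000 and two discordant pairs within 50 bp give a combined support of 4, so a call is made even though neither signal reaches the threshold on its own.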

  2. VWF mutations and new sequence variations identified in healthy controls are more frequent in the African-American population.

    Science.gov (United States)

    Bellissimo, Daniel B; Christopherson, Pamela A; Flood, Veronica H; Gill, Joan Cox; Friedman, Kenneth D; Haberichter, Sandra L; Shapiro, Amy D; Abshire, Thomas C; Leissinger, Cindy; Hoots, W Keith; Lusher, Jeanne M; Ragni, Margaret V; Montgomery, Robert R

    2012-03-01

    Diagnosis and classification of VWD is aided by molecular analysis of the VWF gene. Because VWF polymorphisms have not been fully characterized, we performed VWF laboratory testing and gene sequencing of 184 healthy controls with a negative bleeding history. The controls included 66 (35.9%) African Americans (AAs). We identified 21 new sequence variations, 13 (62%) of which occurred exclusively in AAs and 2 (G967D, T2666M) that were found in 10%-15% of the AA samples, suggesting they are polymorphisms. We identified 14 sequence variations reported previously as VWF mutations, the majority of which were type 1 mutations. These controls had VWF Ag levels within the normal range, suggesting that these sequence variations might not always reduce plasma VWF levels. Eleven mutations were found in AAs, and the frequency of M740I, H817Q, and R2185Q was 15%-18%. Ten AA controls had the 2N mutation H817Q; 1 was homozygous. The average factor VIII level in this group was 99 IU/dL, suggesting that this variation may confer little or no clinical symptoms. This study emphasizes the importance of sequencing healthy controls to understand ethnic-specific sequence variations so that asymptomatic sequence variations are not misidentified as mutations in other ethnic or racial groups.

  3. SeqAnt: A web service to rapidly identify and annotate DNA sequence variations

    Directory of Open Access Journals (Sweden)

    Patel Viren

    2010-09-01

    Full Text Available Abstract Background The enormous throughput and low cost of second-generation sequencing platforms now allow research and clinical geneticists to routinely perform single experiments that identify tens of thousands to millions of variant sites. Existing methods to annotate variant sites using information from publicly available databases via web browsers are too slow to be useful for the large sequencing datasets being routinely generated by geneticists. Because sequence annotation of variant sites is required before functional characterization can proceed, the lack of a high-throughput pipeline to efficiently annotate variant sites can act as a significant bottleneck in genetics research. Results SeqAnt (Sequence Annotator) is an open source web service and software package that rapidly annotates DNA sequence variants and identifies recessive or compound heterozygous loci in human, mouse, fly, and worm genome sequencing experiments. Variants are characterized with respect to their functional type, frequency, and evolutionary conservation. Annotated variants can be viewed in a web browser, downloaded as a tab-delimited text file, or directly uploaded in BED format to the UCSC genome browser. To demonstrate the speed of SeqAnt, we annotated a series of publicly available datasets that ranged in size from 37 to 3,439,107 variant sites. The total time to completely annotate these data ranged from 0.17 seconds to 28 minutes 49.8 seconds. Conclusion SeqAnt is an open source web service and software package that overcomes a critical bottleneck facing research and clinical geneticists using second-generation sequencing platforms. SeqAnt will prove especially useful for those investigators who lack dedicated bioinformatics personnel or infrastructure in their laboratories.
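    SeqAnt's own implementation is not described in this abstract, but the basic logic of flagging recessive or compound heterozygous loci can be sketched as follows. The input format (gene, genotype string), the gene names, and the function name `recessive_candidates` are hypothetical. Genes carrying two or more heterozygous variants are reported only as candidates, because without phase information the variants might lie on the same allele (in cis).

```python
from collections import defaultdict


def recessive_candidates(variants):
    """Classify genes from one sample's (gene, genotype) calls.

    Genotypes use VCF-style strings: "0/1" heterozygous, "1/1" homozygous alternative.
    Returns {gene: "homozygous" | "compound_het_candidate"}.
    """
    het_counts = defaultdict(int)
    result = {}
    for gene, genotype in variants:
        if genotype == "1/1":
            result[gene] = "homozygous"
        elif genotype == "0/1":
            het_counts[gene] += 1
    for gene, n_het in het_counts.items():
        # Two or more heterozygous hits in one gene may be compound heterozygous,
        # but only phasing can confirm that they sit on different alleles (in trans).
        if n_het >= 2 and gene not in result:
            result[gene] = "compound_het_candidate"
    return result


calls = [("GENE_A", "0/1"), ("GENE_A", "0/1"), ("GENE_B", "1/1"), ("GENE_C", "0/1")]
print(recessive_candidates(calls))
# {'GENE_B': 'homozygous', 'GENE_A': 'compound_het_candidate'}
```

    GENE_C, with a single heterozygous variant, is not reported, since one heterozygous hit alone does not fit a recessive model.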

  4. Identifying Rare Variation in Cases of Schizophrenia in the Isolated Population of the Faroe Islands using Whole-genome Sequencing

    DEFF Research Database (Denmark)

    Als, Thomas Damm; Lescai, Francesco; Dahl, Hans

    to map risk variants involved in complex traits. We aim to use samples of cases and controls from the isolated population of the Faroe Islands to conduct whole-genome-sequence analysis in order to identify rare genetic variants associated with schizophrenia. We will search for rare genetic variants...... of developing SZ. However, these studies are designed to examine only “the common variant” proportion of the genomic landscape of SZ. Due to increased genetic drift during founding and potential bottlenecks, followed by population expansion, isolated populations may be particularly useful in identifying rare...... disease variants that may appear at higher frequencies and/or within a more clearly distinct haplotype structure compared to outbred populations. Small isolated populations also typically show reduced phenotypic, genetic and environmental heterogeneity, thus making them advantageous in studies aiming...

  5. Sequencing of a patient with balanced chromosome abnormalities and neurodevelopmental disease identifies disruption of multiple high risk loci by structural variation.

    Directory of Open Access Journals (Sweden)

    Jonathon Blake

    Full Text Available Balanced chromosome abnormalities (BCAs) occur at a high frequency in healthy and diseased individuals, but cost-efficient strategies to identify BCAs and evaluate whether they contribute to a phenotype have not yet become widespread. Here we apply genome-wide mate-pair library sequencing to characterize structural variation in a patient with unclear neurodevelopmental disease (NDD) and complex de novo BCAs at the karyotype level. Nucleotide-level characterization of the clinically described BCA breakpoints revealed disruption of at least three NDD candidate genes (LINC00299, NUP205, PSMD14) that gave rise to abnormal mRNAs and could be assumed as disease-causing. However, unbiased genome-wide analysis of the sequencing data for cryptic structural variation was key to reveal an additional submicroscopic inversion that truncates the schizophrenia- and bipolar disorder-associated brain transcription factor ZNF804A as an equally likely NDD-driving gene. Deep sequencing of fluorescent-sorted wild-type and derivative chromosomes confirmed the clinically undetected BCA. Moreover, deep sequencing further validated a high accuracy of mate-pair library sequencing to detect structural variants larger than 10 kb, proposing that this approach is powerful for clinical-grade genome-wide structural variant detection. Our study supports previous evidence for a role of ZNF804A in NDD and highlights the need for a more comprehensive assessment of structural variation in karyotypically abnormal individuals and patients with neurocognitive disease to avoid diagnostic deception.

  6. Sequencing of a Patient with Balanced Chromosome Abnormalities and Neurodevelopmental Disease Identifies Disruption of Multiple High Risk Loci by Structural Variation

    Science.gov (United States)

    Blake, Jonathon; Riddell, Andrew; Theiss, Susanne; Gonzalez, Alexis Perez; Haase, Bettina; Jauch, Anna; Janssen, Johannes W. G.; Ibberson, David; Pavlinic, Dinko; Moog, Ute; Benes, Vladimir; Runz, Heiko

    2014-01-01

    Balanced chromosome abnormalities (BCAs) occur at a high frequency in healthy and diseased individuals, but cost-efficient strategies to identify BCAs and evaluate whether they contribute to a phenotype have not yet become widespread. Here we apply genome-wide mate-pair library sequencing to characterize structural variation in a patient with unclear neurodevelopmental disease (NDD) and complex de novo BCAs at the karyotype level. Nucleotide-level characterization of the clinically described BCA breakpoints revealed disruption of at least three NDD candidate genes (LINC00299, NUP205, PSMD14) that gave rise to abnormal mRNAs and could be assumed as disease-causing. However, unbiased genome-wide analysis of the sequencing data for cryptic structural variation was key to reveal an additional submicroscopic inversion that truncates the schizophrenia- and bipolar disorder-associated brain transcription factor ZNF804A as an equally likely NDD-driving gene. Deep sequencing of fluorescent-sorted wild-type and derivative chromosomes confirmed the clinically undetected BCA. Moreover, deep sequencing further validated a high accuracy of mate-pair library sequencing to detect structural variants larger than 10 kb, proposing that this approach is powerful for clinical-grade genome-wide structural variant detection. Our study supports previous evidence for a role of ZNF804A in NDD and highlights the need for a more comprehensive assessment of structural variation in karyotypically abnormal individuals and patients with neurocognitive disease to avoid diagnostic deception. PMID:24625750

  7. Understanding human DNA sequence variation.

    Science.gov (United States)

    Kidd, K K; Pakstis, A J; Speed, W C; Kidd, J R

    2004-01-01

    Over the past century researchers have identified normal genetic variation and studied that variation in diverse human populations to determine the amounts and distributions of that variation. That information is being used to develop an understanding of the demographic histories of the different populations and the species as a whole, among other studies. With the advent of DNA-based markers in the last quarter century, these studies have accelerated. One of the challenges for the next century is to understand that variation. One component of that understanding will be population genetics. We present here examples of many of the ways these new data can be analyzed from a population perspective using results from our laboratory on multiple individual DNA-based polymorphisms, many clustered in haplotypes, studied in multiple populations representing all major geographic regions of the world. These data support an "out of Africa" hypothesis for human dispersal around the world and begin to refine the understanding of population structures and genetic relationships. We are also developing baseline information against which we can compare findings at different loci to aid in the identification of loci subject, now and in the past, to selection (directional or balancing). We do not yet have a comprehensive understanding of the extensive variation in the human genome, but some of that understanding is coming from population genetics.

  8. Sequence variation of the glycoprotein gene identifies three distinct lineages within field isolates of viral hemorrhagic septicemia virus, a fish rhabdovirus

    Science.gov (United States)

    Benmansour, A.; Bascuro, B.; Monnier, A.F.; Vende, P.; Winton, J.R.; de Kinkelin, P.

    1997-01-01

    To evaluate the genetic diversity of viral haemorrhagic septicaemia virus (VHSV), the sequences of the glycoprotein (G) genes of 11 North American and European isolates were determined. Comparison with the G protein of representative members of the family Rhabdoviridae suggested that VHSV was a different virus species from infectious haematopoietic necrosis virus (IHNV) and Hirame rhabdovirus (HIRRV). At a higher taxonomic level, VHSV, IHNV and HIRRV formed a group which was genetically closest to the genus Lyssavirus. Compared with each other, the VHSV G genes displayed an uneven overall genetic diversity that correlated with differences in geographical origin. The multiple sequence alignment of the complete G protein showed that the divergent positions were not uniformly distributed along the sequence. A central region (amino acid positions 245-300) accumulated substitutions and appeared to be highly variable. The genetic heterogeneity within a single isolate was high, with an apparent internal mutation frequency of 1.2 x 10(-3) per nucleotide site, attesting to the quasispecies nature of the viral population. The phylogeny separated VHSV strains according to the major geographical area of isolation: genotype I for continental Europe, genotype II for the British Isles, and genotype III for North America. Isolates from continental Europe exhibited the highest genetic variability, with sub-groups correlated partially with the serological classification. Neither neutralizing polyclonal sera nor monoclonal antibodies were able to discriminate between the genotypes. The overall structure of the phylogenetic tree suggests that VHSV genetic diversity and evolution fit within the model of random change and positive selection operating on quasispecies.

  9. Genomic Sequence Variation Markup Language (GSVML).

    Science.gov (United States)

    Nakaya, Jun; Kimura, Michio; Hiroi, Kaei; Ido, Keisuke; Yang, Woosung; Tanaka, Hiroshi

    2010-02-01

    With the aim of making good use of internationally accumulated genomic sequence variation data, which is increasing rapidly due to the explosive amount of genomic research at present, the development of an interoperable data exchange format and its international standardization are necessary. Genomic Sequence Variation Markup Language (GSVML) will focus on genomic sequence variation data and human health applications, such as gene based medicine or pharmacogenomics. We developed GSVML through eight steps, based on case analysis and domain investigations. By focusing on the design scope to human health applications and genomic sequence variation, we attempted to eliminate ambiguity and to ensure practicability. We intended to satisfy the requirements derived from the use case analysis of human-based clinical genomic applications. Based on database investigations, we attempted to minimize the redundancy of the data format, while maximizing the data covering range. We also attempted to ensure communication and interface ability with other Markup Languages, for exchange of omics data among various omics researchers or facilities. The interface ability with developing clinical standards, such as the Health Level Seven Genotype Information model, was analyzed. We developed the human health-oriented GSVML comprising variation data, direct annotation, and indirect annotation categories; the variation data category is required, while the direct and indirect annotation categories are optional. The annotation categories contain omics and clinical information, and have internal relationships. For designing, we examined 6 cases for three criteria as human health application and 15 data elements for three criteria as data formats for genomic sequence variation data exchange. The data format of five international SNP databases and six Markup Languages and the interface ability to the Health Level Seven Genotype Model in terms of 317 items were investigated. 
GSVML was developed as

  10. In silico detection of sequence variations modifying transcriptional regulation.

    Directory of Open Access Journals (Sweden)

    Malin C Andersen

    2008-01-01

    Full Text Available Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation.

  11. In Silico Detection of Sequence Variations Modifying Transcriptional Regulation

    Science.gov (United States)

    Andersen, Malin C; Engström, Pär G; Lithwick, Stuart; Arenillas, David; Eriksson, Per; Lenhard, Boris; Wasserman, Wyeth W; Odeberg, Jacob

    2008-01-01

    Identification of functional genetic variation associated with increased susceptibility to complex diseases can elucidate genes and underlying biochemical mechanisms linked to disease onset and progression. For genes linked to genetic diseases, most identified causal mutations alter an encoded protein sequence. Technological advances for measuring RNA abundance suggest that a significant number of undiscovered causal mutations may alter the regulation of gene transcription. However, it remains a challenge to separate causal genetic variations from linked neutral variations. Here we present an in silico driven approach to identify possible genetic variation in regulatory sequences. The approach combines phylogenetic footprinting and transcription factor binding site prediction to identify variation in candidate cis-regulatory elements. The bioinformatics approach has been tested on a set of SNPs that are reported to have a regulatory function, as well as background SNPs. In the absence of additional information about an analyzed gene, the poor specificity of binding site prediction is prohibitive to its application. However, when additional data is available that can give guidance on which transcription factor is involved in the regulation of the gene, the in silico binding site prediction improves the selection of candidate regulatory polymorphisms for further analyses. The bioinformatics software generated for the analysis has been implemented as a Web-based application system entitled RAVEN (regulatory analysis of variation in enhancers). The RAVEN system is available at http://www.cisreg.ca for all researchers interested in the detection and characterization of regulatory sequence variation. PMID:18208319
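    The binding-site prediction step can be illustrated by scoring a site against a position weight matrix (PWM) for the reference and alternative alleles of a SNP. This is a minimal sketch, not RAVEN's code: the four-position matrix `TOY_PWM` is hypothetical, the background is assumed uniform, and no pseudocounts are added (matrices derived from observed counts usually need them to avoid taking log(0)). Phylogenetic footprinting is not shown.

```python
import math

BACKGROUND = 0.25  # assumed uniform nucleotide background frequency

# Hypothetical 4-position matrix of base probabilities (not a real JASPAR profile).
TOY_PWM = [
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.7, "T": 0.1},
    {"A": 0.1, "C": 0.1, "G": 0.1, "T": 0.7},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]


def log_odds_score(site, pwm):
    """Sum over positions of log2(P(base | motif) / P(base | background))."""
    return sum(math.log2(pwm[i][base] / BACKGROUND) for i, base in enumerate(site))


def snp_score_change(ref_site, snp_offset, alt_base, pwm=TOY_PWM):
    """Return (ref_score, alt_score) for a site before and after a single-base change."""
    alt_site = ref_site[:snp_offset] + alt_base + ref_site[snp_offset + 1:]
    return log_odds_score(ref_site, pwm), log_odds_score(alt_site, pwm)


ref_score, alt_score = snp_score_change("AGTA", 1, "C")
print(f"ref={ref_score:.2f} alt={alt_score:.2f}")
```

    In this toy case the G-to-C change at offset 1 lowers the score by log2(0.7/0.1) = log2(7), about 2.81 bits, the kind of drop that would mark the SNP as a candidate regulatory variant worth following up.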

  12. Identifying structural variants using linked-read sequencing data.

    Science.gov (United States)

    Elyanow, Rebecca; Wu, Hsin-Ta; Raphael, Benjamin J

    2017-11-03

    Structural variation, including large deletions, duplications, inversions, translocations, and other rearrangements, is common in human and cancer genomes. A number of methods have been developed to identify structural variants from Illumina short-read sequencing data. However, reliable identification of structural variants remains challenging because many variants have breakpoints in repetitive regions of the genome and thus are difficult to identify with short reads. The recently developed linked-read sequencing technology from 10X Genomics combines a novel barcoding strategy with Illumina sequencing. This technology labels all reads that originate from a small number (~5-10) of DNA molecules, each ~50 kbp in length, with the same molecular barcode. These barcoded reads contain long-range sequence information that is advantageous for identification of structural variants. We present Novel Adjacency Identification with Barcoded Reads (NAIBR), an algorithm to identify structural variants in linked-read sequencing data. NAIBR predicts novel adjacencies in an individual genome resulting from structural variants using a probabilistic model that combines multiple signals in barcoded reads. We show that NAIBR outperforms several existing methods for structural variant identification (including two recent methods that also analyze linked reads) on simulated sequencing data and 10X whole-genome sequencing data from the NA12878 human genome and the HCC1954 breast cancer cell line. Several of the novel somatic structural variants identified in HCC1954 overlap known cancer genes. Software is available at compbio.cs.brown.edu/software. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved.
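    The long-range signal that linked reads provide can be illustrated by counting barcodes that have reads in two distant windows of the genome. This is not NAIBR's probabilistic model; it is a minimal sketch with a hypothetical function name, hypothetical barcodes, and an assumed input of (barcode, position) pairs. Because the loci in the example are about 900 kb apart, far more than the ~50 kbp molecule length, a single molecule should not span both in the reference genome, so shared barcodes hint at a novel adjacency. Each barcode also labels several molecules, so some sharing happens by chance, which is why a statistical model is needed to decide whether the count is meaningful.

```python
def shared_barcodes(reads, window_a, window_b):
    """Barcodes with at least one read in each of two windows on the same chromosome.

    reads: iterable of (barcode, position) pairs; windows are half-open (start, end).
    """
    in_a, in_b = set(), set()
    for barcode, pos in reads:
        if window_a[0] <= pos < window_a[1]:
            in_a.add(barcode)
        if window_b[0] <= pos < window_b[1]:
            in_b.add(barcode)
    return in_a & in_b


reads = [
    ("BX1", 100), ("BX1", 900_100),   # BX1 has reads near both loci
    ("BX2", 150),                     # BX2 only near the first locus
    ("BX3", 900_200),                 # BX3 only near the second locus
    ("BX4", 120), ("BX4", 900_050),   # BX4 has reads near both loci
]
print(sorted(shared_barcodes(reads, (0, 1_000), (900_000, 901_000))))  # ['BX1', 'BX4']
```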

  13. Identifying driver mutations in sequenced cancer genomes

    DEFF Research Database (Denmark)

    Raphael, Benjamin J; Dobson, Jason R; Oesper, Layla

    2014-01-01

    High-throughput DNA sequencing is revolutionizing the study of cancer and enabling the measurement of the somatic mutations that drive cancer development. However, the resulting sequencing datasets are large and complex, obscuring the clinically important mutations in a background of errors, nois...... patterns of mutual exclusivity. These techniques, coupled with advances in high-throughput DNA sequencing, are enabling precision medicine approaches to the diagnosis and treatment of cancer....

  14. SVAMP: Sequence variation analysis, maps and phylogeny

    KAUST Repository

    Naeem, Raeece

    2014-04-03

    Summary: SVAMP is a stand-alone desktop application to visualize genomic variants (in variant call format) in the context of geographical metadata. Users of SVAMP are able to generate phylogenetic trees and perform principal coordinate analysis in real time from variant call format (VCF) and associated metadata files. Allele frequency map, geographical map of isolates, Tajima's D metric, single nucleotide polymorphism density, GC and variation density are also available for visualization in real time. We demonstrate the utility of SVAMP in tracking a methicillin-resistant Staphylococcus aureus outbreak from published next-generation sequencing data across 15 countries. We also demonstrate the scalability and accuracy of our software on 245 Plasmodium falciparum malaria isolates from three continents. Availability and implementation: The Qt/C++ software code, binaries, user manual and example datasets are available at http://cbrc.kaust.edu.sa/svamp. © The Author 2014.
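    Tajima's D, one of the statistics SVAMP displays, compares two estimates of diversity: the mean number of pairwise differences (pi) and Watterson's estimate S/a1, where S is the number of segregating sites. The Python sketch below uses the standard formula; it is not SVAMP's Qt/C++ code, and it assumes aligned haplotype strings of equal length with no missing data (SVAMP itself reads VCF files, which this sketch does not parse). The function name is hypothetical.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, equal-length sequences (no gaps or missing data)."""
    n = len(seqs)
    if n < 4:
        # For n = 2 or 3 the variance terms are zero, so D is undefined.
        raise ValueError("Tajima's D needs at least 4 sequences")
    length = len(seqs[0])
    # S: number of segregating (polymorphic) sites.
    S = sum(1 for i in range(length) if len({s[i] for s in seqs}) > 1)
    if S == 0:
        return None  # D is undefined without polymorphism
    # pi: mean number of pairwise differences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))


print(tajimas_d(["AAAA", "AAAA", "AAAA", "AAAT"]))
```

    D near zero is consistent with neutral evolution at constant population size; negative values indicate an excess of rare variants, as in this example, where a single singleton among four sequences gives D < 0.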

  15. Sequence variations in the FAD2 gene in seeded pumpkins.

    Science.gov (United States)

    Ge, Y; Chang, Y; Xu, W L; Cui, C S; Qu, S P

    2015-12-21

    Seeded pumpkins are important economic crops; their seeds contain various unsaturated fatty acids, such as oleic acid and linoleic acid, which are crucial for human and animal nutrition. The fatty acid desaturase-2 (FAD2) gene encodes delta-12 desaturase, which converts oleic acid to linoleic acid. However, little is known about sequence variations in FAD2 in seeded pumpkins. Twenty-seven FAD2 clones were obtained from 27 accessions of Cucurbita moschata, Cucurbita maxima, Cucurbita pepo, and Cucurbita ficifolia (full length 1152 bp; a single gene without introns). Nucleotide identity among the 27 FAD2 clones exceeded 90%. Sequence polymorphism among the 27 clones resulted from nucleotide substitutions rather than insertions or deletions. Furthermore, all 27 selected FAD2 clones encoded the FAD2 enzyme (delta-12 desaturase), with amino acid sequence identities of 91.7 to 100% over 384 amino acids. The same main functional domain, between amino acids 47 and 329, was identified. Clustering with the unweighted pair group method with arithmetic mean (UPGMA) separated the four species according to their sequence differences. Geographic origin and species were found to be closely related to sequence variation in FAD2.
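    The UPGMA clustering step can be sketched in a few lines of Python. This is a generic implementation, not the authors' software: the input is a hypothetical pairwise distance dictionary (for example, 1 minus the proportion of identical nucleotides between two clones), the labels are placeholders, and only the tree topology is returned, without branch lengths.

```python
from itertools import combinations


def upgma(labels, dist):
    """Return the UPGMA tree topology as nested tuples.

    labels: taxon names; dist: {frozenset({x, y}): distance} for every pair of labels.
    Branch lengths are not tracked in this sketch.
    """
    sizes = {label: 1 for label in labels}  # cluster -> number of leaves it contains
    d = dict(dist)
    while len(sizes) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        merged = (a, b)
        n = sizes[a] + sizes[b]
        # Distance to the merged cluster is the size-weighted mean of its parts' distances.
        for c in sizes:
            if c not in (a, b):
                d[frozenset((merged, c))] = (
                    sizes[a] * d[frozenset((a, c))] + sizes[b] * d[frozenset((b, c))]
                ) / n
        del sizes[a], sizes[b]
        sizes[merged] = n
    return next(iter(sizes))


pairs = {("A", "B"): 2, ("C", "D"): 4, ("A", "C"): 10,
         ("A", "D"): 10, ("B", "C"): 10, ("B", "D"): 10}
dist = {frozenset(k): v for k, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))  # (('A', 'B'), ('C', 'D'))
```

    Weighting by cluster size is what distinguishes UPGMA from the unweighted variant (WPGMA), which would simply average the two parent distances.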

  16. Variation of clinical expression in patients with Stargardt dystrophy and sequence variations in the ABCR gene.

    Science.gov (United States)

    Fishman, G A; Stone, E M; Grover, S; Derlacki, D J; Haines, H L; Hockey, R R

    1999-04-01

    To report the spectrum of ophthalmic findings in patients with Stargardt dystrophy or fundus flavimaculatus who have a specific sequence variation in the ABCR gene. Twenty-nine patients with Stargardt dystrophy or fundus flavimaculatus from different pedigrees were identified with possible disease-causing sequence variations in the ABCR gene from a group of 66 patients who were screened for sequence variations in this gene. Patients underwent a routine ocular examination, including slitlamp biomicroscopy and a dilated fundus examination. Fluorescein angiography was performed on 22 patients, and electroretinographic measurements were obtained on 24 of 29 patients. Kinetic visual fields were measured with a Goldmann perimeter in 26 patients. Single-strand conformation polymorphism analysis and DNA sequencing were used to identify variations in coding sequences of the ABCR gene. Three clinical phenotypes were observed among these 29 patients. In phenotype I, 9 of 12 patients had a sequence change in exon 42 of the ABCR gene in which the amino acid glutamic acid was substituted for glycine (Gly1961Glu). In only 4 of these 9 patients was a second possible disease-causing mutation found on the other ABCR allele. In addition to an atrophic-appearing macular lesion, phenotype I was characterized by localized perifoveal yellowish white flecks, the absence of a dark choroid, and normal electroretinographic amplitudes. Phenotype II consisted of 10 patients who showed a dark choroid and more diffuse yellowish white flecks in the fundus. None exhibited the Gly1961Glu change. Phenotype III consisted of 7 patients who showed extensive atrophic-appearing changes of the retinal pigment epithelium. Electroretinographic cone and rod amplitudes were reduced. One patient showed the Gly1961Glu change. A wide variation in clinical phenotype can occur in patients with sequence changes in the ABCR gene. 
In individual patients, a certain phenotype seems to be associated with the presence of

  17. Variational multi-valued velocity field estimation for transparent sequences

    DEFF Research Database (Denmark)

    Ramírez-Manzanares, Alonso; Rivera, Mariano; Kornprobst, Pierre

    2011-01-01

    Motion estimation in sequences with transparencies is an important problem in robotics and medical imaging applications. In this work we propose a variational approach for estimating multi-valued velocity fields in transparent sequences. Starting from existing local motion estimators, we derive...... a variational model for integrating such local information in space and time in order to obtain a robust estimation of the multi-valued velocity field. With this approach, we can indeed estimate multi-valued velocity fields which are not necessarily piecewise constant on a layer –each layer can evolve...

  18. Exome sequencing identifies ZNF644 mutations in high myopia.

    Directory of Open Access Journals (Sweden)

    Yi Shi

    2011-06-01

    Full Text Available Myopia is the most common ocular disorder worldwide, and high myopia in particular is one of the leading causes of blindness. Genetic factors play a critical role in the development of myopia, especially high myopia. Recently, the exome sequencing approach has been successfully used for the disease gene identification of Mendelian disorders. Here we show a successful application of exome sequencing to identify a gene for an autosomal dominant disorder, and we have identified a gene potentially responsible for high myopia in a monogenic form. We captured exomes of two affected individuals from a Han Chinese family with high myopia and performed sequencing analysis by a second-generation sequencer with a mean coverage of 30× and sufficient depth to call variants at ∼97% of each targeted exome. The shared genetic variants of these two affected individuals in the family being studied were filtered against the 1000 Genomes Project and the dbSNP131 database. A mutation A672G in zinc finger protein 644 isoform 1 (ZNF644) was identified as being related to the phenotype of this family. After we performed sequencing analysis of the exons in the ZNF644 gene in 300 sporadic cases of high myopia, we identified an additional five mutations (I587V, R680G, C699Y, 3'UTR+12 C>G, and 3'UTR+592 G>A) in 11 different patients. All these mutations were absent in 600 normal controls. The ZNF644 gene was expressed in human retina and retinal pigment epithelium (RPE). Given that ZNF644 is predicted to be a transcription factor that may regulate genes involved in eye development, mutation may cause the axial elongation of the eyeball found in high myopia patients. Our results suggest that ZNF644 might be a causal gene for high myopia in a monogenic form.
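    The filtering step described above amounts to set operations on variant calls. The sketch below is hypothetical: variants are keyed as (chromosome, position, reference allele, alternative allele) tuples, the coordinates are invented, the function name is illustrative, and `known_variants` stands for whatever would be loaded from the 1000 Genomes Project and dbSNP131. It keeps variants shared by every affected individual and absent from those catalogs.

```python
def candidate_variants(affected_calls, known_variants):
    """Variants shared by every affected individual and absent from reference catalogs.

    affected_calls: list of sets of (chrom, pos, ref, alt) tuples, one set per individual.
    known_variants: set of the same tuples, e.g. loaded from 1000 Genomes and dbSNP.
    """
    if not affected_calls:
        raise ValueError("need calls from at least one affected individual")
    shared = set.intersection(*affected_calls)
    return shared - known_variants


patient_1 = {("chr1", 100, "A", "G"), ("chr2", 200, "C", "T"), ("chr3", 300, "G", "A")}
patient_2 = {("chr1", 100, "A", "G"), ("chr3", 300, "G", "A")}
catalog = {("chr3", 300, "G", "A")}
print(candidate_variants([patient_1, patient_2], catalog))  # {('chr1', 100, 'A', 'G')}
```

    Exact tuple matching assumes both call sets and catalogs use the same reference build and normalized allele representation; in practice indels in particular often need left-normalization before such comparisons.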

  19. Quality standards for DNA sequence variation databases to improve clinical management under development in Australia

    Directory of Open Access Journals (Sweden)

    B. Bennetts

    2014-09-01

    Full Text Available Despite the routine nature of comparing sequence variations identified during clinical testing to database records, few databases meet quality requirements for clinical diagnostics. To address this issue, the Royal College of Pathologists of Australasia (RCPA), in collaboration with the Human Genetics Society of Australasia (HGSA) and the Human Variome Project (HVP), is developing standards for DNA sequence variation databases intended for use in the Australian clinical environment. The outputs of this project will be promoted to other health systems and accreditation bodies by the Human Variome Project to support the development of similar frameworks in other jurisdictions.

  20. Exome sequencing identifies SUCO mutations in mesial temporal lobe epilepsy.

    Science.gov (United States)

    Sha, Zhiqiang; Sha, Longze; Li, Wenting; Dou, Wanchen; Shen, Yan; Wu, Liwen; Xu, Qi

    2015-03-30

    Mesial temporal lobe epilepsy (mTLE) is the main type and most common medically intractable form of epilepsy. Stratifying samples by disease severity may help identify new disease-associated mutant genes. We analyzed mRNA expression profiles from patient hippocampal tissue. Three of the seven patients had severe mTLE with generalized-onset convulsions and loss of consciousness occurring over many years. Compared with the other groups, patients with severe mTLE were classified into a distinct group. Whole-exome sequencing with Sanger sequencing validation in all seven patients identified three novel SUN domain-containing ossification factor (SUCO) mutations in the severely affected patients. Furthermore, SUCO knockdown significantly reduced dendritic length in vitro. Our results indicate that mTLE defects may affect neuronal development, and suggest that neurons develop abnormally owing to a lack of SUCO, which may be a generalized-onset epilepsy-related gene. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.

  1. Identifying tagging SNPs for African specific genetic variation from the African Diaspora Genome.

    Science.gov (United States)

    Johnston, Henry Richard; Hu, Yi-Juan; Gao, Jingjing; O'Connor, Timothy D; Abecasis, Gonçalo R; Wojcik, Genevieve L; Gignoux, Christopher R; Gourraud, Pierre-Antoine; Lizee, Antoine; Hansen, Mark; Genuario, Rob; Bullis, Dave; Lawley, Cindy; Kenny, Eimear E; Bustamante, Carlos; Beaty, Terri H; Mathias, Rasika A; Barnes, Kathleen C; Qin, Zhaohui S

    2017-04-21

    A primary goal of The Consortium on Asthma among African-ancestry Populations in the Americas (CAAPA) is to develop an 'African Diaspora Power Chip' (ADPC), a genotyping array consisting of tagging SNPs, useful for comprehensively identifying African-specific genetic variation. The array design is based on the novel variation identified in 642 CAAPA samples of African ancestry with high-coverage whole genome sequence data (~30× depth). This novel variation extends the pattern of variation catalogued in the 1000 Genomes and Exome Sequencing Projects to a spectrum of populations representing the wide range of West African genomic diversity. These individuals from CAAPA also comprise a large swath of the African Diaspora population and incorporate historical genetic diversity covering nearly the entire Atlantic coast of the Americas. Here we report the results of designing and producing such an array. This novel array covers African-specific variation far better than other commercially available arrays and will enable better GWAS for researchers whose study populations include individuals of African descent. A recent study cataloging variation in continental African populations suggests that this type of African-specific genotyping array is both necessary and valuable for facilitating large-scale GWAS in populations of African ancestry.
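
The abstract does not say how the ADPC tagging SNPs were chosen. One common strategy is greedy selection on pairwise linkage disequilibrium: repeatedly pick the SNP that tags (r² at or above a threshold) the most still-untagged SNPs. The sketch below assumes precomputed pairwise r² values and a threshold of 0.8; the function name, SNP IDs, and r² values are hypothetical illustrations, not the CAAPA design pipeline.

```python
from typing import Dict, FrozenSet, List, Set


def greedy_tag_snps(
    snps: List[str],
    r2: Dict[FrozenSet[str], float],
    threshold: float = 0.8,
) -> List[str]:
    """Greedily choose tag SNPs so that every SNP has r^2 >= threshold with some tag.

    `r2` maps unordered SNP pairs (frozensets) to pairwise r^2; missing pairs count as 0.
    Every SNP trivially tags itself.
    """

    def tags(a: str, b: str) -> bool:
        return a == b or r2.get(frozenset((a, b)), 0.0) >= threshold

    untagged: Set[str] = set(snps)
    chosen: List[str] = []
    while untagged:
        # Pick the SNP covering the most still-untagged SNPs (ties: earliest in input order).
        best = max(snps, key=lambda s: sum(tags(s, t) for t in untagged))
        chosen.append(best)
        untagged -= {t for t in untagged if tags(best, t)}
    return chosen


# Hypothetical SNP IDs and r^2 values.
ld = {
    frozenset(("rs1", "rs2")): 0.90,
    frozenset(("rs2", "rs3")): 0.85,
    frozenset(("rs3", "rs4")): 0.10,
}
print(greedy_tag_snps(["rs1", "rs2", "rs3", "rs4"], ld))  # ['rs2', 'rs4']
```

Greedy selection does not guarantee the smallest possible tag set, but it is simple and usually close in practice.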

  2. Sequence Variation in Toxoplasma gondii rop17 Gene among Strains from Different Hosts and Geographical Locations

    Directory of Open Access Journals (Sweden)

    Nian-Zhang Zhang

    2014-01-01

    Genetic diversity of T. gondii is the focus of many studies because of the biological and epidemiological diversity of this parasite. The present study examined sequence variation in the rhoptry protein 17 (ROP17) gene among T. gondii isolates from different hosts and geographical regions. The rop17 gene was amplified and sequenced from 10 T. gondii strains, and the phylogenetic relationships among these strains were reconstructed using maximum parsimony (MP), neighbor-joining (NJ), and maximum likelihood (ML) analyses. The partial rop17 gene sequences were 1375 bp in length, and A+T contents varied from 49.45% to 50.11% among all examined T. gondii strains. Sequence analysis identified 33 variable nucleotide positions (2.1%), 16 of which were identified as transitions. Phylogeny reconstruction based on rop17 gene data revealed two major clusters that readily distinguish Type I and Type II strains. Analyses of sequence variation in nucleotides and amino acids among these strains revealed a high ratio of nonsynonymous to synonymous polymorphisms (>1), indicating that rop17 shows signs of positive selection. This study demonstrated somewhat high sequence variability in the rop17 gene among T. gondii strains from different hosts and geographical regions, suggesting that the rop17 gene may represent a new genetic marker for population genetic studies of T. gondii isolates.
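
The summary statistics reported above (A+T content, variable positions, transitions) can be computed from an alignment roughly as follows. This is a minimal sketch assuming equal-length, gap-free aligned sequences; the function names are hypothetical, and classifying a biallelic site as a transition or transversion by its two alleles is a simplification, not necessarily the authors' procedure.

```python
from typing import Dict, List

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def at_content(seq: str) -> float:
    """Percentage of A+T among unambiguous bases (A, C, G, T)."""
    bases = [b for b in seq.upper() if b in "ACGT"]
    return 100.0 * sum(b in "AT" for b in bases) / len(bases)


def variable_sites(aligned: List[str]) -> List[int]:
    """0-based alignment columns where not all sequences share the same base."""
    columns = zip(*(s.upper() for s in aligned))
    return [i for i, column in enumerate(columns) if len(set(column)) > 1]


def is_transition(a: str, b: str) -> bool:
    """True for purine<->purine (A/G) or pyrimidine<->pyrimidine (C/T) changes."""
    return a != b and ({a, b} <= PURINES or {a, b} <= PYRIMIDINES)


def classify_variable_sites(aligned: List[str]) -> Dict[str, int]:
    """Count biallelic transition and transversion sites, plus sites with >2 alleles."""
    counts = {"transition": 0, "transversion": 0, "multiallelic": 0}
    for i in variable_sites(aligned):
        alleles = sorted({s[i].upper() for s in aligned})
        if len(alleles) > 2:
            counts["multiallelic"] += 1
        elif is_transition(*alleles):
            counts["transition"] += 1
        else:
            counts["transversion"] += 1
    return counts


aln = ["ACGTA", "ACGTG", "ACCTA"]  # hypothetical toy alignment
print(variable_sites(aln))           # [2, 4]
print(classify_variable_sites(aln))  # {'transition': 1, 'transversion': 1, 'multiallelic': 0}
```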

  3. Forward Genetics by Sequencing EMS Variation-Induced Inbred Lines

    Directory of Open Access Journals (Sweden)

    Charles Addo-Quaye

    2017-02-01

    In order to leverage novel sequencing techniques for cloning genes in eukaryotic organisms with complex genomes, the false positive rate of variant discovery must be controlled through experimental design and informatics. We sequenced five lines from three pedigrees of ethyl methanesulfonate (EMS)-mutagenized Sorghum bicolor, including a pedigree segregating a recessive dwarf mutant. By comparing the sequences of the lines, we were able to identify and eliminate error-prone positions. One genomic region contained EMS mutant alleles in dwarfs at positions that were homozygous reference in wild-type siblings and heterozygous in segregating families. This region contained a single nonsynonymous change that cosegregated with dwarfism in a validation population and caused a premature stop codon in the Sorghum ortholog encoding the gibberellic acid (GA) biosynthetic enzyme ent-kaurene oxidase. Application of exogenous GA rescued the mutant phenotype. Our mapping method did not require outcrossing and introduced no segregation variance. This enables work when line crossing is complicated by life history, permitting gene discovery outside of genetic models. This inverts the historical approach of first using recombination to define a locus and then sequencing genes. Our formally identical approach first sequences all the genes and then seeks cosegregation with the trait. Mutagenized lines lacking obvious phenotypic alterations are available for an extension of this approach: mapping with a known marker set in a line that is phenotypically identical to the starting material for EMS mutant generation.
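
The cosegregation logic described above can be expressed as a simple genotype filter. This is a hypothetical sketch, not the authors' pipeline: the `Variant` type, genotype coding, and sample names are assumptions, it assumes the recessive dwarf allele is homozygous in dwarfs, and the EMS check relies on EMS predominantly inducing G/C-to-A/T transitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    # Sample name -> genotype as alternate-allele count (0 = hom ref, 1 = het, 2 = hom alt).
    genotypes: Dict[str, int] = field(default_factory=dict)


def is_ems_type(v: Variant) -> bool:
    """EMS mostly induces G/C->A/T transitions: G>A or C>T on the reference strand."""
    return (v.ref.upper(), v.alt.upper()) in {("G", "A"), ("C", "T")}


def cosegregating(
    variants: List[Variant],
    mutants: List[str],
    wild_types: List[str],
    segregating: List[str],
) -> List[Variant]:
    """Keep EMS-type variants that fit a recessive cosegregation pattern."""
    kept = []
    for v in variants:
        g = v.genotypes
        if (
            is_ems_type(v)
            and all(g.get(s) == 2 for s in mutants)      # homozygous alt in dwarfs
            and all(g.get(s) == 0 for s in wild_types)   # reference in wild-type siblings
            and all(g.get(s) == 1 for s in segregating)  # heterozygous in segregating families
        ):
            kept.append(v)
    return kept
```

Sample names missing from a variant's genotype map return `None` from `g.get`, so such variants are conservatively rejected.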

  4. Mitochondrial D-loop sequence variation among Italian horse breeds

    Directory of Open Access Journals (Sweden)

    Zanotti Marta

    2004-11-01

    The genetic variability of the mitochondrial D-loop DNA sequence in seven horse breeds bred in Italy (Giara, Haflinger, Italian trotter, Lipizzan, Maremmano, Thoroughbred and Sarcidano) was analysed. Five unrelated horses were chosen from each breed, and twenty-two haplotypes were identified. The sequences obtained were aligned and compared with a reference sequence and with 27 mtDNA D-loop sequences selected from the GenBank database, representing Spanish, Portuguese, North African and wild horses, with an Equus asinus sequence as the outgroup. Kimura two-parameter distances were calculated, and a cluster analysis using the neighbour-joining method was performed to obtain phylogenetic trees among breeds bred in Italy and among Italian and foreign breeds. The cluster analysis indicates that all the breeds except Giara are divided in the two trees, and no clear relationships were revealed between Italian populations and the other breeds. These results could be interpreted as showing the mixed origin of breeds bred in Italy and probably indicate the presence of many ancient maternal lineages with high diversity in mtDNA sequences.
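
For reference, the Kimura two-parameter distance used above corrects the observed proportions of transitions (P) and transversions (Q) as d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q). A minimal sketch, assuming pre-aligned sequences and skipping columns with gaps or ambiguous bases (a pairwise-deletion choice the abstract does not specify); `k2p_distance` is a hypothetical helper name.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}
NUCLEOTIDES = PURINES | PYRIMIDINES


def k2p_distance(seq1: str, seq2: str) -> float:
    """Kimura two-parameter distance between two aligned DNA sequences.

    Columns with a gap or ambiguous base in either sequence are skipped.
    Raises ValueError if lengths differ or the sequences are too divergent
    for the correction (a log argument <= 0).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in NUCLEOTIDES or b not in NUCLEOTIDES:
            continue
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    p = transitions / compared    # P: proportion of transitional differences
    q = transversions / compared  # Q: proportion of transversional differences
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# One A<->G transition over 10 compared sites: P = 0.1, Q = 0, so d = -0.5 * ln(0.8).
print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

A distance matrix built this way can then be passed to any neighbour-joining implementation.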

  5. MetaPhinder-Identifying Bacteriophage Sequences in Metagenomic Data Sets

    DEFF Research Database (Denmark)

    Jurtz, Vanessa Isabell; Villarroel, Julia; Lund, Ole

    2016-01-01

    ...and understand them. Here we present MetaPhinder, a method to identify assembled genomic fragments (i.e. contigs) of phage origin in metagenomic data sets. The method is based on a comparison to a database of whole genome bacteriophage sequences, integrating hits to multiple genomes to accommodate the mosaic genome structure of many bacteriophages. The method is demonstrated to outperform both BLAST methods based on single hits and methods based on k-mer comparisons. MetaPhinder is available as a web service at the Center for Genomic Epidemiology, https://cge.cbs.dtu.dk/services/MetaPhinder/, while the source code can be downloaded from https://bitbucket.org/genomicepidemiology/metaphinder or https://github.com/vanessajurtz/MetaPhinder.
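
Why integrating hits to multiple genomes helps with mosaic phages can be illustrated by comparing how much of a contig is covered by the union of hits to all reference genomes versus the best single genome. This is a hypothetical sketch of that idea only; it is not MetaPhinder's actual scoring, and the hit tuples and phage IDs are invented.

```python
from typing import List, Tuple

# (reference phage genome ID, query start, query end), 0-based half-open on the contig
Hit = Tuple[str, int, int]


def merged_coverage(contig_length: int, hits: List[Hit]) -> float:
    """Fraction of the contig covered by the union of hit intervals."""
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted((s, e) for _, s, e in hits):
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            # Overlapping or adjacent: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / contig_length


def best_single_genome_coverage(contig_length: int, hits: List[Hit]) -> float:
    """Coverage achieved by the best single reference genome on its own."""
    genomes = {g for g, _, _ in hits}
    return max(
        (merged_coverage(contig_length, [h for h in hits if h[0] == g]) for g in genomes),
        default=0.0,
    )


# Hypothetical mosaic contig: first half resembles phage A, second half phage B.
hits = [("phageA", 0, 500), ("phageB", 450, 1000)]
print(merged_coverage(1000, hits))              # 1.0
print(best_single_genome_coverage(1000, hits))  # 0.55
```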

  6. Genome-wide sequence variations among Mycobacterium avium subspecies paratuberculosis.

    Directory of Open Access Journals (Sweden)

    Chung-Yi eHsu

    2011-12-01

    Mycobacterium avium subspecies paratuberculosis (M. ap), the causative agent of Johne’s disease (JD), infects many farmed ruminants, wildlife animals and humans. To better understand the molecular pathogenesis of these infections, we analyzed the whole genome sequences of several M. ap and M. avium subspecies avium (M. avium) strains isolated from various hosts and environments. Using next-generation sequencing, all 6 M. ap isolates showed a high percentage of homology (98%) to the reference genome sequence of M. ap K-10, isolated from cattle. However, 2 M. avium isolates (DT 78 and Env 77) showed significant sequence diversity from the reference strain M. avium 104. The genomes of M. avium isolates DT 78 and Env 77 exhibited only 87% and 40% homology, respectively, to the M. avium 104 reference genome. Within the M. ap isolates, genomic rearrangements (insertions/deletions, indels) were not detected, and only unique single nucleotide polymorphisms (SNPs) were observed among the 6 M. ap strains. While most of the SNPs (~100) in M. ap genomes were non-synonymous, a total of ~6000 SNPs were detected among M. avium genomes, most of which were synonymous, suggesting a differential selective pressure between M. ap and M. avium isolates. In addition, SNP-based phylogenomic analysis showed that isolates from goat and Oryx are closely related to the cattle (K-10) strain, while the human isolate (M. ap 4B) is closely related to the environmental strains, indicating an environmental source of human infections. Overall, SNPs were the most common variations among M. ap isolates, while SNPs in addition to indels were prevalent among M. avium isolates. Genomic variations will be useful in designing host-specific markers for the analysis of mycobacterial evolution and for developing novel diagnostics directed against Johne’s disease in animals.
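
The synonymous/non-synonymous distinction above comes down to whether a single-base change alters the encoded amino acid. A minimal sketch, assuming an in-frame coding sequence and the standard genetic code; bacteria such as mycobacteria use NCBI translation table 11, whose amino-acid assignments match the standard table (only the permitted start codons differ). The function name and toy sequence are hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_snv(cds: str, pos: int, alt: str) -> str:
    """Classify an SNV at 0-based position `pos` of an in-frame coding sequence."""
    offset = pos % 3
    codon_start = pos - offset
    ref_codon = cds[codon_start:codon_start + 3].upper()
    alt_codon = ref_codon[:offset] + alt.upper() + ref_codon[offset + 1:]
    same = CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]
    return "synonymous" if same else "non-synonymous"


cds = "ATGCTGAAA"  # hypothetical coding sequence: Met-Leu-Lys
print(classify_snv(cds, 5, "A"))  # CTG -> CTA (Leu -> Leu): synonymous
print(classify_snv(cds, 3, "A"))  # CTG -> ATG (Leu -> Met): non-synonymous
```

Changes to a stop codon are reported as non-synonymous here; a fuller annotator would label them separately.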

  7. Mapping copy number variation by population-scale genome sequencing

    DEFF Research Database (Denmark)

    Mills, Ryan E.; Walter, Klaudia; Stewart, Chip

    2011-01-01

    Genomic structural variants (SVs) are abundant in humans, differing from other forms of variation in extent, origin and functional impact. Despite progress in SV characterization, the nucleotide resolution architecture of most SVs remains unknown. We constructed a map of unbalanced SVs (that is, copy number variants) based on whole genome DNA sequencing data from 185 human genomes, integrating evidence from complementary SV discovery approaches with extensive experimental validations. Our map encompassed 22,025 deletions and 6,000 additional SVs, including insertions and tandem duplications...

  8. Use of a mitochondrial COI sequence to identify species of the subtribe Aphidina (Hemiptera, Aphididae)

    Directory of Open Access Journals (Sweden)

    Jianfeng WANG

    2011-08-01

    Aphids of the subtribe Aphidina are found mainly in the North Temperate Zone. The relative lack of diagnostic morphological characteristics has obscured the identification of species in this group. However, DNA-based taxonomic methods can clarify species relationships within this group. Sequence variation in a partial segment of the mitochondrial COI gene was highly effective for resolving species relationships within Aphidina. Forty-five species were correctly identified in a neighbor-joining tree. Mean intraspecific sequence divergence was 0.17%, with a range of 0.00% to 1.54%. Mean interspecific divergence within previously recognized genera or morphologically similar species groups was 4.54%, with variation mainly in the range of 3.50% to 8.00%. Possible reasons for anomalous levels of mean nucleotide divergence within or between some taxa are discussed.
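
The intra- and interspecific divergence figures above are means over pairwise sequence distances. The abstract does not state the distance model (barcoding studies often use Kimura two-parameter distances), so this sketch assumes uncorrected p-distance; function names and species labels are hypothetical.

```python
from itertools import combinations
from statistics import mean
from typing import Dict, List, Tuple


def p_distance(a: str, b: str) -> float:
    """Uncorrected proportion of differing sites, skipping gaps and ambiguous bases."""
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x in "ACGT" and y in "ACGT"]
    return sum(x != y for x, y in pairs) / len(pairs)


def divergence_summary(samples: List[Tuple[str, str]]) -> Dict[str, float]:
    """Mean intra- and interspecific p-distance (in %) over all (species, sequence) pairs."""
    intra: List[float] = []
    inter: List[float] = []
    for (sp1, s1), (sp2, s2) in combinations(samples, 2):
        (intra if sp1 == sp2 else inter).append(100 * p_distance(s1, s2))
    return {"intra_mean": mean(intra), "inter_mean": mean(inter)}


samples = [
    ("species_1", "AAAAAAAAAA"),
    ("species_1", "AAAAAAAAAC"),
    ("species_2", "CCAAAAAAAA"),
]
print(divergence_summary(samples))  # intra about 10%, inter about 25%
```

Both categories must contain at least one pair; `statistics.mean` raises an error on an empty list.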

  9. Exome sequencing identifies recurrent somatic RAC1 mutations in melanoma

    Energy Technology Data Exchange (ETDEWEB)

    Krauthammer, Michael; Kong, Yong; Ha, Byung Hak; Evans, Perry; Bacchiocchi, Antonella; McCusker, James P.; Cheng, Elaine; Davis, Matthew J.; Goh, Gerald; Choi, Murim; Ariyan, Stephan; Narayan, Deepak; Dutton-Regester, Ken; Capatana, Ana; Holman, Edna C.; Bosenberg, Marcus; Sznol, Mario; Kluger, Harriet M.; Brash, Douglas E.; Stern, David F.; Materin, Miguel A.; Lo, Roger S.; Mane, Shrikant; Ma, Shuangge; Kidd, Kenneth K.; Hayward, Nicholas K.; Lifton, Richard P.; Schlessinger, Joseph; Boggon, Titus J.; Halaban, Ruth (Yale-MED); (UCLA); (Queens)

    2012-10-11

    We characterized the mutational landscape of melanoma, the form of skin cancer with the highest mortality rate, by sequencing the exomes of 147 melanomas. Sun-exposed melanomas had markedly more ultraviolet (UV)-like C>T somatic mutations than sun-shielded acral, mucosal and uveal melanomas. Among the newly identified cancer genes was PPP6C, encoding a serine/threonine phosphatase, which harbored mutations that clustered in the active site in 12% of sun-exposed melanomas, exclusively in tumors with mutations in BRAF or NRAS. Notably, we identified a recurrent UV-signature activating mutation in RAC1 in 9.2% of sun-exposed melanomas. This activating mutation, the third most frequent in our cohort of sun-exposed melanoma after those of BRAF and NRAS, changes Pro29 to serine (RAC1 P29S) in the highly conserved switch I domain. Crystal structures and biochemical and functional studies of RAC1 P29S showed that the alteration releases the conformational restraint conferred by the conserved proline, increases binding of the protein to downstream effectors, and promotes melanocyte proliferation and migration. These findings raise the possibility that pharmacological inhibition of downstream effectors of RAC1 signaling could be of therapeutic benefit.
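
The comparison of UV-like C>T burden between sun-exposed and sun-shielded tumors can be approximated by the fraction of single-nucleotide substitutions that are C>T, counting G>A as the same change read on the opposite strand. This is a simplified, hypothetical sketch: full UV-signature analyses also check the flanking dipyrimidine context, which is omitted here.

```python
from typing import Iterable, Tuple

UV_LIKE = {("C", "T"), ("G", "A")}  # C>T, or the same change read on the opposite strand


def uv_like_fraction(snvs: Iterable[Tuple[str, str]]) -> float:
    """Fraction of (ref, alt) single-nucleotide substitutions that are C>T or G>A."""
    snvs = list(snvs)
    uv = sum((ref.upper(), alt.upper()) in UV_LIKE for ref, alt in snvs)
    return uv / len(snvs)


# Hypothetical tumor with four somatic SNVs.
print(uv_like_fraction([("C", "T"), ("G", "A"), ("A", "C"), ("T", "G")]))  # 0.5
```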

  10. Somatic mutations in histiocytic sarcoma identified by next generation sequencing.

    Science.gov (United States)

    Liu, Qingqing; Tomaszewicz, Keith; Hutchinson, Lloyd; Hornick, Jason L; Woda, Bruce; Yu, Hongbo

    2016-08-01

    Histiocytic sarcoma is a rare malignant neoplasm of presumed hematopoietic origin showing morphologic and immunophenotypic evidence of histiocytic differentiation. The importance of somatic mutations in the pathogenesis or progression of histiocytic sarcoma was largely unknown. To identify somatic mutations in histiocytic sarcoma, we studied 5 histiocytic sarcomas [3 female and 2 male patients; mean age 54.8 years (range 20-72); anatomic sites including lymph node, uterus, and pleura], with matched normal tissue from each patient as germline controls. Somatic mutations in 50 "hotspot" oncogenes and tumor suppressor genes were examined using next generation sequencing. Three of the five histiocytic sarcoma cases carried somatic mutations in BRAF. Among them, G464V [variant frequency (VF) of 43.6%] and G466R (VF of 29.6%), located in the P loop, potentially interfere with the hydrophobic interaction between the P loop and the activation loop, ultimately leading to activation of BRAF. Also detected was the BRAF somatic mutation N581S (VF of 7.4%), located in the catalytic loop of the BRAF kinase domain; its role in modifying kinase activity was unclear. A similar mutational analysis was also performed on nine acute monocytic/monoblastic leukemia cases, which did not identify any BRAF somatic mutations. Our study detected several BRAF mutations in histiocytic sarcomas, which may be important for understanding the tumorigenesis of this rare neoplasm and may point to potential therapeutic opportunities.
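
The variant frequencies (VF) quoted above follow the usual definition: the percentage of sequencing reads at a position that carry the variant allele. A minimal sketch; the function name and the read counts in the usage line are hypothetical, chosen only to reproduce a 43.6% value.

```python
def variant_allele_frequency(alt_reads: int, total_reads: int) -> float:
    """Percentage of reads covering a position that support the variant allele."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    if not 0 <= alt_reads <= total_reads:
        raise ValueError("alt_reads must be between 0 and total_reads")
    return 100.0 * alt_reads / total_reads


# Hypothetical read counts (not from the study): 218 of 500 reads carry the variant.
print(variant_allele_frequency(218, 500))  # 43.6
```

A heterozygous clonal mutation in a diploid region of a pure tumor is expected near 50%; lower values such as 7.4% can reflect, for example, a subclonal mutation or admixed normal cells.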

  11. RNA-ID, a Powerful Tool for Identifying and Characterizing Regulatory Sequences.

    Science.gov (United States)

    Brule, C E; Dean, K M; Grayhack, E J

    2016-01-01

    The identification and analysis of sequences that regulate gene expression is critical because regulated gene expression underlies biology. RNA-ID is an efficient and sensitive method to discover and investigate regulatory sequences in the yeast Saccharomyces cerevisiae, using fluorescence-based assays to detect green fluorescent protein (GFP) relative to a red fluorescent protein (RFP) control in individual cells. Putative regulatory sequences can be inserted either in-frame or upstream of a superfolder GFP fusion protein whose expression, like that of RFP, is driven by the bidirectional GAL1,10 promoter. In this chapter, we describe the methodology to identify and study cis-regulatory sequences in the RNA-ID system, explaining features and variations of the RNA-ID reporter, as well as some applications of this system. We describe in detail the methods to analyze a single regulatory sequence, from construction of a single GFP variant to assay of variants by flow cytometry, as well as modifications required to screen libraries of different strains simultaneously. We also describe subsequent analyses of regulatory sequences. © 2016 Elsevier Inc. All rights reserved.
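
The readout described above compares GFP with an RFP control in each cell. The chapter's exact normalization is not given in this abstract, so the sketch below assumes one plausible summary: the median per-cell GFP/RFP ratio of a test strain divided by that of a control strain (for example, a hypothetical construct without the putative regulatory insert). Function names and fluorescence values are illustrative.

```python
from statistics import median
from typing import List, Tuple

Cell = Tuple[float, float]  # (GFP fluorescence, RFP fluorescence) for one cell


def median_gfp_rfp(cells: List[Cell]) -> float:
    """Median per-cell GFP/RFP ratio, ignoring cells with no RFP signal."""
    return median(gfp / rfp for gfp, rfp in cells if rfp > 0)


def relative_expression(test_cells: List[Cell], control_cells: List[Cell]) -> float:
    """Test-strain median GFP/RFP ratio divided by that of a control strain."""
    return median_gfp_rfp(test_cells) / median_gfp_rfp(control_cells)


test = [(50.0, 100.0), (60.0, 100.0), (70.0, 100.0), (5.0, 0.0)]  # last cell has no RFP
control = [(100.0, 100.0), (120.0, 100.0), (80.0, 100.0)]
print(relative_expression(test, control))  # 0.6
```

Using the per-cell ratio rather than raw GFP cancels cell-to-cell differences in promoter activity, because both reporters are driven by the same bidirectional promoter.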

  12. Somatic Genetic Variation in Solid Pseudopapillary Tumor of the Pancreas by Whole Exome Sequencing

    Directory of Open Access Journals (Sweden)

    Meng Guo

    2017-01-01

    Solid pseudopapillary tumor of the pancreas (SPT) is a rare pancreatic disease with a unique clinical manifestation. Although CTNNB1 gene mutations have been universally reported, the genetic variation profiles of SPT are largely unidentified. We conducted whole exome sequencing in nine SPT patients to probe SPT-specific insertions and deletions (indels) and single nucleotide polymorphisms (SNPs). In total, parallel exome sequencing revealed 54 SNPs and 41 indels as prominent variations. CTNNB1 mutations were present in all patients studied (100%), and higher SNP counts were detected particularly in patients with older age, larger tumors, and metastatic disease. By aggregating the 95 detected variation events and examining the interconnections among the genes carrying variations, CTNNB1 was identified as the core node of the network, which might collaborate with other events, such as variations in USP9X, EP400, HTT, MED12, and PKD1, to regulate tumorigenesis. Pathway analysis showed that events involved in other cancers could potentially influence the increase in SNP count. Our study provides insight into the variation in gene-coding regions underlying solid pseudopapillary neoplasm tumorigenesis. The detection of these variations might partly reflect the underlying molecular mechanism.

  13. Reliable Detection of Herpes Simplex Virus Sequence Variation by High-Throughput Resequencing.

    Science.gov (United States)

    Morse, Alison M; Calabro, Kaitlyn R; Fear, Justin M; Bloom, David C; McIntyre, Lauren M

    2017-08-16

    High-throughput sequencing (HTS) has produced data for a number of herpes simplex virus (HSV) laboratory strains and clinical isolates. Knowledge of these sequences has been critical for investigating viral pathogenicity. However, the assembly of complete herpesviral genomes, including HSV, is complicated by large repeat regions and arrays of smaller reiterated sequences that are common in these genomes. In addition, the inherent genetic variation in populations of isolates of viruses and other microorganisms presents a further challenge to many existing HTS sequence assembly pipelines. Here, we evaluate two approaches for identifying genetic variants in HSV1 strains using Illumina short-read sequencing data. The first, a reference-based approach, identifies variants from reads aligned to a reference sequence; the second, a de novo assembly approach, identifies variants from reads aligned to de novo assembled consensus sequences. Of critical importance for both approaches is reducing the number of low-complexity regions by constructing a non-redundant reference genome. We compared the variants identified by the two methods. Our results indicate that approximately 85% of variants are identified regardless of the approach. The reference-based approach captures an additional 15%, representing variants divergent from the HSV1 reference, possibly due to viral passage. Reference-based approaches are significantly less labor-intensive and identify variants across the genome, whereas de novo assembly-based approaches are limited to regions where contigs have been successfully assembled. In addition, regions of poor-quality assembly can lead to false variant identification in de novo consensus sequences. For viruses with a well-assembled reference genome, a reference-based approach is recommended.
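
The 85%/15% comparison above is a set comparison between two call sets. A minimal sketch, assuming both call sets have already been expressed in one shared coordinate system (de novo calls would first need to be lifted over to the reference, a step not shown); the variant tuples are hypothetical.

```python
from typing import Dict, Set, Tuple

# A variant keyed by (position in a shared reference coordinate system, ref allele, alt allele).
Variant = Tuple[int, str, str]


def compare_call_sets(reference_based: Set[Variant], de_novo: Set[Variant]) -> Dict[str, float]:
    """Fractions of the union of calls found by both approaches or by only one."""
    union = reference_based | de_novo
    if not union:
        return {"shared": 0.0, "reference_only": 0.0, "de_novo_only": 0.0}
    n = len(union)
    return {
        "shared": len(reference_based & de_novo) / n,
        "reference_only": len(reference_based - de_novo) / n,
        "de_novo_only": len(de_novo - reference_based) / n,
    }


ref_calls = {(10, "A", "G"), (20, "C", "T"), (30, "G", "A"), (40, "T", "C")}
dn_calls = {(10, "A", "G"), (20, "C", "T"), (30, "G", "A")}
print(compare_call_sets(ref_calls, dn_calls))
# {'shared': 0.75, 'reference_only': 0.25, 'de_novo_only': 0.0}
```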

  14. A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data

    Science.gov (United States)

    Lea, Amanda J.

    2015-01-01

    Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and because of the computational challenges of controlling for genetic covariance in count data. To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. This framework allows us to simultaneously account for both the over-dispersed, count-based nature of bisulfite sequencing data, as well as genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at www.xzlab.org/software.html. PMID:26599596

  15. Microsatellite Primers Identified by 454 Sequencing in the Floodplain Tree Species Eucalyptus victrix (Myrtaceae)

    Directory of Open Access Journals (Sweden)

    Paul G. Nevill

    2013-05-01

    Premise of the study: Microsatellite primers were developed for Eucalyptus victrix (Myrtaceae) to evaluate the population and spatial genetic structure of this widespread northwestern Australian riparian tree species, which may be impacted by hydrological changes associated with mining activity. Methods and Results: 454 GS-FLX shotgun sequencing was used to obtain 1895 sequences containing putative microsatellite motifs. Ten polymorphic microsatellite loci were identified and screened for variation in individuals from two populations in the Pilbara region. Observed heterozygosities ranged from 0.44 to 0.91 (mean: 0.66), and the number of alleles per locus ranged from five to 25 (average: 11). Conclusions: These microsatellite loci will be useful in future studies of population and spatial genetic structure in E. victrix, and inform the development of seed sourcing strategies for the species.
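
The per-locus statistics reported above (observed heterozygosity and number of alleles) follow directly from scored genotypes. A minimal sketch, assuming each individual is scored as a pair of allele sizes with no missing data; the function names and genotypes are hypothetical.

```python
from typing import List, Tuple

Genotype = Tuple[int, int]  # two allele sizes (bp) scored at one diploid microsatellite locus


def observed_heterozygosity(genotypes: List[Genotype]) -> float:
    """Proportion of individuals carrying two different alleles at the locus."""
    return sum(a != b for a, b in genotypes) / len(genotypes)


def number_of_alleles(genotypes: List[Genotype]) -> int:
    """Count of distinct alleles observed at the locus."""
    return len({allele for genotype in genotypes for allele in genotype})


locus = [(150, 152), (150, 150), (152, 156), (148, 150)]  # hypothetical genotypes
print(observed_heterozygosity(locus))  # 0.75
print(number_of_alleles(locus))        # 4
```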

  16. Identification of the sequence variations of 15 autosomal STR loci in a Chinese population.

    Science.gov (United States)

    Chen, Wenjing; Cheng, Jianding; Ou, Xueling; Chen, Yong; Tong, Dayue; Sun, Hongyu

    2014-01-01

    DNA sequence variation, including base changes, insertions, and deletions, may cause a null allele when it occurs in the primer-binding region, and may produce off-ladder (OL) alleles when it shifts the length of the amplified fragment outside the allelic ladder. To provide accurate and reliable DNA evidence for forensic DNA analysis, it is essential to clarify sequence variations in widely used STR loci. Suspected null alleles and OL alleles of the PowerPlex®16 System from 21,934 unrelated Chinese individuals were verified with alternative systems and sequenced. A total of 17 cases with null alleles were identified, including 12 kinds of point mutations in 16 cases and a 19-base deletion in one case. The total frequency of null alleles was 7.751 × 10^-4. Eight hundred and forty-four OL alleles, classified into 97 different kinds, were observed at the 15 STR loci of the PowerPlex®16 system, with the exception of vWA. All OL allele frequencies were below 0.01. Null alleles should be confirmed with alternative primers, and OL alleles should be named appropriately. Particular attention should be paid to sequence variation, since incorrect designation could lead to false conclusions.
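
As an arithmetic check, the reported null-allele frequency equals the 17 null-allele cases divided by the 21,934 individuals typed (our reading; the abstract does not state the denominator explicitly):

```latex
\frac{17}{21\,934} \approx 7.751 \times 10^{-4}
```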

  17. Analysis of sequence diversity through internal transcribed spacers and simple sequence repeats to identify Dendrobium species.

    Science.gov (United States)

    Liu, Y T; Chen, R K; Lin, S J; Chen, Y C; Chin, S W; Chen, F C; Lee, C Y

    2014-04-08

    The Orchidaceae is one of the largest and most diverse families of flowering plants. The Dendrobium genus has high economic potential, both as ornamental plants and for medicinal purposes. In addition, the species of this genus are able to produce large crops. However, many Dendrobium varieties are very similar in outward appearance, making it difficult to distinguish one species from another. This study demonstrated that the 12 Dendrobium species examined may be divided into 2 groups by internal transcribed spacer (ITS) sequence analysis. Flower color (red or yellow) may also be used to separate these species into 2 main groups. In particular, the deciduous characteristic is associated with the ITS genetic diversity of group A. Of 53 simple sequence repeat (SSR) primer pairs designed, 7 pairs produced polymorphic polymerase chain reaction products amplified from a specific band. The results of this study demonstrate that these 7 SSR primer pairs may potentially be used to identify Dendrobium species and their progeny in future studies.

  18. Comparative analysis of complete chloroplast genome sequence and inversion variation in Lasthenia burkei (Madieae, Asteraceae).

    Science.gov (United States)

    Walker, Joseph F; Zanis, Michael J; Emery, Nancy C

    2014-04-01

    Complete chloroplast genome studies can help resolve relationships among large, complex plant lineages such as Asteraceae. We present the first whole plastome from the Madieae tribe and compare its sequence variation to other chloroplast genomes in Asteraceae. We used high throughput sequencing to obtain the Lasthenia burkei chloroplast genome. We compared sequence structure and rates of molecular evolution in the small single copy (SSC), large single copy (LSC), and inverted repeat (IR) regions to those for eight Asteraceae accessions and one Solanaceae accession. The chloroplast sequence of L. burkei is 150 746 bp and contains 81 unique protein coding genes and 4 coding ribosomal RNA sequences. We identified three major inversions in the L. burkei chloroplast, all of which have been found in other Asteraceae lineages, and a previously unreported inversion in Lactuca sativa. Regions flanking inversions contained tRNA sequences, but did not have particularly high G + C content. Substitution rates varied among the SSC, LSC, and IR regions, and rates of evolution within each region varied among species. Some observed differences in rates of molecular evolution may be explained by the relative proportion of coding to noncoding sequence within regions. Rates of molecular evolution vary substantially within and among chloroplast genomes, and major inversion events may be promoted by the presence of tRNAs. Collectively, these results provide insight into different mechanisms that may promote intramolecular recombination and the inversion of large genomic regions in the plastome.

  19. Simple sequence repeat (SSR) markers are effective for identifying ...

    African Journals Online (AJOL)

    DNA was extracted from newly formed leaves and amplified using 21 simple sequence repeat (SSR) markers (NH001c, NH002b, NH005b, NH007b, NH008b, NH009b, NH011b, NH013b, NH012a, NH014a, NH015a, NH017a, KA4b, KA5, KA14, KA16, KB16, KU10, BGA35, BGT23b and HGA8b). The data was analyzed by ...

  20. Identification, variation and transcription of pneumococcal repeat sequences

    Science.gov (United States)

    2011-01-01

    Background: Small interspersed repeats are commonly found in many bacterial chromosomes. Two families of repeats (BOX and RUP) have previously been identified in the genome of Streptococcus pneumoniae, a nasopharyngeal commensal and respiratory pathogen of humans. However, little is known about the role they play in pneumococcal genetics. Results: Analysis of the genome of S. pneumoniae ATCC 700669 revealed the presence of a third repeat family, which we have named SPRITE. All three repeats are present at a reduced density in the genome of the closely related species S. mitis. However, they are almost entirely absent from all other streptococci, although a set of elements related to the pneumococcal BOX repeat was identified in the zoonotic pathogen S. suis. In conjunction with information regarding their distribution within the pneumococcal chromosome, this suggests that it is unlikely that these repeats are specialised sequences performing a particular role for the host, but rather that they constitute parasitic elements. However, comparing insertion sites between pneumococcal sequences indicates that they appear to transpose at a much lower rate than IS elements. Some large BOX elements in S. pneumoniae were found to encode open reading frames on both strands of the genome, whilst another was found to form a composite RNA structure with two T box riboswitches. In multiple cases, such BOX elements were shown to be expressed using directional RNA-seq and RT-PCR. Conclusions: BOX, RUP and SPRITE repeats appear to have proliferated extensively throughout the pneumococcal chromosome during the species' past, but novel insertions are currently occurring at a relatively slow rate. Through their extensive secondary structures, they seem likely to affect the expression of genes with which they are co-transcribed. Software for annotation of these repeats is freely available from ftp://ftp.sanger.ac.uk/pub/pathogens/strep_repeats/. PMID:21333003

  1. Simultaneous Structural Variation Discovery in Multiple Paired-End Sequenced Genomes

    Science.gov (United States)

    Hormozdiari, Fereydoun; Hajirasouliha, Iman; McPherson, Andrew; Eichler, Evan E.; Sahinalp, S. Cenk

    Next generation sequencing technologies have been decreasing the costs and increasing the worldwide capacity for sequence production at an unprecedented rate, enabling large-scale projects that aim to sequence almost 2000 genomes [1]. Structural variation detection promises to be one of the key diagnostic tools for cancer and other diseases with genomic origin. In this paper, we study the problem of detecting structural variation events in two or more sequenced genomes through high-throughput sequencing. We propose to move from the current model of (1) detecting genomic variations in single next generation sequenced (NGS) donor genomes independently, and (2) checking whether two or more donor genomes indeed agree or disagree on the variations (in this paper we name this framework Independent Structural Variation Discovery and Merging, or ISV&M), to a new model in which we detect structural variation events among multiple genomes simultaneously.
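
Step (2) of the ISV&M baseline described above, checking whether independently made calls in two genomes agree, is often implemented with a reciprocal-overlap rule. The 50% threshold below is a common convention assumed here, and the function names and calls are hypothetical. This illustrates the baseline the paper moves away from, not its simultaneous-discovery method.

```python
from typing import List, Tuple

# A deletion call: (chromosome, start, end), 0-based half-open coordinates.
Call = Tuple[str, int, int]


def reciprocal_overlap(a: Call, b: Call, min_fraction: float = 0.5) -> bool:
    """True if a and b share a chromosome and the overlap covers >= min_fraction of each call."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return overlap >= min_fraction * (a[2] - a[1]) and overlap >= min_fraction * (b[2] - b[1])


def merge_independent_calls(
    genome1: List[Call], genome2: List[Call], min_fraction: float = 0.5
) -> List[Tuple[Call, Call]]:
    """Pair up calls made independently in two genomes that agree under reciprocal overlap."""
    return [(a, b) for a in genome1 for b in genome2 if reciprocal_overlap(a, b, min_fraction)]


g1 = [("chr1", 1000, 2000), ("chr2", 500, 700)]
g2 = [("chr1", 1100, 2100), ("chr1", 5000, 6000)]
print(merge_independent_calls(g1, g2))  # [(('chr1', 1000, 2000), ('chr1', 1100, 2100))]
```

Because each genome is called independently, weakly supported events present in several genomes can be missed in each one and never reach the merge step, which is the limitation that simultaneous discovery targets.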

  2. Transcriptomic variation among six Arabidopsis thaliana accessions identified several novel genes controlling aluminium tolerance.

    Science.gov (United States)

    Kusunoki, Kazutaka; Nakano, Yuki; Tanaka, Keisuke; Sakata, Yoichi; Koyama, Hiroyuki; Kobayashi, Yuriko

    2017-02-01

    Differences in the expression levels of aluminium (Al) tolerance genes are a known determinant of Al tolerance among plant varieties. We combined transcriptomic analysis of six Arabidopsis thaliana accessions with contrasting Al tolerance and a reverse genetic approach to identify Al-tolerance genes responsible for differences in Al tolerance between accession groups. Gene expression variation increased in signal transduction processes under Al stress and in growth-related processes in the absence of stress. Co-expression analysis and promoter single nucleotide polymorphism searching suggested that both trans-acting polymorphisms in the Al signal transduction pathway and cis-acting polymorphisms in promoter sequences caused the variations in gene expression associated with Al tolerance. Compared with the wild type, Al sensitivity increased in T-DNA knockout (KO) lines for five genes, including TARGET OF AVRB OPERATION1 (TAO1) and an unannotated gene (At5g22530). These genes were identified from among 53 Al-inducible genes showing significantly higher expression in tolerant accessions than in sensitive accessions. These results indicate that differences in transcriptional signalling are partly associated with natural variation in Al tolerance in Arabidopsis. Our study also demonstrates the feasibility of comparative transcriptome analysis that exploits natural genetic variation to identify genes responsible for Al stress tolerance. © 2016 John Wiley & Sons Ltd.

  3. Targeted next-generation sequencing analysis identifies novel mutations in families with severe familial exudative vitreoretinopathy

    Science.gov (United States)

    Huang, Xiao-Yan; Zhuang, Hong; Wu, Ji-Hong; Li, Jian-Kang; Hu, Fang-Yuan; Zheng, Yu; Tellier, Laurent Christian Asker M.; Zhang, Sheng-Hai; Gao, Feng-Juan; Zhang, Jian-Guo

    2017-01-01

    Purpose Familial exudative vitreoretinopathy (FEVR) is a genetically and clinically heterogeneous disease characterized by failure of vascular development in the peripheral retina. The symptoms of FEVR vary widely among patients in the same family, and even between the two eyes of a given patient. This study was designed to identify the genetic defects in a cohort of ten Chinese families with a definitive diagnosis of FEVR. Methods To identify the causative genes, next-generation sequencing (NGS)-based targeted capture sequencing was performed. Segregation analysis of the candidate variants was performed in additional family members using Sanger sequencing and quantitative real-time PCR (qPCR). Results In the cohort of ten FEVR families, six pathogenic variants were identified, comprising four novel and two known heterozygous mutations. Of these variants, four were missense variants and two were novel heterozygous deletion mutations [LRP5, c.4053 DelC (p.Ile1351IlefsX88); TSPAN12, EX8Del]. The two novel heterozygous deletion mutations were not observed in control subjects and could give rise to a relatively severe FEVR phenotype, which could be explained by protein function prediction. Conclusions Using targeted NGS, we identified two novel heterozygous deletion mutations [LRP5, c.4053 DelC (p.Ile1351IlefsX88); TSPAN12, EX8Del] as causative mutations for FEVR. These deletion variants are associated with a more severe form of FEVR, including tractional retinal detachment, than other known point mutations. The data further enrich the mutation spectrum of FEVR and enhance our understanding of genotype–phenotype correlations, providing useful information for disease diagnosis, prognosis, and effective genetic counseling. PMID:28867931

  4. Rare and common regulatory variation in population-scale sequenced human genomes.

    Directory of Open Access Journals (Sweden)

    Stephen B Montgomery

    2011-07-01

    Population-scale genome sequencing allows the characterization of functional effects of a broad spectrum of genetic variants underlying human phenotypic variation. Here, we investigate the influence of rare and common genetic variants on gene expression patterns, using variants identified from 1000 Genomes Project sequencing data in an African and a European population sample, together with gene expression data from lymphoblastoid cell lines. We detect comparable numbers of expression quantitative trait loci (eQTLs) when compared with genotypes obtained from HapMap 3, but as many as 80% of the top expression quantitative trait variants (eQTVs) discovered from 1000 Genomes data are novel. The properties of the newly discovered variants suggest that mapping common causal regulatory variants is challenging even with full resequencing data; however, we observe significant enrichment of regulatory effects in splice-site and nonsense variants. Using RNA sequencing data, we show that 46.2% of nonsynonymous variants are differentially expressed in at least one individual in our sample, creating widespread potential for interactions between functional protein-coding and regulatory variants. We also use allele-specific expression to identify putative rare causal regulatory variants. Furthermore, we demonstrate that outlier expression values can be due to rare variant effects, and we approximate the number of such effects harboured in an individual by effect size. Our results demonstrate that integrating genomic and RNA sequencing analyses allows for the joint assessment of genome sequence and genome function.
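
    Allele-specific expression is commonly assessed with an exact binomial test on RNA-seq read counts at a heterozygous site, under the null hypothesis that both alleles are expressed equally (expected reference-read fraction 0.5). The sketch below assumes that setup; the function name ase_binomial_pvalue and the two-sided convention (summing all outcomes no more probable than the observed one) are illustrative choices, not the procedure reported in this study.

```python
from math import comb


def ase_binomial_pvalue(ref_reads: int, alt_reads: int, p: float = 0.5) -> float:
    """Two-sided exact binomial test for allelic imbalance at one heterozygous site.

    Hypothetical helper: under the null, each read comes from the reference
    allele with probability p (0.5 means balanced expression of both alleles).
    """
    n = ref_reads + alt_reads
    if n == 0:
        raise ValueError("no reads at this site")
    # Probability of every possible reference-read count 0..n under the null.
    probs = [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
    observed = probs[ref_reads]
    # Sum all outcomes that are no more likely than the observed count; the
    # small relative tolerance guards against floating-point ties.
    p_value = sum(pr for pr in probs if pr <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Example: 10 reference reads and 0 alternative reads at a heterozygous site.
print(ase_binomial_pvalue(10, 0))  # prints 0.001953125
```

    In practice such per-site p-values would also need correction for the number of sites tested; that step is omitted here.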

  5. A map of human genome variation from population-scale sequencing.

    Science.gov (United States)

    Abecasis, Gonçalo R; Altshuler, David; Auton, Adam; Brooks, Lisa D; Durbin, Richard M; Gibbs, Richard A; Hurles, Matt E; McVean, Gil A

    2010-10-28

    The 1000 Genomes Project aims to provide a deep characterization of human genome sequence variation as a foundation for investigating the relationship between genotype and phenotype. Here we present results of the pilot phase of the project, designed to develop and compare different strategies for genome-wide sequencing with high-throughput platforms. We undertook three projects: low-coverage whole-genome sequencing of 179 individuals from four populations; high-coverage sequencing of two mother-father-child trios; and exon-targeted sequencing of 697 individuals from seven populations. We describe the location, allele frequency and local haplotype structure of approximately 15 million single nucleotide polymorphisms, 1 million short insertions and deletions, and 20,000 structural variants, most of which were previously undescribed. We show that, because we have catalogued the vast majority of common variation, over 95% of the currently accessible variants found in any individual are present in this data set. On average, each person is found to carry approximately 250 to 300 loss-of-function variants in annotated genes and 50 to 100 variants previously implicated in inherited disorders. We demonstrate how these results can be used to inform association and functional studies. From the two trios, we directly estimate the rate of de novo germline base substitution mutations to be approximately 10^-8 per base pair per generation. We explore the data with regard to signatures of natural selection, and identify a marked reduction of genetic variation in the neighbourhood of genes, due to selection at linked sites. These methods and public data will support the next phase of human genetic research.
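
    As a rough worked illustration of what this rate implies, assume a haploid human genome length G of about 3 × 10^9 bp (a round figure that does not come from this abstract) and write μ for the per-base-pair, per-generation substitution rate. The expected number of new base substitutions per diploid genome per generation is then approximately:

```latex
N_{\text{de novo}} \approx \mu \times (2G) \approx 10^{-8} \times \left(2 \times 3 \times 10^{9}\right) = 60
```

    This is an order-of-magnitude estimate only; it ignores inaccessible regions and variation in the mutation rate across the genome.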

  6. Geochemical variations during the 2012 Emilia seismic sequence

    Science.gov (United States)

    Sciarra, Alessandra; Cantucci, Barbara; Galli, Gianfranco; Cinti, Daniele; Pizzino, Luca

    2015-04-01

    Several geochemical surveys (soil gas and shallow water) were performed in the Modena province (Massa Finalese, Finale Emilia, Medolla and S. Felice sul Panaro) during the 2006-2014 period. In May-June 2012, a seismic sequence (main shocks of ML 5.9 and 5.8) occurred close to the investigated area. In this area, 300 CO2 and CH4 flux measurements, 150 soil gas concentrations (He, H2, CO2, CH4 and C2H6), and 30 shallow waters with their isotopic analyses (δ13C-CH4, δD-CH4 and δ13C-CO2) were acquired in April-May 2006 and in October and December 2008, and repeated in May and September 2012, June 2013 and July 2014, after the 2012 Emilia seismic sequence. The chemical composition of soil gas is dominated by CH4 in the southern part and by CO2 in the northern part. Strongly anomalous fluxes and concentrations are recorded in spot areas; elsewhere, CO2 and CH4 values are very low, within the typical range of vegetative and organic exhalation from cultivated soil. After the seismic sequence, CH4 and CO2 fluxes increased by one order of magnitude in the spot areas, whereas in the surrounding area the values remained within the background. In contrast, CH4 concentrations decreased (40% v/v in the 2012 surveys) and CO2 concentrations increased up to 12.7% v/v (2013 survey). Isotopic gas analyses were carried out only on samples with anomalous values. Pre-seismic data hint at a thermogenic origin of CH4, probably linked to leakage from a deep source in the Medolla area. Conversely, the 2012/2013 isotopic data indicate a typical biogenic origin (i.e. microbial hydrocarbon production) of the CH4, as recognized elsewhere in the Po Plain and surroundings. The δ13C-CO2 value suggests a prevalent shallow origin of CO2 (i.e. organic and/or soil-derived), probably related to anaerobic oxidation of heavy hydrocarbons. Water samples, collected from domestic, industrial and hydrocarbon exploration wells, allowed us to recognize different families of waters. Waters are meteoric in origin and

  7. Transcriptome sequencing in prostate cancer identifies inter-tumor heterogeneity

    Directory of Open Access Journals (Sweden)

    Janet Mendonca

    2015-06-01

    Given the dearth of gene mutations in prostate cancer [1],[2], it is likely that genomic rearrangements play a significant role in the evolution of prostate cancer. However, in the search for recurrent genomic alterations, "private alterations" have received less attention. Such alterations may provide insights into the evolution, behavior, and clinical outcome of an individual tumor. In a recent report in "Genome Biology", Wyatt et al. [3] define unique alterations in a cohort of high-risk prostate cancer patients with a lethal phenotype. Using a transcriptome sequencing approach, they observe high inter-tumor heterogeneity; however, the altered genes converge on three distinct cancer-relevant pathways. Their analysis reveals several non-ETS fusions, which may contribute to the phenotype of individual tumors and have significance for disease progression.

  8. Identifying and correcting epigenetics measurements for systematic sources of variation.

    NARCIS (Netherlands)

    Perrier, Flavie; Novoloaca, Alexei; Ambatipudi, Srikant; Baglietto, Laura; Ghantous, Akram; Perduca, Vittorio; Barrdahl, Myrto; Harlid, Sophia; Ong, Ken K; Cardona, Alexia; Polidoro, Silvia; Nøst, Therese Haugdahl; Overvad, Kim; Omichessan, Hanane; Dollé, Martijn; Bamia, Christina; Huerta, José Marìa; Vineis, Paolo; Herceg, Zdenko; Romieu, Isabelle; Ferrari, Pietro

    2018-01-01

    Methylation measures quantified by microarray techniques can be affected by systematic variation due to the technical processing of samples, which may compromise the accuracy of the measurement process and bias the estimate of the association under investigation. The quantification of

  9. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes.

    Science.gov (United States)

    Hu, H; Haas, S A; Chelly, J; Van Esch, H; Raynaud, M; de Brouwer, A P M; Weinert, S; Froyen, G; Frints, S G M; Laumonnier, F; Zemojtel, T; Love, M I; Richard, H; Emde, A-K; Bienek, M; Jensen, C; Hambrock, M; Fischer, U; Langnick, C; Feldkamp, M; Wissink-Lindhout, W; Lebrun, N; Castelnau, L; Rucci, J; Montjean, R; Dorseuil, O; Billuart, P; Stuhlmann, T; Shaw, M; Corbett, M A; Gardner, A; Willis-Owen, S; Tan, C; Friend, K L; Belet, S; van Roozendaal, K E P; Jimenez-Pocquet, M; Moizard, M-P; Ronce, N; Sun, R; O'Keeffe, S; Chenna, R; van Bömmel, A; Göke, J; Hackett, A; Field, M; Christie, L; Boyle, J; Haan, E; Nelson, J; Turner, G; Baynam, G; Gillessen-Kaesbach, G; Müller, U; Steinberger, D; Budny, B; Badura-Stronka, M; Latos-Bieleńska, A; Ousager, L B; Wieacker, P; Rodríguez Criado, G; Bondeson, M-L; Annerén, G; Dufke, A; Cohen, M; Van Maldergem, L; Vincent-Delorme, C; Echenne, B; Simon-Bouy, B; Kleefstra, T; Willemsen, M; Fryns, J-P; Devriendt, K; Ullmann, R; Vingron, M; Wrogemann, K; Wienker, T F; Tzschach, A; van Bokhoven, H; Gecz, J; Jentsch, T J; Chen, W; Ropers, H-H; Kalscheuer, V M

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades, more than 100 X-chromosome ID genes have been identified. Yet a large number of families mapping to the X-chromosome remained unresolved, suggesting that more XLID genes or loci are yet to be identified. Here, we investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of these males had previously tested negative for copy number variations and for mutations in a subset of known XLID genes by Sanger sequencing. In total, 745 X-chromosomal genes were screened. After stringent filtering, a total of 1297 non-recurrent exonic variants remained for prioritization. Co-segregation analysis of potential clinically relevant changes revealed that 80 families (20%) carried pathogenic variants in established XLID genes. In 19 families, we detected likely causative protein-truncating and missense variants in 7 novel and validated XLID genes (CLCN4, CNKSR2, FRMPD4, KLHL15, LAS1L, RLIM and USP27X) and potentially deleterious variants in 2 novel candidate XLID genes (CDK16 and TAF1). We show that the CLCN4 and CNKSR2 variants impair protein function, as indicated by electrophysiological studies and by altered differentiation of cultured primary neurons from Clcn4(-/-) mice or after mRNA knock-down. The newly identified and candidate XLID proteins belong to pathways and networks with established roles in cognitive function and, in particular, intellectual disability. We suggest that systematic sequencing of all X-chromosomal genes in a cohort of patients with genetic evidence for X-chromosome locus involvement may resolve up to 58% of Fragile X-negative cases.

  10. Haplotypes and Sequence Variation in the Ovine Adiponectin Gene (ADIPOQ)

    Directory of Open Access Journals (Sweden)

    Qing-Ming An

    2015-11-01

    The adiponectin gene (ADIPOQ) plays an important role in energy homeostasis. In this study, five separate regions (regions 1 to 5) of ovine ADIPOQ were analysed using PCR-SSCP. Four different PCR-SSCP patterns (A1-D1 and A2-D2) were detected in region-1 and region-2, respectively, revealing seven and six SNPs. In region-3, three different patterns (A3-C3) and three SNPs were observed. Two patterns each (A4-B4 and A5-B5), with two and one SNPs, were observed in region-4 and region-5, respectively. In total, nineteen SNPs were detected, five of them in the coding region and two of these (c.46T/C and c.515G/A) putatively resulting in amino acid changes (p.Tyr16His and p.Lys172Arg). In region-1, -2 and -3 of 316 sheep from eight New Zealand breeds, variants A1, A2 and A3 were the most common, although variant frequencies differed among the eight breeds. Across region-1 and region-3, nine haplotypes were identified, with haplotypes A1-A3, A1-C3, B1-A3 and B1-C3 being the most common. These results indicate that the ADIPOQ gene is polymorphic and suggest that further analysis is required to determine whether variation in the gene is associated with animal production traits.

  11. Genome-wide association study identified CNP12587 region underlying height variation in Chinese females.

    Directory of Open Access Journals (Sweden)

    Yin-Ping Zhang

    Human height is a highly heritable trait considered an important factor for health. There has been limited success in identifying the genetic factors underlying height variation. We aimed to identify sequence variants associated with adult height through a genome-wide association study of copy number variants (CNVs) in Chinese individuals. Genome-wide CNV association analyses for height variation were conducted in 1,625 unrelated Chinese adults and in sex-specific subgroups. Height was measured with a stadiometer. The Affymetrix SNP6.0 genotyping platform was used to identify copy number polymorphisms (CNPs). We constructed a genomic map containing 1,009 CNPs in Chinese individuals and performed a genome-wide association study of CNPs with height. We detected 10 significant association signals for height (p < 0.05) in the whole population, and 9 and 11 association signals in the female and male subgroups, respectively. A copy number polymorphism (CNP12587, chr18:54081842-54086942, p = 2.41 × 10^-4) was found to be significantly associated with height variation in Chinese females even after strict Bonferroni correction (p = 0.048). Confirmatory real-time PCR experiments provided further validation of the CNV. Compared with female subjects with two copies of the CNP, carriers of three copies had an average 8.1% decrease in height. An important candidate gene, ubiquitin-protein ligase NEDD4-like (NEDD4L), was detected in this region; it plays important roles in bone metabolism by binding to bone formation regulators. Our findings suggest important genetic variants underlying height variation in the Chinese population.

  12. Potential of DNA sequences to identify zoanthids (Cnidaria: Zoantharia).

    Science.gov (United States)

    Sinniger, Frederic; Reimer, James D; Pawlowski, Jan

    2008-12-01

    The order Zoantharia is known for its chaotic taxonomy and difficult morphological identification. One method that could potentially help in examining such troublesome taxa is DNA barcoding, which identifies species using standard molecular markers. The mitochondrial cytochrome oxidase subunit I (COI) has been used with great success in groups such as birds and insects; however, its applicability in many other groups is controversial. Recently, some studies have suggested that barcoding is not applicable to anthozoans. Here, we examine the use of COI and mitochondrial 16S ribosomal DNA for zoanthid identification. Despite the absence of a clear barcoding gap, our results show that for most of the 54 zoanthid samples, both markers could separate samples to the species or species-group level, particularly when easily accessible ecological or distributional data were included. Additionally, we used the short V5 region of mt 16S rDNA to identify eight old (13 to 50 years old) museum samples. We discuss the advantages and disadvantages of COI and mt 16S rDNA as barcodes for Zoantharia, and recommend that either one or both of these markers be considered for zoanthid identification in the future.

  13. Genetic variation in the Staphylococcus aureus 8325 strain lineage revealed by whole-genome sequencing.

    Directory of Open Access Journals (Sweden)

    Kristoffer T Bæk

    Staphylococcus aureus strains of the 8325 lineage, especially 8325-4 and derivatives lacking prophage, have been used extensively in research for decades. We report herein the results of our deep sequence analysis of strain 8325-4. Assignment of sequence variants compared with the reference strain 8325 (NRS77/PS47) required correction of errors in the 8325 reference genome, and reassessment of variation previously attributed to chemical mutagenesis of the restriction-defective RN4220. Using an extensive strain pedigree analysis, we discovered that 8325-4 contains 16 single nucleotide polymorphisms (SNPs) arising prior to the construction of RN4220. We identified 5 indels in 8325-4 compared with 8325. Three indels correspond to the expected Φ11, Φ12 and Φ13 excisions, one indel is explained by a sequence assembly artifact, and the final indel (Δ63bp) in the spa-sarS intergenic region is common to only a sub-lineage of 8325-4 strains, including SH1000. This deletion was found to significantly decrease (by 75%) steady-state sarS, but not spa, transcript levels in post-exponential phase. This 8325-4 sub-lineage was also found to harbor 4 additional SNPs. We also found large sequence variation between 8325, 8325-4 and RN4220 in a cluster of repetitive hypothetical proteins (SA0282 homologs) near the Ess secretion cluster. The overall 8325-4 SNP set results in 17 alterations within coding sequences. Remarkably, we discovered that all tested strains of the 8325-4 lineage lack phenol soluble modulin α3 (PSMα3), a virulence determinant implicated in neutrophil chemotaxis, biofilm architecture and surface spreading. Collectively, our results clarify and define the 8325-4 pedigree and reveal clear evidence that mutations existing throughout all branches of this lineage, including the widely used RN6390 and SH1000 strains, could conceivably impact virulence regulation.

  14. Identifying Statistical Dependence in Genomic Sequences via Mutual Information Estimates

    Directory of Open Access Journals (Sweden)

    Wojciech Szpankowski

    2007-12-01

    Questions of understanding and quantifying the representation and amount of information in organisms have become a central part of biological research, as they potentially hold the key to fundamental advances. In this paper, we demonstrate the use of information-theoretic tools for the task of identifying segments of biomolecules (DNA or RNA) that are statistically correlated. We develop a precise and reliable methodology, based on the notion of mutual information, for finding and extracting statistical as well as structural dependencies. A simple threshold function is defined, and its use in quantifying the level of significance of dependencies between biological segments is explored. These tools are used in two specific applications. First, they are used to identify correlations between different parts of the maize zmSRp32 gene. There, we find significant dependencies between the 5′ untranslated region in zmSRp32 and its alternatively spliced exons. This observation may indicate the presence of as-yet unknown alternative splicing mechanisms or structural scaffolds. Second, using data from the FBI's Combined DNA Index System (CODIS), we demonstrate that our approach is particularly well suited to the problem of discovering short tandem repeats, an application of importance in genetic profiling.
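
    A minimal sketch of the core quantity, the empirical mutual information I(X;Y) = Σ p(x,y) log2[p(x,y) / (p(x)p(y))]. It assumes that two equal-length segments are compared position by position, so that each aligned pair of symbols is one observation; that pairing is an illustrative assumption (the abstract does not specify it), and the authors' threshold function is not reproduced here.

```python
import math
from collections import Counter


def mutual_information(x: str, y: str) -> float:
    """Empirical mutual information, in bits, between two equal-length sequences.

    Position i of x is paired with position i of y; each pair is one sample.
    """
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    count_x, count_y = Counter(x), Counter(y)
    count_xy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in count_xy.items():
        p_ab = c / n
        p_a, p_b = count_x[a] / n, count_y[b] / n
        mi += p_ab * math.log2(p_ab / (p_a * p_b))
    return mi


print(mutual_information("ACGT", "ACGT"))  # identical segments: 2.0 bits
print(mutual_information("AACC", "ACAC"))  # independent pairing: 0.0 bits
```

    On real data, whether a given value is significant would still have to be judged against a threshold or a null distribution (for example, values obtained from shuffled segments).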

  15. Protein 3D structure computed from evolutionary sequence variation.

    Directory of Open Access Journals (Sweden)

    Debora S Marks

    The evolutionary trajectory of a protein through sequence space is constrained by its function. Collections of sequence homologs record the outcomes of millions of evolutionary experiments in which the protein evolves according to these constraints. Deciphering the evolutionary record held in these sequences and exploiting it for predictive and engineering purposes presents a formidable challenge. The potential benefit of solving this challenge is amplified by the advent of inexpensive high-throughput genomic sequencing. In this paper we ask whether we can infer evolutionary constraints from a set of sequence homologs of a protein. The challenge is to distinguish true co-evolution couplings from the noisy set of observed correlations. We address this challenge using a maximum entropy model of the protein sequence, constrained by the statistics of the multiple sequence alignment, to infer residue pair couplings. Surprisingly, we find that the strength of these inferred couplings is an excellent predictor of residue-residue proximity in folded structures. Indeed, the top-scoring residue couplings are sufficiently accurate and well-distributed to define the 3D protein fold with remarkable accuracy. We quantify this observation by computing, from sequence alone, all-atom 3D structures of fifteen test proteins from different fold classes, ranging in size from 50 to 260 residues, including a G-protein coupled receptor. These blinded inferences are de novo, i.e., they do not use homology modeling or sequence-similar fragments from known structures. The co-evolution signals provide sufficient information to determine accurate 3D protein structure to 2.7-4.8 Å Cα-RMSD error relative to the observed structure, over at least two-thirds of the protein (method called EVfold; details at http://EVfold.org). This discovery provides insight into essential interactions constraining protein evolution and will facilitate a comprehensive survey of the universe of
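
    The abstract does not spell out the inference procedure, so the following is only a sketch of one widely used way to approximate such a maximum entropy model: the mean-field approximation, in which couplings are estimated as the negative inverse of the pseudocount-regularized covariance matrix of one-hot encoded alignment columns. The helper names (invert, mean_field_scores), the pseudocount of 0.5 and the Frobenius-norm pair score are illustrative assumptions. This is not presented as the EVfold implementation, and refinements such as sequence reweighting, gauge fixing and average-product correction are omitted.

```python
from itertools import product


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting (small dense matrices only)."""
    size = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(size)]
           for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(size):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[size:] for row in aug]


def mean_field_scores(msa, q, pseudocount=0.5):
    """Hypothetical helper: pairwise coupling scores from an integer-coded alignment.

    msa: list of equal-length sequences, each a list of states in 0..q-1.
    Returns an L x L matrix of Frobenius norms of the inferred coupling blocks.
    """
    n, length = len(msa), len(msa[0])
    states = q - 1  # drop the last state as the reference (a gauge choice)
    # Single-site frequencies mixed with a uniform pseudocount.
    f1 = [[(1 - pseudocount) * sum(s[i] == a for s in msa) / n + pseudocount / q
           for a in range(q)] for i in range(length)]

    def f2(i, j, a, b):
        # Pair frequencies; the diagonal block reduces to the single-site frequency.
        if i == j:
            return f1[i][a] if a == b else 0.0
        emp = sum(s[i] == a and s[j] == b for s in msa) / n
        return (1 - pseudocount) * emp + pseudocount / q**2

    idx = [(i, a) for i in range(length) for a in range(states)]
    cov = [[f2(i, j, a, b) - f1[i][a] * f1[j][b] for (j, b) in idx] for (i, a) in idx]
    inv = invert(cov)
    scores = [[0.0] * length for _ in range(length)]
    for i, j in product(range(length), repeat=2):
        if i != j:
            # Mean-field couplings are the negative inverse covariance entries.
            block = [-inv[i * states + a][j * states + b]
                     for a in range(states) for b in range(states)]
            scores[i][j] = sum(v * v for v in block) ** 0.5
    return scores


# Toy alignment (q = 2): columns 0 and 1 co-vary perfectly; column 2 varies independently.
toy = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
scores = mean_field_scores(toy, q=2)
print(scores[0][1] > scores[0][2])  # the co-varying pair scores higher: True
```

    The pure-Python inversion keeps the sketch dependency-free; for realistic alignments (q = 21 states, hundreds of columns) a numerical library would be the practical choice.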

  16. Genotyping common and rare variation using overlapping pool sequencing

    Directory of Open Access Journals (Sweden)

    Pasaniuc Bogdan

    2011-07-01

    Background Recent advances in sequencing technologies set the stage for large, population-based studies in which the DNA or RNA of thousands of individuals will be sequenced. Currently, however, such studies are still infeasible using a straightforward sequencing approach; as a result, a few multiplexing schemes have recently been suggested, in which a small number of DNA pools are sequenced and the results are then deconvoluted using compressed sensing or similar approaches. These methods, however, are limited to the detection of rare variants. Results In this paper we provide a new algorithm for the deconvolution of DNA pool multiplexing schemes. The algorithm uses a likelihood model and linear programming. The approach allows for the addition of external data, particularly imputation data, resulting in a flexible framework that is suitable for different applications. Conclusions In particular, we demonstrate that SNPs at both low and high allele frequencies can be accurately genotyped when the DNA pooling scheme is performed in conjunction with microarray genotyping and imputation. Additionally, we demonstrate the use of our framework for the detection of cancer fusion genes from RNA sequences.

  17. Integrated sequence analysis pipeline provides one-stop solution for identifying disease-causing mutations.

    Science.gov (United States)

    Hu, Hao; Wienker, Thomas F; Musante, Luciana; Kalscheuer, Vera M; Kahrizi, Kimia; Najmabadi, Hossein; Ropers, H Hilger

    2014-12-01

    Next-generation sequencing has greatly accelerated the search for disease-causing defects, but even for experts the data analysis can be a major challenge. To facilitate data processing in a clinical setting, we have developed a novel medical resequencing analysis pipeline (MERAP). MERAP assesses the quality of sequencing and has optimized capacity for calling variants, including single-nucleotide variants, insertions and deletions, copy-number variation, and other structural variants. MERAP identifies polymorphic and known causal variants by filtering against public domain databases, and flags nonsynonymous and splice-site changes. MERAP uses a logistic model to estimate the causal likelihood of a given missense variant. MERAP also considers relevant information such as phenotype and interactions with known disease-causing genes. MERAP compares favorably with GATK, one of the most widely used tools, owing to its higher sensitivity for detecting indels, its easy installation, and its economical use of computational resources. Upon testing more than 1,200 individuals with mutations in known and novel disease genes, MERAP proved highly reliable, as illustrated here for five families with disease-causing variants. We believe that the clinical implementation of MERAP will expedite the diagnosis of many disease-causing defects. © 2014 WILEY PERIODICALS, INC.

  18. Rapid detection of SMARCB1 sequence variation using high resolution melting

    Directory of Open Access Journals (Sweden)

    Ashley David M

    2009-12-01

    Background Rhabdoid tumors are rare cancers of early childhood arising in the kidney, central nervous system and other organs. The majority are caused by somatic inactivating mutations or deletions affecting the tumor suppressor locus SMARCB1 [OMIM 601607]. Germ-line SMARCB1 inactivation has been reported in association with rhabdoid tumor, epithelioid sarcoma and familial schwannomatosis, underscoring the importance of accurate mutation screening to ascertain recurrence and transmission risks. We describe a rapid and sensitive diagnostic screening method, using high resolution melting (HRM), for detecting sequence variations in SMARCB1. Methods Amplicons encompassing the nine coding exons of SMARCB1, flanking splice site sequences and the 5' and 3' UTR were screened by both HRM and direct DNA sequencing to establish the reliability of HRM as a primary mutation screening tool. Reaction conditions were optimized with commercially available HRM mixes. Results The false negative rate for detecting sequence variants by HRM in our sample series was zero. Nine amplicons out of a total of 140 (6.4%) showed variant melt profiles that were subsequently shown to be false positives. Overall, nine distinct pathogenic SMARCB1 mutations were identified in a total of 19 possible rhabdoid tumors. Two tumors had two distinct mutations and two harbored SMARCB1 deletions. The other mutations were nonsense or frameshift mutations. The detection sensitivity of the HRM screening method was influenced by both sequence context and the specific nucleotide change, and varied from 1:4 to 1:1000 (variant to wild-type DNA). A novel method involving digital HRM, followed by re-sequencing, was used to confirm mutations in tumor specimens containing associated normal tissue. Conclusions This is the first report describing SMARCB1 mutation screening using HRM. HRM is a rapid, sensitive and inexpensive screening technology that is likely to be widely adopted in diagnostic laboratories to

  19. Rapid detection of SMARCB1 sequence variation using high resolution melting

    International Nuclear Information System (INIS)

    Dagar, Vinod; Chow, Chung-Wo; Ashley, David M; Algar, Elizabeth M

    2009-01-01

    Rhabdoid tumors are rare cancers of early childhood arising in the kidney, central nervous system and other organs. The majority are caused by somatic inactivating mutations or deletions affecting the tumor suppressor locus SMARCB1 [OMIM 601607]. Germ-line SMARCB1 inactivation has been reported in association with rhabdoid tumor, epithelioid sarcoma and familial schwannomatosis, underscoring the importance of accurate mutation screening to ascertain recurrence and transmission risks. We describe a rapid and sensitive diagnostic screening method, using high resolution melting (HRM), for detecting sequence variations in SMARCB1. Amplicons encompassing the nine coding exons of SMARCB1, flanking splice site sequences and the 5' and 3' UTR were screened by both HRM and direct DNA sequencing to establish the reliability of HRM as a primary mutation screening tool. Reaction conditions were optimized with commercially available HRM mixes. The false negative rate for detecting sequence variants by HRM in our sample series was zero. Nine amplicons out of a total of 140 (6.4%) showed variant melt profiles that were subsequently shown to be false positives. Overall, nine distinct pathogenic SMARCB1 mutations were identified in a total of 19 possible rhabdoid tumors. Two tumors had two distinct mutations and two harbored SMARCB1 deletions. The other mutations were nonsense or frameshift mutations. The detection sensitivity of the HRM screening method was influenced by both sequence context and the specific nucleotide change, and varied from 1:4 to 1:1000 (variant to wild-type DNA). A novel method involving digital HRM, followed by re-sequencing, was used to confirm mutations in tumor specimens containing associated normal tissue. This is the first report describing SMARCB1 mutation screening using HRM. HRM is a rapid, sensitive and inexpensive screening technology that is likely to be widely adopted in diagnostic laboratories to facilitate whole gene mutation screening

  20. Exome sequencing identifies three novel candidate genes implicated in intellectual disability.

    Directory of Open Access Journals (Sweden)

    Zehra Agha

    Full Text Available Intellectual disability (ID) is a major health problem, mostly of unknown etiology. Recently, exome sequencing of individuals with ID identified novel genes implicated in the disease. The purpose of the present study was therefore to identify the genetic cause of ID in one syndromic and two non-syndromic Pakistani families. The whole exomes of three ID probands were sequenced. Missense variations were identified in two plausible novel genes implicated in autosomal recessive ID, lysine (K)-specific methyltransferase 2B (KMT2B) and zinc finger protein 589 (ZNF589), as well as in hedgehog acyltransferase (HHAT), which carried a de novo mutation with an autosomal dominant mode of inheritance. The KMT2B recessive variant is the first report of a recessive Kleefstra syndrome-like phenotype. Identification of plausible causative mutations for two recessive and one dominant type of ID, in genes not previously implicated in disease, underscores the large genetic heterogeneity of ID. These results also support the view that a large number of ID genes converge on a limited number of common networks: ZNF589 belongs to the KRAB-domain zinc-finger proteins previously implicated in ID; HHAT is predicted to affect sonic hedgehog, which is involved in several disorders with ID; and KMT2B, associated with syndromic ID, fits the epigenetic module underlying the Kleefstra syndromic spectrum. The association of these novel genes with ID in three different Pakistani families highlights the importance of screening these genes in more families with similar phenotypes from different populations to confirm their involvement in the pathogenesis of ID.

  1. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    Science.gov (United States)

    Yuen, Ryan KC; Merico, Daniele; Bookman, Matt; Howe, Jennifer L; Thiruvahindrapuram, Bhooma; Patel, Rohan V; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A; Walker, Susan; Marshall, Christian R; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D’Abate, Lia; Chan, Ada JS; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R; Nalpathamkalam, Thomas; Sung, Wilson WL; Tsoi, Fiona J; Wei, John; Xu, Lizhen; Tasse, Anne-Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie MacKinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A; Parr, Jeremy R; Spence, Sarah J; Vorstman, Jacob; Frey, Brendan J; Robinson, James T; Strug, Lisa J; Fernandez, Bridget A; Elsabbagh, Mayada; Carter, Melissa T; Hallmayer, Joachim; Knoppers, Bartha M; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H; Glazer, David; Pletcher, Mathew T; Scherer, Stephen W

    2017-01-01

    We are performing whole genome sequencing (WGS) of families with Autism Spectrum Disorder (ASD) to build a resource, named MSSNG, to enable the sub-categorization of phenotypes and the underlying genetic factors involved. Here, we report WGS of 5,205 samples from families with ASD, accompanied by clinical information, creating a database accessible in a cloud platform and through an internet portal with controlled access. We found an average of 73.8 de novo single nucleotide variants and 12.6 de novo insertions/deletions (indels) or copy number variations (CNVs) per ASD subject. We identified 18 new candidate ASD-risk genes, such as MED13 and PHF3, and found that participants bearing mutations in susceptibility genes had significantly lower adaptive ability (p = 6×10⁻⁴). In 294/2,620 (11.2%) of ASD cases, a molecular basis could be determined, and 7.2% of these carried CNV/chromosomal abnormalities, emphasizing the importance of detecting all forms of genetic variation as diagnostic and therapeutic targets in ASD. PMID:28263302

  2. WHITE-DWARF-MAIN-SEQUENCE BINARIES IDENTIFIED FROM THE LAMOST PILOT SURVEY

    International Nuclear Information System (INIS)

    Ren Juanjuan; Luo Ali; Li Yinbi; Wei Peng; Zhao Jingkun; Zhao Yongheng; Song Yihan; Zhao Gang

    2013-01-01

    We present a set of white-dwarf-main-sequence (WDMS) binaries identified spectroscopically from the Large Sky Area Multi-Object Fiber Spectroscopic Telescope (LAMOST, also called the Guo Shou Jing Telescope) pilot survey. We develop color selection criteria based on what is so far the largest and most complete Sloan Digital Sky Survey (SDSS) DR7 WDMS binary catalog and identify 28 WDMS binaries within the LAMOST pilot survey. The primaries in our binary sample are mostly DA white dwarfs, except for one DB white dwarf. We derive the stellar atmospheric parameters, masses, and radii for the two components of 10 of our binaries. We also provide cooling ages for the white dwarf primaries as well as the spectral types for the companion stars of these 10 WDMS binaries. These binaries tend to contain hot white dwarfs and early-type companions. Through cross-identification, we note that nine binaries in our sample have been published in the SDSS DR7 WDMS binary catalog. Nineteen spectroscopic WDMS binaries identified by the LAMOST pilot survey are new. Using the 3σ radial velocity variation as a criterion, we find two post-common-envelope binary candidates from our WDMS binary sample.
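    The abstract does not say exactly how the 3σ criterion was applied. A minimal Python sketch, assuming two radial-velocity measurements per object with independent errors combined in quadrature (the function name `is_rv_variable` and the example values are hypothetical):

```python
import math


def is_rv_variable(v1, sigma1, v2, sigma2, n_sigma=3.0):
    """Flag two radial-velocity measurements (km/s) as significantly different.

    Assumption: the combined uncertainty is the quadrature sum of the two
    measurement errors; the abstract only states that a 3-sigma criterion was used.
    """
    combined_sigma = math.hypot(sigma1, sigma2)
    return abs(v1 - v2) > n_sigma * combined_sigma


# Hypothetical measurements: a 15 km/s shift exceeds 3 * sqrt(2^2 + 3^2) ~ 10.8 km/s,
# while a 5 km/s shift does not.
print(is_rv_variable(10.0, 2.0, 25.0, 3.0))
print(is_rv_variable(10.0, 2.0, 15.0, 3.0))
```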

  3. Non-coding RNA sequence variations in human chronic lymphocytic leukemia and colorectal cancer.

    Science.gov (United States)

    Wojcik, Sylwia E; Rossi, Simona; Shimizu, Masayoshi; Nicoloso, Milena S; Cimmino, Amelia; Alder, Hansjuerg; Herlea, Vlad; Rassenti, Laura Z; Rai, Kanti R; Kipps, Thomas J; Keating, Michael J; Croce, Carlo M; Calin, George A

    2010-02-01

    Cancer is a genetic disease in which the interplay between alterations in protein-coding genes and non-coding RNAs (ncRNAs) plays a fundamental role. In recent years, the full coding component of the human genome was sequenced in various cancers, whereas such attempts related to ncRNAs are still fragmentary. We screened genomic DNA for sequence variations in 148 microRNA (miRNA) and ultraconserved region (UCR) loci in patients with chronic lymphocytic leukemia (CLL) or colorectal cancer (CRC) by Sanger sequencing and further tried to elucidate the functional consequences of some of these variations. We found sequence variations in miRNAs in both sporadic and familial CLL cases, mutations of UCRs in CLLs and CRCs and, in certain instances, detected functional effects of these variations. Furthermore, by integrating our data with previously published data on miRNA sequence variations, we have created a catalog of DNA sequence variations in miRNAs/ultraconserved genes in human cancers. These findings argue that ncRNAs are targeted by both germ-line and somatic mutations as well as by single-nucleotide polymorphisms with functional significance for human tumorigenesis. Sequence variations in ncRNA loci are frequent and some have functional and biological significance. Such information can be exploited to further investigate, on a genome-wide scale, the frequency of genetic variations in ncRNAs and their functional meaning, as well as to develop new diagnostic and prognostic markers for leukemias and carcinomas.

  4. A survey of single nucleotide polymorphisms identified from whole-genome sequencing and their functional effect in the porcine genome.

    Science.gov (United States)

    Keel, B N; Nonneman, D J; Rohrer, G A

    2017-08-01

    Genetic variants detected from sequence have been used to successfully identify causal variants and map complex traits in several organisms. High and moderate impact variants, those expected to alter or disrupt the protein coded by a gene and those that regulate protein production, likely have a more significant effect on phenotypic variation than do other types of genetic variants. Hence, a comprehensive list of these functional variants would be of considerable interest in swine genomic studies, particularly those targeting fertility and production traits. Whole-genome sequence was obtained from 72 of the founders of an intensely phenotyped experimental swine herd at the U.S. Meat Animal Research Center (USMARC). These animals included all 24 of the founding boars (12 Duroc and 12 Landrace) and 48 Yorkshire-Landrace composite sows. Sequence reads were mapped to the Sscrofa10.2 genome build, resulting in a mean of 6.1 fold (×) coverage per genome. A total of 22 342 915 high confidence SNPs were identified from the sequenced genomes. These included 21 million previously reported SNPs and 79% of the 62 163 SNPs on the PorcineSNP60 BeadChip assay. Variation was detected in the coding sequence or untranslated regions (UTRs) of 87.8% of the genes in the porcine genome: loss-of-function variants were predicted in 504 genes, 10 202 genes contained nonsynonymous variants, 10 773 had variation in UTRs and 13 010 genes contained synonymous variants. Approximately 139 000 SNPs were classified as loss-of-function, nonsynonymous or regulatory, which suggests that over 99% of the variation detected in our pigs could potentially be ignored, allowing us to focus on a much smaller number of functional SNPs during future analyses. Published 2017. This article is a U.S. Government work and is in the public domain in the USA.
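    The "over 99%" figure follows from the counts given above. A quick Python check, using the approximate count of 139 000 functional SNPs as reported (variable names are illustrative):

```python
total_snps = 22_342_915    # high-confidence SNPs reported in the abstract
functional_snps = 139_000  # approximate loss-of-function, nonsynonymous or regulatory SNPs

functional_fraction = functional_snps / total_snps
ignorable_fraction = 1 - functional_fraction

# prints "functional: 0.62%, potentially ignorable: 99.38%"
print(f"functional: {functional_fraction:.2%}, potentially ignorable: {ignorable_fraction:.2%}")
```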

  5. Mitochondrial DNA sequence variation in the Anatolian Peninsula ...

    Indian Academy of Sciences (India)

    Unknown

    ...necting the Middle East, Europe and Central Asia, and, thus, has been subject to major population movements. ... from different parts of Anatolia by direct sequencing. Analysis of the two ... the country, samples were obtained from individuals coming from ... Arlequin: a software environment for the analysis of popula...

  6. Identifying Interacting Genetic Variations by Fish-Swarm Logic Regression

    Science.gov (United States)

    Yang, Aiyuan; Yan, Chunxia; Zhu, Feng; Zhao, Zhongmeng; Cao, Zhi

    2013-01-01

    Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is that of identifying a set of interacting genetic variants, which might contribute to complex traits. Logic regression (LR) is a powerful multivariant association tool. Several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate with regard to accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents runs in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. The swarm algorithm improves accuracy and efficiency by speeding up convergence and preventing the search from becoming trapped in local optima. We apply our approach to a real screening dataset and a series of simulation scenarios. Compared with three existing LR-based approaches, our approach has lower type I and type II error rates, identifies more preset causal sites, and runs faster. PMID:23984382

  7. Identifying Interacting Genetic Variations by Fish-Swarm Logic Regression

    Directory of Open Access Journals (Sweden)

    Xuanping Zhang

    2013-01-01

    Full Text Available Understanding associations between genotypes and complex traits is a fundamental problem in human genetics. A major open problem in mapping phenotypes is that of identifying a set of interacting genetic variants, which might contribute to complex traits. Logic regression (LR) is a powerful multivariant association tool. Several LR-based approaches have been successfully applied to different datasets. However, these approaches are not adequate with regard to accuracy and efficiency. In this paper, we propose a new LR-based approach, called fish-swarm logic regression (FSLR), which improves the logic regression process by incorporating swarm optimization. In our approach, a school of fish agents runs in parallel. Each fish agent holds a regression model, while the school searches for better models through various preset behaviors. The swarm algorithm improves accuracy and efficiency by speeding up convergence and preventing the search from becoming trapped in local optima. We apply our approach to a real screening dataset and a series of simulation scenarios. Compared with three existing LR-based approaches, our approach has lower type I and type II error rates, identifies more preset causal sites, and runs faster.

  8. Variation in Symbiodinium ITS2 sequence assemblages among coral colonies.

    Science.gov (United States)

    Stat, Michael; Bird, Christopher E; Pochon, Xavier; Chasqui, Luis; Chauka, Leonard J; Concepcion, Gregory T; Logan, Dan; Takabayashi, Misaki; Toonen, Robert J; Gates, Ruth D

    2011-01-05

    Endosymbiotic dinoflagellates in the genus Symbiodinium are fundamentally important to the biology of scleractinian corals, as well as to a variety of other marine organisms. The genus Symbiodinium is genetically and functionally diverse, and the taxonomic nature of the union between Symbiodinium and corals is implicated as a key trait determining the environmental tolerance of the symbiosis. Surprisingly, the question of how Symbiodinium diversity partitions within a species across spatial scales of meters to kilometers has received little attention, but it is important to understanding the intrinsic biological scope of a given coral population and its adaptations to the local environment. Here we address this gap by describing the Symbiodinium ITS2 sequence assemblages recovered from colonies of the reef-building coral Montipora capitata sampled across Kāne'ohe Bay, Hawai'i. A total of 52 corals were sampled in a nested design of Coral Colony(Site(Region)) reflecting spatial scales of meters to kilometers. A diversity of Symbiodinium ITS2 sequences was recovered, with the majority of variance partitioning at the level of the Coral Colony. To confirm this result, the Symbiodinium ITS2 sequence diversity in six M. capitata colonies was analyzed in much greater depth, with 35 to 55 clones per colony. The ITS2 sequences and quantitative composition recovered from these colonies varied significantly, indicating that each coral hosted a different assemblage of Symbiodinium. The diversity of Symbiodinium ITS2 sequence assemblages retrieved from individual colonies of M. capitata here highlights the problems inherent in interpreting multi-copy and intra-genomically variable molecular markers, and serves as a context for discussing the utility and biological relevance of assigning species names based on Symbiodinium ITS2 genotyping.

  9. Solar Luminosity on the Main Sequence, Standard Model and Variations

    Science.gov (United States)

    Ayukov, S. V.; Baturin, V. A.; Gorshkov, A. B.; Oreshina, A. V.

    2017-05-01

    According to the Standard Solar Model, the Sun became a main-sequence star 4.6 Gyr ago. At that time the solar luminosity was 30% lower than its current value. This conclusion is based on the assumption that the Sun is fueled by thermonuclear reactions. If Earth's albedo and infrared emissivity had remained unchanged throughout its history, the oceans would have been frozen 2.3 Gyr ago. This contradicts geological data: there was liquid water on Earth 3.6-3.8 Gyr ago. This problem is known as the Faint Young Sun Paradox. We analyze the luminosity change in standard solar evolution theory. The increase of mean molecular weight in the central part of the Sun, due to the conversion of hydrogen to helium, leads to a gradual increase of luminosity with time on the main sequence. We also consider several exotic models: a fully mixed Sun; a drastic change of the pp reaction rate; and a Sun consisting of hydrogen and helium only. However, solar neutrino observations exclude most non-standard solar models.
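    The paradox can be made concrete with a zero-dimensional energy-balance estimate. This is not the Standard Solar Model calculation the authors perform; it is a textbook sketch assuming a present-day solar constant of 1361 W/m², a fixed Bond albedo of 0.3, and radiative equilibrium with no greenhouse term:

```python
SIGMA = 5.670374419e-8  # Stefan-Boltzmann constant, W m^-2 K^-4


def equilibrium_temperature(solar_constant, albedo):
    """Radiative equilibrium temperature (K) of a rapidly rotating planet, no greenhouse effect."""
    return (solar_constant * (1.0 - albedo) / (4.0 * SIGMA)) ** 0.25


S_NOW = 1361.0  # assumed present-day solar constant, W/m^2
ALBEDO = 0.3    # assumed constant Bond albedo

t_now = equilibrium_temperature(S_NOW, ALBEDO)
t_young = equilibrium_temperature(0.7 * S_NOW, ALBEDO)  # luminosity 30% lower
print(f"{t_now:.1f} K now, {t_young:.1f} K for the young Sun")
```

    Under these assumptions the equilibrium temperature drops from roughly 255 K to roughly 233 K. If the present-day greenhouse warming of about 33 K is held fixed, as the "unchanged emissivity" premise implies, the mean surface temperature would fall from about 288 K to about 266 K, below the freezing point of water.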

  10. Gene Unprediction with Spurio: A tool to identify spurious protein sequences.

    Science.gov (United States)

    Höps, Wolfram; Jeffryes, Matt; Bateman, Alex

    2018-01-01

    We now have access to the sequences of tens of millions of proteins. These protein sequences are essential for modern molecular biology and computational biology. The vast majority of protein sequences are derived from gene prediction tools and have no experimental supporting evidence for their translation. Despite the increasing accuracy of gene prediction tools, there likely exists a large number of spurious protein predictions in the sequence databases. We have developed the Spurio tool to help identify spurious protein predictions in prokaryotes. Spurio searches the query protein sequence against a prokaryotic nucleotide database using tblastn and identifies homologous sequences. The tblastn matches are used to score the query sequence's likelihood of being a spurious protein prediction using a Gaussian process model. The most informative feature is the appearance of stop codons within the presumed translation of homologous DNA sequences. Benchmarking shows that the Spurio tool is able to distinguish spurious from true proteins. However, transposon proteins are prone to being predicted as spurious because of the frequency of degraded homologs found in the DNA sequence databases. Our initial experiments suggest that less than 1% of the proteins in the UniProtKB sequence database are likely to be spurious and that Spurio is able to identify over 60 times more spurious proteins than the AntiFam resource. The Spurio software and source code are available under an MIT license at the following URL: https://bitbucket.org/bateman-group/spurio.
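    The stop-codon feature described above can be illustrated with a small Python function. This is not Spurio's code: the real tool scores tblastn matches with a Gaussian process model, whereas this sketch only counts in-frame stop codons in one hypothetical homologous DNA sequence, assuming the standard genetic code, reading frame 0, and excluding the final codon:

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def count_internal_stops(dna: str) -> int:
    """Count in-frame (frame 0) stop codons that occur before the final codon."""
    dna = dna.upper()
    # Split into complete codons, ignoring any trailing partial codon.
    codons = [dna[i:i + 3] for i in range(0, len(dna) - len(dna) % 3, 3)]
    # A terminal stop is expected in a real gene, so only earlier codons count.
    return sum(1 for codon in codons[:-1] if codon in STOP_CODONS)


# Hypothetical homolog: ATG AAA TAG CCC TGA has one premature stop (TAG).
print(count_internal_stops("ATGAAATAGCCCTGA"))
```

    A homologous region whose translation is repeatedly interrupted by such premature stops suggests that the corresponding predicted protein may not be real.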

  11. Draft genome sequence of an elite Dura palm and whole-genome patterns of DNA variation in oil palm.

    Science.gov (United States)

    Jin, Jingjing; Lee, May; Bai, Bin; Sun, Yanwei; Qu, Jing; Rahmadsyah; Alfiko, Yuzer; Lim, Chin Huat; Suwanto, Antonius; Sugiharti, Maria; Wong, Limsoon; Ye, Jian; Chua, Nam-Hai; Yue, Gen Hua

    2016-12-01

    Oil palm is the world's leading source of vegetable oil and fat. Dura, Pisifera and Tenera are three forms of oil palm. The genome sequence of Pisifera is available, whereas the Dura form has not been sequenced yet. We sequenced the genome of one elite Dura palm and re-sequenced 17 palm genomes. The assembled genome sequence of the elite Dura tree contained 10,971 scaffolds and was 1.701 Gb in length, covering 94.49% of the oil palm genome. A total of 36,105 genes were predicted. Re-sequencing of 17 additional palm trees identified 18.1 million SNPs. We found high genetic variation among palms from different geographical regions, but lower variation among Southeast Asian Dura and Pisifera palms. We mapped 10,000 SNPs on the linkage map of oil palm. In addition, high linkage disequilibrium (LD) was detected in the oil palms used in breeding populations of Southeast Asia, suggesting that LD mapping is likely to be practical in this important oil crop. Our data provide a valuable resource for accelerating genetic improvement and studying the mechanism underlying phenotypic variations of important oil palm traits. © The Author 2016. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.

  12. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color.

    Science.gov (United States)

    Motamayor, Juan C; Mockaitis, Keithanne; Schmutz, Jeremy; Haiminen, Niina; Livingstone, Donald; Cornejo, Omar; Findley, Seth D; Zheng, Ping; Utro, Filippo; Royaert, Stefan; Saski, Christopher; Jenkins, Jerry; Podicheti, Ram; Zhao, Meixia; Scheffler, Brian E; Stack, Joseph C; Feltus, Frank A; Mustiga, Guiliana M; Amores, Freddy; Phillips, Wilbert; Marelli, Jean Philippe; May, Gregory D; Shapiro, Howard; Ma, Jianxin; Bustamante, Carlos D; Schnell, Raymond J; Main, Dorrie; Gilbert, Don; Parida, Laxmi; Kuhn, David N

    2013-06-03

    Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 of Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits.
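    The contig and scaffold N50 values quoted above are standard assembly statistics: the length L such that sequences of length L or longer together cover at least half of the assembly. A minimal Python sketch of the computation (the function name and toy lengths are illustrative, not data from the cacao assembly):

```python
def n50(lengths):
    """Return the N50 of a list of contig or scaffold lengths (0 for an empty list)."""
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings coverage to half the total.
        if 2 * running >= total:
            return length
    return 0


# Toy assembly: total 54; 10 + 9 + 8 = 27 reaches half, so N50 = 8.
print(n50([2, 3, 4, 5, 6, 7, 8, 9, 10]))
```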

  13. The genome sequence of the most widely cultivated cacao type and its use to identify candidate genes regulating pod color

    Science.gov (United States)

    2013-01-01

    Background Theobroma cacao L. cultivar Matina 1-6 belongs to the most cultivated cacao type. The availability of its genome sequence and methods for identifying genes responsible for important cacao traits will aid cacao researchers and breeders. Results We describe the sequencing and assembly of the genome of Theobroma cacao L. cultivar Matina 1-6. The genome of the Matina 1-6 cultivar is 445 Mbp, which is significantly larger than a sequenced Criollo cultivar, and more typical of other cultivars. The chromosome-scale assembly, version 1.1, contains 711 scaffolds covering 346.0 Mbp, with a contig N50 of 84.4 kbp, a scaffold N50 of 34.4 Mbp, and an evidence-based gene set of 29,408 loci. Version 1.1 has 10x the scaffold N50 and 4x the contig N50 of Criollo, and includes 111 Mb more anchored sequence. The version 1.1 assembly has 4.4% gap sequence, while Criollo has 10.9%. Through a combination of haplotype, association mapping and gene expression analyses, we leverage this robust reference genome to identify a promising candidate gene responsible for pod color variation. We demonstrate that green/red pod color in cacao is likely regulated by the R2R3 MYB transcription factor TcMYB113, homologs of which determine pigmentation in Rosaceae, Solanaceae, and Brassicaceae. One SNP within the target site for a highly conserved trans-acting siRNA in dicots, found within TcMYB113, seems to affect transcript levels of this gene and therefore pod color variation. Conclusions We report a high-quality sequence and annotation of Theobroma cacao L. and demonstrate its utility in identifying candidate genes regulating traits. PMID:23731509

  14. Identifying Corneal Infections in Formalin-Fixed Specimens Using Next Generation Sequencing.

    Science.gov (United States)

    Li, Zhigang; Breitwieser, Florian P; Lu, Jennifer; Jun, Albert S; Asnaghi, Laura; Salzberg, Steven L; Eberhart, Charles G

    2018-01-01

    We test the ability of next-generation sequencing, combined with computational analysis, to identify a range of organisms causing infectious keratitis. This retrospective study evaluated 16 cases of infectious keratitis and four control corneas in formalin-fixed tissues from the pathology laboratory. Infectious cases also were analyzed in the microbiology laboratory using culture, polymerase chain reaction, and direct staining. Classified sequence reads were analyzed with two different metagenomics classification engines, Kraken and Centrifuge, and visualized using the Pavian software tool. Sequencing generated 20 to 46 million reads per sample. On average, 96% of the reads were classified as human, 0.3% corresponded to known vectors or contaminant sequences, 1.7% represented microbial sequences, and 2.4% could not be classified. The two computational strategies successfully identified the fungal, bacterial, and amoebal pathogens in most patients, including all four bacterial and mycobacterial cases, five of six fungal cases, three of three Acanthamoeba cases, and one of three herpetic keratitis cases. In several cases, additional potential pathogens also were identified. In one case with cytomegalovirus identified by Kraken and Centrifuge, the virus was confirmed by direct testing, while two cases in which Staphylococcus aureus or cytomegalovirus was identified by Centrifuge but not Kraken could not be confirmed. Confirmation was not attempted for an additional three potential pathogens identified by Kraken and 11 identified by Centrifuge. Next-generation sequencing combined with computational analysis can identify a wide range of pathogens in formalin-fixed corneal specimens, with potential applications in clinical diagnostics and research.

  15. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    Francioli, Laurent C.; Menelaou, Andronild; Pulit, Sara L.; Van Dijk, Freerk; Palamara, Pier Francesco; Elbers, Clara C.; Neerincx, Pieter B. T.; Ye, Kai; Guryev, Victor; Kloosterman, Wigard P.; Deelen, Patrick; Abdellaoui, Abdel; Van Leeuwen, Elisabeth M.; Van Oven, Mannis; Vermaat, Martijn; Li, Mingkun; Laros, Jeroen F. J.; Karssen, Lennart C.; Kanterakis, Alexandros; Amin, Najaf; Hottenga, Jouke Jan; Lameijer, Eric-Wubbo; Kattenberg, Mathijs; Dijkstra, Martijn; Byelas, Heorhiy; Van Setten, Jessica; Van Schaik, Barbera D. C.; Bot, Jan; Nijman, Isaac J.; Renkens, Ivo; Marschall, Tobias; Schönhuth, Alexander; Hehir-Kwa, Jayne Y.; Handsaker, Robert E.; Polak, Paz; Sohail, Mashaal; Vuzman, Dana; Hormozdiari, Fereydoun; Van Enckevort, David; Mei, Hailiang; Koval, Vyacheslav; Moed, Matthijs H.; Van der Velde, K. Joeri; Rivadeneira, Fernando; Estrada, Karol; Medina-Gomez, Carolina; Isaacs, Aaron; Platteel, Mathieu; Swertz, Morris A.; Wijmenga, Cisca

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch parent-offspring

  16. Whole-genome sequence variation, population structure and demographic history of the Dutch population

    NARCIS (Netherlands)

    The Genome of the Netherlands Consortium; T. Marschall (Tobias); A. Schönhuth (Alexander)

    2014-01-01

    Whole-genome sequencing enables complete characterization of genetic variation, but geographic clustering of rare alleles demands many diverse populations be studied. Here we describe the Genome of the Netherlands (GoNL) Project, in which we sequenced the whole genomes of 250 Dutch

  17. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing

    NARCIS (Netherlands)

    Aflitos, S.A.; Schijlen, E.G.W.M.; Jong, de J.H.S.G.M.; Ridder, de D.; Smit, S.; Finkers, H.J.; Bakker, F.T.; Geest, van de H.C.; Lintel Hekkert, te B.; Haarst, van J.C.; Smits, L.W.M.; Koops, A.J.; Sanchez-Perez, M.J.; Heusden, van A.W.; Visser, R.G.F.; Schranz, M.E.; Peters, S.A.

    2014-01-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon, and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new

  18. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing

    NARCIS (Netherlands)

    Aflitos, S.; Schijlen, E.; de Jong, H.; de Ridder, D.; Smit, S.; Finkers, R.; Wang, J.; Zhang, G.; Li, N.; Mao, L.; Bakker, F.; Dirks, R.; Breit, T.; Gravendeel, B.; Huits, H.; Struss, D.; Swanson-Wagner, R.; van Leeuwen, H.; van Ham, R.C.H.J.; Fito, L.; Guignier, L.; Sevilla, M.; Ellul, P.; Ganko, E.; Kapur, A.; Reclus, E.; de Geus, B.; van de Geest, H.; te Lintel Hekkert, B.; van Haarst, J.; Smits, L.; Koops, A.; Sanchez-Perez, G.; van Heusden, A.W.; Visser, R.; Quan, Z.; Min, J.; Liao, L.; Wang, X.; Wang, G.; Yue, Z.; Yang, X.; Xu, N.; Schranz, E.; Smets, E.; Vos, R.; Rauwerda, J.; Ursem, R.; Schuit, C.; Kerns, M.; van den Berg, J.; Vriezen, W.; Janssen, A.; Datema, E.; Jahrman, T.; Moquet, F.; Bonnet, J.; Peters, S.

    2014-01-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, which has yielded a huge amount of precious data on sequence diversity in the tomato clade. Three new

  19. Sequence-length variation of mtDNA HVS-I C-stretch in Chinese ethnic groups.

    Science.gov (United States)

    Chen, Feng; Dang, Yong-hui; Yan, Chun-xia; Liu, Yan-ling; Deng, Ya-jun; Fulton, David J R; Chen, Teng

    2009-10-01

    The purpose of this study was to investigate mitochondrial DNA (mtDNA) hypervariable segment-I (HVS-I) C-stretch variations and explore the significance of these variations in forensic and population genetics studies. The C-stretch sequence variation was studied in 919 unrelated individuals from 8 Chinese ethnic groups using both direct and clone sequencing approaches. Thirty-eight C-stretch haplotypes were identified, and some novel and population-specific haplotypes were also detected. The C-stretch genetic diversity (GD) values were relatively high, and probability (P) values were low. Additionally, C-stretch length heteroplasmy was observed in approximately 9% of the individuals studied. There was a significant correlation (r = -0.961, P ... populations. The results from the Fst and dA genetic distance matrix, neighbor-joining tree, and principal component map also suggest that the C-stretch could be used as a reliable genetic marker in population genetics.
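    The abstract does not define its genetic diversity (GD) or probability (P) statistics. Assuming GD is Nei's unbiased haplotype diversity and P is the random match probability (the sum of squared haplotype frequencies), both common in forensic mtDNA work, a Python sketch with hypothetical haplotype labels:

```python
from collections import Counter


def gene_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n / (n - 1) * (1 - sum(p_i^2))."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        return 0.0  # undefined for fewer than two samples; report 0 here
    sum_p2 = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_p2)


def match_probability(haplotypes):
    """Random match probability: chance that two random individuals share a haplotype."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    return sum((c / n) ** 2 for c in counts.values()) if n else 0.0


# Hypothetical sample of 10 individuals carrying three C-stretch haplotypes.
sample = ["A"] * 5 + ["B"] * 3 + ["C"] * 2
print(gene_diversity(sample), match_probability(sample))
```

    High GD and low P go together: the more evenly individuals are spread across many haplotypes, the less likely two unrelated people are to match, which is what makes a marker useful forensically.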

  20. Sequence length variation, indel costs, and congruence in sensitivity analysis

    DEFF Research Database (Denmark)

    Aagesen, Lone; Petersen, Gitte; Seberg, Ole

    2005-01-01

    The behavior of two topological and four character-based congruence measures was explored using different indel treatments in three empirical data sets, each with different alignment difficulties. The analyses were done using direct optimization within a sensitivity analysis framework in which ... the cost of indels was varied. Indels were either treated as a fifth character state or, using a linear affine gap cost, strings of contiguous gaps were considered single events. Congruence consistently improved when indels were treated as single events, but no congruence measure appeared to be the obviously ... preferable one. However, when enough data were combined, all congruence measures clearly tended to select the same alignment cost set as the optimal one. Disagreement among congruence measures was mostly caused by a dominant fragment or a data partition that included all or most of the length variation...
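    The two indel treatments compared above can be illustrated on a single aligned sequence. This Python sketch only scores gaps in a fixed alignment; it does not implement direct optimization, and the cost values in the example (gap opening 3, extension 1, per-position cost 1) are arbitrary assumptions:

```python
import re


def gap_runs(aligned: str):
    """Lengths of runs of contiguous gap characters ('-') in an aligned sequence."""
    return [len(m.group()) for m in re.finditer(r"-+", aligned)]


def fifth_state_cost(aligned: str, indel_cost: float) -> float:
    """Each gap position is charged separately, as if '-' were a fifth character state."""
    return indel_cost * sum(gap_runs(aligned))


def affine_cost(aligned: str, gap_open: float, gap_extend: float) -> float:
    """Each run of contiguous gaps is one event: open once, then extend per extra position."""
    return sum(gap_open + gap_extend * (n - 1) for n in gap_runs(aligned))


seq = "AC--GT---A"  # two indel events, of lengths 2 and 3
print(fifth_state_cost(seq, 1.0), affine_cost(seq, 3.0, 1.0))
```

    Under the fifth-state treatment a 3-position gap costs three times as much as a 1-position gap, whereas the affine cost charges mostly for the number of separate events, so one long indel is penalized less than several short ones.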

  1. Patterns of genomic variation in the poplar rust fungus Melampsora larici-populina identify pathogenesis-related factors

    Directory of Open Access Journals (Sweden)

    Antoine Persoons

    2014-09-01

    Melampsora larici-populina is a fungal pathogen responsible for foliar rust disease on poplar trees, which causes damage to forest plantations worldwide, particularly in Northern Europe. The reference genome of the isolate 98AG31 was previously sequenced using a whole genome shotgun strategy, revealing a large genome of 101 megabases containing 16,399 predicted genes, which included secreted protein genes representing poplar rust candidate effectors. In the present study, the genomes of 15 isolates collected over the past 20 years throughout the French territory, representing distinct virulence profiles, were characterized by massively parallel sequencing to assess genetic variation in the poplar rust fungus. Comparison to the reference genome revealed striking structural variations. Analysis of coverage and sequencing depth identified large missing regions between isolates related to the mating type loci. More than 611,824 single-nucleotide polymorphism (SNP) positions were uncovered overall, indicating a remarkable level of polymorphism. Based on the accumulation of non-synonymous substitutions in coding sequences and the relative frequencies of synonymous and non-synonymous polymorphisms (i.e., PN/PS), we identify candidate genes that may be involved in fungal pathogenesis. Correlation between non-synonymous SNPs in genes encoding secreted proteins and pathotypes of the studied isolates revealed candidate genes potentially related to virulences 1, 6 and 8 of the poplar rust fungus.
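
    A PN/PS ratio is usually normalized by the number of non-synonymous and synonymous sites in the coding sequence. The sketch below assumes that normalization, counts sites Nei-Gojobori style under the standard genetic code (stop codons get no special treatment), and takes the polymorphism counts as given; the study's exact procedure is not stated in the abstract.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order (third position varies fastest).
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO)}

def synonymous_sites(cds):
    """Expected number of synonymous sites: for each codon position, the fraction
    of the three possible point mutations that keep the same amino acid."""
    cds = cds.upper()
    sites = 0.0
    for i in range(0, len(cds) - 2, 3):  # a trailing partial codon is ignored
        codon = cds[i:i + 3]
        aa = CODON_TABLE[codon]  # KeyError for non-ACGT symbols
        for pos in range(3):
            for alt in BASES:
                if alt == codon[pos]:
                    continue
                mutant = codon[:pos] + alt + codon[pos + 1:]
                if CODON_TABLE[mutant] == aa:
                    sites += 1 / 3
    return sites

def pn_ps(nonsyn_polymorphisms, syn_polymorphisms, cds):
    """(polymorphic non-synonymous / non-synonymous sites) / (polymorphic synonymous / synonymous sites)."""
    syn = synonymous_sites(cds)
    nonsyn = 3 * (len(cds) // 3) - syn
    if syn == 0 or syn_polymorphisms == 0:
        raise ValueError("pN/pS is undefined without synonymous sites or polymorphisms")
    return (nonsyn_polymorphisms / nonsyn) / (syn_polymorphisms / syn)
```

    Values well above 1 point to an excess of amino-acid-changing polymorphism, which is one reason such genes become candidates for pathogenesis-related roles.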

  2. Spectrum of sequence variations in the FANCA gene: an International Fanconi Anemia Registry (IFAR) study.

    Science.gov (United States)

    Levran, Orna; Diotti, Raffaella; Pujara, Kanan; Batish, Sat D; Hanenberg, Helmut; Auerbach, Arleen D

    2005-02-01

    Fanconi anemia (FA) is an autosomal recessive disorder that is defined by cellular hypersensitivity to DNA cross-linking agents, and is characterized clinically by developmental abnormalities, progressive bone-marrow failure, and predisposition to leukemia and solid tumors. There is extensive genetic heterogeneity, with at least 11 different FA complementation groups. FA-A is the most common group, accounting for approximately 65% of all affected individuals. The mutation spectrum of the FANCA gene, located on chromosome 16q24.3, is highly heterogeneous. Here we summarize all sequence variations (mutations and polymorphisms) in FANCA described in the literature and listed in the Fanconi Anemia Mutation Database as of March 2004, and report 61 novel FANCA mutations identified in FA patients registered in the International Fanconi Anemia Registry (IFAR). Thirty-eight novel SNPs, previously unreported in the literature or in dbSNP, were also identified. We studied the segregation of common FANCA SNPs in FA families to generate haplotypes. We found that FANCA SNP data are highly useful for carrier testing, prenatal diagnosis, and preimplantation genetic diagnosis, particularly when the disease-causing mutations are unknown. Twenty-two large genomic deletions were identified by detection of apparent homozygosity for rare SNPs. In addition, a conserved SNP haplotype block spanning at least 60 kb of the FANCA gene was identified in individuals from various ethnic groups. (c) 2005 Wiley-Liss, Inc.
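
    One way apparent homozygosity can betray a deletion is when a child's homozygous genotype is inconsistent with a parent's genotype. The function below is a hypothetical family-based screen illustrating that reasoning, not the registry's actual procedure; genotyping error, sample mix-up or non-paternity can produce the same pattern.

```python
def possible_hemizygous_deletion(child, mother, father):
    """Flag an apparent homozygote that is inconsistent with Mendelian inheritance.

    Genotypes are 2-tuples of alleles, e.g. ("C", "T"). If the child looks
    homozygous for an allele that one parent does not carry, a simple explanation
    is that the child inherited a deletion (no allele) from that parent.
    """
    if child[0] != child[1]:
        return False  # heterozygous child: both alleles are observed
    allele = child[0]
    return allele not in mother or allele not in father

# Child appears T/T, but the father is C/C: consistent with T/- (paternal deletion).
print(possible_hemizygous_deletion(("T", "T"), ("C", "T"), ("C", "C")))  # True
```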

  3. kmer-SVM: a web server for identifying predictive regulatory sequence features in genomic data sets

    Science.gov (United States)

    Fletez-Brant, Christopher; Lee, Dongwon; McCallion, Andrew S.; Beer, Michael A.

    2013-01-01

    Massively parallel sequencing technologies have made the generation of genomic data sets a routine component of many biological investigations. For example, chromatin immunoprecipitation followed by sequencing (ChIP-seq) detects genomic regions bound (directly or indirectly) by specific factors, and DNase-seq identifies regions of open chromatin. A major bottleneck in the interpretation of these data is the identification of the underlying DNA sequence code that defines, and ultimately facilitates prediction of, these transcription factor (TF) bound or open chromatin regions. We have recently developed a novel computational methodology, which uses a support vector machine (SVM) with kmer sequence features (kmer-SVM) to identify predictive combinations of short transcription factor-binding sites, which determine the tissue specificity of these genomic assays (Lee, Karchin and Beer, Discriminative prediction of mammalian enhancers from DNA sequence. Genome Res. 2011; 21:2167–80). This regulatory information can (i) give confidence in genomic experiments by recovering previously known binding sites, and (ii) reveal novel sequence features for subsequent experimental testing of cooperative mechanisms. Here, we describe the development and implementation of a web server to allow the broader research community to independently apply our kmer-SVM to analyze and interpret their genomic datasets. We analyze five recently published data sets and demonstrate how this tool identifies accessory factors and repressive sequence elements. kmer-SVM is available at http://kmersvm.beerlab.org. PMID:23771147
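
    The idea of using k-mer counts as SVM features can be illustrated with a normalized spectrum kernel, in which each sequence becomes a vector of k-mer counts and similarity is their cosine. This is a simplified sketch, not the kmer-SVM server's code; collapsing each k-mer with its reverse complement is an assumption that suits double-stranded DNA.

```python
from collections import Counter
from math import sqrt

COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer):
    """Collapse a k-mer and its reverse complement onto one key."""
    return min(kmer, kmer.translate(COMPLEMENT)[::-1])

def kmer_counts(seq, k):
    """Count canonical k-mers, skipping windows that contain N or other symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[canonical(kmer)] += 1
    return counts

def spectrum_kernel(a, b, k=6):
    """Normalized dot product (cosine similarity) of two k-mer count vectors."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(n * cb[kmer] for kmer, n in ca.items())
    norm = sqrt(sum(n * n for n in ca.values()) * sum(n * n for n in cb.values()))
    return dot / norm if norm else 0.0
```

    A matrix of such kernel values between training sequences can be passed to an SVM that accepts precomputed kernels (for example scikit-learn's `SVC(kernel="precomputed")`); the learned weights then indicate which k-mers are predictive of the bound or open regions.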

  4. Variation in the prion protein sequence in Dutch goat breeds.

    Science.gov (United States)

    Windig, J J; Hoving, R A H; Priem, J; Bossers, A; van Keulen, L J M; Langeveld, J P M

    2016-10-01

    Scrapie is a neurodegenerative disease occurring in goats and sheep. Several haplotypes of the prion protein increase resistance to scrapie infection and may be used in selective breeding to help eradicate scrapie. In this study, frequencies of the allelic variants of the PrP gene were determined for six goat breeds in the Netherlands. Overall frequencies in Dutch goats were determined from 768 brain tissue samples in 2005, 766 in 2008 and 300 in 2012, derived from random sampling for the national scrapie surveillance without knowledge of the breed. Breed-specific frequencies were determined in winter 2013/2014 by sampling 300 breeding animals from the main breeders of the different breeds. Detailed analysis of the scrapie-resistant K222 haplotype was carried out in 2014 for 220 Dutch Toggenburger goats and in 2015 for 942 goats from the Saanen-derived White Goat breed. Nine haplotypes were identified in the Dutch breeds. Frequencies of non-wild-type haplotypes were generally low. The exceptions were the K222 haplotype in the Dutch Toggenburger (29%) and the S146 haplotype in the Nubian and Boer breeds (7% and 31%, respectively). The frequency of the K222 haplotype in the Toggenburger was higher than for any other breed reported in the literature, while in the White Goat breed, at 3.1%, it was similar to frequencies in other Saanen or Saanen-derived breeds. Further evidence was found for the existence of two M142 haplotypes, M142/S240 and M142/P240. Breeds vary in haplotype frequencies, but frequencies of resistant genotypes are generally low; consequently, selective breeding for scrapie resistance can only be slow, but it will benefit from the animals identified in this study. The unexpectedly high frequency of the K222 haplotype in the Dutch Toggenburger underlines the need to conserve rare breeds in order to preserve genetic diversity that is rare or absent in other breeds. © 2016 Blackwell Verlag GmbH.
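
    Why resistant genotypes stay rare even when a haplotype is moderately common can be seen from Hardy-Weinberg proportions. This worked sketch assumes random mating and treats K222 versus all other haplotypes as a single biallelic locus; neither assumption comes from the study.

```python
def allele_frequency(n_hom_other, n_het, n_hom_variant):
    """Frequency of the variant haplotype from diploid genotype counts."""
    n = n_hom_other + n_het + n_hom_variant
    return (2 * n_hom_variant + n_het) / (2 * n)

def hwe_genotypes(q):
    """Expected genotype frequencies under Hardy-Weinberg equilibrium for variant frequency q."""
    p = 1 - q
    return {"non-carrier": p * p, "heterozygous": 2 * p * q, "homozygous": q * q}

# K222 frequency reported for the Dutch Toggenburger (29%):
# roughly 50% non-carriers, 41% heterozygotes and only about 8% K222/K222 homozygotes.
print(hwe_genotypes(0.29))
```

    At the 3.1% frequency reported for the White Goat breed, the same formula gives fewer than one K222 homozygote per thousand animals, which illustrates why selection can only proceed slowly.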

  5. Exome sequencing identifies compound heterozygous mutations in CYP4V2 in a pedigree with retinitis pigmentosa.

    Directory of Open Access Journals (Sweden)

    Yun Wang

    Retinitis pigmentosa (RP) is a heterogeneous group of progressive retinal degenerations characterized by pigmentation and atrophy in the mid-periphery of the retina. Twenty-two subjects from a four-generation Chinese family with RP, thin cornea, congenital cataract and high myopia are reported in this study. All family members underwent complete ophthalmologic examinations. Patients in the family presented with bone spicule-shaped pigment deposits in the retina, retinal vascular attenuation, retinal and choroidal dystrophy, as well as punctate opacity of the lens, reduced corneal thickness and high myopia. Peripheral venous blood was obtained from all patients and their family members for genetic analysis. After mutation analysis of a few known RP candidate genes, exome sequencing was used to analyze the exomes of three patients (III2, III4 and III6) and the unaffected mother (II2). A total of 34,693 variations shared by the three patients were subjected to several filtering steps against existing variation databases. Identified variations were verified in the remaining family members by PCR and Sanger sequencing. Compound heterozygous c.802-8_810del17insGC and c.1091-2A>G mutations of the CYP4V2 gene, known genetic defects in Bietti crystalline corneoretinal dystrophy, were identified as the causative mutations for RP in this family.
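
    The filtering logic for a compound heterozygous candidate can be sketched as a set operation: keep variants shared by all affected relatives, drop those already in databases, then keep genes with at least two remaining variants that the unaffected parent does not carry together. This is a hypothetical simplification; gene and variant identifiers are made up, and phase is not checked.

```python
from collections import defaultdict

def compound_het_candidates(patients, unaffected_carrier, database_variants):
    """Genes with >= 2 rare heterozygous variants shared by all affected relatives.

    patients: list of dicts mapping variant id -> gene, one dict per affected person
    unaffected_carrier: set of variant ids observed in an unaffected parent
    database_variants: set of variant ids already present in variation databases
    """
    shared = set(patients[0])
    for calls in patients[1:]:
        shared &= set(calls)
    shared -= database_variants
    by_gene = defaultdict(set)
    for vid in shared:
        by_gene[patients[0][vid]].add(vid)
    # Under a recessive model the unaffected parent may carry one allele, but not both.
    return {gene: sorted(vids) for gene, vids in by_gene.items()
            if len(vids) >= 2 and len(vids & unaffected_carrier) <= 1}

patients = [{"v1": "GENE_A", "v2": "GENE_A", "v3": "GENE_B"}] * 3
print(compound_het_candidates(patients, {"v1"}, {"v3"}))  # {'GENE_A': ['v1', 'v2']}
```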

  6. Genetic diversity in Breonadia salicina based on intra-species sequence variation of chloroplast DNA spacer sequence

    International Nuclear Information System (INIS)

    Qurainy, F.A.; Gaafar, A.R.Z.

    2014-01-01

    Assessment and knowledge of the genetic diversity and variation within and between populations of rare and endangered plants is very important for effective conservation. Intergenic spacer sequence variation at the psbA-trnH locus of the chloroplast genome was assessed within Breonadia salicina (Rubiaceae), a critically endangered plant species endemic to the southwestern part of the Kingdom of Saudi Arabia. Sequence data obtained from 19 individuals in three populations, with an aligned length of 355 bp, revealed nine haplotypes. A high level of haplotype diversity (Hd = 0.842) and a low level of nucleotide diversity (Pi = 0.0058) were detected. Consistently, both hierarchical analysis of molecular variance (AMOVA) and the constructed neighbor-joining tree indicated no genetic differentiation among populations. This level of differentiation between populations or between regions in psbA-trnH sequences may be due to the abundance of shared ancestral haplotypes and the presence of private haplotypes fixed in each population. Furthermore, the results revealed almost the same level of genetic diversity as in Yemeni accessions, with Saudi accessions sharing three of the four haplotypes found in Yemeni accessions. (author)
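
    Nucleotide diversity (Pi) is the average proportion of differing sites over all pairs of sequences, which is why many distinct haplotypes (high Hd) can still yield a low Pi when they differ at only a few positions. A minimal sketch, assuming a gap-free alignment of equal-length sequences; the toy sequences are hypothetical.

```python
from itertools import combinations

def nucleotide_diversity(seqs):
    """Nei's pi: mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    pairs = list(combinations(seqs, 2))
    return sum(sum(x != y for x, y in zip(a, b)) / length for a, b in pairs) / len(pairs)

print(nucleotide_diversity(["AAAA", "AAAT", "AATT"]))  # mean of 0.25, 0.5 and 0.25
```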

  7. An Exome Sequencing Study to Assess the Role of Rare Genetic Variation in Pulmonary Fibrosis.

    Science.gov (United States)

    Petrovski, Slavé; Todd, Jamie L; Durheim, Michael T; Wang, Quanli; Chien, Jason W; Kelly, Fran L; Frankel, Courtney; Mebane, Caroline M; Ren, Zhong; Bridgers, Joshua; Urban, Thomas J; Malone, Colin D; Finlen Copeland, Ashley; Brinkley, Christie; Allen, Andrew S; O'Riordan, Thomas; McHutchison, John G; Palmer, Scott M; Goldstein, David B

    2017-07-01

    Idiopathic pulmonary fibrosis (IPF) is an increasingly recognized, often fatal lung disease of unknown etiology. The aim of this study was to use whole-exome sequencing to improve understanding of the genetic architecture of pulmonary fibrosis. We performed a case-control exome-wide collapsing analysis including 262 unrelated individuals with pulmonary fibrosis clinically classified as IPF according to American Thoracic Society/European Respiratory Society/Japanese Respiratory Society/Latin American Thoracic Association guidelines (81.3%), usual interstitial pneumonia secondary to autoimmune conditions (11.5%), or fibrosing nonspecific interstitial pneumonia (7.2%). The majority (87%) of case subjects reported no family history of pulmonary fibrosis. We searched 18,668 protein-coding genes for an excess of rare deleterious genetic variation using whole-exome sequence data from 262 case subjects with pulmonary fibrosis and 4,141 control subjects drawn from among a set of individuals of European ancestry. Comparing genetic variation across 18,668 protein-coding genes, we found a study-wide significant (P … RTEL1, and PARN. A model qualifying ultrarare, deleterious, nonsynonymous variants implicated TERT and RTEL1, and a model specifically qualifying loss-of-function variants implicated RTEL1 and PARN. A subanalysis of 186 case subjects with sporadic IPF confirmed TERT, RTEL1, and PARN as study-wide significant contributors to sporadic IPF. Collectively, 11.3% of case subjects with sporadic IPF carried a qualifying variant in one of these three genes compared with the 0.3% carrier rate observed among control subjects (odds ratio, 47.7; 95% confidence interval, 21.5-111.6; P = 5.5 × 10⁻²²). We identified TERT, RTEL1, and PARN (three telomere-related genes previously implicated in familial pulmonary fibrosis) as significant contributors to sporadic IPF. These results support the idea that telomere dysfunction is involved in IPF pathogenesis.

  8. Understanding gene sequence variation in the context of transcription regulation in yeast.

    Directory of Open Access Journals (Sweden)

    Irit Gat-Viks

    2010-01-01

    DNA sequence polymorphism in a regulatory protein can have a widespread transcriptional effect. Here we present a computational approach for analyzing modules of genes with a common regulation that are affected by specific DNA polymorphisms. We identify such regulatory-linkage modules by integrating genotypic and expression data for individuals in a segregating population with complementary expression data of strains mutated in a variety of regulatory proteins. Our procedure searches simultaneously for groups of co-expressed genes, for their common underlying linkage interval, and for their shared regulatory proteins. We applied the method to a cross between laboratory and wild strains of S. cerevisiae, demonstrating its ability to correctly suggest modules and to outperform extant approaches. Our results suggest that middle sporulation genes are under the control of polymorphism in the sporulation-specific tertiary complex Sum1p/Rfm1p/Hst1p. In another example, our analysis reveals novel inter-relations between Swi3 and two mitochondrial inner membrane proteins underlying variation in a module of aerobic cellular respiration genes. Overall, our findings demonstrate that this approach provides a useful framework for the systematic mapping of quantitative trait loci and their role in gene expression variation.
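
    Only the linkage part of the approach is illustrated here: given a fixed module of co-expressed genes, average their expression per segregant and find the marker whose parental allele best separates that average. This is a hypothetical simplification; the published method searches modules, linkage intervals and regulators jointly and also uses mutant expression data.

```python
from statistics import mean

def module_linkage_scan(expression, genotypes, module):
    """Find the marker whose inherited allele best separates a module's average expression.

    expression: gene -> list of expression values, one per segregant
    genotypes: marker -> list of 0/1 parental alleles, one per segregant
    module: genes assumed to be co-regulated
    Returns (marker, score) with score = |mean(allele 1) - mean(allele 0)|.
    """
    n = len(next(iter(genotypes.values())))
    profile = [mean(expression[g][i] for g in module) for i in range(n)]
    best = None
    for marker, alleles in genotypes.items():
        group0 = [v for v, a in zip(profile, alleles) if a == 0]
        group1 = [v for v, a in zip(profile, alleles) if a == 1]
        if not group0 or not group1:
            continue  # marker does not segregate in this sample
        score = abs(mean(group1) - mean(group0))
        if best is None or score > best[1]:
            best = (marker, score)
    return best

expression = {"geneX": [1, 1, 5, 5], "geneY": [2, 2, 6, 6]}
genotypes = {"markerA": [0, 0, 1, 1], "markerB": [0, 1, 0, 1]}
print(module_linkage_scan(expression, genotypes, ["geneX", "geneY"]))  # ('markerA', 4.0)
```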

  9. Sequencing by ligation variation with endonuclease V digestion and deoxyinosine-containing query oligonucleotides

    Directory of Open Access Journals (Sweden)

    Ho Antoine

    2011-12-01

    Background: Sequencing-by-ligation (SBL) is one of several next-generation sequencing methods that have been developed for massive sequencing of DNA immobilized on arrayed beads (or other clonal amplicons). SBL has the advantage of being easy to implement and accessible to all because it can be performed with off-the-shelf reagents. However, SBL has the limitation of very short read lengths. Results: To overcome the read length limitation, research groups have developed complex library preparation processes, which can be time-consuming and difficult, and can result in low-complexity libraries. Herein we describe a variation on traditional SBL protocols that extends the number of sequential bases that can be sequenced by using Endonuclease V to nick a query primer, thus leaving a ligatable end extended into the unknown sequence for further SBL cycles. To demonstrate the protocol, we constructed a known DNA sequence and used our SBL variation, cyclic SBL (cSBL), to resequence this region. Using our method, we were able to read thirteen contiguous bases in the 3'-5' direction. Conclusions: Combining this read length with sequencing in the 5'-3' direction would allow a read length of over twenty bases on a single tag. Implementing mate-paired tags and this SBL variation could enable > 95% coverage of the genome.

  10. Identifying spatial clustering properties of the 1997-2003 Liguria (Northern Italy) forest-fire sequence

    International Nuclear Information System (INIS)

    Telesca, Luciano; Amatulli, Giuseppe; Lasaponara, Rosa; Lovallo, Michele; Santulli, Adriano

    2007-01-01

    The spatial clustering of the forest-fire sequence (1997-2003) of the Liguria Region (Northern Italy) has been analysed using the correlation dimension D_C, calculated by means of the correlation integral method. Studying the variations of this parameter, we recognize a strong variability of spatial clustering, modulated by seasonal cycles. Furthermore, we found that the larger fires (size >400 ha) mark the cyclic behaviour of the correlation dimension
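
    The correlation integral C(r) is the fraction of event pairs closer than a distance r, and the correlation dimension D_C is the slope of log C(r) against log r within a scaling range. A minimal sketch, assuming planar event coordinates and a hand-chosen set of radii; the study's choice of scaling range and time windows is not reproduced.

```python
import math
from itertools import combinations

def correlation_integral(points, r):
    """C(r): fraction of distinct point pairs closer than r."""
    n = len(points)
    close = sum(1 for p, q in combinations(points, 2) if math.dist(p, q) < r)
    return 2 * close / (n * (n - 1))

def correlation_dimension(points, radii):
    """Least-squares slope of log C(r) against log r over the chosen radii.

    Every radius must give C(r) > 0, otherwise the logarithm is undefined.
    """
    xs = [math.log(r) for r in radii]
    ys = [math.log(correlation_integral(points, r)) for r in radii]
    x_mean, y_mean = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    return num / den

# Events lying along a line in the plane should give a dimension close to 1.
line = [(i / 199, 0.0) for i in range(200)]
print(correlation_dimension(line, [0.05, 0.1, 0.2]))
```

    Values near 2 indicate events spread evenly over the plane, while lower values indicate stronger spatial clustering.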

  11. Pooled-DNA sequencing identifies genomic regions of selection in Nigerian isolates of Plasmodium falciparum.

    Science.gov (United States)

    Oyebola, Kolapo M; Idowu, Emmanuel T; Olukosi, Yetunde A; Awolola, Taiwo S; Amambua-Ngwa, Alfred

    2017-06-29

    The burden of falciparum malaria is especially high in sub-Saharan Africa. Differences in pressure from host immunity and antimalarial drugs lead to adaptive changes responsible for high levels of genetic variation within and between the parasite populations. Population-specific genetic studies to survey for genes under positive or balancing selection resulting from drug pressure or host immunity will allow for refinement of interventions. We performed pooled sequencing (pool-seq) of the genomes of 100 Plasmodium falciparum isolates from Nigeria. We used an allele-frequency-based neutrality test (Tajima's D) and the integrated haplotype score (iHS) to identify genes under selection. Fourteen shared iHS regions that had at least 2 SNPs with a score > 2.5 were identified. These regions code for genes that were likely to have been under strong directional selection. Two of these genes were the chloroquine resistance transporter (CRT) on chromosome 7 and the multidrug resistance 1 (MDR1) gene on chromosome 5. There was a weak signature of selection in the dihydrofolate reductase (DHFR) gene on chromosome 4 and the MDR5 gene on chromosome 13, with only 2 and 3 SNPs, respectively, identified within the iHS window. We observed strong selection pressure attributable to continued chloroquine and sulfadoxine-pyrimethamine use despite their official proscription for the treatment of uncomplicated malaria. There was also a major selective sweep on chromosome 6, which had 32 SNPs within the shared iHS region. Tajima's D of circumsporozoite protein (CSP), erythrocyte-binding antigen (EBA-175), merozoite surface proteins - MSP3 and MSP7, merozoite surface protein duffy binding-like (MSPDBL2) and serine repeat antigen (SERA-5) were 1.38, 1.29, 0.73, 0.84 and 0.21, respectively. We have demonstrated the use of pool-seq to understand genomic patterns of selection and variability in P. falciparum from Nigeria, which bears the highest burden of infections. This investigation identified known
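
    Tajima's D compares the mean number of pairwise differences (pi) with the number of segregating sites (S) scaled by a1. An excess of rare variants drives D negative, and an excess of intermediate-frequency variants (as expected under balancing selection) drives it positive. The sketch uses the textbook formula on individually sequenced, aligned haplotypes; it does not include the corrections needed for pooled (pool-seq) allele frequencies.

```python
import math
from itertools import combinations

def tajimas_d(seqs):
    """Tajima's D from aligned, equal-length haplotype sequences (textbook formula)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least four sequences")
    # Segregating sites: alignment columns with more than one distinct base.
    S = sum(1 for col in zip(*seqs) if len(set(col)) > 1)
    if S == 0:
        raise ValueError("Tajima's D is undefined without segregating sites")
    # pi: mean number of pairwise differences (per sequence, not per site).
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

print(tajimas_d(["ACGT", "ACGT", "ACGT", "ACGA"]))  # singleton variant: negative D
print(tajimas_d(["ACGT", "ACGT", "ACGA", "ACGA"]))  # intermediate frequency: positive D
```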

  12. Close Sequence Comparisons are Sufficient to Identify Human cis-Regulatory Elements

    Energy Technology Data Exchange (ETDEWEB)

    Prabhakar, Shyam; Poulin, Francis; Shoukry, Malak; Afzal, Veena; Rubin, Edward M.; Couronne, Olivier; Pennacchio, Len A.

    2005-12-01

    Cross-species DNA sequence comparison is the primary method used to identify functional noncoding elements in human and other large genomes. However, little is known about the relative merits of evolutionarily close and distant sequence comparisons, due to the lack of a universal metric for sequence conservation, and also the paucity of empirically defined benchmark sets of cis-regulatory elements. To address this problem, we developed a general-purpose algorithm (Gumby) that detects slowly-evolving regions in primate, mammalian and more distant comparisons without requiring adjustment of parameters, and ranks conserved elements by P-value using Karlin-Altschul statistics. We benchmarked Gumby predictions against previously identified cis-regulatory elements at diverse genomic loci, and also tested numerous extremely conserved human-rodent sequences for transcriptional enhancer activity using reporter-gene assays in transgenic mice. Human regulatory elements were identified with acceptable sensitivity and specificity by comparison with 1-5 other eutherian mammals or 6 other simian primates. More distant comparisons (marsupial, avian, amphibian and fish) failed to identify many of the empirically defined functional noncoding elements. We derived an intuitive relationship between ancient and recent noncoding sequence conservation from whole genome comparative analysis, which explains some of these findings. Lastly, we determined that, in addition to strength of conservation, genomic location and/or density of surrounding conserved elements must also be considered in selecting candidate enhancers for testing at embryonic time points.
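
    Karlin-Altschul statistics turn a local alignment score S into an expected number of chance hits, E = K·m·n·e^(−λS), and a P-value, P = 1 − e^(−E). How Gumby defines its scores and estimates λ and K is not described in the abstract, so the parameter values in this sketch are hypothetical.

```python
import math

def karlin_altschul(score, m, n, lam, K):
    """E-value and P-value of a local alignment score under Karlin-Altschul statistics.

    m, n: lengths of the compared sequences; lam (lambda) and K: parameters of the
    scoring system, which must be estimated for the scores actually used.
    """
    e_value = K * m * n * math.exp(-lam * score)
    p_value = -math.expm1(-e_value)  # 1 - exp(-E), accurate when E is small
    return e_value, p_value

# Hypothetical parameters: rank two conserved elements by P-value.
scores = {"element_1": 40, "element_2": 25}
ranked = sorted(scores, key=lambda e: karlin_altschul(scores[e], 10_000, 10_000, 0.3, 0.1)[1])
print(ranked)  # higher score, smaller P-value, ranked first
```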

  13. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

    OpenAIRE

    Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G

    2013-01-01

    BACKGROUND: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poi...
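
    The basic coverage-based workflow can be sketched with a plain Poisson model: take a genome-wide expected coverage per window and flag windows whose observed coverage is improbably low (candidate deletion) or high (candidate amplification). This is a simple baseline, not the hierarchical model developed in the paper; real coverage is often overdispersed relative to a Poisson, and the median and threshold used here are assumptions.

```python
import math
from statistics import median

def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu), summing pmf terms computed in log space."""
    return sum(math.exp(-mu + i * math.log(mu) - math.lgamma(i + 1)) for i in range(k + 1))

def flag_cnv_windows(coverage, alpha=1e-3):
    """Flag windows with improbably low or high read counts.

    coverage: integer read counts per fixed-size window
    """
    mu = median(coverage)  # robust expectation, less influenced by the CNV windows themselves
    if mu <= 0:
        raise ValueError("median coverage must be positive")
    flags = []
    for i, c in enumerate(coverage):
        p_low = poisson_cdf(c, mu)                               # P(X <= c)
        p_high = 1.0 - poisson_cdf(c - 1, mu) if c > 0 else 1.0  # P(X >= c)
        if p_low < alpha:
            flags.append((i, "deletion"))
        elif p_high < alpha:
            flags.append((i, "amplification"))
    return flags

print(flag_cnv_windows([30] * 20 + [0] + [30] * 5 + [90]))
```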

  14. Improvisation Planning and Jam Session Design using concepts of Sequence Variation and Flow Experience

    OpenAIRE

    Dubnov, Shlomo; Assayag, Gérard

    2005-01-01

    We describe a model for improvisation design based on the Factor Oracle automaton, which is extended to perform learning and analysis of incoming sequences in terms of sequence variation parameters, namely replication, recombination and innovation. These parameters describe the improvisation plan and allow the design of new improvisations or the analysis and modification of plans of existing improvisations. We further introduce an idea of flow exper...
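
    The Factor Oracle is built online, one symbol at a time, and its suffix links point back to earlier states that share the current context. The construction below follows the standard algorithm of Allauzen, Crochemore and Raffinot. The `improvise` walk is a hypothetical illustration of how copying forward and jumping along suffix links can produce variations; the probability parameter is an assumption and is not the paper's improvisation model.

```python
import random

def build_factor_oracle(seq):
    """Online Factor Oracle construction.

    Returns (transitions, suffix); state i is reached after reading seq[:i].
    """
    transitions = [dict() for _ in range(len(seq) + 1)]
    suffix = [-1] * (len(seq) + 1)
    for i, symbol in enumerate(seq, start=1):
        transitions[i - 1][symbol] = i      # forward transition along the input
        k = suffix[i - 1]
        while k > -1 and symbol not in transitions[k]:
            transitions[k][symbol] = i      # extra transition from an earlier state
            k = suffix[k]
        suffix[i] = 0 if k == -1 else transitions[k][symbol]
    return transitions, suffix

def accepts(transitions, word):
    """True if the oracle has a path spelling word (every factor of seq is accepted)."""
    state = 0
    for symbol in word:
        if symbol not in transitions[state]:
            return False
        state = transitions[state][symbol]
    return True

def improvise(seq, suffix, length, p_continue=0.8, seed=0):
    """Hypothetical walk: copy the input forward, sometimes jumping back along a suffix link."""
    rng = random.Random(seed)
    state, out = 0, []
    while len(out) < length:
        if state == len(seq) or rng.random() > p_continue:
            state = max(suffix[state], 0)   # jump to a state sharing the current context
        out.append(seq[state])              # emit the symbol that follows that state
        state += 1
    return "".join(out)

transitions, suffix = build_factor_oracle("abab")
print(suffix)  # [-1, 0, 0, 1, 2]
```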

  15. Identifying recombinants in human and primate immunodeficiency virus sequence alignments using quartet scanning

    Directory of Open Access Journals (Sweden)

    Martin Darren P

    2009-04-01

    Background: Recombination has a profound impact on the evolution of viruses, but characterizing recombination patterns in molecular sequences remains a challenging endeavor. Despite its importance in molecular evolutionary studies, identifying the sequences that exhibit such patterns has received comparatively little attention in the recombination detection framework. Here, we extend a quartet-mapping-based recombination detection method to enable identification of recombinant sequences without prior specification of either query or reference sequences. Through simulations we evaluate different recombinant identification statistics and significance tests. We compare the quartet approach with triplet-based methods that employ additional heuristic tests to identify parental and recombinant sequences. Results: Analysis of phylogenetic simulations reveals that identifying the descendants of relatively old recombination events is a challenging task for all available methods, and that quartet scanning performs relatively well compared to the triplet-based methods. The use of quartet scanning is further demonstrated by analyzing both well-established and putative HIV-1 recombinant strains. In agreement with recent findings, we provide evidence that the presumed circulating recombinant CRF02_AG is a 'pure' lineage, whereas the presumed parental lineage subtype G has a recombinant origin. We also demonstrate HIV-1 intrasubtype recombination, confirm the hybrid origin of SIV in chimpanzees and further disentangle the recombinant history of SIV lineages in a primate immunodeficiency virus data set. Conclusion: Quartet scanning makes a valuable addition to triplet-based methods for identifying recombinant sequences without prior specification of either query or reference sequences. The new method is available in the VisRD v.3.0 package http://www.cmp.uea.ac.uk/~vlm/visrd.
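
    The core idea of quartet scanning, that a change in which pairs of sequences group together along the alignment signals recombination, can be sketched with sliding windows and a distance-based four-point criterion. The published method's quartet support measure may differ; the window size, step and toy sequences here are hypothetical.

```python
def p_distance(x, y):
    """Proportion of differing sites between two aligned, equal-length sequences."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def quartet_topology(a, b, c, d):
    """Pick the split with the smallest sum of within-pair distances (four-point condition)."""
    sums = {
        "ab|cd": p_distance(a, b) + p_distance(c, d),
        "ac|bd": p_distance(a, c) + p_distance(b, d),
        "ad|bc": p_distance(a, d) + p_distance(b, c),
    }
    return min(sums, key=sums.get)

def quartet_scan(a, b, c, d, window, step):
    """Slide a window along the alignment and report the supported split per window."""
    results = []
    for start in range(0, len(a) - window + 1, step):
        part = [s[start:start + window] for s in (a, b, c, d)]
        results.append((start, quartet_topology(*part)))
    return results

# Toy alignment whose grouping switches halfway: a recombination-like signal.
a, b = "A" * 10 + "G" * 10, "A" * 10 + "C" * 10
c, d = "T" * 10 + "G" * 10, "T" * 10 + "C" * 10
print(quartet_scan(a, b, c, d, window=10, step=10))  # [(0, 'ab|cd'), (10, 'ac|bd')]
```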

  16. Somatic mutation profiles of MSI and MSS colorectal cancer identified by whole exome next generation sequencing and bioinformatics analysis.

    Directory of Open Access Journals (Sweden)

    Bernd Timmermann

    BACKGROUND: Colorectal cancer (CRC), with approximately 1 million cases, is the third most common cancer worldwide. Extensive research is ongoing to decipher the underlying genetic patterns in the hope of improving early cancer diagnosis and treatment. In this direction, recent progress in next generation sequencing technologies has revolutionized the field of cancer genomics. However, one caveat of these studies remains the large number of genetic variations identified and their interpretation. METHODOLOGY/PRINCIPAL FINDINGS: Here we present the first work on whole exome NGS of primary colon cancers. We performed 454 whole exome pyrosequencing of tumor as well as adjacent unaffected normal colonic tissue from microsatellite stable (MSS) and microsatellite instable (MSI) colon cancer patients and identified more than 50,000 small nucleotide variations for each tissue. According to predictions based on MSS and MSI pathomechanisms, we identified eight times more somatic non-synonymous variations in MSI cancers than in MSS cancers, and we were able to reproduce the result in four additional CRCs. Our bioinformatics filtering approach narrowed down the number of most significant mutations to 359 for MSI and 45 for MSS CRCs with predicted altered protein functions. In both MSI and MSS CRCs, we found somatic mutations in the intracellular kinase domain of bone morphogenetic protein receptor 1A (BMPR1A), a gene in which germline mutations have so far been associated with juvenile polyposis syndrome, and show that the mutations functionally impair protein function. CONCLUSIONS/SIGNIFICANCE: We conclude that with deep sequencing of tumor exomes one may be able to predict the microsatellite status of CRC and in addition identify potentially clinically relevant mutations.
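
    Sequencing tumor alongside adjacent normal tissue lets germline variants be subtracted, leaving candidate somatic changes. The set-difference sketch below only shows that logic: production somatic callers model allele fractions and sequencing error rather than comparing call sets directly, and the variant identifiers and consequence labels are hypothetical.

```python
# Hypothetical consequence labels; real pipelines take these from a variant annotator.
NONSYNONYMOUS = {"missense", "nonsense", "frameshift", "splice_site"}

def somatic_nonsynonymous(tumor_calls, normal_calls):
    """Variants called in the tumor but not in the matched normal, restricted to
    protein-altering consequences. Calls map variant id -> consequence label."""
    return {vid for vid, effect in tumor_calls.items()
            if vid not in normal_calls and effect in NONSYNONYMOUS}

tumor = {"chr1:100A>T": "missense", "chr2:200G>C": "synonymous", "chr3:300C>T": "nonsense"}
normal = {"chr1:100A>T": "missense"}
print(sorted(somatic_nonsynonymous(tumor, normal)))  # ['chr3:300C>T']
```

    Comparing the size of this set between MSI and MSS tumors gives the kind of ratio the abstract reports.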

  17. Descartes' Rule of Signs and the Identifiability of Population Demographic Models from Genomic Variation Data.

    Science.gov (United States)

    Bhaskar, Anand; Song, Yun S

    2014-01-01

    The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the "folded" SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes' rule of signs for polynomials to the Laplace transform of piecewise continuous functions.
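
    The unfolded SFS counts segregating sites by how many of the n sampled sequences carry the derived allele. The folded SFS merges classes i and n − i when the ancestral state is unknown. For orientation, the standard neutral expectation under a constant population size is E[ξ_i] = θ/i. The sketch computes these quantities only; it does not implement the paper's identifiability results, and the toy counts are hypothetical.

```python
def site_frequency_spectrum(derived_counts, n):
    """xi[i-1] = number of sites where i of n sequences carry the derived allele (i = 1..n-1)."""
    sfs = [0] * (n + 1)
    for c in derived_counts:
        if 0 < c < n:  # monomorphic sites are not segregating
            sfs[c] += 1
    return sfs[1:n]

def fold(sfs):
    """Folded SFS: merge classes i and n - i (minor-allele counts)."""
    n = len(sfs) + 1
    folded = []
    for i in range(1, n // 2 + 1):
        if i == n - i:
            folded.append(sfs[i - 1])
        else:
            folded.append(sfs[i - 1] + sfs[n - i - 1])
    return folded

def expected_sfs_constant(theta, n):
    """Neutral expectation E[xi_i] = theta / i for a constant-size population."""
    return [theta / i for i in range(1, n)]

sfs = site_frequency_spectrum([1, 1, 2, 3, 4, 0, 5], n=5)
print(sfs, fold(sfs))  # [2, 1, 1, 1] [3, 2]
```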

  18. Descartes’ Rule of Signs and the Identifiability of Population Demographic Models from Genomic Variation Data

    Science.gov (United States)

    Bhaskar, Anand; Song, Yun S.

    2016-01-01

    The sample frequency spectrum (SFS) is a widely-used summary statistic of genomic variation in a sample of homologous DNA sequences. It provides a highly efficient dimensional reduction of large-scale population genomic data and its mathematical dependence on the underlying population demography is well understood, thus enabling the development of efficient inference algorithms. However, it has been recently shown that very different population demographies can actually generate the same SFS for arbitrarily large sample sizes. Although in principle this nonidentifiability issue poses a thorny challenge to statistical inference, the population size functions involved in the counterexamples are arguably not so biologically realistic. Here, we revisit this problem and examine the identifiability of demographic models under the restriction that the population sizes are piecewise-defined where each piece belongs to some family of biologically-motivated functions. Under this assumption, we prove that the expected SFS of a sample uniquely determines the underlying demographic model, provided that the sample is sufficiently large. We obtain a general bound on the sample size sufficient for identifiability; the bound depends on the number of pieces in the demographic model and also on the type of population size function in each piece. In the cases of piecewise-constant, piecewise-exponential and piecewise-generalized-exponential models, which are often assumed in population genomic inferences, we provide explicit formulas for the bounds as simple functions of the number of pieces. Lastly, we obtain analogous results for the “folded” SFS, which is often used when there is ambiguity as to which allelic type is ancestral. Our results are proved using a generalization of Descartes’ rule of signs for polynomials to the Laplace transform of piecewise continuous functions. PMID:28018011

  19. An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data

    Directory of Open Access Journals (Sweden)

    Kyriakos Tsangaras

    2015-11-01

    Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed, including a recombinant ERV that contained one LTR belonging to each group, indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggest UrsusERV is a recent cross species transmission from an unknown reservoir and place the viral group among the youngest of ERVs identified in mammals.
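
    Provirus ages from LTR divergence rest on the fact that both LTRs of a provirus are identical at integration and then accumulate mutations independently, so age ≈ K / (2r), where K is the divergence between the 5' and 3' LTRs and r is the neutral substitution rate per site per year. A minimal sketch: the rate is a hypothetical input, the Jukes-Cantor correction is an optional assumption, and the toy LTR sequences are made up.

```python
import math

def ltr_divergence(ltr5, ltr3, jukes_cantor=True):
    """Divergence between aligned 5' and 3' LTRs, optionally Jukes-Cantor corrected."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to equal length")
    sites = [(a, b) for a, b in zip(ltr5.upper(), ltr3.upper()) if a in "ACGT" and b in "ACGT"]
    p = sum(a != b for a, b in sites) / len(sites)
    if not jukes_cantor:
        return p
    return -0.75 * math.log(1 - 4 * p / 3)  # defined only for p < 0.75

def provirus_age(ltr5, ltr3, rate_per_site_per_year, jukes_cantor=True):
    """Age in years: the two LTRs diverge independently, hence the factor of 2."""
    return ltr_divergence(ltr5, ltr3, jukes_cantor) / (2 * rate_per_site_per_year)

# Toy LTRs differing at 1 of 100 sites with a hypothetical rate of 1e-9: about 5 million years.
print(provirus_age("A" * 100, "A" * 99 + "T", 1e-9, jukes_cantor=False))
```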

  20. An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data.

    Science.gov (United States)

    Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E; Greenwood, Alex D

    2015-11-24

    Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed, including a recombinant ERV that contained one LTR from each group, indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear-specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggest that UrsusERV is a recent cross-species transmission from an unknown reservoir and place the viral group among the youngest ERVs identified in mammals.

  1. An Evolutionarily Young Polar Bear (Ursus maritimus) Endogenous Retrovirus Identified from Next Generation Sequence Data

    Science.gov (United States)

    Tsangaras, Kyriakos; Mayer, Jens; Alquezar-Planas, David E.; Greenwood, Alex D.

    2015-01-01

    Transcriptome analysis of polar bear (Ursus maritimus) tissues identified sequences with similarity to Porcine Endogenous Retroviruses (PERV). Based on these sequences, four proviral copies and 15 solo long terminal repeats (LTRs) of a newly described endogenous retrovirus were characterized from the polar bear draft genome sequence. Closely related sequences were identified by PCR analysis of brown bear (Ursus arctos) and black bear (Ursus americanus) but were absent in non-Ursinae bear species. The virus was therefore designated UrsusERV. Two distinct groups of LTRs were observed, including a recombinant ERV that contained one LTR from each group, indicating that genomic invasions by at least two UrsusERV variants have recently occurred. Age estimates based on proviral LTR divergence and conservation of integration sites among ursids suggest the viral group is only a few million years old. The youngest provirus was polar bear-specific, had intact open reading frames (ORFs) and could potentially encode functional proteins. Phylogenetic analyses of UrsusERV consensus protein sequences suggest that it is part of a pig, gibbon and koala retrovirus clade. The young age estimates and lineage specificity of the virus suggest that UrsusERV is a recent cross-species transmission from an unknown reservoir and place the viral group among the youngest ERVs identified in mammals. PMID:26610552
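
    Age estimates from proviral LTR divergence usually rest on the fact that the two LTRs of a provirus are identical at integration and then mutate independently, so their divergence d grows at roughly twice the neutral substitution rate μ, giving T ≈ d / (2μ). A minimal sketch under that assumption follows; the uncorrected p-distance (no multiple-hit correction), the function names and the example rate are illustrative and not taken from the study.

```python
def p_distance(ltr5, ltr3):
    """Fraction of differing sites between two aligned LTRs, ignoring gap columns."""
    if len(ltr5) != len(ltr3):
        raise ValueError("LTRs must be aligned to the same length")
    compared = differences = 0
    for a, b in zip(ltr5.upper(), ltr3.upper()):
        if a == "-" or b == "-":
            continue  # skip alignment gaps
        compared += 1
        differences += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared

def ltr_age_years(ltr5, ltr3, subs_per_site_per_year):
    """Integration age T = d / (2 * mu): both LTR copies accumulate
    substitutions independently after integration."""
    return p_distance(ltr5, ltr3) / (2 * subs_per_site_per_year)

# Toy 100-bp alignment with 2 differences and a hypothetical rate of 1e-9.
ltr5 = "A" * 100
ltr3 = "CC" + "A" * 98
print(ltr_age_years(ltr5, ltr3, 1e-9))  # roughly 1e7 years
```

    The choice of substitution rate dominates such estimates, which is one reason integration-site conservation among species is used as an independent line of evidence.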

  2. Natural history bycatch: a pipeline for identifying metagenomic sequences in RADseq data

    Directory of Open Access Journals (Sweden)

    Iris Holmes

    2018-04-01

    Full Text Available Background: Reduced-representation genomic datasets are increasingly becoming available from a variety of organisms. These datasets do not target specific genes, and so may contain sequences from parasites and other organisms present in the target tissue sample. In this paper, we demonstrate that (1) RADseq datasets can be used for exploratory analysis of tissue-specific metagenomes, and (2) tissue collections house complete metagenomic communities, which can be investigated and quantified by a variety of techniques. Methods: We present an exploratory method for mining metagenomic “bycatch” sequences from a range of host tissue types. We use a combination of the pyRAD assembly pipeline, NCBI’s blastn software, and custom R scripts to isolate metagenomic sequences from RADseq-type datasets. Results: When we focus on sequences that align with existing references in NCBI’s GenBank, we find that between three and five percent of identifiable double-digest restriction site associated DNA (ddRAD) sequences from host tissue samples are from phyla known to contain blood parasites. In addition to tissue samples, we examine ddRAD sequences from metagenomic DNA extracted from snake and lizard hindgut samples. We find that the sequences recovered from these samples match the expected bacterial and eukaryotic gut microbiome phyla. Discussion: Our results suggest that (1) museum tissue banks originally collected for host DNA archiving are also preserving valuable parasite and microbiome communities, (2) publicly available RADseq datasets may include metagenomic sequences that could be explored, and (3) restriction site approaches are a useful exploratory technique to identify microbiome lineages that could be missed by primer-based approaches.
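
    The blastn step of such a pipeline can be illustrated by reducing tabular BLAST output to one best hit per locus and tallying hits by phylum. The authors used R scripts; this Python sketch is only an approximation of that step, with a hypothetical e-value cutoff and a hypothetical accession-to-phylum lookup standing in for NCBI taxonomy. It assumes the default 12 columns of `-outfmt 6`.

```python
import csv
import io
from collections import Counter

def best_hits(blast_tab, max_evalue=1e-10):
    """Highest-bitscore subject per query from blastn tabular (-outfmt 6) text.
    Default columns: qseqid sseqid pident length mismatch gapopen
    qstart qend sstart send evalue bitscore."""
    best = {}
    for row in csv.reader(io.StringIO(blast_tab), delimiter="\t"):
        if not row:
            continue
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if evalue > max_evalue:
            continue  # hypothetical significance cutoff
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}

def phylum_fractions(hits, subject_to_phylum):
    """Share of annotated loci per phylum. subject_to_phylum is a hypothetical
    accession -> phylum lookup (e.g. built from NCBI taxonomy)."""
    tally = Counter(subject_to_phylum.get(s, "unassigned") for s in hits.values())
    total = sum(tally.values())
    return {phylum: n / total for phylum, n in tally.items()}
```

    Keeping only the best hit per locus avoids counting one RAD locus several times when it matches many closely related GenBank entries.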

  3. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Directory of Open Access Journals (Sweden)

    Sathishkumar Natarajan

    Full Text Available Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceae family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them, 112 SNPs and 12 InDels were observed in powdery mildew-responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.

  4. Whole Genome Re-Sequencing and Characterization of Powdery Mildew Disease-Associated Allelic Variation in Melon.

    Science.gov (United States)

    Natarajan, Sathishkumar; Kim, Hoy-Taek; Thamilarasan, Senthil Kumar; Veerappan, Karpagam; Park, Jong-In; Nou, Ill-Sup

    2016-01-01

    Powdery mildew is one of the most common fungal diseases in the world. This disease frequently affects melon (Cucumis melo L.) and other Cucurbitaceae family crops in both open field and greenhouse cultivation. One of the goals of genomics is to identify the polymorphic loci responsible for variation in phenotypic traits. In this study, powdery mildew disease assessment scores were calculated for four melon accessions, 'SCNU1154', 'Edisto47', 'MR-1', and 'PMR5'. To investigate the genetic variation of these accessions, whole genome re-sequencing using the Illumina HiSeq 2000 platform was performed. A total of 754,759,704 quality-filtered reads were generated, with an average of 82.64% coverage relative to the reference genome. Comparisons of the sequences for the melon accessions revealed around 7.4 million single nucleotide polymorphisms (SNPs), 1.9 million InDels, and 182,398 putative structural variations (SVs). Functional enrichment analysis of detected variations classified them into biological process, cellular component and molecular function categories. Further, a disease-associated QTL map was constructed for 390 SNPs and 45 InDels identified as related to defense-response genes. Among them, 112 SNPs and 12 InDels were observed in powdery mildew-responsive chromosomes. Accordingly, this whole genome re-sequencing study identified SNPs and InDels associated with defense genes that will serve as candidate polymorphisms in the search for sources of resistance against powdery mildew disease and could accelerate marker-assisted breeding in melon.
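
    Totals such as the SNP and InDel counts above typically come from classifying each called variant by its reference and alternate alleles. A minimal sketch, assuming biallelic VCF-style REF/ALT strings; the function names are illustrative, structural variants are normally reported by dedicated callers and are not handled here, and this is not the authors' pipeline.

```python
from collections import Counter

def classify_variant(ref, alt):
    """Classify a biallelic REF/ALT pair given as VCF-style allele strings."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    if len(ref) != len(alt):
        return "InDel"
    return "MNP"  # equal-length multi-base substitution

def tally_variants(variants):
    """Count variant classes over (ref, alt) pairs."""
    return Counter(classify_variant(ref, alt) for ref, alt in variants)

toy = [("A", "G"), ("A", "AT"), ("ACG", "A"), ("AC", "GT"), ("C", "T")]
print(tally_variants(toy))
```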

  5. An Internet-Accessible DNA Sequence Database for Identifying Fusaria from Human and Animal Infections

    Science.gov (United States)

    Because less than one-third of clinically relevant fusaria can be accurately identified to species level using phenotypic data (i.e., morphological species recognition), we constructed a three-locus DNA sequence database to facilitate molecular identification of the 69 Fusarium species associated wi...

  6. Exome Sequencing Fails to Identify the Genetic Cause of Aicardi Syndrome.

    Science.gov (United States)

    Lund, Caroline; Striano, Pasquale; Sorte, Hanne Sørmo; Parisi, Pasquale; Iacomino, Michele; Sheng, Ying; Vigeland, Magnus D; Øye, Anne-Marte; Møller, Rikke Steensbjerre; Selmer, Kaja K; Zara, Federico

    2016-09-01

    Aicardi syndrome (AS) is a well-characterized neurodevelopmental disorder with an unknown etiology. In this study, we performed whole-exome sequencing in 11 female patients diagnosed with AS in order to identify the disease-causing gene. In particular, we focused on detecting variants on the X chromosome, including variants supported by a low number of sequencing reads, in case of somatic mosaicism. For 2 of the patients, we also sequenced the exomes of the parents to search for de novo mutations. We did not identify any genetic variants likely to be damaging. Only a single missense variant was identified by the de novo analyses of the 2 trios, and this was considered benign. The failure to identify a disease gene in this study may be due to technical limitations of our study design, including the possibility that the genetic aberration leading to AS is situated in a non-exonic region or that the mutation is somatic and not detectable by our approach. Alternatively, AS may be genetically heterogeneous, and 11 patients may not be sufficient to reveal the causative genes. Future studies of AS should consider designs that also explore non-exonic regions and use sufficient sequencing depth to detect low-grade somatic mosaicism.
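
    The two strategies described above, trio-based de novo detection and retention of low-read-count variants that could reflect somatic mosaicism, can be sketched as simple genotype and read-count filters. This minimal illustration assumes VCF-style genotype strings; the thresholds and function names are hypothetical and do not reproduce the study's filtering.

```python
def _carries_alt(genotype):
    """True if a VCF-style genotype ('0/1', '1|1', ...) has a non-reference
    allele; None if any allele is missing ('.')."""
    alleles = genotype.replace("|", "/").split("/")
    if "." in alleles:
        return None
    return any(allele != "0" for allele in alleles)

def is_candidate_de_novo(child, mother, father):
    """Child carries an alternate allele that neither parent carries."""
    calls = [_carries_alt(gt) for gt in (child, mother, father)]
    if None in calls:
        return False  # cannot assess with missing genotypes
    child_alt, mother_alt, father_alt = calls
    return child_alt and not mother_alt and not father_alt

def possible_mosaic(alt_reads, total_reads, min_alt_reads=3, max_vaf=0.3):
    """Flag a variant seen in few reads at a low variant allele fraction (VAF);
    both thresholds are illustrative, not taken from the study."""
    if total_reads == 0:
        return False
    vaf = alt_reads / total_reads
    return alt_reads >= min_alt_reads and vaf <= max_vaf
```

    A germline heterozygous variant is expected near VAF 0.5, so a ceiling well below that separates mosaic candidates; detecting low-grade mosaicism then depends on depth, since a 5% VAF variant yields only a handful of supporting reads at typical exome coverage.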

  7. Exome Sequencing Fails to Identify the Genetic Cause of Aicardi Syndrome

    DEFF Research Database (Denmark)

    Lund, Caroline; Striano, Pasquale; Sorte, Hanne Sørmo

    2016-01-01

    Aicardi syndrome (AS) is a well-characterized neurodevelopmental disorder with an unknown etiology. In this study, we performed whole-exome sequencing in 11 female patients with the diagnosis of AS, in order to identify the disease-causing gene. In particular, we focused on detecting variants in ...

  8. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    OpenAIRE

    Hu, H.; Haas, S.A.; Chelly, J.; Van Esch, H.; Raynaud, M.; de Brouwer, A.P.M.; Weinert, S.; Froyen, G.; Frints, S.G.M.; Laumonnier, F.; Zemojtel, T.; Love, M.I.; Richard, H.; Emde, A.K.; Bienek, M.

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades, in excess of 100 X-chromosome ID genes have been identified. Yet a large number of families mapping to the X chromosome remained unresolved, suggesting that more XLID genes or loci are yet to be identified. Here, we have investigated 405 unresolved families with XLID. We employed massively parallel sequencing of all X-chromosome exons in the index males. The majority of ...

  9. A Novel Prosthetic Joint Infection Pathogen, Mycoplasma salivarium, Identified by Metagenomic Shotgun Sequencing.

    Science.gov (United States)

    Thoendel, Matthew; Jeraldo, Patricio; Greenwood-Quaintance, Kerryl E; Chia, Nicholas; Abdel, Matthew P; Steckelberg, James M; Osmon, Douglas R; Patel, Robin

    2017-07-15

    Defining the microbial etiology of culture-negative prosthetic joint infection (PJI) can be challenging. Metagenomic shotgun sequencing is a new tool to identify organisms undetected by conventional methods. We present a case where metagenomics was used to identify Mycoplasma salivarium as a novel PJI pathogen in a patient with hypogammaglobulinemia. © The Author 2017. Published by Oxford University Press for the Infectious Diseases Society of America.

  10. Complete plastid genome sequence of Primula sinensis (Primulaceae): structure comparison, sequence variation and evidence for accD transfer to nucleus

    Directory of Open Access Journals (Sweden)

    Tong-Jian Liu

    2016-06-01

    Full Text Available The species-rich genus Primula L. is a typical plant group for understanding genetic variation between species at different levels of relationship. Chloroplast genome sequences are commonly used as an information resource for quantifying these differences and reconstructing evolutionary history. In this study, we reported the complete chloroplast genome sequence of Primula sinensis and compared it with those of related species. The chloroplast genome showed a typical circular quadripartite structure, 150,859 bp in length, with a GC content of 37.2%. Two inverted repeat (IR) regions (25,535 bp each) were separated by a large single-copy region (82,064 bp) and a small single-copy region (17,725 bp). The genome contains 112 genes, including 78 protein-coding genes, 30 tRNA genes and four rRNA genes. Among them, seven protein-coding genes, seven tRNA genes and four rRNA genes have two copies because they are located in the IR regions. The accD and infA genes, which lack intact open reading frames (ORFs), were identified as pseudogenes. Simple sequence repeat (SSR) and sequence variation analyses were also performed on the plastome of P. sinensis in comparison with the other available plastome, that of P. poissonii. The four most variable regions, rpl36–rps8, rps16–trnQ, trnH–psbA and ndhC–trnV, were identified. Phylogenetic relationships estimated from three sub-datasets extracted from a matrix of 57 protein-coding gene sequences gave identical results, consistent with previous studies. A transcript from the P. sinensis transcriptome showed high similarity to the functional region of plastid accD and was found to carry a putative plastid transit peptide at its N-terminal region. This result strongly suggests that plastid accD has been functionally transferred to the nucleus in P. sinensis.
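
    The reported region sizes are internally consistent with the quadripartite layout, whose total length is LSC + SSC + 2 × IR, and the gene categories add up to the stated total. A quick check in Python (the helper names are illustrative), together with the usual definition of GC content:

```python
def quadripartite_length(lsc, ssc, ir):
    """Total plastome length: LSC + SSC + two copies of the inverted repeat."""
    return lsc + ssc + 2 * ir

def gc_content(seq):
    """Fraction of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)

# Region lengths reported for P. sinensis.
print(quadripartite_length(lsc=82_064, ssc=17_725, ir=25_535))  # 150859
print(78 + 30 + 4)  # protein-coding + tRNA + rRNA genes: 112
```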

  11. Sequence variation in TgROP7 gene among Toxoplasma gondii ...

    African Journals Online (AJOL)

    Yomi

    2012-03-27

    Toxoplasma gondii can infect a wide range of hosts, including mammals and birds, causing toxoplasmosis, one of the most common parasitic zoonoses worldwide. The present study examined sequence variation in the rhoptry protein 7 (ROP7) gene among different T. gondii isolates from different hosts and ...

  12. Effective Normalization for Copy Number Variation Detection from Whole Genome Sequencing

    NARCIS (Netherlands)

    Janevski, A.; Varadan, V.; Kamalakaran, S.; Banerjee, N.; Dimitrova, D.

    2012-01-01

    Background: Whole genome sequencing enables a high-resolution view of the human genome and provides unique insights into genome structure at an unprecedented scale. A number of tools have been developed to infer copy number variation in the genome. These tools, while validated, also include a number of ...
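
    The abstract is truncated before the normalization method is described. For orientation only, a generic read-depth approach bins aligned reads, normalizes each library for sequencing depth, and reports a log2 ratio against a reference. The sketch below uses median scaling, which is an assumption and not necessarily the normalization the authors propose; real pipelines also correct for GC content and mappability.

```python
import math
from statistics import median

def log2_copy_ratios(sample_counts, reference_counts, pseudocount=0.5):
    """Per-bin log2 ratio of sample to reference read depth, after scaling
    each library by its median bin count (a simple depth normalization)."""
    if len(sample_counts) != len(reference_counts):
        raise ValueError("bin vectors must have equal length")
    s_med, r_med = median(sample_counts), median(reference_counts)
    if s_med == 0 or r_med == 0:
        raise ValueError("median bin count is zero; bins are too small")
    return [
        # pseudocount avoids log2(0) for empty bins
        math.log2((s + pseudocount) / s_med) - math.log2((r + pseudocount) / r_med)
        for s, r in zip(sample_counts, reference_counts)
    ]

# Toy bins: the third bin has twice the expected depth in the sample.
ratios = log2_copy_ratios([100, 100, 200, 100], [50, 50, 50, 50])
print([round(x, 2) for x in ratios])  # third value close to 1.0 (two-fold gain)
```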

  13. Sequence Variation of MHC Class II DQB Gene in Bottlenose Dolphin (Tursiops truncatus) from Taiwanese Waters

    Directory of Open Access Journals (Sweden)

    Wei-Cheng Yang

    2008-03-01

    Full Text Available The major histocompatibility complex (MHC) is a large multigene family coding for glycoproteins that play a key role in the initiation of immune responses in vertebrates. For a better understanding of the immunological diversity in thriving marine mammal species, the sequence variation of the exon 2 region of the MHC DQB locus was analyzed in 42 bottlenose dolphins (Tursiops truncatus) collected from strandings and fishery bycatch in Taiwanese waters. The amplified 172-bp sequences showed no more than two alleles per individual. The high proportion of non-synonymous nucleotide substitutions and the moderate amount of variation suggest positive selection pressure on this locus, arguing against reduced selection pressure in the marine environment. The phylogenetic relationships among DQB exon 2 sequences of T. truncatus and other cetaceans did not coincide with taxonomic relationships, indicating a trans-species evolutionary pattern.
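
    The positive-selection argument rests on comparing non-synonymous with synonymous differences. A rigorous test normalizes these counts by the number of synonymous and non-synonymous sites (for example with the Nei–Gojobori method) to obtain dN/dS; the minimal sketch below only classifies codons that differ at a single position, using the standard genetic code, to show the underlying bookkeeping. It is an illustration, not the analysis performed in the study, and the function name is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... GGG.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def count_syn_nonsyn(seq_a, seq_b):
    """Count synonymous and non-synonymous codon differences between two
    in-frame, aligned coding sequences. Only codons differing at exactly one
    position are classified; gapped or ambiguous codons are skipped."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    syn = nonsyn = 0
    for start in range(0, len(seq_a) - 2, 3):
        codon_a = seq_a[start:start + 3].upper()
        codon_b = seq_b[start:start + 3].upper()
        if codon_a not in CODON_TABLE or codon_b not in CODON_TABLE:
            continue
        if sum(x != y for x, y in zip(codon_a, codon_b)) != 1:
            continue  # identical codons, or multiple hits needing pathway averaging
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1
        else:
            nonsyn += 1
    return syn, nonsyn

print(count_syn_nonsyn("TTTCTTAAA", "TTCCTTAGA"))  # (1, 1): F->F, K->R
```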

  14. Incorporation of unique molecular identifiers in TruSeq adapters improves the accuracy of quantitative sequencing.

    Science.gov (United States)

    Hong, Jungeui; Gresham, David

    2017-11-01

    Quantitative analysis of next-generation sequencing (NGS) data requires discriminating duplicate reads generated by PCR from identical molecules that are of unique origin. Typically, PCR duplicates are identified as sequence reads that align to the same genomic coordinates using reference-based alignment. However, identical molecules can be independently generated during library preparation. Misidentification of these molecules as PCR duplicates can introduce unforeseen biases during analyses. Here, we developed a cost-effective sequencing adapter design by modifying Illumina TruSeq adapters to incorporate a unique molecular identifier (UMI) while maintaining the capacity to undertake multiplexed, single-index sequencing. Incorporation of UMIs into TruSeq adapters (TrUMIseq adapters) enables identification of bona fide PCR duplicates as identically mapped reads with identical UMIs. Using TrUMIseq adapters, we show that accurate removal of PCR duplicates results in improved accuracy of both allele frequency (AF) estimation in heterogeneous populations using DNA sequencing and gene expression quantification using RNA-Seq.
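
    The duplicate rule described above can be sketched directly: coordinate-only marking collapses every identically mapped read, while UMI-aware marking collapses only reads that also share a UMI. The `Read` record and `find_pcr_duplicates` function are hypothetical simplifications; production tools also consider mate positions, soft clipping, and sequencing errors within UMIs.

```python
from collections import namedtuple

# Hypothetical minimal read record: mapping coordinates plus the UMI from the adapter.
Read = namedtuple("Read", ["name", "chrom", "pos", "strand", "umi"])

def find_pcr_duplicates(reads, use_umi=True):
    """Return names of reads flagged as PCR duplicates (the first read of each
    group is kept). Without UMIs, identical mapping coordinates and strand are
    enough; with UMIs, the UMI must match too."""
    seen = set()
    duplicates = []
    for read in reads:
        key = (read.chrom, read.pos, read.strand)
        if use_umi:
            key += (read.umi,)
        if key in seen:
            duplicates.append(read.name)
        else:
            seen.add(key)
    return duplicates

reads = [
    Read("r1", "chr1", 100, "+", "ACGT"),
    Read("r2", "chr1", 100, "+", "ACGT"),  # same position and UMI: PCR duplicate
    Read("r3", "chr1", 100, "+", "TTGA"),  # same position, different UMI: distinct molecule
    Read("r4", "chr2", 5, "-", "ACGT"),
]
print(find_pcr_duplicates(reads))                 # ['r2']
print(find_pcr_duplicates(reads, use_umi=False))  # ['r2', 'r3']
```

    In the toy example, coordinate-only marking wrongly discards r3, an independently generated molecule; keeping it is what improves allele frequency and expression estimates.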

  15. Unique Trichomonas vaginalis gene sequences identified in multinational regions of Northwest China.

    Science.gov (United States)

    Liu, Jun; Feng, Meng; Wang, Xiaolan; Fu, Yongfeng; Ma, Cailing; Cheng, Xunjia

    2017-07-24

    Trichomonas vaginalis (T. vaginalis) is a flagellated protozoan parasite that infects humans worldwide. This study determined the sequence of the 18S ribosomal RNA gene of T. vaginalis infecting both females and males in Xinjiang, China. Samples from 73 females and 28 males were collected and confirmed to be infected with T. vaginalis; sequencing of the T. vaginalis 18S ribosomal RNA gene yielded a total of 110 sequences. These sequences were used to construct a phylogenetic network. The rooted network comprised three large clades and several independent branches, and most of the Xinjiang sequences fell in one group. Preliminary results suggest that Xinjiang T. vaginalis isolates might be genetically unique, as indicated by the sequence of their 18S ribosomal RNA gene. The low migration rate of the local population in this province may contribute to the genetic conservativeness of T. vaginalis. The unique genetic features of these isolates may point to differences in the clinical presentation of trichomoniasis, including metronidazole susceptibility and T. vaginalis virus or Mycoplasma co-infection characteristics. The transmission and evolution of Xinjiang T. vaginalis are of interest and should be studied further. More attention should be given to T. vaginalis infection in both females and males in Xinjiang.

  16. Identifying genomic changes associated with insecticide resistance in the dengue mosquito Aedes aegypti by deep targeted sequencing

    Science.gov (United States)

    Faucon, Frederic; Dusfour, Isabelle; Gaude, Thierry; Navratil, Vincent; Boyer, Frederic; Chandre, Fabrice; Sirisopa, Patcharawan; Thanispong, Kanutcharee; Juntarajumnong, Waraporn; Poupardin, Rodolphe; Chareonviriyaphap, Theeraphap; Girod, Romain; Corbel, Vincent; Reynaud, Stephane; David, Jean-Philippe

    2015-01-01

    The capacity of mosquitoes to resist insecticides threatens the control of diseases such as dengue and malaria. Until alternative control tools are implemented, characterizing resistance mechanisms is crucial for managing resistance in natural populations. Insecticide biodegradation by detoxification enzymes is a common resistance mechanism; however, the genomic changes underlying this mechanism have rarely been identified, precluding individual resistance genotyping. In particular, the role of copy number variations (CNVs) and polymorphisms of detoxification enzymes has never been investigated at the genome level, although they can represent robust markers of metabolic resistance. In this context, we combined target enrichment with high-throughput sequencing to conduct the first comprehensive screening of gene amplifications and polymorphisms associated with insecticide resistance in mosquitoes. More than 760 candidate genes were captured and deep sequenced in several populations of the dengue mosquito Ae. aegypti displaying distinct genetic backgrounds and contrasting resistance levels to the insecticide deltamethrin. CNV analysis identified 41 gene amplifications associated with resistance, most affecting cytochrome P450s overtranscribed in resistant populations. Polymorphism analysis detected more than 30,000 variants and strong selection footprints in specific genomic regions. Combining Bayesian and allele frequency filtering approaches identified 55 nonsynonymous variants strongly associated with resistance. Both CNVs and polymorphisms were conserved within regions but differed across continents, confirming that genomic changes underlying metabolic resistance to insecticides are not universal. By identifying novel DNA markers of insecticide resistance, this study opens the way for tracking down metabolic changes developed by mosquitoes to resist insecticides within and among populations. PMID:26206155
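
    Of the filtering steps mentioned above, the allele-frequency part is the simplest to illustrate: estimate alternate-allele frequencies from read counts in resistant and susceptible samples and keep variants with large differences. The sketch below assumes pooled read counts and uses hypothetical thresholds and names; it omits the Bayesian component and the nonsynonymous annotation step, so it is not the study's procedure.

```python
def af_difference_filter(variants, min_depth=20, min_delta=0.4):
    """Keep variants whose alternate-allele frequency differs strongly between a
    resistant and a susceptible pool.
    Each variant is (variant_id, alt_reads_res, depth_res, alt_reads_sus, depth_sus)."""
    kept = []
    for variant_id, alt_res, depth_res, alt_sus, depth_sus in variants:
        if depth_res < min_depth or depth_sus < min_depth:
            continue  # too little coverage to estimate frequencies
        delta = alt_res / depth_res - alt_sus / depth_sus
        if abs(delta) >= min_delta:
            kept.append((variant_id, round(delta, 3)))
    return kept

toy = [
    ("v1", 45, 50, 5, 50),   # 0.9 vs 0.1: strong difference
    ("v2", 25, 50, 20, 50),  # 0.5 vs 0.4: weak difference
    ("v3", 10, 10, 0, 40),   # resistant pool under-covered
]
print(af_difference_filter(toy))  # [('v1', 0.8)]
```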

  17. Genetic variation among the Mapuche Indians from the Patagonian region of Argentina: mitochondrial DNA sequence variation and allele frequencies of several nuclear genes.

    Science.gov (United States)

    Ginther, C; Corach, D; Penacino, G A; Rey, J A; Carnese, F R; Hutz, M H; Anderson, A; Just, J; Salzano, F M; King, M C

    1993-01-01

    DNA samples from 60 Mapuche Indians, representing 39 maternal lineages, were genetically characterized for (1) nucleotide sequences of the mtDNA control region; (2) presence or absence of a nine-base duplication in mtDNA region V; (3) HLA loci DRB1 and DQA1; (4) variation at three nuclear genes with short tandem repeats; and (5) variation at the polymorphic marker D2S44. The genetic profile of the Mapuche population was compared with those of other Amerinds and of worldwide populations. Two highly polymorphic portions of the mtDNA control region, comprising 650 nucleotides, were amplified by the polymerase chain reaction (PCR) and directly sequenced. The 39 maternal lineages were defined by two- or three-generation families identified by the Mapuches. These 39 lineages included 19 different mtDNA sequences that could be grouped into four classes. The same classes of sequences appear in other Amerinds from North, Central, and South American populations separated by thousands of miles, suggesting that the origin of the mtDNA patterns predates the migration to the Americas. The mtDNA sequence similarity between Amerind populations suggests that the migration throughout the Americas occurred rapidly relative to the mtDNA mutation rate. HLA DRB1 alleles 1602 and 1402 were frequent among the Mapuches. These alleles also occur at high frequency among other Amerinds in North and South America, but not among Spanish, Chinese or African-American populations. The high frequency of these alleles throughout the Americas, and their specificity to the Americas, supports the hypothesis that Mapuches and other Amerind groups are closely related. (ABSTRACT TRUNCATED AT 250 WORDS)
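
    Allele frequencies and counts of distinct sequences such as those above are often summarized as Nei's unbiased gene (haplotype) diversity, n/(n-1) × (1 - Σ p_i²). The abstract does not state that this statistic was computed; the sketch below, with illustrative function names and toy labels, only shows how allele frequencies and diversity follow from per-sample allele or haplotype labels.

```python
from collections import Counter

def allele_frequencies(samples):
    """Relative frequency of each allele (or haplotype) label in a sample."""
    counts = Counter(samples)
    n = len(samples)
    return {allele: c / n for allele, c in counts.items()}

def gene_diversity(samples):
    """Nei's unbiased gene (haplotype) diversity: n/(n-1) * (1 - sum p_i^2)."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    freqs = allele_frequencies(samples)
    return n / (n - 1) * (1 - sum(p * p for p in freqs.values()))

toy = ["A", "A", "B", "B"]  # toy labels, not data from the study
print(allele_frequencies(toy))  # {'A': 0.5, 'B': 0.5}
print(gene_diversity(toy))
```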

  18. Whole-Exome Sequencing Identifies Rare and Low-Frequency Coding Variants Associated with LDL Cholesterol

    Science.gov (United States)

    Lange, Leslie A.; Hu, Youna; Zhang, He; Xue, Chenyi; Schmidt, Ellen M.; Tang, Zheng-Zheng; Bizon, Chris; Lange, Ethan M.; Smith, Joshua D.; Turner, Emily H.; Jun, Goo; Kang, Hyun Min; Peloso, Gina; Auer, Paul; Li, Kuo-ping; Flannick, Jason; Zhang, Ji; Fuchsberger, Christian; Gaulton, Kyle; Lindgren, Cecilia; Locke, Adam; Manning, Alisa; Sim, Xueling; Rivas, Manuel A.; Holmen, Oddgeir L.; Gottesman, Omri; Lu, Yingchang; Ruderfer, Douglas; Stahl, Eli A.; Duan, Qing; Li, Yun; Durda, Peter; Jiao, Shuo; Isaacs, Aaron; Hofman, Albert; Bis, Joshua C.; Correa, Adolfo; Griswold, Michael E.; Jakobsdottir, Johanna; Smith, Albert V.; Schreiner, Pamela J.; Feitosa, Mary F.; Zhang, Qunyuan; Huffman, Jennifer E.; Crosby, Jacy; Wassel, Christina L.; Do, Ron; Franceschini, Nora; Martin, Lisa W.; Robinson, Jennifer G.; Assimes, Themistocles L.; Crosslin, David R.; Rosenthal, Elisabeth A.; Tsai, Michael; Rieder, Mark J.; Farlow, Deborah N.; Folsom, Aaron R.; Lumley, Thomas; Fox, Ervin R.; Carlson, Christopher S.; Peters, Ulrike; Jackson, Rebecca D.; van Duijn, Cornelia M.; Uitterlinden, André G.; Levy, Daniel; Rotter, Jerome I.; Taylor, Herman A.; Gudnason, Vilmundur; Siscovick, David S.; Fornage, Myriam; Borecki, Ingrid B.; Hayward, Caroline; Rudan, Igor; Chen, Y. Eugene; Bottinger, Erwin P.; Loos, Ruth J.F.; Sætrom, Pål; Hveem, Kristian; Boehnke, Michael; Groop, Leif; McCarthy, Mark; Meitinger, Thomas; Ballantyne, Christie M.; Gabriel, Stacey B.; O’Donnell, Christopher J.; Post, Wendy S.; North, Kari E.; Reiner, Alexander P.; Boerwinkle, Eric; Psaty, Bruce M.; Altshuler, David; Kathiresan, Sekar; Lin, Dan-Yu; Jarvik, Gail P.; Cupples, L. Adrienne; Kooperberg, Charles; Wilson, James G.; Nickerson, Deborah A.; Abecasis, Goncalo R.; Rich, Stephen S.; Tracy, Russell P.; Willer, Cristen J.; Gabriel, Stacey B.; Altshuler, David M.; Abecasis, Gonçalo R.; Allayee, Hooman; Cresci, Sharon; Daly, Mark J.; de Bakker, Paul I.W.; DePristo, Mark A.; Do, Ron; Donnelly, Peter; Farlow, Deborah N.; Fennell, Tim; Garimella, Kiran; Hazen, Stanley L.; Hu, Youna; Jordan, Daniel M.; Jun, Goo; Kathiresan, Sekar; Kang, Hyun Min; Kiezun, Adam; Lettre, Guillaume; Li, Bingshan; Li, Mingyao; Newton-Cheh, Christopher H.; Padmanabhan, Sandosh; Peloso, Gina; Pulit, Sara; Rader, Daniel J.; Reich, David; Reilly, Muredach P.; Rivas, Manuel A.; Schwartz, Steve; Scott, Laura; Siscovick, David S.; Spertus, John A.; Stitziel, Nathaniel O.; Stoletzki, Nina; Sunyaev, Shamil R.; Voight, Benjamin F.; Willer, Cristen J.; Rich, Stephen S.; Akylbekova, Ermeg; Atwood, Larry D.; Ballantyne, Christie M.; Barbalic, Maja; Barr, R. Graham; Benjamin, Emelia J.; Bis, Joshua; Boerwinkle, Eric; Bowden, Donald W.; Brody, Jennifer; Budoff, Matthew; Burke, Greg; Buxbaum, Sarah; Carr, Jeff; Chen, Donna T.; Chen, Ida Y.; Chen, Wei-Min; Concannon, Pat; Crosby, Jacy; Cupples, L. Adrienne; D’Agostino, Ralph; DeStefano, Anita L.; Dreisbach, Albert; Dupuis, Josée; Durda, J. Peter; Ellis, Jaclyn; Folsom, Aaron R.; Fornage, Myriam; Fox, Caroline S.; Fox, Ervin; Funari, Vincent; Ganesh, Santhi K.; Gardin, Julius; Goff, David; Gordon, Ora; Grody, Wayne; Gross, Myron; Guo, Xiuqing; Hall, Ira M.; Heard-Costa, Nancy L.; Heckbert, Susan R.; Heintz, Nicholas; Herrington, David M.; Hickson, DeMarc; Huang, Jie; Hwang, Shih-Jen; Jacobs, David R.; Jenny, Nancy S.; Johnson, Andrew D.; Johnson, Craig W.; Kawut, Steven; Kronmal, Richard; Kurz, Raluca; Lange, Ethan M.; Lange, Leslie A.; Larson, Martin G.; Lawson, Mark; Lewis, Cora E.; Levy, Daniel; Li, Dalin; Lin, Honghuang; Liu, Chunyu; Liu, Jiankang; Liu, Kiang; Liu, Xiaoming; Liu, Yongmei; Longstreth, William T.; Loria, Cay; Lumley, Thomas; Lunetta, Kathryn; Mackey, Aaron J.; Mackey, Rachel; Manichaikul, Ani; Maxwell, Taylor; McKnight, Barbara; Meigs, James B.; Morrison, Alanna C.; Musani, Solomon K.; Mychaleckyj, Josyf C.; Nettleton, Jennifer A.; North, Kari; O’Donnell, Christopher J.; O’Leary, Daniel; Ong, Frank; Palmas, Walter; Pankow, James S.; Pankratz, Nathan D.; Paul, Shom; Perez, Marco; Person, Sharina D.; Polak, Joseph; Post, Wendy S.; Psaty, Bruce M.; Quinlan, Aaron R.; Raffel, Leslie J.; Ramachandran, Vasan S.; Reiner, Alexander P.; Rice, Kenneth; Rotter, Jerome I.; Sanders, Jill P.; Schreiner, Pamela; Seshadri, Sudha; Shea, Steve; Sidney, Stephen; Silverstein, Kevin; Smith, Nicholas L.; Sotoodehnia, Nona; Srinivasan, Asoke; Taylor, Herman A.; Taylor, Kent; Thomas, Fridtjof; Tracy, Russell P.; Tsai, Michael Y.; Volcik, Kelly A.; Wassel, Christina L.; Watson, Karol; Wei, Gina; White, Wendy; Wiggins, Kerri L.; Wilk, Jemma B.; Williams, O. Dale; Wilson, Gregory; Wilson, James G.; Wolf, Phillip; Zakai, Neil A.; Hardy, John; Meschia, James F.; Nalls, Michael; Singleton, Andrew; Worrall, Brad; Bamshad, Michael J.; Barnes, Kathleen C.; Abdulhamid, Ibrahim; Accurso, Frank; Anbar, Ran; Beaty, Terri; Bigham, Abigail; Black, Phillip; Bleecker, Eugene; Buckingham, Kati; Cairns, Anne Marie; Caplan, Daniel; Chatfield, Barbara; Chidekel, Aaron; Cho, Michael; Christiani, David C.; Crapo, James D.; Crouch, Julia; Daley, Denise; Dang, Anthony; Dang, Hong; De Paula, Alicia; DeCelie-Germana, Joan; Dozor, Allen; Drumm, Mitch; Dyson, Maynard; Emerson, Julia; Emond, Mary J.; Ferkol, Thomas; Fink, Robert; Foster, Cassandra; Froh, Deborah; Gao, Li; Gershan, William; Gibson, Ronald L.; Godwin, Elizabeth; Gondor, Magdalen; Gutierrez, Hector; Hansel, Nadia N.; Hassoun, Paul M.; Hiatt, Peter; Hokanson, John E.; Howenstine, Michelle; Hummer, Laura K.; Kanga, Jamshed; Kim, Yoonhee; Knowles, Michael R.; Konstan, Michael; Lahiri, Thomas; Laird, Nan; Lange, Christoph; Lin, Lin; Lin, Xihong; Louie, Tin L.; Lynch, David; Make, Barry; Martin, Thomas R.; Mathai, Steve C.; Mathias, Rasika A.; McNamara, John; McNamara, Sharon; Meyers, Deborah; Millard, Susan; Mogayzel, Peter; Moss, Richard; Murray, Tanda; Nielson, Dennis; Noyes, Blakeslee; O’Neal, Wanda; Orenstein, David; O’Sullivan, Brian; Pace, Rhonda; Pare, Peter; Parker, H. Worth; Passero, Mary Ann; Perkett, Elizabeth; Prestridge, Adrienne; Rafaels, Nicholas M.; Ramsey, Bonnie; Regan, Elizabeth; Ren, Clement; Retsch-Bogart, George; Rock, Michael; Rosen, Antony; Rosenfeld, Margaret; Ruczinski, Ingo; Sanford, Andrew; Schaeffer, David; Sell, Cindy; Sheehan, Daniel; Silverman, Edwin K.; Sin, Don; Spencer, Terry; Stonebraker, Jackie; Tabor, Holly K.; Varlotta, Laurie; Vergara, Candelaria I.; Weiss, Robert; Wigley, Fred; Wise, Robert A.; Wright, Fred A.; Wurfel, Mark M.; Zanni, Robert; Zou, Fei; Nickerson, Deborah A.; Rieder, Mark J.; Green, Phil; Shendure, Jay; Akey, Joshua M.; Bustamante, Carlos D.; Crosslin, David R.; Eichler, Evan E.; Fox, P. Keolu; Fu, Wenqing; Gordon, Adam; Gravel, Simon; Jarvik, Gail P.; Johnsen, Jill M.; Kan, Mengyuan; Kenny, Eimear E.; Kidd, Jeffrey M.; Lara-Garduno, Fremiet; Leal, Suzanne M.; Liu, Dajiang J.; McGee, Sean; O’Connor, Timothy D.; Paeper, Bryan; Robertson, Peggy D.; Smith, Joshua D.; Staples, Jeffrey C.; Tennessen, Jacob A.; Turner, Emily H.; Wang, Gao; Yi, Qian; Jackson, Rebecca; Peters, Ulrike; Carlson, Christopher S.; Anderson, Garnet; Anton-Culver, Hoda; Assimes, Themistocles L.; Auer, Paul L.; Beresford, Shirley; Bizon, Chris; Black, Henry; Brunner, Robert; Brzyski, Robert; Burwen, Dale; Caan, Bette; Carty, Cara L.; Chlebowski, Rowan; Cummings, Steven; Curb, J. David; Eaton, Charles B.; Ford, Leslie; Franceschini, Nora; Fullerton, Stephanie M.; Gass, Margery; Geller, Nancy; Heiss, Gerardo; Howard, Barbara V.; Hsu, Li; Hutter, Carolyn M.; Ioannidis, John; Jiao, Shuo; Johnson, Karen C.; Kooperberg, Charles; Kuller, Lewis; LaCroix, Andrea; Lakshminarayan, Kamakshi; Lane, Dorothy; Lasser, Norman; LeBlanc, Erin; Li, Kuo-Ping; Limacher, Marian; Lin, Dan-Yu; Logsdon, Benjamin A.; Ludlam, Shari; Manson, JoAnn E.; Margolis, Karen; Martin, Lisa; McGowan, Joan; Monda, Keri L.; Kotchen, Jane Morley; Nathan, Lauren; Ockene, Judith; O’Sullivan, Mary Jo; Phillips, Lawrence S.; Prentice, Ross L.; Robbins, John; Robinson, Jennifer G.; Rossouw, Jacques E.; Sangi-Haghpeykar, Haleh; Sarto, Gloria E.; Shumaker, Sally; Simon, Michael S.; Stefanick, Marcia L.; Stein, Evan; Tang, Hua; Taylor, Kira C.; Thomson, Cynthia A.; Thornton, Timothy A.; Van Horn, Linda; Vitolins, Mara; Wactawski-Wende, Jean; Wallace, Robert; Wassertheil-Smoller, Sylvia; Zeng, Donglin; Applebaum-Bowden, Deborah; Feolo, Michael; Gan, Weiniu; Paltoo, Dina N.; Sholinsky, Phyliss; Sturcke, Anne

    2014-01-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency variants. To determine whether rare or low-frequency coding variants are associated with LDL-C, we exome sequenced 2,005 individuals, including 554 individuals selected for extreme LDL-C (>98th or <2nd percentile). Follow-up analyses included sequencing of 1,302 additional individuals and genotype-based analysis of 52,221 individuals. We observed significant evidence of association between LDL-C and the burden of rare or low-frequency variants in PNPLA5, encoding a phospholipase-domain-containing protein, and both known and previously unidentified variants in PCSK9, LDLR and APOB, three known lipid-related genes. The effect sizes for the burden of rare variants for each associated gene were substantially higher than those observed for individual SNPs identified from GWASs. We replicated the PNPLA5 signal in an independent large-scale sequencing study of 2,084 individuals. In conclusion, this large whole-exome-sequencing study for LDL-C identified a gene not known to be implicated in LDL-C and provides unique insight into the design and analysis of similar experiments. PMID:24507775
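
    The gene-level burden idea can be illustrated with a minimal sketch: rare variants in one gene are collapsed into a single carrier flag per individual, and carrier counts are compared between the high- and low-LDL-C extremes. The sketch assumes a simple carrier-collapsing scheme, a hypothetical 1% minor-allele-frequency cut-off and a Haldane-Anscombe-corrected odds ratio; it is not the specific burden test used in the study, and the toy genotypes are invented.

```python
def carrier_flags(genotypes, maf, maf_threshold=0.01):
    """Collapse rare variants in one gene into a 0/1 carrier flag per individual.

    genotypes: one list per individual of alt-allele counts (0, 1 or 2), one entry per variant.
    maf: minor-allele frequency of each variant, in the same order as the genotype entries.
    maf_threshold: hypothetical rarity cut-off; variants at or above it are ignored.
    """
    rare = [j for j, f in enumerate(maf) if f < maf_threshold]
    return [int(any(g[j] > 0 for j in rare)) for g in genotypes]


def burden_odds_ratio(high_flags, low_flags):
    """Odds of carrying a rare variant in the high group relative to the low group.

    Adds 0.5 to every cell (Haldane-Anscombe correction) so empty cells do not divide by zero.
    """
    a = sum(high_flags)          # carriers, high-LDL-C group
    b = len(high_flags) - a      # non-carriers, high-LDL-C group
    c = sum(low_flags)           # carriers, low-LDL-C group
    d = len(low_flags) - c       # non-carriers, low-LDL-C group
    return ((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5))


# Hypothetical toy data: variant 0 is rare, variant 1 is common and therefore ignored.
maf = [0.005, 0.2]
high = carrier_flags([[1, 0], [0, 1], [2, 1]], maf)
low = carrier_flags([[0, 1], [0, 0], [0, 2]], maf)
print(high, low, burden_odds_ratio(high, low))
```

    Collapsing trades per-variant resolution for power, which is why burden signals can be detected when each rare variant is individually too infrequent to test.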

  19. HIV-1 envelope sequence-based diversity measures for identifying recent infections.

    Directory of Open Access Journals (Sweden)

    Alexis Kafando

    Full Text Available Identifying recent HIV-1 infections is crucial for monitoring HIV-1 incidence and optimizing public health prevention efforts. To identify recent HIV-1 infections, we evaluated and compared the performance of four sequence-based diversity measures (percent diversity, percent complexity, Shannon entropy and number of haplotypes) targeting 13 genetic segments within the env gene of HIV-1. A total of 597 diagnostic samples obtained in 2013 and 2015 from recently and chronically HIV-1-infected individuals were selected. From the selected samples, 249 env coding regions (134 from recent versus 115 from chronic infections), including V1-C5 of gp120 and the gp41 ectodomain of HIV-1, were successfully amplified and sequenced by next-generation sequencing (NGS) using the Illumina MiSeq platform. The ability of the four sequence-based diversity measures to correctly identify recent HIV infections was evaluated using frequency distribution curves, the median and interquartile range, and the area under the curve (AUC) of the receiver operating characteristic (ROC). Comparing the median and interquartile range and evaluating the frequency distribution curves associated with the four sequence-based diversity measures, we observed that percent diversity, number of haplotypes and Shannon entropy demonstrated significant potential to discriminate recent from chronic infections (p<0.0001). Using ROC AUC analysis, only the Shannon entropy measure within three HIV-1 env segments could accurately identify recent infections at a satisfactory level. These env segments were gp120 C2_1 (AUC = 0.806), gp120 C2_3 (AUC = 0.805) and gp120 V3 (AUC = 0.812). Our results clearly indicate that the Shannon entropy measure represents a useful tool for predicting HIV-1 infection recency.
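
    Two of the quantities above can be sketched in a few lines: the Shannon entropy of one sample's haplotype distribution, and the ROC AUC computed as a Mann-Whitney probability. This is a minimal sketch assuming haplotypes have already been called from the reads; the natural-log base and the optional division by ln(N) are common conventions assumed here, not necessarily the normalization used in the study, and the toy scores are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(haplotypes, normalized=False):
    """Shannon entropy (natural log) of the haplotype distribution in one sample.

    haplotypes: iterable of haplotype sequences or IDs, one per read.
    normalized: if True, divide by ln(N), where N is the number of reads (assumed convention).
    """
    counts = Counter(haplotypes)
    n = sum(counts.values())
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    if normalized:
        return h / math.log(n) if n > 1 else 0.0
    return h


def auc(pos_scores, neg_scores):
    """ROC AUC as the probability that a positive scores above a negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            wins += 1.0 if p > q else 0.5 if p == q else 0.0
    return wins / (len(pos_scores) * len(neg_scores))


# Chronic infections are treated as the "positive" class because their viral
# populations are expected to be more diverse (higher entropy) than recent ones.
recent = [shannon_entropy("AAAB"), shannon_entropy("AAAA")]
chronic = [shannon_entropy("AABB"), shannon_entropy("ABCD")]
print(auc(chronic, recent))
```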

  20. Structural variation discovery in the cancer genome using next generation sequencing: Computational solutions and perspectives

    Science.gov (United States)

    Liu, Biao; Conroy, Jeffrey M.; Morrison, Carl D.; Odunsi, Adekunle O.; Qin, Maochun; Wei, Lei; Trump, Donald L.; Johnson, Candace S.; Liu, Song; Wang, Jianmin

    2015-01-01

    Somatic structural variations (SVs) are a complex collection of chromosomal mutations that could directly contribute to carcinogenesis. Next Generation Sequencing (NGS) technology has emerged as the primary means of interrogating the SVs of the cancer genome in recent investigations. Sophisticated computational methods are required to accurately identify the SV events and delineate their breakpoints from the massive amounts of reads generated by an NGS experiment. In this review, we provide an overview of current analytic tools used for SV detection in NGS-based cancer studies. We summarize the features of common SV groups and the primary types of NGS signatures that can be used in SV detection methods. We discuss the principles and key similarities and differences of existing computational programs and comment on unresolved issues related to this research field. The aim of this article is to provide a practical guide to relevant concepts, computational methods, software tools and important factors for analyzing and interpreting NGS data for the detection of SVs in the cancer genome. PMID:25849937
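
    One of the NGS signatures mentioned above, the discordant read pair, can be illustrated with a minimal classifier. It assumes a forward-reverse (FR) Illumina paired-end library with a known insert-size mean and standard deviation, a hypothetical cut-off of k = 3 standard deviations, and a plain `ReadPair` record standing in for real alignment records (no BAM library is used). The distance between leftmost mapping positions is used only as a rough proxy for insert size.

```python
from dataclasses import dataclass


@dataclass
class ReadPair:
    chrom1: str
    pos1: int       # leftmost mapping position of read 1
    strand1: str    # '+' or '-'
    chrom2: str
    pos2: int       # leftmost mapping position of read 2
    strand2: str


def classify_pair(p, mean_insert, sd_insert, k=3.0):
    """Label a read pair with the SV signature it suggests (assumes an FR library)."""
    if p.chrom1 != p.chrom2:
        return "interchromosomal"   # mates on different chromosomes: possible translocation
    # Order the mates left-to-right so orientation can be checked consistently.
    (lpos, lstrand), (rpos, rstrand) = sorted([(p.pos1, p.strand1), (p.pos2, p.strand2)])
    if lstrand == rstrand:
        return "same_strand"        # both mates on one strand: possible inversion
    if lstrand == "-" and rstrand == "+":
        return "everted"            # outward-facing mates: possible tandem duplication
    span = rpos - lpos              # rough proxy for the insert size
    if span > mean_insert + k * sd_insert:
        return "long_insert"        # mates too far apart: possible deletion
    if span < mean_insert - k * sd_insert:
        return "short_insert"       # mates too close: possible insertion
    return "concordant"


print(classify_pair(ReadPair("chr1", 1000, "+", "chr1", 5000, "-"), 300, 30))
```

    Real callers cluster many such pairs and, as the tools reviewed here do, combine them with split-read and read-depth evidence before calling an SV.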

  1. Mitochondrial DNA D-loop sequence variation among 5 maternal lines of the Zemaitukai horse breed

    Directory of Open Access Journals (Sweden)

    E. Gus Cothran

    2005-12-01

    Full Text Available Genetic variation in Zemaitukai horses was investigated using mitochondrial DNA (mtDNA) sequencing. The study was performed on 421 bp of the mitochondrial DNA control region, which is known to be more variable than other sections of the mitochondrial genome. Samples from each of the remaining maternal family lines of Zemaitukai horses, and three random samples for other Lithuanian (Lithuanian Heavy Draught, Zemaitukai large type) and ten European horse breeds, were sequenced. Five distinct haplotypes were obtained for the five Zemaitukai maternal families, supporting the pedigree data. Within the Zemaitukai breed, the minimal difference between two haplotypes was 6 nucleotides and the maximal difference was 11 nucleotides. A total of 20 nucleotide differences compared to the reference sequence were found in Lithuanian horse breeds. Genetic cluster analysis did not show any clear pattern of relationship among breeds of different types.
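
    The minimal and maximal pairwise differences reported above are simple counts over aligned haplotypes. A minimal sketch, assuming the control-region haplotypes are already aligned to equal length and that gap ('-') or ambiguous ('N') positions are skipped (an assumption, since alignment handling is not described); the toy sequences are hypothetical.

```python
from itertools import combinations


def differences(a, b):
    """Count nucleotide differences between two aligned sequences, skipping gap/N positions."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    skip = set("-N")
    return sum(1 for x, y in zip(a.upper(), b.upper())
               if x not in skip and y not in skip and x != y)


def min_max_differences(haplotypes):
    """Return (min, max) pairwise differences over a dict of name -> aligned sequence."""
    d = [differences(haplotypes[i], haplotypes[j]) for i, j in combinations(haplotypes, 2)]
    return min(d), max(d)


toy = {"h1": "ACGTACGT", "h2": "ACGTACGA", "h3": "TCGAACGA"}
print(min_max_differences(toy))
```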

  2. BLAT2DOLite: An Online System for Identifying Significant Relationships between Genetic Sequences and Diseases.

    Directory of Open Access Journals (Sweden)

    Liang Cheng

    Full Text Available Identifying the diseases significantly related to a set of sequences could play an important role in understanding the functions of these sequences. In this paper, we introduced BLAT2DOLite, an online system for annotating human genes and diseases and identifying the significant relationships between sequences and diseases. Currently, BLAT2DOLite integrates the Entrez Gene database and Disease Ontology Lite (DOLite), which contain gene loci and relationships between genes and diseases. It uses a hypergeometric test to calculate P-values between genes and diseases of DOLite. The system can be accessed from: http://123.59.132.21:8080/BLAT2DOLite. The corresponding web service is described in: http://123.59.132.21:8080/BLAT2DOLite/BLAT2DOLiteIDMappingPort?wsdl.
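
    The hypergeometric test named above is the standard over-representation test for gene sets. The following minimal sketch computes the upper-tail P-value from the four counts involved. It is a generic illustration, not BLAT2DOLite's code, and the parameter names are assumptions.

```python
from math import comb


def hypergeom_sf(k, n, K, N):
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the annotated universe; K: genes annotated to the disease;
    n: genes in the query set; k: query genes annotated to the disease.
    """
    if not 0 <= k <= n <= N or K > N:
        raise ValueError("inconsistent counts")
    upper = min(n, K)
    # math.comb returns 0 when the lower argument exceeds the upper, so impossible terms vanish.
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


# Hypothetical example: 20,000 genes, 100 annotated to a disease, 50 query genes, 5 overlapping.
print(hypergeom_sf(5, 50, 100, 20000))
```

    With many diseases tested at once, the resulting P-values would normally also be corrected for multiple testing.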

  3. Novel expressed sequences identified in a model of androgen independent prostate cancer

    Directory of Open Access Journals (Sweden)

    Jones Steven JM

    2007-01-01

    Full Text Available Abstract Background Prostate cancer is the most frequently diagnosed cancer in American men, and few effective treatment options are available to patients who develop hormone-refractory prostate cancer. The molecular changes that occur to allow prostate cells to proliferate in the absence of androgens are not fully understood. Results Subtractive hybridization experiments performed with samples from an in vivo model of hormonal progression identified 25 expressed sequences representing novel human transcripts. Intriguingly, these 25 sequences have small open reading frames and are not highly conserved through evolution, suggesting that many of these novel expressed sequences may be derived from untranslated regions of novel transcripts or from non-coding transcripts. Examination of a large metalibrary of human Serial Analysis of Gene Expression (SAGE) tags demonstrated that only three of these novel sequences had been previously detected. RT-PCR experiments confirmed that the 6 sequences tested were expressed in specific human tissues, as well as in clinical samples of prostate cancer. Further RT-PCR experiments for five of these fragments indicated they originated from large untranslated regions of unannotated transcripts. Conclusion This study underlines the value of using complementary techniques in the annotation of the human genome. The tissue-specific expression of 4 of the 6 clones tested indicates that the expression of these novel transcripts is tightly regulated, and future work will determine the possible role(s) these novel transcripts may play in the progression of prostate cancer.

  4. Functional brain activation differences in stuttering identified with a rapid fMRI sequence

    Science.gov (United States)

    Kraft, Shelly Jo; Choo, Ai Leen; Sharma, Harish; Ambrose, Nicoline G.

    2011-01-01

    The purpose of this study was to investigate whether brain activity related to the presence of stuttering can be identified with rapid functional MRI (fMRI) sequences that involved overt and covert speech processing tasks. The long-term goal is to develop sensitive fMRI approaches with developmentally appropriate tasks to identify deviant speech motor and auditory brain activity in children who stutter closer to the age at which recovery from stuttering is documented. Rapid sequences may be preferred for individuals or populations who do not tolerate long scanning sessions. In this report, we document the application of a picture naming and phoneme monitoring task in three-minute fMRI sequences with adults who stutter (AWS). If relevant brain differences are found in AWS with these approaches that conform to previous reports, then these approaches can be extended to younger populations. Pairwise contrasts of brain BOLD activity between AWS and normally fluent adults indicated that AWS showed higher BOLD activity in the right inferior frontal gyrus (IFG), right temporal lobe and sensorimotor cortices during picture naming, and higher activity in the right IFG during phoneme monitoring. The right-lateralized pattern of BOLD activity, together with higher activity in sensorimotor cortices, is consistent with previous reports, which indicates that rapid fMRI sequences can be considered for investigating stuttering in younger participants. PMID:22133409

  5. Molecular defects identified by whole exome sequencing in a child with Fanconi anemia.

    Science.gov (United States)

    Zheng, Zhaojing; Geng, Juan; Yao, Ru-En; Li, Caihua; Ying, Daming; Shen, Yongnian; Ying, Lei; Yu, Yongguo; Fu, Qihua

    2013-11-10

    Fanconi anemia is a rare genetic disease characterized by bone marrow failure, multiple congenital malformations, and an increased susceptibility to malignancy. At least 15 genes have been identified that are involved in the pathogenesis of Fanconi anemia. However, it is still a challenge to assign the complementation group and to characterize the molecular defects in patients with Fanconi anemia. In the current study, whole exome sequencing was used to identify the affected gene(s) in a boy with Fanconi anemia. A recurring, non-synonymous mutation was found (c.3971C>T, p.P1324L) as well as a novel frameshift mutation (c.989_995del, p.H330LfsX2) in FANCA gene. Our results indicate that whole exome sequencing may be useful in clinical settings for rapid identification of disease-causing mutations in rare genetic disorders such as Fanconi anemia. © 2013 Elsevier B.V. All rights reserved.
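
    The protein positions above follow directly from the cDNA positions: in HGVS coding-DNA numbering, c.1 is the A of the ATG start codon, so position c.N falls in codon (N - 1) // 3 + 1. This gives p.1324 for c.3971 and p.330 for c.989, and a 7-bp deletion (c.989_995) cannot preserve the reading frame. The minimal sketch below handles only plain positive coding positions; intronic offsets and UTR positions are deliberately out of scope, and the function names are hypothetical.

```python
import re


def coding_span(hgvs_c):
    """Return (start, end) from a simple c. description such as 'c.3971C>T' or 'c.989_995del'.

    Only plain positive coding positions are supported; descriptions such as
    'c.100+1G>A' (intronic) or 'c.-10A>G' (5' UTR) raise ValueError in this sketch.
    """
    m = re.fullmatch(r"c\.(\d+)(?:_(\d+))?[A-Za-z>]*", hgvs_c)
    if m is None:
        raise ValueError(f"unsupported c. description: {hgvs_c}")
    start = int(m.group(1))
    end = int(m.group(2)) if m.group(2) else start
    return start, end


def codon_of(c_pos):
    """Codon number and position within the codon (1-3) for coding position c_pos."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def is_frameshift(start, end):
    """A deletion of c.start..end shifts the reading frame unless its length is a multiple of 3."""
    return (end - start + 1) % 3 != 0


s, e = coding_span("c.989_995del")
print(codon_of(coding_span("c.3971C>T")[0]), codon_of(s), is_frameshift(s, e))
```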

  6. ChickVD: a sequence variation database for the chicken genome

    DEFF Research Database (Denmark)

    Wang, Jing; He, Ximiao; Ruan, Jue

    2005-01-01

    Working in parallel with the efforts to sequence the chicken (Gallus gallus) genome, the Beijing Genomics Institute led an international team of scientists from China, USA, UK, Sweden, The Netherlands and Germany to map extensive DNA sequence variation throughout the chicken genome by sampling DN...... on quantitative trait loci using data from collaborating institutions and public resources. Our data can be queried by search engine and homology-based BLAST searches. ChickVD is publicly accessible at http://chicken.genomics.org.cn. Publication date: 2005-Jan-1...

  7. Whole-exome sequencing identified a variant in EFTUD2 gene in establishing a genetic diagnosis.

    Science.gov (United States)

    Rengasamy Venugopalan, S; Farrow, E G; Lypka, M

    2017-06-01

    Craniofacial anomalies are complex and have overlapping phenotypes. Mandibulofacial Dysostosis and Oculo-Auriculo-Vertebral Spectrum are conditions that share a common craniofacial phenotype and present a challenge in arriving at a diagnosis. In this report, we present the case of a female proband who was given a differential diagnosis of Treacher Collins syndrome or Hemifacial Microsomia without certainty. Prior genetic testing (22q deletion and FGFR screening) was negative. The objective of this study was to demonstrate the critical role of whole-exome sequencing in establishing a genetic diagnosis for the proband. The participants, an affected 14½-year-old female proband and her parents, were enrolled as a proband/parent trio. A surgical tissue sample from the proband and parental blood samples were collected and prepared for whole-exome sequencing. An Illumina HiSeq 2500 instrument was used for sequencing (125-nucleotide reads, 84X coverage). Variant analyses were performed using custom-developed software, RUNES and VIKING. Variant analysis following whole-exome sequencing identified a heterozygous de novo pathogenic variant, c.259C>T (p.Gln87*), in the EFTUD2 (NM_004247.3) gene in the proband. Previous studies have reported that variants in the EFTUD2 gene are associated with Mandibulofacial Dysostosis with Microcephaly. Patients with facial asymmetry, micrognathia, choanal atresia and microcephaly should be analyzed for variants in the EFTUD2 gene. Next-generation sequencing techniques, such as whole-exome sequencing, offer great promise to improve the understanding of the etiologies of sporadic genetic diseases. © 2017 John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
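
    A de novo candidate in a proband/parent trio is, at its simplest, a site where the proband is heterozygous while both parents are homozygous for the reference allele, with enough coverage to trust the parental calls. The sketch below shows only that basic genotype filter. The dictionary layout, the VCF-style genotype strings and the depth threshold of 10 are hypothetical. It does not reproduce RUNES or VIKING.

```python
def de_novo_candidates(variants, min_depth=10):
    """Return IDs of variants that look de novo in a proband/parent trio.

    variants: list of dicts with keys 'id', 'proband', 'mother', 'father'
    (genotype strings such as '0/0', '0/1', '1/1') and 'depth'
    (minimum read depth across the three samples).
    min_depth: hypothetical coverage threshold.
    """
    hits = []
    for v in variants:
        if v["depth"] < min_depth:
            continue  # too little coverage to trust a parental 0/0 call
        child_het = v["proband"] in ("0/1", "1/0")
        parents_ref = v["mother"] == "0/0" and v["father"] == "0/0"
        if child_het and parents_ref:
            hits.append(v["id"])
    return hits


calls = [
    {"id": "var1", "proband": "0/1", "mother": "0/0", "father": "0/0", "depth": 60},
    {"id": "var2", "proband": "0/1", "mother": "0/1", "father": "0/0", "depth": 60},
    {"id": "var3", "proband": "0/1", "mother": "0/0", "father": "0/0", "depth": 4},
]
print(de_novo_candidates(calls))
```

    In practice the surviving candidates are further filtered by population frequency and predicted effect before clinical interpretation.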

  8. Estimating variation within the genes and inferring the phylogeny of 186 sequenced diverse Escherichia coli genomes

    DEFF Research Database (Denmark)

    Kaas, Rolf Sommer; Rundsten, Carsten Friis; Ussery, David

    2012-01-01

    Background Escherichia coli exists in commensal and pathogenic forms. By measuring the variation of individual genes across more than a hundred sequenced genomes, gene variation can be studied in detail, including the number of mutations found for any given gene. This knowledge will be useful...... for creating better phylogenies, for determination of molecular clocks and for improved typing techniques. Results We find 3,051 gene clusters/families present in at least 95% of the genomes and 1,702 gene clusters present in 100% of the genomes. The former 'soft core' of about 3,000 gene families is perhaps...... more biologically relevant, especially considering that many of these genome sequences are draft quality. The E. coli pan-genome for this set of isolates contains 16,373 gene clusters. A core-gene tree, based on alignment and a pan-genome tree based on gene presence/absence, maps the relatedness...
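
    The strict core (clusters in 100% of genomes), soft core (clusters in at least 95%) and pan-genome (clusters in at least one genome) described above can be counted from a presence/absence table. This is a minimal sketch. It assumes the gene clustering has already been done and is stored as a dictionary of cluster ID to the set of genomes containing that cluster, and the toy data are hypothetical.

```python
def pangenome_partitions(presence, n_genomes, soft_core_percent=95):
    """Count strict-core, soft-core and pan-genome gene clusters.

    presence: dict mapping gene-cluster ID -> set of genome IDs in which the cluster occurs.
    n_genomes: total number of genomes analysed.
    soft_core_percent: presence threshold for the soft core (the soft core includes the strict core).
    """
    counts = [len(genomes) for genomes in presence.values()]
    return {
        "core": sum(1 for c in counts if c == n_genomes),
        # integer arithmetic avoids floating-point edge cases at exactly 95%
        "soft_core": sum(1 for c in counts if c * 100 >= soft_core_percent * n_genomes),
        "pan": sum(1 for c in counts if c > 0),
    }


genomes = [f"g{i}" for i in range(20)]
toy = {"A": set(genomes), "B": set(genomes[:19]), "C": {"g0"}}
print(pangenome_partitions(toy, len(genomes)))
```

    The soft-core threshold tolerates clusters missed in a few genomes, which matters when, as noted above, many genome sequences are of draft quality.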

  9. Identifying transposon insertions and their effects from RNA-sequencing data.

    Science.gov (United States)

    de Ruiter, Julian R; Kas, Sjors M; Schut, Eva; Adams, David J; Koudijs, Marco J; Wessels, Lodewyk F A; Jonkers, Jos

    2017-07-07

    Insertional mutagenesis using engineered transposons is a potent forward genetic screening technique used to identify cancer genes in mouse model systems. In the analysis of these screens, transposon insertion sites are typically identified by targeted DNA-sequencing and subsequently assigned to predicted target genes using heuristics. As such, these approaches provide no direct evidence that insertions actually affect their predicted targets or how transcripts of these genes are affected. To address this, we developed IM-Fusion, an approach that identifies insertion sites from gene-transposon fusions in standard single- and paired-end RNA-sequencing data. We demonstrate IM-Fusion on two separate transposon screens of 123 mammary tumors and 20 B-cell acute lymphoblastic leukemias, respectively. We show that IM-Fusion accurately identifies transposon insertions and their true target genes. Furthermore, by combining the identified insertion sites with expression quantification, we show that we can determine the effect of a transposon insertion on its target gene(s) and prioritize insertions that have a significant effect on expression. We expect that IM-Fusion will significantly enhance the accuracy of cancer gene discovery in forward genetic screens and provide initial insight into the biological effects of insertions on candidate cancer genes. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. Population-genomic variation within RNA viruses of the Western honey bee, Apis mellifera, inferred from deep sequencing.

    Science.gov (United States)

    Cornman, Robert Scott; Boncristiani, Humberto; Dainat, Benjamin; Chen, Yanping; vanEngelsdorp, Dennis; Weaver, Daniel; Evans, Jay D

    2013-03-07

    Deep sequencing of viruses isolated from infected hosts is an efficient way to measure population-genetic variation and can reveal patterns of dispersal and natural selection. In this study, we mined existing Illumina sequence reads to investigate single-nucleotide polymorphisms (SNPs) within two RNA viruses of the Western honey bee (Apis mellifera), deformed wing virus (DWV) and Israel acute paralysis virus (IAPV). All viral RNA was extracted from North American samples of honey bees or, in one case, the ectoparasitic mite Varroa destructor. Coverage depth was generally lower for IAPV than DWV, and marked gaps in coverage occurred in several narrow regions (selection. The Kakugo strain of DWV fell outside of all other DWV sequences at 100% bootstrap support. IAPV consensus sequences supported the existence of multiple clades as had been previously reported, and Fu and Li's D was closer to neutral expectation overall, although a sliding-window analysis identified a significantly positive D within the protease region, suggesting selection maintains diversity in that region. Within-sample mean diversity was comparable between the two viruses on average, although for both viruses there was substantial variation among samples in mean diversity at third codon positions and in the number of high-diversity sites. FST values were bimodal for DWV, likely reflecting neutral divergence in two low-diversity populations, whereas IAPV had several sites that were strong outliers with very low FST. This initial survey of genetic variation within honey bee RNA viruses suggests future directions for studies examining the underlying causes of population-genetic structure in these economically important pathogens.
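
    Within-sample diversity at a site can be estimated from per-base read counts with the unbiased heterozygosity formula π = n/(n − 1) · (1 − Σ p²). Averaging it over third codon positions gives a summary like the one compared across samples above. This is a minimal sketch. It assumes a per-position pileup of base counts and a known open reading frame start (`orf_start` is hypothetical), and it is not the study's exact estimator.

```python
def site_diversity(counts):
    """Unbiased within-sample nucleotide diversity at one site from base read counts."""
    n = sum(counts.values())
    if n < 2:
        return 0.0
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))


def mean_third_position_diversity(pileup, orf_start):
    """Mean diversity over third codon positions of an ORF starting at orf_start (1-based).

    pileup: dict mapping 1-based genome position -> {base: read count}.
    """
    vals = [site_diversity(c) for pos, c in pileup.items()
            if pos >= orf_start and (pos - orf_start) % 3 == 2]
    return sum(vals) / len(vals) if vals else 0.0


toy = {1: {"A": 10}, 2: {"A": 10}, 3: {"A": 5, "G": 5}, 6: {"C": 10}}
print(mean_third_position_diversity(toy, orf_start=1))
```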

  11. CNV-RF Is a Random Forest-Based Copy Number Variation Detection Method Using Next-Generation Sequencing.

    Science.gov (United States)

    Onsongo, Getiria; Baughn, Linda B; Bower, Matthew; Henzler, Christine; Schomaker, Matthew; Silverstein, Kevin A T; Thyagarajan, Bharat

    2016-11-01

    Simultaneous detection of small copy number variations (CNVs) (<0.5 kb) and single-nucleotide variants in clinically significant genes is of great interest for clinical laboratories. The analytical variability in next-generation sequencing (NGS) and artifacts in coverage data because of issues with mappability along with lack of robust bioinformatics tools for CNV detection have limited the utility of targeted NGS data to identify CNVs. We describe the development and implementation of a bioinformatics algorithm, copy number variation-random forest (CNV-RF), that incorporates a machine learning component to identify CNVs from targeted NGS data. Using CNV-RF, we identified 12 of 13 deletions in samples with known CNVs, two cases with duplications, and identified novel deletions in 22 additional cases. Furthermore, no CNVs were identified among 60 genes in 14 cases with normal copy number and no CNVs were identified in another 104 patients with clinical suspicion of CNVs. All positive deletions and duplications were confirmed using a quantitative PCR method. CNV-RF also detected heterozygous deletions and duplications with a specificity of 50% across 4813 genes. The ability of CNV-RF to detect clinically relevant CNVs with a high degree of sensitivity along with confirmation using a low-cost quantitative PCR method provides a framework for providing comprehensive NGS-based CNV/single-nucleotide variant detection in a clinical molecular diagnostics laboratory. Copyright © 2016 American Society for Investigative Pathology and the Association for Molecular Pathology. Published by Elsevier Inc. All rights reserved.
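
    Read-depth CNV detection in targeted NGS data usually starts from per-target coverage, normalized for library size and compared with normal reference samples. A heterozygous deletion is expected near a log2 ratio of -1, and a single-copy duplication near log2(3/2) ≈ 0.58. The minimal sketch below computes such ratios and applies fixed, hypothetical cut-offs. CNV-RF's actual features and its random-forest classifier are not reproduced here.

```python
import math
from statistics import median


def normalize(depths):
    """Scale per-target read counts so they sum to 1 (crude library-size normalization)."""
    total = sum(depths)
    return [d / total for d in depths]


def log2_ratios(sample_depths, reference_depths):
    """Per-target log2 ratio of a sample's normalized depth to the median of normal references."""
    s = normalize(sample_depths)
    refs = [normalize(r) for r in reference_depths]
    ratios = []
    for i, x in enumerate(s):
        ref = median([r[i] for r in refs])
        if ref == 0:
            ratios.append(float("nan"))   # target not covered in references: ratio undefined
        elif x == 0:
            ratios.append(float("-inf"))  # no reads at all: candidate homozygous deletion
        else:
            ratios.append(math.log2(x / ref))
    return ratios


def threshold_call(ratio, del_cut=-0.7, dup_cut=0.45):
    """Hypothetical fixed cut-offs standing in for a trained classifier."""
    if ratio <= del_cut:
        return "deletion"
    if ratio >= dup_cut:
        return "duplication"
    return "normal"  # also returned for NaN, because comparisons with NaN are False


ratios = log2_ratios([50, 100, 100, 100], [[100, 100, 100, 100], [100, 100, 100, 100]])
print([threshold_call(r) for r in ratios])
```

    Fixed thresholds are fragile when coverage varies with mappability, which is the motivation the abstract gives for a machine-learning approach.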

  12. Sequence variation of koala retrovirus transmembrane protein p15E among koalas from different geographic regions

    Science.gov (United States)

    Ishida, Yasuko; McCallister, Chelsea; Nikolaidis, Nikolas; Tsangaras, Kyriakos; Helgen, Kristofer M.; Greenwood, Alex D.; Roca, Alfred L.

    2014-01-01

    The koala retrovirus (KoRV), which is transitioning from an exogenous to an endogenous form, has been associated with high mortality in koalas. For other retroviruses, the envelope protein p15E has been considered a candidate for vaccine development. We therefore examined proviral sequence variation of KoRV p15E in one captive Queensland koala and three wild southern Australian koalas. We generated 163 sequences with intact open reading frames, which grouped into 39 distinct haplotypes. Sixteen distinct haplotypes comprising 139 of the sequences (85%) coded for the same polypeptide. Among the remaining 23 haplotypes, 22 were detected only once among the sequences, and each had 1 or 2 non-synonymous differences from the majority sequence. Several analyses suggested that p15E was under purifying selection. Important epitopes and domains were highly conserved across the p15E sequences and in previously reported exogenous KoRVs. Overall, these results support the potential use of p15E for KoRV vaccine development. PMID:25462343
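
    Grouping nucleotide haplotypes by the polypeptide they encode is what separates synonymous from non-synonymous variation in an analysis like the one above. A minimal sketch assuming in-frame open reading frames and the standard genetic code, with hypothetical toy sequences:

```python
from collections import Counter, defaultdict

BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # standard genetic code; '*' marks stop codons


def translate(seq):
    """Translate an in-frame DNA sequence; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def haplotypes_by_protein(sequences):
    """Collapse identical sequences into haplotypes and group them by encoded polypeptide.

    Returns {polypeptide: {haplotype: number of sequences}}.
    """
    hap_counts = Counter(seq.upper() for seq in sequences)
    groups = defaultdict(dict)
    for hap, n in hap_counts.items():
        groups[translate(hap)][hap] = n
    return dict(groups)


toy = ["ATGGCT", "ATGGCC", "atggct", "ATGACT"]
print(haplotypes_by_protein(toy))
```

    Haplotypes that land in the same group differ only synonymously; a separate group signals at least one amino acid change.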

  13. Deep sequencing of uveal melanoma identifies a recurrent mutation in PLCB4

    DEFF Research Database (Denmark)

    Johansson, Peter; Aoude, Lauren G; Wadt, Karin

    2016-01-01

    Next generation sequencing of uveal melanoma (UM) samples has identified a number of recurrent oncogenic or loss-of-function mutations in key driver genes, including GNAQ, GNA11, EIF1AX, SF3B1 and BAP1. To search for additional driver mutations in this tumor type, we carried out whole-genome or whole-exome sequencing of 28 tumors or primary cell lines. These samples have a low mutation burden, with a mean of 10.6 protein-changing mutations per sample (range 0 to 53). As expected for these sun-shielded melanomas, the mutation spectrum was not consistent with an ultraviolet radiation signature; instead, a BRCA mutation signature predominated. In addition to mutations in the known UM driver genes, we found a recurrent mutation in PLCB4 (c.G1888T, p.D630Y, NM_000933), which was validated using Sanger sequencing. The identical mutation was also found in published UM sequence data (1 of 56 tumors...

  14. Sequence variation of the feline immunodeficiency virus genome and its clinical relevance.

    Science.gov (United States)

    Stickney, A L; Dunowska, M; Cave, N J

    2013-06-08

    The ongoing evolution of feline immunodeficiency virus (FIV) has resulted in the existence of a diverse continuum of viruses. FIV isolates differ with regard to their mutation and replication rates, plasma viral loads, cell tropism and ability to induce apoptosis. Clinical disease in FIV-infected cats is also inconsistent. Genomic sequence variation of FIV is likely to be responsible for some of the variation in viral behaviour. The specific genetic sequences that influence these key viral properties remain to be determined. With knowledge of the specific key determinants of pathogenicity, there is the potential for veterinarians in the future to apply this information for prognostic purposes. Genomic sequence variation of FIV also presents an obstacle to effective vaccine development. Most challenge studies demonstrate acceptable efficacy of a dual-subtype FIV vaccine (Fel-O-Vax FIV) against FIV infection under experimental settings; however, vaccine efficacy in the field still remains to be proven. It is important that we discover the key determinants of immunity induced by this vaccine; such data would complement vaccine field efficacy studies and provide the basis to make informed recommendations on its use.

  15. High Sequence Variations in Mitochondrial DNA Control Region among Worldwide Populations of Flathead Mullet Mugil cephalus

    Directory of Open Access Journals (Sweden)

    Brian Wade Jamandre

    2014-01-01

    Full Text Available The sequence and structure of the complete mtDNA control region (CR) of M. cephalus from African, Pacific, and Atlantic populations are presented in this study to assess its usefulness in phylogeographic studies of this species. The mtDNA CR sequence variation among M. cephalus populations greatly exceeded the intraspecific polymorphism generally observed in other vertebrates. The length of the CR sequence varied among M. cephalus populations due to the presence of indels and a variable number of tandem repeats at the 3′ hypervariable domain. The high evolutionary rate of the CR in this species probably originated from these mutations. However, no excessive homoplasic mutations were observed. Finally, the star-shaped tree inferred from the CR polymorphism indicates a rapid worldwide radiation of this species. The CR still appears to be a good marker for phylogeographic investigations, and additional worldwide samples are warranted to further investigate the genetic structure and evolution of M. cephalus.
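
    A variable number of tandem repeats changes the CR length simply by adding or removing copies of a repeat unit. A minimal sketch that counts the longest run of back-to-back copies of a given motif, scanning every frame so runs that start at any offset are found. The motif and sequence here are hypothetical, since the actual repeat unit is not given above.

```python
def max_tandem_copies(seq, motif):
    """Length of the longest run of back-to-back copies of motif in seq."""
    if not motif:
        raise ValueError("motif must be non-empty")
    seq, motif = seq.upper(), motif.upper()
    k = len(motif)
    best = 0
    for start in range(k):  # scan every frame so runs starting at any offset are found
        run = 0
        for i in range(start, len(seq) - k + 1, k):
            if seq[i:i + k] == motif:
                run += 1
                best = max(best, run)
            else:
                run = 0
    return best


print(max_tandem_copies("GGACACACTT", "AC"))
```

    Comparing this count across individuals gives the per-sample repeat number whose variation drives the length differences described above.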

  16. Mitochondrial DNA sequence variation in Finnish patients with matrilineal diabetes mellitus

    Directory of Open Access Journals (Sweden)

    Soini Heidi K

    2012-07-01

    Full Text Available Abstract Background The genetic background of type 2 diabetes is complex, involving contributions from both nuclear and mitochondrial genes. There is an excess of maternal inheritance in patients with type 2 diabetes and, furthermore, diabetes is a common symptom in patients with mutations in mitochondrial DNA (mtDNA). Polymorphisms in mtDNA have been reported to act as risk factors in several complex diseases. Findings We examined the nucleotide variation in complete mtDNA sequences of 64 Finnish patients with matrilineal diabetes. We used conformation sensitive gel electrophoresis and sequencing to detect sequence variation. We analysed the pathogenic potential of nonsynonymous variants detected in the sequences and examined the role of the m.16189 T>C variant. Controls consisted of non-diabetic subjects ascertained in the same population. The frequency of mtDNA haplogroup V was 3-fold higher in patients with diabetes. Patients harboured many nonsynonymous mtDNA substitutions that were predicted to be possibly or probably damaging. Furthermore, a novel m.13762 T>G variant in MTND5 leading to p.Ser476Ala and several rare mtDNA variants were found. Haplogroup H1b harbouring m.16189 T>C and m.3010 G>A was found to be more frequent in patients with diabetes than in controls. Conclusions Mildly deleterious nonsynonymous mtDNA variants and rare population-specific haplotypes constitute genetic risk factors for maternally inherited diabetes.

  17. Utilising identifier error variation in linkage of large administrative data sources

    Directory of Open Access Journals (Sweden)

    Katie Harron

    2017-02-01

    Full Text Available Abstract Background Linkage of administrative data sources often relies on probabilistic methods using a set of common identifiers (e.g. sex, date of birth, postcode). Variation in data quality on an individual or organisational level (e.g. by hospital) can result in clustering of identifier errors, violating the assumption of independence between identifiers required for traditional probabilistic match weight estimation. This potentially introduces selection bias to the resulting linked dataset. We aimed to measure variation in identifier error rates in a large English administrative data source (Hospital Episode Statistics; HES) and to incorporate this information into match weight calculation. Methods We used 30,000 randomly selected HES hospital admission records of patients aged 0–1, 5–6 and 18–19 years, for 2011/2012, linked via NHS number with data from the Personal Demographic Service (PDS; our gold standard). We calculated identifier error rates for sex, date of birth and postcode and used multi-level logistic regression to investigate associations with individual-level attributes (age, ethnicity, and gender) and organisational variation. We then derived: (i) weights incorporating dependence between identifiers; (ii) attribute-specific weights (varying by age, ethnicity and gender); and (iii) organisation-specific weights (by hospital). Results were compared with traditional match weights using a simulation study. Results Identifier errors (where values disagreed in linked HES-PDS records, or missing values) were found in 0.11% of records for sex and date of birth and in 53% of records for postcode. Identifier error rates differed significantly by age, ethnicity and sex (p < 0.0005). Errors were less frequent in males, in 5–6 year olds and 18–19 year olds compared with infants, and were lowest for the Asian ethnic group. A simulation study demonstrated that substantial bias was introduced into estimated readmission rates in the presence
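
    The "traditional probabilistic match weight" referred to above is usually the Fellegi-Sunter weight. For each identifier, m = P(agree | true match) and u = P(agree | non-match). Agreement contributes log2(m/u), disagreement contributes log2((1 − m)/(1 − u)), and the per-identifier weights are summed under the independence assumption that the study questions. The minimal sketch below uses hypothetical m and u values. It treats 1 − m as the identifier error rate, which is an assumption, and shows subgroup-specific weights only as a different parameter set per group.

```python
import math


def m_from_error_rate(error_rate):
    """m-probability if disagreement among true matches arises only from identifier error."""
    return 1.0 - error_rate


def field_weight(agree, m, u):
    """Fellegi-Sunter log2 weight for one identifier (u = P(agree | non-match))."""
    return math.log2(m / u) if agree else math.log2((1 - m) / (1 - u))


def match_weight(agreements, params):
    """Total weight for a record pair, assuming identifiers are conditionally independent.

    agreements: dict identifier -> bool (do the two records agree on it?).
    params: dict identifier -> (m, u); attribute- or hospital-specific weights can be
    obtained by passing a different params dict for each subgroup.
    """
    return sum(field_weight(agree, *params[name]) for name, agree in agreements.items())


# Hypothetical parameters: postcode errors are far more common than sex errors.
params = {"sex": (m_from_error_rate(0.0011), 0.5), "postcode": (m_from_error_rate(0.53), 0.01)}
print(match_weight({"sex": True, "postcode": False}, params))
```

    A high postcode error rate shrinks the penalty for postcode disagreement, which is why ignoring error-rate variation can bias which records get linked.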

  18. Whole exome sequencing identifies novel mutation in eight Chinese children with isolated tetralogy of Fallot.

    Science.gov (United States)

    Liu, Lin; Wang, Hong-Dan; Cui, Cun-Ying; Qin, Yun-Yun; Fan, Tai-Bing; Peng, Bang-Tian; Zhang, Lian-Zhong; Wang, Cheng-Zeng

    2017-12-05

    Tetralogy of Fallot is the most common cyanotic congenital heart disease. However, its pathogenesis remains to be clarified. The purpose of this study was to identify genetic variants in Tetralogy of Fallot by whole exome sequencing. Whole exome sequencing was performed in eight small families with Tetralogy of Fallot. Differential single nucleotide polymorphisms and small InDels were found by alignment within and between families and were then verified by Sanger sequencing. Tetralogy of Fallot-related genes were determined by analysis using Gene Ontology/pathway, Online Mendelian Inheritance in Man, PubMed and other databases. A total of sixteen differential single nucleotide polymorphism loci and eight differential small InDels were discovered. The sixteen differential single nucleotide polymorphism loci were located on Chr 1, 2, 4, 5, 11, 12, 15, 22 and X; six of them had not been reported previously. The eight differential small InDels were located on Chr 2, 4, 9, 12, 17, 19 and X; two of them had not been reported previously. Analysis using Gene Ontology/pathway, Online Mendelian Inheritance in Man, PubMed and other databases revealed that PEX5, NACA, ATXN2, CELA1, PCDHB4 and CTBP1 were associated with Tetralogy of Fallot. Our findings identify PEX5, NACA, ATXN2, CELA1, PCDHB4 and CTBP1 mutations as underlying genetic causes of isolated tetralogy of Fallot.

  19. Exome Sequencing Identifies Potential Risk Variants for Mendelian Disorders at High Prevalence in Qatar

    Science.gov (United States)

    Rodriguez-Flores, Juan L.; Fakhro, Khalid; Hackett, Neil R.; Salit, Jacqueline; Fuller, Jennifer; Agosto-Perez, Francisco; Gharbiah, Maey; Malek, Joel A.; Zirie, Mahmoud; Jayyousi, Amin; Badii, Ramin; Al-Marri, Ajayeb Al-Nabet; Chouchane, Lotfi; Stadler, Dora J.; Hunter-Zinck, Haley; Mezey, Jason G.; Crystal, Ronald G.

    2013-01-01

    Exome sequencing of families of related individuals has been highly successful in identifying genetic polymorphisms responsible for Mendelian disorders. Here, we demonstrate the value of the reverse approach, where we use exome sequencing of a sample of unrelated individuals to analyze allele frequencies of known causal mutations for Mendelian diseases. We sequenced the exomes of 100 individuals representing the three major genetic subgroups of the Qatari population (Q1 Bedouin, Q2 Persian-South Asian, Q3 African) and identified 37 variants in 33 genes with effects on 36 clinically significant Mendelian diseases. These include variants not present in 1000 Genomes and variants at high frequency when compared to 1000 Genomes populations. Several of these Mendelian variants were only segregating in one Qatari subpopulation, where the observed subpopulation specificity trends were confirmed in an independent population of 386 Qataris. Pre-marital genetic screening in Qatar tests for only 4 out of the 37, such that this study provides a set of Mendelian disease variants with potential impact on the epidemiological profile of the population that could be incorporated into the testing program if further experimental and clinical characterization confirms high penetrance. PMID:24123366

  20. Complete genome sequence of Clostridium estertheticum DSM 8809, a microbe identified in spoiled vacuum packed beef

    Directory of Open Access Journals (Sweden)

    Zhongyi Yu

    2016-11-01

    Full Text Available Blown pack spoilage (BPS) is a major issue for the beef industry. Aetiological agents of BPS involve members of a group of Clostridium species, including Clostridium estertheticum, which has the ability to produce gas, mostly carbon dioxide, under anaerobic psychrotrophic growth conditions. This spore-forming bacterium grows slowly under laboratory conditions, and it can take up to 3 months to produce a workable culture. These characteristics have limited the study of this commercially challenging bacterium. Consequently, information on this bacterium is limited and no effective controls are currently available to confidently detect and manage this production risk. In this study the complete genome of Clostridium estertheticum DSM 8809 was determined by SMRT® sequencing. The genome consists of a circular chromosome of 4.7 Mbp along with a single plasmid carrying a potential tellurite resistance gene tehB and a Tn3-like resolvase-encoding gene tnpR. The genome sequence was searched for central metabolic pathways that would support its biochemical profile and several enzymes contributing to this phenotype were identified. Several putative antibiotic/biocide/metal resistance-encoding genes and virulence factors were also identified in the genome, a feature that requires further research. The availability of the genome sequence will provide a basic blueprint from which to develop valuable biomarkers that could support and improve the detection and control of this bacterium along the beef production chain.

  1. Somatic mosaicism of a CDKL5 mutation identified by next-generation sequencing.

    Science.gov (United States)

    Kato, Takeshi; Morisada, Naoya; Nagase, Hiroaki; Nishiyama, Masahiro; Toyoshima, Daisaku; Nakagawa, Taku; Maruyama, Azusa; Fu, Xue Jun; Nozu, Kandai; Wada, Hiroko; Takada, Satoshi; Iijima, Kazumoto

    2015-10-01

    CDKL5-related encephalopathy is an X-linked dominantly inherited disorder that is characterized by early infantile epileptic encephalopathy or atypical Rett syndrome. We describe a 5-year-old Japanese boy with intractable epilepsy, severe developmental delay, and Rett syndrome-like features. Onset was at 2 months, when his electroencephalogram showed sporadic single poly spikes and diffuse irregular poly spikes. We conducted a genetic analysis using an Illumina® TruSight™ One sequencing panel on a next-generation sequencer. We identified two epilepsy-associated single nucleotide variants in our case: CDKL5 p.Ala40Val and KCNQ2 p.Glu515Asp. CDKL5 p.Ala40Val has been previously reported to be responsible for early infantile epileptic encephalopathy. In our case, the CDKL5 heterozygous mutation showed somatic mosaicism because the boy's karyotype was 46,XY. The KCNQ2 variant p.Glu515Asp is known to cause benign familial neonatal seizures-1, and this variant showed paternal inheritance. Although we believe that the somatic mosaic CDKL5 mutation is mainly responsible for the neurological phenotype in the patient, the KCNQ2 variant might have some neurological effect. Genetic analysis by next-generation sequencing is capable of identifying multiple variants in a patient. Copyright © 2015 The Japanese Society of Child Neurology. Published by Elsevier B.V. All rights reserved.

  2. Genetic mapping and exome sequencing identify variants associated with five novel diseases.

    Directory of Open Access Journals (Sweden)

    Erik G Puffenberger

    Full Text Available The Clinic for Special Children (CSC) has integrated biochemical and molecular methods into a rural pediatric practice serving Old Order Amish and Mennonite (Plain) children. Among the Plain people, we have used single nucleotide polymorphism (SNP) microarrays to genetically map recessive disorders to large autozygous haplotype blocks (mean = 4.4 Mb) that contain many genes (mean = 79). For some, uninformative mapping or large gene lists preclude disease-gene identification by Sanger sequencing. Seven such conditions were selected for exome sequencing at the Broad Institute; all had been previously mapped at the CSC using low density SNP microarrays coupled with autozygosity and linkage analyses. Using between 1 and 5 patient samples per disorder, we identified sequence variants in the known disease-causing genes SLC6A3 and FLVCR1, and present evidence to strongly support the pathogenicity of variants identified in TUBGCP6, BRAT1, SNIP1, CRADD, and HARS. Our results reveal the power of coupling new genotyping technologies to population-specific genetic knowledge and robust clinical data.
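
    The mapping step above, narrowing a recessive disorder to an autozygous block shared by patients, can be sketched as an interval intersection. This assumes each patient's autozygous blocks on one chromosome have already been called (for example as SNP-array runs of homozygosity) and are given as half-open (start, end) base-pair intervals; the function names and coordinates are hypothetical, not the CSC's pipeline.

```python
def intersect_two(a, b):
    """Intersect two sorted lists of non-overlapping half-open intervals."""
    i = j = 0
    shared = []
    while i < len(a) and j < len(b):
        start = max(a[i][0], b[j][0])
        end = min(a[i][1], b[j][1])
        if start < end:
            shared.append((start, end))
        # Advance whichever interval finishes first; the other may still overlap more.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return shared


def shared_autozygous_blocks(patients):
    """Return the regions that are autozygous in every patient."""
    shared = sorted(patients[0])
    for blocks in patients[1:]:
        shared = intersect_two(shared, sorted(blocks))
    return shared


patients = [
    [(1_000_000, 6_000_000), (10_000_000, 12_000_000)],
    [(2_000_000, 7_000_000)],
    [(3_000_000, 5_500_000), (11_000_000, 13_000_000)],
]
print(shared_autozygous_blocks(patients))  # [(3000000, 5500000)]
```

    Genes falling inside the surviving interval form the candidate list that exome sequencing then has to resolve.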

  3. Expressed sequences tags of the anther smut fungus, Microbotryum violaceum, identify mating and pathogenicity genes

    Directory of Open Access Journals (Sweden)

    Devier Benjamin

    2007-08-01

    Full Text Available Abstract Background The basidiomycete fungus Microbotryum violaceum is responsible for the anther-smut disease in many plants of the Caryophyllaceae family and is a model in genetics and evolutionary biology. Infection is initiated by dikaryotic hyphae produced after the conjugation of two haploid sporidia of opposite mating type. This study describes M. violaceum ESTs corresponding to nuclear genes expressed during conjugation and early hyphal production. Results A normalized cDNA library generated 24,128 sequences, which were assembled into 7,765 unique genes; 25.2% of them displayed significant similarity to annotated proteins from other organisms, 74.3% a weak similarity to the same set of known proteins, and 0.5% were orphans. We identified putative pheromone receptors and genes that in other fungi are involved in the mating process. We also identified many sequences similar to genes known to be involved in pathogenicity in other fungi. The M. violaceum EST database, MICROBASE, is available on the Web and provides access to the sequences, assembled contigs, annotations and programs to compare similarities against MICROBASE. Conclusion This study provides a basis for cloning the mating type locus, for further investigation of pathogenicity genes in the anther smut fungi, and for comparative genomics.

  4. Bm86 midgut protein sequence variation in South Texas cattle fever ticks

    Directory of Open Access Journals (Sweden)

    Kammlah Diane M

    2010-11-01

    Full Text Available Abstract Background Cattle fever ticks, Rhipicephalus (Boophilus) microplus and R. (B.) annulatus, vector bovine and equine babesiosis, and have significantly expanded beyond the permanent quarantine zone established in South Texas. Currently, there are no vaccines approved for use within the United States for controlling these vectors. Vaccines developed in Australia and Cuba based on the midgut antigen Bm86 have variable efficacy against cattle fever ticks. A possible explanation for this variation in vaccine efficacy is amino acid sequence divergence between the recombinant Bm86 vaccine component and native Bm86 expressed in ticks from different geographical regions of the world. Results There was 91.8% amino acid sequence identity in Bm86 among R. microplus and R. annulatus sequenced from South Texas infestations. When South Texas isolates were compared to the Australian Yeerongpilly and Cuban Camcord vaccine strains, there was 89.8% and 90.0% identity, respectively. Most of the sequence divergence was focused in one region of the protein, amino acids 206-298. Hydrophilicity profiles revealed that two short regions of Bm86 (amino acids 206-210 and 560-570) appear to be more hydrophilic in South Texas isolates compared to vaccine strains. Only one amino acid difference was found between South Texas and vaccine strains within two previously described B-cell epitopes. A total of 4 amino acid differences were observed within three peptides previously shown to induce protective immune responses in cattle. Conclusions Sequence differences between South Texas isolates and Yeerongpilly and Camcord strains are spread throughout the entire Bm86 sequence, suggesting that geographic variation does exist. Differences within previously described B-cell epitopes between South Texas isolates and vaccine strains are minimal; however, short regions of hydrophilic amino acids found unique to South Texas isolates suggest that additional unique surface exposed
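
    Identity figures such as 91.8% are pairwise percentages over aligned sequences. A minimal sketch, assuming the sequences are already aligned to equal length with '-' for gaps, and counting a gap opposite a residue as a mismatch; conventions differ between tools and the paper's exact convention is not stated. Function names are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percent identical positions between two pre-aligned sequences.

    Columns that are gaps in both sequences are skipped; a gap opposite a
    residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for x, y in zip(seq_a, seq_b):
        if x == gap and y == gap:
            continue
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * identical / compared


def region_identity(seq_a: str, seq_b: str, start: int, end: int) -> float:
    """Identity over 1-based, inclusive alignment columns start..end."""
    return percent_identity(seq_a[start - 1:end], seq_b[start - 1:end])
```

    For example, region_identity(a, b, 206, 298) would examine the divergent stretch; note that alignment columns match residue numbers only if no gaps precede them.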

  5. Ultra Deep Sequencing of a Baculovirus Population Reveals Widespread Genomic Variations

    Directory of Open Access Journals (Sweden)

    Aurélien Chateigner

    2015-07-01

    Full Text Available Viruses rely on widespread genetic variation and large population size for adaptation. Large DNA virus populations are thought to harbor little variation though natural populations may be polymorphic. To measure the genetic variation present in a dsDNA virus population, we deep sequenced a natural strain of the baculovirus Autographa californica multiple nucleopolyhedrovirus. With 124,221X average genome coverage of our 133,926 bp long consensus, we could detect low frequency mutations (0.025%). K-means clustering was used to classify the mutations in four categories according to their frequency in the population. We found 60 high frequency non-synonymous mutations under balancing selection distributed in all functional classes. These mutants could alter viral adaptation dynamics, either through competitive or synergistic processes. Lastly, we developed a technique for the delimitation of large deletions in next generation sequencing data. We found that large deletions occur along the entire viral genome, with hotspots located in homologous repeat regions (hrs). Present in 25.4% of the genomes, these deletion mutants presumably require functional complementation to complete their infection cycle. They might thus have a large impact on the fitness of the baculovirus population. Altogether, we found a wide breadth of genomic variation in the baculovirus population, suggesting it has high adaptive potential.
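
    The classification of mutations into four frequency categories can be sketched with one-dimensional k-means (Lloyd's algorithm). The initialisation, iteration limit and example frequencies below are assumptions for illustration; the authors' settings are not given in the abstract.

```python
def kmeans_1d(values, k, max_iter=100):
    """Cluster scalar values into k groups with Lloyd's algorithm.

    Centroids start at evenly spaced order statistics, so the initial
    clusters already span the range of the data. Returns (centroids, labels),
    with labels in the same order as `values`.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < k:
        raise ValueError("need at least k values")
    centroids = [ordered[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    labels = [0] * n
    for _ in range(max_iter):
        # Assignment step: each value goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    return centroids, labels


freqs = [0.0003, 0.0004, 0.0005, 0.002, 0.0025, 0.03, 0.035, 0.4, 0.45]
centroids, labels = kmeans_1d(freqs, 4)
print(labels)
```

    Because mutation frequencies span several orders of magnitude, clustering log10 frequencies may separate the low-frequency classes more evenly; which scale the study used is not stated here.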

  6. Whole-genome and Transcriptome Sequencing of Prostate Cancer Identify New Genetic Alterations Driving Disease Progression

    DEFF Research Database (Denmark)

    Ren, Shancheng; Wei, Gong-Hong; Liu, Dongbing

    2018-01-01

    BACKGROUND: Global disparities in prostate cancer (PCa) incidence highlight the urgent need to identify genomic abnormalities in prostate tumors in different ethnic populations including Asian men. OBJECTIVE: To systematically explore the genomic complexity and define disease-driven genetic alterations in PCa. DESIGN, SETTING, AND PARTICIPANTS: The study sequenced whole-genome and transcriptome of tumor-benign paired tissues from 65 treatment-naive Chinese PCa patients. Subsequent targeted deep sequencing of 293 PCa-relevant genes was performed in another cohort of 145 prostate tumors. OUTCOME... ...-scale and comprehensive genomic data of prostate cancer from Asian population. Identification of these genetic alterations may help advance prostate cancer diagnosis, prognosis, and treatment.

  7. Identifying significant temporal variation in time course microarray data without replicates

    Directory of Open Access Journals (Sweden)

    Porter Weston

    2009-03-01

    Full Text Available Abstract Background An important component of time course microarray studies is the identification of genes that demonstrate significant time-dependent variation in their expression levels. Until recently, available methods for performing such significance tests required replicates of individual time points. This paper describes a replicate-free method that was developed as part of a study of the estrous cycle in the rat mammary gland in which no replicate data were collected. Results A temporal test statistic is proposed that is based on the degree to which data are smoothed when fit by a spline function. An algorithm is presented that uses this test statistic together with a false discovery rate method to identify genes whose expression profiles exhibit significant temporal variation. The algorithm is tested on simulated data, and is compared with another recently published replicate-free method. The simulated data consist of both genes with known temporal dependencies and genes from a null distribution. The proposed algorithm identifies a larger percentage of the time-dependent genes for a given false discovery rate. Use of the algorithm in a study of the estrous cycle in the rat mammary gland resulted in the identification of genes exhibiting distinct circadian variation. These results were confirmed in follow-up laboratory experiments. Conclusion The proposed algorithm provides a new approach for identifying expression profiles with significant temporal variation without relying on replicates. When compared with a recently published algorithm on simulated data, the proposed algorithm appears to identify a larger percentage of time-dependent genes for a given false discovery rate. The development of the algorithm was instrumental in revealing the presence of circadian variation in the virgin rat mammary gland during the estrous cycle.
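
    The abstract pairs its spline-based statistic with "a false discovery rate method" without naming it. As an assumption, the sketch below uses the Benjamini-Hochberg step-up procedure, the most common such method, on per-gene p-values that would come from the temporal test statistic (which is not reproduced here). The example p-values are made up.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return sorted indices of hypotheses rejected at the given FDR level.

    Step-up rule: find the largest rank r with p_(r) <= r / m * fdr and
    reject the r smallest p-values.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff = rank
    return sorted(order[:cutoff])


p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(p))  # [0, 1]
```

    The step-up form means a p-value that fails its own threshold can still be rejected if a larger-ranked p-value passes.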

  8. Sequence-Based Introgression Mapping Identifies Candidate White Mold Tolerance Genes in Common Bean

    Directory of Open Access Journals (Sweden)

    Sujan Mamidi

    2016-07-01

    Full Text Available White mold, caused by the necrotrophic fungus Sclerotinia sclerotiorum (Lib.) de Bary, is a major disease of common bean (Phaseolus vulgaris L.). WM7.1 and WM8.3 are two quantitative trait loci (QTL) with major effects on tolerance to the pathogen. Advanced backcross populations segregating individually for either of the two QTL, and a recombinant inbred (RI) population segregating for both QTL, were used to fine map and confirm the genetic location of the QTL. The QTL intervals were physically mapped using the reference common bean genome sequence, and the physical intervals for each QTL were further confirmed by sequence-based introgression mapping. Using whole-genome sequence data from susceptible and tolerant DNA pools, introgressed regions were identified as those with significantly higher numbers of single-nucleotide polymorphisms (SNPs) relative to the whole genome. By combining the QTL and SNP data, WM7.1 was located to a 660-kb region that contained 41 gene models on the proximal end of chromosome Pv07, while the WM8.3 introgression was narrowed to a 1.36-Mb region containing 70 gene models. The most polymorphic candidate gene in the WM7.1 region encodes a BEACH-domain protein associated with apoptosis. Within the WM8.3 interval, a receptor-like protein with the potential to recognize pathogen effectors was the most polymorphic gene. The use of gene and sequence-based mapping identified two candidate genes whose putative functions are consistent with the current model of pathogenicity.
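
    The introgression scan flags regions with significantly more SNPs than the genome-wide background. A minimal sketch, assuming 0-based SNP positions (all less than the sequence length) from one pool against a single reference sequence, fixed non-overlapping windows, and a one-sided Poisson test against the genome-wide mean count per window; the paper's actual windowing and test are not given here, and the `alpha` threshold is hypothetical.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), via 1 - P(X <= k - 1)."""
    term = math.exp(-lam)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= lam / (i + 1)
    return max(0.0, 1.0 - cdf)


def snp_dense_windows(snp_positions, genome_length, window, alpha=1e-6):
    """Return (start, end, count) for windows with a significant SNP excess."""
    n_windows = math.ceil(genome_length / window)
    counts = [0] * n_windows
    for pos in snp_positions:
        counts[pos // window] += 1
    expected = len(snp_positions) / n_windows  # genome-wide mean per window
    hits = []
    for w, c in enumerate(counts):
        if c > expected and poisson_sf(c, expected) < alpha:
            hits.append((w * window, min((w + 1) * window, genome_length), c))
    return hits


positions = [w * 100 + off for w in range(10) for off in (10, 50)] + [500 + i for i in range(30)]
print(snp_dense_windows(positions, 1000, 100))  # [(500, 600, 32)]
```

    Comparing tolerant and susceptible pools, as the study did, would mean running such a scan per pool and intersecting the flagged windows with the QTL intervals.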

  9. Whole-Genome Sequencing of Sordaria macrospora Mutants Identifies Developmental Genes.

    Science.gov (United States)

    Nowrousian, Minou; Teichert, Ines; Masloff, Sandra; Kück, Ulrich

    2012-02-01

    The study of mutants to elucidate gene functions has a long and successful history; however, to discover causative mutations in mutants that were generated by random mutagenesis often takes years of laboratory work and requires previously generated genetic and/or physical markers, or resources like DNA libraries for complementation. Here, we present an alternative method to identify defective genes in developmental mutants of the filamentous fungus Sordaria macrospora through Illumina/Solexa whole-genome sequencing. We sequenced pooled DNA from progeny of crosses of three mutants and the wild type and were able to pinpoint the causative mutations in the mutant strains through bioinformatics analysis. One mutant is a spore color mutant, and the mutated gene encodes a melanin biosynthesis enzyme. The causative mutation is a G to A change in the first base of an intron, leading to a splice defect. The second mutant carries an allelic mutation in the pro41 gene encoding a protein essential for sexual development. In the mutant, we detected a complex pattern of deletion/rearrangements at the pro41 locus. In the third mutant, a point mutation in the stop codon of a transcription factor-encoding gene leads to the production of immature fruiting bodies. For all mutants, transformation with a wild-type copy of the affected gene restored the wild-type phenotype. Our data demonstrate that whole-genome sequencing of mutant strains is a rapid method to identify developmental genes in an organism that can be genetically crossed and where a reference genome sequence is available, even without prior mapping information.
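
    The bioinformatic step that pinpoints causative mutations can be sketched as an allele-frequency filter. This assumes, hypothetically, per-variant read counts from a pool of mutant-phenotype progeny and from a wild-type sample: a causative mutation should be at or near fixation in the mutant pool and absent from the wild type, while unlinked variants segregate. The `Variant` record and all thresholds are illustrative, not the authors' pipeline.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    mutant_alt_reads: int  # reads supporting alt in the mutant-phenotype pool
    mutant_depth: int
    wt_alt_reads: int  # reads supporting alt in the wild-type sample
    wt_depth: int


def candidate_causative(variants, min_depth=10, mutant_min_af=0.9, wt_max_af=0.1):
    """Keep variants (near-)fixed in the mutant pool and absent from wild type."""
    hits = []
    for v in variants:
        if v.mutant_depth < min_depth or v.wt_depth < min_depth:
            continue  # too few reads to judge allele frequency
        mutant_af = v.mutant_alt_reads / v.mutant_depth
        wt_af = v.wt_alt_reads / v.wt_depth
        if mutant_af >= mutant_min_af and wt_af <= wt_max_af:
            hits.append(v)
    return hits


variants = [
    Variant("chr1", 100, "G", "A", 40, 40, 0, 35),  # fixed in mutants only
    Variant("chr1", 200, "C", "T", 20, 40, 18, 36),  # segregating in both
    Variant("chr2", 300, "A", "G", 5, 5, 0, 30),  # too shallow in mutant pool
]
print([v.pos for v in candidate_causative(variants)])  # [100]
```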

  10. Sequence variation and phylogenetic analysis of envelope glycoprotein of hepatitis G virus.

    Science.gov (United States)

    Lim, M Y; Fry, K; Yun, A; Chong, S; Linnen, J; Fung, K; Kim, J P

    1997-11-01

    A transfusion-transmissible agent provisionally designated hepatitis G virus (HGV) was recently identified. In this study, we examined the variability of the HGV genome by analysing sequences in the putative envelope region from 72 isolates obtained from diverse geographical sources. The 1561-nucleotide sequence of the E1/E2/NS2a region of HGV was determined from 12 isolates, and compared with three published sequences. The greatest variability was observed in 400 nucleotides at the N terminus of E2. We next analysed this 400-nucleotide envelope variable region (EV) from an additional 60 HGV isolates. This sequence varied considerably among the 75 isolates, with overall identity ranging from 79.3% to 99.5% at the nucleotide level, and from 83.5% to 100% at the amino acid level. However, hypervariable regions were not identified. Phylogenetic analyses indicated that the 75 HGV isolates belong to a single genotype. A single-tier distribution of evolutionary distances was observed among the 15 E1/E2/NS2a sequences and the 75 EV sequences. In contrast, 11 isolates of HCV were analysed and showed a three-tiered distribution, representing genotypes, subtypes, and isolates. The 75 isolates of HGV fell into four clusters on the phylogenetic tree. Tight geographical clustering was observed among the HGV isolates from Japan and Korea.
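
    Nucleotide identities like those above convert to evolutionary distances. The abstract does not state which substitution model was used; as an assumption, this sketch uses the p-distance (proportion of differing sites, skipping gapped columns) and the Jukes-Cantor correction d = -3/4 ln(1 - 4p/3) on pre-aligned nucleotide sequences. Looking at the spread of all pairwise distances is one way to see "tiers" like those described.

```python
import math
from itertools import combinations


def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites, ignoring columns with a gap in either sequence."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(seq_a, seq_b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped columns to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def jukes_cantor(seq_a: str, seq_b: str) -> float:
    """Jukes-Cantor corrected distance (expected substitutions per site)."""
    p = p_distance(seq_a, seq_b)
    if p >= 0.75:
        raise ValueError("p-distance too large for the Jukes-Cantor correction")
    return -0.75 * math.log(1 - 4 * p / 3)


def pairwise_distances(seqs: dict[str, str]) -> dict[tuple[str, str], float]:
    """All pairwise Jukes-Cantor distances, keyed by (name_a, name_b)."""
    return {(a, b): jukes_cantor(seqs[a], seqs[b]) for a, b in combinations(seqs, 2)}
```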

  11. Contig Maps and Genomic Sequencing Identify Candidate Genes in the Usher 1C Locus

    Science.gov (United States)

    Higgins, Michael J.; Day, Colleen D.; Smilinich, Nancy J.; Ni, L.; Cooper, Paul R.; Nowak, Norma J.; Davies, Chris; de Jong, Pieter J.; Hejtmancik, Fielding; Evans, Glen A.; Smith, Richard J.H.; Shows, Thomas B.

    1998-01-01

    Usher syndrome 1C (USH1C) is a congenital condition manifesting profound hearing loss, the absence of vestibular function, and eventual retinal degeneration. The USH1C locus has been mapped genetically to a 2- to 3-cM interval in 11p14–15.1 between D11S899 and D11S861. In an effort to identify the USH1C disease gene we have isolated the region between these markers in yeast artificial chromosomes (YACs) using a combination of STS content mapping and Alu–PCR hybridization. The YAC contig is ∼3.5 Mb and has located several other loci within this interval, resulting in the order CEN-LDHA-SAA1-TPH-D11S1310-(D11S1888/KCNC1)-MYOD1-D11S902-D11S921-D11S1890-TEL. Subsequent haplotyping and homozygosity analysis refined the location of the disease gene to a 400-kb interval between D11S902 and D11S1890 with all affected individuals being homozygous for the internal marker D11S921. To facilitate gene identification, the critical region has been converted into P1 artificial chromosome (PAC) clones using sequence-tagged sites (STSs) mapped to the YAC contig, Alu–PCR products generated from the YACs, and PAC end probes. A contig of >50 PAC clones has been assembled between D11S1310 and D11S1890, confirming the order of markers used in haplotyping. Three PAC clones representing nearly two-thirds of the USH1C critical region have been sequenced. PowerBLAST analysis identified six clusters of expressed sequence tags (ESTs), two known genes (BIR, SUR1) mapped previously to this region, and a previously characterized but unmapped gene NEFA (DNA binding/EF hand/acidic amino-acid-rich). GRAIL analysis identified 11 CpG islands and 73 exons of excellent quality. These data allowed the construction of a transcription map for the USH1C critical region, consisting of three known genes and six or more novel transcripts. Based on their map location, these loci represent candidate disease loci for USH1C. The NEFA gene was assessed as the USH1C locus by the sequencing of an amplified NEFA

  12. Modeling bias and variation in the stochastic processes of small RNA sequencing.

    Science.gov (United States)

    Argyropoulos, Christos; Etheridge, Alton; Sakhanenko, Nikita; Galas, David

    2017-06-20

    The use of RNA-seq as the preferred method for the discovery and validation of small RNA biomarkers has been hindered by high quantitative variability and biased sequence counts. In this paper we develop a statistical model for sequence counts that accounts for ligase bias and stochastic variation in sequence counts. This model implies a linear-quadratic relation between the mean and variance of sequence counts. Using a large number of sequencing datasets, we demonstrate how one can use the generalized additive models for location, scale and shape (GAMLSS) distributional regression framework to calculate and apply empirical correction factors for ligase bias. Bias correction could remove more than 40% of the bias for miRNAs. Empirical bias correction factors appear to be nearly constant over at least one and up to four orders of magnitude of total RNA input and independent of sample composition. Using synthetic mixes of known composition, we show that the GAMLSS approach can analyze differential expression with greater accuracy, higher sensitivity and specificity than six existing algorithms (DESeq2, edgeR, EBSeq, limma, DSS, voom) for the analysis of small RNA-seq data. © The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.
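
    The linear-quadratic mean-variance relation, Var = a*mean + b*mean^2, can be fitted by ordinary least squares without an intercept. This is only an illustration of that relation on assumed replicate counts, not the paper's GAMLSS model or its ligase-bias correction. Fixing a = 1 gives the familiar negative-binomial form Var = mean + phi*mean^2.

```python
from statistics import mean, variance


def mean_variance_pairs(replicate_counts):
    """Per-sequence (mean, sample variance) from lists of replicate counts."""
    return [(mean(c), variance(c)) for c in replicate_counts]


def fit_linear_quadratic(pairs):
    """Least-squares fit of var = a*mu + b*mu**2 (no intercept).

    Solves the 2x2 normal equations for the regressors mu and mu**2.
    """
    s11 = sum(mu ** 2 for mu, _ in pairs)
    s12 = sum(mu ** 3 for mu, _ in pairs)
    s22 = sum(mu ** 4 for mu, _ in pairs)
    t1 = sum(mu * v for mu, v in pairs)
    t2 = sum(mu ** 2 * v for mu, v in pairs)
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("need at least two distinct non-zero means")
    a = (t1 * s22 - t2 * s12) / det
    b = (s11 * t2 - s12 * t1) / det
    return a, b


# Noise-free illustrative data generated from a = 2, b = 0.1.
pairs = [(mu, 2 * mu + 0.1 * mu ** 2) for mu in (1, 5, 10, 50, 100)]
print(fit_linear_quadratic(pairs))
```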

  13. Exome Sequencing and Linkage Analysis Identified Novel Candidate Genes in Recessive Intellectual Disability Associated with Ataxia.

    Science.gov (United States)

    Jazayeri, Roshanak; Hu, Hao; Fattahi, Zohreh; Musante, Luciana; Abedini, Seyedeh Sedigheh; Hosseini, Masoumeh; Wienker, Thomas F; Ropers, Hans Hilger; Najmabadi, Hossein; Kahrizi, Kimia

    2015-10-01

    Intellectual disability (ID) is a neurodevelopmental disorder that causes considerable socio-economic problems. Some individuals with ID are also affected by ataxia, and the condition can result from mutations in several different genes. We used whole exome sequencing (WES) in combination with homozygosity mapping (HM) to identify the genetic defects in five consanguineous families among our cohort study, with two affected children with ID and ataxia as major clinical symptoms. We identified three novel candidate genes, RIPPLY1, MRPL10 and SNX14, and a new mutation in the known gene SURF1. All are autosomal genes, except RIPPLY1, which is located on the X chromosome. Two are housekeeping genes, implicated in transcription and translation regulation and intracellular trafficking, and two encode mitochondrial proteins. The pathogenicity of these variants was evaluated by mutation classification, bioinformatic methods, review of medical and biological relevance, co-segregation studies in the particular family, and a normal population study. Combining linkage analysis with exome sequencing of a small number of affected family members is a powerful new technique that can be used to decrease the number of candidate genes in heterogeneous disorders such as ID, and may even identify the responsible gene(s).
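
    Homozygosity mapping in consanguineous families starts from runs of homozygous markers in affected children. A minimal sketch, assuming genotype calls coded as alternate-allele counts (0, 1, 2; None for missing) at sorted positions on one chromosome. Here a run ends at the first heterozygous call, whereas real tools usually tolerate occasional genotyping errors and also apply a minimum physical length; `min_markers` is a hypothetical threshold.

```python
def runs_of_homozygosity(calls, min_markers=3):
    """Return (start, end, n_markers) for homozygous runs.

    calls: iterable of (position, genotype) sorted by position, genotype being
    0 or 2 (homozygous), 1 (heterozygous) or None (missing, skipped).
    """
    runs = []
    start = end = None
    count = 0
    for pos, genotype in calls:
        if genotype is None:
            continue  # missing calls neither extend nor break a run
        if genotype in (0, 2):
            if start is None:
                start, count = pos, 0
            end = pos
            count += 1
        else:  # a heterozygous call terminates the current run
            if start is not None and count >= min_markers:
                runs.append((start, end, count))
            start = None
    if start is not None and count >= min_markers:
        runs.append((start, end, count))
    return runs


calls = [(100, 0), (200, 2), (300, 0), (400, 1), (500, 2), (550, None), (600, 2)]
print(runs_of_homozygosity(calls))  # [(100, 300, 3)]
```

    Runs found in both affected siblings of a family, and absent from unaffected relatives, would then bound the candidate region that exome data are filtered against.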

  14. Globicatella sanguinis bacteraemia identified by partial 16S rRNA gene sequencing

    DEFF Research Database (Denmark)

    Abdul-Redha, Rawaa Jalil; Balslew, Ulla; Christensen, Jens Jørgen

    2007-01-01

    Globicatella sanguinis is a gram-positive coccus resembling non-haemolytic streptococci. The organism has been isolated infrequently from normally sterile sites in humans. Three isolates obtained by blood culture could not be identified by Rapid 32 ID Strep, but partial sequencing of the 16S rRNA gene revealed the identity of the isolated bacteria, and supplementary biochemical tests confirmed the species identification. The case histories illustrate the dilemma of finding relevant, newly recognized, opportunistic pathogens and the identification achievement(s) that can be obtained by using...

  15. RePS: a sequence assembler that masks exact repeats identified from the shotgun data

    DEFF Research Database (Denmark)

    Wang, Jun; Wong, Gane Ka-Shu; Ni, Peixiang

    2002-01-01

    We describe a sequence assembler, RePS (repeat-masked Phrap with scaffolding), that explicitly identifies exact 20mer repeats from the shotgun data and removes them prior to the assembly. The established software is used to compute meaningful error probabilities for each base. Clone-end-pairing information is used to construct scaffolds that order and orient the contigs. We show with real data for human and rice that reasonable assemblies are possible even at coverages of only 4x to 6x, despite up to 42.2% of the sequence consisting of exact repeats. Publication date: May 2002.
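
    The repeat-masking idea can be sketched as k-mer counting: a 20-mer from unique sequence should appear roughly in proportion to coverage, so 20-mers seen far more often are treated as exact repeats and masked before assembly. Assumptions: reads are plain strings on one strand (a real implementation would also count reverse complements), and `threshold` is a hypothetical parameter chosen from the expected coverage. This is not the RePS code.

```python
from collections import Counter


def kmer_counts(reads, k=20):
    """Count every k-mer occurrence across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def mask_repeats(read, counts, threshold, k=20, mask_char="N"):
    """Replace bases covered by any over-represented k-mer with mask_char."""
    masked = list(read)
    for i in range(len(read) - k + 1):
        if counts[read[i:i + k]] > threshold:
            masked[i:i + k] = mask_char * k
    return "".join(masked)


# Toy example with k = 3 so the repeated k-mer is easy to see.
reads = ["ACGTT", "ACGAA", "ACGCC"]
counts = kmer_counts(reads, k=3)
print(mask_repeats("ACGTT", counts, threshold=2, k=3))  # NNNTT
```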

  16. Quantitative Genetics Identifies Cryptic Genetic Variation Involved in the Paternal Regulation of Seed Development.

    Directory of Open Access Journals (Sweden)

    Nuno D Pires

    2016-01-01

    Full Text Available Embryonic development requires a correct balancing of maternal and paternal genetic information. This balance is mediated by genomic imprinting, an epigenetic mechanism that leads to parent-of-origin-dependent gene expression. The parental conflict (or kinship) theory proposes that imprinting can evolve due to a conflict between maternal and paternal alleles over resource allocation during seed development. One assumption of this theory is that paternal alleles can regulate seed growth; however, paternal effects on seed size are often very low or non-existent. We demonstrate that there is a pool of cryptic genetic variation in the paternal control of Arabidopsis thaliana seed development. Such cryptic variation can be exposed in seeds that maternally inherit a medea mutation, suggesting that MEA acts as a maternal buffer of paternal effects. Genetic mapping using recombinant inbred lines, and a novel method for the mapping of parent-of-origin effects using whole-genome sequencing of segregant bulks, indicate that there are at least six loci with small, paternal effects on seed development. Together, our analyses reveal the existence of a pool of hidden genetic variation on the paternal control of seed development that is likely shaped by parental conflict.

  17. Completed Ensemble Empirical Mode Decomposition: a Robust Signal Processing Tool to Identify Sequence Strata

    Science.gov (United States)

    Purba, H.; Musu, J. T.; Diria, S. A.; Permono, W.; Sadjati, O.; Sopandi, I.; Ruzi, F.

    2018-03-01

    Well logging data provide a wealth of geological information, and log trends resemble nonlinear or non-stationary signals. Because well logs are recorded over long intervals, external factors can interfere with and degrade their signal resolution. A sensitive signal analysis is therefore required to improve the accuracy of log interpretation, which is important for determining sequence stratigraphy. Complete Ensemble Empirical Mode Decomposition (CEEMD) is a nonlinear and non-stationary signal analysis method that decomposes a complex signal into a series of intrinsic mode functions (IMFs). Gamma Ray and Spontaneous Potential well logs were decomposed into IMF-1 through IMF-10, and the combinations and correlations of these IMFs were interpreted for their physical meaning. The method identifies stratigraphy and sequence cycles and provides an effective signal treatment for locating sequence interfaces. It was applied to BRK-30 and BRK-13 well logging data. The results show that the combined pattern of IMF-5, IMF-6 and IMF-7 represents short-term and middle-term sedimentation, while IMF-9 and IMF-10 represent long-term sedimentation, describing distal front and delta front facies, and inter-distributary mouth bar facies, respectively. Thus, CEEMD can clearly determine the interfaces between sedimentary layers and improve identification of the cycle of stratigraphic base level.

  18. Natural selection in a population of Drosophila melanogaster explained by changes in gene expression caused by sequence variation in core promoter regions.

    Science.gov (United States)

    Sato, Mitsuhiko P; Makino, Takashi; Kawata, Masakado

    2016-02-09

    Understanding the evolutionary forces that influence variation in gene regulatory regions in natural populations is an important challenge for evolutionary biology because natural selection for such variations could promote adaptive phenotypic evolution. Recently, whole-genome sequence analyses have identified regulatory regions subject to natural selection. However, these studies could not identify the relationship between sequence variation in the detected regions and change in gene expression levels. We analyzed sequence variations in core promoter regions, which are critical regions for gene regulation in higher eukaryotes, in a natural population of Drosophila melanogaster, and identified core promoter sequence variations that are associated with differences in gene expression levels and have been subject to natural selection. Among the core promoter regions whose sequence variation could change transcription factor binding sites and explain differences in expression levels, three core promoter regions were detected as candidates associated with purifying selection or selective sweep and seven as candidates associated with balancing selection, excluding the possibility of linkage between these regions and core promoter regions. The core promoter region of CHKov1, which confers resistance to the sigma virus and related insecticides, was identified as having been subject to a selective sweep, although it could not be ruled out that selection acted on linked single nucleotide polymorphisms in the regulatory region outside the core promoter. Nucleotide changes in the core promoter region of CHKov1 caused the loss of two basal transcription factor binding sites and the acquisition of one transcription factor binding site, resulting in decreased gene expression levels. Of nine core promoter regions associated with balancing selection, brat and CG9044 are associated with neuromuscular junction development, and Nmda1 is associated with learning
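
    The abstract separates candidates for purifying selection or selective sweeps from candidates for balancing selection but does not name the test here. One widely used summary that separates these patterns is Tajima's D: negative values (an excess of rare variants) are consistent with sweeps or purifying selection, and positive values (an excess of intermediate-frequency variants) with balancing selection. A sketch, assuming aligned, gap-free sequences from at least four chromosomes; it is not necessarily the statistic the authors used, and demographic history can produce the same signals.

```python
import math
from itertools import combinations


def tajimas_d(seqs):
    """Tajima's D for a list of aligned, gap-free sequences of equal length."""
    n = len(seqs)
    if n < 4:
        raise ValueError("need at least four sequences")
    # Number of segregating (polymorphic) sites.
    s = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if s == 0:
        raise ValueError("no segregating sites; D is undefined")
    # Mean number of pairwise differences between sequences.
    pairs = list(combinations(seqs, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n ** 2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


sweep_like = ["AAAAA", "TAAAA", "ATAAA", "AATAA"]  # only singleton variants
balanced_like = ["AAAA", "AAAA", "TTTT", "TTTT"]  # variants at 50% frequency
print(tajimas_d(sweep_like), tajimas_d(balanced_like))
```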

  19. Length and repeat-sequence variation in 58 STRs and 94 SNPs in two Spanish populations.

    Science.gov (United States)

    Casals, Ferran; Anglada, Roger; Bonet, Núria; Rasal, Raquel; van der Gaag, Kristiaan J; Hoogenboom, Jerry; Solé-Morata, Neus; Comas, David; Calafell, Francesc

    2017-09-01

    We genotyped the 58 STRs (27 autosomal, 24 Y-STRs and 7 X-STRs) and 94 autosomal SNPs in the Illumina ForenSeq™ Primer Mix A in 88 Spanish Roma (Gypsy) samples and 143 Catalans. Since this platform is based on massively parallel sequencing, we used simple R scripts to uncover the sequence variation in the repeat region. Across the 58 STRs, we found 541 length-based alleles, which became 804 different alleles after considering repeat-sequence variation. All loci in both populations were in Hardy-Weinberg equilibrium. FST between the two populations was 0.0178 for autosomal SNPs, 0.0146 for autosomal STRs, 0.0101 for X-STRs and 0.1866 for Y-STRs. Combined a priori statistics were very high; for instance, pooling all the autosomal loci, the a priori probabilities of discriminating a suspect were 1 - (2.3 × 10^-70) and 1 - (5.9 × 10^-73) for Roma and Catalans, respectively, and the probabilities of excluding a false father in a trio were 1 - (2.6 × 10^-20) and 1 - (2.0 × 10^-21). Copyright © 2017 Elsevier B.V. All rights reserved.
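    Pooled a priori discrimination figures of this kind are usually obtained by multiplying per-locus random match probabilities and reporting one minus the product. The sketch below is a minimal illustration, assuming the loci are independent (Hardy-Weinberg and linkage equilibrium) and that per-locus genotype frequencies are already known; the function names and the two per-locus values are hypothetical and are not taken from the authors' R scripts.

```python
import math
from typing import Iterable, Sequence

def locus_match_probability(genotype_freqs: Sequence[float]) -> float:
    """Random match probability at one locus: sum of squared genotype frequencies."""
    return sum(f * f for f in genotype_freqs)

def combined_log10_match(per_locus_pm: Iterable[float]) -> float:
    """log10 of the combined match probability across independent loci.

    Assuming independence, the combined value is the product of the per-locus
    values; summing logarithms avoids floating-point underflow when many loci
    are pooled.
    """
    total = 0.0
    for pm in per_locus_pm:
        if not 0.0 < pm <= 1.0:
            raise ValueError(f"match probability out of range: {pm}")
        total += math.log10(pm)
    return total

def format_discrimination(log10_pm: float) -> str:
    """Render the power of discrimination as '1-(m×10^e)', i.e. 1 minus the match probability."""
    exponent = math.floor(log10_pm)
    mantissa = 10 ** (log10_pm - exponent)
    return f"1-({mantissa:.1f}×10^{exponent})"

# Hypothetical per-locus match probabilities for two loci.
log_pm = combined_log10_match([0.05, 0.04])
print(format_discrimination(log_pm))  # 1-(2.0×10^-3)
```

    The result is kept as a formatted string instead of computing `1 - pm` directly because, at values around 10^-70, `1 - pm` evaluates to exactly 1.0 in double precision.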

  20. Inferring Variation in Copy Number Using High Throughput Sequencing Data in R.

    Science.gov (United States)

    Knaus, Brian J; Grünwald, Niklaus J

    2018-01-01

    Inference of copy number variation presents a technical challenge because variant callers typically require the copy number of a genome or genomic region to be known a priori. Here we present a method to infer copy number that uses variant call format (VCF) data as input and is implemented in the R package vcfR. This method is based on the relative frequency of each allele (in both genic and non-genic regions) sequenced at heterozygous positions throughout a genome. These heterozygous positions are summarized by grouping them into arbitrarily sized windows, binning the allele frequencies, and selecting the bin with the greatest abundance of positions. This provides a non-parametric summary of the frequency at which alleles were sequenced. The method is applicable to organisms whose reference genomes consist of full chromosomes or sub-chromosomal contigs. In contrast to other software designed to detect copy number variation, our method does not rely on an assumption of base ploidy, but instead infers it. We validated these approaches with the model system Saccharomyces cerevisiae and applied them to the oomycete Phytophthora infestans, both known to vary in copy number. This functionality has been incorporated into the current release of the R package vcfR to provide modular and flexible methods to investigate copy number variation in genomic projects.
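    The windowing, binning and peak-selection steps described above can be sketched as follows. This is not vcfR's code or API (vcfR is an R package); it is a minimal Python illustration, assuming reference and alternate read depths at heterozygous calls have already been extracted from the VCF, and using the frequency of the more abundant allele at each site as a simplification. All names, the window size and the bin count are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def major_allele_fraction(ref_depth: int, alt_depth: int) -> float:
    """Fraction of reads supporting the more abundant allele (always in [0.5, 1])."""
    total = ref_depth + alt_depth
    if total == 0:
        raise ValueError("position has no reads")
    return max(ref_depth, alt_depth) / total

def window_peaks(
    sites: Iterable[Tuple[int, int, int]],
    window_size: int = 10_000,
    n_bins: int = 10,
) -> Dict[int, float]:
    """Return {window_start: midpoint of the most populated frequency bin}.

    `sites` holds (position, ref_depth, alt_depth) for heterozygous calls.
    Frequencies in [0.5, 1] are split into `n_bins` equal-width bins.
    """
    windows: Dict[int, List[float]] = defaultdict(list)
    for pos, ref, alt in sites:
        start = (pos // window_size) * window_size
        windows[start].append(major_allele_fraction(ref, alt))

    width = 0.5 / n_bins
    peaks: Dict[int, float] = {}
    for start in sorted(windows):
        counts = [0] * n_bins
        for f in windows[start]:
            # Map [0.5, 1] onto bin indices; f == 1.0 falls in the last bin.
            idx = min(int((f - 0.5) * 2 * n_bins), n_bins - 1)
            counts[idx] += 1
        # First bin with the highest count is the non-parametric "peak".
        best = max(range(n_bins), key=counts.__getitem__)
        peaks[start] = 0.5 + (best + 0.5) * width
    return peaks

# Synthetic example: one window with ~1:1 allele ratios, one with ~2:1 ratios.
sites = [(100, 10, 10), (250, 11, 9), (900, 10, 10),
         (10_050, 20, 10), (10_400, 10, 20), (12_000, 19, 10)]
print(window_peaks(sites))
```

    A peak near 0.5 is what balanced heterozygous sites in a diploid region would produce, while a peak near 2/3 matches a 2:1 allele ratio such as a triploid region would give; reading ploidy off the peak, rather than assuming it, is the idea the abstract describes.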

  1. Whole-Exome Sequencing Identifies ALMS1, IQCB1, CNGA3, and MYO7A Mutations in Patients with Leber Congenital Amaurosis

    OpenAIRE

    Wang, Xia; Wang, Hui; Cao, Ming; Li, Zhe; Chen, Xianfeng; Patenia, Claire; Gore, Athurva; Abboud, Emad B.; Al-Rajhi, Ali A.; Lewis, Richard A.; Lupski, James R.; Mardon, Graeme; Zhang, Kun; Muzny, Donna; Gibbs, Richard A.

    2011-01-01

    It has been well documented that mutations in the same retinal disease gene can result in different clinical phenotypes due to differences in the mutant allele and/or genetic background. To evaluate this, a set of consanguineous families with Leber congenital amaurosis (LCA) who do not carry mutations in known LCA disease genes was characterized through homozygosity mapping followed by targeted exon/whole-exome sequencing to identify genetic variations. Among these families, a total o...

  2. Whole genome re-sequencing reveals genome-wide variations among parental lines of 16 mapping populations in chickpea (Cicer arietinum L.).

    Science.gov (United States)

    Thudi, Mahendar; Khan, Aamir W; Kumar, Vinay; Gaur, Pooran M; Katta, Krishnamohan; Garg, Vanika; Roorkiwal, Manish; Samineni, Srinivasan; Varshney, Rajeev K

    2016-01-27

    Chickpea (Cicer arietinum L.) is the second most important grain legume cultivated by resource-poor farmers in South Asia and Sub-Saharan Africa. To harness the untapped genetic potential available for chickpea improvement, we used a whole genome re-sequencing approach to re-sequence 35 chickpea genotypes representing the parental lines of 16 mapping populations segregating for abiotic stress (drought, heat, salinity), biotic stress (Fusarium wilt, Ascochyta blight, Botrytis grey mould, Helicoverpa armigera) and nutritionally important (protein content) traits. A total of 192.19 Gb of data, comprising 973.13 million reads, was generated on the 35 chickpea genotypes, with an average sequencing depth of ~10X for each line. On average, 92.18% of reads from each genotype aligned to the chickpea reference genome, with 82.17% coverage. A total of 2,058,566 unique single nucleotide polymorphisms (SNPs) and 292,588 Indels were detected relative to the reference chickpea genome. The highest number of SNPs was identified on the Ca4 pseudomolecule. In addition, copy number variations (CNVs) such as gene deletions and duplications were identified across the chickpea parental genotypes, ranging from a minimum in PI 489777 (1 gene deletion) to a maximum in JG 74 (1,497). A total of 164,856 line-specific variations (144,888 SNPs and 19,968 Indels) were identified, with the highest percentage in coding regions in ICC 1496 (21%), followed by ICCV 97105 (12%). Of 539 miscellaneous variations, 339, 138 and 62 were inter-chromosomal variations (CTX), intra-chromosomal variations (ITX) and inversions (INV), respectively. Genome-wide SNPs, Indels, CNVs, PAVs, and miscellaneous variations identified in different mapping populations are a valuable resource for genetic research and helpful in locating genes/genomic segments responsible for economically important traits. Further, the genome-wide variations identified in the present study can be used for developing high density SNP arrays for

  3. Exome sequencing of a large family identifies potential candidate genes contributing risk to bipolar disorder.

    Science.gov (United States)

    Zhang, Tianxiao; Hou, Liping; Chen, David T; McMahon, Francis J; Wang, Jen-Chyong; Rice, John P

    2018-03-01

    Bipolar disorder is a mental illness with a lifetime prevalence of about 1%. Previous genetic studies have identified multiple chromosomal linkage regions and candidate genes that might be associated with bipolar disorder. The present study aimed to identify potential susceptibility variants for bipolar disorder using 6 related case samples from a four-generation family. A combination of exome sequencing and linkage analysis was performed to identify potential susceptibility variants for bipolar disorder. Our study identified five potential candidate genes for bipolar disorder. Among these five genes, GRID1 (glutamate receptor delta-1 subunit), which was previously reported to be associated with several psychiatric disorders and brain-related traits, is particularly interesting. Variants with functional significance in this gene were identified in two cousins in our bipolar disorder pedigree. Our findings suggest a potential role for these genes and the related rare variants in the onset and development of bipolar disorder in this one family. Additional research is needed to replicate these findings and evaluate their patho-biological significance. Copyright © 2017 Elsevier B.V. All rights reserved.

  4. Transcriptome Sequencing of Chemically Induced Aquilaria sinensis to Identify Genes Related to Agarwood Formation.

    Science.gov (United States)

    Ye, Wei; Wu, Hongqing; He, Xin; Wang, Lei; Zhang, Weimin; Li, Haohua; Fan, Yunfei; Tan, Guohui; Liu, Taomei; Gao, Xiaoxia

    2016-01-01

    Agarwood is a traditional Chinese medicine used clinically as a sedative, carminative, and antiemetic drug. Agarwood is formed in Aquilaria sinensis when the trees are subjected to external physical or chemical injury or to endophytic fungal irritation. However, the mechanism of agarwood formation via chemical induction remains unclear. In this study, we characterized the transcriptome of different parts of a chemically induced A. sinensis trunk sample containing agarwood. The Illumina sequencing platform was used to identify the genes involved in agarwood formation. A five-year-old A. sinensis tree treated with formic acid was selected. The white wood part (B1 sample), the transition part between agarwood and white wood (W2 sample), the agarwood part (J3 sample), and the rotten wood part (F5 sample) were collected for transcriptome sequencing. In total, 54,685,634 clean reads were obtained with a Q20 value of 97.5% and assembled into 83,467 unigenes. A total of 50,565 unigenes were annotated using the Nr, Nt, SWISS-PROT, KEGG, COG, and GO databases. In particular, 171,331,352 unigenes were annotated to various pathways, including the sesquiterpenoid (ko00909) and plant-pathogen interaction (ko03040) pathways. These pathways are related to sesquiterpenoid biosynthesis and to defensive responses to chemical stimulation. The transcriptome data of the different parts of the chemically induced A. sinensis trunk provide a rich resource for discovering and identifying the genes involved in sesquiterpenoid production and in defensive responses to chemical stimulation. This study is the first to use de novo sequencing and transcriptome assembly for different parts of chemically induced A. sinensis. The results demonstrate that the sesquiterpenoid biosynthesis pathway and WRKY transcription factors play important roles in agarwood formation via chemical induction. The comparative analysis of the transcriptome data of agarwood and A. sinensis lays the foundation

  5. Exome sequencing identifies CTSK mutations in patients originally diagnosed as intermediate osteopetrosis

    Science.gov (United States)

    Pangrazio, Alessandra; Puddu, Alessandro; Oppo, Manuela; Valentini, Maria; Zammataro, Luca; Vellodi, Ashok; Gener, Blanca; Llano-Rivas, Isabel; Raza, Jamal; Atta, Irum; Vezzoni, Paolo; Superti-Furga, Andrea; Villa, Anna; Sobacchi, Cristina

    2014-01-01

    Autosomal recessive osteopetrosis is a genetic disorder characterized by increased bone density due to lack of bone resorption by osteoclasts. Genetic studies have largely unraveled the molecular basis of the most severe forms, whereas cases of intermediate severity are more difficult to characterize, probably because of their considerable heterogeneity. Here, we describe the use of exome sequencing in the molecular diagnosis of 2 siblings initially thought to be affected by “intermediate osteopetrosis”, which identified a homozygous mutation in the CTSK gene. Prompted by this finding, we used Sanger sequencing to test 25 additional patients referred to us for recessive osteopetrosis and found CTSK mutations in 4 of them. In retrospect, their clinical and radiographic features were compatible with, but not typical of, pycnodysostosis. We sought to identify modifier genes that might have played a role in the clinical manifestation of the disease in these patients, but our results were not informative. In conclusion, we underline the difficulties of differential diagnosis in patients whose clinical appearance does not fit the classical malignant or benign picture, and we recommend that the CTSK gene be included in the molecular diagnosis of high bone density conditions. PMID:24269275

  6. Identifying Genetic Signatures of Natural Selection Using Pooled Population Sequencing in Picea abies.

    Science.gov (United States)

    Chen, Jun; Källman, Thomas; Ma, Xiao-Fei; Zaina, Giusi; Morgante, Michele; Lascoux, Martin

    2016-07-07

    The joint inference of selection and past demography remains a costly and demanding task. We used next-generation sequencing of two pools of 48 Norway spruce mother trees, one corresponding to the Fennoscandian domain and the other to the Alpine domain, to assess nucleotide polymorphism at 88 nuclear genes. These genes are candidate genes for phenological traits, and most belong to the photoperiod pathway. Estimates of population genetic summary statistics from the pooled data are similar to previous estimates, suggesting that pooled sequencing is reliable. Nonsynonymous SNPs tended to have both lower frequency differences and lower FST values between the two domains than silent ones, suggesting the presence of purifying selection. The divergence between the two domains based on synonymous changes was around 5 million yr, similar to a recent phylogenetic estimate of 6 million yr but much larger than earlier estimates based on isozymes. Two approaches, one of them novel and considering both FST and the difference in allele frequencies between the two domains, were used to identify SNPs potentially under diversifying selection. SNPs from around 20 genes were detected, including genes previously identified as main targets of selection, such as PaPRR3 and PaGI. Copyright © 2016 Chen et al.

  7. Identifying and sequencing a Mycobacterium sp. strain F4 as a potential bioremediation agent for quinclorac.

    Science.gov (United States)

    Li, Yingying; Chen, Wu; Wang, Yunsheng; Luo, Kun; Li, Yue; Bai, Lianyang; Luo, Feng

    2017-01-01

    Quinclorac is a herbicide widely used in rice fields. Unfortunately, quinclorac residues are phytotoxic to many crops and vegetables. The degradation of quinclorac in nature is very slow; degradation by bacteria, however, can be an effective and efficient way to reduce contamination. In this study, we isolated a quinclorac-degrading bacterial strain, F4, from quinclorac-contaminated soils. Based on morphological characteristics and 16S rRNA gene sequence analysis, we identified strain F4 as Mycobacterium sp. We investigated the effects of temperature, pH, inoculation size and initial quinclorac concentration on the growth and degrading efficiency of F4 and determined the optimal quinclorac-degrading conditions for F4. Under optimal degrading conditions, F4 degraded 97.38% of quinclorac from an initial concentration of 50 mg/L in seven days. Our indoor pot experiment demonstrated that the degradation products were non-phytotoxic to tobacco. After analyzing the quinclorac degradation products of F4, we proposed that F4 could employ two pathways to degrade quinclorac: one through methylation and the other through dechlorination. Furthermore, we reconstructed the whole genome of F4 through single-molecule sequencing and de novo assembly. We identified 77 methyltransferases and eight dehalogenases in the F4 genome, which support our hypothesized degradation pathways.

  8. Exome sequencing identifies CTSK mutations in patients originally diagnosed as intermediate osteopetrosis.

    Science.gov (United States)

    Pangrazio, Alessandra; Puddu, Alessandro; Oppo, Manuela; Valentini, Maria; Zammataro, Luca; Vellodi, Ashok; Gener, Blanca; Llano-Rivas, Isabel; Raza, Jamal; Atta, Irum; Vezzoni, Paolo; Superti-Furga, Andrea; Villa, Anna; Sobacchi, Cristina

    2014-02-01

    Autosomal recessive osteopetrosis is a genetic disorder characterized by increased bone density due to lack of bone resorption by osteoclasts. Genetic studies have largely unraveled the molecular basis of the most severe forms, whereas cases of intermediate severity are more difficult to characterize, probably because of their considerable heterogeneity. Here, we describe the use of exome sequencing in the molecular diagnosis of 2 siblings initially thought to be affected by "intermediate osteopetrosis", which identified a homozygous mutation in the CTSK gene. Prompted by this finding, we used Sanger sequencing to test 25 additional patients referred to us for recessive osteopetrosis and found CTSK mutations in 4 of them. In retrospect, their clinical and radiographic features were compatible with, but not typical of, pycnodysostosis. We sought to identify modifier genes that might have played a role in the clinical manifestation of the disease in these patients, but our results were not informative. In conclusion, we underline the difficulties of differential diagnosis in patients whose clinical appearance does not fit the classical malignant or benign picture, and we recommend that the CTSK gene be included in the molecular diagnosis of high bone density conditions. Copyright © 2013 The Authors. Published by Elsevier Inc. All rights reserved.

  9. Transcriptome sequencing in pediatric acute lymphoblastic leukemia identifies fusion genes associated with distinct DNA methylation profiles

    Directory of Open Access Journals (Sweden)

    Yanara Marincevic-Zuniga

    2017-08-01

    Background: Structural chromosomal rearrangements that lead to expressed fusion genes are a hallmark of acute lymphoblastic leukemia (ALL). In this study, we performed transcriptome sequencing of 134 primary ALL patient samples to comprehensively detect fusion transcripts. Methods: We combined fusion gene detection with genome-wide DNA methylation analysis, gene expression profiling, and targeted sequencing to determine molecular signatures of emerging ALL subtypes. Results: We identified 64 unique fusion events distributed among 80 individual patients, over 50% of which have not previously been reported in ALL. Although most of the fusion genes were found in only a single patient, we identified several recurrent fusion gene families defined by promiscuous fusion gene partners, such as ETV6, RUNX1, PAX5, and ZNF384, or by recurrent fusion genes, such as DUX4-IGH. Our data show that patients harboring these fusion genes displayed characteristic genome-wide DNA methylation and gene expression signatures, in addition to distinct patterns of single nucleotide variants and recurrent copy number alterations. Conclusion: Our study delineates the fusion gene landscape in pediatric ALL, including both known and novel fusion genes, and highlights fusion gene families with shared molecular etiologies, which may provide additional information for prognosis and therapeutic options in the future.

  10. Epidemiological analysis of Salmonella clusters identified by whole genome sequencing, England and Wales 2014.

    Science.gov (United States)

    Waldram, Alison; Dolan, Gayle; Ashton, Philip M; Jenkins, Claire; Dallman, Timothy J

    2018-05-01

    The unprecedented level of bacterial strain discrimination provided by whole genome sequencing (WGS) presents new challenges with respect to the utility and interpretation of the data. Whole genome sequences from 1445 Salmonella isolates belonging to the serotypes most commonly identified in England and Wales, isolated between April and August 2014, were analysed. Single-linkage single nucleotide polymorphism (SNP) thresholds at the 10-, 5- and 0-SNP levels were explored for evidence of epidemiological links between clustered cases. Analysis of the WGS data organised 566 of the 1445 isolates into 32 clusters of five or more isolates. A statistically significant epidemiological link was identified for 17 clusters. The clusters were associated with foreign travel (n = 8), consumption of Chinese takeaways (n = 4), chicken eaten at home (n = 2), and one each of the following: eating out, contact with another case in the home, and contact with reptiles. In the same time frame, one cluster was detected using traditional outbreak detection methods. WGS can be used for the highly specific and highly sensitive detection of biologically related isolates when epidemiological links are obscured. Improvements in the collection of detailed, standardised exposure information would enhance cluster investigations. Copyright © 2017 Elsevier Ltd. All rights reserved.
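    Single-linkage clustering at a SNP threshold places two isolates in the same cluster whenever they are connected by a chain of pairwise distances at or below the threshold. The sketch below is a minimal illustration using union-find over a pairwise SNP-distance matrix, assuming the distances have already been computed from the WGS data; the function names and the toy matrix are hypothetical, and the default minimum cluster size of five mirrors the abstract's "clusters of five or more".

```python
from typing import Dict, List, Sequence

def single_linkage_clusters(dist: Sequence[Sequence[int]], threshold: int) -> List[List[int]]:
    """Group isolates whose chains of pairwise SNP distances stay at or below `threshold`."""
    n = len(dist)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if dist[i][j] <= threshold:
                ri, rj = find(i), find(j)
                if ri != rj:
                    parent[rj] = ri  # merge the two clusters

    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Largest clusters first; ties broken by the lowest isolate index.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))

def reportable_clusters(
    dist: Sequence[Sequence[int]], threshold: int, min_size: int = 5
) -> List[List[int]]:
    """Keep only clusters containing at least `min_size` isolates."""
    return [c for c in single_linkage_clusters(dist, threshold) if len(c) >= min_size]

# Toy symmetric SNP-distance matrix for four isolates.
toy = [
    [0, 3, 7, 20],
    [3, 0, 4, 20],
    [7, 4, 0, 20],
    [20, 20, 20, 0],
]
for t in (10, 5, 0):
    print(t, single_linkage_clusters(toy, t))
```

    In the toy matrix, isolates 0 and 2 differ by 7 SNPs yet fall into the same 5-SNP cluster because each is within 5 SNPs of isolate 1. This chaining is the defining property of single linkage, and at the 0-SNP level only isolates with identical sequences are grouped.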

  11. Identifying and sequencing a Mycobacterium sp. strain F4 as a potential bioremediation agent for quinclorac.

    Directory of Open Access Journals (Sweden)

    Yingying Li

    Quinclorac is a herbicide widely used in rice fields. Unfortunately, quinclorac residues are phytotoxic to many crops and vegetables. The degradation of quinclorac in nature is very slow; degradation by bacteria, however, can be an effective and efficient way to reduce contamination. In this study, we isolated a quinclorac-degrading bacterial strain, F4, from quinclorac-contaminated soils. Based on morphological characteristics and 16S rRNA gene sequence analysis, we identified strain F4 as Mycobacterium sp. We investigated the effects of temperature, pH, inoculation size and initial quinclorac concentration on the growth and degrading efficiency of F4 and determined the optimal quinclorac-degrading conditions for F4. Under optimal degrading conditions, F4 degraded 97.38% of quinclorac from an initial concentration of 50 mg/L in seven days. Our indoor pot experiment demonstrated that the degradation products were non-phytotoxic to tobacco. After analyzing the quinclorac degradation products of F4, we proposed that F4 could employ two pathways to degrade quinclorac: one through methylation and the other through dechlorination. Furthermore, we reconstructed the whole genome of F4 through single-molecule sequencing and de novo assembly. We identified 77 methyltransferases and eight dehalogenases in the F4 genome, which support our hypothesized degradation pathways.

  12. Whole-genome sequencing identifies recurrent somatic NOTCH2 mutations in splenic marginal zone lymphoma.

    Science.gov (United States)

    Kiel, Mark J; Velusamy, Thirunavukkarasu; Betz, Bryan L; Zhao, Lili; Weigelin, Helmut G; Chiang, Mark Y; Huebner-Chan, David R; Bailey, Nathanael G; Yang, David T; Bhagat, Govind; Miranda, Roberto N; Bahler, David W; Medeiros, L Jeffrey; Lim, Megan S; Elenitoba-Johnson, Kojo S J

    2012-08-27

    Splenic marginal zone lymphoma (SMZL), the most common primary lymphoma of the spleen, is poorly understood at the genetic level. In this study, using whole-genome DNA sequencing (WGS) with confirmation by Sanger sequencing, we identified mutations in several genes not previously known to be recurrently altered in SMZL. In particular, we identified recurrent somatic gain-of-function mutations in NOTCH2, a gene encoding a protein required for marginal zone B cell development, in 25 of 99 (∼25%) cases of SMZL and in 1 of 19 (∼5%) cases of nonsplenic MZLs. These mutations clustered near the C-terminal proline/glutamate/serine/threonine (PEST)-rich domain, resulting in protein truncation, or, rarely, were nonsynonymous substitutions affecting the extracellular heterodimerization domain (HD). NOTCH2 mutations were not present in other B cell lymphomas and leukemias, such as chronic lymphocytic leukemia/small lymphocytic lymphoma (CLL/SLL; n = 15), mantle cell lymphoma (MCL; n = 15), low-grade follicular lymphoma (FL; n = 44), hairy cell leukemia (HCL; n = 15), or in reactive lymphoid hyperplasia (n = 14). NOTCH2 mutations were associated with adverse clinical outcomes (relapse, histological transformation, and/or death) among SMZL patients (P = 0.002). These results suggest that NOTCH2 mutations play a role in the pathogenesis and progression of SMZL and are associated with a poor prognosis.

  13. microRNA expression profiling in fetal single ventricle malformation identified by deep sequencing.

    Science.gov (United States)

    Yu, Zhang-Bin; Han, Shu-Ping; Bai, Yun-Fei; Zhu, Chun; Pan, Ya; Guo, Xi-Rong

    2012-01-01

    microRNAs (miRNAs) have emerged as key regulators in many biological processes, particularly cardiac growth and development, although the specific miRNA expression profile associated with this process remains to be elucidated. This study aimed to characterize the cellular microRNA profile involved in the development of congenital heart malformation, through the investigation of single ventricle (SV) defects. Comprehensive miRNA profiling in human fetal SV cardiac tissue was performed by deep sequencing. Differential expression of 48 miRNAs was revealed by sequencing by oligonucleotide ligation and detection (SOLiD) analysis. Of these, 38 were down-regulated and 10 were up-regulated in differentiated SV cardiac tissue compared with control cardiac tissue. This was confirmed by real-time quantitative reverse transcription-polymerase chain reaction (qRT-PCR) analysis. Predicted target genes of the 48 differentially expressed miRNAs were analyzed by gene ontology and categorized according to cellular process, regulation of biological process and metabolic process. Pathway-Express analysis identified the WNT and mTOR signaling pathways as the most significant processes putatively affected by the differential expression of these miRNAs. Candidate genes involved in cardiac development were identified as potential targets of these differentially expressed microRNAs, and a collaborative network of microRNAs and cardiac development-related mRNAs was constructed. These data provide the basis for future investigation of the mechanisms underlying the occurrence and development of fetal SV malformations.

  14. Whole-genome sequencing identifies recurrent mutations in chronic lymphocytic leukaemia

    Science.gov (United States)

    Puente, Xose S.; Pinyol, Magda; Quesada, Víctor; Conde, Laura; Ordóñez, Gonzalo R.; Villamor, Neus; Escaramis, Georgia; Jares, Pedro; Beà, Sílvia; González-Díaz, Marcos; Bassaganyas, Laia; Baumann, Tycho; Juan, Manel; López-Guerra, Mónica; Colomer, Dolors; Tubío, José M. C.; López, Cristina; Navarro, Alba; Tornador, Cristian; Aymerich, Marta; Rozman, María; Hernández, Jesús M.; Puente, Diana A.; Freije, José M. P.; Velasco, Gloria; Gutiérrez-Fernández, Ana; Costa, Dolors; Carrió, Anna; Guijarro, Sara; Enjuanes, Anna; Hernández, Lluís; Yagüe, Jordi; Nicolás, Pilar; Romeo-Casabona, Carlos M.; Himmelbauer, Heinz; Castillo, Ester; Dohm, Juliane C.; de Sanjosé, Silvia; Piris, Miguel A.; de Alava, Enrique; Miguel, Jesús San; Royo, Romina; Gelpí, Josep L.; Torrents, David; Orozco, Modesto; Pisano, David G.; Valencia, Alfonso; Guigó, Roderic; Bayés, Mónica; Heath, Simon; Gut, Marta; Klatt, Peter; Marshall, John; Raine, Keiran; Stebbings, Lucy A.; Futreal, P. Andrew; Stratton, Michael R.; Campbell, Peter J.; Gut, Ivo; López-Guillermo, Armando; Estivill, Xavier; Montserrat, Emili; López-Otín, Carlos; Campo, Elías

    2012-01-01

    Chronic lymphocytic leukaemia (CLL), the most frequent leukaemia in adults in Western countries, is a heterogeneous disease with variable clinical presentation and evolution [1,2]. Two major molecular subtypes can be distinguished, characterized respectively by a high or low number of somatic hypermutations in the variable region of immunoglobulin genes [3,4]. The molecular changes leading to the pathogenesis of the disease are still poorly understood. Here we performed whole-genome sequencing of four cases of CLL and identified 46 somatic mutations that potentially affect gene function. Further analysis of these mutations in 363 patients with CLL identified four genes that are recurrently mutated: notch 1 (NOTCH1), exportin 1 (XPO1), myeloid differentiation primary response gene 88 (MYD88) and kelch-like 6 (KLHL6). Mutations in MYD88 and KLHL6 are predominant in cases of CLL with mutated immunoglobulin genes, whereas NOTCH1 and XPO1 mutations are mainly detected in patients with unmutated immunoglobulins. The patterns of somatic mutation, supported by functional and clinical analyses, strongly indicate that the recurrent NOTCH1, MYD88 and XPO1 mutations are oncogenic changes that contribute to the clinical evolution of the disease. To our knowledge, this is the first comprehensive analysis of CLL combining whole-genome sequencing with clinical characteristics and clinical outcomes. It highlights the usefulness of this approach for the identification of clinically relevant mutations in cancer. PMID:21642962

  15. Genomic Aberrations in Crizotinib Resistant Lung Adenocarcinoma Samples Identified by Transcriptome Sequencing.

    Directory of Open Access Journals (Sweden)

    Ali Saber

    ALK-break-positive non-small cell lung cancer (NSCLC) patients initially respond to crizotinib, but resistance inevitably develops. In this study we aimed to identify fusion genes in crizotinib-resistant tumor samples. Re-biopsies from three patients were subjected to paired-end RNA sequencing to identify fusion genes using deFuse and EricScript. The IGV browser was used to determine the presence of known resistance-associated mutations. Sanger sequencing was used to validate fusion genes, and digital droplet PCR to validate mutations. ALK fusion genes were detected in all three patients, with EML4 as the fusion partner. One patient had no additional fusion genes. Another patient had one additional fusion gene, but without a predicted open reading frame (ORF). The third patient had three additional fusion genes, two of which were derived from the same chromosomal region as EML4-ALK. A predicted ORF was identified only in the CLIP4-VSNL1 fusion product. The fusion genes validated in the post-treatment samples were also present in the biopsies taken before crizotinib treatment. ALK mutations (p.C1156Y and p.G1269A), detected in the re-biopsies of two patients, were not detected in pre-treatment biopsies. In conclusion, the fusion genes identified in our study are unlikely to be involved in crizotinib resistance, given their presence in pre-treatment biopsies. The detection of ALK mutations in post-treatment tumor samples of two patients underlines their role in crizotinib resistance.

  16. Detection of genomic variation by selection of a 9 Mb DNA region and high throughput sequencing.

    Directory of Open Access Journals (Sweden)

    Sergey I Nikolaev

    Detection of rare polymorphisms and causative mutations of genetic diseases in a targeted genomic area has become a major goal in understanding genomic and phenotypic variability. We interrogated repeat-masked regions of 8.9 Mb on human chromosomes 21 (7.8 Mb) and 7 (1.1 Mb) from an individual from the International HapMap Project (NA12872). We optimized a method of genomic selection for high-throughput sequencing. Microarray-based selection and sequencing resulted in 260-fold enrichment, with 41% of reads mapping to the target region. In the targeted region, 83% of SNPs had at least 4-fold sequence coverage and 54% had at least 15-fold. When assaying HapMap SNPs in NA12872, our sequence genotypes were 91.3% concordant in regions with coverage ≥4-fold and 97.9% concordant in regions with coverage ≥15-fold. About 81% of the SNPs recovered at both thresholds are listed in dbSNP. We observed that regions with low sequence coverage occur in close proximity to low-complexity DNA. Validation experiments using Sanger sequencing were performed for 46 SNPs with 15-20-fold coverage, with a confirmation rate of 96%, suggesting that DNA selection provides an accurate and cost-effective method for identifying rare genomic variants.

  17. HLA DNA sequence variation among human populations: molecular signatures of demographic and selective events.

    Directory of Open Access Journals (Sweden)

    Stéphane Buhler

    2011-02-01

    Full Text Available Molecular differences between HLA alleles vary up to 57 nucleotides within the peptide binding coding region of human Major Histocompatibility Complex (MHC genes, but it is still unclear whether this variation results from a stochastic process or from selective constraints related to functional differences among HLA molecules. Although HLA alleles are generally treated as equidistant molecular units in population genetic studies, DNA sequence diversity among populations is also crucial to interpret the observed HLA polymorphism. In this study, we used a large dataset of 2,062 DNA sequences defined for the different HLA alleles to analyze nucleotide diversity of seven HLA genes in 23,500 individuals of about 200 populations spread worldwide. We first analyzed the HLA molecular structure and diversity of these populations in relation to geographic variation and we further investigated possible departures from selective neutrality through Tajima's tests and mismatch distributions. All results were compared to those obtained by classical approaches applied to HLA allele frequencies.Our study shows that the global patterns of HLA nucleotide diversity among populations are significantly correlated to geography, although in some specific cases the molecular information reveals unexpected genetic relationships. At all loci except HLA-DPB1, populations have accumulated a high proportion of very divergent alleles, suggesting an advantage of heterozygotes expressing molecularly distant HLA molecules (asymmetric overdominant selection model. However, both different intensities of selection and unequal levels of gene conversion may explain the heterogeneous mismatch distributions observed among the loci. Also, distinctive patterns of sequence divergence observed at the HLA-DPB1 locus suggest current neutrality but old selective pressures on this gene. 
We conclude that HLA DNA sequences advantageously complement HLA allele frequencies as a source of data used
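Tajima's test compares two estimates of nucleotide diversity: the mean number of pairwise differences (pi) and the number of segregating sites (S) scaled by a1 = sum of 1/i for i = 1..n-1. The Python sketch below computes the standard Tajima's D statistic for a small set of aligned sequences. It is an illustration only, not the authors' analysis: it assumes equal-length alignments with no gaps or missing data, and the function name `tajimas_d` is hypothetical.

```python
from itertools import combinations
from math import comb, sqrt


def tajimas_d(seqs: list[str]) -> float:
    """Tajima's D for equal-length aligned sequences (no missing data assumed)."""
    n = len(seqs)
    if n < 4:
        raise ValueError("Tajima's D needs at least 4 sequences")
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to equal length")

    # S: number of segregating (polymorphic) alignment columns.
    S = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    if S == 0:
        return float("nan")  # D is undefined without polymorphism

    # pi: mean number of pairwise differences over all n-choose-2 pairs.
    total_diffs = sum(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    )
    pi = total_diffs / comb(n, 2)

    # Standard normalising constants of Tajima's D.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)

    return (pi - S / a1) / sqrt(e1 * S + e2 * S * (S - 1))
```

For example, `tajimas_d(["AAAA", "AAAA", "AAAA", "AAAT"])` gives about -0.61: an excess of rare variants pushes D negative, whereas variants at intermediate frequencies push it positive, the direction expected under balancing selection such as the overdominance discussed above.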

  18. Whole exome sequencing identifies RAI1 mutation in a morbidly obese child diagnosed with ROHHAD syndrome.

    Science.gov (United States)

    Thaker, Vidhu V; Esteves, Kristyn M; Towne, Meghan C; Brownstein, Catherine A; James, Philip M; Crowley, Laura; Hirschhorn, Joel N; Elsea, Sarah H; Beggs, Alan H; Picker, Jonathan; Agrawal, Pankaj B

    2015-05-01

    The current obesity epidemic is attributed to complex interactions between genetic and environmental factors. However, a limited number of cases, especially those with early-onset severe obesity, are linked to single-gene defects. Rapid-onset obesity with hypothalamic dysfunction, hypoventilation and autonomic dysregulation (ROHHAD) is a syndrome that presents with abrupt-onset extreme weight gain and has an unknown genetic basis. The objective was to identify the underlying genetic etiology in a child with morbid early-onset obesity, hypoventilation, and autonomic and behavioral disturbances who was clinically diagnosed with ROHHAD syndrome. Design/Setting/Intervention: The index patient was evaluated at an academic medical center. Whole-exome sequencing was performed on the proband and his parents. Genetic variants were validated by Sanger sequencing. We identified a novel de novo nonsense mutation, c.3265C>T (p.R1089X), in the retinoic acid-induced 1 (RAI1) gene in the proband. Mutations in the RAI1 gene are known to cause Smith-Magenis syndrome (SMS). On further evaluation, his clinical features were not typical of either SMS or ROHHAD syndrome. This study identifies a de novo RAI1 mutation in a child with morbid obesity and a clinical diagnosis of ROHHAD syndrome. Although extreme early-onset obesity, autonomic disturbances, and hypoventilation are present in ROHHAD, several of the clinical findings are consistent with SMS. This case highlights the challenges in the diagnosis of ROHHAD syndrome and its potential overlap with SMS. We also propose RAI1 as a candidate gene for children with morbid obesity.

  19. Intraclutch eggshell colour variation in birds: are females able to identify their eggs individually?

    Directory of Open Access Journals (Sweden)

    Miroslav Poláček

    2017-08-01

    Full Text Available Background One suggested female post-mating strategy is differential allocation of investment into offspring. Female birds produce not only the largest but also the most colourful eggs of all oviparous taxa. Larger eggs provide space for bigger embryos, or more nutrition for their development, but the question of why eggs are more colourful, and why eggshell colouration varies, remains open. In this context, research has focused on explaining inter-clutch variation, but in many bird species eggshell colouration also varies within a clutch. Surprisingly, less attention has been paid to this phenomenon. Therefore, we propose the “female egg recognition” hypothesis, suggesting that mothers use colour characteristics to interpret egg attributes and allocate further investment into each egg accordingly. To evaluate the feasibility of the hypothesis, we tested several underlying predictions and examined their suitability using a dataset from our tree sparrow (Passer montanus) study. We predict (i) substantial within-clutch variation in eggshell colouration, which (ii) should be related to laying sequence, (iii) should reflect egg quality, and (iv) should stimulate a female response. Methods Eggshell coloration data were obtained via digital photography under standardized conditions after clutch completion. Lightness (L*), representing the achromatic properties of an egg, was chosen as the most important predictor in dark cavities and was related to egg quality and position in the nest. Results In our tree sparrows, the first and especially the last eggs were less pigmented, providing information about laying order. Egg volume, which predicts chick quality, correlated positively with eggshell coloration. Finally, we could show that female tree sparrows placed darker, but not bigger, eggs in more central incubation positions. Discussion All basic prerequisites for the “female egg recognition” hypothesis are fulfilled. In this

  20. Ulysses: accurate detection of low-frequency structural variations in large insert-size sequencing libraries.

    Science.gov (United States)

    Gillet-Markowska, Alexandre; Richard, Hugues; Fischer, Gilles; Lafontaine, Ingrid

    2015-03-15

    The detection of structural variations (SVs) in short-range paired-end (PE) libraries remains challenging because SV breakpoints can involve large dispersed repeated sequences or carry inherent complexity that is hard to resolve with classical PE sequencing data. In contrast, large insert-size sequencing libraries (mate-pair libraries) provide higher physical coverage of the genome and give access to repeat-containing regions. As such libraries become routinely accessible, they can in principle overcome these limitations. Nevertheless, broad insert-size distributions and high rates of chimeric sequences are usually associated with this type of library, which makes accurate SV annotation challenging. Here, we present Ulysses, a tool that achieves drastically higher detection accuracy than existing tools on both simulated and real mate-pair sequencing datasets from the 1000 Genomes Project. Ulysses achieves high specificity over the complete spectrum of variants (duplications, deletions, translocations, insertions and inversions) by assessing, in a principled manner, the statistical significance of each possible variant against an explicit model for the generation of experimental noise. This statistical model proves particularly useful for the detection of low-frequency variants. SV detection performed on a large-insert mate-pair library from a breast cancer sample revealed a high level of somatic duplications in the tumor and, to a lesser extent, in the blood sample as well. Altogether, these results show that Ulysses is a valuable tool for characterizing somatic mosaicism in human tissues and in cancer genomes. © The Author 2014. Published by Oxford University Press. All rights reserved.
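The abstract does not spell out Ulysses's noise model, so the following is only a generic illustration of the idea of testing a candidate SV against an explicit noise model. It assumes, hypothetically, that read pairs supporting a candidate purely by chance (e.g. chimeric mate pairs) follow a Poisson distribution with mean lambda = n_pairs × noise_rate, and reports P(X ≥ observed support). The names `poisson_upper_tail` and `sv_support_pvalue` and the Poisson assumption are illustrative, not Ulysses's API or method.

```python
from math import exp, lgamma, log


def poisson_log_pmf(i: int, lam: float) -> float:
    """log P(X = i) for X ~ Poisson(lam)."""
    return -lam + i * log(lam) - lgamma(i + 1)


def poisson_upper_tail(k: int, lam: float) -> float:
    """P(X >= k) for X ~ Poisson(lam), lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    if k <= lam:
        # The tail is large here, so 1 - P(X < k) is accurate enough.
        lower = sum(exp(poisson_log_pmf(i, lam)) for i in range(k))
        return max(0.0, 1.0 - lower)
    # For k > lam the terms shrink monotonically, so sum them directly;
    # this keeps tiny p-values accurate instead of computing 1 - (almost 1).
    total, i = 0.0, k
    term = exp(poisson_log_pmf(k, lam))
    while term > 0.0 and term > 1e-17 * total:
        total += term
        i += 1
        term *= lam / i
    return min(total, 1.0)


def sv_support_pvalue(n_support: int, n_pairs: int, noise_rate: float) -> float:
    """Hypothetical test: could n_support discordant pairs arise from noise alone?"""
    return poisson_upper_tail(n_support, n_pairs * noise_rate)
```

A low-frequency variant supported by only a handful of pairs can still be significant when the expected noise count lambda is small, which is the intuition behind modelling noise explicitly rather than using a fixed support threshold.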

  1. Whole-exome sequencing identifies novel candidate predisposition genes for familial polycythemia vera.

    Science.gov (United States)

    Hirvonen, Elina A M; Pitkänen, Esa; Hemminki, Kari; Aaltonen, Lauri A; Kilpivaara, Outi

    2017-04-20

    Polycythemia vera (PV), characterized by massive production of erythrocytes, is one of the myeloproliferative neoplasms. Most patients carry a somatic gain-of-function mutation in JAK2, c.1849G > T (p.Val617Phe), leading to constitutive activation of the JAK-STAT signaling pathway. Familial clustering is also observed occasionally, but high-penetrance predisposition genes for PV have remained unidentified. We studied predisposition to PV by exome sequencing (three cases) in a Finnish PV family with four patients. The 12 shared variants (maximum allowed minor allele frequency  G (p.Phe418Leu) in ZXDC, c.1931C > G (p.Pro644Arg) in ATN1, and c.701G > A (p.Arg234Gln) in LRRC3. We also observed a rare, predicted benign germline variant c.2912C > G (p.Ala971Gly) in BCORL1 in all four patients. Somatic mutations in BCORL1 have been reported in myeloid malignancies. We further screened the variants in eight PV patients in six other Finnish families, but no other carriers were found. Exome sequencing provides a powerful tool for identifying novel variants and understanding the familial predisposition to diseases. This is the first report on Finnish familial PV cases, and we identified three novel candidate variants that may predispose to the disease.
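The HGVS coding-DNA notation used above (c.N, counted from the A of the ATG start codon) maps to protein positions by simple arithmetic: codon number = floor((N - 1) / 3) + 1. The Python sketch below checks this for the variants listed (c.1849 → codon 617, c.1931 → 644, c.701 → 234, c.2912 → 971) and for c.3265C>T (p.R1089X) in entry 18. The reference codons passed to `substitute` are assumptions for illustration, since the abstracts do not give them, and the function names are hypothetical.

```python
# Standard genetic code in TCAG order: codon -> one-letter amino acid, '*' = stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def codon_of(c_pos: int) -> tuple[int, int]:
    """Map coding position c.N to (codon number, base 1-3 within the codon)."""
    if c_pos < 1:
        raise ValueError("coding positions start at c.1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon: str, base_in_codon: int, ref: str, alt: str) -> tuple[str, str]:
    """Apply ref>alt at one codon position; return (reference aa, variant aa)."""
    codon = codon.upper()
    if codon[base_in_codon - 1] != ref.upper():
        raise ValueError("reference base does not match the codon")
    variant = codon[: base_in_codon - 1] + alt.upper() + codon[base_in_codon:]
    return CODON_TABLE[codon], CODON_TABLE[variant]
```

For example, `codon_of(1849)` returns `(617, 1)`, and with an assumed reference codon GTC, `substitute("GTC", 1, "G", "T")` returns `("V", "F")`. For the nonsense change c.3265C>T at codon 1089, base 1, the only arginine codon that a first-base C>T turns into a stop is CGA (giving TGA).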

  2. Targeted exome sequencing identified novel USH2A mutations in Usher syndrome families.

    Directory of Open Access Journals (Sweden)

    Xiu-Feng Huang

    Full Text Available Usher syndrome (USH) is a leading cause of deaf-blindness and is inherited as an autosomal recessive trait. Phenotypic and genetic heterogeneity in USH makes molecular diagnosis difficult. This pilot study aimed to develop an approach based on next-generation sequencing to determine the genetic defects in patients with USH or allied diseases precisely and effectively. Eight affected patients and twelve unaffected relatives from five unrelated Chinese USH families, including 2 pseudo-dominant ones, were recruited. A total of 144 known genes of inherited retinal diseases were selected for deep exome resequencing. Systematic data analysis using an established bioinformatics pipeline, together with segregation analysis, identified a number of genetic variants. Eleven mutations in the USH2A gene were identified, eight of them novel. Biparental mutations in USH2A were revealed in the 2 families with pseudo-dominant inheritance. One proband was found to carry three mutations, two of which were presumed to lie on the same chromosome. In conclusion, this study revealed the genetic defects in the USH2A gene and demonstrated the robustness of targeted exome sequencing for precisely and rapidly determining genetic defects. The methodology provides a reliable strategy for routine genetic diagnosis of USH.

  3. LigandRFs: random forest ensemble to identify ligand-binding residues from sequence information alone

    KAUST Repository

    Chen, Peng

    2014-12-03

    Background Protein-ligand binding is important for some proteins to perform their functions. Protein-ligand binding sites are the residues of proteins that physically bind to ligands. Despite recent advances in computational prediction of protein-ligand binding sites, state-of-the-art methods search for similar, known structures of the query and predict the binding sites based on the solved structures. However, such structural information is often unavailable. Results In this paper, we propose a sequence-based approach to identify protein-ligand binding residues. We propose a combination technique to reduce the effects of different sliding residue windows in the process of encoding input feature vectors. Moreover, because ligand-binding sites are heavily outnumbered by non-ligand-binding sites, we construct several balanced data sets, for each of which a random forest (RF)-based classifier is trained. The ensemble of these RF classifiers forms a sequence-based protein-ligand binding site predictor. Conclusions Experimental results on CASP9 and CASP8 data sets demonstrate that our method compares favorably with state-of-the-art protein-ligand binding site prediction methods.
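One common way to build "several balanced data sets" from an imbalanced one is to keep every minority-class example in each subset and split the majority class into disjoint chunks of similar size; one classifier is trained per subset and their votes are combined. The Python sketch below shows only that data-splitting and voting step, under the assumption that LigandRFs works roughly this way (the abstract does not give its exact scheme). `balanced_subsets` and `majority_vote` are hypothetical names, and the per-subset random forest (for example scikit-learn's `RandomForestClassifier`) is left out.

```python
import random


def balanced_subsets(labels: list[int], seed: int = 0) -> list[list[int]]:
    """Split binary labels (1 = binding, 0 = non-binding) into balanced index sets.

    Every subset keeps all positives plus a disjoint share of the negatives,
    so the whole majority class is used once across the ensemble.
    """
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    rng.shuffle(neg)
    n_subsets = max(1, len(neg) // len(pos))
    # Round-robin slicing assigns each negative to exactly one subset.
    return [sorted(pos + neg[s::n_subsets]) for s in range(n_subsets)]


def majority_vote(votes: list[int]) -> int:
    """Combine 0/1 predictions from the per-subset classifiers; ties give 0."""
    return int(2 * sum(votes) > len(votes))
```

With 2 binding and 10 non-binding residues, this yields 5 subsets of 4 residues each, every one containing both binding residues; each subset would then train its own classifier.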

  4. Barcoding lichen-forming fungi using 454 pyrosequencing is challenged by artifactual and biological sequence variation.

    Science.gov (United States)

    Mark, Kristiina; Cornejo, Carolina; Keller, Christine; Flück, Daniela; Scheidegger, Christoph

    2016-09-01

    Although lichens (lichen-forming fungi) play an important role in the ecological integrity of many vulnerable landscapes, only a minority of the currently accepted ∼18 000 species of lichen-forming fungi have been barcoded. Regular Sanger sequencing can be problematic when analyzing lichens because saprophytic, endophytic, and parasitic fungi live intimately admixed, resulting in low-quality sequencing reads. Here, high-throughput, long-read 454 pyrosequencing on a GS FLX+ System was tested for barcoding the fungal partner of 100 epiphytic lichen species from Switzerland, using fungal-specific primers to amplify the full internal transcribed spacer (ITS) region. The present study shows the potential of DNA barcoding using pyrosequencing, in that the expected lichen fungus was successfully sequenced for all samples except one. Alignment tools such as BLAST were found to be largely adequate for the generated long reads. In addition, the NCBI nucleotide database, currently the most complete database for lichen-forming fungi, can be used as a reference database for identifying common species, since most of the analyzed lichens were identified correctly to the species or at least to the genus level. However, several issues were encountered, including a high sequencing error rate, multiple ITS versions in a genome (incomplete concerted evolution), and, in some samples, the presence of mixed lichen-forming fungi (possible lichen chimeras).

  5. Microfluidic screening and whole-genome sequencing identifies mutations associated with improved protein secretion by yeast

    DEFF Research Database (Denmark)

    Huang, Mingtao; Bai, Yunpeng; Sjostrom, Staffan L.

    2015-01-01

    There is an increasing demand for biotech-based production of recombinant proteins for use as pharmaceuticals, in the food and feed industry, and in industrial applications. The yeast Saccharomyces cerevisiae is among the preferred cell factories for recombinant protein production, and there is increasing interest in improving its protein secretion capacity. Due to the complexity of the secretory machinery in eukaryotic cells, it is difficult to apply rational engineering to construct improved strains. Here we used high-throughput microfluidics to screen yeast libraries generated by UV mutagenesis. Several screening and sorting rounds resulted in the selection of eight yeast clones with significantly improved secretion of recombinant α-amylase. Efficient secretion was genetically stable in the selected clones. We performed whole-genome sequencing of the eight clones and identified 330...

  6. Null alleles and sequence variations at primer binding sites of STR loci within multiplex typing systems.

    Science.gov (United States)

    Yao, Yining; Yang, Qinrui; Shao, Chengchen; Liu, Baonian; Zhou, Yuxiang; Xu, Hongmei; Zhou, Yueqin; Tang, Qiqun; Xie, Jianhui

    2018-01-01

    Rare variants are widespread in the human genome, and sequence variations at primer binding sites can impair PCR amplification, resulting in allele dropouts known as null alleles. In this study, 5 cases from routine paternity testing using the PowerPlex® 21 System for STR genotyping were considered to harbor null alleles at TH01, FGA, D5S818, D8S1179, and D16S539, respectively. The allele dropouts were confirmed using the alternative commercial kits AGCU Expressmarker 22 PCR amplification kit and AmpFℓSTR® Identifiler® Plus Kit, and sequencing revealed a single base variation at the primer binding site of each STR locus. A collection of previous reports shows that null alleles at D5S818 were frequently observed in populations typed with two PowerPlex® systems, and null alleles at D19S433 were mostly observed in the Japanese population typed with two AmpFℓSTR™ systems. Furthermore, the most common mutation types were C>T and G>A transitions, which might be related to DNA methylation. Altogether, these results can provide helpful information for eliminating genotyping discrepancies in forensic practice and for developing primer sets. Copyright © 2017 Elsevier B.V. All rights reserved.
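C>T and G>A are grouped together because they are the same substitution read on opposite strands, and both are transitions (purine to purine or pyrimidine to pyrimidine). A commonly cited mechanism linking them to methylation is deamination of 5-methylcytosine, which yields thymine. The small Python sketch below classifies single-base substitutions and shows the strand equivalence; the function names are hypothetical and not part of any typing-kit software.

```python
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def substitution_class(ref: str, alt: str) -> str:
    """Classify a single-base substitution as a transition or a transversion."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt or ref not in COMPLEMENT or alt not in COMPLEMENT:
        raise ValueError("expected two different bases from A, C, G, T")
    pair = {ref, alt}
    # Transitions stay within purines (A<->G) or within pyrimidines (C<->T).
    if pair <= PURINES or pair <= PYRIMIDINES:
        return "transition"
    return "transversion"


def other_strand(ref: str, alt: str) -> tuple[str, str]:
    """The same substitution as read on the complementary strand."""
    return COMPLEMENT[ref.upper()], COMPLEMENT[alt.upper()]
```

For instance, `other_strand("C", "T")` returns `("G", "A")`, and both `substitution_class("C", "T")` and `substitution_class("G", "A")` return `"transition"`.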

  7. SSR_pipeline: a bioinformatic infrastructure for identifying microsatellites from paired-end Illumina high-throughput DNA sequencing data

    Science.gov (United States)

    Miller, Mark P.; Knaus, Brian J.; Mullins, Thomas D.; Haig, Susan M.

    2013-01-01

    SSR_pipeline is a flexible set of programs designed to efficiently identify simple sequence repeats (e.g., microsatellites) from paired-end high-throughput Illumina DNA sequencing data. The program suite contains 3 analysis modules along with a fourth control module that can automate analyses of large volumes of data. The modules are used to 1) identify the subset of paired-end sequences that pass Illumina quality standards, 2) align paired-end reads into a single composite DNA sequence, and 3) identify sequences that possess microsatellites (both simple and compound) conforming to user-specified parameters. The microsatellite search algorithm is extremely efficient, and we have used it to identify repeats with motifs from 2 to 25 bp in length. Each of the 3 analysis modules can also be used independently to provide greater flexibility or to work with FASTQ or FASTA files generated from other sequencing platforms (Roche 454, Ion Torrent, etc.). We demonstrate use of the program with data from the brine fly Ephydra packardi (Diptera: Ephydridae) and provide empirical timing benchmarks to illustrate program performance on a common desktop computer environment. We further show that the Illumina platform is capable of identifying large numbers of microsatellites, even when using unenriched sample libraries and a very small percentage of the sequencing capacity from a single DNA sequencing run. All modules from SSR_pipeline are implemented in the Python programming language and can therefore be used from nearly any computer operating system (Linux, Macintosh, and Windows).
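The core of a simple (perfect) microsatellite search can be expressed with a regular-expression backreference: capture a motif of k bases, then require it to repeat immediately. The Python sketch below is an illustration under that assumption, not SSR_pipeline's own search code; it handles only perfect, simple repeats (not compound ones), and `find_perfect_repeats` and its default thresholds are hypothetical.

```python
import re


def _is_primitive(motif: str) -> bool:
    # A motif such as "ACAC" is just "AC" twice; keep only the shortest unit.
    return (motif + motif).find(motif, 1) == len(motif)


def find_perfect_repeats(seq: str, min_motif: int = 2, max_motif: int = 6,
                         min_copies: int = 4) -> list[tuple[int, str, int]]:
    """Return (start, motif, copies) for perfect tandem repeats in seq."""
    seq = seq.upper()
    hits = []
    for k in range(min_motif, max_motif + 1):
        # ([ACGT]{k}) captures a motif; \1{m,} demands m more adjacent copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_copies - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            if _is_primitive(motif):
                hits.append((match.start(), motif, len(match.group(0)) // k))
    return sorted(hits)
```

For example, `find_perfect_repeats("TTACACACACGG")` returns `[(2, "AC", 4)]`. Raising `max_motif` toward 25 covers the motif range quoted above, at the cost of one extra regex scan per motif length.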

  9. Third-Generation Sequencing and Analysis of Four Complete Pig Liver Esterase Gene Sequences in Clones Identified by Screening BAC Library.

    Science.gov (United States)

    Zhou, Qiongqiong; Sun, Wenjuan; Liu, Xiyan; Wang, Xiliang; Xiao, Yuncai; Bi, Dingren; Yin, Jingdong; Shi, Deshi

    2016-01-01

    Pig liver carboxylesterase (PLE) gene sequences in GenBank are incomplete, which has made it difficult to study the genetic structure and the regulation of expression of PLE family genes. The aim of this study was to obtain and analyse complete gene sequences of the PLE family by screening a Rongchang pig BAC library and by third-generation PacBio sequencing. After a number of existing incomplete PLE isoform gene sequences were analysed, primers were designed based on conserved regions in PLE exons, and the whole pig genome was used as a template for polymerase chain reaction (PCR) amplification. Specific primers were then selected based on the PCR amplification results. A three-step PCR screening method was used to identify PLE-positive clones in a Rongchang pig BAC library, and PacBio third-generation sequencing was performed. BLAST comparisons and other bioinformatics methods were applied for sequence analysis. Five PLE-positive BAC clones, designated BAC-10, BAC-70, BAC-75, BAC-119 and BAC-206, were identified. Sequence analysis yielded the complete sequences of four PLE genes, PLE1, PLE-B9, PLE-C4, and PLE-G2. Complete PLE gene sequences were defined as those containing regulatory sequences, exons, and introns. Not only did the exon sequences of the four PLE genes show a high degree of homology, but their intron sequences were also highly similar. Additionally, the regulatory region of the genes contained two 720 bp reverse-complementary sequences that may have an important function in regulating PLE gene expression. This is the first report to confirm the complete sequences of four PLE genes. In addition, the study demonstrates that each PLE isoform is encoded by a single gene and that the various genes exhibit a high degree of sequence homology, suggesting that the PLE family evolved from a single ancestral gene. Obtaining the complete sequences of these PLE genes provides the necessary foundation for
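Two sequences are reverse complements of each other when complementing every base and reading the result backwards turns one into the other, as with the paired 720 bp segments described in the PLE regulatory region. A minimal Python check is sketched below; the function names are hypothetical, and only the standard `str.maketrans`/`str.translate` calls are used.

```python
# Translation table mapping each base to its Watson-Crick complement.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Complement every base, then reverse, so the result reads 5'->3'."""
    return seq.translate(_COMPLEMENT)[::-1]


def are_reverse_complements(a: str, b: str) -> bool:
    """True if b is the reverse complement of a (case-insensitive)."""
    return len(a) == len(b) and reverse_complement(a).upper() == b.upper()
```

For example, `reverse_complement("ATGC")` returns `"GCAT"`. Real inverted repeats often contain a few mismatches, so a practical search would compare a sequence with the reverse complement of the other using an aligner such as BLAST rather than requiring exact equality.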

  10. Application of small RNA sequencing to identify microRNAs in acute kidney injury and fibrosis

    Energy Technology Data Exchange (ETDEWEB)

    Pellegrini, Kathryn L. [Department of Medicine, Renal Division, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (United States)]; Gerlach, Cory V. [Department of Medicine, Renal Division, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (United States); Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA (United States); Laboratory of Systems Pharmacology, Harvard Program in Therapeutic Sciences, Harvard Medical School, Boston, MA (United States)]; Craciun, Florin L.; Ramachandran, Krithika [Department of Medicine, Renal Division, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (United States)]; Bijol, Vanesa [Department of Pathology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (United States)]; Kissick, Haydn T. [Department of Surgery, Urology Division, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA (United States)]; Vaidya, Vishal S., E-mail: vvaidya@bwh.harvard.edu [Department of Medicine, Renal Division, Brigham and Women's Hospital, Harvard Medical School, Boston, MA (United States); Department of Environmental Health, Harvard T.H. Chan School of Public Health, Boston, MA (United States); Laboratory of Systems Pharmacology, Harvard Program in Therapeutic Sciences, Harvard Medical School, Boston, MA (United States)]

    2016-12-01

    Establishing a microRNA (miRNA) expression profile in affected tissues provides an important foundation for the discovery of miRNAs involved in the development or progression of pathologic conditions. We conducted small RNA sequencing to generate a temporal profile of miRNA expression in the kidneys using a mouse model of folic acid-induced (250 mg/kg i.p.) kidney injury and fibrosis. From the 103 miRNAs that were differentially expressed over the time course (> 2-fold, p < 0.05), we chose to further investigate miR-18a-5p, which is expressed during the acute stage of the injury; miR-132-3p, which is upregulated during the transition between acute and fibrotic injury; and miR-146b-5p, which is highly expressed at the peak of fibrosis. Using qRT-PCR, we confirmed the increased expression of these candidate miRNAs in the folic acid model as well as in other established mouse models of acute injury (ischemia/reperfusion injury) and fibrosis (unilateral ureteral obstruction). In situ hybridization confirmed high expression of miR-18a-5p, miR-132-3p and miR-146b-5p throughout the kidney cortex in mice and humans with severe kidney injury or fibrosis. When primary human proximal tubular epithelial cells were treated with model nephrotoxicants such as cadmium chloride (CdCl₂), arsenic trioxide, aristolochic acid (AA), potassium dichromate (K₂Cr₂O₇) and cisplatin, miR-132-3p was upregulated 4.3-fold after AA treatment and 1.5-fold after K₂Cr₂O₇ and CdCl₂ treatment. These results demonstrate the application of temporal small RNA sequencing to identify miR-18a, miR-132 and miR-146b as differentially expressed miRNAs during distinct phases of kidney injury and fibrosis progression. - Highlights: • We used small RNA sequencing to identify differentially expressed miRNAs in kidney. • Distinct patterns were found for acute injury and fibrotic stages in the kidney. • Upregulation of miR-18a, -132 and -146b was confirmed in mice

  11. Molecular profiling of appendiceal epithelial tumors using massively parallel sequencing to identify somatic mutations.

    Science.gov (United States)

    Liu, Xiaoying; Mody, Kabir; de Abreu, Francine B; Pipas, J Marc; Peterson, Jason D; Gallagher, Torrey L; Suriawinata, Arief A; Ripple, Gregory H; Hourdequin, Kathryn C; Smith, Kerrington D; Barth, Richard J; Colacchio, Thomas A; Tsapakos, Michael J; Zaki, Bassem I; Gardner, Timothy B; Gordon, Stuart R; Amos, Christopher I; Wells, Wendy A; Tsongalis, Gregory J

    2014-07-01

    Some epithelial neoplasms of the appendix, including low-grade appendiceal mucinous neoplasms and adenocarcinomas, can result in pseudomyxoma peritonei (PMP). Little is known about the mutational spectra of these tumor types and whether mutations may be of clinical significance with respect to therapeutic selection. In this study, we identified somatic mutations using the Ion Torrent AmpliSeq Cancer Hotspot Panel v2. Specimens consisted of 3 nonneoplastic retention cysts/mucoceles, 15 low-grade mucinous neoplasms (LAMNs), 8 low-grade/well-differentiated mucinous adenocarcinomas with pseudomyxoma peritonei, and 12 adenocarcinomas with/without goblet cell/signet ring cell features. Barcoded libraries were prepared from up to 10 ng of extracted DNA and multiplexed on single 318 chips for sequencing. Data analysis was performed using Golden Helix SVS. Variants that remained after the analysis pipeline were individually interrogated using the Integrative Genomics Viewer. A single Janus kinase 3 (JAK3) mutation was detected in the mucocele group. Eight mutations were identified in the V-Ki-ras2 Kirsten rat sarcoma viral oncogene homolog (KRAS) and GNAS complex locus (GNAS) genes among LAMN samples. Additional gene mutations were identified in the AKT1 (v-akt murine thymoma viral oncogene homolog 1), APC (adenomatous polyposis coli), JAK3, MET (met proto-oncogene), phosphatidylinositol-4,5-bisphosphate 3-kinase (PIK3CA), RB1 (retinoblastoma 1), STK11 (serine/threonine kinase 11), and tumor protein p53 (TP53) genes. Among the PMPs, 6 mutations were detected in the KRAS gene, and further mutations were found in the GNAS, TP53, and RB1 genes. Appendiceal cancers showed mutations in the APC, ATM (ataxia telangiectasia mutated), KRAS, IDH1 [isocitrate dehydrogenase 1 (NADP+)], NRAS [neuroblastoma RAS viral (v-ras) oncogene homolog], PIK3CA, SMAD4 (SMAD family member 4), and TP53 genes. Our results suggest molecular heterogeneity among epithelial tumors of the appendix.
Next generation sequencing efforts

  12. Genome-wide patterns of copy number variation in the diversified chicken genomes using next-generation sequencing.

    Science.gov (United States)

    Yi, Guoqiang; Qu, Lujiang; Liu, Jianfeng; Yan, Yiyuan; Xu, Guiyun; Yang, Ning

    2014-11-07

    Copy number variation (CNV) is widespread in the genome and is a major cause of disease and phenotypic diversity. Herein, we performed a genome-wide CNV analysis of 12 diversified chicken genomes based on whole-genome sequencing. A total of 8,840 CNV regions (CNVRs) covering 98.2 Mb and representing 9.4% of the chicken genome were identified, ranging in size from 1.1 to 268.8 kb with an average of 11.1 kb. Sequencing-based predictions were confirmed at a high validation rate by two independent approaches, array comparative genomic hybridization (aCGH) and quantitative PCR (qPCR). Pearson correlation coefficients between sequencing and aCGH results ranged from 0.435 to 0.755, and qPCR experiments revealed a positive validation rate of 91.71% and a false negative rate of 22.43%. In total, 2,214 (25.0%) predicted CNVRs span 2,216 (36.4%) RefSeq genes associated with specific biological functions. Besides two previously reported copy-number-variable genes, EDN3 and PRLR, we also found some promising genes with potential roles in phenotypic variation. Two genes related to disease susceptibility/resistance, FZD6 and LIMS1, are covered by CNVRs. The highly duplicated SOCS2 may lead to higher bone mineral density. Entire or partial duplication of some genes, such as POPDC3, may have great economic importance in poultry breeding. Based on extensive genetic diversity, our results provide a more refined chicken CNV map and genome-wide gene copy number estimates, and warrant future CNV association studies for important traits in chickens.

  13. Extra-binomial variation approach for analysis of pooled DNA sequencing data

    Science.gov (United States)

    Wallace, Chris

    2012-01-01

    Motivation: The invention of next-generation sequencing technology has made it possible to study the rare variants that are more likely to pinpoint causal disease genes. To make such experiments financially viable, DNA samples from several subjects are often pooled before sequencing. This induces large between-pool variation which, together with other sources of experimental error, creates over-dispersed data. Statistical analysis of pooled sequencing data needs to appropriately model this additional variance to avoid inflating the false-positive rate. Results: We propose a new statistical method based on an extra-binomial model to address the over-dispersion and apply it to pooled case-control data. We demonstrate that our model provides a better fit to the data than either a standard binomial model or a traditional extra-binomial model proposed by Williams and can analyse both rare and common variants with lower or more variable pool depths compared to the other methods. Availability: Package ‘extraBinomial’ is on http://cran.r-project.org/ Contact: chris.wallace@cimr.cam.ac.uk Supplementary information: Supplementary data are available at Bioinformatics Online. PMID:22976083

  14. Human Y chromosome copy number variation in the next generation sequencing era and beyond.

    Science.gov (United States)

    Massaia, Andrea; Xue, Yali

    2017-05-01

    The human Y chromosome provides fertile ground for structural rearrangements owing to its haploidy and high content of repeated sequences. The methodologies used for copy number variation (CNV) studies have developed over the years. Low-throughput techniques based on direct observation of rearrangements were developed early on and are still used, often to complement array-based or sequencing approaches, which have limited power in regions with high repeat content, specifically in the presence of long, identical repeats such as those found in human sex chromosomes. Some specific rearrangements have been investigated for decades, and because of their effects on fertility or their outstanding evolutionary features, interest in them has not diminished. However, following the flourishing of large-scale genomics, several studies have investigated CNVs across the whole chromosome. These studies sometimes employ data generated within large genomic projects such as the DDD study or the 1000 Genomes Project, and often survey large samples of healthy individuals without any prior selection. Novel technologies based on sequencing long molecules, and combinations of technologies, promise to stimulate the study of Y-CNVs in the immediate future.

  15. Amazonian phylogeography: mtDNA sequence variation in arboreal echimyid rodents (Caviomorpha).

    Science.gov (United States)

    da Silva, M N; Patton, J L

    1993-09-01

    Patterns of evolutionary relationships among haplotype clades of sequences of the mitochondrial cytochrome b DNA gene are examined for five genera of arboreal rodents of the Caviomorph family Echimyidae from the Amazon Basin. Data are available for 798 bp of sequence from a total of 24 separate localities in Peru, Venezuela, Bolivia, and Brazil for Mesomys, Isothrix, Makalata, Dactylomys, and Echimys. Sequence divergence, corrected for multiple hits, is extensive, ranging from less than 1% for comparisons within populations to over 20% among geographic units within genera. Both the degree of differentiation and the geographic patterning of the variation suggest that more than one species composes the Amazonian distribution of the currently recognized Mesomys hispidus, Isothrix bistriata, Makalata didelphoides, and Dactylomys dactylinus. There is general concordance in the geographic range of haplotype clades for each of these taxa, and the overall level of differentiation within them is largely equivalent. These observations suggest that a common vicariant history underlies the respective diversification of each genus. However, estimated times of divergence based on the rate of third-position transversion substitutions for the major clades within each genus typically range above 1 million years. Thus, the allopatric isolation that precipitated divergence must have occurred considerably earlier than the late Pleistocene forest fragmentation events commonly invoked for Amazonian biota.
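    "Corrected for multiple hits" means adjusting the raw proportion of differing sites for sites that changed more than once. The abstract does not name the correction used, so the sketch below swaps in the simplest one, the Jukes-Cantor (JC69) model, plus a basic molecular-clock conversion. The authors' dating relied on third-position transversion rates, which this sketch does not model; the sequences and the clock rate are hypothetical.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Uncorrected proportion of differing sites, skipping gaps and ambiguous bases."""
    pairs = [
        (a, b)
        for a, b in zip(seq1.upper(), seq2.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p: float) -> float:
    """JC69 distance: corrects the raw proportion p for multiple hits at the same site."""
    if p >= 0.75:
        raise ValueError("p >= 0.75: sequences are saturated under JC69")
    return -0.75 * math.log(1 - 4 * p / 3)


def divergence_time(distance: float, rate_per_lineage: float) -> float:
    """Molecular-clock time since the split; distance accumulates along two lineages."""
    return distance / (2 * rate_per_lineage)


# Hypothetical aligned fragments (the study compared 798 bp of cytochrome b).
d_raw = p_distance("ACGTACGTAC", "ACGTTCGTAA")
d_corr = jukes_cantor(d_raw)
print(f"raw = {d_raw:.3f}, JC69-corrected = {d_corr:.3f}")
```

    The correction grows with divergence: at 1% it barely matters, but at 20% raw divergence JC69 gives about 0.233 substitutions per site, which is why saturation-prone comparisons are corrected before dating.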

  16. Comprehensive assessment of sequence variation within the copy number variable defensin cluster on 8p23 by target enriched in-depth 454 sequencing

    Directory of Open Access Journals (Sweden)

    Zhang Xinmin

    2011-05-01

    Background: In highly copy number variable (CNV) regions such as the human defensin gene locus, comprehensive assessment of sequence variation is challenging. PCR approaches are practically restricted to tiny fractions of such regions, and next-generation sequencing (NGS) of whole individual genomes, e.g. by the 1000 Genomes Project, is limited by affordable sequencing depth. Combining target enrichment with NGS may represent a feasible approach. Results: As a proof of principle, we enriched a ~850 kb section comprising the CNV defensin gene cluster DEFB, the invariable DEFA part and 11 control regions from two genomes by sequence capture and sequenced it with 454 technology. A total of 6,651 differences from the human reference genome were found. Comparison to HapMap genotypes revealed sensitivities and specificities in the range of 94% to 99% for the identification of variations. Using error probabilities for rigorous filtering revealed 2,886 unique single nucleotide variations (SNVs), including 358 putative novel ones. DEFB CN determinations by haplotype ratios were in agreement with alternative methods. Conclusion: Although currently labor intensive and costly, target-enriched NGS provides a powerful tool for the comprehensive assessment of SNVs in highly polymorphic CNV regions of individual genomes. Furthermore, it reveals considerable amounts of putative novel variations and simultaneously allows CN estimation.

  17. Functional translation and linguistic variation: the use of didactic sequence in teaching languages

    Directory of Open Access Journals (Sweden)

    Valdecy Oliveira Pontes

    2017-12-01

    In the context of approaching the linguistic variation of Spanish and the use of Functionalist Translation in Foreign Language classes, this article reports the results of applying a Didactic Sequence (SD) in the style of the Geneva School, with Hispanic plays, for the teaching of linguistic variation in the pronominal forms of address of the Spanish/Brazilian Portuguese language pair. The SD was applied in the subject "Introduction to Translation Studies in Spanish Language" (2nd semester), offered by the course in Letters - Spanish Language and its Literatures of the Federal University of Ceará. The article draws on the theoretical foundations of Functionalist Translation (NORD, 1994, 1996, 2009, 2012), Translation and Sociolinguistics (BOLAÑOS-CUELLAR, 2000; MAYORAL, 1998), the elaboration of SDs (DOLZ; NOVERRAZ; SCHNEUWLY, 2004; CRISTÓVÃO, 2010; BARROS, 2012) and research on variation in the forms of address of Spanish and Portuguese (FONTANELLA DE WEINBERG, 1999; SCHERRE et al., 2015).

  18. The R package otu2ot for implementing the entropy decomposition of nucleotide variation in sequence data

    Directory of Open Access Journals (Sweden)

    Alban Ramette

    2014-11-01

    Oligotyping is a novel, supervised computational method that classifies closely related sequences into oligotypes (OTs) based on subtle nucleotide variations (Eren et al. 2013). Its application to microbial datasets has helped reveal ecological patterns that are often hidden by the way sequence data are currently clustered to define operational taxonomic units (OTUs). Here, we implemented the OT entropy decomposition procedure and its unsupervised version, Minimal Entropy Decomposition (MED; Eren et al. 2014), in the statistical programming language and environment R. The aim is to integrate computational routines, interactive statistical analyses, and visualization into a single framework. In addition, two complementary approaches are implemented: (1) an analytical method (the broken stick model) to help identify low-abundance oligotypes that could be generated by chance alone, and (2) a one-pass profiling (OP) method to efficiently identify the OTUs whose subsequent oligotyping would be most promising. These enhancements are especially useful for large datasets, where manual screening of entropy analysis results and the creation of a full set of OTs may not be feasible. The package and procedures are illustrated by several tutorials and examples.
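    The entropy decomposition idea behind oligotyping can be sketched in a few lines. This sketch is in Python rather than R and does not reproduce the otu2ot package's functions. It assumes reads are already aligned to equal length, computes the Shannon entropy of each column, groups reads by the bases they carry at high-entropy columns, and computes broken-stick expected abundances against which observed oligotype abundances could be compared. The reads and the 0.5-bit threshold are hypothetical.

```python
import math
from collections import Counter


def column_entropies(alignment):
    """Shannon entropy (bits) of each column in a list of equal-length aligned reads."""
    width = len(alignment[0])
    if any(len(seq) != width for seq in alignment):
        raise ValueError("reads must be aligned to the same length")
    entropies = []
    for i in range(width):
        counts = Counter(seq[i] for seq in alignment)
        total = sum(counts.values())
        entropies.append(sum(-(c / total) * math.log2(c / total) for c in counts.values()))
    return entropies


def oligotypes(alignment, positions):
    """Count reads by the concatenated bases they carry at the chosen positions."""
    return Counter("".join(seq[i] for i in positions) for seq in alignment)


def broken_stick(n):
    """Expected relative abundance of the i-th most abundant of n types (i = 1..n)."""
    return [sum(1 / k for k in range(i, n + 1)) / n for i in range(1, n + 1)]


reads = ["AAC", "AAT", "AGC", "AGT"]  # hypothetical aligned reads
h = column_entropies(reads)
informative = [i for i, e in enumerate(h) if e > 0.5]  # hypothetical entropy cut-off
print(oligotypes(reads, informative))
print(broken_stick(len(oligotypes(reads, informative))))
```

    Columns with zero entropy carry no information for splitting reads, so only variable columns define oligotypes; an oligotype whose observed share falls well below its broken-stick expectation is a candidate for the chance-alone category the package aims to flag.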

  19. Extended exome sequencing identifies BACH2 as a novel major risk locus for Addison's disease.

    Science.gov (United States)

    Eriksson, D; Bianchi, M; Landegren, N; Nordin, J; Dalin, F; Mathioudaki, A; Eriksson, G N; Hultin-Rosenberg, L; Dahlqvist, J; Zetterqvist, H; Karlsson, Å; Hallgren, Å; Farias, F H G; Murén, E; Ahlgren, K M; Lobell, A; Andersson, G; Tandre, K; Dahlqvist, S R; Söderkvist, P; Rönnblom, L; Hulting, A-L; Wahlberg, J; Ekwall, O; Dahlqvist, P; Meadows, J R S; Bensing, S; Lindblad-Toh, K; Kämpe, O; Pielberg, G R

    2016-12-01

    Autoimmune disease is one of the leading causes of morbidity and mortality worldwide. In Addison's disease, the adrenal glands are targeted by destructive autoimmunity. Despite being the most common cause of primary adrenal failure, little is known about its aetiology. To understand the genetic background of Addison's disease, we utilized the extensively characterized patients of the Swedish Addison Registry. We developed an extended exome capture array comprising a selected set of 1853 genes and their potential regulatory elements, for the purpose of sequencing 479 patients with Addison's disease and 1394 controls. We identified BACH2 (rs62408233-A, OR = 2.01 (1.71-2.37), P = 1.66 × 10⁻¹⁵, MAF 0.46/0.29 in cases/controls) as a novel gene associated with Addison's disease development. We also confirmed the previously known associations with the HLA complex. Whilst BACH2 has been previously reported to associate with organ-specific autoimmune diseases co-inherited with Addison's disease, we have identified BACH2 as a major risk locus in Addison's disease, independent of concomitant autoimmune diseases. Our results may enable future research towards preventive disease treatment. © 2016 The Authors. Journal of Internal Medicine published by John Wiley & Sons Ltd on behalf of Association for Publication of The Journal of Internal Medicine.

  20. Exome sequencing in 53 sporadic cases of schizophrenia identifies 18 putative candidate genes.

    Directory of Open Access Journals (Sweden)

    Michel Guipponi

    Schizophrenia (SCZ) is a severe, debilitating mental illness with a significant genetic component. The identification of genetic factors related to SCZ has been challenging, and these factors remain largely unknown. To evaluate the contribution of de novo variants (DNVs) to SCZ, we sequenced the exomes of 53 individuals with sporadic SCZ and of their unaffected parents. We identified 49 DNVs, 18 of which were predicted to alter gene function, including 13 damaging missense mutations, 2 conserved splice site mutations, 2 nonsense mutations, and 1 frameshift deletion. The average number of exonic DNVs per proband was 0.88, which corresponds to an exonic point mutation rate of 1.7 × 10⁻⁸ per nucleotide per generation. The non-synonymous-to-synonymous mutation ratio of 2.06 did not differ from neutral expectations. Overall, this study provides a list of 18 putative candidate genes for sporadic SCZ and, when combined with the results of similar reports, identifies a second proband carrying a non-synonymous DNV in the RGS12 gene.
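    The step from "0.88 exonic DNVs per proband" to a per-nucleotide rate is a division by the number of opportunities for a new mutation: a diploid child inherits two copies of each callable base, so the rate is the mean count divided by 2L, where L is the callable exonic length. The abstract does not report L. The sketch below back-calculates the target size implied by the two reported figures (about 26 Mb); that value is an inference, not a number from the study.

```python
def per_nucleotide_rate(mean_dnvs_per_proband: float, callable_bases: float) -> float:
    """De novo point-mutation rate per nucleotide per generation.

    A diploid child carries two newly transmitted copies of every callable base,
    so the number of opportunities for a de novo event is 2 * callable_bases.
    """
    return mean_dnvs_per_proband / (2 * callable_bases)


def implied_callable_bases(mean_dnvs_per_proband: float, rate: float) -> float:
    """Invert the formula above: the target size that a reported rate implies."""
    return mean_dnvs_per_proband / (2 * rate)


# Figures from the abstract: 0.88 exonic DNVs per proband and 1.7e-8 per nucleotide.
target = implied_callable_bases(0.88, 1.7e-8)
print(f"implied callable exome ~ {target / 1e6:.1f} Mb")
```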

  1. Next-generation sequencing identifies transportin 3 as the causative gene for LGMD1F.

    Directory of Open Access Journals (Sweden)

    Annalaura Torella

    Limb-girdle muscular dystrophies (LGMD) are genetically and clinically heterogeneous conditions. We investigated a large family with an autosomal dominant transmission pattern, previously classified as LGMD1F and mapped to chromosome 7q32. Affected members are characterized by muscle weakness that first affects the pelvic girdle and the iliopsoas muscles. We sequenced the whole exome of four family members and identified a shared heterozygous frameshift variant in the Transportin 3 (TNPO3) gene, which encodes a member of the importin-β superfamily. The TNPO3 gene maps within the LGMD1F critical interval, and its 923-amino-acid human gene product is also expressed in skeletal muscle. In addition, we identified an isolated case of LGMD with a new missense mutation in the same gene. We localized the mutant TNPO3 around the nucleus, but not inside it. The involvement of a gene related to nuclear transport suggests a novel disease mechanism leading to muscular dystrophy.

  2. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

    KAUST Repository

    Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G

    2013-01-01

    Background: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Results: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. Conclusions: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. © 2013 Sepúlveda et al.; licensee BioMed Central Ltd.
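    The Poisson-Gamma idea can be illustrated with a negative binomial (the Poisson-Gamma marginal), which allows coverage variance to exceed the mean. The sketch below is a simplification: it fits the negative binomial by the method of moments to a reference sample (loosely mirroring the use of 3D7 as a baseline) and flags windows in a test sample whose coverage falls in either tail. It does not reproduce the authors' hierarchical models, their fitting procedure, or the Poisson-Lognormal alternative; the coverage values and the alpha threshold are hypothetical, and no windowing or GC correction is attempted.

```python
import math
from statistics import mean, pvariance


def fit_poisson_gamma(reference_coverage):
    """Method-of-moments fit of a negative binomial (Poisson-Gamma) model.

    Returns (r, p) with mean r(1-p)/p and variance r(1-p)/p^2; the variance
    exceeds the mean, unlike a plain Poisson model.
    """
    m = mean(reference_coverage)
    v = pvariance(reference_coverage, m)
    if v <= m:
        raise ValueError("coverage is not over-dispersed; a Poisson model may suffice")
    r = m * m / (v - m)
    return r, r / (r + m)


def nb_logpmf(k, r, p):
    """log P(K = k) for the negative binomial parameterised as above."""
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log1p(-p))


def nb_cdf(k, r, p):
    """P(K <= k), summed directly (adequate for the modest counts used here)."""
    return sum(math.exp(nb_logpmf(j, r, p)) for j in range(k + 1))


def flag_windows(test_coverage, reference_coverage, alpha=0.01):
    """Return (window index, label) for windows in either tail of the fitted model."""
    r, p = fit_poisson_gamma(reference_coverage)
    calls = []
    for i, k in enumerate(test_coverage):
        lower = nb_cdf(k, r, p)                              # P(K <= k)
        upper = 1.0 - nb_cdf(k - 1, r, p) if k > 0 else 1.0  # P(K >= k)
        if lower < alpha:
            calls.append((i, "candidate deletion"))
        elif upper < alpha:
            calls.append((i, "candidate amplification"))
    return calls


# Hypothetical per-window read coverage.
reference = [20, 45, 30, 25, 38, 22, 41, 30, 27, 35]
test_sample = [31, 0, 120, 29]
calls = flag_windows(test_sample, reference)
print(calls)
```

    Fitting on a separate reference keeps real deletions and amplifications in the test sample from inflating the estimated variance, which would otherwise mask the very events being sought.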

  3. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data.

    Science.gov (United States)

    Sepúlveda, Nuno; Campino, Susana G; Assefa, Samuel A; Sutherland, Colin J; Pain, Arnab; Clark, Taane G

    2013-02-26

    The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data.

  4. A Poisson hierarchical modelling approach to detecting copy number variation in sequence coverage data

    KAUST Repository

    Sepúlveda, Nuno

    2013-02-26

    Background: The advent of next generation sequencing technology has accelerated efforts to map and catalogue copy number variation (CNV) in genomes of important micro-organisms for public health. A typical analysis of the sequence data involves mapping reads onto a reference genome, calculating the respective coverage, and detecting regions with too-low or too-high coverage (deletions and amplifications, respectively). Current CNV detection methods rely on statistical assumptions (e.g., a Poisson model) that may not hold in general, or require fine-tuning the underlying algorithms to detect known hits. We propose a new CNV detection methodology based on two Poisson hierarchical models, the Poisson-Gamma and Poisson-Lognormal, with the advantage of being sufficiently flexible to describe different data patterns, whilst robust against deviations from the often assumed Poisson model. Results: Using sequence coverage data of 7 Plasmodium falciparum malaria genomes (3D7 reference strain, HB3, DD2, 7G8, GB4, OX005, and OX006), we showed that empirical coverage distributions are intrinsically asymmetric and overdispersed in relation to the Poisson model. We also demonstrated a low baseline false positive rate for the proposed methodology using 3D7 resequencing data and simulation. When applied to the non-reference isolate data, our approach detected known CNV hits, including an amplification of the PfMDR1 locus in DD2 and a large deletion in the CLAG3.2 gene in GB4, and putative novel CNV regions. When compared to the recently available FREEC and cn.MOPS approaches, our findings were more concordant with putative hits from the highest quality array data for the 7G8 and GB4 isolates. Conclusions: In summary, the proposed methodology brings an increase in flexibility, robustness, accuracy and statistical rigour to CNV detection using sequence coverage data. © 2013 Sepúlveda et al.; licensee BioMed Central Ltd.

  5. Variations in CCL3L gene cluster sequence and non-specific gene copy numbers

    Directory of Open Access Journals (Sweden)

    Edberg Jeffrey C

    2010-03-01

    Background: Copy number variations (CNVs) of the gene CC chemokine ligand 3-like 1 (CCL3L1) have been implicated in HIV-1 susceptibility, but the association has been inconsistent. CCL3L1 shares homology with a cluster of genes localized to chromosome 17q12, namely CCL3, CCL3L2, and CCL3L3. These genes are involved in host defense and inflammatory processes. Several CNV assays have been developed for the CCL3L1 gene. Findings: Through pairwise and multiple alignments of these genes, we have shown that the homology between them ranges from 50% to 99% in complete gene sequences and from 70% to 100% in the exonic regions, with CCL3L1 and CCL3L3 being identical. Using MEGA 4 and BioEdit, we aligned the sense primers, antisense primers, and probes used in several previously described assays against multiple alignments of all four chemokine genes. Each set of probes and primers aligned and matched with overlapping sequences in at least two of the four genes, indicating that previously utilized RT-PCR-based CNV assays are not specific to CCL3L1. The four available assays measured median copy numbers of 2 and 3-4 in European and African American subjects, respectively. The concordance between the assays ranged from 0.44 to 0.83, suggesting discordant calls for individual samples and inconsistencies between the assays and the gene coverage expected from the known sequences. Conclusions: This indicates that some of the inconsistencies in the association studies could be due to assays that provide heterogeneous results. Sequence information that determines the CNV of the three genes separately would make it possible to test whether their association with the pathogenesis of a human disease or phenotype is driven by an individual gene or by a combination of these genes.
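    The primer and probe specificity check can be sketched as an ungapped mismatch scan of each oligonucleotide against every gene, on both strands. This is a simplification of the MEGA 4 and BioEdit alignments used in the study. The toy sequences below are hypothetical and are not the real CCL3-family sequences, and the one-mismatch tolerance is an assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement; characters other than A/C/G/T are left unchanged."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def min_mismatches(oligo: str, target: str) -> int:
    """Fewest mismatches between oligo and any ungapped window of target, on either strand."""
    oligo = oligo.upper()
    best = len(oligo)
    for strand in (target.upper(), reverse_complement(target)):
        for start in range(len(strand) - len(oligo) + 1):
            window = strand[start:start + len(oligo)]
            best = min(best, sum(a != b for a, b in zip(oligo, window)))
    return best


def genes_hit(oligo: str, genes: dict, max_mismatches: int = 1) -> list:
    """Names of all sequences the oligo could plausibly anneal to."""
    return sorted(name for name, seq in genes.items()
                  if min_mismatches(oligo, seq) <= max_mismatches)


# Hypothetical toy sequences standing in for paralogous genes.
toy_genes = {
    "geneA": "TTGACCATGGCTAAGG",
    "geneB": "CCATGGCTAAGTTTTT",
    "geneC": "GGGGGGGGGGGGGGGG",
}
print(genes_hit("ATGGCTAAG", toy_genes))
```

    An assay is gene-specific only if each of its oligos hits exactly one gene; an oligo hitting two paralogs, as reported for the CCL3L1 assays, means the measured copy number is a sum over genes.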

  6. Screening of whole genome sequences identified high-impact variants for stallion fertility.

    Science.gov (United States)

    Schrimpf, Rahel; Gottschalk, Maren; Metzger, Julia; Martinsson, Gunilla; Sieme, Harald; Distl, Ottmar

    2016-04-14

    g.37455302G>A in NOTCH1 with the de-regressed estimated breeding values of the paternal component of the pregnancy rate per estrus (EBV-PAT). For 9 high-impact variants within the genes CFTR, OVGP1, FBXO43, TSSK6, PKD1, FOXP1, TCP11, SPATA31E1 and NOTCH1 (g.37453246G>C), the homozygous mutant genotype was absent from the entire validation sample of 337 fertile stallions. Therefore, these variants were considered potentially deleterious factors for stallion fertility. In conclusion, this study revealed 17 genetic variants with a predicted high damaging effect on protein structure and a missing homozygous mutant genotype. The g.37455302G>A NOTCH1 variant was identified as a significant stallion fertility locus in Hanoverian stallions, and a further 9 candidate fertility loci with missing homozygous mutant genotypes were validated in a panel including 19 horse breeds. To our knowledge, this is the first study in horses using next-generation sequencing data to uncover strong candidate factors for stallion fertility.
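    Why a missing homozygous genotype can be informative: under Hardy-Weinberg equilibrium with no selection, an allele at frequency q produces homozygotes at rate q², so among n unrelated animals the chance of seeing none is (1 - q²)ⁿ. The allele frequency of 0.1 below is hypothetical, and the calculation ignores relatedness and breed structure among stallions, so it is only a rough plausibility check, not the test used in the study.

```python
def expected_homozygotes(q: float, n: int) -> float:
    """Expected count of homozygous mutants among n animals under HWE."""
    return n * q * q


def prob_no_homozygotes(q: float, n: int) -> float:
    """P(no homozygous mutants among n unrelated animals) under HWE, no selection."""
    return (1.0 - q * q) ** n


n_fertile = 337  # size of the validation sample in the abstract
q = 0.1          # hypothetical mutant allele frequency
print(f"expected homozygotes: {expected_homozygotes(q, n_fertile):.2f}")
print(f"P(none observed): {prob_no_homozygotes(q, n_fertile):.4f}")
```

    With about 3.4 homozygotes expected, observing none would happen only around 3% of the time by chance, which is the intuition behind treating such variants as candidates for reduced fertility in homozygotes.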

  7. A model-based clustering method to detect infectious disease transmission outbreaks from sequence variation.

    Directory of Open Access Journals (Sweden)

    Rosemary M McCloskey

    2017-11-01

    Clustering infections by genetic similarity is a popular technique for identifying potential outbreaks of infectious disease, in part because sequences are now routinely collected for clinical management of many infections. A wide range of nonparametric clustering methods have been developed for this purpose. These methods are generally intuitive, rapid to compute, and readily scale with large data sets. However, we have found that nonparametric clustering methods can be biased towards identifying clusters of diagnosis (where individuals are sampled sooner after infection) rather than the clusters of rapid transmission that are meant to be potential foci for public health efforts. We develop a fundamentally new approach to genetic clustering based on fitting a Markov-modulated Poisson process (MMPP), which represents the evolution of transmission rates along the tree relating different infections. We evaluated this model-based method alongside five nonparametric clustering methods using both simulated and actual HIV sequence data sets. For simulated clusters of rapid transmission, the MMPP clustering method obtained higher mean sensitivity (85%) and specificity (91%) than the nonparametric methods. When we applied these clustering methods to published sequences from a study of HIV-1 genetic clusters in Seattle, USA, we found that the MMPP method assigned about half as many individuals (46%) to clusters as the other methods did. Furthermore, the mean internal branch lengths that approximate transmission rates were significantly shorter in clusters extracted using MMPP, but not in those extracted by the other methods. We determined that the computing time for the MMPP method scaled linearly with the size of trees, requiring about 30 seconds for a tree of 1,000 tips and about 20 minutes for 50,000 tips on a single computer.
This new approach to genetic clustering has significant implications for the application of pathogen sequence analysis to public health, where
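    A Markov-modulated Poisson process is a Poisson process whose rate switches according to a hidden Markov chain. The study fits an MMPP along a phylogenetic tree; the sketch below only simulates a two-state MMPP on a single time line, to show how a hidden high-rate state produces bursts of events. All rates are hypothetical, and nothing here reproduces the authors' tree-based fitting.

```python
import random


def simulate_mmpp(rates, switch_rates, t_max, seed=None):
    """Simulate a two-state Markov-modulated Poisson process on [0, t_max).

    rates[s]: event rate while the hidden state is s (e.g. a transmission rate).
    switch_rates[s]: rate of leaving state s for the other state.
    Returns event times and the hidden state at each event.
    """
    rng = random.Random(seed)
    t, state = 0.0, 0
    times, states = [], []
    while True:
        total = rates[state] + switch_rates[state]
        if total <= 0:
            break  # silent, absorbing state: nothing further can happen
        t += rng.expovariate(total)
        if t >= t_max:
            break
        # Competing exponentials: the next occurrence is an event with
        # probability rates[state] / total, otherwise a state switch.
        if rng.random() < rates[state] / total:
            times.append(t)
            states.append(state)
        else:
            state = 1 - state
    return times, states


# Hypothetical parameters: a quiet state 0 and a burst state 1.
times, states = simulate_mmpp(rates=(0.2, 5.0), switch_rates=(0.5, 1.0), t_max=20.0, seed=42)
print(f"{len(times)} events, {states.count(1)} of them in the high-rate state")
```

    Inference runs this in reverse: given observed events (or branching times on a tree), one estimates the hidden state path, and stretches assigned to the high-rate state are the candidate rapid-transmission clusters.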

  8. Identification, characterization, and utilization of genome-wide simple sequence repeats to identify a QTL for acidity in apple

    Science.gov (United States)

    2012-01-01

    Background: Apple is an economically important fruit crop worldwide. Developing a genetic linkage map is a critical step towards mapping and cloning genes responsible for important horticultural traits in apple. To facilitate linkage map construction, we surveyed and characterized the distribution and frequency of perfect microsatellites in assembled contig sequences of the apple genome. Results: A total of 28,538 SSRs have been identified in the apple genome, with an overall density of 40.8 SSRs per Mb. Di-nucleotide repeats are the most frequent microsatellites in the apple genome, accounting for 71.9% of all microsatellites. AT/TA repeats are the most frequent in genomic regions, accounting for 38.3% of all genomic SSRs (G-SSRs), while AG/GA dimers prevail in transcribed sequences and account for 59.4% of all EST-SSRs. A set of 310 SSRs was selected to amplify eight apple genotypes; of these, 245 (79.0%) were polymorphic among the cultivars and wild species tested. AG/GA motifs in genomic regions have detected more alleles and higher PIC values than AT/TA or AC/CA motifs. Moreover, AG/GA repeats are more variable than any other dimers in apple and should be preferentially selected for studies such as genetic diversity analysis and linkage map construction. A total of 54 newly developed apple SSRs have been genetically mapped. Interestingly, clustering of markers with distorted segregation is observed on linkage groups 1, 2, 10, 15, and 16. A QTL responsible for malic acid content of apple fruits is detected on linkage group 8 and accounts for ~13.5% of the observed phenotypic variation. Conclusions: This study demonstrates that di-nucleotide repeats are prevalent in the apple genome and that AT/TA and AG/GA repeats are the most frequent in genomic and transcribed sequences of apple, respectively. All SSR motifs identified in this study, as well as the newly mapped SSRs, will serve as valuable resources for pursuing apple genetic studies, aiding the apple breeding
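    Perfect microsatellites can be located with a back-referencing regular expression, and marker informativeness summarized with the standard polymorphism information content (PIC) formula. The minimum repeat numbers below are assumptions, since the abstract does not state the search criteria, and the sequence is a toy example. Because a match starts at the leftmost possible position, an AG run preceded by a G is reported as GA; this is why AG/GA (and AT/TA) are grouped as one motif class.

```python
import re

# Hypothetical minimum repeat numbers per motif length (1 to 6 bp).
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if motif is not itself a repeat of a shorter unit (e.g. 'ATAT' is not)."""
    n = len(motif)
    return not any(n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n))


def find_ssrs(seq: str):
    """Return sorted (start, motif, repeat_count) tuples for perfect SSRs in seq."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-mer followed by at least (min_rep - 1) exact copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # e.g. skip 'AA' runs already counted as 'A'
                hits.append((m.start(), motif, (m.end() - m.start()) // k))
    return sorted(hits)


def pic(freqs):
    """Polymorphism information content from a marker's allele frequencies."""
    homozygosity = sum(f * f for f in freqs)
    cross = sum(2 * freqs[i] ** 2 * freqs[j] ** 2
                for i in range(len(freqs)) for j in range(i + 1, len(freqs)))
    return 1.0 - homozygosity - cross


toy = "GG" + "AG" * 8 + "TTT" + "C" * 12 + "G"  # hypothetical sequence
print(find_ssrs(toy))
print(pic([0.5, 0.5]))
```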

  9. Whole-exome sequencing and high throughput genotyping identified KCNJ11 as the thirteenth MODY gene.

    Science.gov (United States)

    Bonnefond, Amélie; Philippe, Julien; Durand, Emmanuelle; Dechaume, Aurélie; Huyvaert, Marlène; Montagne, Louise; Marre, Michel; Balkau, Beverley; Fajardy, Isabelle; Vambergue, Anne; Vatin, Vincent; Delplanque, Jérôme; Le Guilcher, David; De Graeve, Franck; Lecoeur, Cécile; Sand, Olivier; Vaxillaire, Martine; Froguel, Philippe

    2012-01-01

    Maturity-onset diabetes of the young (MODY) is a clinically heterogeneous form of diabetes characterized by an autosomal-dominant mode of inheritance, an onset before the age of 25 years, and a primary defect in pancreatic beta-cell function. Approximately 30% of MODY families remain genetically unexplained (MODY-X). Here, we aimed to use whole-exome sequencing (WES) in a four-generation MODY-X family to identify a new susceptibility gene for MODY. WES (Agilent-SureSelect capture/Illumina-GAIIx sequencing) was performed in three affected relatives and one unaffected relative in the MODY-X family. We then performed high-throughput multiplex genotyping (Illumina-GoldenGate assay) of the putative causal mutations in the whole family and in 406 controls. A linkage analysis was also carried out. By focusing on variants of interest (i.e. gains of stop codon, frameshift, non-synonymous and splice-site variants not reported in dbSNP130) present in the three affected relatives and not present in the control, we found 69 mutations. However, as WES coverage was not uniform between samples, a total of 324 mutations had to be assessed in the whole family and in controls. Only one mutation (p.Glu227Lys in KCNJ11) co-segregated with diabetes in the family (LOD score 3.68). No KCNJ11 mutation was found in 25 other unrelated MODY-X subjects. Beyond neonatal diabetes mellitus (NDM), KCNJ11 is also a MODY gene ('MODY13'), confirming the wide spectrum of diabetes-related phenotypes due to mutations in NDM genes (i.e. KCNJ11, ABCC8 and INS). Therefore, the molecular diagnosis of MODY should include KCNJ11, as affected carriers can ideally be treated with oral sulfonylureas.
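    The filtering step described above amounts to set operations over per-individual variant call sets. A minimal sketch, assuming variants have already been annotated; the consequence labels, the `Variant` record, and the toy calls are hypothetical. As the abstract notes, WES was not uniform between samples, so a variant missing from a call set may simply be uncovered rather than absent, which is why the study went on to genotype 324 candidates rather than only the 69 that passed the filter.

```python
from dataclasses import dataclass

# Hypothetical consequence labels standing in for the abstract's categories.
OF_INTEREST = {"stop_gained", "frameshift", "missense", "splice_site"}


@dataclass(frozen=True)
class Variant:
    chrom: str
    pos: int
    ref: str
    alt: str
    consequence: str
    in_dbsnp: bool  # stands in for "reported in dbSNP130"


def candidate_variants(affected, unaffected):
    """Variants of interest shared by every affected relative and absent from all unaffected ones.

    affected / unaffected: lists of sets of Variant, one set per individual.
    """
    shared = set.intersection(*affected)
    seen_unaffected = set().union(*unaffected)
    return {
        v for v in shared - seen_unaffected
        if v.consequence in OF_INTEREST and not v.in_dbsnp
    }


# Hypothetical toy calls.
v1 = Variant("chr1", 100, "G", "A", "missense", False)     # passes every filter
v2 = Variant("chr1", 200, "C", "T", "missense", True)      # already in dbSNP
v3 = Variant("chr1", 300, "A", "G", "synonymous", False)   # not a consequence of interest
v4 = Variant("chr2", 400, "T", "-", "frameshift", False)   # also seen in the unaffected relative
v5 = Variant("chr3", 500, "C", "A", "stop_gained", False)  # missing from one affected relative
affected = [{v1, v2, v3, v4, v5}, {v1, v2, v3, v4, v5}, {v1, v2, v3, v4}]
unaffected = [{v4}]
print(candidate_variants(affected, unaffected))
```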

  10. Whole-exome sequencing and high throughput genotyping identified KCNJ11 as the thirteenth MODY gene.

    Directory of Open Access Journals (Sweden)

    Amélie Bonnefond

    BACKGROUND: Maturity-onset diabetes of the young (MODY) is a clinically heterogeneous form of diabetes characterized by an autosomal-dominant mode of inheritance, an onset before the age of 25 years, and a primary defect in pancreatic beta-cell function. Approximately 30% of MODY families remain genetically unexplained (MODY-X). Here, we aimed to use whole-exome sequencing (WES) in a four-generation MODY-X family to identify a new susceptibility gene for MODY. METHODOLOGY: WES (Agilent-SureSelect capture/Illumina-GAIIx sequencing) was performed in three affected relatives and one unaffected relative in the MODY-X family. We then performed high-throughput multiplex genotyping (Illumina-GoldenGate assay) of the putative causal mutations in the whole family and in 406 controls. A linkage analysis was also carried out. PRINCIPAL FINDINGS: By focusing on variants of interest (i.e. gains of stop codon, frameshift, non-synonymous and splice-site variants not reported in dbSNP130) present in the three affected relatives and not present in the control, we found 69 mutations. However, as WES coverage was not uniform between samples, a total of 324 mutations had to be assessed in the whole family and in controls. Only one mutation (p.Glu227Lys in KCNJ11) co-segregated with diabetes in the family (LOD score 3.68). No KCNJ11 mutation was found in 25 other unrelated MODY-X subjects. CONCLUSIONS/SIGNIFICANCE: Beyond neonatal diabetes mellitus (NDM), KCNJ11 is also a MODY gene ('MODY13'), confirming the wide spectrum of diabetes-related phenotypes due to mutations in NDM genes (i.e. KCNJ11, ABCC8 and INS). Therefore, the molecular diagnosis of MODY should include KCNJ11, as affected carriers can ideally be treated with oral sulfonylureas.

  11. Diagnostic SNPs for inferring population structure in American mink (Neovison vison) identified through RAD sequencing

    DEFF Research Database (Denmark)

    2015-01-01

    Data from: "Diagnostic SNPs for inferring population structure in American mink (Neovison vison) identified through RAD sequencing" in Genomic Resources Notes accepted 1 October 2014 to 30 November 2014....

  12. Identifying Patient-Specific Epstein-Barr Nuclear Antigen-1 Genetic Variation and Potential Autoreactive Targets Relevant to Multiple Sclerosis Pathogenesis.

    Directory of Open Access Journals (Sweden)

    Monika Tschochner

    Epstein-Barr virus (EBV) infection represents a major environmental risk factor for multiple sclerosis (MS), with evidence of selective expansion of Epstein-Barr Nuclear Antigen-1 (EBNA1)-specific CD4+ T cells that cross-recognize MS-associated myelin antigens in MS patients. HLA-DRB1*15-restricted antigen presentation also appears to determine susceptibility, given its role as a dominant risk allele. In this study, we have utilised standard and next-generation sequencing techniques to investigate EBNA-1 sequence variation and its relationship to HLA-DR15 binding affinity, as well as examining potential cross-reactive immune targets within the central nervous system proteome. Sanger sequencing was performed on DNA isolated from peripheral blood samples from 73 Western Australian MS cases, without requirement for primary culture, with additional FLX 454 Roche sequencing in 23 samples to identify low-frequency variants. Patient-derived viral sequences were used to predict HLA-DRB1*1501 epitopes (NetMHCII, NetMHCIIpan), and candidates were evaluated for cross-recognition with human brain proteins. EBNA-1 sequence variation was limited, with no evidence of multiple viral strains and only low levels of variation identified by FLX technology (8.3% of nucleotide positions at a 1% cut-off). In silico epitope mapping revealed two known HLA-DRB1*1501-restricted epitopes ('AEG': aa 481-496 and 'MVF': aa 562-577) and two putative epitopes between positions 502 and 543. We identified potential cross-reactive targets involving a number of major myelin antigens, including experimentally confirmed HLA-DRB1*15-restricted epitopes, as well as novel candidate antigens within myelin and paranodal assembly proteins that may be relevant to MS pathogenesis. This study demonstrates the feasibility of obtaining autologous EBNA-1 sequences directly from buffy coat samples, and confirms divergence of these sequences from standard laboratory strains.
This approach has identified a number of
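    The "8.3% of nucleotide positions at a 1% cut-off" figure is a per-position minor-variant summary. A minimal sketch, assuming per-position base counts from aligned reads; the minimum depth of 100 and the counts are hypothetical, and a real 454 analysis would also need to account for homopolymer-associated sequencing errors before applying such a cut-off.

```python
def variable_position_fraction(base_counts, cutoff=0.01, min_depth=100):
    """Fraction of sufficiently covered positions where a non-majority base
    reaches `cutoff` of the reads.

    base_counts: one dict per alignment position, e.g. {"A": 980, "G": 20}.
    """
    assessed = variable = 0
    for counts in base_counts:
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too shallow to call a 1% variant reliably
        assessed += 1
        minor = sorted(counts.values(), reverse=True)[1:]  # drop the majority base
        if any(c / depth >= cutoff for c in minor):
            variable += 1
    return variable / assessed if assessed else 0.0


# Hypothetical per-position base counts.
positions = [
    {"A": 980, "G": 20},   # 2% minor base: variable
    {"C": 1000},           # invariant
    {"T": 995, "C": 5},    # 0.5% minor base: below the cut-off
    {"G": 50},             # too shallow: not assessed
]
print(variable_position_fraction(positions))
```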

  13. The Ebola virus VP35 protein binds viral immunostimulatory and host RNAs identified through deep sequencing.

    Directory of Open Access Journals (Sweden)

    Kari A Dilley

    Ebola virus and Marburg virus are members of the Filoviridae family and causative agents of hemorrhagic fever with high fatality rates in humans. Filovirus virulence is partially attributed to the VP35 protein, a well-characterized inhibitor of the RIG-I-like receptor pathway that triggers the antiviral interferon (IFN) response. Prior work demonstrates the ability of VP35 to block potent RIG-I activators, such as Sendai virus (SeV), and this IFN-antagonist activity is directly correlated with its ability to bind RNA. Several structural studies demonstrate that VP35 binds short synthetic dsRNAs; yet, there are no data that identify viral immunostimulatory RNAs (isRNA) or host RNAs bound to VP35 in cells. Utilizing a SeV infection model, we demonstrate that both viral isRNA and host RNAs are bound to Ebola and Marburg VP35s in cells. By deep sequencing the purified VP35-bound RNA, we identified the SeV copy-back defective interfering (DI) RNA, previously identified as a robust RIG-I activator, as the isRNA bound by multiple filovirus VP35 proteins, including the VP35 protein from the West African outbreak strain (Makona EBOV). Moreover, RNAs isolated from a VP35 RNA-binding mutant were not immunostimulatory and did not include the SeV DI RNA. Strikingly, an analysis of host RNAs bound by wild-type, but not mutant, VP35 revealed that select host RNAs are preferentially bound by VP35 in cell culture. Taken together, these data support a model in which VP35 sequesters isRNA in virus-infected cells to avert RIG-I-like receptor (RLR) activation.

  14. The Ebola virus VP35 protein binds viral immunostimulatory and host RNAs identified through deep sequencing.

    Science.gov (United States)

    Dilley, Kari A; Voorhies, Alexander A; Luthra, Priya; Puri, Vinita; Stockwell, Timothy B; Lorenzi, Hernan; Basler, Christopher F; Shabman, Reed S

    2017-01-01

    Ebola virus and Marburg virus are members of the Filoviridae family and causative agents of hemorrhagic fever with high fatality rates in humans. Filovirus virulence is partially attributed to the VP35 protein, a well-characterized inhibitor of the RIG-I-like receptor pathway that triggers the antiviral interferon (IFN) response. Prior work demonstrates the ability of VP35 to block potent RIG-I activators, such as Sendai virus (SeV), and this IFN-antagonist activity is directly correlated with its ability to bind RNA. Several structural studies demonstrate that VP35 binds short synthetic dsRNAs; yet, there are no data that identify viral immunostimulatory RNAs (isRNA) or host RNAs bound to VP35 in cells. Utilizing a SeV infection model, we demonstrate that both viral isRNA and host RNAs are bound to Ebola and Marburg VP35s in cells. By deep sequencing the purified VP35-bound RNA, we identified the SeV copy-back defective interfering (DI) RNA, previously identified as a robust RIG-I activator, as the isRNA bound by multiple filovirus VP35 proteins, including the VP35 protein from the West African outbreak strain (Makona EBOV). Moreover, RNAs isolated from a VP35 RNA-binding mutant were not immunostimulatory and did not include the SeV DI RNA. Strikingly, an analysis of host RNAs bound by wild-type, but not mutant, VP35 revealed that select host RNAs are preferentially bound by VP35 in cell culture. Taken together, these data support a model in which VP35 sequesters isRNA in virus-infected cells to avert RIG-I like receptor (RLR) activation.

  15. Identifying Genetic Differences Between Dongxiang Blue-Shelled and White Leghorn Chickens Using Sequencing Data

    Directory of Open Access Journals (Sweden)

    Qing-bo Zhao

    2018-02-01

    Full Text Available The Dongxiang Blue-shelled chicken is one of the most valuable Chinese indigenous poultry breeds. However, compared with the Italian native White Leghorn, this Chinese breed, although it possesses numerous favorable characteristics, exhibits lower growth performance and fertility. Here, we utilized genotyping sequencing data obtained via genome reduction on a sequencing platform to detect 100,114 single nucleotide polymorphisms and to perform further biological analysis and functional annotation. We employed cross-population extended haplotype homozygosity, eigenvector decomposition combined with genome-wide association studies (EigenGWAS), and efficient mixed-model association expedited methods to detect areas of the genome that are potential selected regions (PSRs) in both chicken breeds, and performed gene ontology (GO) enrichment and quantitative trait loci (QTL) analyses, with annotation using the Kyoto Encyclopedia of Genes and Genomes. The results of this study revealed a total of 2424 outlier loci (p-value < 0.01), of which 2144 occur in the White Leghorn breed and 280 occur in the Dongxiang Blue-shelled chicken. These correspond to 327 and 94 PSRs containing 297 and 54 genes, respectively. The most significantly selected genes in the Blue-shelled chicken are TMEM141 and CLIC3, while the SLCO1B3 gene, related to eggshell color, was identified via EigenGWAS. We show that the White Leghorn genes JARID2, RBMS3, GPC3, TRIB2, ROBO1, SAMSN1, OSBP2, and IGFALS are involved in immunity, reproduction, and growth, and thus might represent footprints of the selection process. In contrast, we identified six significantly enriched pathways in the Dongxiang Blue-shelled chicken that are related to amino acid and lipid metabolism as well as signal transduction. Our results also reveal the presence of a GO term associated with cell metabolism that occurs mainly in the White Leghorn breed, while the most significant QTL regions mapped to the Chicken QTL Database (GG_4

  16. Sequence Variation in Rhoptry Neck Protein 10 Gene among Toxoplasma gondii Isolates from Different Hosts and Geographical Locations.

    Science.gov (United States)

    Zhao, Yu; Zhou, Donghui; Chen, Jia; Sun, Xiaolin

    2017-01-01

    Toxoplasma gondii, a eukaryotic parasite of the phylum Apicomplexa, can infect almost all warm-blooded animals and humans, causing toxoplasmosis. Rhoptry neck proteins (RONs) play a key role in the invasion process of T. gondii and are potential vaccine candidate molecules against toxoplasmosis. The present study examined sequence variation in the rhoptry neck protein 10 (TgRON10) gene among 10 T. gondii isolates from different hosts and geographical locations from Lanzhou province during 2014, and compared them with the corresponding sequences of strains ME49 and VEG obtained from the ToxoDB database, using polymerase chain reaction (PCR) amplification, sequence analysis, and phylogenetic reconstruction by Bayesian inference (BI) and maximum parsimony (MP). Analysis of all 12 TgRON10 genomic and cDNA sequences revealed 7 exons and 6 introns in the TgRON10 gDNA. The complete genomic sequence of the TgRON10 gene ranged from 4759 bp to 4763 bp, and sequence variation was 0-0.6% among the 12 T. gondii isolates, indicating low sequence variation in the TgRON10 gene. Phylogenetic analysis of TgRON10 sequences showed that the clustering of the 12 T. gondii isolates was not completely consistent with their respective genotypes. The TgRON10 gene is therefore not a suitable genetic marker for differentiating T. gondii isolates from different hosts and geographical locations, but it may represent a potential vaccine candidate against toxoplasmosis that merits further study.
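
    The 0-0.6% range is a pairwise comparison across isolates. One common way to obtain such figures is the uncorrected p-distance (proportion of differing sites) over aligned sequences, taking the minimum and maximum over all pairs. The abstract does not state the distance measure or how indels were handled (the gene lengths differ by up to 4 bp), so the p-distance and the skipping of gapped columns in this minimal Python sketch are assumptions.

```python
from itertools import combinations


def p_distance(a, b, gap="-"):
    """Proportion of differing sites between two aligned sequences, skipping gapped columns."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == gap or y == gap:
            continue  # pairwise deletion: ignore columns with a gap in either sequence
        compared += 1
        if x != y:
            diffs += 1
    return diffs / compared if compared else 0.0


def variation_range(aligned):
    """Minimum and maximum pairwise p-distance, in percent, over all pairs of isolates."""
    distances = [100 * p_distance(a, b) for a, b in combinations(aligned.values(), 2)]
    return min(distances), max(distances)


# Toy alignment with hypothetical isolate names; real TgRON10 sequences are ~4.76 kb.
toy = {"isolate_1": "ACGT", "isolate_2": "ACGT", "isolate_3": "ACGA"}
print(variation_range(toy))
```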

  17. Insights into mechanisms of bacterial antigenic variation derived from the complete genome sequence of Anaplasma marginale.

    Science.gov (United States)

    Palmer, Guy H; Futse, James E; Knowles, Donald P; Brayton, Kelly A

    2006-10-01

    Persistence of Anaplasma spp. in the animal reservoir host is required for efficient tick-borne transmission of these pathogens to animals and humans. Using A. marginale infection of its natural reservoir host as a model, studies have shown that persistent infection reflects sequential cycles in which antigenic variants emerge, replicate, and are controlled by the immune system. Variation in the immunodominant outer-membrane protein MSP2 is generated by a process of gene conversion, in which unique hypervariable region sequences (HVRs) located in pseudogenes are recombined into a single operon-linked msp2 expression site. Although organisms expressing whole HVRs derived from pseudogenes emerge early in infection, long-term persistent infection depends on the generation of complex mosaics in which segments from different HVRs recombine into the expression site. The resulting combinatorial diversity generates the number of variants both predicted and shown to emerge during persistence.

  18. Genetic variation and DNA fingerprinting of durian types in Malaysia using simple sequence repeat (SSR) markers.

    Science.gov (United States)

    Siew, Ging Yang; Ng, Wei Lun; Tan, Sheau Wei; Alitheen, Noorjahan Banu; Tan, Soon Guan; Yeap, Swee Keong

    2018-01-01

    Durian (Durio zibethinus) is one of the most popular tropical fruits in Asia. To date, 126 durian types have been registered with the Department of Agriculture in Malaysia based on phenotypic characteristics. Classification based on morphology is convenient, easy, and fast, but it suffers from phenotypic plasticity as a direct result of environmental factors and age. To overcome the limitations of morphological classification, the various durian types need to be characterized genetically. Such data are important for the evaluation and management of durian genetic resources in producing countries. In this study, simple sequence repeat (SSR) markers were used to study the genetic variation in 27 durian types from the germplasm collection of Universiti Putra Malaysia. Based on DNA sequences deposited in GenBank, seven pairs of primers were successfully designed to amplify SSR regions in the durian DNA samples. High levels of variation among the 27 durian types were observed (expected heterozygosity, H_E = 0.35). The DNA fingerprinting power of the SSR markers, expressed as the combined probability of identity (PI) over all loci, was 2.3 × 10^-3. Unique DNA fingerprints were generated for 21 of the 27 durian types using five polymorphic SSR markers (the other two SSR markers were monomorphic). We further tested the utility of these markers by evaluating the clonal status of shared durian types from different germplasm collection sites, and found that some were not clones. The findings of this preliminary study not only show the feasibility of using SSR markers for DNA fingerprinting of durian types, but also challenge the current classification of durian types, e.g., whether the different types should be called "clones", "varieties", or "cultivars". Such matters have a direct impact on the regulation and management of durian genetic resources in the region.
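
    Both summary statistics above are computed from allele frequencies. A standard per-locus definition of expected heterozygosity is H_E = 1 - Σ p_i² (Nei's unbiased estimator additionally multiplies by 2n/(2n - 1) for n sampled individuals), and the probability of identity, the chance that two unrelated individuals share a genotype at a locus, is PI = Σ p_i⁴ + Σ_{i<j} (2 p_i p_j)². Combined PI over independent loci is the product of per-locus values, so a monomorphic locus (PI = 1) adds no fingerprinting power, consistent with the abstract relying on the five polymorphic markers. This minimal Python sketch uses made-up allele frequencies; the study's frequencies and exact estimator are not given.

```python
from math import prod


def expected_heterozygosity(freqs):
    """Per-locus expected heterozygosity: H_E = 1 - sum(p_i^2)."""
    return 1 - sum(p * p for p in freqs)


def probability_of_identity(freqs):
    """Per-locus PI = sum(p_i^4) + sum over i < j of (2 * p_i * p_j)^2."""
    homozygous = sum(p ** 4 for p in freqs)
    heterozygous = sum(
        (2 * freqs[i] * freqs[j]) ** 2
        for i in range(len(freqs))
        for j in range(i + 1, len(freqs))
    )
    return homozygous + heterozygous


def combined_pi(loci):
    """Combined PI across independent loci: the product of per-locus PI values."""
    return prod(probability_of_identity(freqs) for freqs in loci)


# Hypothetical allele frequencies: two biallelic loci at 50/50 plus one monomorphic locus.
loci = [[0.5, 0.5], [0.5, 0.5], [1.0]]
print([expected_heterozygosity(f) for f in loci])
print(combined_pi(loci))
```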

  19. Whole exome sequencing identifies novel genes for fetal hemoglobin response to hydroxyurea in children with sickle cell anemia.

    Science.gov (United States)

    Sheehan, Vivien A; Crosby, Jacy R; Sabo, Aniko; Mortier, Nicole A; Howard, Thad A; Muzny, Donna M; Dugan-Perez, Shannon; Aygun, Banu; Nottage, Kerri A; Boerwinkle, Eric; Gibbs, Richard A; Ware, Russell E; Flanagan, Jonathan M

    2014-01-01

    Hydroxyurea has proven efficacy in children and adults with sickle cell anemia (SCA), but with considerable inter-individual variability in the amount of fetal hemoglobin (HbF) produced. Sibling and twin studies indicate that some of that drug response variation is heritable. To test the hypothesis that genetic modifiers influence pharmacological induction of HbF, we investigated phenotype-genotype associations using whole exome sequencing of children with SCA treated prospectively with hydroxyurea to maximum tolerated dose (MTD). We analyzed 171 unrelated patients enrolled in two prospective clinical trials, all treated with dose escalation to MTD. We examined two MTD drug response phenotypes: ΔHbF (final %HbF minus baseline %HbF) and final %HbF. Analyzing individual genetic variants, we identified multiple low-frequency and common variants associated with HbF induction by hydroxyurea. A validation cohort of 130 pediatric sickle cell patients treated to MTD with hydroxyurea was genotyped for the 13 non-synonymous variants with the strongest association with HbF response to hydroxyurea in the discovery cohort. A coding variant in Spalt-like transcription factor 2 (SALL2) was associated with higher final HbF in this second, independent replication sample, and SALL2 represents an outstanding novel candidate gene for further investigation. These findings may help focus future functional studies and provide new insights into pharmacological HbF upregulation by hydroxyurea in patients with SCA.

  20. A targeted sequencing panel identifies rare damaging variants in multiple genes in the cranial neural tube defect, anencephaly.

    Science.gov (United States)

    Ishida, M; Cullup, T; Boustred, C; James, C; Docker, J; English, C; Lench, N; Copp, A J; Moore, G E; Greene, N D E; Stanier, P

    2018-04-01

    Neural tube defects (NTDs) affecting the brain (anencephaly) are lethal before or at birth, whereas lower spinal defects (spina bifida) may lead to lifelong neurological handicap. Collectively, NTDs rank among the most common birth defects worldwide. This study focuses on anencephaly, which, despite having a similar frequency to spina bifida and being the most common type of NTD observed in mouse models, has been included in fewer genetic studies. A genetic influence is strongly implicated in determining risk of NTDs, and a molecular diagnosis is of fundamental importance to families, both for understanding the origin of the condition and for managing future pregnancies. Here we used a custom panel of 191 NTD candidate genes to screen 90 patients with cranial NTDs (n = 85 anencephaly and n = 5 craniorachischisis) with a targeted exome sequencing platform. After filtering and comparing to our in-house control exome database (N = 509), we identified 397 rare variants (minor allele frequency, MAF < 1%), 21 of which were previously unreported and predicted damaging. These included 1 frameshift (PDGFRA), 2 stop-gained (MAT1A; NOS2) and 18 missense variants. Together with evidence for oligogenic inheritance, this study provides new information on the possible genetic causation of anencephaly. © 2017 The Authors. Clinical Genetics published by John Wiley & Sons A/S. Published by John Wiley & Sons Ltd.
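
    The variant prioritization described above is a two-tier filter: rare variants (MAF < 1%) first, then the subset that is previously unreported and predicted damaging. The abstract gives only the thresholds and the resulting counts, so this minimal Python sketch rests on assumptions: the `Variant` record and its fields are hypothetical, and treating the control-exome comparison as "drop any variant seen in controls" is a guess at what "comparing to our in-house control exome database" involved.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    gene: str
    change: str                # HGVS-style description
    maf: float                 # minor allele frequency in a reference population
    control_carriers: int      # carriers among in-house control exomes (hypothetical field)
    previously_reported: bool  # present in variant databases (hypothetical field)
    predicted_damaging: bool   # consensus of in silico predictors (hypothetical field)


def prioritise(variants, maf_cutoff=0.01, max_control_carriers=0):
    """Return (rare variants, novel predicted-damaging subset of the rare variants)."""
    rare = [
        v for v in variants
        if v.maf < maf_cutoff and v.control_carriers <= max_control_carriers
    ]
    candidates = [v for v in rare if not v.previously_reported and v.predicted_damaging]
    return rare, candidates


# Placeholder gene names; none of these records come from the study.
calls = [
    Variant("GENE1", "c.1A>G", 0.001, 0, False, True),
    Variant("GENE2", "c.2C>T", 0.05, 0, False, True),   # too common
    Variant("GENE3", "c.3G>A", 0.002, 0, True, True),   # rare but already reported
    Variant("GENE4", "c.4T>C", 0.0, 2, False, True),    # seen in controls
]
rare, candidates = prioritise(calls)
print([v.gene for v in rare], [v.gene for v in candidates])
```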

  1. Whole Genome Sequencing Identifies a Missense Mutation in HES7 Associated with Short Tails in Asian Domestic Cats.

    Science.gov (United States)

    Xu, Xiao; Sun, Xin; Hu, Xue-Song; Zhuang, Yan; Liu, Yue-Chen; Meng, Hao; Miao, Lin; Yu, He; Luo, Shu-Jin

    2016-08-25

    Domestic cats exhibit abundant variation in tail morphology and serve as an excellent model to study the development and evolution of vertebrate tails. Cats with shortened and kinked tails were first recorded in the Malayan archipelago by Charles Darwin in 1868 and remain quite common today in Southeast and East Asia. To elucidate the genetic basis of short tails in Asian cats, we built a pedigree of 13 cats segregating for the trait, with a founder from southern China, and performed linkage mapping based on whole genome sequencing data from the pedigree. The short-tailed trait was mapped to a 5.6 Mb region of Chr E1, within which the substitution c.5T>C in the somite segmentation-related gene HES7 was identified as the causal mutation, resulting in a missense change (p.V2A). Validation in 245 unrelated cats confirmed the association of HES7 c.5T>C with both Chinese short-tailed feral cats and the Japanese Bobtail breed, indicating a common genetic basis for the two. In addition, some of our sampled kinked-tailed cats could not be explained by either HES7 or the Manx-related T-box, suggesting at least three independent events in the evolution of domestic cats giving rise to short-tailed traits.

  2. Genome-wide analysis of regulatory proteases sequences identified through bioinformatics data mining in Taenia solium.

    Science.gov (United States)

    Yan, Hong-Bin; Lou, Zhong-Zi; Li, Li; Brindley, Paul J; Zheng, Yadong; Luo, Xuenong; Hou, Junling; Guo, Aijiang; Jia, Wan-Zhong; Cai, Xuepeng

    2014-06-04

    Cysticercosis remains a major neglected tropical disease of humanity in many regions, especially in sub-Saharan Africa, Central America and elsewhere. Owing to emerging drug resistance and the inability of current drugs to prevent re-infection, identification of novel vaccines and chemotherapeutic agents against Taenia solium and related helminth pathogens is a public health priority. The T. solium genome and the predicted proteome were reported recently, providing a wealth of information from which new interventional targets might be identified. In order to characterize and classify the entire repertoire of protease-encoding genes of T. solium, which play fundamental roles in all life processes, we analyzed the predicted proteins of this cestode through a combination of bioinformatics tools. Functional annotation was performed to yield insights into the signaling processes relevant to the complex developmental cycle of this tapeworm and to highlight a suite of the proteases as potential intervention targets. Within the genome of this helminth parasite, we identified 200 open reading frames encoding proteases from five clans, which correspond to 1.68% of the 11,902 protein-encoding genes predicted to be present in its genome. These proteases include calpains, cytosolic, mitochondrial signal peptidases, ubiquitylation-related proteins, and others. Many not only show significant similarity to proteases in the Conserved Domain Database but also have conserved active sites and catalytic domains. KEGG Automatic Annotation Server (KAAS) analysis indicated that ~60% of these proteases share strong sequence identities with proteins of the KEGG database, which are involved in human disease, metabolic pathways, genetic information processes, cellular processes, environmental information processes and organismal systems. Also, we identified signal peptides and transmembrane helices through comparative analysis with classes of important regulatory proteases

  3. An effective approach for annotation of protein families with low sequence similarity and conserved motifs: identifying GDSL hydrolases across the plant kingdom.

    Science.gov (United States)

    Vujaklija, Ivan; Bielen, Ana; Paradžik, Tina; Biđin, Siniša; Goldstein, Pavle; Vujaklija, Dušica

    2016-02-18

    The massive accumulation of protein sequences arising from the rapid development of high-throughput sequencing, coupled with automatic annotation, results in high levels of incorrect annotations. In this study, we describe an approach to decrease annotation errors in protein families characterized by low overall sequence similarity. The GDSL lipolytic family comprises proteins with multifunctional properties and high potential for pharmaceutical and industrial applications. The number of proteins assigned to this family has increased rapidly over the last few years. In particular, the natural abundance of GDSL enzymes reported recently in plants indicates that plants could be a good source of novel GDSL enzymes. We noticed that a significant proportion of annotated sequences lack specific GDSL motif(s) or catalytic residue(s). Here, we applied motif-based sequence analyses to identify enzymes possessing conserved GDSL motifs in selected proteomes across the plant kingdom. Motif-based HMM scanning (Viterbi decoding, VD, and posterior decoding, PD) and the PD/VD protocol described here were successfully applied to 12 selected plant proteomes to identify sequences with GDSL motifs. A significant number of the identified GDSL sequences were novel. Moreover, our scanning approach successfully detected protein sequences lacking at least one of the essential motifs (171/820) annotated by Pfam profile search (PfamA) as GDSL. Based on these analyses, we provide a curated list of GDSL enzymes from the selected plants. CLANS clustering and phylogenetic analysis helped us gain better insight into the evolutionary relationships of all identified GDSL sequences. Three novel GDSL subfamilies as well as unreported variations in GDSL motifs were discovered in this study. In addition, analyses of selected proteomes showed a remarkable expansion of GDSL enzymes in the lycophyte Selaginella moellendorffii. 
Finally, we provide a general motif-HMM scanner which is easily accessible through
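
    Viterbi decoding, one of the two decoding modes mentioned above, finds the single most probable hidden-state path through an HMM by dynamic programming; posterior decoding instead picks, at each position, the state with the highest marginal probability (computed with the forward-backward algorithm). The minimal Python sketch below shows only generic Viterbi decoding on a toy two-state model (background vs. motif) whose states, transition and emission probabilities are all made up; it does not reproduce the authors' motif-HMMs or their PD/VD protocol.

```python
import math

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"


def log(p):
    return math.log(p) if p > 0 else float("-inf")


def viterbi(seq, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path for `seq`, computed in log space."""
    scores = [{s: log(start_p[s]) + log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(seq)):
        scores.append({})
        back.append({})
        for s in states:
            # Best predecessor state r for being in state s at position t.
            prev = max(states, key=lambda r: scores[t - 1][r] + log(trans_p[r][s]))
            scores[t][s] = scores[t - 1][prev] + log(trans_p[prev][s]) + log(emit_p[s][seq[t]])
            back[t][s] = prev
    state = max(states, key=lambda s: scores[-1][s])  # best final state
    path = [state]
    for t in range(len(seq) - 1, 0, -1):  # follow back-pointers to the start
        state = back[t][state]
        path.append(state)
    return path[::-1]


# Toy model: the "motif" state favours G, D, S and L; every other residue is uniform.
STATES = ("bg", "motif")
START = {"bg": 0.99, "motif": 0.01}
TRANS = {"bg": {"bg": 0.95, "motif": 0.05}, "motif": {"bg": 0.2, "motif": 0.8}}
EMIT = {
    "bg": {a: 1 / len(ALPHABET) for a in ALPHABET},
    "motif": {a: (0.2 if a in "GDSL" else 0.2 / 16) for a in ALPHABET},
}

if __name__ == "__main__":
    # Expected: 4 x "bg", then 4 x "motif" over GDSL, then 4 x "bg".
    print(viterbi("AAAAGDSLAAAA", STATES, START, TRANS, EMIT))
```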

  4. Whole exome sequencing identifies mutations in Usher syndrome genes in profoundly deaf Tunisian patients.

    Science.gov (United States)

    Riahi, Zied; Bonnet, Crystel; Zainine, Rim; Lahbib, Saida; Bouyacoub, Yosra; Bechraoui, Rym; Marrakchi, Jihène; Hardelin, Jean-Pierre; Louha, Malek; Largueche, Leila; Ben Yahia, Salim; Kheirallah, Moncef; Elmatri, Leila; Besbes, Ghazi; Abdelhak, Sonia; Petit, Christine

    2015-01-01

    Usher syndrome (USH) is an autosomal recessive disorder characterized by combined deafness-blindness. It accounts for about 50% of all hereditary deafness-blindness cases. Three clinical subtypes (USH1, USH2, and USH3) are described, of which USH1 is the most severe form, characterized by congenital profound deafness, constant vestibular dysfunction, and a prepubertal onset of retinitis pigmentosa. We performed whole exome sequencing in four unrelated Tunisian patients affected by apparently isolated, congenital profound deafness, with reportedly normal ocular fundus examination. Four biallelic mutations were identified in two USH1 genes: a splice acceptor site mutation, c.2283-1G>T, and a novel missense mutation, c.5434G>A (p.Glu1812Lys), in MYO7A, and two previously unreported mutations in USH1G, i.e. a frameshift mutation, c.1195_1196delAG (p.Leu399Alafs*24), and a nonsense mutation, c.52A>T (p.Lys18*). A further ophthalmological examination, including optical coherence tomography, in fact revealed retinitis pigmentosa in all the patients. Our findings provide evidence that USH is under-diagnosed in Tunisian deaf patients. Early diagnosis of USH is, however, of utmost importance, because these patients should undergo cochlear implant surgery in early childhood, in anticipation of the visual loss.
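
    The protein-level names above follow from the coding-DNA (c.) positions: nucleotide c.N, counted from the A of the ATG start codon, lies in codon (N - 1) // 3 + 1, at position (N - 1) % 3 + 1 within that codon. The minimal Python sketch below checks the exonic changes with the standard genetic code. Because the abstract does not give the reference codons, it tries every codon for the stated reference amino acid that carries the stated reference base; a single outcome means the protein change is fully determined by the c. notation. The intronic splice-site change c.2283-1G>T is outside this model.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def locate(c_pos):
    """Map coding position c.N to (codon number, position 1-3 within that codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitution_effect(c_pos, ref, alt, ref_aa):
    """Codon number and the set of possible new residues for c.{c_pos}{ref}>{alt}."""
    codon_no, pos = locate(c_pos)
    outcomes = set()
    for codon, aa in CODON_TABLE.items():
        if aa == ref_aa and codon[pos - 1] == ref:
            mutated = codon[:pos - 1] + alt + codon[pos:]
            outcomes.add(CODON_TABLE[mutated])
    return codon_no, outcomes


print(substitution_effect(5434, "G", "A", "E"))  # MYO7A c.5434G>A -> (1812, {'K'})
print(substitution_effect(52, "A", "T", "K"))    # USH1G c.52A>T -> (18, {'*'})
print(locate(1195))                              # USH1G c.1195_1196delAG -> (399, 1)
```

    The results agree with the reported p.Glu1812Lys, p.Lys18* and a frameshift starting at Leu399.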

  5. Exome sequencing identifies a novel SMCHD1 mutation in facioscapulohumeral muscular dystrophy 2.

    Science.gov (United States)

    Mitsuhashi, Satomi; Boyden, Steven E; Estrella, Elicia A; Jones, Takako I; Rahimov, Fedik; Yu, Timothy W; Darras, Basil T; Amato, Anthony A; Folkerth, Rebecca D; Jones, Peter L; Kunkel, Louis M; Kang, Peter B

    2013-12-01

    FSHD2 is a rare form of facioscapulohumeral muscular dystrophy (FSHD) characterized by the absence of the contraction in the D4Z4 macrosatellite repeat region on chromosome 4q35 that is the hallmark of FSHD1. However, hypomethylation of this region is common to both subtypes. Recently, mutations in SMCHD1 combined with a permissive 4q35 allele were reported to cause FSHD2. Using whole-exome sequencing and linkage analysis, we identified a novel p.Lys275del SMCHD1 mutation in a family affected with FSHD2. This mutation alters a highly conserved amino acid in the ATPase domain of SMCHD1. Subject III-11 is a male who developed asymmetrical muscle weakness characteristic of FSHD at 13 years of age. Physical examination revealed marked bilateral atrophy of the biceps brachii, bilateral scapular winging, some asymmetrical weakness of the tibialis anterior and peroneal muscles, and mild lower facial weakness. Biopsy of the biceps brachii in subject II-5, the father of III-11, demonstrated lobulated fibers and dystrophic changes. Endomysial and perivascular inflammation was found, which has been reported in FSHD1 but not in FSHD2. Given the previous report of SMCHD1 mutations in FSHD2 and the clinical presentations consistent with the FSHD phenotype, we conclude that the SMCHD1 mutation is the likely cause of the disease in this family. Copyright © 2013 Elsevier B.V. All rights reserved.

  6. Antimicrobial susceptibility among clinical Nocardia species identified by multilocus sequence analysis.

    Science.gov (United States)

    McTaggart, Lisa R; Doucet, Jennifer; Witkowska, Maria; Richardson, Susan E

    2015-01-01

    Antimicrobial susceptibility patterns of 112 clinical isolates, 28 type strains, and 9 reference strains of Nocardia were determined using the Sensititre Rapmyco microdilution panel (Thermo Fisher, Inc.). Isolates were identified by highly discriminatory multilocus sequence analysis and were chosen to represent the diversity of species recovered from clinical specimens in Ontario, Canada. Susceptibility to the most commonly used drug, trimethoprim-sulfamethoxazole, was observed in 97% of isolates. Linezolid and amikacin were also highly effective: 100% and 99% of all isolates, respectively, demonstrated a susceptible phenotype. For the remaining antimicrobials, resistance was species specific, with isolates of Nocardia otitidiscaviarum, N. brasiliensis, N. abscessus complex, N. nova complex, N. transvalensis complex, N. farcinica, and N. cyriacigeorgica displaying the traditional characteristic drug pattern types. In addition, the antimicrobial susceptibility profiles of a variety of rarely encountered species isolated from clinical specimens are reported for the first time and were categorized into four additional drug pattern types. Finally, MICs for the control strains N. nova ATCC BAA-2227, N. asteroides ATCC 19247(T), and N. farcinica ATCC 23826 were robustly determined to demonstrate method reproducibility and suitability of the commercial Sensititre Rapmyco panel for antimicrobial susceptibility testing of Nocardia spp. isolated from clinical specimens. The reported values will facilitate quality control and standardization among laboratories. Copyright © 2015, American Society for Microbiology. All Rights Reserved.

  7. Whole exome sequencing identifies mutations in Usher syndrome genes in profoundly deaf Tunisian patients.

    Directory of Open Access Journals (Sweden)

    Zied Riahi

    Full Text Available Usher syndrome (USH) is an autosomal recessive disorder characterized by combined deafness-blindness. It accounts for about 50% of all hereditary deafness-blindness cases. Three clinical subtypes (USH1, USH2, and USH3) are described, of which USH1 is the most severe form, characterized by congenital profound deafness, constant vestibular dysfunction, and a prepubertal onset of retinitis pigmentosa. We performed whole exome sequencing in four unrelated Tunisian patients affected by apparently isolated, congenital profound deafness, with reportedly normal ocular fundus examination. Four biallelic mutations were identified in two USH1 genes: a splice acceptor site mutation, c.2283-1G>T, and a novel missense mutation, c.5434G>A (p.Glu1812Lys), in MYO7A, and two previously unreported mutations in USH1G, i.e. a frameshift mutation, c.1195_1196delAG (p.Leu399Alafs*24), and a nonsense mutation, c.52A>T (p.Lys18*). A further ophthalmological examination, including optical coherence tomography, in fact revealed retinitis pigmentosa in all the patients. Our findings provide evidence that USH is under-diagnosed in Tunisian deaf patients. Early diagnosis of USH is, however, of utmost importance, because these patients should undergo cochlear implant surgery in early childhood, in anticipation of the visual loss.

  8. Whole-genome sequencing identifies genomic heterogeneity at a nucleotide and chromosomal level in bladder cancer

    Science.gov (United States)

    Morrison, Carl D.; Liu, Pengyuan; Woloszynska-Read, Anna; Zhang, Jianmin; Luo, Wei; Qin, Maochun; Bshara, Wiam; Conroy, Jeffrey M.; Sabatini, Linda; Vedell, Peter; Xiong, Donghai; Liu, Song; Wang, Jianmin; Shen, He; Li, Yinwei; Omilian, Angela R.; Hill, Annette; Head, Karen; Guru, Khurshid; Kunnev, Dimiter; Leach, Robert; Eng, Kevin H.; Darlak, Christopher; Hoeflich, Christopher; Veeranki, Srividya; Glenn, Sean; You, Ming; Pruitt, Steven C.; Johnson, Candace S.; Trump, Donald L.

    2014-01-01

    Using complete genome analysis, we sequenced five bladder tumors accrued from patients with muscle-invasive transitional cell carcinoma of the urinary bladder (TCC-UB) and identified a spectrum of genomic aberrations. In three tumors, complex genotype changes were noted. All three had tumor protein p53 mutations and a relatively large number of single-nucleotide variants (SNVs; average of 11.2 per megabase), structural variants (SVs; average of 46), or both. This group was best characterized by chromothripsis and the presence of subclonal populations of neoplastic cells or intratumoral mutational heterogeneity. Here, we provide evidence that the process of chromothripsis in TCC-UB is mediated by nonhomologous end-joining using kilobase, rather than megabase, fragments of DNA, which we refer to as “stitchers,” to repair this process. We postulate that a potential unifying theme among tumors in the more complex genotype group is a defective replication–licensing complex. A second group (two bladder tumors) had no chromothripsis and a simpler genotype: wild-type tumor protein p53, relatively few SNVs (average of 5.9 per megabase), and only a single SV. There was no evidence of a subclonal population of neoplastic cells. In this group, we used a preclinical model of bladder carcinoma cell lines to study a unique SV (translocation and amplification) of the gene glutamate receptor ionotropic N-methyl D-aspartate as a potential new therapeutic target in bladder cancer. PMID:24469795

  9. PACCMIT/PACCMIT-CDS: identifying microRNA targets in 3' UTRs and coding sequences.

    Science.gov (United States)

    Šulc, Miroslav; Marín, Ray M; Robins, Harlan S; Vaníček, Jiří

    2015-07-01

    The purpose of the proposed web server, publicly available at http://paccmit.epfl.ch, is to provide a user-friendly interface to two algorithms for predicting messenger RNA (mRNA) molecules regulated by microRNAs: (i) PACCMIT (Prediction of ACcessible and/or Conserved MIcroRNA Targets), which identifies primarily mRNA transcripts targeted in their 3' untranslated regions (3' UTRs), and (ii) PACCMIT-CDS, designed to find mRNAs targeted within their coding sequences (CDSs). While PACCMIT is among the accurate algorithms for predicting conserved microRNA targets in the 3' UTRs, the main contribution of the web server is twofold: PACCMIT also provides an accurate tool for predicting targets of weakly conserved or non-conserved microRNAs, whereas PACCMIT-CDS addresses the lack of similar portals adapted specifically for targets in CDSs. The web server asks the user for the microRNAs and mRNAs to be analyzed, retrieves the precomputed P-values for all microRNA-mRNA pairs from a database covering all mRNAs and microRNAs in a given species, ranks the predicted microRNA-mRNA pairs, evaluates their significance according to the false discovery rate and finally displays the predictions in tabular form. The results are also available for download in several standard formats. © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research.

  10. PACCMIT/PACCMIT-CDS: identifying microRNA targets in 3′ UTRs and coding sequences

    Science.gov (United States)

    Šulc, Miroslav; Marín, Ray M.; Robins, Harlan S.; Vaníček, Jiří

    2015-01-01

    The purpose of the proposed web server, publicly available at http://paccmit.epfl.ch, is to provide a user-friendly interface to two algorithms for predicting messenger RNA (mRNA) molecules regulated by microRNAs: (i) PACCMIT (Prediction of ACcessible and/or Conserved MIcroRNA Targets), which identifies primarily mRNA transcripts targeted in their 3′ untranslated regions (3′ UTRs), and (ii) PACCMIT-CDS, designed to find mRNAs targeted within their coding sequences (CDSs). While PACCMIT is among the accurate algorithms for predicting conserved microRNA targets in the 3′ UTRs, the main contribution of the web server is twofold: PACCMIT also provides an accurate tool for predicting targets of weakly conserved or non-conserved microRNAs, whereas PACCMIT-CDS addresses the lack of similar portals adapted specifically for targets in CDSs. The web server asks the user for the microRNAs and mRNAs to be analyzed, retrieves the precomputed P-values for all microRNA–mRNA pairs from a database covering all mRNAs and microRNAs in a given species, ranks the predicted microRNA–mRNA pairs, evaluates their significance according to the false discovery rate and finally displays the predictions in tabular form. The results are also available for download in several standard formats. PMID:25948580
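
    The server ranks precomputed P-values and then evaluates significance according to the false discovery rate, but the abstract does not name the FDR procedure. A common choice is the Benjamini-Hochberg step-up procedure; the minimal Python sketch below assumes it (it is not confirmed as PACCMIT's method), and the P-values are invented.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure at FDR level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P-value
    k = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * alpha / m:
            k = rank  # remember the largest rank whose P-value meets its threshold
    return sorted(order[:k])  # every hypothesis up to that rank is rejected


# Hypothetical P-values for four microRNA-mRNA pairs.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.2]))   # -> [0]
print(benjamini_hochberg([0.04, 0.04, 0.04, 0.04]))  # -> [0, 1, 2, 3]
```

    The second call shows the "step-up" behaviour: none of the first three ranks passes its own threshold, but the fourth does (0.04 ≤ 4 × 0.05 / 4), so all four pairs are declared significant.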

  11. Global Transcriptome Sequencing Identifies Chlamydospore Specific Markers in Candida albicans and Candida dubliniensis

    LENUS (Irish Health Repository)

    Palige, Katja

    2013-04-15

    Candida albicans and Candida dubliniensis are pathogenic fungi that are highly related but differ in virulence and in some phenotypic traits. During in vitro growth on certain nutrient-poor media, C. albicans and C. dubliniensis are the only yeast species able to produce chlamydospores, large thick-walled cells of unknown function. Interestingly, only C. dubliniensis forms pseudohyphae with abundant chlamydospores when grown on Staib medium, while C. albicans grows exclusively as a budding yeast. To further our understanding of chlamydospore development and assembly, we compared the global transcriptional profiles of both species during growth in liquid Staib medium by RNA sequencing. We also included in our study a C. albicans mutant that lacks the morphogenetic transcriptional repressor Nrg1. This strain, which is characterized by constitutive pseudohyphal growth, specifically produces masses of chlamydospores in Staib medium, similar to C. dubliniensis. This comparative approach identified a set of putatively chlamydospore-related genes. Two of the homologous C. albicans and C. dubliniensis genes (CSP1 and CSP2) that were most strongly upregulated during chlamydospore development were analysed in more detail. Using green fluorescent protein as a reporter, the encoded putative cell wall-related proteins were found to localize exclusively to C. albicans and C. dubliniensis chlamydospores. Our findings uncover the first chlamydospore-specific markers in Candida species and provide novel insights into the complex morphogenetic development of these important fungal pathogens.

  12. Benchmarking to Identify Practice Variation in Test Ordering: A Potential Tool for Utilization Management.

    Science.gov (United States)

    Signorelli, Heather; Straseski, Joely A; Genzen, Jonathan R; Walker, Brandon S; Jackson, Brian R; Schmidt, Robert L

    2015-01-01

    Appropriate test utilization is usually evaluated by adherence to published guidelines. In many cases, however, medical guidelines are not available. Benchmarking has been proposed as a method to identify practice variations that may represent inappropriate testing. This study investigated the use of benchmarking to identify sites with inappropriate utilization of testing for a particular analyte. We used a Web-based survey to compare 2 measures of vitamin D utilization: overall testing intensity (ratio of total vitamin D orders to blood-count orders) and relative testing intensity (ratio of 1,25(OH)2D to 25(OH)D test orders). A total of 81 facilities contributed data. The average overall testing intensity index was 0.165, or approximately 1 vitamin D test for every 6 blood-count tests. The average relative testing intensity index was 0.055, or approximately one 1,25(OH)2D test for every 18 25(OH)D tests. Both indexes varied considerably. Benchmarking can be used as a screening tool to identify outliers that may be associated with inappropriate test utilization. Copyright© by the American Society for Clinical Pathology (ASCP).

  13. Effect of laying sequence on egg mercury in captive zebra finches: an interpretation considering individual variation.

    Science.gov (United States)

    Ou, Langbo; Varian-Ramos, Claire W; Cristol, Daniel A

    2015-08-01

    Bird eggs are used widely as noninvasive bioindicators for environmental mercury availability. Previous studies, however, have found varying relationships between laying sequence and egg mercury concentrations. Some studies have reported that the mercury concentration was higher in first-laid eggs or declined across the laying sequence, whereas in other studies mercury concentration was not related to egg order. Approximately 300 eggs (61 clutches) were collected from captive zebra finches dosed throughout their reproductive lives with methylmercury (0.3 μg/g, 0.6 μg/g, 1.2 μg/g, or 2.4 μg/g wet wt in diet); the total mercury concentration (mean ± standard deviation [SD] dry wt basis) of their eggs was 7.03 ± 1.38 μg/g, 14.15 ± 2.52 μg/g, 26.85 ± 5.85 μg/g, and 49.76 ± 10.37 μg/g, respectively (equivalent to fresh wt egg mercury concentrations of 1.24 μg/g, 2.50 μg/g, 4.74 μg/g, and 8.79 μg/g). The authors observed a significant decrease in the mercury concentration of successive eggs when compared with the first egg and notable variation between clutches within treatments. The mercury level of individual females within and among treatments did not alter this relationship. Based on the results, sampling of a single egg in each clutch from any position in the laying sequence is sufficient for purposes of population risk assessment, but it is not recommended as a proxy for individual female exposure or as an estimate of average mercury level within the clutch. © 2015 SETAC.
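
    The abstract reports each dose group's egg mercury on both a dry-weight and a fresh-weight basis. The short check below is plain arithmetic on the quoted means, not the authors' procedure. It shows that the four pairs share an essentially constant fresh-to-dry ratio of about 0.177, so either basis can be converted to the other with a single factor.

```python
# Dry-weight egg mercury means (ug/g) and the fresh-weight equivalents quoted in the abstract.
dry = [7.03, 14.15, 26.85, 49.76]
fresh = [1.24, 2.50, 4.74, 8.79]

# Fresh/dry ratio for each dose group; a near-constant value implies one shared conversion factor.
factors = [f / d for f, d in zip(fresh, dry)]
mean_factor = sum(factors) / len(factors)

print([round(x, 4) for x in factors])
print(round(mean_factor, 3))

# Re-deriving the fresh-weight values from the dry-weight means with the single factor.
print([round(d * mean_factor, 2) for d in dry])
```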

  14. Exploring genetic variation in the tomato (Solanum section Lycopersicon) clade by whole-genome sequencing.

    Science.gov (United States)

    Aflitos, Saulo; Schijlen, Elio; de Jong, Hans; de Ridder, Dick; Smit, Sandra; Finkers, Richard; Wang, Jun; Zhang, Gengyun; Li, Ning; Mao, Likai; Bakker, Freek; Dirks, Rob; Breit, Timo; Gravendeel, Barbara; Huits, Henk; Struss, Darush; Swanson-Wagner, Ruth; van Leeuwen, Hans; van Ham, Roeland C H J; Fito, Laia; Guignier, Laëtitia; Sevilla, Myrna; Ellul, Philippe; Ganko, Eric; Kapur, Arvind; Reclus, Emannuel; de Geus, Bernard; van de Geest, Henri; Te Lintel Hekkert, Bas; van Haarst, Jan; Smits, Lars; Koops, Andries; Sanchez-Perez, Gabino; van Heusden, Adriaan W; Visser, Richard; Quan, Zhiwu; Min, Jiumeng; Liao, Li; Wang, Xiaoli; Wang, Guangbiao; Yue, Zhen; Yang, Xinhua; Xu, Na; Schranz, Eric; Smets, Erik; Vos, Rutger; Rauwerda, Johan; Ursem, Remco; Schuit, Cees; Kerns, Mike; van den Berg, Jan; Vriezen, Wim; Janssen, Antoine; Datema, Erwin; Jahrman, Torben; Moquet, Frederic; Bonnet, Julien; Peters, Sander

    2014-10-01

    We explored genetic variation by sequencing a selection of 84 tomato accessions and related wild species representative of the Lycopersicon, Arcanum, Eriopersicon and Neolycopersicon groups, yielding a large amount of valuable data on sequence diversity in the tomato clade. Three new reference genomes were reconstructed to support our comparative genome analyses. Comparative sequence alignment revealed group-, species- and accession-specific polymorphisms, explaining characteristic fruit traits and growth habits in the various cultivars. Using gene models from the annotated Heinz 1706 reference genome, we observed differences in the ratio of non-synonymous to synonymous single-nucleotide polymorphisms (SNPs; dN/dS) in fruit diversification and plant growth genes compared with a random set of genes, indicating positive selection and differences in selection pressure between crop accessions and wild species. In wild species, the number of SNPs exceeds 10 million, i.e. 20-fold higher than in most of the crop accessions, indicating dramatic genetic erosion of crop and heirloom tomatoes. In addition, the highest levels of heterozygosity were found in allogamous self-incompatible wild species, while facultative and autogamous self-compatible species display lower heterozygosity. Using whole-genome SNP information for maximum-likelihood analysis, we achieved complete tree resolution, whereas maximum-likelihood trees based on SNPs from ten fruit and growth genes show incomplete resolution for the crop accessions, partly due to the effect of heterozygous SNPs. Finally, the results suggest that phylogenetic relationships are correlated with habitat, indicating the occurrence of geographical races within these groups, which is of practical importance for Solanum genome evolution studies. © 2014 The Authors. The Plant Journal © 2014 John Wiley & Sons Ltd.

  15. Constitutional sequence variation in the Fanconi anaemia group C (FANCC) gene in childhood acute myeloid leukaemia.

    Science.gov (United States)

    Barber, Lisa M; McGrath, Helen E N; Meyer, Stefan; Will, Andrew M; Birch, Jillian M; Eden, Osborn B; Taylor, G Malcolm

    2003-04-01

    The extent to which genetic susceptibility contributes to the causation of childhood acute myeloid leukaemia (AML) is not known. The inherited bone marrow failure disorder Fanconi anaemia (FA) carries a substantially increased risk of AML, raising the possibility that constitutional variation in the FA (FANC) genes is involved in the aetiology of childhood AML. We have screened genomic DNA extracted from remission blood samples of 97 children with sporadic AML and 91 children with sporadic acute lymphoblastic leukaemia (ALL), together with 104 cord blood DNA samples from newborn children, for variations in the Fanconi anaemia group C (FANCC) gene. We found no evidence of known FANCC pathogenic mutations in children with AML, ALL or in the cord blood samples. However, we detected 12 different FANCC sequence variants, of which five were novel to this study. Among six FANCC variants leading to amino-acid substitutions, one (S26F) was present at a fourfold greater frequency in children with AML than in the cord blood samples (odds ratio: 4.09, P = 0.047; 95% confidence interval 1.08-15.54). Our results thus do not exclude the possibility that this polymorphic variant contributes to the risk of a small proportion of childhood AML.
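
    The S26F result is summarised as an odds ratio with a 95% confidence interval. The sketch below shows one common way to compute both from a 2×2 carrier table, Woolf's log-scale interval. The abstract does not state which interval method was used, although the reported bounds have a geometric mean of about 4.1 (√(1.08 × 15.54)), which is consistent with a log-scale interval. The counts in the example are made up and are not the study's data.

```python
import math
from typing import Tuple


def odds_ratio_woolf(a: int, b: int, c: int, d: int, z: float = 1.96) -> Tuple[float, float, float]:
    """Odds ratio and Woolf (log-scale) confidence interval for a 2x2 table.

    a: cases carrying the variant      b: cases without it
    c: controls carrying the variant   d: controls without it
    All four cells must be non-zero.
    """
    or_ = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts for illustration; these are NOT the study's data.
print(odds_ratio_woolf(10, 20, 5, 40))
```

    With small carrier counts, as here, the interval is wide, which is why an odds ratio near 4 can still come with a lower bound close to 1.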

  16. Novel association strategy with copy number variation for identifying new risk Loci of human diseases.

    Directory of Open Access Journals (Sweden)

    Xianfeng Chen

    2010-08-01

    Copy number variations (CNVs) are important causal genetic variations for human disease; however, the lack of a statistical model has impeded the systematic testing of CNVs associated with disease in large-scale cohorts. Here, we developed a novel integrated strategy to test CNV association in genome-wide case-control studies. We converted the single-nucleotide polymorphism (SNP) signal to copy number states using a well-trained hidden Markov model. We mapped the susceptibility CNV loci through SNP site-specific testing to cope with the physiological complexity of CNVs. We also ensured the credibility of the associated CNVs through further window-based CNV-pattern clustering. Genome-wide data for seven diseases were used to test our strategy, and in total we identified 36 new susceptibility loci associated with CNVs for the seven diseases: 5 with bipolar disorder, 4 with coronary artery disease, 1 with Crohn's disease, 7 with hypertension, 9 with rheumatoid arthritis, 7 with type 1 diabetes and 3 with type 2 diabetes. Fifteen of these loci were validated through genotype association and physiological function reported in previous studies, which provides further confidence in our results. Notably, the genes associated with bipolar disorder converged on phosphoinositide/calcium signaling, a pathway well known to be affected in bipolar disorder, which further supports an impact of CNVs on bipolar disorder. Our results demonstrate the effectiveness and robustness of our CNV-association analysis and provide an alternative avenue for discovering new disease-associated loci.
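
    The abstract converts the SNP signal into copy-number states with a trained hidden Markov model but gives no model details. The sketch below is a generic Viterbi decoder for a three-state copy-number HMM. It assumes that the per-SNP signal is a log R ratio with Gaussian emissions. The states, means, standard deviation and transition probability are hypothetical, untrained values, not the authors' model.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical model: three copy-number states with made-up (untrained) parameters.
STATES = [1, 2, 3]                                     # 1 = deletion, 2 = normal, 3 = duplication
MEANS: Dict[int, float] = {1: -0.45, 2: 0.0, 3: 0.3}   # assumed mean log R ratio per state
SD = 0.15                                              # assumed shared emission standard deviation
STAY = 0.99                                            # assumed probability of keeping the same state


def log_gauss(x: float, mu: float, sd: float) -> float:
    """Log density of a normal distribution."""
    return -0.5 * ((x - mu) / sd) ** 2 - math.log(sd * math.sqrt(2 * math.pi))


def log_trans(prev: int, cur: int) -> float:
    """Log transition probability: stay with STAY, otherwise split evenly among other states."""
    if prev == cur:
        return math.log(STAY)
    return math.log((1 - STAY) / (len(STATES) - 1))


def viterbi(signal: Sequence[float]) -> List[int]:
    """Most likely copy-number state at each SNP for a non-empty per-SNP signal."""
    # Uniform start distribution.
    score = {s: -math.log(len(STATES)) + log_gauss(signal[0], MEANS[s], SD) for s in STATES}
    backpointers = []
    for x in signal[1:]:
        new_score, pointers = {}, {}
        for s in STATES:
            best = max(STATES, key=lambda p: score[p] + log_trans(p, s))
            new_score[s] = score[best] + log_trans(best, s) + log_gauss(x, MEANS[s], SD)
            pointers[s] = best
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]


signal = [0.0, 0.02, -0.01, -0.45, -0.47, -0.43, -0.46, 0.01, 0.0, -0.02]
print(viterbi(signal))
```

    The high self-transition probability is what makes the decoder favour contiguous CNV segments over isolated single-SNP calls.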

  17. Novel sequence variations in LAMA2 and SGCG genes modulating cis-acting regulatory elements and RNA secondary structure

    Directory of Open Access Journals (Sweden)

    Olfa Siala

    2010-01-01

    In this study, we detected new sequence variations in the LAMA2 and SGCG genes in 5 ethnic populations and analysed their effects on enhancer composition and mRNA structure. PCR amplification and DNA sequencing were performed, followed by bioinformatics analyses using ESEfinder and MFOLD software. We found 3 novel sequence variations in the LAMA2 (c.3174+22_23insAT and c.6085+12delA) and SGCG (c.*102A/C) genes. These variations were present in 210 tested healthy controls from Tunisian, Moroccan, Algerian, Lebanese and French populations, suggesting that they represent novel polymorphisms within the LAMA2 and SGCG gene sequences. ESEfinder showed that the c.*102A/C substitution created a new exonic splicing enhancer in the 3'UTR of the SGCG gene, whereas the c.6085+12delA deletion was situated in the base-pairing region between LAMA2 mRNA and the U1 snRNA spliceosomal component. The RNA structure analyses showed that both variations modulated RNA secondary structure. Our results are suggestive of correlations between mRNA folding and the recruitment of spliceosomal components mediating splicing, including SR proteins. Understanding the contribution of common sequence variations to mRNA structural and functional diversity will improve the study of gene expression.

  18. Spatial and Temporal Stress Drop Variations of the 2011 Tohoku Earthquake Sequence

    Science.gov (United States)

    Miyake, H.

    2013-12-01

    The 2011 Tohoku earthquake sequence consists of foreshocks, the mainshock, aftershocks, and repeating earthquakes. Quantifying spatial and temporal stress drop variations is important for understanding M9-class megathrust earthquakes. The variability and the spatial and temporal patterns of stress drop are basic information for rupture dynamics and are also useful for source modeling. As pointed out in the ground motion prediction equations of Campbell and Bozorgnia [2008, Earthquake Spectra], mainshock-aftershock pairs often show a significant decrease in stress drop. Here we focus on strong-motion records before and after the Tohoku earthquake and analyze source spectral ratios considering azimuth and distance dependency [Miyake et al., 2001, GRL]. Because the stations are located only on land, spatial and temporal stress drop variations are estimated by adjusting shifts from the omega-squared source spectral model. The adjustment is based on stochastic Green's function simulations of source spectra that consider azimuth and distance dependency. Because we assume the same Green's functions for both events of a pair at each station, the propagation-path and site-amplification effects cancel out. Although precise studies of spatial and temporal stress drop variations have been performed [e.g., Allmann and Shearer, 2007, JGR], this study targets the relation of stress drop to the progression of slow slip prior to the Tohoku earthquake reported by Kato et al. [2012, Science] and to plate structures. Acknowledgement: This study is partly supported by ERI Joint Research (2013-B-05). We used the JMA unified earthquake catalogue and K-NET, KiK-net, and F-net data provided by NIED.
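
    For readers unfamiliar with the spectral-ratio approach, the standard omega-squared (Brune-type) source model and the resulting ratio for an event pair i, j recorded at the same station can be written as follows. This is the textbook form, not necessarily the exact parameterization used in the study. Here U denotes the observed displacement spectrum, k is a model-dependent constant (about 0.37 for Brune's model and about 0.21 for Madariaga's S-wave estimate), and β is the shear-wave speed near the source.

```latex
\begin{aligned}
\Omega(f) &= \frac{\Omega_0}{1 + (f/f_c)^2}, \qquad \Omega_0 \propto M_0, \\
\frac{U_i(f)}{U_j(f)} &= \frac{M_{0,i}}{M_{0,j}}\,\frac{1 + (f/f_{c,j})^2}{1 + (f/f_{c,i})^2}
  \qquad \text{(shared path and site terms cancel)}, \\
\Delta\sigma &= \frac{7}{16}\,\frac{M_0}{r^3}, \qquad r = \frac{k\,\beta}{f_c}
  \;\Longrightarrow\; \Delta\sigma = \frac{7}{16}\,M_0\left(\frac{f_c}{k\,\beta}\right)^{3}.
\end{aligned}
```

    At low frequency the ratio tends to the moment ratio, and at high frequency it tends to (M_{0,i}/M_{0,j})(f_{c,i}/f_{c,j})^2. A stress drop contrast between the two events therefore appears as a corner-frequency shift in the ratio.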

  19. Assessment of genetic variation for the LINE-1 retrotransposon from next generation sequence data

    Directory of Open Access Journals (Sweden)

    Ramos Kenneth

    2010-10-01

    BACKGROUND: In humans, copies of the Long Interspersed Nuclear Element 1 (LINE-1) retrotransposon comprise 21% of the reference genome and have been shown to modulate expression and produce novel splice isoforms of transcripts from genes that span or neighbor the LINE-1 insertion site. RESULTS: In this work, newly released pilot data from the 1000 Genomes Project are analyzed to detect previously unreported full-length insertions of the LINE-1 retrotransposon. By direct analysis of the sequence data, we identified 22 previously unreported LINE-1 insertion sites within the sequence data reported for a mother/father/daughter trio. CONCLUSIONS: We demonstrate that next-generation sequencing data, as well as emerging high-quality datasets from individual genome projects, allow us to assess the heterogeneity of the LINE-1 retrotransposon among humans and provide a wealth of testable hypotheses about the impact this diversity may have on the health of individuals and populations.

  20. SEQATOMS: a web tool for identifying missing regions in PDB in sequence context

    NARCIS (Netherlands)

    Brandt, B.W.; Heringa, J.; Leunissen, J.A.M.

    2008-01-01

    With over 46 000 proteins, the Protein Data Bank (PDB) is the most important database of structural information on biological macromolecules. PDB files contain sequence and coordinate information. Residues present in the sequence can be absent from the coordinate section, which means their
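
    The core comparison, sequence positions versus the residues that actually have coordinates, can be sketched as below. This is not SEQATOMS's implementation. It assumes that residues are already numbered 1..N consistently with the sequence, whereas real PDB entries need a mapping between sequence and coordinate numbering (offsets, insertion codes) that this sketch omits. The function names, the lowercase display convention, the sequence and the positions are all hypothetical.

```python
from typing import List, Set, Tuple


def missing_regions(seq_length: int, observed: Set[int]) -> List[Tuple[int, int]]:
    """1-based inclusive runs of sequence positions that have no coordinates."""
    regions: List[Tuple[int, int]] = []
    start = None
    for pos in range(1, seq_length + 1):
        if pos not in observed:
            if start is None:
                start = pos  # a gap opens here
        elif start is not None:
            regions.append((start, pos - 1))  # the gap closed at the previous position
            start = None
    if start is not None:
        regions.append((start, seq_length))  # the gap runs to the C-terminus
    return regions


def in_context(sequence: str, observed: Set[int]) -> str:
    """Show the sequence with residues lacking coordinates in lower case (hypothetical display)."""
    return "".join(aa if i in observed else aa.lower() for i, aa in enumerate(sequence, start=1))


seq = "MKTAYIAKQR"        # made-up sequence
present = {3, 4, 5, 8}   # made-up positions that have coordinate records
print(missing_regions(len(seq), present))
print(in_context(seq, present))
```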

  1. Whole genome sequencing resource identifies 18 new candidate genes for autism spectrum disorder

    NARCIS (Netherlands)

    Yuen, Ryan K C; Merico, Daniele; Bookman, Matt; Howe, Jennifer L.; Thiruvahindrapuram, Bhooma; Patel, Rohan V.; Whitney, Joe; Deflaux, Nicole; Bingham, Jonathan; Wang, Zhuozhi; Pellecchia, Giovanna; Buchanan, Janet A.; Walker, Susan; Marshall, Christian R.; Uddin, Mohammed; Zarrei, Mehdi; Deneault, Eric; D'Abate, Lia; Chan, Ada J S; Koyanagi, Stephanie; Paton, Tara; Pereira, Sergio L.; Hoang, Ny; Engchuan, Worrawat; Higginbotham, Edward J.; Ho, Karen; Lamoureux, Sylvia; Li, Weili; MacDonald, Jeffrey R.; Nalpathamkalam, Thomas; Sung, Wilson W L; Tsoi, Fiona J.; Wei, John; Xu, Lizhen; Tasse, Anne Marie; Kirby, Emily; Van Etten, William; Twigger, Simon; Roberts, Wendy; Drmic, Irene; Jilderda, Sanne; Modi, Bonnie Mackinnon; Kellam, Barbara; Szego, Michael; Cytrynbaum, Cheryl; Weksberg, Rosanna; Zwaigenbaum, Lonnie; Woodbury-Smith, Marc; Brian, Jessica; Senman, Lili; Iaboni, Alana; Doyle-Thomas, Krissy; Thompson, Ann; Chrysler, Christina; Leef, Jonathan; Savion-Lemieux, Tal; Smith, Isabel M.; Liu, Xudong; Nicolson, Rob; Seifer, Vicki; Fedele, Angie; Cook, Edwin H.; Dager, Stephen; Estes, Annette; Gallagher, Louise; Malow, Beth A.; Parr, Jeremy R.; Spence, Sarah J.; Vorstman, Jacob; Frey, Brendan J.; Robinson, James T.; Strug, Lisa J.; Fernandez, Bridget A.; Elsabbagh, Mayada; Carter, Melissa T.; Hallmayer, Joachim; Knoppers, Bartha M.; Anagnostou, Evdokia; Szatmari, Peter; Ring, Robert H.; Glazer, David; Pletcher, Mathew T.; Scherer, Stephen W.

    2017-01-01

    We are performing whole-genome sequencing of families with autism spectrum disorder (ASD) to build a resource (MSSNG) for subcategorizing the phenotypes and underlying genetic factors involved. Here we report sequencing of 5,205 samples from families with ASD, accompanied by clinical information,

  2. Exome sequencing in amyotrophic lateral sclerosis identifies risk genes and pathways

    NARCIS (Netherlands)

    Cirulli, Elizabeth T.; Lasseigne, Brittany N.; Petrovski, Slavé; Sapp, Peter C.; Dion, Patrick A.; Leblond, Claire S.; Couthouis, Julien; Lu, Yi-Fan; Wang, Quanli; Krueger, Brian J.; Ren, Zhong; Keebler, Jonathan; Han, Yujun; Levy, Shawn E.; Boone, Braden E.; Wimbish, Jack R.; Waite, Lindsay L.; Jones, Angela L.; Carulli, John P.; Day-Williams, Aaron G.; Staropoli, John F.; Xin, Winnie W.; Chesi, Alessandra; Raphael, Alya R.; McKenna-Yasek, Diane; Cady, Janet; de Jong, J. M. B. Vianney; Kenna, Kevin P.; Smith, Bradley N.; Topp, Simon; Miller, Jack; Gkazi, Athina; Al-Chalabi, Ammar; van den Berg, Leonard H.; Veldink, Jan; Silani, Vincenzo; Ticozzi, Nicola; Shaw, Christopher E.; Baloh, Robert H.; Appel, Stanley; Simpson, Ericka; Lagier-Tourenne, Clotilde; Pulst, Stefan M.; Gibson, Summer; Trojanowski, John Q.; Elman, Lauren; McCluskey, Leo; Grossman, Murray; Baas, Frank; ten Asbroek, Anneloor L. M. A.

    2015-01-01

    Amyotrophic lateral sclerosis (ALS) is a devastating neurological disease with no effective treatment. We report the results of a moderate-scale sequencing study aimed at increasing the number of genes known to contribute to predisposition for ALS. We performed whole-exome sequencing of 2869 ALS

  3. Identifying Students' Conceptions of Basic Principles in Sequence Stratigraphy

    Science.gov (United States)

    Herrera, Juan S.; Riggs, Eric M.

    2013-01-01

    Sequence stratigraphy is a major research subject in geoscience academia and the oil industry. However, the geoscience education literature addressing students' understanding of the basic concepts of sequence stratigraphy is relatively thin, and the topic has not been well explored. We conducted an assessment of 27 students' conceptions of…

  4. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections.

    Directory of Open Access Journals (Sweden)

    Pimlapas Leekitcharoenphon

    Twenty-six Salmonella enterica serovar Eko isolates from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed, and camels and cattle were identified as the primary reservoirs and the most likely source of the human infections.

  5. Investigating Salmonella Eko from Various Sources in Nigeria by Whole Genome Sequencing to Identify the Source of Human Infections

    DEFF Research Database (Denmark)

    Leekitcharoenphon, Pimlapas; Raufu, Ibrahim; Thorup Nielsen, Mette

    2016-01-01

    Twenty-six Salmonella enterica serovar Eko isolates from various sources in Nigeria were investigated by whole genome sequencing to identify the source of human infections. Diversity among the isolates was observed, and camels and cattle were identified as the primary reservoirs and the most likely...

  6. Exome sequencing of Pakistani consanguineous families identifies 30 novel candidate genes for recessive intellectual disability.

    Science.gov (United States)

    Riazuddin, S; Hussain, M; Razzaq, A; Iqbal, Z; Shahzad, M; Polla, D L; Song, Y; van Beusekom, E; Khan, A A; Tomas-Roca, L; Rashid, M; Zahoor, M Y; Wissink-Lindhout, W M; Basra, M A R; Ansar, M; Agha, Z; van Heeswijk, K; Rasheed, F; Van de Vorst, M; Veltman, J A; Gilissen, C; Akram, J; Kleefstra, T; Assir, M Z; Grozeva, D; Carss, K; Raymond, F L; O'Connor, T D; Riazuddin, S A; Khan, S N; Ahmed, Z M; de Brouwer, A P M; van Bokhoven, H; Riazuddin, S

    2017-11-01

    Intellectual disability (ID) is a clinically and genetically heterogeneous disorder, affecting 1-3% of the general population. Although research into the genetic causes of ID has recently gained momentum, identification of pathogenic mutations that cause autosomal recessive ID (ARID) has lagged behind, predominantly owing to the unavailability of sizeable families. Here we present the results of exome sequencing in 121 large consanguineous Pakistani ID families. In 60 families, we identified homozygous or compound heterozygous DNA variants in a single gene, 30 affecting reported ID genes and 30 affecting novel candidate ID genes. Potential pathogenicity of these alleles was supported by co-segregation with the phenotype, low frequency in control populations and the application of stringent bioinformatics analyses. In another eight families, segregation of multiple pathogenic variants was observed, affecting 19 genes that are either known ID genes or novel ID candidates. Transcriptome profiles of normal human brain tissues showed that the novel candidate ID genes formed a network significantly enriched for transcriptional co-expression (P<0.0001) in the frontal cortex during fetal development and in the temporal-parietal and sub-cortex during infancy through adulthood. In addition, proteins encoded by 12 novel ID genes directly interact with previously reported ID proteins in six known pathways essential for cognitive function (P<0.0001). These results suggest that disruptions of temporal-parietal and sub-cortical neurogenesis during infancy are critical to the pathophysiology of ID. These findings further expand the existing repertoire of genes involved in ARID and provide new insights into the molecular mechanisms and the transcriptome map of ID.

  7. A bioinformatics approach for identifying transgene insertion sites using whole genome sequencing data.

    Science.gov (United States)

    Park, Doori; Park, Su-Hyun; Ban, Yong Wook; Kim, Youn Shic; Park, Kyoung-Cheul; Kim, Nam-Soo; Kim, Ju-Kon; Choi, Ik-Young

    2017-08-15

    Genetically modified crops (GM crops) have been developed to improve the agricultural traits of modern crop cultivars. Safety assessments of GM crops are of paramount importance in research at developmental stages and before releasing transgenic plants into the marketplace. Sequencing technology is developing rapidly, with higher output and labor efficiency, and will eventually replace existing methods for the molecular characterization of genetically modified organisms. Herein, we present a next-generation sequencing (NGS)-based molecular characterization method, using the transgenic rice plants SNU-Bt9-5, SNU-Bt9-30, and SNU-Bt9-109. To detect the transgene insertion locations in the three GM rice genomes, Illumina sequencing reads are mapped to, and classified against, the rice genome and the plasmid sequence. Reads mapping to both are then used to characterize the junction sites between plant and transgene sequences by sequence alignment. Specifically, using bioinformatics tools, we detected the precise insertion locations and copy numbers of transfer DNA, genetic rearrangements, and the absence of backbone sequences, which were equivalent to results obtained from Southern blot analyses. NGS methods have been suggested as an effective means of characterizing and detecting transgene insertion locations in genomes. Our results demonstrate that combining NGS technology with bioinformatics approaches offers a cost- and time-effective method for assessing the safety of transgenic plants.
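
    The read-classification step can be sketched roughly as follows. The sketch assumes that each read pair has already been aligned against a combined rice-plus-plasmid reference and summarised by the reference each mate hit. Pairs with one mate on rice and the other on the plasmid are junction candidates, and clustering their rice-side positions points to insertion sites. The record layout, labels, window size, chromosome name and coordinates are hypothetical; the authors' actual tools and thresholds are not described in the abstract.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass
class ReadPair:
    """A read pair summarised by where each mate aligned (hypothetical, simplified record)."""
    name: str
    mate1_ref: Optional[str]  # "rice", "plasmid", or None if unmapped
    mate2_ref: Optional[str]
    host_locus: Optional[Tuple[str, int]] = None  # (chromosome, position) of the mate on rice


def classify(pair: ReadPair, host: str = "rice", vector: str = "plasmid") -> str:
    refs = {pair.mate1_ref, pair.mate2_ref}
    if refs == {host}:
        return "host_only"
    if refs == {vector}:
        return "vector_only"
    if refs == {host, vector}:
        return "junction"  # one mate in the plant genome, the other in the construct
    return "other"  # unmapped or partially mapped pairs


def candidate_sites(pairs: Iterable[ReadPair], window: int = 1000) -> Dict[Tuple[str, int], int]:
    """Count junction pairs per host window; dense windows suggest an insertion site."""
    counts: Counter = Counter()
    for pair in pairs:
        if classify(pair) == "junction" and pair.host_locus is not None:
            chrom, pos = pair.host_locus
            counts[(chrom, pos // window * window)] += 1
    return dict(counts)


pairs = [
    ReadPair("r1", "rice", "plasmid", ("Chr3", 1_204_310)),
    ReadPair("r2", "plasmid", "rice", ("Chr3", 1_204_655)),
    ReadPair("r3", "rice", "rice"),
    ReadPair("r4", "plasmid", "plasmid"),
    ReadPair("r5", "rice", None),
]
print([classify(p) for p in pairs])
print(candidate_sites(pairs))
```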

  8. The complete genomic sequence of a tentative new polerovirus identified in barley in South Korea.

    Science.gov (United States)

    Zhao, Fumei; Lim, Seungmo; Yoo, Ran Hee; Igori, Davaajargal; Kim, Sang-Min; Kwak, Do Yeon; Kim, Sun Lim; Lee, Bong Choon; Moon, Jae Sun

    2016-07-01

    The complete nucleotide sequence of a new barley polerovirus, tentatively named barley virus G (BVG), which was isolated in Gimje, South Korea, has been determined using an RNA sequencing technique combined with polymerase chain reaction methods. The viral genomic RNA of BVG is 5,620 nucleotides long and contains six typical open reading frames commonly observed in other poleroviruses. Sequence comparisons revealed that BVG is most closely related to maize yellow dwarf virus-RMV, with the highest amino acid identities being less than 90% for all of the corresponding proteins. These results suggest that BVG is a member of a new species in the genus Polerovirus.
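
    The species argument rests on amino acid identities below 90% for every BVG protein compared with its closest relative. A minimal sketch of the identity calculation on an existing pairwise alignment is shown below. The alignment step itself is assumed, gap handling follows only one of several conventions, and the sequences are invented.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identical residues over alignment columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = matches = 0
    for x, y in zip(aligned_a, aligned_b):
        if x == gap or y == gap:
            continue  # gapped columns excluded; other conventions count them
        compared += 1
        matches += x == y
    return 100.0 * matches / compared if compared else 0.0


# Made-up aligned protein fragments, for illustration only.
identity = percent_identity("MKT-LLV", "MKSALLV")
print(round(identity, 1), identity < 90.0)
```

    Applied protein by protein, the reading in the abstract corresponds to every such value falling below 90.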

  9. Sequencing of sporadic Attention-Deficit Hyperactivity Disorder (ADHD) identifies novel and potentially pathogenic de novo variants and excludes overlap with genes associated with autism spectrum disorder.

    Science.gov (United States)

    Kim, Daniel Seung; Burt, Amber A; Ranchalis, Jane E; Wilmot, Beth; Smith, Joshua D; Patterson, Karynne E; Coe, Bradley P; Li, Yatong K; Bamshad, Michael J; Nikolas, Molly; Eichler, Evan E; Swanson, James M; Nigg, Joel T; Nickerson, Deborah A; Jarvik, Gail P

    2017-06-01

    Attention-Deficit Hyperactivity Disorder (ADHD) has high heritability; however, studies of common variation account for only a limited share of ADHD variance. Using data from affected participants without a family history of ADHD, we sought to identify de novo variants that could account for sporadic ADHD. Considering a total of 128 families, two analyses were conducted in parallel. First, we completed exome sequencing in 11 unaffected parent/affected proband trios (or quads with the addition of an unaffected sibling). Six de novo missense variants at highly conserved bases were identified and validated in four of the 11 families, in the brain-expressed genes TBC1D9, DAGLA, QARS, CSMD2, TRPM2, and WDR83. Separately, in 117 unrelated probands with sporadic ADHD, we sequenced a panel of 26 genes implicated in intellectual disability (ID) and autism spectrum disorder (ASD) to evaluate whether variation in ASD/ID-associated genes was also present in participants with ADHD. Only one putative deleterious variant (Gln600STOP), in CHD1L, was identified, in a single proband. Notably, no other nonsense, splice, frameshift, or highly conserved missense variants in the 26-gene panel were identified and validated. These data suggest that de novo variant analysis in families with an independently adjudicated diagnosis of sporadic ADHD can identify novel genes implicated in ADHD pathogenesis. Moreover, the finding that only one of the 128 cases (0.8%; 11 exome-sequenced and 117 MIP-sequenced participants) carried a putative deleterious variant in the 26 genes related to ID and ASD suggests significant independence in the genetic pathogenesis of ADHD as compared with the ASD and ID phenotypes. © 2017 Wiley Periodicals, Inc.

  10. Use of geometric morphometrics to identify ecophenotypic variation of juvenile Persian sturgeon Acipenser persicus

    Directory of Open Access Journals (Sweden)

    Shima Bakhshalizadeh

    2017-06-01

    Study of phenotypic variation is essential for identifying discrete phenotypic stocks. We sampled immature Persian sturgeon from the eastern and western portions of the southern Caspian Sea to test for morphological differences that could predict the ecophenotypic variation of Persian sturgeon. Geometric morphometric methods were used to quantify body shape. The landmark configurations of the fish bodies were scaled, translated and rotated using generalized Procrustes analysis, followed by univariate analysis of variance of the resulting shape coordinates to evaluate potential morphological differences between regions. A principal component analysis was carried out to reduce the number of dimensions with minimal loss of information. Discriminant function analysis was performed to determine the efficacy of body landmarks for discriminating between geographic variants. As a complement to the discriminant analysis, within-group linkage was inferred for dendrogram clusters using Pearson correlation distance and the average linkage method. Principal component analysis revealed that the largest differences were in body size. Most notable were differences between the eastern and western regions in the distance between head landmarks and the dorsal fin: fish from the western region exhibited a longer distance from head landmarks to the dorsal fin than fish from the eastern region. Furthermore, the ventral portion of fish from the western region was longer than that of eastern individuals. These findings show that juvenile Persian sturgeon already possess morphological traits that can be used to discriminate fish from different regions, and that these differences are discernible despite the large volume of artificially inseminated sturgeon larvae released during the past 40 years.
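
    The Procrustes superimposition step (translate, scale, rotate) can be sketched for 2D landmarks as below. This is a generic textbook version, not the authors' software. Landmarks are encoded as complex numbers so that the optimal rotation has a closed form, shapes are scaled to unit centroid size, reflections are not allowed, and the landmark coordinates are invented.

```python
import cmath
from typing import List, Tuple

Shape = List[complex]  # 2D landmarks encoded as complex numbers x + yj


def center_and_scale(shape: Shape) -> Shape:
    """Translate to the centroid and scale to unit centroid size."""
    centroid = sum(shape) / len(shape)
    centered = [z - centroid for z in shape]
    size = sum(abs(z) ** 2 for z in centered) ** 0.5
    return [z / size for z in centered]


def rotate_onto(shape: Shape, target: Shape) -> Shape:
    """Rotate a centred shape to best match a centred target in the least-squares sense."""
    angle = cmath.phase(sum(z.conjugate() * w for z, w in zip(shape, target)))
    return [z * cmath.exp(1j * angle) for z in shape]


def gpa(shapes: List[Shape], iterations: int = 10) -> Tuple[List[Shape], Shape]:
    """Generalized Procrustes analysis: align every shape to an iteratively updated mean."""
    aligned = [center_and_scale(s) for s in shapes]
    mean = aligned[0]
    for _ in range(iterations):
        aligned = [rotate_onto(s, mean) for s in aligned]
        # New consensus: landmark-wise average, rescaled to unit centroid size.
        mean = center_and_scale([sum(pts) / len(aligned) for pts in zip(*aligned)])
    return aligned, mean


# A made-up 4-landmark outline and a copy that is rotated, enlarged 3x and shifted.
base = [0 + 0j, 2 + 0j, 1 + 1.5j, 0.5 + 3j]
moved = [z * 3 * cmath.exp(0.7j) + (5 - 2j) for z in base]
aligned, mean = gpa([base, moved])
print(max(abs(a - b) for a, b in zip(aligned[0], aligned[1])))  # near zero: the shapes coincide
```

    The aligned coordinates (real and imaginary parts) would then feed the ANOVA, PCA and discriminant steps described in the abstract.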

  11. Genomic and transcriptome profiling identified both human and HBV genetic variations and their interactions in Chinese hepatocellular carcinoma

    Directory of Open Access Journals (Sweden)

    Hua Dong

    2015-12-01

    Interaction between HBV and host genome integrations in hepatocellular carcinoma (HCC) development is a complex process, and the mechanism is still unclear. Here we describe in detail the quality control and data mining of aCGH and transcriptome sequencing data from 50 HCC samples from Chinese patients, published by Dong et al. (2015) (GEO#: GSE65486). In addition to the HBV-MLL4 integration discovered, we also investigated the genetic aberrations of HBV and host genes as well as their genetic interactions. We report human genome copy number changes and frequent transcriptome variations (e.g., TP53 and CTNNB1 mutations, and especially MLL family mutations) in this patient cohort. For HBV genotype C, we identified a novel linkage disequilibrium region covering HBV replication regulatory elements, including the basal core promoter, DR1, epsilon and poly-A regions, which is associated with HBV core antigen over-expression and almost exclusive to HBV-MLL4 integration.
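
    Linkage disequilibrium between two variable positions is commonly summarised by r². The sketch below computes it from phased two-site haplotypes, for example the bases observed at two HBV positions in each of several genomes. It assumes biallelic sites and already-phased data, and the alleles are invented. The abstract does not say which LD statistic or software was used.

```python
from typing import List, Tuple


def ld_r2(haplotypes: List[Tuple[str, str]], allele_a: str, allele_b: str) -> float:
    """r^2 between two biallelic sites from phased two-site haplotypes.

    allele_a / allele_b name one allele at site 1 / site 2.
    """
    n = len(haplotypes)
    p_a = sum(h[0] == allele_a for h in haplotypes) / n
    p_b = sum(h[1] == allele_b for h in haplotypes) / n
    p_ab = sum(h[0] == allele_a and h[1] == allele_b for h in haplotypes) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    # r^2 is undefined when either site is monomorphic; this sketch returns 0.0 by convention.
    return d * d / denom if denom > 0 else 0.0


linked = [("A", "G"), ("A", "G"), ("C", "T"), ("C", "T")]
unlinked = [("A", "G"), ("A", "T"), ("C", "G"), ("C", "T")]
print(ld_r2(linked, "A", "G"), ld_r2(unlinked, "A", "G"))
```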

  12. The Comparison of Biochemical and Sequencing 16S rDNA Gene Methods to Identify Nontuberculous Mycobacteria

    Directory of Open Access Journals (Sweden)

    Shafipour, M.

    2014-11-01

    The identification of Mycobacteria at the species level is of great medical importance. Biochemical tests are laborious and time-consuming, so new techniques could be used to identify the species. This research aimed to compare biochemical and 16S rDNA gene sequencing methods for identifying nontuberculous Mycobacteria in patients suspected of having tuberculosis in Golestan province, the region with the highest prevalence of tuberculosis in Iran. Among 3336 patients suspected of having tuberculosis who were referred to hospitals and health care centres in Golestan province during 2010-2011, 319 (9.56%) culture-positive cases were collected. Species were identified using biochemical tests. For the samples recognized as nontuberculous Mycobacteria, 16S rDNA PCR was performed after DNA extraction by boiling, and the sequences were identified using NCBI BLAST. Of the 319 positive samples in Golestan province, 300 cases were M. tuberculosis and 19 cases (5.01%) were identified as nontuberculous Mycobacteria by biochemical tests. Fifteen of the 19 nontuberculous Mycobacteria were identified identically by PCR and sequencing and by biochemical methods (similarity rate: 78.9%). However, after PCR, 1 case identified as M. simiae by biochemical testing was identified as M. lentiflavum, and 3 other cases were identified as Nocardia. Biochemical methods corresponded to 16S rDNA PCR and sequencing in 78.9% of cases. However, for the identification of M. lentiflavum and Nocardia sp., the molecular method is better than the biochemical methods.

  13. High throughput sequencing and proteomics to identify immunogenic proteins of a new pathogen: the dirty genome approach.

    Science.gov (United States)

    Greub, Gilbert; Kebbi-Beghdadi, Carole; Bertelli, Claire; Collyn, François; Riederer, Beat M; Yersin, Camille; Croxatto, Antony; Raoult, Didier

    2009-12-23

    With the availability of new-generation sequencing technologies, bacterial genome projects have received a major boost. Still, completing a chromosome requires costly and time-consuming gap closure, especially when it contains highly repetitive elements. However, incomplete genome data may be sufficiently informative to derive the information sought. For emerging (i.e., newly identified) pathogens, withholding genome data during the gap closure stage is clearly medically counterproductive. We thus investigated the feasibility of a dirty genome approach, i.e., the release of unfinished genome sequences to develop serological diagnostic tools. We showed that almost the whole genome sequence of the emerging pathogen Parachlamydia acanthamoebae was retrieved even with relatively short reads from Genome Sequencer 20 and Solexa. The bacterial proteome was analyzed to select immunogenic proteins, which were then expressed and used to develop the first steps of an ELISA. This work constitutes the proof of principle for a dirty genome approach, i.e., the use of unfinished genome sequences of pathogenic bacteria, coupled with proteomics, to rapidly identify new immunogenic proteins useful for developing specific diagnostic tests in the future, such as ELISA, immunohistochemistry and direct antigen detection. Although applied here to an emerging pathogen, this combined dirty genome sequencing/proteomic approach may be used for any pathogen for which better diagnostics are needed. These genome sequences may also be very useful for developing DNA-based diagnostic tests. All these diagnostic tools will allow further evaluation of the pathogenic potential of this obligate intracellular bacterium.

  14. ATRX mutation in two adult brothers with non-specific moderate intellectual disability identified by exome sequencing

    OpenAIRE

    Moncini, S.; Bedeschi, M.F.; Castronovo, P.; Crippa, M.; Calvello, M.; Garghentino, R.R.; Scuvera, G.; Finelli, P.; Venturin, M.

    2013-01-01

    In this report, we describe two adult brothers affected by moderate non-specific intellectual disability (ID). They showed minor facial anomalies not clearly ascribable to any specific syndromic pattern, as well as microcephaly, brachydactyly and broad toes. Both brothers presented with seizures. Karyotype, subtelomeric and FMR1 analyses were normal in both cases. We performed array-CGH analysis, which revealed no copy-number variations potentially associated with ID. Subsequent exome sequence analysis allow...

  15. Mutations in the newly identified RAX regulatory sequence are not a frequent cause of micro/anophthalmia.

    Science.gov (United States)

    Chassaing, Nicolas; Vigouroux, Adeline; Calvas, Patrick

    2009-06-01

    Microphthalmia and anophthalmia are at the severe end of the spectrum of abnormalities in ocular development. A few genes (SOX2, OTX2, RAX, and CHX10) have been implicated in isolated micro/anophthalmia, but causative mutations of these genes explain less than a quarter of these developmental defects. A specifically conserved SOX2/OTX2-mediated RAX expression regulatory sequence has recently been identified. We postulated that mutations in this sequence could lead to micro/anophthalmia, and thus we performed molecular screening of this regulatory element in patients suffering from micro/anophthalmia. Fifty-one patients suffering from nonsyndromic microphthalmia (n = 40) or anophthalmia (n = 11) were included in this study after negative molecular screening for SOX2, OTX2, RAX, and CHX10 mutations. Mutation screening of the RAX regulatory sequence was performed by direct sequencing for these patients. No mutations were identified in the highly conserved RAX regulatory sequence in any of the 51 patients. Mutations in the newly identified RAX regulatory sequence do not represent a frequent cause of nonsyndromic micro/anophthalmia.

  16. Sequence variation in mitochondrial cox1 and nad1 genes of ascaridoid nematodes in cats and dogs from Iran.

    Science.gov (United States)

    Mikaeili, F; Mirhendi, H; Mohebali, M; Hosseini, M; Sharbatkhori, M; Zarei, Z; Kia, E B

    2015-07-01

    The study was conducted to determine the sequence variation in two mitochondrial genes, namely cytochrome c oxidase 1 (pcox1) and NADH dehydrogenase 1 (pnad1) within and among isolates of Toxocara cati, Toxocara canis and Toxascaris leonina. Genomic DNA was extracted from 32 isolates of T. cati, 9 isolates of T. canis and 19 isolates of T. leonina collected from cats and dogs in different geographical areas of Iran. Mitochondrial genes were amplified by polymerase chain reaction (PCR) and sequenced. Sequence data were aligned using the BioEdit software and compared with published sequences in GenBank. Phylogenetic analysis was performed using Bayesian inference and maximum likelihood methods. Based on pairwise comparison, intra-species genetic diversity within Iranian isolates of T. cati, T. canis and T. leonina amounted to 0-2.3%, 0-1.3% and 0-1.0% for pcox1 and 0-2.0%, 0-1.7% and 0-2.6% for pnad1, respectively. Inter-species sequence variation among the three ascaridoid nematodes was significantly higher, being 9.5-16.6% for pcox1 and 11.9-26.7% for pnad1. Sequence and phylogenetic analysis of the pcox1 and pnad1 genes indicated that there is significant genetic diversity within and among isolates of T. cati, T. canis and T. leonina from different areas of Iran, and these genes can be used for studying genetic variation of ascaridoid nematodes.
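
    The intra- and inter-species percentages above come from pairwise sequence comparisons. The abstract does not state which distance measure was used; the sketch below assumes the uncorrected p-distance (proportion of differing sites) with pairwise deletion of gap positions, applied to already-aligned sequences. The function names and toy sequences are hypothetical.

```python
from itertools import combinations


def p_distance(seq1, seq2):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped
    (pairwise deletion).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x == "-" or y == "-":
            continue
        compared += 1
        diffs += x != y
    return diffs / compared


def distance_range(seqs):
    """Minimum and maximum pairwise p-distance (in %) within a set of sequences."""
    values = [100 * p_distance(a, b) for a, b in combinations(seqs, 2)]
    return min(values), max(values)
```

    Reporting the minimum and maximum over all pairs mirrors the "0-2.3%" style of ranges given in the abstract.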

  17. RNA Sequencing Analysis Reveals Transcriptomic Variations in Tobacco (Nicotiana tabacum) Leaves Affected by Climate, Soil, and Tillage Factors

    Directory of Open Access Journals (Sweden)

    Bo Lei

    2014-04-01

    Full Text Available The growth and development of plants are sensitive to their surroundings. Although numerous studies have analyzed plant transcriptomic variation, few have quantified the effect of combinations of factors or identified factor-specific effects. In this study, we performed RNA sequencing (RNA-seq) analysis on tobacco leaves derived from 10 treatment combinations of three groups of ecological factors, i.e., climate factors (CFs), soil factors (SFs), and tillage factors (TFs). We detected 4980, 2916, and 1605 differentially expressed genes (DEGs) that were affected by CFs, SFs, and TFs, which included 2703, 768, and 507 specific and 703 common DEGs (simultaneously regulated by CFs, SFs, and TFs), respectively. GO and KEGG enrichment analyses showed that genes involved in abiotic stress responses and secondary metabolic pathways were overrepresented in the common and CF-specific DEGs. In addition, we noted enrichment in CF-specific DEGs related to the circadian rhythm, SF-specific DEGs involved in mineral nutrient absorption and transport, and SF- and TF-specific DEGs associated with photosynthesis. Based on these results, we propose a model that explains how plants adapt to various ecological factors at the transcriptomic level. Additionally, the identified DEGs lay the foundation for future investigations of stress resistance, circadian rhythm and photosynthesis in tobacco.
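
    The split into factor-specific and common DEGs can be expressed with plain set operations. The sketch below assumes "specific" means a gene is affected by exactly one factor group and "common" means it is affected by all three; the gene identifiers are hypothetical.

```python
def partition_degs(cf, sf, tf):
    """Split three DEG sets into factor-specific genes and genes common to all."""
    common = cf & sf & tf
    specific = {
        "CF": cf - sf - tf,
        "SF": sf - cf - tf,
        "TF": tf - cf - sf,
    }
    return specific, common


# Hypothetical gene IDs; "g3" is shared by CF and SF only, so it is in neither output.
cf = {"g1", "g2", "g3", "g4"}
sf = {"g3", "g4", "g5"}
tf = {"g4", "g6"}
specific, common = partition_degs(cf, sf, tf)
```

    Under this definition, genes shared by exactly two factor groups fall into neither category. If the reported counts follow it, 4980 - 2703 - 703 = 1574 CF-affected DEGs are shared with exactly one other factor group.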

  18. Salmonella Persistence in Tomatoes Requires a Distinct Set of Metabolic Functions Identified by Transposon Insertion Sequencing

    Science.gov (United States)

    Desai, Prerak; Porwollik, Steffen; Canals, Rocio; Perez, Daniel R.; Chu, Weiping; McClelland, Michael; Teplitski, Max

    2016-01-01

    ABSTRACT Human enteric pathogens, such as Salmonella spp. and verotoxigenic Escherichia coli, are increasingly recognized as causes of gastroenteritis outbreaks associated with the consumption of fruits and vegetables. Persistence in plants represents an important part of the life cycle of these pathogens. The identification of the full complement of Salmonella genes involved in the colonization of the model plant (tomato) was carried out using transposon insertion sequencing analysis. With this approach, 230,000 transposon insertions were screened in tomato pericarps to identify loci with reduction in fitness, followed by validation of the screen results using competition assays of the isogenic mutants against the wild type. A comparison with studies in animals revealed a distinct plant-associated set of genes, which only partially overlaps with the genes required to elicit disease in animals. De novo biosynthesis of amino acids was critical to persistence within tomatoes, while amino acid scavenging was prevalent in animal infections. Fitness reduction of the Salmonella amino acid synthesis mutants was generally more severe in the tomato rin mutant, which hyperaccumulates certain amino acids, suggesting that these nutrients remain unavailable to Salmonella spp. within plants. Salmonella lipopolysaccharide (LPS) was required for persistence in both animals and plants, exemplifying some shared pathogenesis-related mechanisms in animal and plant hosts. Similarly to phytopathogens, Salmonella spp. required biosynthesis of amino acids, LPS, and nucleotides to colonize tomatoes. Overall, however, it appears that while Salmonella shares some strategies with phytopathogens and taps into its animal virulence-related functions, colonization of tomatoes represents a distinct strategy, highlighting this pathogen's flexible metabolism. 
IMPORTANCE Outbreaks of gastroenteritis caused by human pathogens have been increasingly associated with foods of plant origin, with tomatoes
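
    In transposon insertion sequencing, loci with reduced fitness are typically flagged by comparing insertion read counts before (input) and after (output) passage through the host. The abstract does not give the authors' scoring method; the sketch below uses a common generic approach instead: a per-gene log2 ratio of depth-normalized counts with a pseudocount. The function name and gene labels are hypothetical.

```python
import math


def log2_fitness(input_counts, output_counts, pseudocount=1.0):
    """Per-gene log2 ratio of normalized transposon-insertion read counts.

    input_counts / output_counts map gene -> summed reads from insertions
    in that gene before and after passage through the host. Negative
    values suggest that disrupting the gene reduces fitness in the host.
    """
    in_total = sum(input_counts.values())
    out_total = sum(output_counts.values())
    scores = {}
    for gene, n_in in input_counts.items():
        n_out = output_counts.get(gene, 0)
        # Reads per million make libraries of different depth comparable;
        # the pseudocount avoids log(0) for genes with no output reads.
        rpm_in = (n_in + pseudocount) / in_total * 1e6
        rpm_out = (n_out + pseudocount) / out_total * 1e6
        scores[gene] = math.log2(rpm_out / rpm_in)
    return scores


scores = log2_fitness({"geneA": 99, "geneB": 99}, {"geneA": 99, "geneB": 0})
```

    Candidates flagged this way would still need confirmation, as in the abstract, for example by competition assays of isogenic mutants against the wild type.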

  19. Exome Sequencing Identifies a Novel MAP3K14 Mutation in Recessive Atypical Combined Immunodeficiency

    Directory of Open Access Journals (Sweden)

    Nikola Schlechter

    2017-11-01

    Full Text Available Primary immunodeficiency disorders (PIDs) render patients vulnerable to infection with a wide range of microorganisms and thus provide good in vivo models for the assessment of immune responses during infectious challenges. Priming of the immune system, especially in infancy, depends on different environmental exposures and medical practices. This may determine the timing and phenotype of clinical appearance of immune deficits as exemplified with early exposure to Bacillus Calmette-Guérin (BCG) vaccination and dissemination in combined immunodeficiencies. Varied phenotype expression poses a challenge to identification of the putative immune deficit. Without the availability of genomic diagnosis and data analysis resources and with limited capacity for functional definition of immune pathways, it is difficult to establish a definitive diagnosis and to decide on appropriate treatment. This study describes the use of exome sequencing to identify a homozygous recessive variant in MAP3K14, NIKVal345Met, in a patient with combined immunodeficiency, disseminated BCG-osis, and paradoxically elevated lymphocytes. Laboratory testing confirmed hypogammaglobulinemia with normal CD19, but failed to confirm a definitive diagnosis for targeted treatment decisions. NIKVal345Met is predicted to be deleterious and pathogenic by two in silico prediction tools and is situated in a gene crucial for effective functioning of the non-canonical nuclear factor-kappa B signaling pathway. Functional analysis of NIKVal345Met- versus NIKWT-transfected human embryonic kidney-293T cells showed that this mutation significantly affects the kinase activity of NIK, leading to decreased levels of phosphorylated IkappaB kinase-alpha (IKKα), the target of NIK. BCG-stimulated RAW264.7 cells transfected with NIKVal345Met also presented with reduced levels of phosphorylated IKKα, significantly increased p100 levels and significantly decreased p52 levels compared to cells transfected

  20. Population clustering based on copy number variations detected from next generation sequencing data.

    Science.gov (United States)

    Duan, Junbo; Zhang, Ji-Gang; Wan, Mingxi; Deng, Hong-Wen; Wang, Yu-Ping

    2014-08-01

    Copy number variations (CNVs) can be used as significant biomarkers, and next generation sequencing (NGS) provides high-resolution detection of these CNVs. However, how to extract features from CNVs and apply them to genomic studies such as population clustering has become a major challenge. In this paper, we propose a novel method for population clustering based on CNVs from NGS. First, CNVs are extracted from each sample to form a feature matrix. Then, this feature matrix is decomposed into a source matrix and a weight matrix with non-negative matrix factorization (NMF). The source matrix consists of common CNVs that are shared by all the samples from the same group, and the weight matrix indicates the corresponding level of CNVs in each sample. Therefore, using NMF of CNVs, one can differentiate samples from different ethnic groups, i.e., perform population clustering. To validate the approach, we applied it to both simulation data and two real data sets from the 1000 Genomes Project. The results on simulation data demonstrate that the proposed method can recover the true common CNVs with high quality. The first real data analysis shows that the proposed method can cluster two family trios with different ancestries into two ethnic groups, and the second shows that the proposed method can be applied to the whole genome with a large sample size consisting of multiple groups. Both results demonstrate the potential of the proposed method for population clustering.
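
    The NMF step described above can be sketched in pure Python with the Lee and Seung multiplicative updates for the Frobenius-norm objective. Several details are assumptions, not taken from the abstract. The feature matrix V is taken to have CNV features as rows and samples as columns. W plays the role of the source matrix (shared CNV patterns) and H the weight matrix. Each sample is assigned to the component with its largest weight. The paper's actual update rule, rank selection and clustering rule may differ.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(col) for col in zip(*m)]


def nmf(v, rank, iterations=500, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (CNV features x samples) as W @ H.

    W (features x rank) holds shared CNV patterns ("source" matrix);
    H (rank x samples) holds each sample's weight on those patterns.
    """
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
    return w, h


def cluster_labels(h):
    """Assign each sample to the component with the largest weight in H."""
    return [max(range(len(h)), key=lambda k: h[k][j]) for j in range(len(h[0]))]


# Toy data: samples 0-1 share CNVs at features 0-1, samples 2-3 at features 2-3.
v = [[2, 2, 0, 0], [2, 2, 0, 0], [0, 0, 3, 3], [0, 0, 3, 3]]
w, h = nmf(v, rank=2)
labels = cluster_labels(h)
```

    Multiplicative updates keep every entry non-negative without explicit projection, which is why they are a common default for small NMF problems. For genome-scale matrices, a vectorized library implementation would be more practical.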

  1. Unexpected tolerance of alpha-cleavage of the prion protein to sequence variations.

    Directory of Open Access Journals (Sweden)

    José B Oliveira-Martins

    Full Text Available The cellular form of the prion protein, PrP(C), undergoes extensive proteolysis at the alpha site (between 109K and H110). Expression of non-cleavable PrP(C) mutants in transgenic mice correlates with neurotoxicity, suggesting that alpha-cleavage is important for PrP(C) physiology. To gain insights into the mechanisms of alpha-cleavage, we generated a library of PrP(C) mutants with mutations in the region neighbouring the alpha-cleavage site. The prevalence of C1, the carboxy adduct of alpha-cleavage, was determined for each mutant. In cell lines of disparate origin, C1 prevalence was unaffected by variations in charge and hydrophobicity of the region neighbouring the alpha-cleavage site, and by substitutions of the residues in the palindrome that flanks this site. Instead, alpha-cleavage was size-dependently impaired by deletions within the domain 106-119. Almost no cleavage was observed upon full deletion of this domain. These results suggest that alpha-cleavage is executed by an alpha-PrPase whose activity, despite surprisingly limited sequence specificity, is dependent on the size of the central region of PrP(C).

  2. Using an online genome resource to identify myostatin variation in U.S. sheep

    Science.gov (United States)

    We created a public, searchable DNA sequence resource for sheep that contained approximately 14x whole genome sequence of 96 rams. The animals represent 10 popular U.S. breeds and share minimal pedigree relationships, making the resource suitable for viewing gene variants in the user-friendly Integ...

  3. Mining genome sequencing data to identify the genomic features linked to breast cancer histopathology

    Science.gov (United States)

    Ping, Zheng; Siegal, Gene P.; Almeida, Jonas S.; Schnitt, Stuart J.; Shen, Dejun

    2014-01-01

    Background: Genetics and genomics have radically altered our understanding of breast cancer progression. However, the genomic basis of various histopathologic features of breast cancer is not yet well-defined. Materials and Methods: The Cancer Genome Atlas (TCGA) is an international database containing a large collection of human cancer genome sequencing data. cBioPortal is a web tool developed for mining these sequencing data. We performed mining of TCGA sequencing data in an attempt to characterize the genomic features correlated with breast cancer histopathology. We first assessed the quality of the TCGA data using a group of genes with known alterations in various cancers. Both genome-wide gene mutation and copy number changes as well as a group of genes with a high frequency of genetic changes were then correlated with various histopathologic features of invasive breast cancer. Results: Validation of TCGA data using a group of genes with known alterations in breast cancer suggests that the TCGA has accurately documented the genomic abnormalities of multiple malignancies. Further analysis of TCGA breast cancer sequencing data shows that accumulation of specific genomic defects is associated with higher tumor grade, larger tumor size and receptor negativity. Distinct groups of genomic changes were found to be associated with the different grades of invasive ductal carcinoma. The mutator role of the TP53 gene was validated by genomic sequencing data of invasive breast cancer and TP53 mutation was found to play a critical role in defining high tumor grade. Conclusions: Data mining of the TCGA genome sequencing data is an innovative and reliable method to help characterize the genomic abnormalities associated with histopathologic features of invasive breast cancer. PMID:24672738

  4. Mining genome sequencing data to identify the genomic features linked to breast cancer histopathology

    Directory of Open Access Journals (Sweden)

    Zheng Ping

    2014-01-01

    Full Text Available Background: Genetics and genomics have radically altered our understanding of breast cancer progression. However, the genomic basis of various histopathologic features of breast cancer is not yet well-defined. Materials and Methods: The Cancer Genome Atlas (TCGA) is an international database containing a large collection of human cancer genome sequencing data. cBioPortal is a web tool developed for mining these sequencing data. We performed mining of TCGA sequencing data in an attempt to characterize the genomic features correlated with breast cancer histopathology. We first assessed the quality of the TCGA data using a group of genes with known alterations in various cancers. Both genome-wide gene mutation and copy number changes as well as a group of genes with a high frequency of genetic changes were then correlated with various histopathologic features of invasive breast cancer. Results: Validation of TCGA data using a group of genes with known alterations in breast cancer suggests that the TCGA has accurately documented the genomic abnormalities of multiple malignancies. Further analysis of TCGA breast cancer sequencing data shows that accumulation of specific genomic defects is associated with higher tumor grade, larger tumor size and receptor negativity. Distinct groups of genomic changes were found to be associated with the different grades of invasive ductal carcinoma. The mutator role of the TP53 gene was validated by genomic sequencing data of invasive breast cancer and TP53 mutation was found to play a critical role in defining high tumor grade. Conclusions: Data mining of the TCGA genome sequencing data is an innovative and reliable method to help characterize the genomic abnormalities associated with histopathologic features of invasive breast cancer.

  5. Sedimentology, sequence-stratigraphy, and geochemical variations in the Mesoproterozoic Nonesuch Formation, northern Wisconsin, USA

    Science.gov (United States)

    Kingsbury Stewart, Esther; Mauk, Jeffrey L.

    2017-01-01

    We use core descriptions and portable X-ray fluorescence analyses to identify lithofacies and stratigraphic surfaces for the Mesoproterozoic Nonesuch Formation within the Ashland syncline, Wisconsin. We group lithofacies into facies associations and construct a sequence stratigraphic framework based on lithofacies stacking and stratigraphic surfaces. The fluvial-alluvial facies association (upper Copper Harbor Conglomerate) is overlain across a transgressive surface by the fluctuating-profundal facies association (lower Nonesuch Formation). The fluctuating-profundal facies association comprises a retrogradational sequence set overlain across a maximum flooding surface by an aggradational-progradational sequence set comprising fluctuating-profundal, fluvial-lacustrine, and fluvial-alluvial facies associations (middle Nonesuch through lower Freda Formations). Lithogeochemistry supports sedimentologic and stratigraphic interpretations. Fe/S molar ratios reflect the oxidation state of the lithofacies; values are most depleted above the maximum flooding surface where lithofacies are chemically reduced and are greatest in the chemically oxidized lithofacies. Si/Al and Zr/Al molar ratios reflect the relative abundance of detrital heavy minerals vs. clay minerals; greater values correlate with larger grain size. Vertical facies association stacking records depositional environments that evolved from fluvial and alluvial, to balanced-fill lake, to overfilled lake, and returning to fluvial and alluvial. Elsewhere in the basin, where accommodation was greatest, some volume of fluvial-lacustrine facies is likely present below the transgressive stratigraphic surface. This succession of continental and lake-basin types indicates a predominant tectonic driver of basin evolution. Lithofacies distribution and geochemistry indicate deposition within an asymmetric half-graben bounded on the east by a west-dipping growth fault. 
While facies assemblages are lacustrine and continental
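
    The Fe/S, Si/Al and Zr/Al molar ratios above are obtained by dividing each element's concentration by its atomic weight before taking the ratio. A minimal sketch follows, assuming both concentrations in a ratio are given in the same unit (for example both in wt% or both in ppm). The abstract does not describe the authors' data reduction, and the function name is hypothetical.

```python
# Standard atomic weights (g/mol).
ATOMIC_WEIGHT = {"Fe": 55.845, "S": 32.06, "Si": 28.085, "Al": 26.982, "Zr": 91.224}


def molar_ratio(concentrations, numerator, denominator):
    """Molar ratio of two elements from mass concentrations in the same unit."""
    moles_num = concentrations[numerator] / ATOMIC_WEIGHT[numerator]
    moles_den = concentrations[denominator] / ATOMIC_WEIGHT[denominator]
    return moles_num / moles_den
```

    For example, a sample with 5.0 wt% Fe and 2.0 wt% S would be passed as `molar_ratio({"Fe": 5.0, "S": 2.0}, "Fe", "S")`. If an instrument reports Zr in ppm and Al in wt%, convert one of them before computing Zr/Al.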

  6. Sequence Variation in Rhoptry Neck Protein 10 Gene among Toxoplasma gondii Isolates from Different Hosts and Geographical Locations

    Directory of Open Access Journals (Sweden)

    Yu ZHAO

    2017-09-01

    Full Text Available Background: Toxoplasma gondii, a eukaryotic parasite of the phylum Apicomplexa, can infect almost all warm-blooded animals and humans, causing toxoplasmosis. Rhoptry neck proteins (RONs) play a key role in the invasion process of T. gondii and are potential vaccine candidate molecules against toxoplasmosis. Methods: The present study examined sequence variation in the rhoptry neck protein 10 (TgRON10) gene among 10 T. gondii isolates from different hosts and geographical locations from Lanzhou province during 2014, and compared them with the corresponding sequences of strains ME49 and VEG obtained from the ToxoDB database, using polymerase chain reaction (PCR) amplification, sequence analysis, and phylogenetic reconstruction by Bayesian inference (BI) and maximum parsimony (MP). Results: Analysis of all 12 TgRON10 genomic and cDNA sequences revealed 7 exons and 6 introns in the TgRON10 gDNA. The complete genomic sequence of the TgRON10 gene ranged from 4759 bp to 4763 bp, and sequence variation was 0-0.6% among the 12 T. gondii isolates, indicating low sequence variation in the TgRON10 gene. Phylogenetic analysis of TgRON10 sequences showed that the clustering of the 12 T. gondii isolates was not completely consistent with their respective genotypes. Conclusion: The TgRON10 gene is not a suitable genetic marker for differentiating T. gondii isolates from different hosts and geographical locations, but it may represent a potential vaccine candidate against toxoplasmosis that is worthy of further study.

  7. An Efficient Method for Identifying Gene Fusions by Targeted RNA Sequencing from Fresh Frozen and FFPE Samples.

    Directory of Open Access Journals (Sweden)

    Jonathan A Scolnick

    Full Text Available Fusion genes are known to be key drivers of tumor growth in several types of cancer. Traditionally, detecting fusion genes has been a difficult task that relied on fluorescence in situ hybridization to detect chromosomal abnormalities. More recently, RNA sequencing has increased the pace of fusion gene identification. However, RNA-Seq is inefficient for identifying fusion genes because of the high number of sequencing reads needed to detect the small number of fusion transcripts present in cells of interest. Here we describe a method, Single Primer Enrichment Technology (SPET), for targeted RNA sequencing that is customizable to any target genes, is simple to use, and efficiently detects gene fusions. Using SPET to target 5701 exons of 401 known cancer fusion genes for sequencing, we were able to identify known and previously unreported gene fusions from both fresh-frozen and formalin-fixed paraffin-embedded (FFPE) tissue RNA in both normal tissue and cancer cells.
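
    Once targeted reads are sequenced, a fusion is usually supported by reads that span the junction between two partner genes. The abstract does not describe the SPET analysis pipeline. The toy sketch below only illustrates the idea with exact string matching: it looks for a split point where the start of a read lies in one gene and the rest lies in the other. The function name, minimum anchor length and sequences are hypothetical. Real fusion callers align reads with mismatches and splicing in mind.

```python
def find_fusion_breakpoint(read, gene_a, gene_b, min_anchor=8):
    """Return split index i such that read[:i] occurs in gene_a and
    read[i:] occurs in gene_b, or None if no such split exists.

    Both halves must be at least min_anchor bases long. Exact matching
    only; the first (leftmost) valid split is returned.
    """
    for i in range(min_anchor, len(read) - min_anchor + 1):
        if read[:i] in gene_a and read[i:] in gene_b:
            return i
    return None


# Hypothetical partner sequences and a read spanning their junction.
gene_a = "ATGCGTACGTTAGCCA"
gene_b = "GGATCCTTGACCGTAA"
junction_read = gene_a[-10:] + gene_b[:10]
breakpoint = find_fusion_breakpoint(junction_read, gene_a, gene_b)  # 10
```

    When bases at the junction match both partners, several split points can be valid. Returning the leftmost one is an arbitrary convention chosen for this sketch.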

  8. Exome sequencing identifies NBEAL2 as the causative gene for gray platelet syndrome

    NARCIS (Netherlands)

    Albers, Cornelis A.; Cvejic, Ana; Favier, Rémi; Bouwmans, Evelien E.; Alessi, Marie-Christine; Bertone, Paul; Jordan, Gregory; Kettleborough, Ross N. W.; Kiddle, Graham; Kostadima, Myrto; Read, Randy J.; Sipos, Botond; Sivapalaratnam, Suthesh; Smethurst, Peter A.; Stephens, Jonathan; Voss, Katrin; Nurden, Alan; Rendon, Augusto; Nurden, Paquita; Ouwehand, Willem H.

    2011-01-01

    Gray platelet syndrome (GPS) is a predominantly recessive platelet disorder that is characterized by mild thrombocytopenia with large platelets and a paucity of α-granules; these abnormalities cause mostly moderate but in rare cases severe bleeding. We sequenced the exomes of four unrelated

  9. Identifying Learning Behaviors by Contextualizing Differential Sequence Mining with Action Features and Performance Evolution

    Science.gov (United States)

    Kinnebrew, John S.; Biswas, Gautam

    2012-01-01

    Our learning-by-teaching environment, Betty's Brain, captures a wealth of data on students' learning interactions as they teach a virtual agent. This paper extends an exploratory data mining methodology for assessing and comparing students' learning behaviors from these interaction traces. The core algorithm employs sequence mining techniques to…
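
    Differential sequence mining compares how often action patterns occur in the interaction traces of two groups of students. The following is a simplified sketch, assuming patterns are contiguous n-grams of actions and that support is the fraction of students whose trace contains the pattern. The cited work's actual pattern definitions and support measures may differ, and the action labels are hypothetical.

```python
from collections import Counter


def ngrams(actions, n):
    """Set of contiguous n-grams in one action trace."""
    return {tuple(actions[i:i + n]) for i in range(len(actions) - n + 1)}


def sequence_support(traces, n):
    """Fraction of traces (students) that contain each n-gram at least once."""
    counts = Counter()
    for trace in traces:
        counts.update(ngrams(trace, n))
    return {pattern: c / len(traces) for pattern, c in counts.items()}


def differential_patterns(group_a, group_b, n=2, threshold=0.5):
    """Patterns whose support differs by at least `threshold` between groups."""
    sa, sb = sequence_support(group_a, n), sequence_support(group_b, n)
    result = {}
    for pattern in set(sa) | set(sb):
        diff = sa.get(pattern, 0.0) - sb.get(pattern, 0.0)
        if abs(diff) >= threshold:
            result[pattern] = diff
    return result


high_performers = [["Read", "Quiz"], ["Read", "Quiz", "Edit"]]
low_performers = [["Edit", "Quiz"], ["Edit", "Read"]]
patterns = differential_patterns(high_performers, low_performers, threshold=0.6)
```

    A positive difference marks a pattern that is more common in the first group. Each student is counted once per pattern, so a single student who repeats an action loop many times cannot dominate the comparison.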

  10. Use of microsatellite markers derived from whole genome sequence data for identifying polymorphism in Phytophthora ramorum

    Science.gov (United States)

    Kelly Ivors; Matteo Garbelotto; Ineke De Vries; Peter Bonants

    2006-01-01

    Investigating the population genetics of Phytophthora ramorum, the causal agent of sudden oak death (SOD), is critical to understanding the biology and epidemiology of this important phytopathogen. Raw sequence data (445,000 reads) of P. ramorum was provided by the Joint Genome Institute. Our objective was to develop and utilize...
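
    Developing microsatellite markers from raw sequence data starts with locating simple sequence repeats (SSRs). The sketch below scans for perfect tandem repeats with a regular expression. The unit-length and repeat-count thresholds are assumptions chosen for illustration, not the criteria used in this study.

```python
import re


def find_ssrs(seq, min_unit=2, max_unit=6, min_repeats=5):
    """Return (start, unit, copies) for perfect tandem repeats in seq."""
    # The lazy unit quantifier prefers the shortest repeating unit; the
    # backreference \2 then requires at least min_repeats - 1 more copies.
    pattern = re.compile(r"(([ACGT]{%d,%d}?)\2{%d,})"
                         % (min_unit, max_unit, min_repeats - 1))
    hits = []
    for m in pattern.finditer(seq.upper()):
        unit = m.group(2)
        # Skip units that are themselves repeats, e.g. "AA" in a poly-A run.
        if re.fullmatch(r"(.+?)\1+", unit):
            continue
        hits.append((m.start(), unit, len(m.group(1)) // len(unit)))
    return hits
```

    Only perfect repeats are found. Imperfect or compound microsatellites, which are common in real genomes, would need a tolerant search such as a dedicated SSR-finding tool.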

  11. Sequence analysis of the ITS-2 region: a tool to identify strains of Scenedesmus (Chlorophyceae)

    NARCIS (Netherlands)

    Van Hannen, E.J.; Lürling, M.; Van Donk, E.

    2000-01-01

    The genetic distances between several strains of Scenedesmus obliquus (Turp.) Kutz., S. acutus Hortobagyi, and S. naegelii Chod. calculated from ITS-2 sequences were found to be smaller than the genetic distances within other strains of Scenedesmus, that is, in S. acuminatus (Lagerh.) Chod. and S.

  12. Whole-genome sequencing and comprehensive molecular profiling identify new driver mutations in gastric cancer

    NARCIS (Netherlands)

    Wang, Kai; Yuen, Siu Tsan; Xu, Jiangchun; Lee, Siu Po; Yan, Helen H N; Shi, Stephanie T; Siu, Hoi Cheong; Deng, Shibing; Chu, Kent Man; Law, Simon; Chan, Kok Hoe; Chan, Annie S Y; Tsui, Wai Yin; Ho, Siu Lun; Chan, Anthony K W; Man, Jonathan L K; Foglizzo, Valentina; Ng, Man Kin; Chan, April S; Ching, Yick Pang; Cheng, Grace H W; Xie, Tao; Fernandez, Julio; Li, Vivian S W; Clevers, Hans; Rejto, Paul A; Mao, Mao; Leung, Suet Yi

    Gastric cancer is a heterogeneous disease with diverse molecular and histological subtypes. We performed whole-genome sequencing in 100 tumor-normal pairs, along with DNA copy number, gene expression and methylation profiling, for integrative genomic analysis. We found subtype-specific genetic and

  13. Diverse Array of New Viral Sequences Identified in Worldwide Populations of the Asian Citrus Psyllid (Diaphorina citri) Using Viral Metagenomics.

    Science.gov (United States)

    Nouri, Shahideh; Salem, Nidá; Nigg, Jared C; Falk, Bryce W

    2015-12-16

    The Asian citrus psyllid, Diaphorina citri, is the natural vector of the causal agent of Huanglongbing (HLB), or citrus greening disease. Together, HLB and D. citri represent a major threat to world citrus production. As there is no cure for HLB, insect vector management is considered one strategy to help control the disease, and D. citri viruses might be useful. In this study, we used a metagenomic approach to analyze viral sequences associated with the global population of D. citri. By sequencing small RNAs and the transcriptome coupled with bioinformatics analysis, we showed that the virus-like sequences of D. citri are diverse. We identified novel viral sequences belonging to the picornavirus superfamily, the Reoviridae, Parvoviridae, and Bunyaviridae families, and an unclassified positive-sense single-stranded RNA virus. Moreover, a Wolbachia prophage-related sequence was identified. This is the first comprehensive survey to assess the viral community from worldwide populations of an agricultural insect pest. Our results provide valuable information on new putative viruses, some of which may have the potential to be used as biocontrol agents. Insects have the most species of all animals, and are hosts to, and vectors of, a great variety of known and unknown viruses. Some of these most likely have the potential to be important fundamental and/or practical resources. In this study, we used high-throughput next-generation sequencing (NGS) technology and bioinformatics analysis to identify putative viruses associated with Diaphorina citri, the Asian citrus psyllid. D. citri is the vector of the bacterium causing Huanglongbing (HLB), currently the most serious threat to citrus worldwide. Here, we report several novel viral sequences associated with D. citri. Copyright © 2016, American Society for Microbiology. All Rights Reserved.

  14. CYP2D7 sequence variation interferes with TaqMan CYP2D6*15 and *35 genotyping

    Directory of Open Access Journals (Sweden)

    Amanda K Riffel

    2016-01-01

    Full Text Available TaqMan™ genotyping assays are widely used to genotype CYP2D6, which encodes a major drug metabolizing enzyme. Assay design for CYP2D6 can be challenging owing to the presence of two pseudogenes, CYP2D7 and CYP2D8, structural and copy number variation and numerous single nucleotide polymorphisms (SNPs), some of which reflect the wild-type sequence of the CYP2D7 pseudogene. The aim of this study was to identify the mechanism causing false positive CYP2D6*15 calls and remediate those by redesigning and validating alternative TaqMan genotype assays. Among 13,866 DNA samples genotyped by the CompanionDx® lab on the OpenArray platform, 70 samples were identified as heterozygotes for 137Tins, the key SNP of CYP2D6*15. However, only 15 samples were confirmed when tested with the Luminex xTAG CYP2D6 Kit and sequencing of CYP2D6-specific long-range PCR (XL-PCR) products. Genotype and gene resequencing of CYP2D6 and CYP2D7-specific XL-PCR products revealed a CC>GT dinucleotide SNP in exon 1 of CYP2D7 that reverts the sequence to CYP2D6 and allows a TaqMan assay PCR primer to bind. Because CYP2D7 also carries a Tins, a false-positive mutation signal is generated. This CYP2D7 SNP was also responsible for generating false-positive signals for rs769258 (CYP2D6*35), which is also located in exon 1. Although alternative CYP2D6*15 and *35 assays resolved the issue, we discovered a novel CYP2D6*15 subvariant in one sample that carries additional SNPs preventing detection with the alternate assay. The frequency of CYP2D6*15 was 0.1% in this ethnically diverse U.S. population sample. In addition, we also discovered linkage between the CYP2D7 CC>GT dinucleotide SNP and the 77G>A (rs28371696) SNP of CYP2D6*43. The frequency of this tentatively functional allele was 0.2%. Taken together, these findings emphasize that regardless of how careful genotyping assays are designed and evaluated before being commercially marketed, rare or unknown SNPs underneath primer and/or probe
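
    The mechanism described above, where a dinucleotide change in the pseudogene recreates the CYP2D6 sequence under a primer, can be illustrated with a crude primer-matching model. The sketch below makes several simplifying assumptions. A primer is treated as binding a template window if it has at most one mismatch and an exact match over its 3'-terminal bases, and only the given strand is scanned. The function name and all sequences are hypothetical, not actual CYP2D6 or CYP2D7 sequence. Real annealing depends on thermodynamics and reaction conditions.

```python
def primer_binds(primer, template, max_mismatches=1, three_prime_exact=5):
    """Crude check whether a primer can anneal to a same-strand template window.

    Requires an exact match over the primer's 3'-terminal bases (where
    mismatches most strongly block extension) and at most max_mismatches
    mismatches overall.
    """
    k = len(primer)
    for start in range(len(template) - k + 1):
        window = template[start:start + k]
        if window[-three_prime_exact:] != primer[-three_prime_exact:]:
            continue
        mismatches = sum(a != b for a, b in zip(window, primer))
        if mismatches <= max_mismatches:
            return True
    return False


primer = "ACGTTGCAGT"  # hypothetical primer ending in "GT"
gene = "GGACGTTGCAGTAA"  # hypothetical gene with "GT" under the primer 3' end
pseudogene = "GGACGTTGCACCAA"  # hypothetical pseudogene with "CC" instead
pseudogene_variant = "GGACGTTGCAGTAA"  # "CC>GT" restores the gene-like sequence
```

    In this toy model the primer amplifies the gene and the variant pseudogene but not the reference pseudogene, which is how an off-target template can produce a false-positive genotype signal.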

  15. Single nucleotide variants and InDels identified from whole-genome re-sequencing of Guzerat, Gyr, Girolando and Holstein cattle breeds.

    Directory of Open Access Journals (Sweden)

    Nedenia Bonvino Stafuzza

    Full Text Available Whole-genome re-sequencing, alignment and annotation analyses were undertaken for 12 sires representing four important cattle breeds in Brazil: Guzerat (multi-purpose), Gyr, Girolando and Holstein (dairy production). In total, approximately 4.3 billion reads from an Illumina HiSeq 2000 sequencer generated 10.7- to 16.4-fold genome coverage for each animal. A total of 27,441,279 single nucleotide variations (SNVs) and 3,828,041 insertions/deletions (InDels) were detected in the samples, of which 2,557,670 SNVs and 883,219 InDels were novel. The submission of these genetic variants to the dbSNP database significantly increased the number of known variants, particularly for the indicine genome. The concordance rate between genotypes obtained using the Bovine HD BeadChip array and the same variants identified by sequencing was about 99.05%. Variant annotation identified numerous non-synonymous SNVs and frameshift InDels that could affect phenotypic variation. Functional enrichment analysis revealed that variants in the olfactory transduction pathway were overrepresented in all four cattle breeds, whereas the ECM-receptor interaction pathway was overrepresented in the Girolando and Guzerat breeds, the ABC transporters pathway only in the Holstein breed, and the metabolic pathways only in the Gyr breed. The genetic variants discovered here provide a rich resource to help identify potential genomic markers, and their associated molecular mechanisms, that affect economically important traits in Gyr, Girolando, Guzerat and Holstein breeding programs.
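    The array-versus-sequencing concordance rate reported above is, in essence, the fraction of shared, genotyped sites at which both platforms report the same genotype. The abstract does not describe the exact procedure, so the Python sketch below is only an illustration under stated assumptions: genotypes are "A/G"-style strings keyed by a hypothetical site ID, missing calls are None, and allele order (including phasing) is ignored.

```python
from typing import Dict, Optional


def genotype_concordance(array_calls: Dict[str, Optional[str]],
                         seq_calls: Dict[str, Optional[str]]) -> float:
    """Fraction of sites genotyped on both platforms whose calls agree.

    Assumption (illustrative only): calls are strings such as "A/G" or "A|G",
    keyed by a site identifier; None marks a missing call.
    """
    def normalize(gt: str) -> tuple:
        # Treat phased ("A|G") and unphased ("G/A") calls as unordered allele pairs.
        return tuple(sorted(gt.replace("|", "/").split("/")))

    # Keep only sites that are present and non-missing on both platforms.
    shared = [site for site, gt in array_calls.items()
              if gt is not None and seq_calls.get(site) is not None]
    if not shared:
        raise ValueError("no sites genotyped on both platforms")
    agree = sum(normalize(array_calls[s]) == normalize(seq_calls[s]) for s in shared)
    return agree / len(shared)


array = {"site1": "A/G", "site2": "G/G", "site3": "C/T", "site4": None}
seq = {"site1": "G/A", "site2": "G/G", "site3": "C/C", "site4": "A/A"}
print(genotype_concordance(array, seq))  # 2 of 3 shared sites agree
```

    Sites missing on either platform are excluded rather than counted as discordant, which is one common convention; a study may define the denominator differently.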

  16. Genome-wide association study identified genetic variations and candidate genes for plant architecture component traits in Chinese upland cotton.

    Science.gov (United States)

    Su, Junji; Li, Libei; Zhang, Chi; Wang, Caixiang; Gu, Lijiao; Wang, Hantao; Wei, Hengling; Liu, Qibao; Huang, Long; Yu, Shuxun

    2018-06-01

    Thirty significant associations between 22 SNPs and five plant architecture component traits in Chinese upland cotton were identified via GWAS. Four peak SNP loci located on chromosome D03 were simultaneously associated with multiple plant architecture component traits. A candidate gene, Gh_D03G0922, might be responsible for plant height in upland cotton. A compact plant architecture is increasingly required for mechanized harvesting in China. Cotton plant architecture is therefore an important trait, and its components, such as plant height, fruit branch length and fruit branch angle, affect the suitability of a cultivar for mechanized harvesting. To determine the genetic basis of cotton plant architecture, a genome-wide association study (GWAS) was performed using a panel of 355 accessions and 93,250 single nucleotide polymorphisms (SNPs) identified by specific-locus amplified fragment sequencing. Thirty significant associations between 22 SNPs and five plant architecture component traits were identified via GWAS. Most importantly, four peak SNP loci located on chromosome D03 were simultaneously associated with multiple plant architecture component traits, and these SNPs lie within a single linkage disequilibrium block. Furthermore, 21 candidate genes for plant architecture were predicted in a 0.95-Mb region that includes the four peak SNPs. One of these genes, Gh_D03G0922, lies 8.40 kb from the significant SNP D03_31584163, and its Arabidopsis homologs contain MADS-box domains that might be involved in plant growth and development. qRT-PCR showed that Gh_D03G0922 expression was upregulated in the apical buds and young leaves of short, compact cotton varieties, and virus-induced gene silencing (VIGS) showed that silenced plants exhibited increased plant height (PH). These results indicate that Gh_D03G0922 is likely the candidate gene for PH in cotton.
The genetic variations and candidate genes identified in this study lay a foundation

  17. Leishmania-specific surface antigens show sub-genus sequence variation and immune recognition.

    Directory of Open Access Journals (Sweden)

    Daniel P Depledge

    2010-09-01

    Full Text Available A family of hydrophilic acylated surface (HASP) proteins, containing extensive and variant amino acid repeats, is expressed at the plasma membrane in infective extracellular (metacyclic) and intracellular (amastigote) stages of Old World Leishmania species. While HASPs are antigenic in the host and can induce protective immune responses, the biological functions of these Leishmania-specific proteins remain unresolved. Previous genome analysis has suggested that parasites of the sub-genus Leishmania (Viannia) have lost HASP genes from their genomes. We have used molecular and cellular methods to analyse HASP expression in New World Leishmania mexicana complex species and show that, unlike in L. major, these proteins are expressed predominantly following differentiation into amastigotes within macrophages. Further genome analysis has revealed that the L. (Viannia) species L. (V.) braziliensis does express HASP-like proteins of low amino acid similarity but with similar biochemical characteristics, from genes present on a region of chromosome 23 that is syntenic with the HASP/SHERP locus in Old World Leishmania species and the L. (L.) mexicana complex. A related gene is also present in Leptomonas seymouri, and this may represent the ancestral copy of these Leishmania-genus-specific sequences. The L. braziliensis HASP-like proteins (named the orthologous (o) HASPs) are predominantly expressed on the plasma membrane in amastigotes and are recognised by immune sera taken from 4 out of 6 leishmaniasis patients tested in an endemic region of Brazil. Analysis of the repetitive domains of the oHASPs has shown considerable genetic variation in parasite isolates taken from the same patients, suggesting that antigenic change may play a role in immune recognition of this protein family. These findings confirm that antigenic hydrophilic acylated proteins are expressed from genes in the same chromosomal region in species across the genus Leishmania. These proteins are

  18. Allele Re-sequencing Technologies

    DEFF Research Database (Denmark)

    Byrne, Stephen; Farrell, Jacqueline Danielle; Asp, Torben

    2013-01-01

    The development of next-generation sequencing technologies has made sequencing an affordable approach for the detection of genetic variations associated with various traits. However, the cost of whole genome re-sequencing remains too high to be feasible for many plant species with large... alternative to whole genome re-sequencing to identify causative genetic variations in plants. One challenge, however, will be efficient bioinformatics strategies for handling and analysing the increasing amount of sequence information...

  19. Novel Genetic Variants of Sporadic Atrial Septal Defect (ASD) in a Chinese Population Identified by Whole-Exome Sequencing (WES).

    Science.gov (United States)

    Liu, Yong; Cao, Yu; Li, Yaxiong; Lei, Dongyun; Li, Lin; Hou, Zong Liu; Han, Shen; Meng, Mingyao; Shi, Jianlin; Zhang, Yayong; Wang, Yi; Niu, Zhaoyi; Xie, Yanhua; Xiao, Benshan; Wang, Yuanfei; Li, Xiao; Yang, Lirong; Wang, Wenju; Jiang, Lihong

    2018-03-05

    BACKGROUND Recently, mutations in several genes have been reported to be associated with sporadic ASD, but some genetic variants remain to be identified. The aim of this study was to use whole-exome sequencing (WES) combined with bioinformatics analysis to identify novel genetic variants in cases of sporadic congenital ASD, followed by validation by Sanger sequencing. MATERIAL AND METHODS Five Han patients with secundum ASD were recruited, and their tissue samples were analyzed by WES, followed by verification by Sanger sequencing of tissue and blood samples. Further evaluation using blood samples included 452 additional patients with sporadic secundum ASD (212 male and 240 female patients) and 519 healthy subjects (252 male and 267 female subjects), analyzed with a multiplexed MassARRAY system. Bioinformatic analyses were performed to identify novel genetic variants associated with sporadic ASD. RESULTS From five patients with sporadic ASD, a total of 181,762 genomic variants in 33 exon loci, validated by Sanger sequencing, were selected and underwent MassARRAY analysis in 452 patients with ASD and 519 healthy subjects. Three loci with high mutation frequencies, the 138665410 FOXL2 gene variant, the 23862952 MYH6 gene variant, and the 71098693 HYDIN gene variant, were found to be significantly associated with sporadic ASD (PASD (PASD, and supported the use of WES and bioinformatics analysis to identify disease-associated mutations.

  20. Use of next-generation sequencing to detect LDLR gene copy number variation in familial hypercholesterolemia.

    Science.gov (United States)

    Iacocca, Michael A; Wang, Jian; Dron, Jacqueline S; Robinson, John F; McIntyre, Adam D; Cao, Henian; Hegele, Robert A

    2017-11-01

    Familial hypercholesterolemia (FH) is a heritable condition of severely elevated LDL cholesterol, caused predominantly by autosomal codominant mutations in the LDL receptor gene (LDLR). In providing a molecular diagnosis for FH, the current procedure often includes targeted next-generation sequencing (NGS) panels for the detection of small-scale DNA variants, followed by multiplex ligation-dependent probe amplification (MLPA) in LDLR for the detection of whole-exon copy number variants (CNVs). The latter is essential because ∼10% of FH cases are attributed to CNVs in LDLR; accounting for them decreases false negative findings. Here, we determined the potential of replacing MLPA with bioinformatic analysis applied to NGS data, which uses depth-of-coverage analysis as its principal method to identify whole-exon CNV events. In analysis of 388 FH patient samples, there was 100% concordance in LDLR CNV detection between these two methods: 38 reported CNVs identified by MLPA were also successfully detected by our NGS method, while 350 samples negative for CNVs by MLPA were also negative by NGS. This result suggests that MLPA can be removed from the routine diagnostic screening for FH, significantly reducing associated costs, resources, and analysis time, while promoting more widespread assessment of this important class of mutations across diagnostic laboratories. Copyright © 2017 by the American Society for Biochemistry and Molecular Biology, Inc.
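    Depth-of-coverage CNV detection, as described above, generally compares each exon's normalized read depth with that of CNV-negative reference samples: for a diploid gene, a ratio near 0.5 suggests a heterozygous deletion and a ratio near 1.5 a single-copy duplication. The Python sketch below is not the authors' pipeline; the function names, the normalization by total depth over the supplied exons, and the 0.7/1.3 cutoffs are illustrative assumptions.

```python
from statistics import median
from typing import Dict, List


def exon_depth_ratios(sample: Dict[str, float],
                      references: List[Dict[str, float]]) -> Dict[str, float]:
    """Per-exon depth ratio of a test sample versus CNV-negative references."""
    def normalize(depths: Dict[str, float]) -> Dict[str, float]:
        # Divide by total depth so overall sequencing yield is not mistaken for copy number.
        total = sum(depths.values())
        return {exon: d / total for exon, d in depths.items()}

    norm_sample = normalize(sample)
    norm_refs = [normalize(r) for r in references]
    # Compare each exon with the median normalized depth across the reference panel.
    return {exon: value / median([r[exon] for r in norm_refs])
            for exon, value in norm_sample.items()}


def call_exon_cnvs(ratios: Dict[str, float],
                   del_cutoff: float = 0.7, dup_cutoff: float = 1.3) -> Dict[str, str]:
    """Label exons whose ratio falls outside illustrative (assumed) cutoffs."""
    calls = {}
    for exon, ratio in ratios.items():
        if ratio < del_cutoff:
            calls[exon] = "deletion"      # ~0.5 expected for a heterozygous loss
        elif ratio > dup_cutoff:
            calls[exon] = "duplication"   # ~1.5 expected for a single-copy gain
    return calls


refs = [{"exon1": 100, "exon2": 100, "exon3": 100, "exon4": 100} for _ in range(3)]
patient = {"exon1": 100, "exon2": 50, "exon3": 100, "exon4": 100}
print(call_exon_cnvs(exon_depth_ratios(patient, refs)))  # {'exon2': 'deletion'}
```

    Normalizing over only a handful of exons lets a deletion slightly inflate the other exons' ratios (here 8/7 instead of 1); a real panel would normalize over many targets so that a few deleted exons barely shift the denominator.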

  1. Association Mapping and Nucleotide Sequence Variation in Five Drought Tolerance Candidate Genes in Spring Wheat

    Directory of Open Access Journals (Sweden)

    Erena A. Edae

    2013-07-01

    Full Text Available Functional markers are needed for key genes involved in drought tolerance to improve selection for crop yield under moisture stress conditions. The objectives of this study were to (i) characterize five drought tolerance candidate genes, namely dehydration responsive element binding 1A (, enhanced response to abscisic acid ( and , and fructan 1-exohydrolase ( and , in wheat (Triticum aestivum L.) for nucleotide and haplotype diversity, Tajima’s D value, and linkage disequilibrium (LD), and (ii) associate within-gene single nucleotide polymorphisms (SNPs) with phenotypic traits in a spring wheat association mapping panel (n = 126). Field trials were grown under contrasting moisture regimes in Greeley, CO, and Melkassa, Ethiopia, in 2010 and 2011. Genome-specific amplification and DNA sequence analysis of the genes identified SNPs and revealed differences in nucleotide and haplotype diversity, Tajima’s D, and patterns of LD. showed associations (false discovery rate adjusted probability value = 0.1) with normalized difference vegetation index, heading date, biomass, and spikelet number. Both and were associated with harvest index, flag leaf width, and leaf senescence. was associated with grain yield, and was associated with thousand kernel weight and test weight. If validated in relevant genetic backgrounds, the identified marker–trait associations may be applied to functional marker-assisted selection.

  2. Unique features of a global human ectoparasite identified through sequencing of the bed bug genome.

    Science.gov (United States)

    Benoit, Joshua B; Adelman, Zach N; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C; Szuter, Elise M; Hagan, Richard W; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M; Nelson, David R; Rosendale, Andrew J; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R; Ioannidis, Panagiotis; Waterhouse, Robert M; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J Spencer; Gondhalekar, Ameya D; Scharf, Michael E; Peterson, Brittany F; Raje, Kapil R; Hottel, Benjamin A; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S T; Duncan, Elizabeth J; Murali, Shwetha C; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C; Muzny, Donna M; Wheeler, David; Panfilio, Kristen A; Vargas Jentzsch, Iris M; Vargo, Edward L; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T; Anderson, Michelle A E; Jones, Jeffery W; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D; Attardo, Geoffrey M; Robertson, Hugh M; Zdobnov, Evgeny M; Ribeiro, Jose M C; Gibbs, Richard A; Werren, John H; Palli, Subba R; Schal, Coby; Richards, Stephen

    2016-02-02

    The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host-symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human-bed bug and symbiont-bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite.

  3. Unique features of a global human ectoparasite identified through sequencing of the bed bug genome

    Science.gov (United States)

    Benoit, Joshua B.; Adelman, Zach N.; Reinhardt, Klaus; Dolan, Amanda; Poelchau, Monica; Jennings, Emily C.; Szuter, Elise M.; Hagan, Richard W.; Gujar, Hemant; Shukla, Jayendra Nath; Zhu, Fang; Mohan, M.; Nelson, David R.; Rosendale, Andrew J.; Derst, Christian; Resnik, Valentina; Wernig, Sebastian; Menegazzi, Pamela; Wegener, Christian; Peschel, Nicolai; Hendershot, Jacob M.; Blenau, Wolfgang; Predel, Reinhard; Johnston, Paul R.; Ioannidis, Panagiotis; Waterhouse, Robert M.; Nauen, Ralf; Schorn, Corinna; Ott, Mark-Christoph; Maiwald, Frank; Johnston, J. Spencer; Gondhalekar, Ameya D.; Scharf, Michael E.; Peterson, Brittany F.; Raje, Kapil R.; Hottel, Benjamin A.; Armisén, David; Crumière, Antonin Jean Johan; Refki, Peter Nagui; Santos, Maria Emilia; Sghaier, Essia; Viala, Sèverine; Khila, Abderrahman; Ahn, Seung-Joon; Childers, Christopher; Lee, Chien-Yueh; Lin, Han; Hughes, Daniel S. T.; Duncan, Elizabeth J.; Murali, Shwetha C.; Qu, Jiaxin; Dugan, Shannon; Lee, Sandra L.; Chao, Hsu; Dinh, Huyen; Han, Yi; Doddapaneni, Harshavardhan; Worley, Kim C.; Muzny, Donna M.; Wheeler, David; Panfilio, Kristen A.; Vargas Jentzsch, Iris M.; Vargo, Edward L.; Booth, Warren; Friedrich, Markus; Weirauch, Matthew T.; Anderson, Michelle A. E.; Jones, Jeffery W.; Mittapalli, Omprakash; Zhao, Chaoyang; Zhou, Jing-Jiang; Evans, Jay D.; Attardo, Geoffrey M.; Robertson, Hugh M.; Zdobnov, Evgeny M.; Ribeiro, Jose M. C.; Gibbs, Richard A.; Werren, John H.; Palli, Subba R.; Schal, Coby; Richards, Stephen

    2016-01-01

    The bed bug, Cimex lectularius, has re-established itself as a ubiquitous human ectoparasite throughout much of the world during the past two decades. This global resurgence is likely linked to increased international travel and commerce in addition to widespread insecticide resistance. Analyses of the C. lectularius sequenced genome (650 Mb) and 14,220 predicted protein-coding genes provide a comprehensive representation of genes that are linked to traumatic insemination, a reduced chemosensory repertoire of genes related to obligate hematophagy, host–symbiont interactions, and several mechanisms of insecticide resistance. In addition, we document the presence of multiple putative lateral gene transfer events. Genome sequencing and annotation establish a solid foundation for future research on mechanisms of insecticide resistance, human–bed bug and symbiont–bed bug associations, and unique features of bed bug biology that contribute to the unprecedented success of C. lectularius as a human ectoparasite. PMID:26836814

  4. The genome sequence of the emerging common midwife toad virus identifies an evolutionary intermediate within ranaviruses.

    Science.gov (United States)

    Mavian, Carla; López-Bueno, Alberto; Balseiro, Ana; Casais, Rosa; Alcamí, Antonio; Alejo, Alí

    2012-04-01

    Worldwide amphibian population declines have been ascribed to global warming, increasing pollution levels, and other factors directly related to human activities. These factors may additionally be favoring the emergence of novel pathogens. In this report, we have determined the complete genome sequence of the emerging common midwife toad ranavirus (CMTV), which has caused fatal disease in several amphibian species across Europe. Phylogenetic and gene content analyses of the first complete genomic sequence from a ranavirus isolated in Europe show that CMTV is an amphibian-like ranavirus (ALRV). However, the CMTV genome structure is novel and represents an intermediate evolutionary stage between the two previously described ALRV groups. We find that CMTV clusters with several other ranaviruses isolated from different hosts and locations which might also be included in this novel ranavirus group. This work sheds light on the phylogenetic relationships within this complex group of emerging, disease-causing viruses.

  5. High throughput sequencing identifies chilling responsive genes in sweetpotato (Ipomoea batatas Lam.) during storage.

    Science.gov (United States)

    Xie, Zeyi; Zhou, Zhilin; Li, Hongmin; Yu, Jingjing; Jiang, Jiaojiao; Tang, Zhonghou; Ma, Daifu; Zhang, Baohong; Han, Yonghua; Li, Zongyun

    2018-05-21

    Sweetpotato (Ipomoea batatas L.) is a globally important economic food crop. It belongs to the Convolvulaceae family and originated in the tropics, and it is sensitive to cold stress during storage. In this study, we performed transcriptome sequencing to investigate the sweetpotato response to chilling stress during storage. A total of 110,110 unigenes were generated via high-throughput sequencing. Differentially expressed gene (DEG) analysis showed that 18,681 genes were up-regulated and 21,983 genes were down-regulated under low-temperature conditions. Many DEGs were related to the cell membrane system, antioxidant enzymes, carbohydrate metabolism, and hormone metabolism, which are potentially associated with sweetpotato resistance to low temperature. These DEGs suggest a molecular basis for the biochemical and physiological responses of sweetpotato to low-temperature storage. Our analysis will provide new targets for enhancing sweetpotato cold stress tolerance in postharvest storage through genetic manipulation. Copyright © 2018. Published by Elsevier Inc.

  6. Epitope Sequences in Dengue Virus NS1 Protein Identified by Monoclonal Antibodies

    Directory of Open Access Journals (Sweden)

    Leticia Barboza Rocha

    2017-10-01

    Full Text Available Dengue nonstructural protein 1 (NS1) is a multi-functional glycoprotein with essential functions in both viral replication and modulation of host innate immune responses. NS1 has been established as a good surrogate marker for infection. In the present study, we generated four anti-NS1 monoclonal antibodies (mAbs) against recombinant NS1 protein from dengue virus serotype 2 (DENV2), which were used to map three NS1 epitopes. The sequence 193AVHADMGYWIESALNDT209 was recognized by mAbs 2H5 and 4H1BC, which also cross-reacted with Zika virus (ZIKV) protein. In contrast, the sequence 25VHTWTEQYKFQPES38 was recognized by mAb 4F6, which did not cross-react with ZIKV. Lastly, a previously unidentified DENV2 NS1-specific epitope, represented by the sequence 127ELHNQTFLIDGPETAEC143, is described in the present study after reaction with mAb 4H2, which also did not cross-react with ZIKV. The selection and characterization of these epitopes, together with the specificity of the anti-NS1 mAbs, may contribute to the development of diagnostic tools able to differentiate DENV and ZIKV infections.

  7. MicroRNA repertoire for functional genome research in tilapia identified by deep sequencing.

    Science.gov (United States)

    Yan, Biao; Wang, Zhen-Hua; Zhu, Chang-Dong; Guo, Jin-Tao; Zhao, Jin-Liang

    2014-08-01

    The Nile tilapia (Oreochromis niloticus; Cichlidae) is an economically important species that occupies a prominent position in the aquaculture industry. MicroRNAs (miRNAs) are a class of noncoding RNAs that post-transcriptionally regulate gene expression involved in diverse biological and metabolic processes. To increase the repertoire of miRNAs characterized in tilapia, we used Illumina/Solexa sequencing technology to sequence a small RNA library prepared from a pooled RNA sample isolated from different developmental stages of tilapia. Bioinformatic analyses suggest that 197 conserved and 27 novel miRNAs are expressed in tilapia. Sequence alignments indicate that all tested miRNAs and miRNAs* are highly conserved across many species. In addition, we characterized the tissue expression patterns of five miRNAs using real-time quantitative PCR. We found that miR-1/206, miR-7/9, and miR-122 are abundantly expressed in muscle, brain, and liver, respectively, implying a potential role in the regulation of tissue differentiation or the maintenance of tissue identity. Overall, our results expand the number of known tilapia miRNAs, and their discovery in the tilapia genome contributes to a better understanding of the role of miRNAs in regulating diverse biological processes.

  8. Leaf Transcriptome Sequencing for Identifying Genic-SSR Markers and SNP Heterozygosity in Crossbred Mango Variety 'Amrapali' (Mangifera indica L.).

    Science.gov (United States)

    Mahato, Ajay Kumar; Sharma, Nimisha; Singh, Akshay; Srivastav, Manish; Jaiprakash; Singh, Sanjay Kumar; Singh, Anand Kumar; Sharma, Tilak Raj; Singh, Nagendra Kumar

    2016-01-01

    Mango (Mangifera indica L.) is called the "king of fruits" due to its sweetness, richness of taste, diversity, large production volume and variety of end uses. Despite its huge economic importance, genomic resources in mango are scarce and the genetics of useful horticultural traits are poorly understood. Here we generated deep-coverage leaf RNA sequence data for the mango parental varieties 'Neelam' and 'Dashehari' and their hybrid 'Amrapali' using next-generation sequencing technologies. De novo sequence assembly generated 27,528, 20,771 and 35,182 transcripts for the three genotypes, respectively. The transcripts were further assembled into a non-redundant set of 70,057 unigenes that were used for SSR and SNP identification and annotation. In total, 5,465 SSR loci were identified in 4,912 unigenes, including 288 type I SSRs (n ≥ 20 bp). One hundred type I SSR markers were randomly selected, of which 43 yielded PCR amplicons of the expected size in the first round of validation and were designated as validated genic-SSR markers. Further, 22,306 SNPs were identified by aligning high-quality sequence reads of the three mango varieties to the reference unigene set, revealing significantly enhanced SNP heterozygosity in the hybrid Amrapali. This leaf RNA sequencing study of mango varieties and their hybrid provides a useful genomic resource for the genetic improvement of mango.
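    The "type I SSR (n ≥ 20 bp)" category above refers to perfect microsatellites whose repeat tract is at least 20 bp long. The Python sketch below shows one simple way to find perfect SSRs with motifs of 1 to 6 bp and flag type I tracts; the minimum repeat counts per motif length are MISA-style values assumed for illustration, since the abstract does not state the search parameters the authors used.

```python
import re

# Assumed minimum repeat counts per motif length (MISA-style settings);
# not taken from the study, whose search parameters are not given.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    return re.fullmatch(r"(.+?)\1+", motif) is None


def find_ssrs(seq: str):
    """Return sorted (start, end, motif, length_bp, is_type_i) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-bp motif followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if not is_primitive(motif):
                continue  # e.g. 'AGAG' tracts are already reported as 'AG'
            length = m.end() - m.start()
            hits.append((m.start(), m.end(), motif, length, length >= 20))
    return sorted(hits)


example = "GGC" + "AG" * 12 + "TTC" + "ATG" * 5 + "C"
for hit in find_ssrs(example):
    print(hit)
```

    Here the 24-bp (AG)12 tract is flagged as type I and the 15-bp (ATG)5 tract is not. Compound and imperfect repeats are not merged by this sketch; dedicated SSR-mining tools handle those cases.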

  9. High throughput sequencing and proteomics to identify immunogenic proteins of a new pathogen: the dirty genome approach.

    Directory of Open Access Journals (Sweden)

    Gilbert Greub

    Full Text Available BACKGROUND: With the availability of next-generation sequencing technologies, bacterial genome projects have received a major boost. Still, chromosome completion requires costly and time-consuming gap closure, especially when the chromosome contains highly repetitive elements. However, incomplete genome data may be sufficiently informative to derive the information sought. For emerging pathogens, i.e. newly identified pathogens, withholding genome data during the gap-closure stage is clearly medically counterproductive. METHODS/PRINCIPAL FINDINGS: We thus investigated the feasibility of a dirty genome approach, i.e. the release of unfinished genome sequences to develop serological diagnostic tools. We showed that almost the whole genome sequence of the emerging pathogen Parachlamydia acanthamoebae was retrieved, even with relatively short reads from Genome Sequencer 20 and Solexa. The bacterial proteome was analyzed to select immunogenic proteins, which were then expressed and used to develop the first steps of an ELISA. CONCLUSIONS/SIGNIFICANCE: This work constitutes the proof of principle for a dirty genome approach, i.e. the use of unfinished genome sequences of pathogenic bacteria, coupled with proteomics to rapidly identify new immunogenic proteins useful for developing specific diagnostic tests such as ELISA, immunohistochemistry and direct antigen detection. Although applied here to an emerging pathogen, this combined dirty genome sequencing/proteomic approach may be used for any pathogen for which better diagnostics are needed. These genome sequences may also be very useful for developing DNA-based diagnostic tests. All these diagnostic tools will allow further evaluation of the pathogenic potential of this obligate intracellular bacterium.

  10. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome.

    Directory of Open Access Journals (Sweden)

    Kei-ichi Morita

    Full Text Available Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and is caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations often cannot be identified in some patients; the failure to detect mutations is presumably because the causative alterations lie in other genes or outside the analyzed regions of PTCH1, or are copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through targeted resequencing of all coding exons, together with simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions.

  11. Simultaneous Detection of Both Single Nucleotide Variations and Copy Number Alterations by Next-Generation Sequencing in Gorlin Syndrome.

    Science.gov (United States)

    Morita, Kei-ichi; Naruto, Takuya; Tanimoto, Kousuke; Yasukawa, Chisato; Oikawa, Yu; Masuda, Kiyoshi; Imoto, Issei; Inazawa, Johji; Omura, Ken; Harada, Hiroyuki

    2015-01-01

    Gorlin syndrome (GS) is an autosomal dominant disorder that predisposes affected individuals to developmental defects and tumorigenesis, and is caused mainly by heterozygous germline PTCH1 mutations. Despite exhaustive analysis, PTCH1 mutations often cannot be identified in some patients; the failure to detect mutations is presumably because the causative alterations lie in other genes or outside the analyzed regions of PTCH1, or are copy number alterations (CNAs). In this study, we subjected a cohort of GS-affected individuals from six unrelated families to next-generation sequencing (NGS) analysis for the combined screening of causative alterations in Hedgehog signaling pathway-related genes. Specific single nucleotide variations (SNVs) of PTCH1 causing inferred amino acid changes were identified in four families (seven affected individuals), whereas CNAs within or around PTCH1 were found in two families in whom possible causative SNVs were not detected. Through targeted resequencing of all coding exons, together with simultaneous evaluation of copy number status using the alignment map files obtained via NGS, we found that GS phenotypes could be explained by PTCH1 mutations or deletions in all affected patients. Because it is advisable to evaluate CNAs of candidate causative genes in point mutation-negative cases, NGS methodology appears useful for improving molecular diagnosis through the simultaneous detection of both SNVs and CNAs in the targeted genes/regions.

  12. The use of high-throughput DNA sequencing in the investigation of antigenic variation: application to Neisseria species.

    Directory of Open Access Journals (Sweden)

    John K Davies

    Full Text Available Antigenic variation occurs in a broad range of species. This process resembles gene conversion in that variant DNA is unidirectionally transferred from partial gene copies (or silent loci) into an expression locus. Previous studies of antigenic variation have involved the amplification and sequencing of individual genes from hundreds of colonies. Using the pilE gene from Neisseria gonorrhoeae, we have demonstrated that it is possible to use PCR amplification, followed by high-throughput DNA sequencing and a novel assembly process, to detect individual antigenic variation events. The ability to detect these events was much greater than has previously been possible. In N. gonorrhoeae, most silent loci contain multiple partial gene copies. Here we show that there is a bias towards using the copy at the 3' end of the silent loci (copy 1) as the donor sequence. The pilE gene of N. gonorrhoeae and some strains of Neisseria meningitidis encode class I pilin, but strains of N. meningitidis from clonal complexes 8 and 11 encode a class II pilin. We have confirmed that the class II pili of meningococcal strain FAM18 (clonal complex 11) are non-variable, and this is also true for the class II pili of strain NMB from clonal complex 8. In addition, when a gene encoding class I pilin was moved into the meningococcal strain NMB background, there was no evidence of antigenic variation. Finally, we investigated several members of the opa gene family of N. gonorrhoeae, where it has been suggested that limited variation occurs. Variation was detected in the opaK gene, which is located close to pilE, but not in the opaJ gene, located elsewhere in the genome. The approach described here promises to dramatically improve studies of the extent and nature of antigenic variation systems in a variety of species.

  13. Understanding human genetic variation in the era of high-throughput sequencing

    OpenAIRE

    Knight, Julian C.

    2010-01-01

    The EMBO/EMBL symposium ‘Human Variation: Cause and Consequence’ highlighted advances in understanding the molecular basis of human genetic variation and its myriad implications for biology, human origins and disease.

  14. Identifying the factors influencing practice variation in thrombosis medicine: A qualitative content analysis of published practice-pattern surveys.

    Science.gov (United States)

    Skeith, Leslie; Gonsalves, Carol

    2017-11-01

    Practice variation, the differences in clinical management between physicians, is one reason why patient outcomes may differ. Identifying factors that contribute to practice variation in areas of clinical uncertainty or equipoise may have implications for understanding and improving patient care. To discern what factors may influence practice variation, we completed a qualitative content analysis of all practice-pattern surveys in thrombosis medicine in the last 10 years. Out of 2117 articles screened using a systematic search strategy, 33 practice-pattern surveys met eligibility criteria. Themes were identified using constant comparative analysis of qualitative data. Practice variation was noted in all 33 practice-pattern surveys. Contributing factors to variation included lack of available evidence, lack of clear and specific guideline recommendations, past experience, patient context, institutional culture and the perceived risk and benefit of a particular treatment. Additional themes highlight the value placed on expertise in challenging clinical scenarios, the complexity of practice variation and the value placed on minimizing practice variation. Copyright © 2017 Elsevier Ltd. All rights reserved.

  15. Deep Sequencing of 71 Candidate Genes to Characterize Variation Associated with Alcohol Dependence.

    Science.gov (United States)

    Clark, Shaunna L; McClay, Joseph L; Adkins, Daniel E; Kumar, Gaurav; Aberg, Karolina A; Nerella, Srilaxmi; Xie, Linying; Collins, Ann L; Crowley, James J; Quackenbush, Corey R; Hilliard, Christopher E; Shabalin, Andrey A; Vrieze, Scott I; Peterson, Roseann E; Copeland, William E; Silberg, Judy L; McGue, Matt; Maes, Hermine; Iacono, William G; Sullivan, Patrick F; Costello, Elizabeth J; van den Oord, Edwin J

    2017-04-01

    Previous genomewide association studies (GWASs) have identified a number of putative risk loci for alcohol dependence (AD). However, only a few loci have replicated, and these replicated variants explain only a small proportion of AD risk. Using an innovative approach, the goal of this study was to generate hypotheses about potentially causal variants for AD that can be explored further through functional studies. We employed targeted capture of 71 candidate loci and flanking regions followed by next-generation deep sequencing (mean coverage 78X) in 806 European Americans. Regions included in our targeted capture library were genes identified through published GWAS of alcohol, all human alcohol and aldehyde dehydrogenases, reward system genes including dopaminergic and opioid receptors, prioritized candidate genes based on previous associations, and genes involved in the absorption, distribution, metabolism, and excretion of drugs. We performed single-locus tests to determine if any single variant was associated with AD symptom count. Sets of variants that overlapped with biologically meaningful annotations were tested for association in aggregate. No single, common variant was significantly associated with AD in our study. We did, however, find evidence for association with several variant sets. Two variant sets were significant at the q < 0.10 level: a genic enhancer for ADHFE1 (p = 1.47 × 10⁻⁵; q = 0.019), an alcohol dehydrogenase, and ADORA1 (p = 5.29 × 10⁻⁵; q = 0.035), an adenosine receptor that belongs to a G-protein-coupled receptor gene family. To our knowledge, this is the first sequencing study of AD to examine variants in entire genes, including flanking and regulatory regions. We found that in addition to protein coding variant sets, regulatory variant sets may play a role in AD. From these findings, we have generated initial functional hypotheses about how these sets may influence AD. Copyright © 2017 by the Research Society on
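    The abstract pairs each variant-set p-value with a q-value. As a hedged illustration of how q-values relate to p-values, the sketch below computes Benjamini-Hochberg adjusted p-values, one common way of obtaining q-values. The abstract does not say which procedure the study used (Storey's q-value method is another common choice), and the function name `bh_qvalues` is hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (one common definition of q-values).

    Each p-value is scaled by m / rank, then a running minimum is taken from the
    largest p-value downwards so the adjusted values stay monotone and capped at 1.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices, smallest p first
    q = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest p-value to the smallest
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        q[i] = running_min
    return q


# Four hypothetical variant-set p-values.
print(bh_qvalues([0.01, 0.04, 0.03, 0.20]))
```

    A set is then declared significant at a chosen false discovery rate (for example q < 0.10) when its adjusted value falls below that threshold.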

  16. Whole-exome sequencing identifies USH2A mutations in a pseudo-dominant Usher syndrome family.

    Science.gov (United States)

    Zheng, Sui-Lian; Zhang, Hong-Liang; Lin, Zhen-Lang; Kang, Qian-Yan

    2015-10-01

    Usher syndrome (USH) is an autosomal recessive (AR) multi-sensory degenerative disorder leading to deaf-blindness. USH is clinically subdivided into three subclasses, and 10 genes have been identified thus far. Clinical and genetic heterogeneity in USH makes a precise diagnosis difficult. A family showing dominant-like inheritance of USH across successive generations was identified, and the present study aimed to determine the genetic predisposition in this family. Whole-exome sequencing was performed in two affected patients and an unaffected relative. The data were analyzed bioinformatically, with step-wise filtering to narrow down the candidate mutations. Direct Sanger sequencing and co-segregation analysis were performed in the pedigree. One novel and two known mutations in the USH2A gene were identified and further confirmed by direct sequencing and co-segregation analysis. The affected mother carried compound mutations in the USH2A gene, while the unaffected father carried a heterozygous mutation. The present study demonstrates that whole-exome sequencing is a robust approach for the molecular diagnosis of disorders with high levels of genetic heterogeneity.

  17. Recurrent chimeric RNAs enriched in human prostate cancer identified by deep sequencing

    Science.gov (United States)

    Kannan, Kalpana; Wang, Liguo; Wang, Jianghua; Ittmann, Michael M.; Li, Wei; Yen, Laising

    2011-01-01

    Transcription-induced chimeric RNAs, possessing sequences from different genes, are expected to increase the proteomic diversity through chimeric proteins or altered regulation. Despite their importance, few studies have focused on chimeric RNAs, especially regarding their presence and roles in human cancers. By deep sequencing the transcriptome of 20 human prostate cancer and 10 matched benign prostate tissues, we obtained 1.3 billion sequence reads, which led to the identification of 2,369 chimeric RNA candidates. Chimeric RNAs occurred at significantly higher frequency in cancer than in matched benign samples. Experimental investigation of a selected set of 46 candidates led to the confirmation of 32 chimeric RNAs, of which 27 were highly recurrent and previously undescribed in prostate cancer. Importantly, a subset of these chimeras was present in prostate cancer cell lines, but not detectable in primary human prostate epithelium cells, implying their associations with cancer. These chimeras contain discernible 5′ and 3′ splice sites at the RNA junction, indicating that their formation is mediated by splicing. Their presence is also largely independent of the expression of parental genes, suggesting that other factors are involved in their production and regulation. One chimera, TMEM79-SMG5, is highly differentially expressed in human cancer samples and is therefore a potential biomarker. The prevalence of chimeric RNAs may allow the limited number of human genes to encode a substantially larger number of RNAs and proteins, forming an additional layer of cellular complexity. Together, our results suggest that chimeric RNAs are widespread, and increased chimeric RNA events could represent a unique class of molecular alteration in cancer. PMID:21571633

  18. TAPDANCE: An automated tool to identify and annotate transposon insertion CISs and associations between CISs from next generation sequence data

    Directory of Open Access Journals (Sweden)

    Sarver Aaron L

    2012-06-01

    BACKGROUND: Next generation sequencing approaches applied to the analysis of transposon insertion junction fragments generated in high-throughput forward genetic screens have created the need for clear informatics and statistical approaches to deal with the massive amount of data currently being generated. Previous approaches used to (1) map junction fragments within the genome and (2) identify Common Insertion Sites (CISs) within the genome are not practical due to the volume of data generated by current sequencing technologies. Previous approaches applied to this problem also required significant manual annotation. RESULTS: We describe the Transposon Annotation Poisson Distribution Association Network Connectivity Environment (TAPDANCE) software, which automates the identification of CISs within transposon junction fragment insertion data. Starting with barcoded sequence data, the software identifies and trims sequences and maps putative genomic sequence to a reference genome using the bowtie short read mapper. Poisson distribution statistics are then applied to assess and rank genomic regions showing significant enrichment for transposon insertion. Novel methods of counting insertions are used to ensure that the results presented have the expected characteristics of informative CISs. A persistent MySQL database is generated and used to keep track of sequences, mappings and common insertion sites. Additionally, associations between phenotypes and CISs are identified using Fisher's exact test with multiple testing correction. In a case study using previously published data, we show that the TAPDANCE software identifies CISs as previously described, prioritizes them based on p-value, allows holistic visualization of the data within genome browser software and identifies relationships present in the structure of the data. CONCLUSIONS: The TAPDANCE process is fully automated, performs similarly to previous labor intensive approaches
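    TAPDANCE is described as ranking genomic regions for insertion enrichment using Poisson statistics. The following is a minimal sketch of that general idea, not TAPDANCE's actual code: it assumes a null model in which insertions land uniformly across the genome, scores a single fixed window, and uses the hypothetical helpers `poisson_sf` and `window_enrichment`.

```python
import math


def poisson_sf(k, lam):
    """P(X >= k) for X ~ Poisson(lam), computed as 1 - P(X <= k - 1)."""
    if k <= 0:
        return 1.0
    term = math.exp(-lam)  # P(X = 0)
    cdf = term
    for i in range(1, k):
        term *= lam / i  # P(X = i) derived from P(X = i - 1)
        cdf += term
    return max(0.0, 1.0 - cdf)


def window_enrichment(insertion_positions, window_start, window_end, genome_size):
    """Hypothetical helper: observed vs expected insertions in [window_start, window_end).

    Assumed null model: every insertion is equally likely to land anywhere in the genome,
    so the expected count in the window is proportional to its width.
    """
    n_total = len(insertion_positions)
    width = window_end - window_start
    observed = sum(window_start <= p < window_end for p in insertion_positions)
    expected = n_total * width / genome_size
    return observed, expected, poisson_sf(observed, expected)


# 100 hypothetical insertions in a 1 Mb genome; 5 of them fall in one 10 kb window.
positions = [200_000 + 1_000 * i for i in range(5)] + [500_000 + 100 * i for i in range(95)]
print(window_enrichment(positions, 200_000, 210_000, 1_000_000))
```

    With 5 insertions observed where 1 is expected, the upper-tail probability is about 0.0037. For very large expected counts `math.exp(-lam)` underflows, so a library routine such as `scipy.stats.poisson.sf` would be the safer choice, and scoring many windows would also call for multiple-testing correction.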

  19. Exome sequencing identifies highly recurrent MED12 somatic mutations in breast fibroadenoma.

    Science.gov (United States)

    Lim, Weng Khong; Ong, Choon Kiat; Tan, Jing; Thike, Aye Aye; Ng, Cedric Chuan Young; Rajasegaran, Vikneswari; Myint, Swe Swe; Nagarajan, Sanjanaa; Nasir, Nur Diyana Md; McPherson, John R; Cutcutache, Ioana; Poore, Gregory; Tay, Su Ting; Ooi, Wei Siong; Tan, Veronique Kiak Mien; Hartman, Mikael; Ong, Kong Wee; Tan, Benita K T; Rozen, Steven G; Tan, Puay Hoon; Tan, Patrick; Teh, Bin Tean

    2014-08-01

    Fibroadenomas are the most common breast tumors in women under 30 (refs. 1,2). Exome sequencing of eight fibroadenomas with matching whole-blood samples revealed recurrent somatic mutations solely in MED12, which encodes a Mediator complex subunit. Targeted sequencing of an additional 90 fibroadenomas confirmed highly frequent MED12 exon 2 mutations (58/98, 59%) that are probably somatic, with 71% of mutations occurring in codon 44. Using laser capture microdissection, we show that MED12 fibroadenoma mutations are present in stromal but not epithelial mammary cells. Expression profiling of MED12-mutated and wild-type fibroadenomas revealed that MED12 mutations are associated with dysregulated estrogen signaling and extracellular matrix organization. The fibroadenoma MED12 mutation spectrum is nearly identical to that of previously reported MED12 lesions in uterine leiomyoma but not those of other tumors. Benign tumors of the breast and uterus, both of which are key target tissues of estrogen, may thus share a common genetic basis underpinned by highly frequent and specific MED12 mutations.

  20. Multiplexed resequencing analysis to identify rare variants in pooled DNA with barcode indexing using next-generation sequencer.

    Science.gov (United States)

    Mitsui, Jun; Fukuda, Yoko; Azuma, Kyo; Tozaki, Hirokazu; Ishiura, Hiroyuki; Takahashi, Yuji; Goto, Jun; Tsuji, Shoji

    2010-07-01

    We have recently found that multiple rare variants of the glucocerebrosidase gene (GBA) confer a robust risk for Parkinson disease, supporting the 'common disease-multiple rare variants' hypothesis. To develop an efficient method of identifying rare variants in a large number of samples, we applied multiplexed resequencing using a next-generation sequencer to the identification of rare variants of GBA. Sixteen pooled DNA sets, each comprising six DNA samples, were prepared. Each pooled DNA set was subjected to polymerase chain reaction to amplify the target gene (GBA), covering 6.5 kb; the products were pooled into one tube with barcode indexing and then subjected to extensive sequence analysis using the SOLiD System. Individual samples were also subjected to direct nucleotide sequence analysis. With optimized data processing, we were able to extract all the variants from the 96 samples with acceptable rates of false-positive single-nucleotide variants.
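    Pooling makes a rare heterozygous variant appear at a low read fraction, which is why depth matters. The sketch below works through that arithmetic under stated assumptions: equimolar pooling, diploid samples, no sequencing error, and six individuals per pool (reading the 16 pools covering 96 samples as six per pool, which is an assumption). The function names are hypothetical and this is not the authors' data-processing pipeline.

```python
import math


def expected_alt_fraction(n_individuals, n_het_carriers=1, ploidy=2):
    """Expected alternate-allele read fraction in an equimolar pool (assumed)."""
    return n_het_carriers / (ploidy * n_individuals)


def prob_at_least_k_alt_reads(depth, alt_fraction, min_alt_reads):
    """Binomial P(alt reads >= min_alt_reads) at a given depth, ignoring sequencing error.

    Computed as 1 minus the probability of seeing fewer than min_alt_reads alt reads.
    """
    below = sum(
        math.comb(depth, k) * alt_fraction**k * (1 - alt_fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return max(0.0, 1.0 - below)


frac = expected_alt_fraction(6)  # one heterozygous carrier among six people: 1/12
for depth in (50, 100, 200):
    print(depth, prob_at_least_k_alt_reads(depth, frac, min_alt_reads=3))
```

    One heterozygous carrier among six pooled individuals contributes 1 of 12 alleles, an expected read fraction of about 8.3%, so a minimum-alt-read threshold that suppresses errors must be balanced against the depth needed to see such a variant reliably.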

  1. Characterizing novel endogenous retroviruses from genetic variation inferred from short sequence reads

    DEFF Research Database (Denmark)

    Mourier, Tobias; Mollerup, Sarah; Vinner, Lasse

    2015-01-01

    From Illumina sequencing of DNA from brain and liver tissue from the lion, Panthera leo, and tumor samples from the pike-perch, Sander lucioperca, we obtained two assembled sequence contigs with similarity to known retroviruses. Phylogenetic analyses suggest that the pike-perch retrovirus belongs to the epsilonretroviruses, and the lion retrovirus to the gammaretroviruses. To determine if these novel retroviral sequences originate from an endogenous retrovirus or from a recently integrated exogenous retrovirus, we assessed the genetic diversity of the parental sequences from which the short Illumina reads...

  2. Overview of errors in the reference sequence and annotation of Mycobacterium tuberculosis H37Rv, and variation amongst its isolates

    KAUST Repository

    Köser, Claudio U.

    2012-06-01

    Since its publication in 1998, the genome sequence of the Mycobacterium tuberculosis H37Rv laboratory strain has acted as the cornerstone for the study of tuberculosis. In this review we address some of the practical aspects that have come to light relating to the use of H37Rv throughout the past decade which are of relevance for the ongoing genomic and laboratory studies of this pathogen. These include errors in the genome reference sequence and its annotation, as well as the recently detected variation amongst isolates of H37Rv from different laboratories. © 2011 Elsevier B.V.

  3. Sequence variation in the alpha-toxin encoding plc gene of Clostridium perfringens strains isolated from diseased and healthy chickens

    DEFF Research Database (Denmark)

    Abildgaard, L; Engberg, RM; Pedersen, Karl

    2009-01-01

    The aim of the present study was to analyse the genetic diversity of the alpha-toxin encoding plc gene and the variation in α-toxin production of Clostridium perfringens type A strains isolated from presumably healthy chickens and chickens suffering from either necrotic enteritis (NE) or cholangio-hepatitis. The α-toxin encoding plc genes from 60 different pulsed-field gel electrophoresis (PFGE) types (strains) of C. perfringens were sequenced and translated in silico to amino acid sequences and the α-toxin production was investigated in batch cultures of 45 of the strains using an enzyme...

  4. Identifying an unknown function in a parabolic equation with overspecified data via He's variational iteration method

    International Nuclear Information System (INIS)

    Dehghan, Mehdi; Tatari, Mehdi

    2008-01-01

    In this research, He's variational iteration technique is used for computing an unknown time-dependent parameter in an inverse quasilinear parabolic partial differential equation. Parabolic partial differential equations with overspecified data play a crucial role in applied mathematics and physics, as they appear in various engineering models. He's variational iteration method is an analytical procedure for finding solutions of differential equations that is based on the use of Lagrange multipliers for the identification of an optimal value of a parameter in a functional. To show the efficiency of the new approach, several test problems are presented for one-, two- and three-dimensional cases.

  5. Combined influence of LDLR and HMGCR sequence variation on lipid-lowering response to simvastatin

    Science.gov (United States)

    Mangravite, Lara M.; Medina, Marisa Wong; Cui, Jinrui; Pressman, Sheila; Smith, Joshua D.; Rieder, Mark J.; Guo, Xiuqing; Nickerson, Deborah A.; Rotter, Jerome I.; Krauss, Ronald M.

    2010-01-01

    Objectives: Although statins are efficacious for lowering LDL-cholesterol (LDLC), there is wide inter-individual variation in response. We tested the extent to which combined effects of common alleles of LDLR and HMGCR can contribute to this variability. Methods and Results: Haplotypes in the LDLR 3′-untranslated region (3′UTR) were tested for association with lipid-lowering response to simvastatin treatment in the Cholesterol and Pharmacogenetics (CAP) trial (335 African-Americans and 609 European-Americans). LDLR haplotype 5 (L5) was associated with smaller simvastatin-induced reductions in LDLC, total cholesterol, non-HDL cholesterol, and apolipoprotein B (apoB) (P=0.0002–0.03) in African-Americans, but not European-Americans. The combined presence of L5 and previously described HMGCR haplotypes in African-Americans was associated with significantly attenuated apoB reduction (−22.4±1.5%, N=89) compared both to noncarriers (−30.6±1.5%, N=78, P=0.0001) and to carriers of either individual haplotype (−28.2±1.1%, N=158, P=0.001). We observed similar differences when measuring simvastatin-mediated induction of LDLR surface expression using lymphoblast cell lines (P=0.03). Conclusions: We have identified a common LDLR 3′UTR haplotype that is associated with attenuated lipid-lowering response to simvastatin treatment. Response was further reduced in individuals with both LDLR and previously described HMGCR haplotypes. Previously identified racial differences in statin efficacy were partially explained by increased prevalence of these combined haplotypes in African-Americans. PMID:20413733

  6. Next generation sequencing identifies abnormal Y chromosome and candidate causal variants in premature ovarian failure patients.

    Science.gov (United States)

    Lee, Yujung; Kim, Changshin; Park, YoungJoon; Pyun, Jung-A; Kwack, KyuBum

    2016-12-01

    Premature ovarian failure (POF) is characterized by heterogeneous genetic causes such as chromosomal abnormalities and variants in causal genes. Recent technical advances have made it possible to use next generation sequencing (NGS) to detect genome-wide variants, including chromosomal abnormalities. Among 37 Korean POF patients, an XY karyotype with distal deletions of the Y chromosome (Yp11.32-31 and the Yp12 end part) was observed in two patients through NGS. Six deleterious variants in POF genes were also detected, which might explain the pathogenesis of POF with abnormalities in the sex chromosomes. Additionally, the two POF patients had no mutation in SRY, but three non-synonymous variants were detected in genes related to sex reversal. These findings suggest candidate causes of POF and sex reversal and show the suitability of NGS for approaching the heterogeneous pathogenesis of POF. Copyright © 2016 Elsevier Inc. All rights reserved.

  7. Whole-exome sequencing identifies common and rare variant metabolic QTLs in a Middle Eastern population.

    Science.gov (United States)

    Yousri, Noha A; Fakhro, Khalid A; Robay, Amal; Rodriguez-Flores, Juan L; Mohney, Robert P; Zeriri, Hassina; Odeh, Tala; Kader, Sara Abdul; Aldous, Eman K; Thareja, Gaurav; Kumar, Manish; Al-Shakaki, Alya; Chidiac, Omar M; Mohamoud, Yasmin A; Mezey, Jason G; Malek, Joel A; Crystal, Ronald G; Suhre, Karsten

    2018-01-23

    Metabolomics-genome-wide association studies (mGWAS) have uncovered many metabolic quantitative trait loci (mQTLs) influencing human metabolic individuality, though predominantly in European cohorts. By combining whole-exome sequencing with high-resolution metabolomics profiling for a highly consanguineous Middle Eastern population, we discover 21 common variant and 12 functional rare variant mQTLs, of which 45% are novel altogether. We fine-map 10 common variant mQTLs to new metabolite ratio associations, and 11 common variant mQTLs to putative protein-altering variants. This is the first work to report common and rare variant mQTLs linked to diseases and/or pharmacological targets in a consanguineous Arab cohort, with wide implications for precision medicine in the Middle East.

  8. Identifying Likely Transmission Pathways within a 10-Year Community Outbreak of Tuberculosis by High-Depth Whole Genome Sequencing.

    Directory of Open Access Journals (Sweden)

    Alexander C Outhred

    Improved tuberculosis control and the need to contain the spread of drug-resistant strains provide a strong rationale for exploring tuberculosis transmission dynamics at the population level. Whole-genome sequencing provides optimal strain resolution, facilitating detailed mapping of potential transmission pathways. We sequenced 22 isolates from a Mycobacterium tuberculosis cluster in New South Wales, Australia, identified during routine 24-locus mycobacterial interspersed repetitive unit typing. Following high-depth paired-end sequencing using the Illumina HiSeq 2000 platform, two independent pipelines were employed for analysis, both employing read mapping onto reference genomes as well as de novo assembly, to control biases in variant detection. In addition to single-nucleotide polymorphisms, the analyses also sought to identify insertions, deletions and structural variants. Isolates were highly similar, with a distance of 13 variants between the most distant members of the cluster. The most sensitive analysis classified the 22 isolates into 18 groups. Four of the isolates did not appear to share a recent common ancestor with the largest clade; another four isolates had an uncertain ancestral relationship with the largest clade. Whole-genome sequencing, with analysis of single-nucleotide polymorphisms, insertions, deletions, structural variants and subpopulations, enabled the highest possible level of discrimination between cluster members, clarifying likely transmission pathways and exposing the complexity of strain origin. The analysis provides a basis for targeted public health intervention and enhanced classification of future isolates linked to the cluster.
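    The abstract summarizes cluster diversity as the variant distance between its two most distant members. A minimal sketch of that kind of summary follows, assuming (hypothetically) that each isolate's calls are stored as a mapping from reference position to alternate allele; it treats any position without a call as matching the reference, so it ignores missing data and low-coverage sites, and it is not either of the study's pipelines.

```python
from itertools import combinations


def variant_distance(a, b):
    """Number of positions at which two isolates' calls differ.

    a and b map reference position -> alternate allele; an absent position is
    treated as 'same as reference' (an assumption that ignores missing data).
    """
    positions = a.keys() | b.keys()
    return sum(a.get(p) != b.get(p) for p in positions)


def most_distant_pair(isolates):
    """Return (distance, name_a, name_b) for the most distant pair of isolates."""
    best = (-1, None, None)
    for (name_a, calls_a), (name_b, calls_b) in combinations(isolates.items(), 2):
        d = variant_distance(calls_a, calls_b)
        if d > best[0]:
            best = (d, name_a, name_b)
    return best


# Three hypothetical isolates with calls at a handful of positions.
cluster = {
    "iso1": {100: "T"},
    "iso2": {100: "T", 250: "G"},
    "iso3": {400: "A"},
}
print(most_distant_pair(cluster))  # (3, 'iso2', 'iso3')
```

    Counting differing positions (rather than differing call tuples) means that two isolates carrying different alternate alleles at the same site differ by one, not two.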

  9. diffloop: a computational framework for identifying and analyzing differential DNA loops from sequencing data.

    Science.gov (United States)

    Lareau, Caleb A; Aryee, Martin J; Berger, Bonnie

    2018-02-15

    The 3D architecture of DNA within the nucleus is a key determinant of interactions between genes, regulatory elements, and transcriptional machinery. As a result, differences in DNA looping structure are associated with variation in gene expression and cell state. To systematically assess changes in DNA looping architecture between samples, we introduce diffloop, an R/Bioconductor package that provides a suite of functions for the quality control, statistical testing, annotation, and visualization of DNA loops. We demonstrate this functionality by detecting differences between ENCODE ChIA-PET samples and relating looping to variability in epigenetic state. Diffloop is implemented as an R/Bioconductor package available at https://bioconductor.org/packages/release/bioc/html/diffloop.html. Contact: aryee.martin@mgh.harvard.edu. Supplementary data are available at Bioinformatics online. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com

  10. Greater than the sum of its parts: single-nucleus sequencing identifies convergent evolution of independent EGFR mutants in GBM.

    Science.gov (United States)

    Gini, Beatrice; Mischel, Paul S

    2014-08-01

    Single-cell sequencing approaches are needed to characterize the genomic diversity of complex tumors, shedding light on their evolutionary paths and potentially suggesting more effective therapies. In this issue of Cancer Discovery, Francis and colleagues develop a novel integrative approach to identify distinct tumor subpopulations based on joint detection of clonal and subclonal events from bulk tumor and single-nucleus whole-genome sequencing, allowing them to infer a subclonal architecture. Surprisingly, the authors identify convergent evolution of multiple, mutually exclusive, independent EGFR gain-of-function variants in a single tumor. This study demonstrates the value of integrative single-cell genomics and highlights the biologic primacy of EGFR as an actionable target in glioblastoma. ©2014 American Association for Cancer Research.

  11. Integrative analysis of functional genomic annotations and sequencing data to identify rare causal variants via hierarchical modeling

    Directory of Open Access Journals (Sweden)

    Marinela eCapanu

    2015-05-01

    Identifying the small number of rare causal variants contributing to disease has been a major focus of investigation in recent years, but represents a formidable statistical challenge due to the low frequencies with which these variants are observed. In this commentary we draw attention to a formal statistical framework, namely hierarchical modeling, to combine functional genomic annotations with sequencing data with the objective of enhancing our ability to identify rare causal variants. Using simulations we show that in all configurations studied, the hierarchical modeling approach has superior discriminatory ability compared to a recently proposed aggregate measure of deleteriousness, the Combined Annotation-Dependent Depletion (CADD) score, supporting our premise that aggregate functional genomic measures can more accurately identify causal variants when used in conjunction with sequencing data through a hierarchical modeling approach.

  12. Use of Whole-Genus Genome Sequence Data To Develop a Multilocus Sequence Typing Tool That Accurately Identifies Yersinia Isolates to the Species and Subspecies Levels

    Science.gov (United States)

    Hall, Miquette; Chattaway, Marie A.; Reuter, Sandra; Savin, Cyril; Strauch, Eckhard; Carniel, Elisabeth; Connor, Thomas; Van Damme, Inge; Rajakaruna, Lakshani; Rajendram, Dunstan; Jenkins, Claire; Thomson, Nicholas R.

    2014-01-01

    The genus Yersinia is a large and diverse bacterial genus consisting of human-pathogenic species, a fish-pathogenic species, and a large number of environmental species. Recently, the phylogenetic and population structure of the entire genus was elucidated through the genome sequence data of 241 strains encompassing every known species in the genus. Here we report the mining of this enormous data set to create a multilocus sequence typing-based scheme that can identify Yersinia strains to the species level with a resolution equal to that of whole-genome sequencing. Our assay is designed to accurately subtype the important human-pathogenic species Yersinia enterocolitica at whole-genome resolution. We also report the validation of the scheme on 386 strains from reference laboratory collections across Europe. We propose that the scheme is an important molecular typing system that allows accurate and reproducible identification of Yersinia isolates to the species level, a process often inconsistent in nonspecialist laboratories. Additionally, our assay is the most phylogenetically informative typing scheme available for Y. enterocolitica. PMID:25339391
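    Multilocus sequence typing generally works in two steps: each locus sequence is matched to a numbered allele, and the resulting allele profile is looked up to give a sequence type (ST). The sketch below illustrates only that generic lookup under simplifying assumptions (exact-match allele calling, a fixed locus order); the locus names, alleles and STs are hypothetical, and the abstract does not describe the Yersinia scheme's loci or database format.

```python
from typing import Dict, List, Optional, Tuple


def call_allele(sequence: str, alleles: Dict[int, str]) -> Optional[int]:
    """Return the allele number whose sequence matches exactly, or None if novel."""
    for number, allele_seq in alleles.items():
        if allele_seq == sequence:
            return number
    return None


def assign_sequence_type(
    loci: List[str],
    locus_sequences: Dict[str, str],
    allele_db: Dict[str, Dict[int, str]],
    profiles: Dict[Tuple[int, ...], int],
) -> Optional[int]:
    """Map per-locus sequences to an allele profile, then the profile to an ST."""
    profile = []
    for locus in loci:  # the locus order must match the order used in `profiles`
        number = call_allele(locus_sequences[locus], allele_db[locus])
        if number is None:
            return None  # novel allele: no existing ST can be assigned
        profile.append(number)
    return profiles.get(tuple(profile))  # None if the allele combination is new


# Hypothetical two-locus scheme with toy sequences.
allele_db = {"locA": {1: "ACGT", 2: "ACGA"}, "locB": {1: "TTGC", 2: "TTGG"}}
profiles = {(2, 1): 17, (1, 1): 3}
print(assign_sequence_type(["locA", "locB"], {"locA": "ACGA", "locB": "TTGC"}, allele_db, profiles))
```

    Returning `None` for a novel allele or an unseen profile keeps "unassigned" distinct from any real ST; a production scheme would typically index alleles by sequence (or hash) instead of scanning them.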

  13. Activity of Posaconazole and Other Antifungal Agents against Mucorales Strains Identified by Sequencing of Internal Transcribed Spacers

    Science.gov (United States)

    Alastruey-Izquierdo, Ana; Castelli, Maria Victoria; Cuesta, Isabel; Monzon, Araceli; Cuenca-Estrella, Manuel; Rodriguez-Tudela, Juan Luis

    2009-01-01

    The antifungal susceptibility profiles of 77 clinical strains of Mucorales species, identified by internal transcribed spacer sequencing, were analyzed. MICs obtained at 24 and 48 h were compared. Amphotericin B was the most active agent against all isolates, except for Cunninghamella and Apophysomyces isolates. Posaconazole also showed good activity for all species but Cunninghamella bertholletiae. Voriconazole had no activity against any of the fungi tested. Terbinafine showed good activity, except for Rhizopus oryzae, Mucor circinelloides, and Rhizomucor variabilis isolates. PMID:19171801

  14. Activity of posaconazole and other antifungal agents against Mucorales strains identified by sequencing of internal transcribed spacers.

    Science.gov (United States)

    Alastruey-Izquierdo, Ana; Castelli, Maria Victoria; Cuesta, Isabel; Monzon, Araceli; Cuenca-Estrella, Manuel; Rodriguez-Tudela, Juan Luis

    2009-04-01

    The antifungal susceptibility profiles of 77 clinical strains of Mucorales species, identified by internal transcribed spacer sequencing, were analyzed. MICs obtained at 24 and 48 h were compared. Amphotericin B was the most active agent against all isolates, except for Cunninghamella and Apophysomyces isolates. Posaconazole also showed good activity for all species but Cunninghamella bertholletiae. Voriconazole had no activity against any of the fungi tested. Terbinafine showed good activity, except for Rhizopus oryzae, Mucor circinelloides, and Rhizomucor variabilis isolates.

  15. ATRX mutation in two adult brothers with non-specific moderate intellectual disability identified by exome sequencing.

    Science.gov (United States)

    Moncini, S; Bedeschi, M F; Castronovo, P; Crippa, M; Calvello, M; Garghentino, R R; Scuvera, G; Finelli, P; Venturin, M

    2013-12-01

    In this report, we describe two adult brothers affected by moderate non-specific intellectual disability (ID). They showed minor facial anomalies not clearly ascribable to any specific syndromic pattern, as well as microcephaly, brachydactyly and broad toes. Both brothers presented with seizures. Karyotype, subtelomeric and FMR1 analyses were normal in both cases. We performed array-CGH analysis, which revealed no copy-number variations potentially associated with ID. Subsequent exome sequence analysis allowed the identification of the ATRX c.109C>T (p.R37X) mutation in both affected brothers. Sanger sequencing confirmed the presence of the mutation in the brothers and showed that the mother is a healthy carrier. Mutations in the ATRX gene cause the X-linked alpha thalassemia/mental retardation (ATR-X) syndrome (MIM #301040), a severe clinical condition usually associated with profound ID, facial dysmorphism and alpha thalassemia. However, the syndrome is clinically heterogeneous and some mutations, including c.109C>T, are associated with a broad phenotypic spectrum, with patients displaying a less severe phenotype with only mild-moderate ID. In the case presented here, exome sequencing provided an effective strategy to achieve the molecular diagnosis of ATR-X syndrome, which otherwise would have been difficult to consider due to the mild non-specific phenotype and the absence of a family history of typical severe cases.

  16. Whole-exome sequencing identifies rare and low-frequency coding variants associated with LDL cholesterol

    NARCIS (Netherlands)

    L.A. Lange (Leslie); Y. Hu (Youna); H. Zhang (He); C. Xue (Chenyi); E.M. Schmidt (Ellen); Z.-Z. Tang (Zheng-Zheng); C. Bizon (Chris); E.M. Lange (Ethan); G.D. Smith; E.H. Turner (Emily); Y. Jun (Yang); H.M. Kang (Hyun Min); G.M. Peloso (Gina); P. Auer (Paul); K.-P. Li (Kuo-Ping); J. Flannick (Jason); J. Zhang (Ji); C. Fuchsberger (Christian); K. Gaulton (Kyle); C.M. Lindgren (Cecilia); A. Locke (Adam); A.K. Manning (Alisa); X. Sim (Xueling); M.A. Rivas (Manuel); O.L. Holmen (Oddgeir); R.F. Gottesman (Rebecca); Y. Lu (Yingchang); D. Ruderfer (Douglas); E.A. Stahl (Eli); Q. Duan (Qing); Y. Li (Yun); P. Durda (Peter); S. Jiao (Shuo); A.J. Isaacs (Aaron); A. Hofman (Albert); J.C. Bis (Joshua); D.D. Correa; M.D. Griswold (Michael); M. Jakobsdottir (Margret); G.D. Smith; P.J. Schreiner (Pamela); M.F. Feitosa (Mary Furlan); Q. Zhang (Qunyuan); J.E. Huffman (Jennifer); S. Crosby; C.L. Wassel (Christina); R. Do (Ron); N. Franceschini (Nora); L.W. Martin (Lisa); J.G. Robinson (Jennifer); T.L. Assimes (Themistocles); D.R. Crosslin (David); E.A. Rosenthal (Elisabeth); M.Y. Tsai (Michael); M. Rieder (Mark); D.N. Farlow (Deborah); A.R. Folsom (Aaron); T. Lumley (Thomas); E.R. Fox (Ervin); C.S. Carlson (Christopher); U. Peters (Ulrike); R.D. Jackson (Rebecca); C.M. van Duijn (Cornelia); A.G. Uitterlinden (André); D. Levy (Daniel); J.I. Rotter (Jerome); H.A. Taylor (Herman); V. Gudnason (Vilmundur); D.S. Siscovick (David); M. Fornage (Myriam); I.B. Borecki (Ingrid); C. Hayward (Caroline); I. Rudan (Igor); Y.E. Chen (Y. Eugene); E.P. Bottinger (Erwin); R.J.F. Loos (Ruth); P. Sætrom (Pål); K. Hveem (Kristian); M. Boehnke (Michael); L. Groop (Leif); M.I. McCarthy (Mark); T. Meitinger (Thomas); C. Ballantyne (Christie); S.B. Gabriel (Stacey); C.J. O'Donnell (Christopher); W.S. Post (Wendy S.); K.E. North (Kari); A. Reiner (Alexander); E.A. Boerwinkle (Eric); B.M. Psaty (Bruce); D. Altshuler (David); S. Kathiresan (Sekar); D.Y. Lin (Dan); G.P. Jarvik (Gail); L.A. Cupples (Adrienne); C. Kooperberg (Charles); J.G. Wilson (James); D.A. Nickerson (Deborah); G.R. Abecasis (Gonçalo); S.S. Rich (Stephen); R.P. Tracy (Russell); C.J. Willer (Cristen)

    2014-01-01

    Elevated low-density lipoprotein cholesterol (LDL-C) is a treatable, heritable risk factor for cardiovascular disease. Genome-wide association studies (GWASs) have identified 157 variants associated with lipid levels but are not well suited to assess the impact of rare and low-frequency

  17. Discovery and molecular characterization of a new luteovirus identified by high-throughput sequencing from apple

    Science.gov (United States)

    ‘Rapid Apple Decline’ (RAD) is a newly emerging problem of young, dwarf apple trees in the northeastern USA. The affected trees show trunk necrosis, bark cracking and canker formation before collapsing in the summer. In this study, a new luteovirus and three common viruses were identified from apple...

  18. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    NARCIS (Netherlands)

    Hu, H; Haas, S.A.; Chelly, J.; Esch, H. Van; Raynaud, M.; Brouwer, A.P. de; Weinert, S.; Froyen, G.; Frints, S.G.; Laumonnier, F.; Zemojtel, T.; Love, M.I.; Richard, H.; Emde, A.K.; Bienek, M.; Jensen, C.; Hambrock, M.; Fischer, U.; Langnick, C.; Feldkamp, M.; Wissink-Lindhout, W.; Lebrun, N.; Castelnau, L.; Rucci, J.; Montjean, R.; Dorseuil, O.; Billuart, P.; Stuhlmann, T.; Shaw, M.; Corbett, M.A.; Gardner, A.; Willis-Owen, S.; Tan, C.; Friend, K.L.; Belet, S.; Roozendaal, K.E. van; Jimenez-Pocquet, M.; Moizard, M.P.; Ronce, N.; Sun, R.; O'Keeffe, S.; Chenna, R.; Bommel, A. van; Goke, J.; Hackett, A.; Field, M.; Christie, L.; Boyle, J.; Haan, E.; Nelson, J.; Turner, G.; Baynam, G.; Gillessen-Kaesbach, G.; Muller, U.; Steinberger, D.; Budny, B.; Badura-Stronka, M.; Latos-Bielenska, A.; Ousager, L.B.; Wieacker, P.; Rodriguez Criado, G.; Bondeson, M.L.; Anneren, G.; Dufke, A.; Cohen, M.; Maldergem, L. Van; Vincent-Delorme, C.; Echenne, B.; Simon-Bouy, B.; Kleefstra, T.; Willemsen, M.H.; Fryns, J.P.; Devriendt, K.; Ullmann, R.; Vingron, M.; Wrogemann, K.; Wienker, T.F.; Tzschach, A.; Bokhoven, H. van; Gecz, J.; Jentsch, T.J.; Chen, W.; Ropers, H.H.; Kalscheuer, V.M.

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes or

  19. X-exome sequencing of 405 unresolved families identifies seven novel intellectual disability genes

    DEFF Research Database (Denmark)

    Hu, H; Haas, S A; Chelly, J

    2016-01-01

    X-linked intellectual disability (XLID) is a clinically and genetically heterogeneous disorder. During the past two decades in excess of 100 X-chromosome ID genes have been identified. Yet, a large number of families mapping to the X-chromosome remained unresolved suggesting that more XLID genes...

  20. Exome sequencing identifies early gastric carcinoma as an early stage of advanced gastric cancer.

    Directory of Open Access Journals (Sweden)

    Guhyun Kang

    Full Text Available Gastric carcinoma is one of the major causes of cancer-related mortality worldwide. Early detection and treatment lead to an excellent prognosis in patients with early gastric cancer (EGC), whereas the prognosis of patients with advanced gastric cancer (AGC) remains poor. It is unclear whether EGCs and AGCs are distinct entities or whether EGCs are the beginning stages of AGCs. We performed whole-exome sequencing of four samples from patients with EGC and compared the results with those from AGCs. In both EGCs and AGCs, a total of 268 genes were commonly mutated, and independent mutations were additionally found in EGCs (516 genes) and AGCs (3104 genes). A higher frequency of C>G transversions was observed in intestinal-type compared to diffuse-type carcinomas (P = 0.010). The DYRK3, GPR116, MCM10, PCDH17, PCDHB1, RDH5 and UNC5C genes are recurrently mutated in EGCs and may be involved in early carcinogenesis.
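    Mutation spectra like the one compared above are conventionally split into transitions (purine to purine, A and G, or pyrimidine to pyrimidine, C and T) and transversions (a purine exchanged for a pyrimidine or vice versa); under that standard definition a C>G change is a transversion. The minimal Python sketch below classifies single-base substitutions written as `REF>ALT`; the function name is illustrative and not taken from the study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def substitution_class(change: str) -> str:
    """Classify a single-base substitution such as 'C>T' as a transition or transversion."""
    ref, alt = (base.strip().upper() for base in change.split(">"))
    valid = PURINES | PYRIMIDINES
    if ref not in valid or alt not in valid or ref == alt:
        raise ValueError(f"not a single-base substitution: {change!r}")
    # A transition keeps the base within its chemical class; a transversion crosses classes.
    same_class = {ref, alt} <= PURINES or {ref, alt} <= PYRIMIDINES
    return "transition" if same_class else "transversion"


for change in ["C>T", "A>G", "C>G", "C>A"]:
    print(change, substitution_class(change))
```

    Tallying these labels per tumor subtype gives the kind of transition/transversion comparison reported in the abstract.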

  1. Variation of amino acid sequences of serum amyloid A (SAA) and immunohistochemical analysis of amyloid A (AA) in Japanese domestic cats.

    Science.gov (United States)

    Tei, Meina; Uchida, Kazuyuki; Chambers, James K; Watanabe, Ken-Ichi; Tamamoto, Takashi; Ohno, Koichi; Nakayama, Hiroyuki

    2018-02-02

    Amyloid A (AA) amyloidosis, a fatal systemic amyloid disease, occurs secondary to chronic inflammatory conditions in humans. Although persistently elevated serum amyloid A (SAA) levels are required for its pathogenesis, not all individuals with chronic inflammation develop AA amyloidosis. Furthermore, many diseases in cats are associated with elevated production of SAA, whereas only a small number of affected cats actually develop AA amyloidosis. We hypothesized that a genetic mutation in the SAA gene may strongly contribute to the pathogenesis of feline AA amyloidosis. In the present study, genomic DNA from four Japanese domestic cats (JDCs) with AA amyloidosis and from five without amyloidosis was analyzed using polymerase chain reaction (PCR) amplification and direct sequencing. We identified the novel variation combination 45R-51A in the deduced amino acid sequences of four JDCs with amyloidosis and five without. However, there was no relationship between amino acid variations and the distribution of AA amyloid deposits, indicating that differences in SAA sequences do not contribute to the pathogenesis of AA amyloidosis. Immunohistochemical analysis using antisera against three different parts of the feline SAA protein (i.e., the N-terminal, central, and C-terminal regions) revealed that feline AA contained the C-terminus, unlike human AA. These results indicate that cleavage and degradation of the C-terminus are not essential for amyloid fibril formation in JDCs.

  2. Detailed analysis of sequence changes occurring during vlsE antigenic variation in the mouse model of Borrelia burgdorferi infection.

    Directory of Open Access Journals (Sweden)

    Loïc Coutte

    2009-02-01

    Full Text Available Lyme disease Borrelia can infect humans and animals for months to years, despite the presence of an active host immune response. The vls antigenic variation system, which expresses the surface-exposed lipoprotein VlsE, plays a major role in B. burgdorferi immune evasion. Gene conversion between vls silent cassettes and the vlsE expression site occurs at high frequency during mammalian infection, resulting in sequence variation in the VlsE product. In this study, we examined vlsE sequence variation in B. burgdorferi B31 during mouse infection by analyzing 1,399 clones isolated from bladder, heart, joint, ear, and skin tissues of mice infected for 4 to 365 days. The median number of codon changes increased progressively in C3H/HeN mice from 4 to 28 days post infection, and no clones retained the parental vlsE sequence at 28 days. In contrast, the decrease in the number of clones with the parental vlsE sequence and the increase in the number of sequence changes occurred more gradually in severe combined immunodeficiency (SCID) mice. Clones containing a stop codon were isolated, indicating that continuous expression of full-length VlsE is not required for survival in vivo; these clones also continued to undergo vlsE recombination. Analysis of clones with apparent single recombination events indicated that recombinations into vlsE are nonselective with regard to the silent cassette utilized, as well as the length and location of the recombination event. Sequence changes as small as one base pair were common. Fifteen percent of recovered vlsE variants contained "template-independent" sequence changes, which clustered in the variable regions of vlsE. We hypothesize that the increased frequency and complexity of vlsE sequence changes observed in clones recovered from immunocompetent mice (as compared with SCID mice) are due to rapid clearance of relatively invariant clones by variable region-specific anti-VlsE antibody responses.
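    Counting codon changes relative to the parental vlsE sequence, and flagging clones that carry a stop codon, reduces to an in-frame codon comparison once the clones are aligned. The sketch below assumes gap-free, equal-length, in-frame sequences (real vlsE variants can contain insertions or deletions that need a proper alignment first); the function names and the short sequences are hypothetical toy examples, not vlsE data.

```python
from statistics import median

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def codons(seq: str) -> list[str]:
    """Split an in-frame coding sequence into codons, dropping any trailing partial codon."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(0, len(seq) - len(seq) % 3, 3)]


def compare_to_parent(parent: str, clone: str) -> tuple[int, bool]:
    """Return (codons differing from the parent, whether the clone contains a stop codon)."""
    p, c = codons(parent), codons(clone)
    if len(p) != len(c):
        raise ValueError("this sketch assumes equal-length, gap-free, in-frame sequences")
    changed = sum(1 for a, b in zip(p, c) if a != b)
    has_stop = any(codon in STOP_CODONS for codon in c)
    return changed, has_stop


# Hypothetical parental fragment and three clones.
parent = "ATGGCTGCAAAAGGT"
clones = ["ATGGCTGCAAAAGGT", "ATGGCCGCAAAAGGT", "ATGTAAGCAAAGGGT"]
results = [compare_to_parent(parent, clone) for clone in clones]
print(results)                          # [(0, False), (1, False), (2, True)]
print(median(n for n, _ in results))    # 1
```

    Taking the median over all clones from one time point gives a per-time-point summary of the kind described in the abstract.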

  3. Deep sequencing identifies ethnicity-specific bacterial signatures in the oral microbiome.

    Directory of Open Access Journals (Sweden)

    Matthew R Mason

    Full Text Available Oral infections have a strong ethnic predilection, suggesting that ethnicity is a critical determinant of oral microbial colonization. Dental plaque and saliva samples from 192 subjects belonging to four major ethnicities in the United States were analyzed using terminal restriction fragment length polymorphism (t-RFLP) and 16S pyrosequencing. Ethnicity-specific clustering of microbial communities was apparent in saliva and subgingival biofilms, and a machine-learning classifier was capable of identifying an individual's ethnicity from subgingival microbial signatures. The classifier identified African Americans with 100% sensitivity and 74% specificity, and Caucasians with 50% sensitivity and 91% specificity. The data demonstrate a significant association between ethnic affiliation and the composition of the oral microbiome, to the extent that these microbial signatures appear capable of discriminating between ethnicities.
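    Per-group sensitivity and specificity for a multi-class classifier like this one are usually computed one-vs-rest: for a given group, sensitivity is the fraction of its members assigned to it, and specificity is the fraction of everyone else kept out of it. A minimal sketch, using made-up group labels `A`, `B` and `C` rather than the study's data:

```python
def one_vs_rest(pairs: list[tuple[str, str]], label: str) -> tuple[float, float]:
    """Sensitivity and specificity for `label`, given (true, predicted) label pairs."""
    tp = sum(1 for true, pred in pairs if true == label and pred == label)
    fn = sum(1 for true, pred in pairs if true == label and pred != label)
    fp = sum(1 for true, pred in pairs if true != label and pred == label)
    tn = sum(1 for true, pred in pairs if true != label and pred != label)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical (true group, predicted group) pairs.
pairs = [("A", "A"), ("A", "A"), ("B", "A"), ("B", "B"), ("C", "C"), ("C", "B")]
for label in "ABC":
    sens, spec = one_vs_rest(pairs, label)
    print(label, f"sensitivity={sens:.2f}", f"specificity={spec:.2f}")
```

    Read this way, 100% sensitivity with 74% specificity means every member of that group was recognised, while about a quarter of subjects from the other groups were also assigned to it.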

  4. Exome sequencing identifies rare deleterious mutations in DNA repair genes FANCC and BLM as potential breast cancer susceptibility alleles.

    Directory of Open Access Journals (Sweden)

    Ella R Thompson

    2012-09-01

    Full Text Available Despite intensive efforts using linkage and candidate gene approaches, the genetic etiology for the majority of families with a multi-generational breast cancer predisposition is unknown. In this study, we used whole-exome sequencing of thirty-three individuals from 15 breast cancer families to identify potential predisposing genes. Our analysis identified families with heterozygous, deleterious mutations in the DNA repair genes FANCC and BLM, which are responsible for the autosomal recessive disorders Fanconi Anemia and Bloom syndrome. In total, screening of all exons in these genes in 438 breast cancer families identified three with truncating mutations in FANCC and two with truncating mutations in BLM. Additional screening of FANCC mutation hotspot exons identified one pathogenic mutation among an additional 957 breast cancer families. Importantly, none of the deleterious mutations were identified among 464 healthy controls, and none are reported in the 1000 Genomes Project data. Given the rarity of Fanconi Anemia and Bloom syndrome among Caucasian populations, the finding of multiple deleterious mutations in these critical DNA repair genes among high-risk breast cancer families is intriguing and suggestive of a predisposing role. Our data demonstrate the utility of intra-family exome-sequencing approaches to uncover cancer predisposition genes, but highlight the major challenge of definitively validating candidates where the incidence of sporadic disease is high, germline mutations are not fully penetrant, and individual predisposition genes may only account for a tiny proportion of breast cancer families.

  5. High-throughput sequencing enhanced phage display identifies peptides that bind mycobacteria

    CSIR Research Space (South Africa)

    Ngubane, NAC

    2013-11-01

    Full Text Available ... The displayed peptides are flanked by two cysteine residues, which are oxidized during phage assembly to a disulfide bond, resulting in a loop-constrained peptide. We initially used the traditional clone-picking method to identify the enriched clones... of the library, 1.236109 heptapeptides, it represented sufficient depth to measure the quantitative enrichment of relevant peptides. To confirm successful enrichment during selection, we characterized the reduction in diversity of the pool in the consecutive...
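    Quantitative enrichment from sequencing counts is commonly expressed as the change in each peptide's frequency between the naive library and a selection round, and loss of diversity can be tracked with an index such as Shannon entropy. The snippet does not say which measures were used, so the sketch below is one common choice, not the authors' method. The heptapeptide reads are hypothetical, and the small pseudocount is an assumption that keeps peptides unseen in the library from dividing by zero.

```python
import math
from collections import Counter


def frequencies(reads: list[str]) -> dict[str, float]:
    """Relative frequency of each distinct peptide sequence in a list of reads."""
    counts = Counter(reads)
    total = sum(counts.values())
    return {pep: n / total for pep, n in counts.items()}


def enrichment(before: list[str], after: list[str], pseudocount: float = 1e-6) -> dict[str, float]:
    """Fold change in frequency, after vs before selection, for every peptide seen after."""
    f0, f1 = frequencies(before), frequencies(after)
    return {pep: f1[pep] / (f0.get(pep, 0.0) + pseudocount) for pep in f1}


def shannon_diversity(reads: list[str]) -> float:
    """Shannon entropy (natural log) of the peptide frequency distribution."""
    return -sum(f * math.log(f) for f in frequencies(reads).values())


# Hypothetical reads from the naive library and a later selection round.
library = ["SPLHKTA", "WQRDLYS", "GNKPTVE", "HAMQPLR"]
round_3 = ["SPLHKTA", "SPLHKTA", "SPLHKTA", "WQRDLYS"]
print({pep: round(x, 2) for pep, x in enrichment(library, round_3).items()})
print(round(shannon_diversity(library), 3), round(shannon_diversity(round_3), 3))
```

    A falling entropy across consecutive rounds is one way to show the reduction in pool diversity the abstract describes.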

  6. A combination of PhP typing and β-d-glucuronidase gene sequence variation analysis for differentiation of Escherichia coli from humans and animals.

    Science.gov (United States)

    Masters, N; Christie, M; Katouli, M; Stratton, H

    2015-06-01

    We investigated the usefulness of β-d-glucuronidase gene variation in Escherichia coli as a microbial source tracking tool, using a novel algorithm to compare sequences from a set of host-specific isolates prescreened with a high-resolution PhP typing method. A total of 65 common biochemical phenotypes belonging to 318 E. coli strains isolated from humans and domestic and wild animals were analysed for nucleotide variations at 10 loci along a 518 bp fragment of the 1812 bp β-d-glucuronidase gene. Neighbour-joining analysis of loci variations correctly identified 86 (76.8%) human isolates and 91.2% of animal isolates. Pairwise hierarchical clustering improved assignment: 92 (82.1%) human and 204 (99%) animal strains were assigned to their respective clusters. Our data show that initial typing of isolates and selection of common types from different hosts prior to analysis of the β-d-glucuronidase gene sequence improves source identification. We also concluded that numerical profiling of the nucleotide variations can be used as a valuable approach to differentiate human from animal E. coli. This study indicates the usefulness of the β-d-glucuronidase gene as a marker for differentiating human faecal pollution from animal sources.
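    The idea of a numerical profile of nucleotide variations can be illustrated by reading the bases at a fixed set of loci into a short string and comparing isolates by the number of mismatching loci. The sketch below assigns an isolate to the host of its nearest reference profile. This nearest-profile rule is a simpler stand-in, not the neighbour-joining or pairwise hierarchical clustering used in the study, and the loci positions, sequences and reference profiles are all hypothetical.

```python
def profile(seq: str, loci: list[int]) -> str:
    """Bases at the chosen 0-based positions of an aligned gene fragment."""
    return "".join(seq[i] for i in loci)


def mismatches(a: str, b: str) -> int:
    """Number of loci at which two equal-length profiles differ."""
    return sum(x != y for x, y in zip(a, b))


def assign_host(query: str, references: list[tuple[str, str]]) -> str:
    """Host label of the reference profile with the fewest mismatches (first wins ties)."""
    _, best_host = min(references, key=lambda ref: mismatches(query, ref[0]))
    return best_host


LOCI = [1, 4, 7]                                     # hypothetical variable positions
references = [("ACG", "human"), ("GTA", "animal")]   # hypothetical reference profiles
isolate = "AATGCCTAA"                                # hypothetical aligned fragment
query = profile(isolate, LOCI)
print(query, assign_host(query, references))         # ACA human
```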

  7. Particular Candida albicans strains in the digestive tract of dyspeptic patients, identified by multilocus sequence typing.

    Directory of Open Access Journals (Sweden)

    Yan-Bing Gong

    Full Text Available BACKGROUND: Candida albicans is a human commensal that is also responsible for chronic gastritis and peptic ulcer disease. Little is known about the genetic profiles of the C. albicans strains in the digestive tract of dyspeptic patients. The aim of this study was to evaluate the prevalence, diversity, and genetic profiles of C. albicans isolates recovered from natural colonization of the digestive tract in dyspeptic patients. METHODS AND FINDINGS: Oral swab samples (n = 111) and gastric mucosa samples (n = 102) were obtained from a group of patients who presented dyspeptic symptoms or ulcer complaints. Oral swab samples (n = 162) were also obtained from healthy volunteers. C. albicans isolates were characterized and analyzed by multilocus sequence typing. The prevalence of Candida spp. in the oral samples was not significantly different between the dyspeptic group and the healthy group (36.0%, 40/111 vs. 29.6%, 48/162; P > 0.05). However, there were significant differences between the groups in the distribution of species isolated and the genotypes of the C. albicans isolates. C. albicans was isolated from 97.8% of the Candida-positive subjects in the dyspeptic group, but from only 56.3% in the healthy group (P < 0.001). DST1593 was the dominant C. albicans genotype from the digestive tract of the dyspeptic group (60%, 27/45), but not of the healthy group (14.8%, 4/27) (P < 0.001). CONCLUSIONS: Our data suggest a possible link between particular C. albicans strain genotypes and the host microenvironment. Positivity for particular C. albicans genotypes could signify susceptibility to dyspepsia.
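    The abstract does not name the test behind its P values. Purely as an illustration, the DST1593 comparison (27 of 45 dyspeptic-group isolates vs 4 of 27 healthy-group isolates) can be checked with a two-sided Fisher's exact test, sketched below with only the Python standard library. The function name is hypothetical, and a table counts as "at least as extreme" when its probability does not exceed that of the observed table.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell, with all margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    total = 0.0
    for x in range(max(0, col1 - row2), min(row1, col1) + 1):
        p = table_prob(x)
        if p <= p_observed * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


# Rows: dyspeptic, healthy; columns: DST1593, other genotypes (counts from the abstract).
p = fisher_exact_two_sided(27, 45 - 27, 4, 27 - 4)
print(f"p = {p:.2g}")
```

    Where SciPy is available, `scipy.stats.fisher_exact` can serve as a cross-check on the same table.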

  8. Overview of the creative genome: effects of genome structure and sequence on the generation of variation and evolution.

    Science.gov (United States)

    Caporale, Lynn Helena

    2012-09-01

    This overview of a special issue of Annals of the New York Academy of Sciences discusses uneven distribution of distinct types of variation across the genome, the dependence of specific types of variation upon distinct classes of DNA sequences and/or the induction of specific proteins, the circumstances in which distinct variation-generating systems are activated, and the implications of this work for our understanding of evolution and of cancer. Also discussed is the value of non-text-based computational methods for analyzing information carried by DNA, early insights into organizational frameworks that affect genome behavior, and implications of this work for comparative genomics. © 2012 New York Academy of Sciences.

  9. Whole-exome sequencing identifies novel MPL and JAK2 mutations in triple-negative myeloproliferative neoplasms.

    Science.gov (United States)

    Milosevic Feenstra, Jelena D; Nivarthi, Harini; Gisslinger, Heinz; Leroy, Emilie; Rumi, Elisa; Chachoua, Ilyas; Bagienski, Klaudia; Kubesova, Blanka; Pietra, Daniela; Gisslinger, Bettina; Milanesi, Chiara; Jäger, Roland; Chen, Doris; Berg, Tiina; Schalling, Martin; Schuster, Michael; Bock, Christoph; Constantinescu, Stefan N; Cazzola, Mario; Kralovics, Robert

    2016-01-21

    Essential thrombocythemia (ET) and primary myelofibrosis (PMF) are chronic diseases characterized by clonal hematopoiesis and hyperproliferation of terminally differentiated myeloid cells. The disease is driven by JAK2-V617F or by somatic mutations in exon 9 of CALR or exon 10 of MPL in >90% of cases, whereas the remaining cases are termed "triple negative." We aimed to identify the disease-causing mutations in the triple-negative cases of ET and PMF by applying whole-exome sequencing (WES) to paired tumor and control samples from 8 patients. We found evidence of clonal hematopoiesis in 5 of 8 studied cases based on clonality analysis and the presence of somatic genetic aberrations. WES identified somatic mutations in 3 of 8 cases. We did not detect any novel recurrent somatic mutations. In 3 patients with clonal hematopoiesis analyzed by WES, we identified a somatic MPL-S204P mutation, a germline MPL-V285E mutation, and a germline JAK2-G571S variant. We performed Sanger sequencing of the entire coding region of MPL in 62, and of JAK2 in 49, additional triple-negative cases of ET or PMF. Four new somatic (T119I, S204F, E230G, Y591D) and one germline (R321W) MPL mutations were detected. All of the identified MPL mutations were gain-of-function when analyzed in functional assays. JAK2 variants were identified in 5 of 57 triple-negative cases analyzed by WES and Sanger sequencing combined. We demonstrated that JAK2-V625F and JAK2-F556V are gain-of-function mutations. Our results suggest that triple-negative cases of ET and PMF do not represent a homogeneous disease entity. Cases with polyclonal hematopoiesis might represent hereditary disorders. © 2016 by The American Society of Hematology.
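    With paired tumor and control samples, the basic somatic/germline distinction is a set comparison: a variant called in the tumor but absent from the matched control is a somatic candidate, while one present in both is germline. The sketch below shows only that logic, with hypothetical variant labels. Production pipelines use dedicated somatic callers that also model allele fractions, tumor purity and sequencing error, none of which is attempted here.

```python
def split_somatic_germline(tumor_calls: set[str], control_calls: set[str]) -> dict[str, set[str]]:
    """Partition tumor variant calls by whether the matched control carries them too."""
    return {
        "somatic": tumor_calls - control_calls,   # seen only in the tumor sample
        "germline": tumor_calls & control_calls,  # also present in the control sample
    }


# Hypothetical call sets for one tumor/control pair; not real caller output.
tumor = {"variant_a", "variant_b", "variant_c"}
control = {"variant_b", "variant_c", "variant_d"}
result = split_somatic_germline(tumor, control)
print(sorted(result["somatic"]), sorted(result["germline"]))
```

    Variants seen only in the control (here `variant_d`) are simply ignored, since the question is which tumor calls were acquired somatically.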

  10. Fine-scale mapping of natural variation in fly fecundity identifies neuronal domain of expression and function of an aquaporin.

    Directory of Open Access Journals (Sweden)

    Alan O Bergland

    Full Text Available To gain insight into the molecular genetic basis of standing variation in fitness-related traits, we identify a novel factor that regulates the molecular and physiological basis of natural variation in female Drosophila melanogaster fecundity. Genetic variation in female fecundity in flies derived from a wild orchard population is heritable and largely independent of other measured life history traits. We map a portion of this variation to a single QTL and then use deficiency mapping to further refine this QTL to 5 candidate genes. Ubiquitous expression of RNAi against only one of these genes, an aquaporin encoded by Drip, reduces fecundity. Within our mapping population, Drip mRNA level in the head, but not in other tissues, is positively correlated with fecundity. We localize Drip expression to a small population of corazonin-producing neurons located in the dorsolateral posterior compartments of the protocerebrum. Expression of Drip-RNAi using both the pan-neuronal ELAV-Gal4 and the Crz-Gal4 drivers reduces fecundity. Low-fecundity RILs (recombinant inbred lines) have decreased Crz expression and increased expression of pale, which encodes the enzyme catalyzing the rate-limiting step in the production of dopamine, a modulator of insect life histories. Taken together, these data suggest that natural variation in Drip expression in the corazonin-producing neurons contributes to standing variation in fitness by altering the concentration of two neurohormones.

  11. Blazar Anti-Sequence of Spectral Variation within Individual Blazars: Cases for Mrk 501 and 3C 279

    Energy Technology Data Exchange (ETDEWEB)

    Zhang, Jin; Zhang, Shuang-Nan; Liang, En-Wei, E-mail: zhang.jin@hotmail.com [National Astronomical Observatories, Chinese Academy of Sciences, Beijing 100012 (China)]

    2013-04-10

    The jet properties of Mrk 501 and 3C 279 are derived by fitting broadband spectral energy distributions (SEDs) with lepton models. The derived γ_b (the break Lorentz factor of the electron distribution) is 10⁴ to 10⁶ for Mrk 501 and ~200 to 600 for 3C 279. The magnetic field strength (B) of Mrk 501 is usually one order of magnitude lower than that of 3C 279, but their Doppler factors (δ) are comparable. A spectral variation feature in which the peak luminosity is correlated with the peak frequency, opposite to the trend of the blazar sequence, is observed in both sources. We find that (1) the peak luminosities of the two bumps in the SEDs of Mrk 501 depend on γ_b in both the observer and co-moving frames, but are not correlated with B and δ, and (2) the luminosity variation of 3C 279 is dominated by the external Compton (EC) peak, and its peak luminosity is correlated with γ_b and δ but anti-correlated with B. These results suggest that γ_b may govern the spectral variation of Mrk 501, whereas δ and B would be responsible for the spectral variation of 3C 279. The narrow distribution of γ_b and the correlation of γ_b and B in 3C 279 would be due to cooling by the EC process and the strong magnetic field. Based on our brief discussion, we propose that this spectral variation feature may originate from the instability of the corona rather than from variation of the accretion rate, which underlies the blazar sequence.

  12. Whole-exome sequencing, without prior linkage, identifies a mutation in LAMB3 as a cause of dominant hypoplastic amelogenesis imperfecta.

    Science.gov (United States)

    Poulter, James A; El-Sayed, Walid; Shore, Roger C; Kirkham, Jennifer; Inglehearn, Chris F; Mighell, Alan J

    2014-01-01

    The conventional approach to identifying the defective gene in a family with an inherited disease is to find the disease locus through family studies. However, the rapid development and decreasing cost of next-generation sequencing facilitate a more direct approach. Here, we report the identification of a frameshift mutation in LAMB3 as a cause of dominant hypoplastic amelogenesis imperfecta (AI). Whole-exome sequencing of three affected family members and subsequent filtering of shared variants, without prior genetic linkage, sufficed to identify the pathogenic variant. Simultaneous analysis of multiple family members confirms segregation, enhancing the power to filter the genetic variation found and leading to rapid identification of the pathogenic variant. LAMB3 encodes a subunit of Laminin-5, one of a family of basement membrane proteins with essential functions in cell growth, movement and adhesion. Homozygous LAMB3 mutations cause junctional epidermolysis bullosa (JEB), and enamel defects are seen in JEB cases. However, to our knowledge, this is the first report of dominant AI due to a LAMB3 mutation in the absence of JEB.
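    At its core, filtering shared variants without linkage means intersecting the variant calls of the affected relatives and then discarding variants known to be common in the population. A minimal sketch with hypothetical variant labels; the study's actual filtering criteria (frequency thresholds, predicted effect, databases consulted) are not given in the abstract and are not modelled here.

```python
def shared_candidate_variants(affected_calls: list[set[str]], common_variants: set[str]) -> set[str]:
    """Variants carried by every affected relative and absent from a common-variant list."""
    if not affected_calls:
        return set()
    shared = set.intersection(*affected_calls)
    return shared - common_variants


# Hypothetical calls for three affected relatives.
affected = [
    {"LAMB3_frameshift", "geneX_missense", "geneY_synonymous"},
    {"LAMB3_frameshift", "geneX_missense"},
    {"LAMB3_frameshift", "geneX_missense", "geneY_synonymous"},
]
common = {"geneX_missense"}  # hypothetical common polymorphism
print(shared_candidate_variants(affected, common))  # {'LAMB3_frameshift'}
```

    Each extra affected relative can only shrink the intersection, which is why sequencing several family members at once sharpens the filter.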

  13. Identification of mitochondrial DNA sequence variation and development of single nucleotide polymorphic markers for CMS-D8 in cotton.

    Science.gov (United States)

    Suzuki, Hideaki; Yu, Jiwen; Wang, Fei; Zhang, Jinfa

    2013-06-01

    Cytoplasmic male sterility (CMS), a maternally inherited trait controlled by novel chimeric genes in the mitochondrial genome, plays a pivotal role in the production of hybrid seed. In cotton, no PCR-based marker has been developed to discriminate CMS-D8 (from Gossypium trilobum) from its normal Upland cotton (AD1, Gossypium hirsutum) cytoplasm. The objective of the current study was to develop PCR-based single nucleotide polymorphic (SNP) markers from mitochondrial genes for the CMS-D8 cytoplasm. DNA sequence variants in mitochondrial genes involved in the oxidative phosphorylation chain, including ATP synthase subunits 1, 4, 6, 8 and 9 and cytochrome c oxidase subunits 1, 2 and 3, were identified by comparing CMS-D8 with its isogenic maintainer and restorer lines on the same nuclear genetic background. Allele-specific PCR (AS-PCR) was used for SNP typing by incorporating artificial mismatched nucleotides at the third or fourth base from the 3' terminus in both the specific and nonspecific primers. With these modified allele-specific primers, all eight SNPs tested yielded SNP markers, using eight primer pairs, that discriminate the two alleles between AD1 and CMS-D8 cytoplasms. Two of the SNPs, in atp1 and cox1, could also be used in combination to discriminate between CMS-D8 and CMS-D2 cytoplasms. Additionally, a PCR-based marker was developed from a nine-nucleotide insertion-deletion (InDel) sequence (AATTGTTTT) at positions 59-67 bp from the start codon of atp6, which is present in the CMS and restorer lines with the D8 cytoplasm but absent in the maintainer line with the AD1 cytoplasm. A SNP marker for two nucleotide substitutions (AA in AD1 cytoplasm to CT in CMS-D8 cytoplasm) in the intron (1,506 bp) of the cox2 gene was also developed. These PCR-based SNP markers should be useful in discriminating CMS-D8 and AD1 cytoplasms, or those with CMS-D2 cytoplasm as a rapid, simple, inexpensive, and
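    In allele-specific PCR, the primer's 3'-terminal base matches one allele, and a deliberately introduced mismatch a few bases upstream further destabilises extension from the non-matching template. The sketch below builds such a primer with the extra mismatch at the third or fourth base from the 3' end, following the design described in this abstract. The template sequence, the function name and the fixed base-swap rule are hypothetical, because the abstract does not say which mismatching bases were chosen.

```python
# Hypothetical rule for the deliberate mismatch; real designs choose the base empirically.
MISMATCH_SWAP = {"A": "C", "C": "A", "G": "T", "T": "G"}


def allele_specific_primer(upstream: str, allele: str, mismatch_pos: int = 3) -> str:
    """Forward primer ending on `allele`, with an extra mismatch `mismatch_pos` bases from the 3' end.

    Position 1 is the 3'-terminal (allele) base, so mismatch_pos must be at least 2.
    """
    primer = list(upstream.upper() + allele.upper())
    if not 2 <= mismatch_pos <= len(primer):
        raise ValueError("mismatch position must lie within the primer, upstream of the 3' base")
    i = len(primer) - mismatch_pos
    primer[i] = MISMATCH_SWAP[primer[i]]
    return "".join(primer)


template_upstream = "ACGTTGCAAG"  # hypothetical sequence immediately 5' of the SNP
for allele in ("T", "G"):
    print(allele, allele_specific_primer(template_upstream, allele))
```

    One primer is made per allele; both carry the same upstream mismatch, matching the abstract's note that the mismatch was placed in both the specific and nonspecific primers.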

  14. Saprolegniaceae identified on amphibian eggs throughout the Pacific Northwest, USA, by internal transcribed spacer sequences and phylogenetic analysis.

    Science.gov (United States)

    Petrisko, Jill E; Pearl, Christopher A; Pilliod, David S; Sheridan, Peter P; Williams, Charles F; Peterson, Charles R; Bury, R Bruce

    2008-01-01

    We assessed the diversity and phylogeny of Saprolegniaceae on amphibian eggs from the Pacific Northwest, with particular focus on Saprolegnia ferax, a species implicated in high egg mortality. We identified isolates from the eggs of six amphibian species using sequences of the internal transcribed spacer (ITS) and 5.8S gene regions and BLAST searches of the GenBank database. We identified 68 sequences as Saprolegniaceae and 43 sequences as true fungi from at least nine genera. Our phylogenetic analysis of the Saprolegniaceae included isolates within the genera Saprolegnia, Achlya and Leptolegnia. Our phylogeny grouped S. semihypogyna with Achlya rather than with the Saprolegnia reference sequences. We found only one isolate that grouped closely with S. ferax, and this came from a hatchery-raised salmon (Idaho) that we sampled opportunistically. We had representatives of 7-12 species and three genera of Saprolegniaceae on our amphibian eggs. Further work on the ecological roles of different species of Saprolegniaceae is needed to clarify their potential importance in amphibian egg mortality and potential links to population declines.

  15. Exome Sequencing Identified a Recessive RDH12 Mutation in a Family with Severe Early-Onset Retinitis Pigmentosa

    Directory of Open Access Journals (Sweden)

    Bo Gong

    2015-01-01

    Full Text Available Retinitis pigmentosa (RP) is the most important hereditary retinal disease caused by progressive degeneration of the photoreceptor cells. This study aimed to identify gene mutations responsible for autosomal recessive retinitis pigmentosa (arRP) in a Chinese family using next-generation sequencing technology. A Chinese family with 7 members, including two individuals affected with severe early-onset RP, was studied. All patients underwent a complete ophthalmic examination. Exome sequencing was performed on a single RP patient (the proband of this family), followed by direct Sanger sequencing of other family members and normal controls to confirm the causal mutations. A homozygous mutation c.437T… identified as being related to the phenotype of this arRP family. This homozygous mutation was detected in the two affected patients, but not in other family members or in 600 normal controls. Three other, unaffected members of the family were found to carry this missense mutation in the heterozygous state. Our results emphasize the importance of c.437T
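    The segregation reasoning used to support a recessive cause can be written as a simple check: affected relatives are homozygous for the variant, unaffected relatives are not, and heterozygous carriers are allowed. This sketch assumes full penetrance and encodes each genotype as the number of alternate alleles (0, 1 or 2); the function name, family member IDs and genotypes are hypothetical, not the study's pedigree.

```python
def fits_recessive(genotypes: dict[str, int], affected: set[str]) -> bool:
    """True if every affected person is homozygous (2 alt alleles) and no unaffected person is."""
    for person, alt_alleles in genotypes.items():
        is_homozygous = alt_alleles == 2
        if is_homozygous != (person in affected):
            return False
    return True


# Hypothetical seven-member family: two affected, three carriers, two non-carriers.
family = {"II-1": 2, "II-2": 2, "I-1": 1, "I-2": 1, "II-3": 1, "II-4": 0, "II-5": 0}
print(fits_recessive(family, affected={"II-1", "II-2"}))  # True
print(fits_recessive(family, affected={"II-1"}))          # False: II-2 is homozygous but unaffected
```

    Screening unrelated controls extends the same idea: a causal recessive variant should not appear homozygous in healthy individuals.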

  16. Hybridization-based antibody cDNA recovery for the production of recombinant antibodies identified by repertoire sequencing.

    Science.gov (United States)

    Valdés-Alemán, Javier; Téllez-Sosa, Juan; Ovilla-Muñoz, Marbella; Godoy-Lozano, Elizabeth; Velázquez-Ramírez, Daniel; Valdovinos-Torres, Humberto; Gómez-Barreto, Rosa E; Martinez-Barnetche, Jesús

    2014-01-01

    High-throughput sequencing of the antibody repertoire is enabling a thorough analysis of B cell diversity and clonal selection, which may improve the novel antibody discovery process. Theoretically, an adequate bioinformatic analysis could allow identification of candidate antigen-specific antibodies, requiring their recombinant production for experimental validation of their specificity. Gene synthesis is commonly used for the generation of recombinant antibodies identified in silico. Novel strategies that bypass gene synthesis could offer more accessible antibody identification and validation alternatives. We developed a hybridization-based recovery strategy that targets the heavy-chain complementarity-determining region 3 (CDRH3) for the enrichment of cDNA of candidate antigen-specific antibody sequences. Ten clonal groups of interest were identified through bioinformatic analysis of the heavy chain antibody repertoire of mice immunized with hen egg white lysozyme (HEL). cDNA from eight of the targeted clonal groups was recovered efficiently, leading to the generation of recombinant antibodies. One representative heavy chain sequence from each clonal group recovered was paired with previously reported anti-HEL light chains to generate full antibodies, later tested for HEL-binding capacity. The recovery process proposed represents a simple and scalable molecular strategy that could enhance antibody identification and specificity assessment, enabling a more cost-efficient generation of recombinant antibodies.
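    Clonal groups in repertoire data are often defined by shared V and J gene usage plus similar CDR3 sequences. As a rough illustration of the CDR3 part only, the sketch below greedily groups equal-length CDRH3 amino-acid sequences whose identity to a group's first member meets a threshold. The 80% cutoff, the function names and the sequences are assumptions, since the abstract does not describe the study's grouping criteria.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def group_cdr3(cdr3s: list[str], min_identity: float = 0.8) -> list[list[str]]:
    """Greedy grouping: join the first group whose founder has equal length and enough identity."""
    groups: list[list[str]] = []
    for seq in cdr3s:
        for group in groups:
            founder = group[0]
            if len(founder) == len(seq) and identity(founder, seq) >= min_identity:
                group.append(seq)
                break
        else:  # no existing group matched, so this sequence founds a new group
            groups.append([seq])
    return groups


# Hypothetical CDRH3 amino-acid sequences.
print(group_cdr3(["ARDYYGSSYFDY", "ARDYYGSSYFDV", "ARGGLRRAMDY"]))
```

    Greedy grouping depends on input order; tools built for repertoire analysis typically use more careful clustering, so treat this only as a picture of the idea.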

  17. Identifying Variations in Hydraulic Conductivity on the East River at Crested Butte, CO

    Science.gov (United States)

    Ulmer, K. N.; Malenda, H. F.; Singha, K.

    2016-12-01

    Slug tests are a widely used method to measure saturated hydraulic conductivity, or how easily water flows through an aquifer, by perturbing the piezometric surface and measuring the time the local groundwater table takes to re-equilibrate. Saturated hydraulic conductivity is crucial to calculating the speed and direction of groundwater movement, so it is important to document the variability of data from in situ slug tests. This study addresses two potential sources of data variability: different users and different types of slug. To test for user variability, two individuals slugged the same six wells with water multiple times at a stream meander on the East River near Crested Butte, CO. To test for variation due to slug type, multiple water and metal slug tests were performed at a single well in the same meander. The hydraulic conductivity distributions from each set of tests were then compared using both the Kruskal-Wallis test and the Brown-Forsythe test. The hydraulic conductivity distributions gathered by the two individuals were statistically similar. However, the two types of slug tests produced statistically dissimilar hydraulic conductivity distributions for the same well. In conclusion, multiple people should be able to conduct slug tests without creating considerable variation in the resulting hydraulic conductivity values, but only a single type of slug should be used for those tests.
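    The two tests named above ask different questions. Kruskal-Wallis compares the groups' distributions through ranks and is sensitive mainly to shifts in location, while Brown-Forsythe compares their spread by running a one-way ANOVA on absolute deviations from each group's median. The standard-library sketch below computes both statistics but leaves out the p-values, which would come from a chi-square distribution with k-1 degrees of freedom for H and an F distribution with k-1 and N-k degrees of freedom for the Brown-Forsythe statistic. The conductivity values are made up. If SciPy is available, `scipy.stats.kruskal` and `scipy.stats.levene` with `center='median'` compute the same tests with p-values.

```python
from statistics import mean, median


def average_ranks(values: list[float]) -> tuple[list[float], float]:
    """1-based ranks with ties averaged, plus the tie term sum(t**3 - t) over tie groups."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    return ranks, tie_term


def kruskal_wallis_h(*groups: list[float]) -> float:
    """Kruskal-Wallis H statistic with the standard correction for ties."""
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks, tie_term = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    return h / (1 - tie_term / (n ** 3 - n))


def brown_forsythe_f(*groups: list[float]) -> float:
    """One-way ANOVA F statistic on absolute deviations from each group's median."""
    devs = []
    for g in groups:
        m = median(g)
        devs.append([abs(x - m) for x in g])
    k, n = len(devs), sum(len(d) for d in devs)
    grand = mean([x for d in devs for x in d])
    group_means = [mean(d) for d in devs]
    between = sum(len(d) * (gm - grand) ** 2 for d, gm in zip(devs, group_means))
    within = sum((x - gm) ** 2 for d, gm in zip(devs, group_means) for x in d)
    return (n - k) / (k - 1) * between / within


# Made-up hydraulic conductivities (m/s) for two slug types at one well.
water_slug = [2.1e-4, 2.4e-4, 1.9e-4, 2.2e-4]
metal_slug = [3.0e-4, 3.6e-4, 2.7e-4, 3.3e-4]
print(kruskal_wallis_h(water_slug, metal_slug), brown_forsythe_f(water_slug, metal_slug))
```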

  18. Validation of rearrangement break points identified by paired-end sequencing in natural populations of Drosophila melanogaster.

    Science.gov (United States)

    Cridland, Julie M; Thornton, Kevin R

    2010-01-13

    Several recent studies have focused on the evolution of recently duplicated genes in Drosophila. Currently, however, little is known about the evolutionary forces acting on duplications that are segregating in natural populations. We used a high-throughput, paired-end sequencing platform (Illumina) to identify structural variants in a population sample of African D. melanogaster. Polymerase chain reaction and sequencing confirmation of duplications detected by multiple, independent read pairs showed that paired-end sequencing reliably uncovered the break points of structural rearrangements and allowed us to identify a number of tandem duplications segregating within a natural population. Our confirmation experiments show that rates of confirmation are very high, even at modest coverage. Our results also compare well with previous studies using microarrays (Emerson J, Cardoso-Moreira M, Borevitz JO, Long M. 2008. Natural selection shapes genome wide patterns of copy-number polymorphism in Drosophila melanogaster. Science. 320:1629-1631; and Dopman EB, Hartl DL. 2007. A portrait of copy-number polymorphism in Drosophila melanogaster. Proc Natl Acad Sci U S A. 104:19920-19925), which both gives us confidence in the results of this study and confirms the previous microarray results. We were also able to identify whole-gene duplications, such as a novel duplication of the olfactory receptor Or22a, and copy-number differences in genes previously known to be under positive selection, such as Cyp6g1, which confers resistance to dichlorodiphenyltrichloroethane (DDT). Several duplication "hot spots" were detected, indicating that particular regions of the genome may be more prone to generating duplications. Finally, population frequency analysis of confirmed events showed an excess of rare variants in our population, which indicates that duplications segregating in the population may be deleterious and ultimately destined to be lost from the
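Paired-end break-point detection rests on read-pair orientation and insert size. As a minimal sketch, assuming a standard Illumina forward-reverse (FR) library, the classifier below labels a single read pair; the function name and the `max_insert` cutoff are hypothetical. Real callers cluster several discordant pairs supporting the same event, as in this study, which required duplications to be detected by multiple independent read pairs.

```python
def classify_pair(pos1, strand1, pos2, strand2, max_insert=500):
    """Classify one read pair mapped to the same chromosome.

    Assumes a forward-reverse (FR) library: in a concordant pair the
    upstream mate maps to '+' and the downstream mate to '-'.
    pos1/pos2 are leftmost mapping coordinates; strands are '+' or '-'.
    max_insert is a hypothetical cutoff for a concordant fragment length.
    """
    (left_pos, left_strand), (right_pos, right_strand) = sorted(
        [(pos1, strand1), (pos2, strand2)]
    )
    if left_strand == right_strand:
        # FF or RR: one mate lies in sequence that is inverted in the sample.
        return "inversion"
    if left_strand == "-" and right_strand == "+":
        # RF (outward-facing) mates: the signature of a tandem duplication.
        return "tandem_duplication"
    # FR orientation: concordant unless the mates map too far apart.
    # (Distance between leftmost coordinates approximates the insert size.)
    if right_pos - left_pos > max_insert:
        return "deletion"
    return "concordant"


print(classify_pair(100, "+", 400, "-"))   # concordant
print(classify_pair(5000, "+", 100, "-"))  # tandem_duplication
```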

  19. Genome-wide linkage, exome sequencing and functional analyses identify ABCB6 as the pathogenic gene of dyschromatosis universalis hereditaria.

    Directory of Open Access Journals (Sweden)

    Hong Liu

    As a genetic disorder of abnormal pigmentation, dyschromatosis universalis hereditaria (DUH) had an unclear molecular basis until ABCB6 was recently reported as a causative gene. We performed a genome-wide linkage scan using the Illumina Human 660W-Quad BeadChip and exome sequencing using Agilent SureSelect Human All Exon Kits in a multiplex Chinese DUH family to identify the pathogenic mutations, and verified the candidate mutations by Sanger sequencing. Quantitative RT-PCR and immunohistochemistry were performed to verify expression of the pathogenic gene, and zebrafish were used to confirm the functional role of ABCB6 in melanocytes and pigmentation. Genome-wide linkage (assuming autosomal dominant inheritance) and exome sequencing analyses identified ABCB6 as the disease candidate gene by discovering a coding mutation (c.1358C>T; p.Ala453Val) that co-segregates with the disease phenotype. Further mutation analysis of ABCB6 by Sanger sequencing in four other DUH families and two sporadic cases confirmed this mutation (c.1358C>T; p.Ala453Val) and discovered a second co-segregating coding mutation (c.964A>C; p.Ser322Lys) in one of the four families. Both mutations were heterozygous in DUH patients and absent from the 1000 Genomes Project and dbSNP databases, as well as from 1,516 unrelated healthy Chinese controls. Expression analysis in human skin and mutagenesis interrogation in zebrafish confirmed the functional role of ABCB6 in melanocytes and pigmentation. Given the involvement of ABCB6 mutations in coloboma, we performed ophthalmological examinations of the DUH carriers of ABCB6 mutations and found ocular abnormalities in them. Our study has advanced our understanding of DUH pathogenesis and revealed a shared pathological mechanism between pigmentary DUH and ocular coloboma.
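HGVS coding-DNA positions (c.N) count from the A of the ATG start codon, so the affected codon and the position within it follow from integer division. The sketch below, using the standard genetic code, shows that c.1358 falls at position 2 of codon 453 and that a C>T change there turns any alanine codon (GCN) into a valine codon (GTN), consistent with p.Ala453Val. The reference codon `GCC` is a hypothetical stand-in; the actual ABCB6 codon is not given in the abstract.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def codon_of(c_pos):
    """Map a 1-based coding position (c.N) to (codon number, position in codon)."""
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1


def substitute(codon, pos_in_codon, alt):
    """Return the codon with the base at pos_in_codon (1-3) replaced by alt."""
    i = pos_in_codon - 1
    return codon[:i] + alt + codon[i + 1:]


codon_number, pos = codon_of(1358)   # (453, 2)
ref_codon = "GCC"                    # hypothetical alanine codon
alt_codon = substitute(ref_codon, pos, "T")
print(codon_number, CODON_TABLE[ref_codon], "->", CODON_TABLE[alt_codon])  # 453 A -> V
```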

  20. Porcine MYF6 gene: sequence, homology analysis, and variation in the promoter region.

    Science.gov (United States)

    Wyszyńska-Koko, J; Kurył, J

    2004-01-01

    The MYF6 gene codes for a bHLH transcription factor belonging to the MyoD family. Its expression accompanies the differentiation and maturation of myotubes during embryogenesis and continues at a relatively high level after birth, affecting the muscle phenotype. The porcine MYF6 gene was amplified, sequenced, and compared with MYF6 gene sequences of other species. The amino acid sequence was deduced and an interspecies homology analysis was performed. The Myf-6 protein is highly conserved: pig shares 99% and 97% identity with cow and human, respectively, and 93% identity with mouse and rat. A single nucleotide polymorphism (SNP), a T --> C transition recognized by the MspI restriction enzyme, was found within the promoter region.
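A SNP that creates or destroys a restriction site can be genotyped by PCR-RFLP. MspI recognizes CCGG and cuts after the first base (C^CGG). The sketch below predicts fragment lengths for two hypothetical amplicon alleles; the sequences are invented for illustration and are not the porcine MYF6 promoter, and the abstract does not state which allele carries the site.

```python
MSPI_SITE = "CCGG"
CUT_OFFSET = 1  # MspI cuts C^CGG, i.e. after the first base of the site


def mspi_fragment_lengths(seq):
    """Predicted fragment lengths after complete MspI digestion of a linear amplicon."""
    seq = seq.upper()
    # CCGG is palindromic, so scanning one strand finds every site.
    cuts = []
    start = seq.find(MSPI_SITE)
    while start != -1:
        cuts.append(start + CUT_OFFSET)
        start = seq.find(MSPI_SITE, start + 1)
    bounds = [0] + cuts + [len(seq)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical alleles differing by a single T/C transition.
allele_t = "GATTACACTGGATCA"
allele_c = "GATTACACCGGATCA"
print(mspi_fragment_lengths(allele_t))  # [15]
print(mspi_fragment_lengths(allele_c))  # [8, 7]
```

A heterozygous animal would show both patterns on a gel.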

  1. Private selective sweeps identified from next-generation pool-sequencing reveal convergent pathways under selection in two inbred Schistosoma mansoni strains.

    Directory of Open Access Journals (Sweden)

    Julie A J Clément

    BACKGROUND: The trematode flatworms of the genus Schistosoma, the causative agents of schistosomiasis, are among the most prevalent parasites in humans, affecting more than 200 million people worldwide. In this study, we focused on two well-characterized strains of S. mansoni to explore signatures of selection. Both strains are highly inbred and differ in life history traits, in particular in their compatibility with the intermediate host Biomphalaria glabrata. METHODOLOGY/PRINCIPAL FINDINGS: We performed high-throughput sequencing of DNA from pools of individuals of each strain using Illumina technology and identified single nucleotide polymorphisms (SNPs) and copy number variations (CNVs). In total, 708,898 SNPs and roughly 2,000 CNVs were identified. The SNPs revealed low nucleotide diversity within each strain (π = 2 × 10^-4) and a high level of differentiation between them (Fst = 0.73). Using a recently developed in silico approach, we further detected 12 and 19 private (i.e., specific, non-overlapping) selective sweeps among the 121 and 151 sweeps found in total in each strain. CONCLUSIONS/SIGNIFICANCE: Functional annotation of transcripts lying in the private selective sweeps revealed specific selection for functions related to parasitic interaction (e.g., cell-cell adhesion or redox reactions). Despite high differentiation between strains, we identified evolutionary convergence of genes related to proteolysis, a known key virulence factor and a potential target for drug and vaccine development. Our data show that pool-sequencing can be used to detect selective sweeps in parasite populations and to identify biological functions under selection.
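Nucleotide diversity (π) is the mean proportion of differing sites between pairs of sequences, and one common Fst estimator (Hudson's) compares within- and between-population diversity as Fst = 1 - π_within / π_between. The sketch below assumes aligned, equal-length haplotypes sampled from individuals. Pool-sequencing data such as those in this study need estimators that account for pooled sampling, so this only illustrates the quantities and is not the authors' method.

```python
from itertools import combinations, product


def diff_per_site(a, b):
    """Proportion of aligned sites at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b)) / len(a)


def nucleotide_diversity(seqs):
    """pi: mean pairwise proportion of differing sites within one sample."""
    pairs = list(combinations(seqs, 2))
    return sum(diff_per_site(a, b) for a, b in pairs) / len(pairs)


def hudson_fst(pop1, pop2):
    """Hudson-style Fst = 1 - (mean within-population pi) / (between-population pi)."""
    pi_within = (nucleotide_diversity(pop1) + nucleotide_diversity(pop2)) / 2
    pi_between = sum(diff_per_site(a, b) for a, b in product(pop1, pop2)) / (
        len(pop1) * len(pop2)
    )
    return 1 - pi_within / pi_between


# Toy haplotypes for two hypothetical strains.
strain_a = ["AAAA", "AAAT"]
strain_b = ["CCAA", "CCAT"]
print(nucleotide_diversity(strain_a))            # 0.25
print(round(hudson_fst(strain_a, strain_b), 3))  # 0.6
```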

  2. Thickness of patellofemoral articular cartilage as measured on MR imaging: sequence comparison of accuracy, reproducibility, and interobserver variation

    Energy Technology Data Exchange (ETDEWEB)

    Van Leersum, M.D. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Schweitzer, M.E. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Gannon, F. [Dept. of Pathology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Vinitski, S. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Finkel, G. [Dept. of Pathology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States); Mitchell, D.G. [Dept. of Radiology, Thomas Jefferson Univ. Hospital, Philadelphia, PA (United States)

    1995-08-01

    This study was undertaken to assess the accuracy, precision, and reliability of magnetic resonance (MR) measurements of articular cartilage. Fifteen cadaveric patellas were imaged in the axial plane at 1.5 T. Gradient echo and fat-suppressed FSE, T2-weighted, proton density, and T1-weighted sequences were performed. We measured each 5-mm section separately at three standardized positions, giving a total of 900 measurements. These findings were correlated with independently performed measurements of the corresponding anatomic sections. One hundred randomly selected measurements were also evaluated for reproducibility and interobserver variation. Although all sequences were highly accurate, the T1-weighted images were the most accurate, with a mean difference of 0.25 mm and a correlation coefficient of 0.85. All sequences were also highly reproducible, with little interobserver variation. In an attempt to further improve the accuracy of the MR measurements, we retrospectively evaluated all measurements that differed by more than 1 mm from the specimen. All of these differences were attributable to focal defects that exaggerated the apparent thickness on MR imaging. (orig.)
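The accuracy figures above come from comparing paired MR and anatomic measurements. A minimal sketch of such a comparison follows, computing the mean absolute difference, Pearson's correlation coefficient, and the number of pairs that disagree by more than 1 mm. The abstract does not say whether its "mean difference" is signed or absolute, so the absolute form here is an assumption, and the thickness values are made up.

```python
from math import sqrt


def agreement(mr_mm, anatomic_mm, tolerance_mm=1.0):
    """Compare paired thickness measurements (same order, same units).

    Returns (mean absolute difference, Pearson r, count of pairs whose
    absolute difference exceeds tolerance_mm).
    """
    n = len(mr_mm)
    if n != len(anatomic_mm) or n < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [abs(a - b) for a, b in zip(mr_mm, anatomic_mm)]
    mean_abs_diff = sum(diffs) / n
    mean_x, mean_y = sum(mr_mm) / n, sum(anatomic_mm) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(mr_mm, anatomic_mm))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in mr_mm))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in anatomic_mm))
    r = cov / (sd_x * sd_y)
    outliers = sum(d > tolerance_mm for d in diffs)
    return mean_abs_diff, r, outliers


# Hypothetical cartilage thickness readings (mm).
mr = [2.1, 3.0, 4.6, 2.8]
specimen = [2.0, 3.2, 3.4, 2.7]
print(agreement(mr, specimen))
```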

  3. Thickness of patellofemoral articular cartilage as measured on MR imaging: sequence comparison of accuracy, reproducibility, and interobserver variation

    International Nuclear Information System (INIS)

    Van Leersum, M.D.; Schweitzer, M.E.; Gannon, F.; Vinitski, S.; Finkel, G.; Mitchell, D.G.

    1995-01-01


  4. Intraspecific variations in Cyt b and D-loop sequences of Testudine species, Lissemys punctata from south Karnataka

    Directory of Open Access Journals (Sweden)

    R. Lalitha

    2018-01-01

    Freshwater Testudine species have gained importance in recent years, as most of their populations are threatened by exploitation for use as a delicacy and for the pet trade. In this regard, Lissemys punctata, a freshwater terrapin distributed predominantly in Asian countries, was chosen for study. This investigation presents a pilot study of mitochondrial markers (Cyt b and D-loop) in L. punctata from southern Karnataka, India. The complete regions, spanning 1.14 kb and ∼1 kb, were amplified by hot-start PCR and sequenced by Sanger sequencing. The Cyt b sequence revealed 85 substitution sites, no indels, and 17 parsimony-informative sites, whereas the D-loop showed 189 variable sites and 51 parsimony-informative sites, with 5′ functional domains (TAS, CSB-F, and CSBs 1, 2, and 3) preceding a tandem repeat at the 3′ end. The data highlight intraspecific variation in these target regions, and analysis with suitable evolutionary models indicates that the point mutations observed in these regions are transitions that cause no structural or functional alterations. The mitochondrial data uncover genetic diversity within the species, and conservationists can use them to estimate effective population size or for forensic identification of animals or seized material during unlawful trade.
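A site is variable if at least two nucleotides occur in its alignment column, and parsimony-informative if at least two nucleotides each occur in at least two sequences. The sketch below counts both and classifies a substitution as a transition (A<->G, C<->T) or a transversion. It assumes a pre-computed alignment and simply ignores gap characters; the function names and the toy alignment are hypothetical.

```python
from collections import Counter

TRANSITION_PAIRS = {frozenset("AG"), frozenset("CT")}


def site_summary(alignment, gap="-"):
    """Return (variable sites, parsimony-informative sites) for aligned sequences."""
    variable = informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column if base != gap)
        if len(counts) < 2:
            continue  # invariant column (or only gaps)
        variable += 1
        # Informative: at least two states, each present in >= 2 sequences.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
    return variable, informative


def is_transition(a, b):
    """True for purine<->purine or pyrimidine<->pyrimidine changes."""
    return frozenset((a, b)) in TRANSITION_PAIRS


alignment = ["AAGTA", "AAGCA", "ATGCA", "ATGTG"]
print(site_summary(alignment))                           # (3, 2)
print(is_transition("C", "T"), is_transition("A", "C"))  # True False
```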

  5. Cytoplasmic male sterility-associated chimeric open reading frames identified by mitochondrial genome sequencing of four Cajanus genotypes.

    Science.gov (United States)

    Tuteja, Reetu; Saxena, Rachit K; Davila, Jaime; Shah, Trushar; Chen, Wenbin; Xiao, Yong-Li; Fan, Guangyi; Saxena, K B; Alverson, Andrew J; Spillane, Charles; Town, Christopher; Varshney, Rajeev K

    2013-10-01

    The hybrid pigeonpea (Cajanus cajan) breeding technology based on cytoplasmic male sterility (CMS) is currently unique among legumes and displays major potential for yield increase. CMS is defined as a condition in which a plant is unable to produce functional pollen grains. The novel chimeric open reading frames (ORFs) produced as a result of mitochondrial genome rearrangements are considered the main cause of CMS. To identify these CMS-related ORFs in pigeonpea, we sequenced the mitochondrial genomes of three C. cajan lines (the male-sterile line ICPA 2039, the maintainer line ICPB 2039, and the hybrid line ICPH 2433) and of the wild relative Cajanus cajanifolius (ICPW 29). A single, circular-mapping molecule of 545.7 kb was assembled and annotated for the ICPA 2039 line. Sequence annotation predicted 51 genes, including 34 protein-coding and 17 RNA genes. Comparison of the mitochondrial genomes from different Cajanus genotypes identified 31 ORFs that differ between lines with and without CMS. Among these chimeric ORFs, 13 were identified by comparing the related male-sterile and maintainer lines. These ORFs display features known to trigger CMS in other plant species and represent the most promising candidates for CMS-related mitochondrial rearrangements in pigeonpea.

  6. Targeted/exome sequencing identified mutations in ten Chinese patients diagnosed with Noonan syndrome and related disorders

    Directory of Open Access Journals (Sweden)

    Shanshan Xu

    2017-10-01

    Background: Noonan syndrome (NS) and Noonan syndrome with multiple lentigines (NSML) are autosomal dominant developmental disorders caused by abnormalities in genes that encode proteins related to the RAS-MAPK pathway, including PTPN11, RAF1, BRAF, and MAP2K. In this study, we diagnosed ten NS or NSML patients via targeted sequencing or whole exome sequencing (TS/WES). Methods: TS/WES was performed to identify mutations in ten Chinese patients who exhibited the following manifestations: potential facial dysmorphisms, short stature, congenital heart defects, and developmental delay. Sanger sequencing was used to confirm the suspected pathological variants in the patients and their family members. Results: TS/WES revealed three mutations in PTPN11, three in RAF1, and four in BRAF in the NS and NSML patients, who had previously been diagnosed based on the clinical features above. All the identified mutations were de novo. However, two patients who carried the same RAF1 mutation presented different clinical features: the patient with multiple lentigines was diagnosed with NSML, while the patient without lentigines was diagnosed with NS. In addition, a patient who carried a hotspot BRAF mutation was diagnosed with NS instead of cardiofaciocutaneous syndrome (CFCS). Conclusions: TS/WES has emerged as a useful tool for definitive diagnosis and accurate genetic counseling in atypical cases. In this study, we analyzed ten Chinese patients diagnosed with NS and related disorders and identified their corresponding PTPN11, RAF1, and BRAF mutations. Among the target genes, BRAF showed the same degree of correlation with NS incidence as PTPN11 or RAF1.

  7. Sequence diversity and copy number variation of Mutator-like transposases in wheat

    Directory of Open Access Journals (Sweden)

    Nobuaki Asakura

    2008-01-01

    Partial transposase-coding sequences of Mutator-like elements (MULEs) were isolated from a wild einkorn wheat, Triticum urartu, by degenerate PCR. The isolated sequences were classified into the MuDR or Class I clade and divided into two distinct subclasses (subclass I and subclass II). The average pairwise identity between members of the two subclasses was 58.8% at the nucleotide sequence level. Sequence diversity was greater in subclass I than in subclass II. DNA gel blot analysis showed that subclass I was present as low-copy-number elements in the genomes of all Triticum and Aegilops accessions surveyed, while subclass II was present as high-copy-number elements. The two subclasses seem incapable of recognizing each other for transposition. The copy number of subclass II elements was much higher in Aegilops with the S, Sl, and D genomes and in polyploid Triticum species than in diploid Triticum with the A genome, indicating that active transposition occurred in the S, Sl, and D genomes before polyploidization. DNA gel blot analysis of six species selected from three subfamilies of Poaceae demonstrated that only the tribe Triticeae possessed both subclasses. These results suggest that the two subclasses differentiated before or immediately after the establishment of the tribe Triticeae.

  8. Sequencing the CHO DXB11 genome reveals regional variations in genomic stability and haploidy

    DEFF Research Database (Denmark)

    Kaas, Christian Schrøder; Kristensen, Claus; Betenbaugh, Michael J.

    2015-01-01

    Background: The DHFR-negative CHO DXB11 cell line (also known as DUX-B11 and DUKX) was historically the first CHO cell line used for large-scale production of heterologous proteins and is still used to produce a number of complex proteins. Results: Here we present the genomic sequence...... of the CHO DXB11 genome sequenced to a depth of 33x. Overall, a significant genomic drift favoring GC → AT point mutations was seen, in line with the chemical mutagenesis strategy used to generate the cell line. The sequencing depth for each gene in the genome revealed distinct peaks at sequencing...... in eight additional analyzed CHO genomes (15-20% haploidy) but not in the genome of the Chinese hamster. The dhfr gene is confirmed to be haploid and transcriptionally active in CHO DXB11, and the remaining allele contains a G410C point mutation causing a Thr137Arg missense mutation. We find ∼2...

  9. Impact of variations in fatty liver on sonographic detection of focal hepatic lesions originally identified by CT

    OpenAIRE

    Wu, Size; Tu, Rong; Nan, Ruixia; Liu, Guangqing; Cui, Xiaojing; Liang, Xian

    2015-01-01

    Purpose: The aim of this study was to investigate the influence of variations in fatty liver on the ultrasonographic detection of focal liver lesions. Methods: A total of 229 patients with varying degrees of fatty liver and focal liver lesions and 200 patients with focal liver lesions but no fatty liver were randomly selected for inclusion in groups I and II, respectively. Findings of focal liver lesions identified on computed tomography were taken as the reference, and findings on ultrasonog...

  10. Solution of the problem of the identified minimum for the tri-variate ...

    Indian Academy of Sciences (India)

    …identified minimum is considered below has zero means and distinct variances. The solution … and a non-singular covariance matrix $\Sigma$, where $\Sigma_{ij} = \rho_{ij}\sigma_i\sigma_j$ for $i$ … (i) through (iv) above, we can use (4.29) to identify $a_{21}^2$, $a_{31}^2$, $a_{12}^2$, $a_{32}^2$ uniquely. Now we consider (4.28). In this case, there are two possibilities: ($A_1^2$, $B^2$ …

  11. Striking structural dynamism and nucleotide sequence variation of the transposon Galileo in the genome of Drosophila mojavensis.

    Science.gov (United States)

    Marzo, Mar; Bello, Xabier; Puig, Marta; Maside, Xulio; Ruiz, Alfredo

    2013-02-04

    Galileo is a transposable element responsible for the generation of three chromosomal inversions in natural populations of Drosophila buzzatii. Although the most characteristic feature of Galileo is its long, internally repetitive terminal inverted repeats (TIRs), which resemble those of the Drosophila Foldback element, its transposase-coding sequence has led to its classification as a member of the P-element superfamily (Class II, subclass 1, TIR order). Furthermore, Galileo is widely distributed in the genus Drosophila: it has been found in 6 of the 12 sequenced Drosophila genomes. Among these species, D. mojavensis, the one most closely related to D. buzzatii, showed the highest diversity in the sequence and structure of Galileo elements. In the present work, we carried out a thorough search and annotation of all the Galileo copies present in the sequenced D. mojavensis genome. In our set of 170 Galileo copies we detected 5 Galileo subfamilies (C, D, E, F, and X) with structures ranging from nearly complete copies to copies consisting of only two TIRs or a solo TIR. Finally, we explored the structural and length variation of the Galileo copies, which points to relatively frequent rearrangements within and between Galileo elements. Different mechanisms responsible for these rearrangements are discussed. Although Galileo is a transposable element with an ancient history in the D. mojavensis genome, our data indicate recent transpositional activity. Furthermore, the dynamism in sequence and structure, mainly affecting the TIRs, suggests an active exchange of sequences among the copies. This exchange could lead to new subfamilies of the transposon, which could be crucial for the long-term survival of the element in the genome.

  12. T cell receptor sequencing of early-stage breast cancer tumors identifies altered clonal structure of the T cell repertoire.

    Science.gov (United States)

    Beausang, John F; Wheeler, Amanda J; Chan, Natalie H; Hanft, Violet R; Dirbas, Frederick M; Jeffrey, Stefanie S; Quake, Stephen R

    2017-11-28

    Tumor-infiltrating T cells play an important role in many cancers, and can improve prognosis and yield therapeutic targets. We characterized T cells infiltrating both breast cancer tumors and the surrounding normal breast tissue to identify T cells specific to each, as well as their abundance in peripheral blood. Using immune profiling of the T cell beta-chain repertoire in 16 patients with early-stage breast cancer, we show that the clonal structure of the tumor differs significantly from that of adjacent breast tissue, with the tumor containing a ∼2.5-fold greater density of T cells and higher clonality than normal breast. The clonal structures of T cells in blood and normal breast are more similar to each other than those of blood and tumor, and this difference could be used to distinguish tumor from normal breast tissue in 14 of 16 patients. Many T cell sequences overlap between tissue and blood from the same patient, including ∼50% of T cells between tumor and normal breast. Both tumor and normal breast contain high-abundance "enriched" sequences that are absent or of low abundance in the other tissue. Many of these T cells are either not detected or detected at very low frequency in the blood, suggesting the existence of separate compartments of T cells in both tumor and normal breast. Enriched T cell sequences are typically unique to each patient, but a subset is shared among many different patients. We show that many of these are commonly generated sequences, and thus unlikely to play an important role in the tumor microenvironment. Copyright © 2017 the Author(s). Published by PNAS.
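The abstract does not define "clonality". One widely used definition in immune-repertoire sequencing is 1 minus the Shannon entropy of clone frequencies divided by its maximum, log(number of clones), so a perfectly even repertoire scores 0 and a monoclonal one scores 1. The sketch below assumes that definition and uses hypothetical clone counts.

```python
from math import log


def clonality(clone_counts):
    """1 - normalized Shannon entropy of clone frequencies (0 = even, 1 = monoclonal)."""
    counts = [c for c in clone_counts if c > 0]
    if not counts:
        raise ValueError("no clones observed")
    if len(counts) == 1:
        return 1.0
    total = sum(counts)
    entropy = -sum((c / total) * log(c / total) for c in counts)
    return 1.0 - entropy / log(len(counts))


# Hypothetical TCR beta clone counts.
tumor = [120, 40, 10, 5, 5]
normal = [30, 28, 25, 22, 20]
print(round(clonality(tumor), 3), round(clonality(normal), 3))
```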

  13. Classification of EEG signals to identify variations in attention during motor task execution.

    Science.gov (United States)

    Aliakbaryhosseinabadi, Susan; Kamavuako, Ernest Nlandu; Jiang, Ning; Farina, Dario; Mrachacz-Kersting, Natalie

    2017-06-01

    Brain-computer interface (BCI) systems in neuro-rehabilitation use brain signals to control external devices. User status, such as attention, affects BCI performance; thus, detecting drift in the user's attention due to internal or external factors is essential for high detection accuracy. An auditory oddball task was applied to divert the users' attention during a simple ankle dorsiflexion movement. Electroencephalogram signals were recorded from eighteen channels. Temporal and time-frequency features were projected to a lower-dimensional space and used to analyze the effect of two attention levels on motor tasks in each participant. Then, a global feature distribution was constructed from the projected time-frequency features of all participants from all channels and applied for attention classification during motor movement execution. Time-frequency features led to significantly better classification results than temporal features, particularly for electrodes located over the motor cortex. Motor cortex channels had higher accuracy than other channels in the global discrimination of attention level. Previous methods have used attention to a task to drive external devices, such as the P300 speller; here, however, we focus for the first time on the effect of attention drift while performing a motor task. Time-frequency features make it possible to explore variations in the user's attention during motor tasks in synchronous BCI systems. This is the first step towards an adaptive real-time BCI with an integrated function to reveal attention shifts from the motor task. Copyright © 2017 Elsevier B.V. All rights reserved.

  14. Chimeric proteins for detection and quantitation of DNA mutations, DNA sequence variations, DNA damage and DNA mismatches

    Science.gov (United States)

    McCutchen-Maloney, Sandra L.

    2002-01-01

    Chimeric proteins having both DNA mutation binding activity and nuclease activity are synthesized by recombinant technology. The proteins are of the general formula A-L-B and B-L-A where A is a peptide having DNA mutation binding activity, L is a linker and B is a peptide having nuclease activity. The chimeric proteins are useful for detection and identification of DNA sequence variations including DNA mutations (including DNA damage and mismatches) by binding to the DNA mutation and cutting the DNA once the DNA mutation is detected.

  15. Population genetic implications from sequence variation in four Y chromosome genes.

    Science.gov (United States)

    Shen, P; Wang, F; Underhill, P A; Franco, C; Yang, W H; Roxas, A; Sung, R; Lin, A A; Hyman, R W; Vollrath, D; Davis, R W; Cavalli-Sforza, L L; Oefner, P J

    2000-06-20

    Some insight into human evolution has been gained from the sequencing of four Y chromosome genes. Primary genomic sequencing determined gene SMCY to be composed of 27 exons that comprise 4,620 bp of coding sequence. The unfinished sequencing of the 5' portion of gene UTY1 was completed by primer walking, and a total of 20 exons were found. Using denaturing HPLC, these two genes, as well as DBY and DFFRY, were screened for polymorphic sites in 53-72 representatives of the five continents. A total of 98 variants were found, yielding nucleotide diversity estimates of 2.45 × 10^-5, 5.07 × 10^-5, and 8.54 × 10^-5 for the coding regions of SMCY, DFFRY, and UTY1, respectively, with no variant observed in DBY. In agreement with most autosomal genes, diversity estimates for the noncoding regions were about 2- to 3-fold higher, ranging from 9.16 × 10^-5 to 14.2 × 10^-5 for the four genes. Analysis of the frequencies of derived alleles for all four genes showed that they more closely fit the expectation of a Luria-Delbrück distribution than a distribution expected under a constant population size model, providing evidence for exponential population growth. Pairwise nucleotide mismatch distributions date the population expansion to approximately 28,000 years ago. This estimate is in accord with the spread of Aurignacian technology and the disappearance of the Neanderthals.
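A pairwise mismatch distribution is the histogram of the number of differences between every pair of sequences; a unimodal, wave-like shape is read as a signature of population expansion. Under the sudden-expansion model of Rogers and Harpending, the parameter τ relates to elapsed time by τ = 2ut, where u is the mutation rate per sequence per generation. The sketch below computes the distribution and converts τ to years; it is one common approach, not necessarily the authors' procedure, and all numbers in the usage lines are hypothetical.

```python
from collections import Counter
from itertools import combinations


def mismatch_distribution(seqs):
    """Map k -> number of sequence pairs differing at exactly k aligned sites."""
    return Counter(
        sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)
    )


def expansion_time_years(tau, mu_per_site_per_gen, n_sites, generation_years):
    """Convert tau to years using tau = 2 * u * t, with u = mu * n_sites."""
    u = mu_per_site_per_gen * n_sites  # mutation rate per sequence per generation
    generations = tau / (2 * u)
    return generations * generation_years


print(mismatch_distribution(["AAAA", "AAAT", "AATT"]))  # Counter({1: 2, 2: 1})
print(expansion_time_years(2.0, 1e-8, 10_000, 25))      # ~250000 years
```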

  16. Cross-comparison of the genome sequences from human, chimpanzee, Neanderthal and a Denisovan hominin identifies novel potentially compensated mutations

    Directory of Open Access Journals (Sweden)

    Zhang Guojie

    2011-07-01

    The recent publication of the draft genome sequences of the Neanderthal and a ~50,000-year-old archaic hominin from Denisova Cave in southern Siberia has ushered in a new age in molecular archaeology. We previously cross-compared the human, chimpanzee and Neanderthal genome sequences with respect to a set of disease-causing/disease-associated missense and regulatory mutations (Human Gene Mutation Database) and succeeded in identifying genetic variants which, although apparently pathogenic in humans, may represent a 'compensated' wild-type state in at least one of the other two species. Here, in an attempt to identify further 'potentially compensated mutations' (PCMs) of interest, we have compared our dataset of disease-causing/disease-associated mutations with their corresponding nucleotide positions in the Denisovan hominin, Neanderthal and chimpanzee genomes. Of the 15 human putatively disease-causing mutations that were found to be compensated in chimpanzee, Denisovan or Neanderthal, only a solitary F5 variant (Val1736Met) was specific to the Denisovan. In humans, this missense mutation is associated with activated protein C resistance and an increased risk of thromboembolism and recurrent miscarriage. It is unclear at this juncture whether this variant was indeed a PCM in the Denisovan or whether it could instead have been associated with disease in this ancient hominin.

  17. Lariat sequencing in a unicellular yeast identifies regulated alternative splicing of exons that are evolutionarily conserved with humans.

    Science.gov (United States)

    Awan, Ali R; Manfredo, Amanda; Pleiss, Jeffrey A

    2013-07-30

    Alternative splicing is a potent regulator of gene expression that vastly increases proteomic diversity in multicellular eukaryotes and is associated with organismal complexity. Although alternative splicing is widespread in vertebrates, little is known about the evolutionary origins of this process, in part because of the absence of phylogenetically conserved events that cross major eukaryotic clades. Here we describe a lariat-sequencing approach, which offers high sensitivity for detecting splicing events, and its application to the unicellular fungus Schizosaccharomyces pombe, an organism that shares many of the hallmarks of alternative splicing in mammalian systems but for which no examples of exon skipping had previously been demonstrated. Over 200 previously unannotated splicing events were identified, including examples of regulated alternative splicing. Remarkably, an evolutionary analysis of four of the exons identified here as subject to skipping in S. pombe reveals high sequence conservation and perfect length conservation with their homologs in scores of plants, animals, and fungi. Moreover, alternative splicing of two of these exons has been documented in multiple vertebrate organisms, making these the first demonstrations of identical alternative-splicing patterns in species separated by over 1 billion years of evolution.

  18. Targeted next generation sequencing identifies functionally deleterious germline mutations in novel genes in early-onset/familial prostate cancer.

    Directory of Open Access Journals (Sweden)

    Paula Paulo

    2018-04-01

    Considering that mutations in known prostate cancer (PrCa) predisposition genes, including those responsible for hereditary breast/ovarian cancer and Lynch syndromes, explain less than 5% of early-onset/familial PrCa, we have sequenced 94 genes associated with cancer predisposition using next generation sequencing (NGS) in a series of 121 PrCa patients. We found monoallelic truncating/functionally deleterious mutations in seven genes, including ATM and CHEK2, which have previously been associated with PrCa predisposition, and five new candidate PrCa associated genes involved in cancer predisposing recessive disorders, namely RAD51C, FANCD2, FANCI, CEP57 and RECQL4. Furthermore, using in silico pathogenicity prediction of missense variants among 18 genes associated with breast/ovarian cancer and/or Lynch syndrome, followed by KASP genotyping in 710 healthy controls, we identified "likely pathogenic" missense variants in ATM, BRIP1, CHEK2 and TP53. In conclusion, this study has identified putative PrCa predisposing germline mutations in 14.9% of early-onset/familial PrCa patients. Further data will be necessary to confirm the genetic heterogeneity of inherited PrCa predisposition hinted at in this study.

  19. Variation in the number of nucleoli and incomplete homogenization of 18S ribosomal DNA sequences in leaf cells of the cultivated Oriental ginseng (Panax ginseng Meyer).

    Science.gov (United States)

    Chelomina, Galina N; Rozhkovan, Konstantin V; Voronova, Anastasia N; Burundukova, Olga L; Muzarok, Tamara I; Zhuravlev, Yuri N

    2016-04-01

    Wild ginseng, Panax ginseng Meyer, is an endangered species of medicinal plant. In the present study, we analyzed variation within the ribosomal DNA (rDNA) cluster to gain insight into the genetic diversity of the Oriental ginseng, P. ginseng, under artificial cultivation. The roots of wild P. ginseng plants were sampled from a nonprotected natural population in the Russian Far East. Slides were prepared from leaf tissues using the squash technique for cytogenetic analysis. The 18S rDNA sequences were cloned and sequenced. The distribution of nucleotide diversity, recombination events, and interspecific phylogenies for the total 18S rDNA sequence data set were also examined. In mesophyll cells, mononucleolar nuclei predominated (75.7%), while the remaining nuclei contained two to four nucleoli. Among the analyzed 18S rDNA clones, 20% were identical to the 18S rDNA sequence of P. ginseng from Japan, and the other clones differed from it by one to six substitutions. Nucleotide polymorphism was concentrated at positions 440-640 bp and was distributed across variable regions, expansion segments, and conserved elements of the core structure. The phylogenetic analysis confirmed the conspecificity of ginseng plants cultivated in different regions, with two fixed mutations between P. ginseng and other species. This study provides evidence of intragenomic nucleotide polymorphism in the 18S rDNA sequences of P. ginseng. These data suggest that, in cultivated plants, the observed genome instability may influence the synthesis of biologically active compounds, which are widely used in traditional medicine.

  20. Assessing Mitochondrial DNA Variation and Copy Number in Lymphocytes of ~2,000 Sardinians Using Tailored Sequencing Analysis Tools.

    Directory of Open Access Journals (Sweden)

    Jun Ding

    2015-07-01

    Full Text Available DNA sequencing identifies common and rare genetic variants for association studies, but studies typically focus on variants in nuclear DNA and ignore the mitochondrial genome. Analyzing variants in mitochondrial DNA (mtDNA) sequences presents special problems, which we resolve here with a general solution for the analysis of mtDNA in next-generation sequencing studies. The new program package comprises (1) an algorithm designed to identify mtDNA variants (i.e., homoplasmies and heteroplasmies), incorporating sequencing error rates at each base in a likelihood calculation and allowing allele fractions at a variant site to differ across individuals; and (2) an estimation of mtDNA copy number in a cell directly from whole-genome sequencing data. We also apply the methods to DNA sequence from lymphocytes of ~2,000 SardiNIA Project participants. As expected, mothers and offspring share all homoplasmies but a lesser proportion of heteroplasmies. Both homoplasmies and heteroplasmies show 5-fold higher transition/transversion ratios than variants in nuclear DNA. Also, heteroplasmy increases with age, though on average only ~1 heteroplasmy reaches the 4% level between ages 20 and 90. In addition, we find that mtDNA copy number averages ~110 copies/lymphocyte and is ~54% heritable, implying substantial genetic regulation of the level of mtDNA. Copy numbers also decrease modestly but significantly with age, and females on average have significantly more copies than males. The mtDNA copy numbers are significantly associated with waist circumference (p-value = 0.0031) and waist-hip ratio (p-value = 2.4 × 10⁻⁵), but not with body mass index, indicating an association with central fat distribution. To our knowledge, this is the largest population analysis to date of mtDNA dynamics, revealing the age-imposed increase in heteroplasmy, the relatively high heritability of copy number, and the association of copy number with metabolic traits.
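
The copy-number estimate and the transition/transversion comparison described above can be illustrated with a short sketch. It is not the authors' program package: it assumes copy number is estimated as the ratio of mean mtDNA read depth to mean autosomal read depth, scaled by two for a diploid nuclear genome, and all function names are hypothetical.

```python
# Hypothetical helpers illustrating two quantities from the abstract; not the SardiNIA package.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}  # purine<->purine, pyrimidine<->pyrimidine


def is_transition(ref: str, alt: str) -> bool:
    """A<->G and C<->T substitutions are transitions; all others are transversions."""
    return frozenset((ref.upper(), alt.upper())) in TRANSITIONS


def ts_tv_ratio(variants) -> float:
    """variants: iterable of (ref, alt) single-base substitutions."""
    ts = tv = 0
    for ref, alt in variants:
        if ref.upper() == alt.upper():
            continue  # not a substitution, skip it
        if is_transition(ref, alt):
            ts += 1
        else:
            tv += 1
    if tv == 0:
        return float("inf") if ts else float("nan")
    return ts / tv


def mtdna_copies_per_cell(mt_mean_depth: float, autosomal_mean_depth: float) -> float:
    """Assumption: each cell carries two autosomal copies, so
    copies per cell = 2 * (mtDNA depth / autosomal depth)."""
    if autosomal_mean_depth <= 0:
        raise ValueError("autosomal depth must be positive")
    return 2.0 * mt_mean_depth / autosomal_mean_depth
```

For example, a sample with mean mtDNA depth 1650x and mean autosomal depth 30x gives `mtdna_copies_per_cell(1650, 30) == 110.0`. A real pipeline would also need to handle NUMT reads, GC bias and mapping quality, which this sketch ignores.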

  1. De novo sequencing of circulating miRNAs identifies novel markers predicting clinical outcome of locally advanced breast cancer

    Directory of Open Access Journals (Sweden)

    Wu Xiwei

    2012-03-01

    Full Text Available Abstract Background MicroRNAs (miRNAs) have been recently detected in the circulation of cancer patients, where they are associated with clinical parameters. Discovery profiling of circulating small RNAs has not been reported in breast cancer (BC), and was carried out in this study to identify blood-based small RNA markers of BC clinical outcome. Methods The pre-treatment sera of 42 stage II-III locally advanced and inflammatory BC patients who received neoadjuvant chemotherapy (NCT) followed by surgical tumor resection were analyzed for marker identification by deep sequencing all circulating small RNAs. An independent validation cohort of 26 stage II-III BC patients was used to assess the power of identified miRNA markers. Results More than 800 miRNA species were detected in the circulation, and observed patterns showed association with histopathological profiles of BC. Groups of circulating miRNAs differentially associated with ER/PR/HER2 status and inflammatory BC were identified. The relative levels of selected miRNAs measured by PCR showed consistency with their abundance determined by deep sequencing. Two circulating miRNAs, miR-375 and miR-122, exhibited strong correlations with clinical outcomes, including NCT response and relapse with metastatic disease. In the validation cohort, higher levels of circulating miR-122 specifically predicted metastatic recurrence in stage II-III BC patients. Conclusions Our study indicates that certain miRNAs can serve as potential blood-based biomarkers for NCT response, and that miR-122 prevalence in the circulation predicts BC metastasis in early-stage patients. These results may allow optimized chemotherapy treatments and preventive anti-metastasis interventions in future clinical applications.

  2. Rapid Characterization of Insulin Modifications and Sequence Variations by Proteinase K Digestion and UHPLC-ESI-MS

    Science.gov (United States)

    Yang, Rong-Sheng; Tang, Weijuan; Sheng, Huaming; Meng, Fanyu

    2018-01-01

    Discovery of novel insulin analogs as therapeutics has remained an active area of research. Compared with native human insulin, insulin analog molecules normally incorporate either covalent modifications or amino acid sequence variations. From the drug discovery and development perspective, methods for efficient and detailed characterization of these primary structural changes are very important. In this report, we demonstrate that proteinase K digestion coupled with UPLC-ESI-MS analysis provides a simple and rapid approach to characterize the modifications and sequence variations of insulin molecules. A commercially available proteinase K digestion kit was used to process recombinant human insulin (RHI), insulin glargine, and fluorescein isothiocyanate-labeled recombinant human insulin (FITC-RHI) samples. The LC-MS data clearly showed that RHI and insulin glargine samples can be differentiated, and the FITC modifications at all three amine sites of the RHI molecule are well characterized. The end-to-end experiment and data interpretation were completed within 60 min. This approach is fast and simple, and can be easily implemented in early drug discovery laboratories to facilitate research on more advanced insulin therapeutics.

  3. Rapid Characterization of Insulin Modifications and Sequence Variations by Proteinase K Digestion and UHPLC-ESI-MS

    Science.gov (United States)

    Yang, Rong-Sheng; Tang, Weijuan; Sheng, Huaming; Meng, Fanyu

    2018-05-01

    Discovery of novel insulin analogs as therapeutics has remained an active area of research. Compared with native human insulin, insulin analog molecules normally incorporate either covalent modifications or amino acid sequence variations. From the drug discovery and development perspective, methods for efficient and detailed characterization of these primary structural changes are very important. In this report, we demonstrate that proteinase K digestion coupled with UPLC-ESI-MS analysis provides a simple and rapid approach to characterize the modifications and sequence variations of insulin molecules. A commercially available proteinase K digestion kit was used to process recombinant human insulin (RHI), insulin glargine, and fluorescein isothiocyanate-labeled recombinant human insulin (FITC-RHI) samples. The LC-MS data clearly showed that RHI and insulin glargine samples can be differentiated, and the FITC modifications at all three amine sites of the RHI molecule are well characterized. The end-to-end experiment and data interpretation were completed within 60 min. This approach is fast and simple, and can be easily implemented in early drug discovery laboratories to facilitate research on more advanced insulin therapeutics.

  4. PPARGC1A sequence variation and cardiovascular risk-factor levels

    DEFF Research Database (Denmark)

    Brito, E C; Vimaleswaran, K S; Brage, S

    2009-01-01

    .005; rs13117172, p = 0.008) and fasting glucose concentrations (rs7657071, p = 0.002). None remained significant after correcting for the number of statistical comparisons. We proceeded by testing for gene x physical activity interactions for the polymorphisms that showed nominal evidence of association...... in the main effect models. None of these tests was statistically significant. CONCLUSIONS/INTERPRETATION: Variants at PPARGC1A may influence several metabolic traits in this European paediatric cohort. However, variation at PPARGC1A is unlikely to have a major impact on cardiovascular or metabolic health...

  5. Pooled Enrichment Sequencing Identifies Diversity and Evolutionary Pressures at NLR Resistance Genes within a Wild Tomato Population.

    Science.gov (United States)

    Stam, Remco; Scheikl, Daniela; Tellier, Aurélien

    2016-06-02

    Nod-like receptors (NLRs) are nucleotide-binding domain and leucine-rich-repeat-containing proteins that are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs, and they can recognize pathogen molecules directly or indirectly. Accordingly, divergence and copy number variation at these genes are high between species. Within populations, positive and balancing selection are expected if plants coevolve with their pathogens. In order to understand the complexity of R-gene coevolution in wild nonmodel species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate and reveal polymorphism occurring at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We combine enrichment sequencing with pooling of ten individuals to sequence NLR genes specifically, in a resource- and cost-effective manner. We focus on the effects that different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results are verified by Sanger sequencing of polymorphic gene fragments. Our results indicate that some NLRs, namely 13 out of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes which have experienced natural selection in their habitat. © The Author 2016. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.
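
The πN/πS ratios mentioned above compare nucleotide diversity at nonsynonymous and synonymous sites. The sketch below is a simplified illustration, not the authors' pipeline: it assumes allele frequencies have already been estimated from the pool, treats the ten pooled diploid individuals as 20 sampled chromosomes, and ignores the read-depth corrections that dedicated pool-seq tools apply. Function names are hypothetical.

```python
# Simplified per-site nucleotide diversity (pi) and piN/piS; hypothetical helpers.

def site_diversity(alt_freq: float, n_chrom: int) -> float:
    """Expected heterozygosity at a biallelic site, 2p(1-p),
    with the n/(n-1) sample-size correction."""
    if n_chrom < 2:
        raise ValueError("need at least two sampled chromosomes")
    return n_chrom / (n_chrom - 1) * 2.0 * alt_freq * (1.0 - alt_freq)


def pi_per_site(alt_freqs, n_sites: int, n_chrom: int) -> float:
    """Mean diversity over n_sites; monomorphic sites (not listed) contribute 0."""
    return sum(site_diversity(p, n_chrom) for p in alt_freqs) / n_sites


def piN_piS(nonsyn_freqs, n_nonsyn_sites, syn_freqs, n_syn_sites, n_chrom=20):
    """Ratio of diversity at nonsynonymous vs synonymous sites.
    n_chrom=20 assumes ten diploid individuals in the pool."""
    pi_n = pi_per_site(nonsyn_freqs, n_nonsyn_sites, n_chrom)
    pi_s = pi_per_site(syn_freqs, n_syn_sites, n_chrom)
    return pi_n / pi_s if pi_s > 0 else float("inf")
```

With one polymorphism at frequency 0.5 among 100 nonsynonymous sites and one at frequency 0.5 among 25 synonymous sites, `piN_piS([0.5], 100, [0.5], 25)` is 0.25. Because S. pennellii is partially selfing, the effective number of independent chromosomes in the pool may be lower than 20, which this sketch does not model.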

  6. Pooled Enrichment Sequencing Identifies Diversity and Evolutionary Pressures at NLR Resistance Genes within a Wild Tomato Population

    Science.gov (United States)

    Stam, Remco; Scheikl, Daniela; Tellier, Aurélien

    2016-01-01

    Nod-like receptors (NLRs) are nucleotide-binding domain and leucine-rich-repeat-containing proteins that are important in plant resistance signaling. Many of the known pathogen resistance (R) genes in plants are NLRs, and they can recognize pathogen molecules directly or indirectly. Accordingly, divergence and copy number variation at these genes are high between species. Within populations, positive and balancing selection are expected if plants coevolve with their pathogens. In order to understand the complexity of R-gene coevolution in wild nonmodel species, it is necessary to identify the full range of NLRs and infer their evolutionary history. Here we investigate and reveal polymorphism occurring at 220 NLR genes within one population of the partially selfing wild tomato species Solanum pennellii. We combine enrichment sequencing with pooling of ten individuals to sequence NLR genes specifically, in a resource- and cost-effective manner. We focus on the effects that different mapping and single nucleotide polymorphism calling software and settings have on calling polymorphisms in customized pooled samples. Our results are verified by Sanger sequencing of polymorphic gene fragments. Our results indicate that some NLRs, namely 13 out of 220, have maintained polymorphism within our S. pennellii population. These genes show a wide range of πN/πS ratios and differing site frequency spectra. We compare our observed rate of heterozygosity with expectations for this selfing and bottlenecked population. We conclude that our method enables us to pinpoint NLR genes which have experienced natural selection in their habitat. PMID:27189991

  7. Full genome sequencing and genetic characterization of Eubenangee viruses identify Pata virus as a distinct species within the genus Orbivirus.

    Directory of Open Access Journals (Sweden)

    Manjunatha N Belaganahalli

    Full Text Available Eubenangee virus has previously been identified as the cause of Tammar sudden death syndrome (TSDS). Eubenangee virus (EUBV), Tilligery virus (TILV), Pata virus (PATAV) and Ngoupe virus (NGOV) are currently all classified within the Eubenangee virus species of the genus Orbivirus, family Reoviridae. Full genome sequencing confirmed that EUBV and TILV (both of which are from Australia) show high levels of amino acid (aa) sequence identity (>92%) in the conserved polymerase VP1(Pol), sub-core VP3(T2) and outer core VP7(T13) proteins, and are therefore appropriately classified within the same virus species. However, they show much lower aa identity levels in their larger outer-capsid protein VP2 (<53%), consistent with membership of two different serotypes, EUBV-1 and EUBV-2 (respectively). In contrast, PATAV showed significantly lower levels of aa sequence identity with either EUBV or TILV (<71% in VP1(Pol) and VP3(T2), and <57% in VP7(T13)), consistent with membership of a distinct virus species. A proposal has therefore been sent to the Reoviridae Study Group of ICTV to recognise 'Pata virus' as a new Orbivirus species, with the PATAV isolate as serotype 1 (PATAV-1). Amongst the other orbiviruses, PATAV shows closest relationships to Epizootic Haemorrhagic Disease virus (EHDV), with 80.7%, 72.4% and 66.9% aa identity in VP3(T2), VP1(Pol), and VP7(T13), respectively. Although Ngoupe virus was not available for these studies, like PATAV it was isolated in Central Africa, and therefore seems likely to also belong to the new species, possibly as a distinct 'type'. The data presented will facilitate diagnostic assay design and the identification of additional isolates of these viruses.
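
The percentage identities quoted above for VP1(Pol), VP3(T2), VP7(T13) and VP2 are pairwise amino acid identities between aligned proteins. A minimal sketch of that calculation, assuming the two proteins are already aligned with '-' marking gaps and that gapped columns are excluded from the denominator; tools differ on this convention, so values from this hypothetical helper may not match the paper's figures exactly.

```python
# Hypothetical helper: percent identity between two pre-aligned protein sequences.

def percent_identity(aln_a: str, aln_b: str) -> float:
    """Identity over aligned columns where neither sequence has a gap ('-')."""
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aln_a.upper(), aln_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        identical += a == b  # True counts as 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * identical / compared
```

For instance, `percent_identity("MKV-LA", "MRVQLA")` compares five gap-free columns, four of which match, and returns 80.0.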

  8. The First Endogenous Herpesvirus, Identified in the Tarsier Genome, and Novel Sequences from Primate Rhadinoviruses and Lymphocryptoviruses

    Science.gov (United States)

    Aswad, Amr; Katzourakis, Aris

    2014-01-01

    Herpesviridae is a diverse family of large and complex pathogens whose genomes are extremely difficult to sequence. This is particularly true for clinical samples, and if the virus, host, or both genomes are being sequenced for the first time. Although herpesviruses are known to occasionally integrate in host genomes, and can also be inherited in a Mendelian fashion, they are notably absent from the genomic fossil record comprised of endogenous viral elements (EVEs). Here, we combine paleovirological and metagenomic approaches to both explore the constituent viral diversity of mammalian genomes and search for endogenous herpesviruses. We describe the first endogenous herpesvirus from the genome of the Philippine tarsier, belonging to the Roseolovirus genus, and characterize its highly defective genome that is integrated and flanked by unambiguous host DNA. From a draft assembly of the aye-aye genome, we use bioinformatic tools to reveal over 100,000 bp of a novel rhadinovirus that is the first lemur gammaherpesvirus, closely related to Kaposi's sarcoma-associated virus. We also identify 58 genes of Pan paniscus lymphocryptovirus 1, the bonobo equivalent of human Epstein-Barr virus. For each of the viruses, we postulate gene function via comparative analysis to known viral relatives. Most notably, the evidence from gene content and phylogenetics suggests that the aye-aye sequences represent the most basal known rhadinovirus, and indicates that tumorigenic herpesviruses have been infecting primates since their emergence in the late Cretaceous. Overall, these data show that a genomic fossil record of herpesviruses exists despite their extremely large genomes, and expands the known diversity of Herpesviridae, which will aid the characterization of pathogenesis. Our analytical approach illustrates the benefit of intersecting evolutionary approaches with metagenomics, genetics and paleovirology. PMID:24945689

  9. Sequence variation in the melanocortin-1 receptor (MC1R) pigmentation gene and its role in the cryptic coloration of two South American sand lizards

    Directory of Open Access Journals (Sweden)

    Josmael Corso

    2012-01-01

    Full Text Available In reptiles, dorsal body darkness often varies with substrate color or temperature environment, and is generally presumed to be an adaptation for crypsis or thermoregulation. However, the genetic basis of pigmentation is poorly known in this group. In this study we analyzed the coding region of the melanocortin-1-receptor (MC1R) gene and its role in the dorsal color variation of two sympatric species of sand lizards (Liolaemus) that inhabit the southeastern coast of South America: L. occipitalis and L. arambarensis. The first is light-colored and occupies pale aeolian sand dunes, while the second is brownish and lives in a darker sandy habitat. We sequenced 630 base pairs of MC1R in both species. In total, 12 nucleotide polymorphisms and four amino acid replacement sites were observed, but none of them could be associated with a color pattern. Comparative analysis indicated that these taxa are monomorphic for amino acid sites that were previously identified as functionally important in other reptiles. Thus, our results indicate that MC1R is not involved in the pigmentation pattern observed in Liolaemus lizards. Structural differences in other genes, such as ASIP, or variation in regulatory regions of MC1R may instead be responsible for this variation. Alternatively, the phenotypic differences observed might be a consequence of non-genetic factors, such as thermoregulatory mechanisms.
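
Counting polymorphic nucleotide sites and amino acid replacements in an aligned coding fragment, such as the 630 bp of MC1R described above, can be sketched as follows. This is an illustration only, assuming in-frame, equal-length sequences and the standard genetic code; replacements are counted per codon, which can differ from per-site counts when one codon carries more than one change. Function names are hypothetical.

```python
# Hypothetical helpers: polymorphic sites and amino-acid-replacement codons in aligned CDS.
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... (first, second, third base in TCAG order)
CODE = ("FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRR"
        "IIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG")


def translate_codon(codon: str) -> str:
    """Translate one codon; returns 'X' for ambiguous or gapped codons."""
    c = codon.upper()
    if len(c) != 3 or any(b not in BASES for b in c):
        return "X"
    return CODE[16 * BASES.index(c[0]) + 4 * BASES.index(c[1]) + BASES.index(c[2])]


def polymorphic_sites(seqs):
    """0-based nucleotide positions where aligned sequences differ (N and gaps ignored)."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    sites = []
    for i in range(length):
        alleles = {s[i].upper() for s in seqs} - {"N", "-"}
        if len(alleles) > 1:
            sites.append(i)
    return sites


def replacement_codons(seqs):
    """0-based codon indices whose translations differ among the sequences."""
    codons = []
    for start in range(0, len(seqs[0]) - 2, 3):
        aas = {translate_codon(s[start:start + 3]) for s in seqs} - {"X"}
        if len(aas) > 1:
            codons.append(start // 3)
    return codons
```

For the toy alignment `["ATGGCTAAA", "ATGGCCAAG", "ATGACTAAA"]`, three nucleotide positions are polymorphic (3, 5 and 8), but only codon 1 (GCT/GCC/ACT, i.e. Ala/Ala/Thr) carries an amino acid replacement; the other changes are synonymous.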

  10. Next-Generation Sequencing-Based Detection of Germline Copy Number Variations in BRCA1/BRCA2

    DEFF Research Database (Denmark)

    Schmidt, Ane Y; Hansen, Thomas V O; Ahlborn, Lise B

    2017-01-01

    Genetic testing of BRCA1/2 includes screening for single nucleotide variants and small insertions/deletions and for larger copy number variations (CNVs), primarily by Sanger sequencing and multiplex ligation-dependent probe amplification (MLPA). With the advent of next-generation sequencing (NGS)...

  11. Genotyping-by-sequencing in an orphan plant species Physocarpus opulifolius helps identify the evolutionary origins of the genus Prunus.

    Science.gov (United States)

    Buti, Matteo; Sargent, Daniel J; Mhelembe, Khethani G; Delfino, Pietro; Tobutt, Kenneth R; Velasco, Riccardo

    2016-05-11

    The Rosaceae family encompasses numerous genera exhibiting morphological diversification in fruit types and plant habit as well as a wide variety of chromosome numbers. Comparative genomics between various Rosaceous genera has led to the hypothesis that the ancestral genome of the family contained nine chromosomes; however, the synteny studies performed in the Rosaceae to date encompass species with base chromosome numbers x = 7 (Fragaria), x = 8 (Prunus), and x = 17 (Malus), and no study has included species from any of the many Rosaceous genera with a base chromosome number of x = 9. A genetic linkage map of the species Physocarpus opulifolius (x = 9) was populated with sequence-characterised SNP markers using genotyping by sequencing. This allowed, for the first time, the extent of genome diversification in a Rosaceous genus with a base chromosome number of x = 9 to be assessed. Orthologous loci distributed throughout the nine chromosomes of Physocarpus and the eight chromosomes of Prunus were identified, which permitted a meaningful comparison of the genomes of these two genera. The study revealed a high level of macro-synteny between the two genomes and relatively few chromosomal rearrangements, as has been observed in studies of other Rosaceous genomes, lending further support to a relatively simple model of genomic evolution in Rosaceae.

  12. Flavonoid Biosynthesis Genes Putatively Identified in the Aromatic Plant Polygonum minus via Expressed Sequences Tag (EST) Analysis

    Directory of Open Access Journals (Sweden)

    Zamri Zainal

    2012-02-01

    Full Text Available P. minus is an aromatic plant whose leaf is widely used as a food additive and in the perfume industry. The leaf also accumulates secondary metabolites, such as flavonoids, that act as active ingredients. Due to limited genomic and transcriptomic data, the biosynthetic pathway of flavonoids in this plant is currently unclear. Identification of candidate genes involved in the flavonoid biosynthetic pathway will significantly contribute to understanding the biosynthesis of active compounds. We constructed a standard cDNA library from P. minus leaves and two normalized full-length enriched cDNA libraries from stem and root organs in order to create a gene resource for the biosynthesis of secondary metabolites, especially flavonoids. Large-scale sequencing of these P. minus cDNA libraries identified 4196 expressed sequence tags (ESTs), which were deposited in dbEST at the National Center for Biotechnology Information (NCBI). From the three constructed cDNA libraries, 11 ESTs encoding seven genes were mapped to the flavonoid biosynthetic pathway. Finally, three flavonoid biosynthetic pathway-related ESTs, chalcone synthase, CHS (JG745304), flavonol synthase, FLS (JG705819), and leucoanthocyanidin dioxygenase, LDOX (JG745247), were selected for further examination by quantitative RT-PCR (qRT-PCR) in different P. minus organs. Expression was detected in leaf, stem and root. Gene expression studies have been initiated in order to better understand the underlying physiological processes.

  13. De Novo Transcriptome Sequencing of Olea europaea L. to Identify Genes Involved in the Development of the Pollen Tube.

    Science.gov (United States)

    Iaria, Domenico; Chiappetta, Adriana; Muzzalupo, Innocenzo

    2016-01-01

    In olive (Olea europaea L.), the processes controlling self-incompatibility are still unclear, and their molecular basis is not yet fully characterized. In order to determine compatibility relationships, using next-generation sequencing techniques and a de novo transcriptome assembly strategy, we show that pollen tubes from different olive plants, grown in vitro in a medium containing their own pistil and in pollen/pistil combinations from self-sterile and self-fertile cultivars, have distinct gene expression profiles, and that many of the differentially expressed sequences between the samples fall within gene families involved in the development of the pollen tube, such as lipase, carboxylesterase, pectinesterase, pectin methylesterase, and callose synthase. Moreover, different genes involved in signal transduction, transcription, and growth are overrepresented. The analysis also allowed us to identify members of the actin, actin depolymerization factor, and fibrin gene families, and members of the Ca(2+)-binding gene family related to the development and polarization of the pollen apical tip. The whole-transcriptome analysis, through the identification of the set of differentially expressed transcripts and an extended functional annotation analysis, will lead to a better understanding of the mechanisms of pollen germination and pollen tube growth in the olive.

  14. Use of deep whole-genome sequencing data to identify structure risk variants in breast cancer susceptibility genes.

    Science.gov (United States)

    Guo, Xingyi; Shi, Jiajun; Cai, Qiuyin; Shu, Xiao-Ou; He, Jing; Wen, Wanqing; Allen, Jamie; Pharoah, Paul; Dunning, Alison; Hunter, David J; Kraft, Peter; Easton, Douglas F; Zheng, Wei; Long, Jirong

    2018-03-01

    Functional disruptions of susceptibility genes by large germline structural variant (SV) deletions are known to be associated with cancer risk. However, few studies have systematically searched for SV deletions in breast cancer susceptibility genes. We analysed deep (> 30x) whole-genome sequencing (WGS) data generated from blood samples of 128 breast cancer patients of Asian and European descent with either a strong family history of breast cancer or early-onset disease. To identify SV deletions in known or suspected breast cancer susceptibility genes, we used multiple SV calling tools, including Genome STRiP, Delly, Manta, BreakDancer and Pindel. SV deletions detected by at least three of these bioinformatics tools were found in five genes. Specifically, we identified heterozygous deletions covering a fraction of the coding regions of the BRCA1 (approximately 80 kb, in two patients) and TP53 genes (∼1.6 kb, in two patients), and of intronic regions (∼1 kb) of the PALB2 (one patient), PTEN (three patients) and RAD51C genes (one patient). We confirmed the presence of these deletions using real-time quantitative PCR (qPCR). Our study identified novel SV deletions in breast cancer susceptibility genes, and the identification of such SV deletions may improve clinical testing.
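
The "detected by at least three of these tools" rule amounts to a consensus filter across SV callers. The sketch below is a hypothetical illustration, not the authors' pipeline: it assumes two deletion calls describe the same event when they share at least 50% reciprocal overlap, a common convention that the abstract does not specify. Caller names appear only as dictionary keys, and the helper names are invented.

```python
# Hypothetical consensus filter for deletion calls from several SV callers.
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), half-open, end > start


def reciprocal_overlap(a: Interval, b: Interval, min_frac: float = 0.5) -> bool:
    """True if a and b overlap by at least min_frac of each interval's length."""
    if a[0] != b[0]:
        return False
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return False
    return overlap >= min_frac * (a[2] - a[1]) and overlap >= min_frac * (b[2] - b[1])


def consensus_deletions(calls: Dict[str, List[Interval]], min_callers: int = 3,
                        min_frac: float = 0.5) -> List[Tuple[Interval, List[str]]]:
    """Keep deletions supported by >= min_callers distinct callers.
    Each kept event is represented by the first matching call seen."""
    kept: List[Tuple[Interval, List[str]]] = []
    for caller in sorted(calls):
        for cand in calls[caller]:
            if any(reciprocal_overlap(cand, k, min_frac) for k, _ in kept):
                continue  # event already represented
            support = sorted(
                other for other, other_calls in calls.items()
                if any(reciprocal_overlap(cand, c, min_frac) for c in other_calls)
            )  # includes cand's own caller, since a call overlaps itself fully
            if len(support) >= min_callers:
                kept.append((cand, support))
    return kept
```

For example, if Delly, Manta and Pindel each report a ~80 kb chr17 deletion at slightly different coordinates and BreakDancer reports only an unrelated chr13 event, only the chr17 deletion is kept, with support from those three callers.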

  15. Whole genome sequencing identifies circulating Beijing-lineage Mycobacterium tuberculosis strains in Guatemala and an associated urban outbreak.

    Science.gov (United States)

    Saelens, Joseph W; Lau-Bonilla, Dalia; Moller, Anneliese; Medina, Narda; Guzmán, Brenda; Calderón, Maylena; Herrera, Raúl; Sisk, Dana M; Xet-Mull, Ana M; Stout, Jason E; Arathoon, Eduardo; Samayoa, Blanca; Tobin, David M

    2015-12-01

    Limited data are available regarding the molecular epidemiology of Mycobacterium tuberculosis (Mtb) strains circulating in Guatemala. Beijing-lineage Mtb strains have gained prevalence worldwide and are associated with increased virulence and drug resistance, but there have been only a few cases reported in Central America. Here we report the first whole genome sequencing of Central American Beijing-lineage strains of Mtb. We find that multiple Beijing-lineage strains, derived from independent founding events, are currently circulating in Guatemala, but overall still represent a relatively small proportion of disease burden. Finally, we identify a specific Beijing-lineage outbreak centered on a poor neighborhood in Guatemala City. Copyright © 2015 The Authors. Published by Elsevier Ltd. All rights reserved.

  16. POU4F3 mutation screening in Japanese hearing loss patients: Massively parallel DNA sequencing-based analysis identified novel variants associated with autosomal dominant hearing loss.

    Directory of Open Access Journals (Sweden)

    Tomohiro Kitano

    Full Text Available A variant in a transcription factor gene, POU4F3, is responsible for autosomal dominant nonsyndromic hereditary hearing loss, DFNA15. To date, 14 variants, including a whole deletion of POU4F3, have been reported to cause hearing loss (HL) in various ethnic groups. In the present study, genetic screening for POU4F3 variants was carried out in a large series of Japanese HL patients to clarify the prevalence and clinical characteristics of DFNA15 in the Japanese population. Massively parallel DNA sequencing of 68 target candidate genes was performed in 2,549 unrelated Japanese HL patients (probands) to identify genomic variations responsible for HL. Detailed clinical features of patients with POU4F3 variants were collected from medical charts and analyzed. Twelve novel likely pathogenic POU4F3 variants (six missense, three frameshift, and three nonsense) were identified in 15 probands (2.5%) among 602 families exhibiting autosomal dominant HL, whereas no variants were detected in the other 1,947 probands with autosomal recessive HL or HL of unknown inheritance pattern. To characterize the audiovestibular phenotype of patients harboring POU4F3 variants, we collected audiograms and vestibular symptoms of the probands and their affected family members. Audiovestibular phenotypes in a total of 24 individuals from the 15 families carrying variants were characterized by progressive HL, with large variation in onset age and severity, with or without vestibular symptoms. Pure-tone audiograms indicated that the most prevalent configuration was mid-frequency HL, followed by high-frequency HL, with asymmetry observed in approximately 20% of affected individuals. Analysis of the relationship between age and pure-tone average suggested that individuals with truncating variants showed earlier onset and slower progression of HL than did those with non-truncating variants. 
The present study showed that variants

  17. Massively parallel signature sequencing and bioinformatics analysis identifies up-regulation of TGFBI and SOX4 in human glioblastoma.

    Directory of Open Access Journals (Sweden)

    Biaoyang Lin

    Full Text Available BACKGROUND: A comprehensive network-based understanding of molecular pathways abnormally altered in glioblastoma multiforme (GBM) is essential for developing effective therapeutic approaches for this deadly disease. METHODOLOGY/PRINCIPAL FINDINGS: Applying a next generation sequencing technology, massively parallel signature sequencing (MPSS), we identified a total of 4535 genes that are differentially expressed between normal brain and GBM tissue. The expression changes of three up-regulated genes, CHI3L1, CHI3L2, and FOXM1, and two down-regulated genes, neurogranin and L1CAM, were confirmed by quantitative PCR. Pathway analysis revealed that TGF-beta pathway related genes were significantly up-regulated in GBM tumor samples. An integrative pathway analysis of the TGF-beta signaling network identified two alternative TGF-beta signaling pathways mediated by SOX4 (sex determining region Y-box 4) and TGFBI (transforming growth factor beta induced). Quantitative RT-PCR and immunohistochemistry staining demonstrated that SOX4 and TGFBI expression is elevated in GBM tissues compared with normal brain tissues at both the RNA and protein levels. In vitro functional studies confirmed that TGFBI and SOX4 expression is increased by TGF-beta stimulation and decreased by a specific inhibitor of TGF-beta receptor 1 kinase. CONCLUSIONS/SIGNIFICANCE: Our MPSS database for GBM and normal brain tissues provides a useful resource for the scientific community. The identification of non-SMAD mediated TGF-beta signaling pathways acting through SOX4 and TGFBI (GENE ID: 7045) in GBM indicates that these alternative pathways should be considered, in addition to the canonical SMAD mediated pathway, in the development of new therapeutic strategies targeting TGF-beta signaling in GBM. Finally, the construction of an extended TGF-beta signaling network with overlaid gene expression changes between GBM and normal brain extends our understanding of the biology of GBM.

  18. Sequencing illustrates the transcriptional response of Legionella pneumophila during infection and identifies seventy novel small non-coding RNAs.

    LENUS (Irish Health Repository)

    Weissenmayer, Barbara A

    2011-01-01

    Second-generation sequencing has prompted a number of groups to re-interrogate the transcriptomes of several bacterial and archaeal species. One of the central findings has been the identification of complex networks of small non-coding RNAs that play central roles in transcriptional regulation in all growth conditions and in the pathogen's interaction with and survival within host cells. Legionella pneumophila is a gram-negative facultative intracellular human pathogen with a distinct biphasic lifestyle. One of its primary environmental hosts is the free-living amoeba Acanthamoeba castellanii, and its infection by L. pneumophila mimics that seen in human macrophages. Here we present an analysis of strand-specific sequencing of the transcriptional response of L. pneumophila during exponential and post-exponential broth growth and during the replicative and transmissive phases of infection inside A. castellanii. We extend previous microarray-based studies and uncover evidence of a complex regulatory architecture underpinned by numerous non-coding RNAs. Over seventy new non-coding RNAs were identified; many of them appear to be strain-specific and in configurations not previously reported. We discover a family of non-coding RNAs preferentially expressed during infection conditions and identify a second copy of 6S RNA in L. pneumophila. We show that the newly discovered putative 6S RNA, as well as a number of other non-coding RNAs, shows evidence of antisense transcription. The nature and extent of the non-coding RNAs and their expression patterns suggest that these may well play central roles in the regulation of Legionella spp.-specific traits and offer clues as to how L. pneumophila adapts to its intracellular niche. The expression profiles outlined in the study have been deposited in the NCBI Gene Expression Omnibus (GEO) database under series accession GSE27232.

  19. Exercise-Induced Rhabdomyolysis and Stress-Induced Malignant Hyperthermia Events, Association with Malignant Hyperthermia Susceptibility, and RYR1 Gene Sequence Variations

    Directory of Open Access Journals (Sweden)

    Antonella Carsana

    2013-01-01

    Full Text Available Exertional rhabdomyolysis (ER) and stress-induced malignant hyperthermia (MH) events are syndromes that primarily afflict military recruits in basic training and athletes. Events similar to those occurring in ER and in stress-induced MH events are triggered by exposure to anesthetic agents in MH-susceptible (MHS) patients. MH is an autosomal dominant hypermetabolic condition that occurs in genetically predisposed subjects during general anesthesia, induced by commonly used volatile anesthetics and/or the neuromuscular blocking agent succinylcholine. Triggering agents alter intracellular calcium regulation. Mutations in the RYR1 gene have been found in about 70% of MH families. The RYR1 gene encodes the skeletal muscle calcium release channel of the sarcoplasmic reticulum, commonly known as ryanodine receptor type 1 (RYR1). The present work reviews the documented cases of ER or of stress-induced MH events in which RYR1 sequence variations associated, or possibly associated, with MHS status have been identified.

  20. Identification of microRNAs from Amur grape (Vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics.

    Science.gov (United States)

    Wang, Chen; Han, Jian; Liu, Chonghuai; Kibet, Korir Nicholas; Kayesh, Emrul; Shangguan, Lingfei; Li, Xiaoying; Fang, Jinggui

    2012-03-29

    MicroRNAs (miRNAs) are a class of functional non-coding small RNAs 19-25 nucleotides in length. Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species; it is used as an excellent breeding parent for grapevine and has elicited growing interest in wine production. To date, a relatively large number of grapevine miRNAs (vv-miRNAs) have been reported from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr., a wild grapevine species. A small RNA library from Amur grape was constructed and deep-sequenced using Solexa technology, followed by bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and the accumulation of 18 new va-miRNAs in seven grapevine tissues was confirmed by real-time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be broadly consistent with those from the deep-sequenced sRNA libraries of the corresponding combined tissues. We also describe the conservation and variation of va-miRNAs during plant evolution using miR-SNPs and miR-LDs, based on a comparison of orthologous sequences, and further reveal that the number and sites of miR-SNPs in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted, including a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism.
Deep sequencing of short RNAs from Amur grape flowers and berries identified 72 new potential miRNAs and 34 known but non-conserved mi
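    A common first step in small-RNA analyses like this one is to keep only reads within the miRNA size range before searching for known or novel miRNAs. The sketch below assumes adapter-trimmed reads and reuses the 19-25 nt window stated above; the function name and toy reads are hypothetical, and this is not the authors' pipeline.

```python
from collections import Counter

def filter_mirna_candidates(reads, min_len=19, max_len=25):
    """Keep reads whose length lies in [min_len, max_len] and count unique sequences.

    Assumes reads are already adapter-trimmed strings (hypothetical input format).
    """
    kept = [r.upper() for r in reads if min_len <= len(r) <= max_len]
    return Counter(kept)

# Toy reads (hypothetical): one too short, one too long, two copies of a 21-nt read.
reads = [
    "TGAGGTAGTAGGTTGTATAGTT",  # 22 nt
    "TCGGACCAGGCTTCATTCCCC",   # 21 nt
    "TCGGACCAGGCTTCATTCCCC",   # 21 nt
    "ACGT",                    # 4 nt, too short
    "A" * 30,                  # 30 nt, too long
]
counts = filter_mirna_candidates(reads)
print(counts.most_common(1))  # [('TCGGACCAGGCTTCATTCCCC', 2)]
```

Collapsing identical reads into counts, as here, is what later lets abundance be compared between libraries.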

  1. Identification of microRNAs from Amur grape (Vitis amurensis Rupr.) by deep sequencing and analysis of microRNA variations with bioinformatics

    Directory of Open Access Journals (Sweden)

    Wang Chen

    2012-03-01

    Full Text Available Abstract Background MicroRNAs (miRNAs) are a class of functional non-coding small RNAs 19-25 nucleotides in length. Amur grape (Vitis amurensis Rupr.) is an important wild fruit crop with the strongest cold resistance among the Vitis species; it is used as an excellent breeding parent for grapevine and has elicited growing interest in wine production. To date, a relatively large number of grapevine miRNAs (vv-miRNAs) have been reported from cultivated grapevine varieties such as Vitis vinifera L. and hybrids of V. vinifera and V. labrusca, but there is no report on miRNAs from Vitis amurensis Rupr., a wild grapevine species. Results A small RNA library from Amur grape was constructed and deep-sequenced using Solexa technology, followed by bioinformatics analysis to identify new miRNAs. In total, 126 conserved miRNAs belonging to 27 miRNA families were identified, and 34 known but non-conserved miRNAs were also found. Significantly, 72 new potential Amur grape-specific miRNAs were discovered. The sequences of these new potential va-miRNAs were further validated through miR-RACE, and the accumulation of 18 new va-miRNAs in seven grapevine tissues was confirmed by real-time RT-PCR (qRT-PCR) analysis. The expression levels of va-miRNAs in flowers and berries were found to be broadly consistent with those from the deep-sequenced sRNA libraries of the corresponding combined tissues. We also describe the conservation and variation of va-miRNAs during plant evolution using miR-SNPs and miR-LDs, based on a comparison of orthologous sequences, and further reveal that the number and sites of miR-SNPs in diverse miRNA families exhibit distinct divergence. Finally, 346 target genes for the new miRNAs were predicted, including a number of Amur grape stress tolerance genes and many genes regulating anthocyanin synthesis and sugar metabolism.
Conclusions Deep sequencing of short RNAs from Amur grape flowers and berries identified 72

  2. RNA sequencing of Populus x canadensis roots identifies key molecular mechanisms underlying physiological adaption to excess zinc.

    Directory of Open Access Journals (Sweden)

    Andrea Ariani

    Full Text Available Populus x canadensis clone I-214 exhibits a general indicator phenotype in response to excess Zn, with higher metal uptake in roots than in shoots and reduced translocation to aerial parts under hydroponic conditions. This physiological adaptation seems to be regulated mainly by the roots, although the underlying molecular mechanisms are still poorly understood. Here, differential expression analysis based on RNA sequencing was used to identify the molecular mechanisms involved in the root response to excess Zn. To maximize the specificity of detection of differentially expressed (DE) genes, we considered the intersection of genes identified by three distinct statistical approaches (61 up- and 19 down-regulated) and validated them by RT-qPCR, obtaining 93% agreement between the two experimental techniques. Gene Ontology (GO) terms related to oxidation-reduction processes, transport, and cellular iron ion homeostasis were enriched among DE genes, highlighting the importance of metal homeostasis in the adaptation of P. x canadensis clone I-214 to excess Zn. We identified the up-regulation of two Populus metal transporters (ZIP2 and NRAMP1), probably involved in metal uptake, and the down-regulation of a NAS4 gene involved in metal translocation. We also identified four Fe-homeostasis transcription factors (two bHLH38 genes, FIT, and BTS) that were differentially expressed, probably to reduce Zn-induced Fe deficiency. In particular, we suggest that the down-regulation of the FIT transcription factor could be a mechanism to cope with Zn-induced Fe deficiency in Populus. These results provide insight into the molecular mechanisms involved in adaptation to excess Zn in Populus spp., and could also constitute a starting point for the identification and characterization of molecular markers or biotechnological targets for improving the phytoremediation performance of poplar trees.
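    The consensus strategy described above is a set intersection over the gene lists returned by each statistical method. The sketch below also reads "agreement" between RNA-seq and RT-qPCR as the fraction of tested genes whose direction of change matches; that reading is an assumption (the abstract does not define it), and the gene IDs are invented.

```python
def consensus_de_genes(*method_calls):
    """Genes called differentially expressed by every method (set intersection)."""
    sets = [set(calls) for calls in method_calls]
    return set.intersection(*sets) if sets else set()

def agreement_rate(rnaseq_direction, qpcr_direction):
    """Fraction of shared genes whose direction (+1 up, -1 down) agrees between assays."""
    shared = rnaseq_direction.keys() & qpcr_direction.keys()
    if not shared:
        return 0.0
    agree = sum(1 for g in shared if rnaseq_direction[g] == qpcr_direction[g])
    return agree / len(shared)

# Hypothetical gene IDs called by three statistical methods.
method_a = {"g1", "g2", "g3", "g4"}
method_b = {"g2", "g3", "g4", "g5"}
method_c = {"g3", "g4", "g6"}
consensus = consensus_de_genes(method_a, method_b, method_c)
print(sorted(consensus))  # ['g3', 'g4']

rnaseq = {"g3": 1, "g4": -1}
qpcr = {"g3": 1, "g4": 1}
print(agreement_rate(rnaseq, qpcr))  # 0.5
```

Intersecting trades sensitivity for specificity: a gene missed by any one method is dropped.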

  3. Analysis of copy number variations in Holstein cows identify potential mechanisms contributing to differences in residual feed intake.

    Science.gov (United States)

    Hou, Yali; Bickhart, Derek M; Chung, Hoyoung; Hutchison, Jana L; Norman, H Duane; Connor, Erin E; Liu, George E

    2012-11-01

    Genomic structural variation is an important and abundant source of genetic and phenotypic variation. In this study, we performed an initial analysis of copy number variations (CNVs) using BovineHD SNP genotyping data from 147 Holstein cows identified as having high or low feed efficiency as estimated by residual feed intake (RFI). We detected 443 candidate CNV regions (CNVRs) that represent 18.4 Mb (0.6 %) of the genome. To investigate the functional impacts of CNVs, we created two groups of 30 individual animals with extremely low or high estimated breeding values (EBVs) for RFI, and referred to these groups as low intake (LI; more efficient) or high intake (HI; less efficient), respectively. We identified 240 (~9.0 Mb) and 274 (~10.2 Mb) CNVRs from LI and HI groups, respectively. Approximately 30-40 % of the CNVRs were specific to the LI group or HI group of animals. The 240 LI CNVRs overlapped with 137 Ensembl genes. Network analyses indicated that the LI-specific genes were predominantly enriched for those functioning in the inflammatory response and immunity. By contrast, the 274 HI CNVRs contained 177 Ensembl genes. Network analyses indicated that the HI-specific genes were particularly involved in the cell cycle, and organ and bone development. These results relate CNVs to two key variables, namely immune response and organ and bone development. The data indicate that greater feed efficiency relates more closely to immune response, whereas cattle with reduced feed efficiency may have a greater capacity for organ and bone development.
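    One simple way to call a CNVR specific to the LI or HI group is to keep regions that overlap no CNVR in the other group. The sketch assumes half-open genomic intervals and "any overlap" as the sharing criterion; the study may have used a different overlap rule, and the coordinates are invented.

```python
from typing import List, Tuple

Region = Tuple[str, int, int]  # (chromosome, start, end), half-open [start, end)

def overlaps(a: Region, b: Region) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]

def group_specific(regions: List[Region], other: List[Region]) -> List[Region]:
    """Regions from one group that overlap no region from the other group."""
    return [r for r in regions if not any(overlaps(r, o) for o in other)]

# Hypothetical CNVRs (chromosome, start, end) for the two RFI groups.
li = [("chr1", 100, 500), ("chr2", 1000, 2000), ("chr5", 50, 80)]
hi = [("chr1", 450, 900), ("chr3", 0, 300)]
print(group_specific(li, hi))  # [('chr2', 1000, 2000), ('chr5', 50, 80)]
```

The pairwise scan is quadratic; for genome-scale region sets an interval tree or a sorted sweep would be the usual choice.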

  4. Analyses of Tissue Culture Adaptation of Human Herpesvirus-6A by Whole Genome Deep Sequencing Redefines the Reference Sequence and Identifies Virus Entry Complex Changes.

    Science.gov (United States)

    Tweedy, Joshua G; Escriva, Eric; Topf, Maya; Gompels, Ursula A

    2017-12-31

    Tissue-culture adaptation of viruses can modulate infection. Laboratory passage and bacterial artificial chromosome (BACmid) cloning of human cytomegalovirus (HCMV) resulted in genomic deletions and rearrangements altering genes encoding the virus entry complex, which affected cellular tropism, virulence, and vaccine development. Here, we analyse these effects on the reference genome of a related betaherpesvirus, the Roseolovirus human herpesvirus 6A (HHV-6A) strain U1102. This virus is also naturally "cloned" by germline subtelomeric chromosomal integration in approximately 1% of human populations, and accurate references are key to understanding the pathological relationships between exogenous and endogenous virus. Using whole-genome next-generation deep-sequencing Illumina-based methods, we compared the original isolate with the tissue-culture-passaged and BACmid-cloned viruses. This redefined the reference genome, showing 32 corrections and 5 polymorphisms. Furthermore, minor variant analyses of passaged and BACmid virus identified emerging populations of a further 32 single nucleotide polymorphisms (SNPs) in 10 loci, half of them non-synonymous, indicating cell-culture selection. Analyses of the BAC-virus genome showed deletion of the BAC cassette via loxP recombination, removing green fluorescent protein (GFP)-based selection. As shown for HCMV culture effects, selected HHV-6A SNPs mapped to genes encoding mediators of virus cellular entry, including the virus envelope glycoprotein genes gB and the gH/gL complex. Comparative models suggest stabilisation of the post-fusion conformation. These SNPs are essential to consider in vaccine design, antimicrobial resistance, and pathogenesis.
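    Minor variant analysis of the kind mentioned above usually compares per-position base counts against a read-depth and allele-frequency cutoff. The sketch assumes a precomputed pileup (position to base counts) and uses made-up thresholds of 100 reads and 2%; it is not the authors' pipeline and ignores base quality and strand bias, which real variant callers take into account.

```python
def minor_variants(pileup, ref_seq, min_depth=100, min_freq=0.02):
    """Report positions where a non-reference base reaches min_freq of the read depth.

    pileup maps 0-based position -> {base: read count}; thresholds are hypothetical.
    Returns (position, reference base, alternative base, frequency) tuples.
    """
    calls = []
    for pos, counts in sorted(pileup.items()):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to estimate a frequency reliably
        ref = ref_seq[pos]
        for base, n in counts.items():
            freq = n / depth
            if base != ref and freq >= min_freq:
                calls.append((pos, ref, base, round(freq, 3)))
    return calls

ref = "ACGTACGTAC"
pileup = {
    2: {"G": 950, "A": 50},   # 5% A at a G site -> reported
    5: {"C": 995, "T": 5},    # 0.5% T -> below threshold
    7: {"T": 40, "C": 20},    # depth 60 -> skipped
}
print(minor_variants(pileup, ref))  # [(2, 'G', 'A', 0.05)]
```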

  5. Deep Sequencing Reveals the Complete Genome and Evidence for Transcriptional Activity of the First Virus-Like Sequences Identified in Aristotelia chilensis (Maqui Berry)

    Directory of Open Access Journals (Sweden)

    Javier Villacreses

    2015-04-01

    Full Text Available Here, we report the genome sequence and evidence for transcriptional activity of a virus-like element in the native Chilean berry tree Aristotelia chilensis. We propose to name the endogenous sequence Aristotelia chilensis Virus 1 (AcV1). High-throughput sequencing of the genome of this tree uncovered an endogenous viral element, 7122 bp in size, corresponding to the complete genome of AcV1. Its sequence contains three open reading frames (ORFs): ORFs 1 and 2 share 66%–73% amino acid similarity with members of the Caulimoviridae virus family, especially Petunia vein clearing virus (PVCV; genus Petuvirus). ORF1 encodes a movement protein (MP); ORF2 encodes a reverse transcriptase (RT) and a ribonuclease H (RNase H) domain; and ORF3 showed no amino acid sequence similarity to any other known virus proteins. Analogous to other known endogenous pararetrovirus sequences (EPRVs), AcV1 is integrated in the genome of maqui berry and showed low viral transcriptional activity, which was detected by deep sequencing (DNA- and RNA-seq). Phylogenetic analysis of AcV1 and other pararetroviruses revealed a closer resemblance to Petuvirus. Overall, our data suggest that AcV1 could be a new member of the Caulimoviridae family, genus Petuvirus, and the first evidence of this kind of virus in a fruit plant.
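    Identifying ORFs in an assembled element such as AcV1 can be sketched as a scan for ATG-to-stop stretches in each reading frame. This is a simplified illustration under stated assumptions (standard start and stop codons, forward strand only, nested ORFs ignored, a hypothetical 100-codon minimum); it is not the annotation method used by the authors.

```python
STOPS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=100):
    """Find ATG-initiated ORFs on the forward strand in all three frames.

    Returns (start, end, frame) with 0-based, end-exclusive coordinates that
    include the stop codon; min_codons also counts the stop codon.
    Reverse-strand ORFs are not searched here.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG opens a candidate ORF
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ATG in this frame
    return orfs

# Toy sequence: ATG + 3 codons + TAA in frame 0.
toy = "ATGAAACCCGGGTAA"
print(find_orfs(toy, min_codons=2))  # [(0, 15, 0)]
```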

  6. Demographic history and biologically relevant genetic variation of Native Mexicans inferred from whole-genome sequencing

    OpenAIRE

    Romero-Hidalgo, Sandra; Ochoa-Leyva, Adrián; Garcíarrubio, Alejandro; Acuña-Alonzo, Victor; Antúnez-Argüelles, Erika; Balcazar-Quintero, Martha; Barquera-Lozano, Rodrigo; Carnevale, Alessandra; Cornejo-Granados, Fernanda; Fernández-López, Juan Carlos; García-Herrera, Rodrigo; García-Ortíz, Humberto; Granados-Silvestre, Ángeles; Granados, Julio; Guerrero-Romero, Fernando

    2017-01-01

    Understanding the genetic structure of Native American populations is important to clarify their diversity, demographic history, and to identify genetic factors relevant for biomedical traits. Here, we show a demographic history reconstruction from 12 Native American whole genomes belonging to six distinct ethnic groups representing the three main described genetic clusters of Mexico (Northern, Southern, and Maya). Effective population size estimates of all Native American groups remained bel...

  7. Targeted high-throughput sequencing identifies mutations in atlastin-1 as a cause of hereditary sensory neuropathy type I.

    Science.gov (United States)

    Guelly, Christian; Zhu, Peng-Peng; Leonardis, Lea; Papić, Lea; Zidar, Janez; Schabhüttl, Maria; Strohmaier, Heimo; Weis, Joachim; Strom, Tim M; Baets, Jonathan; Willems, Jan; De Jonghe, Peter; Reilly, Mary M; Fröhlich, Eleonore; Hatz, Martina; Trajanoski, Slave; Pieber, Thomas R; Janecke, Andreas R; Blackstone, Craig; Auer-Grumbach, Michaela

    2011-01-07

    Hereditary sensory neuropathy type I (HSN I) is an axonal form of autosomal-dominant hereditary motor and sensory neuropathy distinguished by prominent sensory loss that leads to painless injuries. Unrecognized, these can result in delayed wound healing and osteomyelitis, necessitating distal amputations. To elucidate the genetic basis of an HSN I subtype in a family in which mutations in the few known HSN I genes had been excluded, we employed massively parallel exon sequencing of the 14.3 Mb disease interval on chromosome 14q. We detected a missense mutation (c.1065C>A, p.Asn355Lys) in atlastin-1 (ATL1), a gene that is known to be mutated in early-onset hereditary spastic paraplegia SPG3A and that encodes the large dynamin-related GTPase atlastin-1. The mutant protein exhibited reduced GTPase activity and prominently disrupted ER network morphology when expressed in COS7 cells, strongly supporting pathogenicity. An expanded screen in 115 additional HSN I patients identified two further dominant ATL1 mutations (c.196G>C [p.Glu66Gln] and c.976delG [p.Val326TrpfsX8]). This study highlights an unexpected major role for atlastin-1 in the function of sensory neurons and identifies HSN I and SPG3A as allelic disorders.
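    The HGVS-style variant names above tie a coding-DNA position to a protein position: codon k covers c.(3k-2) to c.3k. A small sketch, assuming standard HGVS c. numbering starting at the A of the start codon, shows that the reported positions line up with the stated amino acid changes.

```python
def codon_of(c_pos: int):
    """Map a coding-DNA (HGVS c.) position to (codon number, position within codon).

    c.1 is the A of the ATG start codon, so codon k spans c.(3k-2) to c.(3k).
    Valid only for exonic positions; intronic offsets (c.N-1, c.N+2) are not handled.
    """
    if c_pos < 1:
        raise ValueError("c. positions in the coding sequence start at 1")
    return (c_pos - 1) // 3 + 1, (c_pos - 1) % 3 + 1

# Positions taken from the ATL1 variants reported above.
print(codon_of(1065))  # (355, 3): third base of codon 355, matching p.Asn355Lys
print(codon_of(196))   # (66, 1): first base of codon 66, matching p.Glu66Gln
print(codon_of(976))   # (326, 1): the frameshift starts at codon 326 (p.Val326...)
```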

  8. Use of DNA sequences to identify forensically important fly species and their distribution in the coastal region of Central California.

    Science.gov (United States)

    Nakano, Angie; Honda, Jeff

    2015-08-01

    Forensic entomology has gained prominence in recent years, as improvements in DNA technology and molecular methods have allowed insect and other arthropod evidence to become increasingly useful in criminal and civil investigations. However, comprehensive faunal inventories are still needed, including catalogs of local DNA sequences for forensically significant Diptera. This multi-year fly-trapping study built upon and expanded a previous survey of these flies in Santa Clara County, including the addition of genetic barcoding data from collected species of flies. Flies from the families Calliphoridae, Sarcophagidae, and Muscidae were trapped in meat-baited traps set in a variety of locations throughout the county. Flies were identified using morphological features and confirmed by molecular analysis. A total of 16 calliphorid species, 11 sarcophagid species, and four muscid species were collected and differentiated. This study found more species of flies than previous area surveys and established new county records for two calliphorid species: Cynomya cadaverina and Chrysomya rufifacies. Differences were found in fly fauna in different areas of the county, indicating the importance of microclimates in the distribution of these flies. Molecular analysis supported the use of DNA barcoding as an effective method of identifying cryptic fly species. Copyright © 2015 Elsevier Ireland Ltd. All rights reserved.
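    Barcode-based identification often comes down to comparing a query sequence with reference barcodes and taking the closest match. The sketch uses uncorrected p-distance on pre-aligned sequences; the barcode locus (not named in the abstract), the sequences, and the reference names are hypothetical, and a real workflow would also apply a distance threshold before assigning a species.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Proportion of differing sites between two aligned sequences of equal length.

    Sites with a gap ('-') or an ambiguous base ('N') in either sequence are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = diffs = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue
        compared += 1
        diffs += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared

def nearest_reference(query: str, references: dict) -> str:
    """Name of the reference barcode with the smallest p-distance to the query."""
    return min(references, key=lambda name: p_distance(query, references[name]))

# Hypothetical aligned barcode fragments (not real fly sequences).
refs = {"species_A": "ACGTTACGGA", "species_B": "ACGATACGTA"}
query = "ACGTTACGGA"
print(p_distance(query, refs["species_B"]))  # 0.2
print(nearest_reference(query, refs))        # species_A
```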

  9. Epidemiological study on the penicillin resistance of clinical Streptococcus pneumoniae isolates identified as the common sequence types.

    Science.gov (United States)

    Gao, Wei; Shi, Wei; Chen, Chang-hui; Wen, De-nian; Tian, Jin; Yao, Kai-hu

    2016-10-20

    There are some limitations in the current interpretation of the penicillin resistance mechanism of clinical Streptococcus pneumoniae isolates at the strain level. To explore the possibility of studying the mechanism on the basis of the sequence types (STs) of this bacterium, 488 isolates collected in Beijing from 1997 to 2014 and 88 isolates collected in Youyang County, Chongqing, and Zhongjiang County, Sichuan, in 2015 were analyzed for penicillin minimum inhibitory concentration (MIC) distribution and annual distribution. The results showed that the penicillin MICs of all isolates belonging to a given ST in Beijing fell within a defined range, either penicillin MIC … penicillin MICs in the first few years after it was identified. The penicillin MICs of isolates identified as common STs and collected in Youyang County, Chongqing, and Zhongjiang County, Sichuan, including ST271, ST320, and ST81, were around 0.25-2 mg/L (≥0.25 mg/L). Our study revealed the epidemiological distribution of penicillin MICs of the given STs determined in clinical S. pneumoniae isolates, suggesting that it is reasonable to study the penicillin resistance mechanism on the basis of the STs of this bacterium.
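    The ST-level analysis amounts to tabulating the spread of penicillin MICs among isolates that share a sequence type. A minimal sketch follows; the isolates and MIC values are invented for illustration and are not data from the study.

```python
from collections import defaultdict

def mic_range_by_st(isolates):
    """Group isolates by sequence type and return (min MIC, max MIC, n) per ST.

    isolates: iterable of (sequence_type, penicillin_mic_mg_per_L) pairs.
    """
    by_st = defaultdict(list)
    for st, mic in isolates:
        by_st[st].append(mic)
    return {st: (min(m), max(m), len(m)) for st, m in by_st.items()}

# Hypothetical isolates; MICs are twofold dilutions in mg/L.
isolates = [("ST271", 1.0), ("ST271", 2.0), ("ST320", 2.0),
            ("ST320", 0.5), ("ST81", 0.25), ("ST271", 0.5)]
for st, (lo, hi, n) in sorted(mic_range_by_st(isolates).items()):
    print(f"{st}: {lo}-{hi} mg/L (n={n})")
```

Tracking how each ST's range shifts by collection year would extend this to the annual-distribution analysis described above.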

  10. TSSer: an automated method to identify transcription start sites in prokaryotic genomes from differential RNA sequencing data.

    Science.gov (United States)

    Jorjani, Hadi; Zavolan, Mihaela

    2014-04-01

    Accurate identification of transcription start sites (TSSs) is an essential step in the analysis of transcription regulatory networks. In higher eukaryotes, the capped analysis of gene expression (CAGE) technology enabled comprehensive annotation of TSSs in genomes such as those of mice and humans. In bacteria, an equivalent approach, termed differential RNA sequencing (dRNA-seq), has recently been proposed, but the application of this approach to a large number of genomes is hindered by the paucity of computational analysis methods. With few exceptions, where the method has been used, TSSs have been annotated largely by hand. In this work, we present a computational method called 'TSSer' that enables the automatic inference of TSSs from dRNA-seq data. The method rests on a probabilistic framework for identifying genomic positions that are both preferentially enriched in the dRNA-seq data and preferentially captured relative to neighboring genomic regions. Evaluating our approach for TSS calling on several publicly available datasets, we find that TSSer achieves high consistency with the curated lists of annotated TSSs but identifies many additional TSSs. Therefore, TSSer can accelerate genome-wide identification of TSSs in bacterial genomes and can aid in further characterization of bacterial transcription regulatory networks. TSSer is freely available under the GPL license at http://www.clipz.unibas.ch/TSSer/index.php
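    dRNA-seq compares a library treated with terminator exonuclease (TEX), which degrades processed 5'-monophosphate RNAs and so enriches primary transcripts, against an untreated library. The two criteria named above (enrichment in the treated library and enrichment relative to neighbouring positions) can be illustrated with a simple heuristic. This sketch is not TSSer's probabilistic model; the thresholds, window size, and toy counts are hypothetical.

```python
import math

def candidate_tss(plus, minus, window=5, pseudo=1.0, min_ratio=1.0, min_local=1.0):
    """Flag positions enriched in the TEX-treated library and locally dominant.

    plus, minus: per-position counts of read 5' ends for the treated (+) and
    untreated (-) libraries, assumed already normalized to comparable depth.
    """
    calls = []
    for i, p in enumerate(plus):
        # Enrichment of treated over untreated at this position (log2, with pseudocount).
        ratio = math.log2((p + pseudo) / (minus[i] + pseudo))
        # Local enrichment: this position versus the mean of its neighbours.
        lo, hi = max(0, i - window), min(len(plus), i + window + 1)
        neighbours = [plus[j] for j in range(lo, hi) if j != i]
        mean_nb = sum(neighbours) / len(neighbours) if neighbours else 0.0
        local = math.log2((p + pseudo) / (mean_nb + pseudo))
        if ratio >= min_ratio and local >= min_local:
            calls.append(i)
    return calls

plus = [0, 1, 0, 2, 40, 3, 1, 0, 0, 1]
minus = [0, 1, 0, 2, 5, 2, 1, 0, 0, 0]
print(candidate_tss(plus, minus, window=3))  # [4]
```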

  11. Whole-Exome Sequencing Identified a Novel Compound Heterozygous Mutation of LRRC6 in a Chinese Primary Ciliary Dyskinesia Patient

    Directory of Open Access Journals (Sweden)

    Lv Liu

    2018-01-01

    Full Text Available Primary ciliary dyskinesia (PCD) is a rare clinical disorder, mainly characterized by respiratory infection, tympanitis, nasosinusitis, and male infertility. Previous studies have shown that it is an autosomal recessive disease, and by 2017 almost 40 pathogenic genes had been identified. Among them, leucine-rich repeat (LRR)-containing 6 (LRRC6) codes for a 463-amino-acid cytoplasmic protein expressed distinctively in cells with motile cilia, including testis cells and respiratory epithelial cells. In this study, we applied whole-exome sequencing combined with filtering for known PCD genes to explore the genetic lesion of a PCD patient. A novel compound heterozygous mutation in LRRC6 (c.183T>G/p.N61K; c.179-1G>A) was identified and cosegregated with the disease in this family. The missense mutation (c.183T>G/p.N61K) may lead to a substitution of asparagine by lysine at position 61 in exon 3 of LRRC6. The splice-site mutation (c.179-1G>A) may cause a premature stop codon in exon 4 and decrease the mRNA levels of LRRC6. Neither mutation was present in our 200 local controls, dbSNP, or the 1000 Genomes data. Three bioinformatics programs also predicted that both mutations are deleterious. Our study not only further supports the importance of LRRC6 in PCD but also expands the spectrum of LRRC6 mutations, and it will contribute to the genetic diagnosis and counseling of PCD patients.
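    The reported missense change can be checked by hand: c.183 is the third base of codon 61 (183 = 3 × 61), and the only asparagine codon ending in T is AAT, so c.183T>G turns AAT into AAG, which encodes lysine. A short sketch with the standard genetic code makes this explicit; the codon is inferred from the variant name rather than read from the LRRC6 reference sequence.

```python
BASES = "TCAG"
# Standard genetic code in TCAG order ('*' = stop).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

def apply_snv(codon: str, pos_in_codon: int, ref: str, alt: str):
    """Substitute one base (1-based position within the codon) and translate both codons.

    Returns (reference amino acid, mutant amino acid, mutant codon).
    """
    if codon[pos_in_codon - 1] != ref:
        raise ValueError("reference base does not match codon")
    mutant = codon[:pos_in_codon - 1] + alt + codon[pos_in_codon:]
    return CODON_TABLE[codon], CODON_TABLE[mutant], mutant

# c.183 is the 3rd base of codon 61; an Asn codon with T at that position must be AAT.
print(apply_snv("AAT", 3, "T", "G"))  # ('N', 'K', 'AAG')
```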

  12. Variations of the ISM conditions across the Main Sequence of star forming galaxies: observations and simulations.

    Science.gov (United States)

    Martinez Galarza, Juan R.; Smith, Howard Alan; Lanz, Lauranne; Hayward, Christopher C.; Zezas, Andreas; Hung, Chao-Ling; Rosenthal, Lee; Weiner, Aaron

    2015-01-01

    A significant amount of evidence has been gathered that points to the existence of a main sequence (MS) of star formation in galaxies. This MS is expressed as a correlation between the star formation rate (SFR) and the stellar mass of the form SFR ∝ M* and spans a few orders of magnitude in both quantities. Several ideas have been suggested to explain fundamental properties of the MS, such as its slope, its dispersion, and its evolution with redshift, but no consensus has been reached regarding its true nature, or regarding whether the membership or non-membership of particular galaxies in this MS reflects the existence of two different modes of star formation. To advance the understanding of the MS, here we use a statistically robust Bayesian SED analysis method (CHIBURST) to consistently analyze the star-forming properties of a set of hydrodynamical simulations of mergers, as well as observations of real mergers, both local and at intermediate redshift. We find a remarkably tight correlation between the specific star formation rate (sSFR) of galaxies and the typical ISM conditions near their internal star-forming regions, parametrized via a novel quantity: the compactness parameter (C). The evolution of mergers along this correlation explains the spread of the MS and implies that the physical conditions of the ISM evolve smoothly between on-MS (secular) conditions and off-MS (coalescence/starburst) conditions. Furthermore, we show that the slope of the correlation can be interpreted in terms of the efficiency of the conversion of gas into stars, and that this efficiency remains unchanged along and across the MS. Finally, we discuss differences in the normalization of the correlation as a function of merger mass and redshift, and conclude that these differences imply the existence of two different modes of star formation, unrelated to the smooth evolution across the MS: a disk-like, low-pressure mode and a compact nuclear-starburst mode.
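    If the MS is written as a power law with a general slope α, SFR ∝ M*^α, taking logarithms gives log SFR = α log M* + β, so the slope and dispersion discussed above can be estimated by least squares in log space; the sSFR is simply SFR/M*. A minimal sketch with invented galaxies follows. It is a plain regression for illustration, not the CHIBURST Bayesian SED method used in the study.

```python
import math

def fit_main_sequence(mstar, sfr):
    """Least-squares fit of log10(SFR) = alpha * log10(M*) + beta.

    Returns (alpha, beta, rms scatter in dex).
    """
    x = [math.log10(m) for m in mstar]
    y = [math.log10(s) for s in sfr]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    alpha = sxy / sxx
    beta = my - alpha * mx
    resid = [yi - (alpha * xi + beta) for xi, yi in zip(x, y)]
    scatter = math.sqrt(sum(r * r for r in resid) / n)
    return alpha, beta, scatter

def ssfr(sfr, mstar):
    """Specific star formation rate, SFR / M* (yr^-1 if SFR is in Msun/yr and M* in Msun)."""
    return sfr / mstar

# Invented galaxies lying exactly on log SFR = 0.8 log M* - 8.
mstar = [1e9, 1e10, 1e11]
sfr = [10 ** (0.8 * math.log10(m) - 8) for m in mstar]
alpha, beta, scatter = fit_main_sequence(mstar, sfr)
print(round(alpha, 3), round(beta, 3), round(scatter, 3))  # 0.8 -8.0 0.0
```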

  13. Whole-genome sequencing identifies EN1 as a determinant of bone density and fracture

    DEFF Research Database (Denmark)

    Zheng, Hou-Feng; Forgetta, Vincenzo; Hsu, Yi-Hsiang

    2015-01-01

    … Associations for BMD were derived from whole-genome sequencing (n = 2,882 from UK10K (ref. 10), a population-based genome sequencing consortium), whole-exome sequencing (n = 3,549), deep imputation of genotyped samples using a combined UK10K/1000 Genomes reference panel (n = 26,534), and de novo replication...

  14. Deep sequencing of the viral phoH gene reveals temporal variation, depth-specific composition, and persistent dominance of the same viral phoH genes in the Sargasso Sea

    Directory of Open Access Journals (Sweden)

    Dawn B. Goldsmith

    2015-06-01

    Full Text Available Deep sequencing of the viral phoH gene, a host-derived auxiliary metabolic gene, was used to track viral diversity throughout the water column at the Bermuda Atlantic Time-series Study (BATS) site in the summer (September) and winter (March) of three years. Viral phoH sequences reveal differences in the viral communities throughout a depth profile and between seasons in the same year. Variation was also detected between the same seasons in subsequent years, though these differences were not as great as the summer/winter distinctions. Over 3,600 phoH operational taxonomic units (OTUs; 97% sequence identity) were identified. Despite high richness, most phoH sequences belong to a few large, common OTUs, whereas the majority of the OTUs are small and rare. While many OTUs make sporadic appearances at just a few times or depths, a small number of OTUs dominate the community throughout the seasons, depths, and years.
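    An OTU definition at 97% sequence identity can be illustrated with greedy centroid clustering, similar in spirit to widely used tools but simplified here: sequences are assumed to be pre-aligned to equal length, identity is a plain match fraction, and the abstract does not say which clustering tool or identity definition the authors used. The sequences and abundances are invented.

```python
def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

def greedy_otus(abundances, threshold=0.97):
    """Assign each sequence to the first centroid it matches at >= threshold identity.

    abundances maps sequence -> read count. Sequences are processed in order of
    decreasing abundance; a sequence matching no centroid founds a new OTU.
    """
    ordered = sorted(abundances.items(), key=lambda kv: -kv[1])
    centroids, members = [], {}
    for seq, _count in ordered:
        for c in centroids:
            if identity(seq, c) >= threshold:
                members[c].append(seq)
                break
        else:
            centroids.append(seq)
            members[seq] = [seq]
    return members

base = "A" * 100
near = "CC" + "A" * 98    # 98% identical to base
far = "GGGGG" + "A" * 95  # 95% identical to base
otus = greedy_otus({base: 50, near: 10, far: 5})
print({c[:5]: len(m) for c, m in otus.items()})  # {'AAAAA': 2, 'GGGGG': 1}
```

Because the most abundant sequences become centroids first, a few large OTUs and many small ones, as reported above, are a natural outcome of this kind of procedure on skewed data.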

  15. A high-density genetic map for anchoring genome sequences and identifying QTLs associated with dwarf vine in pumpkin (Cucurbita maxima Duch.).

    Science.gov (United States)

    Zhang, Guoyu; Ren, Yi; Sun, Honghe; Guo, Shaogui; Zhang, Fan; Zhang, Jie; Zhang, Haiying; Jia, Zhangcai; Fei, Zhangjun; Xu, Yong; Li, Haizhen

    2015-12-24

    Pumpkin (Cucurbita maxima Duch.) is an economically important crop belonging to the Cucurbitaceae family. However, very few genomic and genetic resources are available for this species. As part of our ongoing efforts to sequence the pumpkin genome, a high-density genetic map is essential for anchoring and orienting the assembled scaffolds. In addition, a saturated genetic map can facilitate quantitative trait locus (QTL) mapping. A set of 186 F2 plants derived from the cross of pumpkin inbred lines Rimu and SQ026 was genotyped using the genotyping-by-sequencing approach. Using the identified SNPs, a high-density genetic map containing 458 bin-markers was constructed, spanning a total genetic distance of 2,566.8 cM across the 20 linkage groups of C. maxima with a mean marker density of 5.60 cM. Using this map, we were able to anchor 58 assembled scaffolds that covered about 194.5 Mb (71.7%) of the 271.4 Mb assembled pumpkin genome, of which 44 (183.0 Mb; 67.4%) were oriented. Furthermore, the high-density genetic map was used to identify genomic regions highly associated with an important agronomic trait, dwarf vine. Three QTLs, on linkage groups (LGs) 1, 3, and 4, were recovered. One QTL, qCmB2, located in a 0.42 Mb interval on LG 3, explained 21.4% of the phenotypic variation. Within qCmB2, one gene, Cma_004516, encoding the gibberellin (GA) 20-oxidase in the GA biosynthesis pathway, had a 1249-bp deletion in its promoter in bush-type lines; its expression level increased significantly during vine growth and was higher in vine-type lines than in bush-type lines, supporting Cma_004516 as a possible candidate gene controlling vine growth in pumpkin. A high-density pumpkin genetic map was constructed, which was used to successfully anchor and orient the assembled genome scaffolds and to identify QTLs highly associated with pumpkin vine length.
The map provided a valuable resource for gene cloning and marker-assisted breeding in pumpkin and
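    The map summary statistics above are simple ratios of the reported figures, so recomputing them is a quick consistency check. The quoted 5.60 cM matches total map length divided by the number of bin-markers; that divisor is a convention assumed here, since dividing by the number of inter-marker intervals (markers minus linkage groups) would give a slightly larger value.

```python
# Figures quoted in the abstract.
total_cm = 2566.8        # total genetic distance (cM)
bin_markers = 458
assembly_mb = 271.4      # assembled genome size (Mb)
anchored_mb = 194.5      # sequence in the 58 anchored scaffolds (Mb)
oriented_mb = 183.0      # sequence in the 44 oriented scaffolds (Mb)

mean_spacing = total_cm / bin_markers          # average cM per bin-marker
anchored_pct = 100 * anchored_mb / assembly_mb
oriented_pct = 100 * oriented_mb / assembly_mb
print(f"{mean_spacing:.2f} cM per marker")  # 5.60 cM per marker
print(f"{anchored_pct:.1f}% anchored")      # 71.7% anchored
print(f"{oriented_pct:.1f}% oriented")      # 67.4% oriented
```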

  16. De Novo Transcriptome Sequencing in Passiflora edulis Sims to Identify Genes and Signaling Pathways Involved in Cold Tolerance

    Directory of Open Access Journals (Sweden)

    Sian Liu

    2017-11-01

    Full Text Available The passion fruit (Passiflora edulis Sims), also known as the purple granadilla, is widely cultivated as the new darling of the fruit market throughout southern China. This exotic perennial climber is adapted to warm and humid climates and is thus generally intolerant of cold. There is limited information about gene regulation and signaling pathways related to the cold stress response in this species. In this study, two transcriptome libraries (KEDU_AP vs. GX_AP) were constructed from the aerial parts of cold-tolerant and cold-susceptible varieties of P. edulis, respectively. Overall, 126,284,018 clean reads were obtained, and 86,880 unigenes with a mean size of 1449 bp were assembled. Of these, 64,067 (73.74%) unigenes showed significant similarity to publicly available plant protein sequences. Expression profiles were generated, and 3045 genes were found to be significantly differentially expressed between the KEDU_AP and GX_AP libraries, including 1075 (35.3%) up-regulated and 1970 (64.7%) down-regulated genes. These included 36 genes in enriched pathways of plant hormone signal transduction and 56 genes encoding putative transcription factors. Six genes involved in the ICE1–CBF–COR pathway were induced in the cold-tolerant variety, and their expression levels were further verified using quantitative real-time PCR. This report is the first to identify genes and signaling pathways involved in cold tolerance using high-throughput transcriptome sequencing in P. edulis. These findings may provide useful insights into the molecular mechanisms regulating cold tolerance and into genetic breeding in Passiflora spp.
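    The percentages reported for annotation and for up- versus down-regulated genes follow directly from the counts in the abstract; recomputing them confirms the 3045 total and the 35.3/64.7 split.

```python
# Counts quoted in the abstract.
unigenes = 86_880
annotated = 64_067        # unigenes with significant similarity to plant proteins
up, down = 1075, 1970     # DE genes, KEDU_AP vs. GX_AP

de_total = up + down
annotated_pct = 100 * annotated / unigenes
up_pct = 100 * up / de_total
down_pct = 100 * down / de_total
print(de_total)                                    # 3045
print(f"{annotated_pct:.2f}% of unigenes annotated")  # 73.74% of unigenes annotated
print(f"{up_pct:.1f}% up, {down_pct:.1f}% down")   # 35.3% up, 64.7% down
```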

  17. Identifying Faults Associated with the 2001 Avoca Induced(?) Seismicity Sequence of Western New York State Using Potential Field Wavelets.

    Science.gov (United States)

    Horowitz, F. G.; Ebinger, C.; Jordan, T. E.

    2017-12-01

    Results from recent DOE- and USGS-sponsored projects in the (intraplate) northeastern US and southeastern Canada have identified the locations of steeply dipping structures, many of them previously unknown, from a Poisson wavelet multiscale edge ('worm') analysis of gravity and magnetic fields. The Avoca sequence of induced(?) seismicity in western New York State occurred during January and February 2001. The Avoca earthquake sequence is associated with industrial hydraulic fracturing activity "related to a proposed natural gas storage facility near Avoca to be constructed by solution mining" (Kim, 2001). The main Avoca event was a felt Mb = 3.2 earthquake on Feb. 3, 2001, recorded by the Lamont Cooperative Seismic Network. Earlier, smaller events were located by the Canadian Geological Survey's seismic network north of the Canadian border, implying that the event locations might be biased because the events occurred off the southern edge of the array. Some of these events were also felt locally, according to local newspaper reports. By plotting the locations of the seismic events and of the injection well (reported via its API number), we find a strong correlation with structures detected by our potential field worms. The injection occurred near a NE-SW striking structure that was not activated. All but one of the earthquakes occurred about 5 km north of the injection well, on or near an E-W striking structure that appears to intersect the NE-SW structure. The final, small (MN = 2.2) earthquake was located on a different, complex structure about 10 km north of the other events. We suggest that potential field methods such as ours might be appropriate for locating structures of concern for induced seismicity associated with industrial activity. Reference: Kim, W.-Y. (2001). The Lamont cooperative seismic network and the national seismic system: Earthquake hazard studies in the northeastern United States. Tech. Rep. 98-01, Lamont

  18. High-throughput sequencing of the T cell receptor β gene identifies aggressive early-stage mycosis fungoides.

    Science.gov (United States)

    de Masson, Adele; O'Malley, John T; Elco, Christopher P; Garcia, Sarah S; Divito, Sherrie J; Lowry, Elizabeth L; Tawa, Marianne; Fisher, David C; Devlin, Phillip M; Teague, Jessica E; Leboeuf, Nicole R; Kirsch, Ilan R; Robins, Harlan; Clark, Rachael A; Kupper, Thomas S

    2018-05-09

    Mycosis fungoides (MF), the most common cutaneous T cell lymphoma (CTCL), is a malignancy of skin-tropic memory T cells. Most MF cases present at an early stage (stage IA/IB, limited to the skin), and these patients typically have a chronic, indolent clinical course. However, a small subset of early-stage cases develops progressive and fatal disease. Because outcomes can differ so widely, early identification of this high-risk population is an urgent unmet clinical need. We evaluated the use of next-generation high-throughput DNA sequencing of the T cell receptor β gene (TCRB) in lesional skin biopsies to predict progression and survival in a discovery cohort of 208 patients with CTCL (177 with MF) from a 15-year longitudinal observational clinical study. We compared these data with the results in an independent validation cohort of 101 CTCL patients (87 with MF). The tumor clone frequency (TCF) in lesional skin, measured by high-throughput sequencing of the TCRB gene, was an independent prognostic factor for both progression-free and overall survival in patients with CTCL, and in MF in particular. In early-stage patients, a TCF of >25% in the skin was a stronger predictor of progression than any other established prognostic factor (stage IB versus IA, presence of plaques, high blood lactate dehydrogenase concentration, large-cell transformation, or age). The TCF may therefore accurately predict disease progression in early-stage MF. Early identification of patients at high risk for progression could help identify candidates who may benefit from allogeneic hematopoietic stem cell transplantation before their disease becomes treatment-refractory. Copyright © 2018 The Authors, some rights reserved; exclusive licensee American Association for the Advancement of Science. No claim to original U.S. Government Works.
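    The >25% cut-off above is a simple threshold on one ratio. A minimal sketch, assuming TCF is the fraction of all TCRB sequences in a biopsy that belong to the single most abundant clonotype (taken to be the malignant clone); the abstract does not spell out the formula, so treat this as an illustration. The clone labels, counts, and function names are hypothetical.

```python
from typing import Mapping

# Cut-off reported in the abstract for early-stage patients (TCF > 25%).
TCF_HIGH_RISK_CUTOFF = 0.25


def tumor_clone_frequency(clonotype_counts: Mapping[str, int]) -> float:
    """Fraction of all TCRB sequences contributed by the most abundant clonotype.

    Assumption: the dominant clonotype is treated as the tumor clone.
    """
    total = sum(clonotype_counts.values())
    if total == 0:
        raise ValueError("no TCRB sequences in sample")
    return max(clonotype_counts.values()) / total


def is_high_risk(clonotype_counts: Mapping[str, int]) -> bool:
    """True when TCF strictly exceeds the 25% cut-off."""
    return tumor_clone_frequency(clonotype_counts) > TCF_HIGH_RISK_CUTOFF


if __name__ == "__main__":
    # Hypothetical biopsy: one expanded clone among polyclonal background.
    sample = {"clone_A": 600, "clone_B": 150, "clone_C": 250}
    print(tumor_clone_frequency(sample), is_high_risk(sample))
```

    The strict `>` comparison mirrors the ">25%" wording, so a sample sitting exactly at 25% is not flagged.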

  19. A case-control study identifying chromosomal polymorphic variations as forms of epigenetic alterations associated with the infertility phenotype

    DEFF Research Database (Denmark)

    Minocherhomji, Sheroy; Athalye, Arundhati S; Madon, Prochi F

    2009-01-01

    To study the association of chromosomal polymorphic variations with infertility and subfertility.

  20. Association of the leucine-7 to proline-7 variation in the signal sequence of neuropeptide Y with major depression

    DEFF Research Database (Denmark)

    Koefoed, Pernille; Woldbye, David P. D.; Hansen, Thomas v. O.

    2012-01-01

    Objective: There is clear evidence of a genetic component in major depression, and several studies indicate that neuropeptide Y (NPY) could play an important role in the pathophysiology of the disease. A well-known polymorphism encoding a leucine-to-proline substitution in the signal peptide ... sequence of NPY (Leu7Pro variation) was previously found to protect against depression. Our study aimed to replicate this association in a large Danish population with major depression. Method: Leu7Pro was studied in a sample of depressed patients and ethnically matched controls, as well as psychiatric ... disease controls with schizophrenia. Possible functional consequences of Leu7Pro were explored in vitro. Results: In contrast to previous studies, Pro7 appeared to be a risk allele for depression, being significantly more frequent in the depression sample (5.5 n = 593; p = 0.009; odds ratio, OR: 1...

  1. A RAD-Based Genetic Map for Anchoring Scaffold Sequences and Identifying QTLs in Bitter Gourd (Momordica charantia)

    Science.gov (United States)

    Cui, Junjie; Luo, Shaobo; Niu, Yu; Huang, Rukui; Wen, Qingfang; Su, Jianwen; Miao, Nansheng; He, Weiming; Dong, Zhensheng; Cheng, Jiaowen; Hu, Kailin

    2018-01-01

    Genetic mapping is a basic tool necessary for anchoring assembled scaffold sequences and for identifying QTLs controlling important traits. Though bitter gourd (Momordica charantia) is both consumed as food and used medicinally, research on its genomics and genetic mapping is severely limited. Here, we report the construction of a restriction site associated DNA (RAD)-based genetic map for bitter gourd using an F2 mapping population comprising 423 individuals derived from two cultivated inbred lines, the gynoecious line ‘K44’ and the monoecious line ‘Dali-11.’ This map comprised 1,009 SNP markers and spanned a total genetic distance of 2,203.95 cM across the 11 linkage groups. It anchored a total of 113 assembled scaffolds that covered about 251.32 Mb (85.48%) of the 294.01 Mb assembled genome. In addition, three horticulturally important traits, namely sex expression, fruit epidermal structure, and immature fruit color, were evaluated using a combination of qualitative and quantitative data. As a result, we identified three QTL/gene loci responsible for these traits in three environments. The QTL/gene gy/fffn/ffn, controlling sex expression involved in gynoecy, first female flower node, and female flower number, was detected in the reported region. In particular, two QTLs/genes, Fwa/Wr and w, were found to be responsible for fruit epidermal structure and white immature fruit color, respectively. This RAD-based genetic map promotes the assembly of the bitter gourd genome, and the identified genetic loci will accelerate the cloning of relevant genes in the future. PMID:29706980

  2. Panel-based whole exome sequencing identifies novel mutations in microphthalmia and anophthalmia patients showing complex Mendelian inheritance patterns.

    Science.gov (United States)

    Riera, Marina; Wert, Ana; Nieto, Isabel; Pomares, Esther

    2017-11-01

    Microphthalmia and anophthalmia (MA) are congenital eye abnormalities that show extremely high clinical and genetic complexity. In this study, we evaluated the implementation of whole exome sequencing (WES) for the genetic analysis of MA patients. This approach was used to investigate three unrelated families in which previous single-gene analyses had failed to identify the molecular cause. A total of 47 genes previously associated with nonsyndromic MA were included in our panel. WES was performed in one affected patient from each family using the AmpliSeq™ Exome technology and the Ion Proton™ platform. A novel heterozygous OTX2 missense mutation was identified in a patient with bilateral anophthalmia who inherited the variant from a carrier parent showing no sign of the condition. We also describe a new PAX6 missense variant in an autosomal-dominant pedigree affected by mild bilateral microphthalmia with high intrafamilial variability, in which germline mosaicism was determined to be the most plausible molecular cause of the disease. Finally, a heterozygous missense mutation in RBP4 was found to be responsible for an isolated case of bilateral complex microphthalmia. This study highlights that panel-based WES is a reliable and effective strategy for the genetic diagnosis of MA. Furthermore, using this technique, the mutational spectrum of these diseases was broadened, with novel variants identified in each of the OTX2, PAX6, and RBP4 genes. Moreover, we report new cases of reduced penetrance, mosaicism, and variable phenotypic expressivity associated with MA, further demonstrating the heterogeneity of such disorders. © 2017 The Authors. Molecular Genetics & Genomic Medicine published by Wiley Periodicals, Inc.

  3. A RAD-Based Genetic Map for Anchoring Scaffold Sequences and Identifying QTLs in Bitter Gourd (Momordica charantia)

    Directory of Open Access Journals (Sweden)

    Junjie Cui

    2018-04-01

    Full Text Available Genetic mapping is a basic tool necessary for anchoring assembled scaffold sequences and for identifying QTLs controlling important traits. Though bitter gourd (Momordica charantia) is both consumed as food and used medicinally, research on its genomics and genetic mapping is severely limited. Here, we report the construction of a restriction site associated DNA (RAD)-based genetic map for bitter gourd using an F2 mapping population comprising 423 individuals derived from two cultivated inbred lines, the gynoecious line ‘K44’ and the monoecious line ‘Dali-11.’ This map comprised 1,009 SNP markers and spanned a total genetic distance of 2,203.95 cM across the 11 linkage groups. It anchored a total of 113 assembled scaffolds that covered about 251.32 Mb (85.48%) of the 294.01 Mb assembled genome. In addition, three horticulturally important traits, namely sex expression, fruit epidermal structure, and immature fruit color, were evaluated using a combination of qualitative and quantitative data. As a result, we identified three QTL/gene loci responsible for these traits in three environments. The QTL/gene gy/fffn/ffn, controlling sex expression involved in gynoecy, first female flower node, and female flower number, was detected in the reported region. In particular, two QTLs/genes, Fwa/Wr and w, were found to be responsible for fruit epidermal structure and white immature fruit color, respectively. This RAD-based genetic map promotes the assembly of the bitter gourd genome, and the identified genetic loci will accelerate the cloning of relevant genes in the future.

  4. High-throughput sequencing and copy number variation detection using formalin fixed embedded tissue in metastatic gastric cancer.

    Directory of Open Access Journals (Sweden)

    Seokhwi Kim

    Full Text Available In the era of targeted therapy, mutation profiling of cancer is a crucial aspect of making therapeutic decisions. To characterize cancer at a molecular level, the use of formalin-fixed paraffin-embedded tissue is important. We tested the Ion AmpliSeq Cancer Hotspot Panel v2 and nCounter Copy Number Variation Assay in 89 formalin-fixed paraffin-embedded gastric cancer samples to determine whether they are applicable to archival clinical samples for personalized targeted therapies. We validated the results with Sanger sequencing, real-time quantitative PCR, fluorescence in situ hybridization and immunohistochemistry. Frequently detected somatic mutations included TP53 (28.17%), APC (10.1%), PIK3CA (5.6%), KRAS (4.5%), SMO (3.4%), STK11 (3.4%), CDKN2A (3.4%) and SMAD4 (3.4%). Amplifications of HER2, CCNE1, MYC, KRAS and EGFR genes were observed in 8 (8.9%), 4 (4.5%), 2 (2.2%), 1 (1.1%) and 1 (1.1%) cases, respectively. In the cases with amplification, fluorescence in situ hybridization for HER2 verified gene amplification, and immunohistochemistry for HER2, EGFR and CCNE1 verified the overexpression of proteins in tumor cells. In conclusion, we successfully performed semiconductor-based sequencing and nCounter copy number variation analyses in formalin-fixed paraffin-embedded gastric cancer samples. High-throughput screening in archival clinical samples enables faster, more accurate and cost-effective detection of hotspot mutations or amplification in genes.

  5. Molecular phylogeny of Japanese Rhinolophidae based on variations in the complete sequence of the mitochondrial cytochrome b gene.

    Science.gov (United States)

    Sakai, Takahiro; Kikkawa, Yoshiaki; Tsuchiya, Kimiyuki; Harada, Masashi; Kanoe, Masamitsu; Yoshiyuki, Mizuko; Yonekawa, Hiromichi

    2003-04-01

    Microchiroptera have diversified into many species whose body size and complicated ear and nose shapes have adapted to their echolocation abilities. Their speciation processes and intra- and interspecies relationships are still under discussion. Here we report on the geographical variation of Japanese Rhinolophus ferrumequinum and R. cornutus using the complete sequence of the mitochondrial cytochrome b gene to clarify the phylogenetic positions of the 2 species, as well as that of Rhinolophidae within the Microchiroptera. We found that sequence divergence values within each of the 2 species are unexpectedly low (0.07%-0.94%). We also found that there is no local specificity of their mtCytb alleles. On the other hand, the divergence values for Japanese Microchiroptera (12.7%-16.6%) are much higher than those for other mammalian genera. Similarly, the values among five genera of Vespertilionidae were 20.5%-27.3%. Phylogenetic analysis shows that the 2 species of family Rhinolophidae in the suborder Microchiroptera belong to the Megachiroptera cluster in the constructed maximum parsimony tree. These results suggest that the speciation of Rhinolophidae involved its divergence as an independent lineage from other Microchiroptera, and that the other microbats might be paraphyletic. In addition, the tree shows that the order Chiroptera is monophyletic and that the closest group to Chiroptera is the ungulates.
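    The divergence values above are percentages of differing nucleotide sites between aligned cytochrome b sequences. The sketch below computes the simplest such measure, the uncorrected p-distance; the abstract does not state which distance model was used, so this choice is an assumption, and the toy sequences are made up.

```python
def p_distance(seq_a: str, seq_b: str) -> float:
    """Uncorrected p-distance between two pre-aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous base ('N')
    are skipped; the result is the fraction of remaining sites that differ.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-N" or b in "-N":
            continue  # skip gaps and ambiguous positions
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


if __name__ == "__main__":
    # Hypothetical 10-site alignment: one substitution, one gapped site.
    a = "ATGACCAACA"
    b = "ATGACTAAC-"
    print(f"{100 * p_distance(a, b):.2f}%")
```

    Here 9 sites are comparable and 1 differs, giving 11.11%. Model-corrected distances (for example, Kimura 2-parameter) would be somewhat larger at high divergence, which matters for between-genus values in the 12% to 27% range.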

  6. Whole-Exome Sequencing Identifies One De Novo Variant in the FGD6 Gene in a Thai Family with Autism Spectrum Disorder

    Directory of Open Access Journals (Sweden)

    Chuphong Thongnak

    2018-01-01

    Full Text Available Autism spectrum disorder (ASD) has a strong genetic basis, although its genetics are complex and remain unclear. Genetic tests such as microarrays and sequencing have been widely used to identify autism markers, but they are unsuccessful in some cases. The objective of this study was to identify causative variants of autism in two Thai families using whole-exome sequencing. Whole-exome sequencing was performed on autism-affected children from two unrelated families. Each sample was sequenced on the SOLiD 5500xl Genetic Analyzer system, followed by a combined bioinformatics pipeline including annotation and filtering to identify candidate variants. Candidate variants were validated, and a segregation study with other family members was performed, using Sanger sequencing. This study identified a possible causative variant for ASD, c.2951G>A, in the FGD6 gene. We demonstrated the potential of whole-exome sequencing combined with a bioinformatics filtering procedure to identify variants associated with ASD. These techniques could be useful in identifying possible causative ASD variants, especially in cases in which variants cannot be identified by other techniques.

  7. Exploration of the effect of sequence variations located inside the binding pocket of HIV-1 and HIV-2 proteases.

    Science.gov (United States)

    Triki, Dhoha; Billot, Telli; Visseaux, Benoit; Descamps, Diane; Flatters, Delphine; Camproux, Anne-Claude; Regad, Leslie

    2018-04-10

    HIV-2 protease (PR2) is naturally resistant to most FDA (Food and Drug Administration)-approved HIV-1 protease inhibitors (PIs), a major antiretroviral class. In this study, we compared the HIV-1 protease (PR1) and PR2 binding pockets extracted from structures complexed with 12 ligands. The comparison of PR1 and PR2 pocket properties showed that bound PR2 pockets were more hydrophobic, with more oxygen atoms and fewer nitrogen atoms than PR1 pockets. The structural comparison of PR1 and PR2 pockets highlighted structural changes induced by their sequence variations that were consistent with these property changes. Specifically, substitutions at residues 31, 46, and 82 induced structural changes in their main-chain atoms that could affect PI binding in PR2. In addition, modelling of PR1 mutant structures containing the V32I and L76M substitutions revealed a cooperative mechanism leading to structural deformation of flap residue 45 that could modify PR2 flexibility. Our results suggest that substitutions in the PR1 and PR2 pockets can modify PI binding and flap flexibility, which could underlie PR2 resistance to PIs. These results provide new insights into the structural changes induced by sequence variation between the PR1 and PR2 pockets, improving the understanding of the atomic mechanism of PR2 resistance to PIs.

  8. Automatically Identifying Fusion Events between GLUT4 Storage Vesicles and the Plasma Membrane in TIRF Microscopy Image Sequences

    Directory of Open Access Journals (Sweden)

    Jian Wu

    2015-01-01

    Full Text Available Quantitative analysis of the dynamic behavior of membrane-bound secretory vesicles has proven to be important in biological research. This paper proposes a novel approach to automatically identify the elusive fusion events between VAMP2-pHluorin-labeled GLUT4 storage vesicles (GSVs) and the plasma membrane. The onset of fusion events is detected by a modified forward subtraction of consecutive frames in the TIRFM image sequence. Spatially connected pixels in the difference images that are brighter than a specified adaptive threshold are grouped into a distinct fusion spot. Each vesicle is located at the intensity-weighted centroid of its fusion spot. To reveal the true in vivo nature of a fusion event, a 2D Gaussian is fitted to the fusion spot to derive the intensity-weighted centroid and the spot size during the fusion process. The fusion event and its termination can be determined from the change in spot size. The method was evaluated on real experimental data with ground truth annotated by expert cell biologists. The evaluation results show that it achieves relatively high accuracy, comparing favorably with manual analysis at a small fraction of the time.
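    The detection steps described above (forward frame subtraction, adaptive thresholding, grouping connected bright pixels, and locating each spot at its intensity-weighted centroid) can be sketched in plain Python. This is an illustration, not the authors' implementation: the "modification" is assumed to be clipping negative differences to zero, the threshold rule (mean + k·std of the difference image) is a hypothetical stand-in for their adaptive threshold, 4-connectivity is assumed, and the 2D Gaussian fitting step is omitted.

```python
from collections import deque
from statistics import mean, pstdev
from typing import List, Tuple

Frame = List[List[float]]  # one grayscale TIRFM frame as rows of pixel intensities


def forward_difference(prev: Frame, curr: Frame) -> Frame:
    """Forward subtraction curr - prev, keeping only intensity increases (assumed)."""
    return [[max(c - p, 0.0) for p, c in zip(prow, crow)]
            for prow, crow in zip(prev, curr)]


def adaptive_threshold(diff: Frame, k: float = 3.0) -> float:
    """Hypothetical adaptive rule: mean + k * population std of the difference image."""
    values = [v for row in diff for v in row]
    return mean(values) + k * pstdev(values)


def fusion_spots(diff: Frame, thr: float) -> List[Tuple[float, float]]:
    """Group 4-connected pixels brighter than thr into spots.

    Returns each spot's intensity-weighted centroid as (row, col).
    """
    h, w = len(diff), len(diff[0])
    seen = [[False] * w for _ in range(h)]
    centroids = []
    for r0 in range(h):
        for c0 in range(w):
            if seen[r0][c0] or diff[r0][c0] <= thr:
                continue
            # Breadth-first flood fill over above-threshold neighbours.
            queue = deque([(r0, c0)])
            seen[r0][c0] = True
            wsum = rsum = csum = 0.0
            while queue:
                r, c = queue.popleft()
                v = diff[r][c]
                wsum += v
                rsum += v * r
                csum += v * c
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < h and 0 <= nc < w
                            and not seen[nr][nc] and diff[nr][nc] > thr):
                        seen[nr][nc] = True
                        queue.append((nr, nc))
            centroids.append((rsum / wsum, csum / wsum))
    return centroids


if __name__ == "__main__":
    # Toy 5x5 frames: flat background, then a small bright spot appears.
    prev = [[10.0] * 5 for _ in range(5)]
    curr = [row[:] for row in prev]
    curr[1][1], curr[1][2] = 60.0, 30.0
    diff = forward_difference(prev, curr)
    print(fusion_spots(diff, adaptive_threshold(diff, k=1.0)))
```

    In the toy example the two brightened pixels form one spot whose centroid sits at row 1.0 and about column 1.29, pulled toward the brighter pixel by the intensity weighting.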

  9. Mining environmental high-throughput sequence data sets to identify divergent amplicon clusters for phylogenetic reconstruction and morphotype visualization.

    Science.gov (United States)

    Gimmler, Anna; Stoeck, Thorsten

    2015-08-01

    Environmental high-throughput sequencing (envHTS) is a very powerful tool, which in protistan ecology is predominantly used to explore diversity and its geographic and local patterns. Here we used a pyrosequenced V4-SSU rDNA data set from a solar saltern pond as a test case for exploiting such massive protistan amplicon data sets beyond this descriptive purpose. To this end, we combined a Swarm-based blastn network including 11 579 ciliate V4 amplicons, used to identify divergent amplicon clusters, with targeted polymerase chain reaction (PCR) primer design for retrieving the full-length small subunit of the ribosomal DNA and with probe design for fluorescence in situ hybridization (FISH). This strategy makes it possible to use envHTS data sets to (i) reveal the phylogenetic position of the taxon behind divergent amplicons; (ii) improve the phylogenetic resolution and evolutionary history of specific taxon groups; (iii) solidly assess an amplicon's (species') degree of similarity to its closest described relative; (iv) visualize the morphotype behind a divergent amplicon cluster; (v) rapidly screen many environmental samples by FISH for the geographic/habitat distribution and abundance of the respective organism; and (vi) monitor the success of enrichment strategies in live samples for cultivation and isolation of the respective organisms. © 2015 Society for Applied Microbiology and John Wiley & Sons Ltd.

  10. Next Generation Sequencing Identifies Five Major Classes of Potentially Therapeutic Enzymes Secreted by Lucilia sericata Medical Maggots.

    Science.gov (United States)

    Franta, Zdeněk; Vogel, Heiko; Lehmann, Rüdiger; Rupp, Oliver; Goesmann, Alexander; Vilcinskas, Andreas

    2016-01-01

    Lucilia sericata larvae are used as an alternative treatment for recalcitrant and chronic wounds. Their excretions/secretions contain molecules that facilitate tissue debridement, disinfect, or accelerate wound healing and have therefore been recognized as a potential source of novel therapeutic compounds. Among the substances present in excretions/secretions various peptidase activities promoting the wound healing processes have been detected but the peptidases responsible for these activities remain mostly unidentified. To explore these enzymes we applied next generation sequencing to analyze the transcriptomes of different maggot tissues (salivary glands, gut, and crop) associated with the production of excretions/secretions and/or with digestion as well as the rest of the larval body. As a result we obtained more than 123.8 million paired-end reads, which were assembled de novo using Trinity and Oases assemblers, yielding 41,421 contigs with an N50 contig length of 2.22 kb and a total length of 67.79 Mb. BLASTp analysis against the MEROPS database identified 1729 contigs in 577 clusters encoding five peptidase classes (serine, cysteine, aspartic, threonine, and metallopeptidases), which were assigned to 26 clans, 48 families, and 185 peptidase species. The individual enzymes were differentially expressed among maggot tissues and included peptidase activities related to the therapeutic effects of maggot excretions/secretions.
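    The N50 contig length quoted above (2.22 kb over 41,421 contigs) is the standard contiguity statistic: the length L such that contigs of length L or longer together contain at least half of all assembled bases. A minimal sketch with made-up contig lengths; the abstract does not say which tool computed the value.

```python
from typing import Iterable


def n50(lengths: Iterable[int]) -> int:
    """Return the N50 of a set of contig lengths.

    Contigs are sorted longest first and their lengths accumulated; the N50 is
    the length of the contig at which the running total first reaches half of
    the total assembly size.
    """
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached: the loop always returns


if __name__ == "__main__":
    # Hypothetical assembly of five contigs (total 300 bp, half = 150 bp).
    print(n50([20, 100, 40, 80, 60]))  # 100 + 80 = 180 >= 150 -> 80
```

    Because N50 weights contigs by their length, it reports a typical contig size "per base" rather than per contig, which is why it usually exceeds the mean contig length.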

  11. Patterns of oligonucleotide sequences in viral and host cell RNA identify mediators of the host innate immune system.

    Directory of Open Access Journals (Sweden)

    Benjamin D Greenbaum

    Full Text Available The innate immune response provides a first line of defense against pathogens by targeting generic differential features that are present in foreign organisms but not in the host. These innate responses generate selection forces acting both in pathogens and hosts that further determine their co-evolution. Here we analyze the nucleic acid sequence fingerprints of these selection forces acting in parallel on both host innate immune genes and ssRNA viral genomes. We do this by identifying dinucleotide biases in the coding regions of innate immune response genes in plasmacytoid dendritic cells, and then use this signal to identify other significant host innate immune genes. The persistence of these biases in the orthologous groups of genes in humans and chickens is also examined. We then compare the significant motifs in highly expressed genes of the innate immune system to those in ssRNA viruses and study the evolution of these motifs in the H1N1 influenza genome. We argue that the significant under-represented motif pattern of CpG in an AU context--which is found in both the ssRNA viruses and innate genes, and has decreased throughout the history of H1N1 influenza replication in humans--is immunostimulatory and has been selected against during the co-evolution of viruses and host innate immune genes. This shows how differences in host immune biology can drive the evolution of viruses that jump into species with different immune priorities than the original host.
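    Dinucleotide biases such as the CpG under-representation discussed above are commonly quantified as an observed/expected ratio, the relative abundance ρ(XY) = f(XY) / (f(X)·f(Y)), where values well below 1 indicate under-representation. A minimal sketch for CpG, treating U as T so RNA and DNA sequences score the same way; this is a standard measure, not necessarily the exact statistic the authors used, it ignores the flanking "AU context", and the test sequences are made up.

```python
from collections import Counter


def dinucleotide_relative_abundance(seq: str, first: str = "C", second: str = "G") -> float:
    """Observed/expected ratio f(XY) / (f(X) * f(Y)) for one dinucleotide XY.

    Mononucleotide frequencies use all positions; dinucleotide frequency uses
    all len(seq) - 1 overlapping pairs. U is converted to T throughout.
    """
    s = seq.upper().replace("U", "T")
    x = first.upper().replace("U", "T")
    y = second.upper().replace("U", "T")
    if len(s) < 2:
        raise ValueError("sequence too short")
    mono = Counter(s)
    di = Counter(s[i:i + 2] for i in range(len(s) - 1))
    f_x = mono[x] / len(s)
    f_y = mono[y] / len(s)
    if f_x == 0 or f_y == 0:
        raise ValueError("mononucleotide absent; ratio undefined")
    f_xy = di[x + y] / (len(s) - 1)
    return f_xy / (f_x * f_y)


if __name__ == "__main__":
    # Hypothetical short RNA fragment; real analyses use whole coding regions.
    print(round(dinucleotide_relative_abundance("AUGCGAUUACGAUCUAAGG"), 3))
```

    On very short sequences the ratio is noisy; in practice it is computed over full coding regions or genomes and compared with shuffled controls.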

  12. Exome sequencing identifies mutations in ABCD1 and DACH2 in two brothers with a distinct phenotype

    OpenAIRE

    Zhang, Yanliang; Liu, Yanhui; Li, Ya; Duan, Yong; Zhang, Keyun; Wang, Junwang; Dai, Yong

    2014-01-01

    Background We report on two brothers with a distinct syndromic phenotype and explore the potential pathogenic cause. Methods Cytogenetic tests and exome sequencing were performed on the two brothers and their parents. Variants detected by exome sequencing were validated by Sanger sequencing. Results The main phenotype of the two brothers included congenital language disorder, growth retardation, intellectual disability, difficulty in standing and walking, and urinary and fecal incontinence. T...

  13. Candidate genes revealed by a genome scan for mosquito resistance to a bacterial insecticide: sequence and gene expression variations

    Directory of Open Access Journals (Sweden)

    David Jean-Philippe

    2009-11-01

    Full Text Available Abstract Background Genome scans are becoming an increasingly popular approach to study the genetic basis of adaptation and speciation, but on their own they are often unable to identify the specific gene(s) or mutation(s) targeted by selection. This shortcoming is hopefully bound to disappear in the near future, thanks to the wealth of new genomic resources currently being developed for many species. In this article, we provide a foretaste of this exciting new era by conducting a genome scan in the mosquito Aedes aegypti with the aim of finding candidate genes involved in resistance to Bacillus thuringiensis subsp. israelensis (Bti) insecticidal toxins. Results The genomes of a Bti-resistant and a Bti-susceptible strain were surveyed using about 500 MITE-based molecular markers, and the loci showing the highest inter-strain genetic differentiation were sequenced and mapped on the Aedes aegypti genome sequence. Several good candidate genes for Bti resistance were identified in the vicinity of these highly differentiated markers. Two of them, coding for a cadherin and a leucine aminopeptidase, were further examined at the sequence and gene expression levels. In the resistant strain, the cadherin gene displayed patterns of nucleotide polymorphism consistent with the action of positive selection (e.g. an excess of high- compared to intermediate-frequency mutations), as well as significant under-expression compared with the susceptible strain. Conclusion Both sequence and gene expression analyses agree in suggesting a role for positive selection in the evolution of this cadherin gene in the resistant strain. However, it is unlikely that resistance to Bti is conferred by this gene alone, and further investigation will be needed to characterize other genes significantly associated with Bti resistance in Ae. aegypti. Beyond these results, this article illustrates how genome scans can build on the body of new genomic information (here, full

  14. Allelic sequence variations in the hypervariable region of a T-cell receptor β chain: Correlation with restriction fragment length polymorphism in human families and populations

    International Nuclear Information System (INIS)

    Robinson, M.A.

    1989-01-01

    Direct sequence analysis of the human T-cell antigen receptor (TCR) V β1 variable gene identified a single base-pair allelic variation (C/G) located within the coding region. This change results in substitution of a histidine (CAC) for a glutamine (CAG) at position 48 of the TCR β chain, a position predicted to be in the TCR antigen binding site. The V β1 polymorphism was found by DNA sequence analysis of V β1 genes from seven unrelated individuals; V β1 genes were amplified by the polymerase chain reaction, the amplified fragments were cloned into M13 phage vectors, and sequences were determined. To determine the inheritance patterns of the V β1 substitution and to test its correlation with the V β1 restriction fragment length polymorphism detected with Pvu II and Taq I, allele-specific oligonucleotides were constructed and used to characterize amplified DNA samples. Seventy unrelated individuals and six families were tested for both restriction fragment length polymorphism and the V β1 substitution. The correlation was also tested using amplified, size-selected, Pvu II- and Taq I-digested DNA samples from heterozygotes. Pvu II allele 1 (61/70) and Taq I allele 1 (66/70) were found to be correlated with the substitution giving rise to a histidine at position 48. Because there are exceptions to the correlation, the use of specific probes to characterize allelic forms of TCR variable genes will provide important tools for studies of basic TCR genetics and disease associations

  15. The Space-Time Variation of Global Crop Yields, Detecting Simultaneous Outliers and Identifying the Teleconnections with Climatic Patterns

    Science.gov (United States)

    Najafi, E.; Devineni, N.; Pal, I.; Khanbilvardi, R.

    2017-12-01

    An understanding of the climate factors that influence the space-time variability of crop yields is important for food security and can help us predict global food availability. In this study, we address how the crop yield trends of countries around the world were related to each other during the last several decades, and which climatic variables triggered simultaneous high/low crop yields across the world. Robust Principal Component Analysis (rPCA) is used to identify the primary modes of variation in wheat, maize, sorghum, rice, soybean, and barley yields. Relations between these modes of variability and important climatic variables, especially anomalous sea surface temperature (SSTa), are examined from 1964 to 2010. rPCA is also used to identify simultaneous outliers in each year, i.e. systematic high/low crop yields across the globe. The results revealed spatiotemporal patterns in these crop yields and the climate-related events that caused them, as well as the connection of outliers with weather extremes. We find that, among climatic variables, SST has had the greatest impact on simultaneous crop yield variability and yield outliers in many countries. An understanding of this phenomenon can benefit global crop trade networks.

  16. Targeted sequencing identifies associations between IL7R-JAK mutations and epigenetic modulators in T-cell acute lymphoblastic leukemia

    Science.gov (United States)

    Vicente, Carmen; Schwab, Claire; Broux, Michaël; Geerdens, Ellen; Degryse, Sandrine; Demeyer, Sofie; Lahortiga, Idoya; Elliott, Alannah; Chilton, Lucy; La Starza, Roberta; Mecucci, Cristina; Vandenberghe, Peter; Goulden, Nicholas; Vora, Ajay; Moorman, Anthony V.; Soulier, Jean; Harrison, Christine J.; Clappier, Emmanuelle; Cools, Jan

    2015-01-01

    T-cell acute lymphoblastic leukemia is caused by the accumulation of multiple oncogenic lesions, including chromosomal rearrangements and mutations. To determine the frequency and co-occurrence of mutations in T-cell acute lymphoblastic leukemia, we performed targeted re-sequencing of 115 genes across 155 diagnostic samples (44 adult and 111 childhood cases). NOTCH1 and CDKN2A/B were mutated/deleted in more than half of the cases, while an additional 37 genes were mutated/deleted in 4% to 20% of cases. We found that IL7R-JAK pathway genes were mutated in 27.7% of cases, with JAK3 mutations being the most frequent event in this group. Copy number variations were also detected, including deletions of CREBBP or CTCF and duplication of MYB. FLT3 mutations were rare, but a novel extracellular mutation in FLT3 was detected and confirmed to be transforming. Furthermore, we identified complex patterns of pairwise associations, including a significant association between mutations in IL7R-JAK genes and epigenetic regulators (WT1, PRC2, PHF6). Our analyses showed that IL7R-JAK genetic lesions did not confer adverse prognosis in T-cell acute lymphoblastic leukemia cases enrolled in the UK ALL2003 trial. Overall, these results identify interconnections between the T-cell acute lymphoblastic leukemia genome and disease biology, and suggest a potential clinical application for JAK inhibitors in a significant proportion of patients with T-cell acute lymphoblastic leukemia. PMID:26206799

  17. Intrastrain heterogeneity of the mgpB gene in Mycoplasma genitalium is extensive in vitro and in vivo and suggests that variation is generated via recombination with repetitive chromosomal sequences.

    Science.gov (United States)

    Iverson-Cabral, Stefanie L; Astete, Sabina G; Cohen, Craig R; Rocha, Eduardo P C; Totten, Patricia A

    2006-07-01

    Mycoplasma genitalium is associated with reproductive tract disease in women and may persist in the lower genital tract for months, potentially increasing the risk of upper tract infection and transmission to uninfected partners. Despite its exceptionally small genome (580 kb), approximately 4% is composed of repeated elements known as MgPar sequences (MgPa repeats) based on their homology to the mgpB gene that encodes the immunodominant MgPa adhesin protein. The presence of these MgPar sequences, as well as mgpB variability between M. genitalium strains, suggests that mgpB and MgPar sequences recombine to produce variant MgPa proteins. To examine the extent and generation of diversity within single strains of the organism, we examined mgpB variation within M. genitalium strain G-37 and observed sequence heterogeneity that could be explained by recombination between the mgpB expression site and putative donor MgPar sequences. Similarly, we analyzed mgpB sequences from cervical specimens from a persistently infected woman (21 months) and identified 17 different mgpB variants within a single infecting M. genitalium strain, confirming that mgpB heterogeneity occurs over the course of a natural infection. These observations support the hypothesis that recombination occurs between the mgpB gene and MgPar sequences and that the resulting antigenically distinct MgPa variants may contribute to immune evasion and persistence of infection.

  18. Targeted sequencing identifies genetic alterations that confer primary resistance to EGFR tyrosine kinase inhibitor (Korean Lung Cancer Consortium).

    Science.gov (United States)

    Lim, Sun Min; Kim, Hye Ryun; Cho, Eun Kyung; Min, Young Joo; Ahn, Jin Seok; Ahn, Myung-Ju; Park, Keunchil; Cho, Byoung Chul; Lee, Ji-Hyun; Jeong, Hye Cheol; Kim, Eun Kyung; Kim, Joo-Hang

    2016-06-14

    Non-small-cell lung cancer (NSCLC) patients with activating epidermal growth factor receptor (EGFR) mutations may exhibit primary resistance to EGFR tyrosine kinase inhibitor (TKI). We aimed to examine genomic alterations associated with de novo resistance to gefitinib in a prospective study of NSCLC patients. One hundred fifty-two patients with activating EGFR mutations were included in this study, and tumor samples from 136 patients were available for targeted sequencing of genomic alterations in 22 genes using the Colon and Lung Cancer panel (Ampliseq, Life Technologies). All 132 patients with EGFR mutations were treated with gefitinib for advanced NSCLC. Twenty patients showed primary resistance to EGFR TKI and were classified as non-responders. A total of 543 somatic single-nucleotide variants (498 missense, 13 nonsense) and 32 frameshift insertions/deletions were detected, with a median of 3 mutations per sample. TP53 was the most commonly mutated gene (47%), and mutations in SMAD4 were also common (19%), as were mutations in DDR2 (16%), PIK3CA (15%), STK11 (14%), and BRAF (7%). Genomic mutations in the PI3K/Akt/mTOR pathway were more common in non-responders (45%) than in responders (27%), and patients with these mutations had significantly shorter progression-free survival and overall survival than patients without them (2.1 vs. 12.8 months, P=0.04; 15.7 vs. not reached, P…). Mutations in the PI3K/Akt/mTOR pathway were commonly identified in non-responders and may confer resistance to EGFR TKI. Screening lung adenocarcinoma patients with a clinical cancer gene test may help identify those who show primary resistance to EGFR TKI (NCT01697163).

  19. Individual variation of human S1P₁ coding sequence leads to heterogeneity in receptor function and drug interactions.

    Science.gov (United States)

    Obinata, Hideru; Gutkind, Sarah; Stitham, Jeremiah; Okuno, Toshiaki; Yokomizo, Takehiko; Hwa, John; Hla, Timothy

    2014-12-01

    Sphingosine 1-phosphate receptor 1 (S1P₁), an abundantly expressed G protein-coupled receptor that regulates key vascular and immune responses, is a therapeutic target in autoimmune diseases. Fingolimod/Gilenya (FTY720), an oral medication for relapsing-remitting multiple sclerosis, targets S1P₁ receptors on immune and neural cells to suppress neuroinflammation. However, suppression of endothelial S1P₁ receptors is associated with cardiac and vascular adverse effects. Here we report the genetic variations of the S1P₁ coding region from exon sequencing of >12,000 individuals and their functional consequences. We conducted functional analyses of 14 nonsynonymous single nucleotide polymorphisms (SNPs) of the S1PR1 gene. One SNP mutant (Arg¹²⁰ to Pro) failed to transmit sphingosine 1-phosphate (S1P)-induced intracellular signals such as calcium increase and activation of p44/42 MAPK and Akt. Two other mutants (Ile⁴⁵ to Thr and Gly³⁰⁵ to Cys) showed normal intracellular signals but impaired S1P-induced endocytosis, which made the receptor resistant to FTY720-induced degradation. Another SNP mutant (Arg¹³ to Gly) demonstrated protection from coronary artery disease in a high cardiovascular risk population. Individuals with this mutation showed a significantly lower percentage of multi-vessel coronary obstruction in a risk factor-matched case-control study. This study suggests that individual genetic variations of S1P₁ can influence receptor function and, therefore, confer differential disease risks and interactions with S1P₁-targeted therapeutics. Copyright © 2014 by the American Society for Biochemistry and Molecular Biology, Inc.

  20. Novel rare missense variations and risk of autism spectrum disorder: whole-exome sequencing in two families with affected siblings and a two-stage follow-up study in a Japanese population.

    Directory of Open Access Journals (Sweden)

    Jun Egawa

    Rare inherited variations in multiplex families with autism spectrum disorder (ASD) are suggested to play a major role in the genetic etiology of ASD. To further investigate the role of rare inherited variations, we performed whole-exome sequencing (WES) in two families, each with three affected siblings. We also performed a two-stage follow-up case-control study in a Japanese population. WES of the six affected siblings identified six novel rare missense variations. Among these variations, CLN8 R24H was inherited in one family by three affected siblings from an affected father and thus co-segregated with ASD. In the first stage of the follow-up study, we genotyped the six novel rare missense variations identified by WES in 241 patients and 667 controls (the Niigata sample). Only CLN8 R24H had a higher mutant allele frequency in patients (1/482) than in controls (1/1334). In the second stage, this variation was further genotyped but was not detected in a sample of 309 patients and 350 controls (the Nagoya sample). In the combined Niigata and Nagoya samples, there was no significant association (odds ratio = 1.8, 95% confidence interval = 0.1-29.6). These results suggest that CLN8 R24H plays a role in the genetic etiology of ASD, at least in a subset of ASD patients.
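    The combined-sample odds ratio in this abstract can be recomputed from allele counts. The following is a minimal sketch, assuming diploid allele counts (2 × 309 patients and 2 × 350 controls for the Nagoya sample, with zero mutant alleles because the variant was not detected there) and a Woolf (logit) confidence interval. The abstract does not say which interval method the authors used, although this one reproduces the reported 1.8 (0.1-29.6) at the stated precision. The function name is illustrative.

```python
import math

def allelic_odds_ratio(case_alt, case_total, ctrl_alt, ctrl_total, z=1.96):
    """Allelic odds ratio with a Woolf (logit) 95% confidence interval."""
    a, b = case_alt, case_total - case_alt   # patient alleles: mutant, non-mutant
    c, d = ctrl_alt, ctrl_total - ctrl_alt   # control alleles: mutant, non-mutant
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper

# Niigata: 1 mutant allele of 482 (patients) and 1 of 1334 (controls).
# Nagoya (assumed diploid counts): 0 of 618 (309 patients), 0 of 700 (350 controls).
case_alt, case_total = 1 + 0, 482 + 2 * 309
ctrl_alt, ctrl_total = 1 + 0, 1334 + 2 * 350

odds_ratio, lower, upper = allelic_odds_ratio(case_alt, case_total, ctrl_alt, ctrl_total)
print(f"OR = {odds_ratio:.1f}, 95% CI = {lower:.1f}-{upper:.1f}")  # OR = 1.8, 95% CI = 0.1-29.6
```

    With only one mutant allele in each group, the standard error of the log odds ratio is dominated by the two terms 1/a and 1/c, which is why the interval is so wide.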

  1. PineElm_SSRdb: a microsatellite marker database identified from genomic, chloroplast, mitochondrial and EST sequences of pineapple (Ananas comosus (L.) Merrill).

    Science.gov (United States)

    Chaudhary, Sakshi; Mishra, Bharat Kumar; Vivek, Thiruvettai; Magadum, Santoshkumar; Yasin, Jeshima Khan

    2016-01-01

    Simple sequence repeats (SSRs), or microsatellites, are valuable molecular genetic markers. There are only a few reports of SSR identification and development in pineapple. The complete genome sequence of pineapple, available in the public domain, can be used to develop numerous novel SSRs. Therefore, we attempted to identify SSRs from the genomic, chloroplast, mitochondrial and EST sequences of pineapple, which will help decipher the genetic makeup of its germplasm resources. A total of 359,511 SSRs were identified in pineapple (356,385 from the genome sequence, 45 from the chloroplast sequence, 249 from the mitochondrial sequence and 2,832 from EST sequences). The list of EST-SSR markers and their details are available in the database. PineElm_SSRdb is an open-source database available for non-commercial academic use at http://app.bioelm.com/, with a mapping tool that can generate circular maps of a selected marker set. This database will be of immense use to breeders, researchers and graduates working on Ananas spp., and to others working on cross-species transferability of markers, diversity, mapping and DNA fingerprinting.
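    Perfect microsatellites of the kind counted above are commonly found by scanning for a short motif followed immediately by further copies of itself. The following is a minimal sketch of that idea, not the pipeline behind PineElm_SSRdb: the minimum copy numbers in the hypothetical `MIN_REPEATS` table and the helper names `is_primitive` and `find_ssrs` are illustrative assumptions, and compound or imperfect repeats are not handled.

```python
import re

# Hypothetical minimum copy numbers per motif length (1-6 bp); real SSR
# databases choose their own thresholds.
MIN_REPEATS = {1: 10, 2: 6, 3: 5, 4: 5, 5: 5, 6: 5}

def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    k = len(motif)
    return all(motif != motif[:d] * (k // d) for d in range(1, k) if k % d == 0)

def find_ssrs(seq):
    """Return sorted (start, motif, copies) tuples for perfect tandem repeats."""
    seq = seq.upper()
    hits = []
    for k, min_rep in MIN_REPEATS.items():
        # A k-bp motif captured in group 1, then at least min_rep - 1 more copies
        # of exactly that motif (backreference \1).
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_rep - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):  # skip e.g. 'AA' runs already reported as 'A'
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)

print(find_ssrs("GGC" + "AT" * 7 + "GGC" + "AAG" * 5 + "T"))
# [(3, 'AT', 7), (20, 'AAG', 5)]
```

    Filtering non-primitive motifs keeps a run such as `AAAAAAAAAAAA` from being reported both as a mononucleotide repeat and as an `AA` dinucleotide repeat.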

  2. Identity-by-descent filtering of exome sequence data identifies PIGV mutations in hyperphosphatasia mental retardation syndrome.

    NARCIS (Netherlands)

    Krawitz, P.M.; Schweiger, M.R.; Rodelsperger, C.; Marcelis, C.L.M.; Kolsch, U.; Meisel, C.; Stephani, F.; Kinoshita, T.; Murakami, Y.; Bauer, S.; Isau, M.; Fischer, A.; Dahl, A.; Kerick, M.; Hecht, J.; Kohler, S.; Jager, M. de; Grunhagen, J.; Condor, B.J. de; Doelken, S.; Brunner, H.G.; Meinecke, P.; Passarge, E.; Thompson, M.D.; Cole, D.E.; Horn, D.; Roscioli, T.; Mundlos, S.; Robinson, P.N.

    2010-01-01

    Hyperphosphatasia mental retardation (HPMR) syndrome is an autosomal recessive form of mental retardation with distinct facial features and elevated serum alkaline phosphatase. We performed whole-exome sequencing in three siblings of a nonconsanguineous union with HPMR and performed computational

  3. Multiple viral infections in Agaricus bisporus - Characterisation of 18 unique RNA viruses and 8 ORFans identified by deep sequencing

    OpenAIRE

    Deakin, Gregory; Dobbs, Edward; Bennett, Julie M.; Jones, Ian M.; Grogan, Helen M.; Burton, Kerry S.

    2017-01-01

    Thirty unique non-host RNAs were sequenced in the cultivated fungus, Agaricus bisporus, comprising 18 viruses each encoding an RdRp domain with an additional 8 ORFans (non-host RNAs with no similarity to known sequences). Two viruses were multipartite with component RNAs showing correlative abundances and common 3′ motifs. The viruses, all positive sense single-stranded, were classified into diverse orders/families. Multiple infections of Agaricus may represent a diverse, dynamic and interact...

  4. Variation in ribosomal and mitochondrial DNA sequences demonstrates the existence of intraspecific groups in Paramecium multimicronucleatum (Ciliophora, Oligohymenophorea).

    Science.gov (United States)

    Tarcz, Sebastian; Potekhin, Alexey; Rautian, Maria; Przyboś, Ewa

    2012-05-01

    This is the first phylogenetic study of intraspecific variability within Paramecium multimicronucleatum using two-locus analysis (ITS1-5.8S-ITS2-5'LSU rDNA and COI mtDNA), carried out on numerous strains originating from different continents. The species has been shown to have a complex structure, comprising several sibling species within the taxonomic species. Our analysis revealed the existence of 10 haplotypes for the rDNA fragment and 15 haplotypes for the COI fragment in the studied material. The mean distance for all of the studied P. multimicronucleatum sequence pairs was p=0.025/0.082 (rDNA/COI). Despite the greater variation of the COI fragment, the COI-derived tree topology is similar to the tree topology constructed from the rDNA fragment. P. multimicronucleatum strains are divided into three main clades. The tree based on COI fragment analysis gives greater resolution of the studied P. multimicronucleatum strains. Our results indicate that strains of P. multimicronucleatum that appear in different clades on the trees could belong to different syngens. Copyright © 2012 Elsevier Inc. All rights reserved.

  5. Gravimetric phenotyping of whole plant transpiration responses to atmospheric vapour pressure deficit identifies genotypic variation in water use efficiency.

    Science.gov (United States)

    Ryan, Annette C; Dodd, Ian C; Rothwell, Shane A; Jones, Ros; Tardieu, Francois; Draye, Xavier; Davies, William J

    2016-10-01

    There is increasing interest in rapidly identifying genotypes with improved water use efficiency, exemplified by the development of whole plant phenotyping platforms that automatically measure plant growth and water use. Transpirational responses to atmospheric vapour pressure deficit (VPD) and whole plant water use efficiency (WUE, defined as the accumulation of above ground biomass per unit of water used) were measured in 100 maize (Zea mays L.) genotypes. Using a glasshouse-based phenotyping platform with naturally varying VPD (1.5-3.8 kPa), a 2-fold variation in WUE was identified in well-watered plants. Regression analysis of transpiration versus VPD under these conditions, and subsequent whole plant gas exchange at imposed VPDs (0.8-3.4 kPa), showed identical responses in specific genotypes. Genotypic responses of transpiration to VPD fell into two categories: 1) a linear increase in transpiration rate with VPD, with either a low (high WUE) or high (low WUE) transpiration rate at all VPDs; 2) a non-linear response with a pronounced change point at low VPD (high WUE) or high VPD (low WUE). In the latter group, high WUE genotypes required a significantly lower VPD before transpiration was restricted, and had a significantly lower rate of transpiration in response to VPD after this point, when compared to low WUE genotypes. Change point values were significantly positively correlated with stomatal sensitivity to VPD. A change point in stomatal response to VPD may explain why some genotypes show contradictory WUE rankings according to whether they are measured under glasshouse or field conditions. Furthermore, this novel use of a high throughput phenotyping platform successfully reproduced the gas exchange responses of individuals measured in whole plant chambers, accelerating the identification of plants with high WUE. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.

  6. Identify and Quantify the Mechanistic Sources of Sensor Performance Variation Between Individual Sensors SN1 and SN2

    Energy Technology Data Exchange (ETDEWEB)

    Diaz, Aaron A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Baldwin, David L. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Cinson, Anthony D. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Jones, Anthony M. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Larche, Michael R. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Mathews, Royce [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Mullen, Crystal A. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Pardini, Allan F. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Posakony, Gerald J. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Prowant, Matthew S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Hartman, Trenton S. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States); Edwards, Matthew K. [Pacific Northwest National Lab. (PNNL), Richland, WA (United States)

    2014-08-06

    This Technical Letter Report satisfies the M3AR-14PN2301022 milestone, and is focused on identifying and quantifying the mechanistic sources of sensor performance variation between individual 22-element, linear phased-array sensor prototypes, SN1 and SN2. This effort is an iterative step toward the longer-term goal of producing and demonstrating a pre-manufacturing prototype ultrasonic probe that possesses the fundamental performance characteristics necessary to enable the development of a high-temperature sodium-cooled fast reactor inspection system. The scope of the work for this portion of the PNNL effort conducted in FY14 includes performing a comparative evaluation and assessment of the performance characteristics of the SN1 and SN2 22-element PA-UT probes manufactured at PNNL. Key transducer performance parameters, such as sound field dimensions, resolution capabilities, frequency response, and bandwidth, are used as metrics for the comparative evaluation and assessment of the SN1 and SN2 engineering test units.

  7. Methods for identifying SNP interactions: a review on variations of Logic Regression, Random Forest and Bayesian logistic regression.

    Science.gov (United States)

    Chen, Carla Chia-Ming; Schwender, Holger; Keith, Jonathan; Nunkesser, Robin; Mengersen, Kerrie; Macrossan, Paula

    2011-01-01

    Due to advances in computational ability, enhanced technology and a reduction in the price of genotyping, more data are being generated for understanding genetic associations with diseases and disorders. However, the availability of large data sets brings the inherent challenge of developing new methods of statistical analysis and modeling. Because a complex phenotype may result from a combination of multiple loci, various statistical methods have been developed for identifying genetic epistasis effects. Among these methods, logic regression (LR) is an intriguing approach incorporating tree-like structures. Various methods have built on the original LR to improve different aspects of the model. In this study, we review four variations of LR, namely Logic Feature Selection, Monte Carlo Logic Regression, Genetic Programming for Association Studies, and Modified Logic Regression-Gene Expression Programming, and investigate the performance of each method using simulated and real genotype data. We contrast these with another tree-like approach, namely Random Forests, and a Bayesian logistic regression with stochastic search variable selection.

  8. Typing of Panton-Valentine Leukocidin-encoding Phages and lukSF-PV Gene Sequence Variation in Staphylococcus aureus from China

    Directory of Open Access Journals (Sweden)

    Huanqiang Zhao

    2016-08-01

    Panton-Valentine leucocidin (PVL), encoded by the lukSF-PV genes, is a bi-component, pore-forming toxin carried by different staphylococcal bacteriophages. The prevalence of PVL in Staphylococcus aureus (S. aureus) has been reported around the globe. However, data on PVL-encoding phage types, lukSF-PV gene variation and chromosomal phage insertion sites for PVL-positive S. aureus are limited, especially in China. To obtain a more complete understanding of the molecular epidemiology of PVL-positive S. aureus, an integrated and modified PCR-based scheme was applied to detect the PVL-encoding phage types. The phage insertion locus and the lukSF-PV variant were determined by PCR and sequencing. The genetic background was characterized by staphylococcal cassette chromosome mec (SCCmec) typing, staphylococcal protein A (spa) gene polymorphism typing, pulsed-field gel electrophoresis (PFGE) typing, accessory gene regulator (agr) locus typing and multilocus sequence typing (MLST). Seventy-eight (78/1175, 6.6%) isolates possessed the lukSF-PV genes, and 59.0% (46/78) of PVL-positive strains belonged to the CC59 lineage. Eight different known PVL-encoding phage types were detected, of which Φ7247PVL/ΦST5967PVL (n = 13) and ΦPVL (n = 12) were the most prevalent. However, 25 (25/78, 32.1%) isolates, belonging to the ST30 and ST59 clones, could not be typed by the modified PCR-based scheme. Single nucleotide polymorphisms (SNPs) were identified at five locations in the lukSF-PV genes, two of which were non-synonymous. Maximum-likelihood tree analysis of attachment site sequences detected six SNP profiles for attR and eight for attL. In conclusion, the PVL-positive S. aureus isolates in the regions studied mainly harbored Φ7247PVL/ΦST5967PVL and ΦPVL. lukSF-PV gene sequences, PVL-encoding phages and phage insertion loci generally varied with lineage. Moreover, PVL-positive clones that have emerged worldwide likely carry distinct phages.

  9. Typing of Panton-Valentine Leukocidin-Encoding Phages and lukSF-PV Gene Sequence Variation in Staphylococcus aureus from China.

    Science.gov (United States)

    Zhao, Huanqiang; Hu, Fupin; Jin, Shu; Xu, Xiaogang; Zou, Yuhan; Ding, Baixing; He, Chunyan; Gong, Fang; Liu, Qingzhong

    2016-01-01

    Panton-Valentine leukocidin (PVL, encoded by the lukSF-PV genes), a bi-component, pore-forming toxin, is carried by different staphylococcal bacteriophages. The prevalence of PVL in Staphylococcus aureus has been reported around the globe. However, data on PVL-encoding phage types, lukSF-PV gene variation and chromosomal phage insertion sites for PVL-positive S. aureus are limited, especially in China. To obtain a more complete understanding of the molecular epidemiology of PVL-positive S. aureus, an integrated and modified PCR-based scheme was applied to detect the PVL-encoding phage types. The phage insertion locus and the lukSF-PV variant were determined by PCR and sequencing. The genetic background was characterized by staphylococcal cassette chromosome mec (SCCmec) typing, staphylococcal protein A (spa) gene polymorphism typing, pulsed-field gel electrophoresis (PFGE) typing, accessory gene regulator (agr) locus typing and multilocus sequence typing (MLST). Seventy-eight (78/1175, 6.6%) isolates possessed the lukSF-PV genes, and 59.0% (46/78) of PVL-positive strains belonged to the CC59 lineage. Eight different known PVL-encoding phage types were detected, of which Φ7247PVL/ΦST5967PVL (n = 13) and ΦPVL (n = 12) were the most prevalent. However, 25 (25/78, 32.1%) isolates, belonging to the ST30 and ST59 clones, could not be typed by the modified PCR-based scheme. Single nucleotide polymorphisms (SNPs) were identified at five locations in the lukSF-PV genes, two of which were non-synonymous. Maximum-likelihood tree analysis of attachment site sequences detected six SNP profiles for attR and eight for attL. In conclusion, the PVL-positive S. aureus isolates in the regions studied mainly harbored Φ7247PVL/ΦST5967PVL and ΦPVL. lukSF-PV gene sequences, PVL-encoding phages, and phage insertion loci generally varied with lineage. Moreover, PVL-positive clones that have emerged worldwide likely carry distinct phages.

  10. Sequence variations in C9orf72 downstream of the hexanucleotide repeat region and its effect on repeat-primed PCR interpretation

    DEFF Research Database (Denmark)

    Nordin, Angelica; Akimoto, Chizuru; Wuolikainen, Anna

    2017-01-01

    A large GGGGCC-repeat expansion mutation (HREM) in C9orf72 is the most common known cause of ALS and FTD in European populations. Sequence variations immediately downstream of the HREM region have previously been observed and have been suggested to be one reason for difficulties in interpreting R...

  11. Deep Ion Torrent sequencing identifies soil fungal community shifts after frequent prescribed fires in a southeastern US forest ecosystem.

    Science.gov (United States)

    Brown, Shawn P; Callaham, Mac A; Oliver, Alena K; Jumpponen, Ari

    2013-12-01

    Prescribed burning is a common management tool used to control fuel loads and ground vegetation and to facilitate desirable game species. We evaluated soil fungal community responses to long-term prescribed fire treatments in a loblolly pine forest on the Piedmont of Georgia using deep Internal Transcribed Spacer Region 1 (ITS1) amplicon sequencing afforded by the recent Ion Torrent Personal Genome Machine (PGM). These deep sequence data (19,000+ reads per sample after subsampling) indicate that frequent fires (3-year fire interval) shift soil fungal communities, whereas infrequent fires (6-year fire interval) permit the system to reset to a state similar to that without prescribed fire. Furthermore, in nonmetric multidimensional scaling analyses, primarily ectomycorrhizal taxa were correlated with axes associated with long fire intervals, whereas soil saprobes tended to be correlated with frequent fire recurrence. We conclude that (1) multiplexed Ion Torrent PGM analyses allow deep, cost-effective sequencing of fungal communities but may suffer from short read lengths and inconsistent sequence quality adjacent to the sequencing adaptor; (2) frequent prescribed fires elicit a shift in soil fungal communities; and (3) such shifts do not occur when fire intervals are longer. Our results emphasize the general responsiveness of these forests to management and the importance of fire return intervals in meeting management objectives. © 2013 Federation of European Microbiological Societies. Published by John Wiley & Sons Ltd. All rights reserved.

  12. Complete genome sequence analysis identifies a new genotype of brassica yellows virus that infects cabbage and radish in China.

    Science.gov (United States)

    Zhang, Xiao-Yan; Xiang, Hai-Ying; Zhou, Cui-Ji; Li, Da-Wei; Yu, Jia-Lin; Han, Cheng-Gui

    2014-08-01

    For brassica yellows virus (BrYV), proposed to be a member of a new polerovirus species, two clearly distinct genotypes (BrYV-A and BrYV-B) have been described. In this study, the complete nucleotide sequences of two BrYV isolates from radish and Chinese cabbage were determined. Sequence analysis suggested that these isolates represent a new genotype, referred to here as BrYV-C. The full-length sequences of the two BrYV-C isolates shared 93.4-94.8% identity with BrYV-A and BrYV-B. Further phylogenetic analysis showed that the BrYV-C isolates formed a subgroup that was distinct from the BrYV-A and BrYV-B isolates based on all of the proteins except P5.

  13. MicroRNA of the fifth-instar posterior silk gland of silkworm identified by Solexa sequencing

    Directory of Open Access Journals (Sweden)

    Jisheng Li

    2014-12-01

    No studies have specifically focused on microRNAs (miRNAs) in the fifth-instar posterior silk gland of Bombyx mori. Here, using next-generation sequencing, we acquired 93.2 million processed reads from 10 small RNA libraries. In this paper, we describe in detail how our deep-sequencing dataset, recently published in BMC Genomics, was generated. Our findings substantially enrich the silkworm miRNA repository and may help reveal miRNA functions in silk production.

  14. High-fidelity target sequencing of individual molecules identified using barcode sequences: de novo detection and absolute quantitation of mutations in plasma cell-free DNA from cancer patients.

    Science.gov (United States)

    Kukita, Yoji; Matoba, Ryo; Uchida, Junji; Hamakawa, Takuya; Doki, Yuichiro; Imamura, Fumio; Kato, Kikuya

    2015-08-01

    Circulating tumour DNA (ctDNA) is an emerging field of cancer research. However, current ctDNA analysis is usually restricted to one or a few mutation sites due to technical limitations. With massively parallel DNA sequencers, the number of false positives caused by a high read error rate is a major problem. In addition, the final sequence reads do not represent the original DNA population because of the global amplification step during template preparation. We established a high-fidelity target sequencing system of individual molecules identified in plasma cell-free DNA using barcode sequences; this system consists of the following two steps. (i) A novel target sequencing method that adds barcode sequences by adaptor ligation. This method uses linear amplification to eliminate the errors introduced during the early cycles of polymerase chain reaction. (ii) The monitoring and removal of erroneous barcode tags. This process identifies the individual molecules that have been sequenced, allowing absolute quantitation of the number of mutations. Using plasma cell-free DNA from patients with gastric or lung cancer, we demonstrated that the system achieved near complete elimination of false positives and enabled de novo detection and absolute quantitation of mutations in plasma cell-free DNA. © The Author 2015. Published by Oxford University Press on behalf of Kazusa DNA Research Institute.
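    The general principle behind barcode-based error suppression is that reads carrying the same barcode derive from one original molecule, so a base supported by only a minority of those reads is likely a sequencing or PCR error rather than a true mutation. A per-barcode consensus illustrates this. This is a generic sketch under stated assumptions, not the authors' system: it assumes reads are already aligned to the same target and of equal length, the family-size and agreement thresholds are hypothetical, and it does not model the linear amplification step or the monitoring of erroneous barcode tags described above.

```python
from collections import Counter, defaultdict

def consensus_by_barcode(reads, min_family_size=3, min_agreement=0.7):
    """Collapse reads that share a barcode into one consensus sequence per molecule.

    reads: iterable of (barcode, sequence) pairs. Sequences are assumed to be
    aligned to the same target region and of equal length (a simplification).
    Positions without a clear majority are masked as 'N'.
    """
    families = defaultdict(list)
    for barcode, seq in reads:
        families[barcode].append(seq)

    consensus = {}
    for barcode, seqs in families.items():
        if len(seqs) < min_family_size:
            continue  # too few reads to tell a real variant from a read error
        bases = []
        for column in zip(*seqs):  # one tuple of bases per aligned position
            base, count = Counter(column).most_common(1)[0]
            bases.append(base if count / len(seqs) >= min_agreement else "N")
        consensus[barcode] = "".join(bases)
    return consensus

reads = [
    ("AAC", "ACGTA"), ("AAC", "ACGTA"), ("AAC", "ACGTA"),
    ("AAC", "ACTTA"),                                      # single-read error at position 2
    ("GGT", "ACGAA"), ("GGT", "ACGAA"), ("GGT", "ACGAA"),  # variant shared by the whole family
    ("TTT", "ACCTA"),                                      # singleton family, discarded
]
print(consensus_by_barcode(reads))  # {'AAC': 'ACGTA', 'GGT': 'ACGAA'}
```

    Counting consensus molecules (one per barcode family) rather than raw reads is what makes absolute quantitation possible in this kind of design, since amplification bias no longer inflates the count.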

  15. Use of next-generation sequencing to detect LDLR gene copy number variation in familial hypercholesterolemia

    Science.gov (United States)

    Iacocca, Michael A.; Wang, Jian; Dron, Jacqueline S.; Robinson, John F.; McIntyre, Adam D.; Cao, Henian

    2017-01-01

    Familial hypercholesterolemia (FH) is a heritable condition of severely elevated LDL cholesterol, caused predominantly by autosomal codominant mutations in the LDL receptor gene (LDLR). In providing a molecular diagnosis for FH, the current procedure often includes targeted next-generation sequencing (NGS) panels for the detection of small-scale DNA variants, followed by multiplex ligation-dependent probe amplification (MLPA) in LDLR for the detection of whole-exon copy number variants (CNVs). The latter is essential because ∼10% of FH cases are attributed to CNVs in LDLR; accounting for them decreases false negative findings. Here, we determined the potential of replacing MLPA with bioinformatic analysis applied to NGS data, which uses depth-of-coverage analysis as its principal method to identify whole-exon CNV events. In analysis of 388 FH patient samples, there was 100% concordance in LDLR CNV detection between these two methods: 38 reported CNVs identified by MLPA were also successfully detected by our NGS method, while 350 samples negative for CNVs by MLPA were also negative by NGS. This result suggests that MLPA can be removed from the routine diagnostic screening for FH, significantly reducing associated costs, resources, and analysis time, while promoting more widespread assessment of this important class of mutations across diagnostic laboratories. PMID:28874442
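    Depth-of-coverage CNV detection rests on the observation that, after normalization, an exon with a heterozygous deletion yields roughly half the expected read depth, and an exon with one extra copy roughly 1.5 times. A minimal sketch follows; the median normalization, the reference-panel design, the cutoffs (0.65 and 1.35) and the function names are assumptions for illustration, not the bioinformatic method used in the study.

```python
from statistics import median

def normalize(depths):
    """Scale exon depths by the sample's median exon depth (library-size correction)."""
    scale = median(depths.values())
    return {exon: d / scale for exon, d in depths.items()}

def exon_copy_ratios(sample_depth, reference_depths):
    """Per-exon depth ratio of a test sample versus a panel of CNV-negative references."""
    norm_sample = normalize(sample_depth)
    norm_refs = [normalize(ref) for ref in reference_depths]
    return {exon: value / median(ref[exon] for ref in norm_refs)
            for exon, value in norm_sample.items()}

def call_cnvs(ratios, del_cutoff=0.65, dup_cutoff=1.35):
    """Flag exons: a ratio near 0.5 suggests a heterozygous deletion, near 1.5 a duplication."""
    calls = {}
    for exon, ratio in ratios.items():
        if ratio < del_cutoff:
            calls[exon] = "deletion"
        elif ratio > dup_cutoff:
            calls[exon] = "duplication"
    return calls

references = [
    {"exon1": 100, "exon2": 100, "exon3": 100, "exon4": 100},
    {"exon1": 200, "exon2": 200, "exon3": 200, "exon4": 200},  # deeper library, same copy number
]
sample = {"exon1": 100, "exon2": 100, "exon3": 50, "exon4": 150}

ratios = exon_copy_ratios(sample, references)
print(call_cnvs(ratios))  # {'exon3': 'deletion', 'exon4': 'duplication'}
```

    Normalizing by the median rather than the total keeps a single deleted or duplicated exon from shifting the ratios of all the other exons in the sample.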

  16. Isolation of Canine parvovirus with a view to identify the prevalent serotype on the basis of partial sequence analysis

    Directory of Open Access Journals (Sweden)

    Gurpreet Kaur

    2015-01-01

    Aim: The aim of this study was to isolate Canine parvovirus (CPV) from suspected dogs on the Madin-Darby canine kidney (MDCK) cell line and to confirm it by polymerase chain reaction (PCR) and nested PCR (NPCR). Further, the VP2 gene of the CPV isolates was amplified and sequenced to determine the prevailing antigenic type. Materials and Methods: A total of 60 rectal swabs were collected from dogs showing signs of gastroenteritis, processed and subjected to isolation in the MDCK cell line. The samples showing cytopathic effects (CPE) were confirmed by PCR and NPCR. These samples were subjected to PCR for amplification of the VP2 gene of CPV, sequenced and analyzed to study the prevailing antigenic types of CPV. Results: Of the 60 samples subjected to isolation in the MDCK cell line, five showed CPE in the form of rounding of cells, clumping of cells and finally detachment of the cells. When these samples and the two commercially available vaccines were subjected to PCR for amplification of the VP2 gene, a 1710 bp product was amplified. Sequence analysis revealed that the vaccines belonged to the CPV-2 type and the samples to the CPV-2b type. Conclusion: Of a total of 60 samples, 5 exhibited CPE in the MDCK cell line. Sequence analysis of the VP2 gene from the samples and vaccine strains revealed that the samples belonged to the CPV-2b type and the vaccines to CPV-2.

  17. Representational difference analysis of Neisseria meningitidis identifies sequences that are specific for the hyper-virulent lineage III clone

    NARCIS (Netherlands)

    Bart, A.; Dankert, J.; van der Ende, A.

    2000-01-01

    Neisseria meningitidis may cause meningitis and septicemia. Since the early 1980s, an increased incidence of meningococcal disease has been caused by the lineage III clone in many countries in Europe and in New Zealand. We hypothesized that lineage III meningococci have specific DNA sequences,

  18. Isolation of Canine parvovirus with a view to identify the prevalent serotype on the basis of partial sequence analysis.

    Science.gov (United States)

    Kaur, Gurpreet; Chandra, Mudit; Dwivedi, P N; Sharma, N S

    2015-01-01

    The aim of this study was to isolate Canine parvovirus (CPV) from suspected dogs on the Madin-Darby canine kidney (MDCK) cell line and to confirm the isolates by polymerase chain reaction (PCR) and nested PCR (NPCR). Further, the VP2 gene of the CPV isolates was amplified and sequenced to determine the prevailing antigenic type. A total of 60 rectal swabs were collected from dogs showing signs of gastroenteritis, processed, and subjected to isolation in the MDCK cell line. The samples showing cytopathic effects (CPE) were confirmed by PCR and NPCR. These samples were subjected to PCR for amplification of the VP2 gene of CPV, sequenced, and analyzed to study the prevailing antigenic types of CPV. Of the 60 samples subjected to isolation in the MDCK cell line, five showed CPE in the form of rounding of cells, clumping of cells, and finally detachment of the cells. When these samples and the two commercially available vaccines were subjected to PCR for amplification of the VP2 gene, a 1710 bp product was amplified. Sequence analysis revealed that the vaccines belonged to the CPV-2 type and the samples to the CPV-2b type. Of the 60 samples, 5 exhibited CPE in the MDCK cell line. Sequence analysis of the VP2 gene of the samples and vaccine strains revealed that the samples belonged to the CPV-2b type and the vaccines to CPV-2.

  19. Impact of variations in fatty liver on sonographic detection of focal hepatic lesions originally identified by CT

    International Nuclear Information System (INIS)

    Wu, Size; Tu, Rong; Nan, Ruixia; Liu, Guang Qing; Cui, Xiao Jing; Liang, Xian

    2016-01-01

    The aim of this study was to investigate the influence of variations in fatty liver on the ultrasonographic detection of focal liver lesions. A total of 229 patients with varying degrees of fatty liver and focal liver lesions and 200 patients with focal liver lesions but no fatty liver were randomly selected for inclusion in groups I and II, respectively. Focal liver lesions identified on computed tomography were taken as the reference, and the ultrasonographic findings were compared with them. The numbers of focal liver lesions in groups I and II were 501 and 413, respectively. The ultrasonographic detection rates of focal liver lesions in groups I and II were 86.8% (435/501) and 94.2% (389/413), respectively. Detection of focal lesions was compared between patients with and without fatty liver, and between grades of fatty liver, as follows: mild fatty liver (162/177) vs. liver without fat infiltration (389/413) (P=0.277); mild vs. moderate fatty liver (190/212) (P=0.604); mild vs. severe fatty liver (83/112) (P<0.001); moderate fatty liver vs. liver without fat infiltration (P=0.051); moderate vs. severe fatty liver (P<0.001); severe fatty liver vs. liver without fat infiltration (P<0.001); and fatty liver overall (435/501) vs. liver without fat infiltration (P<0.001). Mild and moderate fatty liver were not significantly associated with visualization of the lesion, whereas severe fatty liver usually impaired the detection of focal liver lesions. If a patient with severe fatty liver is suspected of having a liver tumor, ultrasonography should be chosen with caution because of the risk of a missed diagnosis.
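    The detection rates above are simple proportions, and the per-grade counts add up to the group I totals (162 + 190 + 83 = 435 detected of 177 + 212 + 112 = 501 lesions). A minimal Python sketch of that arithmetic follows, with an uncorrected Pearson chi-square test on each 2x2 (detected vs. missed) table as one possible comparison. This is an assumption for illustration: the abstract does not state which test produced its P values, so the values printed here need not match the reported ones. The helper names `rate` and `chi2_2x2` are hypothetical.

```python
import math

def rate(detected, total):
    """Detection rate as a percentage."""
    return 100.0 * detected / total

def chi2_2x2(det1, tot1, det2, tot2):
    """Uncorrected Pearson chi-square for a 2x2 table (detected vs. missed).

    Returns (statistic, P value with 1 degree of freedom).
    Assumed test for illustration; not necessarily the one used in the study.
    """
    a, b = det1, tot1 - det1  # group 1: detected, missed
    c, d = det2, tot2 - det2  # group 2: detected, missed
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    stat = n * (a * d - b * c) ** 2 / denom
    # For 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2)).
    return stat, math.erfc(math.sqrt(stat / 2))

# Counts (detected, total) taken from the abstract.
grades = {"mild": (162, 177), "moderate": (190, 212), "severe": (83, 112)}
no_fat = (389, 413)

# The three grades together make up group I.
detected = sum(d for d, _ in grades.values())
total = sum(t for _, t in grades.values())
assert (detected, total) == (435, 501)

print(f"group I:  {rate(detected, total):.1f}%")  # 86.8%
print(f"group II: {rate(*no_fat):.1f}%")          # 94.2%
for name, (d, t) in grades.items():
    stat, p = chi2_2x2(d, t, *no_fat)
    print(f"{name:8s} {rate(d, t):5.1f}%  vs. no fat: chi2={stat:.2f}, P={p:.3g}")
```

    A continuity-corrected chi-square or Fisher's exact test would give somewhat different P values for the borderline comparisons, which is why the sketch is only a check on the proportions, not a reproduction of the study's statistics.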

  20. Regional variation in identified cancer care needs of early-career oncologists in China, India, and Pakistan.

    Science.gov (United States)

    Lyerly, H Kim; Fawzy, Maria R; Aziz, Zeba; Nair, Reena; Pramesh, C S; Parmar, Vani; Parikh, Purvish M; Jamal, Rozmin; Irumnaz, Azizunissa; Ren, Jun; Stockler, Martin R; Abernethy, Amy P

    2015-05-01

    Cancer incidence and mortality are increasing in the developing world. Inequities between low-, middle-, and high-income countries affect both the disease burden and the infrastructure needed to respond to cancer. We surveyed early-career oncologists attending clinical research workshops in three countries with emerging economies about their perception of the evolving cancer burden. A cross-sectional survey questionnaire was distributed at clinical trial concept development workshops held at major hospitals in Beijing, Lahore, Karachi, and Mumbai to gather information on home-country health conditions and needs. A total of 100 respondents participated (India = 29, China = 25, Pakistan = 42, and other = 4). Expected consensus on many issues (e.g., the emergence of cancer as a significant health issue) was balanced by significant variation in priorities, opportunities, and challenges. Chinese respondents prioritized improvements in cancer-specific care and palliative care, Indian respondents favored improved cancer detection and advancing research in cancer care, and Pakistani respondents prioritized cancer awareness and improvements in disease detection and cancer care research. Across all groups, the most frequently cited opportunity was help in improving professional cancer education and training. Predominantly early-career oncologists attending clinical research workshops in China, India, and Pakistan identified needs for more clinical cancer research, professional education, and public awareness of cancer. Decision makers supporting efforts to reduce the global burden of cancer will need to factor in the specific needs and aspirations of health care providers in each country when prioritizing health policies and budgets. ©AlphaMed Press.

  1. Impact of variations in fatty liver on sonographic detection of focal hepatic lesions originally identified by CT

    Energy Technology Data Exchange (ETDEWEB)

    Wu, Size; Tu, Rong; Nan, Ruixia; Liu, Guang Qing; Cui, Xiao Jing; Liang, Xian [Affiliated Hospital of Hainan Medical College, Haikou (China)]

    2016-01-15

    The aim of this study was to investigate the influence of variations in fatty liver on the ultrasonographic detection of focal liver lesions. A total of 229 patients with varying degrees of fatty liver and focal liver lesions and 200 patients with focal liver lesions but no fatty liver were randomly selected for inclusion in groups I and II, respectively. Focal liver lesions identified on computed tomography were taken as the reference, and the ultrasonographic findings were compared with them. The numbers of focal liver lesions in groups I and II were 501 and 413, respectively. The ultrasonographic detection rates of focal liver lesions in groups I and II were 86.8% (435/501) and 94.2% (389/413), respectively. Detection of focal lesions was compared between patients with and without fatty liver, and between grades of fatty liver, as follows: mild fatty liver (162/177) vs. liver without fat infiltration (389/413) (P=0.277); mild vs. moderate fatty liver (190/212) (P=0.604); mild vs. severe fatty liver (83/112) (P<0.001); moderate fatty liver vs. liver without fat infiltration (P=0.051); moderate vs. severe fatty liver (P<0.001); severe fatty liver vs. liver without fat infiltration (P<0.001); and fatty liver overall (435/501) vs. liver without fat infiltration (P<0.001). Mild and moderate fatty liver were not significantly associated with visualization of the lesion, whereas severe fatty liver usually impaired the detection of focal liver lesions. If a patient with severe fatty liver is suspected of having a liver tumor, ultrasonography should be chosen with caution because of the risk of a missed diagnosis.

  2. A comprehensive survey of sequence variation in the ABCA4 (ABCR) gene in Stargardt disease and age-related macular degeneration.

    Science.gov (United States)

    Rivera, A; White, K; Stöhr, H; Steiner, K; Hemmrich, N; Grimm, T; Jurklies, B; Lorenz, B; Scholl, H P; Apfelstedt-Sylla, E; Weber, B H

    2000-10-01

    Stargardt disease (STGD) is a common autosomal recessive maculopathy of early and young-adult onset and is caused by alterations in the gene encoding the photoreceptor-specific ATP-binding cassette (ABC) transporter (ABCA4). We studied 144 patients with STGD and 220 unaffected individuals ascertained from the German population to carry out a comprehensive, population-specific survey of sequence variation in the ABCA4 gene. In addition, we assessed the proposed role of ABCA4 in age-related macular degeneration (AMD), a common cause of late-onset blindness, by studying 200 affected individuals with late-stage disease. Using a screening strategy based primarily on denaturing gradient gel electrophoresis, we identified a total of 127 unique alterations across the three study groups, 90 of which had not been reported previously, and classified 72 as probable pathogenic mutations. Of the 288 STGD chromosomes studied, mutations were identified in 166, a detection rate of approximately 58%. Eight different alleles account for 61% of the identified disease alleles, and at least one of these, the L541P-A1038V complex allele, appears to be a founder mutation in the German population. When the AMD and control groups were analyzed with the same methodology, 18 patients with AMD and 12 controls were found to harbor possible disease-associated alterations. This difference between the two groups is not significant; however, detecting modest effects of rare alleles in complex diseases may require the analysis of larger patient cohorts.

  3. Criteria for confirming sequence periodicity identified by Fourier transform analysis: application to GCR2, a candidate plant GPCR?

    Science.gov (United States)

    Illingworth, Christopher J R; Parkes, Kevin E; Snell, Christopher R; Mullineaux, Philip M; Reynolds, Christopher A

    2008-03-01

    Methods to determine periodicity in protein sequences are useful for inferring function. Fourier transformation is one approach, but care is required to ensure that the periodicity is genuine. Here we have shown that empirically derived statistical tables can be used as a measure of significance. Genuine protein sequence data, rather than randomly generated sequences, were used as the statistical backdrop. The method has been applied to G-protein coupled receptor (GPCR) sequences by Fourier transformation of hydrophobicity values, codon frequencies, and the extent of over-representation of codon pairs, the latter being related to translational step times. Genuine periodicity was observed in the hydrophobicity, whereas the apparent periodicity in the translational step times (as inferred from previously reported measures) was not validated statistically. GCR2 has recently been proposed as the plant GPCR for the hormone abscisic acid. It has homology to the Lanthionine synthetase C-like family of proteins, an observation confirmed by fold recognition. Application of the Fourier transform algorithm to the GCR2 family revealed a strongly predicted sevenfold periodicity in hydrophobicity, suggesting why GCR2 has been reported to be a GPCR despite negative indications from most transmembrane prediction algorithms. The underlying multiple sequence alignment, which is also required for the Fourier transform analysis of periodicity, indicated that the hydrophobic regions around the seven GXXG motifs commence near the C-terminal end of each of the seven inner helices of the alpha-toroid and continue to the N-terminal region of the helix. The results clearly explain why GCR2 has been understandably but erroneously predicted to be a GPCR.
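    Fourier-based periodicity scoring of the kind described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' method: it uses the Kyte-Doolittle hydropathy scale (the abstract does not name the scale), works on a single sequence rather than on the multiple sequence alignment the authors used, scores periodicity as the share of spectral power at k cycles per sequence (k = 7 for a sevenfold pattern), and estimates significance empirically against background scores that the caller must supply. The helper names `hydrophobicity_profile`, `power_spectrum`, `relative_power`, and `empirical_p_value` are hypothetical.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; not named in the abstract).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}

def hydrophobicity_profile(seq):
    """Map a protein sequence to per-residue hydropathy, skipping unknown symbols."""
    return [KYTE_DOOLITTLE[r] for r in seq.upper() if r in KYTE_DOOLITTLE]

def power_spectrum(values):
    """|DFT|^2 / n of the mean-centred profile for k = 1 .. n//2 cycles per sequence."""
    n = len(values)
    mean = sum(values) / n
    x = [v - mean for v in values]
    spectrum = {}
    for k in range(1, n // 2 + 1):
        re = sum(x[t] * math.cos(2 * math.pi * k * t / n) for t in range(n))
        im = sum(x[t] * math.sin(2 * math.pi * k * t / n) for t in range(n))
        spectrum[k] = (re * re + im * im) / n
    return spectrum

def relative_power(values, k):
    """Share of total spectral power at k cycles (e.g. k = 7 for seven repeats)."""
    spectrum = power_spectrum(values)
    total = sum(spectrum.values())
    return spectrum[k] / total if total > 0 else 0.0

def empirical_p_value(observed, background_scores):
    """Fraction of background scores at least as large as the observed score.

    Following the abstract, background scores should come from genuine
    protein sequences rather than randomly generated ones.
    """
    hits = sum(1 for s in background_scores if s >= observed)
    return (hits + 1) / (len(background_scores) + 1)

# Synthetic check: a profile with exactly seven cycles over 140 positions.
demo = [math.sin(2 * math.pi * 7 * t / 140) for t in range(140)]
spec = power_spectrum(demo)
print(max(spec, key=spec.get))  # 7
```

    The +1 in `empirical_p_value` keeps the estimate away from exactly zero when no background score reaches the observed one; the direct O(n^2) transform is kept for readability, since protein-length profiles are short.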